Java Selenium Script for Scraping Jokes from Neihanshequ and Exporting to Excel
The author shares a Java Selenium program that navigates the Neihanshequ website, extracts jokes, stores them in a list, and writes the collected data to an Excel file, while also discussing challenges such as content approval and rate‑limit restrictions.
The author wanted to enrich a Turing robot's knowledge base with jokes and tried crawling content from the web. After running into approval failures and rate‑limit issues with the crawler, they decided to share the Java Selenium script, which performs the scraping and exports the results to Excel.
Below is the complete Java code used for the task:
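package wepractice;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import selenium.Library;
import selenium.Excel;

// Library and Excel are the author's own helper classes (package selenium),
// not part of Selenium itself. The `driver` used below is presumably a
// static WebDriver field inherited from Library.
public class NeiHanjokes extends Library {

    public static void main(String[] args) {
        Library library = new Library();
        // Each row of the sheet is a String[] holding a single joke.
        List<String[]> sheet = new ArrayList<String[]>();
        // Key: sheet number, value: the rows of that sheet.
        Map<Integer, List<String[]>> dateJoke = new HashMap<Integer, List<String[]>>();

        // Open the home page and remember its window handle.
        driver.get("http://neihanshequ.com/");
        String home = driver.getWindowHandle();

        // Clicking the first joke opens its detail page in a new window.
        library.findElementByXpathAndClick(".//*[@id='detail-list']/li[1]/div/div[2]/a/div/h1/p");

        // Switch to the window that is not the home window.
        Set<String> handles = driver.getWindowHandles();
        for (String handle : handles) {
            if (!handle.equals(home)) {
                driver.switchTo().window(handle);
            }
        }

        // Read 15 jokes, clicking the "prevGroupLink" element after each one.
        for (int i = 0; i < 15; i++) {
            library.output(i);
            String joke = library.getTextByXpath("html/body/div[3]/div[1]/div/ul/li[1]/div/div[2]/a/div/h1/p");
            String[] jokes = new String[1];
            jokes[0] = joke;
            sheet.add(jokes);
            library.findElementByIdAndClick("prevGroupLink");
        }

        // Write all collected jokes to sheet 1 of the Excel file.
        dateJoke.put(1, sheet);
        Excel excel = new Excel();
        excel.writeXlsx(dateJoke);

        // Close the detail window, switch back to the home window, then quit.
        driver.close();
        for (String handle : handles) {
            if (handle.equals(home)) {
                driver.switchTo().window(handle);
            }
        }
        driver.quit();
    }
}

The article also lists a collection of technical and non‑technical reference links for further reading.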
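The window-switching step in the script loops over every handle and switches to whichever one is not the home window. A minimal sketch of that selection logic, pulled out into a plain Java helper so it can be checked without a browser, might look like the following. The class name WindowPicker and the method pickNewWindow are hypothetical and appear neither in the original script nor in Selenium; the strings "home" and "detail" stand in for real window handles.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper: not part of the original script or of Selenium.
final class WindowPicker {

    private WindowPicker() {}

    /**
     * Returns the first handle that differs from the home handle, or an
     * empty Optional if no second window is present in the set.
     */
    static Optional<String> pickNewWindow(Set<String> handles, String home) {
        for (String handle : handles) {
            if (!handle.equals(home)) {
                return Optional.of(handle);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-ins for driver.getWindowHandles() and driver.getWindowHandle().
        Set<String> handles = new LinkedHashSet<>(Arrays.asList("home", "detail"));
        String home = "home";

        Optional<String> target = pickNewWindow(handles, home);
        // In the real script this is where driver.switchTo().window(...) would be called.
        System.out.println(target.orElse("no new window"));
    }
}
```

Returning an Optional makes the "no new window opened" case explicit. The original loop silently stays on the home window in that situation, and the later XPath lookups would then run against the wrong page.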
