Web Scraping CBA Match Data with Java: Methodology and Full Code Example
This article explains how to scrape Chinese Basketball Association (CBA) match data from a portal website, analyzes the page structure, extracts table rows using regular expressions, converts them to CSV format, and provides a complete Java/Groovy code example for automated data collection.
The author describes a personal project to collect and analyze Chinese Basketball Association (CBA) match data by writing a web crawler.
1. Selecting the data source – The data is taken from a domestic portal's CBA section, with a link provided for reference.
2. Analyzing the data – The page is rendered on the server side, so there is no separate data API to call; the figures have to be read from the HTML tables in the page source.
3. Determining the approach – Regular expressions extract each table row, the surrounding markup is stripped, and commas are inserted between cells to produce CSV‑compatible records.
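Step 3 can be sketched in plain Java with only `java.util.regex`. This is a minimal illustration of the idea, not the author's code: the class name `TableRowsToCsv`, the method `toCsvRows`, and the sample HTML are all hypothetical, and the patterns assume simple, non-nested tables.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TableRowsToCsv {
    // Matches one table row, non-greedily, in whitespace-stripped HTML.
    private static final Pattern ROW = Pattern.compile("<tr.*?</tr>");

    /** Converts every <tr>...</tr> in the given HTML into one comma-separated line. */
    static List<String> toCsvRows(String html) {
        // Removing all whitespace lets a row that spans several lines match as one unit.
        // Side effect: spaces inside cell text are removed too.
        String compact = html.replaceAll("\\s", "");
        List<String> rows = new ArrayList<>();
        Matcher m = ROW.matcher(compact);
        while (m.find()) {
            String row = m.group()
                    .replaceAll("</?tr.*?>", "")   // drop the <tr ...> and </tr> wrappers
                    .replaceAll("</t[dh]>", ",")   // each closing cell tag becomes a comma
                    .replaceAll("<.*?>", "");      // strip every remaining tag
            if (row.endsWith(",")) {
                row = row.substring(0, row.length() - 1); // trim the comma after the last cell
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample markup, shaped like a score table.
        String html = "<table>\n"
                + "<tr class=\"hd\"><th>Team</th><th>Q1</th><th>Total</th></tr>\n"
                + "<tr><td><a href=\"#\">Guangzhou</a></td><td>33</td><td>133</td></tr>\n"
                + "</table>";
        toCsvRows(html).forEach(System.out::println);
    }
}
```

Regex-based extraction is brittle against nested tables or cells that themselves contain commas; for anything beyond a quick one-off crawl, an HTML parser would likely be the safer choice.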
After filtering, the extracted CSV data looks like this (columns: team, first through fourth quarter, total score):
球队,第一节,第二节,第三节,第四节,总比分
广州,33,37,36,27,133
北控,23,18,17,34,92
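...
The author then shares the complete source code used for crawling and processing. It is written in Groovy and relies on the author's own helper classes (FanLibrary, Save, Regex, WriteRead), which are not shown here: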
package com.fun

import com.fun.frame.Save
import com.fun.frame.httpclient.FanLibrary
import com.fun.utils.Regex
import com.fun.utils.WriteRead

class sd extends FanLibrary {

    public static void main(String[] args) {
        def total = []
        // Crawl every game page in the ID range and merge the rows into one list
        range(300, 381).forEach { x ->
            total.addAll test(x)
        }
        Save.saveStringList(total, "total4.csv")
        testOver()
    }

    // Returns the CSV rows for one game, reusing the saved file when it already exists
    static def test(int i) {
        if (new File(LONG_Path + "${i}.csv").exists()) return WriteRead.readTxtFileByLine(LONG_Path + "${i}.csv")
        String url = "http://cbadata.sports.sohu.com/game/content/2017/${i}"
        def get = getHttpGet(url)
        def response = getHttpResponse(get)
        // Strip all whitespace so each <tr>...</tr> can be matched as a single unit
        def string = response.getString("content").replaceAll("\\s", EMPTY)
        def all = Regex.regexAll(string, "<tr.*?</tr>")
        def list = []
        all.forEach { x ->
            // Drop the <tr> wrappers, then turn each closing cell tag into a comma
            def info = x.replaceAll("</*?tr.*?>", EMPTY).replaceAll("</t(d|h)>", ",")
            // Remove any remaining tags
            info = info.replaceAll("<.*?>", EMPTY)
            // Trim the trailing comma left by the last cell (safe for empty rows)
            info = info.endsWith(",") ? info.substring(0, info.length() - 1) : info
            // Prefix the totals row ("总计") with an empty field, presumably to keep columns aligned
            if (info.startsWith("总计")) info = "," + info
            list << info
            output(info)
        }
        Save.saveStringList(list, "${i}.csv")
        return list
    }
}
Readers interested in the project can reply with the phrase “大爷来玩啊” to obtain the author's WeChat ID for private discussion.
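The `test` method in the listing skips the download whenever a per-game CSV file already exists, which makes an interrupted crawl cheap to resume. A minimal Java sketch of that cache-first pattern follows, using only the JDK (Java 11 or later). The names `CachedGameLoader`, `loadOrCompute`, and `fetchGamePage` are hypothetical; only the URL pattern comes from the listing, and the endpoint may no longer respond.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.stream.Collectors;

public class CachedGameLoader {

    /** Returns the cached lines if the file exists; otherwise runs the loader and saves its result. */
    static List<String> loadOrCompute(Path cacheFile, Callable<List<String>> loader) throws Exception {
        if (Files.exists(cacheFile)) {
            return Files.readAllLines(cacheFile, StandardCharsets.UTF_8);
        }
        List<String> lines = loader.call();
        Files.write(cacheFile, lines, StandardCharsets.UTF_8);
        return lines;
    }

    /** Downloads one game page as text; the URL pattern is the one used in the listing. */
    static String fetchGamePage(int gameId) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://cbadata.sports.sohu.com/game/content/2017/" + gameId))
                .GET()
                .build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }

    public static void main(String[] args) throws Exception {
        // The first run hits the network; later runs read 300.csv from disk instead.
        // Row extraction would normally run on the page text before it is cached.
        Path cache = Path.of("300.csv");
        List<String> lines = loadOrCompute(cache,
                () -> fetchGamePage(300).lines().collect(Collectors.toList()));
        System.out.println(lines.size() + " line(s) loaded");
    }
}
```

Passing the loader as a `Callable` keeps the caching logic independent of how the page is fetched, so it can be exercised with a stub instead of a live request.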