Groovy Script for Crawling and Downloading QR Code Images Using HTTP and Regex
This article demonstrates a Groovy script that extracts QR‑code image URLs from a web page using regular expressions, then downloads each image to a local directory, illustrating practical web‑scraping techniques and reusable utility methods for HTTP requests and file handling.
I previously wrote an article comparing file downloads in Java and Groovy, which used image downloads to verify the results; building on that, I wrote a small crawler to fetch QR‑code image assets.
The idea is the same: retrieve the homepage, extract the image URL links with a regular expression, and download them locally.
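The extraction step can be sketched with nothing but the JDK. The class and method names below (LinkExtractor, extractDetailUrls) are made up for illustration and are not part of the FunTester library; the pattern is the one used in the script, with every literal dot escaped, and the sample HTML in main is invented.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the FunTester library.
final class LinkExtractor {
    // Protocol-relative detail-page links: //kt.fkw.com/tupian/<8 word chars>.html
    private static final Pattern DETAIL_LINK =
            Pattern.compile("//kt\\.fkw\\.com/tupian/\\w{8}\\.html");

    /** Returns each distinct link in first-seen order, prefixed with "https:". */
    static List<String> extractDetailUrls(String html) {
        Set<String> unique = new LinkedHashSet<>(); // drops duplicates, keeps order
        Matcher m = DETAIL_LINK.matcher(html);
        while (m.find()) {
            unique.add("https:" + m.group());
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        // Invented sample markup; the first link appears twice on purpose.
        String html = "<a href=\"//kt.fkw.com/tupian/g0a7Z5l6.html\">a</a>"
                + "<a href=\"//kt.fkw.com/tupian/2icutZh7.html\">b</a>"
                + "<a href=\"//kt.fkw.com/tupian/g0a7Z5l6.html\">a again</a>";
        extractDetailUrls(html).forEach(System.out::println);
    }
}
```

Running main prints the two distinct links once each. A LinkedHashSet is used here only so the output order matches the page order; that is a choice of this sketch, not something the article specifies.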
Script
package com.funtester.groovy

import com.funtester.httpclient.FunLibrary
import com.funtester.utils.FileUtil
import com.funtester.utils.RWUtil
import com.funtester.utils.Regex

import java.util.stream.Collectors

class FunTester extends FunLibrary {

    static void main(String[] args) {
        String url = "https://kt.fkw.com/muban/word-7502-0-0-0-0-0-0.html"
        def get = getHttpGet(url)
        def response = getHttpResponse(get)
        // Strip all whitespace so the regex runs against one unbroken line
        def s = response.getString(RESPONSE_CONTENT).replaceAll("\\s", EMPTY)
        // Collect the detail-page links; passing through a Set drops duplicates
        def urls = (Regex.regexAll(s, "//kt\\.fkw\\.com/tupian/\\w{8}\\.html") as Set) as List
        // The links are protocol-relative, so prepend the scheme
        def collect = urls.stream().map { x -> "https:" + x }.collect(Collectors.toList())
        output(collect)
        collect.each { downPic(it) }
    }

    /**
     * Download the image embedded in a detail page
     * @param picurl URL of the detail page
     * @return the value returned by RWUtil.down
     */
    static def downPic(String picurl) {
        def get1 = getHttpGet(picurl)
        def response1 = getHttpResponse(get1)
        def pic = response1.getString(RESPONSE_CONTENT).replaceAll("\\s", EMPTY)
        // First .png link on the image host; the lazy .+? stops at the first ".png"
        def all = "https:" + Regex.findFirst(pic, "//1\\.s91i\\.faiusr\\.com/\\d/.+?\\.png")
        // tuple.first is the download URL, tuple.second the local file name
        def tuple = FileUtil.handlePicName(all)
        RWUtil.down(tuple.first, LONG_Path + "pic/" + tuple.second)
    }
}
Regular expressions prove extremely handy; spending a little time mastering their basics makes tasks like this very convenient.
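One basic worth knowing here is the reluctant quantifier .+? in the image pattern. Because the script strips all whitespace, the page becomes a single line, and a greedy .+ would run on to the last ".png" in it. The sketch below illustrates the difference; firstMatch is a JDK-only stand-in written for this example (it is not the library's Regex.findFirst), and the sample markup is invented.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only; names and sample data are assumptions.
final class FirstMatchDemo {
    /** Returns the first match of regex in text, or "" when nothing matches. */
    static String firstMatch(String text, String regex) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        // Two images on one line, as the markup looks after whitespace removal.
        String page = "<imgsrc=\"//1.s91i.faiusr.com/4/first!800x800.png\">"
                + "<imgsrc=\"//1.s91i.faiusr.com/4/second!800x800.png\">";

        // Reluctant: stops at the first ".png" after the host prefix.
        String lazy = firstMatch(page, "//1\\.s91i\\.faiusr\\.com/\\d/.+?\\.png");
        // Greedy: extends to the last ".png" on the line.
        String greedy = firstMatch(page, "//1\\.s91i\\.faiusr\\.com/\\d/.+\\.png");

        System.out.println("lazy:   " + lazy);
        System.out.println("greedy: " + greedy);
    }
}
```

The lazy pattern yields only the first image link, while the greedy one swallows the markup between the two images and ends at the second file name, which would not be a usable URL.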
Below is a helper method that returns the first match of a given regex:
Utility Method
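/**
 * Get the first match of a regex in the given text
 *
 * @param text  the text to search
 * @param regex the regular expression to apply
 * @return the first matching substring, or EMPTY if nothing matches
 */
public static String findFirst(String text, String regex) {
    Matcher matcher = matcher(text, regex);
    if (matcher.find()) return matcher.group();
    return EMPTY;
}
Console Output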
INFO-> Current user: fv, IP: 192.168.0.103, working directory: /Users/fv/Documents/workspace/funtester/, system encoding: UTF-8, system Mac OS X version: 10.16
WARN-> Response body is not in JSON format; it has been automatically converted to JSON!
INFO-> Request URI: https://kt.fkw.com/muban/word-7502-0-0-0-0-0-0.html, elapsed: 2058 ms,
INFO-> Item 1: https://kt.fkw.com/tupian/g0a7Z5l6.html
... N log lines omitted here ...
INFO-> Item 50: https://kt.fkw.com/tupian/2icutZh7.html
INFO-> Item 51: https://kt.fkw.com/tupian/2icutZh5.html
WARN-> Response body is not in JSON format; it has been automatically converted to JSON!
INFO-> Request URI: https://kt.fkw.com/tupian/g0a7Z5l6.html, elapsed: 1790 ms,
INFO-> Download link: https://1.s91i.faiusr.com/4/AFsIABAEGAAgq8-27AUohsvXxgMwhAc49AM!800x800.png, stored file name: /Users/fv/Documents/workspace/funtester/long/pic/AFsIABAEGAAgq8-27AUohsvXxgMwhAc49AM!800x800.png
Process finished with exit code 130 (interrupted by signal 2: SIGINT)
If you are interested, you can inspect the page structure yourself and try crawling with frameworks such as Selenium.
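The final download line in the console output stores the image under a file name equal to the last path segment of its URL. FileUtil.handlePicName and RWUtil.down are not shown in this article, so the following is only a guess at that behavior using the JDK; ImageSaver, fileNameOf, save, and the "pic" directory are names invented for this sketch.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical stand-in for the library helpers, which the article does not show.
final class ImageSaver {
    /** Returns the text after the last '/' of the URL, used here as the file name. */
    static String fileNameOf(String url) {
        return url.substring(url.lastIndexOf('/') + 1);
    }

    /** Streams the resource at url into dir/<file name> and returns the written path. */
    static Path save(String url, Path dir) throws IOException {
        Files.createDirectories(dir);
        Path target = dir.resolve(fileNameOf(url));
        try (InputStream in = URI.create(url).toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        String url = "https://1.s91i.faiusr.com/4/AFsIABAEGAAgq8-27AUohsvXxgMwhAc49AM!800x800.png";
        System.out.println(fileNameOf(url));
        // Performs a real network request; the target directory is arbitrary.
        System.out.println(save(url, Path.of("pic")));
    }
}
```

This sketch does not handle query strings, redirects, or timeouts, and it overwrites a file that already has the same name; a real crawler would likely want all of these addressed.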
FunTester is a Tencent Cloud Community‑selected author, a non‑famous test developer; feel free to follow.
