QueryList: A Modern PHP Content Scraping Library – Features, Installation, and Usage Guide
This article introduces QueryList, a modern PHP content-scraping tool that uses CSS selectors instead of regular expressions. It explains the two versions (V3 and V4), shows how to install the library via Composer, and demonstrates basic crawling code, collection methods such as flatten, take, reverse, filter, and map, and concurrent multi-request scraping.
QueryList is a PHP content-scraping library built on modern development practices. It offers concise syntax, extensibility, and CSS-selector-based extraction, which makes scraping code simpler and more maintainable than traditional regex-based crawlers.
It provides a complete solution: DOM selection via CSS selectors, an HTTP client based on Guzzle, content filtering, built-in charset handling, and extensible plugins.
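To illustrate why structure-aware extraction tends to be more robust than regex, here is a minimal sketch that does not use QueryList at all. It relies on PHP's built-in DOM extension, with XPath standing in for CSS selectors, and the sample markup is made up for the illustration. The regex misses the second link because its tag is formatted differently, while the DOM query finds both.

```php
<?php
// Hypothetical sample markup; the second <a> tag spans two lines
// and carries an extra attribute.
$html = <<<HTML
<div class="result"><h3><a href="/a">First title</a></h3></div>
<div class="result"><h3><a
   href="/b" class="x">Second title</a></h3></div>
HTML;

// Regex approach: depends on the exact textual layout of the tag.
preg_match_all('/<a href="([^"]+)">/', $html, $matches);
$regexLinks = $matches[1];

// DOM approach: parse the markup once, then query its structure.
libxml_use_internal_errors(true); // silence warnings about imperfect HTML
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$rows = [];
// XPath equivalent of the CSS selector "div.result".
$blocks = $xpath->query(
    '//div[contains(concat(" ", normalize-space(@class), " "), " result ")]'
);
foreach ($blocks as $block) {
    // Look up the link relative to the current block ("h3 > a" in CSS terms).
    $a = $xpath->query('.//h3/a', $block)->item(0);
    if (!$a instanceof DOMElement) {
        continue; // block without a link
    }
    $rows[] = [
        'title' => trim($a->textContent),
        'link'  => $a->getAttribute('href'),
    ];
}

print_r($regexLinks); // only the first link is matched
print_r($rows);       // both blocks are extracted
```

The per-block loop follows the same idea as a range plus field rules: one query picks the repeating container, and each field is read relative to it.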
Two versions are supported: V3 (requires PHP 5.3+, ships as a single file, no Composer needed) and V4 (requires PHP 7.1+, Composer-based, modular, with a richer API). Installation via Composer:
composer require jaeger/querylist
composer require jaeger/querylist:~V4
composer config -g repo.packagist composer https://mirrors.aliyun.com/composer/
The basic usage example loads the Composer autoloader, imports QL\QueryList, fetches a page, defines rules for the title and link, limits extraction to a range, and prints the results:
require_once('./vendor/autoload.php');
use QL\QueryList;
$data = QueryList::get('https://www.baidu.com/s?wd=QueryList', null, [
    'headers' => [
        'User-Agent' => 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36',
        'Accept-Encoding' => 'gzip, deflate, br',
    ]
])->rules([
    'title' => ['h3', 'text'],
    'link' => ['h3>a', 'href']
])->range('.result')
->queryData();
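print_r($data);
Note that queryData() returns a plain array. To get a collection object instead, call query()->getData(). That collection can be processed with methods such as flatten(), take(), reverse(), filter(), and map(), and these methods can be chained for complex transformations. The examples below assume $data is such a collection, and those that reference an image field assume a rule named image was defined.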
$rt = $data->flatten()->all();
print_r($rt);
$rt = $data->take(2)->all();
print_r($rt);
$rt = $data->reverse()->all();
print_r($rt);
$rt = $data->filter(function($item){
    return $item['image'] != '/path/to/2.jpg';
})->all();
print_r($rt);
$rt = $data->map(function($item){
    $item['image'] = 'http://xxx.com' . $item['image'];
    return $item;
})->all();
print_r($rt);
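To make the effect of these calls concrete without a live scrape, the following sketch applies roughly equivalent plain-PHP array functions to a hypothetical three-row data set. It is not the collection class itself: the rows, the `image` paths, and the variable names are assumptions made for illustration, and details such as key handling may differ from the real collection methods.

```php
<?php
// Hypothetical scraped rows; the paths are placeholders.
$data = [
    ['image' => '/path/to/1.jpg'],
    ['image' => '/path/to/2.jpg'],
    ['image' => '/path/to/3.jpg'],
];

// Shared callback: prefix a relative image path with a host.
$toAbsolute = function ($item) {
    $item['image'] = 'http://xxx.com' . $item['image'];
    return $item;
};

// Rough counterpart of take(2): keep the first two rows.
$firstTwo = array_slice($data, 0, 2);

// Rough counterpart of filter(): drop one row; original keys are kept.
$withoutSecond = array_filter($data, function ($item) {
    return $item['image'] != '/path/to/2.jpg';
});

// Rough counterpart of map(): transform every row.
$absolute = array_map($toAbsolute, $data);

// Rough counterpart of a reverse, map, take(2) chain.
$chained = array_slice(array_map($toAbsolute, array_reverse($data)), 0, 2);

print_r($firstTwo);
print_r($withoutSecond);
print_r($absolute);
print_r($chained);
```

The chained form in the collection API reads left to right, whereas nested array functions read inside out, which is one practical reason to prefer the collection methods for longer pipelines.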
$rt = $data->reverse()->map(function($item){
    $item['image'] = 'http://xxx.com' . $item['image'];
    return $item;
})->take(2)->all();
print_r($rt);
Multi-request concurrency is supported: define the URLs, rules, and range, then call multiGet($urls) and chain the concurrency, options, headers, and success/error callbacks before sending.
use GuzzleHttp\Psr7\Response;
use QL\QueryList;
$urls = [
    'https://github.com/trending/go?since=daily',
    'https://github.com/trending/html?since=daily',
    'https://github.com/trending/java?since=daily'
];
$rules = [
    'name' => ['h3>a', 'text'],
    'desc' => ['.py-1', 'text']
];
$range = '.repo-list>li';
QueryList::rules($rules)
    ->range($range)
    ->multiGet($urls)
    ->concurrency(2)
    ->withOptions(['timeout' => 60])
    ->withHeaders(['User-Agent' => 'QueryList'])
    ->success(function (QueryList $ql, Response $response, $index) {
        $data = $ql->queryData();
        print_r($data);
    })
    ->error(function (QueryList $ql, $reason, $index) {
        // handle error
    })
    ->send();
Official website: http://www.querylist.cc