
QueryList: A Modern PHP Content Scraping Library – Features, Installation, and Usage Guide

This article introduces QueryList, a modern PHP content-scraping tool that extracts data with CSS selectors instead of regular expressions. It explains the library's two versions (V3 and V4), shows how to install it with Composer, and walks through basic crawling code, result-processing methods such as flatten, take, reverse, filter, and map, and concurrent multi-URL requests.

Laravel Tech Community

Apr 2, 2023


QueryList is a PHP content-scraping library built on modern development practices. It offers concise syntax, extensibility, and CSS-selector-based extraction, which makes scraping code simpler and easier to maintain than traditional regex-based crawlers.

It provides a complete toolkit: DOM selection with CSS selectors, HTTP requests through the Guzzle (GuzzleHttp) client, content filtering, built-in character-encoding handling, and a plugin system for extensions.
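To make the selector-versus-regex point concrete, here is a small comparison in plain PHP. It uses PHP's built-in DOM extension with XPath rather than QueryList's CSS-selector API (so it runs without the library installed), and the HTML snippet is made up for illustration. The regex assumes `href` is the first attribute, so it silently misses a link whose attributes appear in a different order:

```php
<?php
// Sketch: why structure-aware extraction is sturdier than regex.
// Uses PHP's built-in DOM extension (ext-dom), not QueryList's API.

$html = <<<HTML
<ul>
  <li><a class="item" href="/a">First</a></li>
  <li><a href="/b" class="item">Second</a></li>
</ul>
HTML;

// Regex approach: assumes href comes first, so the "/a" link is missed.
preg_match_all('/<a href="([^"]+)"/', $html, $m);
$regexLinks = $m[1];

// DOM approach: attribute order and whitespace do not matter.
$doc = new DOMDocument();
libxml_use_internal_errors(true); // tolerate imperfect real-world HTML
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$domLinks = [];
// "//li/a" is the XPath counterpart of the CSS selector "li>a"
foreach ($xpath->query('//li/a') as $a) {
    $domLinks[] = $a->getAttribute('href');
}

print_r($regexLinks); // only "/b"
print_r($domLinks);   // "/a" and "/b"
```

QueryList spares you the XPath step by accepting CSS selectors such as `li>a` directly.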

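Two versions are supported: V3 (PHP 5.3+, a single file that works without Composer) and V4 (PHP 7.1+, Composer-based, modular, with a richer API). Install it with Composer. The first command installs the latest release, the second constrains the install to the V4 series, and the third (optional) points Composer's global Packagist repository at the Aliyun mirror, which can speed up downloads from mainland China: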

composer require jaeger/querylist
composer require jaeger/querylist:~V4
composer config -g repo.packagist composer https://mirrors.aliyun.com/composer/

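The basic example below loads the Composer autoloader, imports QL\QueryList, fetches a Baidu search-results page with custom request headers, defines rules that extract each result's title and link, restricts matching to each .result block with range(), and prints the results: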

<?php
require_once './vendor/autoload.php';

use QL\QueryList;

// Fetch the page; the third argument holds HTTP request options (here, headers)
$data = QueryList::get('https://www.baidu.com/s?wd=QueryList', null, [
    'headers' => [
        'User-Agent' => 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36',
        'Accept-Encoding' => 'gzip, deflate, br',
    ]
])->rules([
    // field => [CSS selector, what to read: 'text' or an attribute name]
    'title' => ['h3', 'text'],
    'link' => ['h3>a', 'href']
])->range('.result')   // evaluate the rules inside each .result element
    ->queryData();     // run the query and return the rows as a plain array
print_r($data);
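Conceptually, range() selects the repeating blocks (one per search result) and each rule is then evaluated relative to that block, yielding one row per block. The sketch below reproduces that model with PHP's DOM extension. extractRows() is a hypothetical helper, not part of QueryList, and the selectors are written as XPath because the DOM extension has no CSS engine (the class test stands in for `.result`):

```php
<?php
// Hypothetical helper illustrating the range/rules model; not QueryList code.
// $rules maps field => [relative XPath, 'text' or attribute name].
function extractRows(string $html, string $range, array $rules): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    $rows = [];
    foreach ($xpath->query($range) as $scope) {
        $row = [];
        foreach ($rules as $field => [$selector, $attr]) {
            // Context-node query: only look inside the current range element
            $node = $xpath->query($selector, $scope)->item(0);
            if ($node === null) {
                $row[$field] = null;
            } elseif ($attr === 'text') {
                $row[$field] = trim($node->textContent);
            } else {
                $row[$field] = $node->getAttribute($attr);
            }
        }
        $rows[] = $row;
    }
    return $rows;
}

$html = <<<HTML
<div class="result"><h3><a href="/one">Result one</a></h3></div>
<div class="result"><h3><a href="/two">Result two</a></h3></div>
HTML;

// XPath equivalent of the CSS class selector ".result"
$range = "//div[contains(concat(' ', normalize-space(@class), ' '), ' result ')]";

$data = extractRows($html, $range, [
    'title' => ['.//h3', 'text'],
    'link'  => ['.//h3/a', 'href'],
]);
print_r($data);
```

Scoping each rule to its range element is what keeps a title paired with its own link; querying `h3` and `h3>a` across the whole page would produce two independent lists instead.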

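The results can be post-processed with collection methods such as flatten(), take(), reverse(), filter(), and map(), and these calls can be chained for more complex transformations. In V4 these methods belong to the Collection object returned by query()->getData(); queryData() returns a plain array instead. The snippets below assume a result set whose items contain an image field holding relative paths such as /path/to/2.jpg.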

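// $data is a Collection here, e.g. obtained with ->query()->getData()

// flatten(): collapse the nested rows into a single list of values
$rt = $data->flatten()->all();
print_r($rt);

// take(2): keep only the first two items
$rt = $data->take(2)->all();
print_r($rt);

// reverse(): reverse the order of the items
$rt = $data->reverse()->all();
print_r($rt);

// filter(): drop the item whose image is /path/to/2.jpg
$rt = $data->filter(function($item){
    return $item['image'] != '/path/to/2.jpg';
})->all();
print_r($rt);

// map(): turn each relative image path into an absolute URL
$rt = $data->map(function($item){
    $item['image'] = 'http://xxx.com' . $item['image'];
    return $item;
})->all();
print_r($rt);

// Chaining: reverse, prefix the image paths, then keep the first two items
$rt = $data->reverse()->map(function($item){
    $item['image'] = 'http://xxx.com' . $item['image'];
    return $item;
})->take(2)->all();
print_r($rt);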
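Since queryData() returns a plain array, the same transformations can also be written with PHP's own array functions. This is a rough equivalent, not QueryList's implementation, run on a made-up three-item result set. The preserve_keys argument is passed to array_reverse() and array_slice() to mimic Laravel-style collections, which keep the original keys:

```php
<?php
// Plain-array counterparts of the Collection calls; sample data is hypothetical.
$data = [
    ['image' => '/path/to/1.jpg'],
    ['image' => '/path/to/2.jpg'],
    ['image' => '/path/to/3.jpg'],
];

// flatten(): merge every row's values into one list (rows are one level deep)
$flat = array_merge(...array_map('array_values', $data));

// take(2): first two items, keys preserved
$firstTwo = array_slice($data, 0, 2, true);

// reverse(): keys preserved
$reversed = array_reverse($data, true);

// filter(): array_filter() also keeps the original keys
$filtered = array_filter($data, function ($item) {
    return $item['image'] != '/path/to/2.jpg';
});

// map(): prefix each image path with a host
$prefix = function ($item) {
    $item['image'] = 'http://xxx.com' . $item['image'];
    return $item;
};
$mapped = array_map($prefix, $data);

// reverse()->map()->take(2), written inside-out
$chained = array_slice(array_map($prefix, array_reverse($data, true)), 0, 2, true);

print_r($chained);
```

The fluent Collection version reads left to right, while the array version nests calls inside-out; that readability gain is the main reason to prefer query()->getData() for multi-step processing.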

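Concurrent multi-URL requests are also supported: define the URLs, rules, and range, call multiGet($urls), configure the batch with concurrency(), withOptions(), withHeaders(), and success/error callbacks, and finally call send(). The selectors below target the markup of GitHub's trending pages at the time of writing and may need updating.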

<?php
require_once './vendor/autoload.php';

use GuzzleHttp\Psr7\Response;
use QL\QueryList;

$urls = [
    'https://github.com/trending/go?since=daily',
    'https://github.com/trending/html?since=daily',
    'https://github.com/trending/java?since=daily'
];

$rules = [
    'name' => ['h3>a', 'text'],
    'desc' => ['.py-1', 'text']
];
$range = '.repo-list>li';

QueryList::rules($rules)
    ->range($range)
    ->multiGet($urls)
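    ->concurrency(2)                         // at most 2 requests in flight
    ->withOptions(['timeout' => 60])         // additional Guzzle request options
    ->withHeaders(['User-Agent' => 'QueryList'])
    ->success(function (QueryList $ql, Response $response, $index) {
        // $index identifies which entry of $urls this response belongs to
        $data = $ql->queryData();
        print_r($data);
    })
    ->error(function (QueryList $ql, $reason, $index) {
        // handle error; $reason describes why the request failed
    })
    ->send();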
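Because concurrent requests complete in whatever order the servers respond, the success callback's $index is what ties a result back to its URL. The simulation below (no HTTP involved, with a made-up completion order) sketches the common pattern of storing each result under $index and sorting afterwards. In real code the closure body would live in ->success() and capture $results by reference with use (&$results):

```php
<?php
// Simulation only: no requests are sent. It mimics success callbacks
// firing out of order and keys each result by $index to restore order.
$urls = [
    'https://github.com/trending/go?since=daily',
    'https://github.com/trending/html?since=daily',
    'https://github.com/trending/java?since=daily',
];

$results = [];

// Stand-in for the success callback body; in QueryList it would receive
// (QueryList $ql, Response $response, $index) and call $ql->queryData().
$onSuccess = function (array $rows, int $index) use (&$results) {
    $results[$index] = $rows;
};

// Hypothetical completion order: the third URL answers first.
foreach ([2, 0, 1] as $index) {
    $onSuccess([['source' => $urls[$index]]], $index);
}

ksort($results); // back to the order of $urls
print_r(array_keys($results));
```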

Official website: http://www.querylist.cc


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
