
Using PHP and Selenium WebDriver for Browser-Based Web Scraping

This article explains how to install php-webdriver via Composer, set up a Selenium WebDriver instance in PHP, and write a script that automates a Chrome browser to scrape search results from Baidu, demonstrating key WebDriver APIs for element interaction and data extraction.

php中文网 Courses

As the volume of online data grows, collecting information by hand has become impractical, and automated scripts have taken over much of the work. This guide shows how to use PHP together with WebDriver to drive a real browser for web scraping.

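WebDriver is a W3C-standardized interface for controlling browsers programmatically, used for both automated testing and data collection. Selenium is the best-known open-source implementation, and php-webdriver is the PHP client binding that lets PHP scripts drive a browser through a Selenium Server. The library was originally developed at Facebook, which is why its classes live under the Facebook\WebDriver namespace.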

First, install php-webdriver using Composer:

<code>composer require php-webdriver/webdriver</code>

Then include the library in your PHP code:

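<code>require_once 'vendor/autoload.php'; // Composer autoloader
use Facebook\WebDriver\Remote;      // namespace alias: Remote\RemoteWebDriver, Remote\DesiredCapabilities
use Facebook\WebDriver\WebDriverBy; // element locators (id, CSS selector, ...)</code>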

The example below drives Baidu (https://www.baidu.com): it locates the search box and button, submits a keyword, and reads the titles of the results.

Create a WebDriver session by connecting to a running Selenium Server and requesting a Chrome browser:

<code>// Endpoint for Selenium Server 2.x/3.x; Selenium Server 4.x typically listens at http://localhost:4444
$host = 'http://localhost:4444/wd/hub';
$driver = Remote\RemoteWebDriver::create($host, Remote\DesiredCapabilities::chrome());</code>
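
If the script runs on a machine without a display, Chrome can be started headless by attaching options to the capabilities before the session is created. The following is a minimal sketch, assuming vendor/autoload.php has already been loaded as shown earlier. The helper names chromeArguments() and createChromeDriver(), and the particular flags, are illustrative choices and not part of the original script.

```php
<?php
use Facebook\WebDriver\Chrome\ChromeOptions;
use Facebook\WebDriver\Remote\DesiredCapabilities;
use Facebook\WebDriver\Remote\RemoteWebDriver;

// Hypothetical helper: build the Chrome command-line flags for a session.
function chromeArguments(bool $headless): array
{
    $arguments = ['--window-size=1280,800']; // assumed viewport size
    if ($headless) {
        $arguments[] = '--headless'; // run Chrome without a visible window
    }
    return $arguments;
}

// Hypothetical helper: open a Chrome session on the given Selenium Server.
function createChromeDriver(string $host, bool $headless = true): RemoteWebDriver
{
    $options = new ChromeOptions();
    $options->addArguments(chromeArguments($headless));

    // Attach the Chrome-specific options to the standard Chrome capabilities.
    $capabilities = DesiredCapabilities::chrome();
    $capabilities->setCapability(ChromeOptions::CAPABILITY, $options);

    return RemoteWebDriver::create($host, $capabilities);
}
```

With these helpers, the session above could be created as $driver = createChromeDriver('http://localhost:4444/wd/hub');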

Open the target website:

<code>$driver->get("https://www.baidu.com");</code>

Locate the search box and button by their element IDs, input a keyword, and click the button:

<code>$search_box = $driver->findElement(WebDriverBy::id('kw'));
$search_box->sendKeys('WebDriver');
$search_button = $driver->findElement(WebDriverBy::id('su'));
$search_button->click();</code>
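
Clicking the button starts a page update that may not have finished by the time the next statement runs, so querying for results immediately can return an empty list. One way to handle this is an explicit wait. The sketch below assumes the autoloader is loaded and that $driver is the session created above. The function name waitForResultLinks(), the 10-second default timeout, and the reuse of this article's result selector are illustrative assumptions.

```php
<?php
use Facebook\WebDriver\Remote\RemoteWebDriver;
use Facebook\WebDriver\WebDriverBy;
use Facebook\WebDriver\WebDriverExpectedCondition;

// Hypothetical helper: block until at least one element matching $cssSelector
// is present in the DOM, then return all current matches.
// until() throws a timeout exception if nothing appears within $timeoutSeconds.
function waitForResultLinks(
    RemoteWebDriver $driver,
    string $cssSelector = '#content_left .result .t a',
    int $timeoutSeconds = 10
): array {
    $locator = WebDriverBy::cssSelector($cssSelector);

    // Poll the page every 250 ms until the first match shows up.
    $driver->wait($timeoutSeconds, 250)->until(
        WebDriverExpectedCondition::presenceOfElementLocated($locator)
    );

    return $driver->findElements($locator);
}
```

Calling $results = waitForResultLinks($driver); right after the click would then return the title links once they exist, instead of whatever happens to be on the page at that instant.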

Collect the titles of the search results using a CSS selector and output them:

<code>$results = $driver->findElements(WebDriverBy::cssSelector('#content_left .result .t a'));
foreach ($results as $result) {
    echo $result->getText() . PHP_EOL; // print title text
}</code>
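
Beyond printing, the matched elements can be turned into structured data. The sketch below, assuming the $results array from the step above, pairs each title with its link through getAttribute('href') and drops blank or repeated entries. The functions extractRows() and normalizeResults() and the title/url keys are hypothetical names introduced for illustration.

```php
<?php
// Hypothetical helper: read the visible text and href of each WebDriver element.
function extractRows(array $elements): array
{
    $rows = [];
    foreach ($elements as $element) {
        $rows[] = [
            'title' => $element->getText(),
            'url'   => $element->getAttribute('href'),
        ];
    }
    return $rows;
}

// Hypothetical helper: trim values, skip rows without a title, and keep only
// the first row seen for each non-empty URL.
function normalizeResults(array $rows): array
{
    $seenUrls = [];
    $clean = [];
    foreach ($rows as $row) {
        $title = trim((string) ($row['title'] ?? ''));
        $url = trim((string) ($row['url'] ?? ''));
        if ($title === '' || ($url !== '' && isset($seenUrls[$url]))) {
            continue;
        }
        if ($url !== '') {
            $seenUrls[$url] = true;
        }
        $clean[] = ['title' => $title, 'url' => $url];
    }
    return $clean;
}

// Usage with the $results array from the step above:
// foreach (normalizeResults(extractRows($results)) as $row) {
//     echo $row['title'] . "\t" . $row['url'] . PHP_EOL;
// }
```

Keeping the cleanup in a plain function that takes arrays means it can be checked without a browser session.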

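The complete script follows. It also waits for the results to load before reading them and always closes the browser session:

<code><?php
require_once 'vendor/autoload.php';

use Facebook\WebDriver\Remote;
use Facebook\WebDriver\WebDriverBy;
use Facebook\WebDriver\WebDriverExpectedCondition;

$host = 'http://localhost:4444/wd/hub'; // Selenium Server address
$driver = Remote\RemoteWebDriver::create($host, Remote\DesiredCapabilities::chrome());

try {
    $driver->get('https://www.baidu.com');

    // Type the keyword into the search box and submit the query.
    $search_box = $driver->findElement(WebDriverBy::id('kw'));
    $search_box->sendKeys('WebDriver');
    $search_button = $driver->findElement(WebDriverBy::id('su'));
    $search_button->click();

    // Results load asynchronously, so wait up to 10 seconds for the first title link.
    $result_links = WebDriverBy::cssSelector('#content_left .result .t a');
    $driver->wait(10)->until(WebDriverExpectedCondition::presenceOfElementLocated($result_links));

    // Print the text of every result title.
    foreach ($driver->findElements($result_links) as $result) {
        echo $result->getText() . PHP_EOL;
    }
} finally {
    $driver->quit(); // always close the browser session, even if a step fails
}</code>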

In summary, php-webdriver gives PHP scripts full browser automation for tasks such as data collection and testing. Robust, efficient scripts depend on knowing the WebDriver API as well as how the target page loads and how its markup is structured.

Written by

php中文网 Courses

php中文网's platform for the latest courses and technical articles, helping PHP learners advance quickly.
