Using WebDriver on CentOS: Install Chrome & ChromeDriver, PHP WebDriver Example, and XPath Basics
This tutorial explains how to install Google Chrome and ChromeDriver on CentOS, use PHP‑WebDriver to scrape dynamic web pages, and introduces XPath syntax for locating elements, providing step‑by‑step commands, code samples, and execution results.
On sites built with a separate front end and back end, the initial page source is often nearly empty and the underlying APIs may be encrypted, which makes plain HTTP scraping difficult. WebDriver drives a real browser, so it can render the full page, including JavaScript‑generated content, and make that content available for extraction.
Part 1: Install Google Chrome and ChromeDriver on CentOS
Download the Chrome RPM package:
<code>wget https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm</code>
Install Chrome:
<code>sudo yum localinstall google-chrome-stable_current_x86_64.rpm</code>
Verify the installation and note the version number it reports:
<code>google-chrome --version</code>
Download the ChromeDriver release that matches the installed Chrome version (114.0.5735.90 is used here as an example) and extract it to /usr/local/bin. Writing to that directory normally requires root privileges:
<code>wget https://chromedriver.storage.googleapis.com/114.0.5735.90/chromedriver_linux64.zip
sudo unzip chromedriver_linux64.zip -d /usr/local/bin/</code>
Start the ChromeDriver service:
<code>LANGUAGE=ZH-CN.UTF-8 /usr/local/bin/chromedriver --port=9515</code>
When the service starts successfully, you will see output similar to:
<code>Starting ChromeDriver {version} on port 9515...
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.</code>
Part 2: PHP Implementation
Install the PHP WebDriver library via Composer:
<code>composer require php-webdriver/webdriver</code>
Example PHP script that connects to the running ChromeDriver service, opens a Bilibili page in headless Chrome, retrieves the page source, and prints it:
<code><?php
require_once('vendor/autoload.php');

use Facebook\WebDriver\Chrome\ChromeOptions;
use Facebook\WebDriver\Remote\DesiredCapabilities;
use Facebook\WebDriver\Remote\RemoteWebDriver;

// Run Chrome without a window; --no-sandbox is typically required when running as root.
$options = new ChromeOptions();
$options->addArguments(['--no-sandbox', '--headless']);
$capabilities = DesiredCapabilities::chrome();
$capabilities->setCapability(ChromeOptions::CAPABILITY, $options);

// The port must match the --port value ChromeDriver was started with in Part 1.
$host = 'http://localhost:9515';
$driver = RemoteWebDriver::create($host, $capabilities);

$url = 'https://www.bilibili.com/movie/index/?from_spmid=666.7.index.1#st=2&style_id=10104&area=-1&release_date=-1&season_status=-1&order=2&sort=0&page=1';
try {
    $driver->get($url);
    // getPageSource() returns the current DOM, including content rendered by JavaScript so far.
    echo $driver->getPageSource();
} finally {
    // Always end the session so the headless Chrome process exits.
    $driver->quit();
}
?>
</code>
The script prints the rendered HTML source of the target page.
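On pages that load their data asynchronously, the page source may be read before the content of interest has been inserted. One way to handle this is an explicit wait, sketched below using php-webdriver's `wait()` and `WebDriverExpectedCondition`. The function name `getRenderedSource`, the CSS selector argument, and the 10-second timeout are illustrative assumptions, not part of the original script.

```php
<?php
use Facebook\WebDriver\Remote\RemoteWebDriver;
use Facebook\WebDriver\WebDriverBy;
use Facebook\WebDriver\WebDriverExpectedCondition;

/**
 * Hypothetical helper: open $url, wait until an element matching
 * $cssSelector exists in the DOM, then return the page source.
 */
function getRenderedSource(
    RemoteWebDriver $driver,
    string $url,
    string $cssSelector,
    int $timeoutSeconds = 10
): string {
    $driver->get($url);

    // Poll every 500 ms until the element is present.
    // An exception is thrown if the timeout elapses first.
    $driver->wait($timeoutSeconds, 500)->until(
        WebDriverExpectedCondition::presenceOfElementLocated(
            WebDriverBy::cssSelector($cssSelector)
        )
    );

    return $driver->getPageSource();
}
```

With the `$driver` and `$url` from the script above, a call might look like `echo getRenderedSource($driver, $url, '.some-item');`, where `.some-item` stands in for whatever selector identifies the rendered content on the real page.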
Part 3: Introduction to XPath
XPath is a language for locating nodes in XML/HTML documents, and it is one of the most flexible locator strategies available in PHP WebDriver. Common XPath patterns include:
//tagname – select all elements with the given tag.
//tagname[@attribute='value'] – select elements with a specific attribute value.
//tagname[text()='text'] – select elements whose text is exactly the given string (use contains(text(), 'text') for a partial match).
/parent/child – select a direct child of a parent node.
/parent//descendant – select all descendants under a parent.
/preceding-sibling::sibling and /following-sibling::sibling – select sibling nodes before or after the current node.
Logical operators: and , or , not() for combining conditions.
Wildcards: * matches any element, @* matches any attribute.
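The patterns above can be tried without a browser. The following sketch uses PHP's built-in `DOMDocument` and `DOMXPath` classes (not WebDriver) to count how many nodes each expression selects. The HTML fragment and the helper name `xpathCount` are made up for illustration.

```php
<?php
// Made-up HTML fragment, used only to exercise the expressions.
$html = <<<'HTML'
<html><body>
<form id="login">
  <label>User</label>
  <input name="username" type="text">
  <input name="password" type="password">
  <button>Sign in</button>
</form>
<ul><li>One</li><li>Two</li><li>Three</li></ul>
</body></html>
HTML;

/** Count the nodes an XPath expression selects in an HTML string. */
function xpathCount(string $html, string $expression): int
{
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $nodes = (new DOMXPath($doc))->query($expression);
    return $nodes === false ? 0 : $nodes->length;
}

$expressions = [
    "//input",                                   // all elements with a tag
    "//input[@name='username']",                 // attribute value
    "//li[text()='Two']",                        // exact text
    "/html/body/form",                           // direct child path
    "/html/body//li",                            // any descendant
    "//li[text()='Two']/preceding-sibling::li",  // siblings before
    "//li[text()='Two']/following-sibling::li",  // siblings after
    "//input[@type='text' or @type='password']", // logical or
    "//input[not(@type='password')]",            // not()
    "//form/*",                                  // any child element
    "//input/@*",                                // any attribute
];

foreach ($expressions as $expression) {
    printf("%-45s %d\n", $expression, xpathCount($html, $expression));
}
```

Checking an expression this way against a saved copy of the page source is a quick way to debug a selector before running it through ChromeDriver.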
Example PHP code using XPath to locate an input element and send keys:
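<code><?php
// Reuses the autoloader, $host, and $capabilities set up in the Part 2 script.
use Facebook\WebDriver\Remote\RemoteWebDriver;
use Facebook\WebDriver\WebDriverBy;

$driver = RemoteWebDriver::create($host, $capabilities);
$driver->get('https://www.example.com/');
// findElement() throws NoSuchElementException if nothing on the page matches.
$element = $driver->findElement(WebDriverBy::xpath("//input[@name='username']"));
$element->sendKeys('admin');
$driver->quit();
?>
</code>
This demonstrates how $driver->findElement() together with WebDriverBy::xpath() can locate elements such as <input name='username'> and interact with them.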
XPath provides a powerful and flexible way to craft expressions tailored to the structure and attributes of the target page.
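For data extraction, an expression usually needs to return many elements, not just one. A minimal sketch of that case, assuming a connected `$driver` as in Part 2, uses `findElements()` and `getText()`. The helper name `collectTexts` and the XPath in the usage note are hypothetical.

```php
<?php
use Facebook\WebDriver\Remote\RemoteWebDriver;
use Facebook\WebDriver\WebDriverBy;

/**
 * Hypothetical helper: return the visible text of every element
 * matched by $xpath on the page currently loaded in $driver.
 */
function collectTexts(RemoteWebDriver $driver, string $xpath): array
{
    $texts = [];
    // findElements() returns an empty array when nothing matches,
    // whereas findElement() would throw.
    foreach ($driver->findElements(WebDriverBy::xpath($xpath)) as $element) {
        $texts[] = trim($element->getText());
    }
    return $texts;
}
```

A call such as `print_r(collectTexts($driver, "//ul[@class='titles']/li/a"));` would print the link texts of a list, where the `titles` class is a placeholder for the structure of the real page.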