How to Efficiently Fetch Hundreds of URL Titles in PHP Using Multi‑Process Design
This article explores two PHP multi‑process architectures for rapidly retrieving page titles from hundreds of URLs, comparing a simple socket‑based server with a threaded URL‑batch processor, and discusses performance trade‑offs, memory usage, and scalability considerations.
Multi‑process programming is uncommon in PHP, but some tasks call for it, such as extracting the titles of hundreds of user‑submitted URLs. Users expect a response within about 10 seconds, yet fetching a single title takes anywhere from 0.1 s to several seconds. Fetched one after another, 100 URLs would need at least 10 s even if every site answered in 0.1 s, so a sequential, single‑threaded approach cannot meet that deadline.
First design: Deploy a lightweight server (see reference) that receives a URL, reads the response in 128‑byte chunks, and stops as soon as the <title> tag is found, which saves bandwidth. The client opens up to 100 sockets to this server at a time; if there are more URLs, it repeats the process for the next batch. For a fast site such as google.com, this method fetches the title 100 times in roughly 1 s, but occasional blocking (≈1 s) occurs when too many connections are opened simultaneously.
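The chunked, stop‑early read described above can be sketched as follows. This is a minimal illustration rather than the original server's code: the function name readTitle, the 64 KB safety cap, and the in‑memory stream used for the demonstration are assumptions made for this example. With a real site, the stream would be a socket opened with something like stream_socket_client().

```php
<?php
declare(strict_types=1);

/**
 * Read from an open stream in small chunks and return the page title,
 * stopping as soon as a complete <title>...</title> element has arrived.
 * Returns null if no title shows up within $maxBytes (hypothetical cap).
 */
function readTitle($stream, int $chunkSize = 128, int $maxBytes = 65536): ?string
{
    $buffer = '';
    while (!feof($stream) && strlen($buffer) < $maxBytes) {
        $chunk = fread($stream, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break; // read error or nothing more to read
        }
        $buffer .= $chunk;
        // Match against the whole buffer so a tag split across two chunks is still found.
        if (preg_match('~<title[^>]*>(.*?)</title>~is', $buffer, $m) === 1) {
            return trim(html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5, 'UTF-8'));
        }
    }
    return null;
}

// Demonstration on an in-memory stream standing in for a socket.
$demo = fopen('php://memory', 'r+');
fwrite($demo, '<html><head><title>Example Domain</title></head><body>'
    . str_repeat('x', 10000) . '</body></html>');
rewind($demo);
echo readTitle($demo), PHP_EOL; // prints "Example Domain"
fclose($demo);
```

Because the loop returns as soon as the closing tag is seen, the roughly 10 KB body in the demonstration is never read; only the first chunk is consumed.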
The approach suffers from high TCP connection overhead, substantial memory consumption for client and server buffers, and potential scalability problems when many users access the service.
In PHP, multi‑process programming is typically needed for extensive network‑bound operations or CPU‑intensive tasks that can be divided across multiple cores.
Second design: Reuse the same server, but modify it to accept up to 100 URLs at once, spawn one sub‑thread per URL (up to 100) to download the titles concurrently, then merge and return the results. This version is both faster and more stable: it fetches 100 Google titles in about 0.7 s and rarely exceeds 1 s, whereas the first design occasionally showed delays of more than 5 s (with a probability of about 20%).
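The batch idea (accept up to 100 URLs, download them concurrently, merge the results) can be approximated in plain PHP without threads. The sketch below swaps the article's sub‑threads for the curl_multi API, which multiplexes many transfers inside one process, so it illustrates the same batching pattern rather than the original implementation. The names extractTitle and fetchTitles, the 64 KB cap, and the timeout values are assumptions chosen for this example; it requires the curl extension.

```php
<?php
declare(strict_types=1);

/** Extract the <title> text from a (possibly partial) HTML string. */
function extractTitle(string $html): ?string
{
    if (preg_match('~<title[^>]*>(.*?)</title>~is', $html, $m) !== 1) {
        return null;
    }
    return trim(html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5, 'UTF-8'));
}

/**
 * Fetch titles for many URLs, at most $batchSize concurrently.
 * Returns an array of url => title (null when no title was found).
 */
function fetchTitles(array $urls, int $batchSize = 100, int $timeoutSec = 10): array
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('batchSize must be at least 1');
    }
    $titles = [];
    $urls = array_values(array_unique($urls)); // one transfer per distinct URL
    foreach (array_chunk($urls, $batchSize) as $batch) {
        $mh = curl_multi_init();
        $handles = [];
        $buffers = [];
        foreach ($batch as $i => $url) {
            $buffers[$i] = '';
            $ch = curl_init($url);
            curl_setopt_array($ch, [
                CURLOPT_FOLLOWLOCATION => true,
                CURLOPT_CONNECTTIMEOUT => $timeoutSec,
                CURLOPT_TIMEOUT => $timeoutSec,
                // Collect the body ourselves so the transfer can be cut short.
                CURLOPT_WRITEFUNCTION => function ($ch, string $data) use (&$buffers, $i): int {
                    $buffers[$i] .= $data;
                    $done = stripos($buffers[$i], '</title>') !== false
                        || strlen($buffers[$i]) > 65536;
                    // Returning a length other than strlen($data) aborts this transfer.
                    return $done ? 0 : strlen($data);
                },
            ]);
            curl_multi_add_handle($mh, $ch);
            $handles[$i] = $ch;
        }
        // Drive all transfers in the batch until none is still running.
        do {
            $status = curl_multi_exec($mh, $running);
            if ($running > 0) {
                curl_multi_select($mh, 1.0);
            }
        } while ($running > 0 && $status === CURLM_OK);

        // Merge the results; handles are released when they go out of scope.
        foreach ($handles as $i => $ch) {
            $titles[$batch[$i]] = extractTitle($buffers[$i]);
            curl_multi_remove_handle($mh, $ch);
        }
    }
    return $titles;
}
```

A call such as fetchTitles($submittedUrls), where $submittedUrls is a hypothetical array of user‑submitted URLs, returns one entry per distinct URL. Each batch takes about as long as its slowest site (bounded by the timeout) rather than the sum of all fetch times, which is the effect the second design aims for.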
However, this solution remains simple and may not scale for heavy traffic. Enterprise‑level PHP applications must carefully manage memory; large arrays can quickly exhaust resources, and handling tens of thousands of records often requires C extensions to avoid consuming hundreds of megabytes per request.
21CTO
