Tagged articles
13 articles
AI Tech Publishing
Mar 30, 2026 · Industry Insights

How to Optimize Your Content for GEO and Get Cited by DeepSeek, Doubao, and ChatGPT

This guide explains what Generative Engine Optimization (GEO) is and why AI-driven search traffic converts far better than traditional SEO traffic, then provides concrete writing, platform-specific, and technical steps (including robots.txt, llms.txt, and Schema markup) to make your content reliably cited by Chinese AI search engines and global models.

AI SEO · Chinese AI · Content Optimization
0 likes · 22 min read
Java Tech Enthusiast
Feb 26, 2026 · Fundamentals

Why the 30‑Year‑Old robots.txt Is Crumbling in the AI Era

From a 1993 accidental DoS attack that sparked the creation of robots.txt to modern AI crawlers ignoring the protocol, this article traces the history, purpose, and challenges of the robots exclusion standard and explores new proposals to adapt it for AI-driven web scraping.

AI ethics · Web Crawling · protocol
0 likes · 9 min read
IT Services Circle
Jan 31, 2026 · Information Security

Why the Humble robots.txt Is Facing an Existential Crisis in the AI Era

The article recounts a personal experiment that unintentionally launched a DoS attack, explains how that incident spurred the creation of the robots.txt protocol, and examines how AI‑driven data scraping, legal battles, and new licensing proposals are challenging its relevance today.

AI data scraping · Web Crawling · internet standards
0 likes · 10 min read
Architecture and Beyond
Jul 1, 2023 · Industry Insights

Web Crawlers Unveiled: History, Value, and How to Tackle Their Challenges

This article traces the development of web crawlers from their 1990s origins to modern implementations, examines their multifaceted value in search, data analysis, and archiving, outlines technical, ethical, and legal challenges for both crawler creators and target sites, and presents practical strategies to mitigate malicious crawling.

Data Extraction · Web Crawling · anti-scraping
0 likes · 24 min read
Sohu Tech Products
Oct 6, 2021 · Frontend Development

Front‑End SEO Technical Optimization Guide

This article presents a comprehensive front‑end SEO checklist, covering passive and active optimization techniques such as site structure, meta tags, semantic links, speed improvements, external traffic acquisition, sitemaps, robots.txt, and search‑engine‑specific configurations to help developers enhance website visibility and ranking.

Meta Tags · SEO · Web Optimization
0 likes · 13 min read
Python Programming Learning Circle
Jan 2, 2020 · Backend Development

How to Crawl Responsibly: Avoid Legal Risks and Server Overload

This guide outlines responsible web-crawling practices, covering robots.txt compliance, legal pitfalls such as collecting personal data without authorization or copying copyrighted content, recommended request intervals, and relevant Chinese data-security regulations, helping developers avoid overloading servers and facing lawsuits.

Data Ethics · Scrapy · Web Crawling
0 likes · 4 min read
MaGe Linux Operations
Dec 25, 2019 · Backend Development

Master Web Crawling in Python: From urllib to requests and Robots.txt

This guide explains the fundamentals of web crawling, covering crawler types, the Robots.txt protocol, Python's urllib and urllib3 modules, the requests library, handling HTTP methods, user‑agents, HTTPS certificates, and practical code examples for extracting data from websites.

Python · requests · robots.txt
0 likes · 18 min read
21CTO
May 22, 2019 · Fundamentals

What Is a Web Crawler? Definitions, Types, and How It Works

This article explains web crawlers—what they are, their classifications, typical use cases, and step‑by‑step workflow—covers the robots protocol, then delves into HTTP and HTTPS fundamentals, request/response structures, common methods, headers, status codes, and the security trade‑offs of HTTPS.

HTTP · Status Codes · Web Crawler
0 likes · 10 min read
MaGe Linux Operations
Dec 5, 2017 · Information Security

How to Defend Your Website Against Web Crawlers: Techniques & Tools

This article explores why web content needs protection, explains common server‑side and client‑side anti‑crawling methods—including User‑Agent checks, token cookies, headless‑browser detection, fingerprinting, captchas, and robots.txt—and offers practical guidance for raising the cost of unauthorized scraping.

Browser Fingerprinting · Captcha · Headless Browser
0 likes · 12 min read
21CTO
Nov 13, 2016 · Backend Development

How to Build a Simple PHP Web Crawler: From Robots.txt to cURL

This guide explains the fundamentals of creating a PHP web crawler, covering server communication basics, interpreting robots.txt and sitemap files, and providing practical code examples using file_get_contents and cURL for efficient content retrieval.

PHP · Web Crawler · backend-development
0 likes · 6 min read