Understanding HTTP: Origins, TCP/IP Foundations, Connection Process, Requests, Responses, and Connection Teardown
This comprehensive article explains the origins of HTTP, its relationship with the TCP/IP protocol suite, the three‑way handshake for establishing connections, the structure of HTTP request and response messages, status codes, long and short connections, and the four‑step termination process, providing essential knowledge for web development and networking interviews.
1. Introduction
Writing a web crawler (also called a web spider) requires a basic understanding of networking. A network consists of nodes and links; interconnected networks form the Internet. HTTP (HyperText Transfer Protocol) is the most widely used application-layer protocol on the Internet. Its specifications are published by the Internet Engineering Task Force (IETF); the early standardization work was coordinated with the World Wide Web Consortium (W3C).
1. Origin
In 1991 Tim Berners‑Lee launched the first website at CERN, introducing the concepts of HTTP, HTML, URL, web browsers, and web servers, which laid the foundation for the modern web. Specifically, he:
Proposed HTTP, allowing users to access resources via hyperlinks.
Proposed HTML as the standard markup language for web pages.
Created the Uniform Resource Locator (URL) as the address system.
Created the first web browser, which also acted as a web editor.
Created the first web server (http://info.cern.ch) and the first web page describing the project.
2. Characteristics
HTTP has five main characteristics:
Supports a client/server model.
Simple and fast: a client only needs to send the request method and path.
Flexible: any type of data object can be transferred, identified by the Content‑Type header.
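Connectionless: in the original design, each request/response pair uses its own connection, which is closed once the response has been delivered. HTTP/1.1 relaxes this with persistent connections.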
Stateless: the protocol does not retain session information; mechanisms such as cookies and sessions are built on top of it.
2. TCP/IP Protocol
HTTP relies on the TCP/IP protocol suite. The transport layer uses TCP, while the network layer uses IP (among many other protocols).
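For example, the ping utility uses ICMP, which is carried directly in IP packets and does not use TCP. This explains why a VPS may reach websites over TCP yet fail to ping Google: ICMP traffic can be blocked or filtered along the path while TCP traffic still passes.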
The encapsulation process works layer by layer: data is wrapped with headers at each layer on the sending side and unwrapped in reverse order on the receiving side.
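The wrap-then-unwrap order described above can be illustrated with a toy model. This is only a sketch: the bracketed text headers and the function names `encapsulate` and `decapsulate` are made up for illustration and do not resemble real TCP, IP, or Ethernet header formats.

```python
# Toy model of layered encapsulation. The bracketed text headers are
# placeholders for illustration, not real protocol header formats.
LAYERS = ["TCP", "IP", "ETH"]  # order in which headers are added when sending


def encapsulate(payload: bytes) -> bytes:
    """Prepend one header per layer, starting with the transport layer."""
    frame = payload
    for layer in LAYERS:
        frame = f"[{layer}]".encode() + frame
    return frame


def decapsulate(frame: bytes) -> bytes:
    """Strip headers in reverse order: the outermost header comes off first."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]".encode()
        if not frame.startswith(header):
            raise ValueError(f"expected {layer} header")
        frame = frame[len(header):]
    return frame


request = b"GET / HTTP/1.1\r\n\r\n"
frame = encapsulate(request)
print(frame)                          # b'[ETH][IP][TCP]GET / HTTP/1.1\r\n\r\n'
print(decapsulate(frame) == request)  # True
```

The last header added on the sending side is the first one removed on the receiving side, which is why the two loops iterate in opposite directions.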
3. Establishing a TCP Connection
Understanding the TCP packet header is essential because HTTP communication occurs over a TCP connection.
1. TCP Header Information
A TCP segment consists of a header and a data payload. The header contains six control flags that represent the state of the connection: URG, ACK, PSH, RST, SYN, and FIN.
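To make the flags concrete, here is a minimal sketch that reads the six flags from a raw TCP header. It assumes the standard header layout, in which the flag bits sit in byte 13; the helper name `decode_flags` and the hand-built sample header are hypothetical.

```python
import struct

# Bit masks for the six classic flags in byte 13 of the TCP header.
TCP_FLAGS = {
    "URG": 0x20,
    "ACK": 0x10,
    "PSH": 0x08,
    "RST": 0x04,
    "SYN": 0x02,
    "FIN": 0x01,
}


def decode_flags(tcp_header: bytes) -> list[str]:
    """Return the names of the flags set in a raw TCP header."""
    if len(tcp_header) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    flag_byte = tcp_header[13]
    return [name for name, mask in TCP_FLAGS.items() if flag_byte & mask]


# Hypothetical header built by hand: source port, destination port,
# sequence number, acknowledgment number, data offset, flags, window,
# checksum, urgent pointer. Flags 0x12 means SYN and ACK are both set.
sample = struct.pack("!HHIIBBHHH", 50000, 80, 1000, 0, 5 << 4, 0x12, 65535, 0, 0)
print(decode_flags(sample))  # ['ACK', 'SYN']
```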
2. Connection Establishment Process
The three‑way handshake proceeds as follows:
Client sends a packet with SYN=1 and a random sequence number.
Server replies with SYN=1, ACK=1, acknowledges the client’s sequence number + 1, and provides its own random sequence number.
Client acknowledges the server’s sequence number + 1 with ACK=1, completing the handshake.
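Interview question: Why does establishing a TCP connection (on which HTTP runs) require a three-way handshake? Answer: Three segments are the minimum needed for each side to confirm that the other has received its initial sequence number. With only two, the server could not tell whether the client received its reply, and a delayed duplicate SYN could open a connection the client never intended. A fourth segment would add nothing, because the server's SYN and ACK already travel together in one segment.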
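The sequence and acknowledgment arithmetic of the handshake can be traced with a small simulation. This is a sketch, not a network implementation: the `Segment` class and `three_way_handshake` function are hypothetical names, and `random.randrange` merely stands in for however a real TCP stack picks its initial sequence numbers.

```python
import random
from dataclasses import dataclass

SEQ_SPACE = 2**32  # TCP sequence numbers are 32 bits and wrap around


@dataclass
class Segment:
    sender: str
    syn: int
    ack_flag: int
    seq: int
    ack: int  # meaningful only when ack_flag is 1


def three_way_handshake(client_isn: int, server_isn: int) -> list[Segment]:
    """Return the three segments exchanged to open a connection."""
    syn = Segment("client", syn=1, ack_flag=0, seq=client_isn, ack=0)
    syn_ack = Segment("server", syn=1, ack_flag=1, seq=server_isn,
                      ack=(syn.seq + 1) % SEQ_SPACE)
    ack = Segment("client", syn=0, ack_flag=1, seq=(syn.seq + 1) % SEQ_SPACE,
                  ack=(syn_ack.seq + 1) % SEQ_SPACE)
    return [syn, syn_ack, ack]


segments = three_way_handshake(random.randrange(SEQ_SPACE),
                               random.randrange(SEQ_SPACE))
for segment in segments:
    print(segment)
```

Each acknowledgment number is the peer's sequence number plus one, which is how each side proves it saw the other's SYN.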
4. Client Request
After the TCP connection is established, the client can send an HTTP request.
1. HTTP Request Message Structure
The request consists of a start line, headers, a blank line, and an optional body.
2. Example of an HTTP Request
The request line includes the method (e.g., GET, POST, PUT, DELETE, PATCH, HEAD, OPTIONS, TRACE), the URL, and the HTTP version. Headers carry additional information: User-Agent identifies the client software, and Referer gives the URL of the page from which the request originated.
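As an illustration of the structure just described (request line, headers, blank line, optional body), the sketch below assembles a request as raw bytes and parses it back. The helper names `build_request` and `parse_request`, the host `example.com`, and the `demo-client/0.1` agent string are all assumptions made up for this example.

```python
def build_request(method: str, path: str, headers: dict[str, str],
                  body: bytes = b"") -> bytes:
    """Assemble request line, headers, blank line, and optional body."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"  # the blank line ends the headers
    return head.encode("ascii") + body


def parse_request(raw: bytes):
    """Split a raw request into method, target, version, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    request_line, *header_lines = head.decode("ascii").split("\r\n")
    method, target, version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")  # split on the first colon only
        headers[name.strip()] = value.strip()
    return method, target, version, headers, body


raw = build_request(
    "GET",
    "/index.html",
    {
        "Host": "example.com",
        "User-Agent": "demo-client/0.1",
        "Referer": "http://example.com/start",
    },
)
print(raw.decode("ascii"))
print(parse_request(raw))
```

Lines are separated by CRLF (`\r\n`), and the empty line after the last header is what tells the receiver where the headers end.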
5. Server Response
The server replies with an HTTP response that mirrors the request structure.
1. HTTP Response Message Structure
The response includes a status line, headers, a blank line, and an optional body.
2. Example of an HTTP Response
Key elements are the status code (e.g., 200, 404, 500) and the reason phrase. Status codes are grouped by their first digit into informational (1xx), success (2xx), redirection (3xx), client error (4xx), and server error (5xx) categories.
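The status line and the five categories can be sketched in a few lines. The function names `parse_status_line` and `status_class` and the sample response are hypothetical, chosen only to illustrate the grouping by first digit.

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_status_line(raw: bytes):
    """Return (version, status code, reason phrase) from a raw response."""
    status_line = raw.split(b"\r\n", 1)[0].decode("ascii")
    version, code, *reason = status_line.split(" ", 2)
    return version, int(code), reason[0] if reason else ""


def status_class(code: int) -> str:
    """Map a status code to its category using the first digit."""
    return STATUS_CLASSES.get(code // 100, "unknown")


raw_response = (
    b"HTTP/1.1 404 Not Found\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 9\r\n"
    b"\r\n"
    b"Not Found"
)
version, code, reason = parse_status_line(raw_response)
print(version, code, reason, "->", status_class(code))
# HTTP/1.1 404 Not Found -> client error
```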
6. Closing the Connection
After the response, the connection may be closed depending on the HTTP version.
1. Persistent vs. Non‑Persistent Connections
In HTTP/1.0, the connection is closed after each request/response by default (short connection); clients could opt in to reuse by sending the Connection: keep-alive header. HTTP/1.1 made persistent connections (long connection) the default, allowing multiple requests over the same TCP connection until either side sends Connection: close.
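Connection reuse can be observed with Python's standard library. The sketch below starts a throwaway server on the loopback interface and sends two requests through one `http.client.HTTPConnection`; if the client's local port is the same for both, the same TCP connection was used. The names `Handler` and `fetch_twice` are hypothetical, and the setup is a local demonstration rather than a production pattern.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets http.server keep the connection open

    def do_GET(self):
        body = f"you asked for {self.path}".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch_twice():
    """Send two GET requests over one connection to a local test server."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    results = []
    try:
        for path in ("/a", "/b"):
            conn.request("GET", path)
            response = conn.getresponse()
            body = response.read()  # drain the body before reusing the connection
            local_port = conn.sock.getsockname()[1]
            results.append((response.status, body, local_port))
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return results


for status, body, local_port in fetch_twice():
    print(status, body, local_port)
```

The response body must be read in full before the next request is sent, because both responses share one byte stream and the client needs to know where the first one ends.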
2. Advantages and Disadvantages of Persistent Connections
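Advantages: avoids repeating the TCP handshake for every resource, which reduces latency when a page loads many static resources. Disadvantages: idle connections continue to consume server resources until they are closed or time out.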
3. Connection Teardown Process
Closing a TCP connection requires a four-way handshake: one side sends a FIN, the other acknowledges it with an ACK, then sends its own FIN once it has finished sending data, and the first side acknowledges that with a final ACK.
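The four segments and the state each sender enters can be laid out as a simple trace. This sketch assumes the client initiates the close and that no data is in flight; the function name `four_way_teardown` is hypothetical, and the flags are simplified (on a real connection the FIN segments normally carry the ACK flag as well).

```python
def four_way_teardown(client_seq: int, server_seq: int):
    """Return (sender, flag, seq, ack, sender's state after sending) per step.

    Assumes the client closes first. Flags are simplified: real FIN
    segments usually also have the ACK flag set.
    """
    return [
        # 1. Client has no more data to send.
        ("client", "FIN", client_seq, server_seq, "FIN_WAIT_1"),
        # 2. Server acknowledges; it may still send remaining data.
        ("server", "ACK", server_seq, client_seq + 1, "CLOSE_WAIT"),
        # 3. Server has finished sending too.
        ("server", "FIN", server_seq, client_seq + 1, "LAST_ACK"),
        # 4. Client acknowledges and waits before fully closing.
        ("client", "ACK", client_seq + 1, server_seq + 1, "TIME_WAIT"),
    ]


for step, (sender, flag, seq, ack, state) in enumerate(four_way_teardown(500, 900), 1):
    print(f"{step}. {sender:6} {flag} seq={seq} ack={ack} -> {state}")
```

Steps 2 and 3 are separate segments because the server may still have data to send after acknowledging the client's FIN.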
7. Additional Topics
1. Common Interview Questions
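Why does establishing a connection need three handshakes while closing it needs four? When opening, the server can combine its SYN and ACK into a single segment. When closing, the side that receives a FIN may still have data to send, so it acknowledges the FIN first and sends its own FIN later, which results in four segments.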
2. HTTP/2.0
HTTP/1.1 was the dominant version for nearly 20 years. HTTP/2 (often written HTTP/2.0) was standardized in 2015 and offers features such as multiplexing, header compression, and server push.
3. HTTP vs. RPC
Due to HTTP’s relatively high latency and large headers, many microservice architectures prefer RPC mechanisms for internal communication.
4. HTTP vs. HTTPS
HTTP transmits data in plaintext and cannot guarantee integrity, which is why HTTPS, which runs HTTP over TLS, is increasingly adopted to provide encryption and integrity verification.
Full-Stack Internet Architecture
Introducing full-stack Internet architecture technologies centered on Java