# HTTP Overview

by dokuDoku · system design

## What is HTTP?

HTTP (HyperText Transfer Protocol) is a protocol for fetching resources such as HTML documents. It is the foundation of any data exchange on the web, and it is a client-server protocol: requests are initiated by the client, and responses are returned by the server. Complete documents (web pages) are constructed from resources such as text, layout instructions (CSS), images, videos, and scripts (JavaScript).

Clients and servers communicate by exchanging individual messages. Messages sent by the client to the server are called requests, and messages sent by the server back to the client are called responses.

HTTP was designed in the early 1990s as an extensible protocol. It is an application layer protocol (Layer 7, see [OSI Model Blog Post](https://doku88-github-io.onrender.com/blog/osi-model)) that is sent over TCP or over a TLS-encrypted TCP connection. Because it is an application layer protocol, any reliable transport protocol could in theory be used; TCP is typically chosen over UDP because it guarantees reliable, in-order delivery, which plain UDP does not.

Thanks to its extensibility, HTTP is used not only to fetch hypertext documents but also to fetch images and videos and to post content to servers. It can also fetch parts of documents to update web pages on demand, commonly from JavaScript running in the page.

## Components of HTTP Systems

Between the client and the server there are typically numerous machines relaying messages; together they make up the web. Those that act at the HTTP level, such as gateways and caches, are called proxies. HTTP sits at Layer 7 of the OSI model (the application layer), but most machines along the path operate at lower layers: routers, for example, forward IP packets at the network layer (Layer 3). In the usual case TCP (Layer 4) runs between the two endpoints and provides the reliability guarantees, so these intermediate machines are transparent to HTTP.
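To make the layering concrete, here is a minimal sketch using only Python's standard library. It starts a throwaway local server (the `HelloHandler` class and its `hello` body are hypothetical stand-ins for a real web server) and then writes an HTTP/1.1 request by hand onto a plain TCP socket. The application code only produces and consumes bytes; TCP takes care of delivering them.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a real web server."""

    protocol_version = "HTTP/1.1"  # answer with an HTTP/1.1 status line

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def raw_http_get(host: str, port: int, path: str = "/") -> bytes:
    """Write an HTTP/1.1 request by hand onto a plain TCP socket."""
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Connection: close\r\n"  # ask the server to close when done
        "\r\n"                   # blank line ends the header section
    )
    # TCP (Layer 4) delivers the bytes reliably; HTTP (Layer 7) is just text.
    with socket.create_connection((host, port)) as sock:
        sock.sendall(request.encode("ascii"))
        chunks = []
        while chunk := sock.recv(4096):  # read until the server closes
            chunks.append(chunk)
    return b"".join(chunks)


# Port 0 lets the OS pick any free port for the local demo server.
server = ThreadingHTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
response = raw_http_get("127.0.0.1", server.server_address[1])
server.shutdown()
server.server_close()
print(response.decode("ascii").splitlines()[0])  # prints: HTTP/1.1 200 OK
```

A real client would also parse the headers and use `Content-Length` to find the end of the body instead of reading until the server closes the connection.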
For HTTP itself, these underlying layers are largely abstracted away, in accordance with the OSI model: the client sends each individual request to a server, which sends back an answer called a response.

### Client

The client is typically a web browser requesting data from a web server, but it can also be a bot such as a search engine crawler. The client **always** initiates the request; it is almost never the server that initiates one. There are some exceptions, but this is the common pattern.

The steps for a client to get and display a web page are:

1. The client sends a request to fetch the HTML document that represents the page.
2. It parses the file and makes additional requests for the scripts, layout information (CSS), and sub-resources referenced by the page, such as images and videos.
3. It combines these resources to present the complete document: the web page.
4. Scripts executed by the client (usually JavaScript) can fetch further resources later.

Note that web pages are hypertext documents: some parts of the displayed content are links.

### Server

The web server serves documents and information as requested by the client. It appears virtually as a single machine, but it may actually be a collection of machines, for example servers sitting behind a load balancer or an API gateway that the client interacts with.

### Proxies

Many computers sit between the client and the server. Most of them operate at the transport, network, or physical layers (OSI Layers 4, 3, and 1 respectively) and are transparent at the HTTP layer. Those operating at the application layer are generally called **proxies**. A proxy can be transparent, forwarding requests without alteration, or non-transparent, modifying the request before forwarding it. Typical proxy functions include:

- Caching
- Filtering (anti-virus, parental controls, etc.)
- Load balancing
- Authentication
- Logging

## Basic Aspects of HTTP

The following are some important properties of the protocol.

HTTP is designed to be simple and human-readable (at least up to HTTP/1.1), and it is extensible: new functionality can be introduced through headers that the client and server agree on.

HTTP is stateless but not sessionless. There is no link between two requests carried out successively on the same connection. However, HTTP cookies, which were added through HTTP's extensible headers, let requests share the same context or state, which enables stateful sessions.

Connections are controlled at OSI Layer 4 (the transport layer), typically by TCP, which is out of scope for HTTP itself (a Layer 7 protocol). HTTP does not require the transport to be connection-based; it only requires it to be reliable, which is why TCP is typically chosen over UDP.

### HTTP/1.0 & HTTP/1.1

In HTTP/1.0, the default behavior was to open a separate TCP connection for each request/response pair. This is less efficient than keeping one TCP connection open, because every new connection costs a handshake to open it and another exchange to close it.

In response, HTTP/1.1 introduced *persistent connections* (also called keep-alive or connection reuse) and *pipelining*. With a persistent connection, a single TCP connection is reused to send and receive multiple HTTP requests and responses instead of being closed after each pair. In HTTP/1.1 this is the default, and the `Connection` header gives partial control over it (for example, `Connection: close` asks for the connection to be closed after the response). Once a client and server have exchanged a request/response pair on a TCP connection, the client's next request reuses that connection, so no *new TCP handshake* is needed.
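A minimal sketch of connection reuse, again with Python's standard library (`http.server` on the server side, `http.client` on the client side); the `KeepAliveHandler` class and the request paths are made up for illustration. The server records the client's source port for every request it handles: if both requests arrive from the same port, they travelled over the same TCP connection.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

seen_client_ports = []  # one entry per request the server handles


class KeepAliveHandler(BaseHTTPRequestHandler):
    # Speak HTTP/1.1 so connections stay open by default
    # (the handler's default, HTTP/1.0, closes after each response).
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        seen_client_ports.append(self.client_address[1])
        body = self.path.encode("ascii")
        self.send_response(200)
        # Content-Length tells the client where this response ends,
        # which is what makes reusing the connection possible.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


server = ThreadingHTTPServer(("127.0.0.1", 0), KeepAliveHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
bodies = []
for path in ("/first", "/second"):
    conn.request("GET", path)
    resp = conn.getresponse()
    bodies.append(resp.read())  # read fully before reusing the connection
conn.close()
server.shutdown()
server.server_close()

print(bodies, len(set(seen_client_ports)))
```

Both requests should show up with the same client port, meaning only one TCP handshake took place. The client must read each response completely before sending the next request; `http.client` refuses to send otherwise.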
Servers may add a `Keep-Alive` header to tell the client how long they plan to keep an idle connection open and how many requests they will accept on it, for example `Keep-Alive: timeout=5, max=100` (an idle timeout of 5 seconds and at most 100 requests).

HTTP/1.1 pipelining was an optional feature built directly on persistent connections. It lets a client send multiple requests back to back over the same persistent TCP connection without waiting for each response to arrive first. In theory this reduces wasted round trips, especially on high-latency links. In practice, almost no one uses it today. Responses must be sent in request order, so one slow response stalls all the ones behind it (head-of-line blocking). Furthermore, many HTTP/1.1 servers mishandled pipelined requests, so clients could not safely enable it.

### HTTP/2

HTTP/2 introduced **multiplexing**, a key feature that made it significantly faster than HTTP/1.1, especially for pages with dozens of resources (CSS, JavaScript, images, API calls, etc.). Multiplexing allows multiple requests and responses to be in flight simultaneously over a single TCP connection without being forced into strict order, which eliminates the head-of-line blocking problem of pipelining at the HTTP level.

Unlike HTTP/1.0 and HTTP/1.1, HTTP/2 is a binary protocol that breaks everything into small frames. The semantics of each message are unchanged, however, and the client effectively reconstitutes the original HTTP/1.1-style request, which is why it is still worthwhile to study HTTP/1.1 to understand HTTP/2.

Some terminology: a *stream* is a logical sequence of frames belonging to one HTTP request/response pair. Each stream has a *stream ID*; streams initiated by the client use odd IDs, and streams initiated by the server use even IDs. Frames from different streams can be interleaved on the wire.

An example flow for loading a page:

1. The client opens a TCP connection.
2. The client sends the frames for requests #1, #2, #3, #4, etc., all at once or in quick succession.
3. The server processes them.
4. The server sends back HEADERS and DATA frames for each stream, in whatever order the responses become ready.
5. The client receives the interleaved frames and uses each frame's stream ID to reassemble the responses.

Some key technical details:

- There are different frame types (HEADERS, DATA, PRIORITY, RST_STREAM, SETTINGS, PUSH_PROMISE, etc.).
- Flow control uses per-stream (and per-connection) windows so that one greedy stream cannot starve the others.
- A stream priority/dependency tree tells the server what to prioritize (this scheme was later deprecated in RFC 9113 in favor of a simpler one).
- HPACK header compression reduces header overhead.

HTTP/2 has been important for web development since its standardization in 2015. Under HTTP/1.1, browsers worked around the lack of multiplexing by opening 6-8 parallel connections per domain; with HTTP/2, one connection per origin is usually enough, so far less parallelization overhead is needed. Fewer connections also mean less server memory, fewer TLS handshakes, and better CDN scaling (saving money!).

HTTP/2 still suffers from TCP head-of-line blocking: because all streams share one TCP connection, a single lost packet stalls every stream until it is retransmitted. This is the problem HTTP/3 addresses by running over QUIC.

## HTTP Flow

This is the typical HTTP flow for a request/response pair:

1. Open a TCP connection.
   - The connection is used to send one request or several.
   - The client may open a new connection, reuse an existing one, or open several TCP connections to the server.
2. Send an HTTP request. Before HTTP/2, messages are human-readable:

   ```http
   GET / HTTP/1.1
   Host: name.website.com
   Accept-Language: fr
   ```

3. Read the response sent by the server:

   ```http
   HTTP/1.1 200 OK
   Date: Thu, 19 Feb 2026 14:28:02 GMT
   Server: Apache
   Last-Modified: Thu, 19 Feb 2026 12:18:22 GMT
   ETag: "51142bc1-7449-479b075b2891b"
   Accept-Ranges: bytes
   Content-Length: 29769
   Content-Type: text/html

   <!doctype html>… (here come the 29769 bytes of the requested web page)
   ```

4. Close the TCP connection or reuse it for further requests.

## HTTP Messages

HTTP messages are human-readable in HTTP/1.0 and HTTP/1.1. In HTTP/2, however, they are split into binary frames, which enables optimizations such as header compression and multiplexing but means they are no longer human-readable.
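To make the binary framing concrete, here is a simplified Python sketch of the fixed 9-byte header that precedes every HTTP/2 frame (a 24-bit payload length, an 8-bit type, 8 bits of flags, and a reserved bit followed by a 31-bit stream ID). It also includes hypothetical `frame` and `reassemble` helpers that regroup interleaved DATA frames by stream ID, as in the page-loading flow described earlier. It deliberately ignores HPACK, flow control, padding, and every frame type other than DATA, so treat it as an illustration rather than a working HTTP/2 implementation.

```python
import struct
from collections import defaultdict
from dataclasses import dataclass

# Two frame type codes and one flag from the HTTP/2 specification.
DATA, HEADERS = 0x0, 0x1
END_STREAM = 0x1  # flag: the sender is done with this stream


@dataclass
class FrameHeader:
    length: int      # payload length in bytes (24 bits)
    frame_type: int  # DATA, HEADERS, SETTINGS, ... (8 bits)
    flags: int       # type-specific flags (8 bits)
    stream_id: int   # 31 bits; 0 refers to the connection as a whole


def encode_frame_header(h: FrameHeader) -> bytes:
    """Pack the fixed 9-byte header that precedes every HTTP/2 frame."""
    if not 0 <= h.length < 2**24 or not 0 <= h.stream_id < 2**31:
        raise ValueError("length or stream_id out of range")
    # The 24-bit length is split into a high byte and a low 16-bit word;
    # the reserved top bit of the stream ID is left as 0.
    return struct.pack(">BHBBI", h.length >> 16, h.length & 0xFFFF,
                       h.frame_type, h.flags, h.stream_id)


def decode_frame_header(data: bytes) -> FrameHeader:
    hi, lo, frame_type, flags, stream_id = struct.unpack(">BHBBI", data[:9])
    return FrameHeader((hi << 16) | lo, frame_type, flags, stream_id & 0x7FFFFFFF)


def frame(frame_type: int, flags: int, stream_id: int, payload: bytes) -> bytes:
    """Hypothetical helper: header followed by payload."""
    header = FrameHeader(len(payload), frame_type, flags, stream_id)
    return encode_frame_header(header) + payload


def reassemble(wire: bytes) -> dict:
    """Walk a byte stream of frames and join DATA payloads per stream ID."""
    bodies = defaultdict(bytes)
    offset = 0
    while offset < len(wire):
        header = decode_frame_header(wire[offset:offset + 9])
        payload = wire[offset + 9:offset + 9 + header.length]
        if header.frame_type == DATA:
            bodies[header.stream_id] += payload
        offset += 9 + header.length
    return dict(bodies)


# Responses for client-initiated (odd) streams 1 and 3, interleaved on the wire.
wire = (
    frame(DATA, 0, 3, b"body{color:")
    + frame(DATA, 0, 1, b"<html>")
    + frame(DATA, END_STREAM, 3, b"red}")
    + frame(DATA, END_STREAM, 1, b"</html>")
)
print(reassemble(wire))
```

Even though the two responses arrive interleaved, the stream ID in each header is enough to put each body back together.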
The semantics of each message are unchanged, though, and the client largely reconstitutes the original HTTP/1.1 request, so it is useful to study the HTTP/1.1 format.

There are two types of HTTP messages: **requests** and **responses**.

### Requests

Given the following HTTP request message, we will describe its attributes:

```http
GET / HTTP/1.1
Host: name.website.com
Accept-Language: fr
```

Attributes:

1. HTTP method: **GET**
   - Other common methods include POST, PUT, PATCH, and DELETE.
2. Path of the resource to fetch: **/**
   - Can be any path, for example: /blog/http-overview 👀
3. HTTP protocol version: **HTTP/1.1**
4. Headers: **Host: name.website.com** and **Accept-Language: fr**
   - Headers convey additional information and are extensible. Most are optional, but HTTP/1.1 requires `Host`.
5. Body: none here
   - Methods like POST carry a body containing the resource being sent, often JSON.

### Responses

Given the following HTTP response message, we will describe its attributes:

```http
HTTP/1.1 200 OK
Date: Thu, 19 Feb 2026 14:28:02 GMT
Server: Apache
Last-Modified: Thu, 19 Feb 2026 12:18:22 GMT
ETag: "51142bc1-7449-479b075b2891b"
Accept-Ranges: bytes
Content-Length: 29769
Content-Type: text/html

<!doctype html>… (here come the 29769 bytes of the requested web page)
```

Attributes:

1. Protocol version: **HTTP/1.1**
2. Status code: **200**
   - There are many codes; *404 Not Found* is a famous example.
   - In general: 2xx means success, 3xx a redirect, 4xx a client error (the request was bad or denied), and 5xx a server error.
3. Status message: **OK**
   - A short human-readable description matching the code.
4. Headers: **Date: Thu, 19 Feb 2026 14:28:02 GMT**, **Server: Apache**, ..., **Content-Type: text/html**
   - Easily extensible for communication between server and client.
5. Body: `<!doctype html>…`
   - Contains the fetched resource: here an HTML page, while APIs typically return JSON.

### HTTP Status Codes

The status code of a response is a three-digit integer that describes the result of the request and the semantics of the response, including whether the request was successful and what content, if any, is enclosed.
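The anatomy of a response can be sketched as a small parser. This is a simplified Python illustration that assumes the complete response is already in memory. The `Response` class and the `parse_response` and `status_class` helpers are hypothetical names; duplicate headers simply overwrite each other, and the body is taken as everything after the blank line rather than being bounded by `Content-Length`. `status_class` maps a code to its class using only its first digit.

```python
from dataclasses import dataclass, field

STATUS_CLASSES = {1: "Informational", 2: "Successful", 3: "Redirection",
                  4: "Client Error", 5: "Server Error"}


@dataclass
class Response:
    version: str
    status: int
    reason: str
    headers: dict = field(default_factory=dict)
    body: bytes = b""


def parse_response(raw: bytes) -> Response:
    """Split an HTTP/1.1 response into status line, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")  # a blank line ends the headers
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return Response(version, int(code), reason, headers, body)


def status_class(code: int) -> str:
    """Only the first digit of a status code determines its class."""
    if not 100 <= code <= 599:
        raise ValueError(f"invalid status code: {code}")
    return STATUS_CLASSES[code // 100]


raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/html\r\n"
       b"Content-Length: 15\r\n"
       b"\r\n"
       b"<!doctype html>")
resp = parse_response(raw)
print(resp.status, resp.reason, status_class(resp.status))
```

Lower-casing header names mirrors the fact that HTTP header field names are case-insensitive, so `Content-Type` and `content-type` refer to the same header.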
All valid status codes range from 100 to 599. The first digit defines the class of response; the last two digits have no categorization role. The five classes are:

- 1xx (Informational) - The request was received; processing continues
- 2xx (Successful) - The request was received, understood, and accepted
- 3xx (Redirection) - Further action needs to be taken to complete the request
- 4xx (Client Error) - The request contains bad syntax, lacks valid permissions, or cannot be fulfilled
- 5xx (Server Error) - The server failed to fulfill an apparently valid request

The set of codes is extensible, but the class (first digit) always keeps its meaning: a client that does not recognize a code treats it like the x00 code of its class (for example, an unknown 4xx code as 400).

Some important status codes:

- 200 OK - Basic success
- 201 Created - A resource was successfully created (typically after POST)
- 204 No Content - Success, with nothing to send back
- 301 Moved Permanently - The resource moved for good; update links
- 302 Found - The resource has temporarily moved to the URL in the `Location` header
- 400 Bad Request - The request is malformed
- 401 Unauthorized - Authentication is missing or failed
- 403 Forbidden - Permissions issue: the client lacks access rights
- 404 Not Found - The resource does not exist
- 500 Internal Server Error - A generic server-side failure
- 502 Bad Gateway - A server acting as a gateway or proxy got an invalid response from upstream

### HTTP Methods

There are a few common HTTP methods worth knowing; most are about retrieving data from, or sending data to, the server.

- GET - Retrieves a resource; idempotent
- POST - Submits data, often creating a new resource; not idempotent
- PUT - Replaces (or creates) the resource at the given URL; idempotent
- PATCH - Partially updates a resource; not guaranteed to be idempotent
- DELETE - Removes a resource; idempotent

## HTTP vs HTTPS

HTTP (HyperText Transfer Protocol) and HTTPS (HyperText Transfer Protocol *Secure*) are very similar, with one key difference: instead of sending data in plaintext as HTTP does, HTTPS uses **TLS (Transport Layer Security)** to encrypt data in transit.
With HTTPS, only the client and the server can read the data in transit, whereas with HTTP any intermediary machine on the path can read (or even modify) it. This has big implications for data confidentiality and integrity. Also note that the default port for HTTP is 80, while for HTTPS it is 443, with respective URL prefixes `http://` and `https://`. On modern hardware, the overhead added by HTTPS is small.

TLS (Transport Layer Security) is a modern cryptographic protocol and the successor to SSL (Secure Sockets Layer). At a high level, TLS sits between the application layer (HTTP) and the transport layer (TCP). TLS works in two phases:

1. TLS Handshake - Negotiate security parameters and establish shared secret keys
2. Record Protocol - Encrypt/decrypt application data (HTTP requests/responses) using these keys

The TLS 1.3 handshake works as follows:

1. ClientHello (sent in plaintext by the client)
   - Signals that the client supports TLS 1.3 and lists the cipher suites it accepts, along with a client random and its key share
2. ServerHello (also sent in plaintext)
   - The server picks a cipher suite and sends its own key share and a server random
   - From the two key shares, client and server can each independently compute the same shared secret
   - The server then sends its certificate (with a signature proving it holds the matching private key) and a Finished message, encrypted with the newly derived keys
3. Client Finished
   - The client verifies the server's certificate and Finished message
   - It sends its own Finished message, signaling that it is ready to communicate
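From an application's point of view, the handshake and record protocol are usually handled by a library. The sketch below uses Python's standard `ssl` module: `make_client_context` keeps the certificate and hostname verification that `ssl.create_default_context()` turns on and refuses versions older than TLS 1.2, while `https_get` is a hypothetical helper that runs the handshake inside `wrap_socket()` and then sends an ordinary HTTP/1.1 request through the encrypted channel. Calling `https_get` requires network access and a real HTTPS host; only the context setup runs offline.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Client-side TLS settings: verify the server, refuse old versions."""
    ctx = ssl.create_default_context()  # loads the system's trusted CAs
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def https_get(host: str, path: str = "/", port: int = 443) -> bytes:
    """Hypothetical helper: a plain HTTP/1.1 request inside a TLS session.

    wrap_socket() on an already-connected socket performs the handshake
    (ClientHello, ServerHello, certificate and hostname checks, Finished).
    Afterwards sendall()/recv() go through the record protocol, which
    encrypts and decrypts transparently.
    """
    ctx = make_client_context()
    with socket.create_connection((host, port)) as tcp:
        with ctx.wrap_socket(tcp, server_hostname=host) as tls:
            print("negotiated:", tls.version(), tls.cipher()[0])
            request = (f"GET {path} HTTP/1.1\r\nHost: {host}\r\n"
                       "Connection: close\r\n\r\n")
            tls.sendall(request.encode("ascii"))
            chunks = []
            while chunk := tls.recv(4096):
                chunks.append(chunk)
    return b"".join(chunks)


ctx = make_client_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

With network access, a call such as `https_get("example.com")` should print the negotiated protocol (for example `TLSv1.3`) before returning the raw response bytes. Note that the HTTP request itself is exactly the same text as before; TLS only changes how the bytes travel.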