# WebSockets

by dokuDoku

system design

## What is a WebSocket?

A WebSocket is a persistent, two-way interactive communication session between a client and a server. Typically, this is a TCP connection that originates from an upgraded [HTTP](https://doku88-github-io.onrender.com/blog/http-overview) connection. HTTP is fundamentally a request/response model: the client requests information from a server, and the server returns that information in a response. However, what if we want bidirectional communication with delivery guarantees? This is exactly what WebSockets provide (note that the delivery guarantees come from the Layer 4 TCP connection, **not** from the Layer 7 WebSocket protocol).

With the WebSocket API, clients can send messages to the server and receive live updates/responses without polling, enabling real-time data exchange. This is great for chat applications, live notifications, online games, and collaborative editing tools. WebSockets also avoid the overhead of repeating HTTP headers on every message, which gives faster delivery, and they exchange information in frames (structured protocol units) that can carry strings, JSON, protobufs, images, etc. A WebSocket uses a single TCP connection that stays open until either side decides to close it, so no repeated handshakes are needed to send information.

Furthermore, because a WebSocket connection begins as an HTTP request, the handshake can carry existing HTTP session information such as cookies, headers, JSON Web Tokens (JWTs), etc., which is great for developers. WebSockets typically use port 80 (`ws://`, alongside HTTP) or port 443 (`wss://`, alongside HTTPS). Of note, L4 Transport layer load balancers (see [OSI Model Blog Post](https://doku88-github-io.onrender.com/blog/osi-model)) directly support WebSockets since they forward TCP-level connections. L7 Application layer load balancers typically require specific configuration, so be sure to check!
WebSockets also typically rely on heartbeats and timeouts, which need to be tuned for each use case.

## Upgrading to WebSocket from HTTP Procedure

To use WebSockets, we first assume that there is an HTTP/1.1 connection between a client and a server. Note that WebSockets can also be established over HTTP/2, but this example assumes HTTP/1.1. Given this, we do the following:

1. Upgrade Request

   The client sends an HTTP request with the `Upgrade: websocket` header to the server:

   ```
   GET /chat HTTP/1.1
   Host: example.com
   Upgrade: websocket
   Connection: Upgrade
   Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
   Sec-WebSocket-Version: 13
   Origin: https://example.com
   Sec-WebSocket-Protocol: chat, superchat
   Sec-WebSocket-Extensions: permessage-deflate
   ```

   Required headers include
   - `Upgrade: websocket` signals intent to switch protocols
   - `Connection: Upgrade` required in HTTP/1.1 for protocol switching
   - `Sec-WebSocket-Version: 13` version 13 is the standard version
   - `Sec-WebSocket-Key` a random, Base64-encoded value that the server must transform and echo back, so the client can tell the server really processed a WebSocket upgrade request

2. Server Response

   The server signals agreement with:

   ```
   HTTP/1.1 101 Switching Protocols
   Upgrade: websocket
   Connection: Upgrade
   Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
   Sec-WebSocket-Protocol: chat
   Sec-WebSocket-Extensions: permessage-deflate
   ```

   - `Sec-WebSocket-Accept` proves the server understands WebSocket. It is calculated from the client's previously sent `Sec-WebSocket-Key`, so the client can verify this value

3. Persistent TCP Link

   - The connection switches from HTTP to the WebSocket protocol, keeping the TCP connection open
   - This enables full-duplex communication, so client and server can each send messages without a request
   - HTTP semantics stop and WebSocket frames begin flowing

4. Close Handshake: Close Frame Sent

   The client or server decides to terminate the connection through a two-step closing handshake. Here we assume the client closes.

   Close frame from the client (shown in human-readable form; on the wire it is binary and masked):

   ```
   Opcode: 0x8 (Close)
   FIN: 1
   Payload:
     Status code: 1000
     Reason: "done"
   ```

   - Status code 1000 indicates normal closure

5. Close Handshake: Close Frame Received, Send Response

   Close frame confirmation from the server:

   ```
   FIN=1
   Opcode=0x8
   Payload: 03 E8
   ```

   - The payload `03 E8` is the status code 1000 encoded as two bytes (0x03E8 = 1000)
   - Server-to-client frames are not masked
   - The reason string is optional, and servers typically omit it

6. Connection Closed

### WebSocket Features

WebSockets have some nice features. First, WebSocket is a frame-based binary protocol: data is split into frames that are sent over the connection. Frames are small structured units inside the TCP stream, each containing metadata and a payload. Frames enable fragmentation, streaming of large messages, control messages, binary transport, and low overhead. Because frames do not repeat HTTP headers on every message, messages are lighter and faster, which reduces bandwidth and latency.

Note that although HTTP/2 also uses frames, WebSockets do not depend on it: they can be used over HTTP/1.1 (the persistent connection feature is key here) using the `Upgrade: websocket` header. Over HTTP/2, WebSockets can be bootstrapped using the standardized `Extended CONNECT` mechanism.

#### WebSocket Masking

Masking exists to protect against proxy cache poisoning attacks, in which an attacker tricks a caching proxy into storing a malicious or incorrect response that is then served to other users. This is quite dangerous because a single malicious request can affect many users: proxy caches (CDNs, reverse proxies, corporate proxies, and ISP (Internet Service Provider) caches) exist precisely to store HTTP responses and serve them without hitting the origin server again.
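
To make the transform concrete, here is a minimal Python sketch of WebSocket masking: each payload byte is XOR'd with one byte of a 4-byte key, cycling through the key every 4 bytes. The function name `mask_payload` is an illustrative choice and not part of any library, and the fixed key and payload are taken from the masked "Hello" example in RFC 6455. Because XOR is its own inverse, the same function both masks and unmasks.

```python
import os


def mask_payload(payload: bytes, key: bytes) -> bytes:
    """XOR every payload byte with the 4-byte key, repeating the key every 4 bytes."""
    if len(key) != 4:
        raise ValueError("masking key must be exactly 4 bytes")
    return bytes(byte ^ key[i % 4] for i, byte in enumerate(payload))


# Fixed key from the RFC 6455 "Hello" example, used here so the result is reproducible.
key = bytes([0x37, 0xFA, 0x21, 0x3D])
masked = mask_payload(b"Hello", key)
unmasked = mask_payload(masked, key)  # applying the same key again restores the payload

# A real client would pick a fresh unpredictable key for every frame (assumption: os.urandom).
random_key = os.urandom(4)
roundtrip = mask_payload(mask_payload(b"Hello", random_key), random_key)

print(masked.hex())  # 7f9f4d5158
print(unmasked)      # b'Hello'
```

The server side runs the same XOR with the key it reads from the frame header, which is why unmasking costs one cheap operation per payload byte.
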
The rule is that Client $\rightarrow$ Server frames **must** be masked, but Server $\rightarrow$ Client frames **must not** be masked. The purpose of masking is to ensure that the client payload bytes on the wire are unpredictable and cannot be made to look like valid HTTP traffic. Masking is not encryption and provides no confidentiality (the key travels in the same frame); it is a mitigation that stops an attacker who controls the payload from choosing the exact bytes an intermediary will see.

Each client frame contains:

- FIN bit (Final Fragment): 1 $\rightarrow$ final frame, 0 $\rightarrow$ more frames follow
- Opcode (4 bits): type of frame (text, binary, etc.)
- Mask bit (1 bit): 1 $\rightarrow$ payload masked, 0 $\rightarrow$ payload not masked
- Payload length
- 4-byte masking key: randomly generated for each frame
- Payload data: if masked, XOR'd with the masking key, which repeats every 4 bytes

Of note, masking adds 4 bytes per frame and an XOR cost per byte. The server must unmask every client frame, which can be measurable in high-throughput systems, but it is usually not a big deal since the operation is simple.

Additionally, control frames (ping/pong/close) **cannot** be fragmented, have a maximum payload of 125 bytes, and may appear in the middle of a fragmented message. Fragmentation allows a message to be split across frames, but applications usually receive complete messages, not partial frames.

## WebSocket APIs

The WebSocket API makes it possible to upgrade an HTTP connection and use WebSockets. There are typically two ways to use WebSockets in the browser: the **WebSocket Interface** and the **WebSocketStream Interface**. Less commonly there is also the **WebTransport** API.

### WebSocket Interface/API

The WebSocket interface is stable with good browser and server support; however, it does not support back pressure. This means that when messages arrive faster than the application can process them, the application will fill up the device's memory by buffering these messages, become unresponsive due to 100% CPU usage, or both.
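
The effect of missing back pressure can be illustrated with a small simulation. This is a hedged, language-agnostic sketch in Python using `asyncio.Queue`, not browser WebSocket code: the `run` function, the message count, and the queue sizes are all assumptions made for illustration. An unbounded queue stands in for a receiver that buffers everything it is handed, while a bounded queue forces the producer to wait whenever the consumer falls behind.

```python
import asyncio


async def run(maxsize: int, n_messages: int = 1000) -> int:
    """Return the peak number of buffered messages for a given queue bound."""
    queue = asyncio.Queue(maxsize=maxsize)  # maxsize=0 means unbounded
    peak = 0

    async def producer() -> None:
        nonlocal peak
        for i in range(n_messages):
            # With a bounded queue this suspends while the queue is full,
            # which is the back pressure signal. Unbounded queues never suspend.
            await queue.put(i)
            peak = max(peak, queue.qsize())

    async def consumer() -> None:
        for _ in range(n_messages):
            await queue.get()
            await asyncio.sleep(0)  # yield control, standing in for processing work

    await asyncio.gather(producer(), consumer())
    return peak


unbounded_peak = asyncio.run(run(maxsize=0))
bounded_peak = asyncio.run(run(maxsize=10))
print(unbounded_peak, bounded_peak)
```

Under these assumptions the unbounded run buffers every message before any is consumed, while the bounded run never holds more than 10 at once, because the producer is made to wait.
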
The underlying WebSocket protocol was standardized by the Internet Engineering Task Force (IETF) in 2011 (RFC 6455), though it could potentially be superseded by the WebTransport API in the future. This interface is event-based, meaning that we attach listeners and data arrives whenever it arrives. This is why it does not support back pressure: messages are delivered as events as soon as they come in over the TCP connection, and the application has no way to slow the sender down.

### WebSocketStream Interface/API

This is a promise-based alternative, meaning that it uses JavaScript promises and the Streams API instead of event callbacks. Connection establishment returns a promise, so we can explicitly `await` the connection and then each next piece of data (similar to `async`/`await` in Python). This uses the Streams API to handle receiving and sending messages, so socket connections get stream back pressure automatically, regulating the speed of reading or writing to avoid bottlenecks in the application.

Instead of simply waiting for messages to arrive as events, this API exposes a promise for connection establishment, a ReadableStream for incoming messages, and a WritableStream for outgoing messages. This changes the data flow model: the original WebSocket API is push based (the browser delivers messages as events), whereas WebSocketStream is pull based (the application explicitly reads from a stream). The Streams API lets the application apply back pressure to reading and writing, preventing unbounded buffering in the browser or application.

WebSocketStream was introduced to align WebSockets with fetch streaming, the Web Streams API, modern async/await, and back-pressure-aware systems. *However*, it is not widely supported (Chromium only), and most production applications use the classic WebSocket Interface.

### WebTransport Interface/API

This is an up-and-coming alternative that is versatile and low level, providing back pressure and several other features not supported by the previous two interfaces.
Some of these features include unidirectional streams, out of order delivery, and unreliable data transmissions with datagrams. A datagram is a small, unreliable, unordered message sent over QUIC that may be dropped and not re-transmitted (similar to UDP). Note that WebTransport runs over [QUIC](https://quicwg.org/) and QUIC supports both reliable streams (like TCP) and unreliable datagrams (similar to UDP). This means that WebTransport enables both types of transmissions. However, this is more complex to use than WebSockets and cross-browser support is limited.
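
As a closing worked example, the `Sec-WebSocket-Accept` value from the handshake walkthrough can be reproduced in a few lines of Python. Per RFC 6455, the server appends a fixed GUID to the client's `Sec-WebSocket-Key`, takes the SHA-1 hash of the result, and Base64-encodes the digest. The function name `compute_accept` is a hypothetical helper chosen for this sketch; only the standard library is used.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key: str) -> str:
    """Derive the Sec-WebSocket-Accept header value from the client's key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Key taken from the upgrade request shown in the handshake walkthrough.
accept = compute_accept("dGhlIHNhbXBsZSBub25jZQ==")
print(accept)  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

The result matches the `Sec-WebSocket-Accept` header in the server response, which is how a client can check that the server actually processed its upgrade request and did not just return a cached or unrelated response.
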