# TCP: Transmission Control Protocol

by dokuDoku · system design

## What is TCP (Transmission Control Protocol)?

### Definitions

A **packet** is a small, structured chunk of data sent across a network. Specifically, a packet consists of:

- Header (metadata)
  - Source IP
  - Destination IP
  - Sequence numbers (for TCP)
  - Protocol info
- Payload
  - The actual data (some bytes)

A **data stream** is a continuous flow of bytes sent from one entity to another. TCP takes a series of packets and reassembles them into a clean, ordered stream of bytes for the receiving application.

### Overview

Transmission Control Protocol (TCP) is a Transport Layer (Layer 4) protocol (see [OSI Model Blog Post](https://doku88-github-io.onrender.com/blog/osi-model)) that enables two hosts to connect over an Internet Protocol (IP) network and exchange data streams. A key feature of TCP is that it provides reliable, error-checked delivery: bytes arrive intact and in the same order in which they were sent.

TCP is used for web browsing, email, remote administration, file transfer, and streaming media. Secure Sockets Layer (SSL) and its successor Transport Layer Security (TLS) run on top of TCP, and HTTP uses TCP as well.

TCP is connection-oriented, so the sender and the receiver must establish a connection before any messages are sent. A 3-way handshake establishes the connection, and a 4-way handshake terminates it. The server must be listening for connection requests from clients in order to accept them.

## TCP Connection Properties

TCP solves five classic distributed-systems problems:

1. Lost Packets: retransmission
2. Out of Order Packets: sequence numbers
3. Duplicated Packets: ACK tracking
4. Slow Receiver: flow control
5. Overloaded Network: congestion control

For **Lost Packets**, TCP ensures reliability by numbering the bytes it sends: each packet carries the sequence number of its first byte, and the receiver replies with an `ACK` confirming which bytes it has received so far. If a packet is never acknowledged, the sender retransmits it.
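The first three problems (lost, reordered, and duplicated packets) can all be handled by a receiver that tracks bytes by sequence number and always replies with a cumulative `ACK` (the next byte it expects). Below is a minimal Python sketch of that idea, not real TCP: `ToyReceiver` and its methods are hypothetical names, there are no sockets or timers, and it assumes retransmitted segments reuse the original segment boundaries. Note how the `ACK` number stays put while a gap exists; that unchanging `ACK` is what tells the sender something was lost.

```python
class ToyReceiver:
    """Hypothetical, simplified TCP-style receiver: no sockets, no timers.

    Sequence numbers count bytes, so a segment with sequence number `seq`
    carrying `data` covers bytes seq .. seq + len(data) - 1.
    Assumes retransmissions reuse the original segment boundaries.
    """

    def __init__(self, initial_seq: int) -> None:
        self.rcv_nxt = initial_seq   # next byte expected; also our ACK number
        self.out_of_order = {}       # seq -> payload that arrived ahead of a gap
        self.stream = bytearray()    # in-order bytes handed to the application

    def receive(self, seq: int, data: bytes) -> int:
        """Process one segment and return the cumulative ACK number to send back."""
        if seq + len(data) <= self.rcv_nxt:
            # Every byte was already received: a duplicate, so drop it and re-ACK.
            return self.rcv_nxt
        if seq > self.rcv_nxt:
            # A gap (lost or delayed segment) precedes this one: hold it for later.
            self.out_of_order[seq] = data
            return self.rcv_nxt
        # Segment starts at or before rcv_nxt: skip any already-received prefix.
        self._deliver(data[self.rcv_nxt - seq:])
        # Deliver buffered segments that are now contiguous with the stream.
        while self.rcv_nxt in self.out_of_order:
            self._deliver(self.out_of_order.pop(self.rcv_nxt))
        return self.rcv_nxt

    def _deliver(self, data: bytes) -> None:
        self.stream += data
        self.rcv_nxt += len(data)


rx = ToyReceiver(initial_seq=1001)
print(rx.receive(1001, b"AAAA"))  # A arrives in order          -> 1005
print(rx.receive(1009, b"CCCC"))  # C arrives early, B missing  -> 1005 (gap signalled)
print(rx.receive(1005, b"BBBB"))  # B retransmitted, C drained  -> 1013
print(rx.receive(1005, b"BBBB"))  # duplicate B is ignored      -> 1013
print(bytes(rx.stream))           # b'AAAABBBBCCCC'
```

Keeping a single "next expected byte" counter is what makes the `ACK` cumulative: one number summarizes everything received so far, so a lost `ACK` is harmless as long as a later one gets through.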
For example, suppose we send the following packet, which carries the message *Hello* and also acknowledges (`ACK`) that we have received everything up to byte 2000 from the other party:

```
Header (metadata)
  Source IP:              192.168.1.10
  Destination IP:         93.184.216.34
  IP Total Length:        45 bytes
  Protocol:               TCP
  Source Port:            51514
  Destination Port:       80 (HTTP)
  Sequence Number:        1001 (packet starts at byte 1001 in data stream)
  Acknowledgment Number:  2001 (I’ve received everything from you up to byte 2000)
  TCP Header Length:      20 bytes (where header ends & payload begins)
  Flags:                  ACK
Payload
  "Hello"
```

We can break this illustrative packet down into several parts:

1. IP Layer

   The IP Layer includes the **Source IP**, **Destination IP**, **IP Total Length** (total size of the packet), and the **Protocol**. Here the **Protocol** is TCP, but it could be UDP instead, for example. The IP Layer states who sent the packet, where it is going, how big it is, and which protocol the payload uses, so that protocol-specific handling can be applied. The 45-byte total is a 20-byte IP header, the 20-byte TCP header, and the 5-byte payload.

2. TCP Layer

   - Source Port
   - Destination Port
   - Sequence Number: position of the packet's first byte in the sender's data stream
   - Acknowledgement Number: the next byte the sender expects from the other party, meaning it has successfully received everything before that byte
   - TCP Header Length: where the header ends and the payload begins
   - Flags: `ACK` is the most common; flags tell the receiver how to interpret the packet

   This layer manages TCP-specific functions such as reliability, ordering, and flow control.

3. Payload

   The data being transferred as part of the data stream.

Note that TCP connections are **full-duplex**: one byte stream flows from client $\rightarrow$ server and another flows from server $\rightarrow$ client. This example packet adds bytes to the client $\rightarrow$ server stream and acknowledges data received on the server $\rightarrow$ client stream. Combining data and an acknowledgement in one packet is known as *piggybacking the acknowledgement*.
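To make the TCP portion of the example packet concrete, here is a sketch that packs those header fields into the standard fixed 20-byte wire layout with Python's `struct` module and parses them back. The helper names `build_header` and `parse_header` are hypothetical; the window value of 65535 is an assumed placeholder (the example packet does not specify one), TCP options are omitted, and the checksum is left at 0 rather than computed.

```python
import struct

# Flag bits in byte 13 of the TCP header. (The historical NS bit would sit in
# the low bit of byte 12; it is left at 0 here.)
FLAGS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08,
         "ACK": 0x10, "URG": 0x20, "ECE": 0x40, "CWR": 0x80}

# Fixed 20-byte header, "!" = network (big-endian) byte order:
# src port, dst port, seq, ack, data offset, flags, window, checksum, urgent pointer.
TCP_HEADER = struct.Struct("!HHIIBBHHH")


def build_header(src_port, dst_port, seq, ack, flags, window=65535):
    """Pack a TCP header without options; checksum left at 0 for illustration."""
    data_offset = 5 << 4          # 5 x 32-bit words = 20 bytes, stored in the upper nibble
    flag_bits = 0
    for name in flags:            # each flag is one bit, so combine them with OR
        flag_bits |= FLAGS[name]
    return TCP_HEADER.pack(src_port, dst_port, seq, ack,
                           data_offset, flag_bits, window, 0, 0)


def parse_header(segment: bytes) -> dict:
    """Unpack the fixed header fields of a TCP segment."""
    (src, dst, seq, ack, off, flag_bits,
     window, _checksum, _urgent) = TCP_HEADER.unpack(segment[:TCP_HEADER.size])
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "header_len": (off >> 4) * 4,  # bytes; the payload starts here
        "flags": [name for name, bit in FLAGS.items() if flag_bits & bit],
        "window": window,
    }


# The example packet: client port 51514 -> port 80, seq 1001, ack 2001, ACK set.
segment = build_header(51514, 80, seq=1001, ack=2001, flags=["ACK"]) + b"Hello"
info = parse_header(segment)
print(info["flags"], info["header_len"])   # ['ACK'] 20
print(segment[info["header_len"]:])        # b'Hello'
print(20 + len(segment))                   # 45 = 20-byte IP header + 25-byte TCP segment
```

Because each flag occupies its own bit, passing `flags=["SYN", "ACK"]` sets both at once (flag byte 0x12), which is exactly the combination used in the second step of the connection handshake.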
### Common TCP Flags

Since we have introduced TCP flags, let's list them to get a sense of TCP's capabilities:

- SYN: start connection
- ACK: acknowledge data
- FIN: gracefully close connection
- RST: immediately terminate
- PSH: deliver data immediately (push)
- URG: urgent data present
- ECE/CWR/NS: congestion control (Explicit Congestion Notification)

Each flag is a 1-bit switch, so multiple flags can be set at the same time, such as `SYN` and `ACK`.

### Properties

TCP's connection properties are built up from the information each packet carries. For reliability, each packet contains the sequence number of its first byte, and its payload size can be computed from the IP total length minus the IP and TCP header lengths. The receiver then `ACK`s with $sequence\ number + size$, the next byte it expects, which shows that it has received everything up to $sequence\ number + size - 1$ from the sender's data stream. In the example above, the 5-byte *Hello* starting at byte 1001 would be acknowledged with `ACK` number 1006. If the sender does not receive this `ACK` within a timeout period, it assumes the packet was lost and retransmits it.

Flow control is implemented using a sliding window: in every `ACK`, the receiver advertises how many more bytes it can buffer, and the sender never has more unacknowledged bytes in flight than that window allows. As `ACK`s arrive, the window slides forward and the sender may send more. Flow control is related to congestion control, but they are separate mechanisms: the sender also maintains its own congestion window and sends no more than the smaller of the two windows. Classic TCP congestion control assumes that packet loss indicates congestion, so when `ACK`s stop arriving as expected (a timeout or repeated duplicate `ACK`s), the sender slows the rate at which it sends packets.

TCP handles reordering by buffering out-of-order segments. If the receiver gets segments A and C but not B, it buffers C and keeps sending `ACK`s for the first byte of B (confirming only A) until the sender resends B; then it can deliver A, B, C in order.

These properties derive primarily from TCP's acknowledgement mechanism, which gives it *reliable delivery*.

## Handshakes

The handshakes rely mainly on the flags above to signal intent, and each host tracks an internal connection state.
The state names are largely self-explanatory; the major states are listed explicitly at the end of this post.

### 3 Way Connection Initialization Handshake

1. Host A sends Host B a synchronize flag `SYN` with sequence number x. Host B starts in the `LISTEN` state, and Host A moves from `CLOSED` to `SYN_SENT` when it sends the `SYN`.
2. Host B replies with a synchronize-acknowledgement (`SYN` & `ACK` flags) carrying its own sequence number y and acknowledgement number x+1. Host B is now in the `SYN_RECEIVED` state.
3. Host A replies with an acknowledgement flag `ACK` and acknowledgement number y+1, entering the `ESTABLISHED` state. When Host B receives this `ACK`, it also enters `ESTABLISHED`.

Now the connection has been established. Each side acknowledges the other's sequence number plus 1 because a `SYN` consumes one sequence number.

### 4 Way Connection Termination Handshake

1. Host A sends a finish flag `FIN` to Host B with sequence number x and enters the `FIN_WAIT_1` state.
2. Host B acknowledges the `FIN` with an `ACK` carrying acknowledgement number x+1 and enters the `CLOSE_WAIT` state. When Host A receives this acknowledgement, it enters the `FIN_WAIT_2` state: Host A can no longer send new data, but Host B still can.
3. Host B sends any remaining data to Host A, then sends its own `FIN` with sequence number y and enters the `LAST_ACK` state.
4. Host A sends a final `ACK` with acknowledgement number y+1 to acknowledge Host B's `FIN` and enters the `TIME_WAIT` state, where it waits out a timeout in case segments are still lingering in the network or the final `ACK` is lost and must be resent. When Host B receives the final `ACK`, it enters the `CLOSED` state and the connection is closed; Host A follows once its `TIME_WAIT` timer expires.

4 steps were needed because we were closing two independent byte streams. Closing the Host A $\rightarrow$ Host B byte stream does not close the Host B $\rightarrow$ Host A byte stream: each `FIN` closes one direction.

#### Host States

As mentioned, each host (client and server) is in one of several specific states at any time.
These are the major states:

- LISTEN
- SYN_SENT
- SYN_RECEIVED
- ESTABLISHED
- FIN_WAIT_1
- FIN_WAIT_2
- CLOSE_WAIT
- LAST_ACK
- TIME_WAIT
- CLOSED
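The two handshakes can be read as a state machine over these states. The following Python sketch encodes only the transitions described in this post as a lookup table and replays the 3-way handshake and 4-way termination for Host A and Host B. The event labels and the `step` helper are hypothetical, the `TIME_WAIT` timeout is labelled 2*MSL (twice the maximum segment lifetime), and real TCP has further transitions (for example simultaneous close and `RST` handling) that are deliberately left out.

```python
# Transitions described in this post: (current state, event) -> next state.
# Event labels are hypothetical, written from that host's point of view.
TRANSITIONS = {
    # 3-way handshake
    ("CLOSED", "send SYN"): "SYN_SENT",
    ("LISTEN", "recv SYN / send SYN+ACK"): "SYN_RECEIVED",
    ("SYN_SENT", "recv SYN+ACK / send ACK"): "ESTABLISHED",
    ("SYN_RECEIVED", "recv ACK"): "ESTABLISHED",
    # 4-way termination
    ("ESTABLISHED", "send FIN"): "FIN_WAIT_1",
    ("ESTABLISHED", "recv FIN / send ACK"): "CLOSE_WAIT",
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",
    ("CLOSE_WAIT", "send FIN"): "LAST_ACK",
    ("FIN_WAIT_2", "recv FIN / send ACK"): "TIME_WAIT",
    ("LAST_ACK", "recv ACK"): "CLOSED",
    ("TIME_WAIT", "timeout (2*MSL)"): "CLOSED",
}


def step(states: dict, host: str, event: str) -> None:
    """Advance one host's state, rejecting events the table does not allow."""
    key = (states[host], event)
    if key not in TRANSITIONS:
        raise ValueError(f"{host}: event {event!r} not allowed in state {states[host]}")
    states[host] = TRANSITIONS[key]


states = {"A": "CLOSED", "B": "LISTEN"}  # Host B is the listening server
script = [
    ("A", "send SYN"),                 # 1. SYN, seq = x
    ("B", "recv SYN / send SYN+ACK"),  # 2. SYN+ACK, seq = y, ack = x + 1
    ("A", "recv SYN+ACK / send ACK"),  # 3. ACK, ack = y + 1
    ("B", "recv ACK"),
    ("A", "send FIN"),                 # A closes the A -> B direction
    ("B", "recv FIN / send ACK"),
    ("A", "recv ACK"),
    ("B", "send FIN"),                 # B closes the B -> A direction
    ("A", "recv FIN / send ACK"),
    ("B", "recv ACK"),
    ("A", "timeout (2*MSL)"),
]
for host, event in script:
    step(states, host, event)
    print(f"{host}  {event:<24} A={states['A']:<13} B={states['B']}")
```

The script ends with both hosts in `CLOSED`. Feeding an event out of order, such as `send FIN` while Host A is still in `SYN_SENT`, raises a `ValueError`; keeping every legal transition in one explicit table makes that kind of mistake easy to spot.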