TCP Protocol
Contents
- Stream of bytes that will arrive in the right order
Segments
TCP splits the application's byte stream into segments so that each one fits in a single packet on the wire.
The largest payload an Ethernet frame can carry is 1500 bytes; this is known as the MTU.
The IP layer requires at least 20 bytes for its header.
The TCP layer requires at least 20 bytes for its header.
Because each layer is encapsulated in the one below it, the maximum TCP payload per segment is 1460 bytes.
- Maximum Transmission Unit (MTU) - Up to 1500 bytes with Ethernet
- Maximum Segment Size (MSS) - Up to 1460 bytes (MTU - 20 - 20)
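The arithmetic above can be sketched in a few lines. The function names are illustrative only, and the header sizes assume no IP or TCP options are present.

```python
ETHERNET_MTU = 1500   # bytes, largest IP packet an Ethernet frame can carry
IP_HEADER_MIN = 20    # bytes, IPv4 header without options
TCP_HEADER_MIN = 20   # bytes, TCP header without options


def max_segment_size(mtu=ETHERNET_MTU, ip_header=IP_HEADER_MIN, tcp_header=TCP_HEADER_MIN):
    """Largest TCP payload that fits in one packet of size `mtu`."""
    return mtu - ip_header - tcp_header


def segment_count(data_length, mss):
    """Number of segments needed to carry `data_length` bytes (ceiling division)."""
    return -(-data_length // mss)
```

For example, 4000 bytes of application data would need three segments at an MSS of 1460 bytes.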
Structure
Header
- 2 bytes - Source port
- 2 bytes - Destination port
- 4 bytes - Sequence number
- 4 bytes - Acknowledgement number
- 2 bytes - Data offset (header length), reserved bits and flags
- 2 bytes - Receive Window
- 2 bytes - Checksum
- 2 bytes - Urgent Pointer
- Optional n bytes - Options
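As a rough illustration of this layout, the sketch below unpacks the fixed 20-byte header with Python's `struct` module. The function name and the returned dictionary keys are made up for this example, and it assumes the 2-byte flags field carries the 4-bit data offset in its top bits, as in the standard header.

```python
import struct

# Fixed part of the header: ports, seq, ack, offset+flags, window, checksum, urgent pointer
TCP_HEADER = struct.Struct("!HHIIHHHH")

FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def parse_tcp_header(segment: bytes) -> dict:
    """Split a raw TCP segment into header fields, options and payload."""
    src, dst, seq, ack, off_flags, window, checksum, urgent = TCP_HEADER.unpack_from(segment)
    header_len = (off_flags >> 12) * 4  # data offset is counted in 32-bit words
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "flags": {name for name, bit in FLAG_BITS.items() if off_flags & bit},
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urgent,
        "options": segment[TCP_HEADER.size:header_len],
        "payload": segment[header_len:],
    }
```

A segment built with `TCP_HEADER.pack(12345, 80, 42, 79, (5 << 12) | 0x18, 4096, 0, 0) + b"C"` parses to a header with no options, the PSH and ACK flags set, and a one-byte payload.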
Payload
Sequence Number
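The sequence number of a TCP segment is the position, in the byte stream, of the first byte in the segment.
SeqNo = ISN + K, where ISN is the initial sequence number chosen when the connection is set up and K is the offset of that byte from the ISN.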
ACK Sequence Number
The acknowledgement number is the sequence number of the next byte the host expects to receive from the other host.
ACKNo = SeqNo + Length (of the last segment received in order)
If a packet is lost and later packets arrive, the receiver will keep sending the ACK for the byte number of the lost packet. Refer to TCP Packet Buffer
Example
Seq=42, ACK=79, data='C'
-> This segment starts at byte #42, and I next expect to receive the byte with Seq=79 from you
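The cumulative ACK behaviour described in this section, including the repeated ACK after a lost packet, can be sketched as follows. This is a simplified model: the function names are hypothetical, and it assumes the receiver buffers out-of-order segments rather than discarding them.

```python
def next_ack(expected: int, received: dict) -> int:
    """Advance the cumulative ACK number past every in-order segment in `received`.

    `received` maps a segment's sequence number to its payload length.
    """
    while expected in received:
        expected += received[expected]  # ACKNo = SeqNo + Length
    return expected


def acks_for(arrivals, first_expected):
    """Return the ACK number sent after each arriving (seq, length) pair."""
    received = {}
    expected = first_expected
    acks = []
    for seq, length in arrivals:
        received[seq] = length
        expected = next_ack(expected, received)
        acks.append(expected)
    return acks
```

If four 10-byte segments start at 100, 110, 120 and 130 and the second one is delayed, `acks_for([(100, 10), (120, 10), (130, 10), (110, 10)], 100)` yields `[110, 110, 110, 140]`: the receiver keeps asking for byte 110 until the gap is filled.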
Packet Piggybacking
An ACK is carried in the same segment as data travelling in the opposite direction, rather than in a separate packet.
e.g. Response + ACK
TCP Packet Buffer
RTT Time and Timeout
- If timeout too short - premature timeout, unnecessary retransmission
- If timeout too long - slow reaction to segment loss and lower throughput
Solution: Maths.
SampleRTT - Time from segment transmission until its ACK is received (ignoring retransmitted segments).
- EstimatedRTT = (1-α) * EstimatedRTT_prev + α * SampleRTT
- DevRTT = (1-β) * DevRTT_prev + β * |SampleRTT - EstimatedRTT|
- Timeout Interval = EstimatedRTT + 4 * DevRTT
The 4 * DevRTT term is a safety margin that allows for deviations of SampleRTT from the EstimatedRTT.
α = 0.125
β = 0.25
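A minimal sketch of these update rules is shown below. The class name and the way the estimator is seeded are assumptions for illustration. It also assumes EstimatedRTT is updated first and the new value is then used in the DevRTT formula, following the order in which the formulas are listed.

```python
ALPHA = 0.125
BETA = 0.25


class RttEstimator:
    """Exponentially weighted RTT estimate plus a deviation-based timeout."""

    def __init__(self, estimated_rtt: float, dev_rtt: float = 0.0):
        # Assumption: the caller seeds the estimator, e.g. with the first sample
        self.estimated_rtt = estimated_rtt
        self.dev_rtt = dev_rtt

    def update(self, sample_rtt: float) -> float:
        """Fold in one SampleRTT and return the new timeout interval."""
        self.estimated_rtt = (1 - ALPHA) * self.estimated_rtt + ALPHA * sample_rtt
        self.dev_rtt = (1 - BETA) * self.dev_rtt + BETA * abs(sample_rtt - self.estimated_rtt)
        return self.timeout_interval()

    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt
```

Starting from an estimate of 100 ms with no deviation, a single 180 ms sample moves EstimatedRTT to 110 ms and DevRTT to 17.5 ms, giving a timeout of 180 ms.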
TCP Fast Retransmit
Duplicate ACKs are used to trigger early retransmission.
Trigger: Triple Duplicate ACK - When the same ACK is received 4 times (3 times extra).
This mechanism bypasses the timeout (which is often relatively long).
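The trigger condition can be sketched as a small counter on the sender side. The class and method names are hypothetical, and a real sender would also track which segment to resend.

```python
class FastRetransmitDetector:
    """Signals a retransmission when the third duplicate of an ACK arrives."""

    DUP_THRESHOLD = 3  # duplicates beyond the first copy of the ACK

    def __init__(self):
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack_no: int) -> bool:
        """Return True exactly once, when the third duplicate ACK is seen."""
        if ack_no == self.last_ack:
            self.dup_count += 1
            return self.dup_count == self.DUP_THRESHOLD
        # A new ACK number acknowledges fresh data, so the count restarts
        self.last_ack = ack_no
        self.dup_count = 0
        return False
```

Feeding it the ACK numbers 110, 110, 110, 110 returns `True` only on the fourth copy, which is the third duplicate.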
Flow Control
Each socket has a receive buffer whose size is set by the RcvBuffer value (4096 bytes is a common figure; the actual default depends on the operating system).
The Receive Window (rwnd) value in the TCP header of receiver-to-sender segments is the free space in the RcvBuffer.
Because the receiver advertises the free space in its buffer, the sender can ensure that the data it transmits next will not overflow the receiver’s buffer.
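The relationship between the buffer and the advertised window can be modelled with a toy receiver. This is only a sketch: the class and its methods are invented for illustration and track byte counts rather than real data.

```python
class ReceiveBuffer:
    """Toy receiver-side buffer that advertises its free space as rwnd."""

    def __init__(self, capacity: int = 4096):
        self.capacity = capacity  # plays the role of RcvBuffer
        self.buffered = 0         # bytes received but not yet read by the application

    def rwnd(self) -> int:
        """Free space, as it would be advertised in the Receive Window field."""
        return self.capacity - self.buffered

    def on_segment(self, length: int) -> int:
        """Store arriving bytes and return how many fit; the rest would be dropped."""
        accepted = min(length, self.rwnd())
        self.buffered += accepted
        return accepted

    def app_read(self, length: int) -> int:
        """The application consumes data, which frees space and reopens the window."""
        taken = min(length, self.buffered)
        self.buffered -= taken
        return taken
```

With a 4096-byte buffer, two 1460-byte segments leave an rwnd of 1176 bytes, so a sender that respects rwnd would not send a third full-sized segment until the application reads some data.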
Connection Management
- Handshake - Agreement on connection and connection parameters.
Three Way Handshake
- Client sends TCP SYN (contains initial sequence number x)
- Server sends TCP SYNACK (contains its own initial sequence number y), ACKNo=x+1
- Client sends TCP ACK, ACKNo=y+1
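The number exchange in the handshake can be traced with a small simulation. Segments are represented as plain dictionaries, which is purely an illustrative choice, and the random initial sequence numbers stand in for whatever a real stack would pick.

```python
import random

SEQ_SPACE = 2 ** 32  # sequence numbers are 32 bits and wrap around


def three_way_handshake(x=None, y=None):
    """Return the SYN, SYNACK and ACK segments exchanged to open a connection."""
    x = random.getrandbits(32) if x is None else x  # client initial sequence number
    y = random.getrandbits(32) if y is None else y  # server initial sequence number

    syn = {"flags": {"SYN"}, "seq": x}
    synack = {"flags": {"SYN", "ACK"}, "seq": y, "ack": (syn["seq"] + 1) % SEQ_SPACE}
    ack = {"flags": {"ACK"}, "seq": synack["ack"], "ack": (synack["seq"] + 1) % SEQ_SPACE}
    return [syn, synack, ack]
```

Calling `three_way_handshake(x=1000, y=5000)` produces a SYNACK with ACKNo 1001 and a final ACK with ACKNo 5001.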
Lost SYN packets
The client starts a timer (usually 3 seconds) when it sends the SYN and waits for a SYNACK.
If the timer expires, the SYN is retransmitted.
A fresh SYN can also be sent sooner by opening a new connection
Closing Connections
Once a host has sent a segment with the FIN bit set, it can no longer transmit data (*).
It can, however, still receive data.
*: ACK packets are not counted as data, and can still be sent
- A sends TCP segment with FIN=1
- B sends ACK and possibly its own FIN
- A sends ACK
Abrupt Termination
- A sends a reset packet RST to B to tell B to stop communicating
- B does not acknowledge the RST packet
- A will keep transmitting RST packets for each reply it keeps receiving from B
TCP SYN Attack (SYN Flooding)
The SYN flooding attack sends many SYN packets, often from spoofed source addresses, so that the server allocates state for connections that will never complete, wasting server resources.
Whilst the server will eventually close and garbage collect each socket, large numbers of spoofed SYN packets can fill the connection queue and overwhelm the server, increasing server load.
Mitigation Techniques
- Increase connection queue size
- Decrease timeout time for the 3 way handshake
- Firewalls
- TCP SYN Cookies
TCP SYN Cookies
The server computes its initial sequence number as a hash of the source and destination IP addresses and ports together with a secret key known only to the server.
When receiving a SYN, the server replies with a SYNACK containing this initial sequence number (the cookie), without allocating any state for the connection.
The client then has to reply with an ACK, and the server checks that the ACKNo is equal to the initial sequence number + 1 by recomputing the hash.
Only if this check passes will the server create the connection.
Attackers can still send many SYN packets, but these no longer consume server resources - to open a connection they must receive the server’s SYNACK and send a second packet (ACK) with the right details, which a sender using a spoofed address normally cannot do.
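The cookie check can be sketched with a keyed hash from the standard library. This is a deliberately simplified model: the function names are hypothetical, HMAC-SHA256 is just one possible choice of hash, and real implementations typically fold further information (such as a time counter) into the cookie.

```python
import hashlib
import hmac
import os

SEQ_SPACE = 2 ** 32
SERVER_SECRET = os.urandom(16)  # assumption: a random key generated at server start


def syn_cookie(src_ip, src_port, dst_ip, dst_port, secret=SERVER_SECRET) -> int:
    """Derive a 32-bit initial sequence number from the connection 4-tuple."""
    msg = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}".encode()
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def ack_is_valid(ack_no, src_ip, src_port, dst_ip, dst_port, secret=SERVER_SECRET) -> bool:
    """Recompute the cookie from the ACK's addresses and check ACKNo == ISN + 1."""
    expected = (syn_cookie(src_ip, src_port, dst_ip, dst_port, secret) + 1) % SEQ_SPACE
    return ack_no == expected
```

Because the cookie is recomputed from the incoming ACK, the server needs no per-connection memory between sending the SYNACK and receiving a valid ACK.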
Congestion Control
Congestion increases delivery latency, loss rate, and leads to retransmissions.
- Knee Point - Point after which throughput increases slowly and delay increases fast
- Cliff Point - Point after which throughput drops to zero (congestion collapse) and delay approaches infinity
End to End Congestion Control
- Congestion inferred from observed loss and delay
Network-assisted Congestion Control
- Feedback from networking infrastructure
- DNA, DECbit, TCP/IP ECN, ATM
- Authoritative transmission rate
TCP Send Rate is roughly (cwnd / RTT) (bytes/sec)
Congestion Window (cwnd) - How many bytes can be sent without overflowing routers
- Computed by the sender using algorithms
Receive Window (rwnd) - How many bytes can be sent without overflowing the receiver’s buffers
- Determined by the receiver (notified to the sender)
Sender-side window: min(cwnd, rwnd)
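How the two windows combine on the sender side can be sketched as follows. The helper names, and the idea of tracking `bytes_in_flight` (sent but not yet acknowledged), are assumptions made for this illustration.

```python
def sender_window(cwnd: int, rwnd: int) -> int:
    """The sender is limited by whichever window is smaller."""
    return min(cwnd, rwnd)


def usable_window(bytes_in_flight: int, cwnd: int, rwnd: int) -> int:
    """Bytes that may still be sent right now without exceeding either window."""
    return max(0, sender_window(cwnd, rwnd) - bytes_in_flight)


def approx_send_rate(cwnd: int, rtt_seconds: float) -> float:
    """Rough send rate in bytes/sec: one window of data per round trip."""
    return cwnd / rtt_seconds
```

With cwnd = 14600 bytes and rwnd = 4096 bytes the receiver is the bottleneck, so only 4096 bytes may be outstanding at once even though the network could carry more.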
Rate Adjustment
- When ACK (of new data) is received, increase rate
- When loss is detected, decrease rate
TCP incorporates two algorithms, TCP Slow Start then AIMD.
TCP Slow Start (Bandwidth Discovery)
Increase the transmission rate exponentially (cwnd doubles every RTT) until the first loss event or until cwnd reaches ssthresh.
- Initial rate slow, but ramps up exponentially
Additive Increase Multiplicative Decrease (AIMD)
- Additive Increase - Increases cwnd by one MSS every RTT (as ACKs are received) until loss is detected
- Multiplicative Decrease - Halve the cwnd
Slow Start Threshold (ssthresh)
The slow start threshold is the cwnd value at which the protocol changes from TCP Slow Start to AIMD.
It is often initially set to some high value, and on loss it is set to half of the cwnd at the time of the loss.
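The switch from exponential to additive growth at ssthresh can be sketched per RTT, with cwnd measured in units of MSS. The function names are illustrative, and capping the doubling step at ssthresh is an assumption made to keep the trace tidy.

```python
def grow_cwnd(cwnd: int, ssthresh: int) -> int:
    """Return cwnd (in MSS) after one round trip without loss."""
    if cwnd < ssthresh:
        # Slow start: double every RTT (assumption: do not overshoot ssthresh)
        return min(cwnd * 2, ssthresh)
    # Additive increase: one extra MSS per RTT
    return cwnd + 1


def trace(rounds: int, cwnd: int = 1, ssthresh: int = 8) -> list:
    """List cwnd at the start and after each of `rounds` loss-free RTTs."""
    history = [cwnd]
    for _ in range(rounds):
        cwnd = grow_cwnd(cwnd, ssthresh)
        history.append(cwnd)
    return history
```

Starting at 1 MSS with ssthresh at 8 MSS, five loss-free round trips give the sequence 1, 2, 4, 8, 9, 10: doubling up to the threshold, then growing by one.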
TCP Reno (New)
- cwnd = 1 on timeout
- cwnd = cwnd/2 on triple duplicate ACK (TCP fast retransmit)
TCP Tahoe (Old)
- cwnd = 1 on timeout and triple duplicate ACK
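The difference between the two variants comes down to how cwnd is reset on each kind of loss event, which can be sketched as below. The function name and the event labels are hypothetical, cwnd is in units of MSS, and setting ssthresh to half of cwnd with a floor of 2 MSS is an assumption of this sketch.

```python
def on_loss(cwnd: int, event: str, variant: str):
    """Return (new_cwnd, new_ssthresh) in MSS after a loss event.

    event is "timeout" or "triple_dup_ack"; variant is "reno" or "tahoe".
    """
    if event not in ("timeout", "triple_dup_ack"):
        raise ValueError(f"unknown event: {event}")
    if variant not in ("reno", "tahoe"):
        raise ValueError(f"unknown variant: {variant}")

    ssthresh = max(cwnd // 2, 2)  # assumption: half the current cwnd, at least 2 MSS
    if event == "timeout" or variant == "tahoe":
        return 1, ssthresh        # back to slow start from 1 MSS
    return ssthresh, ssthresh     # Reno on triple duplicate ACK: halve and continue
```

With cwnd at 16 MSS, a triple duplicate ACK leaves Reno at 8 MSS but drops Tahoe to 1 MSS; a timeout drops both to 1 MSS.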