HTTP
Contents
History of HTTP
HTTP 0 - 1990
HTTP/0.9 - 1991
HTTP/1.0 - 1992 - simple caching
HTTP/1.1 - 1996
HTTP/2.0 - 2015
HTTP Overview
- HyperText Transfer Protocol
- Client-Server Model
- Generally operates on port 80 and 443 (SSL)
- Uses TCP as the underlying transport layer
- Stateless
- Server maintains no information about past requests from a given client
- General flow
- TCP connection opened
- Server accepts conncetion
- HTTP messages exchanged
- TCP connection closed
- Entirely text-based
- Not very efficient though
- the string “12345678” -> 8 bytes
- the integer 12345678 -> 4 bytes
- headers do not have any guaranteeed order
- requires strings to be parsed
HTTP Requests
1 2 3 4 5 6 7 8 9 10 | GET /index.html HTTP/1.1\r\n Host: www-net.cs.umass.edu\r\n User-Agent: Firefox/3.6.10\r\n Accept: text/html,application/xhtml+xml\r\n Accept-Language: en-us,en;q=0.5\r\n Accept-Encoding: gzip,deflate\r\n Accept-Charset: ISO-8859-1,utf-8;q=0.7\r\n Keep-Alive: 115\r\n Connection: keep-alive\r\n \r\n |
- All lines in the request are followed by a
\r\n
- First line has the request
- GET / POST / HEAD / etc…
- Path
- Protocol version
Header
:Value
- The request is terminated with another
\r\n
HTTP Response
1 2 3 4 5 6 7 8 9 10 11 12 | HTTP/1.1 200 OK\r\n Date: Sun, 26 Sep 2010 20:09:20 GMT\r\n Server: Apache/2.0.52 (CentOS)\r\n Last-Modified: Tue, 30 Oct 2007 17:00:02 GMT\r\n ETag: "17dc6-a5c-bf716880"\r\n Accept-Ranges: bytes\r\n Content-Length: 2652\r\n Keep-Alive: timeout=10, max=100\r\n Connection: Keep-Alive\r\n Content-Type: text/html; charset=ISO-8859-1\r\n \r\n data data data data data ... |
- HTTP header sent with optional response
- Header ends with a
\r\n
before the data is sent
- Header ends with a
Cookies
Cookies are pieces of text stored on the client’s browser. These cookies are sent to the server whenever a request is made, and helps to track and identify who the client is (Remember, HTTP is stateless!)
They are stored on the client as Set-Cookie
headers in HTTP Responses
Measuring HTTP Performance
The Page Load Time (PLT) is the timing metric, from when the user clicks on a link, to when the client sees the page.
It depends on factors such as the page’s content/structure; the protocols involved; and the network environment (bandwidth and RTT)
Improving PLT
- Reduce transfer sizes
- Smaller images
- Compression
- Compression of text, images (ie lossy vs lossless)
- Change how HTTP works
- Persistent Connections
- Instead of a new TCP connection being created for each resource; reuse the same connection
- Reduces RTT from TCP connection setup
- Allows TCP to learn more accurate RTT estimates
- Allows TCP congestion window to increase
- Leverage previously discovered bandwidth
- Pipelining (HTTP/1.1)
- Request for multiple resources within the same request
- Reduces the number of RTT
- Change where the response come from
- Caches and Web Proxies
- Store copies of the server response
- Has file-newness implications
ETag
header - identifier of version of a resource- Can use the Conditional GET header
If-Modified-Since: <ETag>
- Can use the Conditional GET header
- CDNs
- Move the content closer to the user
- Pull - Local CDN pulls the resource when requested
- Push - CDNs distribute resources, expecting it to be accessed
HTTPS
HTTP over Transport Layer Security (TLS)
HTTP/2.0
Googly SPDY (pronounced Speedy) - RFC7540
- Better structure of contents
- Servers can push new content out
- Multiplexed requets and responses
- Overcomes head-of-line blocking seen in HTTP/1.1
- Resource request prioritisation
- Compression of HTTP headers