The web and the internet at large are fundamental to today’s world, but how do they work under the hood?
In this article, we will talk about HTTP requests, the fundamental building blocks that make the modern web possible.
The Aha moment
Reasoning about the topic becomes much easier as soon as we figure out what an HTTP request is, and the reality is extremely simple.
An HTTP request is nothing more than a stream of bytes, almost always ASCII bytes that can be read and interpreted. To put it even more simply, it is just a string formatted according to some rules.
The one below is already a complete HTTP request, and this is the most important concept: an HTTP request is nothing more than a string of text similar to this one.
GET / HTTP/2
Host: www.google.com
User-Agent: curl/7.58.0
Accept: */*
An HTTP response is not that different:
HTTP/2 200
date: Fri, 10 Apr 2020 11:25:36 GMT
expires: -1
cache-control: private, max-age=0
content-type: text/html; charset=ISO-8859-1
p3p: CP="This is not a P3P policy! See g.co/p3phelp for more info."
server: gws
x-xss-protection: 0
x-frame-options: SAMEORIGIN
set-cookie: 1P_JAR=2020-04-10-11; expires=Sun, 10-May-2020 11:25:36 GMT; path=/; domain=.google.com; Secure
set-cookie: NID=202=lXbpl7jzwsRDvcSyw84CtGB7NO3J2HziT0SjF24N4joVsoUzXNRdc03yTeckZu2zQXc8TJty73IYg9ktX3yrtSb59lC1-jxyTprH_wGly4D2RiFC4Ww1T2Om69YYjxDtkgEDmQbqoYYyzahBQowvSM-q5JpF6hoC-gzLRTnnn38; expires=Sat, 10-Oct-2020 11:25:36 GMT; path=/; domain=.google.com; HttpOnly
alt-svc: quic=":443"; ma=2592000; v="46,43",h3-Q050=":443"; ma=2592000,h3-Q049=":443"; ma=2592000,h3-Q048=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,h3-T050=":443"; ma=2592000
accept-ranges: none
vary: Accept-Encoding

<!doctype html><html> ...THE_HTML_OF_THE_GOOGLE_HOMEPAGE... </html>
All the HTTP libraries and frameworks you may have used are just a way to create, interpret, and send over the network strings that look like these.
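To make the "just a string" idea concrete, here is a minimal sketch of the request above written as the raw bytes a client could write to a socket. It assumes HTTP/1.1 as the version (the one where this text is literally what travels on the wire), and the variable names are only illustrative.

```python
# The request from above as raw bytes. Every line ends with CRLF (\r\n),
# and an empty line marks the end of the headers.
raw_request = (
    b"GET / HTTP/1.1\r\n"
    b"Host: www.google.com\r\n"
    b"User-Agent: curl/7.58.0\r\n"
    b"Accept: */*\r\n"
    b"\r\n"
)

# The bytes are plain ASCII, so they decode into a readable string.
text = raw_request.decode("ascii")
print(text)
```

Nothing here talks to the network yet; the point is only that the whole request fits in an ordinary bytes literal.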
The HTTP rules and protocol
The rules for creating a correct HTTP request are laid out in RFC 7230 (since superseded by RFC 9110 and RFC 9112). It can be interesting to explore the document and understand at least some of the details, but for most developers it will be overkill. However, if your work revolves around web infrastructure, it can be necessary.
For instance, Section 3 of RFC 7230 formally describes what an HTTP-message is:
HTTP-message = start-line *( header-field CRLF ) CRLF [ message-body ]
This definition means that an HTTP-message is nothing more than a start-line, followed by zero or more header-field entries, each followed by a CRLF (a new line, \r\n), then another CRLF and an optional message-body.
If we keep reading the RFC, we will discover what a start-line is in Section 3.1, and so on.
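The grammar can be followed almost literally in code. The sketch below is one possible way to split a raw message into its start-line, header fields, and body; the function name split_http_message is hypothetical, and it assumes the head of the message is plain ASCII and well formed.

```python
def split_http_message(raw: bytes):
    """Split a raw HTTP/1.x message into (start_line, headers, body)."""
    # The empty line (CRLF CRLF) separates the head from the optional body.
    head, _, body = raw.partition(b"\r\n\r\n")
    # The first line of the head is the start-line, the rest are header fields.
    start_line, *header_lines = head.decode("ascii").split("\r\n")
    headers = []
    for line in header_lines:
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return start_line, headers, body


message = b"GET / HTTP/1.1\r\nHost: www.google.com\r\nAccept: */*\r\n\r\n"
start_line, headers, body = split_http_message(message)
```

A real parser also has to deal with malformed input, size limits, and body framing, which this sketch ignores on purpose.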
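It turns out that the start-line of a request specifies the method of the request, one of GET, HEAD, POST, PUT, DELETE, CONNECT, OPTIONS, or TRACE, all documented in RFC 7231 Section 4.
Most developers just need to be aware of the possible methods; they won’t use all of them.
Then it specifies the target of the request, in our case the root directory /.
And finally, it specifies which version of the protocol to use in the communication. In this case we use HTTP/2; another common one is HTTP/1.1. Strictly speaking, HTTP/2 travels on the wire as binary frames, and the text shown above is how tools such as curl display it; in HTTP/1.1 the request really is sent as this text.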
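The three parts of a request start-line are separated by single spaces, so pulling them apart takes one line. This is only a sketch: RequestLine, parse_request_line, and RFC7231_METHODS are names made up for the example.

```python
from typing import NamedTuple

# Methods defined in RFC 7231 Section 4. Other RFCs define more (PATCH, for
# instance), so an unknown method is not necessarily an invalid one.
RFC7231_METHODS = {
    "GET", "HEAD", "POST", "PUT", "DELETE", "CONNECT", "OPTIONS", "TRACE",
}


class RequestLine(NamedTuple):
    method: str
    target: str
    version: str


def parse_request_line(line: str) -> RequestLine:
    """Split 'METHOD SP target SP version'; raises ValueError if malformed."""
    method, target, version = line.split(" ")
    return RequestLine(method, target, version)


request_line = parse_request_line("GET / HTTP/1.1")
```

Here request_line.method is "GET", request_line.target is "/", and request_line.version is "HTTP/1.1".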
Keeping state, authentication, and cookies
The HTTP protocol is mostly stateless. Nothing in the protocol forbids a client from requesting somebody else's private records on a social network; after all, an HTTP request is just a string formatted following some rules.
However, the server needs to refuse those requests, and to do so it must know who is asking.
To overcome the limitation imposed by being a stateless protocol, HTTP relies on the concept of cookies.
An HTTP response, generated by an HTTP server, can include the Set-Cookie header (the name is case insensitive; the response from google.com above returned set-cookie). The Set-Cookie header instructs the user-agent (the client, with some level of approximation) to store the value of the cookie and to send it back to the server with any subsequent request using the Cookie header.
Another RFC (6265) covers this topic extensively.
For most developers it is sufficient to know that the Set-Cookie header provides some options to tweak how the client manages the cookie.
For instance, the Path attribute indicates that the cookie should be sent only for requests against a specific URL path.
Secure indicates that the cookie should not be sent on insecure connections.
HttpOnly indicates that the cookie should not be accessible outside the HTTP protocol; for instance, it should not be readable through the JavaScript API in the browser.
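Cookies are string manipulation too. The sketch below splits the value of a Set-Cookie header into the cookie and its attributes, and builds the Cookie header a client would send back. Both function names are hypothetical, and the logic is deliberately naive: a real user-agent also checks domain, path, expiry, and the Secure flag before sending anything, as RFC 6265 describes.

```python
def parse_set_cookie(value: str):
    """Return (name, value, attributes) from the value of a Set-Cookie header."""
    first, *rest = [part.strip() for part in value.split(";")]
    name, _, cookie_value = first.partition("=")
    attributes = {}
    for part in rest:
        if not part:
            continue
        key, sep, attr_value = part.partition("=")
        # Flags such as Secure and HttpOnly carry no value; store them as True.
        attributes[key.strip().lower()] = attr_value.strip() if sep else True
    return name, cookie_value, attributes


def build_cookie_header(jar: dict) -> str:
    """Build the Cookie request header from stored name/value pairs."""
    return "Cookie: " + "; ".join(f"{name}={value}" for name, value in jar.items())


name, value, attributes = parse_set_cookie(
    "1P_JAR=2020-04-10-11; expires=Sun, 10-May-2020 11:25:36 GMT; "
    "path=/; domain=.google.com; Secure"
)
cookie_header = build_cookie_header({name: value})
```

With the first set-cookie line from the google.com response, this gives the cookie 1P_JAR, the attributes path, domain, expires, and secure, and the header Cookie: 1P_JAR=2020-04-10-11.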
While the protocol itself is rather simple (after all, it is just about concatenating strings and keeping track of cookies), the needs of modern software forced us to build complex systems to extract as much performance as possible while keeping an ergonomic interface.
Developers will most likely want to make several requests in parallel, perhaps sharing some cookies but not all.
All these real-world needs and use cases make the software explode in complexity, but also in usefulness.
HTTP remains a simple protocol and the CPython codebase offers a glimpse of this simplicity.
request = '%s %s %s' % (method, url, self._http_vsn_str)
self._output(self._encode_request(request))
This code creates the start-line of an HTTP message, just like the GET / HTTP/2 at the very beginning of this post.
Create a function to create HTTP requests.
As input, you can expect the method (GET, POST, etc…), the host, and the URL of the resource.
Can you expand the little function to support headers as well?
And how would you manage cookies?
In the next section, we will discover DNS, sockets, and TCP, so that it will be possible to actually send a request and read back the response.
According to the HTTP1.1 standard, headers are strictly optional; indeed, they are defined as *( header-field CRLF ), where the star (*) means zero or more repetitions.
However, the same standard dictates that the Host header must be present.
This clear inconsistency was introduced to keep backward compatibility, so that every HTTP1.1 request is also an HTTP1.0 request.
This is a clear example of how the world evolves from mistakes and errors: even something as widespread and widely used as "the internet" was designed with some initial mistakes. After all, we are human, and the success of HTTP was not certain.
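The gap between the grammar and the Host rule can be shown with a small check. This is only a sketch: missing_host is a hypothetical helper, and it takes the headers as a list of (name, value) pairs.

```python
def missing_host(version: str, headers: list) -> bool:
    """Return True when an HTTP/1.1 request lacks the mandatory Host header."""
    # Header names are case insensitive, so compare them in lowercase.
    names = {name.lower() for name, _ in headers}
    return version == "HTTP/1.1" and "host" not in names


# Valid according to the grammar (zero headers), yet rejected by the Host rule.
no_headers = missing_host("HTTP/1.1", [])
with_host = missing_host("HTTP/1.1", [("Host", "www.google.com")])
```

Here no_headers is True and with_host is False.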