How does the web work?

You are viewing this page on handbook.suncoast.io right now. How did your browser know where to go, how to fetch this information, and how to display it? What are the technologies and processes involved in making this work? In this lesson we will scratch the surface of the deep stack of technologies, processes, code, services, servers, and protocols needed to make this work.

Domains and Hosts

One of the first things the browser looks at is the URL you are attempting to access. The URL has a few components, the first being the protocol. The protocol is the form of communication: it defines the format your browser and the remote server will use to talk to each other. For web requests this is typically http or https, and it is the part before the : in the full address. These are not the only protocols the browser supports, but they are certainly the most common.
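To make the pieces of a URL concrete, here is a small Python sketch that uses the standard library's urllib.parse to split an address into its components. The /lessons path is made up for the example; only the host name comes from this lesson.

```python
from urllib.parse import urlsplit

# Hypothetical URL for illustration; the path is an assumption.
url = "https://handbook.suncoast.io/lessons"

parts = urlsplit(url)
print(parts.scheme)    # the protocol: the part before the ":"
print(parts.hostname)  # the host name the browser must resolve
print(parts.path)      # the resource being requested on that host
```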

Once the browser realizes you wish to visit handbook.suncoast.io, it must figure out how to address the server. The internet itself does not deal in names like handbook.suncoast.io but in numerical addresses of the form 1.2.3.4 or 192.168.145.241. Such an address is called an IP Address (Internet Protocol Address). Host names (e.g. handbook.suncoast.io) are a convenience: users do not have to remember the address where a service is located, and the owner of the service can move it without having to tell people the new address.
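If you are curious what "numerical address" means in practice, the sketch below uses Python's ipaddress module to show that a dotted address such as 192.168.145.241 (an IPv4 address) is really just one 32-bit number written in a friendlier form.

```python
import ipaddress

addr = ipaddress.ip_address("192.168.145.241")

print(addr.version)                     # 4, meaning an IPv4 address
print(int(addr))                        # the same address as a single number
print(ipaddress.ip_address(int(addr)))  # and converted back to dotted form
```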

So how does the browser find the IP Address of handbook.suncoast.io? It uses a service known as DNS (Domain Name System). Every client is configured with a list of IP Addresses of DNS servers, which are equipped to translate a domain name like handbook.suncoast.io into its address, such as 104.248.50.87. The DNS process allows your computer to quickly translate the name into an address.

How does this work?

The first address that your computer might have is that of your local router. In this case, your browser sends a request to that address asking for the address of handbook.suncoast.io. If your router has recently resolved that name (turned the name into an address), it will return that information to your browser. However, every answer to a request to translate a name into an address comes with a TTL (Time To Live), the number of seconds that the information can be safely stored and reused. If that time has expired, your router must ask for fresh information, the same as if it had never requested the information in the first place.

Your router will then look at its own list of DNS addresses. Typically these addresses belong to your ISP, but you may have also set it up to use DNS services from a company like Google, Cloudflare, OpenDNS, or another organization that runs "Public DNS Servers." Your router will send a request to its DNS service on your behalf and await a result. That DNS service does a lookup in much the same way. For instance, if it has recently answered the same request (say your neighbor also looked up handbook.suncoast.io), it will return that cached result. If that result has expired, or it has never looked up that address, it will try to determine the "authoritative answer" to the request.
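The caching behavior described above can be sketched in a few lines of Python. This is a toy model, not how any particular router or DNS server is implemented; the DnsCache class and its method names are invented for illustration.

```python
import time


class DnsCache:
    """Toy cache: remembers an answer only until its TTL runs out."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so the cache can be tested
        self._entries = {}    # name -> (address, expiry time)

    def store(self, name, address, ttl_seconds):
        self._entries[name] = (address, self._clock() + ttl_seconds)

    def lookup(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None       # never looked up: must ask upstream
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]
            return None       # expired: must ask upstream again
        return address


cache = DnsCache()
cache.store("handbook.suncoast.io", "162.243.166.170", ttl_seconds=20)
print(cache.lookup("handbook.suncoast.io"))  # served from the cache
```

A lookup that returns None is the signal to forward the question to the next DNS server in the chain and store whatever comes back.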

The first thing it needs to do is find out where this authoritative answer lives. It does so by looking at the Top Level Domain, or TLD. In this case the TLD is .io, but others include .com and .org. The DNS server then asks the TLD server, "What servers know about suncoast.io?" Notice it only asks about suncoast.io at this point, since it really wants to ask the owner of suncoast.io about the handbook part. The TLD server will respond with the DNS servers responsible for suncoast.io, which were set up when the domain was registered.

These servers are then asked for the details of handbook.suncoast.io. The result is cached and passed back down to the router, which caches it as well. It is then sent to the computer, which also caches it, and finally to the browser.
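One way to picture this walk is to read the host name from right to left. The Python sketch below lists the names a DNS server works through, starting from the DNS root (written as a single dot, a step this lesson glosses over) and ending at the full host name. The delegation_chain function is a made-up helper, and in real life not every label is necessarily handled by a separate set of servers.

```python
def delegation_chain(hostname):
    """Return the names a resolver walks through, from the root down."""
    labels = hostname.rstrip(".").split(".")
    chain = ["."]  # the DNS root
    # Add one more label each step, moving right to left.
    for i in range(len(labels) - 1, -1, -1):
        chain.append(".".join(labels[i:]) + ".")
    return chain


print(delegation_chain("handbook.suncoast.io"))
```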

Whew! And all we have done so far is ask "What address should I send my request to for handbook.suncoast.io?"

The Internet is a wondrous thing indeed! Oh, and we skipped over quite a bit of detail here about how these requests travel over the internet itself, the different kinds of "packets", and many other details of how DNS works. We also didn't discuss "DNS over HTTPS", a newer protocol that encrypts DNS requests, or DNSSEC, a related set of extensions that lets a client verify that a DNS answer is authentic. This is a good example of how each technology we encounter is, itself, a deep and fascinating subject.

Let's take a look at a few DNS examples

Your computer may come with a tool named dig to look up DNS entries. If not, you may have a tool named nslookup.

Let's look up handbook.suncoast.io

dig handbook.suncoast.io

In addition to other details, you will get something similar to the following. NOTE: your address may differ from 162.243.166.170

;; ANSWER SECTION:
handbook.suncoast.io. 20 IN A 162.243.166.170

The fields here mean that we looked up handbook.suncoast.io, that we asked for an A (Address) record in the IN (Internet) class, that the answer was 162.243.166.170, and that we may cache this information for 20 more seconds before having to request it again.
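To see how those fields line up, here is a small Python sketch that splits one answer line into named parts. The DnsRecord type and parse_answer_line function are invented for this example and only handle the simple one-line format shown above.

```python
from typing import NamedTuple


class DnsRecord(NamedTuple):
    name: str          # the name we asked about
    ttl: int           # seconds the answer may be cached
    record_class: str  # IN means "Internet"
    record_type: str   # A means "Address"
    value: str         # the answer itself


def parse_answer_line(line):
    """Split a line from dig's ANSWER SECTION into its five fields."""
    name, ttl, record_class, record_type, value = line.split(None, 4)
    return DnsRecord(name, int(ttl), record_class, record_type, value)


record = parse_answer_line("handbook.suncoast.io. 20 IN A 162.243.166.170")
print(record.record_type, record.value, record.ttl)
```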

This is a fairly simple response. However, responses may be complex.

Try using dig (or nslookup) to query amazon.com

dig amazon.com

In addition to other details, you will get something similar to the following. NOTE: your address may differ from those below

;; ANSWER SECTION:
amazon.com. 60 IN A 205.251.242.103
amazon.com. 60 IN A 176.32.98.166
amazon.com. 60 IN A 176.32.103.205

Here we received three different answers. This means that any of these three addresses is valid for amazon.com, which suggests that amazon.com is using load balancing. Our client should choose one of the three to use. Some clients choose randomly while others take the first entry. Try running the lookup again and you may see amazon.com return the answers in a different order. This allows amazon.com to spread requests across different servers.

A service like amazon.com may also be sending responses back based on geography. That is, it may send different responses to requests coming from the east coast of the US than to those from the west coast. Amazon is large enough to have geographically distributed servers and thus wants to direct you to the physically closest server to give you the fastest response time. Having many servers also gives them fault tolerance. Since many servers can handle your request, it does not matter if one of them fails. They simply remove it from the list of available servers. Your browser will use one of the other addresses if it cannot communicate with the first one it tries.
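The "try another address if the first one fails" idea can be sketched like this in Python. It is a simplified model and not how any particular browser is written: connect_to_any is a made-up function, and try_connect stands in for a real network connection attempt so the example runs without touching the network.

```python
import random


def connect_to_any(addresses, try_connect, shuffle=True):
    """Try each address until one works; return the one that succeeded."""
    candidates = list(addresses)
    if shuffle:
        random.shuffle(candidates)  # spreads clients across servers
    for address in candidates:
        if try_connect(address):
            return address
    return None  # every server was unreachable


answers = ["205.251.242.103", "176.32.98.166", "176.32.103.205"]

# Pretend the first server is down (a hypothetical outage for illustration).
down = {"205.251.242.103"}
chosen = connect_to_any(answers, lambda ip: ip not in down, shuffle=False)
print(chosen)  # the client falls through to the next address in the list
```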

Another type of response may be a CNAME response. Try a lookup of www.yahoo.com

www.yahoo.com. 57 IN CNAME new-fp-shed.wg1.b.yahoo.com.
new-fp-shed.wg1.b.yahoo.com. 57 IN A 72.30.35.9
new-fp-shed.wg1.b.yahoo.com. 57 IN A 98.138.219.232
new-fp-shed.wg1.b.yahoo.com. 57 IN A 98.138.219.231
new-fp-shed.wg1.b.yahoo.com. 57 IN A 72.30.35.10

You'll see that www.yahoo.com is a CNAME for new-fp-shed.wg1.b.yahoo.com. CNAME means canonical name; that is, new-fp-shed.wg1.b.yahoo.com is the real name of www.yahoo.com. That name is then resolved, and we see that there are 4 addresses for it.
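Following a CNAME is a two-step lookup: find the canonical name, then find the addresses for that name. Here is a minimal Python sketch of that logic working on the records from the answer above. The resolve function and the (name, type, value) tuple layout are assumptions made for this example.

```python
def resolve(name, records, max_hops=8):
    """Follow CNAME records until we reach A records; return the addresses."""
    for _ in range(max_hops):
        cnames = [value for (n, rtype, value) in records
                  if n == name and rtype == "CNAME"]
        if not cnames:
            # No alias left to follow, so collect the addresses for this name.
            return [value for (n, rtype, value) in records
                    if n == name and rtype == "A"]
        name = cnames[0]  # continue with the canonical name
    raise RuntimeError("too many CNAME hops")


records = [
    ("www.yahoo.com.", "CNAME", "new-fp-shed.wg1.b.yahoo.com."),
    ("new-fp-shed.wg1.b.yahoo.com.", "A", "72.30.35.9"),
    ("new-fp-shed.wg1.b.yahoo.com.", "A", "98.138.219.232"),
    ("new-fp-shed.wg1.b.yahoo.com.", "A", "98.138.219.231"),
    ("new-fp-shed.wg1.b.yahoo.com.", "A", "72.30.35.10"),
]

addresses = resolve("www.yahoo.com.", records)
print(addresses)
```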

DNS makes for a remarkably dynamic and resilient service.

Making Connections

What happens after our computer resolves the address of a server? The next thing we need to do is connect to a particular service on that machine. Remember that part of the URL is a protocol. The protocol also indicates which service we want to speak to.

There are many kinds of services we can run over the internet. The most familiar are http and https for serving web pages and APIs. But there are also services like pop3 and smtp, which deal with email, as well as services like postgres, our database.

Each of these services will be listening for a connection from a client on a particular port number. If you think of the IP Address as a street address of an office building, you can think of the port number as which room in the building the service is in.

Port numbers range from 1 to 65535, with many of the common services using ports below 1024. Listening on a port number below 1024 is typically restricted to privileged accounts on a server. Taking our example of http and https, these services are typically on ports 80 and 443 respectively.
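As a quick illustration of how a protocol maps to a port, the Python sketch below picks the usual port for http and https unless the URL names a different one, and checks whether a port falls in the privileged range (on Unix-like systems, ports below 1024). The helper names are invented for this example.

```python
# Usual ports for the two web protocols discussed in this lesson.
DEFAULT_PORTS = {"http": 80, "https": 443}


def port_for(scheme, explicit_port=None):
    """Use the port given in the URL if there is one, else the usual port."""
    if explicit_port is not None:
        return explicit_port
    return DEFAULT_PORTS[scheme]


def is_privileged(port):
    """On Unix-like systems, only privileged accounts may listen below 1024."""
    return port < 1024


print(port_for("https"))       # no port in the URL, so the usual one
print(port_for("http", 8080))  # a URL like http://localhost:8080
print(is_privileged(443))
```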

To connect to a service on a port the computer creates something called a socket. A socket is a virtual connection between your computer and a port on a remote computer. Think of it like a pipe that information can flow through in both directions. Once this socket is established we can send information and we can receive information in return.
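To get a feel for the "pipe that flows in both directions" idea without needing a network, here is a small Python sketch. It uses socket.socketpair(), which creates two sockets on your own machine that are already connected to each other. The names client and server are just labels for the two ends; a real browser would instead connect to an IP Address and port on a remote machine.

```python
import socket

# Two already-connected sockets, standing in for "your computer"
# and "the remote server".
client, server = socket.socketpair()

# Information flows one way...
client.sendall(b"hello from the client")
received_by_server = server.recv(1024)

# ...and back the other way over the same connection.
server.sendall(b"hello from the server")
received_by_client = client.recv(1024)

client.close()
server.close()

print(received_by_server)
print(received_by_client)
```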

How do we connect to one of these ports?

Let's try connecting like the browser does.

For this, we will be using a tool named netcat.

On macOS, install it with: brew install netcat. On Linux, it is likely: sudo apt install netcat. On Windows: scoop install netcat.

To connect: nc handbook.suncoast.io 80

Now that we are connected, how do we talk? We use the HTTP protocol, which is well documented.

HTTP

GET a page

GET / HTTP/1.1
Host: handbook.suncoast.io

(Leave a blank line after the Host: line. The blank line tells the server that the request headers are complete.)
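The same request can be assembled in code. This Python sketch builds the exact bytes a client would send; build_get_request is a made-up helper. Note that the HTTP specification ends each line with a carriage return and a line feed (written \r\n), and the empty line at the end is what marks the headers as complete.

```python
def build_get_request(host, path="/"):
    """Build a minimal HTTP/1.1 GET request as bytes."""
    lines = [
        f"GET {path} HTTP/1.1",  # method, path, and protocol version
        f"Host: {host}",         # which site we want on this server
        "",                      # the blank line that ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


request = build_get_request("handbook.suncoast.io")
print(request)
```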

The server will respond with some text. This text includes a status line, a set of headers, and the body of the page.

What are other HTTP headers?

  • Date: the timestamp on the server

  • Content-Type: how this content should be interpreted

  • Content-Length: how long this content is, in bytes

  • Last-Modified: when this content was last modified

Other tools

  • curl

  • http (httpie)

  • Insomnia
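Whichever tool you use, the text that comes back has the same shape: a status line, then headers such as Date, Content-Type, and Content-Length, then a blank line, then the content. Here is a rough Python sketch that pulls those pieces apart. The parse_response_head function is invented for this example, and the sample response is made up, so a real server's values will differ.

```python
def parse_response_head(raw):
    """Split a raw HTTP response into status line, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")  # blank line ends the headers
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names ignore case
    return status_line, headers, body


# Made-up response for illustration only.
sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Date: Mon, 01 Jan 2024 00:00:00 GMT\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
)

status, headers, body = parse_response_head(sample)
print(status)
print(headers["content-type"])
print(int(headers["content-length"]) == len(body))
```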

© 2017 - 2022; Built with ♥ in St. Petersburg, Florida.