An Introduction to HTTP

In this post we take a high-level look at HTTP and explore how it works.

Web Servers

Web content (resources) lives on web servers. Because web servers speak the HTTP protocol, they are also called HTTP servers. These servers store the internet's data and provide it when a client requests it.

Web Clients

A web client (for example, a web browser) sends an HTTP request to a server and receives an HTTP response from the server.

Together, HTTP clients and HTTP servers make up the basic components of the World Wide Web.

Resources

Web servers host web resources. A web resource is any source of web content. The simplest kind is a static file on the web server: a text file, an MS Word document, a PDF file, a JPEG image, an audio file, and so on. A web resource can also be an application hosted on the web server that generates dynamic content. For example, a search engine such as Google Search is also a web resource.

A web page is a collection of multiple resources.

Media Types

HTTP tags each resource it transports with a data format label called a MIME type. Web servers attach a MIME type to every HTTP resource they send. When the web browser gets the resource back from the server, it looks at the MIME type to decide how to handle it. Most browsers can handle most of the popular object types.

MIME (Multipurpose Internet Mail Extensions) was originally designed to solve problems encountered when moving messages between different electronic mail systems. It worked so well that HTTP adopted it too.

A MIME type is a textual label consisting of a primary type and a subtype, separated by a slash (type/subtype). Common types:

  • HTML formatted text documents : text/html

  • plain text documents : text/plain

  • JPEG format of an Image : image/jpeg

  • GIF format of an Image : image/gif

  • Microsoft PowerPoint presentation : application/vnd.ms-powerpoint
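
As a rough illustration of how a client might act on a MIME type, here is a minimal Python sketch. The function names split_media_type and choose_handler, and the handler descriptions they return, are made up for this example; real browsers use far more elaborate rules. The mimetypes module is part of Python's standard library and guesses a type from a file extension.

```python
import mimetypes


def split_media_type(value):
    """Split a Content-Type value into (type, subtype), ignoring parameters."""
    # "text/html; charset=utf-8" -> "text/html"
    essence = value.split(";", 1)[0].strip().lower()
    main_type, _, subtype = essence.partition("/")
    return main_type, subtype


def choose_handler(content_type):
    """Hypothetical dispatch: decide what to do with a resource by its MIME type."""
    main_type, subtype = split_media_type(content_type)
    if main_type == "text" and subtype == "html":
        return "render as a web page"
    if main_type == "text":
        return "show as plain text"
    if main_type == "image":
        return "decode and display as an image"
    return "offer to download"


# A server often picks the MIME type from the file extension.
guessed_type, _encoding = mimetypes.guess_type("index.html")
print(guessed_type, "->", choose_handler(guessed_type))
print("image/gif ->", choose_handler("image/gif"))
```

The type before the slash is usually enough to pick a broad category of handler, which is why the sketch checks it first and only looks at the subtype for HTML.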

URIs

Each resource on a web server has a unique name that clients use to identify it. This name is called a URI, or Uniform Resource Identifier. URIs come in two flavors: URLs and URNs.

URLs

This is the most common form of URI. URL stands for Uniform Resource Locator, and it describes the exact location of a resource on a particular server. A URL tells you how to fetch a resource by providing its exact address.

Most URLs follow this format:

  • First part : scheme, which describes the protocol used to access the resource

  • Second part : internet address of the server, usually a domain name that is converted to the server's IP address by the Domain Name System (DNS)

  • Third part : path that names the resource on the web server
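
To make the three parts concrete, the sketch below splits a URL with urlsplit from Python's standard library. The URL itself is only an example address chosen for illustration.

```python
from urllib.parse import urlsplit

# Example URL (an illustrative address, not taken from a real site layout).
parts = urlsplit("http://www.example.com/images/logo.gif")

print(parts.scheme)    # first part: http
print(parts.hostname)  # second part: www.example.com
print(parts.path)      # third part: /images/logo.gif
```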

I will discuss URLs in detail in upcoming posts.

URNs

URN stands for Uniform Resource Name. It serves as a unique name for a particular piece of content, independent of where the resource currently resides. Because the name does not depend on the server or the protocol, it stays valid even if the resource moves or is served over a different protocol. However, URNs are not widely adopted in practice.

Transactions

An HTTP transaction consists of two parts: a request command (sent from the client to the server) and a response result (sent from the server to the client). This communication happens with formatted blocks of data called HTTP messages.

Methods (Request Command)

Every HTTP request message has a method, which tells the server what action to perform. Common HTTP methods are:

  • GET : send a named resource from the server to the client

  • POST : send client data to the server for processing

  • PUT : store data from the client in a named server resource

  • DELETE : delete the named resource from the server
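
The following sketch shows how a client could label requests with each of these methods, using Request from Python's standard urllib.request module. Nothing is sent over the network here; the objects are only constructed. The URLs and payloads are hypothetical.

```python
from urllib.request import Request

# Hypothetical resource URLs, used only for illustration.
COLLECTION_URL = "http://www.example.com/notes"
ITEM_URL = "http://www.example.com/notes/1"

get_req = Request(ITEM_URL)  # no body, so the method defaults to GET
post_req = Request(COLLECTION_URL, data=b"title=hello", method="POST")
put_req = Request(ITEM_URL, data=b"title=updated", method="PUT")
delete_req = Request(ITEM_URL, method="DELETE")

for req in (get_req, post_req, put_req, delete_req):
    # get_method() reports the method that would appear in the start line.
    print(req.get_method(), req.full_url)
```

Passing any of these objects to urllib.request.urlopen would actually send the request, which is left out here on purpose.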

Status Code (Response Result)

Every HTTP response message comes back with a status code. It is a three-digit numeric code that tells the client whether the request succeeded or whether some other action is required. Common status codes:

  • 200 : OK (Document returned Successfully)

  • 404 : Not Found (Can't find the resource)

HTTP also sends an explanatory text called the "reason phrase" (OK, Not Found) with each status code. It is included only for descriptive purposes; all processing is done with the numeric code.
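
Python's standard library ships these codes and reason phrases in the http.HTTPStatus enum, which makes the pairing easy to see. The describe helper below is a made-up name for this sketch, and its success check (codes in the 200 range) is a simplification.

```python
from http import HTTPStatus


def describe(code):
    """Return the numeric code, its reason phrase, and a rough outcome label."""
    status = HTTPStatus(code)
    # Simplification: treat only 2xx codes as success.
    outcome = "success" if 200 <= code < 300 else "not a success"
    return f"{status.value} {status.phrase} ({outcome})"


print(describe(200))  # 200 OK (success)
print(describe(404))  # 404 Not Found (not a success)
```

Note that the decision is made on the number alone; the phrase is only attached for the human reader.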

Messages

In HTTP/1.x, messages are simple, line-oriented sequences of characters. They are written and transported as plain text rather than in binary form, so they are easy to read and write. Messages sent from the web client to the web server are called request messages. Messages sent from the web server to the web client are called response messages.

HTTP messages consist of three parts:

  • Start line : contains either the method (request message) or the status code (response message).

  • Header fields : each header field consists of a name and a value, separated by a colon (:) for easy parsing. For example: Content-Type: text/plain, Accept: text/*

  • Body : this part is optional and can contain any type of data.
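
To show the three parts side by side, here is a small sketch that splits a hand-written HTTP/1.1 response into its start line, header fields, and body. The message text and the split_message helper are invented for this example, and the parsing is deliberately simplified (it assumes a text body and well-formed headers).

```python
# A hand-written response message; lines end with CRLF ("\r\n"),
# and an empty line separates the headers from the body.
RAW_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/plain\r\n"
    "Content-Length: 5\r\n"
    "\r\n"
    "Hello"
)


def split_message(raw):
    """Split a raw HTTP/1.x message into (start_line, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


start_line, headers, body = split_message(RAW_RESPONSE)
print(start_line)               # HTTP/1.1 200 OK
print(headers["content-type"])  # text/plain
print(body)                     # Hello
```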

Steps to extract a resource

  1. The browser extracts the server's hostname from the URL.

  2. The browser converts the server's hostname into the server's IP address using the Domain Name System (DNS).

  3. The browser extracts the port number (if any) from the URL.

  4. The browser establishes a connection with the web server.

  5. The browser sends an HTTP request message to the server.

  6. The server sends an HTTP response back to the browser.

  7. The connection is closed and the browser displays the document.
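
The steps above can be sketched in Python with a plain TCP socket. This is a minimal sketch under several assumptions: it handles only plain http URLs, uses port 80 when the URL names no port, and asks the server to close the connection after one response. The function names host_and_port, build_request, and fetch are made up for this example.

```python
import socket
from urllib.parse import urlsplit


def host_and_port(url):
    """Steps 1 and 3: pull the hostname and port out of the URL."""
    parts = urlsplit(url)
    return parts.hostname, parts.port or 80  # assume port 80 for http


def build_request(url):
    """Build the bytes of a simple GET request message for the URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    host = parts.hostname
    if parts.port:
        host += f":{parts.port}"
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def fetch(url):
    """Steps 2 and 4 to 7: resolve, connect, send, receive, close."""
    host, port = host_and_port(url)
    # create_connection resolves the hostname (DNS) and opens the TCP connection.
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(build_request(url))  # step 5: send the request
        chunks = []
        while True:                       # step 6: read until the server closes
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)               # step 7: connection is closed here
```

Calling fetch("http://www.example.com/") on a machine with network access would return the raw response bytes, start line and headers included. A real browser does much more (HTTPS, redirects, caching, connection reuse), so treat this only as an outline of the basic flow.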

This is how the HTTP protocol works. In the next blog, I will explore proxies, which are intermediaries between a web client and a web server.