A Web-Based Introduction to Computer Networks for Non-Majors

HTTP: The Protocol Used by the Web


The World Wide Web is a particular, very popular application that uses the Internet. The web involves two types of programs: web browsers and web servers. Any machine on the Internet can run a web browser, a web server, or both. A web server holds files called web pages. A user requests that his or her web browser retrieve a particular web page by entering the web page's address (its URL, for Uniform Resource Locator) in the proper text box in the web browser window or by clicking on a hypertext link. The web browser, in turn, requests that web page from the web server that has that page. The web server replies by sending the web page. Web pages are written in a text language called HTML (HyperText Markup Language). The web browser interprets the HTML to display the graphical image of the page on the screen of the user's machine.

For example, at Western Carolina University just about every machine has a web browser program running on it. The two most popular web browsers are Netscape Navigator and Microsoft Internet Explorer. The main university web server is running on the machine cowee.wcu.edu, which has the alias www.wcu.edu. However, there are a number of other web servers running on campus. For example, the Department of Mathematics and Computer Science has its own web server running on the machine sol.cs.wcu.edu, which has the alias www.cs.wcu.edu.

Given the organization of the web, the only topics that need further discussion are the nature of HTML and the nature of the protocol that web browsers and web servers use to interact. HTML is a simple language that consists of a series of text commands called tags that are inserted in a document to control how the document is to be formatted. There are tutorials on the web for HTML such as

http://archive.ncsa.uiuc.edu/General/Internet/WWW/HTMLPrimer.html.

For example, the following screenshot shows a very simple web page. The web page appears to just consist of the text

The actual web page, like all web pages, is in HTML format and is shown in the next screenshot. The words, such as <HEAD>, that begin with a < character and end with a > character are the tags. This is the web page as it is sent from the web server to the web browser. The web browser interprets the HTML to make it appear as shown in the screenshot.
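To make the idea of tags concrete, here is a minimal sketch in Python that lists the tags in a small page. The page text in PAGE is a made-up stand-in, since the actual content of the page in the screenshots is not reproduced here, and the class name TagLister is only an illustration. The sketch uses the HTMLParser class from Python's standard library.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Collects the opening tags and the visible text found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.tags = []   # names of the opening tags, in order of appearance
        self.text = []   # non-blank pieces of text found between tags

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


# Hypothetical page content; the real simple.html may differ.
PAGE = """<HTML>
<HEAD>
<TITLE>A Simple Page</TITLE>
</HEAD>
<BODY>
<P>Hello from a very simple web page.</P>
</BODY>
</HTML>"""

lister = TagLister()
lister.feed(PAGE)
lister.close()
print("Tags:", lister.tags)
print("Text:", lister.text)
```

Note that the parser reports tag names in lowercase, so <HEAD> is listed as head. A web browser does much more than this, of course, but the first step is the same: separate the tags from the text they control.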

If you are viewing a web page using either the Netscape Navigator or the Internet Explorer web browser you can see the HTML format of that web page by choosing the item Source from the View menu on the menu bar. You do not need to understand HTML to create web pages. A number of programs exist that help you to create a web page or to turn a document in another format into HTML.

HTTP

The connection between web servers and web browsers is based on a simple application layer protocol called HTTP (HyperText Transfer Protocol). HTTP connects an HTTP client (that is, the web browser) and an HTTP server (that is, the web server). Recall from the protocol stack applet that the application layer is at the top of the TCP/IP protocol stack. An HTTP connection involves four steps:

  1. Open the connection. The client contacts the server at the Internet location specified in the URL (Uniform Resource Locator). The connection is a transport layer connection; in particular, it is a TCP (Transmission Control Protocol) connection.
  2. The request. The client sends the server a message requesting service. The request includes an HTTP request header that defines the method requested for the transaction and gives information about the client, followed by the data (if any) being sent to the server. The most common HTTP methods are GET for getting an object from a server and POST for posting (that is, sending) data to an object on the server.
  3. The response. The server sends a response to the client. The response consists of response headers describing the state of the transaction (for example, that the status of the transaction was successful) and then the actual data (if any).
  4. Close the connection.
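The four steps above can be sketched in a few lines of Python using the standard socket module. This is only an illustration under some assumptions: the function names build_request and fetch are made up for this example, the request uses HTTP/1.0 so that the server closes the connection once the response is sent, and the port defaults to 80 when the URL does not name one.

```python
import socket
from urllib.parse import urlsplit


def build_request(host, path):
    """Return the bytes of a minimal HTTP/1.0 GET request for path."""
    lines = [
        f"GET {path} HTTP/1.0",  # method, object requested, protocol version
        f"Host: {host}",         # tells the server which site is wanted
        "",                      # blank line marks the end of the request header
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(url):
    """Carry out the four steps for url and return the raw response bytes."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80      # assume port 80 when the URL names none
    path = parts.path or "/"

    # Step 1: open a TCP connection to the server named in the URL.
    with socket.create_connection((host, port), timeout=10) as conn:
        # Step 2: send the request.
        conn.sendall(build_request(host, path))
        # Step 3: read the response until the server has no more to send.
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    # Step 4: leaving the "with" block closes the connection.
    return b"".join(chunks)
```

Calling fetch with a URL such as http://www.cs.wcu.edu/simple.html would return the response headers and the page together as one sequence of bytes, provided that server and file are still available.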

A common example of the use of HTTP is clicking on a hypertext link in a web page shown in a web browser. That link specifies the URL of an HTML document on another machine. That click causes

HTTP Example Exchange

HTTP is at the application layer in the protocol stack. An advantage of being that high in the protocol stack is that the request and reply messages at that level are simply sequences of characters. As a result, looking at a message is just like reading a text document.

It is possible for you to perform a live demonstration and observe the messages of HTTP yourself. A web server is simply a program running on a computer and listening on something called a port. Normally a web server listens on its computer's port 80, but it can be set to listen on any port you wish. When a web browser sends a request message, the request message goes to the computer running the web server and to the port on that computer on which the web server is listening. It is simple to write a program that pretends to be a web server by listening on a port and displaying the request messages that it receives.
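As a rough idea of what such a pretend web server looks like, here is a minimal sketch in Python. It is my own illustration, not the source code of any published program, and the function names read_request and show_one_request are made up. It accepts one connection, reads the request up to the blank line that ends the header, prints it, and stops without sending a reply.

```python
import socket


def read_request(conn):
    """Read from conn until the blank line that ends an HTTP request header."""
    data = b""
    while b"\r\n\r\n" not in data:
        chunk = conn.recv(1024)
        if not chunk:            # the client closed the connection early
            break
        data += chunk
    return data.decode("iso-8859-1")


def show_one_request(port):
    """Listen on port, display the first request message received, and return it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        listener.bind(("", port))   # "" means every network interface on this machine
        listener.listen(1)
        conn, _addr = listener.accept()
        with conn:
            text = read_request(conn)
            print(text)
            return text
```

Calling show_one_request(2000) and then pointing a web browser at port 2000 of the same machine should display the browser's request. Because the sketch never sends a reply, the browser will report an error or an empty page.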

For example, Elliotte Rusty Harold wrote such a program and called it clientTester. The source code and an explanation of the code are in his book Java Network Programming, Second Edition, O'Reilly, 2000. I followed the steps below in using clientTester to do a live demonstration showing the HTTP request message from a web browser.

  1. I copied the source code for clientTester to my account on our departmental Linux server and compiled it into an executable program. ClientTester is written in the Java programming language so this executable will run on many different platforms.
  2. I then used the command java clientTester 2000 to start the clientTester program running and listening on TCP port 2000. For this command to work the Java Development Kit had to have been installed on our departmental server.
  3. I then started my web browser and, in the text box labeled Location near the top of the browser window, entered the URL http://www.cs.wcu.edu:2000/simple.html. This causes the web browser to send the HTTP request message for the file simple.html to port 2000 on the machine www.cs.wcu.edu.
  4. The program clientTester displays in the command interpreter window the message it received from the web browser. The message is the request message example for the file simple.html. Below is a screenshot showing the clientTester display.

We will not go into the details of this message except to mention that the word GET, which starts the message, tells the web server that this is a request to get an object from the server. The word /simple.html says that the request is for the file named simple.html.
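The first line of the request can be taken apart with a few lines of Python. The sketch below assumes the line has the usual three parts separated by spaces: the method, the object requested, and the protocol version. The sample line, including the version HTTP/1.1, is an assumption, and the function name parse_request_line is made up for this example.

```python
def parse_request_line(line):
    """Split an HTTP request line into its method, target, and version."""
    method, target, version = line.split()
    return {"method": method, "target": target, "version": version}


# Assumed first line of the browser's request for simple.html.
parts = parse_request_line("GET /simple.html HTTP/1.1")
print(parts["method"])   # the word that starts the message
print(parts["target"])   # the file being requested
```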

Just as it is possible to do a live demonstration showing the HTTP request message, it is possible to do a live demonstration showing the HTTP reply message. Unlike clientTester the program you need to show the reply message is probably already on your machine. The program is telnet and comes with the Windows XP, UNIX, and Linux operating systems. I followed the steps below in using telnet to do a live demonstration showing the HTTP reply message from a web server.

  1. I placed the web page simple.html (the content of this file is shown above) on the web site managed by our departmental web server. This web server is listening on port 80 of the machine www.cs.wcu.edu.
  2. At a command prompt on a machine that has the telnet program, I entered the command telnet www.cs.wcu.edu 80 to establish a telnet connection to port 80 on our departmental server.
  3. I then pretended to be the web browser which sent the HTTP request message for the file simple.html. Having used clientTester I knew the lines that made up that request message, so I entered those lines into the telnet connection and followed them with a blank line. Actually I simplified the request message to the simplest version which is just the line starting with the word GET.
  4. The web server accepted the request message and sent the reply message back to telnet. The screenshot below shows the complete exchange: my one-line request message and the complete reply message. The lines from the line HTTP/1.1 200 OK to the line Content-Type: text/html; charset=iso-8859-1 are the header of the reply, while the rest of the lines are the body of the message. The body of the message is the web page, named simple.html, that was requested.
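The division of a reply into header and body can also be shown in Python. The reply text in SAMPLE_REPLY below is a shortened, made-up stand-in that keeps only the first and last header lines mentioned above. A real reply has more header lines, and the body shown here is hypothetical. The function name split_reply is also just for illustration.

```python
# Shortened, hypothetical reply; a real server sends more header lines.
SAMPLE_REPLY = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html; charset=iso-8859-1\r\n"
    "\r\n"
    "<HTML><BODY>Hello</BODY></HTML>\r\n"
)


def split_reply(reply):
    """Return the status line, a dictionary of header fields, and the body."""
    # The first blank line separates the header from the body.
    header, _, body = reply.partition("\r\n\r\n")
    status_line, *field_lines = header.split("\r\n")
    fields = {}
    for line in field_lines:
        name, _, value = line.partition(":")
        fields[name.strip().lower()] = value.strip()
    return status_line, fields, body


status_line, fields, body = split_reply(SAMPLE_REPLY)
print(status_line)
print(fields["content-type"])
print(body)
```

The blank line is what separates the two parts: everything before it describes the transaction, and everything after it is the web page itself.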