COMP249 Week 4

Steve Cassidy and Yan Wang

Web Servers

  • Web servers are programs that wait for requests and provide documents to requesting browsers.
  • Typically, they are the programs that answer HTTP requests. They:
    • Listen on a server port (80 by default)

    • Accept GET/HEAD/POST requests

    • Map resource name (URL) to a local resource

    • Retrieve local resource and send it back to client

Refer to Wikipedia

Web Servers and Clients

  • Web browsers initiate network communications with servers by sending them URLs.

  • A URL can specify one of two different things
    • the address of a data file stored on the server that is to be sent to the client

    • a program stored on the server that the client wants executed, with the output of the program returned to the client

Web Servers and Clients

  • When a Web server starts, it informs the operating system that it is now ready to accept incoming network connections through a specific port on the machine.

  • While in this running state, the server runs as a background process in the operating system environment.

  • A Web client, or a browser,

    • opens a network connection to a web server

    • sends information requests, and possibly data, to the server

    • receives information from the server, and

    • closes the connection

  • Of course, other machines exist between the Web servers and clients

    • e.g. network routers and domain-name servers

Embedded web servers

Source, see also Wikipedia

Mapping Resource Names

  • The primary task of a web server is to monitor a communication port on its host machine, accept HTTP commands, and perform operations specified by the commands

  • All HTTP commands include a URL, which includes

    • the specification of the host machine

    • a filename, or

    • a program name (e.g. .py, .asp)

    • (it may also contain some data)

Mapping Resource Names

  • A Web server typically has two root directories

    • document root - its file hierarchy stores the Web documents that the server sends to clients

      • Many servers allow secondary areas that are outside the document root directory, or even on another machine

      • e.g. a machine connected in a LAN - the server is configured to direct request URLs with a particular file path to that storage area

      • Secondary areas are called virtual directories or virtual document trees
    • server root - along with its descendant directories, stores the server and its support software

Mapping Resource Names

http://online.mq.edu.au/pub/COMP249/lectureschedule.html
  • Resource name: /pub/COMP249/lectureschedule.html

  • Mapped to a local file system:

    /home/httpd/html/pub/COMP249/lectureschedule.html
    C:\Web\httpd\html\pub\COMP249\lectureschedule.html
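
The mapping above can be sketched in a few lines of Python. This is only an illustration, not how any particular server implements it: DOCUMENT_ROOT and map_resource are hypothetical names, and the document root is assumed to be the Unix path from the example.

```python
import posixpath
from urllib.parse import unquote, urlsplit

# Assumed document root, taken from the Unix example above.
DOCUMENT_ROOT = "/home/httpd/html"

def map_resource(url, root=DOCUMENT_ROOT):
    """Map a request URL to a file path under the document root (hypothetical helper)."""
    # Keep only the path component of the URL and undo any %xx escapes.
    path = unquote(urlsplit(url).path)
    # Collapse "." and ".." segments so the result cannot climb above the root.
    normalised = posixpath.normpath("/" + path.lstrip("/"))
    # Append the cleaned-up resource name to the document root.
    return root.rstrip("/") + normalised

print(map_resource("http://online.mq.edu.au/pub/COMP249/lectureschedule.html"))
```

Normalising the path before joining it to the root is a design choice that keeps a request such as /../../etc/passwd inside the document tree.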
    

Mapping Resource Names

http://online.mq.edu.au/pub/COMP249/
  • Resource name: /pub/COMP249/

  • Server must look for a default name in the given directory: index.html, index.htm, etc.

  • Settings are dependent on server configuration
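
A rough sketch of the default-name lookup, assuming the server tries a configured list of names in order. DEFAULT_NAMES and find_default_document are hypothetical names; a real server reads the list from its configuration.

```python
import os
import tempfile

# Assumed list of default names, tried in this order.
DEFAULT_NAMES = ["index.html", "index.htm"]

def find_default_document(directory, names=DEFAULT_NAMES):
    """Return the path of the first default document found in directory, or None."""
    for name in names:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None

# Small demonstration using a throwaway directory.
with tempfile.TemporaryDirectory() as demo_dir:
    print(find_default_document(demo_dir))                    # no default document yet
    open(os.path.join(demo_dir, "index.htm"), "w").close()    # create an empty index.htm
    print(find_default_document(demo_dir))                    # now index.htm is found
```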

Mapping Resource Names

http://www.ics.mq.edu.au/~cassidy/
  • Resource name: /~cassidy/

  • Refers to the personal directory of a user

  • Look in user's home directory for a given subdirectory: html (in OCS), public_html (also common).

  • Permissions:

    • Server runs as an untrusted user

    • Needs to be able to read and perhaps execute files in your html directory.
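
The per-user mapping might look like the sketch below. HOME_BASE, USER_SUBDIR and map_user_resource are hypothetical, and the sketch assumes home directories live under /home and use public_html; actual locations are site-specific.

```python
import posixpath

HOME_BASE = "/home"           # assumed location of home directories
USER_SUBDIR = "public_html"   # assumed per-user web directory name

def map_user_resource(resource):
    """Map '/~user/rest' to a path inside that user's web directory.

    Returns None for resource names that are not of the '/~user/...' form.
    A real server would also reject '..' segments; this sketch does not.
    """
    if not resource.startswith("/~"):
        return None
    # Split "cassidy/teaching/x.html" into the user name and the remainder.
    user, _, rest = resource[2:].partition("/")
    if not user:
        return None
    return posixpath.join(HOME_BASE, user, USER_SUBDIR, rest.lstrip("/"))

print(map_user_resource("/~cassidy/"))
```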

Generating Resources

http://www.smh.com.au/articles/2005/03/13/1110649055094.html

http://slashdot.org/article.pl?sid=05/03/13/1853233&
   tid=133&tid=186&tid=159
  • Server is free to find a resource any way it chooses

  • This includes finding it in a database or running a program to generate it.

  • In the SMH case the stories are likely to be stored in a database and served as needed; other content is added on the fly.

  • The Slashdot URL refers to a Perl script (article.pl) which will be run to generate the content. The text after the ? is a set of GET-encoded form variables.
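
To see how the Slashdot URL splits into a script name and form variables, here is a small sketch using Python's standard urllib.parse module. The URL is the one shown above with the display line break removed.

```python
from urllib.parse import parse_qs, urlsplit

url = "http://slashdot.org/article.pl?sid=05/03/13/1853233&tid=133&tid=186&tid=159"

parts = urlsplit(url)
print(parts.path)     # the script the server runs

# parse_qs collects repeated names (tid appears three times) into lists.
variables = parse_qs(parts.query)
print(variables)
```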

Complicated URLs

http://ad.doubleclick.net/click;h=v2|30d0|0|0|%2a|l
;7516609;0-0;0;8856706;3454-728|90;4719404|4737300|1;
;%3fhttp://www.sun.com/emrkt/sunfirev20z/

http://ad.au.doubleclick.net/click%3Bh=v5|33ae|3|0|%2a
|h%3B27111491%3B0-0%3B0%3B12619400%3B1-468|60%3B14797496
|14815392|1%3B%3B%7Esscs%3D%3fhttp://www.energy.com.au/onit

Note that these are folded onto multiple lines for display purposes. Note the use of escape codes like %3B (;) to include characters in the URL that aren't otherwise allowed. Other special characters are escaped the same way, as % plus the hexadecimal ASCII code of the character: e.g. %20 or + (space), %21 (!).
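
The escape codes can be reproduced with Python's urllib.parse functions; the sketch below also shows the "% plus hexadecimal ASCII code" rule computed by hand. The sample strings are taken from the URLs above or made up for illustration.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

# Escaping: each disallowed character becomes % plus its hexadecimal code.
print(quote(";"))               # semicolon
print(quote("!"))               # exclamation mark
print(quote("a b"))             # space written as %20
print(quote_plus("a b"))        # space written as + (form encoding)

# The same rule by hand for a single character.
print("%{:02X}".format(ord(";")))

# Unescaping a fragment of the second DoubleClick URL.
print(unquote("%7Esscs%3D%3f"))
print(unquote_plus("a+b"))
```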

MIME Types

  • Problem: how does a client know what kind of data it's getting?
    1. Look at the file extension on the URL
    2. Look at the contents of the returned data
    3. Rely on the server to tell it.
  • Answer: Rely on the server:
    • Content-Type HTTP header
    • e.g. Content-Type: text/html
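
On the server side, the Content-Type value is often chosen from the file extension of the local resource. A minimal sketch using Python's standard mimetypes module follows; content_type_for and the file names other than lectureschedule.html are hypothetical.

```python
import mimetypes

def content_type_for(filename):
    """Guess a Content-Type value from a file name (hypothetical helper)."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    # Fall back to a generic binary type when the type cannot be guessed.
    return mime_type or "application/octet-stream"

for name in ["lectureschedule.html", "style.css", "notes.pdf", "README"]:
    print(name, "->", content_type_for(name))
```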

MIME Types

Some MIME types:

text/html, image/jpeg, audio/mpeg, application/xml, application/xhtml+xml, text/plain, application/cybercash, video/mp4, text/x-vcard, text/css, multipart/digest, chemical/x-genbank, video/quicktime, application/pdf

The HTTP Protocol

  • Requires: a connection between client and server
  • Stateless: no login process, each request is independent
  • Simple format: request header, blank line, possible payload
  • Symmetrical: allows data to be sent and received
  • Very easy to implement, yet scales very well
  • In contrast, e.g. FTP (refer to 1 and 2) is more complex.
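
The "header, blank line, possible payload" format is simple enough to pull apart by hand. The following is a simplified sketch, not a complete HTTP parser; parse_request is a hypothetical helper and the sample request is a shortened, made-up variant of the kind of request shown in the following slides.

```python
def parse_request(raw):
    """Split a raw HTTP request (bytes) into its parts (simplified sketch)."""
    # The headers end at the first blank line; anything after it is the payload.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # First line: method, requested resource, protocol version.
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers, body

raw = (b"POST /~steve/form.html HTTP/1.1\r\n"
       b"Host: localhost\r\n"
       b"Content-Type: application/x-www-form-urlencoded\r\n"
       b"Content-Length: 18\r\n"
       b"\r\n"
       b"name=Steve+Cassidy")

method, target, version, headers, body = parse_request(raw)
print(method, target, version)
print(headers)
print(body)
```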

Example HTTP GET Request

GET /~cassidy/ HTTP/1.1
Host: www.ics.mq.edu.au
User-Agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.7.12)
      Gecko/20050922 Firefox/1.0.7 (Ubuntu package 1.0.7)
Accept: text/xml,application/xml,application/xhtml+xml,
      text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie: UserTrack=63B08C38-1234-0000-0000-00000000000000; 
	    

Try at http://web-sniffer.net/ or install live header - a Firefox add-on.

Note lines folded for display.

What does each of these headers mean? Which are required? Many are defined in the HTTP standard but others can be defined via the HTTP extension framework.

Example HTTP Response

HTTP/1.x 200 OK
Date: Mon, 20 Mar 2006 05:33:32 GMT
Server: Apache/2.0
Accept-Ranges: bytes
Content-Length: 4111
Keep-Alive: timeout=15, max=499
Connection: Keep-Alive
Content-Type: text/html
Content-Language: en
	    
For status codes, refer to w3c.org.

Example HTTP POST Request

POST /~steve/form.html HTTP/1.1
Host: localhost
User-Agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.7.12)
      Gecko/20050922 Firefox/1.0.7 (Ubuntu package 1.0.7)
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,
      text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://localhost/~steve/form.html
Content-Type: application/x-www-form-urlencoded
Content-Length: 106

name=Steve+Cassidy&interests=This+is+a+field+with%0D%0Aquite+a+bit+
    of+text%0D%0Athat+has+linebreaks.%0D%0A
	    

Note lines folded for display.

This is a POST request; note how the data is encoded in the request body.
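
The request body above can be reproduced with urllib.parse.urlencode. This sketch assumes the form had two fields, name and interests, with the values that the encoded body implies; it also shows where the Content-Length value comes from.

```python
from urllib.parse import parse_qs, urlencode

# Field values reconstructed from the encoded body shown above.
form = {
    "name": "Steve Cassidy",
    "interests": "This is a field with\r\nquite a bit of text\r\nthat has linebreaks.\r\n",
}

body = urlencode(form)      # spaces become +, CR and LF become %0D and %0A
print(body)
print(len(body))            # the value a browser would send as Content-Length

# Decoding on the server side recovers the original field values.
decoded = parse_qs(body)
print(decoded["name"][0])
```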

Example HTTP GET Request

GET /~steve/form.html?name=Steve+Cassidy&interests=This+is+a+field+
     with%0D%0Aquite+a+bit+of+text%0D%0Athat+has+linebreaks.%0D%0A HTTP/1.1
Host: localhost
User-Agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.7.12)
     Gecko/20050922 Firefox/1.0.7 (Ubuntu package 1.0.7)
Accept: text/xml,application/xml,application/xhtml+xml,
     text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://localhost/~steve/form.html
If-Modified-Since: Mon, 20 Mar 2006 06:22:29 GMT
If-None-Match: "4f42a9-fd-40f672edb1340"
	    

Note lines folded for display.

This is the same form submitted via a GET request; here the data is encoded in the request URL. Note also the If-Modified-Since header in this request, sent because my browser had just requested the same resource and still holds a cached copy.
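
A server can use If-Modified-Since to avoid resending an unchanged resource, answering 304 Not Modified instead of 200 OK. The sketch below shows only the date comparison; conditional_status is a hypothetical helper, and a real server would also take If-None-Match into account.

```python
from email.utils import parsedate_to_datetime

def conditional_status(if_modified_since, last_modified):
    """Return 304 if the resource is unchanged since the client's copy, else 200."""
    client_time = parsedate_to_datetime(if_modified_since)
    resource_time = parsedate_to_datetime(last_modified)
    return 304 if resource_time <= client_time else 200

# Resource unchanged since the date the browser sent.
print(conditional_status("Mon, 20 Mar 2006 06:22:29 GMT",
                         "Mon, 20 Mar 2006 06:22:29 GMT"))
# Resource modified the next day (made-up date).
print(conditional_status("Mon, 20 Mar 2006 06:22:29 GMT",
                         "Tue, 21 Mar 2006 00:00:00 GMT"))
```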

HTTP Redirect

GET /~steve/ HTTP/1.1
Host: www.shlrc.mq.edu.au

HTTP/1.x 301 Moved Permanently
Date: Mon, 20 Mar 2006 06:32:36 GMT
Server: Apache/2.0.46 (Red Hat)
Location: http://www.ics.mq.edu.au/~cassidy/
Content-Length: 242
Connection: close
Content-Type: text/html; charset=iso-8859-1
	    

Alternatively

<meta http-equiv="refresh" 
      content="0; URL=http://my.new.site.com/">
	    

The HTTP redirect is a server response indicating that a resource has moved to a new location; the Location header gives the new URL. An alternative is to include the above meta tag in the head section of a page to make the browser load a different page from the current one.
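
From the client's point of view, following a redirect means spotting a 3xx status and reading the Location header. A simplified sketch, using an abridged copy of the response shown above; redirect_target is a hypothetical helper.

```python
def redirect_target(response_text):
    """Return the Location of a 3xx response, or None otherwise (simplified)."""
    lines = response_text.splitlines()
    # Status line looks like "HTTP/1.x 301 Moved Permanently".
    status = int(lines[0].split()[1])
    if not 300 <= status < 400:
        return None
    for line in lines[1:]:
        if not line.strip():
            break                      # blank line: end of the headers
        name, _, value = line.partition(":")
        if name.strip().lower() == "location":
            return value.strip()
    return None

response = ("HTTP/1.x 301 Moved Permanently\n"
            "Server: Apache/2.0.46 (Red Hat)\n"
            "Location: http://www.ics.mq.edu.au/~cassidy/\n"
            "Connection: close\n"
            "\n")
print(redirect_target(response))
```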

HTTP Verbs

  • GET - get a resource (idempotent)
  • POST - send some data to a resource
  • HEAD - get headers for a resource
  • PUT - create a new resource
  • DELETE - delete a resource

Web Servers

Apache

Server Logs

  • Web servers receive information in request headers

  • This can be logged for later analysis

  • See the Platypus logs

  • Tools can analyse the logs to generate reports
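
As an illustration of log analysis, the sketch below parses lines in the Common Log Format used by many servers and counts responses per status code. The log format is an assumption (the slides do not say which format is used) and the two sample lines are made up.

```python
import re
from collections import Counter

# host ident user [time] "request" status size  (Common Log Format, assumed)
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)

# Made-up sample entries for demonstration only.
sample_lines = [
    '192.0.2.1 - - [20/Mar/2006:05:33:32 +0000] "GET /~cassidy/ HTTP/1.1" 200 4111',
    '192.0.2.7 - - [20/Mar/2006:05:34:10 +0000] "GET /missing.html HTTP/1.1" 404 242',
]

entries = [LOG_PATTERN.match(line).groupdict() for line in sample_lines]
status_counts = Counter(entry["status"] for entry in entries)

print(entries[0]["request"])
print(dict(status_counts))
```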

Write Your Own Webserver

import socket

HOST = ''                 # Empty string: listen on all local interfaces
PORT = 50004              # Arbitrary non-privileged port
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind((HOST, PORT))
s.listen(1)               # Allow one pending connection

# Wait for a client to connect, then read its request as text
conn, addr = s.accept()
data = conn.recv(4096).decode('iso-8859-1')
words = data.split()

...continued

# Answer GET requests with a page that echoes the request back
if len(words) > 0 and words[0] == "GET":
    page = """<html>
<head><title>Hello</title></head>
<body><p>Your request was:</p>
<pre>""" + data + """</pre>
</body>
</html>
"""

    # The blank line at the end of the header separates it from the page
    header = """HTTP/1.0 200 OK
Content-Length: """ + str(len(page)) + """
Content-Type: text/html

"""
else:
    header = "HTTP/1.0 404 Not Found\n\n"
    page = ""

print(header + page)
conn.sendall((header + page).encode('iso-8859-1'))
conn.close()
Download the full script

Even Better...use Python Modules

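try:
    # Python 3: both classes live in http.server
    # (CGIHTTPRequestHandler is deprecated as of Python 3.13)
    from http.server import HTTPServer, CGIHTTPRequestHandler
except ImportError:
    # Python 2 module names
    from BaseHTTPServer import HTTPServer
    from CGIHTTPServer import CGIHTTPRequestHandler

server_address = ('', 8000)
handler = CGIHTTPRequestHandler
handler.cgi_directories = ['/cgi-bin']   # Scripts in this directory are executed
httpd = HTTPServer(server_address, handler)

print("Starting server. Connect to http://localhost:8000/")

httpd.serve_forever()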