GET/POST etc are one-off transactions
No concept of a user session with a web server in HTTP
HTTP/1.1 -- multiple requests per connection, each independent
Problem: how to allow login, transactions, user customisation?
HTTP wasn't designed as a protocol for long-lived transactions. If we were starting again we might build in some idea of state, or at least have the option of a stateful protocol. We didn't, so we need some other mechanisms; luckily these are easily managed within HTTP.
How do we get the effect of transaction state?
Additional problem: can't identify clients uniquely
Primary method:
Give client a unique token
Client returns token with each request
HTTP Mechanisms: form variables, cookies
What info do we want to exchange? In general we need to get some info back from the client to enable us to identify them on each transaction. This could be as simple as a username but more generally will be a unique session key. The exchanged value allows the server to look up the user or session state in a local data store.
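The token idea can be sketched in a few lines. This is a minimal illustration in Python 3, not a production design: the names SESSIONS, start_session and lookup_session are made up for the example, and the in-memory dictionary stands in for the "local data store" mentioned above.

```python
import secrets

# Hypothetical in-memory session store: token -> per-session state.
# A real server would normally keep this in a database or similar store.
SESSIONS = {}

def start_session(username):
    """Create a new session and return the token to hand to the client."""
    token = secrets.token_hex(16)  # 32 hex characters, hard to guess
    SESSIONS[token] = {"user": username, "cart": []}
    return token

def lookup_session(token):
    """Return the state for a token sent back by the client, or None."""
    return SESSIONS.get(token)

token = start_session("steve")
state = lookup_session(token)
print(state["user"])
```

A random key is used rather than the username itself so that a client cannot simply guess another user's token.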
Hidden form variable carries session info:
<form action='go.cgi'> ... <input type='hidden' name='userid' value='steve' /> ... </form>
When you generate a form via a CGI script the value can be inserted based on the logged in user.
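As a rough sketch of how a script might insert the value, assuming Python 3 and a made-up helper called render_form (the form itself follows the one above):

```python
from html import escape

def render_form(userid):
    """Return an HTML form that carries userid in a hidden field."""
    # escape() also converts quote characters by default, so the value
    # cannot break out of the attribute it is placed in.
    return (
        "<form action='go.cgi'>\n"
        "  <input type='hidden' name='userid' value='%s' />\n"
        "  <input type='submit' value='Go' />\n"
        "</form>" % escape(userid)
    )

print(render_form("steve"))
```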
Recall that form variables can be sent as part of the URL in a GET request:
http://www.here.com/process.cgi?session=31926xxks6
This can be used to pass state information back to a server:
Modify all URLs in a delivered page by adding ?session=31926xxks6
Server can track a user through a session via parameters in the GET request.
NOTE: for this to work, page generation must be via a program (CGI, JSP, etc), but then, you can't keep track of users without code.
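Rewriting the URLs is mechanical. The sketch below assumes Python 3 and uses a hypothetical helper, add_session, to add the session parameter to a link while keeping any query parameters already present:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def add_session(url, session_id):
    """Return url with a session=<id> query parameter added or replaced."""
    parts = urlsplit(url)
    # Keep existing parameters, dropping any stale session value.
    query = [(k, v)
             for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "session"]
    query.append(("session", session_id))
    return urlunsplit(parts._replace(query=urlencode(query)))

print(add_session("http://www.here.com/process.cgi", "31926xxks6"))
print(add_session("http://www.here.com/list.cgi?page=2", "31926xxks6"))
```

The page generator would apply this to every link it emits, which is why static pages cannot take part in the scheme.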
A name=value pair sent to client by the server
Client returns the same cookie with every HTTP request to that server
Similar effect as a hidden form variable without the URL ugliness
Cookies can persist indefinitely or for a fixed time
HTTP/1.1 200 OK
Content-type: text/html
Set-Cookie: session=31926xxks6; Path=/;
<html>
...
Set-Cookie header instructs client to remember this cookie. Client sends it back on next request:
GET /acme/shopping HTTP/1.1
Cookie: session=31926xxks6
A cookie has more than just name=value:
Path: subset of URLs to which the cookie applies, eg. /products
Domain: DNS domain that the cookie applies to, eg. .ics.mq.edu.au
Max-Age: lifetime of the cookie in seconds
Expires: date at which cookie becomes invalid (older specification)
Note that Expires was the old way to specify the lifetime of a cookie, Max-Age is the new way.
Secure: if present, only send this cookie down secure connections
Comment: human readable information about the cookie
Version: which version of the cookie standard is being used
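These attributes can be set from Python. The sketch below assumes Python 3, where the cookie module lives in http.cookies; the attribute values are only examples:

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["session"] = "31926xxks6"
cookie["session"]["path"] = "/products"   # only sent for URLs under /products
cookie["session"]["max-age"] = 3600       # lifetime in seconds
cookie["session"]["secure"] = True        # only sent over secure connections

# output() renders the Set-Cookie header line for the response.
header = cookie.output()
print(header)
```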
POST /acme/login HTTP/1.1
[form data]

HTTP/1.1 200 OK
Set-Cookie: Customer="WILE_E_COYOTE"; Version="1"; Path="/acme"

POST /acme/pickitem HTTP/1.1
Cookie: $Version="1"; Customer="WILE_E_COYOTE"; $Path="/acme"
[form data]

HTTP/1.1 200 OK
Set-Cookie: Part_Number="Rocket_Launcher_0001"; Version="1"; Path="/acme"

POST /acme/shipping HTTP/1.1
Cookie: $Version="1"; Customer="WILE_E_COYOTE"; $Path="/acme"; Part_Number="Rocket_Launcher_0001"; $Path="/acme"
[form data]
This example is taken from the HTTP State Management Mechanism standard (RFC 2109).
Session cookie:
No Max-Age/Expires attribute
Survives only during the browser session
Persistent cookie:
Max-Age/Expires attribute set
Stored in local persistent store by client
Survives browser session
Cookie value is opaque to the client
Client agent should provide each of the following minimum capabilities individually:
At least 300 cookies
At least 4096 bytes per cookie
At least 20 cookies per unique host/domain
Clients may disable cookies -- why?
Cookies have been seen as a serious privacy threat since they allow sites to track users' browsing patterns and potentially identify them via personal info entered into forms etc.
>>> import Cookie
>>> C = Cookie.SimpleCookie()
>>> C['name'] = 'steve'
>>> print C
Set-Cookie: name=steve;
>>> C['item'] = 'water pistol'
>>> print C
Set-Cookie: item="water pistol";
Set-Cookie: name=steve;
>>> C['item']['path'] = '/products'
>>> print C
Set-Cookie: item="water pistol"; Path=/products;
Set-Cookie: name=steve;
from Cookie import SimpleCookie
import os

C = SimpleCookie()
C['item'] = "Water Pistol"
C['item']['path'] = '/~cassidy/'

# now output the HTTP header and content
print C
print "Content-Type: text/html\n"
print "<html><head><title>Cookie</title></head><body>"
print "<p>", C ,"</p>"
print "</body></html>"
The cookie header is passed to the CGI script in the HTTP_COOKIE environment variable.
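from Cookie import SimpleCookie
import os

user = ''
item = ''
# the browser's Cookie header arrives in the HTTP_COOKIE environment variable
if 'HTTP_COOKIE' in os.environ:
    C = SimpleCookie(os.environ['HTTP_COOKIE'])
    if 'user' in C:
        user = C['user'].value   # .value is the string; C['user'] is a Morsel object
    if 'item' in C:
        item = C['item'].value
else:
    C = SimpleCookie()

# send the cookies back; skip this when there are none, since an empty
# line here would end the HTTP headers before Content-Type is sent
if C:
    print C
print "Content-Type: text/html\n"
print "<html><head><title>Cookie</title></head><body>"
if user != '':
    print "<p>User is ", user ,"</p>"
    print "<p>Item is ", item ,"</p>"
else:
    print "<p>Our cookie not found.</p>"
print "</body></html>"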
In JavaScript, document.cookie is a single string holding the cookies associated with the current document, as name=value pairs separated by semicolons.
We can set a cookie by assigning to it:
document.cookie = "cookieName=cookieValue";
Here's an example which illustrates this.
To read cookies, we need to parse the cookie string:
cookieName1=cookieValue1; cookieName2=cookieValue2
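The parsing step is the same whichever language does it, and the server sees the same format in the Cookie request header. A minimal sketch in Python 3, used here only for consistency with the other examples; parse_cookie_string is a made-up name, and the sketch ignores quoting and encoding of values:

```python
def parse_cookie_string(cookie_string):
    """Split 'a=1; b=2' into the dictionary {'a': '1', 'b': '2'}."""
    cookies = {}
    for pair in cookie_string.split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep:  # skip fragments that contain no '='
            cookies[name] = value
    return cookies

print(parse_cookie_string("cookieName1=cookieValue1; cookieName2=cookieValue2"))
```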
http://www.foo.org/bar/baz.html
http://www.icemcfd.com/tcl/comparison.html#art42
http://www.ht.com.au/Scripts/xworks.exe?CAT:DG3:0:CP#Tof
http://www.smh.com.au/articles/2002/04/22/1019441217181.html
http://publib-b.boulder.ibm.com/Redbooks.nsf/9445fa5b41665f/beaf2a3007503da?OpenDocument
Examples of the services provided are:
Web Email (Hotmail, Yahoo Mail, ...)
Web based personal information management (My Yahoo, My Palm, My Netscape ...)
Groupware (SourceForge, Collabnet, ...)
Weblogs (Blogger, weblogs.com, ...)
Online stores (Amazon.com, ...)
Business to business (SAP...)
Many more are tagged as Web 2.0 applications.
What does the server need to provide to enable these applications?
Robustness: people rely on your server. Real money may be lost if it goes down.
Scalability: if you are successful, you will have thousands or millions of users.
Ease of development: a good system should make deployment of an application straightforward.
Ease of administration: once deployed, the application may have many levels of administration.
A major requirement is that the server platform be reliably available to users.
Components of the platform are: hardware/operating system, web server, data store, network connection.
While each of these must be reliable, it is important to put procedures into place to enable recovery should a component fail.
For example, notify sysadmin when disks are filling up, databases are using too much memory, network is clogged, etc.
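A disk-space check of this kind might look like the following sketch (Python 3). The function name disk_nearly_full and the 90% threshold are assumptions for illustration, and the print call stands in for whatever notification a real site would use:

```python
import shutil

def disk_nearly_full(path=".", threshold=0.90):
    """Return (alarm, fraction_used) for the filesystem containing path."""
    usage = shutil.disk_usage(path)          # named tuple: total, used, free
    fraction_used = usage.used / usage.total
    return fraction_used >= threshold, fraction_used

full, fraction = disk_nearly_full(".")
if full:
    # A real system would email or page the sysadmin here.
    print("WARNING: disk is %.0f%% full" % (fraction * 100))
```

Run periodically (for example from cron), a check like this gives warning before the failure rather than after it.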
Some excellent notes by Philip Greenspun.
Any web application will need to store data of some kind: user profiles, inventory, orders, email messages.
A successful web application may have millions of users, many of them retrieving and storing data at the same time.
Hence, we want a data store which provides reliable fast access to large amounts of data and supports multiple simultaneous reads and writes: a relational database.
RDBMS => an efficient store of data tables and an interface for updating and retrieving this data.
RDBMS takes care of table locking, backing out updates, indexing tables for fast access, etc.
Two possible models for web/database integration are:
Have CGI scripts/Servlets access the database to generate web pages.
All server content is stored in the database, the server/servlets generate web pages directly from the database.
A relational database consists of one or more tables. Each table has a set of defined column headings or fields. One field may be defined as the primary key for the table.
Tables are populated with tuples consisting of values for each of the fields. The database system stores these tuples in such a way as to provide efficient access.
We can create an index for a table using some field. This improves access to tuples when using this field as a key.
An application will often store data in many tables with shared fields. Eg. a user/password/preferences table, an order/user table, an order/salesman table etc.
New tables can be constructed from existing tables using the query language. Tables can be joined on common fields, tuples selected by various conditions, etc.
SQL is the standard query language for relational databases. It supports queries to the database as well as various database maintenance operations.
SQL queries select data from one or more tables and produce a new table as output.
SELECT members.username, members.email, orders.amount
FROM members, orders
WHERE members.username = orders.username
AND orders.amount > 1000
ORDER BY members.name;
INSERT INTO orders
(username,item,amount)
VALUES
('steve','widget',1000);
DELETE FROM orders WHERE username = 'steve';
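To try statements like these without installing a database server, they can be run against an in-memory SQLite database. The sketch below assumes Python 3 and its bundled sqlite3 module; the table layouts and sample rows are guesses made for illustration, since the notes do not define the schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Assumed schema: username is the primary key of members, and orders is
# indexed on username to speed up the join.
cur.execute("CREATE TABLE members (username TEXT PRIMARY KEY, name TEXT, email TEXT)")
cur.execute("CREATE TABLE orders (username TEXT, item TEXT, amount INTEGER)")
cur.execute("CREATE INDEX orders_username ON orders (username)")
cur.execute("INSERT INTO members VALUES ('steve', 'Steve', 'steve@example.org')")
cur.execute("INSERT INTO orders (username, item, amount) VALUES ('steve', 'widget', 1000)")
cur.execute("INSERT INTO orders (username, item, amount) VALUES ('steve', 'rocket', 2500)")

# Join the two tables and keep only the large orders.
cur.execute("""
    SELECT members.username, members.email, orders.amount
    FROM members, orders
    WHERE members.username = orders.username
      AND orders.amount > 1000
    ORDER BY members.name
""")
big_orders = cur.fetchall()
print(big_orders)

cur.execute("DELETE FROM orders WHERE username = 'steve'")
cur.execute("SELECT COUNT(*) FROM orders")
remaining = cur.fetchone()[0]
print(remaining)
conn.close()
```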
Programs involved in processing a web application:
Web Server
CGI Script/PHP/ASP/JSP
Database Engine
Each program presents an API to the other layers.
General purpose: CGI, SQL
Optimised for web applications: Sessions, persistent object store.
Think of the overheads involved: memory, time.
Web application servers tie these components together more closely.
Apache provides an interface for loadable modules which can extend its functionality.
mod_python is a module which runs Python scripts without the overhead of CGI.
mod_python initialises a Python process and runs many scripts in that process -- variables persist between invocations of scripts.
mod_python includes an interface to the Apache process as an alternative to CGI.
Also, see mod_perl, mod_tcl, etc.
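The effect of a long-lived interpreter can be imitated in plain Python. The sketch below (Python 3) does not use the mod_python API; handle_request is a made-up function showing that module-level variables survive from one request to the next when the process is not restarted, whereas a CGI script would start from scratch each time.

```python
# Module-level state: created once, when the long-lived process loads
# the module, and kept until that process exits.
request_count = 0
cache = {}

def handle_request(path):
    """Hypothetical request handler (not the mod_python API)."""
    global request_count
    request_count += 1
    if path not in cache:
        # Stands in for expensive work such as a database query.
        cache[path] = "<p>Content for %s</p>" % path
    return "%s (request %d)" % (cache[path], request_count)

first = handle_request("/index")
second = handle_request("/index")   # served from the cache
print(first)
print(second)
```

The same persistence is a hazard as well as a saving: state left behind by one request is visible to the next.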
Zope is a web application server product based on an object database and scripts written in Python.
Instead of serving pages from the file system, Zope serves objects from its database.
These objects are organised in a hierarchy (just like a file system) and can be of different types (just like files).
To generate a page, Zope maps the URI to a database object and applies templates (eg. standard headers and footers) to the object before output.
Zope objects can be HTML text, images, databases, SQL queries, scripts, etc. HTML text can embed processing instructions.
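The URI-to-object mapping can be pictured with a toy model. This sketch (Python 3) is not Zope code or the Zope API: nested dictionaries stand in for the object database, and ROOT, HEADER, FOOTER, resolve and render are all names invented for the illustration.

```python
# Hypothetical object hierarchy: dictionaries play the role of folders,
# strings the role of HTML text objects.
ROOT = {
    "index": "<p>Welcome.</p>",
    "products": {
        "rocket": "<p>Rocket launcher, model 0001.</p>",
    },
}
HEADER = "<html><body>"     # stand-ins for the standard templates
FOOTER = "</body></html>"

def resolve(path):
    """Walk the hierarchy one path segment at a time; None if not found."""
    obj = ROOT
    for name in path.strip("/").split("/"):
        if not isinstance(obj, dict) or name not in obj:
            return None
        obj = obj[name]
    return obj

def render(path):
    """Wrap the object found at path in the header and footer."""
    obj = resolve(path)
    if not isinstance(obj, str):
        return None         # missing object, or a folder
    return HEADER + obj + FOOTER

print(render("/products/rocket"))
```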
ACS is a web application server built around the base AOL HTTP server and a set of Oracle databases and scripts written in Tcl.
ACS supports the idea of a web based community. It emphasises tracking users' progress through the site and enabling users to contribute to the content of the site.
Each page in ACS can have comments added to it by users of the system. Pages and comments are stored in the database and pages are generated dynamically when requested.
Additional functionality can be provided via downloadable modules or by writing scripts in Tcl or Java.
AOL Server manages all database connections in a pool for use by Tcl scripts. Tcl scripts have access to server state.
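Connection pooling itself is a simple idea. The sketch below (Python 3) is not the AOLserver or ACS implementation: it is a minimal pool of SQLite connections, with the class name ConnectionPool and its methods invented for the illustration.

```python
import queue
import sqlite3

class ConnectionPool:
    """Hand out a fixed set of open connections instead of opening new ones."""

    def __init__(self, size, database=":memory:"):
        self._free = queue.Queue()
        for _ in range(size):
            # check_same_thread=False lets a connection be used by whichever
            # thread borrows it from the pool.
            self._free.put(sqlite3.connect(database, check_same_thread=False))

    def acquire(self):
        return self._free.get()    # blocks until a connection is free

    def release(self, conn):
        self._free.put(conn)

pool = ConnectionPool(size=2)
conn = pool.acquire()
try:
    answer = conn.execute("SELECT 1 + 1").fetchone()[0]
finally:
    pool.release(conn)             # always return the connection
print(answer)
```

The saving comes from paying the cost of opening a database connection once per pool slot rather than once per request.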
Like many scripting languages Python has a standardised way of talking to relational databases
The Python DB-API defines how databases should be exposed to Python
Implemented for: DB/2, Informix, Ingres, JDBC, MySQL, Oracle, PostgreSQL, Sybase
Standard interface means projects can be portable between database vendors
From The Python DB-API in Linux Journal
>>> import soliddb
>>> conn = soliddb.soliddb('Upipe SOLID', 'myusername', 'mypassword')
>>> cursor = conn.cursor()
>>> cursor.execute("select * from Seminars")
>>> cursor.fetchall()
[(4, 'Web Commerce', 300.0, 26),
(1, 'Python Programming', 200.0, 15),
(3, 'Socket Programming', 475.0, 7),
(2, 'Intro to Linux', 100.0, 32),
]
For SQLite (pysqlite2)
>>> from pysqlite2 import dbapi2 as sqlite
>>> conn = sqlite.connect(dbname)
...
A cursor object allows you to send a query and get the results.
...
cursor = conn.cursor()
cursor.execute('select * from Attendees where seminar = 1')
while 1:
    attendee = cursor.fetchone()
    if attendee is None: break
    print attendee
cursor.execute("update Attendees set paid='yes' where name='steve'")
#...more updates...
conn.commit() # commit changes to database
#...more updates...
# discover a problem
conn.rollback() # undo changes since last commit
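The difference between commit and rollback can be checked with a small self-contained run. This sketch assumes Python 3 and its bundled sqlite3 module, which follows the same DB-API; the Attendees table layout is a guess based on the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Attendees (name TEXT, seminar INTEGER, paid TEXT)")
cur.execute("INSERT INTO Attendees VALUES ('steve', 1, 'no')")
conn.commit()                      # the row is now permanent

# Make a change, then discover a problem and undo it.
cur.execute("UPDATE Attendees SET paid = 'yes' WHERE name = 'steve'")
conn.rollback()
cur.execute("SELECT paid FROM Attendees WHERE name = 'steve'")
after_rollback = cur.fetchone()[0]

# Make the same change again and keep it this time.
cur.execute("UPDATE Attendees SET paid = 'yes' WHERE name = 'steve'")
conn.commit()
cur.execute("SELECT paid FROM Attendees WHERE name = 'steve'")
after_commit = cur.fetchone()[0]

print(after_rollback, after_commit)
conn.close()
```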
from pysqlite2 import dbapi2 as sqlite
# Create a connection to the database file "mydb":
con = sqlite.connect("mydb")
# Get a Cursor object that operates in the context of Connection con:
cur = con.cursor()
# Execute the SELECT statement:
cur.execute("select * from people order by age")
# Retrieve all rows as a sequence and print that sequence:
for row in cur.fetchall():
    print row
Some examples can be found here.