Exercise 3

Representation and Management
of Data on the Internet

Exercise 3: Web Server

Submission date: 5/5/2005

Report dates: 12-14/4/2005

Introduction

In this exercise, you will implement a simple Web server.

The server will have the following features:

Properties of the server are defined in an XML configuration file that is loaded on the server startup.

Server Architecture

The architecture of the Web server you will create is illustrated in the following figure.

[Figure: server architecture]

Following is a detailed description of the architecture.
Your server will comprise the following parts:

Note that you are required to design and implement the above classes. We only require that you preserve the names of these classes. In particular, you should write the following Java classes:

  1. class ConnectionRequestHandler;
  2. class HTTPRequest;
  3. class RequestDispatcher extends java.lang.Thread;
  4. abstract class HTTPRequestHandler extends java.lang.Thread.

Notes:

HTTP Requests and Responses

The server should handle HTTP 1.1 GET requests over non-persistent connections. Such a request has the following form:

GET[space]resource[space]HTTP/1.1\r\n
header1\r\n
.
.
.
headerK\r\n
[blank]       

The argument resource can have two possible forms:

  1. An absolute URL of the resource, e.g., http://xil-20.cs.huji.ac.il:8001/count.jsp
  2. A file path, i.e., an expression of the form /dir1/dir2/ ... /dirK/name.ext. The directory of the requested resource is considered relative to the directory from where the server is being launched. For example, if the server is launched from ~snoopy/dbi/ex3/ then the resource /a/b/c.html actually denotes the file ~snoopy/dbi/ex3/a/b/c.html.

For example, suppose that the server runs on the computer xil-20 of our department and listens on port 8001. Also suppose that the server is launched from ~snoopy/dbi/ex3/. Now, suppose that a client wishes to get the file /images/snoop.gif from the server. Then, it may issue the following request:

GET http://xil-20.cs.huji.ac.il:8001/images/snoop.gif HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg,  */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; FunWebProducts-MyWay)
Host: xil-20.cs.huji.ac.il:8001
Connection: close
[blank]

Alternatively, the client may issue the following request:

GET /images/snoop.gif HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; FunWebProducts-MyWay)
Host: xil-20.cs.huji.ac.il:8001
Connection: close
[blank]

You may assume that all the requests your server gets are legal GET requests that conform to HTTP 1.1. You may also assume that the host name in both the URL and the Host header is that of your server.
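
Since the resource may arrive in either of the two forms above, it can help to reduce both to a plain file path before looking the file up. The following is only a sketch under stated assumptions: the class name RequestLineSketch and the method resourcePath are hypothetical (they are not among the required classes), and query strings are not treated specially. It is written for a current JDK.

```java
import java.net.URI;

// Hypothetical helper, not one of the classes required by the exercise.
final class RequestLineSketch {

    /** Returns the path part of the resource named in a GET request line. */
    static String resourcePath(String requestLine) {
        // "GET <resource> HTTP/1.1" has exactly three space-separated parts.
        String[] parts = requestLine.trim().split(" ");
        if (parts.length != 3 || !parts[0].equals("GET")) {
            throw new IllegalArgumentException("Not a GET request line: " + requestLine);
        }
        String resource = parts[1];
        if (resource.startsWith("/")) {
            return resource;                       // form 2: already a file path
        }
        // Form 1: an absolute URL; keep only its (undecoded) path component.
        String path = URI.create(resource).getRawPath();
        return (path == null || path.isEmpty()) ? "/" : path;
    }

    public static void main(String[] args) {
        // Both request forms from the text name the same file.
        System.out.println(resourcePath(
            "GET http://xil-20.cs.huji.ac.il:8001/images/snoop.gif HTTP/1.1"));
        System.out.println(resourcePath("GET /images/snoop.gif HTTP/1.1"));
    }
}
```

The returned path is still relative to the directory from which the server was launched, as described above.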

Your server will respond with HTTP 1.1 responses. An HTTP 1.1 response has the following form:

HTTP/1.1[space]Status-Code[space]Reason-Phrase\r\n
header1\r\n
.
.
.
headerK\r\n
\r\n       
[ message-body ]  

Note that the reason phrase is a very short and simple message describing the reason for the given status, e.g., "OK", "Not Found", "Internal Error", "Unauthorized", etc.

For example, for the above request, the server will respond by sending the content of the file ~snoopy/dbi/ex3/images/snoop.gif, if that file exists. It may issue the following response:

HTTP/1.1 200 OK
Date: Fri, 01 Apr 2005 05:27:40 GMT
Content-Length: 3545
Connection: close
Content-Type: image/gif

... the image bytes ...          

If the server does not find the requested file /images/snoop.gif, it may issue the following response:

HTTP/1.1 404 Not Found
Date: Fri, 01 Apr 2005 06:06:47 GMT
Content-Length: 177
Connection: close
Content-Type: text/html  

<html>
<head>
 <title>404 Not Found</title>
</head>
<body>
 <h1>Not Found</h1>
 <p>The requested URL /images/snoop.gif was not found on this server.</p>
 <hr />
</body>
</html>

Note that all the responses that your server generates should conform to HTTP 1.1.
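
As an illustration of the response format, the status line and headers of the examples above could be assembled as in the following sketch. The class ResponseHeadSketch and its methods are hypothetical names, and the set of headers simply mirrors the 200 example; your handlers may need other headers. The sketch is written for a current JDK.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, not one of the classes required by the exercise.
final class ResponseHeadSketch {

    /** Formats a date the way the Date header in the examples shows it (always GMT). */
    static String httpDate(Date date) {
        SimpleDateFormat format =
            new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US);
        format.setTimeZone(TimeZone.getTimeZone("GMT"));
        return format.format(date);
    }

    /** Builds the status line, the headers and the blank line that ends them. */
    static String head(int status, String reason, Date date,
                       String contentType, long contentLength) {
        StringBuilder sb = new StringBuilder();
        sb.append("HTTP/1.1 ").append(status).append(' ').append(reason).append("\r\n");
        sb.append("Date: ").append(httpDate(date)).append("\r\n");
        sb.append("Content-Length: ").append(contentLength).append("\r\n");
        sb.append("Connection: close\r\n");
        sb.append("Content-Type: ").append(contentType).append("\r\n");
        sb.append("\r\n");                  // blank line separating headers from body
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(head(200, "OK", new Date(), "image/gif", 3545));
    }
}
```

Every line, including the blank one, ends with \r\n rather than with the platform line separator, so println should be avoided when writing headers.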

Request Handler Implementations

You will create two actual request handlers. These handlers are classes that extend the abstract class HTTPRequestHandler.

File-Request Handler

The file-request handler will be implemented by the class FileRequestHandler. This handler handles requests for files. In particular, suppose that the server is given a request to a file identified by the path P. Assuming that the file P exists, the handler acts as follows:

A directory listing is an XHTML file that lists the files in the directory as links to the actual files on the server. You may freely design the directory listing of your server.

The headers of the above responses should contain the following information:

You might find the class java.io.File useful for implementing this handler.

The headers may contain other relevant information in addition to the above.

JSP Handler

The JSP-request handler will be implemented by the class JSPRequestHandler. This handler will support a limited version of JSP (Java Server Pages). A JSP file has the file-name extension jsp. In our version of JSP, the content of a page is a regular XHTML code with two types of elements embedded:

  1. Java code surrounded by <% %>.
  2. Java printable expressions; that is, expressions that can be used as an argument of the method print of objects of class java.io.PrintStream (such as System.out). A printable expression is surrounded by <%= %>. For example, <%="DBI: Exercise " + 3%>.

Upon a request for a JSP file, the file (if it exists) is translated into statements of a Java method that sends XHTML content to the requesting client. The generated code is then compiled and run.
[Figure: JSP flow]

The translation from JSP to Java code follows these rules:

  1. The method has access to an object of type java.io.PrintStream named out. (This object can be a local variable of the method, an argument of the method, etc.)
  2. Every text segment outside <% %> and <%= %> is printed by out. Note that the symbol " should become \" in the Java code.
  3. Every text segment inside <% %> is written as is to the code and is separated from the other code by new lines (thus, <%prin%><%tln();%> will not be translated to println(); but rather to two lines containing prin and tln();). Note that you need not replace entities (such as &lt;) by characters (such as <).
  4. Every text segment inside <%= %> is printed by out as is, i.e., without the surrounding quotes ("). As an example, the text <%=x%> has the same effect as the text <%out.print(x);%>.
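
The four rules above can be sketched as a single pass over the JSP source. This is one possible reading, not a reference implementation: the class JspTranslatorSketch and its methods are hypothetical names. Besides the " escaping that rule 2 mentions, the sketch also escapes backslashes and line breaks, which is an addition of ours needed to keep the generated string literals valid Java.

```java
// Hypothetical helper, not one of the classes required by the exercise.
final class JspTranslatorSketch {

    /** Translates JSP source into statements of a method body that prints to "out". */
    static String translate(String jsp) {
        StringBuilder code = new StringBuilder();
        int pos = 0;
        while (pos < jsp.length()) {
            int open = jsp.indexOf("<%", pos);
            if (open < 0) {
                open = jsp.length();               // no more Java sections
            }
            appendText(code, jsp.substring(pos, open));            // rule 2
            if (open == jsp.length()) {
                break;
            }
            int close = jsp.indexOf("%>", open + 2);
            if (close < 0) {
                throw new IllegalArgumentException("Unclosed <% at offset " + open);
            }
            String inner = jsp.substring(open + 2, close);
            if (inner.startsWith("=")) {
                // Rule 4: a printable expression is passed to out.print as is.
                code.append("out.print(").append(inner.substring(1)).append(");\n");
            } else {
                // Rule 3: Java code is copied as is, on its own line.
                code.append(inner).append('\n');
            }
            pos = close + 2;
        }
        return code.toString();
    }

    /** Emits out.print("...") for a text segment, escaping it as a Java literal. */
    private static void appendText(StringBuilder code, String text) {
        if (text.isEmpty()) {
            return;
        }
        StringBuilder literal = new StringBuilder();
        for (char c : text.toCharArray()) {
            switch (c) {
                case '"':  literal.append("\\\""); break;
                case '\\': literal.append("\\\\"); break;   // our addition
                case '\n': literal.append("\\n");  break;   // our addition
                case '\r': literal.append("\\r");  break;   // our addition
                default:   literal.append(c);
            }
        }
        code.append("out.print(\"").append(literal).append("\");\n");
    }

    public static void main(String[] args) {
        System.out.print(translate("<p><%=\"DBI: Exercise \" + 3%></p>"));
    }
}
```

The returned statements still have to be wrapped in a method and a class before they can be compiled.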

Note that the Java code of the JSP page may include only statements that can be written inside a method (e.g., the code should not include an import statement, since you cannot specify an import inside a class method). In addition, the JSP Java code cannot assume any imports (e.g., java.util.Date() will be used rather than Date()).

As an example, consider the file numbers.jsp (JSP source). This file is translated into a method that generates this XHTML content (HTML source). Note that the date should be the time at which the Java method is executed. The lines of a method that generates the XHTML may look like this.
You may assume that the content of a JSP file does not contain the strings "<%" and "%>" except for marking Java-code sections as defined above.

Following is our proposal for a method for dynamic generation of classes that execute JSP code. You may ignore our proposal and use your own. First, define a simple interface that your generated class will implement. This interface will have a method that receives a PrintStream object named out (implementors will write the XHTML code into this stream). As an example, look at the interface JSPBase.java. The code that the handler writes is a class that implements the base interface and has an empty constructor. Next, compile the Java code using Runtime.exec() and load the compiled class file using Class.forName(). Once you have the Class object, you can create an instance using Class.newInstance(). This instance can be cast to the type of the interface you created, and thus its XHTML-generating method can be invoked.

The headers of a response for a JSP request should contain the following information:

The response may contain additional headers as well. Note that a response to a JSP request does not include the length of the content, since the length may be unknown when the headers are generated, and the result of the JSP file should not be stored in memory.

If the requested JSP file does not exist, then the server should respond with 404, similarly to the FileRequestHandler.

Authentication Management

The server should support the basic HTTP authentication scheme. To implement this part, you will need to read the section "Basic Authentication Scheme" of the HTTP authentication RFC (RFC 2617), which accompanies the HTTP/1.1 RFC.
The configuration file defines a set of privileged users and a set of protected directories. Given a request to a file under a protected directory, the server should check whether the request contains the authentication information (i.e., name and password) of one of the privileged users. If so, the request is handled as usual (i.e., returning the file, processing the JSP, responding with 404, etc.). Otherwise, the server should return a 401 response with a suitable WWW-Authenticate header.

A protected directory has the form /dir1/dir2/ ... /dirK/. The directory is considered as relative to the directory from where the server is being launched. For example, if the server is launched from ~snoopy/dbi/ex3/ then the directory /a/b/ actually denotes the directory ~snoopy/dbi/ex3/a/b/. Every resource under a protected directory (including the protected directory itself) is protected. For example, if /a/b/ is protected, then all of the resources /a/b/, /a/b/c.html and /a/b/c/d/e.html require authentication for access.

You will need to decode Base64-encoded strings. To do that, use the class org.apache.catalina.util.Base64 of Apache. You can find here an example of encoding and decoding with Base64. You might also find the class java.io.File useful, especially its method getCanonicalPath.
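
The two checks described in this section (extracting the name and password, and deciding whether a resource lies under a protected directory) might look roughly as follows. This is a sketch under explicit assumptions: the class BasicAuthSketch and its methods are hypothetical, java.util.Base64 of a current JDK stands in for the Apache class named above, and the path check normalizes ".." lexically with java.nio.file instead of calling getCanonicalPath, so it does not resolve symbolic links.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;

// Hypothetical helper, not one of the classes required by the exercise.
final class BasicAuthSketch {

    /** Returns {name, password} from an Authorization header value, or null. */
    static String[] credentials(String authorizationValue) {
        if (authorizationValue == null) {
            return null;
        }
        String value = authorizationValue.trim();
        // The scheme name is compared case-insensitively.
        if (!value.regionMatches(true, 0, "Basic ", 0, 6)) {
            return null;
        }
        try {
            byte[] raw = Base64.getDecoder().decode(value.substring(6).trim());
            String pair = new String(raw, StandardCharsets.ISO_8859_1);
            int colon = pair.indexOf(':');          // the name cannot contain ':'
            if (colon < 0) {
                return null;
            }
            return new String[] { pair.substring(0, colon), pair.substring(colon + 1) };
        } catch (IllegalArgumentException e) {
            return null;                            // not valid Base64
        }
    }

    /** True if resourcePath is the protected directory or lies under it. */
    static boolean isProtected(String protectedDir, String resourcePath) {
        Path dir = Paths.get(protectedDir).normalize();
        Path resource = Paths.get(resourcePath).normalize();
        return resource.startsWith(dir);            // compares whole path components
    }

    public static void main(String[] args) {
        String[] c = credentials("Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==");
        System.out.println(c[0] + " / " + c[1]);
        System.out.println(isProtected("/a/b/", "/a/b/c/d/e.html"));
    }
}
```

Comparing whole path components rather than raw string prefixes keeps a file such as /a/bc.html from being treated as if it were under /a/b/.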

The Server Configuration File

The server configuration file is an XML file that defines the following properties of the server:

  1. The number of request-dispatcher instances (threads) used by the server;
  2. The request handlers used by the server. For each such handler, the following information is defined:
  3. A list of privileged users (i.e., names and passwords);
  4. A list of protected directories that can only be read by privileged users; and
  5. A list of mappings from file-name extensions to mime types.

The configuration file is called config.xml and is placed in the directory admin under the directory where the server is being launched (i.e., /admin/config.xml). This file conforms to the DTD config.dtd and it contains the DOCTYPE declaration with the DTD URL http://www.cs.huji.ac.il/~dbi/Exercises/ex3/config.dtd.

As an example, consider the configuration file config.xml. This file defines the following properties:

The configuration file must satisfy the following conditions (and you may assume that so will the configuration files that we will use for testing your solution):

  1. The configuration file conforms to the config DTD and has the suitable DOCTYPE declaration. In addition, its values are legal ones (e.g., all numbers are positive integers, etc.).
  2. There is exactly one handler that is defined as default. That handler will not have any associated file extensions.
  3. The Java classes specified in the configuration file reside in the base directory of the server.
  4. If a handler class is mentioned in the configuration file, then it must be a class that extends your abstract class HTTPRequestHandler and has a default constructor. Note, however, that in testing your solution, we will add handler classes that are not necessarily from those that you have implemented, and hence, it should be possible to automatically extend your server with additional handlers.
  5. A handler may have more than one file-name extension associated with it. However, different handlers do not have common extensions (i.e., extensions have unique handlers).

Note that the directories listed as protected do not necessarily exist. Also note that the mime-mappings should be used by the handlers to supply suitable Content-Type headers when serving requests.
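
Conditions 2 and 5 above suggest a simple lookup from file-name extension to handler class, with the default handler as the fallback. The following sketch illustrates that idea only; the class HandlerTableSketch and its methods are hypothetical, extensions are matched exactly as written, and the choice of FileRequestHandler as the default in the demo is an assumption made for illustration (the actual default comes from the configuration file).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not one of the classes required by the exercise.
final class HandlerTableSketch {
    private final Map<String, String> classByExtension = new HashMap<String, String>();
    private final String defaultClass;

    HandlerTableSketch(String defaultClass) {
        this.defaultClass = defaultClass;
    }

    /** Associates an extension with a handler class; an extension has one handler. */
    void register(String extension, String className) {
        if (classByExtension.containsKey(extension)) {
            throw new IllegalArgumentException("Extension already mapped: " + extension);
        }
        classByExtension.put(extension, className);
    }

    /** Returns the text after the last '.' of the file name, or "" if there is none. */
    static String extensionOf(String path) {
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        return (dot > slash) ? path.substring(dot + 1) : "";
    }

    /** Returns the handler class name for a path, falling back to the default. */
    String classNameFor(String path) {
        String name = classByExtension.get(extensionOf(path));
        return (name != null) ? name : defaultClass;
    }

    public static void main(String[] args) {
        HandlerTableSketch table = new HandlerTableSketch("FileRequestHandler");
        table.register("jsp", "JSPRequestHandler");
        System.out.println(table.classNameFor("/count.jsp"));
        System.out.println(table.classNameFor("/images/snoop.gif"));
    }
}
```

Keeping only class names in the table, and instantiating them reflectively, is one way to let handlers that you did not write be plugged in through the configuration file. The same extension lookup could serve the mime-type mappings.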

For parsing the configuration file, you should use either a DOM or a SAX parser, as taught in class.

Logging Actions

To trace the actions of your server, you are required to log your actions using our dbi.DBILog class. For example, we would like your ConnectionRequestHandler to write a message every time it queues a socket. This can be done as follows:
dbi.DBILog.log(this,message);
Note that this is the ConnectionRequestHandler object (passing itself as an argument) and message is some descriptive message, such as "I have just queued a request from " + clientSocket.getInetAddress().
The actions you need to log are the following:

  1. Queuing an object (i.e., a socket or a request) in one of the queues;
  2. Pulling an object from one of the queues.

It is important that you use the class dbi.DBILog for writing the log messages, since the graders will use this class to trace the actions of your server. Every logging object (e.g., the connection-request handler, a request dispatcher or a request handler) should send itself (this) to the log function. The message should be short but descriptive.

Handling Errors

You should handle the following erroneous situations:

  1. A protected resource is requested, but the request does not contain correct authentication data. In this case, your server should respond with 401.
  2. The requested resource is not found. In this case, your server should respond with 404.
  3. Some error prevents the request from being fulfilled, e.g., a JSP file fails to compile or run, a file exists but it cannot be read, etc. In this case, your server should respond with 500.
  4. An error occurred after your server started sending the response. In this case, you should try to close the connection, and hope that the client will recognize the error and recover.

The headers of the error responses that your server issues should contain the following information.

Those headers may contain other relevant details as well.

Concurrency

Your server should properly handle concurrent requests. In particular, it should properly handle concurrent requests to the same JSP file or to different JSP files.
For implementing the thread pools, you should use the Java mechanisms taught in class.
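
A thread pool is typically fed through a shared queue on which idle threads block. The sketch below assumes that the mechanisms meant are synchronized methods together with wait and notifyAll; this handout does not name them, so treat that as our guess. The class WorkQueueSketch is a hypothetical name, and the code uses generics, which require a JDK newer than the 1.4 source level used elsewhere in this exercise.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

// Hypothetical class; the exercise does not prescribe a queue type.
final class WorkQueueSketch<T> {
    private final LinkedList<T> items = new LinkedList<T>();

    /** Adds an item and wakes up any thread blocked in take(). */
    synchronized void put(T item) {
        items.addLast(item);
        notifyAll();
    }

    /** Blocks until an item is available, then removes and returns it. */
    synchronized T take() throws InterruptedException {
        while (items.isEmpty()) {
            wait();        // releases the lock; the loop guards against spurious wake-ups
        }
        return items.removeFirst();
    }

    /** Demo: a single worker thread pulls three queued items in FIFO order. */
    static List<Integer> demo() {
        final WorkQueueSketch<Integer> queue = new WorkQueueSketch<Integer>();
        final List<Integer> seen =
            Collections.synchronizedList(new ArrayList<Integer>());
        Thread worker = new Thread() {
            public void run() {
                try {
                    for (int i = 0; i < 3; i++) {
                        seen.add(queue.take());
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        };
        worker.start();
        for (int i = 1; i <= 3; i++) {
            queue.put(i);
        }
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In the server, the put and take calls are also the natural places for the log messages required in the section Logging Actions.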

The Server Launcher

To start the Web server, you will implement the class WebServer.
This class should have a constructor that gets a port number, i.e.,

public WebServer(int port) {...}

In addition, the following method will read the configuration file and initialize the Web server (with all its components) according to the configuration.

public void start() throws ... {...}

Technical Details

Dynamic Code Generation and Class Loading

To implement the JSP handler, you will have to dynamically compile, load and instantiate Java classes. See the class DynamicDate for an example of code that does just that. This example uses the interface Teller.
Note that this example assumes that the Java bin directory is included in the Path system variable. If you work outside our department, then you may need to update this variable. (In Windows systems, this path usually looks like C:\j2sdk1.4.2_03\bin.)

A problem that you may encounter is the following: when Java loads a class with a given name, it does not load it again in subsequent requests, but rather uses the old one. Hence, you cannot use code like DynamicDate.java to generate several classes with the same name. There are several solutions to this problem. We propose two of them.

  1. The first solution is to use a unique Java-class name for each JSP request (e.g., using a counter). Note, however, that compiled classes should be deleted from the disk after usage.
  2. The second solution is to use a new instance of java.net.URLClassLoader to load each of the new classes. This is rather complicated, but you can use the class DBIClassLoader for that. For an example, see the class DynamicDateReload. Java source: DBIClassLoader.java, DynamicDateReload.java.

Required Libraries (needed for Eclipse)

Browser Configuration

The firewall of our department prevents you from connecting to a Web server that runs on a computer in the department from outside. Hence, a Web browser can access your Web server only if it runs on a computer inside our department, and it does not go through an external proxy. So you should disable the proxy of your browser.

Bonus Part

A bonus of 5 points will be given for correctly implementing both of the following features:

  1. Using chunked transfer coding when responding to JSP requests;
  2. Supporting persistent (keep-alive) connections.

Note that to support persistent connections, all responses should either contain the content length or use chunked coding. Also note that a series of requests over one connection may include both JSP and regular file requests, i.e., requests that are handled by different handlers.
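
In chunked transfer coding, the body is sent as a sequence of chunks, each preceded by its size in hexadecimal, and ends with a zero-size chunk; the response must then carry the header Transfer-Encoding: chunked instead of Content-Length. The following sketch wraps an output stream so that every write becomes one chunk. The class ChunkedOutputStreamSketch is a hypothetical name, chunk extensions and trailers are left out, and the code is written for a current JDK.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not one of the classes required by the exercise.
final class ChunkedOutputStreamSketch extends FilterOutputStream {
    private static final byte[] CRLF = { '\r', '\n' };

    ChunkedOutputStreamSketch(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] { (byte) b }, 0, 1);    // one-byte chunk; buffer upstream to avoid this
    }

    @Override
    public void write(byte[] data, int off, int len) throws IOException {
        if (len == 0) {
            return;                              // a zero-size chunk would end the body
        }
        out.write(Integer.toHexString(len).getBytes(StandardCharsets.US_ASCII));
        out.write(CRLF);
        out.write(data, off, len);
        out.write(CRLF);
    }

    /** Writes the terminating zero-size chunk without closing the connection. */
    void finish() throws IOException {
        out.write("0\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
        out.flush();
    }

    /** Demo helper: encodes each part as one chunk and returns the wire format. */
    static String encode(String... parts) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            ChunkedOutputStreamSketch chunked = new ChunkedOutputStreamSketch(sink);
            for (String part : parts) {
                chunked.write(part.getBytes(StandardCharsets.US_ASCII));
            }
            chunked.finish();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(sink.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.print(encode("Hello, ", "world"));
    }
}
```

Because the end of the body is marked by the zero-size chunk rather than by closing the socket, the same connection can carry the next request.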

What You Should Submit

You should submit a jar file called ex3.jar that contains the following:

  1. All the Java files specified in this exercise definition, and all other classes that you write for this exercise;
  2. A JSP file (handled correctly by your server) called example.jsp that combines both server-side and client-side dynamic XHTML. That is, a JSP file that contains both Java and JavaScript code;
  3. A file README.html that contains:

To test your submission, make sure that the following script will start a telnet connection with your Web server.

mkdir testServer
cp ex3.jar testServer
cd testServer
jar xvf ex3.jar
          
ls ConnectionRequestHandler.java
ls HTTPRequest.java
ls RequestDispatcher.java
ls HTTPRequestHandler.java
ls FileRequestHandler.java
ls JSPRequestHandler.java
ls example.jsp
ls README.html

javac  -source 1.4 *.java
          
mkdir admin
cp ~dbi/www/Exercises/ex3/config.xml admin/
java TestServer
          
telnet localhost 8765

Good Luck!