Maps as a new paradigm for web servers

Abstract

All web servers that I am familiar with have distinctly different mechanisms for translating URLs into files on disk, configuring authentication of various kinds, running scripts, and handling other things that deal with the URL. AWS manages to do all of these things with a single type of statement - a map.

Maps are similar to the "alias" mechanism in many servers, which maps URLs to a disk directory other than the server root. They differ from such mechanisms in two distinct ways:

  1. Maps do not fail; they either handle a request or they don't. If they don't, the server continues trying maps.
  2. Maps invoke a module to handle the request; the module may invoke other modules.

This paper examines maps to show that these two features suffice to allow maps to handle all of the configuration options named above, and to combine in surprisingly powerful ways. Indeed, commonly requested abilities that other servers may not provide, or provide only with difficulty, are trivial combinations of maps.

The anatomy of a map

A map statement in the AWS config file has three components:

  1. A URL prefix that is used to select requests for this map to apply to.
  2. The name of the module that this map runs on requests whose URLs match the URL prefix.
  3. Module-specific arguments. Even maps that don't need an argument have something in this position, usually "-".
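
The three components can be illustrated with a small parser. This is a minimal sketch in Python, not the parser AWS uses; the names MapStatement and parse_map are hypothetical, and it assumes the components are separated by whitespace as in the examples in this paper.

```python
from typing import NamedTuple


class MapStatement(NamedTuple):
    prefix: str  # URL prefix that selects requests for this map
    module: str  # name of the module the map runs
    args: str    # module-specific arguments ("-" when none are needed)


def parse_map(line: str) -> MapStatement:
    """Split 'map <prefix> <module> <arguments>' into its three components."""
    fields = line.split(None, 3)
    if len(fields) != 4 or fields[0] != "map":
        raise ValueError(f"not a map statement: {line!r}")
    _, prefix, module, args = fields
    return MapStatement(prefix, module, args.strip())
```

Everything after the module name is kept as a single string, since only the module knows how to interpret its own arguments.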

For instance, the map that corresponds to a ServerRoot directive in other servers is:

map / directory data:html/

This checks all requests to see if they match a file in the directory data:html.

More generally, the NCSA httpd alias:

Alias /icons/ html:icons/

is identical to the map:

map /icons/ directory html:icons/

Maps as aliases that "fall through"

All but one of the web servers that I am familiar with use the same basic model for locating objects on the server. Usually, there is a root directory where the URL is checked for as if it were a path in the file system. Some servers allow aliases that use a prefix of the URL to indicate a directory other than the root. CGI directories are treated similarly. If the URL does not appear in the directory determined by the alias mechanism, then a not found result is returned to the client.

In contrast to this, a map that fails to find the file referred to by the URL will not return a not found result, but will tell the server that it didn't do anything. The server then continues checking the other maps available, and returns a not found result when it runs out of such maps.
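
The fall-through behavior can be sketched as a loop over the configured maps. The Python below is an illustration under stated assumptions, not the AWS source: a module is modeled as a function that returns a response, or None to say it "didn't do anything", and the toy directory module reads from a dict rather than from disk. The names dispatch and directory are hypothetical.

```python
def directory(files):
    """Build a toy 'directory' module backed by a dict instead of a disk."""
    def handle(rest, args):
        # args is the directory name; rest is the URL with the prefix removed.
        return files.get(args + rest)  # None means "did nothing"
    return handle


def dispatch(maps, url):
    """Try each map in order; the first module that returns a response wins."""
    for prefix, module, args in maps:
        if not url.startswith(prefix):
            continue  # this map does not apply to the request
        response = module(url[len(prefix):], args)
        if response is not None:
            return response
        # The module declined, so keep trying the remaining maps.
    return "404 Not Found"  # only the server itself reports "not found"
```

Two maps sharing the prefix "/" but naming different directories would both be consulted in turn, so a file missing from the first directory can still be served from the second.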

If you had nothing but directory maps, there would be little difference between maps and aliases, except that maps allow a three-dimensional structure for documents on the server: directory trees are layered one on top of another as you apply different maps that use the same prefix.

However, because maps don't fail to find objects, modules can take on tasks that normally require direct server support.

For example, host authentication is a popular means of restricting access to an area of the server. The authhost module does that:

map /private/ authhost - #?.foo.net
map /private/ directory private:

causes access from anywhere except the foo.net domain to fail with an object not found error. The "-" in the middle can be replaced by a filename to allow sending a specific message instead of a generic object not found message. If the host is in the foo.net domain, the authhost module "does nothing", allowing the next directory map to handle the access to /private/.
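
The behavior of this pair of maps can be modeled in a few lines of Python. This is a sketch under assumptions, not the authhost implementation: it treats "#?" as the AmigaDOS wildcard for "any string" and translates it to "*" for Python's fnmatch, it models a request as a dict with hypothetical "host" and "path" keys, and it ignores the optional message file.

```python
from fnmatch import fnmatchcase


def host_matches(pattern, host):
    """Assumption: '#?' means 'any string', so it becomes '*' for fnmatch."""
    return fnmatchcase(host, pattern.replace("#?", "*"))


def authhost(request, pattern):
    if host_matches(pattern, request["host"]):
        return None  # allowed host: "do nothing" and let later maps run
    return "404 Not Found"  # any other host never reaches the directory map


def private_directory(request):
    # Stand-in for "map /private/ directory private:".
    return "200 private:" + request["path"]


def handle_private(request):
    """Run the two /private/ maps in configuration order."""
    modules = (lambda r: authhost(r, "#?.foo.net"), private_directory)
    for module in modules:
        response = module(request)
        if response is not None:
            return response
    return "404 Not Found"
```

The order of the two maps matters: authhost must come first so that it can reject a request before the directory map has a chance to serve it.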

Similarly, the authbasic module checks HTTP Basic authentication information, and either quietly falls through or returns a not authorized response. If authorization succeeds, then authbasic annotates the request with the user name and authorization method for use by further maps, or for server logging.

Modules can invoke other modules

The second feature of maps (missing from similar mechanisms in other servers) is that the module they invoke may in turn invoke other modules. The authhost module is one example of this. It invokes the file module if the host authorization fails and a file name is given. The file module takes one argument - a file name. It returns that file when it is invoked, without regard to the URL being used.

Another example of a module invoking a second module internally can be found in the userdir module. This module implements the user directory mapping found in other servers. Its one argument is the name of the directory to look for in the user's home directory. It looks up the user's home directory in the password file, appends its argument, and then invokes the directory module to actually send any file that may exist.

For example, to mimic the default NCSA behavior, use the map:

map /~ userdir public_html

Since this facility is usually an integral part of the server, servers allow only one prefix, and only one public directory name. Since a module can be used as many times as needed, and falls through, both cases can be handled here. To allow users to use /home/username to search the directory html in their home directory as well as /~ to search public_html, add the following map to the configuration file:

map /home/ userdir html
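
The delegation from userdir to the directory module can be sketched as follows. This Python is illustrative only: the password file is modeled as a dict from user name to home directory, files are modeled as a dict from path to contents, and the user name and paths in the usage note are hypothetical.

```python
def directory(root, rest, files):
    """Toy directory module: return the file under root, or None."""
    return files.get(root + rest)


def userdir(subdir, rest, homes, files):
    """Resolve '<user>/<path>' inside that user's <subdir> directory."""
    user, _, path = rest.partition("/")
    home = homes.get(user)  # stands in for the password file lookup
    if home is None:
        return None  # unknown user: fall through to later maps
    # Append the argument to the home directory, then delegate the actual
    # file lookup to the directory module.
    return directory(f"{home}/{subdir}/", path, files)
```

With a user alice whose home directory is /users/alice, userdir("public_html", "alice/index.html", ...) looks for /users/alice/public_html/index.html, while the same function called with "html" serves the /home/ prefix. Nothing in userdir is tied to a particular prefix or directory name, which is why the module can be mapped more than once.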

The synergy of these two features

A more powerful example of using maps is the ipfilter module. Its arguments are a regular expression, a module name, and the arguments for that module. The address of the host issuing a request is matched against the regular expression - similar to authhost - and the named module is invoked with the given arguments if the match succeeds. ipfilter can be combined with the userdir module to allow user directories that are only visible from the local net:

map /home/ ipfilter #?.local.net userdir local_html
map /home/ userdir public_html

In this example, requests from machines on local.net will be searched for in local_html. If a match is found, it will be sent. Otherwise, for a request from outside of local.net or for one not found in local_html, the user's public_html directory will be checked.

An obvious thing to want (and a feature that some commercial servers don't provide) is the ability to accept either host authentication or basic authentication. That is, if the request comes from a trusted host, allow it. Otherwise, if the user has an appropriate username and password, allow it. If neither is true, deny the request. The modules seen so far allow this, like so:

map /semi_private/ ipfilter ~(#?.trusted.net) authbasic roamers
map /semi_private/ directory data:semi_private_html/

In this example, if the request does not come from a machine on trusted.net, ipfilter invokes the authbasic module for the realm "roamers". That either returns an authentication failure or does nothing. If ipfilter does not invoke authbasic, or authbasic does nothing, the next map is used, which tries to resolve the request in the directory data:semi_private_html/.
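
The either/or logic can be traced with a short Python model. It is a sketch under assumptions rather than the real modules: "#?" is treated as "any string" and "~(...)" as negation of the enclosed pattern, requests are dicts with hypothetical "host", "path", and "credentials" keys, and passwords are compared in plain text purely for illustration.

```python
from fnmatch import fnmatchcase


def matches(pattern, host):
    """Assumed pattern rules: '#?' is any string, '~(...)' negates."""
    negate = pattern.startswith("~(") and pattern.endswith(")")
    if negate:
        pattern = pattern[2:-1]
    hit = fnmatchcase(host, pattern.replace("#?", "*"))
    return hit != negate


def authbasic(request, realm, passwords):
    creds = request.get("credentials")  # (user, password) or None
    if creds is not None and passwords.get(creds[0]) == creds[1]:
        request["user"] = creds[0]  # annotate the request for later maps
        return None  # authorized: fall through
    return f'401 Unauthorized (realm "{realm}")'


def ipfilter(request, pattern, module):
    if matches(pattern, request["host"]):
        return module(request)  # the inner module decides
    return None  # no match: do nothing


def semi_private(request, passwords):
    """Run the two /semi_private/ maps in configuration order."""
    response = ipfilter(request, "~(#?.trusted.net)",
                        lambda r: authbasic(r, "roamers", passwords))
    if response is not None:
        return response
    return "200 data:semi_private_html/" + request["path"]
```

A request from a host on trusted.net never reaches authbasic, so it needs no credentials; every other request is served only if authbasic falls through.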

As another example, the headerfilter module takes the name of an HTTP header, a regular expression against which to match the value of the header, and the name of a module to run. The omnipresence server used this ability to route different versions of a browser to the home pages appropriate for that browser.

Finally, several of the web servers I am familiar with can be configured to check the identity of the remote user via the IDENTD protocol. However, these servers either check for all requests or for none. Since most sites on the internet do not run the IDENTD server, doing an IDENTD check for every request just causes unwanted delays. Similarly, even if you are only interested in identity checking for a small set of files, these servers perform the identity check on requests for every file on the server, which is undesirable.

An IDENTD module is nearly identical to IDENTD code in any server. The module accepts a request, does an IDENTD check on the socket associated with the request, adds a notation to the request indicating the results of that check, and then continues. However, by being part of the map/module system, the user automatically gains the ability to apply identity checking to selected parts of the document tree, or to filter requests based on IP address - or other things - to prevent unwanted IDENTD requests.

Summary

By providing a simple paradigm for routing incoming requests, maps make server configuration simpler. Because modules handle separate aspects of the request and can be combined in different ways, maps make it possible to do things with the web server that were not foreseen by the server's author. This makes maps a fundamentally better paradigm for server configuration than root directories, aliases, user directories, etc.

Maps bear further investigation. I invite your comments and suggestions.


Mike Meyer