Database-backed web applications: performance issues

Once a web application is up and running and reasonably secure, making it faster becomes a concern. The Apache web site has information on tuning Apache for performance - among other informative things. Similarly, the application itself can be profiled, examined, and tuned. Typically, these methods will gain a few percent here and a few percent there. While not to be sneezed at - they do add up - the real performance gains with web-based applications lie elsewhere.

In the short history of the web, a few things have emerged as key performance issues for web-based applications. One is that you should avoid starting new processes; not only is starting a process expensive, but it usually implies that data must be passed to the process, and probably back - which is also expensive. The Apache server, with multiple pre-forked servers listening on each port, shows the results of this knowledge. The CGI interface dates from the early days of the web and does not show such results, and neither do the applications that use it.

In this instance, a request is caught by an Apache server. It finds the application script, recognizes it as a CGI, forks and then execs that script, which actually starts a Python interpreter to run the script. That Python script reads in the data - probably from the socket directly - reformats it, opens a connection to the PostgreSQL server, passes the data to the SQL server, reads the results, and writes the results back to the Apache server, which then writes them out to the socket.
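The request flow described above can be sketched as a small CGI-style script. This is only an illustration, not the application's actual code: the handle_request and make_demo_db names, the contacts table, and the name parameter are all hypothetical, and Python's built-in sqlite3 module stands in for the PostgreSQL connection so that the sketch runs without a database server. Under CGI, the parameters of a GET request reach the script in the QUERY_STRING environment variable.

```python
import os
import sqlite3
import sys
from urllib.parse import parse_qs


def handle_request(environ, conn):
    """Handle one CGI-style request and return the full response text."""
    # Read the request data; CGI supplies GET parameters in QUERY_STRING.
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", [""])[0]

    # Pass the reformatted data to the database and read the results back.
    cur = conn.cursor()
    cur.execute("SELECT name, phone FROM contacts WHERE name = ?", (name,))
    rows = cur.fetchall()

    # Build the reply: a header block, a blank line, then the body.
    lines = ["Content-Type: text/plain\r\n\r\n"]
    for row_name, phone in rows:
        lines.append(f"{row_name}: {phone}\n")
    return "".join(lines)


def make_demo_db():
    """Build a throwaway in-memory database (stand-in for PostgreSQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE contacts (name TEXT, phone TEXT)")
    conn.execute("INSERT INTO contacts VALUES ('alice', '555-0100')")
    conn.commit()
    return conn


if __name__ == "__main__":
    # As a CGI, the script writes to stdout and the web server relays it.
    sys.stdout.write(handle_request(os.environ, make_demo_db()))
```

Every request served this way pays for a new interpreter, a new database connection, and an extra copy of the output through the web server.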

In the above scenario, we have three processes - Apache, the Python interpreter, and PostgreSQL - with the output being copied three times: once to the Python interpreter, once to Apache, and then to the client. Copying the data to the client is required, and so is the PostgreSQL server. We could eliminate copying the data through Apache - and the Apache server - by adding an HTTP port to PostgreSQL, but that's a bit beyond this discussion. That leaves the Python process, and copying the data to Python. If we can avoid starting that process and copying the data to it, we can save a significant fraction of the overhead of running the application.
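To get a rough feel for the cost of starting a process, the following sketch times a trivial piece of work run inside the current process against the same work run in a freshly started Python interpreter, with its output copied back through a pipe. It is not a benchmark of Apache or CGI itself; the function names are hypothetical, and the numbers will vary from machine to machine.

```python
import subprocess
import sys
import time


def handler():
    """Trivial stand-in for the application's work."""
    return "hello"


def time_in_process(runs):
    """Average seconds per call when the handler runs inside this process."""
    start = time.perf_counter()
    for _ in range(runs):
        handler()
    return (time.perf_counter() - start) / runs


def time_new_process(runs):
    """Average seconds per call when each call starts a fresh interpreter."""
    cmd = [sys.executable, "-c", "print('hello')"]
    start = time.perf_counter()
    for _ in range(runs):
        # Start a new interpreter and copy its output back through a pipe,
        # much as a CGI launch does.
        subprocess.run(cmd, capture_output=True, check=True)
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    print(f"in-process:  {time_in_process(1000):.6f} s per call")
    print(f"new process: {time_new_process(10):.6f} s per call")
```

The gap between the two figures is the per-request overhead that running the interpreter inside the server process is meant to remove.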

Fortunately, Apache again comes to the rescue. The PyApache module does just that. It integrates the Python interpreter into the Apache server, so that Python scripts are run in the server process, saving a fork and a copy of the results. It supports the CGI interface, so installing this module and restarting Apache gives better performance without changing the underlying application. Note: for dynamically loaded Python modules to work properly from the PyApache module, the mod_so module must be configured into Apache. Since the py-PyGreSQL module was built as a dynamically loaded module, you must link Apache with its symbols exported - using the -E flag for GNU ld - for this or any other application that depends on py-PyGreSQL to work. Modules similar to the PyApache module exist for other interpreted languages - Perl, Tcl, C, and web-specific languages as well.

That completes the installation of this database-backed web application. The next thing to look at is the summary.


Mike Meyer,
March, 1999