The list checker is similar to the hotlist application proper. Upon loading the page for checking the list, the user is presented with a list of links, sorted in some order. The difference is that each item is displayed with a set of links for doing maintenance on the item as well as the link to the page described. While this format has some problems, they are acceptable for a private application like this.
After experimenting with the layout a few times, I decided that the list should be sorted by decreasing HTTP return codes from the page being fetched. This means the pages that fetched properly are at the bottom, and those that had problems are at the top. That code is also the first bit of information displayed for an item, and is a link that will remove that item from the database. Then comes the description, with a link to it just as in the hotlist application. After that, in parentheses, come any problem reports. If the link was redirected, the English version of the HTTP status is presented, along with the URL the redirection provided. That text is a link to a maintenance function that updates the database by replacing the URL for the item with the one it was redirected to. If the title of the page fetched did not match the description, the title of the page is presented as well, and is a link to update the database by replacing the description with the actual page title.
The code for checklist is similar to the hotlist application. It starts with a set of format strings used to print the pages it produces. The only ones that aren't variants of hotlist application code are deleted_format and changed_format, which are used to format the page returned after an item is deleted or changed.
Following is the document class derived from the
Python library class SGMLParser. It provides an easy way
to find the title in an HTML document. It includes methods invoked
when the title tag starts and when it ends that set an instance
variable to note that we are processing the title, and a method
invoked as data is processed that saves the title if it's being
processed.
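
The title-finding approach just described can be sketched as follows. This is only an illustration, not the checklist code itself: sgmllib and its SGMLParser class are not part of Python 3, so the sketch swaps in html.parser.HTMLParser, which offers the same start-tag, end-tag and data hooks. The names Document, in_title and find_title are my own assumptions.

```python
from html.parser import HTMLParser


class Document(HTMLParser):
    """Collect the text of the first <title> element in a page.

    Stand-in for the SGMLParser-derived document class; the attribute
    names used here are assumptions made for this sketch.
    """

    def __init__(self):
        super().__init__()
        self.in_title = False  # True while between <title> and </title>
        self.title = None      # set when the first title element closes
        self._parts = []       # pieces of title text seen so far

    def handle_starttag(self, tag, attrs):
        # Invoked when a tag opens; note that the title has started.
        if tag == "title" and self.title is None:
            self.in_title = True

    def handle_endtag(self, tag):
        # Invoked when a tag closes; save the title with whitespace collapsed.
        if tag == "title" and self.in_title:
            self.in_title = False
            self.title = " ".join("".join(self._parts).split())

    def handle_data(self, data):
        # Invoked as data is processed; keep it only inside the title.
        if self.in_title:
            self._parts.append(data)


def find_title(html_text):
    """Return the page title, or None if the page has no title element."""
    parser = Document()
    parser.feed(html_text)
    parser.close()
    return parser.title
```

Feeding a whole page to find_title returns the title text, or None when the page has no title element, which is enough to compare against the stored description.
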
After that is the checkedurl class. This class
inherits from the Thread class in the Python library module threading, which
provides a high-level interface to the system's threading
facilities. The run method describes the actions that
should be taken to check a URL, and is invoked in a new thread when
the Thread class's start method is
invoked. It uses the good_document and
bad_document methods to build a dictionary of values for
the URL. Good documents get a title from the find_title
method, which uses a document object.
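
A minimal sketch of a class shaped like checkedurl follows, under several assumptions of my own: the fetch callable passed to the constructor, the dictionary keys, the placeholder status 599 for network failures, and the rule that any status below 400 counts as a good document. A regular expression stands in for the document object in find_title to keep the example short, and fake_fetch returns canned responses so the sketch runs without touching the network.

```python
import re
import threading

# Stand-in for the document object: a crude regular expression title match.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


class CheckedURL(threading.Thread):
    """Check one hotlist entry in its own thread (illustrative names only)."""

    def __init__(self, url, description, fetch):
        super().__init__()
        self.url = url
        self.description = description
        self.fetch = fetch   # assumed callable: url -> (status, final_url, body)
        self.values = None   # dictionary of results, filled in by run()

    def run(self):
        # Executed in the new thread after start() is called.
        try:
            status, final_url, body = self.fetch(self.url)
        except OSError as exc:
            # 599 is an arbitrary placeholder status for network failures.
            self.values = self.bad_document(599, str(exc))
            return
        if status < 400:
            self.values = self.good_document(status, final_url, body)
        else:
            self.values = self.bad_document(status, "HTTP error")

    def find_title(self, body):
        match = TITLE_RE.search(body)
        return " ".join(match.group(1).split()) if match else None

    def good_document(self, status, final_url, body):
        return {"url": self.url, "description": self.description,
                "status": status, "new_url": final_url,
                "title": self.find_title(body)}

    def bad_document(self, status, reason):
        return {"url": self.url, "description": self.description,
                "status": status, "new_url": self.url,
                "title": None, "reason": reason}


def fake_fetch(url):
    """Canned responses so the sketch needs no network access."""
    if url.endswith("/gone"):
        return 404, url, ""
    return 200, url, "<html><title>Example Page</title></html>"
```

Calling start on an instance runs the check in a new thread, and calling join waits for it; after that, the values attribute holds the dictionary for that URL.
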
Then there's the checker class that inherits from the
hotlist handler class. It extends the initialization
method to use the checker formats, and simplifies the
display_page method as it no longer depends on the type
variable from the query string. The rest of the display methods are
unchanged. The get_list method is changed to create
a checkedurl object for each entry in the list, then start
it with the Thread class's start method. It
then walks the list of checkedurl objects, invoking the
Thread class's join method for each to wait
until that thread is finished, and then checks the status and updates
that entry's checked database entry. That list is then
sorted to make the higher status entries show up first, and those with
changed titles above those without. Finally, there's a trio of
do_ methods to handle requests to maintain the
database. These all fetch the item with the get_item method,
make the change, fetch the new version of that item - except for
do_delete - then display the items (or item, for
delete) with the appropriate format string.
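
The start, join and sort sequence in get_list can be sketched roughly as below. The Check class, the canned results table and the dictionary keys are hypothetical stand-ins so the example runs on its own; the database update is left out because its details are not shown here.

```python
import threading


class Check(threading.Thread):
    """Hypothetical stand-in for checkedurl; results come from a canned table."""

    def __init__(self, entry, canned):
        super().__init__()
        self.entry = entry
        self.canned = canned   # assumed mapping: url -> (status, title)
        self.status = None
        self.title = None

    def run(self):
        self.status, self.title = self.canned[self.entry["url"]]


def check_list(entries, canned):
    # Create one thread per entry and start them all before waiting on any,
    # so that slow or timing-out fetches overlap instead of queuing up.
    checks = [Check(entry, canned) for entry in entries]
    for check in checks:
        check.start()

    results = []
    for check in checks:
        check.join()  # wait until this particular check has finished
        entry = dict(check.entry)
        entry["status"] = check.status
        entry["title"] = check.title
        entry["title_changed"] = (check.title is not None
                                  and check.title != entry["description"])
        results.append(entry)

    # Higher status codes first; within one status, changed titles first.
    results.sort(key=lambda e: (e["status"], e["title_changed"]), reverse=True)
    return results
```

Sorting on a (status, title_changed) tuple in reverse puts the problem entries at the top and, among entries with the same status, lists those whose title no longer matches the description first.
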
checklist.py is installed
the same way that hotlist.py was - by copying it to
/usr/local/www/cgi-bin. Be warned that it can take quite a
while to run, as it checks every item in the hotlist, and some of
those may have to time out.
Like the code described for hotlist, most of the work is done by C code in other applications. The only new items of work are fetching the document and sorting the list. Fetching the document involves quite a bit of Python code to parse the URL and the result, but is still dominated by the time taken to fetch the document over the network. The list sorting all happens in C. So once again, the fact that we're using an interpreted language won't make much difference in the application's speed.
This ability to modify data on the server brings with it a need for some form of security for that data.