Warning - this document is out of date, and many of the links are to missing files. I don't plan on updating it, and will delete it at some later point in time. However, it still provides a good example of how you can turn a document structured by its document type into HTML. These days, you should consider using XML instead of SGML, as XML is a simpler format.
The source for the FAQ is written in Standard Generalized Markup Language (SGML), using a variant of a FAQ DTD prepared by readers of comp.text.sgml. This allows easy translation into both an HTML version and a text version of the FAQ, and also makes it possible to build a search mechanism (currently under construction).
The SGML sources are first parsed with James Clark's freely distributable SGML parser sgmls. The parsed version (you really don't want to look at this!) is in a format that is very easy to deal with, and all subsequent processing uses that format.
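To illustrate why the parsed format is easy to handle, here is a minimal sketch in Python (the original tools used Perl) that reads sgmls output, the line-oriented ESIS format. It assumes each line begins with a one-character command: ( for a start tag, ) for an end tag, - for character data, and A for an attribute that belongs to the next start tag. Only those four commands and the \n and \\ data escapes are handled; parse_esis and unescape are hypothetical names, not part of sgmls.

```python
from typing import Dict, Iterable, Iterator, Tuple

# (kind, value, attributes): kind is "start", "end" or "data".
Event = Tuple[str, str, Dict[str, str]]


def unescape(data: str) -> str:
    # Decode backslash-n (record end) and double backslash; any other
    # escape (octal codes, SDATA brackets) is passed through unchanged.
    out = []
    i = 0
    while i < len(data):
        if data[i] == "\\" and i + 1 < len(data) and data[i + 1] in "n\\":
            out.append("\n" if data[i + 1] == "n" else "\\")
            i += 2
        else:
            out.append(data[i])
            i += 1
    return "".join(out)


def parse_esis(lines: Iterable[str]) -> Iterator[Event]:
    """Yield (kind, value, attributes) tuples from sgmls ESIS output."""
    pending: Dict[str, str] = {}  # attributes seen before the next start tag
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        code, rest = line[0], line[1:]
        if code == "A":
            # "Aname TYPE value"; implied attributes have no value part.
            parts = rest.split(" ", 2)
            pending[parts[0]] = parts[2] if len(parts) > 2 else ""
        elif code == "(":
            yield ("start", rest, pending)
            pending = {}
        elif code == ")":
            yield ("end", rest, {})
        elif code == "-":
            yield ("data", unescape(rest), {})
        # Every other command is ignored in this sketch.
```

For example, parse_esis(["AID CDATA q1", "(QUESTION"]) yields a single event, ("start", "QUESTION", {"ID": "q1"}).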
The first step is to produce an HTML version of the file for publishing on the World Wide Web. This is done using a Perl script that implements an SGML processing step, which does a number of things to prepare for further processing:
The resulting file - still in the format of parsed SGML - is then turned into HTML by the sgmlsasp program, which is distributed with sgmls. The conversion is controlled by a representation file, which specifies what is to replace each open and close tag in the source file, including how attributes are handled.
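The idea behind the representation file can be sketched as a table that maps each element to replacement text for its start and end tags. This Python sketch works on (kind, name, attributes) event tuples like those a parser for sgmls output would produce; the rule table, the element names FAQ, QUESTION and ANSWER, and the [NAME] placeholder for attribute values are assumptions for illustration, not sgmlsasp's actual file syntax.

```python
import html
import re
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, str, Dict[str, str]]

# Hypothetical rules: element name -> (replacement for the start tag,
# replacement for the end tag). "[NAME]" is filled from attribute NAME.
RULES: Dict[str, Tuple[str, str]] = {
    "FAQ": ("<HTML><BODY>\n", "</BODY></HTML>\n"),
    "QUESTION": ('<H2><A NAME="[ID]">', "</A></H2>\n"),
    "ANSWER": ("<P>", "</P>\n"),
}

PLACEHOLDER = re.compile(r"\[([A-Z0-9.-]+)\]")


def fill(template: str, attrs: Dict[str, str]) -> str:
    # Substitute each [NAME] with the escaped attribute value (empty if absent).
    return PLACEHOLDER.sub(lambda m: html.escape(attrs.get(m.group(1), "")), template)


def to_html(events: Iterable[Event], rules: Dict[str, Tuple[str, str]]) -> str:
    out: List[str] = []
    open_attrs: List[Dict[str, str]] = []  # attributes of currently open elements
    for kind, value, attrs in events:
        if kind == "start":
            open_attrs.append(attrs)
            out.append(fill(rules.get(value, ("", ""))[0], attrs))
        elif kind == "end":
            # End replacements may refer to the attributes of their start tag.
            out.append(fill(rules.get(value, ("", ""))[1], open_attrs.pop()))
        elif kind == "data":
            out.append(html.escape(value, quote=False))
    return "".join(out)
```

Elements without a rule contribute only their content, and keeping a stack of the open elements' attributes lets an end-tag replacement refer to attributes given on the start tag.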
Finally, a table of contents is added by Earl Hood's htmltoc program. While htmltoc is not an SGML processor, it produces HTML files that validate against the HTML 2.0 DTD.
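htmltoc's own options and behaviour are not described here; the following is only a rough Python sketch of the general technique, assuming headings are written as plain <H1>...</H1> tags without attributes. It gives each heading up to a chosen level a named anchor and builds a nested list of links to those anchors. add_toc and the tocN anchor names are hypothetical.

```python
import re
from typing import List, Tuple

HEADING = re.compile(r"<(H[1-6])>(.*?)</\1>", re.IGNORECASE | re.DOTALL)


def add_toc(html_text: str, max_level: int = 3) -> Tuple[str, str]:
    """Return (toc_html, body_html) with an anchor added to each heading."""
    entries: List[Tuple[int, str, str]] = []  # (level, anchor name, title)

    def add_anchor(m) -> str:
        tag, title = m.group(1), m.group(2)
        level = int(tag[1])
        if level > max_level:
            return m.group(0)  # too deep: leave the heading untouched
        name = f"toc{len(entries) + 1}"
        entries.append((level, name, title))
        return f'<{tag}><A NAME="{name}">{title}</A></{tag}>'

    body = HEADING.sub(add_anchor, html_text)

    # Build nested lists; assumes levels start at 1 and never skip (H1 to H3).
    lines: List[str] = []
    depth = 0
    for level, name, title in entries:
        while depth < level:
            lines.append("<UL>")
            depth += 1
        while depth > level:
            lines.append("</UL>")
            depth -= 1
        lines.append(f'<LI><A HREF="#{name}">{title}</A>')
    lines.extend(["</UL>"] * depth)
    return "\n".join(lines), body
```

The table and the anchored body are returned separately, leaving it to the caller to decide where in the page the table goes.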
The result of all this processing is the HTML version of the FAQ.
The text version is produced from the HTML version by Gary Houston's General Formatter, gf, with a style guide prepared for this purpose. gf is an SGML application that formats documents written to a number of different DTDs into different presentation languages, ASCII among them.
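gf's style guide mechanism is not shown in this document. As a stand-in, here is a minimal Python sketch of the same kind of HTML-to-ASCII formatting, using the standard library's html.parser and textwrap modules rather than gf: block elements become paragraphs filled to 72 columns, headings are underlined, and list items get a bullet. The layout choices and the names TextFormatter and html_to_text are assumptions.

```python
import textwrap
from html.parser import HTMLParser
from typing import List, Optional

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
BLOCK_TAGS = {"title", "p", "li"} | HEADINGS


class TextFormatter(HTMLParser):
    """Collect block-level text from HTML and lay it out as plain ASCII."""

    def __init__(self, width: int = 72) -> None:
        super().__init__()  # convert_charrefs=True turns &amp; into & for us
        self.width = width
        self.paragraphs: List[str] = []
        self.buf: List[str] = []
        self.current: Optional[str] = None  # block tag being collected

    def _emit_block(self) -> None:
        text = " ".join("".join(self.buf).split())  # collapse whitespace
        if text:
            if self.current in HEADINGS:
                self.paragraphs.append(text + "\n" + "-" * len(text))
            elif self.current == "li":
                self.paragraphs.append(textwrap.fill(
                    text, self.width, initial_indent="  * ", subsequent_indent="    "))
            else:
                self.paragraphs.append(textwrap.fill(text, self.width))
        self.buf = []

    def handle_starttag(self, tag, attrs) -> None:
        if tag in BLOCK_TAGS:
            self._emit_block()  # a new block ends the previous one (</P> may be omitted)
            self.current = tag

    def handle_endtag(self, tag) -> None:
        if tag in BLOCK_TAGS:
            self._emit_block()
            self.current = None

    def handle_data(self, data) -> None:
        self.buf.append(data)


def html_to_text(html_text: str, width: int = 72) -> str:
    formatter = TextFormatter(width)
    formatter.feed(html_text)
    formatter.close()
    formatter._emit_block()  # flush any text left after the last tag
    return "\n\n".join(formatter.paragraphs) + "\n"
```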
The headers are pulled from the parsed SGML file so they can be added to the text version for posting. The Perl script that pulls the headers builds a file ready to be concatenated with the text version and posted.
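A minimal sketch of the header step, written in Python rather than the Perl of the original script: it collects the text of selected elements from the parsed (ESIS) output and prepends them as "Name: value" lines, followed by a blank line, to the text version. The element names in HEADER_ELEMENTS are hypothetical, since the FAQ DTD's header elements are not named here.

```python
from typing import Dict, Iterable, Optional

# Hypothetical element names: the FAQ DTD's real header elements are not
# given in the text, so these are placeholders.
HEADER_ELEMENTS: Dict[str, str] = {
    "SUBJECT": "Subject",
    "NEWSGROUPS": "Newsgroups",
    "ARCHIVENAME": "Archive-name",
}


def extract_headers(esis_lines: Iterable[str]) -> Dict[str, str]:
    """Collect the text of header elements from sgmls ESIS output."""
    headers: Dict[str, str] = {}
    current: Optional[str] = None  # header element currently open
    for line in esis_lines:
        code, rest = line[:1], line[1:].rstrip("\n")
        if code == "(" and rest in HEADER_ELEMENTS:
            current = rest
            headers[HEADER_ELEMENTS[rest]] = ""
        elif code == ")" and rest == current:
            current = None
        elif code == "-" and current is not None:
            # Turn ESIS record-end escapes into spaces (other escapes left as-is).
            headers[HEADER_ELEMENTS[current]] += rest.replace("\\n", " ")
    return headers


def build_post(headers: Dict[str, str], body_text: str) -> str:
    """Prepend 'Name: value' header lines and a blank line to the body."""
    head = "".join(f"{name}: {value.strip()}\n" for name, value in headers.items())
    return head + "\n" + body_text
```

The string returned by build_post plays the role of the concatenated file: headers first, then the formatted text, ready to post.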