CRCL, Bangkok -- Web Site Maintenance Software
OVERVIEW | CRCL | CALLS | DICTS | FONTS | SOFTWARE | PAPERS | PROJECTS | WHO?... | LISTS | SPOKEN... | REF CARDS | SEEKING | BASICS... | HOW?... | CLOCKS | LOCAL | CONTENTS...

SOUTHEAST ASIAN
COMPUTING AND LINGUISTICS

Produced by Doug Cooper / Center for Research in Computational Linguistics,
Bangkok. Presented in cooperation with . . . ( 0.2 -- comment only )


WEB SITE MAINTENANCE SOFTWARE

BASICS

All the standard rules of software production apply to writing HTML pages. In particular, remember that the sooner you begin to code, the longer it will take, and that while getting something to run today is easy, figuring out what it does two months down the road is rather difficult.
The design of HTML, which creates all the problems of modular construction without conferring any of the benefits, doesn't help. The basic problem is that there is no way to make symbolic definitions of macros, procedures, variables, and the like. As a result, dozens of files may have to be updated whenever you make any changes.
The programs described here are all management tools for HTML files. They have all been implemented as Kornshell scripts, and have been tested and run using the MKS UNIX toolkit -- a set of UNIX tools implemented under DOS. The code is slightly roundabout, in an attempt to avoid any cleverness whatsoever, and to work under an OS that might crash at any moment. The scripts should (famous last words) run under any standard UNIX implementation.
Problems they help solve include:
  • Keeping the content of features (like the jump bar at the top and bottom of each page) consistent across all the pages.
  • Checking that local file references actually exist, and making it easy to deal with the ones that don't.
  • Producing a list of remote links for testing.
  • Producing a complete list of local files actually used by the .html pages (rather than all the outdated, unused, and backup files that happen to be in the same directories).
  • Producing a complete list of actually-used files modified since the last site archive was created (so that you don't have to upload all those zips and gifs every time).
My web pages are developed on a standalone PC using HotDog and Netscape 1.22. All relevant files are collected (as described below), then uploaded to a Web archive.
As far as possible, this software has NO FEATURES! Please make sure that you understand how it works before you use it.

NAMING CONVENTIONS

A few simple conventions make maintenance much easier. First, all files are in subdirectories at the same relative level:
              www     ... subdirectories only -- no files
   /           |        \
 main         font      gif    ... other directories
  |            |         |             
main.htm    font.htm     foo.gif    ... other files
The top-level directory, www, contains only subdirectories. Eventually, when it is installed, it will contain a "main.html" file that is linked to main/main.htm.
In the html files, every local reference is given as a full path name relative to the parent directory:
  • main.htm refers to a file in ../gif as ../gif/file.gif, as expected.
  • main.htm refers to another file in its own directory as ../main/file.htm
Why do things this way? So that every local file reference has the exact same form, no matter where it occurs. This makes it far, far easier to maintain the whole suite of pages.
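Because every reference has the same shape, the convention is easy to check mechanically. The following is a minimal sketch in Python, not the author's Kornshell code; the names `follows_convention` and `to_convention` are hypothetical, and the sketch assumes that a conforming reference is exactly `../<directory>/<file>`.

```python
import re

# Hypothetical helpers, not part of the original scripts.
# Assumption: a conforming local reference is exactly "../<directory>/<file>".
CONVENTION = re.compile(r"^\.\./[^/]+/[^/]+$")

def follows_convention(ref: str) -> bool:
    """Return True if ref has the form ../dir/file."""
    return CONVENTION.match(ref) is not None

def to_convention(ref: str, own_dir: str) -> str:
    """Rewrite a bare file name (e.g. 'file.htm') into the ../dir/file form.

    own_dir is the directory holding the referring page (e.g. 'main').
    References that already follow the convention are returned unchanged;
    anything else is rejected rather than guessed at.
    """
    if follows_convention(ref):
        return ref
    if "/" not in ref:
        return f"../{own_dir}/{ref}"
    raise ValueError(f"cannot normalise reference: {ref!r}")
```

For example, `to_convention("file.htm", "main")` gives `../main/file.htm`, the form used in the second case above.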

FEATURE NAMING

Whenever possible, features that are repeated from page to page are named. If they are ever modified, two tools distribute the new versions across the suite. The tools are:
  • update.ksh collects the features from a 'model' file (usually main/main.htm), and creates a revision file for each feature.
  • revise.ksh performs various safety checks, then inserts the revisions, as appropriate, into each html file in www/*/*.htm
Features look like this:
<!--feature pageback version 26861-->
<BODY background="../gif/edgepr1.gif" bgcolor="ffffff">
<!--endfeature pageback -->
The number (26861) is inserted by the revision program itself. Other named features include the page masthead and the list of links at the top and bottom of each page.
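The collect-then-insert cycle can be illustrated roughly as follows. This is a sketch in Python under stated assumptions, not a transcription of update.ksh or revise.ksh: the marker grammar is inferred from the single example above, the function names are hypothetical, and the version number is simply supplied by the caller, since the page does not say how the original program derives it. The safety checks that revise.ksh performs are not reproduced.

```python
import re

# Assumed marker grammar, inferred from the one example shown:
#   <!--feature NAME version N-->
#   ...body...
#   <!--endfeature NAME -->
FEATURE = re.compile(
    r"<!--feature (?P<name>\w+)(?: version \d+)?\s*-->\n"
    r"(?P<body>.*?)"
    r"<!--endfeature (?P=name)\s*-->",
    re.DOTALL,
)

def collect_features(model_html: str) -> dict:
    """Return {feature name: body} for every named feature in the model page."""
    return {m.group("name"): m.group("body") for m in FEATURE.finditer(model_html)}

def revise(page_html: str, features: dict, version: int) -> str:
    """Replace each known feature in page_html with the model's body.

    Features the page does not contain are not added, and features that
    are missing from the model are left untouched.
    """
    def swap(m):
        name = m.group("name")
        if name not in features:
            return m.group(0)  # unknown feature: keep the page's own text
        return (f"<!--feature {name} version {version}-->\n"
                f"{features[name]}"
                f"<!--endfeature {name} -->")
    return FEATURE.sub(swap, page_html)
```

Keying the replacement on the feature name means a page only receives the features it already declares, which matches the "as appropriate" behaviour described above.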

CHECKING AND ARCHIVING

The biggest headache with a large set of pages is making sure that all the links are actually there. Both of these programs search all ../*/*.htm files (i.e. every .htm file at the same level):
  • gethttp.ksh collects external references of the form "http:", "mailto:", "gopher:", and "ftp:" from all the HTML files, then stores both the references, and the files they appear in, in a new file called httplist.htm, where you can check the references at leisure.
  • getref.ksh looks for local file references. It looks to see if the files actually exist, then creates:
    • ziplist.all -- a complete list of files that do exist,
    • ziplist.not -- files that do not exist,
    • ziplist.htm -- all files, together with the origin files of any references to files that do not exist,
    • ziplist.new -- files that do exist AND have been modified since the creation date of a file named "htm.zip" (which should be in the same directory).
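The two scans can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the logic of gethttp.ksh or getref.ksh: it assumes references appear as double-quoted HREF, SRC, or BACKGROUND attribute values, the function names are hypothetical, and it does not write the httplist or ziplist files.

```python
import re
from pathlib import Path

# Assumption: references are double-quoted HREF/SRC/BACKGROUND values.
REF = re.compile(r'(?:href|src|background)\s*=\s*"([^"]+)"', re.IGNORECASE)
# The four external forms named in the text. Any other scheme would be
# treated as a local reference by this sketch.
EXTERNAL = ("http:", "mailto:", "gopher:", "ftp:")

def split_references(html: str):
    """Return (external, local) references found in one page, in order."""
    external, local = [], []
    for ref in REF.findall(html):
        if ref.lower().startswith(EXTERNAL):
            external.append(ref)
        elif not ref.startswith("#"):           # skip same-page anchors
            local.append(ref.split("#", 1)[0])  # drop any #fragment
    return external, local

def check_local(page: Path, refs):
    """Split refs into (existing, missing), resolved against the page's directory."""
    existing, missing = [], []
    for ref in refs:
        target = page.parent / ref
        (existing if target.is_file() else missing).append(ref)
    return existing, missing
```

Run over every page, the `existing` lists correspond roughly to ziplist.all and the `missing` lists to ziplist.not.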

These programs solve a variety of consistency and transport problems. httplist.htm and ziplist.htm make it easy to do final checks on referenced files. File ZIPLIST.NOT usually contains misspelled references, place-holders you forgot to get rid of, or path names you forgot to change.
ZIPLIST.ALL should be used the first time you archive the site: pkzip -p htm.zip @ziplist.all. This creates a zipfile, complete with path names, of every file in the site.
Thereafter, ZIPLIST.NEW should be used as the final argument. It includes only files that have been modified since the creation date of HTM.ZIP. (This is slightly clunky, but was easiest to do with MKS Kornshell.)
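The "newer than the archive" test amounts to a timestamp comparison. A minimal Python sketch follows, with the caveat that it compares modification times (the text speaks of the archive's creation date, and how the original script reads that date is not shown); the function name `modified_since` is hypothetical.

```python
from pathlib import Path

def modified_since(archive: Path, files):
    """Return the files whose modification time is later than the archive's.

    Assumption: the archive's own modification time stands in for its
    creation date. If the archive does not exist yet, every file is new.
    """
    if not archive.exists():
        return list(files)
    cutoff = archive.stat().st_mtime
    return [f for f in files if f.stat().st_mtime > cutoff]
```

Passing the list of existing files from the reference check through this filter yields the candidates for an incremental upload.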
When you unpack the zip file, remember that "pkunzip -d htm.zip" will preserve path names and create directories as needed.



All original work © 1995 Doug Cooper. Please see this disclaimer, which takes responsibility for content, and the release notice, which gives you the right to copy it. We believe that all files referenced by these pages may be distributed for research / educational purposes. If any file should not be distributed, please let us know and we will remove it.