Decoding World Library CD-ROMs
------------------------------

The enclosed program wlcat decodes the .etx file on a World Library CD-ROM
and can be called as a pipe, with a file name, or with three arguments:
file name, start byte, and end byte (relative to the decoded text).
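
As a rough sketch, the three calling conventions can be wrapped in small
shell functions. The function names and the WLCAT variable are made up
for this example, and whether the end byte is inclusive is not something
these notes establish.

```shell
# Path to the decoder; set WLCAT beforehand if wlcat lives elsewhere.
: "${WLCAT:=./wlcat}"

# Decode a whole .etx file named as an argument.
wl_file()  { "$WLCAT" "$1"; }

# Decode a whole .etx file fed on standard input (the "pipe" form).
wl_pipe()  { "$WLCAT" < "$1"; }

# Decode only the range START..END, counted in bytes of decoded text.
wl_range() { "$WLCAT" "$1" "$2" "$3"; }
```

For example, wl_range ehl.etx 0 2000 should print roughly the first
2000 bytes of decoded text.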

This .etx file contains all of the text, but I don't have a good way
to find a particular article.  There must be a file that stores byte
offsets relative to the encoded text, though I haven't found it: the
included DOS/Windows browsers run far too quickly to be decoding
everything.

If you have a fast CD-ROM and don't mind using really gross code,
try this:

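    #!/bin/sh
    # Usage: pass the path to the data files minus the extension.
    # Show a numbered list of titles so the user can pick an article.
    cut -c5-79 "$1.ftf" | awk '{ printf("%2d %s\n", NR, $0) }' | less
    printf 'Article Number = '
    read REPLY
    # Lines REPLY and REPLY+1 of the .lim file give the article's start
    # and end offsets in the decoded text.
    START=`awk -v n="$REPLY" 'NR == n { print $1 }' "$1.lim"`
    END=`awk -v n="$REPLY" 'NR == n + 1 { print $1 }' "$1.lim"`
    ./wlcat "$1.etx" $START $END | less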
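
Paging through every title gets tedious, so here is a minimal sketch
that searches the titles instead.  The function name find_article is
made up, and it leans on the same assumptions as the script above: line
N of the .ftf file describes article N, and the title starts in
column 5.

```shell
# find_article BASENAME PATTERN
# Print "article-number title" for each title matching PATTERN,
# ignoring case.  BASENAME is the data file path minus its extension.
find_article() {
    # grep -n prefixes each match with "LINE:"; turn that first colon
    # into a space so the output resembles the numbered list above.
    cut -c5-79 "$1.ftf" | grep -n -i -e "$2" | sed 's/:/ /'
}
```

The number it prints is the one to type at the "Article Number" prompt.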

The enccat program can decode some auxiliary files, as described in the
list below.


FILES

These are the files in the data directory of a World Library CD.
This data is based on the Electronic Home Library disc, but is similar
for the other discs I've seen (change the first part of the filename,
of course).

    [?] = unknown format
    [A] = etx format
    [B] = enc format
    [P] = plaintext

ehl.abb -- [?] 
ehl.aut -- [P] List of authors, 
               author number is line in this file, starting with 0.
ehl.dic -- [P] List of words appearing in the text?
ehl.enx -- [B] Index to illustrations
               first column = article number,
               second = long name of picture,
               third = filename of picture,
               fourth = a number (what does it mean?)
ehl.esm -- [B] Bios & descriptions of articles.
ehl.etx -- [A] Encoded text of all articles.
ehl.fst -- [?] authors, then titles, then gibberish
ehl.fta -- [P] numbers?
ehl.ftf -- [P] first column is author number from ehl.aut, second = title
ehl.ftl -- [P] first column = some number, second = title
ehl.ftt -- [P] numbers?
ehl.id  -- [P] Type, date, and place of articles.
               1st column is always "$", 2 == era, 3 == age, 4 == year, 
               5 == century, 6 == continent, 7 == country, 8/9 == type.
               (entries are abbreviations -- see version.enc for full values)
ehl.ita -- [P] numbers?
ehl.itt -- [P] numbers?
ehl.ixb -- [?]
ehl.lad -- [?] 
ehl.lb  -- [?] 
ehl.lim -- [P] first column is byte offset to the article in the decoded text.
ehl.msk -- [?] (16 groups of mask bits?)
ehl.sb  -- [?] (5700 screen bytes?)
ehl.scr -- [?] offsets?
ehl.swm -- [?] 
ehl.tbb -- [?]
ehl.tl  -- [P] list of texts by original file name
ehl.wfs -- [?] authors, titles, then gibberish

ehlsn.dic -- [P] stop words for searches
ehlsp.dic -- [P] stop words for searches
ehlsp.ixb -- [?]
ehlsp.tbb -- [?]
ehlss.dic -- [P] stop words for searches, special phrases
ehlss.msk -- [?]
ehlsw.dic -- [P] stop words for searches

exittext.enc -- [B] decode with "enccat", exit text
version.enc  -- [B] decode with "enccat", contains info on files/categories
wexittxt.enc -- [B] decode with "enccat", exit text

*.pcx -- illustrations in PCX format.
lfwill/*.wmf -- illustrations in Windows metafile format?
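
As an illustration of how the .ftf and .aut files fit together, here is
a rough sketch that prints the author of a given article.  The function
name author_of is made up, and it assumes that line N of the .ftf file
describes article N and that the author number is the first
whitespace-separated field on that line.

```shell
# author_of BASENAME ARTICLE
# Print the author of article number ARTICLE (counting from 1).
author_of() {
    # The first field of the article's .ftf line is the author number.
    num=`awk -v n="$2" 'NR == n { print $1; exit }' "$1.ftf"`
    [ -n "$num" ] || return 1
    # Author numbers start at 0, so author N sits on line N+1 of .aut.
    awk -v n="$num" 'NR == n + 1 { print; exit }' "$1.aut"
}
```

It prints nothing and returns 1 if the article number is past the end
of the .ftf file.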
