
Using programs to summarize a format

The second new file format is the ``Abstract'' type: a file that contains only the text of a paper abstract (a format that is common in technical report FTP archives). To recognize files written in this format, we'll use the naming convention that the filename for ``Abstract'' files ends in ``.abs''. We add this type recognition customization to the lib/byname.cf file as a regular expression:

        Abstract                ^.*\.abs$
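To check which filenames the pattern above would select before adding it to lib/byname.cf, the same regular expression can be tried with grep. This is only a rough sketch: the helper name is_abstract and the sample filenames are made up for illustration, and it assumes the Gatherer applies the expression to the filename in the same way grep applies a basic regular expression. Note that the leading ``^.*'' is redundant for matching purposes; the pattern selects the same names as ``\.abs$''.

```shell
#!/bin/sh
# Hypothetical helper (not part of Harvest): succeeds if the given
# filename matches the Abstract pattern from lib/byname.cf.
is_abstract() {
    printf '%s\n' "$1" | grep -q '^.*\.abs$'
}

# Sample filenames, invented for illustration.
for f in tr-95-01.abs tr-95-01.ps notes.abs.txt; do
    if is_abstract "$f"; then
        echo "$f: Abstract"
    else
        echo "$f: no match"
    fi
done
```

Only the first name ends in ``.abs'', so it is the only one reported as Abstract; the ``$'' anchor keeps notes.abs.txt from matching.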

Another way to write a summarizer is to write a program or script that takes a filename as the first argument on the command line, extracts the structured information, then outputs the results as a list of SOIF attribute-value pairs (see Appendix B.3 for further information on how to write a program that can produce SOIF). Summarizer programs are named TypeName.sum, so we call our new summarizer Abstract.sum. Remember to place the summarizer program in a directory that is in your path so that the Gatherer can run it. Abstract.sum, shown below, is a Bourne shell script that takes the first 50 lines of the file, wraps them as the ``Abstract'' attribute, and outputs the result as a SOIF attribute-value pair.

        #!/bin/sh
        #
        #  Usage: Abstract.sum filename
        #
        # Take the first 50 lines and wrap them as the "Abstract" attribute
        head -n 50 "$1" | wrapit "Abstract"
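
To try the summarizer logic on a machine where the Harvest wrapit program is not available, something like the following stand-in may help. It is a minimal sketch, not the real wrapit: the function names soif_wrap and abstract_sum are hypothetical, and the output layout assumes the SOIF attribute form of a name, the value size in bytes inside braces, a colon and a tab, then the value. The real wrapit may differ in details such as trailing newlines.

```shell
#!/bin/sh
# Hypothetical stand-in for Harvest's wrapit, for local testing only.
# Reads the value from stdin and prints:  NAME{SIZE}:<tab>VALUE
soif_wrap() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"                          # buffer stdin so it can be counted
    size=$(wc -c < "$tmp" | tr -d ' ')    # value size in bytes
    printf '%s{%s}:\t' "$1" "$size"
    cat "$tmp"
    rm -f "$tmp"
}

# Same pipeline as Abstract.sum, with the stand-in in place of wrapit.
abstract_sum() {
    head -n 50 "$1" | soif_wrap "Abstract"
}
```

Calling abstract_sum with the name of a ``.abs'' file prints a single attribute whose size field counts the bytes in the first 50 lines of that file.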






Darren Hardy
Mon Apr 3 15:22:37 MDT 1995