Lab 3 - Document Index
Due: Wednesday, September 27, 2006 at 11:59pm

Introduction

This lab asks you to construct a system that indexes a document and then allows it to be searched by keyword. When a keyword is presented, each and every line that contains at least one match should be displayed.

The key data structure for the index is an inverted list. An inverted list is a mapping from a word to a list of its occurances. When a word is looked up in the table, the list of lines in which the word appeared will be returned.

In the case of this assignment, these lines will be printed. In order to achieve this, an ArrayList is used. The ArrayList keeps each line, indexed by line number. This makes it easy to convert a line number into the lines text

The Index class

The Index class is basically the inverted list. It should support the following methods:

There is one complication here, though: What to do about collision. You've got a couple of options here. Each bucket can be a list of lists: a list of words, each of which is a list of occurances. Or, you can use a single list, but create a Match class that contains both the word and the line number. Then, before returning a list of matches, you can traverse this and produce the simple list of line numbers that match the requested key word.

The DocumentIndexing Program

This program should accept a file name on the command line (args[0]). It should create a new instance of Index and a new ArrayList to hold lines.

It should open the named text file and process it one line at a time. Each line shoudl be added, as a whole, to the ArrayList. It should also be given to the Index for inclusion. Each line should be given a sequentially higher line number corresponding to the lines index within the ArrayList. You may find Scanner's nextLine() method helpful. You may also find StringTokenizer helpful.

After processing the file, it should enter its main work loop. In this loop, it should give the user two options: 1) Search, 2) Quit. If the user chooses to quit -- the program is done.

Otherwise, it should prompt the user for a keyword. It should then look this word up in the index. If it isn't found, it shold print "Not found". If it is found (the List that is returned is non-empty), it should traverse this list, printing out the full text of each matching line. Recall that the ArrayList contains this matching text. The program should then loop back and repeat (until the user chooses to quit).

The results should be presentable and neat. One suggestion is to print the search word. And, then, beneath it and indented by a tab, each result (or "Not found").