Well, at this point I’m hoping that you’ve seen the method to this course from our discussion of Carol Tenopir’s article. In our “tread with dinosaurs,” we’ll be getting to see an incredibly powerful search interface to an enormous suite of databases. Many of these databases will be familiar – from your experience with access to the databases supplied by other vendors. We’ll use this database searching system to see all of the power that you might experience with other interfaces throughout your career. It will be an experience that you can always use to explore the databases that you meet when you’re walking down the Information Superhighway. When it comes to all-around capability, everything else will likely fall short. You may find yourself saying … “I know I could do that with Dialog … is there a way with this search interface?” The only drawback is that it is nearly devoid of a graphical user interface. You’ll be constructing the actual commands in DIALOG search query syntax.

Oh … snips and snails and puppy dog tails – that’s what databases are made out of … OK, so they’re not. Databases are also referred to as files or datafiles. Sometimes a database is split into subfiles, but not necessarily. The ERIC database is a good example of a database that is split into two subfiles – the Current Index to Journals in Education (CIJE) and Resources in Education (RIE). The important thing to remember is that some databases will have subfiles and some won’t. When they exist, they can sometimes be helpful in crafting a search strategy. All databases will be made up of database records. Each database record is made up of fields. Sometimes fields are divided into subfields. Perhaps an example will help.

Here’s an example from the online catalog of a library. The online catalog is an example of a database. The record is an example of the familiar bibliographic record. It simply describes a book. The fields should also be familiar: author, title, publisher, etc. The book happens to be one of my favorites. It was my textbook for a course in US and Canadian geography. The book suggested that North America could be split into nine different nations that would make more sense than the three that we currently have. The author explains why he picks the borders that he does. One memorable one was a diagonal line from northwest Connecticut to southeast Connecticut. People to the north and east of this line have a very strong tendency to be Boston Red Sox fans and people to the south and west of the line tend to be New York Yankees fans. Sorry Mets fans … I don’t think he paid them much notice!

The record was too long to fit on one screen. Maybe we should look at the underlying structure …

I know! I know! Eeek! It’s the dreaded MARC record – well, at least some of the variable fields. The numbers on the extreme left are the tags for the fields of the records. They’re unique to the MARC record and you’ll often see them displayed as the more human friendly “Personal Author”, “Title”, etc. Between the colons are what are known as “indicators.” They are truly unique to the MARC record. From my forever dimming memory, I can only recall that the “4” means to ignore the first four character spaces or “The “ – the initial article. The “|b” or “|c” are known as “delimiters” and signify the beginnings of subfields of a field.
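
If it helps to see the tag, indicator, and delimiter idea in code, here is a minimal Python sketch. The field line is made up for illustration (it is not copied from the slide), and the "tag:indicators: data" layout is only my assumption about how the display is arranged. Real MARC records are stored in a different layout, so treat this as a toy.

```python
# A made-up display line: tag, indicators between colons, then the field
# data with "|x" subfield delimiters. (Hypothetical, not the slide's record.)
line = "245:14: The example title|b a made-up subtitle|c A. N. Author"

def parse_field(display_line):
    """Split one displayed field into tag, indicators, and subfields."""
    tag, indicators, data = display_line.split(":", 2)
    chunks = data.split("|")
    # Text before the first delimiter is treated here as subfield "a".
    subfields = [("a", chunks[0].strip())]
    for chunk in chunks[1:]:
        subfields.append((chunk[0], chunk[1:].strip()))
    return {"tag": tag, "indicators": indicators, "subfields": subfields}

field = parse_field(line)
print(field["tag"], field["indicators"])
for code, text in field["subfields"]:
    print(code, "->", text)

# The "4" says how many leading characters to skip, so "The " is ignored.
skip = int(field["indicators"][1])
print(field["subfields"][0][1][skip:])
```

The last line prints the title without its initial article, which is the whole point of that indicator.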

… and this would be the continuation of the underlying structure. Can you pick out the subfields?

Now I’ll try my very best not to confuse you. This is part of an actual database that I built so that my wife could use a title listing to know which books were already in my “presidential library.” I’m going to try and describe how database records are “parsed” or chopped up into parts that the computer can recognize as “words.” The end result would be the production of an “inverted file” – an alphabetized listing of what words appear in what position of which record. In addition to this example, see Walker and Janes pages 55 through 63 and the GEP DIALOG Lab Workbook pages 3-6 through 3-10. This screen and the next two show record numbers (I’ve designated them ‘RN’) 101, 102, and 103. My database has three fields Author (AU), Title (TI), and Subject (SU). We’re going to build the inverted file for this tiny database of three records by hand.

Familiarize yourself with each record … you should begin to wonder … just how is Matt going to chop up his database?

This is a very good book by the way …

How we chop the record up into words and phrases will lead to what the inverted file will look like.

Well, I’ve been introducing an awful lot of vocabulary through these first ten slides. Remember that Walker and Janes has a pretty nice glossary. The existence of a basic index versus additional indexes is extremely important for searchers to understand. From a searcher’s perspective, whenever you search a database without specifying a particular field to search, the basic index is searched. It’s kind of a default instruction for the computer to follow. It’s also VERY important to realize that the basic index in one database can be different from the basic index in another database. To complicate matters even further, a database that you’ll find on the DIALOG service might also be found in EBSCOHost or FirstSearch. Even though they host some of the same databases, they may choose to index them in different ways. Sometimes a database host might choose to leave certain fields completely out of the indexing process. The important lesson to learn here would be that the savvy librarian must be ready for anything and should endeavor to find out what is defined as the basic index and which additional indexes exist for all the databases that they search. Thankfully you don’t have to memorize all that junk! You just need to know how to look up this information and how to best articulate the situation to your end users. We’ll learn about how to do this with Dialog and that will provide you with a very good example for the future.
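
Here is a small Python sketch of why the basic index matters. The record, the two host names, and the choice of fields in each basic index are all hypothetical (they are NOT the real settings of any vendor). The only point is that the same unqualified search can succeed on one host and fail on another.

```python
# One made-up record, as it might be loaded by two different hosts.
record = {
    "TI": "Fear in the classroom",
    "AB": "Looks at anxiety about mathematics.",
    "AU": "Doe, Jane",
}

# Hypothetical choices: each host decides which fields make up its
# basic index.
basic_index_by_host = {
    "Host A": ["TI", "AB"],
    "Host B": ["TI"],
}

def found(word, host):
    """Does a search with no field specified find the word on this host?"""
    fields = basic_index_by_host[host]
    text = " ".join(record[f] for f in fields).lower()
    return word.lower() in text.replace(".", " ").split()

for host in basic_index_by_host:
    print(host, found("mathematics", host))
```

Host A finds the record because its basic index includes the abstract. Host B does not, even though the record is identical.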

Oh, here comes yet another bit of jargon – the “Stop Words.” This is a list of words that the computer is instructed to ignore as it builds the inverted file. It’s as if the words don’t even exist. Note that DIALOG’s list is very short – shorter than many hosts of databases. They even index the word “a” – remember that phrases like “type A” or “grade A” can often be found. DIALOG and some other database hosts feel that it’s important for their users to easily find them.
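
A quick Python sketch of stop word handling follows. The nine-word list is the one usually quoted for DIALOG, but please treat it as an assumption and check the host's own documentation. Keeping the original word positions for the surviving words is a design choice I made for this sketch.

```python
import re

# Assumed stop word list (verify against the host's documentation).
STOP_WORDS = {"an", "and", "by", "for", "from", "of", "the", "to", "with"}

def indexable_words(text):
    """Yield (position, word) pairs, skipping stop words.

    Positions still count the skipped words.
    """
    all_words = re.findall(r"[a-z0-9]+", text.lower())
    for position, word in enumerate(all_words, start=1):
        if word not in STOP_WORDS:
            yield position, word

print(list(indexable_words("The fear of a grade A student")))
```

Notice that "a" survives (twice), while "the" and "of" vanish as if they were never there.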

Now, this is where we start to chop up the database. What’s up with the author field? Both the last name and first name on one line – why isn’t the name chopped into a first and last name? Well, our author index … an additional index … will, by my design, be a phrase index. That’s the only way the author will appear in our index: Lastname, Firstname.

“101” is the record number (much smaller than the record numbers that you’ll see in DIALOG). “AU” is the field. “1” is the first phrase.

For another example, “psychohistorical” is also in record 101: it’s in the title field and it’s the 5th word.

And the saga continues …

Wait a minute! This name is chopped up into words! Yup! I determined that design too. We’re in the middle of chopping up a subject field into words. Even “1913” is a word according to a computer – aaw! They never could spell.

Whoa Milhouse! What’s this? Well, this madcap designer has decided that the subject field of his database is both word and phrase indexed. We can do that you know! Even within DIALOG you’ll see variations in whether a particular field is word or phrase indexed. You simply have to memorize all this stuff for 500 plus databases. Nah! I’m just kidding. Never memorize something that can easily be looked up. DIALOG has a set of tools called “Bluesheets.” There’s a Bluesheet for each database that goes through each and every field and indicates how the field is indexed … or if it’s indexed.

And so it goes …

Thankfully we’ll only take this until we get the idea. There’s a point where one begins to really appreciate what we can assign to computers to do. Let’s continue by alphabetizing our list.

And, numbers would kick off our alphabetized list of words …

Each ‘word’ has its record number, field, and position within the field in our list.

Finally we get to the “a’s” in the list.

Hopefully I haven’t screwed up the numbers of the records. This is hard enough to follow as it is.

We also can’t forget the separate additional index that we’ve built. There are only three entries – they’re the authors of the three books.
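
Once you've had your fill of doing this by hand, here is roughly what we just did, as a Python sketch. The three records are made up (they are not the ones on my slides), stop words are left out to keep it short, and the tuple layout is just my own choice. The author field is phrase indexed, the title is word indexed, and the subject field is both word and phrase indexed, as in my design.

```python
import re

# Three hypothetical records standing in for RN 101, 102, and 103.
records = {
    101: {"AU": "Doe, Jane", "TI": "A short life",
          "SU": "Presidents, 1900; Biography"},
    102: {"AU": "Roe, Richard", "TI": "Collected letters",
          "SU": "Presidents; Correspondence"},
    103: {"AU": "Poe, Pat", "TI": "Campaign trail",
          "SU": "Elections; Presidents"},
}

def words(text):
    """Chop text into lowercase 'words' (digits count as words too)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def phrases(text):
    """Chop a subject field into phrases at each semicolon."""
    return [p.strip().lower() for p in text.split(";") if p.strip()]

basic_index = []   # entries: (term, record, field, position, kind)
author_index = []  # the separate additional index

for rn, rec in records.items():
    author_index.append((rec["AU"].lower(), rn, "AU", 1, "phrase"))
    for pos, w in enumerate(words(rec["TI"]), start=1):
        basic_index.append((w, rn, "TI", pos, "word"))
    for pos, w in enumerate(words(rec["SU"]), start=1):
        basic_index.append((w, rn, "SU", pos, "word"))
    for pos, p in enumerate(phrases(rec["SU"]), start=1):
        basic_index.append((p, rn, "SU", pos, "phrase"))

# Alphabetizing is the last step: numbers file ahead of letters.
basic_index.sort()
author_index.sort()

for entry in basic_index:
    print(entry)
print(author_index)
```

The sorted list starts with the number, just like our hand-built one, and the author index has exactly three entries.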

Oooh! How am I going to do this? I’ll tell you what. I’ll do my best to go through the ERIC database Bluesheet using the Camtasia Studio software – you’ll be able to listen to my explanation and I’ll try to point out what is important. The link in the slide is to the DIALOG Bluesheet for the ERIC database. One of the things that I’ll show is the pair of example records – one is a journal record from the CIJE subfile of ERIC and the other is a document record from the RIE subfile of ERIC.

The link above is to my work web page at Carnegie Mellon. You can actually view the structure that exists within any web page. Use the “VIEW” pull-down menu on your browser and then choose “Source.” The first code recognized by a computer would be the “<HTML>” tag that you’ll see about five lines down (after a bunch of remarks that the computer ignores). This would be equivalent to a document type – if you’ve ever done an advanced Google or Yahoo search, you’ll notice that you can restrict your search to a type of document. Right after the “<HTML>” tag you’ll see the beginning of the “<HEAD>” of the document. The chief portion of the head of an html document is the “<TITLE>.” Note that I put a bunch of different variants of my name in the title of my page. I did that so that if someone looks for me as “Matt Marsteller” or “Matthew Marsteller” or “Matthew R. Marsteller,” there’s a good chance that this page will turn up, because many search engines give greater weight to the words in the title of a web page when they index it. The title of a web page is what shows up in the blue bar at the top of your browser window. The web page also has “Web Page of Matthew R. Marsteller” in big bold letters. If you look in the source code (about a third of the way through the code), you’ll see that this phrase is enclosed in “<H1>” and “</H1>” tags. This is a heading within a document and can also be given more prominence by a search engine. The key operative phrase is “can be given more prominence” – nothing is guaranteed. When search engines rank their search results, a lot of things come into play. Ranking algorithms (rules the computer will follow) are often redesigned. We’ll explore searching of the Internet at the end of our study together. For now, I hope that you’ll appreciate that the typical web page on the Internet can, and usually does, have structure that search engines can make use of. Perhaps it’s a bit more crude than we’re used to … but it’s there!
There are other concerns such as … is it standard HTML code? The answer is, unfortunately, no. One thing that is popular for people to do is to use a word processor like Microsoft Word to produce their HTML code (using the “SAVE As” function). It’s easy to do, but the result is NOT standard HTML. Most browsers will interpret it correctly (kind of) – hopefully most search engines will as well, eh?
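
To show that a program really can pick that structure out, here is a short Python sketch using the standard library's html.parser module. The page text is made up in the spirit of mine (it is not the actual source of my page), and a real search engine does far more than this.

```python
from html.parser import HTMLParser

# A made-up page, not the real source of the page discussed above.
PAGE = """<HTML>
<HEAD><TITLE>Matt Marsteller - Matthew R. Marsteller</TITLE></HEAD>
<BODY><H1>Web Page of Matthew R. Marsteller</H1>
<P>Some ordinary body text.</P></BODY>
</HTML>"""

class StructureSniffer(HTMLParser):
    """Collect the text found inside TITLE and H1 elements."""

    def __init__(self):
        super().__init__()
        self.current = None
        self.found = {"title": "", "h1": ""}

    def handle_starttag(self, tag, attrs):
        # html.parser hands us tag names in lowercase.
        if tag in self.found:
            self.current = tag

    def handle_endtag(self, tag):
        if tag == self.current:
            self.current = None

    def handle_data(self, data):
        if self.current:
            self.found[self.current] += data

sniffer = StructureSniffer()
sniffer.feed(PAGE)
print(sniffer.found)
```

Once the title and heading text are separated out like this, an indexer is free to weight them differently from the body text.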

We’ve just been talking about a collection of online databases so far. Today, the online database is truly the most common and delivery is often via the Internet. CD-ROM databases were quite popular in the roughly ten years before the advent of the World Wide Web, but they proved difficult to work with – especially when it came to effective networking. Tape loaded databases were quite common before the Internet as well, but now they’re not used much. Sometimes disk drives are used to store data, but again it is not common. The online database accessed via the Internet is predominant. When I first started out in libraries, we accessed host computers of online databases by dialing into them via phone lines. Connection speeds were so slow that I could read the data as it came across my screen. In library school, I used a dumb terminal with what was known as an acoustical coupler. We would type in our commands and then wait for a computer response to be printed onto the terminal paper – there was no screen. People used IBM Selectric typewriters as their dumb terminals. My first supervisor would talk of the days when she would use a keypunch machine to type search commands onto Hollerith cards. The cards would be mailed off to a university such as Georgia Tech and fed into their computer. If there were any typographical errors, you found out when the computing center would send you your results via mail. This would take weeks … even months. How things have changed!

Databases can also be divided up by the structure of their data. The bibliographic database, such as ERIC or NTIS or a library catalog, is very familiar to most people – especially to librarians and library school students.

The directory is also a common type of database that one will experience. They are just as tricky to search as other types of databases. Even a small directory of a few hundred hotel guests or hospital patients can overwhelm those unprepared to use them. At the recent ALA conference in Chicago, I failed to reach a colleague because the query was too difficult for the operator of the hotel. At Carnegie Mellon, we routinely get calls for the public library because the telephone operators have difficulty with the directory.

The fulltext database has now become very common. Fulltext databases can be difficult to search unless one can focus their search on bibliographic and abstract data. They will often swamp the database searcher with huge sets of search results. They can often be quite advantageous as well. Once, I was trying to see if anyone had ever found a way to put an electric charge on a bunch of tiny pieces of glass. I was able to search for the word glass near the word pellet or shard or bead or fragment that also showed up near the phrase “electric charge” or “electrical charge.” I located a German patent that described a way that a company did it in order to put a charge on particles in a sand blaster – the idea was to have the glassy fragments of the sand blaster have the dust from the blaster stick to the glass. Remember in science class when you had to rub a glass rod with a sheet of rubber? Well, the inventor had the glass pellets pneumatically transported through a rubber-lined tube. The inner surface of the tube had helical shaped ridges that caused the glass fragments to bounce all over the place – the end result of the glass pieces bouncing off the rubber was a static charge! That had to be one of my happiest moments as a database searcher. The engineer that I was working with was astounded when I found a working method with just a couple of hours of planning and conducting a literature search. We’ll be learning how DIALOG lets us search through full text. Then … take this to the bank … you’ll be looking for a good equivalent search feature whenever you come across a fulltext database.
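
For the curious, the "near" idea in that glass search can be sketched in a few lines of Python. The sentence is made up to stand in for the patent text, and the distances (2 words and 5 words) are numbers I picked for illustration. This is not how DIALOG implements proximity searching, just the general idea.

```python
import re

def positions(words, targets):
    """Positions of any single-word target in the word list."""
    return [i for i, w in enumerate(words) if w in targets]

def phrase_positions(words, phrase):
    """Start positions where the multi-word phrase occurs."""
    n = len(phrase)
    return [i for i in range(len(words) - n + 1) if words[i:i + n] == phrase]

def near(a_positions, b_positions, distance):
    """True if some position in a is within `distance` words of one in b."""
    return any(abs(a - b) <= distance
               for a in a_positions for b in b_positions)

# Made-up sentence standing in for the full text of a patent.
text = ("The glass beads pick up an electric charge while bouncing "
        "through the rubber lined tube.")
words = re.findall(r"[a-z]+", text.lower())

glass = positions(words, {"glass"})
pieces = positions(words, {"pellet", "pellets", "shard", "shards",
                           "bead", "beads", "fragment", "fragments"})
charge = (phrase_positions(words, ["electric", "charge"])
          + phrase_positions(words, ["electrical", "charge"]))

hit = near(glass, pieces, 2) and near(pieces, charge, 5)
print(hit)
```

The record is a hit only when the words sit close together, which is what keeps a fulltext search from swamping you.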

Citation databases – wow! Things have really begun to change in this arena. Perhaps ten years ago, the only citation databases were those in the Westlaw legal databases and the triad of databases from the Institute for Scientific Information: the computer versions of the Science Citation Index, the Social Sciences Citation Index, and the Arts and Humanities Citation Index. The power of a citation index is that you can start from one good article written perhaps … five years ago … and quickly search for other more recent articles that listed the older article in their list of references.
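
The trick behind a citation index is simply turning the reference lists around. Here is a minimal Python sketch with made-up articles (the names and years are hypothetical).

```python
# Made-up articles: each lists the articles it cites in its references.
references = {
    "Smith 2001": [],
    "Jones 2003": ["Smith 2001"],
    "Lee 2005": ["Smith 2001", "Jones 2003"],
    "Patel 2006": ["Jones 2003"],
}

def build_cited_by(refs):
    """Invert the reference lists: for each article, who cites it?"""
    cited_by = {article: [] for article in refs}
    for citing, cited_list in refs.items():
        for cited in cited_list:
            cited_by.setdefault(cited, []).append(citing)
    return cited_by

cited_by = build_cited_by(references)
print(cited_by["Smith 2001"])
```

Start from the one good older article and the inverted list hands you the newer ones that cite it.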

Financial databases are an interesting monster! The key to searching them is to gain an appreciation for the typical information found in a financial record. Then, gain an appreciation for how the data is structured – in DIALOG we would use their Bluesheets. Other database vendors provide searching guidance as well.

Numeric databases allow the user to do things like search for materials with a certain set of properties … perhaps a range of boiling points for a database of chemicals or metals with a coefficient of thermal expansion below a desired value.
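
A range search like that boils down to comparing numbers rather than matching words. Here is a tiny Python sketch. The four boiling points are approximate values at normal pressure, and the table is of course nothing like a real numeric database.

```python
# Approximate boiling points at normal pressure, in degrees Celsius.
boiling_point_c = {
    "acetone": 56.0,
    "ethanol": 78.4,
    "benzene": 80.1,
    "water": 100.0,
}

def in_range(table, low, high):
    """Names whose value falls within [low, high], sorted by value."""
    hits = [(value, name) for name, value in table.items()
            if low <= value <= high]
    return [name for value, name in sorted(hits)]

print(in_range(boiling_point_c, 70, 90))
```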

To help illustrate what the last three databases are like, the URLs that I’ve provided will lead you to search aids for databases of each type. The first two are Bluesheets from DIALOG. The third is the “Database Summary Sheet” from the American Chemical Society’s STN databases. STN is a service that can be considered similar to DIALOG. It has more of an emphasis on scientific information while DIALOG seems to do an admirable job with all types of databases (with the exception of numeric databases of scientific information – note that my example wasn’t another Bluesheet).

This should reinforce the “First Contact” tutorial a bit. DIALOG uses the “?” (question mark) as the command prompt. The command prompt is the symbol that the computer system uses when the searcher needs to supply input. This slide shows the straightforward example of how to start in … or switch to … a database while you are connected to DIALOG. This would be similar to clicking a checkbox to choose the database or databases that you want to search within a collection of databases like EBSCOHost or pointing your browser to the proper URL for a library’s online catalog.

This simply shows the system’s response to the BEGIN command. We’re in the ERIC database and another command prompt is showing.

The SELECT command is the next tool to learn. In this example, I’ve tasked the computer to find all records with the word “mathematics” in the basic index of the ERIC database. It will look in all the fields of each record that are designated as part of the basic index. Again, you’ll find a description of the basic index in each and every Bluesheet.

If you’ve viewed the tutorial, the search should look a little familiar. The number of records retrieved is probably a bit lower – some time has passed since I did the example search for the PowerPoint slides. In this slide, I searched for the word “fear” after completing the search for the word “mathematics.” Note that DIALOG has assigned a set number for each set of records. I’m able to use these set numbers (“S1” and “S2”) to find records where both words show up (again, simply in the basic index). A third set, “S3,” is created and the system responds with another prompt. The word “AND” that I put between “S1” and “S2” is significant. The DIALOG system recognizes that as a special word known as a Boolean operator …

These are the three main Boolean Operators. Note that the description of the AND Operator matches what I was trying to accomplish with finding records on the fear of mathematics. Let’s explore each separately …

Back in third grade, I first learned about set theory and Venn Diagrams. I have to admit that I never foresaw a career where I tend to draw Venn diagrams several times a day! Let’s let the left circle represent all the records in a database that contain the word “mathematics.” The circle on the right represents the set of all records that contain the word “fear.” Thanks to people like me that struggled mightily with multiplying and dividing fractions (at first) in fifth grade, we can expect that educators have looked into the problem of kids being a bit afraid of math. The green shaded area represents records that contain both words. The circles aren’t drawn to scale of course. The set of records with the word “mathematics” was much larger … more than ten times larger. Hopefully this makes the search more understandable. One thing that you should realize is that the result of using an AND Operator should be a set of records that is SMALLER than either of the two sets that you started with (it could be the same size as the smaller of the two sets … but that is not likely to happen). It’s important to note that using too many AND Operators may yield sets that are too small to cover the topic. Start with the most significant concepts for your search and combine them until you have a reasonably sized set to work with. Don’t overuse the AND Operator. Use it, but do so wisely. Think about the topic and the impact that your strategy might have on your results.

Typically, a computer will only do what is instructed. Searching for “mathematics” is interpreted exactly. If any of the records contained the more casual “math,” well, our earlier search would have missed the ones that didn’t also have the word “mathematics.” For the purposes of a database search, the words in the Venn Diagram above are synonyms. The OR Operator looks for either word to show up in a record and gathers them into a set that we can use later. Note that the results for truly synonymous terms will have overlap. The result of an OR Operator is a set that is larger than the set of records retrieved for each individual word (unless there’s a case of complete overlap … theoretically possible, but again, highly unlikely). Database searching often calls for the searcher to think out of the box a little. Perhaps, for my purposes, “fractions” would or could be considered synonymous with mathematics. “Fractions” would be a narrower term, but what if any of the three would be acceptable to the information seeker? Have I got you thinking?

These three slides with the Boolean Operators illustrated were borrowed from a training tool from DIALOG. Don’t blame their example on me! The NOT operator must always be used with great caution. The guidance given in this slide is most wise. Remember it!
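
If you like, you can play with the three operators using Python's built-in sets, which behave just like our Venn diagrams. The record numbers are made up.

```python
# Made-up record numbers for each search word.
mathematics = {1, 2, 3, 4, 5, 6}
math = {5, 6, 7, 8}
fear = {2, 6, 8, 9}

both = mathematics & fear      # AND: records in both sets
either = mathematics | math    # OR: records in at least one set
without = mathematics - fear   # NOT: first set minus the second

print(sorted(both))
print(sorted(either))
print(sorted(without))

# AND can never give more records than the smaller input set,
# and OR can never give fewer than the larger one.
print(len(both) <= min(len(mathematics), len(fear)))
print(len(either) >= max(len(mathematics), len(math)))
```

Look at what NOT did here: it threw out records 2 and 6, which mention mathematics AND fear. That's exactly why it has to be used with caution.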

In this slide, I’ve used the OR Operator. It helped me to retrieve more than 4,000 additional records that should cover the concept of mathematics.

Following that, I may be losing track of the sets that I’ve created. “DS” is the abbreviation for “DISPLAY SETS.” I use this command a lot.

Note set “S4.” It shows a mistake. The “S” before the number for sets is VERY important. Set “S3” is quite different from set “S4!” The results of set S4 are a Boolean AND of records containing the “word” “1” and the “word” “2” – the computer will treat a number as if it were a word unless given specific instructions not to do so. We’ll see some of those examples later in the class.
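
Here is a toy Python sketch of why that missing "S" matters. The index, the record numbers, and the little select function are all hypothetical. It only mimics the idea that a term like S1 names an earlier set while a bare 1 is looked up as a word.

```python
import re

# Made-up basic index: word -> record numbers. "1" and "2" are words too.
index = {
    "mathematics": {10, 11, 12, 13},
    "fear": {11, 13, 14},
    "1": {12, 20},
    "2": {20, 21},
}
sets = {}  # saved sets, keyed by set number such as "S1"

def lookup(term):
    """A term like S1 names an earlier set; anything else is a word."""
    name = term.upper()
    if re.fullmatch(r"S\d+", name) and name in sets:
        return sets[name]
    return index.get(term.lower(), set())

def select(expression):
    """Handle 'term' or 'term AND term', then save the result as a set."""
    terms = re.split(r"\s+AND\s+", expression.strip(), flags=re.IGNORECASE)
    result = lookup(terms[0])
    for term in terms[1:]:
        result = result & lookup(term)
    name = f"S{len(sets) + 1}"
    sets[name] = set(result)
    return name, sorted(result)

print(select("mathematics"))  # becomes S1
print(select("fear"))         # becomes S2
print(select("S1 AND S2"))    # S3: what was intended
print(select("1 AND 2"))      # S4: the "words" 1 and 2
```

S3 and S4 come out completely different, even though the two commands differ by only two keystrokes.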

So, my new set S6 is a little larger than set S3. It is most likely a superior set of search results from a completeness standpoint.

I told you I use that DS command a lot!

If you think over our search strategy, set S3 is indeed a subset of set S6. Sets S6 and S7 would be considered equivalent sets.

The next thing that I tried was to limit my search to only English records. This command works in a lot of DIALOG databases, but when it doesn’t work, the system indicates that the command is ignored. If you look at the ERIC Bluesheet (link provided), and home in on the section of the Bluesheet that discusses Limits, you’ll find no mention of a limit for English language materials. Thankfully there’s another way to handle the problem. Here’s a mind bender! DIALOG will let you search multiple files together. What happens when one file accepts the “/ENG” limit and others don’t? Well, sometimes the search gets a little messy. Preplanning your search is the only good remedy.

In a technique similar to LIMIT suffixes in DIALOG, it’s important to note that you can restrict your search to a part of the basic index. You would often want to do this to make sure … for example … mathematics is a very important concept in a particular set of search results. It is important to be cognizant of what fields are in the basic index. I know I’m probably beginning to sound like a broken record, but it’s a very important concept. The second example shows a search for the word “mathematics” restricted to the title (ti) field. There are times when this extreme narrowing of a search is beneficial. One example of this is when you try to verify a citation (patrons often struggle trying to find a citation that is slightly incorrect – maybe they misheard a detail or two in a discussion with a colleague at a conference). There’s always the hopeless case where even the best of databases won’t help … one famous item of library lore would be the kid that visits his local library looking for a book called (according to the kid) “Oranges and Peaches” – a good reference interview eventually revealed to the librarian that the kid’s teacher had mentioned a book on evolution by some guy named Darwin. Aha! The kid wanted “Origin of the Species!” As a person that has struggled with hearing impairment, I’m on the kid’s side! Oh … how many times have I misheard things!

Sometimes I’ll restrict a search to the titles or descriptors … like you see in the third example above. If the information need is a few good articles on a topic, this can be a reasonable choice. For multiple fields, simply separate the field abbreviations by a comma. What fields are available for suffix searching? Again, it depends on the database – the Bluesheet is where to look!
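
Here is a Python sketch of the suffix idea. The records, the field abbreviations, and the choice of basic index are hypothetical; the only thing borrowed from the slide is the "word/field,field" shape of the search.

```python
# Made-up records with title (TI), descriptor (DE), and abstract (AB).
records = {
    1: {"TI": "mathematics anxiety", "DE": "fear", "AB": "a survey"},
    2: {"TI": "classroom fears", "DE": "mathematics", "AB": "case studies"},
    3: {"TI": "reading habits", "DE": "literacy", "AB": "some mathematics too"},
}
BASIC_INDEX = ["TI", "DE", "AB"]  # assumed for this sketch

def select(query):
    """Search 'word', 'word/ti', or 'word/ti,de' (suffix restricts fields)."""
    word, _, suffix = query.partition("/")
    if suffix:
        fields = [f.strip().upper() for f in suffix.split(",")]
    else:
        fields = BASIC_INDEX
    word = word.strip().lower()
    return sorted(
        number for number, record in records.items()
        if any(word in record[f].split() for f in fields)
    )

print(select("mathematics"))        # whole basic index
print(select("mathematics/ti"))     # title only
print(select("mathematics/ti,de"))  # title or descriptor
```

Each added restriction shrinks the set, which is the "extreme narrowing" I mentioned.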

Okay! I get to introduce a new command and bail us out of our problem that we had in the ERIC database – that one without the Limit command that we needed! If you scour the ERIC Bluesheet, you’ll notice the Language Field as one of the additional indexes. At this point, I’ve always been more comfortable with using the EXPAND command or “E” to browse an additional index. Some folks would just use:

?S LA=ENGLISH

And be done with the problem. I’ll often start with an EXPAND command and …

… follow it up with a SELECT command of the “E Number.” Thus,

?S E3

… puts a set of English language records in a (huge) set …

Then I can return to use of the AND Operator to finish the job.
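
Browsing an index the way EXPAND does can be sketched in Python with the bisect module. The index entries and record counts are invented, and the two-before, two-after window is my own simplification rather than the real DIALOG display.

```python
import bisect

# Made-up slice of a language index: term -> number of records.
language_index = {
    "la=chinese": 120,
    "la=dutch": 45,
    "la=english": 98000,
    "la=french": 900,
    "la=german": 750,
}
terms = sorted(language_index)

def expand(term, before=2, after=2):
    """List index entries around the spot where `term` would file."""
    spot = bisect.bisect_left(terms, term.lower())
    start = max(0, spot - before)
    window = terms[start:spot + after + 1]
    return [(f"E{i}", t, language_index[t])
            for i, t in enumerate(window, start=1)]

for e_number, term, count in expand("LA=ENGLISH"):
    print(e_number, term, count)
```

In this toy index the English entry lands on E3, so selecting E3 would pick up that (huge) set.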

Gosh, I’ve been doing a lot of searching. What kind of a bill am I running up? The “COST” command allows me to check. Sometimes I’ll use the COST command after a TYPE command to see how much I just spent. Always make sure that you’re TYPE-ing the correct set number or the subsequent COST command might be a bit unsettling. Thankfully, if you ever make an expensive mistake and generate useless output, you can call DIALOG and explain the situation. Be prepared to tell them your User number and the Session number. In this case, the User number is 556323 and the Session number was D1.2. Oh, what’s the TYPE command? Well, I got a little ahead of myself. We have this great (we hope) set S10 and we’d like to look at some of the results. The TYPE command is what we need to use to retrieve the results of our work.

This slide and the next one show the beginning of the results of our TYPE command example …

Note that I asked for “Format 8.” This is one of the formats of database output that is free in the ERIC database. This is often not even enough information to find the document, although in this particular case I would be trying “ED452367” in a library’s huge set of ERIC microfiche. Many libraries would put the ERIC documents in “ED” number order.

Hopefully you’ll find this to be a good example. The Bluesheet will indicate which formats are available to the searcher. If you ever wanted all of the results of a particular set, the word “ALL” could be substituted for “/1-5” in our example.
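
The TYPE command is really three pieces glued together with slashes: the set, the format, and which records. Here is a Python sketch that pulls them apart. I'm assuming the command takes the shape "TYPE S10/8/1-5" based on the description above, and the parser is my own toy.

```python
def parse_type(command):
    """Split 'TYPE S10/8/1-5' into set name, format number, and records."""
    verb, _, argument = command.strip().partition(" ")
    if verb.upper() not in ("TYPE", "T"):
        raise ValueError("not a TYPE command")
    set_name, format_number, which = argument.upper().split("/")
    if which == "ALL":
        record_range = "ALL"
    elif "-" in which:
        first, last = which.split("-")
        record_range = (int(first), int(last))
    else:
        record_range = (int(which), int(which))
    return set_name, int(format_number), record_range

print(parse_type("TYPE S10/8/1-5"))
print(parse_type("TYPE S10/8/ALL"))
```

Reading a command back this way before you send it is a cheap check that you are TYPE-ing the set you meant to.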

Now, the ERIC database is one of the databases that uses something called Controlled Vocabulary. Specifically, the database uses the Thesaurus of ERIC Descriptors. If you EXPAND a word or a phrase in a database and you see a column headed with an “RT” then you’ll know you’re in a database that has an online thesaurus (but, of course you scoured the DIALOG Bluesheet ahead of time and KNEW the database had a thesaurus. Right!!??!!!)

In this example, I’m thinking of trying to focus my search to fear of mathematics in fifth grade. I had spotted the “GRADE 1, 2, 3, …” descriptors in the earlier results in Format 8 and thought I’d expand on the phrase “GRADE 5” to see if I could use it and perhaps other terms that would be or could be considered synonymous for my purposes.

In this example, I’ve used the “SELECT” command on E number “E3,” but it would have been more interesting to use the EXPAND command:

?E E3

This would have listed the three related terms for me. Bummer! I must have had a bad day! I think I’ll do it and stick the slide in!

Here’s an example of the TYPE command that requests Format 9 for record 1 of the set S12.

It’s a 16 page document … again probably in a huge bank of ERIC microfiche documents that one will find in many libraries with good education collections. ED219235 is what we’d be looking for.

When you’re connected to the DIALOG service, the meter is running. So, we’ll also want to stop spending money! The LOGOFF command will end your search session. By the way, with these student accounts, don’t worry about costs. DIALOG provides this service to us so that students have the chance to learn how to use their system. Note that I put “$0.00 2 Type(s) in Format 9” in bold face font. I wanted to point out that in a real situation you would be charged per record for the output.

I’ve already hinted at how important pre-planning can be. Let’s take a couple of minutes to discuss the process. Note that spending a little time exploring a topic before you start a database search is usually a BIG time saver. As school librarians, I would really, really, really, really, really like you to stress this step when you’re teaching kids to do their own database searching.

A lot of the searching that I’ve done was as an intermediary – that’s where a scientist or engineer (or even a management person when I’d let them talk to me) would talk to me about their information need and then I would perform the database search for them. That’s where “What topic do we have in mind?” is really emphasized! I would often ask the scientist or engineer (and yes even a management person once in a while – are you beginning to get the idea that I just might have a problem with authority figures? :-) ) open-ended questions to get them to open up to me about their work. Then I’d try to restate their request to see if I heard their true information need. After that, I’d always ask them to take a few minutes and write down their request – often I’d get a sentence or two … or a short paragraph. I’d explain that when we communicate in writing we sometimes reach a bit deeper and reveal something new about our information need. Today, with the exception of the special library setting, many people do their own searching and we seem to have inherited more of a teaching role or we get handed the truly nasty searching problems. Just because folks are doing their own searching doesn’t mean that they should skip trying to state their topic to themselves. It’s crucial to do so or they’ll waste a lot of their own time!

Next, one has to figure out where they’re going to look for information. You’ll get a little practice with that in this class. You should spend the rest of the pursuit of your degree getting better at figuring out where to look for information. As a librarian, people will turn to you to advise them on this particular step. The better you can put yourself in the “information need shoes” of your patron … the better you can advise them on where to look for information.

You also have to consider what terms to use in your search. I strongly recommend that the searcher consider looking up an encyclopedia entry or some kind of introduction to their topic. Our third tutorial for the week deals with a search demonstration for literature about the reintroduction of wolves. At this step in the process, I started looking at encyclopedias, special web pages on wolves produced by a trusted source (in this case I was pleased to find some good descriptions on the National Park Service’s Yellowstone National Park web site), etc. Did I want to look for wolves or perhaps timber wolves or gray wolves or red wolves? How about using the scientific term Canis lupus? What would be synonymous with reintroduction? Is the reintroduction of wolves too broad of a topic? Can I figure that out from general reading about wolves? Do I suspect that I’ll be faced with “Too Much Information?” [apologies to Duran Duran]

How am I going to combine my terms? Can I sketch out my strategy in a Venn Diagram … or perhaps two?

This is another of those slides that I borrowed from DIALOG. It’s a pretty good review of the last slide, but in my opinion they should stress the use of reference sources to help with determining what individual concepts are within the topic’s description and to help with choosing synonyms or related/narrower/broader terms. The question “What words must a record contain in order to be relevant?” is fantastic. Athletes often visualize what they try to do in practice – librarians should do the same with database searching.

You guessed it … I borrowed this one as well. It’s not a bad worksheet! We have to identify and describe our topic, we have to consider where we need to look for information. Then, it helps to break up our search strategy into concepts. For concept 1, you would list the synonymous terms below (next to the OR diamond). You would do the same for concept 2 and then a third concept if necessary. If you use more concepts than that, well … your resultant set might miss a lot of pertinent literature. You may wish to invest more time into exploring your topic with general reference resources to see if you can change the way you’re thinking about your concepts. Sometimes database searchers will use a totally unnecessary concept. Why use the term library or libraries when you’re searching the Library Literature and Information Abstracts Database? … or the LISA database? Why use the term education when searching ERIC? These databases should … by their nature … contain records that already deal with the topic.
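If it helps to see the worksheet idea written out mechanically, here is a minimal sketch in Python (this is not DIALOG itself, just an illustration). It assumes a fresh session in which each SELECT creates the next set number (S1, S2, …), and the function name build_strategy and the synonym RESTORATION are made up for the example.

```python
def build_strategy(concepts):
    """Turn a worksheet (one list of synonyms per concept) into command strings.

    Assumption: each SELECT ("S") creates the next set number, starting at S1.
    """
    commands = []
    for terms in concepts:
        # Synonyms inside one concept are ORed together.
        commands.append("S " + " OR ".join(terms))
    # The concept sets are then ANDed: S1 AND S2 (AND S3 ...).
    set_names = [f"S{i}" for i in range(1, len(concepts) + 1)]
    if len(set_names) > 1:
        commands.append("S " + " AND ".join(set_names))
    return commands


# Hypothetical worksheet for the wolf reintroduction topic.
worksheet = [
    ["WOLF", "WOLVES"],                 # concept 1
    ["REINTRODUCTION", "RESTORATION"],  # concept 2 (RESTORATION is illustrative)
]

for line in build_strategy(worksheet):
    print(line)
```

Writing the concepts as separate lists makes it easy to spot a concept that isn't pulling its weight before you ever log on.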

Finally, it’s always wise to actually write out the commands that you plan to use. Sometimes things go awry in mid-search, but it still helps to have the rest of your train of thought to follow.

The only advice that I would give for a search worksheet would come in an intermediated search situation. I’d have a checklist of things like … What languages are okay to retrieve in the results? Note that an English abstract is usually available. How do you want me to deliver the results to you? E-mail? Printed Copy? Diskette? CD? Ready to import into bibliographic management software? Does the search need to be comprehensive or do you only need some literature to get you going for now? What types of literature do you think would be relevant? Journal articles? Patents? Theses? Conference Papers? Only the latest market research? Books?

Let’s run through a search in the BIOSIS database. When I got out of library school, my first position was at the Thomas Cooper Library of the University of South Carolina. The biology program at “The USC” had a great relationship with the Riverbanks Zoo – a very nice natural habitat design was employed at the zoo. Students would be given behavioral assignments where they’d have to study the behavior of one of the zoo’s residents. One possibility would be the preening behavior of the birds at the zoo. Let’s take the mallard as an example. Well, we’d probably want to read about mallards a bit in our preparation. At Carnegie Mellon I happened to have ready access to Access Science (the McGraw-Hill Encyclopedia of Science & Technology online) and Britannica Online. From the articles, I can surmise that I’d want to use the scientific name for the mallard – Anas platyrhynchos. Neither of the encyclopedia articles touched on the topic of preening within the articles on mallards. I looked up preening in both sources as well. My gut feeling is to go with just preening and its variant word endings.

So, I start my search in the BIOSIS Previews® database …

It would be nice to have a way to take just the root word and allow for variant endings. That is usually known as truncation. The symbol for truncation varies with the database search syntax. In many Internet search engines, the asterisk (*) is used. In DIALOG, the question mark (?) is used. We’ll go into further rules of truncation later, but if you want to know the basic method … we could have used our truncation symbol with each word:

?S PREEN? [searches for all words beginning with “preen”]

?S MALLARD? [searches for all words beginning with “mallard”]

The single question mark allows for multiple character truncation in DIALOG. More on truncation later … I promise!
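For those who like to see an idea spelled out, here is a rough sketch in Python of what a trailing question mark does. It only models the one case described above (a single "?" at the end of a stem), and the function name and the little word list are made up for illustration.

```python
def truncation_matches(pattern, words):
    """Return the words matched by a pattern such as 'PREEN?'.

    Only models a single trailing '?' (any number of characters after the stem).
    """
    if pattern.endswith("?"):
        stem = pattern[:-1].lower()
        return [w for w in words if w.lower().startswith(stem)]
    # No truncation symbol: the word has to match exactly.
    return [w for w in words if w.lower() == pattern.lower()]


# Hypothetical slice of a database's word index.
index_words = ["preen", "preened", "preening", "preens",
               "premature", "mallard", "mallards"]

print(truncation_matches("PREEN?", index_words))
print(truncation_matches("MALLARD?", index_words))
```

Notice that "premature" is left alone: the stem has to match in full before the variant ending kicks in.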

Whoa Nellie! What’s that “(W)” between the word “ANAS” and the word “PLATYRHYNCHOS”? We’ll learn more about stuff like this later, but the WITH Operator “(W)” simply searches for two words that appear next to one another … in that order. It’s a phrase search.
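Here is a small Python sketch of the "next to one another … in that order" idea. It is only an illustration of adjacency matching on a string of text, not a description of how DIALOG does it internally, and the function name is made up.

```python
def adjacent_in_order(text, first, second):
    """True if `first` is immediately followed by `second` in the text."""
    # Split on whitespace and strip simple punctuation from each word.
    words = [w.strip(".,;:()").lower() for w in text.split()]
    return any(a == first.lower() and b == second.lower()
               for a, b in zip(words, words[1:]))


title = "Preening behavior of the mallard, Anas platyrhynchos."
print(adjacent_in_order(title, "ANAS", "PLATYRHYNCHOS"))
print(adjacent_in_order("platyrhynchos Anas", "ANAS", "PLATYRHYNCHOS"))
```

The second call comes back False because the words are adjacent but in the wrong order.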

Set S4 basically covers the concept of mallards

Set S1 covered the concept of preening

Combining S1 and S4 with the AND Operator yielded 21 records … 13 of which are in English. In BIOSIS Previews ®, the language limit for English works fine …

At this point I could produce my output with a TYPE command. Then, maybe I would decide that I’m not done looking quite yet. There’s another database that I want to check. It’s the Zoological Record. It’s file 185. Rather than lose all my commands, I’m going to temporarily save them with the “SAVE TEMP” command. The system responds that it has saved the search … if memory serves me correctly … the search strategy is saved for a 24-hour period.

Now I’m getting fancy on you. I’ve stacked two commands into one line! You should recognize the BEGIN command. Then I entered a semicolon – this lets DIALOG know that I’m about to enter a second command. The next command is the EXS command. That’s short for Execute Steps. Entering “EXS” without specifying a particular saved search strategy automatically runs the latest search that I saved.

So, here DIALOG goes … into the Zoological Record Online ® and grinding through my previously saved search strategy …

The “S” of EXS means that DIALOG will run the search in steps … leaving me with a full complement of set numbers to work with further if need be. It also gives me some needed details of how the search is progressing. Wouldn’t you know it … the “/ENG” limit doesn’t work in the Zoological Record Online ®. Should I worry? Perhaps not with such a small set.

It’s very important to evaluate the system’s response to your commands as you conduct your search. Mistakes happen and I’ve made some doozies in my day!

If you get results that are highly unexpected, there’s usually a reason. Trust your gut instincts.

If you use an AND Operator, the result can’t be larger than the smaller of the two sets you’re combining. If it is, did you goof on the set number that you entered?

If you use an OR Operator, the result should be at least as large as the larger of the two sets … and it’s usually somewhat larger. If not, it’s time to review your work!

We’ve seen this mistake before! Look at the huge resulting set! If there are only 58,984 records on math or mathematics, why is set S4 so doggone large? Using an AND Operator should yield something smaller than the 4,055 record set for fear. In this case, I’ve asked the computer to look for the “word” “1” instead of using the set “S1”. The same thing happened with the other set.
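The AND and OR checks can be written down as simple arithmetic on set sizes. Here is a minimal sketch in Python, assuming we only know the record counts that the system reports. The function name is made up, and the 1,000,000 figure in the last line is a hypothetical stand-in for a suspiciously large result, not a number from the slide.

```python
def combination_looks_sane(operator, size_a, size_b, size_result):
    """Check a reported result size against what Boolean logic allows."""
    smaller, larger = sorted((size_a, size_b))
    if operator == "AND":
        # An intersection can never exceed the smaller set.
        return size_result <= smaller
    if operator == "OR":
        # A union is at least the larger set and at most the sum of both.
        return larger <= size_result <= size_a + size_b
    raise ValueError(f"unsupported operator: {operator}")


# Toy sets of record numbers to show the two rules in action.
fear = {1, 2, 3, 4}
math_records = {3, 4, 5, 6, 7}
print(combination_looks_sane("AND", len(fear), len(math_records),
                             len(fear & math_records)))
print(combination_looks_sane("OR", len(fear), len(math_records),
                             len(fear | math_records)))

# Set sizes from the fear/math example, with a hypothetical oversized result.
print(combination_looks_sane("AND", 4055, 58984, 1_000_000))
```

A False from a check like this is the cue to go back and look for a "1" that should have been an "S1".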

Misspellings can really burn you. Did you know that “naptha” is really spelled naphtha? Yeah … I learned the hard way! PowerPoint found my mistake and underlined it, but when you’re querying DIALOG … you’re on your own! How many spelling bee trophies are on your mantel?

Sometimes your search history – what you see in response to a “DS” (Display Sets) command – can get longer than you originally envisioned. It can be hard to keep all of the sets straight in your mind. That’s why I make such ready use of the DS command.

Watch how DIALOG is handling your commands … it might refuse to process a limit or a prefix … can you live with the result? Maybe … maybe not.

Oh … that last one … the memories almost hurt! When you’ve done the number of searches that I’ve done, you’re bound to have stories to tell!

Well, I think I’ve covered enough ground to get you through Searching Exercise One with flying colors. You know how to reach me if I have failed to do so! See y’all in Lesson Two!