Well, at this point I’m hoping that
you’ve seen the method to this course from our discussion of Carol Tenopir’s
article. In our “tread with dinosaurs,”
we’ll be getting to see an incredibly powerful search interface to an enormous
suite of databases. Many of these
databases will be familiar – from your experience with access to the databases
supplied by other vendors. We’ll use
this database searching system to see all of the power that you might
experience with other interfaces throughout your career. It will be an experience that you can always
use to explore the databases that you meet when you’re walking down the
Information Superhighway. When it comes
to all-around capability, everything else will likely fall short. You may find yourself saying … “I know I
could do that with Dialog … is there a way with this search interface?” The only drawback is that it is nearly
devoid of a graphical user interface. You’ll
be constructing the actual commands in DIALOG search query syntax.
Oh … snips and snails and puppy dog
tails – that’s what databases are made out of … OK, so they’re not. Databases are also referred to as files or
datafiles. Sometimes a database is
split into subfiles, but not necessarily.
The ERIC database is a good example of a database that is split into
two subfiles – the Current Index to Journals in Education (CIJE) and
Resources in Education (RIE). The
important thing to remember is that some databases will have subfiles and some
won’t. When they exist, they can
sometimes be helpful in crafting a search strategy. All databases will be made up of database
records. Each database record is made
up of fields. Sometimes fields are
divided into subfields. Perhaps an
example will help.
Here’s an example from the online
catalog of a library. The online
catalog is an example of a database.
The record is an example of the familiar bibliographic record. It simply describes a book. The fields should also be familiar: author,
title, publisher, etc. The book happens
to be one of my favorites. It was my
textbook for a course in US and Canadian geography. The book suggested that North America could
be split into nine different nations that would make more sense than the three
that we currently have. The author explains why
he picks the borders that he does. One
memorable one was a diagonal line from northwest Connecticut to southeast
Connecticut. People to the north and
east of this line have a very strong tendency to be Boston Red Sox fans and
people to the south and west of the line tend to be New York Yankees
fans. Sorry Mets fans … I don’t think
he paid them much notice!
The record was too long to fit on one
screen. Maybe we should look at the
underlying structure …
I know!
I know! Eeek! It’s the dreaded MARC record – well, at
least some of the variable fields. The numbers on the extreme left are the tags
for the fields of the record. They’re
unique to the MARC record and you’ll often see them displayed as the more
human friendly “Personal Author”, “Title”, etc. Between the colons are what are known as
“indicators.” They are truly unique to
the MARC record. From my forever
dimming memory, I can only recall that the “4” means to ignore the first four
character spaces – “The ”, the initial article. The “|b” and “|c” are known as “delimiters” and signify the
beginnings of subfields of a field.
… and this would be the continuation of
the underlying structure. Can you pick
out the subfields?
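For anyone who likes to see the parts pulled apart, here is a minimal Python sketch that splits one variable field into its tag, indicators, and subfields. It assumes a simplified, hypothetical display layout (a three-character tag, a space, two indicator characters, a space, then subfields that each start with `|` and a one-letter code); real MARC files use a non-printing delimiter character and a binary record layout. The sample field and the function names are made up for illustration.

```python
def parse_marc_field(line):
    """Split a display line such as '245 14 |aTitle :|bsubtitle' into parts.

    Assumed (hypothetical) layout: 3-character tag, space, 2 indicators,
    space, then subfields, each starting with '|' plus a one-letter code.
    """
    tag = line[0:3]
    indicators = line[4:6]
    subfields = []
    for chunk in line[7:].split("|"):
        if chunk:  # the text before the first '|' is empty, so skip it
            subfields.append((chunk[0], chunk[1:].strip()))
    return tag, indicators, subfields


def filing_title(indicators, subfields):
    """Drop the leading characters that the second indicator says to ignore."""
    skip = int(indicators[1])  # e.g. 4 skips "The "
    title = dict(subfields)["a"]
    return title[skip:]


tag, indicators, subfields = parse_marc_field(
    "245 14 |aThe sample book :|ba subtitle /|cby A. Author."
)
print(tag, indicators, subfields)
print(filing_title(indicators, subfields))  # sample book :
```

The second indicator of “4” is what lets a catalog file this title under “sample” rather than under “The.”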
Now I’ll try my very best not to confuse
you.
This is part of an actual database
that I built so that my wife could use a title listing to know which books
were already in my “presidential library.”
I’m going to try to describe how database records are “parsed” or
chopped up into parts that the computer can recognize as “words.”
The end result would be the production of an
“inverted file” – an alphabetized listing of what words appear in what
position of which record.
In addition
to this example, see Walker and Janes pages 55 through 63 and the GEP DIALOG
Lab Workbook pages 3-6 through 3-10.
This screen and the next two show record numbers (I’ve designated them
‘RN’) 101, 102, and 103.
My database
has three fields: Author (AU), Title (TI), and Subject (SU).
We’re going to build the inverted file for
this tiny database of three records by hand.
Familiarize yourself with each record …
you should begin to wonder … just how is Matt going to chop up his
database?
This is a very good book by the way …
How we chop the record up into words and
phrases will lead to what the inverted file will look like.
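The chopping-up step can be sketched in a few lines of Python. This is a minimal sketch, not DIALOG’s actual parser: it assumes a “word” is any run of letters or digits (so punctuation disappears and “1913” counts as a word), and the record number 201 and the sample title are hypothetical. Each output tuple carries the same three pieces of information as our hand-built list: record number, field, and position.

```python
import re


def word_postings(rn, field, text):
    """Word indexing: return (word, record number, field, position) for each word.

    Assumption: a "word" is any run of letters or digits; everything else
    is treated as a separator. Real hosts have their own parsing rules.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [(word, rn, field, pos) for pos, word in enumerate(words, start=1)]


for posting in word_postings(201, "TI", "Teaching Mathematics since 1913"):
    print(posting)
# ('teaching', 201, 'TI', 1)
# ('mathematics', 201, 'TI', 2)
# ('since', 201, 'TI', 3)
# ('1913', 201, 'TI', 4)
```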
Well, I’ve been introducing an awful lot
of vocabulary through these first ten slides.
Remember that Walker and Janes has a pretty nice glossary. The existence of a basic index versus
additional indexes is extremely important for searchers to understand. From a searcher’s perspective, whenever you
search a database without specifying a particular field to search, the basic
index is searched. It’s kind of a
default instruction for the computer to follow.
It’s also VERY important to realize that the basic index in one
database can be different from the basic index in another database. To complicate matters even further, a
database that you’ll find on the DIALOG service might also be found in
EBSCOhost or FirstSearch. Even though
they host some of the same databases, they may choose to index them in
different ways. Sometimes a database host
might choose to leave certain fields completely out of the indexing
process. The important lesson to learn
here would be that the savvy librarian must be ready for anything and should
endeavor to find out what is defined as the basic index and which additional
indexes exist for all the databases that they search. Thankfully you don’t have to memorize all
that junk! You just need to know how to
look up this information and how to best articulate the situation to your end
users. We’ll learn about how to do this
with Dialog and that will provide you with a very good example for the future.
Oh, here comes yet another bit of jargon
– the “Stop Words.” This is a list of
words that the computer is instructed to ignore as it builds the inverted
file. It’s as if the words don’t even
exist. Note that DIALOG’s list is very
short – shorter than those of many other database hosts.
They even index the word “a” – remember that phrases like “type A” or “grade
A” can often be found. DIALOG and some
other database hosts feel that it’s important for their users to easily find
them.
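Here is a minimal Python sketch of a stop-word filter applied while building the inverted file. The stop-word set is an assumption for illustration, not a copy of any host’s official list; it is kept short and deliberately leaves out “a” so that phrases like “grade A” stay findable. Check each host’s documentation for its real list.

```python
import re

# Hypothetical, deliberately short stop-word list for illustration only;
# each host publishes its own list. Note that "a" is NOT on it.
STOP_WORDS = {"an", "and", "by", "for", "from", "of", "the", "to", "with"}


def indexable_words(text):
    """Return the words that would go into the inverted file, minus stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


print(indexable_words("Grade A Guide to the Fear of Mathematics"))
# ['grade', 'a', 'guide', 'fear', 'mathematics']
```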
Now, this is where we start to chop up
the database.
What’s up with the author
field?
Both the last name and first
name on one line – why isn’t the name chopped into a first and last name?
Well, our author index … an additional index
… will, by my design, be a phrase index.
That’s the only way the author will appear in our index:
Lastname, Firstname.
“101” is the record number (much smaller than the record numbers that you’ll
see in DIALOG).
“AU” is the field.
“1” is the first phrase.
For another example, “psychohistorical” is also in record 101: it’s in the
title field (TI), and it’s the 5th word.
And the saga continues …
Wait a minute! This name is chopped up into words! Yup!
I determined that design too.
We’re in the middle of chopping up a subject field into words. Even “1913” is a word according to a
computer – aaw! They never could spell.
Whoa Milhouse! What’s this?
Well, this madcap designer has decided that the subject field of his
database is both word and phrase indexed.
We can do that, you know! Even
within DIALOG you’ll see variations in whether a particular field is word or
phrase indexed. You simply have to
memorize all this stuff for 500 plus databases. Nah!
I’m just kidding. Never memorize
something that can easily be looked up.
DIALOG has a set of tools called “Bluesheets.” There’s a Bluesheet for each database that
goes through each and every field and indicates how the field is indexed … or
if it’s indexed.
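A Bluesheet is, in effect, a table of indexing rules per field. This minimal Python sketch encodes the design of our tiny three-field database (author phrase-indexed, title word-indexed, subject both word- and phrase-indexed) as a toy rules table. The sample record is hypothetical, and numbering word positions across all values of a multi-valued field is an assumption of this sketch.

```python
import re

# A toy "Bluesheet": how each field of this tiny database is indexed.
FIELD_RULES = {
    "AU": {"word": False, "phrase": True},
    "TI": {"word": True, "phrase": False},
    "SU": {"word": True, "phrase": True},
}


def index_record(rn, record):
    """Return (term, record number, field, position) postings for one record."""
    postings = []
    for field, values in record.items():
        rules = FIELD_RULES[field]
        if rules["phrase"]:
            # Each whole value (e.g. "Lastname, Firstname") is one entry.
            for pos, value in enumerate(values, start=1):
                postings.append((value.upper(), rn, field, pos))
        if rules["word"]:
            # Assumption: word positions run across all values of the field.
            words = re.findall(r"[A-Za-z0-9]+", " ".join(values))
            for pos, word in enumerate(words, start=1):
                postings.append((word.upper(), rn, field, pos))
    return postings


sample = {"AU": ["Doe, Jane"], "TI": ["Fear of Mathematics"], "SU": ["Mathematics Anxiety"]}
for posting in index_record(101, sample):
    print(posting)
```

The author comes out only as the phrase “DOE, JANE”, while the subject produces both the phrase “MATHEMATICS ANXIETY” and the separate words “MATHEMATICS” and “ANXIETY.”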
And so it goes …
Thankfully, we’ll only carry this on until we
get the idea. There’s a point where one
begins to really appreciate what we can assign to computers to do. Let’s continue by alphabetizing our list.
And, numbers would kick off our alphabetized list of words …
Each ‘word’ has its record number, field, and position within the field
in our list.
Finally we get to the “a’s” in the list.
Hopefully I haven’t screwed up the
numbers of the records. This is hard
enough to follow as it is.
We also can’t forget the separate
additional index that we’ve built.
There are only three entries – they’re the authors of the three books.
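The alphabetizing step, and the idea of keeping the author phrases in a separate additional index, can be sketched in Python. This is a minimal sketch with hypothetical postings; it assumes every term is in a single case, because plain string sorting puts uppercase letters before lowercase ones. Digits sort before letters, which is why numbers kick off the list.

```python
from collections import defaultdict


def build_inverted_file(postings):
    """Group (term, record number, field, position) postings by term.

    Returns an alphabetized list of (term, locations). Assumes all terms
    share one case; digits sort before letters in plain string order.
    """
    index = defaultdict(list)
    for term, rn, field, pos in postings:
        index[term].append((rn, field, pos))
    return sorted((term, sorted(locations)) for term, locations in index.items())


# Basic index: hypothetical word postings from title (TI) and subject (SU) fields.
basic = build_inverted_file([
    ("mathematics", 101, "TI", 3),
    ("1913", 102, "SU", 4),
    ("fear", 101, "TI", 1),
    ("mathematics", 102, "SU", 1),
])
for term, locations in basic:
    print(term, locations)
# 1913 [(102, 'SU', 4)]
# fear [(101, 'TI', 1)]
# mathematics [(101, 'TI', 3), (102, 'SU', 1)]

# Additional index: author phrases kept in their own, separate list.
authors = build_inverted_file([("DOE, JANE", 101, "AU", 1)])
```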
Oooh!
How am I going to do this? I’ll
tell you what. I’ll do my best to go
through the ERIC database Bluesheet using the Camtasia Studio software –
you’ll be able to listen to my explanation and I’ll try to point out what is
important. The link in the slide is to
the DIALOG Bluesheet for the ERIC database.
One of the things that I’ll show is the pair of example records – one
is a journal record from the CIJE subfile of ERIC and the other is a document
record from the RIE subfile of ERIC.
The link above is to my work web page at
Carnegie Mellon. You can actually view the
structure that exists within any web page.
Use the “VIEW” pull-down menu on your browser and then choose
“Source.” The first code recognized by
a computer would be the “<HTML>” tag that you’ll see about five lines
down (after a bunch of remarks that the computer ignores). This would be equivalent to a document type
– if you’ve ever done an advanced Google or Yahoo search, you’ll notice that
you can restrict your search to a type of document. Right after the “<HTML>” tag you’ll
see the beginning of the “<HEAD>” of the document. The chief portion of the head of an html
document is the “<TITLE>.” Note
that I put a bunch of different variants of my name in the title of my
page. I did that so that if someone
looks for me as “Matt Marsteller” or “Matthew Marsteller” or “Matthew R.
Marsteller,” there’s a good chance that this page will turn up, because most
search engines give greater weight to the title of a web
page. The title of a web page is what
shows up in the blue bar at the top of your browser window. The web page also has “Web Page of Matthew
R. Marsteller” in big bold letters. If
you look in the source code (about a third of the way through the code),
you’ll see that this phrase is enclosed in “<H1>” and “</H1>”
tags. This is a heading within a
document and can also be given more prominence by a search engine. The key operative phrase is “can be given
more prominence” – nothing is guaranteed.
When search engines rank their search results, a lot of things come
into play. Ranking algorithms (rules
the computer will follow) are often redesigned. We’ll explore searching of the Internet at
the end of our study together. For now,
I hope that you’ll appreciate that the typical web page on the Internet can,
and usually does, have structure that search engines can make use of. Perhaps it’s a bit more crude than we’re
used to … but it’s there! There are
other concerns such as … is it standard HTML code? The answer is, unfortunately, no. One thing that is popular for people to do
is to use a word processor like Microsoft Word to produce their HTML code
(using the “Save As” function). It’s
easy to do, but the result is NOT standard HTML. Most browsers will interpret it correctly
(kind of) – hopefully most search engines will as well, eh?
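To see that this structure really is machine-readable, here is a minimal sketch using Python’s standard `html.parser` module to pull out the text inside the `<TITLE>` and `<H1>` tags, the two places discussed above. The sample page is a made-up stand-in (only the H1 phrase comes from the description above), and how much weight any real search engine gives these tags is not something this sketch can show.

```python
from html.parser import HTMLParser


class StructureSniffer(HTMLParser):
    """Collect the text inside <title> and <h1> tags, the way an indexer might."""

    def __init__(self):
        super().__init__()
        self.current = None
        self.found = {"title": [], "h1": []}

    def handle_starttag(self, tag, attrs):
        if tag in self.found:  # HTMLParser reports tag names in lowercase
            self.current = tag

    def handle_endtag(self, tag):
        if tag == self.current:
            self.current = None

    def handle_data(self, data):
        if self.current and data.strip():
            self.found[self.current].append(data.strip())


# Hypothetical page, loosely modeled on the structure described above.
page = """<HTML><HEAD><TITLE>Matt Marsteller / Matthew R. Marsteller</TITLE></HEAD>
<BODY><H1>Web Page of Matthew R. Marsteller</H1><P>Other text.</P></BODY></HTML>"""

sniffer = StructureSniffer()
sniffer.feed(page)
print(sniffer.found)
```

Note that the uppercase tags in the page are still recognized, because the parser lowercases tag names before handing them over.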
We’ve just been talking about a
collection of online databases so far.
Today, the online database is by far the most common, and delivery is usually via the Internet. CD-ROM databases were quite popular in the
roughly ten years before the advent of the
World Wide Web, but they proved difficult to work with – especially when it
came to effective networking. Tape-loaded databases were quite common before the Internet
as well, but now they’re not used much.
Sometimes disk drives are used to
store data, but again it is not common.
The online database accessed via the Internet is predominant. When I first started out in libraries, we accessed host computers of online
databases by dialing into them via phone lines. Connection speeds were so slow that I could read the data as it came across my
screen. In library school, I used a
dumb terminal with what was known as an acoustic
coupler. We would type in our commands
and then wait for the computer’s response to be printed onto the terminal paper – there was no screen.
People used IBM Selectric typewriters as their dumb terminals. My first supervisor would talk of the days when she would use a keypunch machine to type search
commands onto Hollerith cards. The
cards would be mailed off to a university
such as Georgia Tech and fed into their computer. If there were any typographical errors, you
found out when the computing center
sent you your results via mail. This
would take weeks … even months. How
things have changed!
Databases can also be divided up
by the structure of their data.
Bibliographic databases, such as ERIC,
NTIS, or a library catalog, are very familiar to most people – especially to
librarians and library school students.
The directory is also a common
type of database that one will experience.
Directories are just as tricky to search as
other types of databases. Even a small
directory of a few hundred hotel guests or hospital patients can overwhelm
those unprepared to use them. At the recent ALA conference in Chicago, I
failed to reach a colleague because the query was too difficult for the hotel operator. At Carnegie Mellon, we routinely get calls
for the public library because the telephone operators have difficulty with the directory.
The fulltext database has now
become very common. Fulltext databases
can be difficult to search unless one can
focus the search on bibliographic and abstract data. They will often swamp the database searcher
with huge sets of search results. They can often be quite advantageous as
well. Once, I was trying to see if
anyone had ever found a way to put an electric
charge on a bunch of tiny pieces of glass.
I was able to search for the word glass near the words pellet, shard,
bead, or fragment, where those words also showed up
near the phrase “electric charge” or “electrical charge.” I located a German patent that described how a company had done it in order to put a charge on
particles in a sand blaster – the idea was to make the dust from the
blaster stick to the charged
glass fragments. Remember in science
class when you had to rub a glass rod with a sheet of rubber? Well, the
inventor had the glass pellets pneumatically transported through a
rubber-lined tube. The inner surface of the tube had helical ridges that caused the
glass fragments to bounce all over the place – the end result of the glass pieces bouncing off the rubber was a static charge! That had to be one of my happiest moments as
a database searcher. The engineer that I was working with was astounded when I
found a working method with just a couple of hours of planning and conducting a literature search. We’ll be learning how DIALOG lets us search
through full text. Then … take this to
the bank … you’ll be looking for a good
equivalent search feature whenever you come across a fulltext database.
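The glass search above is a proximity search: words must appear within a few words of each other, not just anywhere in the same record. Here is a minimal Python sketch of that idea. It is not DIALOG’s proximity syntax; the window sizes (2 and 10 words) and the sample sentences are assumptions chosen for illustration, and singular/plural forms are simply listed rather than truncated.

```python
import re

FRAGMENTS = ["pellet", "pellets", "shard", "shards", "bead", "beads",
             "fragment", "fragments"]
CHARGE = ["electric charge", "electrical charge"]


def starts(words, term):
    """Positions where a one-word term or a multi-word phrase begins."""
    parts = term.split()
    return [i for i in range(len(words) - len(parts) + 1)
            if words[i:i + len(parts)] == parts]


def near(words, terms_a, terms_b, n):
    """True if some term from terms_a begins within n words of some term from terms_b."""
    pos_a = [p for t in terms_a for p in starts(words, t)]
    pos_b = [p for t in terms_b for p in starts(words, t)]
    return any(abs(a - b) <= n for a in pos_a for b in pos_b)


def matches(text):
    """glass near some fragment word, and some fragment word near the charge phrase."""
    words = re.findall(r"[a-z]+", text.lower())
    return near(words, ["glass"], FRAGMENTS, 2) and near(words, FRAGMENTS, CHARGE, 10)


print(matches("Glass beads carried through a rubber-lined tube acquire an electric charge."))
print(matches("Glass windows can hold an electric charge."))
```

The first sentence matches; the second does not, because no fragment word appears near “glass.”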
Citation databases – wow! Things have really begun to change in this
arena. Perhaps ten years ago, the only citation databases were those in the Westlaw legal
databases and the triad of databases from the Institute for Scientific Information: the computer versions of the Science Citation Index, the Social
Sciences Citation Index, and the Arts and Humanities Citation Index. The power of a citation index is that you
can start from one good article written perhaps … five years ago … and quickly search for other more recent articles that
listed the older article in their list of references.
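A citation index is, in effect, one more inverted file: instead of words pointing to records, each cited work points to the later records that cite it. Here is a minimal Python sketch of that inversion; the article identifiers are made up for illustration.

```python
from collections import defaultdict


def build_cited_by(reference_lists):
    """Invert {citing article: [cited works]} into {cited work: {citing articles}}."""
    cited_by = defaultdict(set)
    for citing, references in reference_lists.items():
        for cited in references:
            cited_by[cited].add(citing)
    return cited_by


# Hypothetical article identifiers.
reference_lists = {
    "Smith2009": ["Old2004", "Other1999"],
    "Jones2010": ["Old2004"],
    "Lee2011": ["Other1999"],
}
cited_by = build_cited_by(reference_lists)
print(sorted(cited_by["Old2004"]))  # ['Jones2010', 'Smith2009']
```

Starting from the one good older article (“Old2004” here), a single lookup finds the newer articles that cite it.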
Financial databases are an
interesting monster! The key to
searching them is to gain an appreciation for the typical information found in a financial record. Then, gain an appreciation for how the data
is structured – in DIALOG we would use
their Bluesheets. Other database
vendors provide searching guidance as well.
Numeric databases allow the user
to do things like search for materials with a certain set of properties … perhaps a range of boiling points for a database of
chemicals or metals with a coefficient of thermal expansion below a desired value.
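The difference from word searching is that a numeric field is compared as a number rather than matched as text. Here is a minimal Python sketch of a range search; the tiny table of approximate boiling points is illustrative only, and real numeric databases each have their own range syntax.

```python
# Approximate normal boiling points in degrees Celsius (illustrative data).
CHEMICALS = [
    {"name": "acetone", "bp_c": 56.1},
    {"name": "ethanol", "bp_c": 78.4},
    {"name": "benzene", "bp_c": 80.1},
    {"name": "water", "bp_c": 100.0},
]


def boiling_range(records, low, high):
    """Return names of records whose boiling point lies in [low, high]."""
    return [r["name"] for r in records if low <= r["bp_c"] <= high]


print(boiling_range(CHEMICALS, 70, 90))  # ['ethanol', 'benzene']
```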
To help illustrate what the last
three types of databases are like, the URLs that I’ve provided will lead you to search aids for databases of each type. The first two are Bluesheets from
DIALOG. The third is the “Database
Summary Sheet” from the American Chemical Society’s STN databases. STN is a service that can be considered
similar to DIALOG. It has more of an emphasis on scientific information while DIALOG seems to
do an admirable job with all types of databases (with the exception of numeric databases of scientific information – note that
my example wasn’t another Bluesheet).
This should reinforce the “First
Contact” tutorial a bit. DIALOG uses
the “?” (question mark) as the command prompt.
The command prompt is the symbol that the computer system uses when the
searcher needs to supply input. This
slide shows the straightforward example of how to start in … or switch to … a
database while you are connected to DIALOG.
This would be similar to clicking a checkbox to choose the database or
databases that you want to search within a collection of databases like EBSCOhost
or pointing your browser to the proper URL for a library’s online catalog.
This simply shows the system’s response
to the BEGIN command. We’re in the ERIC
database and another command prompt is showing.
The SELECT command is the next tool to
learn. In this example, I’ve tasked the
computer to find all records with the word “mathematics” in the basic index of
the ERIC database. It will look in all
the fields of each record that are designated as part of the basic index. Again, you’ll find a description of the
basic index in each and every Bluesheet.
If you’ve viewed the tutorial, the
search should look a little familiar.
The number of records retrieved is probably a bit lower – some time has
passed since I did the example search for the PowerPoint slides. In this slide, I searched for the word
“fear” after completing the search for the word “mathematics.” Note that DIALOG has assigned a set number
for each set of records. I’m able to
use these set numbers (“S1” and “S2”) to find records where both words show up
(again, simply in the basic index). A
third set, “S3,” is created and the system responds with another prompt. The word “AND” that I put between “S1” and
“S2” is significant. The DIALOG system
recognizes that as a special word known as a Boolean operator …
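The numbered-set idea can be imitated in a few lines of Python. This is a toy imitation, not DIALOG’s command parser: each SELECT appends a new set and reports its number and size, and a query of the form “S1 AND S2” combines earlier sets instead of looking up words. The basic index and its tiny record numbers are hypothetical.

```python
import re


class ToySession:
    """A toy imitation of numbered search sets (S1, S2, ...); not DIALOG's parser."""

    def __init__(self, basic_index):
        self.basic_index = basic_index  # word -> set of record numbers
        self.sets = []                  # self.sets[0] is S1, self.sets[1] is S2, ...

    def select(self, query):
        m = re.fullmatch(r"S(\d+) (AND|OR|NOT) S(\d+)", query.strip().upper())
        if m:
            a = self.sets[int(m.group(1)) - 1]
            b = self.sets[int(m.group(3)) - 1]
            result = {"AND": a & b, "OR": a | b, "NOT": a - b}[m.group(2)]
        else:
            # A single word: look it up in the basic index.
            result = set(self.basic_index.get(query.strip().lower(), set()))
        self.sets.append(result)
        return f"S{len(self.sets)}", len(result)


# Hypothetical basic index with tiny record numbers.
session = ToySession({"mathematics": {1, 2, 3, 4}, "fear": {3, 4, 5}})
print(session.select("mathematics"))  # ('S1', 4)
print(session.select("fear"))         # ('S2', 3)
print(session.select("S1 AND S2"))    # ('S3', 2)
```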
These are the three main Boolean
Operators. Note that the description of
the AND Operator matches what I was trying to accomplish with finding records
on the fear of mathematics. Let’s
explore each separately …
Back in third grade, I first learned
about set theory and Venn Diagrams. I
have to admit that I never foresaw a career where I tend to draw Venn diagrams
several times a day! Let’s let the left
circle represent all the records in a database that contain the word
“mathematics.” The circle on the right
represents the set of all records that contain the word “fear.” Thanks to people like me who struggled
mightily with multiplying and dividing fractions (at first) in fifth grade, we
can expect that educators have looked into the problem of kids being a bit
afraid of math. The green shaded area
represents records that contain both words.
The circles aren’t drawn to scale of course. The set of records with the word
“mathematics” was much larger … more than ten times larger. Hopefully this makes the search more
understandable. One thing that you
should realize is that the result of using an AND Operator should be a set of
records that is SMALLER than either of the two sets that you started with (it could
be the same size as the smaller of the two sets … but that is not likely to happen). It’s important to note that using too many
AND Operators may yield sets that are too small to cover the topic. Start with the most significant concepts for
your search and combine them until you have a reasonably sized set to work
with. Don’t overuse the AND
Operator. Use it, but do so
wisely. Think about the topic and the impact
that your strategy might have on your results.
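The size rule for AND can be written as one inequality. Writing M for the set of records containing “mathematics” and F for the set containing “fear”:

```latex
|M \cap F| \le \min\bigl(|M|,\ |F|\bigr)
```

Equality holds only when one set lies entirely inside the other (for example, if every record with “fear” also contained “mathematics”).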
Typically, a computer will only do what
is instructed. Searching for
“mathematics” is interpreted exactly.
If any of the records contained the more casual “math,” well, our earlier
search would have missed the ones that didn’t also have the word “mathematics.” For the purposes of a database search, the
words in the Venn Diagram above are synonyms.
The OR Operator looks for either word to show up in a record and
gathers them into a set that we can use later.
Note that the results for truly synonymous terms will usually
overlap. The result of an OR Operator
is a set that is larger than the set of records retrieved for each individual
word (unless there’s a case of complete overlap … theoretically possible, but
again, highly unlikely). Database searching
often calls for the searcher to think outside the box a little. Perhaps, for my purposes, “fractions” would
or could be considered synonymous with mathematics. “Fractions” would be a narrower term, but
what if any of the three terms would be acceptable to the information seeker? Have I got you thinking?
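The matching rule for OR is the familiar inclusion–exclusion count. Writing M for the records containing “mathematics” and T for those containing “math”:

```latex
|M \cup T| = |M| + |T| - |M \cap T| \ \ge\ \max\bigl(|M|,\ |T|\bigr)
```

The subtraction removes the overlapping records so they are not counted twice, and the OR set can only be as small as the larger set when the smaller set is completely inside it.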
These three slides with the Boolean
Operators illustrated were borrowed from a training tool from DIALOG. Don’t blame their example on me! The NOT operator must always be used with
great caution. The guidance given in
this slide is most wise. Remember
it!
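NOT is a set difference, which is exactly why it needs care: every record that mentions the excluded word is thrown out, however relevant it is otherwise. A tiny Python illustration, with made-up record numbers and a hypothetical exclusion of “algebra”:

```python
# Hypothetical record numbers for two search sets.
mathematics = {1, 2, 3, 4, 5}
algebra = {4, 5, 6}

# "mathematics NOT algebra" is a set difference: it drops records 4 and 5
# even if they are excellent mathematics records that merely mention algebra.
result = mathematics - algebra
print(sorted(result))                 # [1, 2, 3]
print(sorted(mathematics & algebra))  # [4, 5], the records NOT threw away
```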
In this slide, I’ve used the OR
Operator.
It helped me to retrieve more
than 4,000 additional records that should cover the concept of
mathematics.
After that, I may be losing track of the sets that I’ve created.
“DS” is the abbreviation for “DISPLAY
SETS.”
I use this command a lot.
Note set “S4.”
It shows a mistake.
The “S” before the number for sets is VERY important.
Set “S3” is quite different from set
“S4!”
The results of set S4 are a Boolean
AND of records containing the “word” “1” and the “word” “2” – the computer
will treat a number as if it were a word unless given specific instructions
not to do so.
We’ll see some of those
examples later in the class.
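A toy Python check makes the slip easy to see: without the leading “S,” a bare number is just another word to look up. The classification rule below is an assumption for illustration, not DIALOG’s actual parser.

```python
import re

OPERATORS = {"AND", "OR", "NOT"}


def classify_operands(query):
    """Label each operand as a set reference (S plus digits) or a plain word."""
    labels = []
    for token in query.split():
        if token.upper() in OPERATORS:
            continue
        if re.fullmatch(r"[Ss]\d+", token):
            labels.append((token, "set"))
        else:
            labels.append((token, "word"))
    return labels


print(classify_operands("S1 AND S2"))  # [('S1', 'set'), ('S2', 'set')]
print(classify_operands("1 AND 2"))    # [('1', 'word'), ('2', 'word')]
```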
So, my new set S6 is a little larger
than set S3.
It is most likely a
superior set of search results from a completeness standpoint.
I told you I use that DS command a lot!
If you think over our search strategy,
set S3 is indeed a subset of set S6.
Sets S6 and S7 would be considered equivalent sets.
The next thing that I tried was to limit my search to only English
records.
This command works in a lot of
DIALOG databases, but when it doesn’t work, the system indicates that the
command is ignored.
If you look at the
ERIC Bluesheet (link provided), and home in on the section of the Bluesheet
that discusses Limits, you’ll find no mention of a limit for English language
materials.
Thankfully there’s another way
to handle the problem.
Here’s a mind
bender!
DIALOG will let you search multiple
files together.
What happens when one
file accepts the “/ENG” limit and others don’t?
Well, sometimes the search gets a little
messy.
Preplanning your search is the
only good remedy.
Using a technique similar to LIMIT
suffixes in DIALOG, you can restrict your search
to a part of the basic index. You would
often want to do this to make sure … for example … mathematics is a very
important concept in a particular set of search results. It is important to be cognizant of what
fields are in the basic index. I know
I’m probably beginning to sound like a broken record, but it’s a very
important concept. The second example
shows a search for the word “mathematics” restricted to the title (ti)
field. There are times when this
extreme narrowing of a search is beneficial.
One example of this is when you try to verify a citation (patrons often
struggle trying to find a citation that is slightly incorrect – maybe they
misheard a detail or two in a discussion with a colleague at a
conference). There’s always the hopeless
case where even the best of databases won’t help … one famous item of library
lore would be the kid who visits his local library looking for a book called
(according to the kid) “Oranges and Peaches” – a good reference interview
eventually revealed to the librarian that the kid’s teacher had mentioned a book
on evolution by some guy named Darwin.
Aha! The kid wanted “On the Origin
of Species!” As a person who has
struggled with hearing impairment, I’m on the kid’s side! Oh … how many times have I misheard
things!
Sometimes I’ll restrict a search to
the titles or descriptors … like you see in the third example above. If the information need is a few good
articles on a topic, this can be a reasonable choice. For multiple fields, simply separate the
field abbreviations by a comma. What
fields are available for suffix searching?
Again, it depends on the database – the Bluesheet is where to look!
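Here is a minimal Python sketch of what a field suffix does: with no suffix, the word is matched against every basic-index field; with “/ti” or “/ti,de”, only the listed fields are checked. The list of basic-index fields and the three sample records are hypothetical; for a real database, that list is exactly what the Bluesheet tells you.

```python
import re

# Hypothetical list of fields that make up the basic index; the real list
# for any database is on its Bluesheet (or the host's equivalent).
BASIC_FIELDS = ["TI", "AB", "DE"]


def search(records, query):
    """Match 'word' against the basic index, or 'word/ti,de' against only those fields."""
    word, _, suffix = query.partition("/")
    fields = [f.strip().upper() for f in suffix.split(",")] if suffix else BASIC_FIELDS
    hits = set()
    for rn, record in records.items():
        for field in fields:
            words = re.findall(r"[a-z0-9]+", record.get(field, "").lower())
            if word.lower() in words:
                hits.add(rn)
                break
    return hits


records = {
    1: {"TI": "Fear of Mathematics", "AB": "A survey of students."},
    2: {"TI": "Anxiety in Grade 5", "DE": "Mathematics Anxiety"},
    3: {"TI": "Reading Aloud", "AB": "Mentions mathematics once."},
}
print(sorted(search(records, "mathematics")))        # [1, 2, 3]
print(sorted(search(records, "mathematics/ti")))     # [1]
print(sorted(search(records, "mathematics/ti,de")))  # [1, 2]
```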
Okay!
I get to introduce a new command and bail us out of our problem that we
had in the ERIC database – that one without the Limit command that we
needed!
If you scour the ERIC
Bluesheet, you’ll notice the Language Field as one of the additional indexes.
At this point, I’ve always been more
comfortable with using the EXPAND command or “E” to browse an additional
index.
Some folks would just use:
?S LA=ENGLISH
And be done with the problem. I’ll
often start with an EXPAND command and …
… follow it up with a SELECT command of the “E Number.” Thus,
?S E3
… puts a set of English language records in a (huge) set …
Then I can return to use of the AND Operator to finish the job.
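The EXPAND-then-SELECT pattern can be imitated in Python: list the alphabetical neighborhood of a term in an additional index, number the rows E1, E2, and so on, and then pick a row by its E number. This is a toy sketch; the Language index entries and their posting counts are made up, and placing two entries before the requested term (so it lands on E3 here) is an assumption of the sketch.

```python
import bisect

# Hypothetical additional index (Language field) with made-up posting counts.
LANGUAGE_INDEX = {"LA=CHINESE": 10, "LA=DUTCH": 5, "LA=ENGLISH": 900,
                  "LA=FRENCH": 40, "LA=GERMAN": 30}


def expand(index, term, before=2, after=3):
    """List E-numbered index entries around the place where `term` sorts."""
    terms = sorted(index)
    i = bisect.bisect_left(terms, term)
    start = max(0, i - before)
    window = terms[start:i + after]
    return [(f"E{n}", t, index[t]) for n, t in enumerate(window, start=1)]


rows = expand(LANGUAGE_INDEX, "LA=ENGLISH")
for e_number, term, count in rows:
    print(e_number, count, term)

# "S E3" then means: select the records posted under the term on row E3.
chosen = {e: t for e, t, _ in rows}["E3"]
print(chosen)  # LA=ENGLISH
```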
Gosh, I’ve been doing a lot of searching.
What kind of a bill am I running up?
The “COST” command allows me to check.
Sometimes I’ll use the COST command after a TYPE command to see how
much I just spent.
Always make sure
that you’re TYPE-ing the correct set number or the subsequent COST command
might be a bit unsettling.
Thankfully,
if you ever make an expensive mistake and generate useless output, you can
call DIALOG and explain the situation.
Be prepared to tell them your User number and the Session number.
In this case, the User number is 556323 and
the Session number was D1.2.
Oh, what’s
the TYPE command?
Well, I got a little
ahead of myself.
We have this great (we
hope) set S10 and we’d like to look at some of the results.
The TYPE command is what we need to use to
retrieve the results of our work.
This slide and the next one show the
beginning of the results of our TYPE command example …
Note that I asked for “Format 8.” This is one of the formats of database
output that is free in the ERIC database.
This is often not even enough information to find the document,
although in this particular case I would be trying “ED452367” in a library’s
huge set of ERIC microfiche. Many
libraries would put the ERIC documents in “ED” number order.
Hopefully you’ll find this to be a good
example. The Bluesheet will indicate
which formats are available to the searcher.
If you ever wanted all of the results of a particular set, the word
“ALL” could be substituted for “/1-5” in our example.
Now, the ERIC database is one of the
databases that uses something called Controlled Vocabulary.
Specifically, the database uses the
Thesaurus
of ERIC Database Descriptors.
If you EXPAND a word or a phrase in a database and you see a column headed
with an “RT” then you’ll know you’re in a database that has an online thesaurus
(but, of course you scoured the DIALOG Bluesheet ahead of time and KNEW the
database had a thesaurus.
Right!!??!!!)
In this example, I’m thinking of trying to focus my search to fear of
mathematics in fifth grade.
I had
spotted the “GRADE 1, 2, 3, …” descriptors in the earlier results in Format 8
and thought I’d expand on the phrase “GRADE 5” to see if I could use it and perhaps
other terms that would be or could be considered synonymous for my purposes.
In this example, I’ve used the “SELECT”
command on E number “E3,” but it would have been more interesting to use the
EXPAND command:
?E E3
This would have listed the three related terms for me.
Bummer!
I must have had a bad day!
I
think I’ll do it and stick the slide in!
Here’s an example of the TYPE command
that requests Format 9 for record 1 of the set S12.
It’s a 16-page document … again probably in a huge bank of ERIC microfiche documents
that one will find in many libraries with good education collections.
ED219235 is what we’d be looking for.
When you’re connected to the DIALOG
service, the meter is running. So,
we’ll also want to stop spending money!
The LOGOFF command will end your search session. By the way, with these student accounts,
don’t worry about costs. DIALOG
provides this service to us so that students have the chance to learn how to
use their system. Note that I put
“$0.00 2 Type(s) in Format 9” in bold face font. I wanted to point out that in a real
situation you would be charged per record for the output.
I’ve
already hinted at how important pre-planning can be. Let’s take a couple of minutes to discuss
the process. Note that spending a little time exploring a topic before you
start a database search is usually a BIG time saver. Since many of you will be school librarians, I would really, really, really, really, really like you to
stress this step when you’re teaching kids to do their own database
searching.
A lot of
the searching that I’ve done was as an intermediary – that’s where a scientist
or engineer (or even a management person when
I’d let them talk to me) would talk to me about their information need and
then I would perform the database search for them. That’s where “What topic do
we have in mind?” is really emphasized!
I would often ask the scientist or engineer (and yes, even a management person once in a while – are you
beginning to get the idea that I just might have a problem with authority figures? ☺) open-ended questions to get them to open up to me
about their work. Then I’d try to
restate their request to see if I heard
their true information need. After
that, I’d always ask them to take a few minutes and write down their request –
often I’d get a sentence or two … or a
short paragraph. I’d explain that when
we communicate in writing we sometimes reach a bit deeper and reveal something new about our information
need. Today, with the exception of the
special library setting, many people do their
own searching and we seem to have inherited more of a teaching role or we get
handed the truly nasty searching problems. Just
because folks are doing their own searching doesn’t mean that they should skip
trying to state their topic to themselves.
It’s crucial to do so or they’ll
waste a lot of their own time!
Next, one
has to figure out where they’re going to look for information. You’ll get a little practice with that in
this class. You should spend the rest of the pursuit of your degree
getting better at figuring out where to look for information. As a librarian, you’ll find that people turn to you for advice on this
particular step. The better you can put
yourself in the “information need shoes” of your patron … the better you can advise them on where to look for
information.
You also have to consider what terms to use in your search. I strongly recommend that the searcher look up an encyclopedia entry or some other introduction to the topic. Our third tutorial for the week is a search demonstration on literature about the reintroduction of wolves. At this step in the process, I started looking at encyclopedias, special web pages on wolves produced by trusted sources (in this case, I was pleased to find some good descriptions on the National Park Service’s Yellowstone National Park web site), and so on. Did I want to look for wolves, or perhaps timber wolves, gray wolves, or red wolves? How about using the scientific name Canis lupus? What would be synonymous with reintroduction? Is the reintroduction of wolves too broad a topic? Can I figure that out from general reading about wolves? Do I suspect that I’ll be faced with “Too Much Information”? [apologies to Duran Duran]
How am I going to combine my terms? Can I sketch out my strategy in a Venn diagram … or perhaps two?
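If the Venn diagram idea feels abstract, here is a minimal Python sketch (an illustration only, not something DIALOG provides) that treats each search set as a set of record numbers. The record numbers are made up; the point is that OR behaves like a union of circles and AND like their overlap.

```python
# Hypothetical record numbers retrieved by each search term (made up for illustration).
wolves = {101, 102, 103, 104, 105}
gray_wolves = {104, 105, 106}
reintroduction = {103, 105, 106, 200, 201}

# OR: a record inside either circle qualifies (set union), so the set grows.
wolf_concept = wolves | gray_wolves

# AND: a record must sit where the circles overlap (set intersection), so the set shrinks.
final_set = wolf_concept & reintroduction

print(sorted(wolf_concept))  # [101, 102, 103, 104, 105, 106]
print(sorted(final_set))     # [103, 105, 106]
```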
This is another of those slides that I borrowed from DIALOG. It’s a pretty good review of the last slide, but in my opinion they should stress the use of reference sources to help determine which individual concepts make up the topic’s description and to help choose synonyms or related, narrower, and broader terms. The question “What words must a record contain in order to be relevant?” is fantastic. Athletes often visualize what they want to do before they do it; librarians should do the same with database searching.
You guessed it … I borrowed this one as well. It’s not a bad worksheet! We have to identify and describe our topic, and we have to consider where to look for information. Then it helps to break our search strategy up into concepts. For concept 1, you would list the synonymous terms below it (next to the OR diamond). You would do the same for concept 2, and then for a third concept if necessary. If you use more concepts than that, well … your resulting set might miss a lot of pertinent literature.
You may wish to invest more time in exploring your topic with general reference resources to see if you can change the way you’re thinking about your concepts. Sometimes database searchers will use a totally unnecessary concept. Why use the term library or libraries when you’re searching the Library Literature & Information Science database … or the LISA database? Why use the term education when searching ERIC? These databases should, by their very nature, contain records that already deal with the topic.
Finally, it’s always wise to actually write out the commands that you plan to use. Sometimes things go awry in mid-search, but it still helps to have the rest of your train of thought to follow.
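Writing out your commands from the worksheet is mechanical enough that it can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: `worksheet_to_commands` is a hypothetical helper, the wolf terms are just examples drawn from the discussion above, and it assumes a fresh session, so each concept’s SELECT becomes the next set (S1, S2, …). Synonyms within a concept are ORed; the concept sets are then ANDed.

```python
def worksheet_to_commands(concepts):
    """Turn a concept worksheet into DIALOG-style SELECT lines (hypothetical helper).

    concepts: list of lists; each inner list holds the synonyms for one concept.
    Assumes a fresh session, so the first SELECT creates S1, the next S2, and so on.
    """
    commands = []
    for synonyms in concepts:
        # Synonyms for one concept are ORed together (the "OR diamond" on the worksheet).
        commands.append("S " + " OR ".join(synonyms))
    # Each concept's set is then ANDed with the others.
    set_names = [f"S{i}" for i in range(1, len(concepts) + 1)]
    commands.append("S " + " AND ".join(set_names))
    return commands


# Example worksheet: concept 1 = wolves, concept 2 = reintroduction.
worksheet = [
    ["WOLF?", "CANIS(W)LUPUS"],
    ["REINTRODUC?"],
]
for line in worksheet_to_commands(worksheet):
    print(line)
```

Keeping one concept per set is what makes the final AND line so short, and it leaves each concept’s set number available if you later need to rework just one of them.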
The only advice that I would add to a search worksheet applies to intermediated searching. There, I’d have a checklist of questions like these:
What languages are okay to retrieve in the results? (Note that an English abstract is usually available.)
How do you want me to deliver the results to you? E-mail? Printed copy? Diskette? CD? Ready to import into bibliographic management software?
Does the search need to be comprehensive, or do you only need some literature to get you going for now?
What types of literature do you think would be relevant? Journal articles? Patents? Theses? Conference papers? Only the latest market research? Books?
Let’s run through a search in the BIOSIS Previews® database.
When I got out of library school, my first position was at the Thomas Cooper Library of the University of South Carolina.
The biology program at “The USC” had a great relationship with the Riverbanks Zoo, which was built around a very nice natural-habitat design.
Students would be given behavioral assignments in which they had to study one of the zoo’s residents.
One possibility would be the preening behavior of the birds at the zoo.
Let’s take the mallard as an example.
Well, we’d probably want to read a bit about mallards as preparation.
At Carnegie Mellon, I happened to have ready access to AccessScience (the McGraw-Hill Encyclopedia of Science & Technology online) and Britannica Online.
From the articles, I can surmise that I’d want to use the scientific name for the mallard: Anas platyrhynchos.
Neither encyclopedia’s article on mallards touched on the topic of preening.
I looked up preening in both sources as well.
My gut feeling is to go with just preening and its variant word endings.
So, I start my search in the BIOSIS Previews® database …
It would be nice to have a way to take just the root word and allow for variant endings.
That is usually known as truncation.
The symbol for truncation varies with the
database search syntax.
In many
Internet search engines, the asterisk (*) is used.
In DIALOG, the question mark (?) is
used.
We’ll go into further rules of truncation later, but here’s the basic method. We could have used our truncation symbol with each word:
?S PREEN? [searches for all words beginning with “preen”]
?S MALLARD? [searches for all words beginning with “mallard”]
The single question mark allows for open-ended (multiple-character) truncation in DIALOG.
More on truncation later … I promise!
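To see what open-ended truncation does to a list of index words, here is a minimal Python sketch. It only models a single trailing question mark and assumes index words are made of letters and digits; DIALOG’s other truncation forms (coming later in the course) are not modeled, and the word list is made up.

```python
import re


def truncation_pattern(term):
    """Compile a regex for a DIALOG-style term such as PREEN? (open truncation only).

    Assumption: only a single trailing '?' is handled, and index words
    consist of letters and digits.
    """
    term = term.upper()
    if term.endswith("?"):
        # Stem followed by any number of additional characters (including none).
        return re.compile(re.escape(term[:-1]) + r"[A-Z0-9]*")
    # No truncation symbol: the word must match exactly.
    return re.compile(re.escape(term))


def matching_words(term, vocabulary):
    pattern = truncation_pattern(term)
    # fullmatch: the whole word must fit the pattern, not just its beginning.
    return [w for w in vocabulary if pattern.fullmatch(w.upper())]


words = ["pre", "preen", "preened", "preening", "preens", "mallard", "mallards"]
print(matching_words("PREEN?", words))    # ['preen', 'preened', 'preening', 'preens']
print(matching_words("MALLARD?", words))  # ['mallard', 'mallards']
print(matching_words("PREEN", words))     # ['preen']
```

Notice that “pre” is not retrieved: truncation only adds characters after the stem, it never shortens it.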
Whoa Nellie!
What’s that “(W)” between the word “ANAS”
and the word “PLATYRHYNCHOS”?
We’ll
learn more about stuff like this later, but the WITH Operator “(W)” simply
searches for two words that appear next to one another … in that order.
It’s a phrase search.
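Here is a minimal Python sketch of what the (W) operator checks: the first word, then the second word immediately after it, in that order. The tokenizer is deliberately crude, and the `max_between` parameter is an assumption added for illustration that loosely mimics DIALOG’s numbered (nW) form; this is not how DIALOG itself is implemented.

```python
def with_operator(text, first, second, max_between=0):
    """Return True if `first` is followed by `second`, in that order,
    with at most `max_between` words in between (0 means adjacent, like (W))."""
    # Crude tokenizer: split on whitespace and strip common punctuation.
    words = [w.strip(".,;:()\"'").lower() for w in text.split()]
    first, second = first.lower(), second.lower()
    for i, word in enumerate(words):
        if word == first:
            # Look only to the right of `first`: the operator is order-sensitive.
            window = words[i + 1 : i + 2 + max_between]
            if second in window:
                return True
    return False


print(with_operator("Preening in Anas platyrhynchos drakes", "anas", "platyrhynchos"))  # True
print(with_operator("platyrhynchos, Anas", "anas", "platyrhynchos"))  # False (wrong order)
print(with_operator("Anas acuta and Anas platyrhynchos", "anas", "platyrhynchos"))  # True
```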
Set S4 basically covers the concept of mallards.
Set S1 covers the concept of preening.
Combining S1 and S4 with the AND operator yielded 21 records … 13 of which are in English.
In BIOSIS Previews®, the language limit for English works fine …
At this point, I could produce my output with a TYPE command.
Then again, maybe I decide that I’m not quite done looking yet.
There’s another database that I want to check.
It’s the
Zoological Record.
It’s file 185.
Rather than lose all my commands, I’m going to save them temporarily with the “SAVE TEMP” command.
The system responds that it has saved the search. If memory serves me correctly, the search strategy is saved for a 24-hour period.
Now I’m getting fancy on you.
I’ve stacked two commands into one line!
You should recognize the BEGIN command.
Then I entered a semicolon, which lets DIALOG know that I’m about to enter a second command.
The next command is the EXS command.
That’s short for Execute Steps.
Entering “EXS” without specifying a particular saved search strategy automatically runs the latest search that I saved.
So, here DIALOG goes … into the Zoological Record Online®, grinding through my previously saved search strategy …
The “S” of EXS means that DIALOG will run the search in steps … leaving me with a full complement of set numbers to work with further if need be. It also gives me some needed details of how the search is progressing. Wouldn’t you know it … the “/ENG” limit doesn’t work in the Zoological Record Online®. Should I worry? Perhaps not with such a small set.
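To picture what “Execute Steps” buys you, here is a toy Python model in which every line of a saved strategy gets its own set number, so any intermediate set can be reused afterward. Everything here is an assumption for illustration: `execute_steps` is a hypothetical name, and the `index` dictionary with its record numbers is made up and stands in for the database’s own retrieval (truncation and (W) are not actually evaluated). It is a sketch of the idea, not how DIALOG works internally.

```python
def execute_steps(strategy, index):
    """Replay a saved strategy one step at a time, numbering sets S1, S2, ...

    strategy: list of steps; each is either a search term (looked up in `index`)
              or a combination such as "S1 AND S2" or "S2 OR S3".
    index:    dict mapping a search term to the set of matching record IDs
              (a made-up stand-in for the database's retrieval).
    """
    sets = {}
    for number, step in enumerate(strategy, start=1):
        parts = step.split()
        if len(parts) == 3 and parts[1] in ("AND", "OR"):
            left, op, right = sets[parts[0]], parts[1], sets[parts[2]]
            result = left & right if op == "AND" else left | right
        else:
            result = index.get(step, set())
        sets[f"S{number}"] = result
        # Report each step as it runs, like a running search history.
        print(f"S{number}  {len(result):>4}  {step}")
    return sets


# Hypothetical toy data, not real search results.
index = {
    "PREEN?": {1, 2, 3, 4, 5},
    "MALLARD?": {4, 5, 6},
    "ANAS(W)PLATYRHYNCHOS": {5, 6, 7},
}
strategy = ["PREEN?", "MALLARD?", "ANAS(W)PLATYRHYNCHOS", "S2 OR S3", "S1 AND S4"]
sets = execute_steps(strategy, index)
```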
It’s very important to evaluate the system’s response to your commands as you conduct your search.
Mistakes happen, and I’ve made some doozies in my day!
If you get highly unexpected results, there’s usually a reason.
Trust your gut instincts.
If you use an AND operator, the result can be no larger than the smaller of the two sets you’re combining.
If it is larger, did you goof on the set number that you entered?
If you use an OR operator, the result should be at least as large as the larger of the two sets (and no larger than the two added together).
If not, it’s time to review your work!
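The gut check above boils down to two inequalities, which a minimal Python sketch can spell out. The function name `check_combination` and all the counts in the example are hypothetical; it only tells you whether a result size is possible, not whether the search is any good.

```python
def check_combination(op, left_count, right_count, result_count):
    """Return a warning if a Boolean result size is impossible, else None."""
    if op == "AND" and result_count > min(left_count, right_count):
        return "AND result exceeds the smaller input set; check your set numbers."
    if op == "OR" and not (
        max(left_count, right_count) <= result_count <= left_count + right_count
    ):
        return "OR result is outside the possible range; check your set numbers."
    return None


# Hypothetical counts, not taken from any real search.
print(check_combination("AND", 120, 80, 21))   # None
print(check_combination("AND", 120, 80, 900))  # prints the AND warning
print(check_combination("OR", 120, 80, 150))   # None
```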
We’ve seen this mistake before! Look at the huge resulting set! If there are only 58,984 records on math or mathematics, why is set S4 so doggone large? Using an AND operator should yield something no larger than the 4,055-record set for fear. In this case, I’ve asked the computer to look for the “word” “1” instead of using the set “S1”. The same thing happened with the other set.
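A quick way to catch this particular slip before you press Enter is to scan your written-out command for bare numbers. Here is a minimal Python sketch; `bare_numbers` is a hypothetical helper, and it only raises a flag, because a bare number can also be a legitimate search word (a year, for instance).

```python
import re


def bare_numbers(command):
    """Return standalone numbers in a SELECT line that may have been meant as set names.

    'S 1 AND 2' searches for the "words" 1 and 2; 'S S1 AND S2' combines sets.
    Numbers can be legitimate search terms (years, for example), so treat hits as warnings.
    """
    # Drop the leading S or SELECT command word.
    body = re.sub(r"^\s*(SELECT|S)\s+", "", command, flags=re.IGNORECASE)
    # Tokens made only of digits; "S1" and "(2W)" are not matched because the
    # digit touches a letter, so there is no word boundary there.
    return re.findall(r"\b\d+\b", body)


print(bare_numbers("S 1 AND 2"))    # ['1', '2']
print(bare_numbers("S S1 AND S2"))  # []
```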
Misspellings can really burn you.
Did you know that “naptha” is really spelled
naphtha?
Yeah … I learned the hard way!
PowerPoint found my mistake and underlined it, but when you’re querying DIALOG
… you’re on your own!
How many spelling
bee trophies are on your mantel?
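Since DIALOG won’t underline your typos, one option is to check a suspicious term against a list of known spellings before you search. Here is a minimal Python sketch using the standard library’s `difflib.get_close_matches`; the term list and the `suggest` helper are hypothetical, and in practice you would check the database’s own index rather than rely on a list like this.

```python
import difflib

# A hypothetical list of index terms, made up for illustration.
index_terms = ["naphtha", "naphthalene", "napkin", "naphthol"]


def suggest(term, vocabulary, n=3):
    """Return up to n close spellings for a term, best match first."""
    return difflib.get_close_matches(term.lower(), vocabulary, n=n, cutoff=0.6)


print(suggest("naptha", index_terms))
```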
Sometimes your search history – what you see in response to a “DS” (Display
Sets) command – can get longer than you originally envisioned.
It can be hard to keep all of the sets
straight in your mind.
That’s why I
make such ready use of the DS command.
Watch how DIALOG is handling your commands … it might refuse to process a
limit or a prefix … can you live with the result?
Maybe … maybe not.
Oh … that last one … the memories almost hurt!
When you’ve done as many searches as I have, you’re bound to have stories to tell!
Well, I think I’ve covered enough ground to get you through Searching Exercise
One with flying colors.
You know how to
reach me if I have failed to do so!
See
y’all in Lesson Two!