In This Issue: Becoming an Efficient and Effective Web Searcher
- An approach to efficient and effective Web searching
- Understanding search engines and subject directories
- Beyond engines and directories: specialized tools and the Deep Web
- Formulating and formalizing Web searching strategies
- Evaluating Web resources
- More Web topics in future issues
1. An approach to efficient and effective Web searching
- Based
on workshops developed by Janet Eke, Pat Barlosky, and Cindy Kehoe for the
Ontario Library Association, the National Technological University, and the Graduate
School of Library and Information Science, University of Illinois at Urbana-Champaign
(1999-2002).
- Cohen,
Laura B. "The Web as a Research Tool: Teaching Strategies for Instructors."
Choice Supplement to vol. 36 (1999):19-44.
The Web is difficult
to search. Though the Web's hypertext point-and-click environment is simple, conducting
research on the Web is not. Its chaotic architecture of linking lends itself well
to surfing but not to efficient information finding. (Cohen, 1999.)
Information
professionals must be good Web searchers. The Web is an information resource
that cannot be ignored. We must know how to locate high quality Web content
quickly and accurately.
We already
have useful skills. Though some of the skills and knowledge needed to search
effectively are unique to the Web environment, many others come from the understanding
we already have of the information structures of traditional print and
online collections. We can apply what we already know to the Web.
Three Steps
to Efficient and Effective Web Searching
The chart
below suggests three "A, B, C's" of effective Web searching. The sections following
discuss each component in more detail, drawing on library literature.
|
Step
|
Example
|
|
|
- Learn
what they are and how they work.
- Learn
the advanced search features of your favorites.
- Know
where to go to keep current on changing features and new services.
|
- B.
Know specialized search tools. Build a "Core Collection" of
Web sites beyond general-purpose search tools like Google or Yahoo!
- Specialized
tools may take you more directly to an authoritative source.
- Specialized
tools may help you locate valuable content not indexed by search engines
(the Deep Web).
|
- Know
good sources for general types of information, such as general reference
sites and news sites.
- Know
good sources for specific subject areas you search often.
- Know
gateway sites to specialized databases such as Deep Web pathfinders
and specialized directories.
|
- C.
Formulate and formalize Web searching strategies. Analyze Web
topics before you search. Consciously apply old and new search strategies
to the Web.
- Approaches
from traditional collections and Information Retrieval system searching
can be applied to the Web; other strategies are Web-specific.
- Analysis
can make the difference between Web searching success and failure.
|
Ask
- Is
there a type of source that will be useful for this topic? For example:
an encyclopedia entry, a news article, a business directory entry.
- Who
might be interested in or responsible for gathering information about
this topic? For example: a trade association, a government agency, a
research institution.
- What
is the best type of Web search tool to begin with? For example: a search
engine, a subject directory, a specific site from my Core Collection.
- Should
I search the Web in the first place?
|
2. Understanding Search Engines and Subject Directories
- Barker,
Joe. UC Berkeley - Teaching Library Internet Workshops Berkeley - Types of
Search Tools (2001). http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/ToolsTables.html.
- Basch,
Reva and Mary Ellen Bates. Researching Online for Dummies, 2nd
edition. Chicago, IL: IDG Books Worldwide. 2000.
- Cohen,
Laura B. "The Web as a Research Tool: Teaching Strategies for Instructors."
Choice Supplement to vol. 36 (1999):19-44.
- Cohen,
Laura B. "Finding it All on the Web: Search Tools and Strategies." Choice
Special Issue to vol. 38 (2001):13-27.
- Cohen,
Laura. "Internet Tutorials - Checklist of Internet Research Tips" at the
University at Albany Libraries (2002). http://library.albany.edu/internet/checklist.html.
- Schlein,
Alan M. Find It Online: The Complete Guide to Online Research, 2nd
edition. Tempe, AZ: Facts on Demand Press, 2000.
The two main
types of general-purpose Web search tools are search engines and subject directories.
In form they can look almost identical, apparently offering very similar search
capabilities via similar interfaces; however, in function they are distinctly
different. Understanding how search engines and subject directories work and when
to use them is an essential step to becoming an efficient and effective Web searcher.
Human-centered
VS Machine-centered Services (sources 3, 5, 6)
What is a
subject directory? - A subject directory is a database of titles and URLs
of Web sites, compiled and organized into subject categories by humans. Directories
are usually searchable and browsable, and entries may be annotated. Selectivity,
evaluation, and quality of annotations vary between directories. Examples are
Yahoo! http://www.yahoo.com, The Open
Directory Project http://dmoz.org/, and
Infomine http://infomine.ucr.edu.
There are
two main types of subject directories. Cohen emphasizes that they have very
different purposes and should be approached accordingly.
- Academic
and professional directories - are often created by librarians or subject
experts to support the needs of researchers. Example: Infomine http://infomine.ucr.edu/
- Commercial
portals - are created by commercial companies to generate income and support
the needs of the general public. Example: Yahoo! http://www.yahoo.com
What is a
search engine? - A search engine is a searchable database of words from Web
pages compiled by a computer program called a crawler, robot, or spider. The spider
program travels the Web by following links, and gathering words from individual
pages it finds. These are collected in a giant index that can then be queried.
Results are ranked by relevancy algorithms. Every search engine's method of searching
is proprietary and the depth, breadth, and scope of its database is unique. Examples
are Google http://www.google.com, AltaVista
http://www.altavista.com, and AllTheWeb
http://www.alltheweb.com.
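The crawl-and-index process described above can be illustrated with a minimal sketch. It assumes a tiny in-memory "Web" (the `PAGES` dictionary, with made-up URLs and text) instead of real network access, and it ranks results by simple word counts, which only stands in for the proprietary relevancy algorithms real engines use. All function names here are hypothetical.

```python
import re
from collections import defaultdict, deque

# Hypothetical miniature Web: URL -> (page text, outgoing links).
PAGES = {
    "a.example/home": ("library search tools and guides",
                       ["a.example/engines", "a.example/dirs"]),
    "a.example/engines": ("search engines index words from web pages",
                          ["a.example/home"]),
    "a.example/dirs": ("subject directories are compiled by humans", []),
    "a.example/orphan": ("search tips nobody links to", []),  # no inbound links
}

def crawl(seed):
    """Follow links breadth-first from seed; return the URLs visited, in order."""
    seen, queue, order = {seed}, deque([seed]), []
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in PAGES[url][1]:
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)
    return order

def build_index(urls):
    """Map each word to {url: number of occurrences on that page}."""
    index = defaultdict(dict)
    for url in urls:
        for word in re.findall(r"[a-z]+", PAGES[url][0].lower()):
            index[word][url] = index[word].get(url, 0) + 1
    return index

def search(index, query):
    """Return URLs containing every query word, highest total count first."""
    words = query.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], {}))
    for word in words[1:]:
        hits &= set(index.get(word, {}))
    def score(url):
        return sum(index[word][url] for word in words)
    return sorted(hits, key=lambda url: (-score(url), url))

crawled = crawl("a.example/home")
index = build_index(crawled)
print(crawled)
print(search(index, "search"))
```

Because the spider only discovers pages by following links, the page that nothing links to is never crawled, so its words never enter the index even though they would match the query.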
Subject
Directories VS Search Engines (sources 1, 2, 3, 4, 5)
|
Subject
Directories
|
Search
Engines
|
- Human-compiled
index of information about Web sites
- Organized
into subject categories
- Search
terms are matched in category names, site names, and site annotations,
NOT in the full-text of individual pages
- Smaller
than search engines
- Contents
MAY be carefully evaluated and annotated -- but not always, especially
in the case of large commercial portals
|
- Computer-compiled
index of words from individual Web pages
- NOT
organized into subject categories; results are ranked by computer algorithm
- Search
terms are matched in the actual text of Web pages; in many engines every
word in a page may be indexed
- Much
larger than subject directories; include more sites, and index them
much more deeply
- Contents
are unevaluated by humans
|
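The difference in what each tool matches against can be sketched in a few lines. The site record and both functions below are hypothetical illustrations, not any real directory's or engine's implementation.

```python
# Hypothetical record for one site, holding what each kind of tool might store.
SITE = {
    "category": "Science > Astronomy",
    "title": "Night Sky Observatory",
    "annotation": "Observing guides for amateur stargazers.",
    "full_text": "Our telescope rental desk opens at dusk. Bring warm clothing.",
}

def directory_match(site, term):
    """Directory-style matching: category name, site name, and annotation only."""
    fields = (site["category"], site["title"], site["annotation"])
    return any(term.lower() in field.lower() for field in fields)

def engine_match(site, term):
    """Engine-style matching: the words in the text of the page itself."""
    return term.lower() in site["full_text"].lower()

for term in ("astronomy", "telescope"):
    print(term, directory_match(SITE, term), engine_match(SITE, term))
```

In this sketch a broad subject term is found only through the human-assigned category, while a specific word that appears in the page body is found only by full-text matching.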
Use a subject
directory:
- When you
are researching a broad topic, or want just to get a sense of what is available
on a topic
- When you
want to find sites that are authoritative and/or substantively about a topic,
especially for topics that are widely covered on the Web
- When you
want to find sites that are often recommended and annotated by experts (remembering
that evaluation and selectivity vary between directories)
- When you
want to avoid searching the full-text of individual Web pages.
Use a search
engine:
- When you
are researching a more targeted, obscure or complex topic, and for some general
queries
- When your
topic can be expressed in distinctive and meaningful search terms
- When you
need to search the full text of individual Web pages
- When you
want to gather a great deal of information on your topic
- When you
want to take advantage of second-generation retrieval technologies such as
concept clustering, ranking by popularity, link ranking, etc.
Converging
Content: Engines with Directories and Directories with Engines
Most major
search services have formed alliances to provide both engine-type search results
and directory-type search results. This is why we see Web page search results
in Yahoo!, and can browse a directory in Google. Google owns and maintains the
engine database only; it gets its directory from The Open Directory Project.
Similarly, Yahoo! owns and develops its directory; currently it gets its engine
results from Google.
Searching
and Selecting Tips
- Select
a directory appropriate for your needs. Academic/professional and commercial
directories can have very different purposes and should be approached accordingly.
- Learn
'search engine math' and other advanced features of your favorite search
engine. The ability to specify how you want the system to treat your search
terms can make the difference between finding what you are looking for, and
being overwhelmed with irrelevant hits.
- Learn
Boolean operator equivalents, truncation,
etc. Investigate the correct syntax for your engine of choice, for example:
- Phrases
- enclose terms in quotation marks to search them as a phrase (works in
most systems)
- Excluding
terms - use "-" in front of a term to exclude it in AltaVista;
- Synonyms
- use "OR" in Google to search synonyms or word variants;
- Field
searching - use field searching to restrict terms to title, URL,
domain, etc. For example, in Google use "link:" in front of a URL to find
pages that link to that page (link:www.prairienet.org). In AltaVista use
"domain:" to restrict results to a particular domain (domain:gov).
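The operators listed above can also be assembled programmatically. The sketch below is a hypothetical helper (`build_query` is not part of any search engine's API) that only builds a query string from the syntax mentioned in this section. Whether a single engine accepts all of these operators together is an assumption, since syntax differs between engines and changes over time, so check your engine's help pages first.

```python
def build_query(phrases=(), any_of=(), exclude=(), field=None):
    """Assemble a query string from the operators described above.

    phrases -- terms searched as exact phrases (wrapped in quotation marks)
    any_of  -- synonyms or word variants joined with OR
    exclude -- terms prefixed with "-" so they are excluded
    field   -- optional (name, value) pair such as ("domain", "gov")
    """
    parts = [f'"{phrase}"' for phrase in phrases]
    if any_of:
        parts.append(" OR ".join(any_of))
    parts.extend(f"-{term}" for term in exclude)
    if field:
        name, value = field
        parts.append(f"{name}:{value}")
    return " ".join(parts)

print(build_query(phrases=["deep web"], any_of=["tutorial", "guide"], exclude=["game"]))
print(build_query(field=("link", "www.prairienet.org")))
```

Keeping each operator as a separate argument makes it easy to adjust one part of a search (for example, adding an excluded term) without retyping the whole query.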
Keeping
Current
To quickly learn
the features of your favorites, or find out about the latest new engines and directories,
use sites such as the following. Greg Notess, Danny Sullivan and Stephen Bell
do the work for you with news, features charts, reviews and search strategies.
Daily Briefings
Added 1/9/03 - To get daily news about search engines,
subscribe to the following:
Other Web Guides to Selecting Directories and Engines
3. Beyond Search Engines and Subject Directories: Specialized Tools and the Deep Web
- Cohen,
Laura. "Internet Tutorials - The Deep Web" at the University at Albany Libraries
(2002). http://library.albany.edu/internet/deepweb.html
- King,
David. "Specialized Search Engines." Online 24(3) (May/June 2000): 67-74.
- Price,
Gary and Chris Sherman. "Exploring the Invisible Web: 7 essential strategies."
Online 25(4) (July/August 2001): 32-34.
- Sherman,
Chris and Gary Price. "The Invisible Web." Searcher 9(6) (June 2001): 62-74.
- Smith,
C. Brian. "Getting to Know the Invisible Web." Library Journal Netconnect
(Summer 2001): 16-18.
- Snow,
Bonnie. "The Internet's Hidden Content and How to Find It." Online 24(3) (May/June
2000): 61-66.
Don't Rely
Only on General-Purpose Tools - Searchers should not rely exclusively on direct searches in general-purpose
search engines and subject directories. These searches may...
- Be less
efficient and effective. Using specialized tools and strategies can find some
information more quickly and/or in more authoritative resources.
- Miss "Deep
Web" content. A vast amount of high quality Web content is not included in
general-purpose engines and directories. This content is often referred to
as the "Invisible Web" or "Deep Web."
What
is the Invisible Web/Deep Web? (sources 1, 3, 4, 5)
The Deep Web
consists of text pages, files, or other Web content, often high-quality and authoritative,
that general-purpose search engines cannot, due to technical limitations, or will
not, due to deliberate choice, add to their databases. It is sometimes also referred
to as the "Invisible Web" or "dark matter." (Sherman and Price, 2001.)
Content may
be excluded for two main types of reasons.
- Economic
Reasons: Web content is expensive to gather and update. Search engines
may choose to exclude certain content.
- Non-HTML
content, such as PDF, Flash, and Office documents, tends to be costly
to index, although Google and AltaVista both now index PDFs.
- Ephemeral
content, such as stock quotes and airline flight arrival information,
becomes dated quickly and is expensive to keep currently indexed.
- Engines
may also conserve resources by not deeply crawling all pages.
- Technical
Reasons: Search engines cannot understand, find, or access all Web content.
- Database
content: Databases are the largest and most important type of Deep Web
content. Engines can index the interface or gateway page, but not the content
of the database behind it, which can only be accessed by a specific query;
engines can only index static Web pages.
- Content
in non-textual file types (image, audio, video files): engines are designed
to index text. Access to non-textual data is improving, but generally relies
heavily on processing textual clues.
- Disconnected
pages: spiders rely on following links to 'crawl' the Web, and cannot
find pages if no other sites link to them.
- "No
crawl zones": Web page creators can use several methods to deny access
to pages by search engine spiders if they do not want their content to be
indexed.
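The "no crawl zones" item above can be made concrete. One widely used method is a `robots.txt` file (the Robots Exclusion Protocol), which well-behaved spiders consult before fetching pages. The sketch below uses Python's standard `urllib.robotparser` module to show how a crawler might check such rules. The rules, the `ExampleBot` name, and the URLs are made up for illustration, and nothing is fetched over the network.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an imaginary site.
RULES = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())  # parse the rules directly; no network access

# A compliant spider checks each URL before requesting it.
for url in ("http://www.example.com/index.html",
            "http://www.example.com/private/report.html"):
    print(url, parser.can_fetch("ExampleBot", url))
```

Pages under the disallowed path stay out of a compliant engine's index, which is one reason otherwise public content can end up in the Deep Web.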
Vast Amounts
of Deep Web Content - There are vast amounts of authoritative and current
information that you simply cannot access using a search engine like Google or
AltaVista. (Sherman and Price, 2001.) Examples of databases whose contents are
not accessible via search engines are
Searching
the Deep Web: Strategies (source 3)
Price and Sherman
offer the following practical strategies for successfully navigating the Web's
hidden territories.
- Be
curious and acquisitive. Don't be a passive user of information-seeking
tools. Think like a hunter or a passionate collector. Always look out for
useful resources discussed in the media or mentioned by colleagues. Investigate
interesting sites you hear or read about.
- Use
search engines to find sources, not content. Database content is hidden
from engines, but the 'front doors' to databases are not. Use engines to search
for databases on your topic. For example, combine the word "database" with
keywords for the subject you are searching.
- Use
Invisible Web pathfinders. These pathfinders list sites with specialized
databases. For example:
- Datamine
your bookmark collection. Is there a useful site you already know about
that might contain a database? Invisible Web resources are often hidden within
larger sites; search the site specifically to find this type of content.
- Keep
informed about new Web resources in your field. Monitor discussion lists
for your subject areas. For example:
- Subscribe
to "what's new" lists. For example:
- Don't
overlook printed sources. Read Web site reviews in books, magazines, journals,
trade publications.
Specialized
Directories and Engines (source 2)
A specialized
search engine focuses on a specific subject, geographic region, or computer file
format. It indexes fewer pages, but these pages are more likely to be on-topic.
Often the contents are further weeded by human subject specialists, who gather,
rank and annotate pages.
Using a
specialized search engine...
- Can
save time. You may find what you need more quickly in a subject-specific
database.
- Gives
access to evaluated, annotated sites. Entries are often manually selected
for relevancy and authority, and annotated.
Examples of
Specialized Engines