University of Illinois at Urbana-Champaign

Graduate School of Library and Information Science
Library and Information Science Library

 

Current LIS Clips: A current awareness service for the Library and Information community

Web Searching

December, 2002 - Compiled, annotated and supplemented by Janet Eke
Updated 1/9/03 (see items marked "Added 1/9/03" below)

In This Issue:

Becoming an Efficient and Effective Web Searcher

  1. An approach to efficient and effective Web searching
  2. Understanding search engines and subject directories
  3. Beyond engines and directories: specialized tools and the Deep Web
  4. Formulating and formalizing Web searching strategies
  5. Evaluating Web resources
  6. More Web topics in future issues

1. An Approach to Efficient and Effective Web Searching

  1. Based on workshops developed by Janet Eke, Pat Barlosky, and Cindy Kehoe for the Ontario Library Association, the National Technological University, and the Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign (1999-2002).
  2. Cohen, Laura B. "The Web as a Research Tool: Teaching Strategies for Instructors." Choice Supplement to vol. 36 (1999):19-44.

The Web is difficult to search. Though the Web's hypertext point-and-click environment is simple, conducting research on the Web is not. Its chaotic architecture of linking lends itself well to surfing but not to efficient information finding. (Cohen, 1999.)

Information professionals must be good Web searchers. The Web is an information resource that cannot be ignored. We must know how to locate high quality Web content quickly and accurately.

We already have useful skills. Though some of the skills and knowledge needed to search effectively are unique to the Web environment, many others grow out of our existing understanding of the information structures of traditional print and online collections. We can apply what we already know to the Web.
Three Steps to Efficient and Effective Web Searching

The chart below outlines three steps, the "A, B, C's" of effective Web searching. The sections that follow discuss each component in more detail, drawing on the library literature.
 

  • A. Know search engines and subject directories.
    • Learn what they are and how they work.
    • Learn the advanced search features of your favorites.
    • Know where to go to keep current on changing features and new services.
  • B. Know specialized search tools. Build a "Core Collection" of Web sites beyond general-purpose search tools like Google or Yahoo!
    • Specialized tools may take you more directly to an authoritative source.
    • Specialized tools may help you locate valuable content not indexed by search engines (the Deep Web).
    • Know good sources for general types of information, such as general reference sites and news sites.
    • Know good sources for specific subject areas you search often.
    • Know gateway sites to specialized databases, such as Deep Web pathfinders and specialized directories.
  • C. Formulate and formalize Web searching strategies. Analyze Web topics before you search, and consciously apply old and new search strategies to the Web.
    • Approaches from searching traditional collections and Information Retrieval systems can be applied to the Web; other strategies are Web-specific.
    • Analysis can make the difference between Web searching success and failure.

Example: before you search, ask yourself:

  • Is there a type of source that will be useful for this topic? For example: an encyclopedia entry, a news article, a business directory entry.
  • Who might be interested in or responsible for gathering information about this topic? For example: a trade association, a government agency, a research institution.
  • What is the best type of Web search tool to begin with? For example: a search engine, a subject directory, a specific site from my Core Collection.
  • Should I search the Web in the first place?

2. Understanding Search Engines and Subject Directories 

  1. Barker, Joe. "Types of Search Tools." UC Berkeley Teaching Library Internet Workshops (2001). http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/ToolsTables.html
  2. Basch, Reva and Mary Ellen Bates. Researching Online for Dummies, 2nd edition. Chicago, IL: IDG Books Worldwide, 2000.
  3. Cohen, Laura B. "The Web as a Research Tool: Teaching Strategies for Instructors." Choice Supplement to vol. 36 (1999):19-44. 
  4. Cohen, Laura B. "Finding it All on the Web: Search Tools and Strategies." Choice Special Issue to vol. 38 (2001):13-27. 
  5. Cohen, Laura. "Internet Tutorials - Checklist of Internet Research Tips" at the University at Albany Libraries (2002). http://library.albany.edu/internet/checklist.html
  6. Schlein, Alan M. Find It Online: The Complete Guide to Online Research, 2nd edition. Tempe, AZ: Facts on Demand Press, 2000.

The two main types of general-purpose Web search tools are search engines and subject directories. In form they can look almost identical, apparently offering very similar search capabilities via similar interfaces; however, in function they are distinctly different. Understanding how search engines and subject directories work and when to use them is an essential step to becoming an efficient and effective Web searcher.

Human-centered VS Machine-centered Services (sources 3, 5, 6)

What is a subject directory? - A subject directory is a database of titles and URLs of Web sites, compiled and organized into subject categories by humans. Directories are usually searchable and browsable, and entries may be annotated. Selectivity, evaluation, and quality of annotations vary between directories. Examples are Yahoo! http://www.yahoo.com, The Open Directory Project http://dmoz.org/, and Infomine http://infomine.ucr.edu.

There are two main types of subject directories. Cohen emphasizes that they have very different purposes and should be approached accordingly.

  • Academic and professional directories - are often created by librarians or subject experts to support the needs of researchers. Example: Infomine http://infomine.ucr.edu/
  • Commercial portals - are created by commercial companies to generate income and support the needs of the general public. Example: Yahoo! http://www.yahoo.com

What is a search engine? - A search engine is a searchable database of words from Web pages, compiled by a computer program called a crawler, robot, or spider. The spider travels the Web by following links and gathering words from the individual pages it finds. These words are collected in a giant index that can then be queried, and results are ranked by relevancy algorithms. Every search engine's method of searching is proprietary, and the depth, breadth, and scope of its database are unique. Examples are Google http://www.google.com, AltaVista http://www.altavista.com, and AllTheWeb http://www.alltheweb.com.
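For readers who want to see the crawl, index, and query cycle in miniature, here is a short Python sketch. It is not how any real engine works internally: it assumes a hypothetical three-page "Web" held in a dictionary instead of live pages, and it ranks results by simply counting matching query words, where real engines use their own proprietary relevancy algorithms.

```python
from collections import defaultdict, deque

# Hypothetical toy Web: URL -> (page text, list of outgoing links).
TOY_WEB = {
    "a.example/": ("library catalog search tips", ["b.example/", "c.example/"]),
    "b.example/": ("web search engine tips and tricks", ["c.example/"]),
    "c.example/": ("subject directory of library web sites", ["a.example/"]),
}

def crawl(web, seed):
    """Breadth-first 'spider': visit pages by following links from a seed URL."""
    seen, queue = {seed}, deque([seed])
    while queue:
        url = queue.popleft()
        yield url
        for link in web[url][1]:
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)

def build_index(web, seed):
    """Inverted index: each word maps to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url in crawl(web, seed):
        for word in web[url][0].lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Rank URLs by how many query words they contain (a deliberately naive score)."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for url in index.get(word, ()):
            scores[url] += 1
    return sorted(scores, key=lambda u: (-scores[u], u))

index = build_index(TOY_WEB, "a.example/")
print(search(index, "web tips"))  # ['b.example/', 'a.example/', 'c.example/']
```

Because every word of every crawled page goes into the index, a query can match text anywhere on a page; that full-text matching is what separates an engine from a subject directory.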

Subject Directories VS Search Engines (sources 1, 2, 3, 4, 5)

Subject Directories

  • Human-compiled index of information about Web sites
  • Organized into subject categories
  • Search terms are matched in category names, site names, and site annotations, NOT in the full text of individual pages
  • Smaller than search engines
  • Contents MAY be carefully evaluated and annotated, but not always, especially in the case of large commercial portals

Search Engines

  • Computer-compiled index of words from individual Web pages
  • NOT organized into subject categories; results are ranked by computer algorithm
  • Search terms are matched in the actual text of Web pages; in many engines every word in a page may be indexed
  • Much larger than subject directories; include more sites, and index them much more deeply
  • Contents are not evaluated by humans
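The first point about directories, that search terms are matched only in what the human compilers wrote about each site, can be sketched in a few lines of Python. The entries below (categories, titles, URLs and annotations) are hypothetical, and simple substring matching stands in for whatever matching a real directory uses.

```python
from dataclasses import dataclass

@dataclass
class DirectoryEntry:
    """One human-compiled directory record (hypothetical fields)."""
    category: str
    title: str
    url: str
    annotation: str

# Hypothetical entries; the full text of the listed pages is NOT stored.
DIRECTORY = [
    DirectoryEntry("Health > Diseases > Influenza", "Flu Information Center",
                   "www.example.gov/flu", "Government facts on seasonal flu and vaccines."),
    DirectoryEntry("Recreation > Pets > Cats", "Cat Care Basics",
                   "www.example.org/cats", "Advice for new cat owners."),
]

def directory_search(entries, term):
    """Match a term against category names, site titles and annotations only."""
    term = term.lower()
    return [
        e.url for e in entries
        if term in f"{e.category} {e.title} {e.annotation}".lower()
    ]

print(directory_search(DIRECTORY, "influenza"))  # ['www.example.gov/flu'], matched in the category
print(directory_search(DIRECTORY, "litter"))     # [] even if the cat page itself discusses litter
```

A word that appears only in the body of a listed site is invisible to a search like this, which is why a directory is the better choice for broad topics and an engine for distinctive terms.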

Use a subject directory:

  • When you are researching a broad topic, or just want to get a sense of what is available on a topic
  • When you want to find sites that are authoritative and/or substantively about a topic, especially for topics that are widely covered on the Web
  • When you want to find sites that are often recommended and annotated by experts (remembering that evaluation and selectivity vary between directories)
  • When you want to avoid searching the full text of individual Web pages

Use a search engine:

  • When you are researching a more targeted, obscure or complex topic, and for some general queries
  • When your topic can be expressed in distinctive and meaningful search terms
  • When you need to search the full text of individual Web pages
  • When you want to gather a great deal of information on your topic
  • When you want to take advantage of second-generation retrieval technologies such as concept clustering, ranking by popularity, link ranking, etc.
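Link ranking can be illustrated with a simplified PageRank-style calculation, one well-known link-analysis method. This Python sketch runs over a hypothetical four-page link graph and is not the actual ranking method of any engine named in this issue. The idea: a page scores higher when many pages, especially highly scored ones, link to it. The damping value of 0.85 is an assumption borrowed from common descriptions of PageRank.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Simplified PageRank-style link scores.

    `links` maps each page to the list of pages it links to. Each round, a page
    passes the `damping` share of its score evenly to the pages it links to;
    the remaining share is spread evenly over all pages.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            targets = [t for t in outlinks if t in links]
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no usable outgoing links shares its score with every page.
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank

# Hypothetical link graph: "hub" is linked to by three of the four pages.
graph = {
    "hub": ["a"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "a"],
}
scores = link_rank(graph)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

The scores always sum to 1, so they can be read as each page's relative share of link "votes."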

Converging Content: Engines with Directories and Directories with Engines

Most major search services have formed alliances to provide both engine-type search results and directory-type search results. This is why we see Web page search results in Yahoo!, and can browse a directory in Google. Google owns and maintains the engine database only; it gets its directory from The Open Directory Project. Similarly, Yahoo! owns and develops its directory; currently it gets its engine results from Google.

Searching and Selecting Tips

  • Select a directory appropriate for your needs. Academic/professional and commercial directories can have very different purposes and should be approached accordingly.
  • Learn 'search engine math' and other advanced features of your favorite search engine. The ability to specify how you want the system to treat your search terms can make the difference between finding what you are looking for, and being overwhelmed with irrelevant hits.
  • Learn Boolean operator equivalents, truncation, etc. Investigate the correct syntax for your engine of choice, for example: 
    • Phrases - enclose terms in quotation marks to search them as a phrase (works in most systems).
    • Excluding terms - in AltaVista, put "-" in front of a term to exclude it.
    • Synonyms - in Google, use "OR" between synonyms or word variants.
    • Field searching - restrict terms to the title, URL, domain, etc. For example, in Google put "link:" in front of a URL to find pages that link to that page (link:www.prairienet.org); in AltaVista use "domain:" to restrict results to a particular domain (domain:gov).
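To see how these operators change what a search returns, they can be tried on a toy collection. This Python sketch is a simplified, hypothetical query evaluator: it borrows quotation marks for phrases, a leading "-" for exclusion, "OR" for alternatives, and AltaVista-style "domain:" for field restriction, but it is not the actual parser of Google, AltaVista or any other engine, and the pages are invented. Terms without an operator are treated as required (ANDed).

```python
import re

# Hypothetical mini-collection: URL -> page text.
PAGES = {
    "www.example.gov/flu": "seasonal influenza vaccine information for adults",
    "www.example.com/flu": "flu shot myths and the vaccine debate",
    "www.example.org/cats": "caring for older cats",
}

# A token is a quoted phrase (optionally preceded by "-") or any run of non-spaces.
TOKEN = re.compile(r'-?"[^"]+"|\S+')

def term_in(term, text):
    """True if a word, or a quoted phrase as an exact word sequence, occurs in text."""
    words = text.split()
    phrase = term.strip('"').split()
    if not phrase:
        return False
    n = len(phrase)
    return any(words[i:i + n] == phrase for i in range(len(words) - n + 1))

def matches(query, url, text):
    """Evaluate phrases, -exclusions, OR, and domain: restrictions; other terms are ANDed."""
    required, excluded, domains = [], [], []  # required holds OR-groups of alternatives
    tokens = TOKEN.findall(query)
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.lower().startswith("domain:"):
            domains.append(tok[len("domain:"):].lower())
        elif tok.startswith("-"):
            excluded.append(tok[1:].lower())
        elif tok == "OR" and required and i + 1 < len(tokens):
            i += 1
            required[-1].append(tokens[i].lower())  # alternative for the previous term
        else:
            required.append([tok.lower()])
        i += 1
    text = text.lower()
    host = url.split("/")[0].lower()
    return (
        all(any(term_in(t, text) for t in group) for group in required)
        and not any(term_in(t, text) for t in excluded)
        and all(host == d or host.endswith("." + d) for d in domains)
    )

def search(query, pages=PAGES):
    return [url for url, text in pages.items() if matches(query, url, text)]

print(search('vaccine -myths'))           # ['www.example.gov/flu']
print(search('"flu shot" OR influenza'))  # ['www.example.gov/flu', 'www.example.com/flu']
print(search('vaccine domain:gov'))       # ['www.example.gov/flu']
```

Each operator narrows or widens the result set in a predictable way, which is exactly why learning your engine's real syntax pays off.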

Keeping Current

To quickly learn the features of your favorites, or find out about the latest new engines and directories, use sites such as the following. Greg Notess, Danny Sullivan and Stephen Bell do the work for you with news, feature charts, reviews and search strategies.

Daily Briefings

Added 1/9/03 - To get daily news about search engines, subscribe to the following:

Other Web Guides to Selecting Directories and Engines


3. Beyond Search Engines and Subject Directories: Specialized Tools and the Deep Web 

  1. Cohen, Laura. "Internet Tutorials - The Deep Web" at the University at Albany Libraries (2002). http://library.albany.edu/internet/deepweb.html
  2. King, David. "Specialized Search Engines." Online 24(3) (May/June 2000): 67-74. 
  3. Price, Gary and Chris Sherman. "Exploring the Invisible Web: 7 essential strategies." Online 25(4) (July/August 2001): 32-34. 
  4. Sherman, Chris and Gary Price. "The Invisible Web." Searcher 9(6) (June 2001): 62-74.
  5. Smith, C. Brian. "Getting to Know the Invisible Web." Library Journal Netconnect (Summer 2001): 16-18. 
  6. Snow, Bonnie. "The Internet's Hidden Content and How to Find It." Online 24(3) (May/June 2000): 61-66.

Don't Rely Only - Searchers should not rely exclusively on direct searches in general-purpose search engines and subject directories. These searches may...

  • Be less efficient and effective. Using specialized tools and strategies can find some information more quickly and/or in more authoritative resources.
  • Miss "Deep Web" content. A vast amount of high quality Web content is not included in general-purpose engines and directories. This content is often referred to as the "Invisible Web" or "Deep Web."

What is the Invisible Web/Deep Web? (sources 1, 3, 4, 5)

The Deep Web consists of text pages, files, or other often high-quality, authoritative Web content that general-purpose search engines cannot (due to technical limitations) or will not (due to deliberate choice) add to their databases. It is sometimes also referred to as the "Invisible Web" or "dark matter." (Sherman and Price, 2001.)

Content may be excluded for two main types of reasons.

  • Economic Reasons: Web content is expensive to gather and update. Search engines may choose to exclude certain content.
    • Non-HTML content, such as PDF, Flash, and Office documents, tends to be costly to index, although Google and AltaVista both now index PDFs.
    • Ephemeral content, such as stock quotes and airline flight arrival information, becomes dated quickly and is expensive to keep current in an index.
    • Engines may also conserve resources by not deeply crawling all pages.
  • Technical Reasons: Search engines cannot understand, find, or access all Web content.
    • Database content: Databases are the largest and most important type of Deep Web content. Engines can index the interface or gateway page, but not the content of the database behind it, which can only be accessed by a specific query; engines can only index static Web pages. 
    • Content in non-textual file types (image, audio, video files): engines are designed to index text. Access to non-textual data is improving, but generally relies heavily on processing textual clues. 
    • Disconnected pages: spiders rely on following links to 'crawl' the Web, and cannot find pages if no other sites link to them.
    • "No crawl zones": Web page creators can use several methods to deny access to pages by search engine spiders if they do not want their content to be indexed.
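Two of the technical reasons above, disconnected pages and "no crawl zones," can be seen in a small simulation. This Python sketch assumes a hypothetical five-page site and a robots.txt file. The Robots Exclusion Protocol is one common way site owners ask spiders to stay out; it is a request that well-behaved spiders honor, not a lock. The sketch uses Python's standard urllib.robotparser module to decide which pages the spider may fetch; the site, its paths and the agent name "ToySpider" are invented.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical site: path -> outgoing links. Nothing links to "/orphan.html".
SITE = {
    "/": ["/about.html", "/private/report.html"],
    "/about.html": ["/"],
    "/private/report.html": ["/private/data.html"],
    "/private/data.html": [],
    "/orphan.html": [],
}

ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def polite_crawl(site, robots_lines, seed="/", agent="ToySpider"):
    """Follow links from the seed, skipping any page robots.txt disallows."""
    rp = RobotFileParser()
    rp.parse(robots_lines)
    seen, queue, indexed = {seed}, deque([seed]), []
    while queue:
        path = queue.popleft()
        if not rp.can_fetch(agent, "http://site.example" + path):
            continue  # a "no crawl zone": its links are never followed either
        indexed.append(path)
        for link in site.get(path, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return indexed

indexed = polite_crawl(SITE, ROBOTS_TXT.splitlines())
print(indexed)                           # ['/', '/about.html']
print(sorted(set(SITE) - set(indexed)))  # Deep Web pages, from this spider's point of view
```

Three of the five pages never reach the index: two because the site asked spiders to stay out of /private/, and one simply because no page links to it.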

Vast Amounts of Deep Web Content - There are vast amounts of authoritative and current information that you simply cannot access using a search engine like Google or AltaVista. (Sherman and Price, 2001.) Examples of databases whose contents are not accessible via search engines are

Searching the Deep Web: Strategies (source 3)

Price and Sherman offer the following practical strategies for successfully navigating the Web's hidden territories.

  • Be curious and acquisitive. Don't be a passive user of information-seeking tools. Think like a hunter or a passionate collector. Always look out for useful resources discussed in the media or mentioned by colleagues. Investigate interesting sites you hear or read about.
  • Use search engines to find sources, not content. Database content is hidden from engines, but the 'front doors' to databases are not. Use engines to search for databases on your topic. For example, combine the word "database" with keywords for the subject you are searching.
  • Use Invisible Web pathfinders. These pathfinders list sites with specialized databases. For example: 
  • Datamine your bookmark collection. Is there a useful site you already know about that might contain a database? Invisible Web resources are often hidden within larger sites; search the site specifically to find this type of content. 
  • Keep informed about new Web resources in your field. Monitor discussion lists for your subject areas. For example:
  • Subscribe to "what's new" lists. For example:
  • Don't overlook printed sources. Read Web site reviews in books, magazines, journals, trade publications.

Specialized Directories and Engines (source 2)

A specialized search engine focuses on a specific subject, geographic region, or computer file format. It indexes fewer pages, but these pages are more likely to be on-topic. Often the contents are further weeded by human subject specialists, who gather, rank and annotate pages.

Using a specialized search engine:

  • Can save time. You may find what you need more quickly in a subject-specific database.
  • Gives access to evaluated, annotated sites. Entries are often manually selected for relevancy and authority, and annotated. 

Examples of Specialized Engines

Examples of Specialized Directories

Web Guides to Deep Web Gateways and Specialized Search Sites


4. Formulating and Formalizing Web Searching Strategies 

  1. Basch, Reva and Mary Ellen Bates. Researching Online for Dummies, 2nd edition. Chicago, IL: IDG Books Worldwide, 2000.
  2. Cohen, Laura. "Internet Tutorials" at the University at Albany Libraries (2002). http://library.albany.edu/internet/
  3. Drabenstott, Karen M. "Web Search Strategy." Online 25(4) (July/August 2001): 19-27.
  4. Paul, Nora in Alan M. Schlein, Find It Online: The Complete Guide to Online Research, 2nd edition. Tempe, AZ: Facts on Demand Press, 2000.

Reference Interview Yourself - Don't jump right in. Basch and Bates emphasize the importance of beginning a search by reference-interviewing yourself. Formulating a search strategy is critical to finding what you are looking for, and can save you hours of work. 

Formulating a Strategy 

Nora Paul recommends asking yourself specific questions about your topic and your resources.

  • WHAT are you looking for? Think about what you really want. Putting your question into words forces you to clearly define what you are searching for and why.
    • What is the angle? Why are you doing the research: surveying a broad topic, pinpointing a fact, filling in a gap in your knowledge, getting up to date on a topic, making a decision?
    • How much information do you need: a few good articles, everything possible, just the specific fact?
    • What kind of information do you need: statistics, sources, background?
    • What format will be useful: articles, reports, referrals to a person, public records?
    • What would your ideal answer look like? 
  • HOW can you describe the topic? 
    • What are the major concepts of the search? 
    • What search words could represent each concept? What are their synonyms? 
    • Are these terms likely to appear out of context? Will you need to allow for plurals and other variants of search terms?
  • WHERE should you start looking? 
    • Who might collect this information: an association, a government agency, a research center, a company?
    • Where are you most comfortable searching? Which service is most familiar to you?
    • Which service has the search features you will need?
    • Should you search the Web? Is there a non-Web source that would be a more logical starting point?
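One way to formalize the WHAT and HOW questions is to write down each concept with its synonyms and variants, then combine them mechanically: synonyms for one concept joined with OR, separate concepts joined with AND. This Python sketch assumes a hypothetical topic and produces a generic Boolean string with parentheses. Engines differ in whether they accept parentheses or an explicit AND, so treat the output as a starting point to adapt to your engine's syntax.

```python
def build_query(concepts):
    """Combine concepts with AND and the synonyms within each concept with OR.

    `concepts` is a list of lists of search words, one inner list per concept.
    Multi-word terms are quoted so they are searched as phrases.
    """
    groups = []
    for synonyms in concepts:
        terms = [f'"{t}"' if " " in t else t for t in synonyms]
        group = " OR ".join(terms)
        groups.append(f"({group})" if len(terms) > 1 else group)
    return " AND ".join(groups)

# Hypothetical topic: teenagers' use of public libraries.
query = build_query([
    ["teenagers", "teens", "young adults"],   # concept 1 and its synonyms
    ["public library", "public libraries"],   # concept 2, with a plural variant
])
print(query)
# (teenagers OR teens OR "young adults") AND ("public library" OR "public libraries")
```

Writing the concepts out this way also makes gaps visible: a concept with only one term is a prompt to think of more synonyms or variants before you search.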

Selected Resources for Web Strategies


5. Evaluating Web Resources 

Kirk, Elizabeth. "Evaluating Information Found on the Internet." The Sheridan Libraries of the Johns Hopkins University. 2002. http://www.library.jhu.edu/researchhelp/general/evaluating/index.html

The system of evaluation by scholars, publishers and librarians that exists in traditional information collections does not exist on the Web. Anyone can write a Web page; excellent resources live side-by-side with the highly dubious. This means resources found on the Web must be carefully evaluated by the user. 

Evaluation Criteria - Elizabeth Kirk shows how criteria used to evaluate print information can be applied to Web content. Criteria and sample questions to consider are excerpted below.

  1. Authorship
    • Who wrote this? What is his/her/their authority on the topic?
    • Do you recognize the author? Is biographical information, institutional affiliation, or contact information given?
  2. Publishing body
    • Who owns the server where the document resides?
    • Does the document reside in a personal account, or as part of an official Web site? What is the relationship between the author and publisher/server?
    • Look for organization information on the document or in related directories; is the organization recognized in the field, or suitable to address the topic?
  3. Point of view or bias
    • Does the document reside on the server of an organization that has a clear stake in the issue at hand, or that has a political or philosophical agenda?
  4. Referral to and/or knowledge of the literature
    • Does the document have a bibliography?
    • Does the author display knowledge of related sources, or of theories, schools of thought, etc., usually considered appropriate in this subject? Does the author acknowledge controversy or limitations?
  5. Accuracy or verifiability of details
    • Are research methods used to gather data explained?
    • Is methodology outlined?
    • Does the document have a bibliography?
    • Can you verify background information provided?
  6. Currency
    • Does the document cite the date of data gathered or used?
    • Does the document show a publication or "last updated" date or copyright date?
    • If no date is given, try viewing the directory in which it resides to find its date of latest modification.
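A related dating clue, which goes beyond Kirk's checklist, is the HTTP Last-Modified header that a Web server may send along with a page. This Python sketch assumes you already have the response headers as a dictionary (the header values shown are hypothetical) and uses the standard email.utils.parsedate_to_datetime function to read the date. Servers do not always send the header, and for dynamically generated pages it may simply report when the page was generated, so treat it as a clue rather than proof.

```python
from email.utils import parsedate_to_datetime

def last_modified(headers):
    """Return the Last-Modified date from HTTP response headers as a datetime, or None.

    `headers` is any mapping-like object with .items(), such as a dict.
    """
    value = next((v for k, v in headers.items() if k.lower() == "last-modified"), None)
    if value is None:
        return None
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # the header is present but is not a valid HTTP date

# Hypothetical headers, as a server might return them for a static page.
headers = {"Content-Type": "text/html", "Last-Modified": "Thu, 09 Jan 2003 15:30:00 GMT"}
print(last_modified(headers).date())  # 2003-01-09
print(last_modified({}))              # None
```

For a live page, the headers could be taken from the response returned by urllib.request.urlopen, whose .headers attribute supports .items().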

Question what you find on the Web - If you find information that is "too good to be true," it probably is. Never use information you cannot verify. 

Web Guide to Sources on Evaluation


6. More Web Topics in Future Issues

More topics about the Web in the Library will be covered in future issues of Current LIS Clips, including:

  • Changing paradigms of reference service: The impact of the Web on reference services; using the Web to answer reference questions; e-mail reference; collaborative reference.
  • The librarian as trainer: Teaching users and staff; reasons to train; bibliographic instruction in the online environment; basic principles of instruction.

Send Us Feedback 

Do you have a favorite Web Searching tutorial or how-to resource that we should mention? Send us an e-mail. We'll add it to the resources in this issue.

Updated 1/9/03

 

 

 

 

 

 

 

 

 

Graduate School of Library and Information Science
University of Illinois at Urbana-Champaign
501 E. Daniel Street
Champaign, IL 61820-6211 USA
(217) 333-0734 voice (217) 244-3302 fax
http://www.lis.uiuc.edu
Current Clips Manager: Marianne Steadley

 

Library and Information Science Library
University of Illinois at Urbana-Champaign
1408 W. Gregory Drive, 306 Main Library
Urbana, IL 61801 USA
(217) 333-3804
http://www.library.uiuc.edu/lsx/
Librarian and Current Clips Editor: Susan Searing