Search Engines

Posted by bender 05/04/2009 @ 16:07

Tags : search engines, internet, technology

News headlines
Search Engines Ready for Battle, but Where's the War? - PC World
The search wars are heating up again, as the three major search engines--Google, Yahoo, and Microsoft--are launching new features. At its recent Searchology event, Google released some new tools to help you refine your query and get at the information...
New Search Engines Show Companies Haven't Given Up Fighting Giants - findingDulcinea
by Emily Coakley Google's dominance hasn't stopped other companies, big and small, from starting new search engines and trying to improve on the Web search experience. Along with the hype surrounding Wolfram Alpha's launch this week comes news on...
Wolfram Alpha: A new kind of search engine - Los Angeles Times
But as rough as it may seem now, Wolfram Alpha looks to be the leading edge of a newer, smarter crop of search engines. It's the use of so-called semantic technologies, where computers grapple with concepts and simple learning, that may define the next...
How To Make Sure Search Engines Find Your Business Web Site -
By Amanda C. Kooser You may have spent big bucks designing and building a sleek, cutting-edge business Web site, but if search engines can't find you neither will your customers. Build your site right and the search engines will find you regardless of...
Tech Notebook: Analyst: New Microsoft search engine positive for ... - San Jose Mercury News
By Elise Ackerman, Steve Johnson and Jack Davis Sandeep Aggarwal, an analyst with Collins Stewart, said Microsoft's new search engine, which is expected to be unveiled in the next week or two, could make Yahoo more comfortable about agreeing to a...
BFFs: Users and Search Engine Friendly Sites - Search Engine Journal
As I'm reading however, I can't help but be reminded that usability and search engine optimization truly do go hand in hand. Many unfamiliar to the field of SEO maintain an outdated myth that having a search friendly site means jeopardizing the look,...
Give Up Suicide Pact Enriching Google: Ann Woolner (Correct) - Bloomberg
But we have come to expect all the news in the world at the touch of our fingertips, brought to us mostly by search engines and aggregators that gobble up the product often without paying the producers a penny. Thanks in part to a steady diet of free...
Intelligent Life Sciences Search Engine: Grid Browser Understands ... - Science Daily (press release)
It could lead to a new generation of intelligent search engines. The life sciences community has built numerous databases – such as for gene sequencing and information about diseases – that are available to researchers as 'grid' services....
Yahoo Looking to Buy Its Way into Social Networking - DailyTech
Yahoo is one of the older search engines and internet portals online. The company was once one of the biggest internet companies on the planet, but has fallen on hard times in the last few years after Google exploded onto the search market and rapidly...
Browser Plugins Now Available for Major Science Search Engines - PR Newswire (press release)
Users can easily add any of these portals to their browser's search engine box by going to and clicking on a portal to automatically add it to their search box. Typically, the browser search box is located in...

List of search engines

This is a list of Wikipedia articles about search engines, including web search engines, selection-based search engines, metasearch engines, desktop search tools, and web portals and vertical market websites that have a search facility for online databases.

These search engines work across the BitTorrent protocol.

Israel search engines


The first search engines targeted at Israelis and Hebrew speakers appeared in 1995.

Today, most of these search engines use customizations from either Google (Google Custom Search Engine - CSE) or Yahoo (BOSS - Build Your Own Search Service).

A number of other, smaller search engines also serve this market.

Notably, relative to Israel's population, Israeli search engines rank very high on traffic-ranking sites such as Alexa and Quantcast.

Search engine marketing

Search engine marketing, or SEM, is a form of Internet marketing that seeks to promote websites by increasing their visibility in search engine results pages (SERPs) through paid placement, contextual advertising, and paid inclusion. The Search Engine Marketing Professional Organization (SEMPO), which is led by pay-per-click (PPC) practitioners, also includes search engine optimization (SEO) within its reporting, but most sources treat SEO as a separate discipline; the New York Times, for example, defines SEM as "the practice of buying paid search listings".

In 2006, North American advertisers spent US$9.4 billion on search engine marketing, a 62% increase over the prior year and a 750% increase over 2002. The largest SEM vendors are Google AdWords, Yahoo! Search Marketing, and Microsoft adCenter. As of 2006, SEM was growing much faster than traditional advertising and even other channels of online marketing. Because of the complexity of the technology, a secondary "search marketing agency" market has evolved: many marketers have difficulty understanding search engine marketing and rely on third-party agencies to manage it for them. Some of these agencies have developed technology that automates bidding and other complex functions required by the pay-per-click model. Well-known agencies in the field include LBi, Avenue A/Razorfish, and iCrossing.

As the number of sites on the Web increased in the mid-to-late 1990s, search engines started appearing to help people find information quickly. Search engines developed business models to finance their services, such as the pay-per-click programs offered by Open Text in 1996 and by another provider in 1998; that company changed its name to Overture in 2001, was purchased by Yahoo! in 2003, and now offers paid search opportunities for advertisers through Yahoo! Search Marketing. Google also began to offer advertisements on search results pages in 2000 through the Google AdWords program. By 2007, pay-per-click programs had proven to be the primary money-makers for search engines.

Search engine optimization consultants expanded their offerings to help businesses learn about and use the advertising opportunities offered by search engines, and new agencies focusing primarily upon marketing and advertising through search engines emerged. The term "Search Engine Marketing" was proposed by Danny Sullivan in 2001 to cover the spectrum of activities involved in performing SEO, managing paid listings at the search engines, submitting sites to directories, and developing online marketing strategies for businesses, organizations, and individuals.

Paid search advertising has not been without controversy, and the issue of how search engines present advertising on their search result pages has been the target of a series of studies and reports by Consumer Reports WebWatch. The Federal Trade Commission (FTC) also issued a letter in 2002 about the importance of disclosure of paid advertising on search engines, in response to a complaint from Commercial Alert, a consumer advocacy group with ties to Ralph Nader.

Vested interests appear to use the expression SEM to mean exclusively pay-per-click advertising, to the extent that the wider advertising and marketing community has accepted this narrow definition. Such usage excludes the wider search marketing community engaged in other forms of SEM, such as search engine optimization and search retargeting.

Index (search engine)

Search engine indexing collects, parses, and stores data to facilitate fast and accurate information retrieval. Index design incorporates interdisciplinary concepts from linguistics, cognitive psychology, mathematics, informatics, physics and computer science. An alternate name for the process in the context of search engines designed to find web pages on the Internet is Web indexing.

Popular engines focus on the full-text indexing of online, natural language documents. Media types such as video, audio, and graphics are also searchable.

Meta search engines reuse the indices of other services and do not store a local index, whereas cache-based search engines permanently store the index along with the corpus. Unlike full-text indices, partial-text services restrict the depth indexed to reduce index size. Larger services typically perform indexing at a predetermined time interval due to the required time and processing costs, while agent-based search engines index in real time.

The purpose of storing an index is to optimize speed and performance in finding relevant documents for a search query. Without an index, the search engine would scan every document in the corpus, which would require considerable time and computing power. For example, while an index of 10,000 documents can be queried within milliseconds, a sequential scan of every word in 10,000 large documents could take hours. The additional computer storage required to store the index, as well as the considerable increase in the time required for an update to take place, are traded off for the time saved during information retrieval.

A major challenge in the design of search engines is the management of parallel computing processes. There are many opportunities for race conditions and coherence faults. For example, a new document is added to the corpus and the index must be updated, but the index simultaneously needs to continue responding to search queries. This is a collision between two competing tasks. Consider that authors are producers of information, and a web crawler is the consumer of this information, grabbing the text and storing it in a cache (or corpus). The forward index is the consumer of the information produced by the corpus, and the inverted index is the consumer of information produced by the forward index. This is commonly referred to as a producer-consumer model. The indexer is the producer of searchable information and users are the consumers that need to search. The challenge is magnified when working with distributed storage and distributed processing. In an effort to scale with larger amounts of indexed information, the search engine's architecture may involve distributed computing, where the search engine consists of several machines operating in unison. This increases the possibilities for incoherency and makes it more difficult to maintain a fully synchronized, distributed, parallel architecture.

A simple inverted index can only determine whether a word exists within a particular document, since it stores no information regarding the frequency and position of the word; it is therefore considered a boolean index. Such an index determines which documents match a query but does not rank matched documents. In some designs the index includes additional information, such as the frequency of each word in each document or the positions of a word in each document. Position information enables the search algorithm to identify word proximity to support searching for phrases; frequency can be used to help in ranking the relevance of documents to the query. Such topics are the central research focus of information retrieval.
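The two variants described above can be sketched with toy data (the document names and text here are hypothetical, chosen only to illustrate the structures):

```python
from collections import defaultdict

docs = {
    "doc1": "the quick brown fox",
    "doc2": "the lazy dog",
}

# Boolean inverted index: word -> set of documents containing it.
boolean_index = defaultdict(set)
# Positional index: word -> {document: [positions]}; positions enable
# phrase and proximity search, at the cost of extra storage.
positional_index = defaultdict(lambda: defaultdict(list))

for name, text in docs.items():
    for pos, word in enumerate(text.split()):
        boolean_index[word].add(name)
        positional_index[word][name].append(pos)

print(sorted(boolean_index["the"]))       # ['doc1', 'doc2'] -- both match
print(positional_index["quick"]["doc1"])  # [1] -- word position within doc1
```

The boolean index answers only "which documents contain this word?", while the positional variant also records where each occurrence falls, which is what phrase queries require.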

The inverted index is a sparse matrix, since not all words are present in each document. To reduce computer storage requirements, it is stored differently from a two-dimensional array. The index is similar to the term-document matrices employed by latent semantic analysis. The inverted index can be considered a form of hash table. In some cases the index is a form of binary tree, which requires additional storage but may reduce the lookup time. In larger indices the architecture is typically a distributed hash table.

The inverted index is filled via a merge or rebuild. A rebuild is similar to a merge but first deletes the contents of the inverted index. The architecture may be designed to support incremental indexing, where a merge identifies the document or documents to be added or updated and then parses each document into words. For technical accuracy, a merge conflates newly indexed documents, typically residing in virtual memory, with the index cache residing on one or more computer hard drives.
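A minimal sketch of the merge just described, assuming toy in-memory structures: `main_index` stands in for the existing on-disk index, `delta_index` for the newly parsed documents held in virtual memory (both names are hypothetical). A rebuild would simply discard `main_index` first.

```python
def merge_indices(main, delta):
    """Union the posting sets of `delta` into a copy of `main`."""
    merged = {word: set(postings) for word, postings in main.items()}
    for word, postings in delta.items():
        merged.setdefault(word, set()).update(postings)
    return merged

main_index = {"fox": {"doc1"}, "dog": {"doc2"}}
delta_index = {"fox": {"doc3"}, "cat": {"doc3"}}  # newly indexed documents

merged = merge_indices(main_index, delta_index)
print(sorted(merged["fox"]))  # postings now span old and newly added docs
```

A production merge would stream sorted posting lists from disk rather than copy dictionaries, but the conflation step is the same idea.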

After parsing, the indexer adds the referenced document to the document list for the appropriate words. In a larger search engine, the process of finding each word in the inverted index (in order to report that it occurred within a document) may be too time consuming, and so this process is commonly split up into two parts, the development of a forward index and a process which sorts the contents of the forward index into the inverted index. The inverted index is so named because it is an inversion of the forward index.

The rationale behind developing a forward index is that as documents are being parsed, it is better to immediately store the words per document. The delineation enables asynchronous system processing, which partially circumvents the inverted index update bottleneck. The forward index is sorted to transform it into an inverted index. The forward index is essentially a list of pairs consisting of a document and a word, collated by the document. Converting the forward index to an inverted index is only a matter of sorting the pairs by the words. In this regard, the inverted index is a word-sorted forward index.

Generating or maintaining a large-scale search engine index represents a significant storage and processing challenge. Many search engines utilize a form of compression to reduce the size of the indices on disk. Consider the following scenario for a full-text Internet search engine indexing 2 billion web pages, with an average of 250 words indexed per page.

Given this scenario, an uncompressed index (assuming a non-conflated, simple index) for 2 billion web pages would need to store 500 billion word entries. At 1 byte per character, or 5 bytes per word, this would require 2,500 gigabytes of storage for the index alone, more than the average free disk space of 25 personal computers. This space requirement may be even larger for a fault-tolerant distributed storage architecture. Depending on the compression technique chosen, the index can be reduced to a fraction of this size; the tradeoff is the time and processing power required to perform compression and decompression.
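The back-of-envelope arithmetic works out as follows (the 250 words per page is implied by the 500-billion-entry figure):

```python
pages = 2_000_000_000        # 2 billion web pages
words_per_page = 250         # average indexable words per page
bytes_per_entry = 5          # ~1 byte per character, 5 characters per word

entries = pages * words_per_page
total_bytes = entries * bytes_per_entry

print(entries)               # 500_000_000_000 -> 500 billion word entries
print(total_bytes // 10**9)  # 2500 -> 2,500 gigabytes, uncompressed
```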

Notably, large-scale search engine designs incorporate the cost of storage as well as the cost of the electricity to power that storage; compression thus trades processing cost against storage cost.

Document parsing breaks apart the components (words) of a document or other form of media for insertion into the forward and inverted indices. The words found are called tokens, and so, in the context of search engine indexing and natural language processing, parsing is more commonly referred to as tokenization. It is also sometimes called word boundary disambiguation, tagging, text segmentation, content analysis, text analysis, text mining, concordance generation, speech segmentation, lexing, or lexical analysis. The terms 'indexing', 'parsing', and 'tokenization' are used interchangeably in corporate slang.

Natural language processing, as of 2006, is the subject of continuous research and technological improvement. Tokenization presents many challenges in extracting the necessary information from documents for indexing to support quality searching. Tokenization for indexing involves multiple technologies, the implementation of which is commonly kept as a corporate secret.

Unlike literate humans, computers do not understand the structure of a natural language document and cannot automatically recognize words and sentences. To a computer, a document is only a sequence of bytes. Computers do not 'know' that a space character separates words in a document. Instead, humans must program the computer to identify what constitutes an individual or distinct word, referred to as a token. Such a program is commonly called a tokenizer or parser or lexer. Many search engines, as well as other natural language processing software, incorporate specialized programs for parsing, such as YACC or Lex.

During tokenization, the parser identifies sequences of characters which represent words and other elements, such as punctuation, which are represented by numeric codes, some of which are non-printing control characters. The parser can also identify entities such as email addresses, phone numbers, and URLs. When identifying each token, several characteristics may be stored, such as the token's case (upper, lower, mixed, proper), language or encoding, lexical category (part of speech, like 'noun' or 'verb'), position, sentence number, sentence position, length, and line number.
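A hedged sketch of this step, assuming a simple regular-expression tokenizer (real parsers handle encodings, entities such as URLs, and non-printing characters): it splits raw text into tokens and records a few of the characteristics mentioned above.

```python
import re

def tokenize(text):
    """Split text into word and punctuation tokens, recording
    case, position, and length for each."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+|[^\w\s]", text)):
        word = match.group()
        case = ("upper" if word.isupper() else
                "proper" if word.istitle() else
                "lower" if word.islower() else "other")
        tokens.append({"token": word, "case": case,
                       "position": position, "length": len(word)})
    return tokens

for token in tokenize("Visit NASA today!"):
    print(token)
```

Here `tokenize("Visit NASA today!")` yields four tokens, with "Visit" classified as proper case, "NASA" as upper case, and the trailing "!" kept as its own punctuation token.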

If the search engine supports multiple languages, a common initial step during tokenization is to identify each document's language; many of the subsequent steps are language dependent (such as stemming and part-of-speech tagging). Language recognition is the process by which a computer program attempts to automatically identify, or categorize, the language of a document. Other names for language recognition include language classification, language analysis, language identification, and language tagging. Automated language recognition is the subject of ongoing research in natural language processing. Finding which language the words belong to may involve the use of a language recognition chart.

Options for dealing with various formats include using a publicly available commercial parsing tool offered by the organization which developed, maintains, or owns the format, or writing a custom parser.

Section analysis may require the search engine to implement the rendering logic of each document, essentially an abstract representation of the actual document, and then index the representation instead. For example, some content on the Internet is rendered via JavaScript. If the search engine does not render the page and evaluate the JavaScript within it, it will not 'see' this content in the same way and will index the document incorrectly. Because some search engines do not bother with rendering issues, many web page designers avoid displaying content via JavaScript or use the noscript tag to ensure that the web page is indexed properly. At the same time, this fact can also be exploited to cause the search engine indexer to 'see' different content than the viewer does.

Specific documents often contain embedded meta information such as author, keywords, description, and language. For HTML pages, the meta tag contains keywords which are also included in the index. Earlier Internet search engine technology would only index the keywords in the meta tags for the forward index; the full document would not be parsed. At that time full-text indexing was not as well established, nor was the hardware able to support such technology. The design of the HTML markup language initially included support for meta tags for the very purpose of being properly and easily indexed, without requiring tokenization.

As the Internet grew through the 1990s, many brick-and-mortar corporations went 'online' and established corporate websites. The keywords used to describe webpages (many of which were corporate-oriented webpages similar to product brochures) changed from descriptive to marketing-oriented keywords designed to drive sales by placing the webpage high in the search results for specific search queries. The fact that these keywords were subjectively specified led to spamdexing, which drove many search engines to adopt full-text indexing technologies in the 1990s. Website designers and companies could only place so many 'marketing keywords' into the content of a webpage before draining it of all interesting and useful information. Given that conflict with the business goal of designing user-oriented websites which were 'sticky', the customer lifetime value equation was changed to incorporate more useful content into the website in hopes of retaining the visitor. In this sense, full-text indexing was more objective and increased the quality of search engine results, as it was one more step away from subjective control of search engine result placement, which in turn furthered research into full-text indexing technologies.

In desktop search, many solutions incorporate meta tags to provide a way for authors to further customize how the search engine will index content from various files, where that content is not evident from the files themselves. Desktop search is more under the control of the user, while Internet search engines must focus more on the full-text index.

Travel search engine

A travel search engine is a specialized type of Internet search engine that focuses specifically on travel products, such as airline tickets, automobile rentals, hotel rooms, and cruise tickets. Many have comparison-shopping capabilities that allow visitors to compare prices and options.

Travel remains the single largest component of e-commerce, according to Forrester Research, a consulting firm in Cambridge, Mass. But despite the dominance of online travel agency heavyweights, most users consult multiple Web sites when shopping online for travel. The average consumer visits 3.6 sites when shopping for an airline ticket online, according to PhoCusWright, a Sherman, CT-based travel technology firm. Yahoo claims 76% of all online travel purchases are preceded by some sort of search function, according to Malcolmson, director of product development for Yahoo Travel. The 2004 Travel Consumer Survey published by Jupiter Research noted that "nearly two in five online travel consumers say they believe that no one site has the lowest rates or fares." Thus a niche was created for aggregate travel search engines, which seek to find the lowest rates from multiple travel sites, obviating the need for consumers to cross-shop from site to site.

Several of the leading generic search and information aggregator sites also offer travel components. In the broadest sense, virtually any search engine could be considered a travel search engine. However, some generic search engines should also be ranked as travel search engines (TSEs), since they include both paid and unpaid links to travel sites and maintain "travel" pages, often accompanied by original editorial content.

These sites use technological tools to generate an aggregate result from other travel sites, including third-party travel agency sites and branded sites maintained by individual travel companies.

These sites collect and publish bargain rates by advising consumers where to find them online (sometimes but not always through a direct link). Rather than providing detailed search tools, these sites generally focus on offering advertised specials, such as last-minute sales from travel suppliers eager to deplete unused inventory; therefore, these sites often work best for consumers who are flexible about destinations and other key itinerary components.

Source : Wikipedia