What is a Search Engine?
By definition, an Internet search engine is an information retrieval system that helps us find information on the World Wide Web. The World Wide Web is a universe of information made available over the network, and it facilitates the global sharing of information. But the WWW can be seen as an unstructured database that is growing exponentially into a vast store of information, which makes finding information online a difficult task. A tool is needed to manage, filter and retrieve this ocean of information. A search engine serves this purpose.
How does a Search Engine Work?
• Internet search engines are web search engines that search for and retrieve information online. Most of them use a crawler-indexer architecture and rely on their crawler modules. Crawlers, also referred to as spiders, are small programs that browse the web.
• Crawlers are given an initial set of URLs whose pages they retrieve. They extract the URLs that appear in the crawled pages and pass this information to the crawler control module. The control module decides which pages to visit next and passes their URLs back to the crawlers.
• The topics covered by different search engines vary according to the algorithms they use. Some search engines are programmed to search pages on a particular topic, while in others the crawlers are set to visit as many sites as possible.
• The crawler control module may use the link graph of a previous crawl, or usage patterns, to guide its crawling strategy.
• The indexer module extracts words from each page it visits and records the page's URL. The result is a large lookup table that gives, for each word, a list of URLs pointing to the pages where that word occurs. The table covers only the pages that were visited during the crawling process.
• A collection analysis module is another important part of the search engine architecture. It creates a utility index, which can, for example, provide access to pages of a given length or pages that contain a certain number of images.
• During crawling and indexing, a search engine saves the pages it retrieves. They are stored temporarily in a page repository. Search engines maintain a cache of the pages they visit so that already-visited pages can be retrieved quickly.
• The query module of a search engine receives search requests from users in the form of keywords. The ranking module sorts the results.
• The crawler-indexer architecture has many variants. One of them is the distributed search engine architecture, which consists of collectors and brokers. Collectors collect indexing information from web servers, while brokers provide the indexing mechanism and the query interface. Brokers update their indexes on the basis of information received from collectors and other brokers, and they can also filter information. Many search engines today use this type of architecture.
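The crawl, index and query steps described in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any real search engine is built: the "web" is a hypothetical in-memory dictionary (TOY_WEB) instead of real HTTP requests, the example URLs are made up, and the function names crawl, build_index and search are invented for this sketch.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would fetch pages over HTTP and parse links from HTML.
TOY_WEB = {
    "http://a.example/": ("search engines crawl the web",
                          ["http://b.example/", "http://c.example/"]),
    "http://b.example/": ("crawlers are also called spiders",
                          ["http://c.example/"]),
    "http://c.example/": ("an index maps words to pages on the web",
                          ["http://a.example/"]),
}


def crawl(seed_urls, fetch, max_pages=100):
    """Breadth-first crawl from the seed URLs.

    Returns a page repository: URL -> page text.
    """
    frontier = deque(seed_urls)   # URLs waiting to be visited
    seen = set(seed_urls)         # URLs already queued, to avoid revisits
    repository = {}
    while frontier and len(repository) < max_pages:
        url = frontier.popleft()
        page = fetch(url)
        if page is None:          # unknown or unreachable URL
            continue
        text, links = page
        repository[url] = text
        for link in links:        # queue newly discovered URLs
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return repository


def build_index(repository):
    """Build the lookup table: word -> set of URLs containing that word."""
    index = defaultdict(set)
    for url, text in repository.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every keyword in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


repository = crawl(["http://a.example/"], TOY_WEB.get)
index = build_index(repository)
print(sorted(search(index, "the web")))
```

Here the set of already-seen URLs plays a very small part of the role of the crawler control module, and the dictionary returned by crawl stands in for the page repository. Ranking of the results is left out of this sketch.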
Search Engines and Page Ranking
When we submit a query to a search engine, the results are displayed in a certain order. Most of us tend to visit the pages at the top and ignore those beyond the first few. This is because we consider the top few pages to be the most relevant to our query. So anyone who publishes pages is interested in having them rank among a search engine's first ten results.
The words you enter in a search engine's query interface are the keywords the engine searches for. It presents a list of pages that are relevant to the queried keywords. During this process, search engines pick up the pages in which the keywords occur frequently. They also look for relationships between keywords. The location of the keywords is considered as well when ranking the pages that contain them: keywords occurring in page titles or URLs are given greater weight. Links pointing to a page make it more popular. If many other sites link to a page, it is considered valuable and more relevant.
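As a rough illustration of keyword frequency and keyword location, the following Python sketch scores a page higher when a keyword appears in its title or URL than when it appears only in the body. The weights (3.0, 2.0, 1.0), the Page class, the function names and the example pages are all assumptions made for this sketch; real search engines do not publish their formulas.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    title: str
    body: str


# Assumed weights: a title hit counts more than a URL hit,
# which counts more than a body hit.
TITLE_WEIGHT = 3.0
URL_WEIGHT = 2.0
BODY_WEIGHT = 1.0


def keyword_score(page, query):
    """Sum weighted keyword occurrences in the title, URL and body."""
    score = 0.0
    for word in query.lower().split():
        score += TITLE_WEIGHT * page.title.lower().split().count(word)
        score += URL_WEIGHT * (1 if word in page.url.lower() else 0)
        score += BODY_WEIGHT * page.body.lower().split().count(word)
    return score


def rank(pages, query):
    """Order pages from highest to lowest keyword score."""
    return sorted(pages, key=lambda p: keyword_score(p, query), reverse=True)


pages = [
    Page("http://example.com/recipes", "Dinner ideas",
         "a python was seen near the kitchen"),
    Page("http://example.com/python", "Python tutorial",
         "learn python step by step"),
]
for page in rank(pages, "python"):
    print(page.url, keyword_score(page, "python"))
```

Both pages mention the keyword once in the body, but the second one also has it in the title and the URL, so it is ranked first.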
It is essentially the ranking algorithm that differs from one search engine to another. The algorithm is a computerized formula designed to match relevant pages to a user's query. Each search engine may have a different ranking algorithm that analyzes the pages in the engine’s database to find relevant answers to queries. Different search engines also index information differently. As a result, the same query submitted to two different search engines may retrieve pages in different orders, or retrieve different pages altogether. Keyword occurrence and link popularity are both factors that determine relevance. The click-through popularity of a site is another factor in its rank; this popularity is a measure of how often the site is visited.
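To see why two engines can order the same pages differently, consider the minimal Python sketch below. It assumes each page already has three signals scaled between 0 and 1 (keyword relevance, link popularity, click-through popularity) and that each engine combines them with its own weights. The signal values, the weights and the names ENGINE_X, ENGINE_Y and rank_pages are hypothetical.

```python
# Hypothetical per-page signals, each already scaled to the range 0..1.
SIGNALS = {
    "page-a": {"keyword": 0.9, "links": 0.2, "clicks": 0.3},
    "page-b": {"keyword": 0.5, "links": 0.9, "clicks": 0.8},
    "page-c": {"keyword": 0.7, "links": 0.5, "clicks": 0.1},
}

# Two imaginary engines that weight the same signals differently.
ENGINE_X = {"keyword": 0.8, "links": 0.1, "clicks": 0.1}
ENGINE_Y = {"keyword": 0.2, "links": 0.5, "clicks": 0.3}


def rank_pages(signals, weights):
    """Order pages by the weighted sum of their signals, best first."""
    def score(page):
        return sum(weights[name] * value
                   for name, value in signals[page].items())
    return sorted(signals, key=score, reverse=True)


print(rank_pages(SIGNALS, ENGINE_X))  # → ['page-a', 'page-c', 'page-b']
print(rank_pages(SIGNALS, ENGINE_Y))  # → ['page-b', 'page-c', 'page-a']
```

The engine that favours keywords puts page-a first, while the engine that favours links and clicks puts page-b first, even though both see exactly the same pages.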
Webmasters try to fool search engine algorithms to raise the ranks of their sites. Tricks include stuffing a site's home page with keywords or using meta tags to deceive search engine ranking strategies. But search engines are smart enough! They keep revising their algorithms and counter-programming their systems so that we as researchers do not fall prey to such deceptive practices.
If you are a serious researcher, understand that even the pages beyond the first few on the list can have serious content. But rest assured about good search engines: they will usually give you highly relevant pages at the top!