Understanding web crawlers

A web crawler is a program or script that browses the World Wide Web in a methodical, automated manner. Crawlers are mainly used to copy the pages they visit so that a search engine can later process and index them, which is what makes fast searches possible.

Search engines send out crawlers, also called spiders or robots, to visit your site and gather web pages. When a robot visits a website, it typically does two things, in this order:

  • It looks for the robots.txt file, and for the robots meta tag on each page, to see the rules that the site owner has set.
  • It begins to index the web pages on your site.
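
The first step above, checking a site's rules before fetching anything, can be sketched with Python's standard urllib.robotparser module. The robots.txt content, the example.com URLs and the helper names build_parser and may_crawl are assumptions made up for illustration; a real crawler would download the file from the site rather than read it from an inline string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_crawl(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules allow this robot to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

for agent, url in [
    ("Googlebot", "https://example.com/drafts/post.html"),
    ("Googlebot", "https://example.com/index.html"),
    ("Slurp", "https://example.com/private/data.html"),
    ("Slurp", "https://example.com/index.html"),
]:
    print(agent, url, may_crawl(parser, agent, url))
```

In this sketch the Googlebot group applies only to Googlebot, while every other robot, Slurp included, falls back to the general rules listed under the * group.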

The robot then scans the visible text on the page, the content of various HTML tags and the hyperlinks listed on the page. It will then analyze the information and process it according to an algorithm devised by its owner. Depending on the search engine, the information is indexed and sent to the search engine's database.
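
As a rough illustration of that scanning step, the sketch below uses Python's standard html.parser module to collect the visible text, the meta tags and the hyperlinks of one page. The PageScanner class, the sample HTML and the example.com base URL are hypothetical; how a real search engine analyzes and ranks what it collects is decided by its own algorithm and is not shown here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageScanner(HTMLParser):
    """Collects visible text, meta tags and links from one HTML page."""

    SKIPPED = {"script", "style"}  # content a reader never sees

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []       # absolute URLs found in <a href="...">
        self.text_parts = []  # visible text fragments
        self.meta = {}        # meta name -> content, e.g. "robots"
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "a" and attrs.get("href"):
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            self.meta[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside <script> and <style>.
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


# Hypothetical page, invented for this example.
SAMPLE_HTML = """
<html><head>
<title>Example page</title>
<meta name="robots" content="index, follow">
<style>body { color: black; }</style>
</head><body>
<h1>Welcome</h1>
<p>Read the <a href="/about.html">about page</a>.</p>
<script>console.log("ignored");</script>
</body></html>
"""

scanner = PageScanner("https://example.com/")
scanner.feed(SAMPLE_HTML)

print(scanner.links)
print(scanner.meta.get("robots"))
print(scanner.text_parts)
```

The links found this way are what let a crawler move from one page to the next, and the robots meta tag is where a page can state its own indexing rules.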

Different search engines use different robots as their web crawlers. For example, Yahoo uses Slurp as its web-indexing robot, Google uses Googlebot, and so on.
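
Robots usually announce their name in the User-Agent header of each request, so a site can get a rough idea of which crawler is visiting. The sketch below is a minimal example that assumes only the two robots named above; the KNOWN_CRAWLERS table and the identify_crawler function are hypothetical, and the sample header values are illustrative. Because any client can send any User-Agent value, a match is a hint and not proof of identity.

```python
from typing import Optional

# Lowercase name fragments mapped to the search engine that runs the robot.
# Only the two robots mentioned in this article are listed (assumption:
# a real list would be much longer).
KNOWN_CRAWLERS = {
    "slurp": "Yahoo",
    "googlebot": "Google",
}


def identify_crawler(user_agent: str) -> Optional[str]:
    """Return the search engine name if the header mentions a known robot."""
    lowered = user_agent.lower()
    for fragment, engine in KNOWN_CRAWLERS.items():
        if fragment in lowered:
            return engine
    return None


# Illustrative header values.
samples = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)",
    "Mozilla/5.0 (Windows NT 10.0) Firefox/120.0",
]

for header in samples:
    print(identify_crawler(header), "<-", header)
```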

