Innovation Simple




Robot File Help

10 July, 2008 (09:33) | Web Development

To block ALL bots from crawling all pages under a directory, add this entry to your robots.txt file:

User-agent: *

Disallow: /private/
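If you want to check a rule like this before deploying it, Python's standard urllib.robotparser module can evaluate it locally. This is only a minimal sketch: the bot name ExampleBot and the example.com URLs are made up for illustration, and real crawlers may interpret rules slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The rule from above, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "ExampleBot" and example.com are hypothetical names used only for this check.
blocked = is_allowed(ROBOTS_TXT, "ExampleBot", "http://example.com/private/report.html")
allowed = is_allowed(ROBOTS_TXT, "ExampleBot", "http://example.com/index.html")

print("private page allowed:", blocked)
print("home page allowed:", allowed)
```

Because the rule is under User-agent: *, any bot name passed to is_allowed gets the same answer.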

To block Google from crawling all pages under a directory, add this entry to your robots.txt file:

User-agent: Googlebot

Disallow: /images
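Two things are worth checking with this rule: it only applies to Googlebot, and a Disallow path is matched as a prefix, so /images without a trailing slash also covers paths such as /images-old/. The sketch below checks both with Python's urllib.robotparser; ExampleBot and the example.com URLs are hypothetical names chosen for the test.

```python
from urllib.robotparser import RobotFileParser

# The Googlebot-only rule from above.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /images
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot is blocked from the directory itself.
googlebot_in_dir = parser.can_fetch("Googlebot", "http://example.com/images/logo.gif")

# Prefix matching: a sibling path that merely starts with "/images" is blocked too.
googlebot_sibling = parser.can_fetch("Googlebot", "http://example.com/images-old/a.html")

# A bot with no matching User-agent section (hypothetical name) is not affected.
other_bot_in_dir = parser.can_fetch("ExampleBot", "http://example.com/images/logo.gif")

print(googlebot_in_dir, googlebot_sibling, other_bot_in_dir)
```

If only the directory itself should be blocked, writing the path with a trailing slash (/images/) avoids catching the sibling paths.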

To block Google from crawling all files of a specific file type, add this entry to your robots.txt file:

User-agent: Googlebot

Disallow: /*.gif$

To block Google from crawling dynamic pages (URLs containing a question mark), add this entry to your robots.txt file:

User-agent: Googlebot

Disallow: /*?
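The last two rules rely on wildcards: * stands for any run of characters and a trailing $ anchors the pattern to the end of the URL. Python's standard urllib.robotparser does plain prefix matching and does not interpret these wildcards, so the sketch below uses a small hand-written matcher instead. The function name rule_matches and the sample paths are assumptions for illustration, and this is an approximation of the matching behaviour, not Google's actual implementation.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path (plus query).

    Supports the two wildcards used above: '*' (any characters) and a
    trailing '$' (end of URL). Matching always starts at the beginning of path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then turn each escaped '*' back into '.*'.
    regex = re.escape(pattern).replace(r"\*", ".*")
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


# Sample paths below are hypothetical.
print(rule_matches("/*.gif$", "/photos/cat.gif"))         # path ends in .gif
print(rule_matches("/*.gif$", "/photos/cat.gif?size=2"))  # .gif is not at the end
print(rule_matches("/*?", "/search?q=robots"))            # contains a question mark
print(rule_matches("/*?", "/about.html"))                 # no question mark
```

Note that because of the $ anchor, the .gif rule does not cover a GIF URL that carries a query string.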

To block Google from indexing the images on your website, add this entry to your robots.txt file:

User-agent: Googlebot-Image

Disallow: /


Rather than using a robots.txt file to block crawler access to pages, you can add a <META> tag to an HTML page to tell robots what to do.

To prevent all robots from indexing a page on your site, you’d place the following meta tag into the <HEAD> section of THAT PAGE:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

To allow other robots to index the page on your site while preventing only Google from indexing it, you’d place the following meta tag into the <HEAD> section of THAT PAGE:

<META NAME="GOOGLEBOT" CONTENT="NOINDEX, NOFOLLOW">

To allow robots to index the page on your site but instruct them not to follow outgoing links, you’d use the following tag:

<META NAME="ROBOTS" CONTENT="NOFOLLOW">
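To see which directives a given page actually sends to a crawler, you can read the meta tags back out of the HTML. The sketch below uses Python's standard html.parser; the class name RobotsMetaParser, the helper directives_for, and the sample pages are assumptions made for illustration, and it treats the generic ROBOTS tag as applying to every crawler in addition to any crawler-specific tag.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of <meta name="..." content="..."> tags by name."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # lower-cased meta name -> set of lower-cased directives

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        name = attr_map.get("name", "").lower()
        parts = {p.strip().lower() for p in attr_map.get("content", "").split(",")}
        parts.discard("")
        if name and parts:
            self.directives.setdefault(name, set()).update(parts)


def directives_for(html: str, crawler: str) -> set:
    """Return the directives that apply to crawler: generic ROBOTS plus its own tag."""
    parser = RobotsMetaParser()
    parser.feed(html)
    generic = parser.directives.get("robots", set())
    specific = parser.directives.get(crawler.lower(), set())
    return generic | specific


# Hypothetical sample pages using the tags shown above.
google_only = '<html><head><META NAME="GOOGLEBOT" CONTENT="NOINDEX, NOFOLLOW"></head></html>'
nofollow_all = '<html><head><META NAME="ROBOTS" CONTENT="NOFOLLOW"></head></html>'

print(directives_for(google_only, "googlebot"))
print(directives_for(google_only, "examplebot"))
print(directives_for(nofollow_all, "examplebot"))
```

A page counts as indexable for a crawler when "noindex" is absent from the returned set, and its links may be followed when "nofollow" is absent.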

