Important Search Engine Robots
Last updated April 29, 2008. Check http://www.user-agents.org/ for updates.

| Search engine | User-agent |
| Google Search | Googlebot/2.1 (+http://www.google.com/bot.html) |
| Google Search | Googlebot/2.1 (+http://www.googlebot.com/bot.html) |
| Google Image Search | Googlebot-Image/1.0 |
| Google Image Search | Googlebot-Image/1.0 (+http://www.googlebot.com/bot.html) |
| MSN Search | msnbot/x.xx (+http://search.msn.com/msnbot.htm) |
| MSN Search | MSNBOT/0.xx (http://search.msn.com/msnbot.htm) |
| MSN Media Search Robot | msnbot-media/1.0 (+http://search.msn.com/msnbot.htm) |
| Windows Live Product Search | msnbot-Products/1.0 (+http://search.msn.com/msnbot.htm) |
| Microsoft Search for Mobiles | MSNBOT_Mobile MSMOBOT Mozilla/2.0 (compatible; MSIE 4.02; Windows CE; Default) |
| Alexa / The Internet Archive | ia_archiver |
| Alexa / The Internet Archive | ia_archiver-web.archive.org |
| Alexa / The Internet Archive | ia_archiver/1.6 |
| Yahoo Blog Search | Yahoo-Blogs/v3.9 (compatible; Mozilla 4.0; MSIE 5.5; http://help.yahoo.com/help/us/ysearch/crawling/crawling-02.html ) |
| Yahoo Multimedia Search | Yahoo-MMAudVid/1.0 (mms dash mmaudvidcrawler dash support at yahoo dash inc dot com) |
| Yahoo Product Search | YahooSeeker/1.0 (compatible; Mozilla 4.0; MSIE 5.5; http://help.yahoo.com/help/us/shop/merchant/) |
| Yahoo Product Search | YahooSeeker/1.0 (compatible; Mozilla 4.0; MSIE 5.5; http://search.yahoo.com/yahooseeker.html) |
| Yahoo Product Search | YahooSeeker/1.1 (compatible; Mozilla 4.0; MSIE 5.5; http://help.yahoo.com/help/us/shop/merchant/) |
| Ask/Teoma Search | Mozilla/2.0 (compatible; Ask Jeeves) |
| Ask/Teoma Search | Mozilla/2.0 (compatible; Ask Jeeves/Teoma) |
| Ask/Teoma Search | Mozilla/2.0 (compatible; Ask Jeeves/Teoma; http://about.ask.com/en/docs/about/webmasters.shtml) |
This portion lists the spider user-agents of the major search engines. The version numbers will eventually go out of date, but the list remains useful for identifying oddly named spiders (e.g., ia_archiver is the crawler for Alexa / The Internet Archive).
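The list above can also be used to label entries in a server log. The Python sketch below matches a raw user-agent string against substrings taken from the list; the `KNOWN_CRAWLERS` table and the `identify_crawler` name are hypothetical. Any client can send any user-agent string, so a match only shows what the client claims to be.

```python
from typing import List, Optional, Tuple

# Hypothetical lookup table built from the user-agent list above.
# More specific tokens come first so that, e.g., "googlebot-image"
# is not reported as plain "googlebot".
KNOWN_CRAWLERS: List[Tuple[str, str]] = [
    ("googlebot-image", "Google Image Search"),
    ("googlebot", "Google Search"),
    ("msnbot-media", "MSN Media Search Robot"),
    ("msnbot-products", "Windows Live Product Search"),
    ("msnbot_mobile", "Microsoft Search for Mobiles"),
    ("msnbot", "MSN Search"),
    ("ia_archiver", "Alexa / The Internet Archive"),
    ("yahoo-blogs", "Yahoo Blog Search"),
    ("yahoo-mmaudvid", "Yahoo Multimedia Search"),
    ("yahooseeker", "Yahoo Product Search"),
    ("ask jeeves", "Ask/Teoma Search"),
]


def identify_crawler(user_agent: str) -> Optional[str]:
    """Return the service a user-agent string claims to be, or None."""
    ua = user_agent.lower()  # case-insensitive, so "MSNBOT" matches "msnbot"
    for token, service in KNOWN_CRAWLERS:
        if token in ua:
            return service
    return None


if __name__ == "__main__":
    print(identify_crawler("Googlebot-Image/1.0"))  # Google Image Search
    print(identify_crawler("MSNBOT/0.xx (http://search.msn.com/msnbot.htm)"))  # MSN Search
    print(identify_crawler("Mozilla/5.0 (Windows NT 10.0)"))  # None
```

Because the first matching token wins, new entries for more specific variants must be placed before their generic prefix.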
Robots Meta Tag
<meta name="ROBOT NAME" content="ARGUMENTS" />

ROBOT NAME can be either "robots" for all robots or the name of a specific robot (for example, googlebot). See the robot user-agent list above. Separate multiple ARGUMENTS with commas (for example, content="noindex, nofollow").

| Argument | Supported by | Description |
| noindex | Google, Yahoo, Live, Ask | Page is not indexed |
| nofollow | Google, Yahoo, Live, Ask | Links on the page are treated as nofollow (not followed) |
| noarchive | Google, Yahoo, Live, Ask | Page is not cached |
| noodp | Google, Yahoo, Live | Stops the title and description from being replaced with the DMOZ listing (homepage only) |
| noydir | Yahoo | Stops the title and description from being replaced with the Yahoo Directory listing |
| nosnippet | Google | Stops Google from generating a description (snippet) from on-page text |
This section documents the robots meta tag: all of the available arguments and which search engines support each one.
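To show how the ROBOT NAME rule plays out, the Python sketch below uses the standard library's `html.parser` to collect the arguments that apply to one robot: those from tags named "robots" plus those from tags naming that robot. The class and function names are hypothetical, and simply taking the union of all matching arguments is an assumption of this sketch; it does not model how each engine resolves conflicting tags.

```python
from html.parser import HTMLParser
from typing import Set


class RobotsMetaParser(HTMLParser):
    """Collect robots meta arguments that apply to one robot.

    A <meta> tag applies when its name is "robots" (all robots) or the
    robot's own name, compared case-insensitively.
    """

    def __init__(self, robot_name: str) -> None:
        super().__init__()
        self.robot_name = robot_name.lower()
        self.arguments: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; self-closing
        # tags such as <meta ... /> are routed here as well.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() not in ("robots", self.robot_name):
            return
        # Arguments are comma-separated, e.g. content="noindex, nofollow".
        for argument in attributes.get("content", "").split(","):
            argument = argument.strip().lower()
            if argument:
                self.arguments.add(argument)


def robots_arguments(html: str, robot_name: str) -> Set[str]:
    """Return the set of robots meta arguments that apply to robot_name."""
    parser = RobotsMetaParser(robot_name)
    parser.feed(html)
    parser.close()
    return parser.arguments


PAGE = """<html><head>
<meta name="robots" content="noarchive" />
<meta name="googlebot" content="noindex, nofollow" />
</head><body></body></html>"""

if __name__ == "__main__":
    print(sorted(robots_arguments(PAGE, "googlebot")))  # ['noarchive', 'nofollow', 'noindex']
    print(sorted(robots_arguments(PAGE, "msnbot")))     # ['noarchive']
```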
Common Robot Traps
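| Trap | Why it stops spiders |
| Input forms | Spiders generally do not submit forms, so pages reachable only through a form are not crawled |
| Session IDs in URL | Each visit produces new URLs for the same content, creating duplicates and a near-endless URL space |
| Pages restricted by cookies | Spiders generally do not accept or return cookies, so cookie-gated content is not reached |
| Frames | Spiders may index the frameset page rather than the content of the individual frames |
| Logins | Spiders do not log in, so content behind a login is not crawled |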
This list shows the most common ways webmasters unintentionally prevent spiders from crawling their sites.
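For the session-ID trap, one common remedy is to serve links without the session parameter. The Python sketch below strips such parameters from a URL using the standard `urllib.parse` module; the parameter names in `SESSION_PARAMS` and the `strip_session_ids` name are assumptions, so adjust them to whatever your platform actually uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of query-string keys that carry session IDs on a site.
SESSION_PARAMS = {"sid", "sessionid", "session_id", "phpsessid", "jsessionid"}


def strip_session_ids(url: str) -> str:
    """Return the URL with session-ID query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    # urlencode re-encodes the remaining parameters; an empty list yields
    # an empty query, so no trailing "?" is left behind.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment))


if __name__ == "__main__":
    print(strip_session_ids("http://www.mysite.com/page.php?id=7&PHPSESSID=abc123"))
    # http://www.mysite.com/page.php?id=7
```

This only handles session IDs in the query string; IDs embedded in the path (such as a `;jsessionid=` path parameter) would need separate handling.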
Robots.txt Syntax
User-agent: *
Disallow: /privatefolder/
Disallow: /privatefile.html

User-agent: Googlebot
Disallow: /nogoogle.html

Sitemap: http://www.mysite.com/sitemap.xml
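An example of a simple robots.txt file. It shows how to block all robots, or one specific robot, from entire directories and individual files, and how to point robots to a sitemap. Note that a robot obeys only the most specific group that names it: here Googlebot follows its own group and ignores the "User-agent: *" rules, so it is kept out of /nogoogle.html but not out of /privatefolder/ or /privatefile.html. To block it from those as well, repeat those Disallow lines in its group.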
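To check what the example file actually allows, Python's standard `urllib.robotparser` module can parse it directly. This sketch writes the Googlebot group as `User-agent: Googlebot` (name only, no version number). The name `SomeBot` is made up to stand for any robot without its own group. Python's parser is only one implementation, and individual search engines may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt, with the Googlebot group named without a version.
ROBOTS_TXT = """\
User-agent: *
Disallow: /privatefolder/
Disallow: /privatefile.html

User-agent: Googlebot
Disallow: /nogoogle.html

Sitemap: http://www.mysite.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

if __name__ == "__main__":
    base = "http://www.mysite.com"
    for agent in ("SomeBot", "Googlebot"):  # "SomeBot" is a hypothetical robot
        for path in ("/privatefolder/a.html", "/privatefile.html", "/nogoogle.html"):
            print(agent, path, parser.can_fetch(agent, base + path))
    print(parser.site_maps())  # site_maps() requires Python 3.8+
```

The output confirms the group rule: SomeBot is blocked from /privatefolder/ and /privatefile.html, while Googlebot is blocked only from /nogoogle.html.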
Sitemap Syntax
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.mysite.com/</loc>
    <lastmod>1987-05-25</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
Default locations where search engines look for sitemaps:
http://www.mysite.com/sitemap.xml
http://www.mysite.com/sitemap.xml.gz
http://www.mysite.com/sitemap.gz

Visit http://www.xml-sitemaps.com/ for a free sitemap generator.
This section shows the standard sitemap.xml syntax defined by the Sitemaps protocol (version 0.9) and lists the default locations where search engines look for sitemaps.
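A sitemap like the one above can be generated rather than written by hand. The Python sketch below builds it with the standard `xml.etree.ElementTree` module and offers a gzip-compressed variant for the sitemap.xml.gz location. The `build_sitemap` and `build_sitemap_gz` names and the one-dict-per-page input format are assumptions for illustration.

```python
import gzip
import xml.etree.ElementTree as ET
from typing import Dict, Iterable

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
FIELDS = ("loc", "lastmod", "changefreq", "priority")  # child order inside <url>


def build_sitemap(entries: Iterable[Dict[str, object]]) -> bytes:
    """Serialize entries (one dict per page) as a UTF-8 sitemap.xml document."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        for field in FIELDS:
            if field in entry:  # only <loc> is required by the protocol
                ET.SubElement(url, field).text = str(entry[field])
    # The xml_declaration argument requires Python 3.8+.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


def build_sitemap_gz(entries: Iterable[Dict[str, object]]) -> bytes:
    """Same document, gzip-compressed for serving as sitemap.xml.gz."""
    return gzip.compress(build_sitemap(entries))


if __name__ == "__main__":
    xml_bytes = build_sitemap([
        {"loc": "http://www.mysite.com/", "lastmod": "1987-05-25",
         "changefreq": "monthly", "priority": 0.8},
    ])
    print(xml_bytes.decode("utf-8"))
```

Writing the returned bytes to sitemap.xml (or the compressed bytes to sitemap.xml.gz) at the site root places the file in one of the locations listed above.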
Courtesy: SEOMoz.com