A robots.txt file can have a huge impact on your WordPress blog's traffic and search engine rank. This is an SEO-optimized WordPress robots.txt file. Keep in mind that if you block too much in robots.txt, you could lose all of your rank.
Download the complete file: WordPress robots.txt file
I was inspired to revisit this topic after reading Creating the ultimate WordPress robots.txt file. I have since revisited it once more and created the Updated WordPress robots.txt file.
Google Says
Make use of the robots.txt file on your web server. This file tells crawlers which directories can or cannot be crawled. Make sure it’s current for your site so that you don’t accidentally block the Googlebot crawler.
header.php meta seo trick
Place this in your WordPress theme's header.php file. If the request is for a single post, a page, or the home page, robots are told to index it and follow its links. Otherwise, search engines are told not to index the page but still to follow its links.
<?php if(is_single() || is_page() || is_home()) { ?>
<meta name="googlebot" content="index,noarchive,follow,noodp" />
<meta name="robots" content="all,index,follow" />
<meta name="msnbot" content="all,index,follow" />
<?php } else { ?>
<meta name="googlebot" content="noindex,noarchive,follow,noodp" />
<meta name="robots" content="noindex,follow" />
<meta name="msnbot" content="noindex,follow" />
<?php }?>
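To double-check what the template actually emits on a given page, the meta tags can be read back out of the rendered HTML. The following is only a minimal Python sketch and is not part of WordPress or the theme. The names RobotsMetaParser and robots_directives are made up for illustration, and only the three bot names used above are recognized.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from robots-style <meta> tags."""

    # Only the meta names used in the header.php snippet are recognized here.
    BOT_NAMES = {"robots", "googlebot", "msnbot"}

    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are also routed through here.
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name in self.BOT_NAMES:
            content = attr.get("content") or ""
            self.directives[name] = [
                part.strip().lower() for part in content.split(",") if part.strip()
            ]


def robots_directives(html):
    """Return a dict such as {"robots": ["noindex", "follow"]} for an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    sample = '<meta name="robots" content="noindex,follow" />'
    print(robots_directives(sample))
```

Feeding it the saved source of an archive page and of a single post should show noindex on the former and index on the latter, if the template is doing its job.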
seo robots.txt
See the Updated WordPress robots.txt file
User-agent: *
# disallow all files in these directories
Disallow: /cgi-bin/
Disallow: /z/j/
Disallow: /z/c/
Disallow: /stats/
Disallow: /dh_
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /contact/
Disallow: /tag/
Disallow: /wp-content/b
Disallow: /wp-content/p
Disallow: /wp-content/themes/askapache/4
Disallow: /wp-content/themes/askapache/c
Disallow: /wp-content/themes/askapache/d
Disallow: /wp-content/themes/askapache/f
Disallow: /wp-content/themes/askapache/h
Disallow: /wp-content/themes/askapache/in
Disallow: /wp-content/themes/askapache/p
Disallow: /wp-content/themes/askapache/s
Disallow: /trackback/
Disallow: /*?*
Disallow: */trackback/

User-agent: Googlebot
# disallow all files ending with these extensions
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.cgi$
Disallow: /*.wmv$
Disallow: /*.png$
Disallow: /*.gif$
Disallow: /*.jpg$
Disallow: /*.cgi$
Disallow: /*.xhtml$
Disallow: /*.php*
Disallow: */trackback*
Disallow: /*?*
Disallow: /z/
Disallow: /wp-*
Allow: /wp-content/uploads/

# allow google image bot to search all images
User-agent: Googlebot-Image
Allow: /*

# allow adsense bot on entire site
User-agent: Mediapartners-Google*
Disallow: /*?*
Allow: /z/
Allow: /about/
Allow: /contact/
Allow: /wp-content/
Allow: /tag/
Allow: /manual/*
Allow: /docs/*
Allow: /*.php$
Allow: /*.js$
Allow: /*.inc$
Allow: /*.css$
Allow: /*.gz$
Allow: /*.cgi$
Allow: /*.wmv$
Allow: /*.cgi$
Allow: /*.xhtml$
Allow: /*.php*
Allow: /*.gif$
Allow: /*.jpg$
Allow: /*.png$

# disallow archiving site
User-agent: ia_archiver
Disallow: /

# disable duggmirror
User-agent: duggmirror
Disallow: /
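When a URL matches both a Disallow and an Allow line, Google's current documentation says the longest (most specific) matching path wins, and Allow wins a tie. Here is a rough Python sketch of that rule, tried against a few lines from the Googlebot group above. The helper names _matches and is_allowed are hypothetical, other crawlers may resolve conflicts differently, and this is not a copy of any search engine's real parser.

```python
import re


def _matches(pattern, path):
    """Prefix match where '*' is any run of characters and a trailing '$' ends the URL."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in core.split("*"))
    return re.match(regex + ("$" if anchored else ""), path) is not None


def is_allowed(rules, path):
    """rules is a list of (directive, pattern) pairs from ONE user-agent group."""
    best_length = -1
    allowed = True  # nothing matched means the URL may be crawled
    for directive, pattern in rules:
        if not pattern:  # an empty "Disallow:" blocks nothing
            continue
        if not _matches(pattern, path):
            continue
        is_allow = directive.lower() == "allow"
        # Longest pattern wins; on equal length the Allow line wins.
        if len(pattern) > best_length or (len(pattern) == best_length and is_allow):
            best_length = len(pattern)
            allowed = is_allow
    return allowed


# A few lines taken from the Googlebot group above.
googlebot_rules = [
    ("Disallow", "/*.php$"),
    ("Disallow", "/wp-*"),
    ("Allow", "/wp-content/uploads/"),
]

if __name__ == "__main__":
    for url in ("/wp-login.php", "/wp-content/uploads/2007/03/chart", "/2007/03/a-post/"):
        print(url, is_allowed(googlebot_rules, url))
```

Under this reading, the uploads folder stays crawlable even though /wp-* is blocked, because the Allow path is the longer match.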
The Breakdown
disallow files in these directories
User-agent: *
Disallow: /cgi-bin/
Disallow: /z/j/
Disallow: /z/c/
Disallow: /stats/
Disallow: /dh_
Disallow: /about/
Disallow: /contact/
Disallow: /tag/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /contact
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
disallow all files ending with these extensions
User-agent: Googlebot
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.wmv$
Disallow: /*.cgi$
Disallow: /*.xhtml$
disallow all files with ? in url
Disallow: /*?*
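The * and $ characters in these patterns are extensions that Google documents on top of the original robots.txt convention: * stands for any run of characters, a trailing $ anchors the end of the URL, and without $ a rule is a plain prefix match. Here is a small Python sketch for trying a pattern against a path before putting it live. The function name rule_matches is illustrative only, and crawlers that do not support wildcards will read these lines differently.

```python
import re


def rule_matches(pattern, path):
    """Return True if a robots.txt path pattern matches the URL path (Google-style wildcards)."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    # Escape the literal pieces and join them with ".*" where the pattern had "*".
    regex = ".*".join(re.escape(part) for part in core.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, so an un-anchored rule is a prefix match.
    return re.match(regex, path) is not None


if __name__ == "__main__":
    checks = [
        ("/*.php$", "/index.php"),
        ("/*.php$", "/index.php?p=12"),
        ("/*?*", "/?p=12"),
        ("/*?*", "/about/"),
        ("/wp-admin/", "/blog/wp-admin/"),
        ("*/wp-admin/", "/blog/wp-admin/"),
    ]
    for pattern, path in checks:
        print(pattern, path, rule_matches(pattern, path))
```

The last two checks illustrate that a rule such as /wp-admin/ only matches from the site root, so a blog installed in a subfolder needs the subfolder in the path or a leading wildcard.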
disable duggmirror
User-agent: duggmirror
Disallow: /
disallow WayBack archiving site
User-agent: ia_archiver
Disallow: /
allow google image bot to search all images
User-agent: Googlebot-Image
Disallow:
Allow: /*
allow adsense bot on entire site
User-agent: Mediapartners-Google*
Disallow:
Allow: /*
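One thing worth keeping in mind with per-bot sections like these: according to Google's documentation, a crawler follows only the one group that matches its name and falls back to User-agent: * only when no named group matches. Rules from the * group are not merged in, which would explain why the Googlebot section of the full file repeats Disallow: /*?*. Below is a simplified Python sketch of that selection. The function name select_group is hypothetical, matching is reduced to a case-insensitive name comparison, ignoring the trailing * in Mediapartners-Google* is an assumption, and Google's own fallback between its crawlers (for example Googlebot-Image to Googlebot) is not modeled.

```python
def select_group(groups, crawler_name):
    """Pick the rule list a crawler would obey.

    groups maps a User-agent value to the list of rule lines under it.
    """
    wanted = crawler_name.lower()
    for token, rules in groups.items():
        # Assumption: a trailing "*" on a named token is ignored when comparing.
        name = token.rstrip("*").lower()
        if name and name == wanted:
            return rules
    # No named group matched, so fall back to the catch-all group if present.
    return groups.get("*", [])


if __name__ == "__main__":
    groups = {
        "*": ["Disallow: /tag/", "Disallow: /*?*"],
        "Googlebot": ["Disallow: /*.php$", "Disallow: /*?*"],
        "Mediapartners-Google*": ["Disallow:", "Allow: /*"],
    }
    print(select_group(groups, "Googlebot"))
    print(select_group(groups, "SomeOtherBot"))
```

In practice this means any rule you want every bot to obey has to be copied into each named section as well as the * section.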
Google User-agents
- Googlebot: crawls pages from our web index and our news index
- Googlebot-Mobile: crawls pages for our mobile index
- Googlebot-Image: crawls pages for our image index
- Mediapartners-Google: crawls pages to determine AdSense content. We only use this bot to crawl your site if you show AdSense ads on your site.
- Adsbot-Google: crawls pages to measure AdWords landing page quality. We only use this bot if you use Google AdWords to advertise your site. Find out more about this bot and how to block it from portions of your site.
Google Sponsored Robots.txt Articles
- Controlling how search engines access and index your website
- The Robots Exclusion Protocol
- robots.txt analysis tool
- Googlebot
- Inside Google Sitemaps: Using a robots.txt file
- All About Googlebot
AskApache Robots.txt Articles
Related Articles
- Allow AdSense Crawler Access
- Updated robots.txt for WordPress
- Redirect index.php to root “/”
- Robots.txt Secrets From Matt Cutts
- Boost SEO, Drive Traffic with the 404 Error Page Plugin
- SEO with Robots.txt
- Instruct Search Engines to come back to site after you finish working on it
- SetEnvIf and SetEnvIfNoCase Examples
04.01.07 at 6:35 pm
great archive for wordpress users !
05.27.07 at 12:12 am
Great!! This is exactly what I needed!
I’m still not sure exactly what to remove from your seo robots.txt, but I will give it a try :)
06.06.07 at 12:01 am
Amazing post. Thanks.
06.19.07 at 10:18 pm
If I put my robots.txt in my root but have my blog installed in a subfolder, how can I exclude correctly?
I mean, my cgi-bin folder would work, but my blog is installed in a folder called blog, which means /blog/wp-admin for instance.
How do I disallow that, then?
06.19.07 at 11:20 pm
Also could I do something like this?
User-agent: *
Disallow: */wp-content/
Disallow: */wp-admin/
Disallow: */wp-includes/
Disallow: */wp-
Disallow: */feed/
Disallow: /trackback/
Disallow: /cgi-bin/
08.19.07 at 10:19 pm
Google and the Marching Robots: Friday, I called my bffs at Google to get the answer straight on the robots.txt file. If you have never heard of a robots.txt file, it is a simple text file that [...]
10.04.07 at 7:50 am
I was looking for a detailed Search Engine Optimization article on the robots.txt
This is really detailed and useful.
Thanks
:-)
03.24.08 at 12:51 pm
Can you also include the noarchive command for the robots and msnbot lines, as well as the googlebot line? If not, why?