
by AskApache · 34 comments

A robots.txt file can make a huge impact on your WordPress blog's traffic and search engine rank. This is an SEO-optimized WordPress robots.txt file. Keep in mind that if you mess up the robots.txt file by blocking too much, you could lose all of your rank.


Note: This article is outdated; over the years I've learned to use robots.txt only as an authoritative blacklist. Now my robots.txt is much simpler and I rely on meta tags, as detailed in my SEO article. Sorry, rozkan!

Download the complete file: WordPress robots.txt file

I was inspired to revisit this topic after reading Creating the ultimate WordPress robots.txt file, and later revisited it once again and created the Updated WordPress robots.txt file.

Google Says

Make use of the robots.txt file on your web server. This file tells crawlers which directories can or cannot be crawled. Make sure it's current for your site so that you don't accidentally block the Googlebot crawler.

header.php meta seo trick

Place this in your WordPress theme's header.php file. If the page is a single post, a page, or the home page, robots will index it and follow the links on it; otherwise search engines will not index the page, but they will still follow the links.

<?php if(is_single() || is_page() || is_home()) { ?>
    <!-- single posts, pages and the home page: index and follow -->
    <meta name="googlebot" content="index,noarchive,follow,noodp" />
    <meta name="robots" content="all,index,follow" />
    <meta name="msnbot" content="all,index,follow" />
<?php } else { ?>
    <!-- everything else (archives, tags, search): follow links but do not index -->
    <meta name="googlebot" content="noindex,noarchive,follow,noodp" />
    <meta name="robots" content="noindex,follow" />
    <meta name="msnbot" content="noindex,follow" />
<?php } ?>
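
If you would rather not touch header.php, the same conditional could in principle be hooked into wp_head from the theme's functions.php. This is only a trimmed-down sketch, not the method from the article: the function name askapache_robots_meta is a placeholder of my own, and it emits a single robots meta tag instead of the three above.

<?php
// Hypothetical functions.php variant of the header.php trick above.
function askapache_robots_meta() {
    if ( is_single() || is_page() || is_home() ) {
        // single posts, pages and the home page: index and follow
        echo '<meta name="robots" content="index,follow" />' . "\n";
    } else {
        // archives, tags, search results, etc.: follow links but do not index
        echo '<meta name="robots" content="noindex,follow" />' . "\n";
    }
}
add_action( 'wp_head', 'askapache_robots_meta' );
?>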

 

seo robots.txt

See the Updated WordPress robots.txt file

User-agent:  *
# disallow all files in these directories
Disallow: /cgi-bin/
Disallow: /z/j/
Disallow: /z/c/
Disallow: /stats/
Disallow: /dh_
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /contact/
Disallow: /tag/
Disallow: /wp-content/b
Disallow: /wp-content/p
Disallow: /wp-content/themes/askapache/4
Disallow: /wp-content/themes/askapache/c
Disallow: /wp-content/themes/askapache/d
Disallow: /wp-content/themes/askapache/f
Disallow: /wp-content/themes/askapache/h
Disallow: /wp-content/themes/askapache/in
Disallow: /wp-content/themes/askapache/p
Disallow: /wp-content/themes/askapache/s
Disallow: /trackback/
Disallow: /*?*
Disallow: */trackback/
 
User-agent: Googlebot
# disallow all files ending with these extensions
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.cgi$
Disallow: /*.wmv$
Disallow: /*.png$
Disallow: /*.gif$
Disallow: /*.jpg$
Disallow: /*.xhtml$
Disallow: /*.php*
Disallow: */trackback*
Disallow: /*?*
Disallow: /z/
Disallow: /wp-*
Allow: /wp-content/uploads/
 
# allow google image bot to search all images
User-agent: Googlebot-Image
Allow: /*
 
# allow adsense bot on entire site
User-agent: Mediapartners-Google*
Disallow: /*?*
Allow: /z/
Allow: /about/
Allow: /contact/
Allow: /wp-content/
Allow: /tag/
Allow: /manual/*
Allow: /docs/*
Allow: /*.php$
Allow: /*.js$
Allow: /*.inc$
Allow: /*.css$
Allow: /*.gz$
Allow: /*.cgi$
Allow: /*.wmv$
Allow: /*.xhtml$
Allow: /*.php*
Allow: /*.gif$
Allow: /*.jpg$
Allow: /*.png$
 
# disallow archiving site
User-agent: ia_archiver
Disallow: /
 
# disable duggmirror
User-agent: duggmirror
Disallow: /
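
One pair in the Googlebot section deserves a closer look: Disallow: /wp-* together with Allow: /wp-content/uploads/. My understanding of Google's rule precedence is that the most specific (longest) matching pattern wins, so the Allow should take over for uploaded files; a minimal sketch of just that pair (other crawlers may not support Allow at all):

User-agent: Googlebot
# /wp-* blocks wp-admin, wp-includes, wp-login.php and the rest
Disallow: /wp-*
# the longer, more specific Allow rule should win for uploads, keeping
# images and attachments crawlable
Allow: /wp-content/uploads/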

The Breakdown

disallow files in these directories

User-agent:  *
Disallow: /cgi-bin/
Disallow: /z/j/
Disallow: /z/c/
Disallow: /stats/
Disallow: /dh_
Disallow: /about/
Disallow: /contact/
Disallow: /tag/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /contact
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
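
Notice that the list includes both /contact/ and /contact (with and without the trailing slash). Robots.txt rules are prefix matches, so the slash-less form is broader; a small illustration (the extra paths are hypothetical examples):

# /contact/ blocks only the directory and anything under it
Disallow: /contact/
# /contact also blocks /contact.html, /contact-form/, /contact-us/, ...
Disallow: /contact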

disallow all files ending with these extensions

User-agent: Googlebot
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.wmv$
Disallow: /*.cgi$
Disallow: /*.xhtml$

disallow all files with ? in url

Disallow: /*?*
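
For crawlers that honor wildcards, that one rule covers any URL with a query string. A few hypothetical WordPress examples of what it would block:

# all of these contain "?" and so match /*?* :
#   /?p=123                          default (non-pretty) permalinks
#   /?s=robots                       search result pages
#   /2007/05/some-post/?replytocom=5
Disallow: /*?*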

disable duggmirror

User-agent: duggmirror
Disallow: /

disallow WayBack archiving site

User-agent: ia_archiver
Disallow: /

allow google image bot to search all images

User-agent: Googlebot-Image
Disallow:
Allow: /*

allow adsense bot on entire site

User-agent: Mediapartners-Google*
Disallow:
Allow: /*

 

Google User-agents

Googlebot: crawls pages for our web index and our news index
Googlebot-Mobile: crawls pages for our mobile index
Googlebot-Image: crawls pages for our image index
Mediapartners-Google: crawls pages to determine AdSense content. We only use this bot to crawl your site if you show AdSense ads on your site.
Adsbot-Google: crawls pages to measure AdWords landing page quality. We only use this bot if you use Google AdWords to advertise your site. Find out more about this bot and how to block it from portions of your site (a small sketch follows below).
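
A minimal sketch of blocking AdsBot-Google from part of a site, assuming it still has to be addressed by name because it ignores generic User-agent: * rules; the /private/ path is just a placeholder:

# AdsBot-Google ignores "User-agent: *" groups, so it must be named
# explicitly; here it is kept out of a (hypothetical) /private/ section only
User-agent: AdsBot-Google
Disallow: /private/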

Google Sponsored Robots.txt Articles

  1. Controlling how search engines access and index your website
  2. The Robots Exclusion Protocol
  3. robots.txt analysis tool
  4. Googlebot
  5. Inside Google Sitemaps: Using a robots.txt file
  6. All About Googlebot


May 10th, 2007

Comments Welcome

  • tech

    Great archive for WordPress users!

  • Pablo Rosales

    Great!! This is exactly what I needed!

    I'm still not sure what exactly to remove from your seo robots.txt, but I will give it a try :)

  • derek

    If I put my robots.txt in my root but have my blog installed in a subfolder, how can I exclude things correctly?
    I mean, my cgi-bin folder would work, but my blog is installed in a folder called blog, which means /blog/wp-admin for instance.
    How do I disallow then?

  • derek

    Also could I do something like this?

    User-agent: *
    Disallow: */wp-content/
    Disallow: */wp-admin/
    Disallow: */wp-includes/
    Disallow: */wp-
    Disallow: */feed/
    Disallow: /trackback/
    Disallow: /cgi-bin/

  • eBlogger

    I was looking for a detailed Search Engine Optimization article on the robots.txt
    This is really detailed and useful.

    Thanks

    :-)

  • max

    Can you also include the noarchive command for the robots and msnbot lines, as well as the googlebot line? If not, why?

  • Rene Dwight

    This is an amazing post, but I have a couple of questions.

    How can adding the above robots.txt file boost rank and add traffic?

    Also, please forgive me if I am being dumb... I checked the robots.txt file for your site and it is not like the examples above? I don't quite understand.

    Kindest regards Rene

  • John Doro

    I'd like to try it, but how safe is this? Can we undo this and regain our PR supposing this customization doesn't work?

  • Tyrone Campbell

    That's a really good system and plugin. Since I run a WordPress MU site this will really help out people using the site. One other thing I've been looking for is a sitemap generator that works for WordPress MU, know of any? (email me the answer please :D )

    thanks again!

  • Naina Susen

    This is what I was searching for, thanks a lot, it helped me a lot.

    -Naina

  • Higgins

    That's exactly the explanation I was looking for! Thanx

  • Jvistas

    This is what I was looking for.
    Thanks

  • rozkan

    Thanks genius! I used these commands for robots.txt and a week later I realized my new pages weren't being indexed by Googlebot! By the way, if this is the "ultimate super best robots.txt", why don't YOU use it, huh? I see your robots.txt file is pretty simple! So you are lying!

  • AskApache

    @rozkan

    Dude, this article is 2 years old.. this is a blog, not an encyclopedia. Regardless of that, if you follow the instructions in this article you should be in really good shape.

    If you cut-and-paste my old robots.txt, didn't even bother to check my actual one, and skipped reading the other newer articles here on my blog or anywhere else.. well geez man I'm sorry you had a problem but I can't work miracles. I bet you did actually learn something when it all went wrong though, some people like to learn that way, I do it too sometimes. THINK before you copy.. I'm not your robot.

  • ira

    Hello AskApache, I am an avid reader of your blog. I made a robots.txt file based on the information you gave; is it OK, are there any mistakes? Thanks.

    User-agent:  *
    # disallow all files in these directories
    Disallow: /cgi-bin/
    Disallow: /stats/
    Disallow: /dh_
    Disallow: /wp-admin/
    Disallow: /wp-includes/
    Disallow: /tag/
    Disallow: /hakkinda/
    Disallow: /iletisimvereklam/
    Disallow: /wp-content/upgrade
    Disallow: /wp-content/plugins
    Disallow: /wp-content/languages
    Disallow: /wp-content/themes/default
    Disallow: /wp-content/themes/guzel-pro
    Disallow: /wp-content/themes/wp-max
    Disallow: /wp-content/themes/classic
    Disallow: /trackback/
    Disallow: */trackback/
    Disallow: /.smileys
     
    User-agent: Googlebot
    # disallow all files ending with these extensions
    Disallow: /*.php$
    Disallow: /*.js$
    Disallow: /*.inc$
    Disallow: /*.css$
    Disallow: /*.gz$
    Disallow: /*.cgi$
    Disallow: /*.wmv$
    Disallow: /*.png$
    Disallow: /*.gif$
    Disallow: /*.jpg$
    Disallow: /*.cgi$
    Disallow: /*.xhtml$
    Disallow: /*.php*
    Disallow: */trackback*
    Disallow: /z/
    Disallow: /wp-*
    Allow: /wp-content/uploads/
     
    # allow google image bot to search all images
    User-agent: Googlebot-Image
    Allow: /*
     
    # allow adsense bot on entire site
    User-agent: Mediapartners-Google*
    Disallow: /*?*
    Allow: /z/
    Disallow: /hakkinda/
    Disallow: /iletisimvereklam/
    Allow: /wp-content/
    Allow: /tag/
    Allow: /manual/*
    Allow: /docs/*
    Allow: /*.php$
    Allow: /*.js$
    Allow: /*.inc$
    Allow: /*.css$
    Allow: /*.gz$
    Allow: /*.cgi$
    Allow: /*.wmv$
    Allow: /*.cgi$
    Allow: /*.xhtml$
    Allow: /*.php*
    Allow: /*.gif$
    Allow: /*.jpg$
    Allow: /*.png$
     
    # disallow archiving site
    User-agent: ia_archiver
    Disallow: /
     
    # disable duggmirror
    User-agent: duggmirror
    Disallow: /
  • AskApache

    @ira

    I would recommend something much leaner, so you don't block the wrong bot. Remember that robots.txt is taken quite literally/seriously by search engines; it's better to use meta tags for fine-grained access control. Nice work though!

    User-agent:  *
    # disallow all files in these directories
    Disallow: /cgi-bin/
    Disallow: /stats/
    Disallow: /dh_
    Disallow: /wp-admin/
    Disallow: /wp-includes/
    Disallow: /tag/
    Disallow: /hakkinda/
    Disallow: /iletisimvereklam/
    Disallow: /wp-content/upgrade
    Disallow: /wp-content/plugins
    Disallow: /wp-content/languages
    Disallow: /wp-content/themes/default
    Disallow: /wp-content/themes/guzel-pro
    Disallow: /wp-content/themes/wp-max
    Disallow: /wp-content/themes/classic
    Disallow: /.smileys
     
    User-agent: Googlebot
    # disallow all files ending with these extensions
    Disallow: /z/
    Disallow: /wp-*
    Allow: /wp-content/uploads/
     
    # allow google image bot to search all images
    User-agent: Googlebot-Image
    Allow: /*
     
    # allow adsense bot on entire site
    User-agent: Mediapartners-Google*
    Allow: /*
     
    # disallow archiving site
    User-agent: ia_archiver
    Disallow: /
     
    # disable duggmirror
    User-agent: duggmirror
    Disallow: /
  • Vikram

    Thanks, that's what I was looking for: an SEO-friendly robots.txt.

  • Jonathan

    I have been doing some reading on SEO. I just read this blog. I am using a robots plugin. Is the robots plugin a complement or a replacement for your script? (I am a newbie. Please take it easy.)

  • marco

    I know that this article is outdated, but just one question.
    As far as I know, wildcards are not supported in robots.txt (it should be based only on substring matches: http://www.robotstxt.org/faq/robotstxt.html); only "User-agent: *" is accepted.

    So are you sure your robots.txt (even the new one) works as you expect? Are wildcards supported, then?

  • francis

    Great! Thanks for sharing!

  • MakeMoney

    We'll try this robots.txt for our blog. Hope it gets more value for SEO.

  • rrj

    Your article is very useful and up to date. Thanks.

  • andy

    Why wouldn't you want Google to index the rest of the site as well as the home page?
    (from the header.php meta seo trick)

  • Sharl

    thanks a lot for this useful article & configure-files provided

  • tetembak

    Oh, it's very great... there's no other article as good as this. I will learn about it slowly. Thanks...

  • it’s my story

    Thanks very much.. I am very confused about my duplicate content...

  • evilripper

    thanks!
    I had some problems with comment-page-1 title duplicates, and maybe I solved them with these lines:

    Disallow: /comment-page-*/
    Disallow: /blog/comment-page/
  • HDxxx

    Thanks, This is exactly what I needed.

  • jimmy choo

    thanks a lot for this useful article

  • jimmy

    very nice information...thanks

  • Connected

    Where's the latest version? I need it for my new website.

  • Ganeshbabu

    Wonderful tutorial. Now I am clear about the robots concept, but I still have some doubts. Can you explain some more details about WordPress sitemaps and robots? Is any plugin available to generate a robots.txt file? If you find a solution, please mail me. Thanks.

  • Chinekop

    Hello, thanks for this great explanation.

    On Google's webmaster help area, they mentioned that if you want to disallow any URL which has aword in it, you should add the line: Disallow: /*aword, but here you said it should be like Disallow: /*aword*. So my question is: what is the difference between /*aword and /*aword*? Thanks!

  • AskApache

    @Chinekop ~

    It's just that one pattern is written more inclusively than the other. While Google's robot may correctly interpret either, some robots from other search engines may need the other form. Always go with Google!
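
    For what it's worth, here is a small side-by-side of the two patterns. My reading of Google's wildcard handling is that, for wildcard-aware crawlers, both end up matching the same URLs, since a rule already matches anything that follows the pattern:

    # both lines block any URL containing "aword"; the trailing * is
    # effectively redundant for crawlers like Googlebot
    Disallow: /*aword
    Disallow: /*aword*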
