Updated robots.txt for WordPress


Implementing an effective robots.txt file for WordPress will help your blog rank higher in search engines, receive higher-paying relevant ads, and increase your blog traffic. Using a robots.txt file gives you a search engine robot's point of view of your site... Sweet! Looking for the most up-to-date robots.txt? Just look at mine, I don't slack.

Warning about robots.txt files

Your robots.txt file should never have more than 200 Disallow lines. Start with as few as possible and add more when needed.

Once Google removes links referenced in your robots.txt file, if you later want those links added back it can take up to 3 months before Google re-indexes the previously disallowed links.

Google pays serious attention to robots.txt files and treats them as an authoritative list of what to Disallow. If you Disallow a link in robots.txt, Google will completely remove the disallowed links from the index, which means you will not be able to find them when searching Google.

The big idea to take away is to use robots.txt only for hard disallows: links you know you don't want indexed. Not only will those links not be indexed, they won't be followed by search engines either, meaning the links and content on the disallowed pages will not be used by search engines for indexing or for ranking.

So, use the robots.txt file only for disallowing links that you want totally removed from Google. Use the robots meta tag to specify the allows, and use the rel='nofollow' attribute on the a element when the block is temporary or when you still want the link indexed but not followed.
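A robots.txt Disallow handles the hard removals; the two softer controls mentioned above look like this in the page markup (a sketch only; the paths are illustrative):

```html
<!-- Page-level: keep this page out of the index, but let robots follow its links -->
<meta name="robots" content="noindex,follow" />

<!-- Link-level: the target stays indexable, this particular link just isn't followed -->
<a href="/some-temporary-page/" rel="nofollow">temporary link</a>
```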

WordPress robots.txt SEO

Here are some robots.txt rules used with WordPress on this blog. For instance, I am disallowing /comment-page- links altogether in the robots.txt file below because I don't use separate comment pages, so I instruct Google to remove these links from the index. Adding a 301 Redirect using mod_rewrite or RedirectMatch can further protect against this duplicate content issue.
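For example, a RedirectMatch rule along these lines (a sketch; the domain is a placeholder) collapses any paginated comment URL back onto its parent post with a 301:

```apache
# .htaccess sketch: send /my-post/comment-page-2/ back to /my-post/
RedirectMatch 301 ^/(.+)/comment-page-[0-9]+/?$ http://example.com/$1/
```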

User-agent: *
Allow: /
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content
Disallow: /e/
Disallow: /show-error-*
Disallow: /xmlrpc.php
Disallow: /trackback/
Disallow: /comment-page-
Allow: /wp-content/uploads/

User-agent: Mediapartners-Google
Allow: /

User-agent: Adsbot-Google
Allow: /

User-agent: Googlebot-Image
Allow: /

User-agent: Googlebot-Mobile
Allow: /

# getting sick with the sitemaps
Sitemap: /sitemap.xml
Sitemap: /sitemap_index.xml
Sitemap: /page-sitemap.xml 
Sitemap: /post-sitemap.xml 
Sitemap: /sitemap-news.xml 
Sitemap: /sitemap-posttype-page.xml 
Sitemap: /sitemap-posttype-post.xml 
Sitemap: /sitemap-home.xml 

#               __                          __
#   ____ ______/ /______ _____  ____ ______/ /_  ___
#  / __ `/ ___/ //_/ __ `/ __ \/ __ `/ ___/ __ \/ _ \
# / /_/ (__  ) ,< / /_/ / /_/ / /_/ / /__/ / / /  __/
# \__,_/____/_/|_|\__,_/ .___/\__,_/\___/_/ /_/\___/
#                     /_/
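You can sanity-check a rule set like the one above with Python's built-in robots.txt parser (a quick sketch; the rules below mirror a subset of the file, and the test paths are made up):

```python
# Quick sanity check of robots.txt rules using Python's standard library.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Allow: /wp-content/uploads/
Disallow: /wp-admin
Disallow: /wp-content
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "/wp-admin/options.php"))      # False (blocked)
print(rp.can_fetch("*", "/wp-content/uploads/a.png"))  # True (explicitly allowed)
print(rp.can_fetch("*", "/2008/03/some-post/"))        # True (no rule matches)
```

One caveat: Python's parser applies rules in file order (first match wins), while Googlebot uses the most specific match; listing the Allow line first keeps both interpretations in agreement.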

Generic Default robots.txt

For many super-geeky reasons, every single website you control should have a robots.txt file in its root directory; I also recommend having a favicon.ico file, bare minimum. This ensures your site is viewed as at least somewhat SEO-aware, alerts Google that there are rules for crawling the site, and also saves server resources.

User-agent: *
Disallow:

Google Recommendations

Use robots.txt - Webmaster Guidelines

Make use of the robots.txt file on your web server. This file tells crawlers which directories can or cannot be crawled. Make sure it's current for your site so that you don't accidentally block the Googlebot crawler.

Troubleshooting tips part IIb: Ad relevance and targeting continued. To follow up on our previous post about ad relevance and targeting, let's look at some other reasons why you may experience ad targeting issues on your site.

Have you blocked the AdSense crawler's access to your pages?

The AdSense crawler is an automated program that scans your web pages and tracks content for indexing. Sometimes we don't crawl pages because the AdSense crawler doesn't have access to your pages, in which case we're unable to determine their content and show relevant ads. Here is a specific instance in which our crawler can't access a site: if you use a robots.txt file which regulates crawler access to your page. In this case, you can grant the AdSense crawler access by adding these lines to the top of your robots.txt file:

User-agent: Mediapartners-Google*
Disallow:

Eliminate Duplicate Content

Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin. Examples of non-malicious duplicate content could include:

  • Discussion forums that can generate both regular and stripped-down pages targeted at mobile devices
  • Store items shown or linked via multiple distinct URLs
  • Printer-only versions of web pages

However, in some cases, content is deliberately duplicated across domains in an attempt to manipulate search engine rankings or win more traffic. Deceptive practices like this can result in a poor user experience, when a visitor sees substantially the same content repeated within a set of search results.

Google tries hard to index and show pages with distinct information. This filtering means, for instance, that if your site has a "regular" and "printer" version of each article, and neither of these is blocked in robots.txt or with a noindex meta tag, we'll choose one of them to list. In the rare cases in which Google perceives that duplicate content may be shown with intent to manipulate our rankings and deceive our users, we'll also make appropriate adjustments in the indexing and ranking of the sites involved. As a result, the ranking of the site may suffer, or the site might be removed entirely from the Google index, in which case it will no longer appear in search results.

Prevent page from being indexed

Pages you block in this way may still be added to the Google index if other sites link to them. As a result, the URL of the page and, potentially, other publicly available information can appear in Google search results. However, no content from your pages will be crawled, indexed, or displayed.

To entirely prevent a page from being added to the Google index even if other sites link to it, use a noindex meta tag, and ensure that the page does not appear in robots.txt. When Googlebot crawls the page, it will recognize the noindex meta tag and drop the URL from the index.
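A minimal sketch of that combination: the page carries the noindex tag and is deliberately left crawlable in robots.txt so that Googlebot can actually see the tag:

```html
<!-- In the <head> of the page you want dropped from the index.
     Do NOT also Disallow this URL in robots.txt, or the tag will never be seen. -->
<meta name="robots" content="noindex" />
```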

Prevent content being indexed or remove content from Google's index?

You can instruct us not to include content from your site in our index or to remove content from your site that is currently in our index in the following ways:

Google User-agents

  • AdsBot-Google: crawls pages to measure AdWords landing page quality
  • Googlebot: crawls pages for Google's web and news index
  • Googlebot-Image: crawls pages for the image index
  • Googlebot-Mobile: crawls pages for the mobile index
  • Mediapartners-Google: crawls pages to determine AdSense content

Robots Meta Tags and Examples

The robots meta tag is very helpful for page-level control and should be preferred over modifications to robots.txt.

Stop all robots from indexing a page on your site, but still follow the links on the page:

<meta name="robots" content="noindex,follow" />

Allow other robots to index the page on your site, preventing only Google's bots from indexing the page:

<meta name="googlebot" content="noindex" />

Allow robots to index the page on your site but not to follow outgoing links:

<meta name="robots" content="nofollow" />

header.php Trick for Conditional Robots Meta

Note: I recommend using the Yoast WordPress SEO plugin to do this now, but here's a quick and easy way to think about it. Add this to your header.php:

<?php if(is_single() || is_page() || is_category() || is_home()) { ?>
<meta name="robots" content="index,follow" />
<?php } ?>
<?php if(is_archive()) { ?>
<meta name="robots" content="noindex,follow" />
<?php } ?>
<?php if(is_search() || is_404()) { ?>
<meta name="robots" content="noindex,nofollow" />
<?php } ?>

Robots.txt footnote

Alexa, Compete, and Quantcast are all guilty of firewalling unknown friendly search-engine agents at the front gate. These sites that monitor the Internet should know best that unfriendly agents cloak as humans and will get in no matter what. So the general rule of thumb is that robots.txt directives are only for the good agents anyway.

Good Robots.txt Articles

  1. How Google Crawls My Site
  2. Controlling how search engines access and index your website
  3. Controlling Access with robots.txt
  4. Removing duplicate search engine content using robots.txt - Mark Wilson
  5. Revisiting robots.txt - Twenty Steps

Robots.txt References

  1. Robots.txt optimization
  2. The Web Robots Pages
  3. Notes on helping search engines index your Web site
  4. Wikipedia robots.txt page
  5. Inside Google Sitemaps: Using a robots.txt file


March 15th, 2008

Comments Welcome

  • Cherife

    I usually use these codes:

    User-agent: *
    Disallow: /cgi-bin
    Disallow: /wp-*

    I think I need some change.
    Thanks 4 your post:)

  • sam

    Is it good to have the tag cloud crawled, since it is a way of humanly categorizing content?

    I hear this is good, but should I then block my original wordpress categories?

  • Sahi

    If I have a 10-page website, do I need to add robots index,follow on each page?

    Or do I need to add it only on the index or default page, so that robots can follow all links from there?

    Also, what happens if a robot lands on an inner page first? Does this line help redirect the robot to follow links from the index page?

  • MrGroove

    Nice writeup. Thnx

  • cosasdeviajes

    Hello, my robots.txt is the following

    User-agent: *
    Disallow: /wp-
    Disallow: /search
    Disallow: /feed
    Disallow: /comments/feed
    Disallow: /feed/$
    Disallow: /*/feed/$
    Disallow: /*/feed/rss/$
    Disallow: /*/trackback/$
    Disallow: /*/*/feed/$
    Disallow: /*/*/feed/rss/$
    Disallow: /*/*/trackback/$
    Disallow: /*/*/*/feed/$
    Disallow: /*/*/*/feed/rss/$
    Disallow: /*/*/*/trackback/$
    Disallow: /?s=
    Disallow: /dogs
    Disallow: /archives
    Disallow: /page
    Disallow: /author
    Disallow: /2007
    Disallow: /category
    Disallow: /2008
    Disallow: /2009
    Disallow: /?livehit=

    My issue is that my posts only rank while they are on the homepage. Is something wrong with my robots.txt??? I'm using the same robots.txt on two other blogs and they rank really well.

  • cosasdeviajes

    Why are you not disallowing /2007, /author and /page ???

  • stacey

    I fixed it. I put it in my root directory!

  • AskApache

    @ stacey


    @ cosasdeviajes

    Why are you not disallowing /2007, /author and /page ???

    I fixed it so that now it disallows /2007 and /author and I'm allowing /page in my robots.txt so that bots can still follow the links on /page* but they will not be indexed. This makes sure that they don't use up any link-juice on my site and also helps search engines find more interlinking between my main content, single pages.

    <meta name="robots" content="noindex" />
  • vuelomania – vuelos baratos

    You have to limit access to the folders where you want to limit crawling; however, be careful not to slip into blackhat territory. Some people manipulate the CSS and block the crawler's access so it doesn't notice how the H1...Hx tags have been adapted...

    I propose one like:

  • David

    Thanks for the post, it's really useful!

    But could you please tell more about following strings:

    Disallow: (without any symbol after colon)
    Allow: /*
    Disallow: /

    What do they mean? Is there any difference between "Allow: /*" and "Disallow:"?

    And how should I disallow indexing of a particular directory: "Disallow: /wp-admin" or "Disallow: /wp-admin/"? (Should I use a slash at the end or not?)

  • Mehmet

    Hi Matt,

    I am not handy with robots.txt files, but would I block whole pages from search engines if I use both your sample robots.txt file and the PHP code that you provided for WordPress?

  • AskApache

    @ mehmet

    I'm not sure what you are asking, but all the information you need is on this page.

  • Will

    I was wondering why the robots.txt file in this example is different to the one at

    nice blog btw.

  • Olivier

    May I please ask you why you put this line :
    Disallow: */comments

    I guess it is to prevent specific comment URLs from being indexed, but the format of the comment URLs isn't like this, is it?

  • AskApache

    @ Olivier

    Well its because otherwise a search for askapache on google might list urls like which is what it means to have a URL indexed in a search engine.

    I have about 230 posts on this blog, all high-quality, and coincidentally I have about 240 urls indexed by Google and major search engines. So it really makes my good pages the center of attention


  • webdiggr

    Your Robots.txt will block practically the whole site. Robots.txt does not take variables into consideration, so when you do a:

    Disallow: /*?*
    Disallow: /*?

    Its the same as :

    Disallow: /

    Which blocks everything. I suggest you visit it will explain how wildcards are not supported.

  • Emil

    Well, I think that the above is well done; duplicated content can indeed hurt any website. Good, high-quality content is what Google is looking for, not the other way around. If you are unsure what Google is trying to establish, simply take a closer look at their #1 PageRank and #2 SERPs and you'll get an immediate answer as to whether this is good or not.

    SEO Agent

  • Renegado

    I think comments don't need to be indexed. In any case, the comments are part of each post.

    The directive

    Disallow: /wp-

    is too broad in scope, since the /uploads folder sits inside that folder, so you have to be sure of what you are doing the moment you use it.

    Finally, we will be blocking Google Blog Search if we add the directive

    Disallow: /feed/$

    WARNING...!

  • RocyHua

    Thanks a lot, This robots.txt tutorial is Useful!

  • engfer

    Have you seen a problem with Google showing your robots.txt in its search results?

  • AskApache

    @ engfer

    Never heard of that, let me search google and see if mine turns up..

    Yep! Check the results of my search Disallow: User-agent: *.

    You can Disallow this by adding this to your robots.txt if you want it removed, I am personally going to keep mine in the index as people use my robots.txt as an example.

    Disallow: /robots.txt
  • JIm Chenoweth

    Ouch, I wish I had seen webdiggr's post. Thanks for getting 99 percent of my site's content 'restricted by robots.txt'

    Disallow: /*?*
    Disallow: /*?

    is the same as "get lost robots"

    maybe you should do us all a favor and put a disallow in your robots.txt file so this kind of disinformation gets weeded out and sifts to the bottom of the sludge pile where it belongs

  • Consultor

    Fantastic article. This article has been of great help to me.

    good luck in your project

  • pututik

    Thanks for the good points. My first robots.txt was SEO friendly, but after an accident the file was lost. This will help me optimize my WP.

  • Prox

    Thank you for this post mate. I got my site indexed! :)

  • GearModa

    Great tip, actually clarified some questions I had about the robots txt prior. Thanks

  • Peter

    Can I mention the robots.txt WordPress plugin? The default content is not the same as yours, but it's certainly a handy way of creating and managing a robots.txt file for WordPress. Official page is at


  • Mr. T

    Excellent post. I shall be using your robots.txt on my site and hopefully I'll see results good enough to write a short post on this and link to your site. Thanks a lot.

  • Jon @ IBM Engineering

    thanks :) It seems I was really reading too much into robots.txt files; they are really just a simple old-school method to block search engine crawlers in a very simple way. I was all confused with WordPress robots.txt files but they really are that simple.

    Thanks and have a great day!

    Jon @ IBM Core

  • Robertino

    I'm setting up a new blog and this time wanted to use a proper robots.txt file. So I started researching.

    Over at, where this page is linked from ( just above the grey box : Search Engine Optimization for WordPress ) I see that wildcards are used in the robots.txt file sample.

    Then I clicked the link and landed here, and this robots.txt file is not using wildcards at all.

    This, and the examples over at, are the first and only robots.txt file for WordPress which I have seen that are not using the asterisk * sign as a wildcard.

    This is I *think* a good thing, because I read over at : the following :

    Note the '*' is a special token, meaning "any other User-agent"; you cannot use wildcard patterns or regular expressions in either User-agent or Disallow lines.

    Two common errors:

    1. Wildcards are _not_ supported: instead of 'Disallow: /tmp/*' just say 'Disallow: /tmp/'.
    2. You shouldn't put more than one path on a Disallow line (this may change in a future version of the spec)

    But still I see countless other sites using wildcards in paths.
    Was this changed so that wildcards are supported, and maybe the site just wasn't updated?

    One other thing, I see the lines 'Disallow: /i/' etc in your robots.txt file.
    Do these have to do with the fact that typing the first letter of a page take you to that page ?

    For example : and goes to the contact page.

    When I enter for instance in my browser's address bar, I get a "No posts found" message with a "Nothing found for I" in the title.
    Should I still be adding those lines ?
    Or should I just go for the whole alphabet ? :)

    Apologies for the lengthy post.

  • Robertino

    Forgot to add the following from I'm going with the format of your robots.txt file for now. ( )

    Note also that globbing and regular expression are not supported in either the User-agent or Disallow lines. The '*' in the User-agent field is a special value meaning "any robot". Specifically, you cannot have lines like "User-agent: *bot*", "Disallow: /tmp/*" or "Disallow: *.gif".

  • Ujjwol

    says it cannot crawl my website due to this robots.txt ?
    How to fix this ?

  • Robertino

    If you really want your content to be duplicated, remove the following.

    User-agent: ia_archiver
    Disallow: /

    But it is a reasonably good way to prove that your site's content existed at a certain date and was not parked with an all-ads page, or that you had the content first (in case of plagiarism).

  • Ujjwol

    Thanks Guys

  • hpnugroho

    hi ..
    I have question about this file, can we use on free blog on or ?
    Thanks for your information

  • Dick Raney

    How old is this information? Why don't you show dates on the post or the comments?

  • Robertino

    My previous posts are from October 1st/2nd 2009.
    You shouldn't worry about it; this information is very static.
    It's not like they are changing the WordPress directory names, nor what's in them frequently.
    If it would change frequently, the theme designers, plugin coders, blog owners, and search engines wouldn't bother.
    What's most likely to change (frequently) are your own directory names and their content.
    If you don't update that part of the robots.txt file, then it will be old information.

    Off Topic 1:
    Tip: Be careful with robots.txt. It's a simple text file that anyone can view in their browser.
    Don't give away your whole site structure; use meaningless names for the directories that you wish to protect.

    Off Topic 2:
    Bloody Hell, I see that Mehmet of GabfireThemes posted in here. How could I have missed that before ?

  • Keith Davis

    Wow! Pretty comprehensive robots.txt.

    I'm familiar with not producing duplicate content on a static site but the Wordpress duplicate content is something that I'm trying to get to grips with.

    With archives, posts, categories etc. etc. it's not easy.

    Thanks for shedding a little light.

  • Robertino

    Off-topic @Dick Raney:
    Better late than never. Forgot to add that when you hover your mouse pointer over a comment, it will change into an arrow with a question mark beside it. A second or so later, the date and time the comment was posted will pop up.

  • BM

    thanks, your site has tons of good info, we just finished a post on setting up robots.txt in wordpress to mimic silos you can check it out at Ultimate Wordpress Robots.txt for Silo SEO

  • Social Media

    Excellent article, thank you. I am not sure what I'm doing with robots files but know that I need one so this is very useful.

    Thanks again!

  • gina


    I am not very good with the technical side, but I tried to follow your recommendations all the way down.

    Today, big surprise: I can no longer get into my blog.
    The message is as follows:

    Warning: require_once(admin.php) [function.require-once]: failed to open stream: No such file or directory in /public_html/wp-admin/index.php on line 10
    Fatal error: require_once() [function.require]: Failed opening required 'admin.php' (include_path='.:/usr/lib/php:/usr/local/lib/php') in /public_html/wp-admin/index.php on line 10

    Do you know what this means, and can you give me some good advice?
    Luckily I made a backup before, but I don't know how to restore it because I cannot get into my blog. It is also not possible to connect with FileZilla.

    Please let me know what to do!!

    Thank you


  • zedcloud

    Nice article! What's the recommended file permissions for robots.txt? Is 644 fine? For some reason my site is being detected as a directory and says specific files may have different restrictions. As a result site is not indexed in Google. Any help?

  • Alan

    Hi again,

    I commented a few days back but it didn't seem to take so here goes again... :-)

    I understand that WP has a virtual robots.txt file. The xml sitemap plugin has an option to add the Sitemap to the virtual robots file. I have chosen that option and the virtual robots.txt file is recognized by Google Webmaster Tools and does in fact include the Sitemap. The sitemap plugin states the following:

    The virtual robots.txt generated by WordPress is used. A real robots.txt file must NOT exist in the blog directory!

    So, what to do? Is the correct answer to not choose the option via the xml sitemap plugin to use the virtual robots.txt file and to create a new actual robots.txt file, where I can also include the Sitemap?

    Also, why include the Allow option in the robots.txt file for individual Google bots? Why not just Allow all (*) ? And why use any "Allow" command at all? Google will crawl the site providing that there is no Disallow command.

    I look forward to your reply. Thanks!

  • Spunky Jones

    I am using the following in my robots.txt file on one of my directories. I noticed that I have over 1,700 pages and only 400 pages indexed by Google. My question is, which statements in my robots.txt file are causing the pages not to get indexed? Furthermore, do I actually need any of this in the robots.txt file?

    #  PARTIAL access (Googlebot)
    User-agent: Googlebot
    Disallow: /*.php$
    Disallow: /*?
    Disallow: /*?*
    Disallow: /?s=A&p=1
    Disallow: /*/trackback/
  • sSysAdmin

    Amazing post. I'm a newbie in blogging and this article helped me a lot to understand robots.

  • Sarah

    Please update this information, since Google has updated its algorithm.

  • Seriously Spain

    I wouldn't recommend using this. I tried and it deactivated my website. Deleted it immediately. No problem. The file hasn't been updated in forever so, no, it's unfortunately not useful any longer.

    • AskApache

      Updated.. thanks for the heads-up. There are better robots.txt articles on the site, just do a search.

      • mAsT3RpEE

        We know, but this is the page Google spits out first. Wouldn't it be better to just update this page? We know you're busy and all, but WE the FANS demand it.

        • AskApache

          It's all still accurate, the robots.txt protocol doesn't really change more than once every 10 years..

          Which parts would you like to see updated? Happy to clean it up..

          • mAsT3RpEE

            1) Trailing slashes on directories? What's your opinion? i.e. "Disallow: /wp-content/" instead of "Disallow: /wp-content". Different robots (besides Google, which is not the only search engine) handle it differently. I've decided to always include it.

            2) "Disallow: /show-error-*" Shouldn't "Disallow: /show-error-" do it in a more cross-robot way? Stars are not supported by all bots. I couldn't find anything conclusive in official documentation.

            I won't bother you much. I just want a little help in those 2 areas.

          • AskApache

            Good questions, I'll be updating this article shortly.

  • mAsT3RpEE

    Ummmm... I'm no expert, but I ran into some issues using this code. I found it popular so I tried it out, but my site disappeared, first from Bing, then from Google.

    This is what I changed: /wp-content to /wp-content/, etc., so that only the wp-content directory is disallowed. If I want to have a page called wp-content-something, it will not be disallowed. I suggest adding a trailing slash to wp-content, wp-includes, wp-admin (basically all the directories). I also had issues with stars */comments/ */trackback/ etc. I found that in order to use them you had to specify each bot individually (ouch! lots of work).

    I had asked before but I will ask again: please look over my current draft. I want all errors fixed so I can compile better htaccess, robots and sitemap formats and submit them to the WordPress devs so they can make them permanent. I hate having to do all these things after an install.

    You can find it here: on page 2.

  • Rabia Khan

    How do I stop feed URLs from being indexed in Google? Every page of my site has a duplicate feed URL as well. For example, I have a post which has an image, so Google considers it as 2 pages and indexes 4 pages, the 2 extra being feed pages. So how do I stop feed URLs from being indexed??

  • pog

    Thanks for this article. I am a new blogger so this is very helpful. It’s hard to know how long it takes to make a successful blog, so “being patient” is among some of the best advice you can give. It’s easy to get discouraged when you don’t see movement, but this give me some encouragement.

  • AskApache

    Great article:

  • mAsT3RpEE

    Sorry man. robots.txt is specifically for an entire website. You would need to ask or to specifically edit their version. Fat chance.

  • mAsT3RpEE

    Update: you can now set your robots.txt file in the Blogspot settings, but not on
