Finally, some robots.txt questions have been answered. The most interesting thing is how simple robots.txt files really are, and how incredibly useful they can be for directing your PageRank wherever you want it. The robots.txt secret is to use the robots.txt file as your first, not-too-restrictive layer of control. Then use the XHTML robots meta tag values NoIndex and NoFollow in the <head> section of your HTML. Finally, mark up individual links in your source code with rel="nofollow" to control PageRank flow.
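The three layers above can be sketched as a toy model of how a well-behaved crawler might apply them in order. This is a minimal Python sketch, not how Googlebot actually works. The robots.txt rules, the page markup, the paths, the user agent name `ExampleBot`, and the helpers `RobotsDirectiveParser` and `crawl_decision` are all hypothetical. Only `urllib.robotparser` and `html.parser` are real standard-library modules.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a light first layer that only blocks /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical page using the second (meta) and third (rel) layers.
PAGE_HTML = """\
<html><head>
<meta name="robots" content="noindex, follow" />
</head><body>
<a href="/sitemap-1.html">Sub sitemap 1</a>
<a href="/login.html" rel="nofollow">Log in</a>
</body></html>
"""


class RobotsDirectiveParser(HTMLParser):
    """Collects robots meta directives and links, noting rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.directives = set()  # e.g. {"noindex", "follow"}
        self.links = []  # (href, has_rel_nofollow) pairs

    def handle_starttag(self, tag, attrs):
        # HTMLParser also routes self-closing tags like <meta ... /> here.
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )
        elif tag == "a" and attrs.get("href"):
            rel_values = (attrs.get("rel") or "").lower().split()
            self.links.append((attrs["href"], "nofollow" in rel_values))


def crawl_decision(path, html, robots_lines, agent="ExampleBot"):
    """Toy model of a polite crawler applying the three layers in order."""
    rules = RobotFileParser()
    rules.parse(robots_lines)
    # Layer 1: robots.txt decides whether the page is fetched at all.
    # A blocked page is never read, so its meta tags are unknown (None).
    if not rules.can_fetch(agent, path):
        return {"crawl": False, "index": None, "follow": []}
    page = RobotsDirectiveParser()
    page.feed(html)
    # Layer 2: page-level meta directives.
    index = "noindex" not in page.directives
    follow_page = "nofollow" not in page.directives
    # Layer 3: per-link rel="nofollow" (only matters if the page is followed).
    follow = [href for href, nofollow in page.links if follow_page and not nofollow]
    return {"crawl": True, "index": index, "follow": follow}


if __name__ == "__main__":
    robots = ROBOTS_TXT.splitlines()
    print(crawl_decision("/private/notes.html", PAGE_HTML, robots))
    print(crawl_decision("/sitemap.html", PAGE_HTML, robots))
```

In this sketch the /private/ path is never fetched, while /sitemap.html is crawled but kept out of the index, and only the link without rel="nofollow" is followed.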
These hounding robots.txt questions were answered by none other than Matt Cutts in this upbeat and not-too-technical interview by Eric Enge of STC. The interview transcript is fairly long and touches on multiple issues that are, quite frankly, out of my league, so I just grabbed the parts of the interview that answered some robots.txt questions.
Matt Cutts: NoIndex means we won't even show it in our search results. So, with robots.txt for good reasons we've shown the reference even if we can't crawl it, whereas if we crawl a page and find a Meta tag that says NoIndex, we won't even return that page. For better or for worse that's the decision that we've made. I believe Yahoo and Microsoft might handle NoIndex slightly differently, which is a little unfortunate, but everybody gets to choose how they want to handle different tags.

Eric Enge: Can a NoIndex page accumulate PageRank?

Matt Cutts: A NoIndex page can accumulate PageRank, because the links are still followed outwards from a NoIndex page.

Eric Enge: So, it can accumulate and pass PageRank.

Matt Cutts: Right, and it will still accumulate PageRank, but it won't be showing in our Index. So, I wouldn't make a NoIndex page that itself is a dead end. You can make a NoIndex page that has links to lots of other pages. For example you might want to have a master Sitemap page and for whatever reason NoIndex that, but then have links to all your sub Sitemaps.

...

Eric Enge: Another example is if you have pages on a site with content that from a user point of view you recognize that it's valuable to have the page, but you feel that it is too duplicative of content on another page on the site. That page might still get links, but you don't want it in the Index and you want the crawler to follow the paths into the rest of the site.

Matt Cutts: That's right. Another good example is, maybe you have a login page, and everybody ends up linking to that login page. That provides very little content value, so you could NoIndex that page, but then the outgoing links would still have PageRank.

Now, if you want to you can also add a NoFollow metatag, and that will say don't show this page at all in Google's Index, and don't follow any outgoing links, and no PageRank flows from that page. We really think of these things as trying to provide as many opportunities as possible to sculpt where you want your PageRank to flow, or where you want Googlebot to spend more time and attention.
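To make the "NoIndex is not a dead end" point concrete, here is a toy Python sketch using the textbook power-iteration form of PageRank. It is not Google's actual computation. The four-page site, the page names, the damping factor of 0.85, and the choice to spread rank from pages with no followed links evenly across all pages are assumptions for illustration. NoIndex is not modeled at all, because per the interview it only hides a page from results while its links still count; a NoFollow meta tag is modeled by dropping all of that page's outgoing links.

```python
# Hypothetical four-page site: home links to a NoIndex master sitemap,
# which links to two sub sitemaps that each link back home.
LINKS = {
    "home": ["sitemap"],
    "sitemap": ["sub1", "sub2"],
    "sub1": ["home"],
    "sub2": ["home"],
}


def pagerank(links, damping=0.85, iterations=100):
    """Textbook power-iteration PageRank over a dict of followed links."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Rank sitting on pages with no followed outlinks is spread evenly
        # (one common textbook convention, assumed here).
        dangling = sum(rank[page] for page in pages if not links[page])
        new_rank = {page: (1.0 - damping + damping * dangling) / n for page in pages}
        for page, outlinks in links.items():
            for target in outlinks:
                new_rank[target] += damping * rank[page] / len(outlinks)
        rank = new_rank
    return rank


def with_meta_nofollow(links, page):
    """Model a page-level NoFollow meta tag: drop all of that page's outlinks."""
    return {p: ([] if p == page else list(out)) for p, out in links.items()}


if __name__ == "__main__":
    noindex_only = pagerank(LINKS)
    noindex_nofollow = pagerank(with_meta_nofollow(LINKS, "sitemap"))
    for page in LINKS:
        print(f"{page:8} NoIndex={noindex_only[page]:.3f} "
              f"NoIndex+NoFollow={noindex_nofollow[page]:.3f}")
```

With NoIndex alone, the sitemap still accumulates rank and passes it on to sub1 and sub2. Adding the NoFollow meta tag leaves sub1 and sub2 with noticeably less rank, since the sitemap's share no longer reaches them through its links. How the dead-end page's rank gets redistributed is a property of the chosen convention, not a claim about Google.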
December 14th, 2007