Instruct Search Engines to come back to site after you finish working on it
February 28th, 2008
What do you think Googlebot and other Search Engines do when they try to reach your site while you are tinkering with it?
Hopefully you aren't doing anything that slows the response time for the page Google is trying to reach, because if Google gets a 404 Not Found or a 500 error, then your PageRank for that page could cease to exist!
What if you could conveniently tell Googlebot and other bots that you are working on the page but would like them to come back in, oh, say, an hour? I know what I did when I found out this was possible: I figured out how to do it, and now I'm sharing it with you.
Google Webmaster Central Blog
All About Googlebot
If my site is down for maintenance, how can I tell Googlebot to come back later rather than to index the "down for maintenance" page?
You should configure your server to return a status of 503 (network unavailable) rather than 200 (successful). That lets Googlebot know to try the pages again later.
How to use this SEO knowledge
We will use a bit of mod_rewrite code in a .htaccess file to send Google and other search engine bots the 503 Service Temporarily Unavailable header. We will also send a Retry-After: 3600 header, which instructs the bots to re-check our page in 3600 seconds (1 hour) to see if the page is available.
Retry-After Header
Article: Retry-After, 503 Service Unavailable
The Retry-After response-header field can be used with a 503 (Service Unavailable) response to indicate how long the service is expected to be unavailable to the requesting client. This field MAY also be used with any 3xx (Redirection) response to indicate the minimum time the user-agent is asked wait before issuing the redirected request. The value of this field can be either an HTTP-date or an integer number of seconds (in decimal) after the time of the response.
Retry-After = "Retry-After" ":" ( HTTP-date | delta-seconds )
Two examples of using Retry-After:
Retry-After: Fri, 31 Dec 1999 23:59:59 GMT
Retry-After: 120
In the latter example, the delay is 2 minutes.
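To make the two value forms concrete, here is a minimal Python sketch of how a client might turn a Retry-After value into a delay in seconds. The function name `retry_after_seconds` and its `now` parameter are made up for this illustration; real crawlers may well handle the header differently.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None):
    """Return the delay in seconds requested by a Retry-After value.

    Accepts either delta-seconds ("120") or an HTTP-date.
    `now` is a hypothetical hook so the date branch can be checked
    against a fixed reference time.
    """
    value = value.strip()
    if value.isdigit():
        # delta-seconds form
        return int(value)
    # Otherwise treat the value as an HTTP-date.
    when = parsedate_to_datetime(value)
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    # A date in the past means "retry now", so never return a negative delay.
    return max(0, int((when - now).total_seconds()))


print(retry_after_seconds("120"))  # 120 seconds, i.e. 2 minutes
```

An HTTP-date that has already passed simply yields 0 here, which is one reason the delta-seconds form is the easier one to use for short maintenance windows.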
Send 503 only to Google Bots
Options +FollowSymLinks
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^.*(Googlebot|Mediapartners|Adsbot|Feedfetcher)-?(Google|Image)? [NC]
# or RewriteCond %{HTTP_USER_AGENT} ^.*google.* [NC]
RewriteRule .* /cgi-bin/error/503.php [L]
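Before relying on a User-Agent pattern like this, it can help to try it against a few sample strings. The sketch below mirrors the RewriteCond pattern in Python, with `re.IGNORECASE` standing in for the [NC] flag. The sample User-Agent strings are only examples picked for this illustration, and Python's regex engine is not identical to the PCRE engine Apache uses, so treat it as a rough check.

```python
import re

# Mirrors the RewriteCond pattern above; [NC] maps to re.IGNORECASE.
BOT_PATTERN = re.compile(
    r"^.*(Googlebot|Mediapartners|Adsbot|Feedfetcher)-?(Google|Image)?",
    re.IGNORECASE,
)


def is_google_bot(user_agent):
    """Return True if the User-Agent would match the RewriteCond pattern."""
    return BOT_PATTERN.search(user_agent) is not None


# Example User-Agent strings (assumptions for this demo, not a complete list).
samples = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mediapartners-Google",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:120.0) Gecko/20100101 Firefox/120.0",
]
for ua in samples:
    print(is_google_bot(ua), ua)
```

Keep in mind that a User-Agent header is trivial to fake, so this only identifies clients that claim to be one of these bots.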
Send everyone except the developer a 503
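Where the address matched against REMOTE_ADDR below (1.1.1.1 in this example) is the developer's IP address. REMOTE_ADDR always holds the client's IP address, whereas REMOTE_HOST may hold a resolved hostname instead.
Options +FollowSymLinks
RewriteEngine On
RewriteBase /
RewriteCond %{REMOTE_ADDR} !^1\.1\.1\.1$
RewriteCond %{REQUEST_URI} !^/cgi-bin/error/503\.php [NC]
RewriteRule .* /cgi-bin/error/503.php [L]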
Send BOTS a 503; humans to error page.
Options +FollowSymLinks
RewriteEngine On
RewriteBase /
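# Bots: rewrite internally to the script that sends the 503 header
RewriteCond %{HTTP_USER_AGENT} ^.*(Googlebot|Mediapartners|Adsbot|Feedfetcher)-?(Google|Image)? [NC]
RewriteCond %{REQUEST_URI} !^/cgi-bin/error/503\.php [NC]
RewriteRule .* /cgi-bin/error/503.php [L]
# Everyone else except the developer: redirect to the explanation page
RewriteCond %{REMOTE_ADDR} !^1\.1\.1\.1$
RewriteCond %{REQUEST_URI} !^/cgi-bin/error/503\.php [NC]
RewriteCond %{REQUEST_URI} !^/under-development-explain\.html [NC]
RewriteRule .* /under-development-explain.html [R=302,L]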
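The intended outcome of this rule set can be summed up as a small decision function. The Python sketch below is only a model of the behavior described by the heading (bots get the 503, the developer sees the site, everyone else is redirected), not something Apache runs. The function name `intended_response` and the return strings are invented for this illustration, and 1.1.1.1 is the same placeholder address used in the rules.

```python
import re

# Simplified form of the bot alternation used in the RewriteCond.
BOT_PATTERN = re.compile(r"(Googlebot|Mediapartners|Adsbot|Feedfetcher)", re.IGNORECASE)
DEVELOPER_IP = "1.1.1.1"  # placeholder developer address, as in the rules


def intended_response(user_agent, remote_addr):
    """Model which response each kind of visitor is meant to receive."""
    if BOT_PATTERN.search(user_agent):
        # The bot rule comes first, so it wins even for the developer's IP.
        return "503 via /cgi-bin/error/503.php"
    if remote_addr != DEVELOPER_IP:
        return "302 to /under-development-explain.html"
    return "normal page"


print(intended_response("Googlebot/2.1", "66.249.66.1"))
print(intended_response("Mozilla/5.0 Firefox", "203.0.113.7"))
print(intended_response("Mozilla/5.0 Firefox", "1.1.1.1"))
```

Writing the cases out this way makes it easier to spot a visitor type that falls through to the wrong rule before the .htaccess file goes live.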
What's a 503 Service Temporarily Unavailable Header?
503 Service Temporarily Unavailable
The server is currently unable to handle the request due to a temporary overloading or maintenance of the server. The implication is that this is a temporary condition which will be alleviated after some delay. If known, the length of the delay MAY be indicated in a Retry-After header. If no Retry-After is given, the client SHOULD handle the response as it would for a 500 response.
503 script code
503 Header with PHP
If you are using a CGI version of PHP, then the Status header below is required in addition to the HTTP/1.1 header. If you are using mod_php, you do not need the Status header.
<?php
ob_start();
header('HTTP/1.1 503 Service Temporarily Unavailable');
header('Status: 503 Service Temporarily Unavailable');
header('Retry-After: 3600');
header('X-Powered-By:');
?><!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>503 Service Temporarily Unavailable</title>
</head><body>
<h1>Service Temporarily Unavailable</h1>
<p>The server is temporarily unable to service your
request due to maintenance downtime or capacity
problems. Please try again later.</p>
</body></html>
503 Header with Perl CGI
#!/usr/local/bin/perl
print "Status: 503 Service Temporarily Unavailable\n";
print "Content-Type: text/html; charset=UTF-8\n";
print "Retry-After: 3600\r\n\r\n";
print "<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML 2.0//EN\">\n<html><head>\n<title>503 Service Temporarily Unavailable</title>\n";
print "</head><body>\n<h1>Service Temporarily Unavailable</h1>\n<p>The server is temporarily unable to service your\n";
print "request due to maintenance downtime or capacity\nproblems. Please try again later.</p>\n</body></html>";
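Once either script is in place, it is worth confirming what a crawler would actually receive. The following Python sketch fetches a URL with a bot-like User-Agent and reports the status code and Retry-After header. So that it runs without touching a real site, it starts a tiny local server that stands in for the maintenance script. `MaintenanceHandler`, `fetch_status`, and `demo` are names made up for this example; to check a live site you would call `fetch_status` with your own URL.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class MaintenanceHandler(BaseHTTPRequestHandler):
    """Local stand-in for the maintenance script: always answers 503."""

    def do_GET(self):
        body = b"<h1>Service Temporarily Unavailable</h1>"
        self.send_response(503, "Service Temporarily Unavailable")
        self.send_header("Retry-After", "3600")
        self.send_header("Content-Type", "text/html; charset=UTF-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


def fetch_status(url, user_agent="Googlebot/2.1"):
    """Return (status code, Retry-After header or None) for a GET request."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    # An empty ProxyHandler bypasses any configured proxy, keeping the demo local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(request, timeout=5) as response:
            return response.status, response.headers.get("Retry-After")
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx; status and headers live on the exception.
        return error.code, error.headers.get("Retry-After")


def demo():
    """Start the stand-in server on a free port and fetch from it once."""
    server = HTTPServer(("127.0.0.1", 0), MaintenanceHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_status("http://127.0.0.1:%d/" % server.server_port)
    finally:
        server.shutdown()
        server.server_close()


print(demo())
```

If the status comes back as 200 or the Retry-After value is missing, the rewrite rules or the script's headers are the first things to re-check.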
What are URL unreachable errors?
Google encountered an error when trying to access this URL. We may have encountered a DNS error or timeout, for instance. Your server may have been down or busy when we tried to access the page. Possible URL unreachable errors include:
- 5xx error
- 503 Network Unavailable
- DNS issue
- robots.txt file unreachable
- Network unreachable
Reader Comments
-
yeap, this is a nice trick, didn't know it :) however, at a certain moment some pages have to be notified as dead to SEs. I can't figure out if getting lots of 503 is more acceptable to google than 404, maybe you can explain.
-
awesome bud! +1 for you :P
-
Hi, what is ob_start(); needed for? A comment on php.net ends with this: $g=ob_get_clean(); echo $g; exit; exit();
This seems to have been deleted in this article. Did you forget to delete ob_start(); as well?
-
Options +FollowSymLinks
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^.*(Googlebot|Googlebot|Mediapartners|Adsbot|Feedfetcher)-?(Google|Image)? [NC]
# or RewriteCond %{HTTP_USER_AGENT} ^.*google.* [NC]
RewriteRule .* /cgi-bin/error/503.php
-
Thank you, you saved me!
-
I tried to redirect with [R=503], but my server (shared hosting) considers that a 500 server error. I guess I'll have to go with the PHP route.
-
I was curious if there was a way of doing this strictly in apache2 using mod_rewrite and mod_headers. I ended up with the following which is similar to your developer example:
# Custom 503 error page
ErrorDocument 503 /maintenance.htm
# Developer Address
RewriteCond %{REMOTE_ADDR} !^192\.168\.1\.1$
# Only rewrite dynamic pages
RewriteCond %{REQUEST_URI} ^(\/.*\.php|\/.*\.html|\/)$
# Set environment variable
RewriteRule .* - [E=maintain:1]
# Send Retry-After header for clients that match above rewrite section
Header always set Retry-After "7200" env=maintain
# Send 503 error to matching clients after setting Retry-After header
RewriteCond %{ENV:maintain} 1
# Again, only send 503 for dynamic pages (php, html, and directory index pages)
RewriteRule ^(\/.*\.php|\/.*\.html|\/)$ - [R=503]
Hope someone finds that useful.
-
Thanks for this Information... Oliver
-
Good morning, I have question about these bots in general. Comparing the data from Google Analytics, which I run myself, and from AWStats which the company that does my SEO sends me, I've noticed some very large discrepancies. I'm wondering if the SEO company is using autosurf or other kinds of bots to drive up the numbers. Is there any way to detect this? I'd appreciate any advice or direction anyone could provide. My email is grace618@aol.com. Thanks, Kim


Please add useful html code for 503 response...