Instruct Search Engines to come back to site after you finish working on it
What do you think Googlebot and other Search Engines do when they try to reach your site while you are tinkering with it?
Hopefully you aren’t doing anything that could slow the response time for the page Google is trying to reach. And if Google gets a 404 Not Found error or a 500 error, then your PageRank for that page could cease to exist!
What if you could conveniently tell Googlebot and other bots that you are working on the page but you would like them to come back in, oh, say an hour? I know what I did when I found out this was possible: I found out how to do it, and now I’m sharing it with you.
If my site is down for maintenance, how can I tell Googlebot to come back later rather than to index the “down for maintenance” page?
You should configure your server to return a status of 503 (network unavailable) rather than 200 (successful). That lets Googlebot know to try the pages again later.
We will use a bit of mod_rewrite code in a .htaccess file to route Google and other search engine bots to a small script that sends the 503 Service Temporarily Unavailable status. The script also sends a Retry-After: 3600 header, which asks the bots to re-check the page in 3600 seconds (1 hour) to see whether it is available again.
Article: Retry-After, 503 Service Unavailable
The Retry-After response-header field can be used with a 503 (Service Unavailable) response to indicate how long the service is expected to be unavailable to the requesting client. This field MAY also be used with any 3xx (Redirection) response to indicate the minimum time the user-agent is asked wait before issuing the redirected request. The value of this field can be either an HTTP-date or an integer number of seconds (in decimal) after the time of the response.
Retry-After = "Retry-After" ":" ( HTTP-date | delta-seconds )
Two examples of using Retry-After:
Retry-After: Fri, 31 Dec 1999 23:59:59 GMT
Retry-After: 120
In the latter example, the delay is 2 minutes.
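To make the two forms concrete, here is a minimal PHP sketch that builds a Retry-After value either as delta-seconds or as an HTTP-date. The function name `retry_after_value()` and its parameters are hypothetical helpers for illustration, not part of PHP or of the scripts in this article.

```php
<?php
// Hypothetical helper: returns a Retry-After value in one of the two
// forms the RFC allows (delta-seconds or HTTP-date).
function retry_after_value(int $secondsFromNow, bool $asHttpDate = false, ?int $now = null): string
{
    if ($secondsFromNow < 0) {
        $secondsFromNow = 0; // a negative delay makes no sense, clamp it
    }
    if (!$asHttpDate) {
        return (string) $secondsFromNow; // delta-seconds form, e.g. "120"
    }
    // $now is injectable so the function can be tested; defaults to the current time.
    $now = $now ?? time();
    // HTTP-dates are always expressed in GMT.
    return gmdate('D, d M Y H:i:s', $now + $secondsFromNow) . ' GMT';
}

echo retry_after_value(120), "\n";                  // 120
echo retry_after_value(120, true, 946684679), "\n"; // Fri, 31 Dec 1999 23:59:59 GMT
```

In a maintenance script the result would be passed to `header('Retry-After: ' . retry_after_value(3600));`. The delta-seconds form is usually the simpler choice because it does not depend on the server clock being correct.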
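Options +FollowSymLinks
RewriteEngine On
RewriteBase /
# Match Google's crawlers by User-Agent
RewriteCond %{HTTP_USER_AGENT} ^.*(Googlebot|Mediapartners|Adsbot|Feedfetcher)-?(Google|Image)? [NC]
# or RewriteCond %{HTTP_USER_AGENT} ^.*google.* [NC]
# Do not rewrite requests for the 503 script itself
RewriteCond %{REQUEST_URI} !^/cgi-bin/error/503\.php [NC]
RewriteRule .* /cgi-bin/error/503.php [L]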
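Where REMOTE_ADDR below is the developer’s IP address (1.1.1.1 is a placeholder), so everyone except the developer gets the 503.
Options +FollowSymLinks
RewriteEngine On
RewriteBase /
# REMOTE_ADDR holds the client IP; REMOTE_HOST would hold a resolved hostname
RewriteCond %{REMOTE_ADDR} !^1\.1\.1\.1$
RewriteCond %{REQUEST_URI} !^/cgi-bin/error/503\.php [NC]
RewriteRule .* /cgi-bin/error/503.php [L]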
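Options +FollowSymLinks
RewriteEngine On
RewriteBase /
# Search engine bots get the 503 script
RewriteCond %{HTTP_USER_AGENT} ^.*(Googlebot|Mediapartners|Adsbot|Feedfetcher)-?(Google|Image)? [NC]
RewriteCond %{REQUEST_URI} !^/cgi-bin/error/503\.php [NC]
RewriteRule .* /cgi-bin/error/503.php [L]
# Everyone else except the developer is redirected to an explanation page.
# The 503 script and the explanation page are excluded to avoid a redirect loop.
RewriteCond %{REMOTE_ADDR} !^1\.1\.1\.1$
RewriteCond %{REQUEST_URI} !^/cgi-bin/error/503\.php [NC]
RewriteCond %{REQUEST_URI} !^/under-development-explain\.html [NC]
RewriteRule .* /under-development-explain.html [R=302,L]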
503 Service Temporarily Unavailable
The server is currently unable to handle the request due to a temporary overloading or maintenance of the server. The implication is that this is a temporary condition which will be alleviated after some delay. If known, the length of the delay MAY be indicated in a Retry-After header. If no Retry-After is given, the client SHOULD handle the response as it would for a 500 response.
If you are using a CGI version of PHP, then the Status header below is required in addition to the HTTP/1.1 header. If you are using mod_php, you do not need the Status header.
<?php
ob_start();
header('HTTP/1.1 503 Service Temporarily Unavailable');
header('Status: 503 Service Temporarily Unavailable');
header('Retry-After: 3600');
header('X-Powered-By:');
?><!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>503 Service Temporarily Unavailable</title>
</head><body>
<h1>Service Temporarily Unavailable</h1>
<p>The server is temporarily unable to service your
request due to maintenance downtime or capacity
problems. Please try again later.</p>
</body></html>
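On newer PHP versions the same headers can be sent without writing the status line by hand. The sketch below uses `http_response_code()` (available since PHP 5.4) and `header_remove()` (since PHP 5.3); as far as I can tell PHP then emits the status in the form the active SAPI expects, but treat that as an assumption and check the actual response on your own server. The function name `send_maintenance_headers()` is a hypothetical helper, not something from the script above.

```php
<?php
// Hypothetical helper: sends the same headers as the script above,
// letting PHP format the status for the current SAPI (assumption: PHP >= 7.1
// for the type declarations used here).
function send_maintenance_headers(int $retryAfterSeconds = 3600): void
{
    http_response_code(503);                       // 503 Service Unavailable
    header('Retry-After: ' . $retryAfterSeconds);  // delta-seconds form
    header_remove('X-Powered-By');                 // same intent as header('X-Powered-By:')
}

send_maintenance_headers(3600);
// ...then output the same HTML body as in the script above.
```

The hand-written HTTP/1.1 and Status headers remain the safer choice on old PHP installs where `http_response_code()` is not available.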
#!/usr/local/bin/perl
print "Status: 503 Service Temporarily Unavailable\n";
print "Content-Type: text/html; charset=UTF-8;\n";
print "Retry-After: 3600\r\n\r\n";
print "<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML 2.0//EN\">\n<html><head>\n<title>503 Service Temporarily Unavailable</title>\n";
print "</head><body>\n<h1>Service Temporarily Unavailable</h1>\n<p>The server is temporarily unable to service your\n";
print "request due to maintenance downtime or capacity\nproblems. Please try again later.</p>\n</body></html>";
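Before relying on the rewrite rules and the 503 script, it is worth confirming what a bot would actually receive. The following is a rough sketch that summarizes a list of raw response header lines in the shape PHP's `get_headers()` returns. The function name `summarize_maintenance_response()` and the example URL in the comments are assumptions made for illustration.

```php
<?php
// Hypothetical helper: inspects raw header lines (status line first, as
// returned by get_headers()) and reports the status code and Retry-After value.
function summarize_maintenance_response(array $headerLines): array
{
    $status = null;
    $retryAfter = null;
    foreach ($headerLines as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1]; // the last status line wins if redirects were followed
        } elseif (stripos($line, 'Retry-After:') === 0) {
            $retryAfter = trim(substr($line, strlen('Retry-After:')));
        }
    }
    return [
        'status'      => $status,
        'retry_after' => $retryAfter,
        'ok'          => $status === 503 && $retryAfter !== null,
    ];
}

// Offline example with hand-written header lines:
$result = summarize_maintenance_response([
    'HTTP/1.1 503 Service Temporarily Unavailable',
    'Retry-After: 3600',
    'Content-Type: text/html',
]);
var_dump($result['ok']); // bool(true)

// Live check (assumed URL; pretends to be Googlebot via the user_agent ini setting):
// ini_set('user_agent', 'Googlebot/2.1 (+http://www.google.com/bot.html)');
// $lines = get_headers('http://www.example.com/');
// if ($lines !== false) { print_r(summarize_maintenance_response($lines)); }
```

Running the live check once with a bot User-Agent and once without it shows whether the RewriteCond on HTTP_USER_AGENT is matching as intended.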
Google encountered an error when trying to access this URL. We may have encountered a DNS error or timeout, for instance. Your server may have been down or busy when we tried to access the page. Possible URL unreachable errors include:
Tags: 503, Google, htaccess, mod_rewrite
Hi,
What is ob_start(); needed for? On php.net there is a comment that ends with:
$g=ob_get_clean(); echo $g; exit; exit();
This seems to have been deleted in this article. Did you forget to delete ob_start(); as well?
Thank you, you saved me!
I tried to redirect with [R=503], but my server (shared hosting) considers that a 500 server error. I guess I’ll have to go with the PHP route.
I was curious if there was a way of doing this strictly in apache2 using mod_rewrite and mod_headers. I ended up with the following which is similar to your developer example:
# Custom 503 error page (Apache does not allow a comment on the same line as a directive)
ErrorDocument 503 /maintenance.htm
# Developer Address
RewriteCond %{REMOTE_ADDR} !^192\.168\.1\.1$
# Only rewrite dynamic pages
RewriteCond %{REQUEST_URI} ^(\/.*\.php|\/.*\.html|\/)$
# Set environment variable
RewriteRule .* - [E=maintain:1]
# Send Retry-After header for clients that match above rewrite section
Header always set Retry-After "7200" env=maintain
# Send 503 error to matching clients after setting Retry-After header
RewriteCond %{ENV:maintain} 1
# Again, only send 503 for dynamic pages (php, html, and directory index pages)
RewriteRule ^(\/.*\.php|\/.*\.html|\/)$ - [R=503]
Hope someone finds that useful.
Thanks for this Information…
Oliver
Good morning,
I have a question about these bots in general. Comparing the data from Google Analytics, which I run myself, and from AWStats which the company that does my SEO sends me, I’ve noticed some very large discrepancies. I’m wondering if the SEO company is using autosurf or other kinds of bots to drive up the numbers. Is there any way to detect this? I’d appreciate any advice or direction anyone could provide. My email is grace618@aol.com.
Thanks,
Kim
Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution 3.0 License, just credit with a link.
awesome bud!
+1 for you :P