Instruct Search Engines to come back to site after you finish working on it

What do you think Googlebot and other Search Engines do when they try to reach your site while you are tinkering with it?

Hopefully you aren't doing anything that could slow the response time for the page Google is trying to reach, because if Google gets a 404 Not Found or a 500 error, then your PageRank for that page could cease to exist!

What if you could conveniently tell Googlebot and other bots that you are working on the page, but you would like them to come back in, oh, say an hour? I know what I did when I found out this was possible: I learned how to do it, and now I'm sharing it with you.

Google Webmaster Central Blog

All About Googlebot

If my site is down for maintenance, how can I tell Googlebot to come back later rather than to index the "down for maintenance" page?

You should configure your server to return a status of 503 (network unavailable) rather than 200 (successful). That lets Googlebot know to try the pages again later.

How to use this SEO knowledge

We will use a bit of mod_rewrite code in a .htaccess file to send Google and other search engine bots the 503 Service Temporarily Unavailable header. We will also send a Retry-After: 3600 header to tell the bots to re-check our page in 3600 seconds (1 hour) to see if the page is available.

Retry-After Header

Article: Retry-After, 503 Service Unavailable

The Retry-After response-header field can be used with a 503 (Service Unavailable) response to indicate how long the service is expected to be unavailable to the requesting client. This field MAY also be used with any 3xx (Redirection) response to indicate the minimum time the user-agent is asked wait before issuing the redirected request. The value of this field can be either an HTTP-date or an integer number of seconds (in decimal) after the time of the response.

Retry-After  = "Retry-After" ":" ( HTTP-date | delta-seconds )

Two examples of using Retry-After:

Retry-After: Fri, 31 Dec 1999 23:59:59 GMT
Retry-After: 120

In the latter example, the delay is 2 minutes.
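
To make the two forms concrete, here is a minimal Python sketch of how a client might turn a Retry-After value into a delay in seconds. The function name retry_after_seconds and its now parameter are my own inventions for illustration; this is not how Googlebot or any particular crawler is implemented.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None):
    """Return the delay in seconds described by a Retry-After header value."""
    value = value.strip()
    if value.isdigit():
        # delta-seconds form, e.g. "120"
        return int(value)
    # HTTP-date form, e.g. "Fri, 31 Dec 1999 23:59:59 GMT"
    when = parsedate_to_datetime(value)
    if when.tzinfo is None:
        # Treat a date without zone information as UTC (an assumption).
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means "retry now", so never return a negative delay.
    return max(0, int((when - now).total_seconds()))
```

Passing a fixed now makes the HTTP-date form easy to check by hand; leaving it out uses the current time.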

Send 503 only to Google Bots

Options +FollowSymLinks
RewriteEngine On
RewriteBase /
 
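# Match Google's crawlers by User-Agent, case-insensitively
RewriteCond %{HTTP_USER_AGENT} ^.*(Googlebot|Mediapartners|Adsbot|Feedfetcher)-?(Google|Image)? [NC]
# or RewriteCond %{HTTP_USER_AGENT} ^.*google.* [NC]
# Skip the 503 script itself so the rule is not applied to it again
RewriteCond %{REQUEST_URI} !^/cgi-bin/error/503\.php [NC]
RewriteRule .* /cgi-bin/error/503.php [L]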
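
If you want to see which visitors the User-Agent condition would catch before you put it live, a rough way is to try the same regular expression outside Apache. The Python sketch below assumes Python's re module treats this particular pattern the same way mod_rewrite's PCRE does (it uses only basic syntax), and the helper name is_google_bot is made up for this example.

```python
import re

# Same alternation as the RewriteCond above; re.IGNORECASE plays the role of [NC].
BOT_PATTERN = re.compile(
    r"^.*(Googlebot|Mediapartners|Adsbot|Feedfetcher)-?(Google|Image)?",
    re.IGNORECASE,
)


def is_google_bot(user_agent):
    """Return True if the User-Agent string would satisfy the RewriteCond."""
    # mod_rewrite conditions are unanchored searches, so use search(), not fullmatch().
    return BOT_PATTERN.search(user_agent) is not None


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "Mediapartners-Google",
        "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
    ]
    for ua in samples:
        print(is_google_bot(ua), ua)
```

Keep in mind that a User-Agent header is trivial to fake, so this only tells you what the rule matches, not who is really visiting.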

Send everyone except the developer a 503

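Where REMOTE_ADDR below is compared against the developer's IP address (1.1.1.1 is a placeholder; use your own).

Options +FollowSymLinks
RewriteEngine On
RewriteBase /
 
# Everyone whose IP address is not the developer's...
RewriteCond %{REMOTE_ADDR} !^1\.1\.1\.1$
# ...and who is not already requesting the 503 script...
RewriteCond %{REQUEST_URI} !^/cgi-bin/error/503\.php [NC]
# ...gets the 503 script instead of the requested page
RewriteRule .* /cgi-bin/error/503.php [L]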

Send BOTS a 503; humans to error page.

Options +FollowSymLinks
RewriteEngine On
RewriteBase /
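# Bots get the 503 script; [L] stops processing so they never reach the redirect below
RewriteCond %{HTTP_USER_AGENT} ^.*(Googlebot|Mediapartners|Adsbot|Feedfetcher)-?(Google|Image)? [NC]
RewriteCond %{REQUEST_URI} !^/cgi-bin/error/503\.php [NC]
RewriteRule .* /cgi-bin/error/503.php [L]
 
# Everyone else except the developer is redirected to the explanation page.
# The 503 script and the explanation page itself are excluded to avoid a redirect loop.
RewriteCond %{REMOTE_ADDR} !^1\.1\.1\.1$
RewriteCond %{REQUEST_URI} !^/cgi-bin/error/503\.php [NC]
RewriteCond %{REQUEST_URI} !^/under-development-explain\.html [NC]
RewriteRule .* /under-development-explain.html [R=302,L]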
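
The order of the two rules matters, so here is a small Python model of the decision I intend them to make. It is only a sketch: the function maintenance_action and the returned labels are invented for illustration, it compares the developer IP by exact equality, and it assumes the explanation page itself is exempt from the redirect so that humans do not loop.

```python
import re

BOT_PATTERN = re.compile(
    r"^.*(Googlebot|Mediapartners|Adsbot|Feedfetcher)-?(Google|Image)?",
    re.IGNORECASE,
)
DEVELOPER_IP = "1.1.1.1"  # placeholder address, as in the rules above
ERROR_SCRIPT = "/cgi-bin/error/503.php"
EXPLAIN_PAGE = "/under-development-explain.html"


def maintenance_action(user_agent, remote_addr, path):
    """Model which rule fires for a request: '503', 'redirect' or 'pass'."""
    # Rule 1: bots are served the 503 script and processing stops there.
    if BOT_PATTERN.search(user_agent):
        return "503"
    # Rule 2: other visitors are redirected, unless they are the developer
    # or are already requesting the explanation page or the 503 script.
    if remote_addr != DEVELOPER_IP and path not in (EXPLAIN_PAGE, ERROR_SCRIPT):
        return "redirect"
    # Anything left (the developer, or the explanation page) is served as usual.
    return "pass"
```

If the bot rule came second, crawlers would be redirected with a 302 like everyone else and would never see the 503.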

What's a 503 Service Temporarily Unavailable Header?

503 Service Temporarily Unavailable

The server is currently unable to handle the request due to a temporary overloading or maintenance of the server. The implication is that this is a temporary condition which will be alleviated after some delay. If known, the length of the delay MAY be indicated in a Retry-After header. If no Retry-After is given, the client SHOULD handle the response as it would for a 500 response.

503 script code

503 Header with PHP

If you are using a CGI version of PHP, then the Status header below is required in addition to the HTTP/1.1 header. If you are using mod_php, you do not need the Status header.

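<?php
// Buffer output so nothing reaches the client before the headers are set
ob_start();
// Status line for mod_php
header('HTTP/1.1 503 Service Temporarily Unavailable');
// Status header for CGI versions of PHP
header('Status: 503 Service Temporarily Unavailable');
// Ask clients to come back in 3600 seconds (1 hour)
header('Retry-After: 3600');
// Do not advertise the PHP version (header_remove needs PHP 5.3 or later)
header_remove('X-Powered-By');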
?><!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>503 Service Temporarily Unavailable</title>
</head><body>
<h1>Service Temporarily Unavailable</h1>
<p>The server is temporarily unable to service your
request due to maintenance downtime or capacity
problems. Please try again later.</p>
</body></html>

503 Header with Perl CGI

#!/usr/local/bin/perl
 
print "Status: 503 Service Temporarily Unavailable\n";
print "Content-Type: text/html; charset=UTF-8\n";
print "Retry-After: 3600\r\n\r\n";
 
print "<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML 2.0//EN\">\n<html><head>\n<title>503 Service Temporarily Unavailable</title>\n";
print "</head><body>\n<h1>Service Temporarily Unavailable</h1>\n<p>The server is temporarily unable to service your\n";
print "request due to maintenance downtime or capacity\nproblems. Please try again later.</p>\n</body></html>";
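
Once either script is in place, it is worth checking what a bot would actually receive. The Python sketch below is one way to do that using only the standard library; the function name fetch_status and the default User-Agent string are my own choices for illustration, so swap in whatever bot string you want to test.

```python
import urllib.error
import urllib.request


def fetch_status(url, user_agent="Googlebot/2.1 (+http://www.google.com/bot.html)"):
    """Return (status_code, retry_after) for a GET request to url."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, response.headers.get("Retry-After")
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the error object still
        # carries the status code and the response headers.
        return err.code, err.headers.get("Retry-After")
```

Calling fetch_status with the address of a page on your own site while maintenance mode is on should give you a 503 status together with the Retry-After value you configured. If you get 200 or 500 instead, the rewrite or the script is not doing what you expect.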

What are URL unreachable errors?

Google encountered an error when trying to access this URL. We may have encountered a DNS error or timeout, for instance. Your server may have been down or busy when we tried to access the page. Possible URL unreachable errors include:

  1. 5xx error
  2. 503 Network Unavailable
  3. DNS issue
  4. robots.txt file unreachable
  5. Network unreachable

February 28th, 2008

Comments Welcome

  • Kim

    Good morning,

    I have a question about these bots in general. Comparing the data from Google Analytics, which I run myself, and from AWStats which the company that does my SEO sends me, I've noticed some very large discrepancies. I'm wondering if the SEO company is using autosurf or other kinds of bots to drive up the numbers. Is there any way to detect this? I'd appreciate any advice or direction anyone could provide. My email is grace618@aol.com.

    Thanks,
    Kim

  • http://www.kadomedia.de Oliver

    Thanks for this Information...

    Oliver

  • http://www.everytruckjob.com/ Mark Reidenbach

    I was curious if there was a way of doing this strictly in apache2 using mod_rewrite and mod_headers. I ended up with the following which is similar to your developer example:

    # Custom 503 error page
    ErrorDocument 503 /maintenance.htm
     
    # Developer Address
    RewriteCond %{REMOTE_ADDR} !^192\.168\.1\.1$
     
    # Only rewrite dynamic pages
    RewriteCond %{REQUEST_URI} ^(/.*\.php|/.*\.html|/)$
     
    # Set environment variable
    RewriteRule .* - [E=maintain:1]
     
    # Send Retry-After header for clients that match above rewrite section
    Header always set Retry-After "7200" env=maintain
     
    # Send 503 error to matching clients after setting Retry-After header
    RewriteCond %{ENV:maintain} 1
     
    # Again, only send 503 for dynamic pages (php, html, and directory index pages)
    RewriteRule ^(/.*\.php|/.*\.html|/)$ - [R=503]

    Hope someone finds that useful.

  • http://www.askapache.com/ AskApache

    @ Mark

    VERY COOL! That is probably way too advanced for most people. I especially love your implementation of both the Header directive and Environment Variables.

  • http://judebert.com/progress/ Judebert

    I tried to redirect with [R=503], but my server (shared hosting) considers that a 500 server error. I guess I'll have to go with the PHP route.

  • http://www.askapache.com/ AskApache

    @ Judebert

    That would result in a 500 Error for every version of Apache because mod_rewrite cannot use the R flag to redirect for anything other than 3xx status codes. Redirect and RedirectMatch on the other hand, can redirect with all 57 status codes.

  • http://www.devpedia.net Steve

    Thank you, you saved me!

  • Marc

    Hi,

    What is ob_start(); needed for? On php.net there is a comment with this ending:

    $g=ob_get_clean();
    echo $g;
    exit;
    exit();

    That part seems to have been deleted in this article. Did you forget to delete ob_start(); as well?

  • tom3k

    awesome bud!

    +1 for you :P

  • http://best-seo.org/blog/ best seo

    yeap, this is a nice trick, didn't know it :) however, at a certain moment some pages have to be notified as dead to SEs. I can't figure out if getting lots of 503 is more acceptable to google than 404, maybe you can explain.

  • Sawa

    Please add useful html code for 503 response...

Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution 3.0 License, just credit with a link.