<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>AskApache &#187; phpBB</title>
	<atom:link href="http://www.askapache.com/phpbb/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.askapache.com</link>
	<description>Advanced Web Development</description>
	<lastBuildDate>Fri, 25 Jun 2010 00:55:59 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Encrypted WordPress / phpBB Backups</title>
		<link>http://www.askapache.com/linux-unix/encrypted-wordpress-site-backups.html</link>
		<comments>http://www.askapache.com/linux-unix/encrypted-wordpress-site-backups.html#comments</comments>
		<pubDate>Mon, 04 Aug 2008 16:28:00 +0000</pubDate>
		<dc:creator>AskApache</dc:creator>
				<category><![CDATA[DreamHost]]></category>
		<category><![CDATA[Featured]]></category>
		<category><![CDATA[Linux Unix BSD]]></category>
		<category><![CDATA[Security]]></category>
		<category><![CDATA[Server Administration]]></category>
		<category><![CDATA[Shell Scripting]]></category>
		<category><![CDATA[Web Hosting]]></category>
		<category><![CDATA[Webmaster]]></category>
		<category><![CDATA[WordPress]]></category>
		<category><![CDATA[phpBB]]></category>
		<category><![CDATA[Apache]]></category>
		<category><![CDATA[askapache]]></category>
		<category><![CDATA[Backups]]></category>
		<category><![CDATA[bash]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[encryption]]></category>
		<category><![CDATA[GET]]></category>
		<category><![CDATA[GPG]]></category>
		<category><![CDATA[grep]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[Nice]]></category>
		<category><![CDATA[password]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Renice]]></category>
		<category><![CDATA[Shell]]></category>
		<category><![CDATA[shell script]]></category>
		<category><![CDATA[umask]]></category>
		<category><![CDATA[wp-config.php]]></category>

		<guid isPermaLink="false">http://www.askapache.com/?p=1049</guid>
		<description><![CDATA[<p><a rel="lb" class="IFL hs hs11" href='http://www.askapache.com/linux-unix/encrypted-wordpress-site-backups.html' title="Bash Shell Script for Encrypted WordPress and phpBB Backups"></a>Enter your DOMAIN_ROOT and the location of your wp-config.php or config.php, and this script finds all the mysql settings by parsing the phpbb or wordpress config file, then creates GPG encrypted backups, and saves your settings for future automation.<br class="C" /></p>]]></description>
			<content:encoded><![CDATA[<p><a rel="lb" class="IFL hs hs11" href='http://uploads.askapache.com/2008/07/sbackup.png' title="Bash Shell Script for Encrypted WordPress and phpBB Backups"></a>Because backups contain all your sensitive information, it&#8217;s smart to encrypt any SQL backups, and while we&#8217;re at it, any site backups as well.</p>
<p>This simple shell script is an easy way to securely back up your WordPress and/or phpBB site files and database.  Generate a GPG key once, enter three settings once, and from then on it runs without any user input whenever you want.<br class="C" /></p>
<h2>What it Does</h2>
<p>When run, this script asks for the location of your website&#8217;s document root and the location of your wp-config.php or config.php file.  It also asks for your encryption UID.  The script then saves those settings in a file called .sbackup, so the next time you run it you don&#8217;t have to re-enter that information, which makes it nice for cron jobs or quick on-demand backups.  Another cool feature I added: the script automatically parses your wp-config.php file for the MySQL database name, user, host, and password, so you don&#8217;t have to compromise your security or take the time to type those settings in manually.</p>
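<p>As a rough illustration of the config parsing described above, here is a minimal sketch. It is not the code the script itself uses: the function name <code>wp_db_settings</code> is hypothetical, and it assumes the settings are written as single-quoted <code>define()</code> calls, one per line. It prints the values rather than passing them to <code>eval</code>.</p>

```shell
#!/bin/bash
# Hypothetical helper (not part of the backup script): print DB_NAME, DB_USER,
# DB_PASSWORD and DB_HOST from a wp-config.php as KEY=VALUE lines.
# Assumes each setting is a single-quoted define() call on its own line.
wp_db_settings() {
  sed -n -E "s/^[[:space:]]*define\([[:space:]]*'(DB_(NAME|USER|PASSWORD|HOST))'[[:space:]]*,[[:space:]]*'([^']*)'.*/\1=\3/p" "$1"
}
```

<p>Calling <code>wp_db_settings /path/to/wp-config.php</code> prints one line per setting it finds, which you can inspect before deciding how to load the values.</p>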
<h2>What is Backed Up</h2>
<p>This script creates a tarred and gzipped archive of your entire document root at <code>~/backups/domain.com/domain.com-date.tgz</code> and also creates a dump of your WordPress or phpBB database in a format that is ideal for restoring from.  Both files are then encrypted with your GPG key and can be safely downloaded, since both the private key and its passphrase are required to decrypt them.</p>
<h2>Generating a GPG Key</h2>
<p>If you don&#8217;t already have a key set up for your shell account, run this command and remember the uid, which you will enter in the shell script.</p>
<pre>gpg &#45;-gen-key</pre>
<h3>Decrypting Files</h3>
<pre>gpg -r UID &#45;-output FILENAME.tgz &#45;-decrypt FILENAME.tgz.asc</pre>
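<p>Restoring the site files from an encrypted archive might look roughly like this. It is a sketch under assumptions: the helper names <code>plain_name</code> and <code>restore_archive</code> are hypothetical, the private key is assumed to be in your keyring already, and gpg will prompt for the passphrase. The same decrypt step applies to the <code>.sql.asc</code> file.</p>

```shell
#!/bin/bash
# Hypothetical helpers; only the gpg and tar options themselves are standard.

# Strip the .asc suffix that the backup script appends to encrypted files.
plain_name() {
  printf '%s\n' "${1%.asc}"
}

# Decrypt an encrypted archive and unpack it into a target directory.
# Usage: restore_archive FILENAME.tgz.asc /path/to/restore/dir
restore_archive() {
  enc=$1
  dest=$2
  out=$(plain_name "$enc")
  gpg --output "$out" --decrypt "$enc" || return 1
  mkdir -p "$dest" || return 1
  tar -xzf "$out" -C "$dest"
}
```

<p>Unpacking into a scratch directory first, rather than straight over the live document root, lets you check the contents before copying anything back.</p>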
<h2>The Shell Script</h2>
<p><a href='http://uploads.askapache.com/2008/12/_sbackup.sh'>site-backup.sh</a></p>
<pre>#!/bin/bash
# SiteBack Version 3.3, 2008-12-17
# GNU Free Documentation License 1.2
# 12-17-08 - AskApache (www.askapache.com)
umask 022
&nbsp;
### SHELL OPTIONS
set +o noclobber # allowed to clobber files
set +o noglob # globbing on
set +o xtrace # change to - to enable tracing
set +o verbose # change to - to enable verbose debugging
set -e # abort on first error
shopt -s extglob
&nbsp;
###########################################################################&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-#
###
### SETTINGS
###
###########################################################################==-==-==-==-==-==-==-==-==-==-==#
&nbsp;
DT=$(date +%x); DT=${DT//\/}
DTX=$(date +%x-%H%M); DTX=${DTX//\/}
BDIR=${HOME}/backups
RUN_FILE=${BDIR}/$$.bk.log
MY_CONFIG=&quot;.sbackup&quot;
DOMAIN=;DB_NAME=;DB_USER=;DB_PASSWORD=;DB_HOST=;APP_CONFIG=;SQL_DEST=;ARC_DEST=;ENCRYPT_USER=
E_SUCCESS=0;E_YN=0;E_YES=251;E_NO=250;E_RETURN=65;C0=;C1=;C2=;C3=;C4=;C5=;C6=;C7=
&nbsp;
###########################################################################&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-#
###
### FUNCTIONS
###
###########################################################################==-==-==-==-==-==-==-==-==-==-==#
&nbsp;
#&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-#
# script_title
#==-==-==-==-==-==-==-==-==-==-==#
function script_title(){
 local e=&quot;\033[&quot;
 local l=&#039; ___________________________________________________________________ &#039;
&nbsp;
 # SET WINDOW TITLE AND COLORS IF CLIENT CAPABLE
 case $TERM in xterm*|vt*|ansi|rxvt|gnome*)
 C0=&quot;${e}0m&quot;;C1=&quot;${e}1;30m&quot;;C2=&quot;${e}1;32m&quot;;C3=&quot;${e}0;32m&quot;;C4=&quot;${e}1;37m&quot;;C5=&quot;${e}1;35m&quot;;C6=&quot;${e}30;42m&quot;
 esac
&nbsp;
 echo -e &quot;\n${C0}$l${C1}&quot;
 echo -e &quot;|             ${C2}___       __    ___                 __${C1}                |&quot;
 echo -e &quot;|            ${C2}/ _ | ___ / /__ / _ | ___  ___ _____/ /  ___${C1}           |&quot;
 echo -e &quot;|           ${C2}/ __ |(_-&lt;/  &#039;_// __ |/ _ \/ _ \`/ __/ _ \/ -_)${C1}          |&quot;
 echo -e &quot;|          ${C3}/_/ |_/___/_/\_\/_/ |_/ .__/\_,_/\__/_//_/\__/${C1}           |&quot;
 echo -e &quot;|                               ${C3}/_/${C1}                                 |&quot;
 echo -e &quot;|                                                                   |&quot;
 echo -e &quot;|       ${C1}+&#45;-${C0} SITE BACKUP SCRIPT Version 3.3${C1}                          |&quot;
 echo -e &quot;${C0}$l\n\n&quot;
}
&nbsp;
#&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-#
# pm
#==-==-==-==-==-==-==-==-==-==-==#
function pm(){
 START=$(date +%s) &amp;&amp; touch ${RUN_FILE}
 case &quot;${2:-title}&quot; in
  &quot;title&quot;) echo -en &quot;\n\n${C2}&gt;&gt;&gt; ${C4}${1} ${C0} \n\n&quot;; ;;
   &quot;info&quot;) echo -e &quot;${C5}=&gt; ${C4}${1} ${C0}&quot;; ;;
   &quot;item&quot;) echo -e &quot;${C4}&#45;- ${C0}${1} &quot;; ;;
 esac
}
&nbsp;
#&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-#
# yes_no
#==-==-==-==-==-==-==-==-==-==-==#
function yes_no(){
 local ans
 echo -en &quot;${1} [y/n] &quot; ; read -n 1 ans
 case &quot;$ans&quot; in
  n|N) E_YN=$E_NO ;;
  y|Y) E_YN=$E_YES ;;
 esac
}
&nbsp;
#&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-#
# do_sleep
#==-==-==-==-==-==-==-==-==-==-==#
function do_sleep (){
 local END DIFF
 echo -en &quot;${C5}${3:-.}&quot;; while [ -r &quot;$RUN_FILE&quot; ]; do sleep ${2:-3}; echo -en &quot;${3:-.}&quot;; done;
 echo -e &quot;${C0}&quot;; sleep 1; END=$(date +%s);DIFF=$(( $END - $START ))
 echo -e &quot;\n${C6} [T: ${SECONDS}] COMPLETED IN ${DIFF} SEC ${C0} \n\n&quot;; sleep 1;
 return 0;
}
&nbsp;
#&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-#
# get_settings
#==-==-==-==-==-==-==-==-==-==-==#
function get_settings(){
 local cha HOSTED_SITES G GG
 clear; script_title

 if [[ -r &quot;$MY_CONFIG&quot; ]]; then

  OIFS=$IFS; while IFS=: read DOMAIN DOMAINROOT APP_CONFIG ENCRYPT_USER; do
   DOMAIN=${DOMAIN};
   DOMAINROOT=${DOMAINROOT};
   APP_CONFIG=${APP_CONFIG};
   ENCRYPT_USER=${ENCRYPT_USER};
   #E_YN=$E_YES;
   break
  done &lt;${MY_CONFIG};
  IFS=$OIFS

 else

  gpg &#45;-list-keys|grep uid|awk &#039;{print $2}&#039;
  echo -en &quot;\n What userid to use for encryption?  &quot;;
  read -e ENCRYPT_USER; echo
&nbsp;
  echo -en &quot;\n What domain would you like to backup?  &quot;; read -e DOMAIN; echo
&nbsp;
  echo $PWD
  until [ -d &quot;$DOMAINROOT&quot; ]; do echo -en &quot;\n Folder where config file is located?  &quot;;
  read -e DOMAINROOT; echo; done
&nbsp;
  [[ -r &quot;$DOMAINROOT/config.php&quot; ]] &amp;&amp; APP_CONFIG=$DOMAINROOT/config.php &amp;&amp; DOT=PHP
  [[ -r &quot;$DOMAINROOT/wp-config.php&quot; ]] &amp;&amp; APP_CONFIG=$DOMAINROOT/wp-config.php &amp;&amp; DOT=WP

  echo $PWD
  until [[ -r &quot;$APP_CONFIG&quot; ]]; do echo -en &quot;\n Where is the applications config file?  &quot;; read -e APP_CONFIG; echo; done

 fi

&nbsp;
  [[ -r &quot;$DOMAINROOT/config.php&quot; ]] &amp;&amp; APP_CONFIG=$DOMAINROOT/config.php &amp;&amp; DOT=PHP
  [[ -r &quot;$DOMAINROOT/wp-config.php&quot; ]] &amp;&amp; APP_CONFIG=$DOMAINROOT/wp-config.php &amp;&amp; DOT=WP
&nbsp;
  ### For phpBB
  if [[ &quot;${DOT}&quot; == &quot;PHP&quot; ]]; then
    GG=$(sed -e &#039;/$db\(n\|u\|pa\|h\)/!d&#039; -e &quot;s/$db_\(name\|user\|passwd\|host\)\ =\ &#039;\([^&#039;]*\).*\$/\1=&#039;\2&#039;;/g&quot; -e &#039;s/$db/DB_/g&#039; ${APP_CONFIG});
    G=$(echo ${GG}|sed -e &#039;s/DB_name/DB_NAME/g&#039; -e &#039;s/DB_user/DB_USER/g&#039; -e &#039;s/DB_passwd/DB_PASSWORD/g&#039; -e &#039;s/DB_host/DB_HOST/g&#039;);
  else
    G=$(sed -e &quot;/define(&#039;DB_\(NAME\|USER\|PASSWORD\|HOST\)/!d&quot; -e &quot;s/[^&#039;]*&#039;DB_\(NAME\|USER\|PASSWORD\|HOST\)&#039;[^&#039;]*&#039;\([^&#039;]*\)&#039;.*$/DB_\1=&#039;\2&#039;;/g&quot; ${APP_CONFIG})
  fi
  eval $G;
&nbsp;
 mkdir -p ${BDIR}/${DOMAIN}
 SQL_DEST=${BDIR}/${DOMAIN}/${DOMAIN}-${DT}.sql;
 [[ -r &quot;${SQL_DEST}.asc&quot; ]] &amp;&amp; SQL_DEST=${BDIR}/${DOMAIN}/${DOMAIN}-${DTX}.sql
&nbsp;
 ARC_DEST=${BDIR}/${DOMAIN}/${DOMAIN}-${DT}.tgz;
 [[ -r &quot;${ARC_DEST}.asc&quot; ]] &amp;&amp; ARC_DEST=${BDIR}/${DOMAIN}/${DOMAIN}-${DTX}.tgz
&nbsp;
 if [[ &quot;$E_YN&quot; != &quot;$E_YES&quot; ]]; then
  for a in &quot;DOMAIN&quot; &quot;DOMAINROOT&quot; &quot;APP_CONFIG&quot; &quot;ENCRYPT_USER&quot; &quot;DB_NAME&quot; &quot;DB_USER&quot; &quot;DB_PASSWORD&quot; &quot;DB_HOST&quot;; do echo -e &quot;${a}: ${!a}&quot;; done
  echo; yes_no &quot;ARE THESE SETTINGS CORRECT&quot;
 fi
&nbsp;
 while [[ &quot;$E_YN&quot; != &quot;$E_YES&quot; ]]; do
  for a in &quot;DOMAIN&quot; &quot;DOMAINROOT&quot; &quot;APP_CONFIG&quot; &quot;ENCRYPT_USER&quot; &quot;DB_NAME&quot; &quot;DB_USER&quot; &quot;DB_PASSWORD&quot; &quot;DB_HOST&quot;; do
   echo -en &quot;\n (Enter for Default: ${!a} )\n ${a}:&gt; &quot;
   read -e cha; echo; [[ ${#cha} -gt 2 ]] &amp;&amp; eval &quot;$a&quot;=$cha
  done
  yes_no &quot;ARE THESE SETTINGS CORRECT&quot;
 done
&nbsp;
 echo -e &quot;${DOMAIN}:${DOMAINROOT}:${APP_CONFIG}:${ENCRYPT_USER}&quot; &gt; $MY_CONFIG
}
&nbsp;
#&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-=&#45;-#
# exit_cleanup
#==-==-==-==-==-==-==-==-==-==-==#
function exit_cleanup(){
 cd $OLDPWD
 [[ -r &quot;${SQL_DEST}&quot; ]] &amp;&amp; rm ${SQL_DEST}
 [[ -r &quot;${ARC_DEST}&quot; ]] &amp;&amp; rm ${ARC_DEST}
}
&nbsp;
############################################################################################################
###
### MAIN CODE
###
############################################################################################################
&nbsp;
#=# CATCH SCRIPT KILLED BY USER
trap exit_cleanup SIGHUP SIGINT SIGTERM
&nbsp;
#=# MAKE MAIN SCRIPT NICE
renice 19 -p $$ &amp;&gt;/dev/null
&nbsp;
cd `dirname $0`
&nbsp;
get_settings
&nbsp;
pm &quot;CREATING SQL BACKUP&quot;
mysqldump &#45;-opt -u${DB_USER} -p${DB_PASSWORD} -h ${DB_HOST} -r ${SQL_DEST} &#45;-add-drop-table ${DB_NAME} 1&gt;&amp;2 &amp;&gt;/dev/null &amp;&amp; sleep 2 1&gt;&amp;2 &amp;&gt;/dev/null &amp;&amp; rm ${RUN_FILE} 2&gt;&amp;1&amp;
do_sleep 1 1 &quot;:&quot;
&nbsp;
pm &quot;ENCRYPTING SQL BACKUP&quot;
gpg &#45;-armor &#45;-recipient ${ENCRYPT_USER} &#45;-output ${SQL_DEST}.asc &#45;-encrypt ${SQL_DEST} 1&gt;&amp;2 &amp;&gt;/dev/null &amp;&amp; sleep 2 1&gt;&amp;2 &amp;&gt;/dev/null &amp;&amp; rm ${RUN_FILE} 2&gt;&amp;1&amp;
do_sleep 1 1 &quot;:&quot;; rm ${SQL_DEST}
&nbsp;
pm &quot;CREATING ARCHIVE BACKUP&quot;
tar -czf ${ARC_DEST} . 1&gt;&amp;2 &amp;&gt;/dev/null &amp;&amp; rm ${RUN_FILE} 2&gt;&amp;1&amp;
do_sleep 1 5 &quot;:&quot;
&nbsp;
pm &quot;ENCRYPTING ARCHIVE BACKUP&quot;
gpg &#45;-armor &#45;-recipient ${ENCRYPT_USER} &#45;-output ${ARC_DEST}.asc &#45;-encrypt ${ARC_DEST} 1&gt;&amp;2 &amp;&gt;/dev/null &amp;&amp; rm ${RUN_FILE} 2&gt;&amp;1&amp;
do_sleep 1 1 &quot;:&quot;; rm ${ARC_DEST}
&nbsp;
echo -e &quot;${C1} __________________________________________________________________________ &quot;
echo -e &quot;|                                                                          |&quot;
echo -e &quot;|                 ${C4} COMPLETED SUCCESSFULLY ${C1}                                 |&quot;
echo -e &quot;${C1} __________________________________________________________________________ ${C0} \n\n&quot;
&nbsp;
cd $OLDPWD
&nbsp;
exit $?</pre>
<p><a href="http://www.askapache.com/linux-unix/encrypted-wordpress-site-backups.html"></a><a href="http://www.askapache.com/linux-unix/encrypted-wordpress-site-backups.html">Encrypted WordPress / phpBB Backups</a> originally appeared on <cite>AskApache.com</cite> </p>]]></content:encoded>
			<wfw:commentRss>http://www.askapache.com/linux-unix/encrypted-wordpress-site-backups.html/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>SEO with Robots.txt</title>
		<link>http://www.askapache.com/seo/seo-with-robotstxt.html</link>
		<comments>http://www.askapache.com/seo/seo-with-robotstxt.html#comments</comments>
		<pubDate>Sat, 20 Oct 2007 16:59:38 +0000</pubDate>
		<dc:creator>AskApache</dc:creator>
				<category><![CDATA[Google]]></category>
		<category><![CDATA[SEO]]></category>
		<category><![CDATA[WordPress]]></category>
		<category><![CDATA[phpBB]]></category>
		<category><![CDATA[401]]></category>
		<category><![CDATA[403 Forbidden]]></category>
		<category><![CDATA[404 Not Found]]></category>
		<category><![CDATA[admin]]></category>
		<category><![CDATA[Advanced]]></category>
		<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Apache]]></category>
		<category><![CDATA[askapache]]></category>
		<category><![CDATA[ASP]]></category>
		<category><![CDATA[Bandwidth]]></category>
		<category><![CDATA[Blocking]]></category>
		<category><![CDATA[Boot]]></category>
		<category><![CDATA[Cache]]></category>
		<category><![CDATA[CSS]]></category>
		<category><![CDATA[Elite]]></category>
		<category><![CDATA[Email]]></category>
		<category><![CDATA[Examples]]></category>
		<category><![CDATA[feed]]></category>
		<category><![CDATA[GET]]></category>
		<category><![CDATA[Htaccess]]></category>
		<category><![CDATA[HTTP Headers]]></category>
		<category><![CDATA[HTTP Status Codes]]></category>
		<category><![CDATA[httpd]]></category>
		<category><![CDATA[HTTPS SSL]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Login]]></category>
		<category><![CDATA[Logs]]></category>
		<category><![CDATA[Nice]]></category>
		<category><![CDATA[Optimization]]></category>
		<category><![CDATA[Perl]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Port]]></category>
		<category><![CDATA[post]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[ram]]></category>
		<category><![CDATA[Redirect]]></category>
		<category><![CDATA[Robot]]></category>
		<category><![CDATA[robots]]></category>
		<category><![CDATA[robots.txt]]></category>
		<category><![CDATA[rsync]]></category>
		<category><![CDATA[server]]></category>
		<category><![CDATA[SSI]]></category>
		<category><![CDATA[stat]]></category>
		<category><![CDATA[Web Hosting]]></category>
		<category><![CDATA[Wget]]></category>
		<category><![CDATA[Windows XP]]></category>

		<guid isPermaLink="false">http://www.askapache.com.com/seo/seo-with-robotstxt.html</guid>
		<description><![CDATA[Very nice tutorial dealing with the robots.txt file.  Shows examples for Google and other search engines.  WordPress robots.txt and phpBB robots.txt sample files.]]></description>
			<content:encoded><![CDATA[<p><strong>robots.txt</strong> Search Engine Optimization simply means using a robots.txt file to control how search engines crawl your blog, WordPress, or phpBB site. <a href="http://www.askapache.com/seo/wordpress-robotstxt-optimized-for-seo.html">WordPress Optimized robots.txt and meta tags</a></p>
<hr />
<p class="anote">See the <a href="http://www.askapache.com/seo/updated-robotstxt-for-wordpress.html">Updated WordPress robots.txt file</a></p>
<ul class="TOCC">
<li><a href="http://www.askapache.com/seo/seo-with-robotstxt.html#google-robots-txt-recommendations" title="Google Robots.txt Recommendations">Google Robots.txt Info and Recommendations</a>
<ul>
<li><a href="http://www.askapache.com/seo/seo-with-robotstxt.html#googlebot-robots-txt" title="Googlebot and Robots.txt SEO Info">Googlebot and Robots.txt SEO Info</a></li>
<li><a href="http://www.askapache.com/seo/seo-with-robotstxt.html#google-user-agents" title="Google User-agents">Google User-agents</a></li>
<li><a href="http://www.askapache.com/seo/seo-with-robotstxt.html#removing-content-from-google" title="Removing Old/wrong content from google">Removing Old/wrong content from google</a></li>
<li><a href="http://www.askapache.com/seo/seo-with-robotstxt.html#google-robots-txt-articles" title="Google Robots.txt Articles">Google Robots.txt Articles</a></li>
</ul>
</li>
<li><a href="http://www.askapache.com/seo/seo-with-robotstxt.html#robots-txt-examples">Example Robots.txt files</a>
<ul>
<li><a href="http://www.askapache.com/seo/seo-with-robotstxt.html#robots-txt-file-for-wordpress" title="robots.txt for WordPress">robots.txt for WordPress</a></li>
<li><a href="http://www.askapache.com/seo/seo-with-robotstxt.html#robots-txt-file-for-phpbb" title="robots.txt for phpBB">robots.txt for phpBB</a></li>
</ul>
</li>
</ul>
<h2><a name="google-robots-txt-recommendations" id="google-robots-txt-recommendations" title="Google Robots.txt Recommendations">Google Robots.txt Info and Recommendations</a></h2>
<blockquote cite="http://www.google.com/support/webmasters/bin/answer.py?answer=35769">
<p>Make use of the robots.txt file on your web server. This file tells crawlers which directories can or cannot be crawled. Make sure it&#8217;s current for your site so that you don&#8217;t accidentally block the Googlebot crawler.</p>
</blockquote>
<h3><a name="googlebot-robots-txt" id="googlebot-robots-txt" title="Googlebot and Robots.txt SEO Info">Googlebot and Robots.txt SEO Info</a></h3>
<p>When deciding which pages to crawl, Googlebot goes in this order:</p>
<ol>
<li>Googlebot will obey the first record in the robots.txt file with a User-agent starting with &#8220;Googlebot.&#8221;</li>
<li>If no &#8220;Googlebot&#8221; User-agent exists, it will obey the first entry with a User-agent of &#8220;*&#8221;.</li>
</ol>
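<p>The two rules above can be sketched in a few lines of shell and awk. This is only an approximation of the order described here, not Google&#8217;s actual parser, and the function name <code>applicable_record</code> is hypothetical. It prints the User-agent value of the record Googlebot would obey for a given robots.txt file.</p>

```shell
#!/bin/bash
# Hypothetical helper: print which User-agent record Googlebot would obey,
# following the two rules listed above. Not Google's real implementation.
applicable_record() {
  awk '
    tolower($1) == "user-agent:" {
      agent = $2
      # Rule 1: remember the first record whose name starts with "Googlebot".
      if (chosen == "" && index(agent, "Googlebot") == 1) chosen = agent
      # Rule 2: note whether a catch-all record exists as a fallback.
      if (agent == "*") star = 1
    }
    END {
      if (chosen != "") print chosen
      else if (star) print "*"
      else print "none"
    }
  ' "$1"
}
```

<p>Running <code>applicable_record robots.txt</code> against your own file is a quick way to confirm which block of rules actually applies.</p>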
<h3><a name="google-user-agents" id="google-user-agents">Google User-agents</a></h3>
<dl>
<dt><a name="googlebot" id="googlebot">Googlebot</a></dt>
<dd>crawls pages for our web index and our news index</dd>
<dt><a name="googlebot-mobile" id="googlebot-mobile">Googlebot-Mobile</a></dt>
<dd>crawls pages for our mobile index</dd>
<dt><a name="googlebot-image" id="googlebot-image">Googlebot-Image</a></dt>
<dd>crawls pages for our image index</dd>
<dt><a name="mediapartners-google" id="mediapartners-google">Mediapartners-Google</a></dt>
<dd>crawls pages to determine AdSense content. We only use this bot to crawl your site if you show AdSense ads on your site.</dd>
<dt><a name="adsbot-google" id="adsbot-google">Adsbot-Google</a></dt>
<dd>crawls pages to measure AdWords landing page quality. We only use this bot if you use Google AdWords to advertise your site. Find out more about this bot and how to block it from portions of your site.</dd>
</dl>
<h3><a name="removing-content-from-google" id="removing-content-from-google" title="Removing Old/wrong content from google">Removing Old/wrong content from google</a></h3>
<ol>
<li>Create the new page</li>
<li>In .htaccess (if Linux) add a RedirectPermanent command</li>
<li>DO NOT DELETE THE OLD FILE</li>
<li>Update all the links on your website to point to the new page (change the link text while you&#8217;re at it)</li>
<li>Verify that no pages point to the old file (including your sitemap.xml)</li>
<li>Add a noindex,nofollow to the old file AND Disallow in your robots.txt</li>
<li>Submit your updated sitemap.xml to Google &#038; Yahoo</li>
<li>Wait a few weeks</li>
<li>When the new page appears in Google, it&#8217;s safe to delete the old one</li>
</ol>
<h3><a name="google-robots-txt-articles" id="google-robots-txt-articles" title="Google Robots.txt Articles">Google Sponsored Robots.txt Articles</a></h3>
<ol>
<li><a href="http://googleblog.blogspot.com/2007/01/controlling-how-search-engines-access.html" rel="nofollow"  title="1/26/2007, Dan Crow">Controlling how search engines access and index your website</a></li>
<li><a href="http://googleblog.blogspot.com/2007/02/robots-exclusion-protocol.html" rel="nofollow"  title="2/22/2007, Dan Crow">The Robots Exclusion Protocol</a></li>
<li><a href="http://sitemaps.blogspot.com/2006/02/analyzing-robotstxt-file.html" rel="nofollow" >robots.txt analysis tool</a></li>
<li><a href="http://www.google.com/webmasters/bot.html" rel="nofollow" >Googlebot</a></li>
<li><a href="http://sitemaps.blogspot.com/2006/02/using-robotstxt-file.html" rel="nofollow" >Inside Google Sitemaps: Using a robots.txt file</a></li>
<li><a href="http://googlewebmastercentral.blogspot.com/2006/08/all-about-googlebot.html" rel="nofollow" >All About Googlebot</a></li>
</ol>
<h2><a name="robots-txt-examples" id="robots-txt-examples">robots.txt examples</a></h2>
<h3><a name="robots-txt-file-for-wordpress" id="robots-txt-file-for-wordpress" title="robots.txt for WordPress">robots.txt for WordPress 2.+</a></h3>
<pre>User-agent:  *
# disallow all files in these directories
Disallow: /cgi-bin/
Disallow: /z/j/
Disallow: /z/c/
Disallow: /stats/
Disallow: /dh_
Disallow: /about/
Disallow: /contact/
Disallow: /tag/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /contact
Disallow: /manual
Disallow: /manual/*
Disallow: /phpmanual/
Disallow: /category/
&nbsp;
User-agent: Googlebot
# disallow all files ending with these extensions
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.wmv$
Disallow: /*.cgi$
Disallow: /*.xhtml$
&nbsp;
# disallow all files with ? in url
Disallow: /*?*
&nbsp;
# disable duggmirror
User-agent: duggmirror
Disallow: /
&nbsp;
# allow google image bot to search all images
User-agent: Googlebot-Image
Disallow:
Allow: /*
&nbsp;
# allow adsense bot on entire site
User-agent: Mediapartners-Google*
Disallow:
Allow: /*</pre>
<h3><a name="robots-txt-file-for-phpbb" id="robots-txt-file-for-phpbb" title="robots.txt for phpBB">robots.txt for phpBB</a></h3>
<pre>User-agent: *
Disallow: /cgi-bin/
Disallow: /phpbb/admin/
Disallow: /phpbb/cache/
Disallow: /phpbb/db/
Disallow: /phpbb/images/
Disallow: /phpbb/includes/
Disallow: /phpbb/language/
Disallow: /phpbb/templates/
Disallow: /phpbb/faq.php
Disallow: /phpbb/groupcp.php
Disallow: /phpbb/login.php
Disallow: /phpbb/memberlist.php
Disallow: /phpbb/modcp.php
Disallow: /phpbb/posting.php
Disallow: /phpbb/privmsg.php
Disallow: /phpbb/profile.php
Disallow: /phpbb/search.php
Disallow: /phpbb/viewonline.php
&nbsp;
User-agent: Googlebot
# disallow files ending with these extensions
Disallow: /*.inc$
Disallow: /*.js$
Disallow: /*.css$

# disallow URLs containing these query parameters
Disallow: *mark=*
Disallow: *view=*

# allow google image bot to search all images
User-agent: Googlebot-Image
Disallow:
Allow: /*
&nbsp;
# allow adsense bot on entire site
User-agent: Mediapartners-Google*
Disallow:
Allow: /*</pre>
<pre>User-agent: *
Disallow: /stats
Disallow: /dh_
Disallow: /V
Disallow: /z/j/
Disallow: /z/c/
Disallow: /cgi-bin/
Disallow: /viewtopic.php
Disallow: /viewforum.php
Disallow: /index.php?
Disallow: /posting.php
Disallow: /groupcp.php
Disallow: /search.php
Disallow: /login.php
Disallow: /post
Disallow: /member
Disallow: /profile.php
Disallow: /memberlist.php
Disallow: /faq.php
Disallow: /templates/
Disallow: /mx_
Disallow: /db/
Disallow: /admin/
Disallow: /cache/
Disallow: /images/
Disallow: /includes/
Disallow: /common.php
Disallow: /index.php
Disallow: /modcp.php
Disallow: /privmsg.php
Disallow: /viewonline.php
Disallow: /rss.php
&nbsp;
User-agent: Googlebot
# allow the sitemap, then disallow all files ending with these extensions
Allow: /sitemap.php
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.txt$

# disallow all files with ? in url
Disallow: /*?*
Disallow: /*?

# disallow all files in /wp- directories
Disallow: /wp-*/

# disallow archiving site
User-agent: ia_archiver
Disallow: /

# allow google image bot to search all images
User-agent: Googlebot-Image
Disallow:
Allow: /*.gif$
Allow: /*.png$
Allow: /*.jpeg$
Allow: /*.jpg$
Allow: /*.ico$
Allow: /images
Allow: /z/i/
&nbsp;
# allow adsense bot on entire site
User-agent: Mediapartners-Google*
Allow: /*</pre>
<h2>Pattern Matching with Google</h2>
<h3>Matching a sequence of characters using <code>*</code></h3>
<p>You can use an asterisk <code>*</code> to match a sequence of characters.</p>
<h4>Block access to all subdirectories that begin with <code>private</code></h4>
<pre>User-Agent: Googlebot
Disallow: /private*/</pre>
<h4>Block access to all URLs that include a <code>?</code></h4>
<pre>User-agent: *
Disallow: /*?*</pre>
<h3>Matching the end characters of the URL using <code>$</code></h3>
<p>You can use the <code>$</code> character to specify matching the end of the URL.</p>
<h4>Block any URLs that end with <code>.php</code></h4>
<pre>User-Agent: Googlebot
Disallow: /*.php$</pre>
<h5>You can use this pattern matching in combination with the Allow directive.</h5>
<h4>Exclude all URLs that contain a <code>?</code> so Googlebot doesn&#8217;t crawl duplicate pages, while still crawling URLs that end with a <code>?</code></h4>
<pre>User-agent: *
Allow: /*?$
Disallow: /*?</pre>
<p><code>Disallow: /*?</code> blocks any URL that begins with HOST, followed by any string, followed by a <code>?</code>, followed by any string.</p>
<p><code>Allow: /*?$</code> allows any URL that begins with HOST, followed by any string, followed by a <code>?</code>, with no characters after the <code>?</code>.</p>
<h6><a href="http://www.google.com/support/webmasters/bin/answer.py?answer=40367&#038;ctx=sibling" rel="nofollow"  title="I don't want to list every file that I want to block. Can I use pattern matching?">SRC</a></h6>
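<p>To try a pattern before putting it in robots.txt, the matching rules described above can be approximated in shell. This is a sketch, not Google&#8217;s matcher: the function name <code>robots_match</code> is hypothetical, and it assumes <code>sed</code> and <code>grep -E</code> are available. It treats <code>*</code> as any sequence of characters, a trailing <code>$</code> as the end of the URL, and everything else as a literal prefix.</p>

```shell
#!/bin/bash
# Hypothetical helper: succeed if a URL path matches a robots.txt pattern
# that uses the * and $ extensions. An approximation, not Google's code.
robots_match() {
  pattern=$1
  path=$2
  anchor=''
  # A trailing $ means the pattern must match up to the end of the URL.
  case "$pattern" in
    *'$') anchor='$'; pattern=${pattern%?} ;;
  esac
  # Escape regex metacharacters, then turn the robots.txt * into .*
  regex=$(printf '%s' "$pattern" | sed -e 's/[][\.^$+?(){}|]/\\&/g' -e 's/\*/.*/g')
  printf '%s\n' "$path" | grep -Eq "^${regex}${anchor}"
}
```

<p>For example, <code>robots_match '/*.php$' '/index.php'</code> succeeds, while the same pattern against <code>/index.php?x=1</code> fails because of the <code>$</code> anchor.</p>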
<h2>User-Agent Discussion</h2>
<h3>Blocking a specific User-Agent</h3>
<p><strong>Note</strong>: Blocking Googlebot blocks all bots that begin with &#8220;Googlebot&#8221;</p>
<h4>Block Googlebot entirely</h4>
<pre>User-agent: Googlebot
Disallow: /</pre>
<h3>Allowing a specific User-Agent</h3>
<p><strong>Note</strong>: Googlebot follows the line directed at it, rather than the line directed at everyone.</p>
<h4>Block access to all bots other than &#8220;Googlebot&#8221;</h4>
<pre>User-agent: *
Disallow: /
&nbsp;
User-agent: Googlebot
Disallow:</pre>
<p>Googlebot recognizes an extension to the robots.txt standard called Allow, which is the opposite of Disallow.</p>
<h4>Block all pages inside a subdirectory except for single file</h4>
<pre>User-Agent: Googlebot
Disallow: /folder1/
Allow: /folder1/myfile.html</pre>
<h4>Block Googlebot but allow another Google bot</h4>
<pre>User-agent: Googlebot
Disallow: /
&nbsp;
User-agent: Googlebot-Mobile
Allow: /</pre>
<h6><a href="http://www.google.com/support/webmasters/bin/answer.py?answer=40364&#038;topic=8846" rel="nofollow"  title="How Do I Block Googlebot?">SRC</a></h6>
<h2><a name="removing-content-from-google" id="removing-content-from-google" title="SEO: Removing Content From Google">Removing Content From Google</a></h2>
<p>If you want Google to drop pages that have already been indexed, it is better to add <code>&lt;meta name=&quot;Googlebot&quot; content=&quot;Follow,NoIndex&quot;&gt;</code> to those pages. This is much faster than blocking them with robots.txt.</p>
<p><strong>Note</strong>: removing snippets also removes cached pages.</p>
<p>A snippet is a text excerpt that appears below a page&#8217;s title in our search results and describes the content of the page.</p>
<h4>Prevent Google from displaying snippets for your page</h4>
<pre>&lt;META NAME=&quot;GOOGLEBOT&quot; CONTENT=&quot;NOSNIPPET&quot;&gt;</pre>
<h3>Remove an outdated &#8220;dead&#8221; link</h3>
<p>Google updates its entire index automatically on a regular basis. When we crawl the web, we find new pages, discard dead links, and update links automatically. Links that are outdated now will most likely &#8220;fade out&#8221; of our index during our next crawl.</p>
<p><strong>Note</strong>: Please ensure that you return a true 404 error even if you choose to display a more user-friendly body of the HTML page for your visitors. It won&#8217;t help to return a page that says &#8220;File Not Found&#8221; if the http headers still return a status code of 200, or normal.</p>
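<p>One way to check for this is to look at the status code the server really sends for a removed page. The sketch below assumes <code>curl</code> is installed; the function names <code>http_status</code> and <code>check_missing_page</code> are hypothetical.</p>

```shell
#!/bin/bash
# Hypothetical helpers for spotting a "File Not Found" page served with 200.

# Print only the HTTP status code for a URL (requires curl).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Describe what a status code means for a page that has been removed.
check_missing_page() {
  case "$1" in
    404) echo "ok: removed page returns a true 404" ;;
    200) echo "problem: page is gone but the server still answers 200" ;;
    *) echo "unexpected status: $1" ;;
  esac
}
```

<p>Usage would be something like <code>check_missing_page "$(http_status http://yourserver.com/old-page.html)"</code>, where the URL is a placeholder for a page you have deleted.</p>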
<h3>Remove cached pages</h3>
<p>Google automatically takes a &#8220;snapshot&#8221; of each page it crawls and archives it. This &#8220;cached&#8221; version allows a webpage to be retrieved for your end users if the original page is ever unavailable (due to temporary failure of the page&#8217;s web server). The cached page appears to users exactly as it looked when Google last crawled it, and we display a message at the top of the page to indicate that it&#8217;s a cached version. Users can access the cached version by choosing the &#8220;Cached&#8221; link on the search results page.</p>
<h4>Prevent all search engines from showing a &#8220;Cached&#8221; link for your site</h4>
<pre>&lt;meta name=&quot;robots&quot; content=&quot;noarchive&quot; /&gt;</pre>
<h4>Allow other search engines to show a &#8220;Cached&#8221; link, preventing only Google</h4>
<pre>&lt;meta name=&quot;googlebot&quot; content=&quot;noarchive&quot; /&gt;</pre>
<p><strong>Note</strong>: this tag only removes the &#8220;Cached&#8221; link for the page. Google will continue to index the page and display a snippet.</p>
<h2>Remove your entire website</h2>
<p>To exclude your entire website from Google&#8217;s index, place a robots.txt file like the ones below in the root of your server.</p>
<h4>Remove site from search engines and prevent all robots from crawling it in the future</h4>
<pre>User-agent: *
Disallow: /</pre>
<p><strong>Note</strong>: Googlebot does not interpret a 401/403 response (&#8220;Unauthorized&#8221;/&#8220;Forbidden&#8221;) to a robots.txt fetch as a request not to crawl any pages on the site.</p>
<h4>To remove your site from Google only and prevent just Googlebot from crawling your site in the future</h4>
<pre>User-agent: Googlebot
Disallow: /</pre>
<h3>Allow Googlebot to index all http pages but no https pages</h3>
<p>Each port must have its own robots.txt file. In particular, if you serve content via both http and https, you&#8217;ll need a separate robots.txt file for each of these protocols.</p>
<h4>For your http protocol (http://yourserver.com/robots.txt)</h4>
<pre>User-agent: *
Allow: /</pre>
<h4>For the https protocol (https://yourserver.com/robots.txt)</h4>
<pre>User-agent: *
Disallow: /</pre>
<h2>Remove part of your website</h2>
<h3>Option 1: Robots.txt</h3>
<h4>Remove all pages under a particular directory (for example, lems)</h4>
<pre>User-agent: Googlebot
Disallow: /lems</pre>
<h4>Remove all files of a specific file type (for example, .gif)</h4>
<pre>User-agent: Googlebot
Disallow: /*.gif$</pre>
<h4>To remove dynamically generated pages, you&#8217;d use this robots.txt entry</h4>
<pre>User-agent: Googlebot
Disallow: /*?</pre>
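<p>The <code>*</code> and <code>$</code> characters in the entries above are pattern-matching extensions rather than part of the original 1994 robots.txt standard, so it helps to test what a pattern really blocks before deploying it. The following Python sketch is only an approximation of that matching, not Googlebot&#8217;s actual code; the helper names <code>pattern_to_regex</code> and <code>is_blocked</code> and the sample paths are made up for illustration.</p>

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the
    end of the URL; everything else is matched literally as a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path, disallow_patterns):
    """Return True if any Disallow pattern matches the URL path."""
    return any(pattern_to_regex(p).match(path) for p in disallow_patterns)


# The three Disallow values used in the examples above.
rules = ["/lems", "/*.gif$", "/*?"]
for path in ["/lems/page.html", "/images/dog.gif", "/images/dog.gif.html",
             "/forum/viewtopic.php?t=1", "/about.html"]:
    print(path, is_blocked(path, rules))
```

<p>Note that <code>/images/dog.gif.html</code> is not blocked by <code>/*.gif$</code>, because the <code>$</code> requires the URL to end in <code>.gif</code>.</p>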
<h3>Option 2: Meta tags</h3>
<p>Another standard, which can be more convenient for page-by-page use, involves adding a META tag to an HTML page to tell robots not to index the page. This standard is described at http://www.robotstxt.org/wc/exclusion.html#meta.</p>
<h4>Prevent all robots from indexing a page on your site</h4>
<pre>&lt;meta name=&quot;robots&quot; content=&quot;noindex, nofollow&quot; /&gt;</pre>
<h4>Allow other robots to index the page on your site, preventing only Google&#8217;s robots from indexing the page</h4>
<pre>&lt;meta name=&quot;googlebot&quot; content=&quot;noindex, nofollow&quot; /&gt;</pre>
<h4>Allow robots to index the page on your site but instruct them not to follow outgoing links</h4>
<pre>&lt;meta name=&quot;robots&quot; content=&quot;nofollow&quot; /&gt;</pre>
<h3>Remove an image from Google&#8217;s Image Search</h3>
<h4>Want Google to exclude the dogs.jpg image that appears on your site at www.yoursite.com/images/dogs.jpg</h4>
<pre>User-agent: Googlebot-Image
Disallow: /images/dogs.jpg</pre>
<h4>Remove all the images on your site from our index</h4>
<pre>User-agent: Googlebot-Image
Disallow: /</pre>
<h4>Remove all files of a specific file type (for example, to include .jpg but not .gif images)</h4>
<pre>User-agent: Googlebot-Image
Disallow: /*.gif$</pre>
<p><strong>Note</strong>: If you believe your request is urgent and cannot wait until the next time Google crawls your site, use our automatic URL removal system. In order for this automated process to work, the webmaster or <a href="http://www.teknicks.com/" rel="nofollow"  title="SEO agency">SEO agency</a> must first create and place a robots.txt file on the site in question.</p>
<p>Google will continue to exclude your site or directories from successive crawls if the robots.txt file exists in the web server root. If you do not have access to the root level of your server, you may place a robots.txt file at the same level as the files you want to remove. Doing this and submitting via the automatic URL removal system will cause a temporary, 180-day removal of the directories specified in your robots.txt file from the Google index, regardless of whether you remove the robots.txt file after processing your request. (Keeping the robots.txt file at the same level would require you to return to the URL removal system every 180 days to reissue the removal.)</p>
<h3>Remove a blog from Blog Search</h3>
<p>Only blogs with site feeds will be included in Blog Search. If you&#8217;d like to prevent your feed from being crawled, make use of a robots.txt file or meta tags (NOINDEX or NOFOLLOW), as described above. Please note that if you have a feed that was previously included, the old posts will remain in the index even though new ones will not be added.</p>
<h3>Remove an RSS or Atom feed</h3>
<p>When users add your feed to their Google homepage or Google Reader, Google&#8217;s Feedfetcher attempts to obtain the content of the feed in order to display it. Since Feedfetcher requests come from explicit action by human users, Feedfetcher has been designed to ignore robots.txt guidelines.</p>
<p>It&#8217;s not possible for Google to restrict access to a publicly available feed. If your feed is provided by a blog hosting service, you should work with them to restrict access to your feed. Check those sites&#8217; help content for more information (e.g., Blogger, LiveJournal, or Typepad).</p>
<h3>Remove transcoded pages</h3>
<p>Google Web Search on mobile phones allows users to search all the content in the Google index for desktop web browsers. Because this content isn&#8217;t written specifically for mobile phones and devices and thus might not display properly, Google automatically translates (or &#8220;transcodes&#8221;) these pages by analyzing the original HTML code and converting it to a mobile-ready format. To ensure that the highest quality and most usable web page is displayed on your mobile phone or device, Google may resize, adjust, or convert images, text formatting and/or certain aspects of web page functionality.</p>
<h6>SRC</h6>
<p>To save bandwidth, Googlebot only downloads the robots.txt file once a day or whenever we&#8217;ve fetched many pages from the server. So, it may take a while for Googlebot to learn of changes to your robots.txt file. Also, Googlebot is distributed on several machines. Each of these keeps its own record of your robots.txt file.</p>
<p>Also, there&#8217;s a small difference between the way Googlebot handles the robots.txt file and the way the robots.txt standard says we should (keeping in mind the distinction between &#8220;should&#8221; and &#8220;must&#8221;). The standard says we should obey the first applicable rule, whereas Googlebot obeys the longest (that is, the most specific) applicable rule. This more intuitive practice matches what people actually do, and what they expect us to do.</p>
<h4>For example, consider the following robots.txt file:</h4>
<pre>User-Agent: *
Allow: /
Disallow: /cgi-bin</pre>
<p>It&#8217;s obvious that the webmaster&#8217;s intent here is to allow robots to crawl everything except the /cgi-bin directory. Consequently, that&#8217;s what we do.</p>
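<p>The difference between the two precedence rules can be illustrated with a short Python sketch. The function names <code>allowed</code> and <code>allowed_first_match</code> and the rule representation are assumptions made for this example; it ignores wildcards and is not Googlebot&#8217;s actual implementation.</p>

```python
def allowed(path, rules):
    """Longest-match precedence: the most specific matching rule wins.

    rules is a list of (directive, prefix) pairs, for example
    ("Disallow", "/cgi-bin"). Order in the file does not matter.
    """
    matches = [(len(prefix), directive)
               for directive, prefix in rules
               if prefix and path.startswith(prefix)]
    if not matches:
        return True  # no rule applies, so crawling is allowed
    longest = max(matches, key=lambda m: m[0])
    return longest[1].lower() == "allow"


def allowed_first_match(path, rules):
    """First-match precedence: the first applicable rule in the file wins."""
    for directive, prefix in rules:
        if prefix and path.startswith(prefix):
            return directive.lower() == "allow"
    return True


# The example file above: Allow: / followed by Disallow: /cgi-bin
rules = [("Allow", "/"), ("Disallow", "/cgi-bin")]
print(allowed("/cgi-bin/script.pl", rules))
print(allowed_first_match("/cgi-bin/script.pl", rules))
```

<p>With longest-match the <code>/cgi-bin</code> path is blocked, which is what the webmaster intended. With first-match the earlier <code>Allow: /</code> line would let it through.</p>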
<h3>Tell googlebot not to count certain external links in your ranking</h3>
<p>Meta tags can exclude all outgoing links on a page, but you can also instruct Googlebot not to crawl individual links by adding <code>rel=&quot;nofollow&quot;</code> to a hyperlink. When Google sees the attribute <code>rel=&quot;nofollow&quot;</code> on hyperlinks, those links won&#8217;t get any credit when we rank websites in our search results. For example, a link such as <code>&lt;a href=&quot;http://www.example.com/&quot;&gt;This is a great link!&lt;/a&gt;</code> could be replaced with <code>&lt;a href=&quot;http://www.example.com/&quot; rel=&quot;nofollow&quot;&gt;I can&#8217;t vouch for this link&lt;/a&gt;</code>.</p>
<h2>Other Links</h2>
<ol>
<li>Database of Web Robots, Overview</li>
</ol>
<hr class="HR0" />
<pre># robots.txt, www.nytimes.com 6/29/2006
User-agent: *
Disallow: /pages/college/
Disallow: /college/
Disallow: /library/
Disallow: /learning/
Disallow: /aponline/
Disallow: /reuters/
Disallow: /cnet/
Disallow: /partners/
Disallow: /archives/
Disallow: /indexes/
Disallow: /thestreet/
Disallow: /nytimes-partners/
Disallow: /financialtimes/
Allow: /pages/
Allow: /2003/
Allow: /2004/
Allow: /2005/
Allow: /top/
Allow: /ref/
Allow: /services/xml/
&nbsp;
User-agent: Mediapartners-Google*
Disallow:
&nbsp;
# robots.txt, http://dictionary.reference.com
User-agent: Googlebot
Disallow:
&nbsp;
User-agent: Mediapartners-Google
Disallow:
&nbsp;
User-agent: Teleport Pro
Disallow: /
&nbsp;
User-agent: *
Disallow: /cgi-bin/
&nbsp;
# robots.txt for www.phpbbhacks.com
User-agent: *
Disallow: /forums/viewtopic.php
Disallow: /forums/viewforum.php
Disallow: /forums/index.php?
Disallow: /forums/posting.php
Disallow: /forums/groupcp.php
Disallow: /forums/search.php
Disallow: /forums/login.php
Disallow: /forums/privmsg.php
Disallow: /forums/post
Disallow: /forums/profile.php
Disallow: /forums/memberlist.php
Disallow: /forums/faq.php
Disallow: /forums/archive
&nbsp;
# robots.txt for Slashdot.org
#
# &quot;Any empty [Disallow] value, indicates that all URLs can be retrieved.
# At least one Disallow field needs to be present in a record.&quot;
&nbsp;
User-agent: Mediapartners-Google
Disallow:
&nbsp;
User-agent: Googlebot
Crawl-delay: 100
Disallow: /firehose.pl
Disallow: /submit.pl
Disallow: /comments.pl
Disallow: /users.pl
Disallow: /zoo.pl
Disallow: firehose.pl
Disallow: submit.pl
Disallow: comments.pl
Disallow: users.pl
Disallow: zoo.pl
Disallow: /~
Disallow: ~
&nbsp;
User-agent: Slurp
Crawl-delay: 100
Disallow:
&nbsp;
User-agent: Yahoo-NewsCrawler
Disallow:
&nbsp;
User-Agent: msnbot
Crawl-delay: 100
Disallow:
&nbsp;
User-agent: *
Crawl-delay: 100
Disallow: /authors.pl
Disallow: /index.pl
Disallow: /article.pl
Disallow: /comments.pl
Disallow: /firehose.pl
Disallow: /journal.pl
Disallow: /messages.pl
Disallow: /metamod.pl
Disallow: /users.pl
Disallow: /search.pl
Disallow: /submit.pl
Disallow: /pollBooth.pl
Disallow: /pubkey.pl
Disallow: /topics.pl
Disallow: /zoo.pl
Disallow: /palm
Disallow: authors.pl
Disallow: index.pl
Disallow: article.pl
Disallow: comments.pl
Disallow: firehose.pl
Disallow: journal.pl
Disallow: messages.pl
Disallow: metamod.pl
Disallow: users.pl
Disallow: search.pl
Disallow: submit.pl
Disallow: pollBooth.pl
Disallow: pubkey.pl
Disallow: topics.pl
Disallow: zoo.pl
Disallow: /~
Disallow: ~
&nbsp;
# robots.txt for http://www.myspace.com
User-agent: ia_archiver
Disallow: /
&nbsp;
# robots.txt for http://www.craigslist.com
User-agent: YahooFeedSeeker
Disallow: /forums
Disallow: /res/
Disallow: /post
Disallow: /email.friend
Disallow: /?flagCode
Disallow: /ccc
Disallow: /hhh
Disallow: /sss
Disallow: /bbb
Disallow: /ggg
Disallow: /jjj
&nbsp;
User-agent: *
Disallow: /cgi-bin
Disallow: /cgi-secure
Disallow: /forums
Disallow: /search
Disallow: /res/
Disallow: /post
Disallow: /email.friend
Disallow: /?flagCode
Disallow: /ccc
Disallow: /hhh
Disallow: /sss
Disallow: /bbb
Disallow: /ggg
Disallow: /jjj
&nbsp;
User-Agent: OmniExplorer_Bot
Disallow: /
&nbsp;
# robots.txt for http://www.alexa.com
User-agent: googlebot  # allow Google crawler
Disallow: /search
&nbsp;
User-agent: gulliver  # allow Northern Light crawler
Disallow: /search
&nbsp;
User-agent: slurp  # allow Inktomi crawler
Disallow: /search
&nbsp;
User-agent: fast  # allow FAST crawler
Disallow: /search
&nbsp;
User-agent: scooter  # allow AltaVista crawler
Disallow: /search
&nbsp;
User-agent: vscooter  # allow AltaVista image crawler
Disallow: /search
&nbsp;
User-agent: ia_archiver  # allow Internet Archive crawler
Disallow: /search
&nbsp;
User-agent: *    # Disallow all other crawlers access
Disallow: /
&nbsp;
# robots.txt for http://www.technorati.com
User-agent: NPBot
Disallow: /
&nbsp;
User-agent: TurnitinBot
Disallow: /
&nbsp;
User-Agent: sitecheck.internetseer.com
Disallow: /
&nbsp;
User-Agent: *
Crawl-Delay: 3
Disallow: /search/
Disallow: /search.php
Disallow: /cosmos.php
&nbsp;
# robots.txt for www.sitepoint.com
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /forums/report.php
Disallow: /forums/search.php
Disallow: /forums/newreply.php
Disallow: /forums/editpost.php
Disallow: /forums/memberlist.php
Disallow: /forums/profile.php
Disallow: /launch/
Disallow: /search/
Disallow: /voucher/424/
Disallow: /email/
Disallow: /feedback/
Disallow: /contact?reason=articlesuggest
Disallow: /linktothis/
Disallow: /popup/
Disallow: /forums/archive/
&nbsp;
# robots.txt for http://www.w3.org/
&nbsp;
# For use by search.w3.org
User-agent: W3C-gsa
Disallow: /Out-Of-Date
&nbsp;
User-agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT; MS Search 4.0 Robot)
Disallow: /
&nbsp;
# W3C Link checker
User-agent: W3C-checklink
Disallow:
&nbsp;
# exclude some access-controlled areas
User-agent: *
Disallow: /2004/ontaria/basic
Disallow: /Team
Disallow: /Project
Disallow: /Systems
Disallow: /Web
Disallow: /History
Disallow: /Out-Of-Date
Disallow: /2002/02/mid
Disallow: /mid/
Disallow: /People/all/
Disallow: /RDF/Validator/ARPServlet
Disallow: /2003/03/Translations/byLanguage
Disallow: /2003/03/Translations/byTechnology
Disallow: /2005/11/Translations/Query
Disallow: /2003/glossary/subglossary/
#Disallow: /2005/06/blog/
#Disallow: /2001/07/pubrules-checker
#shouldnt get transparent proxies but will ml links of things like pubrules
Disallow: /2000/06/webdata/xslt
Disallow: /2000/09/webdata/xslt
Disallow: /2005/08/online_xslt/xslt
Disallow: /Bugs/
Disallow: /Search/Mail/Public/
Disallow: /2006/02/chartergen
&nbsp;
# robots.txt for www.google-analytics.com
User-Agent: *
Disallow: /
Noindex: /
&nbsp;
# robots.txt for video.google.com
User-agent: *
Disallow: /videosearch?
Disallow: /videofeed?
Disallow: /videopreview?
Disallow: /videopreviewbig?
Disallow: /videoprograminfo?
Disallow: /videorandom
Disallow: /videolineup
Disallow: /downloadgvp
&nbsp;
# robots.txt for www.google.com
User-agent: *
Allow: /searchhistory/
Disallow: /news?output=xhtml&amp;
Allow: /news?output=xhtml
Disallow: /search
Disallow: /groups
Disallow: /images
Disallow: /catalogs
Disallow: /catalogues
Disallow: /news
Disallow: /nwshp
Disallow: /?
Disallow: /addurl/image?
Disallow: /pagead/
Disallow: /relpage/
Disallow: /relcontent
Disallow: /sorry/
Disallow: /imgres
Disallow: /keyword/
Disallow: /u/
Disallow: /univ/
Disallow: /cobrand
Disallow: /custom
Disallow: /advanced_group_search
Disallow: /advanced_search
Disallow: /googlesite
Disallow: /preferences
Disallow: /setprefs
Disallow: /swr
Disallow: /url
Disallow: /m?
Disallow: /m/search?
Disallow: /wml?
Disallow: /wml/search?
Disallow: /xhtml?
Disallow: /xhtml/search?
Disallow: /xml?
Disallow: /imode?
Disallow: /imode/search?
Disallow: /jsky?
Disallow: /jsky/search?
Disallow: /pda?
Disallow: /pda/search?
Disallow: /sprint_xhtml
Disallow: /sprint_wml
Disallow: /pqa
Disallow: /palm
Disallow: /gwt/
Disallow: /purchases
Disallow: /hws
Disallow: /bsd?
Disallow: /linux?
Disallow: /mac?
Disallow: /microsoft?
Disallow: /unclesam?
Disallow: /answers/search?q=
Disallow: /local?
Disallow: /local_url
Disallow: /froogle?
Disallow: /froogle_
Disallow: /print
Disallow: /books
Disallow: /patents?
Disallow: /scholar?
Disallow: /complete
Disallow: /sponsoredlinks
Disallow: /videosearch?
Disallow: /videopreview?
Disallow: /videoprograminfo?
Disallow: /maps?
Disallow: /translate?
Disallow: /ie?
Disallow: /sms/demo?
Disallow: /katrina?
Disallow: /blogsearch?
Disallow: /blogsearch/
Disallow: /blogsearch_feeds
Disallow: /advanced_blog_search
Disallow: /reader/
Disallow: /uds/
Disallow: /chart?
Disallow: /transit?
Disallow: /mbd?
Disallow: /extern_js/
Disallow: /calendar/feeds/
Disallow: /calendar/ical/
Disallow: /cl2/feeds/
Disallow: /cl2/ical/
Disallow: /coop/directory
Disallow: /coop/manage
Disallow: /trends?
Disallow: /trends/music?
Disallow: /notebook/search?
Disallow: /music
Disallow: /browsersync
Disallow: /call
Disallow: /archivesearch?
Disallow: /archivesearch/url
Disallow: /archivesearch/advanced_search
Disallow: /base/search?
Disallow: /base/reportbadoffer
Disallow: /base/s2
Disallow: /urchin_test/
Disallow: /movies?
Disallow: /codesearch?
Disallow: /codesearch/feeds/search?
Disallow: /wapsearch?
Disallow: /safebrowsing
Disallow: /finance
Disallow: /reviews/search?
&nbsp;
# robots.txt for validator.w3.org
# $Id: robots.txt,v 1.3 2000/12/13 13:04:09 gerald Exp $
&nbsp;
User-agent: *
Disallow: /check
&nbsp;
# robots.txt for httpd.apache.org
User-agent: *
Disallow: /websrc
&nbsp;
# robots.txt for www.apache.org
User-agent: *
Disallow: /websrc
Crawl-Delay: 4</pre>
<hr class="HR0" />
<pre>#  Please, we do NOT allow nonauthorized robots.
#  http://www.webmasterworld.com/robots
#  Actual robots can always be found here for: http://www.webmasterworld.com/robots2
#  Old full robots.txt can be found here: http://www.webmasterworld.com/robots3
#  Any unauthorized bot running will result in IP&#039;s being banned.
#  Agent spoofing is considered a bot.
#  Fair warning to the clueless - honey pots are - and have been - running.
#  If you have been banned for bot running - please sticky an admin for a reinclusion request.
#  http://www.searchengineworld.com/robots/
#  This code found here: http://www.webmasterworld.com/robots.txt?view=rawcode
&nbsp;
User-agent: *
Crawl-delay: 17
&nbsp;
User-agent: *
Disallow: /gfx/
Disallow: /cgi-bin/
Disallow: /QuickSand/
Disallow: /pda/
Disallow: /zForumFFFFFF/</pre>
<pre># WebmasterWorld.com: robots.txt
# GNU Robots.txt Feel free to use with credit
# given to WebmasterWorld.
# Please, we do NOT allow nonauthorized robots any longer.
# http://www.searchengineworld.com/robots/
# Yes, feel free to copy and use the following.
&nbsp;
User-agent: OmniExplorer_Bot
Disallow: /
&nbsp;
User-agent: FreeFind
Disallow: /
&nbsp;
User-agent: BecomeBot
Disallow: /
&nbsp;
User-agent: Nutch
Disallow: /
&nbsp;
User-agent: Jetbot/1.0
Disallow: /
&nbsp;
User-agent: Jetbot
Disallow: /
&nbsp;
User-agent: WebVac
Disallow: /
&nbsp;
User-agent: Stanford
Disallow: /
&nbsp;
User-agent: naver
Disallow: /
&nbsp;
User-agent: dumbot
Disallow: /
&nbsp;
User-agent: Hatena Antenna
Disallow: /
&nbsp;
User-agent: grub-client
Disallow: /
&nbsp;
User-agent: grub
Disallow: /
&nbsp;
User-agent: looksmart
Disallow: /
&nbsp;
User-agent: WebZip
Disallow: /
&nbsp;
User-agent: larbin
Disallow: /
&nbsp;
User-agent: b2w/0.1
Disallow: /
&nbsp;
User-agent: Copernic
Disallow: /
&nbsp;
User-agent: psbot
Disallow: /
&nbsp;
User-agent: Python-urllib
Disallow: /
&nbsp;
User-agent: Googlebot-Image
Disallow: /
&nbsp;
User-agent: NetMechanic
Disallow: /
&nbsp;
User-agent: URL_Spider_Pro
Disallow: /
&nbsp;
User-agent: CherryPicker
Disallow: /
&nbsp;
User-agent: EmailCollector
Disallow: /
&nbsp;
User-agent: EmailSiphon
Disallow: /
&nbsp;
User-agent: WebBandit
Disallow: /
&nbsp;
User-agent: EmailWolf
Disallow: /
&nbsp;
User-agent: ExtractorPro
Disallow: /
&nbsp;
User-agent: CopyRightCheck
Disallow: /
&nbsp;
User-agent: Crescent
Disallow: /
&nbsp;
User-agent: SiteSnagger
Disallow: /
&nbsp;
User-agent: ProWebWalker
Disallow: /
&nbsp;
User-agent: CheeseBot
Disallow: /
&nbsp;
User-agent: LNSpiderguy
Disallow: /
&nbsp;
User-agent: Mozilla
Disallow: /
&nbsp;
User-agent: mozilla
Disallow: /
&nbsp;
User-agent: mozilla/3
Disallow: /
&nbsp;
User-agent: mozilla/4
Disallow: /
&nbsp;
User-agent: mozilla/5
Disallow: /
&nbsp;
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows NT)
Disallow: /
&nbsp;
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 95)
Disallow: /
&nbsp;
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 98)
Disallow: /
&nbsp;
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows XP)
Disallow: /
&nbsp;
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 2000)
Disallow: /
&nbsp;
User-agent: ia_archiver
Disallow: /
&nbsp;
User-agent: ia_archiver/1.6
Disallow: /
&nbsp;
User-agent: Alexibot
Disallow: /
&nbsp;
User-agent: Teleport
Disallow: /
&nbsp;
User-agent: TeleportPro
Disallow: /
&nbsp;
User-agent: Stanford Comp Sci
Disallow: /
&nbsp;
User-agent: MIIxpc
Disallow: /
&nbsp;
User-agent: Telesoft
Disallow: /
&nbsp;
User-agent: Website Quester
Disallow: /
&nbsp;
User-agent: moget/2.1
Disallow: /
&nbsp;
User-agent: WebZip/4.0
Disallow: /
&nbsp;
User-agent: WebStripper
Disallow: /
&nbsp;
User-agent: WebSauger
Disallow: /
&nbsp;
User-agent: WebCopier
Disallow: /
&nbsp;
User-agent: NetAnts
Disallow: /
&nbsp;
User-agent: Mister PiX
Disallow: /
&nbsp;
User-agent: WebAuto
Disallow: /
&nbsp;
User-agent: TheNomad
Disallow: /
&nbsp;
User-agent: WWW-Collector-E
Disallow: /
&nbsp;
User-agent: RMA
Disallow: /
&nbsp;
User-agent: libWeb/clsHTTP
Disallow: /
&nbsp;
User-agent: asterias
Disallow: /
&nbsp;
User-agent: httplib
Disallow: /
&nbsp;
User-agent: turingos
Disallow: /
&nbsp;
User-agent: spanner
Disallow: /
&nbsp;
User-agent: InfoNaviRobot
Disallow: /
&nbsp;
User-agent: Harvest/1.5
Disallow: /
&nbsp;
User-agent: Bullseye/1.0
Disallow: /
&nbsp;
User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
Disallow: /
&nbsp;
User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
Disallow: /
&nbsp;
User-agent: CherryPickerSE/1.0
Disallow: /
&nbsp;
User-agent: CherryPickerElite/1.0
Disallow: /
&nbsp;
User-agent: WebBandit/3.50
Disallow: /
&nbsp;
User-agent: NICErsPRO
Disallow: /
&nbsp;
User-agent: Microsoft URL Control - 5.01.4511
Disallow: /
&nbsp;
User-agent: DittoSpyder
Disallow: /
&nbsp;
User-agent: Foobot
Disallow: /
&nbsp;
User-agent: WebmasterWorldForumBot
Disallow: /
&nbsp;
User-agent: SpankBot
Disallow: /
&nbsp;
User-agent: BotALot
Disallow: /
&nbsp;
User-agent: lwp-trivial/1.34
Disallow: /
&nbsp;
User-agent: lwp-trivial
Disallow: /
&nbsp;
User-agent: http://www.WebmasterWorld.com bot
Disallow: /
&nbsp;
User-agent: BunnySlippers
Disallow: /
&nbsp;
User-agent: Microsoft URL Control - 6.00.8169
Disallow: /
&nbsp;
User-agent: URLy Warning
Disallow: /
&nbsp;
User-agent: Wget/1.6
Disallow: /
&nbsp;
User-agent: Wget/1.5.3
Disallow: /
&nbsp;
User-agent: Wget
Disallow: /
&nbsp;
User-agent: LinkWalker
Disallow: /
&nbsp;
User-agent: cosmos
Disallow: /
&nbsp;
User-agent: moget
Disallow: /
&nbsp;
User-agent: hloader
Disallow: /
&nbsp;
User-agent: humanlinks
Disallow: /
&nbsp;
User-agent: LinkextractorPro
Disallow: /
&nbsp;
User-agent: Offline Explorer
Disallow: /
&nbsp;
User-agent: Mata Hari
Disallow: /
&nbsp;
User-agent: LexiBot
Disallow: /
&nbsp;
User-agent: Web Image Collector
Disallow: /
&nbsp;
User-agent: The Intraformant
Disallow: /
&nbsp;
User-agent: True_Robot/1.0
Disallow: /
&nbsp;
User-agent: True_Robot
Disallow: /
&nbsp;
User-agent: BlowFish/1.0
Disallow: /
&nbsp;
User-agent: http://www.SearchEngineWorld.com bot
Disallow: /
&nbsp;
User-agent: http://www.WebmasterWorld.com bot
Disallow: /
&nbsp;
User-agent: JennyBot
Disallow: /
&nbsp;
User-agent: MIIxpc/4.2
Disallow: /
&nbsp;
User-agent: BuiltBotTough
Disallow: /
&nbsp;
User-agent: ProPowerBot/2.14
Disallow: /
&nbsp;
User-agent: BackDoorBot/1.0
Disallow: /
&nbsp;
User-agent: toCrawl/UrlDispatcher
Disallow: /
&nbsp;
User-agent: WebEnhancer
Disallow: /
&nbsp;
User-agent: suzuran
Disallow: /
&nbsp;
User-agent: VCI WebViewer VCI WebViewer Win32
Disallow: /
&nbsp;
User-agent: VCI
Disallow: /
&nbsp;
User-agent: Szukacz/1.4
Disallow: /
&nbsp;
User-agent: QueryN Metasearch
Disallow: /
&nbsp;
User-agent: Openfind data gathere
Disallow: /
&nbsp;
User-agent: Openfind
Disallow: /
&nbsp;
User-agent: Xenu&#039;s Link Sleuth 1.1c
Disallow: /
&nbsp;
User-agent: Xenu&#039;s
Disallow: /
&nbsp;
User-agent: Zeus
Disallow: /
&nbsp;
User-agent: RepoMonkey Bait &amp; Tackle/v1.01
Disallow: /
&nbsp;
User-agent: RepoMonkey
Disallow: /
&nbsp;
User-agent: Microsoft URL Control
Disallow: /
&nbsp;
User-agent: Openbot
Disallow: /
&nbsp;
User-agent: URL Control
Disallow: /
&nbsp;
User-agent: Zeus Link Scout
Disallow: /
&nbsp;
User-agent: Zeus 32297 Webster Pro V2.9 Win32
Disallow: /
&nbsp;
User-agent: Webster Pro
Disallow: /
&nbsp;
User-agent: EroCrawler
Disallow: /
&nbsp;
User-agent: LinkScan/8.1a Unix
Disallow: /
&nbsp;
User-agent: Keyword Density/0.9
Disallow: /
&nbsp;
User-agent: Kenjin Spider
Disallow: /
&nbsp;
User-agent: Iron33/1.0.2
Disallow: /
&nbsp;
User-agent: Bookmark search tool
Disallow: /
&nbsp;
User-agent: GetRight/4.2
Disallow: /
&nbsp;
User-agent: FairAd Client
Disallow: /
&nbsp;
User-agent: Gaisbot
Disallow: /
&nbsp;
User-agent: Aqua_Products
Disallow: /
&nbsp;
User-agent: Radiation Retriever 1.1
Disallow: /
&nbsp;
User-agent: WebmasterWorld Extractor
Disallow: /
&nbsp;
User-agent: Flaming AttackBot
Disallow: /
&nbsp;
User-agent: Oracle Ultra Search
Disallow: /
&nbsp;
User-agent: MSIECrawler
Disallow: /
&nbsp;
User-agent: PerMan
Disallow: /
&nbsp;
User-agent: searchpreview
Disallow: /
&nbsp;
User-agent: sootle
Disallow: /
&nbsp;
User-agent: es
Disallow: /
&nbsp;
User-agent: Enterprise_Search/1.0
Disallow: /
&nbsp;
User-agent: Enterprise_Search
Disallow: /
&nbsp;
User-agent: *
Disallow: /gfx/
Disallow: /cgi-bin/
Disallow: /QuickSand/
Disallow: /pda/
Disallow: /zForumFFFFFF/</pre>
<hr class="HR0" />
<p>You don&#8217;t have to block your feeds from being indexed. Matt Cutts himself suggested not blocking them because there is no real reason to do so. For a blog and its /feed and similar URLs, leaving them crawlable won&#8217;t hurt your rankings, so my suggestion would be not to touch those feeds.</p>
<p>As for blocking, it won&#8217;t have any effect on your main URL if you block /feed or whatever URLs you want. Googlebot will just deindex them and stop crawling them.</p>
<h2>Wget Robot Exclusion</h2>
<p>It is extremely easy to make Wget wander aimlessly around a web site, sucking all the available data in progress. `wget -r site&#8217;, and you&#8217;re set. Great? Not for the server admin.</p>
<p>As long as Wget is only retrieving static pages, and doing it at a reasonable rate (see the <code>--wait</code> option), there&#8217;s not much of a problem. The trouble is that Wget can&#8217;t tell the difference between the smallest static page and the most demanding CGI. A site I know has a section handled by an, uh, bitchin&#8217; CGI Perl script that converts Info files to HTML on the fly. The script is slow, but works well enough for human users viewing an occasional Info file. However, when someone&#8217;s recursive Wget download stumbles upon the index page that links to all the Info files through the script, the system is brought to its knees without providing anything useful to the downloader.</p>
<p>To avoid this kind of accident, as well as to preserve privacy for documents that need to be protected from well-behaved robots, the concept of robot exclusion has been invented. The idea is that the server administrators and document authors can specify which portions of the site they wish to protect from the robots.</p>
<p>The most popular mechanism, and the de facto standard supported by all the major robots, is the &#8220;Robots Exclusion Standard&#8221; (RES) written by Martijn Koster et al. in 1994. It specifies the format of a text file containing directives that instruct the robots which URL paths to avoid. To be found by the robots, the specifications must be placed in <code>`/robots.txt'</code> in the server root, which the robots are supposed to download and parse.</p>
<p>Although Wget is not a web robot in the strictest sense of the word, it can download large parts of a site without the user&#8217;s intervention to download an individual page. Because of that, Wget honors RES when downloading recursively. For instance, when you issue <code>wget -r http://www.server.com/</code>, the index of <code>www.server.com</code> is downloaded first. If Wget finds that it wants to download more documents from that server, it will request <code>http://www.server.com/robots.txt</code> and, if found, use it for further downloads. <code>robots.txt</code> is loaded only once per server.</p>
<p>Until version 1.8, Wget supported the first version of the standard, written by Martijn Koster in 1994 and available at http://www.robotstxt.org/wc/norobots.html. As of version 1.8, Wget has supported the additional directives specified in the internet draft draft-koster-robots-00.txt titled &#8220;A Method for Web Robots Control&#8221;. The draft, which has as far as I know never made it to an RFC, is available at http://www.robotstxt.org/wc/norobots-rfc.txt.</p>
<p>This manual no longer includes the text of the Robot Exclusion Standard.</p>
<p>The second, lesser-known mechanism enables the author of an individual document to specify whether they want the links from the file to be followed by a robot. This is achieved using the META tag, like this: <code>&lt;meta name=&quot;robots&quot; content=&quot;nofollow&quot;&gt;</code>. This is explained in some detail at http://www.robotstxt.org/wc/meta-user.html. Wget supports this method of robot exclusion in addition to the usual <code>/robots.txt</code> exclusion.</p>
<p>If you know what you are doing and really really wish to turn off the robot exclusion, set the robots variable to `off&#8217; in your `.wgetrc&#8217;. You can achieve the same effect from the command line using the -e switch, e.g. <code>wget -e robots=off url</code></p>
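<p>If you write your own download scripts rather than using Wget, you can apply the same kind of check with Python&#8217;s standard <code>urllib.robotparser</code> module. This is a minimal sketch: the crawler name <code>ExampleBot</code>, the helper <code>may_fetch</code> and the inline robots.txt content are all assumptions for illustration, and a real script would normally download the file from the server.</p>

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real script would usually call
# set_url() and read() to download it from the server instead.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(url, agent="ExampleBot"):
    """Return True if the parsed rules let this user agent fetch the URL."""
    return parser.can_fetch(agent, url)


print(may_fetch("http://www.server.com/index.html"))
print(may_fetch("http://www.server.com/cgi-bin/info2html"))
```

<p>Calling <code>may_fetch</code> before each request keeps a recursive script away from paths such as the slow CGI converter described above.</p>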
<p><a href="http://www.askapache.com/seo/seo-with-robotstxt.html">SEO with Robots.txt</a> originally appeared on <cite>AskApache.com</cite> </p>]]></content:encoded>
			<wfw:commentRss>http://www.askapache.com/seo/seo-with-robotstxt.html/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>Mixed SSL Content Warning Secure Fixed</title>
		<link>http://www.askapache.com/htaccess/mixed-ssl-content-warning-secure-fixed.html</link>
		<comments>http://www.askapache.com/htaccess/mixed-ssl-content-warning-secure-fixed.html#comments</comments>
		<pubDate>Mon, 09 Apr 2007 02:33:17 +0000</pubDate>
		<dc:creator>AskApache</dc:creator>
				<category><![CDATA[DreamHost]]></category>
		<category><![CDATA[Htaccess]]></category>
		<category><![CDATA[phpBB]]></category>
		<category><![CDATA[Apache]]></category>
		<category><![CDATA[askapache]]></category>
		<category><![CDATA[Elite]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[HTTPS SSL]]></category>
		<category><![CDATA[post]]></category>
		<category><![CDATA[Rewrite Tricks]]></category>
		<category><![CDATA[rewriterule]]></category>
		<category><![CDATA[YouTube]]></category>

		<guid isPermaLink="false">http://www.askapache.com.com/htaccess-forum/mixed-ssl-content-warning-secure-fixed.html</guid>
		<description><![CDATA[I used to get warning messages whenever I accessed a page that mixed secure and non-secure content. I finally found a way to embed non-SSL content on an SSL-encrypted page without any warning messages popping up!]]></description>
			<content:encoded><![CDATA[<p>I used to get warning messages whenever I accessed a page that mixed secure and non-secure content. I finally found a way to embed non-SSL content on an SSL-encrypted page without any warning messages popping up!</p>
<hr />
<p>On one of my secure sites, let&#8217;s say https://www.askapache.com/htaccess/, I wanted to give my visitors the option to post YouTube videos and Google videos. But then everyone kept seeing the &#8220;warning, mixed secure/non-secure content&#8221; message on any page that had one of these videos.</p>
<p>I basically ran a str_replace over the text of every post, changing http://video.google.com and http://youtube.com to https://www.example.com/htaccess/</p>
<p>Then I used the following rewrite code:</p>
<pre>RewriteEngine On
RewriteBase /
RewriteRule ^htaccess/googleplayer\.swf(.*)$ http://video.google.com/googleplayer.swf$1 [L]
RewriteRule ^htaccess/youtube/(.*)$ http://www.youtube.com/$1 [L]</pre>
<p>and it worked!</p>
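<p>For anyone not using PHP, the str_replace step described above might look roughly like this in Python. The exact replacement strings are not shown in this post, so the mapping below is an assumption inferred from the rewrite rules, and the function name <code>secure_embeds</code> is hypothetical.</p>

```python
# Assumed mapping, inferred from the rewrite rules above; the original
# post does not show the exact replacement strings.
REPLACEMENTS = {
    "http://video.google.com/": "https://www.example.com/htaccess/",
    "http://www.youtube.com/": "https://www.example.com/htaccess/youtube/",
    "http://youtube.com/": "https://www.example.com/htaccess/youtube/",
}


def secure_embeds(post_text):
    """Point embedded video URLs at the site's own HTTPS paths."""
    for insecure, secure in REPLACEMENTS.items():
        post_text = post_text.replace(insecure, secure)
    return post_text


print(secure_embeds("http://video.google.com/googleplayer.swf?docId=1"))
print(secure_embeds("http://www.youtube.com/v/abc123"))
```

<p>The rewritten URLs then match the two RewriteRule patterns, which hand the request on to the original video host.</p>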
<p>One caveat: this didn&#8217;t turn the warning messages off in <code>IE &lt; version 7</code>, so I added some simple HTML to the head of all my pages that only shows up for people using <code>IE &lt; version 7</code>.</p>
<pre>&lt;!&#45;-[if lt IE 7]&gt;
&lt;span id=&quot;ie7&quot;&gt;Please Upgrade:
&lt;a href=&quot;http://www.microsoft.com/windows/ie/downloads/default.mspx?mg_id=10013&quot;&gt;IE 7!&lt;/a&gt;
&lt;a href=&quot;http://www.mozilla.com/en-US/&quot;&gt;FF!&lt;/a&gt;&lt;/span&gt;
&lt;![endif]&#45;-&gt;</pre>
<p>Cool huh!</p>
<p><a href="http://www.htaccesselite.com/fix-for-secure-and-nonsecure-items-vt128.html" rel="nofollow" >Fix for secure and nonsecure items warning message</a></p>
<p><a href="http://www.askapache.com/htaccess/mixed-ssl-content-warning-secure-fixed.html">Mixed SSL Content Warning Secure Fixed</a> originally appeared on <cite>AskApache.com</cite></p>]]></content:encoded>
			<wfw:commentRss>http://www.askapache.com/htaccess/mixed-ssl-content-warning-secure-fixed.html/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Make phpBB SEO friendly with htaccess</title>
		<link>http://www.askapache.com/phpbb/using-last-modified-etag-mod_expires-mod_headers-to-seo-phpbb.html</link>
		<comments>http://www.askapache.com/phpbb/using-last-modified-etag-mod_expires-mod_headers-to-seo-phpbb.html#comments</comments>
		<pubDate>Tue, 07 Nov 2006 10:29:42 +0000</pubDate>
		<dc:creator>AskApache</dc:creator>
				<category><![CDATA[phpBB]]></category>
		<category><![CDATA[301 Redirect]]></category>
		<category><![CDATA[Apache]]></category>
		<category><![CDATA[Apache Modules]]></category>
		<category><![CDATA[Cache]]></category>
		<category><![CDATA[Cache-Control]]></category>
		<category><![CDATA[caching]]></category>
		<category><![CDATA[Elite]]></category>
		<category><![CDATA[Etags]]></category>
		<category><![CDATA[GET]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Htaccess]]></category>
		<category><![CDATA[HTTP Headers]]></category>
		<category><![CDATA[httpd]]></category>
		<category><![CDATA[Last-Modified]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[post]]></category>
		<category><![CDATA[Rewrite Tricks]]></category>
		<category><![CDATA[rewriterule]]></category>
		<category><![CDATA[SEO]]></category>
		<category><![CDATA[stat]]></category>

		<guid isPermaLink="false">http://www.askapache.com.com/phpbb/using-last-modified-etag-mod_expires-mod_headers-to-seo-phpbb.html</guid>
		<description><![CDATA[<p>After I optimized the caching for my non-dynamic pages using the Apache modules mod_headers and mod_expires, I began to learn about phpBB SEO.</p>]]></description>
			<content:encoded><![CDATA[<p>After I optimized the caching for my non-dynamic pages using the Apache modules mod_headers and mod_expires, I began to learn about phpBB SEO. First I used a mod on the forum, so that instead of <code>htaccesselite.com/index.php</code> the URL is <code>htaccesselite.com/htaccess-vc1.html</code>, and instead of <code>htaccesselite.com/viewtopic.php?t=5</code> it&#8217;s <code>htaccesselite.com/caching-using-mod-expires-with-expires-vt5.html</code>.</p>
<hr />
<h2>Internally rewrite all .html requests to .php</h2>
<pre>RewriteEngine On
RewriteBase /
RewriteRule ^(.*)\.html$ $1.php [L,T=application/x-httpd-php]</pre>
<h2>Force all files to be handled as a certain type:</h2>
<pre>ForceType application/x-httpd-php</pre>
<h2>Run files named .html as PHP</h2>
<pre>AddType application/x-httpd-php .html</pre>
<p><strong>If using phpBB:</strong> Removing the SID for guests and using the phpBB SEO mod rewrites gives you an .htaccess like this:</p>
<pre>RewriteEngine On
RewriteBase /
RewriteRule ^s/(.*)\.pl$ /cgi-bin/$1.php [L]
RewriteRule ^htaccess/.+/([^/]+\.html)$ /htaccess/index.php [R=301,L]
RewriteRule ^htaccess/.+-vc([0-9]+)\.html$ /htaccess/index.php?c=$1 [QSA,L]
RewriteRule ^htaccess/.+-vf([0-9]+)-([0-9]+)\.html$ /htaccess/viewforum.php?f=$1&amp;start=$2 [QSA,L]
RewriteRule ^htaccess/.+-vf([0-9]+)\.html$ /htaccess/viewforum.php?f=$1 [QSA,L]
RewriteRule ^htaccess/.+-vt([0-9]+)-([0-9]+)\.html$ /htaccess/viewtopic.php?t=$1&amp;start=$2 [QSA,L]
RewriteRule ^htaccess/.+-vt([0-9]+)\.html$ /htaccess/viewtopic.php?t=$1 [QSA,L]
RewriteRule ^htaccess/post([0-9]+)\.html$ /htaccess/viewtopic.php?p=$1 [QSA,L]
RewriteRule ^htaccess/member([0-9]+)\.html$ /htaccess/profile.php?mode=viewprofile&amp;u=$1 [QSA,L]</pre>
<p>Now my question, after doing all this to get my phpBB forum optimized, is: <strong>Do search engines care about the Expires, ETag, Last-Modified, and Cache-Control headers?</strong> I can&#8217;t find any information about creating real static pages from phpBB, so that Apache can create the ETag and Last-Modified headers automatically. I can&#8217;t even get mod_headers to work with phpBB.</p>
<p>It stands to reason that crawlers would record the last-modified time to display time-based results, so how do they look upon pages that don&#8217;t have any of these headers?</p>
<p>At this point I am just going to modify the page_header.php file to generate my own Last-Modified, Expires, and Cache-Control headers, and I will experiment with creating an ETag. The md5 function isn&#8217;t very efficient, though. I think I&#8217;ll start out by setting the Expires and Last-Modified times so the page expires 10 minutes after it is requested. I also want to try to base the Last-Modified time on the last post or edit to the data on the page. What a nightmare! <a href="http://www.htaccesselite.com/optimizing-phpbb-for-seo-vt118.html" rel="nofollow" >Anyone have any suggestions? I&#8217;m googled out at this point.</a></p>
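<p>A rough sketch of how those headers could be generated in PHP, under stated assumptions: <code>seo_cache_headers()</code> is a hypothetical helper (not part of phpBB), the 600-second lifetime mirrors the 10 minutes mentioned above, and the topic ID and last-change time are placeholders for values a forum would read from its database. The ETag here is built from the topic ID and change time rather than an md5 of the page, which is one possible way to avoid hashing the whole output.</p>
<pre>&lt;?php
// Hypothetical helper, not part of phpBB. $last_change is the Unix time of
// the newest post or edit shown on the page; $ttl is the cache lifetime
// in seconds (600 = 10 minutes).
function seo_cache_headers($topic_id, $last_change, $now, $ttl = 600)
{
    $http_date = 'D, d M Y H:i:s';
    return array(
        'Last-Modified' =&gt; gmdate($http_date, $last_change) . ' GMT',
        'Expires'       =&gt; gmdate($http_date, $now + $ttl) . ' GMT',
        'Cache-Control' =&gt; 'max-age=' . $ttl,
        // Cheap validator built from the topic ID and change time, so no
        // md5() over the page body is needed.
        'ETag'          =&gt; '&quot;' . $topic_id . '-' . $last_change . '&quot;',
    );
}

// Placeholder values; a forum would take these from the topic being shown.
$topic_id = 5;
$last_post_time = time() - 3600; // pretend the last post was an hour ago

// Headers must be sent before any page output.
foreach (seo_cache_headers($topic_id, $last_post_time, time()) as $name =&gt; $value) {
    header($name . ': ' . $value);
}</pre>
<p>A fuller version would also compare the request&#8217;s If-Modified-Since and If-None-Match headers against these values and answer with 304 Not Modified when nothing has changed.</p>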
<p><a href="http://www.askapache.com/phpbb/using-last-modified-etag-mod_expires-mod_headers-to-seo-phpbb.html">Make phpBB SEO friendly with htaccess</a> originally appeared on <cite>AskApache.com</cite></p>]]></content:encoded>
			<wfw:commentRss>http://www.askapache.com/phpbb/using-last-modified-etag-mod_expires-mod_headers-to-seo-phpbb.html/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>
