<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Blocking Bad Bots and Scrapers with .htaccess</title>
	<atom:link href="http://www.askapache.com/htaccess/blocking-bad-bots-and-scrapers-with-htaccess.html/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.askapache.com/htaccess/blocking-bad-bots-and-scrapers-with-htaccess.html</link>
	<description>Advanced Web Development</description>
	<lastBuildDate>Tue, 18 Jun 2013 10:04:02 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: sunny Tewathia</title>
		<link>http://www.askapache.com/htaccess/blocking-bad-bots-and-scrapers-with-htaccess.html#comment-1690</link>
		<dc:creator>sunny Tewathia</dc:creator>
		<pubDate>Thu, 03 Jan 2013 17:00:23 +0000</pubDate>
		<guid isPermaLink="false">http://www.askapache.com/?p=549#comment-1690</guid>
		<description><![CDATA[How can I specifically block a site, or any type of bot used by that site, from scraping my content? It is scraping my content and publishing it as its own.
Please help.

Thanks.]]></description>
		<content:encoded><![CDATA[<p>How can I specifically block a site, or any type of bot used by that site, from scraping my content? It is scraping my content and publishing it as its own.<br />
Please help.</p>
<p>Thanks.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Name rod</title>
		<link>http://www.askapache.com/htaccess/blocking-bad-bots-and-scrapers-with-htaccess.html#comment-1689</link>
		<dc:creator>Name rod</dc:creator>
		<pubDate>Mon, 12 Nov 2012 21:14:16 +0000</pubDate>
		<guid isPermaLink="false">http://www.askapache.com/?p=549#comment-1689</guid>
		<description><![CDATA[&lt;p&gt;Hey, great list and methodologies here; it must have taken a lot of work and study to produce this. Sorry, but I am a pure simpleton (though I know a little C++). Has anyone ever provided a list of good bots, so that only these are allowed in? The list of bad bots must grow exponentially, but good bots only linearly? Regards, rod&lt;/p&gt;]]></description>
		<content:encoded><![CDATA[<p>Hey, great list and methodologies here; it must have taken a lot of work and study to produce this. Sorry, but I am a pure simpleton (though I know a little C++). Has anyone ever provided a list of good bots, so that only these are allowed in? The list of bad bots must grow exponentially, but good bots only linearly? Regards, rod</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ryan</title>
		<link>http://www.askapache.com/htaccess/blocking-bad-bots-and-scrapers-with-htaccess.html#comment-1688</link>
		<dc:creator>Ryan</dc:creator>
		<pubDate>Fri, 08 Jun 2012 23:10:06 +0000</pubDate>
		<guid isPermaLink="false">http://www.askapache.com/?p=549#comment-1688</guid>
		<description><![CDATA[For nginx users:

&lt;pre&gt;if ($http_user_agent ~*
&quot;^(aesop_com_spiderman&#124;alexibot&#124;backweb&#124;bandit&#124;batchftp&#124;bigfoot)&quot;){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
&quot;^(black.?hole&#124;blackwidow&#124;blowfish&#124;botalot&#124;buddy&#124;builtbottough&#124;bullseye)&quot;){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
&quot;^(cheesebot&#124;cherrypicker&#124;chinaclaw&#124;collector&#124;copier&#124;copyrightcheck)&quot;){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
&quot;^(cosmos&#124;crescent&#124;curl&#124;custo&#124;da&#124;diibot&#124;disco&#124;dittospyder&#124;dragonfly)&quot;){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
&quot;^(drip&#124;easydl&#124;ebingbong&#124;ecatch&#124;eirgrabber&#124;emailcollector&#124;emailsiphon)&quot;){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
&quot;^(emailwolf&#124;erocrawler&#124;exabot&#124;eyenetie&#124;filehound&#124;flashget&#124;flunky)&quot;){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
&quot;^(frontpage&#124;getright&#124;getweb&#124;go.?zilla&#124;go-ahead-got-it&#124;gotit&#124;grabnet)&quot;){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
&quot;^(grafula&#124;harvest&#124;hloader&#124;hmview&#124;httplib&#124;httrack&#124;humanlinks&#124;ilsebot)&quot;){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
&quot;^(infonavirobot&#124;infotekies&#124;intelliseek&#124;interget&#124;iria&#124;jennybot&#124;jetcar)&quot;){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
&quot;^(joc&#124;justview&#124;jyxobot&#124;kenjin&#124;keyword&#124;larbin&#124;leechftp&#124;lexibot&#124;lftp&#124;libweb)&quot;){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
&quot;^(likse&#124;linkscan&#124;linkwalker&#124;lnspiderguy&#124;lwp&#124;magnet&#124;mag-net&#124;markwatch)&quot;){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
&quot;^(mata.?hari&#124;memo&#124;microsoft.?url&#124;midown.?tool&#124;miixpc&#124;mirror&#124;missigua)&quot;){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
&quot;^(mister.?pix&#124;moget&#124;mozilla.?newt&#124;nameprotect&#124;navroad&#124;backdoorbot&#124;nearsite)&quot;){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
&quot;^(net.?vampire&#124;netants&#124;netcraft&#124;netmechanic&#124;netspider&#124;nextgensearchbot)&quot;){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
&quot;^(attach&#124;nicerspro&#124;nimblecrawler&#124;npbot&#124;octopus&#124;offline.?explorer)&quot;){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
&quot;^(offline.?navigator&#124;openfind&#124;outfoxbot&#124;pagegrabber&#124;papa&#124;pavuk)&quot;){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
&quot;^(pcbrowser&#124;php.?version.?tracker&#124;pockey&#124;propowerbot&#124;prowebwalker)&quot;){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
&quot;^(psbot&#124;pump&#124;queryn&#124;recorder&#124;realdownload&#124;reaper&#124;reget&#124;true_robot)&quot;){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
&quot;^(repomonkey&#124;rma&#124;internetseer&#124;sitesnagger&#124;siphon&#124;slysearch&#124;smartdownload)&quot;){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
&quot;^(snake&#124;snapbot&#124;snoopy&#124;sogou&#124;spacebison&#124;spankbot&#124;spanner&#124;sqworm&#124;superbot)&quot;){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
&quot;^(superhttp&#124;surfbot&#124;asterias&#124;suzuran&#124;szukacz&#124;takeout&#124;teleport)&quot;){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
&quot;^(telesoft&#124;the.?intraformant&#124;thenomad&#124;tighttwatbot&#124;titan&#124;urldispatcher)&quot;){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
&quot;^(turingos&#124;turnitinbot&#124;urly.?warning&#124;vacuum&#124;vci&#124;voideye&#124;whacker)&quot;){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
&quot;^(libwww-perl&#124;widow&#124;wisenutbot&#124;wwwoffle&#124;xaldon&#124;xenu&#124;zeus&#124;zyborg&#124;anonymouse)&quot;){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
&quot;^web(zip&#124;emaile&#124;enhancer&#124;fetch&#124;go.?is&#124;auto&#124;bandit&#124;clip&#124;copier&#124;master&#124;reaper&#124;sauger&#124;site.?quester&#124;whack)&quot;){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
&quot;^.*(craftbot&#124;download&#124;extract&#124;stripper&#124;sucker&#124;ninja&#124;clshttp&#124;webspider&#124;leacher&#124;collector&#124;grabber&#124;webpictures).*$&quot;){
        set $rule_0 1;
        return 403;
        break;
}&lt;/pre&gt;]]></description>
		<content:encoded><![CDATA[<p>For nginx users:</p>
<pre>if ($http_user_agent ~*
"^(aesop_com_spiderman|alexibot|backweb|bandit|batchftp|bigfoot)"){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
"^(black.?hole|blackwidow|blowfish|botalot|buddy|builtbottough|bullseye)"){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
"^(cheesebot|cherrypicker|chinaclaw|collector|copier|copyrightcheck)"){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
"^(cosmos|crescent|curl|custo|da|diibot|disco|dittospyder|dragonfly)"){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
"^(drip|easydl|ebingbong|ecatch|eirgrabber|emailcollector|emailsiphon)"){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
"^(emailwolf|erocrawler|exabot|eyenetie|filehound|flashget|flunky)"){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
"^(frontpage|getright|getweb|go.?zilla|go-ahead-got-it|gotit|grabnet)"){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
"^(grafula|harvest|hloader|hmview|httplib|httrack|humanlinks|ilsebot)"){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
"^(infonavirobot|infotekies|intelliseek|interget|iria|jennybot|jetcar)"){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
"^(joc|justview|jyxobot|kenjin|keyword|larbin|leechftp|lexibot|lftp|libweb)"){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
"^(likse|linkscan|linkwalker|lnspiderguy|lwp|magnet|mag-net|markwatch)"){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
"^(mata.?hari|memo|microsoft.?url|midown.?tool|miixpc|mirror|missigua)"){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
"^(mister.?pix|moget|mozilla.?newt|nameprotect|navroad|backdoorbot|nearsite)"){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
"^(net.?vampire|netants|netcraft|netmechanic|netspider|nextgensearchbot)"){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
"^(attach|nicerspro|nimblecrawler|npbot|octopus|offline.?explorer)"){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
"^(offline.?navigator|openfind|outfoxbot|pagegrabber|papa|pavuk)"){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
"^(pcbrowser|php.?version.?tracker|pockey|propowerbot|prowebwalker)"){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
"^(psbot|pump|queryn|recorder|realdownload|reaper|reget|true_robot)"){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
"^(repomonkey|rma|internetseer|sitesnagger|siphon|slysearch|smartdownload)"){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
"^(snake|snapbot|snoopy|sogou|spacebison|spankbot|spanner|sqworm|superbot)"){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
"^(superhttp|surfbot|asterias|suzuran|szukacz|takeout|teleport)"){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
"^(telesoft|the.?intraformant|thenomad|tighttwatbot|titan|urldispatcher)"){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
"^(turingos|turnitinbot|urly.?warning|vacuum|vci|voideye|whacker)"){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
"^(libwww-perl|widow|wisenutbot|wwwoffle|xaldon|xenu|zeus|zyborg|anonymouse)"){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
"^web(zip|emaile|enhancer|fetch|go.?is|auto|bandit|clip|copier|master|reaper|sauger|site.?quester|whack)"){
        set $rule_0 1;
        return 403;
        break;
}
if ($http_user_agent ~*
"^.*(craftbot|download|extract|stripper|sucker|ninja|clshttp|webspider|leacher|collector|grabber|webpictures).*$"){
        set $rule_0 1;
        return 403;
        break;
}</pre>
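<p>A more compact variant is sketched below, assuming the <code>map</code> directive can be placed in the <code>http</code> context. The variable name <code>$block_ua</code> is made up for illustration, and only a few of the patterns from the list above are repeated. With <code>return 403;</code> in place, the <code>set</code> and <code>break</code> lines in the blocks above appear unnecessary.</p>
<pre># http context: set the hypothetical $block_ua to 1 when the user agent matches
map $http_user_agent $block_ua {
        default 0;
        # "~*" makes the regular expression case-insensitive
        "~*^(aesop_com_spiderman|alexibot|backweb|bandit|batchftp|bigfoot)" 1;
        "~*^web(zip|emaile|enhancer|fetch)" 1;
        "~*(craftbot|download|extract|stripper|sucker|ninja)" 1;
}

# server context: a single check replaces the repeated if blocks
if ($block_ua) {
        return 403;
}</pre>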
]]></content:encoded>
	</item>
	<item>
		<title>By: Techgazine</title>
		<link>http://www.askapache.com/htaccess/blocking-bad-bots-and-scrapers-with-htaccess.html#comment-1687</link>
		<dc:creator>Techgazine</dc:creator>
		<pubDate>Sun, 11 Mar 2012 13:47:23 +0000</pubDate>
		<guid isPermaLink="false">http://www.askapache.com/?p=549#comment-1687</guid>
		<description><![CDATA[Would you know how to block the WP Robot autoblogging plugin? It&#039;s scraping my other websites. Thanks]]></description>
		<content:encoded><![CDATA[<p>Would you know how to block the WP Robot autoblogging plugin? It's scraping my other websites. Thanks</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Phoenix</title>
		<link>http://www.askapache.com/htaccess/blocking-bad-bots-and-scrapers-with-htaccess.html#comment-1686</link>
		<dc:creator>Phoenix</dc:creator>
		<pubDate>Sun, 15 Jan 2012 22:04:20 +0000</pubDate>
		<guid isPermaLink="false">http://www.askapache.com/?p=549#comment-1686</guid>
		<description><![CDATA[Is it normal to still see bad bots in the logs when using RewriteRules?

Another question: which is the better solution, redirection or forbidden?
&lt;pre&gt;RewriteRule ^(.*)$ http://www.example.com/ [R,L]
RewriteRule . - [F,L]&lt;/pre&gt;

Thanks in advance for your answer, and thanks for this great post! :)]]></description>
		<content:encoded><![CDATA[<p>Is it normal to still see bad bots in the logs when using RewriteRules?</p>
<p>Another question: which is the better solution, redirection or forbidden?</p>
<pre>RewriteRule ^(.*)$ http://www.example.com/ [R,L]
RewriteRule . - [F,L]</pre>
<p>Thanks in advance for your answer, and thanks for this great post! :)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Anuj@WordPress SEO</title>
		<link>http://www.askapache.com/htaccess/blocking-bad-bots-and-scrapers-with-htaccess.html#comment-1685</link>
		<dc:creator>Anuj@WordPress SEO</dc:creator>
		<pubDate>Tue, 15 Nov 2011 07:34:59 +0000</pubDate>
		<guid isPermaLink="false">http://www.askapache.com/?p=549#comment-1685</guid>
		<description><![CDATA[Your post is good... I agree that people should make the most of the built-ins before jumping to the advanced modules. That&#039;s precisely what I&#039;m doing; however, I&#039;m having some trouble and was hoping you might help.]]></description>
		<content:encoded><![CDATA[<p>Your post is good... I agree that people should make the most of the built-ins before jumping to the advanced modules. That's precisely what I'm doing; however, I'm having some trouble and was hoping you might help.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Erika</title>
		<link>http://www.askapache.com/htaccess/blocking-bad-bots-and-scrapers-with-htaccess.html#comment-1684</link>
		<dc:creator>Erika</dc:creator>
		<pubDate>Mon, 03 Jan 2011 17:26:56 +0000</pubDate>
		<guid isPermaLink="false">http://www.askapache.com/?p=549#comment-1684</guid>
		<description><![CDATA[Where do I paste this code to prevent people from downloading the website?]]></description>
		<content:encoded><![CDATA[<p>Where do I paste this code to prevent people from downloading the website?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: memoi</title>
		<link>http://www.askapache.com/htaccess/blocking-bad-bots-and-scrapers-with-htaccess.html#comment-1683</link>
		<dc:creator>memoi</dc:creator>
		<pubDate>Sun, 19 Dec 2010 14:00:10 +0000</pubDate>
		<guid isPermaLink="false">http://www.askapache.com/?p=549#comment-1683</guid>
		<description><![CDATA[Thanks for the post.
I am looking for a way to block/deny OneNote. Is there a specific rule to deny OneNote, or any other program?

Many thanks]]></description>
		<content:encoded><![CDATA[<p>Thanks for the post.<br />
I am looking for a way to block/deny OneNote. Is there a specific rule to deny OneNote, or any other program?</p>
<p>Many thanks</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: mark</title>
		<link>http://www.askapache.com/htaccess/blocking-bad-bots-and-scrapers-with-htaccess.html#comment-1682</link>
		<dc:creator>mark</dc:creator>
		<pubDate>Sat, 06 Nov 2010 20:58:06 +0000</pubDate>
		<guid isPermaLink="false">http://www.askapache.com/?p=549#comment-1682</guid>
		<description><![CDATA[Very good, thanks for placing this online! There are scripts that set traps for bad bots but somehow I can&#039;t set them to block anything... at least with this a few bad bots are trapped :)]]></description>
		<content:encoded><![CDATA[<p>Very good, thanks for placing this online! There are scripts that set traps for bad bots but somehow I can't set them to block anything... at least with this a few bad bots are trapped :)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Donna</title>
		<link>http://www.askapache.com/htaccess/blocking-bad-bots-and-scrapers-with-htaccess.html#comment-1681</link>
		<dc:creator>Donna</dc:creator>
		<pubDate>Thu, 18 Mar 2010 19:17:18 +0000</pubDate>
		<guid isPermaLink="false">http://www.askapache.com/?p=549#comment-1681</guid>
		<description><![CDATA[I have 2 questions:
&lt;ol&gt;
&lt;li&gt;Why do we need a custom 404 page?&lt;/li&gt;
&lt;li&gt;Below is my htaccess. To save space I&#039;ve removed some code. Will this work? Do I need multiple &quot;RewriteEngine On&quot; statements? If not, do I keep the top one and then just begin your code with &quot;RewriteBase&quot;? Thank you!&lt;/li&gt;
&lt;/ol&gt;

&lt;pre&gt;# redirect non-www to the www url always
Options +FollowSymLinks
RewriteEngine on
...
# prevent hotlinking of images
RewriteEngine on
....
# custom not found file
ErrorDocument 404 /notfound.shtml
# beginning of the blocking of bad bots
ErrorDocument 403 /403.html
RewriteEngine On
RewriteBase /
 # IF THE UA STARTS WITH THESE
RewriteCond %{HTTP_USER_AGENT} ^(aesop_com_spiderman&#124;alexibot&#124;backweb&#124;bandit&#124;batchftp&#124;bigfoot) [NC,OR]
.............rest of rewrites...............
# ISSUE 403 / SERVE ERRORDOCUMENT
RewriteRule . - [F,L]&lt;/pre&gt;]]></description>
		<content:encoded><![CDATA[<p>I have 2 questions:</p>
<ol>
<li>Why do we need a custom 404 page?</li>
<li>Below is my htaccess. To save space I've removed some code. Will this work? Do I need multiple "RewriteEngine On" statements? If not, do I keep the top one and then just begin your code with "RewriteBase"? Thank you!</li>
</ol>
<pre># redirect non-www to the www url always
Options +FollowSymLinks
RewriteEngine on
...
# prevent hotlinking of images
RewriteEngine on
....
# custom not found file
ErrorDocument 404 /notfound.shtml
# beginning of the blocking of bad bots
ErrorDocument 403 /403.html
RewriteEngine On
RewriteBase /
 # IF THE UA STARTS WITH THESE
RewriteCond %{HTTP_USER_AGENT} ^(aesop_com_spiderman|alexibot|backweb|bandit|batchftp|bigfoot) [NC,OR]
.............rest of rewrites...............
# ISSUE 403 / SERVE ERRORDOCUMENT
RewriteRule . - [F,L]</pre>
]]></content:encoded>
	</item>
</channel>
</rss>
