If you have files on your site that you don't want indexed by malicious search engines, grabbed and leeched by spammers, or stolen and made available elsewhere, you can use mod_rewrite to drastically reduce or even eliminate that activity.

The Worst Kind of People

Spammers and leechers. They operate like this: say you have some mp3 files on a server, and somewhere on the web there is a link to one of those files. That includes links in JavaScript files, CSS files, and robots.txt files; the spammers' and leechers' robots scan all of them for the kind of link they want. They then request the file, usually trying several different types of request to get access to it, and use it for their own gain at your expense.

Some robots perform valuable services for the web community, and some leeching programs are pretty cool, so not all of this activity is the work of nefarious spammers.

OK, so if a link to your file exists, a robot is going to request it eventually. The way to defeat robots is to do something on your site that changes how a legitimate user requests the file. Most robots are not JavaScript-capable, so the most common advanced method is to set a cookie using JavaScript and then check for that cookie with mod_rewrite when the file is requested.
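Here is a minimal sketch of the cookie-setting half of that method. It uses the fspammers cookie name and the value 445 from the example in this article; the helper name buildCookie, the path, and the one-day lifetime are assumptions made for illustration, not part of the original setup.

```javascript
// Hypothetical helper: builds the cookie string that a real browser will store.
// A robot that does not execute JavaScript never runs this, so it never gets the cookie.
function buildCookie(name, value, maxAgeSeconds) {
  return name + "=" + encodeURIComponent(value) +
    "; path=/; max-age=" + maxAgeSeconds;
}

// Only touch document.cookie when actually running inside a browser page.
if (typeof document !== "undefined") {
  document.cookie = buildCookie("fspammers", "445", 86400); // assumed 1-day lifetime
}
```

Keep in mind that a browser only sends a cookie back to the host (or parent domain) it was set for, so the page running this script has to share a cookie domain with the host serving the protected files.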

So if your site sets a cookie named fspammers and gives that cookie a value of 445, this is what a request sent by an HTTP client like Firefox looks like:

GET /hotlink/lovefreedom.mp3 HTTP/1.1
Host: s.askapache.net
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3 (.NET CLR 3.5.30729)
Accept: image/png,image/*;q=0.8,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://www.askapache.com/wordpress/seo-in-wordpress/
Cookie: fspammers=445

Mod_Rewrite HTTP Headers

The mod_rewrite module has access to ALL the HTTP headers sent in a request, so every header in the example request above can be validated with mod_rewrite.
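To make the mapping concrete, the sketch below splits a raw request into the pieces that mod_rewrite exposes as server variables (THE_REQUEST, HTTP_HOST, HTTP_REFERER, HTTP_COOKIE, HTTP_USER_AGENT). The parseRequest function is a hypothetical illustration written for this article, not something Apache provides, and it only handles simple requests without a body.

```javascript
// Sketch: map a raw HTTP request onto the mod_rewrite variable names.
function parseRequest(raw) {
  const lines = raw.split(/\r?\n/).filter((l) => l.length > 0);
  const headers = {};
  for (const line of lines.slice(1)) {
    const idx = line.indexOf(":"); // split on the first colon only
    if (idx === -1) continue;
    headers[line.slice(0, idx).trim().toLowerCase()] = line.slice(idx + 1).trim();
  }
  return {
    THE_REQUEST: lines[0], // the full request line, e.g. "GET /path HTTP/1.1"
    HTTP_HOST: headers["host"] || "",
    HTTP_REFERER: headers["referer"] || "",
    HTTP_COOKIE: headers["cookie"] || "",
    HTTP_USER_AGENT: headers["user-agent"] || "",
  };
}

const sample = [
  "GET /hotlink/lovefreedom.mp3 HTTP/1.1",
  "Host: s.askapache.net",
  "Referer: http://www.askapache.com/wordpress/seo-in-wordpress/",
  "Cookie: fspammers=445",
].join("\r\n");

console.log(parseRequest(sample));
```

Any header without a dedicated variable can still be read in a RewriteCond with the generic %{HTTP:Header-Name} form.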

Mod_Rewrite .htaccess Example

Finally, now that everyone is on the same page about what is really going on, here is the .htaccess code that blocks any request for anything in the /hotlink/ folder unless the request passes every check.

Here are the checks the code bases its blocking on. A request that fails any one of them is refused.

  1. Cookie: Checks that the fspammers cookie is present and has the value 445.
  2. HTTP Protocol: Checks that HTTP/1.1 is being used (many robots use 1.0).
  3. Host: Checks that the Host requested was s.askapache.net.
  4. Referer: Checks that the referring site is s.askapache.net or www.askapache.com.
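
RewriteEngine On
RewriteBase /

# The conditions are ORed: failing ANY single check forbids the request.
# 1. Cookie fspammers must be present with the exact value 445
RewriteCond %{HTTP_COOKIE} !(^|;\s*)fspammers=445(;|$) [NC,OR]
# 2. Request line must use HTTP/1.1 (spaces in the pattern must be escaped)
RewriteCond %{THE_REQUEST} !^[A-Z]{3,9}\ /.*\ HTTP/1\.1$ [NC,OR]
# 3. Host header must be s.askapache.net
RewriteCond %{HTTP_HOST} !^s\.askapache\.net$ [NC,OR]
# 4. Referer must be s.askapache.net or www.askapache.com
RewriteCond %{HTTP_REFERER} !^http://(www\.askapache\.com|s\.askapache\.net)(/|$) [NC]
# Answer 403 Forbidden for anything under /hotlink/
RewriteRule ^hotlink/ - [F]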
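
To reason about which requests get through, it can help to mirror the four checks from the list above (cookie, protocol, host, referer) in a small script and try sample requests against it. This is only a sketch: isBlocked and the shape of its argument are hypothetical names, and it approximates the intended logic rather than reproducing Apache's exact matching behavior.

```javascript
// Returns true when a request would be refused, i.e. when ANY check fails.
function isBlocked(req) {
  const cookieOk = /(^|;\s*)fspammers=445(;|$)/.test(req.cookie || "");
  const protocolOk = /^[A-Z]{3,9} \/.* HTTP\/1\.1$/.test(req.requestLine || "");
  const hostOk = /^s\.askapache\.net$/i.test(req.host || "");
  const refererOk =
    /^http:\/\/(www\.askapache\.com|s\.askapache\.net)(\/|$)/i.test(req.referer || "");
  return !(cookieOk && protocolOk && hostOk && refererOk);
}

// A browser-like request that satisfies all four checks.
const browserRequest = {
  requestLine: "GET /hotlink/lovefreedom.mp3 HTTP/1.1",
  host: "s.askapache.net",
  referer: "http://www.askapache.com/wordpress/seo-in-wordpress/",
  cookie: "fspammers=445",
};

// A typical robot: HTTP/1.0, no cookie, no referer.
const robotRequest = {
  requestLine: "GET /hotlink/lovefreedom.mp3 HTTP/1.0",
  host: "s.askapache.net",
};

console.log(isBlocked(browserRequest), isBlocked(robotRequest));
```

Note that a request with no Referer header at all fails the fourth check, so visitors whose browsers or proxies strip the Referer are refused along with the robots.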
