<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>AskApache &#187; Search Results  &#187;  blocking</title>
	<atom:link href="http://www.askapache.com/search/blocking/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.askapache.com</link>
	<description>Advanced Web Development</description>
	<lastBuildDate>Thu, 26 Apr 2012 11:29:28 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Dealing with Mobile Visitors using Bad Browsers</title>
		<link>http://www.askapache.com/htaccess/dealing-mobile-browsers.html</link>
		<comments>http://www.askapache.com/htaccess/dealing-mobile-browsers.html#comments</comments>
		<pubDate>Fri, 10 Sep 2010 00:26:35 +0000</pubDate>
		<dc:creator>AskApache</dc:creator>
				<category><![CDATA[Htaccess]]></category>

		<guid isPermaLink="false">http://www.askapache.com/?p=4508</guid>
		<description><![CDATA[]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.askapache.com/htaccess/dealing-mobile-browsers.html"></a><a href="http://www.askapache.com/htaccess/dealing-mobile-browsers.html"><cite>AskApache.com</cite></a></p><a href="http://perishablepress.com/press/2010/04/26/stop-404-requests-for-mobile-versions-of-your-site/">Definitely worth the read</a>, but it is hard to see any benefit to doing this.

Mobile agents are still in their infancy, but within 2 years most mobiles will be as fast as the laptops from a couple years back... I mean my droid is running linux!  The world is moving steadily towards a society where mobile devices will greatly outnumber pcs.  So during this "growing up" phase I would argue that it would be much more beneficial to look for a method that solves the resource-robbing issue from the server-side, while also keeping in mind that mobile visitors to your site will continue to grow and eventually surpass non-mobile clients.

It is definitely harder to understand the reasons behind this clever post without more information on the mobile bots that you've been seeing in your logs.  Obviously (looking at your impressive blacklist work) that would be easy for you to get, and it would help the rest of us to see the same thing.

A lot of mobile devices have very little memory, and even less storage available for saving data.  Approaching that significant problem from a programming point of view, developers built mobile user-agents to be as fast as possible while using minimal data.  Knowing that, it makes sense for a mobile agent to try hard to find an alternate version of a page formatted with its own lack of resources in mind.  Issuing 100 requests for non-existent pages and only finding the right one on the 100th try would almost certainly be worth it for a mobile device.  Most devices use socket programming to communicate over HTTP for speed, which makes it very quick and easy to issue requests.  Basically, requests are free in terms of what it costs the device to make them.  Unfortunately, this can really eat up CPU, memory, and connections on our servers if they aren't set up and optimized.  1000 mobiles doing this simultaneously would grind most sites to a halt.

The solution you came up with would definitely help that situation.  403s are the strongest method available to a server (at least in terms of the HTTP protocol) to tell a user-agent to get gone.  They are also the best way (at least in Apache) to save your CPU and resources, as a 403 causes Apache to end the connection, clean up its internal data structures, and stop any further processing of the request.

However, 403s are too strict for a situation without any clear abuse going on.  403s are understood by all agents and can do some bad things to your site if used like this.  You can get dropped from search indexes for returning 403s (that's Google trying to do you a favor by not indexing "Forbidden" content), and I've found that returning a 403 to crawlers sometimes causes them to retry in 15 minutes, then an hour, then a day, then a week, with the gaps between checks growing until they stop.

Oh yeah, it is very unlikely that a mobile device will save the results of requests for non-existent mobile URIs, mostly because it doesn't cost them anything to make the request (unless you set up a trap like mod_security that lets you respond byte by byte veryyyyy slowwwwlyyyy).  And even then, mobile devices do not have the memory to store lists of requested URLs and their responses.  Think about it: checking whether a URL returned a bad result previously before making the same request would very quickly freeze up a device.  50 sites x 5 requests and responses equals quite a bit of data, not to mention having to search through all that data before making each request.  The battery would die super fast.

This is also the primary reason that the new AMAZINGLY fast Opera browser released last month for the droid does what it does.  It uses socket-level HTTP like everyone else, but Opera set up mobile proxy servers around the nation to act as intermediaries and crunch the actual data for the mobile.  There just isn't enough memory for my droid to open a huge webpage, parse the source, and then render it, so it looks for mobile versions whenever possible.  If it can't find a mobile version, or the mobile version is still too big, it sends the request through a mobile proxy server (such as those used by Google, Opera, and BlackBerry), which lets the (super sophisticated) proxy get the content first, render it, and then send it to your mobile for direct viewing.  More than just proxy servers, they act as caches.  And especially because the proxies all use custom programming, you do not want to play around with HTTP 403s like that.  It could easily have the effect of blocking a root proxy, resulting in your site being blocked for the entire proxy and its clients.  Unlike mobiles, those machines store request state info extremely well.


Regarding a 410, that seems like a great solution but could actually be the worst possible thing to do.  410 Gone means the resource used to exist, was removed on purpose, and will NEVER be available again.  2 years from now, when Google's mobile index takes over the main web index, you will be up the creek without a paddle, with no clue as to why your new mobile area isn't getting traffic.

Very few user-agents understand a 410; it's one of those codes used almost exclusively for controlling the way search engines index your content.  So to me it makes no sense to issue an esoteric status code to a bot that doesn't even understand 404s.

The only time you should ever have to use a 410 is when you make a big mistake with your indexing and have to use it to fix your site index.  Many other user-agents (esp. bots, crawlers, spammers, etc.) have a minimal understanding of HTTP, either by design for speed or for whatever other reason.  They just look at the first digit of the response code (2 OK, 3 REDIRECT, 4 NOT EXIST, 5 SERVER ERROR) and determine from that alone whether the content is good or not.
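
To make that first-digit shortcut concrete, here is a minimal shell sketch of how such a bare-bones agent might classify a response.  The function name status_class is made up for illustration and the labels are the same four listed above; it is not taken from any real bot.

```shell
# Hypothetical helper: classify an HTTP status code by its first digit only,
# the way a minimal user-agent might. The name status_class is an assumption.
status_class() {
    case "$1" in
        2[0-9][0-9]) echo "OK" ;;
        3[0-9][0-9]) echo "REDIRECT" ;;
        4[0-9][0-9]) echo "NOT EXIST" ;;
        5[0-9][0-9]) echo "SERVER ERROR" ;;
        *)           echo "UNKNOWN" ;;
    esac
}

# To an agent like this, 403, 404 and 410 all look identical.
for code in 403 404 410; do
    printf '%s -> %s\n' "$code" "$(status_class "$code")"
done
```

An agent written this way cannot tell a 410 from a 404, which is why the extra meaning of the 410 is lost on it.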

Basically all mobile devices run on HTTP 1.1, but because of their own physical limitations they behave like HTTP 1.0 clients from a server admin standpoint.<p><a href="http://www.askapache.com/htaccess/dealing-mobile-browsers.html"></a><a href="http://www.askapache.com/htaccess/dealing-mobile-browsers.html">Dealing with Mobile Visitors using Bad Browsers</a> originally appeared on <cite>AskApache.com</cite> </p>]]></content:encoded>
			<wfw:commentRss>http://www.askapache.com/htaccess/dealing-mobile-browsers.html/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Optimizing Servers and Processes for Speed with ionice, nice, ulimit</title>
		<link>http://www.askapache.com/optimize/optimize-nice-ionice.html</link>
		<comments>http://www.askapache.com/optimize/optimize-nice-ionice.html#comments</comments>
		<pubDate>Sat, 10 Oct 2009 05:41:28 +0000</pubDate>
		<dc:creator>AskApache</dc:creator>
				<category><![CDATA[Optimization]]></category>

		<guid isPermaLink="false">http://www.askapache.com/?p=3167</guid>
		<description><![CDATA[<p><a href="http://www.askapache.com/linux/optimize-nice-ionice.html" class="IFL" id="id18"></a>To prepare for several upcoming articles on AskApache that are focused on optimizing Servers and Sites from a server admin level, here is an article to introduce the main tools that we will be using.  These tools are used to optimize CPU time for each process using <strong>nice</strong> and <strong>renice</strong>, and other tools like <strong>ionice</strong> are used to optimize the Disk IO, or Disk speed / Disk traffic for each process.  Then you can make sure your mysqld and httpd processes are always fast and prioritized.<br class="C" /></p>]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.askapache.com/optimize/optimize-nice-ionice.html"></a><a href="http://www.askapache.com/optimize/optimize-nice-ionice.html"><cite>AskApache.com</cite></a></p><p>Ok, sup.  I really felt I had to get this out of the way, because I have a whole stack of drafts waiting to be published, but I realized that not many people will benefit from all the advanced optimizations and tricks I'm writing unless they get a basic understanding of some of the tools I'm using.  I decided to write a series of articles explaining how I optimize servers for speed because lately I've been getting a lot more people wanting to hire me to do that.  I take on projects when I can but there is clearly a need out here on the net for some self-help.   The momentum is swinging more and more towards VPS type of web hosting, and I would say that 99% of those customers are getting supremely ripped off, which goes against the foundation of the web.</p>
<p>Keep in mind that this blog and my research are only a hobby of mine.  My job is primarily marketing and sales, so I'm not some licensed expert or anything, or even an unlicensed expert! haha.  But it does bother me that those who are tech-savvy enough to run web-hosting companies are happily ripping people off.  So this article details the main tools that are used to speed up and optimize your machine by delegating levels of priority to specific processes.  Future articles will use these tools a lot, so this is meant as an intro.</p>




<p><a id="cpu-disk-io" name="cpu-disk-io"></a></p><h2>CPU and Disk I/O</h2>
<p>As most of you are aware, two resources largely determine the speed of any computer or program: CPU and disk I/O.  The CPU determines how fast you can process data, crunch numbers, etc., while disk I/O determines how fast data can be read from and written to your disks.  Wouldn't it be great if you could easily configure your server to give your httpd, php, and other important processes more CPU time and disk I/O than less important processes like backup scripts, ftp daemons, etc.?  We are talking about Linux in this article, so of course YES, not only can you do that, you should!</p>
<p><a name="optimize-ram" id="optimize-ram"></a></p><h3>RAM</h3>
<p>RAM is like a hard-drive in that data is stored on it, read from it, and written to it.  The difference is that RAM is orders of magnitude faster than disk I/O, but the cost of that incredible speed is that the data stored on it is only temporary: it is completely erased when your machine is rebooted.  RAM is also expensive, and hardware limits cap how much a server or machine can have.</p>
<p><a name="optimize-swap" id="optimize-swap"></a></p><h3>SWAP</h3>
<p>SWAP takes over when you run out of RAM but still need somewhere to keep data.  Basically, when you start running out of RAM your machine starts supplementing RAM with SWAP storage on disk.  SWAP is usually a dedicated partition, ideally on a second hard-drive.  There is an upper limit on how much I/O can occur on a disk at one time, and the more I/O takes place, the slower all I/O becomes, so SWAP works best on a separate hard-drive where it doesn't compete with other I/O.  Windows uses a similar mechanism but keeps it in a file named pagefile.sys, and that is just one reason people in the know do not care for Windows.</p>
<p><a name="optimize-cpu" id="optimize-cpu"></a></p><h3>CPU</h3>
<p>So let's do this: think of your CPU (your processor) as having 100% of its processing time available when idle, and 0% when it's maxed out.  CPUs handle multiple processing tasks simultaneously, so what we will discuss in this article is how to influence HOW MUCH of that processing time each of your programs (hereafter "processes") is able to use.  Yes, very very cool.</p>
<p>That is correct, you can easily configure your server to give more of the available processing time to certain programs over others.  For example, you can favor apache and php so that when the CPU is busy they get the larger share, and all other processes (proftpd, sshd, rsync, etc.) combined get what is left.  These shares are relative weights rather than hard caps.  The terminology is that we give specific processes (like php.cgi, httpd, fast-cgi.cgi) a specific <strong>priority</strong>, called a nice value, where -20 is the highest priority and +19 is the lowest priority, or least CPU processing time.  I know it seems backwards.. </p>


<p><a id="tools" name="tools"></a></p><h2>The Tools</h2>
<p>If you run Windows, you are in the right place... because the following advice will save your life:  GET LINUX! Ok, now that that is out of the way, the following are the tools discussed on this page.  All of them are free, open-source, and wonderful.  The basic idea of these tools is to control how much CPU is devoted to each process, and also how much Disk IO/Disk traffic is given to each process.</p>
<dl>
<dt><a href="#nice-tool">nice</a></dt><dd>run a program with modified scheduling priority</dd>
<dt><a href="#renice-tool">renice</a></dt><dd>alter priority of running processes</dd>
<dt><a href="#ionice-tool">ionice</a></dt><dd>set or retrieve the I/O priority for a given pid or execute a new task with a given I/O priority.</dd>
<dt><a href="#iostat-tool">iostat</a></dt><dd>Report Central Processing Unit (CPU) statistics and input/output statistics for devices and partitions.</dd>
<dt><a href="#ulimit-tool">ulimit</a></dt><dd>Ulimit provides control over the resources available to processes started by the shell, on systems that allow such control.</dd>
<dt><a href="#chrt-tool">chrt</a></dt><dd>set or retrieve real-time scheduling parameters for a given pid or execute a new task under given scheduling parameters.</dd>
<dt><a href="#taskset-tool">taskset</a></dt><dd>set or retrieve task CPU affinity for a given pid or execute a new task under a given affinity mask.</dd>
</dl>




<p><a id="part1-processes" name="part1-processes"></a></p><h2>Part 1: Process Processes Faster</h2>
<p>Ok, so let's tackle figuring out how to give more resources to your response-intensive processes (like apache, php, ruby, perl, java), meaning the processes that run when a request to your server/machine requires a <em>response</em>.  For instance, when you requested this page that you are reading at this very second, several things on my server had to happen for you to be able to read this.</p>
<p>First your computer sends out a request to find out which server the www.askapache.com domain name points to.  DNS servers respond with my server IP, so for servers dedicated as nameservers, optimizing the DNS processes like bind would speed that up.  Now that your computer knows how to reach my server, it sends an HTTP GET request for this url.  This request is received by the httpd process that is apache, and apache determines this url should be handled by my custom compiled php5.3.0 binary, because this page is WordPress generated.  So the php binary loads up the WordPress /index.php file, which chain-loads several other php files, including <code>wp-config.php</code> containing my MySQL database settings.  Now php connects to my MySQL server to fetch this article's content, comments, title, tags, etc. and then generates the HTML and hands that back to Apache.</p>
<p>Finally, Apache generates a HTTP RESPONSE and sends the RESPONSE and CONTENT back to your Browser, which then in turn renders the page for your eyes with the necessary javascript, images, css, and other files included in the HTML response.</p>

<h3>Too much Processing</h3>
<p>Now you see why I've opted to write my own caching plugin that takes the php and mysql processes OUT of that equation.  Both the php binary and the mysql instance consume CPU processing and disk IO to load all their library files, open sockets and make network requests, check permissions, and on and on.  And that's completely ok.  The thing is, unless you configure these processes (Apache, PHP, MySQL), they get the same CPU priority as every other process, including processes that have very little to do with you reading this sentence: my mail server, my FTP server, my SSH server, my cronjobs, cleanup scripts, atd daemon, etc.  They all get the same share of CPU!</p>
<p>Another even simpler example is what got me to look into this myself.  I wrote a shell script that created hourly, daily, weekly, and monthly backups for all of my websites and sql databases, and set it up to run by cronjob at those set intervals.  Eventually I noticed my sites were slower, my php even slower, and sometimes I even saw the 503 errors that my host throws up when my server is overloaded.  The research that I pursued to prevent that from happening has been hugely eye-opening.  What does a backup script do?  Mine just created tar archives of all the files in my web root, gzipped them, and saved them to a backup server using scp (a file transfer over ssh).  This resulted in the following huge problems, which seem to have nothing to do with a faster server and speedier website, but they have everything to do with it.</p>
<ol>
<li><strong>CPU Bottleneck #1</strong> - tar only bundles the files, but gzip then runs a compression algorithm over the whole archive, and all that compressing takes a whole lot of crunching - CPU processing</li>
<li><strong>DISK IO Bottleneck</strong> - Tarring the whole web root directory was creating a ton of disk io, and remember the more disk io that is going on, the less is available for everything else.</li>
<li><strong>CPU Bottleneck #2</strong> - Using scp to send my backups was security-smart, but these huge archive files had to be encrypted before being sent over the net, and encryption costs CPU too.</li>
</ol>






<p><a id="breaking-bottlenecks" name="breaking-bottlenecks"></a></p><h2>Breaking Bottles</h2>
<p>I apologize for being a little long-winded there, but I think it's important to make sure everyone understands those basic concepts, which are foreign to most people.  Once you understand what is causing the bottlenecks, then you can understand the solutions, which actually are incredibly simple and even a novice linux user can easily do.  Besides, the net gets a little bit faster every time someone implements this.</p>

<p><a id="nice-tool" name="nice-tool"></a></p><h3>nice</h3>
<p><img src="http://uploads.askapache.com/2009/10/nice-chart.png" alt="NICE Levels Chart" title="NICE Levels Chart" width="351" height="225" class="IFL" />Nice allows you to run a program with modified scheduling priority which specifies how much CPU is devoted to a particular process.  Run COMMAND with an adjusted niceness, which affects process scheduling.  With no COMMAND, print the current niceness.  <br /><br />Nicenesses range from -20 (most favorable scheduling) to 19 (least favorable).   <code>-n, --adjustment=N</code> -  add integer N to the niceness (default 10).   <code>nice +19</code> tasks get a HZ-independent 1.5%.  Running a <code>nice +10</code> and a <code>nice +11</code> task means the first will get 55% of the CPU, the other 45%.<br class="C" /></p>
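<p>The 55%/45% and 1.5% figures can be roughly reproduced with a little arithmetic.  This is only a sketch, assuming each nice level is worth about 1.25x the CPU weight of the next one (an approximation of the scheduler's multiplicative levels, not its exact internal table); the function name <code>nice_share</code> is made up for illustration.</p>

```shell
# Sketch: approximate CPU share (in percent) of task A when it competes
# with task B, assuming each nice level differs by a factor of about 1.25.
nice_share() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        # weight is proportional to 1.25^(-nice), so
        # share_a = w_a / (w_a + w_b) = 1 / (1 + 1.25^(a - b))
        printf "%.1f\n", 100 / (1 + 1.25 ^ (a - b))
    }'
}

nice_share 10 11   # nice +10 against nice +11
nice_share -5 -4   # same one-level gap, same split
nice_share 19 0    # nice +19 against a default task
```

<p>Under that assumption the three calls print 55.6, 55.6 and 1.4, which is close to the 55% and 1.5% quoted above and shows why only the gap between two nice levels matters, not where you start from.</p>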

<p><a id="nice-usage" name="nice-usage"></a></p><h4>nice usage</h4>
<pre>nice [OPTION] [COMMAND [ARG]...]
&nbsp;
-n, --adjustment=ADJUST   increment priority by ADJUST first</pre>

<p><a id="nice-examples" name="nice-examples"></a></p><h4>Examples of nice</h4>
<p>Using nice to download a file</p>
<pre>nice -n 17 curl -q -v -A &#039;Mozilla/5.0&#039; -L -O http://wordpress.org/latest.zip</pre>
<p>Unzipping a file with nice</p>
<pre>nice -n 17 unzip latest.zip</pre>
<p>Nice way to build from source</p>
<pre>nice -n 2 ./configure
nice -n 2 make
nice -n 2 make install</pre>
<p>It is sometimes useful to run non-interactive programs with reduced priority.</p>
<pre>$ nice factor `echo &#039;2^9 - 1&#039;|bc`
511: 7 73</pre>
<p>Since nice prints the current niceness, we can invoke it through itself to demonstrate how it works.  The default behavior is to raise the niceness (that is, reduce the priority) by 10.</p>
<pre>$ nice nice
10
$ nice -n 10 nice
10</pre>
<p>The ADJUSTMENT is relative to the current niceness.  The first <code>nice</code> invocation runs the second one at niceness 10, and it in turn runs the final one at a niceness raised by 3 more.</p>
<pre>$ nice nice -n 3 nice
13</pre>
<p>Specifying a niceness larger than 19 is the same as specifying 19.</p>
<pre>$ nice -n 30 nice
19</pre>
<p>Only a privileged user may run a process with higher priority.</p>
<pre>$ nice -n -1 nice
nice: cannot set priority: Permission denied
$ sudo nice -n -1 nice
-1</pre>

<blockquote cite="http://uploads.askapache.com/2009/08/sched-nice-design.txt">
<p>The new scheduler in v2.6.23 addresses all three types of complaints:</p>
<p>To address the first complaint (of nice levels being not "punchy" enough), the scheduler was decoupled from 'time slice' and HZ concepts (and granularity was made a separate concept from nice levels) and thus it was possible to implement better and more consistent nice +19 support: with the new scheduler nice +19 tasks get a HZ-independent 1.5%, instead of the variable 3%-5%-9% range they got in the old scheduler.</p>
<p>To address the second complaint (of nice levels not being consistent), the new scheduler makes nice(1) have the same CPU utilization effect on tasks, regardless of their absolute nice levels. So on the new scheduler, running a nice +10 and a nice +11 task has the same CPU utilization "split" between them as running a nice -5 and a nice -4 task. (one will get 55% of the CPU, the other 45%.) That is why nice levels were changed to be "multiplicative" (or exponential) - that way it does not matter which nice level you start out from, the 'relative result' will always be the same.</p>
<p>The third complaint (of negative nice levels not being "punchy" enough and forcing audio apps to run under the more dangerous SCHED_FIFO scheduling policy) is addressed by the new scheduler almost automatically: stronger negative nice levels are an automatic side-effect of the recalibrated dynamic range of nice levels.</p>
</blockquote>







<p><a id="renice-tool" name="renice-tool"></a></p><h3>renice</h3>
<p>Renice is similar to the nice command, but it lets you modify the niceness of a currently running process.  This is handy for shell scripts, where you can add it to the top of the script to renice the whole script to 19.</p>

<p><a id="renice-usage" name="renice-usage"></a></p><h4>renice usage</h4>
<pre>renice priority [ [ -p ] pids ] [ [ -g ] pgrps ] [ [ -u ] users ]
&nbsp;
-g      Force who parameters to be interpreted as process group ID&#039;s.
-u      Force the who parameters to be interpreted as user names.
-p      Resets the who interpretation to be (the default) process ID&#039;s.</pre>

<p><a id="renice-examples" name="renice-examples"></a></p><h4>Examples of renice</h4>
<p>From the shell, changes the priority of the shell and all children to 19.  From a shell script, does the same but only for the script and its children.</p>
<pre>renice 19 -p $$</pre>
<p>This runs renice without any output</p>
<pre>renice 19 -p $$ &amp;&gt;/dev/null</pre>
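<p>A niceness of 10 gets more CPU than 19.  Note that normally only a privileged user can lower the niceness of a process again once it has been raised.</p>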
<pre>renice 10 -p $$</pre>
<p>Change the priority of process IDs 987 and 32, and of all processes owned by users daemon and root.</p>
<pre>renice +1 987 -u daemon root -p 32</pre>
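<p>As a rough sketch of the "top of the script" idea, the function below renices the current shell and then uses <code>nice</code> with no command to confirm the result.  The function name <code>lower_own_priority</code> is just an illustration.</p>

```shell
# Sketch: drop the priority of this script and everything it starts later.
lower_own_priority() {
    # $$ is the PID of the running shell; children inherit its niceness.
    renice 19 -p "$$" >/dev/null 2>&1 || echo "renice failed" >&2
}

lower_own_priority
# nice with no COMMAND prints the niceness it inherited from this shell.
echo "now running at niceness $(nice)"
```

<p>Everything the script launches after that call (tar, gzip, scp and so on) starts at the same lowered priority without needing its own <code>nice</code> prefix.</p>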









<p><a id="part2-disk-io" name="part2-disk-io"></a></p><h2>Part 2: Optimizing Disk I/O</h2>
<p><a id="scheduling-policies" name="scheduling-policies"></a></p><h3>Linux Scheduling Policies</h3>
<p>The scheduler is the kernel component that decides which runnable process will be executed by the CPU next.  Each process has an associated scheduling policy and a static scheduling priority, sched_priority.</p>
<p>Currently, Linux supports the following "normal" (i.e., non-real-time) scheduling policies: SCHED_OTHER, SCHED_BATCH, and SCHED_IDLE.  For processes scheduled under these policies, sched_priority is not used in scheduling decisions; the nice value is what matters.</p>
<p>The "real-time" policies SCHED_FIFO and SCHED_RR are also supported, for special time-critical applications that need precise control over the way in which runnable processes are selected for execution.  Processes scheduled under one of the real-time policies have a sched_priority value in the <strong>range 1 (low) to 99 (high)</strong>.  (As the numbers imply, real-time processes always have higher priority than normal processes.)</p>
<dl>
<dt><strong>SCHED_OTHER</strong>: Default Linux time-sharing scheduling</dt><dd>The standard round-robin time-sharing policy</dd>
<dt><strong>SCHED_BATCH</strong>: Scheduling batch processes</dt><dd>This policy is useful for workloads that are non-interactive, but do not want to lower their nice value, and for workloads that want a deterministic scheduling policy without interactivity causing extra preemptions (between the workload's tasks).</dd>
<dt><strong>SCHED_IDLE</strong>: Scheduling very low priority jobs</dt>
<dd>This policy is intended for running jobs at extremely low priority (lower even than a +19 nice value with the SCHED_OTHER or SCHED_BATCH policies)</dd>
<dt><strong>SCHED_FIFO</strong>: First In-First Out scheduling</dt><dd>A first-in, first-out policy</dd>
<dt><strong>SCHED_RR</strong>: Round Robin scheduling</dt><dd>A round-robin policy.</dd>
</dl>
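<p>The <code>chrt</code> tool listed earlier is how these policies are applied from the shell.  Here is a minimal sketch, assuming a chrt that accepts the <code>-b</code> (batch) option as the util-linux version does; the wrapper name <code>run_batch</code> is made up, and it falls back to plain <code>nice</code> where chrt is missing or not permitted.</p>

```shell
# Sketch: run a command under SCHED_BATCH when possible, else just nice it.
run_batch() {
    if command -v chrt >/dev/null 2>&1 && chrt -b 0 true 2>/dev/null; then
        # The normal (non-real-time) policies take a static priority of 0.
        chrt -b 0 "$@"
    else
        nice -n 10 "$@"
    fi
}

run_batch echo "batch job ran"
```

<p>To see which policy a running process currently has, <code>chrt -p PID</code> prints its scheduling policy and static priority.</p>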

<p><a id="scheduling-classes" name="scheduling-classes"></a></p><h3>I/O Scheduling Classes</h3>
<dl>
<dt><code>IOPRIO_CLASS_RT</code></dt>
<dd>This is the realtime io class.  It is given higher priority than any other class in the system: processes in this class get first access to the disk every time, regardless of what else is going on.  Thus the RT class needs to be used with some care, as one io RT process can starve the entire system.  As with the best-effort class, 8 priority levels (the class data) are defined, denoting how big a time slice a given process will receive on each scheduling window.  In the future this might change to be more directly mappable to performance, by passing in a wanted data rate instead.</dd>
<dt><code>IOPRIO_CLASS_BE</code></dt>
<dd>This is the best-effort scheduling class, which is the default for any process that hasn't asked for a specific io priority.  Programs inherit the CPU nice setting for io priorities.  This class takes a priority argument (the class data) from 0-7, with a lower number being higher priority: 0 is the highest BE prio level, 7 is the lowest.  The class data determines how much io bandwidth the process will get; it's directly mappable to the cpu nice levels, just more coarsely implemented.  Programs running at the same best-effort priority are served in a round-robin fashion.  The mapping between cpu nice level and io nice level is determined as: io_nice = (cpu_nice + 20) / 5.</dd>
<dt><code>IOPRIO_CLASS_IDLE</code></dt>
<dd>This is the idle scheduling class.  A program running with idle io priority will only get disk time when no other program has asked for disk io for a defined grace period, so the impact of idle io processes on normal system activity should be zero.  This scheduling class does not take a priority argument (it has no class data).</dd>
</dl>
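<p>The cpu-nice to io-nice formula above is easy to check with shell integer arithmetic.  A small sketch (the function name <code>io_nice</code> is ours, not part of any tool):</p>

```shell
# Sketch: default best-effort io priority derived from a CPU nice value,
# using the formula quoted above: io_nice = (cpu_nice + 20) / 5
io_nice() {
    echo $(( ($1 + 20) / 5 ))
}

for n in -20 -10 0 10 19; do
    printf 'cpu nice %3d -> io nice %d\n' "$n" "$(io_nice "$n")"
done
```

<p>A default nice of 0 maps to io priority 4, and the extremes -20 and 19 map to 0 and 7, so the 40 CPU nice levels collapse into the 8 best-effort levels.</p>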








<p><a id="ionice-tool" name="ionice-tool"></a></p><h3>ionice</h3>
<p>ionice - get/set program io scheduling class and priority.  This program sets the io scheduling class and priority for a program.  Since v3 (aka CFQ Time Sliced) CFQ implements I/O nice levels similar to those of CPU scheduling.  These nice levels are grouped into the three scheduling classes described above, each one containing one or more priority levels.</p>

<p><a id="ionice-usage" name="ionice-usage"></a></p><h4>ionice usage</h4>
<p>If no arguments or just -p is given, ionice will query the current io scheduling class and priority for that process.</p>
<pre>ionice [-c] [-n] [-p] [COMMAND [ARG...]]</pre>
<ul>
<li><strong>-c</strong> - The scheduling class. 1 for real time, 2 for best-effort, 3 for idle.</li>
<li><strong>-n</strong> - The scheduling class data. This defines the class data, if the class accepts an argument. For real time and best-effort, 0-7 is valid data.</li>
<li><strong>-p</strong> - Pass in a process pid to change an already running process. If this argument is not given, ionice will run the listed program with the given parameters.</li>
</ul>

<p><a id="ionice-examples" name="ionice-examples"></a></p><h4>ionice Examples</h4>
<p>Sets process with PID 89 as an idle io process.</p>
<pre>ionice -c3 -p89</pre>
<p>Runs 'bash' as a best-effort program with highest priority.</p>
<pre>ionice -c2 -n0 bash</pre>
<p>Returns the class and priority of the process with PID 89</p>
<pre>ionice -p89</pre>
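<p>Putting nice and ionice together, the backup job from Part 1 could be wrapped roughly like this.  It is only a sketch: the function names and the paths in the usage comment are hypothetical, and it assumes a tar with gzip support (<code>-z</code>).</p>

```shell
# Sketch: run any command at the lowest CPU priority and, where ionice
# works, in the idle io class too.
run_low_priority() {
    if command -v ionice >/dev/null 2>&1 && ionice -c3 true 2>/dev/null; then
        nice -n 19 ionice -c3 "$@"
    else
        nice -n 19 "$@"
    fi
}

# Hypothetical backup step: archive directory $1 into gzipped tarball $2.
backup_webroot() {
    run_low_priority tar -czf "$2" -C "$1" .
}

# Example (paths are placeholders):
#   backup_webroot /var/www/html /backups/webroot.tar.gz
```

<p>Wrapping each heavy command this way keeps the rest of the script at normal priority, which is useful when only the tar and gzip steps are the problem.</p>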

<blockquote cite="http://gaarai.com/2009/03/06/multitasking-from-the-linux-command-line-plus-process-prioritization/">
<p>With the ionice command, you can set the IO priority for a process to one of three classes: Idle (3), Best Effort (2), and Real Time (1). The Idle class means that the process will only be able to read and write to the disk when all other processes are not using the disk. The Best Effort class is the default and has eight different priority levels from 0 (top priority) to 7 (lowest priority). The Real Time class results in the process having first access to the disk irregardless of other process and should never be used unless you know what you are doing.</p>
<p>If we wish to run the updatedb process in the background with an Idle IO class priority, we can run the following:</p>
<pre>$ sudo date
$ sudo updatedb &amp;
[1] 16324
$ sudo ionice -c3 -p16324</pre>
<p>If we’d rather just lower the Best Effort class priority (defaults to 4) for the command so the process isn’t limited to idle IO periods, we can run the following:</p>
<pre>$ sudo date
$ sudo updatedb &amp;
[1] 16324
$ sudo ionice -c2 -n7 -p16324</pre>
<p>Again, the Real Time class should not be used as it can prevent you from being able to interact with your system.</p>
<p>You may wonder where you can get the process ID if you don’t know it, can’t remember it, or didn’t start the process (an automatted script may have launched it). You can find process IDs with the ps command.</p>
<p>For example, if I had an updatedb program running in the background, and I wanted to find its process ID, I can run the following:</p>
<pre>$ ps -C updatedb
PID TTY TIME CMD
4234 ? 00:00:42 updatedb</pre>
<p>This tells me that the process’ process ID (PID) is 4234.</p>
</blockquote>





<p><a id="iostat-tool" name="iostat-tool"></a></p><h3>iostat</h3>
<p><a id="iostat-usage" name="iostat-usage"></a></p><h4>iostat Usage</h4>
<pre>iostat [ -c ] [ -d ] [ -N ] [ -n ] [ -h ] [ -k | -m ] [ -t ] [ -V ] [ -x ] [ -z ] [ &lt;device&gt; [...] | ALL ] [ -p [ &lt;device&gt; [,...] | ALL ] ] [ &lt;interval&gt; [ &lt;count&gt; ] ]
&nbsp;
-c     The -c option is exclusive of the -d option and displays only the CPU usage report.
-d     The -d option is exclusive of the -c option and displays only the device utilization report.
-k     Display statistics in kilobytes per second instead of blocks per second.  Data displayed are valid only with kernels 2.4 and newer.
-m     Display statistics in megabytes per second instead of blocks or kilobytes per second.  Data displayed are valid only with kernels 2.4 and newer.
-n     Displays the NFS-directory statistic.  Data displayed are valid only with kernels 2.6.17 and newer.  This option is exclusive of the -x option.
-h     Make the NFS report displayed by option -n easier to read by a human.
-p [ { device | ALL } ]   The -p option is exclusive of the -x option and displays statistics for block devices and all their partitions that are used by the system.
-t     Print the time for each report displayed.
-x     Display extended statistics.</pre>

<p><a id="iostat-examples" name="iostat-examples"></a></p><h4>iostat Examples</h4>
<pre>iostat -p ALL 2 1000
avg-cpu:  %user   %nice    %sys %iowait   %idle
            8.34    0.08    1.26    2.27   88.05</pre>
<p>Display a single history since boot report for all CPU and Devices.</p>
<pre>$ iostat</pre>
<p>Display a continuous device report at two second intervals.</p>
<pre>$ iostat -d 2</pre>
<p>Display six reports at two second intervals for all devices.</p>
<pre>$ iostat -d 2 6</pre>
<p>Display six reports of extended statistics at two second intervals for devices hda and hdb.</p>
<pre>$ iostat -x hda hdb 2 6</pre>
<p>Display six reports at two second intervals for device sda and all its partitions (sda1, etc.)</p>
<pre>$ iostat -p sda 2 6</pre>
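<p>The options listed under Usage can be combined.  As a rough example, the command below asks for the extended device report in kilobytes with a timestamp on every report and keeps a copy in a log file.  The device name sda and the log file name are assumptions made for the example.</p>
<pre># -d device report, -x extended statistics, -k kilobytes, -t timestamp per report.
# Six reports, two seconds apart; the first report covers the time since boot.
iostat -d -x -k -t sda 2 6 | tee iostat-sda.log</pre>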






<p><a id="schedule-utils" name="schedule-utils"></a></p><h2>Schedule Utils</h2>
<p>These are the Linux scheduler utilities - schedutils for short.  These programs take advantage of the scheduler family of syscalls that Linux implements across various kernels.  These system calls implement interfaces for scheduler-related parameters such as CPU affinity and real-time attributes.  The standard UNIX utilities do not provide support for these interfaces -- thus this package.</p>
<p>The programs that are included in this package are chrt and taskset.  Together with nice and renice (not included), they allow full control of process scheduling parameters.  Suggestions for related utilities are welcome, although it is believed (barring new interfaces) that all scheduling interfaces are covered.</p>
<p>I've found that quite a few servers do not have this package installed, which suggests their admins may not know about it.  Here is how non-root users can install this incredible package into their home directory.  Root users know how to do this, or they shouldn't be root.  If you have curl, you can download and install everything in one line, or run the same steps one command at a time.</p>
<pre>mkdir -pv $HOME/{dist,source,bin,share/man/man1} &amp;&amp; cd ~/dist &amp;&amp; curl -O http://ftp.de.debian.org/debian/pool/main/s/schedutils/schedutils_1.5.0.orig.tar.gz &amp;&amp; cd ~/source &amp;&amp; tar -xvzf ~/dist/sch*z &amp;&amp; cd sch* &amp;&amp; sed -i -e &#039;s,= /usr/local,=${HOME},g&#039; Makefile &amp;&amp; make &amp;&amp; make install &amp;&amp; make installdoc</pre>
<pre>mkdir -pv $HOME/{dist,source,bin,share/man/man1}
cd ~/dist &amp;&amp; curl -O http://ftp.de.debian.org/debian/pool/main/s/schedutils/schedutils_1.5.0.orig.tar.gz
cd ~/source &amp;&amp; tar -xvzf ~/dist/schedutils_1.5.0.orig.tar.gz
cd ~/source/schedutils-1.5.0 &amp;&amp; sed -i -e &#039;s,= /usr/local,=${HOME},g&#039; Makefile
make &amp;&amp; make install &amp;&amp; make installdoc</pre>
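<p>Before building anything it may be worth checking whether the tools are already on the machine; on many current distributions chrt and taskset ship as part of util-linux.  A minimal check, assuming a POSIX shell:</p>
<pre># Report where each tool is found, or that it is missing from PATH.
for tool in chrt taskset; do
    if command -v "$tool" &gt;/dev/null 2&gt;&amp;1; then
        echo "$tool: $(command -v "$tool")"
    else
        echo "$tool: not found in PATH"
    fi
done</pre>
<p>If you built the package into your home directory as shown above, the binaries presumably end up under $HOME/bin, so that directory has to be in your PATH for this check to find them.</p>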


<p><a id="taskset-tool" name="taskset-tool"></a></p><h3>taskset</h3>
<p>Taskset  is  used to set or retrieve the CPU affinity of a running process given its PID or to launch a new COMMAND with a given CPU affinity.  CPU affinity is a scheduler property that "bonds" a process to a given set of CPUs on the system.  The Linux scheduler will honor the given CPU affinity and the process will not run on any other CPUs.  Note that the Linux scheduler also supports natural CPU affinity: the scheduler attempts to keep processes on the same CPU as long as practical for performance reasons.  Therefore, forcing a specific CPU affinity is useful only in certain applications.</p>
<p>The  CPU  affinity is represented as a bitmask, with the lowest order bit corresponding to the first logical CPU and the highest order bit corresponding to the last logical CPU.  Not all CPUs may exist on a given system but a mask may specify more CPUs than are present.  A retrieved mask will reflect only the bits that correspond to CPUs physically on the system.  If an invalid mask is given (i.e., one that corresponds to no valid CPUs on the current system) an error is returned.  A user must possess CAP_SYS_NICE to change the CPU affinity of a process.  Any user can retrieve the affinity mask.</p>

<p><a id="taskset-usage" name="taskset-usage"></a></p><h4>taskset Usage</h4>
<pre>taskset [options] [mask | cpu-list] [pid | cmd [args...]]
&nbsp;
-p, --pid            operate on existing given pid
-c, --cpu-list     display and specify cpus in list format</pre>

<p><a id="taskset-examples" name="taskset-examples"></a></p><h4>taskset Examples</h4>
<p>The default behavior is to run a new command:</p>
<pre>$ taskset 03 sshd -b 1024</pre>
<p>You can retrieve the mask of an existing task or set it:</p>
<pre>$ taskset -p 700
$ taskset -p 03 700</pre>
<p>List format uses a comma-separated list instead of a mask:</p>
<pre>$ taskset -pc 0,3,7-11 700</pre>
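<p>To make the bitmask format more concrete, this sketch builds a mask from a list of CPU numbers with plain shell arithmetic and hands it to taskset.  The CPU list and the <code>sleep</code> command are placeholders chosen for the example; CPUs 0 and 1 produce the same mask as the <code>03</code> used in the first example.</p>
<pre># Assumption: we want CPUs 0 and 1; adjust the list for your machine.
cpus="0 1"
mask=0
for cpu in $cpus; do
    mask=$(( mask | (1 &lt;&lt; cpu) ))    # set the bit that stands for this CPU
done
printf 'mask=0x%x\n' "$mask"             # prints mask=0x3
# Run a placeholder command restricted to those CPUs.
taskset "$(printf '0x%x' "$mask")" sleep 1</pre>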




<p><a id="chrt-tool" name="chrt-tool"></a></p><h3>chrt</h3>
<p><code>chrt</code> sets or retrieves the real-time scheduling attributes of an existing PID or runs COMMAND with the given attributes.  Both policy (one of <code>SCHED_FIFO</code>, <code>SCHED_RR</code>, or <code>SCHED_OTHER</code>) and priority can be set and retrieved.  A user must possess CAP_SYS_NICE to change the scheduling attributes of a process.  Any user can retrieve the scheduling information.</p>

<p><a id="chrt-usage" name="chrt-usage"></a></p><h4>chrt Usage</h4>
<pre>chrt [options] [prio] [pid | cmd [args...]]
&nbsp;
-p, --pid      operate on an existing PID and do not launch a new task
-f, --fifo     set scheduling policy to SCHED_FIFO
-m, --max      show minimum and maximum valid priorities, then exit
-o, --other    set scheduling policy to SCHED_OTHER
-r, --rr       set scheduling policy to SCHED_RR (the default)</pre>

<p><a id="chrt-examples" name="chrt-examples"></a></p><h4>chrt Examples</h4>
<p>The default behavior is to run a new command:   <code>chrt [prio] -- [command] [arguments]</code></p>
<p>You can also retrieve the real-time attributes of an existing task:</p>
<pre>chrt -p [pid]</pre>
<p>Or set them:</p>
<pre>chrt -p [prio] [pid]</pre>
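<p>With concrete values filled in, the forms above might look like the following.  The job name <code>./my_job</code> and the priority 10 are hypothetical; pick values that fit the range reported by <code>chrt -m</code> on your system.</p>
<pre># List the valid priority range for each scheduling policy.
chrt -m
# Show the policy and priority of the current shell (any user may do this).
chrt -p "$$"
# Hypothetical job: start it under SCHED_FIFO at priority 10.
# Changing real-time attributes needs CAP_SYS_NICE, typically root.
sudo chrt -f 10 ./my_job</pre>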













<p><a id="ulimit-tool" name="ulimit-tool"></a></p><h2>ulimit - get and set user limits</h2>
<p>Ulimit provides control over the resources available to processes started by the shell, on systems that allow such control. One can set the resource limits of the shell using the built-in ulimit command.  The shell's resource limits are inherited by the processes that it creates to execute commands.</p>

<p><a id="ulimit-usage" name="ulimit-usage"></a></p><h4>ulimit Usage</h4>
<pre>ulimit [-SHacdfilmnpqstuvx] [limit]</pre>
<dl>
<dt>-S</dt><dd>use the `soft' resource limit</dd>
<dt>-H</dt><dd>use the `hard' resource limit</dd>
<dt>-a</dt><dd>all current limits are reported</dd>
<dt>-c</dt><dd>the maximum size of core files created</dd>
<dt>-d</dt><dd>the maximum size of a process's data segment</dd>
<dt>-f</dt><dd>the maximum size of files created by the shell</dd>
<dt>-l</dt><dd>the maximum size a process may lock into memory</dd>
<dt>-m</dt><dd>the maximum resident set size</dd>
<dt>-n</dt><dd>the maximum number of open file descriptors</dd>
<dt>-p</dt><dd>the pipe buffer size</dd>
<dt>-s</dt><dd>the maximum stack size</dd>
<dt>-t</dt><dd>the maximum amount of cpu time in seconds</dd>
<dt>-u</dt><dd>the maximum number of user processes</dd>
<dt>-v</dt><dd>the size of virtual memory</dd>
</dl>
<p>If LIMIT is given, it is the new value of the specified resource; the special LIMIT values `soft', `hard', and `unlimited' stand for the current soft limit, the current hard limit, and no limit, respectively.  Otherwise, the current value of the specified resource is printed.  If no option is given, then -f is assumed.  Values are in 1024-byte increments, except for -t, which is in seconds, -p, which is in increments of 512 bytes, and -u, which is an unscaled number of processes.</p>
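<p>A short sketch of the soft and hard limits and of the special LIMIT value <code>hard</code>, assuming bash and using the open file descriptor limit (-n) as the example resource:</p>
<pre># Show the soft and hard limits on open file descriptors.
ulimit -S -n
ulimit -H -n
# In a subshell, raise the soft limit to whatever the hard limit currently is.
( ulimit -S -n hard; ulimit -S -n )
# The parent shell still reports its original soft limit.
ulimit -S -n</pre>
<p>Running the change in a subshell keeps the experiment from touching the limits of the shell you are working in.</p>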
<dl>
<dt>RLIMIT_AS</dt>
<dd>The maximum size of the process's virtual memory (address space) in bytes.  This limit affects calls to brk(2), mmap(2) and mremap(2), which fail with the error ENOMEM upon exceeding this limit.  Also automatic stack expansion will fail (and generate a SIGSEGV that kills the process if no alternate stack has been made available via sigaltstack(2)).  Since the value is a long, on machines with a 32-bit long either this limit is at most 2 GiB, or this resource is unlimited.</dd>
<dt>RLIMIT_CORE</dt>
<dd>Maximum size of core file.  When 0 no core dump files are created. When non-zero, larger dumps are truncated to this size.</dd>
<dt>RLIMIT_CPU</dt>
<dd>CPU time limit in seconds.  When the process reaches the soft limit, it is sent a SIGXCPU signal.  The default action for this signal is to terminate the process.  However, the signal can be caught, and the handler can return control to the main program.  If the process continues to consume CPU time, it will be sent SIGXCPU once per second until the hard limit is reached, at which time it is sent SIGKILL. (This latter point describes Linux 2.2 through 2.6 behavior. Implementations vary in how they treat processes which continue to consume CPU time after reaching the soft limit.  Portable applications that need to catch this signal should perform an orderly termination upon first receipt of SIGXCPU.)</dd>
<dt>RLIMIT_DATA</dt>
<dd>The maximum size of the process's data segment (initialized data, uninitialized data, and heap).  This limit affects calls to brk(2) and sbrk(2), which fail with the error ENOMEM upon encountering the soft limit of this resource.</dd>
<dt>RLIMIT_FSIZE</dt>
<dd>The maximum size of files that the process may create.  Attempts to extend a file beyond this limit result in delivery of a SIGXFSZ signal. By default, this signal terminates a process, but a process can catch this signal instead, in which case the relevant system call (e.g., write(2), truncate(2)) fails with the error EFBIG.</dd>
<dt>RLIMIT_LOCKS</dt>
<dd>(Early Linux 2.4 only) A limit on the combined number of flock(2) locks and fcntl(2) leases that this process may establish.</dd>
<dt>RLIMIT_MEMLOCK</dt>
<dd>The maximum number of bytes of memory that may be locked into RAM.  In effect this limit is rounded down to the nearest multiple of the system page size.  This limit affects mlock(2) and mlockall(2) and the mmap(2) MAP_LOCKED operation.  Since Linux 2.6.9 it also affects the shmctl(2) SHM_LOCK operation, where it sets a maximum on the total bytes in shared memory segments (see shmget(2)) that may be locked by the real user ID of the calling process.  The shmctl(2) SHM_LOCK locks are accounted for separately from the per-process memory locks established by mlock(2), mlockall(2), and mmap(2) MAP_LOCKED; a process can lock bytes up to this limit in each of these two categories.  In Linux kernels before 2.6.9, this limit controlled the amount of memory that could be locked by a privileged process.  Since Linux 2.6.9, no limits are placed on the amount of memory that a privileged process may lock, and this limit instead governs the amount of memory that an unprivileged process may lock.</dd>
<dt>RLIMIT_MSGQUEUE</dt>
<dd>(Since Linux 2.6.8) Specifies the limit on the number of bytes that can be allocated for POSIX message queues for the real user ID of the calling process.  This limit is enforced for mq_open(3).  Each message queue that the user creates counts (until it is removed) against this limit according to the formula:  <code>bytes = attr.mq_maxmsg * sizeof(struct msg_msg *) +             attr.mq_maxmsg * attr.mq_msgsize</code> where attr is the mq_attr structure specified as the fourth argument to mq_open(3).  The first addend in the formula, which includes sizeof(struct msg_msg *) (4 bytes on Linux/i386), ensures that the user cannot create an unlimited number of zero-length messages (such messages nevertheless each consume some system memory for bookkeeping overhead).</dd>
<dt>RLIMIT_NICE</dt>
<dd>(Since Linux 2.6.12) Specifies a ceiling to which the process's nice value can be raised using setpriority(2) or nice(2).  The actual ceiling for the nice value is calculated as 20 - rlim_cur.  (This strangeness occurs because negative numbers cannot be specified as resource limit values, since they typically have special meanings.  For example, RLIM_INFINITY typically is the same as -1.)</dd>
<dt>RLIMIT_NOFILE</dt>
<dd>Specifies a value one greater than the maximum file descriptor number that can be opened by this process.  Attempts (open(2), pipe(2), dup(2), etc.)  to exceed this limit yield the error EMFILE. (Historically, this limit was named RLIMIT_OFILE on BSD.)</dd>
<dt>RLIMIT_NPROC</dt>
<dd>The maximum number of processes (or, more precisely on Linux, threads) that can be created for the real user ID of the calling process.  Upon encountering this limit, fork(2) fails with the error EAGAIN.</dd>
<dt>RLIMIT_RSS</dt>
<dd>Specifies the limit (in pages) of the process's resident set (the number of virtual pages resident in RAM).  This limit only has effect in Linux 2.4.x, x &lt; 30, and there only affects calls to madvise(2) specifying MADV_WILLNEED.</dd>
<dt>RLIMIT_RTPRIO</dt>
<dd>(Since Linux 2.6.12) Specifies a ceiling on the real-time priority that may be set for this process using sched_setscheduler(2) and sched_setparam(2).</dd>
<dt>RLIMIT_RTTIME</dt>
<dd>(Since Linux 2.6.25) Specifies a limit on the amount of CPU time that a process scheduled under a real-time scheduling policy may consume without making a blocking system call.  For the purpose of this limit, each time a process makes a blocking system call, the count of its consumed CPU time is reset to zero.  The CPU time count is not reset if the process continues trying to use the CPU but is preempted, its time slice expires, or it calls sched_yield(2). Upon reaching the soft limit, the process is sent a SIGXCPU signal.  If the process catches or ignores this signal and continues consuming CPU time, then SIGXCPU will be generated once each second until the hard limit is reached, at which point the process is sent a SIGKILL signal.  The intended use of this limit is to stop a runaway real-time process from locking up the system.</dd>
<dt>RLIMIT_SIGPENDING</dt>
<dd>(Since Linux 2.6.8) Specifies the limit on the number of signals that may be queued for the real user ID of the calling process.  Both standard and real-time signals are counted for the purpose of checking this limit.  However, the limit is only enforced for sigqueue(2); it is always possible to use kill(2) to queue one instance of any of the signals that are not already queued to the process.</dd>
<dt>RLIMIT_STACK</dt>
<dd>The maximum size of the process stack, in bytes.  Upon reaching this limit, a SIGSEGV signal is generated.  To handle this signal, a process must employ an alternate signal stack (sigaltstack(2)).</dd>
</dl>

<p><a id="ulimit-examples" name="ulimit-examples"></a></p><h4>ulimit Examples</h4>
<p>Turn off core dumps:</p>
<pre>ulimit -S -c 0</pre>
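<p>Because limits are inherited by the processes a shell creates, the setting can be confined to one job by making it inside a subshell.  A minimal sketch, assuming bash is installed; the inner <code>bash -c</code> stands in for whatever program you actually want to run:</p>
<pre># The limit is set in a subshell, so only processes started there inherit it.
(
    ulimit -S -c 0
    bash -c 'ulimit -S -c'    # the child process reports 0
)
# Back in the parent shell the soft limit is whatever it was before.
ulimit -S -c</pre>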








<h2>More Reading</h2>
<ul>
<li>Please see the <a href="http://pagesperso-orange.fr/sebastien.godard/">SYSSTAT Utilities Home</a> for more performance monitoring tools like sar, sadf, mpstat, iostat, pidstat and sa tools.</li>
<li><a href="http://gaarai.com/2009/03/06/multitasking-from-the-linux-command-line-plus-process-prioritization/">Multitasking from the Linux Command Line + Process Prioritization</a></li>
</ul>


<h2>Man Pages</h2>
<ol>
<li><a href="http://www.kernel.org/doc/man-pages/online/pages/man2/sched_setscheduler.2.html">sched_setscheduler</a></li>
<li><a href="http://www.kernel.org/doc/man-pages/online/pages/man7/cpuset.7.html">cpuset</a></li>
<li><a href="http://www.kernel.org/doc/man-pages/online/pages/man7/signal.7.html">signal</a></li>
<li><a href="http://www.kernel.org/doc/man-pages/online/pages/man2/getrlimit.2.html">getrlimit</a></li>
<li><a href="http://www.kernel.org/doc/man-pages/online/pages/man3/ulimit.3.html">ulimit</a></li>
<li><a href="http://www.kernel.org/doc/man-pages/online/pages/man2/ioprio_get.2.html">ioprio_get</a></li>
<li><a href="http://www.kernel.org/doc/man-pages/online/pages/man2/ioprio_set.2.html">ioprio_set</a></li>
</ol>


<h2>Kernel Documentation</h2>
<ul>
<li><a href='http://uploads.askapache.com/2009/08/sched-stats.txt'>information on schedstats (Linux Scheduler Statistics)</a></li>
<li><a href='http://uploads.askapache.com/2009/08/sched-rt-group.txt'>real-time group scheduling</a></li>
<li><a href='http://uploads.askapache.com/2009/08/sched-nice-design.txt'>How and why the scheduler's nice levels are implemented</a></li>
<li><a href='http://uploads.askapache.com/2009/08/sched-domains.txt'>information on scheduling domains</a></li>
<li><a href='http://uploads.askapache.com/2009/08/sched-design-CFS.txt'>goals, design and implementation of the Completely Fair Scheduler</a></li>
</ul>



<h2>Future Discussions:</h2>
<p><a href="http://www.cuddletech.com/blog/pivot/entry.php?id=820">IO Benchmarking: How, Why and With What</a></p><p><a href="http://www.askapache.com/optimize/optimize-nice-ionice.html"></a><a href="http://www.askapache.com/optimize/optimize-nice-ionice.html">Optimizing Servers and Processes for Speed with ionice, nice, ulimit</a> originally appeared on <cite>AskApache.com</cite> </p>]]></content:encoded>
			<wfw:commentRss>http://www.askapache.com/optimize/optimize-nice-ionice.html/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Protecting Files with Advanced Mod_Rewrite Anti-Hotlinking</title>
		<link>http://www.askapache.com/htaccess/anti-hotlinking-modrewrite.html</link>
		<comments>http://www.askapache.com/htaccess/anti-hotlinking-modrewrite.html#comments</comments>
		<pubDate>Wed, 16 Sep 2009 05:09:11 +0000</pubDate>
		<dc:creator>AskApache</dc:creator>
				<category><![CDATA[Htaccess]]></category>

		<guid isPermaLink="false">http://www.askapache.com/?p=3287</guid>
		<description><![CDATA[<p><a class="IFL" id="id17" href="http://www.askapache.com/htaccess/anti-hotlinking-mod_rewrite.html"></a>If you have files on your site that you don't want indexed by malicious search engines, grabbed and leeched by malicious spammers, or stolen and made available elsewhere, you can use mod_rewrite to drastically reduce or completely stop that activity.<br class="C" /></p>]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.askapache.com/htaccess/anti-hotlinking-modrewrite.html"></a><a href="http://www.askapache.com/htaccess/anti-hotlinking-modrewrite.html"><cite>AskApache.com</cite></a></p><p><a class="IFL" id="id17" href="http://uploads.askapache.com/2009/09/866310_predestrian_lights.jpg"></a><br style="padding-top:2em;" /><br />If you have files on your site that you don't want indexed by malicious search engines, grabbed and leeched by malicious spammers, or stolen and made available elsewhere, you can use mod_rewrite to drastically reduce or completely stop that activity.</p>

<h2>The Worst Kind of People</h2>
<p><strong>Spammers and Leechers</strong>.  They operate like this:  Let's say you have some mp3 files on a server, and SOMEWHERE on the web there is a link to one of those files.  That includes links inside javascript files, css files, and robots.txt files; the spammers' and leechers' robots scan all of those looking for the type of link they want.  Then they request the file, usually trying a number of different types of requests until one gets through.  Then they use it for personal gain, at your expense.</p>
<p>Some robots perform valuable services for the world wide web community, and some leeching programs are pretty cool, so not all of this activity is perpetrated by nefarious spammers.</p>
<br class="C" />

<h2>Blocking by Cookie</h2>
<p>If a link to your file exists, a robot is going to request it eventually, so the way to defeat robots is to have your site change the way a legitimate user requests the file.  Robots for the most part are not javascript-capable, so the most common advanced method is to set a cookie using javascript and then use mod_rewrite to check for that cookie in the request for the file.</p>

<p>So if your site sets a cookie named fspammers with a value of 445, this is what the request sent by an HTTP client like Firefox looks like:</p>
<pre>GET /hotlink/lovefreedom.mp3 HTTP/1.1
Host: static.askapache.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3 (.NET CLR 3.5.30729)
Accept: image/png,image/*;q=0.8,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://www.askapache.com/wordpress/seo-in-wordpress.html
Cookie: fspammers=445</pre>
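<p>A request like this can also be reproduced from the command line, which is handy for checking how a server responds with and without the cookie.  This is only a sketch: it assumes curl is installed, and the URL, cookie, and referer are simply the sample values from this article.</p>
<pre># Assumption: sample values taken from the request shown in this article.
url="http://static.askapache.com/hotlink/lovefreedom.mp3"
# Request with the cookie and referer a real visitor would send; prints the status code.
curl -s -o /dev/null -w "%{http_code}\n" \
     -b "fspammers=445" \
     -e "http://www.askapache.com/wordpress/seo-in-wordpress.html" \
     "$url"
# Request the way a simple robot might: HTTP/1.0, no cookie, no referer.
curl -s -o /dev/null -w "%{http_code}\n" --http1.0 "$url"</pre>
<p>On a server that checks the cookie, protocol, and referer, you would expect the second request to come back as 403 Forbidden while the first is served normally.</p>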


<h2>Mod_Rewrite HTTP Headers</h2>
<p>The mod_rewrite module has access to ALL the HTTP Headers sent in a request, so we can use mod_rewrite to validate each of the HTTP Headers in the request example above.</p>


<h2>Mod_Rewrite .htaccess Example</h2>
<p>Finally, now that everyone is on the same page about what is really going on, here is the <a href="http://www.askapache.com/htaccess/htaccess.html" title=".htaccess tutorial">.htaccess</a> code that protects everything in the /hotlink/ folder.</p>
<p>The code blocks a request if it fails any one of these checks:</p>
<ol>
<li><strong>Cookie</strong>: Checks that the fspammers cookie is present and has the value 445.</li>
<li><strong>HTTP Protocol</strong>: Checks that HTTP 1.1 is being used (many robots use 1.0).</li>
<li><strong>Host</strong>: Checks that the requested Host was static.askapache.com.</li>
<li><strong>Referer</strong>: Checks that the referring site is static.askapache.com or www.askapache.com.</li>
</ol>


<pre>RewriteEngine On
RewriteBase /
&nbsp;
RewriteCond %{HTTP_COOKIE} !^.*fspammers=445.*$ [NC,OR]
RewriteCond %{THE_REQUEST} !^[A-Z]{3,9}\ /(.*)\ HTTP/1\.1 [NC,OR]
RewriteCond %{HTTP_HOST} !^static\.askapache\.com$ [NC,OR]
RewriteCond %{HTTP_REFERER} !^http://(www|static)\.askapache\.com.*$ [NC]
RewriteRule ^hotlink/.*$ - [F]</pre><p><a href="http://www.askapache.com/htaccess/anti-hotlinking-modrewrite.html"></a><a href="http://www.askapache.com/htaccess/anti-hotlinking-modrewrite.html">Protecting Files with Advanced Mod_Rewrite Anti-Hotlinking</a> originally appeared on <cite>AskApache.com</cite> </p>]]></content:encoded>
			<wfw:commentRss>http://www.askapache.com/htaccess/anti-hotlinking-modrewrite.html/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>THE Ultimate Htaccess</title>
		<link>http://www.askapache.com/htaccess/htaccess.html</link>
		<comments>http://www.askapache.com/htaccess/htaccess.html#comments</comments>
		<pubDate>Sat, 10 Jan 2009 13:05:32 +0000</pubDate>
		<dc:creator>AskApache</dc:creator>
				<category><![CDATA[Htaccess]]></category>

		<guid isPermaLink="false">http://www.askapache.com.com/htaccess/htaccesselite-ultimate-htaccess-article.html</guid>
		<description><![CDATA[<p><strong>Skip this - still under edit</strong></p>
<p>I discovered these tips and tricks mostly while working as a network security penetration specialist hired to find security holes in web hosting environments.  Shared hosting is the most common and cheapest form of web hosting, where multiple customers are placed on a single machine and "share" the resources (CPU/RAM/SPACE).  The machines are configured to basically ONLY do HTTP and FTP.  No shells or any interactive logins, no ssh, just FTP access.  That is when I started examining htaccess files in great detail and learned about the incredible untapped power of htaccess.  99% of the world's best Apache admins don't use .htaccess much, if AT ALL.  It's much easier, safer, and faster to configure Apache using the httpd.conf file instead.  However, this file is almost never readable on shared hosts, and I've never seen it writable.  So the only avenue left for those on shared hosting was and is the .htaccess file, and holy freaking fiber-optics.. it's almost as powerful as httpd.conf itself!<br /><br />Almost all .htaccess code works in the httpd.conf file, but only around 50% of httpd.conf code works in .htaccess files.  So all the best Apache admins and programmers never used .htaccess files.  There was no incentive for those with access to httpd.conf to use htaccess, and the gap grew.  It's common to see "computer gurus" on forums and mailing lists rail against all uses and users of .htaccess files, smugly announcing the well-known problems with .htaccess files compared with httpd.conf.  I wonder if these "gurus" know the history of the htaccess file, like its use in the earliest versions of the HTTP server, NCSA's HTTPd, which BTW became known as Apache HTTP.  
So you could easily say that htaccess files predate Apache itself.<br /><br />Once I discovered what .htaccess files could do towards helping me enumerate and exploit security vulnerabilities even on big shared hosts, I focused all my research on .htaccess files, meaning I was reading the venerable Apache HTTP source code 24/7!  I compiled every released version of the Apache Web Server, ever, even NCSA's, and focused on enumerating the most powerful htaccess directives. Good times! Because my focus was on protocol/file/network vulnerabilities instead of web dev, I built up a nice toolbox of htaccess tricks to do unusual things.  When I switched over to webdev in 2005 I started using htaccess for websites, not research.  I documented most of my favorites and rewrote the htaccess guide for web developers.  After some great encouragement on various forums and nets I decided to start a blog to share my work with everyone: AskApache.com was registered, I published my guide, and it was quickly plagiarized and scraped all over the net.  Information is freedom, and freedom is information, so this blog has the least restrictive copyright for you.  Feel free to modify, copy, republish, sell, or use anything on this site ;)</p>]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.askapache.com/htaccess/htaccess.html"></a><a href="http://www.askapache.com/htaccess/htaccess.html"><cite>AskApache.com</cite></a></p><p><dfn title="HyperText Access">.htaccess</dfn> is a very ancient configuration file that controls the Web Server running your website, and is one of the most powerful configuration files you will ever come across. Htaccess has the ability to control access of the <acronym title="World Wide Web">WWW</acronym>'s HyperText Transfer Protocol (<acronym title="HyperText Transfer Protocol">HTTP</acronym>) using Password Protection, 301 Redirects, rewrites, and much much more.  This is because this configuration file was coded in the earliest days of the web (HTTP), for one of the first Web Servers ever!  Eventually these Web Servers (configured with htaccess) became known as the World Wide Web, and eventually grew into the Internet we use today.</p>
<p><a class="IFL" href="/htaccess/htaccess.html"><img src="http://uploads.askapache.com/2008/08/htaccess-up.png" alt=".htaccess file tutorial" title=".htaccess file tutorial" width="141" height="144" /></a>This is not an <em>introduction to .htaccess</em>&hellip; This is the evolution of the best of the best <tt>.htaccess</tt> on the planet.  It's changed over the years with more and better <strong>.htaccess tricks</strong> using real <a href="#htaccess-code-examples">.htaccess example code</a>.  I add my favorite htaccess-related links and sites, results from my <a href="#best-htaccess-articles">htaccess research</a>, and frequently update this article.<br /><br />You've come to the right place if you are looking to acquire <strong>mad skills</strong> for using .htaccess files.<br /><br />Originally (2003) this guide was known in certain hacker circles and hidden corners of the net as an <em>ultimate .htaccess</em> due to the powerful <strong>htaccess tricks</strong> and tips to bypass security on a webhost, and also because many of the tricks and <a href="#htaccess-code-examples">examples</a> were pretty impressive back then in that group.<br class="C" /></p>

<h2>AskApache Htaccess Journey</h2>
<p><strong>Skip this - still under edit</strong></p>
<p>I discovered these tips and tricks mostly while working as a network security penetration specialist hired to find security holes in web hosting environments.  Shared hosting is the most common and cheapest form of web hosting, where multiple customers are placed on a single machine and "share" the resources (CPU/RAM/SPACE).  The machines are configured to basically ONLY do HTTP and FTP.  No shells or any interactive logins, no ssh, just FTP access.  That is when I started examining htaccess files in great detail and learned about the incredible untapped power of htaccess.  99% of the world's best Apache admins don't use .htaccess much, if AT ALL.  It's much easier, safer, and faster to configure Apache using the httpd.conf file instead.  However, this file is almost never readable on shared hosts, and I've never seen it writable.  So the only avenue left for those on shared hosting was and is the .htaccess file, and holy freaking fiber-optics.. it's almost as powerful as httpd.conf itself!<br /><br />Almost all .htaccess code works in the httpd.conf file, but only around 50% of httpd.conf code works in .htaccess files.  So all the best Apache admins and programmers never used .htaccess files.  There was no incentive for those with access to httpd.conf to use htaccess, and the gap grew.  It's common to see "computer gurus" on forums and mailing lists rail against all uses and users of .htaccess files, smugly announcing the well-known problems with .htaccess files compared with httpd.conf.  I wonder if these "gurus" know the history of the htaccess file, like its use in the earliest versions of the HTTP server, NCSA's HTTPd, which BTW became known as Apache HTTP.  
So you could easily say that htaccess files predate Apache itself.<br /><br />Once I discovered what .htaccess files could do towards helping me enumerate and exploit security vulnerabilities even on big shared hosts, I focused all my research on .htaccess files, meaning I was reading the venerable Apache HTTP source code 24/7!  I compiled every released version of the Apache Web Server, ever, even NCSA's, and focused on enumerating the most powerful htaccess directives. Good times! Because my focus was on protocol/file/network vulnerabilities instead of web dev, I built up a nice toolbox of htaccess tricks to do unusual things.  When I switched over to webdev in 2005 I started using htaccess for websites, not research.  I documented most of my favorites and rewrote the htaccess guide for web developers.  After some great encouragement on various forums and nets I decided to start a blog to share my work with everyone: AskApache.com was registered, I published my guide, and it was quickly plagiarized and scraped all over the net.  Information is freedom, and freedom is information, so this blog has the least restrictive copyright for you.  Feel free to modify, copy, republish, sell, or use anything on this site ;)</p>



<h2 class="htaccess" id="h21adf" style="font-weight:bold;">Htaccess - Evolved</h2>
<p>The Hyper Text Transfer Protocol (HTTP) was initiated at CERN in Geneva (Switzerland), where it emerged (together with the HTML presentation language) from the need to exchange scientific information on a computer network in a simple manner. The first public HTTP implementation only allowed for plain text information, and almost instantaneously became a replacement for the GOPHER service. One of the first text-based browsers was LYNX, which still exists today; a graphical HTTP client appeared very quickly under the name NCSA Mosaic. Mosaic was a popular browser back in 1994. Soon the need for a richer multimedia experience was born, and the markup language provided support for a growing multitude of media types.</p>
<p>Htaccess file know-how will do several things for you:</p>
<ul><li>Make your website noticeably faster.</li><li>Allow you to debug your server with ease.</li><li>Make your life easier and more rewarding.</li><li>Allow you to work faster and more productively.</li></ul>


<h3>What Is .htaccess</h3>
<p>Specifically, <kbd>.htaccess</kbd> is the default file name of a special configuration file that provides a number of <a href="#htaccess-directives">directives</a> (commands) for controlling and configuring the <a href="http://httpd.apache.org/" title="open-source HTTP server running the WWW">Apache Web Server</a>, and also to control and configure <a href="#htaccess-modules">modules</a> that can be built into the Apache installation, or included at run-time like mod_rewrite (for htaccess rewrite), mod_alias (for htaccess redirects), and mod_ssl (for controlling SSL connections).</p>
<p><strong>Htaccess</strong> allows for decentralized management of Web Server configurations, which makes life very easy for web hosting companies and especially their savvy customers.  Hosting companies set up and run "server farms" where many hundreds or thousands of web hosting customers are all put on the same Apache Server.  This type of hosting is called "virtual hosting", and without .htaccess files every customer would have to use the exact same settings as everyone else on their segment.  That is why any half-decent web host <em>(DreamHost, Powweb, MediaTemple, GoDaddy)</em> enables <em>.htaccess files</em>, though few people are aware of it.  Let's just say that if I were a customer on your server farm, and .htaccess files were enabled, my websites would be a LOT faster than yours, as these configuration files allow you to take full advantage of the resources allotted to you by your host.  If even 1/10 of the sites on a server farm took advantage of what they are paying for, the providers would go out of business.</p>

<blockquote cite="http://httpd.apache.org/docs/1.3/misc/API.html">
<p>One of the design goals for this server was to maintain external compatibility with the NCSA 1.3 server --- that is, to read the same configuration files, to process all the directives therein correctly, and in general to be a drop-in replacement for NCSA. On the other hand, another design goal was to move as much of the server's functionality into modules which have as little as possible to do with the monolithic server core. The only way to reconcile these goals is to move the handling of most commands from the central server into the modules.</p>
<p>However, just giving the modules command tables is not enough to divorce them completely from the server core. The server has to remember the commands in order to act on them later. That involves maintaining data which is private to the modules, and which can be either per-server, or per-directory. Most things are per-directory, including in particular access control and authorization information, but also information on how to determine file types from suffixes, which can be modified by AddType and DefaultType directives, and so forth. In general, the governing philosophy is that anything which can be made configurable by directory should be; per-server information is generally used in the standard set of modules for information like Aliases and Redirects which come into play before the request is tied to a particular place in the underlying file system.</p>
<p>Another requirement for emulating the NCSA server is being able to handle the <strong>per-directory configuration files, generally called .htaccess files</strong>, though even in the NCSA server they can contain directives which have nothing at all to do with access control. Accordingly, after URI -> filename translation, but before performing any other phase, the server walks down the directory hierarchy of the underlying filesystem, following the translated pathname, to read any .htaccess files which might be present. The information which is read in then has to be merged with the applicable information from the server's own config files (either from the <code>&lt;directory&gt;</code> sections in access.conf, or from defaults in srm.conf, which actually behaves for most purposes almost exactly like <code>&lt;directory /&gt;</code>).</p>
<p>Finally, after having served a request which involved <strong>reading .htaccess files</strong>, we need to discard the storage allocated for handling them. That is solved the same way it is solved wherever else similar problems come up, by tying those structures to the per-transaction resource pool.</p>
</blockquote>


<h4 class="tic">Creating Htaccess Files</h4>
<p><a class="IFL" href="http://uploads.askapache.com/2009/01/htaccess-explorer.png"><img src="http://uploads.askapache.com/2009/01/htaccess-explorer.png" alt="What an Htaccess File Looks Like in Windows Explorer" title="What an Htaccess File Looks Like in Windows Explorer" width="243" height="322" /></a>Htaccess files use the default filename "<code>.htaccess</code>" but any unix-style file name can be specified from the <a href="#httpd-config-examples">main server config</a> using the <code>AccessFileName</code> directive.  The file isn't <code>.htaccess.txt</code>, it's literally just named <code>.htaccess</code>.<br class="C" /></p>
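<p>As a rough sketch of the <code>AccessFileName</code> directive, assuming you can edit the main server config (the directive is not allowed inside .htaccess files themselves): the line below tells Apache to look for per-directory files under a different name first.  The name <code>.config</code> is only an illustration, not something this site uses.</p>
<pre>
# httpd.conf (server config or virtual host context only)
# Look for a file named ".config" first, then fall back to ".htaccess"
AccessFileName .config .htaccess
</pre>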
<p><a class="IFR" href="http://uploads.askapache.com/2009/01/viewing-htaccess-files.png"><img src="http://uploads.askapache.com/2009/01/viewing-htaccess-files.png" alt="View .htaccess files" title="View .htaccess files" width="386" height="287" /></a>In a Windows Environment like the one I use for work, you can change how Windows opens and views .htaccess files by modifying the Folder Options in explorer.  As you can see, on my computer files ending in .htaccess are recognized as having the HTACCESS extension and are handled/opened by Adobe Dreamweaver CS4.<br class="C" /></p>

<h4>Htaccess Scope</h4>
<p>Unlike the main server configuration files like <a href="#httpd-config-examples">httpd.conf</a>, <strong>Htaccess files are read on every request</strong>, so changes to these files take effect immediately.  For each request, Apache looks for an .htaccess file in every htaccess-enabled directory along the path to the requested file, and these extra file accesses cost some performance. I've never noticed a performance loss but OTOH, I know how to use them.  If you do have access to your main server configuration file, you should of course use that instead, and lucky for you ALL the .htaccess tricks and examples can be used there as well (just not vice versa).</p>
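<p>If you do control the main config, one way to act on that advice is sketched below: switch off the .htaccess lookup for a directory tree and place the same directives in a <code>&lt;Directory&gt;</code> section instead.  The path <code>/var/www/example</code> and the directives inside are placeholders chosen for illustration.  Unlike .htaccess edits, changes here only take effect after the server is restarted or reloaded.</p>
<pre>
# httpd.conf
&lt;Directory "/var/www/example"&gt;
# Do not look for .htaccess files anywhere in this tree
AllowOverride None
# Directives that would otherwise live in /var/www/example/.htaccess
Options -Indexes
DirectoryIndex index.html index.php
&lt;/Directory&gt;
</pre>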


<h3>Htaccess File Syntax</h3>
<p>Htaccess files follow the same syntax as the main Apache configuration files; for power users, here's an <a href='http://uploads.askapache.com/2009/01/apache.vim'>apache.vim</a> for VI. The one main difference is the <dfn title="Whether the directive is allowed in .htaccess files">context</dfn> of the directive, which means whether or not that directive is ALLOWED to be used inside of an .htaccess file.  Htaccess files are incredibly powerful, and can also be very dangerous, as some directives allowed in the main configuration files would let users/customers completely bypass security/bandwidth-limits/resource-limits/file-permissions, etc.  About 1/4 of all Apache directives cannot be used inside an .htaccess file (also known as a per-directory context config).  The Apache Developers are well-regarded throughout the world as being among the best programmers, ever.  To enable a disallowed directive inside a .htaccess file would require modifying the source code and re-compiling the server (which they allow and encourage if you are the owner/admin).  Here's a taste of that famous Apache source code that builds the directives allowed in .htaccess file context. The key that tells whether a directive is enabled in .htaccess context is the override flag, such as DIR_CMD_PERMS or OR_FILEINFO, which means the directive is enabled depending on the AllowOverride directive, which itself is only allowed in the main config.  Directives flagged RSRC_CONF cannot be used in .htaccess files at all.  First Apache 1.3.0, then Apache 2.2.10</p>

<h5>mod_autoindex</h5>
<pre>
AddIcon, add_icon, BY_PATH, DIR_CMD_PERMS, an icon URL followed by one or more filenames
AddIconByType, add_icon, BY_TYPE, DIR_CMD_PERMS, an icon URL followed by one or more MIME types
AddIconByEncoding, add_icon, BY_ENCODING, DIR_CMD_PERMS, an icon URL followed by one or more content encodings
AddAlt, add_alt, BY_PATH, DIR_CMD_PERMS, alternate descriptive text followed by one or more filenames
AddAltByType, add_alt, BY_TYPE, DIR_CMD_PERMS, alternate descriptive text followed by one or more MIME types
AddAltByEncoding, add_alt, BY_ENCODING, DIR_CMD_PERMS, alternate descriptive text followed by one or more content encodings
IndexOptions, add_opts, DIR_CMD_PERMS, RAW_ARGS, one or more index options
IndexIgnore, add_ignore, DIR_CMD_PERMS, ITERATE, one or more file extensions
AddDescription, add_desc, BY_PATH, DIR_CMD_PERMS, Descriptive text followed by one or more filenames
HeaderName, add_header, DIR_CMD_PERMS, TAKE1, a filename
ReadmeName, add_readme, DIR_CMD_PERMS, TAKE1, a filename
FancyIndexing, fancy_indexing, DIR_CMD_PERMS, FLAG, Limited to &#039;on&#039; or &#039;off&#039; (superseded by IndexOptions FancyIndexing)
DefaultIcon, ap_set_string_slot, (void *) XtOffsetOf(autoindex_config_rec, default_icon), DIR_CMD_PERMS, TAKE1, an icon URL
</pre>

<h5>mod_rewrite</h5>
<pre>
// mod_rewrite
RewriteEngine, cmd_rewriteengine, OR_FILEINFO, On or Off to enable or disable (default)
RewriteOptions, cmd_rewriteoptions, OR_FILEINFO, List of option strings to set
RewriteBase, cmd_rewritebase, OR_FILEINFO, the base URL of the per-directory context
RewriteCond, cmd_rewritecond, OR_FILEINFO, an input string and a to be applied regexp-pattern
RewriteRule, cmd_rewriterule, OR_FILEINFO, an URL-applied regexp-pattern and a substitution URL
RewriteMap, cmd_rewritemap, RSRC_CONF, a mapname and a filename
RewriteLock, cmd_rewritelock, RSRC_CONF, the filename of a lockfile used for inter-process synchronization
RewriteLog, cmd_rewritelog, RSRC_CONF, the filename of the rewriting logfile
RewriteLogLevel, cmd_rewriteloglevel, RSRC_CONF, the level of the rewriting logfile verbosity (0=none, 1=std, .., 9=max)
RewriteLog, fake_rewritelog, RSRC_CONF, [DISABLED] the filename of the rewriting logfile
RewriteLogLevel, fake_rewritelog, RSRC_CONF, [DISABLED] the level of the rewriting logfile verbosity
</pre>
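<p>To connect those flags to something a server admin actually types, here is a minimal sketch of the main-config side.  The directory path is a made-up placeholder.  The override classes are the ones the official docs list for these directives: <code>FileInfo</code> for the OR_FILEINFO directives of mod_rewrite, and <code>Indexes</code> for the mod_autoindex directives.</p>
<pre>
# httpd.conf
&lt;Directory "/var/www/example"&gt;
# FileInfo: allows RewriteEngine, RewriteCond, RewriteRule, ... in .htaccess
# Indexes:  allows AddIcon, IndexOptions, HeaderName, ... in .htaccess
AllowOverride FileInfo Indexes
&lt;/Directory&gt;
</pre>
<p>With <code>AllowOverride None</code> the same directives in an .htaccess file are never read, and the RSRC_CONF directives (RewriteMap, RewriteLog, ...) stay off-limits in .htaccess whatever AllowOverride says.</p>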



<h3>Htaccess Directives</h3>
<p><strong>Don't ask why</strong>, but I personally downloaded each major/beta release of the Apache HTTPD source code from version 1.3.0 to version 2.2.10 (<dfn title="1.3.0, 1.3.1, 1.3.11, 1.3.12, 1.3.14, 1.3.17, 1.3.19, 1.3.2, 1.3.20, 1.3.22, 1.3.23, 1.3.24, 1.3.27, 1.3.28, 1.3.29, 1.3.3, 1.3.31, 1.3.32, 1.3.33, 1.3.34, 1.3.35, 1.3.36, 1.3.37, 1.3.39, 1.3.4, 1.3.41, 1.3.6, 1.3.9, 2.0.35, 2.0.36, 2.0.39, 2.0.40, 2.0.42, 2.0.43, 2.0.44, 2.0.45, 2.0.46, 2.0.47, 2.0.48, 2.0.49, 2.0.50, 2.0.51, 2.0.52, 2.0.53, 2.0.54, 2.0.55, 2.0.58, 2.0.59, 2.0.61, 2.0.63, 2.1.3-beta, 2.1.6-alpha, 2.1.7-beta, 2.1.8-beta, 2.1.9-beta, 2.2.0, 2.2.2, 2.2.3, 2.2.4, 2.2.6, 2.2.8, 2.2.9, 2.2.10">all 63 Apache versions</dfn>!), then I <strong>configured and compiled each version for a custom HTTPD installation built from source</strong>. This allowed me to find <strong><a href="#htaccess-directives-list">every directive allowed in .htaccess files</a></strong> for each particular version, which has never been done before, or since. <strong>YES!</strong> <em>I think that is so cool..</em></p>
<p><strong>An .htaccess directive</strong> is basically a command, either specific to a module or built into the core, that performs a specific task or sets a specific setting for how Apache serves your WebSite.  Directives placed in Htaccess files <strong>apply to the directory they are in, and all sub-directories</strong>.  Here are the top 3 links (<em>official Apache Docs</em>) you will repeatedly use, so bookmark/print/save them.</p>
<p><a href="http://uploads.askapache.com/2008/08/htaccess-up1.png"><img src="http://uploads.askapache.com/2008/08/htaccess-up1-350x178.png" alt="htaccess Context Legend" title="htaccess-up1" width="350" height="178" /></a></p>
<ol><li><a href="http://httpd.apache.org/docs/trunk/mod/directive-dict.html">Terms Used to Describe Directives</a></li><li><a href="http://httpd.apache.org/docs/trunk/mod/directives.html">Official List of Apache Directives</a></li><li><a href="http://httpd.apache.org/docs/trunk/mod/quickreference.html">Directive Quick-Reference -- with Context</a></li></ol>
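<p>To illustrate how a directive applies to its own directory and everything below it, here is a small sketch using two hypothetical .htaccess files.  The directory names are invented, and it assumes the host's AllowOverride setting permits the <code>Options</code> directive.</p>
<pre>
# File 1: /.htaccess (site root)
# Directory listings are switched off for the whole site
Options -Indexes

# File 2: /downloads/.htaccess
# Overrides the inherited setting for /downloads/ and its sub-directories only
Options +Indexes
</pre>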
<hr class="C" />


<h3>Litespeed Htaccess support</h3>
<p>Unlike other lightweight web servers, <a href="http://www.litespeedtech.com/">LiteSpeed Web Server</a> fully supports Apache-compatible per-directory configuration overrides. With .htaccess you can change the configuration for any directory under the document root on-the-fly, which in most cases is a mandatory feature in a shared hosting environment.   It is worth noting that <em>enabling .htaccess support in LiteSpeed</em> Web Server will not degrade the server's performance, compared to Apache's 40% drop in performance. </p>







<h2>Main Server Config Examples</h2>
<p>Now let's take a look at some htaccess examples to get a feel for the syntax and a general idea of the capabilities.  Some of the best examples for .htaccess files are included with Apache for <a href="http://httpd.apache.org/docs/trunk/configuring.html">main server config</a> files, so let's take a quick look at a couple of them on our way down to the actual .htaccess examples further down the page (this site has thousands, take your time).  As you can see, the basic syntax is simple: a line starting with # is a comment, and everything else is a directive followed by its arguments.</p>
<p><strong><a href="http://uploads.askapache.com/2008/08/httpd-multilang-errordocconf.in">httpd-multilang-errordoc.conf</a></strong>: The configuration below implements multi-language error documents through content-negotiation</p>
<pre>
Options IncludesNoExec
AddOutputFilter Includes html
AddHandler type-map var
LanguagePriority en cs de es fr it ja ko nl pl pt-br ro sv tr
ForceLanguagePriority Prefer Fallback
ErrorDocument 401 /error/HTTP_UNAUTHORIZED.html.var
ErrorDocument 403 /error/HTTP_FORBIDDEN.html.var
ErrorDocument 404 /error/HTTP_NOT_FOUND.html.var
</pre>
<p><strong><a href="http://uploads.askapache.com/2008/08/httpd-manualconf.in">httpd-manual.conf</a></strong>: Provide local access to the server documentation on your server</p>
<pre>
SetEnvIf Request_URI ^/manual/(de|en|es|fr|ja|ko|pt-br|ru|tr)/ prefer-language=$1
RedirectMatch 301 ^/manual(?:/(de|en|es|fr|ja|ko|pt-br|ru|tr)){2,}(/.*)?$ /manual/$1$2
LanguagePriority en de es fr ja ko pt-br ru tr
ForceLanguagePriority Prefer Fallback
</pre>
<p><strong><a href="http://uploads.askapache.com/2008/08/httpd-languagesconf.in">httpd-languages.conf</a></strong>: Settings for hosting different languages.</p>
<pre>
DefaultLanguage en
AddLanguage ca .ca
# Just list the languages in decreasing order of preference.
LanguagePriority en ca cs da de el eo es et fr he hr it ja ko ltz nl nn no pl pt pt-BR ru sv tr zh-CN zh-TW
# Commonly used filename extensions to character sets.
AddCharset us-ascii.ascii .us-ascii
AddCharset ISO-8859-1  .iso8859-1  .latin1
</pre>
<p><strong><a href="http://uploads.askapache.com/2008/08/httpd-autoindexconf.in">httpd-autoindex.conf</a></strong>: Directives controlling the display of server-generated directory listings.</p>
<pre>
# IndexOptions: Controls the appearance of server-generated directory listings.
IndexOptions FancyIndexing HTMLTable VersionSort
# AddIcon* directives tell the server which icon to show for different files or filename extensions.
AddIconByEncoding (CMP,/icons/compressed.gif) x-compress x-gzip
AddIconByType (TXT,/icons/text.gif) text/*
AddIcon /icons/folder.gif ^^DIRECTORY^^
# DefaultIcon is which icon to show for files which do not have an icon explicitly set.
DefaultIcon /icons/unknown.gif
# AddDescription allows you to place a short description after a file in server-generated indexes.
AddDescription "GZIP compressed document" .gz
# ReadmeName is the name of the README file the server will look for by default, and append to directory listings.
ReadmeName README.html
# HeaderName is the name of a file which should be prepended to directory indexes.
HeaderName HEADER.html
</pre>
<p>Here are the rest of them if you wanna take a look.  (<a href="http://uploads.askapache.com/2008/08/httpd-mpmconf.in" title="Server-Pool Management (MPM specific)">httpd-mpm.conf</a>, <a href="http://uploads.askapache.com/2008/08/httpd-defaultconf.in" title="This configuration file reflects default settings for Apache HTTP Server">httpd-default.conf</a>, <a href="http://uploads.askapache.com/2008/08/httpd-sslconf.in" title="Contains the configuration directives to instruct the server how to serve pages over an https connection">httpd-ssl.conf</a>, <a href="http://uploads.askapache.com/2008/08/httpd-infoconf.in" title="Get information about the requests being processed by the server and the configuration of the server">httpd-info.conf</a>, <a href="http://uploads.askapache.com/2008/08/httpd-vhostsconf.in" title="If you want to maintain multiple domains/hostnames on your machine">httpd-vhosts.conf</a>, <a href="http://uploads.askapache.com/2008/08/httpd-davconf.in" title="Distributed authoring and versioning (WebDAV)">httpd-dav.conf</a>)</p>
<hr class="C" />







<h2>Example .htaccess Files</h2>
<p>Here are some samples and examples taken from different .htaccess files I've used over the years.  Specific solutions are farther down on this page and throughout the site.</p>
<pre>
# Set the Time Zone of your Server
SetEnv TZ America/Indianapolis
# ServerAdmin:  This address appears on some server-generated pages, such as error documents.
SetEnv SERVER_ADMIN webmaster@askapache.com
# Possible values for the Options directive are "None", "All", or any combination of:
#  Indexes Includes FollowSymLinks SymLinksIfOwnerMatch ExecCGI MultiViews
# (prefix either every option with + or -, or none of them)
Options -ExecCGI -MultiViews -Includes -Indexes +FollowSymLinks
# DirectoryIndex: sets the file that Apache will serve if a directory is requested.
DirectoryIndex index.html index.php /index.php
#
# Action lets you define media types that will execute a script whenever
# a matching file is called. This eliminates the need for repeated URL
# pathnames for oft-used CGI file processors.
# Format: Action media/type /cgi-script/location
# Format: Action handler-name /cgi-script/location
#
Action php-cgi /bin/php.cgi
#
# AddHandler allows you to map certain file extensions to "handlers":
# actions unrelated to filetype. These can be either built into the server
# or added with the Action directive (see above)
#
# To use CGI scripts outside of ScriptAliased directories:
# (You will also need to add "ExecCGI" to the "Options" directive.)
#
AddHandler php-cgi .php .inc
# AddDefaultCharset: charset added to text/plain and text/html responses
AddDefaultCharset UTF-8
# AddType allows you to add to or override the MIME configuration
AddType &#039;application/rdf+xml; charset=UTF-8&#039; .rdf
AddType &#039;application/xhtml+xml; charset=UTF-8&#039; .xhtml
AddType &#039;application/xhtml+xml; charset=UTF-8&#039; .xhtml.gz
AddType &#039;text/html; charset=UTF-8&#039; .html
AddType &#039;text/html; charset=UTF-8&#039; .html.gz
AddType application/octet-stream .rar .chm .bz2 .tgz .msi .pdf .exe
AddType application/vnd.ms-excel .csv
AddType application/x-httpd-php-source .phps
AddType application/x-pilot .prc .pdb
AddType application/x-shockwave-flash .swf
AddType application/xrds+xml .xrdf
AddType text/plain .ini .sh .bsh .bash .awk .nawk .gawk .csh .var .c .in .h .asc .md5 .sha .sha1
AddType video/x-flv .flv
# AddEncoding allows you to have certain browsers uncompress information on the fly. Note: Not all browsers support this.
AddEncoding x-compress .Z
AddEncoding x-gzip .gz .tgz
# DefaultType: the default MIME type the server will use for a document.
DefaultType text/html
#
# Optionally add a line containing the server version and virtual host
# name to server-generated pages (internal error documents, FTP directory
# listings, mod_status and mod_info output etc., but not CGI generated
# documents or custom error documents).
# Set to "EMail" to also include a mailto: link to the ServerAdmin.
# Set to one of:  On | Off | EMail
#
ServerSignature Off
</pre>
<pre>
## MAIN DEFAULTS
Options +ExecCGI -Indexes
DirectoryIndex index.html index.htm index.php
DefaultLanguage en-US
AddDefaultCharset UTF-8
ServerSignature Off
## ENVIRONMENT VARIABLES
SetEnv PHPRC /webroot/includes
SetEnv TZ America/Indianapolis
SetEnv SERVER_ADMIN webmaster@askapache.com
## MIME TYPES
AddType video/x-flv .flv
AddType application/x-shockwave-flash .swf
AddType image/x-icon .ico
## FORCE FILE TO DOWNLOAD INSTEAD OF APPEAR IN BROWSER
# http://www.htaccesselite.com/addtype-addhandler-action-vf6.html
AddType application/octet-stream .mov .mp3 .zip
## ERRORDOCUMENTS
# http://askapache.com/htaccess/apache-status-code-headers-errordocument.html
ErrorDocument 400 /e400/
ErrorDocument 401 /e401/
ErrorDocument 402 /e402/
ErrorDocument 403 /e403/
ErrorDocument 404 /e404/
#
# Handlers can be built in, included in a module, or added with the Action directive
# default-handler: default, handles static content (core)
#   send-as-is: Send file with HTTP headers (mod_asis)
#   cgi-script: treat file as CGI script (mod_cgi)
#    imap-file: Parse as an imagemap rule file (mod_imap)
#   server-info: Get server config info (mod_info)
#  server-status: Get server status report (mod_status)
#    type-map: type map file for content negotiation (mod_negotiation)
#  fastcgi-script: treat file as fastcgi script (mod_fastcgi)
#
# http://www.askapache.com/php/custom-phpini-tips-and-tricks.html
## PARSE AS CGI
AddHandler cgi-script .cgi .pl .spl
## THE PHP SETUPS BELOW ARE ALTERNATIVES, USE ONLY ONE
## RUN PHP AS APACHE MODULE
AddHandler application/x-httpd-php .php .htm
## RUN PHP AS CGI
AddHandler php-cgi .php .htm
## CGI PHP WRAPPER FOR CUSTOM PHP.INI
AddHandler phpini-cgi .php .htm
Action phpini-cgi /cgi-bin/php5-custom-ini.cgi
## FAST-CGI SETUP WITH PHP-CGI WRAPPER FOR CUSTOM PHP.INI
AddHandler fastcgi-script .fcgi
AddHandler php-cgi .php .htm
Action php-cgi /cgi-bin/php5-wrapper.fcgi
## CUSTOM PHP CGI BINARY SETUP
AddHandler php-cgi .php .htm
Action php-cgi /cgi-bin/php.cgi
## PROCESS SPECIFIC FILETYPES WITH CGI-SCRIPT
Action image/gif /cgi-bin/img-create.cgi
## CREATE CUSTOM HANDLER FOR SPECIFIC FILE EXTENSIONS
AddHandler custom-processor .ssp
Action custom-processor /cgi-bin/myprocessor.cgi
### HEADER CACHING
# http://www.askapache.com/htaccess/speed-up-sites-with-htaccess-caching.html
&lt;FilesMatch "\.(flv|gif|jpg|jpeg|png|ico)$"&gt;
Header set Cache-Control "max-age=2592000"
&lt;/FilesMatch&gt;
&lt;FilesMatch "\.(js|css|pdf|swf)$"&gt;
Header set Cache-Control "max-age=604800"
&lt;/FilesMatch&gt;
&lt;FilesMatch "\.(html|htm|txt)$"&gt;
Header set Cache-Control "max-age=600"
&lt;/FilesMatch&gt;
&lt;FilesMatch "\.(pl|php|cgi|spl|scgi|fcgi)$"&gt;
Header unset Cache-Control
&lt;/FilesMatch&gt;
## ALTERNATE EXPIRES CACHING
# htaccesselite.com/d/use-htaccess-to-speed-up-your-site-discussion-vt67.html
ExpiresActive On
ExpiresDefault A604800
ExpiresByType image/x-icon A2592000
ExpiresByType application/x-javascript A2592000
ExpiresByType text/css A2592000
ExpiresByType text/html A300
&lt;FilesMatch "\.(pl|php|cgi|spl|scgi|fcgi)$"&gt;
ExpiresActive Off
&lt;/FilesMatch&gt;
## META HTTP-EQUIV REPLACEMENTS
&lt;FilesMatch "\.(html|htm|php)$"&gt;
Header set imagetoolbar "no"
&lt;/FilesMatch&gt;
</pre>
<p>Here are some default MOD_REWRITE code examples.</p>
<pre>
## REWRITE DEFAULTS
RewriteEngine On
RewriteBase /
## REQUIRE SUBDOMAIN
RewriteCond %{HTTP_HOST} !^$
RewriteCond %{HTTP_HOST} !^subdomain\.askapache\.com$ [NC]
RewriteRule ^(.*)$ http://subdomain.askapache.com/$1 [L,R=301]
## SEO REWRITES
RewriteRule ^(.*)/ve/(.*)$ $1/voluntary-employee/$2 [L,R=301]
RewriteRule ^(.*)/hsa/(.*)$ $1/health-saving-account/$2 [L,R=301]
## WORDPRESS
# Skip existing files
RewriteCond %{REQUEST_FILENAME} !-f
# Skip existing directories
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
## ALTERNATIVE ANTI-HOTLINKING
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(subdomain\.)?askapache\.com/.*$ [NC]
RewriteRule ^.*\.(bmp|tif|gif|jpg|jpeg|jpe|png)$ - [F]
## REDIRECT HOTLINKERS
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(subdomain\.)?askapache\.com/.*$ [NC]
RewriteRule ^.*\.(bmp|tif|gif|jpg|jpeg|jpe|png)$ http://google.com [R]
## DENY REQUEST BASED ON REQUEST METHOD
RewriteCond %{REQUEST_METHOD} ^(TRACE|TRACK|OPTIONS|HEAD)$ [NC]
RewriteRule ^.*$ - [F]
## REDIRECT UPLOADS
RewriteCond %{REQUEST_METHOD} ^(PUT|POST)$ [NC]
RewriteRule ^(.*)$ /cgi-bin/form-upload-processor.cgi?p=$1 [L,QSA]
## REQUIRE SSL EVEN WHEN MOD_SSL IS NOT LOADED
RewriteCond %{HTTPS} !=on [NC]
RewriteRule ^.*$ https://%{SERVER_NAME}%{REQUEST_URI} [R,L]
### ALTERNATIVE TO USING ERRORDOCUMENT
# http://www.htaccesselite.com/d/htaccess-errordocument-examples-vt11.html
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^.*$ /error.php [L]
## SEO REDIRECTS
Redirect 301 /2006/oldfile.html http://subdomain.askapache.com/newfile.html
RedirectMatch 301 /o/(.*)$ http://subdomain.askapache.com/s/dl/$1
</pre>
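<p>One detail worth spelling out for RewriteRule patterns: in .htaccess (per-directory) context, the directory prefix, including the leading slash, is stripped before the pattern is matched.  A small sketch with made-up paths, assuming the file sits in the document root:</p>
<pre>
RewriteEngine On
# In /.htaccess a request for /old/page.html is matched as "old/page.html"
RewriteRule ^old/(.*)$ /new/$1 [R=301,L]
# In httpd.conf (server or virtual host context) the same rule would be written:
# RewriteRule ^/old/(.*)$ /new/$1 [R=301,L]
</pre>
<p>A pattern that starts with <code>^/</code> therefore never matches inside an .htaccess file.</p>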
<p>Examples of protecting your files and securing with password protection.</p>
<pre>
#
# Require (user|group|valid-user) (username|groupname)
#
## BASIC PASSWORD PROTECTION
AuthType basic
AuthName "prompt"
AuthUserFile /.htpasswd
AuthGroupFile /dev/null
Require valid-user
## ALLOW FROM IP OR VALID PASSWORD
Require valid-user
Allow from 192.168.1.23
Satisfy Any
## PROTECT FILES
&lt;FilesMatch "\.(htaccess|htpasswd|ini|phps|fla|psd|log|sh)$"&gt;
Order Allow,Deny
Deny from all
&lt;/FilesMatch&gt;
## PREVENT HOTLINKING
SetEnvIfNoCase Referer "^http://subdomain.askapache.com/" good
SetEnvIfNoCase Referer "^$" good
&lt;FilesMatch "\.(png|jpg|jpeg|gif|bmp|swf|flv)$"&gt;
Order Deny,Allow
Deny from all
Allow from env=good
# Use only ONE of the following; if both are present, the last one wins
ErrorDocument 403 http://www.google.com/intl/en_ALL/images/logo.gif
ErrorDocument 403 /images/you_bad_hotlinker.gif
&lt;/FilesMatch&gt;
## LIMIT UPLOAD FILE SIZE TO PROTECT AGAINST DOS ATTACK
#bytes, 0-2147483647(2GB)
LimitRequestBody 10240000
## MOST SECURE WAY TO REQUIRE SSL
# http://www.askapache.com/htaccess/apache-ssl-in-htaccess-examples.html
SSLOptions +StrictRequire
SSLRequireSSL
SSLRequire %{HTTP_HOST} eq "askapache.com"
ErrorDocument 403 https://askapache.com
## COMBINED DEVELOPER HTACCESS CODE-USE THIS
&lt;FilesMatch "\.(flv|gif|jpg|jpeg|png|ico|js|css|pdf|swf|html|htm|txt)$"&gt;
Header set Cache-Control "max-age=5"
&lt;/FilesMatch&gt;
AuthType basic
AuthName "Ooops! Temporarily Under Construction..."
AuthUserFile /.htpasswd
AuthGroupFile /dev/null
# password prompt for everyone else
Require valid-user
Order Deny,Allow
Deny from all
# Your, the developer's, IP address
Allow from 192.168.64.5
# css/xhtml check jigsaw.w3.org/css-validator/
Allow from w3.org
# Allows google to crawl your pages
Allow from googlebot.com
# no password required if host/ip is Allowed
Satisfy Any
## DONT HAVE TO EMPTY CACHE OR RELOAD TO SEE CHANGES
# If using mod_expires
ExpiresDefault A5
&lt;FilesMatch "\.(flv|gif|jpg|jpeg|png|ico|js|css|pdf|swf|html|htm|txt)$"&gt;
Header set Cache-Control "max-age=5"
&lt;/FilesMatch&gt;
## ALLOW ACCESS WITH PASSWORD OR NO PASSWORD FOR SPECIFIC IP/HOSTS
AuthType basic
AuthName "Ooops! Temporarily Under Construction..."
AuthUserFile /.htpasswd
AuthGroupFile /dev/null
# password prompt for everyone else
Require valid-user
Order Deny,Allow
Deny from all
# Your, the developer's, IP address
Allow from 192.168.64.5
# css/xhtml check jigsaw.w3.org/css-validator/
Allow from w3.org
# Allows google to crawl your pages
Allow from googlebot.com
# no password required if host/ip is Allowed
Satisfy Any
</pre>
<hr class="C" />






<h2>Example .htaccess Code Snippets</h2>
<p>Here are some specific examples.  This is the most popular section of this page, and it is updated frequently.</p>

<h4>Redirect Everyone Except IP address to alternate page</h4>
<pre>
ErrorDocument 403 http://www.yahoo.com/
Order deny,allow
Deny from all
Allow from 208.113.134.190
</pre>

<h4>When developing sites</h4>
<p>This lets Google crawl the page, lets me access it without a password, and lets my client access the page WITH a password.  It also allows for XHTML and CSS validation! (w3.org)</p>
<pre>
AuthName "Under Development"
AuthUserFile /home/sitename.com/.htpasswd
AuthType basic
Require valid-user
Order deny,allow
Deny from all
Allow from 208.113.134.190 w3.org htmlhelp.com googlebot.com
Satisfy Any
</pre>

<h4>Fix double-login prompt</h4>
<p>Redirect non-https requests to https server and ensure that <strong>.htpasswd authorization</strong> can only be entered across HTTPS</p>
<pre>
SSLOptions +StrictRequire
SSLRequireSSL
SSLRequire %{HTTP_HOST} eq "askapache.com"
ErrorDocument 403 https://askapache.com
</pre>

<h4>Set Timezone of the Server</h4>
<pre>
SetEnv TZ America/Indianapolis
</pre>

<h4>Administrator Email for ErrorDocument</h4>
<pre>
SetEnv SERVER_ADMIN webmaster@google.com
</pre>

<h4><code>ServerSignature</code> for <code>ErrorDocument</code></h4>
<pre>
# Valid values: Off | On | EMail
ServerSignature Off
</pre>

<h4>Charset and Language headers</h4>
<p>Article: <a href="/htaccess/setting-charset-in-htaccess.html">Setting Charset in htaccess</a>, and <a href="http://www.w3.org/International/questions/qa-htaccess-charset">article by <cite>Richard Ishida</cite></a></p>
<pre>
AddDefaultCharset UTF-8
DefaultLanguage en-US
</pre>

<h4>Disallow Script Execution</h4>
<pre>
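# These extensions are mapped to the CGI handler, but with ExecCGI
# turned off Apache refuses to run them and answers 403 Forbidden
Options -ExecCGI
AddHandler cgi-script .php .pl .py .jsp .asp .htm .shtml .sh .cgi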
</pre>

<h4>Deny Request Methods</h4>
<pre>
RewriteCond %{REQUEST_METHOD} !^(GET|HEAD|OPTIONS|POST|PUT)$
RewriteRule .* - [F]
</pre>

<h4>Force "File Save As" Prompt</h4>
<pre>
AddType application/octet-stream .avi .mpg .mov .pdf .xls .mp4
</pre>

<h4>Show CGI Source Code</h4>
<pre>
RemoveHandler cgi-script .pl .py .cgi
AddType text/plain .pl .py .cgi
</pre>

<h4>Serve all .pdf files on your site using .htaccess and mod_rewrite with the php script.</h4>
<pre>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(.+)\.pdf$  /cgi-bin/pdf.php?file=$1 [L,NC,QSA]
</pre>

<h4>Rewrite to www</h4>
<pre>
RewriteCond %{REQUEST_URI} !^/(robots\.txt|favicon\.ico|sitemap\.xml)$
RewriteCond %{HTTP_HOST} !^www\.askapache\.com$ [NC]
RewriteRule ^(.*)$ http://www.askapache.com/$1 [R=301,L]
</pre>

<h4>Rewrite to www dynamically</h4>
<pre>
RewriteCond %{REQUEST_URI} !^/robots\.txt$ [NC]
RewriteCond %{HTTP_HOST} !^www\.[a-z-]+\.[a-z]{2,6} [NC]
RewriteCond %{HTTP_HOST} ([a-z-]+\.[a-z]{2,6})$   [NC]
RewriteRule ^(.*)$ http://www.%1/$1 [R=301,L]
</pre>

<h4>301 Redirect Old File</h4>
<pre>
Redirect 301 /old/file.html http://www.askapache.com/new/file.html
</pre>

<h4>301 Redirect Entire Directory</h4>
<pre>
RedirectMatch 301 ^/blog/(.*)$ http://www.askapache.com/$1
</pre>

<h4>Protecting your php.cgi</h4>
<pre>
&lt;FilesMatch "^php5?\.(ini|cgi)$"&gt;
Order Deny,Allow
Deny from All
Allow from env=REDIRECT_STATUS
&lt;/FilesMatch&gt;
</pre>

<h4>Set Cookie based on Request</h4>
<p>This code sends the <code>Set-Cookie</code> header to create a cookie on the client, using the value matched by the 2nd set of parentheses.</p>
<pre>
RewriteEngine On
RewriteBase /
RewriteRule ^(.*)(de|es|fr|it|ja|ru|en)/$ - [co=lang:$2:.askapache.com:7200:/]
</pre>
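
<p>To read that cookie back on a later request, one possible sketch tests <code>%{HTTP_COOKIE}</code> in a <code>RewriteCond</code>. Redirecting the site root to the language directory is a made-up use case for illustration only; <code>%2</code> refers to the second group of the matched condition.</p>
<pre>
RewriteEngine On
RewriteBase /
# Look for a "lang" cookie holding one of the known language codes
RewriteCond %{HTTP_COOKIE} (^|;\s*)lang=(de|es|fr|it|ja|ru|en)(;|$) [NC]
# Hypothetical: send visitors of the root URL to their language directory
RewriteRule ^$ /%2/ [R=302,L]
</pre>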

<h4>Set Cookie with env variable</h4>
<pre>
Header set Set-Cookie "language=%{lang}e; path=/;" env=lang
</pre>
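
<p>The <code>env=lang</code> condition only fires if something has set the <code>lang</code> variable first. A minimal sketch, assuming the language is taken from the first URL segment (that source is an assumption, any mechanism that sets <code>lang</code> would do):</p>
<pre>
# Capture the language code from URLs like /de/page.html into the env variable "lang"
SetEnvIf Request_URI "^/(de|es|fr|it|ja|ru|en)/" lang=$1
# Sent only when "lang" is set for this request
Header set Set-Cookie "language=%{lang}e; path=/;" env=lang
</pre>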

<h4>Custom ErrorDocuments</h4>
<pre>
ErrorDocument 100 /100_CONTINUE
ErrorDocument 101 /101_SWITCHING_PROTOCOLS
ErrorDocument 102 /102_PROCESSING
ErrorDocument 200 /200_OK
ErrorDocument 201 /201_CREATED
ErrorDocument 202 /202_ACCEPTED
ErrorDocument 203 /203_NON_AUTHORITATIVE
ErrorDocument 204 /204_NO_CONTENT
ErrorDocument 205 /205_RESET_CONTENT
ErrorDocument 206 /206_PARTIAL_CONTENT
ErrorDocument 207 /207_MULTI_STATUS
ErrorDocument 300 /300_MULTIPLE_CHOICES
ErrorDocument 301 /301_MOVED_PERMANENTLY
ErrorDocument 302 /302_MOVED_TEMPORARILY
ErrorDocument 303 /303_SEE_OTHER
ErrorDocument 304 /304_NOT_MODIFIED
ErrorDocument 305 /305_USE_PROXY
ErrorDocument 307 /307_TEMPORARY_REDIRECT
ErrorDocument 400 /400_BAD_REQUEST
ErrorDocument 401 /401_UNAUTHORIZED
ErrorDocument 402 /402_PAYMENT_REQUIRED
ErrorDocument 403 /403_FORBIDDEN
ErrorDocument 404 /404_NOT_FOUND
ErrorDocument 405 /405_METHOD_NOT_ALLOWED
ErrorDocument 406 /406_NOT_ACCEPTABLE
ErrorDocument 407 /407_PROXY_AUTHENTICATION_REQUIRED
ErrorDocument 408 /408_REQUEST_TIME_OUT
ErrorDocument 409 /409_CONFLICT
ErrorDocument 410 /410_GONE
ErrorDocument 411 /411_LENGTH_REQUIRED
ErrorDocument 412 /412_PRECONDITION_FAILED
ErrorDocument 413 /413_REQUEST_ENTITY_TOO_LARGE
ErrorDocument 414 /414_REQUEST_URI_TOO_LARGE
ErrorDocument 415 /415_UNSUPPORTED_MEDIA_TYPE
ErrorDocument 416 /416_RANGE_NOT_SATISFIABLE
ErrorDocument 417 /417_EXPECTATION_FAILED
ErrorDocument 422 /422_UNPROCESSABLE_ENTITY
ErrorDocument 423 /423_LOCKED
ErrorDocument 424 /424_FAILED_DEPENDENCY
ErrorDocument 426 /426_UPGRADE_REQUIRED
ErrorDocument 500 /500_INTERNAL_SERVER_ERROR
ErrorDocument 501 /501_NOT_IMPLEMENTED
ErrorDocument 502 /502_BAD_GATEWAY
ErrorDocument 503 /503_SERVICE_UNAVAILABLE
ErrorDocument 504 /504_GATEWAY_TIME_OUT
ErrorDocument 505 /505_VERSION_NOT_SUPPORTED
ErrorDocument 506 /506_VARIANT_ALSO_VARIES
ErrorDocument 507 /507_INSUFFICIENT_STORAGE
ErrorDocument 510 /510_NOT_EXTENDED
</pre>
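
<p>Besides a local URL-path, <code>ErrorDocument</code> also accepts a quoted text message or an external URL. The paths and message below are placeholders, not files that exist on this site. Note that an external URL makes Apache send a redirect, so the client does not receive the original error status code, and it cannot be used for 401.</p>
<pre>
# Local URL-path (hypothetical file)
ErrorDocument 404 /errors/404.html
# Plain text message
ErrorDocument 403 "Sorry, access to this resource is forbidden."
# External URL (hypothetical path): the client is redirected
ErrorDocument 500 http://www.askapache.com/errors/500.html
</pre>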

<h4>Implementing a Caching Scheme with .htaccess</h4>
<pre>
# year
&lt;FilesMatch "\.(ico|pdf|flv|jpg|jpeg|png|gif|swf|mp3|mp4)$"&gt;
Header set Cache-Control "public"
Header set Expires "Thu, 15 Apr 2010 20:00:00 GMT"
Header unset Last-Modified
&lt;/FilesMatch&gt;
#2 hours
&lt;FilesMatch "\.(html|htm|xml|txt|xsl)$"&gt;
Header set Cache-Control "max-age=7200, must-revalidate"
&lt;/FilesMatch&gt;
&lt;FilesMatch "\.(js|css)$"&gt;
SetOutputFilter DEFLATE
Header set Expires "Thu, 15 Apr 2010 20:00:00 GMT"
&lt;/FilesMatch&gt;
</pre>
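
<p>A hard-coded <code>Expires</code> date has to be updated by hand once it passes. As a sketch of an alternative, assuming mod_expires is available, the same two lifetimes can be expressed relative to the time of the request; mod_expires then generates both the <code>Expires</code> header and a matching <code>Cache-Control: max-age</code>.</p>
<pre>
&lt;IfModule mod_expires.c&gt;
ExpiresActive On
# 1 year
&lt;FilesMatch "\.(ico|pdf|flv|jpg|jpeg|png|gif|swf|mp3|mp4)$"&gt;
ExpiresDefault "access plus 1 year"
&lt;/FilesMatch&gt;
# 2 hours
&lt;FilesMatch "\.(html|htm|xml|txt|xsl)$"&gt;
ExpiresDefault "access plus 2 hours"
&lt;/FilesMatch&gt;
&lt;/IfModule&gt;
</pre>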

<h4>Password Protect single file</h4>
<pre>
&lt;Files login.php&gt;
AuthName "Prompt"
AuthType Basic
AuthUserFile /home/askapache.com/.htpasswd
Require valid-user
&lt;/Files&gt;
</pre>

<h4>Password Protect multiple files</h4>
<pre>
&lt;FilesMatch "^(private|phpinfo)\..*$"&gt;
AuthName "Development"
AuthUserFile /.htpasswd
AuthType basic
Require valid-user
&lt;/FilesMatch&gt;
</pre>
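
<p>Password protection can also be combined with an IP check. The sketch below uses Apache 2.2 syntax and a placeholder address (<code>192.0.2.10</code> is from the documentation range, not a real host): requests from that address skip the prompt, everyone else must log in.</p>
<pre>
&lt;Files login.php&gt;
AuthName "Prompt"
AuthType Basic
AuthUserFile /home/askapache.com/.htpasswd
Require valid-user
Order Deny,Allow
Deny from all
# Placeholder IP that is let in without a password
Allow from 192.0.2.10
# Either the IP check OR a valid login is enough
Satisfy Any
&lt;/Files&gt;
</pre>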

<h4>Send Custom Headers</h4>
<pre>
Header set P3P "policyref=\"http://www.askapache.com/w3c/p3p.xml\""
Header set X-Pingback "http://www.askapache.com/xmlrpc.php"
Header set Content-Language "en-US"
Header set Vary "Accept-Encoding"
</pre>

<h4>Blocking based on User-Agent Header</h4>
<pre>
SetEnvIfNoCase ^User-Agent$ .*(craftbot|download|extract|stripper|sucker|ninja|clshttp|webspider|leacher|collector|grabber|webpictures) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(libwww-perl|aesop_com_spiderman) HTTP_SAFE_BADBOT
Deny from env=HTTP_SAFE_BADBOT
</pre>

<h4>Blocking with RewriteCond</h4>
<pre>
RewriteCond %{HTTP_USER_AGENT} ^.*(craftbot|download|extract|stripper|sucker|ninja|clshttp|webspider|leacher|collector|grabber|webpictures).*$ [NC]
RewriteRule . - [F,L]
</pre>

<h4>.htaccess for mod_php</h4>
<pre>
SetEnv PHPRC /location/todir/containing/phpinifile
</pre>

<h4>.htaccess for php as cgi</h4>
<pre>
AddHandler php-cgi .php .htm
Action php-cgi /cgi-bin/php5.cgi
</pre>

<h4>Shell wrapper for custom php.ini</h4>
<pre>
#!/bin/sh
export PHP_FCGI_CHILDREN=3
exec php5.cgi -c /abs/php5/php.ini
</pre>

<h4>Add values from HTTP Headers</h4>
<pre>
SetEnvIfNoCase ^If-Modified-Since$ "(.+)" HTTP_IF_MODIFIED_SINCE=$1
SetEnvIfNoCase ^If-None-Match$ "(.+)" HTTP_IF_NONE_MATCH=$1
SetEnvIfNoCase ^Cache-Control$ "(.+)" HTTP_CACHE_CONTROL=$1
SetEnvIfNoCase ^Connection$ "(.+)" HTTP_CONNECTION=$1
SetEnvIfNoCase ^Keep-Alive$ "(.+)" HTTP_KEEP_ALIVE=$1
SetEnvIfNoCase ^Authorization$ "(.+)" HTTP_AUTHORIZATION=$1
SetEnvIfNoCase ^Cookie$ "(.+)" HTTP_MY_COOKIE=$1
</pre>
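
<p>Once a header value has been copied into an environment variable, other modules can use it. As a rough illustration, assuming mod_headers is loaded, the copied value can be echoed back in a response header for debugging; the header name <code>X-Debug-If-None-Match</code> is invented for this example.</p>
<pre>
SetEnvIfNoCase ^If-None-Match$ "(.+)" HTTP_IF_NONE_MATCH=$1
# Only sent when the request actually carried an If-None-Match header
Header set X-Debug-If-None-Match "%{HTTP_IF_NONE_MATCH}e" env=HTTP_IF_NONE_MATCH
</pre>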

<h4>Stop hotlinking</h4>
<pre>
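RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?askapache\.com/.*$ [NC]
# Do not redirect the replacement image itself, or hotlinkers end up in a redirect loop
RewriteCond %{REQUEST_URI} !^/feed\.gif$
RewriteRule \.(gif|jpg|swf|flv|png)$ http://www.askapache.com/feed.gif [R=302,L]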
</pre>
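
<p>If a replacement image is not needed, a simpler variant is to refuse the request outright. This sketch uses the same referer test but answers with 403 Forbidden, and also accepts https referers:</p>
<pre>
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?askapache\.com/ [NC]
# "-" means no substitution, F sends 403 Forbidden
RewriteRule \.(gif|jpg|swf|flv|png)$ - [F,NC]
</pre>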

<h4>Turn logging off for IP</h4>
<pre>
SecFilterSelective REMOTE_ADDR "208\.113\.183\.103" "nolog,noauditlog,pass"
</pre>

<h4>Turn logging on for IP</h4>
<pre>
SecFilterSelective REMOTE_ADDR "!^208\.113\.183\.103" "nolog,noauditlog,pass"
SecFilterSelective REMOTE_ADDR "208\.113\.183\.103" "log,auditlog,pass"
</pre>
<hr class="C" />






<h2>Advanced Mod_Rewrite Examples</h2>






<h2>Best .htaccess Articles</h2>

<h3><a title="Apache HTTP Web Server htaccess tips and tricks" rel="chapter" href="http://www.askapache.com/htaccess/htaccess-for-webmasters.html">.htaccess for Webmasters</a></h3><ul><li><a title="htaccess trick to run requests through a cgi script" href="/htaccess/htaccess-for-webmasters.html#process-file-through-cgi">Process certain requests for files using a cgi script</a></li><li><a title="htaccess security for apache hacking" href="/htaccess/htaccess-for-webmasters.html#process-request-methods-with-script">Process Requests with certain Request Methods</a></li><li><a title="Apache ForceType Directive in htaccess process file" href="/htaccess/htaccess-for-webmasters.html#force-filetype-with-forcetype">Make any file be a certain filetype</a></li><li><a title="Using the IfModule Directive in Apache htaccess files." href="/htaccess/htaccess-for-webmasters.html#ifmodule-in-apache">Use IfModule directive for robust code</a></li></ul>

<h3><a title="mod_rewrite RewriteRule, RewriteCond help" rel="chapter" href="http://www.askapache.com/htaccess/mod_rewrite-tips-and-tricks.html">Mod_Rewrite URL Rewriting</a></h3><p><a class="hs hs13" href="/htaccess/crazy-advanced-mod_rewrite-tutorial.html#decoded"></a>Undocumented techniques and methods will allow you to utilize mod_rewrite at an  "expert level" by showing you how to <a href="/htaccess/crazy-advanced-mod_rewrite-tutorial.html#decoded">unlock its secrets</a>.</p><ul><li><a title="Search query string at QUERY_STRING" href="/htaccess/mod_rewrite-tips-and-tricks.html#check-for-key-in-query-string">Check for a key in QUERY_STRING</a></li><li><a title="Deny access using htaccess during certain time" href="/htaccess/mod_rewrite-tips-and-tricks.html#time-based-access">Block access to files during certain hours of the day</a></li><li><a title="Change underscores to hyphens for SEO URL" href="/htaccess/mod_rewrite-tips-and-tricks.html#convert-underscore-hyphen">Rewrite underscores to hyphens for SEO URL</a></li><li><a title="Rewriting WordPress RSS feeds to Feedburner in SEO friendly method" href="/htaccess/mod_rewrite-tips-and-tricks.html#redirect-wordpress-feed">Redirecting WordPress Feeds to Feedburner</a></li></ul>

<h3><a title="301 Redirects" rel="chapter" href="http://www.askapache.com/htaccess/seo-search-engine-friendly-redirects-without-mod_rewrite.html">301 Redirects without mod_rewrite</a></h3><ul><li><a title="301 Redirect single file" href="/htaccess/seo-search-engine-friendly-redirects-without-mod_rewrite.html#seo-301-redirect-single-file">Redirect single url</a></li><li><a title="301 Redirect new domain" href="/htaccess/seo-search-engine-friendly-redirects-without-mod_rewrite.html#seo-301-redirect-new-domain">Redirect to new Domain</a></li></ul>

<h3><a href="/htaccess/php-cgi-redirect_status.html">Secure PHP with .htaccess</a></h3>
<p><a class="IFL" title="Locking down your php.ini and php cgi with .htaccess" href="/htaccess/php-cgi-redirect_status.html"><img src="http://uploads.askapache.com/2008/01/jail-bars-1.png" alt="Locking down your php.ini and php cgi with .htaccess" title="jail bars 1 htaccess" /></a>If you have a php.cgi or php.ini file in your /cgi-bin/ directory or other pub directory, try requesting them from your web browser.  If your php.ini shows up or worse you are able to execute your php cgi, you'll need to secure it ASAP.  This shows several ways to secure these files, and other interpreters like perl, fastCGI, bash, csh, etc.<br class="C" /></p>

<h3><a href="/htaccess/htaccess-fresh.html">.htaccess Cookie Manipulation</a></h3><p><a class="IFL" title="Cookie Manipulation in .htaccess with RewriteRule" href="/htaccess/htaccess-fresh.html"><img src="http://uploads.askapache.com/2007/10/cookies.png" alt="Cookie Manipulation in .htaccess with RewriteRule" title="cookies htaccess" /></a><strong>Fresh <a href="/htaccess/htaccess.html">.htaccess</a> code</strong> for you!  Check out the Cookie Manipulation and environment variable usage with mod_rewrite!  I also included a couple Mod_Security .htaccess examples. <strong>Enjoy!</strong><br class="C" /></p><ul><li><a href="/htaccess/htaccess-fresh.html#modrewrite1">Mod_Rewrite .htaccess Examples</a></li><li><a href="/htaccess/htaccess-fresh.html#modrewrite2">Cookie Manipulation and Tests with mod_rewrite</a></li><li><a href="/htaccess/htaccess-fresh.html#modrewrite3">Setting Environment Variables</a></li><li><a href="/htaccess/htaccess-fresh.html#modrewrite4">Using the Environment Variable</a></li><li><a href="/htaccess/htaccess-fresh.html#modrewrite5">Mod_Security .htaccess Examples</a></li></ul>

<h3><a title="htaccess Caching" rel="chapter" href="http://www.askapache.com/htaccess/speed-up-your-site-with-caching-and-cache-control.html">.htaccess Caching</a></h3><ul><li><a href="/htaccess/speed-up-sites-with-htaccess-caching.html">Speed Up Sites with htaccess Caching</a></li><li><a title="htaccess time cheatsheet" href="/htaccess/speed-up-your-site-with-caching-and-cache-control.html#htaccess-time-cheatsheet">htaccess time cheat sheet</a></li></ul>

<h3><a title="401, 403 htpasswd authentication" rel="chapter" href="http://www.askapache.com/htaccess/apache-authentication-in-htaccess.html">Password Protection and Authentication</a></h3><ul><li><a title="Requiring a password for single file" href="/htaccess/apache-authentication-in-htaccess.html#require-password-for-single-file">Require password for single file</a></li><li><a title="A comprehensive default Apache .htaccess example file" href="/htaccess/apache-authentication-in-htaccess.html#skeleton-htaccess">Example .htaccess file for password protection</a></li></ul>

<h3><a title="Creating and using HTTP Headers with htaccess" rel="chapter" href="http://www.askapache.com/htaccess/using-http-headers-with-htaccess.html">Control HTTP Headers</a></h3><ul><li><a title="Prevent Browsers and Proxies from caching" href="/htaccess/using-http-headers-with-htaccess.html#prevent-caching-with-htaccess">Prevent Caching 100%</a></li><li><a title="Remove Internet Explorer imagetoolbar" href="/htaccess/using-http-headers-with-htaccess.html#remove-ie-imagetoolbar">Remove IE imagetoolbar without meta tag</a></li><li><a title="How To use Apache to send P3P Privacy Header for website" href="/htaccess/using-http-headers-with-htaccess.html#privacy-p3p-header-in-apache">Add Privacy (P3P) Header to your site</a></li><li><a title="Language header, Charset header without meta" href="/htaccess/using-http-headers-with-htaccess.html#language-and-content-header-in-htaccess">Add language and charset headers without meta tags</a></li></ul>

<h3><a href="/htaccess/blocking-bad-bots-and-scrapers-with-htaccess.html">Blocking Spam and bad Bots</a></h3><p><a class="IFR" href="/htaccess/blocking-bad-bots-and-scrapers-with-htaccess.html"><img title="Block Bad Robot" src="http://uploads.askapache.com/2008/04/bad_robot1.png" alt="Block Bad Robot" height="100" /></a>Want to block a bad robot or web scraper using .htaccess files?  Here are 2 methods that illustrate blocking 436 various user-agents.  You can block them using either SetEnvIf methods, or by using Rewrite Blocks.<br class="C" /></p>

<h3><a title="php htaccess tips, htaccess php tricks" rel="chapter" href="http://www.askapache.com/htaccess/php-htaccess-tips-and-tricks.html">PHP htaccess tips</a></h3><p>By using some cool .htaccess tricks we can control PHP to be run as a cgi or a module.  If php is run as a cgi then we need to compile it ourselves or use .htaccess to force php to use a local php.ini file.  If it is running as a module then we can use various directives supplied by that module in .htaccess.</p><ul><li><a title=".htaccess for php as cgi" href="/htaccess/php-htaccess-tips-and-tricks.html#php-run-as-cgi">When php run as CGI</a></li><li><a title="custom php.ini with Apache htaccess using PHPRC" href="/htaccess/php-htaccess-tips-and-tricks.html#htaccess-php-ini">Use a custom php.ini with mod_php or php as a cgi</a></li><li><a title="htaccess tips for mod_php php running as Apache module" href="/htaccess/php-htaccess-tips-and-tricks.html#sub-mod_php">When php run as Apache Module (mod_php)</a></li><li><a title="Apache FastCGI wrapper for php cgi" href="/htaccess/php-htaccess-tips-and-tricks.html#php-and-fastcgi-in-htaccess">When cgi php is run with wrapper (FastCGI)</a></li></ul>

<h3><a href="/htaccess/http-https-rewriterule-redirect.html">HTTP to HTTPS Redirects with mod_rewrite</a></h3><p><a href="/htaccess/http-https-rewriterule-redirect.html"><img class="IFL" src="http://uploads.askapache.com/2007/11/security.png" alt="HTTP to HTTPS Redirects with mod_rewrite" title="security htaccess" /></a>This is freaking sweet if you use SSL I promise you!  Basically instead of having to check for HTTPS using a <code>RewriteCond %{HTTPS} =on</code> for every redirect that can be either HTTP or HTTPS, I set an environment variable once with the value "http" or "https" if HTTP or HTTPS is being used for that request, and use that env variable in the RewriteRule.<br class="C" /></p>

<h3><a title="Apache SSL examples" rel="chapter" href="http://www.askapache.com/htaccess/ssl-example-usage-in-htaccess.html">SSL in .htaccess</a></h3><ul><li><a title="Redirecting non-SSL to SSL in Apache" href="/htaccess/ssl-example-usage-in-htaccess.html#redirect-http-to-https">Redirect non-https requests to https server</a></li><li><a title="redirect HTTP to HTTPS without mod_ssl!" href="/htaccess/ssl-example-usage-in-htaccess.html#rewrite-http-to-https-no-mod_ssl">Rewrite non-https to HTTPS without mod_ssl!</a></li><li><a title="Redirect HTTP to HTTPS by port" href="/htaccess/ssl-example-usage-in-htaccess.html#redirect-port-80-to-https">Redirect everything served on port 80 to HTTPS URI</a></li></ul>

<h3><a title="Conditionally setting variables in Apache .htaccess" rel="chapter" href="http://www.askapache.com/htaccess/setenvif.html">SetEnvIf and SetEnvIfNoCase in .htaccess</a></h3><ul><li><a title="Unique mod_setenvif Variables" href="/htaccess/setenvif.html#setenvif-variables">Unique mod_setenvif Variables</a></li><li><a title="Populates HTTP_MY_ Variables with mod_setenvif variable values" href="/htaccess/setenvif.html#http-headers">Populates HTTP_MY_ Variables with mod_setenvif variable values</a></li><li><a title="Allows only if HOST Header is present in request" href="/htaccess/setenvif.html#allow-host">Allows only if HOST Header is present in request</a></li><li><a title="Add values from HTTP Headers" href="/htaccess/setenvif.html#header-copy">Add values from HTTP Headers</a></li></ul>

<h3><a title="htaccess security and hacking" rel="chapter" href="http://www.askapache.com/htaccess/security-with-htaccess.html">Site Security with .htaccess</a></h3>
<p>chmod .htpasswd files 640, chmod .htaccess 644, php files 600, and chmod files that you really don't want people to see as 400. (NEVER chmod 777, try 766)</p>
<ul><li><a title="CHMOD .htaccess, chmod .htpasswd, chmodding files" href="/htaccess/security-with-htaccess.html#chmod-htaccess-info">CHMOD your files</a></li><li><a title="Deny access for htaccess/htpasswd file" href="/htaccess/security-with-htaccess.html#deny-htaccess-htpasswd-access">Prevent access to .htaccess and .htpasswd files</a></li><li><a title="Show source code in browser, prevent executing file" href="/htaccess/security-with-htaccess.html#show-source-code">Show Source Code instead of executing</a></li><li><a title="Remove execution privileges" href="/htaccess/security-with-htaccess.html#securing-directories-with-htaccess">Securing directories: Remove ability to execute scripts</a></li><li><a title="ErrorDocument usage in htaccess files" href="/htaccess/security-with-htaccess.html#errordocument-usage-in-htaccess">.htaccess ErrorDocuments</a></li></ul>

<h3><a title="mod_security Guide and sample mod_Security directive usage in .htaccess" rel="chapter" href="http://www.askapache.com/htaccess/mod_security-htaccess-tricks.html">.htaccess Security with MOD_SECURITY</a></h3><ul><li><a href="/htaccess/mod_security-htaccess-tricks.html#mod_security-mod_rewrite">mod_security + mod_rewrite</a></li><li><a href="/htaccess/mod_security-htaccess-tricks.html#block-post-spam">Block Spam by examining POST form fields</a></li><li><a href="/htaccess/mod_security-htaccess-tricks.html#disable-mod_security">Disabling mod_security conditionally per IP</a></li><li><a href="/htaccess/mod_security-htaccess-tricks.html#mod_security-authorization">Disabling mod_security with .htaccess Authorization</a></li><li><a href="/htaccess/mod_security-htaccess-tricks.html#block-wordpress-spam">Block WordPress Spam Forever!</a></li><li><a href="/htaccess/mod_security-htaccess-tricks.html#pause-connections">Force Any Connections to be Paused a set number of ms</a></li><li><a href="/htaccess/mod_security-htaccess-tricks.html#mod_security-debugging">ModSecurity Debugging and Logging</a></li><li><a href="/htaccess/mod_security-htaccess-tricks.html#conditional-logging">Turn Off/On Logging JUST for your IP Address</a></li><li><a href="/htaccess/mod_security-htaccess-tricks.html#mod_security-directives">Mod_Security Directives for DreamHost</a></li><li><a href="/htaccess/mod_security-htaccess-tricks.html#httpdconf-rules">Example httpd.conf mod_security rule files</a></li></ul>
<hr class="C" />





<h3>Merging Notes</h3>
<p>The order of merging is:</p>
<ol>
<li><code>&lt;Directory&gt;</code> (except regular expressions) and .htaccess done simultaneously (with .htaccess, if allowed, overriding <code>&lt;Directory&gt;</code>)</li>
<li><code>&lt;DirectoryMatch&gt;</code> (and <code>&lt;Directory ~&gt;</code>)</li>
<li><code>&lt;Files&gt;</code> and <code>&lt;FilesMatch&gt;</code> done simultaneously</li>
<li><code>&lt;Location&gt;</code> and <code>&lt;LocationMatch&gt;</code> done simultaneously</li>
</ol>
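<p>Below is an artificial example to show the order of merging. Sections inside <code>&lt;VirtualHost&gt;</code> are applied after the corresponding sections outside the virtual host definition, which is why <code>A</code> comes before <code>B</code>. Assuming they all apply to the request, the directives in this example will be applied in the order:</p>
<p><code>A &gt; B &gt; C &gt; D &gt; E</code></p>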
<pre>
&lt;Location /&gt;
E
&lt;/Location&gt;
&lt;Files askapache.txt&gt;
D
&lt;/Files&gt;
&lt;VirtualHost *&gt;
&lt;Directory /a/b&gt;
B
&lt;/Directory&gt;
&lt;/VirtualHost&gt;
&lt;DirectoryMatch "^.*b$"&gt;
C
&lt;/DirectoryMatch&gt;
&lt;Directory /a/b&gt;
A
&lt;/Directory&gt;
</pre>
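
<p>The Apache 2.2 documentation on configuration sections gives a more concrete case of why this order matters. Because <code>&lt;Location&gt;</code> is merged last, the restriction in the <code>&lt;Directory&gt;</code> section below never takes effect. These sections belong in the main server configuration, not in a .htaccess file.</p>
<pre>
&lt;Location /&gt;
Order deny,allow
Allow from all
&lt;/Location&gt;

# This &lt;Directory&gt; section will have no effect
&lt;Directory /&gt;
Order allow,deny
Allow from all
Deny from badguy.example.com
&lt;/Directory&gt;
</pre>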





<h2>My Favorite .htaccess Links</h2>
<p class="anote">These are just some of my favorite <a href="http://www.google.com/Top/Computers/Internet/Web_Design_and_Development/Authoring/FAQs,_Help,_and_Tutorials/Access_Control/">.htaccess resources</a>.  I'm really into doing your own hacking to get knowledge and these links are all great resources in that respect.  I'm really interested in new or unusual htaccess solutions or htaccess hacks using .htaccess files, so let me know if you find one.</p>
<p><strong>NCSA HTTPd Tutorials</strong><br /></p>
<p><strong>Robert Hansen</strong><br />Here's a great <a href="http://www.securityfocus.com/infocus/1368">Hardening HTAccess part 1</a>, <a href="http://www.securityfocus.com/infocus/1369">part 2</a>, <a href="http://www.securityfocus.com/infocus/1370">part 3</a> article that goes into detail about some of the rarer security applications for .htaccess files.</p>
<p><strong>SAMAXES</strong><br />Some very detailed and helpful .htaccess articles, such as the <a href="http://www.samaxes.com/2008/04/20/htaccess-gzip-and-cache-your-site-for-faster-loading-and-bandwidth-saving/">".htaccess - gzip and cache your site for faster loading and bandwidth saving."</a></p>
<p><strong>PerishablePress</strong><br /><a href="http://perishablepress.com/press/2006/01/10/stupid-htaccess-tricks/">Stupid .htaccess tricks</a> is probably the <strong>best explanation online</strong> for many of the best .htaccess solutions, including many from this page. Unlike me they are fantastic writers, and even their technical stuff is very readable, so it's a good blog to kick back and read.  They also have a <a title="Eight Ways to Blacklist with Apache's mod_rewrite" href="http://perishablepress.com/press/2009/02/03/eight-ways-to-blacklist-with-apaches-mod_rewrite/">fantastic article</a> detailing how to block/deny specific requests using mod_rewrite.</p>
<p><strong>BlogSecurity</strong><br />Mostly a site for... blog security (which is really any web-app security), this blog has a few really impressive articles full of solid information, such as <a href="http://blogsecurity.net/wordpress/article-210607/">Hardening WordPress with .htaccess</a>, along with more advanced topics that can be challenging but effective.  Their feed is worth subscribing to; they publish plugin exploits and WordPress core vulnerabilities quite often.</p>
<p><strong>Check-These</strong><br />Oldschool security/unix dude with some incredibly detailed mod_rewrite tutorials, helped me the most when I first got into this, and a great guy too. See: <a href="http://check-these.info/mod_rewrite-basic.html">Basic Mod_Rewrite Guide</a>, and <a href="http://check-these.info/RewriteRule.html">Advanced Mod_Rewrite Tutorial</a></p>
<p><strong>Reaper-X</strong><br />A lot of .htaccess tutorials and code.  See: <a href="http://www.reaper-x.com/2007/09/01/hardening-wordpress-with-mod-rewrite-and-htaccess/">Hardening WordPress with Mod Rewrite and htaccess</a></p>
<p><strong>jdMorgan</strong><br /><a href="http://www.webmasterworld.com/profilev4.cgi?action=view&amp;member=jdMorgan">jdMorgan</a> is the Moderator of the <a href="http://www.webmasterworld.com/apache/">Apache Forum</a> at WebmasterWorld, a great place for answers.  In my experience he can answer any tough question pertaining to advanced .htaccess usage, haven't seen him stumped yet.</p>
<p><strong>The W3C</strong><br /><a href="http://www.w3.org/International/questions/qa-htaccess-charset">Setting Charset in .htaccess</a> is very informative.<br /></p>
<p><strong>Holy Shmoly!</strong><br />A great blogger with analysis of attacks and spam.  See: More ways to stop spammers and unwanted traffic.</p>
<p><strong>Apache Week</strong><br />A partnership with Red Hat back in the 90's that produced some <a href="http://www.apacheweek.com/features/userauth">excellent documentation</a>.</p>
<p><strong>Corz</strong><br />Here's a resource that I consider to have some of the most creative and ingenious ideas for .htaccess files, although the author is somewhat of a character ;) It's a trip trying to navigate around the site, a fun trip. It's like nothing I've ever seen. There are only a few articles on the site, but the htaccess articles are very original and well worth a look. See: <a rel="nofollow" href="http://corz.org/serv/tricks/htaccess.php">htaccess tricks and tips</a>.</p>
<hr class="C" />







<h2>Htaccess Directives</h2>
<p class="anote">This is an AskApache.com exclusive <em>you won't find this anywhere else</em>.</p>
<p>Directory, DirectoryMatch, Files, FilesMatch, IfDefine, IfVersion, IfModule, Limit, LimitExcept, Location, LocationMatch, Proxy, ProxyMatch, VirtualHost, AcceptMutex, AcceptPathInfo, AccessFileName, Action, AddCharset, AddDefaultCharset, AddDescription, AddEncoding, AddHandler, AddInputFilter, AddLanguage, AddOutputFilter, AddOutputFilterByType, AddType, Alias, AliasMatch, AllowCONNECT, AllowOverride, Anonymous, Anonymous_Authoritative, Anonymous_LogEmail, Anonymous_MustGiveEmail, Anonymous_NoUserId, Anonymous_VerifyEmail, AuthAuthoritative, AuthDBMAuthoritative, AuthDBMGroupFile, AuthDBMType, AuthDBMUserFile, AuthDigestAlgorithm, AuthDigestDomain, AuthDigestFile, AuthDigestGroupFile, AuthDigestNcCheck, AuthDigestNonceFormat, AuthDigestNonceLifetime, AuthDigestQop, AuthDigestShmemSize, AuthGroupFile, AuthName, AuthType, AuthUserFile, BS2000Account, BrowserMatch, BrowserMatchNoCase, CacheNegotiatedDocs, CharsetDefault, CharsetOptions, CharsetSourceEnc, CheckSpelling, ContentDigest, CookieDomain, CookieExpires, CookieName, CookieStyle, CookieTracking, CoreDumpDirectory, DAV, DAVDepthInfinity, DAVMinTimeout, DefaultIcon, DefaultLanguage, DefaultType, DocumentRoot, ErrorDocument, ErrorLog, ExtFilterDefine, ExtFilterOptions, FancyIndexing, FileETag, ForceLanguagePriority, ForceType, GprofDir, Header, HeaderName, HostnameLookups, IdentityCheck, ImapBase, ImapDefault, ImapMenu, Include, IndexIgnore, LanguagePriority, LimitRequestBody, LimitRequestFields, LimitRequestFieldsize, LimitRequestLine, LimitXMLRequestBody, LockFile, LogLevel, MaxRequestsPerChild, MultiviewsMatch, NameVirtualHost, NoProxy, Options, PassEnv, PidFile, Port, ProxyBlock, ProxyDomain, ProxyErrorOverride, ProxyIOBufferSize, ProxyMaxForwards, ProxyPass, ProxyPassReverse, ProxyPreserveHost, ProxyReceiveBufferSize, ProxyRemote, ProxyRemoteMatch, ProxyRequests, ProxyTimeout, ProxyVia, RLimitCPU, RLimitMEM, RLimitNPROC, ReadmeName, Redirect, RedirectMatch, RedirectPermanent, RedirectTemp, RemoveCharset, 
RemoveEncoding, RemoveHandler, RemoveInputFilter, RemoveLanguage, RemoveOutputFilter, RemoveType, RequestHeader, Require, RewriteCond, RewriteRule, SSIEndTag, SSIErrorMsg, SSIStartTag, SSITimeFormat, SSIUndefinedEcho, Satisfy, ScoreBoardFile, Script, ScriptAlias, ScriptAliasMatch, ScriptInterpreterSource, ServerAdmin, ServerAlias, ServerName, ServerPath, ServerRoot, ServerSignature, ServerTokens, SetEnv, SetEnvIf, SetEnvIfNoCase, SetHandler, SetInputFilter, SetOutputFilter, Timeout, TypesConfig, UnsetEnv, UseCanonicalName, XBitHack, allow, deny, order, CGIMapExtension, EnableMMAP, ISAPIAppendLogToErrors, ISAPIAppendLogToQuery, ISAPICacheFile, ISAPIFakeAsync, ISAPILogNotSupported, ISAPIReadAheadBuffer, SSLLog, SSLLogLevel, MaxMemFree, ModMimeUsePathInfo, EnableSendfile, ProxyBadHeader, AllowEncodedSlashes, LimitInternalRecursion, EnableExceptionHook, TraceEnable, ProxyFtpDirCharset, AuthBasicAuthoritative, AuthBasicProvider, AuthDefaultAuthoritative, AuthDigestProvider, AuthLDAPAuthzEnabled, AuthLDAPBindDN, AuthLDAPBindPassword, AuthLDAPCharsetConfig, AuthLDAPCompareDNOnServer, AuthLDAPDereferenceAliases, AuthLDAPGroupAttribute, AuthLDAPGroupAttributeIsDN, AuthLDAPRemoteUserIsDN, AuthLDAPURL, AuthzDBMAuthoritative, AuthzDBMType, AuthzDefaultAuthoritative, AuthzGroupFileAuthoritative, AuthzLDAPAuthoritative, AuthzOwnerAuthoritative, AuthzUserAuthoritative, BalancerMember, DAVGenericLockDB, FilterChain, FilterDeclare, FilterProtocol, FilterProvider, FilterTrace, IdentityCheckTimeout, IndexStyleSheet, ProxyPassReverseCookieDomain, ProxyPassReverseCookiePath, ProxySet, ProxyStatus, ThreadStackSize, AcceptFilter, Protocol, AuthDBDUserPWQuery, AuthDBDUserRealmQuery, UseCanonicalPhysicalPort, CheckCaseOnly, AuthLDAPRemoteUserAttribute, ProxyPassMatch, SSIAccessEnable, Substitute, ProxyPassInterpolateEnv</p>
<hr class="C" />







<h2>Htaccess Modules</h2>
<p>Here are most of the modules that come with Apache.  Each one can have new commands that can be used in .htaccess file scopes.</p>
<p><a href="/servers/mod_actions.c.html">mod_actions</a>, <a href="/servers/mod_alias.c.html">mod_alias</a>, <a href="/servers/mod_asis.c.html">mod_asis</a>, <a href="/servers/mod_auth_basic.c.html">mod_auth_basic</a>, <a href="/servers/mod_auth_digest.c.html">mod_auth_digest</a>, <a href="/servers/mod_authn_anon.c.html">mod_authn_anon</a>, <a href="/servers/mod_authn_dbd.c.html">mod_authn_dbd</a>, <a href="/servers/mod_authn_dbm.c.html">mod_authn_dbm</a>, <a href="/servers/mod_authn_default.c.html">mod_authn_default</a>, <a href="/servers/mod_authn_file.c.html">mod_authn_file</a>, <a href="/servers/mod_authz_dbm.c.html">mod_authz_dbm</a>, <a href="/servers/mod_authz_default.c.html">mod_authz_default</a>, <a href="/servers/mod_authz_groupfile.c.html">mod_authz_groupfile</a>, <a href="/servers/mod_authz_host.c.html">mod_authz_host</a>, <a href="/servers/mod_authz_owner.c.html">mod_authz_owner</a>, <a href="/servers/mod_authz_user.c.html">mod_authz_user</a>, <a href="/servers/mod_autoindex.c.html">mod_autoindex</a>, <a href="/servers/mod_cache.c.html">mod_cache</a>, <a href="/servers/mod_cern_meta.c.html">mod_cern_meta</a>, <a href="/servers/mod_cgi.c.html">mod_cgi</a>, <a href="/servers/mod_dav.c.html">mod_dav</a>, <a href="/servers/mod_dav_fs.c.html">mod_dav_fs</a>, <a href="/servers/mod_dbd.c.html">mod_dbd</a>, <a href="/servers/mod_deflate.c.html">mod_deflate</a>, <a href="/servers/mod_dir.c.html">mod_dir</a>, <a href="/servers/mod_disk_cache.c.html">mod_disk_cache</a>, <a href="/servers/mod_dumpio.c.html">mod_dumpio</a>, <a href="/servers/mod_env.c.html">mod_env</a>, <a href="/servers/mod_expires.c.html">mod_expires</a>, <a href="/servers/mod_ext_filter.c.html">mod_ext_filter</a>, <a href="/servers/mod_file_cache.c.html">mod_file_cache</a>, <a href="/servers/mod_filter.c.html">mod_filter</a>, <a href="/servers/mod_headers.c.html">mod_headers</a>, <a href="/servers/mod_ident.c.html">mod_ident</a>, <a href="/servers/mod_imagemap.c.html">mod_imagemap</a>, <a 
href="/servers/mod_include.c.html">mod_include</a>, <a href="/servers/mod_info.c.html">mod_info</a>, <a href="/servers/mod_log_config.c.html">mod_log_config</a>, <a href="/servers/mod_log_forensic.c.html">mod_log_forensic</a>, <a href="/servers/mod_logio.c.html">mod_logio</a>, <a href="/servers/mod_mem_cache.c.html">mod_mem_cache</a>, <a href="/servers/mod_mime.c.html">mod_mime</a>, <a href="/servers/mod_mime_magic.c.html">mod_mime_magic</a>, <a href="/servers/mod_negotiation.c.html">mod_negotiation</a>, <a href="/servers/mod_proxy.c.html">mod_proxy</a>, <a href="/servers/mod_proxy_ajp.c.html">mod_proxy_ajp</a>, <a href="/servers/mod_proxy_balancer.c.html">mod_proxy_balancer</a>, <a href="/servers/mod_proxy_connect.c.html">mod_proxy_connect</a>, <a href="/servers/mod_proxy_ftp.c.html">mod_proxy_ftp</a>, <a href="/servers/mod_proxy_http.c.html">mod_proxy_http</a>, <a href="/servers/mod_rewrite.c.html">mod_rewrite</a>, <a href="/servers/mod_setenvif.c.html">mod_setenvif</a>, <a href="/servers/mod_speling.c.html">mod_speling</a>, <a href="/servers/mod_ssl.c.html">mod_ssl</a>, <a href="/servers/mod_status.c.html">mod_status</a>, <a href="/servers/mod_substitute.c.html">mod_substitute</a>, <a href="/servers/mod_unique_id.c.html">mod_unique_id</a>, <a href="/servers/mod_userdir.c.html">mod_userdir</a>, <a href="/servers/mod_usertrack.c.html">mod_usertrack</a>, <a href="/servers/mod_version.c.html">mod_version</a>, <a href="/servers/mod_vhost_alias.c.html">mod_vhost_alias</a></p>
<hr class="C" />





<h2>Htaccess Software</h2>
<p>Apache HTTP Server comes with the following <a href="http://httpd.apache.org/docs/trunk/programs/">programs</a>.</p>
<dl><dt><code>httpd</code></dt><dd>Apache hypertext transfer protocol server</dd><dt><code>apachectl</code></dt><dd>Apache HTTP server control interface</dd><dt><code>ab</code></dt><dd>Apache HTTP server benchmarking tool</dd><dt><code>apxs</code></dt><dd>APache eXtenSion tool</dd><dt><code>dbmmanage</code></dt><dd>Create and update user authentication files in DBM format for basic authentication</dd><dt><code>fcgistarter</code></dt><dd>Start a FastCGI program</dd><dt><code>htcacheclean</code></dt><dd>Clean up the disk cache</dd><dt><code>htdigest</code></dt><dd>Create and update user authentication files for digest authentication</dd><dt><code>htdbm</code></dt><dd>Manipulate DBM password databases</dd><dt><code>htpasswd</code></dt><dd>Create and update user authentication files for basic authentication</dd><dt><code>httxt2dbm</code></dt><dd>Create dbm files for use with RewriteMap</dd><dt><code>logresolve</code></dt><dd>Resolve hostnames for IP-addresses in Apache logfiles</dd><dt><code>log_server_status</code></dt><dd>Periodically log the server's status</dd><dt><code>rotatelogs</code></dt><dd>Rotate Apache logs without having to kill the server</dd><dt><code>split-logfile</code></dt><dd>Split a multi-vhost logfile into per-host logfiles</dd><dt><code>suexec</code></dt><dd>Switch User For Exec</dd></dl>




















<h2>Technical Look at .htaccess</h2>
<p><a href="http://httpd.apache.org/docs/1.3/misc/API.html">Source: Apache API notes</a></p>
<h3>Per-directory configuration structures</h3>
<p>Let's look at how all of this plays out in mod_mime.c, which defines the file typing handler that emulates the NCSA server's behavior of determining file types from suffixes. What we'll be looking at here is the code which implements the AddType and AddEncoding commands. These commands can appear in .htaccess files, so they must be handled in the module's private per-directory data, which in fact consists of two separate tables for MIME types and encoding information, and is declared as follows:</p>

<pre>
typedef struct {
    table *forced_types;      /* Additional AddTyped stuff */
    table *encoding_types;    /* Added with AddEncoding... */
} mime_dir_config;
</pre>

<p>When the server is reading a configuration file, or &lt;Directory&gt; section, which includes one of the MIME module's commands, it needs to create a mime_dir_config structure, so those commands have something to act on. It does this by invoking the function it finds in the module's "create per-dir config" slot, with two arguments: the name of the directory to which this configuration information applies (or NULL for srm.conf), and a pointer to a resource pool in which the allocation should happen.</p>

<p>(If we are reading a .htaccess file, that resource pool is the per-request resource pool for the request; otherwise it is a resource pool which is used for configuration data, and cleared on restarts. Either way, it is important for the structure being created to vanish when the pool is cleared, by registering a cleanup on the pool if necessary).</p>

<p>For the MIME module, the per-dir config creation function just ap_pallocs the structure above, and creates a couple of tables to fill it. That looks like this:</p>

<pre>
void *create_mime_dir_config (pool *p, char *dummy)
{
    mime_dir_config *new = (mime_dir_config *) ap_palloc (p, sizeof(mime_dir_config));
&nbsp;
    new-&gt;forced_types = ap_make_table (p, 4);
    new-&gt;encoding_types = ap_make_table (p, 4);
    return new;
}
</pre>


<p>Now, suppose we've just read in a .htaccess file. We already have the per-directory configuration structure for the next directory up in the hierarchy. If the .htaccess file we just read in didn't have any AddType or AddEncoding commands, the parent's per-directory config structure for the MIME module is still valid, and we can just use it. Otherwise, we need to merge the two structures somehow.</p>

<p>To do that, the server invokes the module's per-directory config merge function, if one is present. That function takes three arguments: the two structures being merged, and a resource pool in which to allocate the result. For the MIME module, all that needs to be done is overlay the tables from the new per-directory config structure with those from the parent:</p>

<pre>
void *merge_mime_dir_configs (pool *p, void *parent_dirv, void *subdirv)
{
    mime_dir_config *parent_dir = (mime_dir_config *)parent_dirv;
    mime_dir_config *subdir = (mime_dir_config *)subdirv;
    mime_dir_config *new = (mime_dir_config *)ap_palloc (p, sizeof(mime_dir_config));
&nbsp;
    new-&gt;forced_types = ap_overlay_tables (p, subdir-&gt;forced_types, parent_dir-&gt;forced_types);
    new-&gt;encoding_types = ap_overlay_tables (p, subdir-&gt;encoding_types, parent_dir-&gt;encoding_types);
    return new;
}
</pre>
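<p>The effect of that overlay can be sketched without the server. What follows is a minimal, self-contained C illustration and not Apache's code: <code>kv_table</code>, <code>kv_add</code>, <code>kv_get</code> and <code>kv_overlay</code> are made-up stand-ins for the real table API, and the sketch assumes that a lookup returns the first match and that the subdirectory's entries are placed ahead of the parent's.</p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for Apache's table type: a small fixed-capacity
 * list of key/value pairs. None of these names are the real API. */
#define MAX_ENTRIES 16

typedef struct {
    const char *key;
    const char *val;
} kv_entry;

typedef struct {
    kv_entry items[MAX_ENTRIES];
    size_t count;
} kv_table;

/* Append a pair; returns 0 on success, -1 if the table is full. */
int kv_add(kv_table *t, const char *key, const char *val)
{
    if (t->count >= MAX_ENTRIES)
        return -1;
    t->items[t->count].key = key;
    t->items[t->count].val = val;
    t->count++;
    return 0;
}

/* Return the first value stored under key, or NULL if there is none. */
const char *kv_get(const kv_table *t, const char *key)
{
    size_t i;

    for (i = 0; i < t->count; i++)
        if (strcmp(t->items[i].key, key) == 0)
            return t->items[i].val;
    return NULL;
}

/* Build a merged table: subdirectory entries first, then the parent's.
 * Because kv_get returns the first match, the subdirectory wins whenever
 * both tables define the same key. */
kv_table kv_overlay(const kv_table *subdir, const kv_table *parent)
{
    kv_table merged = {0};
    size_t i;

    for (i = 0; i < subdir->count; i++)
        kv_add(&merged, subdir->items[i].key, subdir->items[i].val);
    for (i = 0; i < parent->count; i++)
        kv_add(&merged, parent->items[i].key, parent->items[i].val);
    return merged;
}
```

<p>With a parent that maps <code>txt</code> to one type and a subdirectory that maps it to another, <code>kv_get</code> on the merged table returns the subdirectory's value, while suffixes that only the parent defines still resolve.</p>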


<p>As a note --- if there is no per-directory merge function present, the server will just use the subdirectory's configuration info, and ignore the parent's. For some modules, that works just fine (e.g., for the includes module, whose per-directory configuration information consists solely of the state of the XBITHACK), and for those modules, you can just not declare one, and leave the corresponding structure slot in the module itself NULL.</p>

<h3>Command handling</h3>
<p>Now that we have these structures, we need to be able to figure out how to fill them. That involves processing the actual AddType and AddEncoding commands. To find commands, the server looks in the module's command table. That table contains information on how many arguments the commands take, and in what formats, where it is permitted, and so forth. That information is sufficient to allow the server to invoke most command-handling functions with pre-parsed arguments. Without further ado, let's look at the AddType command handler, which looks like this (the AddEncoding command looks basically the same, and won't be shown here):</p>
<pre>
char *add_type(cmd_parms *cmd, mime_dir_config *m, char *ct, char *ext)
{
    if (*ext == &#039;.&#039;) ++ext;
    ap_table_set (m-&gt;forced_types, ext, ct);
    return NULL;
}
</pre>

<p>This command handler is unusually simple. As you can see, it takes four arguments: a pointer to a cmd_parms structure, the per-directory configuration structure for the module in question, and two pre-parsed arguments. The cmd_parms structure contains a bunch of arguments which are frequently of use to some, but not all, commands, including a resource pool (from which memory can be allocated, and to which cleanups should be tied), and the (virtual) server being configured, from which the module's per-server configuration data can be obtained if required.</p>

<p>Another way in which this particular command handler is unusually simple is that there are no error conditions which it can encounter. If there were, it could return an error message instead of NULL; this causes an error to be printed out on the server's stderr, followed by a quick exit, if it is in the main config files; for a .htaccess file, the syntax error is logged in the server error log (along with an indication of where it came from), and the request is bounced with a server error response (HTTP error status, code 500).</p>

<p>The MIME module's command table has entries for these commands, which look like this:</p>
<pre>
command_rec mime_cmds[] = {
{ "AddType", add_type, NULL, OR_FILEINFO, TAKE2, "a mime type followed by a file extension" },
{ "AddEncoding", add_encoding, NULL, OR_FILEINFO, TAKE2, "an encoding (e.g., gzip), followed by a file extension" },
{ NULL }
};
</pre>


<p>The entries in these tables are:</p>
<ul>
<li>The name of the command</li>
<li>The function which handles it</li>
<li>A (void *) pointer, which is passed in the cmd_parms structure to the command handler; this is useful in case many similar commands are handled by the same function.</li>
<li>A bit mask indicating where the command may appear. There are mask bits corresponding to each AllowOverride option, and an additional mask bit, RSRC_CONF, indicating that the command may appear in the server's own config files, but not in any .htaccess file.</li>
<li>A flag indicating how many arguments the command handler wants pre-parsed, and how they should be passed in. TAKE2 indicates two pre-parsed arguments. Other options are TAKE1, which indicates one pre-parsed argument, FLAG, which indicates that the argument should be On or Off, and is passed in as a boolean flag, RAW_ARGS, which causes the server to give the command the raw, unparsed arguments (everything but the command name itself). There is also ITERATE, which means that the handler looks the same as TAKE1, but that if multiple arguments are present, it should be called multiple times, and finally ITERATE2, which indicates that the command handler looks like a TAKE2, but if more arguments are present, then it should be called multiple times, holding the first argument constant.</li>
<li>Finally, we have a string which describes the arguments that should be present. If the arguments in the actual config file are not as required, this string will be used to help give a more specific error message. (You can safely leave this NULL).</li>
</ul>
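<p>The ITERATE and ITERATE2 flags are easiest to picture as loops around the handler. Here is a minimal sketch of that idea, under the assumption that the server stops at the first error a handler returns. It is not Apache's dispatcher: the <code>HOW_*</code> constants, <code>dispatch</code>, <code>call_log</code> and <code>record_call</code> are hypothetical names invented for the illustration.</p>

```c
#include <stddef.h>

/* Hypothetical miniature of the argument-passing flags described above. */
enum how { HOW_TAKE1, HOW_TAKE2, HOW_ITERATE, HOW_ITERATE2 };

/* A handler gets its config plus up to two pre-parsed arguments and
 * returns NULL on success or an error string. */
typedef const char *(*handler_fn)(void *conf, const char *a, const char *b);

/* Call fn the way each flag asks for. Returns NULL on success, or the
 * first error string produced. */
const char *dispatch(enum how how, handler_fn fn, void *conf,
                     const char **argv, size_t argc)
{
    size_t i;
    const char *err;

    switch (how) {
    case HOW_TAKE1:
        if (argc != 1) return "takes one argument";
        return fn(conf, argv[0], NULL);
    case HOW_TAKE2:
        if (argc != 2) return "takes two arguments";
        return fn(conf, argv[0], argv[1]);
    case HOW_ITERATE:
        /* Same handler shape as TAKE1, called once per argument. */
        if (argc < 1) return "requires at least one argument";
        for (i = 0; i < argc; i++)
            if ((err = fn(conf, argv[i], NULL)) != NULL) return err;
        return NULL;
    case HOW_ITERATE2:
        /* Same handler shape as TAKE2, first argument held constant. */
        if (argc < 2) return "requires at least two arguments";
        for (i = 1; i < argc; i++)
            if ((err = fn(conf, argv[0], argv[i])) != NULL) return err;
        return NULL;
    }
    return "unknown flag";
}

/* Example handler for trying the dispatcher out: it counts its calls
 * and remembers the last pair of arguments it was given. */
typedef struct {
    int calls;
    const char *first;
    const char *last;
} call_log;

const char *record_call(void *conf, const char *a, const char *b)
{
    call_log *seen = conf;

    seen->calls++;
    seen->first = a;
    seen->last = b;
    return NULL;
}
```

<p>Dispatching the arguments <code>text/html html htm shtml</code> with <code>HOW_ITERATE2</code> calls the handler three times, each time with <code>text/html</code> as the first argument.</p>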

<p>Finally, having set this all up, we have to use it. This is ultimately done in the module's handlers, specifically its file-typing handler, which extracts the per-directory configuration structure from the request_rec's per-directory configuration vector by using the ap_get_module_config function.</p>
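<p>To give a feel for what a file-typing handler does with those two tables, here is a self-contained sketch. It is an illustration under assumptions, not the module's source: <code>ext_map</code>, <code>typing_result</code>, <code>ext_lookup</code> and <code>find_type</code> are hypothetical stand-ins for the table type and request_rec, and the sketch assumes the handler first tries the last suffix as an encoding and, if that matches, steps back one suffix to find the type.</p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins: a pair list instead of Apache's table type,
 * and a plain struct instead of request_rec. */
typedef struct {
    const char *ext;
    const char *value;
} ext_map;

typedef struct {
    const char *content_type;      /* NULL if no type matched */
    const char *content_encoding;  /* NULL if no encoding matched */
} typing_result;

/* Return the value mapped to the len-byte suffix at ext, or NULL. */
const char *ext_lookup(const ext_map *map, size_t n,
                       const char *ext, size_t len)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (strlen(map[i].ext) == len && memcmp(map[i].ext, ext, len) == 0)
            return map[i].value;
    return NULL;
}

/* Work out type and encoding from the suffixes of filename. */
typing_result find_type(const char *filename,
                        const ext_map *types, size_t ntypes,
                        const ext_map *encodings, size_t nencodings)
{
    typing_result r = { NULL, NULL };
    const char *end = filename + strlen(filename);
    const char *dot = strrchr(filename, '.');

    if (dot == NULL)
        return r;                       /* no suffix at all */

    r.content_encoding = ext_lookup(encodings, nencodings, dot + 1,
                                    (size_t)(end - dot - 1));
    if (r.content_encoding != NULL) {
        /* The last suffix was an encoding; step back to the one before. */
        end = dot;
        while (dot > filename && dot[-1] != '.')
            dot--;
        if (dot == filename)
            return r;                   /* no earlier suffix to use */
        dot--;                          /* now points at the earlier '.' */
    }

    r.content_type = ext_lookup(types, ntypes, dot + 1,
                                (size_t)(end - dot - 1));
    return r;
}
```

<p>Given a types list containing <code>html</code> and an encodings list containing <code>gz</code>, a name such as <code>index.html.gz</code> yields both a type and an encoding, while a name with no suffix yields neither.</p>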

<h3>Side notes --- per-server configuration, virtual servers, etc.</h3>
<p>The basic ideas behind per-server module configuration are the same as those for per-directory configuration; there is a creation function and a merge function, the latter being invoked where a virtual server has partially overridden the base server configuration, and a combined structure must be computed. (As with per-directory configuration, the default if no merge function is specified, and a module is configured in some virtual server, is that the base configuration is simply ignored).</p>

<p>The only substantial difference is that when a command needs to configure the per-server private module data, it needs to go to the cmd_parms data to get at it. The alias module is an example: its command handler also shows how a syntax error can be returned, and it declares the per-directory configuration argument as a dummy, since the module doesn't actually have per-directory config data.</p>
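<p>A handler of that kind can be sketched as follows. This is a self-contained approximation and not the alias module's source: <code>cmd_params</code>, <code>alias_server_conf</code>, <code>redirect_entry</code> and <code>looks_like_url</code> are hypothetical stand-ins, the per-server data is reached through a plain pointer instead of the server's module config vector, and the URL check is deliberately crude.</p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the per-server config and cmd_parms. */
#define MAX_REDIRECTS 8

typedef struct {
    const char *fake;   /* path requested by the client */
    const char *real;   /* URL to redirect to */
} redirect_entry;

typedef struct {
    redirect_entry redirects[MAX_REDIRECTS];
    size_t count;
} alias_server_conf;

typedef struct {
    alias_server_conf *server_conf;  /* per-server data, reached via cmd */
} cmd_params;

/* Crude check for the sketch: some scheme name followed by "://". */
int looks_like_url(const char *s)
{
    const char *sep = strstr(s, "://");

    return sep != NULL && sep != s;
}

/* Returns NULL on success, or an error string for the server to report.
 * The per-directory argument is unused, hence "dummy". */
const char *add_redirect(cmd_params *cmd, void *dummy,
                         const char *fake, const char *url)
{
    alias_server_conf *conf = cmd->server_conf;

    (void)dummy;
    if (!looks_like_url(url))
        return "Redirect to non-URL";
    if (conf->count >= MAX_REDIRECTS)
        return "too many redirects";

    conf->redirects[conf->count].fake = fake;
    conf->redirects[conf->count].real = url;
    conf->count++;
    return NULL;
}
```

<p>The point of the design is that the handler never prints or exits on its own: it hands back a string, and the caller decides whether that means a startup failure or a logged error for a single request.</p>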




<p><a href="/htaccess/htaccess-rewrite.html">Continue Reading Page 2</a></p><p><a href="http://www.askapache.com/htaccess/htaccess.html"></a><a href="http://www.askapache.com/htaccess/htaccess.html">THE Ultimate Htaccess</a> originally appeared on <cite>AskApache.com</cite> </p>]]></content:encoded>
			<wfw:commentRss>http://www.askapache.com/htaccess/htaccess.html/feed</wfw:commentRss>
		<slash:comments>88</slash:comments>
		</item>
		<item>
		<title>Fsockopen Power Plays</title>
		<link>http://www.askapache.com/php/fsockopen-socket.html</link>
		<comments>http://www.askapache.com/php/fsockopen-socket.html#comments</comments>
		<pubDate>Wed, 02 Jul 2008 11:42:56 +0000</pubDate>
		<dc:creator>AskApache</dc:creator>
				<category><![CDATA[PHP]]></category>

		<guid isPermaLink="false">http://www.askapache.com/?p=1046</guid>
		<description><![CDATA[<p><a class="IFL hs hs17" rel="lb" href='http://www.askapache.com/php/fsockopen-socket.html' title="Fsockopen Power"></a><strong>PHP's <a href="http://php.net/manual/en/function.fsockopen.php">fsockopen</a> function lets you open an Internet or Unix domain socket connection for connecting to a resource, and is one of the most powerful functions available in the php language.</strong><br class="C" /></p>]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.askapache.com/php/fsockopen-socket.html"></a><a href="http://www.askapache.com/php/fsockopen-socket.html"><cite>AskApache.com</cite></a></p><p><a class="IFL hs hs17" rel="lb" href='http://uploads.askapache.com/2008/07/fsockopen-lightning.jpg' title="Fsockopen Power"></a><strong>PHP's function <a href="http://php.net/manual/en/function.fsockopen.php">fsockopen</a> lets you open an Internet or Unix domain socket connection for connecting to a resource, and is one of the most powerful functions.</strong>  fsockopen could be described as creating a direct link to the wire connected to a resource, which means you can send any information (<em>EBCDIC, ASCII, Hex, C arrays, Raw</em>) directly to the target server.<br class="C" /></p>

<h2>A Socket is like /dev/null</h2>
<p>In Unix you can send anything to the <code>/dev/null</code> device (<em>for Windows think Recycle Bin</em>), and likewise you can send anything to a socket created with fsockopen.  I've seen fsockopen code that sends custom exploits to Cisco routers, including some used by the Metasploit framework.  I've seen fsockopen telnet emulation, SMTP/POP3 login, and a lot of other advanced raw networking that is exciting for me to see.</p>

<h3>Some Definitions for Fsockopen</h3>
<dl>
<dt><a href="http://www.askapache.com/glossary/#client" title="client">client</a></dt>
<dd>A program that establishes connections for the purpose of sending requests.</dd>
<dt><a href="http://www.askapache.com/glossary/#server" title="server">server</a></dt>
<dd>An application program that accepts connections in order to service requests by sending back responses.</dd>
</dl>

<h3>Simple Socket Explanation</h3>
<p>A web server host listens on TCP port 80.  When a client host wishes to view a resource on the web server, it establishes a TCP connection with the server host by opening a socket to send the request for the resource.  When the connection is established, the client and server exchange requests and responses (respectively) until the connection is closed or aborted.</p>


<h2>HTTP and fsockopen</h2>
<p><a class="IFR" href='http://www.askapache.com/php/speedy-form-post.html' title='Snoopy Fsockopen HTTP Class for PHP'><img src='http://uploads.askapache.com/2008/02/snoopy-fsockopen.thumbnail.png' alt='Snoopy Fsockopen HTTP Class for PHP' title="snoopy fsockopen.thumbnail php" /></a>The <a href="http://snoopy.sourceforge.net/">Snoopy</a> class is bundled with WordPress distributions and uses fsockopen to achieve most of its cool features.  WordPress core, plugins, and other included files and classes also use the fsockopen function to communicate via HTTP.<br class="C" /></p>



<h2>Fsockopen Examples</h2>
<p><a rel="lb" class="IFL" href='http://uploads.askapache.com/2008/07/fsockopen-warning.jpg'><img src="http://uploads.askapache.com/2008/07/fsockopen-warning-200x181.jpg" alt="fsockopen warning" title="fsockopen warning" width="100" height="91" /></a>Note the warning sign: fsockopen is dangerous in the sense that you can crash your server, perform a DoS against your own server or another site, use up all of your server's available sockets and file descriptors, use up your bandwidth, etc. It shouldn't be a problem unless you are being malicious or careless.<br class="C" /></p>
<p>Here are some BOSS fsockopen functions I hacked together yesterday for use in my <a href="http://wordpress.org/extend/plugins/askapache-crazy-cache/">AskApache Crazy Cache WordPress Plugin</a>.  I've used code and ideas from hundreds of authors, projects, and docs to try to make this the very best I can.</p>

<h3>Intro</h3>
<p>This is a working example employing as many of the best-practices, tips, and tricks for using fsockopen on remote streams that I could find.</p>
<pre>&lt;?php
// max time for script execution
if(!@defined(&#039;AA_MAX_TIME&#039;)) define(&#039;AA_MAX_TIME&#039;,  60);
&nbsp;
// max time for socket reads
if(!@defined(&#039;AA_RECV_TIME&#039;)) define(&#039;AA_RECV_TIME&#039;, 30);
&nbsp;
// max time for socket connect
if(!@defined(&#039;AA_CONN_TIME&#039;)) define(&#039;AA_CONN_TIME&#039;, 5);
&nbsp;
// linebreak
if(!@defined(&#039;AA_LF&#039;)) define(&#039;AA_LF&#039;, chr(13).chr(10));
&nbsp;
// ignore TCP RST i.e. browser stop button
@ignore_user_abort(1);
&nbsp;
// set the script execution time
@set_time_limit(AA_MAX_TIME);
&nbsp;
// set the default socket timeout value
@ini_set("default_socket_timeout",AA_RECV_TIME);
&nbsp;
// output implicitly
@ob_implicit_flush(1);
&nbsp;
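// for binary freads (the function was removed in PHP 7, so only call it if it exists)
if(function_exists(&#039;set_magic_quotes_runtime&#039;)) @set_magic_quotes_runtime(0);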
&nbsp;
// keep track of script execution time
$aa_time=time();
&nbsp;
// download each of these urls using fsockopen
aa_dl(&#039;http://httpd.apache.org&#039;);
aa_dl(&#039;http://www.w3.org&#039;);
aa_dl(&#039;http://www.google.com&#039;);
aa_dl(&#039;http://www.freebsd.org/cgi/man.cgi?query=connect&amp;sektion=2&amp;apropos=0&amp;manpath=FreeBSD+7.0-RELEASE&#039;);
aa_dl(&#039;http://www.askapache.com/htaccess/htaccess.html&#039;);
aa_dl(&#039;http://www.php.net&#039;);
aa_dl(&#039;http://en.wikipedia.org/wiki/Main_Page&#039;);
&nbsp;
/*  returns a socket pointer if valid or displays an error message
    sets stream timeout, starts the clock to check for socket read time */
function askapache_get_sock($target,$port){
  global $aa_time_start;
  $aa_time_start=time();
  if(false===($fp = @fsockopen($target,$port,$errno,$errstr,AA_CONN_TIME))||!is_resource($fp))
    return askapache_sock_strerror($errno,$errstr);
  @stream_set_timeout($fp, AA_RECV_TIME);
  return $fp;
}
&nbsp;
/*  writes request, then reads response until EOF, script max, or socket max
    returns response on success.  Uses buffer to allow size&gt;100megs */
function askapache_txrx($fp,$request,$chunk=1024){
  $rec=$buf=&#039;&#039;;
  if(!@fwrite($fp, $request, strlen($request)))die(&#039;fwrite error&#039;);
  while ( !@feof($fp) &amp;&amp; askapache_time_ok(askapache_time_passed())){
    $buf = @fread($fp, $chunk);
    $rec .= $buf;
  }
  if(!@fclose($fp))die(&#039;fclose error&#039;);
  return $rec;
}
&nbsp;
/* initiates the socket and download for the passed url.
   automatically handles gzip, chunked, both, and plain downloads.
   uses the long2ip/ip2long for ip validation, uses gethostbyname to
   get the ipv4 address which saves fsockopen from having to do the lookup
   final data is saved to $rbody but currently only displays headers.*/
function aa_dl($url=NULL){
  global $aa_time;
  $ub = @parse_url($url);
  if(!isset($ub[&#039;host&#039;])||empty($ub[&#039;host&#039;])) die("bad url $url");
  $proto   = ($ub[&#039;scheme&#039;]==&#039;https&#039;)?&#039;ssl://&#039;:&#039;&#039;;
  $port   = (isset($ub[&#039;port&#039;])&amp;&amp;!empty($ub[&#039;port&#039;])) ? $ub[&#039;port&#039;] : (($proto!=&#039;&#039;) ? 443 : 80);
  $path   = (isset($ub[&#039;path&#039;])&amp;&amp;!empty($ub[&#039;path&#039;])) ? $ub[&#039;path&#039;]:&#039;/&#039;;
  $query   = (isset($ub[&#039;query&#039;])&amp;&amp;!empty($ub[&#039;query&#039;])) ? &#039;?&#039;.$ub[&#039;query&#039;] : &#039;&#039;;
  $host   = $ub[&#039;host&#039;];
  $ipp     = @gethostbyname($host);
  $ip     = ($ipp!=$host) ? long2ip(ip2long($ipp)) : $host;
&nbsp;
  $headers=array(
   "GET {$path}{$query} HTTP/1.1",
   "Host: {$host}",
   &#039;User-Agent: Mozilla/5.0 (AskApache/; +http://www.askapache.com/)&#039;,
   &#039;Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,*/*;q=0.5&#039;,
   &#039;Accept-Language: en-us,en;q=0.5&#039;,
   &#039;Accept-Encoding: gzip,deflate&#039;,
   &#039;Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7&#039;,
   &#039;Connection: close&#039;,&#039;Referer: http://www.askapache.com&#039;
  );
  $request=join(AA_LF,$headers).AA_LF.AA_LF;
&nbsp;
  $fp=askapache_get_sock($proto.$ip, $port);
  if($fp){
    $rbody=$rec=&#039;&#039;;$resp_headers=array();
    $rec=askapache_txrx($fp,$request);
    list($resp_headers, $rbody) = explode(AA_LF.AA_LF, trim($rec), 2);
    echo "\n&lt;p&gt;$request&lt;/p&gt;\n&lt;p&gt;$resp_headers&lt;/p&gt;\n";
    $gzip2=(stripos($resp_headers,&#039;Content-Encoding&#039;)!==false &amp;&amp;
        stripos($resp_headers,&#039;gzip&#039;)!==false)?1:0;
    $chunk=(stripos($resp_headers,&#039;Transfer-Encoding&#039;)!==false &amp;&amp;
        stripos($resp_headers,&#039;chunked&#039;)!==false)?1:0;
    $rbody=aa_decode_body($rbody,$chunk,$gzip2);
    unset($rbody);
  }
}
&nbsp;
/* based on http://us.php.net/manual/en/function.fsockopen.php#75175
   ungzips and/or re-assembles transfer-encoded:chunked responses
   returns the good response on success */
function aa_decode_body ($str, $chunked, $gzipped){
  if($chunked){
    // re-assemble the chunks first: gzip is applied to the whole body, not per chunk
    $tmp = $str; $str = &#039;&#039;;
    do {
      $tmp = ltrim($tmp);
      $pos = strpos($tmp,AA_LF);
      if($pos===false) break;
      $len = hexdec(substr($tmp, 0, $pos));
      if($len==0) break; // the zero-length chunk ends the body
      $str .= substr($tmp,($pos+2),$len);
      $tmp = substr($tmp,($len+$pos+2));
      $chk = trim($tmp);
    } while (!empty($chk));
  }
  return $gzipped ? aa_gzdecode($str) : $str;
}
&nbsp;
/*  based on http://us2.php.net/manual/en/function.gzencode.php#82520
  saves the gzipped data to a tempfile, then outputs the decoded
  data to the output buffer using readgzfile, returning the decoded
  buffer and deleting the tempfile on success */
function aa_gzdecode($data){
  $g=tempnam(&#039;/tmp&#039;,&#039;ff&#039;);
  @file_put_contents($g,$data);
  ob_start(); readgzfile($g); $d=ob_get_clean(); @unlink($g);
  return $d;
}
&nbsp;
/*  very cool!  this is run during socket reads and checks whether the script
  execution time limit or the socket read time limit has been met, killing
  the script if so, otherwise returns true.  Run with a cron-like process */
function askapache_time_ok($sock_time=0) {
  global $aa_time;
  if (time()-$aa_time&gt;AA_MAX_TIME)
    die(&#039;killed script.. time exceeded &#039;.AA_MAX_TIME.&#039; Total: &#039;.(time()-$aa_time));
  if ($sock_time&gt;AA_RECV_TIME)
    die(&#039;Killed socket.. time exceeded &#039;.AA_RECV_TIME.&#039; Total: &#039;.$sock_time);
  return true;
}
&nbsp;
/* input for askapache_time_ok to keep track of each socket read time. */
function askapache_time_passed() {
  global $aa_time_start;
  return (time() - $aa_time_start);
}
&nbsp;
/*  handles fsockopen errors, printing them out though you may want to die on err */
function askapache_sock_strerror($errno,$errstr){
  switch($errno){
    case -3:  $err="Socket creation failed"; break;
    case -4:  $err="DNS lookup failure"; break;
    case -5:  $err="Connection refused or timed out"; break;
    case 111: $err="Connection refused"; break;
    case 113: $err="No route to host"; break;
    case 110: $err="Connection timed out"; break;
    case 104: $err="Connection reset by peer"; break;
    default:  $err="Connection failed"; break;
  }
  echo &#039;&lt;p&gt;Fsockopen failed!&#039;."\n[".$errno."] ".$err." (".$errstr.")&lt;/p&gt;";
  return false;
}
?&gt;</pre>
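<p>The chunk re-assembly step is the part that is easiest to get wrong, so here it is on its own. HTTP/1.1 chunked framing is a hexadecimal length line, a CRLF, that many bytes of data, another CRLF, repeated until a chunk of length zero. The sketch below is written in C purely to show the byte-level steps and is not part of the plugin; <code>dechunk</code> and <code>hex_value</code> are hypothetical names, the whole body is assumed to be in memory already, and chunk extensions and trailers are skipped, not interpreted.</p>

```c
#include <stddef.h>
#include <string.h>

/* Value of a hexadecimal digit, or -1 if c is not one. */
int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode a chunked body of in_len bytes into out (capacity out_cap).
 * Returns the number of decoded bytes, or -1 if the input is malformed
 * or out is too small. */
long dechunk(const char *in, size_t in_len, char *out, size_t out_cap)
{
    size_t pos = 0, written = 0;

    for (;;) {
        size_t size = 0;
        int digits = 0, v;

        /* 1. the chunk size, written in hexadecimal */
        while (pos < in_len && (v = hex_value(in[pos])) >= 0) {
            size = size * 16 + (size_t)v;
            pos++;
            if (++digits > 7) return -1;    /* refuse absurd sizes */
        }
        if (digits == 0) return -1;

        /* 2. skip anything else up to the CRLF that ends the size line */
        while (pos + 1 < in_len && !(in[pos] == '\r' && in[pos + 1] == '\n'))
            pos++;
        if (pos + 1 >= in_len) return -1;
        pos += 2;

        /* 3. a zero-length chunk marks the end of the body */
        if (size == 0) return (long)written;

        /* 4. copy the chunk data, which must be followed by CRLF */
        if (size > in_len - pos || size > out_cap - written) return -1;
        memcpy(out + written, in + pos, size);
        written += size;
        pos += size;
        if (pos + 1 >= in_len || in[pos] != '\r' || in[pos + 1] != '\n')
            return -1;
        pos += 2;
    }
}
```

<p>Any gzip decoding would happen after this step, on the re-assembled bytes, because the compression covers the body as a whole while the chunking is only how it travels over the wire.</p>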


<hr class="C" />




<h2>Debugging Fsockopen</h2>
<p>If you really want to know more about fsockopen, you can do what I did and read all the relevant PHP source files, along with the system, library, and user files on your OS that relate to fsockopen, and of course you can always trace PHP while it runs the fsockopen function to get an under-the-hood look at what fsockopen is doing.  Personally, I was trying to find more error codes and error strings to display when an fsockopen call failed, and I ended up finding over 50.</p>


<h3>fsockopen Errors</h3>
<pre>function fsockopen_err($errnum)
{
   static $fsockopen_errors;
   is_null($fsockopen_errors) &amp;&amp; $fsockopen_errors = array(
  0 =&gt; &#039;Success&#039;,
  1 =&gt; &#039;Operation not permitted&#039;,
  2 =&gt; &#039;No such file or directory&#039;,
  3 =&gt; &#039;No such process&#039;,
  4 =&gt; &#039;Interrupted system call - DNS lookup failure&#039;,
  5 =&gt; &#039;Input/output error - Connection refused or timed out&#039;,
  6 =&gt; &#039;No such device or address&#039;,
  7 =&gt; &#039;Argument list too long&#039;,
  8 =&gt; &#039;Exec format error&#039;,
  9 =&gt; &#039;Bad file descriptor&#039;,
  10 =&gt; &#039;No child processes&#039;,
  11 =&gt; &#039;Resource temporarily unavailable&#039;,
  12 =&gt; &#039;Cannot allocate memory&#039;,
  13 =&gt; &#039;Permission denied&#039;,
  14 =&gt; &#039;Bad address&#039;,
  15 =&gt; &#039;Block device required&#039;,
  16 =&gt; &#039;Device or resource busy&#039;,
  17 =&gt; &#039;File exists&#039;,
  18 =&gt; &#039;Invalid cross-device link&#039;,
  19 =&gt; &#039;No such device&#039;,
  20 =&gt; &#039;Not a directory&#039;,
  21 =&gt; &#039;Is a directory&#039;,
  22 =&gt; &#039;Invalid argument&#039;,
  23 =&gt; &#039;Too many open files in system&#039;,
  24 =&gt; &#039;Too many open files&#039;,
  25 =&gt; &#039;Inappropriate ioctl for device&#039;,
  26 =&gt; &#039;Text file busy&#039;,
  27 =&gt; &#039;File too large&#039;,
  28 =&gt; &#039;No space left on device&#039;,
  29 =&gt; &#039;Illegal seek&#039;,
  30 =&gt; &#039;Read-only file system&#039;,
  31 =&gt; &#039;Too many links&#039;,
  32 =&gt; &#039;Broken pipe&#039;,
  33 =&gt; &#039;Numerical argument out of domain&#039;,
  34 =&gt; &#039;Numerical result out of range&#039;,
  35 =&gt; &#039;Resource deadlock avoided&#039;,
  36 =&gt; &#039;File name too long&#039;,
  37 =&gt; &#039;No locks available&#039;,
  38 =&gt; &#039;Function not implemented&#039;,
  39 =&gt; &#039;Directory not empty&#039;,
  40 =&gt; &#039;Too many levels of symbolic links&#039;,
  41 =&gt; &#039;Unknown error 41&#039;,
  42 =&gt; &#039;No message of desired type&#039;,
  43 =&gt; &#039;Identifier removed&#039;,
  44 =&gt; &#039;Channel number out of range&#039;,
  45 =&gt; &#039;Level 2 not synchronized&#039;,
  46 =&gt; &#039;Level 3 halted&#039;,
  47 =&gt; &#039;Level 3 reset&#039;,
  48 =&gt; &#039;Link number out of range&#039;,
  49 =&gt; &#039;Protocol driver not attached&#039;,
  50 =&gt; &#039;No CSI structure available&#039;,
  51 =&gt; &#039;Level 2 halted&#039;,
  52 =&gt; &#039;Invalid exchange&#039;,
  53 =&gt; &#039;Invalid request descriptor&#039;,
  54 =&gt; &#039;Exchange full&#039;,
  55 =&gt; &#039;No anode&#039;,
  56 =&gt; &#039;Invalid request code&#039;,
  57 =&gt; &#039;Invalid slot&#039;,
  58 =&gt; &#039;Unknown error 58&#039;,
  59 =&gt; &#039;Bad font file format&#039;,
  60 =&gt; &#039;Device not a stream&#039;,
  61 =&gt; &#039;No data available&#039;,
  62 =&gt; &#039;Timer expired&#039;,
  63 =&gt; &#039;Out of streams resources&#039;,
  64 =&gt; &#039;Machine is not on the network&#039;,
  65 =&gt; &#039;Package not installed&#039;,
  66 =&gt; &#039;Object is remote&#039;,
  67 =&gt; &#039;Link has been severed&#039;,
  68 =&gt; &#039;Advertise error&#039;,
  69 =&gt; &#039;Srmount error&#039;,
  70 =&gt; &#039;Communication error on send&#039;,
  71 =&gt; &#039;Protocol error&#039;,
  72 =&gt; &#039;Multihop attempted&#039;,
  73 =&gt; &#039;RFS specific error&#039;,
  74 =&gt; &#039;Bad message&#039;,
  75 =&gt; &#039;Value too large for defined data type&#039;,
  76 =&gt; &#039;Name not unique on network&#039;,
  77 =&gt; &#039;File descriptor in bad state&#039;,
  78 =&gt; &#039;Remote address changed&#039;,
  79 =&gt; &#039;Can not access a needed shared library&#039;,
  80 =&gt; &#039;Accessing a corrupted shared library&#039;,
  81 =&gt; &#039;.lib section in a.out corrupted&#039;,
  82 =&gt; &#039;Attempting to link in too many shared libraries&#039;,
  83 =&gt; &#039;Cannot exec a shared library directly&#039;,
  84 =&gt; &#039;Invalid or incomplete multibyte or wide character&#039;,
  85 =&gt; &#039;Interrupted system call should be restarted&#039;,
  86 =&gt; &#039;Streams pipe error&#039;,
  87 =&gt; &#039;Too many users&#039;,
  88 =&gt; &#039;Socket operation on non-socket&#039;,
  89 =&gt; &#039;Destination address required&#039;,
  90 =&gt; &#039;Message too long&#039;,
  91 =&gt; &#039;Protocol wrong type for socket&#039;,
  92 =&gt; &#039;Protocol not available&#039;,
  93 =&gt; &#039;Protocol not supported&#039;,
  94 =&gt; &#039;Socket type not supported&#039;,
  95 =&gt; &#039;Operation not supported&#039;,
  96 =&gt; &#039;Protocol family not supported&#039;,
  97 =&gt; &#039;Address family not supported by protocol&#039;,
  98 =&gt; &#039;Address already in use&#039;,
  99 =&gt; &#039;Cannot assign requested address&#039;,
  100 =&gt; &#039;Network is down&#039;,
  101 =&gt; &#039;Network is unreachable&#039;,
  102 =&gt; &#039;Network dropped connection on reset&#039;,
  103 =&gt; &#039;Software caused connection abort&#039;,
  104 =&gt; &#039;Connection reset by peer&#039;,
  105 =&gt; &#039;No buffer space available&#039;,
  106 =&gt; &#039;Transport endpoint is already connected&#039;,
  107 =&gt; &#039;Transport endpoint is not connected&#039;,
  108 =&gt; &#039;Cannot send after transport endpoint shutdown&#039;,
  109 =&gt; &#039;Too many references: cannot splice&#039;,
  110 =&gt; &#039;Connection timed out&#039;,
  111 =&gt; &#039;Connection refused&#039;,
  112 =&gt; &#039;Host is down&#039;,
  113 =&gt; &#039;No route to host&#039;,
  114 =&gt; &#039;Operation already in progress&#039;,
  115 =&gt; &#039;Operation now in progress&#039;,
  116 =&gt; &#039;Stale NFS file handle&#039;,
  117 =&gt; &#039;Structure needs cleaning&#039;,
  118 =&gt; &#039;Not a XENIX named type file&#039;,
  119 =&gt; &#039;No XENIX semaphores available&#039;,
  120 =&gt; &#039;Is a named type file&#039;,
  121 =&gt; &#039;Remote I/O error&#039;,
  122 =&gt; &#039;Disk quota exceeded&#039;,
  123 =&gt; &#039;No medium found&#039;,
  124 =&gt; &#039;Wrong medium type&#039;,
  125 =&gt; &#039;Operation canceled&#039;
  );
    return (isset($fsockopen_errors[$errnum])) ? $fsockopen_errors[$errnum] : $errnum;
}</pre>
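<p>The PHP manual describes the error number fsockopen reports as the system-level error number from the underlying <code>connect()</code> call, so the strings in the table above are the kind of thing the C library's <code>strerror()</code> returns. As a rough sketch of pulling them straight from the C library, here is a small C helper; <code>errno_message</code> and <code>print_errno_table</code> are hypothetical names, and treating text that starts with "Unknown error" as an unassigned code is an assumption that holds for glibc but may not for other C libraries.</p>

```c
#include <stdio.h>
#include <string.h>

/* Copy the C library's message for errnum into buf.
 * Returns 1 if the library appears to know the code, 0 if its text
 * starts with "Unknown error" (glibc's wording; an assumption here). */
int errno_message(int errnum, char *buf, size_t cap)
{
    const char *msg = strerror(errnum);

    snprintf(buf, cap, "%s", msg);
    return strncmp(msg, "Unknown error", 13) != 0;
}

/* Print "number => message" for every known code in [0, limit) and
 * return how many lines were printed. */
int print_errno_table(int limit)
{
    char buf[128];
    int i, printed = 0;

    for (i = 0; i < limit; i++) {
        if (errno_message(i, buf, sizeof buf)) {
            printf("%d => %s\n", i, buf);
            printed++;
        }
    }
    return printed;
}
```

<p>Calling <code>print_errno_table(250)</code> walks the same range of numbers as a PHP loop over <code>socket_strerror</code> would, with each message printed next to its own error number. The exact wording and numbering depend on the operating system.</p>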
<p>If you would like to see all the errors on your particular machine:</p>
<pre>for($i=0, $s=""; $i&lt;250; $s=socket_strerror($i), $i++)
  !empty($s) &amp;&amp; (&#039;Unknown error&#039; != (substr($s,0,13)) ) &amp;&amp; print "{$i} =&gt; {$s}\n";</pre>
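<p>Which outputs the following. Note that the loop assigns <code>$s</code> in its increment expression, so each message is printed next to an index one higher than its actual error number (<code>1 =&gt; Success</code> is really error 0); move the <code>socket_strerror</code> call into the loop body if you want the true numbers:</p>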
<pre>1 =&gt; Success
2 =&gt; Operation not permitted
3 =&gt; No such file or directory
4 =&gt; No such process
5 =&gt; Interrupted system call
6 =&gt; Input/output error
7 =&gt; No such device or address
8 =&gt; Argument list too long
9 =&gt; Exec format error
10 =&gt; Bad file descriptor
11 =&gt; No child processes
12 =&gt; Resource temporarily unavailable
13 =&gt; Cannot allocate memory
14 =&gt; Permission denied
15 =&gt; Bad address
16 =&gt; Block device required
17 =&gt; Device or resource busy
18 =&gt; File exists
19 =&gt; Invalid cross-device link
20 =&gt; No such device
21 =&gt; Not a directory
22 =&gt; Is a directory
23 =&gt; Invalid argument
24 =&gt; Too many open files in system
25 =&gt; Too many open files
26 =&gt; Inappropriate ioctl for device
27 =&gt; Text file busy
28 =&gt; File too large
29 =&gt; No space left on device
30 =&gt; Illegal seek
31 =&gt; Read-only file system
32 =&gt; Too many links
33 =&gt; Broken pipe
34 =&gt; Numerical argument out of domain
35 =&gt; Numerical result out of range
36 =&gt; Resource deadlock avoided
37 =&gt; File name too long
38 =&gt; No locks available
39 =&gt; Function not implemented
40 =&gt; Directory not empty
41 =&gt; Too many levels of symbolic links
43 =&gt; No message of desired type
44 =&gt; Identifier removed
45 =&gt; Channel number out of range
46 =&gt; Level 2 not synchronized
47 =&gt; Level 3 halted
48 =&gt; Level 3 reset
49 =&gt; Link number out of range
50 =&gt; Protocol driver not attached
51 =&gt; No CSI structure available
52 =&gt; Level 2 halted
53 =&gt; Invalid exchange
54 =&gt; Invalid request descriptor
55 =&gt; Exchange full
56 =&gt; No anode
57 =&gt; Invalid request code
58 =&gt; Invalid slot
60 =&gt; Bad font file format
61 =&gt; Device not a stream
62 =&gt; No data available
63 =&gt; Timer expired
64 =&gt; Out of streams resources
65 =&gt; Machine is not on the network
66 =&gt; Package not installed
67 =&gt; Object is remote
68 =&gt; Link has been severed
69 =&gt; Advertise error
70 =&gt; Srmount error
71 =&gt; Communication error on send
72 =&gt; Protocol error
73 =&gt; Multihop attempted
74 =&gt; RFS specific error
75 =&gt; Bad message
76 =&gt; Value too large for defined data type
77 =&gt; Name not unique on network
78 =&gt; File descriptor in bad state
79 =&gt; Remote address changed
80 =&gt; Can not access a needed shared library
81 =&gt; Accessing a corrupted shared library
82 =&gt; .lib section in a.out corrupted
83 =&gt; Attempting to link in too many shared libraries
84 =&gt; Cannot exec a shared library directly
85 =&gt; Invalid or incomplete multibyte or wide character
86 =&gt; Interrupted system call should be restarted
87 =&gt; Streams pipe error
88 =&gt; Too many users
89 =&gt; Socket operation on non-socket
90 =&gt; Destination address required
91 =&gt; Message too long
92 =&gt; Protocol wrong type for socket
93 =&gt; Protocol not available
94 =&gt; Protocol not supported
95 =&gt; Socket type not supported
96 =&gt; Operation not supported
97 =&gt; Protocol family not supported
98 =&gt; Address family not supported by protocol
99 =&gt; Address already in use
100 =&gt; Cannot assign requested address
101 =&gt; Network is down
102 =&gt; Network is unreachable
103 =&gt; Network dropped connection on reset
104 =&gt; Software caused connection abort
105 =&gt; Connection reset by peer
106 =&gt; No buffer space available
107 =&gt; Transport endpoint is already connected
108 =&gt; Transport endpoint is not connected
109 =&gt; Cannot send after transport endpoint shutdown
110 =&gt; Too many references: cannot splice
111 =&gt; Connection timed out
112 =&gt; Connection refused
113 =&gt; Host is down
114 =&gt; No route to host
115 =&gt; Operation already in progress
116 =&gt; Operation now in progress
117 =&gt; Stale NFS file handle
118 =&gt; Structure needs cleaning
119 =&gt; Not a XENIX named type file
120 =&gt; No XENIX semaphores available
121 =&gt; Is a named type file
122 =&gt; Remote I/O error
123 =&gt; Disk quota exceeded
124 =&gt; No medium found
125 =&gt; Wrong medium type
126 =&gt; Operation canceled</pre>



<h3>Tracing fsockopen using Strace</h3>
<p>Once you save the above file on your site, you can use the strace tool to debug it.  This is a tad overboard but way cool nevertheless!</p>

<p><code>strace -e trace=connect php -nef fsockopen-test.php</code></p>
<pre>connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("66.33.216.129")}, 28) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("192.87.106.226")}, 16) = -1 EINPROGRESS (Operation now in progress)</pre>


<p><code>strace -e trace=network php -nef fsockopen-test.php</code></p>
<pre>socket(PF_FILE, SOCK_STREAM, 0)         = 3
connect(3, {sa_family=AF_FILE, path="/var/run/.nscd_socket"}, 110) = -1 ENOENT (No such file or directory)
socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 3
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("66.33.216.129")}, 28) = 0
send(3, "\274\221\1\0\0\1\0\0\0\0\0\0\5httpd\6apache\3org\0\0\1"..., 34, 0) = 34
recvfrom(3, "\274\221\201\200\0\1\0\1\0\0\0\0\5httpd\6apache\3org\0"..., 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("66.33.216.129")}, [16]) = 50
socket(PF_INET6, SOCK_DGRAM, IPPROTO_IP) = -1 EAFNOSUPPORT (Address family not supported by protocol)
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("192.87.106.226")}, 16) = -1 EINPROGRESS (Operation now in progress)
getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
send(3, "GET / HTTP/1.1\r\nHost: httpd.apac"..., 356, MSG_DONTWAIT) = 356
recv(3, "HTTP/1.1 200 OK\r\nDate: Wed, 02 J"..., 8192, MSG_DONTWAIT) = 2609
recv(3, "", 8192, MSG_DONTWAIT)         = 0</pre>

<p><code>strace -q -e trace=all php -nef fsockopen-test.php</code></p>
<pre>mmap2(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb76ba000
munmap(0xb76ba000, 266240)              = 0
socket(PF_FILE, SOCK_STREAM, 0)         = 3
connect(3, {sa_family=AF_FILE, path="/var/run/.nscd_socket"}, 110) = -1 ENOENT (No such file or directory)
close(3)                                = 0
open("/etc/hosts", O_RDONLY)            = 3
fcntl64(3, F_GETFD)                     = 0
fcntl64(3, F_SETFD, FD_CLOEXEC)         = 0
fstat64(3, {st_mode=S_IFREG|0644, st_size=948, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f6e000
read(3, "# /etc/hosts - dh2 generated\n127"..., 4096) = 948
read(3, "", 4096)                       = 0
close(3)                                = 0
munmap(0xb7f6e000, 4096)                = 0
socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 3
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("66.33.216.129")}, 28) = 0
send(3, "X~\1\0\0\1\0\0\0\0\0\0\2en\twikipedia\3org\0\0\1"..., 34, 0) = 34
gettimeofday({1214998196, 656179}, NULL) = 0
poll([{fd=3, events=POLLIN, revents=POLLIN}], 1, 5000) = 1
ioctl(3, FIONREAD, [100])               = 0
recvfrom(3, "X~\201\200\0\1\0\3\0\0\0\0\2en\twikipedia\3org\0\0\1"..., 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("66.33.216.129")}, [16]) = 100
close(3)                                = 0
time(NULL)                              = 1214998196
gettimeofday({1214998196, 656754}, NULL) = 0
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
fcntl64(3, F_GETFL)                     = 0x2 (flags O_RDWR)
fcntl64(3, F_SETFL, O_RDWR|O_NONBLOCK)  = 0
connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("208.80.152.2")}, 16) = -1 EINPROGRESS (Operation now in progress)
poll([{fd=3, events=POLLIN|POLLOUT|POLLERR|POLLHUP, revents=POLLOUT}], 1, 10000) = 1
getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
fcntl64(3, F_SETFL, O_RDWR)             = 0
send(3, "GET /wiki/Main_Page HTTP/1.1\r\nHo"..., 370, MSG_DONTWAIT) = 370
poll([{fd=3, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 0) = 0
time(NULL)                              = 1214998196
poll([{fd=3, events=POLLIN|POLLERR|POLLHUP, revents=POLLIN}], 1, 30000) = 1
recv(3, "HTTP/1.0 200 OK\r\nDate: Wed, 02 J"..., 8192, MSG_DONTWAIT) = 2896
time(NULL)                              = 1214998196
poll([{fd=3, events=POLLIN|POLLERR|POLLHUP, revents=POLLIN}], 1, 30000) = 1
recv(3, "\214!\337i\307\336\23w\253wy\215\26EL\227;\227\253\261"..., 8192, MSG_DONTWAIT) = 5792
time(NULL)                              = 1214998196
poll([{fd=3, events=POLLIN|POLLERR|POLLHUP, revents=POLLIN}], 1, 30000) = 1
recv(3, "4\201\273\214\17yI\347\257\371\373\344\332\330\227\245"..., 8192, MSG_DONTWAIT) = 7487
time(NULL)                              = 1214998197
poll([{fd=3, events=POLLIN|POLLERR|POLLHUP, revents=POLLIN}], 1, 30000) = 1
recv(3, "", 8192, MSG_DONTWAIT)         = 0
close(3)                                = 0
write(1, "\n&lt;pre&gt;GET /wiki/Main_Page HTTP/1"..., 1300</pre>
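<p>The trace above shows how a connect with a timeout is done at the syscall level: set <code>O_NONBLOCK</code>, call <code>connect()</code> (which returns <code>EINPROGRESS</code>), <code>poll()</code> until the socket is writable, then read <code>SO_ERROR</code>. Here is a rough Python sketch of the same sequence. The helper name <code>connect_with_timeout</code> and the throwaway local listener in the demo are assumptions made for illustration, not part of the test script.</p>

```python
import errno
import os
import select
import socket


def connect_with_timeout(host, port, timeout=10.0):
    """Non-blocking connect, mirroring the fcntl/connect/poll/getsockopt calls in the trace."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)                   # like fcntl(F_SETFL, O_RDWR|O_NONBLOCK)
    rc = sock.connect_ex((host, port))        # usually EINPROGRESS, which is not a failure
    if rc not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
        sock.close()
        raise OSError(rc, os.strerror(rc))
    # Wait until the socket is writable: the handshake has finished or failed.
    _, writable, _ = select.select([], [sock], [], timeout)
    if not writable:
        sock.close()
        raise TimeoutError("connect timed out")
    err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    if err != 0:
        sock.close()
        raise OSError(err, os.strerror(err))
    sock.setblocking(True)                    # like fcntl(F_SETFL, O_RDWR)
    return sock


if __name__ == "__main__":
    # Hypothetical demo target: a listener on loopback so no network is needed.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    conn = connect_with_timeout("127.0.0.1", listener.getsockname()[1], 5.0)
    print("connected to", conn.getpeername())
    conn.close()
    listener.close()
```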




<hr class="C" />


<h2>More Fsockopen Info</h2>

<h3>TCP Multiplexing</h3>
<p><a href="http://rfc.askapache.com/rfc793/rfc793.html#page-10">RFC 793</a>: To allow for many processes within a single Host to use TCP communication facilities simultaneously, the TCP provides a set of addresses or ports within each host.  Concatenated with the network and host addresses from the internet communication layer, this forms a socket.  A pair of sockets uniquely identifies each connection. That is, a socket may be simultaneously used in multiple connections.</p>
<p>The binding of ports to processes is handled independently by each Host.  However, it proves useful to attach frequently used processes (e.g., a "logger" or timesharing service) to fixed sockets which are made known to the public.  These services can then be accessed through the known addresses.  Establishing and learning the port addresses of other processes may involve more dynamic mechanisms.</p>
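<p>To see "a pair of sockets uniquely identifies each connection" in practice, here is a small Python sketch that opens two connections to one listening socket on loopback. The function name is made up for this example. Both connections share the server's address and port; only the client side of the pair differs.</p>

```python
import socket


def two_connections_one_socket():
    """Open two TCP connections to the same listening socket on loopback.

    Returns the (local, remote) address pair of each accepted connection.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))       # port 0: let the OS pick a free port
    listener.listen(2)
    server_addr = listener.getsockname()

    clients, pairs = [], []
    for _ in range(2):
        client = socket.create_connection(server_addr)
        conn, _peer = listener.accept()
        pairs.append((conn.getsockname(), conn.getpeername()))
        clients.append(client)            # keep it open so its port stays in use
        conn.close()
    for client in clients:
        client.close()
    listener.close()
    return pairs


if __name__ == "__main__":
    for local, remote in two_connections_one_socket():
        print("server socket", local, "paired with client socket", remote)
```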

<h3>TCP Connections</h3>
<p>RFC 793: The reliability and flow control mechanisms described above require that TCPs initialize and maintain certain status information for each data stream.  The combination of this information, including sockets, sequence numbers, and window sizes, is called a connection. Each connection is uniquely specified by a pair of sockets identifying its two sides.</p>
<p>When two processes wish to communicate, their TCP's must first establish a connection (initialize the status information on each side).  When their communication is complete, the connection is terminated or closed to free the resources for other uses.</p>
<p>Since connections must be established between unreliable hosts and over the unreliable internet communication system, a handshake mechanism with clock-based sequence numbers is used to avoid erroneous initialization of connections.</p>

<h3>Fsockopen Practical Uses</h3>
<ul>
<li>Download Web Pages, Files, etc.</li>
<li><a href="http://www.askapache.com/php/speedy-form-post.html">Upload a file</a></li>
<li><a href="http://www.askapache.com/php/speedy-form-post.html">Send POST data to a form</a></li>
<li>Emulate cron</li>
<li>Download plugin updates</li>
<li><a href="http://www.askapache.com/online-tools/request-method-scanner/">Scan sites for exploits</a></li>
<li><a href="http://www.askapache.com/online-tools/curl-google-feed/">Auto Login to Google</a></li>
<li><a href="http://www.askapache.com/wordpress/crazy-cache-wordpress-plugin.html">Pass wp-nonces via cookie headers</a>, and more</li>
</ul>


<h3>Transfer-Encoding</h3>
<p><a href="http://www.ietf.org/rfc/rfc2068">RFC 2068</a></p>
<pre>19.4.6 Introduction of Transfer-Encoding
&nbsp;
   HTTP/1.1 introduces the Transfer-Encoding header field (section
   14.40).  Proxies/gateways MUST remove any transfer coding prior to
   forwarding a message via a MIME-compliant protocol.
&nbsp;
   A process for decoding the "chunked" transfer coding (section 3.6)
   can be represented in pseudo-code as:
&nbsp;
          length := 0
          read chunk-size, chunk-ext (if any) and CRLF
          while (chunk-size &gt; 0) {
             read chunk-data and CRLF
             append chunk-data to entity-body
             length := length + chunk-size
             read chunk-size and CRLF
          }
          read entity-header
          while (entity-header not empty) {
             append entity-header to existing header fields
             read entity-header
          }
          Content-Length := length
          Remove "chunked" from Transfer-Encoding</pre>
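<p>The RFC pseudo-code above translates almost line for line into real code. Below is a minimal Python sketch that decodes a chunked body already held in memory. The function name <code>decode_chunked</code> is an assumption for this example, and a real client reading from a socket would have to handle partial reads as well.</p>

```python
def decode_chunked(raw):
    """Decode a 'chunked' message body (bytes) into (body, trailer_headers)."""
    body = bytearray()
    pos = 0
    while True:
        # chunk-size line: hex size, optional ';chunk-ext', then CRLF
        eol = raw.index(b"\r\n", pos)
        size_field = raw[pos:eol].split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        pos = eol + 2
        if size == 0:
            break                          # last-chunk
        body += raw[pos:pos + size]        # append chunk-data to entity-body
        pos += size
        if raw[pos:pos + 2] != b"\r\n":
            raise ValueError("chunk-data not followed by CRLF")
        pos += 2
    # Optional trailer: entity-header lines up to the terminating empty line
    trailers = {}
    while True:
        eol = raw.index(b"\r\n", pos)
        line = raw[pos:eol]
        pos = eol + 2
        if not line:
            break
        name, _, value = line.partition(b":")
        trailers[name.strip().decode("ascii")] = value.strip().decode("latin-1")
    return bytes(body), trailers


if __name__ == "__main__":
    sample = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
    body, trailers = decode_chunked(sample)
    print(body, len(body), trailers)       # len(body) is the new Content-Length
```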


<h3>Socket-Related Man Pages</h3>
<pre>DESCRIPTION
This  manual  page  describes the Linux networking socket layer user interface. The BSD compatible sockets are the uniform interface between
the user process and the network protocol stacks in the kernel.  The protocol modules are  grouped  into  protocol  families  like  PF_INET,
PF_IPX, PF_PACKET and socket types like SOCK_STREAM or SOCK_DGRAM.  See socket(2) for more information on families and types.
&nbsp;
SOCKET LAYER FUNCTIONS
These  functions  are  used by the user process to send or receive packets and to do other socket operations. For more information see their
respective manual pages.
&nbsp;
socket(2) creates a socket, connect(2) connects a socket to a remote socket address, the bind(2) function binds a socket to a  local  socket
address,  listen(2)  tells  the socket that new connections shall be accepted, and accept(2) is used to get a new socket with a new incoming
connection.  socketpair(2) returns two connected anonymous sockets (only implemented for a few local families like PF_UNIX)
&nbsp;
send(2), sendto(2), and sendmsg(2) send data over a socket, and recv(2), recvfrom(2), recvmsg(2) receive data from a  socket.   poll(2)  and
select(2)  wait  for  arriving  data  or a readiness to send data.  In addition, the standard I/O operations like write(2), writev(2),
sendfile(2), read(2), and readv(2) can be used to read and write data.
&nbsp;
getsockname(2) returns the local socket address and getpeername(2) returns the remote socket address.  getsockopt(2) and  setsockopt(2)  are
used to set or get socket layer or protocol options.  ioctl(2) can be used to set or read some other options.
&nbsp;
close(2) is used to close a socket.  shutdown(2) closes parts of a full duplex socket connection.
&nbsp;
Seeking, or calling pread(2) or pwrite(2) with a non-zero position is not supported on sockets.
&nbsp;
It  is possible to do non-blocking IO on sockets by setting the O_NONBLOCK flag on a socket file descriptor using fcntl(2).  Then all
operations that would block will (usually) return with EAGAIN (operation should be retried later); connect(2) will return EINPROGRESS error.  The
user can then wait for various events via poll(2) or select(2).</pre>
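<p>The last paragraph of that man page is easy to try out. In this Python sketch (the helper names are invented for the example), a read on a non-blocking socket with nothing queued fails with <code>EAGAIN</code>, which Python reports as <code>BlockingIOError</code>, and <code>select()</code> is then used to wait for data.</p>

```python
import select
import socket


def read_if_ready(sock):
    """recv() on a non-blocking socket; None means "would block" (EAGAIN)."""
    try:
        return sock.recv(4096)
    except BlockingIOError:
        # Raised for EAGAIN/EWOULDBLOCK: no data yet, retry later.
        return None


def wait_readable(sock, timeout):
    """Block in select() until sock has data or the timeout expires."""
    readable, _, _ = select.select([sock], [], [], timeout)
    return bool(readable)


if __name__ == "__main__":
    a, b = socket.socketpair()
    a.setblocking(False)          # sets O_NONBLOCK on the descriptor
    print(read_if_ready(a))       # nothing has been sent yet
    b.sendall(b"ping")
    if wait_readable(a, 5.0):
        print(read_if_ready(a))
    a.close()
    b.close()
```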

<p>From the <a href="http://www.freebsd.org/cgi/man.cgi?query=socket&amp;sektion=2&amp;apropos=0&amp;manpath=FreeBSD+7.0-RELEASE">FreeBSD man page for socket(2)</a>:</p>
<pre>Sockets of type SOCK_STREAM are full-duplex byte streams, similar to
pipes.  A stream socket must be in a connected state before any data may
be sent or received on it.  A connection to another socket is created
with a connect(2) system call.  Once connected, data may be transferred
using read(2) and write(2) calls or some variant of the send(2) and
recv(2) functions.  (Some protocol families, such as the Internet family,
support the notion of an ``implied connect&#039;&#039;, which permits data to be
sent piggybacked onto a connect operation by using the sendto(2) system
call.)  When a session has been completed a close(2) may be performed.
Out-of-band data may also be transmitted as described in send(2) and
received as described in recv(2).
&nbsp;
The communications protocols used to implement a SOCK_STREAM insure that
data is not lost or duplicated.  If a piece of data for which the peer
protocol has buffer space cannot be successfully transmitted within a
reasonable length of time, then the connection is considered broken and
calls will indicate an error with -1 returns and with ETIMEDOUT as the
specific code in the global variable errno.  The protocols optionally
keep sockets ``warm&#039;&#039; by forcing transmissions roughly every minute in
the absence of other activity.  An error is then indicated if no response
can be elicited on an otherwise idle connection for an extended period
(e.g. 5 minutes).  A SIGPIPE signal is raised if a process sends on a
broken stream; this causes naive processes, which do not handle the
signal, to exit.</pre>
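<p>The broken-stream case is simple to reproduce. The Python interpreter ignores <code>SIGPIPE</code> at startup, so instead of the process being killed, the failed send surfaces as a <code>BrokenPipeError</code> (<code>EPIPE</code>). A small sketch, with a made-up function name, assuming a Unix-like system:</p>

```python
import socket


def send_on_broken_stream():
    """Send on a stream whose peer has closed; return what happened."""
    left, right = socket.socketpair()     # connected pair of stream sockets
    right.close()                         # the peer goes away
    try:
        left.sendall(b"hello")
    except BrokenPipeError:
        return "BrokenPipeError"
    finally:
        left.close()
    return "no error"


if __name__ == "__main__":
    print(send_on_broken_stream())
```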

<p>Have Fun   ;)</p>




<p>For reference, the <code>SOCKET_*</code> error constants with their Linux errno values:</p>
<pre>define (&#039;SOCKET_EPERM&#039;, 1);
define (&#039;SOCKET_ENOENT&#039;, 2);
define (&#039;SOCKET_EINTR&#039;, 4);
define (&#039;SOCKET_EIO&#039;, 5);
define (&#039;SOCKET_ENXIO&#039;, 6);
define (&#039;SOCKET_E2BIG&#039;, 7);
define (&#039;SOCKET_EBADF&#039;, 9);
define (&#039;SOCKET_EAGAIN&#039;, 11);
define (&#039;SOCKET_ENOMEM&#039;, 12);
define (&#039;SOCKET_EACCES&#039;, 13);
define (&#039;SOCKET_EFAULT&#039;, 14);
define (&#039;SOCKET_ENOTBLK&#039;, 15);
define (&#039;SOCKET_EBUSY&#039;, 16);
define (&#039;SOCKET_EEXIST&#039;, 17);
define (&#039;SOCKET_EXDEV&#039;, 18);
define (&#039;SOCKET_ENODEV&#039;, 19);
define (&#039;SOCKET_ENOTDIR&#039;, 20);
define (&#039;SOCKET_EISDIR&#039;, 21);
define (&#039;SOCKET_EINVAL&#039;, 22);
define (&#039;SOCKET_ENFILE&#039;, 23);
define (&#039;SOCKET_EMFILE&#039;, 24);
define (&#039;SOCKET_ENOTTY&#039;, 25);
define (&#039;SOCKET_ENOSPC&#039;, 28);
define (&#039;SOCKET_ESPIPE&#039;, 29);
define (&#039;SOCKET_EROFS&#039;, 30);
define (&#039;SOCKET_EMLINK&#039;, 31);
define (&#039;SOCKET_EPIPE&#039;, 32);
define (&#039;SOCKET_ENAMETOOLONG&#039;, 36);
define (&#039;SOCKET_ENOLCK&#039;, 37);
define (&#039;SOCKET_ENOSYS&#039;, 38);
define (&#039;SOCKET_ENOTEMPTY&#039;, 39);
define (&#039;SOCKET_ELOOP&#039;, 40);
define (&#039;SOCKET_EWOULDBLOCK&#039;, 11);
define (&#039;SOCKET_ENOMSG&#039;, 42);
define (&#039;SOCKET_EIDRM&#039;, 43);
define (&#039;SOCKET_ECHRNG&#039;, 44);
define (&#039;SOCKET_EL2NSYNC&#039;, 45);
define (&#039;SOCKET_EL3HLT&#039;, 46);
define (&#039;SOCKET_EL3RST&#039;, 47);
define (&#039;SOCKET_ELNRNG&#039;, 48);
define (&#039;SOCKET_EUNATCH&#039;, 49);
define (&#039;SOCKET_ENOCSI&#039;, 50);
define (&#039;SOCKET_EL2HLT&#039;, 51);
define (&#039;SOCKET_EBADE&#039;, 52);
define (&#039;SOCKET_EBADR&#039;, 53);
define (&#039;SOCKET_EXFULL&#039;, 54);
define (&#039;SOCKET_ENOANO&#039;, 55);
define (&#039;SOCKET_EBADRQC&#039;, 56);
define (&#039;SOCKET_EBADSLT&#039;, 57);
define (&#039;SOCKET_ENOSTR&#039;, 60);
define (&#039;SOCKET_ENODATA&#039;, 61);
define (&#039;SOCKET_ETIME&#039;, 62);
define (&#039;SOCKET_ENOSR&#039;, 63);
define (&#039;SOCKET_ENONET&#039;, 64);
define (&#039;SOCKET_EREMOTE&#039;, 66);
define (&#039;SOCKET_ENOLINK&#039;, 67);
define (&#039;SOCKET_EADV&#039;, 68);
define (&#039;SOCKET_ESRMNT&#039;, 69);
define (&#039;SOCKET_ECOMM&#039;, 70);
define (&#039;SOCKET_EPROTO&#039;, 71);
define (&#039;SOCKET_EMULTIHOP&#039;, 72);
define (&#039;SOCKET_EBADMSG&#039;, 74);
define (&#039;SOCKET_ENOTUNIQ&#039;, 76);
define (&#039;SOCKET_EBADFD&#039;, 77);
define (&#039;SOCKET_EREMCHG&#039;, 78);
define (&#039;SOCKET_ERESTART&#039;, 85);
define (&#039;SOCKET_ESTRPIPE&#039;, 86);
define (&#039;SOCKET_EUSERS&#039;, 87);
define (&#039;SOCKET_ENOTSOCK&#039;, 88);
define (&#039;SOCKET_EDESTADDRREQ&#039;, 89);
define (&#039;SOCKET_EMSGSIZE&#039;, 90);
define (&#039;SOCKET_EPROTOTYPE&#039;, 91);
define (&#039;SOCKET_ENOPROTOOPT&#039;, 92);
define (&#039;SOCKET_EPROTONOSUPPORT&#039;, 93);
define (&#039;SOCKET_ESOCKTNOSUPPORT&#039;, 94);
define (&#039;SOCKET_EOPNOTSUPP&#039;, 95);
define (&#039;SOCKET_EPFNOSUPPORT&#039;, 96);
define (&#039;SOCKET_EAFNOSUPPORT&#039;, 97);
define (&#039;SOCKET_EADDRINUSE&#039;, 98);
define (&#039;SOCKET_EADDRNOTAVAIL&#039;, 99);
define (&#039;SOCKET_ENETDOWN&#039;, 100);
define (&#039;SOCKET_ENETUNREACH&#039;, 101);
define (&#039;SOCKET_ENETRESET&#039;, 102);
define (&#039;SOCKET_ECONNABORTED&#039;, 103);
define (&#039;SOCKET_ECONNRESET&#039;, 104);
define (&#039;SOCKET_ENOBUFS&#039;, 105);
define (&#039;SOCKET_EISCONN&#039;, 106);
define (&#039;SOCKET_ENOTCONN&#039;, 107);
define (&#039;SOCKET_ESHUTDOWN&#039;, 108);
define (&#039;SOCKET_ETOOMANYREFS&#039;, 109);
define (&#039;SOCKET_ETIMEDOUT&#039;, 110);
define (&#039;SOCKET_ECONNREFUSED&#039;, 111);
define (&#039;SOCKET_EHOSTDOWN&#039;, 112);
define (&#039;SOCKET_EHOSTUNREACH&#039;, 113);
define (&#039;SOCKET_EALREADY&#039;, 114);
define (&#039;SOCKET_EINPROGRESS&#039;, 115);
define (&#039;SOCKET_EISNAM&#039;, 120);
define (&#039;SOCKET_EREMOTEIO&#039;, 121);
define (&#039;SOCKET_EDQUOT&#039;, 122);
define (&#039;SOCKET_ENOMEDIUM&#039;, 123);
define (&#039;SOCKET_EMEDIUMTYPE&#039;, 124);</pre>
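<p>These numbers are platform specific (the values listed are the Linux ones), so it is safer to look them up than to hard-code them. As a rough sketch, Python's standard <code>errno</code> module and <code>os.strerror()</code> give the symbolic name and message for a code on whatever system the script runs on. The helper name <code>describe_errno</code> is just for this example.</p>

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an errno value on this platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    for code in (errno.ECONNREFUSED, errno.ETIMEDOUT, errno.EINPROGRESS):
        print(code, *describe_errno(code))
```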


<ul>
<li><a href="http://www.w3.org/Protocols/rfc2616/rfc2616.html">Hypertext Transfer Protocol — HTTP/1.1</a>, RFC 2616.  R. Fielding <em>et al.</em></li>
<li><a href="http://www.w3.org/Talks/9608HTTP/index.htm">Hypertext Transport Protocol HTTP/1.1</a>.  J. Gettys. (slides)</li>
<li><a href="http://www.usenix.org/publications/library/proceedings/usenix99/invited_talks/mogul.pdf">What's wrong with HTTP (and why it doesn't matter)</a>. J. C. Mogul. (PDF slides)</li>
<li><a href="http://www.w3.org/Protocols/HTTP/Performance/Pipeline.html">Network Performance Effects of HTTP/1.1, CSS1, and PNG</a>. H. F. Nielsen, J. Gettys <em>et al.</em></li>
<li>Mozilla's <a href="http://www.mozilla.org/projects/netlib/http/pipelining-faq.html">HTTP/1.1 Pipelining FAQ</a>. D. Fisher.</li>
<li>Wikipedia: <a href="http://en.wikipedia.org/wiki/HTTP_proxy">HTTP proxy</a>.</li>
</ul><p><a href="http://www.askapache.com/php/fsockopen-socket.html"></a><a href="http://www.askapache.com/php/fsockopen-socket.html">Fsockopen Power Plays</a> originally appeared on <cite>AskApache.com</cite> </p>]]></content:encoded>
			<wfw:commentRss>http://www.askapache.com/php/fsockopen-socket.html/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Blocking Bad Bots and Scrapers with .htaccess</title>
		<link>http://www.askapache.com/htaccess/blocking-bad-bots-and-scrapers-with-htaccess.html</link>
		<comments>http://www.askapache.com/htaccess/blocking-bad-bots-and-scrapers-with-htaccess.html#comments</comments>
		<pubDate>Tue, 08 Apr 2008 15:04:11 +0000</pubDate>
		<dc:creator>AskApache</dc:creator>
				<category><![CDATA[Htaccess]]></category>

		<guid isPermaLink="false">http://www.askapache.com/?p=549</guid>
		<description><![CDATA[<p><a class="IFL" href='http://www.askapache.com/htaccess/blocking-bad-bots-and-scrapers-with-htaccess.html'><img src="http://uploads.askapache.com/2008/04/bad_robot1.png" alt="Block Bad Robot" title="Block Bad Robot" width="112" height="150" /></a>Want to block a bad robot or web scraper using .htaccess files?  Here are 2 methods that illustrate blocking 436 various user-agents.<br class="C" /></p>]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.askapache.com/htaccess/blocking-bad-bots-and-scrapers-with-htaccess.html"></a><a href="http://www.askapache.com/htaccess/blocking-bad-bots-and-scrapers-with-htaccess.html"><cite>AskApache.com</cite></a></p><p><a rel="lb" class="IFL" href='http://uploads.askapache.com/2008/04/bad_robot.png'><img src="http://uploads.askapache.com/2008/04/bad_robot.png" alt="Bad Robot!" title="Bad Robot!" /></a>This article shows 2 methods of blocking <a href="#badbotlist">this entire list of bad robots and web scrapers</a> with .htaccess files using <a href="#setenvif" title="jump to blocking bots with setenvif guide">SetEnvIfNoCase</a> or using <a href="#botblockrewrites" title="jump to blocking bots with mod_rewrite guide">RewriteRules with mod_rewrite</a><br class="C" /></p>






<h2><a id="botblockrewrites">Blocking Bad Robots and Web Scrapers with RewriteRules</a></h2>
<pre>ErrorDocument 403 /403.html
&nbsp;
RewriteEngine On
RewriteBase /
&nbsp;
# IF THE UA STARTS WITH THESE
RewriteCond %{HTTP_USER_AGENT} ^(aesop_com_spiderman|alexibot|backweb|bandit|batchftp|bigfoot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(black.?hole|blackwidow|blowfish|botalot|buddy|builtbottough|bullseye) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(cheesebot|cherrypicker|chinaclaw|collector|copier|copyrightcheck) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(cosmos|crescent|curl|custo|da|diibot|disco|dittospyder|dragonfly) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(drip|easydl|ebingbong|ecatch|eirgrabber|emailcollector|emailsiphon) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(emailwolf|erocrawler|exabot|eyenetie|filehound|flashget|flunky) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(frontpage|getright|getweb|go.?zilla|go-ahead-got-it|gotit|grabnet) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(grafula|harvest|hloader|hmview|httplib|httrack|humanlinks|ilsebot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(infonavirobot|infotekies|intelliseek|interget|iria|jennybot|jetcar) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(joc|justview|jyxobot|kenjin|keyword|larbin|leechftp|lexibot|lftp|libweb) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(likse|linkscan|linkwalker|lnspiderguy|lwp|magnet|mag-net|markwatch) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(mata.?hari|memo|microsoft.?url|midown.?tool|miixpc|mirror|missigua) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(mister.?pix|moget|mozilla.?newt|nameprotect|navroad|backdoorbot|nearsite) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(net.?vampire|netants|netcraft|netmechanic|netspider|nextgensearchbot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(attach|nicerspro|nimblecrawler|npbot|octopus|offline.?explorer) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(offline.?navigator|openfind|outfoxbot|pagegrabber|papa|pavuk) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(pcbrowser|php.?version.?tracker|pockey|propowerbot|prowebwalker) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(psbot|pump|queryn|recorder|realdownload|reaper|reget|true_robot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(repomonkey|rma|internetseer|sitesnagger|siphon|slysearch|smartdownload) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(snake|snapbot|snoopy|sogou|spacebison|spankbot|spanner|sqworm|superbot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(superhttp|surfbot|asterias|suzuran|szukacz|takeout|teleport) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(telesoft|the.?intraformant|thenomad|tighttwatbot|titan|urldispatcher) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(turingos|turnitinbot|urly.?warning|vacuum|vci|voideye|whacker) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(libwww-perl|widow|wisenutbot|wwwoffle|xaldon|xenu|zeus|zyborg|anonymouse) [NC,OR]
&nbsp;
# STARTS WITH WEB
RewriteCond %{HTTP_USER_AGENT} ^web(zip|emaile|enhancer|fetch|go.?is|auto|bandit|clip|copier|master|reaper|sauger|site.?quester|whack) [NC,OR]
&nbsp;
# ANYWHERE IN UA -- GREEDY REGEX
RewriteCond %{HTTP_USER_AGENT} ^.*(craftbot|download|extract|stripper|sucker|ninja|clshttp|webspider|leacher|collector|grabber|webpictures).*$ [NC]
&nbsp;
# ISSUE 403 / SERVE ERRORDOCUMENT
RewriteRule . - [F,L]</pre>
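<p>Before putting rules like these on a live site, it helps to check which user-agents they would actually catch. Here is a rough Python sketch of the three kinds of conditions above (starts with, starts with "web", anywhere in the UA). The pattern lists are short subsets of the real ones, the function name is invented, and Python's <code>re</code> is only standing in for Apache's regex engine, which behaves the same for simple patterns like these.</p>

```python
import re

# Abbreviated subsets of the lists above, for illustration only.
STARTS_WITH = re.compile(r"^(curl|httrack|libwww-perl|zeus)", re.IGNORECASE)
STARTS_WITH_WEB = re.compile(r"^web(zip|copier|reaper|whack)", re.IGNORECASE)
ANYWHERE = re.compile(r"(download|extract|grabber|webspider)", re.IGNORECASE)


def is_blocked(user_agent):
    """True if any of the three OR-ed conditions matches, as with [NC,OR]."""
    patterns = (STARTS_WITH, STARTS_WITH_WEB, ANYWHERE)
    return any(p.search(user_agent) for p in patterns)


if __name__ == "__main__":
    for ua in ("curl/7.19.7",
               "WebZIP/3.5",
               "Mozilla/4.0 (compatible; Mass Downloader)",
               "Mozilla/5.0 (X11; Linux i686) Firefox/3.6"):
        print(is_blocked(ua), ua)
```

<p>Note how the anchored patterns only fire when the name is at the very start of the string, while the unanchored list matches anywhere, so keep short or common words out of the unanchored list.</p>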



<h3>Alternate RewriteCond Rules</h3>
<pre>RewriteEngine on
&nbsp;
#Block spambots
RewriteCond %{HTTP:User-Agent} (?:Alexibot|Art-Online|asterias|BackDoorbot|Black.Hole|\
BlackWidow|BlowFish|botALot|BuiltbotTough|Bullseye|BunnySlippers|Cegbfeieh|Cheesebot|\
CherryPicker|ChinaClaw|CopyRightCheck|cosmos|Crescent|Custo|DISCo|DittoSpyder|DownloadsDemon|\
eCatch|EirGrabber|EmailCollector|EmailSiphon|EmailWolf|EroCrawler|Express\sWebPictures|ExtractorPro|\
EyeNetIE|FlashGet|Foobot|FrontPage|GetRight|GetWeb!|Go-Ahead-Got-It|Go!Zilla|GrabNet|Grafula|\
Harvest|hloader|HMView|httplib|HTTrack|humanlinks|ImagesStripper|ImagesSucker|IndysLibrary|\
InfonaviRobot|InterGET|Internet\sNinja|Jennybot|JetCar|JOC\sWeb\sSpider|Kenjin.Spider|Keyword.Density|\
larbin|LeechFTP|Lexibot|libWeb/clsHTTP|LinkextractorPro|LinkScan/8.1a.Unix|LinkWalker|lwp-trivial|\
Mass\sDownloader|Mata.Hari|Microsoft.URL|MIDown\stool|MIIxpc|Mister.PiX|Mister\sPiX|moget|\
Mozilla/3.Mozilla/2.01|Mozilla.*NEWT|Navroad|NearSite|NetAnts|NetMechanic|NetSpider|Net\sVampire|\
NetZIP|NICErsPRO|NPbot|Octopus|Offline.Explorer|Offline\sExplorer|Offline\sNavigator|Openfind|\
PageGrabber|Papa\sFoto|pavuk|pcBrowser|Program\sShareware\s1|ProPowerbot/2.14|ProWebWalker|\
psbot/0.1|QueryN.Metasearch|ReGet|RepoMonkey|RMA|SiteSnagger|SlySearch|SmartDownload|Spankbot|spanner|\
Superbot|SuperHTTP|Surfbot|suzuran|Szukacz/1.4|tAkeOut|Teleport|Teleport\sPro|Telesoft|The.Intraformant|\
TheNomad|TightTwatbot|Titan|toCrawl/UrlDispatcher|True_Robot|turingos|\
Turnitinbot/1.5|URLy.Warning|VCI|VoidEYE|WebAuto|WebBandit|WebCopier|WebEMailExtrac.*|WebEnhancer|\
WebFetch|WebGo\sIS|Web.Image.Collector|Web\sImage\sCollector|WebLeacher|WebmasterWorldForumbot|\
WebReaper|WebSauger|Website\seXtractor|Website.Quester|Website\sQuester|Webster.Pro|WebStripper|\
Web\sSucker|WebWhacker|WebZip|Wget|Widow|[Ww]eb[Bb]andit|WWW-Collector-E|WWWOFFLE|\
Xaldon\sWebSpider|Xenu&#039;s|Zeus) [NC]
RewriteRule .? - [F]</pre>



<h2><a id="setenvif">Block Bad Bots with SetEnvIfNoCase</a></h2>
<pre>ErrorDocument 403 /403.html
&nbsp;
# IF THE UA STARTS WITH THESE
SetEnvIfNoCase ^User-Agent$ ^(aesop_com_spiderman|alexibot|backweb|bandit|batchftp|bigfoot) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ ^(black.?hole|blackwidow|blowfish|botalot|buddy|builtbottough|bullseye) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ ^(cheesebot|cherrypicker|chinaclaw|collector|copier|copyrightcheck) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ ^(cosmos|crescent|curl|custo|da|diibot|disco|dittospyder|dragonfly) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ ^(drip|easydl|ebingbong|ecatch|eirgrabber|emailcollector|emailsiphon) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ ^(emailwolf|erocrawler|exabot|eyenetie|filehound|flashget|flunky) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ ^(frontpage|getright|getweb|go.?zilla|go-ahead-got-it|gotit|grabnet) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ ^(grafula|harvest|hloader|hmview|httplib|httrack|humanlinks|ilsebot) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ ^(infonavirobot|infotekies|intelliseek|interget|iria|jennybot|jetcar) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ ^(joc|justview|jyxobot|kenjin|keyword|larbin|leechftp|lexibot|lftp|libweb) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ ^(likse|linkscan|linkwalker|lnspiderguy|lwp|magnet|mag-net|markwatch) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ ^(mata.?hari|memo|microsoft.?url|midown.?tool|miixpc|mirror|missigua) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ ^(mister.?pix|moget|mozilla.?newt|nameprotect|navroad|backdoorbot|nearsite) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ ^(net.?vampire|netants|netcraft|netmechanic|netspider|nextgensearchbot) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ ^(attach|nicerspro|nimblecrawler|npbot|octopus|offline.?explorer) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ ^(offline.?navigator|openfind|outfoxbot|pagegrabber|papa|pavuk) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ ^(pcbrowser|php.?version.?tracker|pockey|propowerbot|prowebwalker) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ ^(psbot|pump|queryn|recorder|realdownload|reaper|reget|true_robot) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ ^(repomonkey|rma|internetseer|sitesnagger|siphon|slysearch|smartdownload) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ ^(snake|snapbot|snoopy|sogou|spacebison|spankbot|spanner|sqworm|superbot) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ ^(superhttp|surfbot|asterias|suzuran|szukacz|takeout|teleport) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ ^(telesoft|the.?intraformant|thenomad|tighttwatbot|titan|urldispatcher) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ ^(turingos|turnitinbot|urly.?warning|vacuum|vci|voideye|whacker) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ ^(libwww-perl|widow|wisenutbot|wwwoffle|xaldon|xenu|zeus|zyborg|anonymouse) HTTP_SAFE_BADBOT
&nbsp;
# STARTS WITH WEB
SetEnvIfNoCase ^User-Agent$ ^web(zip|emaile|enhancer|fetch|go.?is|auto|bandit|clip|copier|master|reaper|sauger|site.?quester|whack) HTTP_SAFE_BADBOT
&nbsp;
# ANYWHERE IN UA
SetEnvIfNoCase ^User-Agent$ (craftbot|download|extract|stripper|sucker|ninja|clshttp|webspider|leacher|collector|grabber|webpictures) HTTP_SAFE_BADBOT
&nbsp;
# ISSUE 403 / SERVE ERRORDOCUMENT
Deny from env=HTTP_SAFE_BADBOT</pre>







<h2><a id="badbotlist">Original Bad Bot / Web Scraper List</a></h2>
<ol>
<li><code>WebBandit</code></li><li><code>2icommerce</code></li><li><code>Accoona</code></li><li><code>ActiveTouristBot</code></li><li><code>adressendeutschland</code></li><li><code>aipbot</code></li><li><code>Alexibot</code></li><li><code>Alligator</code></li><li><code>AllSubmitter</code></li><li><code>almaden</code></li><li><code>anarchie</code></li><li><code>Anonymous</code></li><li><code>Apexoo</code></li><li><code>Aqua_Products</code></li><li><code>asterias</code></li><li><code>ASSORT</code></li><li><code>ATHENS</code></li><li><code>AtHome</code></li><li><code>Atomz</code></li><li><code>attache</code></li><li><code>autoemailspider</code></li><li><code>autohttp</code></li><li><code>b2w</code></li><li><code>bew</code></li><li><code>BackDoorBot</code></li><li><code>Badass</code></li><li><code>Baiduspider</code></li><li><code>Baiduspider+</code></li><li><code>BecomeBot</code></li><li><code>berts</code></li><li><code>Bitacle</code></li><li><code>Biz360</code></li><li><code>Black.Hole</code></li><li><code>BlackWidow</code></li><li><code>bladder fusion</code></li><li><code>Blog Checker</code></li><li><code>BlogPeople</code></li><li><code>Blogshares Spiders</code></li><li><code>Bloodhound</code></li><li><code>BlowFish</code></li><li><code>Board Bot</code></li><li><code>Bookmark search tool</code></li><li><code>BotALot</code></li><li><code>BotRightHere</code></li><li><code>Bot mailto:craftbot@yahoo.com</code></li><li><code>Bropwers</code></li><li><code>Browsezilla</code></li><li><code>BuiltBotTough</code></li><li><code>Bullseye</code></li><li><code>BunnySlippers</code></li><li><code>Cegbfeieh</code></li><li><code>CFNetwork</code></li><li><code>CheeseBot</code></li><li><code>CherryPicker</code></li><li><code>Crescent</code></li><li><code>charlotte/</code></li>
<li><code>ChinaClaw</code></li><li><code>Convera</code></li><li><code>Copernic</code></li><li><code>CopyRightCheck</code></li><li><code>cosmos</code></li><li><code>Crescent</code></li><li><code>c-spider</code></li><li><code>curl</code></li><li><code>Custo</code></li><li><code>Cyberz</code></li><li><code>DataCha0s</code></li><li><code>Daum</code></li><li><code>Deweb</code></li><li><code>Digger</code></li><li><code>Digimarc</code></li><li><code>digout4uagent</code></li><li><code>DIIbot</code></li><li><code>DISCo</code></li><li><code>DittoSpyder</code></li><li><code>DnloadMage</code></li><li><code>Download</code></li><li><code>dragonfly</code></li><li><code>DreamPassport</code></li><li><code>DSurf</code></li><li><code>DTS Agent</code></li><li><code>dumbot</code></li><li><code>DynaWeb</code></li><li><code>e-collector</code></li><li><code>EasyDL</code></li><li><code>EBrowse</code></li><li><code>eCatch</code></li><li><code>ecollector</code></li><li><code>edgeio</code></li><li><code>efp@gmx.net</code></li><li><code>EirGrabber</code></li><li><code>Email Extractor</code></li><li><code>EmailCollector</code></li><li><code>EmailSiphon</code></li><li><code>EmailWolf</code></li><li><code>EmeraldShield</code></li><li><code>Enterprise_Search</code></li><li><code>EroCrawler</code></li><li><code>ESurf</code></li><li><code>Eval</code></li><li><code>Everest-Vulcan</code></li><li><code>Exabot</code></li><li><code>Express</code></li><li><code>Extractor</code></li><li><code>ExtractorPro</code></li><li><code>EyeNetIE</code></li><li><code>FairAd</code></li><li><code>fastlwspider</code></li><li><code>fetch</code></li><li><code>FEZhead</code></li><li><code>FileHound</code></li><li><code>findlinks</code></li><li><code>Flaming AttackBot</code></li><li><code>FlashGet</code></li><li><code>FlickBot</code></li><li><code>Foobot</code></li><li><code>Forex</code></li><li><code>Franklin 
Locator</code></li><li><code>FreshDownload</code></li><li><code>FrontPage</code></li><li><code>FSurf</code></li><li><code>Gaisbot</code></li><li><code>Gamespy_Arcade</code></li><li><code>genieBot</code></li><li><code>GetBot</code></li><li><code>Getleft</code></li><li><code>GetRight</code></li><li><code>GetWeb!</code></li>
<li><code>Go!Zilla</code></li><li><code>Go-Ahead-Got-It</code></li><li><code>GOFORITBOT</code></li><li><code>GrabNet</code></li><li><code>Grafula</code></li><li><code>grub</code></li><li><code>Harvest</code></li><li><code>Hatena Antenna</code></li><li><code>heritrix</code></li><li><code>HLoader</code></li><li><code>HMView</code></li><li><code>holmes</code></li><li><code>HooWWWer</code></li><li><code>HouxouCrawler</code></li><li><code>HTTPGet</code></li><li><code>httplib</code></li><li><code>HTTPRetriever</code></li><li><code>HTTrack</code></li><li><code>humanlinks</code></li><li><code>IBM_Planetwide</code></li><li><code>iCCrawler</code></li><li><code>ichiro</code></li><li><code>iGetter</code></li><li><code>Image Stripper</code></li><li><code>Image Sucker</code></li><li><code>imagefetch</code></li><li><code>imds_monitor</code></li><li><code>IncyWincy</code></li><li><code>Industry Program</code></li><li><code>Indy</code></li><li><code>InetURL</code></li><li><code>InfoNaviRobot</code></li><li><code>InstallShield DigitalWizard</code></li><li><code>InterGET</code></li><li><code>IRLbot</code></li><li><code>Iron33</code></li><li><code>ISSpider</code></li><li><code>IUPUI Research Bot</code></li><li><code>Jakarta</code></li><li><code>java/</code></li><li><code>JBH Agent</code></li><li><code>JennyBot</code></li><li><code>JetCar</code></li><li><code>jeteye</code></li>
<li><code>jeteyebot</code></li><li><code>JoBo</code></li><li><code>JOC Web Spider</code></li><li><code>Kapere</code></li><li><code>Kenjin</code></li><li><code>Keyword Density</code></li><li><code>KRetrieve</code></li><li><code>ksoap</code></li><li><code>KWebGet</code></li><li><code>LapozzBot</code></li><li><code>larbin</code></li><li><code>leech</code></li><li><code>LeechFTP</code></li><li><code>LeechGet</code></li><li><code>leipzig.de</code></li><li><code>LexiBot</code></li><li><code>libWeb</code></li><li><code>libwww-FM</code></li><li><code>libwww-perl</code></li><li><code>LightningDownload</code></li><li><code>LinkextractorPro</code></li><li><code>Linkie</code></li><li><code>LinkScan</code></li><li><code>linktiger</code></li><li><code>LinkWalker</code></li><li><code>lmcrawler</code></li><li><code>LNSpiderguy</code></li><li><code>LocalcomBot</code></li><li><code>looksmart</code></li><li><code>LWP</code></li><li><code>Mac Finder</code></li><li><code>Mail Sweeper</code></li><li><code>mark.blonin</code></li><li><code>MaSagool</code></li><li><code>Mass</code></li><li><code>Mata Hari</code></li><li><code>MCspider</code></li>
<li><code>MetaProducts Download Express</code></li><li><code>Microsoft Data Access</code></li><li><code>Microsoft URL Control</code></li><li><code>MIDown</code></li><li><code>MIIxpc</code></li><li><code>Mirror</code></li><li><code>Missauga</code></li><li><code>Missouri College Browse</code></li><li><code>Mister</code></li><li><code>Monster</code></li><li><code>mkdb</code></li><li><code>moget</code></li><li><code>Moreoverbot</code></li><li><code>mothra/netscan</code></li><li><code>MovableType</code></li><li><code>Mozi!</code></li><li><code>Mozilla/22</code></li><li><code>Mozilla/3.0 (compatible)</code></li><li><code>Mozilla/5.0 (compatible; MSIE 5.0)</code></li><li><code>MSIE_6.0</code></li><li><code>MSIECrawler</code></li><li><code>MSProxy</code></li><li><code>MVAClient</code></li><li><code>MyFamilyBot</code></li><li><code>MyGetRight</code></li><li><code>nameprotect</code></li><li><code>NASA Search</code></li><li><code>Naver</code></li><li><code>Navroad</code></li><li><code>NearSite</code></li><li><code>NetAnts</code></li><li><code>netattache</code></li><li><code>NetCarta</code></li><li><code>NetMechanic</code></li><li><code>NetResearchServer</code></li><li><code>NetSpider</code></li><li><code>NetZIP</code></li><li><code>Net Vampire</code></li><li><code>NEWT ActiveX</code></li><li><code>Nextopia</code></li><li><code>NICErsPRO</code></li><li><code>ninja</code></li><li><code>NimbleCrawler</code></li><li><code>noxtrumbot</code></li><li><code>NPBot</code></li><li><code>Octopus</code></li><li><code>Offline</code></li><li><code>OK Mozilla</code></li>
<li><code>OmniExplorer</code></li><li><code>OpaL</code></li><li><code>Openbot</code></li><li><code>Openfind</code></li><li><code>OpenTextSiteCrawler</code></li><li><code>Oracle Ultra Search</code></li><li><code>OutfoxBot</code></li><li><code>P3P</code></li><li><code>PackRat</code></li><li><code>PageGrabber</code></li><li><code>PagmIEDownload</code></li><li><code>panscient</code></li><li><code>Papa Foto</code></li><li><code>pavuk</code></li><li><code>pcBrowser</code></li><li><code>perl</code></li><li><code>PerMan</code></li><li><code>PersonaPilot</code></li><li><code>PHP version</code></li><li><code>PlantyNet_WebRobot</code></li><li><code>playstarmusic</code></li><li><code>Plucker</code></li><li><code>Port Huron</code></li><li><code>Program Shareware</code></li><li><code>Progressive Download</code></li><li><code>ProPowerBot</code></li><li><code>prospector</code></li><li><code>ProWebWalker</code></li><li><code>Prozilla</code></li><li><code>psbot</code></li><li><code>psycheclone</code></li><li><code>puf</code></li><li><code>PushSite</code></li><li><code>PussyCat</code></li><li><code>PuxaRapido</code></li><li><code>Python-urllib</code></li><li><code>QuepasaCreep</code></li><li><code>QueryN</code></li><li><code>Radiation</code></li><li><code>RealDownload</code></li><li><code>RedCarpet</code></li><li><code>RedKernel</code></li><li><code>ReGet</code></li><li><code>relevantnoise</code></li><li><code>RepoMonkey</code></li><li><code>RMA</code></li><li><code>Rover</code></li><li><code>Rsync</code></li><li><code>RTG30</code></li><li><code>Rufus</code></li><li><code>SAPO</code></li><li><code>SBIder</code></li><li><code>scooter</code></li><li><code>ScoutAbout</code></li><li><code>script</code></li><li><code>searchpreview</code></li><li><code>searchterms</code></li><li><code>Seekbot</code></li><li><code>Serious</code></li><li><code>Shai</code></li><li><code>shelob</code></li><li><code>Shim-Crawler</code></li><li><code>SickleBot</code></li><li><code>sitecheck</code></li>
<li><code>SiteSnagger</code></li><li><code>Slurpy Verifier</code></li><li><code>SlySearch</code></li><li><code>SmartDownload</code></li><li><code>sna-</code></li><li><code>snagger</code></li><li><code>Snoopy</code></li>
<li><code>sogou</code></li><li><code>sootle</code></li><li><code>So-net</code></li><li><code>SpankBot</code></li><li><code>spanner</code></li><li><code>SpeedDownload</code></li><li><code>Spegla</code></li><li><code>Sphere</code></li><li><code>Sphider</code></li><li><code>SpiderBot</code></li><li><code>sproose</code></li><li><code>SQ Webscanner</code></li><li><code>Sqworm</code></li><li><code>Stamina</code></li><li><code>Stanford</code></li><li><code>studybot</code></li><li><code>SuperBot</code></li><li><code>SuperHTTP</code></li><li><code>Surfbot</code></li><li><code>SurfWalker</code></li><li><code>suzuran</code></li><li><code>Szukacz</code></li><li><code>tAkeOut</code></li><li><code>TALWinHttpClient</code></li><li><code>tarspider</code></li><li><code>Teleport</code></li><li><code>Telesoft</code></li><li><code>Templeton</code></li><li><code>TestBED</code></li><li><code>The Intraformant</code></li><li><code>TheNomad</code></li><li><code>TightTwatBot</code></li><li><code>Titan</code></li><li><code>toCrawl/UrlDispatcher</code></li><li><code>True_Robot</code></li><li><code>turingos</code></li><li><code>TurnitinBot</code></li><li><code>Twisted PageGetter</code></li><li><code>UCmore</code></li><li><code>UdmSearch</code></li><li><code>UMBC</code></li><li><code>UniversalFeedParser</code></li><li><code>URL Control</code></li><li><code>URLGetFile</code></li><li><code>URLy Warning</code></li><li><code>URL_Spider_Pro</code></li><li><code>UtilMind</code></li><li><code>vayala</code></li><li><code>vobsub</code></li><li><code>VCI</code></li><li><code>VoidEYE</code></li><li><code>VoilaBot</code></li><li><code>voyager</code></li><li><code>w3mir</code></li><li><code>Web Image Collector</code></li><li><code>Web Sucker</code></li>
<li><code>Web2WAP</code></li><li><code>WebaltBot</code></li><li><code>WebAuto</code></li><li><code>WebBandit</code></li><li><code>WebCapture</code></li><li><code>webcollage</code></li><li><code>WebCopier</code></li><li><code>WebCopy</code></li><li><code>WebEMailExtrac</code></li><li><code>WebEnhancer</code></li><li><code>WebFetch</code></li><li><code>WebFilter</code></li><li><code>WebFountain</code></li><li><code>WebGo</code></li><li><code>WebLeacher</code></li><li><code>WebMiner</code></li><li><code>WebMirror</code></li><li><code>WebReaper</code></li><li><code>WebSauger</code></li><li><code>WebSnake</code></li><li><code>Website</code></li><li><code>WebStripper</code></li><li><code>WebVac</code></li><li><code>webwalk</code></li><li><code>WebWhacker</code></li><li><code>WebZIP</code></li><li><code>Wells Search</code></li><li><code>WEP Search 00</code></li><li><code>WeRelateBot</code></li><li><code>Wget</code></li><li><code>WhosTalking</code></li><li><code>Widow</code></li><li><code>Wildsoft Surfer</code></li><li><code>WinHttpRequest</code></li><li><code>WinHTTrack</code></li><li><code>WUMPUS</code></li><li><code>WWWOFFLE</code></li><li><code>wwwster</code></li><li><code>WWW-Collector</code></li><li><code>Xaldon</code></li><li><code>Xenu&#039;s</code></li><li><code>Xenus</code></li><li><code>XGET</code></li><li><code>Y!TunnelPro</code></li><li><code>YahooYSMcm</code></li><li><code>YaDirectBot</code></li><li><code>Yeti</code></li><li><code>Zade</code></li><li><code>ZBot</code></li><li><code>zerxbot</code></li><li><code>Zeus</code></li><li><code>ZyBorg</code></li>
</ol><p><a href="http://www.askapache.com/htaccess/blocking-bad-bots-and-scrapers-with-htaccess.html"></a><a href="http://www.askapache.com/htaccess/blocking-bad-bots-and-scrapers-with-htaccess.html">Blocking Bad Bots and Scrapers with .htaccess</a> originally appeared on <cite>AskApache.com</cite> </p>]]></content:encoded>
			<wfw:commentRss>http://www.askapache.com/htaccess/blocking-bad-bots-and-scrapers-with-htaccess.html/feed</wfw:commentRss>
		<slash:comments>24</slash:comments>
		</item>
		<item>
		<title>IP Abuse Detection for DreamHost</title>
		<link>http://www.askapache.com/security/htaccess-ip-blocking.html</link>
		<comments>http://www.askapache.com/security/htaccess-ip-blocking.html#comments</comments>
		<pubDate>Sun, 16 Mar 2008 13:46:44 +0000</pubDate>
		<dc:creator>AskApache</dc:creator>
				<category><![CDATA[Security]]></category>

		<guid isPermaLink="false">http://www.askapache.com/security/htaccess-ip-blocking.html</guid>
		<description><![CDATA[<p><a class="IFR" rel="lb" href='http://www.askapache.com/security/htaccess-ip-blocking.html' title='Block IP Address .htaccess'><img src='http://uploads.askapache.com/2008/01/ip-abuse-3.png' alt='Block IP Address .htaccess' /></a>Scans Apache logs for IP addresses that are probably evil, then generates an .htaccess file to deny them all.</p>]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.askapache.com/security/htaccess-ip-blocking.html"></a><a href="http://www.askapache.com/security/htaccess-ip-blocking.html"><cite>AskApache.com</cite></a></p><p><a class="IFR" rel="lb" href='http://uploads.askapache.com/2008/01/ip-abuse-3.png' title='Block IP Address .htaccess'><img src='http://uploads.askapache.com/2008/01/ip-abuse-3.png' alt='Block IP Address .htaccess' title="ip abuse 3 security" /></a>A lot of people have asked me about the best way to block specific IP addresses that are attacking their servers.  There is no easy answer, but here's what I came up with.<br class="C" /></p>


<h2>Per-domain Error.log and Access.log checks</h2>
<p>This shell script analyzes a specific domain's Apache logs, both error and access, and finds the IP addresses that hit your server the most.  It then checks each of those addresses for a reverse DNS record; if an address has no reverse DNS and also tests positive on one of the three other tests, it's added to a bad-IP list.<br class="C" /></p>
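<p>The counting step can be sketched in a few lines of shell. This is a minimal illustration, not the script's exact pipeline: <code>top_ips</code> is a hypothetical helper name, and it assumes the client IP is the first whitespace-separated field of each access-log line, as in Apache's common and combined log formats.</p>

```shell
# top_ips N: read access-log lines on stdin and print the N busiest client
# addresses as "count ip", busiest first (ties broken by address).
# Assumption: the client IP is the first field of every line.
top_ips() {
  awk '{ print $1 }' |
    sort |
    uniq -c |
    sort -k1,1nr -k2,2 |
    head -n "${1:-50}" |
    awk '{ print $1, $2 }'
}
```

<p>Feeding it a log, for example <code>top_ips 50 &lt; access.log</code>, gives the kind of "count ip" list that the reverse DNS check then works through.</p>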


<h2>Generates .htaccess blocking code</h2>
<p><a class="IFL" rel="lb" href='http://uploads.askapache.com/2008/01/ip-abuse-2.png' title='automatically generated .htaccess file'><img src='http://uploads.askapache.com/2008/01/ip-abuse-2.thumbnail.png' alt='automatically generated .htaccess file' title="ip abuse 2.thumbnail security" /></a>Once the tests have concluded (on my machine it takes maybe 5 minutes), you can view an automatically generated .htaccess file, full of the correct code to block the bad IPs, and copy it into your real .htaccess file.  You can tell them bye-bye with a 403 Forbidden.<br class="C" /></p>
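<p>As a rough sketch of what that generated file amounts to, the function below turns a list of addresses into blocking rules. The name <code>deny_rules</code> is hypothetical and the one-rule-per-address layout is a simplification (the script groups several addresses per line); the header lines are the same <code>Order</code>/<code>Allow</code> directives the script writes.</p>

```shell
# deny_rules: read one IP address per line on stdin and print an .htaccess
# fragment that allows everyone except those addresses.
# Uses the Apache 2.2-style Order/Allow/Deny directives.
deny_rules() {
  printf '%s\n' '## IP-ABUSE-LOOKUP' 'Order Allow,Deny' 'Allow from All'
  while read -r ip; do
    # Skip blank lines; one rule per address keeps entries easy to remove later.
    if [ -n "$ip" ]; then
      printf 'Deny from %s\n' "$ip"
    fi
  done
}
```

<p>Writing one address per <code>Deny from</code> line is only a readability choice; Apache also accepts several addresses on one line.</p>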

<h2>IP Abuse / .htaccess Blocking Script</h2>
<p><img src='http://uploads.askapache.com/2008/01/ip-abuse-1.png' alt='IP Abuse Detection from Apache Logs' title="ip abuse 1 security" /></p>
<pre>#!/bin/bash
# Uses bash features (select, arrays, (( )), read -n), so run it with bash, not plain sh.

# User-contributed script. Not sponsored by DreamHost.
# Script created 2008-01-16 by AskApache (www.askapache.com)

### SHELL OPTIONS
set +o noclobber  # allowed to clobber files
set +o noglob     # globbing on
set +o xtrace     # change to - to enable tracing
set +o verbose    # change to - to enable verbose debugging
set -e            # abort on first error

# directory where log files, reports, and generated .htaccess files will be saved
TMPDI="$HOME/ip_abuse"

function exitt(){
 case $TERM in
  xterm*|vt*|ansi|rxvt|gnome*) echo -e "\033]0;$USER@`hostname`: $HOME\007" ;;
 esac
}

function ok_continue()
{ echo -en "\n\033[?25l\033[30;42m[ Press any key to continue ]\033[0m\n" ;read -n 1 ans;echo -en "\033[?25h"; }

function test_title()
{ echo -en "\n\n\033[0;32m\n&gt;&gt;&gt;"; echo -e "\033[1;37m $1 \033[0m \n"; echo -e "\n\n[ ${1} ]\n" &gt;&gt; $REPORT; }

function error_abuse(){
 clear; title
 case "$2" in
  connlimit|1*) test_title "CONCURRENT CONNECTION TEST"; future=2
   echo -e "Shows IP&#039;s making more than 20 requests concurrently.\n\n"
   cat $TMPDI/$YD/logs/e* |grep &#039;concurrent\ connection\ limit&#039;|awk -F &#039;]&#039; &#039;{print $3}&#039; |
   awk &#039;{print $2}&#039;|sort|uniq -c |sort -nr|sed &#039;s/^ *//&#039;|egrep "[0-9]{2}+\ " &gt; $TMPDI/$YD/logs/out.txt ;;
  access|2*) test_title "TOP 50 IP TEST"; future=3
   echo -e "Displays the top 50 unique IP addresses that access your site."
   echo -e "If they don&#039;t have a reverse DNS maybe they should be blocked.\n\n"
   cat $TMPDI/$YD/logs/a* |awk &#039;{print $1|"sort|uniq -dc|sort -nr"}&#039; |egrep "[0-9]{3}+\ "|
   awk &#039;{print $1,$2}&#039; &gt; $TMPDI/$YD/logs/out.txt ;;
  internal|3*) test_title "INTERNAL RECURSION TEST"; future="report"
   echo -e "Shows the IP&#039;s that triggered an Internal Recursion Error,"
   echo -e "meaning that their is a looping problem on your server.\n"
   cat $TMPDI/$YD/logs/e* |grep &#039;LimitInternalRecursion&#039;|awk -F &#039;]&#039; &#039;{print $3}&#039;|awk &#039;{print $2}&#039; |
  sort|uniq -c|sort -nr|sed &#039;s/^ *//&#039;|egrep "[0-9]{2}+\ " &gt; $TMPDI/$YD/logs/out.txt ;;
 esac

 # Read the results in the current shell (not a pipeline subshell) so that
 # $h and $ips are still set when the loop ends.
 echo -e "\033[?25l"; t=0; h=0; ips=""
 while read a
 do
  if [ $t -lt 50 ]; then
   n=`echo "$a"|awk &#039;{print $1}&#039;`; ip=`echo "$a"|awk &#039;{print $2}&#039;`
   host=`nice -n 19 host -qQ -s 1 "${ip}" 2&gt;&amp;1|tr &#039;\n&#039; &#039;\t&#039;|awk &#039;{print $2}&#039;`
   case "$host" in
    # no reverse DNS: flag the address and queue it for a Deny rule
    # (plain assignments are used because (( t++ )) returns 1 when t is 0, which can trip set -e)
    does|not|PTR|.) echo -en "\033[0;33m"; echo -e "$ip" &gt;&gt; $REPORT
     host="!!!POSSIBLE-ABUSE!!!"; t=$(( t + 1 )); h=$(( h + 1 )); ips=" $ip$ips" ;;
   esac
   echo -en " ${n}\t${ip}\t${host}\033[0m\n"
   # write the queued addresses four at a time
   if [ $h -gt 3 ]; then
    echo -e "Deny from$ips" &gt;&gt; $TMPDI/$YD/.htaccess; h=0; ips=""
   fi
  fi
 done &lt; $TMPDI/$YD/logs/out.txt
 # flush any addresses left over after the last full group
 [ $h -gt 0 ] &amp;&amp; echo -e "Deny from$ips" &gt;&gt; $TMPDI/$YD/.htaccess
 ok_continue; exec bash "$0" "$1" "$future"
}

function menu(){
 PS3="`echo -e &#039;\033[0;36m&#039;`Please Select a Domain To Test: `echo -e &#039;\033[0m&#039;`"; echo -ne "\033[0m"
 select v
 do
  YD="$v"
  mkdir -p -m 0755 $TMPDI/$YD/reports
  mkdir -p -m 0755 $TMPDI/$YD/logs
  echo "" &gt; $TMPDI/$YD/logs/out.txt
  echo "" &gt; $TMPDI/$YD/.htaccess
  echo -e "## IP-ABUSE-LOOKUP\nOrder Allow,Deny\nAllow from All" &gt; $TMPDI/$YD/.htaccess
  ELOG="$TMPDI/$YD/logs/error.log"
  ALOG="$TMPDI/$YD/logs/access.log"
  REPORT=$(echo -en "$TMPDI/$YD/reports/`date +%mx%dx%y`.txt");
  echo "" &gt; $REPORT; echo -e "GENERATED REPORT FOR $YD \n`date`\n" &gt;&gt; $REPORT
  clear; title; cd ~/logs/$YD/http
  if [ ! -f $ELOG ]; then
   test_title "Creating Error log"
   `nice -n 19 gunzip -dc e*.gz | split -b 1m -a 4 -d - $TMPDI/$YD/logs/e.`; wait
   cp error.log $TMPDI/$YD/logs; echo -e "\033[0;31m [ DONE ]\n\n"
  fi
  if [ ! -f $ALOG ]; then
   test_title "Creating access log"
   `nice -n 19 gunzip -dc a*.gz | split -b 5m -a 4 -d - $TMPDI/$YD/logs/a.`; wait
   cp access.log $TMPDI/$YD/logs; echo -e "\033[0;31m [ DONE ]\n\n"
  fi
  cd $HOME; exec bash "$0" "$YD" "1"
 done
}

function show_report(){
 clear; title;
 PS3="`echo -e &#039;\033[0;36m&#039;`Please select a course of action: `echo -e &#039;\033[0m&#039;`"; echo -ne "\033[0m"
 select v in "View Report" "View .htaccess" "Quit"
 do
  case "$v" in
   *Report) clear; title; test_title "VIEWING $REPORT"; cat $REPORT ;;
   *htaccess) clear; title; test_title "VIEWING $TMPDI/$YD/.htaccess"; cat $TMPDI/$YD/.htaccess ;;
   Quit) break;;
  esac
 done
exit 0
}

function title(){
 # pretty sweet!
 echo -e "\033[1;30m __________________________________________________________________________ "
 echo -e "|\033[1;32m                 ___                     __ __         __                 \033[1;30m|"
 echo -e "|\033[1;32m                / _ \_______ ___ ___ _  / // /__  ___ / /_                \033[1;30m|"
 echo -e &#039;|\033[1;32m               / // / __/ -_) _ `/  ` \/ _  / _ \(_-&lt;/ __/                \033[1;30m|&#039;
 echo -e "|\033[0;32m              /____/_/  \__/\_,_/_/_/_/_//_/\___/___/\__/                 \033[1;30m|"
 echo -e "|                                                                          |"
 echo -e "|            \033[1;37mDREAMHOST IP ABUSE DETECTION SCRIPT VERSION 0.1\033[1;30m               |";
 echo -e "\033[1;30m __________________________________________________________________________ \033[0m\n\n"
}

# catch non-kill exit to reset term / ncurses
trap exitt EXIT

# set window title if client is capable
case $TERM in
 xterm*|vt*|ansi|rxvt|gnome*) echo -e "\033]0;DREAMHOST IP ABUSE DETECTION SCRIPT\007" ;;
esac

if [ $# -lt 1 ]; then
 clear; title; [ -d $TMPDI ] || mkdir -m 755 $TMPDI
 cd ~/logs; DOMAINS=( `ls ~/logs/` ); cd $OLDPWD; menu ${DOMAINS[@]}
else
 YD="${1}"
 mkdir -p -m 0755 $TMPDI/$YD/reports; mkdir -p -m 0755 $TMPDI/$YD/logs
 touch $TMPDI/$YD/logs/out.txt;
 ELOG="$TMPDI/$YD/logs/error.log"
 ALOG="$TMPDI/$YD/logs/access.log"
 REPORT=$(echo -en "$TMPDI/$YD/reports/`date +%mx%dx%y`.txt");

 if [ "$2" == "report" ]; then
  show_report "$1" "$2"
 else
  error_abuse "$1" "$2"
 fi
fi

exit 0</pre>
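<p>The reverse DNS check at the heart of the script's loop can also be sketched on its own. This is an illustration under stated assumptions, not the script's code: <code>lacks_ptr</code> and <code>check_ip</code> are hypothetical names, and the match assumes a BIND-style <code>host</code> command, whose successful reverse lookups contain the phrase "domain name pointer".</p>

```shell
# lacks_ptr TEXT: succeed (exit 0) when TEXT, the captured output of a
# reverse lookup, shows no PTR record for the address.
# Assumption: BIND-style `host` output ("... domain name pointer name.").
lacks_ptr() {
  case "$1" in
    *"domain name pointer"*) return 1 ;;  # address has a reverse DNS name
    *) return 0 ;;                        # NXDOMAIN, timeout, or empty output
  esac
}

# check_ip ADDRESS: run the lookup and label the address.
# The lookup itself needs network access and the `host` utility.
check_ip() {
  if lacks_ptr "$(host "$1" 2>&1)"; then
    echo "$1 POSSIBLE-ABUSE"
  else
    echo "$1 ok"
  fi
}
```

<p>Keeping the text check separate from the lookup makes it possible to try the logic against saved <code>host</code> output without touching the network.</p>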

<p>Read more on the DreamHost wiki page: <a href="http://wiki.dreamhost.com/Block_IP_Abuse">Block IP Abuse</a>.</p><p><a href="http://www.askapache.com/security/htaccess-ip-blocking.html"></a><a href="http://www.askapache.com/security/htaccess-ip-blocking.html">IP Abuse Detection for DreamHost</a> originally appeared on <cite>AskApache.com</cite> </p>]]></content:encoded>
			<wfw:commentRss>http://www.askapache.com/security/htaccess-ip-blocking.html/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>

