
Mod_Rewrite Basic Examples

mod_rewrite is very useful in many situations. Yet some behaviors were not so obvious when I started to mess with it. After many tests, I understand it much better now. Having said that, I do not pretend to know it perfectly, and I make mistakes, too.

So, although I tested all of these on my local server and on PowWeb, do not trust what I say blindly. Rather, please use it as a suggestion or a basis for your own experiments.

First things first

You need these lines before any rules:
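RewriteEngine on
RewriteBase /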

You need RewriteEngine on in your .htaccess on some servers. You don't need it on PowWeb because it's already set. You must put this line once, before any RewriteRule.

RewriteBase / is not required in many cases. But specifying it can reduce the risk of endless looping.

Also, certain internal redirect rules and conditions will not work well without it. So, put it in, and forget about adding / at the beginning of the substitution string.

REGEX is a Perl-compatible regular expression.

ex. RewriteRule ^(abc.*)$ xxx$1 [R]
  ^ == beginning of URL
  $ == end of URL
  . (period) == matches any one character
  * == matches zero or more of the previous character.

RewriteRules may have additional conditions:

RewriteCond STRING expression [optional flags]
RewriteRule REGEX substitution [optional flags]

For the details of REGEX, expressions, substitutions, and options, you should read the Apache mod_rewrite documentation:

  1. mod_rewrite documentation
  2. URL rewriting Guide

What happens when a rule modifies the URL (in .htaccess)

When a RewriteRule that changes the URL matches, the modified URL will go through the next round of processing, from the beginning of the rule set, again. This is a very important point.

It's because of the way Apache handles the per-directory context (.htaccess, or a <Directory> tag). It has to do per-directory auth and other processing for the newly generated path. In the per-server or virtual host context (in httpd.conf), this doesn't happen.

In this example, only the QUERY_STRING is modified, so the URL is unchanged. (For this document, URL means the part of the address without http://host.com/ and without the ?QUERY_STRING portion.)
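ex.

# an illustrative rule (a sketch, not the original example):
# the path part stays the same; only the QUERY_STRING is set
RewriteRule ^(abc.*)$ $1?key=value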

Thus, it will not go through a 2nd round of processing (the modification of the QUERY_STRING does not affect this), unless it's a URL for a directory that gets transformed according to the DirectoryIndex directive, or the resulting file path doesn't exist.

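Next is an example where the rules do change the URL. Suppose the request is for /abcSomething; the first rule below is a sketch reconstructed to match the explanation that follows.

ex.

# 1st rule: /abcSomething becomes /xxx/abcSomething
RewriteRule ^(abc.*)$ xxx/$1 [L]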
RewriteRule ^xxx/(.*)$ yyy/$1

As the 1st rule changes the URL, the modified URL /xxx/abcSomething will go through the next round and be checked from the 1st rule again.

So, even though the 1st rule has the [L] (last) flag to indicate that the following rules should be skipped, in the 2nd round the URL will match the next rule and be modified to /yyy/abcSomething.

Then, this URL will go through a 3rd round. But it will not match any rules, and the processing stops there.

How to avoid endless processing

If you make a mistake and processing goes into an endless loop, Apache will stop after a preset number of iterations and issue a 500 (Internal Server Error) with a log entry saying:

mod_rewrite: maximum number of internal redirects reached. ....

If you want to see that, try a rule like the sketch below (on your home machine...).

ex.
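# each round prepends loop/ again, so this never settles
RewriteRule ^(.*)$ loop/$1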

One of the easiest methods to avoid endless looping is to use %{ENV:REDIRECT_STATUS}. It is empty for the original request and gets set once an internal redirect has happened.

Add a loop stopper condition, like the sketch below:

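# assumed condition: REDIRECT_STATUS is empty only on the first round
RewriteCond %{ENV:REDIRECT_STATUS} ^$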
RewriteRule ^(.*)\.html$ xyz/whatever.cgi?$1 [L]

A loop stopper rule placed in front of many following rules (the condition below is again a sketch):

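# assumed condition: matches once any internal redirect has happened
RewriteCond %{ENV:REDIRECT_STATUS} !^$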
RewriteRule ^ - [L]

Often, a RewriteCond on %{REQUEST_URI} is used. The condition shown below is a sketch of this.

ex.

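# assumed condition: skip the rewrite once the URL is already under /xyz/
RewriteCond %{REQUEST_URI} !^/xyz/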
RewriteRule ^(.*)$ xyz/$1

Note 1. With RewriteBase /, we don't need to put / in front of xyz on the right-hand side of the RewriteRule line.

If we put it there, it does no harm in most cases. But it makes an endless loop easier to create, and it will break other rules that do not expect multiple / characters in front of the URL. So, it's better not to put it.

Note 2. REQUEST_URI contains "/" + URL. To check from the beginning of REQUEST_URI, we must use "/", like: ^/something

However, the example code can be written as below, saving one REGEX evaluation: a RewriteRule that does not change the URL, combined with the [L] flag, does the same job as the RewriteCond. (The guard rules in these two examples are assumed reconstructions.)

ex.
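# pass through anything already under xyz/
RewriteRule ^xyz/ - [L]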
RewriteRule ^(.*)$ xyz/$1
ex2.
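# pass through once the URL has already become index.php
RewriteRule ^index\.php$ - [L]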
RewriteRule ^(.*)$ index.php?p=$1
Often, PHP people use way too many RewriteRules to achieve the SEO-friendly-URL hype. By placing a simple rule that excludes any URL containing a dot from further processing (something like RewriteRule \. - [L]), you can save lots of wasteful REGEX processing for normal files, such as .html, .jpg, and .css.

Example of this loop stopping method: Generic .htaccess Method for sub/pointed domains

However, you can't use these tricks in some cases (a.html => b.html, b.html => a.html).

ex.
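# assumed counterpart rule: together with the next rule,
# a.html and b.html chase each other endlessly
RewriteRule a.html b.html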
RewriteRule b.html a.html
One alternative is using %{THE_REQUEST}.

ex.
RewriteCond %{THE_REQUEST} ^(GET|HEAD)\ /b.html
RewriteRule b.html a.html
THE_REQUEST contains the first line of the HTTP request header. It is something like "GET /index.html HTTP/1.1". So, by verifying this variable, we can make sure that the URL b.html is not coming from an internal redirect but from the original request.

Note. To match a string containing a space, just escape the space with "\" as shown in the example above.

We can use %{QUERY_STRING} to check whether it is the first round or a subsequent one in some cases. But this method alone can't handle every case.

# Check whether the key string is at the end of QUERY_STRING,
# and remove that string from QUERY_STRING.
# ($1 is from the RewriteRule line, and %1 is from the RewriteCond line.)
#
# The first rule is needed to stop URLs that end with a slash.
#
RewriteCond %{QUERY_STRING} __XXX__$
RewriteRule /$ - [L]
RewriteCond %{QUERY_STRING} ^(.*)__XXX__$
RewriteRule ^(.*)$ $1?%1 [L]

# Add key string to the QUERY_STRING
RewriteRule ^(.*)$ $1?%{QUERY_STRING}__XXX__

# Following rules will be checked only once,
# as long as QUERY_STRING is unmodified or the key string is kept.

# No modification to the QUERY_STRING
RewriteRule ^(other.*)$ rules/$1

# QUERY_STRING is conserved
RewriteRule ^(more.*)$ rules.cgi?$1%{QUERY_STRING}

# key string is placed, explicitly.
RewriteRule ^(yetmore.*)$ rules.cgi?$1__XXX__

I tried to use [E=ENVVAR:STRING] to distinguish the subsequent rounds, but ENV variables seem to be reset on each round... So, the following example for preventing a 2nd round does not work.


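# assumed condition (mirroring the working example below):
# meant to catch the 2nd round, but DONE never survives into it
RewriteCond %{ENV:DONE} YES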
RewriteRule ^.*$ - [L]

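# sets DONE and stops; the variable is reset before any next round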
RewriteRule ^.*$ - [L,E=DONE:YES]

This trick can be used to check whether a certain rule has matched earlier in the same round, though.
RewriteRule ^pattern1$ substitute1 [E=DONE:YES]
RewriteRule ^pattern2$ substitute2 [E=DONE:YES]
RewriteRule ^pattern3$ substitute3 [E=DONE:YES]

RewriteCond %{ENV:DONE} YES
RewriteRule ^patternX$ substituteX
But you can use [S] (skip) and/or [C] (chain) in most cases instead of this %{ENV:VAR} trick.

Rant

I think mod_rewrite is BADLY designed. It has no definitive way to control looping, and we can't use variables on the right-hand side of a RewriteCond. Also, the fact that ENV variables get reset on each round is stupid...

How to give new ENV variable to cgi

Note. On a server with suExec, most env variables are cleansed by suExec. You should prefix the env var with 'HTTP_' and it will survive! We can also put any information we want to pass to the CGI in the QUERY_STRING.

The [E=VAR:STRING] option can be used to set an ENV variable. But it will not reach the CGI if it is set in a URL-changing rule, or in a round that has a URL-changing rule. Rules like the following should be placed at the beginning of the ruleset, so that the variables are set again without fail in the final round.
# As this header is not usually available to the CGI program,
# it is very useful in DIY authentication with CGI.
RewriteRule ^.*$ - [E=HTTP_AUTH:%{HTTP:Authorization}]

# %{TIME} is available in RewriteRules; this makes it available to the CGI, too.
RewriteRule ^.*$ - [E=HTTP_TIME:%{TIME}]

QUERY_STRING can be used for passing parameters, too.

RewriteRule ^(.*)$ $1?AUTH=%{HTTP:Authorization} [QSA]
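or

# a sketch of doing the same by hand instead of [QSA];
# the separator token is arbitrary
RewriteRule ^(.*)$ $1?AUTH=%{HTTP:Authorization}__SEPARATOR__%{QUERY_STRING}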
__SEPARATOR__ can be anything you want, and you don't need it if QUERY_STRING is empty. You can pass any ENV variable, such as THE_REQUEST or TIME, this way. This can be useful for debugging RewriteRules.

How to make rules more efficient

It is similar to any other programming language: identify the resource-consuming part, and try to minimize the traffic that goes through it.

For RewriteCond, "=" is probably the least time-consuming test of all; -f, -d, -s, and the others cost more, and -U and -F could be the most costly ones. A REGEX can be pretty heavy if the checked string is long and the pattern is complex. Checking against THE_REQUEST can be heavy because it includes both the URL part and the QUERY_STRING, which can be very, very long... By using ^ to do forward matching, less backtracking is needed, so it is more efficient. (On powerful servers, the difference can be invisible...) The RewriteCond lines in the next example are sketches reconstructing this point.
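# unanchored check; "subdir/" can match anywhere,
# including inside the QUERY_STRING (assumed reconstruction)
RewriteCond %{THE_REQUEST} !subdir/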
RewriteRule ^(.*)$ subdir/$1

# This is not efficient... and may not work sometimes,
# because THE_REQUEST may contain "subdir/"
# as a part of the QUERY_STRING, and the pattern also matches
# "/sub/sub-subdir", "/abcdefgsubdir/", and so on.


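# anchored, forward-matching check against the path only
# (assumed reconstruction)
RewriteCond %{REQUEST_URI} !^/subdir/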
RewriteRule ^(.*)$ subdir/$1
# Now, it is more efficient, and there is no room for confusion.

Secret directory

I wrote a separate page for the secret directory trick: secretdir.html (This can be used with CGI Authentication and other tricks.)

Anti-Leech, bandwidth saving, Referer blocking

I understand the desire to do these things. However, it's not really effective, and it often causes more headaches. I do not recommend it unless you know rewriting, its limited effectiveness, and its potential problems well. If you are a user of PowWeb, we have enough bandwidth allowance to cope with the usual "leeching". If you dislike leeching, maybe you can add your URL onto the picture using ImageMagick or the sitebuilder tool soon available from PowWeb!

More about Anti-Leech measures

However, kicking certain robots off is a good practice. Some robots will request well over several hundred items per minute. If your script is hit like this, the server may experience a lot of load. Although there seems to be a built-in safety cutoff mechanism on PowWeb, we can do our part.

## Keep bad robots off.
## Give them a blank page instead of 403. It costs less for the server.
RewriteEngine on
RewriteBase /

RewriteRule ^blank.txt - [L]

RewriteCond %{HTTP_USER_AGENT} (MSIECrawler|Ninja|Microsoft|MSFront|WebCopier|Pockey) [NC]
RewriteRule ^(.*)$  blank.txt [L]
Usually, the last rule is like "RewriteRule ^(.*)$ - [F]", which gives a 403 Forbidden error with an error_log entry. I don't like to see massive numbers of entries in my error_log, because detecting more serious trouble becomes harder among so many garbage entries. So, I decided to send them a nice white blank page without any data instead. This saves a bit of bandwidth, simplifies my error_log, and costs the server a little less because there is no need to make two log entries. And as it is a little more polite to send a blank page with a 200 OK response code than a 403 Forbidden, it may even reduce the risk of attacks by frustrated youth.

Search Engine friendly URL

This seems to be another hot topic among PHP users. I think it's better to parse the URL in PHP rather than trying to do everything with mod_rewrite. The idea is that a URL like http://host.net/aa/bb/cc/dd is nicer than the usual PHP thing, http://host.net/index.php?aa=bb&cc=dd. With mod_rewrite, do something like this:
RewriteRule ^/*([^/]+)/+([^/]+)/+([^/]+)/*(.*)$ index.php?$1=$2&$3=$4 [L]
While a simple example like this works, more complex rules can be tricky. Although the string-parsing power of mod_rewrite is not that bad, it should be much easier to do your own parsing in PHP using $_SERVER['QUERY_STRING'] or $_SERVER['REQUEST_URI'] and other variables. The same thing applies to Perl and other languages. But PHP people seem more eager to do this... somehow. Maybe they don't want to change their scripts, or they don't know what to change and how to change it...

Oh well, here is a very inefficient but flexible version. This one can treat any number of parameters. But I think it is a resource-consuming hog.
RewriteRule ^/*([^/]+)/+([^/]+)(.*)/*$ $3?$1=$2 [L,QSA]
RewriteRule ^([^/]+)$ index.php?$1 [L,QSA]
Trying to double the number of parameters treated in one round:
RewriteRule ^/*([^/]+)/+([^/]+)/+([^/]+)/+([^/]+)(.*)/*$ $5?$1=$2&$3=$4 [L,QSA]
RewriteRule ^([^/]+)$ index.php?$1 [L,QSA]
A little better idea is to use a URL like http://host.net/bb/dd (instead of http://host.net/aa/bb/cc/dd) to obtain the same thing: http://host.net/index.php?aa=bb&cc=dd
RewriteRule ^/*([^/]+)/+([^/]+)/*(.*)$ index.php?aa=$1&cc=$2&$3 [L]
While the URL looks better and is more efficient, this one is not flexible.

Serve dynamic page statically

Generating a page for each and every request is a pure waste of server resources unless there is a good reason. Most of the time, exactly the same page can be served many times. So, it makes sense to implement a "generate once, serve many times" system. I wrote an example of such a system (a very simple one) for someone. Please take a look if you are interested: [cache.html]

ampescape?

I saw a question about RewriteMap recently. As we can't use RewriteMap in .htaccess, I wrote a RewriteRule that escapes & to %26, as ampescape would do.
RewriteRule ^([^&]*)&(.*)$ $1%26$2 [NE,N]
RewriteRule ^([^&]*)$ whatever.php?title=$1 [L]
