Mod_Rewrite Basic Examples

mod_rewrite is very useful in many situations, yet some of its behaviors were not obvious when I started to play with it. After many tests I understand it much better now. Having said that, I do not pretend to know it perfectly, and I also make mistakes.

So, although I tested all of these on my local server and on PowWeb, do not trust what I say blindly. Rather, please use it as a suggestion or a basis for your own experiments.


First things first

You need these lines before any rules.

Some servers need an extra line in .htaccess before rewriting works at all. You don't need it on PowWeb because it's already set.

RewriteEngine on must appear once, before any RewriteRule.

RewriteBase / is not required in many cases, but specifying it can reduce the risk of endless looping. Also, certain internal redirect rules and conditions will not work well without it. So, put it in, and forget about adding / at the beginning of the substitution string.
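
The lines this section refers to are not shown above. A minimal sketch of a typical preamble, assuming a plain Apache setup with the .htaccess file in the document root. Options +FollowSymLinks is my assumption for the line that only some servers need, while RewriteEngine and RewriteBase appear in the examples later in this article:

```apache
# Assumption: needed only on servers where symlink following is not
# already enabled; mod_rewrite does not run in .htaccess without it.
Options +FollowSymLinks

# Switch the rewrite engine on, once, before any RewriteRule.
RewriteEngine on

# Treat substitutions as relative to the site root, so they
# do not need a leading /.
RewriteBase /
```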

REGEX is a Perl-compatible regular expression.

ex. RewriteRule ^(abc.*)$ xxx$1 [R]
  ^ == beginning of URL
  $ == end of URL
  . (period) == matches any one character
  * == matches zero or more of the previous character
  ( ) == captures the matched text, available as $1 in the substitution

RewriteRules may have additional Conditions.

RewriteCond STRING expression [optional flags]
RewriteRule REGEX substitution [optional flags]
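
To show how the two directives fit together, here is a small sketch. The names abc and xxx are just the placeholders from the example above, and the -f test is one of the standard RewriteCond checks.

```apache
RewriteEngine on
RewriteBase /

# The condition belongs to the RewriteRule that follows it.
# Here: go on only if the request does not point at an existing file.
RewriteCond %{REQUEST_FILENAME} !-f
# /abcdef is rewritten internally to /xxxabcdef ($1 holds "abcdef").
RewriteRule ^(abc.*)$ xxx$1 [L]
```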

For the details of REGEX, expressions, substitutions and options, you should read the Apache mod_rewrite documentation.

  1. mod_rewrite documentation
  2. URL rewriting Guide

What happens when the rules modify URL (in .htaccess)

When a RewriteRule that changes the URL is matched, the modified URL goes through the next round of processing, again from the beginning of the rule set. This is a very important point.

It's because of the way Apache handles per-directory context (.htaccess, or a <Directory> section). It has to do per-directory auth and other processing for the newly generated path. In per-server or virtual host context (in httpd.conf), this doesn't happen.

In this example, the URL is unchanged. (For this document, URL is the part of the address without http://host.com/ and the ?QUERY_STRING portion.)

Thus, it will not go through a 2nd round of processing (a modification of QUERY_STRING does not affect this), unless it's a URL for a directory that is transformed according to the DirectoryIndex directive, or the resulting filepath doesn't exist.

RewriteRule ^xxx/(.*)$ yyy/$1

As the 1st rule changes the URL, the modified URL /xxx/abcSomething goes through the next round and is checked again, starting from the 1st rule.

So, even if the 1st rule has the [L] (last) option to make the following rules be skipped, in the 2nd round the URL matches the next rule and is modified to /yyy/abcSomething.

Then this URL goes through a 3rd round. It does not match any rule, so the processing stops there.
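
Only the second rule of this example is shown above. A rule set consistent with the walkthrough could look like the sketch below. The first rule is my reconstruction, an assumption, not the original line.

```apache
RewriteEngine on
RewriteBase /

# Round 1: /abcSomething matches here and becomes /xxx/abcSomething.
# [L] ends round 1, but the changed URL starts round 2 from the top.
RewriteRule ^(abc.*)$ xxx/$1 [L]

# Round 2: /xxx/abcSomething matches here and becomes /yyy/abcSomething.
# Round 3: /yyy/abcSomething matches neither rule, so processing stops.
RewriteRule ^xxx/(.*)$ yyy/$1
```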

How to avoid endless processing

If you make a mistake and the processing goes into an endless loop, Apache will stop after a preset number of rounds and issue a 500 (Internal Server Error) with a log entry saying:

mod_rewrite: maximum number of internal redirects reached. ....

If you want to see that, try this. (on your home machine...)

ex.
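
A rule that loops on purpose could be as simple as this sketch (loop/ is a made-up name). Every round changes the URL again, so the processing never settles:

```apache
RewriteEngine on
RewriteBase /

# /a -> /loop/a -> /loop/loop/a -> ... until Apache gives up with a 500.
RewriteRule ^(.*)$ loop/$1
```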

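One of the easiest methods to avoid endless looping is using %{ENV:REDIRECT_STATUS}. It is empty in the first round and set (normally to 200) after an internal redirect.

Add a loop stopper condition

RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ^(.*)\.html xyz/whatever.cgi?$1 [L]

The loop stopper rule for many following rules

RewriteCond %{ENV:REDIRECT_STATUS} 200
RewriteRule ^ - [L]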

Often a RewriteCond with %{REQUEST_URI} has been used.

ex.

RewriteCond %{REQUEST_URI} !^/xyz/
RewriteRule ^(.*)$ xyz/$1

Note 1. With RewriteBase /, we don't need to put / in front of xyz on the right side of the RewriteRule line.

If we put it, it does no harm in most cases. But it makes an endless loop easier to create, and it will break other rules that do not expect multiple / at the front of the URL. So, it's better not to put it.

Note 2. REQUEST_URI contains "/" + URL. To check from the beginning of REQUEST_URI, we must use "/" like: ^/something

However, the example code can be written like this, saving one REGEX processing.

ex2.

Or, a RewriteRule that does not change the URL, with the [L] option, will do the same.

ex.
RewriteRule ^xyz/ - [L]
RewriteRule ^(.*)$ xyz/$1
ex2.
RewriteRule \. - [L]
RewriteRule ^(.*)$ index.php?p=$1

Often, PHP people use many, many, way too many RewriteRules to achieve the SEO friendly URL hype. By placing a simple rule that excludes any URL with a dot from being processed, you can save lots of wasteful REGEX processing for normal files, such as .html, .jpg, .css.

Example of this loop stopping method: Generic .htaccess Method for sub/pointed domains

However, you can't use these tricks in some cases. (a.html => b.html, b.html => a.html)

ex.
RewriteRule a.html b.html
RewriteRule b.html a.html

One alternative is using %{THE_REQUEST}

ex.
RewriteCond %{THE_REQUEST} ^(GET|HEAD)\ /b\.html
RewriteRule b.html a.html

THE_REQUEST contains the first line of the HTTP request header. It is something like "GET /index.html HTTP/1.1". So, by verifying this variable, we can make sure that the URL of "b.html" is not coming from an internal redirect but from the original request.

Note. To match a string with a space, just escape the space with a backslash ("\ ") as shown in the example above.

We can use %{QUERY_STRING} to check whether it is the first round or a subsequent one in some cases. But this method alone can't treat some cases.

# And remove that string from QUERY_STRING
# ($1 is from RewriteRule line, and %1 is from RewriteCond line.)
#
# The first Rule is needed to stop the URL that ends with a slash.
#
RewriteCond %{QUERY_STRING} __XXX__$
RewriteRule /$ - [L]
RewriteCond %{QUERY_STRING} ^(.*)__XXX__$
RewriteRule ^(.*)$ $1?%1 [L]
 
# Add key string to the QUERY_STRING
RewriteRule ^(.*)$ $1?%{QUERY_STRING}__XXX__
 
# Following rules will be checked only once,
# as long as QUERY_STRING is unmodified or the key string is kept.
 
# No modification to the QUERY_STRING
RewriteRule ^(other.*)$ rules/$1
 
# QUERY_STRING is conserved
RewriteRule ^(more.*)$ rules.cgi?$1%{QUERY_STRING}
 
# key string is placed, explicitly.
RewriteRule ^(yetmore.*)$ rules.cgi?$1__XXX__
 
I tried to use [E=ENVVAR:STRING] to distinguish the subsequent round, but ENV
variables seem to be reset on each round... So, the following example for preventing the 2nd round does not work.
 
RewriteCond %{ENV:DONE} YES
RewriteRule ^.*$ - [L]
 
RewriteRule ^.*$ - [L,E=DONE:YES]
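
As far as I know, Apache does not really drop the variables on an internal redirect; it hands them to the next round renamed with a REDIRECT_ prefix. Assuming that behavior, a variable set with [E=DONE:YES] in the first round can be tested as REDIRECT_DONE in the second. A sketch, reusing the whatever.cgi example from above:

```apache
RewriteEngine on
RewriteBase /

# 2nd round: DONE from the 1st round now shows up as REDIRECT_DONE.
RewriteCond %{ENV:REDIRECT_DONE} =YES
RewriteRule ^ - [L]

# 1st round: rewrite and mark the request.
RewriteRule ^(.*)\.html$ xyz/whatever.cgi?$1 [L,E=DONE:YES]
```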
 
This trick can be used to check if a certain rule is matched in the same round, though.
RewriteRule ^pattern1$ substitute1 [E=DONE:YES]
RewriteRule ^pattern2$ substitute2 [E=DONE:YES]
RewriteRule ^pattern3$ substitute3 [E=DONE:YES]
 
RewriteCond %{ENV:DONE} YES
RewriteRule ^patternX$ substituteX
But you can use [S] (skip) and/or [C] (chain) in most cases instead of this %{ENV:VAR} trick.

Rant

I think mod_rewrite is BADLY designed. It doesn't have a definitive way to control looping, and we can't use variables on the right hand side of the RewriteCond. Also, the fact that ENV variables get reset on each round is stupid ...

How to give new ENV variable to cgi

Note. On a server with suExec, most env variables are cleansed by suExec. You should prefix the env var with 'HTTP_' and it will survive! We can put any information we want to pass to CGI in QUERY_STRING, though.

The [E=VAR:STRING] option can be used to set an ENV variable. But it will not reach the cgi if it's set in a URL changing rule, or in a round that has a URL changing rule. These rules should be placed at the beginning of the ruleset so that they are set again without fail in the final round.
# As this is not usually available to CGI program,
# it is very useful in DIY authentication with CGI.
RewriteRule ^.*$ - [E=HTTP_AUTH:%{HTTP:Authorization}]
 
# available in RewriteRules.
RewriteRule ^.*$ - [E=HTTP_TIME:%{TIME}]
 
QUERY_STRING can be used for passing parameters, too.
 
RewriteRule ^(.*)$ $1?AUTH=%{HTTP:Authorization} [QSA]
or

__SEPARATOR__ can be anything you want. You don't need it if QUERY_STRING is empty. You can pass any ENV variable, such as THE_REQUEST or TIME, this way. This can be useful for debugging RewriteRule.

How to make rules more efficient

It is similar to other programming languages. Identify the resource consuming part, and try to minimize the traffic that goes through that part.

For RewriteCond, "=" is probably the least time consuming of all, and -f, -d, -s, and others are more time consuming. -U and -F could be the most costly ones.

REGEX may be pretty heavy if the string checked is long and the pattern is complex. Checking against a long string such as THE_REQUEST, which includes both the URL part and QUERY_STRING, can be heavy because it can be very, very long... (REQUEST_URI itself holds only the path, without QUERY_STRING.)

By using ^ to do forward matching, it may require less backtracking, and thus be more efficient. (On powerful servers, the difference can be invisible...)
RewriteCond %{REQUEST_URI} !subdir
RewriteRule ^(.*)$ subdir/$1
 
# This is not efficient...and may not work sometimes
# because the pattern is not anchored: it also matches
# "/sub/sub-subdir" and "/abcdefgsubdir/" and so on.
 
RewriteCond %{REQUEST_URI} !^/subdir/
RewriteRule ^(.*)$ subdir/$1
# Now, it is more efficient and no room for confusion.

Secret directory

I wrote a separate page for secret directory. secretdir.html (This can be used with CGI Authentication and other tricks.)

Anti-Leech, bandwidth saving, Referer blocking

I understand the desire to do these things. However, it's not really effective, and it often causes more headaches. I do not recommend it unless you know well about rewriting and the limited effectiveness and potential problems.

If you are a user of PowWeb, we have enough bandwidth allowance to cope with usual "Leeching". If you dislike leeching, maybe you can add your URL on the picture using ImageMagick or the sitebuilder tool soon available from PowWeb!

More about Anti-Leech measures

However, kicking off certain robots is a good practice. Some robots will access well over several hundred items per minute. If your script is hit like this, the server may experience lots of load. Although there seems to be a built-in safety cut off mechanism on PowWeb, we can do our part in this.

#
# Keep bad robots off.
#
# Give them a blank page instead of 403. Costs less for the server.
RewriteEngine on
RewriteBase /
 
RewriteRule ^blank\.txt - [L]
 
RewriteCond %{HTTP_USER_AGENT} (MSIECrawler|Ninja|Microsoft|MSFront|WebCopier|Pockey) [NC]
RewriteRule ^(.*)$  blank.txt [L]
Usually, the last rule is like this, "RewriteRule ^(.*)$ - [F]" and it gives a 403 Forbidden error with an error_log entry. I don't like to see massive entries in my error_log because detecting more serious trouble will be harder due to too many garbage entries. So, I decided to send them a nice white blank page without any data.

This saves a bit of bandwidth, simplifies my error_log, and costs a little less for the server because there is no need to make two log entries. And as it is a little more polite to send a blank page with a 200 OK response code than 403 Forbidden, it may even reduce the risk of attacks by frustrated youth.

Search Engine friendly URL

It seems to be another hot topic among PHP users. I think it's better to parse the URL in PHP rather than trying to do something with mod_rewrite.

The idea is, using such a URL
http://host.net/aa/bb/cc/dd
is better than the usual php thing
http://host.net/index.php?aa=bb&cc=dd
And with mod_rewrite,
RewriteRule ^/*([^/]+)/+([^/]+)/+([^/]+)/*(.*)$ index.php?$1=$2&$3=$4 [L]
do something like this.

While a simple example like this works, more complex rules could be tricky. Although the string parsing power of mod_rewrite is not that bad, it should be much easier to do your own parsing in php using $_SERVER['QUERY_STRING'] or $_SERVER['REQUEST_URI'] and other variables. The same thing applies to Perl and other languages. But php people seem to be more eager to do this...somehow. Maybe they don't want to change their script or they don't know what to change, how to change...

Oh well, here is a very inefficient but flexible version. This one can treat any number of parameters. But I think it is a resource consuming hog.
RewriteRule ^/*([^/]+)/+([^/]+)(.*)/*$ $3?$1=$2 [L,QSA]
RewriteRule ^([^/]+)$ index.php?$1 [L,QSA]
Trying to double the parameters treated in one round.
RewriteRule ^/*([^/]+)/+([^/]+)/+([^/]+)/+([^/]+)(.*)/*$ $5?$1=$2&$3=$4 [L,QSA]
RewriteRule ^([^/]+)$ index.php?$1 [L,QSA]
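
For comparison, handing the whole job to PHP, as suggested above, needs only two rules. A sketch, assuming the script is called index.php and reads $_SERVER['REQUEST_URI'] itself. The first rule is the dot-exclusion trick from earlier in this article:

```apache
RewriteEngine on
RewriteBase /

# URLs with a dot (index.php, .css, .jpg ...) are served as they are.
# This also stops the 2nd round once the URL has become index.php.
RewriteRule \. - [L]

# Everything else goes to the script, whatever the number of parameters.
RewriteRule ^ index.php [L]
```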
A little better idea is using such a URL http://host.net/bb/dd (instead of http://host.net/aa/bb/cc/dd) to obtain this.
http://host.net/index.php?aa=bb&cc=dd
RewriteRule ^/*([^/]+)/+([^/]+)/*(.*)$ index.php?aa=$1&cc=$2&$3 [L]
While the URL looks better and is more efficient, this one is not flexible.

Serve dynamic page statically

Generating a page for each and every request is a pure waste of server resources unless there is a good reason. Most of the time, exactly the same page can be served many times. So, it makes sense to implement a "generate once, serve many times" system.

I wrote an example of such a system (a very simple one) for someone. Please take a look if you are interested. [cache.html]

ampescape ?

I saw a question about RewriteMap recently. As we can't use RewriteMap in .htaccess, I wrote a RewriteRule that escapes & to %26, as ampescape would do.
RewriteRule ^([^&]*)&(.*)$ $1\%26$2 [NE,N]
RewriteRule ^([^&]*)$ whatever.php?title=$1 [L]
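
On newer Apache versions there may be a shorter way. The B flag, which as far as I know exists since Apache 2.2, escapes the captured text before it is put into the substitution, so & arrives as %26 without the [N] loop. A sketch under that assumption, with whatever.php as in the rules above:

```apache
RewriteEngine on
RewriteBase /

# Leave the target script alone so the catch-all rule cannot loop.
RewriteRule ^whatever\.php$ - [L]

# B escapes non-alphanumeric characters in $1, e.g. at&t -> at%26t.
RewriteRule ^(.*)$ whatever.php?title=$1 [B,L]
```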

November 7th, 2006

Comments Welcome

  • basile

    typo: $_SERVER{'QUERY_STIRING

  • Glenn

    Thanks for the information.

    I currently have the following in my .htaccess on a Wordpress website:

    RewriteRule ^([a-zA-Z_0-9.'-]+).html$ ./?s=$1

    I’m trying to use your rule to escape & to %26:

    RewriteRule ^([^&]*)&(.*)$ $1%26$2 [NE,N]

    The problem I'm having is that i doesn't appear to be parsing the & sign when using the WP search function.

    For example, if I type:

    mydomain.com/at&t.html

    It actually parses

    mydomain.com/at6t.html

    to the website.

    Could you provide any help with this at all please?

    Thank you

  • Andy

    It looks like there are quotes missing in some examples where there are spaces in CondPattern.

    Note. To match a string with a space, just escape with "" as shown in the example above.

    ^^note is confusing, but confirms the missing quotes.

    Could you fix it for us, newbs?

  • CarloLoretta

    Hi, this is very interesting.

    I have a problem like this: external link: ..../italian-cooking/carne/fagiano/ricette/fag_00%3Cbr%20%3E1_fagiano_ai_frutti_di_bosco.shtml

    I cannot rewrite this

    • AskApache

      What Rewrite Code are you currently using?

      • CarloLoretta

        I tried to use different local solutions, but the presence of the character < or %3C produces a 403 error "Access Denied!"

        For example:

        RewriteCond %{HTTP_HOST} ^.*$
        RewriteRule ^(.*)/?(%3C|<)$ http://192.168.0.4 [R=301,L]
        or excaping char % RewriteRule ^(.*)/?(%3C|<)$ http://192.168.0.4 [R=301,L]

Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution 3.0 License, just credit with a link.
This site is not supported or endorsed by The Apache Software Foundation (ASF). All software and documentation produced by The ASF is licensed. "Apache" is a trademark of The ASF. NCSA HTTPd.
UNIX ® is a registered Trademark of The Open Group. POSIX ® is a registered Trademark of The IEEE.
