Htaccess SetEnvIf and SetEnvIfNoCase Examples

SetEnvIf and SetEnvIfNoCase are directives supplied by the mod_setenvif module. They let you conditionally set environment variables, accessible to scripts and to Apache itself, based on the value of HTTP headers, other variables, and request information.

For debugging, you may want to use my server environment variable debugging script.

Unique mod_setenvif Variables

These can be used as the attribute argument. Each is shown with an example value.

  • Remote_Host - the hostname (if available) of the client making the request, e.g. crawl-66-249-70-24.googlebot.com
  • Remote_Addr - the IP address of the client making the request, e.g. 66.249.70.24
  • Server_Addr - the IP address of the server on which the request was received, e.g. 208.113.183.103
  • Request_Method - the name of the method being used, e.g. GET
  • Request_Protocol - the name and version of the protocol with which the request was made, e.g. HTTP/1.1
  • Request_URI - the resource requested on the HTTP request line, generally the portion of the URL following the scheme and host portion without the query string, e.g. /robots.txt

Syntax:

SetEnvIf attribute regex [!]env-variable[=value] [[!]env-variable[=value]] ...

Populates HTTP_MY_ Variables with mod_setenvif variable values

SetEnvIfNoCase Remote_Host "(.*)" HTTP_MY_REMOTE_HOST=$1
SetEnvIfNoCase Remote_Addr "(.*)" HTTP_MY_REMOTE_ADDR=$1
SetEnvIfNoCase Server_Addr "(.*)" HTTP_MY_SERVER_ADDR=$1
SetEnvIfNoCase Request_Method "(.*)" HTTP_MY_REQUEST_METHOD=$1
SetEnvIfNoCase Request_Protocol "(.*)" HTTP_MY_REQUEST_PROTOCOL=$1
SetEnvIfNoCase Request_URI "(.*)" HTTP_MY_REQUEST_URI=$1
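
One way to check that these variables are actually populated is to echo one of them back in a response header. The following is only a minimal sketch: it assumes mod_headers is loaded, and the header name X-Debug-Remote-Addr is made up for illustration. Enable it only while debugging, since it exposes server-side values to the client.

```apache
<IfModule mod_headers.c>
    # %{NAME}e expands to the value of the environment variable NAME (mod_headers)
    Header set X-Debug-Remote-Addr "%{HTTP_MY_REMOTE_ADDR}e"
</IfModule>
```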

Set REMOTE_HOST to HTTP_HOST

Sets REMOTE_HOST to www.askapache.com if Remote_Addr is 208.113.183.103. This is useful if your server doesn't automatically do a reverse lookup on the remote address: it lets you tell whether a request was internal, that is, made from your own server. The dots are escaped and the pattern is anchored so that only this exact address matches.

SetEnvIf Remote_Addr "^208\.113\.183\.103$" REMOTE_HOST=www.askapache.com

Allow access only if the Host header is present in the request

SetEnvIfNoCase ^HOST$ .+ HTTP_MY_HAS_HOST
Order Deny,Allow
Deny from All
Allow from env=HTTP_MY_HAS_HOST

or

SetEnvIfNoCase Host .+ HTTP_MY_HAS_HOST
Order Deny,Allow
Deny from All
Allow from env=HTTP_MY_HAS_HOST
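
The Order, Deny, and Allow directives used here are the Apache 2.2 access-control syntax. On Apache 2.4 the same check can be written with Require env from mod_authz_core. This is a sketch assuming a 2.4 server where mod_access_compat is not in use:

```apache
SetEnvIfNoCase Host .+ HTTP_MY_HAS_HOST

# Apache 2.4 (mod_authz_core): grant access only when the variable is set
Require env HTTP_MY_HAS_HOST
```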

Add values from HTTP Headers

SetEnvIfNoCase ^If-Modified-Since$ "(.+)" HTTP_IF_MODIFIED_SINCE=$1
SetEnvIfNoCase ^If-None-Match$ "(.+)" HTTP_IF_NONE_MATCH=$1
SetEnvIfNoCase ^Cache-Control$ "(.+)" HTTP_CACHE_CONTROL=$1
SetEnvIfNoCase ^Connection$ "(.+)" HTTP_CONNECTION=$1
SetEnvIfNoCase ^Keep-Alive$ "(.+)" HTTP_KEEP_ALIVE=$1
SetEnvIfNoCase ^Authorization$ "(.+)" HTTP_AUTHORIZATION=$1
SetEnvIfNoCase ^Cookie$ "(.+)" HTTP_MY_COOKIE=$1

Set the REDIRECT_STATUS for Interpreter Security

This is useful for disallowing direct access to interpreters such as shell scripts and CGI scripts. It only works this way if your server has a static IP. The only way to reach these files is then to have the server itself request them, either with an Action directive or by requesting the file from a .php or other script using curl, wget, or something like fsockopen.

<FilesMatch "\.(cgi|sh|pl)$">
SetEnvIfNoCase Remote_Addr "^208\.113\.183\.103$" REDIRECT_STATUS

Order Deny,Allow
Deny from All
Allow from env=REDIRECT_STATUS
</FilesMatch>

Block Bad Bots

This can be useful if your site is getting hammered by spambots. Some nice examples from around the net are at Fight Blog Spam With Apache... Keep in mind that the User-Agent header (HTTP_USER_AGENT) comes directly from the client, so it's easy to spoof or change. mod_security is a much better solution.

SetEnvIfNoCase User-Agent "^Bandit" bad_bot
SetEnvIfNoCase User-Agent "^Baiduspider" bad_bot
SetEnvIfNoCase User-Agent "^BatchFTP" bad_bot
SetEnvIfNoCase User-Agent "^Bigfoot" bad_bot
SetEnvIfNoCase User-Agent "^Black.Hole" bad_bot
 
Order Allow,Deny
Allow from All
Deny from env=bad_bot
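
As the list grows, the patterns can be folded into a single alternation. The sketch below uses the same five patterns and pairs them with the Apache 2.4 form of the access rule. It assumes a 2.4 server with mod_authz_core; on 2.2, keep the Order/Allow/Deny lines instead.

```apache
# One alternation instead of one directive per bot (same patterns as above)
SetEnvIfNoCase User-Agent "^(Bandit|Baiduspider|BatchFTP|Bigfoot|Black.Hole)" bad_bot

# Apache 2.4 equivalent of "Allow from All" plus "Deny from env=bad_bot"
<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>
```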

Allow Search robots

This does the opposite of the example above, allowing access ONLY to these web robots. Rogue robots aside, a correctly configured robots.txt file will keep most robots where you want them.

SetEnvIfNoCase User-Agent google search_robot
SetEnvIfNoCase User-Agent yahoo search_robot
SetEnvIfNoCase User-Agent bot search_robot
SetEnvIfNoCase User-Agent ask search_robot
 
Order Deny,Allow
Deny from All
Allow from env=search_robot

SetEnvIf Directive

Description: Sets environment variables based on attributes of the request
Syntax: SetEnvIf attribute regex [!]env-variable[=value] [[!]env-variable[=value]] ...
Context: server config, virtual host, directory, .htaccess
Override: FileInfo
Status: Base
Module: mod_setenvif

The SetEnvIf directive defines environment variables based on attributes of the request. The attribute specified in the first argument can be one of three things:

  1. An HTTP request header field (see RFC2616 for more information about these); for example: Host, User-Agent, Referer, and Accept-Language. A regular expression may be used to specify a set of request headers.
  2. One of the following aspects of the request:
    • Remote_Host - the hostname (if available) of the client making the request
    • Remote_Addr - the IP address of the client making the request
    • Server_Addr - the IP address of the server on which the request was received (only with versions later than 2.0.43)
    • Request_Method - the name of the method being used (GET, POST, et cetera)
    • Request_Protocol - the name and version of the protocol with which the request was made (e.g., "HTTP/0.9", "HTTP/1.1", etc.)
    • Request_URI - the resource requested on the HTTP request line -- generally the portion of the URL following the scheme and host portion without the query string. See the RewriteCond directive of mod_rewrite for extra information on how to match your query string.
  3. The name of an environment variable in the list of those associated with the request. This allows SetEnvIf directives to test against the result of prior matches. Only those environment variables defined by earlier SetEnvIf[NoCase] directives are available for testing in this manner. 'Earlier' means that they were defined at a broader scope (such as server-wide) or previously in the current directive's scope. Environment variables will be considered only if there was no match among request characteristics and a regular expression was not used for the attribute.

The second argument (regex) is a regular expression. If the regex matches against the attribute, then the remainder of the arguments are evaluated.

The rest of the arguments give the names of variables to set, and optionally values to which they should be set. These take the form of

  1. varname
  2. !varname
  3. varname=value

In the first form, the value will be set to "1". The second will remove the given variable if already defined, and the third will set the variable to the literal value given by value. Since version 2.0.51 Apache will recognize occurrences of $1..$9 within value and replace them by parenthesized subexpressions of regex.
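
The three forms can be sketched side by side as follows. The variable names is_get and image_type and the /private/ path are made up for illustration, and the $1 substitution assumes Apache 2.0.51 or later, as noted above.

```apache
# Form 1 (varname): is_get is set to "1" for GET requests
SetEnvIf Request_Method "^GET$" is_get

# Form 3 (varname=value): $1 is replaced by the first parenthesized group,
# so a request for /logo.png sets image_type=png
SetEnvIf Request_URI "\.(gif|jpg|png)$" image_type=$1

# Form 2 (!varname): remove image_type again for anything under /private/
SetEnvIf Request_URI "^/private/" !image_type
```

Directives are processed in the order they appear, so the removal must come after the line that sets the variable.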

SetEnvIf Example:

SetEnvIf Request_URI "\.gif$" object_is_image=gif
SetEnvIf Request_URI "\.jpg$" object_is_image=jpg
SetEnvIf Request_URI "\.xbm$" object_is_image=xbm
SetEnvIf Referer www\.askapache\.com intra_site_referral
SetEnvIf object_is_image xbm XBIT_PROCESSING=1
SetEnvIf ^SETENVIF  ^[a-z]  HAS_SETENVIF

The first three will set the environment variable object_is_image if the request was for an image file, and the fourth sets intra_site_referral if the referring page was somewhere on the www.askapache.com Web site.

The last example will set the environment variable HAS_SETENVIF if the request contains any headers whose names begin with "SETENVIF" and whose values begin with any character in the set [a-z].
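
A common use for a variable such as object_is_image is conditional logging. The sketch below assumes it is placed in the main server or virtual host configuration (CustomLog is not allowed in .htaccess), and the log path is hypothetical.

```apache
# Define the "common" log format in case the main config does not already
LogFormat "%h %l %u %t \"%r\" %>s %b" common

# Write an access-log entry only when object_is_image is NOT set
CustomLog logs/access_log common env=!object_is_image
```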

December 7th, 2008

Comments Welcome

  • bay islands

    I think Baidu spider is now legit. I saw a thread at webmasterworld where the baidu techs came and went through why they were ok.

  • Emmanuel Mwangi

    I would like to be able to make .htaccess directives on a number of sites I have version control systems. I have a copy of the site on my computer hosted on my local apache set to hostname.dev while that site is live on the net as hostname.com or .org.

    Right now, I can't keep the .htaccess file in version control because it differs greatly from my host's setup (the excellent Dreamhost) with my own setup (Apache2+PHP5 on OSX Leopard on a MBP). I had thought of trying to duplicate Dreamhosts' environment (just for fun) but, that's more trouble than it is worth in both fun and profit.

    So, I was looking for a way to have environment variables trigger various rules in .htaccess. I found this site (great btw) and I was looking for solution. I was thinking that there would be some way to trigger blocks of commands with directives from server variables. Something along these lines:

    DEVELOPMENT (MACBOOK APACHE SETUP) RULES...
    ...
    PRODUCTION (DREAMHOST) RULES...

  • Average Joe

    Your article and site is helpful, but the music is rude, annoying, poor design and dis-tasteful. Everyone likes their own music, and usually respect one another's tastes. Please respect your visitors and let us listen to our own music, not yours. DJ'ing should be left for radio hosts and party's. Thanks.

  • Marcello

    I don't understand this

    SetEnvIfNoCase ^If-Modified-Since$ "(.+)" HTTP_IF_MODIFIED_SINCE=$1

    shouldn't there be a variable in place of ^If-Modified-Since$ ? otherwise it will always match..

  • http://codefight.org/ damu

    can we do something like this:

    AuthUserFile /web/XYZ/passwd/.htpasswd

    AuthUserFile /web/XYZ2/passwd/.htpasswd

  • http://www.philipgoddard.com Philip Goddard

    I haven't been able to figure out how to make an exception when using the SetEnvIf directive.

    I have in my .htaccess the following configuration to block unwanted visitors on the basis of referrer keywords:

    setenvifnocase referer "^http://.*(poker|texas.?hold-?em|diet|loan|cash|video|liftmaster|fillbest|pharma|.info/)" spam_ref=1
    # (plus lots more similar lines)
    Order Allow,Deny
    Allow from all
    Deny from env=spam_ref

    That works fine except that certain legitimate referrers such as Google can actually contain the odd word that is in the forbidden referrer keyword list and thus get blocked, and so I want to make an exception for them - to allow them even if they do contain one of the banned words. Any suggestions of solutions that have actually been tested and found to operate reliably? Thanks.

  • http://www.askapache.com/ AskApache

    @ Philip Goddard

    Run another setenvif for those you want to allow, and make the action unset the spam_ref variable by prepending it with !.

    setenvifnocase referer "^http://.*(poker|texas.?hold-?em|diet|loan|cash|video|liftmaster|fillbest|pharma|.info/)" spam_ref=1
    setenvifnocase referer "^http://.*(google|yahoo|bing)" !spam_ref
    # (plus lots more similar lines)
    Order Allow,Deny
    Allow from all
    Deny from env=spam_ref

  • billy bobo

    how do I simply set REMOTE_HOST without if/then?

    I never want dodubleyahdodubleyah dodubleyah. I use rewrite to re-educate them to no-www.org

Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution 3.0 License, just credit with a link.
This site is not supported or endorsed by The Apache Software Foundation (ASF). All software and documentation produced by The ASF is licensed. "Apache" is a trademark of The ASF. NCSA HTTPd.
UNIX ® is a registered Trademark of The Open Group. POSIX ® is a registered Trademark of The IEEE.