Apache HTTP Server Version 1.3
Module mod_rewrite
URL Rewriting Engine
This module provides a rule-based rewriting engine to rewrite requested URLs on the fly.
Status: Extension
Source File: mod_rewrite.c
Module Identifier: rewrite_module
Compatibility: Available in Apache 1.2 and later.
Summary
``The great thing about mod_rewrite is it gives you all the configurability and
flexibility of Sendmail. The downside to mod_rewrite is that it gives you all the
configurability and flexibility of Sendmail.''
-- Brian Behlendorf
Apache Group
`` Despite the tons of examples and docs, mod_rewrite is voodoo. Damned cool
voodoo, but still voodoo. ''
-- Brian Moore
bem@news.cmc.net
Welcome to mod_rewrite, the Swiss Army Knife of URL manipulation!
This module uses a rule-based rewriting engine (based on a regular-expression parser) to
rewrite requested URLs on the fly. It supports an unlimited number of rules and an unlimited
number of attached rule conditions for each rule, providing a very flexible and powerful
URL manipulation mechanism. The URL manipulations can depend on various tests: server
variables, environment variables, HTTP headers, time stamps and even external database
lookups in various formats can all be used to achieve very granular URL matching.
This module operates on full URLs (including the path-info part) both in per-server
context (httpd.conf) and per-directory context (.htaccess), and can
even generate query-string parts in the result. The rewritten result can lead to internal
sub-processing, external request redirection or even internal proxy throughput.
But all this functionality and flexibility has its drawback: complexity. So don't expect
to understand this entire module in just one day.
This module was invented and originally written in April 1996
and gifted exclusively to The Apache Group in July 1997 by
Ralf S. Engelschall
rse@engelschall.com
www.engelschall.com
Table Of Contents
Internal Processing
Configuration Directives
Miscellaneous
The internal processing of this module is very complex but needs to be explained once
even to the average user to avoid common mistakes and to let you exploit its full
functionality.
First you have to understand that when Apache processes an HTTP request it does so in
phases. The Apache API provides a hook for each of these phases. mod_rewrite uses two
of these hooks: the URL-to-filename translation hook, which is used after the HTTP request
has been read but before any authorization starts, and the Fixup hook, which is triggered
after the authorization phases and after the per-directory config files (.htaccess)
have been read, but before the content handler is activated.
So, after a request comes in and Apache has determined the corresponding server (or
virtual server), the rewriting engine starts processing all mod_rewrite directives from
the per-server configuration in the URL-to-filename phase. A few steps later, when the final
data directories are found, the per-directory configuration directives of mod_rewrite are
triggered in the Fixup phase. In both situations mod_rewrite rewrites URLs either to new
URLs or to filenames, although there is no obvious distinction between them. The API was
not designed to be used this way, but as of Apache 1.x this is the only way mod_rewrite
can operate. To make this clearer, remember the following two points:
- Although mod_rewrite rewrites URLs to URLs, URLs to filenames and even filenames to
filenames, the API currently provides only a URL-to-filename hook. In Apache 2.0 the two
missing hooks will be added to make the processing clearer. This has no drawbacks for
the user; it is just a fact which should be remembered: Apache does more
in the URL-to-filename hook than the API intends for it.
- Unbelievably, mod_rewrite provides URL manipulations in per-directory context, i.e.,
within .htaccess files, although these are reached a very long time after
the URLs have been translated to filenames. It has to be this way because .htaccess
files live in the filesystem, so processing has already reached this stage. In other
words: according to the API phases, at this time it is too late for any URL
manipulations. To overcome this chicken and egg problem mod_rewrite uses a trick: when
you manipulate a URL/filename in per-directory context, mod_rewrite first rewrites the
filename back to its corresponding URL (which is usually impossible, but see the RewriteBase
directive below for the trick to achieve this) and then initiates a new internal
sub-request with the new URL. This restarts processing of the API phases.
Again mod_rewrite tries hard to make this complicated step totally transparent to the
user, but you should remember here: While URL manipulations in per-server context are
really fast and efficient, per-directory rewrites are slow and inefficient due to this
chicken and egg problem. But on the other hand this is the only way mod_rewrite can
provide (locally restricted) URL manipulations to the average user.
Don't forget these two points!
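As a rough illustration of the two contexts, the same rewrite might be written in both places as sketched below. The /docs/ URL prefix and the file names old.html and new.html are hypothetical, and the fragments are untested sketches rather than a recommended configuration.

```apache
# --- Per-server context (httpd.conf) ---
# Processed in the URL-to-filename phase; Pattern sees the full URL path.
RewriteEngine on
RewriteRule   ^/docs/old\.html$  /docs/new.html

# --- Per-directory context (.htaccess in the directory served as /docs/) ---
# Processed in the Fixup phase. Assumes Options FollowSymLinks is enabled
# and AllowOverride permits these directives. The directory prefix is
# stripped before matching, and RewriteBase names the URL prefix that is
# put back before the internal sub-request is started.
RewriteEngine on
RewriteBase   /docs/
RewriteRule   ^old\.html$  new.html
```

The per-directory variant is the slower of the two, since it relies on the sub-request trick described above.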
Now when mod_rewrite is triggered in these two API phases, it reads the configured rulesets
from its configuration structure (which itself was either created on startup for per-server
context or during the directory walk of the Apache kernel for per-directory context). Then
the URL rewriting engine is started with the contained ruleset (one or more rules together
with their conditions). The operation of the URL rewriting engine itself is exactly the same
for both configuration contexts. Only the final result processing is different.
The order of rules in the ruleset is important because the rewriting engine processes
them in a special (and not very obvious) order. The rule is this: the rewriting engine loops
through the ruleset rule by rule (RewriteRule directives), and when a particular
rule matches, it then loops through any conditions attached to that rule (RewriteCond
directives). For historical reasons the conditions are written before the rule they belong
to, so the control flow is a little roundabout. See Figure 1 for more details.
![[Needs graphics capability to display]](../images/mod_rewrite_fig1.gif)
Figure 1: The control flow through the rewriting ruleset
As you can see, the URL is first matched against the Pattern of each rule. If the
match fails, mod_rewrite immediately stops processing this rule and continues with the next
rule. If the Pattern matches, mod_rewrite looks for corresponding rule conditions.
If none are present, it just substitutes the URL with a new value which is constructed from
the string Substitution and goes on with its rule-looping. But if conditions exist,
it starts an inner loop for processing them in the order that they are listed. For
conditions the logic is different: we don't match a pattern against the current URL. Instead
we first create a string TestString by expanding variables, back-references, map
lookups, etc. and then we try to match CondPattern against it. If the
pattern doesn't match, the complete set of conditions and the corresponding rule fail. If
the pattern matches, then the next condition is processed until no more conditions are
available. If all conditions match, processing continues with the substitution of the URL
with Substitution.
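The control flow described above can be sketched with a small per-server ruleset. The file names are hypothetical and the browser test is only an illustration; the comments mark the order in which the engine would evaluate each part.

```apache
RewriteEngine on

# Rule 1: no conditions attached. If Pattern matches, the URL is
# substituted immediately and rule-looping continues with rule 2.
RewriteRule   ^/index\.html$  /home.html

# Rule 2: two conditions, written BEFORE the rule they belong to.
#   Step 1: Pattern ^/$ is matched against the current URL.
#   Step 2: only if it matched, each TestString is expanded and matched
#           against its CondPattern, in the order listed.
#   Step 3: only if every condition matched, Substitution is applied.
RewriteCond   %{HTTP_USER_AGENT}  ^Mozilla
RewriteCond   %{HTTP_USER_AGENT}  !MSIE
RewriteRule   ^/$  /homepage.max.html
```

If either condition of rule 2 fails, the rule as a whole fails and the URL is left as it was after rule 1.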
As of Apache 1.3.20, special characters in TestString and Substitution
strings can be escaped (that is, treated as normal characters without their usual special
meaning) by prefixing them with a backslash ('\') character. In other words, you can include an
actual dollar-sign character in a Substitution string by using '\$';
this keeps mod_rewrite from trying to treat it as a backreference.
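A minimal sketch of such escaping follows. The /price/ prefix and the price.cgi script are hypothetical names chosen for illustration, and the rule assumes Apache 1.3.20 or later.

```apache
RewriteEngine on

# \$ produces a literal dollar sign in the result, while the following $1
# is still expanded as a back-reference to the first group of Pattern.
# A request for /price/42 would thus be rewritten with label=$42.
RewriteRule   ^/price/([0-9]+)$  /shop/price.cgi?label=\$$1
```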
One important thing has to be remembered here: whenever you use parentheses in Pattern
or in one of the CondPatterns, back-references are internally created which can be
used with the strings $N and %N (see below). These are available
for creating the strings Substitution and TestString. Figure 2 shows to
which locations the back-references are transferred for expansion.
![[Needs graphics capability to display]](../images/mod_rewrite_fig2.gif)
Figure 2: The back-reference flow through a rule
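The following sketch shows both kinds of back-reference in one rule. The example.com host names and the /sites/ and private/ paths are assumptions made for illustration only.

```apache
RewriteEngine on

# $1 comes from the parentheses in the rule's Pattern. It is available
# in the TestString of the conditions as well as in Substitution.
RewriteCond   $1            !^private/

# %1 comes from the parentheses in the last matched CondPattern.
RewriteCond   %{HTTP_HOST}  ^([^.]+)\.example\.com$

# Both back-references are expanded when Substitution is built.
RewriteRule   ^/(.*)$       /sites/%1/$1
```

Under these assumptions, a request for /a/b.html on host foo.example.com would be rewritten to /sites/foo/a/b.html.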
We know this was a crash course on mod_rewrite's internal processing. But you will
benefit from this knowledge when reading the following documentation of the available
directives.
This module keeps track of two additional (non-standard) CGI/SSI environment variables named
SCRIPT_URL and SCRIPT_URI. These contain the logical
Web-view to the current resource, while the standard CGI/SSI variables SCRIPT_NAME
and SCRIPT_FILENAME contain the physical System-view.
Notice: These variables hold the URI/URL as they were initially requested, i.e.,
before any rewriting. This is important because the rewriting process is primarily
used to rewrite logical URLs to physical pathnames.
Example:
SCRIPT_NAME=/sw/lib/w3s/tree/global/u/rse/.www/index.html
SCRIPT_FILENAME=/u/rse/.www/index.html
SCRIPT_URL=/u/rse/
SCRIPT_URI=http://en1.engelschall.com/u/rse/
We also have a URL Rewriting Guide available, which provides a collection of practical
solutions for URL-based problems. There you can find real-life rulesets and additional
information about mod_rewrite.