Apache HTTP Server Version 1.3
URL Rewriting Guide
Originally written by
Ralf S. Engelschall <rse@apache.org>
December 1997
This document supplements the mod_rewrite reference documentation. It describes how one
can use Apache's mod_rewrite to solve typical URL-based problems that webmasters commonly
face in practice. For each problem I give a detailed description of how to solve it by
configuring URL rewriting rulesets.
The Apache module mod_rewrite is a killer one, i.e. it is a really sophisticated module
which provides a powerful way to do URL manipulations. With it you can do nearly every type
of URL manipulation you ever dreamed about. The price you have to pay is to accept
complexity, because mod_rewrite's major drawback is that it is not easy for beginners to
understand and use. And even Apache experts sometimes discover new aspects where
mod_rewrite can help.
In other words: with mod_rewrite you either shoot yourself in the foot the first time and
never use it again, or love it for the rest of your life because of its power. This paper
tries to give you a few initial successes, and so help you avoid the first case, by
presenting solutions that have already been worked out.
Here come a lot of practical solutions I've either invented myself or collected from other
people's solutions in the past. Feel free to learn the black magic of URL rewriting from
these examples.
ATTENTION: Depending on your server configuration it can be necessary to slightly
change the examples for your situation, e.g. adding the [PT] flag when additionally
using mod_alias and mod_userdir, etc., or rewriting a ruleset to fit in .htaccess
context instead of per-server context. Always try to understand what a particular
ruleset really does before you use it. This avoids problems.
URL Layout
Canonical URLs
- Description:
- On some webservers there is more than one URL for a resource. Usually there are
canonical URLs (which should actually be used and distributed) and those which are just
shortcuts, internal ones, etc. Regardless of which URL the user supplied with the request,
he should finally see only the canonical one.
- Solution:
- We do an external HTTP redirect for all non-canonical URLs to fix them in the location
bar of the browser and for all subsequent requests. In the example ruleset below we
replace /~user by the canonical /u/user and fix a missing trailing slash for /u/user.
RewriteRule ^/~([^/]+)/?(.*) /u/$1/$2 [R]
RewriteRule ^/([uge])/([^/]+)$ /$1/$2/ [R]
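As a quick way to check what these two rules do to a few sample paths, here is a Python sketch that replays them with the `re` module. It is an illustration under assumptions, not mod_rewrite itself: it assumes Python's regex engine treats these simple patterns the same way Apache does, uses `\1` and `\2` for `$1` and `$2`, and applies both rules in turn because neither carries the [L] flag. The function name `canonical_redirect` is made up for this example.

```python
import re

# The two rules from the ruleset above; Python's \1, \2 stand in for $1, $2.
RULES = [
    (r"^/~([^/]+)/?(.*)", r"/u/\1/\2"),   # /~user/...  ->  /u/user/...
    (r"^/([uge])/([^/]+)$", r"/\1/\2/"),  # /u/user     ->  /u/user/
]


def canonical_redirect(path):
    """Return the canonical URL to redirect to, or None if path is already canonical."""
    matched = False
    for pattern, substitution in RULES:
        m = re.search(pattern, path)
        if m:
            # No [L] flag, so later rules still see the rewritten path.
            path = m.expand(substitution)
            matched = True
    return path if matched else None


for sample in ["/~quux", "/~quux/foo.html", "/u/quux", "/u/quux/"]:
    print(sample, "->", canonical_redirect(sample))
```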
Canonical Hostnames
- Description:
- ...
- Solution:
-
RewriteCond %{HTTP_HOST} !^fully\.qualified\.domain\.name [NC]
RewriteCond %{HTTP_HOST} !^$
RewriteCond %{SERVER_PORT} !^80$
RewriteRule ^/(.*) http://fully.qualified.domain.name:%{SERVER_PORT}/$1 [L,R]
RewriteCond %{HTTP_HOST} !^fully\.qualified\.domain\.name [NC]
RewriteCond %{HTTP_HOST} !^$
RewriteRule ^/(.*) http://fully.qualified.domain.name/$1 [L,R]
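The ruleset comes without an explanation, so here is our reading of it as a Python sketch (the function `canonical_host_redirect` is hypothetical). A redirect happens only when the Host: header is present (the `!^$` condition) and does not already start with the canonical name; the first rule keeps a non-standard port in the target URL, the second handles port 80. Note that the name check has no `$` anchor, so a Host: value like `fully.qualified.domain.name:8080` also counts as canonical.

```python
import re

CANONICAL = "fully.qualified.domain.name"


def canonical_host_redirect(http_host, server_port, path):
    """Return the redirect URL the ruleset would produce, or None for no redirect."""
    # RewriteCond %{HTTP_HOST} !^fully\.qualified\.domain\.name [NC]
    if re.search(r"^fully\.qualified\.domain\.name", http_host, re.IGNORECASE):
        return None
    # RewriteCond %{HTTP_HOST} !^$  (no Host: header means no redirect)
    if re.search(r"^$", http_host):
        return None
    m = re.search(r"^/(.*)", path)
    if m is None:
        return None
    # RewriteCond %{SERVER_PORT} !^80$ selects the first rule, which keeps the port.
    if not re.search(r"^80$", server_port):
        return f"http://{CANONICAL}:{server_port}/{m.group(1)}"
    return f"http://{CANONICAL}/{m.group(1)}"
```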
Moved DocumentRoot
- Description:
- Usually the DocumentRoot of the webserver directly relates to the URL "/".
But often this data is not really of top-level priority; it is perhaps just one entity
among many data pools. For instance, at our Intranet sites there are /e/www/
(the homepage for WWW), /e/sww/ (the homepage for the Intranet) etc. Now,
because the data of the DocumentRoot stays at /e/www/, we had to make sure
that all inlined images and other stuff inside this data pool work for subsequent
requests.
- Solution:
- We just redirect the URL / to /e/www/. While it seems trivial, it is actually trivial
only with mod_rewrite, because the typical old mechanisms of URL aliases (as provided by
mod_alias and friends) only use prefix matching. With prefix matching you cannot do such a
redirection, because the DocumentRoot is a prefix of all URLs. With mod_rewrite it is
really trivial:
RewriteEngine on
RewriteRule ^/$ /e/www/ [R]
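To make the prefix-matching argument concrete, the Python sketch below contrasts a simplified prefix rewrite (a model of the idea, not mod_alias's actual code) with the anchored pattern `^/$`. With a prefix of "/", every URL on the server is rewritten, even the target itself, whereas the anchored rule touches only the bare root URL. Both function names are hypothetical.

```python
import re


def prefix_alias(path, prefix="/", target="/e/www/"):
    """Simplified prefix matching: every path starting with prefix gets rewritten."""
    if path.startswith(prefix):
        return target + path[len(prefix):]
    return path


def root_redirect(path):
    """Mirror of RewriteRule ^/$ /e/www/ [R]: only the bare "/" is redirected."""
    return "/e/www/" if re.search(r"^/$", path) else None


print(prefix_alias("/e/sww/index.html"))   # an unrelated URL gets rewritten too
print(root_redirect("/"))
print(root_redirect("/e/sww/index.html"))
```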
Trailing Slash Problem
- Description:
- Every webmaster can sing a song about the problem of the trailing slash on URLs
referencing directories. If it is missing, the server dumps an error, because if you
say /~quux/foo instead of /~quux/foo/ then the server searches for a file named foo.
And because this file is a directory it complains. Actually, it tries to fix this itself
in most cases, but sometimes you need to emulate this mechanism yourself, for instance
after you have done a lot of complicated URL rewriting to CGI scripts etc.
- Solution:
- The solution to this subtle problem is to let the server add the trailing slash
automatically. To do this correctly we have to use an external redirect, so the browser
correctly requests subsequent images etc. If we only did an internal rewrite, this would
only work for the directory page itself, but would go wrong when any images are included
into this page with relative URLs, because the browser resolves them against the URL it
requested, which lacks the trailing slash. For instance, a relative reference to
image.gif in /~quux/foo/index.html would be requested as /~quux/image.gif without the
external redirect!
So, to do this trick we write:
RewriteEngine on
RewriteBase /~quux/
RewriteRule ^foo$ foo/ [R]
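The relative-URL problem described above can be reproduced with Python's `urllib.parse.urljoin`, which resolves relative references the way browsers do. The host name `www.example.com` is a placeholder.

```python
from urllib.parse import urljoin

BASE = "http://www.example.com"  # placeholder host

# A relative reference is resolved against the URL the browser actually requested.
without_slash = urljoin(BASE + "/~quux/foo", "image.gif")
with_slash = urljoin(BASE + "/~quux/foo/", "image.gif")

print(without_slash)  # http://www.example.com/~quux/image.gif
print(with_slash)     # http://www.example.com/~quux/foo/image.gif
```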
The crazy and lazy can even do the following in the top-level .htaccess
file of their homedir. But notice that this creates some processing overhead.
RewriteEngine on
RewriteBase /~quux/
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^(.+[^/])$ $1/ [R]
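For the .htaccess variant, this Python sketch mimics the two steps: a directory test on the filesystem (standing in for `-d` on %{REQUEST_FILENAME}) and the pattern `^(.+[^/])$`, which only matches paths that do not already end in a slash. It assumes the path is given relative to the directory of the .htaccess file, as mod_rewrite sees it in per-directory context, with `/~quux/` playing the role of RewriteBase; the function name is hypothetical. The `os.path.isdir` call also shows where the processing overhead comes from: every request costs a filesystem lookup.

```python
import os
import re
import tempfile


def trailing_slash_redirect(rel_path, request_filename, base="/~quux/"):
    """Return base + rel_path + "/" if request_filename is a directory, else None."""
    # RewriteCond %{REQUEST_FILENAME} -d
    if not os.path.isdir(request_filename):
        return None
    # RewriteRule ^(.+[^/])$ $1/ [R]
    m = re.search(r"^(.+[^/])$", rel_path)
    return base + m.group(1) + "/" if m else None


with tempfile.TemporaryDirectory() as home:
    os.mkdir(os.path.join(home, "foo"))
    open(os.path.join(home, "bar.html"), "w").close()
    print(trailing_slash_redirect("foo", os.path.join(home, "foo")))
    print(trailing_slash_redirect("bar.html", os.path.join(home, "bar.html")))
```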
Webcluster through Homogeneous URL Layout
- Description:
- We want to create a homogeneous and consistent URL layout over all WWW servers on an
Intranet webcluster, i.e. all URLs (which by definition are server-local and thus
server-dependent!) actually become server-independent! What we want is to give the WWW
namespace a consistent server-independent layout: no URL should have to include the
physically correct target server. The cluster itself should lead us automatically to
the physical target host.
- Solution:
- First, the knowledge of the target servers comes from (distributed) external maps which
contain information about where our users, groups and entities stay. They have the form
user1 server_of_user1
user2 server_of_user2
: :
We put them into files map.xxx-to-host. Second, we need to instruct all
servers to redirect URLs of the forms
/u/user/anypath
/g/group/anypath
/e/entity/anypath
to
http://physical-host/u/user/anypath
http://physical-host/g/group/anypath
http://physical-host/e/entity/anypath
when the URL is not locally valid on a server. The following ruleset does this for us
with the help of the map files (assuming that server0 is a default server which will be
used if a user has no entry in the map):
RewriteEngine on
RewriteMap user-to-host txt:/path/to/map.user-to-host
RewriteMap group-to-host txt:/path/to/map.group-to-host
RewriteMap entity-to-host txt:/path/to/map.entity-to-host
RewriteRule ^/u/([^/]+)/?(.*) http://${user-to-host:$1|server0}/u/$1/$2
RewriteRule ^/g/([^/]+)/?(.*) http://${group-to-host:$1|server0}/g/$1/$2
RewriteRule ^/e/([^/]+)/?(.*) http://${entity-to-host:$1|server0}/e/$1/$2
RewriteRule ^/([uge])/([^/]+)/?$ /$1/$2/.www/
RewriteRule ^/([uge])/([^/]+)/([^.]+.+) /$1/$2/.www/$3
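A rough Python model of the map lookups follows. It parses a txt: map in the `key value` format shown above and resolves `${map:key|server0}` by falling back to the default when the key is missing. It is a sketch under simplifying assumptions: the three per-type rules are folded into one pattern, comment and blank lines are skipped, and the two final rules that map onto the local `.www` directories are not modelled. All names are hypothetical.

```python
import re


def parse_txt_map(text):
    """Parse a txt: map: one "key value" pair per line; blank and '#' lines are skipped."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        table[fields[0]] = fields[1]
    return table


def cluster_redirect(path, maps, default="server0"):
    """Mirror of the three http://${xxx-to-host:$1|server0}/... rules above."""
    m = re.search(r"^/([uge])/([^/]+)/?(.*)", path)
    if m is None:
        return None
    kind, name, rest = m.groups()
    host = maps[kind].get(name, default)  # ${map:key|default}
    return f"http://{host}/{kind}/{name}/{rest}"


user_map = parse_txt_map("user1 server_of_user1\nuser2 server_of_user2\n")
maps = {"u": user_map, "g": {}, "e": {}}
print(cluster_redirect("/u/user1/anypath", maps))
print(cluster_redirect("/u/nobody/anypath", maps))
```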
Move Homedirs to Different Webserver
- Description:
- A lot of webmasters asked for a solution to the following situation: they wanted to
redirect all homedirs on one webserver to another webserver. They usually need this when
establishing a newer webserver which will replace the old one over time.
- Solution:
- The solution is trivial with mod_rewrite. On the old webserver we just redirect all
/~user/anypath URLs to http://newserver/~user/anypath.
RewriteEngine on
RewriteRule ^/~(.+) http://newserver/~$1 [R,L]
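For completeness, a Python rendering of this single rule (the function name is hypothetical); everything after `/~` is carried over unchanged to the new server.

```python
import re


def move_homedir(path):
    """Mirror of RewriteRule ^/~(.+) http://newserver/~$1 [R,L]; None if no match."""
    m = re.search(r"^/~(.+)", path)
    return "http://newserver/~" + m.group(1) if m else None


print(move_homedir("/~quux/papers/index.html"))
print(move_homedir("/index.html"))
```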
Structured Homedirs
- Description:
- Some sites with thousands of users use a structured homedir layout, i.e. each
homedir is in a subdirectory which begins, for instance, with the first character of the
username. So, /~foo/anypath is /home/f/foo/.www/anypath
while /~bar/anypath is /home/b/bar/.www/anypath.
- Solution:
- We use the following ruleset to expand the tilde URLs into exactly the above layout.
RewriteEngine on
RewriteRule ^/~(([a-z])[a-z0-9]+)(.*) /home/$2/$1/.www$3
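The interesting detail here is the group numbering: the outer group `(([a-z])[a-z0-9]+)` is `$1` (the whole username) and the inner `([a-z])` is `$2` (its first character). A Python sketch of the same substitution, with a hypothetical function name, makes that visible:

```python
import re


def expand_homedir(path):
    """Mirror of RewriteRule ^/~(([a-z])[a-z0-9]+)(.*) /home/$2/$1/.www$3."""
    m = re.search(r"^/~(([a-z])[a-z0-9]+)(.*)", path)
    if m is None:
        return None
    username, first_char, rest = m.group(1), m.group(2), m.group(3)
    return f"/home/{first_char}/{username}/.www{rest}"


print(expand_homedir("/~foo/anypath"))
print(expand_homedir("/~bar/anypath"))
```

Note that the pattern only accepts usernames of at least two characters made of lowercase letters and digits; other tilde URLs are left alone by this rule.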
Filesystem Reorganisation
- Description:
- This really is a hardcore example: a killer application which heavily uses
per-directory RewriteRules to get a smooth look and feel on the Web while
its data structure is never touched or adjusted. Background: net.sw
is my archive of freely available Unix software packages, which I started to collect in
1992. It is both my hobby and my job to do this, because while studying computer
science I have also worked for many years as a system and network administrator in my
spare time. Every week I need some sort of software, so I created a deep hierarchy of
directories where I stored the packages:
drwxrwxr-x 2 netsw users 512 Aug 3 18:39 Audio/
drwxrwxr-x 2 netsw users 512 Jul 9 14:37 Benchmark/
drwxrwxr-x 12 netsw users 512 Jul 9 00:34 Crypto/
drwxrwxr-x 5 netsw users 512 Jul 9 00:41 Database/
drwxrwxr-x 4 netsw users 512 Jul 30 19:25 Dicts/
drwxrwxr-x 10 netsw users 512 Jul 9 01:54 Graphic/
drwxrwxr-x 5 netsw users 512 Jul 9 01:58 Hackers/
drwxrwxr-x 8 netsw users 512 Jul 9 03:19 InfoSys/
drwxrwxr-x 3 netsw users 512 Jul 9 03:21 Math/
drwxrwxr-x 3 netsw users 512 Jul 9 03:24 Misc/
drwxrwxr-x 9 netsw users 512 Aug 1 16:33 Network/
drwxrwxr-x 2 netsw users 512 Jul 9 05:53 Office/
drwxrwxr-x 7 netsw users 512 Jul 9 09:24 SoftEng/
drwxrwxr-x 7 netsw users 512 Jul 9 12:17 System/
drwxrwxr-x 12 netsw users 512 Aug 3 20:15 Typesetting/
drwxrwxr-x 10 netsw users 512 Jul 9 14:08 X11/
In July 1996 I decided to make this archive public to the world via a nice Web
interface. "Nice" means that I wanted to offer an interface where you can
browse directly through the archive hierarchy. And "nice" means that I didn't
want to change anything inside this hierarchy, not even by putting some CGI scripts
at the top of it. Why? Because the above structure should later be accessible via FTP as
well, and I didn't want any Web or CGI stuff to be there.
- Solution:
- The solution has two parts: the first is a set of CGI scripts which create all the
pages at all directory levels on-the-fly. I put them under /e/netsw/.www/ as follows:
-rw-r--r-- 1 netsw users 1318 Aug 1 18:10 .wwwacl
drwxr-xr-x 18 netsw users 512 Aug 5 15:51 DATA/
-rw-rw-rw- 1 netsw users 372982 Aug 5 16:35 LOGFILE
-rw-r--r-- 1 netsw users 659 Aug 4 09:27 TODO
-rw-r--r-- 1 netsw users 5697 Aug 1 18:01 netsw-about.html
-rwxr-xr-x 1 netsw users 579 Aug 2 10:33 netsw-access.pl
-rwxr-xr-x 1 netsw users 1532 Aug 1 17:35 netsw-changes.cgi
-rwxr-xr-x 1 netsw users 2866 Aug 5 14:49 netsw-home.cgi
drwxr-xr-x 2 netsw users 512 Jul 8 23:47 netsw-img/
-rwxr-xr-x 1 netsw users 24050 Aug 5 15:49 netsw-lsdir.cgi
-rwxr-xr-x 1 netsw users 1589 Aug 3 18:43 netsw-search.cgi
-rwxr-xr-x 1 netsw users 1885 Aug 1 17:41 netsw-tree.cgi
-rw-r--r-- 1 netsw users 234 Jul 30 16:35 netsw-unlimit.lst
The DATA/ subdirectory holds the above directory structure, i.e. the
real net.sw stuff and gets automatically updated via rdist
from time to time. The second part of the problem remains: how to link these two
structures together into one smooth-looking URL tree? We want to hide the DATA/
directory from the user while running the appropriate CGI scripts for the various URLs.
Here is the solution: first I put the following into the per-directory configuration
file in the Document Root of the server to rewrite the announced URL /net.sw/
to the internal path /e/netsw:
RewriteRule ^net\.sw$ net.sw/ [R]
RewriteRule ^net\.sw/(.*)$ e/netsw/$1
The first rule is for requests which lack the trailing slash! The second rule does
the real thing. And then comes the killer configuration which stays in the per-directory
config file /e/netsw/.www/.wwwacl:
Options ExecCGI FollowSymLinks Includes MultiViews
RewriteEngine on
# we are reached via /net.sw/ prefix
RewriteBase /net.sw/
# first we rewrite the root dir to
# the handling cgi script
RewriteRule ^$ netsw-home.cgi [L]
RewriteRule ^index\.html$ netsw-home.cgi [L]
# strip out the subdirs when
# the browser requests us from perdir pages
RewriteRule ^.+/(netsw-[^/]+/.+)$ $1 [L]
# and now break the rewriting for local files
RewriteRule ^netsw-home\.cgi.* - [L]
RewriteRule ^netsw-changes\.cgi.* - [L]
RewriteRule ^netsw-search\.cgi.* - [L]
RewriteRule ^netsw-tree\.cgi$ - [L]
RewriteRule ^netsw-about\.html$ - [L]
RewriteRule ^netsw-img/.*$ - [L]
# anything else is a subdir which gets handled
# by another cgi script
RewriteRule !^netsw-lsdir\.cgi.* - [C]
RewriteRule (.*) netsw-lsdir.cgi/$1
Some hints for interpretation:
- Notice the L (last) flag and no substitution field ('-') in the fourth part
- Notice the ! (not) character and the C (chain) flag at the first rule in the last
part
- Notice the catch-all pattern in the last rule
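To follow the hints above, the Python sketch below runs one pass of the .wwwacl ruleset over a path relative to /net.sw/. It models the L flag (stop), the `-` substitution (keep the path), the `!` negation and the C flag (a failed chained rule skips the rule chained to it). It is a simplification: as far as we understand, in per-directory context Apache feeds an internally rewritten URL through the rules again, which is why the pass-through rules and the `!^netsw-lsdir\.cgi` test exist, and this sketch does not model that second round. The names `RULES` and `rewrite_once` are hypothetical.

```python
import re

# (pattern, substitution, negated, flags) for each RewriteRule in .wwwacl, in order.
# "-" means "leave the path unchanged"; \1 stands in for $1.
RULES = [
    (r"^$", "netsw-home.cgi", False, {"L"}),
    (r"^index\.html$", "netsw-home.cgi", False, {"L"}),
    (r"^.+/(netsw-[^/]+/.+)$", r"\1", False, {"L"}),
    (r"^netsw-home\.cgi.*", "-", False, {"L"}),
    (r"^netsw-changes\.cgi.*", "-", False, {"L"}),
    (r"^netsw-search\.cgi.*", "-", False, {"L"}),
    (r"^netsw-tree\.cgi$", "-", False, {"L"}),
    (r"^netsw-about\.html$", "-", False, {"L"}),
    (r"^netsw-img/.*$", "-", False, {"L"}),
    (r"^netsw-lsdir\.cgi.*", "-", True, {"C"}),  # the "!" pattern
    (r"(.*)", r"netsw-lsdir.cgi/\1", False, set()),
]


def rewrite_once(path):
    """Run one pass of the ruleset over a path relative to /net.sw/."""
    skipping = False
    for pattern, substitution, negated, flags in RULES:
        if skipping:
            # A failed [C] rule skips the chained rule(s) that follow it.
            skipping = "C" in flags
            continue
        m = re.search(pattern, path)
        matched = (m is None) if negated else (m is not None)
        if not matched:
            skipping = "C" in flags
            continue
        if substitution != "-":
            # Negated patterns capture nothing, so their substitution is literal.
            path = substitution if negated else m.expand(substitution)
        if "L" in flags:
            break
    return path


for sample in ["", "index.html", "Audio/", "Audio/netsw-img/up.gif", "netsw-lsdir.cgi/Audio/"]:
    print(repr(sample), "->", rewrite_once(sample))
```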
NCSA imagemap to Apache mod_imap
- Description:
- When switching from the NCSA webserver to the more modern Apache webserver, a lot of
people want a smooth transition. So they want pages which use their old NCSA imagemap
program to work under Apache with the modern mod_imap. The problem is that there are a
lot of hyperlinks around which reference the imagemap program via
/cgi-bin/imagemap/path/to/page.map. Under Apache this has to read just /path/to/page.map.
- Solution:
- We use a global rule to remove the prefix on-the-fly for all requests:
RewriteEngine on
RewriteRule ^/cgi-bin/imagemap(.*) $1 [PT]
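A Python sketch of the prefix stripping (hypothetical function name). It only models the substitution; the [PT] flag, which passes the result on to the other URL-translation modules, has no counterpart here.

```python
import re


def strip_imagemap_prefix(path):
    """Mirror of RewriteRule ^/cgi-bin/imagemap(.*) $1; other paths pass unchanged."""
    m = re.search(r"^/cgi-bin/imagemap(.*)", path)
    return m.group(1) if m else path


print(strip_imagemap_prefix("/cgi-bin/imagemap/path/to/page.map"))
print(strip_imagemap_prefix("/path/to/page.map"))
```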
Search pages in more than one directory
- Description:
- Sometimes it is necessary to let the webserver search for pages in more than one
directory. Here MultiViews or other techniques cannot help.
- Solution:
- We program an explicit ruleset which searches for the files in the directories.
RewriteEngine on
# first try to find it in dir1/...
# ...and if found stop and be happy:
RewriteCond /your/docroot/dir1/%{REQUEST_FILENAME} -f
RewriteRule ^(.+) /your/docroot/dir1/$1 [L]
# second try to find it in dir2/...
# ...and if found stop and be happy:
RewriteCond /your/docroot/dir2/%{REQUEST_FILENAME} -f
RewriteRule ^(.+) /your/docroot/dir2/$1 [L]
# else go on for other Alias or ScriptAlias directives,
# etc.
RewriteRule ^(.+) - [PT]
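The lookup order of this ruleset can be sketched in Python: try each directory in turn, return the first regular file found (the `-f` test plus [L]), and otherwise fall through, which corresponds to the final [PT] rule. The directory names and the function name are placeholders, not part of the original setup.

```python
import os
import tempfile


def search_dirs(request_filename, dirs):
    """Return the first existing file for the request, or None to fall through ([PT])."""
    for directory in dirs:
        candidate = os.path.join(directory, request_filename.lstrip("/"))
        if os.path.isfile(candidate):  # RewriteCond .../%{REQUEST_FILENAME} -f
            return candidate           # RewriteRule ... [L]
    return None                        # RewriteRule ^(.+) - [PT]


with tempfile.TemporaryDirectory() as docroot:
    dir1 = os.path.join(docroot, "dir1")
    dir2 = os.path.join(docroot, "dir2")
    os.mkdir(dir1)
    os.mkdir(dir2)
    for name in (os.path.join(dir1, "a.html"), os.path.join(dir2, "a.html"),
                 os.path.join(dir2, "b.html")):
        open(name, "w").close()
    print(search_dirs("/a.html", [dir1, dir2]))
    print(search_dirs("/b.html", [dir1, dir2]))
    print(search_dirs("/c.html", [dir1, dir2]))
```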
Set Environment Variables According To URL Parts
- Description:
- Perhaps you want to keep status information between requests and use the URL to encode
it. But you don't want to use a CGI wrapper for all pages just to strip out this
information.
- Solution:
- We use a rewrite rule to strip out the status information and remember it via an
environment variable which can later be dereferenced from within XSSI or CGI. This way a
URL /foo/S=java/bar/ gets translated to /foo/bar/ and the environment variable named
STATUS is set to the value "java".
RewriteEngine on
RewriteRule ^(.*)/S=([^/]+)/(.*) $1/$3 [E=STATUS:$2]
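In Python terms the rule splits the URL into three captures and recombines two of them, while the middle one becomes the value of STATUS. A hypothetical sketch that returns both the rewritten path and the variables that would be set:

```python
import re


def extract_status(path):
    """Mirror of RewriteRule ^(.*)/S=([^/]+)/(.*) $1/$3 [E=STATUS:$2].

    Returns the rewritten path and the environment variables that would be set.
    """
    m = re.search(r"^(.*)/S=([^/]+)/(.*)", path)
    if m is None:
        return path, {}
    return m.group(1) + "/" + m.group(3), {"STATUS": m.group(2)}


new_path, env = extract_status("/foo/S=java/bar/")
print(new_path, env)
```

Because the first `(.*)` is greedy, a URL with several `S=` segments would have only the last one stripped.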
Virtual User Hosts
- Description:
- Assume that you want to provide www.username.host.com for the homepage of username
via just DNS A records to the same machine and without any virtual hosts on this
machine.
- Solution:
- For HTTP/1.0 requests there is no solution, but for HTTP/1.1 requests which contain a
Host: HTTP header we can use the following ruleset to rewrite
http://www.username.host.com/anypath internally to /home/username/anypath:
RewriteEngine on
RewriteCond %{HTTP_HOST} ^www\.[^.]+\.host\.com$
RewriteRule ^(.+) %{HTTP_HOST}$1 [C]
RewriteRule ^www\.([^.]+)\.host\.com(.*) /home/$1$2
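The trick of this ruleset is the chain: the first rule glues %{HTTP_HOST} in front of the URL path so that the second rule can capture the username from the host name. A Python sketch of the same two steps (function name hypothetical):

```python
import re


def virtual_user_path(http_host, path):
    """Mirror of the chained ruleset; returns the internal path or None."""
    # RewriteCond %{HTTP_HOST} ^www\.[^.]+\.host\.com$
    if not re.search(r"^www\.[^.]+\.host\.com$", http_host):
        return None
    # RewriteRule ^(.+) %{HTTP_HOST}$1 [C]: glue host and path together ...
    m = re.search(r"^(.+)", path)
    if m is None:
        return None
    combined = http_host + m.group(1)
    # ... so the chained rule can pick the username out of the host name.
    m = re.search(r"^www\.([^.]+)\.host\.com(.*)", combined)
    return "/home/" + m.group(1) + m.group(2)


print(virtual_user_path("www.quux.host.com", "/anypath"))
```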
Redirect Homedirs For Foreigners
- Description:
- We want to redirect homedir URLs to another webserver, www.somewhere.com, when the
requesting user is not located in the local domain ourdomain.com. This is sometimes
used in virtual host contexts.
- Solution:
- Just a rewrite condition:
RewriteEngine on
RewriteCond %{REMOTE_HOST} !^.+\.ourdomain\.com$
RewriteRule ^(/~.+) http://www.somewhere.com$1 [R,L]
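Here is a Python sketch of the decision (function name hypothetical). It joins the host and the captured path directly, since the captured path already starts with a slash. One caveat worth checking on your server: %{REMOTE_HOST} contains a host name only when Apache has actually resolved the client address (for example with HostnameLookups enabled); otherwise it holds the IP address, and the condition would treat every client as foreign.

```python
import re


def foreign_redirect(remote_host, path):
    """Return the redirect target for a homedir URL requested from outside ourdomain.com."""
    # RewriteCond %{REMOTE_HOST} !^.+\.ourdomain\.com$
    if re.search(r"^.+\.ourdomain\.com$", remote_host):
        return None
    m = re.search(r"^(/~.+)", path)
    # The captured path already begins with "/", so it is appended to the host directly.
    return "http://www.somewhere.com" + m.group(1) if m else None


print(foreign_redirect("dialup.example.net", "/~quux/"))
print(foreign_redirect("client.ourdomain.com", "/~quux/"))
```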
Redirect Failing URLs To Other Webserver
- Description:
- A typical FAQ about URL rewriting is how to redirect failing requests on webserver A
to webserver B. Usually this is done via ErrorDocument CGI scripts in Perl, but there is
also a mod_rewrite solution. Notice, however, that this performs worse than using an
ErrorDocument CGI script!
- Solution:
- The first solution has the best performance but less flexibility, and it is less
robust:
RewriteEngine on
RewriteCond /your/docroot/%{REQUEST_FILENAME} !-f
RewriteRule ^(.+) http://webserverB.dom$1
The problem here is that this will only work for pages inside the DocumentRoot. While
you can add more conditions (for instance to also handle homedirs, etc.) there is a
better variant:
RewriteEngine on
RewriteCond %{REQUEST_URI} !-U
RewriteRule ^(.+) http://webserverB.dom$1
This uses the URL look-ahead feature of mod_rewrite. The result is that this will
work for all types of URLs and is safe. But it has a performance impact on the
webserver, because every request causes one more internal subrequest. So, if your
webserver runs on a powerful CPU, use this one. If it is a slow machine, use the first
approach or, better, an ErrorDocument CGI script.
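The first variant is easy to model in Python (hypothetical function name): test whether the file exists below the document root and, if not, build the redirect URL for webserver B. The look-ahead variant cannot be sketched this simply, since it depends on Apache running a full internal subrequest for the URL.

```python
import os
import tempfile


def failover_redirect(docroot, path, other="http://webserverB.dom"):
    """Mirror of the first variant: redirect when the file is missing under docroot."""
    # RewriteCond /your/docroot/%{REQUEST_FILENAME} !-f
    if os.path.isfile(os.path.join(docroot, path.lstrip("/"))):
        return None
    # The captured path already begins with "/", so it is appended to the host directly.
    return other + path


with tempfile.TemporaryDirectory() as docroot:
    open(os.path.join(docroot, "here.html"), "w").close()
    print(failover_redirect(docroot, "/here.html"))
    print(failover_redirect(docroot, "/gone.html"))
```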
Extended Redirection
- Description:
- Sometimes we need more control (concerning the character escaping mechanism) of URLs
on redirects. Usually the Apache kernel's URL escape function also escapes anchors, i.e.
URLs like "url#anchor". You cannot use this directly on redirects with
mod_rewrite because the uri_escape() function of Apache would also escape the hash
character. How can we redirect to such a URL?
- Solution:
- We have to use a kludge: an NPH-CGI script which does the redirect itself, because no
escaping is done there (NPH = non-parseable headers). First we introduce a new URL scheme
xredirect: by the following per-server config line (it should be one of the last
rewrite rules):
RewriteRule ^xredirect:(.+) /path/to/nph-xredirect.cgi/$1 \
[T=application/x-httpd-cgi,L]
This forces all URLs prefixed with xredirect: to be piped through the nph-xredirect.cgi
program, which looks like this:
#!/path/to/perl
##
## nph-xredirect.cgi -- NPH/CGI script for extended redirects
## Copyright (c) 1997 Ralf S. Engelschall, All Rights Reserved.
##
$| = 1;
# the target URL arrives as PATH_INFO, which always starts with a
# slash (e.g. "/news:newsgroup"), so strip that slash first
$url = $ENV{'PATH_INFO'};
$url =~ s|^/||;
# as an NPH script we send the complete response ourselves, including
# the status line; HTTP header lines end in CRLF
print "HTTP/1.0 302 Moved Temporarily\r\n";
print "Server: $ENV{'SERVER_SOFTWARE'}\r\n";
print "Location: $url\r\n";
print "Content-type: text/html\r\n";
print "\r\n";
print "<html>\n";
print "<head>\n";
print "<title>302 Moved Temporarily (EXTENDED)</title>\n";
print "</head>\n";
print "<body>\n";
print "<h1>Moved Temporarily (EXTENDED)</h1>\n";
print "The document has moved <a HREF=\"$url\">here</a>.<p>\n";
print "</body>\n";
print "</html>\n";
##EOF##
This provides you with the functionality to do redirects to all URL schemes, i.e.
including those which are not directly accepted by mod_rewrite. For instance, you can
now also redirect to news:newsgroup via
RewriteRule ^anyurl xredirect:news:newsgroup
Notice: You must not add [R] or [R,L] to the above rule, because the xredirect: URL
needs to be expanded later by our special "pipe through" rule above.
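Python's `urllib.parse.quote` is of course not Apache's escape function, but it shows the effect described above: once `#` is escaped to `%23`, the browser no longer sees an anchor. The target URL is a placeholder.

```python
from urllib.parse import quote

target = "http://www.example.com/page.html#section2"  # placeholder target URL

# Escaping the whole URL turns "#" into "%23", so the fragment is lost.
print(quote(target, safe="/:"))    # http://www.example.com/page.html%23section2
# Treating "#" as safe leaves the anchor intact, which is what the NPH script
# achieves by writing the Location header itself.
print(quote(target, safe="/:#"))   # http://www.example.com/page.html#section2
```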
- Continue: Part 2, Part 3, Part 4