Apache 1.3
URL Rewriting Guide
Archive Access Multiplexer
- Description:
- Do you know the great CPAN (Comprehensive Perl Archive Network) under http://www.perl.com/CPAN? It redirects to
one of several FTP servers around the world which carry a CPAN mirror, choosing one that is
reasonably close to the location of the requesting client. This can be called an
FTP access multiplexing service. While CPAN runs via CGI scripts, how can a similar
approach be implemented via mod_rewrite?
- Solution:
- First we notice that from version 3.0.0 mod_rewrite can also use the "ftp:"
scheme on redirects. And second, the location approximation can be done by a rewritemap
over the top-level domain of the client. With a tricky chained ruleset we can use this
top-level domain as a key to our multiplexing map.
RewriteEngine on
RewriteMap multiplex txt:/path/to/map.cxan
RewriteRule ^/CxAN/(.*) %{REMOTE_HOST}::$1 [C]
RewriteRule ^.+\.([a-zA-Z]+)::(.*)$ ${multiplex:$1|ftp://ftp.default.dom/CxAN/}$2 [R,L]
##
## map.cxan -- Multiplexing Map for CxAN
##
de ftp://ftp.cxan.de/CxAN/
uk ftp://ftp.cxan.uk/CxAN/
com ftp://ftp.cxan.com/CxAN/
:
##EOF##
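As a rough illustration of what the chained ruleset computes, the following Python sketch replays the two rules on a request. It is a trace under stated assumptions, not mod_rewrite itself: the function name `multiplex`, the client host names, and the fallback mirror URL (written in the same form as the map entries) are assumptions made for this example.

```python
import re

# Entries copied from map.cxan above.
MULTIPLEX = {
    "de": "ftp://ftp.cxan.de/CxAN/",
    "uk": "ftp://ftp.cxan.uk/CxAN/",
    "com": "ftp://ftp.cxan.com/CxAN/",
}
# Assumption: a fallback mirror written like the map entries.
DEFAULT_MIRROR = "ftp://ftp.default.dom/CxAN/"


def multiplex(remote_host, url):
    """Return the redirect target for url, or None if the chain does not match."""
    # Rule 1: prefix the remaining path with the client host name.
    m = re.match(r"^/CxAN/(.*)", url)
    if m is None:
        return None
    chained = f"{remote_host}::{m.group(1)}"
    # Rule 2: pull out the top-level domain and use it as the map key.
    m = re.match(r"^.+\.([a-zA-Z]+)::(.*)$", chained)
    if m is None:
        return None  # e.g. the client has no resolvable name, only an IP address
    tld, rest = m.group(1), m.group(2)
    return MULTIPLEX.get(tld, DEFAULT_MIRROR) + rest


print(multiplex("dialin42.provider.de", "/CxAN/modules/index.html"))
```

Note that a client known only by its IP address does not match the second rule, because the pattern requires an alphabetic top-level domain.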
Time-Dependent Rewriting
- Description:
- When tricks like time-dependent content are wanted, a lot of webmasters still use
CGI scripts which, for instance, redirect to specialized pages. How can it be done via
mod_rewrite?
- Solution:
- There are a lot of variables named TIME_xxx for rewrite conditions. In
conjunction with the special lexicographic comparison patterns <STRING, >STRING
and =STRING we can do time-dependent redirects:
RewriteEngine on
RewriteCond %{TIME_HOUR}%{TIME_MIN} >0700
RewriteCond %{TIME_HOUR}%{TIME_MIN} <1900
RewriteRule ^foo\.html$ foo.day.html
RewriteRule ^foo\.html$ foo.night.html
This provides the content of foo.day.html under the URL foo.html
between 07:00 and 19:00 (both endpoints excluded, because the comparisons are strict)
and the content of foo.night.html at the remaining time.
Just a nice feature for a homepage...
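The comparison works because the concatenated hour and minute form a fixed-width, zero-padded string, so lexicographic order agrees with chronological order. A minimal Python sketch of the same decision, where the function name `page_for` is an assumption made for this example:

```python
def page_for(hour, minute):
    """Pick the page variant the way the two RewriteCond lines do."""
    # Same shape as %{TIME_HOUR}%{TIME_MIN}: four zero-padded digits.
    now = f"{hour:02d}{minute:02d}"
    # Plain string comparison, as with the >STRING and <STRING patterns.
    if now > "0700" and now < "1900":
        return "foo.day.html"
    return "foo.night.html"


for h, m in [(7, 0), (7, 1), (18, 59), (19, 0)]:
    print(f"{h:02d}:{m:02d}", page_for(h, m))
```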
Backward Compatibility for YYYY to XXXX migration
- Description:
- How can we make URLs backward compatible (still existing virtually) after migrating
document.YYYY to document.XXXX, e.g. after translating a bunch of .html files to .phtml?
- Solution:
- We just rewrite the name to its basename and test for existence of the new extension.
If it exists, we take that name, else we rewrite the URL to its original state.
# backward compatibility ruleset for
# rewriting document.html to document.phtml
# when and only when document.phtml exists
# but no longer document.html
RewriteEngine on
RewriteBase /~quux/
# parse out basename, but remember the fact
RewriteRule ^(.*)\.html$ $1 [C,E=WasHTML:yes]
# rewrite to document.phtml if exists
RewriteCond %{REQUEST_FILENAME}.phtml -f
RewriteRule ^(.*)$ $1.phtml [S=1]
# else reverse the previous basename cutout
RewriteCond %{ENV:WasHTML} ^yes$
RewriteRule ^(.*)$ $1.html
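The control flow of this ruleset (strip the extension, test for the new file, otherwise restore the old name) can be sketched in Python. This is only a model: the function name `resolve` and the set standing in for the filesystem are assumptions made for this example.

```python
def resolve(name, existing_files):
    """Return the file name that would be served for a requested name."""
    if not name.endswith(".html"):
        # First rule does not match, so the chained rule is skipped and
        # WasHTML is never set: the name passes through unchanged.
        return name
    base = name[: -len(".html")]  # parse out the basename
    if base + ".phtml" in existing_files:
        return base + ".phtml"    # new extension exists: use it
    return base + ".html"         # else reverse the basename cutout


files = {"report.phtml", "legacy.html"}
print(resolve("report.html", files))
print(resolve("legacy.html", files))
```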
Content Handling
From Old to New (internal)
- Description:
- Assume we have recently renamed the page foo.html to bar.html
and now want to provide the old URL for backward compatibility. Ideally, users of the
old URL should not even notice that the page was renamed.
- Solution:
- We rewrite the old URL to the new one internally via the following rule:
RewriteEngine on
RewriteBase /~quux/
RewriteRule ^foo\.html$ bar.html
From Old to New (external)
- Description:
- Assume again that we have recently renamed the page foo.html to bar.html
and now want to provide the old URL for backward compatibility. But this time we want
the users of the old URL to be pointed to the new one, i.e. their browser's Location
field should change, too.
- Solution:
- We force an HTTP redirect to the new URL, which changes the URL shown by the browser
and thus the user's view:
RewriteEngine on
RewriteBase /~quux/
RewriteRule ^foo\.html$ bar.html [R]
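The difference between the internal and the external variant is what the client gets back. The following Python sketch contrasts the two; it is a simplified model, and the function names and the host name `www.example.com` are assumptions made for this example. The 302 status reflects the default of the [R] flag when no explicit code is given.

```python
OLD = "/~quux/foo.html"
NEW = "/~quux/bar.html"


def serve_internal(path):
    """No flag: the server silently serves the new file under the old URL."""
    served = NEW if path == OLD else path
    return {"status": 200, "headers": {}, "served_file": served}


def serve_external(path, host="www.example.com"):
    """[R] flag: the client is told to fetch the new URL itself."""
    if path == OLD:
        location = f"http://{host}{NEW}"
        return {"status": 302, "headers": {"Location": location}, "served_file": None}
    return {"status": 200, "headers": {}, "served_file": path}


print(serve_internal(OLD))
print(serve_external(OLD))
```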
Browser-Dependent Content
- Description:
- At least for important top-level pages it is sometimes necessary to provide the
optimum of browser-dependent content, i.e. one has to provide a maximum version for the
latest Netscape variants, a minimum version for the Lynx browsers and an average feature
version for all others.
- Solution:
- We cannot use content negotiation because the browsers do not provide their type in
that form. Instead we have to act on the HTTP header "User-Agent". The
following config does the following: If the HTTP header "User-Agent" begins
with "Mozilla/3", the page foo.html is rewritten to foo.NS.html
and the rewriting stops. If the browser is "Lynx" or "Mozilla"
of version 1 or 2, the URL becomes foo.20.html. All other browsers receive
page foo.32.html. This is done by the following ruleset:
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/3.*
RewriteRule ^foo\.html$ foo.NS.html [L]
RewriteCond %{HTTP_USER_AGENT} ^Lynx/.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[12].*
RewriteRule ^foo\.html$ foo.20.html [L]
RewriteRule ^foo\.html$ foo.32.html [L]
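To check which variant a given browser would receive, the ruleset can be replayed in Python with the same patterns. A minimal sketch: the function name `variant_for` and the sample User-Agent strings are assumptions made for this example.

```python
import re


def variant_for(user_agent):
    """Return the page variant chosen for a User-Agent header value."""
    # First rule: Netscape 3 gets the full-featured page, then [L] stops.
    if re.match(r"^Mozilla/3.*", user_agent):
        return "foo.NS.html"
    # Second rule: the two conditions are joined by [OR].
    if re.match(r"^Lynx/.*", user_agent) or re.match(r"^Mozilla/[12].*", user_agent):
        return "foo.20.html"
    # Third rule: unconditional fallback for everybody else.
    return "foo.32.html"


for ua in ["Mozilla/3.01 (X11; I; Linux)", "Lynx/2.8", "Mozilla/2.02", "SomeBot/1.0"]:
    print(ua, "->", variant_for(ua))
```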
Dynamic Mirror
- Description:
- Assume there are nice webpages on remote hosts we want to bring into our namespace.
For FTP servers we would use the mirror program, which maintains an
explicit up-to-date copy of the remote data on the local machine. For a webserver we
could use the program webcopy, which acts similarly via HTTP. But both
techniques have one major drawback: the local copy is only as up to date as the last
run of the program. It would be much better if the mirror were not a static one we have to
establish explicitly. Instead we want a dynamic mirror whose data gets updated
automatically when needed (i.e. when the data on the remote host changes).
- Solution:
- To provide this feature we map the remote webpage or even the complete remote webarea
to our namespace by the use of the Proxy Throughput feature (flag [P]):
RewriteEngine on
RewriteBase /~quux/
RewriteRule ^hotsheet/(.*)$ http://www.tstimpreso.com/hotsheet/$1 [P]
|
RewriteEngine on
RewriteBase /~quux/
RewriteRule ^usa-news\.html$ http://www.quux-corp.com/news/index.html [P]
Reverse Dynamic Mirror
- Description:
- ...
- Solution:
-
RewriteEngine on
RewriteCond /mirror/of/remotesite/$1 -U
RewriteRule ^http://www\.remotesite\.com/(.*)$ /mirror/of/remotesite/$1
Retrieve Missing Data from Intranet
- Description:
- This is a tricky way of virtually running a corporate (external) Internet webserver
(www.quux-corp.dom), while actually keeping and maintaining its data on an (internal)
Intranet webserver (www2.quux-corp.dom)
which is protected by a firewall. The trick is that on the external webserver we
retrieve the requested data on-the-fly from the internal one.
- Solution:
- First, we have to make sure that our firewall still protects the internal webserver
and that only the external webserver is allowed to retrieve data from it. For a
packet-filtering firewall we could for instance configure a firewall ruleset like the
following:
ALLOW Host www.quux-corp.dom Port >1024 --> Host www2.quux-corp.dom Port 80
DENY Host * Port * --> Host www2.quux-corp.dom Port 80
Just adjust it to your actual configuration syntax. Now we can establish the
mod_rewrite rules which request the missing data in the background through the proxy
throughput feature:
RewriteRule ^/~([^/]+)/?(.*) /home/$1/.www/$2
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^/home/([^/]+)/\.www/?(.*) http://www2.quux-corp.dom/~$1/pub/$2 [P]
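The decision these rules encode (serve the local file if it exists, otherwise fetch it from the internal server) can be sketched in Python. This is only a model of the mapping: the function name `locate`, the user name, and the set standing in for local files and directories are assumptions made for this example.

```python
import re


def locate(url, local_paths):
    """Return a local filename, or the internal URL to proxy the request to."""
    m = re.match(r"^/~([^/]+)/?(.*)", url)
    if m is None:
        return url  # not a user homedir URL: left untouched
    user, rest = m.group(1), m.group(2)
    filename = f"/home/{user}/.www/{rest}"
    if filename in local_paths:
        return filename  # exists locally (the !-f / !-d conditions fail)
    # Missing locally: hand the request to the internal server via the proxy.
    return f"http://www2.quux-corp.dom/~{user}/pub/{rest}"


local = {"/home/alice/.www/index.html"}
print(locate("/~alice/index.html", local))
print(locate("/~alice/report.pdf", local))
```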
Load Balancing
- Description:
- Suppose we want to load balance the traffic to
www.foo.com over www[0-5].foo.com
(a total of 6 servers). How can this be done?
- Solution:
- There are a lot of possible solutions for this problem. We will discuss first a
commonly known DNS-based variant and then the special one with mod_rewrite:
- DNS Round-Robin
The simplest method for load-balancing is to use the DNS round-robin feature of
BIND. Here you just configure www[0-5].foo.com as usual in your DNS
with A (address) records, e.g.
www0 IN A 1.2.3.1
www1 IN A 1.2.3.2
www2 IN A 1.2.3.3
www3 IN A 1.2.3.4
www4 IN A 1.2.3.5
www5 IN A 1.2.3.6
Then you additionally add the following entry:
www IN CNAME www0.foo.com.
IN CNAME www1.foo.com.
IN CNAME www2.foo.com.
IN CNAME www3.foo.com.
IN CNAME www4.foo.com.
IN CNAME www5.foo.com.
Notice that this seems wrong, but is actually an intended feature of BIND and can
be used in this way. Now, when www.foo.com gets resolved, BIND
gives out www0-www5, but in a slightly permuted/rotated order every
time. This way the clients are spread over the various servers. But notice that this
is not a perfect load balancing scheme, because DNS resolve information gets cached by
the other nameservers on the net, so once a client has resolved www.foo.com
to a particular wwwN.foo.com, all subsequent requests also go to this
particular name wwwN.foo.com. But the final result is ok, because the
total sum of the requests is really spread over the various webservers.
- DNS Load-Balancing
A sophisticated DNS-based method for load-balancing is to use the program lbnamed
which can be found at http://www.stanford.edu/~schemers/docs/lbnamed/lbnamed.html.
It is a Perl 5 program which, in conjunction with auxiliary tools, provides real
load-balancing for DNS.
- Proxy Throughput Round-Robin
In this variant we use mod_rewrite and its proxy throughput feature. First we
dedicate www0.foo.com to be actually www.foo.com by using
a single
www IN CNAME www0.foo.com.
entry in the DNS. Then we convert www0.foo.com to a proxy-only
server, i.e. we configure this machine so all arriving URLs are just pushed through
the internal proxy to one of the 5 other servers (www1-www5). To
accomplish this we first establish a ruleset which contacts a load balancing script lb.pl
for all URLs.
RewriteEngine on
RewriteMap lb prg:/path/to/lb.pl
RewriteRule ^/(.+)$ ${lb:$1} [P,L]
Then we write lb.pl:
#!/path/to/perl
##
## lb.pl -- load balancing script
##
$| = 1;               # unbuffered output, so each answer reaches Apache at once
$name = "www";        # the hostname base
$first = 1;           # the first server (not 0 here, because 0 is myself)
$last = 5;            # the last server in the round-robin
$domain = "foo.com";  # the domainname
$cnt = 0;
# every input line is one lookup key (the URL path); answer with one line each
while (<STDIN>) {
    $cnt = (($cnt+1) % ($last+1-$first));
    $server = sprintf("%s%d.%s", $name, $cnt+$first, $domain);
    print "http://$server/$_";
}
##EOF##
A final note: why is this useful? Doesn't www0.foo.com still get
overloaded? Yes, it carries the whole load, but only as plain proxy throughput
requests. All SSI, CGI, ePerl, etc. processing is done entirely on the other
machines. This is the essential point.
- Hardware/TCP Round-Robin
There is a hardware solution available, too. Cisco has a beast called
LocalDirector which does load balancing at the TCP/IP level. Actually this is some
sort of circuit-level gateway in front of a webcluster. If you have enough money
and really need a solution with high performance, use this one.
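To see the order in which the Proxy Throughput Round-Robin variant hands out servers, the counter logic of lb.pl can be ported to Python and traced. This is a sketch for illustration, not a replacement for the map program: the function name `make_balancer` is an assumption made for this example, and the domain is taken from the www.foo.com setup described above.

```python
def make_balancer(name="www", first=1, last=5, domain="foo.com"):
    """Return a function that maps a URL path to the next backend URL."""
    cnt = 0

    def pick(path):
        nonlocal cnt
        # Same arithmetic as lb.pl: advance, wrap around the pool size.
        cnt = (cnt + 1) % (last + 1 - first)
        return f"http://{name}{cnt + first}.{domain}/{path}"

    return pick


pick = make_balancer()
for _ in range(6):
    print(pick("index.html"))
```

Because the counter is advanced before it is used, the first request goes to www2 and www1 is reached last in each cycle; all five backends are still used equally often.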
New MIME-type, New Service
- Description:
- On the net there are a lot of nifty CGI programs. But using them is usually tedious,
so a lot of webmasters don't use them. Even Apache's Action handler feature for
MIME-types is only appropriate when the CGI programs don't need special URLs (actually
PATH_INFO and QUERY_STRING) as their input. First, let us configure a new file type
with extension .scgi (for secure CGI) which will be processed by the
popular cgiwrap program. The problem here is that if, for instance, we use a
Homogeneous URL Layout (see above), a file inside the user homedirs has the URL /u/user/foo/bar.scgi.
But cgiwrap needs the URL in the form /~user/foo/bar.scgi/.
The following rule solves the problem:
RewriteRule ^/[uge]/([^/]+)/\.www/(.+)\.scgi(.*) ...
... /internal/cgi/user/cgiwrap/~$1/$2.scgi$3 [NS,T=application/x-httpd-cgi]
Or assume we have some more nifty programs: wwwlog (which displays the access.log
for a URL subtree) and wwwidx (which runs Glimpse on a URL subtree). We have
to provide the URL area to these programs so they know which area they have to act
on. But usually this is ugly, because they are always requested from within those very
areas, i.e. typically we would run the wwwidx program from within /u/user/foo/
via a hyperlink to
/internal/cgi/user/wwwidx?i=/u/user/foo/
which is ugly, because we have to hard-code both the location of the
area and the location of the CGI inside the hyperlink. When we have to
reorganise the area, we spend a lot of time changing the various hyperlinks.
- Solution:
- The solution here is to provide a special new URL format which automatically leads to
the proper CGI invocation. We configure the following:
RewriteRule ^/([uge])/([^/]+)(/?.*)/\* /internal/cgi/user/wwwidx?i=/$1/$2$3/
RewriteRule ^/([uge])/([^/]+)(/?.*):log /internal/cgi/user/wwwlog?f=/$1/$2$3
Now the hyperlink to search at /u/user/foo/ reads only
HREF="*"
which internally gets automatically transformed to
/internal/cgi/user/wwwidx?i=/u/user/foo/
The same approach leads to an invocation for the access log CGI program when the
hyperlink :log gets used.
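The two shortcut rules can be checked by replaying their patterns in Python. A minimal sketch, assuming the relative hyperlinks have already been resolved by the browser against /u/user/foo/; the function name `expand_shortcut` is an assumption made for this example.

```python
import re


def expand_shortcut(url):
    """Apply the "*" and ":log" shortcut rules to a requested URL."""
    # Rule 1: a trailing "/*" invokes the index/search CGI for that area.
    m = re.match(r"^/([uge])/([^/]+)(/?.*)/\*", url)
    if m:
        return f"/internal/cgi/user/wwwidx?i=/{m.group(1)}/{m.group(2)}{m.group(3)}/"
    # Rule 2: a trailing ":log" invokes the access log CGI for that area.
    m = re.match(r"^/([uge])/([^/]+)(/?.*):log", url)
    if m:
        return f"/internal/cgi/user/wwwlog?f=/{m.group(1)}/{m.group(2)}{m.group(3)}"
    return url  # everything else is left alone


print(expand_shortcut("/u/user/foo/*"))
print(expand_shortcut("/u/user/foo/:log"))
```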
From Static to Dynamic
- Description:
- How can we transform a static page foo.html into a dynamic variant foo.cgi
in a seamless way, i.e. without the browser/user noticing?
- Solution:
- We just rewrite the URL to the CGI-script and force the correct MIME-type so it
really gets run as a CGI-script. This way a request to
/~quux/foo.html
internally leads to the invocation of /~quux/foo.cgi.
RewriteEngine on
RewriteBase /~quux/
RewriteRule ^foo\.html$ foo.cgi [T=application/x-httpd-cgi]
Disclaimer: This
documentation is provided only for the benefits of our hosting customers.
For authoritative source of the documentation, please refer to http://httpd.apache.org/docs/