Web site hosting and domain registration services by Active-Venture.com
  

Apache 1.3
URL Rewriting Guide

Archive Access Multiplexer

Description:
Do you know the great CPAN (Comprehensive Perl Archive Network) under http://www.perl.com/CPAN? This does a redirect to one of several FTP servers around the world which carry a CPAN mirror and is approximately near the location of the requesting client. Actually this can be called an FTP access multiplexing service. While CPAN runs via CGI scripts, how can a similar approach implemented via mod_rewrite?
Solution:
First we notice that from version 3.0.0 mod_rewrite can also use the "ftp:" scheme on redirects. And second, the location approximation can be done by a rewritemap over the top-level domain of the client. With a tricky chained ruleset we can use this top-level domain as a key to our multiplexing map.
RewriteEngine on
RewriteMap    multiplex                txt:/path/to/map.cxan
RewriteRule   ^/CxAN/(.*)              %{REMOTE_HOST}::$1                 [C]
RewriteRule   ^.+\.([a-zA-Z]+)::(.*)$  ${multiplex:$1|ftp.default.dom}$2  [R,L]
##
##  map.cxan -- Multiplexing Map for CxAN
##

de        ftp://ftp.cxan.de/CxAN/
uk        ftp://ftp.cxan.uk/CxAN/
com       ftp://ftp.cxan.com/CxAN/
 :
##EOF##

Time-Dependend Rewriting

Description:
When tricks like time-dependend content should happen a lot of webmasters still use CGI scripts which do for instance redirects to specialized pages. How can it be done via mod_rewrite?
Solution:
There are a lot of variables named TIME_xxx for rewrite conditions. In conjunction with the special lexicographic comparison patterns <STRING, >STRING and =STRING we can do time-dependend redirects:
RewriteEngine on
RewriteCond   %{TIME_HOUR}%{TIME_MIN} >0700
RewriteCond   %{TIME_HOUR}%{TIME_MIN} <1900
RewriteRule   ^foo\.html$             foo.day.html
RewriteRule   ^foo\.html$             foo.night.html

This provides the content of foo.day.html under the URL foo.html from 07:00-19:00 and at the remaining time the contents of foo.night.html. Just a nice feature for a homepage...

Backward Compatibility for YYYY to XXXX migration

Description:
How can we make URLs backward compatible (still existing virtually) after migrating document.YYYY to document.XXXX, e.g. after translating a bunch of .html files to .phtml?
Solution:
We just rewrite the name to its basename and test for existence of the new extension. If it exists, we take that name, else we rewrite the URL to its original state.
#   backward compatibility ruleset for 
#   rewriting document.html to document.phtml
#   when and only when document.phtml exists
#   but no longer document.html
RewriteEngine on
RewriteBase   /~quux/
#   parse out basename, but remember the fact
RewriteRule   ^(.*)\.html$              $1      [C,E=WasHTML:yes]
#   rewrite to document.phtml if exists
RewriteCond   %{REQUEST_FILENAME}.phtml -f
RewriteRule   ^(.*)$ $1.phtml                   [S=1]
#   else reverse the previous basename cutout
RewriteCond   %{ENV:WasHTML}            ^yes$
RewriteRule   ^(.*)$ $1.html

Content Handling

From Old to New (intern)

Description:
Assume we have recently renamed the page foo.html to bar.html and now want to provide the old URL for backward compatibility. Actually we want that users of the old URL even not recognize that the pages was renamed.
Solution:
We rewrite the old URL to the new one internally via the following rule:
RewriteEngine  on
RewriteBase    /~quux/
RewriteRule    ^foo\.html$  bar.html

From Old to New (extern)

Description:
Assume again that we have recently renamed the page foo.html to bar.html and now want to provide the old URL for backward compatibility. But this time we want that the users of the old URL get hinted to the new one, i.e. their browsers Location field should change, too.
Solution:
We force a HTTP redirect to the new URL which leads to a change of the browsers and thus the users view:
RewriteEngine  on
RewriteBase    /~quux/
RewriteRule    ^foo\.html$  bar.html  [R]

Browser Dependend Content

Description:
At least for important top-level pages it is sometimes necesarry to provide the optimum of browser dependend content, i.e. one has to provide a maximum version for the latest Netscape variants, a minimum version for the Lynx browsers and a average feature version for all others.
Solution:
We cannot use content negotiation because the browsers do not provide their type in that form. Instead we have to act on the HTTP header "User-Agent". The following condig does the following: If the HTTP header "User-Agent" begins with "Mozilla/3", the page foo.html is rewritten to foo.NS.html and and the rewriting stops. If the browser is "Lynx" or "Mozilla" of version 1 or 2 the URL becomes foo.20.html. All other browsers receive page foo.32.html. This is done by the following ruleset:
RewriteCond %{HTTP_USER_AGENT}  ^Mozilla/3.*
RewriteRule ^foo\.html$         foo.NS.html          [L]

RewriteCond %{HTTP_USER_AGENT}  ^Lynx/.*         [OR]
RewriteCond %{HTTP_USER_AGENT}  ^Mozilla/[12].*
RewriteRule ^foo\.html$         foo.20.html          [L]

RewriteRule ^foo\.html$         foo.32.html          [L]

Dynamic Mirror

Description:
Assume there are nice webpages on remote hosts we want to bring into our namespace. For FTP servers we would use the mirror program which actually maintains an explicit up-to-date copy of the remote data on the local machine. For a webserver we could use the program webcopy which acts similar via HTTP. But both techniques have one major drawback: The local copy is always just as up-to-date as often we run the program. It would be much better if the mirror is not a static one we have to establish explicitly. Instead we want a dynamic mirror with data which gets updated automatically when there is need (updated data on the remote host).
Solution:
To provide this feature we map the remote webpage or even the complete remote webarea to our namespace by the use of the Proxy Throughput feature (flag [P]):
RewriteEngine  on
RewriteBase    /~quux/
RewriteRule    ^hotsheet/(.*)$  http://www.tstimpreso.com/hotsheet/$1  [P]
RewriteEngine  on
RewriteBase    /~quux/
RewriteRule    ^usa-news\.html$   http://www.quux-corp.com/news/index.html  [P]

Reverse Dynamic Mirror

Description:
...
Solution:
RewriteEngine on
RewriteCond   /mirror/of/remotesite/$1           -U 
RewriteRule   ^http://www\.remotesite\.com/(.*)$ /mirror/of/remotesite/$1

Retrieve Missing Data from Intranet

Description:
This is a tricky way of virtually running a corporates (external) Internet webserver (www.quux-corp.dom), while actually keeping and maintaining its data on a (internal) Intranet webserver (www2.quux-corp.dom) which is protected by a firewall. The trick is that on the external webserver we retrieve the requested data on-the-fly from the internal one.
Solution:
First, we have to make sure that our firewall still protects the internal webserver and that only the external webserver is allowed to retrieve data from it. For a packet-filtering firewall we could for instance configure a firewall ruleset like the following:
ALLOW Host www.quux-corp.dom Port >1024 --> Host www2.quux-corp.dom Port 80  
DENY  Host *                 Port *     --> Host www2.quux-corp.dom Port 80

Just adjust it to your actual configuration syntax. Now we can establish the mod_rewrite rules which request the missing data in the background through the proxy throughput feature:

RewriteRule ^/~([^/]+)/?(.*)          /home/$1/.www/$2
RewriteCond %{REQUEST_FILENAME}       !-f
RewriteCond %{REQUEST_FILENAME}       !-d
RewriteRule ^/home/([^/]+)/.www/?(.*) http://www2.quux-corp.dom/~$1/pub/$2 [P]

Load Balancing

Description:
Suppose we want to load balance the traffic to www.foo.com over www[0-5].foo.com (a total of 6 servers). How can this be done?
Solution:
There are a lot of possible solutions for this problem. We will discuss first a commonly known DNS-based variant and then the special one with mod_rewrite:
  1. DNS Round-Robin

    The simplest method for load-balancing is to use the DNS round-robin feature of BIND. Here you just configure www[0-9].foo.com as usual in your DNS with A(address) records, e.g.

    www0   IN  A       1.2.3.1
    www1   IN  A       1.2.3.2
    www2   IN  A       1.2.3.3
    www3   IN  A       1.2.3.4
    www4   IN  A       1.2.3.5
    www5   IN  A       1.2.3.6
    

    Then you additionally add the following entry:

    www    IN  CNAME   www0.foo.com.
           IN  CNAME   www1.foo.com.
           IN  CNAME   www2.foo.com.
           IN  CNAME   www3.foo.com.
           IN  CNAME   www4.foo.com.
           IN  CNAME   www5.foo.com.
           IN  CNAME   www6.foo.com.
    

    Notice that this seems wrong, but is actually an intended feature of BIND and can be used in this way. However, now when www.foo.com gets resolved, BIND gives out www0-www6 - but in a slightly permutated/rotated order every time. This way the clients are spread over the various servers. But notice that this not a perfect load balancing scheme, because DNS resolve information gets cached by the other nameservers on the net, so once a client has resolved www.foo.com to a particular wwwN.foo.com, all subsequent requests also go to this particular name wwwN.foo.com. But the final result is ok, because the total sum of the requests are really spread over the various webservers.

  2. DNS Load-Balancing

    A sophisticated DNS-based method for load-balancing is to use the program lbnamed which can be found at http://www.stanford.edu/~schemers/docs/lbnamed/lbnamed.html. It is a Perl 5 program in conjunction with auxilliary tools which provides a real load-balancing for DNS.

  3. Proxy Throughput Round-Robin

    In this variant we use mod_rewrite and its proxy throughput feature. First we dedicate www0.foo.com to be actually www.foo.com by using a single

    www    IN  CNAME   www0.foo.com.
    

    entry in the DNS. Then we convert www0.foo.com to a proxy-only server, i.e. we configure this machine so all arriving URLs are just pushed through the internal proxy to one of the 5 other servers (www1-www5). To accomplish this we first establish a ruleset which contacts a load balancing script lb.pl for all URLs.

    RewriteEngine on
    RewriteMap    lb      prg:/path/to/lb.pl
    RewriteRule   ^/(.+)$ ${lb:$1}           [P,L]
    

    Then we write lb.pl:

    #!/path/to/perl
    ##
    ##  lb.pl -- load balancing script
    ##
    
    $| = 1;
    
    $name   = "www";     # the hostname base
    $first  = 1;         # the first server (not 0 here, because 0 is myself) 
    $last   = 5;         # the last server in the round-robin
    $domain = "foo.dom"; # the domainname
    
    $cnt = 0;
    while (<STDIN>) {
        $cnt = (($cnt+1) % ($last+1-$first));
        $server = sprintf("%s%d.%s", $name, $cnt+$first, $domain);
        print "http://$server/$_";
    }
    
    ##EOF##
    

    A last notice: Why is this useful? Seems like www0.foo.com still is overloaded? The answer is yes, it is overloaded, but with plain proxy throughput requests, only! All SSI, CGI, ePerl, etc. processing is completely done on the other machines. This is the essential point.

  4. Hardware/TCP Round-Robin

    There is a hardware solution available, too. Cisco has a beast called LocalDirector which does a load balancing at the TCP/IP level. Actually this is some sort of a circuit level gateway in front of a webcluster. If you have enough money and really need a solution with high performance, use this one.

New MIME-type, New Service

Description:
On the net there are a lot of nifty CGI programs. But their usage is usually boring, so a lot of webmaster don't use them. Even Apache's Action handler feature for MIME-types is only appropriate when the CGI programs don't need special URLs (actually PATH_INFO and QUERY_STRINGS) as their input. First, let us configure a new file type with extension .scgi (for secure CGI) which will be processed by the popular cgiwrap program. The problem here is that for instance we use a Homogeneous URL Layout (see above) a file inside the user homedirs has the URL /u/user/foo/bar.scgi. But cgiwrap needs the URL in the form /~user/foo/bar.scgi/. The following rule solves the problem:
RewriteRule ^/[uge]/([^/]+)/\.www/(.+)\.scgi(.*) ...
... /internal/cgi/user/cgiwrap/~$1/$2.scgi$3  [NS,T=application/x-http-cgi]

Or assume we have some more nifty programs: wwwlog (which displays the access.log for a URL subtree and wwwidx (which runs Glimpse on a URL subtree). We have to provide the URL area to these programs so they know on which area they have to act on. But usually this ugly, because they are all the times still requested from that areas, i.e. typically we would run the swwidx program from within /u/user/foo/ via hyperlink to

/internal/cgi/user/swwidx?i=/u/user/foo/

which is ugly. Because we have to hard-code both the location of the area and the location of the CGI inside the hyperlink. When we have to reorganise or area, we spend a lot of time changing the various hyperlinks.

Solution:
The solution here is to provide a special new URL format which automatically leads to the proper CGI invocation. We configure the following:
RewriteRule   ^/([uge])/([^/]+)(/?.*)/\*  /internal/cgi/user/wwwidx?i=/$1/$2$3/
RewriteRule   ^/([uge])/([^/]+)(/?.*):log /internal/cgi/user/wwwlog?f=/$1/$2$3

Now the hyperlink to search at /u/user/foo/ reads only

HREF="*"

which internally gets automatically transformed to

/internal/cgi/user/wwwidx?i=/u/user/foo/

The same approach leads to an invocation for the access log CGI program when the hyperlink :log gets used.

From Static to Dynamic

Description:
How can we transform a static page foo.html into a dynamic variant foo.cgi in a seamless way, i.e. without notice by the browser/user.
Solution:
We just rewrite the URL to the CGI-script and force the correct MIME-type so it gets really run as a CGI-script. This way a request to /~quux/foo.html internally leads to the invokation of /~quux/foo.cgi.
RewriteEngine  on
RewriteBase    /~quux/
RewriteRule    ^foo\.html$  foo.cgi  [T=application/x-httpd-cgi]

 

 

 

© 2005 Active-Venture.com Web Page Hosting Service

Buy domain name registration | Register cheap domain name | Domain registration services 

< A language that doesn't have everything is actually easier to program in than some that do.   >

 

 
 

Disclaimer: This documentation is provided only for the benefits of our hosting customers.
For authoritative source of the documentation, please refer to http://httpd.apache.org/docs/