Apache 1.3
URL Rewriting Guide
On-the-fly Content-Regeneration
- Description:
- Here comes a really esoteric feature: dynamically generated but statically served
pages. Such pages should be delivered as pure static pages (read from the filesystem and
just passed through), but the webserver has to generate them dynamically if they are
missing. This way you can have CGI-generated pages which are served statically until
someone (or a cronjob) removes the static file. Then the contents get regenerated.
- Solution:
- This is done via the following ruleset:
RewriteCond %{REQUEST_FILENAME} !-s
RewriteRule ^page\.html$ page.cgi [T=application/x-httpd-cgi,L]
Here a request to page.html leads to an internal run of the corresponding page.cgi
if page.html is missing or has a file size of zero. The trick is that page.cgi
is an ordinary CGI script which, in addition to writing to STDOUT, also writes its output to the file page.html.
Once it has run, the server sends out the data of page.html on subsequent requests. When the
webmaster wants to force a refresh of the contents, he just removes page.html
(usually done by a cronjob).
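As an illustration of the "write to STDOUT and to page.html" trick, here is a minimal Python sketch of what such a page.cgi could look like. It is not part of the original guide: render_page() is a hypothetical stand-in for the real page generation, and writing through a temporary file is an added assumption so that a half-written page.html is never served.

```python
import os
import sys
import tempfile


def render_page():
    # Hypothetical stand-in for the expensive dynamic generation step.
    return "<html><body><h1>Generated page</h1></body></html>\n"


def regenerate(static_path):
    """Generate the page, store it at static_path, and return the body."""
    body = render_page()
    directory = os.path.dirname(os.path.abspath(static_path))
    # Write to a temporary file first and rename it into place, so the
    # server never sees a partially written page.html.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp:
        tmp.write(body)
    os.chmod(tmp_path, 0o644)  # mkstemp creates the file with mode 0600
    os.replace(tmp_path, static_path)
    return body


def main(static_path="page.html", out=sys.stdout):
    body = regenerate(static_path)
    # Ordinary CGI response: header block, blank line, then the body.
    out.write("Content-type: text/html\n\n")
    out.write(body)
```

Installed as page.cgi, the script would end with a call to main(). The first request after page.html has been removed then runs the script, and later requests find a non-empty page.html and bypass the rule.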
Document With Autorefresh
- Description:
- Wouldn't it be nice while creating a complex webpage if the webbrowser would
automatically refresh the page every time we write a new version from within our editor?
Impossible?
- Solution:
- No! We just combine the MIME multipart feature, the webserver NPH feature and the URL
manipulation power of mod_rewrite. First, we establish a new URL feature: appending
:refresh to any URL causes the page to be refreshed every time it gets updated on the filesystem.
RewriteRule ^(/[uge]/[^/]+/?.*):refresh /internal/cgi/apache/nph-refresh?f=$1
Now when we reference the URL
/u/foo/bar/page.html:refresh
this leads to the internal invocation of the URL
/internal/cgi/apache/nph-refresh?f=/u/foo/bar/page.html
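To make the mapping concrete, the following Python sketch mimics what the rule does to a URL path. It only imitates the pattern matching with Python's re module and is not how mod_rewrite is implemented; the function name rewrite_refresh is made up for this illustration.

```python
import re

# Pattern and substitution prefix taken from the RewriteRule above.
REFRESH_PATTERN = re.compile(r"^(/[uge]/[^/]+/?.*):refresh")
REFRESH_TARGET = "/internal/cgi/apache/nph-refresh?f="


def rewrite_refresh(url_path):
    """Return the rewritten URL, or the input unchanged if the rule does not match."""
    match = REFRESH_PATTERN.search(url_path)
    if match is None:
        return url_path
    # On a match the whole URL is replaced by the substitution string,
    # with $1 expanded to the first captured group.
    return REFRESH_TARGET + match.group(1)


if __name__ == "__main__":
    print(rewrite_refresh("/u/foo/bar/page.html:refresh"))
```

URLs outside the /u/, /g/ and /e/ areas, or without the :refresh suffix, pass through unchanged.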
The only missing part is the NPH-CGI script. Although one would usually say
"left as an exercise to the reader" ;-) I will provide this, too.
#!/sw/bin/perl
##
## nph-refresh -- NPH/CGI script for auto refreshing pages
## Copyright (c) 1997 Ralf S. Engelschall, All Rights Reserved.
##
$| = 1;
# split the QUERY_STRING variable
@pairs = split(/&/, $ENV{'QUERY_STRING'});
foreach $pair (@pairs) {
($name, $value) = split(/=/, $pair);
$name =~ tr/A-Z/a-z/;
$name = 'QS_' . $name;
$value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
# assign through a symbolic reference instead of eval, so that
# request data is never executed as Perl code
${$name} = $value;
}
$QS_s = 1 if ($QS_s eq '');
$QS_n = 3600 if ($QS_n eq '');
if ($QS_f eq '') {
print "HTTP/1.0 200 OK\n";
print "Content-type: text/html\n\n";
print "<b>ERROR</b>: No file given\n";
exit(0);
}
if (! -f $QS_f) {
print "HTTP/1.0 200 OK\n";
print "Content-type: text/html\n\n";
print "<b>ERROR</b>: File $QS_f not found\n";
exit(0);
}
sub print_http_headers_multipart_begin {
print "HTTP/1.0 200 OK\n";
$bound = "ThisRandomString12345";
print "Content-type: multipart/x-mixed-replace;boundary=$bound\n";
&print_http_headers_multipart_next;
}
sub print_http_headers_multipart_next {
print "\n--$bound\n";
}
sub print_http_headers_multipart_end {
print "\n--$bound--\n";
}
sub displayhtml {
local($buffer) = @_;
$len = length($buffer);
print "Content-type: text/html\n";
print "Content-length: $len\n\n";
print $buffer;
}
sub readfile {
local($file) = @_;
local(*FP, $size, $buffer, $bytes);
($x, $x, $x, $x, $x, $x, $x, $size) = stat($file);
$size = sprintf("%d", $size);
open(FP, "<$file");
$bytes = sysread(FP, $buffer, $size);
close(FP);
return $buffer;
}
$buffer = &readfile($QS_f);
&print_http_headers_multipart_begin;
&displayhtml($buffer);
sub mystat {
local($file) = $_[0];
local($mtime);
# element 9 of the stat() result is the last modification time
($x, $x, $x, $x, $x, $x, $x, $x, $x, $mtime) = stat($file);
return $mtime;
}
$mtimeL = &mystat($QS_f);
$mtime = $mtimeL;
for ($n = 0; $n < $QS_n; $n++) {
while (1) {
$mtime = &mystat($QS_f);
if ($mtime ne $mtimeL) {
$mtimeL = $mtime;
sleep(2);
$buffer = &readfile($QS_f);
&print_http_headers_multipart_next;
&displayhtml($buffer);
sleep(5);
$mtimeL = &mystat($QS_f);
last;
}
sleep($QS_s);
}
}
&print_http_headers_multipart_end;
exit(0);
##EOF##
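The wire format that the script produces may be easier to see in isolation. The Python sketch below rebuilds only the multipart/x-mixed-replace framing (status line, boundaries, per-part headers) for a list of page versions. It leaves out the file polling, and all function names are chosen for this illustration.

```python
BOUNDARY = "ThisRandomString12345"  # same fixed boundary string as the script


def multipart_begin(boundary=BOUNDARY):
    """Status line, multipart header, and the first boundary."""
    return (
        "HTTP/1.0 200 OK\n"
        f"Content-type: multipart/x-mixed-replace;boundary={boundary}\n"
        f"\n--{boundary}\n"
    )


def multipart_part(html):
    """One HTML document preceded by its own headers."""
    length = len(html.encode("utf-8"))  # Content-length counts bytes
    return f"Content-type: text/html\nContent-length: {length}\n\n{html}"


def multipart_next(boundary=BOUNDARY):
    """Boundary that announces a replacement document."""
    return f"\n--{boundary}\n"


def multipart_end(boundary=BOUNDARY):
    """Closing boundary that terminates the stream."""
    return f"\n--{boundary}--\n"


def refresh_stream(versions, boundary=BOUNDARY):
    """Build the complete response for successive versions of a page."""
    chunks = [multipart_begin(boundary)]
    for index, html in enumerate(versions):
        if index > 0:
            chunks.append(multipart_next(boundary))
        chunks.append(multipart_part(html))
    chunks.append(multipart_end(boundary))
    return "".join(chunks)


if __name__ == "__main__":
    print(refresh_stream(["<p>v1</p>", "<p>v2</p>"]))
```

Each part replaces the previous one in a browser that supports this MIME type, which is why the script sends a new part whenever the modification time of the file changes.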
Mass Virtual Hosting
- Description:
- The <VirtualHost> feature of Apache is nice and works great when
you just have a few dozen virtual hosts. But when you are an ISP and have hundreds of
virtual hosts to provide, this feature is not the best choice.
- Solution:
- To provide this feature we map each virtual host to its DocumentRoot through a lookup
table (vhost.map) and a single ruleset in the main server:
##
## vhost.map
##
www.vhost1.dom:80 /path/to/docroot/vhost1
www.vhost2.dom:80 /path/to/docroot/vhost2
:
www.vhostN.dom:80 /path/to/docroot/vhostN
##
## httpd.conf
##
:
# use the canonical hostname on redirects, etc.
UseCanonicalName on
:
# add the virtual host in front of the CLF-format
CustomLog /path/to/access_log "%{VHOST}e %h %l %u %t \"%r\" %>s %b"
:
# enable the rewriting engine in the main server
RewriteEngine on
# define two maps: one for fixing the URL and one which defines
# the available virtual hosts with their corresponding
# DocumentRoot.
RewriteMap lowercase int:tolower
RewriteMap vhost txt:/path/to/vhost.map
# Now do the actual virtual host mapping
# via a huge and complicated single rule:
#
# 1. make sure we don't map for common locations
RewriteCond %{REQUEST_URI} !^/commonurl1/.*
RewriteCond %{REQUEST_URI} !^/commonurl2/.*
:
RewriteCond %{REQUEST_URI} !^/commonurlN/.*
#
# 2. make sure we have a Host header, because
# currently our approach only supports
# virtual hosting through this header
RewriteCond %{HTTP_HOST} !^$
#
# 3. lowercase the hostname
RewriteCond ${lowercase:%{HTTP_HOST}|NONE} ^(.+)$
#
# 4. lookup this hostname in vhost.map and
# remember it only when it is a path
# (and not "NONE" from above)
RewriteCond ${vhost:%1} ^(/.*)$
#
# 5. finally we can map the URL to its docroot location
# and remember the virtual host for logging purposes
RewriteRule ^/(.*)$ %1/$1 [E=VHOST:${lowercase:%{HTTP_HOST}}]
:
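The five numbered steps of the ruleset can be followed in plain code. The Python sketch below is only a model of the lookup logic under simplifying assumptions: it reads the map from a string, treats the common locations as plain path prefixes, and uses lower() in place of the internal tolower map. The names parse_vhost_map and map_request are invented for this example.

```python
def parse_vhost_map(text):
    """Parse 'key value' lines; blank lines and # comment lines are ignored."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and not fields[0].startswith("#"):
            table[fields[0]] = fields[1]
    return table


def map_request(host, uri, table, common_prefixes=()):
    """Return (filesystem_path, vhost), or None when the rule does not apply."""
    # 1. common locations are never mapped
    if any(uri.startswith(prefix) for prefix in common_prefixes):
        return None
    # 2. a Host header is required
    if not host:
        return None
    # 3. lowercase the hostname
    vhost = host.lower()
    # 4. look it up; only an absolute path counts as a hit
    docroot = table.get(vhost, "")
    if not docroot.startswith("/"):
        return None
    # 5. map the URL into the docroot and remember the vhost for logging
    if not uri.startswith("/"):
        return None
    return docroot + uri, vhost


if __name__ == "__main__":
    table = parse_vhost_map("www.vhost1.dom:80 /path/to/docroot/vhost1\n")
    print(map_request("WWW.VHOST1.DOM:80", "/index.html", table))
```

Note that the lookup key is the Host header exactly as sent (after lowercasing), so in this model a request must carry the same host:port form that appears in vhost.map.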
Access Restriction
Blocking of Robots
- Description:
- How can we block a really annoying robot from retrieving pages of a specific webarea?
A /robots.txt file containing entries of the "Robot Exclusion
Protocol" is typically not enough to get rid of such a robot.
- Solution:
- We use a ruleset which forbids the URLs of the webarea /~quux/foo/arc/
(perhaps a very deeply nested, directory-indexed area where the robot's traversal would create a big
server load). We have to make sure that we forbid access only for the particular robot;
just forbidding the host where the robot runs is not enough, because that would block users
from this host, too. We accomplish this by also matching the User-Agent HTTP header
information.
RewriteCond %{HTTP_USER_AGENT} ^NameOfBadRobot.*
RewriteCond %{REMOTE_ADDR} ^123\.45\.67\.[8-9]$
RewriteRule ^/~quux/foo/arc/.+ - [F]
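Because consecutive RewriteCond lines are ANDed by default, the request is forbidden only when the URL, the User-Agent and the client address all match. A small Python model of that decision, using the same three patterns, might look like this (is_forbidden is a name made up for the sketch):

```python
import re

# The three patterns from the ruleset above.
BAD_AGENT = re.compile(r"^NameOfBadRobot.*")
BAD_ADDR = re.compile(r"^123\.45\.67\.[8-9]$")
PROTECTED = re.compile(r"^/~quux/foo/arc/.+")


def is_forbidden(user_agent, remote_addr, uri):
    """True only if the URL is protected AND both conditions match."""
    return bool(
        PROTECTED.search(uri)
        and BAD_AGENT.search(user_agent)
        and BAD_ADDR.search(remote_addr)
    )


if __name__ == "__main__":
    print(is_forbidden("NameOfBadRobot/1.0", "123.45.67.8", "/~quux/foo/arc/a/b.html"))
```

A regular browser coming from the same address, or the robot coming from another address, is still served in this model.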
Blocked Inline-Images
- Description:
- Assume we have under http://www.quux-corp.de/~quux/ some pages with inlined GIF
graphics. These graphics are nice, so others directly incorporate them into
their own pages via hyperlinks. We don't like this practice because it adds useless traffic to our server.
- Solution:
- While we cannot 100% protect the images from inclusion, we can at least restrict the
cases where the browser sends an HTTP Referer header.
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://www\.quux-corp\.de/~quux/.*$ [NC]
RewriteRule .*\.gif$ - [F]
A second variant protects a single image which may only be inlined from one specific page:
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !.*/foo-with-gif\.html$
RewriteRule ^inlined-in-foo\.gif$ - [F]
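The decision made by the first ruleset can be modelled in a few lines of Python. This is a sketch of the logic only: a GIF request is refused when a Referer header is present and does not point to our own pages. The function name is_hotlink_blocked is hypothetical, and the dots in the hostname are escaped here so that they match literally.

```python
import re

# Referer prefix of our own pages, matched case-insensitively like [NC].
OWN_PAGES = re.compile(r"^http://www\.quux-corp\.de/~quux/.*$", re.IGNORECASE)
GIF = re.compile(r".*\.gif$")


def is_hotlink_blocked(referer, uri):
    """True if a GIF is requested with a foreign, non-empty Referer."""
    if not GIF.search(uri):
        return False  # the rule only applies to GIF images
    if referer == "":
        return False  # no Referer sent: we cannot tell, so we allow it
    return OWN_PAGES.search(referer) is None


if __name__ == "__main__":
    print(is_hotlink_blocked("http://other.example/page.html", "/~quux/logo.gif"))
```

Requests without a Referer header pass, which is why the protection cannot be complete.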
© 2005 Active-Venture.com Web Page Hosting Service
Disclaimer: This documentation is provided only for the benefit of our hosting customers.
For the authoritative source of the documentation, please refer to http://httpd.apache.org/docs/