The advantage of using a caching reverse proxy for mirroring media files (images etc.) instead of rsync-ing them is that only the requested files are transferred to the mirrors, not all the old ones that are almost never accessed. Thus less disk space is required on the mirror, and when setting up a new mirror, the bandwidth- and time-consuming process of syncing all files is avoided.
In principle it is also possible to cache HTML files. However, they are modified very often (because of comments etc.), so it is very hard to determine when the cache needs to be updated, and configuring everything correctly takes a lot more effort.
On a Debian (lenny) system I performed the following steps to set up the caching reverse proxy:
- Create a directory in which Apache can store its cache (writable by www-data); here I used /var/wwwcache.
- Create /etc/apache2/sites-available/media.de.indymedia.org-proxy:
<VirtualHost *:80>
ServerName media.de.indymedia.org
# disable forward proxying
ProxyRequests Off
<Proxy *>
Order deny,allow
Allow from all
</Proxy>
# pass requests to the server that has all files
ProxyPass / http://www3.de.indymedia.org/
ProxyPassReverse / http://www3.de.indymedia.org/
# must exist and be writable by the Apache user (www-data):
CacheRoot /var/wwwcache
CacheDirLevels 3
CacheDirLength 1
# cache these paths:
CacheEnable disk /rtsp
CacheEnable disk /images
CacheEnable disk /media
CacheEnable disk /icon
CacheEnable disk /style
CacheEnable disk /static
# 32 MB:
CacheMaxFileSize 33554432
CacheMinFileSize 1
# 24 h:
CacheDefaultExpire 86400
# 10 d:
CacheMaxExpire 864000
# also use cache when client requested refresh
CacheIgnoreCacheControl On
# cache files without last modified date
CacheIgnoreNoLastMod On
# store all files
CacheStoreNoStore On
# because media items will never need a query string
CacheIgnoreQueryString On
LogFormat "noip - - %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %T %V" noip
CustomLog /dev/null noip
ErrorLog /dev/null
</VirtualHost>
- Enable required modules (proxy, proxy_http, cache, disk_cache):
a2enmod proxy proxy_http cache disk_cache
- Enable the newly created vhost and reload Apache:
a2ensite media.de.indymedia.org-proxy
/etc/init.d/apache2 reload
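Once the vhost is active, you can check whether the proxy is really filling the cache by looking at what has accumulated below CacheRoot after a few media requests. The following is a minimal sketch. The function name cache_stats is made up for illustration. The sketch assumes the Apache 2.2 mod_disk_cache layout, where each cached response body is stored as a file ending in .data (alongside a .header file) inside the subdirectories created according to CacheDirLevels/CacheDirLength.

```sh
#!/bin/sh
# cache_stats is a hypothetical helper, not part of Apache.
# It counts cached bodies (*.data files, as written by mod_disk_cache
# in Apache 2.2) below a cache root and reports the total size in KB.
cache_stats() {
    root=${1:-/var/wwwcache}   # default matches CacheRoot in the vhost
    if [ ! -d "$root" ]; then
        echo "no such directory: $root" >&2
        return 1
    fi
    # find descends into the CacheDirLevels subdirectories
    entries=$(find "$root" -type f -name '*.data' | wc -l | tr -d ' ')
    size_kb=$(du -sk "$root" | cut -f1)
    echo "entries=$entries size_kb=$size_kb"
}
```

Usage: `cache_stats /var/wwwcache`, run as root or www-data. If the entry count stays at 0 while media files are being requested, check the CacheEnable paths and the permissions on CacheRoot first.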
TODO: To prevent the cache from filling up the disk, htcacheclean needs to be run periodically.
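One way to address this TODO is an hourly cron job. This is only a sketch: the file name /etc/cron.hourly/htcacheclean-media and the 1024M limit are assumptions to adapt to the mirror's disk size, and the binary is assumed to live at /usr/sbin/htcacheclean. The options used are standard htcacheclean options in Apache 2.2: -n runs niced, -t removes empty directories, -p sets the cache root and -l sets the total size limit.

```sh
#!/bin/sh
# Hypothetical /etc/cron.hourly/htcacheclean-media (make it executable).
# Prunes the media proxy's disk cache down to a size limit.

CACHE_ROOT=/var/wwwcache   # must match CacheRoot in the vhost
CACHE_LIMIT=1024M          # assumed limit; K and M suffixes are accepted
HTCACHECLEAN=/usr/sbin/htcacheclean

# -n: be nice (lower priority), -t: delete empty directories,
# -p: cache root, -l: total cache size limit
if [ -x "$HTCACHECLEAN" ]; then
    "$HTCACHECLEAN" -n -t -p "$CACHE_ROOT" -l "$CACHE_LIMIT"
fi
```

Alternatively, htcacheclean can run permanently as a daemon by passing -d with an interval in minutes (e.g. -d60) instead of being started from cron.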
Documentation on caching and proxying with Apache:
--
BriKs - 14 Feb 2010