MirChangeTrackerHowto
What is the Change Tracker?
Warning: This code is still under development - see the end of this page for details ...
The new change tracking in Mir makes it possible to do away (mostly!) with rsync when syncing a publish server with a static mirror. Most of the files change very rarely, but you want your mirror(s) to be up to date, and rsyncing a few hundred thousand files every few minutes causes quite a load (bandwidth and disk thrash). So instead Mir publishes a text file listing what has been added or modified. A mirror, running a supplied perl script, fetches this file (if it has changed) and then fetches the indicated changes. That means much less traffic and load, and you can bring your mirror up to date every minute. (Or that's the theory at least ...)
You probably still want to keep a working rsync setup, both to bring things fully up to date if the mirror gets out of sync and to make sure that changes made manually (say to include files, or to media files uploaded via scp) can be propagated. But you no longer have to run an rsync job every five minutes!
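The idea of fetching only what the change list names can be sketched roughly as follows. This is a minimal illustration in Python, not the supplied perl script: it assumes the publish server only ever appends lines to the change list, and the function name `new_entries` and the sample lines (including the change types "modified" and "added") are made up for the example.

```python
def new_entries(previous_text, current_text):
    """Return the change-list lines a mirror has not processed yet.

    Assumption: the publish server only appends to the change list.
    If the file was replaced instead, every line is treated as new.
    """
    if current_text == previous_text:
        return []  # the list has not changed, so there is nothing to fetch
    if current_text.startswith(previous_text):
        added = current_text[len(previous_text):]
    else:
        added = current_text
    # Skip blank lines so callers only see real entries.
    return [line for line in added.splitlines() if line.strip()]


# Hypothetical change-list contents, for illustration only.
before = "10:00:01 modified /en/index.html\n"
after = before + "10:00:42 added /en/2007/02/1234.html\n"
print(new_entries(before, after))
```

A mirror that remembers the last copy of the list it saw would then only need to fetch the files named in the returned lines, which is where the saving over a full rsync run comes from.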
How to set it up
Make sure you are up to date with cvs
Right now, the code lives in the MIR_1_1 branch. It's pretty deep in there, because this adds behavior to most producer nodes. Make sure to run
cvs update -d
so that you get new files!
Add a producer node to publish the changelist
This looks something like:
<producer name="reportchanges">
  <verbs>
    <verb name="doit" />
  </verbs>
  <body>
    <ReportChanges
      format="${config.now.formatted['HH:mm:ss']} ${change.type} /${change.path}"
      file="${config['Producer.StorageRoot']}/changes/changes${config.now.formatted['yyyyMMdd']}.txt"
      basepath="${config['Producer.StorageRoot']}" />
  </body>
</producer>
Make sure this gets called when a change is made, for example, by adding it to the "generate all new" producers.
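To make the format and file attributes above more concrete, here is a small Python sketch of what they imply for anyone reading the change list. It assumes that each entry is one line of the form "HH:mm:ss type /path" and that the change type contains no spaces. The helper names and the sample change type "modified" are hypothetical and do not come from Mir.

```python
from datetime import date


def change_list_path(day):
    """Path of the change list for a given day, relative to the storage root.

    Mirrors the file attribute above: changes/changes<yyyyMMdd>.txt
    """
    return "changes/changes" + day.strftime("%Y%m%d") + ".txt"


def parse_change_line(line):
    """Split one 'HH:mm:ss type /path' line into its three fields.

    Splitting at most twice keeps any spaces inside the path intact.
    """
    timestamp, change_type, path = line.rstrip("\n").split(" ", 2)
    return timestamp, change_type, path


# Sample values for illustration only.
print(change_list_path(date(2007, 2, 14)))
print(parse_change_line("12:34:56 modified /en/2007/02/1234.html"))
```

Because the file name contains the date, a new change list is started every day, so a reader of the list has to switch to the new file name after midnight.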
Set up the mirror script on the mirror
In your cvs checkout, you should see the file mir/scripts/mirror-scripts/update.pl. Copy this to the mirror. You may need to install some perl modules, but nothing that isn't in CPAN or apt-gettable. To fetch changes, the command looks something like:
perl update.pl --remoteroot=http://publish.somesite.indymedia.org --workingdir=/var/www/somesite
That's it! Just put that into cron. You could even run it once a minute, since it doesn't really take up many resources if there aren't any changes to fetch. You can also run the update.pl command for more help on the possible arguments.
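If you do run the job every minute, one run may still be busy when cron starts the next. Whether update.pl guards against that itself is not stated here, so the following Python wrapper is only a sketch under that assumption: it takes a lock file before building the example command above and skips the run if the lock is already held. The lock file location and all function names are hypothetical, and `fcntl` is only available on Unix-like systems.

```python
import fcntl
import subprocess

# Placeholder values taken from the example command above.
REMOTE_ROOT = "http://publish.somesite.indymedia.org"
WORKING_DIR = "/var/www/somesite"
LOCK_FILE = "/tmp/mir-update.lock"  # hypothetical location


def build_command(remote_root, working_dir, script="update.pl"):
    """Build the update.pl command line shown in this howto."""
    return ["perl", script,
            "--remoteroot=" + remote_root,
            "--workingdir=" + working_dir]


def try_lock(path):
    """Return an open, exclusively locked file, or None if already locked."""
    handle = open(path, "w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    return handle


def run_once():
    """Run update.pl unless another run still holds the lock."""
    lock = try_lock(LOCK_FILE)
    if lock is None:
        return 0  # a previous run is still busy; let it finish
    try:
        return subprocess.call(build_command(REMOTE_ROOT, WORKING_DIR))
    finally:
        lock.close()  # closing the file releases the lock


# Show the command that run_once() would execute.
print(" ".join(build_command(REMOTE_ROOT, WORKING_DIR)))
```

A cron entry would then call a small script that invokes `run_once()` and exits with its return value, instead of calling perl directly.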
Still under development
Currently (Feb 2007) this setup is under development and should not be considered stable or final. We think it is good enough to try out, but please let us know if you run into any trouble (files not reported as changed, files not copied, features needed ...). Errors, bugs and difficulties should be reported to at least one of the following lists:
- imc-uk-tech at lists dot indymedia dot org
- mir-coders at lists dot indymedia dot org
Good luck. Two threads you might want to read are one and two.
And if it all works wonderfully for you and server load is much lower, then let us know too.
Ideas for improvements include:
- a blacklist file - for mirror server admins to stop a file being downloaded if they need to cover their arses.
- scripts to be able to delete files from all mirrors