CachingClusterIdeas
Overview
The goal of this project is to put a geographically diverse caching system in front of as much frontend website traffic as possible, ideally all of it, covering as many IMCs as possible. This page collects ideas for how we could reach that goal and compares the pros and cons of software that could be used to do it.
Why is this needed?
Right now many IMC sites have load problems when high-profile events happen. The goal here is to reduce that load and to keep sites available in some form should an emergency occur (hardware failures, etc.).
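To make the "available in some form during an emergency" idea concrete, here is a minimal sketch in Python of a cache that keeps serving its last good copy when the origin site cannot be reached. It is not taken from any of the software compared below. The class name `StaleOnErrorCache`, the `fetch` callable and the `ttl` parameter are all hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class StaleOnErrorCache:
    """Hypothetical sketch: serve a stale copy when the origin fails."""

    def __init__(self, fetch: Callable[[str], bytes], ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch    # assumed: function that asks the origin site
        self._ttl = ttl        # seconds a cached copy counts as fresh
        self._clock = clock    # injectable so the behaviour can be tested
        self._store: Dict[str, Tuple[float, bytes]] = {}

    def get(self, url: str) -> bytes:
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]            # fresh hit: origin is not contacted
        try:
            body = self._fetch(url)    # miss or expired: ask the origin
        except Exception:
            if entry is not None:
                return entry[1]        # origin down: fall back to stale copy
            raise                      # nothing cached, so the error surfaces
        self._store[url] = (now, body)
        return body
```

Fresh hits reduce load on the origin, and the stale fallback covers the hardware-failure case. A real deployment would also need eviction and would have to respect the origin's cache headers, which this sketch ignores.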
Software Shootout
Squid
Pros: Versatile, proven source code, maintained.
Cons: Designed to act as a forward proxy (it can be configured to work as a reverse proxy, but this would be complex for the number of sites we intend to cluster), many global settings that can't be customized per site, security concerns, etc.
TODO: If someone has pointers for using Squid for this sort of thing, put them here.
Varnish
Pros: Designed to be high performance, written by people with years of software engineering experience, maintained.
Cons: Uses non-portable programming methods (they intend to target only Linux and FreeBSD, while the software configurations on an average cluster node may be more diverse than this); a C compiler is required to "parse and compile the configuration file into a shared library", which is a killer for Indymedia usage.
TODO: If someone has pointers for using Varnish for this, and has ideas on how to get rid of the gcc runtime dependency, then we'd love to hear about both of these things.
Loreley
Pros: High performance, portable, designed for a similar task at another highly accessible site (Wikimedia)
Cons: Due to a possible lack of interest from its author, we may need to maintain it ourselves (though this could allow more customization for what we specifically need); written with Boost, which means a recent C++ compiler is required to build it (gcc 3.4 or later).
TODO: Determine if this is the way to go, and if so, document some Indymedia-specific requirements for Loreley here.
FAQ
Why not use all three? Because then we would have to maintain three sets of configurations, which means three times the work.
Why does Indymedia need this, really? To help load balance IMC sites, hide IMC site IPs (so they can't be targeted directly by DDoS attacks, etc.), and to provide fallback information should an IMC site go down.
TODO: Document other questions as they come up.
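The load-balancing and IP-hiding points in the second answer above can be sketched as a per-site routing table on the cache node. Only the cache's address is public, and each request's Host header selects a private origin in round-robin order. This is an illustrative Python sketch, not how any of the candidate packages work internally. `OriginRouter`, the example hostnames and the 192.0.2.x addresses are all hypothetical.

```python
import itertools
from typing import Dict, Iterator, List


class OriginRouter:
    """Hypothetical sketch: map a public hostname to hidden origin servers."""

    def __init__(self, sites: Dict[str, List[str]]) -> None:
        # One endless round-robin iterator per configured site.
        self._cycles: Dict[str, Iterator[str]] = {
            host.lower(): itertools.cycle(backends)
            for host, backends in sites.items()
            if backends
        }

    def pick(self, host_header: str) -> str:
        # Drop an optional ":port" suffix. Assumes hostnames, not IPv6 literals.
        host = host_header.split(":", 1)[0].strip().lower()
        try:
            return next(self._cycles[host])
        except KeyError:
            raise LookupError(f"no origin configured for {host!r}") from None


# Usage with made-up sites. The origin addresses are known only to the cache.
router = OriginRouter({
    "imc-one.example.org": ["192.0.2.10", "192.0.2.11"],
    "imc-two.example.org": ["192.0.2.20"],
})
first = router.pick("imc-one.example.org")
second = router.pick("IMC-One.example.org:80")
```

Keeping the table per site is the point of the design. It avoids the global-settings limitation listed under Squid, at the cost of one more piece of configuration to maintain.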
Changelog
-- WilliamPitcock - 16 Aug 2007: Wrote initial page.