Wiki spam
About wiki spam
Wiki spam is becoming a global problem for all kinds of wiki setups and for web applications that permanently record and display user-submitted content. As seen on open-posting newswires, openness always comes with vulnerabilities. The more open your application is, the more vulnerable it is to spam and other types of 'content-based attacks'.
Unlike trolls, who only want to annoy, URL/link spammers usually have a mission. Just like email spammers, they want to promote a website or a service that is accessible through that website.
While trolls are more difficult to get rid of, there are a couple of countermeasures against URL/link spammers. But before thinking about their vulnerable spots, let's think about ours.
Types of wiki spam
Registration Spam
There are a couple of ways to post spam to a TWiki setup. Registration spam is one of them. In this case, somebody (or rather a script) registers an account with the Indymedia Documentation Project, aka 'the Wiki', and spamvertizes web sites through the 'Comment' field, which is taken from the account signup form and written to the new user's home topic in the Main web.
This problem cannot be easily worked around. The registration form's target address is always Main.WebHome, which exists in just about every TWiki setup. It is thus quite easy to exploit this and build a spam tool that does mass registrations. As far as I know, it would theoretically be possible to change the name of the home page of the Main web and thus the home page of the whole TWiki installation.
However, this would cause so many broken links, both within the Wiki and from outside it, that it is probably not a practical solution. And even then, it would be another 'security by obscurity' solution, like the one we used before for the TWiki guest account (we changed the default TWiki guest account login credentials).
Topic Edit Spam
Spammers who use the guest account, have registered an account of their own, or have obtained the credentials of an existing account tend to add links that are pretty hard to spot to random existing (or newly created) TWiki topics.
These often look like this:
Some sentence. Some other sentence.
where the full stop after the first sentence is the link. The source is actually this:
Some sentence[[http://example.com/][.]] Some other sentence.
Other variants use a <div> block with style="hidden" for this purpose, or a META HTTP-EQUIV="Refresh" tag to redirect you to another website.
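The three patterns described above can be searched for mechanically. The following is a minimal sketch in Python, not a TWiki feature: the function name find_suspicious_markup and the exact regular expressions are assumptions made for illustration, and real spam will vary enough that the patterns would need tuning.

```python
import re

# [[URL][label]] links whose visible label is empty or a single punctuation mark.
TINY_LABEL_LINK = re.compile(r"""\[\[(https?://[^\]\s]+)\]\[\s*([.,;:!?'\-]?)\s*\]\]""")
# <div> tags with an inline style that looks like it hides the content
# (assumed variants: display:none, visibility:hidden, or a bare "hidden").
HIDDEN_DIV = re.compile(
    r"""<div[^>]*style\s*=\s*["'][^"']*(?:display\s*:\s*none|hidden)""",
    re.IGNORECASE,
)
# META refresh tags, which can redirect the visitor to another site.
META_REFRESH = re.compile(r"""<meta[^>]*http-equiv\s*=\s*["']?refresh""", re.IGNORECASE)


def find_suspicious_markup(topic_text):
    """Return a list of (kind, detail) tuples for markup that looks like link spam."""
    findings = []
    for match in TINY_LABEL_LINK.finditer(topic_text):
        findings.append(("tiny-label link", match.group(1)))
    if HIDDEN_DIV.search(topic_text):
        findings.append(("hidden div", ""))
    if META_REFRESH.search(topic_text):
        findings.append(("meta refresh", ""))
    return findings


if __name__ == "__main__":
    sample = "Some sentence[[http://example.com/][.]] Some other sentence."
    for kind, detail in find_suspicious_markup(sample):
        print(kind, detail)
```

A check like this only flags topics for a human to review; it cannot tell a spam link from a legitimate one with an unusually short label.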
We've been able to reduce the topic edit spam problem a lot by deactivating the guest login. In fact, we haven't had a single occurrence of this spam type (which is much more annoying and time-consuming than registration spam) in the two weeks since. Deactivating the guest account is, however, not an adequate solution. It is just a workaround, as there is no quickly implementable solution available that fits our needs.
Possible solutions
CAPTCHA
The best 'solution' for this I can think of is integrating CAPTCHA into the registration process. However, this would first need to be integrated into TWiki. This is implemented for Dakar by the CaptchaPlugin.
Traditional CAPTCHAs have accessibility problems. See my comments on TWiki:Codev/WikiSpam for further details and ideas. -- IntRigeri
Blacklisting
Looking over twiki.org, I just came across a second semi-solution which may be less restrictive than CAPTCHA: TWiki:TWiki.BlackListPlugin
However, this comes at the cost of additional maintenance work and possible false positives.
Unfortunately, as Main.Intrigeri points out, it logs IP addresses, so using it is not an option for Indymedia.
Host/network firewall rules + blacklist
Indymedia doesn't want to log IP addresses. The TWiki:TWiki.BlackListPlugin, however, needs to record and store them to function properly, so it is not going to work for Indymedia, although it is probably a good countermeasure otherwise. Other sites that use this plugin, though, provide an automated feedback stream to the twiki.org mothership via TWiki:TWiki.BlackListLog. This topic stores IP addresses of known spammers. As this blacklist is already publicly available, it would not do any harm to make use of it by importing it into firewall rulesets and blocking any connections from the listed IP addresses to the server hosting the Indymedia wiki. There are also several other blacklists on the web which serve a similar purpose; see the web links to other wiki and blog codebases at the bottom of this WikiTopic.
The drawbacks of this approach are that
- TWiki:TWiki.BlackListLog has not been updated since February 2006
- we rely on other people's data, which we are not able to verify
- we rely on the prolonged availability of this blacklist
- it is currently not easy to fine-tune the firewall for the server running docs.indymedia.org
- conducting (semi-?) automated changes to firewall rules is always a sensitive topic
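To make the import step concrete, here is a minimal sketch in Python that pulls IPv4 addresses out of a blacklist text and turns them into iptables commands. The layout of TWiki:TWiki.BlackListLog is not described here, so the sketch simply assumes that addresses appear somewhere in plain text; the function names extract_ips and iptables_rules are made up for illustration. Given the sensitivity noted above, the commands are only printed for review, not executed.

```python
import ipaddress
import re

# Anything that looks like a dotted quad; validated properly further down.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_ips(blacklist_text):
    """Return the unique, valid IPv4 addresses in the text, in order of first appearance."""
    seen = []
    for candidate in IPV4_CANDIDATE.findall(blacklist_text):
        try:
            address = str(ipaddress.IPv4Address(candidate))
        except ValueError:
            continue  # not a real address, e.g. 999.1.1.1
        if address not in seen:
            seen.append(address)
    return seen


def iptables_rules(ips, chain="INPUT"):
    """Build one DROP rule per address; the chain name is an assumption."""
    return [f"iptables -A {chain} -s {ip} -j DROP" for ip in ips]


if __name__ == "__main__":
    # Hypothetical blacklist content using documentation-only addresses.
    text = "spammer 192.0.2.10 seen\n198.51.100.7, 192.0.2.10\nbogus 999.1.1.1\n"
    for rule in iptables_rules(extract_ips(text)):
        print(rule)
```

Printing the rules instead of applying them keeps a human in the loop, which addresses the concern about automated firewall changes at the cost of some manual work.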
Web application firewall + blacklist
The last two drawbacks given above could be circumvented by making these changes at the web application level instead, i.e. by means of a web application firewall such as mod_security.
Pros:
- these (semi-?) automated changes to the server configuration have less impact than they would have on the host/network firewall level
- as a side effect, a web application firewall would also protect us from the exploitation of security issues in the TWiki code which have not yet been discovered or disclosed
Cons:
- we still conduct (semi-?) automated changes to a server configuration based on external information
- we rely on other people's data, which we are not able to verify
- we rely on the prolonged availability of external blacklists
- we have little experience with application level firewalls
- web application firewalls cause higher system load than network/host level firewalls
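As a rough illustration of the web application firewall variant, the following Python sketch turns a list of blacklisted addresses into a single ModSecurity rule. It assumes a ModSecurity 2.x release that provides the @ipMatch operator, which may not match the mod_security version actually installed; the function name modsecurity_rule and the rule id 100001 are arbitrary choices for this example. The nolog action is included so that a match does not write the client address to the logs.

```python
import ipaddress


def modsecurity_rule(ips, rule_id=100001):
    """Build one SecRule that denies requests from the given addresses.

    Raises ValueError if an entry is not a valid IP address.
    Returns an empty string for an empty list.
    """
    valid = [str(ipaddress.ip_address(ip)) for ip in ips]
    if not valid:
        return ""
    joined = ",".join(valid)
    return (
        f'SecRule REMOTE_ADDR "@ipMatch {joined}" '
        f'"id:{rule_id},phase:1,deny,status:403,nolog"'
    )


if __name__ == "__main__":
    # Documentation-only example addresses, not real spammers.
    print(modsecurity_rule(["192.0.2.10", "198.51.100.7"]))
```

The generated line would still have to be reviewed and placed into the server configuration by hand, so the reliance on external data listed under Cons remains.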
Nofollow attribute
As noted before, one of the main reasons why people spam wikis is that they are trying to draw attention to their web sites. The idea is to place your web site's URL on a widely visited and cross-linked wiki like Indymedia's, which is also regularly spidered and highly valued by search engine robots. Many search engines raise the search ranking of web sites that are widely referenced on other web sites. Thus, wiki spammers attempt to place their web site URL on as many wikis as possible.
A way around this is to 'tell' the search engines that they shouldn't use the links found on your wiki to calculate search result rankings. This way you can render wiki spam ineffective for ranking purposes.
To do this, to a link like
<a href="http://www.example.com">Example.com</a>
you add a new attribute
<a href="http://www.example.com" rel="nofollow">Example.com</a>
This option is further explained on TWiki:Codev.SpamDefeatingViaNofollowAttribute and the Google blog.
Dakar (the new TWiki release) does this out of the box.
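For setups that do not add the attribute themselves, the rewrite shown above can be sketched as a small post-processing step on the rendered HTML. This is a simplified Python illustration, not how TWiki implements it: the function name add_nofollow is made up, and a regular expression is used instead of a full HTML parser, so unusual markup (for example self-closing anchor tags) is not handled.

```python
import re

# Opening <a ...> tags; closing tags and tags like <abbr> are not matched.
ANCHOR_TAG = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)


def add_nofollow(html):
    """Add rel="nofollow" to every link that has an href and no rel attribute yet."""

    def patch(match):
        attrs = match.group(1)
        if re.search(r"\brel\s*=", attrs, re.IGNORECASE):
            return match.group(0)  # leave an existing rel attribute untouched
        if not re.search(r"\bhref\s*=", attrs, re.IGNORECASE):
            return match.group(0)  # named anchors without href are not links
        return f'<a{attrs} rel="nofollow">'

    return ANCHOR_TAG.sub(patch, html)


if __name__ == "__main__":
    print(add_nofollow('<a href="http://www.example.com">Example.com</a>'))
```

This sketch marks all links, including internal ones; a real deployment would probably want to restrict it to external URLs.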
Additional resources
Web links:
- WWW spam in general
- Wiki spam
- TWiki and spam
- The problem
- Possible counter measures