MirAntiAbuseFilters
How filters work
The basic idea of the filter system is to automatically process postings, in particular abusive postings such as spam and trolling.
A filter is nothing more or less than a condition and an action: if a posting fits the condition, the action will be performed. At most one filter is applied to a single posting.
Filters are organized into groups. There's a list of groups and each group has a list of filters. Both lists are iterated in order, and the action of the first matching filter is applied to the posting. The order of the filters and of the groups can be changed from within the mir admin.
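The first-match rule described above can be sketched as follows. This is only an illustration in Python, not mir's actual (Java) implementation: the `Filter` class, the `first_matching_action` function and the action names "hide" and "flag" are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Filter:
    # Hypothetical stand-ins for a filter's condition and action.
    condition: Callable[[dict], bool]  # returns True if the posting matches
    action: str                        # name of the action to perform


def first_matching_action(groups, posting) -> Optional[str]:
    """Walk the groups in order, then each group's filters in order,
    and return the action of the first filter whose condition matches."""
    for group in groups:
        for flt in group:
            if flt.condition(posting):
                return flt.action  # stop here: later filters are never tried
    return None  # no filter matched


groups = [
    [Filter(lambda p: "spam" in p["body"], "hide")],
    [Filter(lambda p: True, "flag")],
]
```

Because iteration stops at the first match, reordering groups or filters changes which action a posting receives.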
What comprises a filter
A filter has a number of fields:
| Field | Explanation |
| type | There are a number of different filter types, each of which behaves in a different way. The filter types are explained in the next section. |
| expression | The actual filter expression. The format of the expression depends entirely on the type. |
| tag | If a posting matches a filter, the tag is added to the "internal comments" field of the posting. A tag is used instead of an automatic comment containing the filter expression because a filter on ip would otherwise amount to a de facto ip log. |
| article | The action to perform if the posting is an article. It is possible to add actions programmatically, but this requires Java skills. |
| comment | The action to perform if the posting is a comment. |
| comments | Internal comments that aren't used by mir. |
Filter types and examples
Regular expression
The expression should be a Perl 5 regular expression.
The regular expression is applied case insensitively, and applies to all fields of the posting: author, title, body, abstract, etc.
Suppose you'd like to filter all postings with, for instance, "oehoeroeboeroe" in the body: you can simply put "oehoeroeboeroe" in the expression. You can also make use of all the more complex Perl 5 regular expression features.
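As a rough illustration of the behaviour described above, the sketch below applies an expression case insensitively to every field of a posting. It uses Python's `re` module, whose syntax overlaps with Perl 5 for simple patterns but is not identical. The function name and the dictionary layout of a posting are assumptions made for the example.

```python
import re


def regex_filter_matches(expression, posting):
    """Return True if the expression is found in any field of the posting.

    `posting` is a hypothetical dict mapping field names to text.
    """
    pattern = re.compile(expression, re.IGNORECASE)
    return any(pattern.search(value) for value in posting.values())


posting = {
    "author": "someone",
    "title": "hello",
    "body": "text with OehoeroeBoeroe inside",
}
print(regex_filter_matches("oehoeroeboeroe", posting))
```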
User agent
A browser transmits a so-called user-agent string to a web server. For instance:
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6
If the mir ip log is turned on, the user agent of a posting appears in it.
Sometimes, a user agent is very specific for a user and can thus be used to filter postings. Some user agents are very common though, and shouldn't be used.
The filter looks for a partial match in the user agent string, so if you use "Firefox" as expression, all user agent strings that contain "firefox" will be matched.
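A partial match of this kind amounts to a substring test. The sketch below assumes the comparison ignores case, as the Firefox/firefox example suggests. The function name is hypothetical.

```python
def user_agent_matches(expression, user_agent):
    """Return True if the expression occurs anywhere in the user agent.

    Case is ignored; this is an assumption based on the example above.
    """
    return expression.lower() in user_agent.lower()


ua = ("Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8.1.6) "
      "Gecko/20070725 Firefox/2.0.0.6")
print(user_agent_matches("Firefox", ua))
```

An expression as short as "Firefox" matches a very large share of visitors, which is why common user agents make poor filters.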
IP Number
Expressions may be a single address, an address with a prefix length, or an address with a netmask, like: 10.0.0.1, 10.0.0.0/24 or 10.0.0.0/255.255.0.0.
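To show what each of the three expression forms covers, here is a small sketch using Python's standard `ipaddress` module. It is not how mir itself parses the expression, and the function name is hypothetical.

```python
import ipaddress


def ip_filter_matches(expression, address):
    """Return True if `address` falls inside the range given by `expression`.

    A bare address is treated as a range containing only that address.
    """
    network = ipaddress.ip_network(expression, strict=False)
    return ipaddress.ip_address(address) in network


print(ip_filter_matches("10.0.0.1", "10.0.0.1"))
print(ip_filter_matches("10.0.0.0/24", "10.0.0.200"))
print(ip_filter_matches("10.0.0.0/255.255.0.0", "10.0.77.1"))
```

The /24 form covers 10.0.0.0 through 10.0.0.255, while the 255.255.0.0 netmask covers everything from 10.0.0.0 through 10.0.255.255.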
Host name
Host name expressions are regular expressions that are applied to the hostname. For instance, ".*xs4all.nl" would match every posting originating from an xs4all.nl domain.
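One detail worth keeping in mind with host name expressions is that an unescaped dot matches any character. The sketch below compares the expression from the example with a stricter variant. It uses Python's `re` module, and whether mir searches within the hostname or requires a full match is not stated here, so the use of `search` is an assumption. The stricter pattern is a suggestion for illustration only.

```python
import re

# The expression from the example: the "." before "nl" matches any character.
loose = re.compile(r".*xs4all.nl")

# A stricter variant (hypothetical): escaped dots, anchored at the end,
# and preceded by the start of the name or a literal dot.
strict = re.compile(r"(^|\.)xs4all\.nl$")

for host in ["dsl.xs4all.nl", "xs4all.nl", "xs4allxnl.example.com"]:
    print(host, bool(loose.search(host)), bool(strict.search(host)))
```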
Throttle
The throttle filter limits the number of postings per ip.
Throttle filters are of the format: minutes:postings. For instance, "10:20" means no ip may have more than 20 postings per 10 minutes.
The 21st posting within the 10 minutes from the same ip would match the filter.
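The throttle rule can be sketched as a sliding window per ip. This is a minimal illustration, not mir's implementation: the `Throttle` class is hypothetical, timestamps are plain seconds, and the sketch assumes that postings which trip the filter still count toward the window.

```python
from collections import defaultdict, deque


class Throttle:
    def __init__(self, expression):
        # "10:20" -> window of 10 minutes, at most 20 postings
        minutes, limit = expression.split(":")
        self.window_seconds = int(minutes) * 60
        self.limit = int(limit)
        self.seen = defaultdict(deque)  # ip -> timestamps of recent postings

    def matches(self, ip, now):
        """Record a posting from `ip` at time `now` (in seconds) and return
        True if it exceeds the allowed number within the window."""
        times = self.seen[ip]
        # Forget postings that have fallen out of the window.
        while times and now - times[0] >= self.window_seconds:
            times.popleft()
        times.append(now)
        return len(times) > self.limit


throttle = Throttle("10:20")
results = [throttle.matches("10.0.0.1", second) for second in range(21)]
print(results[19], results[20])
```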
Posting size
The expression is a number.
If the posting size (the size of all fields combined) is more than the given number, the filter will match.
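As a small illustration, the check can be written as below. The function name and the dictionary layout are hypothetical, and the size is counted in characters here, which is an assumption since the unit is not stated above.

```python
def size_filter_matches(expression, posting):
    """Return True if the combined size of all fields exceeds the limit.

    Size is measured in characters in this sketch (an assumption).
    """
    limit = int(expression)
    total = sum(len(value) for value in posting.values())
    return total > limit


posting = {"title": "abc", "body": "defgh"}  # 8 characters in total
print(size_filter_matches("7", posting))
```

A posting whose size equals the number exactly does not match, since the rule requires the size to be more than the given number.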
URL blacklist query
This filter is a bit technical. There are a couple of services that keep track of domain names used in spam URLs, i.e. the URLs that are often used in link spam postings.
More information can be obtained from the websites of the blacklists, like http://uribl.com/.
A good expression would be "multi.uribl.com".
If a posting contains a URL that is known by this blacklist, then the filter will match.
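Blacklists of this kind are typically queried over DNS: the host from the URL is prepended to the blacklist zone, and a successful lookup indicates a listing. The sketch below illustrates that idea only. The function names are hypothetical, it queries the full host name instead of reducing it to the registered domain, and it treats any answer as a hit, whereas real blacklists return coded 127.0.0.x answers whose meaning they document themselves.

```python
import re
import socket
from urllib.parse import urlparse

URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def hosts_in(text):
    """Collect the host names of all http(s) URLs found in the text."""
    hosts = set()
    for url in URL_PATTERN.findall(text):
        host = urlparse(url).hostname
        if host:
            hosts.add(host)
    return hosts


def dns_lookup(name):
    """Resolve a name; return its address, or None if it does not resolve."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        return None


def blacklist_matches(expression, text, lookup=dns_lookup):
    """Return True if any host in the text resolves under the blacklist zone.

    `expression` is the blacklist zone, e.g. "multi.uribl.com".
    `lookup` can be replaced, for example to test without network access.
    """
    return any(lookup(f"{host}.{expression}") is not None
               for host in hosts_in(text))
```

Passing the lookup function as a parameter keeps the matching logic testable without sending real DNS queries.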
Initial version -- ZaPata - 23 Oct 2007