MirAntiAbuseFilters

How filters work

The basic idea of the filter system is to process postings automatically, in particular abusive postings such as spam and trolling. A filter is nothing more than a condition and an action: if a posting fits the condition, the action is performed. At most one filter is applied to any single posting.

Filters are organized into groups. There is a list of groups, and each group has a list of filters. Both lists are iterated in order, and the action of the first matching filter is applied to the posting. The order of the groups, and of the filters within a group, can be changed from within the Mir admin interface.
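The first-match rule can be sketched in a few lines of Python. This is only an illustration of the iteration order described above, not Mir's actual code (Mir itself is written in Java); the names Filter, FilterGroup and first_matching_action, and the action names "hide" and "flag", are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical names for illustration only; these are not Mir's classes.


@dataclass
class Filter:
    matches: Callable[[dict], bool]  # the condition
    action: str                      # the action to perform on a match


@dataclass
class FilterGroup:
    name: str
    filters: List[Filter]


def first_matching_action(groups: List[FilterGroup], posting: dict) -> Optional[str]:
    """Walk groups in order, then filters in order; stop at the first match."""
    for group in groups:
        for flt in group.filters:
            if flt.matches(posting):
                return flt.action
    return None  # no filter matched, so no action is taken


groups = [
    FilterGroup("spam", [Filter(lambda p: "viagra" in p["body"], "hide")]),
    FilterGroup("catch-all", [Filter(lambda p: True, "flag")]),
]
print(first_matching_action(groups, {"body": "cheap viagra"}))  # hide
print(first_matching_action(groups, {"body": "hello"}))         # flag
```

Because iteration stops at the first match, moving a group or a filter up or down changes which action a posting receives.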

What comprises a filter

A filter has a number of fields:

| Field | Explanation |
| type | There are a number of different filter types, each of which behaves in a different way. The filter types are explained in the next section. |
| expression | The actual filter expression. The format of the expression depends entirely on the type. |
| tag | If a posting matches a filter, the tag is added to the "internal comments" field of the posting. A tag is used instead of an automatic comment containing the filter expression, because recording the expression of a filter on IP would amount to a de facto IP log. |
| article | The action to perform if the posting is an article. It is possible to add actions programmatically, but this requires Java skills. |
| comment | The action to perform if the posting is a comment. |
| comments | Internal comments that aren't used by Mir. |

Filter types and examples

Regular expression
The expression should be a Perl 5 regular expression. The regular expression is applied case insensitively, and applies to all fields of the posting: author, title, body, abstract, etc.

Suppose you'd like to filter all postings with, for instance, oehoeroeboeroe in the body. You can simply use oehoeroeboeroe as the expression, but you can also make use of all the more complex Perl 5 regular expression features.
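As a rough sketch of this behavior, the Python snippet below searches every field of a posting case-insensitively. It assumes a posting can be represented as a dictionary of field names to text, and it uses Python's re module, whose syntax is close to but not identical with Perl 5; the function name regex_filter_matches is hypothetical.

```python
import re


def regex_filter_matches(expression: str, posting: dict) -> bool:
    """Return True if the expression is found in any field of the posting."""
    # Assumption: a partial, case-insensitive search, as the text above describes.
    pattern = re.compile(expression, re.IGNORECASE)
    return any(pattern.search(str(value)) for value in posting.values())


posting = {"author": "anon", "title": "Hello", "body": "Buy OEHOEROEBOEROE now"}
print(regex_filter_matches("oehoeroeboeroe", posting))         # True
print(regex_filter_matches(r"\bbuy\s+\w+\s+now\b", posting))   # True
print(regex_filter_matches("something else", posting))         # False
```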

User agent
A browser transmits a so-called user-agent string to a web server, for instance: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6. If the Mir IP log is turned on, the user agent of a posting appears in it. Sometimes a user agent is specific enough to a single user that it can be used to filter that user's postings. Some user agents are very common, though, and shouldn't be used, since they would match many unrelated postings.

The filter looks for a partial match in the user agent string, so if you use Firefox as the expression, every user agent string that contains it will be matched.

IP Number
An expression can be a single address (10.0.0.1), a network with a prefix length (10.0.0.0/24) or a network with a netmask (10.0.0.0/255.255.0.0).
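To see what the three notations cover, here is a small sketch using Python's standard ipaddress module. It illustrates the address arithmetic only and makes no claim about how Mir parses these expressions; the function name ip_filter_matches is hypothetical.

```python
import ipaddress


def ip_filter_matches(expression: str, ip: str) -> bool:
    """Return True if ip falls inside the address or network in expression."""
    # A bare address is treated as a network containing exactly one host.
    network = ipaddress.ip_network(expression, strict=False)
    return ipaddress.ip_address(ip) in network


print(ip_filter_matches("10.0.0.1", "10.0.0.1"))              # True
print(ip_filter_matches("10.0.0.0/24", "10.0.0.200"))         # True
print(ip_filter_matches("10.0.0.0/24", "10.0.1.5"))           # False
print(ip_filter_matches("10.0.0.0/255.255.0.0", "10.0.1.5"))  # True
```

Note that 10.0.0.0/255.255.0.0 is the same network as 10.0.0.0/16, so it covers far more addresses than 10.0.0.0/24.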

Host name
Host name expressions are regular expressions that are applied to the hostname. For instance, .*xs4all.nl would match every posting originating from a host in the xs4all.nl domain.
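One thing to keep in mind is that an unescaped dot in a regular expression matches any character. The sketch below shows the difference with Python's re module, assuming an unanchored, case-insensitive search; whether Mir anchors or ignores case for hostname expressions is not stated here, and the function name hostname_filter_matches is hypothetical.

```python
import re


def hostname_filter_matches(expression: str, hostname: str) -> bool:
    """Return True if the expression is found anywhere in the hostname."""
    # Assumption: unanchored, case-insensitive search.
    return re.search(expression, hostname, re.IGNORECASE) is not None


# The expression from the text matches the intended host ...
print(hostname_filter_matches(".*xs4all.nl", "dsl-1234.xs4all.nl"))        # True
# ... but the unescaped '.' also matches the '-' in an unrelated hostname.
print(hostname_filter_matches(".*xs4all.nl", "xs4all-nl.example.org"))     # True
# Escaping the dot and anchoring the end makes the expression stricter.
print(hostname_filter_matches(r".*xs4all\.nl$", "xs4all-nl.example.org"))  # False
print(hostname_filter_matches(r".*xs4all\.nl$", "dsl-1234.xs4all.nl"))     # True
```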

Throttle
The throttle filter limits the number of postings per IP. Throttle filters are of the format minutes:postings. For instance, 10:20 means no IP may have more than 20 postings per 10 minutes. The 21st posting from the same IP within those 10 minutes would match the filter.
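The counting can be illustrated with a sliding-window sketch in Python. The sliding window is an assumption made for this example; Mir may count postings differently (for instance in fixed intervals). The class name Throttle and its methods are hypothetical.

```python
import time
from collections import defaultdict, deque


class Throttle:
    """Sliding-window counter for a 'minutes:postings' expression (a sketch)."""

    def __init__(self, expression: str):
        minutes, limit = expression.split(":")
        self.window = int(minutes) * 60   # window length in seconds
        self.limit = int(limit)           # postings allowed per window
        self.seen = defaultdict(deque)    # ip -> timestamps of recent postings

    def matches(self, ip: str, now: float = None) -> bool:
        """Record a posting from ip and report whether it exceeds the limit."""
        now = time.time() if now is None else now
        stamps = self.seen[ip]
        # Forget postings that have fallen out of the window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        stamps.append(now)
        return len(stamps) > self.limit


throttle = Throttle("10:20")
# 21 postings from one IP, one second apart.
results = [throttle.matches("10.0.0.1", now=float(i)) for i in range(21)]
print(results[19], results[20])  # False True
```

With the expression 10:20, the first 20 postings pass and the 21st matches, as in the example above.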

Posting size
The expression is a number. If the posting size (the size of all fields combined) is more than the given number, the filter will match.

URL blacklist query
This filter is a bit technical. There are a couple of services that keep track of domain names used in spam URLs, i.e. the URLs that are often used in link spam postings. More information can be obtained from the websites of the blacklists, such as http://uribl.com/.

A good expression would be multi.uribl.com. If a posting contains a URL that is known by this blacklist, then the filter will match.
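Blacklists of this kind are typically queried over DNS: the host name from a URL is prepended to the blacklist zone and looked up, and an answer means the name is listed. The Python sketch below illustrates that idea under several assumptions: that Mir queries the blacklist in this way, that the full host name is queried (real clients usually reduce it to the registered domain first), and that any successful answer counts as listed (a real client would inspect the returned address). All function names are hypothetical.

```python
import re
import socket
from urllib.parse import urlparse

# Simplified URL matcher for illustration; real postings need more care.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def blacklist_query_names(text: str, zone: str) -> list:
    """Build one DNS query name per distinct host found in the text."""
    names = []
    for url in URL_PATTERN.findall(text):
        host = urlparse(url).hostname
        if host and f"{host}.{zone}" not in names:
            names.append(f"{host}.{zone}")
    return names


def is_listed(query_name: str, resolve=socket.gethostbyname) -> bool:
    """Assumption: any successful lookup means 'listed', a failure 'not listed'."""
    try:
        resolve(query_name)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def blacklist_filter_matches(text: str, zone: str, resolve=socket.gethostbyname) -> bool:
    """Return True if any URL host in the text is listed in the zone."""
    return any(is_listed(name, resolve) for name in blacklist_query_names(text, zone))


# Only the query names are built here; no network lookup is performed.
print(blacklist_query_names("see http://spam.example/buy now", "multi.uribl.com"))
# ['spam.example.multi.uribl.com']
```

The resolve parameter exists so the lookup can be replaced by a stub when experimenting without network access.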

Initial version -- ZaPata - 23 Oct 2007
Topic revision: r1 - 23 Oct 2007, ZaPata