How filters work
The basic idea of the filter system is to automatically process postings, in particular abusive postings such as spam and trolling.
A filter is nothing more than a condition and an action: if a posting fits the condition, the action is performed. At most one filter is applied to any single posting.
Filters are organized into groups. There is a list of groups, and each group has a list of filters. Both lists are iterated in order, and the action of the first matching filter is applied to the posting. The order of the filters and of the groups can be changed from within the Mir admin.
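The first-match iteration described above can be sketched as follows. This is a minimal illustration in Python, not Mir's actual Java code: the names `Filter`, `FilterGroup` and `first_matching_action`, the action names `"hide"` and `"flag"`, and the modelling of a posting as a plain dictionary are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Posting = dict  # hypothetical stand-in for a Mir posting


@dataclass
class Filter:
    condition: Callable[[Posting], bool]  # returns True if the posting matches
    action: str                           # name of the action to perform


@dataclass
class FilterGroup:
    name: str
    filters: list = field(default_factory=list)


def first_matching_action(groups, posting) -> Optional[str]:
    """Walk the groups in order, then each group's filters in order,
    and stop at the first filter whose condition matches."""
    for group in groups:
        for flt in group.filters:
            if flt.condition(posting):
                return flt.action
    return None  # no filter matched, so no action is performed


# Hypothetical configuration: the order of groups and filters matters.
groups = [
    FilterGroup("spam", [Filter(lambda p: "casino" in p["body"], "hide")]),
    FilterGroup("other", [Filter(lambda p: True, "flag")]),
]
```

Because the search stops at the first match, a posting containing "casino" gets the action of the first group only, even though the catch-all filter in the second group would match it as well.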
What comprises a filter
A filter has a number of fields:
|| There are a number of different filter types, each of which behaves differently. For an explanation of the different filter types, see the next section.
|| The actual filter expression. The format of the expression depends entirely on the type.
|| If a posting matches a filter, the tag will be added to the "internal comments" field of the posting. A tag is used instead of an automatic comment containing the filter expression because a filter on IP would otherwise amount to a de facto IP log.
|| The action to perform if the posting is an article. It is possible to add actions programmatically, but this requires Java skills.
|| The action to perform if the posting is a comment.
|| Internal comments that aren't used by Mir.
Filter types and examples
The expression should be a Perl 5 regular expression.
The regular expression is applied case insensitively, and applies to all fields of the posting: author, title, body, abstract, etc.
Suppose you'd like to filter all postings with, for instance
in the body, you can simply put
in the expression, but you can also make use of all the more complex Perl 5 regular expression features.
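As a rough sketch of a case-insensitive regular expression filter that is tried against every field of a posting, consider the Python below. It uses Python's `re` module, whose syntax is close to but not identical to Perl 5. The function name `regex_filter_matches`, the dictionary representation of a posting, and the choice of a partial (search) match rather than a whole-field match are assumptions for illustration.

```python
import re


def regex_filter_matches(expression, posting):
    """Return True if the expression matches any field of the posting.

    The match is case-insensitive and is tried against every field
    (author, title, body, abstract, ...), one field at a time.
    """
    pattern = re.compile(expression, re.IGNORECASE)
    return any(pattern.search(str(value)) for value in posting.values())


# Hypothetical posting with a few of the fields mentioned above.
posting = {"author": "Bob", "title": "Hello", "body": "Buy CHEAP pills now"}
```

With this posting, the expression `cheap\s+pills` matches the body despite the difference in case, and `^bob$` matches the author field.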
A browser transmits a so-called user-agent string to a web server. For instance:
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6
The user agent of a posting will appear in the Mir IP log, if that log is turned on.
Sometimes a user agent is very specific to one user and can thus be used to filter postings. Some user agents are very common, though, and shouldn't be used.
The filter looks for a partial match in the user agent string, so if you use
as expression, all user agent strings that contain
will be matched.
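The partial match can be pictured as a simple substring test. This is only a sketch: the function name `user_agent_filter_matches` and the example user agent are hypothetical, and whether Mir compares case-sensitively is not stated here, so the sketch assumes it does.

```python
def user_agent_filter_matches(expression, user_agent):
    """Return True if the expression occurs anywhere in the user agent.

    Assumption: a plain, case-sensitive substring comparison.
    """
    return expression in user_agent


# Hypothetical user agent string, invented for this example.
example_agent = "Mozilla/5.0 (X11; Linux) ExampleBrowser/1.0"
```

This also shows why very common fragments make poor expressions: `Mozilla` would match this agent and nearly every other browser too.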
Expressions may be like:
Hostname expressions are regular expressions that are applied to the hostname. For instance
would match every posting originating from an xs4all.nl domain.
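When writing a hostname expression, it helps to remember that an unescaped dot matches any character and that an unanchored pattern can match in the middle of a name. The sketch below illustrates this with Python's `re` module. The pattern shown and the function name `hostname_filter_matches` are hypothetical and not taken from Mir.

```python
import re

# Hypothetical pattern: the dots are escaped, and the match is anchored to
# the end of the hostname and to a label boundary at the start.
XS4ALL_PATTERN = r"(^|\.)xs4all\.nl$"


def hostname_filter_matches(expression, hostname):
    """Return True if the regular expression matches the hostname."""
    return re.search(expression, hostname) is not None
```

With this pattern, `dsl-1.xs4all.nl` matches, while `xs4all.nl.example.com` and `notxs4all.nl` do not.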
The throttle filter limits the number of postings per IP.
Throttle filters are of the format:
means no IP may have more than 20 postings per 10 minutes.
The 21st posting from the same IP within those 10 minutes would match the filter.
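The throttling behaviour can be sketched as a sliding window of timestamps per IP. This is an illustration under assumptions, not Mir's implementation: the class name `Throttle`, passing the limit and window as constructor arguments (rather than parsing a filter expression), and counting matched postings towards the window are all choices made for the example.

```python
from collections import defaultdict, deque


class Throttle:
    """Match a posting once an IP exceeds `limit` postings per window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.seen = defaultdict(deque)  # ip -> timestamps of recent postings

    def matches(self, ip, now):
        times = self.seen[ip]
        # Forget postings that have fallen out of the time window.
        while times and now - times[0] >= self.window:
            times.popleft()
        times.append(now)
        # The posting matches when it pushes the count above the limit.
        return len(times) > self.limit


# 20 postings per 10 minutes, as in the example above.
throttle = Throttle(limit=20, window_seconds=600)
results = [throttle.matches("192.0.2.1", second) for second in range(21)]
```

Here the first 20 entries of `results` are `False` and the 21st is `True`. Postings from a different IP are counted separately.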
The expression is a number.
If the posting size (the size of all fields combined) is more than the given number, the filter will match.
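A minimal sketch of the size check follows. It assumes the size is counted in characters (the unit is not specified here) and models a posting as a dictionary of fields. The function name `size_filter_matches` is hypothetical.

```python
def size_filter_matches(expression, posting):
    """Return True if the combined size of all fields exceeds the limit.

    Assumption: size is measured in characters, summed over every field.
    """
    limit = int(expression)
    total = sum(len(str(value)) for value in posting.values())
    return total > limit


# Hypothetical posting: 3 + 5 = 8 characters in total.
small_posting = {"title": "abc", "body": "defgh"}
```

Note that the comparison is strict: with a combined size of 8, the expression `7` matches, but `8` does not.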
URL blacklist query
This filter is a bit technical. A couple of services keep track of domain names used in spam URLs, i.e. the URLs that are often used in link spam postings.
More information can be obtained from the websites of the blacklists, like http://uribl.com/
A good expression would be
If a posting contains a URL that is known by this blacklist, then the filter will match.
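Such blacklists are typically queried over DNS: the domain found in a URL is prepended to the blacklist's zone name, and the resulting name resolves only if the domain is listed. The sketch below shows that general technique in Python and is not Mir's code. The function names, the `blacklist.example` zone, and the injected `is_listed` callback are assumptions. Real blacklists usually expect the registered domain rather than the full hostname, which this sketch does not attempt to work out.

```python
import re
import socket
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def hosts_in(text):
    """Collect the hostnames of all http(s) URLs found in the text."""
    hosts = set()
    for url in URL_RE.findall(text):
        host = urlparse(url).hostname
        if host:
            hosts.add(host)
    return hosts


def dns_is_listed(query_name):
    """Treat a successful DNS lookup as 'listed' (typical DNSBL behaviour)."""
    try:
        socket.gethostbyname(query_name)
        return True
    except socket.gaierror:
        return False


def blacklist_filter_matches(zone, posting, is_listed=dns_is_listed):
    """Return True if any URL host in any field is listed in the zone."""
    for value in posting.values():
        for host in hosts_in(str(value)):
            if is_listed(f"{host}.{zone}"):
                return True
    return False


# Offline stand-in for the DNS lookup, with one hypothetical listed domain.
def fake_lookup(name):
    return name == "spam.example.blacklist.example"
```

Passing `fake_lookup` as `is_listed` lets the logic be tried without network access. With the default `dns_is_listed`, every distinct host in a posting costs one DNS query.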
Initial version -- ZaPata
- 23 Oct 2007