SpamAssassin cumulative rules

When I started using SpamAssassin about a year ago, I didn’t like the fact that even messages with high Bayes scores were not deleted, but only marked as spam. At the same time, Mozilla Thunderbird virtually never made a mistake when deciding whether a message was junk mail or not.

On the other hand, messages with low Bayes scores and only one or two hits on SpamAssassin's formal rules were getting a low overall score too.

I started analyzing the messages that Thunderbird marked as junk mail. Soon I came to a conclusion that seems quite obvious: if a message hits both a Bayes rule and a formal rule, then the overall probability that it is spam becomes much higher. So, to solve the problem, we have to implement meta rules that fire when several conditions match at the same time.

The implementation is very simple. Some samples from my local.cf are below.

Example #1

meta        BAYES_HIGH_BADRELAY ( BAYES_80 || BAYES_95 || BAYES_99 ) && UNPARSEABLE_RELAY
describe    BAYES_HIGH_BADRELAY Unparsable relay in message with high Bayes score
score       BAYES_HIGH_BADRELAY 0 0 3.3 3.3
 
meta        BAYES_AVRG_BADRELAY ( BAYES_50 || BAYES_60 ) && UNPARSEABLE_RELAY
describe    BAYES_AVRG_BADRELAY Unparsable relay in message with average Bayes score
score       BAYES_AVRG_BADRELAY 0 0 1.7 1.7

So, if a message contains an unparseable relay string and hits the Bayes filter with a high score, the overall score increases by 3.3 (the same principle applies to an average Bayes score, but the increase is only 1.7).
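The four numbers on a score line may need a word of explanation. As far as I understand the SpamAssassin configuration documentation, they are the scores used in four different setups, in the order shown in the annotated copy below. The comments are my own reading of that documentation, not something taken from a stock rules file.

```
# score RULE_NAME  <set 0> <set 1> <set 2> <set 3>
#   set 0: Bayes off, network tests off
#   set 1: Bayes off, network tests on
#   set 2: Bayes on,  network tests off
#   set 3: Bayes on,  network tests on
score       BAYES_HIGH_BADRELAY 0 0 3.3 3.3
```

Since the BAYES_* rules can only hit when the Bayes classifier is in use, the first two columns are left at 0 here.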

Example #2

meta        BAYES_HIGH_RAZOR2   ( BAYES_99 || BAYES_95 || BAYES_80 ) && RAZOR2_CHECK
describe    BAYES_HIGH_RAZOR2   High Bayes probability matches Razor2 check
score       BAYES_HIGH_RAZOR2   0 0 0 3.5
 
meta        BAYES_AVG_RAZOR2    ( BAYES_60 || BAYES_50 ) && RAZOR2_CHECK
describe    BAYES_AVG_RAZOR2    Average Bayes probability matches Razor2 check
score       BAYES_AVG_RAZOR2    0 0 0 2.7

Same as above: if we have a digest network hit (Razor2) along with a positive result from the Bayes filter, increase the score accordingly.

The approach described above noticeably cleans up your mail traffic. The drawback is also obvious: you have to set up a lot of custom rules.
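A possible way to cut down on the repetition is to move the shared Bayes conditions into sub-rules and reuse them. This is only a sketch: the names __BAYES_HIGH and __BAYES_AVG are made up for this illustration, and it relies on the SpamAssassin convention that rules whose names start with a double underscore carry no score of their own.

```
# Sub-rules: names starting with "__" are not scored and not listed in reports.
# __BAYES_HIGH and __BAYES_AVG are hypothetical names used only in this sketch.
meta        __BAYES_HIGH        ( BAYES_80 || BAYES_95 || BAYES_99 )
meta        __BAYES_AVG         ( BAYES_50 || BAYES_60 )

# The combined rules reuse the sub-rules; their describe and score lines stay as before.
meta        BAYES_HIGH_BADRELAY __BAYES_HIGH && UNPARSEABLE_RELAY
meta        BAYES_AVRG_BADRELAY __BAYES_AVG && UNPARSEABLE_RELAY
meta        BAYES_HIGH_RAZOR2   __BAYES_HIGH && RAZOR2_CHECK
meta        BAYES_AVG_RAZOR2    __BAYES_AVG && RAZOR2_CHECK
```

With this layout, each new combination costs one meta line plus its describe and score lines. After editing local.cf, running `spamassassin --lint` should report any syntax errors before the rules go live.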