BLAST-Filter

BLAST is yet another incarnation of the Bayesian filters that are currently very much en vogue.

The theory says that a Bayesian filter 'learns' what is Spam and what is not. It does so by assigning each word in a message (referred to here as a token) a probability of being a known Spam token. If a message contains many tokens with a high Spam probability, the combined probability indicates Spam.
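As a rough illustration of how token probabilities can add up to a verdict, here is a minimal Python sketch. It is not BLAST's actual code (BLAST is written in Delphi). The combining formula is the one from Paul Graham's essay, assumed here for illustration, and the token table, the value used for unknown tokens and all function names are made up.

```python
from math import prod

def combine(probabilities):
    """Combine per-token Spam probabilities into one score between 0 and 1.

    Assumption: Graham-style combination, prod(p) / (prod(p) + prod(1 - p)).
    """
    # Clamp so that a single 0.0 or 1.0 cannot cause a division by zero.
    clamped = [min(max(p, 0.01), 0.99) for p in probabilities]
    spam = prod(clamped)
    ham = prod(1.0 - p for p in clamped)
    return spam / (spam + ham)

def score_message(text, token_probability, unknown=0.4):
    """Score a message; tokens never seen before get a neutral-ish default."""
    tokens = set(text.lower().split())
    return combine([token_probability.get(t, unknown) for t in tokens])

# Hypothetical token table, as it might look after some training.
token_probability = {"viagra": 0.99, "free": 0.90, "meeting": 0.05, "invoice": 0.20}

print(score_message("free viagra", token_probability))      # close to 1: looks like Spam
print(score_message("meeting invoice", token_probability))  # close to 0: looks legitimate
```

A message would then be treated as Spam when its score exceeds some threshold, for example 0.9.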

But where to get those tokens and the probabilities from?

Simple: you have to provide them yourself! For each message, decide whether it is Spam or not. How often a token occurs in Spam and in non-Spam messages then gives its probability of being a Spam token.
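The counting described above could look roughly like this in Python. This is only a sketch under assumptions: the class and method names are invented, tokens are simply whitespace-separated words, and the probability is the token's relative frequency in Spam divided by the sum of its relative frequencies in Spam and non-Spam. BLAST's real formula may differ.

```python
from collections import Counter

class TokenCounter:
    """Hypothetical counter for tokens seen in Spam and non-Spam messages."""

    def __init__(self):
        self.spam_counts = Counter()
        self.ham_counts = Counter()
        self.spam_messages = 0
        self.ham_messages = 0

    def train(self, text, is_spam):
        # Count each token once per message.
        tokens = set(text.lower().split())
        if is_spam:
            self.spam_counts.update(tokens)
            self.spam_messages += 1
        else:
            self.ham_counts.update(tokens)
            self.ham_messages += 1

    def spam_probability(self, token):
        """Return the token's Spam probability, or None if it was never seen."""
        spam_freq = self.spam_counts[token] / max(self.spam_messages, 1)
        ham_freq = self.ham_counts[token] / max(self.ham_messages, 1)
        if spam_freq + ham_freq == 0:
            return None
        return spam_freq / (spam_freq + ham_freq)

counter = TokenCounter()
counter.train("buy cheap pills", is_spam=True)
counter.train("buy pills now", is_spam=True)
counter.train("cheap flights meeting", is_spam=False)

for token in ("buy", "cheap", "meeting", "hello"):
    print(token, counter.spam_probability(token))
```

With only three training messages the numbers are of course meaningless; they become useful once many messages have been counted.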

The Good and The Bad

After reading this very short summary of the theory you can probably see the advantages: BLAST is very adaptive. It automatically adapts to your own view of what is Spam and what is not. There is no need to build expression filters or to look up the sender's IP address in DNSBLs.

On the other hand, it needs thousands of messages to 'learn' a good profile with an error rate low enough to be useful.

And if the patterns of Spam change, the filter needs to be trained again.

The Worse

Due to its adaptive nature there is a big risk: it learns your errors too. You really need to clean all the false positives out of your spam folder with the 'this is not spam' function.
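To see why the correction matters, consider what a 'this is not spam' function has to do to the learned data. The Python sketch below is an assumption about how such a correction could work, not a description of BLAST's internals; all names are invented. It takes the message's tokens back out of the Spam counts and adds them to the non-Spam counts.

```python
from collections import Counter

spam_counts = Counter()
ham_counts = Counter()

def tokenize(text):
    # Simplification: tokens are lowercase whitespace-separated words.
    return set(text.lower().split())

def learn(text, is_spam):
    """Add a message's tokens to the Spam or non-Spam counts."""
    (spam_counts if is_spam else ham_counts).update(tokenize(text))

def mark_not_spam(text):
    """Undo a wrong Spam verdict: move the tokens over to the non-Spam counts."""
    tokens = tokenize(text)
    spam_counts.subtract(tokens)
    for token in tokens:
        if spam_counts[token] <= 0:
            del spam_counts[token]  # forget tokens that no longer occur in Spam
    ham_counts.update(tokens)

# A legitimate message was wrongly filed as Spam and learned that way ...
learn("lunch on friday", is_spam=True)
# ... until the user corrects it.
mark_not_spam("lunch on friday")

print(dict(spam_counts))  # empty again
print(dict(ham_counts))   # each of the three tokens counted once
```

Without the correction, harmless words such as "lunch" would keep counting as evidence for Spam.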

The Solution

Simply turn on the other Spam detectors. BLAST will learn from their results, and after a while it will 'assist' these filters by removing the Spam that they did not detect.

And to avoid the adaptation problem, BLAST has an aging option: it simply discards old tokens and no longer uses them.
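How 'old' is measured is not specified here. As one possible reading, the Python sketch below assumes that each token remembers the date it was last seen and that tokens not seen for a given number of days are dropped. The data layout, the function name and the 90-day limit are all assumptions.

```python
from datetime import date, timedelta

def age_tokens(token_data, today, max_age_days=90):
    """Return a copy of token_data without tokens last seen before the cutoff."""
    cutoff = today - timedelta(days=max_age_days)
    return {
        token: entry
        for token, entry in token_data.items()
        if entry["last_seen"] >= cutoff
    }

# Hypothetical token data: counts plus the date the token was last seen.
token_data = {
    "viagra":  {"spam": 120, "ham": 0,  "last_seen": date(2024, 5, 30)},
    "lottery": {"spam": 45,  "ham": 1,  "last_seen": date(2023, 11, 2)},
    "meeting": {"spam": 2,   "ham": 80, "last_seen": date(2024, 6, 1)},
}

kept = age_tokens(token_data, today=date(2024, 6, 1))
print(sorted(kept))  # "lottery" is older than 90 days and is discarded
```

Discarded tokens simply count as unknown again until they show up in new messages.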

Status

BLAST is experimental. Time will show its power or failure.

BLAST?

Disruptor OL uses BLAST, a set of object-oriented Delphi classes that handle the required aspects of Bayesian statistical methods. It consists of classes to count tokens, to condense and age token data, and to analyze it.

BLAST - TSALB - This Sieve Acts Like Bayesian

Links

Read Paul Graham's "A Plan for Spam": http://www.paulgraham.com/spam.html