BLAST-Filter
BLAST is another incarnation of the Bayesian filters that are
currently very much en vogue.
In theory, a Bayesian filter 'learns' what is spam and what is
not. It does this by assigning each word (referred to here as a
token) a probability of being a 'known spam token'. If a message
contains many such tokens, the summed-up probabilities indicate spam.
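As a rough illustration of this idea, the sketch below scores a message
by averaging the spam probabilities of its tokens. The token table, the
neutral value for unknown tokens and the function names are all made up
for this example; BLAST's actual combination rule is not described here
and may well differ.

```python
# Hypothetical token table: probability that a token is a 'known spam token'.
# The values are invented for illustration only.
SPAM_PROBABILITY = {
    "viagra": 0.99,
    "winner": 0.90,
    "free": 0.80,
    "meeting": 0.10,
    "invoice": 0.30,
}

# Assumption: a token that was never seen before counts as neutral.
UNKNOWN_TOKEN_PROBABILITY = 0.5


def tokenize(message):
    """Split a message into lowercase words (a deliberately naive tokenizer)."""
    words = (word.strip(".,!?:;\"'()") for word in message.lower().split())
    return [word for word in words if word]


def spam_score(message):
    """Return the mean spam probability of all tokens in the message."""
    tokens = tokenize(message)
    if not tokens:
        return UNKNOWN_TOKEN_PROBABILITY
    total = sum(SPAM_PROBABILITY.get(t, UNKNOWN_TOKEN_PROBABILITY) for t in tokens)
    return total / len(tokens)
```

A message full of high-probability tokens ends up with a score close
to 1, a message of ordinary words with a score close to 0. Where to
draw the line between the two is a tuning decision.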
But where do the tokens and their probabilities come from?
Simple: you have to provide them! Just decide for each message whether
it is spam or not. How often a token occurs in spam and in non-spam
messages determines its probability of being a spam token.
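The counting step might look roughly like the following sketch. The
class name TokenStats, its methods and the exact formula are assumptions
chosen for this example; they are not taken from BLAST itself. Each
token's probability is estimated from the share of spam messages and the
share of non-spam messages that contain it.

```python
from collections import Counter


class TokenStats:
    """Hypothetical helper that counts tokens in spam and non-spam messages."""

    def __init__(self):
        self.spam_counts = Counter()  # spam messages containing each token
        self.ham_counts = Counter()   # non-spam messages containing each token
        self.spam_messages = 0
        self.ham_messages = 0

    def train(self, message, is_spam):
        """Record the user's verdict for one message."""
        tokens = set(message.lower().split())  # count each token once per message
        if is_spam:
            self.spam_counts.update(tokens)
            self.spam_messages += 1
        else:
            self.ham_counts.update(tokens)
            self.ham_messages += 1

    def spam_probability(self, token):
        """Estimate how likely it is that the token is a spam token."""
        spam_n = self.spam_counts[token]
        ham_n = self.ham_counts[token]
        if spam_n + ham_n == 0:
            return 0.5  # assumption: never-seen tokens are neutral
        spam_rate = spam_n / max(self.spam_messages, 1)
        ham_rate = ham_n / max(self.ham_messages, 1)
        return spam_rate / (spam_rate + ham_rate)
```

For example, after training with two spam messages that both contain
'cheap' and one non-spam message that does not, spam_probability("cheap")
returns 1.0. With so few messages such a value means very little, which
is why a large training set matters.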
The Good and The Bad
After reading this very short summary of the theory you might see the
advantages: BLAST is very adaptive. It automatically adapts to your
view of what is spam and what is not. There is no need to build
expression filters or to look up the sender's IP address in DNSBLs.
On the other hand, it needs thousands of messages to 'learn' a
good profile with a sufficiently low error rate.
And if the patterns of spam change, the filter needs to be trained
again.
The Ugly
Due to its adaptive nature there is a big risk: it also learns your
errors. You really need to clean all the false positives out of
your spam folder with the 'this is not spam' function.
The Solution
Simply turn on the other spam detectors. BLAST will learn from their
results and after a while it will 'assist' these filters by removing
the spam the other filters did not detect.
And to avoid the adaptation problem, BLAST has an aging option: it
simply discards old tokens and no longer uses them.
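One possible way to implement such aging is sketched below. It assumes
that the date on which each token was last seen is stored alongside it;
the function name, the parameters and the 90-day limit in the usage note
are hypothetical and say nothing about how BLAST's own aging option
actually works.

```python
from datetime import date, timedelta


def discard_old_tokens(last_seen, today, max_age_days):
    """Return only the tokens that were seen within the last max_age_days days.

    last_seen maps each token to the date it was last encountered.
    """
    cutoff = today - timedelta(days=max_age_days)
    return {token: seen for token, seen in last_seen.items() if seen >= cutoff}
```

Called with max_age_days=90, for instance, this would drop every token
that has not appeared in any message for more than 90 days, so stale
training mistakes fade out together with outdated spam patterns.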
Status
BLAST is experimental. Time will tell whether it succeeds or fails.
BLAST?
Disruptor OL uses BLAST, a set of object-oriented Delphi classes
that handle the required aspects of Bayesian statistical methods. It
consists of classes to count tokens, to condense and age token data,
and to analyze it.
BLAST - TSALB - This Sieve Acts Like Bayesian
Links
Read http://www.paulgraham.com/spam.html