How To Train the CleanMail Bayes Database

Some spam messages aren't detected by the static rule set of SpamAssassin (false negatives), and some messages are tagged as spam even though they aren't (false positives). Bayes training, in short, is about teaching SpamAssassin to do better with similar messages in the future. This is something you should do once or twice a week, or daily, depending on the volume of mail you handle.

The program used to train the database, sa-learn, is part of the SpamAssassin distribution. You will find it in the sa subdirectory of your installation, and the documentation can be found in the sa\doc subdirectory, or online here.

Learning Mail Folders

If you are using a non-Microsoft mail client, learning messages is fairly simple. Most mail clients use the mbox mail folder format, or have a function to export a mail folder to an mbox file. Collect the spam messages you want to learn in a mail folder, and export this folder to an mbox file. Then use the following commands in a command line window:

cd [CleanMailInstallationPath]
sa\sa-learn -C sa\ruleset --spam --mbox "[FolderPath]"

The -C option tells sa-learn to use the same config files as SpamAssassin, so it will use the same database file. Be sure to run sa-learn from your installation directory, as in the example above, because otherwise the relative path settings in the SpamAssassin configuration files will not work.

For repeated use, create a batch file with the commands above. An example is provided in the installation directory of CleanMail.
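As an alternative to a batch file, the same invocation can be scripted. The following Python sketch is not the example shipped with CleanMail; it only wraps the command shown above. The path C:\CleanMail and the function names are hypothetical, and the sketch assumes sa-learn is an executable in the sa subdirectory. The --ham switch is sa-learn's option for learning legitimate mail, which is what you would use for false positives.

```python
import subprocess
from pathlib import PureWindowsPath

# Hypothetical installation path; replace it with your CleanMail directory.
INSTALL_DIR = r"C:\CleanMail"


def build_sa_learn_command(mbox_path, is_spam=True, install_dir=INSTALL_DIR):
    """Return the argument list for the sa-learn call shown above."""
    # Absolute path to the program, so it is found regardless of where
    # the script itself was started from.
    program = str(PureWindowsPath(install_dir) / "sa" / "sa-learn")
    kind = "--spam" if is_spam else "--ham"
    # The ruleset path stays relative on purpose: sa-learn is started with
    # the installation directory as its working directory (see learn()).
    return [program, "-C", r"sa\ruleset", kind, "--mbox", str(mbox_path)]


def learn(mbox_path, is_spam=True, install_dir=INSTALL_DIR):
    """Run sa-learn from the installation directory; raise on failure."""
    cmd = build_sa_learn_command(mbox_path, is_spam, install_dir)
    return subprocess.run(cmd, cwd=install_dir, check=True)
```

Calling learn(r"C:\mail\spam.mbox") would learn the folder as spam, and learn(r"C:\mail\good.mbox", is_spam=False) would learn it as legitimate mail; both file names are placeholders.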

Converting mail stored in Outlook .pst files or in Exchange

If you are using Microsoft Outlook or Outlook Express, there is no simple way to export mail folders to an mbox file. There is not even a way to export a single message to a text file in RFC-822 format.

So, to convert the messages you want to learn, you need some other tool or mail client.

If you are using Outlook with Exchange, you can retrieve the mail folder to learn with IMAP2mbox (available in the contribution area), which uses IMAP access to retrieve a mail folder for learning. This approach does not require Exchange; it works with any other mail server that supports IMAP retrieval. Step-by-step instructions can be found here.

Mail clients that use the mbox format natively (e.g. Mozilla Thunderbird), or any mail client capable of exporting mail to mbox files, are suitable as well.

If you use the Outlook mail client only, there are some open source projects and some commercial products that claim to do the job of retrieving mail from Outlook's .pst file. Several mail clients can import from Outlook as well, for example Mozilla Thunderbird and Eudora.

If you install another mail client, you can still continue to use Outlook. Use the other mail client just to import the messages you want from Outlook's .pst file, and to export them afterwards in mbox format for learning.
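Before handing an exported file to sa-learn, it can be worth checking that the export really is a readable mbox file with the number of messages you expect. The sketch below uses the mailbox module from the Python standard library; the function name count_messages is made up for this illustration, and the check is an optional extra, not part of CleanMail or SpamAssassin.

```python
import mailbox


def count_messages(mbox_path):
    """Return the number of messages found in an mbox file."""
    # create=False makes a missing file an error instead of silently
    # creating an empty mailbox.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return len(box)
    finally:
        box.close()
```

A result of zero for a folder you know contains mail usually means the export did not produce mbox format.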

Here are some links:

Things to keep in mind

Closing Remarks

Your feedback is welcome! Please submit hints and suggestions to .