Byteplant Forum

Bayes Learning [message #454] Fri, 25 August 2006 20:15
Budd
Messages: 7
Registered: April 2006
Junior Member
OK, I need a definitive answer here. I have a fairly basic installation of NST, and because we mostly use Outlook and Outlook Express without an Exchange server, we have not done much so far to train the Bayes database.

However, though NST is killing a good amount of spam, we'd like it to get better, so I finally took a look and found some of the dbx converter programs out there, and can now collect some false negative spam messages, convert them to mbox, and perform the SA learning process.
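Before feeding converted files to sa-learn, it can help to confirm that a dbx converter really produced a readable mbox file. The following is only a minimal sketch using Python's standard `mailbox` module; the helper name `count_messages` and the sample addresses are made up for illustration, and it assumes the converter writes ordinary mbox files in which each message starts with a "From " line.

```python
import mailbox
import os
import tempfile


def count_messages(path):
    """Return the number of messages Python's mbox parser finds in the file."""
    box = mailbox.mbox(path, create=False)
    try:
        return len(box)
    finally:
        box.close()


# Demo on a tiny hand-written mbox file containing two messages.
sample = (
    "From a@example.com Fri Aug 25 20:15:00 2006\n"
    "Subject: first\n\nbody one\n\n"
    "From b@example.com Fri Aug 25 20:16:00 2006\n"
    "Subject: second\n\nbody two\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".mbx", delete=False) as handle:
    handle.write(sample)
    sample_path = handle.name

demo_count = count_messages(sample_path)
os.remove(sample_path)
print(demo_count)
```

If the count for a converted file is 0 or far below what the mail client showed, the conversion is the first thing to check.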

My question, though, is whether I'm supposed to teach it only messages that should have been caught as spam, or both spam and ham. Is it effective to keep only false negative spam messages and run those through the sa-learn --spam process each week? And if we're supposed to teach it ham as well, does ham mean only false positive messages, or do we just collect a random bunch of good mail messages and teach SA those as ham?

Thanks for your help, great product!

-Budd Wright
Pelco Solutions, LLC
www.pelcosolutions.com
Re: Bayes Learning [message #455 is a reply to message #454] Mon, 28 August 2006 11:54
support
Messages: 919
Registered: April 2004
Senior Member
> However, though NST is killing a good amount of spam, we'd like
> it to get better, so I finally took a look and found some of
> the dbx converter programs out there, and can now collect some
> false negative spam messages, convert them to mbox, and perform
> the SA learning process.
>
> My question, though, is if I'm supposed to only be teaching it
> messages that should have been spam, or are we supposed to
> teach it both spam and ham? Is it effective to only keep false
> negative spam messages and to run those through the sa-learn
> --spam process each week? And if we're supposed to also teach
> it ham, are we supposed to consider ham as only false positive
> messages, or do we just collect a random bunch of good mail
> messages and teach SA those as ham?
>
> Thanks for your help, great product!

For best results, teach it every false positive (ham wrongly classified as spam)
and every false negative (spam wrongly classified as ham) that you get.
See also:
http://spamassassin.apache.org/full/3.1.x/dist/doc/sa-learn.html
http://wiki.apache.org/spamassassin/BayesInSpamAssassin
http://wiki.apache.org/spamassassin/BayesFaq
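
To see why both kinds of mail matter, here is a deliberately simplified sketch of a per-token spam estimate. It is not SpamAssassin's actual algorithm, which combines token probabilities in a more elaborate way; the function names and the sample messages are invented for illustration. Note also that the SpamAssassin documentation describes the Bayes rules as staying inactive until the database has learned a minimum number of both ham and spam messages (200 each by default, settings bayes_min_ham_num and bayes_min_spam_num), so spam-only training would not be enough on its own.

```python
from collections import Counter


def train(messages):
    """Count in how many messages each token appears."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts


def spam_probability(token, spam_counts, ham_counts, n_spam, n_ham):
    """Naive estimate: spam frequency relative to spam plus ham frequency."""
    spam_freq = spam_counts[token] / n_spam if n_spam else 0.0
    ham_freq = ham_counts[token] / n_ham if n_ham else 0.0
    if spam_freq + ham_freq == 0:
        return 0.5  # token never seen: no evidence either way
    return spam_freq / (spam_freq + ham_freq)


# Invented sample data, for illustration only.
spam = ["cheap pills order now", "order cheap watches now"]
ham = ["please order the report now", "meeting notes attached"]

spam_counts = train(spam)
ham_counts = train(ham)

# Trained on spam only, an everyday word such as "order" looks purely spammy.
only_spam = spam_probability("order", spam_counts, Counter(), len(spam), 0)
# Once ham is learned as well, the same word becomes much weaker evidence.
both = spam_probability("order", spam_counts, ham_counts, len(spam), len(ham))
print(only_spam, round(both, 3))
```

In this toy model the spam-only estimate for "order" is 1.0, while adding ham pulls it down to about 0.667, which is the effect that learning good mail is meant to achieve.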



Customer Support
Byteplant GmbH
Re: Bayes Learning [message #456 is a reply to message #454] Mon, 28 August 2006 20:15
Budd
Messages: 7
Registered: April 2006
Junior Member
Thanks... however, I have a question on this. I've taught the database several hundred spam messages, definitely more than 200, and yet, when looking at the NST administrator's Maintenance tab, where the SA statistics are shown, the Words (tokens), spam tokens and ham tokens all still display "0". Why is this?

Thanks,

-Budd
Re: Bayes Learning [message #457 is a reply to message #456] Wed, 30 August 2006 10:22
support
Messages: 919
Registered: April 2004
Senior Member
Budd wrote:

> Thanks... however, I have a question on this. I've taught the
> database several hundred spam messages, definitely more than
> 200, and yet, when looking at the NST administrator's
> Maintenance tab, where the SA statistics are shown, the Words
> (tokens), spam tokens and ham tokens all still display "0".
> Why is this?

What exact command line did you use to teach the Bayes database?



Customer Support
Byteplant GmbH
Re: Bayes Learning [message #458 is a reply to message #457] Thu, 31 August 2006 16:25
Budd
Messages: 7
Registered: April 2006
Junior Member
The standard spam learn command line found in the batch file:

sa\sa-learn -C sa\ruleset --showdots --spam --mbox "C:\bayes\*spam*.mbx"

I place our mbox files in that specified path, and every time I run it, it runs successfully, telling me that it learned [x] number of tokens from [x] number of messages. (Not the exact message it gives me, but you get the point.)

Yet, the Maintenance tab's Words (Tokens), Ham Words and Spam Words fields still display 0 for each, even after clicking the refresh button and restarting the service...
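
One way to check the counters independently of the Maintenance tab is sa-learn's own `--dump magic` option, which prints the database's "non-token data" such as nspam, nham and ntokens. The sketch below parses that kind of output. The helper name `parse_dump_magic` is hypothetical, the sample text uses made-up numbers, and the column layout is assumed from typical sa-learn output, so adjust it if your version prints something different.

```python
def parse_dump_magic(output):
    """Extract the 'non-token data' counters from `sa-learn --dump magic` text."""
    counters = {}
    for line in output.splitlines():
        if "non-token data:" not in line:
            continue
        fields, name = line.split("non-token data:", 1)
        columns = fields.split()
        # Assumed layout: probability, spam count, ham count or value, atime.
        if len(columns) >= 3:
            counters[name.strip()] = int(columns[2])
    return counters


# Illustrative sample with invented numbers, not output from a real system.
SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0        250          0  non-token data: nspam
0.000          0          0          0  non-token data: nham
0.000          0      12000          0  non-token data: ntokens
"""

stats = parse_dump_magic(SAMPLE)
print(stats["nspam"], stats["nham"], stats["ntokens"])
```

To get real input, a command along the lines of `sa\sa-learn -C sa\ruleset --dump magic` (reusing the -C path from the batch file above) should print the counters for the database that sa-learn itself is writing to. If those numbers are non-zero while the Maintenance tab shows 0, the two are probably not looking at the same database.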
Re: Bayes Learning [message #459 is a reply to message #458] Thu, 31 August 2006 16:31
support
Messages: 919
Registered: April 2004
Senior Member
Budd wrote:

> The standard spam learn command line found in the batch file:
>
> sa\sa-learn -C sa\ruleset --showdots --spam --mbox
> "C:\bayes\*spam*.mbx"
>
> I place our mbox files in that specified path, and every time I
> run it, it runs successfully, telling me that it learned [x]
> number of tokens from [x] number of messages. (Not the exact
> message it gives me, but you get the point.)
>
> Yet, the Maintenance tab's Words (Tokens), Ham Words and Spam
> Words fields still display 0 for each, even after clicking the
> refresh button and restarting the service...

Please send us your nospamtoday.cf and local.cf configuration files.



Customer Support
Byteplant GmbH