CleanMail feeds incoming mail to a series of mail filters, the so-called filter pipeline. Examples of mail filters are the built-in attachment blocker, third-party virus checkers, or SpamAssassin.
Each filter analyzes the message and returns a filter result telling CleanMail what to do with it. Depending on the result, the message is passed on to the next filter in the pipeline, discarded, or delivered to its intended recipients.
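The control flow described above can be sketched in a few lines of Python. This is only an illustration, not CleanMail's actual implementation: the names `Result`, `Message`, `attachment_blocker`, `whitelist` and `run_pipeline` are hypothetical, and the sketch assumes that a message which passes every filter is delivered.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable


class Result(Enum):
    PASS = "pass"        # hand the message to the next filter
    DISCARD = "discard"  # drop the message
    DELIVER = "deliver"  # deliver now, skipping the remaining filters


@dataclass
class Message:
    sender: str
    subject: str
    attachments: tuple[str, ...] = ()


Filter = Callable[[Message], Result]


def attachment_blocker(msg: Message) -> Result:
    # Illustrative extension list, not CleanMail's real one.
    blocked = (".exe", ".scr", ".pif")
    if any(name.lower().endswith(blocked) for name in msg.attachments):
        return Result.DISCARD
    return Result.PASS


def whitelist(msg: Message) -> Result:
    trusted = {"boss@example.com"}  # hypothetical whitelist entry
    return Result.DELIVER if msg.sender in trusted else Result.PASS


def run_pipeline(msg: Message, filters: Iterable[Filter]) -> Result:
    # Stop at the first filter that reaches a final decision.
    for f in filters:
        result = f(msg)
        if result is not Result.PASS:
            return result
    # Assumption: a message that no filter rejected is delivered.
    return Result.DELIVER


mail = Message("someone@example.org", "Invoice", ("invoice.EXE",))
print(run_pipeline(mail, [attachment_blocker, whitelist]))  # Result.DISCARD
```

Because the loop stops at the first final decision, a filter placed early in the list spares every later filter the work for the messages it decides.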
The order of the filters in the pipeline matters. If the first filter consumes a lot of resources, overall server throughput will be reduced. At the other end of the spectrum, lightweight filters can serve as a "triage" stage: obvious spam is discarded early, freeing server resources, possibly at the cost of a higher probability of classifying legitimate mail as spam (false positives). Resource usage is only one aspect of a more complex issue, however. Consider the following criteria when configuring your filters:
Aggressivity — The likelihood that a filter discards a legitimate message, resulting in a so-called false positive. False positives are usually unwanted, but unfortunately aggressive filters often run very fast and use few resources. The aggressivity of some filters can be configured, but making a filter less aggressive also increases the likelihood of spam messages passing through (false negatives).
Resource Usage — Filters that perform complex tests and provide a high degree of flexibility usually also require a large amount of system resources, such as memory or raw computing power. Depending on the amount of mail you want to filter, this may not be an issue at all, but if it is, you should avoid running these filters on every message you receive.
Selectivity — A filter that can classify only a small percentage of messages as definitely legitimate or definitely not legitimate, and passes a large percentage of "undecided" messages to the following filters, is said to have low selectivity. For example, it may not be worthwhile to run a resource-consuming filter if its selectivity is very low.
Type — Malicious messages fall into two categories: spam messages (including scams and phishing attacks) and virus messages (containing and propagating worms, trojans, and viruses). Some filters are effective against spam only, others against virus messages only, and some against both.
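To make the first and third criteria concrete, the sketch below estimates them from a hand-labelled sample of messages. It is one possible reading of the definitions above: aggressivity is taken as the share of legitimate messages a filter rejects, and selectivity as the share of all messages it decides at all. The function name `filter_metrics`, the verdict labels and the sample numbers are made up for illustration.

```python
from collections import Counter


def filter_metrics(outcomes):
    """Estimate aggressivity and selectivity of one filter.

    outcomes: iterable of (is_legitimate, verdict) pairs, where verdict
    is 'accept', 'reject' or 'undecided'.
    """
    counts = Counter(outcomes)
    total = sum(counts.values())
    legit = sum(n for (is_legit, _), n in counts.items() if is_legit)
    decided = sum(n for (_, verdict), n in counts.items()
                  if verdict != "undecided")
    false_positives = counts[(True, "reject")]
    return {
        # share of legitimate mail that was wrongly rejected
        "aggressivity": false_positives / legit if legit else 0.0,
        # share of all mail the filter reached a decision on
        "selectivity": decided / total if total else 0.0,
    }


# Made-up sample: 10 legitimate and 10 malicious messages.
sample = (
    [(True, "accept")] * 1
    + [(True, "undecided")] * 8
    + [(True, "reject")] * 1
    + [(False, "reject")] * 6
    + [(False, "undecided")] * 4
)
print(filter_metrics(sample))  # aggressivity 0.1, selectivity 0.4
```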
CleanMail message filters can be classified according to the following table:
The built-in SMTP-level filtering (traffic limiting, anti-abuse) is not configurable in the filter pipeline and is only available in SMTP proxies.
CleanMail by default orders the filters to optimize throughput, using the following guidelines:
Filters with the lowest resource usage and the highest selectivity should go first.
For this reason, the fingerprint filter is always one of the first filters in the filter pipeline: it has low resource usage and good results in blocking spam and malware.
Filters that use a lot of processing power and have low selectivity should go last.
Therefore, SpamAssassin is one of the last filters. It does a good job at detecting spam, but its CPU and memory usage may prohibit its use for every message received.
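The two guidelines above amount to a sort: cheapest filters first and, among equally cheap ones, the most selective first. The sketch below shows that idea only. `FilterProfile`, `order_pipeline` and the numeric scores are assumptions made for illustration; they are not CleanMail's internal ratings or its actual ordering algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterProfile:
    name: str
    resource_usage: int  # 1 = low ... 3 = high (illustrative scale)
    selectivity: int     # 1 = low ... 3 = high (illustrative scale)


def order_pipeline(filters):
    # Ascending resource usage, then descending selectivity.
    return sorted(filters, key=lambda f: (f.resource_usage, -f.selectivity))


# Placeholder scores, chosen only to demonstrate the sort.
profiles = [
    FilterProfile("spamassassin", resource_usage=3, selectivity=2),
    FilterProfile("delay", resource_usage=1, selectivity=2),
    FilterProfile("fingerprint", resource_usage=1, selectivity=3),
]
print([f.name for f in order_pipeline(profiles)])
# ['fingerprint', 'delay', 'spamassassin']
```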
Choosing the Right Filters
Judging from the list above, the following filters are a must-have in every CleanMail configuration, in order of their execution:
Attachment Blocker — a no-brainer. With a static set of blocked attachments, this filter gets rid of many virus messages at practically no cost.
Fingerprint Filter — this filter gets rid of spam and virus messages while using very few resources. Though long-term studies are not yet available, the additional risk of false positives appears to be very small.
SMTP Delay — another no-brainer. The simplicity of this filter makes you wonder why spammers have not countered it yet. Legitimate batch mailers sometimes run into problems with it if their timeouts are set too low. In this case, remove this filter.
SpamAssassin — a classic. The leading open-source spam filter, highly flexible with exceptionally good results, though at the cost of heavy resource usage.
Anti-Virus — you can use the open-source ClamAV scanner or integrate any other third-party scanner. Use multiple virus scanners if you have the necessary processing power available.
The other filters are situational; your mileage may vary:
Blacklist — there are situations where this filter is useful, but in general it is largely ineffective, as spammers usually use a different fake address for every message.
DNSBL — DNSBLs are sometimes too aggressive, but the low resource usage of these filters may help you out of a tight spot if your filtering server runs into system load trouble. Otherwise, there is no need to add this filter, as SpamAssassin already integrates DNSBLs in a less aggressive form.
The effectiveness of DNSBLs for IPv6 is in doubt, given that a spammer can use a different IP address for every message sent. At the moment, only a small percentage of spam and viruses is delivered over IPv6, so this is not an issue (yet).
Whitelist — use if needed. For each filter, you can configure individually whether the whitelist is ignored, as some users prefer to run anti-virus and attachment filtering even for messages originating from whitelisted senders.
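For readers unfamiliar with how a DNSBL lookup works in general, the sketch below builds the DNS name that is queried for a given client address, following the common convention of reversing the address and appending the list's zone. It does not show how CleanMail implements its DNSBL filter. The zone `dnsbl.example.org` is a placeholder, and `dnsbl_query_name` and `is_listed` are hypothetical helper names.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    addr = ipaddress.ip_address(ip)
    # reverse_pointer yields e.g. '1.2.0.192.in-addr.arpa' (IPv4) or
    # 32 reversed nibbles followed by 'ip6.arpa' (IPv6); drop the suffix.
    reversed_labels = addr.reverse_pointer.rsplit(".", 2)[0]
    return f"{reversed_labels}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    # An address counts as listed if the query name resolves.
    # Simplification: any lookup failure is treated as "not listed".
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except OSError:
        return False
    return True


print(dnsbl_query_name("192.0.2.1", "dnsbl.example.org"))
# 1.2.0.192.dnsbl.example.org

# The IPv6 concern in numbers: a single /64 subnet alone contains
# 2**64 addresses, each of which would need its own list entry.
print(ipaddress.ip_network("2001:db8::/64").num_addresses)
# 18446744073709551616
```

The lookup itself is a single DNS query, which is why DNSBL filters are cheap compared to content analysis.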
When configuring CleanMail with the Admin application, every new filter will be automatically moved to the best position in the filter pipeline. Afterwards, you can still change the order of filters, but only within limits.
Example Filtering Results
The figure below shows typical filtering results for a CleanMail filter pipeline that uses the attachment blocker, the fingerprint filter, the delay filter, SpamAssassin, and ClamAV, in that order. The built-in SMTP checks act as an additional filter, getting rid of abusive SMTP traffic even before the filter pipeline is invoked.
The low-resource filters discard 77.6% of all incoming mail traffic before the rest (22.4%) is passed to the more elaborate filters such as SpamAssassin and Anti-Virus. In the end, only 15.2% of all messages are classified as legitimate and passed on to the recipients.
The fingerprint filter in this example, being one of the first filters, gets rid of the lion's share of all unwanted messages. Removing this filter would increase the slices for SpamAssassin and Anti-Virus proportionately.
In summary, the lightweight filters increase your CleanMail server's message throughput almost five-fold compared to a solution using only SpamAssassin and Anti-Virus.
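The throughput figure can be checked with a quick calculation. It rests on a simplifying assumption that is not stated in the text: the heavy filters are the bottleneck and the cost of the lightweight filters is negligible, so throughput scales inversely with the share of mail that reaches the heavy filters. The function name `throughput_gain` is made up for this sketch.

```python
def throughput_gain(fraction_reaching_heavy: float) -> float:
    # If only this fraction of mail reaches the expensive filters, the
    # server can accept 1 / fraction times as much mail for the same load.
    return 1.0 / fraction_reaching_heavy


discarded_early = 0.776                  # removed by lightweight filters
reaching_heavy = 1.0 - discarded_early   # 0.224, as in the example above
legitimate = 0.152                       # finally delivered

print(round(throughput_gain(reaching_heavy), 2))  # 4.46
print(round(reaching_heavy - legitimate, 3))      # 0.072, removed by the heavy filters
```

Under this assumption the gain is a factor of roughly 4.5, which matches the "almost five-fold" estimate.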
Looking at the chart and the numbers, you might be misled into thinking that SpamAssassin and Anti-Virus have a rather small effect on the filtering results and can be removed from the filter pipeline with little impact. However, precisely these filters are needed to teach spam fingerprints to your fingerprint database: a spam message has to be caught at least once by some other filter before all subsequent occurrences of similar messages can be discarded by the fingerprint filter. After removing SpamAssassin and Anti-Virus from the pipeline, the fingerprint filter would effectively stop working.
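The dependency described above can be illustrated with a toy model. This is not how CleanMail computes fingerprints. As a stand-in, the sketch hashes the normalized message body, which only matches near-identical copies, whereas a real fingerprint filter would also have to match similar messages. `FingerprintFilter`, `process` and `keyword_filter` are hypothetical names, and `keyword_filter` merely stands in for a heavy filter such as SpamAssassin.

```python
import hashlib


def fingerprint(body: str) -> str:
    # Stand-in fingerprint: hash of the lowercased, whitespace-collapsed body.
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class FingerprintFilter:
    def __init__(self):
        self.known_spam = set()

    def is_spam(self, body: str) -> bool:
        return fingerprint(body) in self.known_spam

    def learn(self, body: str) -> None:
        self.known_spam.add(fingerprint(body))


def process(body, fp_filter, heavy_filter=None) -> str:
    if fp_filter.is_spam(body):
        return "discarded by fingerprint"
    if heavy_filter is not None and heavy_filter(body):
        fp_filter.learn(body)  # the heavy filter teaches the cheap one
        return "discarded by heavy filter"
    return "delivered"


def keyword_filter(body: str) -> bool:
    return "cheap pills" in body.lower()


fp = FingerprintFilter()
print(process("Buy CHEAP pills now", fp, keyword_filter))    # discarded by heavy filter
print(process("buy cheap   pills now", fp, keyword_filter))  # discarded by fingerprint

# Without a heavy filter nothing is ever learned, so spam keeps passing.
fp_alone = FingerprintFilter()
print(process("Buy CHEAP pills now", fp_alone))  # delivered
print(process("Buy CHEAP pills now", fp_alone))  # delivered
```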
Too many false positives — chances are you are using one of the more aggressive mail filters. Remove DNSBL if the processing power of your server permits. Always learn false positives as ham to improve future results, and whitelist any senders that are repeatedly blocked by your filtering. Make searches available to your users on your intranet so they can check the list of blocked messages themselves.
Too many false negatives — add more filters. With every new filter, the chances increase that a particular spam or virus message is detected. Learn false negatives as spam.
Server is at 100% CPU constantly — check for flooding. If you are not being flooded and the high resource usage is constant, add filters with low resource usage and high selectivity. The fingerprint filter is a must, and you may also need DNSBL. If none of this helps, upgrade your hardware.
Your feedback is welcome! Please submit hints and suggestions to