Learn how SpamAssassin’s email filtering system assigns scores to emails to keep inboxes clear of unwanted mail. Thanks to Mailosaur for providing us with this blog post.
SpamAssassin (SA) is a well-established email filtering system designed to live up to its name. It uses comprehensive spam-fighting methods to keep inboxes clear of unwanted email. The filter assigns scores to emails to separate the genuine from the unwanted.
In this article, we’ll look at how a SpamAssassin score is calculated and what you can do to improve it. Inspecting and improving your SpamAssassin results will help you write better emails that your recipients will be happy to receive.
What is SpamAssassin?
SpamAssassin (officially, ‘Apache SpamAssassin’) is an open-source project developed and operated by the Apache Software Foundation. It was initially released in 2001 with the aim of providing a robust and customisable filter for detecting ‘email spam’, the Monty Python-inspired term for the practice of sending out unsolicited emails en masse.
The filter employs a range of different tests. These include scanning an email’s body and header, and checking a sender’s IP against several different block and allow lists. Users can add and adapt rules, or simply resort to SpamAssassin’s spam classifier and train it with their own data. Many email providers rely on SpamAssassin scores to classify incoming email as spam or the opposite, ‘ham’.
The SpamAssassin score
Each SpamAssassin rule is associated with a value that can be either negative or positive. The SpamAssassin filter runs its tests on each incoming email and adds up the values for the rules that are triggered. It then returns an aggregated SpamAssassin score. Perhaps counterintuitively, a higher score signifies a higher probability that an email is spam. Therefore, when passing your transactional emails through the SpamAssassin filter, you should aim for a lower rather than a higher score.
If an email passes a certain threshold, it’s regarded as spam. By default that threshold is set to 5.0 in the SpamAssassin configuration, though it can be adjusted by the user. It’s common to tweak the threshold to reach a good balance between low numbers of false positives (genuine email wrongly classified as spam) and false negatives (spam email that tricks the filter into thinking it’s genuine). That’s why your email might make it into the inbox of one recipient, but land in the spam folder of another—they might be using an email service with more restrictive anti-spam settings. Therefore, when you test your emails against the SpamAssassin filter, simply being under the 5.0 threshold may not be enough. Rather, you should aim for the lowest possible score.
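The scoring described above can be sketched in a few lines of Python. This is a simplified illustration, not SpamAssassin's actual implementation: the rule names and their values are made up for the example, and only the default threshold of 5.0 comes from the text.

```python
# Simplified sketch of SpamAssassin-style scoring (not the real implementation).
# Rule names and values are hypothetical, chosen only for illustration.
RULE_SCORES = {
    "HYPOTHETICAL_SPAMMY_PHRASE": 2.5,
    "HYPOTHETICAL_ALL_CAPS_SUBJECT": 1.5,
    "HYPOTHETICAL_BLOCKLISTED_IP": 3.0,
    "HYPOTHETICAL_VALID_SIGNATURE": -0.5,  # negative values lower the score
}

DEFAULT_THRESHOLD = 5.0  # the default mentioned above; users can adjust it


def total_score(triggered_rules):
    """Add up the values of all rules that fired for one email."""
    return sum(RULE_SCORES[name] for name in triggered_rules)


def is_spam(triggered_rules, threshold=DEFAULT_THRESHOLD):
    """Treat an email as spam once its score reaches the threshold."""
    return total_score(triggered_rules) >= threshold


fired = [
    "HYPOTHETICAL_SPAMMY_PHRASE",
    "HYPOTHETICAL_ALL_CAPS_SUBJECT",
    "HYPOTHETICAL_VALID_SIGNATURE",
]
print(total_score(fired))             # 3.5
print(is_spam(fired))                 # False with the default threshold
print(is_spam(fired, threshold=3.0))  # True with a stricter threshold
```

The last two lines show why the same email can reach one inbox and land in another recipient's spam folder: the score is identical, only the threshold differs.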
Rule-based spam filters
So what does a SpamAssassin rule look like? Let’s look at an example that uses the KAM rules, a widely used custom rule set:
You can probably figure out the type of scam targeted here, but let’s go through it line-by-line. The first line is simply a comment for the developer that isn’t very descriptive. The next two lines are more interesting: tagged as ‘body’, they consist of two Perl regular expressions (‘regexes’). Regexes are powerful expressions for pattern-matching in text and are frequently part of the SpamAssassin filter.
Together, the regexes describe a pattern common in spammy email, in which the sender describes a job offer with unrealistically good conditions. Interestingly, the rule is only triggered if the text includes the claim that the recipient can make exactly ‘twice as much’ money (no more, no less) as with their current employer. This is ensured by a ‘meta’ rule (a combination of other rules) which requires that the sum of the two individual patterns is at least two. Since each pattern counts as 1 when it matches and 0 when it doesn’t, both must match.
When this meta rule is triggered, the email’s spam score increases by 4.3 points. Recall that 5.0 is the default threshold for a spam classification, so an email that uses the above pattern gets dangerously close to the spam folder.
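To make the meta-rule logic concrete, here is a rough Python sketch. The two regexes are hypothetical stand-ins, not the actual KAM patterns, and the function name is invented for this example; only the "sum of at least two" condition and the 4.3 value are taken from the description above.

```python
import re

# Hypothetical stand-ins for the two 'body' patterns; NOT the actual KAM regexes.
BODY_PATTERNS = [
    re.compile(r"\bjob (offer|opportunity)\b", re.IGNORECASE),
    re.compile(r"\btwice as much\b", re.IGNORECASE),
]

META_SCORE = 4.3  # the value the text gives for the meta rule


def meta_rule_score(body):
    """Return 4.3 if both body patterns match, otherwise 0.0."""
    # Each sub-rule contributes 1 when it matches and 0 when it does not,
    # so requiring a sum of at least 2 means both patterns must match.
    hits = sum(1 for pattern in BODY_PATTERNS if pattern.search(body))
    return META_SCORE if hits >= 2 else 0.0


print(meta_rule_score("This job offer pays twice as much as your current employer."))  # 4.3
print(meta_rule_score("This job offer pays well."))  # 0.0
```

A single matching pattern adds nothing here, which is why an ordinary job advert would not be penalised by a rule built this way.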
While the McGrail Foundation, which developed the KAM rules, is closely associated with Apache’s SpamAssassin, the ruleset is not, strictly speaking, part of SA’s core rule collection. Have a look at all of SpamAssassin’s default rules here. To add your own rules, you’ll need to incorporate them into your local configuration. If the rules live in per-user preference files on a shared setup, the administrator also needs to set the parameter ‘allow_user_rules’ to 1.
Probabilistic spam filters
Hard-coded rules are fine for catching spam, but an even smarter solution is to combine these rules with a score from a probabilistic spam classifier. Enter SpamAssassin’s Bayesian classifier, which is a machine learning model often employed in spam filtering. As is common with machine-learning techniques, the more emails the classifier sees, the better it gets at categorizing them.
SpamAssassin users thus have a way of fine-tuning the filter to their requirements, by feeding their own emails designated ‘spam’ or ‘ham’ to the classifier. By enabling the ‘bayes_auto_learn’ parameter one may even use the system without any previously labeled data. In such cases, incoming emails which the filter tagged as either spam or ham with a very high probability are fed back to the classifier as training data.
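The idea of training a classifier on labelled ‘spam’ and ‘ham’ can be illustrated with a toy naive Bayes model in Python. This is a teaching sketch under simplifying assumptions (whitespace tokenisation, add-one smoothing, an invented class name); SpamAssassin's own Bayesian classifier is more sophisticated and is not reproduced here.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy naive Bayes text classifier (illustrative, not SpamAssassin's code)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def learn(self, text, label):
        """Feed one labelled email ('spam' or 'ham') to the classifier."""
        self.word_counts[label].update(text.lower().split())
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Estimate the probability that the text is spam."""
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Smoothed prior: how common this label is in the training data.
            log_score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            for word in text.lower().split():
                # Add-one smoothing keeps unseen words from zeroing the product.
                log_score += math.log((counts[word] + 1) / (total_words + len(vocabulary) + 1))
            log_scores[label] = log_score
        # Convert the two log scores back into a normalised probability.
        highest = max(log_scores.values())
        weights = {label: math.exp(score - highest) for label, score in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])


classifier = TinyBayes()
classifier.learn("100% guarantee dear friend act now", "spam")
classifier.learn("meeting notes attached for tomorrow", "ham")
print(classifier.spam_probability("dear friend guarantee") > 0.5)  # True
print(classifier.spam_probability("notes for tomorrow") < 0.5)     # True
```

Auto-learning, as described above, would amount to calling `learn` on incoming emails that the filter has already classified with very high confidence.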
How to check your SpamAssassin score
Let’s finally look at SpamAssassin in action! After setting up a new server on our Mailosaur account, we’ll send it this deliberately suspicious email:
It does not look particularly trustworthy. We can already tell that we’ll get a pretty high SpamAssassin score for this email, but can you guess which parts will most trigger the filter?
Let’s open our Mailosaur dashboard to investigate:
While there are a few slightly positive elements in our email (notably the authentication signatures in the header), we made just about every mistake possible in the email’s body. The two most negative items, however, were addressing our recipient as a ‘dear friend’ and mentioning a ‘100% guarantee’.
Interestingly, despite the email’s mediocre SpamAssassin score, Gmail delivered it to the inbox. That’s likely because Gmail weighs spam signals differently from SpamAssassin’s default settings.
How to improve your SpamAssassin score
Improving our SpamAssassin score is not too difficult. Let’s simply avoid the spammy phrases that trigger the addition of high values to our score. So as a little challenge, let’s try to do the best we can and achieve a negative SpamAssassin score:
Not only does our email sound much less frantic than the previous one (lowercasing may have helped), but it’s also personalised and friendly, managing to get our message across just as well. What does our SpamAssassin score say now?
What a lovely sight! SpamAssassin likes our email just as much as we do—and we’ve achieved our goal of generating a negative SpamAssassin score.
Content is not everything
While examples that look at the body of an email are fun and intuitively understandable, a lot of SpamAssassin checks are performed behind the scenes, and many of these require more than a nicely written email to be fooled. For instance, if your domain lands on a block list, there’s not much you can do other than use a different domain or a different email provider.
Likewise, message authentication protocols such as DKIM and SPF have been designed specifically to address the problem of email spoofing, where the sender pretends to be somebody else. While you can still manipulate the content of your email headers, you cannot simply fake a DKIM signature. What makes SpamAssassin so robust as a spam filter is that it combines all those metrics (and more) to form a score.
So what can you do besides writing polished emails and using well-formed HTML? Here are a few tips:
• Use the email standards we mentioned: DKIM and SPF. DKIM shows that the email was signed by your domain and not altered in transit, and SPF shows that the sending server is authorised to send email for your domain.
• Avoid sending email in bulk from a brand new IP address: that’s considered suspicious behaviour. Instead, if sending emails to multiple recipients, make sure to distribute them over several days, slowly increasing the daily load.
• Work on your sender reputation. Make sure to send personalised emails that your recipients actually want to read. The more your emails get opened and read, and the less they’re marked as spam, the better your reputation will be.
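The advice on spreading a bulk send over several days with a slowly increasing load can be sketched as a small schedule generator. The function name, the starting volume of 50 and the doubling factor are assumptions made for illustration only; they are not recommended values from SpamAssassin or any email provider.

```python
def warmup_schedule(total_emails, first_day=50, growth=2.0):
    """Split a bulk send into daily batches that grow gradually.

    first_day and growth are illustrative defaults, not recommended values.
    """
    if total_emails < 0 or first_day < 1 or growth < 1.0:
        raise ValueError("expected total_emails >= 0, first_day >= 1, growth >= 1.0")
    schedule = []
    remaining = total_emails
    daily_limit = float(first_day)
    while remaining > 0:
        # Send at most today's limit, or whatever is left if that is less.
        batch = min(remaining, int(daily_limit))
        schedule.append(batch)
        remaining -= batch
        daily_limit *= growth  # raise the limit for the next day
    return schedule


print(warmup_schedule(1000))  # [50, 100, 200, 400, 250]
```

Each entry is one day's batch; the last day simply sends whatever remains.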
Use a dedicated email testing solution
Checking your SpamAssassin score is one tiny piece of the much larger puzzle that is email deliverability and testing. If you’re serious about communicating with your customers, consider using a dedicated email testing platform like Mailosaur, which allows you to do everything above, and much more, as part of your automated testing suite.
Mailosaur is an exhibitor at AutomationSTAR 2022. Join the software testing community in Munich, 17-18 October. Get your ticket now.