This is the third in a series of webpages on Emmesmail's development of email filters, all of which are based on Paul Graham's seminal 2002 work on Bayesian email filtering. The first, anti-spam.html, explained how we initially adapted and implemented Paul Graham's proposals. The second, anti-spam.devel.html, documented our filter's behavior, performance, and later development from 2015 to 2021.
This third webpage was started to acknowledge that the kind of emails we are interested in diverting to a spam archive today is not the same kind we diverted 20 years ago. Twenty years ago, those emails were truly spam, or "unsolicited bulk mail," as it was alternatively called. Today, the emails we classify as "spam" and divert to the spam archive using Emmesmail are not spam in the original sense. They are not sent out in bulk by an unknown spammer. They are often directed specifically to us by institutions we have a relationship with: a bank, or an organization or association we are part of. We divert these emails because we consider them unnecessary and unwanted, and we are unable either to sever our connection to the sender or to persuade the sender not to send them. These emails are harder to filter because they often share much in common with required or wanted emails from the same sender. Nonetheless, we are currently attaining levels of filtering comparable to those attained by our anti-spam filters.
Period | Received | Director-diverted | Valid passed-all | Bayes-fp | Spam-Bayes | Spam-Directed | Spam-from-blacklist* | Spam-from-whitelist | Spam-missed | Filter-Efficiency(%) |
2022 | 1243 | 7 | 853 | 56 | 314 | 0 | 0 | 7 | 24 | 93.0 ± 1.4 |
2023 | 923 | 47 | 627 | 21 | 211 | 3 | 0 | 1 | 8 | 96.2 ± 1.3 |