SpamAssassin
A practical guide to integration and configuration

Packt Publishing


 


Chapter 11:
Network Tests

SpamAssassin on its own can detect a high proportion of spam. By using network tests, spam detection can be further improved. SpamAssassin includes support for Realtime BlockLists (RBLs) and Spam URI Realtime BlockLists (SURBLs). All these external services are easy to integrate into SpamAssassin.

The effectiveness of network tests varies from a 60% detection rate upwards. Used in conjunction with SpamAssassin's other tests, they raise spam detection rates considerably, typically to over 95%. However, network tests slow down spam detection: each SpamAssassin process takes longer to complete, so more processes are running at any one time and the memory usage of the email server increases.

This chapter describes the support SpamAssassin has for RBLs and SURBLs, and focuses on three external services:

  • Vipul's Razor

  • Pyzor

  • The Distributed Checksum Clearinghouse (DCC)

RBLs are blocklists of known sources of spam. By default, SpamAssassin uses a number of RBLs to check the source of the email.
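As a rough illustration of how such a lookup usually works: most DNS-based blocklists are queried by reversing the octets of the sending IP address and looking the result up under the list's DNS zone. The sketch below only builds that query name; the zone rbl.example.org and the function name are placeholders chosen for this example, not a real list or a SpamAssassin API.

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """Build the DNS name a client would look up for an IPv4 address.

    The zone is whatever blocklist is being consulted; the one used
    below is a placeholder, not a real service.
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


# 192.0.2.99 is a documentation-only address.
print(dnsbl_query_name("192.0.2.99", "rbl.example.org"))
# prints 99.2.0.192.rbl.example.org
```

A listed address normally resolves to an answer, while an unlisted one returns no record; performing the actual DNS query is left out here so that the example runs without network access.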

A SURBL is a blocklist of Uniform Resource Identifiers (URIs) that appear in spam email. SURBLs filter spam by listing websites that have been advertised in spam emails. SpamAssassin 3.0 includes support for SURBLs, and a plug-in is available for version 2.63.
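The idea of checking advertised websites can be sketched in a few lines: pull the hostnames out of the links in a message body, then form one lookup name per host. This is a simplified illustration, not SpamAssassin's implementation; the regular expression, the function names, and the zone uribl.example.org are assumptions made for the example, and a real client would also reduce each host to its registered domain before querying.

```python
import re
from urllib.parse import urlsplit

# Deliberately simple pattern: matches http/https links up to whitespace or quotes.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def hosts_in_body(body):
    """Return the set of lower-cased hostnames linked from a message body."""
    hosts = set()
    for url in URL_PATTERN.findall(body):
        try:
            host = urlsplit(url).hostname
        except ValueError:  # malformed URL; skip it
            continue
        if host:
            hosts.add(host.lower())
    return hosts


def uri_query_name(host, zone):
    """Build the DNS name to look up for a host (zone is a placeholder)."""
    return f"{host}.{zone}"


body = "Buy now at http://Pills.example.com/order?id=1 or https://deals.example.net/x"
for host in sorted(hosts_in_body(body)):
    print(uri_query_name(host, "uribl.example.org"))
```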

Razor, Pyzor, and DCC operate by comparing incoming emails with known spam. They allow clients to query their database to determine if an email is likely to be spam. However, there is a difference in operation—the Razor database contains only spam emails, whereas Pyzor and the DCC have a database of all emails that have been tested, and keep a count of how often they have been submitted for testing. Bulk emails are indicated by a high number of reports. In other words, Razor is a spam detecting network, and Pyzor and the DCC are bulk email detecting networks.

Razor is currently in version 2, known as Razor2. Within this chapter, Razor2 will be referred to as Razor to aid readability. Razor uses a distributed network of many servers, and only spam is reported to Razor. It is highly reliable; there are rarely false positives, and it recognizes around 25% of spam.

Pyzor uses a single server and tracks all emails, not just spam emails. Spam is detected by a high number of reports rather than being explicitly identified as spam.

The Distributed Checksum Clearinghouse, as its name implies, uses a distributed approach. Its mode of operation is that all emails are reported to it, and counted. Bulk emails will have high counts and can thus be recognized as spam. At the time of writing, there are approximately 200 machines in the DCC network. The servers exchange spam details with each other, to react quickly to new spam.
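The difference between the two models described above can be shown with a small in-memory sketch. It is a toy model under stated assumptions, not the protocol of any of the three services: the class names and the threshold value are hypothetical, and real servers share data across a network rather than holding it in one process.

```python
from collections import Counter


class SpamOnlyStore:
    """Razor-style model: only messages reported as spam are stored."""

    def __init__(self):
        self.known_spam = set()

    def report_spam(self, checksum):
        self.known_spam.add(checksum)

    def is_spam(self, checksum):
        return checksum in self.known_spam


class BulkCounter:
    """Pyzor/DCC-style model: every message is counted, spam or not."""

    def __init__(self, threshold):
        self.threshold = threshold  # hypothetical cut-off for "bulk"
        self.counts = Counter()

    def report(self, checksum):
        self.counts[checksum] += 1
        return self.counts[checksum]

    def is_bulk(self, checksum):
        # A missing checksum counts as zero reports.
        return self.counts[checksum] >= self.threshold


counter = BulkCounter(threshold=3)
for _ in range(3):
    counter.report("mass-mailing")
counter.report("personal-note")
print(counter.is_bulk("mass-mailing"), counter.is_bulk("personal-note"))
```

Note that the counting model flags bulk mail, not spam as such: a legitimate mailing list sent to many recipients would also reach a high count.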

All three services are free. However, if an organization uses distributed services extensively, it can set up a server for its own use and support the service by making it available for public use.

If Razor is used, it will assist others only if spam is reported to Razor. SpamAssassin tags should not be relied upon to identify spam; a human must identify the email as spam in case there is an error. Emails addressed to a spamtrap address can be reported automatically. Spamtraps are discussed later in the chapter. Note that the Razor network must not contain incorrect data, or its effectiveness drops.

Razor, Pyzor, and DCC rely on checksums. A checksum is a small number or code made from a larger number or message. It is similar to a check digit in a credit-card number or airline ticket number. The checksums are calculated by a client application and transmitted to the server, which compares them with checksums of other emails. As checksums are small, network traffic is minimal, and so is the processing required to perform the comparison against the database of known spam. For example, DCC typically transmits 100 bytes (less than two lines of text) when querying an email message. This is a fraction of the size of an email message; the headers alone on an email will be several times larger.

Pyzor and DCC benefit from every report of email, spam or ham. Only checksums (and not the whole email) are communicated to the server, so there is no disclosure of confidential data. As checksums change even with a slight change in the message, some parts of the message—for example, those that contain dates and times—are excluded from the checksum calculations. A small network overhead is involved with reporting email. The integration of SpamAssassin and DCC described below will do this automatically, and the Pyzor package also has this ability.
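To make the checksum idea concrete, here is a minimal sketch that strips date-like and time-like fragments from a message body, normalizes whitespace and case, and hashes what remains. Razor, Pyzor, and DCC each use their own checksum algorithms; the SHA-1 hash, the regular expression, and the function name here are assumptions chosen purely for illustration.

```python
import hashlib
import re

# Matches simple time/date fragments such as 10:15, 12/09, or 12-09-2004.
DATE_LIKE = re.compile(r"\b\d{1,2}[:/-]\d{1,2}([:/-]\d{2,4})?\b")


def body_checksum(body):
    """Return a short hex digest of the body with volatile parts removed."""
    text = DATE_LIKE.sub("", body)                     # drop dates and times
    text = re.sub(r"\s+", " ", text).strip().lower()   # normalize spacing and case
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


first = body_checksum("Offer expires 10:15 today")
second = body_checksum("Offer expires 11:42 today")
print(first == second)   # the two messages differ only in the time
print(len(first))        # length of the hex digest in characters
```

Whatever the size of the message, the value sent to the server has a fixed, small length, which is why the network cost of a query stays low.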

In terms of effectiveness, DCC is generally considered better than the others. However, all these services can be used with SpamAssassin at the same time, the cost being a delay of one or two seconds per email while incoming messages are being processed. Keep in mind that if the servers used are unavailable, email processing will take longer.
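One general way to limit the delay caused by an unavailable server is to run the checks in parallel and give up on any that have not answered within a time limit. The sketch below illustrates that idea only; SpamAssassin has its own handling for slow network tests, and the function name, the check callables, and the time limit here are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def run_checks(checks, timeout_seconds):
    """Run each check in its own thread; a check that is too slow yields None.

    `checks` maps a name to a zero-argument callable (stand-ins for network tests).
    """
    results = {}
    pool = ThreadPoolExecutor(max_workers=max(1, len(checks)))
    futures = {name: pool.submit(fn) for name, fn in checks.items()}
    deadline = time.monotonic() + timeout_seconds
    for name, future in futures.items():
        remaining = max(0.0, deadline - time.monotonic())
        try:
            results[name] = future.result(timeout=remaining)
        except FutureTimeout:
            results[name] = None  # treat as "no answer", not as spam or ham
    pool.shutdown(wait=False)  # do not block on checks that are still running
    return results


def responsive_service():
    return True


def unavailable_service():
    time.sleep(0.5)  # simulates a server that answers too late
    return True


print(run_checks({"fast": responsive_service, "slow": unavailable_service}, 0.1))
```

With a shared deadline, the total wait is bounded by the time limit rather than by the slowest server, at the cost of losing the result of any check that did not finish in time.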

  • Chapter 11: Table of Contents

    • RBLs

    • SURBLs

      • SpamAssassin 2.63

    • Vipul's Razor

      • Installing Razor

      • Configuring Razor

      • Configuring SpamAssassin

      • Testing Razor

    • Pyzor

      • Installing Pyzor

      • Configuring Pyzor

      • Configuring SpamAssassin

      • Testing Pyzor

      • Pyzor Headers

    • DCC

      • Installing DCC

      • Configuring SpamAssassin

      • Testing DCC

      • DCC Headers

    • Spamtraps

      • Choosing a Spamtrap Address

      • Baiting the Spamtrap

      • Configuring the Email Account

    • Summary

BOOK DETAILS
Paperback, 220 pages
Released: Sept 2004
ISBN: 1904811124
Author: Alistair McDonald
 
 


 





 

 
