Chapter 11: Network Tests
SpamAssassin on its own can detect a high proportion of spam. By
using network tests, spam detection can be further improved.
SpamAssassin includes support for Realtime BlockLists (RBLs) and
Spam URI Realtime BlockLists (SURBLs). All these external services
are easy to integrate into SpamAssassin.
The effectiveness of individual network tests varies from a 60%
detection rate upwards. Used in conjunction with SpamAssassin's other
tests, they raise spam detection rates considerably, typically to
over 95%. However, network tests slow down spam detection. Each
SpamAssassin process takes longer to complete, so more processes run
at the same time, which increases the memory usage of the email server.
This chapter describes the support SpamAssassin has for RBLs and
SURBLs, and focuses on three external services: Razor, Pyzor, and DCC.
RBLs are blocklists of known sources
of spam. By default, SpamAssassin uses a number of RBLs to check the
source of the email.
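RBLs are conventionally queried over DNS: the octets of the sending IP address are reversed and prepended to the blocklist's zone name, and an answer to the lookup means the address is listed. The sketch below only builds the name that would be looked up. The zone `dnsbl.example.org` and the function name `dnsbl_query_name` are hypothetical and chosen for illustration; the sketch does not show how SpamAssassin itself performs the check.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name a client would look up to check an IPv4 address."""
    # Validate the input; a malformed address raises ValueError.
    addr = ipaddress.IPv4Address(ip)
    # Reverse the octets: 192.0.2.25 becomes 25.2.0.192
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


# "dnsbl.example.org" is a hypothetical zone, not a real blocklist.
print(dnsbl_query_name("192.0.2.25", "dnsbl.example.org"))
# prints 25.2.0.192.dnsbl.example.org
```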
A SURBL is a blocklist of Uniform Resource Identifiers (URIs) that
appear in spam email. SURBLs filter spam by listing websites that
have been advertised in spam emails. SpamAssassin includes
support for SURBLs in version 3.0, and a plug-in is available for
version 2.63.
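The idea behind a SURBL check can be sketched as follows: pull the host names out of the URIs in a message body and compare them with a list of advertised sites. This is a minimal illustration and not the SpamAssassin implementation. A local Python set stands in for the remote blocklist, and the names `advertised_hosts`, `listed_hosts`, and the sample domains are invented for the example.

```python
import re
from urllib.parse import urlsplit

# A deliberately simple pattern for http/https URLs in plain text.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def advertised_hosts(body: str) -> set:
    """Return the lower-cased host names of all URLs found in the body."""
    hosts = set()
    for url in URL_PATTERN.findall(body):
        host = urlsplit(url).hostname  # already lower-cased by urlsplit
        if host:
            hosts.add(host)
    return hosts


def listed_hosts(body: str, blocklist: set) -> set:
    """Return the hosts in the body that also appear on the blocklist."""
    return advertised_hosts(body) & blocklist


# Hypothetical data: a local set stands in for the remote SURBL.
blocklist = {"pills.example"}
body = "Cheap meds at http://PILLS.example/buy, news at https://news.example.org/"
print(listed_hosts(body, blocklist))
```

A message whose body mentions any listed host would then receive a positive score for the test, whatever the address it was sent from.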
Razor, Pyzor, and DCC operate by comparing incoming emails with
known spam. They allow clients to query their database to determine
if an email is likely to be spam. However, there is a difference in
operation—the Razor database contains only spam emails, whereas
Pyzor and the DCC have a database of all emails that have been
tested, and keep a count of how often they have been submitted for
testing. Bulk emails are indicated by a high number of reports. In
other words, Razor is a spam detecting network, and Pyzor and the
DCC are bulk email detecting networks.
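The difference between the two models can be sketched in a few lines of Python. This is a toy in-memory model and not the protocol of any of the three services. The class names `SpamOnlyDatabase` and `BulkCounter`, and the threshold value, are assumptions made for illustration.

```python
from collections import Counter


class SpamOnlyDatabase:
    """Razor-style model: only messages reported as spam are stored."""

    def __init__(self):
        self._spam = set()

    def report_spam(self, checksum: str) -> None:
        self._spam.add(checksum)

    def is_spam(self, checksum: str) -> bool:
        return checksum in self._spam


class BulkCounter:
    """Pyzor/DCC-style model: every message is counted, spam or not."""

    def __init__(self, threshold: int):
        self._counts = Counter()
        self._threshold = threshold  # illustrative value, not a real default

    def report(self, checksum: str) -> int:
        """Record one sighting and return the running count."""
        self._counts[checksum] += 1
        return self._counts[checksum]

    def is_bulk(self, checksum: str) -> bool:
        return self._counts[checksum] >= self._threshold


bulk = BulkCounter(threshold=3)
for _ in range(3):
    bulk.report("abc123")      # the same message seen three times
bulk.report("def456")          # a one-off personal email
print(bulk.is_bulk("abc123"), bulk.is_bulk("def456"))
```

In the second model, a legitimate mailing list would also reach the threshold, which is why these are bulk-detecting rather than spam-detecting networks.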
Razor is currently in version 2, known as Razor2. Within this
chapter, Razor2 will be referred to as Razor to aid readability.
Razor uses a distributed network of many servers, and only spam is
reported to Razor. It is highly reliable; there are rarely false
positives, and it recognizes around 25% of spam.
Pyzor uses a single server and tracks all emails, not just spam
emails. Spam is detected by a high number of reports rather than
being explicitly identified as spam.
The Distributed Checksum Clearinghouse, as its name implies, uses a
distributed approach. Its mode of operation is that all emails are
reported to it, and counted. Bulk emails will have high counts and
can thus be recognized as spam. At the time of writing, there are
approximately 200 machines in the DCC network. The servers exchange
spam details with each other, to react quickly to new spam.
All three services are free. However, if an organization uses
distributed services extensively, it can set up a server for its own
use and support the service by making it available for public use.
Using Razor assists others only if spam is also reported to
Razor. SpamAssassin tags should not be relied upon to identify spam
for reporting; a human should confirm that each email is spam, in
case SpamAssassin has made an error. Emails addressed to a spamtrap
address can be reported automatically. Spamtraps are discussed later
in the chapter. Note that the Razor network must not contain
incorrect data, or its effectiveness drops.
Razor, Pyzor, and DCC rely on checksums. A checksum is a small
number or code made from a larger number or message. It is similar
to a check digit in a credit-card number or airline ticket number.
The checksums are calculated by a client application and transmitted
to the server, which compares them with checksums of other emails.
As checksums are small, network traffic is minimal, and so is the
processing required to perform the comparison against the database
of known spam. For example, DCC typically transmits 100 bytes (less
than two lines of text) when querying an email message. This is a
fraction of the size of an email message; the headers alone on an
email will be several times larger.
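The size advantage is easy to see with any fixed-length digest. The sketch below uses SHA-1 from Python's standard library purely as a stand-in; Razor, Pyzor, and DCC each use their own checksum schemes, which are not reproduced here. The function name `message_checksum` and the sample message are invented for the example.

```python
import hashlib


def message_checksum(message: bytes) -> str:
    """Return a fixed-length hex digest of the message (SHA-1 as a stand-in)."""
    return hashlib.sha1(message).hexdigest()


# A made-up message of a few kilobytes.
message = b"Subject: Offer\r\n\r\n" + b"Buy now! " * 500

checksum = message_checksum(message)
print("message size in bytes: ", len(message))
print("checksum size in bytes:", len(bytes.fromhex(checksum)))
```

However long the message is, the digest stays the same length (20 bytes for SHA-1), so the query sent to the server stays small.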
Pyzor and DCC benefit from every report of email, spam or ham. Only
checksums (and not the whole email) are communicated to the server,
so there is no disclosure of confidential data. As checksums change
even with a slight change in the message, some parts of the
message—for example, those that contain dates and times—are excluded
from the checksum calculations. A small network overhead is involved
with reporting email. The integration of SpamAssassin and DCC
described below will do this automatically, and the Pyzor package
also has this ability.
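Excluding the variable parts of a message before calculating the checksum can be sketched as below. The rules used here (dropping the Date header and anything that looks like a clock time) are assumptions chosen to keep the example short, and SHA-1 is again only a stand-in. The real services apply their own normalization.

```python
import hashlib
import re

# Remove the whole Date: header line.
DATE_HEADER = re.compile(r"^Date:.*\n", re.IGNORECASE | re.MULTILINE)
# Remove clock times such as 17:30 or 08:02:11.
CLOCK_TIME = re.compile(r"\b\d{1,2}:\d{2}(:\d{2})?\b")


def stable_checksum(message: str) -> str:
    """Checksum the message after removing parts that vary between copies."""
    text = DATE_HEADER.sub("", message)
    text = CLOCK_TIME.sub("", text)
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


copy_one = (
    "Date: Mon, 6 Sep 2004 10:15:00 +0100\n"
    "Subject: Offer\n\n"
    "Sale ends at 17:30 today\n"
)
copy_two = (
    "Date: Tue, 7 Sep 2004 08:02:11 +0100\n"
    "Subject: Offer\n\n"
    "Sale ends at 09:45 today\n"
)

# Both copies differ only in dates and times, so the checksums match.
print(stable_checksum(copy_one) == stable_checksum(copy_two))  # prints True
```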
In terms of effectiveness, DCC is generally considered better than
the others. However, all these services can be used with
SpamAssassin at the same time, the cost being a delay of one or two
seconds per email while incoming messages are being processed. Keep
in mind that if the servers used are unavailable, email processing
will take longer.
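One common way to limit the delay caused by an unavailable server is to run the checks side by side and stop waiting after a fixed time. The sketch below illustrates that idea only and does not show how SpamAssassin schedules its network tests. The function `run_checks`, the check names, and the timeout value are all hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def run_checks(checks: dict, message: str, timeout_seconds: float) -> dict:
    """Run all checks concurrently; a check that misses the deadline yields None."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(checks)))
    futures = {name: pool.submit(fn, message) for name, fn in checks.items()}
    deadline = time.monotonic() + timeout_seconds
    results = {}
    for name, future in futures.items():
        remaining = max(0.0, deadline - time.monotonic())
        try:
            results[name] = future.result(timeout=remaining)
        except FutureTimeout:
            results[name] = None  # no verdict: the server did not answer in time
    pool.shutdown(wait=False)  # do not block on checks that are still running
    return results


def responsive_check(message: str) -> bool:
    return "offer" in message.lower()


def unavailable_check(message: str) -> bool:
    time.sleep(0.3)  # simulates a server that is slow to respond
    return True


results = run_checks(
    {"responsive": responsive_check, "unavailable": unavailable_check},
    "Special OFFER inside",
    timeout_seconds=0.05,
)
print(results)
```

The message is then scored on the answers that did arrive, at the cost of losing the verdict from the slow service.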
Paperback, 220 pages
Released: Sept 2004
ISBN: 1904811124
Author: Alistair McDonald