Chapter 5: Detecting Spam
Although humans can easily distinguish between spam and ham, detecting
spam with computer programs is not simple. Over the years, several
methods have been developed to filter spam from ham. Some anti-spam
tools use only a subset of these methods, but SpamAssassin uses
almost all of them.
Content Tests
Content tests analyze the body of the email, and sometimes the
headers. These tests typically look for keywords or phrases within
emails. Content tests usually rely on a scoring system, because words
normally associated with spam often appear in legitimate emails too,
so a single match proves little. Instead, a score is accumulated for
each email: each word associated with spam increases the overall
score. The final score is compared with a predefined threshold to
decide whether the email is spam or ham.
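The scoring idea can be sketched in a few lines of Python. This is only an illustration of accumulating a score and comparing it with a threshold; the word list, the weights, the threshold, and the function names are all hypothetical and are not taken from SpamAssassin.

```python
import re

# Hypothetical weights and threshold, chosen only for illustration.
SUSPICIOUS_WORDS = {"viagra": 2.5, "free": 0.5, "winner": 1.0}
THRESHOLD = 3.0


def score_message(body):
    """Add up the weight of every suspicious word occurrence in the body."""
    words = re.findall(r"[a-z]+", body.lower())
    return sum(SUSPICIOUS_WORDS.get(word, 0.0) for word in words)


def is_spam(body):
    """Classify the message by comparing its score with the threshold."""
    return score_message(body) >= THRESHOLD
```

With these assumed weights, a legitimate message such as "Feel free to call me" scores only 0.5 and stays below the threshold, even though it contains a word that also appears in spam.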
Content tests need not focus on single words; phrases and sequences
of punctuation are also used. The words, phrases, and other symbols
tested are normally chosen by a developer, who analyzes spam and
manually creates the tests.
Sometimes the message headers are examined as part of a content
test. The message headers include dates, times, and other attributes,
such as the mail application used. Spam-creation programs often
produce headers that contain errors or misspellings, and spam filters
can catch these.
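As a rough illustration of a header check, the following Python sketch flags a message whose Date header is missing or cannot be parsed. It uses the standard library email package; the function name and the choice of the Date header are assumptions made for this example, not a description of how any particular filter works.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


def header_problems(raw_message):
    """Return a list of problems found in the Date header (hypothetical check)."""
    msg = message_from_string(raw_message)
    problems = []
    date_value = msg["Date"]
    if date_value is None:
        problems.append("missing Date header")
    else:
        try:
            parsedate_to_datetime(date_value)
        except (TypeError, ValueError):
            # Older Python versions raise TypeError, newer ones ValueError.
            problems.append("unparseable Date header")
    return problems
```

In a scoring filter, each reported problem would typically add to the message's score rather than condemn the message outright.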
Spammers attempt to avoid detection by deliberately misspelling words
and by varying the content slightly in each spam or spam run.
A simple example of a content test would be to locate the word
'Viagra' within an email. A more complex content test would be to
locate the sequence of characters 'v?i?a?g?r?a', ignoring case, where
each '?' stands for at most one character of any kind and may be
absent altogether. For example, 'VIAGRA', 'V I A G R A', and
'V*i*agra' would all match.
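The pattern described above can be written as a regular expression in which every optional character becomes '.?'. The Python sketch below is one possible rendering; the names OBFUSCATED_VIAGRA and matches_obfuscated are hypothetical.

```python
import re

# Each ".?" allows at most one arbitrary character between the letters;
# IGNORECASE makes the match case-insensitive.
OBFUSCATED_VIAGRA = re.compile(r"v.?i.?a.?g.?r.?a", re.IGNORECASE)


def matches_obfuscated(text):
    """Return True if the text contains the word or a lightly disguised form."""
    return OBFUSCATED_VIAGRA.search(text) is not None
```

A pattern this loose also matches unrelated text such as 'visa graph', which is one reason a match usually contributes to a score instead of deciding the classification by itself.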
The majority of tests that SpamAssassin performs are content tests.
To filter content, SpamAssassin allows users to write and share
their own custom rules. Writing rules is described in Chapter 12.
Content tests are generally resource-intensive, using CPU, memory,
and disk I/O.
Paperback, 220 pages
Released: Sept 2004
ISBN: 1904811124
Author: Alistair McDonald