SpamAssassin
A practical guide to integration and configuration

Packt Publishing


 


Chapter 5
Detecting Spam

Although humans can easily distinguish between spam and ham (legitimate email), detecting spam with computer programs is not simple. Over the years, several methods have been developed to separate spam from ham. Some anti-spam tools use only a subset of these methods, but SpamAssassin uses almost all of them.

Content Tests
Content tests analyze the body of the email and sometimes its headers. These tests typically look for keywords or phrases. Words normally associated with spam also appear in legitimate email, so a single match is not enough to condemn a message; instead, content tests usually rely on a scoring system. Each spam-associated word found in an email adds to that email's overall score, and the final score is compared with a predefined threshold to decide whether the email is spam or ham.

Content tests need not focus on single words; they can also match phrases and sequences of punctuation. The words, phrases, and other symbols tested for are normally chosen by a developer who analyzes spam and creates the tests manually.
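As a rough illustration of the scoring approach described above, the following Python sketch adds up the scores of the patterns found in a message body and compares the total with a threshold. The rule table, its scores, and the function names are hypothetical and chosen only for illustration; real SpamAssassin rules are written in its own configuration syntax (see Chapter 12). The threshold of 5.0 mirrors SpamAssassin's default `required_score`.

```python
import re

# Hypothetical rule table: (regular expression, score). Patterns can be single
# words, phrases, or runs of punctuation. The scores are illustrative only.
RULES = [
    (r"\bviagra\b", 2.5),
    (r"\bfree money\b", 1.5),
    (r"\bact now\b", 1.0),
    (r"!{3,}", 0.8),            # three or more exclamation marks in a row
    (r"\bunsubscribe\b", 0.3),  # common in legitimate mail too, so a low score
]

# Illustrative threshold (SpamAssassin's default required_score is 5.0).
THRESHOLD = 5.0


def score_message(body: str) -> float:
    """Sum the scores of every rule whose pattern occurs in the body.

    In this sketch each rule counts at most once, however often it matches.
    """
    total = 0.0
    for pattern, score in RULES:
        if re.search(pattern, body, re.IGNORECASE):
            total += score
    return total


def is_spam(body: str) -> bool:
    """Classify the message as spam when its total reaches the threshold."""
    return score_message(body) >= THRESHOLD
```

With these made-up scores, "Buy VIAGRA now!!! Free money, act now!" totals 5.8 and is classed as spam, while a mailing-list message that only mentions "unsubscribe" scores 0.3 and is treated as ham. A word such as "unsubscribe" gets a low score precisely because it appears in so much legitimate mail.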

Sometimes the message headers are examined as part of a content test. The headers record the date and time a message was sent, along with other attributes such as the mail application used. Spam-sending programs often produce headers that contain errors or misspellings, which spam filters can catch.
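A minimal sketch of such header checks, using only Python's standard `email` package. The three checks (a missing `Date` header, an unparseable `Date` header, and a missing `Message-ID` header) are hypothetical examples of the kind of errors badly written mailing software can make; they are not SpamAssassin's actual header rules, and the function name is made up for illustration.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


def header_problems(raw_message: str) -> list[str]:
    """Return the (hypothetical) header problems found in a raw message."""
    msg = message_from_string(raw_message)
    problems = []

    date_value = msg.get("Date")
    if date_value is None:
        problems.append("missing Date header")
    else:
        try:
            parsedate_to_datetime(date_value)
        except (TypeError, ValueError):
            # Python 3.10+ raises ValueError for an unparseable date;
            # older versions raise TypeError instead.
            problems.append(f"malformed Date header: {date_value!r}")

    if msg.get("Message-ID") is None:
        problems.append("missing Message-ID header")

    return problems
```

In a scoring filter, each problem found would add to the message's score rather than decide the outcome on its own.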

Spammers try to avoid detection by misspelling words deliberately and by varying the content slightly in each spam message or spam run.

A simple content test would look for the word 'Viagra' in an email. A more complex test would look for the character sequence 'v?i?a?g?r?a', where each '?' stands for any single character that may also be absent altogether. With case ignored, 'VIAGRA', 'V I A G R A', and 'V*i*agra' would all match.
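The pattern 'v?i?a?g?r?a' translates directly into a regular expression in which each '?' becomes `.?` (zero or one of any character). This Python sketch assumes case-insensitive matching, which is what allows 'VIAGRA' to match; the constant name is hypothetical.

```python
import re

# 'v?i?a?g?r?a' from the text: each '?' becomes '.?', meaning
# "zero or one of any character". IGNORECASE is an assumption that
# lets upper-case variants such as 'VIAGRA' match as well.
OBFUSCATED_VIAGRA = re.compile(r"v.?i.?a.?g.?r.?a", re.IGNORECASE)

for sample in ["Viagra", "VIAGRA", "V I A G R A", "V*i*agra", "vitamins"]:
    print(f"{sample!r}: {bool(OBFUSCATED_VIAGRA.search(sample))}")
```

Running it prints `True` for the first four samples and `False` for 'vitamins'. Because each `.?` tolerates at most one inserted character, a string such as 'V  I A G R A' (two spaces after the V) slips past this pattern.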

Most of the tests that SpamAssassin performs are content tests. SpamAssassin also lets users write and share their own custom content rules; writing rules is described in Chapter 12. Content tests are generally resource-intensive, consuming CPU time, memory, and disk I/O.

  • Chapter 5: Table of Contents

    • Content Tests

    • Header Tests

    • DNS Based Blacklists

    • Statistical Tests

    • Message Recognition

    • URL Recognition

    • Examining Headers

      • Faked Headers

    • Reporting Spammers

    • Valid Bulk Email Delivery

    • Summary

BOOK DETAILS
Paperback, 220 pages
Released: Sept 2004
ISBN: 1904811124
Author: Alistair McDonald

TABLE OF CONTENTS

Intro
1. Introducing Spam
2. Spam and Anti-Spam Techniques
3. Open Relays
4. Protecting Email Addresses
5. Detecting Spam
6. Installing SpamAssassin
7. Configuration Files
8. Using SpamAssassin
9. Bayesian Filtering
10. Look and Feel
11. Network Tests
12. Rules
13. Improving Filtering
14. Performance
15. Housekeeping and Reporting
16. Building an Anti-Spam Gateway
17. Email Clients
18. Choosing other Spam Tools
Appendix A
Index

 




This website is owned and maintained by Packt Publishing Ltd, 2004. All rights reserved.