"Linux Gazette...making Linux just a little more fun!"

No More Spam!

(a "procmail"-based solution with tips on "fetchmail" and "mutt")

By Ben Okopnik

"Spamming is the scourge of electronic-mail and newsgroups on the Internet. It can seriously interfere with the operation of public services, to say nothing of the effect it may have on any individual's e-mail mail system. ... Spammers are, in effect, taking resources away from users and service suppliers without compensation and without authorization."
-- Vint Cerf, Senior VP, MCI and acknowledged "Father of the Internet"

Spam. Seems like it's become a cost of having an e-mail address these days: if you post in a newsgroup, enter something in an on-line guestbook, or have your e-mail address on the Net in some way, sooner or later you'll get harvested by the spambots. Even if you don't, spam _still_ costs you money: it takes up bandwidth that could otherwise be used for real information transfer, leading to higher overall costs for ISPs - and consequently, keeping the cost of service up for everyone. This cost is, incidentally, up in the tens of millions of dollars per month (see http://www.techweb.com/se/directlink.cgi?INW19980504S0003 for an excellent overview) - and this translates directly to about $2 of your monthly bill. If you pay for your access "by the byte", there is yet another cost - and all this comes before you add in the cost of your own wasted time.

Is there anything that we can do? The answer is "yes". We can stop spam from polluting our own mailboxes, and we can intercept it back at the ISP, if we have access to a shell account and they implement a simple tool (and most ISPs that provide shell accounts do). I invite those of you who would like to fight spam at its root to take a look at http://www.cauce.org - these are the folks that are advocating a legislative solution to spam; the information on their site tells you how you can help. In this article, however, I will concentrate on stopping spam locally - at your shell account or on your own machine.

There are several ways to do this, but the most common by far - and one that most ISPs offering shell accounts already have - is a program called "procmail" by Stephen R. van den Berg. It is an e-mail processor that uses a 'recipe' to tell it what to keep, what to filter, and what to redirect to another mailbox. So, we need to do two things: first, we need to tell our system to use "procmail"; second, we need to cobble together a 'recipe' that will do what we want.

In my own case, I collect my e-mail via "fetchmail", running as a daemon. This is something I would recommend to everyone, even if you normally collect your mail via Netscape: fetchmail does one job (mail collection) and does it very well, even in the worst and most complex of circumstances. It handles things that Netscape doesn't even try to do (multiple servers with different protocols and different usernames, for example) - and Netscape will happily read your local mailbox instead of the ISP's.

Normally, my "fetchmail" will wake up every 5 minutes, pull down the mail from the several servers that I use, and pass it to "sendmail" which then puts it in my mailbox. Whew. Sounds like wasted effort to me, but I guess that's the way things are when you scale down an MTA intended for processing big batches... Actually, using "procmail" eliminates that last step.

In my "~/.fetchmailrc", the resource file that controls what "fetchmail" does when it runs, the pertinent line reads:

mda "procmail"

This tells "fetchmail" to use "procmail" as the mail delivery agent instead of "sendmail" - remember, this is for incoming mail only; your outgoing mail will not be affected.
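For readers who have not set up "fetchmail" before, the following is a minimal sketch of a whole resource file built around that one line. The server name, username, and password are made-up placeholders, and the 300-second interval simply mirrors the 5-minute polling described above; the script writes to a scratch file rather than to the real "~/.fetchmailrc".

```shell
# Sketch only: write a sample fetchmail resource file to a scratch location.
# mail.example.com, jdoe, and "secret" are placeholders, not real settings.
demo_dir=$(mktemp -d)
rcfile="$demo_dir/fetchmailrc.sample"

cat > "$rcfile" <<'EOF'
# Wake up every 300 seconds (5 minutes) when run as a daemon
set daemon 300

poll mail.example.com protocol pop3
    user "jdoe" password "secret"
    mda "procmail"
EOF

# fetchmail insists on restrictive permissions for its resource file
chmod 600 "$rcfile"
cat "$rcfile"
```

Once the contents look right for your own servers, the file can be copied into place as "~/.fetchmailrc".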

The other way to do this - and this is the way I recommend if you're filtering mail at your ISP's machine - is to create a ".forward" file in your home directory (this tells your MTA to 'forward' the mail - in this case to our processor.)

Edit ".forward" and enter one of the following lines:

"|exec /usr/bin/procmail"

if you're using "sendmail" (the quotes are necessary in this case).

If you are using "exim", use this instead:

|/usr/bin/procmail

[ Note: According to Mike Orr, "exim" has its own procmail-like filtering language. I haven't looked at it, but it should be in the "exim" docs. ]

You'll need to double-check the actual path to "procmail": you can get that by typing:

which procmail

at the command prompt.
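The lookup and the ".forward" line can be combined in a few lines of shell. This is only a sketch: it assumes the "sendmail" form of the line, falls back to the path used in the examples above if "procmail" is not found, and writes to a scratch file instead of the real ".forward".

```shell
# Sketch only: build the sendmail-style ".forward" line from the real path.
# Falls back to /usr/bin/procmail (the path used above) if procmail is absent.
procmail_path=$(command -v procmail || echo /usr/bin/procmail)
forward_line="\"|exec $procmail_path\""

# Write to a scratch file; copy it to ~/.forward once it looks right.
forward_demo=$(mktemp)
printf '%s\n' "$forward_line" > "$forward_demo"
cat "$forward_demo"
```

For "exim", the same path would go into the unquoted form shown earlier.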

Now that we have redirected all our mail to pass through procmail, the overall effect is... nothing. Huh? Oh yeah - we still have to set up the recipe! Let's take a look at a very simple ".procmailrc", the file in which the recipes are kept:

PATH=$HOME/bin:/usr/local/bin:/usr/bin:/bin
MAILDIR=/var/spool/mail # make sure this is right
DEFAULT=$MAILDIR/username # completely optional
LOGFILE=/var/log/procmail.log # recommended

:0:
* ^Sender:.*owner-linux-kernel-announce@vger.rutgers.edu
linux-kernel-announce

:0:
* ^Resent-Sender.*debian-user-request@lists.debian.org
debian-user

Those top four lines, once you've checked to make sure that the variables are correct for your system, should be in every ".procmailrc". What comes after can be as complex as you want - you could cobble up a HUGE ".procmailrc" that does more sorting than the main US Post Office - but for spam filtering purposes (and that's the only thing most folks use it for), it's not very complex at all. The above recipe simply sorts the mail into two boxes, "linux-kernel-announce" and "debian-user", before "falling off the end" and delivering everything else into $DEFAULT.

Recipes are built like this:

:0:
* ^Subject:.*test
joe

Notation            Meaning
========            =======
:0                  Begin a recipe
  :                 Use a lock file (strongly recommended)
*                   Begin a condition
  ^                 Match the beginning of a line followed by....
   Subject:         "Subject:" followed by....
           .        any character (.) followed by....
            *       0 or more of preceding character (any character in
                      this case) followed by....
             test   "test"
joe                 If successful match, put in folder $MAILDIR/joe
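The condition in that recipe is an ordinary regular expression, so it can be tried out with "grep" before it goes anywhere near real mail. The sketch below uses made-up header lines; the "-i" flag stands in for the fact that procmail conditions ignore case by default.

```shell
# Made-up header lines for illustration
sample_headers='Subject: this is a test
Subject: lunch on Friday
X-Subject: test
subject: TESTING again'

# -i because procmail conditions ignore case unless told otherwise
matches=$(printf '%s\n' "$sample_headers" | grep -i '^Subject:.*test')
match_count=$(printf '%s\n' "$matches" | grep -c .)
printf '%s\n' "$matches"
echo "matched $match_count of 4 lines"
```

The "X-Subject:" line is skipped because of the "^" anchor, and the "lunch" line because it never reaches "test"; the other two would land in the "joe" folder.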

What we'll do here is take a look at several people's solutions; in order to write this article, I polled the members of the Answer Gang, and some of their recipes - along with their rationale for them - are shown below.

My own recipe has been in service for quite a while. I built a rather basic one at first, and this immediately decreased the spam volume by at least 95%; later, I added a "blacklist" and a "whitelist" to always reject/accept mail from certain addresses - the first is useful for spammers that manage to get through, especially those that send their garbage multiple times, the second one is for my friends whose mail I don't want to filter out no matter what strange things they may put in the headers (I have some strange friends. :)

For those of you who use "mutt", here's how I add people to those lists: in my "/etc/Muttrc", I have these lines:

and in my "/usr/local/bin", I have a script called "addysort":

#!/usr/bin/perl -wn
# Picks out the actual address from the "From:" line

unless (/\</) { print; } else { print /<([^>]+)/, "\n"; }

Given the above, all I have to do with a given spammer is hit 'Esc-b' - and I'll never see him again. By the same token, a person whom I want to add to the whitelist gets an 'Esc-w' - and they're permanently in my good graces. :)
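For anyone without Perl handy, the same extraction can be sketched with "sed". The function name "extract_address", the sample addresses, and the scratch list file are all hypothetical; how the result gets appended to a real list is up to your own setup.

```shell
# Sketch of the same extraction as "addysort", using sed instead of Perl.
extract_address() {
    # If the line contains '<', keep what follows the first '<' up to '>';
    # lines without '<' pass through unchanged.
    sed 's/^[^<]*<\([^>][^>]*\).*/\1/'
}

# blacklist_demo is a scratch file standing in for the real list
blacklist_demo=$(mktemp)
printf '%s\n' 'Joe Spammer <joe@spam.example>' | extract_address >> "$blacklist_demo"
printf '%s\n' 'plain@spam.example' | extract_address >> "$blacklist_demo"
cat "$blacklist_demo"
```

Each line of the resulting file is a bare address, which is the form a pattern list for "egrep -f" needs.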

So, here is my "~/.procmailrc":

PATH=/usr/local/bin:/usr/bin:/bin
MAILDIR=/var/spool/mail
DEFAULT=/var/spool/mail/ben
LOGFILE=/var/log/procmail
SPAMFILE=/var/spool/mail/spam

# Test if the email's sender is whitelisted; if so, send it straight to
# $DEFAULT. Note that this comes before any other filters.
:0:
* ? formail -x"From" -x"From:" -x"Sender:" \
-x"Reply-To:" -x"Return-Path:" -x"To:" \
| egrep -is -f $MAILDIR/white.lst
$DEFAULT

# Test if the email's sender is blacklisted; if so, send it to "/dev/null"
:0
* ? formail -x"From" -x"From:" -x"Sender:" \
-x"Reply-To:" -x"Return-Path:" -x"To:" \
| egrep -is -f $MAILDIR/black.lst
/dev/null

# Here is the real spam-killer, much improved by Dan's example below: if
# it isn't addressed, cc'd, or in some way sent to one of my addresses -
# all of which contain either "fuzzybear" or "ulysses" - it's spam. Note
# the '!' in front of the matching expression: it inverts the sense of the
# match, that is "if the line _doesn't_ match these words, then send it to
# $SPAMFILE." "^TO" is a procmail shorthand that matches "To:", "Cc:",
# "Bcc:", and other "To:"-type headers (see 'man procmailrc'.)
:0:
* !^TO .*(fuzzybear|ulysses).*
$SPAMFILE

# X-Advertisement header = spam!
:0:
* ^X-Advertisement:.*
$SPAMFILE

# To nobody!
:0:
* ^To:[ ]*$
$SPAMFILE

# No "To:" header at all!
:0:
* !^To: .*
$SPAMFILE

For most folks, the only thing necessary would be the last four stanzas of the above (and of course, the variables at the beginning), with the first of those four doing 95% of the work. The last three I stole from Dan :), but I can see where they'd come in handy.
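The whitelist and blacklist stanzas above depend on nothing more than the exit status of "egrep": a procmail "* ?" condition succeeds when the command returns 0. The sketch below shows that behaviour in isolation, with a made-up list entry and "grep -E -q" standing in for "egrep -s"; the function name "is_listed" is hypothetical.

```shell
# Scratch whitelist with one made-up entry
list_demo=$(mktemp)
echo 'friend@example.org' > "$list_demo"

# Returns success (0) when any pattern in the list matches the input,
# which is what a procmail "* ?" condition tests.
is_listed() {
    grep -E -i -q -f "$list_demo"
}

if echo 'Friend@Example.org' | is_listed; then listed_result=listed; else listed_result=unlisted; fi
if echo 'stranger@example.net' | is_listed; then other_result=listed; else other_result=unlisted; fi
echo "$listed_result $other_result"
```

One thing to watch for: a blank line in the list file acts as an empty pattern and matches every message, so it pays to keep the lists free of empty lines.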

By the way, yet another useful thing is a mechanism I've implemented for reporting spammers: in my "/etc/Muttrc", I have a line that says

send-hook (~s\ Spammer) 'set signature="~/.mutt/spammer"'

and a "signature" file, "~/.mutt/spammer" that says

Dear sirs:

I've just received mail from a spammer who seems to be coming from your
domain. Please fall upon this creature and rend him to bits. His garbage,
with headers, is appended.

Sincerely,
Ben Okopnik
-=-=-=-=-=-

<grin> No, I don't like spammers.

So, in order to send a complaint, I look at the headers with the 'h' key, run a 'whois' on the originating address, hit 'm' to send mail, and type the following at the prompts:

To: abuse@<domain.com>
Subject: Spammer

Given that keyword in the subject, "mutt" pulls up my "spammer" signature file. I save it, append the original spam with the 'A' key, and send it. About 15 seconds of typing, whenever I feel like getting another spam kill. :)

(Just as I was about to send off this article, I got another spam kill, this time from UUnet. <grin> I collect'em.)

On we go to other folks' recipes.

LG itself is in a rather vulnerable position in regard to spam. Since the address is posted on the Net thousands (by now, perhaps millions) of times, it is constantly harvested by spammers. The thing is, "tight" filters of the sort that I've described are not feasible. Consider: what would be a safe "filter" for spam that would not also knock out a certain percentage of our readers' mail? People send mail in from all over the world, in many different ways, with just about every kind of client (including those that create broken headers.) The spam rate for LG, according to Mike Orr, is 28% per month. Rejecting "Precedence: bulk" mail, which can be a good "minor" filter, is not an option: many "News Bytes" entries (since they are bulk-mailed news releases) are sent that way. Even the inquiry that led to the publication of the HelpDex cartoon series came that way.

What to do?

The answer seems to be careful, "accept-when-in-doubt" filtering and checking the "spam" mailbox more often than most people. For the staff at LG, it's just another cost of doing business. Hopefully, even their "loose" filters decrease some of that load.

Dan Wilder, the resident admin and system magician, had the following "spam killer" in his "~/.procmailrc":

Pretty obvious - anything that doesn't match those three domains in the specified headers wasn't sent to him. Dan checks his "spamfile" every so often - as do I, because real mail can occasionally get caught by mistake - and this takes care of the tiny percentage of errors.

Before that is a rule that allows exemptions, as for mailing lists:

...

|bugs\
)
$DEFAULT

Much the same thing as my "whitelist", but hard-coded into ".procmailrc" itself. It's not much more difficult to add new people there, so it's just as good a solution as mine, though perhaps a trifle less automatic.

LG's editor, Mike Orr, does a fair bit of sorting (which I've clipped) as well as spam-killing in his recipes (designed by Dan Wilder):

LOG=$HOME/log
#LOGFILE=$LOG/procmail-log
VERBOSE=no
SPAMFILE=$LOG/spam
UMASK=077
LOGABSTRACT=on
COMSAT=no
DEFAULT=$HOME/Mail/Maildir/new

# The real workhorse.
# Bogus recipient .. not To: or From: or Cc: ssc.com,
# and has an "@" in To: (local mail from damaged MUAs may not)

:0:
* !^(To:|From:|Cc:) .*(\
ssc\.com\
|linuxjournal\.com\
|linuxgazette\.com\
)
* ^To: .*@
$SPAMFILE

Hmm, looks like we have a blacklisted spammer here...

:0:
* From: .*john@songpeople.com
$SPAMFILE

# if they have an X-Advertisement header, it's spam!

:0:
* ^X-Advertisement:.*
$SPAMFILE

# To nobody!

:0:
* ^To:[ ]*$
$SPAMFILE

# No To: header at all!
:0:
* !^To: .*
$SPAMFILE

# Otherwise, fall off the end and default.
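The logic of the two-condition "workhorse" recipe can be approximated outside procmail for experimenting. This is a rough sketch only: "grep" stands in for procmail's matcher, the function name "looks_like_spam" is hypothetical, and the sample headers are made up.

```shell
# Sketch: approximate the two-condition "workhorse" recipe with grep.
# Succeeds (spam) when no To:/From:/Cc: header mentions a known domain
# AND the To: header contains an "@".
looks_like_spam() {
    headers=$1
    if printf '%s\n' "$headers" |
        grep -E -i -q '^(To:|From:|Cc:) .*(ssc\.com|linuxjournal\.com|linuxgazette\.com)'; then
        return 1    # addressed to or from a known domain: keep it
    fi
    printf '%s\n' "$headers" | grep -E -i -q '^To: .*@'
}

# Made-up sample headers
good='From: reader@example.org
To: gazette@ssc.com'
bad='From: deals@spam.example
To: undisclosed@spam.example'

looks_like_spam "$good" && good_verdict=spam || good_verdict=keep
looks_like_spam "$bad" && bad_verdict=spam || bad_verdict=keep
echo "good: $good_verdict, bad: $bad_verdict"
```

Both conditions have to hold before a message is filed as spam, which is why local mail with no "@" in its "To:" header still gets through.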

Wow. Seems like a whole lot of stuff, doesn't it? In reality, it's minor:

1) Pipe all incoming mail through "procmail".
2) Build a recipe.

That's it. I've thrown in a lot of tips for doing other spam-related stuff, but the above two steps are all you need to do in order to decrease or almost completely eliminate your spam. Oh, look - here comes another tip! :)

If you already have a mailbox full of spam, and you go ahead and do the above two steps, it's really easy to filter it all in one shot:

formail -s procmail < mbox
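Before running a whole mailbox through the filter, it seems prudent to count the messages and keep a backup. The sketch below does that on a tiny made-up mbox file; the addresses and message contents are placeholders, and the filtering command itself is only shown as a comment, not executed.

```shell
# Sketch: build a tiny made-up mbox, count its messages, and back it up
# before any bulk filtering.
mbox_demo=$(mktemp)
cat > "$mbox_demo" <<'EOF'
From deals@spam.example Thu Feb  1 10:00:00 2001
Subject: Make money fast

Buy now.

From friend@example.org Thu Feb  1 10:05:00 2001
Subject: Lunch?

Noon works for me.
EOF

# Each message in an mbox file starts with a "From " line
message_count=$(grep -c '^From ' "$mbox_demo")
cp "$mbox_demo" "$mbox_demo.bak"
echo "$message_count messages backed up to $mbox_demo.bak"

# The real filtering run would then be (not executed in this sketch):
#   formail -s procmail < "$mbox_demo"
```

If the file being filtered is also the one $DEFAULT points to, it is probably safer to move it aside first and filter the copy, since surviving messages get appended to $DEFAULT.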

So, you can indeed "de-spam" your mailbox. Thanks to the power of Linux and "procmail", you too can stare at people who complain about getting deluged and say "Oh, yeah... I remember when I had that problem." <Laugh>

Happy spam-hunting!

Copyright © 2001, Ben Okopnik. Copying license http://www.linuxgazette.net/copying.html Published in Issue 62 of Linux Gazette, February 2001
