"Spamming is the scourge of electronic-mail and newsgroups on the Internet. It can seriously interfere with the operation of public services, to say nothing of the effect it may have on any individual's e-mail mail system. ... Spammers are, in effect, taking resources away from users and service suppliers without compensation and without authorization."
-- Vint Cerf, Senior VP, MCI and acknowledged "Father of the Internet"
Spam. Seems like it's become a cost of having an e-mail address these days: if you post in a newsgroup, enter something in an on-line guestbook, or have your e-mail address on the Net in some way, sooner or later you'll get harvested by the spambots. Even if you don't, spam _still_ costs you money: it takes up bandwidth that could otherwise be used for real information transfer, leading to overall higher costs for ISPs - and consequently, keeping up costs of service for everyone. This cost is, incidentally, up in the tens of millions of dollars per month (see http://www.techweb.com/se/directlink.cgi?INW19980504S0003 for an excellent overview) - and this translates directly to about $2 of your monthly bill. If you pay for your access "by the byte", there is yet another cost - and all this comes before you add in the cost of your own wasted time.
Is there anything that we can do? The answer is "yes". We can stop spam from polluting our own mailboxes, and we can intercept it back at the ISP, if we have access to a shell account and they implement a simple tool (and most ISPs that provide shell accounts do). I invite those of you who would like to fight spam at its root to take a look at http://www.cauce.org - these are the folks that are advocating a legislative solution to spam; the information on their site tells you how you can help. In this article, however, I will concentrate on stopping spam locally - at your shell account or on your own machine.
There are several ways to do this, but the most common by far - and one that most ISPs offering shell-accounts already have - is a program called "procmail" by Stephen R. van den Berg, an e-mail processor that uses a 'recipe' that tells it what to keep, what to filter, and what to redirect to another mailbox. So, we need to do two things: first, we need to tell our system to use "procmail"; second, we need to cobble together a 'recipe' that will do what we want.
In my own case, I collect my e-mail via "fetchmail", running as a daemon. This is something I would recommend to everyone, even if you normally collect your mail via Netscape: fetchmail does one job (mail collection) and does it very well, even in the worst and most complex of circumstances, handling things that Netscape doesn't even try to do (multiple servers with different protocols and different usernames, for example) - and Netscape will happily read your local mailbox instead of the ISP's.
Normally, my "fetchmail" will wake up every 5 minutes, pull down the mail from the several servers that I use, and pass it to "sendmail" which then puts it in my mailbox. Whew. Sounds like wasted effort to me, but I guess that's the way things are when you scale down an MTA intended for processing big batches... Actually, using "procmail" eliminates that last step.
In my "~/.fetchmailrc", the resource file that controls what "fetchmail" does when it runs, the pertinent line reads:
mda "procmail"
This tells "fetchmail" to use "procmail" as the mail delivery agent
instead of "sendmail" - remember, this is for incoming mail only; your
outgoing mail will not be affected.
The other way to do this - and this is the way I recommend if you're filtering mail at your ISP's machine - is to create a ".forward" file in your home directory (this tells your MTA to 'forward' the mail - in this case to our processor.)
Edit ".forward" and enter one of the following lines:
"|exec /usr/bin/procmail"
if you're using "sendmail" (the quotes are necessary in this case).
If you are using "exim", use this instead:
|/usr/bin/procmail
[ Note: According to Mike Orr, "exim" has its own procmail-like filtering language. I haven't looked at it, but it should be in the "exim" docs. ]
You'll need to double-check the actual path to "procmail": you can get that by typing:
which procmail
at the command prompt.
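As a rough sketch of how those two steps fit together (look up the path, then write the line), something like the following would print the "sendmail"-style ".forward" entry. The function name "forward_line" is made up for illustration, and the fallback to /usr/bin/procmail is only an assumption for systems where the lookup finds nothing.

```shell
# Print the sendmail-style .forward line for a given procmail path.
# (forward_line is a hypothetical helper, not part of procmail.)
forward_line() {
    printf '"|exec %s"\n' "$1"
}

# Look up procmail on the PATH; fall back to the path used in this
# article if it isn't found (an assumption - check your own system).
procmail_path=$(command -v procmail || echo /usr/bin/procmail)

forward_line "$procmail_path"
```

Once the printed line looks right, redirecting the output into "~/.forward" would install it.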
Now that we have redirected all our mail to pass through procmail, the overall effect is... nothing. Huh? Oh yeah - we still have to set up the recipe! Let's take a look at a very simple ".procmailrc", the file in which the recipes are kept:
:0:
* ^Sender:.*owner-linux-kernel-announce@vger.rutgers.edu
linux-kernel-announce
:0:
* ^Resent-Sender.*debian-user-request@lists.debian.org
debian-user
Recipes are built like this:
Notation     Meaning
========     =======
:0           Begin a recipe
:            Use a lock file (strongly recommended)
*            Begin a condition
^            Match the beginning of a line followed by....
Subject:     ``Subject:'' followed by....
.            any character (.) followed by....
*            0 or more of preceding character (any character in this case) followed by....
test         ``test''
joe          If successful match, put in folder $MAILDIR/joe
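Read top to bottom, the pieces above appear to spell out a recipe whose condition is "^Subject:.*test" and whose folder is "joe". To see what such a condition would match without touching real mail, here is a minimal sketch that uses "grep" as a stand-in for procmail's own matcher. The sample headers and the function name "subject_test" are made up for illustration; the "-i" is there because procmail conditions ignore case unless told otherwise.

```shell
# Report where a message would go under the condition ^Subject:.*test
# (grep -E stands in for procmail's matcher; this is only a sketch).
subject_test() {
    if grep -Eiq '^Subject:.*test'; then
        echo "joe"        # condition matched: deliver to folder "joe"
    else
        echo "default"    # no match: fall through to the next recipe
    fi
}

# Hypothetical headers, for illustration only.
printf 'From: someone@example.com\nSubject: this is a test\n' | subject_test
printf 'From: someone@example.com\nSubject: hello\n' | subject_test
```

The first message lands in "joe"; the second falls through, just as it would fall through to the next recipe in a real ".procmailrc".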
My own recipe has been in service for quite a while. I built a rather basic one at first, and this immediately decreased the spam volume by at least 95%; later, I added a "blacklist" and a "whitelist" to always reject/accept mail from certain addresses - the first is useful for spammers that manage to get through, especially those that send their garbage multiple times, the second one is for my friends whose mail I don't want to filter out no matter what strange things they may put in the headers (I have some strange friends. :)
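The lookup behind such lists is nothing exotic. As a minimal sketch (assuming a list file with one address per line, each treated as a regular expression; the file contents and the name "check_list" are made up for illustration), "grep -f" can test a set of headers against every entry at once:

```shell
# Print "listed" if any header line on stdin matches an entry in the
# list file given as $1, otherwise "unlisted".
# (check_list is a hypothetical helper for this sketch.)
check_list() {
    if grep -Eiq -f "$1"; then
        echo "listed"
    else
        echo "unlisted"
    fi
}

# Hypothetical list file, for illustration only.
list=$(mktemp)
printf '%s\n' 'friend@example.org' 'editor@example.net' > "$list"

printf 'From: Friend@Example.org\n' | check_list "$list"
printf 'From: stranger@example.com\n' | check_list "$list"

rm -f "$list"
```

Because the entries are patterns rather than literal strings, a "." in an address matches any character; for a personal list that is usually close enough.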
For those of you who use "mutt", here's how I add people to those lists: in my "/etc/Muttrc", I have these lines:
macro index \ew '| formail -x From: | addysort >> ~/Mail/white.lst'
macro pager \ew '| formail -x From: | addysort >> ~/Mail/white.lst'
macro index \eb '| formail -x From: | addysort >> ~/Mail/black.lst'
macro pager \eb '| formail -x From: | addysort >> ~/Mail/black.lst'
and in my "/usr/local/bin", I have a script called "addysort":
#!/usr/bin/perl -n
unless (/\</) { print; } else { print /<([^>]+)/, "\n"; }
Given the above, all I have to do with a given spammer is hit 'Esc-b'
- and I'll never see him again. By the same token, a person whom I want
to add to the whitelist gets an 'Esc-w' - and they're permanently in my
good graces. :)
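If the Perl one-liner looks cryptic: it prints whatever sits between the angle brackets when there are any, and the whole line otherwise. Here is a rough "sed" equivalent, purely as an illustration of that behaviour (the function name "addysort_sketch" and the sample addresses are made up; it is not meant as a replacement for the script above):

```shell
# Reduce a From: value to the bare address: keep the text inside <...>
# if there is any, otherwise pass the line through unchanged.
addysort_sketch() {
    sed 's/^[^<]*<\([^>]*\).*$/\1/'
}

# Hypothetical sender lines, for illustration only.
printf 'Joe User <joe@example.com>\n' | addysort_sketch
printf 'jane@example.com\n' | addysort_sketch
```

Both lines come out as bare addresses, which is the form the list files need.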
So, here is my "~/.procmailrc":
# Test if the email's sender is whitelisted; if so, send it straight to
# $DEFAULT. Note that this comes before any other filters.
:0:
* ? formail -x"From" -x"From:" -x"Sender:" \
-x"Reply-To:" -x"Return-Path:" -x"To:" \
| egrep -is -f $MAILDIR/white.lst
$DEFAULT
# Test if the email's sender is blacklisted; if so, send it to "/dev/null"
:0
* ? formail -x"From" -x"From:" -x"Sender:" \
-x"Reply-To:" -x"Return-Path:" -x"To:" \
| egrep -is -f $MAILDIR/black.lst
/dev/null
# Here is the real spam-killer, much improved by Dan's example below: if
# it isn't addressed, cc'd, or in some way sent to one of my addresses -
# all of which contain either "fuzzybear" or "ulysses" - it's spam. Note
# the '!' in front of the matching expression: it inverts the sense of the
# match, that is "if the line _doesn't_ match these words, then send it to
# $SPAMFILE." "^TO" is a special procmail expression that matches "To:",
# "Cc:", "Bcc:", and other "To:"-type headers (see 'man procmailrc'.)
:0:
* !^TO .*(fuzzybear|ulysses).*
$SPAMFILE
# X-Advertisement header = spam!
:0:
* ^X-Advertisement:.*
$SPAMFILE
# To nobody!
:0:
* To:[ ]*$
$SPAMFILE
# No "To:" header at all!
:0:
* !^To: .*
$SPAMFILE
For most folks, the only thing necessary would be the last four stanzas of the above (and of course, the variables at the beginning), with the first stanza of that doing 95% of the work. The last three I stole from Dan :), but I can see where they'd come in handy.
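To get a feel for that workhorse stanza before trusting it with real mail, its logic can be approximated with "grep". This is only a sketch: the header list below is a simplified stand-in for what "^TO" really expands to, and the function name "addressed_to_me" and the sample messages are made up for illustration.

```shell
# Print "keep" if a recipient header mentions one of my names, else
# "spam". Only To:, Cc: and Bcc: are checked here - a simplification.
addressed_to_me() {
    if grep -Eiq '^(To|Cc|Bcc):.*(fuzzybear|ulysses)'; then
        echo "keep"
    else
        echo "spam"
    fi
}

# Hypothetical messages, for illustration only.
printf 'From: pal@example.org\nTo: fuzzybear@example.com\n' | addressed_to_me
printf 'From: ads@example.com\nTo: friends@example.com\n' | addressed_to_me
```

Swapping in your own names for "fuzzybear" and "ulysses" and feeding it a few saved messages is a cheap way to check the pattern before it goes into ".procmailrc".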
By the way, yet another useful thing is a mechanism I've implemented for reporting spammers: in my "/etc/Muttrc", I have a line that says
send-hook (~s\ Spammer) 'set signature="~/.mutt/spammer"'
and a "signature" file, "~/.mutt/spammer" that says
I've just received mail from a spammer who seems to be coming from your
domain. Please fall upon this creature and rend him to bits. His garbage,
with headers, is appended.
Sincerely,
Ben Okopnik
-=-=-=-=-=-
<grin> No, I don't like spammers.
So, in order to send a complaint, I look at the headers with the 'h' key, run a 'whois' on the originating address, hit 'm' to send mail, and type the following at the prompts:
To: abuse@<domain.com>
Subject: Spammer
Given that keyword in the subject, "mutt" pulls up my "spammer" signature file. I save it, append the original spam with the 'A' key, and send it. About 15 seconds of typing, whenever I feel like getting another spam kill. :)
(Just as I was about to send off this article, I got another spam
kill, this time from UUnet. <grin> I collect'em.)
On we go to other folks' recipes.
LG itself is in a rather vulnerable position in regard to spam. Since the address is posted on the Net thousands (by now, perhaps millions) of times, it is constantly harvested by spammers. The thing is, "tight" filters of the sort that I've described are not feasible. Consider: what would be a safe "filter" for spam that would not also knock out a certain percentage of our readers' mail? People send mail in from all over the world, in many different ways, with just about every kind of client (including those that create broken headers.) The spam rate for LG, according to Mike Orr, is 28% per month. Rejecting "Precedence: bulk" mail, which can be a good "minor" filter, is not an option: many "News Bytes" entries (since they are bulk-mailed news releases) are sent that way. Even the inquiry that led to the publication of the HelpDex cartoon series came that way.
What to do?
The answer seems to be careful, "accept-when-in-doubt" filtering and
checking the "spam" mailbox more often than most people. For the staff
at LG, it's just another cost of doing business. Hopefully, even their
"loose" filters decrease some of that load.
Dan Wilder, the resident admin and system magician, had the following "spam killer" in his "~/.procmailrc":
Pretty obvious - anything that doesn't match those three domains in the specified headers wasn't sent to him. Dan checks his "spamfile" every so often - as do I, because real mail can slip by and match by mistake - and this takes care of the tiny percentage of errors.
:0:
* ^(From:|To:|Cc:) .*(-list\
|debian\
|networksolutions\
|ciac\
...
|bugs\
)
$DEFAULT
Much the same thing as my "whitelist", but hard-coded into ".procmailrc"
itself. It's not much more difficult to add new people there, so it's just
as good a solution as mine, though perhaps a trifle less automatic.
LG's editor, Mike Orr, does a fair bit of sorting (which I've clipped) as well as spam-killing in his recipes (designed by Dan Wilder):
# The real workhorse.
# Bogus recipient .. not To: or From: or Cc: ssc.com,
# and has an "@" in To: (local mail from damaged MUAs may not)
:0:
* !^(To:|From:|Cc:) .*(\
ssc\.com\
|linuxjournal\.com\
|linuxgazette\.com\
)
* ^To: .*@
$SPAMFILE
# Hmm, looks like we have a blacklisted spammer here...
:0:
* From: .*john@songpeople.com
$SPAMFILE
# if they have an X-Advertisement header, it's spam!
:0:
* ^X-Advertisement:.*
$SPAMFILE
# To nobody!
:0:
* To:[ ]*$
$SPAMFILE
# No To: header at all!
:0:
* !^To: .*
$SPAMFILE
# Otherwise, fall off the end and default.
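One thing worth noticing in the workhorse recipe: it has two condition lines, and procmail requires all of a recipe's conditions to match before it acts. A rough "grep" approximation of that AND logic, with made-up sample messages and a hypothetical function name "lg_filter", might look like this:

```shell
# Print "spam" only when BOTH conditions hold:
#   1. no To:/From:/Cc: header mentions one of the three domains, and
#   2. the To: header contains an "@".
# Anything else prints "keep". (A sketch, not procmail itself.)
lg_filter() {
    msg=$(cat)
    if ! printf '%s\n' "$msg" | grep -Eiq '^(To:|From:|Cc:) .*(ssc\.com|linuxjournal\.com|linuxgazette\.com)' \
       && printf '%s\n' "$msg" | grep -Eiq '^To: .*@'; then
        echo "spam"
    else
        echo "keep"
    fi
}

# Hypothetical messages, for illustration only.
printf 'From: reader@example.org\nTo: gazette@ssc.com\n' | lg_filter
printf 'From: ads@example.net\nTo: you@elsewhere.example\n' | lg_filter
printf 'From: root\nTo: root\n' | lg_filter
```

The third message shows why the second condition is there: local mail with no "@" in "To:" is kept even though none of the domains appear.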
Wow. Seems like a whole lot of stuff, doesn't it? In reality, it's minor:
1) Pipe all incoming mail through "procmail".
2) Build a recipe.
That's it. I've thrown in a lot of tips for doing other spam-related stuff, but the above two steps are all you need to do in order to decrease or almost completely eliminate your spam. Oh, look - here comes another tip! :)
If you already have a mailbox full of spam, and you go ahead and do the above two steps, it's really easy to filter it all in one shot:
cat mbox | formail -s procmail
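"formail -s" splits the mailbox into individual messages and runs "procmail" once per message. In the usual mbox layout each message begins with a line starting with "From " (no colon), so counting those lines gives a rough idea of how many messages will be re-filtered. A minimal sketch, with a made-up two-message mailbox and a hypothetical function name "count_messages":

```shell
# Count the messages in an mbox file by counting "From " separator lines.
count_messages() {
    grep -c '^From ' "$1"
}

# Hypothetical two-message mailbox, for illustration only.
box=$(mktemp)
printf 'From a@example.com Mon May  4 10:00:00 1998\nSubject: one\n\nhello\n\n' > "$box"
printf 'From b@example.com Mon May  4 10:05:00 1998\nSubject: two\n\nbye\n' >> "$box"

count_messages "$box"
rm -f "$box"
```

Comparing that count with what ends up in your folders afterwards is a quick sanity check that nothing went missing along the way.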
So, you can indeed "de-spam" your mailbox. Thanks to the power of Linux and "procmail", you too can stare at people who complain about getting deluged and say "Oh, yeah... I remember when I had that problem." <Laugh>
Happy spam-hunting!