Updated: 2003-02-01; 9:58:58 AM
Doug's Inner Net News
    News and views from a software developer's perspective

daily link  Tuesday, January 07, 2003

Here's an example spam message from Jeremy Bowers [jerf.org]:
Subject: Re: Re: the proposal
 
That's a nice point, but I think you should consider the information 
at http://www.somewebsite.com/info.html before going with that 
approach. I found that information to be really pertinent.

Yes, indeed, this message would get past even the best content-based spam filters, such as Bayesian-like filters that score a message by the words it contains.

On the other hand, a spam email like this would not carry a very good marketing message.
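To make the point concrete, here is a minimal sketch of a word-based naive Bayes score, assuming a tiny made-up training set. The class name, the training sentences, and the smoothing choice are all illustrative assumptions, not the workings of any particular filter. It only shows why a message built from ordinary office vocabulary tends to score as non-spam.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Toy two-class naive Bayes classifier (hypothetical, add-one smoothing)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.counts[label].update(tokenize(text))
        self.messages[label] += 1

    def spam_probability(self, text):
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        total_messages = sum(self.messages.values())
        log_score = {}
        for label in ("spam", "ham"):
            total_tokens = sum(self.counts[label].values())
            # Log prior: fraction of training messages carrying this label.
            score = math.log(self.messages[label] / total_messages)
            for token in tokenize(text):
                # Add-one smoothing so unseen words do not zero the product.
                p = (self.counts[label][token] + 1) / (total_tokens + len(vocab))
                score += math.log(p)
            log_score[label] = score
        # Turn the two log scores into P(spam | text), clamped to avoid overflow.
        diff = max(min(log_score["ham"] - log_score["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))


# Made-up training data, for illustration only.
filt = NaiveBayesFilter()
filt.train("buy cheap pills now limited offer", "spam")
filt.train("free money click here now", "spam")
filt.train("I think you should consider the proposal before the meeting", "ham")
filt.train("that is a nice point about the approach and the information", "ham")

proposal_message = (
    "That's a nice point, but I think you should consider the information "
    "at http://www.somewebsite.com/info.html before going with that "
    "approach. I found that information to be really pertinent."
)

print("proposal message:", filt.spam_probability(proposal_message))
print("obvious spam:    ", filt.spam_probability("buy cheap pills now"))
```

With this toy training set, the example message scores well below one half while the obvious sales pitch scores well above it, which is exactly the weakness the example is meant to show.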

10:49:19 PM  permalink 


I think the idea of a Bayesian filter is a bit too simplistic. It's too automatic. Nothing in the real world is simple, and real intelligence requires more than just magically applying an algorithm that can be described in a few paragraphs.

For example, a good filter could really cut down on false positives for messages that I get from co-workers, but I wouldn't rely on a simple Bayesian classifier alone. A good filter can look at the content of the From, Reply-To, Received, and X-Mailer header fields. A spammer could possibly guess a From address that is in my address book: just pick an address from their massive list that has the same domain as mine, which probably indicates someone in the same company. They might even guess the content of the X-Mailer header field, since Outlook is so widely used in businesses. However, they probably couldn't guess the sending host in the Received header field. Even a very simple filter could inspect these carefully chosen fields to determine whether an email came from a person who already has a message in one of my mail folders. I expect that this would really cut down on false positives, and that it would be difficult for spammers to get past.

So, that means I would have a folder for known good incoming email.
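The header check described above might be sketched roughly as follows, using Python's standard email package. The function names, the sample messages, and the matching rule (same From address, same X-Mailer, and at least one Received host seen before) are assumptions made for illustration. Reply-To is left out to keep the sketch short, and a real filter would need to decide which Received lines it trusts.

```python
import re
from email import message_from_string
from email.utils import parseaddr


def fingerprint(raw_message):
    """Return (from address, X-Mailer value, set of Received 'from' hosts)."""
    msg = message_from_string(raw_message)
    sender = parseaddr(msg.get("From", ""))[1].lower()
    mailer = (msg.get("X-Mailer") or "").strip()
    hosts = set()
    for received in msg.get_all("Received", []):
        # A Received value usually starts with "from <host> by <host> ...".
        match = re.match(r"\s*from\s+(\S+)", received, re.IGNORECASE)
        if match:
            hosts.add(match.group(1).lower())
    return sender, mailer, hosts


def build_known_senders(saved_messages):
    """Index the mail already sitting in my folders by sender address."""
    known = {}
    for raw in saved_messages:
        sender, mailer, hosts = fingerprint(raw)
        entry = known.setdefault(sender, {"mailers": set(), "hosts": set()})
        entry["mailers"].add(mailer)
        entry["hosts"] |= hosts
    return known


def is_known_good(raw_message, known):
    """True if the sender, mailer, and a sending host all match saved mail."""
    sender, mailer, hosts = fingerprint(raw_message)
    entry = known.get(sender)
    if entry is None:
        return False
    return mailer in entry["mailers"] and bool(hosts & entry["hosts"])


def make_message(sending_host, sender="Pat <pat@example.com>"):
    """Build a small sample message (hypothetical addresses and hosts)."""
    return (
        f"Received: from {sending_host} by mx.example.org; "
        "Mon, 6 Jan 2003 09:00:00 -0600\n"
        f"From: {sender}\n"
        "X-Mailer: Microsoft Outlook\n"
        "Subject: status\n"
        "\n"
        "See you at ten.\n"
    )


known = build_known_senders([make_message("mail.example.com")])
genuine = make_message("mail.example.com")
forged = make_message("dialup.example.net")  # right From and X-Mailer, wrong host

for name, raw in (("genuine", genuine), ("forged", forged)):
    folder = "known good" if is_known_good(raw, known) else "needs filtering"
    print(name, "->", folder)
```

Mail that fails this check is not treated as spam here. It simply falls through to whatever content filter comes next, so the check can only reduce false positives.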

But what I am really concerned about is false positives on emails from people I have never received mail from before.

10:33:29 PM  permalink 


There's lots of discussion on Slashdot about spam filtering, especially Bayesian classifier-based filters.
10:20:22 PM  permalink 


I think one of the important parts of an "Internet education" must be instruction on how to send an email that gets past spam filters. A word of advice: don't put "Viagra" in the subject line and expect the message to reach the intended recipient's inbox. That's too obvious. Slightly less obvious: don't write WITH THE CAPS LOCK ON. I hope we don't get to the point where a subject line like "Want to go out for lunch?" means your email ends up in the spam folder.
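As a rough illustration of the two pieces of advice above, a sender could run a subject line through a check like this before hitting send. The function name and the four-letter threshold for the all-caps test are assumptions, and real filters apply many more rules than these two.

```python
def subject_warnings(subject):
    """Return a list of reasons a subject line might look like spam."""
    warnings = []
    if "viagra" in subject.lower():
        warnings.append("mentions viagra")
    letters = [c for c in subject if c.isalpha()]
    # Ignore very short subjects such as "OK" (threshold is an assumption).
    if len(letters) >= 4 and all(c.isupper() for c in letters):
        warnings.append("all caps")
    return warnings


for subject in ("Want to go out for lunch?", "BUY NOW", "Cheap Viagra"):
    print(repr(subject), subject_warnings(subject))
```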
10:11:35 PM  permalink 


Spam Filtering's Last Stand: Good stuff here.
10:07:44 PM  permalink 


Copyright 2003 © Doug Sauder