Updated: 2003-02-01; 9:58:59 AM
Doug's Inner Net News
    News and views from a software developer's perspective

daily link  Saturday, January 11, 2003

Another approach to solving the spam problem: list all new messages in your inbox in order of a relevance rating. The idea is this: when you read your new mail, you start with the messages from co-workers, relatives, and so on, and eventually you reach the messages that look like spam. You might review the first few of those -- the borderline ones that may or may not be spam -- then automatically trash the rest. This differs from a binary classification system, which answers only the yes-or-no question "Is this spam?"
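Here is a minimal sketch of relevance ordering in Python. Everything in it is an assumption made for illustration: the Message fields, the spam_score value (imagined as the output of some existing filter), the set of known senders, and the review_limit and spam_threshold parameters.

```python
from dataclasses import dataclass


@dataclass
class Message:
    sender: str
    subject: str
    spam_score: float  # hypothetical: 0.0 = surely wanted, 1.0 = surely spam


def rank_inbox(messages, known_senders, review_limit=3, spam_threshold=0.5):
    """Split an inbox into (to_read, to_review, to_trash), most relevant first.

    known_senders is a set of lower-case addresses (co-workers, relatives).
    """
    def relevance(msg):
        # Known senders always rank above everyone else.
        if msg.sender.lower() in known_senders:
            return 2.0
        # Otherwise, relevance is simply the inverse of the spam score.
        return 1.0 - msg.spam_score

    ordered = sorted(messages, key=relevance, reverse=True)
    cutoff = 1.0 - spam_threshold
    to_read = [m for m in ordered if relevance(m) > cutoff]
    suspects = [m for m in ordered if relevance(m) <= cutoff]
    # Review only the most borderline suspects; the rest can be trashed.
    return to_read, suspects[:review_limit], suspects[review_limit:]
```

The point of the design is that nothing is thrown away by a single yes-or-no decision: the reader decides how far down the list to go before trashing the remainder.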
4:26:26 PM  permalink 


The action that a spammer wants a recipient to take is key to effective filtering.

What are these actions?

The simplest action is to follow a URL that takes the recipient to a web page. Just as simple is to reply to the message. Not quite as simple is to call a phone number. Calling the number typically connects you to an answering machine that asks you to leave a number where you can be reached; the spammer then calls you back. The least likely action is to send a response by mail. Spammers are also unlikely to use the postal system for another reason: fraud conducted via U.S. mail is a serious offense.
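As a rough sketch, the response channels listed above can be pulled out of a message in Python. The regular expressions are deliberately simple assumptions (the phone pattern only recognizes U.S.-style numbers such as 800-555-0199), the function name response_channels is made up for this example, and multipart messages are not handled.

```python
import re
from email import message_from_string
from email.utils import parseaddr

# Simplified patterns, assumed for illustration only.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)
PHONE_RE = re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")


def response_channels(raw_message):
    """Return the ways a recipient could respond to a plain-text message."""
    msg = message_from_string(raw_message)
    body = msg.get_payload()
    if not isinstance(body, str):
        body = ""  # multipart bodies are out of scope for this sketch
    # A reply goes to Reply-To if present, otherwise to From.
    reply_to = parseaddr(msg.get("Reply-To") or msg.get("From") or "")[1]
    return {
        "urls": URL_RE.findall(body),
        "reply_to": reply_to,
        "phones": PHONE_RE.findall(body),
    }
```

Each entry in the result is a separate thing a filter could check, from the easiest channel for the recipient (a URL) to the hardest (a phone call).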

1:02:34 PM  permalink 


One of the points that Jeremy Bowers tries to make is that once spammers learn how to get past Bayesian-like spam filters, there is nothing further that we can do. I disagree.

If a spammer's goal is just to get past a spam filter, then I have no doubt that he will find a way to do that, no matter how good the filter is. But a spammer's goal is not just to get past a spam filter; it's to motivate some kind of action from the recipient. In most cases, the desired action is ultimately to get the recipient to buy a product or service. That's why a message like "Here's the link we talked about:", while it will get past every content-based spam filter, may not become a popular form of spam: the response rate may be so low that it can't justify the cost. Therefore, if Bayesian-like spam filters become widely deployed, we will certainly have won a battle against spammers.

But there is still something more we can do in the way of filtering. First of all, I think we should find a way to filter based on the IP addresses of the hosts named in a message's URLs. An IP address is a very small piece of information -- only 4 bytes. A UDP-based server could probably handle a heavy load of storing IP addresses and answering queries about them. Filtering messages based on the IP addresses behind the HTTP URLs they contain could be effective.
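A minimal sketch of the client side of such a filter, in Python, might look like this. The function names, the simplified URL pattern, and the in-memory set of blocked addresses are all assumptions for illustration; the set merely stands in for whatever lookup service (UDP-based or otherwise) would hold the real list.

```python
import re
import socket
from urllib.parse import urlparse

# Simplified URL pattern, assumed for illustration only.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def url_ips(body, resolve=socket.gethostbyname):
    """Return the packed 4-byte IPv4 addresses behind the HTTP URLs in body."""
    ips = set()
    for url in URL_RE.findall(body):
        host = urlparse(url).hostname
        if not host:
            continue
        try:
            # inet_aton packs a dotted-quad string into exactly 4 bytes.
            ips.add(socket.inet_aton(resolve(host)))
        except OSError:
            pass  # host did not resolve; skip it in this sketch
    return ips


def contains_blocked_ip(body, blocked, resolve=socket.gethostbyname):
    """True if any URL in body points at an address in the blocked set."""
    return not url_ips(body, resolve).isdisjoint(blocked)
```

The resolve parameter exists so the lookup can be swapped out, for example with a fake resolver when trying the code without network access. Keying on the address rather than the hostname means that registering a fresh domain name for the same server does not get a spammer past the filter.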

Keep in mind that even if we don't completely filter every spam message, we can still have a major impact on spam if we just make it difficult to respond to the spam. Following a URL in a message is just too easy a way to respond. If we start filtering based on the URLs in a message, then we can take that option away from spammers.

There are other actions that a spammer can try to motivate, such as replying to a message. There are ways to filter based on the reply-to address, too. 
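One way to filter on the reply address, sketched in Python: compare the address a reply would go to against a list of known-bad addresses and domains. The two blocklists and the function names are hypothetical; the post does not say where such lists would come from.

```python
from email import message_from_string
from email.utils import parseaddr


def reply_address(raw_message):
    """Return the lower-cased address a reply would be sent to."""
    msg = message_from_string(raw_message)
    # Reply-To takes precedence over From when both are present.
    header = msg.get("Reply-To") or msg.get("From") or ""
    return parseaddr(header)[1].lower()


def reply_is_blocked(raw_message, blocked_addresses, blocked_domains):
    """True if the reply address or its domain is on a (hypothetical) blocklist."""
    address = reply_address(raw_message)
    domain = address.rpartition("@")[2]
    return address in blocked_addresses or domain in blocked_domains
```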

10:37:46 AM  permalink 


Copyright 2003 © Doug Sauder