I've been thinking a lot about the spam in blog comments that John Dowdell and Brajeshwar have been mentioning. This could become a huge problem if it gets automated in a big way. To really filter spam, or filter anything, you first have to determine what exactly makes it different. Bayesian filtering has had some success, but at heart it isn't the text that makes spam spam. For example, if I copied a piece of spam and sent it to someone collecting spam samples for a filter, that would be legitimate mail, right?
What really makes spam spam, and what makes it hateable, are two things:
1) there is no human at the other end
2) it is sent by the truckload
The best filter for point one would be a Turing test: does the thing on the other end seem capable of human intelligence? Just a simple one, of course; you wouldn't want to miss any important mail from world leaders. It could be as simple as showing a random picture and having the person type in 'tree' or 'book'. The problem with this would be language, so it's probably best to stick with numbers (I've seen this type of thing on the net before but can't remember where, anyone?). You generate a GIF (or use SWF) that displays an image, readable by a human but not by a machine, of a 4-digit number that is different each time. When commenting, the user has to type in the digits they see as a validation key. No match, no post. I think that would go a long way toward heading off the problem. I'll try to make one for this .Text software, maybe this weekend, and see how it works.
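The comment check described above boils down to issuing a random 4-digit code and accepting a post only when the typed code matches. Here is a minimal Python sketch of just that bookkeeping, assuming the server can keep pending challenges in memory; the class and method names (`CommentChallenge`, `issue`, `verify`) are hypothetical, and drawing the digits into a distorted GIF or SWF is left out.

```python
import secrets


class CommentChallenge:
    """Hypothetical issue/verify store for a 4-digit comment challenge.

    Only the bookkeeping is shown; drawing the digits into a hard-to-OCR
    image (GIF/SWF) is assumed to happen elsewhere.
    """

    def __init__(self) -> None:
        self._pending = {}  # challenge id -> expected digits

    def issue(self):
        """Start a challenge; returns (challenge_id, digits_to_draw)."""
        challenge_id = secrets.token_hex(8)          # e.g. sent back in a hidden form field
        digits = f"{secrets.randbelow(10_000):04d}"  # "0000".."9999", new every time
        self._pending[challenge_id] = digits
        return challenge_id, digits

    def verify(self, challenge_id: str, typed: str) -> bool:
        """No match, no post. Each challenge can be answered only once."""
        expected = self._pending.pop(challenge_id, None)
        return expected is not None and typed.strip() == expected
```

Each challenge is single-use, so a bot that somehow gets one answer right can't replay the same id for a thousand comments. A real deployment would also want to expire old, unanswered entries.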
But really, why stop at blog comments? The problem with e-mail filtering today is that there is always a chance you'll miss a legitimate message, yet still a good chance your kids will wake up to "xnawlprqdt olwinw big wide open pussies", along with a descriptive photograph (that is, if you still dare to give your kids an e-mail account). The idea would be that your server (or a mail program extension) keeps a list of 'allowed' and 'disallowed' e-mail addresses. You could seed these with your current address book, scans of mail you've answered, addresses you haven't deleted (which would include newsgroups), and so on. Of course you could edit the lists manually too, allowing all mail from *yourCompany.com, for example. People you send mail to would automatically be added to the allowed list too.
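A sketch of the allowed/disallowed lists in Python, assuming shell-style wildcards from the standard `fnmatch` module stand in for patterns like *yourCompany.com. `SenderLists` and its methods are hypothetical names; addresses are lowercased before matching, and an explicit disallow wins over an allow.

```python
from __future__ import annotations

from collections.abc import Iterable
from fnmatch import fnmatchcase


class SenderLists:
    """Hypothetical allowed/disallowed lists of address patterns.

    Patterns use shell-style wildcards (fnmatch), e.g. "*@yourcompany.com"
    for a whole domain. Everything is lowercased before matching.
    """

    def __init__(self) -> None:
        self.allowed: set[str] = set()
        self.disallowed: set[str] = set()

    def allow(self, pattern: str) -> None:
        self.allowed.add(pattern.strip().lower())

    def disallow(self, pattern: str) -> None:
        self.disallowed.add(pattern.strip().lower())

    def seed(self, addresses: Iterable[str]) -> None:
        """Seed from an address book, answered mail, kept mail, and so on."""
        for address in addresses:
            self.allow(address)

    def status(self, sender: str) -> str:
        """Return "disallowed", "allowed", or "unknown" (unknown senders get challenged)."""
        address = sender.strip().lower()
        if any(fnmatchcase(address, p) for p in self.disallowed):
            return "disallowed"  # an explicit block beats any allow pattern
        if any(fnmatchcase(address, p) for p in self.allowed):
            return "allowed"
        return "unknown"
```

A bare `*yourcompany.com` would also match `someone@notyourcompany.com`, so anchoring the pattern at the `@` (`*@yourcompany.com`) is safer. Calling `allow()` for each outgoing recipient covers the "people you send to" rule.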
Then, if your server gets a message from a new address that isn't on the 'allowed' list, it writes back automatically (using your address) with a picture of numbers (a Turing test) and a question, written in all the languages you speak (and sure, with the numbers in all the localized glyphs if needed). Answering the question correctly adds the sender to your 'allowed' list and lets the message through. This only has to happen once per person. If the other person has the same filter installed, the Turing message you send would still reach them, because they have just sent you mail (and so you would already be on their allowed list). This prevents infinite ping-pong between two filters, though you could still add safeguards against it. So they simply reply, type in the numbers they see, and from then on you can correspond as normal.
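The challenge-response flow can be sketched as a small state machine: allowed senders go straight to the inbox, unknown senders have their mail held and get one challenge, and a reply containing the right digits allows the sender and releases the held mail. This is a hypothetical Python sketch (`ChallengeGate`, `record_sent`, `receive` are illustrative names), assuming actual sending and image rendering are done by a caller-supplied `send_challenge` function and that replies are matched on the sender address alone.

```python
from __future__ import annotations

import re
import secrets
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ChallengeGate:
    """Hypothetical challenge-response gate for incoming mail (not a mail server)."""

    send_challenge: Callable[[str, str], None]  # (to_address, digits); assumed to render and send
    allowed: set[str] = field(default_factory=set)
    pending: dict[str, str] = field(default_factory=dict)       # sender -> expected digits
    held: dict[str, list[str]] = field(default_factory=dict)    # sender -> held message bodies
    inbox: list[tuple[str, str]] = field(default_factory=list)  # delivered (sender, body)

    def record_sent(self, recipient: str) -> None:
        # Writing to someone allows them; this is what stops two gates
        # from challenging each other forever.
        self.allowed.add(recipient.lower())

    def receive(self, sender: str, body: str) -> None:
        sender = sender.lower()
        if sender in self.allowed:
            self.inbox.append((sender, body))
            return
        expected = self.pending.get(sender)
        if expected is not None and expected in re.findall(r"\b\d{4}\b", body):
            # Correct answer: allow the sender once and release everything held.
            del self.pending[sender]
            self.allowed.add(sender)
            for held_body in self.held.pop(sender, []):
                self.inbox.append((sender, held_body))
            return
        self.held.setdefault(sender, []).append(body)
        if expected is None:
            # Safeguard: at most one outstanding challenge per unknown sender.
            digits = f"{secrets.randbelow(10_000):04d}"
            self.pending[sender] = digits
            self.send_challenge(sender, digits)
```

Because `record_sent` allows everyone you write to, a challenge coming back from someone you just mailed is delivered rather than challenged again, which is the ping-pong argument above; the one-outstanding-challenge rule is the extra safeguard.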
The reason I think this would work is that there would be a cost to becoming an 'allowed' sender, or really more of a 'friction'. The cost would be five seconds for a human, and thousands of hours for a bulk mailer. This targets the second distinguishing trait of spam: it is sent out by the millions. Even for companies like your bank, it is nice that there is a cost. If getting their message to you isn't worth five seconds of their time, once, then you don't need their message, right? And if they do care that much, they can still get through to you, which present systems don't always allow. Of course you can still move them to the disallowed list or, since there is a human at the other end, just write to them and ask them to stop.
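To put rough numbers on the "five seconds versus thousands of hours" claim: assuming five seconds per challenge and a hypothetical list of one million fresh addresses, the human time adds up as follows (the function name is just for illustration).

```python
def challenge_hours(recipients: int, seconds_each: float = 5.0) -> float:
    """Human time, in hours, to answer one challenge per new recipient."""
    return recipients * seconds_each / 3600


print(round(challenge_hours(1), 4))       # 0.0014 hours, i.e. five seconds
print(round(challenge_hours(1_000_000)))  # 1389 hours for one million addresses
```

A person pays the five seconds once per correspondent; a bulk mailer pays it for every address on every new list, roughly 58 days of nonstop typing per million.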
If the throngs of soon-to-be-unemployed telemarketers are put to work answering Turing tests, you could always put a bit of legal text in the test saying that you do not accept unsolicited ads. Their response would be somewhat traceable (they have to get the message back to you), and this kind of notice would probably be enforceable. Some of this system could probably be circumvented, but the trick to stopping the 'million transactions a minute' aspect of spam is to add a tiny bit of friction, and perhaps risk, to each one.
The system would also be self-propagating. Each verification message could include a short explanation of this type of software and a link to it. ISPs could offer it as an option for their customers. It is also simple enough that if some dick had the bright idea of making it ad-supported, they wouldn't survive, because you could make your own in an afternoon : ). I will try to get one of these going, maybe on a bogus address at first.
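One way the verification message could carry both the legal notice and the self-propagating link is sketched below with Python's standard `email.message.EmailMessage`. The notice wording, the `example.com` URL, and `build_challenge_message` are placeholders for illustration, and the GIF bytes are assumed to come from whatever renders the number picture.

```python
from email.message import EmailMessage

# Hypothetical placeholders: the notice wording and the info URL are illustrations only.
LEGAL_NOTICE = "This address does not accept unsolicited advertising."
INFO_URL = "https://example.com/challenge-filter"


def build_challenge_message(owner: str, sender: str, image_gif: bytes) -> EmailMessage:
    """Compose the one-time verification reply sent to an unknown sender."""
    msg = EmailMessage()
    msg["From"] = owner  # sent from your own address
    msg["To"] = sender
    msg["Subject"] = "Please confirm you are a person"
    msg.set_content(
        "Your message is being held until you reply with the number shown in the "
        "attached picture. You only need to do this once.\n\n"
        f"{LEGAL_NOTICE}\n\n"
        f"What is this? See {INFO_URL}\n"
    )
    # Attaching the picture turns the message into multipart/mixed.
    msg.add_attachment(image_gif, maintype="image", subtype="gif", filename="challenge.gif")
    return msg
```

Keeping the explanation and link in every challenge is what makes the system spread: each sender who passes the test has just been told where to get a filter of their own.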
The best part is that if this ever works out, I could finally go back to being the only one on the block with a 12 incher - woot! If you have any thoughts on the matter (or the spam filter), please let me know : ).
posted on Friday, October 31, 2003 5:58 AM