The idea is for each e-mail to be classified 10 times for a majority consensus. So far, the project is about one-third done.
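The majority-consensus idea can be illustrated with a minimal sketch. The article does not describe how the project actually tallies votes, so the function name `consensus`, the "spam"/"ham" label strings and the decision to leave ties unresolved are all assumptions made for illustration only.

```python
from collections import Counter


def consensus(votes, required=10):
    """Return the majority label once `required` votes are in, else None.

    Hypothetical helper: the tie handling and the vote threshold are
    assumptions, not the project's documented behavior.
    """
    if not votes or len(votes) < required:
        return None  # not enough classifications collected yet
    ranked = Counter(votes).most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # evenly split, so there is no majority
    return ranked[0][0]


if __name__ == "__main__":
    print(consensus(["spam"] * 7 + ["ham"] * 3))
    print(consensus(["spam"] * 5 + ["ham"] * 5))
```

With ten votes split seven to three, the sketch settles on the label that seven people chose; an even five-to-five split is left undecided rather than guessed.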
I stepped up to the challenge. I started classifying e-mail, hoping to run across Enron employee gossip about what happened
at the last company party, such as stories of accountants wearing lamp shades on their heads (which appears to have continued
well into their working day).
I buzzed through 25 e-mail messages, most of which were obviously spam and devoid of scuttlebutt. Unfortunately, the real
messages I came across were strictly numbing work chatter, which made the seedy spam subject lines at least mildly amusing
by comparison.
I disagreed with the machines on one message, which the filters had classified as real. The message was composed of complete
sentences that appeared to be lifted from news stories but strung together as utter non sequiturs. The e-mail also lacked a
bull's-eye zinger such as +V1a*gra! 2nite!
The message was obviously junk and made no sense, yet it somehow wriggled through the spam filter's clutches.
Most messages are easy to classify for anyone vaguely familiar with e-mail. But overall, machines and people disagree about
one time out of 10, Graham-Cumming said.
Not surprisingly, phishing e-mail messages, which often look quite legitimate but dupe people into divulging personal details,
are hardest for people to distinguish, Graham-Cumming said.
The research could be used to publish an updated corpus, one that more precisely classifies what is spam and what is ham,
Graham-Cumming said. It may also lend new insight into phishing attempts, which continue to flourish despite better awareness.
"I'd be very interested in discovering if there are certain sorts of legitimate mail that always gets filtered," Graham-Cumming
said.
Those who participate in classifying messages have a chance to win a suite of Austin Powers movie trinkets, including an enlarger.
What's an enlarger? Check your junk mail box.
The IDG News Service is a Network World affiliate.