CAMBRIDGE, MASS. – Blog spammers have found ways to automate inserting their unwanted messages into online conversations,
but the few tools available to block them lag woefully behind.
“How far ahead [of us] are the spammers? Who knows,” says Jessica Baumgart, an affiliate with Harvard University’s Berkman Center for Internet and Society, who gave
a presentation on blog spam at the MIT Spam Conference 2007 held in Cambridge Friday. “Any time we try to block them out,
they find a way to get in. We’ll do something and five minutes later they’re back. It’s like playing chess.”
According to Baumgart, who has been involved with Harvard’s blogging initiative for seven years and manages tens of blogs
on seven different platforms, there are three main ways spammers get their messages into blogs:
• Comment spam: Spammers are paid to surf the Web for blogs and type in comments by hand, or they write scripts that enter
the text automatically. These can be hard to distinguish from legitimate entries, Baumgart says, except that they are often off the topic of the
blog and include a link to a Web site.
• Trackback spam: Spammers develop scripts that use trackback links to place spam on blogs. A blog’s trackback feature lets readers automatically
notify a site that they have linked to its pages. Trackback spam consists of links to random Web sites, many of which “are things
you don’t necessarily want to see” as the blog host or participant, Baumgart says.
• Spam blogs, or splogs: Spammers take advantage of services like Blogspot to set up free blogs that exist only to point visitors to Web sites. Not
only are these sites annoying to visitors looking for legitimate information on a topic, Baumgart says, but they also pollute
the results of search engines that index the sites.
There are some tools available to help blog hosts combat this unwanted, unrelated input. Some blog platforms include administration
tools to block specific IP addresses from adding comments, although Baumgart adds that spammers tend to use a range of IP addresses,
so blocking them one by one can become infeasible. There’s also the no-follow link option, a command that can be
embedded in a link’s HTML code to tell search engines indexing a blog not to consider that link legitimate, she says.
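
As a rough illustration of those two defenses, the Python sketch below blocks whole address ranges rather than single IPs and adds the rel="nofollow" attribute to links in a comment. It is not taken from Baumgart's presentation or from any particular blog platform: the function names, the regular-expression approach and the blocklist (which uses address ranges reserved for documentation) are all assumptions made for the example.

```python
import ipaddress
import re

# Hypothetical blocklist. Both ranges are reserved for documentation,
# so they stand in for the address ranges a real spammer might use.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(ip_text, networks=BLOCKED_NETWORKS):
    """Return True if the commenter's address falls inside any blocked range."""
    address = ipaddress.ip_address(ip_text)
    return any(address in network for network in networks)


# Matches an opening <a ...> tag and captures its attributes.
ANCHOR_TAG = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)


def add_nofollow(comment_html):
    """Add rel="nofollow" to every link that has no rel attribute yet."""

    def mark(match):
        attributes = match.group(1)
        if re.search(r"\brel\s*=", attributes, re.IGNORECASE):
            # Leave links that already declare a rel attribute untouched.
            return match.group(0)
        return '<a{} rel="nofollow">'.format(attributes)

    return ANCHOR_TAG.sub(mark, comment_html)


if __name__ == "__main__":
    print(is_blocked("203.0.113.77"))
    print(add_nofollow('Nice post! <a href="http://example.com">click</a>'))
```

Checking a range instead of a single address means one rule covers every address in the block, which speaks to the one-by-one problem Baumgart describes. A production system would more likely use a real HTML parser than a regular expression to rewrite links.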