According to spam-filtration service BotKiller, Twitter spam comprises up to 3.69 percent of all tweets.
The company has been working on a way to cleanse the Twitter stream of spam; its new product tags and blocks computer-generated tweets while producing few false positives. BotKiller is a product of Rarefied Technologies, an open-source company that implements advanced algorithmic classification for enterprise applications.
According to the BotKiller site, “Petabytes of new information are created daily. This data is meaningless unless we can find what we’re looking for. Everyone has had the experience of search results that are polluted with false keywords and unsolicited advertisements. BotKiller can make those go away and make realtime search relevant again.”
The company claims that its “specialized lexical parsing” can find and block computer-generated content by analyzing post metadata, conversations, and the relationships between post authors and the larger network.
Currently, the service is focused on spam filtering for real-time user-generated content (UGC), and Twitter serves as a case-study playground for testing the product’s accuracy and effectiveness. Overall, the company cites a 95 percent accuracy rate for spam filtering, with false positives making up less than one percent of all blocked tweets.
Here is a sampling of blocked tweets drawn from a set of about 3 million tweets.
Clearly, there are still false positives, some of which appear to be automated tweets announcing new blog posts. Rarefied CEO Gabriel Ortiz wrote to us in an email this afternoon, “We haven’t really made a decision regarding auto-tweeted blog posts, right now we’re trying to tag the ones that are from obvious spam blogs, such as those selling prescription drugs or promoting multilevel marketing scams, but we’re not yet blocking ALL such blog post tweets, as some of them might be more legitimate.”
According to Ortiz, BotKiller is currently just a proof of concept for the company’s real-time spam classifier. As such, Rarefied has not yet determined whether the service will be free for end users or how pricing would be tiered. “We’re hoping we can partner with someone who has a desktop or mobile Twitter client to deliver the filtering service to users,” he wrote. “It would also make a lot of sense for Twitter themselves to license our solution so they could just mark as private tweets which are likely to be spam, thus keeping them out of the public search and trending topics without stopping people who wish to read messages of that nature from doing so.”