Map Reduce your Inbox: Yahoo Mail is Fighting Spam with Big Data

Is there a way to defeat spam? Late last week, the Yahoo Mail team shared news from an independent study finding that Yahoo Mail users receive significantly fewer spam messages in their inboxes than users of competing services.

We caught up with Vish Ramarao, anti-spam guru at Yahoo, to learn how the company was able to achieve these results and whether it is possible to outsmart spammers using more capable filters.

The Study

Here are the statistics supplied by the Yahoo team.

“The Fraunhofer Institute, an independent research firm, found that Yahoo! Mail users saw the least amount of spam out of the five providers tested, with nearly 40% less spam than Hotmail and 55% less spam than Gmail – meaning Gmail users in the study saw more than twice as much spam as Yahoo! Mail users.”

Yahoo also notes that its spam filtering blocks 99% of the spam sent to its 300 million account holders, adding up to more than 120 billion blocked spam messages per month.
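To see how these figures relate, here is a quick back-of-the-envelope check. It uses only the numbers quoted above; the helper name `relative_volume` is ours, and "nearly 40%" is treated as exactly 40% for illustration.

```python
def relative_volume(percent_less: float) -> float:
    """Return how many times more spam the other provider's users see,
    given that Yahoo Mail users see `percent_less` percent less than them."""
    return 1.0 / (1.0 - percent_less / 100.0)


# "55% less spam than Gmail" means Yahoo users see 45% of Gmail's volume.
gmail_vs_yahoo = relative_volume(55)
# "nearly 40% less spam than Hotmail", rounded to 40% for this sketch.
hotmail_vs_yahoo = relative_volume(40)

# 120 billion blocked messages per month spread over 300 million accounts.
blocked_per_month = 120e9
accounts = 300e6
blocked_per_account = blocked_per_month / accounts

print(f"Gmail vs. Yahoo: {gmail_vs_yahoo:.2f}x")
print(f"Hotmail vs. Yahoo: {hotmail_vs_yahoo:.2f}x")
print(f"Blocked per account per month: {blocked_per_account:.0f}")
```

The Gmail ratio comes out above 2, which is where the "more than twice as much spam" claim in the quote comes from, and the blocked volume works out to roughly 400 messages per account per month.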

Spam is Polymorphic – Algorithms Need A Grid To Keep Up

Ramarao described the approach Yahoo has implemented, which analyzes both historical and present data to find spam patterns.

What we learned is that spam delivery is growing more complex. Spammers are increasingly turning to “reputation bots” that help counter negative reports from users, and they have organized their systems to defeat the filtering routines, blacklists, and reputation mechanisms that have been employed to date.

Yahoo responded by building a better knowledge base, in this case a broader and more readily available information set. Using the Map Reduce functionality of Hadoop, the company can run ad hoc queries across a broader grid of email header data and find patterns that the filtering process could not previously detect.
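To make the idea concrete, here is a minimal sketch of a Map Reduce style query over email headers, written in plain Python rather than Hadoop. It is not Yahoo's actual pipeline: the sample messages, the choice of the `X-Originating-IP` header as the grouping key, the function names, and the threshold of three distinct recipients are all assumptions made for illustration.

```python
from collections import defaultdict
from email import message_from_string

# Hypothetical sample data: header blocks only, each ending with a blank line.
RAW_HEADERS = [
    "From: a@example.com\nTo: u1@example.org\nX-Originating-IP: 203.0.113.7\nSubject: Cheap meds\n\n",
    "From: b@example.com\nTo: u2@example.org\nX-Originating-IP: 203.0.113.7\nSubject: Cheap meds\n\n",
    "From: c@example.com\nTo: u3@example.org\nX-Originating-IP: 203.0.113.7\nSubject: Cheap meds\n\n",
    "From: friend@example.net\nTo: u1@example.org\nX-Originating-IP: 198.51.100.4\nSubject: Lunch?\n\n",
]


def map_phase(raw):
    """Map: parse one message's headers and emit (sending IP, recipient)."""
    msg = message_from_string(raw)
    yield msg.get("X-Originating-IP", "unknown"), msg.get("To", "")


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(ip, recipients):
    """Reduce: summarize the traffic seen from one sending IP."""
    return ip, {
        "messages": len(recipients),
        "distinct_recipients": len(set(recipients)),
    }


def run_job(raw_messages):
    """Run map, shuffle, and reduce in sequence on a single machine."""
    pairs = (pair for raw in raw_messages for pair in map_phase(raw))
    return dict(reduce_phase(ip, rcpts) for ip, rcpts in shuffle(pairs).items())


stats = run_job(RAW_HEADERS)
# Assumed rule of thumb: flag IPs that reach three or more distinct recipients.
suspicious = sorted(ip for ip, s in stats.items() if s["distinct_recipients"] >= 3)
print(suspicious)
```

On a real grid the map and reduce steps would run in parallel across many machines and the framework would handle the shuffle; the single-process version here only illustrates the shape of the computation.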

The Yahoo Mail team recently shared more details of how it uses Hadoop and other companion big data technologies to fight the ever-changing stream of spam.

The good news is that this approach is providing a new generation of data intelligence tools that can be tuned for real-time algorithms to find patterns previously undetected in the spam arms race.

Map Reducing the raw data prepares it for the harder problem of finding patterns in spam. The Yahoo team also suggested that this type of approach may be useful in other security data models, where access to high volumes of data (e.g. logs) may have been impossible in the past but can now be optimized for real-time analysis.

What do you think? Will techniques like Map Reduce unleash other good things in our information-saturated world?
