Home How Facebook Scales with Open Source

How Facebook Scales with Open Source

When it comes to having to scale your service to handle ever-increasing data demands, there may be no better example to look at than Facebook. The social networking giant has grown from a service for Harvard students to some 500 million users.

But that number doesn’t adequately capture the company’s scaling and storage demands. So here are a few more statistics: All told, users spend 8 billion minutes on the site every day. There are 3.5 billion pieces of content shared weekly. 2.5 billion photos are uploaded every month, and 1.2 million photos are served up every second. And as 70% of Facebook users are outside the United States, the amount of data served and stored is further complicated by the locations of users and data centers.

It’s no surprise, then, that some of the traditional ways of scaling cannot work for Facebook. You can’t simply shard the database based on a user’s location or based on what information she or he will be accessing, as users are interconnected in way that is both unpredictable and global.

As Facebook has grown, the company has worked to develop a number of tools to handle this data, both in terms of the storage and the delivery of content, and it has open sourced many of these. Facebook has been built from the beginning on open source technologies, according to David Recordon, Facebook’s Open Source Programs Manager. But Facebook’s use of open source goes far beyond the LAMP stack (or even, beyond the LAMP stack plus Memcached). The company has also created and released several open source projects and participates heavily in others, most notably perhaps, Hadoop.

Here are a few of Facebook’s open source tools that have helped it handle the massive amount of data:

Cassandra

Now an Apache Software Foundation project, Cassandra is a distributed storage system for managing structured data that is designed to scale to a very large size across many commodity servers, with no single point of failure. Originally developed to help power Facebook’s Inbox search, Cassandra is one of a growing number of NoSQL database solutions.

Hive

Also an Apache project, Hive is data warehouse infrastructure built on top of Hadoop that provides tools to enable easy data summarization, adhoc querying and analysis of large datasets. Hive provides a simple query language called Hive QL, which as it’s based on SQL enables those familiar with SQL to query this data.

Here is an excerpt from a presentation at FOSDEM earlier this year:

HipHop

In order to save resources for its servers, Facebook developed HipHop, which transforms PHP source code into a highly optimized C++. HipHop was open sourced earlier this year.

Scribe

Facebook logs approximately 25 terrabytes of data a day. No surprise, perhaps, other tools weren’t able to scale to that capacity, so Facebook developed Scribe to log data streamed in real time from a large number of servers.

Thrift

Currently an Apache incubator project, Thrift provides a framework for scalable cross-language services development in C++, Java, Python, PHP, and Ruby.

Recordon notes that many of Facebook’s new technologies started as “a hack project,” and the company is working not only to develop new tools internally, but to encourage their development and usage externally as well. So while few companies may have the massive data storage and scaling needs that Facebook does, the company’s open source efforts work towards building a “collaborative sustainable model” of development.

About ReadWrite’s Editorial Process

The ReadWrite Editorial policy involves closely monitoring the tech industry for major developments, new product launches, AI breakthroughs, video game releases and other newsworthy events. Editors assign relevant stories to staff writers or freelance contributors with expertise in each particular topic area. Before publication, articles go through a rigorous round of editing for accuracy, clarity, and to ensure adherence to ReadWrite's style guidelines.

Get the biggest tech headlines of the day delivered to your inbox

    By signing up, you agree to our Terms and Privacy Policy. Unsubscribe anytime.

    Tech News

    Explore the latest in tech with our Tech News. We cut through the noise for concise, relevant updates, keeping you informed about the rapidly evolving tech landscape with curated content that separates signal from noise.

    In-Depth Tech Stories

    Explore tech impact in In-Depth Stories. Narrative data journalism offers comprehensive analyses, revealing stories behind data. Understand industry trends for a deeper perspective on tech's intricate relationships with society.

    Expert Reviews

    Empower decisions with Expert Reviews, merging industry expertise and insightful analysis. Delve into tech intricacies, get the best deals, and stay ahead with our trustworthy guide to navigating the ever-changing tech market.