Netflix stores 50 different files for every viewable media asset, along with three copies of every movie, ten years of user ratings, extensive user account information and metadata that includes complex licensing rights for everything, plus audio files, log files, subtitles and more. A year ago the fast-growing consumer service saw its own data centers melt down under the weight of all that data, Netflix’s Adrian Cockcroft said this morning at O’Reilly’s OSCON Data conference in Portland, Oregon. Now the company is aggressively moving its huge collection of data into the cloud, especially onto Amazon’s services. That work is essential to Netflix’s world-beating ambitions.
Netflix announced this week that it now has 24 million customers across the United States and Canada. Cockcroft says the company is now focused on expanding to 43 countries in Latin America and then going worldwide. Expanding to 43 new countries means a 43X metadata explosion, Cockcroft says. Netflix’s impact on the entertainment industry and on the web at large has been widely discussed, but hearing how the company handles its own data internally is fascinating.
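To make the 43X figure concrete, here is a minimal sketch of how per-country metadata multiplies. It assumes, purely for illustration, that every title needs one metadata record per country (for licensing, say); the `TitleMetadata` record, the `expand_catalog` helper and the placeholder title and country names are hypothetical and do not describe Netflix's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TitleMetadata:
    """Hypothetical per-country metadata record for one title."""
    title_id: str
    country: str
    licensed: bool


def expand_catalog(title_ids, countries):
    """Create one metadata record per (title, country) pair."""
    return [
        TitleMetadata(title_id, country, licensed=False)
        for title_id in title_ids
        for country in countries
    ]


# Placeholder data: three titles, one market versus 43 markets.
titles = ["title-1", "title-2", "title-3"]
one_market = expand_catalog(titles, ["US"])
many_markets = expand_catalog(titles, [f"country-{i}" for i in range(43)])

print(len(one_market), len(many_markets))  # prints "3 129"
```

Under that assumption the record count grows linearly with the number of countries, which is the multiplication Cockcroft describes.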
Today Netflix is stuck in a state of Roman Riding, Cockcroft says, where the rider stands on the backs of two independent horses, holding the reins for both. A significant amount of the company’s data remains in its own data centers, but the company is quickly moving as much as possible into the cloud. Credit card data, for example, is stored in-house, but Netflix is working closely with security compliance specialists to safely move that data into the cloud.
Everything the company does is duplicated in three different buildings, Cockcroft says. The videos themselves get copied three times and sent to the content delivery networks: Akamai, Limelight and Level 3. Those companies then distribute the videos to the Internet service providers that serve them up to customers.
Then there is the worldwide placement of cloud data centers for minimum latency, authentication data for customers, and data about which test cell each of us is in and which devices our accounts have activated, plus bookmarks, personalization and genre preferences. The list goes on and on, and none of these types of data is trivial to deal with.
“We’re seeing localization as a space more people are paying attention to, because no one wants to be just US- or Europe-focused,” says Sarah Novotny, OSCON co-chair. “That leads to a data explosion. Multiple languages, formats, everything multiplies. Looking at ways to localize data is probably a big opportunity, especially if you’re not a big international team. Simply the language space is an enormous challenge, not to mention content, addresses and all the other data. There are all sorts of complexities involved and any update in one space has to be updated in N spaces.”
Regarding Netflix in particular, Novotny says the keynoting company is at the front of the pack: it is innovating, has some fascinatingly complex needs and is clearly interested in open sourcing some of the infrastructure it has developed. “It’s been really interesting to watch the amount of data they have and the way they have connected that data over the years,” she says. “That explains some of what we get in return as consumers.”
Next time you hear about the massive data demands that Netflix consumers make on the web and mobile networks, pause to consider what all that data requires of the people behind the scenes at Netflix, too. International expansion will require even more heavy data lifting than the company does today.