The explosion of big data has caused far-reaching ripples in the enterprise. Organizations today are faced with unprecedented challenges in sorting, processing and analyzing their data, which has in turn given rise to a new generation of technologies.
One such example is the R statistical language, which was originally developed by noted statisticians Robert Gentleman and Ross Ihaka at the University of Auckland in New Zealand in the early 1990s. In recent years, R has become a popular language for advanced analytics, and it is central to the emerging data science movement.
David Smith is Vice President of Marketing & Community at Revolution Analytics, a Palo Alto, Calif.-based startup that produces enterprise-ready analytics software built on top of the R statistical language. Follow him on Twitter: @revodavid.
Over two million analysts worldwide use R, and they come from an extremely diverse range of industries, from journalism to financial services to life sciences. R is widely regarded as one of the most powerful statistical computing languages available and has become the de facto standard tool for students pursuing advanced degrees in statistics. The traction it has gained with tomorrow’s data scientists, coupled with its extensible, flexible nature, has propelled R into the analytics mainstream.
R is used for a variety of functions at some of today’s most prominent organizations, and because it is open source, it has been adopted across a wide range of practices. Developers build packages around R to perform industry-specific functions. What follows is a brief sampling of how some of the world’s best-known brands are using and customizing R to gain insight from their data.
New York Times Co.
The New York Times graphics department has long used R for its data visualization features, often to build interactive graphics that reveal patterns, provide context and describe relationships in visual form. These R-based graphics span every section of the paper: breaking news (the destruction caused by the Haiti earthquake), politics (the 2010 U.S. election), entertainment (Netflix rental habits) and sports (a breakdown of New York Yankees pitcher Mariano Rivera’s pitches).
Amanda Cox of the New York Times’ graphics department says, “R makes it easy to read data, generate lines and points, and place them where you want them. It’s flexible and quick – which is helpful when you’ve only got two or three hours until deadline.”
Orbitz
Leading online travel site Orbitz is another company that frequently uses R. In the highly competitive travel market, the stakes are higher than for a traditional Web search engine: if a customer chooses the first-listed hotel in a search for accommodations and is dissatisfied with the stay, Orbitz soon has an unhappy customer. It is therefore imperative for Orbitz to optimize its hotel search results for customer satisfaction.
To do so, Orbitz stores its data in Hadoop, extracts it with Hive and uses R to perform statistical analysis on it. After pulling data such as customer hotel booking records and user ratings of hotels out of Hive, the Orbitz team analyzes it statistically to identify the best hotel to promote to the top of the list for each new booking.
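To make the idea of "identifying the best hotel" a little more concrete, here is a minimal sketch. It is not Orbitz's method: the article does not describe Orbitz's models, and Orbitz used R, whereas this illustration is written in Python. It assumes the ratings have already been pulled out of Hive into an in-memory list of hypothetical (hotel, rating) pairs, and it ranks hotels by a Bayesian average, a common technique chosen here purely for illustration. Each hotel's mean rating is shrunk toward the overall mean, so a single glowing review cannot push a hotel to the top of the list. The names `RATINGS`, `smoothed_scores`, `best_hotel` and the `prior_weight` parameter are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample standing in for rows extracted from Hive:
# (hotel_id, user_rating on a 1-5 scale). This is NOT real Orbitz data.
RATINGS = [
    ("hotel_a", 5), ("hotel_a", 4), ("hotel_a", 5),
    ("hotel_b", 5),
    ("hotel_c", 3), ("hotel_c", 4), ("hotel_c", 4), ("hotel_c", 3),
]


def smoothed_scores(ratings, prior_weight=5.0):
    """Return a Bayesian-average score per hotel.

    Each hotel's mean rating is pulled toward the global mean rating.
    prior_weight is the number of "virtual" ratings at the global mean
    added to every hotel; 0 gives the plain (unsmoothed) average.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for hotel, rating in ratings:
        totals[hotel] += rating
        counts[hotel] += 1
    global_mean = sum(totals.values()) / sum(counts.values())
    return {
        hotel: (totals[hotel] + prior_weight * global_mean)
        / (counts[hotel] + prior_weight)
        for hotel in totals
    }


def best_hotel(ratings, prior_weight=5.0):
    """Return the hotel with the highest smoothed score (the one to list first)."""
    scores = smoothed_scores(ratings, prior_weight)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Print hotels from highest to lowest smoothed score.
    scores = smoothed_scores(RATINGS)
    for hotel, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{hotel}: {score:.2f}")
```

With this sample, `hotel_b` has the highest raw average (a single 5-star rating), but with the default `prior_weight=5.0` the three strong ratings for `hotel_a` put it first; passing `prior_weight=0` falls back to the plain mean. A real ranking system would presumably also fold in booking records and other signals, which this sketch leaves out.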
OkCupid
Popular online dating site OkCupid collects a great deal of data. With over three million members, many of whom have provided extensive detail about their preferences, lifestyles, sexuality and hobbies in their dating profiles, the site has a wealth of information from which to identify trends in the love lives of a typical OkCupid member.
OkCupid uses R for its OkTrends blog, which reports aggregate trends and insights such as differences in preferences between races, how the behavior of gay members contradicts some pernicious stereotypes, and how religion relates to reading and writing levels. The OkTrends team relies on R for both analysis and visualization.
These are just a few examples of how big data is transforming analytics in the enterprise and beyond. Analytics use cases are increasingly varied and complex, which has allowed modern languages like R to emerge as tools of choice for data scientists. While big data certainly challenges organizations to process and analyze their information, it offers an equally large payoff in the additional insight that can be gleaned from it.
R photo by Christopher Woo