LexisNexis Will Open-Source Its Hadoop Alternative for Handling Big Data

LexisNexis announced today that it will open-source its High Performance Computing Cluster (HPCC) technology, as well as offer an enterprise version with commercial support. The company is positioning HPCC Systems, developed internally by its Risk Solutions unit, as an alternative to Apache Hadoop. A virtual machine for testing purposes will be available soon, and code will be available in a few weeks.

The Risk Solutions unit, less well known than LexisNexis’ legal and media units, was founded 10 years ago. It provides identity verification services to government agencies and private organizations such as banks and insurance companies. According to Armando Escalante, CTO of Risk Solutions, the company started developing HPCC 10 years ago when it found that existing solutions weren’t capable of munging large data sets and returning results fast enough.

Since developing the technology, Risk Solutions has used HPCC to analyze and find links in large data sets. It has also provided its solutions to intelligence organizations and scientific research laboratories. HPCwire wrote about the technology in 2009:

LexisNexis specializes in data — lots of data — about you, me, and just about every other person in the US that has any kind of digital fingerprint. These data come from thousands of databases about all kinds of transactions and public records that are kept by companies and agencies around the US. But just having the data isn’t very useful; LexisNexis has to be able to access it on behalf of their customers to help them make complex decisions about what businesses to start or stop, what 500,000 people to send a packet of coupons too, or which John Smith living in California to get a search warrant for.

LexisNexis claims HPCC can scale to “thousands of nodes handling petabytes of data and supporting millions of transactions per minute.”

Escalante said he and his team have been watching the development of Hadoop closely for the past few years, and felt the time was right to make the technology available to customers outside of the Risk Solutions base. Only the core technology is being released; LexisNexis’ own data linking techniques and data sources are not.

Like Hadoop, HPCC runs on clusters of commodity servers. It has three main components:

  • Thor Data Refinery Cluster: the data extraction, transformation and loading system.
  • Roxie Rapid Data Delivery Cluster: a delivery system for querying and data warehousing. Escalante believes this is a key competitive advantage over Hadoop.
  • ECL (Enterprise Control Language): a declarative programming language developed in C++ for working with HPCC. Escalante says it’s SQL-like, but “not too SQL-like.”
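
To make the division of labor between the first two components concrete, here is a conceptual sketch in Python. It is not ECL and does not use any HPCC API; the record fields and the function names (refine, build_index, query) are invented for illustration. It assumes only what the list above says: one stage extracts, transforms and loads data in batch, and a separate stage answers queries against the prepared result.

```python
from collections import defaultdict

# Hypothetical raw records; the field names are invented for this illustration.
RAW_RECORDS = [
    {"name": " John Smith ", "state": "ca"},
    {"name": "Jane Doe", "state": "NY"},
    {"name": "john smith", "state": "CA"},
]


def refine(records):
    """Batch stage (the role the article assigns to Thor): clean and deduplicate."""
    seen = set()
    cleaned = []
    for record in records:
        # Normalize whitespace and capitalization so equivalent rows match.
        name = " ".join(record["name"].split()).title()
        state = record["state"].strip().upper()
        key = (name, state)
        if key not in seen:
            seen.add(key)
            cleaned.append({"name": name, "state": state})
    return cleaned


def build_index(records):
    """Load step: group refined records by name so lookups avoid a full scan."""
    index = defaultdict(list)
    for record in records:
        index[record["name"]].append(record)
    return dict(index)


def query(index, name):
    """Delivery stage (the role the article assigns to Roxie): answer from the index."""
    return index.get(name, [])


if __name__ == "__main__":
    index = build_index(refine(RAW_RECORDS))
    print(query(index, "John Smith"))
```

The point of the split is that the expensive cleanup runs once, ahead of time, while each query touches only a prebuilt index. How HPCC implements either stage internally is not described in the announcement.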

HPCC will be available in two versions: a free open source Community Edition and a commercial Enterprise Edition. The Enterprise Edition will include support, training and some additional tools.

Escalante says the HPCC team has been working with Amazon Web Services to make sure the product works well on AWS servers.
