Hadoop Needs Better Bridges to Fulfill the Big Data Promise

Hadoop is designed to store big data cheaply on a distributed file system across commodity servers. How you get that data there is your problem. That turns out to be a surprisingly critical issue, because Hadoop isn’t a replacement for existing infrastructure but a tool that augments existing data management and storage capabilities. Data, therefore, will be moving in and out continually.

Beyond Basic Tools

Basic tools exist, of course. Since Hadoop came into being, simple shell commands such as hadoop fs -put have offered a very straightforward, if slow, way to get data into Hadoop. And there’s Apache Sqoop, which is built expressly for moving data stored in a relational database management system (RDBMS) in and out of Hadoop.
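
For reference, both basic routes can be scripted. The Python sketch below only assembles the command lines for a Hadoop shell copy (hadoop fs -put) and a Sqoop 1 table import, printing them by default rather than running them. The JDBC URL, user name, table and HDFS paths are placeholder assumptions, and available flags vary by Sqoop version, so check them against your installation.

```python
from __future__ import annotations

import shlex
import subprocess


def hdfs_put_cmd(local_path: str, hdfs_dir: str) -> list[str]:
    """Command that copies a local file into an HDFS directory with the Hadoop shell."""
    return ["hadoop", "fs", "-put", local_path, hdfs_dir]


def sqoop_import_cmd(jdbc_url: str, username: str, table: str,
                     target_dir: str, mappers: int = 4) -> list[str]:
    """Command for a Sqoop 1 import of one RDBMS table into an HDFS directory."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,          # JDBC URL of the source database
        "--username", username,
        "-P",                           # prompt for the password instead of passing it inline
        "--table", table,
        "--target-dir", target_dir,     # HDFS directory that receives the imported files
        "--num-mappers", str(mappers),  # number of parallel map tasks
    ]


def run(cmd: list[str], dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit


if __name__ == "__main__":
    # Placeholder paths, host and table names: assumptions for illustration only.
    run(hdfs_put_cmd("orders.csv", "/data/raw/orders/"))
    run(sqoop_import_cmd("jdbc:mysql://db.example.com/sales", "etl", "orders",
                         "/data/raw/orders_sqoop"))
```

Keeping the commands as argument lists (rather than one shell string) avoids quoting problems when paths or table names contain spaces.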

But Sqoop has limitations of its own. It works, but it relies on low-level MapReduce jobs to do the work, which adds complexity to imports and exports and, because MapReduce runs as batch jobs, adds time as well. You could, of course, take the time to load your data into Hadoop just once, but that assumes Hadoop will completely replace your existing data storage infrastructure.
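
The batch overhead follows from how Sqoop parallelizes an import. Roughly, it queries the minimum and maximum of a split column (the primary key by default, or the column named with --split-by), divides that range among the map tasks set by --num-mappers, and each task pulls its slice with a bounded SELECT. The simplified Python sketch below reproduces only that range-splitting idea; it is not Sqoop's code, and the table and column names are hypothetical.

```python
from __future__ import annotations


def split_ranges(lo: int, hi: int, num_mappers: int) -> list[tuple[int, int]]:
    """Divide the inclusive key range [lo, hi] into at most num_mappers contiguous slices."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if lo > hi:
        return []  # empty table: nothing to split
    total = hi - lo + 1
    n = min(num_mappers, total)       # never more slices than distinct key values
    size, extra = divmod(total, n)    # the first `extra` slices get one more key value
    ranges, start = [], lo
    for i in range(n):
        end = start + size + (1 if i < extra else 0) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def split_queries(table: str, column: str, lo: int, hi: int,
                  num_mappers: int) -> list[str]:
    """One bounded SELECT per map task (illustrative only; never build SQL from untrusted input this way)."""
    return [
        f"SELECT * FROM {table} WHERE {column} >= {a} AND {column} <= {b}"
        for a, b in split_ranges(lo, hi, num_mappers)
    ]


if __name__ == "__main__":
    # Assume a bounding query returned MIN(id) = 1 and MAX(id) = 50000.
    for q in split_queries("orders", "id", 1, 50000, 4):
        print(q)
```

Because the slices follow key values rather than row counts, gaps or skew in the key can leave some mappers with far more rows than others, and even a small import pays the cost of launching and scheduling a MapReduce job.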

This is the near-forgotten side of big data: properly placing Hadoop within existing infrastructure so that data is stored cheaply but remains quickly accessible for analysis. This is where data integration tools must serve as the bridge between existing data stores, analytics and business intelligence tools on one side and Hadoop on the other.

Pervasive Software is a recent entrant to the Hadoop space, but not to the field of data integration: its Pervasive Data Integrator is well known to those who move in data circles. Earlier this month, the Austin-based company announced a Hadoop edition of the product that lets users roll data from more than 200 sources into the Hadoop Distributed File System (HDFS) or into HBase, the Bigtable-style NoSQL database that runs atop Hadoop.

A Visual Approach

Unlike Sqoop, Pervasive uses a visual approach to integrating data.

“It’s a mapping problem,” said Pervasive CTO Mike Hoskins. He described how, while the product was still in development, one of Pervasive’s developers performed an off-the-cuff integration of 50,000 rows from an Oracle database into Hadoop in seconds, including the time it took to visually map the Oracle tables to Hadoop.

“He just mapped the tables, set the filters and constraints, set the target and clicked go,” Hoskins said.
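
Hoskins’s description (map the tables, set filters and constraints, set the target) amounts to a declarative spec that a visual tool fills in for you. The Python sketch below models one under stated assumptions: the Mapping class, its fields and the to_tsv helper are hypothetical illustrations, not Pervasive’s API, and a “constraint” is simplified here to “this target field must not be null.”

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


@dataclass
class Mapping:
    """Hypothetical declarative mapping, roughly what a visual mapping tool records."""
    columns: Dict[str, str]                                   # source column -> target field
    filters: List[Callable[[Row], bool]] = field(default_factory=list)
    required: List[str] = field(default_factory=list)         # target fields that must not be null

    def apply(self, rows: Iterable[Row]) -> Iterator[Row]:
        for row in rows:
            # Filters: skip source rows that fail any predicate.
            if not all(keep(row) for keep in self.filters):
                continue
            # Column mapping: rename source columns to target fields.
            out = {target: row.get(src) for src, target in self.columns.items()}
            # Constraint: drop rows missing a required target field.
            if any(out.get(name) is None for name in self.required):
                continue
            yield out


def to_tsv(rows: Iterable[Row], fields: List[str]) -> str:
    """Target: tab-separated text, e.g. a file that is then copied into HDFS."""
    return "\n".join(
        "\t".join("" if row.get(f) is None else str(row.get(f)) for f in fields)
        for row in rows
    )


if __name__ == "__main__":
    source = [  # stand-in for rows read from an Oracle table (hypothetical data)
        {"ORDER_ID": 1, "CUST": "acme", "AMT": 120.0, "STATUS": "SHIPPED"},
        {"ORDER_ID": 2, "CUST": None, "AMT": 75.5, "STATUS": "SHIPPED"},
        {"ORDER_ID": 3, "CUST": "globex", "AMT": 30.0, "STATUS": "CANCELLED"},
    ]
    mapping = Mapping(
        columns={"ORDER_ID": "order_id", "CUST": "customer", "AMT": "amount"},
        filters=[lambda r: r["STATUS"] != "CANCELLED"],
        required=["customer"],
    )
    print(to_tsv(mapping.apply(source), ["order_id", "customer", "amount"]))
```

Separating the spec from the rows it runs over is what makes “clicked go” possible: the same mapping can be rerun on each new batch without rewriting any code.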

Hoskins has a vested interest in talking up Pervasive, of course, but his company’s software is part of a growing class of data integration software geared to work with Hadoop and its ecosystem of big data tools. Among these are Talend’s Open Studio and Enterprise Data Integration products, as well as Pentaho’s Kettle.

Data integration tools like these will make it much easier both to move data into Hadoop in the first place and to extract it for further analysis with tools outside Hadoop. And they will be necessary if big data is to fulfill its promise of making the meanings and patterns hidden in complex information easier to understand.

About ReadWrite’s Editorial Process

The ReadWrite Editorial policy involves closely monitoring the gambling and blockchain industries for major developments, new product and brand launches, game releases and other newsworthy events. Editors assign relevant stories to in-house staff writers with expertise in each particular topic area. Before publication, articles go through a rigorous round of editing for accuracy, clarity, and to ensure adherence to ReadWrite's style guidelines.
