Hadoop Needs Better Bridges to Fulfill the Big Data Promise

Hadoop is designed to store big data cheaply on a distributed file system across commodity servers. How you get that data there is your problem. And it’s a surprisingly critical issue because Hadoop isn’t a replacement for existing infrastructure, but rather a tool to augment data management and storage capabilities. Data, therefore, will be continually going in and out.

Beyond Basic Tools

Basic tools exist, of course: Since Hadoop came into being, simple commands like Hadoop Copy have offered a straightforward but slow way to get data into Hadoop. And there’s Apache Sqoop, which is built expressly for moving data between a relational database management system (RDBMS) and Hadoop.
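As a rough illustration of what these basic tools look like in practice, the sketch below builds the command lines for a plain file copy into HDFS and for a Sqoop table import. The paths, JDBC URL, table name and user are hypothetical placeholders, and the helper functions are illustrative only; the commands are assembled but not executed, since running them requires a configured Hadoop and Sqoop installation.

```python
import shlex


def hdfs_put_command(local_path, hdfs_dir):
    """Build the argument list for copying a local file into HDFS."""
    return ["hadoop", "fs", "-put", local_path, hdfs_dir]


def sqoop_import_command(jdbc_url, table, target_dir, username, num_mappers=4):
    """Build the argument list for importing one RDBMS table with Sqoop."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--username", username,
        "-P",  # prompt for the password instead of putting it on the command line
        "--num-mappers", str(num_mappers),
    ]


def render(cmd):
    """Render an argument list as a shell-safe string for display."""
    return " ".join(shlex.quote(arg) for arg in cmd)


# Hypothetical values, for illustration only.
put_cmd = hdfs_put_command("/tmp/orders.csv", "/data/raw/orders")
import_cmd = sqoop_import_command(
    "jdbc:oracle:thin:@//dbhost:1521/ORCL",
    "ORDERS",
    "/data/raw/orders_db",
    "etl_user",
)

print(render(put_cmd))
print(render(import_cmd))
```

On a machine with the tools installed, either list could be handed to `subprocess.run(cmd, check=True)`. Keeping the arguments as a list rather than a single string avoids shell-quoting problems.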

But Sqoop has limitations of its own. It works, but it relies on low-level MapReduce jobs to do the work, which adds both complexity and, because MapReduce runs as batch jobs, time to data imports and exports. You could take the time, of course, and dump your data into Hadoop just once, but that assumes Hadoop will completely replace your data storage infrastructure.
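To give a sense of why a MapReduce-based import behaves like a batch job, here is a simplified sketch of how a table import can be parallelized: the range of a numeric key column is cut into slices, and each map task reads one slice with its own query. This is an approximation for illustration, not Sqoop's actual code; the function names, the table `ORDERS` and the column `ORDER_ID` are assumptions.

```python
def split_ranges(lo, hi, num_mappers):
    """Divide the inclusive key range [lo, hi] into half-open [start, end) slices."""
    span = hi - lo + 1
    step, extra = divmod(span, num_mappers)
    ranges = []
    start = lo
    for i in range(num_mappers):
        # The first `extra` slices take one additional key so the whole span is covered.
        end = start + step + (1 if i < extra else 0)
        if end > start:  # skip empty slices when there are more mappers than keys
            ranges.append((start, end))
        start = end
    return ranges


def slice_queries(table, column, lo, hi, num_mappers):
    """Build one SELECT per slice; each map task would run one of these."""
    return [
        f"SELECT * FROM {table} WHERE {column} >= {start} AND {column} < {end}"
        for start, end in split_ranges(lo, hi, num_mappers)
    ]


# Hypothetical table and key range, for illustration only.
queries = slice_queries("ORDERS", "ORDER_ID", 0, 999, 4)
for query in queries:
    print(query)
```

Every slice is scheduled as a separate task, and the import finishes only when the slowest one does, which is part of the overhead described above.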

This is the near-forgotten side of big data: properly placing Hadoop within existing infrastructure so data is stored cheaply, but still quickly accessible for analysis. It is here that data integration tools must play a role as the bridge between existing data stores, analytics and business intelligence tools on one side, and Hadoop on the other.

Pervasive Software is a recent entrant to the Hadoop space, but not to the field of data integration: The Pervasive Data Integrator is no stranger to those who move in data circles. Earlier this month, the Austin-based company announced a Hadoop edition of its product that enables users to roll data from more than 200 sources into the Hadoop Distributed File System (HDFS) or HBase, the Bigtable-style NoSQL database that runs atop Hadoop.

A Visual Approach

Unlike Sqoop, Pervasive uses a visual approach to integrating data.

“It’s a mapping problem,” said Pervasive CTO Mike Hoskins. He recounted how, while the product was still in development, one of Pervasive’s developers performed an off-the-cuff integration of 50,000 rows of data from an Oracle database into Hadoop in seconds, and that included the time it took to visually map tables in Oracle to Hadoop.

“He just mapped the tables, set the filters and constraints, set the target and clicked go,” Hoskins said.
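The steps Hoskins lists (map the tables, set filters, set a target) amount to a declarative mapping specification. The sketch below shows that general idea in plain Python; it is not Pervasive's API or file format, and every name in it (`MappingSpec`, `apply_mapping`, the column names, the target path) is a hypothetical stand-in.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


@dataclass
class MappingSpec:
    """A declarative description of one source-to-target mapping."""
    column_map: Dict[str, str]  # source column name -> target field name
    filters: List[Callable[[Row], bool]] = field(default_factory=list)
    target: str = "hdfs:///data/orders"  # hypothetical destination


def apply_mapping(rows: Iterable[Row], spec: MappingSpec) -> Iterator[Row]:
    """Yield renamed rows that pass every filter in the spec."""
    for row in rows:
        if all(keep(row) for keep in spec.filters):
            yield {dst: row[src] for src, dst in spec.column_map.items()}


# Made-up sample rows standing in for a relational table.
source_rows = [
    {"ORDER_ID": 1, "CUST_NAME": "Acme", "AMOUNT": 120.0},
    {"ORDER_ID": 2, "CUST_NAME": "Globex", "AMOUNT": 15.5},
]

spec = MappingSpec(
    column_map={"ORDER_ID": "order_id", "AMOUNT": "amount"},
    filters=[lambda row: row["AMOUNT"] >= 100],
)

mapped = list(apply_mapping(source_rows, spec))
print(spec.target, mapped)
```

A visual tool would let a user draw the equivalent of `column_map` and `filters` on screen and then generate the job that writes to the target, rather than asking anyone to hand-code it.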

Hoskins has a vested interest in talking up Pervasive, of course, but his company’s software is part of a growing class of data integration software geared to work with Hadoop and its ecosystem of big data tools. Among these are Talend’s Open Studio and Enterprise Data Integration products, as well as Pentaho’s Kettle.

Data integration tools like these will make the initial transition to Hadoop a lot easier, and will also simplify extracting data for further analysis with tools outside Hadoop. And they will be necessary if Big Data is to fulfill its promise of making it easier to understand the meanings and patterns hidden in complex information.
