Talend Opens Up Data Integration Processes to the Community

Here’s an interesting predicament: Suppose your business’s only problem isn’t the quality of the services your software performs, but getting all of its databases to work together. You’re not out to replace the software, and doing so might be a bad idea anyway. But getting data and processes built for SAP, Siebel and Tivoli to work together may require input from at least one of those vendors, something none of them may have a business interest in providing, especially one customer at a time.

Many organizations today rely on their own IT departments’ scripting skills to synchronize their data, but the resulting scripts are almost always too intricate and fragile to bet the integrity of a business on for long. We’ve discussed Informatica Cloud as one option. Another cloud-based service relies on the open source community: literally, on the willingness of contributors to help others solve the same problems they’re solving for themselves.

This alternative is called Talend Unified Platform. Version 5, which Talend announced today, extends the platform’s live data integration to cover business processes as well: it can manage existing processes and even model new ones using graphical tools.

“Big institutions… are very protective of the secrets that they have within their organizations, because they see it typically as a competitive advantage,” remarks Ciaran Dynes, Talend’s senior director for product management, in an interview with RWW. “One of the things that open source can give you is the ability to innovate the core technology beyond what those people are doing. There is a creative innovation advantage that [businesses] believe they can derive from open source, if they have their own development organizations. So it’s a marriage between an open source vendor, who’s got expertise – and in our case, data integration and data quality – and marrying that with the expertise, the vertical knowledge that they have.”

Contributing processes without divulging secrets

It’s an interesting case, and on the surface it might even sound self-contradictory: Open source gives institutions the means to share the methods they use to adapt and integrate their vital business secrets. It makes more sense (kind of) when you realize that the secrets themselves are not being shared.

The component of Talend’s data integration suite called Data Quality (that’s not a marketing phrase, but the component’s actual name) compares data sources with one another to determine which disparate elements appear similar, and whether the fields associated with those elements can be linked. Imagine several unrelated databases, each with its own table of customer names that ought to refer to the same people, and you’ll get the idea.
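To make that idea concrete, here is a minimal sketch of fuzzy record matching in Python. It is not Talend’s implementation: the two customer lists are hypothetical, the similarity measure is the standard library’s `difflib.SequenceMatcher` ratio, and the 0.75 threshold is an arbitrary illustrative choice.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop punctuation and collapse whitespace so trivial
    formatting differences don't affect the comparison."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Return a 0.0-1.0 similarity score between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_records(left, right, threshold=0.75):
    """Pair each name in `left` with its most similar name in `right`,
    keeping only pairs whose score meets the (illustrative) threshold."""
    matches = []
    for name in left:
        best = max(right, key=lambda other: similarity(name, other))
        score = similarity(name, best)
        if score >= threshold:
            matches.append((name, best, round(score, 2)))
    return matches


# Hypothetical customer tables from two systems never designed to talk to each other.
crm_customers = ["Acme Corp.", "Globex Corporation", "Initech"]
billing_customers = ["ACME Corp", "Globex Corp", "Umbrella Inc"]

for left, right, score in match_records(crm_customers, billing_customers):
    print(f"{left!r} <-> {right!r} (score {score})")
```

Here "Acme Corp." and "ACME Corp" normalize to the same string, "Globex Corporation" pairs with "Globex Corp" at a lower score, and "Initech" finds no counterpart. A real matcher would also avoid comparing every record with every other one, for example by only comparing records that share a postcode.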

For Talend v5, Dynes tells RWW, the company has added extensions that let users integrate matching algorithms into their data integration workflows. This way, he explains, the process can be set up to check whether postal addresses are formatted properly, or whether the latest known address applies in all instances. Community experts who contribute to Data Quality keep refining and optimizing these algorithms even after the software is deployed.
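The two checks Dynes mentions can be sketched in a few lines. This is an illustration under stated assumptions, not Talend’s rules: the record layout is hypothetical, the "properly formatted" test is a US ZIP or ZIP+4 pattern, and "latest known address" is taken to mean the well-formed address with the newest `updated` date.

```python
import re
from dataclasses import dataclass
from datetime import date

# Assumed rule: a US ZIP code is five digits, optionally followed by "-" and four more.
ZIP_PATTERN = re.compile(r"^\d{5}(-\d{4})?$")


@dataclass
class AddressRecord:
    customer_id: str
    street: str
    zip_code: str
    updated: date  # hypothetical field: when this address was last confirmed


def is_well_formed(record: AddressRecord) -> bool:
    """Check that the street is non-empty and the ZIP code matches the assumed pattern."""
    return bool(record.street.strip()) and bool(ZIP_PATTERN.match(record.zip_code))


def latest_addresses(records):
    """Return the most recently updated well-formed address for each customer."""
    latest = {}
    for rec in records:
        if not is_well_formed(rec):
            continue  # skip malformed rows rather than let them overwrite good data
        current = latest.get(rec.customer_id)
        if current is None or rec.updated > current.updated:
            latest[rec.customer_id] = rec
    return latest


records = [
    AddressRecord("c1", "12 Elm St", "02139", date(2009, 5, 1)),
    AddressRecord("c1", "9 Oak Ave", "02139-4307", date(2011, 3, 15)),
    AddressRecord("c1", "77 Pine Rd", "2139", date(2011, 6, 1)),  # malformed ZIP
    AddressRecord("c2", "5 Main St", "94105", date(2010, 8, 20)),
]

for cid, rec in sorted(latest_addresses(records).items()):
    print(cid, rec.street, rec.zip_code)
```

Note that the newest record for customer `c1` is discarded because its ZIP code is malformed, so the format check and the recency check have to run together.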

Revising revision

Talend’s Enterprise Service Bus (ESB) component is a graphical environment for building, testing, and deploying RESTful Web services that make integrated data accessible online. This evolution of what Talend previously called “ESB Studio” actually adds a studio: a richer graphical front end for designing these services without having to write Java.

“That helps an organization that may not have the most skillful Java development team, or simply may not want to use its best Java development resources to do [just] integration,” says Talend’s Dynes. “They want to point-and-click, connect to a database, and expose it as a Web service.”
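For comparison, here is roughly what "connect to a database and expose it as a Web service" involves when written by hand. This is a minimal Python sketch using only the standard library (`sqlite3` and `wsgiref`) and a hypothetical `customers` table. It is not what Talend’s studio produces, and it leaves out authentication, pagination and error handling.

```python
import json
import sqlite3
from wsgiref.util import setup_testing_defaults


def make_app(conn):
    """Build a WSGI app that serves GET /customers as a JSON list."""
    def app(environ, start_response):
        if environ.get("REQUEST_METHOD") != "GET" or environ.get("PATH_INFO") != "/customers":
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"not found"]
        rows = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
        body = json.dumps([{"id": r[0], "name": r[1]} for r in rows]).encode("utf-8")
        start_response("200 OK", [("Content-Type", "application/json"),
                                  ("Content-Length", str(len(body)))])
        return [body]
    return app


# Hypothetical data source: an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers (name) VALUES (?)", [("Acme Corp",), ("Globex",)])
app = make_app(conn)

# Call the app directly instead of starting a server; setup_testing_defaults
# fills in REQUEST_METHOD=GET and the other keys a WSGI environ requires.
environ = {"PATH_INFO": "/customers"}
setup_testing_defaults(environ)
captured = {}


def start_response(status, headers):
    captured["status"] = status


body = b"".join(app(environ, start_response))
print(captured["status"], body.decode())

# To serve it for real:
# from wsgiref.simple_server import make_server
# make_server("", 8000, app).serve_forever()
```

Even this toy version needs SQL, serialization and HTTP plumbing, which is the kind of work a point-and-click tool aims to take off a development team’s plate.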

Talend’s Master Data Management (MDM) tool, he adds, changes the way its users build Web forms for visualizing data as it’s being integrated, and afterward. “This allows different business analysts to get really comfortable with the tools they’re using, because they can be expressed with their own corporate logo, and their own look and feel.”

The biggest challenge for institutions in recent months, Dynes says, has been adapting existing databases for use with Hadoop, the framework for distributing huge data sets, and the processing of them, across clusters of machines. Talend already supported Hadoop to some extent, but version 5 adds support for Hive data warehousing and Pig analysis, plus better support for Sqoop, the Hadoop tool for bulk-transferring data between relational databases and Hadoop.

“It’s not that they are looking to replace their traditional relational databases,” he explains, “but they’re seeing that big data is applicable to a certain class of activity or problem that they have today. One of the areas they’re looking at is archiving the data. Major problem for them, because they archive all this data, and it goes into the data warehouse, and thereafter it goes into… well, I wouldn’t call it a ‘graveyard.’ The problem they suffer from is, they’ve got this data but they can’t completely make any use of it for business in near-time [processes].”

The Data Integration feature in Talend v5 will therefore let users run Hadoop’s MapReduce over data that Sqoop has imported into a Hive archive, and draw insights from it. This could significantly change the way institutions archive their data, Dynes predicts, mainly because until now data has never been archived with the intention of keeping it usable. “Because it had such large volume, they could not even put together the analyses. What questions would they even ask, could they even contemplate with this amount of data?”
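MapReduce itself is a simple pattern: a map step emits key/value pairs from each record, a shuffle groups those pairs by key, and a reduce step aggregates each group. The single-process Python sketch below only illustrates that pattern on a few hypothetical archived order rows; it does not use Hadoop, Hive or Sqoop, which distribute the same steps across a cluster.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical archived rows: (order_id, year, amount_in_dollars)
archive = [
    ("o1", 2008, 120.0),
    ("o2", 2008, 80.0),
    ("o3", 2009, 200.0),
    ("o4", 2010, 50.0),
    ("o5", 2010, 25.0),
]


def map_phase(row):
    """Emit one (key, value) pair per record: here, (year, amount)."""
    _order_id, year, amount = row
    yield year, amount


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Aggregate each key's values: here, total revenue per year."""
    return key, reduce(lambda a, b: a + b, values, 0.0)


pairs = (pair for row in archive for pair in map_phase(row))
results = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(results)  # {2008: 200.0, 2009: 200.0, 2010: 75.0}
```

Because each map call looks at one record and each reduce call looks at one key, both steps can be spread over many machines, which is what makes the pattern workable on archives too large to analyze in one place.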

Ciaran Dynes believes that the kinds of problems his company can now tackle, including the redefinition of archiving, could only have been approached from an open source perspective. “There’s proprietary solutions out there, but why would you? The Googles, the Facebooks, the LinkedIns, all of these big, great companies have done phenomenal things in terms of how they’ve impacted not only the technology trade, but society. [And organizations are asking,] ‘If they’re making it work, why can’t we?’ If you follow the logic through, why not?”

About ReadWrite’s Editorial Process

The ReadWrite Editorial policy involves closely monitoring the tech industry for major developments, new product launches, AI breakthroughs, video game releases and other newsworthy events. Editors assign relevant stories to staff writers or freelance contributors with expertise in each particular topic area. Before publication, articles go through a rigorous round of editing for accuracy, clarity, and to ensure adherence to ReadWrite's style guidelines.
