For a total of 17 years, Karen Padir was an executive at Sun Microsystems, and was present for that company's astounding transition to open source technologies. She was an advocate for MySQL when Sun acquired that project, and again later when Oracle acquired Sun, promising the "Dolphin" faithful that good times still lay ahead. But then she left to lead marketing for MySQL's principal competitor among open source-derived databases, EnterpriseDB, the commercial provider of Postgres Plus.
Now, Padir is seeing another astonishing transition in her field: the open source development community's move toward less structured, higher-capacity data stores. Last week, EnterpriseDB started building a bridge to Hadoop, the cloud-oriented distributed storage and processing framework that grew up at Yahoo, by launching a private beta program for the innocuously named Postgres Plus Connector for Hadoop.
Bridging the structured table with the data store
In an interview with ReadWriteWeb, EnterpriseDB Vice President for Product Marketing Karen Padir noted that the rise of mobile apps that customize their services and service levels on the fly, based on your smartphone's location and on the locations of the people you may be talking to, has produced a data glut. "There's these huge amounts of data, and industries are trying to figure out, 'How do we make sense of it all?' And some of the data is important for legal reasons, you have to make sure you keep track of all this data."
In some cases, users may want to take subsets of data from a structured database and analyze them in real time in a more agile, unstructured environment. This is the job EnterpriseDB has in mind for its Hadoop Connector: enabling users to run MapReduce functions, with Hadoop's parallelism, on Postgres Plus structured tables that were designed for more formal, less scalable unions and joins.
Conversely, the same Connector can be used to move subsets of Hadoop data into the Postgres Plus environment for analysis. She offers an example: "Say you have a whole bunch of unstructured data that is being managed within an Hadoop cluster -- all this data you're collecting from Tweets, data that's part of the digital footprint you're creating as you live your life. Now say in your data center, in your IT department, your employee database has tables of employees, their geography, location, telephone numbers. And you're trying to run a query about where your employees are traveling, or their degree of cell phone activity if you're [processing] their bills. You want to correlate every single employee, but you don't want to change your application and everything you've already done before to make that happen. So this allows you to connect and combine those two things: applications in your existing toolset, with unstructured data."
In the Hadoop environment, Pig is the high-level platform that turns data-flow scripts into jobs that can be executed in parallel across data processing nodes. From its own point of view, EnterpriseDB's Connector appears to be just another JDBC driver that passes the results of a SQL query on to something else. But from the Hadoop standpoint, it's Pig that receives those results, as a dataset that may be processed in parallel using MapReduce.
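To make that division of labor a little more concrete, here is a minimal sketch of the general pattern: rows come out of a SQL query, and a map, shuffle and reduce pass aggregates them. It is not the Connector's actual interface, which the beta announcement doesn't describe. It assumes Python's built-in sqlite3 as a stand-in for Postgres Plus and a hypothetical employees table, and the map and reduce steps run sequentially here, whereas Hadoop would spread them across many nodes.

```python
import sqlite3
from collections import defaultdict
from itertools import chain


def load_employees():
    # Hypothetical structured table; sqlite3 stands in for Postgres Plus.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (name TEXT, location TEXT)")
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?)",
        [("alice", "Boston"), ("bob", "Austin"), ("carol", "Boston")],
    )
    conn.commit()
    return conn


def map_phase(row):
    # Emit (key, value) pairs: one count per employee, keyed by location.
    _name, location = row
    return [(location, 1)]


def shuffle(pairs):
    # Group values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Combine all values for one key into a single result.
    return key, sum(values)


def employees_per_location(conn):
    # The SQL query result is the dataset handed on to the map/reduce stage.
    rows = conn.execute("SELECT name, location FROM employees").fetchall()
    mapped = chain.from_iterable(map_phase(row) for row in rows)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(employees_per_location(load_employees()))
```

Because each call to `map_phase` touches only one row and each call to `reduce_phase` touches only one key, both steps can be distributed independently, which is what lets a framework like Hadoop run them in parallel.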
Karen Padir will be moderating a webcast demonstrating the Hadoop Connector at work, currently scheduled for Tuesday, November 29, at 11:00 am EST.
After Sun, there's light
Padir's career with Sun started in the engineering group, with everyday tasks like support. At the first opportunity, she moved to the Java group, which at that time was driven by what she characterizes as not a movement, but an initiative.
"In that time in 1995, we were looking at [Java] as changing the way people develop software and write programs for enterprises," she tells RWW. "Learning with all those experiences, how do you make a technology ubiquitous in the enterprise? We made lots of mistakes at Sun, but we did a lot of things right. We knew how to partner with people to create technology.
"And then this open source business happened," she continues. She knew Sun needed to move to open source models if it was to stay competitive. It was 2004, and Sun couldn't quite do it. So she switched companies for about a year, moving to Red Hat. Comparing the way the two companies operated, she came to an enlightening conclusion, one which her competitors have yet to realize even today.
"You're never going to be able to employ the smartest people on the planet, all of them," Padir states. But a company can bring them together, if it's willing to do so outside its own gates.
"There are lots of angles when you're creating a product for a company, that you can optimize around. But when you're not bound by that, you're looking at, what are the core things that developers need? That's where having people across the planet, folks who innovate together, who are not bound by business models, company objectives, geography or politics, [becomes] definitely one of the bigger advantages."
Padir learned community building from Red Hat, and then took that knowledge back to Sun. Now open source was becoming more than an initiative, and Sun was opening up practically its entire portfolio. It was almost a full-service open source technology provider, but it lacked a general-purpose database.
"There are very few communities that can be bought," admits Padir, "but that [MySQL] was one that could." She participated in the due diligence processes for acquiring MySQL, and later in integrating the MySQL community with Sun's. She began participating in the MySQL community's regular "State of the Dolphin" message to its stakeholders. Oracle acquired Sun, and Padir kept the faith. For a while, it seemed all the major open source middleware initiatives had been done.
"Postgres was going to be the next great wave," she says, which led her to join her former Red Hat colleagues at EnterpriseDB. She knew that companies' database administrators were all skilled in SQL, so the demand for technologies that addressed this skill set would not disappear even if the underlying platforms changed. But the underlying platforms were indeed changing, moving to cloud-based configurations, and it didn't seem that MySQL was moving with them. Part of the problem was a split in mindset over the role of open source developers in building what eventually becomes commercial software.
"Clearly there are folks who are religious about it. They're absolutely dead-set religious; everything has to be open source... or they're not going to work with it. There are also these folks who are of the [mindset] that says, 'I need tools to make my work more effective, but I don't want them to be something that will end up forcing somebody into going down some closed-source road where they're going to be stuck.' This tool, the Connector, is a bridge between two open source things. So it's not really one that will end up locking somebody in to a closed source, proprietary implementation that dictates the spending future for your company."