You may not realize it, but the data analytics market is buzzing. New vendors are emerging, new products are popping up, new deals are being done, and several new strategies are being pursued. Vendors are predominantly chasing big data, with battle lines being drawn by solution providers that cater to data sets of roughly 100 TB to 10 PB. The battle was inevitable because the world is producing data at a phenomenal rate, and we have an increasing need to analyze it within shorter time frames. In this post we analyze one of these vendors, Kickfire.
Yet while the big names in town are capturing the headlines, only a small percentage of businesses today need to analyze petabytes of data. The rest of us are more likely to deal with analytic data sets in the 50 GB to 3 TB range.
Kickfire is interesting because it has decided to let the other vendors fight it out for the massive data volumes. Instead, it has focused on a relatively untapped segment: the MySQL database market or, more correctly, the market that MySQL serves.
The bulk of MySQL installs are for Web 2.0 and web-related applications (i.e. applications based on the LAMP stack), and these applications usually aren't set up to manage industrial-sized data sets. Instead, they often have gigabytes or a few terabytes of data, but analyzing that data is just as important to their owners. However, like many transaction-oriented databases, MySQL doesn't perform very well when you run analytics-style queries, even on mid-sized data sets. Customers often find that running complex ad-hoc queries that aggregate data across many rows is very time-consuming, and the lack of certain features, such as query parallelism, diminishes MySQL's appeal.
Kickfire's solution is to use MySQL as the base, which makes it easy for existing MySQL customers to migrate, but to replace MySQL's storage engine with its own column store engine. Under the covers, the column store organizes data by the columns in a table, rather than by rows as in the traditional method. This structure has been found to achieve better compression and better ad-hoc query performance because only the columns being queried, not entire rows, need to be scanned. The column store approach is also used by Vertica and was popularized by its founder, the well-known database researcher Michael Stonebraker.
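To make the row-versus-column distinction concrete, here is a minimal Python sketch. It is a toy illustration only, not Kickfire's storage format: the table, the column names, and the run-length encoding step are all assumptions chosen for the example.

```python
from itertools import groupby

# Toy click table (hypothetical data), stored row by row.
rows = [
    {"user_id": 1, "country": "US", "clicks": 12},
    {"user_id": 2, "country": "DE", "clicks": 7},
    {"user_id": 3, "country": "US", "clicks": 30},
]

# The same table stored column by column: one list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_clicks_row_store(rows):
    """Aggregate over a row layout: every whole record is visited."""
    return sum(row["clicks"] for row in rows)


def total_clicks_column_store(columns):
    """Aggregate over a column layout: only the 'clicks' column is read."""
    return sum(columns["clicks"])


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


print(total_clicks_row_store(rows))
print(total_clicks_column_store(columns))
# Values within one column tend to repeat, so a sorted column compresses well.
print(run_length_encode(sorted(columns["country"])))
```

Both functions return the same total, but the column version never touches `user_id` or `country`. On a wide table with many rows, that difference is what reduces the amount of data an aggregate query has to scan.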
But Kickfire doesn't end there. It goes one step further by adding a proprietary "SQL Chip" co-processor to further enhance its product's performance. Kickfire has replaced the MySQL query optimizer (the component that takes an SQL statement and splits it into a series of operators for processing) to produce operators that can be sent directly to its SQL Chip for processing. So, rather than running these operators on a general-purpose CPU, which has to convert them into a series of regular CPU instructions and then muck around loading the data into registers from memory, the optimizer instead sends them to the SQL Chip, which natively understands them and processes them on data streamed directly from memory.
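For readers unfamiliar with the idea of splitting a query into operators, the sketch below shows how a statement such as SELECT SUM(clicks) FROM t WHERE country = 'US' might decompose into a scan, a filter, and an aggregate. The operator names and the Python generator pipeline are hypothetical and purely illustrative. They say nothing about Kickfire's actual operator set or about how the SQL Chip executes operators in hardware.

```python
# Hypothetical column-oriented table for the example.
table = {
    "country": ["US", "DE", "US"],
    "clicks": [12, 7, 30],
}


def scan(table, names):
    """Scan operator: stream tuples built from the requested columns only."""
    yield from zip(*(table[name] for name in names))


def select(stream, predicate):
    """Filter operator: pass through only tuples that satisfy the predicate."""
    return (row for row in stream if predicate(row))


def aggregate_sum(stream, index):
    """Aggregate operator: sum one field of the incoming tuples."""
    return sum(row[index] for row in stream)


# The "plan" is a chain of operators; data streams from one to the next.
result = aggregate_sum(
    select(scan(table, ["country", "clicks"]), lambda row: row[0] == "US"),
    index=1,
)
print(result)  # 42
```

In a software-only database each of these operators ultimately runs as ordinary CPU instructions. The approach described above instead hands the operators themselves to dedicated hardware.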
Kickfire's solution is bundled as a data warehouse "appliance," which is made up of two physical servers: one conventional server running MySQL 5.1, and the other connected via PCIe, which is used to offload processing to the SQL Chip. The underlying capabilities that Kickfire adds remain largely transparent in terms of the user's interaction via SQL code, because Kickfire hasn't changed the MySQL syntax that its customers are already familiar with.
Is the Performance Advantage Real?
Kickfire took a major step for a small vendor last year by publishing results for the most credible benchmark in data warehousing: TPC-H. For those not familiar with it, the TPC (Transaction Processing Performance Council) is a non-profit association whose benchmarks require all vendors to use the same workload, thereby producing comparable results. Of course, performance is a key measure; but perhaps the more important benchmark result is the price/performance ratio. While a vendor could throw large amounts of hardware at a workload to produce high-performance results, doing so cost-effectively has always been the biggest challenge. TPC-H rates the vendors' solutions using its own metric, called the "Composite Query-per-Hour Performance Metric" (QphH). The metric is a comparative measure of effective query throughput and processing power relative to data warehouse workloads.
Kickfire's results are impressive. For a 300 GB workload, it currently ranks fourth on the performance list (and second on the non-clustered performance list). But what stands out is its price/performance ratio. Kickfire has the lowest cost per QphH of any vendor, at $0.89. Compare that to the fifth-placed product on the performance list, an SQL Server-based solution that costs $5.40 per QphH, and the sixth-placed product, an Oracle-based solution that costs $18.67 per QphH!
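To put those figures side by side, the short Python sketch below expresses each competitor's cost per QphH as a multiple of Kickfire's. The three dollar amounts are the ones quoted above; the dictionary labels are shorthand introduced for this example.

```python
# Cost per QphH for the 300 GB workload, as quoted in this article.
cost_per_qphh = {
    "Kickfire": 0.89,
    "SQL Server-based (5th on performance)": 5.40,
    "Oracle-based (6th on performance)": 18.67,
}

baseline = cost_per_qphh["Kickfire"]


def cost_multiple(name):
    """Return how many times more a system costs per QphH than Kickfire."""
    return cost_per_qphh[name] / baseline


for name, cost in cost_per_qphh.items():
    print(f"{name}: ${cost:.2f} per QphH, {cost_multiple(name):.1f}x Kickfire")
```

On these numbers the SQL Server-based system costs roughly 6 times as much per QphH as Kickfire, and the Oracle-based system roughly 21 times as much.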
Web 2.0 Success
Kickfire announced last week on its blog that it had shipped its first appliance to a Web 2.0 customer. As many Web 2.0 businesses are finding out, killer features alone do not determine success or failure. Success also depends on how well the vendor understands its customers' evolving needs and how it generates revenue by addressing those needs. Kickfire's Web 2.0 customer is using the appliance to do click-stream analysis to better understand its own users' behavior, so that it can target relevant advertising offers. According to Kickfire, the customer has around 500 GB of imported data, and this is expected to grow at a rate of 1 GB per day.
The traditional enterprise space is less of a focus for Kickfire at the moment, partly because that space is already relatively well served by specialized offerings, but also because MySQL has less of a presence there. While Sun is pushing MySQL to break through those enterprise walls, most corporate data platforms remain largely the domain of Oracle, Microsoft, IBM, and niche vendors, such as Teradata. If Sun does manage to break through, then the road will be paved for Kickfire to follow.
Kickfire is a stand-alone solution and can of course be loaded with data from any data source, but the ease of adoption for existing MySQL customers, combined with its strong price/performance ratio, makes Kickfire a compelling option for Web 2.0 businesses looking to add a data warehouse platform to improve their analytics capabilities.