<rss xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">


	<channel>
		<title>Tony Bain - ReadWrite</title>
		<link>http://readwrite.com</link>
		<description />
		<language>en</language>
		<copyright>Copyright 2012 SAY Media, Inc.</copyright>
		<managingEditor>readwriteweb@gmail.com</managingEditor>
		<docs>http://blogs.law.harvard.edu/tech/rss</docs> 
		<lastBuildDate>Thu, 12 Feb 2009 07:00:00 -0800</lastBuildDate>
		<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://rww.superfeedr.com/" />

					<item>
				<title><![CDATA[Is the Relational Database Doomed?]]></title>
				<description><![CDATA[<p><span class="embedded-Media-image img-caption-c ">
	
			<img src="http://readwrite.com/files/files/files/images/database_symbol.jpg" style="" alt="" width="150" height="120" />
	
	
	</span>
Recently, a lot of new non-relational databases have cropped up both inside and outside the cloud. One key message this sends is, "if you want vast, on-demand scalability, you need a non-relational database".</p>

<p>If that is true, then is this a sign that the once mighty relational database finally has a chink in its armor? Is this a sign that relational databases have had their day and will decline over time? In this post, we'll look at the current trend of moving away from relational databases in certain situations and what this means for the future of the relational database.</p>
<p><a href="http://en.wikipedia.org/wiki/Relational_database">Relational databases</a> have been around for over 30 years. During this time, several so-called revolutions flared up briefly, all of which were supposed to spell the end of the relational database. All of those revolutions fizzled out, of course, and none even made a dent in the dominance of relational databases.</p>

<h2>First, Some Background</h2>

<p>A relational database is essentially a group of tables (entities). Tables are made up of columns and rows (tuples). Those tables have constraints, and relationships are defined between them. Relational databases are queried using SQL, and result sets are produced from queries that access data from one or more tables. Multiple tables accessed in a single query are "joined" together, typically on the columns that define the relationship between them. <a href="http://en.wikipedia.org/wiki/Database_normalization">Normalization</a> is a data-structuring technique used with relational databases that ensures data consistency and minimizes data duplication.</p>
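<p>As a minimal sketch of these ideas, the following uses Python's built-in sqlite3 module with an in-memory database. The customers/orders schema and its rows are invented for illustration. Each customer's name is stored once, in normalized form, and a join on the relationship column regroups it with that customer's orders.</p>

```python
import sqlite3

# In-memory database; the schema and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        total       REAL NOT NULL
    );
    -- Each customer's name is stored exactly once (normalized).
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# Join the tables on the relationship column to regroup related attributes.
rows = conn.execute("""
    SELECT c.name, o.order_id, o.total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()

for row in rows:
    print(row)
```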

<span class="embedded-Media-image img-caption-c ">
	
			<img src="http://readwrite.com/files/files/files/images/relational_database_feb09b.jpg" style="" alt="" width="610" height="337" />
	
	
	</span>


<p>Relational databases are managed through <a href="http://en.wikipedia.org/wiki/Relational_Database_Management_Systems">Relational Database Management Systems</a> (RDBMS). Almost all database systems we use today are RDBMS, including Oracle, Microsoft SQL Server, MySQL, Sybase, IBM DB2, Teradata, and so on.</p>

<p>The reasons for the dominance of relational databases are not trivial. They have continually offered the best mix of simplicity, robustness, flexibility, performance, scalability, and compatibility in managing generic data.</p>

<p>However, to offer all of this, relational databases have to be incredibly complex internally. For example, a relatively simple SELECT statement could have hundreds of potential query execution paths, which the optimizer evaluates at run time. All of this is hidden from us as users, but under the covers, the RDBMS determines the "execution plan" that best answers our requests by using techniques such as cost-based optimization.</p>
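<p>Most RDBMS let you ask the optimizer which plan it picked. As an illustrative sketch, SQLite (via Python's sqlite3 module) exposes this through EXPLAIN QUERY PLAN. The table, the index name idx_orders_customer, and the query below are hypothetical, and the exact wording of the plan text varies between SQLite versions.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        total       REAL NOT NULL
    );
    CREATE INDEX idx_orders_customer ON orders(customer_id);
""")

query = "SELECT order_id, total FROM orders WHERE customer_id = ?"

# EXPLAIN QUERY PLAN asks SQLite's planner which access path it would
# choose, without actually running the query.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, (1,)).fetchall()
details = [row[-1] for row in plan]  # last column is the human-readable step
print(details)
```

<p>Here the planner should report a search using idx_orders_customer rather than a full table scan; drop the index and the same query falls back to scanning every row.</p>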

<h2>The Problem with Relational Databases</h2>

<p>Even though RDBMS have provided database users with the best mix of simplicity, robustness, flexibility, performance, scalability, and compatibility, their performance in each of these areas is not necessarily better than that of an alternative solution pursuing one of these benefits in isolation. This has not been much of a problem so far, because the universal dominance of RDBMS has outweighed the need to push any of these boundaries. Nonetheless, if you really had a need that couldn't be answered by a generic relational database, alternatives have always been around to fill those niches.</p>

<p>Today, we are in a slightly different situation. For an increasing number of applications, one of these benefits is becoming more and more critical; and while still considered a niche, it is rapidly becoming mainstream, so much so that for an increasing number of database users this requirement is beginning to eclipse others in importance. That benefit is scalability. As more and more applications are launched in environments that have massive workloads, such as web services, their scalability requirements can, first of all, change very quickly and, secondly, grow very large. The first scenario can be difficult to manage if you have a relational database sitting on a single in-house server. For example, if your load triples overnight, how quickly can you upgrade your hardware? The second scenario can be too difficult to manage with a relational database in general.</p>

<p>Relational databases scale well, but usually only when that scaling happens on a single server node. When the capacity of that single node is reached, you need to scale out and distribute the load across multiple server nodes. This is when the complexity of relational databases starts to rub against their potential to scale. Try scaling to hundreds or thousands of nodes, rather than a few, and the complexities become overwhelming; the very characteristics that make RDBMS so appealing drastically reduce their viability as platforms for large distributed systems.</p>
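<p>A toy model helps show where some of that complexity comes from. The sketch below assumes simple hash (modulo) partitioning across a hypothetical four-node cluster; real systems use more elaborate schemes. A lookup by the partition key touches one node, but a lookup by any other attribute, such as the customer column a join would use, must be sent to every node.</p>

```python
from collections import defaultdict
import zlib

NUM_NODES = 4  # hypothetical cluster size

def node_for(key):
    # Hash partitioning: a stable hash of the key, modulo the node count.
    return zlib.crc32(key.encode("utf-8")) % NUM_NODES

# Each node holds its own slice of the orders, partitioned by order key.
nodes = defaultdict(dict)

def put_order(order_id, customer_id, total):
    nodes[node_for(order_id)][order_id] = {"customer_id": customer_id, "total": total}

def get_order(order_id):
    # Lookup by the partition key: exactly one node is consulted.
    return nodes[node_for(order_id)][order_id]

def orders_for_customer(customer_id):
    # Lookup by any other attribute (e.g. the join column customer_id):
    # every node must be asked, because any of them may hold a match.
    matches, nodes_asked = [], 0
    for node_id in range(NUM_NODES):
        nodes_asked += 1
        for order_id, order in nodes[node_id].items():
            if order["customer_id"] == customer_id:
                matches.append(order_id)
    return sorted(matches), nodes_asked

for i in range(20):
    put_order(f"order:{i}", f"customer:{i % 3}", 10.0 + i)

print(get_order("order:5"))
print(orders_for_customer("customer:0"))
```

<p>With hundreds or thousands of nodes, every such fan-out query multiplies the network traffic and coordination involved.</p>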

<p>For cloud services to be viable, vendors have had to address this limitation, because a cloud platform without a scalable data store is not much of a platform at all. So, to provide customers with a scalable place to store application data, vendors had only one real option. They had to implement a new type of database system that focuses on scalability, at the expense of the other benefits that come with relational databases.</p>

<p>These efforts, combined with those of existing niche vendors, have led to the rise of a new breed of database management system.</p>


<!--nextpage-->

<h2>The New Breed</h2>

<p>This new kind of database management system is commonly called a key/value store. No official name exists yet, however, so you may also see it referred to as document-oriented, Internet-facing, attribute-oriented, <a href="http://en.wikipedia.org/wiki/Distributed_database">distributed database</a> (although this can be relational also), sharded sorted arrays, <a href="http://en.wikipedia.org/wiki/Distributed_Hash_Table">distributed hash table</a>, or key/value database. While each of these names points to specific traits of this new approach, they are all variations on one theme, which we'll call key/value databases.</p>

<p>Whatever you call it, this "new" type of database has been around for a long time and has been used for specialized applications for which the generic relational database was ill-suited. But without the scale that web and cloud applications have brought, it would have remained a little-used niche. Now, the challenge is to recognize whether it or a relational database is better suited to a particular application.</p>

<p>Relational databases and key/value databases are fundamentally different and designed to meet different needs. A side-by-side comparison only takes you so far in understanding these differences, but to begin, let's lay one down:</p>

<span class="embedded-Media-image img-caption-c ">
	
			<img src="http://readwrite.com/files/files/files/images/relational_database_feb09c.png" style="" alt="" width="610" height="346" />
	
	
	</span>


<h2>No Entity Joins</h2>

<p>Key/value databases are item-oriented, meaning all relevant data relating to an item are stored within that item. A domain (which you can think of as a table) can contain vastly different items. For example, a domain may contain customer items and order items. This means that data are commonly duplicated between items in a domain. This is accepted practice because disk space is relatively cheap. But this model allows a single item to contain all relevant data, which improves scalability by eliminating the need to join data from multiple tables. With a relational database, such data needs to be joined to be able to regroup relevant attributes.</p>
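<p>The item-oriented model can be sketched with plain Python dictionaries. The domain, keys, and attribute names below are hypothetical: the order item carries copies of the customer's name and address, plus its own line items, so a single key lookup returns everything needed to display the order. How far to take this duplication is a judgment call, as the next paragraphs discuss.</p>

```python
# A hypothetical "domain": item key -> attributes. Unlike a table, it can
# hold items of different kinds side by side.
domain = {
    "customer:42": {
        "kind": "customer",
        "name": "Alice",
        "address": "1 Queen St, Auckland",
    },
    "order:1001": {
        "kind": "order",
        "customer_name": "Alice",                 # copied from the customer item
        "ship_to": "1 Queen St, Auckland",        # copied, not joined
        "lines": [("Widget", 3), ("Gadget", 1)],  # line items kept inside the order
    },
}

def get_item(key):
    # One key lookup returns everything needed to display the order; no join.
    return domain[key]

order = get_item("order:1001")
print(order["customer_name"], order["ship_to"], order["lines"])
```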

<span class="embedded-Media-image img-caption-c ">
	
			<img src="http://readwrite.com/files/files/files/images/relational_database_feb09d.jpg" style="" alt="" width="398" height="356" />
	
	
	</span>


<p>But while the need for relationships is greatly reduced with key/value databases, certain ones are inevitable. These relationships usually exist among core entities. For example, an ordering system would have items that contain data about customers, products, and orders. Whether these reside in the same domain or in separate domains is irrelevant; but when a customer places an order, you would likely not want to store both the customer's and the product's attributes in the same order item.</p>

<p>Instead, orders would need to contain relevant keys that point to the customer and product. While this is perfectly doable in a key/value database, these relationships are not defined in the data model itself, so the database management system cannot enforce their integrity. This means you can delete a customer, or a product they have ordered, even though existing orders still point to it. The responsibility of ensuring data integrity falls entirely to the application.</p>
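<p>The following sketch, using a plain dictionary as a stand-in for a key/value store (all names hypothetical), shows what "the responsibility falls to the application" means in practice: the store happily deletes a referenced customer, and any integrity check has to be written, and run, by application code.</p>

```python
# Hypothetical key/value store: orders hold keys that point at other items.
store = {
    "customer:42": {"name": "Alice"},
    "product:7": {"title": "Widget"},
    "order:1001": {"customer": "customer:42", "product": "product:7", "qty": 3},
}

def delete_unchecked(key):
    # What the store itself does: delete, no questions asked.
    store.pop(key, None)

def delete_checked(key):
    # Application-level stand-in for a foreign-key constraint. It must scan
    # every item, because the store keeps no record of who references whom.
    referrers = [k for k, item in store.items() if key in item.values()]
    if referrers:
        raise ValueError(f"{key} is still referenced by {referrers}")
    store.pop(key, None)

def dangling_references():
    # Find reference-looking values whose target item no longer exists.
    return [
        (k, ref)
        for k, item in store.items()
        for ref in item.values()
        if isinstance(ref, str) and ref.split(":")[0] in ("customer", "product")
        and ref not in store
    ]

try:
    delete_checked("customer:42")
except ValueError as err:
    print("refused:", err)

delete_unchecked("customer:42")  # nothing in the store prevents this
print(dangling_references())
```

<p>Note that the checked delete scans the whole store; doing this efficiently at scale is yet more application code to write and maintain.</p>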

<span class="embedded-Media-image img-caption-c ">
	
			<img src="http://readwrite.com/files/files/files/images/relational_database_feb09e.png" style="" alt="" width="610" height="258" />
	
	
	</span>
<span class="embedded-Media-image img-caption-c ">
	
			<img src="http://readwrite.com/files/files/files/images/relational_database_feb09f.png" style="" alt="" width="610" height="219" />
	
	
	</span>


<h2>Key/Value Stores: The Good</h2>

<p>Key/value databases have two clear advantages over relational databases.</p>

<p><strong>Suitability for Clouds</strong></p>

<p>The first benefit is that they are simple and thus scale much better than today's relational databases. If you are putting together a system in-house and intend to throw dozens or hundreds of servers behind your data store to cope with what you expect will be a massive demand in scale, then consider a key/value store.</p>

<p>Because key/value databases easily and dynamically scale, they are also the database of choice for vendors who provide a multi-user, <a href="http://en.wikipedia.org/wiki/Web_service">web services</a> platform data store. The database provides a relatively cheap data store platform with massive potential to scale. Users typically only pay for what they use, but their usage can increase as their needs increase. Meanwhile, the vendor can scale the platform dynamically based on the total user load, with little limitation on the entire platform's size.</p>

<p><strong>More Natural Fit with Code</strong></p>

<p>Relational data models and application object models are typically built differently, which leads to incompatibilities. Developers overcome these incompatibilities with code that maps relational models to their object models, a process commonly referred to as <a href="http://en.wikipedia.org/wiki/Object-relational_mapping">object-relational mapping</a>. This process, which essentially amounts to "plumbing" code with no clear and immediate value, can take up a significant chunk of the time and effort that goes into developing an application. Many key/value databases, on the other hand, retain data in a structure that maps more directly to the object classes used in the application code, which can significantly reduce development time.</p>

<p>Other arguments in favor of this type of data storage, such as "Relational databases can become unwieldy" (whatever that means), are less convincing. But before jumping on the key/value database bandwagon, consider the downsides.</p>

<h2>Key/Value Stores: The Bad</h2>

<p>The inherent constraints of a relational database ensure that data at the lowest level have integrity. Data that violate integrity constraints cannot physically be entered into the database. These constraints don't exist in a key/value database, so the responsibility for ensuring data integrity falls entirely to the application. But application code often carries bugs. Bugs in an application backed by a properly designed relational database usually don't lead to data integrity issues; bugs in an application backed by a key/value database, however, quite easily do.</p>
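<p>For contrast, here is a minimal sketch of a relational database refusing buggy writes, using SQLite through Python's sqlite3 module (schema hypothetical). One SQLite-specific detail: foreign-key enforcement is off by default and must be switched on per connection with PRAGMA foreign_keys = ON.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id)
    );
    INSERT INTO customers VALUES (1, 'Alice');
    INSERT INTO orders VALUES (100, 1);
""")

def attempt(sql):
    # Run one statement; report whether the database accepted it.
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as err:
        return f"rejected ({err})"

# A buggy application writes an order for a customer that doesn't exist...
print(attempt("INSERT INTO orders VALUES (101, 99)"))
# ...and deletes a customer whose orders still reference it.
print(attempt("DELETE FROM customers WHERE customer_id = 1"))
```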

<p>One of the other key benefits of a relational database is that it forces you to go through a data modeling process. If done well, this modeling process creates in the database a logical structure that reflects the data it is to contain, rather than the structure of the application. The data, then, become somewhat application-independent, which means other applications can use the same data set, and application logic can be changed without disrupting the underlying data model. To get a similar benefit with a key/value database, try replacing the relational data modeling exercise with a class modeling exercise, which creates generic classes based on the natural structure of the data.</p>

<p>And don't forget about compatibility. Unlike relational databases, cloud-oriented databases have little in the way of shared standards. While they all share similar concepts, each has its own API, query interface, and peculiarities. So you will need to really trust your vendor, because you won't simply be able to switch down the line if you're not happy with the service. And because almost all current key/value databases are still in beta, placing that trust is far riskier than it is with old-school relational databases.</p>

<p><strong>Limitations on Analytics</strong></p>

<p>In the cloud, key/value databases are usually <a href="http://en.wikipedia.org/wiki/Multitenancy">multi-tenanted</a>, which means that a lot of users and applications will use the same system. To prevent any one process from overloading the shared environment, most cloud data stores strictly limit the total impact that any single query can cause. For example, with SimpleDB, you can't run a query that takes longer than 5 seconds. With Google's AppEngine Datastore, you can't retrieve more than 1000 items for any given query.</p>

<p>These limitations aren't a problem for your bread-and-butter application logic (adding, updating, deleting, and retrieving small numbers of items). But what happens when your application becomes successful? You have attracted many users and gained lots of data, and now you want to create new value for your users or perhaps use the data to generate new revenue. You may find yourself severely limited in running even straightforward analysis-style queries. Things like tracking usage patterns and providing recommendations based on user histories may be difficult at best, and impossible at worst, with this type of database platform.</p>

<p>In this case, you will likely have to implement a separate analytical database, populated from your key/value database, on which such analytics can be executed. Think in advance about where and how you would do that. Would you host it in the cloud or invest in on-site infrastructure? Would latency between you and the cloud-service provider pose a problem? Does your current cloud-based key/value database support this? If you have 100 million items in your key/value database but can only pull out 1,000 items at a time, how long would queries take?</p>
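<p>A back-of-the-envelope sketch of that last question follows. The fetch_page callable, its token-based paging, and the 100 ms round-trip time are all assumptions for illustration, not any particular vendor's API; only the 100 million items and the 1,000-item page size come from the scenario above.</p>

```python
import math

def requests_needed(total_items, page_size):
    # Number of round trips needed to pull every item, page by page.
    return math.ceil(total_items / page_size)

def export_all(fetch_page, page_size=1000):
    # fetch_page(token, limit) is a hypothetical callable returning
    # (items, next_token), with next_token None after the last page.
    token, exported = None, []
    while True:
        items, token = fetch_page(token, page_size)
        exported.extend(items)
        if token is None:
            return exported

def make_fake_fetch(data):
    # In-memory stand-in for a paged query API, for trying export_all locally.
    def fetch_page(token, limit):
        start = token or 0
        end = start + limit
        return data[start:end], (end if end < len(data) else None)
    return fetch_page

trips = requests_needed(100_000_000, 1000)
seconds = trips * 0.1  # assuming 100 ms per round trip, processing not included
print(trips, round(seconds / 3600, 1))
```

<p>Under those assumptions, simply pulling the data out takes 100,000 requests and close to three hours, before any analysis has been done.</p>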

<p>Ultimately, while scale is a consideration, don't put it ahead of your ability to turn data into an asset of its own. All the scaling in the world is useless if your users have moved on to your competitor because it has cooler, more personalized features.</p>


<!--nextpage-->

<h2>Cloud-Service Contenders</h2>

<p>A number of web service vendors now offer multi-tenanted key/value databases on a pay-as-you-go basis. Most of them meet the criteria discussed so far, but each has unique features and departs in places from the general description given above. Let's look now at three particular databases: SimpleDB, Google AppEngine Datastore, and SQL Data Services.</p>

<p><strong>Amazon: SimpleDB</strong></p>

<p><span class="embedded-Media-image img-caption-c ">
	
			<img src="http://readwrite.com/files/files/files/images/aws-logo.jpg" style="" alt="" width="145" height="60" />
	
	
	</span>
<a href="http://aws.amazon.com/simpledb/">SimpleDB</a> is an attribute-oriented key/value database available on the Amazon Web Services platform. SimpleDB is still in public beta; in the meantime, users can sign up online for a "free" version -- free, that is, until you exceed your usage limits.</p>

<p>SimpleDB has several limitations. First, a query can execute for a maximum of 5 seconds. Second, there are no data types apart from strings. Everything is stored, retrieved, and compared as a string, so date comparisons won't work unless you convert all dates to ISO 8601 format. Third, the maximum size of any string is 1024 bytes, which limits how much text (e.g., a product description) you can store in a single attribute. Because the schema is dynamic and flexible, you can get around this limit by adding "ProductDescription1," "ProductDescription2," and so on; the catch is that an item is limited to 256 attributes. Finally, while SimpleDB is in beta, domains can't be larger than 10 GB, and entire databases cannot exceed 1 TB.</p>
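<p>Two of these limits can be illustrated with a short Python sketch. The first part shows why string-only storage breaks date comparisons unless dates are written in ISO 8601 (year-month-day) form. The second part is a hypothetical helper, split_attribute, not part of any SimpleDB API, that spreads long text across numbered attributes using the 1024-byte and 256-attribute limits quoted above.</p>

```python
from datetime import datetime

# Everything is a string, so comparisons are lexicographic.
us_dates = ["12/02/2009", "01/22/2009", "03/05/2008"]
iso_dates = [datetime.strptime(d, "%m/%d/%Y").strftime("%Y-%m-%d") for d in us_dates]

print(sorted(us_dates))   # string order is not date order
print(sorted(iso_dates))  # ISO 8601 string order matches date order

def split_attribute(name, text, max_bytes=1024, max_attrs=256):
    # Hypothetical helper: spread text over name1, name2, ... so that each
    # value fits in max_bytes of UTF-8 without cutting a multi-byte character.
    # (The max_attrs check ignores the item's other attributes, which also count.)
    chunks, current, size = [], [], 0
    for ch in text:
        n = len(ch.encode("utf-8"))
        if size + n > max_bytes:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(ch)
        size += n
    if current:
        chunks.append("".join(current))
    if len(chunks) > max_attrs:
        raise ValueError(f"{name} needs {len(chunks)} attributes; limit is {max_attrs}")
    return {f"{name}{i}": chunk for i, chunk in enumerate(chunks, start=1)}

parts = split_attribute("ProductDescription", "x" * 2500)
print({k: len(v) for k, v in parts.items()})
```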

<p>One key feature of SimpleDB is that it uses an <a href="http://www.allthingsdistributed.com/2008/12/eventually_consistent.html">eventual consistency model</a>. This consistency model is good for concurrency, but it means that after you have changed an attribute for an item, the change may not be reflected in read operations that immediately follow. While the chances of this actually happening are low, you should account for such situations. For example, you don't want to sell the last concert ticket in your event booking system to five people because your data wasn't consistent at the time of sale.</p>
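<p>The ticket scenario can be simulated with a toy model, sketched below under stated assumptions: a hypothetical store whose reads come from a replica that lags behind writes. Blind read-then-write selling oversells; a conditional (compare-and-set) write checked against the authoritative copy does not. Whether and how a given service offers such conditional writes is something to check in its documentation.</p>

```python
class EventuallyConsistentStore:
    """Toy model (hypothetical): writes land on a primary copy and reach the
    read replica only when replicate() runs, mimicking replication lag."""

    def __init__(self):
        self.primary = {}
        self.replica = {}

    def put(self, key, value):
        self.primary[key] = value

    def read(self, key):
        return self.replica.get(key)  # may be stale

    def replicate(self):
        self.replica = dict(self.primary)

    def conditional_put(self, key, expected, new):
        # Compare-and-set against the authoritative copy: the write succeeds
        # only if nobody changed the value since the caller read it.
        if self.primary.get(key) != expected:
            return False
        self.primary[key] = new
        return True


store = EventuallyConsistentStore()
store.put("tickets_left", 1)
store.replicate()

# Naive sellers: each reads the (stale) replica and writes blindly.
sold_naive = 0
for buyer in range(5):
    if store.read("tickets_left") > 0:
        store.put("tickets_left", store.read("tickets_left") - 1)
        sold_naive += 1

# Careful sellers: the sale only counts if the conditional write succeeds.
store.put("tickets_left", 1)
store.replicate()
sold_safe = 0
for buyer in range(5):
    seen = store.read("tickets_left")
    if seen > 0 and store.conditional_put("tickets_left", seen, seen - 1):
        sold_safe += 1

print(sold_naive, sold_safe)
```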

<p><strong>Google: AppEngine Datastore</strong></p>

<p><span class="embedded-Media-image img-caption-c ">
	
			<img src="http://readwrite.com/files/files/files/images/appengine_lowres.jpg" style="" alt="" width="100" height="79" />
	
	
	</span>
<a href="http://code.google.com/appengine/docs/python/datastore/">Google's AppEngine Datastore</a> is built on BigTable, Google's internal storage system for handling structured data. In and of itself, the AppEngine Datastore is not a direct access mechanism to BigTable, but can be thought of as a simplified interface on top of BigTable.</p>

<p>The AppEngine Datastore supports much richer data types within items than SimpleDB, including list types, which contain collections within a single item.</p>

<p>You will almost certainly use this data store if you plan on building applications within the Google AppEngine. However, unlike with SimpleDB, you cannot currently interface with the AppEngine Datastore (or with BigTable) using an application outside of Google's web service platform.</p>

<p><strong>Microsoft: SQL Data Services</strong></p>

<p><span class="embedded-Media-image img-caption-c ">
	
			<img src="http://readwrite.com/files/files/files/images/relational_database_feb09g.jpg" style="" alt="" width="150" height="40" />
	
	
	</span>
<a href="http://www.microsoft.com/azure/data.mspx">SQL Data Services</a> is part of the Microsoft <a href="http://www.microsoft.com/azure/default.mspx">Azure</a> Web Services platform. The SDS service is also in beta, so it is free but has limits on the size of databases. SQL Data Services is actually an application in its own right that sits on top of many SQL Server instances, which make up the underlying data storage for the SDS platform. While the underlying data stores may be relational, you don't have access to them; SDS is a key/value store, like the other platforms discussed so far.</p>

<p>Microsoft seems to be alone among these three vendors in acknowledging that while key/value stores are great for scalability, they come at the great expense of data management, when compared to RDBMS. Microsoft's approach seems to be to strip to the bare bones to get the scaling and distribution mechanisms right, and then over time build up, adding features that help bridge the gap between the key/value store and relational database platform.</p>

<h2>Non-Cloud Service Contenders</h2>

<p>Outside the cloud, a number of key/value database software products exist that can be installed in-house. Almost all of these products are still young, in either alpha or beta, but most are also open source; with access to the code, you can perhaps be more aware of potential issues and limitations than you would be with closed-source vendors.</p>

<p><strong>CouchDB</strong></p>

<p><span class="embedded-Media-image img-caption-c ">
	
			<img src="http://readwrite.com/files/files/files/images/relational_database_feb09h.jpg" style="" alt="" width="91" height="60" />
	
	
	</span>
<a href="http://couchdb.apache.org/">CouchDB</a> is a free, open-source, document-oriented database. Derived from the key/value store, it uses JSON to define an item's schema. CouchDB is meant to bridge the gap between document-oriented and relational databases by allowing "views" to be dynamically created using JavaScript. These views map the document data onto a table-like structure that can be indexed and queried.</p>
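<p>CouchDB views are defined by JavaScript map functions that call emit(key, value) for each document. The sketch below emulates that idea in Python so the mechanics are visible; it is not CouchDB's API, the documents are invented, and Python's plain string ordering only approximates CouchDB's key collation.</p>

```python
import json

docs = [json.loads(s) for s in (
    '{"_id": "p1", "type": "post", "author": "tony", "title": "Doomed?"}',
    '{"_id": "p2", "type": "post", "author": "ann", "title": "Kickfire"}',
    '{"_id": "c1", "type": "comment", "post": "p1", "body": "Not yet"}',
)]

def map_posts_by_author(doc, emit):
    # Python analogue of a CouchDB JavaScript map function such as:
    #   function(doc) { if (doc.type == "post") emit(doc.author, doc.title); }
    if doc.get("type") == "post":
        emit(doc["author"], doc["title"])

def build_view(docs, map_fn):
    rows = []
    for doc in docs:
        def emit(key, value, doc_id=doc["_id"]):
            rows.append({"id": doc_id, "key": key, "value": value})
        map_fn(doc, emit)
    # Views are kept sorted by key, which is what makes them indexable.
    return sorted(rows, key=lambda row: row["key"])

view = build_view(docs, map_posts_by_author)
print(view)
print([row for row in view if row["key"] == "tony"])  # query the view by key
```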

<p><a href="http://wiki.apache.org/couchdb/Configuring_distributed_systems">At the moment, CouchDB</a> isn't really a distributed database. It has replication functions that allow data to be synchronized across servers, but this isn't the kind of distribution needed to build highly scalable environments. The CouchDB community, though, is no doubt working on this.</p>

<p><strong>Project Voldemort</strong></p>

<p><a href="http://project-voldemort.com/">Project Voldemort</a> is a distributed key/value database that is intended to scale horizontally across a large number of servers. It spawned from work done at LinkedIn and is reportedly used there for a few systems that have very high scalability requirements. Project Voldemort also uses an eventual consistency model, based on Amazon's.</p>

<p>Project Voldemort is very new; its website went up in only the last few weeks.</p>

<p><strong>Mongo</strong></p>

<p><span class="embedded-Media-image img-caption-c ">
	
			<img src="http://readwrite.com/files/files/files/images/relational_database_feb09i.gif" style="" alt="" width="140" height="50" />
	
	
	</span>
<a href="http://www.mongodb.org">Mongo</a> is the database system being developed at 10gen by Geir Magnusson and Dwight Merriman (whom you may remember from DoubleClick). Like CouchDB, Mongo is a document-oriented JSON database, except that it is designed to be a true object database, rather than a pure key/value store. Originally, 10gen focused on putting together a complete web services stack; more recently, though, it has refocused mainly on the Mongo database. The beta release is scheduled for mid-February.</p>

<p><strong>Drizzle</strong></p>

<p><span class="embedded-Media-image img-caption-c ">
	
			<img src="http://readwrite.com/files/files/files/images/relational_database_feb09j.png" style="" alt="" width="150" height="54" />
	
	
	</span>
<a href="https://launchpad.net/drizzle">Drizzle</a> can be thought of as a counter-approach to the problems that key/value stores are meant to solve. Drizzle began life as a spin-off of the MySQL (6.0) relational database. Over the last few months, its developers have removed a host of non-core features (including views, triggers, prepared statements, stored procedures, query cache, ACL, and a number of data types), with the aim of creating a leaner, simpler, faster database system. Drizzle can still store relational data; as Brian Aker of MySQL/Sun puts it, "There is no reason to throw out the baby with the bath water." The aim is to build a semi-relational database platform tailored to web- and cloud-based apps running on systems with 16 cores or more.</p>

<h2>Making a Decision</h2>

<p>Ultimately, there are four reasons why you would choose a non-relational key/value database platform for your application:</p>

<ol><li>Your data is heavily document-oriented, making it a more natural fit with the key/value data model than the relational data model.</li>

<li>Your development environment is heavily object-oriented, and a key/value database could minimize the need for "plumbing" code.</li>

<li>The data store is cheap and integrates easily with your vendor's web services platform.</li>

<li>Your foremost concern is on-demand, high-end scalability -- that is, large-scale, distributed scalability, the kind that can't be achieved simply by scaling up.</li></ol>

<p>But in making your decision, remember the database's limitations and the risks you face by branching off the relational path.</p>

<p>For all other requirements, you are probably best off with the good old RDBMS. So, is the relational database doomed? Clearly not. Well, not yet at least.</p>
<p><em>Top image by <a href="http://www.flickr.com/photos/timothymorgan/75593157/">Tim Morgan</a></em></p>]]></description>
				<link>http://readwrite.com/2009/02/12/is-the-relational-database-doomed</link>
				<guid>http://readwrite.com/2009/02/12/is-the-relational-database-doomed</guid>
				<category>enterprise</category>
				<pubDate>Thu, 12 Feb 2009 07:00:00 -0800</pubDate>
				<author>Tony Bain</author>
			</item>
					<item>
				<title><![CDATA[Kickfire: Data Analytics for the Masses]]></title>
				<description><![CDATA[<p><span class="embedded-Media-image img-caption-c ">
	
			<img src="http://readwrite.com/files/files/files/images/data_analytics_jan09a.gif" style="" alt="" width="131" height="101" />
	
	
	</span>
You may not realize it, but the data analytics market is buzzing. New vendors are emerging, new products are popping up, new deals are being done, and several new strategies are being pursued. Vendors are predominantly chasing big data, with battle lines being drawn among solution providers that cater to data sets of roughly 100 TB to 10 PB. The battle was inevitable, because the world is producing data at a phenomenal rate and we increasingly need to analyze it within shorter time frames. In this post, we look at one vendor in this market, Kickfire.</p>
<p>Yet while the big names in town are capturing the headlines, in reality only a small percentage of businesses today need to be able to analyze petabytes of data. Today, the rest of us are more likely to deal with analytic data sets in the 50 GB to 3 TB range.</p>

<p><a href="http://www.kickfire.com">Kickfire</a> is interesting because it has decided to let the other vendors fight it out for the massive data volumes. Instead, it has focused on a relatively untapped segment: the MySQL database market or, more correctly, the market that MySQL serves.</p>

<p>The bulk of MySQL installs are for Web 2.0 and web-related applications (i.e. applications based on the <a href="http://en.wikipedia.org/wiki/LAMP_(software_bundle)">LAMP</a> stack), and these applications usually aren't set up to manage industrial-sized data sets. Instead, they often have gigabytes or a few terabytes of data, but analyzing that data is just as important to their owners. However, like many transaction-oriented databases, MySQL doesn't perform very well when you run analytics-style queries, even on mid-sized data sets. Customers often find that running complex ad-hoc queries that aggregate data across many rows is very time-consuming, and the lack of certain features, such as query parallelism, diminishes MySQL's appeal.</p>

<p>Kickfire's solution is to use MySQL as the base, which lets existing MySQL customers migrate to Kickfire easily, but to replace MySQL's storage engine with its own column store engine. Under the covers, the <a href="http://en.wikipedia.org/wiki/Column_store">column store</a> structures data based on the columns in a table, rather than the traditional method based on rows in a table. This structure has been found to achieve better compression and better ad-hoc query performance, because only the columns being queried, not entire rows, need to be scanned. The column store is also used by <a href="http://www.vertica.com/">Vertica</a> and was popularized by its founder, the well-known database researcher <a href="http://en.wikipedia.org/wiki/Michael_Stonebraker">Michael Stonebraker</a>.</p>
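<p>The difference between the two layouts can be sketched in a few lines of Python (data invented for illustration). Summing one column touches every field in a row layout but only one array in a column layout; counting individual values read is a simplification, since real engines read pages from disk. Because a column's values tend to be similar, they also compress well. Run-length encoding, shown at the end, is one simple such scheme and is not a claim about how Kickfire actually compresses data.</p>

```python
from itertools import groupby

# The same small orders table, stored two ways (illustrative data).
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 45.5},
]

# Row store: each record's fields are kept together.
row_store = [(r["order_id"], r["region"], r["amount"]) for r in rows]

# Column store: each column is kept as its own array.
column_store = {col: [r[col] for r in rows] for col in ("order_id", "region", "amount")}

# SELECT SUM(amount) FROM orders
# Row layout: reaching "amount" means reading through every field of every row.
values_read_row = sum(len(record) for record in row_store)
total_row = sum(record[2] for record in row_store)

# Column layout: only the "amount" array is touched.
values_read_col = len(column_store["amount"])
total_col = sum(column_store["amount"])

def run_length_encode(column):
    # Collapse runs of repeated values into (value, count) pairs.
    return [(value, len(list(group))) for value, group in groupby(column)]

print(values_read_row, values_read_col, total_row == total_col)
print(run_length_encode(sorted(column_store["region"])))
```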

<span class="embedded-Media-image img-caption-c ">
	
			<img src="http://readwrite.com/files/files/files/images/data_analytics_jan09b.jpg" style="" alt="" width="582" height="229" />
	
	
	</span>


<p>But Kickfire doesn't end there. It goes one step further by adding a proprietary "SQL Chip" co-processor to further enhance its product's performance. Kickfire has replaced the MySQL query optimizer (the component that takes an SQL statement and splits it into a series of operators for processing) to produce operators that can be sent directly to its SQL Chip for processing. So, rather than running these operators on a general-purpose CPU, which has to convert them into a series of regular CPU instructions and then muck around loading the data into registers from memory, the optimizer instead sends them to the SQL Chip, which natively understands them and processes them on data streamed directly from memory.</p>
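<p>To illustrate what splitting a statement into a series of operators means, here is a minimal Python sketch of one possible plan for a simple aggregate query, built from scan, filter, and group-by operators. The operator names and plan shape are illustrative only, not MySQL's or Kickfire's actual operator set; in Kickfire's design, operators like these are what get handed to the SQL Chip instead of a general-purpose CPU.</p>

```python
# SELECT region, SUM(amount) FROM orders WHERE amount > 50 GROUP BY region
# decomposed into a pipeline of simple operators (an illustrative plan).

def scan(table):
    # Produce rows one at a time from the stored table.
    for row in table:
        yield row

def filter_op(rows, predicate):
    # Pass through only rows that satisfy the WHERE clause.
    for row in rows:
        if predicate(row):
            yield row

def group_sum(rows, key_col, sum_col):
    # GROUP BY key_col with SUM(sum_col).
    totals = {}
    for row in rows:
        totals[row[key_col]] = totals.get(row[key_col], 0) + row[sum_col]
    return totals

orders = [
    {"region": "EU", "amount": 120.0},
    {"region": "US", "amount": 80.0},
    {"region": "EU", "amount": 45.5},
    {"region": "US", "amount": 60.0},
]

result = group_sum(filter_op(scan(orders), lambda r: r["amount"] > 50), "region", "amount")
print(result)
```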

<p><span class="embedded-Media-image img-caption-c ">
	
			<img src="http://readwrite.com/files/files/files/images/data_analytics_jan09c.gif" style="" alt="" width="300" height="160" />
	
	
	</span>
Kickfire's solution is bundled as a data warehouse "appliance," which is made up of two physical servers: one conventional server running MySQL 5.1, and the other connected via PCIe, which is used to offload processing to the SQL Chip. The underlying capabilities that Kickfire adds remain largely transparent in terms of the user's interaction via SQL code, because Kickfire hasn't changed the MySQL syntax that its customers are already familiar with.</p>


<!--nextpage-->

<h2>Is the Performance Advantage Real?</h2>

<p>Kickfire took a major step for a small vendor last year by passing the most credible benchmark in data warehousing: the TPC-H. For those not familiar with it, <a href="http://www.tpc.org/information/about/abouttpc.asp">TPC</a> is a non-profit association whose benchmarks require all vendors to use the same workload, thereby producing comparable results. Of course, performance is a key measure; but perhaps the more important benchmark result is the price vs. performance ratio. While a vendor could throw large amounts of hardware at a workload to produce high-performance results, making it cost-effective has always been the biggest challenge. TPC-H rates the vendors' solutions using its own metric, called the "Composite Query-per-Hour Performance Metric" (QphH). The metric is a comparative measure of effective query throughput and processing power relative to data warehouse workloads.</p>

<p><a href="http://www.tpc.org/tpch/results/tpch_perf_results.asp?resulttype=all">Kickfire's results</a> are impressive. For a 300 GB workload, it currently ranks fourth on the performance list (and second on the non-clustered performance list). But what stands out is its price/performance ratio. Kickfire has the lowest cost per QphH of any vendor, at $0.89. Compare that to the fifth-placed product on the performance list, a SQL Server-based solution that costs $5.40 per QphH, and the sixth-placed product, an Oracle-based solution that costs $18.67 per QphH!</p>
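<p>The gap is easier to see as ratios. Price/performance here is the system's total price divided by its QphH score, so lower is better; the short calculation below uses only the three published figures quoted above.</p>

```python
# $/QphH figures quoted above for the 300 GB TPC-H results.
# Price/performance = total system price / QphH, so lower is better.
price_per_qphh = {
    "Kickfire": 0.89,
    "SQL Server-based (5th)": 5.40,
    "Oracle-based (6th)": 18.67,
}

baseline = price_per_qphh["Kickfire"]
# A ratio above 1 means the system pays more per QphH than Kickfire does.
relative_cost = {name: round(cost / baseline, 1) for name, cost in price_per_qphh.items()}

for name, ratio in relative_cost.items():
    print(f"{name}: {ratio}x Kickfire's cost per QphH")
```

<p>Put another way, the SQL Server-based system pays about six times as much per unit of query throughput, and the Oracle-based system about 21 times as much.</p>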

<h2>Web 2.0 Success</h2>

<p>Kickfire <a href="http://www.kickfire.com/blog/?p=73">announced last week</a> on its blog that it had shipped its first appliance to a Web 2.0 customer. As many Web 2.0 businesses are finding out, killer features alone do not determine success or failure. Success also depends on how well the vendor understands its customers' evolving needs and how it generates revenue by addressing those needs. Kickfire's Web 2.0 customer is using the appliance to do click-stream analysis to better understand its own users' behavior, so that it can target relevant advertising offers. According to Kickfire, the customer has around 500 GB of imported data, but this is expected to grow at a rate of 1 GB per day.</p>
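<p>For a sense of scale, a simple projection of the customer's figures follows. Only the 500 GB starting point and the 1 GB per day rate come from Kickfire; the assumption that growth stays linear is made here purely for illustration.</p>

```python
START_GB = 500         # imported data, per Kickfire
GROWTH_GB_PER_DAY = 1  # expected growth rate, per Kickfire

def projected_size_gb(days):
    # Assumes growth stays linear at the quoted daily rate.
    return START_GB + GROWTH_GB_PER_DAY * days

for days in (30, 365, 3 * 365):
    print(days, projected_size_gb(days))
```

<p>Even after three years at that rate, the data set would sit at about 1.6 TB, well within the 50 GB to 3 TB range described earlier.</p>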

<p>The traditional enterprise space is less of a focus for Kickfire at the moment, partly because that space is already relatively well served by specialized offerings, but also because MySQL has less of a presence there. While Sun is pushing MySQL to break through those enterprise walls, most corporate data platforms remain largely the domain of Oracle, Microsoft, IBM, and niche vendors, such as Teradata. If Sun does manage to break through, then the road will be paved for Kickfire to follow.</p>

<p>Kickfire is a stand-alone solution and can of course be loaded with data from any data source, but the ease of adoption for existing MySQL customers, combined with its strong price/performance ratio, makes Kickfire a compelling option for Web 2.0 businesses looking to add a data warehouse platform to improve their analytics capabilities.</p>]]></description>
				<link>http://readwrite.com/2009/01/22/data-analytics-for-the-masses</link>
				<guid>http://readwrite.com/2009/01/22/data-analytics-for-the-masses</guid>
				<category>enterprise</category>
				<pubDate>Thu, 22 Jan 2009 12:30:01 -0800</pubDate>
				<author>Tony Bain</author>
			</item>
			</channel>
</rss>

