Eric Day

Thoughts, code, and other oddments.

Thoughts on “NoSQL”

March 26th, 2010

I’ve decided to jump on the bandwagon and spill my thoughts on “NoSQL” since it’s been such a hot topic lately ([1], [2], [3], [4]). Since I work on the Drizzle project some folks would probably think I take the SQL side of the “debate,” but actually I’m pretty objective about the topic and find value in projects on both sides. Let me explain.

Last November at OpenSQL Camp I assembled a panel to debate “SQL vs NoSQL.” We had folks representing a variety of projects, including Cassandra, CouchDB, Drizzle, MariaDB, MongoDB, MySQL, and PostgreSQL. Even though I realized this was a poor name for such a panel, I went with it anyways because this “debate” was really starting to heat up. The conclusion I was hoping for is that the two are not at odds because the two categories of projects can peacefully co-exist in the same toolbox for data management. Beyond the panel name, even the term “NoSQL” is a bit misleading. I talked with Eric Evans (one of my new co-workers over on the Cassandra team) who reintroduced the term, and even he admits it is vague and doesn’t do the projects categorized by it any favors. What happens when Cassandra has a SQL interface stacked on top of it? Yeah.

One reason for all this confusion is that for some people, the term “database” equates to “relational database.” This makes the non-relational projects look foreign because they don’t fit the database model that became “traditional” due to its popularity. Anyone who has ever read up on other database models would quickly realize relational is just one of many models, and many of the “NoSQL” projects fit quite nicely into one of these categories. The real value these new projects provide is in their implementation details, especially with dynamic scale-out (adding new nodes to live systems) and synchronization mechanisms (eventual consistency or tunable quorum). There are a lot of great ideas in these projects, and people on the “SQL” side should really take the time to study them; there are some tricks to learn.
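To make the “tunable quorum” idea a bit more concrete, here is a minimal Python sketch. It assumes the common formulation where each key is stored on N replicas, a write waits for W acknowledgements, and a read consults R replicas. The function names and parameters are made up for illustration and are not taken from any particular project.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Return True if every read quorum shares a replica with every write quorum.

    Assumption (not any project's real API): n replicas per key,
    writes acknowledged by w replicas, reads served from r replicas.
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    # Pigeonhole argument: two subsets of sizes r and w drawn from
    # n replicas must intersect exactly when r + w > n.
    return r + w > n


def overlap_by_enumeration(n: int, r: int, w: int) -> bool:
    """Brute-force check of the same property; only sensible for small n."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )
```

With three replicas, reading from two and writing to two always overlaps, so a read sees at least one copy of the latest acknowledged write. Reading from one and writing to one does not, which is the kind of setting where you are relying on eventual consistency instead.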

Square Peg, Round Hole

One of the main criticisms of the “NoSQL” projects is that they are taking a step back, simply reinventing a component that already exists in a relational model. While this may have some truth, it is just wrong once you look past the high-level logical data representations. Sure, it may look like a simple key-value store from the outside, but there is a lot more under the hood. For many of these projects it was a design decision to focus on the implementation details where it matters, and not bother with things like parsing SQL and optimizing joins. I think there is still some value in supporting some form of a SQL interface because this gets you instant adoption by pretty much any developer out there. Love it or hate it, people know SQL. As for joins, scaling them with distributed relational nodes has been a research topic for years, and it’s a hard problem. People have worked around this by accepting new data models and consistency levels. It all depends on what your problem requires.

I fully embrace the “NoSQL” projects out there; there is something we can all learn from them even if we don’t put them into production. We should be thrilled we have more open source tools in our database toolbox, especially non-relational ones. We are no longer required to smash every dataset “peg” into the relational “hole.” Use the best tool for the job, which may still be a relational database. Explore your options, try to learn a few things, model your data in a number of ways, and find out what is really required. When it comes time to make a decision just remember:

Dear everyone who is not Facebook: You are not Facebook.

Posted in Drizzle, Main, MySQL

3 Responses to "Thoughts on “NoSQL”"

  1. Do you see a possibility of NoSQL databases becoming a storage engine of Drizzle?
    Could we get the best properties of both worlds?
    A typical enterprise application has 2000 to 5000 entities (tables).
    Will it be feasible to manage without a proper SQL interface?

  2. Eric Day says:

    I think Drizzle acting as a SQL front end for at least one of them is very possible, in fact it’s a side project by one of the Drizzle developers. I don’t think you need SQL by any means, it just certainly helps to get started because it can then be accessible to a wider range of developers. Once they get familiar and understand the constraints that forcing SQL imposes (if any), they can then poke under the hood more and use one of the more native request methods (like Thrift serialization for Cassandra).

  3. Adrian Otto says:

    Jobin Augustine,

    Rackspace will integrate Drizzle and Cassandra together, although it’s a much lower priority than focusing on the fundamentals within each system first. Basic things will work fine in a setup where Cassandra acts as a simple storage engine for Drizzle. This would allow Cassandra to benefit from an abbreviated SQL interface and, as Eric is saying, does not necessarily preclude you from migrating to native access to the underlying data when your requirements justify it.

    This subject is not quite as simple as it sounds as Cassandra’s data model is not a 1:1 fit with a table/row setup. It uses column families and supercolumns, which is a rather sharp departure from the way today’s developers tend to think about data in relational databases. For simple use cases like an address book or a blog there are mappings that work just fine, and those are the types of things you should expect an integrated solution to do well. Just keep your expectations reasonable.

    Adrian
