Eric Day: Thoughts, code, and other oddments.
Eventually Consistent Relational Database?

October 12th, 2009

This weekend I attended Drupal Camp PDX and listened to a session titled “Drupal in the Cloud”. The presenter, Josh Koenig from Chapter Three, gave a great introduction to what moving to “the cloud” really means, especially in the context of a typical web application like Drupal. The problem, which is of course no fault of Josh’s, is that the best high availability database practices are harder to deploy because you’re working within a different set of constraints in the cloud. Sure, you can set up MySQL replication, but without the ability to insert a hardware load balancer or better control over floating IPs, reliable single-master solutions are difficult at best.

I spoke with Josh for a bit afterward and discussed how Drizzle is doing things to help and what it would take to have a Drizzle back end for Drupal (turns out it should not be too difficult). We then got onto the topic of what some of the newer non-relational databases would look like for Drupal, and the short answer is it would be extremely difficult. Drupal, in both the core and many of the modules, depends on a relational model for the underlying data. This is not unique to Drupal. People, and the software they write, have thought “relational” for decades when it comes down to data. Sure, the various NoSQL projects are becoming more popular, but the masses are still thinking in terms of joining tables.

Silver Bullet

So, what would be the silver bullet? A relational database that did not depend on a single master. Not just dual-master setups with offset auto-increment; I’m talking about removing the entire concept of master-slave for replication. This is obviously nothing new in the industry, but it’s never been easy to accomplish. Just do some reading on distributed locking algorithms and you’ll get the idea. The main problem with distributed locking algorithms is that they don’t scale.
But what about an eventually consistent replication model for a relational database? So far, eventually consistent databases have not been relational (document based like CouchDB, or simple key/value pairs), and relational databases have always focused on atomic consistency or some relaxed close relative (various levels of serialization). As a thought experiment, I’m going to attempt to describe what this might look like under the hood at a high level.

Eventually consistent?

Not familiar with this term? Take a look at Werner Vogels’ article on the topic. The main idea behind EC is that you sacrifice the ability for all nodes to see the exact same thing at any given time (consistency), but in return you can tolerate network partitions and remain available. This directly relates to the CAP theorem, which states you only get two of: Consistency, Availability, and tolerance to network Partitions. So, we are throwing out “C” so we can get rid of those nasty distributed locking algorithms, but in return we take on “EC”.

MyEventuallyConsistentSQL

Let’s start off with a traditional relational database and start modifying it until we have something that looks like an ECRDBMS (ok, maybe this acronym is a bit wordy).
What are we missing?

What else would break down if we toss out atomic consistency and make the above changes? One thing I left out is DDL operations. Those would require some more thought, but I’m pretty sure we could figure out a way to handle conflicting events, possibly with configuration parameters to control the decisions made in conflict resolution algorithms. For example, if an UPDATE event gets applied after an ALTER TABLE that removed a column referenced in the UPDATE, you could just ignore that value and apply the other updates (if any). Chances are you didn’t want that column if it was removed at about the same time.

This model has the major benefit of not having to worry about which node is the master or about keeping an ordered replication log. The nodes all operate independently and toss out deterministic events which can be applied in any order.

Summary

This would-be-ECRDBMS looks a bit different on the inside, but from the outside it will look pretty familiar. From the normal web application perspective we are still creating tables, inserting data, joining data, and doing all the things we depend on from a relational database. This may not be a great idea, but I think it would be possible if you are willing to accept some of the behaviors that come along with it.

So what do you think? How can it be improved? Would you use it for your application?

Posted in Drizzle, Main, MySQL

9 Responses to "Eventually Consistent Relational Database?"
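As a rough illustration of the ALTER TABLE conflict described in the post, here is a minimal Python sketch of a node applying events in whatever order they arrive. Everything in it is an assumption made for illustration: the `Replica` class, the event tuples, and the last-writer-wins rule keyed on a `(timestamp, node_id)` stamp are not taken from Drizzle or MySQL. It also ignores harder cases, such as a dropped column being added back later.

```python
class Replica:
    """One node's copy of a single table (hypothetical, not a real engine)."""

    def __init__(self, columns):
        self.columns = set(columns)
        self.rows = {}  # primary key -> {column: (stamp, value)}

    def apply(self, event):
        kind = event[0]
        if kind == "drop_column":
            _, column = event
            self.columns.discard(column)
            # Forget values already stored for the dropped column.
            for cells in self.rows.values():
                cells.pop(column, None)
        elif kind == "update":
            _, pk, changes, stamp = event
            cells = self.rows.setdefault(pk, {})
            for column, value in changes.items():
                if column not in self.columns:
                    continue  # column already dropped: ignore this value
                current = cells.get(column)
                # Assumed rule: last writer wins. The stamp is
                # (timestamp, node_id), so ties between nodes break the
                # same way on every replica.
                if current is None or stamp > current[0]:
                    cells[column] = (stamp, value)

    def snapshot(self):
        """Return the visible values without the bookkeeping stamps."""
        return {pk: {col: value for col, (_, value) in cells.items()}
                for pk, cells in self.rows.items()}


events = [
    ("update", 1, {"name": "alice", "nick": "al"}, (100, "node-a")),
    ("update", 1, {"name": "alicia"}, (105, "node-b")),
    ("drop_column", "nick"),
]

a = Replica(["name", "nick"])
b = Replica(["name", "nick"])
for event in events:
    a.apply(event)            # node A sees the events in one order
for event in reversed(events):
    b.apply(event)            # node B sees them in the opposite order

print(a.snapshot() == b.snapshot())
```

Both nodes end up with the same row even though one of them saw the column removal first, which is the property the post relies on when it says events can be applied in any order.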
Copyright (C) Eric Day - eday@oddments.org. All content licensed under the Creative Commons Attribution 3.0 License. Hosted by Rackspace Cloud.
Hi Eric!
Thanks for the interesting article. One other related thought that I’ve had is why not use an eventually consistent data store as the backend for a storage engine in Drizzle. This way, we don’t have to worry about implementing replication; it’s already done for us. For example, one of the NoSQL stores like Project Voldemort is pretty much just a clone of Amazon’s Dynamo. So why not just build a storage engine in Drizzle on top of Voldemort? Then you would have a SQL interface to an eventually consistent data store, i.e. we would have shoe-horned the relational model onto this key/value store.
Of course, this does bring up a lot of issues, such as how to store and query relational data in these key/value stores efficiently, and how to do joins efficiently on top of a store such as this. Another issue is whether it’s even a good idea to build a storage engine on top of a store such as this! But it gets you the relational model on top of an eventually consistent store pretty easily.
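A rough sketch of what that shoe-horning might look like, in Python rather than as a real storage engine: rows are serialized under a `table:primary_key` key, and a join is a scan of one table plus a point lookup per row in the other. The in-memory `store` dict stands in for the remote key/value store, and the key format and helper names (`put_row`, `get_row`, `scan`, `join`) are made up for illustration. A real store may not offer any way to scan keys at all, which is part of the efficiency problem.

```python
import json

store = {}  # stand-in for a remote key/value store (assumption)


def put_row(table, pk, row):
    """Serialize a row under a 'table:pk' key (hypothetical key format)."""
    store[f"{table}:{pk}"] = json.dumps(row)


def get_row(table, pk):
    """Point lookup by primary key; returns None if the row is missing."""
    raw = store.get(f"{table}:{pk}")
    return json.loads(raw) if raw is not None else None


def scan(table):
    """Table scan: with no index, this touches every key in the store."""
    prefix = f"{table}:"
    for key, raw in store.items():
        if key.startswith(prefix):
            yield json.loads(raw)


def join(left, foreign_key, right):
    """Nested-loop inner join: scan `left`, look up each match in `right`."""
    for left_row in scan(left):
        right_row = get_row(right, left_row[foreign_key])
        if right_row is not None:
            yield left_row, right_row


put_row("users", 1, {"id": 1, "name": "eric"})
put_row("users", 2, {"id": 2, "name": "josh"})
put_row("posts", 10, {"id": 10, "user_id": 1, "title": "EC RDBMS?"})
put_row("posts", 11, {"id": 11, "user_id": 3, "title": "orphan"})

result = [(post["title"], user["name"])
          for post, user in join("posts", "user_id", "users")]
print(result)
```

Every `get_row` call here would be a network round trip against a real store, so the cost of a join grows with the number of rows scanned.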
Some research has been done recently in this area, for example:
* Building a Database on S3 – http://is.gd/4goRW
* Building a Database in the Cloud – http://www.dbis.ethz.ch/research/publications/dbs3.pdf
For a class I’m taking this semester on distributed systems, I’m creating a storage engine for Drizzle on Amazon’s S3 storage service so I’ve been thinking about these things a fair bit myself lately. One thing that’s a big issue in that case is the latency involved in making a request to S3 so I’m starting to think a fair bit about caching strategies for the engine.
It’s definitely an interesting topic you’ve brought up (well, I find it pretty interesting). Sometimes I wonder how many people actually need to migrate to the eventually consistent data model, though. That doesn’t affect me, though, so I’m happy to work on it :)