Eric Day: Thoughts, code, and other oddments.
Archive for the "MySQL" Category

Moving On
January 11th, 2010

Friday was my last day at Sun Microsystems, and today is the first day at my new job (location coming soon). I’ve had a great time at Sun, and thank them for all the opportunities given to me there. I’ll be doing mostly the same work at the new gig, working on projects like Drizzle, but with a slightly different focus. For the most part my day-to-day won’t change much. Right now I’m focusing on libdrizzle again and am implementing the prepared statement API, cleaning up the MySQL protocol support a little, and also implementing the new Drizzle client/server protocol. I’ll continue to work on Gearman as well, especially where it is relevant to Drizzle. I also need to start blogging again about specific topics in the projects I’m working on; I’ve been fairly quiet lately.

I’ll be in New Zealand next week at Linux Conf AU (yes, it’s not in AU this year). I have a talk on Gearman, and it looks like I’ll also be helping out with the Drizzle talk. It will be really nice to escape the Portland, OR winter for a bit. :)

Posted in Drizzle, Gearman, Main, MySQL | 2 Comments

Pluggable Database Client Tool
November 23rd, 2009

A few weeks ago I wrote about a student group who will be working with the Drizzle community to build a new database client tool. While the tool will be the primary replacement for the Drizzle client tool, we hope it will be generic (using the Python DB API) so it will work with others like MySQL and PostgreSQL. We’ve had a number of great discussions, including a session at OpenSQL Camp last weekend. I wanted to toss out a few ideas of how such a tool could be structured to allow for maximum extensibility.

One possibility is to borrow from typical Unix shells and digital signal processing (DSP) systems, where you have a number of modules with I/O interfaces and data exchange formats between each module. Each module provides a specific signature so you know what other modules it can plug into.
Here is a simple example:
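As a rough illustration of such signatures (not the tool’s actual design), here is a minimal C sketch. The data kinds, module names, and the run_pipeline() helper are all hypothetical, and fake_execute() stands in for a real database driver module. The key rule is that a module only plugs into the next one when its output kind matches the next module’s input kind.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical data formats exchanged between modules. */
enum data_kind { KIND_TEXT, KIND_QUERY, KIND_RESULT };

/* A module declares what it consumes and produces (its "signature"). */
struct module
{
  const char *name;
  enum data_kind in;
  enum data_kind out;
  /* Transform `in` into `out`; returns 0 on success. */
  int (*process)(const char *in, char *out, size_t out_len);
};

/* Text -> query: strip a trailing semicolon, as a shell-like front end might. */
static int parse_input(const char *in, char *out, size_t out_len)
{
  size_t len = strlen(in);
  if (len > 0 && in[len - 1] == ';')
    len--;
  if (len >= out_len)
    return -1;
  memcpy(out, in, len);
  out[len] = '\0';
  return 0;
}

/* Query -> result: a stand-in for a real database driver module. */
static int fake_execute(const char *in, char *out, size_t out_len)
{
  int n = snprintf(out, out_len, "1 row for [%s]", in);
  return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}

/* Result -> text: a formatter (table, CSV, ...). */
static int format_plain(const char *in, char *out, size_t out_len)
{
  int n = snprintf(out, out_len, "| %s |", in);
  return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}

/* Run modules in order, refusing to connect mismatched signatures. */
static int run_pipeline(const struct module *mods, size_t count,
                        const char *input, char *output, size_t out_len)
{
  char buf[2][256];
  const char *cur = input;
  enum data_kind kind = KIND_TEXT;

  assert(mods != NULL || count == 0);
  for (size_t i = 0; i < count; i++)
  {
    if (mods[i].in != kind)
      return -1; /* these modules cannot plug into each other */
    /* Alternate buffers so a module never reads and writes the same one. */
    if (mods[i].process(cur, buf[i % 2], sizeof(buf[i % 2])) != 0)
      return -1;
    cur = buf[i % 2];
    kind = mods[i].out;
  }
  if (kind != KIND_TEXT || strlen(cur) >= out_len)
    return -1;
  strcpy(output, cur);
  return 0;
}
```

With a parse, execute, format chain, the input "SELECT 1;" comes out as "| 1 row for [SELECT 1] |", while dropping the execute module makes run_pipeline() reject the chain because a query cannot feed a result formatter. Swapping in a CSV formatter or a different driver only requires matching the signature.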
New Database Command Line Client
October 29th, 2009

A few weeks ago I proposed a project to students at Portland State University for their senior capstone class, and this weekend I found out it was chosen by a group! The project will be a rewrite of the command line tool (the Drizzle tool is currently based on the ‘mysql’ tool), plus a lot of new features. We’re really excited to be working with them, and they seem equally excited about the project. I hope DBAs, developers, and other folks in the Drizzle/MySQL/MariaDB communities will work with them to help define what features should be part of this new command line client.

Some new features we have in mind are background queries, piping and redirection of queries (like a normal shell), and plugin support. Since it will be built on libdrizzle, it will also support at least the MySQL/MariaDB protocol, and possibly more if we end up using a common DB API (we’re pondering Python). If you have any ideas or feature requests, feel free to leave a comment. The student group will be sending plans to the Drizzle mailing list soon for feedback, as well as attending OpenSQL Camp and leading a session on what folks would like to see in a client tool.

Join me in welcoming Clark, Ken, Max, Victoria, David, and Andreas!

Posted in Drizzle, Main, MySQL | 3 Comments

OpenSQL Camp, SQL vs NoSQL
October 26th, 2009

The upcoming OpenSQL Camp is almost full! We have space for 130 people to register, and as of this writing only 10 spots are free. If you want to attend, sign up before it’s too late! We’re still looking for a few sponsors if anyone is interested in helping cover food and t-shirt costs.

I’m organizing the closing keynote panel, “SQL vs NoSQL”, which will include core community members and committers from a number of open source databases. Selena has offered to take the PostgreSQL position if we don’t find another worthy contender. So far, it will include:
I’ll be sure to report who is the last one standing so we know which project to follow the closest. :)

Posted in Drizzle, Main, MySQL | 1 Comment

Eventually Consistent Relational Database?
October 12th, 2009

This weekend I attended Drupal Camp PDX and listened to a session titled “Drupal in the Cloud”. The presenter, Josh Koenig from Chapter Three, gave a great introduction to what moving to “the cloud” really means, especially in the context of a typical web application like Drupal. The problem, which is of course no fault of Josh’s, is that the best high-availability database practices are harder to deploy because you’re working within a different set of constraints in the cloud. Sure, you can set up MySQL replication, but without the ability to insert a hardware load balancer or better control over floating IPs, reliable single-master solutions are difficult at best.

I spoke with Josh for a bit afterward and discussed how Drizzle is doing things to help and what it would take to have a Drizzle back end for Drupal (turns out it should not be too difficult). We then got onto the topic of what some of the newer non-relational databases would look like for Drupal, and the short answer is it would be extremely difficult. Drupal, in both the core and many of the modules, depends on a relational model for the underlying data. This is not unique to Drupal. People, and the software they write, have thought “relational” for decades when it comes down to data. Sure, the various NoSQL projects are becoming more popular, but the masses are still thinking in terms of joining tables.

Silver Bullet

So, what would be the silver bullet? A relational database that did not depend on a single master. Not just dual-master setups with offset auto-increment; I’m talking about removing the entire concept of master-slave for replication. This is obviously nothing new in the industry, but it’s never been easy to accomplish.
Just do some reading on distributed locking algorithms and you’ll get the idea. The main problem with distributed locking algorithms is that they don’t scale. But what about an eventually consistent replication model for a relational database? So far, eventually consistent databases have not been relational (they have been document based like CouchDB, or simple key/value stores), and relational databases have always focused on atomic consistency or some close relaxed relative (various isolation levels). As a thought experiment, I’m going to attempt to describe what this may look like under the hood at a high level.

Eventually consistent?

Not familiar with this term? Take a look at Werner Vogels’ article on the topic. The main idea behind EC is that you sacrifice the ability for all nodes to see the exact same thing at any given time (consistency), but in return you can tolerate network partitions and you have availability. This directly relates to the CAP theorem, which states you only get two of: Consistency, Availability, and tolerance to network Partitions. So, we are throwing out “C” so we can get rid of those nasty distributed locking algorithms, but in return we take on “EC”.

MyEventuallyConsistentSQL

Let’s start off with a traditional relational database and start modifying it until we have something that looks like an ECRDBMS (ok, maybe this acronym is a bit wordy).
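An eventually consistent design needs replicas to converge no matter what order changes arrive in. One common way to get that property for a single column value is a last-writer-wins rule, sketched minimally in C below purely for illustration. It is a swapped-in technique, not necessarily the conflict resolution this post has in mind, and the (timestamp, node_id) scheme and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One replicated change to a single column of a single row. */
struct column_event
{
  uint64_t timestamp; /* hypothetical clock value assigned by the origin node */
  uint32_t node_id;   /* tie-breaker; (timestamp, node_id) assumed unique */
  int64_t value;      /* new column value */
};

/* Replica state for one column: the winning event seen so far. */
struct column_state
{
  int has_value;
  struct column_event winner;
};

/* Deterministic total order: later timestamp wins, node id breaks ties. */
static int event_newer(const struct column_event *a,
                       const struct column_event *b)
{
  if (a->timestamp != b->timestamp)
    return a->timestamp > b->timestamp;
  return a->node_id > b->node_id;
}

/* Applying an event is commutative and idempotent: keep whichever event
   sorts highest. Replicas that see the same events in any order, even
   with duplicates, end up with the same winner. */
static void apply_event(struct column_state *s, const struct column_event *e)
{
  assert(s != NULL && e != NULL);
  if (!s->has_value || event_newer(e, &s->winner))
  {
    s->winner = *e;
    s->has_value = 1;
  }
}
```

The price is that one of two concurrent writes to the same column is silently discarded, which is the kind of behavior an application would have to accept in exchange for dropping distributed locks.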
What are we missing? What else would break down if we toss out atomic consistency and make the above changes? One thing I left out is DDL operations. Those would require some more thought, but I’m pretty sure we could figure out a way to handle conflicting events, possibly with configuration parameters to control the decisions made in conflict resolution algorithms. For example, if an UPDATE event gets applied after an ALTER TABLE that removed a column referenced in the UPDATE, you could just ignore that value and apply the other updates (if any). Chances are you didn’t want that column if it was removed at about the same time.

This model has the major benefit of not having to worry about which node is the master or keeping an ordered replication log; the nodes all operate independently and toss out deterministic events which can be applied in any order.

Summary

This would-be ECRDBMS looks a bit different on the inside, but from the outside it will look pretty familiar. From the normal web application perspective, we are still creating tables, inserting data, joining data, and doing all the things we depend on from a relational database. This may not be a great idea, but I think it would be possible if you are willing to accept some of the behaviors that come along with it. So what do you think? How can it be improved? Would you use it for your application?

Posted in Drizzle, Main, MySQL | 9 Comments

Non-blocking State Machines
October 7th, 2009

If you’ve ever done any non-blocking programming (usually for socket I/O), you’ve probably had to come up with a non-trivial state machine to handle all the places where everything can pause. Say you’re reading an application-level packet from a socket, and halfway through, the read() system call screams EAGAIN. You need to stop, save any state, and exit out of whatever chain of functions got you there so the calling application can regain control.
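The “save any state” step is the crux. Here is a minimal C sketch of it, assuming a fixed-size packet and POSIX read(): the reader keeps its byte offset in a struct, so a call that hits EAGAIN returns to the caller, and the next call resumes exactly where it stopped. The struct, return codes, and helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical return codes for this sketch. */
enum { READ_DONE = 0, READ_WAIT = 1, READ_ERROR = -1 };

/* State that must survive between calls while a packet is half read. */
struct packet_reader
{
  char buf[64];
  size_t need; /* total packet size, at most sizeof(buf) */
  size_t have; /* bytes received so far */
};

/* Put a descriptor into non-blocking mode. */
static int set_nonblocking(int fd)
{
  int flags = fcntl(fd, F_GETFL, 0);
  if (flags == -1)
    return -1;
  return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Read until the packet is complete. On EAGAIN, return to the caller
   with `have` saved so the next call picks up where this one stopped. */
static int read_packet(int fd, struct packet_reader *r)
{
  assert(r->need <= sizeof(r->buf));

  while (r->have < r->need)
  {
    ssize_t ret = read(fd, r->buf + r->have, r->need - r->have);
    if (ret == -1)
    {
      if (errno == EAGAIN || errno == EWOULDBLOCK)
        return READ_WAIT; /* wait for readability, then call again */
      if (errno == EINTR)
        continue;         /* interrupted by a signal; just retry */
      return READ_ERROR;
    }
    if (ret == 0)
      return READ_ERROR;  /* peer closed the connection mid-packet */
    r->have += (size_t)ret;
  }
  return READ_DONE;
}
```

A caller would typically wait with poll() or select() for the descriptor to become readable before calling read_packet() again. This handles a single flat read; the techniques below deal with chaining many such steps together.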
I’m going to explain a few techniques I’ve come up with over the years, each with their strengths and weaknesses, and I hope this will spur some conversation about what other folks have done. I’m fairly happy with how I handle these state machines now, but I’m always looking for a more succinct way of handling things. Please share your thoughts!

Switch Statements

The obvious way to handle non-blocking I/O is with one or more switch statements. Say we need to check the status of something by sending a request over a TCP connection, possibly connecting to the remote host first, and then reading the response. Here is a bit of pseudo-code that demonstrates how this could work (ignoring some error cases, efficient buffer handling, and non-blocking connect cases):
int check_status(struct connection *con)
{
  int ret;

  switch (con->state)
  {
  case CONNECTION_STATE_NONE:
    getaddrinfo(...);
    con->fd = socket(...);
    /* Fall through to next state. */
  case CONNECTION_STATE_CONNECT:
    ret = connect(con->fd, ...);
    if (ret == -1 && errno == EAGAIN)
    {
      con->state = CONNECTION_STATE_CONNECT;
      return WAIT_FOR_WRITE;
    }
    /* Fall through to next state. */
  case CONNECTION_STATE_REQUEST:
    ret = write(con->fd, ...);
    if (ret == -1 && errno == EAGAIN)
    {
      con->state = CONNECTION_STATE_REQUEST;
      return WAIT_FOR_WRITE;
    }
    /* Fall through to next state. */
  case CONNECTION_STATE_RESPONSE_HEADER:
    ret = read(con->fd, ...);
    if (ret == -1 && errno == EAGAIN)
    {
      con->state = CONNECTION_STATE_RESPONSE_HEADER;
      return WAIT_FOR_READ;
    }
    /* Save header. */
    /* Fall through to next state. */
  case CONNECTION_STATE_RESPONSE:
    ret = read(con->fd, ...);
    if (ret == -1 && errno == EAGAIN)
    {
      con->state = CONNECTION_STATE_RESPONSE;
      return WAIT_FOR_READ;
    }
    /* Save response. */
    /* Set this here so we skip the connect state next time around. */
    con->state = CONNECTION_STATE_REQUEST;
    break;
  }

  /* Done: the full response has been read. */
  return 0;
}
The first thing you may cringe at is the fall-through between cases in the switch statement. The alternative is to set a new state at the end of each case, break, and then reevaluate the switch with that new state (wrapping the above switch in a while loop). I skipped that version since those extra operations are just not necessary. The above machine may be a bit clunky, but it works for simple cases. But what about when you have more complex states that have loops, non-sequential state execution, or nested switch statements? The above has the potential to grow into an unwieldy mess of code. For example, if we need to read multiple responses back in the last state above, this could be expanded to:
int check_status(struct connection *con)
{
  int ret;

  switch (con->state)
  {
  ...
    /* Fall through to next state. */
  case CONNECTION_STATE_RESPONSE:
    /* con->need_header must start out true (e.g. when the connection is created). */
    while (1)
    {
      if (con->need_header)
      {
        ret = read(con->fd, ...);
        if (ret == -1 && errno == EAGAIN)
        {
          con->state = CONNECTION_STATE_RESPONSE;
          return WAIT_FOR_READ;
        }
        /* Save header. */
        con->need_header = false;
      }
      ret = read(con->fd, ...);
      if (ret == -1 && errno == EAGAIN)
      {
        con->state = CONNECTION_STATE_RESPONSE;
        return WAIT_FOR_READ;
      }
      /* Save response. */
      if (last_response)
        break;
      con->need_header = true;
    }
    /* Expect a header again for the next request's first response. */
    con->need_header = true;
    /* Set this here so we skip the connect state next time around. */
    con->state = CONNECTION_STATE_REQUEST;
    break;
  }

  return 0;
}
As you can see, another state variable has been added as a boolean (con->need_header). What if responses were not made up of a simple header and body? What if there are more nested levels? We can add more switch statements and start breaking this up into nested functions to make it more readable, but the complexity is still there. For non-trivial non-blocking state machines, this approach does not scale.

Nested switch/while Statements

Early on in my C years I stumbled upon Duff’s Device. At first I was confused: is that even valid C? Oh, it compiles! Then I was offended. Eventually it clicked and I appreciated the cleverness of the code: nesting while/for/if with switch statements. I went off to rewrite my non-blocking state machines with this new trick:
int check_status(struct connection *con)
{
  int ret;

  switch (con->state)
  {
  ...
  /* Fall through to next state. */
  while (1)
  {
  case CONNECTION_STATE_RESPONSE_HEADER:
    ret = read(con->fd, ...);
    if (ret == -1 && errno == EAGAIN)
    {
      con->state = CONNECTION_STATE_RESPONSE_HEADER;
      return WAIT_FOR_READ;
    }
    /* Save header. */
    /* Fall through to next state. */
  case CONNECTION_STATE_RESPONSE:
    ret = read(con->fd, ...);
    if (ret == -1 && errno == EAGAIN)
    {
      con->state = CONNECTION_STATE_RESPONSE;
      return WAIT_FOR_READ;
    }
    /* Save response. */
    if (last_response)
      break; /* Leaves the while loop, not the switch. */
  }
  /* Set this here so we skip the connect state next time around. */
  con->state = CONNECTION_STATE_REQUEST;
  break;
  }

  return 0;
}
Yup, that’s correct. Shove that while loop right in there. Think of it this way: write your state machine as you would if it were blocking, nesting as deep as you need with for/if/while statements. Next, put a switch around the entire thing, and toss a case statement in wherever something could hit a non-blocking condition (regardless of scope or nesting level). Some folks have commented that this feels a lot like using gotos, but I disagree. With switch, you have structure, and compiler warnings for when things are missing (like a case). Sure, it may not be the most elegant solution, but you avoid the nested switch statements and multiple state variables. I still use this today for some things (like inside of the Gearman C server and library), but only when they are fairly simple state machines.

Function Pointer Stack

Last year I started writing a non-blocking C library for MySQL. When I heard about Drizzle, I decided to focus my effort there (while keeping the MySQL compatibility), and renamed it to libdrizzle. Today it supports the Drizzle protocol and the most common parts of the MySQL protocol. The protocols for these projects are a bit more involved, so when I began writing the library, I went through a few iterations of state machines and didn’t find anything I was happy with. After some brainstorming I came up with an alternative design, which I usually refer to as a “function pointer stack” or “callback stack”. Please let me know if you have seen something like this and point me to the proper name. :)

This works by creating a traditional stack (LIFO structure) of function pointers. When a state needs to be executed, push it on; when a state is complete, it can pop itself off. It’s similar to a program execution stack, but maintained in user space, and the state is kept so you know where things left off. Still not getting it? Let’s look at the code.
First, one quick note about function pointer typedefs:

typedef int (state_fn)(struct connection *con);

These are not required of course, but they make things a bit more legible. Strictly speaking, this declares ‘state_fn’ as a function type with the given signature, so ‘state_fn *’ is a pointer to such a function. It’s a lot easier than having to write the function signature out every time you have a variable of this type. Now, the code:
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define STACK_SIZE 8 /* Arbitrary maximum nesting depth for this example. */

/* Forward declaration so the typedef below refers to this struct. */
struct connection;
typedef int (state_fn)(struct connection *con);

struct connection
{
  ...
  state_fn *state_stack[STACK_SIZE];
  int state_current;
};

/* Declare the states up front, since some push states defined later. */
state_fn start_state, connect_state, request_state,
         response_header_state, response_state;

/* These functions operate on the function pointer stack. */
static inline bool state_none(struct connection *con)
{
  return con->state_current == 0;
}

static inline void state_push(struct connection *con, state_fn *function)
{
  assert(con->state_current < STACK_SIZE);
  con->state_stack[con->state_current]= function;
  con->state_current++;
}

static inline void state_pop(struct connection *con)
{
  assert(con->state_current > 0);
  con->state_current--;
}

/* Run the state on top of the stack until the stack is empty (done)
   or a state returns non-zero (it needs to wait for I/O). */
int state_run(struct connection *con)
{
  int ret;

  while (!state_none(con))
  {
    ret= con->state_stack[con->state_current - 1](con);
    if (ret)
      return ret;
  }

  return 0;
}

/* These are the states that can be pushed onto the stack. */
int start_state(struct connection *con)
{
  getaddrinfo(...);
  con->fd = socket(...);
  state_pop(con);
  state_push(con, connect_state);
  return 0;
}

int connect_state(struct connection *con)
{
  int ret = connect(con->fd, ...);
  if (ret == -1 && errno == EAGAIN)
    return WAIT_FOR_WRITE;
  state_pop(con);
  return 0;
}

int request_state(struct connection *con)
{
  int ret;

  if (not connected)
  {
    /* Connect first; we return here once the connect states pop off. */
    state_push(con, start_state);
    return 0;
  }
  ret = write(con->fd, ...);
  if (ret == -1 && errno == EAGAIN)
    return WAIT_FOR_WRITE;
  state_pop(con);
  state_push(con, response_header_state);
  return 0;
}

int response_header_state(struct connection *con)
{
  int ret = read(con->fd, ...);
  if (ret == -1 && errno == EAGAIN)
    return WAIT_FOR_READ;
  /* Save header. */
  state_pop(con);
  state_push(con, response_state);
  return 0;
}

int response_state(struct connection *con)
{
  int ret = read(con->fd, ...);
  if (ret == -1 && errno == EAGAIN)
    return WAIT_FOR_READ;
  /* Save response. */
  state_pop(con);
  if (have_more_responses)
    state_push(con, response_header_state);
  return 0;
}

/* Here is a function you would make public in the API to start the state machine. */
int check_status(struct connection *con)
{
  /* If we are coming back into this after a blocking event,
     make sure we don't push a new state on again. */
  if (state_none(con))
    state_push(con, request_state);

  return state_run(con);
}
As you can see, we still start in the check_status() function, but push a state if there is none and then go into our run loop. You can follow along through the various functions (sort of like a choose-your-own-adventure book), but eventually you should end up with an empty stack. When this happens, the state_run() function returns 0 and the call is complete. This may be a bit overkill for such a simple state machine, but as your state execution flow becomes non-sequential (random jumps, recursion, …), the power and flexibility of this design becomes apparent. And what? No switch statements?

As far as performance is concerned, you may have more function calls, but you are eliminating jumps (those nested if/switches). For example, if your state is five levels deep and you need to keep pausing and returning to that point, you hit all those switch statements every time. With the above approach? You jump directly into the function you left off in. I’m not sure which one is faster in general (it really depends on the application), but the cost of switches vs. function calls will be insignificant compared to what normal applications are actually doing (like system calls for I/O).

I have working C and C++ examples of what a complete state machine looks like. There are also some micro-benchmarking numbers in there comparing C vs. C++ (you take a hit in C++ due to inheritance, but that cost is fairly insignificant). Thoughts?

Gearman News and Releases
October 7th, 2009

The past week has brought a surge of Gearman-related releases. They include:

C Server and Library

Some of these releases were driven by the C API changes I made to clean a few things up, but a fair amount of functionality was added to the C library and C-based modules (like timeouts and non-blocking API clients and workers). The Perl server included a number of algorithm improvements in worker selection. I’ll be taking a closer look at those and including them in the C server for the next release.
Rasmus Lerdorf took Gearman for a spin in PHP, and a Gearman implementation in Erlang was even announced this week. Thanks to everyone in the Gearman community for all your hard work!

Posted in Drizzle, Gearman, Main, MySQL | No Comments

Debug Console in drizzled, Part 2
October 5th, 2009

About a month ago I blogged about the debug console I was adding to drizzled. I finished this work up and it’s now in the trunk and latest release. This is implemented using the Client and Listen plugin points (which are heavily modified versions of MySQL’s Protocol class), and can be enabled using the ‘--console-enable’ option. For example:

hades> drizzled --datadir=/Users/eday/drizzle.data --console-enable
InnoDB: The InnoDB memory heap is disabled
InnoDB: Mutexes and rw_locks use GCC atomic builtins.
090928 15:22:07 InnoDB: highest supported file format is Barracuda.
090928 15:22:07 InnoDB Plugin 1.0.3 started; log sequence number 46409
Listening on :::4427
Listening on 0.0.0.0:4427
./drizzled/drizzled: Forcing close of thread 0 user: '(null)'
./drizzled/drizzled: ready for connections.
Version: '2009.09.1144' Source distribution (trunk)
drizzled> show tables in information_schema;
Tables_in_information_schema
INNODB_TRX
INNODB_LOCKS
...
STATISTICS
TABLE_CONSTRAINTS
TABLES
drizzled> ./drizzled/drizzled: Forcing close of thread 1 user: '(null)'
./drizzled/drizzled: Normal shutdown
090928 15:22:31 InnoDB: Starting shutdown...
090928 15:22:32 InnoDB: Shutdown completed; log sequence number 46419
./drizzled/drizzled: Shutdown complete
hades>

You can type ‘quit’, ‘exit’, or just send EOF (CTRL-D) to shut down drizzled. There is another patch on the way with some fixes to make it more efficient for mass imports (i.e., drizzled --console-enable < data_dump.sql). It’s pretty bare bones now, but patches are welcome for new features to make this look more like the normal command line tool (like changing the prompt).
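For a feel of how such a console loop can stop on ‘quit’, ‘exit’, or EOF, here is a minimal C sketch under loud assumptions: one statement per line, a hypothetical dispatch callback in place of drizzled’s real Client plugin, and no prompt handling. It is not drizzled’s actual code. Because it reads from a FILE *, the same loop handles an interactive terminal (EOF via CTRL-D) and a redirected dump file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback that would hand a statement to the server. */
typedef void (statement_fn)(const char *statement, void *ctx);

/* Strip a trailing newline or carriage return in place. */
static void chomp(char *line)
{
  line[strcspn(line, "\r\n")] = '\0';
}

/* Read statements line by line until "quit", "exit", or EOF. Lines longer
   than the buffer get split (a simplification). Returns the number of
   statements dispatched. */
static int console_loop(FILE *in, statement_fn *dispatch, void *ctx)
{
  char line[1024];
  int count = 0;

  assert(in != NULL && dispatch != NULL);
  while (fgets(line, sizeof(line), in) != NULL)
  {
    chomp(line);
    if (strcmp(line, "quit") == 0 || strcmp(line, "exit") == 0)
      break;
    if (line[0] == '\0')
      continue; /* ignore blank lines */
    dispatch(line, ctx);
    count++;
  }
  return count;
}

/* Example dispatcher: remembers the last statement seen.
   `ctx` must point to a buffer of at least 1024 bytes. */
static void remember_statement(const char *statement, void *ctx)
{
  strcpy((char *)ctx, statement);
}
```

Feeding it "SELECT 1;", a blank line, "SHOW TABLES;", and then "quit" dispatches two statements and ignores anything after the quit line.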
Posted in Drizzle, Main, MySQL | 3 Comments

Gearman Slides from San Francisco Meetup
October 5th, 2009

Thanks to everyone who came out to the San Francisco PHP and MySQL meetup! Also, thanks to Michael for organizing such a great event, and Percona for sponsoring the food. I put the slides from the talk up on my wiki for reference or in case you missed it. I believe that there will be a video up at some point as well.

While down there I also had a chance to stop by Digg and talk to them about Gearman (they’ve been using it for a while). It was interesting to see how they were using it in a large-scale deployment. I was able to get some valuable feedback for future development, and a cool t-shirt. :) Thanks Digg!

Posted in Gearman, Main, MySQL | No Comments

Gearman at San Francisco PHP and MySQL Meetup
September 28th, 2009

I’ll be talking about Gearman this Thursday (October 1st, 2009) at the San Francisco PHP and MySQL Meetup groups (these are two separate groups, but they sometimes share the topic). A few other folks involved in the Gearman community should also be there to help out, including James Luedke (main author of the PHP extension), Eric Lambert (the Java API author), Dormando, and Hachi (Perl version maintainers at SixApart). You can sign up at either the MySQL or PHP meetup groups. We’ll be discussing the basics for those of you who don’t even know what Gearman is, common use cases, new features, advanced topics for folks already using Gearman, and of course Q&A throughout. Hope to see you there!

Posted in Gearman, Main, MySQL | No Comments
Copyright (C) Eric Day - eday@oddments.org. All content licensed under the Creative Commons Attribution 3.0 License.