Eric Day

Thoughts, code, and other oddments.
Dark | Light

< || >

Non-blocking State Machines

October 7th, 2009

If you’ve ever done any non-blocking programming (usually for socket I/O), you’ve probably had to come up with a non-trivial state machine to handle all the places where everything can pause. Say you’re reading an application level packet from a socket, and half way through the read() system call it screams EAGAIN. You need to stop, save any state, and exit out of whatever chain of functions got you there so the calling application can regain control. I’m going to explain a few techniques I’ve come up with over the years, each with their strengths and weaknesses, and I hope this will spur some conversation of what other folks have done. While I’m fairly happy with how I handle these state machines now, but I’m always looking for a more succinct way of handling things. Please share your thoughts!

Switch Statements

The obvious way to handle non-blocking I/O is with one or more switch statements. Say we need to check the status of something by sending a request over a TCP connection, possibly connecting to the remote host first, and then reading the response. Here is a bit of pseudo-code that demonstrates how this could work (ignoring some error cases, efficient buffer handling, and non-blocking connect cases):

int check_status(struct connection *con)
{
  switch (con->state)
  {
  case CONNECTION_STATE_NONE:
    getaddrinfo(...);
    con->fd = socket(...);
    /* Fall through to next state. */

  case CONNECTION_STATE_CONNECT:
    ret = connect(con->fd, ...);
    if (ret == -1 && errno == EAGAIN)
    {
      con->state = CONNECTION_STATE_CONNECT;
      return WAIT_FOR_WRITE;
    }
    /* Fall through to next state. */

  case CONNECTION_STATE_REQUEST:
    ret = write(con->fd, ...);
    if (ret == -1 && errno == EAGAIN)
    {
      con->state = CONNECTION_STATE_REQUEST;
      return WAIT_FOR_WRITE;
    }
    /* Fall through to next state. */

  case CONNECTION_STATE_RESPONSE_HEADER:
    ret = read(con->fd, ...);
    if (ret == -1 && errno == EAGAIN)
    {
      con->state = CONNECTION_STATE_RESPONSE_HEADER;
      return WAIT_FOR_READ;
    }
    /* Save header. */
    /* Fall through to next state. */

  case CONNECTION_STATE_RESPONSE:
    ret = read(con->fd, ...);
    if (ret == -1 && errno == EAGAIN)
    {
      con->state = CONNECTION_STATE_RESPONSE;
      return WAIT_FOR_READ;
    }
    /* Save response. */

    /* Set this here so we skip the connect state next time around. */
    con->state = CONNECTION_STATE_REQUEST;
    break;
  }
}

The first thing you may cringe at is the fall-through cases in switch statements. The alternative is to set a new state at the end of each case, break, and then reevaluate the switch again with that new state (wrapping the above switch in a while loop). I skipped that version since those are some extra ops that are just not necessary. The above machine may be a bit clunky, but it works for simple cases. But what about when you have more complex states that have loops, non-sequential state execution, or nested switch statements? The above has the potential to grow into an unwieldy mess of code. For example, say if we need to read multiple responses back in the last state above, this could be expanded to:

int check_status(struct connection *con)
{
  switch (con->state)
  {
...
    /* Fall through to next state. */

  case CONNECTION_STATE_RESPONSE:
    while (1)
    {
      if (con->need_header)
      {
        ret = read(con->fd, ...);
        if (ret == -1 && errno == EAGAIN)
        {
          con->state = CONNECTION_STATE_RESPONSE;
          return WAIT_FOR_READ;
        }
        /* Save header. */
        con->need_header = false;
      }

      ret = read(con->fd, ...);
      if (ret == -1 && errno == EAGAIN)
      {
        con->state = CONNECTION_STATE_RESPONSE;
        return WAIT_FOR_READ;
      }
      /* Save response. */
      if (last_response)
        break;
      con->need_header = true;
    }

    /* Set this here so we skip the connect state next time around. */
    con->state = CONNECTION_STATE_REQUEST;
    break;
  }
}

As you can see, another state variable has been added as a boolean (con->need_header). What if responses were not made up of simple header and body? What if there are more nested levels? We can add more switch statements and start breaking this up some into nested functions to make it more readable, but the complexity is still there. For non-trivial non-blocking state machines, this approach is not scalable.

Nested switch/while Statements

Early on in my C years I stumbled upon Duff’s Device. At first I was confused, is that even valid C? Oh, it compiles! Then I was offended. Eventually it clicked and I appreciated the cleverness of the code. Nesting while/for/if with switch statements. I went off to re-write my non-blocking state machines with this new trick:

int check_status(struct connection *con)
{
  switch (con->state)
  {
...
    /* Fall through to next state. */

    while (1)
    {
  case CONNECTION_STATE_RESPONSE_HEADER:
      ret = read(con->fd, ...);
      if (ret == -1 && errno == EAGAIN)
      {
        con->state = CONNECTION_STATE_RESPONSE_HEADER;
        return WAIT_FOR_READ;
      }
      /* Save header. */
      /* Fall through to next state. */

  case CONNECTION_STATE_RESPONSE:
      ret = read(con->fd, ...);
      if (ret == -1 && errno == EAGAIN)
      {
        con->state = CONNECTION_STATE_RESPONSE;
        return WAIT_FOR_READ;
      }
      /* Save response. */
      if (last_response)
        break;
    }

    /* Set this here so we skip the connect state next time around. */
    con->state = CONNECTION_STATE_REQUEST;
    break;
  }
}

Yup, that’s correct. Shove that while loop right in there. Think of it this way: write your state machine as you would if it were blocking, nesting as deep as you need with for/if/while statements. Next, put a switch around the entire thing, and toss a case statement in wherever something could hit a non-blocking condition (regardless of scope or nesting level). Some folks have commented this feels a lot like using gotos, but I disagree. With switch, you have structure, and compiler warnings for when things are missing (like a case). Sure, it may not be the most elegant solution, but you avoid the nested switch statements and multiple state variables. I still use this today for some things (like inside of the Gearman C server and library), but only when they are fairly simple state machines.

Function Pointer Stack

Last year I started writing a non-blocking C library for MySQL. When I head about Drizzle, I decided to focus my effort there (while keeping the MySQL compatibility), and renamed it to libdrizzle. Today it supports the Drizzle protocol and the most common parts of the MySQL protocol. The protocols for these projects are a bit more involved, so when I began writing the library, I went through a few iterations of state machines and didn’t find anything I was happy with. After some brainstorming I came up with an alternative design, I usually refer to it as a “function pointer stack” or “callback stack”. Please let me know if you have seen something like this and point me to the proper name. :)

This works by creating a traditional stack (LIFO structure) of function pointers. When a state needs to be executed, push it on, when a state is complete, it can pop itself off. It’s similar to a program execution stack, but maintained in user space and state is kept so you know where things left off. Still not getting it? Lets look at the code. First, one quick note about function pointer typedefs:

typedef int (state_fn)(struct connection *con);

These are not required of course, but it makes things a bit more legible. This is saying ‘state_fn’ is now a type that points to a function with the given signature. It’s a lot easier that having to write the function signature out every time you have a variable of this type.

Now, the code:

typedef int (state_fn)(struct connection *con);

struct connection
{
  ...
  state_fn *state_stack[STACK_SIZE];
  int state_current;
};

/* These functions operation on the function pointer stack. */
static inline bool state_none(struct connection *con)
{
  return con->state_current == 0;
}

static inline void state_push(struct connection *con, state_fn *function)
{
  assert(con->state_current < STACK_SIZE);
  con->state_stack[con->state_current]= function;
  con->state_current++;
}

static inline void state_pop(struct connection *con)
{
  con->state_current--;
}

int state_run(struct connection *con)
{
  int ret;

  while (!state_none(con))
  {
    ret= con->state_stack[con->state_current - 1](con);
    if (ret)
      return ret;
  }

  return 0;
}

/* These are the states that can be pushed onto the stack. */
int start_state(struct connection *con)
{
  getaddrinfo(...);
  con->fd = socket(...);
  state_pop(con);
  state_push(con, connect_state);
  return 0;
}

int connect_state(struct connection *con)
{
  ret = connect(con->fd, ...);
  if (ret == -1 && errno == EAGAIN)
    return WAIT_FOR_WRITE;
  state_pop(con);
  return 0;
}

int request_state(struct connection *con)
{
  if (not connected)
  {
    state_push(con, start_state);
    return 0;
  }

  ret = write(con->fd, ...);
  if (ret == -1 && errno == EAGAIN)
    return WAIT_FOR_WRITE;
  state_pop(con);
  state_push(con, response_header_state);
  return 0;
}

int response_header_state(struct connection *con)
{
  ret = read(con->fd, ...);
  if (ret == -1 && errno == EAGAIN)
    return WAIT_FOR_READ;
  /* Save header. */
  state_pop(con);
  state_push(con, response_state);
  return 0;
}

int response_state(struct connection *con)
{
  ret = read(con->fd, ...);
  if (ret == -1 && errno == EAGAIN)
    return WAIT_FOR_READ;
  /* Save response. */
  state_pop(con);
  if (have_more_responses)
    state_push(con, response_header_state);
  return 0;
}

/* Here is a function you would make public in the API to start the state machine. */
int check_status(struct connection *con)
{
  /* If we are coming back into this after a blocking event,
     make sure we don't push a new state on again. */
  if (state_none(con))
    state_push(con, request_state);

  return state_run(con);
}

As you can see, we still start in the check_status function, but push a state if there is no state and then go into our run loop. You can follow along the various functions (sort of like a choose your own adventure book) but eventually you should end up with an empty stack. When this happens, the state_run() function returns 0 and the call is complete.

This may be a bit overkill for such a simple state machine, but as your state execution flow becomes non-sequential (random jumps, recursion, …) the power and flexibility of this design becomes apparent. And what? No switch statements? As far as performance is concerned, you may have more function calls, but you are eliminating jumps (those nested if/switches). For example, if your state is five levels deep and you need to keep pausing and returning to that point, you hit all those switch statements every time. With the above approach? You jump directly into the function you left off in. I’m not sure which one is faster in general (really depends on application), but the cost of switches vs function calls will be insignificant compared to what normal applications are actually doing (like system calls for I/O).

I have working C and C++ examples of what a complete state machine looks like. There is also some micro-benchmarking numbers in there comparing C vs C++ (you take a hit in C++ due to inheritance, but that cost is fairly insignificant).

Thoughts?
Another choice I didn’t bother to mention is to have one thread per connection and let it block, but that doesn’t scale. What methods have you used to solve this? Do the nested switch/whiles offend you? Are the function pointer stacks elegant or spaghetti code?

Posted in Drizzle, Gearman, Main, MySQL

10 Responses to "Non-blocking State Machines"

  1. comstud says:

    Didn’t you cringe when I showed you I’d used Duff’s Device in a certain piece of state machine code? :) I think it was you. Maybe it was someone else. :)

  2. Eric Day says:

    Rich originally sent me a link to Duff’s Device early on when I was there. I supported your use of it, I think it was David who cringed. :)

  3. Jonas says:

    Ndb datanodes are internally 100% non-blocking with mulitple complex state-machines, and manually store state/stack in heap when relinquishing control.

    We use a message-passing abstraction where reading from file/socket is done by sending the READ message to the file-system abstrction module (which do use threads (but could be using async io)). Later when data is available the READ_READY message will be sent to the module requesting the data. Socket IO is handled similary.
    (the names of the messages are in fact FSREADREQ and FSREADCONF, and there is more to read at http://messagepassing.blogspot.com/2009/09/ndb-software-architecture.html)

  4. Frazer says:

    This technique gives you a way to compose otherwise independent state machines.
    I have seen something similar to this in a telecoms product. It was in an event handling system and was referred to as the ‘Finite State Machine Layer’ or FSML. I’m not sure if there’s a generally accepted name for such a thing.

    It would be cool to also describe :
    – Some ways to manage per-state machine state (allocation/cleanup)
    – Some standard mechanism for state machines to communicate with each other (invocation / return)
    – Some patterns of use (Invoke state machine X after I complete, Invoke state machine Y and then return control to me, etc. and the problems they solve)

  5. Kay Roepke says:

    Yeah, state machines can really suck :)

    MySQL Proxy is still using the first version, and sometimes it can be a bit hard to follow.
    While Duff’s Device is a neat trick if you need the loop unrolling I don’t particularly like it for state machines, for the simple reason that it’s using what I call the principle of greatest surprise ;)
    In my book, any code that makes a newcomer to your code go “huh?!” is bad code. I don’t care if a trick is making it 10% faster, unless you can prove you really need the speed improvement.

    That being said, I actually prefer to generate this kind of code using something like Ragel, for example.
    While that might make the actual code even harder to read, it’s fairly easy to generate transition diagrams from the input spec and that makes it easy to verify (or use state machine verification tools) the machine itself.

    I would probably not go the function pointer stack route, because a) I wouldn’t want to have to deal with stack overflows and/or growing the stack in C and b) it’s hard to debug given that you only have function pointers.
    Now, if C had closures, my initial reaction would probably be different…(wait it has! at least on one platform with one compiler :P)

    cheers,
    -k

  6. Eric Day says:

    Hi everyone, thanks for the comments!

    Jonas – Interesting. I read your blog on NDB state machine internals, but still had one question. How is the state/stack stored when it needs to relinquish control? And how is that context entered back into? I’ve thought about doing a generic event system like that, but so far using simple socket I/O notification and timer events (something based on a poll() variant) seem to the trick. As you get into more complex file I/O, I can see how this would be more useful though.

    Frazer – You could accomplish some of the items you mention fairly easily. Any per-state machine context could be held in a state management object (the thing maintaining the stack), and could automatically manage resources when the stack hits a certain level or is empty (ie, free all temporary resources). As for the other two, you could accomplish this with a richer set of stack management functions, or do some event notification bus between different state instances. Much of it depends on the application.

    Kay – Understood on the non-obvious code, but if you can reduce a lot of complexity once you grasp once concept, it may be worth it. I took a brief look at Ragel, but it doesn’t seem to have any built-in way to “pause”. For example, when I hit a blocking event and want the active thread to switch context to another state machine that can run, how do you save that context? It seems a lot of context is kept only in scope of where you are at in the goto mess. :) As far as managing the function pointer stack, you can either pre-determine your max stack size at compile time (requires some thought), or you could easily make it dynamic (starting off with a reasonable fixed-size stack). I’m not saying any of this is ideal (which is one of the reasons I wrote this post), just looking for the least-scary way of handling it.

  7. Hi Eric,

    I’m curious why you don’t use a state machine formalism like Harel Statecharts for this problem. For any state machine problem with more than a couple of states, it’s very helpful to model not only states but state transitions with activities associated with leaving states, traversing a transition, and entering a new state, possibly through outer states.

    I ran into a similar set of problems last year and wrote a library named Tungsten FSM (Finite State Machines) in Java that addresses state machine handling for Tungsten. It deals with some additional issues that come up pretty commonly like thread synchronization, error handling, and rational back-out of failed transitions. There’s a description on my blog from a while back. (http://scale-out-blog.blogspot.com/2009/03/announcing-tungsten-finite-state.html)

    Cheers, Robert

  8. Eric Day says:

    Hi Robert!

    Thanks for the link to your FSM project, it was a very interesting read (also the wiki page). While I’ve read quite a bit about formal state machines, I’ve only used them a couple times in real programming (usually for some hand-built parsers). I skipped most of that for this C protocol parsing code mostly because I wanted to keep things extremely light weight and simple, but I’m still trying to have some sanity in the design. The extra steps of building the machines and verifying them at runtime would be a few more ops than I want to spend. I looked at some state-machine generators (like the Ragel one Kay posted), but I like having the ability to edit the machine directly and not via a generator. I plan on adding pluggable states as well, so a shared library just needs to provide a function pointer of the same signature to be pushed/popped. While not ideal or the safest, they are fast, flexible, and direct.

    Perhaps this will come back to bite me later on and I’ll end up writing a C library based on the Tungsten FSM. :)

  9. Arvind says:

    Yeh, I have been into this too many a times. The way I handle this is to keep the states only for establishing socket connection and the rest protocol related stuff are not stored as program states. I use a queue of buffers for writing and a single read bufffer. The actions specific to the protocol like adding an IM contact are stateless and I use the response itself to drive the program execution.

    Amazing blog post. I will follow your blog from now on.

  10. Rory says:

    I used set_context,get_context on linux rather than threads. pretty much makes co-operative multi-threading controlled by the socket selector.

Leave a Reply


< || >
Blog
Wiki
About
Resume
RSS
Comments

E-Mail
Launchpad
LinkedIn
Twitter
identi.ca
Facebook

OpenStack
Scale Stack
Gearman
NW Veg
Veg Food & Fit

Linux On Laptops