Eric Day
Thoughts, code, and other oddments.
Threads with Events

April 20, 2010

Last week I was surprised to see this paper bubble back up on Planet MySQL. It describes the pros and cons of thread-based and event-based programming for high-concurrency applications (like a web server), and argues that thread-based programming is superior if you use an appropriate lightweight threading implementation. I don't entirely disagree, but the problem is that no such library exists that is standard, portable, and useful for all types of applications. In the portable Linux/Unix/BSD world we have POSIX threads, so that is what we need to work with. Experimental libraries based on lightweight threads or "fibers" are really interesting because they can maintain your stack without all the normal overhead, but it is hard to get the scheduling right for every type of application. I would even argue that thread-based and event-based programming are not all that different; it's just a matter of how state is maintained (on the stack vs. in state variables) and how scheduling is performed.

The comparisons in that paper also pit a C-based web server using a coroutine threading library against a Java-based server that depends on the poll() system call. I'm sorry, but this is comparing apples to oranges. First, the Java server runs inside the JVM, with a number of runtime components (like garbage collection) that may be getting in the way. Second, the standard poll() system call is not an efficient event-handling mechanism; it's much better to use epoll on Linux or a comparable kernel event-notification mechanism.

One high-concurrency userland threading implementation I do like is Erlang's. Erlang processes are extremely lightweight, and I've written apps that depend heavily on them. One interesting application I saw cached objects by giving each object its own Erlang process. This put a whole new spin on cache management, and it looked like it could actually scale reasonably well.
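Here is a minimal, Linux-only sketch of the epoll model mentioned above. It is not code from the paper or from any server discussed here, and the helper name `drain_readable` and its timeout-based exit are illustrative assumptions. With poll(), the full descriptor array is handed to the kernel and scanned on every call. With epoll, each descriptor is registered once in a kernel-side interest list, and `epoll_wait()` returns only the descriptors that are ready. A single thread can service many connections this way, keeping per-connection state in variables rather than on a stack.

```c
#define _GNU_SOURCE
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: one thread multiplexes many descriptors.
 * Registers each fd in fds[] with a fresh epoll instance, then keeps
 * calling epoll_wait() and reading whichever descriptors the kernel
 * reports as readable, until nothing becomes ready within timeout_ms.
 * Returns the total number of bytes read, or -1 on error. */
long drain_readable(const int *fds, int nfds, int timeout_ms)
{
    int ep = epoll_create1(0);
    if (ep < 0)
        return -1;

    /* The interest list lives in the kernel: register each fd once. */
    for (int i = 0; i < nfds; i++) {
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = fds[i] };
        if (epoll_ctl(ep, EPOLL_CTL_ADD, fds[i], &ev) < 0) {
            close(ep);
            return -1;
        }
    }

    long total = 0;
    struct epoll_event ready[16];
    int n;
    /* epoll_wait() hands back only the ready descriptors, not all nfds. */
    while ((n = epoll_wait(ep, ready, 16, timeout_ms)) > 0) {
        for (int i = 0; i < n; i++) {
            char buf[256];
            int fd = ready[i].data.fd;
            ssize_t r = read(fd, buf, sizeof buf);
            if (r > 0)
                total += r;
            else /* EOF or error: stop watching this descriptor */
                epoll_ctl(ep, EPOLL_CTL_DEL, fd, NULL);
        }
    }
    close(ep);
    return n < 0 ? -1 : total;
}
```

In a real server, the read branch would dispatch to a per-connection handler instead of just counting bytes.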
The "problem" with Erlang, which may or may not be a problem depending on your requirements, is that running bytecode in a VM still carries some overhead, and that it is a functional language. I love functional programming, but I've found it still ties most developers' heads in knots if they don't have a reason to use it regularly. For open source projects trying to build a contributor community, it can be one more hurdle.

So, what is the "best" paradigm?

Back in 2000, some colleagues and I wrote a hybrid thread/event library that created one event-handler instance per thread and spread connections across the pool of event-handling threads. I believe this gave us the best of both worlds, and I saw high throughput with fairly minimal overhead. I wrote a number of servers on this architecture, including HTTP, IMAP, POP3, and DNS servers, and for each server type the model proved efficient and scalable.

Ultimately, the best architecture depends on your application. If you never intend to have many connections and your application has long-running computations, one thread per connection is probably best. If you need to handle large numbers of connections and request processing is short and non-blocking, the event-based model scales extremely well. You can of course build a hybrid of the two, with all connections managed by event threads and asynchronous queues handing heavy requests off to dedicated processing threads (this is roughly what I did in the C Gearman Job Server). There is no single correct answer, so look at your options before deciding how to approach your own applications, and don't be afraid to create hybrids.

Regardless of which paradigm you choose, concurrent programming can be hard, especially at the lower levels. A number of higher-level abstractions, from new libraries to new languages, have appeared to help developers, but most of them come at a cost in performance or flexibility.
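The one-event-handler-per-thread model described above can be sketched roughly as follows. This is a minimal illustration assuming Linux epoll, POSIX threads and C11 atomics. It is not the code of the 2000 library or of Scale Stack, neither of which is shown here. The names `event_thread`, `pool_start`, `pool_assign` and `pool_stop` are hypothetical, and the round-robin spreading of connections is an assumption. Each thread owns a private epoll instance, and whichever thread accepts connections hands each new descriptor to one of them.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <sys/epoll.h>
#include <unistd.h>

#define MAX_EVENTS 32

/* Hypothetical: one event handler (epoll instance) owned by one thread. */
struct event_thread {
    pthread_t tid;
    int epfd;
    atomic_bool stop;
    atomic_long bytes; /* bytes handled by this thread, for illustration */
};

/* Each thread runs its own event loop over the connections it owns. */
static void *event_thread_main(void *arg)
{
    struct event_thread *t = arg;
    struct epoll_event ready[MAX_EVENTS];
    while (!atomic_load(&t->stop)) {
        /* 20 ms timeout so the stop flag is rechecked periodically. */
        int n = epoll_wait(t->epfd, ready, MAX_EVENTS, 20);
        for (int i = 0; i < n; i++) {
            char buf[512];
            int fd = ready[i].data.fd;
            ssize_t r = read(fd, buf, sizeof buf);
            if (r > 0)
                atomic_fetch_add(&t->bytes, r);
            else /* EOF or error: stop watching this connection */
                epoll_ctl(t->epfd, EPOLL_CTL_DEL, fd, NULL);
        }
    }
    return NULL;
}

/* Create one epoll instance and one thread per pool slot. */
int pool_start(struct event_thread *pool, int nthreads)
{
    for (int i = 0; i < nthreads; i++) {
        pool[i].epfd = epoll_create1(0);
        if (pool[i].epfd < 0)
            return -1;
        atomic_init(&pool[i].stop, false);
        atomic_init(&pool[i].bytes, 0);
        if (pthread_create(&pool[i].tid, NULL, event_thread_main, &pool[i]) != 0)
            return -1;
    }
    return 0;
}

/* Assign connection number conn_no to a thread round-robin.
 * epoll_ctl() may be called while the owner blocks in epoll_wait().
 * Returns the chosen thread index, or -1 on error. */
int pool_assign(struct event_thread *pool, int nthreads, unsigned conn_no, int fd)
{
    int idx = (int)(conn_no % (unsigned)nthreads);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    if (epoll_ctl(pool[idx].epfd, EPOLL_CTL_ADD, fd, &ev) < 0)
        return -1;
    return idx;
}

/* Signal every thread to stop, then join them and release their epoll fds. */
void pool_stop(struct event_thread *pool, int nthreads)
{
    for (int i = 0; i < nthreads; i++)
        atomic_store(&pool[i].stop, true);
    for (int i = 0; i < nthreads; i++) {
        pthread_join(pool[i].tid, NULL);
        close(pool[i].epfd);
    }
}
```

In a server, an accept loop would call `pool_assign()` for each accepted socket, and each thread would parse requests instead of counting bytes. The asynchronous hand-off of heavy requests to dedicated processing threads is left out to keep the sketch short.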
When you need to squeeze every bit of performance out of your application, you will most likely end up in C or C++ dealing with these issues directly. This is one of the problems I'm attempting to address with the Scale Stack Event modules. I'm trying to provide a healthy level of abstraction for hybrid thread/event applications that takes care of many of the common headaches without adding overhead or limitations. If you need such a system, get in touch; I'd be interested to talk. Since it is BSD licensed, you can use it in any application, including commercial ones.
© 2011 Eric Day - eday@oddments.org
All content licensed under the Creative Commons Attribution 3.0 License.