Eric Day: Thoughts, code, and other oddments.
Scale Stack and Database Proxy Prototype

April 8th, 2010

Back in January, when I was between jobs, I had a free weekend to do some fun hacking. I decided to start a new open source project that had been brewing in the back of my head, and since then I have been poking at it on weekends and the occasional late night. I decided to call it Scale Stack because it aims to provide a scalable network service stack. This may sound a bit generic and boring, but let me show a graph of a database proxy module I slapped together in the past couple of days:

[Graph: sysbench read-only results for MySQL with and without the Scale Stack database proxy, 1-8192 threads]

I set up MySQL 5.5.2-m2 and ran the sysbench read-only tests against it with 1-8192 threads. I then started up the database proxy module built on Scale Stack so sysbench would route through that, and you can see the concurrency improved quite a bit at higher thread counts. The database module doesn’t do much: it simply does connection concentration, mapping M to N connections, where N is a fixed parameter given at startup. In this case I always mapped all incoming sysbench connections down to 128 connections between Scale Stack and MySQL. It also uses a fixed number of threads and is entirely non-blocking.

As you can see, the max throughput around 64 threads is a bit lower through the proxy, but I’ve not done much to optimize this yet (there should be some easy improvements in places where I simply stuck in a mutex instead of using a lockless queue). It’s only a simple proof-of-concept module to see how well this would work, but it’s a start to a potentially useful module built on the other Scale Stack components. One other thing to mention is that these tests were run on a single 16-core Intel machine. I’d really like to test this with multiple machines at some point.

So, what is Scale Stack? Check out the website for a simple overview of what it is. The goal is to pick up where the operating system kernel leaves off with the network stack.
It is written in C++ and is extremely modular, with only the module loader, option parsing, and basic logging in the kernel library. It uses Monty Taylor’s pandora-build autoconf files to provide a sane modular build system, along with some modifications I made so dependency tracking is done between modules. You can actually use it to write modules that do anything; I’m just most interested in network-service-based modules. The kernel/module loader is also just a library, so you can embed it into existing applications as well.

Some of the modules I’ve written for it are a threaded event handling module based on libevent/pthreads and a TCP socket module. There is also an echo server and a simple proxy module I created while testing the event and socket modules. The database proxy module builds on top of the event and socket modules. The code is under the BSD license and is up on Launchpad, so feel free to check it out and contribute. If you need a base to build high-performance network services on, you should definitely take a look and talk with me.

What’s up next? I have a long list of things I would like to do with this, but first up are still some basics. This includes other socket type modules like TLS/SSL, UDP, and Unix sockets. Then come some more protocol modules such as Drizzle, a real MySQL protocol module, and others like HTTP, Gearman, and memcached. It’s fairly trivial to write these since the socket modules handle all buffering and provide a simple API. As for the DatabaseProxy module, I’d like to rework how things are now so it’s not MySQL protocol specific, integrate other protocol modules, improve performance, add multi-tenancy support for quality-of-service queuing based on account rules, and work through a laundry list of other features I won’t bore you with right now.
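The connection concentration idea mentioned for the database proxy (any number M of client requests sharing a fixed pool of N back-end connections) can be illustrated with a rough sketch. This is a blocking, thread-based Python illustration, not Scale Stack's non-blocking C++ code; `FakeBackend`, `Concentrator`, and `run_demo` are made-up names, and the "back-end" only echoes the query instead of talking to MySQL.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


class FakeBackend:
    """Stand-in (hypothetical) for one persistent connection to the database."""

    def __init__(self, backend_id):
        self.backend_id = backend_id
        self.queries_served = 0

    def execute(self, sql):
        # A real proxy would forward the request and read the reply here.
        self.queries_served += 1
        return f"backend {self.backend_id}: result of {sql!r}"


class Concentrator:
    """Maps any number of client requests onto a fixed pool of N back-ends."""

    def __init__(self, n_backends):
        self.backends = [FakeBackend(i) for i in range(n_backends)]
        self._idle = queue.Queue()
        for backend in self.backends:
            self._idle.put(backend)

    def handle(self, sql):
        backend = self._idle.get()   # blocks while all N back-ends are busy
        try:
            return backend.execute(sql)
        finally:
            self._idle.put(backend)  # hand the connection to the next client


def run_demo(m_clients=64, n_backends=4):
    """Send m_clients queries through a proxy that owns only n_backends connections."""
    proxy = Concentrator(n_backends)
    queries = [f"SELECT {i}" for i in range(m_clients)]
    with ThreadPoolExecutor(max_workers=m_clients) as clients:
        replies = list(clients.map(proxy.handle, queries))
    return proxy, replies
```

Calling `run_demo(64, 4)` pushes 64 client requests through 4 back-end connections. The database server only ever sees N connections, no matter how many clients connect to the proxy.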
I also have plans for other services besides a database proxy, especially one that could combine a number of protocols into a generic URI server with pluggable handlers, so you can do some interesting translations between modules (like Apache httpd, but not HTTP-centric). For example, think of the crazy things you can do with Twisted for Python, but now with a fast, threaded C++ kernel. I also still need to experiment with live reloading of modules, but I’m not sure yet whether this will be worthwhile.

If any of this sounds interesting, get in touch; I’d love to have some help! I’ll have some blog posts later on how to get started writing modules, but for now just take a look at the existing modules. The EchoServer is a good place to start since it is pretty simple. Also, if you’ll be at the MySQL Conference and Expo next week, I’d be happy to talk more about it then.

Posted in Drizzle, Main, MySQL

3 Responses to "Scale Stack and Database Proxy Prototype"
Copyright (C) Eric Day - eday@oddments.org. All content licensed under the Creative Commons Attribution 3.0 License. Hosted by Rackspace Cloud.
Eric,
Awesome. A question immediately jumped into my mind when looking at the chart: is there anything that could be done to bring the performance at lower thread counts closer together? I suspect that the gap is related to additional latency from TCP/IP round trips. Using persistent connections on the back-end of the proxy may help with that, if you are not doing that already.
For the SQL proxy use case, you might consider keeping a simple queue of active read-only SQL queries and multiplexing responses for identical ones. That way, if I have many concurrent clients running the identical query/request, they can all get the same response at the same time after a short period of blocking while one of them runs on the back-end. This should cause the back-end server to do much less concurrent work in flash crowd scenarios, and speed things up considerably. It would scale well horizontally by adding more proxies. This same approach would work well for lots of request-oriented protocols for read-only operations, such as an HTTP GET or HEAD.
Adrian
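The multiplexing idea in the comment above (run one copy of an identical read-only query and give every waiting client the same response) can be illustrated with a small sketch. This is a thread-based Python illustration under stated assumptions, not part of Scale Stack: `Coalescer`, `_Call`, `demo`, and the `on_join` hook are hypothetical names, and the hook exists only so the demo can hold the back-end call open until every other client has attached.

```python
import threading


class _Call:
    """One in-flight back-end request shared by every client that asked for it."""

    def __init__(self):
        self.done = threading.Event()
        self.result = None
        self.error = None


class Coalescer:
    """Runs identical read-only queries once and fans the reply out to all waiters."""

    def __init__(self, run_query, on_join=None):
        self._run_query = run_query   # callable that talks to the back-end
        self._on_join = on_join       # hypothetical hook, used by the demo only
        self._lock = threading.Lock()
        self._in_flight = {}          # query text -> _Call

    def execute(self, sql):
        with self._lock:
            call = self._in_flight.get(sql)
            leader = call is None
            if leader:
                call = self._in_flight[sql] = _Call()
        if leader:
            try:
                call.result = self._run_query(sql)
            except Exception as exc:  # pass failures on to every waiter
                call.error = exc
            finally:
                with self._lock:
                    del self._in_flight[sql]
                call.done.set()
        else:
            if self._on_join is not None:
                self._on_join(sql)
            call.done.wait()          # block until the leader has the reply
        if call.error is not None:
            raise call.error
        return call.result


def demo(n_clients=8):
    """Fire n_clients identical queries at once; return (backend_calls, replies)."""
    backend_calls, replies = [], []
    joined = threading.Semaphore(0)

    def slow_backend(sql):
        backend_calls.append(sql)
        for _ in range(n_clients - 1):  # hold the query open until the others join
            joined.acquire(timeout=5)
        return f"rows for {sql}"

    proxy = Coalescer(slow_backend, on_join=lambda sql: joined.release())
    threads = [
        threading.Thread(target=lambda: replies.append(proxy.execute("SELECT 1")))
        for _ in range(n_clients)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return backend_calls, replies
```

In `demo(8)`, eight clients issue the same query while the back-end function runs once. A real proxy would also need to decide which queries are safe to share (read-only, same session state), which this sketch does not attempt.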