Eric Day

Thoughts, code, and other oddments.

Yet Another Language Comparison

September 22, 2010

Over the past year or so I've found myself evaluating my overall programming experience with the languages I'm working with. I might just be getting impatient in my old age (turning the big three-oh in a couple months), but I like to think I'm trying to find the most efficient way to solve the problem at hand. This has led me to learn and experiment with a number of languages, taking a look at each one's strengths and weaknesses. I realize programming language selection is very subjective and folks can get quite passionate in the debate, but I'm still going to present my personal opinions on the matter. Flame away.

The main question that I'm trying to answer is: What language will enable me to solve the problem at hand correctly and in the fastest way possible? By correctly I mean without bugs or missing requirements. By fastest I mean not only the initial design and coding phases, but also maintenance. I'm a strong believer that no piece of software is ever complete and that code is usually read many more times than it is written. An application needs to be written in a way that makes it easy to jump back into after some period of time. I've considered a number of metrics while experimenting with each language to answer the above question and I'm going to touch on a few big ones before digging into various scenarios and languages. I should also prefix this with the assumption that I'm talking about community-driven open-source software. This can of course apply to any team, open or closed, but I don't really care about those one-off programs that never leave your hard-drive. Here is a list of things to consider while evaluating your choices:

  • Don't be Different - Disregard all that hoopla about everyone being their own unique, beautiful butterfly - sometimes it's best to conform. If I want to hack on the Linux kernel, I'm probably going to be doing it in C or assembly. If I want to contribute a plugin to Drupal or WordPress, it's going to be in PHP. Even though it is technically possible to embed some other language into a project, it's usually easiest to take the path of least resistance. Sticking with the existing language means there is only one workflow, one set of development tools, and less time spent context switching between different languages. If there is some precedent for a language already, stop and use that. The rest of this post should be applied when you have a clean slate and can make choices without worrying about tight integration with an existing project.

  • Be Mature - I recently read this article suggesting we need a new programming language, and while the ideas are nice, I can't say I agree. As great as new languages like Go may be, you never know when the plug may be pulled on the core development team. There is always the option of maintaining the language tools as well as your project, but that's a lot of extra work. I like to choose languages that I know are not going anywhere or won't be changing too drastically in the future.

  • Be Popular - There are a few websites out there that try to measure programming language popularity from various sources. Take them with a grain of salt, but it does give you a pretty good idea of who is hot or not, and even what the recent trends are. If a language doesn't appear in the top 20-30 of the general lists, I usually don't look any further. Google Trends can be useful as well to create your own trending graphs from their data set. Popularity is important because you want to have useful developer tools and a community to help answer your questions. If you and a professor at some university are the only people using a language, there is going to be a bottleneck on resources. This is also critical for open source projects when you want to build a developer community. Choose a language that folks already know so they can make useful contributions and help make your software better.

  • Determine What's Important - As with many aspects of computer science (and life), choosing a language is all about trade-offs. To make these decisions, you need to know what limits you are going to hit. Are you going to be bound by CPU, disk, network, user, or some other resource? By user bound I mean your application will always be limited by user interaction, and no hardware resource limits will ever be hit. If you are CPU bound, you probably want a machine code or efficient byte-code compiled language. If you are user bound, performance matters less so you have more options. Keep in mind these limits are not isolated and can have effects on one another. For example, some languages may make I/O interaction really easy at the cost of space, but this may be due to double or even triple buffering of data, causing your CPU usage to increase too.

  • Don't Guess - I found myself performing a number of micro-benchmarks to test various aspects of the languages. How long does it take to call a function? How much overhead is there in the concurrency primitives? How expensive is context switching? How efficient are the built-in string processing functions? It's best to answer these questions by writing small programs in each language and comparing the results.

  • Choose the Best Tool for the Job - Sometimes you choose a language mainly because it has a particular library, module, or some built-in feature suitable for your application. Most languages have the same standard library bits, but many have a few niche uses and have great support for certain features. For example, if you're going to be running on large multi-core machines and need to share a lot of memory between threads (so not multi-process), languages that have a single interpreter lock like Python may not be a good choice. If you want a simple, integrated webserver framework that you can customize, Python is great. C or C++ may not be the best choice in this case because you'll mostly be writing your own. Examine what your primary feature requests are and how well they are met by each language.
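As a rough illustration of the "Don't Guess" point, a micro-benchmark of this kind can be only a few lines long. The sketch below uses Python's standard timeit module to estimate function-call overhead and to compare two ways of building a string. The helper names (noop, per_call, run_benchmarks) and the workloads are made up for this example, and the numbers will differ from machine to machine.

```python
import timeit


def noop():
    """Empty function used to estimate bare call overhead."""


def join_words(words):
    return " ".join(words)


def concat_words(words):
    out = ""
    for word in words:
        out += word + " "
    return out


def per_call(func, *args, number=20_000, repeat=3):
    """Best-of-`repeat` time in seconds for a single call of func(*args).

    The lambda wrapper adds one extra call to every measurement, so the
    results are best read relative to each other.
    """
    timings = timeit.repeat(lambda: func(*args), number=number, repeat=repeat)
    return min(timings) / number


def run_benchmarks():
    words = ["spam"] * 100  # hypothetical workload
    return {
        "empty function call": per_call(noop),
        "str.join, 100 words": per_call(join_words, words),
        "+= in a loop, 100 words": per_call(concat_words, words),
    }


if __name__ == "__main__":
    for name, seconds in run_benchmarks().items():
        print("%-24s %8.1f ns per call" % (name, seconds * 1e9))
```

Writing the same handful of cases in each candidate language and comparing the best-of-N times is usually enough to replace a guess with a measurement.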

My Current Preferences


Below is a list of a few classes of applications and my current preference for each. You'll notice a lack of Java in the discussion, mainly because I've always been on the C++ side for object-oriented applications. I'd rather put the time into a C++/STL/Boost application and eliminate the extra VM layer at runtime. C/C++ also has the benefit of being able to link with other C/C++ libraries natively, whereas in Java you would need to write a JNI wrapper, find the Java equivalent, or write your own native library.

  • Web Applications - Early on I used Perl for all my web programming, then I switched to PHP for a number of years, and recently I've found myself preferring Python. It is a fantastic language and allows you to do almost anything with the objects at runtime (for better or worse). I've been doing some work on OpenStack and have found the WSGI standard great for building modular web applications. Combined with an event framework like Eventlet, you don't even really need Apache httpd. The main concerns with Python are CPU bound tasks and SMP support because of the global interpreter lock (GIL). Most of the web apps I write are not bound by either and most of the heavy lifting (if any) is pushed to some other service that is more efficient (like a database). Projects like Django and Pylons take this to the next level, providing frameworks around these basic ideas, but if you want to keep it simple then 10 lines of Python will get you a functioning web server and WSGI application (with a dependency on Eventlet). The Routes and SQLAlchemy packages also provide some very useful functionality while building your web applications.

  • Scripting, Tools, and Middleware - For these types of apps, I've mainly used Perl or a combination of shell/sed/awk, but recently I've again found Python to be a better fit. Decent versions of Python are standard on any system now, so you don't need to worry about customizing or installing any dependencies to get your applications running. Again, if there are SMP or CPU performance concerns, you might need to look at another language.

  • Shared Libraries and Drivers - These consist of libraries that are used to provide some core functionality or other service. For example, libz for compression or libmysql to talk with MySQL servers. You really want the lowest common denominator so the library can easily be wrapped and reused in a number of other languages. This means writing it in C. Python, PHP, Perl, Erlang, Ruby, Lua, and pretty much all others have well-defined interfaces for interacting with C libraries. Projects such as SWIG even take care of some of this interfacing work for you, allowing you to build multiple language bindings at once. You can of course write your driver in each language natively, but this can be a lot of work. You can probably get away with writing the library in C++, but you'll most likely run into more issues than if you had just used C.

  • Servers - This is where most of my time has gone throughout my career, and for about 10 years the answer was always C. I was always trying to squeeze every bit of CPU and memory out of the servers I was writing. In the past three years I started doing a lot more C++ work for MySQL related projects like Drizzle, and recently I've been experimenting with a number of alternatives. In a previous blog post I tested performance and throughput for a few different solutions, and while I was impressed with the higher level languages, the C++ version still won by a good margin. In further tests I performed more CPU-intensive calculations and the times for the JavaScript and Python versions went through the roof compared to C++. This was most likely due to less time being spent in the kernel for the I/O calls, which should be about the same regardless of language. There were two languages that did stand out in the performance tests: Go and Erlang. Even with heavier CPU loads, they both performed quite well, usually taking only 10-15% more time than the C or C++ equivalents. Go is still a no-go due to its immaturity, but I think Erlang is a real contender. I've been somewhat frustrated with C++ due to its verbosity and nuances. For example, defining and debugging complex template code can be a nightmare, but it's required if you want to use the STL. When doing the same thing in Erlang, I found myself writing more concise code with fewer bugs in a fraction of the time. In other words, the code was almost as fast and much more elegant than the C or C++ equivalents.
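To give a feel for how little code a WSGI application needs, here is a minimal sketch. It is served with the standard library's wsgiref rather than Eventlet so that it has no dependencies. The names app and serve, the port, and the greeting are all arbitrary choices for this example, and the Eventlet lines in the comment assume that package is installed.

```python
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """Minimal WSGI application: greet the caller with the requested path."""
    body = ("Hello from %s\n" % environ.get("PATH_INFO", "/")).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve(host="127.0.0.1", port=8080):
    """Serve app until interrupted (standard library server, not Eventlet)."""
    # Assumption: with Eventlet installed, the equivalent would be
    #   import eventlet
    #   from eventlet import wsgi
    #   wsgi.server(eventlet.listen((host, port)), app)
    with make_server(host, port, app) as httpd:
        httpd.serve_forever()
```

Calling serve() and requesting a path such as /ping from the chosen port should return the short greeting. Because app is a plain WSGI callable, the same function can be handed to a different server without changes.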

And the winner is...


There is of course no single winner; choose the best tool for the job. I think the combination of C, Python, and Erlang is a good fit for a wide variety of applications. The mental shift to a functional language may take a bit in the case of Erlang, but I encourage you to give it a try if you have not already. The main downside of Erlang is its popularity (or lack thereof). It's not too far down the list, but certainly not in the top ten. This is probably due to it being a functional language and not having a history of general purpose applications. The popularity of projects such as CouchDB and RabbitMQ is putting Erlang on the map and giving developers a reason to take a closer look. If you still need to squeeze every bit of CPU and memory out of your applications, you'll probably need to stick with C or C++.



Scale Stack vs node.js vs Twisted vs Eventlet

July 28, 2010

We've been discussing switching from Tornado to either Twisted or Eventlet for Nova (the compute project for OpenStack), so I decided to set up a test to see if there are performance differences to take into consideration. While I was at it I decided to include node.js since that's all the rage these days, as well as Scale Stack, a C++ project I started earlier this year.

The Test


I wanted to check for two main factors: handling of large numbers of concurrent connections and the overhead of transferring large amounts of data. To do this I wrote a simple echo server in each framework and then used the Scale Stack echo flood tool to test each one. The tool allows you to specify the number of concurrent connections and how much data to send and verify in 32k chunks. You can find the echo server and flood tool for Scale Stack in the project source code. For each of the others, here is the echo server source:

node.js

var net = require('net');
// Echo every chunk of data straight back to the sender.
net.createServer(function (socket) {
  socket.on("data", function (data) {
    socket.write(data);
  });
  // Close our side once the client has finished sending.
  socket.on("end", function () {
    socket.end();
  });
}, {backlog: 32768}).listen(12345, "localhost");

Twisted

from twisted.internet.protocol import Protocol, Factory
# Install the epoll reactor before the default reactor gets imported.
from twisted.internet import epollreactor
epollreactor.install()
from twisted.internet import reactor

class Echo(Protocol):
    def dataReceived(self, data):
        # Write each received chunk straight back to the client.
        self.transport.write(data)

factory = Factory()
factory.protocol = Echo
reactor.listenTCP(12345, factory, backlog=32768)
reactor.run()

Eventlet

import eventlet

def handle(fd):
  # Echo until the client closes the connection (recv returns nothing).
  while True:
    c = fd.recv(16384)
    if not c: break
    fd.sendall(c)

server = eventlet.listen(('0.0.0.0', 12345), backlog=32768)
pool = eventlet.GreenPool(size=32768)
# Hand each accepted connection to its own green thread.
while True:
  new_sock, address = server.accept()
  pool.spawn_n(handle, new_sock)

Setup


Since none of the frameworks run multi-core for this test (although Scale Stack could), I decided to use my laptop, which is a 2.4GHz Core 2 Duo with 4GB of memory running Ubuntu 10.04. There will be one core for the server, and one for the client. Doing the test on a single machine also lets us cut network bottlenecks out of the picture since it all runs through the local interface. In order to test at the high connection counts, I needed to tweak some system limits. I allow for 64k file descriptors per process in /etc/security/limits.conf:
root             soft    nofile          65535
root             hard    nofile          65535
*                soft    nofile          65535
*                hard    nofile          65535
You'll notice really high listen backlog settings for the echo server code above. The kernel limits need to match this as well, so we need to set these new limits in /proc. I also increased the ephemeral port range so we can get up to 32k active client connections and reduced the kernel socket buffer sizes so I don't run out of memory. These can be set with:
echo 32768 > /proc/sys/net/core/netdev_max_backlog
echo 32768 > /proc/sys/net/core/somaxconn
echo "21000 61000" > /proc/sys/net/ipv4/ip_local_port_range
echo 8192 > /proc/sys/net/core/rmem_default
echo 8192 > /proc/sys/net/core/wmem_default
With the system limits set, I started running the flood tool with connection counts from 1 through 32k. For each connection count, I ran the test with the connection echoing 32k of data and 512k of data. I ran each test three times for each server and took the lowest time (times were very consistent across the board, so any sample would have done).
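For readers who want to reproduce the procedure without building Scale Stack, the measurement loop can be sketched in Python. This is not the Scale Stack echo flood tool, only a hypothetical stand-in that follows the same description: open a given number of concurrent connections, send data in 32k chunks, verify each echo, and keep the lowest time of several runs. All function names are invented for the sketch, and a small asyncio echo server is included purely so it can run on its own.

```python
import asyncio
import os
import time

CHUNK = 32 * 1024  # send and verify in 32k chunks, as the post describes


async def echo_check(host, port, total_bytes):
    """Open one connection, send total_bytes in chunks, verify each echo."""
    reader, writer = await asyncio.open_connection(host, port)
    try:
        remaining = total_bytes
        while remaining > 0:
            chunk = os.urandom(min(CHUNK, remaining))
            writer.write(chunk)
            await writer.drain()
            if await reader.readexactly(len(chunk)) != chunk:
                raise ValueError("echoed data does not match what was sent")
            remaining -= len(chunk)
    finally:
        writer.close()
        await writer.wait_closed()


async def flood(host, port, connections, total_bytes):
    """Run all connections concurrently and return the elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(
        *(echo_check(host, port, total_bytes) for _ in range(connections))
    )
    return time.perf_counter() - start


async def _echo(reader, writer):
    # Stand-in echo server so the sketch runs without any framework.
    while data := await reader.read(CHUNK):
        writer.write(data)
        await writer.drain()
    writer.close()


async def local_run(connections, total_bytes, runs=3):
    """Flood a throwaway local echo server and keep the lowest time."""
    server = await asyncio.start_server(_echo, "127.0.0.1", 0)
    host, port = server.sockets[0].getsockname()[:2]
    try:
        return min([await flood(host, port, connections, total_bytes)
                    for _ in range(runs)])
    finally:
        server.close()


if __name__ == "__main__":
    for count in (1, 8, 64):
        print(count, "connections:", asyncio.run(local_run(count, CHUNK)))
```

To test a real server, flood() can be pointed at its host and port instead of the local stand-in. Since client and stand-in server share one event loop here, the printed times say nothing about any of the frameworks above.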

Results



Graph of the results listed below.


Connections           1      2      4      8     16     32     64    128    256    512   1024   2048   4096   8192  16384  32768
Scale Stack 32k     .05    .05    .05    .05    .06    .06    .07    .10    .14    .20    .32    .52    .91   1.72   3.49   6.83
node.js 32k         .05    .05    .05    .05    .06    .06    .08    .10    .14    .20    .35    .67   1.30   2.43   5.14  10.11
Twisted 32k         .05    .05    .05    .06    .06    .06    .08    .10    .14    .24    .39    .73   1.43   2.71   5.33  10.54
Eventlet 32k        .05    .05    .05    .05    .06    .06    .07    .10    .14    .20    .32    .62   1.15   2.26   4.60   9.30
Scale Stack 512k    .05    .05    .06    .06    .08    .10    .16    .25    .45    .82   1.59   3.13   6.34  12.56  24.97  50.25
node.js 512k        .05    .08    .10    .12    .16    .22    .29    .49    .85   1.51   3.10   6.32  12.61  26.75  58.56 117.04
Twisted 512k        .05    .06    .06    .07    .10    .13    .22    .43    .82   1.63   3.24   6.22  11.48  22.72  44.48  89.46
Eventlet 512k       .06    .06    .06    .07    .10    .12    .21    .35    .69   1.29   2.61   4.91   9.68  19.55  38.62  77.80


After the above tests, I also started each server up one at a time and ran a 32k connection client that sent data indefinitely to saturate the process. Here are the vmstat numbers of my system during these tests:

             Context Switches  User %  System %  Idle %  Client Delay (s)
Scale Stack               700       7        93       0              6.30
node.js                    9k      31        40      29              7.07
Twisted                   13k      27        45      28              8.93
Eventlet                  24k      28        49      23              5.15


In all cases the server process was consuming an entire core. The idle times were on the core running the client tool, since the server could not always keep up with the client load. The last column labeled "Client Delay" was another time test I ran while the server was saturated to measure response time. For this test, a client would connect, send 32k of data, wait for the echo response, and then disconnect. Results are in seconds for this test.

Conclusions


I was very impressed with how node.js and the Python frameworks held up. I've been writing event-driven servers in C/C++ for the past decade or so and didn't think the higher level languages could handle this kind of load as well as they did. My only concern with node.js or Python is not being able to use all the cores on your system. Some services are well suited to run multiple server processes on a single machine or to farm work out to worker process pools to utilize all your cores, so this will be less of an issue. Other services are best implemented when all connections are in a single process and use thread pools instead. For that you'll still need to rely on a C or C++ based server (Scale Stack is meant to be a framework like the others to help in these cases). Servers written in Erlang or Java would probably perform decently across multiple cores as well.

For short-lived connections transferring less than 32k of data, all frameworks scaled very well. When a larger amount of data was being sent we started to see some differentiation. This could be due to buffering techniques or simply the overhead of calling into the language handlers more often. The increase in user % in the processor utilization from the vmstat output for node.js and Python supports this. Scale Stack only buffers once on read and has less runtime overhead since it is not running in an interpreter. It may be possible to optimize the node.js and Python servers to avoid double buffering if that is indeed happening; please let me know if that is the case.

As far as the original question of Twisted vs Eventlet, I don't think performance will be much of a deciding factor. Eventlet has a slight boost in performance and claims to be easier to write services in, but other folks still swear by Twisted. It is probably safe to say that available framework features and personal preference will be the deciding factors.

Update - August 6, 2010


I decided to run a few more versions for just the 32k connection, 512k data test. Below are the repeated times for the original four, plus Erlang, regular Python threads, and two versions of Go.

  • Scale Stack 50.25
  • node.js 117.04
  • Twisted 89.46
  • Eventlet 77.80
  • Erlang 61.65
  • Python threads 111.04 (lots of memory even with minimal stack size)
  • Go v1 62.95
  • Go v2 59.73

The Go version is very impressive, almost as fast as the C++ version. Of course, with these last four you get SMP without any extra work, which is a bonus. It turns out the default socket buffer sizes in Erlang are only 1500 bytes (MTU size), so be sure to push these up (in this test I set them to 16k). Memory consumption with the Erlang server was also fairly low (peak around 400M, usually around 150M).



Threads with Events

April 20, 2010

Last week I was surprised to see this paper bubble back up on Planet MySQL. It describes the pros and cons of thread and event based programming for high concurrency applications (like a web server), arguing that thread-based programming is superior if you use an appropriate lightweight threading implementation. I don't entirely disagree with this, but the problem is that no such library exists that is standard, portable, and useful for all types of applications. We have POSIX threads in the portable Linux/Unix/BSD world, so we need to work with this. Other experimental libraries based on lightweight threads or "fibers" are really interesting as they can maintain your stack without all the normal overhead, but it is hard to get the scheduling correct for all application types. I would even argue that thread and event based programming are actually not all that different; it's just a matter of how state is maintained (stack vs state variables) and how scheduling is performed.

The comparisons done in that paper also put a C-based web server using a co-routine threading library against a Java based server that depends on the poll() system call. I'm sorry, but this is comparing apples to oranges. First, you're in the Java VM with a number of runtime components (like garbage collection) which may be getting in the way. Also, the standard poll() system call is not an efficient event-handling mechanism; it's much better to use epoll or some other kernel-based handling mechanism.
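As a small illustration of picking the platform's kernel-based mechanism instead of hard-coding poll(), Python's standard selectors module exposes DefaultSelector, which resolves to epoll on Linux and kqueue on BSD and macOS. The sketch below registers one end of a socket pair and waits for it to become readable. The function name wait_for_data and the "right end" label are made up for the example.

```python
import selectors
import socket


def wait_for_data(timeout=1.0):
    """Register one socket for reads and report which ones become ready."""
    # DefaultSelector picks the most efficient mechanism the platform offers
    # (epoll on Linux, kqueue on BSD/macOS), falling back to poll or select.
    selector = selectors.DefaultSelector()
    left, right = socket.socketpair()
    try:
        selector.register(right, selectors.EVENT_READ, data="right end")
        left.sendall(b"ping")
        ready = [(key.data, key.fileobj.recv(16))
                 for key, _events in selector.select(timeout)]
        return type(selector).__name__, ready
    finally:
        selector.close()
        left.close()
        right.close()


if __name__ == "__main__":
    print(wait_for_data())
```

The first element of the result names the selector class that was chosen, which is a quick way to check what a given system is actually using.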

One high-concurrency userland threading implementation I do like is in Erlang. Erlang processes are extremely lightweight and I've written apps that depend heavily on them. One interesting application I saw was caching objects where each object got its own Erlang process. This put a whole new spin on cache management, and it looked like it could actually scale reasonably well. The "problem" with Erlang, which may or may not be a problem depending on your requirements, is that there is still a bit of overhead running byte-code in a VM, and that it is a functional language. I love functional programming, but I've found it still ties most developers' heads in knots if they don't have a reason to use it regularly. For open source projects trying to build a contributor community, it can act as one more hurdle.

So, what is the "best" paradigm?


Back in 2000 some colleagues and I wrote a hybrid thread-event library that would create one event-handler instance per thread, and connections would be spread across the pool of event-handling threads. I believe this gave the best of both worlds, and I saw high throughputs with fairly minimal overhead. I wrote a number of servers based on this architecture, including HTTP, IMAP, POP3, and DNS, and with each server type this model proved to be efficient and scalable. Ultimately the best architecture depends on your application. If you never intend to have many connections, and your application has long-running computations, one-thread-per-connection would probably be best. If you need to handle large numbers of connections and have short, non-blocking request processing, event-based scales extremely well. You can of course create a hybrid of these two and have all connections managed by event threads and asynchronous queues to dedicated processing threads for heavy request handling (this is sort of what I did in the C Gearman Job Server).
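The hybrid layout described above (one event handler per thread, with connections spread across the pool) can be sketched in a few dozen lines of Python. This is only an illustration of the shape of the design, not the library from 2000 and not Scale Stack. The class and function names are invented, it echoes data rather than speaking a real protocol, writes are blocking to keep it short, and it assumes a POSIX-like system.

```python
import itertools
import queue
import selectors
import socket
import threading


class EventThread(threading.Thread):
    """One thread owning one selector (event handler) and its connections."""

    def __init__(self):
        super().__init__(daemon=True)
        self.selector = selectors.DefaultSelector()
        self.pending = queue.Queue()  # connections handed over by the acceptor

    def run(self):
        while True:
            # Adopt connections assigned to this thread since the last pass.
            while not self.pending.empty():
                self.selector.register(self.pending.get(), selectors.EVENT_READ)
            # Short timeout so newly assigned connections are noticed quickly.
            for key, _events in self.selector.select(timeout=0.05):
                self.echo(key.fileobj)

    def echo(self, conn):
        try:
            data = conn.recv(16384)
            if data:
                conn.sendall(data)  # blocking write keeps the sketch short
                return
        except OSError:
            pass  # treat any socket error like a disconnect
        self.selector.unregister(conn)
        conn.close()


def start_server(num_threads=2, host="127.0.0.1", port=0):
    """Start the event threads plus an acceptor; return the bound address."""
    listener = socket.create_server((host, port))
    workers = [EventThread() for _ in range(num_threads)]
    for worker in workers:
        worker.start()

    def accept_loop():
        # Spread incoming connections round-robin across the event threads.
        for n in itertools.count():
            conn, _addr = listener.accept()
            workers[n % num_threads].pending.put(conn)

    threading.Thread(target=accept_loop, daemon=True).start()
    return listener.getsockname()


if __name__ == "__main__":
    address = start_server()
    with socket.create_connection(address) as client:
        client.sendall(b"hello")
        print(client.recv(16384))
```

Each connection lives on exactly one thread for its whole lifetime, so per-connection state needs no locking. In Python the global interpreter lock still limits how much the extra threads help for CPU-bound handlers, so the sketch shows the structure rather than the performance of the approach.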

There is no single correct answer, so take a look at your options before deciding how to approach your own applications. Don't be afraid to create hybrids as well. Regardless of which paradigm you choose, concurrent programming can be hard, especially at the lower levels. There have been a number of higher level abstractions to help developers, from new libraries to new languages, but most of these come with a cost in performance or flexibility. When you need to squeeze every bit of performance out of your application, you will most likely end up in C or C++ dealing with these issues directly.

This is actually one of the problems I'm attempting to address with the Scale Stack Event modules. I'm trying to create a healthy level of abstraction on hybrid thread/event based applications so you don't have any overhead or limitations while a lot of the common headaches are taken care of for you. If you have a need for such a system, get in touch; I'd be interested to talk. Since it is BSD licensed you can use it in any application, including commercial.


