Yet Another Language Comparison
September 22, 2010
Over the past year or so I've found myself evaluating my overall
programming experience with the languages I'm working with. I might just
be getting impatient in my old age (turning the big three-oh in a couple
months), but I like to think I'm trying to find the most efficient way
to solve the problem at hand. This has led me to learn and experiment
with a number of languages, taking a look at each one's strengths and
weaknesses. I realize programming language selection is very subjective
and folks can get quite passionate in the debate, but I'm still going to
present my personal opinions on the matter. Flame away.
The main question I'm trying to answer is: What language
will enable me to solve the problem at hand correctly and in the
fastest way possible? By correctly I mean without bugs or missing
requirements. By fastest I not only mean the initial design and coding
phases, but also maintenance. I'm a strong believer that no piece of
software is ever complete and is usually read many more times than it
is written. An application needs to be written in a way that makes it easy to
jump back into after some period of time. I've considered a number of
metrics while experimenting with each language to answer the above question
and I'm going to touch on a few big ones before digging into various
scenarios and languages. I should also preface this with the assumption
that I'm talking about community-driven open-source software. This can of
course apply to any team, open or closed, but I don't really care about
those one-off programs that never leave your hard drive. Here is a list
of things to consider while evaluating your choices:
- Don't be Different - Disregard all that hoopla about everyone
being their own unique, beautiful butterfly - sometimes it's best to
conform. If I want to hack on the Linux kernel, I'm probably going to be
doing it in C or assembly. If I want to contribute a plugin to Drupal or
WordPress, it's going to be in PHP. Even though it is technically possible
to embed some other language into a project, it's usually easiest to
take the path of least resistance. That way there is only one workflow, one set
of development tools, and less time spent context switching between
different languages. If there is some precedent for a language already,
stop and use that. The rest of this post should be applied when you have a
clean slate and can make choices without worrying about tight integration
with an existing project.
- Be Mature - I recently read this
article suggesting we need a new programming language, and while
the ideas are nice, I can't say I agree. As great as new languages like
Go may be, you never know when the plug
may be pulled on the core development team. There is always the
option of maintaining the language tools as well as your project, but
that's a lot of extra work. I like to choose languages that I know are
not going anywhere or won't be changing too drastically in the future.
- Be Popular - There are a few websites out there that try
to measure programming language popularity from various sources. Take
them with a grain of salt, but it does give you a pretty good idea of
who is hot or not, and even what the recent trends are. If a language
doesn't appear in the top 20-30 of the general lists, I usually don't
look any further. Google Trends
can be useful as well to create your own trending graphs from their data
set. Popularity is important because you want to have useful developer
tools and a community to help answer your questions. If you and a professor
at some university are the only people using a language, there is going
to be a bottleneck on resources. This is also critical for open source
projects when you want to build a developer community. Choose a language
that folks already know so they can make useful contributions and help
make your software better.
- Determine What's Important - As with many aspects of computer
science (and life), choosing a language is all about trade-offs. To
make these decisions, you need to know what limits you are going to
hit. Are you going to be bound by CPU, disk, network, user, or some other
resource? By user bound I mean your application will always be limited by
user interaction, and no hardware resource limits will ever be hit. If you
are CPU bound, you probably want a machine code or efficient byte-code
compiled language. If you are user bound, performance matters less so
you have more options. Keep in mind these limits are not isolated and
can have effects on one another. For example, some languages may make
I/O interaction really easy at the cost of space, but this may be due
to double or even triple buffering of data, causing your CPU usage to
increase too.
- Don't Guess - I found myself performing a number of
micro-benchmarks to test various aspects of the languages. How long does
it take to call a function? How much overhead is there in the concurrency
primitives? How expensive is context switching? How efficient are the
built-in string processing functions? It's best to answer these questions
by writing small programs in each language and comparing the results.
- Choose the Best Tool for the Job - Sometimes you choose a
language mainly because it has a particular library, module, or some
built-in feature suitable for your application. Most languages have the
same standard library bits, but many also have a niche where their
support for certain features is especially strong. For example, if you're going to be running on
large multi-core machines and need to share a lot of memory between threads
(so not multi-process), languages that have a global interpreter lock,
like Python, may not be a good choice. If you want a simple, integrated
webserver framework that you can customize, Python is great. C or C++
may not be the best choice in this case because you'll mostly be writing
your own. Examine what your primary feature requests are and how well
they are met by each language.
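As a rough illustration of the "Don't Guess" point above, here is a minimal sketch of a micro-benchmark using Python's standard timeit module. The names noop and time_per_call are made up for this example, and the two measurements (an empty function call and a small string join) are just stand-ins for whatever operations matter to your application. The numbers only mean something relative to the same test written in another language on the same machine.

```python
import timeit


def noop():
    """Empty function: timing calls to it approximates call overhead."""
    pass


def time_per_call(stmt, number=200_000, repeat=5, **names):
    """Return the best observed seconds per execution of stmt.

    Hypothetical helper for this sketch. Taking the minimum over several
    repeats reduces noise from other activity on the machine.
    """
    runs = timeit.repeat(stmt, globals=names, number=number, repeat=repeat)
    return min(runs) / number


if __name__ == "__main__":
    call_cost = time_per_call("f()", f=noop)
    join_cost = time_per_call("''.join(parts)", parts=["x"] * 100)
    print(f"function call:    {call_cost * 1e9:.1f} ns")
    print(f"join 100 strings: {join_cost * 1e9:.1f} ns")
```

The same pair of measurements rewritten in each candidate language gives a like-for-like comparison without having to guess.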
My Current Preferences
Below is a list of a few classes of applications and my current preference
for each. You'll notice a lack of Java in the discussion, mainly because
I've always been on the C++ side for object-oriented applications. I'd
rather put the time into a C++/STL/boost application and eliminate the
extra VM layer at runtime. C/C++ also has the benefit of being able to link
with other C/C++ libraries natively, whereas in Java you would need to write
a JNI wrapper, find the Java equivalent, or write your own native library.
- Web Applications - Early on I used Perl for all my web
programming, then I switched to PHP for a number of years, and
recently I've found myself preferring Python. It is a fantastic
language and allows you to do almost anything with the objects
at runtime (for better or worse). I've been doing some work on OpenStack and have found the WSGI standard great for building
modular web applications. Combined with an event framework like Eventlet, you don't even really need
Apache httpd. The main concerns with Python are CPU bound tasks and SMP
support because of the global interpreter lock (GIL). Most of the web
apps I write are not bound by either and most of the heavy lifting
(if any) is pushed to some other service that is more efficient
(like a database). Projects like Django and Pylons take this to
the next level providing frameworks around these basic ideas, but
if you want to keep it simple then 10 lines of Python will get you
a functioning web server and WSGI application (with a dependency on
Eventlet). The Routes and SQLAlchemy packages also provide
some very useful functionality while building your web applications.
- Scripting, Tools, and Middleware - For these types of apps,
I've mainly used Perl or a combination of shell/sed/awk, but recently
I've again found Python to be a better fit. Decent versions of Python are
standard on any system now, so you don't need to worry about customizing
or installing any dependencies to get your applications running. Again,
if there are SMP or CPU performance concerns, you might need to look at
another language.
- Shared Libraries and Drivers - These consist of libraries that
are used to provide some core functionality or other service. For example,
libz for compression or libmysql to talk with MySQL servers. You really
want the lowest common denominator so the library can easily be wrapped and
reused in a number of other languages. This means writing it in C. Python,
PHP, Perl, Erlang, Ruby, Lua, and pretty much all others have well defined
interfaces for interacting with C libraries. Projects such as SWIG even
take care of some of this interfacing work for you, allowing you to build
multiple language bindings at once. You can of course write your driver
in each language natively, but this can be a lot of work. You can probably
get away with writing the library in C++, but you'll most likely run into
more issues than if you had just used C.
- Servers - This is where most of my time has gone throughout
my career, and for about 10 years the answer was always C. I was
always trying to squeeze every bit of CPU and memory out of the servers
I was writing. In the past three years I started doing a lot more
C++ work for MySQL related projects like Drizzle, and recently I've
been experimenting with a number of alternatives. In a
previous blog post I tested performance and throughput for a few
different solutions, and while I was impressed with the higher level
languages, the C++ version still won by a good margin. In further
tests I performed more CPU-intensive calculations and the run times of the
JavaScript and Python versions went through the roof compared to C++. This was
most likely because a smaller share of the time was spent in the kernel on
I/O calls, whose cost should be about the same regardless of language. There
were two languages that did stand out in the performance tests: Go and
Erlang. Even with heavier CPU loads, they both performed quite well,
usually taking only 10-15% more time than the C or C++ equivalents. Go
is still a no-go due to its immaturity, but I think Erlang is a real
contender. I've been somewhat frustrated with C++ due to its verbosity
and nuances. For example, defining and debugging complex template code
can be a nightmare,
but it's required if you want to use the STL. When doing the same thing
in Erlang, I found myself writing more concise code with fewer bugs in
a fraction of the time. In other words, the code was almost as fast and
much more elegant than the C or C++ equivalents.
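To give a feel for how little code the WSGI approach mentioned under Web Applications needs, here is a minimal sketch. Note the swap: it uses the standard library's wsgiref server rather than Eventlet, purely so it runs with no extra dependencies. The name app, the greeting, and port 8080 are arbitrary choices for this example.

```python
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """A minimal WSGI application: every request gets a plain-text greeting."""
    body = b"Hello, world!\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


if __name__ == "__main__":
    # Serve on localhost:8080 (an arbitrary port) until interrupted.
    with make_server("127.0.0.1", 8080, app) as server:
        server.serve_forever()
```

Because app is just a callable that follows the WSGI interface, the same function should be usable unchanged under any other WSGI-compliant server, which is what makes the standard handy for modular applications.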
And the winner is...
There is of course no single winner; choose the best tool for the job. I
think the combination of C, Python, and Erlang is a good fit for a wide
variety of applications. The mental shift to a functional language may
take a bit in the case of Erlang, but I encourage you to give it a try
if you have not already. The main downside of Erlang is its popularity
(or lack thereof). It's not too far down the list, but certainly not in
the top ten. This is probably due to it being a functional language and
not having a history of general purpose applications. The popularity of
projects such as CouchDB and RabbitMQ is putting Erlang on the
map and giving developers a reason to take a closer look. If you still
need to squeeze every bit of CPU and memory out of your applications,
you'll probably need to stick with C or C++.