Eric Day
Thoughts, code, and other oddments.
Cache Line Sizes and Concurrency
May 27th, 2009

We’ve been looking at high-concurrency issues with Drizzle and MySQL. Jay pointed me to this article on the concurrency issues due to shared cache lines, and I decided to run some of my own tests. The results were dramatic, and anyone who is writing multi-threaded code needs to be aware of current CPU cache line sizes and how to optimize around them. I ran my tests on two 16-core Intel machines, one with a 64 byte cache line, and one with a 128 byte cache line. First off, how did I find these values?

one:~$ cat /proc/cpuinfo | grep cache_alignment
cache_alignment : 64
...
two:~$ cat /proc/cpuinfo | grep cache_alignment
cache_alignment : 128
...

You will see one line for each CPU. If you are not familiar with /proc/cpuinfo, take a closer look at the full output. It’s a nice quick reference for other things like L2 cache sizes and CPU speed. As you can see, machine one has a 64 byte cache line size, and machine two has a 128 byte cache line size.

Next, I wrote a C program, cache_line.c, to test concurrency. This program creates a global array of counter variables and runs a variable number of threads, where each thread increments its own 4-byte counter in the array. It does so at a number of array spacing levels to see the performance when counters fall on the same cache lines. With a spacing of 1 the memory is directly adjacent, and for each spacing level it skips that many counter variables in the global array. For example, if spacing is 4, the threads would use counter[0], counter[4], counter[8], and so on, which uses a chunk of memory every 16 bytes.

The cache_line.c program outputs a CSV formatted table that you can use to generate some graphs. The second CSV output is the same set of tests without using the global array counters, and instead a local counter on the stack.
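For reference, a minimal sketch of this kind of test might look like the following. It is not the original cache_line.c. The names run_test, counter_value, MAX_THREADS, and MAX_SPACING are made up for illustration, and it assumes a POSIX system with pthreads and a C11 compiler.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

#define MAX_THREADS 16   /* assumption: matches the 16-core machines */
#define MAX_SPACING 64   /* assumption: largest spacing we allow */

/* Shared counters; thread i uses counters[i * spacing]. The array is
 * aligned to 128 bytes so that counters[0] starts a cache line on both
 * the 64 byte and the 128 byte machine. */
static _Alignas(128) volatile uint32_t counters[MAX_THREADS * MAX_SPACING];

struct worker_arg {
    volatile uint32_t *counter;  /* NULL means "use a counter on the stack" */
    uint32_t iterations;
};

static void *worker(void *p)
{
    struct worker_arg *arg = p;
    volatile uint32_t local = 0;
    /* volatile keeps the compiler from collapsing the loop into one add */
    volatile uint32_t *c = arg->counter ? arg->counter : &local;

    for (uint32_t i = 0; i < arg->iterations; i++)
        (*c)++;
    return NULL;
}

/* Runs `threads` workers, each incrementing its own counter `iterations`
 * times. Returns elapsed wall-clock seconds, or -1.0 on bad arguments or
 * if a thread could not be created. */
double run_test(int threads, int spacing, uint32_t iterations, int use_local)
{
    pthread_t tids[MAX_THREADS];
    struct worker_arg args[MAX_THREADS];
    struct timespec start, end;
    int started = 0;

    if (threads < 1 || threads > MAX_THREADS ||
        spacing < 1 || spacing > MAX_SPACING)
        return -1.0;

    for (int i = 0; i < threads; i++)
        counters[i * spacing] = 0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < threads; i++) {
        args[i].counter = use_local ? NULL : &counters[i * spacing];
        args[i].iterations = iterations;
        if (pthread_create(&tids[i], NULL, worker, &args[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &end);

    if (started != threads)
        return -1.0;
    return (double)(end.tv_sec - start.tv_sec) +
           (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Reads back the shared counter a given thread used in the last run. */
uint32_t counter_value(int thread, int spacing)
{
    assert(thread >= 0 && thread < MAX_THREADS);
    assert(spacing >= 1 && spacing <= MAX_SPACING);
    return counters[thread * spacing];
}
```

A driver would call run_test for each thread count and each spacing level (for example 1, 2, 4, and so on), print the returned times as CSV rows, and then repeat the same runs with use_local set to 1. Building with something like cc -O2 -pthread should work on Linux. Absolute times will differ from machine to machine.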
The local-counter runs are meant to provide a baseline (since those will always be on their own cache line).

The results were:

So what does this tell us? When spacing is one and all counter memory (16 threads * 4 bytes == 64 bytes) is entirely on one cache line, concurrency is poor. As we add more space between each counter variable, we start to see performance improve (faster runtime). This is because all thread counters are no longer on one cache line. On the 64 byte cache line machine, we see things really level off when spacing is 16. This is because each counter is now on its own cache line. On the 128 byte cache line machine, you can see it takes one more iteration of spacing because the cache line is twice as big.

So what can we take from this? If you have any arrays or data structures that are accessed and updated independently from different threads, make sure they are on different cache lines. This may mean wasting a little space, but as you can see, the concurrency performance is well worth it.

Posted in Drizzle, Main, MySQL

5 Responses to "Cache Line Sizes and Concurrency"
Copyright (C) Eric Day - eday@oddments.org. All content licensed under the Creative Commons Attribution 3.0 License. Hosted by Rackspace Cloud.
Hi Eric,
Thanks for the info. This is something to really keep in mind when designing concurrent systems.
I would also be interested in the effect of spacing on atomic ops, if you have some time… :)