Eric Day

Thoughts, code, and other oddments.
Dark | Light

< || >

Drizzle Regression Hunting

June 9th, 2009

We’ve been looking for a Drizzle regression for some time now, and today I decided I would take a step back and make another attempt to find it. The first step in doing this was to reproduce this consistently and find a baseline. We’ve noticed it most dramatically with a 16 concurrent connection test from sysbench in read-only mode. I used two 16-core Intel machines running Linux we have for development. We’ve noticed the regression on certain machines but not all, and these two machines provided one of each. I also setup a MySQL 5.1.35 server to use as a baseline to give some comparisons outside of Drizzle. So first, a few more details on the machines:

Machine 1: 16 core, 16GB RAM, cache sizes from dmesg:
[    0.010000] CPU: L1 I cache: 32K, L1 D cache: 32K
[    0.010000] CPU: L2 cache: 4096K
From /proc/cpuinfo:
cache_alignment : 64

Machine 2: 16 core, 40GB RAM, cache sizes from dmesg:
[    0.010000] CPU: Trace cache: 12K uops, L1 D cache: 16K
[    0.010000] CPU: L2 cache: 1024K
[    0.010000] CPU: L3 cache: 16384K
From /proc/cpuinfo
cache_alignment : 128

For Drizzle I used the latest trunk in Launchpad (r1058), and for MySQL I downloaded mysql-5.1.35-linux-x86_64-glibc23.tar.gz from mysql.com.

For sysbench, I grabbed the Drizzle branch of it at lp:~drizzle-developers/sysbench/trunk since this has the libdrizzle driver. The libdrizzle driver also supports the MySQL so I use it to test against both. The sysbench commands I used were:

Drizzle: sysbench –test=oltp –oltp-read-only=on –max-time=15 –max-requests=0 –oltp-table-size=1000000 –num-threads=16 –db-ps-mode=disable –db-driver=drizzle –drizzle-host=127.0.0.1 –drizzle-port=4427 –drizzle-db=test –drizzle-user=root –drizzle-table-engine=innodb run

MySQL: sysbench –test=oltp –oltp-read-only=on –max-time=15 –max-requests=0 –oltp-table-size=1000000 –num-threads=16 –db-ps-mode=disable –db-driver=drizzle –drizzle-host=127.0.0.1 –drizzle-port=3306 –drizzle-db=test –drizzle-user=root –drizzle-table-engine=innodb –drizzle-mysql=on run

I started Drizzle and MySQL with the following options. These are not meant to be finely tuned options, but just enough to get the servers running with some sane comparable defaults and able to reproduce the regression.

bin/mysqld –no-defaults –server-id=1 –port=3306 –socket=/home/eday/other/mysql.data/sock.master –basedir=/home/eday/other/mysql –datadir=/home/eday/other/mysql.data/db.master –log-error=/home/eday/other/mysql.data/db.master/error –innodb-buffer-pool-size=128M –innodb-log-file-size=64M –innodb-log-buffer-size=8M –innodb-thread-concurrency=0 –innodb-additional-mem-pool-size=16M –character-set-server=utf8 –table-open-cache=4096 –open-files-limit=4096 –pid-file=/home/eday/other/mysql.data/db.master/pid

drizzled –datadir=/home/eday/other/drizzle.data –innodb-buffer-pool-size=128M –innodb-log-file-size=64M –innodb-log-buffer-size=8M –innodb-thread-concurrency=0 –innodb-additional-mem-pool-size=16M –table-open-cache=4096 –table-definition-cache=4096

Now with everything up and running, I gathered some data. Headings are: -

  1-drizzle 1-mysql 2-drizzle 2-mysql
TPS 1335 2434 1559 1239
  vmstat
in 6k 110k 60k 50k
cs 100k 210k 120k 100k
us 22 75 72 78
sy 6 20 25 18
id 72 5 3 4
wa 0 0 0 0
  valgrind with cachegrind tool
TPS 5.21 3.15 3.55 1.96
iref 858M 1011M 668M 789M

As you can see, we hit the major regression in column one. Our interrupts and context switches are way out of line, and the CPU is mostly idle. Note though that when run under cachegrind (valgrind –tool=callgrind ), we see the normal pattern and don’t notice the regression. This means to reproduce we can’t have any intrusive debugging tools. I also tried counting system calls as a sanity check and found (using strace -fc ):

1-drizzle: 402 TPS
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 55.47  106.807940        2867     37252     11127 futex
 13.97   26.907878      220556       122           select
  5.67   10.913999    10913999         1           rt_sigtimedwait
  5.29   10.181982      565666        18           poll
  2.16    3.1154314          11    362463           read
  1.63    2.1146381        1479      2128           pread
  1.55    2.981563          16    181220           write
  0.67    0.1297864         122     10598           sched_yield
  0.67    0.1288412        1394       924           nanosleep

1-mysql: 245 TPS
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 60.03  108.1509367        4594     23836      5835 futex
 15.42   27.1119721      100788       279         1 select
  5.25    8.1579991     1064443         9           rt_sigtimedwait
  1.65    2.1005637         637      4716           pread
  1.30    1.1367183          21    110986           write
  1.07    1.950471           9    221889    221889 sched_setscheduler
  1.06    1.929137           9    223097      1044 read

2-drizzle: 276 TPS
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 65.93  115.556211        3704     31198      8101 futex
 15.30   26.810000      203106       132           select
  5.89   10.330000    10330000         1           rt_sigtimedwait
  5.80   10.170000      565000        18           poll
  2.65    4.647123          37    125060           write
  2.35    4.116470          16    250158           read
  1.72    3.021068        1483      2037           pread
  0.27    0.477662       20768        23           fsync
  0.04    0.072760           4     18218           madvise
  0.02    0.043264          18      2357           sched_yield

2-mysql: 168 TPS
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 68.25  100.851565        6704     15044      3328 futex
 21.22   31.350000      116111       270         1 select
  5.17    7.642591      347391        22           rt_sigtimedwait
  1.93    2.853778         619      4612           pread
  1.15    1.694888          11    153040       346 read
  0.90    1.327743           9    152529    152529 sched_setscheduler
  0.78    1.145964          15     76305           write
  0.32    0.468542          37     12705           madvise

Again, when tracing the process to count system calls, we see the regression disappear, which leaves us with a smaller set of tools to use.

So what next? One theory we were tossing around is a cache alignment issues. This seems like a pretty dramatic drop in performance to be caused by this, but I ran a test to see what the behavior of a process is when you are being throttled by a shared cache line. The results showed idle CPU, but the interrupts and context switches did not drop off. This does not follow the same pattern we saw in Drizzle (interrupts and context switches did drop off). Our cache line size is also smaller on the machine showing the regression, so that did not help support this theory.

While stabbing at a few other ideas, I ran ldd to see which libraries were being use in the two drizzled binaries on each machine. Suppressing some common libs:

1-drizzle: ldd drizzled/drizzled
        ...
        libpcre.so.3 => /lib/libpcre.so.3 (0x00007f9a88532000)
        libtbb.so.2 => /usr/lib/libtbb.so.2 (0x00007f9a88320000)
        libtcmalloc.so.0 => /usr/lib/libtcmalloc.so.0 (0x00007fcced5bc000)
        ...

2-drizzle: ldd drizzled/drizzled
        ...
        libpcre.so.3 => /lib/libpcre.so.3 (0x00007fc0af803000)
        libtbb.so.2 => /usr/lib/libtbb.so.2 (0x00007fc0af5e9000)
        ...

The machine showing the regression is linking with tcmalloc. Looking at the drizzle configure.ac, we use libtcmalloc by default if it is found (machine 2 does not have tcmalloc installed). I relinked drizzled without tcmalloc and received these results:

  1-drizzle 1-mysql 2-drizzle 2-mysql
TPS 2751 2434 1559 1239

There it is! For some reason tcmalloc was giving us a 51% performance drop. Perhaps this is due to the tcmalloc version or settings we need to tweak for performance (something to look into later), but for now disabling this by default is the solution. We’re verifying the fix now and should be in the Drizzle trunk shortly.

Posted in Drizzle, Main, MySQL

2 Responses to "Drizzle Regression Hunting"

  1. There is nothing worse then a problem that goes away when you try to analyze it!

    So well done for finding this one!

    Good to know that tcmalloc can have such a dramatic affect.

    And, strange that the problem disappeared as soon as the timing was slightly changed.

    Would be great to know why. But as you say, that will be a job for another day…

Leave a Reply


< || >
Blog
Wiki
About
Resume
RSS
Comments

E-Mail
Launchpad
LinkedIn
Twitter
identi.ca
Facebook

OpenStack
Scale Stack
Gearman
NW Veg
Veg Food & Fit

Linux On Laptops