Eric Day: Thoughts, code, and other oddments.

Archive for the "Drizzle" Category

OSCON and OpenStack
July 26th, 2010
The past two weeks have been both exciting and extremely busy, first traveling to Austin, TX for the first OpenStack Design Summit, and then back home to Portland, OR for The O’Reilly Open Source Conference (OSCON) and Community Leadership Summit. The events were great in different ways, and there was some overlap with OpenStack since we announced it on the first day of OSCON and created quite a bit of buzz around the conference. I want to comment on a few things that came up during these two weeks.

New Role

I’m now focusing on OpenStack related projects at Rackspace. I’m no longer working on Drizzle, but I will still be involved in the MySQL and database ecosystems through future projects and conferences (see you at OpenSQL Camp). I will also still be working on a couple of Gearman related projects in my spare time. At OSCON I gave two presentations on Gearman and Drizzle; you can find the slides here.

The Five Steps to Open

One question that came up a few times over the past couple weeks is what the term “Open” means when a business or organization decides to adopt the open source philosophy. It turns out this means many different things to folks, and when an organization decides to go open, they need to make a decision on how open they are willing to be. Here are the various layers we’ve seen over the years:
There have been examples of success for organizations who have stopped at each of these steps. Given the proper environment, any can work. My preference is to work on projects that are fully open, where company and organizational boundaries do not exist between developers and users. I’m thrilled to say that we’ve gone all in with OpenStack. We’re hosted on Launchpad and have a governance structure that allows all parties within the community to have a say in the future of the project.

Preventing Vendor Lock-in

During the Cloud Summit at OSCON, there was a debate titled: “Are Open APIs Enough to Prevent Lock-in?”. Most folks came to the conclusion that the answer is “no,” and I agree. While I feel open APIs are necessary, they are by no means sufficient. Even if a project is open source and allows for open development, it probably will not prevent vendor lock-in. The key is to provide some incentive for vendors to adopt and invest resources within a project. Much like customers don’t want vendor lock-in when choosing a platform, vendors do not want project or feature lock-in when choosing the software to power their business. Each vendor who chooses to participate must have the ability to voice their opinion on the direction of APIs, features, and other project priorities. This is why it is critical that any open source project must take all the steps described above to give the project a chance of being adopted and becoming the de facto standard. There is of course no guarantee that adoption and prevention of vendor lock-in will happen, but I see them as necessary steps.

This is another area where OpenStack has done the correct thing. We are planning on having another developer summit in November, and then once every six months after that time. All design discussions and decision making will happen in public forums such as the mailing list and IRC.
We want all participants in the community to have a chance to respond to topics being discussed, and we believe the more we have, the more successful the project will be. Having many voices allows the project to be more applicable to different environments. For example, Rackspace and NASA have different requirements for their compute architectures, but they also share many components. Through open participation we can ensure all needs are accounted for. Much like the LAMP stack has powered universities, governments, and competing businesses, we hope OpenStack can do the same.

Contributor License Agreement (CLA)

During the past couple of weeks a few folks asked what the CLA was all about. When the foundations of OpenStack were forming, the requirement of having a CLA came up from the legal side. Having been involved with open source projects that had very invasive CLAs, initially I had quite a bit of concern. The CLA is actually quite innocuous, and it does NOT require assignment or dual-ownership of copyright. You are the sole owner of code you contribute. For all intents and purposes it is a signed version of the Apache 2.0 license; the CLA just makes these terms more explicit. The CLA is handled through digital signatures, so no papers, pens, or faxing is required.

Get Involved!

Expect to see more posts on my blog related to OpenStack topics. If you would like to get involved, you can join the IRC channel (#openstack on irc.freenode.net), join the mailing list, or start contributing code! There are even jobs around OpenStack popping up already!

Posted in Drizzle, Gearman, Main, MySQL, OpenStack | 4 Comments

MySQL Server Protocol Bug
July 24th, 2010

A few months ago I wrote a tool that verified MySQL and Drizzle protocol compatibility, along with testing for all sorts of edge cases.
In analyzing protocol command interactions in mysqld, I found that the MySQL server will happily read an infinite amount of data if you exceed the maximum packet size while using a special sequence of protocol packets. The reasoning behind this behavior is so that the server can be polite and flush your data before sending a “max packet exceeded” error message, but perhaps there should be a limit to one’s politeness. What’s more interesting is that you can do this during the client handshake packet without authorization, so anyone could do this to any open MySQL server. The appropriate thing to do here would be to set some maximum limit of data to read and force a connection close when it is reached, otherwise your bandwidth and CPU could be consumed (essentially a DoS attack).

This portion of code was ripped out entirely in Drizzle, so there are no risks there. I submitted this as a bug to MySQL and MariaDB back in February and they both have patches available to fix this as well. You can find the bug here and a patch here. If you have publicly accessible MySQL or MariaDB servers, you probably want to upgrade binaries or patch this.

Posted in Drizzle, Main, MySQL | 2 Comments

Open Source Bridge Database Sessions
May 6th, 2010

Open Source Bridge, the “conference for open source citizens,” is right around the corner! The sessions were just announced and it’s going to be packed with quite a variety of really interesting talks. From open cloud computing topics to hardware hacking to language hacks (like HipHop from Facebook), I’m really looking forward to being there (I’m helping organize the event, but hopefully I’ll have time to attend sessions as well). I wanted to point out a few of the great database talks:
Beyond the DB talks, I’m also excited for a few other talks around high performance and high availability, from Facebook operations to Rasmus Lerdorf’s talk on making your PHP applications faster. I’ll also take the opportunity to shamelessly plug my own talk on writing high performance multi-core applications. There are also rumors of donut trucks, tesla coils, and scavenger hunts. You should register to attend today; it’s going to be awesome.

Posted in Drizzle, Main, MySQL | 2 Comments

Threads with Events
April 20th, 2010

Last week I was surprised to see this paper bubble back up on Planet MySQL. It describes the pros and cons of thread and event based programming for high concurrency applications (like a web server), arguing that thread-based programming is superior if you use an appropriate lightweight threading implementation. I don’t entirely disagree with this, but the problem is that no such library exists that is standard, portable, and useful for all types of applications. We have POSIX threads in the portable Linux/Unix/BSD world, so we need to work with this. Other experimental libraries based on lightweight threads or “fibers” are really interesting as they can maintain your stack without all the normal overhead, but it is hard to get the scheduling correct for all application types. I would even argue that thread and event based programming are actually not all that different; it’s just a matter of how state is maintained (stack vs state variables) and how scheduling is performed.

The comparisons done in that paper also put a C-based web server using a co-routine threading library against a Java based server that depends on the poll() system call. I’m sorry, but this is comparing apples to oranges. First, you’re in the Java VM with a number of runtime components (like garbage collection) which may be getting in the way.
Also, the standard poll() system call is not an efficient event-handling mechanism; it’s much better to use epoll or some other kernel-based handling mechanism.

One high-concurrency userland threading implementation I do like is in Erlang. Erlang processes are extremely lightweight and I’ve written apps that depend heavily on them. One interesting application I saw was caching objects where each object got its own Erlang process. This put a whole new spin on cache management, and it looked like it could actually scale reasonably well. The “problem” with Erlang, which may or may not be a problem depending on your requirements, is that there is still a bit of overhead in running byte-code in a VM, and that it is a functional language. I love functional programming, but I’ve found it still ties most developers’ heads in knots if they don’t have a reason to use it regularly. For open source projects trying to build a contributor community, it can act as one more hurdle.

So, what is the “best” paradigm? Back in 2000 some colleagues and I wrote a hybrid thread-event library that would create one event-handler instance per thread, and connections would be spread across the pool of event-handling threads. I believe this gave the best of both worlds, and I saw high throughputs with fairly minimal overhead. I wrote a number of servers based on this architecture, including HTTP, IMAP, POP3, and DNS, and with each server type this model proved to be efficient and scalable.

Ultimately the best architecture depends on your application. If you never intend to have many connections, and your application has long-running computations, one-thread-per-connection would probably be best. If you need to handle large numbers of connections and have short, non-blocking request processing, event-based scales extremely well.
You can of course create a hybrid of these two and have all connections managed by event threads and asynchronous queues to dedicated processing threads for heavy request handling (this is sort of what I did in the C Gearman Job Server). There is no single correct answer, so take a look at your options before deciding how to approach your own applications. Don’t be afraid to create hybrids as well.

Regardless of which paradigm you choose, concurrent programming can be hard, especially at the lower levels. There have been a number of higher level abstractions to help developers, from new libraries to new languages, but most of these come with a cost in performance or flexibility. When you need to squeeze every bit of performance out of your application, you will most likely end up in C or C++ dealing with these issues directly. This is actually one of the problems I’m attempting to address with the Scale Stack Event modules. I’m trying to create a healthy level of abstraction on hybrid thread/event based applications so you don’t have any overhead or limitations while a lot of the common headaches are taken care of for you. If you have a need for such a system, get in touch, I’d be interested to talk. Since it is BSD licensed you can use it in any application, including commercial.

Posted in Drizzle, Gearman, Main, MySQL | 5 Comments

Drizzle Developer Day Recap
April 19th, 2010

Last Friday we held the Drizzle Developer Day at the Santa Clara convention center, taking advantage of the fact that many developers and interested contributors were already there for the MySQL Conference & Expo. Minus a few small glitches like wifi and pizza consumption location, I would say it was an overall success. There were a lot of new folks interested in learning about Drizzle and getting the server up and running. The day was organized by splitting folks up into small groups with matching interests, and then switching up groups every hour or so.
We had groups focused on replication, documentation, writing plugins, the optimizer, Boots (the new client tool), and a “getting started” group.

The first group I participated in was about Boots, the new command line tool developed by a group of students I sponsored at Portland State University. One of the students who created it was there (Chromakode), so he gave a demo of all the features and ways you could extend it for custom use. Baron from Percona was there and had a lot of good feedback on what is needed by DBAs, as well as for monitoring/troubleshooting problems. Some of the new features in Boots will help quite a bit with this since you are able to write simple Python scripts that work inside the program rather than having to write a bunch of shell processing code around the existing tool. This extended into a discussion about testing tools for production systems, and how to capture and replay production traffic with the same timing and load (or increased load).

The next group I sat in on was around creating plugins. There were topics like getting started with writing your own plugin, a script to generate a skeleton for your own, and more advanced topics like dependency tracking. Since I used the same pandora-plugin system for another project and added dependency tracking there, I am interested in getting dependency tracking into Drizzle. We didn’t get to any code, but this will require some changes in how plugins are loaded in the Drizzle kernel.

I had to leave a little early to catch my flight home, but for the second half of the day I bounced between helping a group get started from scratch (mainly installing dependencies to getting Drizzle built and running) and the other group topics. Thanks to everyone who showed up and helped participate, we all had some great conversations providing valuable feedback for directions to take moving forward.

Posted in Drizzle, Main, MySQL | 8 Comments

Boots: A Modular CLI for Databases
April 8th, 2010
Boots is written in Python and aims to replace the previous ‘drizzle’ tool (which was modified from the ‘mysql’ command line tool). It doesn’t yet support everything that the old tool has (like tab completion), but it adds some new features. For example, there are multiple ‘lingos’, or modular languages, that can be used to communicate with the shell. This allows you to use plain SQL, Python, or even LISP to interact with the shell. One of the lingos, piped-sql, lets you do interesting things such as:
shell$ boots -u root -h 127.0.0.1 -l pipedsql
Boots (v0.2.0)
127.0.0.1:3306 (server v5.1.40)
> SELECT * FROM mysql.user; | csv_out("users.csv")
5 rows in set (0.06s server | +0.00s working)
> Boots quit.
shell$ cat users.csv
localhost,root,,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0
...
It’s ready to use, so download and install it now! If you have any features you would like to see, please get in touch through the Boots blueprints, mailing list, or #boots IRC channel on irc.freenode.net. One of the original developers from the project, Chromakode (the same from the awesome xkcd.com shell), will also be attending the MySQL Conference & Expo next week and helping out with the Drizzle booth. Come find one of us to talk more about the project there!

Posted in Drizzle, Main, MySQL | 2 Comments

Scale Stack and Database Proxy Prototype
April 8th, 2010

Back in January when I was between jobs I had a free weekend to do some fun hacking. I decided to start a new open source project that had been brewing in the back of my head and since then have been poking at it on the weekends and an occasional late night. I decided to call it Scale Stack because it aims to provide a scalable network service stack. This may sound a bit generic and boring, but let me show a graph of a database proxy module I slapped together in the past couple days:

I set up MySQL 5.5.2-m2 and ran the sysbench read-only tests against it with 1-8192 threads. I then started up the database proxy module built on Scale Stack so sysbench would route through that, and you can see the concurrency improved quite a bit at higher thread counts. The database module doesn’t do much; it simply does connection concentration, mapping M to N connections, where N is a fixed parameter given at startup. In this case I always mapped all incoming sysbench connections down to 128 connections between Scale Stack and MySQL. It also uses a fixed number of threads and is entirely non-blocking. As you can see the max throughput around 64 threads is a bit lower, but I’ve not done much to optimize this yet (there should be some easy improvements where I simply stuck in a mutex instead of doing a lockless queue).
It’s only a simple proof-of-concept module to see how well this would work, but it’s a start to a potentially useful module built on the other Scale Stack components. One other thing to mention is that these tests were run on a single 16-core Intel machine. I’d really like to test this with multiple machines at some point. So, what is Scale Stack? Check out the website for a simple overview of what it is. The goal is to pick up where the operating system kernel leaves off with the network stack. It is written in C++ and is extremely modular with only the module loader, option parsing, and basic log in the kernel library. It uses Monty Taylor’s pandora-build autoconf files to provide a sane modular build system, along with some modifications I made so dependency tracking is done between modules. You can actually use it to write modules that would do anything, I’m just most interested in network service based modules. The kernel/module loader is also just a library, so you can actually embed this into existing applications as well. Some of the modules I’ve written for it are a threaded event handling module based on libevent/pthreads and a TCP socket module. There is also an echo server and simple proxy module I created while testing the event and socket modules. The database proxy module builds on top of the event and socket module. The code is under the BSD license and is up on Launchpad, so feel free to check it out and contribute. If you need a base to build high-performance network services on, you should definitely take a look and talk with me. What’s up next? I have a long list of things I would like to do with this, but first up are still some basics. This includes other socket type modules like TLS/SSL, UDP, and Unix sockets. Then are some more protocol modules such as Drizzle, a real MySQL protocol module, and others like HTTP, Gearman, and memcached. It’s fairly trivial to write these since the socket modules handle all buffering and provide a simple API. 
As for the DatabaseProxy module, I’d like to rework how things are now so it’s not MySQL protocol specific, integrate other protocol modules, improve performance, add in multi-tenancy support for quality-of-service queuing based on account rules, and a laundry list of other features I won’t bore you with right now. I also have plans for other services besides a database proxy, especially one that could combine a number of protocols into a generic URI server with pluggable handlers so you can do some interesting translations between modules (like Apache httpd but not http-centric). For example, think of the crazy things you can do with Twisted for Python, but now with a fast, threaded C++ kernel. I also still need to experiment with live reloading of modules, but I’m not sure if this will be worthwhile yet.

If any of this sounds interesting, get in touch, I’d love to have some help! I’ll have some blog posts later on how to get started writing modules, but for now just take a look at the existing modules. The EchoServer is a good place to start since it is pretty simple. Also, if you’ll be at the MySQL Conference and Expo next week, I’d be happy to talk more about it then.

Posted in Drizzle, Main, MySQL | 3 Comments

Gearman Releases and Talks at the MySQL Conference
April 5th, 2010

I spent some time this weekend fixing up the Gearman MySQL UDFs (user defined functions) and fixed a few bugs in the Gearman Server. You can find links to the new releases on the Gearman website. The UDFs now use Monty Taylor’s pandora-build autoconf files instead of the old fragile autoconf setup that relied on pkgconfig. If you are attending the MySQL Conference & Expo next week and want to learn more about Gearman, be sure to check out one of the three sessions Giuseppe Maxia and I are giving:
Hope to see you there!

Posted in Drizzle, Gearman, Main, MySQL | No Comments

Writing Authentication Plugins for Drizzle
April 5th, 2010

In this post I’m going to describe how to write an authentication plugin for Drizzle. The plugin I’ll be demonstrating is a simple file-based plugin that takes a file containing a list of ‘username:password’ entries (one per line like a .htpasswd file for Apache). The first step is to set up a proper build environment and create a branch; see the Drizzle wiki page to get going. From here I’ll assume you have Drizzle checked out from bzr and are able to compile it.

Setup a development branch and plugin directory

Change to your shared-repository directory for Drizzle and run (assuming you branched ‘lp:drizzle’ to ‘drizzle’):

  shell$ bzr branch drizzle auth-file
  Branched 1432 revision(s).
  shell$ cd auth-file

Next, we’ll want to create the plugin directory and create plugin.ini and auth_file.cc.

  shell$ mkdir plugin/auth_file

plugin/auth_file/plugin.ini:

  [plugin]
  title=File-based Authentication
  description=A simple plugin to authenticate against a list of username:password entries in a plain text file.
  version=0.1
  author=Eric Day <eday@oddments.org>
  license=PLUGIN_LICENSE_GPL

plugin/auth_file/auth_file.cc:
/* -*- mode: c++; c-basic-offset: 2; indent-tabs-mode: nil; -*-
* vim:expandtab:shiftwidth=2:tabstop=2:smarttab:
*
* Copyright (C) 2010 Eric Day
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; version 2 of the License.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
*/
#include "config.h"

#include <string>

#include "drizzled/plugin/authentication.h"
#include "drizzled/security_context.h"

using namespace std;
using namespace drizzled;

namespace auth_file
{

class AuthFile: public plugin::Authentication
{
public:

  AuthFile(string name_arg):
    plugin::Authentication(name_arg)
  { }

  bool authenticate(const SecurityContext &sctx, const string &password)
  {
    /* Let "root" user always succeed for now because of test suite. */
    if (sctx.getUser() == "root" && password.empty())
      return true;

    /* Only allow hard coded username for now. */
    if (sctx.getUser() == "auth_file")
      return true;

    return false;
  }
};

static int init(plugin::Context &context)
{
  context.add(new AuthFile("auth_file"));
  return 0;
}

} /* namespace auth_file */

DRIZZLE_PLUGIN(auth_file::init, NULL);
All authentication plugins need to inherit from the ‘plugin::Authentication’ class and implement an ‘authenticate’ method. This takes the user context and a password as its arguments and simply returns true if the user is allowed or false otherwise. As you can see, this plugin will verify all sessions for the ‘auth_file’ user with any password and deny everything else. It also allows ‘root’ with no password for the test suite; we’ll fix this later so it’s not required. The init method is called when the plugin is loaded, and here we want to register an instance of the plugin class with the kernel. The DRIZZLE_PLUGIN definition is required so the Drizzle kernel can load the module and grab some basic information about it (like the name of the init method).

Create test cases to verify our plugin works

We’ll want to add some test cases so we can check our plugin as we make progress. This is done by creating test case and result files inside the plugin directory. You’ll want to create the following directories and files:

  shell$ mkdir plugin/auth_file/tests
  shell$ mkdir plugin/auth_file/tests/t
  shell$ mkdir plugin/auth_file/tests/r

plugin/auth_file/tests/t/basic-master.opt

  --plugin-add=auth_file

plugin/auth_file/tests/t/basic.test

  --replace_result $MASTER_MYSOCK MASTER_SOCKET $MASTER_MYPORT MASTER_PORT
  --replace_regex /@'.*?'/@'LOCALHOST'/
  --error ER_ACCESS_DENIED_ERROR
  connect (bad_user,localhost,bad_user,,,);
  --replace_result $MASTER_MYSOCK MASTER_SOCKET $MASTER_MYPORT MASTER_PORT
  connect (auth_file,localhost,auth_file,,,);
  connection auth_file;
  SELECT 1;

plugin/auth_file/tests/r/basic.result

  connect(localhost,bad_user,,test,MASTER_PORT,);
  ERROR 28000: Access denied for user 'bad_user'@'LOCALHOST' (using password: NO)
  SELECT 1;
  1
  1

The files in the ‘tests/t’ directory drive the test system, and the files in ‘tests/r’ are the results that should match the output.
This test tries two connections, one with ‘bad_user’ which should fail, and another with ‘auth_file’ user which should pass. Before writing the code to check against a list of users in a file, we’ll compile what we have so far and check the test cases to make sure things are working properly.

  shell$ ./config/autorun.sh
  ...
  shell$ ./configure --with-debug
  ...
  shell$ make -j 3
  ...
  shell$ make check
  ...
  auth_file.basic [ pass ] 6
  ...

It works! To save some time while developing, you can also test just the auth_file plugin without everything else by running:

  ( cd tests && ./dtr --suite=auth_file )

Add options

We’re going to want users to be able to specify a location for the file to load, so we’ll need to tell the kernel about the option through the plugin interface. This is done by adding:
#include "drizzled/configmake.h"
...
static char* users_file= NULL;
static const char DEFAULT_USERS_FILE[]= SYSCONFDIR "/drizzle.users";
...
static DRIZZLE_SYSVAR_STR(users,
                          users_file,
                          PLUGIN_VAR_READONLY,
                          N_("File to load for usernames and passwords"),
                          NULL, /* check func */
                          NULL, /* update func*/
                          DEFAULT_USERS_FILE /* default */);

static drizzle_sys_var* sys_variables[]=
{
  DRIZZLE_SYSVAR(users),
  NULL
};
...
DRIZZLE_PLUGIN(auth_file::init, auth_file::sys_variables);
The first include is there so we can have access to the SYSCONFDIR macro, which maps to the ‘etc’ directory of our install path. That path plus the file ‘drizzle.users’ is our default. We also define a variable that holds either this default path or a custom path the user specifies. Next, we define a system variable with the macro DRIZZLE_SYSVAR_STR and provide some information like where to store it, the help string, and the default value. We also need to define a system variables list. The new variable is the only entry in the list right now, but you could define more system variables and add them to this list (just make sure it is NULL terminated). Last, we modify our DRIZZLE_PLUGIN call to give a second argument instead of NULL. This tells the kernel to look for variables in the provided list when loading. With this option, we’ll now be able to specify:

  --auth-file-users=/some/path/to/drizzles.users

Write the plugin

With the plugin compiling, tests set up, and options specified, we can start to write some real code. First up is adding a couple of class methods; the new AuthFile class looks like:
class AuthFile: public plugin::Authentication
{
public:

  AuthFile(string name_arg);

  /**
   * Retrieve the last error encountered in the class.
   */
  string& getError(void);

  /**
   * Load the users file into a local map.
   *
   * @return True on success, false on error. If false is returned an error
   *  is set and can be retrieved with getError().
   */
  bool loadFile(void);

private:

  bool authenticate(const SecurityContext &sctx, const string &password);

  string error;
  map<string, string> users;
};
We’ve moved the method definitions out of the class (for Drizzle coding standards) and now have two new declarations: loadFile() to load the specified users file into a std::map, and getError() to return errors, if any. The getError() method simply returns the ‘error’ data member, but loadFile() is a bit more interesting:
/* Needs #include <fstream> and #include <map> alongside <string>. */
bool AuthFile::loadFile(void)
{
  ifstream file(users_file);

  if (!file.is_open())
  {
    error = "Could not open users file: ";
    error += users_file;
    return false;
  }

  string line;
  while (getline(file, line))
  {
    /* Skip blank or whitespace-only lines and comments. Checking for npos
       first avoids indexing past the end of the line. */
    size_t first = line.find_first_not_of(" \t");
    if (first == string::npos || line[first] == '#')
      continue;

    string username;
    string password;
    size_t password_offset = line.find(":");
    if (password_offset == string::npos)
      username = line;
    else
    {
      username = string(line, 0, password_offset);
      password = string(line, password_offset + 1);
    }

    /* insert() reports whether the username was already present. */
    pair<map<string, string>::iterator, bool> result;
    result = users.insert(pair<string, string>(username, password));
    if (result.second == false)
    {
      error = "Duplicate entry found in users file: ";
      error += username;
      file.close();
      return false;
    }
  }

  file.close();
  return true;
}
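As a side note, the per-line parsing inside loadFile() can be pulled into small free functions and exercised without a Drizzle build. The following is only a sketch under that assumption: parseLine(), loadUsers(), and demo() are hypothetical names that are not part of the plugin, and the input comes from an in-memory stream rather than the users file.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <utility>

/* Hypothetical helper, not part of the plugin: result of parsing one line. */
enum LineKind { SKIP, ENTRY };

/* Split "username:password"; blank lines and '#' comments are skipped. */
LineKind parseLine(const std::string &line,
                   std::string &username,
                   std::string &password)
{
  size_t first = line.find_first_not_of(" \t");
  if (first == std::string::npos || line[first] == '#')
    return SKIP;

  size_t colon = line.find(':');
  if (colon == std::string::npos)
  {
    /* No colon: the whole line is the username, the password is empty. */
    username = line;
    password.clear();
  }
  else
  {
    username = line.substr(0, colon);
    password = line.substr(colon + 1);
  }
  return ENTRY;
}

/* Hypothetical helper: fill the map from any stream, rejecting duplicates. */
bool loadUsers(std::istream &in,
               std::map<std::string, std::string> &users,
               std::string &error)
{
  std::string line;
  while (std::getline(in, line))
  {
    std::string username, password;
    if (parseLine(line, username, password) == SKIP)
      continue;
    if (!users.insert(std::make_pair(username, password)).second)
    {
      error = "Duplicate entry found in users file: " + username;
      return false;
    }
  }
  return true;
}

/* Usage example on an in-memory users file. */
void demo()
{
  std::istringstream in("# comment\n\nroot\nauth_file:secret\n");
  std::map<std::string, std::string> users;
  std::string error;
  assert(loadUsers(in, users, error));
  assert(users.size() == 2);
  assert(users["root"].empty());
  assert(users["auth_file"] == "secret");
}
```

Taking a std::istream instead of a file name is a design choice made only for this sketch; it lets the same parsing be fed from a string in a test.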
The loadFile() method opens the users file and, for each line, either skips it (blank lines and comments) or parses out the username:password pair. Note that the password is optional: a line with no colon is stored as a username with an empty password. Next up, we change the authenticate() method to use the map instead of the hard coded values:
bool AuthFile::authenticate(const SecurityContext &sctx, const string &password)
{
  map<string, string>::const_iterator user = users.find(sctx.getUser());
  if (user == users.end())
    return false;

  if (password == user->second)
    return true;

  return false;
}
The authenticate() method now looks up users in the map and, if found with a password match, lets the user in. Now let’s update our test case to use this. First we need to create a users file to allow the ‘root’ and ‘auth_file’ users we put in our test cases:

plugin/auth_file/tests/t/basic.users

  # Always allow root user with no password for drizzletest program
  root
  auth_file

plugin/auth_file/tests/t/basic-master.opt

  --plugin-add=auth_file --auth-file-users=$DRIZZLE_TEST_DIR/../plugin/auth_file/tests/t/basic.users

Now it’s time to recompile and check our new code:

  shell$ make -j 3
  ...
  shell$ ( cd tests && ./dtr --suite=auth_file )
  ...
  auth_file.basic [ pass ] 6
  ...

It still works! I’d like to say we’re done here, but notice we’ve not actually tested any passwords. Before trying that, a little explanation of how password authentication works is required.

Verifying passwords in Drizzle

Because Drizzle has a pluggable protocol, the usernames and passwords can be coming from any source. They could be coming from the embedded console plugin which passes the password through as plain text, or from the MySQL protocol plugin that uses the custom MySQL hashing algorithm. This means a simple string equality does not suffice for all password sources. The code above handles the plain text case, but since the default connection method is the MySQL protocol, including for the test suite, we also need to handle the case when the user supplied password is hashed.

Verify MySQL Hashed Passwords

To accomplish this we add an extra check in the authenticate() method. This now looks like:
bool AuthFile::authenticate(const SecurityContext &sctx, const string &password)
{
  map<string, string>::const_iterator user = users.find(sctx.getUser());
  if (user == users.end())
    return false;

  if (sctx.getPasswordType() == SecurityContext::MYSQL_HASH)
    return verifyMySQLHash(user->second, sctx.getPasswordContext(), password);

  if (password == user->second)
    return true;

  return false;
}
This extra check calls the verifyMySQLHash() method to verify the local password with the client-scrambled password, using the random bytes the server sent during the handshake (password context). This method is:
#include <cstring> /* for memcmp() */
#include "drizzled/util/convert.h"
#include "drizzled/algorithm/sha1.h"
...
  /**
   * Verify the local and remote scrambled password match using the MySQL
   * hashing algorithm.
   *
   * @param[in] password Plain text password that is stored locally.
   * @param[in] scramble_bytes The random bytes the server sent to client
   *  to use for scrambling the password.
   * @param[in] scrambled_password The result of the client scrambling the
   *  password remotely.
   * @return True if the password matched, false if not.
   */
  bool verifyMySQLHash(const string &password,
                       const string &scramble_bytes,
                       const string &scrambled_password);
...
bool AuthFile::verifyMySQLHash(const string &password,
                               const string &scramble_bytes,
                               const string &scrambled_password)
{
  if (scramble_bytes.size() != SHA1_DIGEST_LENGTH ||
      scrambled_password.size() != SHA1_DIGEST_LENGTH)
  {
    return false;
  }

  SHA1_CTX ctx;
  uint8_t local_scrambled_password[SHA1_DIGEST_LENGTH];
  uint8_t temp_hash[SHA1_DIGEST_LENGTH];
  uint8_t scrambled_password_check[SHA1_DIGEST_LENGTH];

  /* Generate the double SHA1 hash for the password stored locally first. */
  SHA1Init(&ctx);
  SHA1Update(&ctx, reinterpret_cast<const uint8_t *>(password.c_str()),
             password.size());
  SHA1Final(temp_hash, &ctx);

  SHA1Init(&ctx);
  SHA1Update(&ctx, temp_hash, SHA1_DIGEST_LENGTH);
  SHA1Final(local_scrambled_password, &ctx);

  /* Hash the scramble that was sent to client with the local password. */
  SHA1Init(&ctx);
  SHA1Update(&ctx, reinterpret_cast<const uint8_t*>(scramble_bytes.c_str()),
             SHA1_DIGEST_LENGTH);
  SHA1Update(&ctx, local_scrambled_password, SHA1_DIGEST_LENGTH);
  SHA1Final(temp_hash, &ctx);

  /* Next, XOR the result with what the client sent to get the original
     single-hashed password. */
  for (int x= 0; x < SHA1_DIGEST_LENGTH; x++)
    temp_hash[x]= temp_hash[x] ^ scrambled_password[x];

  /* Hash this result once more to get the double-hashed password again. */
  SHA1Init(&ctx);
  SHA1Update(&ctx, temp_hash, SHA1_DIGEST_LENGTH);
  SHA1Final(scrambled_password_check, &ctx);

  /* These should match for a successful auth. */
  return memcmp(local_scrambled_password, scrambled_password_check, SHA1_DIGEST_LENGTH) == 0;
}
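To make the exchange a bit more concrete, here is a rough, self-contained sketch of both sides of the scramble. Everything in it is an assumption for illustration: sha1(), toBytes(), clientScramble() and serverVerify() are hypothetical names, the SHA-1 routine is a compact, unoptimized stand-in for the functions in drizzled/algorithm/sha1.h, and the client-side formula is the MySQL 4.1-style scramble as I understand it, not code lifted from Drizzle or MySQL.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

// All names below are hypothetical; none of this is Drizzle's actual code.
typedef std::array<uint8_t, 20> Digest;

static uint32_t rotl(uint32_t v, int n) { return (v << n) | (v >> (32 - n)); }

// Compact, unoptimized SHA-1 over an arbitrary byte string.
Digest sha1(const std::string &msg)
{
  uint32_t h[5] = {0x67452301u, 0xEFCDAB89u, 0x98BADCFEu, 0x10325476u, 0xC3D2E1F0u};

  // Pad: 0x80, zeros up to 56 mod 64, then the bit length as 8 big-endian bytes.
  std::string m(msg);
  const uint64_t bits = static_cast<uint64_t>(msg.size()) * 8;
  m.push_back(static_cast<char>(0x80));
  while (m.size() % 64 != 56)
    m.push_back('\0');
  for (int i = 7; i >= 0; --i)
    m.push_back(static_cast<char>((bits >> (8 * i)) & 0xff));

  for (size_t off = 0; off < m.size(); off += 64)
  {
    uint32_t w[80];
    for (int i = 0; i < 16; ++i)
    {
      w[i] = 0;
      for (int j = 0; j < 4; ++j)
        w[i] = (w[i] << 8) | static_cast<uint8_t>(m[off + 4 * i + j]);
    }
    for (int i = 16; i < 80; ++i)
      w[i] = rotl(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);

    uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
    for (int i = 0; i < 80; ++i)
    {
      uint32_t f, k;
      if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999u; }
      else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1u; }
      else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDCu; }
      else             { f = b ^ c ^ d;                   k = 0xCA62C1D6u; }
      const uint32_t t = rotl(a, 5) + f + e + k + w[i];
      e = d; d = c; c = rotl(b, 30); b = a; a = t;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
  }

  Digest out;
  for (int i = 0; i < 20; ++i)
    out[i] = static_cast<uint8_t>(h[i / 4] >> (8 * (3 - i % 4)));
  return out;
}

static std::string toBytes(const Digest &d)
{
  return std::string(d.begin(), d.end());
}

// Client side (assumed): SHA1(password) XOR SHA1(scramble + SHA1(SHA1(password))).
std::string clientScramble(const std::string &password,
                           const std::string &scramble_bytes)
{
  const Digest single_hash = sha1(password);
  const Digest double_hash = sha1(toBytes(single_hash));
  const Digest mask = sha1(scramble_bytes + toBytes(double_hash));
  std::string token(20, '\0');
  for (size_t i = 0; i < 20; ++i)
    token[i] = static_cast<char>(single_hash[i] ^ mask[i]);
  return token;
}

// Server side: the same steps as verifyMySQLHash() above, on top of sha1().
bool serverVerify(const std::string &password,
                  const std::string &scramble_bytes,
                  const std::string &token)
{
  if (scramble_bytes.size() != 20 || token.size() != 20)
    return false;
  const Digest double_hash = sha1(toBytes(sha1(password)));
  const Digest mask = sha1(scramble_bytes + toBytes(double_hash));
  std::string recovered(20, '\0'); // XOR undoes the mask, leaving SHA1(password)
  for (size_t i = 0; i < 20; ++i)
    recovered[i] = static_cast<char>(mask[i] ^ static_cast<uint8_t>(token[i]));
  return sha1(recovered) == double_hash;
}
```

With any 20-byte scramble, serverVerify("test_password", scramble, clientScramble("test_password", scramble)) comes back true, while a token built from a different password comes back false. The point of the XOR is that the single-hashed password never crosses the wire in the clear, yet the server can still recover it and compare its hash against the double hash it derives locally.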
I won't go through this method step by step; that is left as an exercise for the reader. :) What we do care about is whether it works, so let's go back and add to our test cases:

plugin/auth_file/tests/t/basic.users
# Always allow root user with no password for drizzletest program
root
auth_file
auth_file_password:test_password

plugin/auth_file/tests/t/basic.test
--replace_result $MASTER_MYSOCK MASTER_SOCKET $MASTER_MYPORT MASTER_PORT
--replace_regex /@'.*?'/@'LOCALHOST'/
--error ER_ACCESS_DENIED_ERROR
connect (bad_user,localhost,bad_user,,,);

--replace_result $MASTER_MYSOCK MASTER_SOCKET $MASTER_MYPORT MASTER_PORT
connect (auth_file,localhost,auth_file,,,);
connection auth_file;
SELECT 1;

--replace_result $MASTER_MYSOCK MASTER_SOCKET $MASTER_MYPORT MASTER_PORT
connect (auth_file_password,localhost,auth_file_password,test_password,,);
connection auth_file_password;
SELECT 1;

--replace_result $MASTER_MYSOCK MASTER_SOCKET $MASTER_MYPORT MASTER_PORT
--replace_regex /@'.*?'/@'LOCALHOST'/
--error ER_ACCESS_DENIED_ERROR
connect (bad_user_password,localhost,auth_file_password,bad_password,,);

plugin/auth_file/tests/r/basic.result
connect(localhost,bad_user,,test,MASTER_PORT,);
ERROR 28000: Access denied for user 'bad_user'@'LOCALHOST' (using password: NO)
SELECT 1;
1
1
SELECT 1;
1
1
connect(localhost,auth_file_password,bad_password,test,MASTER_PORT,);
ERROR 28000: Access denied for user 'auth_file_password'@'LOCALHOST' (using password: YES)

With the test files updated, let's compile and run the tests:

shell$ make -j 3
...
shell$ ( cd tests && ./dtr --suite=auth_file )
...
auth_file.basic [ pass ] 11
...

It works! At this point we have a fully functional plugin. To finish up, we'll want to commit the changes, push the branch to Launchpad, and propose the plugin for review so it can be merged into the trunk. You can see the full source code in lp:~eday/drizzle/auth-file (it should also appear in the Drizzle trunk in the next couple of days).
There are improvements that could be made, such as checking whether the file has changed and reloading it while running (being conscious of the possibility of multiple concurrent readers), or storing passwords in a format other than plain text. Patches are welcome!

I hope this gives you enough information to get started writing your own authentication plugins. I'm going to be working on a direct LDAP authentication plugin next, supporting both plain text and MySQL hashed passwords. If you need any help getting started with your own, come ask your questions on IRC or on the Drizzle mailing list. We'll also be hosting a Drizzle Developer Day after the MySQL Conference where you can get started in person.

Posted in Drizzle, Main, MySQL

Thoughts on “NoSQL”
March 26th, 2010

I’ve decided to jump on the bandwagon and spill my thoughts on “NoSQL” since it’s been such a hot topic lately ([1], [2], [3], [4]). Since I work on the Drizzle project, some folks would probably think I take the SQL side of the “debate,” but actually I’m pretty objective about the topic and find value in projects on both sides. Let me explain.

Last November at OpenSQL Camp I assembled a panel to debate “SQL vs NoSQL.” We had folks representing a variety of projects, including Cassandra, CouchDB, Drizzle, MariaDB, MongoDB, MySQL, and PostgreSQL. Even though I realized this was a poor name for such a panel, I went with it anyway because this “debate” was really starting to heat up. The conclusion I was hoping for was that the two are not at odds, because the two categories of projects can peacefully co-exist in the same toolbox for data management.

Beyond the panel name, even the term “NoSQL” is a bit misleading. I talked with Eric Evans (one of my new co-workers over on the Cassandra team), who reintroduced the term, and even he admits it is vague and doesn’t do the projects categorized by it any favors. What happens when Cassandra has a SQL interface stacked on top of it? Yeah.
One reason for all this confusion is that for some people, the term “database” equates to “relational database.” This makes the non-relational projects look foreign because they don’t fit the database model that became “traditional” due to its popularity. Anyone who has ever read up on other database models would quickly realize relational is just one of many models, and many of the “NoSQL” projects fit quite nicely into one of these categories. The real value these new projects provide is in their implementation details, especially dynamic scale-out (adding new nodes to live systems) and synchronization mechanisms (eventual consistency or tunable quorum). There are a lot of great ideas in these projects, and people on the “SQL” side should really take the time to study them; there are some tricks to learn.
One of the main criticisms of the “NoSQL” projects is that they are taking a step back, simply reinventing a component that already exists in a relational model. While there may be some truth to this, once you look past the high-level logical data representations it is just wrong. Sure, it may look like a simple key-value store from the outside, but there is a lot more under the hood. For many of these projects it was a design decision to focus on the implementation details where it matters, and not bother with things like parsing SQL and optimizing joins.

I think there is still some value in supporting some form of a SQL interface because this gets you instant adoption by pretty much any developer out there. Love it or hate it, people know SQL. As for joins, scaling them with distributed relational nodes has been a research topic for years, and it’s a hard problem. People have worked around this by accepting new data models and consistency levels. It all depends on what your problem requires.

I fully embrace the “NoSQL” projects out there; there is something we can all learn from them even if we don’t put them into production. We should be thrilled we have more open source tools in our database toolbox, especially non-relational ones. We are no longer required to smash every dataset “peg” into the relational “hole.” Use the best tool for the job, which may still be a relational database. Explore your options, try to learn a few things, model your data in a number of ways, and find out what is really required. When it comes time to make a decision, just remember:

[image]
Copyright (C) Eric Day - eday@oddments.org. All content licensed under the Creative Commons Attribution 3.0 License. Hosted by Rackspace Cloud.