Setting up a high performance OpenVZ container
by Liz van Dijk on January 22, 2010 12:00 AM EST - posted in Virtualization
As promised quite a while ago, we've been working on pitting Xen and OpenVZ against each other in a little "battle of the free virtualization solutions". (If you can't quite recall what this OpenVZ business is all about, we suggest you go read our article on container-based virtualization.)
Though development of our vApus FOS benchmark suite is moving along quite diligently, it takes time to create a realistic testing setup that will prove useful and relevant for a while in a world where cores are multiplying like a pair of rabbits. As it turns out, our test client is due for a thorough rewrite and optimization as well in the face of the upcoming Magny-Cours and 64-core Nehalem systems, so we definitely have our work cut out for us.
In preparation for the "official" rollout of vApus FOS, we have been using our beta versions to test the performance of both Xen and OpenVZ on CentOS 5.4, meanwhile figuring out just how easy it is to set up a large-scale, realistic testing environment in OpenVZ.
As with many extensive open source software packages, OpenVZ comes with quite a few hefty man pages and a very minimal default configuration, making the learning curve quite steep.
Having a repeatable test ready, however, helps a great deal in tracking down possible bottlenecks in your container setup. Our greatest issues came up when trying to configure a container for a relatively heavily queried MySQL database, so here are some pointers for our readers out there trying to do the same.
- While testing, keep a very close eye on /proc/user_beancounters. The very last column of this table displays the failcnt (failure count) of each resource in the container. When you start noticing problems, check user_beancounters first to get a better idea of what's going wrong (see the sketch after this list for a quick way to filter it).
- Problematic resource counters to look out for are the following:
numproc - This is the number of processes the container is allowed to create. In MySQL, every connection will get its own process, so make sure you allow for at least the value you entered for max_connections in my.cnf, plus the usual number of processes running in the container. For a test with 900 users, we just set this to 1000 to be sure.
numtcpsock - Same as above: you need to increase this to at least the number of users you want to allow at the same time, as each of them will need its own TCP socket.
kmemsize - When allowing a container access to a certain amount of memory, not all of it is used in the same way. kmemsize is the number of bytes used for kernel activity on behalf of that specific container. Creating a large number of processes requires quite a bit of kernel bookkeeping, so make sure the container gets the memory it needs to keep track of the processes' data structures. Though it's best to experiment a little to figure out which setting is optimal, a good starting point is to take your number of processes and multiply it by roughly 50kB, then scale down or up as necessary. This is something you can easily keep track of by watching /proc/user_beancounters.
numfile - Again, this parameter depends on the type of application you use, how many users use it, how many tables they access (in the case of MySQL) and even which storage engine you use. Giving exact pointers here gets complex quickly, but what worked for us was simply doubling the base value to start with and then examining the maxheld column in /proc/user_beancounters to scale the value back down to what we actually required.
tcpsndbuf & tcprcvbuf - These two buffers can be a little tricky, and confusing to notice while not paying attention. When the difference between the barrier setting and the limit of these buffers are too small, some connections can in fact be made, but some of them simply won't send or receive anything, and keep silent. This was very confusing to vApus, which opens its full amount of connections before starting the test, in the assumption that the successful creation of all connections would allow transmission of data, however slow. Instead, quite a few of its connections simply stalled indefinitely, for no apparent reason. The rule of thumb in this case is that, no matter the amount of memory you want to allow the container for networking purposes, the difference between the barrier and limit for these buffers should always allow for 2.5kB per connection, e.g. the amount filled in for numtcpsock. For our environment, this came down to 2500kB. As such, you can set the barrier value for these buffers as low as you like, but the limit should be set at barrier + numtcpsock * 2.5kB.
The easiest way to tweak these settings is by updating your containers' config files. In our case, they were located at /etc/vz/conf/[containerid].conf on the host system's filesystem.
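Alternatively, vzctl can apply a barrier:limit pair on the fly and, with --save, write it into that same config file for you. The commands below are only a sketch: container ID 101 is a placeholder, and the numbers are illustrative values sized for the 1000-connection scenario above (kmemsize at roughly 50kB per process, and the tcpsndbuf/tcprcvbuf limits set about 2500kB above an arbitrarily chosen barrier). Treat them as a starting point to tune against /proc/user_beancounters, not as recommendations.

    # --save applies the setting and also writes it to /etc/vz/conf/101.conf
    # (as lines like NUMPROC="1000:1000")
    vzctl set 101 --numproc 1000:1000 --save
    vzctl set 101 --numtcpsock 1000:1000 --save
    # kmemsize is specified in bytes: ~1000 processes x 50kB for the barrier,
    # plus some headroom for the limit
    vzctl set 101 --kmemsize 52428800:57671680 --save
    # the socket buffers are in bytes as well:
    # limit = barrier + numtcpsock * 2.5kB (here a 10MB barrier + ~2500kB)
    vzctl set 101 --tcpsndbuf 10485760:13045760 --save
    vzctl set 101 --tcprcvbuf 10485760:13045760 --save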
Well, it's back to the grindstone for me; time to show these multicore monsters what we're made of.
6 Comments
gilboa - Monday, January 25, 2010 - link
Given the fact that RedHat is not pushing KVM out of the Fedora testing ground and into RHEL 5.4 (more-or-less displacing Xen), I'd consider setting up a benchmark comparing it to Xen and VMWare Server. Should be interesting.
(Read: In my own somewhat limited experience, KVM's ability to scale nicely into >= 8 cores and >4GB RAM makes it a possible alternative to VMWare ESX*... but as I said, my experience is limited to my own deployments...)
- Gilboa
gilboa - Monday, January 25, 2010 - link
%s/RedHat is not/RedHat is now/g
kbob - Friday, January 22, 2010 - link
Are you running HTTP servers on these instances? If so, 2.5KB for tcpsndbuf sounds too small. It should be big enough for each socket to hold an entire mean-size HTTP response. That number is very application-specific, but 2.5KB is too small for anything but a toy app.
LizVD - Monday, January 25, 2010 - link
The 2.5kB is not the amount a single buffer will get assigned to it. It's the headroom that, in case the tcpsndbuf barrier has been reached (which you can set as high as you'd like), is reserved to assure the responsiveness of the connections, however slow. So, the idea is that you set the barrier of tcpsndbuf as high as you'd prefer, and for the limit value, increase that amount by at least "numtcpsock * 2.5kB".
karlkesselman - Friday, January 22, 2010 - link
This is really cool stuff. Looking forward to it. (I'm kind of surprised to see such benchmarks on AnandTech - most benchmarks here are for Windows desktops, so this seems like a breath of fresh air.)
gwolfman - Friday, January 22, 2010 - link
AnandTech has a whole IT section where they do lots of relevant and useful reviews; ones I haven't seen other sites come close to. Anand knows what he's doing, no worries there.
I am also looking forward to the results.