Container-Based OS Virtualization
by Liz van Dijk on July 8, 2008 1:30 AM EST. Posted in IT Computing
Beancounters
Current-day resource management in Linux is certainly adequate for a conventional operating system, but it falls a bit short when it has to provide limited resource sets for contained groups of processes. Of these resources, memory is the most notable, so an important task is keeping track of the number of pages each process is allowed to use.
For this reason, OpenVZ has implemented an addition to the kernel called beancounters (BC). These kernel-level objects keep track of the resources available and can notify the kernel of user space processes that are overstepping their boundaries.
Since containers are no more than groups of processes, actual allocation of memory and other resources requires no large changes. However, upon creation of a new container, an "init" process is made for it. Readers familiar with Linux-based systems will recognize this process as the proverbial mother of all processes, and in containers it is no different. Upon creation, the process is assigned to a beancounter object by the OpenVZ software. As a result, each of its child processes is bound to it. From that point on, the BC controls the maximum amount of resources available to that container.
Think of the BC as a sort of table. Its columns contain the amount of a resource currently held, the maximum held during the session, and a barrier value at which the BC warns the kernel to stop allocating that resource. Two further columns track an absolute limit for moments of burst usage and the number of allocations that failed because too little of the resource was still available. The rows, in turn, contain the different types of resources the BC can keep track of, including but not limited to:
- user memory
- network buffers
- # of tasks
- # of files
- # of sockets
- # of file locks
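The table described above can be sketched in a few lines of Python. This is only a toy model, not OpenVZ's kernel code: the class names, the `charge`/`uncharge` methods and the `burst` flag are made up for illustration, and the rule that ordinary requests stop at the barrier while burst requests may reach the limit is an assumption based on the description above.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceRow:
    """One row of the table: the counters kept for a single resource type."""
    barrier: int      # soft threshold: ordinary requests are refused above it
    limit: int        # absolute ceiling, reachable only by burst requests
    held: int = 0     # amount currently in use
    maxheld: int = 0  # highest value of `held` seen so far
    failcnt: int = 0  # number of refused allocation requests

    def charge(self, amount: int, burst: bool = False) -> bool:
        """Try to account for `amount` more units; return True on success."""
        ceiling = self.limit if burst else self.barrier
        if self.held + amount > ceiling:
            self.failcnt += 1
            return False
        self.held += amount
        self.maxheld = max(self.maxheld, self.held)
        return True

    def uncharge(self, amount: int) -> None:
        """Release previously charged units."""
        self.held = max(0, self.held - amount)


@dataclass
class Beancounter:
    """Toy beancounter: one ResourceRow per tracked resource type."""
    rows: dict = field(default_factory=dict)


# Hypothetical container that may run 3 tasks, or 4 during a burst.
bc = Beancounter(rows={"numtasks": ResourceRow(barrier=3, limit=4)})
tasks = bc.rows["numtasks"]
results = [tasks.charge(1) for _ in range(4)]  # the fourth request hits the barrier
burst_ok = tasks.charge(1, burst=True)         # a burst request may go up to the limit
```

Because every child process of a container's init shares the same beancounter, a single object like `bc` is enough to cap the whole container.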
We can see the current status of the beancounters for each existing container by outputting the contents of /proc/user_beancounters.
We'll try to cover the actual accounting of some types of resources as briefly and clearly as possible.
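For readers who want to process that file in a script, here is a minimal Python sketch of a parser. It assumes the commonly seen layout: a version line, a header line, then one row per resource with the columns held, maxheld, barrier, limit and failcnt, where the container ID appears only on the first row of each container. The function name and the sample numbers are made up for illustration; check the layout on your own system before relying on it.

```python
def parse_user_beancounters(text):
    """Return {container_id: {resource: {column: value}}} from the file's text."""
    columns = ("held", "maxheld", "barrier", "limit", "failcnt")
    containers = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] in ("Version:", "uid"):
            continue  # skip blank lines and the two header lines
        if fields[0].endswith(":"):
            # A new container starts here; its ID precedes the first resource.
            current = containers.setdefault(fields[0].rstrip(":"), {})
            fields = fields[1:]
        if current is None or len(fields) != 6:
            continue  # not a resource row in the assumed layout
        name, values = fields[0], fields[1:]
        current[name] = dict(zip(columns, map(int, values)))
    return containers


# Made-up sample in the assumed layout; on a real host you would instead
# read the text from /proc/user_beancounters (usually requires root).
sample = """\
Version: 2.5
       uid  resource           held    maxheld    barrier      limit    failcnt
      101:  kmemsize         803866    1246758    2457600    2621440          0
            numproc              10         15         65         65          0
"""
stats = parse_user_beancounters(sample)
```

A non-zero `failcnt` in the result is the quickest sign that a container has been running into one of its limits.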
Memory accounting
Accounting of memory is divided into several parts and discussing all of them would fill up several pages. The following paragraphs dip into the logic behind the system a bit more deeply, and discuss some of the challenges faced in allocating memory to several containers while showing examples of how beancounters can be put to use in a container-based environment.
Virtual Memory Area lengths:
Processes can request extra memory pages without necessarily putting them to use immediately, or even using them at all. Instead of receiving the full amount in physical RAM, a process is given a number of "virtual" pages, so that only the pages actually used are loaded into RAM, while "empty" pages remain free for other processes to use.
This way, each process works with the impression that the full amount of virtual pages is always available, and even though it isn't currently using the free pages, it can do so in the future if the need arises. The issue here is that the total amount of virtual memory for all processes is usually much larger than what is available in the RAM.
When a lot of processes start actually using most of their virtual pages, the RAM might not suffice, and the system will need to swap certain data to the next available storage space to make room: the hard disk. As the hard disk is a much slower medium, this is a situation to avoid, especially when many users need to make use of the same system. One user could technically put all the others out of business by starting up some processes that are able to fill up the entirety of the physical RAM.
In the above picture, we can see pages in the virtual memory that exist physically in either the RAM or the HD. The total amount of Virtual Memory available is the sum of what is available in the RAM along with a certain amount assigned to it on the HD. The beancounter keeps track of the total amount of virtual pages allocated, so it can anticipate troublesome situations and deny requests for more virtual pages if needed.
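The difference between reserved and used pages can be made concrete with a small Python sketch. It is a deliberately simplified model under stated assumptions: the class name, its methods and the page counts are all hypothetical, and real kernels track this per page rather than with two counters.

```python
class VirtualPageAccountant:
    """Toy model: pages are reserved first and only occupy RAM once touched."""

    def __init__(self, ram_pages, swap_pages):
        self.ram_pages = ram_pages
        self.capacity = ram_pages + swap_pages  # total virtual memory available
        self.reserved = 0   # virtual pages handed out to processes
        self.touched = 0    # pages actually in use (need RAM or disk)

    def reserve(self, pages):
        """Grant a request for virtual pages unless the total would exceed capacity."""
        if self.reserved + pages > self.capacity:
            return False
        self.reserved += pages
        return True

    def touch(self, pages):
        """Start using pages that were reserved earlier."""
        if self.touched + pages > self.reserved:
            raise ValueError("cannot use more pages than were reserved")
        self.touched += pages

    @property
    def swapped(self):
        """Pages in use that no longer fit in RAM and must live on disk."""
        return max(0, self.touched - self.ram_pages)


vm = VirtualPageAccountant(ram_pages=100, swap_pages=50)
first = vm.reserve(80)    # granted, nothing occupies RAM yet
second = vm.reserve(60)   # granted: 140 reserved pages exceed RAM but not RAM + swap
third = vm.reserve(20)    # denied: 160 pages would exceed the 150 available
vm.touch(120)             # 120 pages in use, so 20 of them spill to disk
```

Denying the third request up front is the cheap option: the alternative is discovering the shortage only later, when the pages are touched and the disk has to absorb them.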
Resident Set Size
RSS (Resident Set Size) denotes only the pages a process actually uses that exist in physical RAM. The problem here is that, even though a page is the smallest possible unit of memory allocation, a page can sometimes be mapped by several processes (e.g. when the same file is used). A single page in use on two different beancounters would then count as two pages in total, making the count inaccurate. Therefore, RSS is calculated with a system of parts, as follows:
- BC1 encounters a used page, and counts it as a whole = 1
- BC2 encounters the same page, and both beancounters count half a page = 1/2, 1/2
- BC3 encounters the same page and one of the present beancounters splits half the page = 1/2, 1/4, 1/4
- When BC4 arrives, the other half is split, and so on. = 1/4, 1/4, 1/4, 1/4
In this picture from Pavel Emelyanov's explanation of beancounters, we can see the division of pages explained. This is an efficient way of keeping the RSS count accurate, as the arrival of a new beancounter only triggers a change in one of the others, as opposed to the full group.
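The splitting scheme in the list above can be reproduced with a short Python sketch using exact fractions. The function name is made up, and the rule for picking which beancounter gives up half of its share (the one holding the largest share) is an assumption chosen to match the sequence above; the article does not say how the real implementation chooses.

```python
from fractions import Fraction


def add_sharer(shares, newcomer):
    """Account for one more beancounter mapping the page.

    `shares` maps beancounter name -> Fraction of the page it is charged for.
    """
    if not shares:
        shares[newcomer] = Fraction(1)   # first user is charged the whole page
        return shares
    # Assumption: the holder of the largest share is the one that splits;
    # max() returns the earliest such holder when several are tied.
    donor = max(shares, key=shares.get)
    half = shares[donor] / 2
    shares[donor] = half
    shares[newcomer] = half
    return shares


shares = {}
for name in ("BC1", "BC2", "BC3", "BC4"):
    add_sharer(shares, name)
    print(name, "arrives:", {bc: str(part) for bc, part in shares.items()})
```

Each arrival touches exactly two entries, the donor and the newcomer, and the parts always add up to one whole page, which is what keeps the total RSS count accurate.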