commit 486a86eb184c008c5957fb68c63f163289f3344b
Author: Daniel P. Berrange
Date:   Fri May 3 16:58:26 2013 +0100

    Add docs about cgroups layout and usage

    Describe the new cgroups layout, how to customize placement of
    guests and what virsh commands are used to access the parameters.

    Signed-off-by: Daniel P. Berrange

Index: libvirt-1.0.5/docs/cgroups.html.in
===================================================================
--- /dev/null
+++ libvirt-1.0.5/docs/cgroups.html.in
@@ -0,0 +1,285 @@

Control Groups Resource Management


+ The QEMU and LXC drivers make use of the Linux "Control Groups" facility for applying resource management to their virtual machines and containers.


Required controllers


+ The control groups filesystem supports multiple "controllers". By default the init system (such as systemd) should mount all controllers compiled into the kernel at /sys/fs/cgroup/$CONTROLLER-NAME. Libvirt will never attempt to mount any controllers itself, merely detect where they are mounted.
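
+ Which controllers the init system has actually mounted on a given host can be checked directly from the kernel's mount table. The listing below is a minimal illustration only; the exact set of controllers, mount options and mount points will vary with the distribution and kernel configuration:

+# grep cgroup /proc/mounts
+cgroup /sys/fs/cgroup/cpuset cgroup rw,cpuset 0 0
+cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,cpu,cpuacct 0 0
+cgroup /sys/fs/cgroup/memory cgroup rw,memory 0 0
+cgroup /sys/fs/cgroup/devices cgroup rw,devices 0 0
+cgroup /sys/fs/cgroup/blkio cgroup rw,blkio 0 0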


+ The QEMU driver is capable of using the cpuset, cpu, memory, blkio and devices controllers. None of them are compulsory. If any controller is not mounted, the resource management APIs which use it will cease to operate. It is possible to explicitly turn off use of a controller, even when mounted, via the /etc/libvirt/qemu.conf configuration file.
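
+ For example, the set of controllers the QEMU driver is permitted to use can be narrowed with the cgroup_controllers setting in /etc/libvirt/qemu.conf. The snippet below is an illustrative sketch; omitting a controller from the list stops libvirt using it even when it is mounted, and the libvirtd daemon must be restarted for the change to take effect:

+# /etc/libvirt/qemu.conf - allow every controller except cpuset
+cgroup_controllers = [ "cpu", "devices", "memory", "blkio" ]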


+ The LXC driver is capable of using the cpuset, cpu, cpuacct, freezer, memory, blkio and devices controllers. The cpuset, devices and memory controllers are compulsory. Without them mounted, no containers can be started. If any of the other controllers are not mounted, the resource management APIs which use them will cease to operate.


Current cgroups layout


+ As of libvirt 1.0.5 or later, the cgroups layout created by libvirt has been simplified, in order to facilitate the setup of resource control policies by administrators / management applications. The layout is based on the concepts of "partitions" and "consumers". Each virtual machine or container is a consumer, and has a corresponding cgroup named $VMNAME.libvirt-{qemu,lxc}. Each consumer is associated with exactly one partition, which also has a corresponding cgroup, usually named $PARTNAME.partition. The exceptions to this naming rule are the three top level default partitions, named /system (for system services), /user (for user login sessions) and /machine (for virtual machines and containers). By default every consumer will of course be associated with the /machine partition. This leads to a hierarchy that looks like

+$ROOT
+  |
+  +- system
+  |   |
+  |   +- libvirtd.service
+  |
+  +- machine
+      |
+      +- vm1.libvirt-qemu
+      |   |
+      |   +- emulator
+      |   +- vcpu0
+      |   +- vcpu1
+      |
+      +- vm2.libvirt-qemu
+      |   |
+      |   +- emulator
+      |   +- vcpu0
+      |   +- vcpu1
+      |
+      +- vm3.libvirt-qemu
+      |   |
+      |   +- emulator
+      |   +- vcpu0
+      |   +- vcpu1
+      |
+      +- container1.libvirt-lxc
+      |
+      +- container2.libvirt-lxc
+      |
+      +- container3.libvirt-lxc
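
+ The placement of running guests can be confirmed by listing the machine partition in any mounted controller. This is an illustrative sketch only, using the guest names from the example hierarchy above and omitting the controller's own tunable files:

+# ls /sys/fs/cgroup/cpu/machine/
+vm1.libvirt-qemu   vm2.libvirt-qemu   vm3.libvirt-qemu
+container1.libvirt-lxc   container2.libvirt-lxc   container3.libvirt-lxc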

+ The default cgroups layout ensures that, when there is contention for CPU time, it is shared equally between system services, user sessions and virtual machines / containers. This prevents virtual machines from locking the administrator out of the host, or impacting execution of system services. Conversely, when there is no contention from system services / user sessions, it is possible for virtual machines to fully utilize the host CPUs.


Using custom partitions


+ If there is a need to apply resource constraints to groups of virtual machines or containers, then the single default partition /machine may not be sufficiently flexible. The administrator may wish to sub-divide the default partition, for example into "testing" and "production" partitions, and then assign each guest to a specific sub-partition. This is achieved by adding a small element to the guest domain XML config, just below the main domain element

+  ...
+  <resource>
+    <partition>/machine/production</partition>
+  </resource>
+  ...

+ Libvirt will not auto-create the cgroups directory to back this partition. In the future, libvirt / virsh will provide APIs / commands to create custom partitions, but currently this is left as an exercise for the administrator. For example, given the XML config above, the admin would need to create a cgroup named '/machine/production.partition':

+# cd /sys/fs/cgroup
+# for i in blkio cpu,cpuacct cpuset devices freezer memory net_cls perf_event
+  do
+    mkdir $i/machine/production.partition
+  done
+# for i in cpuset.cpus  cpuset.mems
+  do
+    cat cpuset/machine/$i > cpuset/machine/production.partition/$i
+  done
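
+ Once the partition cgroup exists, resource limits can be applied to the partition as a whole by writing to its tunables, and they will then constrain every guest placed inside it. For example, still working from /sys/fs/cgroup as above (the value chosen is purely illustrative):

+# echo 2048 > cpu,cpuacct/machine/production.partition/cpu.shares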

+ Note: the cgroups directory is created with a ".partition" suffix, but the XML config does not require this suffix.


+ Note: the ability to place guests in custom partitions is only available with libvirt >= 1.0.5, using the new cgroup layout. The legacy cgroups layout described later did not support customization per guest.


Resource management APIs/commands


+ Since libvirt aims to provide an API which is portable across hypervisors, the concept of cgroups is not exposed directly in the API or XML configuration. It is considered to be an internal implementation detail. Instead libvirt provides a set of APIs for applying resource controls, which are then mapped to corresponding cgroup tunables.


Scheduler tuning


+ Parameters from the "cpu" controller are exposed via the schedinfo command in virsh.

+# virsh schedinfo demo
+Scheduler      : posix
+cpu_shares     : 1024
+vcpu_period    : 100000
+vcpu_quota     : -1
+emulator_period: 100000
+emulator_quota : -1
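
+ The same command can also change a tunable. For example, to double the relative CPU weighting of the guest (a sketch; the "demo" guest and the chosen value are illustrative only):

+# virsh schedinfo demo --set cpu_shares=2048
+Scheduler      : posix
+cpu_shares     : 2048
+vcpu_period    : 100000
+vcpu_quota     : -1
+emulator_period: 100000
+emulator_quota : -1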

Block I/O tuning


+ Parameters from the "blkio" controller are exposed via the blkiotune command in virsh.

+# virsh blkiotune demo
+weight         : 500
+device_weight  : 
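
+ A new weight can be applied with the same command (an illustrative sketch, reusing the "demo" guest from above):

+# virsh blkiotune demo --weight 600
+# virsh blkiotune demo
+weight         : 600
+device_weight  : 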

Memory tuning


+ Parameters from the "memory" controller are exposed via the memtune command in virsh.

+# virsh memtune demo
+hard_limit     : 580192
+soft_limit     : unlimited
+swap_hard_limit: unlimited
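
+ Limits can be updated with the same command. For example, to cap the guest at roughly 1 GiB of host memory (a sketch; the value is given in kibibytes and is purely illustrative):

+# virsh memtune demo --hard-limit 1048576
+# virsh memtune demo
+hard_limit     : 1048576
+soft_limit     : unlimited
+swap_hard_limit: unlimited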

Network tuning


+ The net_cls controller is not currently used. Instead, traffic filter policies are set directly against individual virtual network interfaces.
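
+ For example, per-interface bandwidth limits can be queried and changed with the domiftune command rather than through any cgroup tunable. The command below is an illustrative sketch; the "demo" guest and the vnet0 interface name are assumptions, and the inbound values are average, peak and burst:

+# virsh domiftune demo vnet0 --inbound 1000,2000,1024 --outbound 1000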


Legacy cgroups layout


+ Prior to libvirt 1.0.5, the cgroups layout created by libvirt was different from that described above, and did not allow for administrator customization. Libvirt used a fixed, 3-level hierarchy libvirt/{qemu,lxc}/$VMNAME which was rooted at the point in the hierarchy where libvirtd itself was located. So if libvirtd was placed at /system/libvirtd.service by systemd, the groups for each virtual machine / container would be located at /system/libvirtd.service/libvirt/{qemu,lxc}/$VMNAME. In addition to this, the QEMU driver created further child groups for each vCPU thread and the emulator thread(s). This led to a hierarchy that looked like

+$ROOT
+  |
+  +- system
+      |
+      +- libvirtd.service
+           |
+           +- libvirt
+               |
+               +- qemu
+               |   |
+               |   +- vm1
+               |   |   |
+               |   |   +- emulator
+               |   |   +- vcpu0
+               |   |   +- vcpu1
+               |   |
+               |   +- vm2
+               |   |   |
+               |   |   +- emulator
+               |   |   +- vcpu0
+               |   |   +- vcpu1
+               |   |
+               |   +- vm3
+               |       |
+               |       +- emulator
+               |       +- vcpu0
+               |       +- vcpu1
+               |
+               +- lxc
+                   |
+                   +- container1
+                   |
+                   +- container2
+                   |
+                   +- container3

+ Although current releases are much improved, historically the use of deep hierarchies has had a significant negative impact on kernel scalability. The legacy libvirt cgroups layout highlighted these problems, to the detriment of the performance of virtual machines and containers.

Index: libvirt-1.0.5/docs/sitemap.html.in
===================================================================
--- libvirt-1.0.5.orig/docs/sitemap.html.in
+++ libvirt-1.0.5/docs/sitemap.html.in
@@ -87,6 +87,10 @@
       Ensuring exclusive guest access to disks
+  • CGroups
+    Control groups integration
   • Hooks
     Hooks for system specific management