| 
									
										
										
										
											2015-03-23 16:32:16 +08:00
										 |  |  | Use multiple thread (de)compression in live migration | 
					
						
							|  |  |  | ===================================================== | 
					
						
							|  |  |  | Copyright (C) 2015 Intel Corporation | 
					
						
							|  |  |  | Author: Liang Li <liang.z.li@intel.com> | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | This work is licensed under the terms of the GNU GPLv2 or later. See | 
					
						
							|  |  |  | the COPYING file in the top-level directory. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Contents: | 
					
						
							|  |  |  | ========= | 
					
						
							|  |  |  | * Introduction | 
					
						
							|  |  |  | * When to use | 
					
						
							|  |  |  | * Performance | 
					
						
							|  |  |  | * Usage | 
					
						
							|  |  |  | * TODO | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Introduction | 
					
						
							|  |  |  | ============ | 
					
						
							|  |  |  | Instead of sending the guest memory directly, this solution will | 
					
						
							|  |  |  | compress the RAM page before sending; after receiving, the data will | 
					
						
							|  |  |  | be decompressed. Using compression in live migration can help | 
					
						
							|  |  |  | to reduce the data transferred about 60%, this is very useful when the | 
					
						
							|  |  |  | bandwidth is limited, and the total migration time can also be reduced | 
					
						
							|  |  |  | about 70% in a typical case. In addition to this, the VM downtime can be | 
					
						
							|  |  |  | reduced about 50%. The benefit depends on data's compressibility in VM. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | The process of compression will consume additional CPU cycles, and the | 
					
						
							|  |  |  | extra CPU cycles will increase the migration time. On the other hand, | 
					
						
							|  |  |  | the amount of data transferred will decrease; this factor can reduce | 
					
						
							|  |  |  | the total migration time. If the process of the compression is quick | 
					
						
							|  |  |  | enough, then the total migration time can be reduced, and multiple | 
					
						
							|  |  |  | thread compression can be used to accelerate the compression process. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | The decompression speed of Zlib is at least 4 times as quick as | 
					
						
							|  |  |  | compression, if the source and destination CPU have equal speed, | 
					
						
							|  |  |  | keeping the compression thread count 4 times the decompression | 
					
						
							|  |  |  | thread count can avoid resource waste. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Compression level can be used to control the compression speed and the | 
					
						
							|  |  |  | compression ratio. High compression ratio will take more time, level 0 | 
					
						
							|  |  |  | stands for no compression, level 1 stands for the best compression | 
					
						
							|  |  |  | speed, and level 9 stands for the best compression ratio. Users can | 
					
						
							|  |  |  | select a level number between 0 and 9. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | When to use the multiple thread compression in live migration | 
					
						
							|  |  |  | ============================================================= | 
					
						
							|  |  |  | Compression of data will consume extra CPU cycles; so in a system with | 
					
						
							|  |  |  | high overhead of CPU, avoid using this feature. When the network | 
					
						
							|  |  |  | bandwidth is very limited and the CPU resource is adequate, use of | 
					
						
							|  |  |  | multiple thread compression will be very helpful. If both the CPU and | 
					
						
							|  |  |  | the network bandwidth are adequate, use of multiple thread compression | 
					
						
							|  |  |  | can still help to reduce the migration time. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Performance | 
					
						
							|  |  |  | =========== | 
					
						
							|  |  |  | Test environment: | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | CPU: Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz | 
					
						
							|  |  |  | Socket Count: 2 | 
					
						
							|  |  |  | RAM: 128G | 
					
						
							|  |  |  | NIC: Intel I350 (10/100/1000Mbps) | 
					
						
							|  |  |  | Host OS: CentOS 7 64-bit | 
					
						
							|  |  |  | Guest OS: RHEL 6.5 64-bit | 
					
						
							| 
									
										
										
										
											2018-06-13 07:05:19 +02:00
										 |  |  | Parameter: qemu-system-x86_64 -accel kvm -smp 4 -m 4096 | 
					
						
							| 
									
										
										
										
											2015-03-23 16:32:16 +08:00
										 |  |  |  /share/ia32e_rhel6u5.qcow -monitor stdio | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | There is no additional application is running on the guest when doing | 
					
						
							|  |  |  | the test. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Speed limit: 1000Gb/s | 
					
						
							|  |  |  | --------------------------------------------------------------- | 
					
						
							|  |  |  |                     | original  | compress thread: 8 | 
					
						
							|  |  |  |                     |   way     | decompress thread: 2 | 
					
						
							|  |  |  |                     |           | compression level: 1 | 
					
						
							|  |  |  | --------------------------------------------------------------- | 
					
						
							|  |  |  | total time(msec):   |   3333    |  1833 | 
					
						
							|  |  |  | --------------------------------------------------------------- | 
					
						
							|  |  |  | downtime(msec):     |    100    |   27 | 
					
						
							|  |  |  | --------------------------------------------------------------- | 
					
						
							|  |  |  | transferred ram(kB):|  363536   | 107819 | 
					
						
							|  |  |  | --------------------------------------------------------------- | 
					
						
							|  |  |  | throughput(mbps):   |  893.73   | 482.22 | 
					
						
							|  |  |  | --------------------------------------------------------------- | 
					
						
							|  |  |  | total ram(kB):      |  4211524  | 4211524 | 
					
						
							|  |  |  | --------------------------------------------------------------- | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | There is an application running on the guest which write random numbers | 
					
						
							|  |  |  | to RAM block areas periodically. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Speed limit: 1000Gb/s | 
					
						
							|  |  |  | --------------------------------------------------------------- | 
					
						
							|  |  |  |                     | original  | compress thread: 8 | 
					
						
							|  |  |  |                     |   way     | decompress thread: 2 | 
					
						
							|  |  |  |                     |           | compression level: 1 | 
					
						
							|  |  |  | --------------------------------------------------------------- | 
					
						
							|  |  |  | total time(msec):   |   37369   | 15989 | 
					
						
							|  |  |  | --------------------------------------------------------------- | 
					
						
							|  |  |  | downtime(msec):     |    337    |  173 | 
					
						
							|  |  |  | --------------------------------------------------------------- | 
					
						
							|  |  |  | transferred ram(kB):|  4274143  | 1699824 | 
					
						
							|  |  |  | --------------------------------------------------------------- | 
					
						
							|  |  |  | throughput(mbps):   |  936.99   | 870.95 | 
					
						
							|  |  |  | --------------------------------------------------------------- | 
					
						
							|  |  |  | total ram(kB):      |  4211524  | 4211524 | 
					
						
							|  |  |  | --------------------------------------------------------------- | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Usage | 
					
						
							|  |  |  | ===== | 
					
						
							|  |  |  | 1. Verify both the source and destination QEMU are able | 
					
						
							|  |  |  | to support the multiple thread compression migration: | 
					
						
							| 
									
										
										
										
											2016-05-23 17:43:57 +08:00
										 |  |  |     {qemu} info migrate_capabilities | 
					
						
							| 
									
										
										
										
											2015-03-23 16:32:16 +08:00
										 |  |  |     {qemu} ... compress: off ... | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | 2. Activate compression on the source: | 
					
						
							|  |  |  |     {qemu} migrate_set_capability compress on | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | 3. Set the compression thread count on source: | 
					
						
							|  |  |  |     {qemu} migrate_set_parameter compress_threads 12 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | 4. Set the compression level on the source: | 
					
						
							|  |  |  |     {qemu} migrate_set_parameter compress_level 1 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | 5. Set the decompression thread count on destination: | 
					
						
							|  |  |  |     {qemu} migrate_set_parameter decompress_threads 3 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | 6. Start outgoing migration: | 
					
						
							|  |  |  |     {qemu} migrate -d tcp:destination.host:4444 | 
					
						
							|  |  |  |     {qemu} info migrate | 
					
						
							|  |  |  |     Capabilities: ... compress: on | 
					
						
							|  |  |  |     ... | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | The following are the default settings: | 
					
						
							|  |  |  |     compress: off | 
					
						
							|  |  |  |     compress_threads: 8 | 
					
						
							|  |  |  |     decompress_threads: 2 | 
					
						
							|  |  |  |     compress_level: 1 (which means best speed) | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | So, only the first two steps are required to use the multiple | 
					
						
							|  |  |  | thread compression in migration. You can do more if the default | 
					
						
							|  |  |  | settings are not appropriate. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | TODO | 
					
						
							|  |  |  | ==== | 
					
						
							|  |  |  | Some faster (de)compression method such as LZ4 and Quicklz can help | 
					
						
							|  |  |  | to reduce the CPU consumption when doing (de)compression. If using | 
					
						
							|  |  |  | these faster (de)compression method, less (de)compression threads | 
					
						
							|  |  |  | are needed when doing the migration. |