2012-08-06 22:42:49 +04:00

XBZRLE (Xor Based Zero Run Length Encoding)
===========================================

Using XBZRLE (Xor Based Zero Run Length Encoding) reduces VM downtime and the
total live-migration time of virtual machines. It is particularly useful for
virtual machines running memory-write-intensive workloads that are typical of
large enterprise applications such as SAP ERP systems, and more generally for
any application that uses a sparse memory update pattern.

Instead of sending the changed guest memory page, this solution sends a
compressed version of the updates, thus reducing the amount of data sent
during live migration. To calculate the update, the previous contents of each
page must be stored on the source. Those pages are stored in a dedicated cache
(hash table) and are accessed by their address. The larger the cache, the
better the chance that a page has already been stored in it; a small cache
results in a high cache miss rate. The cache size can be changed before and
during migration.
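To make the cache-miss behaviour concrete, here is a minimal C sketch of a page cache keyed by guest address. It assumes 4 KiB pages, page-aligned addresses and a direct-mapped table whose slot is chosen by page frame number modulo a power-of-two slot count. `PageCache`, `cache_new`, `cache_lookup`, `cache_insert` and `cache_free` are hypothetical names for illustration, not QEMU's API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096  /* assumption: 4 KiB target pages */

/* Hypothetical direct-mapped cache: each slot holds at most one page. */
typedef struct {
    uint64_t addr;               /* guest address of the cached page */
    int valid;                   /* slot currently holds a page */
    uint8_t data[PAGE_SIZE];     /* previous contents of that page */
} CacheSlot;

typedef struct {
    size_t num_slots;            /* must be a power of two */
    CacheSlot *slots;
} PageCache;

PageCache *cache_new(size_t cache_bytes)
{
    PageCache *c = malloc(sizeof(*c));
    if (c == NULL) {
        return NULL;
    }
    c->num_slots = cache_bytes / PAGE_SIZE;
    /* slot selection below uses a mask, so require a power of two */
    assert(c->num_slots > 0 && (c->num_slots & (c->num_slots - 1)) == 0);
    c->slots = calloc(c->num_slots, sizeof(CacheSlot));
    if (c->slots == NULL) {
        free(c);
        return NULL;
    }
    return c;
}

void cache_free(PageCache *c)
{
    if (c != NULL) {
        free(c->slots);
        free(c);
    }
}

/* Map a page address to its slot: page frame number modulo slot count. */
static CacheSlot *cache_slot(PageCache *c, uint64_t addr)
{
    return &c->slots[(addr / PAGE_SIZE) & (c->num_slots - 1)];
}

/* Return the cached copy of the page at 'addr', or NULL on a cache miss. */
const uint8_t *cache_lookup(PageCache *c, uint64_t addr)
{
    CacheSlot *s = cache_slot(c, addr);
    return (s->valid && s->addr == addr) ? s->data : NULL;
}

/* Store the page at 'addr', replacing whichever page shared its slot. */
void cache_insert(PageCache *c, uint64_t addr, const uint8_t *page)
{
    CacheSlot *s = cache_slot(c, addr);
    s->addr = addr;
    s->valid = 1;
    memcpy(s->data, page, PAGE_SIZE);
}
```

With fewer slots, more pages map to each slot and evict one another, which is one way to see why a small cache tends to produce a high miss rate.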

Format
=======

The compression format performs an XOR between the previous and current
content of the page, where zero represents an unchanged value. The page data
delta is represented by zero and non-zero runs. A zero run is represented by
its length (in bytes). A non-zero run is represented by its length (in bytes)
followed by the new data. The run length is encoded using ULEB128
(http://en.wikipedia.org/wiki/LEB128).
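As a minimal sketch of the length encoding, ULEB128 can be written in C as follows. It assumes run lengths fit in 32 bits, which always holds because a run never exceeds one page. `uleb128_encode` and `uleb128_decode` are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode 'value' as unsigned LEB128 into 'out' (room for 5 bytes is enough
 * for any 32-bit value). Returns the number of bytes written. */
size_t uleb128_encode(uint32_t value, uint8_t *out)
{
    size_t n = 0;
    do {
        uint8_t byte = value & 0x7f;   /* low 7 bits first */
        value >>= 7;
        if (value != 0) {
            byte |= 0x80;              /* continuation bit: more bytes follow */
        }
        out[n++] = byte;
    } while (value != 0);
    return n;
}

/* Decode an unsigned LEB128 value from at most 'len' bytes of 'in'.
 * Returns the number of bytes consumed, or 0 if the input is truncated
 * or longer than 5 bytes. */
size_t uleb128_decode(const uint8_t *in, size_t len, uint32_t *value)
{
    uint32_t result = 0;
    for (size_t i = 0; i < len && i < 5; i++) {
        result |= (uint32_t)(in[i] & 0x7f) << (7 * i);
        if ((in[i] & 0x80) == 0) {     /* last byte of this value */
            *value = result;
            return i + 1;
        }
    }
    return 0;
}
```

For example, 1001 is 0x3e9: the low seven bits (0x69) are emitted first with the continuation bit set (0xe9), followed by the remaining bits (0x07). A full 4096-byte run needs only 2 bytes.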

There can be more than one valid encoding; the sender may send a longer
encoding in order to reduce computation cost.

page = zrun nzrun
       | zrun nzrun page

zrun = length

nzrun = length byte...

length = uleb128 encoded integer
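The grammar can be read as a simple two-state scan over the page. The following byte-at-a-time C sketch assumes both pages have the same length and that the caller supplies an output buffer of capacity `cap`. `xbzrle_encode_sketch` is a hypothetical name, and a production encoder would likely compare several bytes at a time for speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append 'value' as ULEB128 at out[*pos]; returns 0, or -1 if out of space. */
static int put_uleb128(uint8_t *out, size_t cap, size_t *pos, uint32_t value)
{
    do {
        if (*pos >= cap) {
            return -1;
        }
        uint8_t byte = value & 0x7f;                       /* low 7 bits */
        value >>= 7;
        out[(*pos)++] = (uint8_t)(byte | (value ? 0x80 : 0)); /* continuation */
    } while (value != 0);
    return 0;
}

/* Encode old_page -> new_page (both 'len' bytes) as zrun/nzrun pairs.
 * Returns the encoded length, 0 if the pages are identical, or -1 if the
 * encoding does not fit in 'cap' bytes. */
long xbzrle_encode_sketch(const uint8_t *old_page, const uint8_t *new_page,
                          size_t len, uint8_t *out, size_t cap)
{
    size_t i = 0, pos = 0;

    assert(old_page != NULL && new_page != NULL && out != NULL);
    while (i < len) {
        /* zrun: unchanged bytes, i.e. old XOR new == 0 (may be length 0) */
        size_t start = i;
        while (i < len && old_page[i] == new_page[i]) {
            i++;
        }
        if (i == len) {
            break;                        /* trailing zero run is not encoded */
        }
        if (put_uleb128(out, cap, &pos, (uint32_t)(i - start)) < 0) {
            return -1;
        }

        /* nzrun: changed bytes; emit the length, then the new values */
        start = i;
        while (i < len && old_page[i] != new_page[i]) {
            i++;
        }
        size_t run = i - start;
        if (put_uleb128(out, cap, &pos, (uint32_t)run) < 0 || run > cap - pos) {
            return -1;
        }
        memcpy(out + pos, new_page + start, run);
        pos += run;
    }
    return (long)pos;
}
```

When the very first byte differs, the first zrun is encoded with length 0. If `cap` is set to the page size (an assumption of this sketch), the -1 return corresponds to what the Usage section calls an overflow.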

On the sender side, XBZRLE is used as a compact delta encoding of page
updates, retrieving the old page content from the cache (default size of
64 MB). The receiving side uses the existing page's content and XBZRLE to
decode the new page's content.

This work was originally based on research results published in
VEE 2011: Evaluation of Delta Compression Techniques for Efficient Live
Migration of Large Virtual Machines by Benoit, Svard, Tordsson and Elmroth.
Additionally, the XBRLE delta encoder described there was further improved,
resulting in XBZRLE.

XBZRLE has a sustained bandwidth of 2-2.5 GB/s for typical workloads, making
it ideal for in-line, real-time encoding such as is needed for live migration.
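
Example
=======

old buffer:
1001 zeros
05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 68 00 00 6b 00 6d
3074 zeros

new buffer:
1001 zeros
01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 68 00 00 67 00 69
3074 zeros

encoded buffer:
encoded length 24
e9 07 0f 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 03 01 67 01 01 69

Each buffer is one 4096-byte page (1001 + 21 + 3074 bytes). Reading the
encoded buffer: e9 07 is the ULEB128 encoding of 1001, a zero run over the
unchanged leading bytes; 0f is a non-zero run of 15 bytes followed by the 15
new bytes 01 ... 0f; 03 is a zero run of 3 bytes (68 00 00 is unchanged);
01 67 is a non-zero run of 1 byte; 01 is a zero run of 1 byte; and 01 69 is a
final non-zero run of 1 byte. The trailing 3074 unchanged bytes are not
encoded at all.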
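The receiving side can be sketched in C by applying the encoded buffer to the old page in place. This is a minimal illustration with a hypothetical function name, `xbzrle_decode_sketch`. It assumes the page already holds the old contents and rejects input that would run past the page or that ends in the middle of a zrun/nzrun pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read one ULEB128 value (at most 5 bytes). Returns bytes consumed, 0 on error. */
static size_t get_uleb128(const uint8_t *in, size_t len, uint32_t *value)
{
    uint32_t v = 0;
    for (size_t i = 0; i < len && i < 5; i++) {
        v |= (uint32_t)(in[i] & 0x7f) << (7 * i);
        if ((in[i] & 0x80) == 0) {
            *value = v;
            return i + 1;
        }
    }
    return 0;   /* truncated, or longer than 5 bytes */
}

/* Apply an encoded delta to 'page', which holds the old contents.
 * Bytes not covered by any non-zero run keep their old values.
 * Returns 0 on success, -1 if the input is malformed or overruns the page. */
int xbzrle_decode_sketch(const uint8_t *src, size_t slen,
                         uint8_t *page, size_t page_len)
{
    size_t s = 0, d = 0, n;
    uint32_t count;

    assert(src != NULL && page != NULL);
    while (s < slen) {
        /* zrun: leave 'count' bytes of the old page untouched */
        n = get_uleb128(src + s, slen - s, &count);
        if (n == 0 || count > page_len - d) {
            return -1;
        }
        s += n;
        d += count;

        /* nzrun: overwrite 'count' bytes with the new data that follows */
        n = get_uleb128(src + s, slen - s, &count);
        if (n == 0) {
            return -1;
        }
        s += n;
        if (count > page_len - d || count > slen - s) {
            return -1;
        }
        memcpy(page + d, src + s, count);
        s += count;
        d += count;
    }
    return 0;
}
```

Because this decoder simply skips zero runs and copies non-zero runs, it also accepts a longer but equally valid encoding of the same example, for instance one whose first non-zero run covers 16 bytes (ending with the unchanged 68) followed by a zero run of 2: e9 07 10 01 ... 0f 68 02 01 67 01 01 69 (25 bytes).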

xbzrle: optimize XBZRLE to decrease the cache misses
Avoid replacing hot pages with other pages, which remarkably decreases cache
misses.
Sample results with the test program quoted from xbzrle.txt, run in the VM
(migrate bandwidth: 1GE, xbzrle cache size: 8MB).
The test program:
#include <stdlib.h>
#include <stdio.h>
int main()
{
char *buf = (char *) calloc(4096, 4096);
while (1) {
int i;
for (i = 0; i < 4096 * 4; i++) {
buf[i * 4096 / 4]++;
}
printf(".");
}
}
before this patch:
virsh qemu-monitor-command test_vm '{"execute": "query-migrate"}'
{"return":{"expected-downtime":1020,"xbzrle-cache":{"bytes":1108284,
"cache-size":8388608,"cache-miss-rate":0.987013,"pages":18297,"overflow":8,
"cache-miss":1228737},"status":"active","setup-time":10,"total-time":52398,
"ram":{"total":12466991104,"remaining":1695744,"mbps":935.559472,
"transferred":5780760580,"dirty-sync-counter":271,"duplicate":2878530,
"dirty-pages-rate":29130,"skipped":0,"normal-bytes":5748592640,
"normal":1403465}},"id":"libvirt-706"}
18k pages sent compressed in 52 seconds.
The cache-miss-rate is 98.7%; almost every lookup misses.
after optimizing:
virsh qemu-monitor-command test_vm '{"execute": "query-migrate"}'
{"return":{"expected-downtime":2054,"xbzrle-cache":{"bytes":5066763,
"cache-size":8388608,"cache-miss-rate":0.485924,"pages":194823,"overflow":0,
"cache-miss":210653},"status":"active","setup-time":11,"total-time":18729,
"ram":{"total":12466991104,"remaining":3895296,"mbps":937.663549,
"transferred":1615042219,"dirty-sync-counter":98,"duplicate":2869840,
"dirty-pages-rate":58781,"skipped":0,"normal-bytes":1588404224,
"normal":387794}},"id":"libvirt-266"}
194k pages sent compressed in 18 seconds.
The cache-miss-rate decreased to 48.59%.
Signed-off-by: ChenLiang <chenliang88@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2014-11-24 14:55:47 +03:00

Cache update strategy
=====================

Keeping hot pages in the cache is effective for decreasing cache misses.
XBZRLE uses a counter as the age of each page. The counter is incremented
after each RAM dirty bitmap sync. When a cache conflict is detected, XBZRLE
only evicts pages in the cache that are older than a threshold.
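A minimal C sketch of this policy for a single cache slot, assuming each slot records the sync counter value at which it was last written. `Slot`, `slot_try_insert` and the value of `AGE_THRESHOLD` are hypothetical choices for illustration, not QEMU's actual names or threshold.

```c
#include <assert.h>
#include <stdint.h>

#define AGE_THRESHOLD 2   /* hypothetical: syncs a page must go unwritten before eviction */

typedef struct {
    uint64_t addr;   /* guest address of the cached page */
    uint64_t age;    /* sync counter value when this slot was last written */
    int valid;       /* slot holds a page */
} Slot;

/* Try to store the page at 'addr' in 'slot' during sync round 'sync_count'.
 * Returns 1 if the slot now holds 'addr', 0 if a different, recently
 * written page occupies it and is kept instead. */
int slot_try_insert(Slot *slot, uint64_t addr, uint64_t sync_count)
{
    assert(!slot->valid || slot->age <= sync_count);
    if (slot->valid && slot->addr != addr &&
        sync_count - slot->age < AGE_THRESHOLD) {
        return 0;                /* conflict with a hot page: do not evict */
    }
    slot->addr = addr;           /* empty slot, same page, or old enough */
    slot->age = sync_count;      /* (re)start this page's age */
    slot->valid = 1;
    return 1;
}
```

Refreshing the age whenever the same page is written again is what keeps frequently dirtied pages resident, while pages that have not been touched for `AGE_THRESHOLD` syncs become eligible for eviction.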

Usage
======================

1. Verify that the destination QEMU version is able to decode the new format:

    {qemu} info migrate_capabilities
    xbzrle: off , ...

2. Activate xbzrle on both source and destination:

    {qemu} migrate_set_capability xbzrle on

3. Set the XBZRLE cache size (on the source only). The cache size is in
   MBytes and should be a power of 2. The default value is 64 MBytes.

    {qemu} migrate_set_cache_size 256m

4. Start the outgoing migration:

    {qemu} migrate -d tcp:destination.host:4444
    {qemu} info migrate
    capabilities: xbzrle: on
    Migration status: active
    transferred ram: A kbytes
    remaining ram: B kbytes
    total ram: C kbytes
    total time: D milliseconds
    duplicate: E pages
    normal: F pages
    normal bytes: G kbytes
    cache size: H bytes
    xbzrle transferred: I kbytes
    xbzrle pages: J pages
    xbzrle cache miss: K
    xbzrle overflow : L

xbzrle cache miss: the number of cache misses to date; a high cache-miss rate
indicates that the cache size is set too low.

xbzrle overflow: the number of overflows in the encoding, that is, pages whose
delta could not be compressed. This can happen if the changes in a page are
too large or if there are many short changes; for example, changing every
second byte (half a page).
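The every-second-byte case can be checked with a little arithmetic. Each isolated changed byte costs a zrun length, an nzrun length and the data byte, about 3 encoded bytes. Changing 2048 alternating bytes of a 4096-byte page therefore needs about 2048 * 3 = 6144 bytes, more than the page itself. The C sketch below only counts the encoded size, byte by byte. `xbzrle_encoded_size` is a hypothetical helper, and the 4096-byte page is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes needed to ULEB128-encode 'v'. */
static size_t uleb128_size(size_t v)
{
    size_t n = 1;
    while (v >>= 7) {
        n++;
    }
    return n;
}

/* Size in bytes of the zrun/nzrun encoding of old_page -> new_page,
 * computed without writing it (the trailing zero run is not encoded). */
size_t xbzrle_encoded_size(const uint8_t *old_page, const uint8_t *new_page,
                           size_t len)
{
    size_t i = 0, size = 0;

    while (i < len) {
        size_t start = i;
        while (i < len && old_page[i] == new_page[i]) {   /* zrun */
            i++;
        }
        if (i == len) {
            break;
        }
        size += uleb128_size(i - start);
        start = i;
        while (i < len && old_page[i] != new_page[i]) {   /* nzrun */
            i++;
        }
        size += uleb128_size(i - start) + (i - start);    /* length + data */
    }
    return size;
}
```

By contrast, a single changed byte in an otherwise unchanged page costs only 3 encoded bytes, which is why sparse update patterns compress so well.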

Testing: In testing, live migration with XBZRLE completed in 110 seconds,
whereas without XBZRLE the migration could not complete.

A simple synthetic memory r/w load generator:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        /* 16 MiB buffer (4096 pages of 4096 bytes), zero-initialised */
        char *buf = calloc(4096, 4096);
        if (buf == NULL) {
            return 1;
        }
        while (1) {
            int i;
            /* Increment one byte every 1 KiB, so each 4 KiB page is dirtied
             * on every pass while only 4 of its bytes change. */
            for (i = 0; i < 4096 * 4; i++) {
                buf[i * 4096 / 4]++;
            }
            printf(".");
            fflush(stdout);   /* show progress despite stdout buffering */
        }
    }