qemu/block
Liu Yuan a9db86b223 block/quorum: add simple read pattern support
This patch adds a single read pattern to the quorum driver. Quorum vote remains
the default pattern.

Currently we do a quorum vote on every read. This is designed for unreliable
underlying storage such as non-redundant NFS, ensuring data integrity at the
cost of read performance.

Consider a use case like the following:

        VM
  --------------
  |            |
  v            v
  A            B

Both A and B have hardware RAID storage that ensures data integrity on its own,
so it would help performance to read from a single node instead of from all of
them. Further, if we run the VM on one of the storage nodes, we can issue a
local read request for better performance.

This patch generalizes the above 2-node case to N nodes. That is,

vm -> write to all N nodes, read from just one of them. If a single read fails,
we try the next node in the FIFO order specified by the startup command.
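
The fallback described above can be sketched roughly as follows. This is an
illustrative sketch only, not the code in quorum.c: fifo_read(), child_read_fn,
struct demo_child and demo_read() are hypothetical names invented for this
example, and the real driver issues asynchronous requests rather than
synchronous calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: returns 0 on success, a negative value on failure. */
typedef int (*child_read_fn)(void *child, void *buf, size_t len);

/*
 * Try each child in the order given (FIFO) and stop at the first success.
 * Returns the index of the child that served the read, or the last
 * negative error code if every child failed (-1 if there are no children).
 */
static int fifo_read(void **children, size_t n, child_read_fn read_child,
                     void *buf, size_t len)
{
    int ret = -1;

    assert(read_child != NULL);

    for (size_t i = 0; i < n; i++) {
        ret = read_child(children[i], buf, len);
        if (ret == 0) {
            return (int)i;      /* first child that answers wins */
        }
    }
    return ret;                 /* all children failed */
}

/* Hypothetical stand-in for a storage node, used only for demonstration. */
struct demo_child {
    int fail;                   /* non-zero: simulate a read error */
    unsigned char fill;         /* byte pattern this node returns */
};

static int demo_read(void *child, void *buf, size_t len)
{
    struct demo_child *c = child;

    if (c->fail) {
        return -5;              /* simulated I/O error */
    }
    memset(buf, c->fill, len);
    return 0;
}
```

With two children where the first fails, fifo_read() falls through to the
second and reports index 1. Writes are not shown, since they still go to all
N nodes.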

The 2-node case is very similar to DRBD[1], though for now it lacks auto-sync
functionality after a single device/node failure. Compared with DRBD, however,
we still have some advantages:

- Suppose we have 20 VMs running on one (say A) of 2 nodes' DRBD-backed
storage. If A crashes, we need to restart all the VMs on node B. In practice we
often can't, because B might not have enough resources to set up 20 VMs at
once. If we instead run our 20 VMs with the quorum driver and scatter the
replicated images over the data center, we can very likely restart all 20 VMs
without any resource problem.

Overall, I think we can build more powerful replicated image functionality
on quorum and block jobs (block mirror) to meet various High Availability needs.

E.g., to enable the single read pattern on 2 children:

-drive driver=quorum,children.0.file.filename=0.qcow2,\
children.1.file.filename=1.qcow2,read-pattern=fifo,vote-threshold=1

[1] http://en.wikipedia.org/wiki/Distributed_Replicated_Block_Device

[Dropped \n from an error_setg() error message
--Stefan]

Cc: Benoit Canet <benoit@irqsave.net>
Cc: Eric Blake <eblake@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Liu Yuan <namei.unix@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 10:46:58 +01:00
..
archipelago.c block: Use g_new() & friends where that makes obvious sense 2014-08-20 11:51:28 +02:00
backup.c block/backup: Fix hang for unaligned image size 2014-07-09 15:50:11 +02:00
blkdebug.c blkdebug: Delete BH in bdrv_aio_cancel 2014-08-22 11:07:00 +02:00
blkverify.c blkverify: Implement bdrv_refresh_filename() 2014-08-20 14:31:56 +02:00
bochs.c block: Use g_new() & friends to avoid multiplying sizes 2014-08-20 11:51:28 +02:00
cloop.c cloop: Handle failure for potentially large allocations 2014-08-15 15:07:15 +02:00
commit.c block: extend block-commit to accept a string for the backing file 2014-07-01 10:47:01 +02:00
cow.c block/cow: Avoid use of uninitialized cow_bs in error path 2014-07-01 10:15:34 +02:00
curl.c block.curl: adding 'timeout' option 2014-08-29 10:46:57 +01:00
dmg.c dmg: Handle failure for potentially large allocations 2014-08-15 15:07:15 +02:00
gluster.c block: Use g_new() & friends where that makes obvious sense 2014-08-20 11:51:28 +02:00
iscsi.c block/iscsi: fix memory corruption on iscsi resize 2014-08-22 10:55:22 +02:00
linux-aio.c linux-aio: Fix laio resource leak 2014-07-15 15:34:13 +02:00
Makefile.objs block: Support Archipelago as a QEMU block backend 2014-08-15 15:07:14 +02:00
mirror.c mirror: fix uninitialized variable delay_ns warnings 2014-08-28 13:42:25 +01:00
nbd-client.c nbd: implement .bdrv_detach/attach_aio_context() 2014-06-04 09:56:11 +02:00
nbd-client.h nbd: implement .bdrv_detach/attach_aio_context() 2014-06-04 09:56:11 +02:00
nbd.c nbd: Implement bdrv_refresh_filename() 2014-08-20 14:31:56 +02:00
nfs.c block: Use g_new() & friends where that makes obvious sense 2014-08-20 11:51:28 +02:00
parallels.c block: Use g_new() & friends to avoid multiplying sizes 2014-08-20 11:51:28 +02:00
qapi.c qemu-img info: show nocow info 2014-08-15 15:07:14 +02:00
qcow2-cache.c qcow2: Use g_try_new0() for cache array 2014-08-20 11:51:28 +02:00
qcow2-cluster.c block: Use g_new() & friends where that makes obvious sense 2014-08-20 11:51:28 +02:00
qcow2-refcount.c block: Use g_new() & friends where that makes obvious sense 2014-08-20 11:51:28 +02:00
qcow2-snapshot.c block: Use g_new() & friends where that makes obvious sense 2014-08-20 11:51:28 +02:00
qcow2.c qcow2: Add runtime options for cache sizes 2014-08-20 11:51:28 +02:00
qcow2.h qcow2: Add runtime options for cache sizes 2014-08-20 11:51:28 +02:00
qcow.c block: Use g_new() & friends where that makes obvious sense 2014-08-20 11:51:28 +02:00
qed-check.c block: Use g_new() & friends to avoid multiplying sizes 2014-08-20 11:51:28 +02:00
qed-cluster.c Use glib memory allocation and free functions 2011-08-20 23:01:08 -05:00
qed-gencb.c Use glib memory allocation and free functions 2011-08-20 23:01:08 -05:00
qed-l2-cache.c qed: do not evict in-use L2 table cache entries 2012-03-12 15:14:06 +01:00
qed-table.c qed: use BlockDriverState's AioContext 2014-06-04 09:56:11 +02:00
qed.c qed: Handle failure for potentially large allocations 2014-08-15 15:07:15 +02:00
qed.h qed: Make qiov match request size until backing file EOF 2014-07-14 12:03:20 +02:00
quorum.c block/quorum: add simple read pattern support 2014-08-29 10:46:58 +01:00
raw_bsd.c block: Add Error argument to bdrv_refresh_limits() 2014-07-18 13:18:43 +01:00
raw-aio.h linux-aio: implement io plug, unplug and flush io queue 2014-07-07 11:05:17 +02:00
raw-posix.c raw-posix: fix O_DIRECT short reads 2014-08-22 11:00:56 +02:00
raw-win32.c cleanup QEMUOptionParameter 2014-06-16 17:23:21 +08:00
rbd.c block: Use g_new() & friends to avoid multiplying sizes 2014-08-20 11:51:28 +02:00
sheepdog.c sheepdog: improve error handling for a case of failed lock 2014-08-29 10:46:57 +01:00
snapshot.c Use error_is_set() only when necessary 2014-02-17 11:57:23 -05:00
ssh.c cleanup QEMUOptionParameter 2014-06-16 17:23:21 +08:00
stream.c block: Add Error argument to bdrv_refresh_limits() 2014-07-18 13:18:43 +01:00
vdi.c block: Use g_new() & friends where that makes obvious sense 2014-08-20 11:51:28 +02:00
vhdx-endian.c block: VHDX endian fixes 2014-08-15 15:07:14 +02:00
vhdx-log.c block: Drop some superfluous casts from void * 2014-08-20 11:51:28 +02:00
vhdx.c block: Use g_new() & friends where that makes obvious sense 2014-08-20 11:51:28 +02:00
vhdx.h block: VHDX endian fixes 2014-08-15 15:07:14 +02:00
vmdk.c vmdk: Use bdrv_nb_sectors() where sectors, not bytes are wanted 2014-08-22 11:10:12 +02:00
vpc.c vpc: Handle failure for potentially large allocations 2014-08-15 15:07:16 +02:00
vvfat.c block/vvfat.c: remove debugging code to reinit stderr if NULL 2014-08-21 10:36:29 +02:00
win32-aio.c raw-win32: Handle failure for potentially large allocations 2014-08-15 15:07:16 +02:00