Commit Graph

44810 Commits

Author SHA1 Message Date
Markus Armbruster
a3feb08639 ivshmem: Rely on server sending the ID right after the version
The protocol specification (ivshmem-spec.txt, formerly
ivshmem_device_spec.txt) has always required the ID message to be sent
right at the beginning, and ivshmem-server has always complied.  The
device, however, accepts it out of order.  If an interrupt setup
arrived before it, though, it would be misinterpreted as connect
notification.  Fix the latent bug by relying on the spec and
ivshmem-server's actual behavior.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-28-git-send-email-armbru@redhat.com>
2016-03-21 21:29:01 +01:00
Markus Armbruster
1309cf448a ivshmem: Propagate errors through ivshmem_recv_setup()
This kills off the funny state described in the previous commit.

Simplify ivshmem_io_read() accordingly, and update documentation.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1458066895-20632-27-git-send-email-armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2016-03-21 21:29:01 +01:00
Markus Armbruster
3a55fc0f24 ivshmem: Receive shared memory synchronously in realize()
When configured for interrupts (property "chardev" given), we receive
the shared memory from an ivshmem server.  We do so asynchronously
after realize() completes, by setting up callbacks with
qemu_chr_add_handlers().

Keeping server I/O out of realize() that way avoids delays due to a
slow server.  This is probably relevant only for hot plug.

However, this funny "no shared memory, yet" state of the device also
causes a raft of issues that are hard or impossible to work around:

* The guest is exposed to this state: when we enter and leave it its
  shared memory contents is apruptly replaced, and device register
  IVPosition changes.

  This is a known issue.  We document that guests should not access
  the shared memory after device initialization until the IVPosition
  register becomes non-negative.

  For cold plug, the funny state is unlikely to be visible in
  practice, because we normally receive the shared memory long before
  the guest gets around to mess with the device.

  For hot plug, the timing is tighter, but the relative slowness of
  PCI device configuration has a good chance to hide the funny state.

  In either case, guests complying with the documented procedure are
  safe.

* Migration becomes racy.

  If migration completes before the shared memory setup completes on
  the source, shared memory contents is silently lost.  Fortunately,
  migration is rather unlikely to win this race.

  If the shared memory's ramblock arrives at the destination before
  shared memory setup completes, migration fails.

  There is no known way for a management application to wait for
  shared memory setup to complete.

  All you can do is retry failed migration.  You can improve your
  chances by leaving more time between running the destination QEMU
  and the migrate command.

  To mitigate silent memory loss, you need to ensure the server
  initializes shared memory exactly the same on source and
  destination.

  These issues are entirely undocumented so far.

I'd expect the server to be almost always fast enough to hide these
issues.  But then rare catastrophic races are in a way the worst kind.

This is way more trouble than I'm willing to take from any device.
Kill the funny state by receiving shared memory synchronously in
realize().  If your hot plug hangs, go kill your ivshmem server.

For easier review, this commit only makes the receive synchronous, it
doesn't add the necessary error propagation.  Without that, the funny
state persists.  The next commit will do that, and kill it off for
real.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-26-git-send-email-armbru@redhat.com>
2016-03-21 21:29:01 +01:00
Markus Armbruster
9db51b4d64 ivshmem: Plug leaks on unplug, fix peer disconnect
close_peer_eventfds() cleans up three things: ioeventfd triggers if
they exist, eventfds, and the array to store them.

Commit 98609cd (v1.2.0) fixed it not to clean up ioeventfd triggers
when they don't exist (property ioeventfd=off, which is the default).
Unfortunately, the fix also made it skip cleanup of the eventfds and
the array then.  This is a memory and file descriptor leak on unplug.

Additionally, the reset of nb_eventfds is skipped.  Doesn't matter on
unplug.  On peer disconnect, however, this permanently wedges the
interrupt vectors used for that peer's ID.  The eventfds stay behind,
but aren't connected to a peer anymore.  When the ID gets recycled for
a new peer, the new peer's eventfds get assigned to vectors after the
old ones.  Commonly, the device's number of vectors matches the
server's, so the new ones get dropped with a "Too many eventfd
received" message.  Interrupts either don't work (common case) or go
to the wrong vector.

Fix by narrowing the conditional to just the ioeventfd trigger
cleanup.

While there, move the "invalid" peer check to the only caller where it
can actually happen, and tighten it to reject own ID.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-25-git-send-email-armbru@redhat.com>
2016-03-21 21:29:01 +01:00
Markus Armbruster
ca0b7566cc ivshmem: Disentangle ivshmem_read()
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-24-git-send-email-armbru@redhat.com>
2016-03-21 21:29:01 +01:00
Markus Armbruster
cd9953f720 ivshmem: Simplify rejection of invalid peer ID from server
ivshmem_read() processes server messages.  These are 64 bit signed
integers.  -1 is shared memory setup, 16 bit unsigned is a peer ID,
anything else is invalid.

ivshmem_read() rejects invalid negative messages right away, silently.

Invalid positive messages get rejected only in resize_peers(), and
ivshmem_read() then prints the rather cryptic message "failed to
resize peers array".

Extend the first check to cover all invalid messages, make it report
"server sent invalid message", and drop the second check.

Now resize_peers() can't fail anymore; simplify.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-23-git-send-email-armbru@redhat.com>
2016-03-21 21:29:01 +01:00
Markus Armbruster
3c27969b3e ivshmem: Assert interrupts are set up once
An interrupt is set up when the interrupt's file descriptor is
received.  Each message applies to the next interrupt vector.
Therefore, each vector cannot be set up more than once.

ivshmem_add_kvm_msi_virq() half-heartedly tries not to rely on this by
doing nothing then, but that's not going to recover from this error
should it become possible in the future.  watch_vector_notifier()
doesn't even try.

Simply assert what is the case, so we get alerted if we ever screw it
up.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-22-git-send-email-armbru@redhat.com>
2016-03-21 21:29:01 +01:00
Markus Armbruster
2d1d422d11 ivshmem: Leave INTx alone when using MSI-X
The ivshmem device can either use MSI-X or legacy INTx for interrupts.

With MSI-X enabled, peer interrupt events trigger an MSI as they
should.  But software can still raise INTx via interrupt status and
mask register in BAR 0.  This is explicitly prohibited by PCI Local
Bus Specification Revision 3.0, section 6.8.3.3:

    While enabled for MSI or MSI-X operation, a function is prohibited
    from using its INTx# pin (if implemented) to request service (MSI,
    MSI-X, and INTx# are mutually exclusive).

Fix the device model to leave INTx alone when using MSI-X.

Document that we claim to use INTx in config space even when we don't.
Unlike other devices, ivshmem does *not* use INTx when configured for
MSI-X and MSI-X isn't enabled by software.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1458066895-20632-21-git-send-email-armbru@redhat.com>
2016-03-21 21:29:01 +01:00
Markus Armbruster
082751e82b ivshmem: Clean up MSI-X conditions
There are three predicates related to MSI-X:

* ivshmem_has_feature(s, IVSHMEM_MSI) is true unless the non-MSI-X
  variant of the device is selected with msi=off.

* msix_present() is true when the device has the PCI capability MSI-X.
  It's initially false, and becomes true during successful realize of
  the MSI-X variant of the device.  Thus, it's the same as
  ivshmem_has_feature(s, IVSHMEM_MSI) for realized devices.

* msix_enabled() is true when msix_present() is true and guest software
  has enabled MSI-X.

Code that differs between the non-MSI-X and the MSI-X variant of the
device needs to be guarded by ivshmem_has_feature(s, IVSHMEM_MSI) or
by msix_present(), except the latter works only for realized devices.

Code that depends on whether MSI-X is in use needs to be guarded with
msix_enabled().

Code review led me to two minor messes:

* ivshmem_vector_notify() calls msix_notify() even when
  !msix_enabled(), unlike most other MSI-X-capable devices.  As far as
  I can tell, msix_notify() does nothing when !msix_enabled().  Add
  the guard anyway.

* Most callers of ivshmem_use_msix() guard it with
  ivshmem_has_feature(s, IVSHMEM_MSI).  Not necessary, because
  ivshmem_use_msix() does nothing when !msix_present().  That's
  ivshmem's only use of msix_present(), though.  Guard it
  consistently, and drop the now redundant msix_present() check.
  While there, rename ivshmem_use_msix() to ivshmem_msix_vector_use().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1458066895-20632-20-git-send-email-armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2016-03-21 21:29:00 +01:00
Markus Armbruster
434ad76db5 ivshmem: Clean up register callbacks
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-19-git-send-email-armbru@redhat.com>
2016-03-21 21:29:00 +01:00
Markus Armbruster
d855e27565 ivshmem: Failed realize() can leave migration blocker behind
If pci_ivshmem_realize() fails after it created its migration blocker,
the blocker is left in place.  Fix that by creating it last.

Likewise, if it fails after it called fifo8_create(), it leaks fifo
memory.  Fix that the same way.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-18-git-send-email-armbru@redhat.com>
2016-03-21 21:29:00 +01:00
Markus Armbruster
9cf70c5225 ivshmem: Fix harmless misuse of Error
We reuse errp after passing it host_memory_backend_get_memory().  If
both host_memory_backend_get_memory() and the reuse set an error, the
reuse will fail the assertion in error_setv().  Fortunately,
host_memory_backend_get_memory() can't fail.

Pass it &error_abort to make our assumption explicit, and to get the
assertion failure in the right place should it become invalid.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-17-git-send-email-armbru@redhat.com>
2016-03-21 21:29:00 +01:00
Markus Armbruster
71c265816d ivshmem: Don't destroy the chardev on version mismatch
Yes, the chardev is commonly useless after we read a bad version from
it, but destroying it is inappropriate anyway: the user created it, so
the user should be able to hold on to it as long as he likes.  We
don't destroy it on other errors.  Screwed up in commit 5105b1d.

Stop reading instead.

Also note QEMU's behavior in ivshmem-spec.txt.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-16-git-send-email-armbru@redhat.com>
2016-03-21 21:29:00 +01:00
Markus Armbruster
c20fc0c3ee ivshmem: Drop ivshmem_event() stub
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-15-git-send-email-armbru@redhat.com>
2016-03-21 21:29:00 +01:00
Markus Armbruster
e64befe929 ivshmem: Clean up after commit 9940c32
IVShmemState member eventfd_chr is useless since commit 9940c32.  Drop
it.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-14-git-send-email-armbru@redhat.com>
2016-03-21 21:29:00 +01:00
Markus Armbruster
a4fa93bf20 ivshmem: Compile debug prints unconditionally to prevent bit-rot
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-13-git-send-email-armbru@redhat.com>
2016-03-21 21:29:00 +01:00
Markus Armbruster
97553976dd ivshmem: Add missing newlines to debug printfs
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-12-git-send-email-armbru@redhat.com>
2016-03-21 21:29:00 +01:00
Markus Armbruster
fdee2025dd ivshmem: Rewrite specification document
This started as an attempt to update ivshmem_device_spec.txt for
clarity, accuracy and completeness while working on its code, and
quickly became a full rewrite.  Since the diff would be useless
anyway, I'm using the opportunity to rename the file to
ivshmem-spec.txt.

I tried hard to ensure the new text contradicts neither the old text
nor the code.  If the new text contradicts the old text but not the
code, it's probably a bug in the old text.  If the new text
contradicts both, its probably a bug in the new text.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-11-git-send-email-armbru@redhat.com>
2016-03-21 21:28:59 +01:00
Markus Armbruster
41b65e5eda ivshmem-test: Improve test cases /ivshmem/server-*
Document missing test: behavior with MSI-X present but not enabled.

For MSI-X, we test and clear the interrupt pending bit before testing
the interrupt.  For INTx, we only clear.  Change to test and clear for
consistency.

Test MSI-X vector 1 in addition to vector 0.

Improve comments.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-10-git-send-email-armbru@redhat.com>
2016-03-21 21:28:59 +01:00
Markus Armbruster
14c5d49ab3 ivshmem-test: Clean up wait for devices to become operational
test_ivshmem_server() waits until the first byte in BAR 2 contains the
0x42 we put into shared memory.  Works because the byte reads zero
until the device maps the shared memory gotten from the server.

Check the IVPosition register instead: it's initially -1, and becomes
non-negative right when the device maps the share memory, so no
change, just cleaner, because it's what guest software is supposed to
do.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-9-git-send-email-armbru@redhat.com>
2016-03-21 21:28:59 +01:00
Markus Armbruster
4958fe5d3c ivshmem-test: Improve test case /ivshmem/single
Test state of registers after reset.

Test reading Interrupt Status clears it.

Test (invalid) read of Doorbell.

Add more comments.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-8-git-send-email-armbru@redhat.com>
2016-03-21 21:28:59 +01:00
Markus Armbruster
998261726a tests/libqos/pci-pc: Fix qpci_pc_iomap() to map BARs aligned
qpci_pc_iomap() maps BARs one after the other, without padding.  This
is wrong.  PCI Local Bus Specification Revision 3.0, 6.2.5.1. Address
Maps: "all address spaces used are a power of two in size and are
naturally aligned".  That's because the size of a BAR is given by the
number of address bits the device decodes, and the BAR needs to be
mapped at a multiple of that size to ensure the address decoding
works.

Fix qpci_pc_iomap() accordingly.  This takes care of a FIXME in
ivshmem-test.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-7-git-send-email-armbru@redhat.com>
2016-03-21 21:28:59 +01:00
Markus Armbruster
330b58368c event_notifier: Make event_notifier_init_fd() #ifdef CONFIG_EVENTFD
Event notifiers are designed for eventfd(2).  They can fall back to
pipes, but according to Paolo, event_notifier_init_fd() really
requires the real thing, and should therefore be under #ifdef
CONFIG_EVENTFD.  Do that.

Its only user is ivshmem, which is currently CONFIG_POSIX.  Narrow it
to CONFIG_EVENTFD.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1458066895-20632-6-git-send-email-armbru@redhat.com>
2016-03-21 21:28:59 +01:00
Markus Armbruster
ad4929384b qemu-doc: Fix ivshmem huge page example
Option parameter "share" is missing.  Without it, you get a *private*
mmap(), which defeats ivshmem's purpose pretty thoroughly ;)

While there, switch to the conventional mountpoint of hugetlbfs
/dev/hugepages.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1458066895-20632-5-git-send-email-armbru@redhat.com>
2016-03-18 17:34:55 +01:00
Markus Armbruster
3625c739ea ivshmem-server: Don't overload POSIX shmem and file name
Option -m NAME is interpreted as directory name if we can statfs() it
and its on hugetlbfs.  Else it's interpreted as POSIX shared memory
object name.  This is nuts.

Always interpret -m as directory.  Create new -M for POSIX shared
memory.  Last of -m or -M wins.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1458066895-20632-4-git-send-email-armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2016-03-18 17:34:40 +01:00
Markus Armbruster
e3ad72965a ivshmem-server: Fix and clean up command line help
Burying error messages in ~20 lines of usage help is bad form.  Print
a single line pointing to -h instead.

Print -h help to stdout rather than stderr.  Fix default of -p.  Clean
up the help text a bit.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-3-git-send-email-armbru@redhat.com>
2016-03-18 17:34:40 +01:00
Markus Armbruster
3be5cc2324 target-ppc: Document TOCTTOU in hugepage support
The code to find the minimum page size is is vulnerable to TOCTTOU.
Added in commit 2d103aa "target-ppc: fix hugepage support when using
memory-backend-file" (v2.4.0).  Since I can't fix it myself right now,
add a FIXME comment.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1458066895-20632-2-git-send-email-armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2016-03-18 17:34:21 +01:00
Peter Maydell
6741d38ad0 Block layer patches
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJW6tIJAAoJEH8JsnLIjy/WsDUP/AhuS1/89YcQKinnC7oFbdRo
 PDe+z4E9O8+OODFdKfo8LVaUAREJvhykjkP7nL6SeUsPmSjwqAK2FDKV1ykId24R
 BB+JXeH57Kwmc+8rtJE9FuqhEL0CUc1sC0/leHltVKwnsI95rrZX6NRKzJnIw4aS
 zOAVFbMQQCAaN1pqFI0SV/y3c4uz0ycFkW+AxvskzSjrVsS0JX9Ezg5kiDAypOQe
 dSeg1kQ2hVNf8inpEnLCXK1rmODBfBtWB7ckoPIUC96ePdT6buGuCBQtAIBvqGAq
 TuNi7CxzKIPwxygSt9Et92pTn6WGaDooN6wOZCsudY8P5qx6C/pdx1qRp+Vx/HE7
 b5u7EPXGXyY5zroq7OMPg+R3JmUg16twCkyj+SuMLyJl8KwM0nbdkyIIgnEaTOoJ
 Z9emXUzusNGtUi5hgXnSCC5Bak/8ujsczdhfdrNHkVvIJeRMas4FNZ9euH3xN0XO
 72d8SXr11Hk4LloUR0gGwrIiULLQfePF5Y0udCRoeqadh3NEHT5GgUS1cNVwsCgI
 Q/SUDEESj20PnFcc2tZ7ojlCZUSJmoXOZgdn3wxdTy9zF8aMxtNKlMQfqD6mifsi
 aww+20m+qSu1IcUZAlFZB5pfWeAMTIT84iCPqmw4XGPr5727lgePFzVnnHrQX12Y
 BIUtbg8gUffHuGl/J39H
 =OfzG
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches

# gpg: Signature made Thu 17 Mar 2016 15:49:29 GMT using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"

* remotes/kevin/tags/for-upstream: (29 commits)
  iotests: Test QUORUM_REPORT_BAD in fifo mode
  quorum: Emit QUORUM_REPORT_BAD for reads in fifo mode
  block: Use blk_co_pwritev() in blk_co_write_zeroes()
  block: Use blk_aio_prwv() for aio_read/write/write_zeroes
  block: Use blk_prw() in blk_pread()/blk_pwrite()
  block: Use blk_co_pwritev() in blk_write_zeroes()
  block: Pull up blk_read_unthrottled() implementation
  block: Use blk_co_pwritev() for blk_write()
  block: Use blk_co_preadv() for blk_read()
  block: Use BdrvChild in BlockBackend
  block: Remove bdrv_states list
  block: Use bdrv_next() instead of bdrv_states
  block: Rewrite bdrv_next()
  block: Add blk_next_root_bs()
  block: Add bdrv_next_monitor_owned()
  block: Move some bdrv_*_all() functions to BB
  blockdev: Remove blk_hide_on_behalf_of_hmp_drive_del()
  blockdev: Split monitor reference from BB creation
  blockdev: Separate BB name management
  blockdev: Add list of all BlockBackends
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-17 15:59:42 +00:00
Kevin Wolf
361dca7a5a Two quorum patches for the block queue, v2.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJW6tDLAAoJEDuxQgLoOKytAisH/3nPs+SCVD9Vd936wx0o1+dG
 2sWwR2QirfbK98+/XGIuW77qXzsrKbaku+KqCvQ8/R6zTKENblEm+EJuCqcvif/f
 6SktylvNlyKPuKFqCY9mHbaC2tiFYUsmH50afrMegu0dO9ZM4DnRcgAqJKrGeHVr
 U3kbLjmTYnciv3YJ4YgWyKkY++IuZXT0ElS2lasOxPa8ntQhFSQgRWdjQE0RZyEC
 wB5gJQYtwOdc6++Y/cGQgnoY/Nz24ggAnQ3OaDJSH4GdWjNUKn02KaZhcBw+MgfU
 lnfTFpPJ57htWuP3Pbi69dh+qpdkU9U+wD6TXecrnLdAovUf8m+/M/x76LvU9ro=
 =nG9k
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'mreitz/tags/pull-block-for-kevin-2016-03-17-v2' into queue-block

Two quorum patches for the block queue, v2.

# gpg: Signature made Thu Mar 17 16:44:11 2016 CET using RSA key ID E838ACAD
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>"

* mreitz/tags/pull-block-for-kevin-2016-03-17-v2:
  iotests: Test QUORUM_REPORT_BAD in fifo mode
  quorum: Emit QUORUM_REPORT_BAD for reads in fifo mode

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 16:48:49 +01:00
Alberto Garcia
509565f36f iotests: Test QUORUM_REPORT_BAD in fifo mode
Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: c0a8dbfdbe939520cda5f661af6f1cd7b6b4df9d.1458034554.git.berto@igalia.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-03-17 16:43:30 +01:00
Alberto Garcia
6049490df4 quorum: Emit QUORUM_REPORT_BAD for reads in fifo mode
If there's an I/O error in one of Quorum children then QEMU
should emit QUORUM_REPORT_BAD. However this is not working with
read-pattern=fifo. This patch fixes this problem.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: d57e39e8d3e8564003a1e2aadbd29c97286eb2d2.1458034554.git.berto@igalia.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-03-17 16:43:30 +01:00
Kevin Wolf
8896e08814 block: Use blk_co_pwritev() in blk_co_write_zeroes()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 16:30:00 +01:00
Kevin Wolf
57d6a42883 block: Use blk_aio_prwv() for aio_read/write/write_zeroes
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 16:30:00 +01:00
Kevin Wolf
a55d3fba99 block: Use blk_prw() in blk_pread()/blk_pwrite()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:57 +01:00
Kevin Wolf
fc1453cdfc block: Use blk_co_pwritev() in blk_write_zeroes()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:57 +01:00
Kevin Wolf
5bd5119667 block: Pull up blk_read_unthrottled() implementation
Use blk_read(), so that it goes through blk_co_preadv() like all read
requests from the BB to the BDS.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:57 +01:00
Kevin Wolf
a8823a3bfd block: Use blk_co_pwritev() for blk_write()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:57 +01:00
Kevin Wolf
1bf1cbc91f block: Use blk_co_preadv() for blk_read()
This patch introduces blk_co_preadv() as a central function on the
BlockBackend level that is supposed to handle all read requests from the
BB to its root BDS eventually.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:57 +01:00
Kevin Wolf
f21d96d04b block: Use BdrvChild in BlockBackend
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:57 +01:00
Max Reitz
9aaf28c61d block: Remove bdrv_states list
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:57 +01:00
Max Reitz
79720af640 block: Use bdrv_next() instead of bdrv_states
There is no point in manually iterating through the bdrv_states list
when there is bdrv_next().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:57 +01:00
Max Reitz
2626058034 block: Rewrite bdrv_next()
Instead of using the bdrv_states list, iterate over all the
BlockDriverStates attached to BlockBackends, and over all the
monitor-owned BDSs afterwards (except for those attached to a BB).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz
981f4f578e block: Add blk_next_root_bs()
This function iterates over all BDSs attached to a BB. We are going to
need it when rewriting bdrv_next() so it no longer uses bdrv_states.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz
262b4e8f74 block: Add bdrv_next_monitor_owned()
Add a function for iterating over all monitor-owned BlockDriverStates so
the generic block layer can do so.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz
fe1a9cbc33 block: Move some bdrv_*_all() functions to BB
Move bdrv_commit_all() and bdrv_flush_all() to the BlockBackend level.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz
7c735873d9 blockdev: Remove blk_hide_on_behalf_of_hmp_drive_del()
We can basically inline it in hmp_drive_del(); monitor_remove_blk() is
called already, so we just need to call bdrv_make_anon(), too.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz
efaa7c4eeb blockdev: Split monitor reference from BB creation
Before this patch, blk_new() automatically assigned a name to the new
BlockBackend and considered it referenced by the monitor. This patch
removes the implicit monitor_add_blk() call from blk_new() (and
consequently the monitor_remove_blk() call from blk_delete(), too) and
thus blk_new() (and related functions) no longer take a BB name
argument.

In fact, there is only a single point where blk_new()/blk_new_open() is
called and the new BB is monitor-owned, and that is in blockdev_init().
Besides thus relieving us from having to invent names for all of the BBs
we use in qemu-img, this fixes a bug where qemu cannot create a new
image if there already is a monitor-owned BB named "image".

If a BB and its BDS tree are created in a single operation, as of this
patch the BDS tree will be created before the BB is given a name
(whereas it was the other way around before). This results in minor
change to the output of iotest 087, whose reference output is amended
accordingly.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz
e5e785500b blockdev: Separate BB name management
Introduce separate functions (monitor_add_blk() and
monitor_remove_blk()) which set or unset a BB name. Since the name is
equivalent to the monitor's reference to a BB, adding a name the same as
declaring the BB to be monitor-owned and removing it revokes this
status, hence the function names.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz
2cf22d6a1a blockdev: Add list of all BlockBackends
While monitor_block_backends contains nearly all BBs, we sometimes
really need all BBs. To this end, this patch adds the block_backend
list.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz
9492b0b928 blockdev: Rename blk_backends
The blk_backends list does not contain all BlockBackends but only the
ones which are referenced by the monitor, and that is not necessarily
true for every BlockBackend. Rename the list to monitor_block_backends
to make that fact clear.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00