In a real AHCI device, several S/ATA registers are mirrored or shadowed
within the AHCI register set. These registers are not updated
synchronously for each read access, but are instead updated after a
Device-to-Host Register FIS packet is received. The D2H FIS contains
the values from these registers on the device.
In QEMU, by reaching directly into the device to grab these bits before
they are "sent," we may introduce race conditions where unexpected
values are present "before they are sent" which could cause issues for
some guests, particularly if an attempt is made to read the PxTFD
register prior to enabling the port, where incorrect values will be read.
This patch also addresses the boot-time values for the PxTFD and PxSIG
registers to bring them in line with the AHCI 1.3 specification.
Lastly, several fields (PxTFD, PxSIG and PxSACT) are read-only,
and any attempts to write to them should be ignored.
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1408643079-30675-6-git-send-email-jsnow@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This adds a test wherein we engage the PCI AHCI
device and ensure that the memory region for the
HBA functionality is now accessible.
Under Q35 environments, additional PCI configuration
is performed to ensure that the HBA functionality
will become usable.
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1408643079-30675-5-git-send-email-jsnow@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Adds a specification adherence test for AHCI
where the boot-up values for the PCI configuration space
are compared against the AHCI 1.3 specification.
This test does not itself attempt to engage the device.
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1408643079-30675-4-git-send-email-jsnow@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
In the Intel ICH9 data sheet, the MSI capability offset
in the PCI configuration space for ICH9 AHCI devices is
specified to be 0x80.
Further, the PCI capability pointer should always point
to 0x80 in ICH9 devices, despite the fact that AHCI 1.3
specifies that it should be pointing to PMCAP (Which in
this instance would be 0x70) to maintain adherence to
the Intel data sheet specifications and real observed behavior.
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1408643079-30675-3-git-send-email-jsnow@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Currently, there is no qtest to test the functionality of
the AHCI functionality present within the Q35 machine type.
This patch adds a skeleton for an AHCI test suite,
and adds a simple sanity-check test case where we
identify that the AHCI device is present, then
disengage the virtual machine.
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1408643079-30675-2-git-send-email-jsnow@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Refcount structures are placed in clusters randomly selected from all
unallocated host clusters.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Maria Kustova <maria.k@catit.be>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 7e2f38608db6fba2da53997390b19400d445c45d.1408450493.git.maria.k@catit.be
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
qcow2 supports more than four options by now, add the new options
(overlap check mode and metadata cache size)
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1408557576-14574-5-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Being able to set the overlap-check option to a string and then refine
it via the overlap-check.* options is a nice idea for the command line
but does not work so well for non-flattened dicts. In that case, one can
only specify either but not both, so add a field to overlap-check.*
which does the same as directly specifying overlap-check but can be used
in conjunction with the other fields in non-flattened dicts.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1408557576-14574-4-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1408557576-14574-3-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Currently, the QemuOpts object opts is leaked if anything fails from its
creation up to and including the image repair block. Fix this by freeing
that object in the fail path.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Message-id: 1408557576-14574-2-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Add tests for unaligned L1/L2/reftable entries and non-fatal corruption
reports.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1409926039-29044-6-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Offsets taken from the L1, L2 and refcount tables are generally assumed
to be correctly aligned. However, this cannot be guaranteed if the image
has been written to by something different than qemu, thus check all
offsets taken from these tables for correct cluster alignment.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1409926039-29044-5-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Use the new function in case of a failed overlap check.
This changes output in case of corruption, so adapt iotest 060's
reference output accordingly.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Message-id: 1409926039-29044-4-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Add a helper function for easily marking an image corrupt (on fatal
corruptions) while outputting an informative message to stderr and via
QAPI.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Message-id: 1409926039-29044-3-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Not every BLOCK_IMAGE_CORRUPTED event must be fatal; for example, when
reading from an image, they should generally not be. Nonetheless, even
an image only read from may of course be corrupted and this can be
detected during normal operation. In this case, a non-fatal event should
be emitted, but the image should not be marked corrupt (in accordance to
"fatal" set to false).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1409926039-29044-2-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This is an analogue to Linux null_blk. It can be used for testing or
benchmarking block device emulation and general block layer
functionalities such as coroutines and throttling, where disk IO is not
necessary or wanted.
Use null-aio:// for AIO version, and null-co:// for coroutine version.
[Resolved conflict with Fam's async bdrv_aio_cancel() series:
1. Drop .bdrv_aio_cancel() since it is now done by block.c
2. Rename qemu_aio_release() to qemu_aio_unref()
--Stefan]
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1410415798-20673-2-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
If ret is WAIT_TIMEOUT and there was an event returned by select(),
we can write to a location after the end of the array. But in
that case we can retry the WaitForMultipleObjects call with the
same set of events, so just move the event[ret - WAIT_OBJECT_0]
assignment inside the existin conditional.
Reported-by: TeLeMan <geleman@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: TeLeMan <geleman@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Normally, qmp_device_list_properties() may return NULL when
a device haven't special properties excpet Object and DeviceState
properties, such as virtio-balloon-device.
We just need check local_err instead of prop_list.
Example:
Segmentation fault (core dumped)
The backtrace as below:
Program received signal SIGSEGV, Segmentation fault.
0x00005555559af1a8 in error_get_pretty (err=0x0) at util/error.c:152
152 return err->msg;
(gdb) bt
func=0x55555574a6ca <device_help_func>, opaque=0x0, abort_on_failure=0) at util/qemu-option.c:1072
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Now that all the implementations are converted to asynchronous version
and we can emulate synchronous cancellation with it. Let's drop the
unused member.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
We know that either bh is scheduled or ide_issue_trim_cb will be called
again, so we just set i, j and ret to the right values. In both cases,
ide_trim_bh_cb will be called.
Also forward the cancellation to the iocb->aiocb which we get from
bdrv_aio_discard.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Also drop the now unused SheepdogAIOCB.finished field. Note that this
aio is internal to sheepdog driver and has NULL cb and opaque, and
should be unused at all.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Before, we cancel all the child requests with bdrv_aio_cancel, then free
the acb..
Now we just kick off asynchronous cancellation of child requests and
return, we know quorum_aio_cb will be called later, so in the end
quorum_aio_finalize will take care of calling the caller's cb.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
For a fifo read pattern, we only have one running aio (possible other cases that
has less number than num_children in the future), so we need to check if
.acb is NULL against bdrv_aio_cancel() to avoid segfault.
Cc: Eric Blake <eblake@redhat.com>
Cc: Benoit Canet <benoit@irqsave.net>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Liu Yuan <namei.unix@gmail.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The cancelled flag is no longer useful. Later the request will complete
as before, and cb will be called.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Just call io_cancel (2), if it fails, it means the request is not
canceled, so the event loop will eventually call
qemu_laio_process_completion.
In qemu_laio_process_completion, change to call the cb unconditionally.
It is required by bdrv_aio_cancel_async.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The .cancel_async shares the same the first half with .cancel: try to
steal the request if not submitted yet. In this case set the elem to
THREAD_DONE status and ret to -ECANCELED, which means
thread_pool_completion_bh will call the cb with -ECANCELED.
If the request is already submitted, do nothing, as we know the normal
completion will happen in the future.
Testing code update:
Before, done_cb is only called if the request is already submitted by
thread pool. Now done_cb is always called, even before it is submitted,
because we emulate bdrv_aio_cancel with bdrv_aio_cancel_async. So also
update the test criteria accordingly.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This is the async version of bdrv_aio_cancel, which doesn't block the
caller. It guarantees that the cb is called either before returning or
some time later.
bdrv_aio_cancel can base on bdrv_aio_cancel_async, later we can convert
all .io_cancel implementations to .io_cancel_async, and the aio_poll is
the common logic. In the end, .io_cancel can be dropped.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This will be useful in synchronous cancel emulation with
bdrv_aio_cancel_async.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Before, bdrv_aio_cancel will either complete the request (like normal)
and call CB with an actual return code, or skip calling the request (for
example when the IO req is not submitted by thread pool yet).
We will change bdrv_aio_cancel to do it differently: always call CB
before return, with either [1] a normal req completion ret code, or [2]
ret == -ECANCELED. So the callers' callback must accept both cases. The
existing logic works with case [1], but not [2].
The simplest transition of callback code is do nothing in case [2], just
as if the CB is not called by the bdrv_aio_cancel() call.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Always initialize it with the return value of aio_prepare.
Reported-by: TeLeMan <geleman@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
When the command completion code in IDE and AHCI
was unified to put all command completion inside
of a callback, "cmd_done," we neglected to
ensure that all AHCI/ATAPI command paths would
eventually register as finished. for the PCI
interface to IDE this is not a problem because
cmd_done is a nop, but the AHCI implementation
needs to send a D2H_REG_FIS and interrupt back
to the guest to inform of completion.
This patch adds calls to ide_stop_transfer,
which calls ide_cmd_done, inside of
ide_atapi_cmd_ok and ide_atapi_cmd_error.
This fixes regressions observed by trying to boot QEMU
with a Fedora 20 live CD under Q35/AHCI, which uses
ATAPI command 0x00, which is a status check that may
cause a hang because we never complete, and ATAPI
command 0x56, which is unsupported by our current
implementation and results in an error that we never
report back to the guest.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The parent_vhdx_guid variable is defined but never used, which provokes
complaints from newer versions of clang. Since the variable definition
is here acting as documentation of the image format, mark it with the
'unused' attribute to keep the compiler happy rather than simply
deleting it.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Every QEMU_ARCH is now in (1 << n) notation, instead of a mixture of decimal and hexadecimal.
Signed-off-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>