Commit 983a1600 changed the semantics of blk_write_zeroes() to
be byte-based rather than sector-based, but did not change the
name, which is an open invitation for other code to misuse the
function. Renaming to pwrite_zeroes() makes it more in line
with other byte-based interfaces, and will help make it easier
to track which remaining write_zeroes interfaces still need
conversion.
Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Callers of dma_blk_io have no way to pass extra data to the DMAIOFunc,
because the original callback and opaque are gone by the time DMAIOFunc
is called. On the other hand, the BlockBackend is usually derived
from those extra data that you could pass to the DMAIOFunc (in the
next patch, that would be the SCSIRequest).
So change DMAIOFunc's prototype, decoupling it from blk_aio_readv
and blk_aio_writev's. The new prototype loses the BlockBackend
and gains an extra opaque value which, in the case of dma_blk_readv
and dma_blk_writev, is of course used for the BlockBackend.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
blk_new() cannot fail so its Error ** parameter has become superfluous.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Its only caller is blk_new_open(), so we can just inline it there.
The bdrv_new_root() call is dropped in the process because we can just
let bdrv_open() create the BDS.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCAAGBQJXQ4i7AAoJECgHk2+YTcWmkhsQAIELcp/4Kj/9TBqurmicXrBm
sEfLGThc94NZv6tDNUg4d96x83263RJPUsJqOP7TGSavO5xWD8J4A1yf8x6tI71l
/kr064fZzYZGgKyaIvFBqmNT8uy3CVZ1+5algaYh4pOBD3y0hrTBmwB2vIHZeKDC
6ljLuOVV2/n3SlhthK4Me9DiNPTwmMolfq2EG+5jBHmFXnbXRmApXaiX4a3qNvI+
Aqz9btyJgTReepPcpuxF3o/h+Dx0JgC1gT4bPkIDV0wx00adUWufhRqc3D1QaOBr
AREMSpeepuHNrkPpc7ITvyMK9q+bW8OzB0koXQW6Q50DtM7DRsxkqsr3TDvkSGG+
ZyxsL6UdDoSKwJaDcbkhRL82P6U+MvfMBLMbo4V+S1728maUVRx74Ah+axRVX6wb
hWtEYPvFUKp03lY82hXDoZJC+WLu+mhuAqS6a74/OG47lV2V61X8i3zrZ54RVHMh
1fkr4MXkPvrRTH+ifjJlH6leWg5JA6OCPPMqO0fujUIxlMGA9QAHK6Fs0ReKEUyy
WNXQ/CMfHcZv3WIJWH2NEBTJHfQwP4KWM6nIoO8Z4WT7tFpqUGULabJ9jt5t3NB5
lW586UT6pCnqUwh89yTYLg0+Q+rmgmIlAwjVL6q0tRXXL98kosng2jgzlu7gpSBJ
2lXFSzoqr+Ux3GbNF6E4
=Rh0Q
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/ehabkost/tags/x86-pull-request' into staging
X86 queue, 2016-05-23
# gpg: Signature made Mon 23 May 2016 23:48:27 BST using RSA key ID 984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"
* remotes/ehabkost/tags/x86-pull-request:
target-i386: kvm: Eliminate kvm_msr_entry_set()
target-i386: kvm: Simplify MSR setting functions
target-i386: kvm: Simplify MSR array construction
target-i386: kvm: Increase MSR_BUF_SIZE
target-i386: kvm: Allocate kvm_msrs struct once per VCPU
target-i386: Call cpu_exec_init() on realize
target-i386: Move TCG initialization to realize time
target-i386: Move TCG initialization check to tcg_x86_init()
cpu: Eliminate cpudef_init(), cpudef_setup()
target-i386: Set constant model_id for qemu64/qemu32/athlon
pc: Set CPU model-id on compat_props for pc <= 2.4
osdep: Move default qemu_hw_version() value to a macro
target-i386: kvm: Use X86XSaveArea struct for xsave save/load
target-i386: Use xsave structs for ext_save_area
target-i386: Define structs for layout of xsave area
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
x86_cpudef_init() doesn't do anything anymore, cpudef_init(),
cpudef_setup(), and x86_cpudef_init() can be finally removed.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Usually, Random Number Generator is abbreviated to RNG/rng.
so replacing RndRandom with RngRandom seems more reasonable
and keep consistent with RngBackend.
Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
Message-Id: <1460684168-5403-1-git-send-email-weijg.fnst@cn.fujitsu.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
All DisplayType values are just UI options that don't affect any
hardware emulation code, except for DT_NOGRAPHIC. Replace
DT_NOGRAPHIC with DT_NONE plus a new "-machine graphics=on|off"
option, so hardware emulation code don't need to use the
display_type variable.
Cc: Michael Walle <michael@walle.cc>
Cc: Blue Swirl <blauwirbel@gmail.com>
Cc: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Instead of reusing DT_SDL for Cocoa, use DT_COCOA to indicate
that a Cocoa display was requested.
configure already ensures CONFIG_COCOA and CONFIG_SDL are never
set at the same time. The only case where DT_SDL is used outside
a #ifdef CONFIG_SDL block is in the no_frame/alt_grab/ctrl_grab
check. That means the only user-visible change is that we will
start printing a warning if the SDL-specific options are used in
Cocoa mode. This is a bugfix, because no_frame/alt_grab/ctrl_grab
are not used by Cocoa code.
Cc: Andreas Färber <andreas.faerber@web.de>
Cc: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Andreas Färber <andreas.faerber@web.de>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Instead of implementing separate check functions for each vga
interface type, add a table enumerating the possible VGA
interfaces.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
We need to introduce a separate BdrvNextIterator struct that can keep
more state than just the current BDS in order to avoid using the bs->blk
pointer.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
In many cases we just want to know whether a BDS has at least one BB
attached, without needing to know the exact BB that is attached. In
contrast to bs->blk, this is still a valid question when more than one
BB can be attached, so just answer it by checking the parents list.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Since virtio-blk implements request merging itself these days, the only
remaining users are test cases for the function. That doesn't make the
function exactly useful any more.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
This patch changes where the throttling state is stored (used to be the
BlockDriverState, now it is the BlockBackend), but it doesn't actually
make it a BB level feature yet. For example, throttling is still
disabled when the BDS is detached from the BB.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
As a first step towards moving I/O throttling to the BlockBackend level,
this patch changes all pointers in struct ThrottleGroup from referencing
a BlockDriverState to referencing a BlockBackend.
This change is valid because we made sure that throttling can only be
enabled on BDSes which have a BB attached.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Some features, like I/O throttling, are implemented outside
block-backend.c, but still want to keep information in BlockBackend,
e.g. list entries that allow keeping a list of BlockBackends.
In order to avoid exposing the whole struct layout in the public header
file, this patch introduces an embedded public struct where such
information can be added and a pair of functions to convert between
BlockBackend and BlockBackendPublic.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Memory barriers are needed also by Xen and, when the ioeventfd
bugs are fixed, by TCG as well.
sysemu/kvm.h is not anymore needed in sysemu/dma.h, move it to
the actual users.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Return the negated value of accel_initialised is meaningless,
and the caller vl doesn't check it.
Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Now that there are no remaining clients, we can drop the
sector-based blk_read(), blk_write(), blk_aio_readv(), and
blk_aio_writev(). Sadly, there are still remaining
sector-based interfaces, such as blk_*discard(), or
blk_write_compressed(); those will have to wait for another
day.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Sector-based blk_aio_readv() and blk_aio_writev() should die; switch
to byte-based blk_aio_preadv() and blk_aio_pwritev() instead.
The patch had to touch multiple files at once, because dma_blk_io()
takes pointers to the functions, and ide_issue_trim() piggybacks on
the same interface (while ignoring offset under the hood).
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
blk_aio_readv() and blk_aio_writev() are annoying in that they
can't access sub-sector granularity, and cannot pass flags.
Also, they require the caller to pass redundant information
about the size of the I/O (qiov->size in bytes must match
nb_sectors in sectors).
Add new blk_aio_preadv() and blk_aio_pwritev() functions to fix
the flaws. The next few patches will upgrade callers, then
finally delete the old interfaces.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Sector-based blk_write() should die; convert the one-off
variant blk_write_zeroes() to use an offset/count interface
instead. Likewise for blk_co_write_zeroes() and
blk_aio_write_zeroes().
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Sector-based blk_read() should die; convert the one-off
variant blk_read_unthrottled().
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We have several block drivers that understand BDRV_REQ_FUA,
and emulate it in the block layer for the rest by a full flush.
But without a way to actually request BDRV_REQ_FUA during a
pass-through blk_pwrite(), FUA-aware block drivers like NBD are
forced to repeat the emulation logic of a full flush regardless
of whether the backend they are writing to could do it more
efficiently.
This patch just wires up a flags argument; followup patches
will actually make use of it in the NBD driver and in qemu-io.
Signed-off-by: Eric Blake <eblake@redhat.com>
Acked-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This can be used when probing whether KVM support specific device. Here,
a raw vmfd is used.
Signed-off-by: Peter Xu <peterx@redhat.com>
Acked-by: Sergey Fedorov <serge.fdrv@gmail.com>
Message-id: 1458788142-17509-4-git-send-email-peterx@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This patch introduces block driver that implement recording
and replaying of block devices' operations.
All block completion operations are added to the queue.
Queue is flushed at checkpoints and information about processed requests
is recorded to the log. In replay phase the queue is matched with
events read from the log. Therefore block devices requests are processed
deterministically.
Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
[ kwolf: Rebased onto modified and already applied part of the series ]
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
ParallelIOArg is shared between just qemu-char.c and
hw/char/parallel.c, and as such has no business in qemu-common.h.
Move it to sysemu/char.h.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
qemu-common.h should only be included by .c files. Its file comment
explains why: "No header file should depend on qemu-common.h, as this
would easily lead to circular header dependencies."
qemu/iov.h includes qemu-common.h for QEMUIOVector stuff. Move all
that to qemu/iov.h and drop the ill-advised include. Include
qemu/iov.h where the QEMUIOVector stuff is now missing.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Re-run scripts/clean-includes to apply the previous commit's
corrections and updates. Besides redundant qemu/typedefs.h, this only
finds a redundant config-host.h include in ui/egl-helpers.c. No idea
how that escaped the previous runs.
Some manual whitespace trimming around dropped includes squashed in.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This function iterates over all BDSs attached to a BB. We are going to
need it when rewriting bdrv_next() so it no longer uses bdrv_states.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We can basically inline it in hmp_drive_del(); monitor_remove_blk() is
called already, so we just need to call bdrv_make_anon(), too.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Before this patch, blk_new() automatically assigned a name to the new
BlockBackend and considered it referenced by the monitor. This patch
removes the implicit monitor_add_blk() call from blk_new() (and
consequently the monitor_remove_blk() call from blk_delete(), too) and
thus blk_new() (and related functions) no longer take a BB name
argument.
In fact, there is only a single point where blk_new()/blk_new_open() is
called and the new BB is monitor-owned, and that is in blockdev_init().
Besides thus relieving us from having to invent names for all of the BBs
we use in qemu-img, this fixes a bug where qemu cannot create a new
image if there already is a monitor-owned BB named "image".
If a BB and its BDS tree are created in a single operation, as of this
patch the BDS tree will be created before the BB is given a name
(whereas it was the other way around before). This results in minor
change to the output of iotest 087, whose reference output is amended
accordingly.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Introduce separate functions (monitor_add_blk() and
monitor_remove_blk()) which set or unset a BB name. Since the name is
equivalent to the monitor's reference to a BB, adding a name the same as
declaring the BB to be monitor-owned and removing it revokes this
status, hence the function names.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Later, we will remove bdrv_commit_all() and move its contents here, and
in order to replace bdrv_commit_all() calls by calls to blk_commit_all()
before doing so, we need to add it as an alias now.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
qemu_clock_warp function is called to update virtual clock when CPU
is sleeping. This function includes replay checkpoint to make execution
deterministic in icount mode.
Record/replay module flushes async event queue at checkpoints.
Some of the events (e.g., block devices operations) include interaction
with hardware. E.g., APIC polled by block devices sets one of IRQ flags.
Flag to be set depends on currently executed thread (CPU or iothread).
Therefore in replay mode we have to process the checkpoints in the same thread
as they were recorded.
qemu_clock_warp function (and its checkpoint) may be called from different
thread. This patch decouples two different execution cases of this function:
call when CPU is sleeping from iothread and call from cpu thread to update
virtual clock.
First task is performed by qemu_start_warp_timer function. It sets warp
timer event to the moment of nearest pending virtual timer.
Second function (qemu_account_warp_timer) is called from cpu thread
before execution of the code. It advances virtual clock by adding the length
of period while CPU was sleeping.
Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Message-Id: <20160310115609.4812.44986.stgit@PASHA-ISP>
[Update docs. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch implements record and replay of character devices.
It records chardevs communication in replay mode. Recorded information
include data read from backend and counter of bytes written
from frontend to backend to preserve frontend internal state.
If character device was configured through the command line in record mode,
then in replay mode it should be also added to command line. Backend of
the character device could be changed in replay mode.
Replaying of devices that perform ioctl and get_msgfd operations is not
supported.
gdbstub which also acts as a backend is not recorded to allow controlling
the replaying through gdb. Monitor backends are also not recorded.
Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Message-Id: <20160314074436.4980.83856.stgit@PASHA-ISP>
[Add stubs. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We check that the guest can't write beyond the end of its disk, but for
other internal users it can make sense to allow growing a file.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Now that QEMU wraps the Win32 sockets methods to automatically
set errno upon failure, there is no reason for callers to use
the socket_error() method. They can rely on accessing errno
even on Win32. Remove all use of socket_error() from general
code, leaving it as a static method in oslib-win32.c only.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The windows socket functions look identical to the normal POSIX
sockets functions, but instead of setting errno, the caller needs
to call WSAGetLastError(). QEMU has tried to deal with this
incompatibility by defining a socket_error() method that callers
must use that abstracts the difference between WSAGetLastError()
and errno.
This approach is somewhat error prone though - many callers of
the sockets functions are just using errno directly because it
is easy to forget the need use a QEMU specific wrapper. It is
not always immediately obvious that a particular function will
in fact call into Windows sockets functions, so the dev may not
even realize they need to use socket_error().
This introduces an alternative approach to portability inspired
by the way GNULIB fixes portability problems. We use a macro to
redefine the original socket function names to refer to a QEMU
wrapper function. The wrapper function calls the original Win32
sockets method and then sets errno from the WSAGetLastError()
value.
Thus all code can simply call the normal POSIX sockets APIs are
have standard errno reporting on error, even on Windows. This
makes the socket_error() method obsolete.
We also bring closesocket & ioctlsocket into this approach. Even
though they are non-standard Win32 names, we can't wrap the normal
close/ioctl methods since there's no reliable way to distinguish
between a file descriptor and HANDLE in Win32.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Historically QEMU has had a socket_error() macro that was
defined to map to WSASocketError(). The os-win32.h header
file would define errno constants that mapped to the
WSA error constants. This worked fine with Mingw32 since
its header files never defined any errno values, nor did
it even provide an errno.h. So callers of socket_error()
could match on traditional Exxxx constants and it would
all "just work".
With Mingw64 though, things work rather differently. First
there is an errno.h file which defines all the traditional
errno constants you'd expect from a UNIX platform. There
is then a winerror.h which defined the WSA error constants.
Crucially the WSAExxxx errno values in winerror.h do not
match the Exxxx errno values in error.h.
If QEMU had only imported winerror.h it would still work,
but the qemu/osdep.h file unconditionally imports errno.h.
So callers of socket_error() will get now WSAExxxx values
back and compare them to the Exxx constants. This will
always fail silently at runtime.
To solve this QEMU needs to stop assuming the WSAExxxx
constant values match the Exxx constant values. Thus the
socket_error() macro is turned into a small function that
re-maps WSAExxxx values into Exxx.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
QSIMPLEQ supports appending to tail in O(1) and is intrusive so
it doesn't require extra memory allocations for the bookkeeping
data.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Message-Id: <1457010971-24771-1-git-send-email-lprosek@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>