These checks verify that the guest RAM turns from read-write to
"blackhole" when crossing the low boundary of the TSEG. Both the standard
1MB/2MB/8MB TSEG sizes and an extended (16MB) TSEG size are tested.
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
A test program can start up QEMU several times, with different command
lines. For such cases, qtest_start() and qtest_end() are called from
within the individual test functions. Examples: "virtio-console-test.c",
"numa-test.c", and many others.
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The q35 machine type currently lets the guest firmware select a 1MB, 2MB
or 8MB TSEG (basically, SMRAM) size. In edk2/OVMF, we use 8MB, but even
that is not enough when a lot of VCPUs (more than approx. 224) are
configured -- SMRAM footprint scales largely proportionally with VCPU
count.
Introduce a new property for "mch" called "extended-tseg-mbytes", which
expresses (in megabytes) the user's choice of TSEG (SMRAM) size.
Invent a new, QEMU-specific register in the config space of the DRAM
Controller, at offset 0x50, in order to allow guest firmware to query the
TSEG (SMRAM) size.
According to Intel Document Number 316966-002, Table 5-1 "DRAM Controller
Register Address Map (D0:F0)":
Warning: Address locations that are not listed are considered Intel
Reserved registers locations. Reads to Reserved registers may
return non-zero values. Writes to reserved locations may
cause system failures.
All registers that are defined in the PCI 2.3 specification,
but are not necessary or implemented in this component are
simply not included in this document. The
reserved/unimplemented space in the PCI configuration header
space is not documented as such in this summary.
Offsets 0x50 and 0x51 are not listed in Table 5-1. They are also not part
of the standard PCI config space header. And they precede the capability
list as well, which starts at 0xe0 for this device.
When the guest writes value 0xffff to this register, the value that can be
read back is that of "mch.extended-tseg-mbytes" -- unless it remains
0xffff. The guest is required to write 0xffff first (as opposed to a
read-only register) because PCI config space is generally not cleared on
QEMU reset, and after S3 resume or reboot, new guest firmware running on
old QEMU could read a guest OS-injected value from this register.
After reading the available "extended" TSEG size, the guest firmware may
actually request that TSEG size by writing pattern 11b to the ESMRAMC
register's TSEG_SZ bit-field. (The Intel spec referenced above defines
only patterns 00b (1MB), 01b (2MB) and 10b (8MB); 11b is reserved.)
On the QEMU command line, the value can be set with
-global mch.extended-tseg-mbytes=N
The default value for 2.10+ q35 machine types is 16. The value is limited
to 0xfff (4095) at the moment, purely so that the product (4095 MB) can be
stored to the uint32_t variable "tseg_size" in mch_update_smram(). Users
are responsible for choosing sensible TSEG sizes.
On 2.9 and earlier q35 machine types, the default value is 0. This lets
the 11b bit pattern in ESMRAMC.TSEG_SZ, and the register at offset 0x50,
keep their original behavior.
When "extended-tseg-mbytes" is nonzero, the new register at offset 0x50 is
set to that value on reset, for completeness.
PCI config space is migrated automatically, so no VMSD changes are
necessary.
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1447027
Ref: https://lists.01.org/pipermail/edk2-devel/2017-May/010456.html
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
I'm not trying too hard yet. Later, with multiqueue support,
this may cause mutex contention or cacheline bouncing.
Cc: Alberto Garcia <berto@igalia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20170605123908.18777-20-pbonzini@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
block_acct_destroy is called unconditionally in blk_delete, but there is
no BlockAcctStats function that is called unconditionally in blk_new.
Split block_acct_init in two, so that it will be possible to create a
QemuMutex in block_acct_init and destroy it in block_acct_cleanup.
Cc: Alberto Garcia <berto@igalia.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20170605123908.18777-19-pbonzini@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
This is the common code to account operations that produced actual I/O.
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20170605123908.18777-18-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Any data that is returned by read may be stale already, the bitmap
has to be cleared before issuing the read.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20170605123908.18777-16-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
It protects only the list of dirty bitmaps; in the next patch we will
also protect their content.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20170605123908.18777-15-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
This module provides fast paths for 64-bit atomic operations on machines
that only have 32-bit atomic access.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170605123908.18777-11-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Another possibility is to use tg->lock, which we're holding anyway in
both schedule_next_request and throttle_group_co_io_limits_intercept.
This would require open-coding the CoQueue however, so I've chosen this
alternative.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20170605123908.18777-10-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Prepare for removing this function; always restart throttled requests
from coroutine context. This will matter when restarting throttled
requests will have to acquire a CoMutex.
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20170605123908.18777-9-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Starting all waiting coroutines from bdrv_drain_all is unnecessary;
throttle_group_co_io_limits_intercept calls schedule_next_request as
soon as the coroutine restarts, which in turn will restart the next
request if possible.
If we only start the first request and let the coroutines dance from
there the code is simpler and there is more reuse between
throttle_group_config, throttle_group_restart_blk and timer_cb. The
next patch will benefit from this.
We also stop accessing from throttle_group_restart_blk the
blkp->throttled_reqs CoQueues even when there was no
attached throttling group. This worked but is not pretty.
The only thing that can interrupt the dance is the QEMU_CLOCK_VIRTUAL
timer when switching from one block device to the next, because the
timer is set to "now + 1" but QEMU_CLOCK_VIRTUAL might not be running.
Set that timer to point in the present ("now") rather than the future
and things work.
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20170605123908.18777-8-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Currently there are warnings about flex and bison being missing when
building in the centos6 image:
make[1]: flex: Command not found
BISON dtc-parser.tab.c
make[1]: bison: Command not found
Add them.
Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170524005206.31916-1-famz@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Fam Zheng <famz@redhat.com>
It is used by qemu-iotests.
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170505032340.26467-3-famz@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
We've used --add-current-user to create a user in the image, use it to
run tests, because root has too much priviledge, and can surprise test
cases.
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170505032340.26467-2-famz@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
This commit introduces a vhost-user-scsi backend sample application. It
must be linked with libiscsi and libvhost-user.
To use it, compile with:
$ make vhost-user-scsi
And run as follows:
$ ./vhost-user-scsi -u vus.sock -i iscsi://uri_to_target/
$ qemu-system-x86_64 --enable-kvm -m 512 \
-object memory-backend-file,id=mem,size=512m,share=on,mem-path=guestmem \
-numa node,memdev=mem \
-chardev socket,id=vhost-user-scsi,path=vus.sock \
-device vhost-user-scsi-pci,chardev=vhost-user-scsi \
The application is currently limited at one LUN only and it processes
requests synchronously (therefore only achieving QD1). The purpose of
the code is to show how a backend can be implemented and to test the
vhost-user-scsi Qemu implementation.
If a different instance of this vhost-user-scsi application is executed
at a remote host, a VM can be live migrated to such a host.
Signed-off-by: Felipe Franciosi <felipe@nutanix.com>
Message-Id: <1488479153-21203-5-git-send-email-felipe@nutanix.com>
This commit introduces a vhost-user device for SCSI. This is based
on the existing vhost-scsi implementation, but done over vhost-user
instead. It also uses a chardev to connect to the backend. Unlike
vhost-scsi (today), VMs using vhost-user-scsi can be live migrated.
To use it, start Qemu with a command line equivalent to:
qemu-system-x86_64 \
-chardev socket,id=vus0,path=/tmp/vus.sock \
-device vhost-user-scsi-pci,chardev=vus0,bus=pci.0,addr=...
A separate commit presents a sample application linked with libiscsi to
provide a backend for vhost-user-scsi.
Signed-off-by: Felipe Franciosi <felipe@nutanix.com>
Message-Id: <1488479153-21203-4-git-send-email-felipe@nutanix.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This is for the future interoperability & management guide. It includes
the QAPI docs, including the automatically generated ones, other socket
protocols (vhost-user, VNC), and the qcow2 file format.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
These are defined in config-target.h and thus should never be
used in common code.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1497468113-2874-3-git-send-email-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Since we've got some new CPU targets in QEMU during the last months
and years, we've got some new TARGET_xxx defines now which should
be marked as poisoned for common code.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1497468113-2874-2-git-send-email-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- do not use 'goto error_reply' outside a switch to jump into the
middle of the switch's default case label
- reduce code duplication
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170602150150.258222-13-vsementsov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
For consistency use 'ret' name for saving return code everywhere
in the file.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170602150150.258222-12-vsementsov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
"goto fail" error handling scheme is not needed for just returning
error code. Better is return it immediately.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170602150150.258222-11-vsementsov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Current code will return 0 on this nbd_write fail, as rc is 0
after successful nbd_negotiate_options. Fix this.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170602150150.258222-10-vsementsov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
"co" field of NBDClientNewData has never been used, all the way back to
its declaration in commit 1a6245a5. So let's just use client pointer
instead of extra structure.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170602150150.258222-9-vsementsov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move function tail, about receiving next request out of the function.
Error path is simplified and nbd_co_receive_request becomes more
corresponding to its name.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170602150150.258222-8-vsementsov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
For now nbd_read never returns EAGAIN. So, don't handle it.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170602150150.258222-7-vsementsov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
As nbd_write never returns value > 0, we can get rid of extra ret.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170602150150.258222-6-vsementsov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now nbd_read and friends return int, so get rid of ssize_t.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170602150150.258222-5-vsementsov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Functions nbd_negotiate_{read,write,drop_sync} were introduced in
1a6245a5b, when nbd_rwv (was nbd_wr_sync) was working through
qemu_co_sendv_recvv (the path is nbd_wr_sync -> qemu_co_{recv/send} ->
qemu_co_send_recv -> qemu_co_sendv_recvv), which just yields, without
setting any handlers. But starting from ff82911cd nbd_rwv (was
nbd_wr_syncv) works through qio_channel_yield() which sets handlers, so
watchers are redundant in nbd_negotiate_{read,write,drop_sync}, then,
let's just use nbd_{read,write,drop} functions.
Functions nbd_{read,write,drop} has errp parameter, which is unused in
this patch. This will be fixed later.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170602150150.258222-4-vsementsov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Following commit will reuse it for nbd server too.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20170602150150.258222-3-vsementsov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Rename
nbd_wr_syncv -> nbd_rwv
read_sync -> nbd_read
read_sync_eof -> nbd_read_eof
write_sync -> nbd_write
drop_sync -> nbd_drop
1. nbd_ prefix
read_sync and write_sync are already shared, so it is good to have a
namespace prefix. drop_sync will be shared, and read_sync_eof is
related to read_sync, so let's rename them all.
2. _sync suffix
_sync is related to the fact that nbd_wr_syncv doesn't return if a
write to socket returns EAGAIN. The first implementation of
nbd_wr_syncv (was wr_sync in 7a5ca8648b) just loops while getting
EAGAIN, the current implementation yields in this case.
Why we want to get rid of it:
- it is normal for r/w functions to be synchronous, so having an
additional suffix for it looks redundant (contrariwise, we have
_aio suffix for async functions)
- _sync suffix in block layer is used when function does flush (so
using it for other thing is confusing a bit)
- keep function names short after adding nbd_ prefix
3. for nbd_wr_syncv let's use more common notation 'rw'
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20170602150150.258222-2-vsementsov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
move kvm related accelerator files into accel/ subdirectory, also
create one stub subdirectory, which will include accelerator's stub
files.
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <1496383606-18060-5-git-send-email-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
move tcg-runtime.c, translate-all.(ch) and translate-common.c into
accel/tcg/ subdirectory and updated related trace-events file.
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <1496383606-18060-4-git-send-email-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
move cputlb.c, cpu-exec-common.c and cpu-exec.c related tcg exec
file into accel/tcg/ subdirectory.
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <1496383606-18060-3-git-send-email-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
there are some types of accelerators in qemu, and all accelerators
have their own file except tcg. tcg accelerator is also defined in
accel.c file. tcg accelerator file will be splited from accel.c and
re-name to tcg-all.c. accel/ directory will be created to include
kvm and tcg related files.
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <1496383606-18060-2-git-send-email-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
qemu proper has done so for 13 years
(8a7ddc38a6), qemu-img and qemu-io have
done so for four years (526eda14a6).
Ignoring this signal is especially important in qemu-nbd because
otherwise a client can easily take down the qemu-nbd server by dropping
the connection when the server wants to send something, for example:
$ qemu-nbd -x foo -f raw -t null-co:// &
[1] 12726
$ qemu-io -c quit nbd://localhost/bar
can't open device nbd://localhost/bar: No export with name 'bar' available
[1] + 12726 broken pipe qemu-nbd -x foo -f raw -t null-co://
In this case, the client sends an NBD_OPT_ABORT and closes the
connection (because it is not required to wait for a reply), but the
server replies with an NBD_REP_ACK (because it is required to reply).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20170611123714.31292-1-mreitz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>