Implicitly and explicitly opended zones are always bulk processed
together, so merge the two processing masks.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
A shutdown is only about flushing stuff. It is the host that should
delete any queues, so do not perform a reset here.
Also, on shutdown, make sure that the PMR is flushed if in use.
Fixes: 368f4e752cf9 ("hw/block/nvme: Process controller reset and shutdown differently")
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
The device uses the BDRV_BLOCK_ZERO flag to determine the "deallocated"
status of logical blocks. Since the zoned namespaces command set
specification defines that logical blocks SHALL be marked as deallocated
when the zone is in the Empty or Offline states, DULBE can only be
supported if the zone size is a multiple of the calculated deallocation
granularity (reported in NPDG) which depends on the underlying block
device cluster size (if applicable) or the configured
discard_granularity.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Commit 1c0c2163aa ("hw/block/nvme: verify msix_init_exclusive_bar()
return value") had the unintended effect of breaking support on
several platforms not supporting MSI-X.
Still check for errors, but only report that MSI-X is unsupported
instead of bailing out.
Fixes: 1c0c2163aa ("hw/block/nvme: verify msix_init_exclusive_bar() return value")
Fixes: fbf2e5375e ("hw/block/nvme: Verify msix_vector_use() returned value")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Added brief descriptions of the new device properties that are
now available to users to configure features of Zoned Namespace
Command Set in the emulator.
This patch is for documentation only, no functionality change.
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Zone Descriptor Extension is a label that can be assigned to a zone.
It can be set to an Empty zone and it stays assigned until the zone
is reset.
This commit adds a new optional module property,
"zoned.descr_ext_size". Its value must be a multiple of 64 bytes.
If this value is non-zero, it becomes possible to assign extensions
of that size to any Empty zones. The default value for this property
is 0, therefore setting extensions is disabled by default.
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Add two module properties, "zoned.max_active" and "zoned.max_open"
to control the maximum number of zones that can be active or open.
Once these variables are set to non-default values, these limits are
checked during I/O and Too Many Active or Too Many Open command status
is returned if they are exceeded.
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
The emulation code has been changed to advertise NVM Command Set when
"zoned" device property is not set (default) and Zoned Namespace
Command Set otherwise.
Define values and structures that are needed to support Zoned
Namespace Command Set (NVMe TP 4053) in PCI NVMe controller emulator.
Define trace events where needed in newly introduced code.
In order to improve scalability, all open, closed and full zones
are organized in separate linked lists. Consequently, almost all
zone operations don't require scanning of the entire zone array
(which potentially can be quite large) - it is only necessary to
enumerate one or more zone lists.
Handlers for three new NVMe commands introduced in Zoned Namespace
Command Set specification are added, namely for Zone Management
Receive, Zone Management Send and Zone Append.
Device initialization code has been extended to create a proper
configuration for zoned operation using device properties.
Read/Write command handler is modified to only allow writes at the
write pointer if the namespace is zoned. For Zone Append command,
writes implicitly happen at the write pointer and the starting write
pointer value is returned as the result of the command. Write Zeroes
handler is modified to add zoned checks that are identical to those
done as a part of Write flow.
Subsequent commits in this series add ZDE support and checks for
active and open zone limits.
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Ajay Joshi <ajay.joshi@wdc.com>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Matias Bjorling <matias.bjorling@wdc.com>
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Adam Manzanares <adam.manzanares@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Define values and structures that are needed to support Zoned
Namespace Command Set (NVMe TP 4053).
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Many CNS commands have "allocated" command variants. These include
a namespace as long as it is allocated, that is a namespace is
included regardless if it is active (attached) or not.
While these commands are optional (they are mandatory for controllers
supporting the namespace attachment command), our QEMU implementation
is more complete by actually providing support for these CNS values.
However, since our QEMU model currently does not support the namespace
attachment command, these new allocated CNS commands will return the
same result as the active CNS command variants.
The reason for not hooking up this command completely is because the
NVMe specification requires the namespace management command to be
supported if the namespace attachment command is supported.
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Define the structures and constants required to implement
Namespace Types support.
Namespace Types introduce a new command set, "I/O Command Sets",
that allows the host to retrieve the command sets associated with
a namespace. Introduce support for the command set and enable
detection for the NVM Command Set.
The new workflows for identify commands rely heavily on zero-filled
identify structs. E.g., certain CNS commands are defined to return
a zero-filled identify struct when an inactive namespace NSID
is supplied.
Add a helper function in order to avoid code duplication when
reporting zero-filled identify structures.
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
This log page becomes necessary to implement to allow checking for
Zone Append command support in Zoned Namespace Command Set.
This commit adds the code to report this log page for NVM Command
Set only. The parts that are specific to zoned operation will be
added later in the series.
All incoming admin and i/o commands are now only processed if their
corresponding support bits are set in this log. This provides an
easy way to control what commands to support and what not to
depending on set CC.CSS.
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
v2
Dropped vmstate: Fix memory leak in vmstate_handle_alloc
Broke on Power
Added migration: only check page size match if RAM postcopy is enabled
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEERfXHG0oMt/uXep+pBRYzHrxb/ecFAmAhIE4ACgkQBRYzHrxb
/ecPuA/+Pgo++1ZSseJUgbLePwyTVc0jahdcvYEDmLUn8UM6ikBcBXBgUKHdkFW3
bjSSVgB/xxvXSiafBK4xFNrCqSgqMSr3DJcHmvWgv2wVARcYf6Z26Da53LZq1Qru
0tvRyb40Od1f9zb8Zj7e2Y3pjQ9ybLLbjfNhgnOBbQivqWkjZI31oV2KUCWY2+eV
T1BEwr6mgYepqhmeB6OvQZtaQVC5toirS6NajNF4nt0vZEIGIvK6/A9erCVU8Tze
5ch1J0MUqgc3q6ZSE/I9BHEy6MaL0X8G6H+ezjxdoRQtbt1iM/YqZJCSrXkAxiLC
ROohryb6qVk26+UYuana79faLwrw359WlkwNEE6SEIRSENu+6p7bgN3LZuCILCO7
xJEkeTgy6r40IGCkDC9aWa8pyLHpNX9gyLpGBHdIRD6zEOWaKNtzh7E2uo/T0ann
BpcfgQOsYN25hIHiiXnxozUREbx71VDfMq7GqGB6eC3u2+a3U6jpSJb1nNq5NB89
FJYLZy5Rbuy7OStMwfMsxRs7E63XvGgnwrN8FczU/pumCPX4lDYIpnocqinUmP8p
XubRQQVaVDSKIq1mvzw7iR/1NsP9vfYvnrAIv941f38NBmDKqdPuMOXR/qB/Kp2Y
jB7b1L5/JcXbWsQmK7fda9jmPzFwSO2cTeTiUonk9RfuuDEws0A=
=4tbe
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20210208a' into staging
Migration pull 2021-02-08
v2
Dropped vmstate: Fix memory leak in vmstate_handle_alloc
Broke on Power
Added migration: only check page size match if RAM postcopy is enabled
# gpg: Signature made Mon 08 Feb 2021 11:28:14 GMT
# gpg: using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>" [full]
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7
* remotes/dgilbert/tags/pull-migration-20210208a: (27 commits)
migration: only check page size match if RAM postcopy is enabled
migration: introduce snapshot-{save, load, delete} QMP commands
iotests: fix loading of common.config from tests/ subdir
iotests: add support for capturing and matching QMP events
migration: introduce a delete_snapshot wrapper
migration: wire up support for snapshot device selection
migration: control whether snapshots are ovewritten
block: rename and alter bdrv_all_find_snapshot semantics
block: allow specifying name of block device for vmstate storage
block: add ability to specify list of blockdevs during snapshot
migration: stop returning errno from load_snapshot()
migration: Make save_snapshot() return bool, not 0/-1
block: push error reporting into bdrv_all_*_snapshot functions
migration: Display the migration blockers
migration: Add blocker information
migration: Fix a few absurdly defective error messages
migration: Fix cache_init()'s "Failed to allocate" error messages
migration: Clean up signed vs. unsigned XBZRLE cache-size
migration: Fix migrate-set-parameters argument validation
migration: introduce 'userfaultfd-wrlat.py' script
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Move write processing to nvme_do_write() that now handles both WRITE
and WRITE ZEROES. Both nvme_write() and nvme_write_zeroes() become
inline helper functions.
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
The majority of code in nvme_rw() is becoming read- or write-specific.
Move these parts to two separate handlers, nvme_read() and nvme_write()
to make the code more readable and to remove multiple is_write checks
that has been present in the i/o path.
This is a refactoring patch, no change in functionality.
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
In NVMe 1.4, a namespace must report an ID descriptor of UUID type
if it doesn't support EUI64 or NGUID. Add a new namespace property,
"uuid", that provides the user the option to either specify the UUID
explicitly or have a UUID generated automatically every time a
namespace is initialized.
Suggested-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Controller reset ans subsystem shutdown are handled very much the same
in the current code, but some of the steps should be different in these
two cases.
Introduce two new functions, nvme_reset_ctrl() and nvme_shutdown_ctrl(),
to separate some portions of the code from nvme_clear_ctrl(). The steps
that are made different between reset and shutdown are that BAR.CC is not
reset to zero upon the shutdown and namespace data is flushed to
backing storage as a part of shutdown handling, but not upon reset.
Suggested-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Commit 37712e00b1 ("hw/block/nvme: factor out pmr setup") changed the
control flow such that the CAP register is erronously cleared after
nvme_init_pmr() has configured it. Since the entire NvmeCtrl structure
is zero-filled initially, there is no need for the explicit clearing, so
just remove it.
Fixes: 37712e00b1 ("hw/block/nvme: factor out pmr setup")
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Add the Compare command.
This implementation uses a bounce buffer to read in the data from
storage and then compare with the host supplied buffer.
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
[k.jensen: rebased]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Add support for the Dataset Management command and the Deallocate
attribute. Deallocation results in discards being sent to the underlying
block device. Whether of not the blocks are actually deallocated is
affected by the same factors as Write Zeroes (see previous commit).
format | discard | dsm (512B) dsm (4KiB) dsm (64KiB)
--------------------------------------------------------
qcow2 ignore n n n
qcow2 unmap n n y
raw ignore n n n
raw unmap n y y
Again, a raw format and 4KiB LBAs are preferable.
In order to set the Namespace Preferred Deallocate Granularity and
Alignment fields (NPDG and NPDA), choose a sane minimum discard
granularity of 4KiB. If we are using a passthru device supporting
discard at a 512B granularity, user should set the discard_granularity
property explicitly. NPDG and NPDA will also account for the
cluster_size of the block driver if required (i.e. for QCOW2).
See NVM Express 1.3d, Section 6.7 ("Dataset Management command").
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
This adds the NPWG, NPWA, NPDG, NPDA and NOWS family of fields to the
shared nvme.h header for use by later patches.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Fam Zheng <fam@euphon.net>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Add support for reporting the Deallocated or Unwritten Logical Block
Error (DULBE).
Rely on the block status flags reported by the block layer and consider
any block with the BDRV_BLOCK_ZERO flag to be deallocated.
Multiple factors affect when a Write Zeroes command result in
deallocation of blocks.
* the underlying file system block size
* the blockdev format
* the 'discard' and 'logical_block_size' parameters
format | discard | wz (512B) wz (4KiB) wz (64KiB)
-----------------------------------------------------
qcow2 ignore n n y
qcow2 unmap n n y
raw ignore n y y
raw unmap n y y
So, this works best with an image in raw format and 4KiB LBAs, since
holes can then be punched on a per-block basis (this assumes a file
system with a 4kb block size, YMMV). A qcow2 image, uses a cluster size
of 64KiB by default and blocks will only be marked deallocated if a full
cluster is zeroed or discarded. However, this *is* consistent with the
spec since Write Zeroes "should" deallocate the block if the Deallocate
attribute is set and "may" deallocate if the Deallocate attribute is not
set. Thus, we always try to deallocate (the BDRV_REQ_MAY_UNMAP flag is
always set).
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Add a new function, nvme_aio_err, to handle errors resulting from AIOs
and use this from the callbacks.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
nvme_check_bounds has no use of the NvmeCtrl parameter; remove it.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
The "🥑 enable" is not necessary and was removed in 9531d26c,
so let's remove from the docs.
Signed-off-by: Cleber Rosa <crosa@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210203172357.1422425-4-crosa@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
If the connection to the ssh server fails, it may indeed be a "sshd"
issue, but it may also not be that. Let's state what we know: the
establishment of the connection from the client side was not possible.
Signed-off-by: Cleber Rosa <crosa@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210203172357.1422425-13-crosa@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
AFAICT, there should not be a situation where IP and port do not have
at least one whitespace character separating them.
This may be true for other '\s*' patterns in the same regex too.
Signed-off-by: Cleber Rosa <crosa@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210203172357.1422425-10-crosa@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Instead of having to cast it whenever it's going to be used, let's
standardize it as an integer, which is the data type that will be
used most often.
Given that the regex will only match digits, it's safe that we'll
end up getting a integer, but, it could as well be a zero.
Signed-off-by: Cleber Rosa <crosa@redhat.com>
Reviewed-by: Beraldo Leal <bleal@redhat.com>
Message-Id: <20210203172357.1422425-9-crosa@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
In a virtiofs based tests, it seems safe to assume that the guest will
be capable of a virtio-net device.
Signed-off-by: Cleber Rosa <crosa@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210203172357.1422425-7-crosa@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tests are supposed to be non-interactive, and ssh-keygen is asking for
a passphrase when creating a key. Let's set an empty passphrase to
avoid the prompt.
Signed-off-by: Cleber Rosa <crosa@redhat.com>
Reviewed-by: Beraldo Leal <bleal@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210203172357.1422425-6-crosa@redhat.com>
[PMD: Reword description per Alex Bennée comment]
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
For Avocado Instrumented based tests, it's a better idea to just use
the property. The environment variable is a fall back for tests not
written using that Python API.
Signed-off-by: Cleber Rosa <crosa@redhat.com>
Reviewed-by: Beraldo Leal <bleal@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reference: https://avocado-framework.readthedocs.io/en/84.0/api/test/avocado.html#avocado.Test.workdir
Message-Id: <20210203172357.1422425-5-crosa@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
There's no downloading happening on that method, so let's call it
"prepare" instead. While at it, and because of it, the current
"prepare_boot" and "prepare_cloudinit" are also renamed.
The reasoning here is that "prepare_" methods will just work on the
images, while "set_up_" will make them effective to the VM that will
be launched. Inspiration comes from the "virtiofs_submounts.py"
tests, which this expects to converge more into.
Signed-off-by: Cleber Rosa <crosa@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Beraldo Leal <bleal@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210203172357.1422425-3-crosa@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
The microblaze kernel sometimes gets stuck during boot (ca. 1 out of 200
times), so we disabled the corresponding acceptance tests some months
ago. However, it's likely better to check that the kernel is still
starting than to not testing it at all anymore. Move the test to
a separate file, enable it again and check for an earlier console
message that should always appear.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Message-Id: <20210128152815.585478-1-thuth@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Expose the VMX exit/entry load pkrs control bits in
VMX_TRUE_EXIT_CTLS/VMX_TRUE_ENTRY_CTLS MSRs to guest, which supports the
PKS in nested VM.
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20210205083325.13880-3-chenyi.qiang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
PKS introduces MSR IA32_PKRS(0x6e1) to manage the supervisor protection
key rights. Page access and writes can be managed via the MSR update
without TLB flushes when permissions change.
Add the support to save/load IA32_PKRS MSR in guest.
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20210205083325.13880-2-chenyi.qiang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
During migrations, after each iteration, cpu_throttle_set() is called,
which irrespective of input, re-arms the timer according to value of
new_throttle_pct. This causes cpu_throttle_thread() to be delayed in
getting scheduled and consqeuntly lets guest run for more time than what
the throttle value should allow. This leads to spikes in guest throughput
at high cpu-throttle percentage whenever cpu_throttle_set() is called.
A solution would be not to modify the timer immediately in
cpu_throttle_set(), instead, only modify throttle_percentage so that the
throttle would automatically adjust to the required percentage when
cpu_throttle_timer_tick() is invoked.
Manually tested the patch using following configuration:
Guest:
Centos7 (3.10.0-123.el7.x86_64)
Total Memory - 64GB , CPUs - 16
Tool used - stress (1.0.4)
Workload - stress --vm 32 --vm-bytes 1G --vm-keep
Migration Parameters:
Network Bandwidth - 500MBPS
cpu-throttle-initial - 99
Results:
With timer_mod(): fails to converge, continues indefinitely
Without timer_mod(): converges in 249 sec
Signed-off-by: Utkarsh Tripathi <utkarsh.tripathi@nutanix.com>
Message-Id: <1609420384-119407-1-git-send-email-utkarsh.tripathi@nutanix.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch enables using rng-builtin with record/replay
by making the callbacks deterministic.
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Message-Id: <161233201286.170686.7858208964037376305.stgit@pasha-ThinkPad-X280>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Before the change /usr/share/qemu/firmware/50-edk2-x86_64-secure.json
contained the relative path:
"filename": "share/qemu/edk2-x86_64-secure-code.fd",
"filename": "share/qemu/edk2-i386-vars.fd",
After then change the paths are absolute:
"filename": "/usr/share/qemu/edk2-x86_64-secure-code.fd",
"filename": "/usr/share/qemu/edk2-i386-vars.fd",
The regression appeared in qemu-5.2.0 (seems to be related
to meson port).
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: "Marc-André Lureau" <marcandre.lureau@redhat.com>
CC: "Philippe Mathieu-Daudé" <philmd@redhat.com>
Bug: https://bugs.gentoo.org/766743
Bug: https://bugs.launchpad.net/qemu/+bug/1913012
Signed-off-by: Jannik Glückert <jannik.glueckert@gmail.com>
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Message-Id: <20210131143434.2513363-1-slyfox@gentoo.org>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sometimes interrupt event comes at the same time with
the virtual timers. In this case replay tries to proceed
the timers, because deadline for them is zero.
This patch allows processing interrupts and exceptions
by entering the vCPU execution loop, when deadline is zero,
but checkpoint associated with virtual timers is not ready
to be replayed.
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Message-Id: <161216312794.2030770.1709657858900983160.stgit@pasha-ThinkPad-X280>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The kvm_vm_ioctl() wrapper already returns -errno if the ioctl itself
returned -1, so the callers of kvm_vm_ioctl() should not check for -1
but for a value < 0 instead.
This problem has been fixed once already in commit b533f658a9
but that commit missed that the ENOENT error code is not fatal for
this ioctl, so the commit has been reverted in commit 50212d6346
since the problem occurred close to a pending release at that point
in time. The plan was to fix it properly after the release, but it
seems like this has been forgotten. So let's do it now finally instead.
Resolves: https://bugs.launchpad.net/qemu/+bug/1294227
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210129084354.42928-1-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>