Re-read the timebase before migrate was ported from x86 commit:
6053a86fe7: kvmclock: reduce kvmclock difference on migration
The clock move makes the guest knows about the paused time between
the stop and migrate commands. This is an issue in an already-paused
VM because some side effects, like process stalls, could happen
after migration.
So, this patch checks the runstate of guest in the pre_save handler and
do not re-reads the timebase in case of paused state (cold migration).
Signed-off-by: Maxiwell S. Garcia <maxiwell@linux.ibm.com>
Message-Id: <20190711194702.26598-1-maxiwell@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We've had the qemu and kernel KVM infrastructure to handle larger TCE
page sizes for a while, but forgot to update the defaults to actually
allow them. This turns that change on.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
They just called audio_pcm_sw_read/write anyway, so it makes no sense
to have them too. (The noaudio's read is the only exception, but it
should work with the generic code too.)
Signed-off-by: Kővágó, Zoltán <DirtY.iCE.hu@gmail.com>
Message-id: 92ddc98133bc4b687c6e4608b9321e7b64c0e496.1566168923.git.DirtY.iCE.hu@gmail.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Pulseaudio normally assumes that when the server wants it, the client
can generate the audio samples and send it right away. Unfortunately
this is not the case with QEMU -- it's up to the emulated system when
does it generate the samples. Buffering the samples and sending them
from a background thread is just a workaround, that doesn't work too
well. Instead enable pa's compatibility support and let pa worry about
the details.
Signed-off-by: Kővágó, Zoltán <DirtY.iCE.hu@gmail.com>
Message-id: aa4e3613122ccbaa62b1feb4e427260731f7477c.1566168923.git.DirtY.iCE.hu@gmail.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
audio_run is called manually by alsa and oss backends when polling.
In this case only the requesting backend should be run, not all of them.
Signed-off-by: Kővágó, Zoltán <DirtY.iCE.hu@gmail.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 10221fcea2028fa18d95cf531526ffe3b1d9b21a.1566168923.git.DirtY.iCE.hu@gmail.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
There's already a MIN and MAX macro in include/qemu/osdep.h, use them
instead.
Signed-off-by: Kővágó, Zoltán <DirtY.iCE.hu@gmail.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 303222477df6f7373217e0df768635fab5855745.1566168923.git.DirtY.iCE.hu@gmail.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Unless we disable stream moving, pulseaudio can easily move the stream
on connect, effectively ignoring the source/sink specified by the user.
Signed-off-by: Kővágó, Zoltán <DirtY.iCE.hu@gmail.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: c245929463e6e46a48b2875a150815e2ccba11b4.1566168923.git.DirtY.iCE.hu@gmail.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
This means you should probably stop using -soundhw (as it doesn't allow
you to specify any options) and add the device manually with -device.
The exception is pcspk, it's currently not possible to manually add it.
To use it with audiodev, use something like this:
-audiodev id=foo,... -global isa-pcspk.audiodev=foo -soundhw pcspk
Signed-off-by: Kővágó, Zoltán <DirtY.iCE.hu@gmail.com>
Message-id: 9072b955acffda13976bca7b61f86d7f708c9269.1566168923.git.DirtY.iCE.hu@gmail.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Have a pool of refcounted connections per server, so if the user creates
multiple audiodevs to the same pa server, it will use a single connection. (It
will still create different streams, so the user can manage those streams
separately in pulseaudio.)
Signed-off-by: Kővágó, Zoltán <DirtY.iCE.hu@gmail.com>
Message-id: d43218f327c62cdbd16ea0c922612025fbc4805e.1566168923.git.DirtY.iCE.hu@gmail.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Finally add audiodev= options to audio frontends so users can specify
which backend to use when multiple backends exist. Not specifying an
audiodev= option currently causes the first audiodev to be used, this is
fixed in the next commit.
Example usage: -audiodev pa,id=foo -device AC97,audiodev=foo
Signed-off-by: Kővágó, Zoltán <DirtY.iCE.hu@gmail.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: d64db52dda2d0e9d97bc5ab1dd9adf724280fea1.1566168923.git.DirtY.iCE.hu@gmail.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Audio functions no longer access glob_audio_state, instead they get an
AudioState as a parameter. This is required in order to support
multiple backends.
glob_audio_state is also gone, and replaced with a tailq so we can store
more than one states.
Signed-off-by: Kővágó, Zoltán <DirtY.iCE.hu@gmail.com>
Message-id: 67aef54f9e729a7160fe95c465351115e392164b.1566168923.git.DirtY.iCE.hu@gmail.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Remove glob_audio_state from functions, where possible without breaking
the API. This means that most static functions in audio.c now take an
AudioState pointer instead of implicitly using glob_audio_state. Also
included a pointer in SWVoice*, HWVoice* structs, so that functions
dealing them can know the audio state without having to pass it around
separately.
This is required in order to support multiple simultaneous audio
backends (added in a later commit).
Signed-off-by: Kővágó, Zoltán <DirtY.iCE.hu@gmail.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: b5e241f24e795267b145bcde7c6a72dd5e6037ea.1566168923.git.DirtY.iCE.hu@gmail.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Add 4.2 machine types for arm/i440fx/q35/s390x/spapr.
For i440fx and q35, unversioned cpu models are still translated
to -v1, as 0788a56bd1 ("i386: Make unversioned CPU models be
aliases") states this should only transition to the latest cpu
model version in 4.3 (or later).
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20190724103524.20916-1-cohuck@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Its not immediately obvious how cap-X=Y setting need to be applied
to the command line so, for spapr capability error messages, this
has been clarified to:
appending -machine cap-X=Y
The wrong value messages have been left as is, as the user has found
the right location.
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Black <daniel@linux.ibm.com>
Message-Id: <20190812071044.30806-1-daniel@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Intel CooperLake cpu adds AVX512_BF16 instruction, defining as
CPUID.(EAX=7,ECX=1):EAX[bit 05].
The patch adds a property for setting the subleaf of CPUID leaf 7 in
case that people would like to specify it.
The release spec link as follows,
https://software.intel.com/sites/default/files/managed/c5/15/\
architecture-instruction-set-extensions-programming-reference.pdf
Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When executing script in lsi_execute_script(), the LSI scsi adapter
emulator advances 's->dsp' index to read next opcode. This can lead
to an infinite loop if the next opcode is empty. Move the existing
loop exit after 10k iterations so that it covers no-op opcodes as
well.
Reported-by: Bugs SysSec <bugs-syssec@rub.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
All current bitmap_set test cases set range across word, while the
handle of a range within one word is different from that.
Add case to set 1 bit as a represent for set range within one word.
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit 18269069c3 ("migration: Introduce ignore-shared capability")
addes ignore-shared capability to bypass the shared ramblock (e,g,
membackend + numa node). It does good to live migration.
As told by Yury,this commit expectes that QEMU doesn't write to guest RAM
until VM starts, but it does on aarch64 qemu:
Backtrace:
1 0x000055f4a296dd84 in address_space_write_rom_internal () at
exec.c:3458
2 0x000055f4a296de3a in address_space_write_rom () at exec.c:3479
3 0x000055f4a2d519ff in rom_reset () at hw/core/loader.c:1101
4 0x000055f4a2d475ec in qemu_devices_reset () at hw/core/reset.c:69
5 0x000055f4a2c90a28 in qemu_system_reset () at vl.c:1675
6 0x000055f4a2c9851d in main () at vl.c:4552
Actually, on arm64 virt marchine, ramblock "dtb" will be filled into ram
druing rom_reset. In ignore-shared incoming case, this rom filling
is not required since all the data has been stored in memory backend
file.
Further more, as suggested by Peter Xu, if we do rom_reset() now with
these ROMs then the RAM data should be re-filled again too with the
migration stream coming in.
Fixes: commit 18269069c3 ("migration: Introduce ignore-shared
capability")
Suggested-by: Yury Kotov <yury-kotov@yandex-team.ru>
Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Catherine Ho <catherine.hecx@gmail.com>
Acked-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sometimes we use the 'struct' keyword in headers to help us
reduce dependencies between header files. Document that
practice.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It was introduced in commit ab129972c8,
with the following motivation:
Because start_exclusive uses CPU_FOREACH, merge exclusive_lock with
qemu_cpu_list_lock: together with a call to exclusive_idle (via
cpu_exec_start/end) in cpu_list_add, this protects exclusive work
against concurrent CPU addition and removal.
However, it seems to be redundant, because the cpu-exclusive
infrastructure provides suffificent protection against the newly added
CPU starting execution while the cpu-exclusive work is running, and the
aforementioned traversing of the cpu list is protected by
qemu_cpu_list_lock.
Besides, this appears to be the only place where the cpu-exclusive
section is entered with the BQL taken, which has been found to trigger
AB-BA deadlock as follows:
vCPU thread main thread
----------- -----------
async_safe_run_on_cpu(self,
async_synic_update)
... [cpu hot-add]
process_queued_cpu_work()
qemu_mutex_unlock_iothread()
[grab BQL]
start_exclusive() cpu_list_add()
async_synic_update() finish_safe_work()
qemu_mutex_lock_iothread() cpu_exec_start()
So remove it. This paves the way to establishing a strict nesting rule
of never entering the exclusive section with the BQL taken.
Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Message-Id: <20190523105440.27045-2-rkagan@virtuozzo.com>
Prior patch resets can_do_io flag at the TB entry. Therefore there is no
need in resetting this flag at the end of the block.
This patch removes redundant gen_io_end calls.
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
Message-Id: <156404429499.18669.13404064982854123855.stgit@pasha-Precision-3630-Tower>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@gmail.com>
Most of IO instructions can be executed only at the end of the block in
icount mode. Therefore translator can set cpu_can_io flag when translating
the last instruction.
But when the blocks are chained, then this flag is not reset and may
remain set at the beginning of the next block.
This patch resets the flag at the entry of any translation block,
making I/O operations impossible by default.
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
--
v2 changes:
- reset can_do_io at the start of every TB (suggested by Paolo Bonzini)
Message-Id: <156404428943.18669.15747009371169578935.stgit@pasha-Precision-3630-Tower>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch renames replay_get_current_step() and related variables
to make these names consistent with existing 'icount' command line
option and future record/replay hmp/qmp commands.
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
Message-Id: <156404428377.18669.15476429889039912070.stgit@pasha-Precision-3630-Tower>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch removes refactoring artifacts from the replay/replay-time.c
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
Message-Id: <156404427799.18669.8072341590511911277.stgit@pasha-Precision-3630-Tower>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch fixes shutdown of the replay process, which is terminated with
the assert when shutdown event is read from the log.
replay_finish_event reads new data_kind and therefore the value of data_kind
should be preserved to be valid at qemu_system_shutdown_request call.
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
Message-Id: <156404427238.18669.12378772823692338069.stgit@pasha-Precision-3630-Tower>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
icount-based record/replay uses qemu_clock_deadline_ns_all to measure
the period until vCPU may be interrupted.
This function takes in account the virtual timers, because they belong
to the virtual devices that may generate interrupt request or affect
the virtual machine state.
However, there are a subset of virtual timers, that are marked with
'external' flag. These do not change the virtual machine state and
only based on virtual clock. Calculating the deadling using the external
timers breaks the determinism, because they do not belong to the replayed
part of the virtual machine.
This patch fixes the deadline calculation for this case by adding
new parameter for skipping the external timers when it is needed.
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
--
v2 changes:
- added new parameter for timer attribute mask
Message-Id: <156404426682.18669.17014100602930969222.stgit@pasha-Precision-3630-Tower>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch introduces docs/devel/replay.txt which describes the rules
that should be followed to make virtual devices usable in record/replay mode.
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgauk@ispras.ru>
--
v9: fixed external virtual clock description (reported by Artem Pisarenko)
Message-Id: <156404426119.18669.6707258931552832854.stgit@pasha-Precision-3630-Tower>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
This is a fix which was missed by patch
74c0b816ad, which added current_step
parameter to the replay_advance_current_step function.
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
Message-Id: <156404425561.18669.13015037579222450241.stgit@pasha-Precision-3630-Tower>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The reset notifiers kept a 'last' counter to notice jumps;
now that we've remove the notifier we don't need to keep 'last'.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190724115823.4199-5-dgilbert@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now we're not using the 'last' field in the timer, remove it from
replay.
Bump the version number of the replay structure since we've
removed the field.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190724115823.4199-4-dgilbert@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Remove the reset notifer from the core qemu-timer code.
The only user was mc146818 and we've just remove it's use.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190724115823.4199-3-dgilbert@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The reset notifiers are unreliable and recalculating the offsets
after boot causes problems with migration in cases where explicit
base times are set on the destination.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190724115823.4199-2-dgilbert@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There is a race between TCG and accesses to the dirty log:
vCPU thread reader thread
----------------------- -----------------------
TLB check -> slow path
notdirty_mem_write
write to RAM
set dirty flag
clear dirty flag
TLB check -> fast path
read memory
write to RAM
Fortunately, in order to fix it, no change is required to the
vCPU thread. However, the reader thread must delay the read after
the vCPU thread has finished the write. This can be approximated
conservatively by run_on_cpu, which waits for the end of the current
translation block.
A similar technique is used by KVM, which has to do a synchronous TLB
flush after doing a test-and-clear of the dirty-page flags.
Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The x86 architecture requires that all conversions from floating
point to integer which raise the 'invalid' exception (infinities of
both signs, NaN, and all values which don't fit in the destination
integer) return what the x86 spec calls the "indefinite integer
value", which is 0x8000_0000 for 32-bits or 0x8000_0000_0000_0000 for
64-bits. The softfloat functions return the more usual behaviour of
positive overflows returning the maximum value that fits in the
destination integer format and negative overflows returning the
minimum value that fits.
Wrap the softfloat functions in x86-specific versions which
detect the 'invalid' condition and return the indefinite integer.
Note that we don't use these wrappers for the 3DNow! pf2id and pf2iw
instructions, which do return the minimum value that fits in
an int32 if the input float is a large negative number.
Fixes: https://bugs.launchpad.net/qemu/+bug/1815423
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20190805180332.10185-1-peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Not the whole structure is initialized before passing it to the KVM.
Reduce the number of Valgrind reports.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Message-Id: <1564502498-805893-4-git-send-email-andrey.shinkevich@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
One byte in the local buffer stays uninitialized, at least with the
first iteration, because of the double decrement in the
test_visitor_in_fuzz(). This is what Valgrind does not like and not
critical for the test itself. So, reduce the number of the memory
issues reports.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Message-Id: <1564502498-805893-3-git-send-email-andrey.shinkevich@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
ThrottleState::cfg of the static variable 'ts' is reassigned with the
local one in the do_test_accounting() and then is passed to the
throttle_account() with uninitialized member LeakyBucket::burst_length.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Message-Id: <1564502498-805893-2-git-send-email-andrey.shinkevich@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Function 'kvm_get_supported_msrs' is only called once
now, get rid of the static variable 'kvm_supported_msrs'.
Signed-off-by: Li Qiang <liq3ea@163.com>
Message-Id: <20190725151639.21693-1-liq3ea@163.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Express the complex conditions in Kconfig rather than Makefiles, since Kconfig
is better suited at expressing dependencies and detecting contradictions.
Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch moves the define of target access alignment earlier from
target/foo/cpu.h to configure.
Suggested in Richard Henderson's reply to "[PATCH 1/4] tcg: TCGMemOp is now
accelerator independent MemOp"
Signed-off-by: Tony Nguyen <tony.nguyen@bt.com>
Message-Id: <11e818d38ebc40e986cfa62dd7d0afdc@tpw09926dag18e.domain1.systemhost.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: tony.nguyen@bt.com <tony.nguyen@bt.com>
It is wrong for an entry to have parts out of scope of notifier's range.
assert this condition.
Out of scope mapping/unmapping would cause problem, as in below case:
1. initially there are two notifiers with ranges
0-0xfedfffff, 0xfef00000-0xffffffffffffffff,
IOVAs from 0x3c000000 - 0x3c1fffff is in shadow page table.
2. in vfio, memory_region_register_iommu_notifier() is followed by
memory_region_iommu_replay(), which will first call address space
unmap,
and walk and add back all entries in vtd shadow page table. e.g.
(1) for notifier 0-0xfedfffff,
IOVAs from 0 - 0xffffffff get unmapped,
and IOVAs from 0x3c000000 - 0x3c1fffff get mapped
(2) for notifier 0xfef00000-0xffffffffffffffff
IOVAs from 0 - 0x7fffffffff get unmapped,
but IOVAs from 0x3c000000 - 0x3c1fffff cannot get mapped back.
Cc: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Message-Id: <1561432878-13754-1-git-send-email-yan.y.zhao@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In order to reduce the memory footprint we map into memory
the initrd using g_mapped_file_new() instead of reading it.
In this way we can share the initrd pages between multiple
instances of QEMU.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20190724143105.307042-4-sgarzare@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In order to reduce the memory footprint we map into memory
the ELF to load using g_mapped_file_new_from_fd() instead of
reading each sections. In this way we can share the ELF pages
between multiple instances of QEMU.
Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20190724143105.307042-3-sgarzare@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>