Commit Graph

100114 Commits

Author SHA1 Message Date
BALATON Zoltan
091798aaf0 ati-vga: Implement fallback for pixman routines
Pixman routines can fail if no implementation is available and it will
become optional soon so add fallbacks when pixman does not work.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Acked-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-ID: <ed0fba3f74e48143f02228b83bf8796ca49f3e7d.1698871239.git.balaton@eik.bme.hu>
(cherry picked from commit 08730ee0cc)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-11-07 20:23:38 +03:00
Vladimir Sementsov-Ogievskiy
0d7c40a1e2 block/nvme: nvme_process_completion() fix bound for cid
NVMeQueuePair::reqs has length NVME_NUM_REQS, which less than
NVME_QUEUE_SIZE by 1.

Fixes: 1086e95da1 ("block/nvme: switch to a NVMeRequest freelist")
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
Message-id: 20231017125941.810461-5-vsementsov@yandex-team.ru
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit cc8fb0c3ae)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-11-07 19:22:08 +03:00
Peter Maydell
26bb3ab8ff target/arm: Correctly propagate stage 1 BTI guarded bit in a two-stage walk
In a two-stage translation, the result of the BTI guarded bit should
be the guarded bit from the first stage of translation, as there is
no BTI guard information in stage two.  Our code tried to do this,
but got it wrong, because we currently have two fields where the GP
bit information might live (ARMCacheAttrs::guarded and
CPUTLBEntryFull::extra::arm::guarded), and we were storing the GP bit
in the latter during the stage 1 walk but trying to copy the former
in combine_cacheattrs().

Remove the duplicated storage, and always use the field in
CPUTLBEntryFull; correctly propagate the stage 1 value to the output
in get_phys_addr_twostage().

Note for stable backports: in v8.0 and earlier the field is named
result->f.guarded, not result->f.extra.arm.guarded.

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1950
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20231031173723.26582-1-peter.maydell@linaro.org
(cherry picked from commit 4c09abeae8)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: replace f.extra.arm.guarded -> f.guarded due to v8.1.0-1179-ga81fef4b64)
2023-11-03 19:35:34 +03:00
Peter Maydell
e19b7b8165 target/arm: Fix handling of SW and NSW bits for stage 2 walks
We currently don't correctly handle the VSTCR_EL2.SW and VTCR_EL2.NSW
configuration bits.  These allow configuration of whether the stage 2
page table walks for Secure IPA and NonSecure IPA should do their
descriptor reads from Secure or NonSecure physical addresses. (This
is separate from how the translation table base address and other
parameters are set: an NS IPA always uses VTTBR_EL2 and VTCR_EL2
for its base address and walk parameters, regardless of the NSW bit,
and similarly for Secure.)

Provide a new function ptw_idx_for_stage_2() which returns the
MMU index to use for descriptor reads, and use it to set up
the .in_ptw_idx wherever we call get_phys_addr_lpae().

For a stage 2 walk, wherever we call get_phys_addr_lpae():
 * .in_ptw_idx should be ptw_idx_for_stage_2() of the .in_mmu_idx
 * .in_secure should be true if .in_mmu_idx is Stage2_S

This allows us to correct S1_ptw_translate() so that it consistently
always sets its (out_secure, out_phys) to the result it gets from the
S2 walk (either by calling get_phys_addr_lpae() or by TLB lookup).
This makes better conceptual sense because the S2 walk should return
us an (address space, address) tuple, not an address that we then
randomly assign to S or NS.

Our previous handling of SW and NSW was broken, so guest code
trying to use these bits to put the s2 page tables in the "other"
address space wouldn't work correctly.

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1600
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230504135425.2748672-3-peter.maydell@linaro.org
(cherry picked from commit fcc0b0418f)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-11-03 19:19:06 +03:00
Peter Maydell
6861482dea target/arm: Don't allow stage 2 page table walks to downgrade to NS
Bit 63 in a Table descriptor is only the NSTable bit for stage 1
translations; in stage 2 it is RES0.  We were incorrectly looking at
it all the time.

This causes problems if:
 * the stage 2 table descriptor was incorrectly setting the RES0 bit
 * we are doing a stage 2 translation in Secure address space for
   a NonSecure stage 1 regime -- in this case we would incorrectly
   do an immediate downgrade to NonSecure

A bug elsewhere in the code currently prevents us from getting
to the second situation, but when we fix that it will be possible.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20230504135425.2748672-2-peter.maydell@linaro.org
(cherry picked from commit 21a4ab8318)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-11-03 19:19:06 +03:00
Fabiano Rosas
d9da3f8dbd target/arm: Don't access TCG code when debugging with KVM
When TCG is disabled this part of the code should not be reachable, so
wrap it with an ifdef for now.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 0d3de77a07)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: trivial change which makes two subsequent cherry-picks to apply cleanly)
2023-11-03 19:18:23 +03:00
Daniel P. Berrangé
3e273f4c16 Revert "linux-user: fix compat with glibc >= 2.36 sys/mount.h"
This reverts commit 3cd3df2a95.

glibc has fixed (in 2.36.9000-40-g774058d729) the problem
that caused a clash when both sys/mount.h annd linux/mount.h
are included, and backported this to the 2.36 stable release
too:

  https://sourceware.org/glibc/wiki/Release/2.36#Usage_of_.3Clinux.2Fmount.h.3E_and_.3Csys.2Fmount.h.3E

It is saner for QEMU to remove the workaround it applied for
glibc 2.36 and expect distros to ship the 2.36 maint release
with the fix. This avoids needing to add a further workaround
to QEMU to deal with the fact that linux/brtfs.h now also pulls
in linux/mount.h via linux/fs.h since Linux 6.1

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20230110174901.2580297-3-berrange@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
(cherry picked from commit 6003159ce1)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-11-02 15:09:22 +03:00
Daniel P. Berrangé
cc64f9ac3d Revert "linux-user: add more compat ioctl definitions"
This reverts commit c5495f4ecb.

glibc has fixed (in 2.36.9000-40-g774058d729) the problem
that caused a clash when both sys/mount.h annd linux/mount.h
are included, and backported this to the 2.36 stable release
too:

  https://sourceware.org/glibc/wiki/Release/2.36#Usage_of_.3Clinux.2Fmount.h.3E_and_.3Csys.2Fmount.h.3E

It is saner for QEMU to remove the workaround it applied for
glibc 2.36 and expect distros to ship the 2.36 maint release
with the fix. This avoids needing to add a further workaround
to QEMU to deal with the fact that linux/brtfs.h now also pulls
in linux/mount.h via linux/fs.h since Linux 6.1

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20230110174901.2580297-2-berrange@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
(cherry picked from commit 9f0246539a)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-11-02 15:09:11 +03:00
Andrey Drobyshev
e4a44eb819 qemu-iotests: 024: add rebasing test case for overlay_size > backing_size
Before previous commit, rebase was getting infitely stuck in case of
rebasing within the same backing chain and when overlay_size > backing_size.
Let's add this case to the rebasing test 024 to make sure it doesn't
break again.

Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20230919165804.439110-3-andrey.drobyshev@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 827171c318)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-11-02 15:04:24 +03:00
Andrey Drobyshev
1e67bd7c21 qemu-img: rebase: stop when reaching EOF of old backing file
In case when we're rebasing within one backing chain, and when target image
is larger than old backing file, bdrv_is_allocated_above() ends up setting
*pnum = 0.  As a result, target offset isn't getting incremented, and we
get stuck in an infinite for loop.  Let's detect this case and proceed
further down the loop body, as the offsets beyond the old backing size need
to be explicitly zeroed.

Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20230919165804.439110-2-andrey.drobyshev@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 8b097fd6b0)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-11-02 15:04:24 +03:00
Akihiko Odaki
6367e823ca tests/tcg: Add -fno-stack-protector
A build of GCC 13.2 will have stack protector enabled by default if it
was configured with --enable-default-ssp option. For such a compiler,
it is necessary to explicitly disable stack protector when linking
without standard libraries.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20230731091042.139159-3-akihiko.odaki@daynix.com>
[AJB: fix comment string typo]
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20231029145033.592566-3-alex.bennee@linaro.org>
(cherry picked from commit 580731dcc8)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-11-02 15:04:24 +03:00
Akihiko Odaki
30811710fb tests/migration: Add -fno-stack-protector
A build of GCC 13.2 will have stack protector enabled by default if it
was configured with --enable-default-ssp option. For such a compiler,
it is necessary to explicitly disable stack protector when linking
without standard libraries.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20230731091042.139159-2-akihiko.odaki@daynix.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit 7a06a8fec9)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-11-02 15:04:24 +03:00
Glenn Miles
d241c15f75 misc/led: LED state is set opposite of what is expected
Testing of the LED state showed that when the LED polarity was
set to GPIO_POLARITY_ACTIVE_LOW and a low logic value was set on
the input GPIO of the LED, the LED was being turn off when it was
expected to be turned on.

Fixes: ddb67f6402 ("hw/misc/led: Allow connecting from GPIO output")
Signed-off-by: Glenn Miles <milesg@linux.vnet.ibm.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Andrew Jeffery <andrew@codeconstruct.com.au>
Message-id: 20231024191945.4135036-1-milesg@linux.vnet.ibm.com
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 6f83dc6716)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-10-31 20:39:03 +03:00
Lu Gao
cec02bc037 hw/sd/sdhci: Block Size Register bits [14:12] is lost
Block Size Register bits [14:12] is SDMA Buffer Boundary, it is missed
in register write, but it is needed in SDMA transfer. e.g. it will be
used in sdhci_sdma_transfer_multi_blocks to calculate boundary_ variables.

Missing this field will cause wrong operation for different SDMA Buffer
Boundary settings.

Fixes: d7dfca0807 ("hw/sdhci: introduce standard SD host controller")
Fixes: dfba99f17f ("hw/sdhci: Fix DMA Transfer Block Size field")
Signed-off-by: Lu Gao <lu.gao@verisilicon.com>
Signed-off-by: Jianxian Wen <jianxian.wen@verisilicon.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-ID: <20220321055618.4026-1-lu.gao@verisilicon.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit ae5f70baf5)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-10-24 09:12:49 +03:00
Helge Deller
ef93ba0d7b lasips2: LASI PS/2 devices are not user-createable
Those PS/2 ports are created with the LASI controller when
a 32-bit PA-RISC machine is created.

Mark them not user-createable to avoid showing them in
the qemu device list.

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: qemu-stable@nongnu.org
(cherry picked from commit a1e6a5c462)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-10-21 14:05:14 +03:00
Mikulas Patocka
cb79e6a451 linux-user/sh4: Fix crashes on signal delivery
sh4 uses gUSA (general UserSpace Atomicity) to provide atomicity on CPUs
that don't have atomic instructions. A gUSA region that adds 1 to an
atomic variable stored in @R2 looks like this:

  4004b6:       03 c7           mova    4004c4 <gusa+0x10>,r0
  4004b8:       f3 61           mov     r15,r1
  4004ba:       09 00           nop
  4004bc:       fa ef           mov     #-6,r15
  4004be:       22 63           mov.l   @r2,r3
  4004c0:       01 73           add     #1,r3
  4004c2:       32 22           mov.l   r3,@r2
  4004c4:       13 6f           mov     r1,r15

R0 contains a pointer to the end of the gUSA region
R1 contains the saved stack pointer
R15 contains negative length of the gUSA region

When this region is interrupted by a signal, the kernel detects if
R15 >= -128U. If yes, the kernel rolls back PC to the beginning of the
region and restores SP by copying R1 to R15.

The problem happens if we are interrupted by a signal at address 4004c4.
R15 still holds the value -6, but the atomic value was already written by
an instruction at address 4004c2. In this situation we can't undo the
gUSA. The function unwind_gusa does nothing, the signal handler attempts
to push a signal frame to the address -6 and crashes.

This patch fixes it, so that if we are interrupted at the last instruction
in a gUSA region, we copy R1 to R15 to restore the correct stack pointer
and avoid crashing.

There's another bug: if we are interrupted in a delay slot, we save the
address of the instruction in the delay slot. We must save the address of
the previous instruction.

Cc: qemu-stable@nongnu.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Yoshinori Sato <ysato@users.sourcefoege.jp>
Message-Id: <b16389f7-6c62-70b7-59b3-87533c0bcc@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit 3b894b699c)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-10-21 14:05:14 +03:00
Mikulas Patocka
ea3c95a6b0 linux-user/mips: fix abort on integer overflow
QEMU mips userspace emulation crashes with "qemu: unhandled CPU exception
0x15 - aborting" when one of the integer arithmetic instructions detects
an overflow.

This patch fixes it so that it delivers SIGFPE with FPE_INTOVF instead.

Cc: qemu-stable@nongnu.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Message-Id: <3ef979a8-3ee1-eb2d-71f7-d788ff88dd11@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit 6fad9b4bb9)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-10-21 14:05:14 +03:00
Fabiano Rosas
f5358bc18b migration: Fix analyze-migration read operation signedness
The migration code uses unsigned values for 16, 32 and 64-bit
operations. Fix the script to do the same.

This was causing an issue when parsing the migration stream generated
on the ppc64 target because one of instance_ids was larger than the
32bit signed maximum:

Traceback (most recent call last):
  File "/home/fabiano/kvm/qemu/build/scripts/analyze-migration.py", line 658, in <module>
    dump.read(dump_memory = args.memory)
  File "/home/fabiano/kvm/qemu/build/scripts/analyze-migration.py", line 592, in read
    classdesc = self.section_classes[section_key]
KeyError: ('spapr_iommu', -2147483648)

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231009184326.15777-6-farosas@suse.de>
(cherry picked from commit caea03279e)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-10-21 14:05:14 +03:00
Yuval Shaia
06c9bf032f hw/pvrdma: Protect against buggy or malicious guest driver
Guest driver allocates and initialize page tables to be used as a ring
of descriptors for CQ and async events.
The page table that represents the ring, along with the number of pages
in the page table is passed to the device.
Currently our device supports only one page table for a ring.

Let's make sure that the number of page table entries the driver
reports, do not exceeds the one page table size.

Reported-by: Soul Chen <soulchen8650@gmail.com>
Signed-off-by: Yuval Shaia <yuval.shaia.ml@gmail.com>
Fixes: CVE-2023-1544
Message-ID: <20230301142926.18686-1-yuval.shaia.ml@gmail.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 85fc35afa9)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-10-21 14:05:14 +03:00
Alvin Chang
0c049eafd5 disas/riscv: Fix the typo of inverted order of pmpaddr13 and pmpaddr14
Fix the inverted order of pmpaddr13 and pmpaddr14 in csr_name().

Signed-off-by: Alvin Chang <alvinga@andestech.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20230907084500.328-1-alvinga@andestech.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
(cherry picked from commit cffa995490)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-10-21 14:05:14 +03:00
Volker Rümelin
c19fd37eb3 hw/audio/es1370: reset current sample counter
Reset the current sample counter when writing the Channel Sample
Count Register. The Linux ens1370 driver and the AROS sb128
driver expect the current sample counter counts down from sample
count to 0 after a write to the Channel Sample Count Register.
Currently the current sample counter starts from 0 after a reset
or the last count when the counter was stopped.

The current sample counter is used to raise an interrupt whenever
a complete buffer was transferred. When the counter starts with a
value lower than the reload value, the interrupt triggeres before
the buffer was completly transferred. This may lead to corrupted
audio streams.

Tested-by: Rene Engel <ReneEngel80@emailn.de>
Signed-off-by: Volker Rümelin <vr_qemu@t-online.de>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: BALATON Zoltan <balaton@eik.bme.hu>
Message-Id: <20230917065813.6692-1-vr_qemu@t-online.de>
(cherry picked from commit 00e3b29d06)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-10-21 14:05:14 +03:00
Peter Xu
77d36ba300 migration/qmp: Fix crash on setting tls-authz with null
QEMU will crash if anyone tries to set tls-authz (which is a type
StrOrNull) with 'null' value.  Fix it in the easy way by converting it to
qstring just like the other two tls parameters.

Cc: qemu-stable@nongnu.org # v4.0+
Fixes: d2f1d29b95 ("migration: add support for a "tls-authz" migration parameter")
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230905162335.235619-2-peterx@redhat.com>
(cherry picked from commit 86dec715a7)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: the function is in m/migration.c, not m/option.c; minor context tweak)
2023-10-21 14:05:14 +03:00
Akihiko Odaki
85c27e5b37 amd_iommu: Fix APIC address check
An MSI from I/O APIC may not exactly equal to APIC_DEFAULT_ADDRESS. In
fact, Windows 17763.3650 configures I/O APIC to set the dest_mode bit.
Cover the range assigned to APIC.

Fixes: 577c470f43 ("x86_iommu/amd: Prepare for interrupt remap support")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20230921114612.40671-1-akihiko.odaki@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 0114c45130)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-10-21 14:05:14 +03:00
Richard Henderson
24d5b71884 linux-user/hppa: Fix struct target_sigcontext layout
Use abi_ullong not uint64_t so that the alignment of the field
and therefore the layout of the struct is correct.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit 33bc4fa78b)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-10-21 14:05:14 +03:00
Thomas Huth
5f2b5606e2 chardev/char-pty: Avoid losing bytes when the other side just (re-)connected
When starting a guest via libvirt with "virsh start --console ...",
the first second of the console output is missing. This is especially
annoying on s390x that only has a text console by default and no graphical
output - if the bios fails to boot here, the information about what went
wrong is completely lost.

One part of the problem (there is also some things to be done on the
libvirt side) is that QEMU only checks with a 1 second timer whether
the other side of the pty is already connected, so the first second of
the console output is always lost.

This likely used to work better in the past, since the code once checked
for a re-connection during write, but this has been removed in commit
f8278c7d74 ("char-pty: remove the check for connection on write") to avoid
some locking.

To ease the situation here at least a little bit, let's check with g_poll()
whether we could send out the data anyway, even if the connection has not
been marked as "connected" yet. The file descriptor is marked as non-blocking
anyway since commit fac6688a18 ("Do not hang on full PTY"), so this should
not cause any trouble if the other side is not ready for receiving yet.

With this patch applied, I can now successfully see the bios output of
a s390x guest when running it with "virsh start --console" (with a patched
version of virsh that fixes the remaining issues there, too).

Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230816210743.1319018-1-thuth@redhat.com>
(cherry picked from commit 4f7689f081)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: use TFR() instead of RETRY_ON_EINTR() before v7.2.0-538-g8b6aa69365)
2023-10-21 14:04:51 +03:00
Laszlo Ersek
e12abe5a7d hw/display/ramfb: plug slight guest-triggerable leak on mode setting
The fw_cfg DMA write callback in ramfb prepares a new display surface in
QEMU; this new surface is put to use ("swapped in") upon the next display
update. At that time, the old surface (if any) is released.

If the guest triggers the fw_cfg DMA write callback at least twice between
two adjacent display updates, then the second callback (and further such
callbacks) will leak the previously prepared (but not yet swapped in)
display surface.

The issue can be shown by:

(1) starting QEMU with "-trace displaysurface_free", and

(2) running the following program in the guest UEFI shell:

> #include <Library/ShellCEntryLib.h>           // ShellAppMain()
> #include <Library/UefiBootServicesTableLib.h> // gBS
> #include <Protocol/GraphicsOutput.h>          // EFI_GRAPHICS_OUTPUT_PROTOCOL
>
> INTN
> EFIAPI
> ShellAppMain (
>   IN UINTN   Argc,
>   IN CHAR16  **Argv
>   )
> {
>   EFI_STATUS                    Status;
>   VOID                          *Interface;
>   EFI_GRAPHICS_OUTPUT_PROTOCOL  *Gop;
>   UINT32                        Mode;
>
>   Status = gBS->LocateProtocol (
>                   &gEfiGraphicsOutputProtocolGuid,
>                   NULL,
>                   &Interface
>                   );
>   if (EFI_ERROR (Status)) {
>     return 1;
>   }
>
>   Gop = Interface;
>
>   Mode = 1;
>   for ( ; ;) {
>     Status = Gop->SetMode (Gop, Mode);
>     if (EFI_ERROR (Status)) {
>       break;
>     }
>
>     Mode = 1 - Mode;
>   }
>
>   return 1;
> }

The symptom is then that:

- only one trace message appears periodically,

- the time between adjacent messages keeps increasing -- implying that
  some list structure (containing the leaked resources) keeps growing,

- the "surface" pointer is ever different.

> 18566@1695127471.449586:displaysurface_free surface=0x7f2fcc09a7c0
> 18566@1695127471.529559:displaysurface_free surface=0x7f2fcc9dac10
> 18566@1695127471.659812:displaysurface_free surface=0x7f2fcc441dd0
> 18566@1695127471.839669:displaysurface_free surface=0x7f2fcc0363d0
> 18566@1695127472.069674:displaysurface_free surface=0x7f2fcc413a80
> 18566@1695127472.349580:displaysurface_free surface=0x7f2fcc09cd00
> 18566@1695127472.679783:displaysurface_free surface=0x7f2fcc1395f0
> 18566@1695127473.059848:displaysurface_free surface=0x7f2fcc1cae50
> 18566@1695127473.489724:displaysurface_free surface=0x7f2fcc42fc50
> 18566@1695127473.969791:displaysurface_free surface=0x7f2fcc45dcc0
> 18566@1695127474.499708:displaysurface_free surface=0x7f2fcc70b9d0
> 18566@1695127475.079769:displaysurface_free surface=0x7f2fcc82acc0
> 18566@1695127475.709941:displaysurface_free surface=0x7f2fcc369c00
> 18566@1695127476.389619:displaysurface_free surface=0x7f2fcc32b910
> 18566@1695127477.119772:displaysurface_free surface=0x7f2fcc0d5a20
> 18566@1695127477.899517:displaysurface_free surface=0x7f2fcc086c40
> 18566@1695127478.729962:displaysurface_free surface=0x7f2fccc72020
> 18566@1695127479.609839:displaysurface_free surface=0x7f2fcc185160
> 18566@1695127480.539688:displaysurface_free surface=0x7f2fcc23a7e0
> 18566@1695127481.519759:displaysurface_free surface=0x7f2fcc3ec870
> 18566@1695127482.549930:displaysurface_free surface=0x7f2fcc634960
> 18566@1695127483.629661:displaysurface_free surface=0x7f2fcc26b140
> 18566@1695127484.759987:displaysurface_free surface=0x7f2fcc321700
> 18566@1695127485.940289:displaysurface_free surface=0x7f2fccaad100

We figured this wasn't a CVE-worthy problem, as only small amounts of
memory were leaked (the framebuffer itself is mapped from guest RAM, QEMU
only allocates administrative structures), plus libvirt restricts QEMU
memory footprint anyway, thus the guest can only DoS itself.

Plug the leak, by releasing the last prepared (not yet swapped in) display
surface, if any, in the fw_cfg DMA write callback.

Regarding the "reproducer", with the fix in place, the log is flooded with
trace messages (one per fw_cfg write), *and* the trace message alternates
between just two "surface" pointer values (i.e., nothing is leaked, the
allocator flip-flops between two objects in effect).

This issue appears to date back to the introducion of ramfb (995b30179b,
"hw/display: add ramfb, a simple boot framebuffer living in guest ram",
2018-06-18).

Cc: Gerd Hoffmann <kraxel@redhat.com> (maintainer:ramfb)
Cc: qemu-stable@nongnu.org
Fixes: 995b30179b
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-ID: <20230919131955.27223-1-lersek@redhat.com>
(cherry picked from commit e0288a7784)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-10-05 08:44:37 +03:00
Paolo Bonzini
755cf29ead target/i386: fix memory operand size for CVTPS2PD
CVTPS2PD only loads a half-register for memory, unlike the other
operations under 0x0F 0x5A.  "Unpack" the group into separate
emission functions instead of using gen_unary_fp_sse.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit abd41884c5)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-10-04 17:58:16 +03:00
Paolo Bonzini
796468c24a target/i386: generalize operand size "ph" for use in CVTPS2PD
CVTPS2PD only loads a half-register for memory, like CVTPH2PS.  It can
reuse the "ph" packed half-precision size to load a half-register,
but rename it to "xh" because it is now a variation of "x" (it is not
used only for half-precision values).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit a48b26978a)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-10-04 17:58:16 +03:00
Ricky Zhou
327d65ea97 target/i386: Fix exception classes for MOVNTPS/MOVNTPD.
Before this change, MOVNTPS and MOVNTPD were labeled as Exception Class
4 (only requiring alignment for legacy SSE instructions). This changes
them to Exception Class 1 (always requiring memory alignment), as
documented in the Intel manual.
Message-Id: <20230501111428.95998-3-ricky@rzhou.org>

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 8bf171c2d1)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-10-04 17:58:06 +03:00
Ricky Zhou
72ef83b12f target/i386: Fix exception classes for SSE/AVX instructions.
Fix the exception classes for some SSE/AVX instructions to match what is
documented in the Intel manual.

These changes are expected to have no functional effect on the behavior
that qemu implements (primarily >= 16-byte memory alignment checks). For
instance, since qemu does not implement the AC flag, there is no
difference in behavior between Exception Classes 4 and 5 for
instructions where the SSE version only takes <16 byte memory operands.
Message-Id: <20230501111428.95998-2-ricky@rzhou.org>

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit cab529b0dc)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-10-04 17:58:06 +03:00
Ricky Zhou
cebd957e7b target/i386: Fix and add some comments next to SSE/AVX instructions.
Adds some comments describing what instructions correspond to decoding
table entries and fixes some existing comments which named the wrong
instruction.
Message-Id: <20230501111428.95998-1-ricky@rzhou.org>

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit afa94dabc5)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-10-04 17:57:04 +03:00
Paolo Bonzini
91fa137979 tests/tcg/i386: correct mask for VPERM2F128/VPERM2I128
The instructions also use bits 3 and 7 of their 8-byte immediate.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 9e65829699)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-10-04 17:56:26 +03:00
Paolo Bonzini
c9a2d122bf target/i386: fix operand size of unary SSE operations
VRCPSS, VRSQRTSS and VCVTSx2Sx have a 32-bit or 64-bit memory operand,
which is represented in the decoding tables by X86_VEX_REPScalar.  Add it
to the tables, and make validate_vex() handle the case of an instruction
that is in exception type 4 without the REP prefix and exception type 5
with it; this is the cas of VRCP and VRSQRT.

Reported-by: yongwoo <https://gitlab.com/yongwoo36>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1377
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 3d304620ec)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-10-04 17:41:04 +03:00
Mark Cave-Ayland
8da0da8799 scsi-disk: ensure that FORMAT UNIT commands are terminated
Otherwise when a FORMAT UNIT command is issued, the SCSI layer can become
confused because it can find itself in the situation where it thinks there
is still data to be transferred which can cause the next emulated SCSI
command to fail.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Fixes: 6ab71761 ("scsi-disk: add FORMAT UNIT command")
Tested-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20230913204410.65650-4-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit be2b619a17)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-10-03 18:25:00 +03:00
Mark Cave-Ayland
fa2f090d62 esp: restrict non-DMA transfer length to that of available data
In the case where a SCSI layer transfer is incorrectly terminated, it is
possible for a TI command to cause a SCSI buffer overflow due to the
expected transfer data length being less than the available data in the
FIFO. When this occurs the unsigned async_len variable underflows and
becomes a large offset which writes past the end of the allocated SCSI
buffer.

Restrict the non-DMA transfer length to be the smallest of the expected
transfer length and the available FIFO data to ensure that it is no longer
possible for the SCSI buffer overflow to occur.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1810
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20230913204410.65650-3-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 77668e4b9b)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-10-03 18:25:00 +03:00
Mark Cave-Ayland
9b7feb87d0 esp: use correct type for esp_dma_enable() in sysbus_esp_gpio_demux()
The call to esp_dma_enable() was being made with the SYSBUS_ESP type instead of
the ESP type. This meant that when GPIO 1 was being used to trigger a DMA
request from an external DMA controller, the setting of ESPState's dma_enabled
field would clobber unknown memory whilst the dma_cb callback pointer would
typically return NULL so the DMA request would never start.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20230913204410.65650-2-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit b86dc5cb0b)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-10-03 18:24:12 +03:00
Fabiano Rosas
ae0b40d9d9 optionrom: Remove build-id section
Our linker script for optionroms specifies only the placement of the
.text section, leaving the linker free to place the remaining sections
at arbitrary places in the file.

Since at least binutils 2.39, the .note.gnu.build-id section is now
being placed at the start of the file, which causes label addresses to
be shifted. For linuxboot_dma.bin that means that the PnP header
(among others) will not be found when determining the type of ROM at
optionrom_setup():

(0x1c is the label _pnph, where the magic "PnP" is)

$ xxd /usr/share/qemu/linuxboot_dma.bin | grep "PnP"
00000010: 0000 0000 0000 0000 0000 1c00 2450 6e50  ............$PnP

$ xxd pc-bios/optionrom/linuxboot_dma.bin | grep "PnP"
00000010: 0000 0000 0000 0000 0000 4c00 2450 6e50  ............$PnP
                                   ^bad

Using a freshly built linuxboot_dma.bin ROM results in a broken boot:

  SeaBIOS (version rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org)
  Booting from Hard Disk...
  Boot failed: could not read the boot disk

  Booting from Floppy...
  Boot failed: could not read the boot disk

  No bootable device.

We're not using the build-id section, so pass the --build-id=none
option to the linker to remove it entirely.

Note: In theory, this same issue could happen with any other
section. The ideal solution would be to have all unused sections
discarded in the linker script. However that would be a larger change,
specially for the pvh rom which uses the .bss and COMMON sections so
I'm addressing only the immediate issue here.

Reported-by: Vasiliy Ulyanov <vulyanov@suse.de>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20230926192502.15986-1-farosas@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 35ed01ba54)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(mjt: remove unrelated stable@vger)
2023-10-03 18:21:41 +03:00
Paolo Bonzini
97037995c1 ui/vnc: fix handling of VNC_FEATURE_XVP
VNC_FEATURE_XVP was not shifted left before adding it to vs->features,
so it was never enabled; but it was also checked the wrong way with
a logical AND instead of vnc_has_feature.  Fix both places.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 477b301000)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-09-28 07:34:34 +03:00
Paolo Bonzini
0732512dbb ui/vnc: fix debug output for invalid audio message
The debug message was cut and pasted from the invalid audio format
case, but the audio message is at bytes 2-3.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 0cb9c5880e)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-09-28 07:33:33 +03:00
Thomas Huth
66aa4b1b4b hw/scsi/scsi-disk: Disallow block sizes smaller than 512 [CVE-2023-42467]
We are doing things like

    nb_sectors /= (s->qdev.blocksize / BDRV_SECTOR_SIZE);

in the code here (e.g. in scsi_disk_emulate_mode_sense()), so if
the blocksize is smaller than BDRV_SECTOR_SIZE (=512), this crashes
with a division by 0 exception. Thus disallow block sizes of 256
bytes to avoid this situation.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1813
CVE: 2023-42467
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20230925091854.49198-1-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 7cfcc79b0a)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-09-28 07:32:23 +03:00
Nicholas Piggin
e6fdfb8433 accel/tcg: mttcg remove false-negative halted assertion
mttcg asserts that an execution ending with EXCP_HALTED must have
cpu->halted. However between the event or instruction that sets
cpu->halted and requests exit and the assertion here, an
asynchronous event could clear cpu->halted.

This leads to crashes running AIX on ppc/pseries because it uses
H_CEDE/H_PROD hcalls, where H_CEDE sets self->halted = 1 and
H_PROD sets other cpu->halted = 0 and kicks it.

H_PROD could be turned into an interrupt to wake, but several other
places in ppc, sparc, and semihosting follow what looks like a similar
pattern setting halted = 0 directly. So remove this assertion.

Reported-by: Ivan Warren <ivan@vmfacility.fr>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20230829010658.8252-1-npiggin@gmail.com>
[rth: Keep the case label and adjust the comment.]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit 0e5903436d)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-09-27 21:39:52 +03:00
Peter Maydell
c8d2fc2177 target/arm: Don't skip MTE checks for LDRT/STRT at EL0
The LDRT/STRT "unprivileged load/store" instructions behave like
normal ones if executed at EL0. We handle this correctly for
the load/store semantics, but get the MTE checking wrong.

We always look at s->mte_active[is_unpriv] to see whether we should
be doing MTE checks, but in hflags.c when we set the TB flags that
will be used to fill the mte_active[] array we only set the
MTE0_ACTIVE bit if UNPRIV is true (i.e.  we are not at EL0).

This means that a LDRT at EL0 will see s->mte_active[1] as 0,
and will not do MTE checks even when MTE is enabled.

To avoid the translate-time code having to do an explicit check on
s->unpriv to see if it is OK to index into the mte_active[] array,
duplicate MTE_ACTIVE into MTE0_ACTIVE when UNPRIV is false.

(This isn't a very serious bug because generally nobody executes
LDRT/STRT at EL0, because they have no use there.)

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230912140434.1333369-2-peter.maydell@linaro.org
(cherry picked from commit 903dbefc2b)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: before v7.2.0-1636-g671efad16a this code was in target/arm/helper.c)
2023-09-25 23:43:49 +03:00
Li Zhijian
79f113448a hw/cxl: Fix CFMW config memory leak
Allocate targets and targets[n] resources when all sanity checks are
passed to avoid memory leaks.

Cc: qemu-stable@nongnu.org
Suggested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(cherry picked from commit 7b165fa164)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-09-25 23:43:49 +03:00
Mikulas Patocka
19bbe3a6e9 linux-user/hppa: lock both words of function descriptor
The code in setup_rt_frame reads two words at haddr, but locks only one.
This patch fixes it to lock both.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Helge Deller <deller@gmx.de>
Cc: qemu-stable@nongnu.org
Signed-off-by: Helge Deller <deller@gmx.de>
(cherry picked from commit 5b1270ef14)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-09-25 23:43:49 +03:00
Mikulas Patocka
96783cfe1e linux-user/hppa: clear the PSW 'N' bit when delivering signals
qemu-hppa may crash when delivering a signal. It can be demonstrated with
this program. Compile the program with "hppa-linux-gnu-gcc -O2 signal.c"
and run it with "qemu-hppa -one-insn-per-tb a.out". It reports that the
address of the flag is 0xb4 and it crashes when attempting to touch it.

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <signal.h>

sig_atomic_t flag;

void sig(int n)
{
	printf("&flag: %p\n", &flag);
	flag = 1;
}

int main(void)
{
	struct sigaction sa;
	struct itimerval it;

	sa.sa_handler = sig;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = SA_RESTART;
	if (sigaction(SIGALRM, &sa, NULL)) perror("sigaction"), exit(1);

	it.it_interval.tv_sec = 0;
	it.it_interval.tv_usec = 100;
	it.it_value.tv_sec = it.it_interval.tv_sec;
	it.it_value.tv_usec = it.it_interval.tv_usec;

	if (setitimer(ITIMER_REAL, &it, NULL)) perror("setitimer"), exit(1);

	while (1) {
	}
}

The reason for the crash is that the signal handling routine doesn't clear
the 'N' flag in the PSW. If the signal interrupts a thread when the 'N'
flag is set, the flag remains set at the beginning of the signal handler
and the first instruction of the signal handler is skipped.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Helge Deller <deller@gmx.de>
Cc: qemu-stable@nongnu.org
Signed-off-by: Helge Deller <deller@gmx.de>
(cherry picked from commit 2529497cb6)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-09-25 23:43:49 +03:00
Nicholas Piggin
270bf7977f hw/ppc: Always store the decrementer value
When writing a value to the decrementer that raises an exception, the
irq is raised, but the value is not stored so the store doesn't appear
to have changed the register when it is read again.

Always store the write value to the register.

Fixes: e81a982aa5 ("PPC: Clean up DECR implementation")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
(cherry picked from commit febb71d543)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-09-25 23:43:49 +03:00
Nicholas Piggin
e916d29608 target/ppc: Decrementer fix BookE semantics
The decrementer store function has logic that short-cuts the timer if a
very small value is stored (0, 1, or 2) and raises an interrupt
directly. There are two problem with this on BookE.

First is that BookE says a decrementer interrupt should not be raised
on a store of 0, only of a decrement from 1. Second is that raising
the irq directly will bypass the auto-reload logic in the booke decr
timer function, breaking autoreload when 1 or 2 is stored.

Fix this by removing that small-value special case. It makes this
tricky logic even more difficult to reason about, and it hardly matters
for performance.

Cc: sdicaro@DDCI.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20230530131214.373524-2-npiggin@gmail.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
(cherry picked from commit 17dd1354c1)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-09-25 23:43:49 +03:00
Nicholas Piggin
8cfe6d241b target/ppc: Sign-extend large decrementer to 64-bits
When storing a large decrementer value with the most significant
implemented bit set, it is to be treated as a negative and sign
extended.

This isn't hit for book3s DEC because of another bug, fixing it
in the next patch exposes this one and can cause additional
problems, so fix this first. It can be hit with HDECR and other
edge triggered types.

Fixes: a8dafa5251 ("target/ppc: Implement large decrementer support for TCG")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[ clg: removed extra cpu and pcc variables shadowing local variables ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
(cherry picked from commit c8fbc6b9f2)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-09-25 23:43:49 +03:00
Nicholas Piggin
5eadeeec0d hw/ppc: Avoid decrementer rounding errors
The decrementer register contains a relative time in timebase units.
When writing to DECR this is converted and stored as an absolute value
in nanosecond units, reading DECR converts back to relative timebase.

The tb<->ns conversion of the relative part can cause rounding such that
a value writen to the decrementer can read back a different, with time
held constant. This is a particular problem for a deterministic icount
and record-replay trace.

Fix this by storing the absolute value in timebase units rather than
nanoseconds. The math before:
  store:  decr_next = now_ns + decr * ns_per_sec / tb_per_sec
  load:        decr = (decr_next - now_ns) * tb_per_sec / ns_per_sec
  load(store): decr = decr * ns_per_sec / tb_per_sec * tb_per_sec /
                      ns_per_sec

After:
  store:  decr_next = now_ns * tb_per_sec / ns_per_sec + decr
  load:        decr = decr_next - now_ns * tb_per_sec / ns_per_sec
  load(store): decr = decr

Fixes: 9fddaa0c0c ("PowerPC merge: real time TB and decrementer - faster and simpler exception handling (Jocelyn Mayer)")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
(cherry picked from commit 8e0a5ac878)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-09-25 23:43:49 +03:00
Nicholas Piggin
b9a0f1194a hw/ppc: Round up the decrementer interval when converting to ns
The rule of timers is typically that they should never expire before the
timeout, but some time afterward. Rounding timer intervals up when doing
conversion is the right thing to do.

Under most circumstances it is impossible observe the decrementer
interrupt before the dec register has triggered. However with icount
timing, problems can arise. For example setting DEC to 0 can schedule
the timer for now, causing it to fire before any more instructions
have been executed and DEC is still 0.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
(cherry picked from commit eab0888418)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-09-25 23:43:49 +03:00