Commit Graph

108721 Commits

Author SHA1 Message Date
Richard Henderson
adc8467e69 host/include/loongarch64: Add atomic16 load and store
While loongarch64 does not have a 128-bit cmpxchg, it does
have 128-bit atomic load and store via the vector unit.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230916220151.526140-6-richard.henderson@linaro.org>
2023-11-06 08:27:21 -08:00
Richard Henderson
f2a553481e tcg/loongarch64: Use cpuinfo.h
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Jiajie Chen <c@jia.je>
Message-Id: <20230916220151.526140-5-richard.henderson@linaro.org>
2023-11-06 08:27:21 -08:00
Richard Henderson
0885f1221e util: Add cpuinfo for loongarch64
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Jiajie Chen <c@jia.je>
Message-Id: <20230916220151.526140-4-richard.henderson@linaro.org>
2023-11-06 08:27:21 -08:00
Richard Henderson
2b2ae0a42e tcg/loongarch64: Use C_N2_I1 for INDEX_op_qemu_ld_a*_i128
Use new registers for the output, so that we never overlap
the input address, which could happen for user-only.
This avoids a "tmp = addr + 0" in that case.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Jiajie Chen <c@jia.je>
Message-Id: <20230916220151.526140-3-richard.henderson@linaro.org>
2023-11-06 08:27:21 -08:00
Richard Henderson
fa645b48d3 tcg: Add C_N2_I1
Constraint with two outputs, both in new registers.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Jiajie Chen <c@jia.je>
Message-Id: <20230916220151.526140-2-richard.henderson@linaro.org>
2023-11-06 08:27:21 -08:00
Richard Henderson
24a4d59aa7 accel/tcg: Move HMP info jit and info opcount code
Move all of it into accel/tcg/monitor.c.  This puts everything
about tcg that is only used by the monitor in the same place.

Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-11-06 08:27:21 -08:00
Naohiro Aota
ad4feaca61 file-posix: fix over-writing of returning zone_append offset
raw_co_zone_append() sets "s->offset" where "BDRVRawState *s". This pointer
is used later at raw_co_prw() to save the block address where the data is
written.

When multiple IOs are on-going at the same time, a later IO's
raw_co_zone_append() call over-writes a former IO's offset address before
raw_co_prw() completes. As a result, the former zone append IO returns the
initial value (= the start address of the writing zone), instead of the
proper address.

Fix the issue by passing the offset pointer to raw_co_prw() instead of
passing it through s->offset. Also, remove "offset" from BDRVRawState as
there is no usage anymore.

Fixes: 4751d09adc ("block: introduce zone append write for zoned devices")
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Message-Id: <20231030073853.2601162-1-naohiro.aota@wdc.com>
Reviewed-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-11-06 16:15:07 +01:00
Sam Li
10b9e0802a block/file-posix: fix update_zones_wp() caller
When the zoned request fail, it needs to update only the wp of
the target zones for not disrupting the in-flight writes on
these other zones. The wp is updated successfully after the
request completes.

Fixed the callers with right offset and nr_zones.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Message-Id: <20230825040556.4217-1-faithilikerun@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
[hreitz: Rebased and fixed comment spelling]
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-11-06 16:15:07 +01:00
Jean-Louis Dupond
b2b109041e qcow2: keep reference on zeroize with discard-no-unref enabled
When the discard-no-unref flag is enabled, we keep the reference for
normal discard requests.
But when a discard is executed on a snapshot/qcow2 image with backing,
the discards are saved as zero clusters in the snapshot image.

When committing the snapshot to the backing file, not
discard_in_l2_slice is called but zero_in_l2_slice. Which did not had
any logic to keep the reference when discard-no-unref is enabled.

Therefor we add logic in the zero_in_l2_slice call to keep the reference
on commit.

Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1621
Signed-off-by: Jean-Louis Dupond <jean-louis@dupond.be>
Message-Id: <20231003125236.216473-2-jean-louis@dupond.be>
[hreitz: Made the documentation change more verbose, as discussed
         on-list]
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-11-06 16:15:07 +01:00
Peter Maydell
5722fc4712 target/arm: Fix A64 LDRA immediate decode
In commit be23a049 in the conversion to decodetree we broke the
decoding of the immediate value in the LDRA instruction.  This should
be a 10 bit signed value that is scaled by 8, but in the conversion
we incorrectly ended up scaling it only by 2.  Fix the scaling
factor.

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1970
Fixes: be23a049 ("target/arm: Convert load (pointer auth) insns to decodetree")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20231106113445.1163063-1-peter.maydell@linaro.org
2023-11-06 15:00:29 +00:00
Peter Maydell
13edcf591e hw/arm/vexpress-a9: Remove useless mapping of RAM at address 0
On the vexpress-a9 board we try to map both RAM and flash to address 0,
as seen in "info mtree":

address-space: memory
  0000000000000000-ffffffffffffffff (prio 0, i/o): system
    0000000000000000-0000000003ffffff (prio 0, romd): alias vexpress.flashalias @vexpress.flash0 0000000000000000-0000000003ffffff
    0000000000000000-0000000003ffffff (prio 0, ram): alias vexpress.lowmem @vexpress.highmem 0000000000000000-0000000003ffffff
    0000000010000000-0000000010000fff (prio 0, i/o): arm-sysctl
    0000000010004000-0000000010004fff (prio 0, i/o): pl041
(etc)

The flash "wins" and the RAM mapping is useless (but also harmless).

This happened as a result of commit 6ec1588e in 2014, which changed
"we always map the RAM to the low addresses for vexpress-a9" to "we
always map flash in the low addresses", but forgot to stop mapping
the RAM.

In real hardware, this low part of memory is remappable, both at
runtime by the guest writing to a control register, and configurably
as to what you get out of reset -- you can have the first flash
device, or the second, or the DDR2 RAM, or the external AXI bus
(which for QEMU means "nothing there").  In an ideal world we would
support that remapping both at runtime and via a machine property to
select the out-of-reset behaviour.

Pending anybody caring enough to implement the full remapping
behaviour:
 * remove the useless mapped-but-inaccessible lowram MR
 * document that QEMU doesn't support remapping of low memory

Fixes: 6ec1588e ("hw/arm/vexpress: Alias NOR flash at 0 for vexpress-a9")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1761
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20231103185602.875849-1-peter.maydell@linaro.org
2023-11-06 15:00:29 +00:00
Vladimir Sementsov-Ogievskiy
35bafa95da io/channel-socket: qio_channel_socket_flush(): improve msg validation
For SO_EE_ORIGIN_ZEROCOPY the 32-bit notification range is encoded
as [ee_info, ee_data] inclusively, so ee_info should be less or
equal to ee_data.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
Message-id: 20231017125941.810461-7-vsementsov@yandex-team.ru
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2023-11-06 15:00:28 +00:00
Vladimir Sementsov-Ogievskiy
59a3aff685 hw/core/loader: gunzip(): initialize z_stream
Coverity signals that variable as being used uninitialized. And really,
when work with external APIs that's better to zero out the structure,
where we set some fields by hand.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
Message-id: 20231017125941.810461-6-vsementsov@yandex-team.ru
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2023-11-06 15:00:28 +00:00
Vladimir Sementsov-Ogievskiy
cc8fb0c3ae block/nvme: nvme_process_completion() fix bound for cid
NVMeQueuePair::reqs has length NVME_NUM_REQS, which less than
NVME_QUEUE_SIZE by 1.

Fixes: 1086e95da1 ("block/nvme: switch to a NVMeRequest freelist")
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
Message-id: 20231017125941.810461-5-vsementsov@yandex-team.ru
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2023-11-06 15:00:28 +00:00
Vladimir Sementsov-Ogievskiy
394bca2fa4 mc146818rtc: rtc_set_time(): initialize tm to zeroes
set_time() function doesn't set all the fields, so it's better to
initialize tm structure. And Coverity will be happier about it.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
Message-id: 20231017125941.810461-4-vsementsov@yandex-team.ru
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2023-11-06 15:00:27 +00:00
Vladimir Sementsov-Ogievskiy
2e12dd405c util/filemonitor-inotify: qemu_file_monitor_watch(): assert no overflow
Prefer clear assertions instead of [im]possible array overflow.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
Message-id: 20231017125941.810461-3-vsementsov@yandex-team.ru
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2023-11-06 15:00:27 +00:00
Vladimir Sementsov-Ogievskiy
212c5fe191 hw/i386/intel_iommu: vtd_slpte_nonzero_rsvd(): assert no overflow
We support only 3- and 4-level page-tables, which is firstly checked in
vtd_decide_config(), then setup in vtd_init(). Than level fields are
checked by vtd_is_level_supported().

So here we can't have level out from 1..4 inclusive range. Let's assert
it. That also explains Coverity that we are not going to overflow the
array.

CID: 1487158, 1487186
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
Message-id: 20231017125941.810461-2-vsementsov@yandex-team.ru
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2023-11-06 15:00:27 +00:00
Peter Maydell
806f71eecf tests/qtest/bios-tables-test: Update virt SPCR and DBG2 golden references
Update the virt SPCR and DBG2 golden reference files to have the
fix for the description of the UART.

Diffs from iasl:

@@ -1,57 +1,57 @@
 /*
  * Intel ACPI Component Architecture
  * AML/ASL+ Disassembler version 20200925 (64-bit version)
  * Copyright (c) 2000 - 2020 Intel Corporation
  *
- * Disassembly of tests/data/acpi/virt/SPCR, Fri Nov  3 14:12:06 2023
+ * Disassembly of /tmp/aml-E6YUD2, Fri Nov  3 14:12:06 2023
  *
  * ACPI Data Table [SPCR]
  *
  * Format: [HexOffset DecimalOffset ByteLength]  FieldName : FieldValue
  */

 [000h 0000   4]                    Signature : "SPCR"    [Serial Port Console Redirection table]
 [004h 0004   4]                 Table Length : 00000050
 [008h 0008   1]                     Revision : 02
-[009h 0009   1]                     Checksum : CB
+[009h 0009   1]                     Checksum : B1
 [00Ah 0010   6]                       Oem ID : "BOCHS "
 [010h 0016   8]                 Oem Table ID : "BXPC    "
 [018h 0024   4]                 Oem Revision : 00000001
 [01Ch 0028   4]              Asl Compiler ID : "BXPC"
 [020h 0032   4]        Asl Compiler Revision : 00000001

 [024h 0036   1]               Interface Type : 03
 [025h 0037   3]                     Reserved : 000000

 [028h 0040  12]         Serial Port Register : [Generic Address Structure]
 [028h 0040   1]                     Space ID : 00 [SystemMemory]
-[029h 0041   1]                    Bit Width : 08
+[029h 0041   1]                    Bit Width : 20
 [02Ah 0042   1]                   Bit Offset : 00
-[02Bh 0043   1]         Encoded Access Width : 01 [Byte Access:8]
+[02Bh 0043   1]         Encoded Access Width : 03 [DWord Access:32]
 [02Ch 0044   8]                      Address : 0000000009000000

 [034h 0052   1]               Interrupt Type : 08
 [035h 0053   1]          PCAT-compatible IRQ : 00
 [036h 0054   4]                    Interrupt : 00000021
 [03Ah 0058   1]                    Baud Rate : 03
 [03Bh 0059   1]                       Parity : 00
 [03Ch 0060   1]                    Stop Bits : 01
 [03Dh 0061   1]                 Flow Control : 02
 [03Eh 0062   1]                Terminal Type : 00
 [04Ch 0076   1]                     Reserved : 00
 [040h 0064   2]                PCI Device ID : FFFF
 [042h 0066   2]                PCI Vendor ID : FFFF
 [044h 0068   1]                      PCI Bus : 00
 [045h 0069   1]                   PCI Device : 00
 [046h 0070   1]                 PCI Function : 00
 [047h 0071   4]                    PCI Flags : 00000000
 [04Bh 0075   1]                  PCI Segment : 00
 [04Ch 0076   4]                     Reserved : 00000000

 Raw Table Data: Length 80 (0x50)

-    0000: 53 50 43 52 50 00 00 00 02 CB 42 4F 43 48 53 20  // SPCRP.....BOCHS
+    0000: 53 50 43 52 50 00 00 00 02 B1 42 4F 43 48 53 20  // SPCRP.....BOCHS
     0010: 42 58 50 43 20 20 20 20 01 00 00 00 42 58 50 43  // BXPC    ....BXPC
-    0020: 01 00 00 00 03 00 00 00 00 08 00 01 00 00 00 09  // ................
+    0020: 01 00 00 00 03 00 00 00 00 20 00 03 00 00 00 09  // ......... ......
     0030: 00 00 00 00 08 00 21 00 00 00 03 00 01 02 00 00  // ......!.........
     0040: FF FF FF FF 00 00 00 00 00 00 00 00 00 00 00 00  // ................

@@ -1,57 +1,57 @@
 /*
  * Intel ACPI Component Architecture
  * AML/ASL+ Disassembler version 20200925 (64-bit version)
  * Copyright (c) 2000 - 2020 Intel Corporation
  *
- * Disassembly of tests/data/acpi/virt/DBG2, Fri Nov  3 14:12:06 2023
+ * Disassembly of /tmp/aml-V1YUD2, Fri Nov  3 14:12:06 2023
  *
  * ACPI Data Table [DBG2]
  *
  * Format: [HexOffset DecimalOffset ByteLength]  FieldName : FieldValue
  */

 [000h 0000   4]                    Signature : "DBG2"    [Debug Port table type 2]
 [004h 0004   4]                 Table Length : 00000057
 [008h 0008   1]                     Revision : 00
-[009h 0009   1]                     Checksum : CF
+[009h 0009   1]                     Checksum : B5
 [00Ah 0010   6]                       Oem ID : "BOCHS "
 [010h 0016   8]                 Oem Table ID : "BXPC    "
 [018h 0024   4]                 Oem Revision : 00000001
 [01Ch 0028   4]              Asl Compiler ID : "BXPC"
 [020h 0032   4]        Asl Compiler Revision : 00000001

 [024h 0036   4]                  Info Offset : 0000002C
 [028h 0040   4]                   Info Count : 00000001

 [02Ch 0044   1]                     Revision : 00
 [02Dh 0045   2]                       Length : 002B
 [02Fh 0047   1]               Register Count : 01
 [030h 0048   2]              Namepath Length : 0005
 [032h 0050   2]              Namepath Offset : 0026
 [034h 0052   2]              OEM Data Length : 0000 [Optional field not present]
 [036h 0054   2]              OEM Data Offset : 0000 [Optional field not present]
 [038h 0056   2]                    Port Type : 8000
 [03Ah 0058   2]                 Port Subtype : 0003
 [03Ch 0060   2]                     Reserved : 0000
 [03Eh 0062   2]          Base Address Offset : 0016
 [040h 0064   2]          Address Size Offset : 0022

 [042h 0066  12]        Base Address Register : [Generic Address Structure]
 [042h 0066   1]                     Space ID : 00 [SystemMemory]
-[043h 0067   1]                    Bit Width : 08
+[043h 0067   1]                    Bit Width : 20
 [044h 0068   1]                   Bit Offset : 00
-[045h 0069   1]         Encoded Access Width : 01 [Byte Access:8]
+[045h 0069   1]         Encoded Access Width : 03 [DWord Access:32]
 [046h 0070   8]                      Address : 0000000009000000

 [04Eh 0078   4]                 Address Size : 00001000

 [052h 0082   5]                     Namepath : "COM0"

 Raw Table Data: Length 87 (0x57)

-    0000: 44 42 47 32 57 00 00 00 00 CF 42 4F 43 48 53 20  // DBG2W.....BOCHS
+    0000: 44 42 47 32 57 00 00 00 00 B5 42 4F 43 48 53 20  // DBG2W.....BOCHS
     0010: 42 58 50 43 20 20 20 20 01 00 00 00 42 58 50 43  // BXPC    ....BXPC
     0020: 01 00 00 00 2C 00 00 00 01 00 00 00 00 2B 00 01  // ....,........+..
     0030: 05 00 26 00 00 00 00 00 00 80 03 00 00 00 16 00  // ..&.............
-    0040: 22 00 00 08 00 01 00 00 00 09 00 00 00 00 00 10  // "...............
+    0040: 22 00 00 20 00 03 00 00 00 09 00 00 00 00 00 10  // ".. ............
     0050: 00 00 43 4F 4D 30 00                             // ..COM0.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2023-11-06 15:00:27 +00:00
Udo Steinberg
41f7b58b63 hw/arm/virt: Report correct register sizes in ACPI DBG2/SPCR tables.
Documentation for using the GAS in ACPI tables to report debug UART addresses at
https://learn.microsoft.com/en-us/windows-hardware/drivers/bringup/acpi-debug-port-table
states the following:

- The Register Bit Width field contains the register stride and must be a
  power of 2 that is at least as large as the access size.  On 32-bit
  platforms this value cannot exceed 32.  On 64-bit platforms this value
  cannot exceed 64.
- The Access Size field is used to determine whether byte, WORD, DWORD, or
  QWORD accesses are to be used.  QWORD accesses are only valid on 64-bit
  architectures.

Documentation for the ARM PL011 at
https://developer.arm.com/documentation/ddi0183/latest/
states that the registers are:

- spaced 4 bytes apart (see Table 3-2), so register stride must be 32.
- 16 bits in size in some cases (see individual registers), so access
  size must be at least 2.

Linux doesn't seem to care about this error in the table, but it does
affect at least the NOVA microhypervisor.

In theory we therefore have a choice between reporting the access
size as 2 (16 bit accesses) or 3 (32-bit accesses).  In practice,
Linux does not correctly handle the case where the table reports the
access size as 2: as of kernel commit 750b95887e5678, the code in
acpi_parse_spcr() tries to tell the serial driver to use 16 bit
accesses by passing "mmio16" in the option string, but the PL011
driver code in pl011_console_match() only recognizes "mmio" or
"mmio32". The result is that unless the user has enabled 'earlycon'
there is no console output from the guest kernel.

We therefore choose to report the access size as 32 bits; this works
for NOVA and also for Linux.  It is also what the UEFI firmware on a
Raspberry Pi 4 reports, so we're in line with existing real-world
practice.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1938
Signed-off-by: Udo Steinberg <udo@hypervisor.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: minor commit message tweaks; use 32 bit accesses]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2023-11-06 15:00:26 +00:00
Peter Maydell
1eb2888ecd tests/qtest/bios-tables-test: Allow changes to virt SPCR and DBG2
Allow changes to the virt board SPCR and DBG2 -- we are going to fix
an error in the UART descriptions there.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2023-11-06 15:00:26 +00:00
Sebastian Ott
fa68ecb330 hw/arm/virt: fix PMU IRQ registration
Since commit 9036e917f8 ("{include/}hw/arm: refactor virt PPI logic")
PMU IRQ registration fails for arm64 guests:

[    0.563689] hw perfevents: unable to request IRQ14 for ARM PMU counters
[    0.565160] armv8-pmu: probe of pmu failed with error -22

That commit re-defined VIRTUAL_PMU_IRQ to be a INTID but missed a case
where the PMU IRQ is actually referred by its PPI index. Fix that by using
INTID_TO_PPI() in that case.

Fixes: 9036e917f8 ("{include/}hw/arm: refactor virt PPI logic")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1960
Signed-off-by: Sebastian Ott <sebott@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 475d918d-ab0e-f717-7206-57a5beb28c7b@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2023-11-06 15:00:26 +00:00
Marc-André Lureau
10b9ddbc83 Revert "virtio-gpu: block migration of VMs with blob=true"
If we decide to apply this patch (for easier backporting reasons), we
can now revert it.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
2023-11-06 17:30:01 +04:00
Marc-André Lureau
f66767f75c virtio-gpu: add virtio-gpu/blob vmstate subsection
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
2023-11-06 17:29:54 +04:00
Maciej S. Szmigiero
00313b517d MAINTAINERS: Add an entry for Hyper-V Dynamic Memory Protocol
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
2023-11-06 14:08:10 +01:00
Maciej S. Szmigiero
9a52aa40dc hw/i386/pc: Support hv-balloon
Add the necessary plumbing for the hv-balloon driver to the PC machine.

Co-developed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
2023-11-06 14:08:10 +01:00
Maciej S. Szmigiero
259ebed45a qapi: Add HV_BALLOON_STATUS_REPORT event and its QMP query command
Used by the hv-balloon driver for (optional) guest memory status reports.

Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
2023-11-06 14:08:10 +01:00
Maciej S. Szmigiero
16dff2f9bb qapi: Add query-memory-devices support to hv-balloon
Used by the driver to report its provided memory state information.

Co-developed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
2023-11-06 14:08:10 +01:00
Maciej S. Szmigiero
99a4706ae8 Add Hyper-V Dynamic Memory Protocol driver (hv-balloon) hot-add support
One of advantages of using this protocol over ACPI-based PC DIMM hotplug is
that it allows hot-adding memory in much smaller granularity because the
ACPI DIMM slot limit does not apply.

In order to enable this functionality a new memory backend needs to be
created and provided to the driver via the "memdev" parameter.

This can be achieved by, for example, adding
"-object memory-backend-ram,id=mem1,size=32G" to the QEMU command line and
then instantiating the driver with "memdev=mem1" parameter.

The device will try to use multiple memslots to cover the memory backend in
order to reduce the size of metadata for the not-yet-hot-added part of the
memory backend.

Co-developed-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
2023-11-06 14:08:10 +01:00
Maciej S. Szmigiero
0d9e8c0b67 Add Hyper-V Dynamic Memory Protocol driver (hv-balloon) base
This driver is like virtio-balloon on steroids: it allows both changing the
guest memory allocation via ballooning and (in the next patch) inserting
pieces of extra RAM into it on demand from a provided memory backend.

The actual resizing is done via ballooning interface (for example, via
the "balloon" HMP command).
This includes resizing the guest past its boot size - that is, hot-adding
additional memory in granularity limited only by the guest alignment
requirements, as provided by the next patch.

In contrast with ACPI DIMM hotplug where one can only request to unplug a
whole DIMM stick this driver allows removing memory from guest in single
page (4k) units via ballooning.

After a VM reboot the guest is back to its original (boot) size.

In the future, the guest boot memory size might be changed on reboot
instead, taking into account the effective size that VM had before that
reboot (much like Hyper-V does).

For performance reasons, the guest-released memory is tracked in a few
range trees, as a series of (start, count) ranges.
Each time a new page range is inserted into such tree its neighbors are
checked as candidates for possible merging with it.

Besides performance reasons, the Dynamic Memory protocol itself uses page
ranges as the data structure in its messages, so relevant pages need to be
merged into such ranges anyway.

One has to be careful when tracking the guest-released pages, since the
guest can maliciously report returning pages outside its current address
space, which later clash with the address range of newly added memory.
Similarly, the guest can report freeing the same page twice.

The above design results in much better ballooning performance than when
using virtio-balloon with the same guest: 230 GB / minute with this driver
versus 70 GB / minute with virtio-balloon.

During a ballooning operation most of time is spent waiting for the guest
to come up with newly freed page ranges, processing the received ranges on
the host side (in QEMU and KVM) is nearly instantaneous.

The unballoon operation is also pretty much instantaneous:
thanks to the merging of the ballooned out page ranges 200 GB of memory can
be returned to the guest in about 1 second.
With virtio-balloon this operation takes about 2.5 minutes.

These tests were done against a Windows Server 2019 guest running on a
Xeon E5-2699, after dirtying the whole memory inside guest before each
balloon operation.

Using a range tree instead of a bitmap to track the removed memory also
means that the solution scales well with the guest size: even a 1 TB range
takes just a few bytes of such metadata.

Since the required GTree operations aren't present in every Glib version
a check for them was added to the meson build script, together with new
"--enable-hv-balloon" and "--disable-hv-balloon" configure arguments.
If these GTree operations are missing in the system's Glib version this
driver will be skipped during QEMU build.

An optional "status-report=on" device parameter requests memory status
events from the guest (typically sent every second), which allow the host
to learn both the guest memory available and the guest memory in use
counts.

Following commits will add support for their external emission as
"HV_BALLOON_STATUS_REPORT" QMP events.

The driver is named hv-balloon since the Linux kernel client driver for
the Dynamic Memory Protocol is named as such and to follow the naming
pattern established by the virtio-balloon driver.
The whole protocol runs over Hyper-V VMBus.

The driver was tested against Windows Server 2012 R2, Windows Server 2016
and Windows Server 2019 guests and obeys the guest alignment requirements
reported to the host via DM_CAPABILITIES_REPORT message.

Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
2023-11-06 14:08:10 +01:00
Maciej S. Szmigiero
4f80cd2f03 Add Hyper-V Dynamic Memory Protocol definitions
This commit adds Hyper-V Dynamic Memory Protocol definitions, taken
from hv_balloon Linux kernel driver, adapted to the QEMU coding style and
definitions.

Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
2023-11-06 13:54:57 +01:00
David Hildenbrand
eb1b7c4bd4 memory-device: Drop size alignment check
There is no strong requirement that the size has to be multiples of the
requested alignment, let's drop it. This is a preparation for hv-baloon.

Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
2023-11-06 13:54:57 +01:00
Maciej S. Szmigiero
2d7f108186 Revert "hw/virtio/virtio-pmem: Replace impossible check by assertion"
This reverts commit 5960f254db since the
previous commit made this situation possible again.

Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
2023-11-06 13:53:59 +01:00
Zhenzhong Duan
a2347c60a8 vfio/common: Move vfio_host_win_add/del into spapr.c
Only spapr supports a customed host window list, other vfio driver
assume 64bit host window. So remove the check in listener callback
and move vfio_host_win_add/del into spapr.c and make it static.

With the check removed, we still need to do the same check for
VFIO_SPAPR_TCE_IOMMU which allows a single host window range
[dma32_window_start, dma32_window_size). Move vfio_find_hostwin
into spapr.c and do same check in vfio_container_add_section_window
instead.

When mapping a ram device section, if it's unaligned with
hostwin->iova_pgsizes, this mapping is bypassed. With hostwin
moved into spapr, we changed to check container->pgsizes.

Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-11-06 13:23:23 +01:00
Zhenzhong Duan
a17879f0e2 vfio/spapr: Make vfio_spapr_create/remove_window static
vfio_spapr_create_window calls vfio_spapr_remove_window,
With reoder of definition of the two, we can make
vfio_spapr_create/remove_window static.

No functional changes intended.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-11-06 13:23:23 +01:00
Zhenzhong Duan
770c3b6e43 vfio/container: Move spapr specific init/deinit into spapr.c
Move spapr specific init/deinit code into spapr.c and wrap
them with vfio_spapr_container_init/deinit, this way footprint
of spapr is further reduced, vfio_prereg_listener could also
be made static.

vfio_listener_release is unnecessary when prereg_listener is
moved out, so have it removed.

No functional changes intended.

Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-11-06 13:23:23 +01:00
Zhenzhong Duan
521c8f4ebc vfio/container: Move vfio_container_add/del_section_window into spapr.c
vfio_container_add/del_section_window are spapr specific functions,
so move them into spapr.c to make container.c cleaner.

No functional changes intended.

Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-11-06 13:23:23 +01:00
Zhenzhong Duan
54876d25fe vfio/container: Move IBM EEH related functions into spapr_pci_vfio.c
With vfio_eeh_as_ok/vfio_eeh_as_op moved and made static,
vfio.h becomes empty and is deleted.

No functional changes intended.

Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Acked-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-11-06 13:23:23 +01:00
Marc-André Lureau
0f2a301db3 virtio-gpu: move scanout restoration to post_load
As we are going to introduce an extra subsection for "blob" resources,
scanout have to be restored after.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
2023-11-06 16:13:34 +04:00
Marc-André Lureau
e92ffae6ba virtio-gpu: factor out restore mapping
The same function is going to be used next to restore "blob" resources.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
2023-11-06 16:13:30 +04:00
Marc-André Lureau
9c549ab689 virtio-gpu: block migration of VMs with blob=true
"blob" resources don't have an associated pixman image:

#0  pixman_image_get_stride (image=0x0) at ../pixman/pixman-image.c:921
#1  0x0000562327c25236 in virtio_gpu_save (f=0x56232bb13b00, opaque=0x56232b555a60, size=0, field=0x5623289ab6c8 <__compound_literal.3+104>, vmdesc=0x56232ab59fe0) at ../hw/display/virtio-gpu.c:1225

Related to:
https://bugzilla.redhat.com/show_bug.cgi?id=2236353

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
2023-11-06 16:12:56 +04:00
BALATON Zoltan
08730ee0cc ati-vga: Implement fallback for pixman routines
Pixman routines can fail if no implementation is available and it will
become optional soon so add fallbacks when pixman does not work.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Acked-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-ID: <ed0fba3f74e48143f02228b83bf8796ca49f3e7d.1698871239.git.balaton@eik.bme.hu>
2023-11-06 15:58:45 +04:00
BALATON Zoltan
bf9ac62a92 ati-vga: Add 30 bit palette access register
Radeon cards have a 30 bit DAC and corresponding palette register to
access it. We only use 8 bits but let the guests use 10 bit color
values for those that access it through this register.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-ID: <9fa19eec95d1563cc65853cf26912f230c702b32.1698871239.git.balaton@eik.bme.hu>
2023-11-06 15:58:43 +04:00
BALATON Zoltan
e876b3400a ati-vga: Support unaligned access to GPIO DDC registers
The GPIO_VGA_DDC and GPIO_DVI_DDC registers are used on Radeon for DDC
access. Some drivers like the PPC Mac FCode ROM uses unaligned writes
to these registers so implement this the same way as already done for
GPIO_MONID which is used the same way for the Rage 128 Pro.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Acked-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-ID: <dff6ce16ccabdfd54ffda348bf57c6d8b810cd98.1698871239.git.balaton@eik.bme.hu>
2023-11-06 15:58:40 +04:00
BALATON Zoltan
f7ecde051d ati-vga: Fix aperture sizes
Apparently these should be half the memory region sizes confirmed at
least by Radeon FCocde ROM while Rage 128 Pro ROMs don't seem to use
these. Linux r100 DRM driver also checks for a bit in HOST_PATH_CNTL
so we also add that even though the FCode ROM does not seem to set it.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-ID: <d077d4f90d19db731df78da6f05058db074cada1.1698871239.git.balaton@eik.bme.hu>
2023-11-06 15:58:37 +04:00
Cong Liu
9d9ae0f07b virtio-gpu-rutabaga: Add empty interface to fix arm64 crash
Add an empty element to the interfaces array, which is consistent with
the behavior of other devices in qemu and fixes the crash on arm64.

0  0x0000fffff5c18550 in  () at /usr/lib64/libc.so.6
1  0x0000fffff6c9cd6c in g_strdup () at /usr/lib64/libglib-2.0.so.0
2  0x0000aaaaab4945d8 in g_strdup_inline (str=<optimized out>) at /usr/include/glib-2.0/glib/gstrfuncs.h:321
3  type_new (info=info@entry=0xaaaaabc1b2c8 <virtio_gpu_rutabaga_pci_info>) at ../qom/object.c:133
4  0x0000aaaaab494f14 in type_register_internal (info=0xaaaaabc1b2c8 <virtio_gpu_rutabaga_pci_info>) at ../qom/object.c:143
5  type_register (info=0xaaaaabc1b2c8 <virtio_gpu_rutabaga_pci_info>) at ../qom/object.c:152
6  type_register_static (info=0xaaaaabc1b2c8 <virtio_gpu_rutabaga_pci_info>) at ../qom/object.c:157
7  type_register_static_array (infos=<optimized out>, nr_infos=<optimized out>) at ../qom/object.c:165
8  0x0000aaaaab6147e8 in module_call_init (type=type@entry=MODULE_INIT_QOM) at ../util/module.c:109
9  0x0000aaaaab10a0ec in qemu_init_subsystems () at ../system/runstate.c:817
10 0x0000aaaaab10d334 in qemu_init (argc=13, argv=0xfffffffff198) at ../system/vl.c:2760
11 0x0000aaaaaae4da6c in main (argc=<optimized out>, argv=<optimized out>) at ../system/main.c:47

Signed-off-by: Cong Liu <liucong2@kylinos.cn>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20231031012515.15504-1-liucong2@kylinos.cn>
2023-11-06 14:25:30 +04:00
David Woodhouse
a1c1082908 hw/xen: use correct default protocol for xen-block on x86
Even on x86_64 the default protocol is the x86-32 one if the guest doesn't
specifically ask for x86-64.

Cc: qemu-stable@nongnu.org
Fixes: b6af8926fb ("xen: add implementations of xen-block connect and disconnect functions...")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-11-06 10:03:45 +00:00
David Woodhouse
debc995e88 hw/xen: take iothread mutex in xen_evtchn_reset_op()
The xen_evtchn_soft_reset() function requires the iothread mutex, but is
also called for the EVTCHNOP_reset hypercall. Ensure the mutex is taken
in that case.

Cc: qemu-stable@nongnu.org
Fixes: a15b10978f ("hw/xen: Implement EVTCHNOP_reset")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-11-06 10:03:45 +00:00
David Woodhouse
4a5780f520 hw/xen: fix XenStore watch delivery to guest
When fire_watch_cb() found the response buffer empty, it would call
deliver_watch() to generate the XS_WATCH_EVENT message in the response
buffer and send an event channel notification to the guest… without
actually *copying* the response buffer into the ring. So there was
nothing for the guest to see. The pending response didn't actually get
processed into the ring until the guest next triggered some activity
from its side.

Add the missing call to put_rsp().

It might have been slightly nicer to call xen_xenstore_event() here,
which would *almost* have worked. Except for the fact that it calls
xen_be_evtchn_pending() to check that it really does have an event
pending (and clear the eventfd for next time). And under Xen it's
defined that setting that fd to O_NONBLOCK isn't guaranteed to work,
so the emu implementation follows suit.

This fixes Xen device hot-unplug.

Cc: qemu-stable@nongnu.org
Fixes: 0254c4d19d ("hw/xen: Add xenstore wire implementation and implementation stubs")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-11-06 10:03:45 +00:00
David Woodhouse
3de75ed352 hw/xen: don't clear map_track[] in xen_gnttab_reset()
The refcounts actually correspond to 'active_ref' structures stored in a
GHashTable per "user" on the backend side (mostly, per XenDevice).

If we zero map_track[] on reset, then when the backend drivers get torn
down and release their mapping we hit the assert(s->map_track[ref] != 0)
in gnt_unref().

So leave them in place. Each backend driver will disconnect and reconnect
as the guest comes back up again and reconnects, and it all works out OK
in the end as the old refs get dropped.

Cc: qemu-stable@nongnu.org
Fixes: de26b26197 ("hw/xen: Implement soft reset for emulated gnttab")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-11-06 10:03:45 +00:00
David Woodhouse
18e83f28bf hw/xen: select kernel mode for per-vCPU event channel upcall vector
A guest which has configured the per-vCPU upcall vector may set the
HVM_PARAM_CALLBACK_IRQ param to fairly much anything other than zero.

For example, Linux v6.0+ after commit b1c3497e604 ("x86/xen: Add support
for HVMOP_set_evtchn_upcall_vector") will just do this after setting the
vector:

       /* Trick toolstack to think we are enlightened. */
       if (!cpu)
               rc = xen_set_callback_via(1);

That's explicitly setting the delivery to GSI#1, but it's supposed to be
overridden by the per-vCPU vector setting. This mostly works in Qemu
*except* for the logic to enable the in-kernel handling of event channels,
which falsely determines that the kernel cannot accelerate GSI delivery
in this case.

Add a kvm_xen_has_vcpu_callback_vector() to report whether vCPU#0 has
the vector set, and use that in xen_evtchn_set_callback_param() to
enable the kernel acceleration features even when the param *appears*
to be set to target a GSI.

Preserve the Xen behaviour that when HVM_PARAM_CALLBACK_IRQ is set to
*zero* the event channel delivery is disabled completely. (Which is
what that bizarre guest behaviour is working round in the first place.)

Cc: qemu-stable@nongnu.org
Fixes: 91cce75617 ("hw/xen: Add xen_evtchn device for event channel emulation")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-11-06 10:03:45 +00:00