Commit Graph

84618 Commits

Author SHA1 Message Date
Cédric Le Goater
05ce9b73b8 ppc/pnv: Use skiboot addresses to load kernel and ramfs
The current settings are useful to load large kernels (with debug) but
it moves the initrd image in a memory region not protected by
skiboot. If skiboot is compiled with DEBUG=1, memory poisoning will
corrupt the initrd.

Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210126171059.307867-4-clg@kaod.org>
Reviewed-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-02-10 10:43:50 +11:00
Cédric Le Goater
cb9428642e ppc/xive: Add firmware bit when dumping the ENDs
ENDs allocated by OPAL for the HW thread VPs are tagged as owned by FW.
Dump the state in 'info pic'.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210126171059.307867-3-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-02-10 10:43:50 +11:00
Cédric Le Goater
2cfc9f1a96 ppc/pnv: Add trace events for PCI event notification
On POWER9 systems, PHB controllers signal the XIVE interrupt controller
of a source interrupt notification using a store on a MMIO region. Add
traces for such events.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210126171059.307867-2-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-02-10 10:43:50 +11:00
Philippe Mathieu-Daudé
1f38f48900 target/ppc: Remove unused MMU definitions
Remove these confusing and unused definitions.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210127232401.3525126-1-f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-02-10 10:43:50 +11:00
Greg Kurz
040bdafce1 spapr: Adjust firmware path of PCI devices
It is currently not possible to perform a strict boot from USB storage:

$ qemu-system-ppc64 -accel kvm -nodefaults -nographic -serial stdio \
	-boot strict=on \
	-device qemu-xhci \
	-device usb-storage,drive=disk,bootindex=0 \
	-blockdev driver=file,node-name=disk,filename=fedora-ppc64le.qcow2

SLOF **********************************************************************
QEMU Starting
 Build Date = Jul 17 2020 11:15:24
 FW Version = git-e18ddad8516ff2cf
 Press "s" to enter Open Firmware.

Populating /vdevice methods
Populating /vdevice/vty@71000000
Populating /vdevice/nvram@71000001
Populating /pci@800000020000000
                     00 0000 (D) : 1b36 000d    serial bus [ usb-xhci ]
No NVRAM common partition, re-initializing...
Scanning USB
  XHCI: Initializing
    USB Storage
       SCSI: Looking for devices
          101000000000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
Using default console: /vdevice/vty@71000000

  Welcome to Open Firmware

  Copyright (c) 2004, 2017 IBM Corporation All rights reserved.
  This program and the accompanying materials are made available
  under the terms of the BSD License available at
  http://www.opensource.org/licenses/bsd-license.php

Trying to load:  from: /pci@800000020000000/usb@0/storage@1/disk@101000000000000 ...
E3405: No such device

E3407: Load failed

  Type 'boot' and press return to continue booting the system.
  Type 'reset-all' and press return to reboot the system.

Ready!
0 >

The device tree handed over by QEMU to SLOF indeed contains:

qemu,boot-list =
	"/pci@800000020000000/usb@0/storage@1/disk@101000000000000 HALT";

but the device node is named usb-xhci@0, not usb@0.

This happens because the firmware names of PCI devices returned
by get_boot_devices_list() come from pcibus_get_fw_dev_path(),
while the sPAPR PHB code uses a different naming scheme for
device nodes. This inconsistency has always been there but it was
hidden for a long time because SLOF used to rename USB device
nodes, until this commit, merged in QEMU 4.2.0 :

commit 85164ad4ed
Author: Alexey Kardashevskiy <aik@ozlabs.ru>
Date:   Wed Sep 11 16:24:32 2019 +1000

    pseries: Update SLOF firmware image

    This fixes USB host bus adapter name in the device tree to match QEMU's
    one.

    Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
    Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

Fortunately, sPAPR implements the firmware path provider interface.
This provides a way to override the default firmware paths.

Just factor out the sPAPR PHB naming logic from spapr_dt_pci_device()
to a helper, and use it in the sPAPR firmware path provider hook.

Fixes: 85164ad4ed ("pseries: Update SLOF firmware image")
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210122170157.246374-1-groug@kaod.org>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-02-10 10:43:50 +11:00
Daniel Henrique Barboza
a85bb34e1c spapr.c: add 'name' property for hotplugged CPUs nodes
In the CPU hotunplug bug [1] the guest kernel throws a scary
message in dmesg:

pseries-hotplug-cpu: Failed to offline CPU <NULL>, rc: -16

The reason isn't related to the bug though. This happens because the
kernel file arch/powerpc/platform/pseries/hotplug-cpu.c, function
dlpar_cpu_remove(), is not finding the device_node.name of the offending
CPU.

We're not populating the 'name' property for hotplugged CPUs. Since the
kernel relies on device_node.name for identifying CPU nodes, and the
CPUs that are coldplugged has the 'name' property filled by SLOF, this
is creating an unneeded inconsistency between hotplug and coldplug CPUs
in the kernel.

Let's fill the 'name' property for hotplugged CPUs as well. This will
make the guest dmesg throws a less intimidating message when we try to
unplug the last online CPU:

pseries-hotplug-cpu: Failed to offline CPU PowerPC,POWER9@1, rc: -16

[1] https://bugzilla.redhat.com/1911414

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210120232305.241521-3-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-02-10 10:43:49 +11:00
Daniel Henrique Barboza
7265bc3e54 spapr.c: use g_auto* with 'nodename' in CPU DT functions
Next patch will use the 'nodename' string in spapr_core_dt_populate()
after the point it's being freed today.

Instead of moving 'g_free(nodename)' around, let's do a QoL change in
both CPU DT functions where 'nodename' is being freed, and use
g_autofree to avoid the 'g_free()' call altogether.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210120232305.241521-2-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-02-10 10:43:49 +11:00
Peter Maydell
1214d55d1c Emulated NVMe device updates
* deallocate or unwritten logical block error feature (me)
   * dataset management command (me)
   * compare command (Gollu Appalanaidu)
   * namespace types (Niklas Cassel)
   * zoned namespaces (Dmitry Fomichev)
   * smart critical warning toggle (Zhenwei Pi)
   * allow cmb and pmr to coexist (me)
   * pmr rds/wds support (Naveen Nagar)
   * cmb v1.4 logic (Padmakar Kalghatgi)
 
 And a lot of smaller fixes from Gollu Appalanaidu and Minwoo Im.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEUigzqnXi3OaiR2bATeGvMW1PDekFAmAiON4ACgkQTeGvMW1P
 DenhxQf/WzoiO5bXmvZeRv+IHoIn5GhJ1NxLRYO5MX0diswh0BBwUmIscbEDMe1b
 GsD6rpd9YfEO80L/sEGqgV09HT+6e8YwDsNNZBTAsUNRVx0WxckgcGWcrzNNA9Nc
 WEl0Q8si5USSQ1C7djplLXdR6p4pbA/gIk6AjNIo3q2VK1ZqCBhQGESGEfGgrAXW
 xWo8C1V8dnKdxUYI2blbti44sElHZJ6jcF5N3Xmv0UUa1WL0hh0u6qr7IbCZe1kO
 SUFWMIGLF+1C35MUyWpgCjCn5cUdnTA0s/SLEuWtDlNYRhRRh0D6LZviVTZi38Wx
 6Cxg/bRkSlcKo1/jswwYcAaH7qQ4Eg==
 =NB9D
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/nvme/tags/nvme-next-pull-request' into staging

Emulated NVMe device updates

  * deallocate or unwritten logical block error feature (me)
  * dataset management command (me)
  * compare command (Gollu Appalanaidu)
  * namespace types (Niklas Cassel)
  * zoned namespaces (Dmitry Fomichev)
  * smart critical warning toggle (Zhenwei Pi)
  * allow cmb and pmr to coexist (me)
  * pmr rds/wds support (Naveen Nagar)
  * cmb v1.4 logic (Padmakar Kalghatgi)

And a lot of smaller fixes from Gollu Appalanaidu and Minwoo Im.

# gpg: Signature made Tue 09 Feb 2021 07:25:18 GMT
# gpg:                using RSA key 522833AA75E2DCE6A24766C04DE1AF316D4F0DE9
# gpg: Good signature from "Klaus Jensen <its@irrelevant.dk>" [unknown]
# gpg:                 aka "Klaus Jensen <k.jensen@samsung.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: DDCA 4D9C 9EF9 31CC 3468  4272 63D5 6FC5 E55D A838
#      Subkey fingerprint: 5228 33AA 75E2 DCE6 A247  66C0 4DE1 AF31 6D4F 0DE9

* remotes/nvme/tags/nvme-next-pull-request: (56 commits)
  hw/block/nvme: refactor the logic for zone write checks
  hw/block/nvme: fix zone boundary check for append
  hw/block/nvme: fix wrong parameter name 'cross_read'
  hw/block/nvme: align with existing style
  hw/block/nvme: fix set feature save field check
  hw/block/nvme: fix set feature for error recovery
  hw/block/nvme: error if drive less than a zone size
  hw/block/nvme: lift cmb restrictions
  hw/block/nvme: bump to v1.4
  hw/block/nvme: move cmb logic to v1.4
  hw/block/nvme: add PMR RDS/WDS support
  hw/block/nvme: disable PMR at boot up
  hw/block/nvme: remove redundant zeroing of PMR registers
  hw/block/nvme: rename PMR/CMB shift/mask fields
  hw/block/nvme: allow cmb and pmr to coexist
  hw/block/nvme: move msix table and pba to BAR 0
  hw/block/nvme: indicate CMB support through controller capabilities register
  hw/block/nvme: fix 64 bit register hi/lo split writes
  hw/block/nvme: add size to mmio read/write trace events
  hw/block/nvme: trigger async event during injecting smart warning
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-02-09 13:24:37 +00:00
Peter Maydell
41d306ec7d * Fuzzing improvements (Qiuhao, Alexander)
* i386: Fix BMI decoding for instructions with the 0x66 prefix (David)
 * initial attempt at fixing event_notifier emulation (Maxim)
 * i386: PKS emulation, fix for "qemu-system-i386 -cpu host" (myself)
 * meson: RBD test fixes (myself)
 * meson: TCI warnings (Philippe)
 * Leaner build for --disable-guest-agent, --disable-system and
   --disable-tools (Philippe, Stefan)
 * --enable-tcg-interpreter fix (Richard)
 * i386: SVM feature bits (Wei)
 * KVM bugfix (Thomas H.)
 * Add missing MemoryRegionOps callbacks (PJP)
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmAhR4cUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOviAf/Ymk/KwHZKySVqHOjDZnvP5PXMAru
 p1zlMRLAorXK+CTbshkIliaQyD8ggzT4HCinJ2NisdfTWMmlWbgr8gahNqKyZ5UG
 HlL28va3dvGhelswh/CNso1ZhVb2Q+aAYn/c6LXQva2r0xi26ohJTkIkSCPP/bnI
 +73dGzwAilBOsBVbn4cCm/70XtwDpPkw41IZIDoy/4lhL8ZdpHMz8oOjNIlOdlcU
 aEDfM8vYE4C70OtUlRZ1OwVxzcjS1Bf6dQYcpg5gAKy/jAAqR+v2PStxXiUuj5D3
 cAzd03Goh78Wcre+CbWxDKGcGtiooUT+J09wmvDPYVUHcpQMbumf4MufrQ==
 =INB5
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging

* Fuzzing improvements (Qiuhao, Alexander)
* i386: Fix BMI decoding for instructions with the 0x66 prefix (David)
* initial attempt at fixing event_notifier emulation (Maxim)
* i386: PKS emulation, fix for "qemu-system-i386 -cpu host" (myself)
* meson: RBD test fixes (myself)
* meson: TCI warnings (Philippe)
* Leaner build for --disable-guest-agent, --disable-system and
  --disable-tools (Philippe, Stefan)
* --enable-tcg-interpreter fix (Richard)
* i386: SVM feature bits (Wei)
* KVM bugfix (Thomas H.)
* Add missing MemoryRegionOps callbacks (PJP)

# gpg: Signature made Mon 08 Feb 2021 14:15:35 GMT
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini-gitlab/tags/for-upstream: (46 commits)
  target/i386: Expose VMX entry/exit load pkrs control bits
  target/i386: Add support for save/load IA32_PKRS MSR
  imx7-ccm: add digprog mmio write method
  tz-ppc: add dummy read/write methods
  spapr_pci: add spapr msi read method
  nvram: add nrf51_soc flash read method
  prep: add ppc-parity write method
  vfio: add quirk device write method
  pci-host: designware: add pcie-msi read method
  hw/pci-host: add pci-intack write method
  cpu-throttle: Remove timer_mod() from cpu_throttle_set()
  replay: rng-builtin support
  pc-bios/descriptors: fix paths in json files
  replay: fix replay of the interrupts
  accel/kvm/kvm-all: Fix wrong return code handling in dirty log code
  qapi/meson: Restrict UI module to system emulation and tools
  qapi/meson: Restrict system-mode specific modules
  qapi/meson: Remove QMP from user-mode emulation
  qapi/meson: Restrict qdev code to system-mode emulation
  meson: Restrict emulation code
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-02-09 10:04:51 +00:00
Peter Maydell
34b7d4193e Integration testing patches
Tests added:
 - Armbian 20.08 on Orange Pi PC (Philippe)
 - MPC8544ds machine (Thomas)
 - Virtex-ml507 ppc machine (Thomas)
 - Re-enable the microblaze test (Thomas)
 
 Various fixes and documentation improvements from Cleber.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmAhnMAACgkQ4+MsLN6t
 wN4TuA/+Lsk5Sxpe5+3eEaXUAo0v9jHm0Nqh+fbvf/aGkcIbNh0Un3eNpLiv44vE
 vdJlFsfWvwomHylyjPnINBnBjbdIHCC0KdHl6k1YEHVONWpL8t0LyT/eQNXnH4lO
 I6m1uhqR6L5bBIl8gyaQ4PNvDcKtADwqCQUSjuMpSjLzuiwbiwI4Jw137blq9pEm
 zf+LEsD+AZ0zWA1q1bOkRkHDEXaorcUuPGv/es7Xy3tELoXTEBNLEOcu/hNcwimE
 C+X6fyJPCkUxicCRN4X29MEXUgBJin070f/0fxUO3BGpnx1N2lSK5uJwUQ1FilSP
 yPZ6wu9+RakvABuuWlO/mKRR6y9lPhTM2o1niL/hqrVN9DaIysFNYGXosJZ7Cmwq
 KTwcU8fWBT6/66L+4cKxRO7iUHRHeedu+tIqXEjMHkpBpMe7mrVVmcORp8i12Eca
 JVCpjedrX+7X/tWqUKSf63UJrTSFdsdUsdlxWd6uHl5PISJkMWwPsOKxHdMmASqr
 dJmSmCSZ/Gz6+LrmJcB/5/on6NjkiuQ/wFJIkFF1ekhwXJ1Pg/POd1yB7GJd3GZJ
 6XzB2gwNVn3lcuL6kBLHeInUXqDvNAApuLfjHprWQ5N469hCpbXobSqP7Bmvu9Ei
 I56V6EOeLZYb3MYBhMGETz7mJ+zsanYisBPO7gL0ApuPwTQDNdk=
 =mUpx
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/philmd-gitlab/tags/integration-testing-20210208' into staging

Integration testing patches

Tests added:
- Armbian 20.08 on Orange Pi PC (Philippe)
- MPC8544ds machine (Thomas)
- Virtex-ml507 ppc machine (Thomas)
- Re-enable the microblaze test (Thomas)

Various fixes and documentation improvements from Cleber.

# gpg: Signature made Mon 08 Feb 2021 20:19:12 GMT
# gpg:                using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE
# gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [full]
# Primary key fingerprint: FAAB E75E 1291 7221 DCFD  6BB2 E3E3 2C2C DEAD C0DE

* remotes/philmd-gitlab/tags/integration-testing-20210208:
  Acceptance Tests: remove unnecessary tag from documentation example
  Acceptance tests: clarify ssh connection failure reason
  tests/acceptance/virtiofs_submounts: required space between IP and port
  tests/acceptance/virtiofs_submounts: standardize port as integer
  tests/acceptance/virtiofs_submounts: use a virtio-net device instead
  tests/acceptance/virtiofs_submounts: do not ask for ssh key password
  tests/acceptance/virtiofs_submounts: use workdir property
  tests/acceptance/boot_linux: rename misleading cloudinit method
  tests/acceptance/boot_linux: fix typo on cloudinit error message
  tests/acceptance: Re-enable the microblaze test
  tests/acceptance: Add a test for the virtex-ml507 ppc machine
  tests/acceptance: Test the mpc8544ds machine
  tests/acceptance: Move the pseries test to a separate file
  tests/acceptance: Test U-Boot/Linux from Armbian 20.08 on Orange Pi PC
  tests/acceptance: Extract do_test_arm_orangepi_armbian_uboot() method
  tests/acceptance: Introduce tesseract_ocr() helper
  tests/acceptance: Extract tesseract_available() helper in new namespace

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-02-08 20:22:54 +00:00
Klaus Jensen
3e22762edc hw/block/nvme: refactor the logic for zone write checks
Refactor the zone write check logic such that the most "meaningful"
error is returned first. That is, first, if the zone is not writable,
return an appropriate status code for that. Then, make sure we are
actually writing at the write pointer and finally check that we do not
cross the zone write boundary. This aligns with the "priority" of status
codes for zone read checks.

Also add a couple of additional descriptive trace events and remove an
always true assert.

Cc: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Tested-by: Niklas Cassel <niklas.cassel@wdc.com>
Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-02-08 21:15:54 +01:00
Klaus Jensen
a679dc3efd hw/block/nvme: fix zone boundary check for append
When a zone append is processed the controller checks that validity of
the write before assigning the LBA to the append command. This causes
the boundary check to be wrong.

Fix this by checking the write *after* assigning the LBA. Remove the
append special case from the nvme_check_zone_write and open code it in
nvme_do_write, assigning the slba when basic sanity checks have been
performed. Then check the validity of the resulting write like any other
write command.

In the process, also fix a missing endianness conversion for the zone
append ALBA.

Reported-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Cc: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Tested-by: Niklas Cassel <niklas.cassel@wdc.com>
Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-02-08 21:15:54 +01:00
Minwoo Im
74cbbf3031 hw/block/nvme: fix wrong parameter name 'cross_read'
The actual parameter name is 'cross_read' rather than 'cross_zone_read'.

Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-02-08 21:15:54 +01:00
Gollu Appalanaidu
74eb89219e hw/block/nvme: align with existing style
Change status checks to align with the existing style and remove the
explicit check against NVME_SUCCESS.

Cc: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-02-08 21:15:54 +01:00
Gollu Appalanaidu
0065f42ef1 hw/block/nvme: fix set feature save field check
Currently, no features are saveable, so the current check is not wrong,
but add a check against the feature capabilities to make sure this will
not regress if saveable features are added later.

Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-02-08 21:15:54 +01:00
Gollu Appalanaidu
56990c777a hw/block/nvme: fix set feature for error recovery
Only enable DULBE if the namespace supports it.

Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-02-08 21:15:54 +01:00
Minwoo Im
044f1876b0 hw/block/nvme: error if drive less than a zone size
If a user assigns a backing device with less capacity than the size of a
single zone, the namespace capacity will be reported as zero and the
kernel will silently fail to allocate the namespace.

This patch errors out in case that the backing device cannot accomodate
at least a single zone.

Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
[k.jensen: small fixup in the error and commit message]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-02-08 21:15:54 +01:00
Klaus Jensen
38001f7340 hw/block/nvme: lift cmb restrictions
The controller now implements v1.4 and we can lift the restrictions on
CMB Data Pointer and Command Independent Locations Support (CDPCILS) and
CMB Data Pointer Mixed Locations Support (CDPMLS) since the device
really does not care about mixed host/cmb pointers in those cases.

Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-02-08 21:15:54 +01:00
Klaus Jensen
c2a3640de8 hw/block/nvme: bump to v1.4
With the new CMB logic in place, bump the implemented specification
version to v1.4 by default.

This requires adding the setting the CNTRLTYPE field and modifying the
VWC field since 0x00 is no longer a valid value for bits 2:1.

Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-02-08 21:15:54 +01:00
Padmakar Kalghatgi
f4319477b4 hw/block/nvme: move cmb logic to v1.4
Implement v1.4 logic for configuring the Controller Memory Buffer. By
default, the v1.4 scheme will be used (CMB must be explicitly enabled by
the host), so drivers that only support v1.3 will not be able to use the
CMB anymore.

To retain the v1.3 behavior, set the boolean 'legacy-cmb' nvme device
parameter.

Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Padmakar Kalghatgi <p.kalghatgi@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-02-08 21:15:54 +01:00
Naveen Nagar
7ec9f2eef9 hw/block/nvme: add PMR RDS/WDS support
Add support for the PMRMSCL and PMRMSCU MMIO registers. This allows
adding RDS/WDS support for PMR as well.

Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Naveen Nagar <naveen.n1@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-02-08 21:15:53 +01:00
Klaus Jensen
75c3c9de96 hw/block/nvme: disable PMR at boot up
The PMR should not be enabled at boot up. Disable the PMR MemoryRegion
initially and implement MMIO for PMRCTL, allowing the host to enable the
PMR explicitly.

Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-02-08 21:15:53 +01:00
Klaus Jensen
b78b9bb0ee hw/block/nvme: remove redundant zeroing of PMR registers
The controller registers are initially zero. Remove the redundant
zeroing.

Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-02-08 21:15:53 +01:00
Klaus Jensen
8e9e8b4821 hw/block/nvme: rename PMR/CMB shift/mask fields
Use the correct field names.

Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-02-08 21:15:53 +01:00
Klaus Jensen
709cc8fc68 hw/block/nvme: allow cmb and pmr to coexist
With BAR 4 now free to use, allow PMR and CMB to be enabled
simultaneously.

Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-02-08 21:15:53 +01:00
Klaus Jensen
1901b4967c hw/block/nvme: move msix table and pba to BAR 0
In the interest of supporting both CMB and PMR to be enabled on the same
device, move the MSI-X table and pending bit array out of BAR 4 and into
BAR 0.

This is a simplified version of the patch contributed by Andrzej
Jakowski (see [1]). Leaving the CMB at offset 0 removes the need for
changes to CMB address mapping code.

  [1]: https://lore.kernel.org/qemu-devel/20200729220107.37758-3-andrzej.jakowski@linux.intel.com/

Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Tested-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-02-08 21:15:53 +01:00
Andrzej Jakowski
c705063129 hw/block/nvme: indicate CMB support through controller capabilities register
This patch sets CMBS bit in controller capabilities register when user
configures NVMe driver with CMB support, so capabilites are correctly
reported to guest OS.

Signed-off-by: Andrzej Jakowski <andrzej.jakowski@linux.intel.com>
Reviewed-by: Maxim Levitsky <mlevitsky@gmail.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-02-08 21:15:53 +01:00
Klaus Jensen
0d3d5da2cc hw/block/nvme: fix 64 bit register hi/lo split writes
64 bit registers like ASQ and ACQ should be writable by both a hi/lo 32
bit write combination as well as a plain 64 bit write. The spec does not
define ordering on the hi/lo split, but the code currently assumes that
the low order bits are written first. Additionally, the code does not
consider that another address might already have been written into the
register, causing the OR'ing to result in a bad address.

Fix this by explicitly overwriting only the low or high order bits for
32 bit writes.

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
2021-02-08 21:15:53 +01:00
Klaus Jensen
ffacaf0908 hw/block/nvme: add size to mmio read/write trace events
Add the size of the mmio read/write to the trace event.

Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-02-08 21:15:53 +01:00
zhenwei pi
c62720f137 hw/block/nvme: trigger async event during injecting smart warning
During smart critical warning injection by setting property from QMP
command, also try to trigger asynchronous event.

Suggested by Keith, if a event has already been raised, there is no
need to enqueue the duplicate event any more.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
[k.jensen: fix typo in commit message]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-02-08 21:15:53 +01:00
zhenwei pi
4714791b66 hw/block/nvme: add smart_critical_warning property
There is a very low probability that hitting physical NVMe disk
hardware critical warning case, it's hard to write & test a monitor
agent service.

For debugging purposes, add a new 'smart_critical_warning' property
to emulate this situation.

The orignal version of this change is implemented by adding a fixed
property which could be initialized by QEMU command line. Suggested
by Philippe & Klaus, rework like current version.

Test with this patch:
1, change smart_critical_warning property for a running VM:
 #virsh qemu-monitor-command nvme-upstream '{ "execute": "qom-set",
  "arguments": { "path": "/machine/peripheral-anon/device[0]",
  "property": "smart_critical_warning", "value":16 } }'
2, run smartctl in guest
 #smartctl -H -l error /dev/nvme0n1

  === START OF SMART DATA SECTION ===
  SMART overall-health self-assessment test result: FAILED!
  - volatile memory backup device has failed

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-02-08 21:15:53 +01:00
zhenwei pi
c6d1b5c13b nvme: introduce bit 5 for critical warning
According to NVM Express v1.4, Section 5.14.1.2 ("SMART / Health
Information"), introduce bit 5 for "Persistent Memory Region has become
read-only or unreliable".

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
[k.jensen: minor brush ups in commit message]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-02-08 21:15:53 +01:00
Klaus Jensen
635b23ad43 hw/block/nvme: fix zone write finalize
The zone write pointer is unconditionally advanced, even for write
faults. Make sure that the zone is always transitioned to Full if the
write pointer reaches zone capacity.

Cc: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-02-08 21:15:53 +01:00
Minwoo Im
24ec776a5a hw/block/nvme: remove unused argument in nvme_ns_setup
nvme_ns_setup() finally does not have nothing to do with NvmeCtrl
instance.

Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-02-08 21:15:53 +01:00
Minwoo Im
15d024d4aa hw/block/nvme: split setup and register for namespace
In NVMe, namespace is being attached to process I/O.  We register NVMe
namespace to a controller via nvme_register_namespace() during
nvme_ns_setup().  This is main reason of receiving NvmeCtrl object
instance to this function to map the namespace to a controller.

To make namespace instance more independent, it should be split into two
parts: setup and register.  This patch split them into two differnt
parts, and finally nvme_ns_setup() does not have nothing to do with
NvmeCtrl instance at all.

This patch is a former patch to introduce NVMe subsystem scheme to the
existing design especially for multi-path.  In that case, it should be
split into two to make namespace independent from a controller.

Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-02-08 21:15:53 +01:00
Minwoo Im
337ccd7650 hw/block/nvme: remove unused argument in nvme_ns_init_blk
Removed no longer used aregument NvmeCtrl object in nvme_ns_init_blk().

Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-02-08 21:15:53 +01:00
Minwoo Im
aa5e55e3b0 hw/block/nvme: open code for volatile write cache
Volatile Write Cache(VWC) feature is set in nvme_ns_setup() in the
initial time.  This feature is related to block device backed,  but this
feature is controlled in controller level via Set/Get Features command.

This patch removed dependency between nvme and nvme-ns to manage the VWC
flag value.  Also, it open coded the Get Features for VWC to check all
namespaces attached to the controller, and if false detected, return
directly false.

Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
[k.jensen: report write cache preset if present on ANY namespace]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-02-08 21:15:53 +01:00
Minwoo Im
1490be5a8a hw/block/nvme: remove unused argument in nvme_ns_init_zoned
nvme_ns_init_zoned() has no use for given NvmeCtrl object.

Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-02-08 21:15:53 +01:00
Dmitry Fomichev
add961300c hw/block/nvme: Correct error status for unaligned ZA
TP 4053 says (in section 2.3.1.1) -
... if a Zone Append command specifies a ZSLBA that is not the lowest
logical block address in that zone, then the controller shall abort
that command with a status code of Invalid Field In Command.

In the code, Zone Invalid Write is returned instead, fix this.

Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-02-08 21:15:53 +01:00
Klaus Jensen
521ea778b2 hw/block/nvme: remove unnecessary check for append
nvme_io_cmd already checks if the namespace supports the Zone Append
command, so the removed check is dead code.

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
2021-02-08 21:15:53 +01:00
Klaus Jensen
cd42771a33 hw/block/nvme: add missing string representations for commands
Add missing string representations for a couple of new commands.

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
2021-02-08 21:15:53 +01:00
Klaus Jensen
5f5dc4c6a9 hw/block/nvme: zero out zones on reset
The zoned command set specification states that "All logical blocks in a
zone *shall* be marked as deallocated when [the zone is reset]". Since
the device guarantees 0x00 to be read from deallocated blocks we have to
issue a pwrite_zeroes since we cannot be sure that a discard will do
anything. But typically, this will be achieved with an efficient
unmap/discard operation.

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
2021-02-08 21:15:53 +01:00
Klaus Jensen
b05fde2881 hw/block/nvme: enum style fix
Align with existing style and use a typedef for header-file enums.

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
2021-02-08 21:15:53 +01:00
Klaus Jensen
5720669605 hw/block/nvme: merge implicitly/explicitly opened processing masks
Implicitly and explicitly opended zones are always bulk processed
together, so merge the two processing masks.

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
2021-02-08 21:15:53 +01:00
Klaus Jensen
165f134f3d hw/block/nvme: fix shutdown/reset logic
A shutdown is only about flushing stuff. It is the host that should
delete any queues, so do not perform a reset here.

Also, on shutdown, make sure that the PMR is flushed if in use.

Fixes: 368f4e752cf9 ("hw/block/nvme: Process controller reset and shutdown differently")
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
2021-02-08 21:15:53 +01:00
Klaus Jensen
1b5804a80d hw/block/nvme: conditionally enable DULBE for zoned namespaces
The device uses the BDRV_BLOCK_ZERO flag to determine the "deallocated"
status of logical blocks. Since the zoned namespaces command set
specification defines that logical blocks SHALL be marked as deallocated
when the zone is in the Empty or Offline states, DULBE can only be
supported if the zone size is a multiple of the calculated deallocation
granularity (reported in NPDG) which depends on the underlying block
device cluster size (if applicable) or the configured
discard_granularity.

Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-02-08 21:15:49 +01:00
Klaus Jensen
55886345d0 hw/block/nvme: fix for non-msix machines
Commit 1c0c2163aa ("hw/block/nvme: verify msix_init_exclusive_bar()
return value") had the unintended effect of breaking support on
several platforms not supporting MSI-X.

Still check for errors, but only report that MSI-X is unsupported
instead of bailing out.

Fixes: 1c0c2163aa ("hw/block/nvme: verify msix_init_exclusive_bar() return value")
Fixes: fbf2e5375e ("hw/block/nvme: Verify msix_vector_use() returned value")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-02-08 21:05:28 +01:00
Dmitry Fomichev
00dd640dff hw/block/nvme: Document zoned parameters in usage text
Added brief descriptions of the new device properties that are
now available to users to configure features of Zoned Namespace
Command Set in the emulator.

This patch is for documentation only, no functionality change.

Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-02-08 21:05:28 +01:00
Dmitry Fomichev
1a9290ade3 hw/block/nvme: Support Zone Descriptor Extensions
Zone Descriptor Extension is a label that can be assigned to a zone.
It can be set to an Empty zone and it stays assigned until the zone
is reset.

This commit adds a new optional module property,
"zoned.descr_ext_size". Its value must be a multiple of 64 bytes.
If this value is non-zero, it becomes possible to assign extensions
of that size to any Empty zones. The default value for this property
is 0, therefore setting extensions is disabled by default.

Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-02-08 21:05:28 +01:00
Dmitry Fomichev
8d18ddcd22 hw/block/nvme: Introduce max active and open zone limits
Add two module properties, "zoned.max_active" and "zoned.max_open"
to control the maximum number of zones that can be active or open.
Once these variables are set to non-default values, these limits are
checked during I/O and Too Many Active or Too Many Open command status
is returned if they are exceeded.

Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-02-08 21:05:28 +01:00