Commit Graph

75 Commits

Author SHA1 Message Date
Jonathan Cameron
a82fe82916 hw/acpi: Generic Port Affinity Structure support
These are very similar to the recently added Generic Initiators
but instead of representing an initiator of memory traffic they
represent an edge point beyond which may lie either targets or
initiators.  Here we add these ports such that they may
be targets of hmat_lb records to describe the latency and
bandwidth from host side initiators to the port.  A discoverable
mechanism such as UEFI CDAT read from CXL devices and switches
is used to discover the remainder of the path, and the OS can build
up full latency and bandwidth numbers as need for work and data
placement decisions.

Acked-by: Markus Armbruster <armbru@redhat.com>
Tested-by: "Huang, Ying" <ying.huang@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20240916174122.1843197-1-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-11-04 16:03:24 -05:00
Markus Armbruster
01bed0ff14 qapi: Refill doc comments to conform to conventions
Sweep the entire documentation again.  Last done in commit
209e64d9ed (qapi: Refill doc comments to conform to current
conventions).

To check the generated documentation does not change, I compared the
generated HTML before and after this commit with "wdiff -3".  Finds no
differences.  Comparing with diff is not useful, as the reflown
paragraphs are visible there.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20240729065220.860163-1-armbru@redhat.com>
[Straightforward conflict with commit 442110bc6f resolved]
2024-08-05 09:31:51 +02:00
Stefano Garzarella
657ea58ba3 qapi/qom: make some QOM properties depend on the build settings
Some QOM properties are associated with ObjectTypes that already
depend on CONFIG_* switches. So to avoid generating dead code,
let's also make the definition of those properties dependent on
the corresponding CONFIG_*.

Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-ID: <20240604135931.311709-1-sgarzare@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Make SecretKeyringProperties conditional, too]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2024-07-29 07:29:47 +02:00
Richard Henderson
e6485190f7 QAPI patches patches for 2024-07-17
-----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAmaXoUESHGFybWJydUBy
 ZWRoYXQuY29tAAoJEDhwtADrkYZTatYP/jPlsmx8S6X397COQf6Wd4oEFQEMo/FS
 tWFiHWenPUZ56U3O3lDNIw+5URbhF4aUpxhLGg6cmkrOwK0zPjARI2UNnUnZvPtN
 EHf//KJOpYLsSdkIlIW2nYzB27ps0DRf5PgOGdOOdW32Nuq93FLx7ChDgbpmrijc
 HzByyJIn1QEv/G0aOMLCuTPA7LpGjCAd2a/LjWYpSXB3WGizrS2Rrat7oJYUl8Rz
 mAPgdiE0aH2yWHOTcWabKfN4AsIHCnv7qNOZkumoWpZ0XULbgyK1OO05ju3jRSrB
 0WiwiE8pEhHz7qstKGcjS1c7pPuId64ubm3RAZ1RUqVvA5TZGucwuYiuQHUVX6jH
 BGzpfojISFzIfTiKemyfqBb1gjXjxT6OIlCtmlJSUCJohb70f0fhX3vniyhzyl1d
 fFTM0jMbmBX89e/o3j6ZXa7anafYNDh5TjTK4BYeAXRqe+jZpvDJUrwu1OZIq1cd
 Wr1RR8qaawpfjD5r9SXu1mX5MPCX4SmNVNoQ7N4ruWjpVojQNmuCRW9yLPIv3yTH
 c5ESND4zdvceW5EF9f5GSIFwnIdGqnUwJyBMcULGoCxz1HougQmGR4bhqSkEl6RD
 GRK+bj3pLdj9f/en62mE6+f5rkEJye3Y5fJ5dn9+Ld09PeUtY59YKnJGg896g55V
 /pGOUWf3L4iY
 =E0F5
 -----END PGP SIGNATURE-----

Merge tag 'pull-qapi-2024-07-17' of https://repo.or.cz/qemu/armbru into staging

QAPI patches patches for 2024-07-17

# -----BEGIN PGP SIGNATURE-----
#
# iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAmaXoUESHGFybWJydUBy
# ZWRoYXQuY29tAAoJEDhwtADrkYZTatYP/jPlsmx8S6X397COQf6Wd4oEFQEMo/FS
# tWFiHWenPUZ56U3O3lDNIw+5URbhF4aUpxhLGg6cmkrOwK0zPjARI2UNnUnZvPtN
# EHf//KJOpYLsSdkIlIW2nYzB27ps0DRf5PgOGdOOdW32Nuq93FLx7ChDgbpmrijc
# HzByyJIn1QEv/G0aOMLCuTPA7LpGjCAd2a/LjWYpSXB3WGizrS2Rrat7oJYUl8Rz
# mAPgdiE0aH2yWHOTcWabKfN4AsIHCnv7qNOZkumoWpZ0XULbgyK1OO05ju3jRSrB
# 0WiwiE8pEhHz7qstKGcjS1c7pPuId64ubm3RAZ1RUqVvA5TZGucwuYiuQHUVX6jH
# BGzpfojISFzIfTiKemyfqBb1gjXjxT6OIlCtmlJSUCJohb70f0fhX3vniyhzyl1d
# fFTM0jMbmBX89e/o3j6ZXa7anafYNDh5TjTK4BYeAXRqe+jZpvDJUrwu1OZIq1cd
# Wr1RR8qaawpfjD5r9SXu1mX5MPCX4SmNVNoQ7N4ruWjpVojQNmuCRW9yLPIv3yTH
# c5ESND4zdvceW5EF9f5GSIFwnIdGqnUwJyBMcULGoCxz1HougQmGR4bhqSkEl6RD
# GRK+bj3pLdj9f/en62mE6+f5rkEJye3Y5fJ5dn9+Ld09PeUtY59YKnJGg896g55V
# /pGOUWf3L4iY
# =E0F5
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 17 Jul 2024 08:47:29 PM AEST
# gpg:                using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg:                issuer "armbru@redhat.com"
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>" [full]

* tag 'pull-qapi-2024-07-17' of https://repo.or.cz/qemu/armbru:
  qapi: remove "Example" doc section
  qapi: convert "Example" sections with longer prose
  qapi: convert "Example" sections with titles
  qapi: convert "Example" sections without titles
  docs/sphinx: add CSS styling for qmp-example directive
  docs/qapidoc: add QMP highlighting to annotated qmp-example blocks
  docs/qapidoc: create qmp-example directive
  docs/qapidoc: factor out do_parse()
  qapi/ui: Drop note on naming of SpiceQueryMouseMode
  qapi/sockets: Move deprecation note out of SocketAddress doc comment
  qapi/machine: Clarify query-uuid value when none has been specified
  qapi/machine: Clean up documentation around CpuInstanceProperties
  qapi/pci: Clean up documentation around PciDeviceClass
  qapi/qom: Document feature unstable of @x-vfio-user-server

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-07-18 10:06:35 +10:00
John Snow
a9eab6e2e6 qapi: convert "Example" sections with titles
When an Example section has a brief explanation, convert it to a
qmp-example:: section using the :title: option.

Rule of thumb: If the title can fit on a single line and requires no rST
markup, it's a good candidate for using the :title: option of
qmp-example.

In this patch, trailing punctuation is removed from the title section
for consistent headline aesthetics. In just one case, specifics of the
example are removed to make the title read better.

See commit-4: "docs/qapidoc: create qmp-example directive", for a
              detailed explanation of this custom directive syntax.

See commit+2: "qapi: remove "Example" doc section" for a detailed
              explanation of why.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20240717021312.606116-8-jsnow@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2024-07-17 10:20:53 +02:00
John Snow
14b48aaab9 qapi: convert "Example" sections without titles
Use the no-option form of ".. qmp-example::" to convert any Examples
that do not have any form of caption or explanation whatsoever. Note
that in a few cases, example sections are split into two or more
separate example blocks. This is only done stylistically to create a
delineation between two or more logically independent examples.

See commit-3: "docs/qapidoc: create qmp-example directive", for a
              detailed explanation of this custom directive syntax.

See commit+3: "qapi: remove "Example" doc section" for a detailed
              explanation of why.

Note: an empty "TODO" line was added to announce-self to keep the
example from floating up into the body; this will be addressed more
rigorously in the new qapidoc generator.

Signed-off-by: John Snow <jsnow@redhat.com>
Message-ID: <20240717021312.606116-7-jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Markup fixed in one place]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2024-07-17 10:20:53 +02:00
Markus Armbruster
3becc93908 qapi/qom: Document feature unstable of @x-vfio-user-server
Commit 8f9a9259d3 added ObjectType member @x-vfio-user-server with
feature unstable, but neglected to explain why it is unstable.  Do
that now.

Fixes: 8f9a9259d3 (vfio-user: define vfio-user-server object)
Cc: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Cc: John G Johnson <john.g.johnson@oracle.com>
Cc: Jagannathan Raman <jag.raman@oracle.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20240703095310.1242102-1-armbru@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
[Indentation fixed]
2024-07-17 10:14:44 +02:00
Michael Roth
9d38d9dca2 i386/sev: Don't allow automatic fallback to legacy KVM_SEV*_INIT
Currently if the 'legacy-vm-type' property of the sev-guest object is
'on', QEMU will attempt to use the newer KVM_SEV_INIT2 kernel
interface in conjunction with the newer KVM_X86_SEV_VM and
KVM_X86_SEV_ES_VM KVM VM types.

This can lead to measurement changes if, for instance, an SEV guest was
created on a host that originally had an older kernel that didn't
support KVM_SEV_INIT2, but is booted on the same host later on after the
host kernel was upgraded.

Instead, if legacy-vm-type is 'off', QEMU should fail if the
KVM_SEV_INIT2 interface is not provided by the current host kernel.
Modify the fallback handling accordingly.

In the future, VMSA features and other flags might be added to QEMU
which will require legacy-vm-type to be 'off' because they will rely
on the newer KVM_SEV_INIT2 interface. It may be difficult to convey to
users what values of legacy-vm-type are compatible with which
features/options, so as part of this rework, switch legacy-vm-type to a
tri-state OnOffAuto option. 'auto' in this case will automatically
switch to using the newer KVM_SEV_INIT2, but only if it is required to
make use of new VMSA features or other options only available via
KVM_SEV_INIT2.

Defining 'auto' in this way would avoid inadvertantly breaking
compatibility with older kernels since it would only be used in cases
where users opt into newer features that are only available via
KVM_SEV_INIT2 and newer kernels, and provide better default behavior
than the legacy-vm-type=off behavior that was previously in place, so
make it the default for 9.1+ machine types.

Cc: Daniel P. Berrangé <berrange@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
cc: kvm@vger.kernel.org
Signed-off-by: Michael Roth <michael.roth@amd.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Link: https://lore.kernel.org/r/20240710041005.83720-1-michael.roth@amd.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-16 10:45:06 +02:00
John Snow
d461c27973 qapi: convert "Note" sections to plain rST
We do not need a dedicated section for notes. By eliminating a specially
parsed section, these notes can be treated as normal rST paragraphs in
the new QMP reference manual, and can be placed and styled much more
flexibly.

Convert all existing "Note" and "Notes" sections to pure rST. As part of
the conversion, capitalize the first letter of each sentence and add
trailing punctuation where appropriate to ensure notes look sensible and
consistent in rendered HTML documentation. Markup is also re-aligned to
the de-facto standard of 3 spaces for directives.

Update docs/devel/qapi-code-gen.rst to reflect the new paradigm, and
update the QAPI parser to prohibit "Note" sections while suggesting a
new syntax. The exact formatting to use is a matter of taste, but a good
candidate is simply:

.. note:: lorem ipsum ...
   ... dolor sit amet ...
   ... consectetur adipiscing elit ...

... but there are other choices, too. The Sphinx readthedocs theme
offers theming for the following forms (capitalization unimportant); all
are adorned with a (!) symbol () in the title bar for rendered HTML
docs.

See
https://sphinx-rtd-theme.readthedocs.io/en/stable/demo/demo.html#admonitions
for examples of each directive/admonition in use.

These are rendered in orange:

.. Attention:: ...
.. Caution:: ...
.. WARNING:: ...

These are rendered in red:

.. DANGER:: ...
.. Error:: ...

These are rendered in green:

.. Hint:: ...
.. Important:: ...
.. Tip:: ...

These are rendered in blue:

.. Note:: ...
.. admonition:: custom title

   admonition body text

This patch uses ".. note::" almost everywhere, with just two "caution"
directives. Several instances of "Notes:" have been converted to
merely ".. note::", or multiple ".. note::" where appropriate.
".. admonition:: notes" is used in a few places where we had an
ordered list of multiple notes that would not make sense as
standalone/separate admonitions.  Two "Note:" following "Example:"
have been turned into ordinary paragraphs within the example.

NOTE: Because qapidoc.py does not attempt to preserve source ordering of
sections, the conversion of Notes from a "tagged section" to an
"untagged section" means that rendering order for some notes *may
change* as a result of this patch. The forthcoming qapidoc.py rewrite
strictly preserves source ordering in the rendered documentation, so
this issue will be rectified in the new generator.

Signed-off-by: John Snow <jsnow@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com> [for block*.json]
Message-ID: <20240626222128.406106-11-jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Commit message clarified slightly, period added to one more note]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2024-07-06 08:58:24 +02:00
Stefano Garzarella
4e647fa085 hostmem: add a new memory backend based on POSIX shm_open()
shm_open() creates and opens a new POSIX shared memory object.
A POSIX shared memory object allows creating memory backend with an
associated file descriptor that can be shared with external processes
(e.g. vhost-user).

The new `memory-backend-shm` can be used as an alternative when
`memory-backend-memfd` is not available (Linux only), since shm_open()
should be provided by any POSIX-compliant operating system.

This backend mimics memfd, allocating memory that is practically
anonymous. In theory shm_open() requires a name, but this is allocated
for a short time interval and shm_unlink() is called right after
shm_open(). After that, only fd is shared with external processes
(e.g., vhost-user) as if it were associated with anonymous memory.

In the future we may also allow the user to specify the name to be
passed to shm_open(), but for now we keep the backend simple, mimicking
anonymous memory such as memfd.

Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com> (QAPI schema)
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20240618100519.145853-1-sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-03 18:14:06 -04:00
Stefano Garzarella
0aa7f10c7a qapi: clarify that the default is backend dependent
The default value of the @share option of the @MemoryBackendProperties
really depends on the backend type, so let's document the default
values in the same place where we define the option to avoid
dispersing the information.

Cc: David Hildenbrand <david@redhat.com>
Suggested-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20240618100043.144657-2-sgarzare@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-02 09:27:56 -04:00
Brijesh Singh
7b34df4426 i386/sev: Introduce 'sev-snp-guest' object
SEV-SNP support relies on a different set of properties/state than the
existing 'sev-guest' object. This patch introduces the 'sev-snp-guest'
object, which can be used to configure an SEV-SNP guest. For example,
a default-configured SEV-SNP guest with no additional information
passed in for use with attestation:

  -object sev-snp-guest,id=sev0

or a fully-specified SEV-SNP guest where all spec-defined binary
blobs are passed in as base64-encoded strings:

  -object sev-snp-guest,id=sev0, \
    policy=0x30000, \
    init-flags=0, \
    id-block=YWFhYWFhYWFhYWFhYWFhCg==, \
    id-auth=CxHK/OKLkXGn/KpAC7Wl1FSiisWDbGTEKz..., \
    author-key-enabled=on, \
    host-data=LNkCWBRC5CcdGXirbNUV1OrsR28s..., \
    guest-visible-workarounds=AA==, \

See the QAPI schema updates included in this patch for more usage
details.

In some cases these blobs may be up to 4096 characters, but this is
generally well below the default limit for linux hosts where
command-line sizes are defined by the sysconf-configurable ARG_MAX
value, which defaults to 2097152 characters for Ubuntu hosts, for
example.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Michael Roth <michael.roth@amd.com>
Acked-by: Markus Armbruster <armbru@redhat.com> (for QAPI schema)
Signed-off-by: Michael Roth <michael.roth@amd.com>
Co-developed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Signed-off-by: Pankaj Gupta <pankaj.gupta@amd.com>
Message-ID: <20240530111643.1091816-8-pankaj.gupta@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-05 11:01:06 +02:00
Michael Roth
16dcf200dc i386/sev: Introduce "sev-common" type to encapsulate common SEV state
Currently all SEV/SEV-ES functionality is managed through a single
'sev-guest' QOM type. With upcoming support for SEV-SNP, taking this
same approach won't work well since some of the properties/state
managed by 'sev-guest' is not applicable to SEV-SNP, which will instead
rely on a new QOM type with its own set of properties/state.

To prepare for this, this patch moves common state into an abstract
'sev-common' parent type to encapsulate properties/state that are
common to both SEV/SEV-ES and SEV-SNP, leaving only SEV/SEV-ES-specific
properties/state in the current 'sev-guest' type. This should not
affect current behavior or command-line options.

As part of this patch, some related changes are also made:

  - a static 'sev_guest' variable is currently used to keep track of
    the 'sev-guest' instance. SEV-SNP would similarly introduce an
    'sev_snp_guest' static variable. But these instances are now
    available via qdev_get_machine()->cgs, so switch to using that
    instead and drop the static variable.

  - 'sev_guest' is currently used as the name for the static variable
    holding a pointer to the 'sev-guest' instance. Re-purpose the name
    as a local variable referring the 'sev-guest' instance, and use
    that consistently throughout the code so it can be easily
    distinguished from sev-common/sev-snp-guest instances.

  - 'sev' is generally used as the name for local variables holding a
    pointer to the 'sev-guest' instance. In cases where that now points
    to common state, use the name 'sev_common'; in cases where that now
    points to state specific to 'sev-guest' instance, use the name
    'sev_guest'

In order to enable kernel-hashes for SNP, pull it from
SevGuestProperties to its parent SevCommonProperties so
it will be available for both SEV and SNP.

Signed-off-by: Michael Roth <michael.roth@amd.com>
Co-developed-by: Dov Murik <dovmurik@linux.ibm.com>
Signed-off-by: Dov Murik <dovmurik@linux.ibm.com>
Acked-by: Markus Armbruster <armbru@redhat.com> (QAPI schema)
Co-developed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Signed-off-by: Pankaj Gupta <pankaj.gupta@amd.com>
Message-ID: <20240530111643.1091816-5-pankaj.gupta@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-05 11:01:06 +02:00
Michael Roth
023267334d i386/sev: Add 'legacy-vm-type' parameter for SEV guest objects
QEMU will currently automatically make use of the KVM_SEV_INIT2 API for
initializing SEV and SEV-ES guests verses the older
KVM_SEV_INIT/KVM_SEV_ES_INIT interfaces.

However, the older interfaces will silently avoid sync'ing FPU/XSAVE
state to the VMSA prior to encryption, thus relying on behavior and
measurements that assume the related fields to be allow zero.

With KVM_SEV_INIT2, this state is now synced into the VMSA, resulting in
measurements changes and, theoretically, behaviorial changes, though the
latter are unlikely to be seen in practice.

To allow a smooth transition to the newer interface, while still
providing a mechanism to maintain backward compatibility with VMs
created using the older interfaces, provide a new command-line
parameter:

  -object sev-guest,legacy-vm-type=true,...

and have it default to false.

Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-ID: <20240409230743.962513-2-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-23 17:35:25 +02:00
Stefan Weil
f6822fee96 Fix some typos in documentation (found by codespell)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2024-04-02 13:38:40 +03:00
Markus Armbruster
5305a4eeb8 qapi: Correct documentation indentation and whitespace
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20240322140910.328840-12-armbru@redhat.com>
[Add a previous patch's stray hunk]
2024-03-26 06:36:08 +01:00
Markus Armbruster
209e64d9ed qapi: Refill doc comments to conform to current conventions
For legibility, wrap text paragraphs so every line is at most 70
characters long.

To check the generated documentation does not change, I compared the
generated HTML before and after this commit with "wdiff -3".  Finds no
differences.  Comparing with diff is not useful, as the refilled
paragraphs are visible there.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20240322140910.328840-11-armbru@redhat.com>
2024-03-26 06:36:08 +01:00
Ankit Agrawal
b64b7ed8bb qom: new object to associate device to NUMA node
NVIDIA GPU's support MIG (Mult-Instance GPUs) feature [1], which allows
partitioning of the GPU device resources (including device memory) into
several (upto 8) isolated instances. Each of the partitioned memory needs
a dedicated NUMA node to operate. The partitions are not fixed and they
can be created/deleted at runtime.

Unfortunately Linux OS does not provide a means to dynamically create/destroy
NUMA nodes and such feature implementation is not expected to be trivial. The
nodes that OS discovers at the boot time while parsing SRAT remains fixed. So
we utilize the Generic Initiator (GI) Affinity structures that allows
association between nodes and devices. Multiple GI structures per BDF is
possible, allowing creation of multiple nodes by exposing unique PXM in each
of these structures.

Implement the mechanism to build the GI affinity structures as Qemu currently
does not. Introduce a new acpi-generic-initiator object to allow host admin
link a device with an associated NUMA node. Qemu maintains this association
and use this object to build the requisite GI Affinity Structure.

When multiple NUMA nodes are associated with a device, it is required to
create those many number of acpi-generic-initiator objects, each representing
a unique device:node association.

Following is one of a decoded GI affinity structure in VM ACPI SRAT.
[0C8h 0200   1]                Subtable Type : 05 [Generic Initiator Affinity]
[0C9h 0201   1]                       Length : 20

[0CAh 0202   1]                    Reserved1 : 00
[0CBh 0203   1]           Device Handle Type : 01
[0CCh 0204   4]             Proximity Domain : 00000007
[0D0h 0208  16]                Device Handle : 00 00 20 00 00 00 00 00 00 00 00
00 00 00 00 00
[0E0h 0224   4]        Flags (decoded below) : 00000001
                                     Enabled : 1
[0E4h 0228   4]                    Reserved2 : 00000000

[0E8h 0232   1]                Subtable Type : 05 [Generic Initiator Affinity]
[0E9h 0233   1]                       Length : 20

An admin can provide a range of acpi-generic-initiator objects, each
associating a device (by providing the id through pci-dev argument)
to the desired NUMA node (using the node argument). Currently, only PCI
device is supported.

For the grace hopper system, create a range of 8 nodes and associate that
with the device using the acpi-generic-initiator object. While a configuration
of less than 8 nodes per device is allowed, such configuration will prevent
utilization of the feature to the fullest. The following sample creates 8
nodes per PCI device for a VM with 2 PCI devices and link them to the
respecitve PCI device using acpi-generic-initiator objects:

-numa node,nodeid=2 -numa node,nodeid=3 -numa node,nodeid=4 \
-numa node,nodeid=5 -numa node,nodeid=6 -numa node,nodeid=7 \
-numa node,nodeid=8 -numa node,nodeid=9 \
-device vfio-pci-nohotplug,host=0009:01:00.0,bus=pcie.0,addr=04.0,rombar=0,id=dev0 \
-object acpi-generic-initiator,id=gi0,pci-dev=dev0,node=2 \
-object acpi-generic-initiator,id=gi1,pci-dev=dev0,node=3 \
-object acpi-generic-initiator,id=gi2,pci-dev=dev0,node=4 \
-object acpi-generic-initiator,id=gi3,pci-dev=dev0,node=5 \
-object acpi-generic-initiator,id=gi4,pci-dev=dev0,node=6 \
-object acpi-generic-initiator,id=gi5,pci-dev=dev0,node=7 \
-object acpi-generic-initiator,id=gi6,pci-dev=dev0,node=8 \
-object acpi-generic-initiator,id=gi7,pci-dev=dev0,node=9 \

-numa node,nodeid=10 -numa node,nodeid=11 -numa node,nodeid=12 \
-numa node,nodeid=13 -numa node,nodeid=14 -numa node,nodeid=15 \
-numa node,nodeid=16 -numa node,nodeid=17 \
-device vfio-pci-nohotplug,host=0009:01:01.0,bus=pcie.0,addr=05.0,rombar=0,id=dev1 \
-object acpi-generic-initiator,id=gi8,pci-dev=dev1,node=10 \
-object acpi-generic-initiator,id=gi9,pci-dev=dev1,node=11 \
-object acpi-generic-initiator,id=gi10,pci-dev=dev1,node=12 \
-object acpi-generic-initiator,id=gi11,pci-dev=dev1,node=13 \
-object acpi-generic-initiator,id=gi12,pci-dev=dev1,node=14 \
-object acpi-generic-initiator,id=gi13,pci-dev=dev1,node=15 \
-object acpi-generic-initiator,id=gi14,pci-dev=dev1,node=16 \
-object acpi-generic-initiator,id=gi15,pci-dev=dev1,node=17 \

Link: https://www.nvidia.com/en-in/technologies/multi-instance-gpu [1]
Cc: Jonathan Cameron <qemu-devel@nongnu.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Message-Id: <20240308145525.10886-2-ankita@nvidia.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12 17:56:55 -04:00
Markus Armbruster
53d5c36d8d qapi: Delete useless "Returns" sections
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20240227113921.236097-6-armbru@redhat.com>
2024-03-04 07:12:40 +01:00
Markus Armbruster
2746f060be qapi: Move error documentation to new "Errors" sections
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20240227113921.236097-5-armbru@redhat.com>
2024-03-04 07:12:40 +01:00
Markus Armbruster
d23055b8db qapi: Require descriptions and tagged sections to be indented
By convention, we indent the second and subsequent lines of
descriptions and tagged sections, except for examples.

Turn this into a hard rule, and apply it to examples, too.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20240216145841.2099240-11-armbru@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
[Straightforward conflicts in qapi/migration.json resolved]
2024-02-26 10:43:56 +01:00
Markus Armbruster
ae7ccd50c3 qapi: Fix mangled "Returns" sections in documentation
Commit e050e42678 (qapi: Use explicit bulleted lists) added list
markup to correct bad rendering:

    A JSON block comment like this:
         Returns: nothing on success
                  If @node is not a valid block device, DeviceNotFound
                  If @name is not found, GenericError with an explanation

    renders like this:

         Returns: nothing on success If node is not a valid block device,
         DeviceNotFound If name is not found, GenericError with an explanation

    because whitespace is not significant.

    Use an actual bulleted list, so that the formatting is correct.

It missed a few instances.  Commit a937b6aa73 (qapi: Reformat doc
comments to conform to current conventions) then reflowed them.

Revert the reflowing, and add list markup.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20240120095327.666239-6-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2024-01-26 07:04:53 +01:00
Eric Auger
6e6d8ac62b backends/iommufd: Introduce the iommufd object
Introduce an iommufd object which allows the interaction
with the host /dev/iommu device.

The /dev/iommu can have been already pre-opened outside of qemu,
in which case the fd can be passed directly along with the
iommufd object:

This allows the iommufd object to be shared accross several
subsystems (VFIO, VDPA, ...). For example, libvirt would open
the /dev/iommu once.

If no fd is passed along with the iommufd object, the /dev/iommu
is opened by the qemu code.

Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-12-19 19:03:38 +01:00
David Hildenbrand
e92666b0ba backends/hostmem-file: Add "rom" property to support VM templating with R/O files
For now, "share=off,readonly=on" would always result in us opening the
file R/O and mmap'ing the opened file MAP_PRIVATE R/O -- effectively
turning it into ROM.

Especially for VM templating, "share=off" is a common use case. However,
that use case is impossible with files that lack write permissions,
because "share=off,readonly=on" will not give us writable RAM.

The sole user of ROM via memory-backend-file are R/O NVDIMMs, but as we
have users (Kata Containers) that rely on the existing behavior --
malicious VMs should not be able to consume COW memory for R/O NVDIMMs --
we cannot change the semantics of "share=off,readonly=on"

So let's add a new "rom" property with on/off/auto values. "auto" is
the default and what most people will use: for historical reasons, to not
change the old semantics, it defaults to the value of the "readonly"
property.

For VM templating, one can now use:
    -object memory-backend-file,share=off,readonly=on,rom=off,...

But we'll disallow:
    -object memory-backend-file,share=on,readonly=on,rom=off,...
because we would otherwise get an error when trying to mmap the R/O file
shared and writable. An explicit error message is cleaner.

We will also disallow for now:
    -object memory-backend-file,share=off,readonly=off,rom=on,...
    -object memory-backend-file,share=on,readonly=off,rom=on,...
It's not harmful, but also not really required for now.

Alternatives that were abandoned:
* Make "unarmed=on" for the NVDIMM set the memory region container
  readonly. We would still see a change of ROM->RAM and possibly run
  into memslot limits with vhost-user. Further, there might be use cases
  for "unarmed=on" that should still allow writing to that memory
  (temporary files, system RAM, ...).
* Add a new "readonly=on/off/auto" parameter for NVDIMMs. Similar issues
  as with "unarmed=on".
* Make "readonly" consume "on/off/file" instead of being a 'bool' type.
  This would slightly changes the behavior of the "readonly" parameter:
  values like true/false (as accepted by a 'bool'type) would no longer be
  accepted.

Message-ID: <20230906120503.359863-4-david@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2023-09-19 10:23:21 +02:00
Markus Armbruster
9e272073e1 qapi: Reformat recent doc comments to conform to current conventions
Since commit a937b6aa73 (qapi: Reformat doc comments to conform to
current conventions), a number of comments not conforming to the
current formatting conventions were added.  No problem, just sweep
the entire documentation once more.

To check the generated documentation does not change, I compared the
generated HTML before and after this commit with "wdiff -3".  Finds no
differences.  Comparing with diff is not useful, as the reflown
paragraphs are visible there.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20230720071610.1096458-7-armbru@redhat.com>
2023-07-26 14:51:36 +02:00
Alexander Graf
4b870dc4d0 hostmem-file: add offset option
Add an option for hostmem-file to start the memory object at an offset
into the target file. This is useful if multiple memory objects reside
inside the same target file, such as a device node.

In particular, it's useful to map guest memory directly into /dev/mem
for experimentation.

To make this work consistently, also fix up all places in QEMU that
expect fd offsets to be 0.

Signed-off-by: Alexander Graf <graf@amazon.com>
Message-Id: <20230403221421.60877-1-graf@amazon.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2023-05-23 16:47:03 +02:00
Markus Armbruster
a937b6aa73 qapi: Reformat doc comments to conform to current conventions
Change

    # @name: Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed
    #        do eiusmod tempor incididunt ut labore et dolore magna aliqua.

to

    # @name: Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed
    #     do eiusmod tempor incididunt ut labore et dolore magna aliqua.

See recent commit "qapi: Relax doc string @name: description
indentation rules" for rationale.

Reflow paragraphs to 70 columns width, and consistently use two spaces
to separate sentences.

To check the generated documentation does not change, I compared the
generated HTML before and after this commit with "wdiff -3".  Finds no
differences.  Comparing with diff is not useful, as the reflown
paragraphs are visible there.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20230428105429.1687850-18-armbru@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Acked-by: Lukas Straub <lukasstraub2@web.de>
[Straightforward conflicts in qapi/audio.json qapi/misc-target.json
qapi/run-state.json resolved]
2023-05-10 10:01:01 +02:00
Markus Armbruster
37fa48a4cb qapi: Tidy up examples
A few examples neglect to prefix QMP input with '->'.  Fix that.

Two examples have extra space after '<-'.  Delete it.

A few examples neglect to show output.  Provide some.  The example
output for query-vcpu-dirty-limit could use further improvement.  Add
a TODO comment.

Use "Examples:" instead of "Example:" where multiple examples are
given.

One example section numbers its two examples.  Not done elsewhere;
drop.

Another example section separates them with "or".  Likewise.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20230425064223.820979-8-armbru@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
2023-04-28 11:48:34 +02:00
Markus Armbruster
f1a787b5f4 qapi: @foo should be used to reference, not `foo`
Documentation suggests @foo is merely shorthand for ``foo``.  It's
not, it carries additional meaning: it's a reference to a QAPI schema
name.

Reword the documentation to spell that out.

Fix up the few ``foo`` that should be @foo.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20230425064223.820979-7-armbru@redhat.com>
2023-04-28 11:48:34 +02:00
zhenwei pi
2580b452ff cryptodev: support QoS
Add 'throttle-bps' and 'throttle-ops' limitation to set QoS. The
two arguments work with both QEMU command line and QMP command.

Example of QEMU command line:
-object cryptodev-backend-builtin,id=cryptodev1,throttle-bps=1600,\
throttle-ops=100

Example of QMP command:
virsh qemu-monitor-command buster --hmp qom-set /objects/cryptodev1 \
throttle-ops 100

or cancel limitation:
virsh qemu-monitor-command buster --hmp qom-set /objects/cryptodev1 \
throttle-ops 0

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Message-Id: <20230301105847.253084-11-pizhenwei@bytedance.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-03-07 12:38:59 -05:00
Stefan Hajnoczi
f21f1cfeb9 pci,pc,virtio: features, tests, fixes, cleanups
lots of acpi rework
 first version of biosbits infrastructure
 ASID support in vhost-vdpa
 core_count2 support in smbios
 PCIe DOE emulation
 virtio vq reset
 HMAT support
 part of infrastructure for viommu support in vhost-vdpa
 VTD PASID support
 fixes, tests all over the place
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmNpXDkPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpD0AH/2G8ZPrgrxJC9y3uD5/5J6QRzO+TsDYbg5ut
 uBf4rKSHHzcu6zdyAfsrhbAKKzyD4HrEGNXZrBjnKM1xCiB/SGBcDIWntwrca2+s
 5Dpbi4xvd4tg6tVD4b47XNDCcn2uUbeI0e2M5QIbtCmzdi/xKbFAfl5G8DQp431X
 Kmz79G4CdKWyjVlM0HoYmdCw/4FxkdjD02tE/Uc5YMrePNaEg5Bw4hjCHbx1b6ur
 6gjeXAtncm9s4sO0l+sIdyiqlxiTry9FSr35WaQ0qPU+Og5zaf1EiWfdl8TRo4qU
 EAATw5A4hyw11GfOGp7oOVkTGvcNB/H7aIxD7emdWZV8+BMRPKo=
 =zTCn
 -----END PGP SIGNATURE-----

Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging

pci,pc,virtio: features, tests, fixes, cleanups

lots of acpi rework
first version of biosbits infrastructure
ASID support in vhost-vdpa
core_count2 support in smbios
PCIe DOE emulation
virtio vq reset
HMAT support
part of infrastructure for viommu support in vhost-vdpa
VTD PASID support
fixes, tests all over the place

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# -----BEGIN PGP SIGNATURE-----
#
# iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmNpXDkPHG1zdEByZWRo
# YXQuY29tAAoJECgfDbjSjVRpD0AH/2G8ZPrgrxJC9y3uD5/5J6QRzO+TsDYbg5ut
# uBf4rKSHHzcu6zdyAfsrhbAKKzyD4HrEGNXZrBjnKM1xCiB/SGBcDIWntwrca2+s
# 5Dpbi4xvd4tg6tVD4b47XNDCcn2uUbeI0e2M5QIbtCmzdi/xKbFAfl5G8DQp431X
# Kmz79G4CdKWyjVlM0HoYmdCw/4FxkdjD02tE/Uc5YMrePNaEg5Bw4hjCHbx1b6ur
# 6gjeXAtncm9s4sO0l+sIdyiqlxiTry9FSr35WaQ0qPU+Og5zaf1EiWfdl8TRo4qU
# EAATw5A4hyw11GfOGp7oOVkTGvcNB/H7aIxD7emdWZV8+BMRPKo=
# =zTCn
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 07 Nov 2022 14:27:53 EST
# gpg:                using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg:                issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (83 commits)
  checkpatch: better pattern for inline comments
  hw/virtio: introduce virtio_device_should_start
  tests/acpi: update tables for new core count test
  bios-tables-test: add test for number of cores > 255
  tests/acpi: allow changes for core_count2 test
  bios-tables-test: teach test to use smbios 3.0 tables
  hw/smbios: add core_count2 to smbios table type 4
  vhost-user: Support vhost_dev_start
  vhost: Change the sequence of device start
  intel-iommu: PASID support
  intel-iommu: convert VTD_PE_GET_FPD_ERR() to be a function
  intel-iommu: drop VTDBus
  intel-iommu: don't warn guest errors when getting rid2pasid entry
  vfio: move implement of vfio_get_xlat_addr() to memory.c
  tests: virt: Update expected *.acpihmatvirt tables
  tests: acpi: aarch64/virt: add a test for hmat nodes with no initiators
  hw/arm/virt: Enable HMAT on arm virt machine
  tests: Add HMAT AArch64/virt empty table files
  tests: acpi: q35: update expected blobs *.hmat-noinitiators expected HMAT:
  tests: acpi: q35: add test for hmat nodes without initiators
  ...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-11-07 18:43:56 -05:00
Stefan Weil
1e458f1127 Fix some typos in documentation and comments
Most of them were found and fixed using codespell.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20221030105944.311940-1-sw@weilnetz.de>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-11-05 20:35:45 +01:00
Lei He
39fff6f3e8 cryptodev: Add a lkcf-backend for cryptodev
cryptodev: Added a new type of backend named lkcf-backend for
cryptodev. This backend upload asymmetric keys to linux kernel,
and let kernel do the accelerations if possible.
The lkcf stands for Linux Kernel Cryptography Framework.

Signed-off-by: lei he <helei.sig11@bytedance.com>
Message-Id: <20221008085030.70212-5-helei.sig11@bytedance.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-11-02 06:56:32 -04:00
David Hildenbrand
e681645862 hostmem: Allow for specifying a ThreadContext for preallocation
Let's allow for specifying a thread context via the "prealloc-context"
property. When set, preallcoation threads will be crated via the
thread context -- inheriting the same CPU affinity as the thread
context.

Pinning preallcoation threads to CPUs can heavily increase performance
in NUMA setups, because, preallocation from a CPU close to the target
NUMA node(s) is faster then preallocation from a CPU further remote,
simply because of memory bandwidth for initializing memory with zeroes.
This is especially relevant for very large VMs backed by huge/gigantic
pages, whereby preallocation is mandatory.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Message-Id: <20221014134720.168738-7-david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2022-10-27 11:01:03 +02:00
David Hildenbrand
10218ae6d0 util: Add write-only "node-affinity" property for ThreadContext
Let's make it easier to pin threads created via a ThreadContext to
all host CPUs currently belonging to a given set of host NUMA nodes --
which is the common case.

"node-affinity" is simply a shortcut for setting "cpu-affinity" manually
to the list of host CPUs belonging to the set of host nodes. This property
can only be written.

A simple QEMU example to set the CPU affinity to host node 1 on a system
with two nodes, 24 CPUs each, whereby odd-numbered host CPUs belong to
host node 1:
    qemu-system-x86_64 -S \
      -object thread-context,id=tc1,node-affinity=1

And we can query the cpu-affinity via HMP/QMP:
    (qemu) qom-get tc1 cpu-affinity
    [
        1,
        3,
        5,
        7,
        9,
        11,
        13,
        15,
        17,
        19,
        21,
        23,
        25,
        27,
        29,
        31,
        33,
        35,
        37,
        39,
        41,
        43,
        45,
        47
    ]

We cannot query the node-affinity:
    (qemu) qom-get tc1 node-affinity
    Error: Insufficient permission to perform this operation

But note that due to dynamic library loading this example will not work
before we actually make use of thread_context_create_thread() in QEMU
code, because the type will otherwise not get registered. We'll wire
this up next to make it work.

Note that if the host CPUs for a host node change due do CPU hot(un)plug
CPU onlining/offlining (i.e., lscpu output changes) after the ThreadContext
was started, the CPU affinity will not get updated.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20221014134720.168738-5-david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2022-10-27 11:00:50 +02:00
David Hildenbrand
e2de2c497e util: Introduce ThreadContext user-creatable object
Setting the CPU affinity of QEMU threads is a bit problematic, because
QEMU doesn't always have permissions to set the CPU affinity itself,
for example, with seccomp after initialized by QEMU:
    -sandbox enable=on,resourcecontrol=deny

General information about CPU affinities can be found in the man page of
taskset:
    CPU affinity is a scheduler property that "bonds" a process to a given
    set of CPUs on the system. The Linux scheduler will honor the given CPU
    affinity and the process will not run on any other CPUs.

While upper layers are already aware of how to handle CPU affinities for
long-lived threads like iothreads or vcpu threads, especially short-lived
threads, as used for memory-backend preallocation, are more involved to
handle. These threads are created on demand and upper layers are not even
able to identify and configure them.

Introduce the concept of a ThreadContext, that is essentially a thread
used for creating new threads. All threads created via that context
thread inherit the configured CPU affinity. Consequently, it's
sufficient to create a ThreadContext and configure it once, and have all
threads created via that ThreadContext inherit the same CPU affinity.

The CPU affinity of a ThreadContext can be configured two ways:

(1) Obtaining the thread id via the "thread-id" property and setting the
    CPU affinity manually (e.g., via taskset).

(2) Setting the "cpu-affinity" property and letting QEMU try set the
    CPU affinity itself. This will fail if QEMU doesn't have permissions
    to do so anymore after seccomp was initialized.

A simple QEMU example to set the CPU affinity to host CPU 0,1,6,7 would be:
    qemu-system-x86_64 -S \
      -object thread-context,id=tc1,cpu-affinity=0-1,cpu-affinity=6-7

And we can query it via HMP/QMP:
    (qemu) qom-get tc1 cpu-affinity
    [
        0,
        1,
        6,
        7
    ]

But note that due to dynamic library loading this example will not work
before we actually make use of thread_context_create_thread() in QEMU
code, because the type will otherwise not get registered. We'll wire
this up next to make it work.

In general, the interface behaves like pthread_setaffinity_np(): host
CPU numbers that are currently not available are ignored; only host CPU
numbers that are impossible with the current kernel will fail. If the
list of host CPU numbers does not include a single CPU that is
available, setting the CPU affinity will fail.

A ThreadContext can be reused, simply by reconfiguring the CPU affinity.
Note that the CPU affinity of previously created threads will not get
adjusted.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20221014134720.168738-4-david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2022-10-27 11:00:43 +02:00
Jagannathan Raman
8f9a9259d3 vfio-user: define vfio-user-server object
Define vfio-user object which is remote process server for QEMU. Setup
object initialization functions and properties necessary to instantiate
the object

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: e45a17001e9b38f451543a664ababdf860e5f2f2.1655151679.git.jag.raman@oracle.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-06-15 16:43:42 +01:00
Nicolas Saenz Julienne
71ad4713cc util/event-loop-base: Introduce options to set the thread pool size
The thread pool regulates itself: when idle, it kills threads until
empty, when in demand, it creates new threads until full. This behaviour
doesn't play well with latency sensitive workloads where the price of
creating a new thread is too high. For example, when paired with qemu's
'-mlock', or using safety features like SafeStack, creating a new thread
has been measured take multiple milliseconds.

In order to mitigate this let's introduce a new 'EventLoopBase'
property to set the thread pool size. The threads will be created during
the pool's initialization or upon updating the property's value, remain
available during its lifetime regardless of demand, and destroyed upon
freeing it. A properly characterized workload will then be able to
configure the pool to avoid any latency spikes.

Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20220425075723.20019-4-nsaenzju@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-05-09 10:43:23 +01:00
Nicolas Saenz Julienne
70ac26b9e5 util/main-loop: Introduce the main loop into QOM
'event-loop-base' provides basic property handling for all 'AioContext'
based event loops. So let's define a new 'MainLoopClass' that inherits
from it. This will permit tweaking the main loop's properties through
qapi as well as through the command line using the '-object' keyword[1].
Only one instance of 'MainLoopClass' might be created at any time.

'EventLoopBaseClass' learns a new callback, 'can_be_deleted()' so as to
mark 'MainLoop' as non-deletable.

[1] For example:
      -object main-loop,id=main-loop,aio-max-batch=<value>

Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20220425075723.20019-3-nsaenzju@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-05-09 10:43:23 +01:00
Nicolas Saenz Julienne
7d5983e3c8 Introduce event-loop-base abstract class
Introduce the 'event-loop-base' abstract class, it'll hold the
properties common to all event loops and provide the necessary hooks for
their creation and maintenance. Then have iothread inherit from it.

EventLoopBaseClass is defined as user creatable and provides a hook for
its children to attach themselves to the user creatable class 'complete'
function. It also provides an update_params() callback to propagate
property changes onto its children.

The new 'event-loop-base' class will live in the root directory. It is
built on its own using the 'link_whole' option (there are no direct
function dependencies between the class and its children, it all happens
trough 'constructor' magic). And also imposes new compilation
dependencies:

    qom <- event-loop-base <- blockdev (iothread.c)

And in subsequent patches:

    qom <- event-loop-base <- qemuutil (util/main-loop.c)

All this forced some amount of reordering in meson.build:

 - Moved qom build definition before qemuutil. Doing it the other way
   around (i.e. moving qemuutil after qom) isn't possible as a lot of
   core libraries that live in between the two depend on it.

 - Process the 'hw' subdir earlier, as it introduces files into the
   'qom' source set.

No functional changes intended.

Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20220425075723.20019-2-nsaenzju@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-05-09 10:43:23 +01:00
Dov Murik
55cdf56641 qapi/qom,target/i386: sev-guest: Introduce kernel-hashes=on|off option
Introduce new boolean 'kernel-hashes' option on the sev-guest object.
It will be used to to decide whether to add the hashes of
kernel/initrd/cmdline to SEV guest memory when booting with -kernel.
The default value is 'off'.

Signed-off-by: Dov Murik <dovmurik@linux.ibm.com>
Acked-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2021-11-18 11:07:44 +00:00
Markus Armbruster
9fb49daabf qapi: Mark unstable QMP parts with feature 'unstable'
Add special feature 'unstable' everywhere the name starts with 'x-',
except for InputBarrierProperties member x-origin and
MemoryBackendProperties member x-use-canonical-path-for-ramblock-id,
because these two are actually stable.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Acked-by: John Snow <jsnow@redhat.com>
Message-Id: <20211028102520.747396-3-armbru@redhat.com>
2021-10-29 15:55:52 +02:00
Thomas Huth
f1279fc15b qapi: Make some ObjectTypes depend on the build settings
Some of the ObjectType entries already depend on CONFIG_* switches.
Some others also only make sense with certain configurations, but
are currently always listed in the ObjectType enum. Let's make them
depend on the correpsonding CONFIG_* switches, too, so that upper
layers (like libvirt) have a better way to determine which features
are available in QEMU.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20210928160232.432980-1-thuth@redhat.com>
[Do the same for MemoryBackendEpcProperties. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-13 10:47:49 +02:00
Yang Zhong
46a1d21dba qom: Add memory-backend-epc ObjectOptions support
Add the new 'memory-backend-epc' user creatable QOM object in
the ObjectOptions to support SGX since v6.1, or the sgx backend
object cannot bootup.

Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20210719112136.57018-4-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-30 14:50:19 +02:00
Marc-André Lureau
8a9f1e1d9c qapi: make 'if' condition strings simple identifiers
Change the 'if' condition strings to be C-agnostic. It will accept
'[A-Z][A-Z0-9_]*' identifiers. This allows to express configuration
conditions in other languages (Rust or Python for ex) or other more
suitable forms.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: John Snow <jsnow@redhat.com>
Message-Id: <20210804083105.97531-11-marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Rebased with semantic conflict in redefined-event.json]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2021-08-26 13:53:56 +02:00
Stefano Garzarella
1793ad0247 iothread: add aio-max-batch parameter
The `aio-max-batch` parameter will be propagated to AIO engines
and it will be used to control the maximum number of queued requests.

When there are in queue a number of requests equal to `aio-max-batch`,
the engine invokes the system call to forward the requests to the kernel.

This parameter allows us to control the maximum batch size to reduce
the latency that requests might accumulate while queued in the AIO
engine queue.

If `aio-max-batch` is equal to 0 (default value), the AIO engine will
use its default maximum batch size value.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20210721094211.69853-3-sgarzare@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-07-21 13:47:50 +01:00
David Hildenbrand
9181fb7043 hostmem: Wire up RAM_NORESERVE via "reserve" property
Let's provide a way to control the use of RAM_NORESERVE via memory
backends using the "reserve" property which defaults to true (old
behavior).

Only Linux currently supports clearing the flag (and support is checked at
runtime, depending on the setting of "/proc/sys/vm/overcommit_memory").
Windows and other POSIX systems will bail out with "reserve=false".

The target use case is virtio-mem, which dynamically exposes memory
inside a large, sparse memory area to the VM. This essentially allows
avoiding to set "/proc/sys/vm/overcommit_memory == 0") when using
virtio-mem and also supporting hugetlbfs in the future.

As really only Linux implements RAM_NORESERVE right now, let's expose
the property only with CONFIG_LINUX. Setting the property to "false"
will then only fail in corner cases -- for example on very old kernels
or when memory overcommit was completely disabled by the admin.

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com> for memory backend and machine core
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210510114328.21835-11-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-15 20:27:38 +02:00
Paolo Bonzini
6ba7ada355 qtest: add a QOM object for qtest
The qtest server right now can only be created using the -qtest
and -qtest-log options.  Allow an alternative way to create it
using "-object qtest,chardev=...,log=...".

This is part of the long term plan to make more (or all) of
QEMU configurable through QMP and preconfig mode.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-26 14:49:45 +02:00
Paolo Bonzini
9e33013bd4 object: add more commands to preconfig mode
Creating and destroying QOM objects does not require a fully constructed
machine.  Allow running object-add and object-del before machine
initialization has concluded.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-26 14:49:45 +02:00
Michael Tokarev
09ceb33091 qapi: spelling fix (addtional)
Fixes: 3d0d3c30ae
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210508093315.393274-1-mjt@msgid.tls.msk.ru>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-05-13 17:57:24 +02:00