This contains an important CVE fix for virtiofsd,
together with two fixes for over-eager seccomp rules.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEERfXHG0oMt/uXep+pBRYzHrxb/ecFAmAcPU0ACgkQBRYzHrxb
/edRZhAAtyA0EuCFyXks4waSM9jsunFj+FrNbARLTVVzyn1/2dr1w+0qf8YP4IaJ
hHJJUYGCS1L+j9+EZsrUV06Ry/oCeMEukGgjxitquPVord09PujZ+Ty6wSmxzOr0
lokJKJGqB8O3M/F7Tq28OuH0wRwf8f0xG/vw16Q3LGDc44hiB4RrYOHWEAx9tnKT
dBlUAmSdO5SMoP47PcK4KwX++VCjZ09Kr/zhbU0yyIhYqtZxXInCApxwkScUK7zG
F0YN9xtJV1QPdg4ZNpVMOWf48bf5x0dphNg4H8RUbqmmtoje4A++thfSsy8h+GxP
21lnctCzFioVPwmNpiOQHGi5LzX3rasd0lfUvNkV1WE8LvNR7qcc8n47gBuTNsF7
GQm+cUPGn3IVogpirU/OPqixLh5FfMwaMN20ujdQuek2+llVKJYG3t01ITiVUyca
wVDSkt62asCEBay48CcbTx2ceSQ8I3h9VZW8mpE8nHxxJhxu2xQZISqwjQmyafU5
gCDZUcZB1znA0IaTtMZYg3+0ICY/3TyjYc5DxKd7bdxP2nWryjjNeSKySPMWurD2
3GpeCCecad441qmjQjKFzrza7PeBYgIM1iUzHeSiFST9dAIXcu8qe7aOHxLbzyK5
5wQzygm7+ttr1GqjM9jWv8462M/FmjZ1ALXEp33XZatGYWIjnYs=
=R0/n
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgilbert-gitlab/tags/pull-virtiofs-20210204' into staging
virtiofs: Security pull 2021-02-04
This contains an important CVE fix for virtiofsd,
together with two fixes for over-eager seccomp rules.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
# gpg: Signature made Thu 04 Feb 2021 18:30:37 GMT
# gpg: using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>" [full]
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7
* remotes/dgilbert-gitlab/tags/pull-virtiofs-20210204:
virtiofsd: Add restart_syscall to the seccomp whitelist
virtiofsd: Add _llseek to the seccomp whitelist
virtiofsd: prevent opening of special files (CVE-2020-35517)
virtiofsd: optionally return inode pointer from lo_do_lookup()
virtiofsd: extract lo_do_open() from lo_open()
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This is how linux restarts some system calls after SIGSTOP/SIGCONT.
This is needed to avoid virtiofsd termination when resuming execution
under GDB for example.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210201193305.136390-1-groug@kaod.org>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
This is how glibc implements lseek(2) on POWER.
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1917692
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210121171540.1449777-1-groug@kaod.org>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
A well-behaved FUSE client does not attempt to open special files with
FUSE_OPEN because they are handled on the client side (e.g. device nodes
are handled by client-side device drivers).
The check to prevent virtiofsd from opening special files is missing in
a few cases, most notably FUSE_OPEN. A malicious client can cause
virtiofsd to open a device node, potentially allowing the guest to
escape. This can be exploited by a modified guest device driver. It is
not exploitable from guest userspace since the guest kernel will handle
special files inside the guest instead of sending FUSE requests.
This patch fixes this issue by introducing the lo_inode_open() function
to check the file type before opening it. This is a short-term solution
because it does not prevent a compromised virtiofsd process from opening
device nodes on the host.
Restructure lo_create() to try O_CREAT | O_EXCL first. Note that O_CREAT
| O_EXCL does not follow symlinks, so O_NOFOLLOW masking is not
necessary here. If the file exists and the user did not specify O_EXCL,
open it via lo_do_open().
Reported-by: Alex Xu <alex@alxu.ca>
Fixes: CVE-2020-35517
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20210204150208.367837-4-stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
lo_do_lookup() finds an existing inode or allocates a new one. It
increments nlookup so that the inode stays alive until the client
releases it.
Existing callers don't need the struct lo_inode so the function doesn't
return it. Extend the function to optionally return the inode. The next
commit will need it.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210204150208.367837-3-stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Both lo_open() and lo_create() have similar code to open a file. Extract
a common lo_do_open() function from lo_open() that will be used by
lo_create() in a later commit.
Since lo_do_open() does not otherwise need fuse_req_t req, convert
lo_add_fd_mapping() to use struct lo_data *lo instead.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20210204150208.367837-2-stefanha@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
This will check virtio/vhost-user-vga & virgl are correctly initialized
by the Linux kernel on an egl-headless display.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20210204105232.834642-21-marcandre.lureau@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Not all chardevs are created via qmp_chardev_open_socket(), and those
should not call the yank function registration, as this will eventually
assert() not being registered.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20210204105232.834642-20-marcandre.lureau@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Displaying rendered resources requires blocking qemu GPU to avoid extra
framebuffer copies. For an external display, via Spice currently, there
is a callback to block/unblock the rendering in the same thread.
But with the vhost-user-gpu backend, the qemu process doesn't handle
the rendering itself, and the blocking callback isn't effective.
Instead, the backend must be notified when the display code is done.
Fix this by adding a new GraphicHwOps callback to indicate the GL state
is flushed, and we are done manipulating the shared GL resources. Call
it from gtk and spice display.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20210204105232.834642-19-marcandre.lureau@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
The next patch will notify the GL context got flush, which will resume
the queue processing. However, if this happens within the caller
context, it will end up with a stack overflow flush/update loop.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20210204105232.834642-18-marcandre.lureau@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
GtkGLArea is used on wayland, where EGL is usually available.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20210204105232.834642-17-marcandre.lureau@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Those flags can be used to express different requirements for the
display or other needs.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20210204105232.834642-12-marcandre.lureau@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
This check is currently limited. It only is used by vhost-user-gpu (not
by vfio-display), and will print an error repeatedly during run-time.
We are going to dissociate the GL context from the
DisplayChangeListener, and listeners may come and go. The following
patches will address this differently.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20210204105232.834642-10-marcandre.lureau@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
There are no users left.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20210204105232.834642-7-marcandre.lureau@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Fixes a deadlock where the backend calls QEMU, while QEMU also calls the
backend simultaneously, both ends waiting for each other.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20210204105232.834642-5-marcandre.lureau@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Introduce a pending state for commands which aren't finished yet, but
are being handled. See following patch.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210204105232.834642-4-marcandre.lureau@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Fixes a deadlock where the backend calls QEMU, while QEMU also calls the
backend simultaneously, both ends waiting for each other.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20210204105232.834642-3-marcandre.lureau@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
EDID has been enabled by default, but the backend may not implement
it (such as the contrib backend). This results in extra warnings and
potentially other issues in the guest.
The option shouldn't probably have been added to VIRTIO_GPU_BASE, but
it's a bit too late now, report an error and disable EDID when it's
not available.
Fixes: 0a7196625 ("edid: flip the default to enabled")
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20210204105232.834642-2-marcandre.lureau@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAmAb5tsSHGFybWJydUBy
ZWRoYXQuY29tAAoJEDhwtADrkYZTQMsQAJSlE7ns788RRhPH79kPh9yTFp8/DCWD
vzNesSCMVp3o7a9DBPf2U2+oxHvF5HsLJG+2eqftouqlT4DWUtfYsBFPCsaHyPry
KCAvrb3smP4GjedI5xpseyLKeW/c5SmNdwqahAD8ma/hAtJnoTBblphne88jYxj9
Q2l/dUzQuIgPy+jZ2DjxQYam9Vu2l3svNLvMGMe5L6/TAcSoNJV3jJ14Wz7GjoL2
If2LaHsvStFgMAniL16kv4OGd5c4jOy1QFdLGQQAbxavJsHQPhXGlq7xYRudXcdX
vn2HH6WQHCfwFfxkPAqOgund3Bf/LacODD9+oNUhhWgeZUiZZfVGvHjmODp2gzHa
b7m3YogIYHdr8taYemwEzuuhOR9I1gE/v4UvTokd3eXRY6HhkjTpwWjV06Codrhq
DKAPcjRZ1Q4XZVWAURZBz7icGXc15EuuneC6ygIZDrR1Ez73xfJ82WUscNx6MujX
8pOPgNmQtVCRqMFRlfvZl1cVCu9Qh/1pkeDDWjR80OxvNB4iYQKfvB1GLxZJL3Vx
zsH8oVajd+8sXdUyUA1JTM5r7b9/6TJOlktHhZ9F3TDDVbwhmcBPgTWvRa139SVf
SUkAMZjBFx/lsuuR8S3aOG+NftYQZs3TjPqWBklpXe1nEwOxjBipPDAZMq5xccVf
DVrE5Ay2/1T0
=l1y4
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/armbru/tags/pull-qmp-2021-02-04' into staging
QMP patches patches for 2021-02-04
# gpg: Signature made Thu 04 Feb 2021 12:21:47 GMT
# gpg: using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg: issuer "armbru@redhat.com"
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
# gpg: aka "Markus Armbruster <armbru@pond.sub.org>" [full]
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653
* remotes/armbru/tags/pull-qmp-2021-02-04:
qmp: Resume OOB-enabled monitor before processing the request
qmp: Add more tracepoints
qmp: Fix up comments after commit 9ce44e2ce2
docs/interop/qmp-spec: Document the request queue limit
qobject: braces {} are necessary for all arms of this statement
qobject: spaces required around that operators
qobject: code indent should never use tabs
qobject: open brace '{' following struct go on the same line
monitor/qmp-cmds.c: Don't include ui/vnc.h
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
QEMU used to run qemu_spice.display_init() before vm_start(), and
QXL/display interfaces where started then. Now, vm_start() happens
before QXL/display interfaces are added and Spice server doesn't
automatically start them in this case (fixed in spice git)
Fixes Spice regression introduced after 5.2, with refactoring commits
b4e1a34211 ("vl: remove separate preconfig main_loop") and
facf7c60ee ("vl: initialize displays _after_ exiting preconfiguration"),
probably others.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20210129152351.161971-1-marcandre.lureau@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Asynchronous handlers may be waiting for the graphic_hw_update_done() to
be called in this case too.
Fixes: 4d6316218 ("console: add graphic_hw_update_done()")
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20210201201422.446552-3-marcandre.lureau@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
On secondary QXL devices, the console is only set on qxl.vga.con. But
graphic_hw_update_done() is called with qxl.ssd.dcl.con.
Like for primary QXL devices, set qxl.sdd.dcl.con = qxl.vga.con.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20210201201422.446552-2-marcandre.lureau@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
We should use printf format specifier "%u" instead of "%d" for
argument of type "unsigned int".
Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Alex Chen <alex.chen@huawei.com>
Message-id: 20201119025851.56487-1-alex.chen@huawei.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
monitor_qmp_dispatcher_co() needs to resume the monitor if
handle_qmp_command() suspended it. Two cases:
1. OOB enabled: suspended if mon->qmp_requests has no more space
2. OOB disabled: suspended always
We resume only after we processed the request. Which can take a long
time.
Resume the monitor right when the queue has space to keep the monitor
available for out-of-band commands even in this corner case.
Leave the "OOB disabled" case alone.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20210201161504.1976989-4-armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
[Trailing whitespace tidied up]
Add tracepoints for in-band request enqueue and dequeue, processing of
queued in-band errors, and responses.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20210201161504.1976989-3-armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Commit 9ce44e2ce2 "qmp: Move dispatcher to a coroutine" replaced
monitor_qmp_bh_dispatcher() by monitor_qmp_dispatcher_co(), but
neglected to update comments. Do that now.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20210201161504.1976989-2-armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20210127144734.2367693-1-armbru@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Add braces {} for arms of if/for statement
Signed-off-by: Zhang Han <zhanghan64@huawei.com>
Message-Id: <20201228071129.24563-5-zhanghan64@huawei.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Add spaces around operators.
Signed-off-by: Zhang Han <zhanghan64@huawei.com>
Message-Id: <20201228071129.24563-4-zhanghan64@huawei.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Transfer tabs to spaces.
Signed-off-by: Zhang Han <zhanghan64@huawei.com>
Message-Id: <20201228071129.24563-3-zhanghan64@huawei.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Put open brace '{' on the same line of struct.
Signed-off-by: Zhang Han <zhanghan64@huawei.com>
Message-Id: <20201228071129.24563-2-zhanghan64@huawei.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
The qmp-cmds.c file currently includes ui/vnc.h, which (being located
in the ui/ directory rather than include) is really supposed to be
for use only by the ui subsystem. In fact the function prototypes we
need (vnc_display_password(), etc) are all declared in
include/ui/console.h, so we can switch to including that instead.
(ui/vnc.h includes include/ui/console.h, so this change strictly
reduces the quantity of headers qmp-cmds.c pulls in.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20210104161200.15068-1-peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFRBAABCgA7FiEEekgeeIaLTbaoWgXAZN846K9+IV8FAmAZ2UcdHHJpY2hhcmQu
aGVuZGVyc29uQGxpbmFyby5vcmcACgkQZN846K9+IV/2lggAi8Es/LyFve8gHcIf
A18SuZ46SqDrUYFHUZNWYxbsoIlLGdDTWYxGJHN90Zeo7Gjzhm2zA3UOEP5E+vnF
bDNZAMCiQqo4imAZMMS6LAebTG+D3COJVQ6HBiGoIccgGIPXwu0GCqObnXinRcP+
EA7WQcxujEbELIIKIVpuoC5QV949MSA4npGrfJyX0AaEtu6womPQxaPmF0xqqwJP
HDxH2l9D9oqWdrFOo/TPyma9VXLTaMvxNXPNdnhPwmjwh1139TenZW1lElyMKPYD
KfvVz2v808kC2WShZFAi/7Gl3+f24IqMBNIGuethP9APwcRgV8zTCeAlZTo2mdND
6ohENA==
=pliz
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/rth-gitlab/tags/pull-tcg-20210202' into staging
TCG backend constraints cleanup
# gpg: Signature made Tue 02 Feb 2021 22:59:19 GMT
# gpg: using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F
# gpg: issuer "richard.henderson@linaro.org"
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [full]
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A 05C0 64DF 38E8 AF7E 215F
* remotes/rth-gitlab/tags/pull-tcg-20210202: (24 commits)
tcg: Remove TCG_TARGET_CON_SET_H
tcg/tci: Split out constraint sets to tcg-target-con-set.h
tcg/sparc: Split out constraint sets to tcg-target-con-set.h
tcg/s390: Split out constraint sets to tcg-target-con-set.h
tcg/riscv: Split out constraint sets to tcg-target-con-set.h
tcg/ppc: Split out constraint sets to tcg-target-con-set.h
tcg/mips: Split out constraint sets to tcg-target-con-set.h
tcg/arm: Split out constraint sets to tcg-target-con-set.h
tcg/aarch64: Split out constraint sets to tcg-target-con-set.h
tcg/i386: Split out constraint sets to tcg-target-con-set.h
tcg: Remove TCG_TARGET_CON_STR_H
tcg/sparc: Split out target constraints to tcg-target-con-str.h
tcg/s390: Split out target constraints to tcg-target-con-str.h
tcg/riscv: Split out target constraints to tcg-target-con-str.h
tcg/mips: Split out target constraints to tcg-target-con-str.h
tcg/tci: Split out target constraints to tcg-target-con-str.h
tcg/ppc: Split out target constraints to tcg-target-con-str.h
tcg/aarch64: Split out target constraints to tcg-target-con-str.h
tcg/arm: Split out target constraints to tcg-target-con-str.h
tcg/i386: Split out target constraints to tcg-target-con-str.h
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
- more cleanup from iotest python conversion
- progress towards consistent use of signed 64-bit types through block layer
- fix some crashes related to NBD reconnect
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAmAasREACgkQp6FrSiUn
Q2q0bAgAqkkrQtMhTcQ6cZenLhBc7qIojOybnOnO8hXb7SpndHjE9f9OF0f0+OOm
9vk24KyohvCFc7IwMHA/lccvHYH4Ftb8VPgYEs+U+iCxzdyM8M1goy8tbOp82Fgg
8ahxRbBP4VTkBUiXNcCxO2KbBQL2m+YkLKvfbw+XN0KbxER6hoikfNsSty97/3su
ZKtfEqBPH7Tl7kJz0JBfGTpCSPa6cXFzIwxK+m8pq7IKeNrICg1Uu0VR49QfTWlh
2ptIxvKx+QkxLV+QoW7m/fqXFaIFye0tQAwTr/5b3/jBtAkgg8Yznn0TMgiqwXMr
SuxbUpjNWqBtjjhwavSje020VhrD2w==
=7eUe
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2021-02-02-v2' into staging
nbd patches for 2021-02-02
- more cleanup from iotest python conversion
- progress towards consistent use of signed 64-bit types through block layer
- fix some crashes related to NBD reconnect
# gpg: Signature made Wed 03 Feb 2021 14:20:01 GMT
# gpg: using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full]
# gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full]
# gpg: aka "[jpeg image of size 6874]" [full]
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A
* remotes/ericb/tags/pull-nbd-2021-02-02-v2:
nbd: make nbd_read* return -EIO on error
block/nbd: only enter connection coroutine if it's present
block/nbd: only detach existing iochannel from aio_context
block/io: use int64_t bytes in copy_range
block/io: support int64_t bytes in read/write wrappers
block/io: support int64_t bytes in bdrv_co_p{read,write}v_part()
block/io: support int64_t bytes in bdrv_aligned_preadv()
block/io: support int64_t bytes in bdrv_co_do_copy_on_readv()
block/io: support int64_t bytes in bdrv_aligned_pwritev()
block/io: support int64_t bytes in bdrv_co_do_pwrite_zeroes()
block/io: use int64_t bytes in driver wrappers
block: use int64_t as bytes type in tracked requests
block/io: improve bdrv_check_request: check qiov too
block/throttle-groups: throttle_group_co_io_limits_intercept(): 64bit bytes
block/io: bdrv_pad_request(): support qemu_iovec_init_extended failure
block/io: refactor bdrv_pad_request(): move bdrv_pad_request() up
block: fix theoretical overflow in bdrv_init_padding()
util/iov: make qemu_iovec_init_extended() honest
block: refactor bdrv_check_request: add errp
iotests: Fix expected whitespace for 185
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
NBD reconnect logic considers the error code from the functions that
read NBD messages to tell if reconnect should be attempted or not: it is
attempted on -EIO, otherwise the client transitions to NBD_CLIENT_QUIT
state (see nbd_channel_error). This error code is propagated from the
primitives like nbd_read.
The problem, however, is that nbd_read itself turns every error into -1
rather than -EIO. As a result, if the NBD server happens to die while
sending the message, the client in QEMU receives less data than it
expects, considers it as a fatal error, and wouldn't attempt
reestablishing the connection.
Fix it by turning every negative return from qio_channel_read_all into
-EIO returned from nbd_read. Apparently that was the original behavior,
but got broken later. Also adjust nbd_readXX to follow.
Fixes: e6798f06a6 ("nbd: generalize usage of nbd_read")
Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210129073859.683063-4-rvkagan@yandex-team.ru>
Signed-off-by: Eric Blake <eblake@redhat.com>
When an NBD block driver state is moved from one aio_context to another
(e.g. when doing a drain in a migration thread),
nbd_client_attach_aio_context_bh is executed that enters the connection
coroutine.
However, the assumption that ->connection_co is always present here
appears incorrect: the connection may have encountered an error other
than -EIO in the underlying transport, and thus may have decided to quit
rather than keep trying to reconnect, and therefore it may have
terminated the connection coroutine. As a result an attempt to reassign
the client in this state (NBD_CLIENT_QUIT) to a different aio_context
leads to a null pointer dereference:
#0 qio_channel_detach_aio_context (ioc=0x0)
at /build/qemu-gYtjVn/qemu-5.0.1/io/channel.c:452
#1 0x0000562a242824b3 in bdrv_detach_aio_context (bs=0x562a268d6a00)
at /build/qemu-gYtjVn/qemu-5.0.1/block.c:6151
#2 bdrv_set_aio_context_ignore (bs=bs@entry=0x562a268d6a00,
new_context=new_context@entry=0x562a260c9580,
ignore=ignore@entry=0x7feeadc9b780)
at /build/qemu-gYtjVn/qemu-5.0.1/block.c:6230
#3 0x0000562a24282969 in bdrv_child_try_set_aio_context
(bs=bs@entry=0x562a268d6a00, ctx=0x562a260c9580,
ignore_child=<optimized out>, errp=<optimized out>)
at /build/qemu-gYtjVn/qemu-5.0.1/block.c:6332
#4 0x0000562a242bb7db in blk_do_set_aio_context (blk=0x562a2735d0d0,
new_context=0x562a260c9580,
update_root_node=update_root_node@entry=true, errp=errp@entry=0x0)
at /build/qemu-gYtjVn/qemu-5.0.1/block/block-backend.c:1989
#5 0x0000562a242be0bd in blk_set_aio_context (blk=<optimized out>,
new_context=<optimized out>, errp=errp@entry=0x0)
at /build/qemu-gYtjVn/qemu-5.0.1/block/block-backend.c:2010
#6 0x0000562a23fbd953 in virtio_blk_data_plane_stop (vdev=<optimized
out>)
at /build/qemu-gYtjVn/qemu-5.0.1/hw/block/dataplane/virtio-blk.c:292
#7 0x0000562a241fc7bf in virtio_bus_stop_ioeventfd (bus=0x562a260dbf08)
at /build/qemu-gYtjVn/qemu-5.0.1/hw/virtio/virtio-bus.c:245
#8 0x0000562a23fefb2e in virtio_vmstate_change (opaque=0x562a260dbf90,
running=0, state=<optimized out>)
at /build/qemu-gYtjVn/qemu-5.0.1/hw/virtio/virtio.c:3220
#9 0x0000562a2402ebfd in vm_state_notify (running=running@entry=0,
state=state@entry=RUN_STATE_FINISH_MIGRATE)
at /build/qemu-gYtjVn/qemu-5.0.1/softmmu/vl.c:1275
#10 0x0000562a23f7bc02 in do_vm_stop (state=RUN_STATE_FINISH_MIGRATE,
send_stop=<optimized out>)
at /build/qemu-gYtjVn/qemu-5.0.1/cpus.c:1032
#11 0x0000562a24209765 in migration_completion (s=0x562a260e83a0)
at /build/qemu-gYtjVn/qemu-5.0.1/migration/migration.c:2914
#12 migration_iteration_run (s=0x562a260e83a0)
at /build/qemu-gYtjVn/qemu-5.0.1/migration/migration.c:3275
#13 migration_thread (opaque=opaque@entry=0x562a260e83a0)
at /build/qemu-gYtjVn/qemu-5.0.1/migration/migration.c:3439
#14 0x0000562a2435ca96 in qemu_thread_start (args=<optimized out>)
at /build/qemu-gYtjVn/qemu-5.0.1/util/qemu-thread-posix.c:519
#15 0x00007feed31466ba in start_thread (arg=0x7feeadc9c700)
at pthread_create.c:333
#16 0x00007feed2e7c41d in __GI___sysctl (name=0x0, nlen=608471908,
oldval=0x562a2452b138, oldlenp=0x0, newval=0x562a2452c5e0
<__func__.28102>, newlen=0)
at ../sysdeps/unix/sysv/linux/sysctl.c:30
#17 0x0000000000000000 in ?? ()
Fix it by checking that the connection coroutine is non-null before
trying to enter it. If it is null, no entering is needed, as the
connection is probably going down anyway.
Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210129073859.683063-3-rvkagan@yandex-team.ru>
Signed-off-by: Eric Blake <eblake@redhat.com>
When the reconnect in NBD client is in progress, the iochannel used for
NBD connection doesn't exist. Therefore an attempt to detach it from
the aio_context of the parent BlockDriverState results in a NULL pointer
dereference.
The problem is triggerable, in particular, when an outgoing migration is
about to finish, and stopping the dataplane tries to move the
BlockDriverState from the iothread aio_context to the main loop. If the
NBD connection is lost before this point, and the NBD client has entered
the reconnect procedure, QEMU crashes:
#0 qemu_aio_coroutine_enter (ctx=0x5618056c7580, co=0x0)
at /build/qemu-6MF7tq/qemu-5.0.1/util/qemu-coroutine.c:109
#1 0x00005618034b1b68 in nbd_client_attach_aio_context_bh (
opaque=0x561805ed4c00) at /build/qemu-6MF7tq/qemu-5.0.1/block/nbd.c:164
#2 0x000056180353116b in aio_wait_bh (opaque=0x7f60e1e63700)
at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-wait.c:55
#3 0x0000561803530633 in aio_bh_call (bh=0x7f60d40a7e80)
at /build/qemu-6MF7tq/qemu-5.0.1/util/async.c:136
#4 aio_bh_poll (ctx=ctx@entry=0x5618056c7580)
at /build/qemu-6MF7tq/qemu-5.0.1/util/async.c:164
#5 0x0000561803533e5a in aio_poll (ctx=ctx@entry=0x5618056c7580,
blocking=blocking@entry=true)
at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-posix.c:650
#6 0x000056180353128d in aio_wait_bh_oneshot (ctx=0x5618056c7580,
cb=<optimized out>, opaque=<optimized out>)
at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-wait.c:71
#7 0x000056180345c50a in bdrv_attach_aio_context (new_context=0x5618056c7580,
bs=0x561805ed4c00) at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6172
#8 bdrv_set_aio_context_ignore (bs=bs@entry=0x561805ed4c00,
new_context=new_context@entry=0x5618056c7580,
ignore=ignore@entry=0x7f60e1e63780)
at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6237
#9 0x000056180345c969 in bdrv_child_try_set_aio_context (
bs=bs@entry=0x561805ed4c00, ctx=0x5618056c7580,
ignore_child=<optimized out>, errp=<optimized out>)
at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6332
#10 0x00005618034957db in blk_do_set_aio_context (blk=0x56180695b3f0,
new_context=0x5618056c7580, update_root_node=update_root_node@entry=true,
errp=errp@entry=0x0)
at /build/qemu-6MF7tq/qemu-5.0.1/block/block-backend.c:1989
#11 0x00005618034980bd in blk_set_aio_context (blk=<optimized out>,
new_context=<optimized out>, errp=errp@entry=0x0)
at /build/qemu-6MF7tq/qemu-5.0.1/block/block-backend.c:2010
#12 0x0000561803197953 in virtio_blk_data_plane_stop (vdev=<optimized out>)
at /build/qemu-6MF7tq/qemu-5.0.1/hw/block/dataplane/virtio-blk.c:292
#13 0x00005618033d67bf in virtio_bus_stop_ioeventfd (bus=0x5618056d9f08)
at /build/qemu-6MF7tq/qemu-5.0.1/hw/virtio/virtio-bus.c:245
#14 0x00005618031c9b2e in virtio_vmstate_change (opaque=0x5618056d9f90,
running=0, state=<optimized out>)
at /build/qemu-6MF7tq/qemu-5.0.1/hw/virtio/virtio.c:3220
#15 0x0000561803208bfd in vm_state_notify (running=running@entry=0,
state=state@entry=RUN_STATE_FINISH_MIGRATE)
at /build/qemu-6MF7tq/qemu-5.0.1/softmmu/vl.c:1275
#16 0x0000561803155c02 in do_vm_stop (state=RUN_STATE_FINISH_MIGRATE,
send_stop=<optimized out>) at /build/qemu-6MF7tq/qemu-5.0.1/cpus.c:1032
#17 0x00005618033e3765 in migration_completion (s=0x5618056e6960)
at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:2914
#18 migration_iteration_run (s=0x5618056e6960)
at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:3275
#19 migration_thread (opaque=opaque@entry=0x5618056e6960)
at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:3439
#20 0x0000561803536ad6 in qemu_thread_start (args=<optimized out>)
at /build/qemu-6MF7tq/qemu-5.0.1/util/qemu-thread-posix.c:519
#21 0x00007f61085d06ba in start_thread ()
from /lib/x86_64-linux-gnu/libpthread.so.0
#22 0x00007f610830641d in sysctl () from /lib/x86_64-linux-gnu/libc.so.6
#23 0x0000000000000000 in ?? ()
Fix it by checking that the iochannel is non-null before trying to
detach it from the aio_context. If it is null, no detaching is needed,
and it will get reattached in the proper aio_context once the connection
is reestablished.
Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210129073859.683063-2-rvkagan@yandex-team.ru>
Signed-off-by: Eric Blake <eblake@redhat.com>
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.
Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.
We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).
So, convert now copy_range parameters which are already 64bit to signed
type.
It's safe as we don't work with requests overflowing BDRV_MAX_LENGTH
(which is less than INT64_MAX), and do check the requests in
bdrv_co_copy_range_internal() (by bdrv_check_request32(), which calls
bdrv_check_request()).
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-17-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.
Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.
We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).
Now, since bdrv_co_preadv_part() and bdrv_co_pwritev_part() have been
updated, update all their wrappers.
For all of them type of 'bytes' is widening, so callers are safe. We
have update request_fn in blkverify.c simultaneously. Still it's just a
pointer to one of bdrv_co_pwritev() or bdrv_co_preadv(), and type is
widening for callers of the request_fn anyway.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-16-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: grammar tweak]
Signed-off-by: Eric Blake <eblake@redhat.com>
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.
Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.
We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).
So, prepare bdrv_co_preadv_part() and bdrv_co_pwritev_part() and their
remaining dependencies now.
bdrv_pad_request() is updated simultaneously, as pointer to bytes passed
to it both from bdrv_co_pwritev_part() and bdrv_co_preadv_part().
So, all callers of bdrv_pad_request() are updated to pass 64bit bytes.
bdrv_pad_request() is already good for 64bit requests, add
corresponding assertion.
Look at bdrv_co_preadv_part() and bdrv_co_pwritev_part().
Type is widening, so callers are safe. Let's look inside the functions.
In bdrv_co_preadv_part() and bdrv_aligned_pwritev() we only pass bytes
to other already int64_t interfaces (and some obviously safe
calculations), it's OK.
In bdrv_co_do_zero_pwritev() aligned_bytes may become large now, still
it's passed to bdrv_aligned_pwritev which supports int64_t bytes.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-15-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>