Commit Graph

112 Commits

Author SHA1 Message Date
Chen Qun
d6eb39b554 qtest: delete superfluous inclusions of qtest.h
There are 23 files that include the "sysemu/qtest.h",
but they do not use any qtest functions.

Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20210226081414.205946-1-kuhn.chenqun@huawei.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2021-03-09 06:03:53 +01:00
Greg Kurz
81f9766b7a 9pfs: Convert reclaim list to QSLIST
Use QSLIST instead of open-coding for a slightly improved readability.

No behavioral change.

Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Message-Id: <20210122143514.215780-1-groug@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
2021-01-22 18:26:40 +01:00
Greg Kurz
20b7f45b22 9pfs: Improve unreclaim loop
If a fid was actually re-opened by v9fs_reopen_fid(), we re-traverse the
fid list from the head in case some other request created a fid that
needs to be marked unreclaimable as well (i.e. the client opened a new
handle on the path that is being unlinked). This is suboptimal since
most if not all fids that require it have likely been taken care of
already.

This is mostly the result of new fids being added to the head of the
list. Since the list is now a QSIMPLEQ, add new fids at the end instead
to avoid the need to rewind. Take a reference on the fid to ensure it
doesn't go away during v9fs_reopen_fid() and that it can be safely
passed to QSIMPLEQ_NEXT() afterwards. Since the associated put_fid()
can also yield, same is done with the next fid. So the logic here is
to get a reference on a fid and only put it back during the next
iteration after we could get a reference on the next fid.

Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Message-Id: <20210121181510.1459390-1-groug@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
2021-01-22 15:17:19 +01:00
Greg Kurz
feabd6cf78 9pfs: Convert V9fsFidState::fid_list to QSIMPLEQ
The fid_list is currently open-coded. This doesn't seem to serve any
purpose that cannot be met with QEMU's generic lists. Let's go for a
QSIMPLEQ : this will allow to add new fids at the end of the list and
to improve the logic in v9fs_mark_fids_unreclaim().

Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Message-Id: <20210118142300.801516-3-groug@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
2021-01-21 17:49:45 +01:00
Greg Kurz
2e53160fc6 9pfs: Convert V9fsFidState::clunked to bool
This can only be 0 or 1.

Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Message-Id: <20210118142300.801516-2-groug@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
2021-01-21 17:49:45 +01:00
Greg Kurz
89fbea8737 9pfs: Fully restart unreclaim loop (CVE-2021-20181)
Depending on the client activity, the server can be asked to open a huge
number of file descriptors and eventually hit RLIMIT_NOFILE. This is
currently mitigated using a reclaim logic : the server closes the file
descriptors of idle fids, based on the assumption that it will be able
to re-open them later. This assumption doesn't hold of course if the
client requests the file to be unlinked. In this case, we loop on the
entire fid list and mark all related fids as unreclaimable (the reclaim
logic will just ignore them) and, of course, we open or re-open their
file descriptors if needed since we're about to unlink the file.

This is the purpose of v9fs_mark_fids_unreclaim(). Since the actual
opening of a file can cause the coroutine to yield, another client
request could possibly add a new fid that we may want to mark as
non-reclaimable as well. The loop is thus restarted if the re-open
request was actually transmitted to the backend. This is achieved
by keeping a reference on the first fid (head) before traversing
the list.

This is wrong in several ways:
- a potential clunk request from the client could tear the first
  fid down and cause the reference to be stale. This leads to a
  use-after-free error that can be detected with ASAN, using a
  custom 9p client
- fids are added at the head of the list : restarting from the
  previous head will always miss fids added by a some other
  potential request

All these problems could be avoided if fids were being added at the
end of the list. This can be achieved with a QSIMPLEQ, but this is
probably too much change for a bug fix. For now let's keep it
simple and just restart the loop from the current head.

Fixes: CVE-2021-20181
Buglink: https://bugs.launchpad.net/qemu/+bug/1911666
Reported-by: Zero Day Initiative <zdi-disclosures@trendmicro.com>
Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Message-Id: <161064025265.1838153.15185571283519390907.stgit@bahia.lan>
Signed-off-by: Greg Kurz <groug@kaod.org>
2021-01-15 08:44:28 +01:00
Xinhao Zhang
01011733ea hw/9pfs : add spaces around operator
Fix code style. Operator needs spaces both sides.

Signed-off-by: Xinhao Zhang <zhangxinhao1@huawei.com>
Signed-off-by: Kai Deng <dengkai1@huawei.com>
Reported-by: Euler Robot <euler.robot@huawei.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <20201030043515.1030223-1-zhangxinhao1@huawei.com>
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
2020-11-05 15:14:03 +01:00
Christian Schoenebeck
c418f935ac 9pfs: disable msize warning for synth driver
Previous patch introduced a performance warning being logged on host
side if client connected with an 'msize' <= 8192. Disable this
performance warning for the synth driver to prevent that warning from
being printed whenever the 9pfs (qtest) test cases are running.

Introduce a new export flag V9FS_NO_PERF_WARN for that purpose, which
might also be used to disable such warnings from the CLI in future.

We could have also prevented the warning by simply raising P9_MAX_SIZE
in virtio-9p-test.c to any value larger than 8192, however in the
context of test cases it makes sense running for edge cases, which
includes the lowest 'msize' value supported by the server which is
4096, hence we want to preserve an msize of 4096 for the test client.

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <E1kEyDy-0006nN-5A@lizzy.crudebyte.com>
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
2020-09-15 12:12:03 +02:00
Christian Schoenebeck
62777d825b 9pfs: log warning if msize <= 8192
It is essential to choose a reasonable high value for 'msize' to avoid
severely degraded file I/O performance. This parameter can only be
chosen on client/guest side, and a Linux client defaults to an 'msize'
of only 8192 if the user did not explicitly specify a value for 'msize',
which results in very poor file I/O performance.

Unfortunately many users are not aware that they should specify an
appropriate value for 'msize' to avoid severe performance issues, so
log a performance warning (with a QEMU wiki link explaining this issue
in detail) on host side in that case to make it more clear.

Currently a client cannot automatically pick a reasonable value for
'msize', because a good value for 'msize' depends on the file I/O
potential of the underlying storage on host side, i.e. a feature
invisible to the client, and even then a user would still need to trade
off between performance profit and additional RAM costs, i.e. with
growing 'msize' (RAM occupation), performance still increases, but
performance delta will shrink continuously.

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <e6fc84845c95816ad5baecb0abd6bfefdcf7ec9f.1599144062.git.qemu_oss@crudebyte.com>
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
2020-09-15 12:12:03 +02:00
Christian Schoenebeck
d2c5cf7ca1 9pfs: differentiate readdir lock between 9P2000.u vs. 9P2000.L
Previous patch suggests that it might make sense to use a different mutex
type now while handling readdir requests, depending on the precise
protocol variant, as v9fs_do_readdir_with_stat() (used by 9P2000.u) uses
a CoMutex to avoid deadlocks that might happen with QemuMutex otherwise,
whereas do_readdir_many() (used by 9P2000.L) should better use a
QemuMutex, as the precise behaviour of a failed CoMutex lock on fs driver
side would not be clear.

And to avoid the wrong lock type being used, be now strict and error out
if a 9P2000.L client sends a Tread on a directory, and likeweise error out
if a 9P2000.u client sends a Treaddir request.

This patch is just intended as transitional measure, as currently 9P2000.u
vs. 9P2000.L implementations currently differ where the main logic of
fetching directory entries is located at (9P2000.u still being more top
half focused, while 9P2000.L already being bottom half focused in regards
to fetching directory entries that is).

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Message-Id: <9a2ddc347e533b0d801866afd9dfac853d2d4106.1596012787.git.qemu_oss@crudebyte.com>
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
2020-08-12 09:17:32 +02:00
Christian Schoenebeck
0c4356ba7d 9pfs: T_readdir latency optimization
Make top half really top half and bottom half really bottom half:

Each T_readdir request handling is hopping between threads (main
I/O thread and background I/O driver threads) several times for
every individual directory entry, which sums up to huge latencies
for handling just a single T_readdir request.

Instead of doing that, collect now all required directory entries
(including all potentially required stat buffers for each entry) in
one rush on a background I/O thread from fs driver by calling the
previously added function v9fs_co_readdir_many() instead of
v9fs_co_readdir(), then assemble the entire resulting network
response message for the readdir request on main I/O thread. The
fs driver is still aborting the directory entry retrieval loop
(on the background I/O thread inside of v9fs_co_readdir_many())
as soon as it would exceed the client's requested maximum R_readdir
response size. So this will not introduce a performance penalty on
another end.

Also: No longer seek initial directory position in v9fs_readdir(),
as this is now handled (more consistently) by
v9fs_co_readdir_many() instead.

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Message-Id: <c7c3d1cf4e86611538cef44897842819d9359d7a.1596012787.git.qemu_oss@crudebyte.com>
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
2020-08-12 09:17:32 +02:00
Christian Schoenebeck
29c9d2ca80 9pfs: make v9fs_readdir_response_size() public
Rename function v9fs_readdir_data_size() -> v9fs_readdir_response_size()
and make it callable from other units. So far this function is only
used by 9p.c, however subsequent patches require the function to be
callable from another 9pfs unit. And as we're at it; also make it clear
for what this function is used for.

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <3668ebc7d5b929a0e4f1357457060d96f50f76f4.1596012787.git.qemu_oss@crudebyte.com>
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
2020-08-12 09:17:32 +02:00
Vladimir Sementsov-Ogievskiy
92c451222c virtio-9p: Use ERRP_GUARD()
If we want to check error after errp-function call, we need to
introduce local_err and then propagate it to errp. Instead, use
the ERRP_GUARD() macro, benefits are:
1. No need of explicit error_propagate call
2. No need of explicit local_err variable: use errp directly
3. ERRP_GUARD() leaves errp as is if it's not NULL or
   &error_fatal, this means that we don't break error_abort
   (we'll abort on error_set, not on error_propagate)

If we want to add some info to errp (by error_prepend() or
error_append_hint()), we must use the ERRP_GUARD() macro.
Otherwise, this info will not be added when errp == &error_fatal
(the program will exit prior to the error_append_hint() or
error_prepend() call).  Fix such a case in
v9fs_device_realize_common().

This commit is generated by command

    sed -n '/^virtio-9p$/,/^$/{s/^F: //p}' MAINTAINERS | \
    xargs git ls-files | grep '\.[hc]$' | \
    xargs spatch \
        --sp-file scripts/coccinelle/errp-guard.cocci \
        --macro-file scripts/cocci-macro-file.h \
        --in-place --no-show-diff --max-width 80

Reported-by: Kevin Wolf <kwolf@redhat.com>
Reported-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Acked-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
[Commit message tweaked]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200707165037.1026246-7-armbru@redhat.com>
[ERRP_AUTO_PROPAGATE() renamed to ERRP_GUARD(), and
auto-propagated-errp.cocci to errp-guard.cocci.  Commit message
tweaked again.]
2020-07-10 15:18:09 +02:00
Markus Armbruster
9261ef5e32 Clean up some calls to ignore Error objects the right way
Receiving the error in a local variable only to free it is less clear
(and also less efficient) than passing NULL.  Clean up.

Cc: Daniel P. Berrange <berrange@redhat.com>
Cc: Jerome Forissier <jerome@forissier.org>
CC: Greg Kurz <groug@kaod.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <20200630090351.1247703-4-armbru@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2020-07-02 06:25:28 +02:00
Stefano Stabellini
cf45183b71 Revert "9p: init_in_iov_from_pdu can truncate the size"
This reverts commit 16724a1730.
It causes https://bugs.launchpad.net/bugs/1877688.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Message-Id: <20200521192627.15259-1-sstabellini@kernel.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
2020-05-25 11:45:38 +02:00
Dan Robertson
03556ea920 9pfs: include linux/limits.h for XATTR_SIZE_MAX
linux/limits.h should be included for the XATTR_SIZE_MAX definition used
by v9fs_xattrcreate.

Fixes: 3b79ef2cf4 ("9pfs: limit xattr size in xattrcreate")
Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Message-Id: <20200515203015.7090-2-dan@dlrobertson.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
2020-05-25 10:38:03 +02:00
Markus Armbruster
b69c3c21a5 qdev: Unrealize must not fail
Devices may have component devices and buses.

Device realization may fail.  Realization is recursive: a device's
realize() method realizes its components, and device_set_realized()
realizes its buses (which should in turn realize the devices on that
bus, except bus_set_realized() doesn't implement that, yet).

When realization of a component or bus fails, we need to roll back:
unrealize everything we realized so far.  If any of these unrealizes
failed, the device would be left in an inconsistent state.  Must not
happen.

device_set_realized() lets it happen: it ignores errors in the roll
back code starting at label child_realize_fail.

Since realization is recursive, unrealization must be recursive, too.
But how could a partly failed unrealize be rolled back?  We'd have to
re-realize, which can fail.  This design is fundamentally broken.

device_set_realized() does not roll back at all.  Instead, it keeps
unrealizing, ignoring further errors.

It can screw up even for a device with no buses: if the lone
dc->unrealize() fails, it still unregisters vmstate, and calls
listeners' unrealize() callback.

bus_set_realized() does not roll back either.  Instead, it stops
unrealizing.

Fortunately, no unrealize method can fail, as we'll see below.

To fix the design error, drop parameter @errp from all the unrealize
methods.

Any unrealize method that uses @errp now needs an update.  This leads
us to unrealize() methods that can fail.  Merely passing it to another
unrealize method cannot cause failure, though.  Here are the ones that
do other things with @errp:

* virtio_serial_device_unrealize()

  Fails when qbus_set_hotplug_handler() fails, but still does all the
  other work.  On failure, the device would stay realized with its
  resources completely gone.  Oops.  Can't happen, because
  qbus_set_hotplug_handler() can't actually fail here.  Pass
  &error_abort to qbus_set_hotplug_handler() instead.

* hw/ppc/spapr_drc.c's unrealize()

  Fails when object_property_del() fails, but all the other work is
  already done.  On failure, the device would stay realized with its
  vmstate registration gone.  Oops.  Can't happen, because
  object_property_del() can't actually fail here.  Pass &error_abort
  to object_property_del() instead.

* spapr_phb_unrealize()

  Fails and bails out when remove_drcs() fails, but other work is
  already done.  On failure, the device would stay realized with some
  of its resources gone.  Oops.  remove_drcs() fails only when
  chassis_from_bus()'s object_property_get_uint() fails, and it can't
  here.  Pass &error_abort to remove_drcs() instead.

Therefore, no unrealize method can fail before this patch.

device_set_realized()'s recursive unrealization via bus uses
object_property_set_bool().  Can't drop @errp there, so pass
&error_abort.

We similarly unrealize with object_property_set_bool() elsewhere,
always ignoring errors.  Pass &error_abort instead.

Several unrealize methods no longer handle errors from other unrealize
methods: virtio_9p_device_unrealize(),
virtio_input_device_unrealize(), scsi_qdev_unrealize(), ...
Much of the deleted error handling looks wrong anyway.

One unrealize methods no longer ignore such errors:
usb_ehci_pci_exit().

Several realize methods no longer ignore errors when rolling back:
v9fs_device_realize_common(), pci_qdev_unrealize(),
spapr_phb_realize(), usb_qdev_realize(), vfio_ccw_realize(),
virtio_device_realize().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200505152926.18877-17-armbru@redhat.com>
2020-05-15 07:08:14 +02:00
Christian Schoenebeck
d36a5c2270 9pfs: validate count sent by client with T_readdir
A good 9p client sends T_readdir with "count" parameter that's sufficiently
smaller than client's initially negotiated msize (maximum message size).
We perform a check for that though to avoid the server to be interrupted
with a "Failed to encode VirtFS reply type 41" transport error message by
bad clients. This count value constraint uses msize - 11, because 11 is the
header size of R_readdir.

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <3990d3891e8ae2074709b56449e96ab4b4b93b7d.1579567020.git.qemu_oss@crudebyte.com>
[groug: added comment ]
Signed-off-by: Greg Kurz <groug@kaod.org>
2020-02-08 09:28:54 +01:00
Christian Schoenebeck
e16453a31a 9pfs: require msize >= 4096
A client establishes a session by sending a Tversion request along with a
'msize' parameter which client uses to suggest server a maximum message
size ever to be used for communication (for both requests and replies)
between client and server during that session. If client suggests a 'msize'
smaller than 4096 then deny session by server immediately with an error
response (Rlerror for "9P2000.L" clients or Rerror for "9P2000.u" clients)
instead of replying with Rversion.

So far any msize submitted by client with Tversion was simply accepted by
server without any check. Introduction of some minimum msize makes sense,
because e.g. a msize < 7 would not allow any subsequent 9p operation at
all, because 7 is the size of the header section common by all 9p message
types.

A substantial higher value of 4096 was chosen though to prevent potential
issues with some message types. E.g. Rreadlink may yield up to a size of
PATH_MAX which is usually 4096, and like almost all 9p message types,
Rreadlink is not allowed to be truncated by the 9p protocol. This chosen
size also prevents a similar issue with Rreaddir responses (provided client
always sends adequate 'count' parameter with Treaddir), because even though
directory entries retrieval may be split up over several T/Rreaddir
messages; a Rreaddir response must not truncate individual directory entries
though. So msize should be large enough to return at least one directory
entry with the longest possible file name supported by host. Most file
systems support a max. file name length of 255. Largest known file name
lenght limit would be currently ReiserFS with max. 4032 bytes, which is
also covered by this min. msize value because 4032 + 35 < 4096.

Furthermore 4096 is already the minimum msize of the Linux kernel's 9pfs
client.

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <8ceecb7fb9fdbeabbe55c04339349a36929fb8e3.1579567019.git.qemu_oss@crudebyte.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
2020-02-08 09:28:43 +01:00
Daniel Henrique Barboza
b858e80a02 9pfs/9p.c: remove unneeded labels
'out' label in v9fs_xattr_write() and 'out_nofid' label in
v9fs_complete_rename() can be replaced by appropriate return
calls.

CC: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Acked-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
2020-01-20 15:11:39 +01:00
Greg Kurz
16724a1730 9p: init_in_iov_from_pdu can truncate the size
init_in_iov_from_pdu might not be able to allocate the full buffer size
requested, which comes from the client and could be larger than the
transport has available at the time of the request. Specifically, this
can happen with read operations, with the client requesting a read up to
the max allowed, which might be more than the transport has available at
the time.

Today the implementation of init_in_iov_from_pdu throws an error, both
Xen and Virtio.

Instead, change the V9fsTransport interface so that the size becomes a
pointer and can be limited by the implementation of
init_in_iov_from_pdu.

Change both the Xen and Virtio implementations to set the size to the
size of the buffer they managed to allocate, instead of throwing an
error. However, if the allocated buffer size is less than P9_IOHDRSZ
(the size of the header) still throw an error as the case is unhandable.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
CC: groug@kaod.org
CC: anthony.perard@citrix.com
CC: roman@zededa.com
CC: qemu_oss@crudebyte.com
[groug: fix 32-bit build]
Signed-off-by: Greg Kurz <groug@kaod.org>
2020-01-20 15:11:39 +01:00
Dan Schatzberg
68d654daee 9pfs: Fix divide by zero bug
Some filesystems may return 0s in statfs (trivially, a FUSE filesystem
can do so). QEMU should handle this gracefully and just behave the
same as if statfs failed.

Signed-off-by: Dan Schatzberg <dschatzberg@fb.com>
Acked-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
2019-11-23 15:51:48 +01:00
Christian Schoenebeck
6b6aa8285d 9p: Use variable length suffixes for inode remapping
Use variable length suffixes for inode remapping instead of the fixed
16 bit size prefixes before. With this change the inode numbers on guest
will typically be much smaller (e.g. around >2^1 .. >2^7 instead of >2^48
with the previous fixed size inode remapping.

Additionally this solution is more efficient, since inode numbers in
practice can take almost their entire 64 bit range on guest as well, so
there is less likely a need for generating and tracking additional suffixes,
which might also be beneficial for nested virtualization where each level of
virtualization would shift up the inode bits and increase the chance of
expensive remapping actions.

The "Exponential Golomb" algorithm is used as basis for generating the
variable length suffixes. The algorithm has a parameter k which controls the
distribution of bits on increasing indeces (minimum bits at low index vs.
maximum bits at high index). With k=0 the generated suffixes look like:

Index Dec/Bin -> Generated Suffix Bin
1 [1] -> [1] (1 bits)
2 [10] -> [010] (3 bits)
3 [11] -> [110] (3 bits)
4 [100] -> [00100] (5 bits)
5 [101] -> [10100] (5 bits)
6 [110] -> [01100] (5 bits)
7 [111] -> [11100] (5 bits)
8 [1000] -> [0001000] (7 bits)
9 [1001] -> [1001000] (7 bits)
10 [1010] -> [0101000] (7 bits)
11 [1011] -> [1101000] (7 bits)
12 [1100] -> [0011000] (7 bits)
...
65533 [1111111111111101] ->  [1011111111111111000000000000000] (31 bits)
65534 [1111111111111110] ->  [0111111111111111000000000000000] (31 bits)
65535 [1111111111111111] ->  [1111111111111111000000000000000] (31 bits)
Hence minBits=1 maxBits=31

And with k=5 they would look like:

Index Dec/Bin -> Generated Suffix Bin
1 [1] -> [000001] (6 bits)
2 [10] -> [100001] (6 bits)
3 [11] -> [010001] (6 bits)
4 [100] -> [110001] (6 bits)
5 [101] -> [001001] (6 bits)
6 [110] -> [101001] (6 bits)
7 [111] -> [011001] (6 bits)
8 [1000] -> [111001] (6 bits)
9 [1001] -> [000101] (6 bits)
10 [1010] -> [100101] (6 bits)
11 [1011] -> [010101] (6 bits)
12 [1100] -> [110101] (6 bits)
...
65533 [1111111111111101] -> [0011100000000000100000000000] (28 bits)
65534 [1111111111111110] -> [1011100000000000100000000000] (28 bits)
65535 [1111111111111111] -> [0111100000000000100000000000] (28 bits)
Hence minBits=6 maxBits=28

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
2019-10-10 11:36:23 +02:00
Antonios Motakis
f3fe4a2d92 9p: stat_to_qid: implement slow path
stat_to_qid attempts via qid_path_prefixmap to map unique files (which are
identified by 64 bit inode nr and 32 bit device id) to a 64 QID path value.
However this implementation makes some assumptions about inode number
generation on the host.

If qid_path_prefixmap fails, we still have 48 bits available in the QID
path to fall back to a less memory efficient full mapping.

Signed-off-by: Antonios Motakis <antonios.motakis@huawei.com>
[CS: - Rebased to https://github.com/gkurz/qemu/commits/9p-next
       (SHA1 7fc4c49e91).
     - Updated hash calls to new xxhash API.
     - Removed unnecessary parantheses in qpf_lookup_func().
     - Removed unnecessary g_malloc0() result checks.
     - Log error message when running out of prefixes in
       qid_path_fullmap().
     - Log warning message about potential degraded performance in
       qid_path_prefixmap().
     - Wrapped qpf_table initialization to dedicated qpf_table_init()
       function.
     - Fixed typo in comment. ]
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
2019-10-10 11:36:14 +02:00
Antonios Motakis
1a6ed33cc5 9p: Added virtfs option 'multidevs=remap|forbid|warn'
'warn' (default): Only log an error message (once) on host if more than one
device is shared by same export, except of that just ignore this config
error though. This is the default behaviour for not breaking existing
installations implying that they really know what they are doing.

'forbid': Like 'warn', but except of just logging an error this
also denies access of guest to additional devices.

'remap': Allows to share more than one device per export by remapping
inodes from host to guest appropriately. To support multiple devices on the
9p share, and avoid qid path collisions we take the device id as input to
generate a unique QID path. The lowest 48 bits of the path will be set
equal to the file inode, and the top bits will be uniquely assigned based
on the top 16 bits of the inode and the device id.

Signed-off-by: Antonios Motakis <antonios.motakis@huawei.com>
[CS: - Rebased to https://github.com/gkurz/qemu/commits/9p-next
       (SHA1 7fc4c49e91).
     - Added virtfs option 'multidevs', original patch simply did the inode
       remapping without being asked.
     - Updated hash calls to new xxhash API.
     - Updated docs for new option 'multidevs'.
     - Fixed v9fs_do_readdir() not having remapped inodes.
     - Log error message when running out of prefixes in
       qid_path_prefixmap().
     - Fixed definition of QPATH_INO_MASK.
     - Wrapped qpp_table initialization to dedicated qpp_table_init()
       function.
     - Dropped unnecessary parantheses in qpp_lookup_func().
     - Dropped unnecessary g_malloc0() result checks. ]
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
[groug: - Moved "multidevs" parsing to the local backend.
        - Added hint to invalid multidevs option error.
	- Turn "remap" into "x-remap". ]
Signed-off-by: Greg Kurz <groug@kaod.org>
2019-10-10 11:36:05 +02:00
Antonios Motakis
3b5ee9e86b 9p: Treat multiple devices on one export as an error
The QID path should uniquely identify a file. However, the
inode of a file is currently used as the QID path, which
on its own only uniquely identifies files within a device.
Here we track the device hosting the 9pfs share, in order
to prevent security issues with QID path collisions from
other devices.

We only print a warning for now but a subsequent patch will
allow users to have finer control over the desired behaviour.
Failing the I/O will be one the proposed behaviour, so we
also change stat_to_qid() to return an error here in order to
keep other patches simpler.

Signed-off-by: Antonios Motakis <antonios.motakis@huawei.com>
[CS: - Assign dev_id to export root's device already in
       v9fs_device_realize_common(), not postponed in
       stat_to_qid().
     - error_report_once() if more than one device was
       shared by export.
     - Return -ENODEV instead of -ENOSYS in stat_to_qid().
     - Fixed typo in log comment. ]
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
[groug, changed to warning, updated message and changelog]
Signed-off-by: Greg Kurz <groug@kaod.org>
2019-10-10 11:36:05 +02:00
Greg Kurz
c0da0cb761 9p: Simplify error path of v9fs_device_realize_common()
Make v9fs_device_unrealize_common() idempotent and use it for rollback,
in order to reduce code duplication.

Signed-off-by: Greg Kurz <groug@kaod.org>
2019-10-10 11:36:04 +02:00
Antonios Motakis
8703283352 9p: unsigned type for type, version, path
There is no need for signedness on these QID fields for 9p.

Signed-off-by: Antonios Motakis <antonios.motakis@huawei.com>
[CS: - Also make QID type unsigned.
     - Adjust donttouch_stat() to new types.
     - Adjust trace-events to new types. ]
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
2019-10-10 11:36:04 +02:00
Markus Armbruster
db72581598 Include qemu/main-loop.h less
In my "build everything" tree, changing qemu/main-loop.h triggers a
recompile of some 5600 out of 6600 objects (not counting tests and
objects that don't depend on qemu/osdep.h).  It includes block/aio.h,
which in turn includes qemu/event_notifier.h, qemu/notify.h,
qemu/processor.h, qemu/qsp.h, qemu/queue.h, qemu/thread-posix.h,
qemu/thread.h, qemu/timer.h, and a few more.

Include qemu/main-loop.h only where it's needed.  Touching it now
recompiles only some 1700 objects.  For block/aio.h and
qemu/event_notifier.h, these numbers drop from 5600 to 2800.  For the
others, they shrink only slightly.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190812052359.30071-21-armbru@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-08-16 13:31:52 +02:00
Greg Kurz
1923923bfa 9p: use g_new(T, n) instead of g_malloc(sizeof(T) * n)
Because it is a recommended coding practice (see HACKING).

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
2018-12-12 14:18:10 +01:00
Greg Kurz
1d20398694 9p: fix QEMU crash when renaming files
When using the 9P2000.u version of the protocol, the following shell
command line in the guest can cause QEMU to crash:

    while true; do rm -rf aa; mkdir -p a/b & touch a/b/c & mv a aa; done

With 9P2000.u, file renaming is handled by the WSTAT command. The
v9fs_wstat() function calls v9fs_complete_rename(), which calls
v9fs_fix_path() for every fid whose path is affected by the change.
The involved calls to v9fs_path_copy() may race with any other access
to the fid path performed by some worker thread, causing a crash like
shown below:

Thread 12 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x0000555555a25da2 in local_open_nofollow (fs_ctx=0x555557d958b8, path=0x0,
 flags=65536, mode=0) at hw/9pfs/9p-local.c:59
59          while (*path && fd != -1) {
(gdb) bt
#0  0x0000555555a25da2 in local_open_nofollow (fs_ctx=0x555557d958b8,
 path=0x0, flags=65536, mode=0) at hw/9pfs/9p-local.c:59
#1  0x0000555555a25e0c in local_opendir_nofollow (fs_ctx=0x555557d958b8,
 path=0x0) at hw/9pfs/9p-local.c:92
#2  0x0000555555a261b8 in local_lstat (fs_ctx=0x555557d958b8,
 fs_path=0x555556b56858, stbuf=0x7fff84830ef0) at hw/9pfs/9p-local.c:185
#3  0x0000555555a2b367 in v9fs_co_lstat (pdu=0x555557d97498,
 path=0x555556b56858, stbuf=0x7fff84830ef0) at hw/9pfs/cofile.c:53
#4  0x0000555555a1e9e2 in v9fs_stat (opaque=0x555557d97498)
 at hw/9pfs/9p.c:1083
#5  0x0000555555e060a2 in coroutine_trampoline (i0=-669165424, i1=32767)
 at util/coroutine-ucontext.c:116
#6  0x00007fffef4f5600 in __start_context () at /lib64/libc.so.6
#7  0x0000000000000000 in  ()
(gdb)

The fix is to take the path write lock when calling v9fs_complete_rename(),
like in v9fs_rename().

Impact:  DoS triggered by unprivileged guest users.

Fixes: CVE-2018-19489
Cc: P J P <ppandit@redhat.com>
Reported-by: zhibin hu <noirfate@gmail.com>
Reviewed-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
2018-11-23 13:28:03 +01:00
Greg Kurz
5b3c77aa58 9p: take write lock on fid path updates (CVE-2018-19364)
Recent commit 5b76ef50f6 fixed a race where v9fs_co_open2() could
possibly overwrite a fid path with v9fs_path_copy() while it is being
accessed by some other thread, ie, use-after-free that can be detected
by ASAN with a custom 9p client.

It turns out that the same can happen at several locations where
v9fs_path_copy() is used to set the fid path. The fix is again to
take the write lock.

Fixes CVE-2018-19364.

Cc: P J P <ppandit@redhat.com>
Reported-by: zhibin hu <noirfate@gmail.com>
Reviewed-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
2018-11-20 13:00:35 +01:00
Keno Fischer
aca6897fba 9p: xattr: Properly translate xattrcreate flags
As with unlinkat, these flags come from the client and need to
be translated to their host values. The protocol values happen
to match linux, but that need not be true in general.

Signed-off-by: Keno Fischer <keno@juliacomputing.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
2018-06-07 12:17:22 +02:00
Keno Fischer
67e8734574 9p: Properly check/translate flags in unlinkat
The 9p-local code previously relied on P9_DOTL_AT_REMOVEDIR and AT_REMOVEDIR
having the same numerical value and deferred any errorchecking to the
syscall itself. However, while the former assumption is true on Linux,
it is not true in general. 9p-handle did this properly however. Move
the translation code to the generic 9p server code and add an error
if unrecognized flags are passed.

Signed-off-by: Keno Fischer <keno@juliacomputing.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
2018-06-07 12:17:22 +02:00
Keno Fischer
a647502c58 9p: xattr: Fix crashes due to free of uninitialized value
If the size returned from llistxattr/lgetxattr is 0, we skipped
the malloc call, leaving xattr.value uninitialized. However, this
value is later passed to `g_free` without any further checks,
causing an error. Fix that by always calling g_malloc unconditionally.
If `size` is 0, it will return NULL, which is safe to pass to g_free.

Signed-off-by: Keno Fischer <keno@juliacomputing.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
2018-06-07 12:17:22 +02:00
Greg Kurz
8f9c64bfa5 9p: add trace event for v9fs_setattr()
Don't print the tv_nsec part of atime and mtime, to stay below the 10
argument limit of trace events.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2018-05-02 08:59:24 +02:00
Marc-André Lureau
e446a1eb5e 9p: v9fs_path_copy() readability
lhs/rhs doesn't tell much about how argument are handled, dst/src is
and const arguments is clearer in my mind. Use g_memdup() while at it.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
2018-02-19 18:27:15 +01:00
Greg Kurz
357e2f7f4e tests: virtio-9p: add FLUSH operation test
The idea is to send a victim request that will possibly block in the
server and to send a flush request to cancel the victim request.

This patch adds two test to verifiy that:
- the server does not reply to a victim request that was actually
  cancelled
- the server replies to the flush request after replying to the
  victim request if it could not cancel it

9p request cancellation reference:

http://man.cat-v.org/plan_9/5/flush

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(groug, change the test to only write a single byte to avoid
        any alignment or endianess consideration)
2018-02-02 11:11:55 +01:00
Keno Fischer
fc78d5ee76 9pfs: Correctly handle cancelled requests
# Background

I was investigating spurious non-deterministic EINTR returns from
various 9p file system operations in a Linux guest served from the
qemu 9p server.

 ## EINTR, ERESTARTSYS and the linux kernel

When a signal arrives that the Linux kernel needs to deliver to user-space
while a given thread is blocked (in the 9p case waiting for a reply to its
request in 9p_client_rpc -> wait_event_interruptible), it asks whatever
driver is currently running to abort its current operation (in the 9p case
causing the submission of a TFLUSH message) and return to user space.
In these situations, the error message reported is generally ERESTARTSYS.
If the userspace processes specified SA_RESTART, this means that the
system call will get restarted upon completion of the signal handler
delivery (assuming the signal handler doesn't modify the process state
in complicated ways not relevant here). If SA_RESTART is not specified,
ERESTARTSYS gets translated to EINTR and user space is expected to handle
the restart itself.

 ## The 9p TFLUSH command

The 9p TFLUSH commands requests that the server abort an ongoing operation.
The man page [1] specifies:

```
If it recognizes oldtag as the tag of a pending transaction, it should
abort any pending response and discard that tag.
[...]
When the client sends a Tflush, it must wait to receive the corresponding
Rflush before reusing oldtag for subsequent messages. If a response to the
flushed request is received before the Rflush, the client must honor the
response as if it had not been flushed, since the completed request may
signify a state change in the server
```

In particular, this means that the server must not send a reply with the
orignal tag in response to the cancellation request, because the client is
obligated to interpret such a reply as a coincidental reply to the original
request.

 # The bug

When qemu receives a TFlush request, it sets the `cancelled` flag on the
relevant pdu. This flag is periodically checked, e.g. in
`v9fs_co_name_to_path`, and if set, the operation is aborted and the error
is set to EINTR. However, the server then violates the spec, by returning
to the client an Rerror response, rather than discarding the message
entirely. As a result, the client is required to assume that said Rerror
response is a result of the original request, not a result of the
cancellation and thus passes the EINTR error back to user space.
This is not the worst thing it could do, however as discussed above, the
correct error code would have been ERESTARTSYS, such that user space
programs with SA_RESTART set get correctly restarted upon completion of
the signal handler.
Instead, such programs get spurious EINTR results that they were not
expecting to handle.

It should be noted that there are plenty of user space programs that do not
set SA_RESTART and do not correctly handle EINTR either. However, that is
then a userspace bug. It should also be noted that this bug has been
mitigated by a recent commit to the Linux kernel [2], which essentially
prevents the kernel from sending Tflush requests unless the process is about
to die (in which case the process likely doesn't care about the response).
Nevertheless, for older kernels and to comply with the spec, I believe this
change is beneficial.

 # Implementation

The fix is fairly simple, just skipping notification of a reply if
the pdu was previously cancelled. We do however, also notify the transport
layer that we're doing this, so it can clean up any resources it may be
holding. I also added a new trace event to distinguish
operations that caused an error reply from those that were cancelled.

One complication is that we only omit sending the message on EINTR errors in
order to avoid confusing the rest of the code (which may assume that a
client knows about a fid if it sucessfully passed it off to pud_complete
without checking for cancellation status). This does mean that if the server
acts upon the cancellation flag, it always needs to set err to EINTR. I
believe this is true of the current code.

[1] https://9fans.github.io/plan9port/man/man9/flush.html
[2] https://github.com/torvalds/linux/commit/9523feac272ccad2ad8186ba4fcc891

Signed-off-by: Keno Fischer <keno@juliacomputing.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
[groug, send a zero-sized reply instead of detaching the buffer]
Signed-off-by: Greg Kurz <groug@kaod.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
2018-02-01 21:21:27 +01:00
Greg Kurz
066eb006b5 9pfs: drop v9fs_register_transport()
No good reasons to do this outside of v9fs_device_realize_common().

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
2018-02-01 21:21:27 +01:00
Greg Kurz
65603a801e fsdev: improve error handling of backend init
This patch changes some error messages in the backend init code and
convert backends to propagate QEMU Error objects instead of calling
error_report().

One notable improvement is that the local backend now provides a more
detailed error report when it fails to open the shared directory.

Signed-off-by: Greg Kurz <groug@kaod.org>
2018-01-08 11:18:23 +01:00
Greg Kurz
7567359094 9pfs: make pdu_marshal() and pdu_unmarshal() static functions
They're only used by the 9p core code.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
2018-01-08 11:18:22 +01:00
Greg Kurz
d1471233bb 9pfs: fix error path in pdu_submit()
If we receive an unsupported request id, we first decide to
return -ENOTSUPP to the client, but since the request id
causes is_read_only_op() to return false, we change the
error to be -EROFS if the fsdev is read-only. This doesn't
make sense since we don't know what the client asked for.

This patch ensures that -EROFS can only be returned if the
request id is supported.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
2018-01-08 11:18:22 +01:00
Greg Kurz
8e71b96c62 9pfs: fix some type definitions
To comply with the QEMU coding style.

Signed-off-by: Greg Kurz <groug@kaod.org>
2018-01-08 11:18:22 +01:00
Greg Kurz
267fcadf32 9pfs: fix v9fs_mark_fids_unreclaim() return value
The return value of v9fs_mark_fids_unreclaim() is then propagated to
pdu_complete(). It should be a negative errno, not -1.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-11-06 18:05:35 +01:00
Prasad J Pandit
7bd9275630 9pfs: use g_malloc0 to allocate space for xattr
9p back-end first queries the size of an extended attribute,
allocates space for it via g_malloc() and then retrieves its
value into allocated buffer. Race between querying attribute
size and retrieving its could lead to memory bytes disclosure.
Use g_malloc0() to avoid it.

Reported-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
2017-10-16 14:21:59 +02:00
Jan Dakinevich
772a73692e 9pfs: check the size of transport buffer before marshaling
v9fs_do_readdir_with_stat() should check for a maximum buffer size
before an attempt to marshal gathered data. Otherwise, buffers assumed
as misconfigured and the transport would be broken.

The patch brings v9fs_do_readdir_with_stat() in conformity with
v9fs_do_readdir() behavior.

Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com>
[groug, regression caused my commit 8d37de41ca # 2.10]
Signed-off-by: Greg Kurz <groug@kaod.org>
2017-09-20 08:48:52 +02:00
Jan Dakinevich
4d8bc7334b 9pfs: fix name_to_path assertion in v9fs_complete_rename()
The third parameter of v9fs_co_name_to_path() must not contain `/'
character.

The issue is most likely related to 9p2000.u protocol only.

Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com>
[groug, regression caused by commit f57f587857 # 2.10]
Signed-off-by: Greg Kurz <groug@kaod.org>
2017-09-20 08:48:52 +02:00
Jan Dakinevich
6069537f43 9pfs: fix readdir() for 9p2000.u
If the client is using 9p2000.u, the following occurs:

$ cd ${virtfs_shared_dir}
$ mkdir -p a/b/c
$ ls a/b
ls: cannot access 'a/b/a': No such file or directory
ls: cannot access 'a/b/b': No such file or directory
a  b  c

instead of the expected:

$ ls a/b
c

This is a regression introduced by commit f57f5878578a;
local_name_to_path() now resolves ".." and "." in paths,
and v9fs_do_readdir_with_stat()->stat_to_v9stat() then
copies the basename of the resulting path to the response.
With the example above, this means that "." and ".." are
turned into "b" and "a" respectively...

stat_to_v9stat() currently assumes it is passed a full
canonicalized path and uses it to do two different things:
1) to pass it to v9fs_co_readlink() in case the file is a symbolic
   link
2) to set the name field of the V9fsStat structure to the basename
   part of the given path

It only has two users: v9fs_stat() and v9fs_do_readdir_with_stat().

v9fs_stat() really needs 1) and 2) to be performed since it starts
with the full canonicalized path stored in the fid. It is different
for v9fs_do_readdir_with_stat() though because the name we want to
put into the V9fsStat structure is the d_name field of the dirent
actually (ie, we want to keep the "." and ".." special names). So,
we only need 1) in this case.

This patch hence adds a basename argument to stat_to_v9stat(), to
be used to set the name field of the V9fsStat structure, and moves
the basename logic to v9fs_stat().

Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com>
(groug, renamed old name argument to path and updated changelog)
Signed-off-by: Greg Kurz <groug@kaod.org>
2017-09-20 08:48:51 +02:00
Philippe Mathieu-Daudé
403a905b03 9pfs: avoid sign conversion error simplifying the code
(note this is how other functions also handle the errors).

hw/9pfs/9p.c:948:18: warning: Loss of sign in implicit conversion
        offset = err;
                 ^~~

Reported-by: Clang Static Analyzer
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
2017-09-05 14:01:16 +02:00