mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Christian Schoenebeck	7e985780aa	9pfs: use P9Array in v9fs_walk() Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Message-Id: <90c65d1c1ca11c1b434bb981b1fc7966f7711c8f.1633097129.git.qemu_oss@crudebyte.com>	2021-10-27 14:45:22 +02:00
Christian Schoenebeck	cc82fde9c7	9pfs: make V9fsPath usable via P9Array API Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Message-Id: <79a0ddf8375f6c95f0565ef155a1bf1e9387664f.1633097129.git.qemu_oss@crudebyte.com>	2021-10-27 14:45:22 +02:00
Christian Schoenebeck	04a7f9e55e	9pfs: simplify blksize_to_iounit() Use QEMU_ALIGN_DOWN() macro to reduce code and to make it more human readable. Suggested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <b84eb324d2ebdcc6f9c442c97b5b4d01eecb4f43.1632758315.git.qemu_oss@crudebyte.com>	2021-10-27 14:45:22 +02:00
Christian Schoenebeck	b565bccb00	9pfs: deduplicate iounit code Remove redundant code that translates host fileystem's block size into 9p client (guest side) block size. Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <129bb71d5119e61d335f1e3107e472e4beea223a.1632758315.git.qemu_oss@crudebyte.com>	2021-10-27 14:45:22 +02:00
Christian Schoenebeck	669ced09b3	9pfs: fix wrong I/O block size in Rgetattr When client sent a 9p Tgetattr request then the wrong I/O block size value was returned by 9p server; instead of host file system's I/O block size it should rather return an I/O block size according to 9p session's 'msize' value, because the value returned to client should be an "optimum" block size for I/O (i.e. to maximize performance), it should not reflect the actual physical block size of the underlying storage media. The I/O block size of a host filesystem is typically 4k, so the value returned was far too low for good 9p I/O performance. This patch adds stat_to_iounit() with a similar approach as the existing get_iounit() function. Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <E1mT2Js-0000DW-OH@lizzy.crudebyte.com>	2021-10-27 14:45:22 +02:00
Christian Schoenebeck	869605b5a0	hw/9pfs: use g_autofree in v9fs_walk() where possible Suggested-by: Greg Kurz <groug@kaod.org> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <b51670d2a39399535a035f6bc77c3cbeed85edae.1629208359.git.qemu_oss@crudebyte.com>	2021-09-02 13:26:22 +02:00
Christian Schoenebeck	97b1d8fdf6	hw/9pfs: avoid 'path' copy in v9fs_walk() The v9fs_walk() function resolves all client submitted path nodes to the local 'pathes' array. Using a separate string scalar variable 'path' inside the background worker thread loop and copying that local 'path' string scalar variable subsequently to the 'pathes' array (at the end of each loop iteration) is not necessary. Instead simply resolve each path directly to the 'pathes' array and don't use the string scalar variable 'path' inside the fs worker thread loop at all. The only advantage of the 'path' scalar was that in case of an error the respective 'pathes' element would not be filled. Right now this is not an issue as the v9fs_walk() function returns as soon as any error occurs. Suggested-by: Greg Kurz <groug@kaod.org> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <7dacbecf25b2c9b4a0ce12d689a8a535f09a31e3.1629208359.git.qemu_oss@crudebyte.com>	2021-09-02 13:26:22 +02:00
Christian Schoenebeck	8d6cb10073	9pfs: reduce latency of Twalk As with previous performance optimization on Treaddir handling; reduce the overall latency, i.e. overall time spent on processing a Twalk request by reducing the amount of thread hops between the 9p server's main thread and fs worker thread(s). In fact this patch even reduces the thread hops for Twalk handling to its theoritical minimum of exactly 2 thread hops: main thread -> fs worker thread -> main thread This is achieved by doing all the required fs driver tasks altogether in a single v9fs_co_run_in_worker({ ... }); code block. Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <1a6701674afc4f08d40396e3aa2631e18a4dbb33.1622821729.git.qemu_oss@crudebyte.com>	2021-07-05 13:03:16 +02:00
Christian Schoenebeck	66550339b7	9pfs: drop root_qid There is no longer a user of root_qid, so drop it. Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <6896dd161d3257db6b0513842a14f87ca191fdf6.1622821729.git.qemu_oss@crudebyte.com>	2021-07-05 13:03:16 +02:00
Christian Schoenebeck	f22cad4228	9pfs: replace not_same_qid() by same_stat_id() As we are actually only comparing the filesystem ID (i.e. device number and inode number pair) let's use the POSIX stat buffer instead of QIDs, because resolving QIDs requires to be done on 9p server's main thread only as it might mutate the server state if inode remapping is enabled. Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <26aa465ff9cc9c07e053331554a02fdae3994417.1622821729.git.qemu_oss@crudebyte.com>	2021-07-05 13:03:16 +02:00
Christian Schoenebeck	1d0fc0d0ee	9pfs: drop fid_to_qid() There is only one user of fid_to_qid() which is v9fs_walk(). Let's open-code fid_to_qid() directly within v9fs_walk(), because fid_to_qid() hides the POSIX stat buffer which we are going to need in the subsequent patch. Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <e9a4c9c7a0792ed4db6578d105a0823ea05bc324.1622821729.git.qemu_oss@crudebyte.com>	2021-07-05 13:03:16 +02:00
Christian Schoenebeck	110243750d	9pfs: capture root stat We already capture the QID of the exported 9p root path, i.e. to prevent client access outside the defined, exported filesystem's tree. This is currently checked by comparing the root QID with another FID's QID. The problem with the latter is that resolving a QID of any given 9p path can only be done on 9p server's main thread, that's because it might mutate the server's state if inode remapping is enabled. For that reason also capture the POSIX stat info of the root path for being able to identify on any (e.g. worker) thread whether an arbitrary given path is identical to the export root. Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <eb07d6c2e9925788454cfe33d3802e4ffb23ea9a.1622821729.git.qemu_oss@crudebyte.com>	2021-07-05 13:03:16 +02:00
Christian Schoenebeck	8bf27550ef	9pfs: fix not_same_qid() There is only one user of not_same_qid() which is v9fs_walk() and the latter is using it for comparing a client supplied path with the 9p export root path, for the sole purpose to prevent a Twalk request from escaping from the exported 9p tree via "..". However for that specific purpose the implementation of not_same_qid() is wrong; if mtime of the 9p export root path changed between Tattach and Twalk then not_same_qid() returns true when actually comparing against the export root path. To fix for the actual semantic being used, only compare QID path members, but do not compare version or type members. Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <ca0abae4a899d81c6e87f683732d6c1f56915232.1622821729.git.qemu_oss@crudebyte.com>	2021-07-05 13:03:16 +02:00
Christian Schoenebeck	232a4d2c25	9pfs: simplify v9fs_walk() There is only one comparison between nwnames and P9_MAXWELEM required. Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <E1liKiz-0006BC-Ja@lizzy.crudebyte.com>	2021-07-05 13:03:16 +02:00
Christian Schoenebeck	6f56908427	9pfs: add link to 9p developer docs To lower the entry level for new developers, add a link to the 9p developer docs (i.e. qemu wiki) to MAINTAINERS and to the beginning of 9p source files, that is to: https://wiki.qemu.org/Documentation/9p Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Acked-by: Greg Kurz <groug@kaod.org> Message-Id: <E1leeDf-0008GZ-9q@lizzy.crudebyte.com>	2021-07-05 13:03:16 +02:00
Chen Qun	d6eb39b554	qtest: delete superfluous inclusions of qtest.h There are 23 files that include the "sysemu/qtest.h", but they do not use any qtest functions. Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20210226081414.205946-1-kuhn.chenqun@huawei.com> Signed-off-by: Thomas Huth <thuth@redhat.com>	2021-03-09 06:03:53 +01:00
Greg Kurz	81f9766b7a	9pfs: Convert reclaim list to QSLIST Use QSLIST instead of open-coding for a slightly improved readability. No behavioral change. Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Message-Id: <20210122143514.215780-1-groug@kaod.org> Signed-off-by: Greg Kurz <groug@kaod.org>	2021-01-22 18:26:40 +01:00
Greg Kurz	20b7f45b22	9pfs: Improve unreclaim loop If a fid was actually re-opened by v9fs_reopen_fid(), we re-traverse the fid list from the head in case some other request created a fid that needs to be marked unreclaimable as well (i.e. the client opened a new handle on the path that is being unlinked). This is suboptimal since most if not all fids that require it have likely been taken care of already. This is mostly the result of new fids being added to the head of the list. Since the list is now a QSIMPLEQ, add new fids at the end instead to avoid the need to rewind. Take a reference on the fid to ensure it doesn't go away during v9fs_reopen_fid() and that it can be safely passed to QSIMPLEQ_NEXT() afterwards. Since the associated put_fid() can also yield, same is done with the next fid. So the logic here is to get a reference on a fid and only put it back during the next iteration after we could get a reference on the next fid. Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Message-Id: <20210121181510.1459390-1-groug@kaod.org> Signed-off-by: Greg Kurz <groug@kaod.org>	2021-01-22 15:17:19 +01:00
Greg Kurz	feabd6cf78	9pfs: Convert V9fsFidState::fid_list to QSIMPLEQ The fid_list is currently open-coded. This doesn't seem to serve any purpose that cannot be met with QEMU's generic lists. Let's go for a QSIMPLEQ : this will allow to add new fids at the end of the list and to improve the logic in v9fs_mark_fids_unreclaim(). Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Message-Id: <20210118142300.801516-3-groug@kaod.org> Signed-off-by: Greg Kurz <groug@kaod.org>	2021-01-21 17:49:45 +01:00
Greg Kurz	2e53160fc6	9pfs: Convert V9fsFidState::clunked to bool This can only be 0 or 1. Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Message-Id: <20210118142300.801516-2-groug@kaod.org> Signed-off-by: Greg Kurz <groug@kaod.org>	2021-01-21 17:49:45 +01:00
Greg Kurz	89fbea8737	9pfs: Fully restart unreclaim loop (CVE-2021-20181) Depending on the client activity, the server can be asked to open a huge number of file descriptors and eventually hit RLIMIT_NOFILE. This is currently mitigated using a reclaim logic : the server closes the file descriptors of idle fids, based on the assumption that it will be able to re-open them later. This assumption doesn't hold of course if the client requests the file to be unlinked. In this case, we loop on the entire fid list and mark all related fids as unreclaimable (the reclaim logic will just ignore them) and, of course, we open or re-open their file descriptors if needed since we're about to unlink the file. This is the purpose of v9fs_mark_fids_unreclaim(). Since the actual opening of a file can cause the coroutine to yield, another client request could possibly add a new fid that we may want to mark as non-reclaimable as well. The loop is thus restarted if the re-open request was actually transmitted to the backend. This is achieved by keeping a reference on the first fid (head) before traversing the list. This is wrong in several ways: - a potential clunk request from the client could tear the first fid down and cause the reference to be stale. This leads to a use-after-free error that can be detected with ASAN, using a custom 9p client - fids are added at the head of the list : restarting from the previous head will always miss fids added by a some other potential request All these problems could be avoided if fids were being added at the end of the list. This can be achieved with a QSIMPLEQ, but this is probably too much change for a bug fix. For now let's keep it simple and just restart the loop from the current head. Fixes: CVE-2021-20181 Buglink: https://bugs.launchpad.net/qemu/+bug/1911666 Reported-by: Zero Day Initiative <zdi-disclosures@trendmicro.com> Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Message-Id: <161064025265.1838153.15185571283519390907.stgit@bahia.lan> Signed-off-by: Greg Kurz <groug@kaod.org>	2021-01-15 08:44:28 +01:00
Xinhao Zhang	01011733ea	hw/9pfs : add spaces around operator Fix code style. Operator needs spaces both sides. Signed-off-by: Xinhao Zhang <zhangxinhao1@huawei.com> Signed-off-by: Kai Deng <dengkai1@huawei.com> Reported-by: Euler Robot <euler.robot@huawei.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <20201030043515.1030223-1-zhangxinhao1@huawei.com> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>	2020-11-05 15:14:03 +01:00
Christian Schoenebeck	c418f935ac	9pfs: disable msize warning for synth driver Previous patch introduced a performance warning being logged on host side if client connected with an 'msize' <= 8192. Disable this performance warning for the synth driver to prevent that warning from being printed whenever the 9pfs (qtest) test cases are running. Introduce a new export flag V9FS_NO_PERF_WARN for that purpose, which might also be used to disable such warnings from the CLI in future. We could have also prevented the warning by simply raising P9_MAX_SIZE in virtio-9p-test.c to any value larger than 8192, however in the context of test cases it makes sense running for edge cases, which includes the lowest 'msize' value supported by the server which is 4096, hence we want to preserve an msize of 4096 for the test client. Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <E1kEyDy-0006nN-5A@lizzy.crudebyte.com> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>	2020-09-15 12:12:03 +02:00
Christian Schoenebeck	62777d825b	9pfs: log warning if msize <= 8192 It is essential to choose a reasonable high value for 'msize' to avoid severely degraded file I/O performance. This parameter can only be chosen on client/guest side, and a Linux client defaults to an 'msize' of only 8192 if the user did not explicitly specify a value for 'msize', which results in very poor file I/O performance. Unfortunately many users are not aware that they should specify an appropriate value for 'msize' to avoid severe performance issues, so log a performance warning (with a QEMU wiki link explaining this issue in detail) on host side in that case to make it more clear. Currently a client cannot automatically pick a reasonable value for 'msize', because a good value for 'msize' depends on the file I/O potential of the underlying storage on host side, i.e. a feature invisible to the client, and even then a user would still need to trade off between performance profit and additional RAM costs, i.e. with growing 'msize' (RAM occupation), performance still increases, but performance delta will shrink continuously. Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <e6fc84845c95816ad5baecb0abd6bfefdcf7ec9f.1599144062.git.qemu_oss@crudebyte.com> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>	2020-09-15 12:12:03 +02:00
Christian Schoenebeck	d2c5cf7ca1	9pfs: differentiate readdir lock between 9P2000.u vs. 9P2000.L Previous patch suggests that it might make sense to use a different mutex type now while handling readdir requests, depending on the precise protocol variant, as v9fs_do_readdir_with_stat() (used by 9P2000.u) uses a CoMutex to avoid deadlocks that might happen with QemuMutex otherwise, whereas do_readdir_many() (used by 9P2000.L) should better use a QemuMutex, as the precise behaviour of a failed CoMutex lock on fs driver side would not be clear. And to avoid the wrong lock type being used, be now strict and error out if a 9P2000.L client sends a Tread on a directory, and likeweise error out if a 9P2000.u client sends a Treaddir request. This patch is just intended as transitional measure, as currently 9P2000.u vs. 9P2000.L implementations currently differ where the main logic of fetching directory entries is located at (9P2000.u still being more top half focused, while 9P2000.L already being bottom half focused in regards to fetching directory entries that is). Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Message-Id: <9a2ddc347e533b0d801866afd9dfac853d2d4106.1596012787.git.qemu_oss@crudebyte.com> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>	2020-08-12 09:17:32 +02:00
Christian Schoenebeck	0c4356ba7d	9pfs: T_readdir latency optimization Make top half really top half and bottom half really bottom half: Each T_readdir request handling is hopping between threads (main I/O thread and background I/O driver threads) several times for every individual directory entry, which sums up to huge latencies for handling just a single T_readdir request. Instead of doing that, collect now all required directory entries (including all potentially required stat buffers for each entry) in one rush on a background I/O thread from fs driver by calling the previously added function v9fs_co_readdir_many() instead of v9fs_co_readdir(), then assemble the entire resulting network response message for the readdir request on main I/O thread. The fs driver is still aborting the directory entry retrieval loop (on the background I/O thread inside of v9fs_co_readdir_many()) as soon as it would exceed the client's requested maximum R_readdir response size. So this will not introduce a performance penalty on another end. Also: No longer seek initial directory position in v9fs_readdir(), as this is now handled (more consistently) by v9fs_co_readdir_many() instead. Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Message-Id: <c7c3d1cf4e86611538cef44897842819d9359d7a.1596012787.git.qemu_oss@crudebyte.com> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>	2020-08-12 09:17:32 +02:00
Christian Schoenebeck	29c9d2ca80	9pfs: make v9fs_readdir_response_size() public Rename function v9fs_readdir_data_size() -> v9fs_readdir_response_size() and make it callable from other units. So far this function is only used by 9p.c, however subsequent patches require the function to be callable from another 9pfs unit. And as we're at it; also make it clear for what this function is used for. Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <3668ebc7d5b929a0e4f1357457060d96f50f76f4.1596012787.git.qemu_oss@crudebyte.com> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>	2020-08-12 09:17:32 +02:00
Vladimir Sementsov-Ogievskiy	92c451222c	virtio-9p: Use ERRP_GUARD() If we want to check error after errp-function call, we need to introduce local_err and then propagate it to errp. Instead, use the ERRP_GUARD() macro, benefits are: 1. No need of explicit error_propagate call 2. No need of explicit local_err variable: use errp directly 3. ERRP_GUARD() leaves errp as is if it's not NULL or &error_fatal, this means that we don't break error_abort (we'll abort on error_set, not on error_propagate) If we want to add some info to errp (by error_prepend() or error_append_hint()), we must use the ERRP_GUARD() macro. Otherwise, this info will not be added when errp == &error_fatal (the program will exit prior to the error_append_hint() or error_prepend() call). Fix such a case in v9fs_device_realize_common(). This commit is generated by command sed -n '/^virtio-9p$/,/^$/{s/^F: //p}' MAINTAINERS \| \ xargs git ls-files \| grep '\.[hc]$' \| \ xargs spatch \ --sp-file scripts/coccinelle/errp-guard.cocci \ --macro-file scripts/cocci-macro-file.h \ --in-place --no-show-diff --max-width 80 Reported-by: Kevin Wolf <kwolf@redhat.com> Reported-by: Greg Kurz <groug@kaod.org> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Acked-by: Greg Kurz <groug@kaod.org> Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com> [Commit message tweaked] Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20200707165037.1026246-7-armbru@redhat.com> [ERRP_AUTO_PROPAGATE() renamed to ERRP_GUARD(), and auto-propagated-errp.cocci to errp-guard.cocci. Commit message tweaked again.]	2020-07-10 15:18:09 +02:00
Markus Armbruster	9261ef5e32	Clean up some calls to ignore Error objects the right way Receiving the error in a local variable only to free it is less clear (and also less efficient) than passing NULL. Clean up. Cc: Daniel P. Berrange <berrange@redhat.com> Cc: Jerome Forissier <jerome@forissier.org> CC: Greg Kurz <groug@kaod.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <20200630090351.1247703-4-armbru@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>	2020-07-02 06:25:28 +02:00
Stefano Stabellini	cf45183b71	Revert "9p: init_in_iov_from_pdu can truncate the size" This reverts commit `16724a1730`. It causes https://bugs.launchpad.net/bugs/1877688. Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com> Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Message-Id: <20200521192627.15259-1-sstabellini@kernel.org> Signed-off-by: Greg Kurz <groug@kaod.org>	2020-05-25 11:45:38 +02:00
Dan Robertson	03556ea920	9pfs: include linux/limits.h for XATTR_SIZE_MAX linux/limits.h should be included for the XATTR_SIZE_MAX definition used by v9fs_xattrcreate. Fixes: `3b79ef2cf4` ("9pfs: limit xattr size in xattrcreate") Signed-off-by: Dan Robertson <dan@dlrobertson.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Message-Id: <20200515203015.7090-2-dan@dlrobertson.com> Signed-off-by: Greg Kurz <groug@kaod.org>	2020-05-25 10:38:03 +02:00
Markus Armbruster	b69c3c21a5	qdev: Unrealize must not fail Devices may have component devices and buses. Device realization may fail. Realization is recursive: a device's realize() method realizes its components, and device_set_realized() realizes its buses (which should in turn realize the devices on that bus, except bus_set_realized() doesn't implement that, yet). When realization of a component or bus fails, we need to roll back: unrealize everything we realized so far. If any of these unrealizes failed, the device would be left in an inconsistent state. Must not happen. device_set_realized() lets it happen: it ignores errors in the roll back code starting at label child_realize_fail. Since realization is recursive, unrealization must be recursive, too. But how could a partly failed unrealize be rolled back? We'd have to re-realize, which can fail. This design is fundamentally broken. device_set_realized() does not roll back at all. Instead, it keeps unrealizing, ignoring further errors. It can screw up even for a device with no buses: if the lone dc->unrealize() fails, it still unregisters vmstate, and calls listeners' unrealize() callback. bus_set_realized() does not roll back either. Instead, it stops unrealizing. Fortunately, no unrealize method can fail, as we'll see below. To fix the design error, drop parameter @errp from all the unrealize methods. Any unrealize method that uses @errp now needs an update. This leads us to unrealize() methods that can fail. Merely passing it to another unrealize method cannot cause failure, though. Here are the ones that do other things with @errp: * virtio_serial_device_unrealize() Fails when qbus_set_hotplug_handler() fails, but still does all the other work. On failure, the device would stay realized with its resources completely gone. Oops. Can't happen, because qbus_set_hotplug_handler() can't actually fail here. Pass &error_abort to qbus_set_hotplug_handler() instead. * hw/ppc/spapr_drc.c's unrealize() Fails when object_property_del() fails, but all the other work is already done. On failure, the device would stay realized with its vmstate registration gone. Oops. Can't happen, because object_property_del() can't actually fail here. Pass &error_abort to object_property_del() instead. * spapr_phb_unrealize() Fails and bails out when remove_drcs() fails, but other work is already done. On failure, the device would stay realized with some of its resources gone. Oops. remove_drcs() fails only when chassis_from_bus()'s object_property_get_uint() fails, and it can't here. Pass &error_abort to remove_drcs() instead. Therefore, no unrealize method can fail before this patch. device_set_realized()'s recursive unrealization via bus uses object_property_set_bool(). Can't drop @errp there, so pass &error_abort. We similarly unrealize with object_property_set_bool() elsewhere, always ignoring errors. Pass &error_abort instead. Several unrealize methods no longer handle errors from other unrealize methods: virtio_9p_device_unrealize(), virtio_input_device_unrealize(), scsi_qdev_unrealize(), ... Much of the deleted error handling looks wrong anyway. One unrealize methods no longer ignore such errors: usb_ehci_pci_exit(). Several realize methods no longer ignore errors when rolling back: v9fs_device_realize_common(), pci_qdev_unrealize(), spapr_phb_realize(), usb_qdev_realize(), vfio_ccw_realize(), virtio_device_realize(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200505152926.18877-17-armbru@redhat.com>	2020-05-15 07:08:14 +02:00
Christian Schoenebeck	d36a5c2270	9pfs: validate count sent by client with T_readdir A good 9p client sends T_readdir with "count" parameter that's sufficiently smaller than client's initially negotiated msize (maximum message size). We perform a check for that though to avoid the server to be interrupted with a "Failed to encode VirtFS reply type 41" transport error message by bad clients. This count value constraint uses msize - 11, because 11 is the header size of R_readdir. Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <3990d3891e8ae2074709b56449e96ab4b4b93b7d.1579567020.git.qemu_oss@crudebyte.com> [groug: added comment ] Signed-off-by: Greg Kurz <groug@kaod.org>	2020-02-08 09:28:54 +01:00
Christian Schoenebeck	e16453a31a	9pfs: require msize >= 4096 A client establishes a session by sending a Tversion request along with a 'msize' parameter which client uses to suggest server a maximum message size ever to be used for communication (for both requests and replies) between client and server during that session. If client suggests a 'msize' smaller than 4096 then deny session by server immediately with an error response (Rlerror for "9P2000.L" clients or Rerror for "9P2000.u" clients) instead of replying with Rversion. So far any msize submitted by client with Tversion was simply accepted by server without any check. Introduction of some minimum msize makes sense, because e.g. a msize < 7 would not allow any subsequent 9p operation at all, because 7 is the size of the header section common by all 9p message types. A substantial higher value of 4096 was chosen though to prevent potential issues with some message types. E.g. Rreadlink may yield up to a size of PATH_MAX which is usually 4096, and like almost all 9p message types, Rreadlink is not allowed to be truncated by the 9p protocol. This chosen size also prevents a similar issue with Rreaddir responses (provided client always sends adequate 'count' parameter with Treaddir), because even though directory entries retrieval may be split up over several T/Rreaddir messages; a Rreaddir response must not truncate individual directory entries though. So msize should be large enough to return at least one directory entry with the longest possible file name supported by host. Most file systems support a max. file name length of 255. Largest known file name lenght limit would be currently ReiserFS with max. 4032 bytes, which is also covered by this min. msize value because 4032 + 35 < 4096. Furthermore 4096 is already the minimum msize of the Linux kernel's 9pfs client. Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <8ceecb7fb9fdbeabbe55c04339349a36929fb8e3.1579567019.git.qemu_oss@crudebyte.com> Signed-off-by: Greg Kurz <groug@kaod.org>	2020-02-08 09:28:43 +01:00
Daniel Henrique Barboza	b858e80a02	9pfs/9p.c: remove unneeded labels 'out' label in v9fs_xattr_write() and 'out_nofid' label in v9fs_complete_rename() can be replaced by appropriate return calls. CC: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Acked-by: Greg Kurz <groug@kaod.org> Signed-off-by: Greg Kurz <groug@kaod.org>	2020-01-20 15:11:39 +01:00
Greg Kurz	16724a1730	9p: init_in_iov_from_pdu can truncate the size init_in_iov_from_pdu might not be able to allocate the full buffer size requested, which comes from the client and could be larger than the transport has available at the time of the request. Specifically, this can happen with read operations, with the client requesting a read up to the max allowed, which might be more than the transport has available at the time. Today the implementation of init_in_iov_from_pdu throws an error, both Xen and Virtio. Instead, change the V9fsTransport interface so that the size becomes a pointer and can be limited by the implementation of init_in_iov_from_pdu. Change both the Xen and Virtio implementations to set the size to the size of the buffer they managed to allocate, instead of throwing an error. However, if the allocated buffer size is less than P9_IOHDRSZ (the size of the header) still throw an error as the case is unhandable. Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com> CC: groug@kaod.org CC: anthony.perard@citrix.com CC: roman@zededa.com CC: qemu_oss@crudebyte.com [groug: fix 32-bit build] Signed-off-by: Greg Kurz <groug@kaod.org>	2020-01-20 15:11:39 +01:00
Dan Schatzberg	68d654daee	9pfs: Fix divide by zero bug Some filesystems may return 0s in statfs (trivially, a FUSE filesystem can do so). QEMU should handle this gracefully and just behave the same as if statfs failed. Signed-off-by: Dan Schatzberg <dschatzberg@fb.com> Acked-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Signed-off-by: Greg Kurz <groug@kaod.org>	2019-11-23 15:51:48 +01:00
Christian Schoenebeck	6b6aa8285d	9p: Use variable length suffixes for inode remapping Use variable length suffixes for inode remapping instead of the fixed 16 bit size prefixes before. With this change the inode numbers on guest will typically be much smaller (e.g. around >2^1 .. >2^7 instead of >2^48 with the previous fixed size inode remapping. Additionally this solution is more efficient, since inode numbers in practice can take almost their entire 64 bit range on guest as well, so there is less likely a need for generating and tracking additional suffixes, which might also be beneficial for nested virtualization where each level of virtualization would shift up the inode bits and increase the chance of expensive remapping actions. The "Exponential Golomb" algorithm is used as basis for generating the variable length suffixes. The algorithm has a parameter k which controls the distribution of bits on increasing indeces (minimum bits at low index vs. maximum bits at high index). With k=0 the generated suffixes look like: Index Dec/Bin -> Generated Suffix Bin 1 [1] -> [1] (1 bits) 2 [10] -> [010] (3 bits) 3 [11] -> [110] (3 bits) 4 [100] -> [00100] (5 bits) 5 [101] -> [10100] (5 bits) 6 [110] -> [01100] (5 bits) 7 [111] -> [11100] (5 bits) 8 [1000] -> [0001000] (7 bits) 9 [1001] -> [1001000] (7 bits) 10 [1010] -> [0101000] (7 bits) 11 [1011] -> [1101000] (7 bits) 12 [1100] -> [0011000] (7 bits) ... 65533 [1111111111111101] -> [1011111111111111000000000000000] (31 bits) 65534 [1111111111111110] -> [0111111111111111000000000000000] (31 bits) 65535 [1111111111111111] -> [1111111111111111000000000000000] (31 bits) Hence minBits=1 maxBits=31 And with k=5 they would look like: Index Dec/Bin -> Generated Suffix Bin 1 [1] -> [000001] (6 bits) 2 [10] -> [100001] (6 bits) 3 [11] -> [010001] (6 bits) 4 [100] -> [110001] (6 bits) 5 [101] -> [001001] (6 bits) 6 [110] -> [101001] (6 bits) 7 [111] -> [011001] (6 bits) 8 [1000] -> [111001] (6 bits) 9 [1001] -> [000101] (6 bits) 10 [1010] -> [100101] (6 bits) 11 [1011] -> [010101] (6 bits) 12 [1100] -> [110101] (6 bits) ... 65533 [1111111111111101] -> [0011100000000000100000000000] (28 bits) 65534 [1111111111111110] -> [1011100000000000100000000000] (28 bits) 65535 [1111111111111111] -> [0111100000000000100000000000] (28 bits) Hence minBits=6 maxBits=28 Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Signed-off-by: Greg Kurz <groug@kaod.org>	2019-10-10 11:36:23 +02:00
Antonios Motakis	f3fe4a2d92	9p: stat_to_qid: implement slow path stat_to_qid attempts via qid_path_prefixmap to map unique files (which are identified by 64 bit inode nr and 32 bit device id) to a 64 QID path value. However this implementation makes some assumptions about inode number generation on the host. If qid_path_prefixmap fails, we still have 48 bits available in the QID path to fall back to a less memory efficient full mapping. Signed-off-by: Antonios Motakis <antonios.motakis@huawei.com> [CS: - Rebased to https://github.com/gkurz/qemu/commits/9p-next (SHA1 7fc4c49e91). - Updated hash calls to new xxhash API. - Removed unnecessary parantheses in qpf_lookup_func(). - Removed unnecessary g_malloc0() result checks. - Log error message when running out of prefixes in qid_path_fullmap(). - Log warning message about potential degraded performance in qid_path_prefixmap(). - Wrapped qpf_table initialization to dedicated qpf_table_init() function. - Fixed typo in comment. ] Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Signed-off-by: Greg Kurz <groug@kaod.org>	2019-10-10 11:36:14 +02:00
Antonios Motakis	1a6ed33cc5	9p: Added virtfs option 'multidevs=remap\|forbid\|warn' 'warn' (default): Only log an error message (once) on host if more than one device is shared by same export, except of that just ignore this config error though. This is the default behaviour for not breaking existing installations implying that they really know what they are doing. 'forbid': Like 'warn', but except of just logging an error this also denies access of guest to additional devices. 'remap': Allows to share more than one device per export by remapping inodes from host to guest appropriately. To support multiple devices on the 9p share, and avoid qid path collisions we take the device id as input to generate a unique QID path. The lowest 48 bits of the path will be set equal to the file inode, and the top bits will be uniquely assigned based on the top 16 bits of the inode and the device id. Signed-off-by: Antonios Motakis <antonios.motakis@huawei.com> [CS: - Rebased to https://github.com/gkurz/qemu/commits/9p-next (SHA1 7fc4c49e91). - Added virtfs option 'multidevs', original patch simply did the inode remapping without being asked. - Updated hash calls to new xxhash API. - Updated docs for new option 'multidevs'. - Fixed v9fs_do_readdir() not having remapped inodes. - Log error message when running out of prefixes in qid_path_prefixmap(). - Fixed definition of QPATH_INO_MASK. - Wrapped qpp_table initialization to dedicated qpp_table_init() function. - Dropped unnecessary parantheses in qpp_lookup_func(). - Dropped unnecessary g_malloc0() result checks. ] Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> [groug: - Moved "multidevs" parsing to the local backend. - Added hint to invalid multidevs option error. - Turn "remap" into "x-remap". ] Signed-off-by: Greg Kurz <groug@kaod.org>	2019-10-10 11:36:05 +02:00
Antonios Motakis	3b5ee9e86b	9p: Treat multiple devices on one export as an error The QID path should uniquely identify a file. However, the inode of a file is currently used as the QID path, which on its own only uniquely identifies files within a device. Here we track the device hosting the 9pfs share, in order to prevent security issues with QID path collisions from other devices. We only print a warning for now but a subsequent patch will allow users to have finer control over the desired behaviour. Failing the I/O will be one the proposed behaviour, so we also change stat_to_qid() to return an error here in order to keep other patches simpler. Signed-off-by: Antonios Motakis <antonios.motakis@huawei.com> [CS: - Assign dev_id to export root's device already in v9fs_device_realize_common(), not postponed in stat_to_qid(). - error_report_once() if more than one device was shared by export. - Return -ENODEV instead of -ENOSYS in stat_to_qid(). - Fixed typo in log comment. ] Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> [groug, changed to warning, updated message and changelog] Signed-off-by: Greg Kurz <groug@kaod.org>	2019-10-10 11:36:05 +02:00
Greg Kurz	c0da0cb761	9p: Simplify error path of v9fs_device_realize_common() Make v9fs_device_unrealize_common() idempotent and use it for rollback, in order to reduce code duplication. Signed-off-by: Greg Kurz <groug@kaod.org>	2019-10-10 11:36:04 +02:00
Antonios Motakis	8703283352	9p: unsigned type for type, version, path There is no need for signedness on these QID fields for 9p. Signed-off-by: Antonios Motakis <antonios.motakis@huawei.com> [CS: - Also make QID type unsigned. - Adjust donttouch_stat() to new types. - Adjust trace-events to new types. ] Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Signed-off-by: Greg Kurz <groug@kaod.org>	2019-10-10 11:36:04 +02:00
Markus Armbruster	db72581598	Include qemu/main-loop.h less In my "build everything" tree, changing qemu/main-loop.h triggers a recompile of some 5600 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). It includes block/aio.h, which in turn includes qemu/event_notifier.h, qemu/notify.h, qemu/processor.h, qemu/qsp.h, qemu/queue.h, qemu/thread-posix.h, qemu/thread.h, qemu/timer.h, and a few more. Include qemu/main-loop.h only where it's needed. Touching it now recompiles only some 1700 objects. For block/aio.h and qemu/event_notifier.h, these numbers drop from 5600 to 2800. For the others, they shrink only slightly. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190812052359.30071-21-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2019-08-16 13:31:52 +02:00
Greg Kurz	1923923bfa	9p: use g_new(T, n) instead of g_malloc(sizeof(T) * n) Because it is a recommended coding practice (see HACKING). Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>	2018-12-12 14:18:10 +01:00
Greg Kurz	1d20398694	9p: fix QEMU crash when renaming files When using the 9P2000.u version of the protocol, the following shell command line in the guest can cause QEMU to crash: while true; do rm -rf aa; mkdir -p a/b & touch a/b/c & mv a aa; done With 9P2000.u, file renaming is handled by the WSTAT command. The v9fs_wstat() function calls v9fs_complete_rename(), which calls v9fs_fix_path() for every fid whose path is affected by the change. The involved calls to v9fs_path_copy() may race with any other access to the fid path performed by some worker thread, causing a crash like shown below: Thread 12 "qemu-system-x86" received signal SIGSEGV, Segmentation fault. 0x0000555555a25da2 in local_open_nofollow (fs_ctx=0x555557d958b8, path=0x0, flags=65536, mode=0) at hw/9pfs/9p-local.c:59 59 while (*path && fd != -1) { (gdb) bt #0 0x0000555555a25da2 in local_open_nofollow (fs_ctx=0x555557d958b8, path=0x0, flags=65536, mode=0) at hw/9pfs/9p-local.c:59 #1 0x0000555555a25e0c in local_opendir_nofollow (fs_ctx=0x555557d958b8, path=0x0) at hw/9pfs/9p-local.c:92 #2 0x0000555555a261b8 in local_lstat (fs_ctx=0x555557d958b8, fs_path=0x555556b56858, stbuf=0x7fff84830ef0) at hw/9pfs/9p-local.c:185 #3 0x0000555555a2b367 in v9fs_co_lstat (pdu=0x555557d97498, path=0x555556b56858, stbuf=0x7fff84830ef0) at hw/9pfs/cofile.c:53 #4 0x0000555555a1e9e2 in v9fs_stat (opaque=0x555557d97498) at hw/9pfs/9p.c:1083 #5 0x0000555555e060a2 in coroutine_trampoline (i0=-669165424, i1=32767) at util/coroutine-ucontext.c:116 #6 0x00007fffef4f5600 in __start_context () at /lib64/libc.so.6 #7 0x0000000000000000 in () (gdb) The fix is to take the path write lock when calling v9fs_complete_rename(), like in v9fs_rename(). Impact: DoS triggered by unprivileged guest users. Fixes: CVE-2018-19489 Cc: P J P <ppandit@redhat.com> Reported-by: zhibin hu <noirfate@gmail.com> Reviewed-by: Prasad J Pandit <pjp@fedoraproject.org> Signed-off-by: Greg Kurz <groug@kaod.org>	2018-11-23 13:28:03 +01:00
Greg Kurz	5b3c77aa58	9p: take write lock on fid path updates (CVE-2018-19364) Recent commit `5b76ef50f6` fixed a race where v9fs_co_open2() could possibly overwrite a fid path with v9fs_path_copy() while it is being accessed by some other thread, ie, use-after-free that can be detected by ASAN with a custom 9p client. It turns out that the same can happen at several locations where v9fs_path_copy() is used to set the fid path. The fix is again to take the write lock. Fixes CVE-2018-19364. Cc: P J P <ppandit@redhat.com> Reported-by: zhibin hu <noirfate@gmail.com> Reviewed-by: Prasad J Pandit <pjp@fedoraproject.org> Signed-off-by: Greg Kurz <groug@kaod.org>	2018-11-20 13:00:35 +01:00
Keno Fischer	aca6897fba	9p: xattr: Properly translate xattrcreate flags As with unlinkat, these flags come from the client and need to be translated to their host values. The protocol values happen to match linux, but that need not be true in general. Signed-off-by: Keno Fischer <keno@juliacomputing.com> Signed-off-by: Greg Kurz <groug@kaod.org>	2018-06-07 12:17:22 +02:00
Keno Fischer	67e8734574	9p: Properly check/translate flags in unlinkat The 9p-local code previously relied on P9_DOTL_AT_REMOVEDIR and AT_REMOVEDIR having the same numerical value and deferred any errorchecking to the syscall itself. However, while the former assumption is true on Linux, it is not true in general. 9p-handle did this properly however. Move the translation code to the generic 9p server code and add an error if unrecognized flags are passed. Signed-off-by: Keno Fischer <keno@juliacomputing.com> Signed-off-by: Greg Kurz <groug@kaod.org>	2018-06-07 12:17:22 +02:00
Keno Fischer	a647502c58	9p: xattr: Fix crashes due to free of uninitialized value If the size returned from llistxattr/lgetxattr is 0, we skipped the malloc call, leaving xattr.value uninitialized. However, this value is later passed to `g_free` without any further checks, causing an error. Fix that by always calling g_malloc unconditionally. If `size` is 0, it will return NULL, which is safe to pass to g_free. Signed-off-by: Keno Fischer <keno@juliacomputing.com> Signed-off-by: Greg Kurz <groug@kaod.org>	2018-06-07 12:17:22 +02:00

1 2 3

127 Commits