mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Dan Robertson	03556ea920	9pfs: include linux/limits.h for XATTR_SIZE_MAX linux/limits.h should be included for the XATTR_SIZE_MAX definition used by v9fs_xattrcreate. Fixes: `3b79ef2cf4` ("9pfs: limit xattr size in xattrcreate") Signed-off-by: Dan Robertson <dan@dlrobertson.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Message-Id: <20200515203015.7090-2-dan@dlrobertson.com> Signed-off-by: Greg Kurz <groug@kaod.org>	2020-05-25 10:38:03 +02:00
Markus Armbruster	b69c3c21a5	qdev: Unrealize must not fail Devices may have component devices and buses. Device realization may fail. Realization is recursive: a device's realize() method realizes its components, and device_set_realized() realizes its buses (which should in turn realize the devices on that bus, except bus_set_realized() doesn't implement that, yet). When realization of a component or bus fails, we need to roll back: unrealize everything we realized so far. If any of these unrealizes failed, the device would be left in an inconsistent state. Must not happen. device_set_realized() lets it happen: it ignores errors in the roll back code starting at label child_realize_fail. Since realization is recursive, unrealization must be recursive, too. But how could a partly failed unrealize be rolled back? We'd have to re-realize, which can fail. This design is fundamentally broken. device_set_realized() does not roll back at all. Instead, it keeps unrealizing, ignoring further errors. It can screw up even for a device with no buses: if the lone dc->unrealize() fails, it still unregisters vmstate, and calls listeners' unrealize() callback. bus_set_realized() does not roll back either. Instead, it stops unrealizing. Fortunately, no unrealize method can fail, as we'll see below. To fix the design error, drop parameter @errp from all the unrealize methods. Any unrealize method that uses @errp now needs an update. This leads us to unrealize() methods that can fail. Merely passing it to another unrealize method cannot cause failure, though. Here are the ones that do other things with @errp: * virtio_serial_device_unrealize() Fails when qbus_set_hotplug_handler() fails, but still does all the other work. On failure, the device would stay realized with its resources completely gone. Oops. Can't happen, because qbus_set_hotplug_handler() can't actually fail here. Pass &error_abort to qbus_set_hotplug_handler() instead. * hw/ppc/spapr_drc.c's unrealize() Fails when object_property_del() fails, but all the other work is already done. On failure, the device would stay realized with its vmstate registration gone. Oops. Can't happen, because object_property_del() can't actually fail here. Pass &error_abort to object_property_del() instead. * spapr_phb_unrealize() Fails and bails out when remove_drcs() fails, but other work is already done. On failure, the device would stay realized with some of its resources gone. Oops. remove_drcs() fails only when chassis_from_bus()'s object_property_get_uint() fails, and it can't here. Pass &error_abort to remove_drcs() instead. Therefore, no unrealize method can fail before this patch. device_set_realized()'s recursive unrealization via bus uses object_property_set_bool(). Can't drop @errp there, so pass &error_abort. We similarly unrealize with object_property_set_bool() elsewhere, always ignoring errors. Pass &error_abort instead. Several unrealize methods no longer handle errors from other unrealize methods: virtio_9p_device_unrealize(), virtio_input_device_unrealize(), scsi_qdev_unrealize(), ... Much of the deleted error handling looks wrong anyway. One unrealize methods no longer ignore such errors: usb_ehci_pci_exit(). Several realize methods no longer ignore errors when rolling back: v9fs_device_realize_common(), pci_qdev_unrealize(), spapr_phb_realize(), usb_qdev_realize(), vfio_ccw_realize(), virtio_device_realize(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200505152926.18877-17-armbru@redhat.com>	2020-05-15 07:08:14 +02:00
Christian Schoenebeck	d36a5c2270	9pfs: validate count sent by client with T_readdir A good 9p client sends T_readdir with "count" parameter that's sufficiently smaller than client's initially negotiated msize (maximum message size). We perform a check for that though to avoid the server to be interrupted with a "Failed to encode VirtFS reply type 41" transport error message by bad clients. This count value constraint uses msize - 11, because 11 is the header size of R_readdir. Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <3990d3891e8ae2074709b56449e96ab4b4b93b7d.1579567020.git.qemu_oss@crudebyte.com> [groug: added comment ] Signed-off-by: Greg Kurz <groug@kaod.org>	2020-02-08 09:28:54 +01:00
Christian Schoenebeck	e16453a31a	9pfs: require msize >= 4096 A client establishes a session by sending a Tversion request along with a 'msize' parameter which client uses to suggest server a maximum message size ever to be used for communication (for both requests and replies) between client and server during that session. If client suggests a 'msize' smaller than 4096 then deny session by server immediately with an error response (Rlerror for "9P2000.L" clients or Rerror for "9P2000.u" clients) instead of replying with Rversion. So far any msize submitted by client with Tversion was simply accepted by server without any check. Introduction of some minimum msize makes sense, because e.g. a msize < 7 would not allow any subsequent 9p operation at all, because 7 is the size of the header section common by all 9p message types. A substantial higher value of 4096 was chosen though to prevent potential issues with some message types. E.g. Rreadlink may yield up to a size of PATH_MAX which is usually 4096, and like almost all 9p message types, Rreadlink is not allowed to be truncated by the 9p protocol. This chosen size also prevents a similar issue with Rreaddir responses (provided client always sends adequate 'count' parameter with Treaddir), because even though directory entries retrieval may be split up over several T/Rreaddir messages; a Rreaddir response must not truncate individual directory entries though. So msize should be large enough to return at least one directory entry with the longest possible file name supported by host. Most file systems support a max. file name length of 255. Largest known file name lenght limit would be currently ReiserFS with max. 4032 bytes, which is also covered by this min. msize value because 4032 + 35 < 4096. Furthermore 4096 is already the minimum msize of the Linux kernel's 9pfs client. Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <8ceecb7fb9fdbeabbe55c04339349a36929fb8e3.1579567019.git.qemu_oss@crudebyte.com> Signed-off-by: Greg Kurz <groug@kaod.org>	2020-02-08 09:28:43 +01:00
Daniel Henrique Barboza	b858e80a02	9pfs/9p.c: remove unneeded labels 'out' label in v9fs_xattr_write() and 'out_nofid' label in v9fs_complete_rename() can be replaced by appropriate return calls. CC: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Acked-by: Greg Kurz <groug@kaod.org> Signed-off-by: Greg Kurz <groug@kaod.org>	2020-01-20 15:11:39 +01:00
Greg Kurz	16724a1730	9p: init_in_iov_from_pdu can truncate the size init_in_iov_from_pdu might not be able to allocate the full buffer size requested, which comes from the client and could be larger than the transport has available at the time of the request. Specifically, this can happen with read operations, with the client requesting a read up to the max allowed, which might be more than the transport has available at the time. Today the implementation of init_in_iov_from_pdu throws an error, both Xen and Virtio. Instead, change the V9fsTransport interface so that the size becomes a pointer and can be limited by the implementation of init_in_iov_from_pdu. Change both the Xen and Virtio implementations to set the size to the size of the buffer they managed to allocate, instead of throwing an error. However, if the allocated buffer size is less than P9_IOHDRSZ (the size of the header) still throw an error as the case is unhandable. Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com> CC: groug@kaod.org CC: anthony.perard@citrix.com CC: roman@zededa.com CC: qemu_oss@crudebyte.com [groug: fix 32-bit build] Signed-off-by: Greg Kurz <groug@kaod.org>	2020-01-20 15:11:39 +01:00
Dan Schatzberg	68d654daee	9pfs: Fix divide by zero bug Some filesystems may return 0s in statfs (trivially, a FUSE filesystem can do so). QEMU should handle this gracefully and just behave the same as if statfs failed. Signed-off-by: Dan Schatzberg <dschatzberg@fb.com> Acked-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Signed-off-by: Greg Kurz <groug@kaod.org>	2019-11-23 15:51:48 +01:00
Christian Schoenebeck	6b6aa8285d	9p: Use variable length suffixes for inode remapping Use variable length suffixes for inode remapping instead of the fixed 16 bit size prefixes before. With this change the inode numbers on guest will typically be much smaller (e.g. around >2^1 .. >2^7 instead of >2^48 with the previous fixed size inode remapping. Additionally this solution is more efficient, since inode numbers in practice can take almost their entire 64 bit range on guest as well, so there is less likely a need for generating and tracking additional suffixes, which might also be beneficial for nested virtualization where each level of virtualization would shift up the inode bits and increase the chance of expensive remapping actions. The "Exponential Golomb" algorithm is used as basis for generating the variable length suffixes. The algorithm has a parameter k which controls the distribution of bits on increasing indeces (minimum bits at low index vs. maximum bits at high index). With k=0 the generated suffixes look like: Index Dec/Bin -> Generated Suffix Bin 1 [1] -> [1] (1 bits) 2 [10] -> [010] (3 bits) 3 [11] -> [110] (3 bits) 4 [100] -> [00100] (5 bits) 5 [101] -> [10100] (5 bits) 6 [110] -> [01100] (5 bits) 7 [111] -> [11100] (5 bits) 8 [1000] -> [0001000] (7 bits) 9 [1001] -> [1001000] (7 bits) 10 [1010] -> [0101000] (7 bits) 11 [1011] -> [1101000] (7 bits) 12 [1100] -> [0011000] (7 bits) ... 65533 [1111111111111101] -> [1011111111111111000000000000000] (31 bits) 65534 [1111111111111110] -> [0111111111111111000000000000000] (31 bits) 65535 [1111111111111111] -> [1111111111111111000000000000000] (31 bits) Hence minBits=1 maxBits=31 And with k=5 they would look like: Index Dec/Bin -> Generated Suffix Bin 1 [1] -> [000001] (6 bits) 2 [10] -> [100001] (6 bits) 3 [11] -> [010001] (6 bits) 4 [100] -> [110001] (6 bits) 5 [101] -> [001001] (6 bits) 6 [110] -> [101001] (6 bits) 7 [111] -> [011001] (6 bits) 8 [1000] -> [111001] (6 bits) 9 [1001] -> [000101] (6 bits) 10 [1010] -> [100101] (6 bits) 11 [1011] -> [010101] (6 bits) 12 [1100] -> [110101] (6 bits) ... 65533 [1111111111111101] -> [0011100000000000100000000000] (28 bits) 65534 [1111111111111110] -> [1011100000000000100000000000] (28 bits) 65535 [1111111111111111] -> [0111100000000000100000000000] (28 bits) Hence minBits=6 maxBits=28 Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Signed-off-by: Greg Kurz <groug@kaod.org>	2019-10-10 11:36:23 +02:00
Antonios Motakis	f3fe4a2d92	9p: stat_to_qid: implement slow path stat_to_qid attempts via qid_path_prefixmap to map unique files (which are identified by 64 bit inode nr and 32 bit device id) to a 64 QID path value. However this implementation makes some assumptions about inode number generation on the host. If qid_path_prefixmap fails, we still have 48 bits available in the QID path to fall back to a less memory efficient full mapping. Signed-off-by: Antonios Motakis <antonios.motakis@huawei.com> [CS: - Rebased to https://github.com/gkurz/qemu/commits/9p-next (SHA1 7fc4c49e91). - Updated hash calls to new xxhash API. - Removed unnecessary parantheses in qpf_lookup_func(). - Removed unnecessary g_malloc0() result checks. - Log error message when running out of prefixes in qid_path_fullmap(). - Log warning message about potential degraded performance in qid_path_prefixmap(). - Wrapped qpf_table initialization to dedicated qpf_table_init() function. - Fixed typo in comment. ] Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Signed-off-by: Greg Kurz <groug@kaod.org>	2019-10-10 11:36:14 +02:00
Antonios Motakis	1a6ed33cc5	9p: Added virtfs option 'multidevs=remap\|forbid\|warn' 'warn' (default): Only log an error message (once) on host if more than one device is shared by same export, except of that just ignore this config error though. This is the default behaviour for not breaking existing installations implying that they really know what they are doing. 'forbid': Like 'warn', but except of just logging an error this also denies access of guest to additional devices. 'remap': Allows to share more than one device per export by remapping inodes from host to guest appropriately. To support multiple devices on the 9p share, and avoid qid path collisions we take the device id as input to generate a unique QID path. The lowest 48 bits of the path will be set equal to the file inode, and the top bits will be uniquely assigned based on the top 16 bits of the inode and the device id. Signed-off-by: Antonios Motakis <antonios.motakis@huawei.com> [CS: - Rebased to https://github.com/gkurz/qemu/commits/9p-next (SHA1 7fc4c49e91). - Added virtfs option 'multidevs', original patch simply did the inode remapping without being asked. - Updated hash calls to new xxhash API. - Updated docs for new option 'multidevs'. - Fixed v9fs_do_readdir() not having remapped inodes. - Log error message when running out of prefixes in qid_path_prefixmap(). - Fixed definition of QPATH_INO_MASK. - Wrapped qpp_table initialization to dedicated qpp_table_init() function. - Dropped unnecessary parantheses in qpp_lookup_func(). - Dropped unnecessary g_malloc0() result checks. ] Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> [groug: - Moved "multidevs" parsing to the local backend. - Added hint to invalid multidevs option error. - Turn "remap" into "x-remap". ] Signed-off-by: Greg Kurz <groug@kaod.org>	2019-10-10 11:36:05 +02:00
Antonios Motakis	3b5ee9e86b	9p: Treat multiple devices on one export as an error The QID path should uniquely identify a file. However, the inode of a file is currently used as the QID path, which on its own only uniquely identifies files within a device. Here we track the device hosting the 9pfs share, in order to prevent security issues with QID path collisions from other devices. We only print a warning for now but a subsequent patch will allow users to have finer control over the desired behaviour. Failing the I/O will be one the proposed behaviour, so we also change stat_to_qid() to return an error here in order to keep other patches simpler. Signed-off-by: Antonios Motakis <antonios.motakis@huawei.com> [CS: - Assign dev_id to export root's device already in v9fs_device_realize_common(), not postponed in stat_to_qid(). - error_report_once() if more than one device was shared by export. - Return -ENODEV instead of -ENOSYS in stat_to_qid(). - Fixed typo in log comment. ] Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> [groug, changed to warning, updated message and changelog] Signed-off-by: Greg Kurz <groug@kaod.org>	2019-10-10 11:36:05 +02:00
Greg Kurz	c0da0cb761	9p: Simplify error path of v9fs_device_realize_common() Make v9fs_device_unrealize_common() idempotent and use it for rollback, in order to reduce code duplication. Signed-off-by: Greg Kurz <groug@kaod.org>	2019-10-10 11:36:04 +02:00
Antonios Motakis	8703283352	9p: unsigned type for type, version, path There is no need for signedness on these QID fields for 9p. Signed-off-by: Antonios Motakis <antonios.motakis@huawei.com> [CS: - Also make QID type unsigned. - Adjust donttouch_stat() to new types. - Adjust trace-events to new types. ] Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Signed-off-by: Greg Kurz <groug@kaod.org>	2019-10-10 11:36:04 +02:00
Markus Armbruster	db72581598	Include qemu/main-loop.h less In my "build everything" tree, changing qemu/main-loop.h triggers a recompile of some 5600 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). It includes block/aio.h, which in turn includes qemu/event_notifier.h, qemu/notify.h, qemu/processor.h, qemu/qsp.h, qemu/queue.h, qemu/thread-posix.h, qemu/thread.h, qemu/timer.h, and a few more. Include qemu/main-loop.h only where it's needed. Touching it now recompiles only some 1700 objects. For block/aio.h and qemu/event_notifier.h, these numbers drop from 5600 to 2800. For the others, they shrink only slightly. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190812052359.30071-21-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2019-08-16 13:31:52 +02:00
Greg Kurz	1923923bfa	9p: use g_new(T, n) instead of g_malloc(sizeof(T) * n) Because it is a recommended coding practice (see HACKING). Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>	2018-12-12 14:18:10 +01:00
Greg Kurz	1d20398694	9p: fix QEMU crash when renaming files When using the 9P2000.u version of the protocol, the following shell command line in the guest can cause QEMU to crash: while true; do rm -rf aa; mkdir -p a/b & touch a/b/c & mv a aa; done With 9P2000.u, file renaming is handled by the WSTAT command. The v9fs_wstat() function calls v9fs_complete_rename(), which calls v9fs_fix_path() for every fid whose path is affected by the change. The involved calls to v9fs_path_copy() may race with any other access to the fid path performed by some worker thread, causing a crash like shown below: Thread 12 "qemu-system-x86" received signal SIGSEGV, Segmentation fault. 0x0000555555a25da2 in local_open_nofollow (fs_ctx=0x555557d958b8, path=0x0, flags=65536, mode=0) at hw/9pfs/9p-local.c:59 59 while (*path && fd != -1) { (gdb) bt #0 0x0000555555a25da2 in local_open_nofollow (fs_ctx=0x555557d958b8, path=0x0, flags=65536, mode=0) at hw/9pfs/9p-local.c:59 #1 0x0000555555a25e0c in local_opendir_nofollow (fs_ctx=0x555557d958b8, path=0x0) at hw/9pfs/9p-local.c:92 #2 0x0000555555a261b8 in local_lstat (fs_ctx=0x555557d958b8, fs_path=0x555556b56858, stbuf=0x7fff84830ef0) at hw/9pfs/9p-local.c:185 #3 0x0000555555a2b367 in v9fs_co_lstat (pdu=0x555557d97498, path=0x555556b56858, stbuf=0x7fff84830ef0) at hw/9pfs/cofile.c:53 #4 0x0000555555a1e9e2 in v9fs_stat (opaque=0x555557d97498) at hw/9pfs/9p.c:1083 #5 0x0000555555e060a2 in coroutine_trampoline (i0=-669165424, i1=32767) at util/coroutine-ucontext.c:116 #6 0x00007fffef4f5600 in __start_context () at /lib64/libc.so.6 #7 0x0000000000000000 in () (gdb) The fix is to take the path write lock when calling v9fs_complete_rename(), like in v9fs_rename(). Impact: DoS triggered by unprivileged guest users. Fixes: CVE-2018-19489 Cc: P J P <ppandit@redhat.com> Reported-by: zhibin hu <noirfate@gmail.com> Reviewed-by: Prasad J Pandit <pjp@fedoraproject.org> Signed-off-by: Greg Kurz <groug@kaod.org>	2018-11-23 13:28:03 +01:00
Greg Kurz	5b3c77aa58	9p: take write lock on fid path updates (CVE-2018-19364) Recent commit `5b76ef50f6` fixed a race where v9fs_co_open2() could possibly overwrite a fid path with v9fs_path_copy() while it is being accessed by some other thread, ie, use-after-free that can be detected by ASAN with a custom 9p client. It turns out that the same can happen at several locations where v9fs_path_copy() is used to set the fid path. The fix is again to take the write lock. Fixes CVE-2018-19364. Cc: P J P <ppandit@redhat.com> Reported-by: zhibin hu <noirfate@gmail.com> Reviewed-by: Prasad J Pandit <pjp@fedoraproject.org> Signed-off-by: Greg Kurz <groug@kaod.org>	2018-11-20 13:00:35 +01:00
Keno Fischer	aca6897fba	9p: xattr: Properly translate xattrcreate flags As with unlinkat, these flags come from the client and need to be translated to their host values. The protocol values happen to match linux, but that need not be true in general. Signed-off-by: Keno Fischer <keno@juliacomputing.com> Signed-off-by: Greg Kurz <groug@kaod.org>	2018-06-07 12:17:22 +02:00
Keno Fischer	67e8734574	9p: Properly check/translate flags in unlinkat The 9p-local code previously relied on P9_DOTL_AT_REMOVEDIR and AT_REMOVEDIR having the same numerical value and deferred any errorchecking to the syscall itself. However, while the former assumption is true on Linux, it is not true in general. 9p-handle did this properly however. Move the translation code to the generic 9p server code and add an error if unrecognized flags are passed. Signed-off-by: Keno Fischer <keno@juliacomputing.com> Signed-off-by: Greg Kurz <groug@kaod.org>	2018-06-07 12:17:22 +02:00
Keno Fischer	a647502c58	9p: xattr: Fix crashes due to free of uninitialized value If the size returned from llistxattr/lgetxattr is 0, we skipped the malloc call, leaving xattr.value uninitialized. However, this value is later passed to `g_free` without any further checks, causing an error. Fix that by always calling g_malloc unconditionally. If `size` is 0, it will return NULL, which is safe to pass to g_free. Signed-off-by: Keno Fischer <keno@juliacomputing.com> Signed-off-by: Greg Kurz <groug@kaod.org>	2018-06-07 12:17:22 +02:00
Greg Kurz	8f9c64bfa5	9p: add trace event for v9fs_setattr() Don't print the tv_nsec part of atime and mtime, to stay below the 10 argument limit of trace events. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>	2018-05-02 08:59:24 +02:00
Marc-André Lureau	e446a1eb5e	9p: v9fs_path_copy() readability lhs/rhs doesn't tell much about how argument are handled, dst/src is and const arguments is clearer in my mind. Use g_memdup() while at it. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Greg Kurz <groug@kaod.org>	2018-02-19 18:27:15 +01:00
Greg Kurz	357e2f7f4e	tests: virtio-9p: add FLUSH operation test The idea is to send a victim request that will possibly block in the server and to send a flush request to cancel the victim request. This patch adds two test to verifiy that: - the server does not reply to a victim request that was actually cancelled - the server replies to the flush request after replying to the victim request if it could not cancel it 9p request cancellation reference: http://man.cat-v.org/plan_9/5/flush Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> (groug, change the test to only write a single byte to avoid any alignment or endianess consideration)	2018-02-02 11:11:55 +01:00
Keno Fischer	fc78d5ee76	9pfs: Correctly handle cancelled requests # Background I was investigating spurious non-deterministic EINTR returns from various 9p file system operations in a Linux guest served from the qemu 9p server. ## EINTR, ERESTARTSYS and the linux kernel When a signal arrives that the Linux kernel needs to deliver to user-space while a given thread is blocked (in the 9p case waiting for a reply to its request in 9p_client_rpc -> wait_event_interruptible), it asks whatever driver is currently running to abort its current operation (in the 9p case causing the submission of a TFLUSH message) and return to user space. In these situations, the error message reported is generally ERESTARTSYS. If the userspace processes specified SA_RESTART, this means that the system call will get restarted upon completion of the signal handler delivery (assuming the signal handler doesn't modify the process state in complicated ways not relevant here). If SA_RESTART is not specified, ERESTARTSYS gets translated to EINTR and user space is expected to handle the restart itself. ## The 9p TFLUSH command The 9p TFLUSH commands requests that the server abort an ongoing operation. The man page [1] specifies: ``` If it recognizes oldtag as the tag of a pending transaction, it should abort any pending response and discard that tag. [...] When the client sends a Tflush, it must wait to receive the corresponding Rflush before reusing oldtag for subsequent messages. If a response to the flushed request is received before the Rflush, the client must honor the response as if it had not been flushed, since the completed request may signify a state change in the server ``` In particular, this means that the server must not send a reply with the orignal tag in response to the cancellation request, because the client is obligated to interpret such a reply as a coincidental reply to the original request. # The bug When qemu receives a TFlush request, it sets the `cancelled` flag on the relevant pdu. This flag is periodically checked, e.g. in `v9fs_co_name_to_path`, and if set, the operation is aborted and the error is set to EINTR. However, the server then violates the spec, by returning to the client an Rerror response, rather than discarding the message entirely. As a result, the client is required to assume that said Rerror response is a result of the original request, not a result of the cancellation and thus passes the EINTR error back to user space. This is not the worst thing it could do, however as discussed above, the correct error code would have been ERESTARTSYS, such that user space programs with SA_RESTART set get correctly restarted upon completion of the signal handler. Instead, such programs get spurious EINTR results that they were not expecting to handle. It should be noted that there are plenty of user space programs that do not set SA_RESTART and do not correctly handle EINTR either. However, that is then a userspace bug. It should also be noted that this bug has been mitigated by a recent commit to the Linux kernel [2], which essentially prevents the kernel from sending Tflush requests unless the process is about to die (in which case the process likely doesn't care about the response). Nevertheless, for older kernels and to comply with the spec, I believe this change is beneficial. # Implementation The fix is fairly simple, just skipping notification of a reply if the pdu was previously cancelled. We do however, also notify the transport layer that we're doing this, so it can clean up any resources it may be holding. I also added a new trace event to distinguish operations that caused an error reply from those that were cancelled. One complication is that we only omit sending the message on EINTR errors in order to avoid confusing the rest of the code (which may assume that a client knows about a fid if it sucessfully passed it off to pud_complete without checking for cancellation status). This does mean that if the server acts upon the cancellation flag, it always needs to set err to EINTR. I believe this is true of the current code. [1] https://9fans.github.io/plan9port/man/man9/flush.html [2] https://github.com/torvalds/linux/commit/9523feac272ccad2ad8186ba4fcc891 Signed-off-by: Keno Fischer <keno@juliacomputing.com> Reviewed-by: Greg Kurz <groug@kaod.org> [groug, send a zero-sized reply instead of detaching the buffer] Signed-off-by: Greg Kurz <groug@kaod.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>	2018-02-01 21:21:27 +01:00
Greg Kurz	066eb006b5	9pfs: drop v9fs_register_transport() No good reasons to do this outside of v9fs_device_realize_common(). Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>	2018-02-01 21:21:27 +01:00
Greg Kurz	65603a801e	fsdev: improve error handling of backend init This patch changes some error messages in the backend init code and convert backends to propagate QEMU Error objects instead of calling error_report(). One notable improvement is that the local backend now provides a more detailed error report when it fails to open the shared directory. Signed-off-by: Greg Kurz <groug@kaod.org>	2018-01-08 11:18:23 +01:00
Greg Kurz	7567359094	9pfs: make pdu_marshal() and pdu_unmarshal() static functions They're only used by the 9p core code. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Eric Blake <eblake@redhat.com>	2018-01-08 11:18:22 +01:00
Greg Kurz	d1471233bb	9pfs: fix error path in pdu_submit() If we receive an unsupported request id, we first decide to return -ENOTSUPP to the client, but since the request id causes is_read_only_op() to return false, we change the error to be -EROFS if the fsdev is read-only. This doesn't make sense since we don't know what the client asked for. This patch ensures that -EROFS can only be returned if the request id is supported. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Eric Blake <eblake@redhat.com>	2018-01-08 11:18:22 +01:00
Greg Kurz	8e71b96c62	9pfs: fix some type definitions To comply with the QEMU coding style. Signed-off-by: Greg Kurz <groug@kaod.org>	2018-01-08 11:18:22 +01:00
Greg Kurz	267fcadf32	9pfs: fix v9fs_mark_fids_unreclaim() return value The return value of v9fs_mark_fids_unreclaim() is then propagated to pdu_complete(). It should be a negative errno, not -1. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-11-06 18:05:35 +01:00
Prasad J Pandit	7bd9275630	9pfs: use g_malloc0 to allocate space for xattr 9p back-end first queries the size of an extended attribute, allocates space for it via g_malloc() and then retrieves its value into allocated buffer. Race between querying attribute size and retrieving its could lead to memory bytes disclosure. Use g_malloc0() to avoid it. Reported-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org> Signed-off-by: Greg Kurz <groug@kaod.org>	2017-10-16 14:21:59 +02:00
Jan Dakinevich	772a73692e	9pfs: check the size of transport buffer before marshaling v9fs_do_readdir_with_stat() should check for a maximum buffer size before an attempt to marshal gathered data. Otherwise, buffers assumed as misconfigured and the transport would be broken. The patch brings v9fs_do_readdir_with_stat() in conformity with v9fs_do_readdir() behavior. Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com> [groug, regression caused my commit `8d37de41ca` # 2.10] Signed-off-by: Greg Kurz <groug@kaod.org>	2017-09-20 08:48:52 +02:00
Jan Dakinevich	4d8bc7334b	9pfs: fix name_to_path assertion in v9fs_complete_rename() The third parameter of v9fs_co_name_to_path() must not contain `/' character. The issue is most likely related to 9p2000.u protocol only. Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com> [groug, regression caused by commit `f57f587857` # 2.10] Signed-off-by: Greg Kurz <groug@kaod.org>	2017-09-20 08:48:52 +02:00
Jan Dakinevich	6069537f43	9pfs: fix readdir() for 9p2000.u If the client is using 9p2000.u, the following occurs: $ cd ${virtfs_shared_dir} $ mkdir -p a/b/c $ ls a/b ls: cannot access 'a/b/a': No such file or directory ls: cannot access 'a/b/b': No such file or directory a b c instead of the expected: $ ls a/b c This is a regression introduced by commit f57f5878578a; local_name_to_path() now resolves ".." and "." in paths, and v9fs_do_readdir_with_stat()->stat_to_v9stat() then copies the basename of the resulting path to the response. With the example above, this means that "." and ".." are turned into "b" and "a" respectively... stat_to_v9stat() currently assumes it is passed a full canonicalized path and uses it to do two different things: 1) to pass it to v9fs_co_readlink() in case the file is a symbolic link 2) to set the name field of the V9fsStat structure to the basename part of the given path It only has two users: v9fs_stat() and v9fs_do_readdir_with_stat(). v9fs_stat() really needs 1) and 2) to be performed since it starts with the full canonicalized path stored in the fid. It is different for v9fs_do_readdir_with_stat() though because the name we want to put into the V9fsStat structure is the d_name field of the dirent actually (ie, we want to keep the "." and ".." special names). So, we only need 1) in this case. This patch hence adds a basename argument to stat_to_v9stat(), to be used to set the name field of the V9fsStat structure, and moves the basename logic to v9fs_stat(). Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com> (groug, renamed old name argument to path and updated changelog) Signed-off-by: Greg Kurz <groug@kaod.org>	2017-09-20 08:48:51 +02:00
Philippe Mathieu-Daudé	403a905b03	9pfs: avoid sign conversion error simplifying the code (note this is how other functions also handle the errors). hw/9pfs/9p.c:948:18: warning: Loss of sign in implicit conversion offset = err; ^~~ Reported-by: Clang Static Analyzer Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Greg Kurz <groug@kaod.org>	2017-09-05 14:01:16 +02:00
Alistair Francis	3dc6f86936	Convert error_report() to warn_report() Convert all uses of error_report("warning:"... to use warn_report() instead. This helps standardise on a single method of printing warnings to the user. All of the warnings were changed using these two commands: find ./* -type f -exec sed -i \ 's\|error_report(".*warning[,:] \|warn_report("\|Ig' {} + Indentation fixed up manually afterwards. The test-qdev-global-props test case was manually updated to ensure that this patch passes make check (as the test cases are case sensitive). Signed-off-by: Alistair Francis <alistair.francis@xilinx.com> Suggested-by: Thomas Huth <thuth@redhat.com> Cc: Jeff Cody <jcody@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Max Reitz <mreitz@redhat.com> Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Lieven <pl@kamp.de> Cc: Josh Durgin <jdurgin@redhat.com> Cc: "Richard W.M. Jones" <rjones@redhat.com> Cc: Markus Armbruster <armbru@redhat.com> Cc: Peter Crosthwaite <crosthwaite.peter@gmail.com> Cc: Richard Henderson <rth@twiddle.net> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Greg Kurz <groug@kaod.org> Cc: Rob Herring <robh@kernel.org> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Peter Chubb <peter.chubb@nicta.com.au> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Marcel Apfelbaum <marcel@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Alexander Graf <agraf@suse.de> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Cornelia Huck <cohuck@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Greg Kurz <groug@kaod.org> Acked-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed by: Peter Chubb <peter.chubb@data61.csiro.au> Acked-by: Max Reitz <mreitz@redhat.com> Acked-by: Marcel Apfelbaum <marcel@redhat.com> Message-Id: <e1cfa2cd47087c248dd24caca9c33d9af0c499b0.1499866456.git.alistair.francis@xilinx.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2017-07-13 13:49:58 +02:00
Greg Kurz	06a37db7b1	9pfs: handle transport errors in pdu_complete() Contrary to what is written in the comment, a buggy guest can misconfigure the transport buffers and pdu_marshal() may return an error. If this ever happens, it is up to the transport layer to handle the situation (9P is transport agnostic). This fixes Coverity issue CID1348518. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>	2017-06-29 15:11:51 +02:00
Greg Kurz	8d37de41ca	virtio-9p: break device if buffers are misconfigured The 9P protocol is transport agnostic: if the guest misconfigured the buffers, the best we can do is to set the broken flag on the device. Signed-off-by: Greg Kurz <groug@kaod.org>	2017-06-29 15:11:51 +02:00
Tobias Schramm	b96feb2cb9	9pfs: local: Add support for custom fmode/dmode in 9ps mapped security modes In mapped security modes, files are created with very restrictive permissions (600 for files and 700 for directories). This makes file sharing between virtual machines and users on the host rather complicated. Imagine eg. a group of users that need to access data produced by processes on a virtual machine. Giving those users access to the data will be difficult since the group access mode is always 0. This patch makes the default mode for both files and directories configurable. Existing setups that don't know about the new parameters keep using the current secure behavior. Signed-off-by: Tobias Schramm <tobleminer@gmail.com> Signed-off-by: Greg Kurz <groug@kaod.org>	2017-06-29 15:11:50 +02:00
Greg Kurz	4fa62005d0	9pfs: check return value of v9fs_co_name_to_path() These v9fs_co_name_to_path() call sites have always been around. I guess no care was taken to check the return value because the name_to_path operation could never fail at the time. This is no longer true: the handle and synth backends can already fail this operation, and so will the local backend soon. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-05-25 10:30:14 +02:00
Greg Kurz	a17d8659c4	9pfs: drop pdu_push_and_notify() Only pdu_complete() needs to notify the client that a request has completed. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>	2017-05-25 10:30:13 +02:00
Greg Kurz	506f327582	virtio-9p/xen-9p: move 9p specific bits to core 9p code These bits aren't related to the transport so let's move them to the core code. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>	2017-05-25 10:30:13 +02:00
Juan Quintela	795c40b8bd	migration: Create migration/blocker.h This allows us to remove lots of includes of migration/migration.h Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-05-17 12:04:59 +02:00
Greg Kurz	6d54af0ea9	9pfs: clear migration blocker at session reset The migration blocker survives a device reset: if the guest mounts a 9p share and then gets rebooted with system_reset, it will be unmigratable until it remounts and umounts the 9p share again. This happens because the migration blocker is supposed to be cleared when we put the last reference on the root fid, but virtfs_reset() wrongly calls free_fid() instead of put_fid(). This patch fixes virtfs_reset() so that it honor the way fids are supposed to be manipulated: first get a reference and later put it back when you're done. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Li Qiang <liqiang6-s@360.cn>	2017-04-04 18:06:01 +02:00
Greg Kurz	18adde86dd	9pfs: fix multiple flush for same request If a client tries to flush the same outstanding request several times, only the first flush completes. Subsequent ones keep waiting for the request completion in v9fs_flush() and, therefore, leak a PDU. This will cause QEMU to hang when draining active PDUs the next time the device is reset. Let have each flush request wake up the next one if any. The last waiter frees the cancelled PDU. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-04-04 18:06:01 +02:00
Li Qiang	d63fb193e7	9pfs: fix file descriptor leak The v9fs_create() and v9fs_lcreate() functions are used to create a file on the backend and to associate it to a fid. The fid shouldn't be already in-use, otherwise both functions may silently leak a file descriptor or allocated memory. The current code doesn't check that. This patch ensures that the fid isn't already associated to anything before using it. Signed-off-by: Li Qiang <liqiang6-s@360.cn> (reworded the changelog, Greg Kurz) Signed-off-by: Greg Kurz <groug@kaod.org>	2017-03-27 21:13:19 +02:00
Greg Kurz	d5f2af7b95	9pfs: don't try to flush self and avoid QEMU hang on reset According to the 9P spec [], when a client wants to cancel a pending I/O request identified by a given tag (uint16), it must send a Tflush message and wait for the server to respond with a Rflush message before reusing this tag for another I/O. The server may still send a completion message for the I/O if it wasn't actually cancelled but the Rflush message must arrive after that. QEMU hence waits for the flushed PDU to complete before sending the Rflush message back to the client. If a client sends 'Tflush tag oldtag' and tag == oldtag, QEMU will then allocate a PDU identified by tag, find it in the PDU list and wait for this same PDU to complete... i.e. wait for a completion that will never happen. This causes a tag and ring slot leak in the guest, and a PDU leak in QEMU, all of them limited by the maximal number of PDUs (128). But, worse, this causes QEMU to hang on device reset since v9fs_reset() wants to drain all pending I/O. This insane behavior is likely to denote a bug in the client, and it would deserve an Rerror message to be sent back. Unfortunately, the protocol allows it and requires all flush requests to suceed (only a Tflush response is expected). The only option is to detect when we have to handle a self-referencing flush request and report success to the client right away. [] http://man.cat-v.org/plan_9/5/flush Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Greg Kurz <groug@kaod.org>	2017-03-21 09:12:47 +01:00
Pradeep Jagadeesh	b8bbdb886e	fsdev: add IO throttle support to fsdev devices This patchset adds the throttle support for the 9p-local driver. For now this functionality can be enabled only through qemu cli options. QMP interface and support to other drivers need further extensions. To make it simple for other 9p drivers, the throttle code has been put in separate files. Signed-off-by: Pradeep Jagadeesh <pradeep.jagadeesh@huawei.com> Reviewed-by: Alberto Garcia <berto@igalia.com> (pass extra NULL CoMutex * argument to qemu_co_queue_wait(), added options to qemu-options.hx, Greg Kurz) Signed-off-by: Greg Kurz <groug@kaod.org>	2017-02-28 10:31:46 +01:00
Paolo Bonzini	4bae2b397f	9pfs: fix v9fs_lock error case In this case, we are marshaling an error status instead of the errno value. Reorganize the out and out_nofid labels to look like all the other cases. Coverity reports this because the "err = -ENOENT" and "err = -EINVAL" assignments above are dead, overwritten by the call to pdu_marshal. (Coverity issues CID1348512 and CID1348513) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> (also open-coded the success path since locking is a nop for us, Greg Kurz) Signed-off-by: Greg Kurz <groug@kaod.org>	2017-02-28 10:31:46 +01:00
Paolo Bonzini	1ace7ceac5	coroutine-lock: add mutex argument to CoQueue APIs All that CoQueue needs in order to become thread-safe is help from an external mutex. Add this to the API. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20170213181244.16297-6-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-02-21 11:39:40 +00:00
Peter Maydell	c7f1cf01b8	This pull request fixes a 2.9 regression and a long standing bug that can cause 9p clients to hang. Other patches are minor enhancements. -----BEGIN PGP SIGNATURE----- iEYEABECAAYFAliIegsACgkQAvw66wEB28LjzwCeIKbBFC/hbc43UqaNX82OGd2v soYAn0YYXJUAykyjNEMLdhhNp+rABzNk =1PaE -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/gkurz/tags/for-upstream' into staging This pull request fixes a 2.9 regression and a long standing bug that can cause 9p clients to hang. Other patches are minor enhancements. # gpg: Signature made Wed 25 Jan 2017 10:12:27 GMT # gpg: using DSA key 0x02FC3AEB0101DBC2 # gpg: Good signature from "Greg Kurz <groug@kaod.org>" # gpg: aka "Greg Kurz <groug@free.fr>" # gpg: aka "Greg Kurz <gkurz@fr.ibm.com>" # gpg: aka "Greg Kurz <gkurz@linux.vnet.ibm.com>" # gpg: aka "Gregory Kurz (Groug) <groug@free.fr>" # gpg: aka "Gregory Kurz (Cimai Technology) <gkurz@cimai.com>" # gpg: aka "Gregory Kurz (Meiosys Technology) <gkurz@meiosys.com>" # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 2BD4 3B44 535E C0A7 9894 DBA2 02FC 3AEB 0101 DBC2 * remotes/gkurz/tags/for-upstream: 9pfs: fix offset error in v9fs_xattr_read() 9pfs: local: trivial cosmetic fix in pwritev op 9pfs: fix off-by-one error in PDU free list tests: virtio-9p: improve error reporting 9pfs: add missing coroutine_fn annotations Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-01-25 17:54:14 +00:00
Greg Kurz	fa0eb5c512	9pfs: fix offset error in v9fs_xattr_read() The current code tries to copy `read_count' bytes starting at offset `offset' from a `read_count`-sized iovec. This causes v9fs_pack() to fail with ENOBUFS. Since the PDU iovec is already partially filled with `offset' bytes, let's skip them when creating `qiov_full' and have v9fs_pack() to copy the whole of it. Moreover, this is consistent with the other places where v9fs_init_qiov_from_pdu() is called. This fixes commit "bcb8998fac16 9pfs: call v9fs_init_qiov_from_pdu before v9fs_pack". Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>	2017-01-25 09:34:35 +01:00
Greg Kurz	0d78289c3d	9pfs: fix off-by-one error in PDU free list The server can handle MAX_REQ - 1 PDUs at a time and the virtio-9p device has a MAX_REQ sized virtqueue. If the client manages to fill up the virtqueue, pdu_alloc() will fail and the request won't be processed without any notice to the client (it actually causes the linux 9p client to hang). This has been there since the beginning (commit `9f10751365` "virtio-9p: Add a virtio 9p device to qemu"), but it needs an agressive workload to run in the guest to show up. We actually allocate MAX_REQ PDUs and I see no reason not to link them all into the free list, so let's fix the init loop. Reported-by: Tuomas Tynkkynen <tuomas@tuxera.com> Suggested-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Greg Kurz <groug@kaod.org>	2017-01-25 09:34:35 +01:00
Greg Kurz	a1bf8b7414	9pfs: add missing coroutine_fn annotations Signed-off-by: Greg Kurz <groug@kaod.org>	2017-01-25 09:34:35 +01:00
Ashijeet Acharya	fe44dc9180	migration: disallow migrate_add_blocker during migration If a migration is already in progress and somebody attempts to add a migration blocker, this should rightly fail. Add an errp parameter and a retcode return value to migrate_add_blocker. Signed-off-by: John Snow <jsnow@redhat.com> Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com> Message-Id: <1484566314-3987-5-git-send-email-ashijeetacharya@gmail.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Acked-by: Greg Kurz <groug@kaod.org> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Merged with recent 'Allow invtsc migration' change	2017-01-24 18:00:30 +00:00
Greg Kurz	f2b58c4375	9pfs: fix crash when fsdev is missing If the user passes -device virtio-9p without the corresponding -fsdev, QEMU dereferences a NULL pointer and crashes. This is a 2.8 regression introduced by commit `702dbcc274`. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Li Qiang <liq3ea@gmail.com>	2017-01-03 17:28:44 +01:00
Stefano Stabellini	88da0b0301	9pfs: introduce init_out/in_iov_from_pdu Not all 9pfs transports share memory between request and response. For those who don't, it is necessary to know how much memory is required in the response. Split the existing init_iov_from_pdu function in two: init_out_iov_from_pdu (for writes) and init_in_iov_from_pdu (for reads). init_in_iov_from_pdu takes an additional size parameter to specify the memory required for the response message. Signed-off-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Greg Kurz <groug@kaod.org>	2017-01-03 17:28:44 +01:00
Stefano Stabellini	bcb8998fac	9pfs: call v9fs_init_qiov_from_pdu before v9fs_pack v9fs_xattr_read should not access VirtQueueElement elems directly. Move v9fs_init_qiov_from_pdu up in the file and call v9fs_init_qiov_from_pdu before v9fs_pack. Use v9fs_pack on the new iovec. Signed-off-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Greg Kurz <groug@kaod.org>	2017-01-03 17:28:44 +01:00
Stefano Stabellini	ea83441cc4	9pfs: introduce transport specific callbacks Don't call virtio functions from 9pfs generic code, use generic function callbacks instead. Signed-off-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Greg Kurz <groug@kaod.org>	2017-01-03 17:28:44 +01:00
Stefano Stabellini	583f21f8b9	9pfs: move pdus to V9fsState pdus are initialized and used in 9pfs common code. Move the array from V9fsVirtioState to V9fsState. Signed-off-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Greg Kurz <groug@kaod.org>	2017-01-03 17:28:44 +01:00
Li Qiang	702dbcc274	9pfs: add cleanup operation in FileOperations Currently, the backend of VirtFS doesn't have a cleanup function. This will lead resource leak issues if the backed driver allocates resources. This patch addresses this issue. Signed-off-by: Li Qiang <liq3ea@gmail.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Greg Kurz <groug@kaod.org>	2016-11-23 13:53:34 +01:00
Li Qiang	4774718e5c	9pfs: adjust the order of resource cleanup in device unrealize Unrealize should undo things that were set during realize in reverse order. So should do in the error path in realize. Signed-off-by: Li Qiang <liq3ea@gmail.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Greg Kurz <groug@kaod.org>	2016-11-23 13:53:34 +01:00
Greg Kurz	79decce35b	9pfs: drop excessive error message from virtfs_reset() The virtfs_reset() function is called either when the virtio-9p device gets reset, or when the client starts a new 9P session. In both cases, if it finds fids from a previous session, the following is printed in the monitor: 9pfs:virtfs_reset: One or more uncluncked fids found during reset For example, if a linux guest with a mounted 9P share is reset from the monitor with system_reset, the message will be printed. This is excessive since these fids are now clunked and the state is clean. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-11-01 12:03:03 +01:00
Greg Kurz	49dd946bb5	9pfs: don't BUG_ON() if fid is already opened A buggy or malicious guest could pass the id of an already opened fid and cause QEMU to abort. Let's return EINVAL to the guest instead. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-11-01 12:03:02 +01:00
Greg Kurz	dd654e0365	9pfs: xattrcreate requires non-opened fids The xattrcreate operation only makes sense on a freshly cloned fid actually, since any open state would be leaked because of the fid_type change. This is indeed what the linux kernel client does: fid = clone_fid(fid); [...] retval = p9_client_xattrcreate(fid, name, value_len, flags); This patch also reverts commit `ff55e94d23` since we are sure that a fid with type P9_FID_NONE doesn't have a previously allocated xattr. Signed-off-by: Greg Kurz <groug@kaod.org>	2016-11-01 12:03:02 +01:00
Greg Kurz	3b79ef2cf4	9pfs: limit xattr size in xattrcreate We shouldn't allow guests to create extended attribute with arbitrary sizes. On linux hosts, the limit is XATTR_SIZE_MAX. Let's use it. Signed-off-by: Greg Kurz <groug@kaod.org>	2016-11-01 12:03:02 +01:00
Li Qiang	7e55d65c56	9pfs: fix integer overflow issue in xattr read/write The v9fs_xattr_read() and v9fs_xattr_write() are passed a guest originated offset: they must ensure this offset does not go beyond the size of the extended attribute that was set in v9fs_xattrcreate(). Unfortunately, the current code implement these checks with unsafe calculations on 32 and 64 bit values, which may allow a malicious guest to cause OOB access anyway. Fix this by comparing the offset and the xattr size, which are both uint64_t, before trying to compute the effective number of bytes to read or write. Suggested-by: Greg Kurz <groug@kaod.org> Signed-off-by: Li Qiang <liqiang6-s@360.cn> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-By: Guido Günther <agx@sigxcpu.org> Signed-off-by: Greg Kurz <groug@kaod.org>	2016-11-01 12:03:01 +01:00
Li Qiang	dd28fbbc2e	9pfs: add xattrwalk_fid field in V9fsXattr struct Currently, 9pfs sets the 'copied_len' field in V9fsXattr to -1 to tag xattr walk fid. As the 'copied_len' is also used to account for copied bytes, this may make confusion. This patch add a bool 'xattrwalk_fid' to tag the xattr walk fid. Suggested-by: Greg Kurz <groug@kaod.org> Signed-off-by: Li Qiang <liqiang6-s@360.cn> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Greg Kurz <groug@kaod.org>	2016-11-01 12:00:40 +01:00
Li Qiang	fdfcc9aeea	9pfs: fix memory leak in v9fs_write If an error occurs when marshalling the transfer length to the guest, the v9fs_write() function doesn't free an IO vector, thus leading to a memory leak. This patch fixes the issue. Signed-off-by: Li Qiang <liqiang6-s@360.cn> Reviewed-by: Greg Kurz <groug@kaod.org> [groug, rephrased the changelog] Signed-off-by: Greg Kurz <groug@kaod.org>	2016-10-17 14:13:58 +02:00
Li Qiang	4c1586787f	9pfs: fix memory leak in v9fs_link The v9fs_link() function keeps a reference on the source fid object. This causes a memory leak since the reference never goes down to 0. This patch fixes the issue. Signed-off-by: Li Qiang <liqiang6-s@360.cn> Reviewed-by: Greg Kurz <groug@kaod.org> [groug, rephrased the changelog] Signed-off-by: Greg Kurz <groug@kaod.org>	2016-10-17 14:13:58 +02:00
Li Qiang	ff55e94d23	9pfs: fix memory leak in v9fs_xattrcreate The 'fs.xattr.value' field in V9fsFidState object doesn't consider the situation that this field has been allocated previously. Every time, it will be allocated directly. This leads to a host memory leak issue if the client sends another Txattrcreate message with the same fid number before the fid from the previous time got clunked. Signed-off-by: Li Qiang <liqiang6-s@360.cn> Reviewed-by: Greg Kurz <groug@kaod.org> [groug, updated the changelog to indicate how the leak can occur] Signed-off-by: Greg Kurz <groug@kaod.org>	2016-10-17 14:13:58 +02:00
Li Qiang	eb68760285	9pfs: fix information leak in xattr read 9pfs uses g_malloc() to allocate the xattr memory space, if the guest reads this memory before writing to it, this will leak host heap memory to the guest. This patch avoid this. Signed-off-by: Li Qiang <liqiang6-s@360.cn> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Greg Kurz <groug@kaod.org>	2016-10-17 14:13:58 +02:00
Greg Kurz	0e44a0fd3f	virtio-9p: add reset handler Virtio devices should implement the VirtIODevice->reset() function to perform necessary cleanup actions and to bring the device to a quiescent state. In the case of the virtio-9p device, this means: - emptying the list of active PDUs (i.e. draining all in-flight I/O) - freeing all fids (i.e. close open file descriptors and free memory) That's what this patch does. The reset handler first waits for all active PDUs to complete. Since completion happens in the QEMU global aio context, we just have to loop around aio_poll() until the active list is empty. The freeing part involves some actions to be performed on the backend, like closing file descriptors or flushing extended attributes to the underlying filesystem. The virtfs_reset() function already does the job: it calls free_fid() for all open fids not involved in an ongoing I/O operation. We are sure this is the case since we have drained the PDU active list. The current code implements all backend accesses with coroutines, but we want to stay synchronous on the reset path. We can either change the current code to be able to run when not in coroutine context, or create a coroutine context and wait for virtfs_reset() to complete. This patch goes for the latter because it results in simpler code. Note that we also need to create a dummy PDU because it is also an API to pass the FsContext pointer to all backend callbacks. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-10-17 14:13:58 +02:00
Greg Kurz	f74e27bf0f	9pfs: only free completed request if not flushed If a PDU has a flush request pending, the current code calls pdu_free() twice: 1) pdu_complete()->pdu_free() with pdu->cancelled set, which does nothing 2) v9fs_flush()->pdu_free() with pdu->cancelled cleared, which moves the PDU back to the free list. This works but it complexifies the logic of pdu_free(). With this patch, pdu_complete() only calls pdu_free() if no flush request is pending, i.e. qemu_co_queue_next() returns false. Since pdu_free() is now supposed to be called with pdu->cancelled cleared, the check in pdu_free() is dropped and replaced by an assertion. Signed-off-by: Greg Kurz <groug@kaod.org>	2016-10-17 14:13:58 +02:00
Greg Kurz	6868a420c5	9pfs: drop useless check in pdu_free() Out of the three users of pdu_free(), none ever passes a NULL pointer to this function. Signed-off-by: Greg Kurz <groug@kaod.org>	2016-10-17 14:13:58 +02:00
Greg Kurz	8440e22ec1	9pfs: use coroutine_fn annotation in hw/9pfs/9p.[ch] All these functions either call the v9fs_co_* functions which have the coroutine_fn annotation, or pdu_complete() which calls qemu_co_queue_next(). Let's mark them to make it obvious they execute in coroutine context. Signed-off-by: Greg Kurz <groug@kaod.org>	2016-10-17 14:13:58 +02:00
Li Qiang	e95c9a493a	9pfs: fix potential host memory leak in v9fs_read In 9pfs read dispatch function, it doesn't free two QEMUIOVector object thus causing potential memory leak. This patch avoid this. Signed-off-by: Li Qiang <liqiang6-s@360.cn> Signed-off-by: Greg Kurz <groug@kaod.org>	2016-10-17 14:13:58 +02:00
Li Qiang	ba42ebb863	9pfs: allocate space for guest originated empty strings If a guest sends an empty string paramater to any 9P operation, the current code unmarshals it into a V9fsString equal to { .size = 0, .data = NULL }. This is unfortunate because it can cause NULL pointer dereference to happen at various locations in the 9pfs code. And we don't want to check str->data everywhere we pass it to strcmp() or any other function which expects a dereferenceable pointer. This patch enforces the allocation of genuine C empty strings instead, so callers don't have to bother. Out of all v9fs_iov_vunmarshal() users, only v9fs_xattrwalk() checks if the returned string is empty. It now uses v9fs_string_size() since name.data cannot be NULL anymore. Signed-off-by: Li Qiang <liqiang6-s@360.cn> [groug, rewritten title and changelog, fix empty string check in v9fs_xattrwalk()] Signed-off-by: Greg Kurz <groug@kaod.org>	2016-10-17 14:13:58 +02:00
Greg Kurz	13fd08e631	9pfs: fix potential segfault during walk If the call to fid_to_qid() returns an error, we will call v9fs_path_free() on uninitialized paths. It is a regression introduced by the following commit: `56f101ecce` 9pfs: handle walk of ".." in the root directory Let's fix this by initializing dpath and path before calling fid_to_qid(). Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> [groug: updated the changelog to indicate this is regression and to provide the offending commit SHA1] Signed-off-by: Greg Kurz <groug@kaod.org>	2016-09-19 11:39:48 +02:00
Greg Kurz	e3e83f2e21	9pfs: introduce v9fs_path_sprintf() helper This helper is similar to v9fs_string_sprintf(), but it includes the terminating NUL character in the size field. This is to avoid doing v9fs_string_sprintf((V9fsString *) &path) and then bumping the size. Affected users are changed to use this new helper. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org>	2016-09-16 08:56:15 +02:00
Greg Kurz	abdf008640	9pfs: drop useless v9fs_string_null() function The v9fs_string_null() function just calls v9fs_string_free(). Also it only has 4 users, whereas v9fs_string_free() has 87. This patch converts users to call directly v9fs_string_free() and drops the useless function. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org>	2016-09-16 08:56:15 +02:00
Greg Kurz	56f101ecce	9pfs: handle walk of ".." in the root directory The 9P spec at http://man.cat-v.org/plan_9/5/intro says: All directories must support walks to the directory .. (dot-dot) meaning parent directory, although by convention directories contain no explicit entry for .. or . (dot). The parent of the root directory of a server's tree is itself. This means that a client cannot walk further than the root directory exported by the server. In other words, if the client wants to walk "/.." or "/foo/../..", the server should answer like the request was to walk "/". This patch just does that: - we cache the QID of the root directory at attach time - during the walk we compare the QID of each path component with the root QID to detect if we're in a "/.." situation - if so, we skip the current component and go to the next one Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-08-30 19:23:00 +01:00
Greg Kurz	805b5d98c6	9pfs: forbid . and .. in file names According to the 9P spec http://man.cat-v.org/plan_9/5/open about the create request: The names . and .. are special; it is illegal to create files with these names. This patch causes the create and lcreate requests to fail with EINVAL if the file name is either "." or "..". Even if it isn't explicitly written in the spec, this patch extends the checking to all requests that may cause a directory entry to be created: - mknod - rename - renameat - mkdir - link - symlink The unlinkat request also gets patched for consistency (even if rmdir("foo/..") is expected to fail according to POSIX.1-2001). The various error values come from the linux manual pages. Suggested-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-08-30 19:21:56 +01:00
Greg Kurz	fff39a7ad0	9pfs: forbid illegal path names Empty path components don't make sense for most commands and may cause undefined behavior, depending on the backend. Also, the walk request described in the 9P spec [1] clearly shows that the client is supposed to send individual path components: the official linux client never sends portions of path containing the / character for example. Moreover, the 9P spec [2] also states that a system can decide to restrict the set of supported characters used in path components, with an explicit mention "to remove slashes from name components". This patch introduces a new name_is_illegal() helper that checks the names sent by the client are not empty and don't contain unwanted chars. Since 9pfs is only supported on linux hosts, only the / character is checked at the moment. When support for other hosts (AKA. win32) is added, other chars may need to be blacklisted as well. If a client sends an illegal path component, the request will fail and ENOENT is returned to the client. [1] http://man.cat-v.org/plan_9/5/walk [2] http://man.cat-v.org/plan_9/5/intro Suggested-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-08-30 19:21:39 +01:00
Paolo Bonzini	0b8b8753e4	coroutine: move entry argument to qemu_coroutine_create In practice the entry argument is always known at creation time, and it is confusing that sometimes qemu_coroutine_enter is used with a non-NULL argument to re-enter a coroutine (this happens in block/sheepdog.c and tests/test-coroutine.c). So pass the opaque value at creation time, for consistency with e.g. aio_bh_new. Mostly done with the following semantic patch: @ entry1 @ expression entry, arg, co; @@ - co = qemu_coroutine_create(entry); + co = qemu_coroutine_create(entry, arg); ... - qemu_coroutine_enter(co, arg); + qemu_coroutine_enter(co); @ entry2 @ expression entry, arg; identifier co; @@ - Coroutine co = qemu_coroutine_create(entry); + Coroutine co = qemu_coroutine_create(entry, arg); ... - qemu_coroutine_enter(co, arg); + qemu_coroutine_enter(co); @ entry3 @ expression entry, arg; @@ - qemu_coroutine_enter(qemu_coroutine_create(entry), arg); + qemu_coroutine_enter(qemu_coroutine_create(entry, arg)); @ reentry @ expression co; @@ - qemu_coroutine_enter(co, NULL); + qemu_coroutine_enter(co); except for the aforementioned few places where the semantic patch stumbled (as expected) and for test_co_queue, which would otherwise produce an uninitialized variable warning. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-13 13:26:02 +02:00
Greg Kurz	635324e83e	9p: switch back to readdir() This patch changes the 9p code to use readdir() again instead of readdir_r(), which is deprecated in glibc 2.24. All the locking was put in place by a previous patch. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>	2016-06-06 11:52:34 +02:00
Greg Kurz	7cde47d4a8	9p: add locking to V9fsDir If several threads concurrently call readdir() with the same directory stream pointer, it is possible that they all get a pointer to the same dirent structure, whose content is overwritten each time readdir() is called. We must thus serialize accesses to the dirent structure. This may be achieved with a mutex like below: lock_mutex(); readdir(); // work with the dirent unlock_mutex(); This patch adds all the locking, to prepare the switch to readdir(). Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>	2016-06-06 11:52:34 +02:00
Greg Kurz	f314ea4e30	9p: introduce the V9fsDir type If we are to switch back to readdir(), we need a more complex type than DIR * to be able to serialize concurrent accesses to the directory stream. This patch introduces a placeholder type and fixes all users. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>	2016-06-06 11:52:34 +02:00
Greg Kurz	8762a46d36	9p: drop useless out: label Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>	2016-06-06 11:52:34 +02:00
Greg Kurz	beff62e683	9p: drop useless inclusion of hw/i386/pc.h Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>	2016-06-06 11:52:34 +02:00
Markus Armbruster	da34e65cb4	include/qemu/osdep.h: Don't include qapi/error.h Commit `57cb38b` included qapi/error.h into qemu/osdep.h to get the Error typedef. Since then, we've moved to include qemu/osdep.h everywhere. Its file comment explains: "To avoid getting into possible circular include dependencies, this file should not include any other QEMU headers, with the exceptions of config-host.h, compiler.h, os-posix.h and os-win32.h, all of which are doing a similar job to this file and are under similar constraints." qapi/error.h doesn't do a similar job, and it doesn't adhere to similar constraints: it includes qapi-types.h. That's in excess of 100KiB of crap most .c files don't actually need. Add the typedef to qemu/typedefs.h, and include that instead of qapi/error.h. Include qapi/error.h in .c files that need it and don't get it now. Include qapi-types.h in qom/object.h for uint16List. Update scripts/clean-includes accordingly. Update it further to match reality: replace config.h by config-target.h, add sysemu/os-posix.h, sysemu/os-win32.h. Update the list of includes in the qemu/osdep.h comment quoted above similarly. This reduces the number of objects depending on qapi/error.h from "all of them" to less than a third. Unfortunately, the number depending on qapi-types.h shrinks only a little. More work is needed for that one. Signed-off-by: Markus Armbruster <armbru@redhat.com> [Fix compilation without the spice devel packages. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-03-22 22:20:15 +01:00
Paolo Bonzini	51b19ebe43	virtio: move allocation to virtqueue_pop/vring_pop The return code of virtqueue_pop/vring_pop is unused except to check for errors or 0. We can thus easily move allocation inside the functions and just return a pointer to the VirtQueueElement. The advantage is that we will be able to allocate only the space that is needed for the actual size of the s/g list instead of the full VIRTQUEUE_MAX_SIZE items. Currently VirtQueueElement takes about 48K of memory, and this kind of allocation puts a lot of stress on malloc. By cutting the size by two or three orders of magnitude, malloc can use much more efficient algorithms. The patch is pretty large, but changes to each device are testable more or less independently. Splitting it would mostly add churn. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>	2016-02-06 20:39:07 +02:00
Peter Maydell	fbc0412709	9pfs: Clean up includes Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 1453832250-766-18-git-send-email-peter.maydell@linaro.org	2016-01-29 15:07:23 +00:00
Greg Kurz	63325b181f	9pfs: use error_report() instead of fprintf(stderr) Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>	2016-01-22 15:12:17 +01:00
Wei Liu	00588a0aa2	9pfs: introduce V9fsVirtioState V9fsState now only contains generic fields. Introduce V9fsVirtioState for virtio transport. Change virtio-pci and virtio-ccw to use V9fsVirtioState. Signed-off-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2016-01-12 11:04:14 +05:30
Wei Liu	2a0c56aa4c	9pfs: factor out v9fs_device_{,un}realize_common Signed-off-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2016-01-08 15:33:24 +05:30
Wei Liu	60ce86c714	9pfs: rename virtio-9p.c to 9p.c Now that file only contains generic code. Signed-off-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2016-01-08 15:32:13 +05:30

1 2 3

147 Commits