mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Simran Singhal	b3ac2b94cd	Compress lines for immediate return Compress two lines into a single line if immediate return statement is found. It also remove variables progress, val, data, ret and sock as they are no longer needed. Remove space between function "mixer_load" and '(' to fix the checkpatch.pl error:- ERROR: space prohibited between function name and open parenthesis '(' Done using following coccinelle script: @@ local idexpression ret; expression e; @@ -ret = +return e; -return ret; Signed-off-by: Simran Singhal <singhalsimran0@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200401165314.GA3213@simran-Inspiron-5558> [lv: in handle_aiocb_write_zeroes_unmap() move "int ret" inside the #ifdef] Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2020-05-04 14:43:22 +02:00
Markus Armbruster	2500f6f30b	qemu-option: Clean up after the previous commit Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20200415083048.14339-6-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2020-04-30 06:51:15 +02:00
Markus Armbruster	7b1cd1c65a	qobject: Eliminate qdict_iter(), use qdict_first(), qdict_next() qdict_iter() has just three uses and no test coverage. Replace by qdict_first(), qdict_next() for more concise code and less type punning. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20200415083048.14339-5-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2020-04-30 06:51:15 +02:00
Markus Armbruster	80c710cb06	qemu-img: Move is_valid_option_list() to qemu-img.c and rewrite is_valid_option_list()'s purpose is ensuring qemu-img.c's can safely join multiple parameter strings separated by ',' like this: g_strdup_printf("%s,%s", params1, params2); How it does that is anything but obvious. A close reading of the code reveals that it fails exactly when its argument starts with ',' or ends with an odd number of ','. Makes sense, actually, because when the argument starts with ',', a separating ',' preceding it would get escaped, and when it ends with an odd number of ',', a separating ',' following it would get escaped. Move it to qemu-img.c and rewrite it the obvious way. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200415074927.19897-9-armbru@redhat.com>	2020-04-29 08:01:52 +02:00
Markus Armbruster	56a9efa199	qemu-option: Avoid has_help_option() in qemu_opts_parse_noisily() When opts_parse() sets @invalidp to true, qemu_opts_parse_noisily() uses has_help_option() to decide whether to print help. This parses the input string a second time. Easy to avoid: replace @invalidp by @help_wanted. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200415074927.19897-7-armbru@redhat.com>	2020-04-29 08:01:51 +02:00
Markus Armbruster	80a9485573	qemu-option: Fix has_help_option()'s sloppy parsing has_help_option() uses its own parser. It's inconsistent with qemu_opts_parse(), as demonstrated by test-qemu-opts case /qemu-opts/has_help_option. Fix by reusing the common parser. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20200415074927.19897-5-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2020-04-29 08:01:51 +02:00
Markus Armbruster	933d152778	qemu-option: Fix sloppy recognition of "id=..." after ",," Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200415074927.19897-4-armbru@redhat.com>	2020-04-29 08:01:51 +02:00
Markus Armbruster	6129803b55	qemu-options: Factor out get_opt_name_value() helper The next commits will put it to use. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200415074927.19897-3-armbru@redhat.com>	2020-04-29 08:01:51 +02:00
Peter Maydell	e33d61cc9a	Bugfixes, and reworking of the atomics documentation. -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl6UDRYUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroOSGggAgy/pnlhh5NjGc0PZLhz09O1MlOiT iS8/RCudLR/yDJ0K7pweWKc1lGrS11G1n1P+58G6sK7al4NOdlMMgtk1VtZAlMJ4 dSQ+DGV7JaoPztu5ec2V7LiJmhyxrVaKx7xg9JGx0bZ/1wCC1GqZUlZ2hYdgQ8L4 EchdwqzRd2sznlUVAP19ZcPb6sYG2VlkIzFytd5p3xZqrr0g3RJa7nmWRWAnEx1L 5/13U6g2PEU3jFKTtOcELFq8F/tB8id+fwIE2GB3glKzBHXnJSAfpzBV3/8L72xV JqSUa62O12qGX5k5F9BJPgcfxs40wyEkWTBJuW+WQvsmI73EJ3B30gjkhw== =8Dsv -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging Bugfixes, and reworking of the atomics documentation. # gpg: Signature made Mon 13 Apr 2020 07:56:22 BST # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: module: increase dirs array size by one memory: Do not allow direct write access to rom_device regions vl.c: error out if -mem-path is used together with -M memory-backend rcu: do not mention atomic_mb_read/set in documentation atomics: update documentation atomics: convert to reStructuredText oslib-posix: take lock before qemu_cond_broadcast piix: fix xenfv regression, add compat machine xenfv-4.2 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-04-13 13:11:38 +01:00
Bruce Rogers	267514b33f	module: increase dirs array size by one With the module upgrades code change, the statically sized dirs array can now overflow. Increase it's size by one, according to the new maximum possible usage. Fixes: `bd83c861c0` ("modules: load modules from versioned /var/run dir") Signed-off-by: Bruce Rogers <brogers@suse.com> Message-Id: <20200411010746.472295-1-brogers@suse.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-13 02:56:18 -04:00
Bauerchen	278fb16273	oslib-posix: take lock before qemu_cond_broadcast In touch_all_pages, if the mutex is not taken around qemu_cond_broadcast, qemu_cond_broadcast may be called before all touch page threads enter qemu_cond_wait. In this case, the touch page threads wait forever for the main thread to wake them up, causing a deadlock. Signed-off-by: Bauerchen <bauerchen@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-11 08:49:20 -04:00
Paolo Bonzini	5710a3e09f	async: use explicit memory barriers When using C11 atomics, non-seqcst reads and writes do not participate in the total order of seqcst operations. In util/async.c and util/aio-posix.c, in particular, the pattern that we use write ctx->notify_me write bh->scheduled read bh->scheduled read ctx->notify_me if !bh->scheduled, sleep if ctx->notify_me, notify needs to use seqcst operations for both the write and the read. In general this is something that we do not want, because there can be many sources that are polled in addition to bottom halves. The alternative is to place a seqcst memory barrier between the write and the read. This also comes with a disadvantage, in that the memory barrier is implicit on strongly-ordered architectures and it wastes a few dozen clock cycles. Fortunately, ctx->notify_me is never written concurrently by two threads, so we can assert that and relax the writes to ctx->notify_me. The resulting solution works and performs well on both aarch64 and x86. Note that the atomic_set/atomic_read combination is not an atomic read-modify-write, and therefore it is even weaker than C11 ATOMIC_RELAXED; on x86, ATOMIC_RELAXED compiles to a locked operation. Analyzed-by: Ying Fang <fangying1@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Tested-by: Ying Fang <fangying1@huawei.com> Message-Id: <20200407140746.8041-6-pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-04-09 16:17:14 +01:00
Stefan Hajnoczi	636b836d5f	aio-posix: signal-proof fdmon-io_uring The io_uring_enter(2) syscall returns with errno=EINTR when interrupted by a signal. Retry the syscall in this case. It's essential to do this in the io_uring_submit_and_wait() case. My interpretation of the Linux v5.5 io_uring_enter(2) code is that it shouldn't affect the io_uring_submit() case, but there is no guarantee this will always be the case. Let's check for -EINTR around both APIs. Note that the liburing APIs have -errno return values. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20200408091139.273851-1-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-04-09 15:55:06 +01:00
Alex Bennée	01ef6b9e4e	linux-user: factor out reading of /proc/self/maps Unfortunately reading /proc/self/maps is still considered the gold standard for a process finding out about it's own memory layout. As we will want this data in other contexts soon factor out the code to read and parse the data. Rather than just blindly copying the existing sscanf based code we use a more modern glib version of the parsing code to make a more general purpose map structure. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20200403191150.863-9-alex.bennee@linaro.org>	2020-04-07 16:19:49 +01:00
Stefan Hajnoczi	ae60ab7eb2	aio-posix: fix test-aio /aio/event/wait with fdmon-io_uring When a file descriptor becomes ready we must re-arm POLL_ADD. This is done by adding an sqe to the io_uring sq ring. The ->need_wait() function wasn't taking pending sqes into account and therefore io_uring_submit_and_wait() was not being called. Polling for cqes failed to detect fd readiness since we hadn't submitted the sqe to io_uring. This patch fixes the following tests/test-aio -p /aio/event/wait failure: ok 11 /aio/event/wait ** ERROR:tests/test-aio.c:374:test_flush_event_notifier: assertion failed: (aio_poll(ctx, false)) Reported-by: Cole Robinson <crobinso@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Tested-by: Cole Robinson <crobinso@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200402145434.99349-1-stefanha@redhat.com Fixes: `73fd282e7b` ("aio-posix: add io_uring fd monitoring implementation") Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-04-03 12:42:40 +01:00
Robert Hoo	8f13a39dc0	util/bufferiszero: improve avx2 accelerator By increasing avx2 length_to_accel to 128, we can simplify its logic and reduce a branch. The authorship of this patch actually belongs to Richard Henderson <richard.henderson@linaro.org>, I just fixed a boundary case on his original patch. Suggested-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Robert Hoo <robert.hu@linux.intel.com> Message-Id: <1585119021-46593-2-git-send-email-robert.hu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-01 14:24:03 -04:00
Robert Hoo	b87c99d073	util/bufferiszero: assign length_to_accel value for each accelerator case Because in unit test, init_accel() will be called several times, each with different accelerator type. Signed-off-by: Robert Hoo <robert.hu@linux.intel.com> Message-Id: <1585119021-46593-1-git-send-email-robert.hu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-01 14:24:03 -04:00
Stefan Hajnoczi	ff807d5592	aio-posix: fix io_uring with external events When external event sources are disabled fdmon-io_uring falls back to fdmon-poll. The ->need_wait() callback needs to watch for this so it can return true when external event sources are disabled. It is also necessary to call ->wait() when AioHandlers have changed because io_uring is asynchronous and we must submit new sqes. Both of these changes to ->need_wait() together fix tests/test-aio -p /aio/external-client, which failed with: test-aio: tests/test-aio.c:404: test_aio_external_client: Assertion `aio_poll(ctx, false)' failed. Reported-by: Julia Suvorova <jusual@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20200319163559.117903-1-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-03-23 11:05:44 +00:00
Vladimir Sementsov-Ogievskiy	299ea9ff01	block/dirty-bitmap: improve _next_dirty_area API Firstly, _next_dirty_area is for scenarios when we may contiguously search for next dirty area inside some limited region, so it is more comfortable to specify "end" which should not be recalculated on each iteration. Secondly, let's add a possibility to limit resulting area size, not limiting searching area. This will be used in NBD code in further commit. (Note that now bdrv_dirty_bitmap_next_dirty_area is unused) Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20200205112041.6003-8-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2020-03-18 14:03:46 -04:00
Vladimir Sementsov-Ogievskiy	9399c54b75	block/dirty-bitmap: add _next_dirty API We have bdrv_dirty_bitmap_next_zero, let's add corresponding bdrv_dirty_bitmap_next_dirty, which is more comfortable to use than bitmap iterators in some cases. For test modify test_hbitmap_next_zero_check_range to check both next_zero and next_dirty and add some new checks. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20200205112041.6003-7-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2020-03-18 14:03:46 -04:00
Vladimir Sementsov-Ogievskiy	642700fda0	block/dirty-bitmap: switch _next_dirty_area and _next_zero to int64_t We are going to introduce bdrv_dirty_bitmap_next_dirty so that same variable may be used to store its return value and to be its parameter, so it would int64_t. Similarly, we are going to refactor hbitmap_next_dirty_area to use hbitmap_next_dirty together with hbitmap_next_zero, therefore we want hbitmap_next_zero parameter type to be int64_t too. So, for convenience update all parameters of _next_zero and _next_dirty_area to be int64_t. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20200205112041.6003-6-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2020-03-18 14:03:46 -04:00
Vladimir Sementsov-Ogievskiy	0c88f1970c	hbitmap: drop meta bitmaps as they are unused Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20200205112041.6003-5-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2020-03-18 14:03:46 -04:00
Vladimir Sementsov-Ogievskiy	30b8346cc3	hbitmap: unpublish hbitmap_iter_skip_words Function is internal and even commented as internal. Drop its definition from .h file. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20200205112041.6003-4-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2020-03-18 14:03:46 -04:00
Vladimir Sementsov-Ogievskiy	be24c7140c	hbitmap: move hbitmap_iter_next_word to hbitmap.c The function is definitely internal (it's not used by third party and it has complicated interface). Move it to .c file. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20200205112041.6003-3-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2020-03-18 14:03:46 -04:00
Vladimir Sementsov-Ogievskiy	6a150995d4	hbitmap: assert that we don't create bitmap larger than INT64_MAX We have APIs which returns signed int64_t, to be able to return error. Therefore we can't handle bitmaps with absolute size larger than (INT64_MAX+1). Still, keep maximum to be INT64_MAX which is a bit safer. Note, that bitmaps are used to represent disk images, which can't exceed INT64_MAX anyway. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20200205112041.6003-2-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2020-03-18 14:03:46 -04:00
Stefan Hajnoczi	3284c3ddc4	lockable: add lock guards This patch introduces two lock guard macros that automatically unlock a lock object (QemuMutex and others): void f(void) { QEMU_LOCK_GUARD(&mutex); if (!may_fail()) { return; /* automatically unlocks mutex / } ... } and: WITH_QEMU_LOCK_GUARD(&mutex) { if (!may_fail()) { return; / automatically unlocks mutex / } } / automatically unlocks mutex here */ ... Convert qemu-timer.c functions that benefit from these macros as an example. Manual qemu_mutex_lock/unlock() callers are left unmodified in cases where clarity would not improve by switching to the macros. Many other QemuMutex users remain in the codebase that might benefit from lock guards. Over time they can be converted, if that is desirable. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> [Use QEMU_MAKE_LOCKABLE_NONNULL. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-17 15:18:45 +01:00
Christian Ehrhardt	bd83c861c0	modules: load modules from versioned /var/run dir On upgrades the old .so files usually are replaced. But on the other hand since a qemu process represents a guest instance it is usually kept around. That makes late addition of dynamic features e.g. 'hot-attach of a ceph disk' fail by trying to load a new version of e.f. block-rbd.so into an old still running qemu binary. This adds a fallback to also load modules from a versioned directory in the temporary /var/run path. That way qemu is providing a way for packaging to store modules of an upgraded qemu package as needed until the next reboot. An example how that can then be used in packaging can be seen in: https://git.launchpad.net/~paelzer/ubuntu/+source/qemu/log/?h=bug-1847361-miss-old-so-on-upgrade-UBUNTU Fixes: https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1847361 Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20200310145806.18335-2-christian.ehrhardt@canonical.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 23:02:22 +01:00
Paolo Bonzini	78b3f67acd	oslib-posix: initialize mutex and condition variable The mutex and condition variable were never initialized, causing -mem-prealloc to abort with an assertion failure. Fixes: `037fb5eb39` Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com> Cc: bauerchen <bauerchen@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 23:02:22 +01:00
Robert Hoo	27f08ea1c7	util: add util function buffer_zero_avx512() And intialize buffer_is_zero() with it, when Intel AVX512F is available on host. This function utilizes Intel AVX512 fundamental instructions which is faster than its implementation with AVX2 (in my unit test, with 4K buffer, on CascadeLake SP, ~36% faster, buffer_zero_avx512() V.S. buffer_zero_avx2()). Signed-off-by: Robert Hoo <robert.hu@linux.intel.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 23:02:21 +01:00
Peter Maydell	6e8a73e911	Pull request -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEhpWov9P5fNqsNXdanKSrs4Grc8gFAl5o3EQACgkQnKSrs4Gr c8jbYwgAupuS62MCITszybRE5Ote5CL80QDbMXNsZnZh6YBIGhPCqW2zdI2+m0zu +qoN6x6dxxNUWCvNlhbOcA45fiZVmYzS69TiEo21kCDijoK+h8+W+YbXuxGR2xJi cZMm8Q1DiK6Lj3vyfiwkFf4ns3VNz9DhI9hXu6CcpSkNcp79elQu87JJbzEWWWWy uEc7uEyBr0uCAKLEJvaLzzzWE2D2i6qKlmj3G17UbDNgCJ/Q/5HX13RUfMrgNgiP wmpcJ5MsB3Prz3K4XMMytUKXX/M8zpRLahp3p31t9qHelTWC3Lk1U4xzLPTJZPlm if/lrGRCRml+DKb9keBjWeTF4U31vg== =LftU -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging Pull request # gpg: Signature made Wed 11 Mar 2020 12:40:36 GMT # gpg: using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full] # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" [full] # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * remotes/stefanha/tags/block-pull-request: aio-posix: remove idle poll handlers to improve scalability aio-posix: support userspace polling of fd monitoring aio-posix: add io_uring fd monitoring implementation aio-posix: simplify FDMonOps->update() prototype aio-posix: extract ppoll(2) and epoll(7) fd monitoring aio-posix: move RCU_READ_LOCK() into run_poll_handlers() aio-posix: completely stop polling when disabled aio-posix: remove confusing QLIST_SAFE_REMOVE() qemu/queue.h: clear linked list pointers on remove Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-03-11 14:41:27 +00:00
Stefan Hajnoczi	d37d0e365a	aio-posix: remove idle poll handlers to improve scalability When there are many poll handlers it's likely that some of them are idle most of the time. Remove handlers that haven't had activity recently so that the polling loop scales better for guests with a large number of devices. This feature only takes effect for the Linux io_uring fd monitoring implementation because it is capable of combining fd monitoring with userspace polling. The other implementations can't do that and risk starving fds in favor of poll handlers, so don't try this optimization when they are in use. IOPS improves from 10k to 105k when the guest has 100 virtio-blk-pci,num-queues=32 devices and 1 virtio-blk-pci,num-queues=1 device for rw=randread,iodepth=1,bs=4k,ioengine=libaio on NVMe. [Clarified aio_poll_handlers locking discipline explanation in comment after discussion with Paolo Bonzini <pbonzini@redhat.com>. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-8-stefanha@redhat.com Message-Id: <20200305170806.1313245-8-stefanha@redhat.com>	2020-03-09 16:45:16 +00:00
Stefan Hajnoczi	aa38e19f05	aio-posix: support userspace polling of fd monitoring Unlike ppoll(2) and epoll(7), Linux io_uring completions can be polled from userspace. Previously userspace polling was only allowed when all AioHandler's had an ->io_poll() callback. This prevented starvation of fds by userspace pollable handlers. Add the FDMonOps->need_wait() callback that enables userspace polling even when some AioHandlers lack ->io_poll(). For example, it's now possible to do userspace polling when a TCP/IP socket is monitored thanks to Linux io_uring. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-7-stefanha@redhat.com Message-Id: <20200305170806.1313245-7-stefanha@redhat.com>	2020-03-09 16:41:31 +00:00
Stefan Hajnoczi	73fd282e7b	aio-posix: add io_uring fd monitoring implementation The recent Linux io_uring API has several advantages over ppoll(2) and epoll(2). Details are given in the source code. Add an io_uring implementation and make it the default on Linux. Performance is the same as with epoll(7) but later patches add optimizations that take advantage of io_uring. It is necessary to change how aio_set_fd_handler() deals with deleting AioHandlers since removing monitored file descriptors is asynchronous in io_uring. fdmon_io_uring_remove() marks the AioHandler deleted and aio_set_fd_handler() will let it handle deletion in that case. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-6-stefanha@redhat.com Message-Id: <20200305170806.1313245-6-stefanha@redhat.com>	2020-03-09 16:41:31 +00:00
Stefan Hajnoczi	b321051cf4	aio-posix: simplify FDMonOps->update() prototype The AioHandler node, bool is_new arguments are more complicated to think about than simply being given AioHandler old_node, AioHandler *new_node. Furthermore, the new Linux io_uring file descriptor monitoring mechanism added by the new patch requires access to both the old and the new nodes. Make this change now in preparation. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-5-stefanha@redhat.com Message-Id: <20200305170806.1313245-5-stefanha@redhat.com>	2020-03-09 16:41:31 +00:00
Stefan Hajnoczi	1f050a4690	aio-posix: extract ppoll(2) and epoll(7) fd monitoring The ppoll(2) and epoll(7) file descriptor monitoring implementations are mixed with the core util/aio-posix.c code. Before adding another implementation for Linux io_uring, extract out the existing ones so there is a clear interface and the core code is simpler. The new interface is AioContext->fdmon_ops, a pointer to a FDMonOps struct. See the patch for details. Semantic changes: 1. ppoll(2) now reflects events from pollfds[] back into AioHandlers while we're still on the clock for adaptive polling. This was already happening for epoll(7), so if it's really an issue then we'll need to fix both in the future. 2. epoll(7)'s fallback to ppoll(2) while external events are disabled was broken when the number of fds exceeded the epoll(7) upgrade threshold. I guess this code path simply wasn't tested and no one noticed the bug. I didn't go out of my way to fix it but the correct code is simpler than preserving the bug. I also took some liberties in removing the unnecessary AioContext->epoll_available (just check AioContext->epollfd != -1 instead) and AioContext->epoll_enabled (it's implicit if our AioContext->fdmon_ops callbacks are being invoked) fields. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-4-stefanha@redhat.com Message-Id: <20200305170806.1313245-4-stefanha@redhat.com>	2020-03-09 16:41:31 +00:00
Stefan Hajnoczi	3aa221b382	aio-posix: move RCU_READ_LOCK() into run_poll_handlers() Now that run_poll_handlers_once() is only called by run_poll_handlers() we can improve the CPU time profile by moving the expensive RCU_READ_LOCK() out of the polling loop. This reduces the run_poll_handlers() from 40% CPU to 10% CPU in perf's sampling profiler output. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-3-stefanha@redhat.com Message-Id: <20200305170806.1313245-3-stefanha@redhat.com>	2020-03-09 16:41:31 +00:00
Stefan Hajnoczi	e4346192f1	aio-posix: completely stop polling when disabled One iteration of polling is always performed even when polling is disabled. This is done because: 1. Userspace polling is cheaper than making a syscall. We might get lucky. 2. We must poll once more after polling has stopped in case an event occurred while stopping polling. However, there are downsides: 1. Polling becomes a bottleneck when the number of event sources is very high. It's more efficient to monitor fds in that case. 2. A high-frequency polling event source can starve non-polling event sources because ppoll(2)/epoll(7) is never invoked. This patch removes the forced polling iteration so that poll_ns=0 really means no polling. IOPS increases from 10k to 60k when the guest has 100 virtio-blk-pci,num-queues=32 devices and 1 virtio-blk-pci,num-queues=1 device because the large number of event sources being polled slows down the event loop. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-2-stefanha@redhat.com Message-Id: <20200305170806.1313245-2-stefanha@redhat.com>	2020-03-09 16:41:31 +00:00
Stefan Hajnoczi	c39cbedb54	aio-posix: remove confusing QLIST_SAFE_REMOVE() QLIST_SAFE_REMOVE() is confusing here because the node must be on the list. We actually just wanted to clear the linked list pointers when removing it from the list. QLIST_REMOVE() now does this, so switch to it. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200224103406.1894923-3-stefanha@redhat.com Message-Id: <20200224103406.1894923-3-stefanha@redhat.com>	2020-03-09 16:39:20 +00:00
Philippe Mathieu-Daudé	cf0c76cd6d	util/osdep: Improve error report by calling error_setg_win32() Use error_setg_win32() which adds a hint similar to strerror(errno)). Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200228100726.8414-3-philmd@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2020-03-09 13:36:15 +01:00
bauerchen	037fb5eb39	mem-prealloc: optimize large guest startup [desc]: Large memory VM starts slowly when using -mem-prealloc, and there are some areas to optimize in current method; 1、mmap will be used to alloc threads stack during create page clearing threads, and it will attempt mm->mmap_sem for write lock, but clearing threads have hold read lock, this competition will cause threads createion very slow; 2、methods of calcuating pages for per threads is not well;if we use 64 threads to split 160 hugepage,63 threads clear 2page,1 thread clear 34 page,so the entire speed is very slow; to solve the first problem,we add a mutex in thread function,and start all threads when all threads finished createion; and the second problem, we spread remainder to other threads,in situation that 160 hugepage and 64 threads, there are 32 threads clear 3 pages,and 32 threads clear 2 pages. [test]: 320G 84c VM start time can be reduced to 10s 680G 84c VM start time can be reduced to 18s Signed-off-by: bauerchen <bauerchen@tencent.com> Reviewed-by: Pan Rui <ruippan@tencent.com> Reviewed-by: Ivan Ren <ivanren@tencent.com> [Simplify computation of the number of pages per thread. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-02-25 09:18:01 +01:00
Alexander Bulekov	46a07579eb	module: check module wasn't already initialized The virtual-device fuzzer must initialize QOM, prior to running vl:qemu_init, so that it can use the qos_graph to identify the arguments required to initialize a guest for libqos-assisted fuzzing. This change prevents errors when vl:qemu_init tries to (re)initialize the previously initialized QOM module. Signed-off-by: Alexander Bulekov <alxndr@bu.edu> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20200220041118.23264-4-alxndr@bu.edu Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-02-22 08:26:47 +00:00
Stefan Hajnoczi	7391d34c3c	aio-posix: make AioHandler dispatch O(1) with epoll File descriptor monitoring is O(1) with epoll(7), but aio_dispatch_handlers() still scans all AioHandlers instead of dispatching just those that are ready. This makes aio_poll() O(n) with respect to the total number of registered handlers. Add a local ready_list to aio_poll() so that each nested aio_poll() builds a list of handlers ready to be dispatched. Since file descriptor polling is level-triggered, nested aio_poll() calls also see fds that were ready in the parent but not yet dispatched. This guarantees that nested aio_poll() invocations will dispatch all fds, even those that became ready before the nested invocation. Since only handlers ready to be dispatched are placed onto the ready_list, the new aio_dispatch_ready_handlers() function provides O(1) dispatch. Note that AioContext polling is still O(n) and currently cannot be fully disabled. This still needs to be fixed before aio_poll() is fully O(1). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Message-id: 20200214171712.541358-6-stefanha@redhat.com [Fix compilation error on macOS where there is no epoll(87). The aio_epoll() prototype was out of date and aio_add_ready_list() needed to be moved outside the ifdef. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-02-22 08:26:47 +00:00
Stefan Hajnoczi	4749079ce0	aio-posix: make AioHandler deletion O(1) It is not necessary to scan all AioHandlers for deletion. Keep a list of deleted handlers instead of scanning the full list of all handlers. The AioHandler->deleted field can be dropped. Let's check if the handler has been inserted into the deleted list instead. Add a new QLIST_IS_INSERTED() API for this check. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Message-id: 20200214171712.541358-5-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-02-22 08:26:47 +00:00
Stefan Hajnoczi	ca8c6b2275	aio-posix: don't pass ns timeout to epoll_wait() Don't pass the nanosecond timeout into epoll_wait(), which expects milliseconds. The epoll_wait() timeout value does not matter if qemu_poll_ns() determined that the poll fd is ready, but passing a value in the wrong units is still ugly. Pass a 0 timeout to epoll_wait() instead. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Message-id: 20200214171712.541358-3-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-02-22 08:26:47 +00:00
Stefan Hajnoczi	ff29ed3a33	aio-posix: fix use after leaving scope in aio_poll() epoll_handler is a stack variable and must not be accessed after it goes out of scope: if (aio_epoll_check_poll(ctx, pollfds, npfd, timeout)) { AioHandler epoll_handler; ... add_pollfd(&epoll_handler); ret = aio_epoll(ctx, pollfds, npfd, timeout); } ... ... /* if we have any readable fds, dispatch event */ if (ret > 0) { for (i = 0; i < npfd; i++) { nodes[i]->pfd.revents = pollfds[i].revents; } } nodes[0] is &epoll_handler, which has already gone out of scope. There is no need to use pollfds[] for epoll. We don't need an AioHandler for the epoll fd. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Message-id: 20200214171712.541358-2-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-02-22 08:26:47 +00:00
Stefan Hajnoczi	8c6b0356b5	util/async: make bh_aio_poll() O(1) The ctx->first_bh list contains all created BHs, including those that are not scheduled. The list is iterated by the event loop and therefore has O(n) time complexity with respected to the number of created BHs. Rewrite BHs so that only scheduled or deleted BHs are enqueued. Only BHs that actually require action will be iterated. One semantic change is required: qemu_bh_delete() enqueues the BH and therefore invokes aio_notify(). The tests/test-aio.c:test_source_bh_delete_from_cb() test case assumed that g_main_context_iteration(NULL, false) returns false after qemu_bh_delete() but it now returns true for one iteration. Fix up the test case. This patch makes aio_compute_timeout() and aio_bh_poll() drop from a CPU profile reported by perf-top(1). Previously they combined to 9% CPU utilization when AioContext polling is commented out and the guest has 2 virtio-blk,num-queues=1 and 99 virtio-blk,num-queues=32 devices. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20200221093951.1414693-1-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-02-22 08:26:47 +00:00
Stefan Hajnoczi	f25c0b5479	aio-posix: avoid reacquiring rcu_read_lock() when polling The first rcu_read_lock/unlock() is expensive. Nested calls are cheap. This optimization increases IOPS from 73k to 162k with a Linux guest that has 2 virtio-blk,num-queues=1 and 99 virtio-blk,num-queues=32 devices. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20200218182708.914552-1-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-02-22 08:26:47 +00:00
Peter Maydell	a8c6af67e1	ppc patch queue 2020-02-21 Here's the next patch of ppc target patches. Highlights are: * Some fixes for CAS / unplug interactions * Remove some leaks of device trees * Some fixes for the PHB3 and PHB4 devices * Support for NVDIMMs on the pseries machine type * Assorted other fixes and cleanups -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAl5PUAwACgkQbDjKyiDZ s5KBiBAAzUs/K1ygQuvcNTChOeu+3rBMe5XWRSRPIhqIZWnebssptwiMfqjVQXHB 1unLmmkUU8ov9cXWiRiAF/mrmBX6Kw2UbnNzjatvB7L1BX8b3Z848vYKb0mTZG6g Q7E8nyqQqmWUtzeF6B0sNf597oPalHO5j6r5e/Pl5hjZGkjDXihB5tKePBRucnOO wCsWSsoafaBJwq2w8KBahfgLYdQdCW2G2GHcfGouobzravEuOaxpknoLJiUw65Wa Gf6PFrJE5XxDbn+zxkvs6RvPnNMkfFxn0fYbmPghjAM58Unz2LGyf9RRd5b6beuu fYsoZPxX+gOFVjBBwjTmj+VxSL1Fw+b2xOrd1uf7zVNyJNO8AsxRWQMD7bLFTGSZ 95KkP5wSjcw2sm8eTtKkUAOP9YcTBJP1mjfjX/9rSGASCoOv6DJ71YOtyI4pzh5S DYLYm2EL3A0BxqxLodSx4xcTrh3IHAUXcteYr+ndGOy7MHzZGR88gtUmuAoFcgKc N0xV8dRoC5gNgLLxQ7zMh9q6qQHr0n13rRFKQ075piU4ce7JUzSR+EXCfQh2HLv+ GYM+FpZo2zFsWZxXDQE1fsLaCfz533HsplevAM2J/bGWAokYMEZKV6jV7DKXgxW0 6SrrSdxJrEThb198j1jSKCs0vwLDPcvV1GV6/sWStHBH1totRhY= =hrAW -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-5.0-20200221' into staging ppc patch queue 2020-02-21 Here's the next patch of ppc target patches. Highlights are: * Some fixes for CAS / unplug interactions * Remove some leaks of device trees * Some fixes for the PHB3 and PHB4 devices * Support for NVDIMMs on the pseries machine type * Assorted other fixes and cleanups # gpg: Signature made Fri 21 Feb 2020 03:35:40 GMT # gpg: using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full] # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full] # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full] # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown] # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dgibson/tags/ppc-for-5.0-20200221: hw/ppc/virtex_ml507:fix leak of fdevice tree blob spapr: Fix handling of unplugged devices during CAS and migration spapr: Don't use spapr_drc_needed() in CAS code ppc: free 'fdt' after reset the machine target/ppc/cpu.h: Clean up comments in the struct CPUPPCState definition target/ppc/cpu.h: Move fpu related members closer in cpu env target/ppc: Fix typo in comments spapr: Allow changing offset for -kernel image pnv/phb3: Add missing break statement pnv/phb4: Fix error path in pnv_pec_realize() pnv/phb3: Convert 1u to 1ull target/ppc/cpu.h: Remove duplicate includes spapr: Add Hcalls to support PAPR NVDIMM device spapr: Add NVDIMM device support nvdimm: add uuid property to nvdimm mem: move nvdimm_device_list to utilities ppc: function to setup latest class options ppc/pnv: Fix PCI_EXPRESS dependency qtest: Fix rtas dependencies spapr/rtas: Print message from "ibm,os-term" Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-02-21 14:20:42 +00:00
Shivaprasad G Bhat	3f350f6bb3	mem: move nvdimm_device_list to utilities nvdimm_device_list is required for parsing the list for devices in subsequent patches. Move it to common utility area. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <158131055857.2897.15658377276504711773.stgit@lep8c.aus.stglabs.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-21 09:15:03 +11:00
Peter Maydell	b651b80822	Implement membarrier, SO_RCVTIMEO and SO_SNDTIMEO Disable by default build of fdt, slirp and tools with linux-user Improve strace and use qemu_log to send trace to a file Add partial ALSA ioctl supports -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEzS913cjjpNwuT1Fz8ww4vT8vvjwFAl5OT1QSHGxhdXJlbnRA dml2aWVyLmV1AAoJEPMMOL0/L748i9kQAJYbShtYQNoNhSf/joKtQbxJ6vtQPryU 8J85wD2RShYpX2qi86WZ4IGFhSE+wFQ+Lfi1Z0SPs/VUgTSaiOYYEZBfROiKzY/M hbpXlfzWGOLxfzgP17QR5eoSxbVA81N7ruyTiUxFGUTeVbowIXR+/v5Ek8GOJCuu pAA/07NHxuP/YP2MkrlfAolFte5riQgfQT+QrJwpVwygdRwMx0Ed+gDWJosl8zL/ qEOzyZlXI/Mm+5bKCUQknSqpL3yQibmYfqM0XLhxjfvZ4vJDYk3Vet7ww39Je9Rz 7eAmXU7ERzzuFBTA7HEJjAjy+BrPkPMJINczfYcahUrt3TYS36DxinTre0TJbkMV 1np1Snn4LhSbvcAY/gFpO6X+lS5siYKZC9Ki2chBfl7AygjfswSl1wqE4cNwEyma ycjdYnlzo8JvuPPI7qploFAPr0hDgloAVfIzTlz+LC7dhsAhHtpf0vq0xC717z98 qOTO1MVoZa0WserrnyAFDaNkBwwYGwhm9WKcd23/9XO6gjc5yZWF98oc31Z1KpT4 ya+qqpcy++FNM3K+74BC403pdoda75N7sl0HYsxJWnhRjUOAPjEwHRo69iGzO/ku 8AOcjws/wf2BnoR7KO/KLQrkpc+EAGY/ulndTqUEyC1MdsMoF6VOB5aTFiBgJBMO GlsvTkt+05i4 =WAH+ -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/vivier2/tags/linux-user-for-5.0-pull-request' into staging Implement membarrier, SO_RCVTIMEO and SO_SNDTIMEO Disable by default build of fdt, slirp and tools with linux-user Improve strace and use qemu_log to send trace to a file Add partial ALSA ioctl supports # gpg: Signature made Thu 20 Feb 2020 09:20:20 GMT # gpg: using RSA key CD2F75DDC8E3A4DC2E4F5173F30C38BD3F2FBE3C # gpg: issuer "laurent@vivier.eu" # gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>" [full] # gpg: aka "Laurent Vivier <laurent@vivier.eu>" [full] # gpg: aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>" [full] # Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F 5173 F30C 38BD 3F2F BE3C * remotes/vivier2/tags/linux-user-for-5.0-pull-request: linux-user: Add support for selected alsa timer instructions using ioctls linux-user: Add support for getting/setting selected alsa timer parameters using ioctls linux-user: Add support for selecting alsa timer using ioctl linux-user: Add support for getting/setting specified alsa timer parameters using ioctls linux-user: Add support for getting alsa timer version and id linux-user: remove gemu_log from the linux-user tree linux-user: Use `qemu_log' for strace linux-user: Use `qemu_log' for non-strace logging configure: Avoid compiling system tools on user build by default linux-user/strace: Improve output of various syscalls configure: linux-user doesn't need neither fdt nor slirp linux-user: implement getsockopt SO_RCVTIMEO and SO_SNDTIMEO linux-user: Implement membarrier syscall Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-02-20 17:35:42 +00:00

1 2 3 4 5 ...

1124 Commits