mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Marc-André Lureau	f793dde091	Replace qemu_gettimeofday() with g_get_real_time() GLib g_get_real_time() is an alternative to gettimeofday() which allows to simplify our code. For semihosting, a few bits are lost on POSIX host, but this shouldn't be a big concern. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20220307070401.171986-5-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-04-06 10:50:37 +02:00
Bernhard Beschow	a2d860bb54	virtio/virtio-balloon: Prefer Object* over void* parameter opaque is an alias to obj. Using the ladder makes the code consistent with with other devices, e.g. accel/kvm/kvm-all and accel/tcg/tcg-all. It also makes the cast more typesafe. Signed-off-by: Bernhard Beschow <shentey@gmail.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20220301222301.103821-2-shentey@gmail.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2022-03-18 13:57:50 +01:00
Peter Maydell	b85ea5fa2f	include: Move qemu_madvise() and related #defines to new qemu/madvise.h The function qemu_madvise() and the QEMU_MADV_* constants associated with it are used in only 10 files. Move them out of osdep.h to a new qemu/madvise.h header that is included where it is needed. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20220208200856.3558249-2-peter.maydell@linaro.org	2022-02-21 13:30:20 +00:00
Jason Wang	d3f1f940eb	virtio-balloon: correct used length Spec said: "and len the total of bytes written into the buffer." For inflateq, deflateq and statsq, we don't process in_sg so the used length should be zero. For free_page_vq, tough the pages could be changed by the device (in the destination), spec said: "Note: len is particularly useful for drivers using untrusted buffers: if a driver does not know exactly how much has been written by the device, the driver would have to zero the buffer in advance to ensure no data leakage occurs." So 0 should be used as well here. Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20211129030841.3611-2-jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com>	2021-11-29 08:49:36 -05:00
Jason Wang	0fe7245d8b	virtio-balloon: process all in sgs for free_page_vq We only process the first in sg which may lead to the bitmap of the pages belongs to following sgs were not cleared. This may result more pages to be migrated. Fixing this by process all in sgs for free_page_vq. Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20211129030841.3611-1-jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2021-11-29 08:49:36 -05:00
Dr. David Alan Gilbert	243a9284a9	virtio-balloon: Fix page-poison subsection name The subsection name for page-poison was typo'd as: vitio-balloon-device/page-poison Note the missing 'r' in virtio. When we have a machine type that enables page poison, and the guest enables it (which needs a new kernel), things fail rather unpredictably. The fallout from this is that most of the other subsections fail to load, including things like the feature bits in the device, one possible fallout is that the physical addresses of the queues then get aligned differently and we fail with an error about last_avail_idx being wrong. It's not obvious to me why this doesn't produce a more obvious failure, but virtio's vmstate loading is a bit open-coded. Fixes: `7483cbbaf8` ("virtio-balloon: Implement support for page poison reporting feature") bz: https://bugzilla.redhat.com/show_bug.cgi?id=1984401 Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210914131716.102851-1-dgilbert@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com>	2021-10-05 17:30:57 -04:00
David Hildenbrand	2d050ed07c	virtio-balloon: free page hinting cleanups Let's compress the code a bit to improve readability. We can drop the vm_running check in virtio_balloon_free_page_start() as it's already properly checked in the single caller. Cc: Wei Wang <wei.w.wang@intel.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Philippe Mathieu-Daudé <philmd@redhat.com> Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: Juan Quintela <quintela@redhat.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210708095339.20274-3-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2021-09-04 16:35:17 -04:00
David Hildenbrand	fd51e54fa1	virtio-balloon: don't start free page hinting if postcopy is possible Postcopy never worked properly with 'free-page-hint=on', as there are at least two issues: 1) With postcopy, the guest will never receive a VIRTIO_BALLOON_CMD_ID_DONE and consequently won't release free pages back to the OS once migration finishes. The issue is that for postcopy, we won't do a final bitmap sync while the guest is stopped on the source and virtio_balloon_free_page_hint_notify() will only call virtio_balloon_free_page_done() on the source during PRECOPY_NOTIFY_CLEANUP, after the VM state was already migrated to the destination. 2) Once the VM touches a page on the destination that has been excluded from migration on the source via qemu_guest_free_page_hint() while postcopy is active, that thread will stall until postcopy finishes and all threads are woken up. (with older Linux kernels that won't retry faults when woken up via userfaultfd, we might actually get a SEGFAULT) The issue is that the source will refuse to migrate any pages that are not marked as dirty in the dirty bmap -- for example, because the page might just have been sent. Consequently, the faulting thread will stall, waiting for the page to be migrated -- which could take quite a while and result in guest OS issues. While we could fix 1) comparatively easily, 2) is harder to get right and might require more involved RAM migration changes on source and destination [1]. As it never worked properly, let's not start free page hinting in the precopy notifier if the postcopy migration capability was enabled to fix it easily. Capabilities cannot be enabled once migration is already running. Note 1: in the future we might either adjust migration code on the source to track pages that have actually been sent or adjust migration code on source and destination to eventually send pages multiple times from the source and and deal with pages that are sent multiple times on the destination. Note 2: virtio-mem has similar issues, however, access to "unplugged" memory by the guest is very rare and we would have to be very lucky for it to happen during migration. The spec states "The driver SHOULD NOT read from unplugged memory blocks ..." and "The driver MUST NOT write to unplugged memory blocks". virtio-mem will move away from virtio_balloon_free_page_done() soon and handle this case explicitly on the destination. [1] https://lkml.kernel.org/r/e79fd18c-aa62-c1d8-c7f3-ba3fc2c25fc8@redhat.com Fixes: `c13c4153f7` ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT") Cc: qemu-stable@nongnu.org Cc: Wei Wang <wei.w.wang@intel.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Philippe Mathieu-Daudé <philmd@redhat.com> Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: Juan Quintela <quintela@redhat.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210708095339.20274-2-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2021-09-04 16:35:17 -04:00
David Hildenbrand	1a37352277	migrate/ram: remove "ram_bulk_stage" and "fpo_enabled" The bulk stage is kind of weird: migration_bitmap_find_dirty() will indicate a dirty page, however, ram_save_host_page() will never save it, as migration_bitmap_clear_dirty() detects that it is not dirty. We already fill the bitmap in ram_list_init_bitmaps() with ones, marking everything dirty - it didn't used to be that way, which is why we needed an explicit first bulk stage. Let's simplify: make the bitmap the single source of thuth. Explicitly handle the "xbzrle_enabled after first round" case. Regarding XBZRLE (implicitly handled via "ram_bulk_stage = false" right now), there is now a slight change in behavior: - Colo: When starting, it will be disabled (was implicitly enabled) until the first round actually finishes. - Free page hinting: When starting, XBZRLE will be disabled (was implicitly enabled) until the first round actually finished. - Snapshots: When starting, XBZRLE will be disabled. We essentially only do a single run, so I guess it will never actually get disabled. Postcopy seems to indirectly disable it in ram_save_page(), so there shouldn't be really any change. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Juan Quintela <quintela@redhat.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210216105039.40680-1-david@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-13 18:21:13 +01:00
Andrey Gruzdev	1a8e44a89f	migration: Inhibit virtio-balloon for the duration of background snapshot The same thing as for incoming postcopy - we cannot deal with concurrent RAM discards when using background snapshot feature in outgoing migration. Fixes: `8518278a6a` (migration: implementation of background snapshot thread) Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com> Reported-by: David Hildenbrand <david@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Message-Id: <20210401092226.102804-3-andrey.gruzdev@virtuozzo.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-04-06 18:56:01 +01:00
Peter Maydell	729cc68373	Remove superfluous timer_del() calls This commit is the result of running the timer-del-timer-free.cocci script on the whole source tree. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Acked-by: Corey Minyard <cminyard@mvista.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20201215154107.3255-4-peter.maydell@linaro.org	2021-01-08 15:13:38 +00:00
Paolo Bonzini	b326b6ea79	make ram_size local to vl.c Use the machine properties for the leftovers too. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-12-10 12:15:10 -05:00
Philippe Mathieu-Daudé	a83e24ba1a	qapi: Restrict balloon-related commands to machine code Only qemu-system-FOO and qemu-storage-daemon provide QMP monitors, therefore such declarations and definitions are irrelevant for user-mode emulation. Restricting the balloon-related commands to machine.json pulls less QAPI-generated code into user-mode. Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200913195348.1064154-4-philmd@redhat.com> [Commit message tweaked] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2020-09-29 15:41:35 +02:00
Alexander Duyck	3219b42f02	virtio-balloon: Replace free page hinting references to 'report' with 'hint' Recently a feature named Free Page Reporting was added to the virtio balloon. In order to avoid any confusion we should drop the use of the word 'report' when referring to Free Page Hinting. So what this patch does is go through and replace all instances of 'report' with 'hint" when we are referring to free page hinting. Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Message-Id: <20200720175128.21935.93927.stgit@localhost.localdomain> Cc: qemu-stable@nongnu.org Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2020-07-22 07:57:07 -04:00
Alexander Duyck	1a83e0b9c4	virtio-balloon: Add locking to prevent possible race when starting hinting There is already locking in place when we are stopping free page hinting but there is not similar protections in place when we start. I can only assume this was overlooked as in most cases the page hinting should not be occurring when we are starting the hinting, however there is still a chance we could be processing hints by the time we get back around to restarting the hinting so we are better off making sure to protect the state with the mutex lock rather than just updating the value with no protections. Based on feedback from Peter Maydell this issue had also been spotted by Coverity: CID 1430269 Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Message-Id: <20200720175122.21935.78013.stgit@localhost.localdomain> Cc: qemu-stable@nongnu.org Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2020-07-22 07:57:07 -04:00
Alexander Duyck	20a4da0f23	virtio-balloon: Prevent guest from starting a report when we didn't request one Based on code review it appears possible for the driver to force the device out of a stopped state when hinting by repeating the last ID it was provided. Prevent this by only allowing a transition to the start state when we are in the requested state. This way the driver is only allowed to send one descriptor that will transition the device into the start state. All others will leave it in the stop state once it has finished. Fixes: `c13c4153f7` ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT") Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Message-Id: <20200720175115.21935.99563.stgit@localhost.localdomain> Cc: qemu-stable@nongnu.org Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2020-07-22 07:57:07 -04:00
Markus Armbruster	668f62ec62	error: Eliminate error_propagate() with Coccinelle, part 1 When all we do with an Error we receive into a local variable is propagating to somewhere else, we can just as well receive it there right away. Convert if (!foo(..., &err)) { ... error_propagate(errp, err); ... return ... } to if (!foo(..., errp)) { ... ... return ... } where nothing else needs @err. Coccinelle script: @rule1 forall@ identifier fun, err, errp, lbl; expression list args, args2; binary operator op; constant c1, c2; symbol false; @@ if ( ( - fun(args, &err, args2) + fun(args, errp, args2) \| - !fun(args, &err, args2) + !fun(args, errp, args2) \| - fun(args, &err, args2) op c1 + fun(args, errp, args2) op c1 ) ) { ... when != err when != lbl: when strict - error_propagate(errp, err); ... when != err ( return; \| return c2; \| return false; ) } @rule2 forall@ identifier fun, err, errp, lbl; expression list args, args2; expression var; binary operator op; constant c1, c2; symbol false; @@ - var = fun(args, &err, args2); + var = fun(args, errp, args2); ... when != err if ( ( var \| !var \| var op c1 ) ) { ... when != err when != lbl: when strict - error_propagate(errp, err); ... when != err ( return; \| return c2; \| return false; \| return var; ) } @depends on rule1 \|\| rule2@ identifier err; @@ - Error *err = NULL; ... when != err Not exactly elegant, I'm afraid. The "when != lbl:" is necessary to avoid transforming if (fun(args, &err)) { goto out } ... out: error_propagate(errp, err); even though other paths to label out still need the error_propagate(). For an actual example, see sclp_realize(). Without the "when strict", Coccinelle transforms vfio_msix_setup(), incorrectly. I don't know what exactly "when strict" does, only that it helps here. The match of return is narrower than what I want, but I can't figure out how to express "return where the operand doesn't use @err". For an example where it's too narrow, see vfio_intx_enable(). Silently fails to convert hw/arm/armsse.c, because Coccinelle gets confused by ARMSSE being used both as typedef and function-like macro there. Converted manually. Line breaks tidied up manually. One nested declaration of @local_err deleted manually. Preexisting unwanted blank line dropped in hw/riscv/sifive_e.c. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200707160613.848843-35-armbru@redhat.com>	2020-07-10 15:18:08 +02:00
Markus Armbruster	62a35aaa31	qapi: Use returned bool to check for failure, Coccinelle part The previous commit enables conversion of visit_foo(..., &err); if (err) { ... } to if (!visit_foo(..., errp)) { ... } for visitor functions that now return true / false on success / error. Coccinelle script: @@ identifier fun =~ "check_list\|input_type_enum\|lv_start_struct\|lv_type_bool\|lv_type_int64\|lv_type_str\|lv_type_uint64\|output_type_enum\|parse_type_bool\|parse_type_int64\|parse_type_null\|parse_type_number\|parse_type_size\|parse_type_str\|parse_type_uint64\|print_type_bool\|print_type_int64\|print_type_null\|print_type_number\|print_type_size\|print_type_str\|print_type_uint64\|qapi_clone_start_alternate\|qapi_clone_start_list\|qapi_clone_start_struct\|qapi_clone_type_bool\|qapi_clone_type_int64\|qapi_clone_type_null\|qapi_clone_type_number\|qapi_clone_type_str\|qapi_clone_type_uint64\|qapi_dealloc_start_list\|qapi_dealloc_start_struct\|qapi_dealloc_type_anything\|qapi_dealloc_type_bool\|qapi_dealloc_type_int64\|qapi_dealloc_type_null\|qapi_dealloc_type_number\|qapi_dealloc_type_str\|qapi_dealloc_type_uint64\|qobject_input_check_list\|qobject_input_check_struct\|qobject_input_start_alternate\|qobject_input_start_list\|qobject_input_start_struct\|qobject_input_type_any\|qobject_input_type_bool\|qobject_input_type_bool_keyval\|qobject_input_type_int64\|qobject_input_type_int64_keyval\|qobject_input_type_null\|qobject_input_type_number\|qobject_input_type_number_keyval\|qobject_input_type_size_keyval\|qobject_input_type_str\|qobject_input_type_str_keyval\|qobject_input_type_uint64\|qobject_input_type_uint64_keyval\|qobject_output_start_list\|qobject_output_start_struct\|qobject_output_type_any\|qobject_output_type_bool\|qobject_output_type_int64\|qobject_output_type_null\|qobject_output_type_number\|qobject_output_type_str\|qobject_output_type_uint64\|start_list\|visit_check_list\|visit_check_struct\|visit_start_alternate\|visit_start_list\|visit_start_struct\|visit_type_."; expression list args; typedef Error; Error err; @@ - fun(args, &err); - if (err) + if (!fun(args, &err)) { ... } A few line breaks tidied up manually. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200707160613.848843-19-armbru@redhat.com>	2020-07-10 15:18:08 +02:00
David Hildenbrand	06df2e692a	virtio-balloon: Rip out qemu_balloon_inhibit() The only remaining special case is postcopy. It cannot handle concurrent discards yet, which would result in requesting already sent pages from the source. Special-case it in virtio-balloon instead. Introduce migration_in_incoming_postcopy(), to find out if incoming postcopy is active. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Juan Quintela <quintela@redhat.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20200626072248.78761-7-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2020-07-02 05:54:59 -04:00
David Hildenbrand	dd8eeb9671	virtio-balloon: always indicate S_DONE when migration fails If something goes wrong during precopy, before stopping the VM, we will never send a S_DONE indication to the VM, resulting in the hinted pages not getting released to be used by the guest OS (e.g., Linux). Easy to reproduce: 1. Start migration (e.g., HMP "migrate -d 'exec:gzip -c > STATEFILE.gz'") 2. Cancel migration (e.g., HMP "migrate_cancel") 3. Oberve in the guest (e.g., cat /proc/meminfo) that there is basically no free memory left. While at it, add similar locking to virtio_balloon_free_page_done() as done in virtio_balloon_free_page_stop. Locking is still weird, but that has to be sorted out separately. There is nothing to do in the PRECOPY_NOTIFY_COMPLETE case. Add some comments regarding S_DONE handling. Fixes: `c13c4153f7` ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT") Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Wei Wang <wei.w.wang@intel.com> Cc: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20200629080615.26022-1-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2020-07-02 05:54:59 -04:00
Alexander Duyck	91b867191d	virtio-balloon: Provide an interface for free page reporting Add support for free page reporting. The idea is to function very similar to how the balloon works in that we basically end up madvising the page as not being used. However we don't really need to bother with any deflate type logic since the page will be faulted back into the guest when it is read or written to. This provides a new way of letting the guest proactively report free pages to the hypervisor, so the hypervisor can reuse them. In contrast to inflate/deflate that is triggered via the hypervisor explicitly. Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Message-Id: <20200527041407.12700.73735.stgit@localhost.localdomain>	2020-06-09 14:18:04 -04:00
Alexander Duyck	7483cbbaf8	virtio-balloon: Implement support for page poison reporting feature We need to make certain to advertise support for page poison reporting if we want to actually get data on if the guest will be poisoning pages. Add a value for reporting the poison value being used if page poisoning is enabled in the guest. With this we can determine if we will need to skip free page reporting when it is enabled in the future. The value currently has no impact on existing balloon interfaces. In the case of existing balloon interfaces the onus is on the guest driver to reapply whatever poison is in place. When we add free page reporting the poison value is used to determine if we can perform in-place page reporting. The expectation is that a reported page will already contain the value specified by the poison, and the reporting of the page should not change that value. Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Message-Id: <20200527041400.12700.33251.stgit@localhost.localdomain>	2020-06-09 14:18:04 -04:00
David Hildenbrand	105aef9c94	virtio-balloon: unref the iothread when unrealizing We took a reference when realizing, so let's drop that reference when unrealizing. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Fixes: `c13c4153f7` ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT") Cc: qemu-stable@nongnu.org Cc: Wei Wang <wei.w.wang@intel.com> Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20200520100439.19872-4-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2020-06-09 14:18:04 -04:00
David Hildenbrand	49b01711b8	virtio-balloon: fix free page hinting check on unrealize Checking against guest features is wrong. We allocated data structures based on host features. We can rely on "free_page_bh" as an indicator whether to un-do stuff instead. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Fixes: `c13c4153f7` ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT") Cc: qemu-stable@nongnu.org Cc: Wei Wang <wei.w.wang@intel.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Philippe Mathieu-Daudé <philmd@redhat.com> Cc: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20200520100439.19872-3-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2020-06-09 14:18:04 -04:00
David Hildenbrand	12fc8903a8	virtio-balloon: fix free page hinting without an iothread In case we don't have an iothread, we mark the feature as abscent but still add the queue. 'free_page_bh' remains set to NULL. qemu-system-i386 \ -M microvm \ -nographic \ -device virtio-balloon-device,free-page-hint=true \ -nographic \ -display none \ -monitor none \ -serial none \ -qtest stdio Doing a "write 0xc0000e30 0x24 0x030000000300000003000000030000000300000003000000030000000300000003000000" We will trigger a SEGFAULT. Let's move the check and bail out. While at it, move the static initializations to instance_init(). free_page_report_status and block_iothread are implicitly set to the right values (0/false) already, so drop the initialization. Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Fixes: `c13c4153f7` ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT") Reported-by: Alexander Bulekov <alxndr@bu.edu> Cc: qemu-stable@nongnu.org Cc: Wei Wang <wei.w.wang@intel.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Philippe Mathieu-Daudé <philmd@redhat.com> Cc: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20200520100439.19872-2-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2020-06-09 14:18:04 -04:00
Markus Armbruster	b69c3c21a5	qdev: Unrealize must not fail Devices may have component devices and buses. Device realization may fail. Realization is recursive: a device's realize() method realizes its components, and device_set_realized() realizes its buses (which should in turn realize the devices on that bus, except bus_set_realized() doesn't implement that, yet). When realization of a component or bus fails, we need to roll back: unrealize everything we realized so far. If any of these unrealizes failed, the device would be left in an inconsistent state. Must not happen. device_set_realized() lets it happen: it ignores errors in the roll back code starting at label child_realize_fail. Since realization is recursive, unrealization must be recursive, too. But how could a partly failed unrealize be rolled back? We'd have to re-realize, which can fail. This design is fundamentally broken. device_set_realized() does not roll back at all. Instead, it keeps unrealizing, ignoring further errors. It can screw up even for a device with no buses: if the lone dc->unrealize() fails, it still unregisters vmstate, and calls listeners' unrealize() callback. bus_set_realized() does not roll back either. Instead, it stops unrealizing. Fortunately, no unrealize method can fail, as we'll see below. To fix the design error, drop parameter @errp from all the unrealize methods. Any unrealize method that uses @errp now needs an update. This leads us to unrealize() methods that can fail. Merely passing it to another unrealize method cannot cause failure, though. Here are the ones that do other things with @errp: * virtio_serial_device_unrealize() Fails when qbus_set_hotplug_handler() fails, but still does all the other work. On failure, the device would stay realized with its resources completely gone. Oops. Can't happen, because qbus_set_hotplug_handler() can't actually fail here. Pass &error_abort to qbus_set_hotplug_handler() instead. * hw/ppc/spapr_drc.c's unrealize() Fails when object_property_del() fails, but all the other work is already done. On failure, the device would stay realized with its vmstate registration gone. Oops. Can't happen, because object_property_del() can't actually fail here. Pass &error_abort to object_property_del() instead. * spapr_phb_unrealize() Fails and bails out when remove_drcs() fails, but other work is already done. On failure, the device would stay realized with some of its resources gone. Oops. remove_drcs() fails only when chassis_from_bus()'s object_property_get_uint() fails, and it can't here. Pass &error_abort to remove_drcs() instead. Therefore, no unrealize method can fail before this patch. device_set_realized()'s recursive unrealization via bus uses object_property_set_bool(). Can't drop @errp there, so pass &error_abort. We similarly unrealize with object_property_set_bool() elsewhere, always ignoring errors. Pass &error_abort instead. Several unrealize methods no longer handle errors from other unrealize methods: virtio_9p_device_unrealize(), virtio_input_device_unrealize(), scsi_qdev_unrealize(), ... Much of the deleted error handling looks wrong anyway. One unrealize methods no longer ignore such errors: usb_ehci_pci_exit(). Several realize methods no longer ignore errors when rolling back: v9fs_device_realize_common(), pci_qdev_unrealize(), spapr_phb_realize(), usb_qdev_realize(), vfio_ccw_realize(), virtio_device_realize(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200505152926.18877-17-armbru@redhat.com>	2020-05-15 07:08:14 +02:00
Markus Armbruster	d2623129a7	qom: Drop parameter @errp of object_property_add() & friends The only way object_property_add() can fail is when a property with the same name already exists. Since our property names are all hardcoded, failure is a programming error, and the appropriate way to handle it is passing &error_abort. Same for its variants, except for object_property_add_child(), which additionally fails when the child already has a parent. Parentage is also under program control, so this is a programming error, too. We have a bit over 500 callers. Almost half of them pass &error_abort, slightly fewer ignore errors, one test case handles errors, and the remaining few callers pass them to their own callers. The previous few commits demonstrated once again that ignoring programming errors is a bad idea. Of the few ones that pass on errors, several violate the Error API. The Error ** argument must be NULL, &error_abort, &error_fatal, or a pointer to a variable containing NULL. Passing an argument of the latter kind twice without clearing it in between is wrong: if the first call sets an error, it no longer points to NULL for the second call. ich9_pm_add_properties(), sparc32_ledma_realize(), sparc32_dma_realize(), xilinx_axidma_realize(), xilinx_enet_realize() are wrong that way. When the one appropriate choice of argument is &error_abort, letting users pick the argument is a bad idea. Drop parameter @errp and assert the preconditions instead. There's one exception to "duplicate property name is a programming error": the way object_property_add() implements the magic (and undocumented) "automatic arrayification". Don't drop @errp there. Instead, rename object_property_add() to object_property_try_add(), and add the obvious wrapper object_property_add(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200505152926.18877-15-armbru@redhat.com> [Two semantic rebase conflicts resolved]	2020-05-15 07:07:58 +02:00
Marc-André Lureau	4f67d30b5e	qdev: set properties with device_class_set_props() The following patch will need to handle properties registration during class_init time. Let's use a device_class_set_props() setter. spatch --macro-file scripts/cocci-macro-file.h --sp-file ./scripts/coccinelle/qdev-set-props.cocci --keep-comments --in-place --dir . @@ typedef DeviceClass; DeviceClass *d; expression val; @@ - d->props = val + device_class_set_props(d, val) Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20200110153039.1379601-20-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-01-24 20:59:15 +01:00
Pan Nengyuan	3627842855	virtio-balloon: fix memory leak while attach virtio-balloon device ivq/dvq/svq/free_page_vq is forgot to cleanup in virtio_balloon_device_unrealize, the memory leak stack is as follow: Direct leak of 14336 byte(s) in 2 object(s) allocated from: #0 0x7f99fd9d8560 in calloc (/usr/lib64/libasan.so.3+0xc7560) #1 0x7f99fcb20015 in g_malloc0 (/usr/lib64/libglib-2.0.so.0+0x50015) #2 0x557d90638437 in virtio_add_queue hw/virtio/virtio.c:2327 #3 0x557d9064401d in virtio_balloon_device_realize hw/virtio/virtio-balloon.c:793 #4 0x557d906356f7 in virtio_device_realize hw/virtio/virtio.c:3504 #5 0x557d9073f081 in device_set_realized hw/core/qdev.c:876 #6 0x557d908b1f4d in property_set_bool qom/object.c:2080 #7 0x557d908b655e in object_property_set_qobject qom/qom-qobject.c:26 Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com> Message-Id: <1575444716-17632-2-git-send-email-pannengyuan@huawei.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com>	2020-01-05 07:03:03 -05:00
Markus Armbruster	a27bd6c779	Include hw/qdev-properties.h less In my "build everything" tree, changing hw/qdev-properties.h triggers a recompile of some 2700 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). Many places including hw/qdev-properties.h (directly or via hw/qdev.h) actually need only hw/qdev-core.h. Include hw/qdev-core.h there instead. hw/qdev.h is actually pointless: all it does is include hw/qdev-core.h and hw/qdev-properties.h, which in turn includes hw/qdev-core.h. Replace the remaining uses of hw/qdev.h by hw/qdev-properties.h. While there, delete a few superfluous inclusions of hw/qdev-core.h. Touching hw/qdev-properties.h now recompiles some 1200 objects. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Daniel P. Berrangé" <berrange@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20190812052359.30071-22-armbru@redhat.com>	2019-08-16 13:31:53 +02:00
Michael S. Tsirkin	1b47b37c33	virtio-balloon: free pbp more aggressively Previous patches switched to a temporary pbp but that does not go far enough: after device uses a buffer, guest is free to reuse it, so tracking the page and freeing it later is wrong. Free and reset the pbp after we push each element. Fixes: `ed48c59875` ("virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size") Cc: qemu-stable@nongnu.org #v4.0.0 Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2019-07-25 11:19:25 -04:00
David Hildenbrand	9a7ca8a7c9	virtio-balloon: don't track subpages for the PBP As ramblocks cannot get removed/readded while we are processing a bulk of inflation requests, there is no more need to track the page size in form of the number of subpages. Suggested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20190725113638.4702-8-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2019-07-25 07:58:10 -04:00
David Hildenbrand	a8cd64d488	virtio-balloon: Use temporary PBP only We still have multiple issues in the current code - The PBP is not freed during unrealize() - The PBP is not reset on device resets: After a reset, the PBP is stale. - We are not indicating VIRTIO_BALLOON_F_MUST_TELL_HOST, therefore guests (esp. legacy guests) will reuse pages without deflating, turning the PBP stale. Adding that would require compat handling. Instead, let's use the PBP only temporarily, when processing one bulk of inflation requests. This will keep guest_page_size > 4k working (with Linux guests). There is nothing to do for deflation requests anymore. The pbp is only used for a limited amount of time. Fixes: `ed48c59875` ("virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size") Cc: qemu-stable@nongnu.org #v4.0.0 Suggested-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20190722134108.22151-7-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au>	2019-07-25 07:58:03 -04:00
David Hildenbrand	1c5cfc2b71	virtio-balloon: Rework pbp tracking data Using the address of a RAMBlock to test for a matching pbp is not really safe. Instead, let's use the guest physical address of the base page along with the page size (via the number of subpages). Also, let's allocate the bitmap separately. This makes the code easier to read and maintain - we can reuse bitmap_new(). Prepare the code to move the PBP out of the device. Fixes: `ed48c59875` ("virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size") Fixes: `b27b323914` ("virtio-balloon: Fix possible guest memory corruption with inflates & deflates") Cc: qemu-stable@nongnu.org #v4.0.0 Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20190722134108.22151-6-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2019-07-25 07:57:59 -04:00
David Hildenbrand	e6129b271b	virtio-balloon: Better names for offset variables in inflate/deflate code "host_page_base" is really confusing, let's make this clearer, also rename the other offsets to indicate to which base they apply. offset -> mr_offset ram_offset -> rb_offset host_page_base -> rb_aligned_offset While at it, use QEMU_ALIGN_DOWN() instead of a handcrafted computation and move the computation to the place where it is needed. Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20190722134108.22151-5-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2019-07-25 07:57:57 -04:00
David Hildenbrand	2ffc49eea1	virtio-balloon: Simplify deflate with pbp Let's simplify this - the case we are optimizing for is very hard to trigger and not worth the effort. If we're switching from inflation to deflation, let's reset the pbp. Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20190722134108.22151-4-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2019-07-25 07:57:56 -04:00
David Hildenbrand	483f13524b	virtio-balloon: Fix QEMU crashes on pagesize > BALLOON_PAGE_SIZE We are using the wrong functions to set/clear bits, effectively touching multiple bits, writing out of range of the bitmap, resulting in memory corruptions. We have to use set_bit()/clear_bit() instead. Can easily be reproduced by starting a qemu guest on hugetlbfs memory, inflating the balloon. QEMU crashes. This never could have worked properly - especially, also pages would have been discarded when the first sub-page would be inflated (the whole bitmap would be set). While testing I realized, that on hugetlbfs it is pretty much impossible to discard a page - the guest just frees the 4k sub-pages in random order most of the time. I was only able to discard a hugepage a handful of times - so I hope that now works correctly. Fixes: `ed48c59875` ("virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size") Fixes: `b27b323914` ("virtio-balloon: Fix possible guest memory corruption with inflates & deflates") Cc: qemu-stable@nongnu.org #v4.0.0 Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20190722134108.22151-3-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2019-07-25 07:57:52 -04:00
David Hildenbrand	ffa207d082	virtio-balloon: Fix wrong sign extension of PFNs If we directly cast from int to uint64_t, we will first sign-extend to an int64_t, which is wrong. We actually want to treat the PFNs like unsigned values. As far as I can see, this dates back to the initial virtio-balloon commit, but wasn't triggered as fairly big guests would be required. Cc: qemu-stable@nongnu.org Reported-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20190722134108.22151-2-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2019-07-25 07:57:49 -04:00
Stefan Hajnoczi	2bbadb08ce	virtio-balloon: fix QEMU 4.0 config size migration incompatibility The virtio-balloon config size changed in QEMU 4.0 even for existing machine types. Migration from QEMU 3.1 to 4.0 can fail in some circumstances with the following error: qemu-system-x86_64: get_pci_config_device: Bad config data: i=0x10 read: a1 device: 1 cmask: ff wmask: c0 w1cmask:0 This happens because the virtio-balloon config size affects the VIRTIO Legacy I/O Memory PCI BAR size. Introduce a qdev property called "qemu-4-0-config-size" and enable it only for the QEMU 4.0 machine types. This way <4.0 machine types use the old size, 4.0 uses the larger size, and >4.0 machine types use the appropriate size depending on enabled virtio-balloon features. Live migration to and from old QEMUs to QEMU 4.1 works again as long as a versioned machine type is specified (do not use just "pc"!). Originally-by: Wolfgang Bumiller <w.bumiller@proxmox.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20190710141440.27635-1-stefanha@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Tested-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Tested-by: Wolfgang Bumiller <w.bumiller@proxmox.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2019-07-12 10:56:26 -04:00
Markus Armbruster	0b8fa32f55	Include qemu/module.h where needed, drop it from qemu-common.h Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190523143508.25387-4-armbru@redhat.com> [Rebased with conflicts resolved automatically, except for hw/usb/dev-hub.c hw/misc/exynos4210_rng.c hw/misc/bcm2835_rng.c hw/misc/aspeed_scu.c hw/display/virtio-vga.c hw/arm/stm32f205_soc.c; ui/cocoa.m fixed up]	2019-06-12 13:18:33 +02:00
David Gibson	596546fe9e	virtio-balloon: Restore MADV_WILLNEED hint on balloon deflate Prior to `f6deb6d9` "virtio-balloon: Remove unnecessary MADV_WILLNEED on deflate", the balloon device issued an madvise() MADV_WILLNEED on pages removed from the balloon. That would hint to the host kernel that the pages were likely to be needed by the guest in the near future. It's unclear if this is actually valuable or not, and so `f6deb6d9` removed this, essentially ignoring balloon deflate requests. However, concerns have been raised that this might cause a performance regression by causing extra latency for the guest in certain configurations. So, until we can get actual benchmark data to see if that's the case, this restores the old behaviour, issuing a MADV_WILLNEED when a page is removed from the balloon. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20190306030601.21986-4-david@gibson.dropbear.id.au> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2019-03-12 21:22:31 -04:00
David Gibson	b27b323914	virtio-balloon: Fix possible guest memory corruption with inflates & deflates This fixes a balloon bug with a nasty consequence - potentially corrupting guest memory - but which is extremely unlikely to be triggered in practice. The balloon always works in 4kiB units, but the host could have a larger page size on certain platforms. Since `ed48c59` "virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size" we've handled this by accumulating requests to balloon 4kiB subpages until they formed a full host page. Since `f6deb6d` "virtio-balloon: Remove unnecessary MADV_WILLNEED on deflate" we essentially ignore deflate requests. Suppose we have a host with 8kiB pages, and one host page has subpages A & B. If we get this sequence of events - inflate A deflate A inflate B - the current logic will discard the whole host page. That's incorrect because the guest has deflated subpage A, and could have written important data to it. This patch fixes the problem by adjusting our state information about partially ballooned host pages when deflate requests are received. Fixes: `ed48c59` "virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size" Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20190306030601.21986-3-david@gibson.dropbear.id.au> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: David Hildenbrand <david@redhat.com>	2019-03-12 21:22:31 -04:00
David Gibson	301cf2a8dd	virtio-balloon: Don't mismatch g_malloc()/free (CID 1399146) `ed48c59875` "virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size" introduced a new temporary data structure which tracks 4kiB chunks which have been inserted into the balloon by the guest but don't yet form a full host page which we can discard. Unfortunately, I had a thinko and allocated that structure with g_malloc0() but freed it with a plain free() rather than g_free(). This corrects the problem. Fixes: `ed48c59875` Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20190306030601.21986-2-david@gibson.dropbear.id.au> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com>	2019-03-12 21:22:31 -04:00
Wei Wang	ae440bd14c	virtio-balloon: fix a use-after-free case The elem could theorically contain both outbuf and inbufs. We move the free operation to the end of this function to avoid using elem->in_sg while elem has been freed. Fixes: `c13c4153f7` ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT") Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Wei Wang <wei.w.wang@intel.com> CC: Michael S. Tsirkin <mst@redhat.com> CC: Dr. David Alan Gilbert <dgilbert@redhat.com> CC: Juan Quintela <quintela@redhat.com> CC: Peter Xu <peterx@redhat.com> Message-Id: <1552383280-4122-1-git-send-email-wei.w.wang@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2019-03-12 21:22:31 -04:00
Wei Wang	c13c4153f7	virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT The new feature enables the virtio-balloon device to receive hints of guest free pages from the free page vq. A notifier is registered to the migration precopy notifier chain. The notifier calls free_page_start after the migration thread syncs the dirty bitmap, so that the free page optimization starts to clear bits of free pages from the bitmap. It calls the free_page_stop before the migration thread syncs the bitmap, which is the end of the current round of ram save. The free_page_stop is also called to stop the optimization in the case when there is an error occurred in the process of ram saving. Note: balloon will report pages which were free at the time of this call. As the reporting happens asynchronously, dirty bit logging must be enabled before this free_page_start call is made. Guest reporting must be disabled before the migration dirty bitmap is synchronized. Signed-off-by: Wei Wang <wei.w.wang@intel.com> CC: Michael S. Tsirkin <mst@redhat.com> CC: Dr. David Alan Gilbert <dgilbert@redhat.com> CC: Juan Quintela <quintela@redhat.com> CC: Peter Xu <peterx@redhat.com> Message-Id: <1544516693-5395-8-git-send-email-wei.w.wang@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> dgilbert: Dropped kernel header update, fixed up CMD_ID_* name change	2019-03-06 10:49:18 +00:00
David Gibson	ed48c59875	virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size The virtio-balloon always works in units of 4kiB (BALLOON_PAGE_SIZE), but we can only actually discard memory in units of the host page size. Now, we handle this very badly: we silently ignore balloon requests that aren't host page aligned, and for requests that are host page aligned we discard the entire host page. The latter can corrupt guest memory if its page size is smaller than the host's. The obvious choice would be to disable the balloon if the host page size is not 4kiB. However, that would break the special case where host and guest have the same page size, but that's larger than 4kiB. That case currently works by accident[1] - and is used in practice on many production POWER systems where 64kiB has long been the Linux default page size on both host and guest. To make the balloon safe, without breaking that useful special case, we need to accumulate 4kiB balloon requests until we have a whole contiguous host page to discard. We could in principle do that across all guest memory, but it would require a large bitmap to track. This patch represents a compromise: we track ballooned subpages for a single contiguous host page at a time. This means that if the guest discards all 4kiB chunks of a host page in succession, we will discard it. This is the expected behaviour in the (host page) == (guest page) != 4kiB case we want to support. If the guest scatters 4kiB requests across different host pages, we don't discard anything, and issue a warning. Not ideal, but at least we don't corrupt guest memory as the previous version could. Warning reporting is kind of a compromise here. Determining whether we're in a problematic state at realize() time is tricky, because we'd have to look at the host pagesizes of all memory backends, but we can't really know if some of those backends could be for special purpose memory that's not subject to ballooning. Reporting only when the guest tries to balloon a partial page also isn't great because if the guest page size happens to line up it won't indicate that we're in a non ideal situation. It could also cause alarming repeated warnings whenever a migration is attempted. So, what we do is warn the first time the guest attempts balloon a partial host page, whether or not it will end up ballooning the rest of the page immediately afterwards. [1] Because when the guest attempts to balloon a page, it will submit requests for each 4kiB subpage. Most will be ignored, but the one which happens to be host page aligned will discard the whole lot. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20190214043916.22128-6-david@gibson.dropbear.id.au> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2019-02-22 10:51:31 -05:00
David Gibson	dbe1a27745	virtio-balloon: Use ram_block_discard_range() instead of raw madvise() Currently, virtio-balloon uses madvise() with MADV_DONTNEED to actually discard RAM pages inserted into the balloon. This is basically a Linux only interface (MADV_DONTNEED exists on some other platforms, but doesn't always have the same semantics). It also doesn't work on hugepages and has some other limitations. It turns out that postcopy also needs to discard chunks of memory, and uses a better interface for it: ram_block_discard_range(). It doesn't cover every case, but it covers more than going direct to madvise() and this gives us a single place to update for more possibilities in future. There are some subtleties here to maintain the current balloon behaviour: * For now, we just ignore requests to balloon in a hugepage backed region. That matches current behaviour, because MADV_DONTNEED on a hugepage would simply fail, and we ignore the error. * If host page size is > BALLOON_PAGE_SIZE we can frequently call this on non-host-page-aligned addresses. These would also fail in madvise(), which we then ignored. ram_block_discard_range() error_report()s calls on unaligned addresses, so we explicitly check that case to avoid spamming the logs. * We now call ram_block_discard_range() with the host page size, whereas we previously called madvise() with BALLOON_PAGE_SIZE. Surprisingly, this also matches existing behaviour. Although the kernel fails madvise on unaligned addresses, it will round unaligned sizes up to the host page size. Yes, this means that if BALLOON_PAGE_SIZE < guest page size we can incorrectly discard more memory than the guest asked us to. I'm planning to address that soon. Errors other than the ones discussed above, will now be reported by ram_block_discard_range(), rather than silently ignored, which means we have a much better chance of seeing when something is going wrong. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20190214043916.22128-5-david@gibson.dropbear.id.au> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2019-02-21 12:28:41 -05:00
David Gibson	e9550234d7	virtio-balloon: Rework ballon_page() interface This replaces the balloon_page() internal interface with ballon_inflate_page(), with a slightly different interface. The new interface will make future alterations simpler. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20190214043916.22128-4-david@gibson.dropbear.id.au> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2019-02-21 12:28:41 -05:00
David Gibson	b218a70e6a	virtio-balloon: Corrections to address verification The virtio-balloon device's verification of the address given to it by the guest has a number of faults: * The addresses here are guest physical addresses, which should be 'hwaddr' rather than 'ram_addr_t' (the distinction is admittedly pretty subtle and confusing) * We don't check for section.mr being NULL, which is the main way that memory_region_find() reports basic failures. We really need to check that before looking at any other section fields, because memory_region_find() doesn't initialize them on the failure path * We're passing a length of '1' to memory_region_find(), but really the guest is requesting that we put the entire page into the balloon, so it makes more sense to call it with BALLOON_PAGE_SIZE Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20190214043916.22128-3-david@gibson.dropbear.id.au> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2019-02-21 12:28:41 -05:00
David Gibson	f6deb6d95a	virtio-balloon: Remove unnecessary MADV_WILLNEED on deflate When the balloon is inflated, we discard memory place in it using madvise() with MADV_DONTNEED. And when we deflate it we use MADV_WILLNEED, which sounds like it makes sense but is actually unnecessary. The misleadingly named MADV_DONTNEED just discards the memory in question, it doesn't set any persistent state on it in-kernel; all that's necessary to bring the memory back is to touch it. MADV_WILLNEED in contrast specifically says that the memory will be used soon and faults it in. This patch simplify's the balloon operation by dropping the madvise() on deflate. This might have an impact on performance - it will move a delay at deflate time until that memory is actually touched, which might be more latency sensitive. However: * Memory that's being given back to the guest by deflating the balloon might be used soon, but it equally could just sit around in the guest's pools until needed (or even be faulted out again if the host is under memory pressure). * Usually, the timescale over which you'll be adjusting the balloon is long enough that a few extra faults after deflation aren't going to make a difference. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20190214043916.22128-2-david@gibson.dropbear.id.au> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2019-02-21 12:28:41 -05:00

1 2 3

123 Commits