mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Greg Kurz	4be56c1959	fsdev: fix virtfs-proxy-helper cwd Since chroot() doesn't change the current directory, it is indeed a good practice to chdir() to the target directory and then chroot(), or to chroot() to the target directory and then chdir("/"). The current code does neither of them actually. Let's go for the latter. This doesn't fix any security issue since all of this takes place before the helper begins to process requests. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-05-25 10:30:13 +02:00
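The "chroot() then chdir("/")" ordering chosen in the commit above can be sketched as follows. This is a minimal illustration, not the virtfs-proxy-helper code: the function name `enter_jail` is hypothetical, and the error handling is deliberately simple.

```c
#define _DEFAULT_SOURCE   /* exposes chroot() in <unistd.h> on glibc */
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (not taken from QEMU): confine the process to `path`.
 * Returns 0 on success, -1 on failure. */
static int enter_jail(const char *path)
{
    if (chroot(path) < 0) {
        perror("chroot");
        return -1;
    }
    /* chroot() does not change the current directory, so move it
     * inside the new root explicitly. */
    if (chdir("/") < 0) {
        perror("chdir");
        return -1;
    }
    return 0;
}
```

A caller would typically invoke this once during start-up, before any request is processed, and exit if it returns -1.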
Greg Kurz	6a87e7929f	9pfs: local: fix unlink of alien files in mapped-file mode When trying to remove a file from a directory, both created in non-mapped mode, the file remains and EBADF is returned to the guest. This is a regression introduced by commit "df4938a6651b 9pfs: local: unlinkat: don't follow symlinks" when fixing CVE-2016-9602. It changed the way we unlink the metadata file from ret = remove("$dir/.virtfs_metadata/$name"); if (ret < 0 && errno != ENOENT) { /* Error out */ } /* Ignore absence of metadata */ to fd = openat("$dir/.virtfs_metadata") unlinkat(fd, "$name") if (ret < 0 && errno != ENOENT) { /* Error out */ } /* Ignore absence of metadata */ If $dir was created in non-mapped mode, openat() fails with ENOENT and we pass -1 to unlinkat(), which fails in turn with EBADF. We just need to check the return of openat() and ignore ENOENT, in order to restore the behaviour we had with remove(). Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Eric Blake <eblake@redhat.com> [groug: rewrote the comments as suggested by Eric]	2017-05-25 10:30:13 +02:00
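The fix described above, checking the result of openat() and treating ENOENT as "no metadata to remove", might look roughly like this. It is a sketch under assumptions: the helper name `unlink_metadata` and its signature are hypothetical and do not reproduce the 9pfs local backend.

```c
#define _DEFAULT_SOURCE   /* exposes openat()/unlinkat() and O_DIRECTORY */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Directory name as quoted in the commit message. */
#define METADATA_DIR ".virtfs_metadata"

/* Hypothetical helper: remove the metadata entry for `name` below `dirfd`.
 * A missing metadata directory or a missing entry is not an error. */
static int unlink_metadata(int dirfd, const char *name)
{
    int ret, saved_errno;
    int map_dirfd = openat(dirfd, METADATA_DIR,
                           O_RDONLY | O_DIRECTORY | O_NOFOLLOW);

    if (map_dirfd < 0) {
        /* Directory created in non-mapped mode: nothing to clean up.
         * Never pass the invalid descriptor on to unlinkat(). */
        return errno == ENOENT ? 0 : -1;
    }

    ret = unlinkat(map_dirfd, name, 0);
    if (ret < 0 && errno == ENOENT) {
        ret = 0;            /* ignore absence of the metadata file */
    }

    saved_errno = errno;    /* close() may overwrite errno */
    close(map_dirfd);
    errno = saved_errno;
    return ret;
}
```

The early return is the important part: without it, -1 would reach unlinkat() and the caller would see EBADF instead of success.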
Greg Kurz	a17d8659c4	9pfs: drop pdu_push_and_notify() Only pdu_complete() needs to notify the client that a request has completed. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>	2017-05-25 10:30:13 +02:00
Greg Kurz	57a0aa6b50	fsdev: don't allow unknown format in marshal/unmarshal The code only uses well known format strings. An unknown format token is a bug. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>	2017-05-25 10:30:13 +02:00
Greg Kurz	506f327582	virtio-9p/xen-9p: move 9p specific bits to core 9p code These bits aren't related to the transport so let's move them to the core code. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>	2017-05-25 10:30:13 +02:00
Greg Kurz	62f94fc94f	xics: add unrealize handler Now that ICPState objects get finalized on CPU unplug, we should unregister reset handlers as well to avoid a QEMU crash at machine reset time. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-25 11:31:33 +10:00
Daniel Henrique Barboza	16ee99805e	hw/ppc/spapr.c: recover pending LMB unplug info in spapr_lmb_release When a LMB hot unplug starts, the current DRC LMB status is stored at spapr->pending_dimm_unplugs QTAILQ. This queue isn't migrated, thus if a migration occurs in the middle of a LMB unplug the spapr_lmb_release callback will lose track of the LMB unplug progress. This patch implements a new recover function spapr_recover_pending_dimm_state that is used inside spapr_lmb_release to recover this DRC LMB release status that is lost during the migration. Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> [dwg: Minor stylistic changes, simplify error handling] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-25 11:31:33 +10:00
Daniel Henrique Barboza	a50919dddf	hw/ppc: migrating the DRC state of hotplugged devices In pseries, a firmware abstraction called Dynamic Reconfiguration Connector (DRC) is used to assign a particular dynamic resource to the guest and provide an interface to manage configuration/removal of the resource associated with it. In other words, DRC is the 'plugged state' of a device. Before this patch, DRC wasn't being migrated. This causes post-migration problems due to DRC state mismatch between source and target. The DRC state of a device X in the source might change, while in the target the DRC state of X is still fresh. When migrating the guest, X will not have the same hotplugged state as it did in the source. This means that we can't hot unplug X in the target after migration is completed because its DRC state is not consistent. https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1677552 is one bug that is caused by this DRC state mismatch between source and target. To migrate the DRC state, we defined the VMStateDescription struct for spapr_drc to enable the transmission of spapr_drc state in migration. Not all the elements in the DRC state are migrated - only those that can be modified by guest actions or device add/remove operations: - 'isolation_state', 'allocation_state' and 'indicator_state' are involved in the DR state transition diagram from PAPR+ 2.7, 13.4; - 'configured', 'signalled', 'awaiting_release' and 'awaiting_allocation' are needed in attaching and detaching devices; - 'indicator_state' provides users with hardware state information. These are the DRC elements that are migrated. In this patch the DRC state is migrated for PCI, LMB and CPU connector types. At this moment there is no support to migrate DRC for the PHB (PCI Host Bridge) type. In the 'realize' function the DRC is registered using vmstate_register, similar to what hw/ppc/spapr_iommu.c does in 'spapr_tce_table_realize'. 
This approach works because DRCs are bus-less and do not sit on a BusClass that implements bc->get_dev_path, so as a fallback the VMSD gets identified via "spapr_drc"/get_index(drc). Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-25 11:31:33 +10:00
Daniel Henrique Barboza	318347234d	hw/ppc: removing drc->detach_cb and drc->detach_cb_opaque The pointer drc->detach_cb is being used as a way of informing the detach() function inside spapr_drc.c which cb to execute. This information can also be retrieved simply by checking drc->type and choosing the right callback based on it. In this context, detach_cb is redundant information that must be managed. After the previous spapr_lmb_release change, no detach_cb_opaques are being used by any of the three callback functions. This is yet another piece of information that is now unused and, on top of that, can't be migrated either. This patch makes the following changes: - removal of detach_cb_opaque. The 'opaque' argument was removed from the callbacks and from the detach() function of sPAPRConnectorClass. The attribute detach_cb_opaque of sPAPRConnector was removed. - removal of detach_cb from the detach() call. The function pointer detach_cb of sPAPRConnector was removed. detach() now uses a switch(drc->type) to execute the appropriate callback. To achieve this, spapr_core_release, spapr_lmb_release and spapr_phb_remove_pci_device_cb callbacks were made public to be visible inside detach(). Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-25 11:31:33 +10:00
David Gibson	0cffce56ae	hw/ppc/spapr.c: adding pending_dimm_unplugs to sPAPRMachineState The LMB DRC release callback, spapr_lmb_release(), uses an opaque parameter, a sPAPRDIMMState struct that stores the current LMBs that are allocated to a DIMM (nr_lmbs). After each call to this callback, the nr_lmbs is decremented by one and, when it reaches zero, the callback proceeds with the qdev calls to hot unplug the LMB. Using drc->detach_cb_opaque is problematic because it can't be migrated in the future DRC migration work. This patch makes the following changes to eliminate the usage of this opaque callback inside spapr_lmb_release: - sPAPRDIMMState was moved from spapr.c and added to spapr.h. A new attribute called 'addr' was added to it. This is used as a unique identifier to associate a sPAPRDIMMState to a PCDIMM element. - sPAPRMachineState now hosts a new QTAILQ called 'pending_dimm_unplugs'. This queue of sPAPRDIMMState elements will store the DIMM state of DIMMs that are currently undergoing an unplug process. - spapr_lmb_release() will now retrieve the nr_lmbs value by getting the corresponding sPAPRDIMMState. A helper function called spapr_dimm_get_address was created to fetch the address of a PCDIMM device inside spapr_lmb_release. When nr_lmbs reaches zero and the callback proceeds with the qdev hot unplug calls, the sPAPRDIMMState struct is removed from spapr->pending_dimm_unplugs. After these changes, the opaque argument for spapr_lmb_release is now unused and is passed as NULL inside spapr_del_lmbs. This and the other opaque arguments can now be safely removed from the code. As an additional cleanup made by this patch, the spapr_del_lmbs function was merged with spapr_memory_unplug_request. The former was being called only by the latter and both were small enough to fit one single function.
Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> [dwg: Minor stylistic cleanups] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-25 11:31:28 +10:00
Jeff Cody	223a23c198	block/gluster: glfs_lseek() workaround On current released versions of glusterfs, glfs_lseek() will sometimes return invalid values for SEEK_DATA or SEEK_HOLE. For SEEK_DATA and SEEK_HOLE, the returned value should be >= the passed offset, or < 0 in the case of error: LSEEK(2): off_t lseek(int fd, off_t offset, int whence); [...] SEEK_HOLE Adjust the file offset to the next hole in the file greater than or equal to offset. If offset points into the middle of a hole, then the file offset is set to offset. If there is no hole past offset, then the file offset is adjusted to the end of the file (i.e., there is an implicit hole at the end of any file). [...] RETURN VALUE Upon successful completion, lseek() returns the resulting offset location as measured in bytes from the beginning of the file. On error, the value (off_t) -1 is returned and errno is set to indicate the error However, occasionally glfs_lseek() for SEEK_HOLE/DATA will return a value less than the passed offset, yet greater than zero. For instance, here are example values observed from this call: offs = glfs_lseek(s->fd, start, SEEK_HOLE); if (offs < 0) { return -errno; /* D1 and (H3 or H4) */ } start == 7608336384 offs == 7607877632 This causes QEMU to abort on the assert test. When this value is returned, errno is also 0. This is a reported and known bug to glusterfs: https://bugzilla.redhat.com/show_bug.cgi?id=1425293 Although this is being fixed in gluster, we still should work around it in QEMU, given that multiple released versions of gluster behave this way. This patch treats the return case of (offs < start) the same as if an error value other than ENXIO is returned; we will assume we learned nothing, and there are no holes in the file. 
Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Message-id: 87c0140e9407c08f6e74b04131b610f2e27c014c.1495560397.git.jcody@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-24 16:44:46 -04:00
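The workaround described in the gluster commit above boils down to classifying the value returned for SEEK_DATA or SEEK_HOLE before trusting it. The sketch below is an illustration only: the enum and the function `classify_seek` are hypothetical names and do not appear in block/gluster.c.

```c
#include <stdint.h>

/* Hypothetical classification of a SEEK_DATA/SEEK_HOLE result. */
enum seek_outcome {
    SK_VALID,    /* offs >= start: usable result                    */
    SK_ERROR,    /* offs < 0: genuine error, consult errno          */
    SK_IGNORED   /* 0 <= offs < start: bogus value, learned nothing */
};

/* `start` is the offset passed to the seek call, `offs` the value
 * it returned. */
static enum seek_outcome classify_seek(int64_t start, int64_t offs)
{
    if (offs < 0) {
        return SK_ERROR;
    }
    if (offs < start) {
        /* Violates the lseek(2) contract; treat it like an error
         * other than ENXIO and assume there are no holes. */
        return SK_IGNORED;
    }
    return SK_VALID;
}
```

With the values quoted in the commit message (start == 7608336384, offs == 7607877632) the result is `SK_IGNORED`, so the caller falls back to the "no holes" assumption instead of hitting an assertion.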
Paolo Bonzini	eb05e011e2	blockjob: use deferred_to_main_loop to indicate the coroutine has ended All block jobs are using block_job_defer_to_main_loop as the final step just before the coroutine terminates. At this point, block_job_enter should do nothing, but currently it restarts the freed coroutine. Now, the job->co states should probably be changed to an enum (e.g. BEFORE_START, STARTED, YIELDED, COMPLETED) subsuming block_job_started, job->deferred_to_main_loop and job->busy. For now, this patch eliminates the problematic reenter by removing the reset of job->deferred_to_main_loop (which served no purpose, as far as I could see) and checking the flag in block_job_enter. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20170508141310.8674-12-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-24 16:38:51 -04:00
Paolo Bonzini	4fb588e95b	blockjob: reorganize block_job_completed_txn_abort This splits the part that touches job states from the part that invokes callbacks. It will make the code simpler to understand once job states will be protected by a different mutex than the AioContext lock. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20170508141310.8674-11-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-24 16:38:51 -04:00
Paolo Bonzini	7e74a73499	blockjob: strengthen a bit test-blockjob-txn Unlike test-blockjob-txn, QMP releases the reference to the transaction before the jobs finish. Thus, qemu-iotest 124 showed a failure while working on the next patch that the unit tests did not have. Make the test a little nastier. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20170508141310.8674-10-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-24 16:38:51 -04:00
Paolo Bonzini	c8ab5c2dde	blockjob: group BlockJob transaction functions together Yet another pure code movement patch, preparing for the next change. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20170508141310.8674-9-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-24 16:38:51 -04:00
Paolo Bonzini	4c241cf5d6	blockjob: introduce block_job_cancel_async, check iostatus invariants The new function helps respect the invariant that the coroutine is entered with false user_resume, zero pause count and no error recorded in the iostatus. Resetting the iostatus is now common to all of block_job_cancel_async, block_job_user_resume and block_job_iostatus_reset, albeit with slight differences: - block_job_cancel_async resets the iostatus, and resumes the job if there was an error, but the coroutine is not restarted immediately. For example the caller may continue with a call to block_job_finish_sync. - block_job_user_resume resets the iostatus. It wants to resume the job unconditionally, even if there was no error. - block_job_iostatus_reset doesn't resume the job at all. Maybe that's a bug but it should be fixed separately. block_job_iostatus_reset does the least common denominator, so add some checking but otherwise leave it as the entry point for resetting the iostatus. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20170508141310.8674-8-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-24 16:38:51 -04:00
Paolo Bonzini	2caf63a903	blockjob: move iostatus reset inside block_job_user_resume Outside blockjob.c, the block_job_iostatus_reset function is used once in the monitor and once in BlockBackend. When we introduce the block job mutex, block_job_iostatus_reset's client is going to be the block layer (for which blockjob.c will take the block job mutex) rather than the monitor (which will take the block job mutex by itself). The monitor's call to block_job_iostatus_reset comes just before the sole call to block_job_user_resume, so reset the iostatus directly from block_job_user_resume. This will avoid the need to introduce separate block_job_iostatus_reset and block_job_iostatus_reset_locked APIs. After making this change, move the function together with the others that were moved in the previous patch. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 20170508141310.8674-7-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-24 16:38:51 -04:00
Paolo Bonzini	88691b37f8	blockjob: separate monitor and blockjob APIs We have two different headers for block job operations, blockjob.h and blockjob_int.h. The former contains APIs called by the monitor, the latter contains APIs called by the block job drivers and the block layer itself. Keep the two APIs separate in the blockjob.c file too. This will be useful when transitioning away from the AioContext lock, because there will be locking policies for the two categories, too---the monitor will have to call new block_job_lock/unlock APIs, while blockjob APIs will take care of this for the users. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20170508141310.8674-6-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-24 16:38:51 -04:00
Paolo Bonzini	f321dcb57f	blockjob: introduce block_job_pause/resume_all Remove use of block_job_pause/resume from outside blockjob.c, thus making them static. The new functions are used by the block layer, so place them in blockjob_int.h. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 20170508141310.8674-5-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-24 16:38:51 -04:00
Paolo Bonzini	05b0d8e3b8	blockjob: introduce block_job_early_fail Outside blockjob.c, block_job_unref is only used when a block job fails to start, and block_job_ref is not used at all. The reference counting thus is pretty well hidden. Introduce a separate function to be used by block jobs; because block_job_ref and block_job_unref now become static, move them earlier in blockjob.c. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 20170508141310.8674-4-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-24 16:38:51 -04:00
Paolo Bonzini	9f086abbe4	blockjob: remove iostatus_reset callback This is unused since commit `66a0fae` ("blockjob: Don't touch BDS iostatus", 2016-05-19). Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 20170508141310.8674-3-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-24 16:38:51 -04:00
Paolo Bonzini	6573d9c638	blockjob: remove unnecessary check !job is always checked prior to the call, drop it from here. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 20170508141310.8674-2-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-24 16:38:51 -04:00
Stefan Hajnoczi	e1fe27a208	s390x updates: - support for vfio-ccw to passthrough channel devices - allow ccw bios to boot from scsi generic devices - bugfix for initial reset -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJZJBbAAAoJEN7Pa5PG8C+vXtoP/0hXz0NAG+fSdwXXcfjPjisX sRfu3rln6dCAAZKZNVvlPEctCqwjwVOnuygUln0/UL0XgBdRT9desjA2uQnwVrLn vsLuG+8jWdmGbs0Wt2t5GfSoSs40V1KIRKd4b+MAtDjQQ52WvIBFsbTW/ZRan+LY ltgqBuBh3sfOQ/g5QGzR1RBrJAABkTs00mlgfZfws0p5QeJbPKjmQaB4Al+HJMKC bmug0ZlxysJQ2wJy0Ybw2Y0NGSIw/hFi1PGgtwJKLj5OwH/WtBjr4lpcO/7vN9+l vsV8CkayeHr+VShXe9Vh+tbIQtaiX8jYPVlD2mQFt7EyS1JrB6L6DPHvlZwkQyBi C7IQhEkziUv7CJzYX9pUHEPqwOqmxzao1E+GKxVhqlIV7OCpVGoIiFoQu/aRI8v/ Rz3BAEzogdR4N+04Ww3rU+NrDYZUFO0BGZtCjEuvjbPtdeuvt+hbWPz/uPZgCrcX wKBHxafQ/BRKxOw4rJkpfweCf/sHeD2DELzn/KXZbibhKBfe0hjTDvoIu6xffyfW HElT477sOnAqOm9JgFdI58qBHT3OepMg62szF+QDk/7zBY095OchmQgs8vnkQ6x/ LVvxrWXZJyBj4joU94BPntt9lU0oky3XgjSoBnrblRGOqA0nwyQnkR63SDnCHzz0 FULYu/bd0kvLlodRlZge =XpHA -----END PGP SIGNATURE----- Merge remote-tracking branch 'cohuck/tags/s390x-20170523' into staging s390x updates: - support for vfio-ccw to passthrough channel devices - allow ccw bios to boot from scsi generic devices - bugfix for initial reset # gpg: Signature made Tue 23 May 2017 12:02:24 PM BST # gpg: using RSA key 0xDECF6B93C6F02FAF # gpg: Good signature from "Cornelia Huck <conny@cornelia-huck.de>" # gpg: aka "Cornelia Huck <cohuck@kernel.org>" # gpg: aka "Cornelia Huck <cornelia.huck@de.ibm.com>" # gpg: aka "Cornelia Huck <huckc@linux.vnet.ibm.com>" # Primary key fingerprint: C3D0 D66D C362 4FF6 A8C0 18CE DECF 6B93 C6F0 2FAF * cohuck/tags/s390x-20170523: (21 commits) s390/kvm: do not reset riccb on initial cpu reset MAINTAINERS: Add vfio-ccw maintainer vfio/ccw: update sense data if a unit check is pending s390x/css: ccw translation infrastructure s390x/css: introduce and realize ccw-request callback vfio/ccw: get irqs info and set the eventfd fd vfio/ccw: get io region info vfio/ccw: vfio based subchannel 
passthrough driver s390x/css: device support for s390-ccw passthrough s390x/css: realize css_create_sch s390x/css: realize css_sch_build_schib s390x/css: add s390-squash-mcss machine option linux-headers: update pc-bios/s390-ccw.img: rebuild image pc-bios/s390-ccw: Build a reasonable max_sectors limit pc-bios/s390-ccw: Get Block Limits VPD device data pc-bios/s390-ccw: Get list of supported VPD pages pc-bios/s390-ccw: Refactor scsi_inquiry function pc-bios/s390-ccw: Break up virtio-scsi read into multiples pc-bios/s390-ccw: Move SCSI block factor to outer read ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-24 13:53:17 +01:00
Laurent Vivier	c871bc70bb	spapr: add pre_plug function for memory This allows errors to be handled before the memory has started to be hotplugged. We already have the function for the CPU cores. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> [dwg: Fixed a couple of style nits] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-24 17:27:39 +10:00
David Gibson	459264ef24	pseries: Restore support for total vcpus not a multiple of threads-per-core for old machine types As of pseries-2.7 and later, we require the total number of guest vcpus to be a multiple of the threads-per-core. pseries-2.6 and earlier machine types, however, are supposed to allow this for the sake of migration from old qemu versions which allowed this. Unfortunately, `8149e29` "pseries: Enforce homogeneous threads-per-core" broke this by not considering the old machine type case. This fixes it by only applying the check when the machine type supports hotpluggable cpus. By not-entirely-coincidence, that corresponds to the same time when we started enforcing total threads being a multiple of threads-per-core. Fixes: `8149e2992f` Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Greg Kurz <groug@kaod.org>	2017-05-24 11:39:53 +10:00
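The rule restored by the commit above (enforce "total vcpus is a multiple of threads-per-core" only for machine types that support hotpluggable CPUs) can be expressed as a small predicate. This is a sketch with hypothetical names; the real check in the pseries machine code is structured differently.

```c
#include <stdbool.h>

/* Hypothetical predicate: is this vcpu topology acceptable?
 * `hotpluggable_cpus` stands in for "machine type is pseries-2.7 or
 * later", which is when the restriction was introduced. */
static bool smp_config_valid(unsigned int total_vcpus,
                             unsigned int threads_per_core,
                             bool hotpluggable_cpus)
{
    if (threads_per_core == 0) {
        return false;           /* nonsensical topology */
    }
    if (!hotpluggable_cpus) {
        return true;            /* old machine types: no restriction */
    }
    return total_vcpus % threads_per_core == 0;
}
```

For example, 6 vcpus with 4 threads per core is accepted on an old machine type but rejected on one with hotpluggable CPUs.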
David Gibson	80c33d343f	pseries: Split CAS PVR negotiation out into a separate function Guests of the qemu machine type go through a feature negotiation process known as "client architecture support" (CAS) during early boot. This does a number of things, one of which is finding a CPU compatibility mode which can be supported by both guest and host. In fact the CPU negotiation is probably the single most complex part of the CAS process, so this splits it out into a helper function. We've recently made some mistakes in maintaining backward compatibility for old machine types here. Splitting this out will also make it easier to fix this. This also adds a possibly useful error message if the negotiation fails (i.e. if there isn't a CPU mode that's suitable for both guest and host). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org>	2017-05-24 11:39:53 +10:00
Greg Kurz	3d85885a1b	spapr: fix error reporting in xics_system_init() If the user explicitly asked for kernel-irqchip support and "xics-kvm" initialization fails, we shouldn't fall back to emulated "xics" as we do now. It is also awkward to print an error message when we have an errp pointer argument. Let's use the errp argument to report the error and let the caller decide. This simplifies the code as we don't need a local Error * here. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-24 11:39:53 +10:00
Greg Kurz	249127d0df	spapr_cpu_core: drop reference on ICP object during CPU realization When a piece of code allocates an object, it implicitly gets a reference on it. If it then makes that object a child property of another object, it should drop its own reference at some point otherwise the child object can never be finalized. The current code hence leaks one ICP object per CPU when hot-removing a core. Failing to add a newly allocated ICP object to the CPU is a bug. While here, let's ensure QEMU aborts if this ever happens. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-24 11:39:53 +10:00
Daniel Henrique Barboza	bff3063837	hw/ppc/spapr_events.c: removing 'exception' from sPAPREventLogEntry Currently we do not have any RTAS event that is reported by the event-scan interface. The existing events, RTAS_LOG_TYPE_EPOW and RTAS_LOG_TYPE_HOTPLUG, are being reported by the check-exception interface and, as such, marked as 'exception=true'. Commit `79853e18d9`, 'spapr_events: event-scan RTAS interface', added the event_scan interface because the guest kernel requires it to initialize other required interfaces. It has acted as a stub since then because no events that would be reported by it have been added. However, the existence of the 'exception' boolean adds an unnecessary load in the future migration of the pending_events, sPAPREventLogEntry QTAILQ that hosts the pending RTAS events. To make the code cleaner and ease the future migration changes, this patch makes the following changes: - remove the 'exception' boolean that filters these events. There is nothing to filter since all events are reported by check-exception; - functions rtas_event_log_queue, rtas_event_log_dequeue and rtas_event_log_contains don't receive the 'exception' boolean as parameter; - event_scan function was simplified. It was calling 'rtas_event_log_dequeue(mask, false)' that was always returning 'NULL' because we have no events that are created with exception=false, thus in the end it would execute a jump to 'out_no_events' all the time. The function now assumes that this will always be the case and all the remaining logic was deleted. In the future, when or if we add new RTAS events that should be reported with the event_scan interface, we can refer to the changes made in this patch to add the event_scan logic back. Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-24 11:39:53 +10:00
Greg Kurz	07572c0653	spapr: ensure core_slot isn't NULL in spapr_core_unplug() If we go that far on the path of hot-removing a core and we find out that the core-id is invalid, then we have a serious bug. Let's make it explicit with an assert() instead of dereferencing a NULL pointer. This fixes Coverity issue CID 1375404. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-24 11:39:53 +10:00
Greg Kurz	de86eccc0c	xics_kvm: cache already enabled vCPU ids Since commit `a45863bda9` ("xics_kvm: Don't enable KVM_CAP_IRQ_XICS if already enabled"), we were able to re-hotplug a vCPU that had been hot-unplugged earlier, thanks to a boolean flag in ICPState that we set when enabling KVM_CAP_IRQ_XICS. This could work because the lifecycle of all ICPState objects was the same as the machine. Commit `5bc8d26de2` ("spapr: allocate the ICPState object from under sPAPRCPUCore") broke this assumption and now we always pass a freshly allocated ICPState object (i.e., with the flag unset) to icp_kvm_cpu_setup(). This causes re-hotplug to fail with: Unable to connect CPU8 to kernel XICS: Device or resource busy Let's fix this by caching all the vCPU ids for which KVM_CAP_IRQ_XICS was enabled. This also drops the now useless boolean flag from ICPState. Reported-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Greg Kurz <groug@kaod.org> Tested-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-24 11:39:52 +10:00
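The idea of caching enabled vCPU ids independently of the per-CPU object's lifetime can be sketched with a small list that outlives any single ICPState. All names here (`EnabledVcpu`, `vcpu_needs_enable`) are hypothetical, and a plain singly linked list stands in for whatever container the actual patch uses.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical cache entry: one per vCPU id that was already enabled. */
typedef struct EnabledVcpu {
    unsigned long vcpu_id;
    struct EnabledVcpu *next;
} EnabledVcpu;

/* The cache lives as long as the process, not as long as a CPU object. */
static EnabledVcpu *enabled_vcpus;

/* Returns true the first time an id is seen (the caller should enable
 * the capability), false if the id is already cached (re-hotplug). */
static bool vcpu_needs_enable(unsigned long vcpu_id)
{
    EnabledVcpu *e;

    for (e = enabled_vcpus; e != NULL; e = e->next) {
        if (e->vcpu_id == vcpu_id) {
            return false;
        }
    }

    e = malloc(sizeof(*e));
    if (e == NULL) {
        abort();
    }
    e->vcpu_id = vcpu_id;
    e->next = enabled_vcpus;
    enabled_vcpus = e;
    return true;
}
```

Because the list is keyed by vCPU id rather than stored in the per-CPU object, unplugging and re-plugging CPU 8 finds the id already cached and skips the second enable that produced "Device or resource busy".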
Bharata B Rao	06ec79e865	spapr: Consolidate HPT freeing code into a routine Consolidate the code that frees HPT into a separate routine spapr_free_hpt() as the same chunk of code is called from two places. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-24 11:39:52 +10:00
Greg Kurz	c8a98293f7	spapr-cpu-core: release ICP object when realization fails While here we introduce a single error path to avoid code duplication. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-24 11:39:52 +10:00
Greg Kurz	175d2aa038	spapr: sanitize error handling in spapr_ics_create() The spapr_ics_create() function handles errors in a rather convoluted way, with two local Error * variables. Moreover, failing to parent the ICS object to the machine should be considered as a bug but it is currently ignored. This patch addresses both issues. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-24 11:39:52 +10:00
Greg Kurz	f63ebfe0ac	ppc/xics: simplify prototype of xics_spapr_init() This function only does hypercall and RTAS-call registration, and thus never returns an error. This patch adapts the prototype to reflect that. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-24 11:39:52 +10:00
Nikunj A Dadhania	a8b7373421	target/ppc: reset reservation in do_rfi() For transitioning back to userspace after the interrupt. Suggested-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-24 11:39:52 +10:00
Stefan Hajnoczi	9964e96dc9	-----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJZI54ZAAoJEO8Ells5jWIRieUH/A5im/ud4QMJlLTPPI9grim8 KSl8InbMdpG9CkROZIA6lt8torestH60YvzR+128kI4rKiyglGWMhWqyo+Cli9NK bhZCeqS/zVWWSU/LR+SkFI4mePgnLmfDL+kbZvZQ7eSF9xwSWXYZd7d8HPxY7gcF fE+cnxSQl1VbtT/ncvrsYykgQG2L8MjGWfLjspzeJ0qG0YuwiMyJnmruPKgjVdcW 1A0CFOIxWd/5m1d5cC8I8+kQPn0aB4uB/gXFL46c3ZoxwtZWSs+IKA1dl8aORnZL +ihJ1YEVxJzY/UPo8mrbN/9XE+u6qpL4UfaNdWmu7KTMVI6+UUaXOc0r2UKuBj8= =dLzH -----END PGP SIGNATURE----- Merge remote-tracking branch 'jasowang/tags/net-pull-request' into staging # gpg: Signature made Tue 23 May 2017 03:27:37 AM BST # gpg: using RSA key 0xEF04965B398D6211 # gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>" # Primary key fingerprint: 215D 46F4 8246 689E C77F 3562 EF04 965B 398D 6211 * jasowang/tags/net-pull-request: e1000e: Fix ICR "Other" causes clear logic net/filter-rewriter: Remove unused option in filter-rewriter net/filter-mirror.c: Rename filter_mirror_send() and fix codestyle net/filter-mirror.c: Remove duplicate check code. hmp / net: Mark host_net_add/remove as deprecated COLO-compare: Improve tcp compare trace event readability virtio-net: fix wild pointer when remove virtio-net queues net/dump: Issue a warning for the deprecated "-net dump" net/tap: Replace tap-haiku.c and tap-aix.c by a generic tap-stub.c Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-23 15:01:31 +01:00
Eduardo Habkost	8c1bc1e9d7	qapi-schema: Remove obsolete note from ObjectTypeInfo The "This command is experimental" note in ObjectTypeInfo has been obsolete since 2012. Commit `5192082097` removed the warning from the qom-list-types command documentation, but we forgot to remove the warning from ObjectTypeInfo. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20170516205351.12101-1-ehabkost@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2017-05-23 13:28:17 +02:00
Eric Blake	579cf1d104	block: Use QDict helpers for --force-share Fam's additions of --force-share in commits `459571f7` and `335e9937` were developed prior to the addition of QDict scalar insertion macros, but merged after the general cleanup in commit `46f5ac20`. Patch created mechanically by rerunning: spatch --sp-file scripts/coccinelle/qobject.cocci \ --macro-file scripts/cocci-macro-file.h --dir . --in-place Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20170515195439.17677-1-eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2017-05-23 13:28:17 +02:00
Eric Blake	08fba7ac9b	shutdown: Expose bool cause in SHUTDOWN and RESET events Libvirt would like to be able to distinguish between a SHUTDOWN event triggered solely by guest request and one triggered by a SIGTERM or other action on the host. While qemu_kill_report() was already able to give different output to stderr based on whether a shutdown was triggered by a host signal (but NOT by a host UI event, such as clicking the X on the window), that information was then lost to management. The previous patches improved things to use an enum throughout all callsites, so now we have something ready to expose through QMP. Note that for now, the decision was to expose ONLY a boolean, rather than promoting ShutdownCause to a QAPI enum; this is because libvirt has not expressed an interest in anything finer-grained. We can still add additional details, in a backwards-compatible manner, if a need later arises (if the addition happens before 2.10, we can replace the bool with an enum; otherwise, the enum will have to be in addition to the bool); this patch merely adds a helper shutdown_caused_by_guest() to map the internal enum into the external boolean. Update expected iotest outputs to match the new data (complete coverage of the affected tests is obtained by -raw, -qcow2, and -nbd). Here is output from 'virsh qemu-monitor-event --loop' with the patch installed: event SHUTDOWN at 1492639680.731251 for domain fedora_13: {"guest":true} event STOP at 1492639680.732116 for domain fedora_13: <null> event SHUTDOWN at 1492639680.732830 for domain fedora_13: {"guest":false} Note that libvirt runs qemu with -no-shutdown: the first SHUTDOWN event was triggered by an action I took directly in the guest (shutdown -h), at which point qemu stops the vcpus and waits for libvirt to do any final cleanups; the second SHUTDOWN event is the result of libvirt sending SIGTERM now that it has completed cleanup. 
Libvirt is already smart enough to only feed the first qemu SHUTDOWN event to the end user (remember, virsh qemu-monitor-event is a low-level debugging interface that is explicitly unsupported by libvirt, so it sees things that normal end users do not); changing qemu to emit SHUTDOWN only once is outside the scope of this series. See also https://bugzilla.redhat.com/1384007 Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20170515214114.15442-6-eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2017-05-23 13:28:17 +02:00
Eric Blake	cf83f14005	shutdown: Add source information to SHUTDOWN and RESET Time to wire up all the call sites that request a shutdown or reset to use the enum added in the previous patch. It would have been less churn to keep the common case with no arguments as meaning guest-triggered, and only modified the host-triggered code paths, via a wrapper function, but then we'd still have to audit that I didn't miss any host-triggered spots; changing the signature forces us to double-check that I correctly categorized all callers. Since command line options can change whether a guest reset request causes an actual reset vs. a shutdown, it's easy to also add the information to reset requests. Signed-off-by: Eric Blake <eblake@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> [ppc parts] Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> [SPARC part] Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> [s390x parts] Message-Id: <20170515214114.15442-5-eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2017-05-23 13:28:17 +02:00
Eric Blake	802f045a5f	shutdown: Preserve shutdown cause through replay With the recent addition of ShutdownCause, we want to be able to pass a cause through any shutdown request, and then faithfully replay that cause when later replaying the same sequence. The easiest way is to expand the replay event mechanism to track a series of values for EVENT_SHUTDOWN, one corresponding to each value of ShutdownCause. We are free to change the replay stream as needed, since there are already no guarantees about being able to use a replay stream by any other version of qemu than the one that generated it. The cause is not actually fed back until the next patch changes the signature for requesting a shutdown; a TODO marks that upcoming change. Yes, this uses the gcc/clang extension of a ranged case label, but this is not the first time we've used non-C99 constructs. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru> Message-Id: <20170515214114.15442-4-eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2017-05-23 13:28:17 +02:00
Eric Blake	aedbe19297	shutdown: Prepare for use of an enum in reset/shutdown_request We want to track why a guest was shut down; in particular, being able to tell the difference between a guest request (such as ACPI request) and host request (such as SIGINT) will prove useful to libvirt. Since all requests eventually end up changing shutdown_requested in vl.c, the logical change is to make that value track the reason, rather than its current 0/1 contents. Since command-line options control whether a reset request is turned into a shutdown request instead, the same treatment is given to reset_requested. This patch adds an internal enum ShutdownCause that describes reasons that a shutdown can be requested, and changes qemu_system_reset() to pass the reason through, although for now nothing is actually changed with regard to what gets reported. The enum could be exported via QAPI at a later date, if deemed necessary, but for now, there has not been a request to expose that much detail to end clients. For the most part, we turn 0 into SHUTDOWN_CAUSE_NONE, and 1 into SHUTDOWN_CAUSE_HOST_ERROR; the only specific case where we have enough information right now to use a different value is when we are reacting to a host signal. It will take a further patch to edit all call-sites that can trigger a reset or shutdown request to properly pass in any other reasons; this patch includes TODOs to point such places out. qemu_system_reset() trades its 'bool report' parameter for a 'ShutdownCause reason', with all non-zero values having the same effect; this lets us get rid of the weird #defines for VMRESET_* as synonyms for bools. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20170515214114.15442-3-eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2017-05-23 13:28:17 +02:00
Eric Blake	7af88279e4	shutdown: Simplify shutdown_signal There is no signal 0 (kill(pid, 0) has special semantics to probe whether a process is alive, rather than actually sending a signal 0). So we can use the simpler 0, instead of -1, for our sentinel of whether a shutdown request due to a signal has happened. Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@xilinx.com> Message-Id: <20170515214114.15442-2-eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2017-05-23 13:28:17 +02:00
Markus Armbruster	fc0f005958	sockets: Plug memory leak in socket_address_flatten() socket_address_flatten() leaks a SocketAddress when its argument is null. Happens when opening a ChardevBackend of type 'udp' that is configured without a local address. Screwed up in commit `bd269ebc` due to last minute semantic conflict resolution. Spotted by Coverity. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1494866344-11013-1-git-send-email-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-05-23 13:28:17 +02:00
Greg Kurz	fe2f74af2b	scripts/qmp/qom-set: fix the value argument passed to srv.command() When invoking the script with -s, we end up passing a bogus value to QEMU: $ ./scripts/qmp/qom-set -s /var/tmp/qmp-sock-exp /machine.accel kvm {} $ ./scripts/qmp/qom-get -s /var/tmp/qmp-sock-exp /machine.accel /var/tmp/qmp-sock-exp This happens because sys.argv[2] isn't necessarily the command line argument that holds the value. It is sys.argv[4] when -s was also passed. Actually, the code already has a variable to handle that. This patch simply uses it. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <149373610338.5144.9635049015143453288.stgit@bahia.lan> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2017-05-23 13:28:17 +02:00
Sameeh Jubran	82342e91b6	e1000e: Fix ICR "Other" causes clear logic This commit fixes a bug which causes the guest to hang. The bug was observed upon a "receive overrun" (bit #6 of the ICR register) interrupt which could be triggered post migration in a heavy traffic environment. Even though the "receive overrun" bit (#6) is masked out by the IMS register (refer to the log below), the driver still receives an interrupt, because the "receive overrun" bit (#6) causes the "Other" bit (bit #24 of the ICR register) to be set, as documented below. The driver handles the interrupt and clears the "Other" bit (#24) but doesn't clear the "receive overrun" bit (#6), which leads to an infinite loop. Apparently the Windows driver expects the "receive overrun" bit and the other ones documented below to be cleared when the "Other" bit (#24) is cleared. So to sum that up: 1. Bit #6 of the ICR register is set by heavy traffic 2. As a result of setting bit #6, bit #24 is set 3. The driver receives an interrupt for bit #24 (it doesn't receive an interrupt for bit #6 as it is masked out by IMS) 4. The driver handles and clears the interrupt of bit #24 5. Bit #6 is still set. 6. Step 2 happens all over again The Interrupt Cause Read - ICR register: The ICR has the "Other" bit - bit #24 - that is set when one or more of the following ICR register's bits are set: LSC - bit #2, RXO - bit #6, MDAC - bit #9, SRPD - bit #16, ACK - bit #17, MNG - bit #18 This bug can occur with any of these bits depending on the driver's behaviour and the way it configures the device. However, attempts to reproduce it with any bit other than RXO failed, as the drivers don't implement most of these bits; trying to reproduce it with the LSC (Link Status Change - bit #2) bit didn't succeed either, as it seems that Windows handles this bit differently. 
Log sample of the storm: 27563@1494850819.411877:e1000e_irq_pending_interrupts ICR PENDING: 0x1000000 (ICR: 0x815000c2, IMS: 0x1a00004) 27563@1494850819.411900:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa00004) 27563@1494850819.411915:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa00004) 27563@1494850819.412380:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa00004) 27563@1494850819.412395:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa00004) 27563@1494850819.412436:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa00004) 27563@1494850819.412441:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa00004) 27563@1494850819.412998:e1000e_irq_pending_interrupts ICR PENDING: 0x1000000 (ICR: 0x815000c2, IMS: 0x1a00004) * This bug behaviour wasn't observed with the Linux driver. This commit solves: https://bugzilla.redhat.com/show_bug.cgi?id=1447935 https://bugzilla.redhat.com/show_bug.cgi?id=1449490 Cc: qemu-stable@nongnu.org Signed-off-by: Sameeh Jubran <sjubran@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2017-05-23 10:10:38 +08:00
Zhang Chen	61fcc16af6	net/filter-rewriter: Remove unused option in filter-rewriter Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2017-05-23 10:10:38 +08:00
Zhang Chen	e05dc4cf56	net/filter-mirror.c: Rename filter_mirror_send() and fix codestyle Because filter_mirror_receive_iov() and filter_redirector_receive_iov() both use filter_mirror_send() to send packets, rename filter_mirror_send() to filter_send(), which is a more generic name. Also fix some code style issues. Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2017-05-23 10:10:38 +08:00
Zhang Chen	e2f8401638	net/filter-mirror.c: Remove duplicate check code. s->outdev is already checked in filter_mirror_set_outdev(). Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2017-05-23 10:10:38 +08:00
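
The interrupt storm described in commit 82342e91b6 (e1000e: Fix ICR "Other" causes clear logic) can be reproduced with a few lines of bit arithmetic. The sketch below is a toy model in Python, not QEMU's C implementation: the bit positions and the ICR/IMS values come from the commit message and its log sample, while the helper names (`update_other`, `clear_causes`, `pending`) and the `clear_other_causes` switch are hypothetical and exist only for illustration. It assumes the fix amounts to clearing the listed cause bits whenever the driver clears "Other", which is what the commit message says the Windows driver expects.

```python
# Bit positions as listed in the commit message for 82342e91b6.
LSC, RXO, MDAC, SRPD, ACK, MNG, OTHER = 2, 6, 9, 16, 17, 18, 24

# Mask of the cause bits that raise the "Other" bit.
OTHER_CAUSES = sum(1 << bit for bit in (LSC, RXO, MDAC, SRPD, ACK, MNG))
OTHER_BIT = 1 << OTHER


def update_other(icr: int) -> int:
    """Set "Other" whenever at least one of its cause bits is set."""
    return (icr | OTHER_BIT) if (icr & OTHER_CAUSES) else icr


def clear_causes(icr: int, written: int, clear_other_causes: bool) -> int:
    """Model a driver write that clears the bits given in `written`.

    With clear_other_causes=False, only the written bits are cleared, so a
    still-set cause such as RXO immediately raises "Other" again.
    With clear_other_causes=True (the assumed fix), clearing "Other" also
    clears every cause bit that feeds it.
    """
    icr &= ~written
    if clear_other_causes and (written & OTHER_BIT):
        icr &= ~OTHER_CAUSES
    return update_other(icr)


def pending(icr: int, ims: int) -> int:
    """Causes that actually raise an interrupt: ICR masked by IMS."""
    return icr & ims


if __name__ == "__main__":
    icr, ims = 0x815000C2, 0x1A00004  # values from the log sample
    print(hex(pending(icr, ims)))                            # "Other" is pending
    print(hex(clear_causes(icr, OTHER_BIT, False)))          # "Other" comes back
    print(hex(pending(clear_causes(icr, OTHER_BIT, True), ims)))  # storm ends
```

In this model, RXO (bit #6) never reaches the driver because IMS masks it out, yet it keeps re-raising "Other" (bit #24) until the cause bits are cleared together with it.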


53554 Commits