Switch the mss-timer code away from bottom-half based ptimers to
the new transaction-based ptimer API. This just requires adding
begin/commit calls around the various places that modify the ptimer
state, and using the new ptimer_init() function to create the timer.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20191008171740.9679-20-peter.maydell@linaro.org
Provide the new transaction-based API. If a ptimer is created
using ptimer_init() rather than ptimer_init_with_bh(), then
instead of providing a QEMUBH, it provides a pointer to the
callback function directly, and has opted into the transaction
API. All calls to functions which modify ptimer state:
- ptimer_set_period()
- ptimer_set_freq()
- ptimer_set_limit()
- ptimer_set_count()
- ptimer_run()
- ptimer_stop()
must be between matched calls to ptimer_transaction_begin()
and ptimer_transaction_commit(). When ptimer_transaction_commit()
is called it will evaluate the state of the timer after all the
changes in the transaction, and call the callback if necessary.
In the old API the individual update functions generally would
call ptimer_trigger() immediately, which would schedule the QEMUBH.
In the new API the update functions will instead defer the
"set s->next_event and call ptimer_reload()" work to
ptimer_transaction_commit().
Because ptimer_trigger() can now immediately call into the
device code which may then call other ptimer functions that
update ptimer_state fields, we must be more careful in
ptimer_reload() not to cache fields from ptimer_state across
the ptimer_trigger() call. (This was harmless with the QEMUBH
mechanism as the BH would not be invoked until much later.)
We use assertions to check that:
* the functions modifying ptimer state are not called outside
a transaction block
* ptimer_transaction_begin() and _commit() calls are paired
* the transaction API is not used with a QEMUBH ptimer
There is some slight repetition of code:
* most of the set functions have similar looking "if s->bh
call ptimer_reload, otherwise set s->need_reload" code
* ptimer_init() and ptimer_init_with_bh() have similar code
We deliberately don't try to avoid this repetition, because
it will all be deleted when the QEMUBH version of the API
is removed.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20191008171740.9679-3-peter.maydell@linaro.org
Currently the ptimer design uses a QEMU bottom-half as its
mechanism for calling back into the device model using the
ptimer when the timer has expired. Unfortunately this design
is fatally flawed, because it means that there is a lag
between the ptimer updating its own state and the device
callback function updating device state, and guest accesses
to device registers between the two can return inconsistent
device state.
We want to replace the bottom-half design with one where
the guest device's callback is called either immediately
(when the ptimer triggers by timeout) or when the device
model code closes a transaction-begin/end section (when the
ptimer triggers because the device model changed the
ptimer's count value or other state). As the first step,
rename ptimer_init() to ptimer_init_with_bh(), to free up
the ptimer_init() name for the new API. We can then convert
all the ptimer users away from ptimer_init_with_bh() before
removing it entirely.
(Commit created with
git grep -l ptimer_init | xargs sed -i -e 's/ptimer_init/ptimer_init_with_bh/'
and three overlong lines folded by hand.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20191008171740.9679-2-peter.maydell@linaro.org
Update the headers against commit:
0f1a7b3fac05 ("timer-of: don't use conditional expression
with mixed 'void' types")
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Message-id: 20191003154640.22451-2-eric.auger@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
- block: Fix crash with qcow2 partial cluster COW with small cluster
sizes (misaligned write requests with BDRV_REQ_NO_FALLBACK)
- qcow2: Fix integer overflow potentially causing corruption with huge
requests
- vhdx: Detect truncated image files
- tools: Support help options for --object
- Various block-related replay improvements
- iotests/028: Fix for long $TEST_DIRs
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJdpJwuAAoJEH8JsnLIjy/W4EIP/ieDG6LwYIkxk6UPAnPm2ZtT
jEBj1jZNbPPMVPCzeVryS2qgoSB4ItnKlSpbW9Z+DX2EL8gr/sHZot4J6BXkxaJV
mxHa4KCLZUYHfLbVBX+ubb/fufu2ItGUtY2LcdxyWASk/6mWKYVivPunwaK4gEiD
CBl9RgSMSy1bpBGIky2NMylOCr5KlSeweH/XL3J3Jodpp/3nnDpZ96iy953R7Bes
OpwZEsDHQmYhDufPqIWVzyq2hEUn22kGe/Rn2KfOHO7SR6aqr+DqtUwlDE9mxxBf
hFhm6bqbJhHQOTMbTrzqiYwDirR4S5/FlynI9+YbngU9fnkbCIOheTKL5M8PlSow
H+ZUtmU1Avp0wG3RZVmtCT9upFV7hpC4/fiMr8bdXCyuWy/7d7WB1G4e9ELiX7uo
VCl6gVviDQbEgnoNS7v6JbP/xjhHuu7Fxh5K0xgT6wtwP53cBqbxORMkwv2u3zCI
QRuiKOHZW3wv8tdRP/5qhdtIxTy6w20v/lAO/s0Xqn8YlnyfrH71LCNWmG4MOfgP
ZXwCv9nxpzVsTPU2nLowl0avCwmDVY8Iv/0sN+eybo8xp47pCmPV9dKa0rJ+RhFe
N5blnnwsmJFPW+QD5gBZn7eH3jafHxN2URhG3cwNdWxQS6SiTcVXDsTfn5YB6PjN
Tb9P2aUJYw94BvUJF2c+
=gCHy
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches:
- block: Fix crash with qcow2 partial cluster COW with small cluster
sizes (misaligned write requests with BDRV_REQ_NO_FALLBACK)
- qcow2: Fix integer overflow potentially causing corruption with huge
requests
- vhdx: Detect truncated image files
- tools: Support help options for --object
- Various block-related replay improvements
- iotests/028: Fix for long $TEST_DIRs
# gpg: Signature made Mon 14 Oct 2019 17:02:54 BST
# gpg: using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream:
iotests: Test large write request to qcow2 file
qcow2: Limit total allocation range to INT_MAX
qemu-nbd: Support help options for --object
qemu-img: Support help options for --object
qemu-io: Support help options for --object
vl: Split off user_creatable_print_help()
iotests/028: Fix for long $TEST_DIRs
block: Reject misaligned write requests with BDRV_REQ_NO_FALLBACK
replay: add BH oneshot event for block layer
replay: finish record/replay before closing the disks
replay: don't drain/flush bdrv queue while RR is working
replay: update docs for record/replay with block devices
replay: disable default snapshot for record/replay
block: implement bdrv_snapshot_goto for blkreplay
block/vhdx: add check for truncated image files
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Printing help for --object is something that we not only want in the
system emulator, but also in tools that support --object. Move it into a
separate function in qom/object_interfaces.c to make the code accessible
for tools.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Replay is capable of recording normal BH events, but sometimes
there are single use callbacks scheduled with aio_bh_schedule_oneshot
function. This patch enables recording and replaying such callbacks.
Block layer uses these events for calling the completion function.
Replaying these calls makes the execution deterministic.
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Mostly cleanups and minor fixes
[Note I'm seeing a hang on the aarch64 hosted x86-64 tcg migration
test in xbzrle; but I'm seeing that on current head as well]
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEERfXHG0oMt/uXep+pBRYzHrxb/ecFAl2g1JcACgkQBRYzHrxb
/efl1RAAjYukmf+kCFCw4Ws6nJ4000O85mpj0117SJpgTck1ivTC968REpl5pD0C
aHDzamNW82fiqjRxwF6KJRWic217NrmR1Z/j++SDyIjOc1ERQdB+RdCc7T2NkBT5
2HiPaceNiu9wOpqX/bto/xAug9vAxq5/1jeq+vhKxd+IcvAZII0SwKWn9mWA2209
H4i3v8OCv9isT6MRNitfWT/giYkI5HwFzA9a13S+zXioEGnoAmqzrrAQs2/MkyDt
bIeLbZyonH9hKbdrwmIXCvNEHA32BOPQyrsRp9CPZwRKVP2AzRYU9K9UjKncmYJS
bPdLYFmqEQm8ILQI6lyJ+pW1r/cyAUQBQii6NA+9ZfimxCSB06ArU+JeM0csl7HV
b4cG/bENFmtOzaoc3SrE6t1APlTiS9nxW6iH8zW3ozMEQGGihru7/6VIlwKTOfeX
kXKF92FTiTBpJ1u3/t05TPnxo4c2bKWM+Gj1okDAUsP8HovQpvJa8r92n1cC0+l8
l3pkFnrejzTcrexWIiKXYnPnO7Ez/Dm+0aCzlQkX7DSFxDnwI2T/BYk21FNlcI/L
rCHnkSLjYMWPelTLo9ZNuFaKL9UMeMtLPaIU9NBSSmsQ32/d8EXpDQwe8uAq+9Z/
qBir/mKyDe7I/InumtWQS46SS1/E1VyxDG2dxRWK9lN8DDOXRlM=
=Jouv
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20191011a' into staging
Migration pull 2019-10-11
Mostly cleanups and minor fixes
[Note I'm seeing a hang on the aarch64 hosted x86-64 tcg migration
test in xbzrle; but I'm seeing that on current head as well]
# gpg: Signature made Fri 11 Oct 2019 20:14:31 BST
# gpg: using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>" [full]
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7
* remotes/dgilbert/tags/pull-migration-20191011a: (21 commits)
migration: Support gtree migration
migration/multifd: pages->used would be cleared when attach to multifd_send_state
migration/multifd: initialize packet->magic/version once at setup stage
migration/multifd: use pages->allocated instead of the static max
migration/multifd: fix a typo in comment of multifd_recv_unfill_packet()
migration/postcopy: check PostcopyState before setting to POSTCOPY_INCOMING_RUNNING
migration/postcopy: rename postcopy_ram_enable_notify to postcopy_ram_incoming_setup
migration/postcopy: postpone setting PostcopyState to END
migration/postcopy: mis->have_listen_thread check will never be touched
migration: report SaveStateEntry id and name on failure
migration: pass in_postcopy instead of check state again
migration/postcopy: fix typo in mark_postcopy_blocktime_begin's comment
migration/postcopy: map large zero page in postcopy_ram_incoming_setup()
migration/postcopy: allocate tmp_page in setup stage
migration: Don't try and recover return path in non-postcopy
rcu: Use automatic rc_read unlock in core memory/exec code
migration: Use automatic rcu_read unlock in rdma.c
migration: Use automatic rcu_read unlock in ram.c
migration: Fix missing rcu_read_unlock
rcu: Add automatically released rcu_read_lock variants
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Introduce support for GTree migration. A custom save/restore
is implemented. Each item is made of a key and a data.
If the key is a pointer to an object, 2 VMSDs are passed into
the GTree VMStateField.
When putting the items, the tree is traversed in sorted order by
g_tree_foreach.
On the get() path, gtrees must be allocated using the proper
key compare, key destroy and value destroy. This must be handled
beforehand, for example in a pre_load method.
Tests are added to test save/dump of structs containing gtrees
including the virtio-iommu domain/mappings scenario.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20191011121724.433-1-eric.auger@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
uintptr_t fixup for test on 32bit
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20191007143642.301445-6-dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
RCU_READ_LOCK_GUARD() takes the rcu_read_lock and then uses glib's
g_auto infrastructure (and thus whatever the compiler's hooks are) to
release it on all exits of the block.
WITH_RCU_READ_LOCK_GUARD() is similar but is used as a wrapper for the
lock, i.e.:
WITH_RCU_READ_LOCK_GUARD() {
stuff under lock
}
Note the 'unused' attribute is needed to work around clang bug:
https://bugs.llvm.org/show_bug.cgi?id=43482
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20191007143642.301445-2-dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Wrap the check into a function to make it easy to read.
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Message-Id: <20190717005341.14140-1-richardw.yang@linux.intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
MVCL is interruptible and we should check for interrupts and process
them after writing back the variables to the registers. Let's check
for any exit requests and exit to the main loop. Introduce a new helper
function for that: cpu_loop_exit_requested().
When booting Fedora 30, I can see a handful of these exits and it seems
to work reliable. Also, Richard explained why this works correctly even
when MVCL is called via EXECUTE:
(1) TB with EXECUTE runs, at address Ae
- env->psw_addr stored with Ae.
- helper_ex() runs, memory address Am computed
from D2a(X2a,B2a) or from psw.addr+RI2.
- env->ex_value stored with memory value modified by R1a
(2) TB of executee runs,
- env->ex_value stored with 0.
- helper_mvcl() runs, using and updating R1b, R1b+1, R2b, R2b+1.
(3a) helper_mvcl() completes,
- TB of executee continues, psw.addr += ilen.
- Next instruction is the one following EXECUTE.
(3b) helper_mvcl() exits to main loop,
- cpu_loop_exit_restore() unwinds psw.addr = Ae.
- Next instruction is the EXECUTE itself...
- goto 1.
As the PoP mentiones that an interruptible instruction called via EXECUTE
should avoid modifying storage/registers that are used by EXECUTE itself,
it is fine to retrigger EXECUTE.
Cc: Alex Bennée <alex.bennee@linaro.org>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Drop write notifiers and use filter node instead.
= Changes =
1. Add filter-node-name argument for backup qmp api. We have to do it
in this commit, as 257 needs to be fixed.
2. There are no more write notifiers here, so is_write_notifier
parameter is dropped from block-copy paths.
3. To sync with in-flight requests at job finish we now have drained
removing of the filter, we don't need rw-lock.
4. Block-copy is now using BdrvChildren instead of BlockBackends
5. As backup-top owns these children, we also move block-copy state
into backup-top's ownership.
= Iotest changes =
56: op-blocker doesn't shoot now, as we set it on source, but then
check on filter, when trying to start second backup.
To keep the test we instead can catch another collision: both jobs will
get 'drive0' job-id, as job-id parameter is unspecified. To prevent
interleaving with file-posix locks (as they are dependent on config)
let's use another target for second backup.
Also, it's obvious now that we'd like to drop this op-blocker at all
and add a test-case for two backups from one node (to different
destinations) actually works. But not in these series.
141: Output changed: prepatch, "Node is in use" comes from bdrv_has_blk
check inside qmp_blockdev_del. But we've dropped block-copy blk
objects, so no more blk objects on source bs (job blk is on backup-top
filter bs). New message is from op-blocker, which is the next check in
qmp_blockdev_add.
257: The test wants to emulate guest write during backup. They should
go to filter node, not to original source node, of course. Therefore we
need to specify filter node name and use it.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20191001131409.14202-6-vsementsov@virtuozzo.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Split block_copy_set_callbacks out of block_copy_state_new. It's needed
for further commit: block-copy will use BdrvChildren of backup-top
filter, so it will be created from backup-top filter creation function.
But callbacks will still belong to backup job and will be set in
separate.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20191001131409.14202-4-vsementsov@virtuozzo.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Move synchronization mechanism to block-copy, to be able to use one
block-copy instance from backup job and backup-top filter in parallel.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20191001131409.14202-2-vsementsov@virtuozzo.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
A block driver can provide a callback to report driver-specific
statistics.
file-posix driver now reports discard statistics
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20190923121737.83281-10-anton.nefedov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Each block_acct_done/failed call is designed to correspond to a
previous block_acct_start call, which initializes the stats cookie.
However sometimes it is not the case, e.g. some error paths might
report the same cookie twice because it is hard to accurately track if
the cookie was reported yet or not.
This patch cleans the cookie after report.
(Note: block_acct_failed/done without a previous block_acct_start at
all should be avoided. Uninitialized cookie might hold a garbage value
and there is still "< BLOCK_MAX_IOTYPE" assertion for that)
It will be particularly useful in ide code where it's hard to
keep track whether the request done its accounting or not: in the
following patch of the series, trim requests will do the accounting
separately.
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190923121737.83281-4-anton.nefedov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20190923121737.83281-3-anton.nefedov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Split block_copy to separate file, to be cleanly shared with backup-top
filter driver in further commits.
It's a clean movement, the only change is drop "static" from interface
functions.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190920142056.12778-8-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Common interface for aio task loops. To be used for improving
performance of synchronous io loops in qcow2, block-stream,
copy-on-read, and may be other places.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190916175324.18478-3-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Here's the next batch of ppc and spapr patches. Includes:
* Fist part of a large cleanup to irq infrastructure
* Recreate the full FDT at CAS time, instead of making a difficult
to follow set of updates. This will help us move towards
eliminating CAS reboots altogether
* No longer provide RTAS blob to SLOF - SLOF can include it just as
well itself, since guests will generally need to relocate it with
a call to instantiate-rtas
* A number of DFP fixes and cleanups from Mark Cave-Ayland
* Assorted bugfixes
* Several new small devices for powernv
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAl2XEn0ACgkQbDjKyiDZ
s5I6bA/7B5sjY/QxuE8axm5KupoAnE8zf205hN8mbYASwtDfFwgaeNreVaOSJUpr
fgcx/g9G3rAryGZv3O6i02+wcRgNw1DnJ3ynCthIrExZEcfbTYJiS4s9apwPEQy8
HFmBNdPDqrhFI0aFvXEUauiOp1aapPUUklm34eFscs94lJXxphRUEfa3XT5uEhUh
xrIZwYq20A+ih4UHwk3Onyx/cvFpl6BRB2nVEllQFqzwF5eTTfz9t8+JGTebxD/7
8qqt8ti0KM3wxSDTQnmyMUmpgy+C1iCvNYvv6nWFg+07QuGs48EHlQUUVVni4r9j
kUrDwKS2eC+8e8gP/xdIXEq3R2DsAMq+wFIswXZ3X6x4DoUV0OAJSHc9iMD4l+pr
LyWnVpDprc6XhJHWKpuHZ5w9EuBnZFbIXdlZGFno+8UvXtusnbbuwAZzHTrRJRqe
/AWVpFwGAoOF4KxIOFlPVBI8m4vFad/soVojC0vzIbRqaogOFZAjiL/yD5GwLmMa
tywOEMBUJ/j2lgudTCyKn5uCa/Ew3DS1TSdenJjyqRi/gZM0IaORIhJhyFYW/eO1
U7Uh8BnbC+4J11wwvFR5+W789dgM2+EEtAX9uI08VcE/R2ASabZlN4Zwrl0w4cb/
VRybMT4bgmjzHRpfrqYPxpn8wqPcIw0BCeipSOjY3QU1Q25TEYQ=
=PXXe
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.2-20191004' into staging
ppc patch queue 2019-10-04
Here's the next batch of ppc and spapr patches. Includes:
* Fist part of a large cleanup to irq infrastructure
* Recreate the full FDT at CAS time, instead of making a difficult
to follow set of updates. This will help us move towards
eliminating CAS reboots altogether
* No longer provide RTAS blob to SLOF - SLOF can include it just as
well itself, since guests will generally need to relocate it with
a call to instantiate-rtas
* A number of DFP fixes and cleanups from Mark Cave-Ayland
* Assorted bugfixes
* Several new small devices for powernv
# gpg: Signature made Fri 04 Oct 2019 10:35:57 BST
# gpg: using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full]
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full]
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full]
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown]
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-4.2-20191004: (53 commits)
ppc/pnv: Remove the XICSFabric Interface from the POWER9 machine
spapr: Eliminate SpaprIrq::init hook
spapr: Add return value to spapr_irq_check()
spapr: Use less cryptic representation of which irq backends are supported
xive: Improve irq claim/free path
spapr, xics, xive: Better use of assert()s on irq claim/free paths
spapr: Handle freeing of multiple irqs in frontend only
spapr: Remove unhelpful tracepoints from spapr_irq_free_xics()
spapr: Eliminate SpaprIrq:get_nodename method
spapr: Simplify spapr_qirq() handling
spapr: Fix indexing of XICS irqs
spapr: Eliminate nr_irqs parameter to SpaprIrq::init
spapr: Clarify and fix handling of nr_irqs
spapr: Replace spapr_vio_qirq() helper with spapr_vio_irq_pulse() helper
spapr: Fold spapr_phb_lsi_qirq() into its single caller
xics: Create sPAPR specific ICS subtype
xics: Merge TYPE_ICS_BASE and TYPE_ICS_SIMPLE classes
xics: Eliminate reset hook
xics: Rename misleading ics_simple_*() functions
xics: Eliminate 'reject', 'resend' and 'eoi' class hooks
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
VMX requires 64-bit feature words for the IA32_VMX_EPT_VPID_CAP
and IA32_VMX_BASIC MSRs. (The VMX control MSRs are 64-bit wide but
actually have only 32 bits of information).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently, when a notifier is attempted to be registered and its
flags are not supported (especially the MAP one) by the IOMMU MR,
we generally abruptly exit in the IOMMU code. The failure could be
handled more nicely in the caller and especially in the VFIO code.
So let's allow memory_region_register_iommu_notifier() to fail as
well as notify_flag_changed() callback.
All sites implementing the callback are updated. This patch does
not yet remove the exit(1) in the amd_iommu code.
in SMMUv3 we turn the warning message into an error message saying
that the assigned device would not work properly.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The container error integer field is currently used to store
the first error potentially encountered during any
vfio_listener_region_add() call. However this fails to propagate
detailed error messages up to the vfio_connect_container caller.
Instead of using an integer, let's use an Error handle.
Messages are slightly reworded to accomodate the propagation.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This method is used to set up the interrupt backends for the current
configuration. However, this means some confusing redirection between
the "dual" mode init and the init hooks for xics only and xive only modes.
Since we now have simple flags indicating whether XICS and/or XIVE are
supported, it's easier to just open code each initialization directly in
spapr_irq_init(). This will also make some future cleanups simpler.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
SpaprIrq::ov5 stores the value for a particular byte in PAPR option vector
5 which indicates whether XICS, XIVE or both interrupt controllers are
available. As usual for PAPR, the encoding is kind of overly complicated
and confusing (though to be fair there are some backwards compat things it
has to handle).
But to make our internal code clearer, have SpaprIrq encode more directly
which backends are available as two booleans, and derive the OV5 value from
that at the point we need it.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
spapr_xive_irq_claim() returns a bool to indicate if it succeeded.
But most of the callers and one callee use int return values and/or an
Error * with more information instead. In any case, ints are a more
common idiom for success/failure states than bools (one never knows
what sense they'll be in).
So instead change to an int return value to indicate presence of error
+ an Error * to describe the details through that call chain.
It also didn't actually check if the irq was already claimed, which is
one of the primary purposes of the claim path, so do that.
spapr_xive_irq_free() also returned a bool... which no callers checked
and was always true, so just drop it.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
spapr_irq_free() can be used to free multiple irqs at once. That's useful
for its callers, but there's no need to make the individual backend hooks
handle this. We can loop across the irqs in spapr_irq_free() itself and
have the hooks just do one at time.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
This method is used to determine the name of the irq backend's node in the
device tree, so that we can find its phandle (after SLOF may have modified
it from the phandle we initially gave it).
But, in the two cases the only difference between the node name is the
presence of a unit address. Searching for a node name without considering
unit address is standard practice for the device tree, and
fdt_subnode_offset() will do exactly that, making this method unecessary.
While we're there, remove the XICS_NODENAME define. The name
"interrupt-controller" is required by PAPR (and IEEE1275), and a bunch of
places assume it already.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Currently spapr_qirq(), whic is used to find the qemu_irq for an spapr
global irq number, redirects through the SpaprIrq::qirq method. But
the array of qemu_irqs is allocated in the PAPR layer, not the
backends, and so the method implementations all return the same thing,
just differing in the preliminary checks they make.
So, we can remove the method, and just implement spapr_qirq() directly,
including all the relevant checks in one place. We change all those
checks into assert()s as well, since a failure here indicates an error in
the calling code.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
The only reason this parameter was needed was to work around the
inconsistent meaning of nr_irqs between xics and xive. Now that we've
fixed that, we can consistently use the number directly in the SpaprIrq
configuration.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Both the XICS and XIVE interrupt backends have a "nr-irqs" property, but
it means slightly different things. For XICS (or, strictly, the ICS) it
indicates the number of "real" external IRQs. Those start at XICS_IRQ_BASE
(0x1000) and don't include the special IPI vector. For XIVE, however, it
includes the whole IRQ space, including XIVE's many IPI vectors.
The spapr code currently doesn't handle this sensibly, with the
nr_irqs value in SpaprIrq having different meanings depending on the
backend. We fix this by renaming nr_irqs to nr_xirqs and making it
always indicate just the number of external irqs, adjusting the value
we pass to XIVE accordingly. We also move to using common constants
in most of the irq configurations, to make it clearer that the IRQ
space looks the same to the guest (and emulated devices), even if the
backend is different.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Every caller of spapr_vio_qirq() immediately calls qemu_irq_pulse() with
the result, so we might as well just fold that into the helper.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
No point having a two-line helper that's used exactly once, and not likely
to be used anywhere else in future.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
We create a subtype of TYPE_ICS specifically for sPAPR. For now all this
does is move the setup of the PAPR specific hcalls and RTAS calls to
the realize() function for this, rather than requiring the PAPR code to
explicitly call xics_spapr_init(). In future it will have some more
function.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
TYPE_ICS_SIMPLE is the only subtype of TYPE_ICS_BASE that's ever
instantiated. The existence of different classes is mostly a hang
over from when we (misguidedly) had separate subtypes for the KVM and
non-KVM version of the device.
There could be some call for an abstract base type for ICS variants
that use a different representation of their state (PowerNV PHB3 might
want this). The current split isn't really in the right place for
that though. If we need this in future, we can re-implement it more
in line with what we actually need.
So, collapse the two classes together into just TYPE_ICS.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Currently TYPE_XICS_BASE and TYPE_XICS_SIMPLE have their own reset methods,
using the standard technique for having the subtype call the supertype's
methods before doing its own thing.
But TYPE_XICS_SIMPLE is the only subtype of TYPE_XICS_BASE ever
instantiated, so there's no point having the split here. Merge them
together into just an ics_reset() function.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
There are a number of ics_simple_*() functions that aren't actually
specific to TYPE_XICS_SIMPLE at all, and are equally valid on
TYPE_XICS_BASE. Rename them to ics_*() accordingly.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Currently ics_reject(), ics_resend() and ics_eoi() indirect through
class methods. But there's only one implementation of each method,
the one in TYPE_ICS_SIMPLE. TYPE_ICS_BASE has no implementation, but
it's never instantiated, and has no other subtypes.
So clean up by eliminating the method and just having ics_reject(),
ics_resend() and ics_eoi() contain the logic directly.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Interface instances should never be directly dereferenced. So, the common
practice is to make them incomplete types to make sure no-one does that.
XICSFrabric, however, had a dummy type which is less safe.
We were also using OBJECT_CHECK() where we should have been using
INTERFACE_CHECK().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
SLOF implements one itself so let's remove it from QEMU. It is one less
image and simpler setup as the RTAS blob never stays in its initial place
anyway as the guest OS always decides where to put it.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Certain old guest versions don't understand the radix MMU introduced with
POWER ISA 3.0, but incorrectly select it if presented with the option at
CAS time. We workaround this in qemu by explicitly excluding the radix
(and other ISA 3.0 linked) options if the guest doesn't explicitly note
support for ISA 3.0.
This is handled by the 'cas_legacy_guest_workaround' flag, which is pretty
vague. Rename it to 'cas_pre_isa3_guest' to be clearer about what it's for.
In addition, we unnecessarily call spapr_populate_pa_features() with
different options when initially constructing the device tree and when
adjusting it at CAS time. At the initial construct time cas_pre_isa3_guest
is already false, so we can still use the flag, rather than explicitly
overriding it to be false at the callsite.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
It will help us to discard interrupt numbers which have not been
claimed in the next patch.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190911133937.2716-2-clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
add PnvHomer device model to emulate homer memory access
for pstate table, occ-sensors, slw, occ static and dynamic
values for Power8 and Power9 chips.
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Message-Id: <20190912093056.4516-4-bala24@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
emulate occ common area region with occ sram device model which
occ and skiboot uses it to communicate regarding sensors, slw
and HWMON in PowerNV emulated host.
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Message-Id: <20190912093056.4516-3-bala24@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
During PowerNV boot skiboot populates the device tree by
retrieving base address of homer/occ common area from
PBA BARs and prd ipoll mask by accessing xscom read/write
accesses.
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Message-Id: <20190912093056.4516-2-bala24@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Max memslot size supported by kvm on s390 is 8Tb,
move logic of splitting RAM in chunks upto 8T to KVM code.
This way it will hide KVM specific restrictions in KVM code
and won't affect board level design decisions. Which would allow
us to avoid misusing memory_region_allocate_system_memory() API
and eventually use a single hostmem backend for guest RAM.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20190924144751.24149-4-imammedo@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>