mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Lluís Vilanova	9c489ea6be	tcg: Pass generic CPUState to gen_intermediate_code() Needed to implement a target-agnostic gen_intermediate_code() in the future. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Alex Benneé <alex.benee@linaro.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu> Message-Id: <150002025498.22386.18051908483085660588.stgit@frigg.lan> Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-07-19 14:45:16 -07:00
Peter Maydell	6c4591566d	target-arm queue: * new model of the ARM MPS2/MPS2+ FPGA based development board * clean up DISAS_* exit conditions and fix various regressions since commits `e75449a346` `8a6b28c7b5` (in particular including ones which broke OP-TEE guests) * make Cortex-M3 and M4 correctly default to 8 PMSA regions -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABCAAGBQJZbLEBAAoJEDwlJe0UNgzeTqsP/06M2a/rswUKjIGAsXv+TeTl 5N31g9E6Jr57HXK94Q0XtNkLlPwvIn97Dcv6VKg5+E8OgJx7ozldwZVFghWvMbOA mbaikzgTRRUf6ydNTA4DtWYZPkaLNT86Vmb2T1GKS0nmw2ymd+hMLNk5vZd1jhDv krHxwECI5e+u1INpw7erlQ2mqVP1NjvOuMNtjdAgtJ5tnjFRfQaVedePmr5qOuIK xkYMKMNtled/KS0gP4TaSu5S012iYhzrpKISN/g4WHT/8kllr+iEowNAOJSJ6l38 oaBJJJCsLwnnV1nRClp4NNQv0Q/RXyIex5mPkeWERk4QU9adSDHnYJR7xn7JEyzV l9o+av28bXA7l3C8BOi3ahSGh5cDu+hif0Biml/Kke7e4+1Lp3/QWSQ+p/E5PDDq rhk65cg07PxSHeogN8hgu+RYN0gF3WBKASwUIDAkVdBsLlH8LVmoT5DtllL+6PyY cwCp3nWeu0q2YDxGOfCZrUC4YJMl8hqHoWbdVah8vLKV/w/JVUtVEIol0za50dzG ii6wOLqzV8GH0vkVa5x0InlH+t+/LtDRVkgHUT3/64eEEG+SsK/GmZeEtvcmp7GP 7Qx+Dd7hPgh+uis0XZPz37vqyCYhaFNw1+M9EESlQKUKfdY8B5B5bpXVDOBF+0Zl daOoMw8xBd21DXNk9tCk =gVxi -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20170717' into staging target-arm queue: * new model of the ARM MPS2/MPS2+ FPGA based development board * clean up DISAS_* exit conditions and fix various regressions since commits `e75449a346` `8a6b28c7b5` (in particular including ones which broke OP-TEE guests) * make Cortex-M3 and M4 correctly default to 8 PMSA regions # gpg: Signature made Mon 17 Jul 2017 13:43:45 BST # gpg: using RSA key 0x3C2525ED14360CDE # gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>" # gpg: aka "Peter Maydell <pmaydell@gmail.com>" # gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>" # Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83 15CF 3C25 25ED 1436 0CDE * remotes/pmaydell/tags/pull-target-arm-20170717: MAINTAINERS: Add entries for MPS2 board hw/arm/mps2: Add ethernet hw/arm/mps2: Add SCC hw/misc/mps2_scc: Implement MPS2 Serial Communication Controller hw/arm/mps2: Add timers hw/char/cmsdk-apb-timer: Implement CMSDK APB timer device hw/arm/mps2: Add UARTs hw/char/cmsdk-apb-uart.c: Implement CMSDK APB UART hw/arm/mps2: Implement skeleton mps2-an385 and mps2-an511 board models target/arm: use DISAS_EXIT for eret handling target/arm: use gen_goto_tb for ISB handling target/arm/translate: ensure gen_goto_tb sets exit flags target/arm/translate.h: expand comment on DISAS_EXIT target/arm/translate: make DISAS_UPDATE match declared semantics include/exec/exec-all: document common exit conditions target/arm: Make Cortex-M3 and M4 default to 8 PMSA regions qdev: support properties which don't set a default value qdev-properties.h: Explicitly set the default value for arraylen properties Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-07-18 10:35:06 +01:00
Alex Bennée	df0311e634	include/exec/exec-all: document common exit conditions As a precursor to later patches attempt to come up with a more concrete wording for what each of the common exit cases would be. CC: Emilio G. Cota <cota@braap.org> CC: Richard Henderson <rth@twiddle.net> CC: Lluís Vilanova <vilanova@ac.upc.edu> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Message-id: 20170713141928.25419-2-alex.bennee@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-07-17 13:36:07 +01:00
Lluís Vilanova	61a67f71dd	exec: [tcg] Use different TBs according to the vCPU's dynamic tracing state Every vCPU now uses a separate set of TBs for each set of dynamic tracing event state values. Each set of TBs can be used by any number of vCPUs to maximize TB reuse when vCPUs have the same tracing state. This feature is later used by tracetool to optimize tracing of guest code events. The maximum number of TB sets is defined as 2^E, where E is the number of events that have the 'vcpu' property (their state is stored in CPUState->trace_dstate). For this to work, a change on the dynamic tracing state of a vCPU will force it to flush its virtual TB cache (which is only indexed by address), and fall back to the physical TB cache (which now contains the vCPU's dynamic tracing state as part of the hashing function). Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu> Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-id: 149915775266.6295.10060144081246467690.stgit@frigg.lan Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-07-17 13:11:05 +01:00
Pranith Kumar	406bc339b0	Revert "exec.c: Fix breakpoint invalidation race" Now that we have proper locking after MTTCG patches have landed, we can revert the commit. This reverts commit `a9353fe897`. CC: Peter Maydell <peter.maydell@linaro.org> CC: Alex BennÃ©e <alex.bennee@linaro.org> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Message-Id: <20170712215143.19594-1-bobby.prani@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-07-14 11:05:19 +02:00
Yang Zhong	b11ec7f2e4	tcg: add CONFIG_TCG guards in headers Add CONFIG_TCG around TLB-related functions and structure declarations. Some of these functions are defined in ./accel/tcg/cputlb.c, which will not be linked in if TCG is disabled, and have no stubs; therefore, their callers will also be compiled out for --disable-tcg. Signed-off-by: Yang Zhong <yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-07-05 09:11:08 +02:00
Paolo Bonzini	beeaef55e4	tcg: move tb_lock out of translate-all.h Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-07-04 16:01:16 +02:00
Richard Henderson	3fb53fb4d1	tcg/arm: Use indirect branch for goto_tb Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-06-19 11:10:59 -07:00
Emilio G. Cota	cedbcb0152	tcg: Introduce goto_ptr opcode and tcg_gen_lookup_and_goto_ptr Instead of exporting goto_ptr directly to TCG frontends, export tcg_gen_lookup_and_goto_ptr(), which calls goto_ptr with the pointer returned by the lookup_tb_ptr() helper. This is the only use case we have for goto_ptr and lookup_tb_ptr, so having this function is very convenient. Furthermore, it trivially allows us to avoid calling the lookup helper if goto_ptr is not implemented by the backend. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <1493263764-18657-2-git-send-email-cota@braap.org> Message-Id: <1493263764-18657-3-git-send-email-cota@braap.org> Message-Id: <1493263764-18657-4-git-send-email-cota@braap.org> Message-Id: <1493263764-18657-5-git-send-email-cota@braap.org> [rth: Squashed 4 related commits.] Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-06-05 09:25:42 -07:00
Alex Bennée	c3b9a07a33	cputlb: introduce tlb_flush_*_all_cpus[_synced] This introduces support to the cputlb API for flushing all CPUs TLBs with one call. This avoids the need for target helpers to iterate through the vCPUs themselves. An additional variant of the API (_synced) will cause the source vCPUs work to be scheduled as "safe work". The result will be all the flush operations will be complete by the time the originating vCPU executes its safe work. The calling implementation can either end the TB straight away (which will then pick up the cpu->exit_request on entering the next block) or defer the exit until the architectural sync point (usually a barrier instruction). Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net>	2017-02-24 10:32:46 +00:00
Alex Bennée	0336cbf853	cputlb and arm/sparc targets: convert mmuidx flushes from varg to bitmap While the vargs approach was flexible the original MTTCG ended up having munge the bits to a bitmap so the data could be used in deferred work helpers. Instead of hiding that in cputlb we push the change to the API to make it take a bitmap of MMU indexes instead. For ARM some the resulting flushes end up being quite long so to aid readability I've tended to move the index shifting to a new line so all the bits being or-ed together line up nicely, for example: tlb_flush_page_by_mmuidx(other_cs, pageaddr, (1 << ARMMMUIdx_S1SE1) \| (1 << ARMMMUIdx_S1SE0)); Signed-off-by: Alex Bennée <alex.bennee@linaro.org> [AT: SPARC parts only] Reviewed-by: Artyom Tarasenko <atar4qemu@gmail.com> Reviewed-by: Richard Henderson <rth@twiddle.net> [PM: ARM parts only] Reviewed-by: Peter Maydell <peter.maydell@linaro.org>	2017-02-24 10:32:46 +00:00
KONRAD Frederic	e3b9ca8109	cputlb: introduce tlb_flush_* async work. Some architectures allow to flush the tlb of other VCPUs. This is not a problem when we have only one thread for all VCPUs but it definitely needs to be an asynchronous work when we are in true multithreaded work. We take the tb_lock() when doing this to avoid racing with other threads which may be invalidating TB's at the same time. The alternative would be to use proper atomic primitives to clear the tlb entries en-mass. This patch doesn't do anything to protect other cputlb function being called in MTTCG mode making cross vCPU changes. Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com> [AJB: remove need for g_malloc on defer, make check fixes, tb_lock] Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net>	2017-02-24 10:32:46 +00:00
Alex Bennée	e5143e30fb	tcg: remove global exit_request There are now only two uses of the global exit_request left. The first ensures we exit the run_loop when we first start to process pending work and in the kick handler. This is just as easily done by setting the first_cpu->exit_request flag. The second use is in the round robin kick routine. The global exit_request ensured every vCPU would set its local exit_request and cause a full exit of the loop. Now the iothread isn't being held while running we can just rely on the kick handler to push us out as intended. We lightly re-factor the main vCPU thread to ensure cpu->exit_requests cause us to exit the main loop and process any IO requests that might come along. As an cpu->exit_request may legitimately get squashed while processing the EXCP_INTERRUPT exception we also check cpu->queued_work_first to ensure queued work is expedited as soon as possible. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net>	2017-02-24 10:32:45 +00:00
Alex Bennée	791158d93b	tcg: rename tcg_current_cpu to tcg_current_rr_cpu ..and make the definition local to cpus. In preparation for MTTCG the concept of a global tcg_current_cpu will no longer make sense. However we still need to keep track of it in the single-threaded case to be able to exit quickly when required. qemu_cpu_kick_no_halt() moves and becomes qemu_cpu_kick_rr_cpu() to emphasise its use-case. qemu_cpu_kick now kicks the relevant cpu as well as qemu_kick_rr_cpu() which will become a no-op in MTTCG. For the time being the setting of the global exit_request remains. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>	2017-02-24 10:32:45 +00:00
Paolo Bonzini	43d70ddf9f	cpu-exec: fix icount out-of-bounds access When icount is active, tb_add_jump is surprisingly called with an out of bounds basic block index. I have no idea how that can work, but it does not seem like a good idea. Clear *last_tb for all TB_EXIT_ICOUNT_EXPIRED cases, even when all you have to do is refill icount_extra. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-02-16 14:06:56 +01:00
Alex Bennée	d10eb08f5d	cputlb: drop flush_global flag from tlb_flush We have never has the concept of global TLB entries which would avoid the flush so we never actually use this flag. Drop it and make clear that tlb_flush is the sledge-hammer it has always been. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> [DG: ppc portions] Acked-by: David Gibson <david@gibson.dropbear.id.au>	2017-01-13 14:24:37 +00:00
Paolo Bonzini	7d7500d998	tcg: comment on which functions have to be called with tb_lock held softmmu requires more functions to be thread-safe, because translation blocks can be invalidated from e.g. notdirty callbacks. Probably the same holds for user-mode emulation, it's just that no one has ever tried to produce a coherent locking there. This patch will guide the introduction of more tb_lock and tb_unlock calls for system emulation. Note that after this patch some (most) of the mentioned functions are still called outside tb_lock/tb_unlock. The next one will rectify this. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Message-Id: <20161027151030.20863-7-alex.bennee@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-10-31 10:51:16 +01:00
Alex Bennée	301e40ed80	translate-all: add DEBUG_LOCKING asserts This adds asserts to check the locking on the various translation engines structures. There are two sets of structures that are protected by locks. The first the l1map and PageDesc structures used to track which translation blocks are associated with which physical addresses. In user-mode this is covered by the mmap_lock. The second case are TB context related structures which are protected by tb_lock which is also user-mode only. Currently the asserts do nothing in SoftMMU mode but this will change for MTTCG. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Message-Id: <20161027151030.20863-4-alex.bennee@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-10-31 10:24:45 +01:00
Richard Henderson	fdbc2b5722	tcg: Add EXCP_ATOMIC When we cannot emulate an atomic operation within a parallel context, this exception allows us to stop the world and try again in a serial context. Reviewed-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2016-10-26 08:29:00 -07:00
Laurent Vivier	ce5b1bbf62	exec: move cpu_exec_init() calls to realize functions Modify all CPUs to call it from XXX_cpu_realizefn() function. Remove all the cannot_destroy_with_object_finalize_yet as unsafe references have been moved to cpu_exec_realizefn(). (tested with QOM command provided by commit `4c315c27`) for arm: Setting of cpu->mp_affinity is moved from arm_cpu_initfn() to arm_cpu_realizefn() as setting of cpu_index is now done in cpu_exec_realizefn(). To avoid to overwrite an user defined value, we set it to an invalid value by default, and update it in realize function only if the value is still invalid. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2016-10-24 17:29:16 -02:00
Paolo Bonzini	267f685b8b	cpus-common: move CPU list management to common code Add a mutex for the CPU list to system emulation, as it will be used to manage safe work. Abstract manipulation of the CPU list in new functions cpu_list_add and cpu_list_remove. Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-09-27 11:57:29 +02:00
Richard Henderson	01ecaf438b	tcg: Merge GETPC and GETRA The return address argument to the softmmu template helpers was confused. In the legacy case, we wanted to indicate that there is no return address, and so passed in NULL. However, we then immediately subtracted GETPC_ADJ from NULL, resulting in a non-zero value, indicating the presence of an (invalid) return address. Push the GETPC_ADJ subtraction down to the only point it's required: immediately before use within cpu_restore_state_from_tb, after all NULL pointer checks have been completed. This makes GETPC and GETRA identical. Remove GETRA as the lesser used macro, replacing all uses with GETPC. Signed-off-by: Richard Henderson <rth@twiddle.net>	2016-09-16 08:12:11 -07:00
Paolo Bonzini	6d21e4208f	tcg: Prepare TB invalidation for lockless TB lookup When invalidating a translation block, set an invalid flag into the TranslationBlock structure first. It is also necessary to check whether the target TB is still valid after acquiring 'tb_lock' but before calling tb_add_jump() since TB lookup is to be performed out of 'tb_lock' in future. Note that we don't have to check 'last_tb'; an already invalidated TB will not be executed anyway and it is thus safe to patch it. Suggested-by: Sergey Fedorov <serge.fdrv@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-09-13 19:08:43 +02:00
Igor Mammedov	1bc7e522d9	exec: Reduce CONFIG_USER_ONLY ifdeffenery Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2016-07-26 15:31:58 -03:00
Markus Armbruster	2a6a4076e1	Clean up ill-advised or unusual header guards Cleaned up with scripts/clean-header-guards.pl. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net>	2016-07-12 16:20:46 +02:00
Sergey Sorokin	b35399bb4e	Fix confusing argument names in some common functions There are functions tlb_fill(), cpu_unaligned_access() and do_unaligned_access() that are called with access type and mmu index arguments. But these arguments are named 'is_write' and 'is_user' in their declarations. The patches fix the arguments to avoid a confusion. Signed-off-by: Sergey Sorokin <afarallax@yandex.ru> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Message-id: 1465907177-1399402-1-git-send-email-afarallax@yandex.ru Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-07-12 13:06:08 +01:00
Emilio G. Cota	909eaac9bb	tb hash: track translated blocks with qht Having a fixed-size hash table for keeping track of all translation blocks is suboptimal: some workloads are just too big or too small to get maximum performance from the hash table. The MRU promotion policy helps improve performance when the hash table is a little undersized, but it cannot make up for severely undersized hash tables. Furthermore, frequent MRU promotions result in writes that are a scalability bottleneck. For scalability, lookups should only perform reads, not writes. This is not a big deal for now, but it will become one once MTTCG matures. The appended fixes these issues by using qht as the implementation of the TB hash table. This solution is superior to other alternatives considered, namely: - master: implementation in QEMU before this patchset - xxhash: before this patch, i.e. fixed buckets + xxhash hashing + MRU. - xxhash-rcu: fixed buckets + xxhash + RCU list + MRU. MRU is implemented here by adding an intermediate struct that contains the u32 hash and a pointer to the TB; this allows us, on an MRU promotion, to copy said struct (that is not at the head), and put this new copy at the head. After a grace period, the original non-head struct can be eliminated, and after another grace period, freed. - qht-fixed-nomru: fixed buckets + xxhash + qht without auto-resize + no MRU for lookups; MRU for inserts. The appended solution is the following: - qht-dyn-nomru: dynamic number of buckets + xxhash + qht w/ auto-resize + no MRU for lookups; MRU for inserts. The plots below compare the considered solutions. The Y axis shows the boot time (in seconds) of a debian jessie image with arm-softmmu; the X axis sweeps the number of buckets (or initial number of buckets for qht-autoresize). The plots in PNG format (and with errorbars) can be seen here: http://imgur.com/a/Awgnq Each test runs 5 times, and the entire QEMU process is pinned to a single core for repeatability of results. Host: Intel Xeon E5-2690 28 ++------------+-------------+-------------+-------------+------------++ A*** + + + master A*** + 27 ++ * xxhash ##B###++ \| A****A** xxhash-rcu $$C$$$ \| 26 C$$ A**A**** qht-fixed-nomru%%D%%%++ D%%$$ A***A***Aqht-dyn-mru AE*A 25 ++ %%$$ qht-dyn-nomru &&F&&&++ B#####% \| 24 ++ #C$$$$$ ++ \| B### $ \| \| ## C$$$$$$ \| 23 ++ # C$$$$$$ ++ \| B###### C$$$$$$ %%%D 22 ++ %B###### C$$$$$$C$$$$$$C$$$$$$C$$$$$$C$$$$$$C \| D%%%%%%B###### @E@@@@@@ %%%D%%%@@@E@@@@@@E 21 E@@@@@@E@@@@@@F&&&@@@E@@@&&&D%%%%%%B######B######B######B######B######B + E@@@ F&&& + E@ + F&&& + + 20 ++------------+-------------+-------------+-------------+------------++ 14 16 18 20 22 24 log2 number of buckets Host: Intel i7-4790K 14.5 ++------------+------------+-------------+------------+------------++ A + + + master A* + 14 ++ xxhash ##B###++ 13.5 ++ xxhash-rcu $$C$$$++ \| qht-fixed-nomru %%D%%% \| 13 ++ A**** qht-dyn-mru @@E@@@++ \| A*A**A** qht-dyn-nomru &&F&&& \| 12.5 C$$ A**A**A*A** A 12 ++ $$ A ++ D%%% $$ \| 11.5 ++ %% ++ B### %C$$$$$$ \| 11 ++ ## D%%%%% C$$$$$ ++ \| # % C$$$$$$ \| 10.5 F&&&&&&B######D%%%%% C$$$$$$C$$$$$$C$$$$$$C$$$$$C$$$$$$ $$$C 10 E@@@@@@E@@@@@@B#####B######B######E@@@@@@E@@@%%%D%%%%%D%%%###B######B + F&& D%%%%%%B######B######B#####B###@@@D%%% + 9.5 ++------------+------------+-------------+------------+------------++ 14 16 18 20 22 24 log2 number of buckets Note that the original point before this patch series is X=15 for "master"; the little sensitivity to the increased number of buckets is due to the poor hashing function in master. xxhash-rcu has significant overhead due to the constant churn of allocating and deallocating intermediate structs for implementing MRU. An alternative would be do consider failed lookups as "maybe not there", and then acquire the external lock (tb_lock in this case) to really confirm that there was indeed a failed lookup. This, however, would not be enough to implement dynamic resizing--this is more complex: see "Resizable, Scalable, Concurrent Hash Tables via Relativistic Programming" by Triplett, McKenney and Walpole. This solution was discarded due to the very coarse RCU read critical sections that we have in MTTCG; resizing requires waiting for readers after every pointer update, and resizes require many pointer updates, so this would quickly become prohibitive. qht-fixed-nomru shows that MRU promotion is advisable for undersized hash tables. However, qht-dyn-mru shows that MRU promotion is not important if the hash table is properly sized: there is virtually no difference in performance between qht-dyn-nomru and qht-dyn-mru. Before this patch, we're at X=15 on "xxhash"; after this patch, we're at X=15 @ qht-dyn-nomru. This patch thus matches the best performance that we can achieve with optimum sizing of the hash table, while keeping the hash table scalable for readers. The improvement we get before and after this patch for booting debian jessie with arm-softmmu is: - Intel Xeon E5-2690: 10.5% less time - Intel i7-4790K: 5.2% less time We could get this same improvement _for this particular workload_ by statically increasing the size of the hash table. But this would hurt workloads that do not need a large hash table. The dynamic (upward) resizing allows us to start small and enlarge the hash table as needed. A quick note on downsizing: the table is resized back to 215 buckets on every tb_flush; this makes sense because it is not guaranteed that the table will reach the same number of TBs later on (e.g. most bootup code is thrown away after boot); it makes sense to grow the hash table as more code blocks are translated. This also avoids the complication of having to build downsizing hysteresis logic into qht. Reviewed-by: Sergey Fedorov <serge.fedorov@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <1465412133-3029-15-git-send-email-cota@braap.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2016-06-11 17:11:16 -07:00
Peter Maydell	6886b98036	cpu-exec: Rename cpu_resume_from_signal() to cpu_loop_exit_noexc() The function cpu_resume_from_signal() is now always called with a NULL puc argument, and is rather misnamed since it is never called from a signal handler. It is essentially forcing an exit to the top level cpu loop but without raising any exception, so rename it to cpu_loop_exit_noexc() and drop the useless unused argument. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org> Acked-by: Eduardo Habkost <ehabkost@redhat.com> Acked-by: Riku Voipio <riku.voipio@linaro.org> Message-id: 1463494687-25947-4-git-send-email-peter.maydell@linaro.org	2016-06-09 15:55:02 +01:00
Paolo Bonzini	63c915526d	cpu: move exec-all.h inclusion out of cpu.h exec-all.h contains TCG-specific definitions. It is not needed outside TCG-specific files such as translate.c, exec.c or *helper.c. One generic function had snuck into include/exec/exec-all.h; move it to include/qom/cpu.h. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-05-19 16:42:29 +02:00
Paolo Bonzini	00f6da6a1a	exec: extract exec/tb-context.h TCG backends do not need most of exec-all.h; extract what they actually need to a separate file or move it directly to tcg.h. The next patch will stop including exec-all.h from everywhere. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-05-19 16:42:29 +02:00
Sergey Fedorov	6f789be56d	tcg: Rework tb_invalidated_flag 'tb_invalidated_flag' was meant to catch two events: * some TB has been invalidated by tb_phys_invalidate(); * the whole translation buffer has been flushed by tb_flush(). Then it was checked: * in cpu_exec() to ensure that the last executed TB can be safely linked to directly call the next one; * in cpu_exec_nocache() to decide if the original TB should be provided for further possible invalidation along with the temporarily generated TB. It is always safe to patch an invalidated TB since it is not going to be used anyway. It is also safe to call tb_phys_invalidate() for an already invalidated TB. Thus, setting this flag in tb_phys_invalidate() is simply unnecessary. Moreover, it can prevent from pretty proper linking of TBs, if any arbitrary TB has been invalidated. So just don't touch it in tb_phys_invalidate(). If this flag is only used to catch whether tb_flush() has been called then rename it to 'tb_flushed'. Declare it as 'bool' and stick to using only 'true' and 'false' to set its value. Also, instead of setting it in tb_gen_code(), just after tb_flush() has been called, do it right inside of tb_flush(). In cpu_exec(), this flag is used to track if tb_flush() has been called and have made 'next_tb' (a reference to the last executed TB) invalid for linking it to directly call the next TB. tb_flush() can be called during the CPU execution loop from tb_gen_code(), during TB execution or by another thread while 'tb_lock' is released. Catch for translation buffer flush reliably by resetting this flag once before first TB lookup and each time we find it set before trying to add a direct jump. Don't touch in in tb_find_physical(). Each vCPU has its own execution loop in multithreaded mode and thus should have its own copy of the flag to be able to reset it with its own 'next_tb' and don't affect any other vCPU execution thread. So make this flag per-vCPU and move it to CPUState. In cpu_exec_nocache(), we only need to check if tb_flush() has been called from tb_gen_code() called by cpu_exec_nocache() itself. To do this reliably, preserve the old value of the flag, reset it before calling tb_gen_code(), check afterwards, and combine the saved value back to the flag. This patch is based on the patch "tcg: move tb_invalidated_flag to CPUState" from Paolo Bonzini <pbonzini@redhat.com>. Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2016-05-12 14:06:42 -10:00
Sergey Fedorov	9962c478b1	tcg: Clarify thread safety check in tb_add_jump() The check is to make sure that another thread hasn't already done the same while we were outside of tb_lock. Mention this in a comment. Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2016-05-12 14:06:41 -10:00
Sergey Fedorov	c37e6d7e35	tcg: Use uintptr_t type for jmp_list_{next\|first} fields of TB These fields do not contain pure pointers to a TranslationBlock structure. So uintptr_t is the most appropriate type for them. Also put some asserts to assure that the two least significant bits of the pointer are always zero before assigning it to jmp_list_first. Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2016-05-12 14:06:41 -10:00
Sergey Fedorov	f309101c26	tcg: Clean up direct block chaining data fields Briefly describe in a comment how direct block chaining is done. It should help in understanding of the following data fields. Rename some fields in TranslationBlock and TCGContext structures to better reflect their purpose (dropping excessive 'tb_' prefix in TranslationBlock but keeping it in TCGContext): tb_next_offset => jmp_reset_offset tb_jmp_offset => jmp_insn_offset tb_next => jmp_target_addr jmp_next => jmp_list_next jmp_first => jmp_list_first Avoid using a magic constant as an invalid offset which is used to indicate that there's no n-th jump generated. Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2016-05-12 14:06:41 -10:00
Sergey Fedorov	10b4f48555	tcg: Note requirement on atomic direct jump patching Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <1461341333-19646-12-git-send-email-sergey.fedorov@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2016-05-12 14:06:41 -10:00
Sergey Fedorov	7d14e0e2d6	tcg/arm: Make direct jump patching thread-safe Ensure direct jump patching in ARM is atomic by using atomic_read()/atomic_set() for code patching. Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org> Message-Id: <1461341333-19646-8-git-send-email-sergey.fedorov@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2016-05-12 14:06:41 -10:00
Sergey Fedorov	ed3d51ecd7	tcg/s390: Make direct jump patching thread-safe Ensure direct jump patching in s390 is atomic by: * naturally aligning a location of direct jump address; * using atomic_read()/atomic_set() for code patching. Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org> Message-Id: <1461341333-19646-7-git-send-email-sergey.fedorov@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2016-05-12 14:06:41 -10:00
Sergey Fedorov	0d07abf05e	tcg/i386: Make direct jump patching thread-safe Ensure direct jump patching in i386 is atomic by: * naturally aligning a location of direct jump address; * using atomic_read()/atomic_set() for code patching. tcg_out_nopn() implementation: Suggested-by: Richard Henderson <rth@twiddle.net>. Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org> Message-Id: <1461341333-19646-6-git-send-email-sergey.fedorov@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2016-05-12 14:06:41 -10:00
Sergey Fedorov	76442a939e	tci: Make direct jump patching thread-safe Ensure direct jump patching in TCI is atomic by: * naturally aligning a location of direct jump address; * using atomic_read()/atomic_set() to load/store the address. Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org> Message-Id: <1461341333-19646-4-git-send-email-sergey.fedorov@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2016-05-12 14:06:40 -10:00
Emilio G. Cota	89fee74a0f	tb: consistently use uint32_t for tb->flags We are inconsistent with the type of tb->flags: usage varies loosely between int and uint64_t. Settle to uint32_t everywhere, which is superior to both: at least one target (aarch64) uses the most significant bit in the u32, and uint64_t is wasteful. Compile-tested for all targets. Suggested-by: Laurent Desnogues <laurent.desnogues@gmail.com> Suggested-by: Richard Henderson <rth@twiddle.net> Tested-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Reviewed-by: Laurent Desnogues <laurent.desnogues@gmail.com> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <rth@twiddle.net> Message-Id: <1460049562-23517-1-git-send-email-cota@braap.org>	2016-05-12 14:06:40 -10:00
Alex Bennée	d977e1c2db	qemu-log: dfilter-ise exec, out_asm, op and opt_op This ensures the code generation debug code will honour -dfilter if set. For the "exec" tracing I've added a new inline macro for efficiency's sake. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Aurelien Jarno <aurelien@aureL32.net> Reviewed-by: Richard Henderson <rth@twiddle.net> Message-Id: <1458052224-9316-8-git-send-email-alex.bennee@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-03-22 22:20:18 +01:00
Peter Maydell	1a83063522	qemu-log: Improve the "exec" TB execution logging Improve the TB execution logging so that it is easier to identify what is happening from trace logs: * move the "Trace" logging of executed TBs into cpu_tb_exec() so that it is emitted if and only if we actually execute a TB, and for consistency for the CPU state logging * log when we link two TBs together via tb_add_jump() * log when cpu_tb_exec() returns early from a chain of TBs The new style logging looks like this: Trace 0x7fb7cc822ca0 [ffffffc0000dce00] Linking TBs 0x7fb7cc822ca0 [ffffffc0000dce00] index 0 -> 0x7fb7cc823110 [ffffffc0000dce10] Trace 0x7fb7cc823110 [ffffffc0000dce10] Trace 0x7fb7cc823420 [ffffffc000302688] Trace 0x7fb7cc8234a0 [ffffffc000302698] Trace 0x7fb7cc823520 [ffffffc0003026a4] Trace 0x7fb7cc823560 [ffffffc0000dce44] Linking TBs 0x7fb7cc823560 [ffffffc0000dce44] index 1 -> 0x7fb7cc8235d0 [ffffffc0000dce70] Trace 0x7fb7cc8235d0 [ffffffc0000dce70] Stopped execution of TB chain before 0x7fb7cc8235d0 [ffffffc0000dce70] Trace 0x7fb7cc8235d0 [ffffffc0000dce70] Trace 0x7fb7cc822fd0 [ffffffc0000dd52c] Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> [AJB: reword patch title, Abandoned->Stopped] Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Reviewed-by: Richard Henderson <rth@twiddle.net> Message-Id: <1458052224-9316-6-git-send-email-alex.bennee@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-03-22 22:20:18 +01:00
Peter Maydell	651a5bc037	exec.c: Add cpu_get_address_space() Add a function to return the AddressSpace for a CPU based on its numerical index. (Callers outside exec.c don't have access to the CPUAddressSpace struct so can't just fish it out of the CPUState struct directly.) Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Acked-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>	2016-01-21 14:15:05 +00:00
Peter Maydell	a54c87b68a	exec.c: Pass MemTxAttrs to iotlb_to_region so it uses the right AS Pass the MemTxAttrs for the memory access to iotlb_to_region(); this allows it to determine the correct AddressSpace to use for the lookup. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Acked-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>	2016-01-21 14:15:05 +00:00
Peter Maydell	d7898cda81	cputlb.c: Use correct address space when looking up MemoryRegionSection When looking up the MemoryRegionSection for the new TLB entry in tlb_set_page_with_attrs(), use cpu_asidx_from_attrs() to determine the correct address space index for the lookup, and pass it into address_space_translate_for_iotlb(). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Acked-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>	2016-01-21 14:15:05 +00:00
Peter Maydell	1787cc8ee5	exec-all.h: Document tlb_set_page_with_attrs, tlb_set_page Add documentation comments for tlb_set_page_with_attrs() and tlb_set_page(). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Acked-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>	2016-01-21 14:15:04 +00:00
Peter Maydell	12ebc9a76d	exec.c: Allow target CPUs to define multiple AddressSpaces Allow multiple calls to cpu_address_space_init(); each call adds an entry to the cpu->ases array at the specified index. It is up to the target-specific CPU code to actually use these extra address spaces. Since this multiple AddressSpace support won't work with KVM, add an assertion to avoid confusing failures. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Acked-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>	2016-01-21 14:15:04 +00:00
Peter Maydell	56943e8cc1	exec.c: Don't set cpu->as until cpu_address_space_init Rather than setting cpu->as unconditionally in cpu_exec_init (and then having target-i386 override this later), don't set it until the first call to cpu_address_space_init. This requires us to initialise the address space for both TCG and KVM (KVM doesn't need the AS listener but it does require cpu->as to be set). For target CPUs which don't set up any address spaces (currently everything except i386), add the default address_space_memory in qemu_init_vcpu(). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Acked-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>	2016-01-21 14:15:04 +00:00
Dr. David Alan Gilbert	87f50caa30	Move page_size_init earlier The HOST_PAGE_ALIGN macros don't work until the page size variables have been set up; later in postcopy I use those macros in the RAM code, and it can be triggered using -object. Fix this by initialising page_size_init() earlier - it's currently initialised inside the accelerators, move it up into vl.c. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2015-11-10 14:51:48 +01:00
Pavel Dovgalyuk	56c0269a9e	cpu-exec: allow temporary disabling icount This patch is required for deterministic replay to generate an exception by trying executing an instruction without changing icount. It adds new flag to TB for disabling icount while translating it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru> Message-Id: <20150917162359.8676.77011.stgit@PASHA-ISP.def.inno> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-11-05 12:19:09 +01:00

1 2 3

128 Commits