mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Richard Henderson	1308e02671	cputlb: Split large page tracking per mmu_idx The set of large pages in the kernel is probably not the same as the set of large pages in the application. Forcing one range to cover both will flush more often than necessary. This allows tlb_flush_page_async_work to flush just the one mmu_idx implicated, which in turn allows us to remove tlb_check_page_and_flush_by_mmuidx_async_work. Tested-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-10-31 12:16:08 +00:00
Richard Henderson	60a2ad7d86	cputlb: Move cpu->pending_tlb_flush to env->tlb_c.pending_flush Protect it with the tlb_lock instead of using atomics. The move puts it in or near the same cacheline as the lock; using the lock means we don't need a second atomic operation in order to perform the update. Which makes it cheap to also update pending_flush in tlb_flush_by_mmuidx_async_work. Tested-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-10-31 12:16:02 +00:00
Richard Henderson	8ab102667e	cputlb: Remove tcg_enabled hack from tlb_flush_nocheck The bugs this was working around were fixed with commits `022d6378c7` target/unicore32: remove tlb_flush from uc32_init_fn `6e11beecfd` target/alpha: remove tlb_flush from alpha_cpu_initfn Tested-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-10-31 12:15:57 +00:00
Richard Henderson	53d284554c	cputlb: Move tlb_lock to CPUTLBCommon This is the first of several moves to reduce the size of the CPU_COMMON_TLB macro and improve some locality of refernce. Tested-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-10-31 12:15:28 +00:00
Emilio G. Cota	403f290c06	cputlb: read CPUTLBEntry.addr_write atomically Updates can come from other threads, so readers that do not take tlb_lock must use atomic_read to avoid undefined behaviour (UB). This completes the conversion to tlb_lock. This conversion results on average in no performance loss, as the following experiments (run on an Intel i7-6700K CPU @ 4.00GHz) show. 1. aarch64 bootup+shutdown test: - Before: Performance counter stats for 'taskset -c 0 ../img/aarch64/die.sh' (10 runs): 7487.087786 task-clock (msec) # 0.998 CPUs utilized ( +- 0.12% ) 31,574,905,303 cycles # 4.217 GHz ( +- 0.12% ) 57,097,908,812 instructions # 1.81 insns per cycle ( +- 0.08% ) 10,255,415,367 branches # 1369.747 M/sec ( +- 0.08% ) 173,278,962 branch-misses # 1.69% of all branches ( +- 0.18% ) 7.504481349 seconds time elapsed ( +- 0.14% ) - After: Performance counter stats for 'taskset -c 0 ../img/aarch64/die.sh' (10 runs): 7462.441328 task-clock (msec) # 0.998 CPUs utilized ( +- 0.07% ) 31,478,476,520 cycles # 4.218 GHz ( +- 0.07% ) 57,017,330,084 instructions # 1.81 insns per cycle ( +- 0.05% ) 10,251,929,667 branches # 1373.804 M/sec ( +- 0.05% ) 173,023,787 branch-misses # 1.69% of all branches ( +- 0.11% ) 7.474970463 seconds time elapsed ( +- 0.07% ) 2. SPEC06int: SPEC06int (test set) [Y axis: Speedup over master] 1.15 +-+----+------+------+------+------+------+-------+------+------+------+------+------+------+----+-+ \| \| 1.1 +-+.................................+++.............................+ tlb-lock-v2 (m+++x) +-+ \| +++ \| +++ tlb-lock-v3 (spinl\|ck) \| \| +++ \| \| +++ +++ \| \| \| 1.05 +-+....+++...........####.........\|####.+++.\|......\|.....###....+++...........+++....###.........+-+ \| ### ++#\| # \|# \|# *### +++### +++#+# \| +++ \| #\|# ### \| 1 +-++++#++++####+++#++#++++++++++#++#++++#++++#+#+**+#++++###++++###++++###++++#+#++++#+#+++-+ \| +* # #++# * # #### * # * ++# **+# \| * # ***\|# \|# # #\|# #+# # # \| 0.95 +-+....#....#..#.\|..#...#..#.\|..#....#.\|..#.++.#.+++#.**.#....#+#....#.#..++#.#..+-+ \| * # # # \| # # # \| # * * # ++ # * * # * * # * \|* # ++# # # # *** # \| \| * * # ++# # + # # # \| # * * # * * # * * # * * # ++ # **** # ++# # * * # \| 0.9 +-+....#...\|#..#....#.++#..#.\|..#....#....#....#....#....#..\|.#...\|#.#....#..+-+ \| * * # *** # * * # \|# # + # * * # * * # * * # * * # * * # ++ # \|# # * * # \| 0.85 +-+....#..\|..#....#.**..#....#....#....#....#....#....#....#.**.#....#..+-+ \| * # + # * * # \| # * * # * * # * * # * * # * * # * * # * * # * \|* # * * # \| \| * * # * * # * * # + # * * # * * # * * # * * # * * # * * # * * # * \|* # * * # \| 0.8 +-+....#.....#....#....#....#....#....#....#....#....#....#.++.#....#..+-+ \| * * # * * # * * # * * # * * # * * # * * # * * # * * # * * # * * # * * # * * # \| 0.75 +-+--*##--###-###-###-###-###-*##-##-##-##-##-##--*##--+-+ 400.perlben401.bzip2403.gcc429.m445.gob456.hmme45462.libqua464.h26471.omnet473483.xalancbmkgeomean png: https://imgur.com/a/BHzpPTW Notes: - tlb-lock-v2 corresponds to an implementation with a mutex. - tlb-lock-v3 corresponds to the current implementation, i.e. a spinlock and a single lock acquisition in tlb_set_page_with_attrs. Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20181016153840.25877-1-cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-10-18 19:46:53 -07:00
Richard Henderson	e6cd4bb59b	tcg: Split CONFIG_ATOMIC128 GCC7+ will no longer advertise support for 16-byte __atomic operations if only cmpxchg is supported, as for x86_64. Fortunately, x86_64 still has support for __sync_compare_and_swap_16 and we can make use of that. AArch64 does not have, nor ever has had such support, so open-code it. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-10-18 19:46:36 -07:00
Richard Henderson	383beda9cf	tcg: Add tlb_index and tlb_entry helpers Isolate the computation of an index from an address into a helper before we change that function. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> [ cota: convert tlb_vaddr_to_host; use atomic_read on addr_write ] Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20181009175129.17888-2-cota@braap.org>	2018-10-18 18:58:10 -07:00
Emilio G. Cota	71aec3541d	cputlb: serialize tlb updates with env->tlb_lock Currently we rely on atomic operations for cross-CPU invalidations. There are two cases that these atomics miss: cross-CPU invalidations can race with either (1) vCPU threads flushing their TLB, which happens via memset, or (2) vCPUs calling tlb_reset_dirty on their TLB, which updates .addr_write with a regular store. This results in undefined behaviour, since we're mixing regular and atomic ops on concurrent accesses. Fix it by using tlb_lock, a per-vCPU lock. All updaters of tlb_table and the corresponding victim cache now hold the lock. The readers that do not hold tlb_lock must use atomic reads when reading .addr_write, since this field can be updated by other threads; the conversion to atomic reads is done in the next patch. Note that an alternative fix would be to expand the use of atomic ops. However, in the case of TLB flushes this would have a huge performance impact, since (1) TLB flushes can happen very frequently and (2) we currently use a full memory barrier to flush each TLB entry, and a TLB has many entries. Instead, acquiring the lock is barely slower than a full memory barrier since it is uncontended, and with a single lock acquisition we can flush the entire TLB. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20181009174557.16125-6-cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-10-18 18:58:10 -07:00
Emilio G. Cota	ea9025cb49	cputlb: fix assert_cpu_is_self macro Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20181009174557.16125-5-cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-10-18 18:58:10 -07:00
Emilio G. Cota	5005e2537d	exec: introduce tlb_init Paves the way for the addition of a per-TLB lock. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20181009174557.16125-4-cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-10-18 18:58:10 -07:00
Emilio G. Cota	fff42f183e	tcg: access cpu->icount_decr.u16.high with atomics Consistently access u16.high with atomics to avoid undefined behaviour in MTTCG. Note that icount_decr.u16.low is only used in icount mode, so regular accesses to it are OK. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20181010144853.13005-2-cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-10-18 18:58:10 -07:00
Richard Henderson	d7f425fdea	tcg: Implement CPU_LOG_TB_NOCHAIN during expansion Rather than test NOCHAIN before linking, do not emit the goto_tb opcode at all. We already do this for goto_ptr. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-10-18 18:58:10 -07:00
Thomas Huth	dcf6760a64	accel/tcg: Remove dead code The global cpu_single_env variable has been removed more than 5 years ago, so apparently nobody used this dead debug code in that timeframe anymore. Thus let's remove it completely now. Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <1537204134-15905-1-git-send-email-thuth@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-02 19:09:13 +02:00
Pavel Dovgalyuk	f9f1f56e4d	translator: fix breakpoint processing QEMU cannot pass through the breakpoints when 'si' command is used in remote gdb. This patch disables inserting the breakpoints when we are already single stepping though the gdb remote protocol. This patch also fixes icount calculation for the blocks that include breakpoints - instruction with breakpoint is not executed and shouldn't be used in icount calculation. Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru> Message-Id: <20180912081910.3228.8523.stgit@pasha-VirtualBox> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-02 19:08:57 +02:00
Emilio G. Cota	78255ba2cc	qht: drop ht argument from qht iterators Accessing the HT from an iterator results almost always in a deadlock. Given that only one qht-internal function uses this argument, drop it from the interface. Suggested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-09-26 08:55:54 -07:00
Peter Maydell	55a7cb144d	accel/tcg: Check whether TLB entry is RAM consistently with how we set it up We set up TLB entries in tlb_set_page_with_attrs(), where we have some logic for determining whether the TLB entry is considered to be RAM-backed, and thus has a valid addend field. When we look at the TLB entry in get_page_addr_code(), we use different logic for determining whether to treat the page as RAM-backed and use the addend field. This is confusing, and in fact buggy, because the code in tlb_set_page_with_attrs() correctly decides that rom_device memory regions not in romd mode are not RAM-backed, but the code in get_page_addr_code() thinks they are RAM-backed. This typically results in "Bad ram pointer" assertion if the guest tries to execute from such a memory region. Fix this by making get_page_addr_code() just look at the TLB_MMIO bit in the code_address field of the TLB, which tlb_set_page_with_attrs() sets if and only if the addend field is not valid for code execution. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20180713150945.12348-1-peter.maydell@linaro.org	2018-08-14 17:17:19 +01:00
Peter Maydell	20cb6ae472	accel/tcg: Return -1 for execution from MMIO regions in get_page_addr_code() Now that all the callers can handle get_page_addr_code() returning -1, remove all the code which tries to handle execution from MMIO regions or small-MMU-region RAM areas. This will mean that we can correctly execute from these areas, rather than ending up either aborting QEMU or delivering an incorrect guest exception. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Tested-by: Cédric Le Goater <clg@kaod.org> Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20180710160013.26559-6-peter.maydell@linaro.org	2018-08-14 17:17:19 +01:00
Peter Maydell	9739e3767a	accel/tcg: tb_gen_code(): Create single-insn TB for execution from non-RAM If get_page_addr_code() returns -1, this indicates that there is no RAM page we can read a full TB from. Instead we must create a TB which contains a single instruction and which we do not cache, so it is executed only once. Since this means we can now have TBs which are not in any page list, we also need to make tb_phys_invalidate() handle them (by not trying to remove them from a nonexistent page list). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Tested-by: Cédric Le Goater <clg@kaod.org> Message-id: 20180710160013.26559-5-peter.maydell@linaro.org	2018-08-14 17:17:19 +01:00
Peter Maydell	c360a0fd71	accel/tcg: Handle get_page_addr_code() returning -1 in tb_check_watchpoint() When we support execution from non-RAM MMIO regions, get_page_addr_code() will return -1 to indicate that there is no RAM at the requested address. Handle this in tb_check_watchpoint() -- if the exception happened for a PC which doesn't correspond to RAM then there is no need to invalidate any TBs, because the one-instruction TB will not have been cached. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Tested-by: Cédric Le Goater <clg@kaod.org> Message-id: 20180710160013.26559-4-peter.maydell@linaro.org	2018-08-14 17:17:19 +01:00
Peter Maydell	7252f2dea9	accel/tcg: Handle get_page_addr_code() returning -1 in hashtable lookups When we support execution from non-RAM MMIO regions, get_page_addr_code() will return -1 to indicate that there is no RAM at the requested address. Handle this in the cpu-exec TB hashtable lookup code, treating it as "no match found". Note that the call to get_page_addr_code() in tb_lookup_cmp() needs no changes -- a return of -1 will already correctly result in the function returning false. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Tested-by: Cédric Le Goater <clg@kaod.org> Message-id: 20180710160013.26559-3-peter.maydell@linaro.org	2018-08-14 17:17:19 +01:00
Peter Maydell	dbea78a4d6	accel/tcg: Pass read access type through to io_readx() The io_readx() function needs to know whether the load it is doing is an MMU_DATA_LOAD or an MMU_INST_FETCH, so that it can pass the right value to the cpu_transaction_failed() function. Plumb this information through from the softmmu code. This is currently not often going to give the wrong answer, because usually instruction fetches go via get_page_addr_code(). However once we switch over to handling execution from non-RAM by creating single-insn TBs, the path for an insn fetch to generate a bus error will be through cpu_ld*_code() and io_readx(), so without this change we will generate a d-side fault when we should generate an i-side fault. We also have to pass the access type via a CPU struct global down to unassigned_mem_read(), for the benefit of the targets which still use the cpu_unassigned_access() hook (m68k, mips, sparc, xtensa). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Tested-by: Cédric Le Goater <clg@kaod.org> Message-id: 20180710160013.26559-2-peter.maydell@linaro.org	2018-08-14 17:17:19 +01:00
Peter Maydell	59b5552f02	Bug fixes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJbTgXfAAoJEL/70l94x66DTbYH/3NutBAkNZKX7EImj/d0I1O8 nERMVH1R70KBcugdsjhaBfTRoATDXdrBng4MBqloIK9dEMT3g6D4TFZJLU+WAjOc 8sItx0BrUR7Sl8SnAvWNFoqVtvVancFiLnu11DsFGM0l8mJHRlZSkQZ0Fd0FL2W/ OPnW7t6F7B2bc1VlPfSs093FVCoD3S+lJmbj64dwNrn8+fOX918V6gSaYQe92aIY pSbJjkRDx2iULmzMY8QH4OQiHgnd/Pijj+D628DMrUc0iW1Rsw5V2Yq7SMY6zoa8 MoI/YDwX6eRMU2mq74BrKlULZrpmQn+6ZCdZTvXzLwc2zpKD4puO4FuMBOA7yx4= =GcxI -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging Bug fixes. # gpg: Signature made Tue 17 Jul 2018 16:06:07 BST # gpg: using RSA key BFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: Document command line options with single dash opts: remove redundant check for NULL parameter i386: only parse the initrd_filename once for multiboot modules i386: fix regression parsing multiboot initrd modules virtio-scsi: fix hotplug ->reset() vs event race qdev: add HotplugHandler->post_plug() callback hw/char/serial: retry write if EAGAIN PC Chipset: Improve serial divisor calculation vhost-user-test: added proper TestServer *dest initialization in test_migrate() hyperv: ensure VP index equal to QEMU cpu_index hyperv: rename vcpu_id to vp_index accel: Fix typo and grammar in comment dump: add kernel_gs_base to QEMU CPU state Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-07-17 17:06:32 +01:00
Peter Maydell	3474c98a2a	accel/tcg: Assert that tlb fill gave us a valid TLB entry In commit `4b1a3e1e34` we added a check for whether the TLB entry we had following a tlb_fill had the INVALID bit set. This could happen in some circumstances because a stale or wrong TLB entry was pulled out of the victim cache. However, after commit `68fea03855` (which prevents stale entries being in the victim cache) and the previous commit (which ensures we don't incorrectly hit in the victim cache)) this should never be possible. Drop the check on TLB_INVALID_MASK from the "is this a TLB_RECHECK?" condition, and instead assert that the tlb fill procedure has given us a valid TLB entry (or longjumped out with a guest exception). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20180713141636.18665-3-peter.maydell@linaro.org	2018-07-16 17:26:01 +01:00
Peter Maydell	b493ccf1fc	accel/tcg: Use correct test when looking in victim TLB for code In get_page_addr_code(), we were incorrectly looking in the victim TLB for an entry which matched the target address for reads, not for code accesses. This meant that we could hit on a victim TLB entry that indicated that the address was readable but not executable, and incorrectly bypass the call to tlb_fill() which should generate the guest MMU exception. Fix this bug. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20180713141636.18665-2-peter.maydell@linaro.org	2018-07-16 17:26:01 +01:00
Stefan Weil	696c706642	accel: Fix typo and grammar in comment The typo was found by codespell. Signed-off-by: Stefan Weil <sw@weilnetz.de> Message-Id: <20180712194454.26765-1-sw@weilnetz.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-07-16 16:58:16 +02:00
Emilio G. Cota	ec7eb2ae77	translate-all: honour CF_NOCACHE in tb_gen_code This fixes a record-replay regression introduced by `95590e2` ("translate-all: discard TB when tb_link_page returns an existing matching TB", 2018-06-15). The problem is that code using CF_NOCACHE assumes that the TB returned from tb_gen_code is always a newly-generated one. This assumption, however, was broken in the aforementioned commit. Fix it by honouring CF_NOCACHE, so that tb_gen_code always returns a newly-generated TB when CF_NOCACHE is passed to it. Do this by avoiding the TB hash table if CF_NOCACHE is set. Reported-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru> Tested-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru> Signed-off-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Tested-by: Alistair Francis <alistair.francis@wdc.com> Message-id: 1530806837-5416-1-git-send-email-cota@braap.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-07-09 16:14:36 +01:00
Richard Henderson	68fea03855	accel/tcg: Avoid caching overwritten tlb entries When installing a TLB entry, remove any cached version of the same page in the VTLB. If the existing TLB entry matches, do not copy into the VTLB, but overwrite it. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-07-02 08:05:16 -07:00
Peter Maydell	4b1a3e1e34	accel/tcg: Don't treat invalid TLB entries as needing recheck In get_page_addr_code() when we check whether the TLB entry is marked as TLB_RECHECK, we should not go down that code path if the TLB entry is not valid at all (ie the TLB_INVALID bit is set). Tested-by: Laurent Vivier <laurent@vivier.eu> Reported-by: Laurent Vivier <laurent@vivier.eu> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <20180629161731.16239-1-peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-07-02 08:02:20 -07:00
Peter Maydell	e4c967a720	accel/tcg: Correct "is this a TLB miss" check in get_page_addr_code() In commit `71b9a45330` we changed the condition we use to determine whether we need to refill the TLB in get_page_addr_code() to if (unlikely(env->tlb_table[mmu_idx][index].addr_code != (addr & (TARGET_PAGE_MASK \| TLB_INVALID_MASK)))) { This isn't the right check (it will falsely fail if the input addr happens to have the low bit corresponding to TLB_INVALID_MASK set, for instance). Replace it with a use of the new tlb_hit() function, which is the correct test. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <20180629162122.19376-3-peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-07-02 08:02:20 -07:00
Peter Maydell	334692bce7	tcg: Define and use new tlb_hit() and tlb_hit_page() functions The condition to check whether an address has hit against a particular TLB entry is not completely trivial. We do this in various places, and in fact in one place (get_page_addr_code()) we have got the condition wrong. Abstract it out into new tlb_hit() and tlb_hit_page() inline functions (one for a known-page-aligned address and one for an arbitrary address), and use them in all the places where we had the condition correct. This is a no-behaviour-change patch; we leave fixing the buggy code in get_page_addr_code() to a subsequent patch. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <20180629162122.19376-2-peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-07-02 08:02:20 -07:00
Emilio G. Cota	a688e73ba8	translate-all: fix locking of TBs whose two pages share the same physical page Commit `0b5c91f` ("translate-all: use per-page locking in !user-mode", 2018-06-15) introduced per-page locking. It assumed that the physical pages corresponding to a TB (at most two pages) are always distinct, which is wrong. For instance, an xtensa test provided by Max Filippov is broken by the commit, since the test maps two virtual pages to the same physical page: virt1: 7fff, virt2: 8000 phys1 6000fff, phys2 6000000 Fix it by removing the assumption from page_lock_pair. If the two physical page addresses are equal, we only lock the PageDesc once. Note that the two callers of page_lock_pair, namely page_unlock_tb and tb_link_page, are also updated so that we do not try to unlock the same PageDesc twice. Fixes: `0b5c91f74f` Reported-by: Max Filippov <jcmvbkbc@gmail.com> Tested-by: Max Filippov <jcmvbkbc@gmail.com> Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <1529944302-14186-1-git-send-email-cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-07-02 08:02:20 -07:00
Peter Maydell	109b25045b	* "info mtree" improvements (Alexey) * fake VPD block limits for SCSI passthrough (Daniel Barboza) * chardev and main loop fixes (Daniel Berrangé, Sergio, Stefan) * help fixes (Eduardo) * pc-dimm refactoring (David) * tests improvements and fixes (Emilio, Thomas) * SVM emulation fixes (Jan) * MemoryRegionCache fix (Eric) * WHPX improvements (Justin) * ESP cleanup (Mark) * -overcommit option (Michael) * qemu-pr-helper fixes (me) * "info pic" improvements for x86 (Peter) * x86 TCG emulation fixes (Richard) * KVM slot handling fix (Shannon) * Next round of deprecation (Thomas) * Windows dump format support (Viktor) -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJbNhHpAAoJEL/70l94x66D4ZoH/22cAWj2SOEVbHt1id1noVpI VxyS+MWXYG3se1HseFpNVlI32f3XyyABGMtJqzNuusD5s5Him8yZcwPxsu1RmEy2 uOk+PIo67qbLhJyZ+f3Q+rWRbFV9W+DvrRBM7RCArWUDCDOBaEVoPrRTWC2y3oId EdcLDc2tP/DvOmXtbNcELCuS3w6G2Nly0WwRI4VLJ2aJT6jAZoSfOONjuRg1gamw 7iUwk6UlCHmIMawnlwe1iQHtleX9KNYv0bA9etDrYIpNoZP935pGybchvztcmgMv QymjNptqse65emcbZ9rp0tqNyJhvP2wOyjQCWlCsooyRSoPQDY2Qc7623cRthqU= =D07Z -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging * "info mtree" improvements (Alexey) * fake VPD block limits for SCSI passthrough (Daniel Barboza) * chardev and main loop fixes (Daniel Berrangé, Sergio, Stefan) * help fixes (Eduardo) * pc-dimm refactoring (David) * tests improvements and fixes (Emilio, Thomas) * SVM emulation fixes (Jan) * MemoryRegionCache fix (Eric) * WHPX improvements (Justin) * ESP cleanup (Mark) * -overcommit option (Michael) * qemu-pr-helper fixes (me) * "info pic" improvements for x86 (Peter) * x86 TCG emulation fixes (Richard) * KVM slot handling fix (Shannon) * Next round of deprecation (Thomas) * Windows dump format support (Viktor) # gpg: Signature made Fri 29 Jun 2018 12:03:05 BST # gpg: using RSA key BFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: (60 commits) tests/boot-serial: Do not delete the output file in case of errors hw/scsi: add VPD Block Limits emulation hw/scsi: centralize SG_IO calls into single function hw/scsi: cleanups before VPD BL emulation dump: add Windows live system dump dump: add fallback KDBG using in Windows dump dump: use system context in Windows dump dump: add Windows dump format to dump-guest-memory i386/cpu: make -cpu host support monitor/mwait kvm: support -overcommit cpu-pm=on\|off hmp: obsolete "info ioapic" ioapic: support "info irq" ioapic: some proper indents when dump info ioapic: support "info pic" doc: another fix to "info pic" target-i386: Mark cpu_vmexit noreturn target-i386: Allow interrupt injection after STGI target-i386: Add NMI interception to SVM memory/hmp: Print owners/parents in "info mtree" WHPX: register for unrecognized MSR exits ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-06-29 12:30:29 +01:00
Michael S. Tsirkin	2266d44311	i386/cpu: make -cpu host support monitor/mwait When guest CPU PM is enabled, and with -cpu host, expose the host CPU MWAIT leaf in the CPUID so guest can make good PM decisions. Note: the result is 100% CPU utilization reported by host as host no longer knows that the CPU is halted. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20180622192148.178309-3-mst@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-06-29 13:02:47 +02:00
Paolo Bonzini	8bca9a03ec	move public invalidate APIs out of translate-all.{c,h}, clean up Place them in exec.c, exec-all.h and ram_addr.h. This removes knowledge of translate-all.h (which is an internal header) from several files outside accel/tcg and removes knowledge of AddressSpace from translate-all.c (as it only operates on ram_addr_t). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-06-28 19:05:30 +02:00
Peter Maydell	7106a87d96	Pull request * Gracefully handle Linux AIO init failure -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJbM6O8AAoJEJykq7OBq3PIXuUH/1viFUFjKgXnYnAUKYIITRc/ JwU0FPa+eBxAmD9hj3prx3q1hIVGOTOYO0tDkDMNOB20frqBoKqGCjRFlXtTB/iA VEMFzveznz+c0lrvmZWOh63sqGFXFZ+RFlx3a2cHx9V3Mn/KLpXYsqRQyObYvZIR cTenlE4NkiifIDRuRxjRLal7rcTKgZfaJfofmpVWiPgiQtNOMOYzd3SPvCXfYP9O 0YH8cnSkF9j3MgiLxBw/cHokFE2Si3eymcVOPwE9WbZtc1ovb58lbw5RrVvKLTax X65+Yw0z8dU40juK+3jvXYpzJC/jB08daiLZJn1LmV+b36Vhx20zq8Jzo5e0Cvg= =P5mU -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging Pull request * Gracefully handle Linux AIO init failure # gpg: Signature made Wed 27 Jun 2018 15:48:28 BST # gpg: using RSA key 9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * remotes/stefanha/tags/block-pull-request: linux-aio: properly bubble up errors from initialization compiler: add a sizeof_field() macro Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-06-28 16:28:22 +01:00
Stefan Hajnoczi	f18793b096	compiler: add a sizeof_field() macro Determining the size of a field is useful when you don't have a struct variable handy. Open-coding this is ugly. This patch adds the sizeof_field() macro, which is similar to typeof_field(). Existing instances are updated to use the macro. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20180614164431.29305-1-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2018-06-27 13:01:40 +01:00
Emilio G. Cota	d071f4cd55	trace: enable tracing of TCG atomics We do not trace guest atomic accesses. Fix it. Tested with a modified atomic_add-bench so that it executes a deterministic number of instructions, i.e. fixed seeding, no threading and fixed number of loop iterations instead of running for a certain time. Before: - With parallel_cpus = false (no clone syscall so it is never set to true): 220070 memory accesses - With parallel_cpus = true (hard-coded): 212105 memory accesses <-- we're not tracing the atomics! After: 220070 memory accesses regardless of parallel_cpus. Signed-off-by: Emilio G. Cota <cota@braap.org> Message-id: 1527028012-21888-6-git-send-email-cota@braap.org Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2018-06-27 11:09:24 +01:00
Peter Maydell	55df6fcf54	tcg: Support MMU protection regions smaller than TARGET_PAGE_SIZE Add support for MMU protection regions that are smaller than TARGET_PAGE_SIZE. We do this by marking the TLB entry for those pages with a flag TLB_RECHECK. This flag causes us to always take the slow-path for accesses. In the slow path we can then special case them to always call tlb_fill() again, so we have the correct information for the exact address being accessed. This change allows us to handle reading and writing from small regions; we cannot deal with execution from the small region. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20180620130619.11362-2-peter.maydell@linaro.org	2018-06-26 17:50:41 +01:00
Emilio G. Cota	0ac20318ce	tcg: remove tb_lock Use mmap_lock in user-mode to protect TCG state and the page descriptors. In !user-mode, each vCPU has its own TCG state, so no locks needed. Per-page locks are used to protect the page descriptors. Per-TB locks are used in both modes to protect TB jumps. Some notes: - tb_lock is removed from notdirty_mem_write by passing a locked page_collection to tb_invalidate_phys_page_fast. - tcg_tb_lookup/remove/insert/etc have their own internal lock(s), so there is no need to further serialize access to them. - do_tb_flush is run in a safe async context, meaning no other vCPU threads are running. Therefore acquiring mmap_lock there is just to please tools such as thread sanitizer. - Not visible in the diff, but tb_invalidate_phys_page already has an assert_memory_lock. - cpu_io_recompile is !user-only, so no mmap_lock there. - Added mmap_unlock()'s before all siglongjmp's that could be called in user-mode while mmap_lock is held. + Added an assert for !have_mmap_lock() after returning from the longjmp in cpu_exec, just like we do in cpu_exec_step_atomic. Performance numbers before/after: Host: AMD Opteron(tm) Processor 6376 ubuntu 17.04 ppc64 bootup+shutdown time 700 +-+--+----+------+------------+-----------+--------------+-+ \| + + + + + B \| \| before *B* ** * \| \|tb lock removal ###D### * \| 600 +-+ * +-+ \| ** # \| \| B #D \| \| *** * ## \| 500 +-+ *** ### +-+ \| * *** ### \| \| B # ## \| \| ** * #D# \| 400 +-+ ## +-+ \| ### \| \| ## \| \| # ## \| 300 +-+ * B* #D# +-+ \| B *** ### \| \| * ** #### \| \| * *** ### \| 200 +-+ B B #D# +-+ \| #B * ## # \| \| #* ## \| \| + D##D# + + + + \| 100 +-+--+----+------+------------+-----------+------------+--+-+ 1 8 16 Guest CPUs 48 64 png: https://imgur.com/HwmBHXe debian jessie aarch64 bootup+shutdown time 90 +-+--+-----+-----+------------+------------+------------+--+-+ \| + + + + + + \| \| before *B* B \| 80 +tb lock removal ###D### D +-+ \| ### \| \| ## \| 70 +-+ # +-+ \| ## \| \| # \| 60 +-+ B ## +-+ \| * ## \| \| * #D \| 50 +-+ * ## +-+ \| * ### \| \| B* ### \| 40 +-+ ** # ## +-+ \| #D# \| \| B* ### \| 30 +-+ B*B #### +-+ \| B * * # ### \| \| B ###D# \| 20 +-+ D ##D## +-+ \| D# \| \| + + + + + + \| 10 +-+--+-----+-----+------------+------------+------------+--+-+ 1 8 16 Guest CPUs 48 64 png: https://imgur.com/iGpGFtv The gains are high for 4-8 CPUs. Beyond that point, however, unrelated lock contention significantly hurts scalability. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-06-15 08:18:48 -10:00
Emilio G. Cota	705ad1ff0c	translate-all: remove tb_lock mention from cpu_restore_state_from_tb tb_lock was needed when the function did retranslation. However, since `fca8a500d5` ("tcg: Save insn data and use it in cpu_restore_state_from_tb") we don't do retranslation. Get rid of the comment. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-06-15 08:18:48 -10:00
Emilio G. Cota	b7542f7fe8	cputlb: remove tb_lock from tlb_flush functions The acquisition of tb_lock was added when the async tlb_flush was introduced in `e3b9ca810` ("cputlb: introduce tlb_flush_* async work.") tb_lock was there to allow us to do memset() on the tb_jmp_cache's. However, since `f3ced3c592` ("tcg: consistently access cpu->tb_jmp_cache atomically") all accesses to tb_jmp_cache are atomic, so tb_lock is not needed here. Get rid of it. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-06-15 08:18:48 -10:00
Emilio G. Cota	194125e3eb	translate-all: protect TB jumps with a per-destination-TB lock This applies to both user-mode and !user-mode emulation. Instead of relying on a global lock, protect the list of incoming jumps with tb->jmp_lock. This lock also protects tb->cflags, so update all tb->cflags readers outside tb->jmp_lock to use atomic reads via tb_cflags(). In order to find the destination TB (and therefore its jmp_lock) from the origin TB, we introduce tb->jmp_dest[]. I considered not using a linked list of jumps, which simplifies code and makes the struct smaller. However, it unnecessarily increases memory usage, which results in a performance decrease. See for instance these numbers booting+shutting down debian-arm: Time (s) Rel. err (%) Abs. err (s) Rel. slowdown (%) ------------------------------------------------------------------------------ before 20.88 0.74 0.154512 0. after 20.81 0.38 0.079078 -0.33524904 GTree 21.02 0.28 0.058856 0.67049808 GHashTable + xxhash 21.63 1.08 0.233604 3.5919540 Using a hash table or a binary tree to keep track of the jumps doesn't really pay off, not only due to the increased memory usage, but also because most TBs have only 0 or 1 jumps to them. The maximum number of jumps when booting debian-arm that I measured is 35, but as we can see in the histogram below a TB with that many incoming jumps is extremely rare; the average TB has 0.80 incoming jumps. n_jumps: 379208; avg jumps/tb: 0.801099 dist: [0.0,1.0)\|▄█▁▁▁▁▁▁▁▁▁▁▁ ▁▁▁▁▁▁ ▁▁▁ ▁▁▁ ▁\|[34.0,35.0] Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-06-15 08:18:48 -10:00
Emilio G. Cota	95590e24af	translate-all: discard TB when tb_link_page returns an existing matching TB Use the recently-gained QHT feature of returning the matching TB if it already exists. This allows us to get rid of the lookup we perform right after acquiring tb_lock. Suggested-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-06-15 08:18:42 -10:00
Emilio G. Cota	faa9372c07	translate-all: introduce assert_no_pages_locked The appended adds assertions to make sure we do not longjmp with page locks held. Note that user-mode has nothing to check, since page_locks are !user-mode only. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-06-15 07:42:55 -10:00
Emilio G. Cota	6d9abf85d5	translate-all: add page_locked assertions This is only compiled under CONFIG_DEBUG_TCG to avoid bloating the binary. In user-mode, assert_page_locked is equivalent to assert_mmap_lock. Note: There are some tb_lock assertions left that will be removed by later patches. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Suggested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-06-15 07:42:55 -10:00
Emilio G. Cota	0b5c91f74f	translate-all: use per-page locking in !user-mode Groundwork for supporting parallel TCG generation. Instead of using a global lock (tb_lock) to protect changes to pages, use fine-grained, per-page locks in !user-mode. User-mode stays with mmap_lock. Sometimes changes need to happen atomically on more than one page (e.g. when a TB that spans across two pages is added/invalidated, or when a range of pages is invalidated). We therefore introduce struct page_collection, which helps us keep track of a set of pages that have been locked in the appropriate locking order (i.e. by ascending page index). This commit first introduces the structs and the function helpers, to then convert the calling code to use per-page locking. Note that tb_lock is not removed yet. While at it, rename tb_alloc_page to tb_page_add, which pairs with tb_page_remove. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-06-15 07:42:55 -10:00
Emilio G. Cota	45c73de594	translate-all: move tb_invalidate_phys_page_range up in the file This greatly simplifies next commit's diff. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-06-15 07:42:55 -10:00
Emilio G. Cota	ae5486e273	translate-all: work page-by-page in tb_invalidate_phys_range_1 So that we pass a same-page range to tb_invalidate_phys_page_range, instead of always passing an end address that could be on a different page. As discussed with Peter Maydell on the list [1], tb_invalidate_phys_page_range doesn't actually do much with 'end', which explains why we have never hit a bug despite going against what the comment on top of tb_invalidate_phys_page_range requires: > * Invalidate all TBs which intersect with the target physical address range > * [start;end[. NOTE: start and end must refer to the same physical page. The appended honours the comment, which avoids confusion. While at it, rework the loop into a for loop, which is less error prone (e.g. "continue" won't result in an infinite loop). [1] https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg09165.html Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-06-15 07:42:55 -10:00
Emilio G. Cota	94da9aec2a	translate-all: remove hole in PageDesc Groundwork for supporting parallel TCG generation. Move the hole to the end of the struct, so that a u32 field can be added there without bloating the struct. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-06-15 07:42:55 -10:00
Emilio G. Cota	78722ed0b8	translate-all: make l1_map lockless Groundwork for supporting parallel TCG generation. We never remove entries from the radix tree, so we can use cmpxchg to implement lockless insertions. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-06-15 07:42:55 -10:00
Emilio G. Cota	1e05197f24	translate-all: iterate over TBs in a page with PAGE_FOR_EACH_TB This commit does several things, but to avoid churn I merged them all into the same commit. To wit: - Use uintptr_t instead of TranslationBlock * for the list of TBs in a page. Just like we did in (`c37e6d7e` "tcg: Use uintptr_t type for jmp_list_{next\|first} fields of TB"), the rationale is the same: these are tagged pointers, not pointers. So use a more appropriate type. - Only check the least significant bit of the tagged pointers. Masking with 3/~3 is unnecessary and confusing. - Introduce the TB_FOR_EACH_TAGGED macro, and use it to define PAGE_FOR_EACH_TB, which improves readability. Note that TB_FOR_EACH_TAGGED will gain another user in a subsequent patch. - Update tb_page_remove to use PAGE_FOR_EACH_TB. In case there is a bug and we attempt to remove a TB that is not in the list, instead of segfaulting (since the list is NULL-terminated) we will reach g_assert_not_reached(). Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-06-15 07:42:55 -10:00
Emilio G. Cota	128ed2278c	tcg: move tb_ctx.tb_phys_invalidate_count to tcg_ctx Thereby making it per-TCGContext. Once we remove tb_lock, this will avoid an atomic increment every time a TB is invalidated. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-06-15 07:42:55 -10:00
Emilio G. Cota	be2cdc5e35	tcg: track TBs with per-region BST's This paves the way for enabling scalable parallel generation of TCG code. Instead of tracking TBs with a single binary search tree (BST), use a BST for each TCG region, protecting it with a lock. This is as scalable as it gets, since each TCG thread operates on a separate region. The core of this change is the introduction of struct tcg_region_tree, which contains a pointer to a GTree and an associated lock to serialize accesses to it. We then allocate an array of tcg_region_tree's, adding the appropriate padding to avoid false sharing based on qemu_dcache_linesize. Given a tc_ptr, we first find the corresponding region_tree. This is done by special-casing the first and last regions first, since they might be of size != region.size; otherwise we just divide the offset by region.stride. I was worried about this division (several dozen cycles of latency), but profiling shows that this is not a fast path. Note that region.stride is not required to be a power of two; it is only required to be a multiple of the host's page size. Note that with this design we can also provide consistent snapshots about all region trees at once; for instance, tcg_tb_foreach acquires/releases all region_tree locks before/after iterating over them. For this reason we now drop tb_lock in dump_exec_info(). As an alternative I considered implementing a concurrent BST, but this can be tricky to get right, offers no consistent snapshots of the BST, and performance and scalability-wise I don't think it could ever beat having separate GTrees, given that our workload is insert-mostly (all concurrent BST designs I've seen focus, understandably, on making lookups fast, which comes at the expense of convoluted, non-wait-free insertions/removals). Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-06-15 07:42:55 -10:00
Emilio G. Cota	32359d529f	qht: return existing entry when qht_insert fails The meaning of "existing" is now changed to "matches in hash and ht->cmp result". This is saner than just checking the pointer value. Suggested-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-06-15 07:42:55 -10:00
Emilio G. Cota	61b8cef1d4	qht: require a default comparison function qht_lookup now uses the default cmp function. qht_lookup_custom is defined to retain the old behaviour, that is a cmp function is explicitly provided. qht_insert will gain use of the default cmp in the next patch. Note that we move qht_lookup_custom's @func to be the last argument, which makes the new qht_lookup as simple as possible. Instead of this (i.e. keeping @func 2nd): 0000000000010750 <qht_lookup>: 10750: 89 d1 mov %edx,%ecx 10752: 48 89 f2 mov %rsi,%rdx 10755: 48 8b 77 08 mov 0x8(%rdi),%rsi 10759: e9 22 ff ff ff jmpq 10680 <qht_lookup_custom> 1075e: 66 90 xchg %ax,%ax We get: 0000000000010740 <qht_lookup>: 10740: 48 8b 4f 08 mov 0x8(%rdi),%rcx 10744: e9 37 ff ff ff jmpq 10680 <qht_lookup_custom> 10749: 0f 1f 80 00 00 00 00 nopl 0x0(%rax) Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-06-15 07:42:55 -10:00
Peter Maydell	1f871c5e6b	exec.c: Handle IOMMUs in address_space_translate_for_iotlb() Currently we don't support board configurations that put an IOMMU in the path of the CPU's memory transactions, and instead just assert() if the memory region fonud in address_space_translate_for_iotlb() is an IOMMUMemoryRegion. Remove this limitation by having the function handle IOMMUs. This is mostly straightforward, but we must make sure we have a notifier registered for every IOMMU that a transaction has passed through, so that we can flush the TLB appropriately when any of the IOMMUs change their mappings. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20180604152941.20374-5-peter.maydell@linaro.org	2018-06-15 15:23:34 +01:00
Peter Maydell	2d54f19401	cputlb: Pass cpu_transaction_failed() the correct physaddr The API for cpu_transaction_failed() says that it takes the physical address for the failed transaction. However we were actually passing it the offset within the target MemoryRegion. We don't currently have any target CPU implementations of this hook that require the physical address; fix this bug so we don't get confused if we ever do add one. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20180611125633.32755-3-peter.maydell@linaro.org	2018-06-15 15:23:34 +01:00
Peter Maydell	ace4109011	cpu-defs.h: Document CPUIOTLBEntry 'addr' field The 'addr' field in the CPUIOTLBEntry struct has a rather non-obvious use; add a comment documenting it (reverse-engineered from what the code that sets it is doing). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20180611125633.32755-2-peter.maydell@linaro.org	2018-06-15 15:23:34 +01:00
Peter Maydell	afd76ffba9	* Linux header upgrade (Peter) * firmware.json definition (Laszlo) * IPMI migration fix (Corey) * QOM improvements (Alexey, Philippe, me) * Memory API cleanups (Jay, me, Tristan, Peter) * WHPX fixes and improvements (Lucian) * Chardev fixes (Marc-André) * IOMMU documentation improvements (Peter) * Coverity fixes (Peter, Philippe) * Include cleanup (Philippe) * -clock deprecation (Thomas) * Disable -sandbox unless CONFIG_SECCOMP (Yi Min Zhao) * Configurability improvements (me) -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAlsRd2UUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroPG8Qf+M85E8xAQ/bhs90tAymuXkUUsTIFF uI76K8eM0K3b2B+vGckxh1gyN5O3GQaMEDL7vITfqbX+EOH5U2lv8V9JRzf2YvbG Zahjd4pOCYzR0b9JENA1r5U/J8RntNrBNXlKmGTaXOaw9VCXlZyvgVd9CE3z/e2M 0jSXMBdF4LB3UzECI24Va8ejJxdSiJcqXA2j3J+pJFxI698i+Z5eBBKnRdo5TVe5 jl0TYEsbS6CLwhmbLXmt3Qhq+ocZn7YH9X3HjkHEdqDUeYWyT9jwUpa7OHFrIEKC ikWm9er4YDzG/vOC0dqwKbShFzuTpTJuMz5Mj4v8JjM/iQQFrp4afjcW2g== =RS/B -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging * Linux header upgrade (Peter) * firmware.json definition (Laszlo) * IPMI migration fix (Corey) * QOM improvements (Alexey, Philippe, me) * Memory API cleanups (Jay, me, Tristan, Peter) * WHPX fixes and improvements (Lucian) * Chardev fixes (Marc-André) * IOMMU documentation improvements (Peter) * Coverity fixes (Peter, Philippe) * Include cleanup (Philippe) * -clock deprecation (Thomas) * Disable -sandbox unless CONFIG_SECCOMP (Yi Min Zhao) * Configurability improvements (me) # gpg: Signature made Fri 01 Jun 2018 17:42:13 BST # gpg: using RSA key BFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: (56 commits) hw: make virtio devices configurable via default-configs/ hw: allow compiling out SCSI memory: Make operations using MemoryRegionIoeventfd struct pass by pointer. char: Remove unwanted crlf conversion qdev: Remove DeviceClass::init() and ::exit() qdev: Simplify the SysBusDeviceClass::init path hw/i2c: Use DeviceClass::realize instead of I2CSlaveClass::init hw/i2c/smbus: Use DeviceClass::realize instead of SMBusDeviceClass::init target/i386/kvm.c: Remove compatibility shim for KVM_HINTS_REALTIME Update Linux headers to 4.17-rc6 target/i386/kvm.c: Handle renaming of KVM_HINTS_DEDICATED scripts/update-linux-headers: Handle kernel license no longer being one file scripts/update-linux-headers: Handle __aligned_u64 virtio-gpu-3d: Define VIRTIO_GPU_CAPSET_VIRGL2 elsewhere gdbstub: Prevent fd leakage docs/interop: add "firmware.json" ipmi: Use proper struct reference for KCS vmstate vmstate: Add a VSTRUCT type tcg: remove softfloat from --disable-tcg builds qemu-options: Mark the non-functional -clock option as deprecated ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-06-01 18:24:16 +01:00
Philippe Mathieu-Daudé	df924c0643	accel: Do not include "exec/address-spaces.h" if it is not necessary Code change produced with: $ git grep '#include "exec/address-spaces.h"' accel \| \ cut -d: -f-1 \| \ xargs egrep -L "(get_system_\|address_space_)" \| \ xargs sed -i.bak '/#include "exec\/address-spaces.h"/d' Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20180528232719.4721-3-f4bug@amsat.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-05-31 19:12:13 +02:00
Peter Maydell	bc6b1cec84	Make address_space_translate{, _cached}() take a MemTxAttrs argument As part of plumbing MemTxAttrs down to the IOMMU translate method, add MemTxAttrs as an argument to address_space_translate() and address_space_translate_cached(). Callers either have an attrs value to hand, or don't care and can use MEMTXATTRS_UNSPECIFIED. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20180521140402.23318-4-peter.maydell@linaro.org	2018-05-31 14:50:52 +01:00
Peter Maydell	c874dc4f5e	Make tb_invalidate_phys_addr() take a MemTxAttrs argument As part of plumbing MemTxAttrs down to the IOMMU translate method, add MemTxAttrs as an argument to tb_invalidate_phys_addr(). Its callers either have an attrs value to hand, or don't care and can use MEMTXATTRS_UNSPECIFIED. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20180521140402.23318-3-peter.maydell@linaro.org	2018-05-31 14:50:52 +01:00
Laurent Vivier	4a4ff4c58f	Remove unnecessary variables for function return value Re-run Coccinelle script scripts/coccinelle/return_directly.cocci Signed-off-by: Laurent Vivier <lvivier@redhat.com> ppc part Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2018-05-20 08:48:13 +03:00
Peter Maydell	ae76518047	tcg: Optionally log FPU state in TCG -d cpu logging Usually the logging of the CPU state produced by -d cpu is sufficient to diagnose problems, but sometimes you want to see the state of the floating point registers as well. We don't want to enable that by default as it adds a lot of extra data to the log; instead, allow it to be optionally enabled via -d fpu. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20180510130024.31678-1-peter.maydell@linaro.org	2018-05-15 14:58:44 +01:00
Peter Maydell	f5583c527f	target-arm queue: * hw/arm/iotkit.c: fix minor memory leak * softfloat: fix wrong-exception-flags bug for multiply-add corner case * arm: isolate and clean up DTB generation * implement Arm v8.1-Atomics extension * Fix some bugs and missing instructions in the v8.2-FP16 extension -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABCAAGBQJa9IUCAAoJEDwlJe0UNgzeEGMQAKKjVRzZ7MBgvxQj0FJSWhSP BZkATf3ktid255PRpIssBZiY9oM+uY6n+/IRozAGvfDBp9eQOkrZczZjfW5hpe0B YsQadtk5cUOXqQzRTegSMPOoMmz8f5GaGOk4R6AEXJEX+Rug/zbOn9Q8Yx7JTd7o yBvU1+fys3galSiB88cffA95B9fwGfLsM7rP6OC4yNdUBYwjHf3wtY53WsxtWqX9 oX4keEiROQkrOfbSy9wYPZzu/0iRo8v35+7wIZhvNSlf02k6yJ7a+w0C4EQIRhWm 5zciE+aMYr7nOGpj7AEJLrRekhwnD6Ppje6aUd15yrxfNRZkpk/FeECWnaOPDis7 QNijx5Zqg6+GyItQKi5U4vFVReMj09OB7xDyAq77xDeBj4l3lg2DNkRfRhqQZAcv 2r4EW+pfLNj76Ah1qtQ410fprw462Sopb6bHmeuFbf1QFbQvJ4CL1+7Jl3ExrDX4 2+iQb4sQghWDxhDLfRSLxQ7K+bX+mNfGdFW8h+jPShD/+JY42dTKkFZEl4ghNgMD mpj8FrQuIkSMqnDmPfoTG5MVTMERacqPU7GGM7/fxudIkByO3zTiLxJ/E+Iy8HvX 29xKoOBjKT5FJrwJABsN6VpA3EuyAARgQIZ/dd6N5GZdgn2KAIHuaI+RHFOesKFd dJGM6sdksnsAAz28aUEJ =uXY+ -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20180510' into staging target-arm queue: * hw/arm/iotkit.c: fix minor memory leak * softfloat: fix wrong-exception-flags bug for multiply-add corner case * arm: isolate and clean up DTB generation * implement Arm v8.1-Atomics extension * Fix some bugs and missing instructions in the v8.2-FP16 extension # gpg: Signature made Thu 10 May 2018 18:44:34 BST # gpg: using RSA key 3C2525ED14360CDE # gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>" # gpg: aka "Peter Maydell <pmaydell@gmail.com>" # gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>" # Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83 15CF 3C25 25ED 1436 0CDE * remotes/pmaydell/tags/pull-target-arm-20180510: (21 commits) target/arm: Clear SVE high bits for FMOV target/arm: Fix float16 to/from int16 target/arm: Implement vector shifted FCVT for fp16 target/arm: Implement vector shifted SCVF/UCVF for fp16 target/arm: Enable ARM_FEATURE_V8_ATOMICS for user-only target/arm: Implement CAS and CASP target/arm: Fill in disas_ldst_atomic target/arm: Introduce ARM_FEATURE_V8_ATOMICS and initial decode target/riscv: Use new atomic min/max expanders tcg: Use GEN_ATOMIC_HELPER_FN for opposite endian atomic add tcg: Introduce atomic helpers for integer min/max target/xtensa: Use new min/max expanders target/arm: Use new min/max expanders tcg: Introduce helpers for integer min/max atomic.h: Work around gcc spurious "unused value" warning make sure that we aren't overwriting mc->get_hotplug_handler by accident arm/boot: split load_dtb() from arm_load_kernel() platform-bus-device: use device plug callback instead of machine_done notifier pc: simplify MachineClass::get_hotplug_handler handling softfloat: Handle default NaN mode after pickNaNMulAdd, not before ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org> # Conflicts: # target/riscv/translate.c	2018-05-11 17:41:54 +01:00
Richard Henderson	58edf9eef9	tcg: Use GEN_ATOMIC_HELPER_FN for opposite endian atomic add Suggested-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20180508151437.4232-6-richard.henderson@linaro.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-05-10 18:10:57 +01:00
Richard Henderson	5507c2bf35	tcg: Introduce atomic helpers for integer min/max Given that this atomic operation will be used by both risc-v and aarch64, let's not duplicate code across the two targets. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20180508151437.4232-5-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-05-10 18:10:57 +01:00
Emilio G. Cota	b542683d77	translator: merge max_insns into DisasContextBase While at it, use int for both num_insns and max_insns to make sure we have same-type comparisons. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Michael Clark <mjc@sifive.com> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-05-09 10:12:21 -07:00
Pavel Dovgalyuk	afd46fcad2	icount: fix cpu_restore_state_from_tb for non-tb-exit cases In icount mode, instructions that access io memory spaces in the middle of the translation block invoke TB recompilation. After recompilation, such instructions become last in the TB and are allowed to access io memory spaces. When the code includes instruction like i386 'xchg eax, 0xffffd080' which accesses APIC, QEMU goes into an infinite loop of the recompilation. This instruction includes two memory accesses - one read and one write. After the first access, APIC calls cpu_report_tpr_access, which restores the CPU state to get the current eip. But cpu_restore_state_from_tb resets the cpu->can_do_io flag which makes the second memory access invalid. Therefore the second memory access causes a recompilation of the block. Then these operations repeat again and again. This patch moves resetting cpu->can_do_io flag from cpu_restore_state_from_tb to cpu_loop_exit* functions. It also adds a parameter for cpu_restore_state which controls restoring icount. There is no need to restore icount when we only query CPU state without breaking the TB. Restoring it in such cases leads to the incorrect flow of the virtual time. In most cases new parameter is true (icount should be recalculated). But there are two cases in i386 and openrisc when the CPU state is only queried without the need to break the TB. This patch fixes both of these cases. Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru> Message-Id: <20180409091320.12504.35329.stgit@pasha-VirtualBox> [rth: Make can_do_io setting unconditional; move from cpu_exec; make cpu_loop_exit_{noexc,restore} call cpu_loop_exit.] Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-04-11 09:05:22 +10:00
Richard Henderson	6cb1d3b851	tcg: Fix out-of-line generic vector compares A mistake in the type passed to sizeof, that happens to work when the out-of-line fallback itself is using host vectors, but fails when using only the base types. Tested-by: Emilio G. Cota <cota@braap.org> Reported-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-04-06 23:08:50 +10:00
Richard Henderson	87f963be66	tcg: Really fix cpu_io_recompile We have confused the number of instructions that have been executed in the TB with the number of instructions needed to repeat the I/O instruction. We have used cpu_restore_state_from_tb, which means that the guest pc is pointing to the I/O instruction. The only time the answer to the later question is not 1 is when MIPS or SH4 need to re-execute the branch for the delay slot as well. We must rely on cpu->cflags_next_tb to generate the next TB, as otherwise we have a race condition with other guest cpus within the TB cache. Fixes: `0790f86861` Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20180319031545.29359-1-richard.henderson@linaro.org> Tested-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-03-26 14:37:14 +02:00
Pavel Dovgalyuk	0790f86861	tcg: fix cpu_io_recompile cpu_io_recompile() function was broken by the commit `9b990ee5a3`. Instead of regenerating the block starting from PC of the original block, it just set the instruction counter for TCG. In most cases this was unnoticed, but in icount mode there was an exception for incorrect usage of CF_LAST_IO flag. This patch recovers recompilation of the original block and also configures translation for executing single IO instruction which caused a recompilation. Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru> Message-Id: <20180227095338.1060.27385.stgit@pasha-VirtualBox> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>	2018-03-12 17:10:38 +01:00
Pavel Dovgalyuk	5f3bdfd4fa	cpu-exec: fix exception_index handling Function cpu_handle_interrupt calls cc->cpu_exec_interrupt to process pending hardware interrupts. Under the hood cpu_exec_interrupt uses cpu->exception_index to pass information to the internal function which is usually common for exception and interrupt processing. But this value is not reset after return and may be processed again by cpu_handle_exception. This does not happen due to overwriting the exception_index at the end of cpu_handle_interrupt. But this branch may also overwrite the valid exception_index in some cases. Therefore this patch: 1. resets exception_index just after the call to cpu_exec_interrupt 2. prevents overwriting the meaningful value of exception_index Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20180227095140.1060.61357.stgit@pasha-VirtualBox> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>	2018-03-12 16:12:50 +01:00
Richard Henderson	22fc352703	tcg: Add generic vector helpers with a scalar operand Use dup to convert a non-constant scalar to a third vector. Add addition, multiplication, and logical operations with an immediate. Add addition, subtraction, multiplication, and logical operations with a non-constant scalar. Allow for the front-end to build operations in which the scalar operand comes first. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-02-08 15:54:06 +00:00
Richard Henderson	f49b12c6e6	tcg: Add generic helpers for saturating arithmetic No vector ops as yet. SSE only has direct support for 8- and 16-bit saturation; handling 32- and 64-bit saturation is much more expensive. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-02-08 15:54:06 +00:00
Richard Henderson	3774030a3e	tcg: Add generic vector ops for multiplication Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-02-08 15:54:06 +00:00
Richard Henderson	212be173f0	tcg: Add generic vector ops for comparisons Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-02-08 15:54:05 +00:00
Richard Henderson	d0ec97967f	tcg: Add generic vector ops for constant shifts Opcodes are added for scalar and vector shifts, but considering the varied semantics of these do not expose them to the front ends. Do go ahead and provide them in case they are needed for backend expansion. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-02-08 15:54:05 +00:00
Richard Henderson	db432672dc	tcg: Add generic vector expanders Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-02-08 15:54:05 +00:00
Peter Maydell	b1cef6d02f	Drop remaining bits of ia64 host support We dropped support for ia64 host CPUs in the 2.11 release (removing the TCG backend for it, and advertising the support as being completely removed in the changelog). However there are a few bits and pieces of code still floating about. Remove those, too. We can drop the check in configure for "ia64 or hppa host?" entirely, because we don't support hppa hosts either any more. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <1516897189-11035-1-git-send-email-peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-02-05 18:09:45 +01:00
Laurent Vivier	98670d47cd	accel/tcg: add size paremeter in tlb_fill() The MC68040 MMU provides the size of the access that triggers the page fault. This size is set in the Special Status Word which is written in the stack frame of the access fault exception. So we need the size in m68k_cpu_unassigned_access() and m68k_cpu_handle_mmu_fault(). To be able to do that, this patch modifies the prototype of handle_mmu_fault handler, tlb_fill() and probe_write(). do_unassigned_access() already includes a size parameter. This patch also updates handle_mmu_fault handlers and tlb_fill() of all targets (only parameter, no code change). Signed-off-by: Laurent Vivier <laurent@vivier.eu> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20180118193846.24953-2-laurent@vivier.eu>	2018-01-25 16:02:24 +01:00
Peter Maydell	9c4bbee9e3	page_unprotect(): handle calls to pages that are PAGE_WRITE If multiple guest threads in user-mode emulation write to a page which QEMU has marked read-only because of cached TCG translations, the threads can race in page_unprotect: * threads A & B both try to do a write to a page with code in it at the same time (ie which we've made non-writeable, so SEGV) * they race into the signal handler with this faulting address * thread A happens to get to page_unprotect() first and takes the mmap lock, so thread B sits waiting for it to be done * A then finds the page, marks it PAGE_WRITE and mprotect()s it writable * A can then continue OK (returns from signal handler to retry the memory access) * ...but when B gets the mmap lock it finds that the page is already PAGE_WRITE, and so it exits page_unprotect() via the "not due to protected translation" code path, and wrongly delivers the signal to the guest rather than just retrying the access In particular, this meant that trying to run 'javac' in user-mode emulation would fail with a spurious guest SIGSEGV. Handle this by making page_unprotect() assume that a call for a page which is already PAGE_WRITE is due to a race of this sort and return a "fault handled" indication. Since this would cause an infinite loop if we ever called page_unprotect() for some other kind of fault than "write failed due to bad access permissions", tighten the condition in handle_cpu_signal() to check the signal number and si_code, and add a comment so that if somebody does ever find themselves debugging an infinite loop of faults they have some clue about why. (The trick for identifying the correct setting for current_tb_invalidated for thread B (needed to handle the precise-SMC case) is due to Richard Henderson. Paolo Bonzini suggested just relying on si_code rather than trying anything more complicated.) Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <1511879725-9576-3-git-send-email-peter.maydell@linaro.org> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2018-01-23 14:20:53 +01:00
Peter Maydell	a78b1299f1	linux-user: Propagate siginfo_t through to handle_cpu_signal() Currently all the architecture/OS specific cpu_signal_handler() functions call handle_cpu_signal() without passing it the siginfo_t. We're going to want that so we can look at the si_code to determine whether this is a SEGV_ACCERR access violation or some other kind of fault, so change the functions to pass through the pointer to the siginfo_t rather than just the si_addr value. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <1511879725-9576-2-git-send-email-peter.maydell@linaro.org> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2018-01-23 14:20:52 +01:00
Paolo Bonzini	4fad446bc9	tcg: add cs_base and flags to -d exec output Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20171217055023.29225-1-pbonzini@redhat.com> [rth: Also change the Chain logging in helper_lookup_tb_ptr.] Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-12-29 12:43:40 -08:00
David Hildenbrand	d84be02d69	cpu-exec: fix missed CPU kick during interrupt injection The conditional memory barrier not only looks strange but actually is wrong. On s390x, I can reproduce interrupts via cpu_interrupt() not leading to a proper kick out of emulation every now and then. cpu_interrupt() is especially used for inter CPU communication via SIGP (esp. external calls and emergency interrupts). With this patch, I was not able to reproduce. (esp. no stalls or hangs in the guest). My setup is s390x MTTCG with 16 VCPUs on 8 CPU host, running make -j16. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20171129191319.11483-1-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-12-21 09:22:44 +01:00
Philippe Mathieu-Daudé	ff676046fb	misc: remove duplicated includes exec: housekeeping (funny since `02d0e09503`) applied using ./scripts/clean-includes Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Acked-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2017-12-18 17:07:02 +03:00
Peter Maydell	4d60b25b37	accel/tcg/cpu-exec-common.c: Remove unnecessary include of memory-internal.h The cpu-exec-common.c file includes memory-internal.h, but it doesn't actually use anything from that header. Remove the unnecessary include. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2017-12-18 17:07:02 +03:00
Emilio G. Cota	55bbc8610c	translate-all: fix 'consisits' typo in comment Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2017-12-18 17:07:02 +03:00
Peter Maydell	34d49937e4	accel/tcg: Handle atomic accesses to notdirty memory correctly To do a write to memory that is marked as notdirty, we need to invalidate any TBs we have cached for that memory, and update the cpu physical memory dirty flags for VGA and migration. The slowpath code in notdirty_mem_write() does all this correctly, but the new atomic handling code in atomic_mmu_lookup() doesn't do anything at all, it just clears the dirty bit in the TLB. The effect of this bug is that if the first write to a notdirty page for which we have cached TBs is by a guest atomic access, we fail to invalidate the TBs and subsequently will execute incorrect code. This can be seen by trying to run 'javac' on AArch64. Use the new notdirty_call_before() and notdirty_call_after() functions to correctly handle the update to notdirty memory in the atomic codepath. Cc: qemu-stable@nongnu.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 1511201308-23580-3-git-send-email-peter.maydell@linaro.org	2017-11-21 12:09:25 +00:00
Peter Maydell	b11ce33fe0	Revert "cpu-exec: don't overwrite exception_index" This reverts commit `e01cecabf3`, which breaks booting of aarch64 Linux images. Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-11-20 10:58:27 +00:00
Peter Maydell	62955e101e	Miscellaneous bugfixes -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAloMXN0UHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroNNAQf/e7/uT2tW7WNfamSOMYXswf0R6ak+ KjVSG+qiNsKaZzXmMFkhm4n0u1vCW0VGEQGRHr0MoSCyyfhupzLRHxfHi8SytqTf S6wqNtIbOK0L8bW+U5vzADks33UCuuUNlVZeOAkEPaXiLlgxmBoHfyoXkIGemJc2 epx5x22rloNQLaBoL7FGmAkQhQCSJg19hAtRLo0tkryCwBZ9P6a1K0aNAHU2RFaB LgRFcxwduwTydsHRYeQ8J7YR0fERle01QUB8y9tlOc8/d2x9yRPBWhPHwscKMv6I JwM0c2Mnw6Yqbwyj7snWty7epgUcHWrOVnZnaIpNW9Z8m/wgz28oZ3a09w== =6wL6 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging Miscellaneous bugfixes # gpg: Signature made Wed 15 Nov 2017 15:27:25 GMT # gpg: using RSA key 0xBFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: fix scripts/update-linux-headers.sh here document exec: Do not resolve subpage in mru_section util/stats64: Fix min/max comparisons cpu-exec: avoid cpu_exec_nocache infinite loop with record/replay cpu-exec: don't overwrite exception_index vhost-user-scsi: add missing virtqueue_size param target-i386: adds PV_TLB_FLUSH CPUID feature bit thread-posix: fix qemu_rec_mutex_trylock macro Makefile: simpler/faster "make help" ioapic/tracing: Remove last DPRINTFs Enable 8-byte wide MMIO for 16550 serial devices Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-11-16 14:42:54 +00:00
Richard Henderson	ec603b5584	tcg: Record code_gen_buffer address for user-only memory helpers When we handle a signal from a fault within a user-only memory helper, we cannot cpu_restore_state with the PC found within the signal frame. Use a TLS variable, helper_retaddr, to record the unwind start point to find the faulting guest insn. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-11-15 10:33:27 +01:00
Pavel Dovgalyuk	17b50b0c29	cpu-exec: avoid cpu_exec_nocache infinite loop with record/replay This patch ensures that icount_decr.u32.high is clear before calling cpu_exec_nocache when exception is pending. Because the exception is caused by the first instruction in the block and it cannot be executed without resetting the flag. There are two parts in the fix. First, clear icount_decr.u32.high in cpu_handle_interrupt (just before processing the "dependent" request, stored in cpu->interrupt_request or cpu->exit_request) rather than cpu_loop_exec_tb; this ensures that cpu_handle_exception is always reached with zero icount_decr.u32.high unless another interrupt has happened in the meanwhile. Second, try to cause the exception at the beginning of cpu_handle_exception, and exit immediately if the TB cannot execute. With this change, interrupts are processed and cpu_exec_nocache can make process. Signed-off-by: Maria Klimushenkova <maria.klimushenkova@ispras.ru> Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru> Message-Id: <20171114081818.27640.33165.stgit@pasha-VirtualBox> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-11-14 14:46:46 +01:00
Pavel Dovgalyuk	e01cecabf3	cpu-exec: don't overwrite exception_index This patch adds a condition before overwriting exception_index fiels. It is needed when exception_index is already set to some meaningful value. Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru> Message-Id: <20171114081812.27640.26372.stgit@pasha-VirtualBox> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-11-14 14:46:46 +01:00
Alex Bennée	d25f2a7227	accel/tcg/translate-all: expand cpu_restore_state addr check We are still seeing signals during translation time when we walk over a page protection boundary. This expands the check to ensure the host PC is inside the code generation buffer. The original suggestion was to check versus tcg_ctx.code_gen_ptr but as we now segment the translation buffer we have to settle for just a general check for being inside. I've also fixed up the declaration to make it clear it can deal with invalid addresses. A later patch will fix up the call sites. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reported-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20171108153245.20740-2-alex.bennee@linaro.org Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Tested-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-11-13 13:55:27 +00:00
Peter Maydell	426eeecdf5	cpu-exec: Exit exclusive region on longjmp from step_atomic Commit `ac03ee5331` narrowed the scope of the exclusive region so it only covers when we're executing the TB, not when we're generating it. However it missed that there is more than one execution path out of cpu_tb_exec -- if the atomic insn causes an exception then the code will longjmp out, skipping the code to end the exclusive region. This causes QEMU to hang the next time the CPU calls start_exclusive(), waiting for itself to exit the region. Move the "end the region" code out to the end of the function so that it is run for both normal exit and also for exit-via-longjmp. We have to use a volatile bool flag to decide whether we need to end the region, because we can longjump out of the codegen as well as the execution. (For some reason this only reproduces for me with a clang optimized build, not a gcc debug build.) Reviewed-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Fixes: `ac03ee5331` Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <1509640536-32160-1-git-send-email-peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-11-03 09:34:21 +01:00
Emilio G. Cota	cc689485ee	translate-all: exit from tb_phys_invalidate if qht_remove fails Two or more threads might race while invalidating the same TB. We currently do not check for this at all despite taking tb_lock, which means we would wrongly invalidate the same TB more than once. This bug has actually been hit by users: I recently saw a report on IRC, although I have yet to see the corresponding test case. Fix this by using qht_remove as the synchronization point; if it fails, that means the TB has already been invalidated, and therefore there is nothing left to do in tb_phys_invalidate. Note that this solution works now that we still have tb_lock, and will continue working once we remove tb_lock. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <1508445114-4717-1-git-send-email-cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 13:53:42 -07:00
Emilio G. Cota	3468b59e18	tcg: enable multiple TCG contexts in softmmu This enables parallel TCG code generation. However, we do not take advantage of it yet since tb_lock is still held during tb_gen_code. In user-mode we use a single TCG context; see the documentation added to tcg_region_init for the rationale. Note that targets do not need any conversion: targets initialize a TCGContext (e.g. defining TCG globals), and after this initialization has finished, the context is cloned by the vCPU threads, each of them keeping a separate copy. TCG threads claim one entry in tcg_ctxs[] by atomically increasing n_tcg_ctxs. Do not be too annoyed by the subsequent atomic_read's of that variable and tcg_ctxs; they are there just to play nice with analysis tools such as thread sanitizer. Note that we do not allocate an array of contexts (we allocate an array of pointers instead) because when tcg_context_init is called, we do not know yet how many contexts we'll use since the bool behind qemu_tcg_mttcg_enabled() isn't set yet. Previous patches folded some TCG globals into TCGContext. The non-const globals remaining are only set at init time, i.e. before the TCG threads are spawned. Here is a list of these set-at-init-time globals under tcg/: Only written by tcg_context_init: - indirect_reg_alloc_order - tcg_op_defs Only written by tcg_target_init (called from tcg_context_init): - tcg_target_available_regs - tcg_target_call_clobber_regs - arm: arm_arch, use_idiv_instructions - i386: have_cmov, have_bmi1, have_bmi2, have_lzcnt, have_movbe, have_popcnt - mips: use_movnz_instructions, use_mips32_instructions, use_mips32r2_instructions, got_sigill (tcg_target_detect_isa) - ppc: have_isa_2_06, have_isa_3_00, tb_ret_addr - s390: tb_ret_addr, s390_facilities - sparc: qemu_ld_trampoline, qemu_st_trampoline (build_trampolines), use_vis3_instructions Only written by tcg_prologue_init: - 'struct jit_code_entry one_entry' - aarch64: tb_ret_addr - arm: tb_ret_addr - i386: tb_ret_addr, guest_base_flags - ia64: tb_ret_addr - mips: tb_ret_addr, bswap32_addr, bswap32u_addr, bswap64_addr Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 13:53:42 -07:00
Emilio G. Cota	e8feb96fcc	tcg: introduce regions to split code_gen_buffer This is groundwork for supporting multiple TCG contexts. The naive solution here is to split code_gen_buffer statically among the TCG threads; this however results in poor utilization if translation needs are different across TCG threads. What we do here is to add an extra layer of indirection, assigning regions that act just like pages do in virtual memory allocation. (BTW if you are wondering about the chosen naming, I did not want to use blocks or pages because those are already heavily used in QEMU). We use a global lock to serialize allocations as well as statistics reporting (we now export the size of the used code_gen_buffer with tcg_code_size()). Note that for the allocator we could just use a counter and atomic_inc; however, that would complicate the gathering of tcg_code_size()-like stats. So given that the region operations are not a fast path, a lock seems the most reasonable choice. The effectiveness of this approach is clear after seeing some numbers. I used the bootup+shutdown of debian-arm with '-tb-size 80' as a benchmark. Note that I'm evaluating this after enabling per-thread TCG (which is done by a subsequent commit). * -smp 1, 1 region (entire buffer): qemu: flush code_size=83885014 nb_tbs=154739 avg_tb_size=357 qemu: flush code_size=83884902 nb_tbs=153136 avg_tb_size=363 qemu: flush code_size=83885014 nb_tbs=152777 avg_tb_size=364 qemu: flush code_size=83884950 nb_tbs=150057 avg_tb_size=373 qemu: flush code_size=83884998 nb_tbs=150234 avg_tb_size=373 qemu: flush code_size=83885014 nb_tbs=154009 avg_tb_size=360 qemu: flush code_size=83885014 nb_tbs=151007 avg_tb_size=370 qemu: flush code_size=83885014 nb_tbs=151816 avg_tb_size=367 That is, 8 flushes. * -smp 8, 32 regions (80/32 MB per region) [i.e. this patch]: qemu: flush code_size=76328008 nb_tbs=141040 avg_tb_size=356 qemu: flush code_size=75366534 nb_tbs=138000 avg_tb_size=361 qemu: flush code_size=76864546 nb_tbs=140653 avg_tb_size=361 qemu: flush code_size=76309084 nb_tbs=135945 avg_tb_size=375 qemu: flush code_size=74581856 nb_tbs=132909 avg_tb_size=375 qemu: flush code_size=73927256 nb_tbs=135616 avg_tb_size=360 qemu: flush code_size=78629426 nb_tbs=142896 avg_tb_size=365 qemu: flush code_size=76667052 nb_tbs=138508 avg_tb_size=368 Again, 8 flushes. Note how buffer utilization is not 100%, but it is close. Smaller region sizes would yield higher utilization, but we want region allocation to be rare (it acquires a lock), so we do not want to go too small. * -smp 8, static partitioning of 8 regions (10 MB per region): qemu: flush code_size=21936504 nb_tbs=40570 avg_tb_size=354 qemu: flush code_size=11472174 nb_tbs=20633 avg_tb_size=370 qemu: flush code_size=11603976 nb_tbs=21059 avg_tb_size=365 qemu: flush code_size=23254872 nb_tbs=41243 avg_tb_size=377 qemu: flush code_size=28289496 nb_tbs=52057 avg_tb_size=358 qemu: flush code_size=43605160 nb_tbs=78896 avg_tb_size=367 qemu: flush code_size=45166552 nb_tbs=82158 avg_tb_size=364 qemu: flush code_size=63289640 nb_tbs=116494 avg_tb_size=358 qemu: flush code_size=51389960 nb_tbs=93937 avg_tb_size=362 qemu: flush code_size=59665928 nb_tbs=107063 avg_tb_size=372 qemu: flush code_size=38380824 nb_tbs=68597 avg_tb_size=374 qemu: flush code_size=44884568 nb_tbs=79901 avg_tb_size=376 qemu: flush code_size=50782632 nb_tbs=90681 avg_tb_size=374 qemu: flush code_size=39848888 nb_tbs=71433 avg_tb_size=372 qemu: flush code_size=64708840 nb_tbs=119052 avg_tb_size=359 qemu: flush code_size=49830008 nb_tbs=90992 avg_tb_size=362 qemu: flush code_size=68372408 nb_tbs=123442 avg_tb_size=368 qemu: flush code_size=33555560 nb_tbs=59514 avg_tb_size=378 qemu: flush code_size=44748344 nb_tbs=80974 avg_tb_size=367 qemu: flush code_size=37104248 nb_tbs=67609 avg_tb_size=364 That is, 20 flushes. Note how a static partitioning approach uses the code buffer poorly, leading to many unnecessary flushes. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 13:53:42 -07:00
Emilio G. Cota	f51f315a67	translate-all: use qemu_protect_rwx/none helpers The helpers require the address and size to be page-aligned, so do that before calling them. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 13:53:42 -07:00
Emilio G. Cota	c3fac1138e	tcg: distribute profiling counters across TCGContext's This is groundwork for supporting multiple TCG contexts. To avoid scalability issues when profiling info is enabled, this patch makes the profiling info counters distributed via the following changes: 1) Consolidate profile info into its own struct, TCGProfile, which TCGContext also includes. Note that tcg_table_op_count is brought into TCGProfile after dropping the tcg_ prefix. 2) Iterate over the TCG contexts in the system to obtain the total counts. This change also requires updating the accessors to TCGProfile fields to use atomic_read/set whenever there may be conflicting accesses (as defined in C11) to them. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 13:53:42 -07:00
Emilio G. Cota	b1311c4acf	tcg: define tcg_init_ctx and make tcg_ctx a pointer Groundwork for supporting multiple TCG contexts. The core of this patch is this change to tcg/tcg.h: > -extern TCGContext tcg_ctx; > +extern TCGContext tcg_init_ctx; > +extern TCGContext tcg_ctx; Note that for now we set tcg_ctx to whatever TCGContext is passed to tcg_context_init -- in this case &tcg_init_ctx. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 13:53:42 -07:00
Emilio G. Cota	44ded3d048	tcg: take tb_ctx out of TCGContext Groundwork for supporting multiple TCG contexts. Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 13:53:42 -07:00
Emilio G. Cota	f19c6cc6fc	translate-all: report correct avg host TB size Since commit `6e3b2bfd6` ("tcg: allocate TB structs before the corresponding translated code") we are not fully utilizing code_gen_buffer for translated code, and therefore are incorrectly reporting the amount of translated code as well as the average host TB size. Address this by: - Making the conscious choice of misreporting the total translated code; doing otherwise would mislead users into thinking "-tb-size" is not honoured. - Expanding tb_tree_stats to accurately count the bytes of translated code on the host, and using this for reporting the average tb host size, as well as the expansion ratio. In the future we might want to consider reporting the accurate numbers for the total translated code, together with a "bookkeeping/overhead" field to account for the TB structs. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 13:53:42 -07:00
Emilio G. Cota	be1e01171b	exec-all: rename tb_free to tb_remove We don't really free anything in this function anymore; we just remove the TB from the binary search tree. Suggested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 13:53:42 -07:00
Emilio G. Cota	2ac01d6daf	translate-all: use a binary search tree to track TBs in TBContext This is a prerequisite for supporting multiple TCG contexts, since we will have threads generating code in separate regions of code_gen_buffer. For this we need a new field (.size) in struct tb_tc to keep track of the size of the translated code. This field uses a size_t to avoid adding a hole to the struct, although really an unsigned int would have been enough. The comparison function we use is optimized for the common case: insertions. Profiling shows that upon booting debian-arm, 98% of comparisons are between existing tb's (i.e. a->size and b->size are both !0), which happens during insertions (and removals, but those are rare). The remaining cases are lookups. From reading the glib sources we see that the first key is always the lookup key. However, the code does not assume this to always be the case because this behaviour is not guaranteed in the glib docs. However, we embed this knowledge in the code as a branch hint for the compiler. Note that tb_free does not free space in the code_gen_buffer anymore, since we cannot easily know whether the tb is the last one inserted in code_gen_buffer. The next patch in this series renames tb_free to tb_remove to reflect this. Performance-wise, lookups in tb_find_pc are the same as before: O(log n). However, insertions are O(log n) instead of O(1), which results in a small slowdown when booting debian-arm: Performance counter stats for 'build/arm-softmmu/qemu-system-arm \ -machine type=virt -nographic -smp 1 -m 4096 \ -netdev user,id=unet,hostfwd=tcp::2222-:22 \ -device virtio-net-device,netdev=unet \ -drive file=img/arm/jessie-arm32.qcow2,id=myblock,index=0,if=none \ -device virtio-blk-device,drive=myblock \ -kernel img/arm/aarch32-current-linux-kernel-only.img \ -append console=ttyAMA0 root=/dev/vda1 \ -name arm,debug-threads=on -smp 1' (10 runs): - Before: 8048.598422 task-clock (msec) # 0.931 CPUs utilized ( +- 0.28% ) 16,974 context-switches # 0.002 M/sec ( +- 0.12% ) 0 cpu-migrations # 0.000 K/sec 10,125 page-faults # 0.001 M/sec ( +- 1.23% ) 35,144,901,879 cycles # 4.367 GHz ( +- 0.14% ) <not supported> stalled-cycles-frontend <not supported> stalled-cycles-backend 65,758,252,643 instructions # 1.87 insns per cycle ( +- 0.33% ) 10,871,298,668 branches # 1350.707 M/sec ( +- 0.41% ) 192,322,212 branch-misses # 1.77% of all branches ( +- 0.32% ) 8.640869419 seconds time elapsed ( +- 0.57% ) - After: 8146.242027 task-clock (msec) # 0.923 CPUs utilized ( +- 1.23% ) 17,016 context-switches # 0.002 M/sec ( +- 0.40% ) 0 cpu-migrations # 0.000 K/sec 18,769 page-faults # 0.002 M/sec ( +- 0.45% ) 35,660,956,120 cycles # 4.378 GHz ( +- 1.22% ) <not supported> stalled-cycles-frontend <not supported> stalled-cycles-backend 65,095,366,607 instructions # 1.83 insns per cycle ( +- 1.73% ) 10,803,480,261 branches # 1326.192 M/sec ( +- 1.95% ) 195,601,289 branch-misses # 1.81% of all branches ( +- 0.39% ) 8.828660235 seconds time elapsed ( +- 0.38% ) Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 13:53:42 -07:00
Richard Henderson	416986d3f9	tcg: Remove CF_IGNORE_ICOUNT Now that we have curr_cflags, we can include CF_USE_ICOUNT early and then remove it as necessary. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 13:53:42 -07:00
Emilio G. Cota	ac03ee5331	cpu-exec: lookup/generate TB outside exclusive region during step_atomic Now that all code generation has been converted to check CF_PARALLEL, we can generate !CF_PARALLEL code without having yet set !parallel_cpus -- and therefore without having to be in the exclusive region during cpu_exec_step_atomic. While at it, merge cpu_exec_step into cpu_exec_step_atomic. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 13:53:42 -07:00
Emilio G. Cota	e82d5a2460	tcg: check CF_PARALLEL instead of parallel_cpus Thereby decoupling the resulting translated code from the current state of the system. The tb->cflags field is not passed to tcg generation functions. So we add a field to TCGContext, storing there a copy of tb->cflags. Most architectures have <= 32 registers, which results in a 4-byte hole in TCGContext. Use this hole for the new field. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 13:53:42 -07:00
Emilio G. Cota	c5a49c63fa	tcg: convert tb->cflags reads to tb_cflags(tb) Convert all existing readers of tb->cflags to tb_cflags, so that we use atomic_read and therefore avoid undefined behaviour in C11. Note that the remaining setters/getters of the field are protected by tb_lock, and therefore do not need conversion. Luckily all readers access the field via 'tb->cflags' (so no foo.cflags, bar->cflags in the code base), which makes the conversion easily scriptable: FILES=$(git grep 'tb->cflags' target include/exec/gen-icount.h \ accel/tcg/translator.c \| cut -f1 -d':' \| sort \| uniq) perl -pi -e 's/([^.>])tb->cflags/$1tb_cflags(tb)/g' $FILES perl -pi -e 's/([a-z->.]*)(->\|\.)tb->cflags/tb_cflags($1$2tb)/g' $FILES Then manually fixed the few errors that checkpatch reported. Compile-tested for all targets. Suggested-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 13:53:41 -07:00
Richard Henderson	9b990ee5a3	tcg: Add CPUState cflags_next_tb We were generating code during tb_invalidate_phys_page_range, check_watchpoint, cpu_io_recompile, and (seemingly) discarding the TB, assuming that it would magically be picked up during the next iteration through the cpu_exec loop. Instead, record the desired cflags in CPUState so that we request the proper TB so that there is no more magic. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 13:53:41 -07:00
Emilio G. Cota	4e2ca83e71	tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASK This will enable us to decouple code translation from the value of parallel_cpus at any given time. It will also help us minimize TB flushes when generating code via EXCP_ATOMIC. Note that the declaration of parallel_cpus is brought to exec-all.h to be able to define there the "curr_cflags" inline. Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 13:53:41 -07:00
David Hildenbrand	f52bfb1214	accel/tcg: allow to invalidate a write TLB entry immediately Background: s390x implements Low-Address Protection (LAP). If LAP is enabled, writing to effective addresses (before any translation) 0-511 and 4096-4607 triggers a protection exception. So we have subpage protection on the first two pages of every address space (where the lowcore - the CPU private data resides). By immediately invalidating the write entry but allowing the caller to continue, we force every write access onto these first two pages into the slow path. we will get a tlb fault with the specific accessed addresses and can then evaluate if protection applies or not. We have to make sure to ignore the invalid bit if tlb_fill() succeeds. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20171016202358.3633-2-david@redhat.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>	2017-10-20 13:32:10 +02:00
Richard Henderson	de258eb07d	tcg: Fix off-by-one in assert in page_set_flags Most of the users of page_set_flags offset (page, page + len) as the end points. One might consider this an error, since the other users do supply an endpoint as the last byte of the region. However, the first thing that page_set_flags does is round end UP to the start of the next page. Which means computing page + len - 1 is in the end pointless. Therefore, accept this usage and do not assert when given the exact size of the vm as the endpoint. Signed-off-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20170708025030.15845-2-rth@twiddle.net> Signed-off-by: Riku Voipio <riku.voipio@linaro.org>	2017-10-16 16:00:56 +03:00
Emilio G. Cota	e7e168f413	exec-all: extract tb->tc_* into a separate struct tc_tb In preparation for adding tc.size to be able to keep track of TB's using the binary search tree implementation from glib. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-10 07:37:10 -07:00
Emilio G. Cota	6eb062abd6	translate-all: define and use DEBUG_TB_CHECK_GATE This prevents bit rot by ensuring the debug code is compiled when building a user-mode target. Unfortunately the helpers are user-mode-only so we cannot fully get rid of the ifdef checks. Add a comment to explain this. Suggested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-10 07:37:10 -07:00
Emilio G. Cota	dae9e03aed	translate-all: define and use DEBUG_TB_INVALIDATE_GATE This gets rid of an ifdef check while ensuring that the debug code is compiled, which prevents bit rot. Suggested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-10 07:37:10 -07:00
Emilio G. Cota	67a5b5d2f6	exec-all: introduce TB_PAGE_ADDR_FMT And fix the following warning when DEBUG_TB_INVALIDATE is enabled in translate-all.c: CC mipsn32-linux-user/accel/tcg/translate-all.o /data/src/qemu/accel/tcg/translate-all.c: In function ‘tb_alloc_page’: /data/src/qemu/accel/tcg/translate-all.c:1201:16: error: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 2 has type ‘tb_page_addr_t {aka unsigned int}’ [-Werror=format=] printf("protecting code page: 0x" TARGET_FMT_lx "\n", ^ cc1: all warnings being treated as errors /data/src/qemu/rules.mak:66: recipe for target 'accel/tcg/translate-all.o' failed make[1]: * [accel/tcg/translate-all.o] Error 1 Makefile:328: recipe for target 'subdir-mipsn32-linux-user' failed make: * [subdir-mipsn32-linux-user] Error 2 cota@flamenco:/data/src/qemu/build ((18f3fe1...) *$)$ Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-10 07:37:10 -07:00
Emilio G. Cota	424079c13b	translate-all: define and use DEBUG_TB_FLUSH_GATE This gets rid of some ifdef checks while ensuring that the debug code is compiled, which prevents bit rot. Suggested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-10 07:37:10 -07:00
Emilio G. Cota	84f1c148da	exec-all: bring tb->invalid into tb->cflags This gets rid of a hole in struct TranslationBlock. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-10 07:37:10 -07:00
Emilio G. Cota	f6bb84d531	tcg: consolidate TB lookups in tb_lookup__cpu_state This avoids duplicating code. cpu_exec_step will also use the new common function once we integrate parallel_cpus into tb->cflags. Note that in this commit we also fix a race, described by Richard Henderson during review. Think of this scenario with threads A and B: (A) Lookup succeeds for TB in hash without tb_lock (B) Sets the TB's tb->invalid flag (B) Removes the TB from tb_htable (B) Clears all CPU's tb_jmp_cache (A) Store TB into local tb_jmp_cache Given that order of events, (A) will keep executing that invalid TB until another flush of its tb_jmp_cache happens, which in theory might never happen. We can fix this by checking the tb->invalid flag every time we look up a TB from tb_jmp_cache, so that in the above scenario, next time we try to find that TB in tb_jmp_cache, we won't, and will therefore be forced to look it up in tb_htable. Performance-wise, I measured a small improvement when booting debian-arm. Note that inlining pays off: Performance counter stats for 'taskset -c 0 qemu-system-arm \ -machine type=virt -nographic -smp 1 -m 4096 \ -netdev user,id=unet,hostfwd=tcp::2222-:22 \ -device virtio-net-device,netdev=unet \ -drive file=jessie.qcow2,id=myblock,index=0,if=none \ -device virtio-blk-device,drive=myblock \ -kernel kernel.img -append console=ttyAMA0 root=/dev/vda1 \ -name arm,debug-threads=on -smp 1' (10 runs): Before: 18714.917392 task-clock # 0.952 CPUs utilized ( +- 0.95% ) 23,142 context-switches # 0.001 M/sec ( +- 0.50% ) 1 CPU-migrations # 0.000 M/sec 10,558 page-faults # 0.001 M/sec ( +- 0.95% ) 53,957,727,252 cycles # 2.883 GHz ( +- 0.91% ) [83.33%] 24,440,599,852 stalled-cycles-frontend # 45.30% frontend cycles idle ( +- 1.20% ) [83.33%] 16,495,714,424 stalled-cycles-backend # 30.57% backend cycles idle ( +- 0.95% ) [66.66%] 76,267,572,582 instructions # 1.41 insns per cycle # 0.32 stalled cycles per insn ( +- 0.87% ) [83.34%] 12,692,186,323 branches # 678.186 M/sec ( +- 0.92% ) [83.35%] 263,486,879 branch-misses # 2.08% of all branches ( +- 0.73% ) [83.34%] 19.648474449 seconds time elapsed ( +- 0.82% ) After, w/ inline (this patch): 18471.376627 task-clock # 0.955 CPUs utilized ( +- 0.96% ) 23,048 context-switches # 0.001 M/sec ( +- 0.48% ) 1 CPU-migrations # 0.000 M/sec 10,708 page-faults # 0.001 M/sec ( +- 0.81% ) 53,208,990,796 cycles # 2.881 GHz ( +- 0.98% ) [83.34%] 23,941,071,673 stalled-cycles-frontend # 44.99% frontend cycles idle ( +- 0.95% ) [83.34%] 16,161,773,848 stalled-cycles-backend # 30.37% backend cycles idle ( +- 0.76% ) [66.67%] 75,786,269,766 instructions # 1.42 insns per cycle # 0.32 stalled cycles per insn ( +- 1.24% ) [83.34%] 12,573,617,143 branches # 680.708 M/sec ( +- 1.34% ) [83.33%] 260,235,550 branch-misses # 2.07% of all branches ( +- 0.66% ) [83.33%] 19.340502161 seconds time elapsed ( +- 0.56% ) After, w/o inline: 18791.253967 task-clock # 0.954 CPUs utilized ( +- 0.78% ) 23,230 context-switches # 0.001 M/sec ( +- 0.42% ) 1 CPU-migrations # 0.000 M/sec 10,563 page-faults # 0.001 M/sec ( +- 1.27% ) 54,168,674,622 cycles # 2.883 GHz ( +- 0.80% ) [83.34%] 24,244,712,629 stalled-cycles-frontend # 44.76% frontend cycles idle ( +- 1.37% ) [83.33%] 16,288,648,572 stalled-cycles-backend # 30.07% backend cycles idle ( +- 0.95% ) [66.66%] 77,659,755,503 instructions # 1.43 insns per cycle # 0.31 stalled cycles per insn ( +- 0.97% ) [83.34%] 12,922,780,045 branches # 687.702 M/sec ( +- 1.06% ) [83.34%] 261,962,386 branch-misses # 2.03% of all branches ( +- 0.71% ) [83.35%] 19.700174670 seconds time elapsed ( +- 0.56% ) Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-10 07:37:10 -07:00
Emilio G. Cota	7f11636dbe	tcg: remove addr argument from lookup_tb_ptr It is unlikely that we will ever want to call this helper passing an argument other than the current PC. So just remove the argument, and use the pc we already get from cpu_get_tb_cpu_state. This change paves the way to having a common "tb_lookup" function. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-10 07:37:10 -07:00
Emilio G. Cota	841710c78e	cpu-exec: rename have_tb_lock to acquired_tb_lock in tb_find Reusing the have_tb_lock name, which is also defined in translate-all.c, makes code reviewing unnecessarily harder. Avoid potential confusion by renaming the local have_tb_lock variable to something else. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-10 07:37:10 -07:00
Emilio G. Cota	13e1094735	translate-all: make have_tb_lock static It is only used by this object, and it's not exported to any other. Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-10 07:37:10 -07:00
Emilio G. Cota	0aecede612	tcg: fix corruption of code_time profiling counter upon tb_flush Whenever there is an overflow in code_gen_buffer (e.g. we run out of space in it and have to flush it), the code_time profiling counter ends up with an invalid value (that is, code_time -= profile_getclock(), without later on getting += profile_getclock() due to the goto). Fix it by using the ti variable, so that we only update code_time when there is no overflow. Note that in case there is an overflow we fail to account for the elapsed coding time, but this is quite rare so we can probably live with it. "info jit" before/after, roughly at the same time during debian-arm bootup: - before: Statistics: TB flush count 1 TB invalidate count 4665 TLB flush count 998 JIT cycles -615191529184601 (-256329.804 s at 2.4 GHz) translated TBs 302310 (aborted=0 0.0%) avg ops/TB 48.4 max=438 deleted ops/TB 8.54 avg temps/TB 32.31 max=38 avg host code/TB 361.5 avg search data/TB 24.5 cycles/op -42014693.0 cycles/in byte -121444900.2 cycles/out byte -5629031.1 cycles/search byte -83114481.0 gen_interm time -0.0% gen_code time 100.0% optim./code time -0.0% liveness/code time -0.0% cpu_restore count 6236 avg cycles 110.4 - after: Statistics: TB flush count 1 TB invalidate count 4665 TLB flush count 1010 JIT cycles 1996899624 (0.832 s at 2.4 GHz) translated TBs 297961 (aborted=0 0.0%) avg ops/TB 48.5 max=438 deleted ops/TB 8.56 avg temps/TB 32.31 max=38 avg host code/TB 361.8 avg search data/TB 24.5 cycles/op 138.2 cycles/in byte 398.4 cycles/out byte 18.5 cycles/search byte 273.1 gen_interm time 14.0% gen_code time 86.0% optim./code time 19.4% liveness/code time 10.3% cpu_restore count 6372 avg cycles 111.0 Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-10 07:37:10 -07:00
Emilio G. Cota	83974cf4f8	cputlb: bring back tlb_flush_count under !TLB_DEBUG Commit `f0aff0f124` ("cputlb: add assert_cpu_is_self checks") buried the increment of tlb_flush_count under TLB_DEBUG. This results in "info jit" always (mis)reporting 0 TLB flushes when !TLB_DEBUG. Besides, under MTTCG tlb_flush_count is updated by several threads, so in order not to lose counts we'd either have to use atomic ops or distribute the counter, which is more scalable. This patch does the latter by embedding tlb_flush_count in CPUArchState. The global count is then easily obtained by iterating over the CPU list. Note that this change also requires updating the accessors to tlb_flush_count to use atomic_read/set whenever there may be conflicting accesses (as defined in C11) to it. Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-10 07:37:10 -07:00
Alex Bennée	8b81253332	accel/tcg/cputlb: avoid recursive BQL (fixes #1706296 ) The mmio path (see exec.c:prepare_mmio_access) already protects itself against recursive locking and it makes sense to do the same for io_readx/writex. Otherwise any helper running in the BQL context will assert when it attempts to write to device memory as in the case of the bug report. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> CC: Richard Jones <rjones@redhat.com> CC: Paolo Bonzini <bonzini@gnu.org> CC: qemu-stable@nongnu.org Message-Id: <20170921110625.9500-1-alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-09-25 11:23:30 -07:00
Philippe Mathieu-Daudé	a411d29637	accel/tcg: move USER code to user-exec.c Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20170912211934.20919-1-f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-09-17 06:52:19 -07:00
Philippe Mathieu-Daudé	10f7d4d53d	accel/tcg: move atomic_template.h to accel/tcg/ Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Tested-by: Thomas Huth <thuth@redhat.com> Message-Id: <20170911213328.9701-5-f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-09-17 06:52:19 -07:00
Philippe Mathieu-Daudé	61a3f8f6c0	accel/tcg: move tcg-runtime to accel/tcg/ Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20170911213328.9701-4-f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-09-17 06:52:19 -07:00
Philippe Mathieu-Daudé	5841066668	accel/tcg: move user-exec to accel/tcg/ Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20170911213328.9701-3-f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-09-17 06:52:19 -07:00
Thomas Huth	da1849c1eb	accel/tcg: move softmmu_template.h to accel/tcg/ The header is only used by accel/tcg/cputlb.c so we can move it to the accel/tcg/ folder, too. Signed-off-by: Thomas Huth <thuth@redhat.com> [PMD: reword commit title to match series] Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20170911213328.9701-2-f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-09-17 06:52:19 -07:00
Richard Henderson	57a269469d	tcg: Infrastructure for managing constant pools A new shared header tcg-pool.inc.c adds new_pool_label, for registering a tcg_target_ulong to be emitted after the generated code, plus relocation data to install a pointer to the data. A new pointer is added to the TCGContext, so that we dump the constant pool as data, not code. Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-09-07 11:57:35 -07:00
Richard Henderson	a858339336	tcg: Move USE_DIRECT_JUMP discriminator to tcg/cpu/tcg-target.h Replace the USE_DIRECT_JUMP ifdef with a TCG_TARGET_HAS_direct_jump boolean test. Replace the tb_set_jmp_target1 ifdef with an unconditional function tb_target_set_jmp_target. While we're touching all backends, add a parameter for tb->tc_ptr; we're going to need it shortly for some backends. Move tb_set_jmp_target and tb_add_jump from exec-all.h to cpu-exec.c. This opens the possibility for TCG_TARGET_HAS_direct_jump to be a runtime decision -- based on host cpu capabilities, the size of code_gen_buffer, or a future debugging switch. Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-09-07 11:57:34 -07:00
Lluís Vilanova	bb2e0039dc	tcg: Add generic translation framework Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu> Message-Id: <150002073981.22386.9870422422367410100.stgit@frigg.lan> [rth: Moved max_insns adjustment from tb_start to init_disas_context. Removed pc_next return from translate_insn. Removed tcg_check_temp_count from generic loop. Moved gen_io_end to exactly match gen_io_start. Use qemu_log instead of error_report for temporary leaks. Moved TB size/icount assignments before disas_log.] Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-09-06 08:06:47 -07:00
Peter Maydell	04e3aabde3	cputlb: Support generating CPU exceptions on memory transaction failures Call the new cpu_transaction_failed() hook at the places where CPU generated code interacts with the memory system: io_readx() io_writex() get_page_addr_code() Any access from C code (eg via cpu_physical_memory_rw(), address_space_rw(), ld/st__phys()) will not* trigger CPU exceptions via cpu_transaction_failed(). Handling for transactions failures for this kind of call should be done by using a function which returns a MemTxResult and treating the failure case appropriately in the calling code. In an ideal world we would not generate CPU exceptions for instruction fetch failures in get_page_addr_code() but instead wait until the code translation process tried a load and it failed; however that change would require too great a restructuring and redesign to attempt at this point. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>	2017-09-04 15:21:55 +01:00
Vladimir Sementsov-Ogievskiy	8908eb1a4a	trace-events: fix code style: print 0x before hex numbers The only exception are groups of numers separated by symbols '.', ' ', ':', '/', like 'ab.09.7d'. This patch is made by the following: > find . -name trace-events \| xargs python script.py where script.py is the following python script: ========================= #!/usr/bin/env python import sys import re import fileinput rhex = '%[-+ .0-9](?:[hljztL]\|ll\|hh)?(?:x\|X\|"\sPRI[xX][^"]"?)' rgroup = re.compile('((?:' + rhex + '[.:/ ])+' + rhex + ')') rbad = re.compile('(?<!0x)' + rhex) files = sys.argv[1:] for fname in files: for line in fileinput.input(fname, inplace=True): arr = re.split(rgroup, line) for i in range(0, len(arr), 2): arr[i] = re.sub(rbad, '0x\g<0>', arr[i]) sys.stdout.write(''.join(arr)) ========================= Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Message-id: 20170731160135.12101-5-vsementsov@virtuozzo.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-08-01 12:13:07 +01:00
Lluís Vilanova	9c489ea6be	tcg: Pass generic CPUState to gen_intermediate_code() Needed to implement a target-agnostic gen_intermediate_code() in the future. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Alex Benneé <alex.benee@linaro.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu> Message-Id: <150002025498.22386.18051908483085660588.stgit@frigg.lan> Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-07-19 14:45:16 -07:00
Lluís Vilanova	61a67f71dd	exec: [tcg] Use different TBs according to the vCPU's dynamic tracing state Every vCPU now uses a separate set of TBs for each set of dynamic tracing event state values. Each set of TBs can be used by any number of vCPUs to maximize TB reuse when vCPUs have the same tracing state. This feature is later used by tracetool to optimize tracing of guest code events. The maximum number of TB sets is defined as 2^E, where E is the number of events that have the 'vcpu' property (their state is stored in CPUState->trace_dstate). For this to work, a change on the dynamic tracing state of a vCPU will force it to flush its virtual TB cache (which is only indexed by address), and fall back to the physical TB cache (which now contains the vCPU's dynamic tracing state as part of the hashing function). Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu> Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-id: 149915775266.6295.10060144081246467690.stgit@frigg.lan Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-07-17 13:11:05 +01:00
Emilio G. Cota	d40d3da00c	translate-all: remove redundant !tcg_enabled check in dump_exec_info This check is redundant because it is already performed by the only caller of dump_exec_info -- the caller was updated by `b7da97eef` ("monitor: Check whether TCG is enabled before running the "info jit" code"). Checking twice wouldn't necessarily be too bad, but here the check also returns with tb_lock held. So we can either do the check before tb_lock is acquired, or just get rid of it. Given that it is redundant, I am going for the latter option. Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-07-14 12:04:42 +02:00
Pranith Kumar	b68686bd4b	tcg/aarch64: Use ADRP+ADD to compute target address We use ADRP+ADD to compute the target address for goto_tb. This patch introduces the NOP instruction which is used to align the above instruction pair so that we can use one atomic instruction to patch the destination offsets. CC: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Message-Id: <20170630143614.31059-2-bobby.prani@gmail.com> Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-07-09 21:10:23 -10:00
Paolo Bonzini	f0d14a95a5	monitor: disable "info jit" and "info opcount" if !TCG Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-07-04 16:01:16 +02:00
Yang Zhong	8e2b72990e	tcg: make tcg_allowed global Change the tcg_enabled() and make sure user build still enable tcg even x86 softmmu disable tcg. Signed-off-by: Yang Zhong <yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-07-04 16:01:16 +02:00
Paolo Bonzini	290dae4678	cpu: move interrupt handling out of translate-common.c translate-common.c will not be available anymore with --disable-tcg, so we cannot leave cpu_interrupt_handler there. Move the TCG-specific handler to accel/tcg/tcg-all.c, and adopt KVM's handler as the default one, since it works just as well for Xen and qtest. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-07-04 16:00:43 +02:00
Yang Zhong	a0be0c585f	tcg: move page_size_init() function translate-all.c will be disabled if tcg is disabled in the build, so page_size_init() function and related variables will be moved to exec.c file. Signed-off-by: Yang Zhong <yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-07-04 16:00:12 +02:00
Paolo Bonzini	8b3ae692b8	vl: convert -tb-size to qemu_strtoul Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-07-04 14:39:28 +02:00
Thomas Huth	2cd5394311	cpu: Introduce a wrapper for tlb_flush() that can be used in common code Commit `1f5c00cfdb` ("qom/cpu: move tlb_flush to cpu_common_reset") moved the call to tlb_flush() from the target-specific reset handlers into the common code qom/cpu.c file, and protected the call with "#ifdef CONFIG_SOFTMMU" to avoid that it is called for linux-user only targets. But since qom/cpu.c is common code, CONFIG_SOFTMMU is never defined here, so the tlb_flush() was simply never executed anymore. Fix it by introducing a wrapper for tlb_flush() in a file that is re-compiled for each target, i.e. in translate-all.c. Fixes: `1f5c00cfdb` Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <1498454578-18709-5-git-send-email-thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-07-04 14:30:03 +02:00
Emilio G. Cota	f3ced3c592	tcg: consistently access cpu->tb_jmp_cache atomically Some code paths can lead to atomic accesses racing with memset() on cpu->tb_jmp_cache, which can result in torn reads/writes and is undefined behaviour in C11. These torn accesses are unlikely to show up as bugs, but from code inspection they seem possible. For example, tb_phys_invalidate does: /* remove the TB from the hash list */ h = tb_jmp_cache_hash_func(tb->pc); CPU_FOREACH(cpu) { if (atomic_read(&cpu->tb_jmp_cache[h]) == tb) { atomic_set(&cpu->tb_jmp_cache[h], NULL); } } Here atomic_set might race with a concurrent memset (such as the ones scheduled via "unsafe" async work, e.g. tlb_flush_page) and therefore we might end up with a torn pointer (or who knows what, because we are under undefined behaviour). This patch converts parallel accesses to cpu->tb_jmp_cache to use atomic primitives, thereby bringing these accesses back to defined behaviour. The price to pay is to potentially execute more instructions when clearing cpu->tb_jmp_cache, but given how infrequently they happen and the small size of the cache, the performance impact I have measured is within noise range when booting debian-arm. Note that under "safe async" work (e.g. do_tb_flush) we could use memset because no other vcpus are running. However I'm keeping these accesses atomic as well to keep things simple and to avoid confusing analysis tools such as ThreadSanitizer. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <1497486973-25845-1-git-send-email-cota@braap.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-06-30 11:40:59 -07:00
KONRAD Frederic	c935674635	exec: allow to get a pointer for some mmio memory region This introduces a special callback which allows to run code from some MMIO devices. SysBusDevice with a MemoryRegion which implements the request_ptr callback will be notified when the guest try to execute code from their offset. Then it will be able to eg: pre-load some code from an SPI device or ask a pointer from an external simulator, etc.. When the pointer or the data in it are no longer valid the device has to invalidate it. Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>	2017-06-27 15:09:15 +02:00
KONRAD Frederic	71b9a45330	cputlb: fix the way get_page_addr_code fills the tlb get_page_addr_code(..) does a cpu_ldub_code to fill the tlb: This can lead to some side effects if a device is mapped at this address. So this patch replaces the cpu_memory_ld by a tlb_fill. Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>	2017-06-27 15:09:15 +02:00
KONRAD Frederic	f2553f0489	cputlb: move get_page_addr_code This just moves the code before VICTIM_TLB_HIT macro definition so we can use it. Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>	2017-06-27 15:09:14 +02:00
KONRAD Frederic	3416343255	cputlb: cleanup get_page_addr_code to use VICTIM_TLB_HIT This replaces env1 and page_index variables by env and index so we can use VICTIM_TLB_HIT macro later. Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>	2017-06-27 15:09:04 +02:00
Peter Maydell	db7a99cdc1	Queued TCG patches -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJZSBP2AAoJEK0ScMxN0CebnyMH/1ZiDhYiqCD7PYfk4/Y7Db+h MNKNozrWKyChWQp1RzwWqcBaIzbuMZkDYn8dfS419PNtFRNoYtHjhYvjSTfcrxS0 U8dGOoqQUHCr/jlyIDUE4y5+aFA9R/1Ih5IQv+QCi5QNXcfeST8zcYF+ImuikP6C 7heIc7dE9kXdA8ycWJ39kYErHK9qEJbvDx6dxMPmb4cM36U239Zb9so985TXULlQ LoHrDpOCBzCbsICBE8iP2RKDvcwENIx21Dwv+9gW/NqR+nRdKcxhTjKEodkS8gl/ UxMxM/TjIPQOLLUhdck5DFgIgBgQWHRqPMJKqt466I0JlXvSpifmWxckWzslXLc= =R+em -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20170619' into staging Queued TCG patches # gpg: Signature made Mon 19 Jun 2017 19:12:06 BST # gpg: using RSA key 0xAD1270CC4DD0279B # gpg: Good signature from "Richard Henderson <rth7680@gmail.com>" # gpg: aka "Richard Henderson <rth@redhat.com>" # gpg: aka "Richard Henderson <rth@twiddle.net>" # Primary key fingerprint: 9CB1 8DDA F8E8 49AD 2AFC 16A4 AD12 70CC 4DD0 279B * remotes/rth/tags/pull-tcg-20170619: target/arm: Exit after clearing aarch64 interrupt mask target/s390x: Exit after changing PSW mask target/alpha: Use tcg_gen_lookup_and_goto_ptr tcg: Increase hit rate of lookup_tb_ptr tcg/arm: Use ldr (literal) for goto_tb tcg/arm: Try pc-relative addresses for movi tcg/arm: Remove limit on code buffer size tcg/arm: Use indirect branch for goto_tb tcg/aarch64: Use ADR in tcg_out_movi translate-all: consolidate tb init in tb_gen_code tcg: allocate TB structs before the corresponding translated code util: add cacheinfo Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-06-22 10:25:03 +01:00
Yang Zhong	244f144134	tcg: move tcg backend files into accel/tcg/ move tcg-runtime.c, translate-all.(ch) and translate-common.c into accel/tcg/ subdirectory and updated related trace-events file. Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <1496383606-18060-4-git-send-email-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-06-15 11:04:06 +02:00
Yang Zhong	d9bb58e510	tcg: move tcg related files into accel/tcg/ subdirectory move cputlb.c, cpu-exec-common.c and cpu-exec.c related tcg exec file into accel/tcg/ subdirectory. Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <1496383606-18060-3-git-send-email-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-06-15 11:04:06 +02:00
Yang Zhong	a9ded6017e	accel: split the tcg accelerator from accel.c file there are some types of accelerators in qemu, and all accelerators have their own file except tcg. tcg accelerator is also defined in accel.c file. tcg accelerator file will be splited from accel.c and re-name to tcg-all.c. accel/ directory will be created to include kvm and tcg related files. Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <1496383606-18060-2-git-send-email-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-06-15 11:04:05 +02:00

... 13 14 15 16 17 ...

856 Commits