mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Like Xu	5cc8767d05	general: Replace global smp variables with smp machine properties Basically, the context could get the MachineState reference via call chains or unrecommended qdev_get_machine() in !CONFIG_USER_ONLY mode. A local variable of the same name would be introduced in the declaration phase out of less effort OR replace it on the spot if it's only used once in the context. No semantic changes. Signed-off-by: Like Xu <like.xu@linux.intel.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20190518205428.90532-4-like.xu@linux.intel.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2019-07-05 17:07:36 -03:00
Richard Henderson	f75da2988e	tcg: Add support for vector compare select Perform a per-element conditional move. This combination operation is easier to implement on some host vector units than plain cmp+bitsel. Omit the usual gvec interface, as this is intended to be used by target-specific gvec expansion call-backs. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-22 15:09:43 -04:00
Richard Henderson	38dc12947e	tcg: Add support for vector bitwise select This operation performs d = (b & a) \| (c & ~a), and is present on a majority of host vector units. Include gvec expanders. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-22 15:09:43 -04:00
Richard Henderson	bcefc90208	tcg: Add support for vector absolute value Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 22:52:08 +00:00
Richard Henderson	37ee55a081	tcg: Add INDEX_op_dupm_vec Allow the backend to expand dup from memory directly, instead of forcing the value into a temp first. This is especially important if integer/vector register moves do not exist. Note that officially tcg_out_dupm_vec is allowed to fail. If it did, we could fix this up relatively easily: VECE == 32/64: Load the value into a vector register, then dup. Both of these must work. VECE == 8/16: If the value happens to be at an offset such that an aligned load would place the desired value in the least significant end of the register, go ahead and load w/garbage in high bits. Load the value w/INDEX_op_ld{8,16}_i32. Attempt a move directly to vector reg, which may fail. Store the value into the backing store for OTS. Load the value into the vector reg w/TCG_TYPE_I32, which must work. Duplicate from the vector reg into itself, which must work. All of which is well and good, except that all supported hosts can support dupm for all vece, so all of the failure paths would be dead code and untestable. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 22:52:08 +00:00
Richard Henderson	d6ecb4a978	tcg: Add tcg_out_dupm_vec to the backend interface Currently stubbed out in all backends that support vectors. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 14:44:03 -07:00
Richard Henderson	bab1671f0f	tcg: Manually expand INDEX_op_dup_vec This case is similar to INDEX_op_mov_* in that we need to do different things depending on the current location of the source. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- v3: Added some commentary to the tcg_reg_alloc_* functions.	2019-05-13 14:44:03 -07:00
Richard Henderson	e7632cfa8b	tcg: Promote tcg_out_{dup,dupi}_vec to backend interface The i386 backend already has these functions, and the aarch64 backend could easily split out one. Nothing is done with these functions yet, but this will aid register allocation of INDEX_op_dup_vec in a later patch. Adjust the aarch64 tcg_out_dupi_vec signature to match the new interface. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 14:44:03 -07:00
Richard Henderson	240c08d099	tcg: Support cross-class moves without instruction support PowerPC Altivec does not support direct moves between vector registers and general registers. So when tcg_out_mov fails, we can use the backing memory for the temporary to perform the move. Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 14:44:03 -07:00
Richard Henderson	78113e83e0	tcg: Return bool success from tcg_out_mov This patch merely changes the interface, aborting on all failures, of which there are currently none. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 14:44:03 -07:00
Richard Henderson	d63e3b6e69	tcg: Assert fixed_reg is read-only The only fixed_reg is cpu_env, and it should not be modified during any TB. Therefore code that tries to special-case moves into a fixed_reg is dead. Remove it. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 14:44:03 -07:00
Richard Henderson	aeee05f53a	tcg: Restart TB generation after out-of-line ldst overflow This is part c of relocation overflow handling. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-04-24 13:05:28 -07:00
Richard Henderson	1768987b73	tcg: Restart TB generation after constant pool overflow This is part b of relocation overflow handling. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-04-24 13:05:28 -07:00
Richard Henderson	7ecd02a06f	tcg: Restart TB generation after relocation overflow If the TB generates too much code, such that backend relocations overflow, try again with a smaller TB. In support of this, move relocation processing from a random place within tcg_out_op, in the handling of branch opcodes, to a new function at the end of tcg_gen_code. This is not a complete solution, as there are additional relocs generated for out-of-line ldst handling and constant pools. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-04-24 13:05:21 -07:00
Richard Henderson	6e6c4efed9	tcg: Restart after TB code generation overflow If a TB generates too much code, try again with fewer insns. Fixes: https://bugs.launchpad.net/bugs/1824853 Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-04-24 13:04:33 -07:00
Richard Henderson	fce1296f13	tcg: Add INDEX_op_extract2_{i32,i64} This will let backends implement the double-word shift operation. Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-04-24 13:04:33 -07:00
Markus Armbruster	3de2faa9a8	tcg: Simplify how dump_exec_info() prints dump_exec_info() takes an fprintf()-like callback and a FILE * to pass to it. Its only caller hmp_info_jit() passes monitor_fprintf() and the current monitor cast to FILE *. monitor_fprintf() casts it right back, and is otherwise identical to monitor_printf(). The type-punning is ugly. Drop the callback, and call qemu_printf() instead. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20190417191805.28198-5-armbru@redhat.com>	2019-04-18 22:18:59 +02:00
Markus Armbruster	d4c51a0af3	tcg: Simplify how dump_opcount_info() prints dump_opcount_info() takes an fprintf()-like callback and a FILE * to pass to it. Its only caller hmp_info_opcount() passes monitor_fprintf() and the current monitor cast to FILE *. monitor_fprintf() casts it right back, and is otherwise identical to monitor_printf(). The type-punning is ugly. Drop the callback, and call qemu_printf() instead. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20190417191805.28198-4-armbru@redhat.com>	2019-04-18 22:18:59 +02:00
Richard Henderson	bef16ab4e6	tcg: Diagnose referenced labels that have not been emitted Currently, a jump to a label that is not defined anywhere will be emitted not be relocated. This results in a jump to a random jump target. With tcg debugging, print a diagnostic to the -d op file and abort. This could help debug or detect errors like `c2d9644e6d` ("target/arm: Fix crash on conditional instruction in an IT block") Reported-by: Roman Kapl <code@rkapl.cz> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-02-11 08:52:44 -08:00
Richard Henderson	dd0a0fcdd8	tcg: Add opcodes for vector minmax arithmetic Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-01-28 07:03:34 -08:00
Richard Henderson	8afaf05066	tcg: Add opcodes for vector saturated arithmetic Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-01-28 07:03:34 -08:00
Paolo Bonzini	eae3eb3e18	qemu/queue.h: simplify reverse access to QTAILQ The new definition of QTAILQ does not require passing the headname, remove it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-01-11 15:46:55 +01:00
Richard Henderson	4250da1092	tcg: Improve call argument loading Free the argument register only after we have verified that the temporary is not already in that register. This case is likely now that we are back propagating the preferred register. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:58:43 +11:00
Richard Henderson	25f49c5f15	tcg: Record register preferences during liveness With these preferences, we can arrange for function call arguments to be computed into the proper registers instead of requiring extra moves. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:58:35 +11:00
Richard Henderson	ae36a246ed	tcg: Add TCG_OPF_BB_EXIT Use this to notice the opcodes that exit the TB, which implies that local temps are really dead and need not be synced. Previously we so marked the true end of the TB, but that was immediately overwritten by the la_bb_end invoked by any TCG_OPF_BB_END opcode, like exit_tb. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:58:20 +11:00
Richard Henderson	f65a061c39	tcg: Split out more subroutines from liveness_pass_1 Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:58:17 +11:00
Richard Henderson	2616c80821	tcg: Rename and adjust liveness_pass_1 helpers No need for a "tcg_" prefix for a static function; we already have another "la_" prefix for indicating liveness analysis. Pass in nb_globals and nb_temps, as we will already have them in registers for other loops within the parent function. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:58:09 +11:00
Richard Henderson	152c35aab4	tcg: Reindent parts of liveness_pass_1 There are two blocks of the form if (foo) { stuff1; goto bar; } else { baz: stuff2; } which have unnecessary and confusing indentation. Remove the else and unindent stuff2. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:57:57 +11:00
Richard Henderson	1894f69a61	tcg: Dump register preference info with liveness Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:57:48 +11:00
Richard Henderson	d62816f2db	tcg: Improve register allocation for matching constraints Try harder to honor the output_pref. When we're forced to allocate a second register for the input, it does not need to use the input constraint; that will be honored by the register we allocate for the output and a move is already required. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:57:40 +11:00
Richard Henderson	69e3706d2b	tcg: Add output_pref to TCGOp Allocate storage for, but do not yet fill in, per-opcode preferences for the output operands. Pass it in to the register allocation routines for output operands. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:57:37 +11:00
Richard Henderson	ba87719cd2	tcg: Add preferred_reg argument to tcg_reg_alloc_do_movi Pass this through to temp_sync. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:57:34 +11:00
Richard Henderson	98b4e186c1	tcg: Add preferred_reg argument to temp_sync Pass this through to tcg_reg_alloc. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:57:32 +11:00
Richard Henderson	b722452aef	tcg: Add preferred_reg argument to temp_load Pass this through to tcg_reg_alloc. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:57:29 +11:00
Richard Henderson	b016486e7b	tcg: Add preferred_reg argument to tcg_reg_alloc This new argument will aid register allocation by indicating how the temporary will be used in future. If the preference cannot be satisfied, fall back to the constraints of the current insn. Short circuit the preference when it cannot be satisfied or if it does not further constrain the operation. With an eye toward optimizing function call sequences, optimize for the preferred_reg set containing a single register. For the moment, all users pass 0 for preference. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:57:06 +11:00
Richard Henderson	b4fc67c7af	tcg: Add reachable_code_pass Delete trivially dead code that follows unconditional branches and noreturn helpers. These can occur either via optimization or via the structure of a target's translator following an exception. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:40:30 +11:00
Richard Henderson	d88a117eaa	tcg: Reference count labels Increment when adding branches, and decrement when removing them. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:40:27 +11:00
Emilio G. Cota	ac1043f6d6	tcg: Drop nargs from tcg_op_insert_{before,after} It's unused since `75e8b9b7aa`. Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20181209193749.12277-9-cota@braap.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:44 +03:00
Richard Henderson	6ac1778676	tcg: Return success from patch_reloc This will move the assert for success from within (subroutines of) patch_reloc into the callers. It will also let new code do something different when a relocation is out of range. For the moment, all backends are trivially converted to return true. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:43 +03:00
Emilio G. Cota	72fd2efbbd	tcg: distribute tcg_time into TCG contexts When we implemented per-vCPU TCG contexts, we forgot to also distribute the tcg_time counter, which has remained as a global accessed without any serialization, leading to potentially missed counts. Fix it by distributing the field over the TCG contexts, embedding it into TCGProfile with a field called "cpu_exec_time", which is more descriptive than "tcg_time". Add a function to query this value directly, and for completeness, fill in the field in tcg_profile_snapshot, even though its callers do not use it. Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20181010144853.13005-5-cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-10-18 18:58:10 -07:00
Emilio G. Cota	c1f543b739	tcg: fix use of uninitialized variable under CONFIG_PROFILER We forgot to initialize n in commit `15fa08f845` ("tcg: Dynamically allocate TCGOps", 2017-12-29). Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20181010144853.13005-3-cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-10-18 18:58:10 -07:00
Richard Henderson	9f75462065	tcg: Reduce max TB opcode count Also, assert that we don't overflow any of two different offsets into the TB. Both unwind and goto_tb both record a uint16_t for later use. This fixes an arm-softmmu test case utilizing NEON in which there is a TB generated that runs to 7800 opcodes, and compiles to 96k on an x86_64 host. This overflows the 16-bit offset in which we record the goto_tb reset offset. Because of that overflow, we install a jump destination that goes to neverland. Boom. With this reduced op count, the same TB compiles to about 48k for aarch64, ppc64le, and x86_64 hosts, and neither assertion fires. Cc: qemu-stable@nongnu.org Reported-by: "Jason A. Donenfeld" <Jason@zx2c4.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-06-15 09:39:53 -10:00
Emilio G. Cota	128ed2278c	tcg: move tb_ctx.tb_phys_invalidate_count to tcg_ctx Thereby making it per-TCGContext. Once we remove tb_lock, this will avoid an atomic increment every time a TB is invalidated. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-06-15 07:42:55 -10:00
Emilio G. Cota	be2cdc5e35	tcg: track TBs with per-region BST's This paves the way for enabling scalable parallel generation of TCG code. Instead of tracking TBs with a single binary search tree (BST), use a BST for each TCG region, protecting it with a lock. This is as scalable as it gets, since each TCG thread operates on a separate region. The core of this change is the introduction of struct tcg_region_tree, which contains a pointer to a GTree and an associated lock to serialize accesses to it. We then allocate an array of tcg_region_tree's, adding the appropriate padding to avoid false sharing based on qemu_dcache_linesize. Given a tc_ptr, we first find the corresponding region_tree. This is done by special-casing the first and last regions first, since they might be of size != region.size; otherwise we just divide the offset by region.stride. I was worried about this division (several dozen cycles of latency), but profiling shows that this is not a fast path. Note that region.stride is not required to be a power of two; it is only required to be a multiple of the host's page size. Note that with this design we can also provide consistent snapshots about all region trees at once; for instance, tcg_tb_foreach acquires/releases all region_tree locks before/after iterating over them. For this reason we now drop tb_lock in dump_exec_info(). As an alternative I considered implementing a concurrent BST, but this can be tricky to get right, offers no consistent snapshots of the BST, and performance and scalability-wise I don't think it could ever beat having separate GTrees, given that our workload is insert-mostly (all concurrent BST designs I've seen focus, understandably, on making lookups fast, which comes at the expense of convoluted, non-wait-free insertions/removals). Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-06-15 07:42:55 -10:00
Richard Henderson	abebf92597	tcg: Limit the number of ops in a TB In `6001f7729e` we partially attempt to address the branch displacement overflow caused by `15fa08f845`. However, gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqtbX.c is a testcase that contains a TB so large as to overflow anyway. The limit here of 8000 ops produces a maximum output TB size of 24112 bytes on a ppc64le host with that test case. This is still much less than the maximum forward branch distance of 32764 bytes. Cc: qemu-stable@nongnu.org Fixes: `15fa08f845` ("tcg: Dynamically allocate TCGOps") Reviewed-by: Laurent Vivier <laurent@vivier.eu> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-05-09 08:30:57 -07:00
Laurent Vivier	6001f7729e	tcg: workaround branch instruction overflow in tcg_out_qemu_ld/st ppc64 uses a BC instruction to call the tcg_out_qemu_ld/st slow path. BC instruction uses a relative address encoded on 14 bits. The slow path functions are added at the end of the generated instructions buffer, in the reverse order of the callers. So more we have slow path functions more the distance between the caller (BC) and the function increases. This patch changes the behavior to generate the functions in the same order of the callers. Cc: qemu-stable@nongnu.org Fixes: `15fa08f845` ("tcg: Dynamically allocate TCGOps") Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20180429235840.16659-1-lvivier@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-05-01 11:56:55 -07:00
Richard Henderson	5bfa803448	tcg: Improve TCGv_ptr support Drop TCGV_PTR_TO_NAT and TCGV_NAT_TO_PTR internal macros. Add tcg_temp_local_new_ptr, tcg_gen_brcondi_ptr, tcg_gen_ext_i32_ptr, tcg_gen_trunc_i64_ptr, tcg_gen_extu_ptr_i64, tcg_gen_trunc_ptr_i32. Use inlines instead of macros where possible. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-05-01 11:56:16 -07:00
Richard Henderson	3774030a3e	tcg: Add generic vector ops for multiplication Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-02-08 15:54:06 +00:00
Richard Henderson	212be173f0	tcg: Add generic vector ops for comparisons Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-02-08 15:54:05 +00:00
Richard Henderson	d0ec97967f	tcg: Add generic vector ops for constant shifts Opcodes are added for scalar and vector shifts, but considering the varied semantics of these do not expose them to the front ends. Do go ahead and provide them in case they are needed for backend expansion. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-02-08 15:54:05 +00:00

1 2 3 4 5 ...

341 Commits