mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Richard Henderson	e01fa97dea	tcg: Add temp_readonly In most, but not all, places that we check for TEMP_FIXED, we are really testing that we do not modify the temporary. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2021-01-13 08:39:08 -10:00
Richard Henderson	ee17db83d2	tcg: Consolidate 3 bits into enum TCGTempKind The temp_fixed, temp_global, temp_local bits are all related. Combine them into a single enumeration. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2021-01-13 08:39:08 -10:00
Richard Henderson	4e18617555	tcg: Increase tcg_out_dupi_vec immediate to int64_t While we don't store more than tcg_target_long in TCGTemp, we shouldn't be limited to that for code generation. We will be able to use this for INDEX_op_dup2_vec with 2 constants. Also pass along the minimal vece that may be said to apply to the constant. This allows some simplification in the various backends. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2021-01-13 08:39:08 -10:00
Richard Henderson	0a6a8bc8eb	tcg: Use tcg_out_dupi_vec from temp_load Having dupi pass though movi is confusing and arguably wrong. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2021-01-13 08:39:08 -10:00
Richard Henderson	c8bc1168ad	tcg: Constify tcg_code_gen_epilogue Now that all native tcg hosts support splitwx, make this pointer const. Reviewed-by: Joelle van Dyne <j@getutm.app> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2021-01-07 05:09:42 -10:00
Richard Henderson	755bf9e514	tcg: Adjust tcg_register_jit for const We must change all targets at once, since all must match the declaration in tcg.c. Reviewed-by: Joelle van Dyne <j@getutm.app> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2021-01-07 05:09:41 -10:00
Richard Henderson	92ab8e7d62	tcg: Adjust tcg_out_label for const Simplify the arguments to always use s->code_ptr instead of take it as an argument. That makes it easy to ensure that the value_ptr is always the rx version. Reviewed-by: Joelle van Dyne <j@getutm.app> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2021-01-07 05:09:41 -10:00
Richard Henderson	2be7d76b15	tcg: Adjust tcg_out_call for const We must change all targets at once, since all must match the declaration in tcg.c. Reviewed-by: Joelle van Dyne <j@getutm.app> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2021-01-07 05:09:41 -10:00
Richard Henderson	ffd0e50736	tcg: Adjust TCGLabel for const Change TCGLabel.u.value_ptr to const, and initialize it with tcg_splitwx_to_rx. Propagate const through tcg/host/ only as far as needed to avoid errors from the value_ptr change. Reviewed-by: Joelle van Dyne <j@getutm.app> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2021-01-07 05:09:41 -10:00
Richard Henderson	db0c51a380	tcg: Introduce tcg_splitwx_to_{rx,rw} Add two helper functions, using a global variable to hold the displacement. The displacement is currently always 0, so no change in behaviour. Begin using the functions in tcg common code only. Reviewed-by: Joelle van Dyne <j@getutm.app> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2021-01-07 05:09:41 -10:00
Richard Henderson	8b5c2b6260	tcg: Move tcg epilogue pointer out of TCGContext This value is constant across all thread-local copies of TCGContext, so we might as well move it out of thread-local storage. Reviewed-by: Joelle van Dyne <j@getutm.app> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2021-01-07 05:09:41 -10:00
Richard Henderson	b91ccb3115	tcg: Move tcg prologue pointer out of TCGContext This value is constant across all thread-local copies of TCGContext, so we might as well move it out of thread-local storage. Use the correct function pointer type, and name the variable tcg_qemu_tb_exec, which means that we are able to remove the macro that does the casting. Replace HAVE_TCG_QEMU_TB_EXEC with CONFIG_TCG_INTERPRETER, as this is somewhat clearer in intent. Reviewed-by: Joelle van Dyne <j@getutm.app> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2021-01-07 05:09:41 -10:00
Richard Henderson	1da8de39a3	util: Enhance flush_icache_range with separate data pointer We are shortly going to have a split rw/rx jit buffer. Depending on the host, we need to flush the dcache at the rw data pointer and flush the icache at the rx code pointer. For now, the two passed pointers are identical, so there is no effective change in behaviour. Reviewed-by: Joelle van Dyne <j@getutm.app> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2021-01-07 05:09:41 -10:00
Richard Henderson	df5d2b1658	tcg: Do not flush icache for interpreter This is currently a no-op within tci/tcg-target.h, but is about to be moved to a more generic location. Reviewed-by: Joelle van Dyne <j@getutm.app> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2021-01-07 05:09:41 -10:00
Richard Henderson	07ce0b0530	tcg: Introduce INDEX_op_qemu_st8_i32 Enable this on i386 to restrict the set of input registers for an 8-bit store, as required by the architecture. This removes the last use of scratch registers for user-only mode. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2021-01-07 05:09:06 -10:00
Richard Henderson	084cfca143	util: Extract flush_icache_range to cacheflush.c This has been a tcg-specific function, but is also in use by hardware accelerators via physmem.c. This can cause link errors when tcg is disabled. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Joelle van Dyne <j@getutm.app> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20201214140314.18544-3-richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-01-02 21:03:36 +01:00
Richard Henderson	f14bed3fd4	tcg: Remove assert from set_jmp_reset_offset Since `6e6c4efed9`, there has been a more appropriate range check done later at the end of tcg_gen_code. There, a failing range check results in a returned error code, which causes the TB to be restarted at half the size. Reported-by: Sai Pavan Boddu <saipava@xilinx.com> Tested-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2020-11-04 10:35:40 -08:00
Richard Henderson	b4cb76e620	tcg: Do not kill globals at conditional branches We can easily register allocate the entire extended basic block (in this case, the set of blocks connected by fallthru), simply by not discarding the register state at the branch. This does not help blocks starting with a label, as they are reached via a taken branch, and that would require saving the complete register state at the branch. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2020-10-27 09:48:07 -07:00
Richard Henderson	bc2b17e6ea	tcg: Move some TCG_CT_* bits to TCGArgConstraint bitfields These are easier to set and test when they have their own fields. Reduce the size of alias_index and sort_index to 4 bits, which is sufficient for TCG_MAX_OP_ARGS. This leaves only the bits indicating constants within the ct field. Move all initialization to allocation time, rather than init individual fields in process_op_defs. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2020-10-08 05:57:32 -05:00
Richard Henderson	74a117906b	tcg: Remove TCG_CT_REG This wasn't actually used for anything, really. All variable operands must accept registers, and which are indicated by the set in TCGArgConstraint.regs. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2020-10-08 05:57:32 -05:00
Richard Henderson	66792f90f1	tcg: Move sorted_args into TCGArgConstraint.sort_index This uses an existing hole in the TCGArgConstraint structure and will be convenient for keeping the data in one place. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2020-10-08 05:57:32 -05:00
Richard Henderson	9be0d08019	tcg: Drop union from TCGArgConstraint The union is unused; let "regs" appear in the main structure without the "u.regs" wrapping. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2020-10-08 05:57:32 -05:00
Richard Henderson	4c389f6edf	disas: Move host asm annotations to tb_gen_code Instead of creating GStrings and passing them into log_disas, just print the annotations directly in tb_gen_code. Fix the annotations for the slow paths of the TB, after the part implementing the final guest instruction. Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2020-10-03 04:25:14 -05:00
Stefan Hajnoczi	d73415a315	qemu/atomic.h: rename atomic_ to qatomic_ clang's C11 atomic_fetch_() functions only take a C11 atomic type pointer argument. QEMU uses direct types (int, etc) and this causes a compiler error when a QEMU code calls these functions in a source file that also included <stdatomic.h> via a system header file: $ CC=clang CXX=clang++ ./configure ... && make ../util/async.c:79:17: error: address argument to atomic operation must be a pointer to _Atomic type ('unsigned int ' invalid) Avoid using atomic_*() names in QEMU's atomic.h since that namespace is used by <stdatomic.h>. Prefix QEMU's APIs with 'q' so that atomic.h and <stdatomic.h> can co-exist. I checked /usr/include on my machine and searched GitHub for existing "qatomic_" users but there seem to be none. This patch was generated using: $ git grep -h -o '\<atomic$64$\?_[a-z0-9_]\+' include/qemu/atomic.h \| \ sort -u >/tmp/changed_identifiers $ for identifier in $(</tmp/changed_identifiers); do sed -i "s%\<$identifier\>%q$identifier%g" \ $(git grep -I -l "\<$identifier\>") done I manually fixed line-wrap issues and misaligned rST tables. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200923105646.47864-1-stefanha@redhat.com>	2020-09-23 16:07:44 +01:00
Paolo Bonzini	139c1837db	meson: rename included C source files to .c.inc With Makefiles that have automatically generated dependencies, you generated includes are set as dependencies of the Makefile, so that they are built before everything else and they are available when first building the .c files. Alternatively you can use a fine-grained dependency, e.g. target/arm/translate.o: target/arm/decode-neon-shared.inc.c With Meson you have only one choice and it is a third option, namely "build at the beginning of the corresponding target"; the way you express it is to list the includes in the sources of that target. The problem is that Meson decides if something is a source vs. a generated include by looking at the extension: '.c', '.cc', '.m', '.C' are sources, while everything else is considered an include---including '.inc.c'. Use '.c.inc' to avoid this, as it is consistent with our other convention of using '.rst.inc' for included reStructuredText files. The editorconfig file is adjusted. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-08-21 06:18:30 -04:00
Emilio G. Cota	938e897a66	tcg: call qemu_spin_destroy for tb->jmp_lock Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Robert Foley <robert.foley@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> [RF: minor changes + remove tb_destroy_func] Message-Id: <20200609200738.445-7-robert.foley@linaro.org> Message-Id: <20200612190237.30436-10-alex.bennee@linaro.org>	2020-06-16 14:49:05 +01:00
Richard Henderson	61f15c487f	tcg: Improve move ops in liveness_pass_2 If the output of the move is dead, then the last use is in the store. If we propagate the input to the store, then we can remove the move opcode entirely. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2020-06-02 08:42:37 -07:00
Richard Henderson	23850a74af	tcg: Implement gvec support for rotate by scalar No host backend support yet, but the interfaces for rotls are in place. Only implement left-rotate for now, as the only known use of vector rotate by scalar is s390x, so any right-rotate would be unused and untestable. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2020-06-02 08:42:37 -07:00
Richard Henderson	5d0ceda902	tcg: Implement gvec support for rotate by vector No host backend support yet, but the interfaces for rotlv and rotrv are in place. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- v3: Drop the generic expansion from rot to shift; we can do better for each backend, and then this code becomes unused.	2020-06-02 08:42:37 -07:00
Richard Henderson	b0f7e7444c	tcg: Implement gvec support for rotate by immediate No host backend support yet, but the interfaces for rotli are in place. Canonicalize immediate rotate to the left, based on a survey of architectures, but provide both left and right shift interfaces to the translators. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2020-06-02 08:42:37 -07:00
Alex Bennée	e5ef4ec28b	disas: include an optional note for the start of disassembly This will become useful shortly for providing more information about output assembly inline. While there fix up the indenting and code formatting in disas(). Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200513175134.19619-9-alex.bennee@linaro.org>	2020-05-15 15:25:16 +01:00
Philippe Mathieu-Daudé	dcb32f1d8f	tcg: Search includes from the project root source directory We currently search both the root and the tcg/ directories for tcg files: $ git grep '#include "tcg/' \| wc -l 28 $ git grep '#include "tcg[^/]' \| wc -l 94 To simplify the preprocessor search path, unify by expliciting the tcg/ directory. Patch created mechanically by running: $ for x in \ tcg.h tcg-mo.h tcg-op.h tcg-opc.h \ tcg-op-gvec.h tcg-gvec-desc.h; do \ sed -i "s,#include \"$x\",#include \"tcg/$x\"," \ $(git grep -l "#include \"$x\""); \ done Acked-by: David Gibson <david@gibson.dropbear.id.au> (ppc parts) Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Stefan Weil <sw@weilnetz.de> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200101112303.20724-2-philmd@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2020-01-15 15:13:10 -10:00
Robert Foley	7606488c0e	Add use of RCU for qemu_logfile. This now allows changing the logfile while logging is active, and also solves the issue of a seg fault while changing the logfile. Any read access to the qemu_logfile handle will use the rcu_read_lock()/unlock() around the use of the handle. To fetch the handle we will use atomic_rcu_read(). We also in many cases do a check for validity of the logfile handle before using it to deal with the case where the file is closed and set to NULL. The cases where we write to the qemu_logfile will use atomic_rcu_set(). Writers will also use call_rcu() with a newly added qemu_logfile_free function for freeing/closing when readers have finished. Signed-off-by: Robert Foley <robert.foley@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20191118211528.3221-6-robert.foley@linaro.org>	2019-12-18 20:18:02 +00:00
Robert Foley	fc59d2d870	qemu_log_lock/unlock now preserves the qemu_logfile handle. qemu_log_lock() now returns a handle and qemu_log_unlock() receives a handle to unlock. This allows for changing the handle during logging and ensures the lock() and unlock() are for the same file. Also in target/tilegx/translate.c removed the qemu_log_lock()/unlock() calls (and the log("\n")), since the translator can longjmp out of the loop if it attempts to translate an instruction in an inaccessible page. Signed-off-by: Robert Foley <robert.foley@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20191118211528.3221-5-robert.foley@linaro.org>	2019-12-18 20:18:02 +00:00
Emilio G. Cota	38b47b19ec	plugin-gen: add module for TCG-related code We first inject empty instrumentation from translator_loop. After translation, we go through the plugins to see what they want to register for, filling in the empty instrumentation. If if turns out that some instrumentation remains unused, we remove it. This approach supports the following features: - Inlining TCG code for simple operations. Note that we do not export TCG ops to plugins. Instead, we give them a C API to insert inlined ops. So far we only support adding an immediate to a u64, e.g. to count events. - "Direct" callbacks. These are callbacks that do not go via a helper. Instead, the helper is defined at run-time, so that the plugin code is directly called from TCG. This makes direct callbacks as efficient as possible; they are therefore used for very frequent events, e.g. memory callbacks. - Passing the host address to memory callbacks. Most of this is implemented in a later patch though. - Instrumentation of memory accesses performed from helpers. See the corresponding comment, as well as a later patch. Signed-off-by: Emilio G. Cota <cota@braap.org> [AJB: add alloc_tcg_plugin_context, use glib, rm hwaddr] Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>	2019-10-28 15:12:38 +00:00
Tony Nguyen	14776ab5a1	tcg: TCGMemOp is now accelerator independent MemOp Preparation for collapsing the two byte swaps, adjust_endianness and handle_bswap, along the I/O path. Target dependant attributes are conditionalized upon NEED_CPU_H. Signed-off-by: Tony Nguyen <tony.nguyen@bt.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Acked-by: Cornelia Huck <cohuck@redhat.com> Message-Id: <81d9cd7d7f5aaadfa772d6c48ecee834e9cf7882.1566466906.git.tony.nguyen@bt.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-09-03 08:30:38 -07:00
tony.nguyen@bt.com	52bf9771fd	configure: Define target access alignment in configure This patch moves the define of target access alignment earlier from target/foo/cpu.h to configure. Suggested in Richard Henderson's reply to "[PATCH 1/4] tcg: TCGMemOp is now accelerator independent MemOp" Signed-off-by: Tony Nguyen <tony.nguyen@bt.com> Message-Id: <11e818d38ebc40e986cfa62dd7d0afdc@tpw09926dag18e.domain1.systemhost.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: tony.nguyen@bt.com <tony.nguyen@bt.com>	2019-08-20 17:26:19 +02:00
Markus Armbruster	6a0acfff99	Clean up inclusion of exec/cpu-common.h migration/qemu-file.h neglects to include it even though it needs ram_addr_t. Fix that. Drop a few superfluous inclusions elsewhere. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20190812052359.30071-14-armbru@redhat.com>	2019-08-16 13:31:52 +02:00
Like Xu	5cc8767d05	general: Replace global smp variables with smp machine properties Basically, the context could get the MachineState reference via call chains or unrecommended qdev_get_machine() in !CONFIG_USER_ONLY mode. A local variable of the same name would be introduced in the declaration phase out of less effort OR replace it on the spot if it's only used once in the context. No semantic changes. Signed-off-by: Like Xu <like.xu@linux.intel.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20190518205428.90532-4-like.xu@linux.intel.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2019-07-05 17:07:36 -03:00
Richard Henderson	f75da2988e	tcg: Add support for vector compare select Perform a per-element conditional move. This combination operation is easier to implement on some host vector units than plain cmp+bitsel. Omit the usual gvec interface, as this is intended to be used by target-specific gvec expansion call-backs. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-22 15:09:43 -04:00
Richard Henderson	38dc12947e	tcg: Add support for vector bitwise select This operation performs d = (b & a) \| (c & ~a), and is present on a majority of host vector units. Include gvec expanders. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-22 15:09:43 -04:00
Richard Henderson	bcefc90208	tcg: Add support for vector absolute value Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 22:52:08 +00:00
Richard Henderson	37ee55a081	tcg: Add INDEX_op_dupm_vec Allow the backend to expand dup from memory directly, instead of forcing the value into a temp first. This is especially important if integer/vector register moves do not exist. Note that officially tcg_out_dupm_vec is allowed to fail. If it did, we could fix this up relatively easily: VECE == 32/64: Load the value into a vector register, then dup. Both of these must work. VECE == 8/16: If the value happens to be at an offset such that an aligned load would place the desired value in the least significant end of the register, go ahead and load w/garbage in high bits. Load the value w/INDEX_op_ld{8,16}_i32. Attempt a move directly to vector reg, which may fail. Store the value into the backing store for OTS. Load the value into the vector reg w/TCG_TYPE_I32, which must work. Duplicate from the vector reg into itself, which must work. All of which is well and good, except that all supported hosts can support dupm for all vece, so all of the failure paths would be dead code and untestable. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 22:52:08 +00:00
Richard Henderson	d6ecb4a978	tcg: Add tcg_out_dupm_vec to the backend interface Currently stubbed out in all backends that support vectors. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 14:44:03 -07:00
Richard Henderson	bab1671f0f	tcg: Manually expand INDEX_op_dup_vec This case is similar to INDEX_op_mov_* in that we need to do different things depending on the current location of the source. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- v3: Added some commentary to the tcg_reg_alloc_* functions.	2019-05-13 14:44:03 -07:00
Richard Henderson	e7632cfa8b	tcg: Promote tcg_out_{dup,dupi}_vec to backend interface The i386 backend already has these functions, and the aarch64 backend could easily split out one. Nothing is done with these functions yet, but this will aid register allocation of INDEX_op_dup_vec in a later patch. Adjust the aarch64 tcg_out_dupi_vec signature to match the new interface. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 14:44:03 -07:00
Richard Henderson	240c08d099	tcg: Support cross-class moves without instruction support PowerPC Altivec does not support direct moves between vector registers and general registers. So when tcg_out_mov fails, we can use the backing memory for the temporary to perform the move. Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 14:44:03 -07:00
Richard Henderson	78113e83e0	tcg: Return bool success from tcg_out_mov This patch merely changes the interface, aborting on all failures, of which there are currently none. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 14:44:03 -07:00
Richard Henderson	d63e3b6e69	tcg: Assert fixed_reg is read-only The only fixed_reg is cpu_env, and it should not be modified during any TB. Therefore code that tries to special-case moves into a fixed_reg is dead. Remove it. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 14:44:03 -07:00
Richard Henderson	aeee05f53a	tcg: Restart TB generation after out-of-line ldst overflow This is part c of relocation overflow handling. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-04-24 13:05:28 -07:00
Richard Henderson	1768987b73	tcg: Restart TB generation after constant pool overflow This is part b of relocation overflow handling. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-04-24 13:05:28 -07:00
Richard Henderson	7ecd02a06f	tcg: Restart TB generation after relocation overflow If the TB generates too much code, such that backend relocations overflow, try again with a smaller TB. In support of this, move relocation processing from a random place within tcg_out_op, in the handling of branch opcodes, to a new function at the end of tcg_gen_code. This is not a complete solution, as there are additional relocs generated for out-of-line ldst handling and constant pools. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-04-24 13:05:21 -07:00
Richard Henderson	6e6c4efed9	tcg: Restart after TB code generation overflow If a TB generates too much code, try again with fewer insns. Fixes: https://bugs.launchpad.net/bugs/1824853 Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-04-24 13:04:33 -07:00
Richard Henderson	fce1296f13	tcg: Add INDEX_op_extract2_{i32,i64} This will let backends implement the double-word shift operation. Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-04-24 13:04:33 -07:00
Markus Armbruster	3de2faa9a8	tcg: Simplify how dump_exec_info() prints dump_exec_info() takes an fprintf()-like callback and a FILE * to pass to it. Its only caller hmp_info_jit() passes monitor_fprintf() and the current monitor cast to FILE *. monitor_fprintf() casts it right back, and is otherwise identical to monitor_printf(). The type-punning is ugly. Drop the callback, and call qemu_printf() instead. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20190417191805.28198-5-armbru@redhat.com>	2019-04-18 22:18:59 +02:00
Markus Armbruster	d4c51a0af3	tcg: Simplify how dump_opcount_info() prints dump_opcount_info() takes an fprintf()-like callback and a FILE * to pass to it. Its only caller hmp_info_opcount() passes monitor_fprintf() and the current monitor cast to FILE *. monitor_fprintf() casts it right back, and is otherwise identical to monitor_printf(). The type-punning is ugly. Drop the callback, and call qemu_printf() instead. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20190417191805.28198-4-armbru@redhat.com>	2019-04-18 22:18:59 +02:00
Richard Henderson	bef16ab4e6	tcg: Diagnose referenced labels that have not been emitted Currently, a jump to a label that is not defined anywhere will be emitted not be relocated. This results in a jump to a random jump target. With tcg debugging, print a diagnostic to the -d op file and abort. This could help debug or detect errors like `c2d9644e6d` ("target/arm: Fix crash on conditional instruction in an IT block") Reported-by: Roman Kapl <code@rkapl.cz> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-02-11 08:52:44 -08:00
Richard Henderson	dd0a0fcdd8	tcg: Add opcodes for vector minmax arithmetic Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-01-28 07:03:34 -08:00
Richard Henderson	8afaf05066	tcg: Add opcodes for vector saturated arithmetic Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-01-28 07:03:34 -08:00
Paolo Bonzini	eae3eb3e18	qemu/queue.h: simplify reverse access to QTAILQ The new definition of QTAILQ does not require passing the headname, remove it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-01-11 15:46:55 +01:00
Richard Henderson	4250da1092	tcg: Improve call argument loading Free the argument register only after we have verified that the temporary is not already in that register. This case is likely now that we are back propagating the preferred register. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:58:43 +11:00
Richard Henderson	25f49c5f15	tcg: Record register preferences during liveness With these preferences, we can arrange for function call arguments to be computed into the proper registers instead of requiring extra moves. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:58:35 +11:00
Richard Henderson	ae36a246ed	tcg: Add TCG_OPF_BB_EXIT Use this to notice the opcodes that exit the TB, which implies that local temps are really dead and need not be synced. Previously we so marked the true end of the TB, but that was immediately overwritten by the la_bb_end invoked by any TCG_OPF_BB_END opcode, like exit_tb. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:58:20 +11:00
Richard Henderson	f65a061c39	tcg: Split out more subroutines from liveness_pass_1 Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:58:17 +11:00
Richard Henderson	2616c80821	tcg: Rename and adjust liveness_pass_1 helpers No need for a "tcg_" prefix for a static function; we already have another "la_" prefix for indicating liveness analysis. Pass in nb_globals and nb_temps, as we will already have them in registers for other loops within the parent function. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:58:09 +11:00
Richard Henderson	152c35aab4	tcg: Reindent parts of liveness_pass_1 There are two blocks of the form if (foo) { stuff1; goto bar; } else { baz: stuff2; } which have unnecessary and confusing indentation. Remove the else and unindent stuff2. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:57:57 +11:00
Richard Henderson	1894f69a61	tcg: Dump register preference info with liveness Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:57:48 +11:00
Richard Henderson	d62816f2db	tcg: Improve register allocation for matching constraints Try harder to honor the output_pref. When we're forced to allocate a second register for the input, it does not need to use the input constraint; that will be honored by the register we allocate for the output and a move is already required. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:57:40 +11:00
Richard Henderson	69e3706d2b	tcg: Add output_pref to TCGOp Allocate storage for, but do not yet fill in, per-opcode preferences for the output operands. Pass it in to the register allocation routines for output operands. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:57:37 +11:00
Richard Henderson	ba87719cd2	tcg: Add preferred_reg argument to tcg_reg_alloc_do_movi Pass this through to temp_sync. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:57:34 +11:00
Richard Henderson	98b4e186c1	tcg: Add preferred_reg argument to temp_sync Pass this through to tcg_reg_alloc. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:57:32 +11:00
Richard Henderson	b722452aef	tcg: Add preferred_reg argument to temp_load Pass this through to tcg_reg_alloc. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:57:29 +11:00
Richard Henderson	b016486e7b	tcg: Add preferred_reg argument to tcg_reg_alloc This new argument will aid register allocation by indicating how the temporary will be used in future. If the preference cannot be satisfied, fall back to the constraints of the current insn. Short circuit the preference when it cannot be satisfied or if it does not further constrain the operation. With an eye toward optimizing function call sequences, optimize for the preferred_reg set containing a single register. For the moment, all users pass 0 for preference. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:57:06 +11:00
Richard Henderson	b4fc67c7af	tcg: Add reachable_code_pass Delete trivially dead code that follows unconditional branches and noreturn helpers. These can occur either via optimization or via the structure of a target's translator following an exception. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:40:30 +11:00
Richard Henderson	d88a117eaa	tcg: Reference count labels Increment when adding branches, and decrement when removing them. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:40:27 +11:00
Emilio G. Cota	ac1043f6d6	tcg: Drop nargs from tcg_op_insert_{before,after} It's unused since `75e8b9b7aa`. Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20181209193749.12277-9-cota@braap.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:44 +03:00
Richard Henderson	6ac1778676	tcg: Return success from patch_reloc This will move the assert for success from within (subroutines of) patch_reloc into the callers. It will also let new code do something different when a relocation is out of range. For the moment, all backends are trivially converted to return true. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:43 +03:00
Emilio G. Cota	72fd2efbbd	tcg: distribute tcg_time into TCG contexts When we implemented per-vCPU TCG contexts, we forgot to also distribute the tcg_time counter, which has remained as a global accessed without any serialization, leading to potentially missed counts. Fix it by distributing the field over the TCG contexts, embedding it into TCGProfile with a field called "cpu_exec_time", which is more descriptive than "tcg_time". Add a function to query this value directly, and for completeness, fill in the field in tcg_profile_snapshot, even though its callers do not use it. Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20181010144853.13005-5-cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-10-18 18:58:10 -07:00
Emilio G. Cota	c1f543b739	tcg: fix use of uninitialized variable under CONFIG_PROFILER We forgot to initialize n in commit `15fa08f845` ("tcg: Dynamically allocate TCGOps", 2017-12-29). Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20181010144853.13005-3-cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-10-18 18:58:10 -07:00
Richard Henderson	9f75462065	tcg: Reduce max TB opcode count Also, assert that we don't overflow any of two different offsets into the TB. Both unwind and goto_tb both record a uint16_t for later use. This fixes an arm-softmmu test case utilizing NEON in which there is a TB generated that runs to 7800 opcodes, and compiles to 96k on an x86_64 host. This overflows the 16-bit offset in which we record the goto_tb reset offset. Because of that overflow, we install a jump destination that goes to neverland. Boom. With this reduced op count, the same TB compiles to about 48k for aarch64, ppc64le, and x86_64 hosts, and neither assertion fires. Cc: qemu-stable@nongnu.org Reported-by: "Jason A. Donenfeld" <Jason@zx2c4.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-06-15 09:39:53 -10:00
Emilio G. Cota	128ed2278c	tcg: move tb_ctx.tb_phys_invalidate_count to tcg_ctx Thereby making it per-TCGContext. Once we remove tb_lock, this will avoid an atomic increment every time a TB is invalidated. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-06-15 07:42:55 -10:00
Emilio G. Cota	be2cdc5e35	tcg: track TBs with per-region BST's This paves the way for enabling scalable parallel generation of TCG code. Instead of tracking TBs with a single binary search tree (BST), use a BST for each TCG region, protecting it with a lock. This is as scalable as it gets, since each TCG thread operates on a separate region. The core of this change is the introduction of struct tcg_region_tree, which contains a pointer to a GTree and an associated lock to serialize accesses to it. We then allocate an array of tcg_region_tree's, adding the appropriate padding to avoid false sharing based on qemu_dcache_linesize. Given a tc_ptr, we first find the corresponding region_tree. This is done by special-casing the first and last regions first, since they might be of size != region.size; otherwise we just divide the offset by region.stride. I was worried about this division (several dozen cycles of latency), but profiling shows that this is not a fast path. Note that region.stride is not required to be a power of two; it is only required to be a multiple of the host's page size. Note that with this design we can also provide consistent snapshots about all region trees at once; for instance, tcg_tb_foreach acquires/releases all region_tree locks before/after iterating over them. For this reason we now drop tb_lock in dump_exec_info(). As an alternative I considered implementing a concurrent BST, but this can be tricky to get right, offers no consistent snapshots of the BST, and performance and scalability-wise I don't think it could ever beat having separate GTrees, given that our workload is insert-mostly (all concurrent BST designs I've seen focus, understandably, on making lookups fast, which comes at the expense of convoluted, non-wait-free insertions/removals). Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-06-15 07:42:55 -10:00
Richard Henderson	abebf92597	tcg: Limit the number of ops in a TB In `6001f7729e` we partially attempt to address the branch displacement overflow caused by `15fa08f845`. However, gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqtbX.c is a testcase that contains a TB so large as to overflow anyway. The limit here of 8000 ops produces a maximum output TB size of 24112 bytes on a ppc64le host with that test case. This is still much less than the maximum forward branch distance of 32764 bytes. Cc: qemu-stable@nongnu.org Fixes: `15fa08f845` ("tcg: Dynamically allocate TCGOps") Reviewed-by: Laurent Vivier <laurent@vivier.eu> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-05-09 08:30:57 -07:00
Laurent Vivier	6001f7729e	tcg: workaround branch instruction overflow in tcg_out_qemu_ld/st ppc64 uses a BC instruction to call the tcg_out_qemu_ld/st slow path. BC instruction uses a relative address encoded on 14 bits. The slow path functions are added at the end of the generated instructions buffer, in the reverse order of the callers. So more we have slow path functions more the distance between the caller (BC) and the function increases. This patch changes the behavior to generate the functions in the same order of the callers. Cc: qemu-stable@nongnu.org Fixes: `15fa08f845` ("tcg: Dynamically allocate TCGOps") Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20180429235840.16659-1-lvivier@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-05-01 11:56:55 -07:00
Richard Henderson	5bfa803448	tcg: Improve TCGv_ptr support Drop TCGV_PTR_TO_NAT and TCGV_NAT_TO_PTR internal macros. Add tcg_temp_local_new_ptr, tcg_gen_brcondi_ptr, tcg_gen_ext_i32_ptr, tcg_gen_trunc_i64_ptr, tcg_gen_extu_ptr_i64, tcg_gen_trunc_ptr_i32. Use inlines instead of macros where possible. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-05-01 11:56:16 -07:00
Richard Henderson	3774030a3e	tcg: Add generic vector ops for multiplication Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-02-08 15:54:06 +00:00
Richard Henderson	212be173f0	tcg: Add generic vector ops for comparisons Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-02-08 15:54:05 +00:00
Richard Henderson	d0ec97967f	tcg: Add generic vector ops for constant shifts Opcodes are added for scalar and vector shifts, but considering the varied semantics of these do not expose them to the front ends. Do go ahead and provide them in case they are needed for backend expansion. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-02-08 15:54:05 +00:00
Richard Henderson	db432672dc	tcg: Add generic vector expanders Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-02-08 15:54:05 +00:00
Richard Henderson	d2fd745fe8	tcg: Add types and basic operations for host vectors Nothing uses or enables them yet. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-02-08 15:54:04 +00:00
Richard Henderson	cd9090aa9d	tcg: Generalize TCGOp parameters We had two fields specific to INDEX_op_call. Rename these and add some macros so that the fields may be reused for other opcodes. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-12-29 12:43:39 -08:00
Richard Henderson	15fa08f845	tcg: Dynamically allocate TCGOps With no fixed array allocation, we can't overflow a buffer. This will be important as optimizations related to host vectors may expand the number of ops used. Use QTAILQ to link the ops together. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-12-29 12:43:39 -08:00
Richard Henderson	f764718d0c	tcg: Remove TCGV_UNUSED* and TCGV_IS_UNUSED* These are now trivial sets and tests against NULL. Unwrap. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-12-29 12:43:39 -08:00
Richard Henderson	5b38ee3161	tcg: Allow constant pool entries in the prologue Both ARMv6 and AArch64 currently may drop complex guest_base values into the constant pool. But generic code wasn't expecting that, and the pool is not emitted. Correct that. Tested-by: Emilio G. Cota <cota@braap.org> Tested-by: Laurent Desnogues <laurent.desnogues@gmail.com> Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-11-03 09:33:45 +01:00
Richard Henderson	1c2adb958f	tcg: Initialize cpu_env generically This is identical for each target. So, move the initialization to common code. Move the variable itself out of tcg_ctx and name it cpu_env to minimize changes within targets. This also means we can remove tcg_global_reg_new_{ptr,i32,i64}, since there are no longer global-register temps created by targets. Reviewed-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 13:53:42 -07:00
Emilio G. Cota	3468b59e18	tcg: enable multiple TCG contexts in softmmu This enables parallel TCG code generation. However, we do not take advantage of it yet since tb_lock is still held during tb_gen_code. In user-mode we use a single TCG context; see the documentation added to tcg_region_init for the rationale. Note that targets do not need any conversion: targets initialize a TCGContext (e.g. defining TCG globals), and after this initialization has finished, the context is cloned by the vCPU threads, each of them keeping a separate copy. TCG threads claim one entry in tcg_ctxs[] by atomically increasing n_tcg_ctxs. Do not be too annoyed by the subsequent atomic_read's of that variable and tcg_ctxs; they are there just to play nice with analysis tools such as thread sanitizer. Note that we do not allocate an array of contexts (we allocate an array of pointers instead) because when tcg_context_init is called, we do not know yet how many contexts we'll use since the bool behind qemu_tcg_mttcg_enabled() isn't set yet. Previous patches folded some TCG globals into TCGContext. The non-const globals remaining are only set at init time, i.e. before the TCG threads are spawned. Here is a list of these set-at-init-time globals under tcg/: Only written by tcg_context_init: - indirect_reg_alloc_order - tcg_op_defs Only written by tcg_target_init (called from tcg_context_init): - tcg_target_available_regs - tcg_target_call_clobber_regs - arm: arm_arch, use_idiv_instructions - i386: have_cmov, have_bmi1, have_bmi2, have_lzcnt, have_movbe, have_popcnt - mips: use_movnz_instructions, use_mips32_instructions, use_mips32r2_instructions, got_sigill (tcg_target_detect_isa) - ppc: have_isa_2_06, have_isa_3_00, tb_ret_addr - s390: tb_ret_addr, s390_facilities - sparc: qemu_ld_trampoline, qemu_st_trampoline (build_trampolines), use_vis3_instructions Only written by tcg_prologue_init: - 'struct jit_code_entry one_entry' - aarch64: tb_ret_addr - arm: tb_ret_addr - i386: tb_ret_addr, guest_base_flags - ia64: tb_ret_addr - mips: tb_ret_addr, bswap32_addr, bswap32u_addr, bswap64_addr Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 13:53:42 -07:00
Emilio G. Cota	e8feb96fcc	tcg: introduce regions to split code_gen_buffer This is groundwork for supporting multiple TCG contexts. The naive solution here is to split code_gen_buffer statically among the TCG threads; this however results in poor utilization if translation needs are different across TCG threads. What we do here is to add an extra layer of indirection, assigning regions that act just like pages do in virtual memory allocation. (BTW if you are wondering about the chosen naming, I did not want to use blocks or pages because those are already heavily used in QEMU). We use a global lock to serialize allocations as well as statistics reporting (we now export the size of the used code_gen_buffer with tcg_code_size()). Note that for the allocator we could just use a counter and atomic_inc; however, that would complicate the gathering of tcg_code_size()-like stats. So given that the region operations are not a fast path, a lock seems the most reasonable choice. The effectiveness of this approach is clear after seeing some numbers. I used the bootup+shutdown of debian-arm with '-tb-size 80' as a benchmark. Note that I'm evaluating this after enabling per-thread TCG (which is done by a subsequent commit). * -smp 1, 1 region (entire buffer): qemu: flush code_size=83885014 nb_tbs=154739 avg_tb_size=357 qemu: flush code_size=83884902 nb_tbs=153136 avg_tb_size=363 qemu: flush code_size=83885014 nb_tbs=152777 avg_tb_size=364 qemu: flush code_size=83884950 nb_tbs=150057 avg_tb_size=373 qemu: flush code_size=83884998 nb_tbs=150234 avg_tb_size=373 qemu: flush code_size=83885014 nb_tbs=154009 avg_tb_size=360 qemu: flush code_size=83885014 nb_tbs=151007 avg_tb_size=370 qemu: flush code_size=83885014 nb_tbs=151816 avg_tb_size=367 That is, 8 flushes. * -smp 8, 32 regions (80/32 MB per region) [i.e. this patch]: qemu: flush code_size=76328008 nb_tbs=141040 avg_tb_size=356 qemu: flush code_size=75366534 nb_tbs=138000 avg_tb_size=361 qemu: flush code_size=76864546 nb_tbs=140653 avg_tb_size=361 qemu: flush code_size=76309084 nb_tbs=135945 avg_tb_size=375 qemu: flush code_size=74581856 nb_tbs=132909 avg_tb_size=375 qemu: flush code_size=73927256 nb_tbs=135616 avg_tb_size=360 qemu: flush code_size=78629426 nb_tbs=142896 avg_tb_size=365 qemu: flush code_size=76667052 nb_tbs=138508 avg_tb_size=368 Again, 8 flushes. Note how buffer utilization is not 100%, but it is close. Smaller region sizes would yield higher utilization, but we want region allocation to be rare (it acquires a lock), so we do not want to go too small. * -smp 8, static partitioning of 8 regions (10 MB per region): qemu: flush code_size=21936504 nb_tbs=40570 avg_tb_size=354 qemu: flush code_size=11472174 nb_tbs=20633 avg_tb_size=370 qemu: flush code_size=11603976 nb_tbs=21059 avg_tb_size=365 qemu: flush code_size=23254872 nb_tbs=41243 avg_tb_size=377 qemu: flush code_size=28289496 nb_tbs=52057 avg_tb_size=358 qemu: flush code_size=43605160 nb_tbs=78896 avg_tb_size=367 qemu: flush code_size=45166552 nb_tbs=82158 avg_tb_size=364 qemu: flush code_size=63289640 nb_tbs=116494 avg_tb_size=358 qemu: flush code_size=51389960 nb_tbs=93937 avg_tb_size=362 qemu: flush code_size=59665928 nb_tbs=107063 avg_tb_size=372 qemu: flush code_size=38380824 nb_tbs=68597 avg_tb_size=374 qemu: flush code_size=44884568 nb_tbs=79901 avg_tb_size=376 qemu: flush code_size=50782632 nb_tbs=90681 avg_tb_size=374 qemu: flush code_size=39848888 nb_tbs=71433 avg_tb_size=372 qemu: flush code_size=64708840 nb_tbs=119052 avg_tb_size=359 qemu: flush code_size=49830008 nb_tbs=90992 avg_tb_size=362 qemu: flush code_size=68372408 nb_tbs=123442 avg_tb_size=368 qemu: flush code_size=33555560 nb_tbs=59514 avg_tb_size=378 qemu: flush code_size=44748344 nb_tbs=80974 avg_tb_size=367 qemu: flush code_size=37104248 nb_tbs=67609 avg_tb_size=364 That is, 20 flushes. Note how a static partitioning approach uses the code buffer poorly, leading to many unnecessary flushes. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 13:53:42 -07:00
Emilio G. Cota	c3fac1138e	tcg: distribute profiling counters across TCGContext's This is groundwork for supporting multiple TCG contexts. To avoid scalability issues when profiling info is enabled, this patch makes the profiling info counters distributed via the following changes: 1) Consolidate profile info into its own struct, TCGProfile, which TCGContext also includes. Note that tcg_table_op_count is brought into TCGProfile after dropping the tcg_ prefix. 2) Iterate over the TCG contexts in the system to obtain the total counts. This change also requires updating the accessors to TCGProfile fields to use atomic_read/set whenever there may be conflicting accesses (as defined in C11) to them. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 13:53:42 -07:00
Emilio G. Cota	df2cce2968	tcg: introduce **tcg_ctxs to keep track of all TCGContext's Groundwork for supporting multiple TCG contexts. Note that having n_tcg_ctxs is unnecessary. However, it is convenient to have it, since it will simplify iterating over the array: we'll have just a for loop instead of having to iterate over a NULL-terminated array (which would require n+1 elems) or having to check with ifdef's for usermode/softmmu. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 13:53:42 -07:00
Emilio G. Cota	b1311c4acf	tcg: define tcg_init_ctx and make tcg_ctx a pointer Groundwork for supporting multiple TCG contexts. The core of this patch is this change to tcg/tcg.h: > -extern TCGContext tcg_ctx; > +extern TCGContext tcg_init_ctx; > +extern TCGContext tcg_ctx; Note that for now we set tcg_ctx to whatever TCGContext is passed to tcg_context_init -- in this case &tcg_init_ctx. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 13:53:42 -07:00

1 2 3 4 5 ...

429 Commits