Commit Graph

223 Commits

Author SHA1 Message Date
Richard Henderson
a71d9dfbf6 tcg/optimize: Fix TCG_COND_TST* simplification of setcond2
Argument ordering for setcond2 is:

  output, a_low, a_high, b_low, b_high, cond

The test is supposed to be against b_low, not a_high.

Cc: qemu-stable@nongnu.org
Fixes: ceb9ee06b7 ("tcg/optimize: Handle TCG_COND_TST{EQ,NE}")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2413
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20240701024623.1265028-1-richard.henderson@linaro.org>
2024-07-03 10:24:12 -07:00
Richard Henderson
8d65cda728 tcg/optimize: Optimize setcond with zmask
If we can show that high bits of an input are zero,
then we may optimize away some comparisons.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-05-06 12:55:51 -07:00
Richard Henderson
e25fe886b8 tcg/optimize: Do not attempt to constant fold neg_vec
Split out the tail of fold_neg to fold_neg_no_const so that we
can avoid attempting to constant fold vector negate.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2150
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-04-09 07:43:11 -10:00
Richard Henderson
2911e9b95f tcg/optimize: Fix sign_mask for logical right-shift
The 'sign' computation is attempting to locate the sign bit that has
been repeated, so that we can test if that bit is known zero.  That
computation can be zero if there are no known sign repetitions.

Cc: qemu-stable@nongnu.org
Fixes: 93a967fbb5 ("tcg/optimize: Propagate sign info for shifting")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2248
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2024-03-29 12:15:55 -10:00
Paolo Bonzini
ff202817dc tcg/optimize: fix uninitialized variable
The variables uext_opc and sext_opc are used without initialization if
TCG_TARGET_extract_i{32,64}_valid returns false.  The result, depending
on the compiler, might be the generation of extract and sextract opcodes
with invalid offset and count, or just random data in the TCG opcode
stream.

Fixes: ceb9ee06b7 ("tcg/optimize: Handle TCG_COND_TST{EQ,NE}", 2024-02-03)
Cc: Richard Henderson <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20240228110641.287205-1-pbonzini@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-02-29 11:36:05 -10:00
Richard Henderson
fb04ab7ddd tcg/optimize: Lower TCG_COND_TST{EQ,NE} if unsupported
After having performed other simplifications, lower any
remaining test comparisons with AND.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-02-03 23:43:50 +00:00
Richard Henderson
ceb9ee06b7 tcg/optimize: Handle TCG_COND_TST{EQ,NE}
Fold constant comparisons.
Canonicalize "tst x,x" to equality vs zero.
Canonicalize "tst x,sign" to sign test vs zero.
Fold double-word comparisons with zero parts.
Fold setcond of "tst x,pow2" to a bit extract.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-02-03 23:43:50 +00:00
Richard Henderson
7e64b114d1 tcg/optimize: Do swap_commutative2 in do_constant_folding_cond2
Mirror the new do_constant_folding_cond1 by doing all
argument and condition adjustment within one helper.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-02-03 23:43:50 +00:00
Richard Henderson
246c4b72fa tcg/optimize: Split out do_constant_folding_cond1
Handle modifications to the arguments and condition
in a single place.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-02-03 23:43:50 +00:00
Richard Henderson
27cdb85d06 tcg/optimize: Split out arg_is_const_val
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-02-03 23:43:50 +00:00
Richard Henderson
f245757701 tcg/optimize: Canonicalize sub2 with constants to add2
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20231026013945.1152174-4-richard.henderson@linaro.org>
2023-11-06 10:43:04 -08:00
Richard Henderson
6334a968ee tcg/optimize: Canonicalize subi to addi during optimization
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20231026013945.1152174-3-richard.henderson@linaro.org>
2023-11-06 10:43:04 -08:00
Richard Henderson
26aac97c84 tcg/optimize: Split out arg_new_constant
Fixes a bug wherein raw uses of tcg_constant_internal
do not have their TempOptInfo initialized.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-11-06 10:36:56 -08:00
Richard Henderson
3eaadaeb4e tcg: Eliminate duplicate env store operations
Notice when a constant is stored to the same location twice.

Reviewed-by: Song Gao <gaosong@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-11-06 08:27:21 -08:00
Richard Henderson
ab84dc398b tcg/optimize: Optimize env memory operations
Propagate stores to loads, loads to loads.

Reviewed-by: Song Gao <gaosong@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-11-06 08:27:21 -08:00
Richard Henderson
9f75e52828 tcg/optimize: Split out cmp_better_copy
Compare two temps for "better", split out from finding
the best from a whole list.  Use TCGKind, which already
gives the proper priority.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-11-06 08:27:21 -08:00
Richard Henderson
986cac1d2a tcg/optimize: Pipe OptContext into reset_ts
Will be needed in the next patch.

Reviewed-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-11-06 08:27:21 -08:00
Richard Henderson
b701f195d3 tcg: Remove TCG_TARGET_HAS_neg_{i32,i64}
The movcond opcode is now mandatory for backends to implement.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20231026041404.1229328-7-richard.henderson@linaro.org>
2023-11-06 08:27:21 -08:00
Richard Henderson
d97f8f3941 tcg: Optimize past conditional branches
We already register allocate through extended basic blocks,
optimize through extended basic blocks as well.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-10-22 16:32:28 -07:00
Richard Henderson
3635502dd0 tcg: Introduce negsetcond opcodes
Introduce a new opcode for negative setcond.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-08-24 11:22:42 -07:00
Richard Henderson
8f7a840d7d tcg: Fold deposit with zero to and
Inserting a zero into a value, or inserting a value
into zero at offset 0 may be implemented with AND.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-08-24 11:22:42 -07:00
Richard Henderson
ad3d0e4d5d tcg: Split tcg/tcg-op-common.h from tcg/tcg-op.h
Create tcg/tcg-op-common.h, moving everything that does not concern
TARGET_LONG_BITS or TCGv.  Adjust tcg/*.c to use the new header
instead of tcg-op.h, in preparation for compiling tcg/ only once.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-06-05 12:04:28 -07:00
Richard Henderson
fecccfcc54 tcg: Split INDEX_op_qemu_{ld,st}* for guest address size
For 32-bit hosts, we cannot simply rely on TCGContext.addr_bits,
as we need one or two host registers to represent the guest address.

Create the new opcodes and update all users.  Since we have not
yet eliminated TARGET_LONG_BITS, only one of the two opcodes will
ever be used, so we can get away with treating them the same in
the backends.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-05-16 20:07:20 -07:00
Richard Henderson
12fde9bcdb tcg: Add INDEX_op_qemu_{ld,st}_i128
Add opcodes for backend support for 128-bit memory operations.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-05-16 16:30:25 -07:00
Richard Henderson
732e89f4c4 tcg: Replace tcg_abort with g_assert_not_reached
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-04-23 08:17:46 +01:00
Richard Henderson
f57c69156c tcg: Rename TEMP_LOCAL to TEMP_TB
Use TEMP_TB as that is more explicit about the default
lifetime of the data.  While "global" and "local" used
to be contrasting, we have more lifetimes than that now.

Do not yet rename tcg_temp_local_new_*, just the enum.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-03-01 07:33:27 -10:00
Richard Henderson
39004a71d8 tcg: Reorg function calls
Pre-compute the function call layout for each helper at startup.
Drop TCG_CALL_DUMMY_ARG, as we no longer need to leave gaps
in the op->args[] array.  This allows several places to stop
checking for NULL TCGTemp, to which TCG_CALL_DUMMY_ARG mapped.

For tcg_gen_callN, loop over the arguments once.  Allocate the TCGOp
for the call early but delay emitting it, collecting arguments first.
This allows the argument processing loop to emit code for extensions
and have them sequenced before the call.

For tcg_reg_alloc_call, loop over the arguments in reverse order,
which allows stack slots to be filled first naturally.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-01-05 11:41:29 -08:00
Philippe Mathieu-Daudé
d44789434b tcg: Pass number of arguments to tcg_emit_op() / tcg_op_insert_*()
In order to have variable size allocated TCGOp, pass the number
of arguments we use (and would allocate) up to tcg_op_alloc().

This alters tcg_emit_op(), tcg_op_insert_before() and
tcg_op_insert_after() prototypes.

In tcg_op_alloc() ensure the number of arguments is in range.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
[PMD: Extracted from bigger patch]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221218211832.73312-2-philmd@linaro.org>
2023-01-05 11:41:29 -08:00
Richard Henderson
ed5234735a tcg: Add opcodes for vector nand, nor, eqv
We've had placeholders for these opcodes for a while,
and should have support on ppc, s390x and avx512 hosts.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2022-03-04 08:50:41 -10:00
Alex Bennée
9becc36f02 tcg/optimize: only read val after const check
valgrind pointed out that arg_info()->val can be undefined which will
be the case if the arguments are not constant. The ordering of the
checks will have ensured we never relied on an undefined value but for
the sake of completeness re-order the code to be clear.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220209112142.3367525-1-alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2022-03-04 08:50:41 -10:00
Richard Henderson
c578ff1858 tcg/optimize: Fix folding of vector ops
Bitwise operations are easy to fold, because the operation is
identical regardless of element size.  But add and sub need
extra element size info that is not currently propagated.

Fixes: 2f9f08ba43
Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/799
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2022-01-04 15:14:42 -08:00
Richard Henderson
225bec0c0e tcg/optimize: Add an extra cast to fold_extract2
There is no bug, but silence a warning about computation
in int32_t being assigned to a uint64_t.

Reported-by: Coverity CID 1465220
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-11 11:47:01 +01:00
Richard Henderson
93a967fbb5 tcg/optimize: Propagate sign info for shifting
For constant shifts, we can simply shift the s_mask.

For variable shifts, we know that sar does not reduce
the s_mask, which helps for sequences like

    ext32s_i64  t, in
    sar_i64     t, t, v
    ext32s_i64  out, t

allowing the final extend to be eliminated.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-28 20:55:07 -07:00
Richard Henderson
2b9d0c59ed tcg/optimize: Propagate sign info for bit counting
The results are generally 6 bit unsigned values, though
the count leading and trailing bits may produce any value
for a zero input.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-28 20:55:07 -07:00
Richard Henderson
275d7d8e70 tcg/optimize: Propagate sign info for setcond
The result is either 0 or 1, which means that we have
a 2 bit signed result, and thus 62 bits of sign.
For clarity, use the smask_from_zmask function.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-28 20:55:07 -07:00
Richard Henderson
3f2b1f8376 tcg/optimize: Propagate sign info for logical operations
Sign repetitions are perforce all identical, whether they are 1 or 0.
Bitwise operations preserve the relative quantity of the repetitions.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-28 20:55:07 -07:00
Richard Henderson
57fe5c6df2 tcg/optimize: Optimize sign extensions
Certain targets, like riscv, produce signed 32-bit results.
This can lead to lots of redundant extensions as values are
manipulated.

Begin by tracking only the obvious sign-extensions, and
converting them to simple copies when possible.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-28 20:55:07 -07:00
Richard Henderson
267c17e825 tcg/optimize: Use fold_xx_to_i for rem
Recognize the constant function for remainder.

Suggested-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-28 20:55:07 -07:00
Richard Henderson
2f9d9a3422 tcg/optimize: Use fold_xi_to_x for div
Recognize the identity function for division.

Suggested-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-28 20:55:07 -07:00
Richard Henderson
5b5cf47983 tcg/optimize: Use fold_xi_to_x for mul
Recognize the identity function for low-part multiply.

Suggested-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-28 20:55:07 -07:00
Richard Henderson
4e858d96aa tcg/optimize: Use fold_xx_to_i for orc
Recognize the constant function for or-complement.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-28 20:55:07 -07:00
Richard Henderson
faa2e10045 tcg/optimize: Stop forcing z_mask to "garbage" for 32-bit values
This "garbage" setting pre-dates the addition of the type
changing opcodes INDEX_op_ext_i32_i64, INDEX_op_extu_i32_i64,
and INDEX_op_extr{l,h}_i64_i32.

So now we have a definitive points at which to adjust z_mask
to eliminate such bits from the 32-bit operands.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-28 20:55:07 -07:00
Richard Henderson
7a2f708452 tcg/optimize: Sink commutative operand swapping into fold functions
Most of these are handled by creating a fold_const2_commutative
to handle all of the binary operators.  The rest were already
handled on a case-by-case basis in the switch, and have their
own fold function in which to place the call.

We now have only one major switch on TCGOpcode.

Introduce NO_DEST and a block comment for swap_commutative in
order to make the handling of brcond and movcond opcodes cleaner.

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-27 17:11:23 -07:00
Richard Henderson
9531c078ff tcg/optimize: Expand fold_addsub2_i32 to 64-bit ops
Rename to fold_addsub2.
Use Int128 to implement the wider operation.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-27 17:11:22 -07:00
Richard Henderson
407112b03d tcg/optimize: Expand fold_mulu2_i32 to all 4-arg multiplies
Rename to fold_multiply2, and handle muls2_i32, mulu2_i64,
and muls2_i64.

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-27 17:11:22 -07:00
Richard Henderson
fae450ba47 tcg/optimize: Split out fold_masks
Move all of the known-zero optimizations into the per-opcode
functions.  Use fold_masks when there is a possibility of the
result being determined, and simply set ctx->z_mask otherwise.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-27 17:11:22 -07:00
Richard Henderson
da48e27202 tcg/optimize: Split out fold_ix_to_i
Pull the "op r, 0, b => movi r, 0" optimization into a function,
and use it in fold_shift.

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-27 17:11:22 -07:00
Richard Henderson
a63ce0e9cb tcg/optimize: Split out fold_xi_to_x
Pull the "op r, a, i => mov r, a" optimization into a function,
and use them in the outer-most logical operations.

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-27 17:11:22 -07:00
Richard Henderson
9caca88a76 tcg/optimize: Split out fold_sub_to_neg
Even though there is only one user, place this more complex
conversion into its own helper.

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-27 17:11:22 -07:00
Richard Henderson
0e0a32bacb tcg/optimize: Split out fold_to_not
Split out the conditional conversion from a more complex logical
operation to a simple NOT.  Create a couple more helpers to make
this easy for the outer-most logical operations.

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-27 17:11:22 -07:00