Commit Graph

1854 Commits

Author SHA1 Message Date
Richard Henderson
269bd5d8f6 cpu: Move the softmmu tlb to CPUNegativeOffsetState
We have for some time had code within the tcg backends to
handle large positive offsets from env.  This move makes
sure that need not happen.  Indeed, we are able to assert
at build time that simple offsets suffice for all hosts.

Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-06-10 07:03:42 -07:00
Richard Henderson
a40ec84ee2 tcg: Create struct CPUTLB
Move all softmmu tlb data into this structure.  Arrange the
members so that we are able to place mask+table together and
at a smaller absolute offset from ENV.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-06-10 07:03:34 -07:00
Richard Henderson
11e2bfef79 tcg/i386: Use MOVDQA for TCG_TYPE_V128 load/store
This instruction raises #GP, aka SIGSEGV, if the effective address
is not aligned to 16-bytes.

We have assertions in tcg-op-gvec.c that the offset from ENV is
aligned, for vector types <= V128.  But the offset itself does not
validate that the final pointer is aligned -- one must also remember
to use the QEMU_ALIGNED() attribute on the vector member within ENV.

PowerPC Altivec has vector load/store instructions that silently
discard the low 4 bits of the address, making alignment mistakes
difficult to discover.  Aid that by making the most popular host
visibly signal the error.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-22 15:09:43 -04:00
Richard Henderson
9e27f58b99 tcg/aarch64: Allow immediates for vector ORR and BIC
The allows immediates to be used for ORR and BIC,
as well as the trivial inversions, ORC and AND.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-22 15:09:43 -04:00
Richard Henderson
02f3a5b474 tcg/aarch64: Build vector immediates with two insns
Use MOVI+ORR or MVNI+BIC in order to build some vector constants,
as opposed to dropping them to the constant pool.  This includes
all 16-bit constants and a similar set of 32-bit constants.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-22 15:09:43 -04:00
Richard Henderson
7e308e003e tcg/aarch64: Use MVNI in tcg_out_dupi_vec
The compliment of a subset of immediates can be computed
with a single instruction.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-22 15:09:43 -04:00
Richard Henderson
984fdcee34 tcg/aarch64: Split up is_fimm
There are several sub-classes of vector immediate, and only MOVI
can use them all.  This will enable usage of MVNI and ORRI, which
use progressively fewer sub-classes.

This patch adds no new functionality, merely splits the function
and moves part of the logic into tcg_out_dupi_vec.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-22 15:09:43 -04:00
Richard Henderson
a9e434a5dc tcg/aarch64: Support vector bitwise select value
The instruction set has 3 insns that perform the same operation,
only varying in which operand must overlap the destination.  We
can represent the operation without overlap and choose based on
the operands seen.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-22 15:09:43 -04:00
Richard Henderson
ebcfb91abe tcg/i386: Use umin/umax in expanding unsigned compare
Using umin(a, b) == a as an expansion for TCG_COND_LEU is a
better alternative to (a - INT_MIN) <= (b - INT_MIN).

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-22 15:09:43 -04:00
Richard Henderson
3ec3538a45 tcg/i386: Remove expansion for missing minmax
This is now handled by code within tcg-op-vec.c.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-22 15:09:43 -04:00
Richard Henderson
904c5e1967 tcg/i386: Support vector comparison select value
We already had backend support for this feature.  Expand the new
cmpsel opcode using vpblendb.  The combination allows us to avoid
an extra NOT for some comparison codes.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-22 15:09:43 -04:00
Richard Henderson
25c012b400 tcg: Add TCG_OPF_NOT_PRESENT if TCG_TARGET_HAS_foo is negative
If INDEX_op_foo is always expanded by tcg_expand_vec_op, then
there may be no reasonable set of constraints to return from
tcg_target_op_def for that opcode.

Let TCG_TARGET_HAS_foo be specified as -1 in that case.  Thus a
boolean test for TCG_TARGET_HAS_foo is true, but we will not
assert within process_op_defs when no constraints are specified.

Compare this with tcg_can_emit_vec_op, which already uses this
tri-state indication.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-22 15:09:43 -04:00
Richard Henderson
72b4c792c7 tcg: Expand vector minmax using cmp+cmpsel
Provide a generic fallback for the min/max operations.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-22 15:09:43 -04:00
Richard Henderson
17f79944eb tcg: Introduce do_op3_nofail for vector expansion
This makes do_op3 match do_op2 in allowing for failure,
and thus fall back expansions.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-22 15:09:43 -04:00
Richard Henderson
f75da2988e tcg: Add support for vector compare select
Perform a per-element conditional move.  This combination operation is
easier to implement on some host vector units than plain cmp+bitsel.
Omit the usual gvec interface, as this is intended to be used by
target-specific gvec expansion call-backs.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-22 15:09:43 -04:00
Richard Henderson
38dc12947e tcg: Add support for vector bitwise select
This operation performs d = (b & a) | (c & ~a), and is present
on a majority of host vector units.  Include gvec expanders.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-22 15:09:43 -04:00
Richard Henderson
532ba368a1 tcg: Fix missing checks and clears in tcg_gen_gvec_dup_mem
The paths through tcg_gen_dup_mem_vec and through MO_128 were
missing the check_size_align.  The path through MO_128 was also
missing the expand_clr.  This last was not visible because the
only user is ARM SVE, which would set oprsz == maxsz, and not
require the clear.

Fix by adding the check_size_align and using do_dup directly
instead of duplicating the check in tcg_gen_gvec_dup_{i32,i64}.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-22 15:09:43 -04:00
Richard Henderson
7b60ef3264 tcg/i386: Fix dupi/dupm for avx1 and 32-bit hosts
The VBROADCASTSD instruction only allows %ymm registers as destination.
Rather than forcing VEX.L and writing to the entire 256-bit register,
revert to using MOVDDUP with an %xmm register.  This is sufficient for
an avx1 host since we do not support TCG_TYPE_V256 for that case.

Also fix the 32-bit avx2, which should have used VPBROADCASTW.

Fixes: 1e262b49b5
Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-22 15:09:43 -04:00
Richard Henderson
a7b6d286cf tcg/aarch64: Do not advertise minmax for MO_64
The min/max instructions are not available for 64-bit elements.

Fixes: 93f332a503
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 22:52:08 +00:00
Richard Henderson
a456394ae5 tcg/aarch64: Support vector absolute value
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 22:52:08 +00:00
Richard Henderson
18f9b65f1a tcg/i386: Support vector absolute value
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 22:52:08 +00:00
Richard Henderson
bcefc90208 tcg: Add support for vector absolute value
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 22:52:08 +00:00
Richard Henderson
ff1f11f7f8 tcg: Add support for integer absolute value
Remove a function of the same name from target/arm/.
Use a branchless implementation of abs gleaned from gcc.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 22:52:08 +00:00
Richard Henderson
0a8d7a3bf5 tcg/i386: Support vector scalar shift opcodes
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 22:52:08 +00:00
Richard Henderson
b4578cd91c tcg: Add gvec expanders for vector shift by scalar
Allow expansion either via shift by scalar or by replicating
the scalar for shift by vector.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
v3: Use a private structure for do_gvec_shifts.
2019-05-13 22:52:08 +00:00
Richard Henderson
79525dfd08 tcg/aarch64: Support vector variable shift opcodes
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 22:52:08 +00:00
Richard Henderson
a2ce146a06 tcg/i386: Support vector variable shift opcodes
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 22:52:08 +00:00
Richard Henderson
5ee5c14cac tcg: Add gvec expanders for variable shift
The gvec expanders perform a modulo on the shift count.  If the target
requires alternate behaviour, then it cannot use the generic gvec
expanders anyway, and will have to have its own custom code.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 22:52:08 +00:00
Richard Henderson
37ee55a081 tcg: Add INDEX_op_dupm_vec
Allow the backend to expand dup from memory directly, instead of
forcing the value into a temp first.  This is especially important
if integer/vector register moves do not exist.

Note that officially tcg_out_dupm_vec is allowed to fail.
If it did, we could fix this up relatively easily:

  VECE == 32/64:
    Load the value into a vector register, then dup.
    Both of these must work.

  VECE == 8/16:
    If the value happens to be at an offset such that an aligned
    load would place the desired value in the least significant
    end of the register, go ahead and load w/garbage in high bits.

    Load the value w/INDEX_op_ld{8,16}_i32.
    Attempt a move directly to vector reg, which may fail.
    Store the value into the backing store for OTS.
    Load the value into the vector reg w/TCG_TYPE_I32, which must work.
    Duplicate from the vector reg into itself, which must work.

All of which is well and good, except that all supported
hosts can support dupm for all vece, so all of the failure
paths would be dead code and untestable.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 22:52:08 +00:00
Richard Henderson
f23e5e15ed tcg/aarch64: Implement tcg_out_dupm_vec
The LD1R instruction does all the work.  Note that the only
useful addressing mode is a base register with no offset.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 22:50:35 +00:00
Richard Henderson
1e262b49b5 tcg/i386: Implement tcg_out_dupm_vec
At the same time, improve tcg_out_dupi_vec wrt broadcast
from the constant pool.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 14:44:03 -07:00
Richard Henderson
d6ecb4a978 tcg: Add tcg_out_dupm_vec to the backend interface
Currently stubbed out in all backends that support vectors.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 14:44:03 -07:00
Richard Henderson
bab1671f0f tcg: Manually expand INDEX_op_dup_vec
This case is similar to INDEX_op_mov_* in that we need to do
different things depending on the current location of the source.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
v3: Added some commentary to the tcg_reg_alloc_* functions.
2019-05-13 14:44:03 -07:00
Richard Henderson
e7632cfa8b tcg: Promote tcg_out_{dup,dupi}_vec to backend interface
The i386 backend already has these functions, and the aarch64 backend
could easily split out one.  Nothing is done with these functions yet,
but this will aid register allocation of INDEX_op_dup_vec in a later patch.

Adjust the aarch64 tcg_out_dupi_vec signature to match the new interface.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 14:44:03 -07:00
Richard Henderson
240c08d099 tcg: Support cross-class moves without instruction support
PowerPC Altivec does not support direct moves between vector registers
and general registers.  So when tcg_out_mov fails, we can use the
backing memory for the temporary to perform the move.

Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 14:44:03 -07:00
Richard Henderson
78113e83e0 tcg: Return bool success from tcg_out_mov
This patch merely changes the interface, aborting on all failures,
of which there are currently none.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 14:44:03 -07:00
Richard Henderson
c16f52b2c5 tcg/arm: Use tcg_out_mov_reg in tcg_out_mov
We have a function that takes an additional condition parameter
over the standard backend interface.  It already takes care of
eliding no-op moves.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 14:44:03 -07:00
Richard Henderson
d63e3b6e69 tcg: Assert fixed_reg is read-only
The only fixed_reg is cpu_env, and it should not be modified
during any TB.  Therefore code that tries to special-case moves
into a fixed_reg is dead.  Remove it.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 14:44:03 -07:00
Richard Henderson
53229a7703 tcg: Specify optional vector requirements with a list
Replace the single opcode in .opc with a null-terminated
array in .opt_opc.  We still require that all opcodes be
used with the same .vece.

Validate the contents of this list with CONFIG_DEBUG_TCG.
All tcg_gen_*_vec functions will check any list active
during .fniv expansion.  Swap the active list in and out
as we expand other opcodes, or take control away from the
front-end function.

Convert all existing vector aware front ends.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 14:44:03 -07:00
Richard Henderson
ce27c5d1a3 tcg: Allow add_vec, sub_vec, neg_vec, not_vec to be expanded
PowerPC Altivec does not support add and subtract of 64-bit elements.
Prepare for that configuration by not assuming the operation is
universally supported.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 14:44:03 -07:00
Richard Henderson
ac383dde33 tcg: Do not recreate INDEX_op_neg_vec unless supported
Use tcg_can_emit_vec_op instead of just TCG_TARGET_HAS_neg_vec,
so that we check the type and vece for the actual operation.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 14:26:28 -07:00
David Hildenbrand
e1227bb6e5 tcg: Implement tcg_gen_gvec_3i()
Let's add tcg_gen_gvec_3i(), similar to tcg_gen_gvec_2i(), however
without introducing "gen_helper_gvec_3i *fnoi", as it isn't needed
for now.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20190416185301.25344-2-david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 14:26:28 -07:00
Richard Henderson
b4b82d7e9c tcg/arm: Restrict constant pool displacement to 12 bits
This will not necessarily restrict the size of the TB, since for v7
the majority of constant pool usage is for calls from the out-of-line
ldst code, which is already at the end of the TB.  But this does
allow us to save one insn per reference on the off-chance.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-04-25 10:39:39 -07:00
Richard Henderson
a7cdaf710f tcg/ppc: Allow the constant pool to overflow at 32k
There is no point in coding for a 2GB offset when the max TB size
is already limited to 64k.  If we further restrict to 32k then we
can eliminate the extra ADDIS instruction.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-04-24 13:05:28 -07:00
Richard Henderson
aeee05f53a tcg: Restart TB generation after out-of-line ldst overflow
This is part c of relocation overflow handling.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-04-24 13:05:28 -07:00
Richard Henderson
1768987b73 tcg: Restart TB generation after constant pool overflow
This is part b of relocation overflow handling.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-04-24 13:05:28 -07:00
Richard Henderson
7ecd02a06f tcg: Restart TB generation after relocation overflow
If the TB generates too much code, such that backend relocations
overflow, try again with a smaller TB.  In support of this, move
relocation processing from a random place within tcg_out_op, in
the handling of branch opcodes, to a new function at the end of
tcg_gen_code.

This is not a complete solution, as there are additional relocs
generated for out-of-line ldst handling and constant pools.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-04-24 13:05:21 -07:00
Richard Henderson
6e6c4efed9 tcg: Restart after TB code generation overflow
If a TB generates too much code, try again with fewer insns.

Fixes: https://bugs.launchpad.net/bugs/1824853
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-04-24 13:04:33 -07:00
Richard Henderson
464c2969d5 tcg/aarch64: Support INDEX_op_extract2_{i32,i64}
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-04-24 13:04:33 -07:00
Richard Henderson
3b832d67a9 tcg/arm: Support INDEX_op_extract2_i32
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-04-24 13:04:33 -07:00
Richard Henderson
c6fb8c0cf7 tcg/i386: Support INDEX_op_extract2_{i32,i64}
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-04-24 13:04:33 -07:00
Richard Henderson
b0a6056719 tcg: Use extract2 in tcg_gen_deposit_{i32,i64}
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-04-24 13:04:33 -07:00
Richard Henderson
02616bad6f tcg: Use deposit and extract2 in tcg_gen_shifti_i64
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-04-24 13:04:33 -07:00
Richard Henderson
fce1296f13 tcg: Add INDEX_op_extract2_{i32,i64}
This will let backends implement the double-word shift operation.

Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-04-24 13:04:33 -07:00
David Hildenbrand
2089fcc9e7 tcg: Implement tcg_gen_extract2_{i32,i64}
Will be helpful for s390x. Input 128 bit and output 64 bit only,
which is sufficient for now.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20190225154204.26751-1-david@redhat.com>
[rth: Add matching tcg_gen_extract2_i32.]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-04-24 13:04:33 -07:00
Markus Armbruster
3de2faa9a8 tcg: Simplify how dump_exec_info() prints
dump_exec_info() takes an fprintf()-like callback and a FILE * to pass
to it.

Its only caller hmp_info_jit() passes monitor_fprintf() and the
current monitor cast to FILE *.  monitor_fprintf() casts it right
back, and is otherwise identical to monitor_printf().  The
type-punning is ugly.

Drop the callback, and call qemu_printf() instead.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190417191805.28198-5-armbru@redhat.com>
2019-04-18 22:18:59 +02:00
Markus Armbruster
d4c51a0af3 tcg: Simplify how dump_opcount_info() prints
dump_opcount_info() takes an fprintf()-like callback and a FILE * to
pass to it.

Its only caller hmp_info_opcount() passes monitor_fprintf() and the
current monitor cast to FILE *.  monitor_fprintf() casts it right
back, and is otherwise identical to monitor_printf().  The
type-punning is ugly.

Drop the callback, and call qemu_printf() instead.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190417191805.28198-4-armbru@redhat.com>
2019-04-18 22:18:59 +02:00
Richard Henderson
9e564a1dde tcg: Remove TODO file
The last update to this file was 9 years ago.  In the meantime,
4 of the 6 ideas have actually been completed.  The lat two do
not actually make sense anymore.

Suggested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-02-21 10:22:24 -08:00
Mark Cave-Ayland
3115584d39 tcg/i386: fix unsigned vector saturating arithmetic
Due to a cut/paste error in the original implementation, the unsigned
vector saturating arithmetic was erroneously being calculated as signed
vector saturating arithmetic.

Fixes: 8ffafbcec2 ("tcg/i386: Implement vector saturating arithmetic")
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20190207224258.426-1-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-02-11 08:52:44 -08:00
Richard Henderson
bef16ab4e6 tcg: Diagnose referenced labels that have not been emitted
Currently, a jump to a label that is not defined anywhere will
be emitted not be relocated.  This results in a jump to a random
jump target.  With tcg debugging, print a diagnostic to the -d op
file and abort.

This could help debug or detect errors like
c2d9644e6d ("target/arm: Fix crash on conditional instruction in an IT block")

Reported-by: Roman Kapl <code@rkapl.cz>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-02-11 08:52:44 -08:00
Thomas Huth
fb0343d5b4 tcg: Fix LGPL version number
It's either "GNU *Library* General Public version 2" or "GNU Lesser
General Public version *2.1*", but there was no "version 2.0" of the
"Lesser" library. So assume that version 2.1 is meant here.

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <1548252536-6242-5-git-send-email-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2019-01-30 11:01:52 +01:00
Richard Henderson
e77c89fb08 cputlb: Remove static tlb sizing
Now that all tcg backends support TCG_TARGET_IMPLEMENTS_DYN_TLB,
remove the define and the old code.

Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:04:35 -08:00
Richard Henderson
0a9a83d6bf tcg/tci: enable dynamic TLB sizing
This is automatic due to TCI using the other softtlb macros.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:04:10 -08:00
Richard Henderson
ac33373e0e tcg/mips: enable dynamic TLB sizing
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:04:10 -08:00
Richard Henderson
a31aa4ce00 tcg/mips: Fix tcg_out_qemu_ld_slow_path
Patch the branch after it has been emitted rather
than before it exists.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:04:10 -08:00
Richard Henderson
cd7d3cb7a2 tcg/arm: enable dynamic TLB sizing
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:04:10 -08:00
Richard Henderson
41b70f220b tcg/riscv: enable dynamic TLB sizing
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:04:24 -08:00
Richard Henderson
4f47e338f6 tcg/s390: enable dynamic TLB sizing
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:04:10 -08:00
Richard Henderson
17ff9f7801 tcg/sparc: enable dynamic TLB sizing
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:04:10 -08:00
Richard Henderson
644f591ab0 tcg/ppc: enable dynamic TLB sizing
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:04:10 -08:00
Richard Henderson
f7bcd96669 tcg/aarch64: enable dynamic TLB sizing
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:04:10 -08:00
Emilio G. Cota
54eaf40b8f tcg/i386: enable dynamic TLB sizing
As the following experiments show, this series is a net perf gain,
particularly for memory-heavy workloads. Experiments are run on an
Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz.

1. System boot + shudown, debian aarch64:

- Before (v3.1.0):
 Performance counter stats for './die.sh v3.1.0' (10 runs):

       9019.797015      task-clock (msec)         #    0.993 CPUs utilized            ( +-  0.23% )
    29,910,312,379      cycles                    #    3.316 GHz                      ( +-  0.14% )
    54,699,252,014      instructions              #    1.83  insn per cycle           ( +-  0.08% )
    10,061,951,686      branches                  # 1115.541 M/sec                    ( +-  0.08% )
       172,966,530      branch-misses             #    1.72% of all branches          ( +-  0.07% )

       9.084039051 seconds time elapsed                                          ( +-  0.23% )

- After:
 Performance counter stats for './die.sh tlb-dyn-v5' (10 runs):

       8624.084842      task-clock (msec)         #    0.993 CPUs utilized            ( +-  0.23% )
    28,556,123,404      cycles                    #    3.311 GHz                      ( +-  0.13% )
    51,755,089,512      instructions              #    1.81  insn per cycle           ( +-  0.05% )
     9,526,513,946      branches                  # 1104.641 M/sec                    ( +-  0.05% )
       166,578,509      branch-misses             #    1.75% of all branches          ( +-  0.19% )

       8.680540350 seconds time elapsed                                          ( +-  0.24% )

That is, a 4.4% perf increase.

2. System boot + shutdown, ubuntu 18.04 x86_64:

- Before (v3.1.0):
      56100.574751      task-clock (msec)         #    1.016 CPUs utilized            ( +-  4.81% )
   200,745,466,128      cycles                    #    3.578 GHz                      ( +-  5.24% )
   431,949,100,608      instructions              #    2.15  insn per cycle           ( +-  5.65% )
    77,502,383,330      branches                  # 1381.490 M/sec                    ( +-  6.18% )
       844,681,191      branch-misses             #    1.09% of all branches          ( +-  3.82% )

      55.221556378 seconds time elapsed                                          ( +-  5.01% )

- After:
      56603.419540      task-clock (msec)         #    1.019 CPUs utilized            ( +- 10.19% )
   202,217,930,479      cycles                    #    3.573 GHz                      ( +- 10.69% )
   439,336,291,626      instructions              #    2.17  insn per cycle           ( +- 14.14% )
    80,538,357,447      branches                  # 1422.853 M/sec                    ( +- 16.09% )
       776,321,622      branch-misses             #    0.96% of all branches          ( +-  3.77% )

      55.549661409 seconds time elapsed                                          ( +- 10.44% )

No improvement (within noise range). Note that for this workload,
increasing the time window too much can lead to perf degradation,
since it flushes the TLB *very* frequently.

3. x86_64 SPEC06int:

           x86_64-softmmu speedup vs. v3.1.0 for SPEC06int (test set)
            Host: Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz (Skylake)

5.5 +------------------------------------------------------------------------+
    |                   +-+                                                  |
  5 |-+.................+-+...............................tlb-dyn-v5.......+-|
    |                   * *                                                  |
4.5 |-+.................*.*................................................+-|
    |                   * *                                                  |
  4 |-+.................*.*................................................+-|
    |                   * *                                                  |
3.5 |-+.................*.*................................................+-|
    |                   * *                                                  |
  3 |-+......+-+*.......*.*................................................+-|
    |        *  *       * *                                                  |
2.5 |-+......*..*.......*.*.................................+-+*...........+-|
    |        *  *       * *                                 *  *             |
  2 |-+......*..*.......*.*.................................*..*...........+-|
    |        *  *       * *                                 *  *  +-+        |
1.5 |-+......*..*.......*.*.................................*..*.*+-+.*+-+.+-|
    |        *  * *+-+  * *  +-+       *+-+  +-+       +-+  *  * *  * *  *   |
  1 |++++-+*+*++*+*++*++*+*++*+*+++-+*+*+-++*+-++++-++++-+++*++*+*++*+*++*+++|
    |   *  * *  * *  *  * *  * *  *  * *  * *  *  * *  * *  *  * *  * *  *   |
0.5 +------------------------------------------------------------------------+
  400.perlb401.bzip403.g429445.g456.hm462.libq464.h471.omn47483.xalancbgeomean
  png: https://imgur.com/YRF90f7

That is, a 1.51x average speedup over the baseline, with a max speedup
of 5.17x.

Here's a different look at the SPEC06int results, using KVM as the baseline:

             x86_64-softmmu slowdown vs. KVM for SPEC06int (test set)
             Host: Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz (Skylake)

25 +---------------------------------------------------------------------------+
   |                   +-+                                        +-+          |
   |                   * *                             +-+      v3.1.0         |
   |                   * *                             +-+  tlb-dyn-v5         |
   |                   * *                             * *        +-+          |
20 |-+.................*.*.............................*.+-+......*.*........+-|
   |                   * *                             * # #      * *          |
   |        +-+        * *                             * # #      * *          |
   |        * *        * *                             * # #      * *          |
15 |-+......*.*........*.*.............................*.#.#......*.+-+......+-|
   |        * *        * *                             * # #      * #|#        |
   |        * *        * *        +-+                  * # #      * +-+        |
   |        * *  +-+   * *        ++-+       +-+       * # #      * # # +-+    |
   |        * *  +-+   * *        * ##       *|   +-+  * # #      * # # +-+    |
10 |-+......*.*..*.+-+.*.*........*.##.......++-+.*.+-+*.#.#......*.#.#.*.*..+-|
   |        * *  * +-+ * *        * ## +-+   *# # * # #* # # +-+  * # # * *    |
   |        * *  * # # * *  +-+   * ## * +-+ *# # * # #* # # * *  * # # *+-+   |
   |        * *  * # # * *  * +-+ * ## * # # *# # * # #* # # * *  * # # * ##   |
 5 |-+......*.+-+*.#.#.*.*..*.#.#.*.##.*.#.#.*#.#.*.#.#*.#.#.*.*..*.#.#.*.##.+-|
   |        * # #* # # * +-+* # # * ## * # # *# # * # #* # # * *  * # # * ##   |
   |        * # #* # # * # #* # # * ## * # # *# # * # #* # # * +-+* # # * ##   |
   |   ++-+ * # #* # # * # #* # # * ## * # # *# # * # #* # # * # #* # # * ##   |
   |+++*#+#+*+#+#*+#+#+*+#+#*+#+#+*+##+*+#+#+*#+#+*+#+#*+#+#+*+#+#*+#+#+*+##+++|
 0 +---------------------------------------------------------------------------+
 400.perlbe401.bzi403.gc429445.go456.h462.libqu464.h471.omne4483.xalancbmgeomean
  png: https://imgur.com/YzAMNEV

After this series, we bring down the average SPEC06int slowdown vs KVM
from 11.47x to 7.58x.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20190116170114.26802-4-cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:03:34 -08:00
Emilio G. Cota
86e1eff8bc tcg: introduce dynamic TLB sizing
Disabled in all TCG backends for now.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20190116170114.26802-3-cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:03:34 -08:00
Richard Henderson
93f332a503 tcg/aarch64: Implement vector minmax arithmetic
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:03:34 -08:00
Richard Henderson
d32648d445 tcg/aarch64: Implement vector saturating arithmetic
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:03:34 -08:00
Richard Henderson
bc37faf4cb tcg/i386: Implement vector minmax arithmetic
The avx instruction set does not directly provide MO_64.
We can still implement 64-bit with comparison and vpblendvb.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:03:34 -08:00
Richard Henderson
8ffafbcec2 tcg/i386: Implement vector saturating arithmetic
Only MO_8 and MO_16 are implemented, since that's all the
instruction set provides.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:03:34 -08:00
Richard Henderson
44f1441dbe tcg/i386: Split subroutines out of tcg_expand_vec_op
This routine was becoming too large.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:03:34 -08:00
Richard Henderson
dd0a0fcdd8 tcg: Add opcodes for vector minmax arithmetic
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:03:34 -08:00
Richard Henderson
8afaf05066 tcg: Add opcodes for vector saturated arithmetic
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:03:34 -08:00
Richard Henderson
5d6acdd4a4 tcg: Add write_aofs to GVecGen4
This allows writing 2 output, 3 input operations.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:03:34 -08:00
Richard Henderson
f550805d83 tcg: Add gvec expanders for nand, nor, eqv
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:03:34 -08:00
Richard Henderson
9a9eda78e4 tcg: Add logical simplifications during gvec expand
We handle many of these during integer expansion, and the
rest of them during integer optimization.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:03:34 -08:00
Paolo Bonzini
7d37435bd5 avoid TABs in files that only contain a few
Most files that have TABs only contain a handful of them.  Change
them to spaces so that we don't confuse people.

disas, standard-headers, linux-headers and libdecnumber are imported
from other projects and probably should be exempted from the check.
Outside those, after this patch the following files still contain both
8-space and TAB sequences at the beginning of the line.  Many of them
have a majority of TABs, or were initially committed with all tabs.

    bsd-user/i386/target_syscall.h
    bsd-user/x86_64/target_syscall.h
    crypto/aes.c
    hw/audio/fmopl.c
    hw/audio/fmopl.h
    hw/block/tc58128.c
    hw/display/cirrus_vga.c
    hw/display/xenfb.c
    hw/dma/etraxfs_dma.c
    hw/intc/sh_intc.c
    hw/misc/mst_fpga.c
    hw/net/pcnet.c
    hw/sh4/sh7750.c
    hw/timer/m48t59.c
    hw/timer/sh_timer.c
    include/crypto/aes.h
    include/disas/bfd.h
    include/hw/sh4/sh.h
    libdecnumber/decNumber.c
    linux-headers/asm-generic/unistd.h
    linux-headers/linux/kvm.h
    linux-user/alpha/target_syscall.h
    linux-user/arm/nwfpe/double_cpdo.c
    linux-user/arm/nwfpe/fpa11_cpdt.c
    linux-user/arm/nwfpe/fpa11_cprt.c
    linux-user/arm/nwfpe/fpa11.h
    linux-user/flat.h
    linux-user/flatload.c
    linux-user/i386/target_syscall.h
    linux-user/ppc/target_syscall.h
    linux-user/sparc/target_syscall.h
    linux-user/syscall.c
    linux-user/syscall_defs.h
    linux-user/x86_64/target_syscall.h
    slirp/cksum.c
    slirp/if.c
    slirp/ip.h
    slirp/ip_icmp.c
    slirp/ip_icmp.h
    slirp/ip_input.c
    slirp/ip_output.c
    slirp/mbuf.c
    slirp/misc.c
    slirp/sbuf.c
    slirp/socket.c
    slirp/socket.h
    slirp/tcp_input.c
    slirp/tcpip.h
    slirp/tcp_output.c
    slirp/tcp_subr.c
    slirp/tcp_timer.c
    slirp/tftp.c
    slirp/udp.c
    slirp/udp.h
    target/cris/cpu.h
    target/cris/mmu.c
    target/cris/op_helper.c
    target/sh4/helper.c
    target/sh4/op_helper.c
    target/sh4/translate.c
    tcg/sparc/tcg-target.inc.c
    tests/tcg/cris/check_addo.c
    tests/tcg/cris/check_moveq.c
    tests/tcg/cris/check_swap.c
    tests/tcg/multiarch/test-mmap.c
    ui/vnc-enc-hextile-template.h
    ui/vnc-enc-zywrle.h
    util/envlist.c
    util/readline.c

The following have only TABs:

    bsd-user/i386/target_signal.h
    bsd-user/sparc64/target_signal.h
    bsd-user/sparc64/target_syscall.h
    bsd-user/sparc/target_signal.h
    bsd-user/sparc/target_syscall.h
    bsd-user/x86_64/target_signal.h
    crypto/desrfb.c
    hw/audio/intel-hda-defs.h
    hw/core/uboot_image.h
    hw/sh4/sh7750_regnames.c
    hw/sh4/sh7750_regs.h
    include/hw/cris/etraxfs_dma.h
    linux-user/alpha/termbits.h
    linux-user/arm/nwfpe/fpopcode.h
    linux-user/arm/nwfpe/fpsr.h
    linux-user/arm/syscall_nr.h
    linux-user/arm/target_signal.h
    linux-user/cris/target_signal.h
    linux-user/i386/target_signal.h
    linux-user/linux_loop.h
    linux-user/m68k/target_signal.h
    linux-user/microblaze/target_signal.h
    linux-user/mips64/target_signal.h
    linux-user/mips/target_signal.h
    linux-user/mips/target_syscall.h
    linux-user/mips/termbits.h
    linux-user/ppc/target_signal.h
    linux-user/sh4/target_signal.h
    linux-user/sh4/termbits.h
    linux-user/sparc64/target_syscall.h
    linux-user/sparc/target_signal.h
    linux-user/x86_64/target_signal.h
    linux-user/x86_64/termbits.h
    pc-bios/optionrom/optionrom.h
    slirp/mbuf.h
    slirp/misc.h
    slirp/sbuf.h
    slirp/tcp.h
    slirp/tcp_timer.h
    slirp/tcp_var.h
    target/i386/svm.h
    target/sparc/asi.h
    target/xtensa/core-dc232b/xtensa-modules.inc.c
    target/xtensa/core-dc233c/xtensa-modules.inc.c
    target/xtensa/core-de212/core-isa.h
    target/xtensa/core-de212/xtensa-modules.inc.c
    target/xtensa/core-fsf/xtensa-modules.inc.c
    target/xtensa/core-sample_controller/core-isa.h
    target/xtensa/core-sample_controller/xtensa-modules.inc.c
    target/xtensa/core-test_kc705_be/core-isa.h
    target/xtensa/core-test_kc705_be/xtensa-modules.inc.c
    tests/tcg/cris/check_abs.c
    tests/tcg/cris/check_addc.c
    tests/tcg/cris/check_addcm.c
    tests/tcg/cris/check_addoq.c
    tests/tcg/cris/check_bound.c
    tests/tcg/cris/check_ftag.c
    tests/tcg/cris/check_int64.c
    tests/tcg/cris/check_lz.c
    tests/tcg/cris/check_openpf5.c
    tests/tcg/cris/check_sigalrm.c
    tests/tcg/cris/crisutils.h
    tests/tcg/cris/sys.c
    tests/tcg/i386/test-i386-ssse3.c
    ui/vgafont.h

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20181213223737.11793-3-pbonzini@redhat.com>
Reviewed-by: Aleksandar Markovic <amarkovic@wavecomp.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Acked-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: Eric Blake <eblake@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Stefan Markovic <smarkovic@wavecomp.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-11 15:46:56 +01:00
Paolo Bonzini
eae3eb3e18 qemu/queue.h: simplify reverse access to QTAILQ
The new definition of QTAILQ does not require passing the headname,
remove it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-11 15:46:55 +01:00
Paolo Bonzini
b58deb344d qemu/queue.h: leave head structs anonymous unless necessary
Most list head structs need not be given a name.  In most cases the
name is given just in case one is going to use QTAILQ_LAST, QTAILQ_PREV
or reverse iteration, but this does not apply to lists of other kinds,
and even for QTAILQ in practice this is only rarely needed.  In addition,
we will soon reimplement those macros completely so that they do not
need a name for the head struct.  So clean up everything, not giving a
name except in the rare case where it is necessary.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-11 15:46:55 +01:00
Richard Henderson
4250da1092 tcg: Improve call argument loading
Free the argument register only after we have verified that the
temporary is not already in that register.  This case is likely
now that we are back propagating the preferred register.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:58:43 +11:00
Richard Henderson
25f49c5f15 tcg: Record register preferences during liveness
With these preferences, we can arrange for function call arguments to
be computed into the proper registers instead of requiring extra moves.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:58:35 +11:00
Richard Henderson
ae36a246ed tcg: Add TCG_OPF_BB_EXIT
Use this to notice the opcodes that exit the TB, which implies
that local temps are really dead and need not be synced.

Previously we so marked the true end of the TB, but that was
immediately overwritten by the la_bb_end invoked by any
TCG_OPF_BB_END opcode, like exit_tb.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:58:20 +11:00
Richard Henderson
f65a061c39 tcg: Split out more subroutines from liveness_pass_1
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:58:17 +11:00
Richard Henderson
2616c80821 tcg: Rename and adjust liveness_pass_1 helpers
No need for a "tcg_" prefix for a static function; we already
have another "la_" prefix for indicating liveness analysis.
Pass in nb_globals and nb_temps, as we will already have them
in registers for other loops within the parent function.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:58:09 +11:00
Richard Henderson
152c35aab4 tcg: Reindent parts of liveness_pass_1
There are two blocks of the form

    if (foo) {
        stuff1;
        goto bar;
    } else {
    baz:
        stuff2;
    }

which have unnecessary and confusing indentation.
Remove the else and unindent stuff2.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:57:57 +11:00
Richard Henderson
1894f69a61 tcg: Dump register preference info with liveness
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:57:48 +11:00
Richard Henderson
d62816f2db tcg: Improve register allocation for matching constraints
Try harder to honor the output_pref.  When we're forced to allocate
a second register for the input, it does not need to use the input
constraint; that will be honored by the register we allocate for the
output and a move is already required.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:57:40 +11:00
Richard Henderson
69e3706d2b tcg: Add output_pref to TCGOp
Allocate storage for, but do not yet fill in, per-opcode
preferences for the output operands.  Pass it in to the
register allocation routines for output operands.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:57:37 +11:00
Richard Henderson
ba87719cd2 tcg: Add preferred_reg argument to tcg_reg_alloc_do_movi
Pass this through to temp_sync.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:57:34 +11:00
Richard Henderson
98b4e186c1 tcg: Add preferred_reg argument to temp_sync
Pass this through to tcg_reg_alloc.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:57:32 +11:00
Richard Henderson
b722452aef tcg: Add preferred_reg argument to temp_load
Pass this through to tcg_reg_alloc.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:57:29 +11:00
Richard Henderson
b016486e7b tcg: Add preferred_reg argument to tcg_reg_alloc
This new argument will aid register allocation by indicating how
the temporary will be used in future.  If the preference cannot
be satisfied, fall back to the constraints of the current insn.

Short circuit the preference when it cannot be satisfied or if
it does not further constrain the operation.

With an eye toward optimizing function call sequences, optimize
for the preferred_reg set containing a single register.

For the moment, all users pass 0 for preference.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:57:06 +11:00
Richard Henderson
b4fc67c7af tcg: Add reachable_code_pass
Delete trivially dead code that follows unconditional branches and
noreturn helpers.  These can occur either via optimization or via
the structure of a target's translator following an exception.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:40:30 +11:00