Fixes for 128/64 division.
Cleanup tcg/optimize.c
Optimize redundant sign extensions
-----BEGIN PGP SIGNATURE-----
iQFRBAABCgA7FiEEekgeeIaLTbaoWgXAZN846K9+IV8FAmF7cygdHHJpY2hhcmQu
aGVuZGVyc29uQGxpbmFyby5vcmcACgkQZN846K9+IV8mAggAtHuBHs018O6k9dSl
5JJReghwMvsapV5w3MTfN72UR8xTVyC0+dk+P3hv2qJMx/Oofb2Z0m9e9n/iwWxJ
kktySWUuHXE/Hty4fVSEfUdx0C4FBF49I1PllzzjS8gR2gHbEoHXc2doJVCXCW0C
BSKzWERZjVdHWT2GeBtSV0n4vOoiHoBaa5ZcH7VVXVOlpT2iu8Tn3RlVELA1h3pY
NeDLCONWNAXHDQfM+63glLDTZ7eMZ8deOcLgJAiYDA2XVegYGeTZuqdBT3SiTno+
ts4D5aBkmy8yinCcJQktd3alsM1cwYlco0U/x8+JEvNqzWmLzsRpox7g6+rrpe+d
KhZ7Ww==
=UEO3
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20211028' into staging
Improvements to qemu/int128
Fixes for 128/64 division.
Cleanup tcg/optimize.c
Optimize redundant sign extensions
# gpg: Signature made Thu 28 Oct 2021 09:06:00 PM PDT
# gpg: using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F
# gpg: issuer "richard.henderson@linaro.org"
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [ultimate]
* remotes/rth/tags/pull-tcg-20211028: (60 commits)
softmmu: fix for "after access" watchpoints
softmmu: remove useless condition in watchpoint check
softmmu: fix watchpoint processing in icount mode
tcg/optimize: Propagate sign info for shifting
tcg/optimize: Propagate sign info for bit counting
tcg/optimize: Propagate sign info for setcond
tcg/optimize: Propagate sign info for logical operations
tcg/optimize: Optimize sign extensions
tcg/optimize: Use fold_xx_to_i for rem
tcg/optimize: Use fold_xi_to_x for div
tcg/optimize: Use fold_xi_to_x for mul
tcg/optimize: Use fold_xx_to_i for orc
tcg/optimize: Stop forcing z_mask to "garbage" for 32-bit values
tcg: Extend call args using the correct opcodes
tcg/optimize: Sink commutative operand swapping into fold functions
tcg/optimize: Expand fold_addsub2_i32 to 64-bit ops
tcg/optimize: Expand fold_mulu2_i32 to all 4-arg multiplies
tcg/optimize: Split out fold_masks
tcg/optimize: Split out fold_ix_to_i
tcg/optimize: Split out fold_xi_to_x
...
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
New enum QapiSpecialFeature enumerates the special feature flags.
New helper gen_special_features() returns code to represent a
collection of special feature flags as a bitset.
The next few commits will put them to use.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-Id: <20211028102520.747396-5-armbru@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-Id: <20211028102520.747396-4-armbru@redhat.com>
Add special feature 'unstable' everywhere the name starts with 'x-',
except for InputBarrierProperties member x-origin and
MemoryBackendProperties member x-use-canonical-path-for-ramblock-id,
because these two are actually stable.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Acked-by: John Snow <jsnow@redhat.com>
Message-Id: <20211028102520.747396-3-armbru@redhat.com>
By convention, names starting with "x-" are experimental. The parts
of external interfaces so named may be withdrawn or changed
incompatibly in future releases.
The naming convention makes unstable interfaces easy to recognize.
Promoting something from experimental to stable involves a name
change. Client code needs to be updated. Occasionally bothersome.
Worse, the convention is not universally observed:
* QOM type "input-barrier" has properties "x-origin", "y-origin".
Looks accidental, but it's ABI since 4.2.
* QOM types "memory-backend-file", "memory-backend-memfd",
"memory-backend-ram", and "memory-backend-epc" have a property
"x-use-canonical-path-for-ramblock-id" that is documented to be
stable despite its name.
We could document these exceptions, but documentation helps only
humans. We want to recognize "unstable" in code, like "deprecated".
So support recognizing it the same way: introduce new special feature
flag "unstable". It will be treated specially by the QAPI generator,
like the existing feature flag "deprecated", and unlike regular
feature flags.
This commit updates documentation and prepares tests. The next commit
updates the QAPI schema. The remaining patches update the QAPI
generator and wire up -compat policy checking.
Management applications can then use query-qmp-schema and -compat to
manage or guard against use of unstable interfaces the same way as for
deprecated interfaces.
docs/devel/qapi-code-gen.txt no longer mandates the naming convention.
Using it anyway might help writers of programs that aren't
full-fledged management applications. Not using it can save us
bothersome renames. We'll see how that shakes out.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-Id: <20211028102520.747396-2-armbru@redhat.com>
The sNaN propagation behavior has been changed since cd20cee7 in
https://github.com/riscv/riscv-isa-manual.
In Priv spec v1.10, RVF is v2.0. fmin.s and fmax.s are implemented with
IEEE 754-2008 minNum and maxNum operations.
In Priv spec v1.11, RVF is v2.2. fmin.s and fmax.s are amended to
implement IEEE 754-2019 minimumNumber and maximumNumber operations.
Therefore, to prevent the risk of having too many version variables.
Instead of introducing an extra *fext_ver* variable, we tie RVF version
to Priv version. Though it's not completely accurate but is close enough.
Signed-off-by: Chih-Min Chao <chihmin.chao@sifive.com>
Signed-off-by: Frank Chang <frank.chang@sifive.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20211021160847.2748577-3-frank.chang@sifive.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
For "fmax/fmin ft0, ft1, ft2" and if one of the inputs is sNaN,
The original logic:
Return NaN and set invalid flag if ft1 == sNaN || ft2 == sNan.
The alternative path:
Set invalid flag if ft1 == sNaN || ft2 == sNaN.
Return NaN only if ft1 == NaN && ft2 == NaN.
The IEEE 754 spec allows both implementation and some architecture such
as riscv choose different defintions in two spec versions.
(riscv-spec-v2.2 use original version, riscv-spec-20191213 changes to
alternative)
Signed-off-by: Chih-Min Chao <chihmin.chao@sifive.com>
Signed-off-by: Frank Chang <frank.chang@sifive.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20211021160847.2748577-2-frank.chang@sifive.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
There is no need to "force an hs exception" as the current privilege
level, the state of the global ie and of the delegation registers should
be enough to route the interrupt to the appropriate privilege level in
riscv_cpu_do_interrupt. The is true for both asynchronous and
synchronous exceptions, specifically, guest page faults which must be
hardwired to zero hedeleg. As such the hs_force_except mechanism can be
removed.
Signed-off-by: Jose Martins <josemartins90@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20211026145126.11025-3-josemartins90@gmail.com
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
VS interrupts (2, 6, 10) were not correctly forwarded to hs-mode when
not delegated in hideleg (which was not being taken into account). This
was mainly because hs level sie was not always considered enabled when
it should. The spec states that "Interrupts for higher-privilege modes,
y>x, are always globally enabled regardless of the setting of the global
yIE bit for the higher-privilege mode." and also "For purposes of
interrupt global enables, HS-mode is considered more privileged than
VS-mode, and VS-mode is considered more privileged than VU-mode". Also,
vs-level interrupts were not being taken into account unless V=1, but
should be unless delegated.
Finally, there is no need for a special case for to handle vs interrupts
as the current privilege level, the state of the global ie and of the
delegation registers should be enough to route all interrupts to the
appropriate privilege level in riscv_cpu_do_interrupt.
Signed-off-by: Jose Martins <josemartins90@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20211026145126.11025-2-josemartins90@gmail.com
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Fix bug to delay writes to USR until packet commit
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJhe3GcAAoJEHsCRPsS3kQia7AIAJvcTa3D6EYT+NcVJONKp4GR
UXfoersAkX1FlKB2PJoJVwSl/KZA0mxVFg3tzbnAoCsuWgZZ2zM0y+N6jWKASqjZ
64hYXu8NYX+TdaclsRfo933Hexdm8P0GnsNV1YSe71dB0ZP1z9Cu7BSdp/iiCDPH
AwUbqMmwNmMFPjgN0/AL7dGgdUf35j8cdD1IPpmPXZTlcWnI/lMVJ2HNqrGiiALK
hBRyqsenDvdymH/UwanswRXtkkbA7FG73SBeMa1OVWasjNtl+vAqnSp7PyitG0wZ
Chao5OCbqc19C7tU+xapYzNM6g62j0Ac7g0L4Pif4vEX55m9MZGGJP2w3S+950w=
=ZqGP
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/quic/tags/pull-hex-20211028' into staging
Followup to replace more tcg_const_* with tcg_constant_tl*
Fix bug to delay writes to USR until packet commit
# gpg: Signature made Thu 28 Oct 2021 08:59:24 PM PDT
# gpg: using RSA key 7B0244FB12DE4422
# gpg: Good signature from "Taylor Simpson (Rock on) <tsimpson@quicinc.com>" [marginal]
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 3635 C788 CE62 B91F D4C5 9AB4 7B02 44FB 12DE 4422
* remotes/quic/tags/pull-hex-20211028:
Hexagon (target/hexagon) put writes to USR into temp until commit
Hexagon (target/hexagon) more tcg_constant_*
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Watchpoints that should fire after the memory access
break an execution of the current block, try to
translate current instruction into the separate block,
which then causes debug interrupt.
But cpu_interrupt can't be called in such block when
icount is enabled, because interrupts muse be allowed
explicitly.
This patch sets CF_LAST_IO flag for retranslated block,
allowing interrupt request for the last instruction.
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <163542169727.2127597.8141772572696627329.stgit@pasha-ThinkPad-X280>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
cpu_check_watchpoint function checks cpu->watchpoint_hit at the entry.
But then it also does the same in the middle of the function,
while this field can't change.
That is why this patch removes this useless condition.
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <163542169094.2127597.8801843697434113110.stgit@pasha-ThinkPad-X280>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Watchpoint processing code restores vCPU state twice:
in tb_check_watchpoint and in cpu_loop_exit_restore/cpu_restore_state.
Normally it does not affect anything, but in icount mode instruction
counter is incremented twice and becomes incorrect.
This patch eliminates unneeded CPU state restore.
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <163542168516.2127597.8781375223437124644.stgit@pasha-ThinkPad-X280>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
For constant shifts, we can simply shift the s_mask.
For variable shifts, we know that sar does not reduce
the s_mask, which helps for sequences like
ext32s_i64 t, in
sar_i64 t, t, v
ext32s_i64 out, t
allowing the final extend to be eliminated.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The results are generally 6 bit unsigned values, though
the count leading and trailing bits may produce any value
for a zero input.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The result is either 0 or 1, which means that we have
a 2 bit signed result, and thus 62 bits of sign.
For clarity, use the smask_from_zmask function.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Sign repetitions are perforce all identical, whether they are 1 or 0.
Bitwise operations preserve the relative quantity of the repetitions.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Certain targets, like riscv, produce signed 32-bit results.
This can lead to lots of redundant extensions as values are
manipulated.
Begin by tracking only the obvious sign-extensions, and
converting them to simple copies when possible.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Recognize the constant function for remainder.
Suggested-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Recognize the identity function for division.
Suggested-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Recognize the identity function for low-part multiply.
Suggested-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Recognize the constant function for or-complement.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This "garbage" setting pre-dates the addition of the type
changing opcodes INDEX_op_ext_i32_i64, INDEX_op_extu_i32_i64,
and INDEX_op_extr{l,h}_i64_i32.
So now we have a definitive points at which to adjust z_mask
to eliminate such bits from the 32-bit operands.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Pretending that the source is i64 when it is in fact i32 is
incorrect; we have type-changing opcodes that must be used.
This bug trips up the subsequent change to the optimizer.
Fixes: 4f2331e5b6
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Change SET_USR_FIELD to write to hex_new_value[HEX_REG_USR] instead
of hex_gpr[HEX_REG_USR].
Then, we need code to mark the instructions that can set implicitly
set USR
- Macros added to hex_common.py
- A_FPOP added in translate.c
Test case added in tests/tcg/hexagon/overflow.c
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
Change additional tcg_const_tl to tcg_constant_tl
Note that gen_pred_cancal had slot_mask initialized with tcg_const_tl.
However, it is not constant throughout, so we initialize it with
tcg_temp_new and replace the first use with the constant value.
Inspired-by: Richard Henderson <richard.henderson@linaro.org>
Inspired-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
Fixup the PLIC context address to correctly support the threshold and
claim register.
Fixes: ef63100648 ("hw/riscv: opentitan: Update to the latest build")
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
Message-id: 20211025040657.262696-1-alistair.francis@opensource.wdc.com
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
Tested-by: Bin Meng <bmeng.cn@gmail.com>
Message-id: 20211022060133.3045020-5-alistair.francis@opensource.wdc.com
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
Tested-by: Bin Meng <bmeng.cn@gmail.com>
Message-id: 20211022060133.3045020-4-alistair.francis@opensource.wdc.com
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
Tested-by: Bin Meng <bmeng.cn@gmail.com>
Message-id: 20211022060133.3045020-3-alistair.francis@opensource.wdc.com
Add a generic function that can create the PLIC strings.
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
Message-id: 20211022060133.3045020-2-alistair.francis@opensource.wdc.com
Using a macro for the PLIC configuration doesn't make the code any
easier to read. Instead it makes it harder to figure out what is going
on, so let's remove it.
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20211022060133.3045020-1-alistair.francis@opensource.wdc.com
Most of these are handled by creating a fold_const2_commutative
to handle all of the binary operators. The rest were already
handled on a case-by-case basis in the switch, and have their
own fold function in which to place the call.
We now have only one major switch on TCGOpcode.
Introduce NO_DEST and a block comment for swap_commutative in
order to make the handling of brcond and movcond opcodes cleaner.
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Rename to fold_addsub2.
Use Int128 to implement the wider operation.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Rename to fold_multiply2, and handle muls2_i32, mulu2_i64,
and muls2_i64.
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Move all of the known-zero optimizations into the per-opcode
functions. Use fold_masks when there is a possibility of the
result being determined, and simply set ctx->z_mask otherwise.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Pull the "op r, 0, b => movi r, 0" optimization into a function,
and use it in fold_shift.
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Pull the "op r, a, i => mov r, a" optimization into a function,
and use them in the outer-most logical operations.
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Even though there is only one user, place this more complex
conversion into its own helper.
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Split out the conditional conversion from a more complex logical
operation to a simple NOT. Create a couple more helpers to make
this easy for the outer-most logical operations.
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Compute the type of the operation early.
There are at least 4 places that used a def->flags ladder
to determine the type of the operation being optimized.
There were two places that assumed !TCG_OPF_64BIT means
TCG_TYPE_I32, and so could potentially compute incorrect
results for vector operations.
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Pull the "op r, a, 0 => movi r, 0" optimization into a function,
and use it in the outer opcode fold functions.
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>