Reducing the number of arguments reduces the overhead of the helper
call
Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230405164211.30015-2-tsimpson@quicinc.com>
The following improvements are made for predicated HVX instructions
During gen_commit_hvx, unconditionally move the "new" value into
the dest
Don't set slot_cancelled
Remove runtime bookkeeping of which registers were updated
Reduce the cases where gen_log_vreg_write[_pair] is called
It's only needed for special operands VxxV and VyV
Remove gen_log_qreg_write
Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
Reviewed-by: Anton Johansson <anjo@rev.ng>
Message-Id: <20230307025828.1612809-15-tsimpson@quicinc.com>
We only need to track slot for predicated stores and predicated HVX
instructions.
Add arguments to the probe helper functions to indicate if the slot
is predicated.
Here is a simple example of the differences in the TCG code generated:
IN:
0x00400094: 0xf900c102 { if (P0) R2 = and(R0,R1) }
BEFORE
---- 00400094
mov_i32 slot_cancelled,$0x0
mov_i32 new_r2,r2
and_i32 tmp0,p0,$0x1
brcond_i32 tmp0,$0x0,eq,$L1
and_i32 tmp0,r0,r1
mov_i32 new_r2,tmp0
br $L2
set_label $L1
or_i32 slot_cancelled,slot_cancelled,$0x8
set_label $L2
mov_i32 r2,new_r2
AFTER
---- 00400094
mov_i32 new_r2,r2
and_i32 tmp0,p0,$0x1
brcond_i32 tmp0,$0x0,eq,$L1
and_i32 tmp0,r0,r1
mov_i32 new_r2,tmp0
br $L2
set_label $L1
set_label $L2
mov_i32 r2,new_r2
Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
Reviewed-by: Anton Johansson <anjo@rev.ng>
Message-Id: <20230307025828.1612809-14-tsimpson@quicinc.com>
Extend the analyze_<tag> functions for HVX vector and predicate writes
Remove calls to ctx_log_vreg_write[_pair] from gen_tcg_funcs.py
During gen_start_packet, reload the predicated HVX registers into
fugure_VRegs and tmp_VRegs
Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
Reviewed-by: Anton Johansson <anjo@rev.ng>
Message-Id: <20230307025828.1612809-8-tsimpson@quicinc.com>
The pkt_has_store_s1 field in CPUHexagonState is only needed in generated
helpers for scalar load instructions. See check_noshuf and mem_load[1248]
in op_helper.c.
We add logic in gen_analyze_funcs.py to set need_pkt_has_store_s1 in
DisasContext when it is needed at runtime.
Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
Reviewed-by: Anton Johansson <anjo@rev.ng>
Message-Id: <20230307025828.1612809-7-tsimpson@quicinc.com>
We create a new generator that creates an analyze_<tag> function for
each instruction. Currently, these functions record the writes to
R, P, and C registers by calling ctx_log_reg_write[_pair] or
ctx_log_pred_write.
During gen_start_packet, we invoke the analyze_<tag> function for
each instruction in the packet, and we mark the implicit register
and predicate writes.
Doing the analysis up front has several advantages
- We remove calls to ctx_log_* from gen_tcg_funcs.py and genptr.c
- After the analysis is performed, we can initialize hex_new_value
for each of the predicated assignments rather than during TCG
generation for the instructions
- This is a stepping stone for future work where the analysis will
include the set of registers that are read. In cases where
the packet doesn't have an overlap between the registers that are
written and registers that are read, we can avoid the intermediate
step of writing to hex_new_value. Note that other checks will also
be needed (e.g., no instructions can raise an exception).
Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
Reviewed-by: Anton Johansson <anjo@rev.ng>
Message-Id: <20230307025828.1612809-6-tsimpson@quicinc.com>
Direct block chaining is documented here
https://qemu.readthedocs.io/en/latest/devel/tcg.html#direct-block-chaining
Hexagon inner loops end with the endloop0 instruction
To go back to the beginning of the loop, this instructions writes to PC
from register SA0 (start address 0). To use direct block chaining, we
have to assign PC with a constant value. So, we specialize the code
generation when the start of the translation block is equal to SA0.
When this is the case, we defer the compare/branch from endloop0 to
gen_end_tb. When this is done, we can assign the start address of the TB
to PC.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
Message-Id: <20221108162906.3166-12-tsimpson@quicinc.com>
Direct block chaining is documented here
https://qemu.readthedocs.io/en/latest/devel/tcg.html#direct-block-chaining
Recall that Hexagon allows packets with multiple jumps where only the
first one with a true predicate will actually jump. We can use
tcg_gen_goto_tb/tcg_gen_exit_tb when the packet contains a single
PC-relative branch or jump. If not, we use tcg_gen_lookup_and_goto_ptr.
We add the following to DisasContext in order to delay the branching
until the end of packet commit (in gen_end_tb)
branch_cond
The TCGCond condition under which the branch is taken
When branch_cond == TCG_COND_NEVER, there isn't a single
direct branch in this packet.
When branch_cond != TCG_COND_ALWAYS, the value is in
hex_branch_taken
branch_dest
The destination of the branch
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
Message-Id: <20221108162906.3166-11-tsimpson@quicinc.com>
The imported files don't properly mark all CONDEXEC instructions, so
we add some logic to hex_common.py to add the attribute.
Acked-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
Message-Id: <20221108162906.3166-7-tsimpson@quicinc.com>
Here are example instructions with a predicated .tmp/.cur assignment
if (p1) v12.tmp = vmem(r7 + #0)
if (p0) v12.cur = vmem(r9 + #0)
The .tmp/.cur indicates that references to v12 in the same packet
take the result of the load. However, when the predicate is false,
the value at the start of the packet should be used. After the packet
commits, the .tmp value is dropped, but the .cur value is maintained.
To fix this bug, we preload the original value from the HVX register
into the temporary used for the result.
Test cases added to tests/tcg/hexagon/hvx_misc.c
Acked-by: Richard Henderson <richard.henderson@linaro.org>
Co-authored-by: Matheus Tavares Bernardino <quic_mathbern@quicinc.com>
Signed-off-by: Matheus Tavares Bernardino <quic_mathbern@quicinc.com>
Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
Message-Id: <20221108162906.3166-3-tsimpson@quicinc.com>
This enables us to reduce the number of parameters to many functions
In particular, the generated functions previously took all 3 as arguments
Not only does this simplify the code, it improves the translation time
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
Message-Id: <20221108162906.3166-2-tsimpson@quicinc.com>
Many files use "qemu/log.h" declarations but neglect to include
it (they inherit it via "exec/exec-all.h"). "exec/exec-all.h" is
a core component and shouldn't be used that way. Move the
"qemu/log.h" inclusion locally to each unit requiring it.
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20220207082756.82600-10-f4bug@amsat.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Change #if HEX_DEBUG to if (HEX_DEBUG) so the debug code doesn't bit rot
Suggested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <1617930474-31979-17-git-send-email-tsimpson@quicinc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <1617930474-31979-8-git-send-email-tsimpson@quicinc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Multiple writes to the same preg are and'ed together. Rather than
generating a runtime check, we can determine at TCG generation time
if the predicate has previously been written in the packet.
Test added to tests/tcg/hexagon/misc.c
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <1617930474-31979-7-git-send-email-tsimpson@quicinc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
When exiting a TB, generate all the code before returning from
hexagon_tr_translate_packet so that nothing needs to be done in
hexagon_tr_tb_stop.
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <1617930474-31979-6-git-send-email-tsimpson@quicinc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Read the instruction memory
Create a packet data structure
Generate TCG code for the start of the packet
Invoke the generate function for each instruction
Generate TCG code for the end of the packet
Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
Message-Id: <1612763186-18161-30-git-send-email-tsimpson@quicinc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>