Hexagon HVX (target/hexagon) README

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
Taylor Simpson 2021-03-10 20:08:48 -06:00
parent 91e8394415
commit 375bcf389f

@@ -1,9 +1,13 @@
Hexagon is Qualcomm's very long instruction word (VLIW) digital signal
processor(DSP). We also support Hexagon Vector eXtensions (HVX). HVX
is a wide vector coprocessor designed for high performance computer vision,
image processing, machine learning, and other workloads.

The following versions of the Hexagon core are supported
    Scalar core: v67
    https://developer.qualcomm.com/downloads/qualcomm-hexagon-v67-programmer-s-reference-manual
    HVX extension: v66
    https://developer.qualcomm.com/downloads/qualcomm-hexagon-v66-hvx-programmer-s-reference-manual

We presented an overview of the project at the 2019 KVM Forum.
    https://kvmforum2019.sched.com/event/Tmwc/qemu-hexagon-automatic-translation-of-the-isa-manual-pseudcode-to-tiny-code-instructions-of-a-vliw-architecture-niccolo-izzo-revng-taylor-simpson-qualcomm-innovation-center
@@ -124,6 +128,71 @@ There are also cases where we brute force the TCG code generation.
Instructions with multiple definitions are examples. These require special
handling because qemu helpers can only return a single value.

For HVX vectors, the generator behaves slightly differently. The wide vectors
won't fit in a TCGv or TCGv_i64, so we pass their addresses to the helper
functions in TCGv_ptr variables. Here's an example for an HVX vector-add-word
instruction.

    static void generate_V6_vaddw(
                    CPUHexagonState *env,
                    DisasContext *ctx,
                    Insn *insn,
                    Packet *pkt)
    {
        const int VdN = insn->regno[0];
        const intptr_t VdV_off =
            ctx_future_vreg_off(ctx, VdN, 1, true);
        TCGv_ptr VdV = tcg_temp_local_new_ptr();
        tcg_gen_addi_ptr(VdV, cpu_env, VdV_off);
        const int VuN = insn->regno[1];
        const intptr_t VuV_off =
            vreg_src_off(ctx, VuN);
        TCGv_ptr VuV = tcg_temp_local_new_ptr();
        const int VvN = insn->regno[2];
        const intptr_t VvV_off =
            vreg_src_off(ctx, VvN);
        TCGv_ptr VvV = tcg_temp_local_new_ptr();
        tcg_gen_addi_ptr(VuV, cpu_env, VuV_off);
        tcg_gen_addi_ptr(VvV, cpu_env, VvV_off);
        TCGv slot = tcg_constant_tl(insn->slot);
        gen_helper_V6_vaddw(cpu_env, VdV, VuV, VvV, slot);
        tcg_temp_free(slot);
        gen_log_vreg_write(ctx, VdV_off, VdN, EXT_DFL, insn->slot, false);
        ctx_log_vreg_write(ctx, VdN, EXT_DFL, false);
        tcg_temp_free_ptr(VdV);
        tcg_temp_free_ptr(VuV);
        tcg_temp_free_ptr(VvV);
    }

Notice that we also generate a variable named <operand>_off for each operand of
the instruction. This makes it easy to override the instruction semantics with
functions from tcg-op-gvec.h. Here's the override for this instruction.

    #define fGEN_TCG_V6_vaddw(SHORTCODE) \
        tcg_gen_gvec_add(MO_32, VdV_off, VuV_off, VvV_off, \
                         sizeof(MMVector), sizeof(MMVector))

Finally, we notice that the override doesn't use the TCGv_ptr variables, so
we don't generate them when an override is present. Here is what we generate
when the override is present.

    static void generate_V6_vaddw(
                    CPUHexagonState *env,
                    DisasContext *ctx,
                    Insn *insn,
                    Packet *pkt)
    {
        const int VdN = insn->regno[0];
        const intptr_t VdV_off =
            ctx_future_vreg_off(ctx, VdN, 1, true);
        const int VuN = insn->regno[1];
        const intptr_t VuV_off =
            vreg_src_off(ctx, VuN);
        const int VvN = insn->regno[2];
        const intptr_t VvV_off =
            vreg_src_off(ctx, VvN);
        fGEN_TCG_V6_vaddw({ fHIDE(int i;) fVFOREACH(32, i) { VdV.w[i] = VuV.w[i] + VvV.w[i] ; } });
        gen_log_vreg_write(ctx, VdV_off, VdN, EXT_DFL, insn->slot, false);
        ctx_log_vreg_write(ctx, VdN, EXT_DFL, false);
    }

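The two generated forms above differ in how the vector operands reach the code
that does the work: the helper form receives pointers computed as env + offset,
while the override form works from the byte offsets alone. The following is a
minimal, standalone sketch of that difference in plain C. It is not QEMU code:
every name prefixed with Sketch or sketch_ is hypothetical, the 128-byte vector
length and 32-register file are assumptions, and the loops only model the
element-wise 32-bit add that the helper and tcg_gen_gvec_add(MO_32, ...) are
expected to compute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for QEMU types; sizes are assumptions. */
#define SKETCH_VEC_BYTES 128
#define SKETCH_NUM_VREGS 32

typedef union {
    uint32_t uw[SKETCH_VEC_BYTES / 4];   /* 32-bit lanes */
    uint8_t  ub[SKETCH_VEC_BYTES];       /* raw bytes */
} SketchVector;

typedef struct {
    SketchVector VRegs[SKETCH_NUM_VREGS];
    SketchVector future_VRegs[SKETCH_NUM_VREGS];
} SketchEnv;

/* Byte offset of a source register inside the env structure. */
static size_t sketch_vreg_src_off(int regnum)
{
    assert(regnum >= 0 && regnum < SKETCH_NUM_VREGS);
    return offsetof(SketchEnv, VRegs) + (size_t)regnum * sizeof(SketchVector);
}

/* Byte offset of the pending (future) copy of a destination register. */
static size_t sketch_future_vreg_off(int regnum)
{
    assert(regnum >= 0 && regnum < SKETCH_NUM_VREGS);
    return offsetof(SketchEnv, future_VRegs) +
           (size_t)regnum * sizeof(SketchVector);
}

/* Helper form: the caller turns each offset into a pointer (env + offset)
 * and passes the pointers, because a whole vector cannot be passed by value
 * in a 32-bit or 64-bit temporary. */
static void sketch_helper_vaddw(SketchVector *vd,
                                const SketchVector *vu,
                                const SketchVector *vv)
{
    for (size_t i = 0; i < SKETCH_VEC_BYTES / 4; i++) {
        vd->uw[i] = vu->uw[i] + vv->uw[i];   /* wraps modulo 2^32 */
    }
}

/* Override form: only the offsets and the operation size are needed,
 * which is why no pointer variables have to be created. */
static void sketch_gvec_add32(SketchEnv *env, size_t d_off, size_t a_off,
                              size_t b_off, size_t bytes)
{
    uint8_t *base = (uint8_t *)env;
    for (size_t i = 0; i < bytes; i += sizeof(uint32_t)) {
        uint32_t a, b, d;
        memcpy(&a, base + a_off + i, sizeof(a));
        memcpy(&b, base + b_off + i, sizeof(b));
        d = a + b;
        memcpy(base + d_off + i, &d, sizeof(d));
    }
}
```

Both functions produce the same destination contents for the same inputs; the
sketch only illustrates why the <operand>_off variables are sufficient once an
offset-based override is available.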

In addition to instruction semantics, we use a generator to create the decode
tree. This generation is also a two step process. The first step is to run
target/hexagon/gen_dectree_import.c to produce
@@ -140,6 +209,7 @@ runtime information for each thread and contains stuff like the GPR and
predicate registers.

macros.h
mmvec/macros.h

The Hexagon arch lib relies heavily on macros for the instruction semantics.
This is a great advantage for qemu because we can override them for different
@@ -203,6 +273,15 @@ During runtime, the following fields in CPUHexagonState (see cpu.h) are used
    pred_written            boolean indicating if predicate was written
    mem_log_stores          record of the stores (indexed by slot)

For Hexagon Vector eXtensions (HVX), the following fields are used
    VRegs                   Vector registers
    future_VRegs            Registers to be stored during packet commit
    tmp_VRegs               Temporary registers *not* stored during commit
    VRegs_updated           Mask of predicated vector writes
    QRegs                   Q (vector predicate) registers
    future_QRegs            Registers to be stored during packet commit
    QRegs_updated           Mask of predicated vector writes
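
As a rough illustration of how the vector fields listed above relate to each
other, here is a simplified, standalone model in plain C. It is a sketch under
stated assumptions, not the QEMU implementation: the Model and model_ names are
hypothetical, the vector length and register count are assumed, and every write
is tracked through the mask, which may not match how QEMU handles unpredicated
writes. It shows only the idea that values wait in future_VRegs until packet
commit, while tmp_VRegs are never copied back.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model types; sizes are assumptions. */
#define MODEL_VEC_BYTES 128
#define MODEL_NUM_VREGS 32

typedef struct {
    uint8_t ub[MODEL_VEC_BYTES];
} ModelVector;

typedef struct {
    ModelVector VRegs[MODEL_NUM_VREGS];         /* architectural registers */
    ModelVector future_VRegs[MODEL_NUM_VREGS];  /* stored at packet commit */
    ModelVector tmp_VRegs[MODEL_NUM_VREGS];     /* not stored at commit */
    uint32_t    VRegs_updated;                  /* one bit per written reg */
} ModelHvxState;

/* Record a pending write: the new value goes to the future copy and the
 * register is marked in the mask. VRegs itself is left untouched so other
 * instructions in the packet still read the old value. */
static void model_log_vreg_write(ModelHvxState *s, int regnum,
                                 const ModelVector *val)
{
    assert(regnum >= 0 && regnum < MODEL_NUM_VREGS);
    s->future_VRegs[regnum] = *val;
    s->VRegs_updated |= UINT32_C(1) << regnum;
}

/* Packet commit: copy every marked future register into VRegs, then clear
 * the mask. tmp_VRegs are deliberately ignored. */
static void model_commit_hvx(ModelHvxState *s)
{
    for (int i = 0; i < MODEL_NUM_VREGS; i++) {
        if (s->VRegs_updated & (UINT32_C(1) << i)) {
            s->VRegs[i] = s->future_VRegs[i];
        }
    }
    s->VRegs_updated = 0;
}
```

In this model, calling model_log_vreg_write and then model_commit_hvx makes
the new value visible in VRegs only after the commit step, and anything placed
in tmp_VRegs is discarded. The Q register fields could be modeled the same way.
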

*** Debugging ***

You can turn on a lot of debugging by changing the HEX_DEBUG macro to 1 in