Hexagon HVX (target/hexagon) README

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
Date: 2021-03-10 20:08:48 -06:00
commit 375bcf389f
parent 91e8394415
1 changed file with 80 additions and 1 deletion
--- a/target/hexagon/README
+++ b/target/hexagon/README
@@ -1,9 +1,13 @@
 Hexagon is Qualcomm's very long instruction word (VLIW) digital signal
-processor(DSP).
+processor (DSP).  We also support Hexagon Vector eXtensions (HVX).  HVX
 is a wide vector coprocessor designed for high performance computer vision,
 image processing, machine learning, and other workloads.
 The following versions of the Hexagon core are supported
    Scalar core: v67
    https://developer.qualcomm.com/downloads/qualcomm-hexagon-v67-programmer-s-reference-manual
    HVX extension: v66
    https://developer.qualcomm.com/downloads/qualcomm-hexagon-v66-hvx-programmer-s-reference-manual
 We presented an overview of the project at the 2019 KVM Forum.
    https://kvmforum2019.sched.com/event/Tmwc/qemu-hexagon-automatic-translation-of-the-isa-manual-pseudcode-to-tiny-code-instructions-of-a-vliw-architecture-niccolo-izzo-revng-taylor-simpson-qualcomm-innovation-center
@@ -124,6 +128,71 @@ There are also cases where we brute force the TCG code generation.
 Instructions with multiple definitions are examples.  These require special
 handling because qemu helpers can only return a single value.
 For HVX vectors, the generator behaves slightly differently.  The wide vectors
 won't fit in a TCGv or TCGv_i64, so we use TCGv_ptr variables to pass their
 addresses to helper functions.  Here's an example for an HVX vector-add-word
 instruction.
    static void generate_V6_vaddw(
                    CPUHexagonState *env,
                    DisasContext *ctx,
                    Insn *insn,
                    Packet *pkt)
    {
        const int VdN = insn->regno[0];
        const intptr_t VdV_off =
            ctx_future_vreg_off(ctx, VdN, 1, true);
        TCGv_ptr VdV = tcg_temp_local_new_ptr();
        tcg_gen_addi_ptr(VdV, cpu_env, VdV_off);
        const int VuN = insn->regno[1];
        const intptr_t VuV_off =
            vreg_src_off(ctx, VuN);
        TCGv_ptr VuV = tcg_temp_local_new_ptr();
        const int VvN = insn->regno[2];
        const intptr_t VvV_off =
            vreg_src_off(ctx, VvN);
        TCGv_ptr VvV = tcg_temp_local_new_ptr();
        tcg_gen_addi_ptr(VuV, cpu_env, VuV_off);
        tcg_gen_addi_ptr(VvV, cpu_env, VvV_off);
        TCGv slot = tcg_constant_tl(insn->slot);
        gen_helper_V6_vaddw(cpu_env, VdV, VuV, VvV, slot);
        tcg_temp_free(slot);
        gen_log_vreg_write(ctx, VdV_off, VdN, EXT_DFL, insn->slot, false);
        ctx_log_vreg_write(ctx, VdN, EXT_DFL, false);
        tcg_temp_free_ptr(VdV);
        tcg_temp_free_ptr(VuV);
        tcg_temp_free_ptr(VvV);
    }
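To make the pointer-passing idea concrete outside of TCG, here is a plain-C sketch. All names (ToyHexagonState, toy_vreg_src_off, toy_future_vreg_off, toy_helper_vaddw, toy_vaddw) are hypothetical; the vector is assumed to be 128 bytes (32 words), and the "future" destination is indexed by register number purely for simplicity, which the real ctx_future_vreg_off may not do. It mirrors the generated code: compute a byte offset into the state, add it to the state's base address (the role tcg_gen_addi_ptr plays with cpu_env), and let the helper write its result through a pointer, since a 1024-bit vector can't be returned as a single value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for MMVector and CPUHexagonState.
 * Assumption: a 128-byte HVX vector holding 32 32-bit words. */
#define VEC_BYTES 128
#define VEC_WORDS (VEC_BYTES / 4)
#define NUM_VREGS 32

typedef union {
    uint8_t  ub[VEC_BYTES];
    uint32_t uw[VEC_WORDS];
} MMVector;

typedef struct {
    MMVector VRegs[NUM_VREGS];
    MMVector future_VRegs[NUM_VREGS];   /* simplification: one slot per reg */
} ToyHexagonState;

/* Byte offset of a source vector register, playing the role of VuV_off. */
static intptr_t toy_vreg_src_off(int n)
{
    assert(n >= 0 && n < NUM_VREGS);
    return (intptr_t)(offsetof(ToyHexagonState, VRegs) +
                      (size_t)n * sizeof(MMVector));
}

/* Byte offset of a destination's future slot, playing the role of VdV_off. */
static intptr_t toy_future_vreg_off(int n)
{
    assert(n >= 0 && n < NUM_VREGS);
    return (intptr_t)(offsetof(ToyHexagonState, future_VRegs) +
                      (size_t)n * sizeof(MMVector));
}

/* The result is too wide to return by value, so the helper writes through
 * VdV.  Unsigned lanes give modulo-2^32 wrap-around without C UB. */
static void toy_helper_vaddw(MMVector *VdV, const MMVector *VuV,
                             const MMVector *VvV)
{
    for (int i = 0; i < VEC_WORDS; i++) {
        VdV->uw[i] = VuV->uw[i] + VvV->uw[i];
    }
}

/* Like tcg_gen_addi_ptr(ptr, cpu_env, off): base address plus byte offset,
 * then pass the resulting pointers to the helper. */
void toy_vaddw(ToyHexagonState *env, int VdN, int VuN, int VvN)
{
    char *base = (char *)env;
    MMVector *VdV = (MMVector *)(base + toy_future_vreg_off(VdN));
    const MMVector *VuV = (const MMVector *)(base + toy_vreg_src_off(VuN));
    const MMVector *VvV = (const MMVector *)(base + toy_vreg_src_off(VvN));
    toy_helper_vaddw(VdV, VuV, VvV);
}
```

Calling toy_vaddw(&env, 0, 1, 2) leaves VRegs untouched and places the sum of V1 and V2 in future_VRegs[0], which is why the generated code logs the write separately for commit.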
 Notice that we also generate a variable named <operand>_off for each operand of
 the instruction.  This makes it easy to override the instruction semantics with
 functions from tcg-op-gvec.h.  Here's the override for this instruction.
    #define fGEN_TCG_V6_vaddw(SHORTCODE) \
        tcg_gen_gvec_add(MO_32, VdV_off, VuV_off, VvV_off, \
                         sizeof(MMVector), sizeof(MMVector))
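The gvec override operates on the offsets directly. As a rough mental model, the hypothetical C function below computes what tcg_gen_gvec_add(MO_32, dofs, aofs, bofs, oprsz, maxsz) does at run time: 32-bit lane additions over oprsz bytes, then clearing bytes oprsz..maxsz of the destination. It is not QEMU code; the real call emits TCG ops (possibly host vector instructions) instead of running a loop, and the name toy_gvec_add32 is made up.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of tcg_gen_gvec_add(MO_32, dofs, aofs, bofs,
 * oprsz, maxsz) applied to a CPU state viewed as raw bytes. */
void toy_gvec_add32(void *env, uint32_t dofs, uint32_t aofs, uint32_t bofs,
                    uint32_t oprsz, uint32_t maxsz)
{
    uint8_t *base = env;

    assert(oprsz % 4 == 0 && oprsz <= maxsz);
    for (uint32_t i = 0; i < oprsz; i += 4) {
        uint32_t a, b, d;
        memcpy(&a, base + aofs + i, 4);   /* memcpy avoids alignment issues */
        memcpy(&b, base + bofs + i, 4);
        d = a + b;                        /* wraps modulo 2^32 */
        memcpy(base + dofs + i, &d, 4);
    }
    /* Bytes between the operation size and the register size are cleared. */
    if (maxsz > oprsz) {
        memset(base + dofs + oprsz, 0, maxsz - oprsz);
    }
}
```

With oprsz == maxsz == sizeof(MMVector), as in the override above, the whole vector is added and nothing is cleared.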
 Finally, note that the override doesn't use the TCGv_ptr variables, so
 we don't generate them when an override is present.  Here is the generated
 code in that case.
    static void generate_V6_vaddw(
                    CPUHexagonState *env,
                    DisasContext *ctx,
                    Insn *insn,
                    Packet *pkt)
    {
        const int VdN = insn->regno[0];
        const intptr_t VdV_off =
            ctx_future_vreg_off(ctx, VdN, 1, true);
        const int VuN = insn->regno[1];
        const intptr_t VuV_off =
            vreg_src_off(ctx, VuN);
        const int VvN = insn->regno[2];
        const intptr_t VvV_off =
            vreg_src_off(ctx, VvN);
        fGEN_TCG_V6_vaddw({ fHIDE(int i;) fVFOREACH(32, i) { VdV.w[i] = VuV.w[i] + VvV.w[i] ; } });
        gen_log_vreg_write(ctx, VdV_off, VdN, EXT_DFL, insn->slot, false);
        ctx_log_vreg_write(ctx, VdN, EXT_DFL, false);
    }
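The rule the generator follows (always emit the *_off offsets, emit TCGv_ptr temporaries and the helper call only when there is no override) can be sketched as a tiny text emitter. This is a hypothetical, heavily simplified illustration written in C; it is not the actual generator, and toy_gen_vaddw and emit are made-up names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Append one line to buf at position len; returns the new length. */
static size_t emit(char *buf, size_t cap, size_t len, const char *line)
{
    int n = snprintf(buf + len, cap - len, "%s\n", line);
    assert(n >= 0 && (size_t)n < cap - len);   /* buffer must be big enough */
    return len + (size_t)n;
}

/* Hypothetical emitter for the body of generate_V6_vaddw(). */
size_t toy_gen_vaddw(char *buf, size_t cap, bool has_override)
{
    static const char *const regs[] = { "Vd", "Vu", "Vv" };
    char line[128];
    size_t len = 0;

    assert(cap > 0);
    buf[0] = '\0';
    for (int i = 0; i < 3; i++) {
        /* Offsets are always generated: both paths can use them. */
        snprintf(line, sizeof(line), "const intptr_t %sV_off = ...;", regs[i]);
        len = emit(buf, cap, len, line);
        /* Pointer temporaries are only needed by the helper call. */
        if (!has_override) {
            snprintf(line, sizeof(line),
                     "TCGv_ptr %sV = tcg_temp_local_new_ptr();", regs[i]);
            len = emit(buf, cap, len, line);
        }
    }
    len = emit(buf, cap, len, has_override
               ? "fGEN_TCG_V6_vaddw(...);"
               : "gen_helper_V6_vaddw(cpu_env, VdV, VuV, VvV, slot);");
    return len;
}
```

Skipping the unused temporaries keeps the generated translation code smaller and avoids allocating TCG temps that would only be freed again.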
 In addition to instruction semantics, we use a generator to create the decode
 tree.  This generation is also a two-step process.  The first step is to run
 target/hexagon/gen_dectree_import.c to produce
@@ -140,6 +209,7 @@ runtime information for each thread and contains stuff like the GPR and
 predicate registers.
 macros.h
 mmvec/macros.h
 The Hexagon arch lib relies heavily on macros for the instruction semantics.
 This is a great advantage for qemu because we can override them for different
@@ -203,6 +273,15 @@ During runtime, the following fields in CPUHexagonState (see cpu.h) are used
    pred_written          boolean indicating if predicate was written
    mem_log_stores        record of the stores (indexed by slot)
 For Hexagon Vector eXtensions (HVX), the following fields are used
    VRegs                       Vector registers
    future_VRegs                Registers to be stored during packet commit
    tmp_VRegs                   Temporary registers *not* stored during commit
    VRegs_updated               Mask of predicated vector writes
    QRegs                       Q (vector predicate) registers
    future_QRegs                Registers to be stored during packet commit
    QRegs_updated               Mask of predicated Q register writes
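A simplified sketch of how these fields could interact at packet commit follows. It assumes one future slot per register, that every vector or Q register write (predicated or not) sets its bit in the matching *_updated mask, and that the masks are cleared after commit; the real QEMU commit logic may differ on all three points. ToyHvxState and toy_hvx_commit are hypothetical names. The counts (32 vector registers, 4 Q registers, one Q bit per vector byte) follow the HVX architecture, while NUM_TMP_VREGS is an arbitrary placeholder. tmp_VRegs is deliberately never copied, matching its description above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VEC_BYTES     128
#define NUM_VREGS     32
#define NUM_QREGS     4
#define NUM_TMP_VREGS 4   /* hypothetical size, for illustration only */

typedef struct { uint8_t ub[VEC_BYTES]; } ToyVec;
typedef struct { uint8_t ub[VEC_BYTES / 8]; } ToyQReg;  /* 1 bit per byte */

typedef struct {
    ToyVec   VRegs[NUM_VREGS];
    ToyVec   future_VRegs[NUM_VREGS];
    ToyVec   tmp_VRegs[NUM_TMP_VREGS];   /* never committed */
    uint32_t VRegs_updated;              /* bit i: V<i> written this packet */
    ToyQReg  QRegs[NUM_QREGS];
    ToyQReg  future_QRegs[NUM_QREGS];
    uint32_t QRegs_updated;              /* bit i: Q<i> written this packet */
} ToyHvxState;

/* Copy only the written registers from the future copies into the
 * architectural state, then reset the masks for the next packet. */
void toy_hvx_commit(ToyHvxState *env)
{
    for (int i = 0; i < NUM_VREGS; i++) {
        if (env->VRegs_updated & (UINT32_C(1) << i)) {
            env->VRegs[i] = env->future_VRegs[i];
        }
    }
    for (int i = 0; i < NUM_QREGS; i++) {
        if (env->QRegs_updated & (UINT32_C(1) << i)) {
            env->QRegs[i] = env->future_QRegs[i];
        }
    }
    env->VRegs_updated = 0;
    env->QRegs_updated = 0;
}
```

Because a predicated write that is not taken never sets its bit, the old register value survives the commit unchanged.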
 *** Debugging ***
 You can turn on a lot of debugging by changing the HEX_DEBUG macro to 1 in