Merge tcg_out_bswap32 and tcg_out_bswap32s.
Use the flags in the internal uses for loads and stores.
For mips32r2 bswap32 with zero-extension, standardize on
WSBH+ROTR+DEXT. This is the same number of insns as the
previous DSBH+DSHD+DSRL but fits in better with the flags check.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Merge tcg_out_bswap16 and tcg_out_bswap16s. Use the flags
in the internal uses for loads and stores.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
For INDEX_op_bswap16_i64, use 64-bit instructions so that we can
easily provide the extension to 64-bits. Drop the special case,
previously used, where the input is already zero-extended -- the
minor code size savings is not worth the complication.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
For INDEX_op_bswap32_i32, pass 0 for flags: input not zero-extended,
output does not need extension within the host 64-bit register.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
With the use of a suitable temporary, we can use the same
algorithm when src overlaps dst. The result is the same
number of instructions either way.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We will shortly require sari in other context;
split out both for cleanliness sake.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We will shortly require these in other context;
make the expansion as clear as possible.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Combine the three bswap16 routines, and differentiate via the flags.
Use the correct flags combination from the load/store routines, and
pass along the constant parameter from tcg_out_op.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Pass in the input and output size. We currently use 3 of the 5
possible combinations; the others may be used by new tcg opcodes.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Retain the current rorw bswap16 expansion for the zero-in/zero-out case.
Otherwise, perform a wider bswap plus a right-shift or extend.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This will eventually simplify front-end usage, and will allow
backends to unset TCG_TARGET_HAS_MEMORY_BSWAP without loss of
optimization.
The argument is added during expansion, not currently exposed to the
front end translators. The backends currently only support a flags
value of either TCG_BSWAP_IZ, or (TCG_BSWAP_IZ | TCG_BSWAP_OZ),
since they all require zero top bytes and leave them that way.
At the existing call sites we pass in (TCG_BSWAP_IZ | TCG_BSWAP_OZ),
except for the flags-ignored cases of a 32-bit swap of a 32-bit
value and or a 64-bit swap of a 64-bit value, where we pass 0.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The Arm MVE VDUP implementation would like to be able to emit code to
duplicate a byte or halfword value into an i32. We have code to do
this already in tcg-op-gvec.c, so all we need to do is make the
functions global.
For consistency with other functions made available to the frontends:
* we rename to tcg_gen_dup_*
* we expose both the _i32 and _i64 forms
* we provide the #define for a _tl form
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20210617121628.20116-10-peter.maydell@linaro.org
Assume that we'll have fewer temps allocated after
restarting with a fewer number of instructions.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This function should have been updated for vector types
when they were introduced.
Fixes: d2fd745fe8
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/367
Cc: qemu-stable@nongnu.org
Tested-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We should not be aligning the offset in temp_allocate_frame,
because the odd offset produces an aligned address in the end.
Instead, pass the logical offset into tcg_set_frame and add
the stack bias last.
Cc: qemu-stable@nongnu.org
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Wrap guest memory operations for tci like we do for cpu_ld*_data.
We cannot actually use the cpu_ldst.h interface without duplicating
the memory trace operations performed within, which will already
have been expanded into the tcg opcode stream.
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
These macros are only used in one place. By expanding,
we get to apply some common-subexpression elimination
and create some local variables.
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This reverts commit dc09f047ed.
For tcg, tracepoints are expanded inline in tcg opcodes.
Using a helper which generates a second tracepoint is incorrect.
For system mode, the extraction and re-packing of MemOp and mmu_idx
lost the alignment information from MemOp. So we were no longer
raising alignment exceptions for !TARGET_ALIGNED_ONLY guests.
This can be seen in tests/tcg/xtensa/test_load_store.S.
For user mode, we must update to the new signature of g2h() so that
the revert compiles. We can leave set_helper_retaddr for later.
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We can share this code between 32-bit and 64-bit loads and stores.
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We already had the 32-bit versions for a 32-bit host; expand this
to 64-bit hosts as well. The 64-bit opcodes are new.
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We already had mulu2_i32 for a 32-bit host; expand this to 64-bit
hosts as well. The muls2_i32 and the 64-bit opcodes are new.
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
These were already present in tcg-target.c.inc,
but not in the interpreter.
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
When this opcode is not available in the backend, tcg middle-end
will expand this as a series of 5 opcodes. So implementing this
saves bytecode space.
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This operation is critical to staying within the interpretation
loop longer, which avoids the overhead of setup and teardown for
many TBs.
The check in tcg_prologue_init is disabled because TCI does
want to use NULL to indicate exit, as opposed to branching to
a real epilogue.
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This removes all of the problems with unaligned accesses
to the bytecode stream.
With an 8-bit opcode at the bottom, we have 24 bits remaining,
which are generally split into 6 4-bit slots. This fits well
with the maximum length opcodes, e.g. INDEX_op_add2_i32, which
have 6 register operands.
We have, in previous patches, rearranged things such that there
are no operations with a label which have more than one other
operand. Which leaves us with a 20-bit field in which to encode
a label, giving us a maximum TB size of 512k -- easily large.
Change the INDEX_op_tci_movi_{i32,i64} opcodes to tci_mov[il].
The former puts the immediate in the upper 20 bits of the insn,
like we do for the label displacement. The later uses a label
to reference an entry in the constant pool. Thus, in the worst
case we still have a single memory reference for any constant,
but now the constants are out-of-line of the bytecode and can
be shared between different moves saving space.
Change INDEX_op_call to use a label to reference a pair of
pointers in the constant pool. This removes the only slightly
dodgy link with the layout of struct TCGHelperInfo.
The re-encode cannot be done in pieces.
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Inline it into its one caller, tci_write_reg64.
Drop the asserts that are redundant with tcg_read_r.
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The encoding planned for tci does not have enough room for
brcond2, with 4 registers and a condition as input as well
as the label. Resolve the condition into TCG_REG_TMP, and
relax brcond to one register plus a label, considering the
condition to always be reg != 0.
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We're about to adjust the offset range on host memory ops,
and the format of branches. Both will require a temporary.
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This requires adjusting where arguments are stored.
Place them on the stack at left-aligned positions.
Adjust the stack frame to be at entirely positive offsets.
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
As the only call-clobbered regs for TCI, these should
receive the least priority.
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The current setting is much too pessimistic. Indicating only
the one or two registers that are actually assigned after a
call should avoid unnecessary movement between the register
array and the stack array.
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Add libffi as a build requirement for TCI.
Add libffi to the dockerfiles to satisfy that requirement.
Construct an ffi_cif structure for each unique typemask.
Record the result in a separate hash table for later lookup;
this allows helper_table to stay const.
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This will give us both flags and typemask for use later.
We also fix a dumping bug, wherein calls generated for plugins
fail tcg_find_helper and print (null) instead of either a name
or the raw function pointer.
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We're going to change how to look up the call flags from a TCGop,
so extract it as a helper.
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We will shortly be interested in distinguishing pointers
from integers in the helper's declaration, as well as a
true void return. We currently have two parallel 1 bit
fields; merge them and expand to a 3 bit field.
Our current maximum is 7 helper arguments, plus the return
makes 8 * 3 = 24 bits used within the uint32_t typemask.
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Commit 5e8892db93 fixed several function signatures but tcg_out_op for
arm is missing. This patch fixes it as well.
Signed-off-by: Jose R. Ziviani <jziviani@suse.de>
Message-Id: <20210610224450.23425-1-jziviani@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Introduce a function to remove everything emitted
since a given point.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
These variables belong to the jit side, not the user side.
Since tcg_init_ctx is no longer used outside of tcg/, move
the declaration to tcg-internal.h.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Suggested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
There's a change in mprotect() behaviour [1] in the latest macOS
on M1 and it's not yet clear if it's going to be fixed by Apple.
In this case, instead of changing permissions of N guard pages,
we change permissions of N rwx regions. The same number of
syscalls are required either way.
[1] https://gist.github.com/hikalium/75ae822466ee4da13cbbe486498a191f
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Do not handle protections on a case-by-case basis in the
various alloc_code_gen_buffer instances; do it within a
single loop in tcg_region_init.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
If qemu_get_host_physmem returns an odd number of pages,
then physmem / 8 will not be a multiple of the page size.
The following was observed on a gitlab runner:
ERROR qtest-arm/boot-serial-test - Bail out!
ERROR:../util/osdep.c:80:qemu_mprotect__osdep: \
assertion failed: (!(size & ~qemu_real_host_page_mask))
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Move the call out of the N versions of alloc_code_gen_buffer
and into tcg_region_init.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Change the interface from a boolean error indication to a
negative error vs a non-negative protection. For the moment
this is only interface change, not making use of the new data.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Do not mess around with setting values within tcg_init_ctx.
Put the values into 'region' directly, which is where they
will live for the lifetime of the program.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Shortly, the full code_gen_buffer will only be visible
to region.c, so move in_code_gen_buffer out-of-line.
Move the debugging versions of tcg_splitwx_to_{rx,rw}
to region.c as well, so that the compiler gets to see
the implementation of in_code_gen_buffer.
This leaves exactly one use of in_code_gen_buffer outside
of region.c, in cpu_restore_state. Which, being on the
exception path, is not performance critical.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Return output buffer and size via output pointer arguments,
rather than returning size via tcg_ctx->code_gen_buffer_size.
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Compute the value using straight division and bounds,
rather than a loop. Pass in tb_size rather than reading
from tcg_init_ctx.code_gen_buffer_size,
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Give the field a name reflecting its actual meaning.
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
A size is easier to work with than an end point,
particularly during initial buffer allocation.
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Remove the ifdef ladder and move each define into the
appropriate header file.
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Finish the divorce of tcg/ from hw/, and do not take
the max cpu value from MachineState; just remember what
we were passed in tcg_init.
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Start removing the include of hw/boards.h from tcg/.
Pass down the max_cpus value from tcg_init_machine,
where we have the MachineState already.
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Perform both tcg_context_init and tcg_region_init.
Do not leave this split to the caller.
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Buffer management is integral to tcg. Do not leave the allocation
to code outside of tcg/. This is code movement, with further
cleanups to follow.
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This has only one user, but will make more sense after some
code motion.
Always leave the tcg_init_ctx initialized to the first region,
in preparation for tcg_prologue_init(). This also requires
that we don't re-allocate the region for the first cpu, lest
we hit the assertion for total number of regions allocated .
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This has only one user, and currently needs an ifdef,
but will make more sense after some code motion.
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
All callers immediately assert on error, so move the assert
into the function itself.
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Instead of delaying tcg_region_init until after tcg_prologue_init
is complete, do tcg_region_init first and let tcg_prologue_init
shrink the first region by the size of the generated prologue.
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Implement via expansion, so don't actually set TCG_TARGET_HAS_rotv_vec.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Implement via expansion, so don't actually set TCG_TARGET_HAS_roti_vec.
For NEON, this is shift-right followed by shift-left-and-insert.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The three vector shift by vector operations are all implemented via
expansion. Therefore do not actually set TCG_TARGET_HAS_shv_vec,
as none of shlv_vec, shrv_vec, sarv_vec may actually appear in the
instruction stream, and therefore also do not appear in tcg_target_op_def.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
NEON has 3 instructions implementing this 4 argument operation,
with each insn overlapping a different logical input onto the
destination register.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This is minimum and maximum, signed and unsigned.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This is saturating add and subtract, signed and unsigned.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This consists of the three immediate shifts: shli, shri, sari.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
These logical and arithmetic operations are optional, but are
trivial to accomplish with the existing infrastructure.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Implementing dup2, add, sub, and, or, xor as the minimal set.
This allows us to actually enable neon in the header file.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Most of dupi is copied from tcg/aarch64, which has the same
encoding for AdvSimdExpandImm.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Add registers and function stubs. The functionality
is disabled via use_neon_instructions defined to 0.
We must still include results for the mandatory opcodes in
tcg_target_op_def, as all opcodes are checked during tcg init.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Change the return value to bool, because that's what is should
have been from the start. Pass the ct mask instead of the whole
TCGArgConstraint, as that's the only part that's relevant.
Change the value argument to int64_t. We will need the extra
width for 32-bit hosts wanting to match vector constants.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Commit 15e8699f00 ("atomics: convert to reStructuredText") converted
docs/devel/atomics.txt to docs/devel/atomics.rst.
We still have several references to the old file, so let's fix them
with the following command:
sed -i s/atomics.txt/atomics.rst/ $(git grep -l docs/devel/atomics.txt)
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210517151702.109066-3-sgarzare@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
The last argument of tcg_out_extr() must be in the range 0-31 if ext==0.
Before the fix, when m==0 it becomes 32 and it crashes with an Illegal
instruction on Apple Silicon. After the fix, it will be 0. If m is in
the range 1-31, it is the same as before.
Signed-off-by: Yasuo Kuwahara <kwhr00@gmail.com>
Message-Id: <CAHfJ0vSXnmnTLmT0kR=a8ACRdw_UsLYOhStzUzgVEHoH8U-7sA@mail.gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Stop including cpu.h in files that don't need it.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210416171314.2074665-4-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Stop including sysemu/sysemu.h in files that don't need it.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210416171314.2074665-2-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
The addrl used to compare with SoftTLB entry should be sign-extended
in common case, and it will cause constant failing in SoftTLB
comparisons for the addrl whose address is over 0x80000000 on the
emulation of 32-bit guest on 64-bit host.
This is an important performance bug fix. Spec2000 gzip rate increase
from ~45 to ~140 on Loongson 3A4000 (MIPS compatible platform).
Signed-off-by: Kele Huang <kele.hwang@gmail.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210401100457.191458-1-kele.hwang@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
There's a change in mprotect() behaviour [1] in the latest macOS
on M1 and it's not yet clear if it's going to be fixed by Apple.
As a short-term fix, ignore failures setting up the guard pages.
[1] https://gist.github.com/hikalium/75ae822466ee4da13cbbe486498a191f
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Buglink: https://bugs.launchpad.net/qemu/+bug/1914849
Message-Id: <20210320165720.1813545-3-richard.henderson@linaro.org>
The rw portion of the buffer is the only one in which overruns
can be generated. Allow the rx portion to be more completely
covered by huge pages.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Message-Id: <20210320165720.1813545-2-richard.henderson@linaro.org>
There are two different versions of prototype for tcg_out_op and
tcg_out_vec_op functions:
1) using const TCGArg *args and const int *const_args arguments
2) using const TCGArg args[TCG_MAX_OP_ARGS] and const int
const_args[TCG_MAX_OP_ARGS] aguments.
This duality causes warnings on GCC 11 and prevents build using
--enable-werror. As second version provides more information,
unify functions prototypes to this variant.
Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
Message-Id: <20210312121418.139093-1-mrezanin@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>