These can be simplified to and/or/andc/orc,
avoiding the load of the constantinto a register.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Do not allow cmpsel_vec to be expanded early, so that we can
make the correct decision wrt the sense of the comparison.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
These can be simplified to and/or/andc/orc,
avoiding the load of the constantinto a register.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Do not allow cmpsel_vec to be expanded early, so that we can
make the correct decision wrt the sense of the comparison.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The avx512 vpblendm* instructions exactly implement cmpsel,
using a predicate input. Of course this matches nicely with
the avx512 predicate comparison instructions.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Extend tcg_out_evex_opc to handle the predicate and
zero-merging parameters of the evex prefix.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The sse/avx instruction set only has EQ and GT as direct comparisons.
Other signed comparisons can be generated from swapping and inversion.
However unsigned comparisons are not available and must be transformed
to signed comparisons by biasing the inputs.
The avx512 instruction set has a complete set of comparisons, with
results placed into a predicate register. We can produce the normal
cmp_vec result by using VPMOVM2*.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Place immediate values second in the comparison.
Place destination matches first in the true/false values.
All of this mirrors what we do for integer setcond and movcond.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Fold "x = cond ? y : y" to "x = y".
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Expand during output instead of during opcode generation.
Remove x86_vpblendvb_vec opcode, this this removes the only user.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Move most of expansion to opcode generation, leaving the
conversion of unsigned to signed to be done in the early phase.
Small inefficiencies, but not incorrect results, are introduced
until cmpsel_vec is converted in the next patch.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Helper function to handle setting of VEXL based
on the type of the operation.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Add declaration to tcg-internal.h, making it available for
use from tcg backend vector expanders.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The loop in the 32-bit case of the vector compare operation
was incorrectly incrementing by 8 bytes per iteration instead
of 4 bytes. This caused the function to process only half of
the intended elements.
Cc: qemu-stable@nongnu.org
Fixes: 9622c697d1 (tcg: Add gvec compare with immediate and scalar operand)
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20240904142739.854-2-zhiwei_liu@linux.alibaba.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
The use of tcg_last_op does not interact well with
TCGContext.emit_before_op, resulting in the label
being linked to something other than the branch op.
In this case it is easier to simply collect the emitted
branch op and pass it directly to add_as_label_use.
Reported-by: Elisha Hollander <just4now666666@gmail.com>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
TCGOp to be propagated further in the next patch.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Ensure the code structure is the same for matching constraints
and emitting code, lest we allow constants that cannot be
trivially tested.
Cc: qemu-stable@nongnu.org
Fixes: ad788aebba ("tcg/ppc: Support TCG_COND_TST{EQ,NE}")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2487
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <44328324-af73-4439-9d2b-d414e0e13dd7@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Argument ordering for setcond2 is:
output, a_low, a_high, b_low, b_high, cond
The test is supposed to be against b_low, not a_high.
Cc: qemu-stable@nongnu.org
Fixes: ceb9ee06b7 ("tcg/optimize: Handle TCG_COND_TST{EQ,NE}")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2413
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20240701024623.1265028-1-richard.henderson@linaro.org>
Move detection code out of tcg, similar to other hosts.
Reviewed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The non-standard .fa library suffix breaks the link source
de-duplication done by Meson so drop it.
The lack of link source de-duplication causes AddressSanitizer to
complain ODR violations, and makes GNU ld abort when combined with
clang's LTO.
Fortunately, the non-standard suffix is not necessary anymore for
two reasons.
First, the non-standard suffix was necessary for fork-fuzzing.
Meson wraps all standard-suffixed libraries with --start-group and
--end-group. This made a fork-fuzz.ld linker script wrapped as well and
broke builds. Commit d2e6f9272d ("fuzz: remove fork-fuzzing
scaffolding") dropped fork-fuzzing so we can now restore the standard
suffix.
Second, the libraries are not even built anymore, because it is
possible to just use the object files directly via extract_all_objects().
The occurences of the suffix were detected and removed by performing
a tree-wide search with 'fa' and .fa (note the quotes and dot).
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-ID: <20240524-xkb-v4-4-2de564e5c859@daynix.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We used to request declare_dependency() to link_whole static libraries.
If a static library is a thin archive, GNU ld keeps all object files
referenced by the archive open, and sometimes exceeds the open file limit.
Another problem with link_whole is that suboptimal handling of nested
dependencies.
link_whole by itself does not propagate dependencies. In particular,
gnutls, a dependency of crypto, is not propagated to its users, and we
currently workaround the issue by declaring gnutls as a dependency for
each crypto user. On the other hand, if you write something like
libfoo = static_library('foo', 'foo.c', dependencies: gnutls)
foo = declare_dependency(link_whole: libfoo)
libbar = static_library('bar', 'bar.c', dependencies: foo)
bar = declare_dependency(link_whole: libbar, dependencies: foo)
executable('prog', sources: files('prog.c'), dependencies: [foo, bar])
hoping to propagate the gnutls dependency into bar.c, you'll see a
linking failure for "prog", because the foo.c.o object file is included in
libbar.a and therefore it is linked twice into "prog": once from libfoo.a
and once from libbar.a. Here Meson does not see the duplication, it
just asks the linker to link all of libfoo.a and libbar.a into "prog".
Instead of using link_whole, extract objects included in static libraries
and pass them to declare_dependency(); and then the dependencies can be
added as well so that they are propagated, because object files on the
linker command line are always deduplicated.
This requires Meson 1.1.0 or later.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-ID: <20240524-objects-v1-1-07cbbe96166b@daynix.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This reverts commit 45ccdbcb24.
The x86-64 instruction set can now be tuned down to x86-64 v1
or i386 Pentium Pro.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Simplify the logic for two-part, 32-bit pc-relative addresses.
Rather than assume all such fit in int32_t, do some arithmetic
and assert a result, do some arithmetic first and then check
to see if the pieces are in range.
Cc: qemu-stable@nongnu.org
Fixes: dacc51720d ("tcg/loongarch64: Implement tcg_out_mov and tcg_out_movi")
Reviewed-by: Song Gao <gaosong@loongson.cn>
Reported-by: Song Gao <gaosong@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Fixes a bug in the immediate shifts, because the exact
encoding depends on the element size.
Reviewed-by: Song Gao <gaosong@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Use TCG_VEC_TMP0 directly.
Reviewed-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Each element size has a different encoding, so code cannot
be shared in the same way as with tcg_out_dup_vec.
Reviewed-by: Song Gao <gaosong@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We can implement this with fld_d, fst_d for load and store,
and then use the normal v128 operations in registers.
This will improve support for guests which use v64.
Reviewed-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
QEMU now requires an x86-64-v2 host, which has the POPCNT instruction.
Use it freely in TCG-generated code.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
QEMU now requires an x86-64-v2 host, which always has CMOV.
Use it freely in TCG generated code.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>