TCG_TEMP_ANY has no different meaning than TCG_TEMP_UNDEF, so use
the later instead.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
movcond operation can be implemented on MIPS32 Release 2 using the MOVN,
MOVZ, SLT and SLTU instructions.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
deposit operations can be optimized on MIPS32 Release 2 using the INS
instruction.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
rotr operations can be optimized on MIPS32 Release 2 using the ROTR and
ROTRV instructions. Also implemented rotl operations by subtracting the
shift from 32.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
bswap operations can be optimized on MIPS32 Release 2 using the ROTR,
WSBH and SEH instructions. We can't use the non-R2 code to implement the
ops due to registers constraints, so don't define the corresponding
TCG_TARGET_HAS_bswap* values.
Also bswap16* operations are supposed to be called with the 16 high bits
zeroed. This is the case everywhere (including for TCG by definition)
except when called from the store helper. Remove the AND instructions from
bswap16* and move it there.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
MIPS has some conditional branch instructions when comparing with zero.
Use them.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Use stack instead of temp_buf array in CPUState for TCG
temps.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Instead of int, use the correct TCGArg and TCGReg type: TCGReg when
representing a TCG target register, TCGArg when representing the latter
or a constant.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Recent versions of GCC emit warnings when compiling user mode targets.
Kill them by reordering a bit the #ifdef.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The 'Z' constraint has been introduced to map the zero register. However
when the op also accept a constant, there is no point to accept the zero
register in addition.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The xtensa-test image generates a sra_i32 with count 0x40.
Whether this is accident of tcg constant propagation or
originating directly from the instruction stream is immaterial.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Don't use -ffixed-gN. Don't link statically. Don't save/restore
AREG0 around calls. Don't allocate space on the stack for AREG0 save.
Signed-off-by: Richard Henderson <rth@twiddle.net>
At the same time, split out the tlb load logic to a new function.
Fixes the cases of two data registers and two address registers.
Fixes the signature of, and adds missing, qemu_ld/st opcodes.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Current code doesn't actually work in 32-bit mode at all. Since
no one really noticed, drop the complication of v7 and v8 cpus.
Eliminate the --sparc_cpu configure option and standardize macro
testing on TCG_TARGET_REG_BITS / HOST_LONG_BITS
Signed-off-by: Richard Henderson <rth@twiddle.net>
The CONFIG_TCG_PASS_AREG0 code for calling ld/st helpers
was not respecting the ABI requirement for 64-bit values
being aligned in registers.
Mirror the ARM port in use of helper functions to marshal
arguments into the correct registers.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Neither of these functions were performing double-word
compares properly.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Commit 6375e09e changed the type of TranslationBlock.tb_next,
but failed to change the type of TCGContext.tb_next.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
While swapping constants to the second operand, swap
sources matching destinations to the first operand.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Implemented with setcond if the target does not provide
the optional opcode.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Commit e31b0a7c05 fixed copy propagation on
32-bit host by restricting the copy between different types. This was the
wrong fix.
The real problem is that the all temps states should be reset at the end
of a basic block. This was done by adding such operations in the switch,
but brcond2 was forgotten (that's why the crash was only observed on 32-bit
hosts).
Fix that by looking at the TCG_OPF_BB_END instead. We need to keep the case
for op_set_label as temps might be modified through another path.
Cc: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Given the copy propagation breakage on 32-bit hosts has been fixed
commit e31b0a7c05 can be reverted.
Cc: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
set_label is effectively the end of a basic block, as no optimization
can be made accross it. It was treated as such in the liveness analysis
code, but as a special case.
Mark it with TCG_OPF_BB_END flag so that this information can be used
by other parts of the TCG code, and remove the special case in the liveness
analysis code.
Cc: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
On x86, it is possible to move a constant value to memory. Add code to
handle a constant argument to load/store ops.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Now that CONFIG_TCG_PASS_AREG0 is enabled for all targets,
remove dead code and support for !CONFIG_TCG_PASS_AREG0 case.
Remove dyngen-exec.h and all references to it. Although included by
hw/spapr_hcall.c, it does not seem to use it.
Remove unused HELPER_CFLAGS.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
optimizer.c contains some cases were the break is appearing in both the
if and the else parts. Fix that by moving it to the outer part. Also
move some common code there.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
brcond and setcond ops are not commutative, but it's easy to compute the
new condition after swapping the arguments. Try to always put the constant
argument in second position like for commutative ops, to help backends to
generate better code.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Split expression simplification in multiple parts so that a given op
can appear multiple times. This patch should not change anything.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Now that there are two passes of optimization (optimize.c, liveness)
there is no point of outputing the statistics of the liveness part
only. Update the code to take into account both optimizations.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The load/store slow path has been broken in e141ab52d:
- We need to move 4 registers for store functions and 3 registers for
load functions and not the reverse.
- According to the s390x calling convention the arguments of a function
should be zero extended. This means that the register shift should be
done with TCG_TYPE_I64 to ensure the higher word is correctly zero
extended when needed.
I am aware that CONFIG_TCG_PASS_AREG0 is being removed and thus that
this patch can be improved, but doing so means it can also be applied to
the 1.1 and 1.2 stable branches.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Alexander Graf <agraf@suse.de>
The CONFIG_TCG_PASS_AREG0 code for calling ld/st helpers was
broken in that it did not respect the ABI requirement that 64
bit values were passed in even-odd register pairs. The simplest
way to fix this is to implement some new utility functions
for marshalling function arguments into the correct registers
and stack, so that the code which sets up the address and
data arguments does not need to care whether there has been
a preceding env argument.
Based on commit 9716ef3b for ARM by Peter Maydell.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Store slow path has been broken in e141ab52d:
- the arguments are shifted before the last one (mem_index) is written.
- the shift is done for both slow and fast paths.
Fix that. Also optimize a bit by bundling the move together. This still
can be optimized, but it's better to wait for a decision to be taken on
the arguments order.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The CONFIG_TCG_PASS_AREG0 code for calling ld/st helpers was
broken in that it did not respect the ABI requirement that 64
bit values were passed in even-odd register pairs. The simplest
way to fix this is to implement some new utility functions
for marshalling function arguments into the correct registers
and stack, so that the code which sets up the address and
data arguments does not need to care whether there has been
a preceding env argument.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
If tci_out_label is called in the context of tcg_gen_code_search_pc, we
could be overwriting an already patched relocation with zero -- and not
repatch it because the set_label is past search_pc, causing a QEMU crash
when it tries to branch to a zero label.
Not writing anything to the relocation area seems to be in line with what
other backends do from the couple I looked at (x86, ppc).
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Commit eeacee4d86 changed the syntax of tcg_dump_ops, but didn't convert
all users (notably missing the ppc ones) to it. Fix them to the new syntax.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: malc <av1474@comtv.ru>
Don't use global variables directly but via accessor functions. Rename globals.
Convert macros to functions, add GCC format attributes.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
powerpc-apple-darwin9-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5577)
does not define _CALL_DARWIN, leading to unexpected behavior w.r.t.
register clobbering and stack frame layout.
Since _CALL_DARWIN is a reserved identifier, define a custom
TCG_TARGET_CALL_DARWIN based on either _CALL_DARWIN or __APPLE__.
Signed-off-by: Andreas F?rber <andreas.faerber@web.de>
Signed-off-by: malc <av1474@comtv.ru>
In qemu_ld/st load the registers for the helper calls directly rather
than rotating them around afterwards for AREG0.
Also clobber the additional register.
Signed-off-by: Andreas F?rber <afaerber@suse.de>
Signed-off-by: malc <av1474@comtv.ru>
Adjust the tcg_out_qemu_{ld,st}() slow paths to pass AREG0 in r3,
based on patches by malc.
Also adjust the registers clobbered, based on patch by Alex.
Signed-off-by: Andreas Färber <afaerber@suse.de>
Acked-by: Alexander Graf <agraf@suse.de>
[AF: Do not hardcode r3 for AREG0, requested by Alex]
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This accounts for the additional addr_reg2 register.
Signed-off-by: Andreas Färber <afaerber@suse.de>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Also assure i64 alignment where necessary.
Alignment code optimization suggested by malc.
Signed-off-by: Andreas Färber <afaerber@suse.de>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
For targets where TARGET_LONG_BITS != 32, i.e. 64-bit guests,
addr_reg is moved to r4. For hosts without TCG_TARGET_CALL_ALIGN_ARGS
either data_reg2 or data_reg or a masked version thereof would overwrite
r4. Place it in r5 instead, matching TCG_TARGET_CALL_ALIGN_ARGS hosts.
This fixes immediate crashes of 64-bit guests observed on Darwin/ppc but
not on Darwin/ppc64.
Signed-off-by: Andreas Färber <andreas.faerber@web.de>
Acked-by: malc <av1474@comtv.ru>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
w64 uses the registers rcx, rdx, r8 and r9 for function arguments,
so it needs a different declaration of tcg_target_call_iarg_regs.
rax, rcx, rdx, r8, r9, r10 and r11 may be changed by function calls.
rbx, rbp, rdi, rsi, r12, r13, r14 and r15 remain unchanged by function calls.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Not all i386 / x86_64 hosts use ELF.
Ask the compiler whether ELF is used.
On w64, gdb crashes when ELF_HOST_MACHINE is defined.
Cc: Blue Swirl <blauwirbel@gmail.com>
Acked-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
There two entries of INDEX_op_ld_i64 in the ppc_op_defs. That causes an
assertion failure in tcg_add_target_add_op_defs() when --enable-debug is
used on a ppc64 backend (that's ppc64 host, not target).
Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: malc <av1474@comtv.ru>
This allows us to actually supply a function name in softmmu builds;
gdb doesn't pick up the minimal symbol table otherwise. Also add a
bit of documentation and statically generate more of the ELF image.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
This allows us to generate unwind info for the dynamicly generated
code in the code_gen_buffer. Only i386 is converted at this point.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Optionally, make memory access helpers take a parameter for CPUState
instead of relying on global env.
On most targets, perform simple moves to reorder registers. On i386,
switch from regparm(3) calling convention to standard stack-based
version.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Use stack based calling convention (GCC default) for interfacing with
generated code instead of register based convention (regparm(3)).
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
next_tb is the numeric value of a tcg target (= QEMU host) address.
Using tcg_target_ulong instead of unsigned long shows this and makes
the code portable for hosts with an unusual size of long (w64).
The type cast '(long)(next_tb & ~3)' was not needed (casting
unsigned long to long does not change the bits, and nor does
casting long to pointer for most (= all non w64) hosts.
It is removed here.
Macro or function tcg_qemu_tb_exec is used to set next_tb.
The function also returns next_tb. Therefore tcg_qemu_tb_exec
must return a tcg_target_ulong.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
An attempt to allocate a large memory chunk after a small one resulted in
circular links in list of pools. It caused the same memory being
allocated twice for different arrays.
Now pools for large memory chunks are kept in separate list and are
freed during pool reset because current allocator can not reuse them.
Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Scripted conversion:
for file in *.[hc] hw/*.[hc] hw/kvm/*.[hc] linux-user/*.[hc] linux-user/m68k/*.[hc] bsd-user/*.[hc] darwin-user/*.[hc] tcg/*/*.[hc] target-*/cpu.h; do
sed -i "s/CPUState/CPUArchState/g" $file
done
All occurrences of CPUArchState are expected to be replaced by QOM CPUState,
once all targets are QOM'ified and common fields have been extracted.
Signed-off-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
tcg_out_label is always called with a third argument of pointer type
which was casted to tcg_target_long.
These casts can be avoided by changing the prototype of tcg_out_label.
There was also a cast to long. For most hosts with
sizeof(long) == sizeof(tcg_target_long) == sizeof(void *) this did not
matter, but for w64 it was wrong. This is fixed now.
Cc: Blue Swirl <blauwirbel@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
The TCG targets i386 and tci needed a change of the function
prototype for w64.
This change is currently not needed for the other TCG targets,
but it can be applied to avoid code differences.
Cc: Blue Swirl <blauwirbel@gmail.com>
Cc: Andrzej Zaborowski <balrogg@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: Alexander Graf <agraf@suse.de>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
flush_icache_range takes two address parameters which must be large
enough to address any address of the host.
For hosts with sizeof(unsigned long) == sizeof(void *), this patch
changes nothing. All currently supported hosts fall into this category.
For w64 hosts, sizeof(unsigned long) is 4 while sizeof(void *) is 8,
so the use of tcg_target_ulong is needed for i386 and tci (the tcg
targets which work with w64).
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
This change makes tcg_target_ulong available in tcg-target.h.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
The standard include files are already included in qemu-common.h.
malloc.h and alloca.h were needed for alloca() which was removed
from TCG code some years ago when switching from dyngen to TCG
(see commit 49516bc0d6).
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
ARM still doesn't support 16GB buffers in 32-bit modes, replace the
16GB by 16MB in the comment.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
On ARM, in Thumb mode r7 is used for the framepointer; this meant
that we would fail to compile in debug mode because we were using r7
for TCG_AREG0. Shift to r6 instead to avoid this clash.
(Bug reported as LP:870990.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Andrzej Zaborowski <andrew.zaborowski@intel.com>
On ARM, don't map the code buffer at a fixed location, and fix up the
call/goto tcg routines to let it do long jumps.
Mapping the code buffer at a fixed address could sometimes result in it being
mapped over the top of the heap with pretty random results.
Signed-off-by: Dr. David Alan Gilbert <david.gilbert@linaro.org>
Signed-off-by: Andrzej Zaborowski <andrew.zaborowski@intel.com>
Make tcg_const_ptr() include a cast so that you can pass it a
pointer. This allows us to drop the casts we had in all the places
that use this macro.
Acked-by: Andreas Färber <andreas.faerber@web.de>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
TCG_TARGET_REG_BITS is declared in tcg.h for all TCG targets.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
This is standard for other tcg targets and improves tci, too.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
In both cases, val is computed, but then not used in the
subsequent line, which then re-computes the quantity in
a different type (int32_t vs unsigned long).
Keep the computation type that's been working so far.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
* 's390-1.0' of git://repo.or.cz/qemu/agraf:
s390x: initialize virtio dev region
tcg: Use TCGReg for standard tcg-target entry points.
tcg: Standardize on TCGReg as the enum for hard registers
s390x: Add shutdown for TCG s390-virtio machine
s390: Fix cpu shutdown for KVM
s390: fix short kernel command lines
s390: fix reset hypercall to reset the status
s390x: implement SIGP restart and shutdown
s390x: implement rrbe instruction properly
s390x: update R and C bits in storage key
s390x: make ipte 31-bit aware
s390x: add ldeb instruction
Including tcg_out_ld, tcg_out_st, tcg_out_mov, tcg_out_movi.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
Most targets did not name the enum; tci used TCGRegister.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
tcg/ppc64/tcg-target.c has a couple of places where variables are set
unconditionally, but otherwise used only for softmmu builds, not
userspace only builds. This causes compiler warnings (which are fatal
by default) when compiling for a ppc64 host with gcc 4.6. This patch
fixes the problem by moving the code which defines and sets the
variables into the CONFIG_SOFTMMU guarded regions.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
* 'tci' of git://qemu.weilnetz.de/qemu:
tcg: Add tcg interpreter to configure / make
tcg: Add tci disassembler
tcg: Add interpreter for bytecode
tcg: Add bytecode generator for tcg interpreter
tcg: Make ARRAY_SIZE(tcg_op_defs) globally available
tcg: TCG targets may define tcg_qemu_tb_exec
The error being caused by the failure to copy the other half of
the input to the output after having narrowed the deposit operation.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: malc <av1474@comtv.ru>
Unlike other tcg target code generators, this one does not generate
machine code for some cpu. It generates machine independent bytecode
which is interpreted later.
This allows running QEMU on any host.
Interpreted bytecode is slower than direct execution of generated
machine code.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
tcg_op_defs was already a global array.
The tci disassembler also needs ARRAY_SIZE(tcg_op_defs),
so add a new global constant with this value.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Targets may use a non standard definition of tcg_tb_exec
by defining this macro in their tcg_target.h.
This is used here by ppc. It will be used by the TCG interpreter, too.
Cc: malc <av1474@comtv.ru>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
If the deposit replaces the entire word, optimize to a move.
If we're inserting to the top of the word, avoid the mask of arg2
as we'll be shifting out all of the garbage and shifting in zeros.
If the host is 32-bit, reduce a 64-bit deposit to a 32-bit deposit
when possible.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>