Every store would always cause the tb_invalidate_phys_page_fast path to be invoked,
amounting to a 40x slowdown of stores compared to loads.
Change this code to only worry about TB invalidation for regions marked as
executable (i.e. emulated executable).
Even without uc_set_native_thunks, this change fixes most of the performance
issues seen with thunking to native calls.
Signed-off-by: Andrei Warkentin <andrei.warkentin@intel.com>
The wfi exception trigger behavior should take into account user mode,
hstatus.vtw, and the fact the an wfi might raise different types of
exceptions depending on various factors:
If supervisor mode is not present:
- an illegal instruction exception should be generated if user mode
executes and wfi instruction and mstatus.tw = 1.
If supervisor mode is present:
- when a wfi instruction is executed, an illegal exception should be triggered
if either the current mode is user or the mode is supervisor and mstatus.tw is
set.
Plus, if the hypervisor extensions are enabled:
- a virtual instruction exception should be raised when a wfi is executed from
virtual-user or virtual-supervisor and hstatus.vtw is set.
Signed-off-by: Jose Martins <josemartins90@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20210420213656.85148-1-josemartins90@gmail.com
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
This line appears to be trying to undo the effect of adding 4 to pc above,
but does so incorrectly and so ends up returning with next_pc earlier than
it was prior to decoding.
This causes the translator to malfunction because it does not expect
pc_next to decrease during decoding: this is effectively reporting that
the invalid construction has a negative size, which is impossible. The
decoder uses the increase in next_pc to decide the translation block size,
but converts it to uint16_t thereby causing a block containing _only_ an
invalid instruction to be treated as having size 65532 (reinterpreted -4)
and therefore the translation loop tries to find the next translation block
at 65532 bytes after the invalid instruction, which can cause a spurious
instruction access/page fault if the page containing that address is not
mapped as executable.
In practice we don't need to readjust the pc at all here because it is
correct to report that the invalid instruction is four bytes long. This
allows the translation loop to correctly find the next instruction, and
to avoid producing spurious TLB fills that might cause incorrect exceptions.
* Add a quick test helper macro to test_x86.c
* Add regression tests for bswap and rex prefixes
* Properly ignore REX prefixes when appropriate
* Fix bswap ax emulator detection
tcg: Fix do_nonatomic_op_* vs signed operations
The smin/smax/umin/umax operations require the operands to be
properly sign extended. Do not drop the MO_SIGN bit from the
load, and additionally extend the val input.
Access to TB, DEC registers was lead to crash
spr_read_decr and others are changed to spr_read_generic
spr_write_decr and others are changed to spr_write_generic
target/i386: fix operand order for PDEP and PEXT
For PDEP and PEXT, the mask is provided in the memory (mod+r/m)
operand, and therefore is loaded in s->T0 by gen_ldst_modrm.
The source is provided in the second source operand (VEX.vvvv)
and therefore is loaded in s->T1. Fix the order in which
they are passed to the helpers.
The ram_offset allocator searches the smalest gap in the ram_offset address space.
This is slow especialy in combination with many allocation (i.e. snapshots). When
it is known that there is no gap, this is now optimized.
Uses Copy on Write to make it posible to restore the memory state after a snapshot
was made. To restore all MemoryRegions created after the snapshot are removed.
target/i386: Verify memory operand for lcall and ljmp
These two opcodes only allow a memory operand.
Lacking the check for a register operand, we used the A0 temp
without initialization, which led to a tcg abort.