Every store would always cause the tb_invalidate_phys_page_fast path to be invoked,
amounting to a 40x slowdown of stores compared to loads.
Change this code to only worry about TB invalidation for regions marked as
executable (i.e. emulated executable).
Even without uc_set_native_thunks, this change fixes most of the performance
issues seen with thunking to native calls.
Signed-off-by: Andrei Warkentin <andrei.warkentin@intel.com>
The wfi exception trigger behavior should take into account user mode,
hstatus.vtw, and the fact the an wfi might raise different types of
exceptions depending on various factors:
If supervisor mode is not present:
- an illegal instruction exception should be generated if user mode
executes and wfi instruction and mstatus.tw = 1.
If supervisor mode is present:
- when a wfi instruction is executed, an illegal exception should be triggered
if either the current mode is user or the mode is supervisor and mstatus.tw is
set.
Plus, if the hypervisor extensions are enabled:
- a virtual instruction exception should be raised when a wfi is executed from
virtual-user or virtual-supervisor and hstatus.vtw is set.
Signed-off-by: Jose Martins <josemartins90@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20210420213656.85148-1-josemartins90@gmail.com
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
This line appears to be trying to undo the effect of adding 4 to pc above,
but does so incorrectly and so ends up returning with next_pc earlier than
it was prior to decoding.
This causes the translator to malfunction because it does not expect
pc_next to decrease during decoding: this is effectively reporting that
the invalid construction has a negative size, which is impossible. The
decoder uses the increase in next_pc to decide the translation block size,
but converts it to uint16_t thereby causing a block containing _only_ an
invalid instruction to be treated as having size 65532 (reinterpreted -4)
and therefore the translation loop tries to find the next translation block
at 65532 bytes after the invalid instruction, which can cause a spurious
instruction access/page fault if the page containing that address is not
mapped as executable.
In practice we don't need to readjust the pc at all here because it is
correct to report that the invalid instruction is four bytes long. This
allows the translation loop to correctly find the next instruction, and
to avoid producing spurious TLB fills that might cause incorrect exceptions.
* Add a quick test helper macro to test_x86.c
* Add regression tests for bswap and rex prefixes
* Properly ignore REX prefixes when appropriate
* Fix bswap ax emulator detection