Move return address calculation and WINDOW_START adjustment out of the
retw helper to simplify logic a bit and avoid using registers directly.
Pass a0 as a parameter to the helper.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Opcodes that modify WINDOW_BASE SR have no dependency on opcodes that
use windowed registers. If such opcodes are combined in a single
instruction they may not be correctly ordered. Instead of adding said
dependency, use a temporary register to store the changed WINDOW_BASE
value and do the actual register window rotation as a postprocessing
step.
Not all opcodes that change WINDOW_BASE need this: retw, rfwo and rfwu
are also jump opcodes, so they are guaranteed to be translated last and
thus will not affect other opcodes in the same instruction.
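A minimal sketch of the deferred rotation idea, not the actual
translator code. The Core structure, the function names and the
64-entry physical register file are assumptions made for illustration
only:

```c
#include <stdbool.h>
#include <stdint.h>

#define NPHYS 64u /* assumed number of physical address registers */

/* Hypothetical core state, for illustration only. */
typedef struct {
    uint32_t phys[NPHYS];
    uint32_t window_base;      /* window currently in effect */
    uint32_t new_window_base;  /* temporary holding the changed value */
    bool window_base_changed;
} Core;

/* Read windowed register aN through the window currently in effect. */
static inline uint32_t read_ar(const Core *c, unsigned n)
{
    return c->phys[(c->window_base * 4 + n) % NPHYS];
}

/* An opcode changing WINDOW_BASE only records the new value. */
static inline void write_window_base(Core *c, uint32_t v)
{
    c->new_window_base = v % (NPHYS / 4);
    c->window_base_changed = true;
}

/* Postprocessing step: rotate once all opcodes have been handled. */
static inline void end_of_instruction(Core *c)
{
    if (c->window_base_changed) {
        c->window_base = c->new_window_base;
        c->window_base_changed = false;
    }
}
```

With this shape, any opcode of the same instruction that reads a
windowed register sees the old window regardless of the order in which
the opcodes are processed.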
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
INTERRUPT special register may be changed both by the core (by writing
to INTSET and INTCLEAR registers) and by external events (by triggering
and clearing HW IRQs). In MTTCG this state must be protected from
concurrent access, otherwise interrupts may be lost or spurious
interrupts may be detected.
Use atomic operations to change INTSET SR.
Fix wsr.intset so that it doesn't clear any bits.
Fix wsr.intclear so that it doesn't clear the bit that corresponds to
NMI.
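A rough sketch of the intended behaviour using C11 atomics. QEMU has
its own atomic helpers; the IntState type, the function names and the
nmi_mask parameter below are hypothetical:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the INTERRUPT special register. */
typedef struct {
    _Atomic uint32_t interrupt;
} IntState;

/* wsr.intset: set the requested bits, never clear any. */
static inline void int_set(IntState *s, uint32_t v)
{
    atomic_fetch_or(&s->interrupt, v);
}

/* wsr.intclear: clear the requested bits, but leave the NMI bit alone. */
static inline void int_clear(IntState *s, uint32_t v, uint32_t nmi_mask)
{
    atomic_fetch_and(&s->interrupt, ~(v & ~nmi_mask));
}
```

Because each update is a single atomic read-modify-write, a concurrent
change made on behalf of a HW IRQ line cannot be lost between the load
and the store.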
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Don't invalidate TB with the end of zero overhead loop when LBEG or LEND
change. Instead encode the distance from the start of the page where the
TB starts to the LEND in the TB cs_base and generate loopback code when
the next PC matches encoded LEND. Distance to a destination within the
same page and up to a maximum instruction length into the next page is
encoded literally, otherwise it's zero. The distance from LEND to LBEG
is also encoded in the cs_base: it's encoded literally when less than
256 or as 0 otherwise. This allows for TB chaining for the loopback
branch at the end of a loop for the most common loop sizes.
With this change the resulting emulation speed is about 10% higher in
softmmu mode on uClibc-ng and LTP tests. Emulation speed in linux
user mode is a few percent lower because there's no direct TB chaining
between different memory pages. Testing with a lower limit on the direct
TB chaining range shows a gradual slowdown, to ~15% for the block size
of 64 bytes and ~50% for the block size of 32 bytes.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
- mark retw and retw.n instructions;
- extract window underflow test from retw helper;
- put underflow exception check generation right after the overflow
check;
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
- TB flags: add XTENSA_TBFLAG_CWOE that corresponds to the architectural
CWOE state;
- entry: move CWOE check from the helper to the test_ill_entry;
- retw: move CWOE check from the helper to the test_ill_retw;
- separate instruction disassembly loop and translation loop; save
disassembly results in local array;
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Import list of syscalls from the kernel source. Conditionalize code/data
that is only used with softmmu. Implement exception handlers. Implement
signal handler (only the core registers for now, no coprocessors or TIE).
Cc: Riku Voipio <riku.voipio@iki.fi>
Cc: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
RER and WER are privileged instructions for accessing external
registers. External register address space is local to the processor
core. There are no alignment requirements; addressable units are 32-bit
wide registers.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Xtensa cores may have a register (CCOUNT) that counts core clock cycles.
They may also have a number of registers (CCOMPAREx); when the CCOUNT
value passes the value of CCOMPAREx, timer interrupt x is raised.
Currently the xtensa target counts the number of completed instructions
and assumes that for CCOUNT one instruction takes one cycle to complete.
It calls a helper function to update the CCOUNT register at every TB end
and raise timer interrupts. This scheme works very predictably and
doesn't have noticeable performance impact, but it is hard to use with
multiple synchronized processors, especially with the coming MTTCG.
Derive CCOUNT from the virtual simulation time, QEMU_CLOCK_VIRTUAL.
Use native QEMU timers for CCOMPARE timers, one timer for each register.
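The arithmetic behind this can be sketched as below. The function
names and the freq_hz parameter are hypothetical, and the time value is
assumed to be a nanosecond count such as the one QEMU_CLOCK_VIRTUAL
provides:

```c
#include <stdint.h>

#define NS_PER_SEC 1000000000ull

/* CCOUNT is the low 32 bits of the cycles elapsed at freq_hz.
 * The split by seconds keeps the intermediate product in 64 bits. */
static inline uint32_t ccount_from_ns(uint64_t ns, uint32_t freq_hz)
{
    uint64_t cycles = (ns / NS_PER_SEC) * freq_hz +
                      (ns % NS_PER_SEC) * freq_hz / NS_PER_SEC;

    return (uint32_t)cycles;
}

/* Nanoseconds until CCOUNT reaches ccompare; the subtraction wraps
 * modulo 2^32 like the register does. Returns 0 when they are equal. */
static inline uint64_t ns_until_ccompare(uint32_t ccount, uint32_t ccompare,
                                         uint32_t freq_hz)
{
    uint32_t delta = ccompare - ccount;

    return (uint64_t)delta * NS_PER_SEC / freq_hz;
}
```

A per-register timer would then be armed at the current virtual time
plus the value returned by ns_until_ccompare() and rearmed whenever
CCOUNT or the corresponding CCOMPAREx is written.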
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
We've currently got 18 architectures in QEMU, and thus 18 target-xxx
folders in the root folder of the QEMU source tree. More architectures
(e.g. RISC-V, AVR) are likely to be included soon, too, so the main
folder of the QEMU sources slowly gets quite overcrowded with the
target-xxx folders.
To disburden the main folder a little bit, let's move the target-xxx
folders into a dedicated target/ folder, so that target-xxx/ simply
becomes target/xxx/ instead.
Acked-by: Laurent Vivier <laurent@vivier.eu> [m68k part]
Acked-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de> [tricore part]
Acked-by: Michael Walle <michael@walle.cc> [lm32 part]
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> [s390x part]
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> [s390x part]
Acked-by: Eduardo Habkost <ehabkost@redhat.com> [i386 part]
Acked-by: Artyom Tarasenko <atar4qemu@gmail.com> [sparc part]
Acked-by: Richard Henderson <rth@twiddle.net> [alpha part]
Acked-by: Max Filippov <jcmvbkbc@gmail.com> [xtensa part]
Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [ppc part]
Acked-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> [cris&microblaze part]
Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn> [unicore32 part]
Signed-off-by: Thomas Huth <thuth@redhat.com>