TCG Interpreter (TCI) - Copyright (c) 2011 Stefan Weil.

This file is released under the BSD license.

1) Introduction

TCG (Tiny Code Generator) is a code generator which translates
code fragments ("basic blocks") from target code (any of the
targets supported by QEMU) to a code representation which
can be run on a host.

QEMU can create native code for some hosts (arm, i386, ia64, ppc, ppc64,
s390, sparc, x86_64). For others, unofficial host support was written.

By adding a code generator for a virtual machine and using an
interpreter for the generated bytecode, it is possible to
support (almost) any host.

This is what TCI (Tiny Code Interpreter) does.

2) Implementation

Like each TCG host backend, TCI implements the code generator in
tcg-target.c.inc and tcg-target.h. Both files are in directory tcg/tci.

The additional file tcg/tci.c adds the interpreter and disassembler.

The bytecode consists of opcodes (with only a few exceptions, with
the same numeric values and semantics as used by TCG), and up
to six arguments packed into a 32-bit integer. See comments in tci.c
for details on the encoding.
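
As a rough illustration of such an encoding, the following Python
sketch packs an opcode and up to six 4-bit register numbers into one
32-bit word. The field layout (8-bit opcode in the low bits, six 4-bit
slots above it) and the names encode_insn and decode_insn are
assumptions made for this example; the comments in tci.c remain the
reference for the real encoding.

```python
OPCODE_BITS = 8   # assumed width of the opcode field
SLOT_BITS = 4     # assumed width of one argument slot
MAX_SLOTS = 6     # six slots fill the remaining 24 bits


def encode_insn(opcode, regs):
    """Pack an opcode and up to six register numbers into a 32-bit word."""
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode does not fit in 8 bits")
    if len(regs) > MAX_SLOTS:
        raise ValueError("at most six arguments fit in one word")
    insn = opcode
    for i, reg in enumerate(regs):
        if not 0 <= reg < (1 << SLOT_BITS):
            raise ValueError("register number does not fit in 4 bits")
        # Slot i sits directly above the opcode and the previous slots.
        insn |= reg << (OPCODE_BITS + SLOT_BITS * i)
    return insn


def decode_insn(insn, nargs):
    """Unpack the opcode and the first nargs register numbers."""
    opcode = insn & ((1 << OPCODE_BITS) - 1)
    regs = [(insn >> (OPCODE_BITS + SLOT_BITS * i)) & ((1 << SLOT_BITS) - 1)
            for i in range(nargs)]
    return opcode, regs
```

With this layout, encode_insn(0x12, [1, 2, 3]) yields 0x32112, and
decode_insn(0x32112, 3) returns (0x12, [1, 2, 3]).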

3) Usage

For hosts without native TCG, the interpreter TCI must be enabled by

        configure --enable-tcg-interpreter

If configure is called without --enable-tcg-interpreter, it will
suggest using this option. Setting it automatically would need
additional code in configure which would have to be updated whenever
a new native TCG implementation is added.

For hosts with native TCG, the interpreter TCI can be enabled by

        configure --enable-tcg-interpreter

The only difference between running QEMU with TCI and running it
without TCI should be speed. Especially during development of TCI, it
was very useful to compare runs with and without TCI. Create
/tmp/qemu.log by

        qemu-system-i386 -d in_asm,op_opt,cpu -D /tmp/qemu.log -accel tcg,one-insn-per-tb=on

once with the interpreter and once without it (saving the first log
under a different name), and compare the resulting qemu.log files.
This is also useful to see the effects of additional registers or
additional opcodes (it is easy to modify the virtual machine).
It can also be used to verify native TCGs.
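
Comparing the two logs can be automated. The Python sketch below
reports the first line at which two log files differ. The helper names
first_difference and compare_logs, and any file names passed to them
(for example /tmp/qemu-tci.log and /tmp/qemu-native.log), are
hypothetical and not part of QEMU.

```python
from itertools import zip_longest


def first_difference(lines_a, lines_b):
    """Return (line_number, a, b) for the first differing line, or None.

    If one input is shorter, its missing line is reported as None.
    """
    pairs = zip_longest(lines_a, lines_b)
    for number, (a, b) in enumerate(pairs, start=1):
        if a != b:
            return number, a, b
    return None


def compare_logs(path_a, path_b):
    """Compare two log files line by line without loading them fully."""
    with open(path_a, errors="replace") as log_a, \
            open(path_b, errors="replace") as log_b:
        return first_difference(log_a, log_b)
```

A result of None means both logs are identical; otherwise the returned
line number shows where the two runs first diverge.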

Hosts with native TCG can also enable TCI by claiming to be unsupported:

        configure --cpu=unknown --enable-tcg-interpreter

configure then no longer uses the native linker script (*.ld) for
user mode emulation.

4) Status

TCI needs special implementations for 32 and 64 bit hosts, 32 and 64 bit
targets, and hosts and targets with the same or different endianness.

            | host (le)                     host (be)
            | 32             64             32             64
------------+------------------------------------------------------------
target (le) | s0, u0         s1, u1         s?, u?         s?, u?
32 bit      |
            |
target (le) | sc, uc         s1, u1         s?, u?         s?, u?
64 bit      |
            |
target (be) | sc, u0         sc, uc         s?, u?         s?, u?
32 bit      |
            |
target (be) | sc, uc         sc, uc         s?, u?         s?, u?
64 bit      |

System emulation
s? = untested
sc = compiles
s0 = bios works
s1 = grub works
s2 = Linux boots

Linux user mode emulation
u? = untested
uc = compiles
u0 = static hello works
u1 = linux-user-test works

5) Todo list

* TCI is not widely tested. It was written and tested on an x86_64 host
  running i386 and x86_64 system emulation and Linux user mode.
  A cross-compiled QEMU for i386 host also works with the same basic tests.
  A cross-compiled QEMU for mipsel host works, too. It is terribly slow
  because I run it in a mips malta emulation, so it is an interpreted
  emulation in an emulation.
  A cross-compiled QEMU for arm host works (tested with pc bios).
  A cross-compiled QEMU for ppc host works at least partially:
  i386-linux-user/qemu-i386 can run a simple hello-world program
  (tested in a ppc emulation).

* Some TCG opcodes are missing in the code generator, in the
  interpreter, or in both. These opcodes raise a runtime exception, so
  it is possible to see where code must be added.

* It might be useful to have a runtime option which selects the native TCG
  or TCI, so QEMU would have to include two TCGs. Today, selecting TCI
  is a configure option, so you need two compilations of QEMU.
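
The second item above relies on unimplemented opcodes failing loudly at
run time. The following Python sketch shows that idea with a toy
interpreter. The opcode names (movi, add, exit), the tuple-based
bytecode and the UnimplementedOpcode exception are made up for this
example; they are not the real TCG opcodes or the C code in tci.c.

```python
class UnimplementedOpcode(RuntimeError):
    """Raised when the interpreter meets an opcode it cannot execute."""


def run(bytecode, nregs=16):
    """Execute a list of (opcode, arg...) tuples on a small register file."""
    regs = [0] * nregs
    for op, *args in bytecode:
        if op == "movi":        # movi dst, immediate
            regs[args[0]] = args[1]
        elif op == "add":       # add dst, src1, src2 (32-bit wraparound)
            regs[args[0]] = (regs[args[1]] + regs[args[2]]) & 0xFFFFFFFF
        elif op == "exit":      # exit src: stop and return a register
            return regs[args[0]]
        else:
            # Anything not handled above is reported instead of ignored,
            # so the missing opcode is easy to locate.
            raise UnimplementedOpcode(f"missing opcode {op!r}")
    return None
```

Running run([("movi", 0, 2), ("movi", 1, 3), ("add", 2, 0, 1),
("exit", 2)]) returns 5, while a program containing an unknown opcode
such as ("mul", 0, 0, 0) raises UnimplementedOpcode.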