When offsetted addresses of global non-static data are computed
multiple times in the same statement, the x86_64 backend uses
gen_gotpcrel with an offset, which implements an add insn on the
register given. load() uses the R member of the to-be-loaded
value, which doesn't yet have a reg assigned in all cases.
So use the register we're supposed to load the value into as
that register.
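
A hypothetical reproducer for the kind of statement described above, assuming the problem shows up when two offsetted addresses of the same global appear in one expression. The names table, sum_offsets and sum_offsets_check are illustrative and not taken from the tcc test suite.

```c
#include <assert.h>

/* Global, non-static data: per the message above, the x86_64 backend
   computes offsetted addresses of such data via gen_gotpcrel. */
int table[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };

/* Two offsetted addresses of the same global in a single statement. */
int sum_offsets(void)
{
    return *(&table[2]) + *(&table[5]);
}

/* Small self-check: returns 0 when the expected sum is produced. */
int sum_offsets_check(void)
{
    assert(sum_offsets() == 3 + 6);
    return 0;
}
```

Calling sum_offsets_check() from main() is enough to exercise the statement.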
The first loop setting up struct arguments must not remove
elements from the vstack (via vtop--), as gen_reg needs them to
potentially evict some argument still held in registers to stack.
Swapping the arg in question to top (and back to its place) also
simplifies the vstore call itself, as no funny save/restore
of some "non-existing" stack elements needs to be done.
Generally, for a stack, a vtop-- operation conceptually clobbers
that element, so further references to it aren't allowed anymore.
See also commit 9527c4949f
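
A sketch of a call that mixes register arguments with by-value struct arguments, which is the situation the argument setup loop has to handle. It assumes the System V x86_64 ABI, where a 24-byte struct is passed in memory; all names are hypothetical and this is not the test used in the commit.

```c
#include <assert.h>

/* 24 bytes: larger than 16, so passed in memory under the SysV x86_64 ABI. */
struct triple { long a, b, c; };

/* Scalar arguments travel in registers, struct arguments on the stack. */
static long take(long x, struct triple p, long y, struct triple q)
{
    return x + p.a + p.b + p.c + y + q.a + q.b + q.c;
}

/* The scalar arguments are computed expressions, so they may still be
   held in registers while the struct arguments are being stored. */
long call_with_structs(long seed)
{
    struct triple p = { seed, seed + 1, seed + 2 };
    struct triple q = { 10, 20, 30 };
    return take(seed * 2, p, seed * 3, q);
}

/* Small self-check: returns 0 when all arguments arrive intact. */
int call_with_structs_check(void)
{
    assert(call_with_structs(1) == 2 + 6 + 3 + 60);
    return 0;
}
```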
On x86_64 we need to extend the reg_classes array because load()
is called for (at least) R11 too, which was not part of reg_classes
previously.
- Fix a wrong calculation for size of struct
- Handle cases where struct size isn't multiple of 8
- Recover vstack after memcpy for pushing struct
- Add a float parameter for struct_assign_test1 to check SSE alignment
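
The cases listed above could be exercised with something like the following sketch. It is not the actual struct_assign_test1 from tcctest.c; struct odd, sum_odd and odd_struct_demo are made-up names, and the 20-byte size is chosen only because it is not a multiple of 8.

```c
#include <assert.h>

/* sizeof == 20: not a multiple of 8 and large enough to be pushed
   on the stack rather than passed in registers. */
struct odd { char bytes[20]; };

/* The trailing float forces an SSE register to be used next to the struct. */
static int sum_odd(struct odd s, int t, float f)
{
    int i, total = t;
    for (i = 0; i < (int)sizeof s.bytes; i++)
        total += s.bytes[i];
    return total + (int)(f * 2.0f);
}

int odd_struct_demo(void)
{
    struct odd s;
    int i;
    for (i = 0; i < (int)sizeof s.bytes; i++)
        s.bytes[i] = (char)i;            /* 0 + 1 + ... + 19 == 190 */
    return sum_odd(s, 4, 1.5f);          /* 190 + 4 + 3 */
}

/* Small self-check: returns 0 when size and sum are as expected. */
int odd_struct_check(void)
{
    assert(sizeof(struct odd) == 20);
    assert(odd_struct_demo() == 197);
    return 0;
}
```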
This enables native unwind semantics with longjmp on
win64 by putting an entry into the .pdata section for
each compiled function.
Also, functions now use a fixed stack and store arguments
into X(%rsp) rather than using push.
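
For orientation, a .pdata entry on win64 has the shape of the Windows RUNTIME_FUNCTION record: three 32-bit addresses relative to the image base. The sketch below only illustrates that layout; the struct and function names are hypothetical and do not reflect how tcc builds the section.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as RUNTIME_FUNCTION on x64: three 32-bit RVAs.
   Field names here are illustrative. */
struct pdata_entry {
    uint32_t begin_rva;   /* start of the function */
    uint32_t end_rva;     /* first byte after the function */
    uint32_t unwind_rva;  /* location of the function's unwind info */
};

/* Build one entry from a function's start RVA and size. */
struct pdata_entry make_pdata_entry(uint32_t func_rva, uint32_t func_size,
                                    uint32_t unwind_rva)
{
    struct pdata_entry e;
    e.begin_rva = func_rva;
    e.end_rva = func_rva + func_size;
    e.unwind_rva = unwind_rva;
    return e;
}

/* Small self-check: returns 0 when the layout is as expected. */
int pdata_entry_check(void)
{
    struct pdata_entry e = make_pdata_entry(0x1000, 0x40, 0x3000);
    assert(sizeof e == 12);
    assert(e.end_rva == 0x1040);
    return 0;
}
```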
- calling conventions are different:
* only 4 registers
* stack "scratch area" is always reserved
* doubles are mirrored in normal registers
- no GOT or PIC there
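
The points above can be summarized in a small sketch of how outgoing arguments map to registers and stack slots under the win64 convention. The helper names are hypothetical, and rounding the argument area up to 16 bytes is an assumption about the frame layout, not something stated in the message.

```c
#include <assert.h>
#include <stddef.h>

#define WIN64_REG_ARGS  4   /* only 4 register arguments */
#define WIN64_SLOT_SIZE 8   /* each argument owns one 8-byte slot */

/* Integer register used for argument `index`, or NULL if it goes on the stack. */
const char *win64_int_arg_reg(int index)
{
    static const char *const regs[WIN64_REG_ARGS] = { "rcx", "rdx", "r8", "r9" };
    return (index >= 0 && index < WIN64_REG_ARGS) ? regs[index] : NULL;
}

/* Offset of the argument's slot from %rsp at the call site. Register
   arguments have slots too: that is the "scratch area". */
int win64_arg_offset(int index)
{
    return index * WIN64_SLOT_SIZE;
}

/* Size of the outgoing argument area: never less than the 4-slot
   scratch area, rounded up to 16 bytes (assumed alignment). */
int win64_arg_area_size(int nargs)
{
    int slots = nargs < WIN64_REG_ARGS ? WIN64_REG_ARGS : nargs;
    return (slots * WIN64_SLOT_SIZE + 15) & ~15;
}

/* Small self-check: returns 0 when the mapping is as expected. */
int win64_args_check(void)
{
    assert(win64_arg_area_size(0) == 32);
    assert(win64_arg_area_size(5) == 48);
    assert(win64_arg_offset(4) == 32);
    assert(win64_int_arg_reg(4) == NULL);
    return 0;
}
```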
- Now we can run tcc -run tcc.c successfully, though there are some bugs.
- Remove jmp_table and got_table and use text_section for got and plt entries.
- Combine buffers in tcc_relocate().
- Use R_X86_64_64 instead of R_X86_64_32 for R_DATA_32 (now the name R_DATA_32 is inappropriate...).
- Add got_table in TCCState. This approach is naive, as the distance between executable code and the GOT can exceed the 32-bit range.
- Handle R_X86_64_GOTPCREL properly. We use got_table for TCC_OUTPUT_MEMORY case for now.
- Fix load() and store() so that they access global variables via GOT.
Most changes were made inside #ifdef TCC_TARGET_X86_64, so nothing else should be broken by this change.
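
The 32-bit distance concern can be made concrete with a range check. R_X86_64_GOTPCREL stores a signed 32-bit displacement from the relocated place to the GOT entry, so the value must fit in an int32. The function below is a hypothetical helper for illustration, not code from tcc.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if (got_entry + addend - place) fits in the signed 32-bit
   field used by R_X86_64_GOTPCREL, 0 otherwise. */
int fits_rel32(uint64_t got_entry, uint64_t place, int64_t addend)
{
    /* The unsigned subtraction wraps; reinterpreting as signed gives the distance. */
    int64_t disp = (int64_t)(got_entry - place) + addend;
    return disp >= INT32_MIN && disp <= INT32_MAX;
}

/* Small self-check: returns 0 when near and far cases are classified correctly. */
int fits_rel32_check(void)
{
    assert(fits_rel32(0x1000, 0x2000, -4) == 1);          /* GOT close to the code */
    assert(fits_rel32(0x200000000ULL, 0x1000, -4) == 0);  /* GOT about 8 GiB away */
    return 0;
}
```

A relocator could run such a check before writing the displacement and report an error instead of silently truncating it.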
Summary of current status of x86-64 support:
- produces x86-64 object files and executables.
- the x86-64 code generator is based on x86's.
-- for long long integers, we use 64bit registers instead of tcc's generic implementation.
-- for float or double, we use SSE. SSE registers are not utilized well (we only use xmm0 and xmm1).
-- for long double, we use x87 FPU.
- passes make test.
- passes ./libtcc_test.
- can compile tcc.c. The compiled tcc can compile tcc.c, too. (there should be some bugs, since the binary sizes of tcc2 and tcc3 differ, where tcc tcc.c -o tcc2 and tcc2 tcc.c -o tcc3)
- can compile the links browser. It seems to work.
- not tested well. I tested this work only on my linux box with a few programs.
- calling convention of long-double-integer or struct is not exactly the same as GCC's x86-64 ABI.
- implementation of tcc -run is naive (tcc -run tcctest.c works, but tcc -run tcc.c doesn't work). Relocating 64bit addresses seems not to be as simple as in 32bit environments.
- shared object support isn't implemented
- no bounds checker support
- some builtin functions such as __divdi3 aren't supported