There are probably still issues on x86-64 I've missed.
I've added a few new tests to abitest, which fail (2x long long and 2x double
in a struct should be passed in registers).
abitest now passes; however test1-3 fail in init_test. All other tests
pass. I need to re-test Win32 and Linux-x86.
I've added a dummy implementation of gfunc_sret to c67-gen.c so it
should now compile, and I think it should behave as before I created
gfunc_sret.
Should fix some warnings wrt. access out of array bounds.
tccelf.c: fix "static function unused" warning
x86_64-gen.c: fix "ctype.ref uninitialzed" warning and cleanup
tcc-win32.txt: remove obsolete limitation notes.
Loads of VT_LLOCAL values (which effectively represent saved
addresses of lvalues) were done in VT_INT type, loosing the upper
32 bits. Needs to be done in VT_PTR type.
When offsetted addresses of global non-static data are computed
multiple times in the same statement the x86_64 backend uses
gen_gotpcrel with offset, which implements an add insn on the
register given. load() uses the R member of the to-be-loaded
value, which doesn't yet have a reg assigned in all cases.
So use the register we're supposed to load the value into as
that register.
The first loop setting up struct arguments must not remove
elements from the vstack (via vtop--), as gen_reg needs them to
potentially evict some argument still held in registers to stack.
Swapping the arg in question to top (and back to its place) also
simplifies the vstore call itself, as not funny save/restore
or some "non-existing" stack elements need to be done.
Generally for a stack a vop-- operation conceptually clobbers
that element, so further references to it aren't allowed anymore.
See also commit 9527c4949f
On x86_64 we need to extend the reg_classes array because load()
is called for (at least) R11 too, which was not part of reg_classes
previously.
- Fix a wrong calculation for size of struct
- Handle cases where struct size isn't multple of 8
- Recover vstack after memcpy for pushing struct
- Add a float parameter for struct_assign_test1 to check SSE alignment
This enables native unwind semantics with longjmp on
win64 by putting an entry into the .pdata section for
each compiled fuction.
Also, the function now use a fixed stack and store arguments
into X(%rsp) rather than using push.
- calling conventions are different:
* only 4 registers
* stack "scratch area" is always reserved
* doubles are mirrored in normal registers
- no GOT or PIC there
- Now we can run tcc -run tcc.c successfully, though there are some bugs.
- Remove jmp_table and got_table and use text_section for got and plt entries.
- Combine buffers in tcc_relocate().
- Use R_X86_64_64 instead of R_X86_64_32 for R_DATA_32 (now the name R_DATA_32 is inappropriate...).
- Add got_table in TCCState. This approach is naive and the distance between executable code and GOT can be longer than 32bit.
- Handle R_X86_64_GOTPCREL properly. We use got_table for TCC_OUTPUT_MEMORY case for now.
- Fix load() and store() so that they access global variables via GOT.
Most change was done in #ifdef TCC_TARGET_X86_64. So, nothing should be broken by this change.
Summary of current status of x86-64 support:
- produces x86-64 object files and executables.
- the x86-64 code generator is based on x86's.
-- for long long integers, we use 64bit registers instead of tcc's generic implementation.
-- for float or double, we use SSE. SSE registers are not utilized well (we only use xmm0 and xmm1).
-- for long double, we use x87 FPU.
- passes make test.
- passes ./libtcc_test.
- can compile tcc.c. The compiled tcc can compile tcc.c, too. (there should be some bugs since the binary size of tcc2 and tcc3 is differ where tcc tcc.c -o tcc2 and tcc2 tcc.c -o tcc3)
- can compile links browser. It seems working.
- not tested well. I tested this work only on my linux box with few programs.
- calling convention of long-double-integer or struct is not exactly the same as GCC's x86-64 ABI.
- implementation of tcc -run is naive (tcc -run tcctest.c works, but tcc -run tcc.c doesn't work). Relocating 64bit addresses seems to be not as simple as 32bit environments.
- shared object support isn't unimplemented
- no bounds checker support
- some builtin functions such as __divdi3 aren't supported