This reverts commit 870271ea07.
The commit is broken, you can't unconditionally emit a PC-relative
relocation without a symbol. And if there's a symbol the addend
need to be in the relocation, not the section.
Dereferencing of absolute pointers is broken on x86_64, eg:
*(int*)NULL does not segfault but returns -4 instead
*(char*)(-10L << 20) does not return 0x55 (vsyscall page, push rbp)
tcc.h:
* cleanup struct 'Sym'
* include some 'Attributes' into 'Sym'
* in turn get rid of VT_IM/EXPORT, VT_WEAK
* re-number VT_XXX flags
* replace some 'long' function args by 'int'
tccgen.c:
* refactor parse_btype()
- configure
* use aarch64 instead of arm64
- Makefile
* rename the custom include file to "config-extra.mak"
* Also avoid "rm -r /*" if $(tccdir) is empty
- pp/Makefile
* fix .expect generation with gcc
- tcc.h
* cleanup #defines for _MSC_VER
- tccgen.c:
* fix const-propagation for &,|
* fix anonymous named struct (ms-extension) and enable
-fms-extension by default
- i386-gen.c
* clear VT_DEFSIGN
- x86_64-gen.c/win64:
* fix passing structs in registers
* fix alloca (need to keep "func_scratch" below each alloca area on stack)
(This allows to compile a working gnu-make on win64)
- tccpp.c
* alternative approach to 37999a4fbf
This is to avoid some slowdown with ## token pasting.
* get_tok_str() : return <eof> for TOK_EOF
* -funsigned-char: apply to "string" literals as well
- tccpe/tools.c: -impdef: support both 32 and 64 bit dlls anyway
If there were more than 6 integer arguments before the ellipsis, or
there were used more than 8 slots used until the ellipsis (e.g. by
a large intermediate struct) we generated wrong code. See testcase.
- configure:
- add --config-uClibc,-musl switch and suggest to use
it if uClibc/musl is detected
- make warning options magic clang compatible
- simplify (use $confvars instead of individual options)
- Revert "Remove some unused-parameter lint"
7443db0d5f
rather use -Wno-unused-parameter (or just not -Wextra)
- #ifdef functions that are unused on some targets
- tccgen.c: use PTR_SIZE==8 instead of (X86_64 || ARM64)
- tccpe.c: fix some warnings
- integrate dummy arm-asm better
The canonical way to describe a local variable that actually holds
the address of an lvalue is VT_LLOCAL. Remove the last user of VT_REF,
and handling of it, thereby freeing a flag for SValue.r.
- tccgen.c/tcc.h: allow function declaration after use:
int f() { return g(); }
int g() { return 1; }
may be a warning but not an error
see also 76cb1144ef
- tccgen.c: redundant code related to inline functions removed
(functions used anywhere have sym->c set automatically)
- tccgen.c: make 32bit llop non-equal test portable
(probably not on C67)
- dynarray_add: change prototype to possibly avoid aliasing
problems or at least warnings
- lib/alloca*.S: ".section .note.GNU-stack,"",%progbits" removed
(has no effect)
- tccpe: set SizeOfCode field (for correct upx decompression)
- libtcc.c: fixed alternative -run invocation
tcc "-run -lxxx ..." file.c
(meant to load the library after file).
Also supported now:
tcc files ... options ... -run @ arguments ...
Also:
- on windows i386 and x86-64, structures of size <= 8 are
NOT returned in registers if size is not one of 1,2,4,8.
- cleanup: put all tv-push/pop/swap/rot into one place
tccgen.c: remove any 'nocode_wanted' checks, except in
- greloca(), disables output elf symbols and relocs
- get_reg(), will return just the first suitable reg)
- save_regs(), will do nothing
Some minor adjustments were made where nocode_wanted is set.
xxx-gen.c: disable code output directly where it happens
in functions:
- g(), output disabled
- gjmp(), will do nothing
- gtst(), dto.
on 32bit long long support was sometimes broken. This fixes
code-gen for long long values in switches, disables a x86-64 specific
testcase and avoid an undefined shift amount. It comments out
a bitfield test involving long long bitfields > 32 bit; with GCC layout
they can straddle multiple words and code generation isn't prepared
for this.
The callee saved registers (among them r12-r15) really need
saving/restoring if mentioned in asm clobbers, even if TCC
itself doesn't use them. E.g. the linux kernel relies on that
in its switch_to() implementation.
Some routines were using the wrong type (int) in passing addends,
truncating it. This matters when bit 31 isn't set and the high
32 bits are set: the truncation would make it unsigned where in
reality it's signed (happen e.g. on the x86-64 with it's load
address at top-2GB).
This target has _32 and _32S relocs (the latter being for signed
32 bit entities). All instruction displacements have to use
the 32S variants. Normal references like
.long s
normally would use the _32 variant. For normal executables this
doesn't matter. For shared libraries neither (which use PC-relative
relocs). But it matters for things like the kernel that are linked
to high addresses (signed ones). There the GNU linker would error
out on overflow for the _32 variant.
To keep life simple we simply switch from _32 to _32S altogether.
Strictly speaking it's still wrong, but in practice using _32 is
more often wrong than using _32S ;)
C standard specifies that array should be declared with a non null size
or with * for standard array. Declaration of relocs_info in tcc.h was
not respecting this rule. This commit add a R_NUM macro that maps to the
R_<ARCH>_NUM macros and declare relocs_info using it. This commit also
moves all linker-related macros from <arch>-gen.c files to <arch>-link.c
ones.
With -b, this produces garbage. Code to call __bound_local_new
is put at wrong place, overwriting the regparam setup code.
Fix copied from x86_64-gen.c.
void __attribute__((regparm(3)))
fun(int unused)
{
char local[1];
}
- call RtlDeleteFunctionTable
(important for multiple compilations)
- the RUNTIME_FUNCTION* is now at the beginning of the
runtime memory. Therefor when tcc_relocate is called
with user memory, this should be done manually before
it is free'd:
RtlDeleteFunctionTable(*(void**)user_mem);
[ free(user_mem); ]
- x86_64-gen.c: expand char/short return values to int
fixes 5c35ba66c5
Implementation was consistent within tcc but incompatible
with the ABI (for example library functions vprintf etc)
Also:
- tccpp.c/get_tok_str() : avoid "unknown format "%llu" warning
- x86_64_gen.c/gen_vla_alloc() : fix vstack leak
Traditional behaviour on x86-64 is to encode the relocation
addend in r_addend, not in the relocated field (after all,
that's the reason to use RELA relocs to begin with). Our
linker can deal with both, other linkers as well. But using
e.g. the GNU assembler one can detect differences (equivalent
code in the end, but still a difference).
Now there's only a trivial difference in tests/asmtest.S
(having to do with ordering of prefixes).
* Documentation is now in "docs".
* Source code is now in "src".
* Misc. fixes here and there so that everything still works.
I think I got everything in this commit, but I only tested this
on Linux (Make) and Windows (CMake), so I might've messed
something up on other platforms...
Jsut for testing. It works for me (don't break anything)
Small fixes for x86_64-gen.c in "tccpp: fix issues, add tests"
are dropped in flavor of this patch.
Pip Cet:
Okay, here's a first patch that fixes the problem (but I've found
another bug, yet unfixed, in the process), though it's not
particularly pretty code (I tried hard to keep the changes to the
minimum necessary). If we decide to actually get rid of VT_QLONG and
VT_QFLOAT (please, can we?), there are some further simplifications in
tccgen.c that might offset some of the cost of this patch.
The idea is that an integer is no longer enough to describe how an
argument is stored in registers. There are a number of possibilities
(none, integer register, two integer registers, float register, two
float registers, integer register plus float register, float register
plus integer register), and instead of enumerating them I've
introduced a RegArgs type that stores the offsets for each of our
registers (for the other architectures, it's simply an int specifying
the number of registers). If someone strongly prefers an enum, we
could do that instead, but I believe this is a place where keeping
things general is worth it, because this way it should be doable to
add SSE or AVX support.
There is one line in the patch that looks suspicious:
} else {
addr = (addr + align - 1) & -align;
param_addr = addr;
addr += size;
- sse_param_index += reg_count;
}
break;
However, this actually fixes one half of a bug we have when calling a
function with eight double arguments "interrupted" by a two-double
structure after the seventh double argument:
f(double,double,double,double,double,double,double,struct { double
x,y; },double);
In this case, the last argument should be passed in %xmm7. This patch
fixes the problem in gfunc_prolog, but not the corresponding problem
in gfunc_call, which I'll try tackling next.
* fix some macro expansion issues
* add some pp tests in tests/pp
* improved tcc -E output for better diff'ability
* remove -dD feature (quirky code, exotic feature,
didn't work well)
Based partially on ideas / researches from PipCet
Some issues remain with VA_ARGS macros (if used in a
rather tricky way).
Also, to keep it simple, the pp doesn't automtically
add any extra spaces to separate tokens which otherwise
would form wrong tokens if re-read from tcc -E output
(such as '+' '=') GCC does that, other compilers don't.
* cleanups
- #line 01 "file" / # 01 "file" processing
- #pragma comment(lib,"foo")
- tcc -E: forward some pragmas to output (pack, comment(lib))
- fix macro parameter list parsing mess from
a3fc543459a715d7143d
(some coffee might help, next time ;)
- introduce TOK_PPSTR - to have character constants as
written in the file (similar to TOK_PPNUM)
- allow '\' appear in macros
- new functions begin/end_macro to:
- fix switching macro levels during expansion
- allow unget_tok to unget more than one tok
- slight speedup by using bitflags in isidnum_table
Also:
- x86_64.c : fix decl after statements
- i386-gen,c : fix a vstack leak with VLA on windows
- configure/Makefile : build on windows (MSYS) was broken
- tcc_warning: fflush stderr to keep output order (win32)
Author: Philip <pipcet@gmail.com>
Our VLA code can be made a lot simpler (simple enough for
even me to understand it) by giving up on the optimization idea, which
is very tempting. There's a patch to do that attached, feel free to
test and commit it if you like. (It passes all the tests, at least
The old code assumed that if an argument doesn't fit into the available
registers, none of the subsequent arguments do, either. But that's
wrong: passing 7 doubles, then a two-double struct, then another double
should generate code that passes the 9th argument in the 8th register
and the two-double struct on the stack. We now do so.
However, this patch does not yet fix the function calling code to do the
right thing in the same case.
The comment suggests this was meant to detect unions, but in fact it
compared f->c, the union/struct size, against f->next->c, the first
element's offset.
This affected only zero-length structs/unions with a first (zero-length)
element, as in this code:
struct u2 {
};
struct u {
struct u2 u2;
} u;
struct u f(struct u x)
{
return x;
}
However, such structures turned out to be broken anyway, as code like this
was generated for the above f:
0000000000000000 <f>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 81 ec 10 00 00 00 sub $0x10,%rsp
b: 66 0f d6 45 f8 movq %xmm0,-0x8(%rbp)
10: 66 0f 6e 45 f8 movd -0x8(%rbp),%xmm0
15: e9 00 00 00 00 jmpq 1a <f+0x1a>
1a: c9 leaveq
1b: c3 retq
I ran into an issue playing with tinycc, and tracked it down to a rather
weird assumption in the function calling code. This breaks only when
varargs and float/double arguments are combined, I think, and only when
calling GCC-generated (or non-TinyCC, at least) code. The problem is we
sometimes generate code like this:
804a468: 4c 89 d9 mov %r11,%rcx
804a46b: b8 01 00 00 00 mov $0x1,%eax
804a470: 48 8b 45 c0 mov -0x40(%rbp),%rax
804a474: 4c 8b 18 mov (%rax),%r11
804a477: 41 ff d3 callq *%r11
for a function call. Note how $eax is first set to the correct value,
then clobbered when we try to load the function pointer into R11. With
the patch, the code generated is:
804a468: 4c 89 d9 mov %r11,%rcx
804a46b: b8 01 00 00 00 mov $0x1,%eax
804a470: 4c 8b 5d c0 mov -0x40(%rbp),%r11
804a474: 4d 8b 1b mov (%r11),%r11
804a477: 41 ff d3 callq *%r11
which is correct.
This becomes an issue when get_reg(RC_INT) is modified not always to
return %rax after a save_regs(0), because then another register (%ecx,
say) is clobbered, and the function passed an invalid argument.
A rather convoluted test case that generates the above code is
included. Please note that the test will not cause a failure because
TinyCC code ignores the %rax argument, but it will cause incorrect
behavior when combined with GCC code, which might wrongly fail to save
XMM registers and cause data corruption.
Verify an immediate value fits into 32 bits before jumping to it/calling
it with a 32-bit immediate operand. Without this fix, code along the
lines of
((int (*)(const char *, ...))140244834372944LL)("hi\n");
will fail mysteriously, even if that decimal constant is the correct
address for printf.
See https://github.com/pipcet/tinycc/tree/bugfix-1