With the last improvements to lexpand it's now harmful
to use on native 64bit platforms when not necessary. For gv_dup
it's not necessary there. It can still be used with really
transforming a 64bit value into two 32bit ones.
Previously, long longs were 'lexpand'ed into two registers
always.
Now, it expands
- constants into two constants (lo-part, hi-part)
- variables into two lvalues with offset+4 for the hi-part.
This makes long long operations look a bit nicer.
Also: don't apply i386 'inc/dec' optimization if carry
generation is wanted.
gen_cast() failed to truncate long long's if they
were unsigned, which was causing mess on the vstack.
There was a similar bug here
tccgen: 32bits: fix PTR +/- long long
ed15cddacd
Both were not visible until this patch
tccgen: arm/i386: save_reg_upstack
b691585785
I'd still assume that this patch is correct per se.
Also:
- remove 2x !nocode_wanted (we are already under a general
"else if (!nocode_wanted)" clause above).
__GNUC__ nowadays as macro seems to mean the "GNU C dialect"
rather than the compiler itself. See also
http://gcc.gnu.org/ml/gcc/2008-07/msg00026.html
This patch will probably cause problems of various kinds but
maybe we should try nonetheless.
With -run the call instruction and a defined function can be
far away, if the function is defined in the executable itself,
not in the to be compiled code. So we always need PLT slots
for -run, not just for undefined symbols.
Previously in order to perform a ll+ll operation tcc
was trying to 'lexpand' PTR in gen_opl which did
not work well. The case:
int printf(const char *, ...);
char t[] = "012345678";
int main(void)
{
char *data = t;
unsigned long long r = 4;
unsigned a = 5;
unsigned long long b = 12;
*(unsigned*)(data + r) += a - b;
printf("data %s\n", data);
return 0;
}
This is a work-around for TCC's linker, on AArch64, not building a PLT
when TCC is invoked with "-run". Fixing the linker should be possible:
it works on arm and x86_64, apparently.
The back end functions gen_op(comparison) and gtst() might allocate
registers so case_reg should be left on the value stack while they
are called and set again afterwards.
This bug fix was first applied as ff3f9aa (20 Feb 2015), but the fix
was reverted by fc0fc6a (21 Sep 2016, "switch: collect case ranges
first, then generate code"). Here the fix is updated for the new code.
- There's no need to force STRIP_BINARIES on windows since --enable-strip (at
configure) already does exactly that, if one wants to.
- Use the contigured $STRIP instead of the native 'strip', useful when
cross building tcc.
- 'make install-strip' now also strips libtcc.dll on windows (it already does
so now with --enable-strip, and previously it always stripped it).
Summary of current strip options for all platforms:
- configure --enable strip -> 'install -s' for the binaries.
- make install-strip: installs and then configured $STRIP the binaries.
- Otherwise -> no stripping.
MSYS2 installs 3 environments, with uname (e.g. on win8.1 64) as follows:
- MINGW32_NT-6.3 gcc -> stand-alone native i686 binaries
- MINGW64_NT-6.3 gcc -> stand-alone native x86_64 binaries
- MSYS_NT-6.3 gcc -> posix-ish binaries which can only run in this env
Therefore 'MINGW' is more generic and detects both 32/64 native
environments, where previously 'MINGW32' detected only the 32 one.
For the following reasons:
- Native windows links are rarely used in general.
- Require elevated privileges even if the current user has administrator
privileges (needs further "run as administrator").
- Most/all windows shell environments capable of running configure already
support ln (msys[1], msys2, most probably cygwin too).
- If cross building tcc on linux for windows then native mklink is not
available, as well as 'cmd' (in this scenario the build later fails
for other reasons, but at least configures succeeds now).
- cp is good enough as fallback since we only copy 5 makefiles anyway.
- The only environment I'm aware of which doesn't support ln -s is busybox
for windows, and with this patch it falls back to cp and completes
configure successfully (and the build later succeeds, assuming valid
$CC and $AR).
Also:
- regenerate all tests/pp/*.expect with gcc
- test "insert one space" feature
- test "0x1E-1" in asm mode case
- PARSE_FLAG_SPACES: ignore \f\v\r better
- tcc.h: move some things
We need to preserve the type of the pointer to the structure, f.ex.
when a global structure is returned.
This is not a perfect solution. Registers loaded in the first iteration
might be overwritten in a following iteration as the register is no
longer on vtop. This is not a problem for ARM32 as gfunc_sret returns
a maximum of 1 in the integer case.
Makefile :
- do not 'uninstall' peoples /usr/local/doc entirely
libtcc.c :
- MEM_DEBUG : IDE-friendly output "file:line: ..."
- always ELF for objects
tccgen.c :
- fix memory leak in new switch code
- move static 'in_sizeof' out of function
profiling :
- define 'static' to empty
resolve_sym() :
- replace by dlsym()
win32/64: fix R_XXX_RELATIVE fixme
- was fixed for i386 already in
8e4d64be2f
- do not -Lsystemdir if compiling to .o
tccgen.c:gv() when loading long long from lvalue, before
was saving all registers which caused problems in the arm
function call register parameter preparation, as with
void foo(long long y, int x);
int main(void)
{
unsigned int *xx[1], x;
unsigned long long *yy[1], y;
foo(**yy, **xx);
return 0;
}
Now only the modified register is saved if necessary,
as in this case where it is used to store the result
of the post-inc:
long long *p, v, **pp;
v = 1;
p = &v;
p[0]++;
printf("another long long spill test : %lld\n", *p);
i386-gen.c :
- found a similar problem with TOK_UMULL caused by the
vstack juggle in tccgen:gen_opl()
(bug seen only when using EBX as 4th register)
#ifndef guards are cached, however after #elif on the
same level, the file must be re-read.
Also: preprocess asm as such even when there is no
assembler (arm).
"make test" crashes without that "save_regs()".
This partially reverts
commit 49d3118621.
Found another solution: In a 2nd pass Just look if
any of the argument registers has been saved again,
and restore if so.
This was causing assembler bugs in a tcc compiled by itself
at i386-asm.c:352 when ExprValue.v was changed to uint64_t:
if (op->e.v == (int8_t)op->e.v)
op->type |= OP_IM8S;
A general test case:
#include <stdio.h>
int main(int argc, char **argv)
{
long long ll = 4000;
int i = (char)ll;
printf("%d\n", i);
return 0;
}
Output was "4000", now "-96".
Also: add "asmtest2" as asmtest with tcc compiled by itself
Support ./configure && make under msys2 (a new msys fork)
on win32 and win64.
Get rid of CONFIG_WIN64 make-var. (On windows, WIN32 in
general is used for both 32 and 64 bit platforms)
Also:
- cleanup win32/build-tcc.bat
- adjust win32/(doc/)tcc-win32.tx
On 2016-08-11 09:24 +0100, Balazs Kezes wrote:
> I think it's just that that copy_params() never restores the spilled
> registers. Maybe it needs some extra code at the end to see if any
> parameters have been spilled to stack and then restore them?
I've spent some time on this and I've found an alternative solution.
Although I'm not entirely sure about it but I've attached a patch
nevertheless.
And while poking at that I've found another problem affecting the
unsigned long long division on arm and I've attached a patch for that
too.
More details in the patches themselves. Please review and consider them
for merging! Thank you!
--
Balazs
[PATCH 1/2] Fix slow unsigned long long division on ARM
The macro AEABI_UXDIVMOD expands to this bit:
#define AEABI_UXDIVMOD(name,type, rettype, typemacro) \
...
while (num >= den) { \
...
while ((q << 1) * den <= num && q * den <= typemacro ## _MAX / 2) \
q <<= 1; \
...
With the current ULONG_MAX version the inner loop goes only until 4
billion so the outer loop will progress very slowly if num is large.
With ULLONG_MAX the inner loop works as expected. The current version is
probably a result of a typo.
The following bash snippet demonstrates the bug:
$ uname -a
Linux eper 4.4.16-2-ARCH #1 Wed Aug 10 20:03:13 MDT 2016 armv6l GNU/Linux
$ cat div.c
int printf(const char *, ...);
int main(void) {
unsigned long long num, denom;
num = 12345678901234567ULL;
denom = 7;
printf("%lld\n", num / denom);
return 0;
}
$ time tcc -run div.c
1763668414462081
real 0m16.291s
user 0m15.860s
sys 0m0.020s
[PATCH 2/2] Fix long long dereference during argument passing on ARMv6
For some reason the code spills the register to the stack. copy_params
in arm-gen.c doesn't expect this so bad code is generated. It's not
entirely clear why the saving part is necessary. It was added in commit
59c35638 with the comment "fixed long long code gen bug" with no further
clarification. Given that tcctest.c passes without this, maybe it's no
longer needed? Let's remove it.
Also add a new testcase just for this. After I've managed to make the
tests compile on a raspberry pi, I get the following diff without this
patch:
--- test.ref 2016-08-22 22:12:43.380000000 +0100
+++ test.out3 2016-08-22 22:12:49.990000000 +0100
@@ -499,7 +499,7 @@
2
1 0 1 0
4886718345
-shift: 9 9 9312
+shift: 291 291 291
shiftc: 36 36 2328
shiftc: 0 0 9998683865088
manyarg_test:
More discussion on this thread:
https://lists.nongnu.org/archive/html/tinycc-devel/2016-08/msg00004.html