- calling conventions are different:
* only 4 registers
* stack "scratch area" is always reserved
* doubles are mirrored in normal registers
- no GOT or PIC there
Date: Mon, 8 Jun 2009 19:06:56 +0800
From: Soloist Deng <soloist.deng-gmail-com>
Subject: [Tinycc-devel] trying to fix the bug of unclean FPU st(0)
Hi all:
I am using tcc-0.9.25, and the FPU bug brought a big trouble to
me. I read the source and tried to fix it.
Below is my solution.
There are two places where program(`o(0xd9dd)') will generates `fstp
%st(1)': vpop() in tccgen.c:689 and save_reg() in tccgen.c:210.
We should first change both of them to `o(0xd8dd) // fstp %st(0)'.
But these changes are not enough. Let's check the following code.
void foo()
{
double var = 2.7;
var++;
}
Using the changed tcc will generate following machine code:
.text:08000000 public foo
.text:08000000 foo proc near
.text:08000000
.text:08000000 var_18 = qword ptr -18h
.text:08000000 var_10 = qword ptr -10h
.text:08000000 var_8 = qword ptr -8
.text:08000000
.text:08000000 push ebp
.text:08000001 mov ebp, esp
.text:08000003 sub esp, 18h
.text:08000009 nop
.text:0800000A fld L_0
.text:08000010 fst [ebp+var_8]
.text:08000013 fstp st(0)
.text:08000015 fld [ebp+var_8]
.text:08000018 fst [ebp+var_10]
.text:0800001B fstp st(0)
.text:0800001D fst [ebp+var_18]
.text:08000020 fstp st(0)
.text:08000022 fld L_1
.text:08000028 fadd [ebp+var_10]
.text:0800002B fst [ebp+var_8]
.text:0800002E fstp st(0)
.text:08000030 leave
.text:08000031 retn
.text:08000031 foo endp
.text:08000031
.text:08000031 _text ends
--------------------------------------------------
.data:08000040 ; Segment type: Pure data
.data:08000040 ; Segment permissions: Read/Write
.data:08000040 ; Segment alignment '32byte' can not be represented in assembly
.data:08000040 _data segment page public 'DATA' use32
.data:08000040 assume cs:_data
.data:08000040 ;org 8000040h
.data:08000040 L_0 dq 400599999999999Ah
.data:08000048 L_1 dq 3FF0000000000000h
.data:08000048 _data ends
Please notice the code snippet from 0800000A to 08000020
// double var = 2.7; load constant to st(0)
.text:0800000A fld L_0
// double var = 2.7; store st(0) to `var'
.text:08000010 fst [ebp+var_8]
// double var = 2.7; poping st(0) will empty the floating registers stack
.text:08000013 fstp st(0)
After that ,tcc will call `void inc(int post, int c)" in
tccgen.c:2150, and produce 08000015 to 0800001B through the calling
chain (inc ->gv_dup)
// load from `var' to st(0)
.text:08000015 fld [ebp+var_8]
// store st(0) to a temporary location
.text:08000018 fst [ebp+var_10]
// poping st(0) will empty the floating registers stack
.text:0800001B fstp st(0)
And the calling chain
(gen_op('+')->gen_opif('+')->gen_opf('+')->gv(rc=2)->get_reg(rc=2)->save_reg(r=3))
will produce 0800001D to 08000020 .
// store st(0) to a temporary location, but floating stack is empty!
.text:0800001D fst [ebp+var_18]
// poping st(0) will empty the floating registers stack
.text:08000020 fstp st(0)
The `0800001D fst [ebp+var_18]' will store st(0) to a memory
location, but st(0) is empty. That will cause FPU invalid operation
exception(#IE).
Why does tcc do that? Please read `gv_dup' called by `inc' carefully.
Notice these lines:
(1): r = gv(rc);
(2): r1 = get_reg(rc);
(3): sv.r = r;
sv.c.ul = 0;
(4) load(r1, &sv); /* move r to r1 */
(5) vdup();
/* duplicates value */
(6) vtop->r = r1;
(1) let the vtop occupy TREG_ST0, and `r' will be TREG_ST0. (2)
try to get a free floating register,but tcc assume
there is only one, so it wil force vtop goto memory and assign `r1'
with TREG_ST0. When executing (3), it will do nothing
because `r' equals `r1'. (5) duplicates vtop. Then (6) let the new
vtop occupy TREG_ST0, but this will cause problem
because the old vtop has been moved to memory, so the new duplicated
vtop does not reside in TREG_ST0 but also
in memory after that. TREG_ST0 is not occupied but freely availabe
now. `gen_op('+')' need at least one oprand in register,
so it will incorrectly think TREG_ST0 is occupied by vtop and produce
instructions(0800001D and 08000020) to store it to
a temporary memory location.
According program above, if `r' == `r1' it is impossible for the old
vtop to still occupy the `r' register . And `load' will do nothing
too at this condition.
So the `gv_dup' can not promise the semantics that old vtop in one
register and the new duplicated vtop in another register at the same
time.
I changed (6) to
if (r != r1)
{
vtop->r = r1;
}
Then the new generated machine code will be :
.text:08000000 push ebp
.text:08000001 mov ebp, esp
.text:08000003 sub esp, 10h
.text:08000009 nop
.text:0800000A fld L_0
.text:08000010 fst [ebp+var_8]
.text:08000013 fstp st(0)
.text:08000015 fld [ebp+var_8]
.text:08000018 fst [ebp+var_10]
.text:0800001B fstp st(0)
.text:0800001D fld L_1
.text:08000023 fadd [ebp+var_10]
.text:08000026 fst [ebp+var_8]
.text:08000029 fstp st(0)
.text:0800002B leave
.text:0800002C retn
It works well, and will clean the floating registers stack when return.
Finally, I want to know there is any potential problem of this fixing ?
soloist
(Because GNU's alloca.h unconditionally #undef's alloca)
Also, remove gcc specific sections in headers. and
instead change tests such that gcc does not use them.
comparing the filenames as in the #include statement can be
ambiguous if including files are in different directories.
Now caches and checks the really opened filename instead.
The loop constructs to iterate over the non-overlapping, even
positions of two or three bytes in a word were broken.
This patch fixes the loops. It has been verified to generate the
72 combinations for two and the 80 combinations for three bytes.