Because error handling (luaG_errormsg) uses slots from EXTRA_STACK,
and some errors can recur (e.g., string overflow while creating an
error message in 'luaG_runerror', or a C-stack overflow before calling
the message handler), the code should use stack slots with parsimony.
This commit fixes the bug "Lua-stack overflow when C stack overflows
while handling an error".
luaV_execute should compute 'ra' only when the instruction uses it.
Computing an illegal address is undefined behavior even if the address
is never dereferenced.
'luaD_pretailcall' mimics 'luaD_precall', handling call metamethods
and calling C functions directly. That makes the code in the
interpreter loop simpler.
This commit also goes back to emulating the tail call in 'luaD_precall'
with a goto, as C compilers may not do proper tail calls and the C
stack can overflow much sooner than the Lua stack (which grows as the
metamethod is added to it).
The parameters 'nresults' and 'delta1', in 'luaD_precall', were never
meaningful simultaneously. So, they were combined in a single parameter
'retdel'.
Some places don't need the "fast path" macro tointegerns, either
because speed is not essential (lcode.c) or because the value is not
supposed to be an integer already (luaV_equalobj and luaG_tointerror).
Moreover, luaV_equalobj should always use F2Ieq, even if Lua is
compiled to "round to floor".
More uses of macros 'likely'/'unlikely' (renamed to
'l_likely'/'l_unlikely'), both in range (extended to the
libraries) and in scope (extended to hooks, stack growth).
To-be-closed variables are linked in their own list, embedded into the
stack elements. (Due to alignment, this information does not change
the size of the stack elements in most architectures.) This new list
does not produce garbage and avoids memory errors when creating tbc
variables.
Initial implementation to allow yields inside '__close' metamethods.
This current version still does not allow a '__close' metamethod
to yield when called due to an error. '__close' metamethods from
C functions also are not allowed to yield.
The stack size is derived from 'stack_last', when needed. Moreover,
the handling of stack sizes is more consistent, always excluding the
extra space except when allocating/deallocating the array.
The previous stackless implementations marked all 'luaV_execute'
invocations as fresh. However, re-entering 'luaV_execute' when
resuming a coroutine should not be a fresh invocation. (It works
because 'unroll' called 'luaV_execute' for each call entry, but
it was slower than letting 'luaV_execute' finish all non-fresh
invocations.)
A "with stack" implementation gains too little in performance to be
worth all the noise from C-stack overflows.
This commit is almost a sketch, to test performance. There are several
pending stuff:
- review control of C-stack overflow and error messages;
- what to do with setcstacklimit;
- review comments;
- review unroll of Lua calls.
The field 'L->oldpc' is not always updated when control returns to a
function; an invalid value can seg. fault when computing 'changedline'.
(One example is an error in a finalizer; control can return to
'luaV_execute' without executing 'luaD_poscall'.) Instead of trying to
fix all possible corner cases, it seems safer to be resilient to invalid
values for 'oldpc'. Valid but wrong values at most cause an extra call
to a line hook.
Errors in finalizers need a valid 'pc' to produce an error message,
even if the error is not propagated. Therefore, calls to the GC (which
may call finalizers) inside luaV_execute must save the 'pc'.
Macro 'checkstackGC' was doing a GC step after resizing the stack;
the GC could shrink the stack and undo the resize. Moreover, macro
'checkstackp' also does a GC step, which could remove the preallocated
CallInfo when calling a function. (Its name has been changed to
'checkstackGCp' to emphasize that it calls the GC.)
Instead of an explicit value (field 'b'), true and false use different
tag variants. This avoids reading an extra field and results in more
direct code. (Most code that uses booleans needs to distinguish between
true and false anyway.)
The difference in performance between immediate operands and K operands
does not seem to justify all those extra opcodes. We only keep OP_ADDI,
due to its ubiquity and because the difference is a little more relevant.
(Later, OP_SUBI will be implemented by OP_ADDI, negating the constant.)