'luaD_pretailcall' mimics 'luaD_precall', handling call metamethods
and calling C functions directly. That makes the code in the
interpreter loop simpler.
This commit also goes back to emulating the tail call in 'luaD_precall'
with a goto, as C compilers may not do proper tail calls and the C
stack can overflow much sooner than the Lua stack (which grows as the
metamethod is added to it).
The parameters 'nresults' and 'delta1', in 'luaD_precall', were never
meaningful simultaneously. So, they were combined in a single parameter
'retdel'.
Some places don't need the "fast path" macro tointegerns, either
because speed is not essential (lcode.c) or because the value is not
supposed to be an integer already (luaV_equalobj and luaG_tointerror).
Moreover, luaV_equalobj should always use F2Ieq, even if Lua is
compiled to "round to floor".
More uses of macros 'likely'/'unlikely' (renamed to
'l_likely'/'l_unlikely'), both in range (extended to the
libraries) and in scope (extended to hooks, stack growth).
To-be-closed variables are linked in their own list, embedded into the
stack elements. (Due to alignment, this information does not change
the size of the stack elements in most architectures.) This new list
does not produce garbage and avoids memory errors when creating tbc
variables.
Initial implementation to allow yields inside '__close' metamethods.
This current version still does not allow a '__close' metamethod
to yield when called due to an error. '__close' metamethods from
C functions also are not allowed to yield.
The stack size is derived from 'stack_last', when needed. Moreover,
the handling of stack sizes is more consistent, always excluding the
extra space except when allocating/deallocating the array.
The previous stackless implementations marked all 'luaV_execute'
invocations as fresh. However, re-entering 'luaV_execute' when
resuming a coroutine should not be a fresh invocation. (It works
because 'unroll' called 'luaV_execute' for each call entry, but
it was slower than letting 'luaV_execute' finish all non-fresh
invocations.)
A "with stack" implementation gains too little in performance to be
worth all the noise from C-stack overflows.
This commit is almost a sketch, to test performance. There are several
pending stuff:
- review control of C-stack overflow and error messages;
- what to do with setcstacklimit;
- review comments;
- review unroll of Lua calls.
The field 'L->oldpc' is not always updated when control returns to a
function; an invalid value can seg. fault when computing 'changedline'.
(One example is an error in a finalizer; control can return to
'luaV_execute' without executing 'luaD_poscall'.) Instead of trying to
fix all possible corner cases, it seems safer to be resilient to invalid
values for 'oldpc'. Valid but wrong values at most cause an extra call
to a line hook.
Errors in finalizers need a valid 'pc' to produce an error message,
even if the error is not propagated. Therefore, calls to the GC (which
may call finalizers) inside luaV_execute must save the 'pc'.
Macro 'checkstackGC' was doing a GC step after resizing the stack;
the GC could shrink the stack and undo the resize. Moreover, macro
'checkstackp' also does a GC step, which could remove the preallocated
CallInfo when calling a function. (Its name has been changed to
'checkstackGCp' to emphasize that it calls the GC.)
Instead of an explicit value (field 'b'), true and false use different
tag variants. This avoids reading an extra field and results in more
direct code. (Most code that uses booleans needs to distinguish between
true and false anyway.)
The difference in performance between immediate operands and K operands
does not seem to justify all those extra opcodes. We only keep OP_ADDI,
due to its ubiquity and because the difference is a little more relevant.
(Later, OP_SUBI will be implemented by OP_ADDI, negating the constant.)
In arithmetic/bitwise operators, the call to metamethods is made
in a separate opcode following the main one. (The main
opcode skips this next one when the operation succeeds.) This
change reduces slightly the size of the binary and the complexity
of the arithmetic/bitwise opcodes. It also simplfies the treatment
of errors and yeld/resume in these operations, as there are much
fewer cases to consider. (Only OP_MMBIN/OP_MMBINI/OP_MMBINK,
instead of all variants of all arithmetic/bitwise operators.)
The family of opcodes OP_ADDK (arithmetic operators with K constant)
were not being handled in 'luaV_finishOp', which completes their
task after an yield.
When initializing a to-be-closed variable, check whether it has a
'__close' metamethod (or is a false value) and raise an error if
if it hasn't. This produces more accurate error messages. (The
check before closing still need to be done: in the C API, the value
is not constant; and the object may lose its '__close' metamethod
during the block.)
Instead of updating 'L->top' in every place that may call a
metamethod, the metamethod functions themselves (luaT_trybinTM and
luaT_callorderTM) correct the top. (When calling metamethods from
the C API, however, the callers must preserve 'L->top'.)
- OP_NEWTABLE can use 'ra + 1' to set top (instead of ci->top);
- OP_CLOSE doesn't need to set top ('Protect' already does that);
- OP_TFORCALL must use 'ProtectNT', to preserve the top already set.
(That was a small bug, because iterators could be called with
extra parameters besides the state and the control variable.)
- Comments and an extra test for the bug in previous item.