The stack size is derived from 'stack_last', when needed. Moreover,
the handling of stack sizes is more consistent, always excluding the
extra space except when allocating/deallocating the array.
The previous stackless implementations marked all 'luaV_execute'
invocations as fresh. However, re-entering 'luaV_execute' when
resuming a coroutine should not be a fresh invocation. (It works
because 'unroll' called 'luaV_execute' for each call entry, but
it was slower than letting 'luaV_execute' finish all non-fresh
invocations.)
A "with stack" implementation gains too little in performance to be
worth all the noise from C-stack overflows.
This commit is almost a sketch, to test performance. There are several
pending stuff:
- review control of C-stack overflow and error messages;
- what to do with setcstacklimit;
- review comments;
- review unroll of Lua calls.
The field 'L->oldpc' is not always updated when control returns to a
function; an invalid value can seg. fault when computing 'changedline'.
(One example is an error in a finalizer; control can return to
'luaV_execute' without executing 'luaD_poscall'.) Instead of trying to
fix all possible corner cases, it seems safer to be resilient to invalid
values for 'oldpc'. Valid but wrong values at most cause an extra call
to a line hook.
Errors in finalizers need a valid 'pc' to produce an error message,
even if the error is not propagated. Therefore, calls to the GC (which
may call finalizers) inside luaV_execute must save the 'pc'.
Macro 'checkstackGC' was doing a GC step after resizing the stack;
the GC could shrink the stack and undo the resize. Moreover, macro
'checkstackp' also does a GC step, which could remove the preallocated
CallInfo when calling a function. (Its name has been changed to
'checkstackGCp' to emphasize that it calls the GC.)
Instead of an explicit value (field 'b'), true and false use different
tag variants. This avoids reading an extra field and results in more
direct code. (Most code that uses booleans needs to distinguish between
true and false anyway.)
The difference in performance between immediate operands and K operands
does not seem to justify all those extra opcodes. We only keep OP_ADDI,
due to its ubiquity and because the difference is a little more relevant.
(Later, OP_SUBI will be implemented by OP_ADDI, negating the constant.)
In arithmetic/bitwise operators, the call to metamethods is made
in a separate opcode following the main one. (The main
opcode skips this next one when the operation succeeds.) This
change reduces slightly the size of the binary and the complexity
of the arithmetic/bitwise opcodes. It also simplfies the treatment
of errors and yeld/resume in these operations, as there are much
fewer cases to consider. (Only OP_MMBIN/OP_MMBINI/OP_MMBINK,
instead of all variants of all arithmetic/bitwise operators.)
The family of opcodes OP_ADDK (arithmetic operators with K constant)
were not being handled in 'luaV_finishOp', which completes their
task after an yield.
When initializing a to-be-closed variable, check whether it has a
'__close' metamethod (or is a false value) and raise an error if
if it hasn't. This produces more accurate error messages. (The
check before closing still need to be done: in the C API, the value
is not constant; and the object may lose its '__close' metamethod
during the block.)
Instead of updating 'L->top' in every place that may call a
metamethod, the metamethod functions themselves (luaT_trybinTM and
luaT_callorderTM) correct the top. (When calling metamethods from
the C API, however, the callers must preserve 'L->top'.)
- OP_NEWTABLE can use 'ra + 1' to set top (instead of ci->top);
- OP_CLOSE doesn't need to set top ('Protect' already does that);
- OP_TFORCALL must use 'ProtectNT', to preserve the top already set.
(That was a small bug, because iterators could be called with
extra parameters besides the state and the control variable.)
- Comments and an extra test for the bug in previous item.
Open upvalues are kept alive together with their corresponding
stack. This change makes a simpler and safer fix to the issue in
commit 440a5ee78c, about upvalues in the list of open upvalues
being collected while others are being created. (That previous fix
may not be correct.)
When creating an upvalue, an emergency collection can collect the
previous upvalue where the new one would be linked. The following
code can trigger the bug, using valgrind on Lua compiled with the
-DHARDMEMTESTS option:
local x; local y
(function () return y end)();
(function () return x end)()
This commit brings a new implementation for HARDMEMTESTS, which forces
an emergency GC whenever possible. It also fixes some issues detected
with this option:
- A small bug in lvm.c: a closure could be collected by an emergency
GC while being initialized.
- Some tests: a memory address can be immediatly reused after a GC;
for instance, two consecutive '{}' expressions can return exactly the
same address, if the first one is not anchored.
OP_RETURN must update trap before updating stack. (Bug detected with
-DHARDSTACKTESTS). Also, in 'luaF_close', do not create a variable
with 'uplevel(uv)', as the stack may change and invalidate this
value. (This is not a bug, but could become one if 'upl' was used
again.)
Opcodes OP_NEWTABLE and OP_SETLIST use the same representation to
store the size of the array part of a table. This new representation
can go up to 2^33 (8 + 25 bits).
OP_NEWTABLE is followed by an OP_EXTRAARG, so that it can keep
the exact size of the array part of the table to be created.
(Functions 'luaO_int2fb'/'luaO_fb2int' were removed.)
In the generic for loop, it is simpler for OP_TFORLOOP to use the
same 'ra' as OP_TFORCALL. Moreover, the internal names of the loop
temporaries "(for ...)" don't need to leak internal details (even
because the numerical for loop doesn't have a fixed role for each of
its temporaries).
Ensure that operation macros, such as 'luai_numdiv' and 'luai_numidiv',
operate only on variables, or at most at 's2v(ra)'. ('s2v' is a nop, a
cast from pointer to pointer.)
- In 'readutf8esc' (llex.c), the overflow check must be done before
shifting the accumulator. It was working because tests were using
64-bit longs. Failed with 32-bit longs.
- In OP_FORPREP (lvm.c), avoid negating an unsigned value. Visual
Studio gives a warning for that operation, despite being well
defined in ISO C.
- In 'luaV_execute' (lvm.c), 'cond' can be defined only when needed,
like all other variables.
When calling metamethods for things like 'a < 3.0', which generates
the opcode OP_LTI, the C register tells that the operand was
converted to an integer, so that it can be corrected to float when
calling a metamethod.
This commit also includes some other stuff:
- file 'onelua.c' added to the project
- opcode OP_PREPVARARG renamed to OP_VARARGPREP
- comparison opcodes rewritten through macros
Changed some implementation details; in particular, it is back using
an internal variable to keep the index, with the control variable
being only a copy of that internal variable. (The direct use of
the control variable demands a check of its type for each access,
which offsets the gains from the use of a single variable.)
The numerical 'for' loop over integers now uses a precomputed counter
to control its number of iteractions. This change eliminates several
weird cases caused by overflows (wrap-around) in the control variable.
(It also ensures that every integer loop halts.)
Also, the special opcodes for the usual case of step==1 were removed.
(The new code is already somewhat complex for the usual case,
but efficient.)
The function 'string.gmatch' now has an optional 'init' argument,
similar to 'string.find' and 'string.match'. Moreover, there was
some reorganization in the manipulation of indices in the string
library.
This commit also includes small janitorial work in the manual
and in comments in the interpreter loop.
New functions to reset/kill a thread/coroutine, mainly (only?) to
close any pending to-be-closed variable. ('lua_resetthread' also
allows a thread to be reused...)
A to-be-closed variable must be closed when a block ends, so even
a 'return foo()' cannot directly returns the results of 'foo'; the
function must close the scope before returning.
It is an error for a to-be-closed variable to have a non-closable
non-nil value when it is being closed. This situation does not seem to
be useful and often hints to an error. (Particularly in the C API, it is
easy to change a to-be-closed index by mistake.)
(Long time without testing with '-DHARDSTACKTESTS'...)
With the introduction of to-be-closed variables, calls to 'luaF_close'
can move the stack, but some call sites where keeping pointers to the
stack without correcting them.
Added opcodes for all seven arithmetic operators with K operands
(that is, operands that are numbers in the array of constants of
the function). They cover the cases of constant float operands
(e.g., 'x + .0.0', 'x^0.5') and large integer operands (e.g.,
'x % 10000').
The variable to be closed in a generic 'for' loop now is the
4th value produced in the loop initialization, instead of being
the loop state (the 2nd value produced). That allows a loop to
use a state with a '__toclose' metamethod but do not close it.
(As an example, 'f:lines()' might use the file 'f' as a state
for the loop, but it should not close the file when the loop ends.)
The repetitive code of the arithmetic and bitwise operators in
the main iterpreter loop was moved to appropriate macros.
(As a detail, the function 'luaV_div' was renamed 'luaV_idiv',
as it does an "integer division" (floor division).
The mechanism of "caching the last closure created for a prototype to
try to reuse it the next time a closure for that prototype is created"
was removed. There are several reasons:
- It is hard to find a natural example where this cache has a measurable
impact on performance.
- Programmers already perceive closure creation as something slow,
so they tend to avoid it inside hot paths. (Any case where the cache
could reuse a closure can be rewritten predefining the closure in some
variable and using that variable.)
- The implementation was somewhat complex, due to a bad interaction
with the generational collector. (Typically, new closures are new,
while prototypes are old. So, the cache breaks the invariant that
old objects should not point to new ones.)
The implicit variable 'state' in a generic 'for' is marked as a
to-be-closed variable, so that the state will be closed as soon
as the loop ends, no matter how.
Taking advantage of this new facility, the call 'io.lines(filename)'
now returns the open file as a second result. Therefore,
an iteraction like 'for l in io.lines(name)...' will close the
file even when the loop ends with a break or an error.
"Better" and similar to error messages for invalid function arguments.
*old message: 'for' limit must be a number
*new message: bad 'for' limit (number expected, got table)
Added new instruction 'OP_TFORPREP' to prepare a generic for loop.
Currently it is equivalent to a jump (but with a format 'iABx',
similar to other for-loop preparing instructions), but soon it will
be the place to create upvalues for closing loop states.
A closing method cannot be called in its own stack slot, as there may
be returning values in the stack after that slot, and the call would
corrupt those values. Instead, the closing method must be copied to the
top of the stack to be called.
Moreover, even when a function returns no value, its return istruction
still has to have its position (which will set the stack top) after
the local variables, otherwise a closing method might corrupt another
not-yet-called closing method.
Start of the implementation of "scoped variables" or "to be closed"
variables, local variables whose '__close' (or themselves) are called
when they go out of scope. This commit implements the syntax, the
opcode, and the creation of the corresponding upvalue, but it still
does not call the finalizations when the variable goes out of scope
(the most important part).
Currently, the syntax is 'local scoped name = exp', but that will
probably change.
The multiplication (m*b) used to test whether 'm' is non-zero and
'm' and 'b' have different signs can underflow for very small numbers,
giving a wrong result. The use of explicit comparisons solves this
problem. This commit also adds several new tests for '%' (both for
floats and for integers) to exercise more corner cases, such as
very large and very small values.
As hinted in the manual for Lua 5.3, the emulation of the metamethod
for '__le' using '__le' has been deprecated. It is slow, complicates
the logic, and it is easy to avoid this emulation by defining a proper
'__le' function.
Moreover, often this emulation was used wrongly, with a programmer
assuming that an order is total when it is not (e.g., NaN in
floating-point numbers).