/*
 * This file is part of the MicroPython project, http://micropython.org/
 *
 * The MIT License (MIT)
 *
 * Copyright (c) 2013-2019 Damien P. George
 * Copyright (c) 2014-2015 Paul Sokolovsky
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 * THE SOFTWARE.
 */

#include <stdio.h>
#include <string.h>
#include <assert.h>

#include "py/emitglue.h"
#include "py/objtype.h"
py: Rework bytecode and .mpy file format to be mostly static data.
Background: .mpy files are precompiled .py files, built using mpy-cross,
that contain compiled bytecode functions (and can also contain machine
code). The benefit of using an .mpy file over a .py file is that they are
faster to import and take less memory when importing. They are also
smaller on disk.
But the real benefit of .mpy files comes when they are frozen into the
firmware. This is done by loading the .mpy file during compilation of the
firmware and turning it into a set of big C data structures (the job of
mpy-tool.py), which are then compiled and downloaded into the ROM of a
device. These C data structures can be executed in-place, ie directly from
ROM. This makes importing even faster because there is very little to do,
and also means such frozen modules take up much less RAM (because their
bytecode stays in ROM).
The downside of frozen code is that it requires recompiling and reflashing
the entire firmware. This can be a big barrier to entry, slows down
development time, and makes it harder to do OTA updates of frozen code
(because the whole firmware must be updated).
This commit attempts to solve this problem by providing a solution that
sits between loading .mpy files into RAM and freezing them into the
firmware. The .mpy file format has been reworked so that it consists of
data and bytecode which is mostly static and ready to run in-place. If
these new .mpy files are located in flash/ROM which is memory addressable,
the .mpy file can be executed (mostly) in-place.
With this approach there is still a small amount of unpacking and linking
of the .mpy file that needs to be done when it's imported, but it's still
much better than loading an .mpy from disk into RAM (although not as good
as freezing .mpy files into the firmware).
The main trick to make static .mpy files is to adjust the bytecode so any
qstrs that it references now go through a lookup table to convert from
local qstr number in the module to global qstr number in the firmware.
That means the bytecode does not need linking/rewriting of qstrs when it's
loaded. Instead only a small qstr table needs to be built (and put in RAM)
at import time. This means the bytecode itself is static/constant and can
be used directly if it's in addressable memory. Also the qstr string data
in the .mpy file, and some constant object data, can be used directly.
Note that the qstr table is global to the module (ie not per function).
In more detail, in the VM what used to be (schematically):
    qst = DECODE_QSTR_VALUE;
is now (schematically):
    idx = DECODE_QSTR_INDEX;
    qst = qstr_table[idx];
That allows the bytecode to be fixed at compile time and not need
relinking/rewriting of the qstr values. Only qstr_table needs to be linked
when the .mpy is loaded.
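As a rough illustration of the indirection described above, the following standalone C sketch resolves qstr operands through a per-module table. The names `sketch_qstr`, `sketch_module_ctx` and `sketch_load_qstr` are hypothetical, and the one-byte operand is a simplifying assumption; none of this is MicroPython's actual API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for a global qstr number in the firmware.
typedef size_t sketch_qstr;

// Hypothetical per-module context. The table is the only part that has to be
// built (in RAM) when the module is imported.
typedef struct {
    const sketch_qstr *qstr_table; // module-local index -> global qstr number
    size_t qstr_table_len;
} sketch_module_ctx;

// Read one qstr operand from the bytecode and resolve it through the table.
// Assumption for this sketch: the local index fits in a single byte.
static sketch_qstr sketch_load_qstr(const sketch_module_ctx *ctx, const uint8_t **ip) {
    size_t idx = *(*ip)++;
    assert(idx < ctx->qstr_table_len);
    return ctx->qstr_table[idx];
}
```

Because the bytecode only stores local indices, the same bytes resolve correctly under two different tables. That is what lets the bytecode stay constant in flash/ROM while only the table is linked at import time.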
Incidentally, this helps to reduce the size of bytecode because what used
to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices.
If the module uses the same qstr more than two times then the bytecode is
smaller than before.
The following changes are measured for this commit compared to the
previous (the baseline):
- average 7%-9% reduction in size of .mpy files
- frozen code size is reduced by about 5%-7%
- importing .py files uses about 5% less RAM in total
- importing .mpy files uses about 4% less RAM in total
- importing .py and .mpy files takes about the same time as before
The qstr indirection in the bytecode has only a small impact on VM
performance. For stm32 on PYBv1.0 the performance change of this commit
is:
diff of scores (higher is better)
N=100 M=100               baseline -> this-commit     diff    diff% (error%)
bm_chaos.py                 371.07 ->   357.39 :   -13.68 = -3.687% (+/-0.02%)
bm_fannkuch.py               78.72 ->    77.49 :    -1.23 = -1.563% (+/-0.01%)
bm_fft.py                  2591.73 ->  2539.28 :   -52.45 = -2.024% (+/-0.00%)
bm_float.py                6034.93 ->  5908.30 :  -126.63 = -2.098% (+/-0.01%)
bm_hexiom.py                 48.96 ->    47.93 :    -1.03 = -2.104% (+/-0.00%)
bm_nqueens.py              4510.63 ->  4459.94 :   -50.69 = -1.124% (+/-0.00%)
bm_pidigits.py              650.28 ->   644.96 :    -5.32 = -0.818% (+/-0.23%)
core_import_mpy_multi.py    564.77 ->   581.49 :   +16.72 = +2.960% (+/-0.01%)
core_import_mpy_single.py    68.67 ->    67.16 :    -1.51 = -2.199% (+/-0.01%)
core_qstr.py                 64.16 ->    64.12 :    -0.04 = -0.062% (+/-0.00%)
core_yield_from.py          362.58 ->   354.50 :    -8.08 = -2.228% (+/-0.00%)
misc_aes.py                 429.69 ->   405.59 :   -24.10 = -5.609% (+/-0.01%)
misc_mandel.py             3485.13 ->  3416.51 :   -68.62 = -1.969% (+/-0.00%)
misc_pystone.py            2496.53 ->  2405.56 :   -90.97 = -3.644% (+/-0.01%)
misc_raytrace.py            381.47 ->   374.01 :    -7.46 = -1.956% (+/-0.01%)
viper_call0.py              576.73 ->   572.49 :    -4.24 = -0.735% (+/-0.04%)
viper_call1a.py             550.37 ->   546.21 :    -4.16 = -0.756% (+/-0.09%)
viper_call1b.py             438.23 ->   435.68 :    -2.55 = -0.582% (+/-0.06%)
viper_call1c.py             442.84 ->   440.04 :    -2.80 = -0.632% (+/-0.08%)
viper_call2a.py             536.31 ->   532.35 :    -3.96 = -0.738% (+/-0.06%)
viper_call2b.py             382.34 ->   377.07 :    -5.27 = -1.378% (+/-0.03%)
And for unix on x64:
diff of scores (higher is better)
N=2000 M=2000      baseline -> this-commit      diff    diff% (error%)
bm_chaos.py        13594.20 ->   13073.84 :   -520.36 = -3.828% (+/-5.44%)
bm_fannkuch.py        60.63 ->      59.58 :     -1.05 = -1.732% (+/-3.01%)
bm_fft.py         112009.15 ->  111603.32 :   -405.83 = -0.362% (+/-4.03%)
bm_float.py       246202.55 ->  247923.81 :  +1721.26 = +0.699% (+/-2.79%)
bm_hexiom.py         615.65 ->     617.21 :     +1.56 = +0.253% (+/-1.64%)
bm_nqueens.py     215807.95 ->  215600.96 :   -206.99 = -0.096% (+/-3.52%)
bm_pidigits.py      8246.74 ->    8422.82 :   +176.08 = +2.135% (+/-3.64%)
misc_aes.py        16133.00 ->   16452.74 :   +319.74 = +1.982% (+/-1.50%)
misc_mandel.py    128146.69 ->  130796.43 :  +2649.74 = +2.068% (+/-3.18%)
misc_pystone.py    83811.49 ->   83124.85 :   -686.64 = -0.819% (+/-1.03%)
misc_raytrace.py   21688.02 ->   21385.10 :   -302.92 = -1.397% (+/-3.20%)
The code size change is (firmware with a lot of frozen code benefits the
most):
   bare-arm:    +396 +0.697%
minimal x86:   +1595 +0.979% [incl +32(data)]
   unix x64:   +2408 +0.470% [incl +800(data)]
unix nanbox:   +1396 +0.309% [incl -96(data)]
      stm32:   -1256 -0.318% PYBV10
     cc3200:    +288 +0.157%
    esp8266:    -260 -0.037% GENERIC
      esp32:    -216 -0.014% GENERIC[incl -1072(data)]
        nrf:    +116 +0.067% pca10040
        rp2:    -664 -0.135% PICO
       samd:    +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS
As part of this change the .mpy file format version is bumped to version 6.
And mpy-tool.py has been improved to provide a good visualisation of the
contents of .mpy files.
In summary: this commit changes the bytecode to use qstr indirection, and
reworks the .mpy file format to be simpler and allow .mpy files to be
executed in-place. Performance is not impacted too much. Eventually it
will be possible to store such .mpy files in a linear, read-only, memory-
mappable filesystem so they can be executed from flash/ROM. This will
essentially be able to replace frozen code for most applications.
Signed-off-by: Damien George <damien@micropython.org>
2021-10-22 14:22:47 +03:00
#include "py/objfun.h"
#include "py/runtime.h"
#include "py/bc0.h"
#include "py/profile.h"

// *FORMAT-OFF*

#if 0
#if MICROPY_PY_THREAD
#define TRACE_PREFIX mp_printf(&mp_plat_print, "ts=%p sp=%d ", mp_thread_get_state(), (int)(sp - &code_state->state[0] + 1))
#else
#define TRACE_PREFIX mp_printf(&mp_plat_print, "sp=%d ", (int)(sp - &code_state->state[0] + 1))
#endif
#define TRACE(ip) TRACE_PREFIX; mp_bytecode_print2(&mp_plat_print, ip, 1, code_state->fun_bc->child_table, &code_state->fun_bc->context->constants);
#else
#define TRACE(ip)
#endif

// Value stack grows up (this makes it incompatible with native C stack, but
// makes sure that arguments to functions are in natural order arg1..argN
// (Python semantics mandate left-to-right evaluation order, including for
// function arguments). Stack pointer is pre-incremented and points at the
// top element.
// Exception stack also grows up, top element is also pointed at.
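
The stack discipline described in the comment above can be sketched in isolation as follows. This is a minimal illustration, not the VM's implementation: `sketch_stack` and its helpers are hypothetical names, the element type is a plain `int`, and an index is used instead of a raw pointer so that the empty state is simply `-1`.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_STACK_SIZE 8

// Hypothetical upward-growing value stack. `sp` is the index of the top
// element, or -1 when the stack is empty.
typedef struct {
    int state[SKETCH_STACK_SIZE];
    int sp;
} sketch_stack;

static void sketch_stack_init(sketch_stack *s) {
    s->sp = -1;
}

// Push: pre-increment, then store, so sp always points at the top element.
static void sketch_stack_push(sketch_stack *s, int value) {
    assert(s->sp + 1 < SKETCH_STACK_SIZE);
    s->state[++s->sp] = value;
}

// Pop: read the top element, then post-decrement.
static int sketch_stack_pop(sketch_stack *s) {
    assert(s->sp >= 0);
    return s->state[s->sp--];
}

// Return a pointer to the last n pushed values. Because the stack grows up,
// they appear in the order they were pushed: args[0] is arg1, args[n-1] is argN.
static const int *sketch_stack_args(const sketch_stack *s, int n) {
    assert(n >= 0 && n <= s->sp + 1);
    return &s->state[s->sp - n + 1];
}
```

After pushing three arguments left to right, `sketch_stack_args(&s, 3)` can be handed to a callee as an ordinary array with no reordering, which is the property the comment calls "natural order".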

// Decode a variable-length unsigned integer at ip into unum: 7 payload bits
// per byte, most significant group first; bit 7 is set on every byte except
// the last. Leaves ip pointing just past the encoded value.
#define DECODE_UINT \
    mp_uint_t unum = 0; \
    do { \
        unum = (unum << 7) + (*ip & 0x7f); \
    } while ((*ip++ & 0x80) != 0)
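
To make the byte layout read by `DECODE_UINT` concrete, here is a standalone sketch of the same decode loop written as a function, paired with a matching encoder. Both function names are hypothetical, and the encoder is an assumption added only for illustration; it is not taken from this file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Same loop as DECODE_UINT, as a function: accumulate 7 bits per byte until a
// byte with bit 7 clear has been consumed. Advances *ip_ptr past the value.
static size_t sketch_decode_uint(const uint8_t **ip_ptr) {
    const uint8_t *ip = *ip_ptr;
    size_t unum = 0;
    do {
        unum = (unum << 7) + (*ip & 0x7f);
    } while ((*ip++ & 0x80) != 0);
    *ip_ptr = ip;
    return unum;
}

// Hypothetical matching encoder. Writes the value most significant group
// first and returns the number of bytes written.
static size_t sketch_encode_uint(size_t value, uint8_t *out) {
    uint8_t groups[(sizeof(size_t) * 8 + 6) / 7];
    size_t n = 0;
    // Split into 7-bit groups, least significant first.
    do {
        groups[n++] = (uint8_t)(value & 0x7f);
        value >>= 7;
    } while (value != 0);
    // Emit in reverse, setting the continuation bit on all but the last byte.
    for (size_t i = 0; i < n; i++) {
        uint8_t group = groups[n - 1 - i];
        out[i] = (i + 1 < n) ? (uint8_t)(group | 0x80) : group;
    }
    return n;
}
```

Under this layout the value 300 occupies two bytes, `0x82 0x2c`, while any value below 128 occupies one.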

// Decode an unsigned label at ip into ulab. One byte if bit 7 of the first
// byte is clear; otherwise two bytes, with the low 7 bits in the first byte
// and the next 8 bits in the second.
#define DECODE_ULABEL \
    size_t ulab; \
    do { \
        if (ip[0] & 0x80) { \
            ulab = ((ip[0] & 0x7f) | (ip[1] << 7)); \
            ip += 2; \
        } else { \
            ulab = ip[0]; \
            ip += 1; \
        } \
    } while (0)

// Decode a signed label at ip into slab. Same byte layout as DECODE_ULABEL,
// with a bias of 0x40 (one-byte form) or 0x4000 (two-byte form) subtracted.
#define DECODE_SLABEL \
    size_t slab; \
    do { \
        if (ip[0] & 0x80) { \
            slab = ((ip[0] & 0x7f) | (ip[1] << 7)) - 0x4000; \
            ip += 2; \
        } else { \
            slab = ip[0] - 0x40; \
            ip += 1; \
        } \
    } while (0)
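
The two label encodings can be exercised outside the VM with the sketch below. The function names are hypothetical, the signed result is returned as a `ptrdiff_t` rather than stored in a `size_t` as the macro does, and the encoder is an assumption written to match the decoders; it is not taken from this file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Unsigned label: one byte if bit 7 is clear, otherwise two bytes with the
// low 7 bits in the first byte and the next 8 bits in the second.
static size_t sketch_decode_ulabel(const uint8_t **ip_ptr) {
    const uint8_t *ip = *ip_ptr;
    size_t ulab;
    if (ip[0] & 0x80) {
        ulab = (size_t)((ip[0] & 0x7f) | (ip[1] << 7));
        ip += 2;
    } else {
        ulab = ip[0];
        ip += 1;
    }
    *ip_ptr = ip;
    return ulab;
}

// Signed label: same layout, then remove the bias (0x40 or 0x4000).
static ptrdiff_t sketch_decode_slabel(const uint8_t **ip_ptr) {
    const uint8_t *ip = *ip_ptr;
    ptrdiff_t slab;
    if (ip[0] & 0x80) {
        slab = (ptrdiff_t)((ip[0] & 0x7f) | (ip[1] << 7)) - 0x4000;
        ip += 2;
    } else {
        slab = (ptrdiff_t)ip[0] - 0x40;
        ip += 1;
    }
    *ip_ptr = ip;
    return slab;
}

// Hypothetical matching encoder for signed labels. Uses the one-byte form
// when the offset fits in [-0x40, 0x3f], else the two-byte form.
static size_t sketch_encode_slabel(ptrdiff_t offset, uint8_t *out) {
    assert(offset >= -0x4000 && offset < 0x4000);
    if (offset >= -0x40 && offset < 0x40) {
        out[0] = (uint8_t)(offset + 0x40);
        return 1;
    }
    unsigned biased = (unsigned)(offset + 0x4000);
    out[0] = (uint8_t)(0x80 | (biased & 0x7f));
    out[1] = (uint8_t)(biased >> 7);
    return 2;
}
```

Short backward and forward jumps therefore cost a single operand byte, and the two-byte form covers offsets from -0x4000 to 0x3fff.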

#if MICROPY_EMIT_BYTECODE_USES_QSTR_TABLE

#define DECODE_QSTR \
    DECODE_UINT; \
    qstr qst = qstr_table[unum]

#else
|
|
|
|
|
py: Rework bytecode and .mpy file format to be mostly static data.
Background: .mpy files are precompiled .py files, built using mpy-cross,
that contain compiled bytecode functions (and can also contain machine
code). The benefit of using an .mpy file over a .py file is that they are
faster to import and take less memory when importing. They are also
smaller on disk.
But the real benefit of .mpy files comes when they are frozen into the
firmware. This is done by loading the .mpy file during compilation of the
firmware and turning it into a set of big C data structures (the job of
mpy-tool.py), which are then compiled and downloaded into the ROM of a
device. These C data structures can be executed in-place, ie directly from
ROM. This makes importing even faster because there is very little to do,
and also means such frozen modules take up much less RAM (because their
bytecode stays in ROM).
The downside of frozen code is that it requires recompiling and reflashing
the entire firmware. This can be a big barrier to entry, slows down
development time, and makes it harder to do OTA updates of frozen code
(because the whole firmware must be updated).
This commit attempts to solve this problem by providing a solution that
sits between loading .mpy files into RAM and freezing them into the
firmware. The .mpy file format has been reworked so that it consists of
data and bytecode which is mostly static and ready to run in-place. If
these new .mpy files are located in flash/ROM which is memory addressable,
the .mpy file can be executed (mostly) in-place.
With this approach there is still a small amount of unpacking and linking
of the .mpy file that needs to be done when it's imported, but it's still
much better than loading an .mpy from disk into RAM (although not as good
as freezing .mpy files into the firmware).
The main trick to make static .mpy files is to adjust the bytecode so any
qstrs that it references now go through a lookup table to convert from
local qstr number in the module to global qstr number in the firmware.
That means the bytecode does not need linking/rewriting of qstrs when it's
loaded. Instead only a small qstr table needs to be built (and put in RAM)
at import time. This means the bytecode itself is static/constant and can
be used directly if it's in addressable memory. Also the qstr string data
in the .mpy file, and some constant object data, can be used directly.
Note that the qstr table is global to the module (ie not per function).
In more detail, in the VM what used to be (schematically):
qst = DECODE_QSTR_VALUE;
is now (schematically):
idx = DECODE_QSTR_INDEX;
qst = qstr_table[idx];
That allows the bytecode to be fixed at compile time and not need
relinking/rewriting of the qstr values. Only qstr_table needs to be linked
when the .mpy is loaded.
Incidentally, this helps to reduce the size of bytecode because what used
to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices.
If the module uses the same qstr more than two times then the bytecode is
smaller than before.
The following changes are measured for this commit compared to the
previous (the baseline):
- average 7%-9% reduction in size of .mpy files
- frozen code size is reduced by about 5%-7%
- importing .py files uses about 5% less RAM in total
- importing .mpy files uses about 4% less RAM in total
- importing .py and .mpy files takes about the same time as before
The qstr indirection in the bytecode has only a small impact on VM
performance. For stm32 on PYBv1.0 the performance change of this commit
is:
diff of scores (higher is better)
N=100 M=100 baseline -> this-commit diff diff% (error%)
bm_chaos.py 371.07 -> 357.39 : -13.68 = -3.687% (+/-0.02%)
bm_fannkuch.py 78.72 -> 77.49 : -1.23 = -1.563% (+/-0.01%)
bm_fft.py 2591.73 -> 2539.28 : -52.45 = -2.024% (+/-0.00%)
bm_float.py 6034.93 -> 5908.30 : -126.63 = -2.098% (+/-0.01%)
bm_hexiom.py 48.96 -> 47.93 : -1.03 = -2.104% (+/-0.00%)
bm_nqueens.py 4510.63 -> 4459.94 : -50.69 = -1.124% (+/-0.00%)
bm_pidigits.py 650.28 -> 644.96 : -5.32 = -0.818% (+/-0.23%)
core_import_mpy_multi.py 564.77 -> 581.49 : +16.72 = +2.960% (+/-0.01%)
core_import_mpy_single.py 68.67 -> 67.16 : -1.51 = -2.199% (+/-0.01%)
core_qstr.py 64.16 -> 64.12 : -0.04 = -0.062% (+/-0.00%)
core_yield_from.py 362.58 -> 354.50 : -8.08 = -2.228% (+/-0.00%)
misc_aes.py 429.69 -> 405.59 : -24.10 = -5.609% (+/-0.01%)
misc_mandel.py 3485.13 -> 3416.51 : -68.62 = -1.969% (+/-0.00%)
misc_pystone.py 2496.53 -> 2405.56 : -90.97 = -3.644% (+/-0.01%)
misc_raytrace.py 381.47 -> 374.01 : -7.46 = -1.956% (+/-0.01%)
viper_call0.py 576.73 -> 572.49 : -4.24 = -0.735% (+/-0.04%)
viper_call1a.py 550.37 -> 546.21 : -4.16 = -0.756% (+/-0.09%)
viper_call1b.py 438.23 -> 435.68 : -2.55 = -0.582% (+/-0.06%)
viper_call1c.py 442.84 -> 440.04 : -2.80 = -0.632% (+/-0.08%)
viper_call2a.py 536.31 -> 532.35 : -3.96 = -0.738% (+/-0.06%)
viper_call2b.py 382.34 -> 377.07 : -5.27 = -1.378% (+/-0.03%)
And for unix on x64:
diff of scores (higher is better)
N=2000 M=2000 baseline -> this-commit diff diff% (error%)
bm_chaos.py 13594.20 -> 13073.84 : -520.36 = -3.828% (+/-5.44%)
bm_fannkuch.py 60.63 -> 59.58 : -1.05 = -1.732% (+/-3.01%)
bm_fft.py 112009.15 -> 111603.32 : -405.83 = -0.362% (+/-4.03%)
bm_float.py 246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%)
bm_hexiom.py 615.65 -> 617.21 : +1.56 = +0.253% (+/-1.64%)
bm_nqueens.py 215807.95 -> 215600.96 : -206.99 = -0.096% (+/-3.52%)
bm_pidigits.py 8246.74 -> 8422.82 : +176.08 = +2.135% (+/-3.64%)
misc_aes.py 16133.00 -> 16452.74 : +319.74 = +1.982% (+/-1.50%)
misc_mandel.py 128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%)
misc_pystone.py 83811.49 -> 83124.85 : -686.64 = -0.819% (+/-1.03%)
misc_raytrace.py 21688.02 -> 21385.10 : -302.92 = -1.397% (+/-3.20%)
The code size change is (firmware with a lot of frozen code benefits the
most):
bare-arm: +396 +0.697%
minimal x86: +1595 +0.979% [incl +32(data)]
unix x64: +2408 +0.470% [incl +800(data)]
unix nanbox: +1396 +0.309% [incl -96(data)]
stm32: -1256 -0.318% PYBV10
cc3200: +288 +0.157%
esp8266: -260 -0.037% GENERIC
esp32: -216 -0.014% GENERIC[incl -1072(data)]
nrf: +116 +0.067% pca10040
rp2: -664 -0.135% PICO
samd: +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS
As part of this change the .mpy file format version is bumped to version 6.
And mpy-tool.py has been improved to provide a good visualisation of the
contents of .mpy files.
In summary: this commit changes the bytecode to use qstr indirection, and
reworks the .mpy file format to be simpler and allow .mpy files to be
executed in-place. Performance is not impacted too much. Eventually it
will be possible to store such .mpy files in a linear, read-only, memory-
mappable filesystem so they can be executed from flash/ROM. This will
essentially be able to replace frozen code for most applications.
Signed-off-by: Damien George <damien@micropython.org>
2021-10-22 14:22:47 +03:00
#define DECODE_QSTR \
    DECODE_UINT; \
    qstr qst = unum;
2015-11-02 20:27:18 +03:00

#endif
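The commit message above describes the change from decoding a qstr value directly to decoding an index into a per-module table. The following is a minimal sketch of both schemes, assuming a variable-length integer with 7 payload bits per byte of the kind DECODE_UINT reads. All toy_ names and types are hypothetical stand-ins for illustration, not MicroPython's API.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-ins for the real types.
typedef size_t toy_qstr_t;         // global qstr number in the firmware
typedef uint16_t toy_qstr_short_t; // one entry of the per-module table

// Decode a variable-length unsigned integer: 7 bits per byte, most
// significant group first, high bit set on every byte except the last.
static size_t toy_decode_uint(const uint8_t **ip) {
    size_t unum = 0;
    do {
        unum = (unum << 7) | (**ip & 0x7f);
    } while ((*(*ip)++ & 0x80) != 0);
    return unum;
}

// Old scheme (schematically): the bytecode holds the global qstr value,
// so the bytecode itself has to be rewritten when the module is loaded.
static toy_qstr_t toy_decode_qstr_direct(const uint8_t **ip) {
    return (toy_qstr_t)toy_decode_uint(ip);
}

// New scheme (schematically): the bytecode holds a module-local index and
// only the small table is filled in at import time.
static toy_qstr_t toy_decode_qstr_indexed(const uint8_t **ip, const toy_qstr_short_t *qstr_table) {
    size_t idx = toy_decode_uint(ip);
    return qstr_table[idx];
}
```

With the indexed scheme the byte stream can stay in read-only memory, because only qstr_table depends on the firmware that loads the module.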
2021-10-22 14:22:47 +03:00
#define DECODE_PTR \
    DECODE_UINT; \
    void *ptr = (void *)(uintptr_t)code_state->fun_bc->child_table[unum]

#define DECODE_OBJ \
    DECODE_UINT; \
    mp_obj_t obj = (mp_obj_t)code_state->fun_bc->context->constants.obj_table[unum]

2014-01-18 18:10:48 +04:00
#define PUSH(val) *++sp = (val)
#define POP() (*sp--)
2013-12-10 21:27:24 +04:00
#define TOP() (*sp)
#define SET_TOP(val) *sp = (val)
2013-10-04 22:53:11 +04:00

2015-04-26 01:20:49 +03:00
#if MICROPY_PY_SYS_EXC_INFO
2015-11-27 20:01:44 +03:00
#define CLEAR_SYS_EXC_INFO() MP_STATE_VM(cur_exception) = NULL;
2015-04-26 01:20:49 +03:00
#else
#define CLEAR_SYS_EXC_INFO()
#endif

2014-12-22 15:49:57 +03:00
#define PUSH_EXC_BLOCK(with_or_finally) do { \
2014-03-29 04:49:07 +04:00
    DECODE_ULABEL; /* except labels are always forward */ \
    ++exc_sp; \
2014-12-02 22:25:10 +03:00
    exc_sp->handler = ip + ulab; \
2019-01-04 08:40:29 +03:00
    exc_sp->val_sp = MP_TAGPTR_MAKE(sp, ((with_or_finally) << 1)); \
2015-11-27 20:01:44 +03:00
    exc_sp->prev_exc = NULL; \
2014-12-02 22:25:10 +03:00
} while (0)
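PUSH_EXC_BLOCK stores the with_or_finally flag, shifted left by one, alongside the saved stack pointer via MP_TAGPTR_MAKE. The sketch below shows one way such pointer tagging can work, assuming the pointer is at least 4-byte aligned so that its two low bits are free. The toy_tagptr_ helpers are hypothetical and are not the MicroPython macros.

```c
#include <stdint.h>

// Hypothetical helper: pack a small tag (0..3) into the low two bits of a
// pointer whose target is at least 4-byte aligned.
static void *toy_tagptr_make(void *ptr, uintptr_t tag) {
    return (void *)((uintptr_t)ptr | tag);
}

// Recover the original pointer by masking off the two tag bits.
static void *toy_tagptr_ptr(void *tagged) {
    return (void *)((uintptr_t)tagged & ~(uintptr_t)3);
}

// Recover the tag bits on their own.
static uintptr_t toy_tagptr_tag(void *tagged) {
    return (uintptr_t)tagged & 3;
}
```

Packing the flag this way avoids a separate field in each exception stack entry, at the cost of a mask whenever the pointer is read back.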
2014-03-29 04:49:07 +04:00

2014-03-30 01:16:27 +04:00
#define POP_EXC_BLOCK() \
2015-04-26 01:20:49 +03:00
    exc_sp--; /* pop back to previous exception handler */ \
    CLEAR_SYS_EXC_INFO() /* just clear sys.exc_info(), not compliant, but it shouldn't be used in 1st place */
2014-03-30 01:16:27 +04:00

2019-09-27 17:07:21 +03:00
#define CANCEL_ACTIVE_FINALLY(sp) do { \
    if (mp_obj_is_small_int(sp[-1])) { \
        /* Stack: (..., prev_dest_ip, prev_cause, dest_ip) */ \
        /* Cancel the unwind through the previous finally, replace with current one */ \
        sp[-2] = sp[0]; \
        sp -= 2; \
    } else { \
        assert(sp[-1] == mp_const_none || mp_obj_is_exception_instance(sp[-1])); \
        /* Stack: (..., None/exception, dest_ip) */ \
        /* Silence the finally's exception value (may be None or an exception) */ \
        sp[-1] = sp[0]; \
        --sp; \
    } \
} while (0)

2019-08-14 17:09:36 +03:00
#if MICROPY_PY_SYS_SETTRACE

#define FRAME_SETUP() do { \
    assert(code_state != code_state->prev_state); \
    MP_STATE_THREAD(current_code_state) = code_state; \
    assert(code_state != code_state->prev_state); \
} while(0)

#define FRAME_ENTER() do { \
    assert(code_state != code_state->prev_state); \
    code_state->prev_state = MP_STATE_THREAD(current_code_state); \
    assert(code_state != code_state->prev_state); \
    if (!mp_prof_is_executing) { \
        mp_prof_frame_enter(code_state); \
    } \
} while(0)

#define FRAME_LEAVE() do { \
    assert(code_state != code_state->prev_state); \
    MP_STATE_THREAD(current_code_state) = code_state->prev_state; \
    assert(code_state != code_state->prev_state); \
} while(0)

#define FRAME_UPDATE() do { \
    assert(MP_STATE_THREAD(current_code_state) == code_state); \
    if (!mp_prof_is_executing) { \
        code_state->frame = MP_OBJ_TO_PTR(mp_prof_frame_update(code_state)); \
    } \
} while(0)

#define TRACE_TICK(current_ip, current_sp, is_exception) do { \
    assert(code_state != code_state->prev_state); \
    assert(MP_STATE_THREAD(current_code_state) == code_state); \
    if (!mp_prof_is_executing && code_state->frame && MP_STATE_THREAD(prof_trace_callback)) { \
        MP_PROF_INSTR_DEBUG_PRINT(code_state->ip); \
    } \
    if (!mp_prof_is_executing && code_state->frame && code_state->frame->callback) { \
        mp_prof_instr_tick(code_state, is_exception); \
    } \
} while(0)

#else // MICROPY_PY_SYS_SETTRACE
#define FRAME_SETUP()
#define FRAME_ENTER()
#define FRAME_LEAVE()
#define FRAME_UPDATE()
#define TRACE_TICK(current_ip, current_sp, is_exception)
#endif // MICROPY_PY_SYS_SETTRACE

2014-01-18 18:10:48 +04:00
// fastn has items in reverse order (fastn[0] is local[0], fastn[-1] is local[1], etc)
// sp points to bottom of stack which grows up
2014-02-16 02:55:00 +04:00
// returns:
// MP_VM_RETURN_NORMAL, sp valid, return value in *sp
// MP_VM_RETURN_YIELD, ip, sp valid, yielded value in *sp
2019-01-04 09:22:40 +03:00
// MP_VM_RETURN_EXCEPTION, exception in state[0]
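The comments above say that fastn indexes locals in reverse order. A minimal sketch of that indexing, assuming a single state array in which the locals occupy the highest slots (the layout implied by `fastn = &code_state->state[n_state - 1]` later in this function). The toy_frame_t type and toy_local helper are hypothetical and only illustrate the arithmetic.

```c
#include <stddef.h>

// Hypothetical toy frame: n_state slots are in use (n_state <= 8 here).
typedef struct {
    size_t n_state;
    int state[8];
} toy_frame_t;

// Local number idx lives at fastn[-idx], where fastn points at the last
// used slot, so local 0 is state[n_state - 1], local 1 is state[n_state - 2].
static int *toy_local(toy_frame_t *f, size_t idx) {
    int *fastn = &f->state[f->n_state - 1];
    return &fastn[-(ptrdiff_t)idx];
}
```

Storing the locals downwards from the top leaves the low end of the same array free for a value stack that grows upwards.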
2021-09-24 05:49:51 +03:00
mp_vm_return_kind_t MICROPY_WRAP_MP_EXECUTE_BYTECODE(mp_execute_bytecode)(mp_code_state_t *code_state, volatile mp_obj_t inject_exc) {
2022-07-06 09:03:16 +03:00

2014-12-28 08:17:43 +03:00
#define SELECTIVE_EXC_IP (0)
2022-07-06 09:03:16 +03:00
    // When disabled, code_state->ip is updated unconditionally during op
    // dispatch, and this is subsequently used in the exception handler
    // (either NLR jump or direct RAISE). This is good for code size because it
    // happens in a single place but is more work than necessary, as many opcodes
    // cannot raise. Enabling SELECTIVE_EXC_IP means that code_state->ip
    // is "selectively" updated only during handling of opcodes that might raise.
    // This costs about 360 bytes on PYBV11 for a 1-3% performance gain (e.g. 3%
    // in bm_fft.py). On rp2040, there is zero code size diff for a 0-1% gain.
    // (Both with computed goto enabled).
2014-12-28 08:17:43 +03:00
#if SELECTIVE_EXC_IP
2022-07-06 09:03:16 +03:00
    // Note: Because ip has already been advanced by one byte in the dispatch, the
    // value of ip here is one byte past the last opcode.
#define MARK_EXC_IP_SELECTIVE() { code_state->ip = ip; }
    // No need to update in dispatch.
2014-12-28 08:17:43 +03:00
#define MARK_EXC_IP_GLOBAL()
#else
#define MARK_EXC_IP_SELECTIVE()
2022-07-06 09:03:16 +03:00
    // Immediately before dispatch, save the current ip, which will be the opcode
    // about to be dispatched.
#define MARK_EXC_IP_GLOBAL() { code_state->ip = ip; }
2014-12-28 08:17:43 +03:00
#endif
2022-07-06 09:03:16 +03:00

2014-05-21 23:32:59 +04:00
#if MICROPY_OPT_COMPUTED_GOTO
2015-01-01 23:27:54 +03:00
#include "py/vmentrytable.h"
2014-04-15 11:57:01 +04:00
#define DISPATCH() do { \
2014-04-23 04:40:24 +04:00
    TRACE(ip); \
2014-12-28 08:17:43 +03:00
    MARK_EXC_IP_GLOBAL(); \
2019-08-14 17:09:36 +03:00
    TRACE_TICK(ip, sp, false); \
2014-04-27 21:19:06 +04:00
    goto *entry_table[*ip++]; \
2015-04-09 18:29:54 +03:00
} while (0)
2014-10-25 21:19:55 +04:00
#define DISPATCH_WITH_PEND_EXC_CHECK() goto pending_exception_check
2014-04-15 11:57:01 +04:00
#define ENTRY(op) entry_##op
#define ENTRY_DEFAULT entry_default
2014-04-14 19:22:44 +04:00
#else
2018-05-18 04:47:03 +03:00
#define DISPATCH() goto dispatch_loop
2014-10-25 21:19:55 +04:00
#define DISPATCH_WITH_PEND_EXC_CHECK() goto pending_exception_check
2014-04-15 11:57:01 +04:00
#define ENTRY(op) case op
#define ENTRY_DEFAULT default
2014-04-14 19:22:44 +04:00
#endif
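The two sets of macros above let the same opcode bodies compile either as targets of a computed goto or as cases of a switch. A minimal sketch of that pattern on a three-opcode toy interpreter follows. It assumes the GCC/Clang "labels as values" extension when `__GNUC__` is defined and falls back to a switch otherwise. The opcodes, TOY_ macros and toy_run are hypothetical, and the sketch assumes well-formed bytecode that ends with TOY_OP_HALT.

```c
#include <stddef.h>
#include <stdint.h>

enum { TOY_OP_PUSH1, TOY_OP_ADD, TOY_OP_HALT };

#if defined(__GNUC__)
#define TOY_USE_COMPUTED_GOTO (1)
#else
#define TOY_USE_COMPUTED_GOTO (0)
#endif

static int toy_run(const uint8_t *ip) {
    int stack[9];
    int *sp = stack; // stack[0] is unused so sp never points before the array

#if TOY_USE_COMPUTED_GOTO
    // One label address per opcode, indexed by the opcode value.
    static const void *const entry_table[] = {
        &&entry_TOY_OP_PUSH1, &&entry_TOY_OP_ADD, &&entry_TOY_OP_HALT,
    };
#define TOY_DISPATCH() goto *entry_table[*ip++]
#define TOY_ENTRY(op) entry_##op
    TOY_DISPATCH();
#else
#define TOY_DISPATCH() goto dispatch_loop
#define TOY_ENTRY(op) case op
dispatch_loop:
    switch (*ip++) {
#endif

    TOY_ENTRY(TOY_OP_PUSH1):
        *++sp = 1;
        TOY_DISPATCH();

    TOY_ENTRY(TOY_OP_ADD): {
        int rhs = *sp--;
        *sp += rhs;
        TOY_DISPATCH();
    }

    TOY_ENTRY(TOY_OP_HALT):
        return *sp;

#if !TOY_USE_COMPUTED_GOTO
    }
    return -1; // unknown opcode
#endif
}
```

In the computed-goto build each opcode body ends with its own indirect jump instead of returning to one shared switch, which is the usual motivation for this dispatch style.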
py: Tidy up variables in VM, probably fixes subtle bugs.
Things get tricky when using the nlr code to catch exceptions. Need to
ensure that the variables (stack layout) in the exception handler are
the same as in the bit protected by the exception handler.
Prior to this patch there were a few bugs. 1) The constant
mp_const_MemoryError_obj was being preloaded to a specific location on
the stack at the start of the function. But this location on the stack
was being overwritten in the opcode loop (since it didn't think that
variable would ever be referenced again), and so when an exception
occurred, the variable holding the address of MemoryError was corrupt.
2) The FOR_ITER opcode detection in the exception handler used sp, which
may or may not contain the right value coming out of the main opcode
loop.
With this patch there is a clear separation of variables used in the
opcode loop and in the exception handler (should fix issue (2) above).
Furthermore, nlr_raise is no longer used in the opcode loop. Instead,
it jumps directly into the exception handler. This tells the C compiler
more about the possible code flow, and means that it should have the
same stack layout for the exception handler. This should fix issue (1)
above. Indeed, the generated (ARM) assembler has been checked explicitly,
and with 'goto exception_handler', the problem with &MemoryError is
fixed.
This may now fix problems with rge-sm, and probably many other subtle
bugs yet to show themselves. Incidentally, rge-sm now passes on
pyboard (with a reduced range of integration)!
Main lesson: nlr is tricky. Don't use nlr_push unless you know what you
are doing! Luckily, it's not used in many places. Using nlr_raise/jump
is fine.
2014-04-17 19:50:23 +04:00
    // nlr_raise needs to be implemented as a goto, so that the C compiler's flow analyser
    // sees that it's possible for us to jump from the dispatch loop to the exception
    // handler. Without this, the code may have a different stack layout in the dispatch
    // loop and the exception handler, leading to very obscure bugs.
2015-11-27 20:01:44 +03:00
#define RAISE(o) do { nlr_pop(); nlr.ret_val = MP_OBJ_TO_PTR(o); goto exception_handler; } while (0)
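The same structure can be sketched with the standard setjmp/longjmp pair standing in for nlr_push/nlr_jump, which is an assumption made for illustration only. The sketch shows the two points made above: state read by the handler is declared volatile, and a raise from inside the protected region is an ordinary goto into the handler that the compiler can see. The real RAISE also calls nlr_pop() first, which has no setjmp counterpart. All toy_ names are hypothetical.

```c
#include <setjmp.h>

static jmp_buf toy_env;

// Raises a toy "exception" for odd inputs via a non-local jump.
static void toy_may_raise(int i) {
    if (i % 2 == 1) {
        longjmp(toy_env, 1);
    }
}

// Processes items 0..n-1 and returns how many exceptions were handled.
static int toy_run(int n) {
    // Modified after setjmp and read after longjmp, so they must be
    // volatile to have well-defined values in the handler.
    volatile int i = 0;
    volatile int handled = 0;

    for (;;) {
        if (setjmp(toy_env) == 0) {
            for (; i < n; ++i) {
                if (i == 4) {
                    // Direct raise: a plain goto into the handler.
                    goto exception_handler;
                }
                toy_may_raise(i);
            }
            return handled;
        } else {
exception_handler:
            ++handled;
            ++i; // skip the item that raised, then re-arm in the outer loop
        }
    }
}
```

Both routes into the handler see the same i and handled, which is the property the comment above is concerned with.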
2013-10-04 22:53:11 +04:00

2015-03-28 02:14:44 +03:00
#if MICROPY_STACKLESS
run_code_state: ;
#endif
2019-08-14 17:09:36 +03:00
    FRAME_ENTER();

#if MICROPY_STACKLESS
run_code_state_from_return: ;
#endif
    FRAME_SETUP();

2014-06-07 17:16:08 +04:00
    // Pointers which are constant for particular invocation of mp_execute_bytecode()
2017-07-18 09:17:23 +03:00
    mp_obj_t * /*const*/ fastn;
    mp_exc_stack_t * /*const*/ exc_stack;
    {
2019-09-24 08:57:08 +03:00
        size_t n_state = code_state->n_state;
2017-07-18 09:17:23 +03:00
        fastn = &code_state->state[n_state - 1];
        exc_stack = (mp_exc_stack_t*)(code_state->state + n_state);
    }
2014-05-31 17:50:46 +04:00

2014-04-17 19:50:23 +04:00
    // variables that are visible to the exception handler (declared volatile)
2019-09-23 16:37:01 +03:00
    mp_exc_stack_t *volatile exc_sp = MP_CODE_STATE_EXC_SP_IDX_TO_PTR(exc_stack, code_state->exc_sp_idx); // stack grows up, exc_sp points to top of stack
2013-10-16 02:46:01 +04:00

2017-02-06 02:50:43 +03:00
#if MICROPY_PY_THREAD_GIL && MICROPY_PY_THREAD_GIL_VM_DIVISOR
    // This needs to be volatile and outside the VM loop so it persists across handling
    // of any exceptions. Otherwise it's possible that the VM never gives up the GIL.
    volatile int gil_divisor = MICROPY_PY_THREAD_GIL_VM_DIVISOR;
#endif

2013-10-16 01:25:17 +04:00
    // outer exception handling loop
2013-10-04 22:53:11 +04:00
    for (;;) {
2014-04-17 19:50:23 +04:00
        nlr_buf_t nlr;
2014-03-26 22:37:06 +04:00
outer_dispatch_loop:
2013-10-16 01:25:17 +04:00
        if (nlr_push(&nlr) == 0) {
2014-04-17 19:50:23 +04:00
            // local variables that are not visible to the exception handler
2014-05-31 17:50:46 +04:00
            const byte *ip = code_state->ip;
            mp_obj_t *sp = code_state->sp;
2021-10-22 14:22:47 +03:00
#if MICROPY_EMIT_BYTECODE_USES_QSTR_TABLE
            const qstr_short_t *qstr_table = code_state->fun_bc->context->constants.qstr_table;
#endif
2014-05-26 01:58:04 +04:00
            mp_obj_t obj_shared;
2016-02-16 01:46:21 +03:00
            MICROPY_VM_HOOK_INIT
2014-04-17 19:50:23 +04:00
|
|
|
|
2014-03-22 19:50:12 +04:00
|
|
|
// If we have an exception to inject, now that we have finished setting up
|
2019-12-20 06:57:06 +03:00
|
|
|
// execution context, raise it. This works as if MP_BC_RAISE_OBJ
|
2014-03-22 19:50:12 +04:00
|
|
|
// bytecode was executed.
|
2014-03-26 19:36:12 +04:00
|
|
|
// Injecting exc into yield from generator is a special case,
|
|
|
|
// handled by MP_BC_YIELD_FROM itself
|
|
|
|
if (inject_exc != MP_OBJ_NULL && *ip != MP_BC_YIELD_FROM) {
|
2014-05-26 01:58:04 +04:00
|
|
|
mp_obj_t exc = inject_exc;
|
2014-03-22 19:50:12 +04:00
|
|
|
inject_exc = MP_OBJ_NULL;
|
2014-05-26 01:58:04 +04:00
|
|
|
exc = mp_make_raise_obj(exc);
|
|
|
|
RAISE(exc);
|
2014-03-22 19:50:12 +04:00
|
|
|
}
|
2014-04-17 19:50:23 +04:00
|
|
|
|
2013-10-16 01:25:17 +04:00
|
|
|
// loop to execute byte code
|
|
|
|
for (;;) {
|
2014-02-01 02:55:05 +04:00
|
|
|
dispatch_loop:
|
2022-07-12 15:46:35 +03:00
|
|
|
#if MICROPY_OPT_COMPUTED_GOTO
|
2014-04-14 19:22:44 +04:00
|
|
|
DISPATCH();
|
2022-07-12 15:46:35 +03:00
|
|
|
#else
|
2014-04-23 04:40:24 +04:00
|
|
|
TRACE(ip);
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_GLOBAL();
|
2019-08-14 17:09:36 +03:00
|
|
|
TRACE_TICK(ip, sp, false);
|
2014-04-27 21:19:06 +04:00
|
|
|
switch (*ip++) {
|
2022-07-12 15:46:35 +03:00
|
|
|
#endif
|
2014-04-09 18:26:46 +04:00
|
|
|
|
2014-04-14 19:22:44 +04:00
|
|
|
ENTRY(MP_BC_LOAD_CONST_FALSE):
|
|
|
|
PUSH(mp_const_false);
|
|
|
|
DISPATCH();
|
2014-04-09 18:26:46 +04:00
|
|
|
|
2014-04-14 19:22:44 +04:00
|
|
|
ENTRY(MP_BC_LOAD_CONST_NONE):
|
|
|
|
PUSH(mp_const_none);
|
|
|
|
DISPATCH();
|
2014-03-23 23:19:02 +04:00
|
|
|
|
2014-04-14 19:22:44 +04:00
|
|
|
ENTRY(MP_BC_LOAD_CONST_TRUE):
|
|
|
|
PUSH(mp_const_true);
|
|
|
|
DISPATCH();
|
2014-04-09 00:11:49 +04:00
|
|
|
|
2014-04-14 19:22:44 +04:00
|
|
|
ENTRY(MP_BC_LOAD_CONST_SMALL_INT): {
|
2021-06-08 15:45:56 +03:00
|
|
|
mp_uint_t num = 0;
|
2014-04-14 19:22:44 +04:00
|
|
|
if ((ip[0] & 0x40) != 0) {
|
|
|
|
// Number is negative
|
|
|
|
num--;
|
|
|
|
}
|
|
|
|
do {
|
|
|
|
num = (num << 7) | (*ip & 0x7f);
|
|
|
|
} while ((*ip++ & 0x80) != 0);
|
|
|
|
PUSH(MP_OBJ_NEW_SMALL_INT(num));
|
|
|
|
DISPATCH();
|
|
|
|
}
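The loop in MP_BC_LOAD_CONST_SMALL_INT reads a signed integer stored as 7-bit groups, most significant group first. The sketch below isolates that decoding so it can be tried on its own. The function name `decode_small_int` and the use of `intptr_t`/`uintptr_t` in place of the VM's integer types are assumptions made for illustration; they are not part of the source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical standalone version of the LOAD_CONST_SMALL_INT decoder.
// Each byte carries 7 payload bits, most significant group first.
// Bit 0x80 means "more bytes follow"; bit 0x40 of the first byte is the sign.
static intptr_t decode_small_int(const uint8_t **ip_ptr) {
    const uint8_t *ip = *ip_ptr;
    assert(ip != NULL);
    uintptr_t num = 0;
    if ((ip[0] & 0x40) != 0) {
        // Negative number: start from all ones so the shifts sign-extend.
        num--;
    }
    do {
        num = (num << 7) | (*ip & 0x7f);
    } while ((*ip++ & 0x80) != 0);
    *ip_ptr = ip; // tell the caller how many bytes were consumed
    return (intptr_t)num;
}
```

With this layout a single byte covers -64 to 63. A value such as 64 needs two bytes (`0x80 0x40`) because bit 0x40 of the first byte is reserved for the sign.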
|
2013-10-16 01:25:17 +04:00
|
|
|
|
2014-05-26 01:58:04 +04:00
|
|
|
ENTRY(MP_BC_LOAD_CONST_STRING): {
|
2014-04-14 19:22:44 +04:00
|
|
|
DECODE_QSTR;
|
2015-06-25 16:58:41 +03:00
|
|
|
PUSH(MP_OBJ_NEW_QSTR(qst));
|
2014-04-14 19:22:44 +04:00
|
|
|
DISPATCH();
|
2014-05-26 01:58:04 +04:00
|
|
|
}
|
2014-04-14 19:22:44 +04:00
|
|
|
|
2015-01-13 18:55:54 +03:00
|
|
|
ENTRY(MP_BC_LOAD_CONST_OBJ): {
|
2015-11-27 20:01:44 +03:00
|
|
|
DECODE_OBJ;
|
|
|
|
PUSH(obj);
|
2015-01-13 18:55:54 +03:00
|
|
|
DISPATCH();
|
|
|
|
}
|
|
|
|
|
2014-04-14 19:22:44 +04:00
|
|
|
ENTRY(MP_BC_LOAD_NULL):
|
|
|
|
PUSH(MP_OBJ_NULL);
|
|
|
|
DISPATCH();
|
|
|
|
|
2014-12-02 22:25:10 +03:00
|
|
|
ENTRY(MP_BC_LOAD_FAST_N): {
|
2014-04-14 19:22:44 +04:00
|
|
|
DECODE_UINT;
|
2014-05-26 01:58:04 +04:00
|
|
|
obj_shared = fastn[-unum];
|
2014-04-14 19:22:44 +04:00
|
|
|
load_check:
|
2014-05-26 01:58:04 +04:00
|
|
|
if (obj_shared == MP_OBJ_NULL) {
|
|
|
|
local_name_error: {
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2020-03-02 14:35:22 +03:00
|
|
|
mp_obj_t obj = mp_obj_new_exception_msg(&mp_type_NameError, MP_ERROR_TEXT("local variable referenced before assignment"));
|
2014-05-26 01:58:04 +04:00
|
|
|
RAISE(obj);
|
|
|
|
}
|
2014-04-14 19:22:44 +04:00
|
|
|
}
|
2014-05-26 01:58:04 +04:00
|
|
|
PUSH(obj_shared);
|
2014-04-14 19:22:44 +04:00
|
|
|
DISPATCH();
|
2014-12-02 22:25:10 +03:00
|
|
|
}
|
2014-04-14 19:22:44 +04:00
|
|
|
|
2014-12-02 22:25:10 +03:00
|
|
|
ENTRY(MP_BC_LOAD_DEREF): {
|
2014-04-14 19:22:44 +04:00
|
|
|
DECODE_UINT;
|
2014-05-26 01:58:04 +04:00
|
|
|
obj_shared = mp_obj_cell_get(fastn[-unum]);
|
2014-04-14 19:22:44 +04:00
|
|
|
goto load_check;
|
2014-12-02 22:25:10 +03:00
|
|
|
}
|
2014-04-14 19:22:44 +04:00
|
|
|
|
2014-05-26 01:58:04 +04:00
|
|
|
ENTRY(MP_BC_LOAD_NAME): {
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2014-04-14 19:22:44 +04:00
|
|
|
DECODE_QSTR;
|
|
|
|
PUSH(mp_load_name(qst));
|
|
|
|
DISPATCH();
|
2014-05-26 01:58:04 +04:00
|
|
|
}
|
2014-04-14 19:22:44 +04:00
|
|
|
|
2014-05-26 01:58:04 +04:00
|
|
|
ENTRY(MP_BC_LOAD_GLOBAL): {
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2014-04-14 19:22:44 +04:00
|
|
|
DECODE_QSTR;
|
|
|
|
PUSH(mp_load_global(qst));
|
|
|
|
DISPATCH();
|
2014-05-26 01:58:04 +04:00
|
|
|
}
|
2014-04-14 19:22:44 +04:00
|
|
|
|
2014-05-26 01:58:04 +04:00
|
|
|
ENTRY(MP_BC_LOAD_ATTR): {
|
2019-08-14 17:09:36 +03:00
|
|
|
FRAME_UPDATE();
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2014-04-14 19:22:44 +04:00
|
|
|
DECODE_QSTR;
|
2021-08-19 15:46:40 +03:00
|
|
|
mp_obj_t top = TOP();
|
|
|
|
mp_obj_t obj;
|
|
|
|
#if MICROPY_OPT_LOAD_ATTR_FAST_PATH
|
|
|
|
// For the specific case of an instance type, it implements .attr
|
|
|
|
// and forwards to its members map. Attribute lookups on instance
|
|
|
|
// types are extremely common, so avoid all the other checks and
|
|
|
|
// calls that normally happen first.
|
|
|
|
mp_map_elem_t *elem = NULL;
|
|
|
|
if (mp_obj_is_instance_type(mp_obj_get_type(top))) {
|
|
|
|
mp_obj_instance_t *self = MP_OBJ_TO_PTR(top);
|
|
|
|
elem = mp_map_lookup(&self->members, MP_OBJ_NEW_QSTR(qst), MP_MAP_LOOKUP);
|
|
|
|
}
|
|
|
|
if (elem) {
|
|
|
|
obj = elem->value;
|
|
|
|
} else
|
|
|
|
#endif
|
|
|
|
{
|
|
|
|
obj = mp_load_attr(top, qst);
|
|
|
|
}
|
|
|
|
SET_TOP(obj);
|
2014-04-14 19:22:44 +04:00
|
|
|
DISPATCH();
|
2014-05-26 01:58:04 +04:00
|
|
|
}
|
2014-04-14 19:22:44 +04:00
|
|
|
|
2014-05-26 01:58:04 +04:00
|
|
|
ENTRY(MP_BC_LOAD_METHOD): {
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2014-04-14 19:22:44 +04:00
|
|
|
DECODE_QSTR;
|
|
|
|
mp_load_method(*sp, qst, sp);
|
|
|
|
sp += 1;
|
|
|
|
DISPATCH();
|
2014-05-26 01:58:04 +04:00
|
|
|
}
|
2014-04-14 19:22:44 +04:00
|
|
|
|
2017-04-19 02:45:59 +03:00
|
|
|
ENTRY(MP_BC_LOAD_SUPER_METHOD): {
|
|
|
|
MARK_EXC_IP_SELECTIVE();
|
|
|
|
DECODE_QSTR;
|
|
|
|
sp -= 1;
|
|
|
|
mp_load_super_method(qst, sp - 1);
|
|
|
|
DISPATCH();
|
|
|
|
}
|
|
|
|
|
2014-04-14 19:22:44 +04:00
|
|
|
ENTRY(MP_BC_LOAD_BUILD_CLASS):
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2014-04-14 19:22:44 +04:00
|
|
|
PUSH(mp_load_build_class());
|
|
|
|
DISPATCH();
|
|
|
|
|
2014-05-26 01:58:04 +04:00
|
|
|
ENTRY(MP_BC_LOAD_SUBSCR): {
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2014-05-26 01:58:04 +04:00
|
|
|
mp_obj_t index = POP();
|
|
|
|
SET_TOP(mp_obj_subscr(TOP(), index, MP_OBJ_SENTINEL));
|
2014-04-18 01:10:53 +04:00
|
|
|
DISPATCH();
|
2014-05-26 01:58:04 +04:00
|
|
|
}
|
2014-04-18 01:10:53 +04:00
|
|
|
|
2014-12-02 22:25:10 +03:00
|
|
|
ENTRY(MP_BC_STORE_FAST_N): {
|
2014-04-14 19:22:44 +04:00
|
|
|
DECODE_UINT;
|
|
|
|
fastn[-unum] = POP();
|
|
|
|
DISPATCH();
|
2014-12-02 22:25:10 +03:00
|
|
|
}
|
2014-04-14 19:22:44 +04:00
|
|
|
|
2014-12-02 22:25:10 +03:00
|
|
|
ENTRY(MP_BC_STORE_DEREF): {
|
2014-04-14 19:22:44 +04:00
|
|
|
DECODE_UINT;
|
|
|
|
mp_obj_cell_set(fastn[-unum], POP());
|
|
|
|
DISPATCH();
|
2014-12-02 22:25:10 +03:00
|
|
|
}
|
2014-04-14 19:22:44 +04:00
|
|
|
|
2014-05-26 01:58:04 +04:00
|
|
|
ENTRY(MP_BC_STORE_NAME): {
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2014-04-14 19:22:44 +04:00
|
|
|
DECODE_QSTR;
|
|
|
|
mp_store_name(qst, POP());
|
|
|
|
DISPATCH();
|
2014-05-26 01:58:04 +04:00
|
|
|
}
|
2014-04-14 19:22:44 +04:00
|
|
|
|
2014-05-26 01:58:04 +04:00
|
|
|
ENTRY(MP_BC_STORE_GLOBAL): {
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2014-04-14 19:22:44 +04:00
|
|
|
DECODE_QSTR;
|
|
|
|
mp_store_global(qst, POP());
|
|
|
|
DISPATCH();
|
2014-05-26 01:58:04 +04:00
|
|
|
}
|
2014-04-14 19:22:44 +04:00
|
|
|
|
2014-05-26 01:58:04 +04:00
|
|
|
ENTRY(MP_BC_STORE_ATTR): {
|
2019-08-14 17:09:36 +03:00
|
|
|
FRAME_UPDATE();
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2014-04-14 19:22:44 +04:00
|
|
|
DECODE_QSTR;
|
|
|
|
mp_store_attr(sp[0], qst, sp[-1]);
|
|
|
|
sp -= 2;
|
|
|
|
DISPATCH();
|
2014-05-26 01:58:04 +04:00
|
|
|
}
|
2014-04-14 19:22:44 +04:00
|
|
|
|
|
|
|
ENTRY(MP_BC_STORE_SUBSCR):
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2014-04-18 01:10:53 +04:00
|
|
|
mp_obj_subscr(sp[-1], sp[0], sp[-2]);
|
2014-04-14 19:22:44 +04:00
|
|
|
sp -= 3;
|
|
|
|
DISPATCH();
|
|
|
|
|
2014-12-02 22:25:10 +03:00
|
|
|
ENTRY(MP_BC_DELETE_FAST): {
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2014-04-14 19:22:44 +04:00
|
|
|
DECODE_UINT;
|
|
|
|
if (fastn[-unum] == MP_OBJ_NULL) {
|
|
|
|
goto local_name_error;
|
|
|
|
}
|
|
|
|
fastn[-unum] = MP_OBJ_NULL;
|
|
|
|
DISPATCH();
|
2014-12-02 22:25:10 +03:00
|
|
|
}
|
2013-10-16 01:25:17 +04:00
|
|
|
|
2014-12-02 22:25:10 +03:00
|
|
|
ENTRY(MP_BC_DELETE_DEREF): {
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2014-04-14 19:22:44 +04:00
|
|
|
DECODE_UINT;
|
|
|
|
if (mp_obj_cell_get(fastn[-unum]) == MP_OBJ_NULL) {
|
|
|
|
goto local_name_error;
|
|
|
|
}
|
|
|
|
mp_obj_cell_set(fastn[-unum], MP_OBJ_NULL);
|
|
|
|
DISPATCH();
|
2014-12-02 22:25:10 +03:00
|
|
|
}
|
2014-04-14 19:22:44 +04:00
|
|
|
|
2014-05-26 01:58:04 +04:00
|
|
|
ENTRY(MP_BC_DELETE_NAME): {
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2014-04-14 19:22:44 +04:00
|
|
|
DECODE_QSTR;
|
|
|
|
mp_delete_name(qst);
|
|
|
|
DISPATCH();
|
2014-05-26 01:58:04 +04:00
|
|
|
}
|
2014-04-14 19:22:44 +04:00
|
|
|
|
2014-05-26 01:58:04 +04:00
|
|
|
ENTRY(MP_BC_DELETE_GLOBAL): {
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2014-04-14 19:22:44 +04:00
|
|
|
DECODE_QSTR;
|
|
|
|
mp_delete_global(qst);
|
|
|
|
DISPATCH();
|
2014-05-26 01:58:04 +04:00
|
|
|
}
|
2014-04-14 19:22:44 +04:00
|
|
|
|
2014-05-26 01:58:04 +04:00
|
|
|
ENTRY(MP_BC_DUP_TOP): {
|
|
|
|
mp_obj_t top = TOP();
|
|
|
|
PUSH(top);
|
2014-04-14 19:22:44 +04:00
|
|
|
DISPATCH();
|
2014-05-26 01:58:04 +04:00
|
|
|
}
|
2014-04-14 19:22:44 +04:00
|
|
|
|
|
|
|
ENTRY(MP_BC_DUP_TOP_TWO):
|
|
|
|
sp += 2;
|
|
|
|
sp[0] = sp[-2];
|
|
|
|
sp[-1] = sp[-3];
|
|
|
|
DISPATCH();
|
|
|
|
|
|
|
|
ENTRY(MP_BC_POP_TOP):
|
|
|
|
sp -= 1;
|
|
|
|
DISPATCH();
|
|
|
|
|
2014-05-26 01:58:04 +04:00
|
|
|
ENTRY(MP_BC_ROT_TWO): {
|
|
|
|
mp_obj_t top = sp[0];
|
2014-04-14 19:22:44 +04:00
|
|
|
sp[0] = sp[-1];
|
2014-05-26 01:58:04 +04:00
|
|
|
sp[-1] = top;
|
2014-04-14 19:22:44 +04:00
|
|
|
DISPATCH();
|
2014-05-26 01:58:04 +04:00
|
|
|
}
|
2014-04-14 19:22:44 +04:00
|
|
|
|
2014-05-26 01:58:04 +04:00
|
|
|
ENTRY(MP_BC_ROT_THREE): {
|
|
|
|
mp_obj_t top = sp[0];
|
2014-04-14 19:22:44 +04:00
|
|
|
sp[0] = sp[-1];
|
|
|
|
sp[-1] = sp[-2];
|
2014-05-26 01:58:04 +04:00
|
|
|
sp[-2] = top;
|
2014-04-14 19:22:44 +04:00
|
|
|
DISPATCH();
|
2014-05-26 01:58:04 +04:00
|
|
|
}
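The stack-shuffling opcodes above (DUP_TOP_TWO, ROT_TWO, ROT_THREE) all rely on `sp` pointing at the current top element, so `sp[0]` is the top of stack and `sp[-1]` is the element below it. A minimal sketch of the same index arithmetic on a plain `int` stack follows. The helper names are hypothetical, and `int` stands in for `mp_obj_t`.

```c
#include <assert.h>

// Hypothetical helpers mirroring the VM's stack opcodes on an int stack.
// In each case sp points at the top element, as in the VM.

// ROT_TWO: swap the two topmost elements.
static int *rot_two(int *sp) {
    int top = sp[0];
    sp[0] = sp[-1];
    sp[-1] = top;
    return sp;
}

// ROT_THREE: move the top element two places down, lifting the others.
static int *rot_three(int *sp) {
    int top = sp[0];
    sp[0] = sp[-1];
    sp[-1] = sp[-2];
    sp[-2] = top;
    return sp;
}

// DUP_TOP_TWO: grow the stack by two and copy the previous top pair.
static int *dup_top_two(int *sp) {
    sp += 2;
    sp[0] = sp[-2];
    sp[-1] = sp[-3];
    return sp;
}
```

Starting from a stack of (10, 20, 30) with 30 on top, `rot_two` gives (10, 30, 20), a following `rot_three` gives (20, 10, 30), and `dup_top_two` then gives (20, 10, 30, 10, 30).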
|
2014-04-14 19:22:44 +04:00
|
|
|
|
2014-12-02 22:25:10 +03:00
|
|
|
ENTRY(MP_BC_JUMP): {
|
2014-04-14 19:22:44 +04:00
|
|
|
DECODE_SLABEL;
|
2014-12-02 22:25:10 +03:00
|
|
|
ip += slab;
|
2014-10-25 21:19:55 +04:00
|
|
|
DISPATCH_WITH_PEND_EXC_CHECK();
|
2014-12-02 22:25:10 +03:00
|
|
|
}
|
2014-04-14 19:22:44 +04:00
|
|
|
|
2014-12-02 22:25:10 +03:00
|
|
|
ENTRY(MP_BC_POP_JUMP_IF_TRUE): {
|
2014-04-14 19:22:44 +04:00
|
|
|
DECODE_SLABEL;
|
|
|
|
if (mp_obj_is_true(POP())) {
|
2014-12-02 22:25:10 +03:00
|
|
|
ip += slab;
|
2014-04-14 19:22:44 +04:00
|
|
|
}
|
2014-10-25 21:19:55 +04:00
|
|
|
DISPATCH_WITH_PEND_EXC_CHECK();
|
2014-12-02 22:25:10 +03:00
|
|
|
}
|
2013-11-10 00:12:32 +04:00
|
|
|
|
2014-12-02 22:25:10 +03:00
|
|
|
ENTRY(MP_BC_POP_JUMP_IF_FALSE): {
|
2014-04-14 19:22:44 +04:00
|
|
|
DECODE_SLABEL;
|
|
|
|
if (!mp_obj_is_true(POP())) {
|
2014-12-02 22:25:10 +03:00
|
|
|
ip += slab;
|
2014-04-14 19:22:44 +04:00
|
|
|
}
|
2014-10-25 21:19:55 +04:00
|
|
|
DISPATCH_WITH_PEND_EXC_CHECK();
|
2014-12-02 22:25:10 +03:00
|
|
|
}
|
2013-11-10 00:12:32 +04:00
|
|
|
|
2014-12-02 22:25:10 +03:00
|
|
|
ENTRY(MP_BC_JUMP_IF_TRUE_OR_POP): {
|
2022-03-21 08:36:13 +03:00
|
|
|
DECODE_ULABEL;
|
2014-04-14 19:22:44 +04:00
|
|
|
if (mp_obj_is_true(TOP())) {
|
2022-03-21 08:36:13 +03:00
|
|
|
ip += ulab;
|
2014-04-14 19:22:44 +04:00
|
|
|
} else {
|
|
|
|
sp--;
|
|
|
|
}
|
2014-10-25 21:19:55 +04:00
|
|
|
DISPATCH_WITH_PEND_EXC_CHECK();
|
2014-12-02 22:25:10 +03:00
|
|
|
}
|
2013-10-16 01:25:17 +04:00
|
|
|
|
2014-12-02 22:25:10 +03:00
|
|
|
ENTRY(MP_BC_JUMP_IF_FALSE_OR_POP): {
|
2022-03-21 08:36:13 +03:00
|
|
|
DECODE_ULABEL;
|
2014-04-14 19:22:44 +04:00
|
|
|
if (mp_obj_is_true(TOP())) {
|
|
|
|
sp--;
|
|
|
|
} else {
|
2022-03-21 08:36:13 +03:00
|
|
|
ip += ulab;
|
2014-04-14 19:22:44 +04:00
|
|
|
}
|
2014-10-25 21:19:55 +04:00
|
|
|
DISPATCH_WITH_PEND_EXC_CHECK();
|
2014-12-02 22:25:10 +03:00
|
|
|
}
|
2014-04-14 19:22:44 +04:00
|
|
|
|
2014-05-26 01:58:04 +04:00
|
|
|
ENTRY(MP_BC_SETUP_WITH): {
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2015-04-24 03:52:28 +03:00
|
|
|
// stack: (..., ctx_mgr)
|
2014-05-26 01:58:04 +04:00
|
|
|
mp_obj_t obj = TOP();
|
2015-04-24 03:52:28 +03:00
|
|
|
mp_load_method(obj, MP_QSTR___exit__, sp);
|
|
|
|
mp_load_method(obj, MP_QSTR___enter__, sp + 2);
|
|
|
|
mp_obj_t ret = mp_call_method_n_kw(0, 0, sp + 2);
|
|
|
|
sp += 1;
|
2014-12-22 15:49:57 +03:00
|
|
|
PUSH_EXC_BLOCK(1);
|
2014-05-26 01:58:04 +04:00
|
|
|
PUSH(ret);
|
2015-04-24 03:52:28 +03:00
|
|
|
// stack: (..., __exit__, ctx_mgr, as_value)
|
2014-04-14 19:22:44 +04:00
|
|
|
DISPATCH();
|
2014-05-26 01:58:04 +04:00
|
|
|
}
|
2014-04-14 19:22:44 +04:00
|
|
|
|
|
|
|
ENTRY(MP_BC_WITH_CLEANUP): {
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2014-04-14 19:22:44 +04:00
|
|
|
// Arriving here, there's an "exception control block" on top of the stack,
|
2015-04-24 03:52:28 +03:00
|
|
|
// and __exit__ method (with self) underneath it. Bytecode calls __exit__,
|
2014-04-14 19:22:44 +04:00
|
|
|
// and "deletes" it off stack, shifting "exception control block"
|
|
|
|
// to its place.
|
2016-09-27 05:37:21 +03:00
|
|
|
// The bytecode emitter ensures that there is enough space on the Python
|
|
|
|
// value stack to hold the __exit__ method plus an additional 4 entries.
|
2014-04-14 19:22:44 +04:00
|
|
|
if (TOP() == mp_const_none) {
|
2015-04-24 03:52:28 +03:00
|
|
|
// stack: (..., __exit__, ctx_mgr, None)
|
|
|
|
sp[1] = mp_const_none;
|
|
|
|
sp[2] = mp_const_none;
|
|
|
|
sp -= 2;
|
|
|
|
mp_call_method_n_kw(3, 0, sp);
|
2014-04-14 19:22:44 +04:00
|
|
|
SET_TOP(mp_const_none);
|
2019-01-30 10:49:52 +03:00
|
|
|
} else if (mp_obj_is_small_int(TOP())) {
|
2018-02-08 05:30:33 +03:00
|
|
|
// Getting here there are two distinct cases:
|
|
|
|
// - unwind return, stack: (..., __exit__, ctx_mgr, ret_val, SMALL_INT(-1))
|
|
|
|
// - unwind jump, stack: (..., __exit__, ctx_mgr, dest_ip, SMALL_INT(num_exc))
|
|
|
|
// For both cases we do exactly the same thing.
|
|
|
|
mp_obj_t data = sp[-1];
|
|
|
|
mp_obj_t cause = sp[0];
|
|
|
|
sp[-1] = mp_const_none;
|
|
|
|
sp[0] = mp_const_none;
|
|
|
|
sp[1] = mp_const_none;
|
|
|
|
mp_call_method_n_kw(3, 0, sp - 3);
|
|
|
|
sp[-3] = data;
|
|
|
|
sp[-2] = cause;
|
2015-04-24 03:52:28 +03:00
|
|
|
sp -= 2; // we removed (__exit__, ctx_mgr)
|
2015-03-26 01:20:37 +03:00
|
|
|
} else {
|
2016-09-27 05:37:21 +03:00
|
|
|
assert(mp_obj_is_exception_instance(TOP()));
|
|
|
|
// stack: (..., __exit__, ctx_mgr, exc_instance)
|
|
|
|
// Need to pass (exc_type, exc_instance, None) as arguments to __exit__.
|
|
|
|
sp[1] = sp[0];
|
2016-09-27 06:21:23 +03:00
|
|
|
sp[0] = MP_OBJ_FROM_PTR(mp_obj_get_type(sp[0]));
|
2016-09-27 05:37:21 +03:00
|
|
|
sp[2] = mp_const_none;
|
|
|
|
sp -= 2;
|
|
|
|
mp_obj_t ret_value = mp_call_method_n_kw(3, 0, sp);
|
2014-05-26 01:58:04 +04:00
|
|
|
if (mp_obj_is_true(ret_value)) {
|
2015-04-24 03:52:28 +03:00
|
|
|
// We need to silence/swallow the exception. This is done
|
|
|
|
// by popping the exception and the __exit__ handler and
|
|
|
|
// replacing it with None, which signals END_FINALLY to just
|
|
|
|
// execute the finally handler normally.
|
|
|
|
SET_TOP(mp_const_none);
|
2015-02-10 16:21:42 +03:00
|
|
|
} else {
|
2015-04-24 03:52:28 +03:00
|
|
|
// We need to re-raise the exception. We pop __exit__ handler
|
2016-09-27 05:37:21 +03:00
|
|
|
// by copying the exception instance down to the new top-of-stack.
|
|
|
|
sp[0] = sp[3];
|
2014-01-30 15:49:18 +04:00
|
|
|
}
|
2014-04-14 19:22:44 +04:00
|
|
|
}
|
|
|
|
DISPATCH();
|
|
|
|
}
|
2013-10-16 01:25:17 +04:00
|
|
|
|
2014-12-02 22:25:10 +03:00
|
|
|
ENTRY(MP_BC_UNWIND_JUMP): {
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2014-04-14 19:22:44 +04:00
|
|
|
DECODE_SLABEL;
|
2015-11-27 20:01:44 +03:00
|
|
|
PUSH((mp_obj_t)(mp_uint_t)(uintptr_t)(ip + slab)); // push destination ip for jump
|
|
|
|
PUSH((mp_obj_t)(mp_uint_t)(*ip)); // push number of exception handlers to unwind (0x80 bit set if we also need to pop stack)
|
2014-12-02 22:25:10 +03:00
|
|
|
unwind_jump:;
|
|
|
|
mp_uint_t unum = (mp_uint_t)POP(); // get number of exception handlers to unwind
|
2014-05-30 18:20:41 +04:00
|
|
|
while ((unum & 0x7f) > 0) {
|
2014-04-14 19:22:44 +04:00
|
|
|
unum -= 1;
|
2014-03-22 15:49:31 +04:00
|
|
|
assert(exc_sp >= exc_stack);
|
2019-09-27 17:07:21 +03:00
|
|
|
|
|
|
|
if (MP_TAGPTR_TAG1(exc_sp->val_sp)) {
|
2022-06-17 16:01:55 +03:00
|
|
|
if (exc_sp->handler >= ip) {
|
2019-09-27 17:07:21 +03:00
|
|
|
// Found a finally handler that isn't active; run it.
|
|
|
|
// Getting here the stack looks like:
|
|
|
|
// (..., X, dest_ip)
|
|
|
|
// where X is pointed to by exc_sp->val_sp and in the case
|
|
|
|
// of a "with" block contains the context manager info.
|
|
|
|
assert(&sp[-1] == MP_TAGPTR_PTR(exc_sp->val_sp));
|
|
|
|
// We're going to run "finally" code as a coroutine
|
|
|
|
// (not calling it recursively). Set up a sentinel
|
|
|
|
// on the stack so it can return back to us when it is
|
|
|
|
// done (when WITH_CLEANUP or END_FINALLY reached).
|
|
|
|
// The sentinel is the number of exception handlers left to
|
|
|
|
// unwind, which is a non-negative integer.
|
|
|
|
PUSH(MP_OBJ_NEW_SMALL_INT(unum));
|
|
|
|
ip = exc_sp->handler;
|
|
|
|
goto dispatch_loop;
|
|
|
|
} else {
|
|
|
|
// Found a finally handler that is already active; cancel it.
|
|
|
|
CANCEL_ACTIVE_FINALLY(sp);
|
|
|
|
}
|
2014-02-02 03:04:09 +04:00
|
|
|
}
|
2016-02-01 19:07:21 +03:00
|
|
|
POP_EXC_BLOCK();
|
2014-04-14 19:22:44 +04:00
|
|
|
}
|
2015-11-27 20:01:44 +03:00
|
|
|
ip = (const byte*)MP_OBJ_TO_PTR(POP()); // pop destination ip for jump
|
2014-05-30 18:20:41 +04:00
|
|
|
if (unum != 0) {
|
2017-05-25 13:39:08 +03:00
|
|
|
// pop the exhausted iterator
|
2017-03-23 08:36:08 +03:00
|
|
|
sp -= MP_OBJ_ITER_BUF_NSLOTS;
|
2014-05-30 18:20:41 +04:00
|
|
|
}
|
2014-10-25 21:19:55 +04:00
|
|
|
DISPATCH_WITH_PEND_EXC_CHECK();
|
2014-12-02 22:25:10 +03:00
|
|
|
}
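The byte pushed by MP_BC_UNWIND_JUMP packs two things: the low 7 bits count the exception handlers to unwind, and the 0x80 bit says that a for-loop iterator must also be popped once unwinding is done (this is why the code tests `unum != 0` after the loop has counted the low bits down to zero). The sketch below splits that byte into its two parts. The struct and function names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical decoded form of the UNWIND_JUMP operand byte.
typedef struct {
    unsigned num_handlers; // exception handlers to unwind (low 7 bits)
    bool pop_iterator;     // 0x80 bit: also pop the for-loop iterator slots
} unwind_operand_t;

static unwind_operand_t decode_unwind_operand(uint8_t operand) {
    unwind_operand_t info;
    info.num_handlers = operand & 0x7f;
    info.pop_iterator = (operand & 0x80) != 0;
    return info;
}
```

For example, an operand of `0x82` would mean "unwind two handlers, then drop the iterator", while `0x01` would unwind one handler and leave the value stack alone.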
|
2014-04-14 19:22:44 +04:00
|
|
|
|
|
|
|
ENTRY(MP_BC_SETUP_EXCEPT):
|
2014-12-02 22:25:10 +03:00
|
|
|
ENTRY(MP_BC_SETUP_FINALLY): {
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2014-12-29 03:29:59 +03:00
|
|
|
#if SELECTIVE_EXC_IP
|
|
|
|
PUSH_EXC_BLOCK((code_state->ip[-1] == MP_BC_SETUP_FINALLY) ? 1 : 0);
|
|
|
|
#else
|
|
|
|
PUSH_EXC_BLOCK((code_state->ip[0] == MP_BC_SETUP_FINALLY) ? 1 : 0);
|
|
|
|
#endif
|
2014-04-14 19:22:44 +04:00
|
|
|
DISPATCH();
|
2014-12-02 22:25:10 +03:00
|
|
|
}
|
2014-04-14 19:22:44 +04:00
|
|
|
|
|
|
|
ENTRY(MP_BC_END_FINALLY):
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2014-04-14 19:22:44 +04:00
|
|
|
// if TOS is None, just pops it and continues
|
2016-09-27 05:37:21 +03:00
|
|
|
// if TOS is an integer, finishes coroutine and returns control to caller
|
|
|
|
// if TOS is an exception, reraises the exception
|
2019-09-27 17:07:21 +03:00
|
|
|
assert(exc_sp >= exc_stack);
|
|
|
|
POP_EXC_BLOCK();
|
2014-04-14 19:22:44 +04:00
|
|
|
if (TOP() == mp_const_none) {
|
2014-01-18 18:10:48 +04:00
|
|
|
sp--;
|
2019-01-30 10:49:52 +03:00
|
|
|
} else if (mp_obj_is_small_int(TOP())) {
|
2014-04-14 19:22:44 +04:00
|
|
|
// We finished the "finally" coroutine and now dispatch back
|
|
|
|
// to our caller, based on TOS value
|
2018-02-08 05:30:33 +03:00
|
|
|
mp_int_t cause = MP_OBJ_SMALL_INT_VALUE(POP());
|
|
|
|
if (cause < 0) {
|
|
|
|
// A negative cause indicates unwind return
|
2015-03-26 01:20:37 +03:00
|
|
|
goto unwind_return;
|
|
|
|
} else {
|
2018-02-08 05:30:33 +03:00
|
|
|
// Otherwise it's an unwind jump and we must push as a raw
|
|
|
|
// number the number of exception handlers to unwind
|
|
|
|
PUSH((mp_obj_t)cause);
|
2015-03-26 01:20:37 +03:00
|
|
|
goto unwind_jump;
|
2014-04-14 19:22:44 +04:00
|
|
|
}
|
2016-09-27 05:37:21 +03:00
|
|
|
} else {
|
|
|
|
assert(mp_obj_is_exception_instance(TOP()));
|
|
|
|
RAISE(TOP());
|
2014-04-14 19:22:44 +04:00
|
|
|
}
|
|
|
|
DISPATCH();
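MP_BC_END_FINALLY chooses what to do purely from the kind of value on top of the stack: None continues normally, a small integer resumes an unwind (negative for a return, non-negative for a jump), and an exception instance is re-raised. A minimal sketch of that decision table follows. The enums and the function name are hypothetical, and the tagged `kind` argument stands in for the VM's object-type checks.

```c
#include <assert.h>

// Hypothetical classification of the top-of-stack value.
typedef enum { TOS_NONE, TOS_SMALL_INT, TOS_EXCEPTION } tos_kind_t;

// Hypothetical names for the four outcomes of END_FINALLY.
typedef enum {
    ACT_CONTINUE,      // pop None and carry on with the next opcode
    ACT_UNWIND_RETURN, // negative cause: resume an unwinding return
    ACT_UNWIND_JUMP,   // non-negative cause: resume an unwinding jump
    ACT_RERAISE        // exception instance: raise it again
} finally_action_t;

static finally_action_t end_finally_action(tos_kind_t kind, long cause) {
    switch (kind) {
        case TOS_NONE:
            return ACT_CONTINUE;
        case TOS_SMALL_INT:
            // cause is only meaningful for small-int values
            return cause < 0 ? ACT_UNWIND_RETURN : ACT_UNWIND_JUMP;
        default:
            return ACT_RERAISE;
    }
}
```

In the unwind-jump case the non-negative cause is the number of handlers still to unwind, which the VM pushes back as a raw number before jumping to `unwind_jump`.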
|
|
|
|
|
|
|
|
ENTRY(MP_BC_GET_ITER):
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2016-01-10 02:14:54 +03:00
|
|
|
SET_TOP(mp_getiter(TOP(), NULL));
|
2014-04-14 19:22:44 +04:00
|
|
|
DISPATCH();
|
|
|
|
|
2017-03-23 08:36:08 +03:00
|
|
|
// An iterator for a for-loop takes MP_OBJ_ITER_BUF_NSLOTS slots on
|
|
|
|
// the Python value stack. These slots are either used to store the
|
|
|
|
// iterator object itself, or the first slot is MP_OBJ_NULL and
|
2017-01-17 07:27:37 +03:00
|
|
|
// the second slot holds a reference to the iterator object.
|
2016-01-10 02:59:52 +03:00
|
|
|
ENTRY(MP_BC_GET_ITER_STACK): {
|
|
|
|
MARK_EXC_IP_SELECTIVE();
|
|
|
|
mp_obj_t obj = TOP();
|
|
|
|
mp_obj_iter_buf_t *iter_buf = (mp_obj_iter_buf_t*)sp;
|
2017-03-23 08:36:08 +03:00
|
|
|
sp += MP_OBJ_ITER_BUF_NSLOTS - 1;
|
2017-01-17 07:27:37 +03:00
|
|
|
obj = mp_getiter(obj, iter_buf);
|
|
|
|
if (obj != MP_OBJ_FROM_PTR(iter_buf)) {
|
|
|
|
// Iterator didn't use the stack so indicate that with MP_OBJ_NULL.
|
2022-03-31 02:57:34 +03:00
|
|
|
*(sp - MP_OBJ_ITER_BUF_NSLOTS + 1) = MP_OBJ_NULL;
|
|
|
|
*(sp - MP_OBJ_ITER_BUF_NSLOTS + 2) = obj;
|
2017-01-17 07:27:37 +03:00
|
|
|
}
|
2016-01-10 02:59:52 +03:00
|
|
|
DISPATCH();
|
|
|
|
}
|
|
|
|
|
2014-05-26 01:58:04 +04:00
|
|
|
ENTRY(MP_BC_FOR_ITER): {
|
2019-08-14 17:09:36 +03:00
|
|
|
FRAME_UPDATE();
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2014-04-14 19:22:44 +04:00
|
|
|
DECODE_ULABEL; // the jump offset if iteration finishes; labels of for-loops are always forward
|
py, vm: Replace save_ip, save_sp with code_state->{ip, sp}.

This may seem a bit of a risky change, in that it may introduce crazy
bugs with respect to volatile variables in the VM loop. But, I think it
should be fine: code_state points to some external memory, so the
compiler should always read/write to that memory when accessing the
ip/sp variables (ie not put them in registers).

Anyway, it passes all tests and improves on all efficiency fronts: about
2-4% faster (64-bit unix), 16 bytes less stack space per call (64-bit
unix) and slightly less executable size (unix and stmhal).

The reason it's more efficient is save_ip and save_sp were volatile
variables, so were anyway stored on the stack (in memory, not regs).
Thus converting them to code_state->{ip, sp} doesn't cost an extra
memory dereference (except maybe to get code_state, but that can be put
in a register and then made more efficient for other uses of it).
2014-06-01 15:32:28 +04:00
|
|
|
code_state->sp = sp;
|
2017-01-17 07:27:37 +03:00
|
|
|
mp_obj_t obj;
|
2022-03-31 02:57:34 +03:00
|
|
|
if (*(sp - MP_OBJ_ITER_BUF_NSLOTS + 1) == MP_OBJ_NULL) {
|
|
|
|
obj = *(sp - MP_OBJ_ITER_BUF_NSLOTS + 2);
|
2017-01-17 07:27:37 +03:00
|
|
|
} else {
|
2017-03-23 08:36:08 +03:00
|
|
|
obj = MP_OBJ_FROM_PTR(&sp[-MP_OBJ_ITER_BUF_NSLOTS + 1]);
|
2017-01-17 07:27:37 +03:00
|
|
|
}
|
|
|
|
mp_obj_t value = mp_iternext_allow_raise(obj);
|
2014-05-26 01:58:04 +04:00
|
|
|
if (value == MP_OBJ_STOP_ITERATION) {
|
2017-03-23 08:36:08 +03:00
|
|
|
sp -= MP_OBJ_ITER_BUF_NSLOTS; // pop the exhausted iterator
|
2014-12-02 22:25:10 +03:00
|
|
|
ip += ulab; // jump to after for-block
|
2014-04-14 19:22:44 +04:00
|
|
|
} else {
|
2014-05-26 01:58:04 +04:00
|
|
|
PUSH(value); // push the next iteration value
|
2019-08-14 17:09:36 +03:00
|
|
|
#if MICROPY_PY_SYS_SETTRACE
|
|
|
|
// LINE event should trigger for every iteration so invalidate last trigger
|
|
|
|
if (code_state->frame) {
|
|
|
|
code_state->frame->lineno = 0;
|
|
|
|
}
|
|
|
|
#endif
|
2014-04-14 19:22:44 +04:00
|
|
|
}
|
|
|
|
DISPATCH();
|
2014-05-26 01:58:04 +04:00
|
|
|
}
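GET_ITER_STACK and FOR_ITER share a small convention for the iterator slots on the value stack: either the iterator object lives in the slots themselves, or the first slot is null and the second slot points at an iterator stored elsewhere. The sketch below shows how a reader of those slots recovers the iterator. `ITER_BUF_NSLOTS` (set to 4 here as an assumption), the `void *` slots, and the function name are all hypothetical stand-ins for `MP_OBJ_ITER_BUF_NSLOTS`, `mp_obj_t`, and the inline code in FOR_ITER.

```c
#include <assert.h>
#include <stddef.h>

#define ITER_BUF_NSLOTS 4 // assumed value, stand-in for MP_OBJ_ITER_BUF_NSLOTS

// sp points at the last of the iterator's slots, as it does in FOR_ITER.
static void *iter_from_slots(void **sp) {
    void **first = sp - ITER_BUF_NSLOTS + 1;
    if (first[0] == NULL) {
        // The iterator did not use the stack buffer: slot 2 references it.
        return first[1];
    }
    // The iterator was built in place: the slots are the iterator object.
    return (void *)first;
}
```

Using a null first slot as the marker costs no extra space, which presumably works because an in-place iterator object never begins with a null word.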
|
2014-04-14 19:22:44 +04:00
|
|
|
|
2019-02-15 04:18:59 +03:00
|
|
|
ENTRY(MP_BC_POP_EXCEPT_JUMP): {
|
2014-04-14 19:22:44 +04:00
|
|
|
assert(exc_sp >= exc_stack);
|
|
|
|
POP_EXC_BLOCK();
|
2019-02-15 04:18:59 +03:00
|
|
|
DECODE_ULABEL;
|
|
|
|
ip += ulab;
|
|
|
|
DISPATCH_WITH_PEND_EXC_CHECK();
|
|
|
|
}
|
2014-04-14 19:22:44 +04:00
|
|
|
|
2014-12-02 22:25:10 +03:00
|
|
|
ENTRY(MP_BC_BUILD_TUPLE): {
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2014-04-14 19:22:44 +04:00
|
|
|
DECODE_UINT;
|
|
|
|
sp -= unum - 1;
|
|
|
|
SET_TOP(mp_obj_new_tuple(unum, sp));
|
|
|
|
DISPATCH();
|
2014-12-02 22:25:10 +03:00
|
|
|
}
|
2014-04-14 19:22:44 +04:00
|
|
|
|
2014-12-02 22:25:10 +03:00
|
|
|
ENTRY(MP_BC_BUILD_LIST): {
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2014-04-14 19:22:44 +04:00
|
|
|
DECODE_UINT;
|
|
|
|
sp -= unum - 1;
|
|
|
|
SET_TOP(mp_obj_new_list(unum, sp));
|
|
|
|
DISPATCH();
|
2014-12-02 22:25:10 +03:00
|
|
|
}
|
2014-04-14 19:22:44 +04:00
|
|
|
|
2014-12-02 22:25:10 +03:00
|
|
|
ENTRY(MP_BC_BUILD_MAP): {
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2014-04-14 19:22:44 +04:00
|
|
|
DECODE_UINT;
|
|
|
|
PUSH(mp_obj_new_dict(unum));
|
|
|
|
DISPATCH();
|
2014-12-02 22:25:10 +03:00
|
|
|
}
|
2014-04-14 19:22:44 +04:00
|
|
|
|
|
|
|
ENTRY(MP_BC_STORE_MAP):
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2014-04-14 19:22:44 +04:00
|
|
|
sp -= 2;
|
|
|
|
mp_obj_dict_store(sp[0], sp[2], sp[1]);
|
|
|
|
DISPATCH();
|
|
|
|
|
2022-07-12 15:46:35 +03:00
|
|
|
#if MICROPY_PY_BUILTINS_SET
|
2014-12-02 22:25:10 +03:00
|
|
|
ENTRY(MP_BC_BUILD_SET): {
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2014-04-14 19:22:44 +04:00
|
|
|
DECODE_UINT;
|
|
|
|
sp -= unum - 1;
|
|
|
|
SET_TOP(mp_obj_new_set(unum, sp));
|
|
|
|
DISPATCH();
|
2014-12-02 22:25:10 +03:00
|
|
|
}
|
2022-07-12 15:46:35 +03:00
|
|
|
#endif
|
2013-10-16 23:57:49 +04:00
|
|
|
|
2022-07-12 15:46:35 +03:00
|
|
|
#if MICROPY_PY_BUILTINS_SLICE
|
2014-12-02 22:25:10 +03:00
|
|
|
ENTRY(MP_BC_BUILD_SLICE): {
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2018-09-27 04:22:33 +03:00
|
|
|
mp_obj_t step = mp_const_none;
|
|
|
|
if (*ip++ == 3) {
|
|
|
|
// 3-argument slice includes step
|
|
|
|
step = POP();
|
2014-04-14 19:22:44 +04:00
|
|
|
}
|
2018-09-27 04:22:33 +03:00
|
|
|
mp_obj_t stop = POP();
|
|
|
|
mp_obj_t start = TOP();
|
|
|
|
SET_TOP(mp_obj_new_slice(start, stop, step));
|
2014-04-14 19:22:44 +04:00
|
|
|
DISPATCH();
|
2014-12-02 22:25:10 +03:00
|
|
|
}
|
2022-07-12 15:46:35 +03:00
|
|
|
#endif
|
2014-01-03 04:48:56 +04:00
|
|
|
|
2016-09-19 01:46:01 +03:00
|
|
|
ENTRY(MP_BC_STORE_COMP): {
|
|
|
|
MARK_EXC_IP_SELECTIVE();
|
|
|
|
DECODE_UINT;
|
|
|
|
mp_obj_t obj = sp[-(unum >> 2)];
|
|
|
|
if ((unum & 3) == 0) {
|
|
|
|
mp_obj_list_append(obj, sp[0]);
|
|
|
|
sp--;
|
|
|
|
} else if (!MICROPY_PY_BUILTINS_SET || (unum & 3) == 1) {
|
|
|
|
mp_obj_dict_store(obj, sp[0], sp[-1]);
|
|
|
|
sp -= 2;
|
|
|
|
#if MICROPY_PY_BUILTINS_SET
|
|
|
|
} else {
|
|
|
|
mp_obj_set_store(obj, sp[0]);
|
|
|
|
sp--;
|
|
|
|
#endif
|
|
|
|
}
|
|
|
|
DISPATCH();
|
|
|
|
}
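The MP_BC_STORE_COMP operand packs two fields into one unsigned integer: the low two bits select the kind of comprehension, and the remaining bits give how far below the top of stack the collection being built sits. A minimal sketch of that split follows. The type and function names are hypothetical, and, as in the VM code, any kind value other than 0 or 1 is treated as a set.

```c
#include <assert.h>

// Hypothetical names for the three comprehension kinds.
typedef enum { COMP_LIST, COMP_DICT, COMP_SET } comp_kind_t;

typedef struct {
    unsigned depth;   // collection is at sp[-depth]
    comp_kind_t kind; // what to store into
} store_comp_t;

static store_comp_t decode_store_comp(unsigned unum) {
    store_comp_t out;
    out.depth = unum >> 2;
    switch (unum & 3) {
        case 0:
            out.kind = COMP_LIST;
            break;
        case 1:
            out.kind = COMP_DICT;
            break;
        default:
            out.kind = COMP_SET; // any other value, mirroring the VM's else
            break;
    }
    return out;
}
```

The kind also determines how many values are consumed: one for a list or set element, two (key and value) for a dict entry.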
|
|
|
|
|
2014-12-02 22:25:10 +03:00
|
|
|
ENTRY(MP_BC_UNPACK_SEQUENCE): {
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2014-04-14 19:22:44 +04:00
|
|
|
DECODE_UINT;
|
|
|
|
mp_unpack_sequence(sp[0], unum, sp);
|
|
|
|
sp += unum - 1;
|
|
|
|
DISPATCH();
|
2014-12-02 22:25:10 +03:00
|
|
|
}
|
2014-04-14 19:22:44 +04:00
|
|
|
|
2014-12-02 22:25:10 +03:00
|
|
|
ENTRY(MP_BC_UNPACK_EX): {
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2014-04-14 19:22:44 +04:00
|
|
|
DECODE_UINT;
|
|
|
|
mp_unpack_ex(sp[0], unum, sp);
|
|
|
|
sp += (unum & 0xff) + ((unum >> 8) & 0xff);
|
|
|
|
DISPATCH();
|
2014-12-02 22:25:10 +03:00
|
|
|
}
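The stack adjustment in MP_BC_UNPACK_EX adds the two low bytes of the operand. The sketch below assumes that the low byte counts the targets before the starred name and the next byte counts those after it; that reading is an assumption, since the VM code shown here only adds the two fields. The function name is hypothetical.

```c
#include <assert.h>

// Net growth of the value stack for an extended unpack.
// The sequence on top of the stack is replaced by num_left items,
// one list for the starred target, and num_right items, so the stack
// grows by num_left + num_right entries overall.
static unsigned unpack_ex_stack_growth(unsigned unum) {
    unsigned num_left = unum & 0xff;         // assumed: targets before the star
    unsigned num_right = (unum >> 8) & 0xff; // assumed: targets after the star
    return num_left + num_right;
}
```

Under that assumption, `a, *b, c, d = seq` would be encoded as `1 | (2 << 8)`: four values replace the one sequence, so `sp` moves up by three.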
|
2014-04-14 19:22:44 +04:00
|
|
|
|
2014-12-02 22:25:10 +03:00
|
|
|
ENTRY(MP_BC_MAKE_FUNCTION): {
|
2014-04-14 19:22:44 +04:00
|
|
|
DECODE_PTR;
|
py: Rework bytecode and .mpy file format to be mostly static data.

Background: .mpy files are precompiled .py files, built using mpy-cross,
that contain compiled bytecode functions (and can also contain machine
code). The benefit of using an .mpy file over a .py file is that they are
faster to import and take less memory when importing. They are also
smaller on disk.

But the real benefit of .mpy files comes when they are frozen into the
firmware. This is done by loading the .mpy file during compilation of the
firmware and turning it into a set of big C data structures (the job of
mpy-tool.py), which are then compiled and downloaded into the ROM of a
device. These C data structures can be executed in-place, ie directly from
ROM. This makes importing even faster because there is very little to do,
and also means such frozen modules take up much less RAM (because their
bytecode stays in ROM).

The downside of frozen code is that it requires recompiling and reflashing
the entire firmware. This can be a big barrier to entry, slows down
development time, and makes it harder to do OTA updates of frozen code
(because the whole firmware must be updated).

This commit attempts to solve this problem by providing a solution that
sits between loading .mpy files into RAM and freezing them into the
firmware. The .mpy file format has been reworked so that it consists of
data and bytecode which is mostly static and ready to run in-place. If
these new .mpy files are located in flash/ROM which is memory addressable,
the .mpy file can be executed (mostly) in-place.

With this approach there is still a small amount of unpacking and linking
of the .mpy file that needs to be done when it's imported, but it's still
much better than loading an .mpy from disk into RAM (although not as good
as freezing .mpy files into the firmware).

The main trick to make static .mpy files is to adjust the bytecode so any
qstrs that it references now go through a lookup table to convert from
local qstr number in the module to global qstr number in the firmware.
That means the bytecode does not need linking/rewriting of qstrs when it's
loaded. Instead only a small qstr table needs to be built (and put in RAM)
at import time. This means the bytecode itself is static/constant and can
be used directly if it's in addressable memory. Also the qstr string data
in the .mpy file, and some constant object data, can be used directly.
Note that the qstr table is global to the module (ie not per function).

In more detail, in the VM what used to be (schematically):

    qst = DECODE_QSTR_VALUE;

is now (schematically):

    idx = DECODE_QSTR_INDEX;
    qst = qstr_table[idx];

That allows the bytecode to be fixed at compile time and not need
relinking/rewriting of the qstr values. Only qstr_table needs to be linked
when the .mpy is loaded.

Incidentally, this helps to reduce the size of bytecode because what used
to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices.
If the module uses the same qstr more than two times then the bytecode is
smaller than before.

The following changes are measured for this commit compared to the
previous (the baseline):
- average 7%-9% reduction in size of .mpy files
- frozen code size is reduced by about 5%-7%
- importing .py files uses about 5% less RAM in total
- importing .mpy files uses about 4% less RAM in total
- importing .py and .mpy files takes about the same time as before

The qstr indirection in the bytecode has only a small impact on VM
performance. For stm32 on PYBv1.0 the performance change of this commit
is:

    diff of scores (higher is better)
    N=100 M=100               baseline -> this-commit     diff     diff% (error%)
    bm_chaos.py                 371.07 ->   357.39 :   -13.68 =  -3.687% (+/-0.02%)
    bm_fannkuch.py               78.72 ->    77.49 :    -1.23 =  -1.563% (+/-0.01%)
    bm_fft.py                  2591.73 ->  2539.28 :   -52.45 =  -2.024% (+/-0.00%)
    bm_float.py                6034.93 ->  5908.30 :  -126.63 =  -2.098% (+/-0.01%)
    bm_hexiom.py                 48.96 ->    47.93 :    -1.03 =  -2.104% (+/-0.00%)
    bm_nqueens.py              4510.63 ->  4459.94 :   -50.69 =  -1.124% (+/-0.00%)
    bm_pidigits.py              650.28 ->   644.96 :    -5.32 =  -0.818% (+/-0.23%)
    core_import_mpy_multi.py    564.77 ->   581.49 :   +16.72 =  +2.960% (+/-0.01%)
    core_import_mpy_single.py    68.67 ->    67.16 :    -1.51 =  -2.199% (+/-0.01%)
    core_qstr.py                 64.16 ->    64.12 :    -0.04 =  -0.062% (+/-0.00%)
    core_yield_from.py          362.58 ->   354.50 :    -8.08 =  -2.228% (+/-0.00%)
    misc_aes.py                 429.69 ->   405.59 :   -24.10 =  -5.609% (+/-0.01%)
    misc_mandel.py             3485.13 ->  3416.51 :   -68.62 =  -1.969% (+/-0.00%)
    misc_pystone.py            2496.53 ->  2405.56 :   -90.97 =  -3.644% (+/-0.01%)
    misc_raytrace.py            381.47 ->   374.01 :    -7.46 =  -1.956% (+/-0.01%)
    viper_call0.py              576.73 ->   572.49 :    -4.24 =  -0.735% (+/-0.04%)
    viper_call1a.py             550.37 ->   546.21 :    -4.16 =  -0.756% (+/-0.09%)
    viper_call1b.py             438.23 ->   435.68 :    -2.55 =  -0.582% (+/-0.06%)
    viper_call1c.py             442.84 ->   440.04 :    -2.80 =  -0.632% (+/-0.08%)
    viper_call2a.py             536.31 ->   532.35 :    -3.96 =  -0.738% (+/-0.06%)
    viper_call2b.py             382.34 ->   377.07 :    -5.27 =  -1.378% (+/-0.03%)

And for unix on x64:

    diff of scores (higher is better)
    N=2000 M=2000       baseline -> this-commit      diff     diff% (error%)
    bm_chaos.py         13594.20 ->   13073.84 :   -520.36 =  -3.828% (+/-5.44%)
    bm_fannkuch.py         60.63 ->      59.58 :     -1.05 =  -1.732% (+/-3.01%)
    bm_fft.py          112009.15 ->  111603.32 :   -405.83 =  -0.362% (+/-4.03%)
    bm_float.py        246202.55 ->  247923.81 :  +1721.26 =  +0.699% (+/-2.79%)
    bm_hexiom.py          615.65 ->     617.21 :     +1.56 =  +0.253% (+/-1.64%)
    bm_nqueens.py      215807.95 ->  215600.96 :   -206.99 =  -0.096% (+/-3.52%)
    bm_pidigits.py       8246.74 ->    8422.82 :   +176.08 =  +2.135% (+/-3.64%)
    misc_aes.py         16133.00 ->   16452.74 :   +319.74 =  +1.982% (+/-1.50%)
    misc_mandel.py     128146.69 ->  130796.43 :  +2649.74 =  +2.068% (+/-3.18%)
    misc_pystone.py     83811.49 ->   83124.85 :   -686.64 =  -0.819% (+/-1.03%)
    misc_raytrace.py    21688.02 ->   21385.10 :   -302.92 =  -1.397% (+/-3.20%)

The code size change is (firmware with a lot of frozen code benefits the
most):

       bare-arm:  +396 +0.697%
    minimal x86: +1595 +0.979% [incl +32(data)]
       unix x64: +2408 +0.470% [incl +800(data)]
    unix nanbox: +1396 +0.309% [incl -96(data)]
          stm32: -1256 -0.318% PYBV10
         cc3200:  +288 +0.157%
        esp8266:  -260 -0.037% GENERIC
          esp32:  -216 -0.014% GENERIC[incl -1072(data)]
            nrf:  +116 +0.067% pca10040
            rp2:  -664 -0.135% PICO
           samd:  +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS

As part of this change the .mpy file format version is bumped to version 6.
And mpy-tool.py has been improved to provide a good visualisation of the
contents of .mpy files.

In summary: this commit changes the bytecode to use qstr indirection, and
reworks the .mpy file format to be simpler and allow .mpy files to be
executed in-place. Performance is not impacted too much. Eventually it
will be possible to store such .mpy files in a linear, read-only, memory-
mappable filesystem so they can be executed from flash/ROM. This will
essentially be able to replace frozen code for most applications.

Signed-off-by: Damien George <damien@micropython.org>
2021-10-22 14:22:47 +03:00
|
|
|
PUSH(mp_make_function_from_raw_code(ptr, code_state->fun_bc->context, NULL));
|
2014-04-14 19:22:44 +04:00
|
|
|
DISPATCH();
|
2014-12-02 22:25:10 +03:00
|
|
|
}
|
2014-04-14 19:22:44 +04:00
|
|
|
|
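The commit message attached to `MP_BC_MAKE_FUNCTION` describes qstr indirection: bytecode stores module-local indices, and a per-module table built at import time maps them to global qstr numbers. The sketch below illustrates that idea only. The names `qstr_sketch_t`, `sketch_bytecode` and `lookup_qstr` are hypothetical, the table values are made up, and a fixed one-byte index is a simplification of whatever variable-length decoding the real VM uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a global qstr number in the firmware. */
typedef uint16_t qstr_sketch_t;

/* Read-only "bytecode": each byte is a module-local qstr index. Because it
 * holds indices rather than global values, it never needs rewriting at load
 * time and could stay in flash/ROM. */
static const uint8_t sketch_bytecode[] = {0, 2, 1, 2};

/* Fetch the next local index from the instruction stream and translate it
 * through the module's table, which is the only part built at import time. */
static qstr_sketch_t lookup_qstr(const qstr_sketch_t *qstr_table, const uint8_t **ip) {
    uint8_t idx = *(*ip)++;
    return qstr_table[idx];
}
```

Only the table passed as `qstr_table` differs between firmware builds in this model; the byte array is identical everywhere, which is what allows in-place execution.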
2014-05-26 01:58:04 +04:00
|
|
|
ENTRY(MP_BC_MAKE_FUNCTION_DEFARGS): {
|
2014-04-14 19:22:44 +04:00
|
|
|
DECODE_PTR;
|
|
|
|
// Stack layout: def_tuple def_dict <- TOS
|
2021-10-22 14:22:47 +03:00
|
|
|
sp -= 1;
|
|
|
|
SET_TOP(mp_make_function_from_raw_code(ptr, code_state->fun_bc->context, sp));
|
2014-04-14 19:22:44 +04:00
|
|
|
DISPATCH();
|
2014-05-26 01:58:04 +04:00
|
|
|
}
|
2014-04-14 19:22:44 +04:00
|
|
|
|
2014-04-20 20:50:40 +04:00
|
|
|
ENTRY(MP_BC_MAKE_CLOSURE): {
|
2014-04-14 19:22:44 +04:00
|
|
|
DECODE_PTR;
|
2017-02-16 08:48:33 +03:00
|
|
|
size_t n_closed_over = *ip++;
|
2014-04-20 20:50:40 +04:00
|
|
|
// Stack layout: closed_overs <- TOS
|
|
|
|
sp -= n_closed_over - 1;
|
2021-10-22 14:22:47 +03:00
|
|
|
SET_TOP(mp_make_closure_from_raw_code(ptr, code_state->fun_bc->context, n_closed_over, sp));
|
2014-04-14 19:22:44 +04:00
|
|
|
DISPATCH();
|
2014-04-20 20:50:40 +04:00
|
|
|
}
|
2014-04-14 19:22:44 +04:00
|
|
|
|
2014-04-20 20:50:40 +04:00
|
|
|
ENTRY(MP_BC_MAKE_CLOSURE_DEFARGS): {
|
2014-04-14 19:22:44 +04:00
|
|
|
DECODE_PTR;
|
2017-02-16 08:48:33 +03:00
|
|
|
size_t n_closed_over = *ip++;
|
2014-04-20 20:50:40 +04:00
|
|
|
// Stack layout: def_tuple def_dict closed_overs <- TOS
|
|
|
|
sp -= 2 + n_closed_over - 1;
|
2021-10-22 14:22:47 +03:00
|
|
|
SET_TOP(mp_make_closure_from_raw_code(ptr, code_state->fun_bc->context, 0x100 | n_closed_over, sp));
|
2014-04-14 19:22:44 +04:00
|
|
|
DISPATCH();
|
2014-04-20 20:50:40 +04:00
|
|
|
}
|
2014-04-14 19:22:44 +04:00
|
|
|
|
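The two closure opcodes differ only in how many stack slots they consume and in the `0x100` flag passed alongside `n_closed_over`. The sketch below models both quantities. The function names are hypothetical, and how the callee unpacks the flagged count is not shown in this section, so only the caller side is modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Number of stack slots the opcode reads: the closed-over cells, plus
 * def_tuple and def_dict when default arguments are present. */
static size_t closure_slots_consumed(size_t n_closed_over, bool has_defargs) {
    return (has_defargs ? 2 : 0) + n_closed_over;
}

/* The result replaces the lowest consumed slot, so sp drops by one less
 * than the number of slots consumed. Assumes at least one slot is consumed. */
static size_t closure_sp_decrement(size_t n_closed_over, bool has_defargs) {
    return closure_slots_consumed(n_closed_over, has_defargs) - 1;
}

/* n_closed_over is read from a single bytecode byte, so it fits in bits 0-7
 * and bit 8 (0x100) is free to signal "default arguments present". */
static size_t pack_closure_count(size_t n_closed_over, bool has_defargs) {
    return (has_defargs ? 0x100 : 0) | n_closed_over;
}
```

For three closed-over cells, the plain opcode lowers `sp` by 2 and the `DEFARGS` variant lowers it by 4, matching `sp -= n_closed_over - 1` and `sp -= 2 + n_closed_over - 1` in the handlers.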
2014-12-02 22:25:10 +03:00
|
|
|
ENTRY(MP_BC_CALL_FUNCTION): {
|
2019-08-14 17:09:36 +03:00
|
|
|
FRAME_UPDATE();
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2014-04-14 19:22:44 +04:00
|
|
|
DECODE_UINT;
|
|
|
|
// unum & 0xff == n_positional
|
|
|
|
// (unum >> 8) & 0xff == n_keyword
|
|
|
|
sp -= (unum & 0xff) + ((unum >> 7) & 0x1fe);
|
2015-03-28 02:14:44 +03:00
|
|
|
#if MICROPY_STACKLESS
|
|
|
|
if (mp_obj_get_type(*sp) == &mp_type_fun_bc) {
|
|
|
|
code_state->ip = ip;
|
|
|
|
code_state->sp = sp;
|
2019-09-23 16:37:01 +03:00
|
|
|
code_state->exc_sp_idx = MP_CODE_STATE_EXC_SP_IDX_FROM_PTR(exc_stack, exc_sp);
|
2016-08-27 16:21:00 +03:00
|
|
|
mp_code_state_t *new_state = mp_obj_fun_bc_prepare_codestate(*sp, unum & 0xff, (unum >> 8) & 0xff, sp + 1);
|
2018-04-03 17:51:10 +03:00
|
|
|
#if !MICROPY_ENABLE_PYSTACK
|
|
|
|
if (new_state == NULL) {
|
|
|
|
// Couldn't allocate codestate on heap: in the strict case raise
|
|
|
|
// an exception, otherwise just fall through to stack allocation.
|
|
|
|
#if MICROPY_STACKLESS_STRICT
|
|
|
|
deep_recursion_error:
|
|
|
|
mp_raise_recursion_depth();
|
|
|
|
#endif
|
|
|
|
} else
|
|
|
|
#endif
|
|
|
|
{
|
2015-03-28 02:14:44 +03:00
|
|
|
new_state->prev = code_state;
|
|
|
|
code_state = new_state;
|
|
|
|
nlr_pop();
|
|
|
|
goto run_code_state;
|
|
|
|
}
|
2015-03-28 02:14:44 +03:00
|
|
|
}
|
|
|
|
#endif
|
2014-04-14 19:22:44 +04:00
|
|
|
SET_TOP(mp_call_function_n_kw(*sp, unum & 0xff, (unum >> 8) & 0xff, sp + 1));
|
|
|
|
DISPATCH();
|
2014-12-02 22:25:10 +03:00
|
|
|
}
|
2014-04-14 19:22:44 +04:00
|
|
|
|
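The expression `(unum & 0xff) + ((unum >> 7) & 0x1fe)` in `MP_BC_CALL_FUNCTION` is a compact way of computing `n_positional + 2 * n_keyword`: shifting by 7 instead of 8 and masking with `0x1fe` doubles the keyword count, because each keyword argument occupies two slots (name and value). The sketch below, with hypothetical function names, places the two forms side by side so they can be compared.

```c
#include <stddef.h>

/* Slot count exactly as written in the handler. */
static size_t call_slots_vm(size_t unum) {
    return (unum & 0xff) + ((unum >> 7) & 0x1fe);
}

/* The same quantity spelled out: one slot per positional argument and two
 * slots (name, value) per keyword argument. */
static size_t call_slots_plain(size_t unum) {
    size_t n_positional = unum & 0xff;
    size_t n_keyword = (unum >> 8) & 0xff;
    return n_positional + 2 * n_keyword;
}
```

For a call with two positional and three keyword arguments, `unum` is `2 | (3 << 8)` and both functions give 8, the distance from the top of the stack down to the callable.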
2014-12-02 22:25:10 +03:00
|
|
|
ENTRY(MP_BC_CALL_FUNCTION_VAR_KW): {
|
2019-08-14 17:09:36 +03:00
|
|
|
FRAME_UPDATE();
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2014-04-14 19:22:44 +04:00
|
|
|
DECODE_UINT;
|
|
|
|
// unum & 0xff == n_positional
|
|
|
|
// (unum >> 8) & 0xff == n_keyword
|
2017-05-29 10:08:14 +03:00
|
|
|
// We have following stack layout here:
|
2020-03-25 07:54:45 +03:00
|
|
|
// fun arg0 arg1 ... kw0 val0 kw1 val1 ... bitmap <- TOS
|
2020-03-25 05:39:46 +03:00
|
|
|
sp -= (unum & 0xff) + ((unum >> 7) & 0x1fe) + 1;
|
2015-03-28 02:14:45 +03:00
|
|
|
#if MICROPY_STACKLESS
|
|
|
|
if (mp_obj_get_type(*sp) == &mp_type_fun_bc) {
|
|
|
|
code_state->ip = ip;
|
|
|
|
code_state->sp = sp;
|
2019-09-23 16:37:01 +03:00
|
|
|
code_state->exc_sp_idx = MP_CODE_STATE_EXC_SP_IDX_FROM_PTR(exc_stack, exc_sp);
|
2015-03-28 02:14:45 +03:00
|
|
|
|
2015-04-02 01:31:30 +03:00
|
|
|
mp_call_args_t out_args;
|
2015-03-28 02:14:45 +03:00
|
|
|
mp_call_prepare_args_n_kw_var(false, unum, sp, &out_args);
|
|
|
|
|
2016-08-27 16:21:00 +03:00
|
|
|
mp_code_state_t *new_state = mp_obj_fun_bc_prepare_codestate(out_args.fun,
|
2015-03-28 02:14:45 +03:00
|
|
|
out_args.n_args, out_args.n_kw, out_args.args);
|
2017-11-26 15:48:23 +03:00
|
|
|
#if !MICROPY_ENABLE_PYSTACK
|
|
|
|
// Freeing args at this point does not follow a LIFO order so only do it if
|
|
|
|
// pystack is not enabled. For pystack, they are freed when code_state is.
|
|
|
|
mp_nonlocal_free(out_args.args, out_args.n_alloc * sizeof(mp_obj_t));
|
|
|
|
#endif
|
2018-04-03 17:51:10 +03:00
|
|
|
#if !MICROPY_ENABLE_PYSTACK
|
|
|
|
if (new_state == NULL) {
|
|
|
|
// Couldn't allocate codestate on heap: in the strict case raise
|
|
|
|
// an exception, otherwise just fall through to stack allocation.
|
|
|
|
#if MICROPY_STACKLESS_STRICT
|
|
|
|
goto deep_recursion_error;
|
|
|
|
#endif
|
|
|
|
} else
|
|
|
|
#endif
|
|
|
|
{
|
2015-03-28 02:14:45 +03:00
|
|
|
new_state->prev = code_state;
|
|
|
|
code_state = new_state;
|
|
|
|
nlr_pop();
|
|
|
|
goto run_code_state;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
#endif
|
2014-04-14 19:22:44 +04:00
|
|
|
SET_TOP(mp_call_method_n_kw_var(false, unum, sp));
|
|
|
|
DISPATCH();
|
2014-12-02 22:25:10 +03:00
|
|
|
}
|
2014-04-14 19:22:44 +04:00
|
|
|
|
2014-12-02 22:25:10 +03:00
|
|
|
ENTRY(MP_BC_CALL_METHOD): {
|
2019-08-14 17:09:36 +03:00
|
|
|
FRAME_UPDATE();
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2014-04-14 19:22:44 +04:00
|
|
|
DECODE_UINT;
|
|
|
|
// unum & 0xff == n_positional
|
|
|
|
// (unum >> 8) & 0xff == n_keyword
|
|
|
|
sp -= (unum & 0xff) + ((unum >> 7) & 0x1fe) + 1;
|
2015-03-28 02:14:44 +03:00
|
|
|
#if MICROPY_STACKLESS
|
|
|
|
if (mp_obj_get_type(*sp) == &mp_type_fun_bc) {
|
|
|
|
code_state->ip = ip;
|
|
|
|
code_state->sp = sp;
|
2019-09-23 16:37:01 +03:00
|
|
|
code_state->exc_sp_idx = MP_CODE_STATE_EXC_SP_IDX_FROM_PTR(exc_stack, exc_sp);
|
2015-03-28 02:14:44 +03:00
|
|
|
|
2017-02-16 08:48:33 +03:00
|
|
|
size_t n_args = unum & 0xff;
|
|
|
|
size_t n_kw = (unum >> 8) & 0xff;
|
2015-12-17 15:32:41 +03:00
|
|
|
int adjust = (sp[1] == MP_OBJ_NULL) ? 0 : 1;
|
2015-03-28 02:14:44 +03:00
|
|
|
|
2016-08-27 16:21:00 +03:00
|
|
|
mp_code_state_t *new_state = mp_obj_fun_bc_prepare_codestate(*sp, n_args + adjust, n_kw, sp + 2 - adjust);
|
2018-04-03 17:51:10 +03:00
|
|
|
#if !MICROPY_ENABLE_PYSTACK
|
|
|
|
if (new_state == NULL) {
|
|
|
|
// Couldn't allocate codestate on heap: in the strict case raise
|
|
|
|
// an exception, otherwise just fall through to stack allocation.
|
|
|
|
#if MICROPY_STACKLESS_STRICT
|
|
|
|
goto deep_recursion_error;
|
|
|
|
#endif
|
|
|
|
} else
|
|
|
|
#endif
|
|
|
|
{
|
2015-03-28 02:14:44 +03:00
|
|
|
new_state->prev = code_state;
|
|
|
|
code_state = new_state;
|
|
|
|
nlr_pop();
|
|
|
|
goto run_code_state;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
#endif
|
2014-04-14 19:22:44 +04:00
|
|
|
SET_TOP(mp_call_method_n_kw(unum & 0xff, (unum >> 8) & 0xff, sp));
|
|
|
|
DISPATCH();
|
2014-12-02 22:25:10 +03:00
|
|
|
}
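The call handlers above pack two counts into `unum` and then step `sp` back over the arguments with a shift-and-mask expression. The sketch below checks that this expression equals `n_positional + 2 * n_keyword` (each keyword argument occupies a name slot and a value slot). The helper names are hypothetical and exist only for this illustration; the trailing `+ 1` or `+ 2` in the handlers, which covers the extra slots such as `self` or the bitmap, is left out.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical encoder: low byte holds n_positional, next byte holds n_keyword.
static size_t encode_call_args(size_t n_positional, size_t n_keyword) {
    return ((n_keyword & 0xff) << 8) | (n_positional & 0xff);
}

// Stack slots used by the arguments, written the straightforward way.
static size_t call_arg_slots_plain(size_t unum) {
    size_t n_positional = unum & 0xff;
    size_t n_keyword = (unum >> 8) & 0xff;
    return n_positional + 2 * n_keyword;
}

// The same quantity in the shift-and-mask form used by the handlers:
// shifting by 7 instead of 8 doubles the keyword count, and the 0x1fe
// mask keeps exactly those nine bits with the lowest one cleared.
static size_t call_arg_slots_vm(size_t unum) {
    return (unum & 0xff) + ((unum >> 7) & 0x1fe);
}
```

The single-expression form avoids a separate multiply, which is a plausible reason for its use in a hot path, although the source does not state the motivation.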
|
2014-04-14 19:22:44 +04:00
|
|
|
|
2014-12-02 22:25:10 +03:00
|
|
|
ENTRY(MP_BC_CALL_METHOD_VAR_KW): {
|
2019-08-14 17:09:36 +03:00
|
|
|
FRAME_UPDATE();
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2014-04-14 19:22:44 +04:00
|
|
|
DECODE_UINT;
|
|
|
|
// unum & 0xff == n_positional
|
|
|
|
// (unum >> 8) & 0xff == n_keyword
|
2017-05-29 10:08:14 +03:00
|
|
|
// We have the following stack layout here:
|
2020-03-25 07:54:45 +03:00
|
|
|
// fun self arg0 arg1 ... kw0 val0 kw1 val1 ... bitmap <- TOS
|
2020-03-25 05:39:46 +03:00
|
|
|
sp -= (unum & 0xff) + ((unum >> 7) & 0x1fe) + 2;
|
2015-03-28 02:14:45 +03:00
|
|
|
#if MICROPY_STACKLESS
|
|
|
|
if (mp_obj_get_type(*sp) == &mp_type_fun_bc) {
|
|
|
|
code_state->ip = ip;
|
|
|
|
code_state->sp = sp;
|
2019-09-23 16:37:01 +03:00
|
|
|
code_state->exc_sp_idx = MP_CODE_STATE_EXC_SP_IDX_FROM_PTR(exc_stack, exc_sp);
|
2015-03-28 02:14:45 +03:00
|
|
|
|
2015-04-02 01:31:30 +03:00
|
|
|
mp_call_args_t out_args;
|
2015-03-28 02:14:45 +03:00
|
|
|
mp_call_prepare_args_n_kw_var(true, unum, sp, &out_args);
|
|
|
|
|
2016-08-27 16:21:00 +03:00
|
|
|
mp_code_state_t *new_state = mp_obj_fun_bc_prepare_codestate(out_args.fun,
|
2015-03-28 02:14:45 +03:00
|
|
|
out_args.n_args, out_args.n_kw, out_args.args);
|
2017-11-26 15:48:23 +03:00
|
|
|
#if !MICROPY_ENABLE_PYSTACK
|
|
|
|
// Freeing args at this point does not follow a LIFO order so only do it if
|
|
|
|
// pystack is not enabled. For pystack, they are freed when code_state is.
|
|
|
|
mp_nonlocal_free(out_args.args, out_args.n_alloc * sizeof(mp_obj_t));
|
|
|
|
#endif
|
2018-04-03 17:51:10 +03:00
|
|
|
#if !MICROPY_ENABLE_PYSTACK
|
|
|
|
if (new_state == NULL) {
|
|
|
|
// Couldn't allocate codestate on heap: in the strict case raise
|
|
|
|
// an exception, otherwise just fall through to stack allocation.
|
|
|
|
#if MICROPY_STACKLESS_STRICT
|
|
|
|
goto deep_recursion_error;
|
|
|
|
#endif
|
|
|
|
} else
|
|
|
|
#endif
|
|
|
|
{
|
2015-03-28 02:14:45 +03:00
|
|
|
new_state->prev = code_state;
|
|
|
|
code_state = new_state;
|
|
|
|
nlr_pop();
|
|
|
|
goto run_code_state;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
#endif
|
2014-04-14 19:22:44 +04:00
|
|
|
SET_TOP(mp_call_method_n_kw_var(true, unum, sp));
|
|
|
|
DISPATCH();
|
2014-12-02 22:25:10 +03:00
|
|
|
}
|
2014-04-14 19:22:44 +04:00
|
|
|
|
|
|
|
ENTRY(MP_BC_RETURN_VALUE):
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2014-02-01 02:55:05 +04:00
|
|
|
unwind_return:
|
2018-09-03 06:08:16 +03:00
|
|
|
// Search for and execute finally handlers that aren't already active
|
2014-04-14 19:22:44 +04:00
|
|
|
while (exc_sp >= exc_stack) {
|
2019-09-27 17:07:21 +03:00
|
|
|
if (MP_TAGPTR_TAG1(exc_sp->val_sp)) {
|
2022-06-17 16:01:55 +03:00
|
|
|
if (exc_sp->handler >= ip) {
|
2019-09-27 17:07:21 +03:00
|
|
|
// Found a finally handler that isn't active; run it.
|
|
|
|
// Getting here the stack looks like:
|
|
|
|
// (..., X, [iter0, iter1, ...,] ret_val)
|
|
|
|
// where X is pointed to by exc_sp->val_sp and in the case
|
|
|
|
// of a "with" block contains the context manager info.
|
|
|
|
// There may be 0 or more for-iterators between X and the
|
|
|
|
// return value, and these must be removed before control can
|
|
|
|
// pass to the finally code. We simply copy the ret_val down
|
|
|
|
// over these iterators, if they exist. If they don't then the
|
|
|
|
// following is a null operation.
|
|
|
|
mp_obj_t *finally_sp = MP_TAGPTR_PTR(exc_sp->val_sp);
|
|
|
|
finally_sp[1] = sp[0];
|
|
|
|
sp = &finally_sp[1];
|
|
|
|
// We're going to run "finally" code as a coroutine
|
|
|
|
// (not calling it recursively). Set up a sentinel
|
|
|
|
// on the stack so it can return to us when it is
|
|
|
|
// done (when WITH_CLEANUP or END_FINALLY reached).
|
|
|
|
PUSH(MP_OBJ_NEW_SMALL_INT(-1));
|
|
|
|
ip = exc_sp->handler;
|
|
|
|
goto dispatch_loop;
|
|
|
|
} else {
|
|
|
|
// Found a finally handler that is already active; cancel it.
|
|
|
|
CANCEL_ACTIVE_FINALLY(sp);
|
|
|
|
}
|
2014-02-01 02:55:05 +04:00
|
|
|
}
|
2018-09-03 06:08:16 +03:00
|
|
|
POP_EXC_BLOCK();
|
2014-04-14 19:22:44 +04:00
|
|
|
}
|
|
|
|
nlr_pop();
|
2014-05-31 17:50:46 +04:00
|
|
|
code_state->sp = sp;
|
2014-04-14 19:22:44 +04:00
|
|
|
assert(exc_sp == exc_stack - 1);
|
2016-02-16 01:46:21 +03:00
|
|
|
MICROPY_VM_HOOK_RETURN
|
2015-03-28 02:14:44 +03:00
|
|
|
#if MICROPY_STACKLESS
|
|
|
|
if (code_state->prev != NULL) {
|
|
|
|
mp_obj_t res = *sp;
|
|
|
|
mp_globals_set(code_state->old_globals);
|
2017-11-26 15:37:19 +03:00
|
|
|
mp_code_state_t *new_code_state = code_state->prev;
|
|
|
|
#if MICROPY_ENABLE_PYSTACK
|
2017-11-26 15:48:23 +03:00
|
|
|
// Free code_state, and args allocated by mp_call_prepare_args_n_kw_var
|
|
|
|
// (The latter is implicitly freed when using pystack due to its LIFO nature.)
|
2017-11-26 15:37:19 +03:00
|
|
|
// The sizeof in the following statement does not include the size of the variable
|
|
|
|
// part of the struct. This arg is anyway not used if pystack is enabled.
|
|
|
|
mp_nonlocal_free(code_state, sizeof(mp_code_state_t));
|
|
|
|
#endif
|
|
|
|
code_state = new_code_state;
|
2015-03-28 02:14:44 +03:00
|
|
|
*code_state->sp = res;
|
2019-08-14 17:09:36 +03:00
|
|
|
goto run_code_state_from_return;
|
2015-03-28 02:14:44 +03:00
|
|
|
}
|
|
|
|
#endif
|
2019-08-14 17:09:36 +03:00
|
|
|
FRAME_LEAVE();
|
2014-04-14 19:22:44 +04:00
|
|
|
return MP_VM_RETURN_NORMAL;
|
|
|
|
|
2019-08-22 05:39:07 +03:00
|
|
|
ENTRY(MP_BC_RAISE_LAST): {
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2019-08-22 05:39:07 +03:00
|
|
|
// search for the inner-most previous exception, to reraise it
|
|
|
|
mp_obj_t obj = MP_OBJ_NULL;
|
|
|
|
for (mp_exc_stack_t *e = exc_sp; e >= exc_stack; --e) {
|
|
|
|
if (e->prev_exc != NULL) {
|
|
|
|
obj = MP_OBJ_FROM_PTR(e->prev_exc);
|
|
|
|
break;
|
2014-04-14 19:22:44 +04:00
|
|
|
}
|
|
|
|
}
|
2019-08-22 05:39:07 +03:00
|
|
|
if (obj == MP_OBJ_NULL) {
|
2020-03-02 14:35:22 +03:00
|
|
|
obj = mp_obj_new_exception_msg(&mp_type_RuntimeError, MP_ERROR_TEXT("no active exception to reraise"));
|
2019-08-22 05:39:07 +03:00
|
|
|
}
|
|
|
|
RAISE(obj);
|
|
|
|
}
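The RAISE_LAST handler above walks the exception stack from the innermost entry outwards and re-raises the first saved exception it finds. A minimal sketch of that search follows, assuming a simplified entry type; `demo_exc_entry_t` and `find_exception_to_reraise` are hypothetical names, and exceptions are represented by strings purely for illustration. It uses an index rather than a decrementing pointer so the loop never forms a pointer before the start of the array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical, simplified stand-in for one exception stack entry.
typedef struct {
    const char *prev_exc; // NULL when this block is not handling an exception
} demo_exc_entry_t;

// Search from the innermost entry (highest index) down to the base and
// return the first saved exception, or NULL if none is active.
static const char *find_exception_to_reraise(const demo_exc_entry_t *stack, size_t count) {
    for (size_t i = count; i-- > 0;) {
        if (stack[i].prev_exc != NULL) {
            return stack[i].prev_exc;
        }
    }
    return NULL;
}
```

A NULL result corresponds to the branch in the handler that builds a RuntimeError with the message "no active exception to reraise".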
|
|
|
|
|
|
|
|
ENTRY(MP_BC_RAISE_OBJ): {
|
|
|
|
MARK_EXC_IP_SELECTIVE();
|
|
|
|
mp_obj_t obj = mp_make_raise_obj(TOP());
|
|
|
|
RAISE(obj);
|
|
|
|
}
|
|
|
|
|
|
|
|
ENTRY(MP_BC_RAISE_FROM): {
|
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2023-10-06 22:02:43 +03:00
|
|
|
mp_obj_t from_value = POP();
|
|
|
|
if (from_value != mp_const_none) {
|
|
|
|
mp_warning(NULL, "exception chaining not supported");
|
|
|
|
}
|
2019-08-22 05:39:07 +03:00
|
|
|
mp_obj_t obj = mp_make_raise_obj(TOP());
|
2014-05-26 01:58:04 +04:00
|
|
|
RAISE(obj);
|
|
|
|
}
|
2014-01-10 18:09:55 +04:00
|
|
|
|
2014-04-14 19:22:44 +04:00
|
|
|
ENTRY(MP_BC_YIELD_VALUE):
|
2014-03-26 19:36:12 +04:00
|
|
|
yield:
|
2014-04-14 19:22:44 +04:00
|
|
|
nlr_pop();
|
2014-05-31 17:50:46 +04:00
|
|
|
code_state->ip = ip;
|
|
|
|
code_state->sp = sp;
|
2019-09-23 16:37:01 +03:00
|
|
|
code_state->exc_sp_idx = MP_CODE_STATE_EXC_SP_IDX_FROM_PTR(exc_stack, exc_sp);
|
2019-08-14 17:09:36 +03:00
|
|
|
FRAME_LEAVE();
|
2014-04-14 19:22:44 +04:00
|
|
|
return MP_VM_RETURN_YIELD;
|
2013-10-16 01:25:17 +04:00
|
|
|
|
2014-04-14 19:22:44 +04:00
|
|
|
ENTRY(MP_BC_YIELD_FROM): {
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2014-04-14 19:22:44 +04:00
|
|
|
mp_vm_return_kind_t ret_kind;
|
2014-05-26 01:58:04 +04:00
|
|
|
mp_obj_t send_value = POP();
|
2014-04-14 19:22:44 +04:00
|
|
|
mp_obj_t t_exc = MP_OBJ_NULL;
|
2014-05-26 01:58:04 +04:00
|
|
|
mp_obj_t ret_value;
|
2018-02-27 07:39:31 +03:00
|
|
|
code_state->sp = sp; // Save sp because it's needed if mp_resume raises StopIteration
|
2014-04-14 19:22:44 +04:00
|
|
|
if (inject_exc != MP_OBJ_NULL) {
|
|
|
|
t_exc = inject_exc;
|
|
|
|
inject_exc = MP_OBJ_NULL;
|
2014-05-26 01:58:04 +04:00
|
|
|
ret_kind = mp_resume(TOP(), MP_OBJ_NULL, t_exc, &ret_value);
|
2014-04-14 19:22:44 +04:00
|
|
|
} else {
|
2014-05-26 01:58:04 +04:00
|
|
|
ret_kind = mp_resume(TOP(), send_value, MP_OBJ_NULL, &ret_value);
|
2014-04-14 19:22:44 +04:00
|
|
|
}
|
2014-03-26 19:36:12 +04:00
|
|
|
|
2014-04-14 19:22:44 +04:00
|
|
|
if (ret_kind == MP_VM_RETURN_YIELD) {
|
|
|
|
ip--;
|
2014-05-26 01:58:04 +04:00
|
|
|
PUSH(ret_value);
|
2014-04-14 19:22:44 +04:00
|
|
|
goto yield;
|
2017-07-03 19:14:25 +03:00
|
|
|
} else if (ret_kind == MP_VM_RETURN_NORMAL) {
|
2021-06-29 15:39:24 +03:00
|
|
|
// The generator has finished, and returned a value via StopIteration
|
|
|
|
// Replace exhausted generator with the returned value
|
|
|
|
SET_TOP(ret_value);
|
2014-04-14 19:22:44 +04:00
|
|
|
// If we injected GeneratorExit downstream, then even
|
|
|
|
// if it was swallowed, we re-raise GeneratorExit
|
2022-07-12 15:41:10 +03:00
|
|
|
if (t_exc != MP_OBJ_NULL && mp_obj_exception_match(t_exc, MP_OBJ_FROM_PTR(&mp_type_GeneratorExit))) {
|
|
|
|
mp_obj_t raise_t = mp_make_raise_obj(t_exc);
|
|
|
|
RAISE(raise_t);
|
|
|
|
}
|
2014-04-14 19:22:44 +04:00
|
|
|
DISPATCH();
|
2017-07-03 19:14:25 +03:00
|
|
|
} else {
|
|
|
|
assert(ret_kind == MP_VM_RETURN_EXCEPTION);
|
2022-07-12 15:41:10 +03:00
|
|
|
assert(!mp_obj_exception_match(ret_value, MP_OBJ_FROM_PTR(&mp_type_StopIteration)));
|
2014-04-14 19:22:44 +04:00
|
|
|
// Pop exhausted gen
|
|
|
|
sp--;
|
2019-09-30 09:06:20 +03:00
|
|
|
RAISE(ret_value);
|
2014-03-26 19:36:12 +04:00
|
|
|
}
|
2014-04-14 19:22:44 +04:00
|
|
|
}
|
2014-03-26 19:36:12 +04:00
|
|
|
|
2014-05-26 01:58:04 +04:00
|
|
|
ENTRY(MP_BC_IMPORT_NAME): {
|
2019-08-14 17:09:36 +03:00
|
|
|
FRAME_UPDATE();
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2014-04-14 19:22:44 +04:00
|
|
|
DECODE_QSTR;
|
2014-05-26 01:58:04 +04:00
|
|
|
mp_obj_t obj = POP();
|
|
|
|
SET_TOP(mp_import_name(qst, obj, TOP()));
|
2014-04-14 19:22:44 +04:00
|
|
|
DISPATCH();
|
2014-05-26 01:58:04 +04:00
|
|
|
}
|
2014-04-14 19:22:44 +04:00
|
|
|
|
2014-05-26 01:58:04 +04:00
|
|
|
ENTRY(MP_BC_IMPORT_FROM): {
|
2019-08-14 17:09:36 +03:00
|
|
|
FRAME_UPDATE();
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2014-04-14 19:22:44 +04:00
|
|
|
DECODE_QSTR;
|
2014-05-26 01:58:04 +04:00
|
|
|
mp_obj_t obj = mp_import_from(TOP(), qst);
|
|
|
|
PUSH(obj);
|
2014-04-14 19:22:44 +04:00
|
|
|
DISPATCH();
|
2014-05-26 01:58:04 +04:00
|
|
|
}
|
2014-04-14 19:22:44 +04:00
|
|
|
|
|
|
|
ENTRY(MP_BC_IMPORT_STAR):
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2014-04-14 19:22:44 +04:00
|
|
|
mp_import_all(POP());
|
|
|
|
DISPATCH();
|
|
|
|
|
2022-07-12 15:46:35 +03:00
|
|
|
#if MICROPY_OPT_COMPUTED_GOTO
|
py: Compress load-int, load-fast, store-fast, unop, binop bytecodes.
There is a lot of potential in compressing bytecodes and making more use of the
coding space. This patch introduces "multi" bytecodes which have their
argument included in the bytecode (by addition).
UNARY_OP and BINARY_OP now no longer take a 1 byte argument for the
opcode. Rather, the opcode is included in the first byte itself.
LOAD_FAST_[0,1,2] and STORE_FAST_[0,1,2] are removed in favour of their
multi versions, which can take an argument between 0 and 15 inclusive.
The majority of LOAD_FAST/STORE_FAST codes fit in this range and so this
saves a byte for each of these.
LOAD_CONST_SMALL_INT_MULTI is used to load small ints between -16 and 47
inclusive. Such ints are quite common and now only need 1 byte to
store, and now have much faster decoding.
In all, this patch saves about 2% RAM for typical bytecode (1.8% on
64-bit test, 2.5% on pyboard test). It also reduces the binary size
(because bytecodes are simplified) and doesn't harm performance.
2014-10-25 19:43:46 +04:00
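The commit message above describes "multi" bytecodes whose argument is folded into the opcode byte by addition. The sketch below illustrates the idea for small integers in the range -16 to 47 inclusive. The `DEMO_` constants are assumptions chosen for illustration (a base of 0x70, 64 encodable values, an excess of 16) and are not taken from this file; the decode expression mirrors the `ip[-1] - BASE - EXCESS` form seen in the handlers.

```c
#include <assert.h>
#include <stdint.h>

// Assumed values, for illustration only.
enum {
    DEMO_SMALL_INT_MULTI = 0x70,      // first opcode of the range
    DEMO_SMALL_INT_MULTI_NUM = 64,    // number of opcodes in the range
    DEMO_SMALL_INT_MULTI_EXCESS = 16, // bias so negative values fit
};

// True if value can be encoded in a single opcode byte.
static int demo_small_int_fits(int value) {
    return value >= -DEMO_SMALL_INT_MULTI_EXCESS
        && value < DEMO_SMALL_INT_MULTI_NUM - DEMO_SMALL_INT_MULTI_EXCESS;
}

// Fold the value into the opcode by addition.
static uint8_t demo_encode_small_int(int value) {
    return (uint8_t)(DEMO_SMALL_INT_MULTI + DEMO_SMALL_INT_MULTI_EXCESS + value);
}

// Recover the value from the opcode byte alone; no operand byte is read.
static int demo_decode_small_int(uint8_t opcode) {
    return (int)opcode - DEMO_SMALL_INT_MULTI - DEMO_SMALL_INT_MULTI_EXCESS;
}
```

Because the value is recovered from the opcode byte itself, decoding needs no extra fetch, which is consistent with the one-byte saving and faster decoding that the commit message reports.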
|
|
|
ENTRY(MP_BC_LOAD_CONST_SMALL_INT_MULTI):
|
2019-09-12 05:19:49 +03:00
|
|
|
PUSH(MP_OBJ_NEW_SMALL_INT((mp_int_t)ip[-1] - MP_BC_LOAD_CONST_SMALL_INT_MULTI - MP_BC_LOAD_CONST_SMALL_INT_MULTI_EXCESS));
|
2014-10-25 19:43:46 +04:00
|
|
|
DISPATCH();
|
|
|
|
|
|
|
|
ENTRY(MP_BC_LOAD_FAST_MULTI):
|
|
|
|
obj_shared = fastn[MP_BC_LOAD_FAST_MULTI - (mp_int_t)ip[-1]];
|
|
|
|
goto load_check;
|
|
|
|
|
|
|
|
ENTRY(MP_BC_STORE_FAST_MULTI):
|
|
|
|
fastn[MP_BC_STORE_FAST_MULTI - (mp_int_t)ip[-1]] = POP();
|
|
|
|
DISPATCH();
|
|
|
|
|
|
|
|
ENTRY(MP_BC_UNARY_OP_MULTI):
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2014-10-25 19:43:46 +04:00
|
|
|
SET_TOP(mp_unary_op(ip[-1] - MP_BC_UNARY_OP_MULTI, TOP()));
|
|
|
|
DISPATCH();
|
|
|
|
|
|
|
|
ENTRY(MP_BC_BINARY_OP_MULTI): {
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2014-10-25 19:43:46 +04:00
|
|
|
mp_obj_t rhs = POP();
|
|
|
|
mp_obj_t lhs = TOP();
|
|
|
|
SET_TOP(mp_binary_op(ip[-1] - MP_BC_BINARY_OP_MULTI, lhs, rhs));
|
|
|
|
DISPATCH();
|
|
|
|
}
|
|
|
|
|
|
|
|
ENTRY_DEFAULT:
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2022-07-12 15:46:35 +03:00
|
|
|
#else
|
2014-10-25 19:43:46 +04:00
|
|
|
ENTRY_DEFAULT:
|
2019-09-12 05:19:49 +03:00
|
|
|
if (ip[-1] < MP_BC_LOAD_CONST_SMALL_INT_MULTI + MP_BC_LOAD_CONST_SMALL_INT_MULTI_NUM) {
|
|
|
|
PUSH(MP_OBJ_NEW_SMALL_INT((mp_int_t)ip[-1] - MP_BC_LOAD_CONST_SMALL_INT_MULTI - MP_BC_LOAD_CONST_SMALL_INT_MULTI_EXCESS));
|
2014-10-25 19:43:46 +04:00
|
|
|
DISPATCH();
|
2019-09-12 05:19:49 +03:00
|
|
|
} else if (ip[-1] < MP_BC_LOAD_FAST_MULTI + MP_BC_LOAD_FAST_MULTI_NUM) {
|
2014-10-25 19:43:46 +04:00
|
|
|
obj_shared = fastn[MP_BC_LOAD_FAST_MULTI - (mp_int_t)ip[-1]];
|
|
|
|
goto load_check;
|
2019-09-12 05:19:49 +03:00
|
|
|
} else if (ip[-1] < MP_BC_STORE_FAST_MULTI + MP_BC_STORE_FAST_MULTI_NUM) {
|
2014-10-25 19:43:46 +04:00
|
|
|
fastn[MP_BC_STORE_FAST_MULTI - (mp_int_t)ip[-1]] = POP();
|
|
|
|
DISPATCH();
|
2019-09-12 05:19:49 +03:00
|
|
|
} else if (ip[-1] < MP_BC_UNARY_OP_MULTI + MP_BC_UNARY_OP_MULTI_NUM) {
|
2014-10-25 19:43:46 +04:00
|
|
|
SET_TOP(mp_unary_op(ip[-1] - MP_BC_UNARY_OP_MULTI, TOP()));
|
|
|
|
DISPATCH();
|
2019-09-12 05:19:49 +03:00
|
|
|
} else if (ip[-1] < MP_BC_BINARY_OP_MULTI + MP_BC_BINARY_OP_MULTI_NUM) {
|
2014-10-25 19:43:46 +04:00
|
|
|
mp_obj_t rhs = POP();
|
|
|
|
mp_obj_t lhs = TOP();
|
|
|
|
SET_TOP(mp_binary_op(ip[-1] - MP_BC_BINARY_OP_MULTI, lhs, rhs));
|
|
|
|
DISPATCH();
|
|
|
|
} else
|
2022-07-12 15:46:35 +03:00
|
|
|
#endif // MICROPY_OPT_COMPUTED_GOTO
|
2014-10-25 19:43:46 +04:00
|
|
|
{
|
2020-03-02 14:35:22 +03:00
|
|
|
mp_obj_t obj = mp_obj_new_exception_msg(&mp_type_NotImplementedError, MP_ERROR_TEXT("opcode"));
|
2014-04-14 19:22:44 +04:00
|
|
|
nlr_pop();
|
2019-01-04 09:22:40 +03:00
|
|
|
code_state->state[0] = obj;
|
2019-08-14 17:09:36 +03:00
|
|
|
FRAME_LEAVE();
|
2014-04-14 19:22:44 +04:00
|
|
|
return MP_VM_RETURN_EXCEPTION;
|
2014-05-26 01:58:04 +04:00
|
|
|
}
|
py: Tidy up variables in VM, probably fixes subtle bugs.
Things get tricky when using the nlr code to catch exceptions. Need to
ensure that the variables (stack layout) in the exception handler are
the same as in the bit protected by the exception handler.
Prior to this patch there were a few bugs. 1) The constant
mp_const_MemoryError_obj was being preloaded to a specific location on
the stack at the start of the function. But this location on the stack
was being overwritten in the opcode loop (since it didn't think that
variable would ever be referenced again), and so when an exception
occurred, the variable holding the address of MemoryError was corrupt.
2) The FOR_ITER opcode detection in the exception handler used sp, which
may or may not contain the right value coming out of the main opcode
loop.
With this patch there is a clear separation of variables used in the
opcode loop and in the exception handler (should fix issue (2) above).
Furthermore, nlr_raise is no longer used in the opcode loop. Instead,
it jumps directly into the exception handler. This tells the C compiler
more about the possible code flow, and means that it should have the
same stack layout for the exception handler. This should fix issue (1)
above. Indeed, the generated (ARM) assembler has been checked explicitly,
and with 'goto exception_handler', the problem with &MemoryError is
fixed.
This may now fix problems with rge-sm, and probably many other subtle
bugs yet to show themselves. Incidentally, rge-sm now passes on
pyboard (with a reduced range of integration)!
Main lesson: nlr is tricky. Don't use nlr_push unless you know what you
are doing! Luckily, it's not used in many places. Using nlr_raise/jump
is fine.
2014-04-17 19:50:23 +04:00
|
|
|
|
2022-07-12 15:46:35 +03:00
|
|
|
#if !MICROPY_OPT_COMPUTED_GOTO
|
2014-04-15 11:57:01 +04:00
|
|
|
} // switch
|
2022-07-12 15:46:35 +03:00
|
|
|
#endif
|
2014-10-25 21:19:55 +04:00
|
|
|
|
|
|
|
pending_exception_check:
|
2022-07-01 04:22:24 +03:00
|
|
|
// We've just done a branch, use this as a convenient point to
|
|
|
|
// run periodic code/checks and/or bounce the GIL, i.e.
|
|
|
|
// not _every_ instruction but on average a branch should
|
|
|
|
// occur every few instructions.
|
2016-02-16 01:46:21 +03:00
|
|
|
MICROPY_VM_HOOK_LOOP
|
2017-02-16 10:05:06 +03:00
|
|
|
|
2022-07-01 04:22:24 +03:00
|
|
|
// Check for pending exceptions or scheduled tasks to run.
|
|
|
|
// Note: it's safe to just call mp_handle_pending(true), but
|
|
|
|
// we can inline the check for the common case where there is
|
|
|
|
// neither.
|
|
|
|
if (
|
2017-02-16 10:05:06 +03:00
|
|
|
#if MICROPY_ENABLE_SCHEDULER
|
2022-07-01 04:22:24 +03:00
|
|
|
#if MICROPY_PY_THREAD
|
|
|
|
// Scheduler + threading: Scheduler and pending exceptions are independent, check both.
|
|
|
|
MP_STATE_VM(sched_state) == MP_SCHED_PENDING || MP_STATE_THREAD(mp_pending_exception) != MP_OBJ_NULL
|
2017-02-16 10:05:06 +03:00
|
|
|
#else
|
2022-07-01 04:22:24 +03:00
|
|
|
// Scheduler + non-threading: Optimisation: pending exception sets sched_state, only check sched_state.
|
|
|
|
MP_STATE_VM(sched_state) == MP_SCHED_PENDING
|
|
|
|
#endif
|
|
|
|
#else
|
|
|
|
// No scheduler: Just check pending exception.
|
|
|
|
MP_STATE_THREAD(mp_pending_exception) != MP_OBJ_NULL
|
|
|
|
#endif
|
2022-12-16 09:31:21 +03:00
|
|
|
#if MICROPY_ENABLE_VM_ABORT
|
|
|
|
// Check if the VM should abort execution.
|
|
|
|
|| MP_STATE_VM(vm_abort)
|
|
|
|
#endif
|
2022-07-01 04:22:24 +03:00
|
|
|
) {
|
2014-12-28 08:17:43 +03:00
|
|
|
MARK_EXC_IP_SELECTIVE();
|
2022-07-01 04:22:24 +03:00
|
|
|
mp_handle_pending(true);
|
2014-10-25 21:19:55 +04:00
|
|
|
}
|
|
|
|
|
2017-02-06 02:50:43 +03:00
|
|
|
#if MICROPY_PY_THREAD_GIL
|
|
|
|
#if MICROPY_PY_THREAD_GIL_VM_DIVISOR
|
2022-07-01 04:22:24 +03:00
|
|
|
// Don't bounce the GIL too frequently (default every 32 branches).
|
2018-05-16 05:33:39 +03:00
|
|
|
if (--gil_divisor == 0)
|
2017-02-06 02:50:43 +03:00
|
|
|
#endif
|
2018-05-16 05:33:39 +03:00
|
|
|
{
|
|
|
|
#if MICROPY_PY_THREAD_GIL_VM_DIVISOR
|
|
|
|
gil_divisor = MICROPY_PY_THREAD_GIL_VM_DIVISOR;
|
|
|
|
#endif
|
2017-03-20 10:42:27 +03:00
|
|
|
#if MICROPY_ENABLE_SCHEDULER
|
|
|
|
// can only switch threads if the scheduler is unlocked
|
|
|
|
if (MP_STATE_VM(sched_state) == MP_SCHED_IDLE)
|
|
|
|
#endif
|
|
|
|
{
|
2017-02-06 02:50:43 +03:00
|
|
|
MP_THREAD_GIL_EXIT();
|
|
|
|
MP_THREAD_GIL_ENTER();
|
2017-03-20 10:42:27 +03:00
|
|
|
}
|
2017-02-06 02:50:43 +03:00
|
|
|
}
|
|
|
|
#endif
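The GIL block above uses a countdown so that the lock is released and re-acquired only once every so many branches. The sketch below shows that countdown in isolation. `demo_vm_t`, `demo_vm_init` and `demo_on_branch` are hypothetical names, the divisor of 32 is taken from the comment in the source, and the counter increment stands in for the `MP_THREAD_GIL_EXIT(); MP_THREAD_GIL_ENTER();` pair.

```c
#include <assert.h>

#define DEMO_GIL_DIVISOR 32 // assumed, matching "default every 32 branches"

// Hypothetical state for the illustration.
typedef struct {
    int gil_divisor; // branches left until the next bounce
    int bounces;     // how many times the GIL would have been bounced
} demo_vm_t;

static void demo_vm_init(demo_vm_t *vm) {
    vm->gil_divisor = DEMO_GIL_DIVISOR;
    vm->bounces = 0;
}

// Call once per branch. Returns 1 when a bounce happened on this call.
static int demo_on_branch(demo_vm_t *vm) {
    if (--vm->gil_divisor == 0) {
        vm->gil_divisor = DEMO_GIL_DIVISOR; // reload the countdown
        vm->bounces++;                      // stand-in for GIL exit + enter
        return 1;
    }
    return 0;
}
```

A decrement-and-test keeps the per-branch cost to a single counter update in the common case where no bounce is due.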
|
2016-05-26 13:42:53 +03:00
|
|
|
|
2014-04-17 19:50:23 +04:00
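The commit message above describes replacing a non-local raise inside the opcode loop with a direct jump into the exception handler. A minimal sketch of that control-flow shape follows. It assumes standard `setjmp`/`longjmp` as a stand-in for MicroPython's nlr mechanism, and every name in it (`demo_env`, `demo_exc`, `demo_execute`, `demo_callee_raise`, the `OP_*` and `EXC_*` constants) is hypothetical and not part of the VM. The pending exception lives in a static rather than a local, so its value is well defined after `longjmp`.

```c
#include <setjmp.h>

// Hypothetical toy opcodes and exception codes, for illustration only.
enum { OP_RETURN, OP_NOP, OP_RAISE_LOCAL, OP_CALL_RAISES };
enum { EXC_NONE, EXC_FROM_LOOP, EXC_FROM_CALLEE };

// Stand-ins for the nlr buffer and nlr.ret_val (assumption: one active frame only).
static jmp_buf demo_env;
static int demo_exc;

// A callee cannot "goto" into the caller, so it must unwind non-locally.
static void demo_callee_raise(void) {
    demo_exc = EXC_FROM_CALLEE;
    longjmp(demo_env, 1);
}

// Runs the toy opcode stream and returns the exception code, or EXC_NONE.
static int demo_execute(const unsigned char *ops) {
    if (setjmp(demo_env) == 0) {
        for (;;) {
            switch (*ops++) {
                case OP_RAISE_LOCAL:
                    // Raised inside the loop: jump straight to the handler
                    // instead of unwinding, so the compiler sees the edge.
                    demo_exc = EXC_FROM_LOOP;
                    goto exception_handler;
                case OP_CALL_RAISES:
                    demo_callee_raise(); // arrives in the else branch via longjmp
                    break;
                case OP_RETURN:
                    return EXC_NONE;
                default: // OP_NOP
                    break;
            }
        }
    } else {
    exception_handler:
        // Both the goto path and the longjmp path end up here.
        return demo_exc;
    }
}
```

Both raise paths converge on the same handler code, but only the callee path relies on the saved jump buffer. The sketch supports a single active frame because `demo_env` is one global buffer. Nested frames would need a stack of buffers.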
|
|
|
} // for loop
|
2013-10-16 01:25:17 +04:00
|
|
|
|
|
|
|
} else {
|
2014-04-17 19:50:23 +04:00
|
|
|
exception_handler:
|
2013-10-16 01:25:17 +04:00
|
|
|
// exception occurred
|
|
|
|
|
2015-04-25 03:17:41 +03:00
|
|
|
#if MICROPY_PY_SYS_EXC_INFO
|
|
|
|
MP_STATE_VM(cur_exception) = nlr.ret_val;
|
|
|
|
#endif
|
|
|
|
|
2014-12-29 03:29:59 +03:00
|
|
|
#if SELECTIVE_EXC_IP
|
|
|
|
// with selective ip, we store the ip 1 byte past the opcode, so move ptr back
|
|
|
|
code_state->ip -= 1;
|
|
|
|
#endif
|
|
|
|
|
2015-11-27 20:01:44 +03:00
|
|
|
if (mp_obj_is_subclass_fast(MP_OBJ_FROM_PTR(((mp_obj_base_t*)nlr.ret_val)->type), MP_OBJ_FROM_PTR(&mp_type_StopIteration))) {
|
2022-06-21 11:11:19 +03:00
|
|
|
// check if it's a StopIteration within a for block
|
|
|
|
if (*code_state->ip == MP_BC_FOR_ITER) {
|
|
|
|
const byte *ip = code_state->ip + 1;
|
|
|
|
DECODE_ULABEL; // the jump offset if iteration finishes; for labels are always forward
|
|
|
|
code_state->ip = ip + ulab; // jump to after for-block
|
|
|
|
code_state->sp -= MP_OBJ_ITER_BUF_NSLOTS; // pop the exhausted iterator
|
|
|
|
goto outer_dispatch_loop; // continue with dispatch loop
|
|
|
|
} else if (*code_state->ip == MP_BC_YIELD_FROM) {
|
|
|
|
// StopIteration inside yield from call means return a value of
|
|
|
|
// yield from, so inject exception's value as yield from's result
|
|
|
|
// (Instead of stack pop then push we just replace exhausted gen with value)
|
|
|
|
*code_state->sp = mp_obj_exception_get_value(MP_OBJ_FROM_PTR(nlr.ret_val));
|
|
|
|
code_state->ip++; // yield from is over, move to next instruction
|
|
|
|
goto outer_dispatch_loop; // continue with dispatch loop
|
2015-05-10 17:18:10 +03:00
|
|
|
}
|
2014-03-26 22:37:06 +04:00
|
|
|
}
|
|
|
|
|
2019-08-14 17:09:36 +03:00
|
|
|
#if MICROPY_PY_SYS_SETTRACE
|
|
|
|
// Exceptions are traced here
|
|
|
|
if (mp_obj_is_subclass_fast(MP_OBJ_FROM_PTR(((mp_obj_base_t*)nlr.ret_val)->type), MP_OBJ_FROM_PTR(&mp_type_Exception))) {
|
|
|
|
TRACE_TICK(code_state->ip, code_state->sp, true /* yes, it's an exception */);
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2015-03-28 02:14:44 +03:00
|
|
|
#if MICROPY_STACKLESS
|
|
|
|
unwind_loop:
|
|
|
|
#endif
|
2019-08-21 09:07:12 +03:00
|
|
|
// Set traceback info (file and line number) where the exception occurred, but not for:
|
|
|
|
// - constant GeneratorExit object, because it's const
|
|
|
|
// - exceptions re-raised by END_FINALLY
|
2019-08-21 09:08:43 +03:00
|
|
|
// - exceptions re-raised explicitly by "raise"
|
2019-08-21 09:07:12 +03:00
|
|
|
if (nlr.ret_val != &mp_const_GeneratorExit_obj
|
2019-08-21 09:08:43 +03:00
|
|
|
&& *code_state->ip != MP_BC_END_FINALLY
|
2019-08-22 05:39:07 +03:00
|
|
|
&& *code_state->ip != MP_BC_RAISE_LAST) {
|
2017-03-17 06:54:53 +03:00
|
|
|
const byte *ip = code_state->fun_bc->bytecode;
|
2019-09-16 15:12:59 +03:00
|
|
|
MP_BC_PRELUDE_SIG_DECODE(ip);
|
2019-09-25 08:45:47 +03:00
|
|
|
MP_BC_PRELUDE_SIZE_DECODE(ip);
|
2021-10-22 14:22:47 +03:00
|
|
|
const byte *line_info_top = ip + n_info;
|
2019-09-25 08:45:47 +03:00
|
|
|
const byte *bytecode_start = ip + n_info + n_cell;
|
|
|
|
size_t bc = code_state->ip - bytecode_start;
|
2017-06-09 06:31:57 +03:00
|
|
|
qstr block_name = mp_decode_uint_value(ip);
|
2021-10-22 14:22:47 +03:00
|
|
|
for (size_t i = 0; i < 1 + n_pos_args + n_kwonly_args; ++i) {
|
|
|
|
ip = mp_decode_uint_skip(ip);
|
|
|
|
}
|
|
|
|
#if MICROPY_EMIT_BYTECODE_USES_QSTR_TABLE
|
|
|
|
block_name = code_state->fun_bc->context->constants.qstr_table[block_name];
|
|
|
|
qstr source_file = code_state->fun_bc->context->constants.qstr_table[0];
|
|
|
|
#else
|
|
|
|
qstr source_file = code_state->fun_bc->context->constants.source_file;
|
2015-11-02 20:27:18 +03:00
|
|
|
#endif
|
2021-10-22 14:22:47 +03:00
|
|
|
size_t source_line = mp_bytecode_get_source_line(ip, line_info_top, bc);
|
2015-11-27 20:01:44 +03:00
|
|
|
mp_obj_exception_add_traceback(MP_OBJ_FROM_PTR(nlr.ret_val), source_file, source_line, block_name);
|
2014-01-19 03:24:36 +04:00
|
|
|
}
|
|
|
|
|
2019-01-04 08:40:29 +03:00
|
|
|
while (exc_sp >= exc_stack && exc_sp->handler <= code_state->ip) {
|
|
|
|
|
2013-12-29 20:54:59 +04:00
|
|
|
// nested exception
|
|
|
|
|
2014-03-22 15:49:31 +04:00
|
|
|
assert(exc_sp >= exc_stack);
|
2013-12-29 20:54:59 +04:00
|
|
|
|
|
|
|
// TODO make a proper message for nested exception
|
|
|
|
// at the moment we are just raising the very last exception (the one that caused the nested exception)
|
|
|
|
|
|
|
|
// move up to previous exception handler
|
2014-03-30 01:16:27 +04:00
|
|
|
POP_EXC_BLOCK();
|
2013-12-29 20:54:59 +04:00
|
|
|
}
|
|
|
|
|
2014-03-22 15:49:31 +04:00
|
|
|
if (exc_sp >= exc_stack) {
|
2013-10-16 01:25:17 +04:00
|
|
|
// catch exception and pass to byte code
|
2014-05-31 17:50:46 +04:00
|
|
|
code_state->ip = exc_sp->handler;
|
2014-04-17 19:50:23 +04:00
|
|
|
mp_obj_t *sp = MP_TAGPTR_PTR(exc_sp->val_sp);
|
2014-03-30 04:54:48 +04:00
|
|
|
// save this exception in the stack so it can be used in a reraise, if needed
|
|
|
|
exc_sp->prev_exc = nlr.ret_val;
|
2016-09-27 05:37:21 +03:00
|
|
|
// push exception object so it can be handled by bytecode
|
2015-11-27 20:01:44 +03:00
|
|
|
PUSH(MP_OBJ_FROM_PTR(nlr.ret_val));
|
2014-05-31 17:50:46 +04:00
|
|
|
code_state->sp = sp;
|
2013-12-29 20:54:59 +04:00
|
|
|
|
2015-03-28 02:14:44 +03:00
|
|
|
#if MICROPY_STACKLESS
|
|
|
|
} else if (code_state->prev != NULL) {
|
|
|
|
mp_globals_set(code_state->old_globals);
|
2017-11-26 15:37:19 +03:00
|
|
|
mp_code_state_t *new_code_state = code_state->prev;
|
|
|
|
#if MICROPY_ENABLE_PYSTACK
|
2017-11-26 15:48:23 +03:00
|
|
|
// Free code_state, and args allocated by mp_call_prepare_args_n_kw_var
|
|
|
|
// (The latter is implicitly freed when using pystack due to its LIFO nature.)
|
2017-11-26 15:37:19 +03:00
|
|
|
// The sizeof in the following statement does not include the size of the variable
|
|
|
|
// part of the struct. This arg is anyway not used if pystack is enabled.
|
|
|
|
mp_nonlocal_free(code_state, sizeof(mp_code_state_t));
|
|
|
|
#endif
|
|
|
|
code_state = new_code_state;
|
2019-09-24 08:57:08 +03:00
|
|
|
size_t n_state = code_state->n_state;
|
2017-03-17 06:54:53 +03:00
|
|
|
fastn = &code_state->state[n_state - 1];
|
|
|
|
exc_stack = (mp_exc_stack_t*)(code_state->state + n_state);
|
2015-03-28 02:14:44 +03:00
|
|
|
// variables that are visible to the exception handler (declared volatile)
|
2019-09-23 16:37:01 +03:00
|
|
|
exc_sp = MP_CODE_STATE_EXC_SP_IDX_TO_PTR(exc_stack, code_state->exc_sp_idx); // stack grows up, exc_sp points to top of stack
|
2015-03-28 02:14:44 +03:00
|
|
|
goto unwind_loop;
|
|
|
|
|
|
|
|
#endif
|
2013-10-16 01:25:17 +04:00
|
|
|
} else {
|
2014-02-16 02:55:00 +04:00
|
|
|
// propagate exception to higher level
|
2018-09-27 08:17:37 +03:00
|
|
|
// Note: ip and sp don't have usable values at this point
|
2018-09-29 16:25:08 +03:00
|
|
|
code_state->state[0] = MP_OBJ_FROM_PTR(nlr.ret_val); // put exception here because sp is invalid
|
2019-08-14 17:09:36 +03:00
|
|
|
FRAME_LEAVE();
|
2014-02-16 02:55:00 +04:00
|
|
|
return MP_VM_RETURN_EXCEPTION;
|
2013-10-16 01:25:17 +04:00
|
|
|
}
|
2013-10-04 22:53:11 +04:00
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|