NetBSD/share/doc/papers/px/pxin2.n

926 lines
21 KiB
Plaintext

.\" $NetBSD: pxin2.n,v 1.2 1998/01/09 06:41:56 perry Exp $
.\"
.\" Copyright (c) 1979 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\" must display the following acknowledgement:
.\" This product includes software developed by the University of
.\" California, Berkeley and its contributors.
.\" 4. Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" @(#)pxin2.n 5.2 (Berkeley) 4/17/91
.\"
.if !\n(xx .so tmac.p
.nr H1 1
.if n .ND
.NH
Operations
.NH 2
Naming conventions and operation summary
.PP
Table 2.1 outlines the opcode typing convention.
The expression ``a above b'' means that `a' is on top
of the stack with `b' below it.
Table 2.3 describes each of the opcodes.
The character `*' at the end of a name specifies that
all operations with the root prefix
before the `*'
are summarized by one entry.
Table 2.2 gives the codes used
to describe the type inline data expected by each instruction.
.sp 2
.so table2.1.n
.sp 2
.so table2.2.n
.bp
.so table2.3.n
.bp
.NH 2
Basic control operations
.LP
.SH
HALT
.IP
Corresponds to the Pascal procedure
.I halt ;
causes execution to end with a post-mortem backtrace as if a run-time
error had occurred.
.SH
BEG s,W,w,"
.IP
Causes the second part of the block mark to be created, and
.I W
bytes of local variable space to be allocated and cleared to zero.
Stack overflow is detected here.
.I w
is the first line of the body of this section for error traceback,
and the inline string (length s) the character representation of its name.
.SH
NODUMP s,W,w,"
.IP
Equivalent to
.SM BEG ,
and used to begin the main program when the ``p''
option is disabled so that the post-mortem backtrace will be inhibited.
.SH
END
.IP
Complementary to the operators
.SM CALL
and
.SM BEG ,
exits the current block, calling the procedure
.I pclose
to flush buffers for and release any local files.
Restores the environment of the caller from the block mark.
If this is the end for the main program, all files are
.I flushed,
and the interpreter is exited.
.SH
CALL l,A
.IP
Saves the current line number, return address, and active display entry pointer
.I dp
in the first part of the block mark, then transfers to the entry point
given by the relative address
.I A ,
that is the beginning of a
.B procedure
or
.B function
at level
.I l.
.SH
PUSH s
.IP
Clears
.I s
bytes on the stack.
Used to make space for the return value of a
.B function
just before calling it.
.SH
POP s
.IP
Pop
.I s
bytes off the stack.
Used after a
.B function
or
.B procedure
returns to remove the arguments from the stack.
.SH
TRA a
.IP
Transfer control to relative address
.I a
as a local
.B goto
or part of a structured statement.
.SH
TRA4 A
.IP
Transfer control to an absolute address as part of a non-local
.B goto
or to branch over procedure bodies.
.SH
LINO s
.IP
Set current line number to
.I s.
For consistency, check that the expression stack is empty
as it should be (as this is the start of a statement.)
This consistency check will fail only if there is a bug in the
interpreter or the interpreter code has somehow been damaged.
Increment the statement count and if it exceeds the statement limit,
generate a fault.
.SH
GOTO l,A
.IP
Transfer control to address
.I A
that is in the block at level
.I l
of the display.
This is a non-local
.B goto.
Causes each block to be exited as if with
.SM END ,
flushing and freeing files with
.I pclose,
until the current display entry is at level
.I l.
.SH
SDUP*
.IP
Duplicate the word or long on the top of
the stack.
This is used mostly for constructing sets.
See section 2.11.
.NH 2
If and relational operators
.SH
IF a
.IP
The interpreter conditional transfers all take place using this operator
that examines the Boolean value on the top of the stack.
If the value is
.I true ,
the next code is executed,
otherwise control transfers to the specified address.
.SH
REL* r
.IP
These take two arguments on the stack,
and the sub-operation code specifies the relational operation to
be done, coded as follows with `a' above `b' on the stack:
.DS
.mD
.TS
lb lb
c a.
Code Operation
_
0 a = b
2 a <> b
4 a < b
6 a > b
8 a <= b
10 a >= b
.TE
.DE
.IP
Each operation does a test to set the condition code
appropriately and then does an indexed branch based on the
sub-operation code to a test of the condition here specified,
pushing a Boolean value on the stack.
.IP
Consider the statement fragment:
.DS
.mD
\*bif\fR a = b \*bthen\fR
.DE
.IP
If
.I a
and
.I b
are integers this generates the following code:
.DS
.TS
lp-2w(8) l.
RV4:\fIl a\fR
RV4:\fIl b\fR
REL4 \&=
IF \fIElse part offset\fR
.sp
.T&
c s.
\fI\&... Then part code ...\fR
.TE
.DE
.NH 2
Boolean operators
.PP
The Boolean operators
.SM AND ,
.SM OR ,
and
.SM NOT
manipulate values on the top of the stack.
All Boolean values are kept in single bytes in memory,
or in single words on the stack.
Zero represents a Boolean \fIfalse\fP, and one a Boolean \fItrue\fP.
.NH 2
Right value, constant, and assignment operators
.SH
LRV* l,A
.br
RV* l,a
.IP
The right value operators load values on the stack.
They take a block number as a sub-opcode and load the appropriate
number of bytes from that block at the offset specified
in the following word onto the stack. As an example, consider
.SM LRV4 :
.DS
.mD
_LRV4:
\fBcvtbl\fR (lc)+,r0 #r0 has display index
\fBaddl3\fR _display(r0),(lc)+,r1 #r1 has variable address
\fBpushl\fR (r1) #put value on the stack
\fBjmp\fR (loop)
.DE
.IP
Here the interpreter places the display level in r0.
It then adds the appropriate display value to the inline offset and
pushes the value at this location onto the stack.
Control then returns to the main
interpreter loop.
The
.SM RV*
operators have short inline data that
reduces the space required to address the first 32K of
stack space in each stack frame.
The operators
.SM RV14
and
.SM RV24
provide explicit conversion to long as the data
is pushed.
This saves the generation of
.SM STOI
to align arguments to
.SM C
subroutines.
.SH
CON* r
.IP
The constant operators load a value onto the stack from inline code.
Small integer values are condensed and loaded by the
.SM CON1
operator, that is given by
.DS
.mD
_CON1:
\fBcvtbw\fR (lc)+,\-(sp)
\fBjmp\fR (loop)
.DE
.IP
Here note that little work was required as the required constant
was available at (lc)+.
For longer constants,
.I lc
must be incremented before moving the constant.
The operator
.SM CON
takes a length specification in the sub-opcode and can be used to load
strings and other variable length data onto the stack.
The operators
.SM CON14
and
.SM CON24
provide explicit conversion to long as the constant is pushed.
.SH
AS*
.IP
The assignment operators are similar to arithmetic and relational operators
in that they take two operands, both in the stack,
but the lengths given for them specify
first the length of the value on the stack and then the length
of the target in memory.
The target address in memory is under the value to be stored.
Thus the statement
.DS
i := 1
.DE
.IP
where
.I i
is a full-length, 4 byte, integer,
will generate the code sequence
.DS
.TS
lp-2w(8) l.
LV:\fIl i\fP
CON1:1
AS24
.TE
.DE
.IP
Here
.SM LV
will load the address of
.I i,
that is really given as a block number in the sub-opcode and an
offset in the following word,
onto the stack, occupying a single word.
.SM CON1 ,
that is a single word instruction,
then loads the constant 1,
that is in its sub-opcode,
onto the stack.
Since there are not one byte constants on the stack,
this becomes a 2 byte, single word integer.
The interpreter then assigns a length 2 integer to a length 4 integer using
.SM AS24 \&.
The code sequence for
.SM AS24
is given by:
.DS
.mD
_AS24:
\fBincl\fR lc
\fBcvtwl\fR (sp)+,*(sp)+
\fBjmp\fR (loop)
.DE
.IP
Thus the interpreter gets the single word off the stack,
extends it to be a 4 byte integer
gets the target address off the stack,
and finally stores the value in the target.
This is a typical use of the constant and assignment operators.
.NH 2
Addressing operations
.SH
LLV l,W
.br
LV l,w
.IP
The most common operation done by the interpreter
is the ``left value'' or ``address of'' operation.
It is given by:
.DS
.mD
_LLV:
\fBcvtbl\fR (lc)+,r0 #r0 has display index
\fBaddl3\fR _display(r0),(lc)+,\-(sp) #push address onto the stack
\fBjmp\fR (loop)
.DE
.IP
It calculates an address in the block specified in the sub-opcode
by adding the associated display entry to the
offset that appears in the following word.
The
.SM LV
operator has a short inline data that reduces the space
required to address the first 32K of stack space in each call frame.
.SH
OFF s
.IP
The offset operator is used in field names.
Thus to get the address of
.LS
p^.f1
.LE
.IP
.I pi
would generate the sequence
.DS
.mD
.TS
lp-2w(8) l.
RV:\fIl p\fP
OFF \fIf1\fP
.TE
.DE
.IP
where the
.SM RV
loads the value of
.I p,
given its block in the sub-opcode and offset in the following word,
and the interpreter then adds the offset of the field
.I f1
in its record to get the correct address.
.SM OFF
takes its argument in the sub-opcode if it is small enough.
.SH
NIL
.IP
The example above is incomplete, lacking a check for a
.B nil
pointer.
The code generated would be
.DS
.TS
lp-2w(8) l.
RV:\fIl p\fP
NIL
OFF \fIf1\fP
.TE
.DE
.IP
where the
.SM NIL
operation checks for a
.I nil
pointer and generates the appropriate runtime error if it is.
.SH
LVCON s,"
.IP
A pointer to the specified length inline data is pushed
onto the stack.
This is primarily used for
.I printf
type strings used by
.SM WRITEF .
(see sections 3.6 and 3.8)
.SH
INX* s,w,w
.IP
The operators
.SM INX2
and
.SM INX4
are used for subscripting.
For example, the statement
.DS
a[i] := 2.0
.DE
.IP
with
.I i
an integer and
.I a
an
``array [1..1000] of real''
would generate
.DS
.TS
lp-2w(8) l.
LV:\fIl a\fP
RV4:\fIl i\fP
INX4:8 1,999
CON8 2.0
AS8
.TE
.DE
.IP
Here the
.SM LV
operation takes the address of
.I a
and places it on the stack.
The value of
.I i
is then placed on top of this on the stack.
The array address is indexed by the
length 4 index (a length 2 index would use
.SM INX2 )
where the individual elements have a size of 8 bytes.
The code for
.SM INX4
is:
.DS
.mD
_INX4:
\fBcvtbl\fR (lc)+,r0
\fBbneq\fR L1
\fBcvtwl\fR (lc)+,r0 #r0 has size of records
L1:
\fBcvtwl\fR (lc)+,r1 #r1 has lower bound
\fBmovzwl\fR (lc)+,r2 #r2 has upper-lower bound
\fBsubl3\fR r1,(sp)+,r3 #r3 has base subscript
\fBcmpl\fR r3,r2 #check for out of bounds
\fBbgtru\fR esubscr
\fBmull2\fR r0,r3 #calculate byte offset
\fBaddl2\fR r3,(sp) #calculate actual address
\fBjmp\fR (loop)
esubscr:
\fBmovw\fR $ESUBSCR,_perrno
\fBjbr\fR error
.DE
.IP
Here the lower bound is subtracted, and range checked against the
upper minus lower bound.
The offset is then scaled to a byte offset into the array
and added to the base address on the stack.
Multi-dimension subscripts are translated as a sequence of single subscriptings.
.SH
IND*
.IP
For indirect references through
.B var
parameters and pointers,
the interpreter has a set of indirection operators that convert a pointer
on the stack into a value on the stack from that address.
different
.SM IND
operators are necessary because of the possibility of different
length operands.
The
.SM IND14
and
.SM IND24
operators do conversions to long
as they push their data.
.NH 2
Arithmetic operators
.PP
The interpreter has many arithmetic operators.
All operators produce results long enough to prevent overflow
unless the bounds of the base type are exceeded.
The basic operators available are
.DS
Addition: ADD*, SUCC*
Subtraction: SUB*, PRED*
Multiplication: MUL*, SQR*
Division: DIV*, DVD*, MOD*
Unary: NEG*, ABS*
.DE
.NH 2
Range checking
.PP
The interpreter has several range checking operators.
The important distinction among these operators is between values whose
legal range begins at zero and those that do not begin at zero,
for example
a subrange variable whose values range from 45 to 70.
For those that begin at zero, a simpler ``logical'' comparison against
the upper bound suffices.
For others, both the low and upper bounds must be checked independently,
requiring two comparisons.
On the
.SM "VAX 11/780"
both checks are done using a single index instruction
so the only gain is in reducing the inline data.
.NH 2
Case operators
.PP
The interpreter includes three operators for
.B case
statements that are used depending on the width of the
.B case
label type.
For each width, the structure of the case data is the same, and
is represented in figure 2.4.
.sp 1
.so fig2.4.n
.PP
The
.SM CASEOP
case statement operators do a sequential search through the
case label values.
If they find the label value, they take the corresponding entry
from the transfer table and cause the interpreter to branch to the
specified statement.
If the specified label is not found, an error results.
.PP
The
.SM CASE
operators take the number of cases as a sub-opcode
if possible.
Three different operators are needed to handle single byte,
word, and long case transfer table values.
For example, the
.SM CASEOP1
operator has the following code sequence:
.DS
.mD
_CASEOP1:
\fBcvtbl\fR (lc)+,r0
\fBbneq\fR L1
\fBcvtwl\fR (lc)+,r0 #r0 has length of case table
L1:
\fBmovaw\fR (lc)[r0],r2 #r2 has pointer to case labels
\fBmovzwl\fR (sp)+,r3 #r3 has the element to find
\fBlocc\fR r3,r0,(r2) #r0 has index of located element
\fBbeql\fR caserr #element not found
\fBmnegl\fR r0,r0 #calculate new lc
\fBcvtwl\fR (r2)[r0],r1 #r1 has lc offset
\fBaddl2\fR r1,lc
\fBjmp\fR (loop)
caserr:
\fBmovw\fR $ECASE,_perrno
\fBjbr\fR error
.DE
.PP
Here the interpreter first computes the address of the beginning
of the case label value area by adding twice the number of case label
values to the address of the transfer table, since the transfer
table entries are 2 byte address offsets.
It then searches through the label values, and generates an ECASE
error if the label is not found.
If the label is found, the index of the corresponding entry
in the transfer table is extracted and that offset is added
to the interpreter location counter.
.NH 2
Operations supporting pxp
.PP
The following operations are defined to do execution profiling.
.SH
PXPBUF w
.IP
Causes the interpreter to allocate a count buffer
with
.I w
four byte counters
and to clear them to zero.
The count buffer is placed within an image of the
.I pmon.out
file as described in the
.I "PXP Implementation Notes."
The contents of this buffer are written to the file
.I pmon.out
when the program ends.
.SH
COUNT w
.IP
Increments the counter specified by
.I w .
.SH
TRACNT w,A
.IP
Used at the entry point to procedures and functions,
combining a transfer to the entry point of the block with
an incrementing of its entry count.
.NH 2
Set operations
.PP
The set operations:
union
.SM ADDT,
intersection
.SM MULT,
element removal
.SM SUBT,
and the set relationals
.SM RELT
are straightforward.
The following operations are more interesting.
.SH
CARD s
.IP
Takes the cardinality of a set of size
.I s
bytes on top of the stack, leaving a 2 byte integer count.
.SM CARD
uses the
.B ffs
opcode to successively count the number of set bits in the set.
.SH
CTTOT s,w,w
.IP
Constructs a set.
This operation requires a non-trivial amount of work,
checking bounds and setting individual bits or ranges of bits.
This operation sequence is slow,
and motivates the presence of the operator
.SM INCT
below.
The arguments to
.SM CTTOT
include the number of elements
.I s
in the constructed set,
the lower and upper bounds of the set,
the two
.I w
values,
and a pair of values on the stack for each range in the set, single
elements in constructed sets being duplicated with
.SM SDUP
to form degenerate ranges.
.SH
IN s,w,w
.IP
The operator
.B in
for sets.
The value
.I s
specifies the size of the set,
the two
.I w
values the lower and upper bounds of the set.
The value on the stack is checked to be in the set on the stack,
and a Boolean value of
.I true
or
.I false
replaces the operands.
.SH
INCT
.IP
The operator
.B in
on a constructed set without constructing it.
The left operand of
.B in
is on top of the stack followed by the number of pairs in the
constructed set,
and then the pairs themselves, all as single word integers.
Pairs designate runs of values and single values are represented by
a degenerate pair with both value equal.
This operator is generated in grammatical constructs such as
.LS
\fBif\fR character \fBin\fR [`+', '\-', `*', `/']
.LE
.IP
or
.LS
\fBif\fR character \fBin\fR [`a'..`z', `$', `_']
.LE
.IP
These constructs are common in Pascal, and
.SM INCT
makes them run much faster in the interpreter,
as if they were written as an efficient series of
.B if
statements.
.NH 2
Miscellaneous
.PP
Other miscellaneous operators that are present in the interpreter
are
.SM ASRT
that causes the program to end if the Boolean value on the stack is not
.I true,
and
.SM STOI ,
.SM STOD ,
.SM ITOD ,
and
.SM ITOS
that convert between different length arithmetic operands for
use in aligning the arguments in
.B procedure
and
.B function
calls, and with some untyped built-ins, such as
.SM SIN
and
.SM COS \&.
.PP
Finally, if the program is run with the run-time testing disabled, there
are special operators for
.B for
statements
and special indexing operators for arrays
that have individual element size that is a power of 2.
The code can run significantly faster using these operators.
.NH 2
Mathematical Functions
.PP
The transcendental functions
.SM SIN ,
.SM COS ,
.SM ATAN ,
.SM EXP ,
.SM LN ,
.SM SQRT ,
.SM SEED ,
and
.SM RANDOM
are taken from the standard UNIX
mathematical package.
These functions take double precision floating point
values and return the same.
.PP
The functions
.SM EXPO ,
.SM TRUNC ,
and
.SM ROUND
take a double precision floating point number.
.SM EXPO
returns an integer representing the machine
representation of its argument's exponent,
.SM TRUNC
returns the integer part of its argument, and
.SM ROUND
returns the rounded integer part of its argument.
.NH 2
System functions and procedures
.SH
LLIMIT
.IP
A line limit and a file pointer are passed on the stack.
If the limit is non-negative the line limit is set to the
specified value, otherwise it is set to unlimited.
The default is unlimited.
.SH
STLIM
.IP
A statement limit is passed on the stack. The statement limit
is set as specified.
The default is 500,000.
No limit is enforced when the ``p'' option is disabled.
.SH
CLCK
.br
SCLCK
.IP
.SM CLCK
returns the number of milliseconds of user time used by the program;
.SM SCLCK
returns the number of milliseconds of system time used by the program.
.SH
WCLCK
.IP
The number of seconds since some predefined time is
returned. Its primary usefulness is in determining
elapsed time and in providing a unique time stamp.
.sp
.LP
The other system time procedures are
.SM DATE
and
.SM TIME
that copy an appropriate text string into a pascal string array.
The function
.SM ARGC
returns the number of command line arguments passed to the program.
The procedure
.SM ARGV
takes an index on the stack and copies the specified
command line argument into a pascal string array.
.NH 2
Pascal procedures and functions
.SH
PACK s,w,w,w
.br
UNPACK s,w,w,w
.IP
They function as a memory to memory move with several
semantic checks.
They do no ``unpacking'' or ``packing'' in the true sense as the
interpreter supports no packed data types.
.SH
NEW s
.br
DISPOSE s
.IP
An
.SM LV
of a pointer is passed.
.SM NEW
allocates a record of a specified size and puts a pointer
to it into the pointer variable.
.SM DISPOSE
deallocates the record pointed to by the pointer
and sets the pointer to
.SM NIL .
.sp
.LP
The function
.SM CHR*
converts a suitably small integer into an ascii character.
Its primary purpose is to do a range check.
The function
.SM ODD*
returns
.I true
if its argument is odd and returns
.I false
if its argument is even.
The function
.SM UNDEF
always returns the value
.I false .