IEEE handler README
-------------------

Ian Dall <ian.dall@dsto.defence.gov.au>
December 1995

NetBSD information by Matthias Pfaller <leo@dachau.marco.de>

1. Introduction

The ns32081 and the ns32381 floating point units implement a subset of
the IEEE floating point standard. In cases where the correct operation
is to generate a special value (+-Infinity, NaN or a denormalized
number), or when one of the operands is a special value, the FPU
generates a trap. It is intended that the missing functionality be
implemented in software. This package provides that missing
functionality.

The trap handling code can run either in the kernel, or in user code
as the signal handler in a unix process. The latter has the
disadvantage that code changes are required to set up the signal
handler, so it is intended that this mode be used primarily
for debugging. So far the in-kernel implementation has been done for
mach and NetBSD. There should be no large obstacles to incorporating
this package in other kernels.

2. Building

This section describes how to build the unix signal handler version.
When the package is built into the kernel, it is assumed that the kernel's
build environment will be employed. The code requires gcc, and the
supplied Makefile assumes GNU make.

Typing "make" or "gmake" will build the IEEE handler library and the
test programs div0test, optest and ovtest. Take care to preserve the
optimization settings in the Makefile for the test code. These have been
set to cause gcc to produce the appropriate test code. In some cases,
compiling with the wrong optimization may cause the compiler to crash
with a floating point exception as it attempts to pre-calculate some
result which is intended to cause a run time floating point exception.

3. Implementation

Assume that the CPU and FPU registers are already saved in some
structure passed as an argument to ieee_handle_exception. In the case
of the signal handler version, ieee_sig is the appropriate entry
point, which takes sigcontext as an argument. ieee_sig fetches the fpu
state, calls ieee_handle_exception and restores the fpu state. For the
rest of this section we assume in-kernel operation.

The trap processing proceeds as follows:

 o decode the instruction found at the PC, including its addressing
   modes.
 o fetch the operands.
 o get the operands into an internal canonical form. Floating
   operands become 8 byte doubles and integral operands become 4 byte
   integers.
 o get the trap type (e.g. overflow, underflow or reserved operand) out
   of the FSR, check whether the trap is enabled and, if not, switch to
   functions to handle that trap.
 o if the trap has been successfully handled, convert the result from
   the canonical form to the destination form, write the operand,
   increment the pc so that when the thread which took the exception
   restarts, it is at the next instruction, and return FPC_TT_NONE.
 o if the user elected to handle the exception, or if some problem
   occurred, ieee_handle_exception will return a trap type not equal
   to FPC_TT_NONE.

3.1 Status.

The IEEE floating point standard says that special operands (Infinity, NaN
etc.) should be handled by default, but there is provision for the user
to specify that a trap should occur. The ns32381 always traps, so it
is up to the kernel trap handler to either handle the trap
transparently or pass it on to the user as required. To control this
functionality, there need to be flags which the user can
set. Fortunately the floating status register (FSR) has 7 bits
reserved for this purpose (the FPC_SWF field). The following flags are
defined:

    FPC_OVE   0x200    /* Overflow enable */
    FPC_OVF   0x400    /* Overflow flag */
    FPC_IVE   0x800    /* Invalid enable */
    FPC_IVF   0x1000   /* Invalid flag */
    FPC_DZE   0x2000   /* Divide by zero enable */
    FPC_DZF   0x4000   /* Divide by zero flag */
    FPC_UNDE  0x8000   /* Soft Underflow enable, requires FPC_UEN */

In addition there are the hardware defined flags:

    FPC_IF    0x00000040   /* inexact result flag */
    FPC_IEN   0x00000020   /* inexact result trap enable */
    FPC_UF    0x00000010   /* underflow flag */
    FPC_UEN   0x00000008   /* underflow trap enable */

If the corresponding enable flag is set when a trap occurs, then
ieee_handle_exception simply returns the trap type. The calling code
can then send the appropriate signal. Underflow is a little different
since there are three possible desired behaviours: produce a result of
zero, generate denormalized numbers, or generate a signal. To provide
this level of control, there are two underflow bits:

    FPC_UEN   FPC_UNDE
       0         X        Produce zero
       1         0        Produce denormalized numbers
       1         1        Pass trap on to user

Whenever a trap occurs, the corresponding flag bit is set. Flags are
never cleared except by the user.

3.2 Subnormal numbers.

On an underflow trap, we need to be able to generate denormalized
numbers. Also, having generated the denormalized numbers, they will
cause a reserved operand trap if they are operands to any subsequent
operations. So we need to be able to generate and perform operations
on denormalized numbers.

Rather than produce a complete IEEE floating point emulation, the
approach to doing arithmetic on denorms is to first scale the operands
so that the operation can't possibly overflow or underflow, perform
the operation and then normalize. Care is taken to use the same
rounding mode as the thread which got the exception.

3.3 Error handling within the IEEE handler package.

If an instruction can't be decoded, or copyin or copyout fails
(presumably because an address is outside the task's address space),
then ieee_handle_exception returns FPC_TT_ILL. It would be possible to
invent some new codes if more information is required. FPC_TT_ILL is
also returned if the external addressing mode is encountered. No one
uses this addressing mode.

4. Usage

4.1 Getting and setting the FSR contents.

The FSR fields are defined in fpu_status.h. There is a macro
GET_SET_FSR which gets the old value of the FSR and sets the new value
of the FSR. There are also separate GET_FSR and SET_FSR macros.
E.g.:

    #include <fpu_status.h>

    int fsr = GET_SET_FSR(FPC_UEN);
    .
    .    /* Code requiring FPC_UEN to be set */
    .
    SET_FSR(fsr);

4.2 Signal Handler

The simplest way to use the package as a signal handler is as follows:

    #include <signal.h>
    #include <ieee_handler.h>
    ...
    signal(SIGFPE, ieee_sig);
    ...

The ieee_sig function returns a code as for ieee_handle_exception. To
make use of this return code it would be necessary to write a wrapper
function for ieee_sig which did the right thing, possibly calling
kill(2) to send a (different) signal. ieee_sig could be made more
sophisticated in this respect, but hasn't been, since this mode of
operation is intended primarily as a debugging aid.

4.3 Mach Kernel

The fpintr() function is duplicated here to illustrate how the interface
is implemented in the mach kernel.

    /*
     * FPU error.
     */
    void fpintr(struct ns532_saved_state *regs)
    {
        int ss;
        int status = FPC_TT_UNKNOWN;
        state state;
        state.regs = regs;
        state.fps = current_thread()->pcb->fps;
        ss = splsched();                                /* Note 1 */
        fp_save();                                      /* Note 2 */
    #if MACH_KDB
        if (ieee_handler_enable)                        /* Note 3 */
    #endif
        {
            _enable_fpu();                              /* Note 4 */
            status = ieee_handle_exception(&state);     /* Note 5 */
            _disable_fpu();                             /* Note 4 */
        }
        splx(ss);
        switch(status) {
        case FPC_TT_ILL:
            exception(EXC_BAD_INSTRUCTION, EXC_NS532_ILL, 0); /* Note 6 */
            /* NOT REACHED */
        case FPC_TT_NONE:
            break;
        default:
            exception(EXC_ARITHMETIC, EXC_NS532_SLAVE,  /* Note 7 */
                      (int)current_thread()->pcb->fps->fsr);
            /* NOT REACHED */
        }
    }

Notes:

1   The ieee_handle_exception function is not re-entrant. This is for
    two reasons. Mainly the code uses the fpu and the kernel fpu state
    is not saved on interrupts. (User fpu state is managed by
    allocating the fpu on demand to a thread. The fpu state is saved
    only when the fpu is allocated to a new thread.) A second reason
    for the code not being re-entrant is that a static structure is
    used to keep track of data which has been copyin'd. This latter
    case could be eliminated fairly easily, but there seems no point
    since the code can't be re-entrant for the first reason.

    To prevent other threads running and maybe using the fpu, we simply
    run ieee_handle_exception at splsched.

2   fp_save() also disables the fpu by clearing its bit in the cfg
    register. We could make a slight saving by deferring the disable.

3   Having the ieee_handler_enable flag allows the in-kernel
    processing to be turned off. This is useful when trying to debug
    the signal handler version.

4   Ensure the fpu is enabled since ieee_handle_exception uses floating
    point operations.

5   Call the handler and save its return status. The state argument
    is constructed and passed. At this point in the processing, the
    general registers have not yet been saved in the pcb, although the
    floating point registers have been. Otherwise we could have arranged
    to pass the pcb.

6   If the handler returned FPC_TT_ILL, generate an illegal instruction
    trap. This will ultimately cause a SIGILL.

7   Default to generating an arithmetic trap, which will ultimately
    cause a SIGFPE.

4.4 NetBSD Kernel

The integration of Ian Dall's FPU handler into NetBSD/pc532 was really
straightforward. The handler should be in sys/arch/pc532/fpu in
any NetBSD version newer than 960415. This piece of code is from
sys/arch/pc532/pc532/trap.c:trap

    case T_SLAVE | T_USER: {
        int fsr, sig = SIGFPE;
        pcb = &p->p_addr->u_pcb;
        save_fpu_context(pcb);
        switch(ieee_handle_exception(p)) {
        case FPC_TT_NONE:
            restore_fpu_context(pcb);
            if (frame.tf_regs.r_psr & PSL_T) {
                type = T_TRC | T_USER;
                goto trace;
            }
            return;
        case FPC_TT_ILL:
            sig = SIGILL;
            break;
        default:
            break;
        }
        restore_fpu_context(pcb);
        sfsr(fsr);
        trapsignal(p, sig, 0x80000000 | fsr);
        goto out;
    }

In NetBSD there is no need to enable the fpu here. save_fpu_context does
not disable the fpu. The FPU is enabled on T_UND in trap.c:trap and
disabled in locore.s:cpu_switch.

After an exec the fsr is initialized to FPC_UEN in machdep.c:setregs. All
other FPU registers are initialized to +0.0. Maybe it would be better to
set them to sNaNs.

The user can manipulate the behaviour of the FPU by using the following
functions:

    fp_rnd    fpgetround(void);
    fp_rnd    fpsetround(fp_rnd);
    fp_except fpgetmask(void);
    fp_except fpsetmask(fp_except);
    fp_except fpgetsticky(void);
    fp_except fpsetsticky(fp_except);

The prototypes for the functions and the typedefs for fp_rnd and fp_except
are defined in /usr/include/ieeefp.h.

fpsetround takes FP_RN (round to nearest), FP_RZ (round to zero), FP_RP
(round to +Inf) or FP_RM (round to -Inf) as arguments. fpgetround returns
the current rounding mode.

fpgetsticky returns the state of the sticky bits in the fsr. The sticky
bits can be set/reset by calling fpsetsticky. The sticky bits have the
same values as their corresponding exception masks.

fpsetmask is used to enable and disable FPU exceptions. fpgetmask returns
the current exception mask.

The user can mask/unmask the following exceptions: FP_X_IMP (imprecise),
FP_X_OFL (overflow), FP_X_INV (invalid operation), FP_X_DZ
(divide-by-zero), FP_X_UFL (underflow).
Examples:

    fpsetmask(FP_X_IMP | FP_X_OFL | FP_X_INV | FP_X_DZ | FP_X_UFL);

This will enable all fpu exceptions.

Check for divide by zero:
    fpsetmask(0);       /* mask all exceptions */
    fpsetsticky(0);     /* clear all sticky bits */
    a = a / b;
    if (fpgetsticky() & FP_X_DZ)
        printf("divide by zero\n");

4.5 Porting to other Environments

The package is ns32k specific and assumes gcc. Otherwise it is intended
to be portable to other environments as easily as possible without
sacrificing efficiency.

The calling code needs to get the CPU and FPU status into some
data structure. It is pretty flexible what data structure that might
be. In particular, it is OK for a structure to contain pointers to
other structures.

There must be a "typedef struct x state" statement in ieee_handler.h
since the type "state" is used in the prototype for
ieee_handle_exception(). Access to long (double precision floating
point), float and general purpose registers in the "state" type is
facilitated by the macros:

    LREGBASE(s)
    LREGOFFSET(n)
    FREGBASE(s)
    FREGOFFSET(n)
    REGBASE(s)
    REGOFFSET(n)

The OFFSET macros and the BASE macros must be defined so that, for
example, REGOFFSET(3) + REGBASE(state) is the address of register
3. The OFFSET macros must be constant expressions since they are used to
initialize a table of offsets with the "const" and "static"
attributes. Other macros, FSR, FP, SP, SB, PC and PSR are defined
such that, for example, ((state *)s)->FP accesses the fp (frame pointer)
register.

This should be quite flexible and examples for the mach, NetBSD and
signal handler implementations can be found in ieee_handler.h.

There should be only one other place where customization may be required,
namely ieee_handler.c. The functions setjmp, longjmp and get_dword
are required. The first two may not be available in the kernel. In the
case of mach, setjmp and longjmp are defined in terms of _setjmp and
_longjmp. The get_dword macro uses copyin to get a long int from
user space.

It is assumed that copyin and copyout are available. In the signal
handler case, these are defined in terms of memcpy.

5. To Do

The testing has been cursory at best. A more sophisticated test suite
is needed. The conformance to the standard is probably patchy since
the author has never actually seen the IEEE floating point standard!

6. BUGS

Please report any bugs, and especially any improvements, to the
author, Ian Dall <ian.dall@dsto.defence.gov.au>.

7. Copyright

This code is Copyright Ian Dall. Please respect it.

Permission to use, copy, modify and distribute this software and its
documentation is hereby granted, provided that both the copyright
notice and this permission notice appear in all copies of the
software, derivative works or modified versions, and any portions
thereof, and that both notices appear in supporting documentation.

If you have a good reason to want to vary the permission notice,
I am open to negotiation.