NetBSD/sys/arch/pc532/fpu
wiz ee1b406595 Spell address with two d's. Inspired by similar changes in OpenBSD,
originating from Jonathon Gray and forwarded by jmc@openbsd.
2003-11-10 08:51:51 +00:00
..
fpu_status.h
ieee_dze.c __KERNEL_RCSID() 2003-07-15 02:54:31 +00:00
ieee_handler.c Spell address with two d's. Inspired by similar changes in OpenBSD, 2003-11-10 08:51:51 +00:00
ieee_handler.h Adapt to the Scheduler Activations changes. 2003-06-23 13:06:54 +00:00
ieee_internal.h
ieee_invop.c __KERNEL_RCSID() 2003-07-15 02:54:31 +00:00
ieee_ovfl.c __KERNEL_RCSID() 2003-07-15 02:54:31 +00:00
ieee_subnormal.c __KERNEL_RCSID() 2003-07-15 02:54:31 +00:00
README

#	$NetBSD: README,v 1.5 2001/12/04 17:56:36 wiz Exp $

			 IEEE handler README
			 -------------------

	       Ian Dall <ian.dall@dsto.defence.gov.au>
			    December 1995

     NetBSD information by Matthias Pfaller <leo@dachau.marco.de>


1. Introduction

The ns32081 and the ns32381 floating point units implement a subset of
the IEEE floating point standard. In cases where the correct operation
is to generate special values, +-Infinity, NaN or a denormalized
number, or when one of the operands is a special value, the FPU
generates a trap. It is intended that the missing functionality be
implemented in software. This packages provides that missing
functionality.

The trap handling code can run either in the kernel, or in user code
as the signal handler in a unix process. The latter has the
disadvantage that code changes are required to set up the signal
handler, so it is intended that this mode way of use be used primarily
for debugging. So far the in-kernel implementation has been done for
mach and NetBSD. There should be no large obstacles to incorporating
this package in other kernels.

2. Building

This section describes how to build the unix signal handler version.
When it is being built into the kernel, it is assumed that the kernels
build environment will be employed.  The code requires gcc. Also the
supplied Makefile assumes GNU make.

Typing "make" or "gmake" will build the IEEE handler library, and the
test programs div0test, optest and ovtest. Take care to preserve the
optimization settings in Makefile for the test code. These have been
set to cause gcc to produce the appropriate test code. In some cases,
compiling with the wrong optimization may cause the compiler to crash
with a floating point exception as it attempts to pre-calculate some
result which is intended to cause a run time floating point exception.

3. Implementation

Assume that the CPU and FPU registers are already saved in some
structure passed as an argument to ieee_handle_exception. In the case
of the signal handler version, ieee_sig is the appropriate entry
point, which takes sigcontext as an argument. ieee_sig fetches the fpu
state, calls ieee_handle_exception and restores the fpu state. For the
rest of this section we assume in-kernel operation.

The trap processing proceeds as follows:

  o decode the instruction, including addressing modes found at the
    PC.
  o fetch the operands.
  o get the operands into an internal canonical form. Floating
    operands become 8 byte doubles and integral operands become 4 byte
    integers.
  o get the trap type, eg overflow, underflow or reserved operand out of
    the FSR, check whether the trap is enabled and if not switch to
    functions to handle that trap.
  o if the trap has been successfully handled, convert the result from
    the canonical form to the destination form, write the operand and
    increment the pc so that when the thread which took the exception
    restarts, it is at the next instruction and return FPC_TT_NONE.
  o if the user elected to handle the exception, or if some problem occurred,
    ieee_handle_exception will return a trap type not equal to FPC_TT_NONE.

3.1 Status.

IEEE floating point standard says that special operands, Infinity, Nan
etc should be handled by default, but there is provision for the user
to specify that a trap should occur. The ns32381 always traps, so it
is up to the kernel trap handler to either handle the trap
transparently or pass it on to the user as required. To control this
functionality, there needs to be flags which the user can
set. Fortunately the floating status register (FSR) has 7 bits
reserved for this purpose (the FPC_SWF field). The following flags are
defined:

    FPC_OVE  0x200		/* Overflow enable */
    FPC_OVF  0x400		/* Overflow flag */
    FPC_IVE  0x800		/* Invalid enable */
    FPC_IVF  0x1000		/* Invalid flag */
    FPC_DZE  0x2000		/* Divide by zero enable */
    FPC_DZF  0x4000		/* Divide by zero flag */
    FPC_UNDE 0x8000		/* Soft Underflow enable, requires FPC_UEN */

In addition there are the hardware defined flags:

    FPC_IF		0x00000040	/* inexact result flag */
    FPC_IEN		0x00000020	/* inexact result trap enable */
    FPC_UF	        0x00000010	/* underflow flag      	   */
    FPC_UEN	        0x00000008	/* underflow trap enable */

If the corresponding enable flag is set when a trap occurs, then
ieee_handle_exception simply returns the trap type. The calling code
can then send the appropriate signal. Underflow is a little different
since there are three possible desired behaviours; produce a result of
zero, generate denormalized numbers and generate a signal. To provide
this level of control, there are two underflow bits:

	FPC_UEN	FPC_UNDE
	0	X		Produce zero
	1	0		Produce denormalized numbers
	1	1		Pass trap on to user

Whenever a trap occurs, the corresponding flag bit is set. Flags are
never cleared except by the user.

3.2 Subnormal numbers.

On an underflow trap, we need to be able to generate denormalized
numbers. Also, having generated the denormalized numbers, they will
cause a reserved operand trap if they are operands to any subsequent
operations. So we need to be able to generate and perform operations
on denormalized numbers.

Rather than produce a complete IEEE floating point emulation, the
approach to doing arithmetic on denorms is to first scale the operands
so that the operation can't possibly overflow or underflow, perform
the operation and then normalize. Care is taken to use the same
rounding mode as the thread which got the exception.

3.3 Error handling within the IEEE handler package.

If an instruction can't be decoded, or copyin or copyout fails
(presumably because an address is outside the tasks address space),
then ieee_handle_exception returns FPC_TT_ILL. It would be possible to
invent some new codes if more information is required. FPC_TT_ILL is
also returned if the external addressing mode is encountered. No one
uses this addressing mode.

4. Usage

4.1 Getting and setting the FSR contents.

The FSR fields are defined in fpu_status.h. There is a macro
GET_SET_FSR which gets the old value of the FSR and sets the new value
of the FSR. There are also separate GET_FSR and SET_FSR macros.
Eg:

    #include <fpu_status.h>

       int fsr = GET_SET_FSR(FPC_UEN);
       .
       . /* Code requiring FPC_UEN to be set */
       .
       SET_FSR(fsr);

4.2 Signal Handler

The simplest way to use the package as a signal handler is as follows:

    #include <signal.h>
    #include <ieee_handler.h>
       ...
       signal(SIGFPE, ieee_sig);
       ...

The ieee_sig function returns a code as for ieee_handle_exception.  To
make use up this return code it would be necessary to write a wrapper
function for ieee_sig which did the right thing, possibly calling
kill(2) to send a (different) signal. ieee_sig could be made more
sophisticated in this respect, but hasn't since this mode of operation
is intended primarily as a debugging aid.

4.3 Mach Kernel

The fpintr() function is duplicated here to illustrate how the interface
is implemented in the mach kernel.

    /*
     * FPU error.
     */
    void fpintr(struct ns532_saved_state *regs)
    {
	    int ss;
	    int status = FPC_TT_UNKNOWN;
	    state state;
	    state.regs = regs;
	    state.fps = current_thread()->pcb->fps;
	    ss = splsched();	/* Note 1 */
	    fp_save();		/* Note 2 */
    #if MACH_KDB
	    if (ieee_handler_enable) /* Note 3 */
    #endif
	      {
		_enable_fpu();	/* Note 4 */
		status = ieee_handle_exception(&state);	/* Note 5 */
		_disable_fpu();	/* Note 4 */
	      }
	    splx(ss);
	    switch(status) {
	    case FPC_TT_ILL:
	      exception(EXC_BAD_INSTRUCTION, EXC_NS532_ILL, 0);	/* Note 6 */
	      /* NOT REACHED */
	    case FPC_TT_NONE:
	      break;
	    default:
	      exception(EXC_ARITHMETIC, EXC_NS532_SLAVE, /* Note 7 */
			(int)current_thread()->pcb->fps->fsr);
	      /* NOT REACHED */
	    }
    }


Notes:

  1 The ieee_handle_exception function is not re-entrant. This is for
    two reasons. Mainly the code uses the fpu and the kernel fpu state
    is not saved on interrupts. (User fpu state is managed by
    allocating the fpu on demand to a thread. The fpu state is saved
    only when the fpu is allocated to a new thread.) A second reason
    for the code not being reentrant is that a static structure is
    used to keep track of data which has been copyin'd. This latter
    case could be eliminated fairly easily, but there seems no point
    since the code can't be reentrant for the first reason.

    To prevent other threads running and maybe using the fpu, we simply
    run ieee_handle_exception at splsched.

  2 The fp_save() also disables the fpu bit by clearing the bit in the cfg
    register. We could make a slight saving by deferring the disable.

  3 Having the ieee_handler_enable flag allows the in-kernel
    processing to be turned off. This is useful when trying to debug
    the signal handler version.

  4 Ensure the fpu is enabled since ieee_handle_exception uses floating
    point operations.

  5 Call the handler and save its return status. The state argument
    is constructed and passed. At this point in the processing, the
    general registers have not yet been saved in the pcb, although the
    floating point registers have been. Otherwise we could have arranged to
    pass the pcb.
    
  6 If the handler returned FPC_TT_ILL generate an illegal instruction
    trap. This will ultimately cause a SIGILL.

  7 Default to generating an arithmetic trap which will ultimately
    cause a SIGFPE.

4.4 NetBSD Kernel 

The integration of his FPU-handler into NetBSD/pc532 was really
straightforward. His FPU-handler should be in sys/arch/pc532/fpu in
any NetBSD version newer than 960415. This piece of code is from
sys/arch/pc532/pc532/trap.c:trap

	case T_SLAVE | T_USER: {
		int fsr, sig = SIGFPE;
		pcb = &p->p_addr->u_pcb;
		save_fpu_context(pcb);
		switch(ieee_handle_exception(p)) {
		case FPC_TT_NONE:
			restore_fpu_context(pcb);
			if (frame.tf_regs.r_psr &  PSL_T) {
				type = T_TRC | T_USER;
				goto trace;
			}
			return;
		case FPC_TT_ILL:
			sig = SIGILL;
			break;
		default:
			break;
		}
		restore_fpu_context(pcb);
		sfsr(fsr);
		trapsignal(p, sig, 0x80000000 | fsr);
		goto out;
	}

In NetBSD there is no need to enable the fpu here. save_fpu_context does
not disable the fpu. The FPU is enabled on T_UND in trap.c:trap and
disabled in locore.s:cpu_switch.

After an exec the fsr is initialized to FPC_UEN in machdep.c:setregs. All
other FPU registers are initialized to +0.0. Maybe it would be better to
set them to sNaNs.

The user can manipulate the behaviour of the FPU by using the following
functions:

	fp_rnd	  fpgetround(void));
	fp_rnd	  fpsetround(fp_rnd);
	fp_except fpgetmask(void);
	fp_except fpsetmask(fp_except);
	fp_except fpgetsticky(void);
	fp_except fpsetsticky(fp_except);

The prototypes for the functions and the typedefs for fp_rnd and fp_except
are defined in /usr/include/ieeefp.h.

fpsetround takes FP_RN (round to nearest), FP_RZ (round to zero), FP_RP
(round to +Inf) or FP_RM (round to -Inf) as arguments. fpgetround returns
the current rounding mode.

fpgetsticky returns the state of the stickybits in the fsr. The stickybits
can be set/reset by calling fpsetsticky. The stickybits have the same value
as their corresponding exception mask.

fpsetmask is used to enable and disable FPU exceptions. fpgetmask returns
the current exception mask.

The user can mask/unmask the following exceptions: FP_X_IMP (imprecise),
FP_X_OFL (overflow), FP_X_INV (invalid operation), FP_X_DZ
(devide-by-zero), FP_X_UFL (underflow).
Examples:

	fpsetmask(FP_X_IMP |  FP_X_OFL | FP_X_INV | FP_X_DZ | FP_X_UFL);

This will enable all fpu exceptions.

Check for divide by zero:
	fpsetmask(0);	/* mask all exceptions */
	fpsetsticky(0);	/* clear all sticky bits */
	a = a / b;
	if (fpgetsticky() & FP_X_DZ)
		printf("devide by zero\n");


4.5 Porting to other Environments

The package is ns32k specific and assumes gcc. Otherwise it is intended
to be efficiently portable to other environments as easily as possible.

The calling code needs to get the CPU and FPU status into some
data structure. It is pretty flexible what data structure that might
be. In particular, it is OK for a structure to contain pointers to
other structures.

There must be a "typedef struct x state" statement in ieee_handler.h
since the type "state" is used in the prototype for
ieee_handle_exception().  Access to long (double precision floating
point), float and general purpose registers in the "state" type is
facilitated by the macros:

   LREGBASE(s)
   LREGOFFSET(n)
   FREGBASE(s)
   FREGOFFSET(n)
   REGBASE(s)
   REGOFFSET(n)

The OFFSET macros and the BASE macros must be defined so that, for
example, REGOFFSET(3) + REGBASE(state) is the address of register
3. The BASE macros must be constant expressions since they are used to
initialize a table of offsets with the "const" and "static"
attributes.  Other macros, FSR, FP, SP, SB, PC and PSR are defined
such that, for example, (state *)s->FP accesses the fp (frame pointer)
register.

This should be quite flexible and examples for the mach, NetBSD and
signal handler implementations can be found in ieee_handler.h.

There should be only one other place where customization may be required.
That is in ieee_handler.c. The functions setjmp, longjmp and get_dword
are required. The first two may not be available in the kernel. In the
case of mach, setjmp and longjmp are defined in terms of _setjmp and
_longjmp. The get_dword macro uses copyin to get a long int from
user space using copyin.

It is assumed that copyin and copyout are available. In the signal
handler case, these are defined in terms of memcpy.

5. To Do

The testing has been cursory at best. A more sophisticated test suite
is needed. The conformance to the standard is probably patchy since
the author has never actually seen the IEEE floating point standard!

6. BUGS

Please report any bugs, and especially any improvements to the
author, Ian Dall <ian.dall@dsto.defence.gov.au>.

6. Copyright

This code is Copyright Ian Dall. Please respect it.

Permission to use, copy, modify and distribute this software and its
documentation is hereby granted, provided that both the copyright
notice and this permission notice appear in all copies of the
software, derivative works or modified versions, and any portions
thereof, and that both notices appear in supporting documentation.

If you have a good reason to want to vary the permission notice,
I am open to negotiation.