# $NetBSD: README,v 1.3 1998/01/05 20:51:46 perry Exp $ IEEE handler README ------------------- Ian Dall December 1995 NetBSD information by Matthias Pfaller 1. Introduction The ns32081 and the ns32381 floating point units implement a subset of the IEEE floating point standard. In cases where the correct operation is to generate special values, +-Infinity, NaN or a denormalized number, or when one of the operands is a special value, the FPU generates a trap. It is intended that the missing functionality be implemented in software. This packages provides that missing functionality. The trap handling code can run either in the kernel, or in user code as the signal handler in a unix process. The latter has the disadvantage that code changes are required to set up the signal handler, so it is intended that this mode way of use be used primarily for debugging. So far the in-kernel implementation has been done for mach and NetBSD. There should be no large obstacles to incorporating this package in other kernels. 2. Building This section describes how to build the unix signal handler version. When it is being built into the kernel, it is assumed that the kernels build environment will be employed. The code requires gcc. Also the supplied Makefile assumes GNU make. Typing "make" or "gmake" will build the IEEE handler library, and the test programs div0test, optest and ovtest. Take care to preserve the optimization settings in Makefile for the test code. These have been set to cause gcc to produce the appropriate test code. In some cases, compiling with the wrong optimization may cause the compiler to crash with a floating point exception as it attempts to pre-calculate some result which is intended to cause a run time floating point exception. 3. Implementation Assume that the CPU and FPU registers are already saved in some structure passed as an argument to ieee_handle_exception. In the case of the signal handler version, ieee_sig is the appropriate entry point, which takes sigcontext as an argument. ieee_sig fetches the fpu state, calls ieee_handle_exception and restores the fpu state. For the rest of this section we assume in-kernel operation. The trap processing proceeds as follows: o decode the instruction, including addressing modes found at the PC. o fetch the operands. o get the operands into an internal canonical form. Floating operands become 8 byte doubles and integral operands become 4 byte integers. o get the trap type, eg overflow, underflow or reserved operand out of the FSR, check whether the trap is enabled and if not switch to functions to handle that trap. o if the trap has been successfully handled, convert the result from the canonical form to the destination form, write the operand and increment the pc so that when the thread which took the exception restarts, it is at the next instruction and return FPC_TT_NONE. o if the user elected to handle the exception, or if some problem occurred, ieee_handle_exception will return a trap type not equal to FPC_TT_NONE. 3.1 Status. IEEE floating point standard says that special operands, Infinity, Nan etc should be handled by default, but there is provision for the user to specify that a trap should occur. The ns32381 always traps, so it is up to the kernel trap handler to either handle the trap transparently or pass it on to the user as required. To control this functionality, there needs to be flags which the user can set. Fortunately the floating status register (FSR) has 7 bits reserved for this purpose (the FPC_SWF field). The following flags are defined: FPC_OVE 0x200 /* Overflow enable */ FPC_OVF 0x400 /* Overflow flag */ FPC_IVE 0x800 /* Invalid enable */ FPC_IVF 0x1000 /* Invalid flag */ FPC_DZE 0x2000 /* Divide by zero enable */ FPC_DZF 0x4000 /* Divide by zero flag */ FPC_UNDE 0x8000 /* Soft Underflow enable, requires FPC_UEN */ In addition there are the hardware defined flags: FPC_IF 0x00000040 /* inexact result flag */ FPC_IEN 0x00000020 /* inexact result trap enable */ FPC_UF 0x00000010 /* underflow flag */ FPC_UEN 0x00000008 /* underflow trap enable */ If the corresponding enable flag is set when a trap occurs, then ieee_handle_exception simply returns the trap type. The calling code can then send the appropriate signal. Underflow is a little different since there are three possible desired behaviours; produce a result of zero, generate denormalized numbers and generate a signal. To provide this level of control, there are two underflow bits: FPC_UEN FPC_UNDE 0 X Produce zero 1 0 Produce denormalized numbers 1 1 Pass trap on to user Whenever a trap occurs, the corresponding flag bit is set. Flags are never cleared except by the user. 3.2 Subnormal numbers. On an underflow trap, we need to be able to generate denormalized numbers. Also, having generated the denormalized numbers, they will cause a reserved operand trap if they are operands to any subsequent operations. So we need to be able to generate and perform operations on denormalized numbers. Rather than produce a complete IEEE floating point emulation, the approach to doing arithmetic on denorms is to first scale the operands so that the operation can't possibly overflow or underflow, perform the operation and then normalize. Care is taken to use the same rounding mode as the thread which got the exception. 3.3 Error handling within the IEEE handler package. If an instruction can't be decoded, or copyin or copyout fails (presumably because an address is outside the tasks address space), then ieee_handle_exception returns FPC_TT_ILL. It would be possible to invent some new codes if more information is required. FPC_TT_ILL is also returned if the external addressing mode is encountered. No one uses this addressing mode. 4. Usage 4.1 Getting and setting the FSR contents. The FSR fields are defined in fpu_status.h. There is a macro GET_SET_FSR which gets the old value of the FSR and sets the new value of the FSR. There are also seperate GET_FSR and SET_FSR macros. Eg: #include int fsr = GET_SET_FSR(FPC_UEN); . . /* Code requiring FPC_UEN to be set */ . SET_FSR(fsr); 4.2 Signal Handler The simplest way to use the package as a signal handler is as follows: #include #include ... signal(SIGFPE, ieee_sig); ... The ieee_sig function returns a code as for ieee_handle_exception. To make use up this return code it would be necessary to write a wrapper function for ieee_sig which did the right thing, possibly calling kill(2) to send a (different) signal. ieee_sig could be made more sophisticated in this respect, but hasn't since this mode of operation is intended primarily as a debugging aid. 4.3 Mach Kernel The fpintr() function is duplicated here to illustrate how the interface is implemented in the mach kernel. /* * FPU error. */ void fpintr(struct ns532_saved_state *regs) { int ss; int status = FPC_TT_UNKNOWN; state state; state.regs = regs; state.fps = current_thread()->pcb->fps; ss = splsched(); /* Note 1 */ fp_save(); /* Note 2 */ #if MACH_KDB if (ieee_handler_enable) /* Note 3 */ #endif { _enable_fpu(); /* Note 4 */ status = ieee_handle_exception(&state); /* Note 5 */ _disable_fpu(); /* Note 4 */ } splx(ss); switch(status) { case FPC_TT_ILL: exception(EXC_BAD_INSTRUCTION, EXC_NS532_ILL, 0); /* Note 6 */ /* NOT REACHED */ case FPC_TT_NONE: break; default: exception(EXC_ARITHMETIC, EXC_NS532_SLAVE, /* Note 7 */ (int)current_thread()->pcb->fps->fsr); /* NOT REACHED */ } } Notes: 1 The ieee_handle_exception function is not re-entrant. This is for two reasons. Mainly the code uses the fpu and the kernel fpu state is not saved on interrupts. (User fpu state is managed by allocating the fpu on demand to a thread. The fpu state is saved only when the fpu is allocated to a new thread.) A second reason for the code not being reentrant is that a static structure is used to keep track of data which has been copyin'd. This latter case could be eliminated fairly easily, but there seems no point since the code can't be reentrant for the first reason. To prevent other threads running and maybe using the fpu, we simply run ieee_handle_exception at splsched. 2 The fp_save() also disables the fpu bit by clearing the bit in the cfg register. We could make a slight saving by deferring the disable. 3 Having the ieee_handler_enable flag allows the in-kernel processing to be turned off. This is useful when trying to debug the signal handler version. 4 Ensure the fpu is enabled since ieee_handle_exception uses floating point operations. 5 Call the handler and save its return status. The state argument is constructed and passed. At this point in the processing, the general registers have not yet been saved in the pcb, although the floating point registers have been. Otherwise we could have arranged to pass the pcb. 6 If the handler returned FPC_TT_ILL generate an illegal instruction trap. This will ultimately cause a SIGILL. 7 Default to generating an arithmetic trap which will ultimately cause a SIGFPE. 4.4 NetBSD Kernel The integration of his FPU-handler into NetBSD/pc532 was really straightforward. His FPU-handler should be in sys/arch/pc532/fpu in any NetBSD version newer then 960415. This piece of code is from sys/arch/pc532/pc532/trap.c:trap case T_SLAVE | T_USER: { int fsr, sig = SIGFPE; pcb = &p->p_addr->u_pcb; save_fpu_context(pcb); switch(ieee_handle_exception(p)) { case FPC_TT_NONE: restore_fpu_context(pcb); if (frame.tf_regs.r_psr & PSL_T) { type = T_TRC | T_USER; goto trace; } return; case FPC_TT_ILL: sig = SIGILL; break; default: break; } restore_fpu_context(pcb); sfsr(fsr); trapsignal(p, sig, 0x80000000 | fsr); goto out; } In NetBSD there is no need to enable the fpu here. save_fpu_context does not disable the fpu. The FPU is enabled on T_UND in trap.c:trap and disabled in locore.s:cpu_switch. After an exec the fsr is initialized to FPC_UEN in machdep.c:setregs. All other FPU registers are initialized to +0.0. Maybe it would be better to set them to sNaNs. The user can manipulate the behaviour of the FPU by using the following functions: fp_rnd fpgetround(void)); fp_rnd fpsetround(fp_rnd); fp_except fpgetmask(void); fp_except fpsetmask(fp_except); fp_except fpgetsticky(void); fp_except fpsetsticky(fp_except); The prototypes for the functions and the typedefs for fp_rnd and fp_except are defined in /usr/include/ieeefp.h. fpsetround takes FP_RN (round to nearest), FP_RZ (round to zero), FP_RP (round to +Inf) or FP_RM (round to -Inf) as arguments. fpgetround returns the current rounding mode. fpgetsticky returns the state of the stickybits in the fsr. The stickybits can be set/reset by calling fpsetsticky. The stickybits have the same value as their corresponding exception mask. fpsetmask is used to enable and disable FPU exceptions. fpgetmask returns the current exception mask. The user can mask/unmask the following exceptions: FP_X_IMP (imprecise), FP_X_OFL (overflow), FP_X_INV (invalid operation), FP_X_DZ (devide-by-zero), FP_X_UFL (underflow). Examples: fpsetmask(FP_X_IMP | FP_X_OFL | FP_X_INV | FP_X_DZ | FP_X_UFL); This will enable all fpu exceptions. Check for divide by zero: fpsetmask(0); /* mask all exceptions */ fpsetsticky(0); /* clear all sticky bits */ a = a / b; if (fpgetsticky() & FP_X_DZ) printf("devide by zero\n"); 4.5 Porting to other Environments The package is ns32k specific and assumes gcc. Otherwise it is intended to be efficiently portable to other environments as easily as possible. The calling code needs to get the CPU and FPU status into some data structure. It is pretty flexible what data structure that might be. In particular, it is OK for a structure to contain pointers to other structures. There must be a "typedef struct x state" statement in ieee_handler.h since the type "state" is used in the prototype for ieee_handle_exception(). Access to long (double precision floating point), float and general purpose registers in the "state" type is facilitated by the macros: LREGBASE(s) LREGOFFSET(n) FREGBASE(s) FREGOFFSET(n) REGBASE(s) REGOFFSET(n) The OFFSET macros and the BASE macros must be defined so that, for example, REGOFFSET(3) + REGBASE(state) is the address of register 3. The BASE macros must be constant expressions since they are used to initialize a table of offsets with the "const" and "static" attributes. Other macros, FSR, FP, SP, SB, PC and PSR are defined such that, for example, (state *)s->FP accesses the fp (frame pointer) register. This should be quite flexible and examples for the mach, NetBSD and signal handler implementations can be found in ieee_handler.h. There should be only one other place where customization may be required. That is in ieee_handler.c. The functions setjmp, longjmp and get_dword are required. The first two may not be available in the kernel. In the case of mach, setjmp and longjmp are defined in terms of _setjmp and _longjmp. The get_dword macro uses copyin to get a long int from user space using copyin. It is assumed that copyin and copyout are available. In the signal handler case, these are defined in terms of memcpy. 5. To Do The testing has been cursory at best. A more sophisticated test suite is needed. The conformance to the standard is probably patchy since the author has never actually seen the IEEE floating point standard! 6. BUGS Please report any bugs, and especially any improvements to the author, Ian Dall . 6. Copyright This code is Copyright Ian Dall. Please respect it. Permission to use, copy, modify and distribute this software and its documentation is hereby granted, provided that both the copyright notice and this permission notice appear in all copies of the software, derivative works or modified versions, and any portions thereof, and that both notices appear in supporting documentation. If you have a good reason to want to vary the permission notice, I am open to negotiation.