Back to libsm overview

libsm : Exception Handling


Id: exc.html,v 1.12 2001/02/13 21:21:25 gshapiro Exp
$NetBSD: exc.html,v 1.1.1.2 2003/06/01 14:01:34 atatat Exp $

Introduction

The exception handling package provides the facilities that functions in libsm use to report errors. Here are the basic concepts:
  1. When a function detects an exceptional condition at the library level, it does not print an error message, or call syslog, or exit the program. Instead, it reports the error back to its caller, and lets the caller decide what to do. This improves modularity, because error handling is separated from error reporting.

  2. Errors are not represented by a single integer error code, because that you can't represent everything that an error handler might need to know about an error by a single integer. Instead, errors are represented by exception objects. An exception object contains an exception code and an array of zero or more exception arguments. The exception code is a string that specifies what kind of exception this is, and the arguments may be integers, strings or exception objects.

  3. Errors are not reported using a special return value, because if you religiously check for error returns from every function call that could fail, then most of your code ends up being error handling code. Errors are reported by raising an exception. When an exception is raised, we unwind the call stack until we find an exception handler. If the exception is not handled, then we print the exception on stderr and exit the program.

Synopsis

#include <sm/exc.h>

typedef struct sm_exc_type SM_EXC_TYPE_T;
typedef struct sm_exc SM_EXC_T;
typedef union sm_val SM_VAL_T;

/*
**  Exception types
*/

extern const char SmExcTypeMagic[];

struct sm_exc_type
{
	const char	*sm_magic;
	const char	*etype_category;
	const char	*etype_argformat;
	void 		(*etype_print)(SM_EXC_T *exc, SM_FILE_T *stream);
	const char	*etype_printcontext;
};

extern const SM_EXC_TYPE_T SmEtypeOs;
extern const SM_EXC_TYPE_T SmEtypeErr;

void
sm_etype_printf(
	SM_EXC_T *exc,
	SM_FILE_T *stream);

/*
**  Exception objects
*/

extern const char SmExcMagic[];

union sm_val
{
	int		v_int;
	long		v_long;
	char		*v_str;
	SM_EXC_T	*v_exc;
};

struct sm_exc
{
	const char		*sm_magic;
	size_t			exc_refcount;
	const SM_EXC_TYPE_T	*exc_type;
	SM_VAL_T		*exc_argv;
};

SM_EXC_T *
sm_exc_new_x(
	const SM_EXC_TYPE_T *type,
	...);

SM_EXC_T *
sm_exc_addref(
	SM_EXC_T *exc);

void
sm_exc_free(
	SM_EXC_T *exc);

bool
sm_exc_match(
	SM_EXC_T *exc,
	const char *pattern);

void
sm_exc_print(
	SM_EXC_T *exc,
	SM_FILE_T *stream);

void
sm_exc_write(
	SM_EXC_T *exc,
	SM_FILE_T *stream);

void
sm_exc_raise_x(
	SM_EXC_T *exc);

void
sm_exc_raisenew_x(
	const SM_EXC_TYPE_T *type,
	...);

/*
**  Ensure that cleanup code is executed,
**  and/or handle an exception.
*/
SM_TRY
	Block of code that may raise an exception.
SM_FINALLY
	Cleanup code that may raise an exception.
	This clause is guaranteed to be executed even if an exception is
	raised by the SM_TRY clause or by an earlier SM_FINALLY clause.
	You may have 0 or more SM_FINALLY clauses.
SM_EXCEPT(exc, pattern)
	Exception handling code, triggered by an exception
	whose category matches 'pattern'.
	You may have 0 or more SM_EXCEPT clauses.
SM_END_TRY

Overview

An exception is an object which represents an exceptional condition, which might be an error condition like "out of memory", or might be a condition like "end of file".

Functions in libsm report errors and other unusual conditions by raising an exception, rather than by returning an error code or setting a global variable such as errno. If a libsm function is capable of raising an exception, its name ends in "_x". (We do not raise an exception when a bug is detected in the program; instead, we terminate the program using sm_abort. See the assertion package for details.)

When you are using the libsm exception handling package, you are using a new programming paradigm. You will need to abandon some of the programming idioms you are accustomed to, and switch to new idioms. Here is an overview of some of these idioms.

  1. When a function is unable to complete its task because of an exceptional condition, it reports this condition by raising an exception.

    Here is an example of how to construct an exception object and raise an exception. In this example, we convert a Unix system error into an exception.

    fd = open(path, O_RDONLY);
    if (fd == -1)
    	sm_exc_raise_x(sm_exc_new_x(&SmEtypeOs, errno, "open", "%s", path));
    
    Because the idiom sm_exc_raise_x(sm_exc_new_x(...)) is so common, it can be abbreviated as sm_exc_raisenew_x(...).

  2. When you detect an error at the application level, you don't call a function like BSD's errx, which prints an error message on stderr and exits the program. Instead, you raise an exception. This causes cleanup code in surrounding exception handlers to be run before the program exits. For example, instead of this:
    errx(1, "%s:%d: syntax error", filename, lineno);
    
    use this:
    sm_exc_raisenew_x(&SmEtypeErr, "%s:%d: syntax error", filename, lineno);
    
    The latter code raises an exception, unwinding the call stack and executing cleanup code. If the exception is not handled, then the exception is printed to stderr and the program exits. The end result is substantially the same as a call to errx.

  3. The SM_TRY ... SM_FINALLY ... control structure ensures that cleanup code is executed and resources are released in the presence of exceptions.

    For example, suppose that you have written the following code:

    rpool = sm_rpool_new_x(&SmRpoolRoot, 0);
    ... some code ...
    sm_rpool_free_x(rpool);
    
    If any of the functions called within "... some code ..." have names ending in _x, then it is possible that an exception will be raised, and if that happens, then "rpool" will not be freed. And that's a bug. To fix this bug, change your code so it looks like this:
    rpool = sm_rpool_new_x(&SmRpoolRoot, 0);
    SM_TRY
    	... some code that can raise an exception ...
    SM_FINALLY
    	sm_rpool_free_x(rpool);
    SM_END_TRY
    
  4. The SM_TRY ... SM_EXCEPT ... control structure handles an exception. Unhandled exceptions terminate the program. For example, here is a simple exception handler that traps all exceptions, and prints the exceptions:
    SM_TRY
    	/* code that can raise an exception */
    	...
    SM_EXCEPT(exc, "*")
    	/* catch all exceptions */
    	sm_exc_print(exc, stderr);
    SM_END_TRY
    
    Exceptions are reference counted. The SM_END_TRY macro contains a call to sm_exc_free, so you don't normally need to worry about freeing an exception after handling it. In the rare case that you want an exception to outlive an exception handler, then you increment its reference count by calling sm_exc_addref.

  5. The second argument of the SM_EXCEPT macro is a glob pattern which specifies the types of exceptions that are to be handled. For example, you might want to handle an end-of-file exception differently from other exceptions. Here's how you do that:
    SM_TRY
    	/* code that might raise end-of-file, or some other exception */
    	...
    SM_EXCEPT(exc, "E:sm.eof")
    	/* what to do if end-of-file is encountered */
    	...
    SM_EXCEPT(exc, "*")
    	/* what to do if some other exception is raised */
    	...
    SM_END_TRY
    

Exception Values

In traditional C code, errors are usually denoted by a single integer, such as errno. In practice, errno does not carry enough information to describe everything that an error handler might want to know about an error. And the scheme is not very extensible: if several different packages want to add additional error codes, it is hard to avoid collisions.

In libsm, an exceptional condition is described by an object of type SM_EXC_T. An exception object is created by specifying an exception type and a list of exception arguments.

The exception arguments are an array of zero or more values. The values may be a mixture of ints, longs, strings, and exceptions. In the SM_EXC_T structure, the argument vector is represented by SM_VAL_T *exc_argv, where SM_VAL_T is a union of the possible argument types. The number and types of exception arguments is determined by the exception type.

An exception type is a statically initialized const object of type SM_EXC_TYPE_T, which has the following members:

const char *sm_magic
A pointer to SmExcTypeMagic.

const char *etype_category
This is a string of the form "class:name".

The class is used to assign the exception type to one of a number of broad categories of exceptions on which an exception handler might want to discriminate. I suspect that what we want is a hierarchical taxonomy, but I don't have a full design for this yet. For now, I am recommending the following classes:

"F"
A fatal error has occurred. This is an error that prevents the application from making any further progress, so the only recourse is to raise an exception, execute cleanup code as the stack is unwound, then exit the application. The out-of-memory exception raised by sm_malloc_x has category "F:sm.heap" because sendmail commits suicide (after logging the error and cleaning up) when it runs out of memory.
"E"
The function could not complete its task because an error occurred. (It might be useful to define subclasses of this category, in which case our taxonony becomes a tree, and 'F' becomes a subclass of 'E'.)
"J"
This exception is being raised in order to effect a non-local jump. No error has occurred; we are just performing the non-local equivalent of a continue, break or return.
"S"
The function was interrupted by a signal. Signals are not errors because they occur asynchronously, and they are semantically unrelated to the function that happens to be executing when the signal arrives. Note that it is extremely dangerous to raise an exception from a signal handler. For example, if you are in the middle of a call to malloc, you might corrupt the heap.
Eric's libsm paper defines "W", "D" and "I" for Warning, Debug and Informational: I suspect these categories only make sense in the context of Eric's 1985 exception handling system which allowed you to raise conditions without terminating the calling function.

The name uniquely identifies the exception type. I recommend a string of the form library.package.detail.

const char *etype_argformat
This is an array of single character codes. Each code indicates the type of one of the exception arguments. sm_exc_new_x uses this string to decode its variable argument list into an exception argument vector. The following type codes are supported:
i
The exception argument has type int.
l
The exception argument has type long.
e
The exception argument has type SM_EXC_T*. The value may either be NULL or a pointer to an exception. The pointer value is simply copied into the exception argument vector.
s
The exception argument has type char*. The value may either be NULL or a pointer to a character string. In the latter case, sm_exc_new_x will make a copy of the string.
r
The exception argument has type char*. sm_exc_new_x will read a printf-style format string argument followed by a list of printf arguments from its variable argument list, and convert these into a string. This type code can only occur as the last element of exc_argformat.

void (*etype_print)(SM_EXC_T *exc, SM_FILE_T *stream)
This function prints an exception of the specified type onto an output stream. The final character printed is not a newline.

Standard Exceptions and Exception Types

Libsm defines one standard exception value, SmHeapOutOfMemory. This is a statically initialized const variable, because it seems like a bad idea to dynamically allocate an exception object to report a low memory condition. This exception has category "F:sm.heap". If you need to, you can explicitly raise this exception with sm_exc_raise_x(&SmHeapOutOfMemory).

Statically initialized exception values cannot contain any run-time parameters, so the normal case is to dynamically allocate a new exception object whenever you raise an exception. Before you can create an exception, you need an exception type. Libsm defines the following standard exception types.

SmEtypeOs
This represents a generic operating system error. The category is "E:sm.os". The argformat is "isr", where argv[0] is the value of errno after a system call has failed, argv[1] is the name of the function (usually a system call) that failed, and argv[2] is either NULL or a character string which describes some of the arguments to the failing system call (usually it is just a file name). Here's an example of raising an exception:
fd = open(filename, O_RDONLY);
if (fd == -1)
	sm_exc_raisenew_x(&SmEtypeOs, errno, "open", "%s", filename);
If errno is ENOENT and filename is "/etc/mail/snedmail.cf", then the exception raised by the above code will be printed as
/etc/mail/snedmail.cf: open failed: No such file or directory
SmEtypeErr
This represents a generic error. The category is "E:sm.err", and the argformat is "r". You can use it in application contexts where you are raising an exception for the purpose of terminating the program. You know the exception won't be handled, so you don't need to worry about packaging the error for later analysis by an exception handler. All you need to specify is the message string that will be printed to stderr before the program exits. For example,
sm_exc_raisenew_x(&SmEtypeErr, "name lookup failed: %s", name);

Custom Exception Types

If you are writing a library package, and you need to raise exceptions that are not standard Unix system errors, then you need to define one or more new exception types.

Every new exception type needs a print function. The standard print function sm_etype_printf is all you need in the majority of cases. It prints the etype_printcontext string of the exception type, substituting occurrences of %0 through %9 with the corresponding exception argument. If exception argument 3 is an int or long, then %3 will print the argument in decimal, and %o3 or %x3 will print it in octal or hex.

In the following example, I will assume that your library package implements regular expressions, and can raise 5 different exceptions. When compiling a regular expression, 3 different syntax errors can be reported:

Whenever one of these errors is reported, you will also report the index of the character within the regex string at which the syntax error was detected. The fourth exception is raised if a compiled regular expression is invalid: this exception has no arguments. The fifth exception is raised if the package runs out of memory: for this, you use the standard SmHeapOutOfMemory exception.

The obvious approach is to define 4 separate exception types. Here they are:

/* print a regular expression syntax error */
void
rx_esyntax_print(SM_EXC_T *exc, SM_FILE_T *stream)
{
	sm_io_fprintf(stream, "rx syntax error at character %d: %s",
		exc->exc_argv[0].v_int,
		exc->exc_type->etype_printcontext);
}
SM_EXC_TYPE_T RxSyntaxParen = {
	SmExcTypeMagic,
	"E:mylib.rx.syntax.paren",
	"i",
	rx_esyntax_print,
	"unbalanced parenthesis"
};
SM_EXC_TYPE_T RxSyntaxBracket = {
	SmExcTypeMagic,
	"E:mylib.rx.syntax.bracket",
	"i",
	rx_esyntax_print,
	"unbalanced bracket"
};
SM_EXC_TYPE_T RxSyntaxMissingArg = {
	SmExcTypeMagic,
	"E:mylib.rx.syntax.missingarg",
	"i",
	rx_esyntax_print,
	"missing argument for repetition operator"
};

SM_EXC_TYPE_T RxRunCorrupt = {
	SmExcTypeMagic,
	"E:mylib.rx.run.corrupt",
	"",
	sm_etype_printf,
	"rx runtime error: compiled regular expression is corrupt"
};

With the above definitions, you can raise a syntax error reporting an unbalanced parenthesis at string offset i using:

sm_exc_raisenew_x(&RxSyntaxParen, i);
If i==42 then this exception will be printed as:
rx syntax error at character 42: unbalanced parenthesis
An exception handler can provide special handling for regular expression syntax errors using this code:
SM_TRY
	... code that might raise an exception ...
SM_EXCEPT(exc, "E:mylib.rx.syntax.*")
	int i = exc->exc_argv[0].v_int;
	... handle a regular expression syntax error ...
SM_END_TRY

External requirements may force you to define an integer code for each error reported by your package. Or you may be wrapping an existing package that works this way. In this case, it might make sense to define a single exception type, patterned after SmEtypeOs, and include the integer code as an exception argument.

Your package might intercept an exception E generated by a lower level package, and then reclassify it as a different expression E'. For example, a package for reading a configuration file might reclassify one of the regular expression syntax errors from the previous example as a configuration file syntax error. When you do this, the new exception E' should include the original exception E as an exception parameter, and the print function for exception E' should print the high level description of the exception (eg, "syntax error in configuration file %s at line %d\n"), then print the subexception that is stored as an exception parameter.

Function Reference

SM_EXC_T *sm_exc_new_x(const SM_EXC_TYPE_T *type, ...)
Create a new exception. Raise an exception on heap exhaustion. The new exception has a reference count of 1.

A list of zero or more exception arguments follows the exception type; these are copied into the new exception object. The number and types of these arguments is determined by type->etype_argformat.

Note that there is no rpool argument to sm_exc_new_x. Exceptions are allocated directly from the heap. This is because exceptions are normally raised at low levels of abstraction and handled at high levels. Because the low level code typically has no idea of how or at what level the exception will be handled, it also has no idea of which resource pool, if any, should own the exception.

SM_EXC_T *sm_exc_addref(SM_EXC_T *exc)
Increment the reference count of an exception. Return the first argument.

void sm_exc_free(SM_EXC_T *exc)
Decrement the reference count of an exception. If it reaches 0, free the exception object.

bool sm_exc_match(SM_EXC_T *exc, const char *pattern)
Compare the exception's category to the specified glob pattern, return true if they match.

void sm_exc_print(SM_EXC_T *exc, SM_FILE_T *stream)
Print the exception on the stream as a sequence of one or more newline terminated lines.

void sm_exc_write(SM_EXC_T *exc, SM_FILE_T *stream)
Write the exception on the stream without a terminating newline.

void sm_exc_raise_x(SM_EXC_T *exc)
Raise the exception. This function does not return to its caller.

void sm_exc_raisenew_x(const SM_EXC_TYPE_T *type, ...)
A short form for sm_exc_raise_x(sm_exc_new_x(type,...)).

Macro Reference

The SM_TRY ... SM_END_TRY control structure ensures that cleanup code is executed in the presence of exceptions, and permits exceptions to be handled.
SM_TRY
	A block of code that may raise an exception.
SM_FINALLY
	Cleanup code that may raise an exception.
	This code is guaranteed to be executed whether or not
	an exception was raised by a previous clause.
	You may have 0 or more SM_FINALLY clauses.
SM_EXCEPT(e, pat)
	Exception handling code, which is triggered by an exception
	whose category matches the glob pattern 'pat'.
	The exception value is bound to the local variable 'e'.
	You may have 0 or more SM_EXCEPT clauses.
SM_END_TRY
First, the SM_TRY clause is executed, then each SM_FINALLY clause is executed in sequence. If one or more of these clauses was terminated by an exception, then the first such exception is remembered, and the other exceptions are lost. If no exception was raised, then we are done. Otherwise, each of the SM_EXCEPT clauses is examined in sequence. and the first SM_EXCEPT clause whose pattern argument matches the exception (see sm_exc_match) is executed. If none of the SM_EXCEPT clauses matched the exception, or if there are no SM_EXCEPT clauses, then the remembered exception is re-raised.

SM_TRY .. SM_END_TRY clauses may be nested arbitrarily.

It is illegal to jump out of a SM_TRY or SM_FINALLY clause using goto, break, continue, return or longjmp. If you do this, you will corrupt the internal exception handling stack. You can't use break or continue in an SM_EXCEPT clause; these are reserved for use by the implementation. It is legal to jump out of an SM_EXCEPT clause using goto or return; however, in this case, you must take responsibility for freeing the exception object.

The SM_TRY and SM_FINALLY macros contain calls to setjmp, and consequently, they suffer from the limitations imposed on setjmp by the C standard. Suppose you declare an auto variable i outside of a SM_TRY ... SM_END_TRY statement, initializing it to 0. Then you modify i inside of a SM_TRY or SM_FINALLY clause, setting it to 1. If you reference i in a different SM_FINALLY clause, or in an SM_EXCEPT clause, then it is implementation dependent whether i will be 0 or 1, unless you have declared i to be volatile.

int volatile i = 0;

SM_TRY
	i = 1;
	...
SM_FINALLY
	/* the following reference to i only works if i is declared volatile */
	use(i);
	...
SM_EXCEPT(exc, "*")
	/* the following reference to i only works if i is declared volatile */
	use(i);
	...
SM_END_TRY