A dynamically linked program invokes the rtld cleanup routine via
an atexit handler. This rtld cleanup routine invokes _fini() for
shared libraries, which in-turn invoke __cxa_finalize() with their
DSO handle. By luck, this happens to work okay for non-threaded
programs, but for a threaded program, this leads to deadlock (sometimes
manifested as an assertion failure, if the program didn't actually
create any threads).
Fixed by teaching __cxa_finalize() that it can be recursively invoked,
adjusting the handler list manipulation accordingly.