for that file to not be built with profiling. This makes it reasonable to
use pthread_{set,get}specific() to implement thread-safe profiline call counts.
if the corresponding TSD entry is empty.
Cuts down lock/unlock pairs for this operation from 256 to the number
of active TSD entries; sicne this is done when every thread exits, it saves
many total lock/unlock pairs.