There are several optimizations here:
1) Objects on _rtld_list_main do not participate in the DAG structures
at all. This is okay because all symbols must be resolvable at
link/load time, and _rtld_list_main is always searched first, so
any references from those objects must necessarily be resolved to
other objects on _rtld_list_main.
(Making this work completely required setting obj->main a bit
earlier; hence the RTLD_MAIN hack.)
2) Objects on _rtld_list_main are not put on _rtld_list_global,
preventing an extra search.
3) A bit is used to keep track of whether an object is on
_rtld_list_global, so we don't have to do a silly linear search
(see the sketch below).
4) A small attempt is made to prevent objects being put on the DAG
lists multiple times (using a silly linear search).
The sum of this appears to be a ~10% (0.3s) reduction in Mozilla's
startup time on my 800MHz box.
Also, make sure _rtld_objmain->path is always set, just to make the
debug output nicer.
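As an illustration of 3) above, here is a minimal sketch of the
membership-bit idea, using invented structure and function names rather
than the real ld.elf_so ones:

    struct ex_obj {
        struct ex_obj *ex_next;
        unsigned int ex_on_global:1;    /* already on the global list? */
    };

    static struct ex_obj *ex_global_head;

    static void
    ex_add_to_global(struct ex_obj *obj)
    {
        if (obj->ex_on_global)      /* O(1) test replaces the list walk */
            return;
        obj->ex_next = ex_global_head;
        ex_global_head = obj;
        obj->ex_on_global = 1;
    }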
and utilize it. This greatly reduces the number of calls to open(2) and
malloc(3) for programs like mozilla that depend on many shared objects,
while leaving the performance of small programs unaffected.
-dynamic-linker=/libexec/ld.elf_so) if the BINDIR of the program being
built is /bin or /sbin.
The reason we do this is that now all programs *except* those in
/bin and /sbin (i.e. the "special cases") match the default the compiler
uses, which is also what is used for e.g. xsrc, pkgsrc, and other
random 3rd-party programs.
This is done by decoupling where a shlib is installed from how it
is located. Two new variables, SHLIBINSTALLDIR and SHLINKINSTALLDIR,
contain the former information, and key off MKDYNAMICROOT only. SHLIBDIR
and SHLINKDIR contain the latter, and key off MKDYNAMICROOT and BINDIR.
The SHLIBINSTALLDIR, SHLIBDIR, _LIBSODIR, SHLINKINSTALLDIR, and
SHLINKDIR parameters are moved to a new <bsd.shlib.mk>; see bsd.README
for usage details.
instructions. Function calls use GOT indirection, and we only patch the
GOT (see the sketch below).
2) The mask-comparison optimization always fails, because the saved mask
always has 0x2000 set, and the PLT stub mask never does. So, remove it.
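Regarding the first point (patching only the GOT), a hedged sketch of
why no "imb" is needed -- binding is a plain data store; the names here
are invented and stand in for the real Alpha code:

    typedef unsigned long ex_addr_t;

    /*
     * Calls go through a GOT slot, so resolving a function means
     * storing its address into that slot.  No instruction is modified,
     * so no instruction-cache synchronization is required.
     */
    static void
    ex_bind_function(ex_addr_t *got_slot, ex_addr_t resolved)
    {
        *got_slot = resolved;   /* next call jumps through the new value */
    }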
Remove the call to _rtld_relocate_objects() completely -- except on VAX, where
we TEMPORARILY call _rtld_relocate_nonplt_objects() directly.
Also add more assertions -- ld.elf_so should never have PLT relocations.
Fix an obvious bug in the 64-bit PLT fixup: the SLLX was by 12 bits, when it
should be 32.
Fix what *appear* to be two bugs in the >32768 PLT entry stub:
* One division was wrong (/14 rather than /24).
* We need to subtract 1048576 (to make the offset relative to the beginning of
the upper section), not add it.
This path is still untested, and buggy.
symbols in the global part of the symbol table, use the updated GOT entry
rather than doing a lookup. (This provides the same effect as `-z combreloc'
on other platforms -- at most one lookup is done per symbol.)
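A rough sketch of the effect described above, with invented names (this
is not the actual MIPS relocation code): once GOT processing has
resolved a symbol, later relocations against it simply reuse the value
already in the GOT.

    typedef unsigned long ex_addr_t;

    /* Assumed to exist for the example: the expensive symbol lookup. */
    extern ex_addr_t ex_symlook(const char *name);

    /*
     * The GOT slot doubles as a cache of the resolved address, so each
     * global symbol is looked up at most once.
     */
    static ex_addr_t
    ex_resolve_global(ex_addr_t *got_slot, const char *name)
    {
        if (*got_slot == 0)             /* not resolved yet */
            *got_slot = ex_symlook(name);
        return *got_slot;               /* later relocs reuse this */
    }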
Unfortunately, it is necessary to turn off lazy binding on MIPS. As the
comment says:
* XXX DANGER WILL ROBINSON!
* You might think this is stupid, as it intentionally
* defeats lazy binding -- and you'd be right.
* Unfortunately, for lazy binding to work right, we
* need a way to force the GOT slots used for
* function pointers to be resolved immediately. This
* is supposed to be done automatically by the linker,
* by not outputting a PLT slot and setting st_value
* to 0, but GNU ld does not do so reliably.
years now.) Use _rtld_pagesz instead of getpagesize() to determine the page
size in our local malloc(). Saves a system call.
Also, since we're now relocated early, we don't need to be careful to avoid
globals, so most of the VARPSZ hacks are eliminated.
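A tiny sketch of the page-size caching mentioned above (illustrative
names only; the real _rtld_pagesz is presumably filled in once at
startup, e.g. from the aux vector):

    #include <unistd.h>

    /* Stand-in for _rtld_pagesz: set once, then read with no syscall. */
    static unsigned long ex_pagesz;

    static unsigned long
    ex_pagesize(void)
    {
        if (ex_pagesz == 0)
            ex_pagesz = (unsigned long)sysconf(_SC_PAGESIZE);
        return ex_pagesz;
    }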
l_addr is always supposed to be obj->relocbase -- or so says the GDB code that
uses it. So, set it to this on all platforms. It already was on VAX
explicitly, and on everything else except MIPS implicitly (because
mapbase==relocbase for all existing shlibs). For some silly/stupid reason, a
new field was created that the MIPS GDB currently uses.
Another MD #ifdef bites it.
* Rename _rtld_find_library() to _rtld_load_library(). It now calls
_rtld_load_object() if necessary to actually load the object, rather
than having the caller do it. To do this, it also takes the `mode'
argument that gets passed to _rtld_load_object().
* On a related note, remove _rtld_check_library(), and instead call
_rtld_load_object() to try actually loading the object. We
save two extra namei's and a bunch of redundant work (almost
literally the same code) this way.
* In _rtld_map_object(), mmap(2) the first page read-only, rather than
read(2)ing it.
* In _rtld_symlook_obj(), compare the *second* character of the symbol
name before calling strcmp(); see the sketch after this list. (The
first character is too frequently `_', and turns out not to be
helpful, at least in libc.)
* Also in _rtld_symlook_obj(), remove the bogus STT_FUNC special case
-- this also allows removing the `in_plt' argument to
_rtld_symlook_list() and _rtld_symlook_obj().
Also:
* In _rtld_obj_from_addr(), rather than trying to look up `_end' in
each object, instead use obj->mapsize as the upper bound.
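The second-character comparison is easier to see in code; a hedged
sketch with an invented function name (not the real
_rtld_symlook_obj()):

    #include <string.h>

    /*
     * Reject most mismatches with a single character compare before
     * paying for strcmp().  The *second* character is used because so
     * many symbol names begin with `_'.
     */
    static int
    ex_name_matches(const char *symname, const char *wanted)
    {
        if (symname[0] != '\0' && wanted[0] != '\0' &&
            symname[1] != wanted[1])
            return 0;
        return strcmp(symname, wanted) == 0;
    }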
order of DT_PLTREL and DT_JMPREL is irrelevant. Removes the need for yet
another weird #ifdef.
Also, be slightly more careful with the rel(a)lim trimming.
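A hedged sketch of the order-independence (heavily simplified; the real
_rtld_digest_dynamic() handles many more tags): record the DT_PLTREL
and DT_JMPREL values while scanning _DYNAMIC, and interpret them only
after the scan, so it no longer matters which tag comes first.

    #include <elf.h>    /* Elf64_Dyn, DT_* -- assuming a 64-bit target */

    struct ex_pltrels {
        const void *table;      /* start of the PLT relocation table */
        int uses_rela;          /* nonzero if entries are Elf64_Rela */
    };

    static void
    ex_digest_plt(const Elf64_Dyn *dynp, unsigned long relocbase,
        struct ex_pltrels *out)
    {
        unsigned long jmprel = 0, pltrel = 0;

        for (; dynp->d_tag != DT_NULL; dynp++) {
            switch (dynp->d_tag) {
            case DT_JMPREL:
                jmprel = dynp->d_un.d_ptr;
                break;
            case DT_PLTREL:
                pltrel = dynp->d_un.d_val;
                break;
            }
        }
        /* Interpreted only after the whole scan, so order is irrelevant. */
        out->table = (const void *)(relocbase + jmprel);
        out->uses_rela = (pltrel == DT_RELA);
    }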
from the glue code in _rtld_start(). This is used to set objself.relocbase,
rather than assuming that it's the same as objself.mapbase (or 0 on MIPS).
Now -- with a bug fix to the kernel -- ld.elf_so can be linked at any VMA.
_rtld_relocate_nonplt_self(), which is called from _rtld_start.
Now we're completely relocated before main() is called.
We also no longer need _GOT_END_, so junk the ld script.
This code assumes that ld.elf_so only contains RELATIVE relocs, but that's
supposed to be the case for -Bsymbolic anyway.
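For reference, a minimal sketch of what applying only RELATIVE
relocations amounts to (simplified, assuming RELA-style relocations and
a 64-bit target; this is not the real _rtld_relocate_nonplt_self()):

    #include <elf.h>    /* Elf64_Rela, Elf64_Addr */

    /*
     * A RELATIVE relocation just adds the load bias (relocbase) to a
     * precomputed addend -- no symbol lookup is needed, which is why it
     * can be done before anything else in ld.elf_so runs.
     */
    static void
    ex_apply_relative(const Elf64_Rela *rela, const Elf64_Rela *relalim,
        Elf64_Addr relocbase)
    {
        for (; rela < relalim; rela++) {
            Elf64_Addr *where = (Elf64_Addr *)(relocbase + rela->r_offset);
            *where = relocbase + rela->r_addend;
        }
    }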
* Add a ld.so.script that exports _GOT_END_.
* Prebind the GOT in _rtld_start.
* Skip over GOT relocs in _rtld_relocate_nonplt_objects().
This makes debugging work better, at least.
Large programs need multiple GOTs. The lazy binding stub in the PLT
can be reached from any of these GOTs, but the dynamic linker only
has enough information to fix up the entry in the first GOT. Thus, calls
through the other GOTs went through the time-consuming lazy binding
process on every call.
This fix rewrites the PLT entries themselves to bypass the lazy binding
for those GOT entries that the dynamic linker can't fix up.
Fix from FreeBSD.
Note that now that we patch up the PLT, we need to put back the "imb"
that was removed from the binder exit path.
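Since this fix rewrites instructions rather than just GOT data, the
instruction cache has to be resynchronized -- that is what the "imb"
does on Alpha. A hedged, generic sketch of the pattern, using the
GCC/Clang __builtin___clear_cache() builtin in place of the
Alpha-specific glue:

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /*
     * After rewriting a PLT entry in place, flush the affected range so
     * stale instructions are not executed.
     */
    static void
    ex_patch_insns(uint32_t *plt_entry, const uint32_t *new_insns,
        size_t count)
    {
        memcpy(plt_entry, new_insns, count * sizeof(*new_insns));
        __builtin___clear_cache((char *)plt_entry,
            (char *)(plt_entry + count));
    }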