There are several optimizations here:
1) Objects on _rtld_list_main do not participate in the DAG structures
at all. This is okay because all symbols must be resolvable at
link/load time, and _rtld_list_main is always searched first, so
any references from those objects must necessarily be resolved to
other objects on _rtld_list_main.
(Making this work completely required setting obj->main a bit
earlier; hence the RTLD_MAIN hack.)
2) Objects on _rtld_list_main are not put on _rtld_list_global,
preventing an extra search.
3) A bit is used to keep track of whether an object is on
_rtld_list_global, so we don't have to do a silly linear search.
4) A small attempt is made to prevent objects being put on the DAG
lists multiple times (using a silly linear search).
The sum of this appears to be a ~10% (.3s) reduction in Mozilla's
startup time on my 800MHz box.
Also, make sure _rtld_objmain->path is always set, just to make the
debug output nicer.
and utilize it. This greatly reduces the number of calls to open(2) and
malloc(3) for programs like mozilla that depend on many shared objects
while it doesn't affect performance of small programs.
-dynamic-linker=/libexec/ld.elf_so) if the BINDIR of the program being
built is /bin or /sbin.
The reason we do this is because now all programs *except* those in
/bin and /sbin (i.e. the "special cases") match the default the compiler
uses, which is what is used for things in e.g. xsrc, pkgsrc, and other
random 3rd party programs.
This is done by decoupling where a shlib is installed from how it
is located. Two new variables, SHLIBINSTALLDIR and SHLINKINSTALLDIR,
contain the former information, and key off MKDYNAMICROOT only. SHLIBDIR
and SHLINKDIR contain the latter, and key off MKDYNAMICROOT and BINDIR.
The SHLIBINSTALLDIR, SHLIBDIR, _LIBSODIR, SHLINKINSTALLDIR, and
SHLINKDIR parameters are moved to a new <bsd.shlib.mk>; see bsd.README
for usage details.
instructions. Function calls use GOT indirection, and we only patch the
GOT.
2) The mask-comparison optimization always fails, because the saved mask
always has 0x2000 set, and the PLT stub mask never does. So, remove it.
Remove the call to _rtld_relocate_objects() completely -- except on VAX, where
we TEMPORARILY call _rtld_relocate_nonplt_objects() directly.
Also add more assertions -- ld.elf_so should never have PLT relocations.
Fix an obvious bug in the 64-bit PLT fixup: the SLLX was by 12 bits, when it
should be 32.
Fix what *appear* to be two bugs in the >32768 PLT entry stub:
* One division was wrong (/14 rather than /24).
* We need to subtract 1048576 (to make the offset relative to the beginning of
the upper section), not add it.
This path is still untested, and buggy.
way we don't have to worry about malloc() failure. Also closes
a memory leak since vis_user was never free()d. Lack of malloc()
checking pointed out by Peter Werner.
from openbsd
symbols in the global part of the symbol table, use the updated GOT entry
rather than doing a lookup. (This provides the same effect as `-z combreloc'
on other platforms -- at most one lookup is done per symbol.)
Unfortunately, it is necessary to turn off lazy binding on MIPS. As the
comment says:
* XXX DANGER WILL ROBINSON!
* You might think this is stupid, as it intentionally
* defeats lazy binding -- and you'd be right.
* Unfortunately, for lazy binding to work right, we
* need to a way to force the GOT slots used for
* function pointers to be resolved immediately. This
* is supposed to be done automatically by the linker,
* by not outputting a PLT slot and setting st_value
* to 0, but GNU ld does not do so reliably.
years now.) Use _rtld_pagesz instead of getpagesize() to determine the page
size in our local malloc(). Saves a system call.
Also, since we're now relocated early, we don't need to be careful to avoid
globals, so most of the VARPSZ hacks are eliminated.
l_addr is always supposed to be obj->relocbase -- or so says the GDB code that
uses it. So, set it to this on all platforms. It already was on VAX
explicitly, and on everything else except MIPS implicitly (because
mapbase==relocbase for all existing shlibs). For some silly/stupid reason, a
new field was created that the MIPS GDB currently uses.
Another MD #ifdef bites it.
* Rename _rtld_find_library() to _rtld_load_library(). It now calls
_rtld_load_object() if necessary to actually load the object, rather
than having the caller do it. To do this, it also takes the `mode'
argument that gets passed to _rtld_load_object().
* On a related note, remove _rtld_check_library(), and instead call
_rtld_load_object() to instead try actually loading the object. We
save two extra namei's and a bunch of redundant work (almost
literally the same code) this way.
* In _rtld_map_object(), mmap(2) the first page read-only, rather than
read(2)ing it.
* In _rtld_symlook_obj(), compare the *second* character of the symbol
name before calling strcmp(). (This first character is too
frequently `_', and turns out to not be helpful, in libc.)
* Also in _rtld_symlook_obj(), remove the bogus STT_FUNC special case
-- this also allows removing the `in_plt' argument to
_rtld_symlook_list() and _rtld_symlook_obj().
Also:
* In _rtld_obj_from_addr(), rather than trying to look up `_end' in
the each object, instead use obj->mapsize as the upper bound.
buffer that is smaller in size than the source buffer.
also, there is no guarantee that any of the string components of
the request packet are null terminated.
in some cases, not all elements of the response buffer are
explicitly set. specifically pad and addr. a talk client can spy to
see which host is talking to which host by sending out regular
packets, to which talkd responds without clearing the addr element.
from xs@kittenz.org
order of DT_PLTREL and DT_JMPREL is irrelevant. Removes the need for yet
another weird #ifdef.
Also, be slightly more careful with the rel(a)lim trimming.
from the glue code in _rtld_start(). This is used to set objself.relocbase,
rather than assuming that it's the same as objself.mapbase (or 0 on MIPS).
Now -- with a bug fix to the kernel -- ld.elf_so can be linked at any VMA.
Allows /etc/ftpchroot to work correctly for usernames > 9 characters.
Noted by Max Khon in the freebsd-stable mailing list, via Thomas Vogt in
private email.
_rtld_relocate_nonplt_self(), which is called from _rtld_start.
Now we're completely relocated before main() is called.
We also no longer need _GOT_END_, so junk the ld script.
This code assumes that ld.elf_so only contains RELATIVE relocs, but that's
supposed to be the case for -Bsymbolic anyway.
This is somewhat of a hack, but I find it better than having
to run env(1) from inetd(8), or changing the environment for
inetd(8) itself (and thus all daemons started by it).
* Add a ld.so.script that exports _GOT_END_.
* Prebind the GOT in _rtld_start.
* Skip over GOT relocs in _rtld_relocate_nonplt_objects().
This makes debugging work better at least.
Large programs need multiple GOTs. The lazy binding stub in the PLT
can be reached from any of these GOTs, but the dynamic linker only
has enough information to fix up the first GOT entry. Thus, calls
through the other GOTs went through the time-consuming lazy binding
process on every call.
This fix rewrites the PLT entries themselves to bypass the lazy binding
for those GOT entries that the dynamic linker can't fixup.
Fix from FreeBSD.
Note that now that we patch up the PLT, we need to put back the "imb"
that was removed from the binder exit path.
indicates whether we're relocating ld.elf_so itself. Use this in some places
rather than hackish tests on `dodebug'. (The Alpha and HPPA `dodebug' tests
were actually noops, because RTLD_RELOCATE_SELF is not set, and therefore
dodebug is always true.)
executable was of type ET_DYN. Use this instead of `mainprog' to determine
whether we need to do base-relative fixups of the PLT. (This allows loading
non-relocatable objects, should we desire to do that at some point...)
* _rtld_relocate_plt_lazy() fixes up all the relocs pointing to the PLT. (On
most platforms it just does a simple base-relative fixup; on SPARC it does
nothing.)
* _rtld_relocate_plt_object() does immediate binding for a PLT entry.
The basic gist is that this saves a bit of time on SPARC (where the iteration
through the pltrela table was gratuitous), and a little less time on all other
platforms. A whole lot of #ifdef'ed crap is moved out of reloc.c, too.
NOT tested on: hppa sh x86_64
symbol' errors, probably because the increment gets interrupted occasionally by
a signal. In general, _rtld_bind() should not modify ANY internal state.
* Pass a symbol number to _rtld_find_symdef(), not a r_info.
* Don't try to do a symbol lookup when we find an unsupported relocation;
instead get the symbol name from the referencing object's strtab.
* Add preliminary support for `-z combreloc'-style startup optimization on
i386, `#ifdef COMBRELOC'.
executable that uses the library on that line has the rather cryptic
"sysctl" printed when it starts executing.
Switch to (_PATH_LD_HINTS": unknown sysctl for %s", name);
Discovered after someone copied /etc from an i386 to a sparc64 box.
check_write(), so that a user who has modify disabled gets an error
message rather than a hung connection.
Noted by M.J. Rutter <mjr19@cus.cam.ac.uk> in private email.
failures as well as successes when a run of clean_all_inodes completes.
Explicitly cast to off_t in get_dinode and get_rawblock, to make sure we
read the right block.
a potential problem with cleaning fragments at all.
Better sanity checks when selecting files to coalesce; in particular don't
shift too far left when comparing the number of discontinuities to the log2
of the number of total blocks.
Better log messages: note beginning of coalescing correctly; also take
the log message from add_segment out of "if (debug)" for symmetry with the
"finished segment" message.
Use lfs_bmapv to find the inode, rather than looking it up manually in
the ifile; this should give more up-to-date information, since trolling
through every inode in the fs could take some time.
be digging itself deeper into a hole, it forks off a subprocess
that locates files with too many discontinuities and rewrites them, if
there is enough room.
Optionally the user can manually coaleasce files by running with "-c".
The recent change to lfs_markv is required for the coalescer to do anything.
All of "digging itself deeper", "too many discontinuities", and "enough room"
need to be better defined.
- implement SIMPLEQ_REMOVE(head, elm, type, field). whilst it's O(n),
this mirrors the functionality of SLIST_REMOVE() (the other
singly-linked list type) and FreeBSD's STAILQ_REMOVE()
- remove the unnecessary elm arg from SIMPLEQ_REMOVE_HEAD().
this mirrors the functionality of SLIST_REMOVE_HEAD() (the other
singly-linked list type) and FreeBSD's STAILQ_REMOVE_HEAD()
- remove notes about SIMPLEQ not supporting arbitrary element removal
- use SIMPLEQ_FOREACH() instead of home-grown for loops
- use SIMPLEQ_EMPTY() appropriately
- use SIMPLEQ_*() instead of accessing sqh_first,sqh_last,sqe_next directly
- reorder manual page; be consistent about how the types are listed
- other minor cleanups