the file-level crt_arch.h asm fragments generally make direct
(non-PLT) calls from _start to _start_c, which is only valid when
there is a local, non-interposable definition for _start_c. generally,
the linker is expected to know that local definitions in a main
executable (as opposed to shared library) output are non-interposable,
making this work, but historically there have been linker bugs in this
area, and microblaze is reportedly still broken, flagging the
relocation for the call as a textrel.
the equivalent _dlstart_c, called from the same crt_arch.h asm
fragments, has always used hidden visibility without problem, and
semantically it should be hidden, so make it hidden. this ensures the
direct call is always valid regardless of whether the linker properly
special-cases main executable output.
without explicit alignment directives, whether they end up at the
necessary alignment depends on linker/linking conditions. initially
reported as mold issue 1255.
after commit a48ccc159a removed the use
of _Noreturn on the stage3_func type (which only worked due to it
being defined to the "GNU C" attribute in C99 mode), GCC could no
longer assume that the ends of __dls2 and __dls2b are unreachable, and
produced a warning that a function marked _Noreturn returns.
also, since commit 4390383b32, the
_Noreturn declaration for __libc_start_main in crt1/rcrt1 has been not
only inconsistent with the definition, but wrong. formally,
__libc_start_main does return, via a (hopefully) tail call to a helper
function after the barrier. incorrect usage of _Noreturn in the
declaration was probably formal UB.
the _Noreturn specifiers were not useful in any of these places, so
remove them all. now, the only remaining usage of _Noreturn is in
public interfaces where _Noreturn is part of their contract.
this cleans up what had become widespread direct inline use of "GNU C"
style attributes directly in the source, and lowers the barrier to
increased use of hidden visibility, which will be useful to recovering
some of the efficiency lost when the protected visibility hack was
dropped in commit dc2f368e56, especially
on archs where the PLT ABI is costly.
the dynamic linker was found to hang when used as the PT_INTERP, but
not when invoked as a command. the mechanism of this failure was not
determined, but the cause is clear:
commit 5552ce5200 removed the SHARED
macro, but arch/sh/crt_arch.h is still using it to choose the right
form of the crt/ldso entry point code. moving the forced definition
from rcrt1.c to dlstart.c restores the old behavior. eventually the
logic should be changed to fully remove the SHARED macro or at least
rename it to something more reasonable.
this eliminates the last need for the SHARED macro to control how
files in the src tree are compiled. the same code is used for both
libc.a and libc.so, with additional code for the dynamic linker (from
the new ldso tree) being added to libc.so but not libc.a. separate .o
and .lo object files still exist for the src tree, but the only
difference is that the .lo files are built as PIC.
in the future, if/when we add dlopen support for static-linked
programs, much of the code in dynlink.c may be moved back into the src
tree, but properly factored into separate source files. in that case,
the code in the ldso tree will be reduced to just the dynamic linker
entry point, self-relocation, and loading of libraries needed by the
main application.
these files are all accepted as legacy arm syntax when producing arm
code, but legacy syntax cannot be used for producing thumb2 with
access to the full ISA. even after switching to UAL, some asm source
files contain instructions which are not valid in thumb mode, so these
will need to be addressed separately.
the idea of the three-instruction sequence being removed was to be
able to return to thumb code when used on armv4t+ from a thumb caller,
but also to be able to run on armv4 without the bx instruction
available (in which case the low bit of lr would always be 0).
however, without compiler support for generating such a sequence from
C code, which does not exist and which there is unlikely to be
interest in implementing, there is little point in having it in the
asm, and it would likely be easier to add pre-armv4t support via
enhanced linker handling of R_ARM_V4BX than at the compiler level.
removing this code simplifies adding support for building libc in
thumb2-only form (for cortex-m).
since commits 2907afb8db and
6fc30c2493, __dls2 is no longer called
via symbol lookup, but instead uses relative addressing that needs to
be resolved at link time. on some linker versions, and/or if
-Bsymbolic-functions is not used, the linker may leave behind a
dynamic relocation, which is not suitable for bootstrapping the
dynamic linker, if the reference to __dls2 is marked hidden but the
definition is not actually hidden. correcting the definition to use
hidden visibility fixes the problem.
the static-PIE entry point rcrt1 was likewise affected and is also
fixed by this patch.
since commit c5e34dabbb, crt1.c has
provided a "mostly-C" implementation of the crt1 start file that
avoids the need for arch-specific symbol referencing, PIC/PIE-specific
code variants, etc. but for archs that had existing hand-written
versions, the new code was initially unused, and later only used as
the dynamic linker entry point. this commit switches all archs to
using the new code.
the code being removed was a recurring source of subtle errors, and
was still broken at least on arm, where it failed to properly align
the stack pointer before calling into C code.
for fdpic support is is essential that the got pointer be saved at a
known, ABI-dictated offset from the frame pointer, since there is no
way to recover it once it's lost.
static-linked PIE files need startup code to relocate themselves, much
like the dynamic linker does. rcrt1.c reuses the code in dlstart.c,
stage 1 of the dynamic linker, which in turn reuses crt_arch.h, to
achieve static PIE with no new code. only relative relocations are
supported.
existing toolchains that don't yet support static PIE directly can be
repurposed by passing "-shared -Wl,-Bstatic -Wl,-Bsymbolic" instead of
"-static -pie" and substituting rcrt1.o in place of crt1.o.
all libraries being linked must be built as PIC/PIE; TEXTRELs are not
supported at this time.
this overhaul further reduces the amount of arch-specific code needed
by the dynamic linker and removes a number of assumptions, including:
- that symbolic function references inside libc are bound at link time
via the linker option -Bsymbolic-functions.
- that libc functions used by the dynamic linker do not require
access to data symbols.
- that static/internal function calls and data accesses can be made
without performing any relocations, or that arch-specific startup
code handled any such relocations needed.
removing these assumptions paves the way for allowing libc.so itself
to be built with stack protector (among other things), and is achieved
by a three-stage bootstrap process:
1. relative relocations are processed with a flat function.
2. symbolic relocations are processed with no external calls/data.
3. main program and dependency libs are processed with a
fully-functional libc/ldso.
reduction in arch-specific code is achived through the following:
- crt_arch.h, used for generating crt1.o, now provides the entry point
for the dynamic linker too.
- asm is no longer responsible for skipping the beginning of argv[]
when ldso is invoked as a command.
- the functionality previously provided by __reloc_self for heavily
GOT-dependent RISC archs is now the arch-agnostic stage-1.
- arch-specific relocation type codes are mapped directly as macros
rather than via an inline translation function/switch statement.
This adds complete aarch64 target support including bigendian subarch.
Some of the long double math functions are known to be broken otherwise
interfaces should be fully functional, but at this point consider this
port experimental.
Initial work on this port was done by Sireesh Tripurari and Kevin Bortis.
With the exception of a fenv implementation, the port is fully featured.
The port has been tested in or1ksim, the golden reference functional
simulator for OpenRISC 1000.
It passes all libc-test tests (except the math tests that
requires a fenv implementation).
The port assumes an or1k implementation that has support for
atomic instructions (l.lwa/l.swa).
Although it passes all the libc-test tests, the port is still
in an experimental state, and has yet experienced very little
'real-world' use.
without these, calls may be resolved incorrectly if the calling code
has been compiled to thumb instead of arm. it's not clear to me at
this point whether crt_arch.h is even working if crt1.c is built as
thumb; this needs testing. but the _init and _fini issues were known
to cause crashes in static-linked apps when libc was built as thumb,
and this commit should fix that issue.
the only immediate effect of this commit is enabling PIE support on
some archs that did not previously have any Scrt1.s, since the
existing asm files for crt1 override this C code. so some of the
crt_arch.h files committed are only there for the sake of documenting
what their archs "would do" if they used the new C-based crt1.
the expectation is that new archs should use this new system rather
than using heavy asm for crt1. aside from being easier and less
error-prone, it also ensures that PIE support is available immediately
(since Scrt1.o is generated from the same C source, using -fPIC)
rather than having to be added as an afterthought in the porting
process.
failure to do so was causing crashes on x86_64 when ctors used SSE,
which was first observed when ctors called variadic functions due to
the SSE prologue code inserted into every variadic function.
a while back, gcc switched from using the old _init/_fini fragments
method for calling ctors and dtors on arm to the __init_array and
__fini_array method. unfortunately, on glibc this depends on ugly
hacks involving making libc.so a linker script and pulling parts of
libc into the main program binary. so I cheat a little bit, and just
write asm to iterate over the init/fini arrays from the _init/_fini
asm. the same approach could be used on any arch it's needed on, but
for now arm is the only one.
based on initial work by rdp, with heavy modifications. some features
including threads are untested because qemu app-level emulation seems
to be broken and I do not have a proper system image for testing.
it's naturally aligned when entered with the kernel argv array, but if
ld.so has been invoked explicitly to run a program, the stack will not
be aligned due to having thrown away argv[0].
since .init and .fini are not .text, the toolchain does not seem to
align them for code by default. this yields random breakage depending
on the object sizes the linker is dealing with.
basically, this version of the code was obtained by starting with
rdp's work from his ellcc source tree, adapting it to musl's build
system and coding style, auditing the bits headers for discrepencies
with kernel definitions or glibc/LSB ABI or large file issues, fixing
up incompatibility with the old binutils from aboriginal linux, and
adding some new special cases to deal with the oddities of sigaction
and pipe syscall interfaces on mips.
at present, minimal test programs work, but some interfaces are broken
or missing. threaded programs probably will not link.
lr must be saved because init/fini-section code from the compiler
clobbers it. this was not a problem when i tested without gcc's
crtbegin/crtend files present, but with them, musl on arm fails to
work (infinite loop in _init).
looks like nik copied these "extra arguments" from the i386 code.
they're not actually arguments there, just 1-byte instructions to
make sure the stack is aligned to 16 bytes after all the other
arguments are pushed. since each push is 8 bytes on x86_64, they
happened to have no effect here, but their presence is confusing and a
minor waste of space.
it does not work; after further consideration, a separate Scrt1.s for
pie really is essential. it would be nice if the unified approach
worked, but the linker fails to generate the correct PLT entries and
instead puts textrels in the main program, which don't work because
the kernel maps the text read-only.
new Scrt1.s will be committed soon in place of this.
this is mainly in hopes of supporting c++ (not yet possible for other
reasons) but will also help applications/libraries which use (and more
often, abuse) the gcc __attribute__((__constructor__)) feature in "C"
code.
x86_64 and arm versions of the new startup asm are untested and may
have minor problems.
this port assumes eabi calling conventions, eabi linux syscall
convention, and presence of the kernel helpers at 0xffff0f?0 needed
for threads support. otherwise it makes very few assumptions, and the
code should work even on armv4 without thumb support, as well as on
systems with thumb interworking. the bits headers declare this a
little endian system, but as far as i can tell the code should work
equally well on big endian.
some small details are probably broken; so far, testing has been
limited to qemu/aboriginal linux.