The common code to move a returned structure packed into
registers into memory on the caller side didn't take the
register size into account when allocating local storage,
so sometimes that lead to stack overwrites (e.g. in 73_arm64.c),
on x86_64. This fixes it by generally making gfunc_sret also return
the register size.
__clear_cache is defined in lib-arm64.c with a single call to
__arm64_clear_cache, which is the real built-in function and is
turned into inline assembler by gen_clear_cache in arm64-gen.c