the functions are aligned to a 32-bit boundary; otherwise some pretty colorful lossage can result.
bytes in instead of at the start, to leave room for a .cpload to store the gp at offset 0 in the frame. Allow 8 bytes for each (for mips64 one day...). .cpload overwrite problems noted by Michael Hitch.
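
The layout described above can be sketched in C as plain offset arithmetic. This is only an illustration under stated assumptions: the names GP_SLOT_OFFSET, SLOT_SIZE, saved_reg_offset and gp_slot_is_clear are hypothetical and do not come from the original source, and the start of the register save area is left as a caller-supplied parameter rather than a fixed figure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration; not taken from the original source. */
enum {
    GP_SLOT_OFFSET = 0, /* gp is stored at offset 0 in the frame */
    SLOT_SIZE      = 8  /* 8 bytes per slot, leaving room for 64-bit registers */
};

/* Nonzero if a save area starting at save_area_start leaves the gp slot
 * at offset 0 untouched. */
static int gp_slot_is_clear(size_t save_area_start)
{
    return save_area_start >= (size_t)GP_SLOT_OFFSET + (size_t)SLOT_SIZE;
}

/* Frame offset of the index-th saved register (index counts from 0).
 * save_area_start is supplied by the caller; no particular value is
 * assumed here. */
static size_t saved_reg_offset(size_t save_area_start, size_t index)
{
    assert(gp_slot_is_clear(save_area_start));
    return save_area_start + index * (size_t)SLOT_SIZE;
}
```

With a placeholder start of 16 bytes, chosen only for the example, saved_reg_offset(16, 0) is 16 and saved_reg_offset(16, 3) is 40, while a start of 0 would collide with the gp slot and fail the gp_slot_is_clear check.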