no need to save the stack pointer. Just push space for the cleanup and obj_main pointers before calling _rtld(), and pop that space again after loading those pointers into the appropriate argument registers for the program entry point.
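
As a rough illustration only, the sequence can be modelled in C. This is a sketch under assumptions, not the actual startup code: the real thing is architecture-specific assembly, and every name here (mock_rtld, mock_entry, start_sketch, struct obj_entry) is a hypothetical stand-in. The inner block plays the role of the pushed stack space, and leaving it plays the role of the pop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the real rtld types. */
typedef void (*cleanup_fn)(void);
struct obj_entry { int id; };
typedef int (*entry_fn)(cleanup_fn cleanup, struct obj_entry *obj_main);

static struct obj_entry main_object = { 42 };

static void mock_cleanup(void) { }

/* Mock program entry point: checks that it received both pointers. */
static int mock_entry(cleanup_fn cleanup, struct obj_entry *obj_main)
{
    if (cleanup != mock_cleanup || obj_main != &main_object)
        return -1;
    return obj_main->id;
}

/* Mock loader call: fills the two caller-provided slots and returns
 * the address of the program entry point. */
static entry_fn mock_rtld(void *sp, cleanup_fn *cleanup_out,
                          struct obj_entry **obj_out)
{
    (void)sp; /* the sketch does not inspect the initial stack */
    *cleanup_out = mock_cleanup;
    *obj_out = &main_object;
    return mock_entry;
}

int start_sketch(void *initial_sp)
{
    cleanup_fn arg0;         /* models the first argument register */
    struct obj_entry *arg1;  /* models the second argument register */
    entry_fn entry;

    {
        /* "Push": reserve two pointer-sized slots for the results. */
        cleanup_fn cleanup_slot = NULL;
        struct obj_entry *obj_main_slot = NULL;

        entry = mock_rtld(initial_sp, &cleanup_slot, &obj_main_slot);
        assert(cleanup_slot != NULL && obj_main_slot != NULL);

        /* Load the results into the "argument registers". */
        arg0 = cleanup_slot;
        arg1 = obj_main_slot;
    } /* "Pop": the slots go away; no saved stack pointer is needed. */

    /* Transfer control to the program entry point. */
    return entry(arg0, arg1);
}
```

In the real code the transfer to the entry point is a jump that never returns; the return value here exists only so the sketch can be checked.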