doing a cpu_set_kpc(), just pass the entry point and argument all
the way down the fork path starting with fork1(). In order to
avoid special-casing the normal fork in every cpu_fork(), MI code
passes down child_return() and the child process pointer explicitly.
This fixes a race condition on multiprocessor systems; a CPU could
grab the newly created processes (which has been placed on a run queue)
before cpu_set_kpc() would be performed.