while for some cases attempting retry after server restart works
brilliantly (e.g. firefox), in other cases it's quite disasterous
(sshd doesn't like its file descriptors going missing and does not
attempt to reopen them, leading to a quite catastophic loop of
EBADF once the server does come back)
* rename RUMPHIJACK_RETRY to the slightly more sensible
RUMPHIJACK_RETRYCONNECT
rump tcp/ip stack:
* sshd likes to fork and then re-exec itself
==> trap execve() and augment the env with the current parameters
essential to a rump kernel (kernel communication fd, information
about dup2'd file descriptors)
* sshd likes to play lots of games with pipes, socketpairs and dup{,2}()
==> make sure we do not close essential rump client descriptors:
dup() them to a safe place, except for F_CLOSEM where we
simply leave them alone. also, partially solved by the above,
make sure the process's set of rump kernel descriptors persists
over exec()
* sshd likes to chdir() before exec
==> for unix-style rump_sp(7) sockets save the full path on the
initial exec and use it afterwards. thread the path through
the environment in execve()
TCP/IP stack:
* mutt prepares to exec the smtp client: it forks and closes all
file descriptors
* when the next networking syscall is done, rumpclient detects that
the communication fd returned EBADF and does a reconnect,
gets descriptor 0 for the socket and descriptor 1 for kqueue
* mutt opens the mail file and implicitly assumes it'll get 0-2,
but in fact gets 2-4
* mutt execs the smtp agent which tries to read the mail from
stdin (rumpclient communication socket) and fails
Even if mutt correctly did dup2() things would go south when trying
to communicate with the kernel server the next time, since rumpclient
would actually be talking with some mail body instead (well, it
could work, but in that case you'd need to write *really* weird
mails ;).
Hence, prevent rumpclient from using the special fd's 0-2 for its
purposes.
Should fix mutt problem reported by Alexander Nasonov.
reconnect in case the connection to the server is lost. Default
to exactly one reattempt. This makes sense and additionally fixes
the dev/raidframe/smalldisk test which currently causes a server
panic when a certain raidctl command is run (without this fix the
test would timeout since the client kept attempting to reconnect).
the kernel server is lost, the client will now automatically attempt
to reconnect.
Among other things, this makes it possible to "reboot" and restart
the TCP/IP stack from under firefox without any perceivable less
of service. If pages were loading at the time the TCP/IP server
was killed, there may be some broken links, but nothing a ctrl-r
cannot fix.
* don't hold spc mutex while sending data
* use send() for the banner to avoid SIGPIPE in case a client
connects and immediately goes away
* fix error path locking
* use kevent() instead of pollts() in the client. Apparently that
is the only sensible way for a library to support both multithreading
and signal-reentrancy in a race-free manner.
(can I catch all signals with one kevent instead of installing
NSIG different ones??)
* mark client comm descriptor non-blocking so that clients have
better signal-interruptibility (we now sleep in signal-accepting
kevent() instead of signal-masked recvfrom())
requests which have a 0-length response (such as copyin 0/0).
This change makes links(1) work against a rump kernel which contains
rumpnet_local. The presence of unix domain sockets caused links
to select() with 0 fds and a timeout, and because copyin never woke
up in the kernel the application blocked indefinitely.
dlsym(RTLD_NEXT) to lookup a host_syscall() function pointer which
is used instead of syscall() to communicate with the kernel server.
WARNING: popular opinion classifies this as "ugly code". if you
have a weak heart/mind/soul/sole meuniere, read max. 1 line of the
diff per day, preferably with food.
It's pretty much a placeholder for now. One plan for the future
is to require some sort of authentication for superuser clients.
The code will need a little massage then, though, to prevent DoS
attacks.
basics are there, but a few more tweaks are needed. The reason
I'm committing it now is that the code was mindnumbingly boring to
write (no wonder it took me almost 3 years to get it done), and I
might burn it if it's not in a safe place.