is done in rumpuser for simplicity, since on the kernel side things
we assume we have only one pointer of space). As a side-effect,
we can no longer know if the current thread is holding on to a
mutex locked without curlwp context (basically all mutexes inited
outside of mutex_init()). The only thing that called rumpuser_mutex_held()
for a non-kmutex was the giant lock. So, instead implement recursive
locking for the giant lock in the rump kernel and get rid of the
now-unused recursive pthread mutex in the hypercall interface.
Also, add rump_daemonize_begin() / rump_daemonize_end() to help
with the "can't daemon() after pthread_create()" problem. Applications
could accomplish the same, but since it's such a common operation,
provide a little help.
limits. This improves syscall throughput about 2x for non-userio
syscalls (no copyin/out, e.g. getpid()) and almost 1.5x even for
things like __sysctl().
(measured for cases where the remote process is on the local machine)
XXX: if the pthread deadqueue sucks for anything which cares about
performance, why does it exist? Nuking it would make supporting
variable stack size easier.
==> add support for remote vmspace vmapbuf/vunmapbuf
==> add proper support for copyin/out_vmspace
==> add support for remote vmspace uvm_io
==> add support for non-curproc rumpuser_sp_copyin/out
==> store remote context in vm_map->pmap instead of
pthread_specificdata
In short, makes read/write of most (all?) block devices work from
a remote rump client via rump syscalls.
basics are there, but a few more tweaks are needed. The reason
I'm committing it now is that the code was mindnumbingly boring to
write (no wonder it took me almost 3 years to get it done), and I
might burn it if it's not in a safe place.
This incarnation is written in the user namespace as opposed to
the previous one which was done in kernel namespace. Also, rump
does all the handshaking now instead of excepting an application
to come up with the user namespace socket.
There's still a lot to do, including making code "a bit" more
robust, actually running different clients in a different process
inside the kernel and splitting the client side library from librump.
I'm committing this now so that I don't lose it, plus it generally
works as long as you don't use it in unexcepted ways: i've tested
ifconfig(8), route(8), envstat(8) and sysctl(8).
interlock. This is applicable in cases where the actual interlock
is the CPU the currently running thread is scheduled on. Borrowing
the scheduler lock as the mutex mandated by pthread_cond_wait()
does away with need to have an additional mutex. This both optimizes
runtime execution and simplifies code, as the extra lock typically
lead to quite some trickeries to avoid the dungeon collapsing due
to zaps from the wand of deadlock.