syscalls provided by a rump faction such as rumpvfs when the library
is not linked into the binary, but is dlopen()'d before calling
rump_init().
(it is illegal to dlopen() a faction after rump_init(), but syscalls
maybe be added the usual way with modules)
rump_server(1) -lstuff works now.
desired values in case the components containing the netisr handlers
were not linked in but dlopen()'d before calling rump_init().
(could simplify a little in case static linking is declared dead)
that scheduler locks are special in this regard - adaptive locks cannot
be in the path due to turnstiles. Randomly spotted/reported by uebayasi@.
- Remove unused lwp_relock() and replace lwp_lock_retry() by simplifying
lwp_lock() and sleepq_enter() a little.
- Give alllwp its own cache-line and mark lwp_cache pointer as read-mostly.
OK ad@
lock any uvm objects, check if lockholders are currently on CPU
and yield to try very soon again instead of assuming deadlock.
This makes limited-memory kernels perform the same as memory-unlimited
kernels (provided there is a reasonable amount of memory available).
For example, for large file copy off of ffs where the image is
backed on host memory (i.e. no disk i/o, i.e. ideal conditions)
the figures are, per rump kernel memory limit:
3000kB: same
1000kB: 10% slower
500kB: 50% slower
(per pagedaemon code might still be able to use some tweak, though)
is done in rumpuser for simplicity, since on the kernel side things
we assume we have only one pointer of space). As a side-effect,
we can no longer know if the current thread is holding on to a
mutex locked without curlwp context (basically all mutexes inited
outside of mutex_init()). The only thing that called rumpuser_mutex_held()
for a non-kmutex was the giant lock. So, instead implement recursive
locking for the giant lock in the rump kernel and get rid of the
now-unused recursive pthread mutex in the hypercall interface.
satisfied. This allows the caller to unlock the object and the
pagedaemon to avoid deadlock even if ~all memory is consumed by
one vm object. This in turn makes is possible to copy a large file
into a rump kernel with a 10MB memory limit (where large >> 10MB).
A little more tuning will be required to avoid the pagedaemon
hitting the sleep-and-retry path, though.
+ fix some outdated unrelated comments
Also, add rump_daemonize_begin() / rump_daemonize_end() to help
with the "can't daemon() after pthread_create()" problem. Applications
could accomplish the same, but since it's such a common operation,
provide a little help.
etfs objects must now be registered as absolute paths; however, it is now
possible to access them via relative paths and through symlinks, which
previously worked some times and not others depending on exactly what
namei was doing.
discussed on tech-kern and ok'd by pooka.
Now, let's say you start a rump server and configure a memory disk
on it. Remote (as in TCP remote) clients may now access that
memory.
cloudy, my apps are scattered and they're cloudy
they have no borders, no boundaries
==> add support for remote vmspace vmapbuf/vunmapbuf
==> add proper support for copyin/out_vmspace
==> add support for remote vmspace uvm_io
==> add support for non-curproc rumpuser_sp_copyin/out
==> store remote context in vm_map->pmap instead of
pthread_specificdata
In short, makes read/write of most (all?) block devices work from
a remote rump client via rump syscalls.
and the metadata required to interpret it. Callers of namei must now
create a pathbuf and pass it to NDINIT (instead of a string and a
uio_seg), then destroy the pathbuf after the namei session is
complete.
Update all namei call sites accordingly. Add a pathbuf(9) man page and
update namei(9).
The pathbuf interface also now appears in a couple of related
additional places that were passing string/uio_seg pairs that were
later fed into NDINIT. Update other call sites accordingly.
vm_page *) "reverse" lookup code from uvm_page.h to uvm_page.c, to
help migration to not do that.
Likewise move per-page metadata (struct vm_page *) -> physical
address "forward" conversion code into *.c too. This is called
only low-layer VM and MD code.
do, this effectively allows changing the uid of proc0 without
running into KASSERT problems in uidinfo code (although I'm not
quite so sure changing proc0's uid is the right thing to do ...).
problem reported by njoly
This incarnation is written in the user namespace as opposed to
the previous one which was done in kernel namespace. Also, rump
does all the handshaking now instead of excepting an application
to come up with the user namespace socket.
There's still a lot to do, including making code "a bit" more
robust, actually running different clients in a different process
inside the kernel and splitting the client side library from librump.
I'm committing this now so that I don't lose it, plus it generally
works as long as you don't use it in unexcepted ways: i've tested
ifconfig(8), route(8), envstat(8) and sysctl(8).
Introduce rump_pub_syscall() as the generic interface for making
system calls with already marshalled arguments. So it's kinda like
syscall(2), except it also remembered to breathe instead of having
to figure out how to deal with 64bit values.
"ifconfig create". As previously, virt<n> interfaces with the
host's /dev/tap<n> (I guess it could be made explicit with
"ifconfig media", but leave it this way for now).
pagedaemon. This mimics normal kernel behaviour where pmap_kentered
mappings are not tracked for references. Without this change the
vnode pager's clustering could cause one page to be released by
the pagedaemon, and the rest of the pages in the pageout cluster
made unlikely candidates to be released soon.
1. Fix inverted node order, so that negative value from comparison operator
would represent lower (left) node, and positive - higher (right) node.
2. Add an argument (i.e. "context"), passed to comparison operators.
3. Change rb_tree_insert_node() to return a node - either inserted one or
already existing one.
4. Amend the interface to manipulate the actual object, instead of the
rb_node (in a similar way as Patricia-tree interface does).
5. Update all RB-tree users accordingly.
XXX: Perhaps rename rb.h to rbtree.h, since cleaning-up..
1-3 address the PR/43488 by Jeremy Huddleston.
Passes RB-tree regression tests.
Reviewed by: matt@, christos@
* page out vnode objects
* drain kmem/kernel_map
As long as there is a reasonable memory hardlimit (>600kB or so),
a rump kernel can now survive file system metadata access for an
arbitrary size file system (provided, of course, that the file
system does not use wired kernel memory for metadata ...).
Data handling still needs a little give&take finetuning. The
general problem is that a single vm object can easily be the owner
of all vm pages in a rump kernel. now, if a thread wants to allocate
memory while holding that object locked, there's very little the
pagedaemon can do to avoid deadlock. but I think the problem can
be solved by making an object release a page when it wants to
allocate a page if a) the system is short on memory and b) too many
pages belong to the object. that still doesn't take care of the
pathological situation where 1000 threads hold an object with 1
page of memory locked and try to allocate more. but then again,
running 1000 threads with <1MB of memory is an unlikely scenario.
and ultimately, I call upon the fundamental interaction which is
the basis of why any operating works: luck.
we are short of memory.
There are still some funnies left to iron out. For example, with
a certain file system / memory size configuration it's still not
possible to create enough files to make the file system run out of
inodes before the kernel runs out of memory. Also, with some other
configurations disk access slows down gargantually (though i'm sure
there are >0 buffers available). Anyway, it ~works for now and
it's by no means worse than what it was before.
number currently attached. Deals with a SNAFU in my commit earlier
today which would cause softints established early to lack a
softint context on non-bootstrap CPUs.
structure itself and allocating the backing page directly from the
hypervisor.
* initial write to a large tmpfs file is almost 2x faster
* truncating the file to 0 length after write is over 50% faster
* rewrite of the file is just slightly faster (indicating that
kmem does a good job with caching, as expected)