lock any uvm objects, check if lockholders are currently on CPU
and yield to try very soon again instead of assuming deadlock.
This makes limited-memory kernels perform the same as memory-unlimited
kernels (provided there is a reasonable amount of memory available).
For example, for a large file copy off of ffs where the image is
backed on host memory (i.e. no disk i/o, i.e. ideal conditions)
the figures are, per rump kernel memory limit:
3000kB: same
1000kB: 10% slower
500kB: 50% slower
(the pagedaemon code might still be able to use some tweaking, though)
is done in rumpuser for simplicity, since on the kernel side things
we assume we have only one pointer of space). As a side-effect,
we can no longer know if the current thread is holding on to a
mutex locked without curlwp context (basically all mutexes inited
outside of mutex_init()). The only thing that called rumpuser_mutex_held()
for a non-kmutex was the giant lock. So, instead implement recursive
locking for the giant lock in the rump kernel and get rid of the
now-unused recursive pthread mutex in the hypercall interface.
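
The idea of layering recursion on top of a non-recursive primitive can
be sketched in userspace as below. This is only an illustration, not
the rump kernel code: struct giantlock and the giant_*() functions are
hypothetical names, and a real rump kernel would track the owning lwp
rather than a pthread_t.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical recursive lock built from non-recursive primitives. */
struct giantlock {
	pthread_mutex_t interlock;	/* protects owner and depth */
	pthread_cond_t cv;		/* signalled when depth drops to 0 */
	pthread_t owner;		/* meaningful only while depth > 0 */
	unsigned depth;			/* 0 == unlocked */
};

static void
giant_init(struct giantlock *gl)
{
	pthread_mutex_init(&gl->interlock, NULL);
	pthread_cond_init(&gl->cv, NULL);
	gl->depth = 0;
}

static void
giant_lock(struct giantlock *gl)
{
	pthread_t self = pthread_self();

	pthread_mutex_lock(&gl->interlock);
	if (gl->depth > 0 && pthread_equal(gl->owner, self)) {
		/* Recursive entry: just bump the count. */
		gl->depth++;
	} else {
		/* Wait until no thread holds the lock, then take it. */
		while (gl->depth > 0)
			pthread_cond_wait(&gl->cv, &gl->interlock);
		gl->owner = self;
		gl->depth = 1;
	}
	pthread_mutex_unlock(&gl->interlock);
}

static void
giant_unlock(struct giantlock *gl)
{
	pthread_mutex_lock(&gl->interlock);
	/* Only the owner may unlock. */
	assert(gl->depth > 0 && pthread_equal(gl->owner, pthread_self()));
	if (--gl->depth == 0)
		pthread_cond_signal(&gl->cv);
	pthread_mutex_unlock(&gl->interlock);
}

/* Current recursion depth; 0 means the lock is free. */
static unsigned
giant_depth(struct giantlock *gl)
{
	unsigned d;

	pthread_mutex_lock(&gl->interlock);
	d = gl->depth;
	pthread_mutex_unlock(&gl->interlock);
	return d;
}
```

Because the owner and depth live in the lock itself, the underlying
mutex no longer needs to be recursive or to answer "held by me?"
queries.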
satisfied. This allows the caller to unlock the object and the
pagedaemon to avoid deadlock even if ~all memory is consumed by
one vm object. This in turn makes it possible to copy a large file
into a rump kernel with a 10MB memory limit (where large >> 10MB).
A little more tuning will be required to avoid the pagedaemon
hitting the sleep-and-retry path, though.
+ fix some outdated unrelated comments
Also, add rump_daemonize_begin() / rump_daemonize_end() to help
with the "can't daemon() after pthread_create()" problem. Applications
could accomplish the same, but since it's such a common operation,
provide a little help.
etfs objects must now be registered as absolute paths; however, it is now
possible to access them via relative paths and through symlinks, which
previously worked sometimes and not others depending on exactly what
namei was doing.
discussed on tech-kern and ok'd by pooka.
Now, let's say you start a rump server and configure a memory disk
on it. Remote (as in TCP remote) clients may now access that
memory.
cloudy, my apps are scattered and they're cloudy
they have no borders, no boundaries
==> add support for remote vmspace vmapbuf/vunmapbuf
==> add proper support for copyin/out_vmspace
==> add support for remote vmspace uvm_io
==> add support for non-curproc rumpuser_sp_copyin/out
==> store remote context in vm_map->pmap instead of
pthread_specificdata
In short, makes read/write of most (all?) block devices work from
a remote rump client via rump syscalls.
and the metadata required to interpret it. Callers of namei must now
create a pathbuf and pass it to NDINIT (instead of a string and a
uio_seg), then destroy the pathbuf after the namei session is
complete.
Update all namei call sites accordingly. Add a pathbuf(9) man page and
update namei(9).
The pathbuf interface also now appears in a couple of related
additional places that were passing string/uio_seg pairs that were
later fed into NDINIT. Update other call sites accordingly.
vm_page *) "reverse" lookup code from uvm_page.h to uvm_page.c, to
help migration to not do that.
Likewise move per-page metadata (struct vm_page *) -> physical
address "forward" conversion code into *.c too. This is called
only by low-layer VM and MD code.
do, this effectively allows changing the uid of proc0 without
running into KASSERT problems in uidinfo code (although I'm not
quite so sure changing proc0's uid is the right thing to do ...).
problem reported by njoly
This incarnation is written in the user namespace as opposed to
the previous one which was done in kernel namespace. Also, rump
does all the handshaking now instead of expecting an application
to come up with the user namespace socket.
There's still a lot to do, including making code "a bit" more
robust, actually running different clients in a different process
inside the kernel and splitting the client side library from librump.
I'm committing this now so that I don't lose it, plus it generally
works as long as you don't use it in unexpected ways: I've tested
ifconfig(8), route(8), envstat(8) and sysctl(8).
Introduce rump_pub_syscall() as the generic interface for making
system calls with already marshalled arguments. So it's kinda like
syscall(2), except it also remembered to breathe instead of having
to figure out how to deal with 64bit values.
"ifconfig create". As previously, virt<n> interfaces with the
host's /dev/tap<n> (I guess it could be made explicit with
"ifconfig media", but leave it this way for now).
pagedaemon. This mimics normal kernel behaviour where pmap_kentered
mappings are not tracked for references. Without this change the
vnode pager's clustering could cause one page to be released by
the pagedaemon, and the rest of the pages in the pageout cluster
made unlikely candidates to be released soon.
1. Fix inverted node order, so that negative value from comparison operator
would represent lower (left) node, and positive - higher (right) node.
2. Add an argument (i.e. "context"), passed to comparison operators.
3. Change rb_tree_insert_node() to return a node - either inserted one or
already existing one.
4. Amend the interface to manipulate the actual object, instead of the
rb_node (in a similar way as Patricia-tree interface does).
5. Update all RB-tree users accordingly.
XXX: Perhaps rename rb.h to rbtree.h, since cleaning-up..
1-3 address the PR/43488 by Jeremy Huddleston.
Passes RB-tree regression tests.
Reviewed by: matt@, christos@
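
The conventions from points 1-4 can be illustrated with the sketch
below. It is deliberately NOT the rbtree(3) interface: it is an
unbalanced binary search tree with hypothetical names (struct item,
tree_insert(), cmp_key()), and it only shows the comparison order, the
context argument, object-based manipulation and the "return the
existing object" behaviour of insert.

```c
#include <assert.h>
#include <stddef.h>

/* The caller's object; tree linkage is embedded in it. */
struct item {
	int key;
	struct item *left, *right;
};

/* Comparator gets an opaque context; negative => a goes left of b. */
typedef int (*cmp_fn)(void *ctx, const struct item *a,
    const struct item *b);

struct tree {
	struct item *root;
	cmp_fn cmp;
	void *ctx;
};

/* Example comparator; the context is used to count comparisons. */
static int
cmp_key(void *ctx, const struct item *a, const struct item *b)
{
	int *ncalls = ctx;

	(*ncalls)++;
	return (a->key > b->key) - (a->key < b->key);
}

/*
 * Insert an object.  Returns the inserted object, or the object
 * already in the tree if one compares equal (nothing is inserted).
 */
static struct item *
tree_insert(struct tree *t, struct item *n)
{
	struct item **pp = &t->root;

	assert(n != NULL);
	while (*pp != NULL) {
		int c = t->cmp(t->ctx, n, *pp);

		if (c == 0)
			return *pp;		/* already present */
		pp = (c < 0) ? &(*pp)->left : &(*pp)->right;
	}
	n->left = n->right = NULL;
	*pp = n;
	return n;
}
```

A caller can detect a collision by comparing the return value against
the object it passed in, without doing a separate lookup first.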
* page out vnode objects
* drain kmem/kernel_map
As long as there is a reasonable memory hardlimit (>600kB or so),
a rump kernel can now survive file system metadata access for an
arbitrary size file system (provided, of course, that the file
system does not use wired kernel memory for metadata ...).
Data handling still needs a little give&take finetuning. The
general problem is that a single vm object can easily be the owner
of all vm pages in a rump kernel. Now, if a thread wants to allocate
memory while holding that object locked, there's very little the
pagedaemon can do to avoid deadlock. But I think the problem can
be solved by making an object release a page when it wants to
allocate a page if a) the system is short on memory and b) too many
pages belong to the object. That still doesn't take care of the
pathological situation where 1000 threads hold an object with 1
page of memory locked and try to allocate more. But then again,
running 1000 threads with <1MB of memory is an unlikely scenario.
And ultimately, I call upon the fundamental interaction which is
the basis of why any operating system works: luck.
we are short of memory.
There are still some funnies left to iron out. For example, with
a certain file system / memory size configuration it's still not
possible to create enough files to make the file system run out of
inodes before the kernel runs out of memory. Also, with some other
configurations disk access slows down gargantually (though I'm sure
there are >0 buffers available). Anyway, it ~works for now and
it's by no means worse than what it was before.
number currently attached. Deals with a SNAFU in my commit earlier
today which would cause softints established early to lack a
softint context on non-bootstrap CPUs.
structure itself and allocating the backing page directly from the
hypervisor.
* initial write to a large tmpfs file is almost 2x faster
* truncating the file to 0 length after write is over 50% faster
* rewrite of the file is just slightly faster (indicating that
kmem does a good job with caching, as expected)
rump. These move the management of the pid/lwpid space from the
application into the kernel, make code more robust, and make it
possible to attach multiple lwp's to non-proc0 processes.
some idea of how they should be done. This change essentially
moves the responsibility of pid/lwpid management from the application
side into the rump kernel. It also introduces clear rules on what
happens when, i.e. introduces semantics (these semantics will be
documented on the man page, and more importantly in atf tests).
our SCSIPI driver stack. Currently we pretend to be a single CD
controller with an optional host file as the image, but I guess
the sky's the limit.
dmesg porn:
NetBSD 5.99.39 (RUMP-ROAST) #0: Mon Aug 23 11:38:16 CEST 2010
pooka@pain-rustique.localhost:/usr/allsrc/src/sys/rump/librump/rumpkern
total memory = unlimited (host limit)
timecounter: Timecounters tick every 10.000 msec
timecounter: Timecounter "rumpclk" frequency 100 Hz quality 0
root file system type: rumpfs
mainbus0 (root)
scsitest0 at mainbus0
scsibus0 at scsitest0: 2 targets, 1 lun per target
cd0 at scsibus0 target 1 lun 0: <RUMPHOBO, It's a LIE, 0.00> cdrom removable
advance the "first" pointer. This problem triggered only if the
bus was filled in the first round, since the first pointer is at
the end-of-bus only for the bootstrap round.
a CPU. This fixes some heavy-load problems with the pool code when
rump kernels essentially lied and caused the pool code not to do
a proper backdown from the fastpath when a context switch happened
when taking a lock.
because they aren't softint threads. This fixes callouts in
situations where there is nothing else happening in the rump kernel
(i.e. no threads executed which would trigger the softints when
they unschedule).
this more closely when he wakes up. Normally I wouldn't be in such
a huge rush, but due to atf bug #53 the whole test run breaks now.
At least with the KASSERT removed all tests pass again.
in genfs_do_putpages() and uao_put().
Use 'v_uobj.uo_npages' to check for an empty memq.
Put some assertions where these marker pages may not appear.
Ok: YAMAMOTO Takashi <yamt@netbsd.org>
still at the original value and not the schedstate one. This makes
select not miss wakeups in cases where there was a lot of selecting
going on (which is not all that common in a rump kernel).
that we can attach a power management handler. The handler prevents
a suspend if the watchdog is active, to be consistent with other
watchdog drivers.
As discussed on tech-kern.
with rumpfs_puffs for prehistoric reasons which are no longer valid
(namely, only fs components existed back then and there was no /dev
support in rump fs namespace).
accounting. Use wired memory (which can be limited) for meta-data, and
kmem(9) for string allocations.
Close PR/31944. Fix PR/38361 while here. OK ad@.
our window to it. This fixes cases like opening a window at offsets
[8,32] to a file, which would cause host file offset [0,32-8] to
be mapped, i.e. [0,16] inside the window. Obviously, access to
the entire in-window [0,24] range should have been mapped (and
after this fix it is).
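
The offset arithmetic can be written out as follows. This is a sketch
with hypothetical names (struct window, window_len(),
window_to_host()), not the actual rump code, and it assumes the end
offset is an exclusive bound.

```c
#include <assert.h>
#include <stdint.h>

/* A window onto a host file: host offsets [start, end). */
struct window {
	uint64_t start;
	uint64_t end;
};

/* Number of bytes visible through the window. */
static uint64_t
window_len(const struct window *w)
{
	assert(w->end >= w->start);
	return w->end - w->start;
}

/*
 * Translate an in-window offset to a host file offset.
 * Returns 0 on success, -1 if the offset is outside the window.
 * The window start must be added; mapping from host offset 0
 * would expose the wrong part of the file.
 */
static int
window_to_host(const struct window *w, uint64_t off, uint64_t *hostoff)
{
	if (off >= window_len(w))
		return -1;
	*hostoff = w->start + off;
	return 0;
}
```

For the window in the example above (start 8, end 32) the length is
24, and in-window offset 0 corresponds to host offset 8 rather than 0.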
kernel ABI (i.e. not i386 or amd64). Due to the "half function,
half macro, all noodles" nature of pmap.h, it's too entangling and
too brittle to keep up with an ifdeffy MI implementation.
the rump kernel by specifying RUMP_MEMLIMIT. In case allocation
over that limit is attempted, essentially pool reclaim and uvm_wait()
are done. The default is to allow allocating as much as the host
will give.
XXX: uvm_km_alloc and malloc(9) do not currently conform. The
former is easy, the latter requires kern_malloc.c (rump malloc is
currently directly relegated to host malloc).
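
The limit check can be sketched as pure accounting, as below. All
names here (struct memacct, mem_reclaim(), mem_reserve()) are
hypothetical and do not come from the rump sources. The sketch also
simplifies one thing: where the real code would sleep in uvm_wait()
and retry, it simply reports failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical memory accounting state. */
struct memacct {
	size_t limit;	/* hard limit in bytes, 0 == unlimited */
	size_t used;	/* bytes currently accounted for */
	size_t cached;	/* part of 'used' that reclaim can give back */
};

/* Stand-in for pool reclaim: drop everything that is only cached. */
static void
mem_reclaim(struct memacct *ma)
{
	assert(ma->cached <= ma->used);
	ma->used -= ma->cached;
	ma->cached = 0;
}

/*
 * Account for an allocation of len bytes.  Returns false if the
 * request does not fit under the limit even after reclaim; a real
 * kernel would wait for memory and retry at that point.
 */
static bool
mem_reserve(struct memacct *ma, size_t len)
{
	if (ma->limit == 0) {
		/* Default: no limit, take what the host gives. */
		ma->used += len;
		return true;
	}
	assert(ma->used <= ma->limit);
	if (len > ma->limit - ma->used)
		mem_reclaim(ma);
	if (len > ma->limit - ma->used)
		return false;
	ma->used += len;
	return true;
}
```

Reclaim is attempted only when a request would cross the limit, so
the common case stays a single comparison.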