was developed as part of Google's Summer of Code 2005 program. This
change adds the kernel code, the mount_tmpfs utility, a regression test
suite and does all other related changes to integrate these.
The file-system is still *experimental*. Therefore, it is disabled by
default in all kernels. However, as typically done, a commented-out
entry is added in them to ease its setup.
Note that I haven't commited the required mountd(8) changes to be able
to export tmpfs file-systems because NFS support is still very unstable
and because, before enabling it, I'd like to do some other changes.
OK'ed by my project mentor, William Studenmund (wrstuden@).
use from checking the proc's uid to suser(9), and account for the use of
privileges. Noted by David Holland in PR kern/31126.
Also, dispose of a redundant instance of that check.
explicitely by a plain integer array
the length in now known to all relevant parties, so this avoids
duplication of information, and we can allocate that thing in
drivers without hacks
IPIs in the MULTIPROCESSOR case. Adjust all callers.
- In fpusave_cpu(), block IPIs for the entire duration (while we have
CPUF_FPUSAVE set in ci_flags) to fix the deadlock that leads to
"panic: fpsave ipi didn't", as described in PR port-alpha/26383.
Many thanks to Michael Hitch for the analysis and initial patch which
this one is derived from.
it in pmap_create(), and freeing the lev1map in pmap_destroy(). This
means that pm_lev1map is consistent for the life of the pmap.
2. pmap_extract() now uses vtophys() for the kernel pmap. This avoids
having to lock the kernel pmap, since kernel PT pages are never freed.
3. Because of (1), pmap_asn_alloc() no longer needs to operate on a locked
pmap; pm_lev1map will never change over the life of the pmap, and all
other access to the pmap is done in per-CPU fields or with atomic
operations.
4. Because of (3), pmap_activate() no longer needs to lock the pmap
to do its work, thus eliminating the deadlock with sched_lock described
in PR port-alpha/25599. This is safe because we are guaranteed that
the pmap is still alive, since by definition an LWP that uses that it
is about to run.
Thanks to Michael Hitch for the analysis, and Michael and Ragge for testing.
and install ${TOOLDIR}/bin/${MACHINE_GNU_PLATFORM}-disklabel,
${TOOLDIR}/bin/${MACHINE_GNU_PLATFORM}-fdisk by "reaching over" to
the sources in ${NETBSDSRCDIR}/sbin/{disklabel fdisk}/.
To avoid clashes with a build-host's header files, especially on
*BSD, the host-tools versions of fdisk and disklabel search for
#includes such as disklabel.h, disklabel_acorn.h, disklabel_gpt.h,
and bootinfo.h in a new #includes namespace, nbinclude/. That is,
they #include <nbinclude/sys/disklabel.h>, <nbinclude/machine/disklabel.h>,
<nbinclude/sparc64/disklabel.h>, instead of <sys/disklabel.h> and
such. I have also updated the system headers to #include from
nbinclude/-space when HAVE_NBTOOL_CONFIG_H is #defined.
i/o is done. Instead, pass an opaque cookie which is then passed to a
new routine, coredump_write, which does the actual i/o. This allows the
method of doing i/o to change without affecting any future MD code.
Also, make netbsd32_core.c [re]use core_netbsd.c (in a similar manner that
core_elf64.c uses core_elf32.c) and eliminate that code duplication.
cpu_coredump{,32} is now called twice, first with a NULL iocookie to fill
the core structure and a second to actually write md parts of the coredump.
All i/o is nolonger random access and is suitable for shipping over a stream.
(still commented out)
- Add (also commented out) options BUFQ_PRIOCSCAN.
Suggested by perry and soda on tech-kern.
Please refer options(4) for details for these options.
where the printing of `version' is already performed.
This has the benefit of allowing the copyright to be available
via dmesg(8) on platforms which need the `msgbuf' to be setup
in cpu_startup() before printed output is remembered.
distinction between signalling NaNs and quiet NaNs back into the
machine-dependent headers; treat the implementation of __nanf in the
same spirit.
IEEE 754 leaves the distinction between signalling NaNs and quiet NANs
to the implementation, and unlike our headers used to suggest they're
not identical in the interpretation of the fraction's MSb; in due
course, make those of hppa, mips, sh3, and sh5 reflect reality.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.
interrupt counter for a given shared interrupt descriptor.
- When an interrupt is successfully handled, reset the strays counter,
thus preventing a "slow leak" from eventually shutting off the interrupt
vector. Idea taken from pci_kn300.c (which was changed to use the new
alpha_shared_intr_reset_strays() function).
to select the maximum segment size for each bus_dmamap_load (up to the maxsegsz
supplied to bus_dmamap_create). dm_maxsegsz is reset to the value supplied to
bus_dmamap_create when the dmamap is unloaded.
- Ffs internal snapshots get compiled in unconditionally.
- File system snapshot device fss(4) added to all kernel configs that
have a disk. Device is commented out on all non-GENERIC kernels.
Reviewed by: Jason Thorpe <thorpej@netbsd.org>
All those kernels have a line for both tun and bridge, and if either is
commented out, tap is commented out also. With the exception of i386's
GENERIC_TINY.
XXX: we _need_ some way of making this more simple.
current linker script, change references for any .eh_frame sections so
that they go in to the special "DISCARD" section so that they are not
included in the final object, and use the resultant linker script when
actually linking the bootblocks.
Idea (and most of the sed expression) from Jakub Jelinek.
1) Fix small indenting issues
2) Removal of audio* at ... and midi* at ... entries and replace all them with
audio* at audiobus?
midi* at minibus?
3) Adding of USB audio/midi and IrDa entries
to avoid references to EISA variables if "pceb" is not defined in
kernel configurations, and save some bytes
tested by Havard Eidnes (or his colleague:-)
which bustype should be attached with a specific call to config_found()
(from a "mainbus" or a bus bridge).
Do it for isa/eisa/mca and pci/agp for now. These buses all attach to
an mi interface attribute "isabus", "eisabus" etc., and the autoconf
framework now allows to specify an interface attribute on config_found()
and config_search(), which limits the search of matching config data
to these which attach to that specific attribute.
So we basically have to call config_found_ia(..., "foobus", ...) where
such a bus is attached.
As a consequence, where a "mainbus" or alike also attaches other
devices (eg CPUs) which do not attach to a specific attribute yet,
we need at least pass an attribute name (different from "foobus") so
that the foo bus is not found at these places. This made some minor
changes necessary which are not obviously related to the mentioned buses.
the other CPUs resume execution. If the halt was initiated from DDB, the
CPUs will not have been resume execution and will never halt. If one of the
paused CPUs is the primary CPU, the system will hang. Fixes PR#26159.
non-specific EOIs. This really shouldn't matter so much, but I'm guessing
there's a strange interaction with the PALcode (possibly related to the fact
that the PALcode itself may be sending an EOI itself on some systems).
I have tested this on a Multia, and it appears to work just fine without the
INITIALLY_{ENABLED,LEVEL_TRIGGERED}() stuff now, so I'm also removing that.
to all GENERIC-like kernel config files where SYSV* options were already
present (commented out if the SYSV* options are commented out).
Fix lib/25897 and lib/25898.
* On systems where it's relevant (6600, kn300, kn8ae), check which PCI bus
hierarchy we're looking at.
* Consistently check PCI function numbers (important for quad-channel ATA
controllers).
* Check the channel number (important for ATA and multi-channel SCSI
controllers).
* Check logical unit numbers.
* Use strcasecmp() to check b->protocol; the case is different depending on
the firmware revision.
* Remove b->boot_dev_type check; it's unneeded.
* Remove extraneous unneeded variables.
Not only does this fix a few PRs and multiple long-standing bugs, but the
code is smaller too.
PR 9432
PR 12225
PR 25830
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP
drivers that attach to it. This allows for other host interface chips
that use the same keyboards and mice, such as the ones in the ARM
IOMD20, ARM7500, and SA-1111. The PC-compatible driver is still
called pckbc(4), and the new abstraction layer is "pckbport", so the
child devices have moved from sys/dev/pckbc to sys/dev/pckbport, which
also contains some code shared between all host controllers. To avoid
incompatibility, pckbdreg.h is still installed in
/usr/include/dev/pckbc.
In theory, this shouldn't cause any behavioural changes in the drivers
concerned. Thy just use rather more function pointers than before. Tested
on i386 and (with a new host driver) acorn32. Compiled on several other
affected architectures.
to only call pckbc_machdep_cnattach() if this is present. This allows
pckbc_machdep_cnattach() to be omitted entirely on most ports, where it only
returns ENXIO anyway.
The devices with this attribute at the moment are pc(4) on i386 and bebox, and
pckbc on sparc, where pckbc_machdep_cnattach() mysteriously returns 0 rather
than ENXIO.
sp, 64(sp)") on the theory that they're from the function epilogue,
which aggressive code motion has placed before the end of the
function's code.
Addresses my PR port-alpha/23996.
* lpt device is defined in MI place (dev/ppbus/files.ppbus), dev/ic/lpt.c
is included there too; dev/ic/lpt.c is not included if ppbus is
configured or if there is alternative platform lpt (like for pc532)
* g/c MD lpt definitions and custom puc/upc attachments,
glue moved to conf/files and dev/pci/files.pci respectively; remove
device lpt definition from dev/isa/files.isa
* add ppbus parport attribute, atppc device attachments, adjust plip and lpt
glue
files for machines I know to have genuine PCI slots. As sent to tech-kern
for feedback in December 2003. Based on feedback, opencrypto is commented
out in the macppc GENERIC (due to absense of GENERIC_SOFTINT support),
and added to the sparc64 config (sys/arch/sparc64/conf/GENERIC32).
process context ('reaper').
From within the exiting process context:
* deactivate pmap and free vmspace while we can still block
* introduce MD cpu_lwp_free() - this cleans all MD-specific context (such
as FPU state), and is the last potentially blocking operation;
all of cpu_wait(), and most of cpu_exit(), is now folded into cpu_lwp_free()
* process is now immediatelly marked as zombie and made available for pickup
by parent; the remaining last lwp continues the exit as fully detached
* MI (rather than MD) code bumps uvmexp.swtch, cpu_exit() is now same
for both 'process' and 'lwp' exit
uvm_lwp_exit() is modified to never block; the u-area memory is now
always just linked to the list of available u-areas. Introduce (blocking)
uvm_uarea_drain(), which is called to release the excessive u-area memory;
this is called by parent within wait4(), or by pagedaemon on memory shortage.
uvm_uarea_free() is now private function within uvm_glue.c.
MD process/lwp exit code now always calls lwp_exit2() immediatelly after
switching away from the exiting lwp.
g/c now unneeded routines and variables, including the reaper kernel thread
virtual memory reservation and a private pool of memory pages -- by a scheme
based on memory pools.
This allows better utilization of memory because buffers can now be allocated
with a granularity finer than the system's native page size (useful for
filesystems with e.g. 1k or 2k fragment sizes). It also avoids fragmentation
of virtual to physical memory mappings (due to the former fixed virtual
address reservation) resulting in better utilization of MMU resources on some
platforms. Finally, the scheme is more flexible by allowing run-time decisions
on the amount of memory to be used for buffers.
On the other hand, the effectiveness of the LRU queue for buffer recycling
may be somewhat reduced compared to the traditional method since, due to the
nature of the pool based memory allocation, the actual least recently used
buffer may release its memory to a pool different from the one needed by a
newly allocated buffer. However, this effect will kick in only if the
system is under memory pressure.
Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.
Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.
All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.
PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
copyin() or copyout().
uvm_useracc() tells us whether the mapping permissions allow access to
the desired part of an address space, and many callers assume that
this is the same as knowing whether an attempt to access that part of
the address space will succeed. however, access to user space can
fail for reasons other than insufficient permission, most notably that
paging in any non-resident data can fail due to i/o errors. most of
the callers of uvm_useracc() make the above incorrect assumption. the
rest are all misguided optimizations, which optimize for the case
where an operation will fail. we'd rather optimize for operations
succeeding, in which case we should just attempt the access and handle
failures due to insufficient permissions the same way we handle i/o
errors. since there appear to be no good uses of uvm_useracc(), we'll
just remove it.
Remove p_raslock and rename p_lwplock p_lock (one lock is enough).
Simplify window test when adding a ras and correct test on VM_MAXUSER_ADDRESS.
Avoid unpredictable branch in i386 locore.S
(pad fields left in struct proc to avoid kernel bump)
containing signal posting, kernel-exit handling and sa_upcall processing.
XXX the pc532, sparc, sparc64 and vax ports should have their
XXX userret() code rearranged to use this.
uvm_swapout_threads will swapout LWPs which are running on another CPU:
- uvm_swapout_threads considers LWPs running on another CPU for swapout
if their l_swtime is high
- uvm_swapout_threads considers LWPs on the runqueue for swapout if their
l_swtime is high but these LWPs might be running by the time uvm_swapout
is called
symptoms of failure: panic in setrunqueue
fixes PR kern/23095