as we don't have a process context to authorize on. Instead, traverse the
file descriptor table of each process -- as we already do in one case.
Introduce a "marker" we can use to mark files we've seen in an iteration, as
the same file can be referenced more than once.
Hopefully this availability of filtering by process also makes life easier
for those who are interested in implementing process "containers" etc.
read/write/accept, then the expectation is that the blocked thread will
exit and the close complete.
Since only one fd is affected, but many fd can refer to the same file,
the close code can only request the fs code unblock with ERESTART.
Fixed for pipes and sockets, ERESTART will only be generated after such
a close - so there should be no change for other programs.
Also rename fo_abort() to fo_restart() (this used to be fo_drain()).
Fixes PR/26567
Allocate ksiginfo on stack since it is safe and sigget() assumes that it is
not allocated from pool (pending signals via sigput()/sigget() "mill" should
be dynamically allocated, however). Might be useful to revisit later.
Likely the cause of PR/40750 and indirect cause of PR/39283.
collecting garbage in two phases: in the first stage, with
alldevs_mtx held, gather all of the objects to be freed onto a
list. Drop alldevs_mtx, and in the second stage, free all the
collected objects.
Also per rmind@'s suggestion, remove KASSERT(!mutex_owned(&alldevs_mtx))
throughout, it is not useful.
Find a free unit number and allocate it for a new device_t atomically.
Before, two threads would sometimes find the same free unit number
and race to allocate it. The loser panicked. Now there is no
race.
In support of the changes above, extract some new subroutines that
are private to this module: config_unit_nextfree(), config_unit_alloc(),
config_devfree(), config_dump_garbage().
Delete all of the #ifdef __BROKEN_CONFIG_UNIT_USAGE code. Only
the sun3 port still depends on __BROKEN_CONFIG_UNIT_USAGE, it's
not hard for the port to do without, and port-sun3@ had fair warning
that it was going away (>1 week, or a few years' warning, depending
how far back you look!).
Only sleep once within each pipe_read/pipe_write call.
If there is no data/space available after we wakeup return ERESTART so
then the 'fd' number is validated again.
A simple broadcast of the cvs is then enough to evict the correct threads
when close() is called from an active thread.
Only one fd needs clobbering, not all fds that reference the pipe.
This may be what ad@ realised when he tried to add the same code to
sockets. Unfixes part of PR/26567.
For each syscall, add a flag for the return value or an argument indicating
that it is a 64-bit argument. Also include the number of 64-bit arguments.
In theory this could get most of the code in compat/netbsd32/netbsd32_netbsd.c
but not at the moment due to multiply defined structures.
open files is rather gross - the poll map isn't required to be dense.
Instead limit to a much larger value (1000 + dt_nfiles) so that user
programs cannot allocate indefinite sized blocks of kvm.
If the limit is exceeded, then return EINVAL instead of silently truncating
the list.
(The silent truncation in select isn't quite as bad - although even there
any high bits that are set ought to generate an EBADF response.)
Move the code that converts ERESTART and EWOULDBLOCK into common code.
Effectively fixes PR/17507 since the new limit is unlikely to be detected.
struct timeval passed to todr_gettime(9) and todr_settime(9).
We no longer have an ancient and volatile struct timeval `time'
global since we have switched to MI timercounter(9) on all port.
XXX1: some of these RTC drivers still assume 32bit time_t
XXX2: some of these should be rewritten to use todr_[gs]ettime_ymdhms()
XXX3: todr(9) man page doesn't mention todr_[gs]ettime_ymdhms()
check the signal number to be in the allowed range. An invalid
signal number could crash the kernel by overflowing the sigset_t
array.
More checks would be good, and SIGEV_THREAD shouldn't be dropped
silently, but this fixes at least the local DOS vulnerability.
-an invalid signal number passed to mq_notify(2) could crash the kernel
on delivery -- add a boundary check
-mq_receive(2) from an empty queue crashed the kernel by NULL dereference
in timeout calculation -- handle the NULL case
-likewise for mq_send(2) to a full queue
-a user could set mq_maxmsg (the maximal number of messages in a queue)
to a huge value on mq_open(O_CREAT) and later use up all kernel
memory by mq_send(2) -- add a sysctl'able limit which defaults
to 16*mq_def_maxmsg
(mq_notify(2) should get some more checks, and SIGEV_* values other
than SIGEV_SIGNAL should be handled somehow, but this doesn't look
security critical)
do drain' in many places, whereas fo_drain() was called in order to force
blocking read()/write() etc calls to return to userspace so that a close()
call from a different thread can complete.
In the sockets code comment out the broken code in the inner function,
it was being called from compat code.
illegal. I examined all places where lbolt is referenced to make
sure there were pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.
can sleep on the vnode lock, while vget is sleeping on the
VI_INACTNOW flag (or the vget caller is looping on vget returning failure
because of the VI_INACTNOW flag). With layered FSes, the upper and lower
vnodes share the same lock, so the vget() caller above can be already
holding the vnode lock.
Fix by dropping VI_INACTNOW before sleeping on the vnode lock in
vrelel(), and check the ref count again once we have the lock. If the
vnode has more than one reference, donc VOP_INACTIVE it.
Fix PR kern/42318 and PR kern/42377
patch tested by Hisashi T Fujinaka, Joachim König, Stephen Borrill and
Matthias Scheler.
Part 1 of N:
There is not an MI ordering of interrupt priority levels,
so use == IPL_HIGH and != IPL_HIGH instead of >= IPL_HIGH
and < IPL_HIGH. Ignore 'cold' and always use curcpu(),
since cpu_info_primary is MD.
Other changes:
There is no need to create symbols named _spldebug_* and
strong aliases to them. Just use symbols spldebug_*,
instead. Use a temporary variable instead of repeat
cpu_index(9) calls. KASSERT() that cpu_index(9) is <
MAXCPUS.
This allows use of subr_disk_mbr on all archs. Default to it for
the rump disk component. No functional change for regular kernels.
(The other option would've been to include dkbad in disklabels
everywhere, but arguably this approach has less possible side-effects,
especially given that wedges and related magic will take over the
world any second now).
support, i.e. move vfs functionality to a separate module
(kern_module_vfs.c)
* make module proplist size an MI constant (now 8k) instead of PAGE_SIZE
* change some error values to something else than the karmic EINVAL
I applied this patch with Coccinelle's semantic patch tool, spatch(1).
I installed Coccinelle from pkgsrc: devel/coccinelle/. I wrote
tailq.spatch and kdefs.h (see below) and ran this command,
spatch -debug -macro_file_builtins ./kdefs.h -outplace \
-sp_file sys/kern/tailq.spatch sys/kern/subr_autoconf.c
which wrote the transformed source file to /tmp/subr_autoconf.c. Then I
used indent(1) to fix the indentation.
::::::::::::::::::::
::: tailq.spatch :::
::::::::::::::::::::
@@
identifier I, N;
expression H;
statement S;
iterator name TAILQ_FOREACH;
@@
- for (I = TAILQ_FIRST(H); I != NULL; I = TAILQ_NEXT(I, N)) S
+ TAILQ_FOREACH(I, H, N) S
:::::::::::::::
::: kdefs.h :::
:::::::::::::::
#define MAXUSERS 64
#define _KERNEL
#define _KERNEL_OPT
#define i386
/*
* Tail queue definitions.
*/
#define _TAILQ_HEAD(name, type, qual) \
struct name { \
qual type *tqh_first; /* first element */ \
qual type *qual *tqh_last; /* addr of last next element */ \
}
#define TAILQ_HEAD(name, type) _TAILQ_HEAD(name, struct type,)
#define TAILQ_HEAD_INITIALIZER(head) \
{ NULL, &(head).tqh_first }
#define _TAILQ_ENTRY(type, qual) \
struct { \
qual type *tqe_next; /* next element */ \
qual type *qual *tqe_prev; /* address of previous next element */\
}
#define TAILQ_ENTRY(type) _TAILQ_ENTRY(struct type,)
#define PMF_FN_PROTO1 pmf_qual_t
#define PMF_FN_ARGS1 pmf_qual_t qual
#define PMF_FN_CALL1 qual
#define PMF_FN_PROTO , pmf_qual_t
#define PMF_FN_ARGS , pmf_qual_t qual
#define PMF_FN_CALL , qual
#define __KERNEL_RCSID(a, b)
the system into config_deactivate(dev): deactivate dev and all of
its descendants. Block all interrupts while calling each device's
activation hook, ca_activate. Now it is possible to simplify or
to delete several device-activation hooks throughout the system.
Do not deactivate a driver while detaching it! If the driver was
already deactivated (because of accidental/emergency removal), let
the driver cope with the knowledge that DVF_ACTIVE has been cleared.
Otherwise, let the driver access the underlying hardware (so that
it can flush caches, restore original register settings, et cetera)
until it exits its device-detachment hook.
Let multiple readers and writers simultaneously access the system's
device_t list, alldevs, from either interrupt or thread context:
postpone changing alldevs linkages and freeing autoconf device
structures until a garbage-collection phase that runs after all
readers & writers have left the list.
Give device iterators (deviter(9)) a consistent view of alldevs no
matter whether device_t's are added and deleted during iteration:
keep a global alldevs generation number. When an iterator enters
alldevs, record the current generation number in the iterator and
increase the global number. When a device_t is created, label it
with the current global generation number. When a device_t is
deleted, add a second label, the current global generation number.
During iteration, compare a device_t's added- and deleted-generation
with the iterator's generation and skip a device_t that was deleted
before the iterator entered the list or added after the iterator
entered the list.
The alldevs generation number is never 0. The garbage collector
reaps device_t's whose delete-generation number is non-zero.
Make alldevs private to sys/kern/subr_autoconf.c. Use deviter(9)
to access it.
reference while we were getting the v_interlock.
vget(): attempt prevent it from returning a clean vnode:
if the vnode is being inactivated (by vrelel()), wait for
vrelel() to complete (or return EBUSY if we can't wait), and return
ENOENT if the vnode has been vclean'ed by vrelel()
Fix kern/41147 in a better way, hopefully fix other related race conditions.
of transitions to IPL_HIGH from lower IPLs. SPLDEBUG is only available
on i386 and Xen kernels, today.
'options SPLDEBUG' adds instrumentation to spllower() and splraise() as
well as routines to start/stop debugging and to record IPL transitions:
spldebug_start(), spldebug_stop(), spldebug_raise(), spldebug_lower().
meta-data of this pool takes more space than the actual data..
- Reduce lowat/hiwat to 1..8, since intensity is very low.
- Remove unused pew_next_free from pmf_event_workitem_t.
- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.
Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).
Discussed on <tech-kern>, reviewed by <ad>.
early during boot, just after CPUs are attached but before they are marked
as running.
This will result in a list of CPUs without the SPCF_RUNNING flag set, and
will trigger the 'KASSERT(xc_tailp < xc_headp)' in xc_lowpri() as no cross
call is issued.
Bug reported and patch tested by tron@.
See also http://mail-index.netbsd.org/tech-kern/2009/10/19/msg006293.html
from usr.sbin/mopd/common/pf.c, where only the ad clause is removed,
because it has a shared UCB copyright) on Mats O Jansson's files.
thorpej OK'd usr.sbin/rpc.yppasswdd/yppasswdd_mkpw.c, where he shares
copyright.
Most times we've come through spec_read() which has already done the test,
but not always (eg pty with ptsfs mounted).
Without this there is a simple local-user panic in ureadc().
Noted Matthew Mondor on tech-kern.
that can't happen as early as the other init functions as called from
cpu_startup() -- for example, register kauth(9) listeners.
Put unprivileged policy in the x86 code; used by i386, amd64, and xen.
but access control) user mounting policies: enforced MNT_NOSUID and
MNT_NODEV, no MNT_EXPORT, MNT_EXEC propagation. This can be useful for
secmodels that are interested in simply adding finer grained user mount
support.
- Add a mount subsystem listener for KAUTH_REQ_SYSTEM_MOUNT_GET.
really belongs (suggested by rmind@),
- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.
Reviewed by and okay rmind@.
close to it). Useful for secmodels.
Replace open-coded form with it in secmodel code (securelevel, keylock).
Note: I need to find a way to make secmodel_keylock.c ~<100 lines.
- Separate the suser part of the bsd44 secmodel into its own secmodel
and directory, pending even more cleanups. For revision history
purposes, the original location of the files was
src/sys/secmodel/bsd44/secmodel_bsd44_suser.c
src/sys/secmodel/bsd44/suser.h
- Add a man-page for secmodel_suser(9) and update the one for
secmodel_bsd44(9).
- Add a "secmodel" module class and use it. Userland program and
documentation updated.
- Manage secmodel count (nsecmodels) through the module framework.
This eliminates the need for secmodel_{,de}register() calls in
secmodel code.
- Prepare for secmodel modularization by adding relevant module bits.
The secmodels don't allow auto unload. The bsd44 secmodel depends
on the suser and securelevel secmodels. The overlay secmodel depends
on the bsd44 secmodel. As the module class is only cosmetic, and to
prevent ambiguity, the bsd44 and overlay secmodels are prefixed with
"secmodel_".
- Adapt the overlay secmodel to recent changes (mainly vnode scope).
- Stop using link-sets for the sysctl node(s) creation.
- Keep sysctl variables under nodes of their relevant secmodels. In
other words, don't create duplicates for the suser/securelevel
secmodels under the bsd44 secmodel, as the latter is merely used
for "grouping".
- For the suser and securelevel secmodels, "advertise presence" in
relevant sysctl nodes (sysctl.security.models.{suser,securelevel}).
- Get rid of the LKM preprocessor stuff.
- As secmodels are now modules, there's no need for an explicit call
to secmodel_start(); it's handled by the module framework. That
said, the module framework was adjusted to properly load secmodels
early during system startup.
- Adapt rump to changes: Instead of using empty stubs for securelevel,
simply use the suser secmodel. Also replace secmodel_start() with a
call to secmodel_suser_start().
- 5.99.20.
Testing was done on i386 ("release" build). Spearated module_init()
changes were tested on sparc and sparc64 as well by martin@ (thanks!).
Mailing list reference:
http://mail-index.netbsd.org/tech-kern/2009/09/25/msg006135.html
definition, drvctl_init() is not called, the drvctl_eventq is not
initialized, and the kernel will panic in devmon_insert() when a
device is detached.
Thanks to Jared McNeill for pointing out the panic.
Since VNODE_LOCKDEBUG has never been generally useful, default to
off. However, the checks can still be generated by flipping the
switch for the isolated cases where this form of dynamic analysis
is useful and the person using it knows what she is doing.
lookup_for_nfsd(). This code is, or at least should be, the same as
the regular symlink-following code plus an extra flag nfsd needs.
The two lots of code can/will be merged in the future.
private backdoor entry point this is.
Also, clone the lookup_for_nfsd() entry point as
lookup_for_nfsd_index(), for use by a different call site in nfsd that
does different unclean things with nameidata.
and make suspension by self, by drvctl(8), and by ACPI system sleep
play nice together. Start solidifying some temporary API changes.
1. Extract a new header file, <sys/device_if.h>, from <sys/device.h> and
#include it from <sys/pmf.h> instead of <sys/device.h> to break the
circular dependency between <sys/device.h> and <sys/pmf.h>.
2. Introduce pmf_qual_t, an aggregate of qualifications on a PMF
suspend/resume call. Start to replace instances of PMF_FN_PROTO,
PMF_FN_ARGS, et cetera, with a pmf_qual_t.
3. Introduce the notion of a "suspensor," an entity that holds a
device in suspension. More than one suspensor may hold a device
at once. A device stays suspended as long as at least one
suspensor holds it. A device resumes when the last suspensor
releases it.
Currently, the kernel defines three suspensors,
3a the system-suspensor: for system suspension, initiated
by 'sysctl -w machdep.sleep_state=3', by lid closure, by
power-button press, et cetera,
3b the drvctl-suspensor: for device suspension by /dev/drvctl
ioctl, e.g., drvctl -S sip0.
3c the system self-suspensor: for device drivers that suspend
themselves and their children. Several drivers for network
interfaces put the network device to sleep while it is not
administratively up, that is, after the kernel calls if_stop(,
1). The self-suspensor should not be used directly. See
the description of suspensor delegates, below.
A suspensor can have one or more "delegates". A suspensor can
release devices that its delegates hold suspended. Right now,
only the system self-suspensor has delegates. For each device
that a self-suspending driver attaches, it creates the device's
self-suspensor, a delegate of the system self-suspensor.
Suspensors stop a system-wide suspend/resume cycle from waking
devices that the operator put to sleep with drvctl before the cycle.
They also help self-suspension to work more simply, safely, and in
accord with expectations.
4. Add the notion of device activation level, devact_level_t,
and a routine for checking the current activation level,
device_activation(). Current activation levels are DEVACT_LEVEL_BUS,
DEVACT_LEVEL_DRIVER, and DEVACT_LEVEL_CLASS, which respectively
indicate that the device's bus is active, that the bus and device are
active, and that the bus, device, and the functions of the device's
class (network, audio) are active.
Suspend/resume calls can be qualified with a devact_level_t.
The power-management framework treats a devact_level_t that
qualifies a device suspension as the device's current activation
level; it only runs hooks to reduce the activation level from
the presumed current level to the fully suspended state. The
framework treats a devact_level_t qualifying device resumption
as the target activation level; it only runs hooks to raise the
activation level to the target.
5. Use pmf_qual_t, devact_level_t, and self-suspensors in several
drivers.
6. Temporarily add an unused power-management workqueue that I will
remove or replace, soon.
most cases, use a proper constructor. For proplib, give a local
equivalent of POOL_INIT for the kernel object implementation. This
way the code structure can be preserved, and a local link set is
not hazardous anyway (unless proplib is split to several modules,
but that'll be the day).
tested by booting a kernel in qemu and compile-testing i386/ALL
In the for(;;) loop of turnstile_block(), the lock owner can change while
cur's lock is released (cur's lock is also the tschain_t's mutex).
Remove the KASSERT about owner being invariant and try to deal with the
fact that the owner can change instead.
http://mail-index.netbsd.org/tech-kern/2009/08/24/msg005957.html
and followups.
addresses. Make the kernel support SIOC[SG]IFADDRPREF for IPv6
interface addresses.
In in6ifa_ifpforlinklocal(), consult preference numbers before
making an otherwise arbitrary choice of in6_ifaddr. Otherwise,
preference numbers are *not* consulted by the kernel, but that will
be rather easy for somebody with a little bit of free time to fix.
Please note that setting the preference number for a link-local
IPv6 address does not work right, yet, but that ought to be fixed
soon.
In support of the changes above,
1 Add a method to struct domain for "externalizing" a sockaddr, and
provide an implementation for IPv6. Expect more work in this area: it
may be more proper to say that the IPv6 implementation "internalizes"
a sockaddr. Add sockaddr_externalize().
2 Add a subroutine, sofamily(), that returns a struct socket's address
family or AF_UNSPEC.
3 Make a lot of IPv4-specific code generic, and move it from
sys/netinet/ to sys/net/ for re-use by IPv6 parts of the kernel and
ifconfig(8).
since they are only peripherially related to the autoconf subsystem
and more related to boot initialization. Also, apply _KERNEL_OPT
to autoconf where necessary.
This fixes a race where, for a short period of time, so->so_lock and
so2->so_lock are not sync. This makes solocked2() and solocked()
unreliable and cause DIAGNOSTIC kernel panics. This also fixes a possible
panic in unp_setaddr() which expects the socket locked.
Should fix kern/38968, fix proposed in
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005863.html
sysctl_name[0], so write that instead of sysctl_name[sz] (where sz
just happened to be set to 0 in the previous line).
Also in sysctl_create(), give the length of the sysctl_name its
own variable, nsz, and reserve sz for expressing the size of the
node's value.
No functional change intended.
a simple attempt instead of an authoritative answer. The failure of the
rootspec test could me machine-dependant. Thanks to martin@ for pointing
that out.
iterator and the length of the old child array, but introduce a
new variable, 'olen', for the latter purpose.
In sysctl_alloc(), name a constant.
Introduce sysctl_log_print(), a handy debug routine.
No functional changes intended.
yet in service, have a correct pointer to their parent, sysctl_parent.
This fixes a bug where sysctl_teardown(9) could not clean up a
network interface's sysctl(9) trees when I detached it, because
the wrong log had been recorded.
support -ve lengths (lock area before current offset).
Nothing in libc or the kernel allowed for this, so some random part
of the file would get locked (no idea which bits).
Although this could probably be fixed in libc, the stubs for posix file
locks for emulations could easily get into the kernel with -ve lengths.
So fixing in the kernel avoids those problems.
This also fixes PR/41620 (attempting to lock negative offsets) - which
is what I was looking into!
and vtime > 0. It should be allowed to go to sleep for the sleep interval
indicated in vtime. Reported by der Mouse a long while ago, and this is what
other unixes do.
Factor out common code of chroot-like syscalls into change_root() and export
that function for use in other parts of the kernel.
Rename change_dir() to chdir_lookup() as the latter describes better what
the function does. While there, move the namei_data initialisation into
chdir_lookup(), too. And export chdir_lookup().
mq_prio_max is dynamic, and sorted list is used for custom setup, when
user manually sets higher priority range.
- Cache mq->mq_attrib in some places. Change msg_ptr type to uint8_t.
- Update copyright, misc.
taking a reference to curlwp's by calling fd_hold(). If lwp_create()
is called from fork1(), then l2 != curlwp, but l2's and not curlwp's
filedesc_t whose reference we should take.
This change stops the problem I describe in
<http://mail-index.netbsd.org/tech-kern/2009/07/09/msg005422.html>,
where /dev/rsd0a is never properly closed after fsck / runs on it.
This change seems to quiet my USB backup drive, sd0 at scsibus0 at
umass0, which had stopped spinning down when it was not in use:
The unit probably stayed open after mount(8) tried (and failed:
errant fstab entry) to mount it.
I am confident that this change is an improvement, but I doubt that
it is the last word on the matter. I hate to get under the filedesc_t
abstraction by fiddling with fd_refcnt, and there may be something
I have missed, so somebody with greater understanding of the file
descriptors code should have a look.
must not allocate a pmf_event_workitem_t using kmem_alloc(9). Use
pool_cache(9), instead, because it is safe in interrupt context.
Thanks, rmind@, for catching the problem and suggesting the solution.
latter might lose its KAUTH_GENERIC_ISSUSER check soon, add an internal
function, mqueue_access(), and call genfs_can_access() from it instead
so we don't pollute the main code path once we need to add a special
kauth(9) check for message queues.
No functional change, error codes preserved.
Related mailing list thread:
http://mail-index.netbsd.org/tech-kern/2009/06/21/msg005311.html
check_console, veriexecclose, veriexec_delete, veriexec_file_add,
emul_find_root, coff_load_shlib (sh3 version), coff_load_shlib,
compat_20_sys_statfs, compat_20_netbsd32_statfs,
ELFNAME2(netbsd32,probe_noteless), darwin_sys_statfs,
ibcs2_sys_statfs, ibcs2_sys_statvfs, linux_sys_uselib,
osf1_sys_statfs, sunos_sys_statfs, sunos32_sys_statfs,
ultrix_sys_statfs, do_sys_mount, fss_create_files (3 of 4),
adosfs_mount, cd9660_mount, coda_ioctl, coda_mount, ext2fs_mount,
ffs_mount, filecore_mount, hfs_mount, lfs_mount, msdosfs_mount,
ntfs_mount, sysvbfs_mount, udf_mount, union_mount, sys_chflags,
sys_lchflags, sys_chmod, sys_lchmod, sys_chown, sys_lchown,
sys___posix_chown, sys___posix_lchown, sys_link, do_sys_pstatvfs,
sys_quotactl, sys_revoke, sys_truncate, do_sys_utimes, sys_extattrctl,
sys_extattr_set_file, sys_extattr_set_link, sys_extattr_get_file,
sys_extattr_get_link, sys_extattr_delete_file,
sys_extattr_delete_link, sys_extattr_list_file, sys_extattr_list_link,
sys_setxattr, sys_lsetxattr, sys_getxattr, sys_lgetxattr,
sys_listxattr, sys_llistxattr, sys_removexattr, sys_lremovexattr
All have been scrutinized (several times, in fact) and compile-tested,
but not all have been explicitly tested in action.
XXX: While I haven't (intentionally) changed the use or nonuse of
XXX: TRYEMULROOT in any of these places, I'm not convinced all the
XXX: uses are correct; an audit might be desirable.
case functionality of namei in a simple package with only a couple flags.
A substantial majority of the namei call sites in the kernel can use
this interface; this will isolate those areas from the changes arising
as the internals of namei are fumigated.
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:
- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.
Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.
(void *)pew is one way to get a struct work *, but let's
write&pew->pew_work, instead. It is more defensive and persuasive.
Make miscellaneous changes in support of tearing down arbitrary
stacks of filesystems and devices during shutdown:
1 Move struct shutdown_state, shutdown_first(), and shutdown_next(),
from kern_pmf.c to subr_autoconf.c. Rename detach_all() to
config_detach_all(), and move it from kern_pmf.c to subr_autoconf.c.
Export all of those routines.
2 In pmf_system_shutdown(), do not suspend user process scheduling, and
do not detach all devices: I am going to do that in cpu_reboot(),
instead. (Soon I will do it in an MI cpu_reboot() routine.) Do still
call PMF shutdown hooks.
3 In config_detach(), add a DIAGNOSTIC assertion: if we're exiting
config_detach() at the bottom, alldevs_nwrite had better not be 0,
because config_detach() is a writer of the device list.
4 In deviter_release(), check to see if we're iterating the device list
for reading, *first*, and if so, decrease the number of readers. Used
to be that if we happened to be reading during shutdown, we ran the
shutdown branch. Thus the number of writers reached 0, the number
of readers remained > 0, and no writer could iterate again. Under
certain circumstances that would cause a hang during shutdown.
filesystem is mounted. Synchronize access to the number with a
mutex. When a struct mount, mp, is allocated, assign the current
generation number to mp->mnt_gen. Introduce vfs_unmount_forceone()
that forcefully unmounts the most recently mounted filesystem.
Refactor: extract vfs_shutdown1() from vfs_shutdown(). Extract
vfs_sync_all() from vfs_shutdown1().
Print more progress indications while we're unmounting all of the
filesystems during shutdown.
We increase the reference count on mp before calling dounmount(mp),
but we do not decrease it if dounmount(mp) fails, and neither does
dounmount(mp). So decrease the reference count if dounmount(mp)
fails.
Change the loop terminating condition in vfs_unmountall1() to (mp
!= (void *)&mountlist) from !CIRCLEQ_EMPTY(&mountlist), because we
may not ever empty the list, especially if we're not forcing the
filesystems to unmount.
the other routines of the same spirit.
Adjust file-system code to use it.
Keep vaccess() for KPI compatibility and to keep element of least
surprise. A "diagnostic" message warning that vaccess() is deprecated will
be printed when it's used (obviously, only in DIAGNOSTIC kernels).
No objections on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2009/06/21/msg005310.html
by making the pps count time stamp and the update
time stamp u_int64.
The time delta between two PPS events can now
be correctly calculated avoiding any unaccounted
for wraps with 32-bit counters.