the original code since if fullgroups was empty and partgroups wasn't, we
would not clean up partgroups (pointed out by yamt). Well, this one has
different semantics from the original, they are the correct ones I think..
- rather than embedding bufq_state in driver softc,
have a pointer to the former.
- move bufq related functions from kern/subr_disk.c to kern/subr_bufq.c.
- rename method to strategy for consistency.
- move some definitions which don't need to be exposed to the rest of kernel
from sys/bufq.h to sys/bufq_impl.h.
(is it better to move it to kern/ or somewhere?)
- fix some obvious breakage in dev/qbus/ts.c. (not tested)
COMPAT_16 and earlier that results in a current shared linker running at
address 0 (and thus allows NULL pointer derefs to work).
As noted by Matthias Drochner, this "fix" just checks the first psection
and not the first loadable psection. This isn't a problem with the
binutils up to now, but might be in the future.
This closes a hole pointed out by Thor Lancelot Simon on tech-kern ~3
years ago.
The problem was with running binaries from remote storage, where our
kernel (and Veriexec) has no control over any changes to files.
An attacker could, after the fingerprint has been verified and
program loaded to memory, inject malicious code into the backing
store on the remote storage, followed by a forced flush, causing
a page-in of the malicious data from backing store, bypassing
integrity checks.
Initial implementation by Brett Lymn.
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.
clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.
split the single list of pool cache groups into three lists:
completely full, partially full, and completely empty.
use LIST instead of TAILQ where appropriate.
- Remove all NFS related stuff from file system specific code.
- Drop the vfs_checkexp hook and generalize it in the new nfs_check_export
function, thus removing redundancy from all file systems.
- Move all NFS export-related stuff from kern/vfs_subr.c to the new
file sys/nfs/nfs_export.c. The former was becoming large and its code
is always compiled, regardless of the build options. Using the latter,
the code is only compiled in when NFSSERVER is enabled. While doing this,
also make some functions in nfs_subs.c conditional to NFSSERVER.
- Add a new command in nfssvc(2), called NFSSVC_SETEXPORTSLIST, that takes a
path and a set of export entries. At the moment it can only clear the
exports list or append entries, one by one, but it is done in a way that
allows setting the whole set of entries atomically in the future (see the
comment in mountd_set_exports_list or in doc/TODO).
- Change mountd(8) to use the nfssvc(2) system call instead of mount(2) so
that it becomes file system agnostic. In fact, all this whole thing was
done to remove a 'XXX' block from this utility!
- Change the mount*, newfs and fsck* userland utilities to not deal with NFS
exports initialization; done internally by the kernel when initializing
the NFS support for each file system.
- Implement an interface for VFS (called VFS hooks) so that several kernel
subsystems can run arbitrary code upon receipt of specific VFS events.
At the moment, this only provides support for unmount and is used to
destroy NFS exports lists from the file systems being unmounted, though it
has room for extension.
Thanks go to yamt@, chs@, thorpej@, wrstuden@ and others for their comments
and advice in the development of this patch.
filedescriptors passed in this message - the counterpart in
unp_externalize does this as well.
Note that CMSG_SPACE(0) does not make sense, since it does not invoke
the alignment magic - so use CMSG_SPACE(sizeof(int)) and adjust the
calculated total later.
This fixes the postfix conection cache for 64bit platforms. Previously
the number of passed filed descriptors (nfds) would have been
calculeted too high, causing the fdrelease() of uninitialized junk.
by changing the symlink one to set vap's vatype to VLNK. All the other three
already set vatype to the correct type. Note that, however, in the mkdir
case (and now symlink too) this is not strictly necessary.
used in ioctl routines to do the right thing when the FKIOCTL flag is
passed to the IOCTL routine indicating its a in-kernel VOP_IOCTL call and
indirect addresses provided in the arguments are to be seen as kernel
adresses rather than userland adresses.
A simple substitution and prepending of the `flags' passed on to the ioctl
handler is enough to DTRT.
we can implement an universal submatch() function covering all
the standard cases:
if (<configured> != <wildcard> && <configured> != <real>)
then fail
else
ask device match function
explicitely by a plain integer array
the length in now known to all relevant parties, so this avoids
duplication of information, and we can allocate that thing in
drivers without hacks
Added a big FIXME because two group lists containing the same entries,
but ordered differently, still compare as unequal. The same holds if one
group list contains an entry twice while the other does not. ok'ed by
christos.
VOP_GETATTR() fills a struct vattr, where va_fsid and va_fileid (device
and inode..) are typed as long.
Add some casts when using these values and surround them with XXXs about
the potential size mismatch, as long can be 64 bits but dev_t and ino_t
are always 32 bits. This is safe because *for now* we're still using
32 bit inode numbers.
Discussed with blymn@.
For example, this used to give false logs of matching fingerprint for
foo.sh while foo.sh don't have an entry, and the program executed (and
matching the fingerprint) is the interpreter - /bin/sh.
can be easily used by netbsd32 code.
XXX Meanwhile, introduce a copyinout_t type that matches the prototype of
XXX copyin(9) and copyout(9). Its logical place would be in systm.h, near
XXX the definition of copyin, but, well, see the comment.
as an argument a function that will retrieve an element of the pointer
arrays in user space. This allows COMPAT_NETBSD32 to share the code for
the emulated version of execve(2), and fixes various issues that came from
the slow drift between the two implementations.
Note: when splitting up a syscall function, I'll use two different ways
of naming the resulting helper function. If it stills does
copyin/out operations, it will be named <syscall>1(). If it does
not (as it was the case for get/setitimer), it will be named
do<syscall>.
relevant code with the COMPAT_NETBSD32 version, and make the latter use
the new functions.
This fixes netbsd32_setitimer() which had drifted from the native syscall
and did not work properly anymore.
- Add a @{var} syntax in addition to @var. This allows for patterns like
@{ostype}-@{osrelease}-@{machine_arch}.
- Add a @emul variable that expands to the process's emulation name
(e.g. "netbsd", "netbsd32", "linux", etc.)
Fix an issue when scripts are executed under systrace where the argv[0]
would be normalized, and hence break scripts that depend on how they were
called.
system-specific values. Submitted by Chris Demetriou in Nov 1995 (!)
in PR kern/1781, modified only slighly by me.
This is enabled on a per-mount basis with the MNT_MAGICLINKS mount
flag. It can be enabled at mountroot() time by building the kernel
with the ROOTFS_MAGICLINKS option.
The following magic strings are supported by the implementation:
@machine value of MACHINE for the system
@machine_arch value of MACHINE_ARCH for the system
@hostname the system host name, as set with sethostname()
@domainname the system domain name, as set with setdomainname()
@kernel_ident the kernel config file name
@osrelease the releaes number of the OS
@ostype the name of the OS (always "NetBSD" for NetBSD)
Example usage:
mkdir /arch/i386/bin
mkdir /arch/sparc/bin
ln -s /arch/@machine_arch/bin /bin
- Change #ifdef VERIFIED_EXEC_VERBOSE to another verbose level, 2. Add
sysctl(3) bits.
- Simplify access type conflict handling during load. This depends on
the values of access type defines to be ordered from least to most
'strict'.
The __UNCONST macro is now used only where necessary and the RW macros
are gone. Most of the changes here are consumers of the
sysctl_createv(9) interface that now takes a pair of const pointers
which used not to be.
in the veriexec table entry; the lookups are very cheap now. Suggested
by Chuq.
- Handle non-regular (!VREG) files correctly).
- Remove (no longer needed) FINGERPRINT_NOENTRY.
- Make the locking rules for pr_rmpage() sane, and don't modify fields
protected by the pool lock without actually holding it.
- Always defer freeing the pool page to the back-end allocator, to avoid
invoking the pool_allocator with the pool locked (which would violate
the pool_allocator -> pool locking order).
- Fix pool_reclaim() to not violate the pool_cache -> pool locking order
by using a trylock.
Reviewed by Chuq Silvers.
- Better organize strict level. Now we have 4 levels:
- Level 0, learning mode: Warnings only about anything that might've
resulted in 'access denied' or similar in a higher strict level.
- Level 1, IDS mode:
- Deny access on fingerprint mismatch.
- Deny modification of veriexec tables.
- Level 2, IPS mode:
- All implications of strict level 1.
- Deny write access to monitored files.
- Prevent removal of monitored files.
- Enforce access type - 'direct', 'indirect', or 'file'.
- Level 3, lockdown mode:
- All implications of strict level 2.
- Prevent creation of new files.
- Deny access to non-monitored files.
- Update sysctl(3) man-page with above. (date bumped too :)
- Remove FINGERPRINT_INDIRECT from possible fp_status values; it's no
longer needed.
- Simplify veriexec_removechk() in light of new strict level policies.
- Eliminate use of 'securelevel'; veriexec now behaves according to
its strict level only.
- Use u_char for the fingerprint status.
- Add a pointer to the vnode's veriexec hash table entry in the vnode
struct. This saves a lookup and will also used by planned features.
- When removing a file from the tables, set the vnode fingerprint status
to NOENTRY.
- Add switch to do flag-specific handling in veriexec_verify(). At the
moment this prevents execution of FILE entries in strict level 2, but
it will also be used by planned features.
- Use memset() instead of bzero().
- Various cosmetic changes.
i/o is done. Instead, pass an opaque cookie which is then passed to a
new routine, coredump_write, which does the actual i/o. This allows the
method of doing i/o to change without affecting any future MD code.
Also, make netbsd32_core.c [re]use core_netbsd.c (in a similar manner that
core_elf64.c uses core_elf32.c) and eliminate that code duplication.
cpu_coredump{,32} is now called twice, first with a NULL iocookie to fill
the core structure and a second to actually write md parts of the coredump.
All i/o is nolonger random access and is suitable for shipping over a stream.
of fingerprinting algorithms to the ops vector.
- Cleanup in veriexec_add_fp_name().
- Remove veriexec_default_ops and use the above API for adding the default
methods in veriexec_init_fp_ops().
When a table is created for a new device, a new variable is created
under the kern.veriexec.count node named "dev_<id>". For example,
dev_0, dev_3, etc.
(fdp->fd_lastfile - i) against fd_knlistsize. Otherwise we can
call knote_fdclose() on a file descriptor that doesn't have a knote.
This issue explains random panics I have had on process exit over the
past few years.
New features:
- Add a veriexec_report() routine to make most reporting consistent and
remove some common code.
- Add 'strict' mode that controls how veriexec behaves.
- Add sysctl knobs:
o kern.veriexec.verbose controls verbosity levels. Value: 0, 1.
o kern.veriexec.strict controls strict level. Values: 0, 1, 2. See
documentation in sysctl(3) for details.
o kern.veriexec.algorithms returns a string with a space separated
list of supported hashing algorithms in veriexec.
- Updated documentation in man pages for sysctl(3) and sysctl(8).
Bug fixes:
- veriexec_removechk(): Code cleanup + handle FINGERPRINT_NOTEVAL
correctly.
- exec_script(): Don't pass 0 as flag when executing a script; use the
defined VERIEXEC_INDIRECT - which is 1. Makes indirect execution
enforcement work.
- Fix some printing formats and types..
chgsbsize() as mentioned in the PR can be called from an interrupt context
via tcp_close(). Avoid calling uid_find() in chgsbsize().
- Instead of storing so_uid in struct socketvar, store *so_uidinfo
- Add a simple lock to struct uidinfo.