- Add a @{var} syntax in addition to @var. This allows for patterns like
@{ostype}-@{osrelease}-@{machine_arch}.
- Add a @emul variable that expands to the process's emulation name
(e.g. "netbsd", "netbsd32", "linux", etc.)
Fix an issue where, when scripts were executed under systrace, argv[0]
would be normalized, breaking scripts that depend on how they were
called.
system-specific values. Submitted by Chris Demetriou in Nov 1995 (!)
in PR kern/1781, modified only slightly by me.
This is enabled on a per-mount basis with the MNT_MAGICLINKS mount
flag. It can be enabled at mountroot() time by building the kernel
with the ROOTFS_MAGICLINKS option.
The following magic strings are supported by the implementation:
@machine value of MACHINE for the system
@machine_arch value of MACHINE_ARCH for the system
@hostname the system host name, as set with sethostname()
@domainname the system domain name, as set with setdomainname()
@kernel_ident the kernel config file name
@osrelease the release number of the OS
@ostype the name of the OS (always "NetBSD" for NetBSD)
Example usage:
mkdir /arch/i386/bin
mkdir /arch/sparc/bin
ln -s /arch/@machine_arch/bin /bin
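The expansion of a magic link target can be sketched in userland C. This is a minimal illustration, not the kernel implementation: `expand_magic()`, `magic_lookup()` and the hard-coded values in `magic_values` are hypothetical. The sketch also assumes its own rule for the bare form (the name is the longest run of lowercase letters and underscores), while the braced form ends at the closing brace. Unknown names are copied through unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical example values; the kernel derives the real ones at run time. */
struct magic_var {
    const char *name;
    const char *value;
};

static const struct magic_var magic_values[] = {
    { "machine_arch", "i386" },
    { "ostype",       "NetBSD" },
    { "osrelease",    "3.99.2" },
    { "emul",         "netbsd" },
};

/* Return the value for the name name[0..len), or NULL if it is unknown. */
static const char *magic_lookup(const char *name, size_t len)
{
    for (size_t i = 0; i < sizeof(magic_values) / sizeof(magic_values[0]); i++) {
        if (strlen(magic_values[i].name) == len &&
            strncmp(magic_values[i].name, name, len) == 0)
            return magic_values[i].value;
    }
    return NULL;
}

/*
 * Expand @var and @{var} from src into dst. Returns 0 on success or -1 if
 * dst is too small. Unknown names are copied through unchanged.
 */
int expand_magic(const char *src, char *dst, size_t dstlen)
{
    size_t out = 0;

    if (dstlen == 0)
        return -1;
    while (*src != '\0') {
        const char *val = NULL;
        size_t consumed = 0;

        if (src[0] == '@' && src[1] == '{') {
            /* Braced form: the name ends at the closing brace. */
            const char *end = strchr(src + 2, '}');
            if (end != NULL) {
                val = magic_lookup(src + 2, (size_t)(end - src - 2));
                consumed = (size_t)(end - src) + 1;
            }
        } else if (src[0] == '@') {
            /* Bare form (assumed rule): longest run of [a-z_]. */
            size_t n = strspn(src + 1, "abcdefghijklmnopqrstuvwxyz_");
            val = magic_lookup(src + 1, n);
            consumed = n + 1;
        }

        if (val != NULL) {
            size_t vlen = strlen(val);
            if (out + vlen >= dstlen)
                return -1;
            memcpy(dst + out, val, vlen);
            out += vlen;
            src += consumed;
        } else {
            /* Ordinary character, or a magic name we do not know. */
            if (out + 1 >= dstlen)
                return -1;
            dst[out++] = *src++;
        }
    }
    dst[out] = '\0';
    return 0;
}
```

With these example values, `/arch/@machine_arch/bin` expands to `/arch/i386/bin`, and `@{ostype}-@{osrelease}-@{machine_arch}` expands to `NetBSD-3.99.2-i386`.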
- Replace #ifdef VERIFIED_EXEC_VERBOSE with a second verbosity level, 2. Add
sysctl(3) bits.
- Simplify access type conflict handling during load. This depends on
the access type defines being ordered from least to most
'strict'.
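Because the access type defines are ordered from least to most strict, a conflict between two entries for the same file can be resolved with a single comparison instead of a table of pairwise rules. A minimal sketch, assuming the stricter entry should win; the enum names and values are hypothetical placeholders, not the real veriexec defines.

```c
#include <assert.h>

/*
 * Hypothetical access types. Only the ordering matters here:
 * a larger value means a stricter access type.
 */
enum access_type {
    ACCESS_LEAST_STRICT = 1,
    ACCESS_MIDDLE       = 2,
    ACCESS_MOST_STRICT  = 3,
};

/* Two table entries for the same file conflict: keep the stricter type. */
enum access_type
resolve_access_conflict(enum access_type existing, enum access_type incoming)
{
    return incoming > existing ? incoming : existing;
}
```

The design only holds as long as every new access type is inserted at the right position in the ordering.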
The __UNCONST macro is now used only where necessary and the RW macros
are gone. Most of the changes here are consumers of the
sysctl_createv(9) interface, which now takes a pair of const pointers
that previously were not const.
in the veriexec table entry; the lookups are very cheap now. Suggested
by Chuq.
- Handle non-regular (!VREG) files correctly.
- Remove (no longer needed) FINGERPRINT_NOENTRY.
- Make the locking rules for pr_rmpage() sane, and don't modify fields
protected by the pool lock without actually holding it.
- Always defer freeing the pool page to the back-end allocator, to avoid
invoking the pool_allocator with the pool locked (which would violate
the pool_allocator -> pool locking order).
- Fix pool_reclaim() to not violate the pool_cache -> pool locking order
by using a trylock.
Reviewed by Chuq Silvers.
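The two lock-ordering rules above can be sketched with POSIX mutexes. This is a userland illustration, not the pool(9) code: `struct pool`, `struct pool_cache`, `backend_free()` and `pool_reclaim_sketch()` are hypothetical. Empty pages are unlinked under the pool lock but handed to the back-end allocator only after the lock is dropped, and since the normal order is pool_cache before pool, the cache lock is only tried while the pool lock is held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct page {
    struct page *next;
};

struct pool_cache {
    pthread_mutex_t lock;
    int cached;                 /* stand-in for cached objects */
};

struct pool {
    pthread_mutex_t lock;
    struct page *empty_pages;   /* pages with no allocated items */
    struct pool_cache *cache;
};

/* Stand-in for the back-end allocator; never called with pool->lock held. */
static int backend_frees;
static void backend_free(struct page *pg)
{
    backend_frees++;
    free(pg);
}

/* Returns 1 if the cache was drained, 0 if its lock was busy. */
int pool_reclaim_sketch(struct pool *pp)
{
    struct page *deferred;
    int drained = 0;

    pthread_mutex_lock(&pp->lock);
    /* Order is cache -> pool and we hold the pool lock: only try. */
    if (pp->cache != NULL && pthread_mutex_trylock(&pp->cache->lock) == 0) {
        pp->cache->cached = 0;
        drained = 1;
        pthread_mutex_unlock(&pp->cache->lock);
    }
    /* Unlink the empty pages while locked, but do not free them yet. */
    deferred = pp->empty_pages;
    pp->empty_pages = NULL;
    pthread_mutex_unlock(&pp->lock);

    /* Deferred frees: the allocator is entered with no pool lock held. */
    while (deferred != NULL) {
        struct page *next = deferred->next;
        backend_free(deferred);
        deferred = next;
    }
    return drained;
}
```

If the cache lock is busy, the cache is simply skipped for this round instead of risking a deadlock.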
- Better organize strict levels. Now we have 4 levels:
- Level 0, learning mode: Warnings only about anything that might've
resulted in 'access denied' or similar in a higher strict level.
- Level 1, IDS mode:
- Deny access on fingerprint mismatch.
- Deny modification of veriexec tables.
- Level 2, IPS mode:
- All implications of strict level 1.
- Deny write access to monitored files.
- Prevent removal of monitored files.
- Enforce access type - 'direct', 'indirect', or 'file'.
- Level 3, lockdown mode:
- All implications of strict level 2.
- Prevent creation of new files.
- Deny access to non-monitored files.
- Update sysctl(3) man-page with above. (date bumped too :)
- Remove FINGERPRINT_INDIRECT from possible fp_status values; it's no
longer needed.
- Simplify veriexec_removechk() in light of new strict level policies.
- Eliminate use of 'securelevel'; veriexec now behaves according to
its strict level only.
- Use u_char for the fingerprint status.
- Add a pointer to the vnode's veriexec hash table entry in the vnode
struct. This saves a lookup and will also be used by planned features.
- When removing a file from the tables, set the vnode fingerprint status
to NOENTRY.
- Add switch to do flag-specific handling in veriexec_verify(). At the
moment this prevents execution of FILE entries in strict level 2, but
it will also be used by planned features.
- Use memset() instead of bzero().
- Various cosmetic changes.
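The four strict levels listed above can be summarized as a decision function. This is a minimal sketch, not veriexec's code: `strict_policy()`, the operation codes and the verdicts are hypothetical, and the sketch ignores table modification and access-type enforcement. It also assumes that levels 1 and 2 silently allow what only a stricter level would deny, while level 0 warns about it.

```c
#include <assert.h>
#include <stdbool.h>

enum vx_op { OP_EXEC, OP_WRITE, OP_REMOVE, OP_CREATE };
enum verdict { V_ALLOW, V_WARN, V_DENY };

#define LEVEL_NEVER 4   /* not denied at any strict level */

/* Lowest strict level at which this operation would be denied. */
static int deny_level(enum vx_op op, bool monitored, bool fp_ok)
{
    if (op == OP_CREATE)
        return 3;               /* lockdown: no new files */
    if (!monitored)
        return 3;               /* lockdown: non-monitored files denied */
    if (!fp_ok)
        return 1;               /* IDS: fingerprint mismatch */
    if (op == OP_WRITE || op == OP_REMOVE)
        return 2;               /* IPS: protect monitored files */
    return LEVEL_NEVER;
}

enum verdict strict_policy(int level, enum vx_op op, bool monitored, bool fp_ok)
{
    int min = deny_level(op, monitored, fp_ok);

    if (min == LEVEL_NEVER)
        return V_ALLOW;
    if (level >= min)
        return V_DENY;
    /* Learning mode warns about anything a higher level would deny. */
    return level == 0 ? V_WARN : V_ALLOW;
}
```

Each level only lowers the threshold, which matches the "all implications of strict level N-1" wording in the list.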
i/o is done. Instead, pass an opaque cookie which is then passed to a
new routine, coredump_write, which does the actual i/o. This allows the
method of doing i/o to change without affecting any future MD code.
Also, make netbsd32_core.c [re]use core_netbsd.c (in a similar manner that
core_elf64.c uses core_elf32.c) and eliminate that code duplication.
cpu_coredump{,32} is now called twice, first with a NULL iocookie to fill
the core structure and a second to actually write md parts of the coredump.
All i/o is no longer random access and is suitable for shipping over a stream.
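The cookie scheme and the two calls can be sketched as follows. This is a hypothetical userland model, not the kernel interface: `struct iocookie` wraps an in-memory buffer, `dump_write()` stands in for coredump_write, and `md_coredump()` stands in for cpu_coredump. The first call with a NULL cookie only reports the MD section's size. The second call writes it, appending after the header so all output is sequential.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical opaque cookie: here it just wraps an in-memory buffer. */
struct iocookie {
    unsigned char *buf;
    size_t len;
    size_t cap;
};

/* Stand-in for coredump_write(): append-only, suitable for a stream. */
int dump_write(struct iocookie *cookie, const void *data, size_t len)
{
    if (cookie->len + len > cookie->cap)
        return -1;
    memcpy(cookie->buf + cookie->len, data, len);
    cookie->len += len;
    return 0;
}

struct md_regs { unsigned long pc, sp; };   /* hypothetical MD state */

/* With a NULL cookie, only report the size; otherwise write the data. */
int md_coredump(struct iocookie *cookie, const struct md_regs *regs, size_t *sizep)
{
    if (cookie == NULL) {
        *sizep = sizeof(*regs);
        return 0;
    }
    return dump_write(cookie, regs, sizeof(*regs));
}

int write_core(struct iocookie *cookie, const struct md_regs *regs)
{
    size_t mdsize;
    unsigned int hdr;

    if (md_coredump(NULL, regs, &mdsize) != 0)      /* pass 1: size only */
        return -1;
    hdr = (unsigned int)mdsize;
    if (dump_write(cookie, &hdr, sizeof(hdr)) != 0) /* header first */
        return -1;
    return md_coredump(cookie, regs, &mdsize);      /* pass 2: the data */
}
```

Because the MI code never seeks, the same `write_core()` would work whether the cookie represents a file, a socket or a buffer.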
of fingerprinting algorithms to the ops vector.
- Cleanup in veriexec_add_fp_name().
- Remove veriexec_default_ops and use the above API for adding the default
methods in veriexec_init_fp_ops().
When a table is created for a new device, a new variable is created
under the kern.veriexec.count node named "dev_<id>". For example,
dev_0, dev_3, etc.
(fdp->fd_lastfile - i) against fd_knlistsize. Otherwise we can
call knote_fdclose() on a file descriptor that doesn't have a knote.
This issue explains random panics I have had on process exit over the
past few years.
New features:
- Add a veriexec_report() routine to make most reporting consistent and
remove some common code.
- Add 'strict' mode that controls how veriexec behaves.
- Add sysctl knobs:
o kern.veriexec.verbose controls verbosity levels. Values: 0, 1.
o kern.veriexec.strict controls strict level. Values: 0, 1, 2. See
documentation in sysctl(3) for details.
o kern.veriexec.algorithms returns a string with a space-separated
list of supported hashing algorithms in veriexec.
- Updated documentation in man pages for sysctl(3) and sysctl(8).
Bug fixes:
- veriexec_removechk(): Code cleanup + handle FINGERPRINT_NOTEVAL
correctly.
- exec_script(): Don't pass 0 as flag when executing a script; use the
defined VERIEXEC_INDIRECT - which is 1. Makes indirect execution
enforcement work.
- Fix some printing formats and types.
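The string returned by kern.veriexec.algorithms could be built along these lines. A minimal sketch: the `algorithms` array and `build_algorithm_list()` are hypothetical, and the real list depends on which fingerprint methods are compiled into the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical list of compiled-in fingerprint methods. */
static const char *const algorithms[] = { "MD5", "SHA1", "RMD160", "SHA256" };

/* Build a space-separated list; returns its length, or -1 if buf is too small. */
int build_algorithm_list(char *buf, size_t buflen)
{
    size_t used = 0;

    if (buflen == 0)
        return -1;
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(algorithms) / sizeof(algorithms[0]); i++) {
        size_t n = strlen(algorithms[i]);
        size_t need = n + (used > 0 ? 1 : 0);   /* separator if not first */

        if (used + need >= buflen)              /* keep room for the NUL */
            return -1;
        if (used > 0)
            buf[used++] = ' ';
        memcpy(buf + used, algorithms[i], n);
        used += n;
        buf[used] = '\0';
    }
    return (int)used;
}
```

For the example table above, the result is `MD5 SHA1 RMD160 SHA256`.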
chgsbsize() as mentioned in the PR can be called from an interrupt context
via tcp_close(). Avoid calling uid_find() in chgsbsize().
- Instead of storing so_uid in struct socketvar, store *so_uidinfo
- Add a simple lock to struct uidinfo.
where the printing of `version' is already performed.
This has the benefit of allowing the copyright to be available
via dmesg(8) on platforms which need the `msgbuf' to be setup
in cpu_startup() before printed output is remembered.
* We now use hash tables instead of a list to store the in kernel
fingerprints.
* Fingerprint methods handling has been made more flexible, it is now
even simpler to add new methods.
* the loader no longer passes in magic numbers representing the
fingerprint method, so veriexecctl is no longer kernel-specific.
* fingerprint methods can be tailored out using options in the kernel
config file.
* more fingerprint methods added - rmd160, sha256/384/512
* veriexecctl can now report the fingerprint methods supported by the
running kernel.
* regularised the naming of some portions of veriexec.
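A chained hash table is what replaces the linear list search. This is a minimal sketch, assuming one table per device keyed by inode number, in line with the per-device tables mentioned elsewhere in this log. `struct fp_table`, `fp_lookup()`, `fp_add()` and `FP_HASH_SIZE` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define FP_HASH_SIZE 64     /* hypothetical; a power of two so we can mask */

struct fp_entry {
    uint64_t inode;
    const char *algorithm;  /* e.g. "SHA256" */
    struct fp_entry *next;  /* chain for inodes that share a bucket */
};

/* One table per device. */
struct fp_table {
    struct fp_entry *buckets[FP_HASH_SIZE];
};

static size_t fp_hash(uint64_t inode)
{
    return (size_t)(inode & (FP_HASH_SIZE - 1));
}

/* Only one short chain is walked instead of the whole list. */
struct fp_entry *fp_lookup(const struct fp_table *t, uint64_t inode)
{
    for (struct fp_entry *e = t->buckets[fp_hash(inode)]; e != NULL; e = e->next)
        if (e->inode == inode)
            return e;
    return NULL;
}

/* Returns 0 on success, -1 if the inode is already present or on ENOMEM. */
int fp_add(struct fp_table *t, uint64_t inode, const char *algorithm)
{
    struct fp_entry *e;

    if (fp_lookup(t, inode) != NULL)
        return -1;
    e = malloc(sizeof(*e));
    if (e == NULL)
        return -1;
    e->inode = inode;
    e->algorithm = algorithm;
    e->next = t->buckets[fp_hash(inode)];
    t->buckets[fp_hash(inode)] = e;
    return 0;
}
```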
user space. Add an argument `need_copyin' to only use `copyinstr()' if
the name is from user space.
modstat -n NAME works again.
Reviewed by: Peter Postma <peter@netbsd.org>
The *DISC definition is only for backward compatibility with deprecated
TIOC[GS]ETD ioctls, and not needed for new TIOC[GS]LINED ioctls.
The value of IRFRAMEDISC has never been correct, so we don't have any
compatibility to be kept.
Just remove the IRFRAMEDISC definition.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.
This does an #if 0 / #endif, so that no code (or declarations!) are
left after the first "return 1", making this compilable for vax and
playstation2 again, both of which use gcc 2.95.3 or similar.
into the "vfsops" link set.
- Use VFS_ATTACH() where vfsops are declared for individual file systems.
- In vfsinit(), traverse the "vfsops" link set, rather than vfs_list_initial[].
* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2
Tested on amd64, compile-tested on sparc64.
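The per-emulation hook for the default map address can be sketched like this. All names and constants here are hypothetical, including `struct emul_sketch`, `default_map_addr()`, `default_map_addr32()` and the `MAXDSIZ*` values. The sketch assumes the default address sits just above the maximum data segment, rounded to a page. Generic code always asks the process's emulation, so a 32-bit emulation can keep mappings below 4 GB.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical limits; real values are machine dependent. */
#define MAXDSIZ_SKETCH      UINT64_C(0x40000000)    /* 1 GB */
#define MAXDSIZ32_SKETCH    UINT64_C(0x10000000)    /* 256 MB */
#define PAGE_SIZE_SKETCH    UINT64_C(4096)

static uint64_t round_page_up(uint64_t x)
{
    return (x + PAGE_SIZE_SKETCH - 1) & ~(PAGE_SIZE_SKETCH - 1);
}

/* Native default: leave room for the largest possible data segment. */
uint64_t default_map_addr(uint64_t data_addr, uint64_t sz)
{
    (void)sz;
    return round_page_up(data_addr + MAXDSIZ_SKETCH);
}

/* 32-bit emulation: smaller limit, so mappings stay in the 32-bit range. */
uint64_t default_map_addr32(uint64_t data_addr, uint64_t sz)
{
    (void)sz;
    return round_page_up(data_addr + MAXDSIZ32_SKETCH);
}

struct emul_sketch {
    const char *e_name;
    uint64_t (*e_vm_default_addr)(uint64_t data_addr, uint64_t sz);
};

const struct emul_sketch emul_native = { "netbsd", default_map_addr };
const struct emul_sketch emul_32 = { "netbsd32", default_map_addr32 };

/* Generic code goes through the process's emulation, not a global macro. */
uint64_t pick_map_addr(const struct emul_sketch *e, uint64_t data_addr, uint64_t sz)
{
    return e->e_vm_default_addr(data_addr, sz);
}
```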
for multiple things (proccnt,lockcnt,sbsize) and it adds too much code
complexity. Instead add a uid_find() routine that returns the existing
struct or allocates a new one.
Re-enable the sbsize limit code.
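A find-or-allocate lookup, combined with a per-record lock, can be sketched with POSIX threads. The names `uid_find_sketch()`, `chgsbsize_sketch()` and `UIHASH_SIZE` are hypothetical. The sketch assumes, as described earlier for struct socketvar, that a socket keeps a pointer to its owner's uidinfo. Charging buffer space then needs only that record's lock and never a hash lookup, which matters when called from interrupt context.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>

#define UIHASH_SIZE 32      /* hypothetical */

struct uidinfo {
    uid_t ui_uid;
    long ui_sbsize;             /* socket buffer bytes charged to this uid */
    pthread_mutex_t ui_lock;    /* the per-uidinfo lock */
    struct uidinfo *ui_next;
};

static struct uidinfo *uihash[UIHASH_SIZE];
static pthread_mutex_t uihash_lock = PTHREAD_MUTEX_INITIALIZER;

/* Return the existing record for uid, or allocate and insert a new one. */
struct uidinfo *uid_find_sketch(uid_t uid)
{
    struct uidinfo **head = &uihash[uid % UIHASH_SIZE];
    struct uidinfo *ui;

    pthread_mutex_lock(&uihash_lock);
    for (ui = *head; ui != NULL; ui = ui->ui_next)
        if (ui->ui_uid == uid)
            goto out;
    ui = calloc(1, sizeof(*ui));
    if (ui != NULL) {
        ui->ui_uid = uid;
        pthread_mutex_init(&ui->ui_lock, NULL);
        ui->ui_next = *head;
        *head = ui;
    }
out:
    pthread_mutex_unlock(&uihash_lock);
    return ui;
}

/* Charge delta bytes to ui; returns 0 if that would exceed limit. */
int chgsbsize_sketch(struct uidinfo *ui, long delta, long limit)
{
    int ok = 1;

    pthread_mutex_lock(&ui->ui_lock);
    if (delta > 0 && ui->ui_sbsize + delta > limit)
        ok = 0;
    else
        ui->ui_sbsize += delta;
    pthread_mutex_unlock(&ui->ui_lock);
    return ok;
}
```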
to be allocated multiple times:
- we're allocating region of size 1
- there are holes in the extent, but all of size larger than 1
- there are 2 contiguous allocations at the end of the extent, the last one
being of size 1.
While there fix a DIAGNOSTIC check: to check that a region is inside the extent
we need to check start and end, not only start.
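The corrected containment check can be sketched as a standalone function. `struct extent_sketch` and `region_in_extent()` are hypothetical, and the sketch assumes inclusive bounds. Checking only the start would accept a region that begins inside the extent but runs past its end. The sketch also rejects a region whose end wraps around the address space.

```c
#include <assert.h>
#include <stdbool.h>

struct extent_sketch {
    unsigned long ex_start;     /* first valid address (inclusive) */
    unsigned long ex_end;       /* last valid address (inclusive) */
};

bool region_in_extent(const struct extent_sketch *ex, unsigned long start,
    unsigned long size)
{
    unsigned long end;

    if (size == 0)
        return false;
    end = start + size - 1;
    if (end < start)            /* wrapped around */
        return false;
    /* Both ends must be inside, not just the start. */
    return start >= ex->ex_start && end <= ex->ex_end;
}
```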
0. Fix it by returning the peer's block size.
XXX: This is the minimal fix. The buffer size should probably be initialized
somewhere else, but that would likely need more code changes.
net.local.stream.pcblist
net.local.dgram.pcblist
net.inet.tcp.pcblist
net.inet.udp.pcblist
net.inet.raw.pcblist
net.inet6.tcp6.pcblist
net.inet6.udp6.pcblist
net.inet6.raw6.pcblist
which allow retrieval of the pcbs in use for those protocols. The
struct involved is 32/64 bit clean and incorporates parts of struct
inpcb, struct unpcb, a bit of struct tcpcb, and two socket addresses.
kern.file, namely a 32/64 bit clean sysctl interface to the same data.
It also borrows a few things from struct vnode (if applicable) and
from struct proc, just to tie things together a bit more.
You can walk this list "by file" or "by pid". The former method is
similar to kern.file but omits the filehead, and the latter can give
you duplicates if multiple processes have the same struct file open,
but tells you which process it is.
broken. Inside the kernel, we always have to use the real values of the
st_name fields, and only do the math when the request comes from userland.
No need for ksyms_getval_from{kernel,userland} hack anymore. However, a
different version will be asked for pull-up in -2{,-0}, one that doesn't
break the API, that is.
Fixes PR#29133 from Jens Kessmeier.
define and use vm_map_set{min,max}() for modifying these values.
remove the {min,max}_offset aliases for these vm_map fields to be more
namespace-friendly. PR 26475.
section headers. We only allocate memory for those headers on compat_linux
and compat_ibcs2 while we probe, and although 32K is not such a big number,
we could fix the code in those two places to read section-by-section instead
of all the sections at once as it does now, if we really felt like it.
doing a context switch. use this on sparc and sparc64 to avoid trying
to access user memory (writing the register windows back to the stack)
in this case (since it's both unnecessary and wrong).
compare fd and fdp->fd_lastfile in fdrelease(), so change the test to a
more explicit one. Spotted by Matt Thomas.
Should fix the panic reported by Matthias Scheler.
Hence, make find_last_set return -1 in such a situation, and initialize it
as such. Otherwise, with 0 meaning two things, it confused the F_CLOSEM
fcntl, which could end up looping indefinitely (PR#28929 by Brian Marcotte).
However, this change exposes another bug in fdcopy(), where more entries
than needed were cleared in the new file descriptor table, so the memset()
call there is fixed too.
Analyzed with the help of Greg Oster.
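Returning -1 keeps "no bit set" distinct from "only bit 0 set", which both used to come back as 0. A standalone sketch over a plain bitmap. `bitmap_last_set()` is a hypothetical name with its own signature, not the kernel's find_last_set.

```c
#include <assert.h>
#include <limits.h>

#define WORD_BITS ((int)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Return the index of the highest set bit in a bitmap of nwords words,
 * or -1 if no bit is set at all.
 */
int bitmap_last_set(const unsigned long *bitmap, int nwords)
{
    for (int w = nwords - 1; w >= 0; w--) {
        unsigned long word = bitmap[w];

        if (word != 0) {
            int bit = WORD_BITS - 1;
            while ((word & (1UL << bit)) == 0)
                bit--;
            return w * WORD_BITS + bit;
        }
    }
    return -1;
}
```

A close-everything loop such as `for (fd = last; fd >= lowfd; fd--)` then does nothing when `last` is -1, instead of treating descriptor 0 as open.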
can in some situations exceed the high-water mark, and stay there once it
gets there. Adjust the canrelease function so that it will immediately
bring us back down to the high-water mark in this situation.
How can this happen at all? Consider a machine with two filesystems, one
with a much larger blocksize than the other. If the small-block filesystem
is very busy, growing the cache up to the high-water mark, and then the
large-block filesystem becomes busy, buffers will be recycled (since we
are at the high-water mark) but _grow each time they're recycled_. Once
we're above the high-water mark, the canrelease call in allocbuf (without
this change) doesn't shrink us back down below it; so things get worse and
worse.
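The adjusted rule can be reduced to one function. This is a hypothetical simplification: `buf_canrelease()` only models the above-the-mark case and returns 0 at or below the high-water mark, which is an assumption of the sketch rather than the real function's behavior. Once usage exceeds the mark, it releases the whole excess at once (capped by the buffer's size) instead of letting recycled, growing buffers push usage higher.

```c
#include <assert.h>

/*
 * bufmem:  total bytes currently held by the buffer cache
 * hiwater: high-water mark
 * bufsize: size of the buffer being considered for shrinking
 * Returns how many bytes may be released from this buffer.
 */
long buf_canrelease(long bufmem, long hiwater, long bufsize)
{
    long excess = bufmem - hiwater;

    if (excess <= 0)
        return 0;               /* at or under the mark: nothing forced */
    /* Bring usage straight back down to the mark if this buffer can. */
    return excess < bufsize ? excess : bufsize;
}
```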
and just passes it on to the file system functions. This avoids opening and
closing the device several times.
Mentioned on tech-kern some time ago, IIRC. I've been running this for a
long time.
do not leak siginfo structures.
Note that in the cases of trap signals and timer events, losing this
information could be very bad; right now it will cause us to spin until the
process is SIGKILLed.
"Needs work."
if it's specified, don't use free items as storage for internal state,
so that we can use pools for non-memory-backed objects.
Inspired by Solaris's KMC_NOTOUCH.
the code block, so that the purpose is more clear
avoid NULL pointer dereference in lkmunreserve() called on lkm device
close when ksym_addsymtab() fails for the temporary symbol table
(sanity change only, this can never happen at the moment)
* only add the symbol table for the current module if the LKM_E_LOAD
hook returns success; otherwise we overwrite the LKM_E_LOAD error,
which may ultimately lead to (incorrectly) allowing the module load
* only delete the symbol table for the current module if we actually added
the symbol table; avoids deleting symbol table of previously loaded
module when the same module is loaded twice, when the second load
fails with EEXIST
fixes PR kern/28803 by Jens Kessmeier
header files, so that they don't become out of sync (again).
- Use bitmask_snprintf() instead of hand-rolled code.
- Always check array bounds before dereferencing print arrays.
- Order arguments in the vnode printing functions consistently.
1. make fileops const
2. add 2 new negative errno's to `officially' support the cloning hack:
- EDUPFD (used to overload ENODEV)
- EMOVEFD (used to overload ENXIO)
3. Created an fdclone() function to encapsulate the operations needed for
EMOVEFD, and made all cloners use it.
4. Centralize the local noop/badop fileops functions to:
fnullop_fcntl, fnullop_poll, fnullop_kqfilter, fbadop_stat
do { ... } while(/*CONSTCOND*/0)
so that they can be used unadorned in if/else blocks, etc. This means
that you now *have* to put a ; at the end of the "call" to these
macros.
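The reason for the wrapper is easiest to see in an if/else. `SWAP_INTS` and `order_pair()` are hypothetical examples, not macros from the tree. With a bare `{ ... }` body, `SWAP_INTS(a, b);` followed by `else` would be a syntax error. With `do { ... } while (0)`, the macro is a single statement that requires the trailing `;`.

```c
#include <assert.h>

#define SWAP_INTS(a, b) do {            \
        int swap_tmp_ = (a);            \
        (a) = (b);                      \
        (b) = swap_tmp_;                \
} while (/*CONSTCOND*/0)

/* Returns 1 if the pair had to be swapped into order, 0 otherwise. */
int order_pair(int *lo, int *hi)
{
    if (*lo > *hi)
        SWAP_INTS(*lo, *hi);    /* the ; is now mandatory */
    else
        return 0;
    return 1;
}
```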
the code that `knows' about /dev/[pt]tyXX names (the BSD ptys) into a separate
file. Make an interface to be used by the tty creating provider. The code
to enable old PTY searching via ptm is enabled via COMPAT_BSDPTY, and it
is turned on by default on all kernels that have compatibility options enabled.
the number of bytes in the send queue, and FIONSPACE reports the
number of free bytes in the send queue. These ioctls permit applications
to monitor file descriptor transmission dynamics.
In examining prior art, FIONWRITE exists with the semantics given
here. FIONSPACE is provided so that programs may easily determine how
much space is left in the send queue; they do not need to know the
send queue size.
The fact that a write may block even if there is enough space in the
send queue for it is noted in the documentation.
FIONWRITE functionality may be used to implement TIOCOUTQ for Linux
emulation - Linux extended this ioctl to sockets, even though they are
not ttys.
- Add a booted_wedge variable that indicates the wedge that was booted
from. If this is NULL, booted_partition is consulted.
- Adjust setroot() and its support routines for root-on-wedges. Could
use some tidy-up, but this works for now.
necessary information to create the pseudo-device instance. Pseudo-device
devices will reference this cfdata, just as normal devices reference
their corresponding cfdata.
Welcome to 2.99.10.
sense, since 1) the condition is quite normal and 2) there is a
pool between us and uvm.
- Make the allocation-possibility step a bit smoother by moving the origin
of the curve from 0 to the lowater mark.
Simon reported that this helps interactive performance when there is heavy
disk activity, in PR#27057.
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.
while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.
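A list walk that skips marker entries can be sketched with a hand-rolled singly linked list. `struct proc_sketch`, its `p_marker` field and `PROCLIST_FOREACH_SKETCH` are hypothetical stand-ins for the real proclist and macro. A marker is a placeholder an iterator leaves in the list to remember its position, so ordinary walkers must not treat it as a process.

```c
#include <assert.h>
#include <stddef.h>

struct proc_sketch {
    int p_pid;
    int p_marker;               /* nonzero: iterator bookkeeping entry */
    struct proc_sketch *p_next;
};

/* Like a plain list walk, but the body only runs for real entries. */
#define PROCLIST_FOREACH_SKETCH(p, head)                        \
    for ((p) = (head); (p) != NULL; (p) = (p)->p_next)          \
        if (!(p)->p_marker)

int count_procs(struct proc_sketch *head)
{
    struct proc_sketch *p;
    int n = 0;

    PROCLIST_FOREACH_SKETCH(p, head)
        n++;
    return n;
}

int sum_pids(struct proc_sketch *head)
{
    struct proc_sketch *p;
    int sum = 0;

    PROCLIST_FOREACH_SKETCH(p, head)
        sum += p->p_pid;
    return sum;
}
```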
kernel message buffer/log. It's off by default and can be switched on in the
kernel configuration at build time, set as a variable in ddb, or set
using sysctl.
This adds the sysctl value
ddb.tee_msgbuf = 0
by default.
This functionality is aimed especially at developers who are not
blessed with a serial console and wish to keep all their ddb output in the
log. Specifying /l as a modifier to some selected commands will also put
the output in the log, but not all commands provide it, nor does it have the
same meaning for all commands.
This feature could in the future also be implemented as a ddb command, but
that could lead to more bloat, although it might be easier for non-developers
to use when mailing their backtraces from kernel crashes.
segment should succeed even if the segment would be marked removed; use this
to implement the Linux-compatible semantics of shmat(2)
this fixes the old Linux VMware3 graphics problem with local display,
and possibly other local Linux X clients using MIT-SHM
for Linux-compatible shmat() behaviour - shmat() for the removed shared memory
segment must work from all callers, the shared memory id could be passed e.g.
to native X server via MIT-SHM
temporarily remove the functionality, the Linux-compatible semantics
will be reimplemented differently
the reset condition are processed properly; this fixes PR#26687 by
Jan Schaumann
many thanks to Mark Davies, who tracked the offending change down
and helped test patches
while here, g/c unused sigtrapmask and rearrange some code to pre-r1.190 form
for better readability
calls to ensure that the vnode lock state is as expected when the VOP
call is made. Modify vnode_if.src to set the expected state according
to the documenting lock table for each VOP. Modify vnode_if.sh to emit
the checks.
Notes:
- The checks are only performed if the vnode has the VLOCKSWORK bit
set. Some file systems (e.g. specfs) don't even bother with vnode
locks, so of course the checks will fail.
- We can't actually run with VNODE_LOCKDEBUG because there are so many
vnode locking problems, not the least of which is the "use SHARED for
VOP_READ()" issue, which screws things up for the entire call chain.
Inspired by similar changes in OpenBSD, but implemented differently.
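A generated check might look roughly like the following. Everything here is hypothetical: `VLOCKSWORK_SKETCH`, `struct vnode_sketch`, `assert_vop_lock()` and `VOP_FSYNC_sketch()`. It also assumes a table entry that says the vnode must be locked on entry. Vnodes whose file system does not set the lock-works flag (specfs, for example) are skipped. Violations are counted here, whereas a kernel would more likely panic.

```c
#include <assert.h>

#define VLOCKSWORK_SKETCH 0x01

enum lk_sketch { LKS_UNLOCKED, LKS_LOCKED };

struct vnode_sketch {
    int v_flag;
    enum lk_sketch v_lock;
};

static int lock_violations;

/* Only vnodes that claim working locks are checked. */
static void assert_vop_lock(const struct vnode_sketch *vp, enum lk_sketch want)
{
    if ((vp->v_flag & VLOCKSWORK_SKETCH) == 0)
        return;
    if (vp->v_lock != want)
        lock_violations++;
}

/* Wrapper a generator might emit for a VOP that expects the vnode locked. */
int VOP_FSYNC_sketch(struct vnode_sketch *vp)
{
    assert_vop_lock(vp, LKS_LOCKED);
    return 0;   /* would dispatch to the file system here */
}
```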