of fingerprinting algorithms to the ops vector.
- Cleanup in veriexec_add_fp_name().
- Remove veriexec_default_ops and use the above API for adding the default
methods in veriexec_init_fp_ops().
When a table is created for a new device, a new variable is created
under the kern.veriexec.count node named "dev_<id>". For example,
dev_0, dev_3, etc.
(fdp->fd_lastfile - i) against fd_knlistsize. Otherwise we can
call knote_fdclose() on a file descriptor that doesn't have a knote.
This issue explains random panics I have had on process exit over the
past few years.
New features:
- Add a veriexec_report() routine to make most reporting consistent and
remove some common code.
- Add 'strict' mode that controls how veriexec behaves.
- Add sysctl knobs:
o kern.veriexec.verbose controls verbosity levels. Value: 0, 1.
o kern.veriexec.strict controls strict level. Values: 0, 1, 2. See
documentation in sysctl(3) for details.
o kern.veriexec.algorithms returns a string with a space separated
list of supported hashing algorithms in veriexec.
- Updated documentation in man pages for sysctl(3) and sysctl(8).
Bug fixes:
- veriexec_removechk(): Code cleanup + handle FINGERPRINT_NOTEVAL
correctly.
- exec_script(): Don't pass 0 as flag when executing a script; use the
defined VERIEXEC_INDIRECT - which is 1. Makes indirect execution
enforcement work.
- Fix some printing formats and types..
chgsbsize() as mentioned in the PR can be called from an interrupt context
via tcp_close(). Avoid calling uid_find() in chgsbsize().
- Instead of storing so_uid in struct socketvar, store *so_uidinfo
- Add a simple lock to struct uidinfo.
where the printing of `version' is already performed.
This has the benefit of allowing the copyright to be available
via dmesg(8) on platforms which need the `msgbuf' to be setup
in cpu_startup() before printed output is remembered.
* We now use hash tables instead of a list to store the in kernel
fingerprints.
* Fingerprint methods handling has been made more flexible, it is now
even simpler to add new methods.
* the loader no longer passes in magic numbers representing the
fingerprint method so veriexecctl is not longer kernel specific.
* fingerprint methods can be tailored out using options in the kernel
config file.
* more fingerprint methods added - rmd160, sha256/384/512
* veriexecctl can now report the fingerprint methods supported by the
running kernel.
* regularised the naming of some portions of veriexec.
user space. Add an argument `need_copyin' to only use `copyinstr()' if
the name is from user space.
modstat -n NAME works again.
Reviewed by: Peter Postma <peter@netbsd.org>
The *DISC definition is only for backward compatibility with deprecated
TIOC[GS]ETD ioctls, and not needed for new TIOC[GS]LINED ioctls.
The value of IRFRAMEDISC has never been correct, so we don't have any
compatibility to be kept.
Just remove the IRFRAMEDISC defintion.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.
This does an #if 0 / #endif, so that no code (or declarations!) are
left after the first "return 1", making this compilable for vax and
playsation2 again, both of which use gcc 2.95.3 or similar.
into the "vfsops" link set.
- Use VFS_ATTACH() where vfsops are declared for individual file systems.
- In vfsinit(), traverse the "vfsops" link set, rather than vfs_list_initial[].
* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2
Tested on amd64, compile-tested on sparc64.
for multiple things (proccnt,lockcnt,sbsize) and it adds too much code
complexity. Instead add a uid_find() routine that returns the existing
struct or allocates a new one.
Re-enable the sbsize limit code.
to be alloctated multiple times:
- we're allocating region of size 1
- there are holes in the extent, but all of size larger than 1
- there are 2 contigous allocations at the end of the extent, the last one
being of size 1.
While there fix a DIAGNOSTIC check: to check that a region is inside the extent
we need to check start and end, not only start.
0. Fix it by returning the peer's block size.
XXX: This is the minimal fix. Probably the buffer size should be initialized
somewhere else, but probably this would need some more code changes.
net.local.stream.pcblist
net.local.dgram.pcblist
net.inet.tcp.pcblist
net.inet.udp.pcblist
net.inet.raw.pcblist
net.inet6.tcp6.pcblist
net.inet6.udp6.pcblist
net.inet6.raw6.pcblist
which allow retrieval of the pcbs in use for those protocols. The
struct involved is 32/64 bit clean and incorporates parts of struct
inpcb, struct unpcb, a bit of struct tcpcb, and two socket addresses.
kern.file, namely a 32/64 bit clean sysctl interface to the same data.
It also borrows a few things from struct vnode (if applicable) and
from struct proc, just to tie things together a bit more.
You can walk this list "by file" or "by pid". The former method is
similar to kern.file but omits the filehead, and the latter can give
you duplicates if multiple processes have the same struct file open,
but tells you which process it is.
broken. Inside the kernel, we always have to use the real values of the
st_name fields, and only do the math when the request comes from userland.
No need for ksyms_getval_from{kernel,userland} hack anymore. However, a
different version will be asked for pull-up in -2{,-0}, one that doesn't
break the API, that is.
Fixes PR#29133 from Jens Kessmeier.
define and use vm_map_set{min,max}() for modifying these values.
remove the {min,max}_offset aliases for these vm_map fields to be more
namespace-friendly. PR 26475.
section headers. We only allocate memory for those headers on compat_linux
and compat_ibcs2 while we probe, and although 32K is not such a big number,
we could fix the code in those two places to read section-by-section instead
of all the sections at once as it does now, if we really felt like it.
doing a context switch. use this on sparc and sparc64 to avoid trying
to access user memory (writing the register windows back to the stack)
in this case (since it's both unnecessary and wrong).
compare fd and fdp->fd_lastfile in fdrelease(), so change the test to a
more explicit one. Spotted by Matt Thomas.
Should fix the panic reported by Matthias Scheler.
Hence, make find_last_set return -1 in such situation, and initialize it
such. Otherwise, with 0 meaning two things, it confused the F_CLOSEM
fcntl which could end up looping indifintely (PR#28929 by Brian Marcotte).
However, this change enlightens another bug in fdcopy(), where more entries
than needed were cleared in the new file descriptor table, so the memset()
call there is fixed too.
Analyzed with the help of Greg Oster.
can in some situations exceed the high-water mark, and stay there once it
gets there. Adjust the canrelease function so that it will immediately
bring us back down to the high-water mark in this situation.
How can this happen at all? Consider a machine with two filesystems, one
with a much larger blocksize than the other. If the small-block filesystem
is very busy, growing the cache up to the high-water mark, and then the
large-block filesystem becomes busy, buffers will be recycled (since we
are at the high-water mark) but _grow each time they're recycled_. Once
we're above the high-water mark, the canrelease call in allocbuf (without
this change) doesn't shrink us back down below it; so things get worse and
worse.
and just passes it on to the file system functions. This avoids opening and
closing the device several times.
Mentioned on tech-kern some time ago, IIRC. I've been running this for a
long time.
do not leak siginfo structures.
Note that in the cases of trap signals and timer events, losing this
information could be very bad; right now it will cause us to spin until the
process is SIGKILLed.
"Needs work."
if it's specified, don't use free items as storage for internal state.
so that we can use pools for non memory backed objects.
inspired from solaris's KMC_NOTOUCH.
the code block, so that the purpose is more clear
avoid NULL pointer dereference in lkmunreserve() called on lkm device
close when ksym_addsymtab() fails for the temporary symbol table
(sanity change only, this can never happen at the moment)
* only add the symbol table for the current module if the LKM_E_LOAD
hook returns success; otherwise we overwrite the LKM_E_LOAD error,
which may ultimately lead to (incorrectly) allowing the module load
* only delete the sumbol table for current module if we actually added
the symbol table; avoids deleting symbol table of previously loaded
module when the same module is loaded twice, when the second load
fails with EEXIST
fixes PR kern/28803 by Jens Kessmeier
header files, so that they don't become out of sync (again).
- Use bitmask_snprintf() instead of hand-rolled code.
- Always check array bounds before dereferencing print arrays.
- Order arguments in the vnode printing functions consistently.
1. make fileops const
2. add 2 new negative errno's to `officially' support the cloning hack:
- EDUPFD (used to overload ENODEV)
- EMOVEFD (used to overload ENXIO)
3. Created an fdclone() function to encapsulate the operations needed for
EMOVEFD, and made all cloners use it.
4. Centralize the local noop/badop fileops functions to:
fnullop_fcntl, fnullop_poll, fnullop_kqfilter, fbadop_stat
do { ... } while(/*CONSTCOND*/0)
so that they can be used unadorned in if/else blocks, etc. This means
that you now *have* to put a ; at the end of the "call" to these
macros.
the code that `knows' about /dev/[pt]tyXX names (the BSD ptys) into a separate
file. Make an interface to be used by the tty creating provider. The code
to enable old PTY searching via ptm is enabled via COMPAT_BSDPTY, and it
is turned on by default on all kernels that have compatibility options enabled.
the number of bytes in the send queue, and FIONSPACE reports the
number of free bytes in the send queue. These ioctls permit applications
to monitor file descriptor transmission dynamics.
In examining prior art, FIONWRITE exists with the semantics given
here. FIONSPACE is provided so that programs may easily determine how
much space is left in the send queue; they do not need to know the
send queue size.
The fact that a write may block even if there is enough space in the
send queue for it is noted in the documentation.
FIONWRITE functionality may be used to implement TIOCOUTQ for Linux
emulation - Linux extended this ioctl to sockets, even though they are
not ttys.
- Add a booted_wedge variable that indicates the wedge that was booted
from. If this is NULL, booted_partition is consulted.
- Adjust setroot() and its support routines for root-on-wedges. Could
use some tidy-up, but this works for now.
necessary information to create the pseudo-device instance. Pseudo-device
device's will reference this cfdata, just as normal devices reference
their corresponding cfdata.
Welcome to 2.99.10.
sense, since 1) the condition is quite normal condition and 2) there is
pool between us and uvm.
- Make the step of allocation possibility a bit seamless by moving the origin
of curve from 0 to lowater mark.
Simon told that this helps for interactive performance when there is heavy
disk activity in PR#27057.
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.
while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.
kernel message buffer/log. Its off by default and can be switched on in the
kernel configuration on build time, be set as a variable in ddb and be set
using sysctl.
This adds the sysctl value
ddb.tee_msgbuf = 0
by default.
The functionality is especially added and aimed for developers who are not
blessed with a serial console and wish to keep all their ddb output in the
log. Specifying /l as a modifier to some selected commands will also put
the output in the log but not all commands provide one nor has the same
meaning for all commands.
This feature could in the future also be implemented as an ddb command but
that could lead to more bloat allthough maybe easier for non developpers to
use when mailing their backtraces from kernel crashes.
segment should succeed even if the segment would be marked removed; use this
to implement the Linux-compatible semantics of shmat(2)
this fixes the old Linux VMware3 graphics problem with local display,
and possibly other local Linux X clients using MIT-SHM
for Linux-compatible shmat() behaviour - shmat() for the removed shared memory
segment must work from all callers, the shared memory id could be passed e.g.
to native X server via MIT-SHM
temporarily remove the functionality, the Linux-compatible semantics
will be reimplemented differently
the reset condition are processed properly; this fixes PR#26687 by
Jan Schaumann
many thanks to Mark Davies, who tracked the offending change down
and helped test patches
while here, g/c unused sigtrapmask and rearrange some code to pre-r1.190 form
for better readability
calls to ensure that the vnode lock state is as expected when the VOP
call is made. Modify vnode_if.src to set the expected state according
to the documenting lock table for each VOP. Modify vnode_if.sh to emit
the checks.
Notes:
- The checks are only performed if the vnode has the VLOCKSWORK bit
set. Some file systems (e.g. specfs) don't even bother with vnode
locks, so of course the checks will fail.
- We can't actually run with VNODE_LOCKDEBUG because there are so many
vnode locking problems, not the least of which is the "use SHARED for
VOP_READ()" issue, which screws things up for the entire call chain.
Inspired by similar changes in OpenBSD, but implemented differently.
- Clean up the namespace of this module and enable the encode/decode
functions and printing functions.
- Move the code that actually generates the UUID out of the system call
routine and into its own function.
> Call dom_dispose() for any SCM_RIGHTS message that went through the
> read path rather than recv. Previously, if an fd was passed via
> sendmsg() but was consumed by the receiver via read() the ref count
> was incremented and never decremented and so the ref count would
> never reach zero even when there was no long any processes holding
> the file open (this was especially bad for locked fds).
loadable drivers and user controlled attach/detach of devices.
An outline was given in
http://mail-index.NetBSD.org/tech-kern/2004/08/11/0000.html
To cite the relevant parts:
-Add a "child detached" and a "rescan" method (both optional)
to the device driver. (This is added to the "cfattach" for now
because this is under the driver writer's control. Logically
it belongs more to the "cfdriver", but this is automatically
generated now.)
The "child detached" is called by the autoconf framework
during config_detach(), after the child's ca_detach()
function was called but before the device data structure
is freed.
The "rescan" is called explicitely, either after a driver LKM
was loaded, or on user request (see the "control device" below).
-Add a field to the device instance where the "locators" (in
terms of the autoconf framework), which describe the actual
location of the device relatively to the parent bus, can be
stored. This can be used by the "child detached" function
for easier bookkeeping (no need to lookup by device instance
pointer). (An idea for the future is to use this for generation
of optimized kernel config files - like DEC's "doconfig".)
-Pass the locators tuple describing a device's location to
various autoconf functions to support the previous. And since
locators do only make sense in relation to an "interface
attribute", pass this as well.
-Add helper functions to add/remove supplemental "cfdata"
arrays. Needed for driver LKMs.
There is some code duplication which will hopefully resolved
when all "submatch"-style functions are changed to accept the
locator argument.
Some more cleanup can take place when config(8) issues more
information about locators, in particular the length and default
values. To be done later.
* Rather than using mnt_maxsymlinklen to indicate that a file systems returns
d_type fields(!), add a new internal flag, IMNT_DTYPE.
Add 3 new elements to ufsmount:
* um_maxsymlinklen, replaces mnt_maxsymlinklen (which never should have existed
in the first place).
* um_dirblksiz, which tracks the current directory block size, eliminating the
FS-specific checks littered throughout the code. This may be used later to
make the block size variable.
* um_maxfilesize, which is the maximum file size, possibly adjusted lower due
to implementation issues.
Sync some bug fixes from FFS into ext2fs, particularly:
* ffs_lookup.c 1.21, 1.28, 1.33, 1.48
* ffs_inode.c 1.43, 1.44, 1.45, 1.66, 1.67
* ffs_vnops.c 1.84, 1.85, 1.86
Clean up some crappy pointer frobnication.
* Process A is closing one file descriptor belonging to a device. In doing so,
ffs_update() is called and starts writing a block synchronously. (Note: This
leaves the vnode locked. It also has other instances -- stdin, et al -- of
the same device open, so v_usecount is definitely non-zero.)
* Process B does a revoke() on the device. The revoke() has to wait for the
vnode to be unlocked because ffs_update() is still in progress.
* Process C tries to open() the device. It wedges in checkalias() repeatedly
calling vget() because it returns EBUSY immediately.
To fix, this:
* checkalias() now uses LK_SLEEPFAIL rather than LK_NOWAIT. Therefore it will
wait for the vnode to become unlocked, but it will recheck that it is on the
hash list, in case it was in the process of being revoke()d or was revoke()d
again before we were woken up.
* Since we're relying on the vnode lock to tell us that the vnode hasn't been
removed from the hash list *anyway*, I have moved the code to remove it into
the DOCLOSE section of vclean(), inside the vnode lock.
In the example at hand, process A was sh(1), process B was a child of init(8),
and process C was syslogd(8).
from Stephan Uphoff, FreeBSD PR/69964.
(http://www.freebsd.org/cgi/query-pr.cgi?pr=69964)
> The LK_WANT_EXCL and LK_WANT_UPGRADE bits act as mini-locks and can block
> other threads.
> Normally this is not a problem since the mini locks are upgraded to full loc
> and the release of the locks will unblock the other threads.
> However if a thread reset the bits without optaining a full lock
> other threads are not awoken.
> This can happens if obtaining the full lock fails because of a LK_SLEEPFAIL,
> or a signal (if lock priority includes PCATCH .. don't think this is used).
ensure that no one else have the same lock.
a patch from Stephan Uphoff, FreeBSD PR/69934.
(http://www.freebsd.org/cgi/query-pr.cgi?pr=69934)
> Upgrading a lock does not play well together with acquiring
> an exclusive lock and can lead to two threads being
> granted exclusive access.
>
> Problematic sequence:
> Thread A acquires a previous unlocked lock in shared mode.
> Thread B tries to acquire the same lock in exclusive mode
> and blocks.
> Thread A upgrades its lock - waking up thread B.
> Thread B wakes up and also acquires the same lock as it only checks
> if the lock is not shared or if someone wants to upgrade the lock
> and not if someone already upgraded the lock to an exclusive lock.
long, not an int, and this causes "problems" on LP64be machines
(sparc64, etc). Assign the value to a temporary int and instrument
that instead. Should be fine until someone wants a message buffer
larger than two gigabytes.
This areas is called the comm pages. It is used to provide fast access to
several data and functions.
The comm pages are mapped starting at 0xffff800 (address chosed so that
absolute branch can be used, so it can be accessed even when dynamic linking
is not ready). NetBSD has the user stack here, so we need to provide a
Darwin-specific stack setup routine which sets the top of the stack at
0xbfff0000.
This implementation is not complete but it does enough to get MacOS X.3
starting again (static binaries run, dynamic binaries still have an issue).
in the comm pages functions, we only implement bcopy, pthread_self and
memcpy.
TODO:
- clean up the powerpc specific code from MD parts
- for now we map only one page to avoid a crash, we want two pages.
- write all the comm functions.