- Don't need to count anonpages+filepages any more; clean+unknown+dirty for
each kind of page can be summed to get the totals.
- Track the number of free pages with a counter so that it's one less thing
for the allocator to do, which opens up further options there.
- Remove cpu_count_sync_one(). It has no users and doesn't save a whole lot.
For the cheap option, give cpu_count_sync() a boolean parameter indicating
that a cached value is okay, and rate limit the updates for cached values
to hz.
cached value will do, or if the very latest total must be fetched. It can
be called thousands of times a second and fetching the totals impacts not
only the calling LWP but other CPUs doing unrelated activity in the VM
system.
For linux_sys_sched_setaffinity, pid == 0 means the current thread.
On the other hand, for our native sys_sched_setaffinity, lid == 0
means all lwp's that belong to the process.
exist in different namespace depending on FUTEX_PRIVATE_FLAG. This appears
not to be the case in Linux, and some futex users will mix private and non-
private ops on the same futex object. Provide a convenience wrapper that
puts this logic in one place witn a comment explaining why.
While here, move the Linux futex wrapper out of its own file and plop
it in linux_misc.c, which is where it lives in the linux32 module.
own LWP ID space, LWP IDs came from the same number space as PIDs. The
lead LWP of a process gets the PID as its LID. If a multi-LWP process's
lead LWP exits, the PID persists for the process.
In addition to providing system-wide unique thread IDs, this also lets us
eliminate the per-process LWP radix tree, and some associated locks.
Remove the separate "global thread ID" map added previously; it is no longer
needed to provide this functionality.
Nudged in this direction by ad@ and chs@.
which relied on taking extra vnode refs.
Having benchmarked various experimental changes over the past few months it
seems that it's better to avoid vnode refs as much as possible. cwdi_lock
as a RW lock already did that to some extent for getcwd() and will permit
the same for namei() too.
The uninitalized field in this structure is `fillers`, an array that
simply reserves space for later changes in OSSv4, which this version
of the OSS compat layer (specifically for Linux applications) makes no
effort to implement.
when allocating a PID.
- Per above, proc_free_pid() no longer decrements nprocs. It's now done
in proc_free() right after proc_free_pid().
- Ensure nprocs is accessed using atomics everywhere.
The OSSv4 spec says we shouldn't really error if an invalid format is
chosen by an application. Things are especially likely to be confused
if we return MULAW, since in OSSv4 terms that means that's the native
hardware format. Instead, set and return the current hardware format
if an invalid format is chosen.
For the 24-bit sample formats, note that the NetBSD kernel currently
can't handle them in its default configuration, and will return an error
code if you attempt to use them. So, if an applicaton requests 24-bit PCM,
promote it to 32-bit PCM. According to the spec, this is valid and
applications should be checking the return value anyway.
In the Linux compat layer, we just use S16LE as a fallback. The OSSv3
headers that are still being shipped with Linux don't contain definitions
for fancier formats and we can reasonably expect all applications to
support S16LE.
From the perspective of reading the OSSv4 specification, NetBSD's
behaviour when an invalid sample rate is set makes no sense at all:
AUDIO_SETINFO simply returns an error code, and then we immediately
fall through to getting the sample rate, which is still set to the
legacy default of 8000 Hz.
Instead, what OSS applications generally expect is that they will be
able to receive the actual hardware sample rate. This is very, very
unlikely to be 8000 Hz on a modern machine.
No functional change when setting a sample rate between the supported
rates of 1000 and 192000 Hz. When a rate outside this range is requested,
the hardware rate is returned (on modern hardware, generally always 48000
Hz or a multiple of 48000 Hz).
identifier uniquely identifies an LWP across the entire system, and will
be used in future improvements in user-space synchronization primitives.
(Test disabled and libc stub not included intentionally so as to avoid
multiple libc version bumps.)
the set-up functions will be called, so it is perfectly acceptable
for a compat code's routine to be called ahead of the code in other
parts of the kernel.
So make sure that the 2nd level sysctl node ``vfs.generic'' exists
before trying to add the 3rd level entries.
XXX Rather than creating the 2nd level node in two places, we could
XXX add the shared ``vfs.generic'' node to sysctl_init_base.c but
XXX this is left for another day.
automate installation of sysctl nodes.
Note that there are still a number of device and pseudo-device modules
that create entries tied to individual device units, rather than to the
module itself. These are not changed.
of the structs themselves, so they need special handling. Undo previous
and do the permissions checks explicitly. It would be better to fix the
clockctl ioctls to contain the structs themselves...
functions: preempt_point() and preempt_needed().
- preempt(): if the LWP has exceeded its timeslice in kernel, strip it of
any priority boost gained earlier from blocking.
the module_hook code. Otherwise, if the syscall just happens to be exit()
we will exit while still holding a reference to the hook's localcount, and
nothing will ever release that reference. Attempts to manually unload the
module will hang indefinitely, as will modstat(8).
XXX pullup-9
of that syscall will return ERESTART. For amd64's netbsd32_syscall()
that means we need to back up the PC saved in the trap frame so we can
re-issue the syscall instruction. For "normal" syscall traps, we saved
the instruction length in the trap frame, but this was missing for the
oosyscall/lcall path. Since the PC was not backed up, the kernel-only
value ERESTART was returned to userland, causing all sort of grief for
old compat_netbsd32 executables!
XXX Pullup-9
Add the two missing errno.h constants: EOWNERDEAD and ENOTRECOVERABLE.
While technically they're used for robust mutexes which we do not
support at the moment, they are listed in POSIX and used by libc++.
While libc++ can be made to build without it, it just locally redefines
the values then, so we may as well define them globally.
by calling their compat_43 equivalents. With these changes, and with
built-in versions of COMPAT_NETBSD32, COMPAT_NOMID, and COMPAT_09, I can
now run a netbsd-0.9 statically linked i386 (32-bit) version of /bin/ls
on a 9.99.x amd64 host!
Addresses PR kern/55047 but more changes coming to handle non-built-in
modules.
XXX pullup-9
- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).
- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.
- Make cwdinfo use mostly lockless.
Exceptions: when fd_refcnt <= 1, or when holding fd_lock.
While here:
- Restore KASSERT(mutex_owned(&fdp->fd_lock)) in fd_unused.
=> This is used only in fd_close and fd_abort, where it holds.
- Move bounds check assertion in fd_putfile to where it matters.
- Store fd_dt with atomic_store_release.
- Move load of fd_dt under lock in knote_fdclose.
- Omit membar_consumer in fdesc_readdir.
=> atomic_load_consume serves the same purpose now.
=> Was needed only on alpha anyway.
single threaded case. Replace scans of p->p_lwps with lookups in the
tree. Find free LIDs for new LWPs in the tree. Replace the hashed sleep
queues for park/unpark with lookups in the tree under cover of a RW lock.
- lwp_wait(): if waiting on a specific LWP, find the LWP via tree lookup and
return EINVAL if it's detached, not ESRCH.
- Group the locks in struct proc at the end of the struct in their own cache
line.
- Add some comments.
make rump happy.
Rump doesn't have compat modules (the compat code is included in the
relevant librump*.so), so there's no module compat_50 listed in
link_set_modules, and thus ocryptodev's MODULE(...) can't "require"
it.
This fixes the problem of "built-in module compat_50 not found" when
starting up rump_allserver (or rump_server with -l rumpdev_opencrypto).
XXX This does not resolve the long-standing "crypto: unable to
XXX register devsw, error 17" message noted at line 78 of
XXX sys/rump/dev/lib/libopencrypto/opencrypto_component.c
life without its self->pt_lid being filled in.
- Fix an error path in _lwp_create(). If the new LID can't be copied out,
then get rid of the new LWP (i.e. either succeed or fail, not both).
- Mark l_dopreempt and l_nopreempt volatile in struct lwp.
stuff from the rest of the module. This allows loading of the
(main) compat_50 module on kernels that don't include ``options
QUOTA''.
Welcome to 9.99.40 !
Addresses PR kern/54875
getpid(), getuid() and getgid() used to call respectively sys_getpid(),
sys_getuid() and sys_getgid(). In the BSD4.3 compat mode there was a
fallback to call sys_getpid_with_ppid() and related functions.
In 2008 the compat ifdef was removed in sys/kern/syscalls.master r. 1.216.
For purity reasons we probably shall restore the NetBSD original behavior
and implement BSD4.3 one as a compat module, however it is not worth the
complexity.
Align the netbsd32 compat ABI to native ABI and call functions that return
two integers as in BSD4.3.
without any synchronization against changes by e.g. clock_settime().
- Replace with new getbinboottime() / getnanoboottime() / getmicroboottime()
functions (naming mirrors that of other time access functions in kern_tc.c).
It returns the (maybe-converted) value of timebasebin, which also tracks
our estimate of when the system was booted (i.e. the legacy "boottime" was
redundant).
XXX There needs to be a lockless synchronization mechanism for reading
timebasebin, but this is a problem in kern_tc.c that pre-existed these
"boottime" changes. At least now the problem is centralized in one location.
the system is shutting down or rebooting.
- Set this global in a new function called kern_reboot(), which is currently
just a basic wrapper around cpu_reboot().
- Call kern_reboot() instead of cpu_reboot() almost everywhere; a few
places remain where it's still called directly, but those are in early
pre-main() machdep locations.
Eventually, all of the various cpu_reboot() functions should be re-factored
and common functionality moved to kern_reboot(), but that's for another day.
with a zero page as argument.
MSan: Uninitialized Stack Memory In copyout() At Offset 0, Variable 'sb32' From compat_20_netbsd32_getfsstat()
MSan: Uninitialized Stack Memory In copyout() At Offset 12, Variable 'oss' From compat_43_sys_sigstack()
MSan: Uninitialized Stack Memory In copyout() At Offset 0, Variable 'sb' From compat_50_netbsd32___fhstat40()
overflow. On my test build at least, by luck, the compiler orders the
variables in a way that the overflow hits only local structures which
haven't yet been initialized and used, so the overflow is harmless.
Very easily seeable with kASan - just invoke the syscall from a 32bit
binary.
PT_LWPINFO is a legacy ptrace(2) operation that was originally intended
to retrieve the thread (LWP) information inside a traced process.
It has a number of flaws and is confused with PT_LWPINFO from FreeBSD.
PT_LWPSTATUS and PT_LWPNEXT address the problems (shortly by: rename,
removal of pl_event) and introduces new features: signal context
(pl_sigpend, pl_sigmask), LWP name (pl_name), LWP TLS base address
(pl_private). The private pointer was so far missing information for
a debugger.
PT_LWPSTATUS@nnn is now shipped with core(5) files and contain LWP specific
information, so far missed in the core(5) files.
PT_LWPSTATUS retrieves LWP information for the prompted thread.
PT_LWPNEXT retrieves LWP information for the next thread, borrowing the
semantics from NetBSD specific PT_LWPINFO.
PT_LWPINFO is namespaced with __LEGACY_PT_LWPINFO and still available for
the foreseeable future, without plans of removing it.
Add ATF tests for PT_LWPSTATUS + PT_LWPNEXT.
Keep ATF tests for PT_LWPINFO.
Switch GDB to new API.
Proposed on tech-kern@.
module hook, we can share a common set of synchronization structures.
This cuts the amount of cacheline_aligned data for these structures by
50%.
Note that we still have a per-hook localcount, since we need to count
individual references.
As discussed with riastradh@
Welcome to 9.99.22 !
- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.
and remove all #ifdef COREDUMP conditional compilation. Now, the
coredump module is completely separated from the emulation modules, and
they can all be independently loaded and unloaded.
Welcome to 9.99.18 !
Therefore, we must use __attribute__((__aligned__(4))) for them.
netbsd32_{,u}int64 are provided for this purpose. However, we
cannot use it in <compat/sys/siginfo.h> due to circular dependency
b/w <machine/netbsd32_machdep.h>.
In order to distangle it, we choose here to have a duplicate type,
netbsd32_siginfo_uint64, in <compat/sys/siginfo.h>. The equivalence
with netbsd32_uint64 is asserted in <compat/netbsd32/netbsd32.h>.
Now, gdb for i386 works again on amd64 kernel.
Based on patch provided by kamil. Thanks!
XXX
pullup to netbsd-9
as explained in siginfo(2).
- If it is SI_NOINFO, there's no additional information.
- If it is non-positive, i.e., codes described in siginfo(2),
we need to fill in _rt.
XXX
Description for SA_ASYNCIO in siginfo(2) seems outdated;
neither si_fd nor si_band are filled in with that code.
XXX
pullup to netbsd-9
http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/kern/kern_sig.c#rev1.358
Provide syscall information with SIGTRAP TRAP_SCE/TRAP_SCX so that
picotrace/truss, for example, works fine on COMPAT_NETBSD32.
With some minor changes:
- Centralize netbsd32_si{,32}_si{32,}() into netbsd32_ksi{,32}_ksi{32,}().
- Provide si_status with SIGCHLD.
- Remove the remaining of SA.
XXX
pullup to netbsd-9
this is needed so that glibc falls back to emulation and apps behaving
properly, since EOPNOTSUPP is a documented and expected return code, but
ENOSYS is not
right now there are no filesystems in NetBSD tree supporting the fallocate
VOP, so no point trying to map this to a native call
supposed to help with problem reported in
https://mail-index.netbsd.org/tech-kern/2019/11/03/msg025641.html