but only the processes that emulate threads (with LINUX_CLONE_THREAD
set).
This fix a problem for child processes that share address space with
the parent. At exit, the child will die silently, leaving the parent
waiting indefinitely for its end ...
Make the FreeBSD and Linux compat code convert the parameters to their
native representation and call the native routines.
Remove KAUTH_PROCESS_SCHEDULER_GET/SET.
Update documentation and examples.
XXX: For now, only the Linux compat code does the priority conversion
XXX: right.
Linux priority conversion code from yamt@, thanks!
Okay yamt@.
words, don't pass an action and a request, and just use a single action to
indicate what is the operation in question.
This is the first step in fixing PR/37986, which calls for policy/priority
checking in the secmodel code. Right now we're lacking room for another
parameter required to make a decision, and this change makes room for such.
The 64bit linux emulation versions can't be used because the lock structure
alignment and field sizes all differ.
Since there need to be 4 different versions of the linux struct flock, and
amd64 kernel needs 3 of them compiled in, rather than replicating the same
code block twice more, move the body of the code into a few #defines
that can be expanded with the correct types in the linux[32]_sys_fcntl[64]()
functions.
Should fix problems running progams like skype running under linux32
emulation on amd64.
Unravel some of the knots that caused linux_file64.c to be compiled twice
for an amd64 kernel (once for linux and once for linux32) with different
parts being skipped each time.
- SHM_STAT take an index as input, and return the corresponding shmid.
- IPC_INFO and SHM_INFO returns the highest used index.
- SHM_INFO expected the total used pages (not bytes) in shm_tot field
of struct shm_info.
- Add a KAUTH_PROCESS_SCHEDULER action, to handle scheduler related
requests, and add specific requests for set/get scheduler policy and
set/get scheduler parameters.
- Add a KAUTH_PROCESS_KEVENT_FILTER action, to handle kevent(2) related
requests.
- Add a KAUTH_DEVICE_TTY_STI action to handle requests to TIOCSTI.
- Add requests for the KAUTH_PROCESS_CANSEE action, indicating what
process information is being looked at (entry itself, args, env,
open files).
- Add requests for the KAUTH_PROCESS_RLIMIT action indicating set/get.
- Add requests for the KAUTH_PROCESS_CORENAME action indicating set/get.
- Make bsd44 secmodel code handle the newly added rqeuests appropriately.
All of the above make it possible to issue finer-grained kauth(9) calls in
many places, removing some KAUTH_GENERIC_ISSUSER requests.
- Remove the "CAN" from KAUTH_PROCESS_CAN{KTRACE,PROCFS,PTRACE,SIGNAL}.
Discussed with christos@ and yamt@.
- Move uid16 functions to their own file linux_uid16.c, included by
needed archs (arm, i386 and m68k).
- Add new MI types linux_{u,g}id16_t.
- Add macros to handle linux_uid16_t and uid_t conversions.
- Add linux_sys_getres{uid,gid}16 syscalls, to fix an overflow with
bad sizes given to copyout when linux_sys_getres{uid,gid} are used.
- Update arm syscall table to use more uid16 functions.
- Lock processes, credentials, filehead etc correctly.
- Acquire a read hold on sysctl_treelock if only doing a query.
- Don't wire down the output buffer. It doesn't work correctly and the code
regularly does long term sleeps with it held - it's not worth it.
- Don't hold locks other than sysctl_lock while doing copyout().
- Drop sysctl_lock while doing copyout / allocating memory in a few places.
- Don't take kernel_lock for sysctl.
- Fix a number of bugs spotted along the way
int foo(struct lwp *l, void *v, register_t *retval)
to:
int foo(struct lwp *l, const struct foo_args *uap, register_t *retval)
Fixup compat code to not write into 'uap' and (in some cases) to actually
pass a correctly formatted 'uap' structure with the right name to the
next routine.
A few 'compat' routines that just call standard ones have been deleted.
All the 'compat' code compiles (along with the kernels required to test
build it).
98% done by automated scripts.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.
quick consensus on tech-kern
cmsg->cmsg_len is 'size_t' not 'socklen_t' - so it is 8 bytes on 64bit
platforms instead of 4. There is also never padding after the header.
Redo linux sendmsg() so that it stands a chance of getting it right.
Redo linux recvmsg() so that it process control data directly from the mbuf
list. Allowing it to hack the data without using the stackgap.
once the 'address' has been copied into an mbuf.
Add extra flags for 'struct msghdr.msg_flags' to indicate that the address
and control are already in mbufs, and that the uio structure is in userspace
for sending data, rename sendit() to do_sys_sendmsg() to ensure no old code
passes in random flags.
Changes to compat code to use new functions - removing some stackgap use.
Fix a 'use after free' in compat_43_sys_recvmsg.
I ***THINK*** the code that converts 'cmsg' formatted data is borked!
svr4_stream.c ought to be generated from svr4_32_stream.c during the build.
compatibility with the older ioctls. This avoids stack smashing and
abuse of "struct sockaddr" when ioctls placed "struct sockaddr_foo's" that
were longer than "struct sockaddr".
XXX: Some of the emulations might be broken; I tried to add code for
them but I did not test them.
and 'rusage' without having to copy data to/from stackgap buffers.
The old split (find_stopped_child) could be removed.
amd64 seems to run netbsd32, linux and linux32 emulations. sparc64 compiles.
avoid having to allocate space in the 'stackgap'
- which is very LWP unfriendly.
The additional code for non-emulation namei() is trivial, the reduction for
the emulations is massive.
The vnode for a processes emulation root is saved in the cwdi structure
during process exec.
If the emulation root the TRYEMULROOT flag are set, namei() will do an initial
search for absolute pathnames in the emulation root, if that fails it will
retry from the normal root.
".." at the emulation root will always go to the real root, even in the middle
of paths and when expanding symlinks.
Absolute symlinks found using absolute paths in the emulation root will be
relative to the emulation root (so /usr/lib/xxx.so -> /lib/xxx.so links
inside the emulation root don't need changing).
If the root of the emulation would be returned (for an emulation lookup), then
the real root is returned instead (matching the behaviour of emul_lookup,
but being a cheap comparison here) so that programs that scan "../.."
looking for the root dircetory don't loop forever.
The target for symbolic links is no longer mangled (it used to get the
CHECK_ALT_xxx() treatment, so could get /emul/xxx prepended).
CHECK_ALT_xxx() are no more. Most of the change is deleting them, and adding
TRYEMULROOT to the flags to NDINIT().
A lot of the emulation system call stubs could now be deleted.
sys_stat() and friends, instead use do_sys_stat() and do_sys_fstat()
that write the answer into a kernel buffer (on stack) that can be
converted to the correct form and written the userspace.
I've test compiled a few kernels, and tested i386 netbsd1.6 ls.
Given I think I've fixed some bugs, it might be 50-50 with new ones.
both NPTL and old linuxthreads behaviour depending on process needs.
Apply to exit_group(), getpid() and getppid() to share them between
compat linux32 (non NPTL) and compat linux (NPTL) on amd64.
ok by manu and christos
was "correct" by luck - we don't have any other file type whose S_IF* bits
in sys/stat.h overlap with S_IFIFO.
Originally discovered by Paul Stoeber in OpenBSD.
- Fix shmat return value on amd64: it uses no black magic with retval[0]
- Fix integer overflows in sysinfo
- Implement sysinfo, mmap2, sched_getparam, sched_getscheduler, mremap,
and madvise in COMPAT_LINUX32
- Fix improper types used in setgroups16/getgroups16
- Implement mmap2 for COMPAT_LINUX32
- Ifdef debug messages by DEBUG_LINUX
Members of the thread group must die without reporting to the parent and
without going to zombie stage. We do that by reparenting to init before
catching a SIGKILL. The parent will not see the child death.
The thread group leader must report the exit status, even if it exits
because of another thread calling exit_group(). We do that by storing the
exit status in struct linux_emuldata_shared, and the exit hook has the
duty of setting struct proc's p_xstat for the thread group leader.
2) For exit/fork/exec hooks, move the NPTL specific code to separate functions
that are shared between COMPAT_LINUX and COMPAT_LINUX32
3) Fix LINUX_CLONE_PARENT_SETTID semantics
least Linux 2.4.31, Irix 6.5.20 and Solaris 10) use EAFNOSUPPORT.
Only the Linux emulation has been tested.
XXX somebody should audit the other emulations...
EPROTONOSUPPORT instead of EAFNOSUPPORT.
from pavel@ with a little bit of clean up from myself.
XXX: netbsd32 (and perhaps other emulations) should be able
XXX: to call the standard socket calls for this i think, but
XXX: revisit this at another time.
threads in a processes and kill them properly. The code is a bit too
complicated, but I could not find a simplier way of dealing with it
- Change getpid() and getppid() semantics to match what Linux does,
and implement gettid(). In the Linux kernel, threads are implemnted
as plain old processes. A thread group is just a set of processes,
with the parent called leader. Thread ID, which are returned by gettid(),
are just the PID of the plain old processes, and getpid() returns the
PID of the thread group leader.
- Remove struct linux32_emuldata. COMPAT_LINUX32 uses a lot of COMPAT_LINUX
code, where a struct linux_emuldata is assumed. By having distinct emuldata
structure with different sizes and layouts, we caused kernel memory
corruptions.
- Fix setprioriry() and getpriority()
Thanks to Nicolas Joly for tracking down the problem and providing me the
hardware to fix them.
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
Linux kernel-version as on i386 and ppc (currently 2.4.18), and a date
in Feb 2002.
On all other NetBSD platforms we return a Linux-kernel version of
2.0.38 and a date sometime in 2000, which (AFAIK) predates the
existence of amd64, and therefore predates Linux support for amd64.
To me, it makes much more sense to return the same Linux-kernel-version
and date for both 32-bit x86 and 64-bit x86.
Empirically (and not least), this change also allows SuSE 10 amd64
binaries to run under our Linux amd64 binary emulation (both static
and dynamic-linked, given suitable setup) , which they didn't when we
reported a Linux/x86_64 kernel version of 2.0.38.
- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.
to a header where they can be shared between COMPAT_LINUX and COMPAT_LINUX32
- Add termios ioctl emulation to COMPAT_LINUX32
- Add the getcwd system call to COMPAT_LINUX32/amd64
That makes Linux's bash working with COMPAT_LINUX32.
was developed as part of Google's Summer of Code 2005 program. This
change adds the kernel code, the mount_tmpfs utility, a regression test
suite and does all other related changes to integrate these.
The file-system is still *experimental*. Therefore, it is disabled by
default in all kernels. However, as typically done, a commented-out
entry is added in them to ease its setup.
Note that I haven't commited the required mountd(8) changes to be able
to export tmpfs file-systems because NFS support is still very unstable
and because, before enabling it, I'd like to do some other changes.
OK'ed by my project mentor, William Studenmund (wrstuden@).
The __UNCONST macro is now used only where necessary and the RW macros
are gone. Most of the changes here are consumers of the
sysctl_createv(9) interface that now takes a pair of const pointers
which used not to be.
but amd64, it just returns 0, doing nothing.
For amd64, it implements vsyscalls through cheating: if the faulting
address is in the vsyscall area (which is statically known on Linux/amd64),
and the intruction pointer is too, it must have been a vsyscall. In that
case, retrieve the return address from the user stack, fix up %rip and
%rsp, and just execute the normal system call. It will return as if
the vsyscall has been executed.
* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2
Tested on amd64, compile-tested on sparc64.
supported options can't get out of sync. This add support for the
linux __WCLONE and __WALL options (NetBSD version: WALTSIG and WALLSIG)
Add a diagnostic check to see if the one unhandled option (__WNOTHREAD) is
specified.
This should prevent linux processes from losing their children and creating
tons of zombie processes.
segment should succeed even if the segment would be marked removed; use this
to implement the Linux-compatible semantics of shmat(2)
this fixes the old Linux VMware3 graphics problem with local display,
and possibly other local Linux X clients using MIT-SHM
for Linux-compatible shmat() behaviour - shmat() for the removed shared memory
segment must work from all callers, the shared memory id could be passed e.g.
to native X server via MIT-SHM
temporarily remove the functionality, the Linux-compatible semantics
will be reimplemented differently
provide f_frsize. It cannot be actually used to GNU C statvfs() bug
in f_frsize != f_bsize case, so just keep pretending we don't support it.
Update comments and explain the situation in detail there.
explicit size types - the structure definition is actually identical
on currently support COMPAT_LINUX archs, so no point to have 6 copies of it
in the tree
- filesystem size is expressed in number of fragments, not blocks;
this fixes computed filesystem sizes for Linux df(1) and other Linux
binaries using statfs(2) for filesystems, which use different value
for frament and block, such as FFS
- use FS f_namemax instead of always using MAXNAMLEN
- print the socketcall type
- special case socket(2) call, it's also the only one with first argument
not being a socket descriptor
- only dump the relevant part of linux_socketcall_dummy_args, instead
of always the whole structure
grow-down auto extend segment) by allocating segment sized at
current stack size limit, and offsetting requested/returned address
as required
due to how normal virtual memory management work, allocating the
full sized stack memory segment up-front actually requires exactly same
amount of VA space and physical memory as the Linux 'grow' scheme and the
'grow' scheme is quite a lot more difficult to use in applications correctly,
so it's not very apparent why Linux introduced this feature at all
this fixes Thomas Klausner's Heroes3 crash, and might also
fix PR 26687 by Jan Schaumann
rather than EPERM; to emulate this properly, translate the error to EISDIR
if the target patch exists and points to a directory
this fixes the 'ant clean' problem reported by Marc Recht on current-users@
with SuSE 9.1 libraries
share same 'break' value used for brk()/sbrk(), otherwise application SIGSEGVs
quickly once different threads try to adjust data segment size
this fixes linux Mozilla crashes with SuSE 9.1 libraries, and possibly
other linux applications using real threads
otherwise, linux_syscall() returns garbage, at least on i386.
(it returns native_to_linux_errno[EPASSTHROUGH] where EPASSTHROUGH == -4.)
i choose EINVAL rather than ENOTTY, because linux's pipe returns it
and i think that it's a common case.
and tweak lkminit_*.c (where applicable) to call them, and to call
sysctl_teardown() when being unloaded.
This consists of (1) making setup functions not be static when being
compiled as lkms (change to sys/sysctl.h), (2) making prototypes
visible for the various setup functions in header files (changes to
various header files), and (3) making simple "load" and "unload"
functions in the actual lkminit stuff.
linux_sysctl.c also needs its root exposed (ie, made not static) for
this (when built as an lkm).