cmsg->cmsg_len is 'size_t' not 'socklen_t' - so it is 8 bytes on 64bit
platforms instead of 4. There is also never padding after the header.
Redo linux sendmsg() so that it stands a chance of getting it right.
Redo linux recvmsg() so that it process control data directly from the mbuf
list. Allowing it to hack the data without using the stackgap.
once the 'address' has been copied into an mbuf.
Add extra flags for 'struct msghdr.msg_flags' to indicate that the address
and control are already in mbufs, and that the uio structure is in userspace
for sending data, rename sendit() to do_sys_sendmsg() to ensure no old code
passes in random flags.
Changes to compat code to use new functions - removing some stackgap use.
Fix a 'use after free' in compat_43_sys_recvmsg.
I ***THINK*** the code that converts 'cmsg' formatted data is borked!
svr4_stream.c ought to be generated from svr4_32_stream.c during the build.
compatibility with the older ioctls. This avoids stack smashing and
abuse of "struct sockaddr" when ioctls placed "struct sockaddr_foo's" that
were longer than "struct sockaddr".
XXX: Some of the emulations might be broken; I tried to add code for
them but I did not test them.
and 'rusage' without having to copy data to/from stackgap buffers.
The old split (find_stopped_child) could be removed.
amd64 seems to run netbsd32, linux and linux32 emulations. sparc64 compiles.
avoid having to allocate space in the 'stackgap'
- which is very LWP unfriendly.
The additional code for non-emulation namei() is trivial, the reduction for
the emulations is massive.
The vnode for a processes emulation root is saved in the cwdi structure
during process exec.
If the emulation root the TRYEMULROOT flag are set, namei() will do an initial
search for absolute pathnames in the emulation root, if that fails it will
retry from the normal root.
".." at the emulation root will always go to the real root, even in the middle
of paths and when expanding symlinks.
Absolute symlinks found using absolute paths in the emulation root will be
relative to the emulation root (so /usr/lib/xxx.so -> /lib/xxx.so links
inside the emulation root don't need changing).
If the root of the emulation would be returned (for an emulation lookup), then
the real root is returned instead (matching the behaviour of emul_lookup,
but being a cheap comparison here) so that programs that scan "../.."
looking for the root dircetory don't loop forever.
The target for symbolic links is no longer mangled (it used to get the
CHECK_ALT_xxx() treatment, so could get /emul/xxx prepended).
CHECK_ALT_xxx() are no more. Most of the change is deleting them, and adding
TRYEMULROOT to the flags to NDINIT().
A lot of the emulation system call stubs could now be deleted.
sys_stat() and friends, instead use do_sys_stat() and do_sys_fstat()
that write the answer into a kernel buffer (on stack) that can be
converted to the correct form and written the userspace.
I've test compiled a few kernels, and tested i386 netbsd1.6 ls.
Given I think I've fixed some bugs, it might be 50-50 with new ones.
both NPTL and old linuxthreads behaviour depending on process needs.
Apply to exit_group(), getpid() and getppid() to share them between
compat linux32 (non NPTL) and compat linux (NPTL) on amd64.
ok by manu and christos
was "correct" by luck - we don't have any other file type whose S_IF* bits
in sys/stat.h overlap with S_IFIFO.
Originally discovered by Paul Stoeber in OpenBSD.
- Fix shmat return value on amd64: it uses no black magic with retval[0]
- Fix integer overflows in sysinfo
- Implement sysinfo, mmap2, sched_getparam, sched_getscheduler, mremap,
and madvise in COMPAT_LINUX32
- Fix improper types used in setgroups16/getgroups16
- Implement mmap2 for COMPAT_LINUX32
- Ifdef debug messages by DEBUG_LINUX
Members of the thread group must die without reporting to the parent and
without going to zombie stage. We do that by reparenting to init before
catching a SIGKILL. The parent will not see the child death.
The thread group leader must report the exit status, even if it exits
because of another thread calling exit_group(). We do that by storing the
exit status in struct linux_emuldata_shared, and the exit hook has the
duty of setting struct proc's p_xstat for the thread group leader.
2) For exit/fork/exec hooks, move the NPTL specific code to separate functions
that are shared between COMPAT_LINUX and COMPAT_LINUX32
3) Fix LINUX_CLONE_PARENT_SETTID semantics
least Linux 2.4.31, Irix 6.5.20 and Solaris 10) use EAFNOSUPPORT.
Only the Linux emulation has been tested.
XXX somebody should audit the other emulations...
EPROTONOSUPPORT instead of EAFNOSUPPORT.
from pavel@ with a little bit of clean up from myself.
XXX: netbsd32 (and perhaps other emulations) should be able
XXX: to call the standard socket calls for this i think, but
XXX: revisit this at another time.
threads in a processes and kill them properly. The code is a bit too
complicated, but I could not find a simplier way of dealing with it
- Change getpid() and getppid() semantics to match what Linux does,
and implement gettid(). In the Linux kernel, threads are implemnted
as plain old processes. A thread group is just a set of processes,
with the parent called leader. Thread ID, which are returned by gettid(),
are just the PID of the plain old processes, and getpid() returns the
PID of the thread group leader.
- Remove struct linux32_emuldata. COMPAT_LINUX32 uses a lot of COMPAT_LINUX
code, where a struct linux_emuldata is assumed. By having distinct emuldata
structure with different sizes and layouts, we caused kernel memory
corruptions.
- Fix setprioriry() and getpriority()
Thanks to Nicolas Joly for tracking down the problem and providing me the
hardware to fix them.
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
Linux kernel-version as on i386 and ppc (currently 2.4.18), and a date
in Feb 2002.
On all other NetBSD platforms we return a Linux-kernel version of
2.0.38 and a date sometime in 2000, which (AFAIK) predates the
existence of amd64, and therefore predates Linux support for amd64.
To me, it makes much more sense to return the same Linux-kernel-version
and date for both 32-bit x86 and 64-bit x86.
Empirically (and not least), this change also allows SuSE 10 amd64
binaries to run under our Linux amd64 binary emulation (both static
and dynamic-linked, given suitable setup) , which they didn't when we
reported a Linux/x86_64 kernel version of 2.0.38.
- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.
to a header where they can be shared between COMPAT_LINUX and COMPAT_LINUX32
- Add termios ioctl emulation to COMPAT_LINUX32
- Add the getcwd system call to COMPAT_LINUX32/amd64
That makes Linux's bash working with COMPAT_LINUX32.
was developed as part of Google's Summer of Code 2005 program. This
change adds the kernel code, the mount_tmpfs utility, a regression test
suite and does all other related changes to integrate these.
The file-system is still *experimental*. Therefore, it is disabled by
default in all kernels. However, as typically done, a commented-out
entry is added in them to ease its setup.
Note that I haven't commited the required mountd(8) changes to be able
to export tmpfs file-systems because NFS support is still very unstable
and because, before enabling it, I'd like to do some other changes.
OK'ed by my project mentor, William Studenmund (wrstuden@).
The __UNCONST macro is now used only where necessary and the RW macros
are gone. Most of the changes here are consumers of the
sysctl_createv(9) interface that now takes a pair of const pointers
which used not to be.
but amd64, it just returns 0, doing nothing.
For amd64, it implements vsyscalls through cheating: if the faulting
address is in the vsyscall area (which is statically known on Linux/amd64),
and the intruction pointer is too, it must have been a vsyscall. In that
case, retrieve the return address from the user stack, fix up %rip and
%rsp, and just execute the normal system call. It will return as if
the vsyscall has been executed.
* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2
Tested on amd64, compile-tested on sparc64.
supported options can't get out of sync. This add support for the
linux __WCLONE and __WALL options (NetBSD version: WALTSIG and WALLSIG)
Add a diagnostic check to see if the one unhandled option (__WNOTHREAD) is
specified.
This should prevent linux processes from losing their children and creating
tons of zombie processes.
segment should succeed even if the segment would be marked removed; use this
to implement the Linux-compatible semantics of shmat(2)
this fixes the old Linux VMware3 graphics problem with local display,
and possibly other local Linux X clients using MIT-SHM
for Linux-compatible shmat() behaviour - shmat() for the removed shared memory
segment must work from all callers, the shared memory id could be passed e.g.
to native X server via MIT-SHM
temporarily remove the functionality, the Linux-compatible semantics
will be reimplemented differently
provide f_frsize. It cannot be actually used to GNU C statvfs() bug
in f_frsize != f_bsize case, so just keep pretending we don't support it.
Update comments and explain the situation in detail there.
explicit size types - the structure definition is actually identical
on currently support COMPAT_LINUX archs, so no point to have 6 copies of it
in the tree
- filesystem size is expressed in number of fragments, not blocks;
this fixes computed filesystem sizes for Linux df(1) and other Linux
binaries using statfs(2) for filesystems, which use different value
for frament and block, such as FFS
- use FS f_namemax instead of always using MAXNAMLEN
- print the socketcall type
- special case socket(2) call, it's also the only one with first argument
not being a socket descriptor
- only dump the relevant part of linux_socketcall_dummy_args, instead
of always the whole structure
grow-down auto extend segment) by allocating segment sized at
current stack size limit, and offsetting requested/returned address
as required
due to how normal virtual memory management work, allocating the
full sized stack memory segment up-front actually requires exactly same
amount of VA space and physical memory as the Linux 'grow' scheme and the
'grow' scheme is quite a lot more difficult to use in applications correctly,
so it's not very apparent why Linux introduced this feature at all
this fixes Thomas Klausner's Heroes3 crash, and might also
fix PR 26687 by Jan Schaumann
rather than EPERM; to emulate this properly, translate the error to EISDIR
if the target patch exists and points to a directory
this fixes the 'ant clean' problem reported by Marc Recht on current-users@
with SuSE 9.1 libraries
share same 'break' value used for brk()/sbrk(), otherwise application SIGSEGVs
quickly once different threads try to adjust data segment size
this fixes linux Mozilla crashes with SuSE 9.1 libraries, and possibly
other linux applications using real threads
otherwise, linux_syscall() returns garbage, at least on i386.
(it returns native_to_linux_errno[EPASSTHROUGH] where EPASSTHROUGH == -4.)
i choose EINVAL rather than ENOTTY, because linux's pipe returns it
and i think that it's a common case.
and tweak lkminit_*.c (where applicable) to call them, and to call
sysctl_teardown() when being unloaded.
This consists of (1) making setup functions not be static when being
compiled as lkms (change to sys/sysctl.h), (2) making prototypes
visible for the various setup functions in header files (changes to
various header files), and (3) making simple "load" and "unload"
functions in the actual lkminit stuff.
linux_sysctl.c also needs its root exposed (ie, made not static) for
this (when built as an lkm).