merge the two emul_irix structures; the only difference was the
setregs function, which can be handled by the exec-specific setregs hook
rename setregs_n32() to irix_n32_setregs(), and make it suitable
as the exec-specific setregs hook
make irix_check_exec() a macro, now that it is just a single compare
it checks both the alternative/emul tree, and the non-emul tree.
This makes it possible to run chrooted emulated binaries without the need
to set up a shadow /emul tree within the chroot hierarchy.
Only tested for COMPAT_LINUX, changes to other compat modules were
mechanical.
Fixes kern/19161 by Christian Groessler.
kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals
kqueue is supported by all writable filesystems in the NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)
based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe
private to the process within the share group.
There is one bit missing in this implementation: when replicating a change
in a process VM to the other process of the share group, we avoid copying
mappings for private regions in the target process, but we don't prevent
copying private regions from the source process.
it actually fixes a problem:
When /bin/sh gets a SIGSEGV, its signal handler calls brk and the offending
instruction is retried. Usually it gets another SIGSEGV, and things loop
until it passes without the SIGSEGV. This is the normal mode of operation, and
it can be reproduced on IRIX by a 10kB shell script starting with echo /*
However... the signal handler checks for BADVADDR in the saved registers
in struct sigcontext. If it does not find it, it gives up and exits instead
of retrying. Filling the field enables us to carry on normal operation
(which is to get dozens of SIGSEGV) instead of getting a failure at the
first SIGSEGV.
memory fault handler. IRIX uses irix_vm_fault, and all other emulations
use NULL, which means to use uvm_fault.
- While we are there, explicitly set to NULL the uninitialized fields in
struct emul: e_fault and e_sysctl on most ports
- e_fault is used by the trap handler, for now only on mips. In order to avoid
intrusive modifications in UVM, the function pointed to by e_fault does not
have exactly the same prototype as uvm_fault:
int uvm_fault __P((struct vm_map *, vaddr_t, vm_fault_t, vm_prot_t));
int e_fault __P((struct proc *, vaddr_t, vm_fault_t, vm_prot_t));
- In IRIX share groups, all the VM space is shared, except one page.
This forces us to have different VM spaces and synchronize modifications
to the VM space across share group members. We need an IRIX-specific hook
to the page fault handler in order to propagate VM space modifications
caused by page faults.
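
A minimal sketch of the dispatch described above, assuming simplified
stand-in types rather than the real kernel structures. The names
handle_fault(), generic_fault() and emul_fault() are hypothetical; only the
idea (a NULL e_fault falls back to the generic handler, and the hook takes a
process instead of a map) comes from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel types; not the real definitions. */
struct vm_map { int faults; };
struct proc;
typedef int (*fault_hook_t)(struct proc *, unsigned long, int, int);

struct emul { const char *e_name; fault_hook_t e_fault; };
struct proc { const struct emul *p_emul; struct vm_map *p_map; int hook_calls; };

/* Stand-in for uvm_fault(): it works on a map, not on a process. */
static int
generic_fault(struct vm_map *map, unsigned long va, int type, int prot)
{
    (void)va; (void)type; (void)prot;
    map->faults++;
    return 0;
}

/* Stand-in for an emulation-specific handler such as irix_vm_fault. */
static int
emul_fault(struct proc *p, unsigned long va, int type, int prot)
{
    p->hook_calls++;
    /* A real hook would also propagate the change to the share group. */
    return generic_fault(p->p_map, va, type, prot);
}

/* What a trap handler could do: use the hook if set, else the default. */
static int
handle_fault(struct proc *p, unsigned long va, int type, int prot)
{
    assert(p != NULL && p->p_emul != NULL);
    if (p->p_emul->e_fault != NULL)
        return p->p_emul->e_fault(p, va, type, prot);
    return generic_fault(p->p_map, va, type, prot);
}
```

Because the hook receives the process, it can reach both the map and any
per-emulation data without changing the generic fault path.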
This merge changes the device switch tables from static arrays to
tables dynamically generated by config(8).
- All device switches are defined as constant structures in device drivers.
- The new grammar ``device-major'' is introduced to ``files''.
device-major <prefix> char <num> [block <num>] [<rules>]
- All device major numbers must be listed in the port-dependent majors.<arch>
by using this grammar.
- Added the new naming convention.
The name of the device switch must be <prefix>_[bc]devsw for auto-generation
of device switch tables.
- The backward compatibility of loading block/character device
switches through the LKM framework is broken. This is necessary to convert
from block/character device major to device name at runtime and vice versa.
- The restriction on assigning device majors by LKM is completely removed.
We don't need to reserve LKM entries for dynamic loading of device switches.
- At compile time, the list of device major numbers is packed into the kernel,
and the LKM framework refers to it to assign device major numbers dynamically.
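
The name/major conversion and dynamic assignment mentioned above could look
roughly like this. It is only a sketch: the table contents, the major numbers
and the function names are made up for illustration, and a real table would
be generated by config(8) from majors.<arch>.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical conversion entry; -1 means "no major of that kind". */
struct devsw_conv_sketch {
    const char *name;
    int bmajor;
    int cmajor;
};

/* Made-up numbers, standing in for a generated table. */
static const struct devsw_conv_sketch conv_table[] = {
    { "wd",   0,  3 },
    { "sd",   4, 13 },
    { "tty", -1,  1 },
};
#define NCONV (sizeof(conv_table) / sizeof(conv_table[0]))

/* Device name -> character major, or -1 if unknown. */
static int
name_to_cmajor(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < NCONV; i++)
        if (strcmp(conv_table[i].name, name) == 0)
            return conv_table[i].cmajor;
    return -1;
}

/* Character major -> device name, or NULL if the major is unused. */
static const char *
cmajor_to_name(int major)
{
    size_t i;

    for (i = 0; i < NCONV; i++)
        if (conv_table[i].cmajor == major)
            return conv_table[i].name;
    return NULL;
}

/* Pick the first unused character major below max, or -1 if none. */
static int
alloc_cmajor(int max)
{
    int m;

    for (m = 0; m < max; m++)
        if (cmajor_to_name(m) == NULL)
            return m;
    return -1;
}
```

With such a list packed into the kernel, a loadable driver does not need a
reserved slot: any major absent from the list is free to hand out.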
usync_cntl() system calls.
- when usync_cntl is used and the process is aborted (eg: by kill -9)
libc does not call usync_cntl() to unblock things. We have to clean up
data allocated in the kernel. This is now done through the
emulation-specific exit hook
- IRIX initializes some data in the system part of the PRDA: the pid and
a prid (PRDA ID?). We initialize both to pid.
- Move back struct irix_share_group from irix_exec.h to irix_prctl.h, it
is more relevant there.
- fix a few typos
* struct sigacts gets a new sigact_sigdesc structure, which has the
sigaction and the trampoline/version. Version 0 means "legacy kernel
provided trampoline". Other versions are coordinated with machine-
dependent code in libc.
* sigaction1() grows two more arguments -- the trampoline pointer and
the trampoline version.
* A new __sigaction_sigtramp() system call is provided to register a
trampoline along with a signal handler.
* The handler is no longer passed to sendsig() functions. Instead,
sendsig() looks up the handler by peeking in the sigacts for the
process getting the signal (since it has to look in there for the
trampoline anyway).
* Native sendsig() functions now select the appropriate trampoline and
its arguments based on the trampoline version in the sigacts.
Changes to libc to use the new facility will be checked in later. Kernel
version not bumped; we will ride the 1.6C bump made recently.
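
As a rough illustration of the per-signal descriptor and the version-based
trampoline choice, here is a sketch with invented type and function names
(sigdesc_sketch, register_action, select_trampoline). It does not reproduce
the real struct sigacts layout or the sigaction1() signature.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NSIG 32

typedef void (*sig_handler_t)(int);

/* Hypothetical per-signal descriptor: action plus trampoline/version. */
struct sigdesc_sketch {
    sig_handler_t handler;
    const void *tramp;   /* user-supplied trampoline, if any */
    int version;         /* 0 = legacy kernel-provided trampoline */
};

struct sigacts_sketch {
    struct sigdesc_sketch sd[SKETCH_NSIG];
};

/* Example handler so the sketch has something to register. */
static void
sample_handler(int sig)
{
    (void)sig;
}

/* Register a handler together with its trampoline and version. */
static int
register_action(struct sigacts_sketch *ps, int sig, sig_handler_t h,
    const void *tramp, int version)
{
    if (sig <= 0 || sig >= SKETCH_NSIG)
        return -1;
    if (version != 0 && tramp == NULL)
        return -1;   /* a non-legacy version needs a trampoline */
    ps->sd[sig].handler = h;
    ps->sd[sig].tramp = tramp;
    ps->sd[sig].version = version;
    return 0;
}

/* Signal delivery side: look the trampoline up instead of receiving it. */
static const void *
select_trampoline(const struct sigacts_sketch *ps, int sig,
    const void *legacy)
{
    assert(sig > 0 && sig < SKETCH_NSIG);
    return ps->sd[sig].version == 0 ? legacy : ps->sd[sig].tramp;
}
```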
private area called PRDA that remains unshared. We implement this by using
different vmspace for each share group member, and keeping the memory
mappings in sync on each mmap/munmap/mprotect/break...
We use irix_saddr_sync_vmcmd and irix_saddr_sync_syscall to apply a
vmcmd or a syscall to all share group members; this makes the job a bit
easier.
Also implements {get|set}rlimit{64}.
- First implementation of procblk(). This is supposed to suspend processes.
We emulate this by sending a SIGSTOP, which is not very accurate since
on IRIX, sending a SIGCONT to a process suspended by procblk() will not
resume it.
- support for shared groups
poll will return true until the semaphore is blocked again, but before the
semaphore is blocked, poll returns false.
We do this by maintaining another queue of "released" processes in
struct irix_usema_rec. Unblocking causes the waiting process record to be
moved to the released queue, and poll checks for the process in this released
queue.
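
The waiting/released bookkeeping can be sketched as follows. This is a
simplified model with fixed-size arrays and invented names (usema_sketch,
usema_block, usema_unblock, usema_poll); the real code keeps process records
in struct irix_usema_rec and is not shown here.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAXPROC 8

struct pid_queue {
    int pid[SKETCH_MAXPROC];
    size_t len;
};

/* Hypothetical per-semaphore record with the two queues. */
struct usema_sketch {
    struct pid_queue waiting;
    struct pid_queue released;
};

static int
q_push(struct pid_queue *q, int pid)
{
    if (q->len == SKETCH_MAXPROC)
        return -1;
    q->pid[q->len++] = pid;
    return 0;
}

/* Remove the first occurrence of pid; return 1 if it was present. */
static int
q_remove(struct pid_queue *q, int pid)
{
    size_t i, j;

    for (i = 0; i < q->len; i++) {
        if (q->pid[i] == pid) {
            for (j = i + 1; j < q->len; j++)
                q->pid[j - 1] = q->pid[j];
            q->len--;
            return 1;
        }
    }
    return 0;
}

/* Blocking again clears any earlier "released" state for this pid. */
static int
usema_block(struct usema_sketch *s, int pid)
{
    (void)q_remove(&s->released, pid);
    return q_push(&s->waiting, pid);
}

/* Wake the oldest waiter and remember it as released; -1 if nobody waits. */
static int
usema_unblock(struct usema_sketch *s)
{
    int pid;

    if (s->waiting.len == 0 || s->released.len == SKETCH_MAXPROC)
        return -1;
    pid = s->waiting.pid[0];
    (void)q_remove(&s->waiting, pid);
    (void)q_push(&s->released, pid);
    return pid;
}

/* poll is true only while the pid sits in the released queue. */
static int
usema_poll(const struct usema_sketch *s, int pid)
{
    size_t i;

    for (i = 0; i < s->released.len; i++)
        if (s->released.pid[i] == pid)
            return 1;
    return 0;
}
```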
a SIGSEGV when sigaction(2) is used before a fork(2) and a signal is received
in the child.
- we now nearly correctly emulate PR_TERMCHILD in prctl(2). (the perfect
emulation would not send a SIGHUP if the parent is killed)
return the number of processes waiting on the semaphore. We now maintain
a count of waiting processes.
- Blocked processes are unblocked "first in, first out". We now have a
queue of waiting processes on a semaphore, so that we can wake up the
first blocked process.
Problems:
- We now have a lot of dynamic memory allocation, it may be a bit slow.
- Nothing is SMP safe for now. We need to add locks.
- On close, we forget about a semaphore, which is incorrect. One process
can close its fd attached to a semaphore, but other processes would carry
on using it. Since any process can join a shared arena, this is not an
easy thing to solve.
- A lot of usema/usync functionality is still to be discovered.
successfully emulates a few test programs that use poll semaphores,
including the attach-to-file-descriptor-and-select feature.
There are a few issues:
1) at least one ioctl needs to set retval. We handle this in irix_sys_ioctl()
by replacing the data argument by a pointer to a structure in the stackgap
that carries the real data and retval. The underlying ioctl methods can
therefore retrieve both data and retval.
2) usemaclone is a cloning device: each time it is opened, it creates a new
context, and ioctl operations on each open file descriptor will lead to
different behavior. This functionality is available in NetBSD through the
devvp branch. This first implementation does not use devvp yet, but this
should be done later. Currently, we create a new vnode, and we provide our
own vnode operations. Some operations are applied to the cloned vnode, others
are applied to the original vnode. The v_data field is used to hold a
reference to the original vnode so that we can work on it.
3) at least the setattr vnode operation needs some customisation: IRIX
libc relies on the fact that fchmod on /dev/usema will return 0 in case
of failure.
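
Issue 1) can be illustrated with a small sketch. The wrapper structure and
function names below are hypothetical, the command number is made up, and the
wrapper lives on the C stack here rather than in the stackgap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper: carries the real data and a place for retval. */
struct ioctl_wrap_sketch {
    void *data;
    long *retval;
};

#define SKETCH_CMD_COUNT 1UL   /* made-up command number */

/* Underlying method: it only receives one pointer argument. */
static int
sketch_dev_ioctl(unsigned long cmd, void *arg)
{
    struct ioctl_wrap_sketch *w = arg;

    assert(w != NULL && w->data != NULL && w->retval != NULL);
    switch (cmd) {
    case SKETCH_CMD_COUNT:
        /* Result goes through retval, not through the data buffer. */
        *w->retval = *(int *)w->data + 1;
        return 0;
    default:
        return -1;
    }
}

/* Syscall-level entry: wrap data and retval, then call the method. */
static int
sketch_sys_ioctl(unsigned long cmd, void *data, long *retval)
{
    struct ioctl_wrap_sketch w = { data, retval };

    return sketch_dev_ioctl(cmd, &w);
}
```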
- initial support for MAP_AUTOGROW flag. When mapping beyond the end of file is
requested with MAP_AUTOGROW, if pages beyond the end of file are touched, the
file should be resized. We are not able to emulate this yet, so we immediately
resize the file to fit the whole mapping.
- implements mmap64
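
The "resize immediately" fallback for MAP_AUTOGROW amounts to growing the
file to the end of the requested mapping before mapping it. A userland sketch
of that idea, using POSIX fstat(2) and ftruncate(2) and two invented helper
names, might be:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Size the file must have so that [offset, offset + len) is backed. */
static off_t
autogrow_target(off_t cur_size, off_t offset, size_t len)
{
    off_t end;

    assert(offset >= 0);
    end = offset + (off_t)len;
    return end > cur_size ? end : cur_size;
}

/* Grow the file right away instead of on first touch; never shrink it. */
static int
autogrow_now(int fd, off_t offset, size_t len)
{
    struct stat st;
    off_t target;

    if (fstat(fd, &st) == -1)
        return -1;
    target = autogrow_target(st.st_size, offset, len);
    if (target == st.st_size)
        return 0;
    return ftruncate(fd, target);
}
```

The visible difference from the IRIX behaviour is that the file grows even if
the pages past the old end of file are never touched.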
address 0x200000 (disassembling usinit shows that this address is hardcoded in
libc). It uses it for locks and semaphores.
We therefore allocate this page of memory, to prevent IRIX processes from
faulting when they call usinit(3).
- the signal trampoline address is given to the kernel by a sigaction()
fourth argument
- we introduce an irix_emuldata structure to keep track of the signal
trampoline address
- we don't support per-sigaction signal trampolines, we only do per-process
- now that we use the IRIX libc signal trampoline, we do not have to handle
the errno update from the signal trampoline
- it is possible that IRIX 5 signal delivery works too, since these binaries
will come with their own signal trampoline
when SA_SIGINFO is used. The IRIX process will hence find the expected
information using the third argument of the signal handler.
We do not provide code and siginfo yet.
swapent, or as seen in userland, is dbtob(1), which turns out to be 512 on all
architectures for now.
In struct swapdev, there is another field for block size. This value is private
to uvm_swap.c and is only used for swap I/O on regular files. It is equal to
the underlying device block size and it is not necessarily 512.
- Added two more swapctl commands: GETFREESWAP and GETSWAPVIRT.
There is a problem in the way swap block sizes are found here. See comment
in get_block_size().
sys_swapctl(SWAP_STATS). This enables the use of a kernel-based
buffer instead of using some temporary memory in the stackgap,
where we cannot make sure that the size of the struct swapent array
will fit. (it is not known at build time, but the stackgap length
is set at build time).
look for a block of free virtual memory big enough to hold all sections. The
block starts at the beginning of the first section and ends at the end of
the last section. In the previous version the block ended at the beginning
of the last section, hence creating situations where there was not enough
free space to map the section.
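
The size computation behind this fix can be sketched as below, assuming the
sections are sorted by increasing load address and the page size is a power
of two. The struct and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical section descriptor: load address and size in memory. */
struct psection_sketch {
    unsigned long vaddr;
    unsigned long memsz;
};

/*
 * Size of the block needed to hold all sections: from the (page-rounded)
 * start of the first section to the (page-rounded) end of the last one.
 * Using the start of the last section instead would leave it without room.
 */
static unsigned long
span_size(const struct psection_sketch *s, size_t n, unsigned long pagesz)
{
    unsigned long start, end;

    assert(s != NULL && n > 0);
    assert(pagesz != 0 && (pagesz & (pagesz - 1)) == 0);
    start = s[0].vaddr & ~(pagesz - 1);
    end = s[n - 1].vaddr + s[n - 1].memsz;
    end = (end + pagesz - 1) & ~(pagesz - 1);
    return end - start;
}
```

For two sections at 0x10000 (0x3000 bytes) and 0x20000 (0x1800 bytes) with
4 kB pages, the span is 0x12000 bytes, whereas measuring up to the start of
the last section would give only 0x10000.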
structure for the mountid, but it is 32/64 bits long only, whereas
mountid is 128 bits long. Because we did not initialize the unused bits to
zero, the mountid was not always unique within a filesystem.
This makes autocad 1.3 able to start up.
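
The uniqueness problem comes from leaving part of the 128-bit value
uninitialized. A sketch of the fix, with a hypothetical mountid_sketch type
and make_mountid() helper standing in for the real structures:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit mount identifier. */
struct mountid_sketch {
    uint32_t val[4];
};

/*
 * Build a mountid from a shorter identifier. Zeroing first guarantees that
 * the bits the short identifier does not cover are always the same.
 */
static void
make_mountid(struct mountid_sketch *out, const void *id, size_t id_len)
{
    assert(out != NULL && id != NULL);
    assert(id_len <= sizeof(*out));
    memset(out, 0, sizeof(*out));
    memcpy(out, id, id_len);
}
```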
makes X11 binaries able to actually work: most of them previously hung
in an infinite loop waiting for data from the X server because SIOCNREAD
reported that some data was available to read whereas the X server had nothing
to say.
Tested (and works): xlogo, xterm, ghostview (IRIX build). Things are getting
interesting...
header to distinguish between o32, n32 and n64 ABIs. We now use this.
This removes the need for the mips_option test, which had some false positives.
This also removes the mandatory ordering of n32 vs o32 in the exec switch
(exec_conf.c)
- do not save and restore registers that should not be saved and restored
- do give an accurate sigcontext pointer to the signal handler
- do use the struct sigreturna from IRIX.
This eliminates panics and hangs in certain circumstances
Also some cosmetic changes with tabs usage
that the load addresses in the section array are increasing, and that no
section in the array overlaps with another. IRIX probably makes the same
assumptions, but this has not been tested.
The key point with relocation is to always use the same offset for each
section. Because userland gets only the load address of the first section, it
has to assume that all the remaining sections kept the same offset with
respect to the first section. By using fixed offset instead of finding
some free space for each section, we can eliminate the libX11 load hack.
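
The fixed-offset rule can be written down in a few lines. This is a sketch
with invented names, not the code from the MAPELF implementation: every
section is shifted by the same delta, so the distance to the first section is
preserved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical section descriptor: load address and size in memory. */
struct psection_sketch {
    unsigned long vaddr;
    unsigned long memsz;
};

/*
 * Move all sections so that the first one lands at new_base, applying the
 * same offset to each. Unsigned arithmetic wraps, so moving to a lower
 * address works as well.
 */
static void
relocate_sections(struct psection_sketch *s, size_t n, unsigned long new_base)
{
    unsigned long delta;
    size_t i;

    assert(s != NULL && n > 0);
    delta = new_base - s[0].vaddr;
    for (i = 0; i < n; i++)
        s[i].vaddr += delta;
}
```

Userland can then recover every section address from the first one alone,
which is the assumption the text says it has to make.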
even for programs with a DSO using overlapping load virtual addresses.
The fix is a mean hack, see the comments in irix_syssgi.c. It would be nice to
get uvm_map_findspace() to return the page we suggest instead of the page which
is 16384 bytes away.
we used load_psection, then ran each vmcmd and tried to relocate the failing
ones. This fails if there are two vmcmds for one section, and the second is
not a mapping (for instance a map_pagedvn and a map_zero), because the first
one gets relocated, but not the second one.
Additionally, it was not necessary to update the userlevel psection array:
libc stubs seem to do the job themselves.
for the text section of libX11.so was overlapping with other ELF sections
already loaded, and this resulted in an ENOMEM error.
syssgi(MAPELF) uses elf32_load_psection() from syssrc/sys/kern/exec_subr.c
The problem was never experienced with load_psection() because it only has
to load one section, hence the requested addresses are not already allocated.
The fix is done when the initial mapping at the default address fails by
finding a free location in the VM space using uvm_map_findspace(), and then
retrying to load the section.
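
The try, find space, retry sequence can be modelled with a toy address
space. Everything below is a stand-in: the range list replaces the VM map,
find_space() only imitates what uvm_map_findspace() is used for, and the
names and limits are invented.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAXRANGES 16
#define SKETCH_SEARCH_PAGES 4096UL

struct range_sketch { unsigned long start, end; };   /* [start, end) */
struct vmspace_sketch {
    struct range_sketch used[SKETCH_MAXRANGES];
    size_t n;
};

/* Does [start, start + len) overlap an existing mapping? */
static int
range_busy(const struct vmspace_sketch *vm, unsigned long start,
    unsigned long len)
{
    size_t i;

    for (i = 0; i < vm->n; i++)
        if (start < vm->used[i].end && vm->used[i].start < start + len)
            return 1;
    return 0;
}

/* Map at exactly this address; -1 stands in for ENOMEM. */
static int
map_fixed(struct vmspace_sketch *vm, unsigned long start, unsigned long len)
{
    if (vm->n == SKETCH_MAXRANGES || range_busy(vm, start, len))
        return -1;
    vm->used[vm->n].start = start;
    vm->used[vm->n].end = start + len;
    vm->n++;
    return 0;
}

/* First free page-aligned address at or after hint, or 0 if none found. */
static unsigned long
find_space(const struct vmspace_sketch *vm, unsigned long hint,
    unsigned long len, unsigned long pagesz)
{
    unsigned long i;

    for (i = 0; i < SKETCH_SEARCH_PAGES; i++)
        if (!range_busy(vm, hint + i * pagesz, len))
            return hint + i * pagesz;
    return 0;
}

/* Try the default address first; on failure relocate and retry. */
static unsigned long
load_section(struct vmspace_sketch *vm, unsigned long preferred,
    unsigned long len, unsigned long pagesz)
{
    unsigned long addr;

    assert(pagesz != 0);
    if (map_fixed(vm, preferred, len) == 0)
        return preferred;
    addr = find_space(vm, preferred, len, pagesz);
    if (addr == 0 || map_fixed(vm, addr, len) != 0)
        return 0;
    return addr;
}
```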
Other details:
- once the ELF section has been relocated, the ELF program header must be
updated with the new address and copied back to userland. For now we always
do it, maybe we could copy it only when it was modified.
- We are able to emulate the exact address where IRIX loads libX11.so instead
of the default location
Service happy. Code in libc attempts to open files in the ns filesystem, and
then uses getmountid on failure to ensure that the ns filesystem is really
mounted. We don't emulate the ns filesystem yet, but getmountid now correctly
reports that ns is not present.
Note: It seems that the mountid of the ns filesystem should always be
00000005 00000000 00000000 7fff3000