* the first one would cause an unnecessary malloc() of iovec storage for
a msg_iovlen of UIO_SMALLIOV although the required amount of memory has
been allocated on the stack.
* the second one would cause a recvmsg() or sendmsg() with a msg_iovlen of
UIO_MAXIOV to fail with EMSGSIZE, which is also a violation of XNS5.
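Both problems read like boundary comparisons. A minimal sketch of the intended limits, assuming the checks compare msg_iovlen against the two constants. The function name, the enum, and the constant values are hypothetical and only show on which side of each comparison the limit itself falls:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative values only; the real constants come from the kernel headers. */
#define SKETCH_UIO_SMALLIOV	8
#define SKETCH_UIO_MAXIOV	1024

enum sketch_iov_storage { SKETCH_IOV_ON_STACK, SKETCH_IOV_ON_HEAP };

/*
 * Decide where the iovec array for a message lives.
 * Returns 0 and sets *where, or EMSGSIZE if iovlen exceeds the maximum.
 */
int
sketch_pick_iov_storage(size_t iovlen, enum sketch_iov_storage *where)
{
	assert(where != NULL);

	/* '>' rather than '>=': exactly UIO_MAXIOV entries must be accepted. */
	if (iovlen > SKETCH_UIO_MAXIOV)
		return EMSGSIZE;

	/* '>' rather than '>=': exactly UIO_SMALLIOV entries fit on the stack. */
	*where = (iovlen > SKETCH_UIO_SMALLIOV) ?
	    SKETCH_IOV_ON_HEAP : SKETCH_IOV_ON_STACK;
	return 0;
}
```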
* availability of POSIX Synchronized I/O (kern.synchronized_io),
* maximum number of iovec structures to be used in readv(2) etc. (kern.iov_max)
via sysctl().
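From userland, the iovec limit can also be read portably through sysconf(3). A small sketch, assuming a POSIX system that defines _SC_IOV_MAX; the helper name and the fallback constant are made up for illustration and are not part of this change:

```c
#include <unistd.h>

/* Smallest IOV_MAX that X/Open permits (_XOPEN_IOV_MAX); used as a fallback. */
#define SKETCH_MIN_IOV_MAX	16

/* Return the iovec limit reported by the system, or the X/Open minimum. */
long
sketch_iov_max(void)
{
	long n = sysconf(_SC_IOV_MAX);

	/* sysconf() returns -1 when the limit is indeterminate or unsupported. */
	return (n > 0) ? n : SKETCH_MIN_IOV_MAX;
}
```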
* if synchronized I/O file integrity completion of read operations was
requested, set IO_SYNC in the ioflag passed to the read vnode operator.
* if synchronized I/O data integrity completion of write operations was
requested, set IO_DSYNC in the ioflag passed to the write vnode operator.
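The two rules above can be sketched as a mapping from per-file flags to ioflag bits. This is an illustration only: the SK_* names and values are hypothetical stand-ins for the kernel's file and I/O flags, and it assumes that file integrity completion of reads is requested by O_RSYNC together with O_SYNC, and data integrity completion of writes by O_DSYNC:

```c
/* Hypothetical stand-ins for the kernel's per-file flags. */
#define SK_FFSYNC	0x01	/* file was opened with O_SYNC */
#define SK_FDSYNC	0x02	/* file was opened with O_DSYNC */
#define SK_FRSYNC	0x04	/* file was opened with O_RSYNC */

/* Hypothetical stand-ins for the ioflag bits passed to the vnode operators. */
#define SK_IO_SYNC	0x10
#define SK_IO_DSYNC	0x20

/* ioflag for the read vnode operator. */
int
sketch_read_ioflag(int fflag)
{
	int ioflag = 0;

	/* Both bits are needed for file integrity completion of reads. */
	if ((fflag & (SK_FFSYNC | SK_FRSYNC)) == (SK_FFSYNC | SK_FRSYNC))
		ioflag |= SK_IO_SYNC;
	return ioflag;
}

/* ioflag for the write vnode operator. */
int
sketch_write_ioflag(int fflag)
{
	int ioflag = 0;

	/* Data integrity completion of writes. */
	if (fflag & SK_FDSYNC)
		ioflag |= SK_IO_DSYNC;
	return ioflag;
}
```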
means we're in interrupt context. Since we can be called from a network
hardware interrupt, we could corrupt the protocol queues if we try to
drain them at that time.
called when devices attach, take two.
Note that it is necessary that mbinit() NOT allocate memory, since it
is called before mb_map is created. This is not a problem with the
pool allocator that is now used for mbufs and mbuf clusters.
The read/write system calls return ssize_t because -1 is used to indicate
error, therefore the transfer size MUST be limited to SSIZE_MAX, otherwise
garbage can be returned to the user.
There is NO change from existing behavior here, only a more precise
definition of what the semantics are, except in the Alpha case, where
the full SSIZE_MAX transfer size can now be realized (ssize_t is 64-bit
on the Alpha).
- If either an alloc or release function is provided, make sure both are
provided, otherwise panic, as this is a fatal error.
- If using the default allocator, default the pool pagesz to PAGE_SIZE,
since that is the granularity of the default allocator's mechanism.
- In the default allocator, use new functions:
uvm_km_alloc_poolpage()/uvm_km_free_poolpage(), or
kmem_alloc_poolpage()/kmem_free_poolpage()
rather than doing it here. These functions may use pmap hooks to
provide alternate methods of mapping pool pages.
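The argument checks in the first two items can be sketched in userland as follows. All names here are hypothetical, malloc()/free() stand in for the default page allocator, and the sketch returns -1 where the kernel would panic:

```c
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE	4096	/* stand-in for the kernel's PAGE_SIZE */

typedef void *(*sketch_alloc_fn)(size_t);
typedef void (*sketch_release_fn)(void *, size_t);

struct sketch_pool {
	sketch_alloc_fn		alloc;
	sketch_release_fn	release;
	size_t			pagesz;
};

/* Default back end; the kernel would use its pool page functions here. */
void *
sketch_default_alloc(size_t sz)
{
	return malloc(sz);
}

void
sketch_default_release(void *p, size_t sz)
{
	(void)sz;
	free(p);
}

/* Returns 0 on success, -1 if only one of alloc/release was supplied. */
int
sketch_pool_init(struct sketch_pool *pp, sketch_alloc_fn alloc,
    sketch_release_fn release, size_t pagesz)
{
	/* Either both hooks are given or neither; a lone hook is fatal. */
	if ((alloc == NULL) != (release == NULL))
		return -1;

	if (alloc == NULL) {
		alloc = sketch_default_alloc;
		release = sketch_default_release;
		/* The default allocator works at page granularity. */
		pagesz = SKETCH_PAGE_SIZE;
	}

	pp->alloc = alloc;
	pp->release = release;
	pp->pagesz = pagesz;
	return 0;
}
```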
- pread() (#173) and pwrite() (#174), which are defined by XPG4.2. System
call numbers match Solaris.
- preadv() (#289) and pwritev() (#290), which are the positional cousins
of readv() and writev(), but not defined by any standard.
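A short userland sketch of what "positional" means for pread() and pwrite(): the transfer happens at an explicit offset and the descriptor's own file offset is left alone. The helper name and the temporary file template are made up for illustration:

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Write ten bytes at offset 0, read four back from offset 3, and check
 * that the descriptor's file offset never moved.  Returns 0 on success.
 */
int
sketch_positional_roundtrip(void)
{
	char path[] = "/tmp/sketch_preadXXXXXX";
	char buf[4];
	int fd, ok;

	fd = mkstemp(path);
	if (fd == -1)
		return -1;
	unlink(path);		/* the file disappears once fd is closed */

	ok = pwrite(fd, "0123456789", 10, 0) == 10 &&
	    pread(fd, buf, sizeof(buf), 3) == (ssize_t)sizeof(buf) &&
	    memcmp(buf, "3456", sizeof(buf)) == 0 &&
	    lseek(fd, 0, SEEK_CUR) == 0;	/* offset is still 0 */

	close(fd);
	return ok ? 0 : -1;
}
```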
* we already have the vnode interlock, so vref() should not ask for it again.
* we call VOP_RECLAIM/VOP_INACTIVE(), which shouldn't be duplicated in vrele().
the new proc structure when performing a fork. This makes it much
easier to abort a fork operation and return an error if we run out
of KVA space.
The U-area pages are still wired down in {,u}vm_fork(), as before.
readlink() from type `int' to type `size_t'. This isn't an ABI change, since
the calling convention of our only LP64 platform (the Alpha) already promotes
this argument to a `long'.
This may not be the final action on this matter; readlink() still returns
an `int', which may change in a future revision of the standard.
the kernel's pmap, since proc0 (and others that share its address space)
are kernel-only processes, and should never contain userspace mappings.
This makes it easier to detect errors, like entering user mappings
for kernel processes, in pmap modules, and makes some sense, considering
that kernel processes are really just "thread contexts" for the kernel.
again - the facility required in this context would be a filesystem-specific
super-user determination, which is not available yet. Also, add some
clarification to a comment.
vectors; defer that to vfs_opv_init().
Change the interface to vfs_opv_init() and export it; it now takes a
pointer to an array of vnodeopv_desc *'s to initialize. Allocate
the vnode operations vectors here. Called by vfs_attach().
Implement vfs_opv_free(), which deallocates the vnode operations
vectors. Called by vfs_detach().
Change vfsinit() to build the initial vfs_list by traversing the
vfs_list_initial[] table, and vfs_attach()'ing those file systems.
Also, initialize special vnodeopv_descs (dead, fifo, spec) which
are not associated with any particular file system.
change_owner().
* Change the semantics of chown(), fchown() and lchown(): when requesting a
change of the owner of a file, clear the set-user-id bit; analogous behaviour
for group changes.
* Since the above is a violation of the semantics specified by POSIX and
X/Open, add corresponding compatibility syscalls: __posix_chown(),
__posix_fchown(), __posix_lchown(). (Neither fchown() nor lchown() is
specified by POSIX; the prefix is intended to reflect the semantics.)
* Rename posix_rename() to __posix_rename() to follow the above convention.
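The new default semantics can be sketched as a pure function on the mode bits. This assumes that "analogous behaviour for group changes" means clearing the set-group-id bit; the function name is hypothetical:

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Compute the file mode after an ownership change under the new
 * semantics: a changed owner drops set-user-id, a changed group drops
 * set-group-id.
 */
mode_t
sketch_mode_after_chown(mode_t mode, int owner_changed, int group_changed)
{
	if (owner_changed)
		mode &= ~(mode_t)S_ISUID;
	if (group_changed)
		mode &= ~(mode_t)S_ISGID;
	return mode;
}
```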
(thus causing s_leader to become NULL) by storing the session ID separately
in the session structure. Export the session ID to userspace in the
eproc structure.
Submitted by Tom Proett <proett@nas.nasa.gov>.
UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.
this is the rest of the MI portion changes.
this will be KNF'd shortly. :-)
down "Data modified on freelist" and "multiple free" problems.
The log is activated by the MALLOCLOG option, and the size of the
event ring buffer is controllable via the MALLOCLOGSIZE option (default
is 100000 entries).
From Chris Demetriou, cleaned up a little by me per suggestions in the
e-mail from Chris that contained the code.
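For readers unfamiliar with the idea, an event ring buffer of this kind can be sketched as below. This is not the kernel code: the structure layout, the names, and the tiny buffer size are all hypothetical, chosen so the wrap-around is easy to see:

```c
#include <stddef.h>

#define SKETCH_LOGSIZE	4	/* tiny on purpose; see the option above */

struct sketch_event {
	const void	*addr;		/* address allocated or freed */
	int		is_free;	/* 0 = allocation, 1 = free */
};

static struct sketch_event sketch_log[SKETCH_LOGSIZE];
static size_t sketch_count;		/* total events ever recorded */

/* Record one event, overwriting the oldest entry once the ring is full. */
void
sketch_log_event(const void *addr, int is_free)
{
	struct sketch_event *ev = &sketch_log[sketch_count % SKETCH_LOGSIZE];

	ev->addr = addr;
	ev->is_free = is_free;
	sketch_count++;
}

/* Count the frees of addr that are still present in the ring. */
int
sketch_frees_in_log(const void *addr)
{
	size_t n = sketch_count < SKETCH_LOGSIZE ? sketch_count : SKETCH_LOGSIZE;
	int hits = 0;

	for (size_t i = 0; i < n; i++)
		if (sketch_log[i].is_free && sketch_log[i].addr == addr)
			hits++;
	return hits;
}
```

Two frees of the same address still in the ring would point at a multiple-free candidate; older events simply fall off the end.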
called "MACHINE_NEW_NONCONTIG". this is required for UVM, the new VM
system (also written by chuck) that is coming soon. adds new functions:
vm_page_physload() -- tell the VM system about an area of memory.
vm_physseg_find() -- returns index in vm_physmem array that this
address is in.
and several new versions of old functions/macros defined in vm_page.h.
this is the MI portion. sparc, and then later i386 portions to come.
all other ports need to change to this ASAP! (alpha is already being
worked on)
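a rough sketch of what a lookup like vm_physseg_find() has to do, assuming the physical memory table is an array of page frame ranges. the struct, the names, and the signature below are hypothetical and are not the interface in vm_page.h:

```c
#include <stddef.h>

/* One contiguous chunk of physical memory, in page frame numbers. */
struct sketch_physseg {
	unsigned long start;	/* first page frame in the segment */
	unsigned long end;	/* one past the last page frame */
};

/*
 * Return the index of the segment containing pfn, or -1 if the frame
 * falls into a hole.  If offp is not NULL, store the frame's offset
 * within the segment.
 */
int
sketch_physseg_find(const struct sketch_physseg *segs, int nsegs,
    unsigned long pfn, unsigned long *offp)
{
	for (int i = 0; i < nsegs; i++) {
		if (pfn >= segs[i].start && pfn < segs[i].end) {
			if (offp != NULL)
				*offp = pfn - segs[i].start;
			return i;
		}
	}
	return -1;
}
```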
enabled with the LOCAL_CREDS socket option on the listener. Semantics are
similar to BSD/OS's:
- Creds are available with first data on SOCK_STREAM, and with every datagram
on SOCK_DGRAM.
- It is not possible to forge credentials.
Different in that:
- Different credential data structure (ours does not rely on the format
of internal kernel data structures, and does not pass the login name).
- We can pass creds and file descriptors at the same time (this does not
work in BSD/OS).
Luke Mewburn <lukem@netbsd.org> gets credit for inspiring me to implement
this. :-)
so_linger is used as an argument to tsleep(), so was stuffed with
clockticks for the TCP linger time. However, so_linger is set directly from
l_linger if the linger time is specified, and l_linger is seconds (although
this is not currently documented anywhere). Fix this to set the TCP
linger time in seconds, and multiply so_linger by hz when tsleep() is
called to actually perform the linger.
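The unit handling after the fix can be sketched as below: the stored value stays in seconds and is converted to clock ticks only where the sleep timeout is computed. The function name is hypothetical, and the sketch ignores overflow for very large linger times:

```c
#include <assert.h>

/*
 * Convert a linger time kept in seconds into the tick count handed to
 * the sleep primitive.  hz is the number of clock ticks per second.
 */
int
sketch_linger_timeout_ticks(int so_linger_secs, int hz)
{
	assert(hz > 0);
	return so_linger_secs * hz;
}
```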
3BSD vfork(2), i.e. share address space w/ parent and block parent.
Keep statistics on the total number of forks, the number of forks that
block the parent, and the number of forks that share the address space
with the parent.
to be of type size_t; since this imposes an interface change on the Alpha
(sizeof(int) != sizeof(size_t)), allocate a new system call number and make
the previous version a compatibility system call.
whenever the %: format is used on NetBSD/Alpha. Disable %: for __alpha__.
Note: the "correct" (but untested on other architectures) fix is to
change the wrong: kprintf(cp, oflags, tp, NULL, va_arg(ap, va_list));
to the right: kprintf(cp, oflags, tp, NULL, ap);
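The general C idiom behind the "right" form is to hand the existing va_list to the next formatting routine, not to fetch a va_list out of it with va_arg(). A userland sketch using vsnprintf(); the function names are made up and this is not the kernel's kprintf():

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Inner routine: consumes arguments from a va_list owned by the caller. */
static int
sketch_vformat(char *buf, size_t len, const char *fmt, va_list ap)
{
	return vsnprintf(buf, len, fmt, ap);
}

/* Outer routine: starts the va_list and forwards it unchanged. */
int
sketch_format(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = sketch_vformat(buf, len, fmt, ap);	/* pass ap itself */
	va_end(ap);
	return n;
}
```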
and swapctl(). For the former three, they use an 'int' in their user-land
prototype which was a 'u_int' in the kernel, which screwed up automatic
generation/checking of lint syscall stubs. For the latter, the user-land
prototype uses a "const char *", but the syscall just used "char *".
From Chris Demetriou <cgd@pa.dec.com>.
floating point stuff removed].
the new kprintf replaces the 3 different (and buggy) versions of
printf that were in the kernel before (kprintf, sprintf, and db_printf),
thus reducing duplicated code by 2/3's. this fixes (or adds) several
printf formats. examples:
%#x - previously only supported by db_printf [not printf/sprintf]
%8.8s - printf would print "000chuck" for "chuck" before
%5p - printf would print "0x 1" for value 1 before
XXX: new kprintf still supports several non-standard '%' formats that
are supposed to eventually be removed:
%: - passes an additional format string and argument list recursively
%b - used to decode error registers
%r - int, but print in radix "db_radix" [DDB only]
%z - 'signed hex' [DDB only]
%n - unsigned int, but print in radix "db_radix" [DDB only]
note that DDB's "%n" conflicts with standard "%n" which takes the
number of characters written so far and stores it into the integer
indicated by the "int *" pointer arg. yuck!
while here, add comments for each function explaining what it is
supposed to do.
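for comparison, this is how two of the formats above behave in a standard C library, which is what the new kprintf is expected to match. a userland sketch with snprintf(); the helper names are hypothetical:

```c
#include <stddef.h>
#include <stdio.h>

/* "%#x": hexadecimal with the alternate-form "0x" prefix. */
int
sketch_fmt_hex(char *buf, size_t len, unsigned int v)
{
	return snprintf(buf, len, "%#x", v);
}

/* "%8.8s": at most 8 characters, padded with spaces (not zeros) to width 8. */
int
sketch_fmt_str8(char *buf, size_t len, const char *s)
{
	return snprintf(buf, len, "%8.8s", s);
}
```

with these, 255 formats as "0xff" and "chuck" comes out right-justified in a field of eight, padded with three leading spaces.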
to the stat(2) family and msync(2). This uses a primitive function
versioning scheme.
This reverts the libc shared library major version from 13 to 12, and
adds a few new interfaces to bring us to libc version 12.20.
From Frank van der Linden <fvdl@NetBSD.ORG>.