- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.
Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).
Discussed on <tech-kern>, reviewed by <ad>.
It will replace azalia(4) after testing.
To use, comment out azalia in your kernel configuration and uncomment the
hdaudio and hdafg lines so it reads:
# Intel High Definition Audio
hdaudio* at pci? dev ? function ?
hdafg* at hdaudiobus?
You should also:
cd /dev
sh MAKEDEV audio
driver, gpiolock(4), is provided as an example how to interface real hardware.
A new securemodel, securemodel_keylock, is provided to show how this can
be used to tie keylocks to overall system security. This is experimental
code. The diff has been on tech-kern for several weeks.
Reviewed by many, kauth(9) integration reviewed by Elad Efrat; approved by
tonnerre@ and tron@. Thanks to everyone who provided feedback.
tested with a DEBUG+DIAGNOSTIC+LOCKDEBUG kernel. To summerise NiLFS, i'll
repeat my posting to tech-kern here:
NiLFS stands for New implementation of Logging File System; LFS done
right they claim :) It is at version 2 now and is being developed by NTT, the
Japanese telecom company and recently put into the linux source tree. See
http://www.nilfs.org. The on-disc format is not completely frozen and i expect
at least one minor revision to come in time.
The benefits of NiLFS are build-in fine-grained checkpointing, persistent
snapshots, multiple mounts and very large file and media support. Every
checkpoint can be transformed into a snapshot and v.v. It is said to perform
very well on flash media since it is not overwriting pieces apart from a
incidental update of the superblock, but that might change. It is accompanied
by a cleaner to clean up the segments and recover lost space.
My work is not a port of the linux code; its a new implementation. Porting the
code would be more work since its very linux oriented and never written to be
ported outside linux. The goal is to be fully interchangable. The code is non
intrusive to other parts of the kernel. It is also very light-weight.
The current state of the code is read-only access to both clean and dirty
NiLFS partitions. On mounting a dirty partition it rolls forward the log to
the last checkpoint. Full read-write support is however planned!
Just as the linux code, mount_nilfs allows for the `head' to be mounted
read/write and allows multiple read-only snapshots/checkpoint mounts next to
it.
By allowing the RW mount at a different snapshot for read-write it should be
possible eventually to revert back to a previous state; i.e. try to upgrade a
system and being able to revert to the exact state prior to the upgrade.
Compared to other FS's its pretty light-weight, suitable for embedded use and
on flash media. The read-only code is currently 17kb object code on
NetBSD/i386. I doubt the read-write code will surpass the 50 or 60. Compared
this to FFS being 156kb, UDF being 84 kb and NFS being 130kb. Run-time memory
usage is most likely not very different from other uses though maybe a bit
higher than FFS.
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep
Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
option MODULAR to using %MODULAR%. While it is now possible to only
request the new version in the affected Makefiles, it is made mandatory for
everybody because I just fixed a bug in config(1) that would not make it
fail in the case of a syntax error in the Makefile template.
it caused the return from the enclosing function to break, as well as the
ssp return on i386. To fix both issues, split configure in two pieces
the one before calling ssp_init and the one after, and move the ssp_init()
call back in main. Put ssp_init() in its own file, and compile this new file
with -fno-stack-protector. Tested on amd64.
XXX: If we want to have ssp kernels working on 5.0, this change needs to
be pulled up.
- renamed to MEMORY_DISK_RBFLAGS to better fit the rest of the
MEMORY_DISK options(4)
- change default value to RB_AUTOBOOT instead of RB_SINGLE, and adapt
the config(5) files accordingly
- document this option inside options(4)
See also http://mail-index.netbsd.org/tech-kern/2008/12/25/msg003924.html
Reviewed by abs@ in private mail.
magic libc symbol. This also allows to bid farewell to subr_prf2.c
and merge the contents back to subr_prf.c. The host kernel bridging
is now done via rumpuser_putchar().
the base NetBSD system. It uses Linux LVM2 tools and our BSD licensed
device-mapper driver.
The device-mapper driver can be used to create virtual block devices which
maps virtual blocks to real with target mapping called target. Currently
these targets are available a linear, zero, error and a snapshot (this is
work in progress and doesn't work yet).
The lvm2tools adds lvm and dmsetup binary to based system, where the lvm
tool is used to manage and administer whole LVM and the dmestup is used to
communicate iwith device-mapper kernel driver. With these tools also
a libdevmapper library is instaled to the base system.
Building of tools and driver is currently disable and can be enabled with
MKLVM=yes in mk.conf. I will add sets lists and rc.d script soon.
Oked by agc@ and cube@.
into modules. By and large this commit:
- shuffles header files and ifdefs
- splits code out where necessary to be modular
- adds module glue for each of the components
- adds/replaces hooks for things that can be installed at runtime
own file, subr_exec_fd.c (they're used only by exec).
After this change, the kernel source modules are in a partitioned
enough state to allow building a system without vfs at all.
mail: http://mail-index.netbsd.org/source-changes/2008/10/10/msg211109.html
* Scary-looking socket locking stubs (changed to KASSERT of locked)
* depends on INET inappropriately (though now you must add new
accept filter names to the uipc_accf.c line in conf/files if
you aren't using dataready or httpready)
* New code uses MALLOC/FREE -- changed to kmem_alloc/kmem_free;
could be pool_cache, these are all fixed-size allocations.
We need to verify that this works as expected with protocols with per-socket
locking, like PF_LOCAL. I'm a little concerned about the case where the
lock on the listen socket isn't the same lock as on the eventual connected
socket.
UDF. Future users could be msdosfs, ufs, nilfs2 (when ready), cd9660 etc.
Note that its not the same as UFS's DIRHASH support; UFS would need a good
cleanup/splitout of directory operations to adopt to this new directory
hashing support since most directory operations are interweaved with the
vnops itself. This is a TODO.
unconfigured device. That removes the compile-time constant number
of useable devices.
While here, add disk_busy()/disk_unbusy() instrumentation.
Reviewed by: Quentin Garnier <cube@netbsd.org>
2) get rid of grep variable -- grep isn't used here
3) add a -m option that prints the release major number (like "4")
4) add a comment documenting the options
support for specifying an accept filter for a service (mostly as a usage
example, but it can be handy for other things). Manual pages to follow
in a day or so.
OK core@.
Add Wasabi System's WAPBL (Write Ahead Physical Block Logging)
journaling code. Originally written by Darrin B. Jewell while
at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.
OK'd by core@, releng@.
into two parts so that some of the routines could be used by rump.
Now that rump uses both vfs_subr and vfs_subr2 and there is no
reason to keep two files lying around, re-unite them.
const char bootprog_kernrev[] = "4.99.70";
For now, we still also include the builder name and date and such, so
that we don't break anything, but those are (probably) on the way out.
Part of the "bit-identical sources yield bit-identical release files"
project.
The rule is, if you change scan.l or gram.y, you bump the config(5)
version. If you implement the changes under sys/conf/files or affiliate,
you bump the required version in sys/conf/files or in an appropriate place
to minimise annoyance. If the changes makes new config(1) incompatible
with a previous version of config(5), embed it in config(1) using the
CONFIG_MINVERSION definition along with CONFIG_VERSION.
This has been in the tree for what, 3 years now? It's even documented...
to dig it out manually if installing by version number... and also to
make it somewhat easier to notice up front if one accidentally boots
the wrong test kernel. not like I've ever done that. ;-)
PR kern/38563.
"retro" computers, but NetBSD also runs a growing number of rare
and retro add-on cards. With this patch, NetBSD supports the IDEC
Supervision/16, a black&white image capture board for the 16-bit
ISA bus. Approximate date of manufacture: 1991. Total instances
known to be in use throughout the world: one.
Coming soon; isvctl(8), the utility program for capturing 8-bit,
512x480 images at speeds of up to 6 frames per second.
them in the mi "files" file, and remove include statements from md files.
These shouldn't pull in additional kernel code when not in use, so it
shouldn't do any harm except a risk of namespace collisions which
should be easy to fix.
host controller implementations, start with two little functions
which fake up string descriptors (which were inconststent, language
table fetching didn't interoperate with other code in the tree)
Introduce a per-FS rename lock and new vfsops to manipulate it.
Get this lock while renaming. Also add another relookup() in do_sys_rename,
which is a hack to kludge around some of the worst deficiencies of
ufs_rename.
reviewed-by: pooka (and an earlier rev by ad)
posted on tech-kern with no objections.
- All three functions are included in the kernel by default.
They call a backend function cpu_in_cksum after possibly
computing the checksum of the pseudo header.
- cpu_in_cksum is the core to implement the one-complement sum.
The default implementation is moderate fast on most platforms
and provides a 32bit accumulator with 16bit addends for L32 platforms
and a 64bit accumulator with 32bit addends for L64 platforms.
It handles edge cases like very large mbuf chains (could happen with
native IPv6 in the future) and provides a good base for new native
implementations.
- Modify i386 and amd64 assembly to use the new interface.
This disables the MD implementations on !x86 until the conversion is
done. For Alpha, the portable version is faster.
compiled with -g. For the initial set, netbsd on amd64 grows by around
80KB. This allows much easier use of GDB for post-mortem debugging as
it can understand the layout of data structures. The additional data can
be strip(1)ped off normally for size constraint environments.
for config_time.h and athhal_options.h.
Note: we still copy param.c because I'm told that we should still support
people editing that on a per-compile basis.
Add schedctl(8) - a program to control scheduling of processes and threads.
Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;
Proposed on: <tech-kern>. Reviewed by: <ad>.
For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack
For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack
(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie
This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.
Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.
via the standard audio interfaces is redirected back to userland as raw
PCM data on /dev/padN.
One example usage is to stream audio to an AirTunes compatible device using
rtunes (http://www.nazgul.ch/dev_rtunes.html), ie:
$ rtunes - < /dev/pad0
$ mpg123 -a /dev/sound1 blah.mp3
Another option is to capture audio output from eg. Real Player, by simply
instructing Real Player to output to /dev/sound1, and running:
$ cat /dev/pad0 > blah.pcm
Rip the transport code completely out of puffs and generalize it
into an independent module which will be used for multiple purposes
in the future. This module is called the Pass-to-Userspace
Transporter (known as "putter" among friends).
This is very much work-in-progress and one dependency with puffs
remains: the request framing format.
The device name is still /dev/puffs, but that will change soon.
Users of puffs need the following in their kernel configs now:
pseudo-device putter
on the original approach of SVR4 with some inspirations about balancing
and migration from Solaris. It implements per-CPU runqueues, provides a
real-time (RT) and time-sharing (TS) queues, ready to support a POSIX
real-time extensions, and also prepared for the support of CPU affinity.
The following lines in the kernel config enables the SCHED_M2:
no options SCHED_4BSD
options SCHED_M2
The scheduler seems to be stable. Further work will come soon.
http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.htmlhttp://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!
For now these just pass through to the current softintr code.
(The naming is different to allow softint/softintr to co-exist for a while.
I'm hoping that should make it easier to transition.)
from the forwarding table's users:
Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.
Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.
Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).
Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).
Cosmetic:
Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.
Stop using variadic arguments for rip6_output(), it is
unnecessary.
Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.
Make rt_maskedcopy() easier to read by using meaningful variable
names.
Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.
Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.
One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.
I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.
Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
compatibility with the older ioctls. This avoids stack smashing and
abuse of "struct sockaddr" when ioctls placed "struct sockaddr_foo's" that
were longer than "struct sockaddr".
XXX: Some of the emulations might be broken; I tried to add code for
them but I did not test them.
FORTIFY_SOURCE feature of libssp, thus checking the size of arguments to
various string and memory copy and set functions (as well as a few system
calls and other miscellany) where known at function entry. RedHat has
evidently built all "core system packages" with this option for some time.
This option should be used at the top of Makefiles (or Makefile.inc where
this is used for subdirectories) but after any setting of LIB.
This is only useful for userland code, and cannot be used in libc or in
any code which includes the libc internals, because it overrides certain
libc functions with macros. Some effort has been made to make USE_FORT=yes
work correctly for a full-system build by having the bsd.sys.mk logic
disable the feature where it should not be used (libc, libssp iteself,
the kernel) but no attempt has been made to build the entire system with
USE_FORT and doing so will doubtless expose numerous bugs and misfeatures.
Adjust the system build so that all programs and libraries that are setuid,
directly handle network data (including serial comm data), perform
authentication, or appear likely to have (or have a history of having)
data-driven bugs (e.g. file(1)) are built with USE_FORT=yes by default,
with the exception of libc, which cannot use USE_FORT and thus uses
only USE_SSP by default. Tested on i386 with no ill results; USE_FORT=no
per-directory or in a system build will disable if desired.
The major changes are:
+ 4Gb (24XX) card support
+ Rewritten fabric and loop evaluation code
+ New f/w sets
The 4Gb changes required major rototilling, which caused a rewrite of
fabric and loop eval code. The latter can now be set up to tune for
dynamic device arrival/departure if the framework is set up for it,
or to be firm about waiting for devices.
Testing has been principally on amd64, i386 and sparc64 and seems to
not have broken things for me.
from doc/BRANCHES:
idle lwp, and some changes depending on it.
1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.
* dev/ic/ug.c (main code shared by the attachments)
* dev/isa/ug_isa.c (isa attachment)
* dev/acpi/ug_acpi.c (acpi attachment)
That means that ug(4) can now be attached via ACPI.
Thanks to Mihai Chelaru for the good work.
Please note, that <tech-kern> people should note about
file names before commit. Otherwise, function may fail
with errno set to EDIRTY, and return -1. ;)
route_in6, struct route_iso), replacing all caches with a struct
route.
The principle benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.
Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.
DETAILS
1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:
struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);
sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.
sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.
The 'flags' argumet for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).
2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.
3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:
int rtcache_setdst(struct route *, const struct sockaddr *);
rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.
It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.
4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
device controllers, and more specifically raid controllers.
Add a new sensor type, ENVSYS_DRIVE, to report drive status. From OpenBSD.
Add bio and sysmon support to mfi(4). This allow userland to query
status for drives and logical volumes attached to a mfi(4) controller. While
there fix some debug printfs in mfi so they compile.
Add bio(4) to amd64 and i386 GENERIC.
Seems to be quite stable. Some work still left to do.
Please note, that syscalls are not yet MP-safe, because
of the file and vnode subsystems.
Reviewed by: <tech-kern>, <ad>
Minor modifications by me:
-use an mi device major number
-(coarsly) divided into pci card specific and less specific parts, moved
the latter to dev/drm
-renamed autoconf attributes to reflect this
Todo:
-adapt all card frontends but i915 to drm include file location
-review the mtrr change
-make the change to agp_i810.c coexist with the fix for buggy VESA
BIOSes which is commented out temporarily
-RCS IDs etc style stuff
-LKM support (rescan support for vga)
-test
dev/ic/wdc.c and in dev/ata/ata.c, #include "opt_ata.h", and make
both the files compile with or *without* ATADEBUG. Do not compile
with ATADEBUG by default.
merge link_set_* sections into the text section for a.out kernels,
from sys/arch/arm/conf/ to sys/conf/ since there is no ARM specific
stuff in it and other ports would share it.
The suspension helpers are now put into file system specific operations.
This means every file system not supporting these helpers cannot be suspended
and therefore snapshots are no longer possible.
Implemented for file systems of type ffs.
The new API is enabled on a kernel option NEWVNGATE. This option is
not enabled by default in any kernel config.
Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.
Welcome to 4.99.9 (new vfs op vfs_suspendctl).