NetBSD

Author	SHA1	Message	Date
maxv	e8bf64df9d	Check MT_FREE by default, and not just under DEBUG (or DIAGNOSTIC). This code is fast, with a nonexistent overhead - and we already take care of setting MT_FREE, so why not check it. In addition, stop registering the function name; that's not helpful since the MBUFFREE macro is local. Instead, set m_data to NULL, so that any access to a freed mbuf's data after mtod() or similar will page fault. The combination of these two changes provides a fast and efficient way of detecting use-after-frees in the network stack.	2017-12-31 06:57:12 +00:00
christos	9ee87335e6	Don't release the lock in the PR_NOWAIT allocation. Move the flags setting after acquiring the mutex. (from Tobias Nygren)	2017-12-29 16:13:26 +00:00
christos	ac475570f3	provide separate read and write functions to accommodate register functions that need a size argument.	2017-12-28 18:29:45 +00:00
ozaki-r	3e34af79cf	Add workqueue_wait that waits for a specific work to finish. The caller must ensure that no new work is enqueued before calling workqueue_wait. Note that if the workqueue is WQ_PERCPU, the caller can enqueue a new work to a queue other than the waiting queue. Discussed on tech-kern@	2017-12-28 07:00:52 +00:00
msaitoh	412ac21e62	Prevent panic or hangup in softint_disestablish(), pserialize_perform() or psref_target_destroy() while mp_online == false. See http://mail-index.netbsd.org/tech-kern/2017/12/25/msg022829.html	2017-12-28 03:39:48 +00:00
kamil	d46f49d32a	Refactor pipe1() and correct a bug in sys_pipe2() (SYS_pipe2). sys_pipe2() returns two integers (values); the 2nd one is a copy of the 2nd file descriptor that lands in fildes[2]. This is a side effect of reusing the code for sys_pipe() (SYS_pipe) and not cleaning it up. The first returned value is (on success) 0. Introduced a small refactoring in pipe1() so that it does not operate on retval[], but on an array int[2]. The caller sets retval[] for pipe() where needed. This refactoring touches compat code: netbsd32, linux, linux32. Before the changes on NetBSD/amd64: $ ktruss -i ./a.out [...] 15131 1 a.out pipe2(0x7f7fff2e62b8, 0) = 0, 4 [...] After the changes: $ ktruss -i ./a.out [...] 782 1 a.out pipe2(0x7f7fff97e850, 0) = 0 [...] There should not be a visible change for current users. Sponsored by <The NetBSD Foundation>	2017-12-26 08:30:57 +00:00
msaitoh	7281eb3fdc	Make cold __read_mostly like mp_online.	2017-12-26 03:58:03 +00:00
ozaki-r	a45a6f1723	Apply C99-style struct initialization to lockops_t	2017-12-25 09:13:40 +00:00
christos	d3ef0ca68b	Merge the code back; the problem was that since we are reading/writing to a kernel address for PT_{READ,WRITE}_{I,D} we need the kernel vmspace.	2017-12-23 22:12:19 +00:00
kamil	f06ee48eda	ptrace: Partially undo PT_{READ,WRITE}_{I,D} and unbreak these commands The refactored code did not work and was generating EFAULT. Sponsored by <The NetBSD Foundation>	2017-12-22 15:02:57 +00:00
kamil	102875f88e	Drop SYS_vadvise. The (o)vadvise syscall has been a dummy since the beginning of NetBSD. It is an obsolete remnant from the old UNIX. Sponsored by <The NetBSD Foundation>	2017-12-19 19:40:03 +00:00
kamil	885229d011	Drop SYS_sbrk. sbrk - change data segment size. This syscall has been a dummy since the inception of the project. Sponsored by <The NetBSD Foundation>	2017-12-19 18:34:47 +00:00
kamil	ffa5cda8e7	Regenerate kern/systrace_args.c for the ___lwp_park60 prototype change. ___lwp_park60 removed 'const' from the ts argument. 'const struct timespec ts' -> 'struct timespec ts' Sponsored by <The NetBSD Foundation>	2017-12-19 08:51:09 +00:00
kamil	438b670525	Drop the sstk(2) syscall stub sstk - change stack section size This functionality has never been implemented and is a remnant from 16-bit UNIX. This stub appeared with the first NetBSD commit. Sponsored by <The NetBSD Foundation>	2017-12-19 08:48:19 +00:00
christos	f818d5c42e	handle siginfo requests for ptrace32	2017-12-17 20:59:27 +00:00
christos	28f60f0ed7	- reduce ifdef ugliness by moving it up top. - factor out PT_IO and make PT_{READ,WRITE}_{I,D} use it - factor out PT_DUMPCORE - factor out sendsig code ... more to come ...	2017-12-17 15:43:27 +00:00
christos	2b32abf16a	untangle the mess: - factor out common code - break each ptrace subcall to its own sub-function ... more to come ...	2017-12-17 04:35:21 +00:00
christos	01d917581d	Fix the build: XXX this might^Wwill break module autoloading... It is the general issue about symbol replacement during module loading and unloading...	2017-12-16 18:42:22 +00:00
maxv	e9069ab139	compat_util.c must be compiled by default in the kernel. It is needed by generic non-compat code, so it must not depend on anything (libcompat or whatever option we choose to associate it to).	2017-12-16 10:15:12 +00:00
mrg	78ad3fe789	hopefully work around the irregular "fork fails in init" problem. if a pool is growing, and the grower is PR_NOWAIT, mark this. if another caller wants to grow the pool and is also PR_NOWAIT, busy-wait for the original caller, which should either succeed or hard-fail fairly quickly. implement the busy-wait by unlocking and relocking this pool's mutex and returning ERESTART. other methods (such as having the caller do this) were significantly more code and this hack is fairly localised. ok chs@ riastradh@	2017-12-16 03:13:29 +00:00
chs	ef2af070d4	add some assertions to verify that CPU_INFO_FOREACH() works right early in the boot process. this detects existing bugs on some platforms.	2017-12-15 16:05:51 +00:00
pgoyette	a036e37d7f	Remove the check for duplicate-module-name-on-pending-list since it really doesn't help. The check really cannot fail, and it only looks at the list belonging to the current level of recursion. Instead, verify that the module's modcmd(MODULE_CMD_INIT, ...) does not introduce a duplicate module name as a result of recursively calling module_do_load().	2017-12-14 22:28:59 +00:00
pgoyette	2f9aeaa1a1	When looking for a duplicate module name, also check the pending list.	2017-12-14 11:45:40 +00:00
martin	e924f07b97	Change a KASSERTMSG into a regular module_error - not nice for the kernel to panic when I try to modload the 'ntfs' module.	2017-12-14 10:39:32 +00:00
ozaki-r	2b3456ea00	Improve debugging functions - Make psref_check_duplication check just if a given psref is on the list - It checked both psref and target - Suggested by riastradh@ some time ago - Add psref_check_existence that checks a releasing psref is surely on the list	2017-12-14 05:45:55 +00:00
pgoyette	1591fbefb5	Use KASSERT to ensure that the newly-added module's name can be found. XXX Pull-up to -8	2017-12-11 22:00:26 +00:00
knakahara	b2d6db8088	Fix psref(9) part of PR kern/52515. This completes the fix for the PR. Implemented by ozaki-r@n.o, reviewed by riastradh@n.o, thanks. XXX need pullup-8	2017-12-11 02:33:17 +00:00
pgoyette	57f560bc12	Add additional duplicate-module-name check in case we have two modules with the same internal name but no conflicting symbol definitions. When we load a module from the file system, the filename may have no relationship to the internal module's name. Furthermore, comparing the module's filename is insufficient if the file is loaded from an absolute path. XXX pullup to netbsd-8	2017-12-10 03:08:32 +00:00
christos	de1a66547a	use process_reg32 instead of struct reg32.	2017-12-09 05:18:45 +00:00
christos	d6abd869d0	add disgusting magic to handle compat_netbsd32 as a module.	2017-12-08 15:54:40 +00:00
christos	8e2dd5803e	regen XXX: pullup-8	2017-12-08 01:20:52 +00:00
christos	85bf85b701	make _lwp_park return the remaining time to sleep in the "ts" argument if it is a relative timestamp, as discussed in tech-kern. XXX: pullup-8	2017-12-08 01:19:29 +00:00
christos	4e720b5200	- Reset ignored or masked traps to avoid infinite loops - If sigpost fails don't add an SDT_PROBE ok (and author) chuq	2017-12-07 19:49:43 +00:00
christos	2b14b22e3a	Make {s,g}et{db,fp,}regs work again for PK_32 processes XXX: pullup-8	2017-12-07 15:21:34 +00:00
mrg	2af02d0e1d	properly account PR_RECURSIVE pools like vmstat does.	2017-12-04 03:05:24 +00:00
christos	3281275922	Also wait interruptibly when exiting. Avoids deadlocked-on-exit processes created by golang.	2017-12-02 22:51:22 +00:00
jdolecek	508f8978c8	according to benchmark extracting pkgsrc.tar, using FUA and hence waiting for each transfer to write through to the medium is way slower than just letting the drive use a cached write and doing DIOCCACHESYNC on the end Results were (fs block 32KB / frag 4KB, partition aligned on 32KB boundary): HDD at siisata(4): no-FUA: 108 sec w/FUA: 294 sec SSD at ahcisata(4): no-FUA: 73 sec w/FUA: 502 sec change the flag so that FUA is only used for the commit block write; for journal data write, only pass DPO, rely on the cache flush to get them to media	2017-12-02 17:29:55 +00:00
mrg	ac4d6c0dba	include opt_compat_netbsd.h, so that eg COMPAT_70 will be set.	2017-12-02 08:22:04 +00:00
mrg	277fd3d5f5	add two new members to uvmexp_sysctl{}: bootpages and poolpages. bootpages is set to the pages allocated via uvm_pageboot_alloc(). poolpages is calculated from the list of pools nr_pages members. this brings us closer to having a valid total of pages known by the system, vs actual pages originally managed. XXX: poolpages needs some handling for PR_RECURSIVE pools still.	2017-12-02 08:15:42 +00:00
christos	10b27c7544	Allow attaching for write, but return no events.	2017-12-01 19:05:49 +00:00
christos	ea05286d92	add fo_name so we can identify the fileops in a simple way.	2017-11-30 20:25:54 +00:00
maxv	24d8fe75c0	If no auxv is present, don't kmem_alloc(0). Easy to panic the kernel by typing 'cat /proc/aout_pid/auxv' on whatever a.out binary you're running. Fortunately, amd64 does not enable EXEC_AOUT by default. Unfortunately, i386 does enable it by default.	2017-11-30 18:44:16 +00:00
christos	52eec12892	Put previous removed diagnostic back as debug. It has caught in the past (and now) different kqueue behavior between NetBSD and other kqueue implementations that depend on specific file types. If 3rd party programs trigger this it is probably because we are doing something different.	2017-11-30 14:19:27 +00:00
riastradh	51b6971009	Remove spammy kevent failure printf. Maybe this was once useful for debugging the kernel, but it's just console spam triggered by buggy or malicious userland programs now.	2017-11-30 05:52:40 +00:00
ozaki-r	06f11aad47	Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE. If IFEF_MPSAFE is not set, hold the lock; otherwise don't hold it. This change requires adding KERNEL_LOCK to subsequent functions called from if_ioctl, such as ifmedia_ioctl and ifioctl_common, to protect non-MP-safe components. Proposed on tech-kern@ and tech-net@	2017-11-22 03:03:18 +00:00
msaitoh	575c86383d	Increase the size of softint's data to prevent panic on big machines. Nowadays, some device drivers and some pseudo interfaces allocate a lot of softints. The resource size for softints is static, and it panics when it exceeds the limit. It could be resized dynamically. Until dynamic resizing is implemented, increase softint_bytes from 8192 to 32768.	2017-11-22 02:20:21 +00:00
ozaki-r	683f5aa5e9	Implement debugging feature for pserialize(9) The debugging feature detects violations of pserialize constraints. It causes a panic: - if a context switch happens in a read section, or - if a sleepable function is called in a read section. The feature is enabled only if LOCKDEBUG is on. Discussed on tech-kern@	2017-11-21 08:49:14 +00:00
ozaki-r	48f9f2189f	Implement a debugging facility (overflow/underflow detection) for localcount We cannot get an accurate count from a localcount instance because it consists of per-cpu counters and we have no way to sum them up atomically. So we cannot detect counter overflow/underflow as we can do on a normal refcount. The facility adds an atomic counter to each localcount instance to enable the validations. The counter ups and downs in synchronization with the per-CPU counters. The counter is used iff both DEBUG and LOCKDEBUG are enabled in the kernel. Discussed on tech-kern@	2017-11-17 09:26:36 +00:00
christos	51b157a357	- fix an assert; we can reach there if we are nowait or limitfail. - when priming the pool and failing with ERESTART, don't decrement the number of pages; this avoids the issue of returning an ERESTART when we get to 0, and is more correct. - simplify the pool_grow code, and don't wakeup things if we ENOMEM.	2017-11-14 15:02:06 +00:00
jmcneill	f69ff2189c	Include "flash" in list of block devices that don't use partitions.	2017-11-14 14:14:29 +00:00
christos	dce2bc7064	grab a copy of the absolute pathbuf, before namei() munges it.	2017-11-13 22:01:45 +00:00
christos	c84ec9f755	Use the pathbuf which we pass to namei() (which is always absolute) as the resolved pathname. We need this in the case of scripts where p_path needs to point to the interpreter and not the script itself. Otherwise things like perl scripts that depend on /proc/$$/exe to re-exec themselves end up being fork bombs. In reality we should be using the fully resolved/canonicalized path here, but namei is not giving it back to us.	2017-11-13 20:38:31 +00:00
riastradh	227d009714	Apply same treatment to cv_timedwaitbt.	2017-11-12 20:04:51 +00:00
riastradh	4800a332d3	Clarify interpretation of timeout/epsilon in cv_timedwaitbt.	2017-11-12 19:46:34 +00:00
christos	b328643cbf	Don't add kevents to closing file descriptors (from riastradh)	2017-11-11 03:58:01 +00:00
riastradh	509483453b	Assert KM_SLEEP xor KM_NOSLEEP in all kmem allocation.	2017-11-09 23:20:12 +00:00
christos	17c894d283	Add assertions that either PR_WAITOK or PR_NOWAIT are set.	2017-11-09 22:52:26 +00:00
christos	b3254cd976	Don't use 0 for PR_NOWAIT	2017-11-09 22:21:27 +00:00
christos	b368d720c2	don't pass 0 to the pool flags	2017-11-09 21:57:06 +00:00
christos	9b7a6414b8	Add O_REGULAR to enforce opening of only regular files (like we have O_DIRECTORY for directories). This is better than open(, O_NONBLOCK), fstat()+S_ISREG() because opening devices can have side effects.	2017-11-09 20:30:01 +00:00
christos	948108c143	Handle the ERESTART case from pool_grow()	2017-11-09 19:34:17 +00:00
christos	a20c95d549	make the KASSERTMSG/panic strings consistent as '%s: [%s], __func__, wchan'	2017-11-09 15:53:40 +00:00
christos	56ae922037	Since pr_lock is now used to wait for two things (PR_GROWING and PR_WANTED), we need to loop for the condition we wanted.	2017-11-09 15:40:23 +00:00
christos	f890274f96	add a "booted_method" string to aid in debugging double boot matches.	2017-11-09 01:02:55 +00:00
christos	0891190b55	hack around namei problem.	2017-11-07 20:58:23 +00:00
christos	0011aa658c	Store full executable path in p->p_path as discussed in tech-kern. This means that the full executable path is always available. - exec_elf.c: use p->p_path to set AT_SUN_EXECNAME, and since this is always set, do so unconditionally. - kern_exec.c: simplify pathexec, use kmem_strfree where appropriate and set p->p_path - kern_exit.c: free p->p_path - kern_fork.c: set p->p_path for the child. - kern_proc.c: use p->p_path to return the executable pathname; the NULL check for p->p_path, should be a KASSERT? - exec.h: gc ep_path, it is not used anymore - param.h: bump version, 'struct proc' size change TODO: 1. reference count the path string, to save copy at fork and free just before exec? 2. canonicalize the pathname by changing namei() to LOCKPARENT vnode and then using getcwd() on the parent directory?	2017-11-07 19:44:04 +00:00
christos	3afe107bee	Add two utility functions to help use kmem with strings: kmem_strdupsize, kmem_strfree.	2017-11-07 18:35:57 +00:00
christos	cd1c6201df	We computed the length of the string already, so use it...	2017-11-07 15:57:38 +00:00
riastradh	d6585e3401	Assert that pool_get failure happens only with PR_NOWAIT. This would have caught the mistake I made last week leading to null pointer dereferences all over the place, a mistake which I evidently poorly scheduled alongside maxv's change to the panic message on x86 for null pointer dereferences.	2017-11-06 18:41:22 +00:00
mlelstv	cc92bcd96f	pool_grow can now fail even when sleeping is ok. Catch this case in pool_get and retry.	2017-11-05 07:49:45 +00:00
christos	ad97afb146	use Elf_Sym ** instead of casting.	2017-11-04 22:17:55 +00:00
martin	640f0abac6	Make kobj_sym_lookup's result type an Elf_Addr. Fixes the arm builds.	2017-11-04 12:14:41 +00:00
pgoyette	64ae8753d8	Remove the ABI version-and-length check that was recently introduced; sysctl(9) ABIs should be stable across versions. XXX Pull-up to -8	2017-11-03 22:45:14 +00:00
maxv	4e8a8f71db	Handle absolute relocations coming from the kernel: preserve SHN_ABS in the kernel and module symbols, and when relocating a symbol that has SHN_ABS, take its value as-is and don't return an error if it equals zero. Sent on tech-kern@.	2017-11-03 09:59:07 +00:00
riastradh	50a782dc6e	C99ify initialization of dummy_timecounter.	2017-11-02 15:28:23 +00:00
martin	a6bab1a764	Allow architectures to define macros PROC_MACHINE_ARCH(P) and PROC_MACHINE_ARCH32(P) to override the value for sysctl hw.machine_arch (native and netbsd32 compat, resp.). Use these for arm and mips instead of the (not working, noisy, in case of arm) sysctl override and #ifdef __mips__ in architecture-neutral code.	2017-10-31 12:37:23 +00:00
riastradh	aca2a29cb6	Allow only one pending call to a pool's backing allocator at a time. Candidate fix for problems with hanging after kva fragmentation related to PR kern/45718. Proposed on tech-kern: https://mail-index.NetBSD.org/tech-kern/2017/10/23/msg022472.html Tested by bouyer@ on i386. This makes one small change to the semantics of pool_prime and pool_setlowat: they may fail with EWOULDBLOCK instead of ENOMEM, if there is a pending call to the backing allocator in another thread but we are not actually out of memory. That is unlikely because nearly always these are used during initialization, when the pool is not in use. XXX pullup-8 XXX pullup-7 XXX pullup-6 (requires tweaking the patch) XXX pullup-5...	2017-10-28 17:06:43 +00:00
pgoyette	cb32a134a5	Update the kernhist(9) kernel history code to address issues identified in PR kern/52639, as well as some general cleaning-up... (As proposed on tech-kern@ with additional changes and enhancements.) Details of changes: * All history arguments are now stored as uintmax_t values[1], both in the kernel and in the structures used for exporting the history data to userland via sysctl(9). This avoids problems on some architectures where passing a 64-bit (or larger) value to printf(3) can cause it to process the value as multiple arguments. (This can be particularly problematic when printf()'s format string is not a literal, since in that case the compiler cannot know how large each argument should be.) * Update the data structures used for exporting kernel history data to include a version number as well as the length of history arguments. * All [2] existing users of kernhist(9) have had their format strings updated. Each format specifier now includes an explicit length modifier 'j' to refer to numeric values of the size of uintmax_t. * All [2] existing users of kernhist(9) have had their format strings updated to replace uses of "%p" with "%#jx", and the pointer arguments are now cast to (uintptr_t) before being subsequently cast to (uintmax_t). This is needed to avoid compiler warnings about casting "pointer to integer of a different size." * All [2] existing users of kernhist(9) have had instances of "%s" or "%c" format strings replaced with numeric formats; several instances of mis-match between format string and argument list have been fixed. * vmstat(1) has been modified to handle the new size of arguments in the history data as exported by sysctl(9). * vmstat(1) now provides a warning message if the history requested with the -u option does not exist (previously, this condition was silently ignored, with only a single blank line being printed). 
* vmstat(1) now checks the version and argument length included in the data exported via sysctl(9) and exits if they do not match the values with which vmstat was built. * The kernhist(9) man-page has been updated to note the additional requirements imposed on the format strings, along with several other minor changes and enhancements. [1] It would have been possible to use an explicit length (for example, uint64_t) for the history arguments. But that would require another "rototill" of all the users in the future when we add support for an architecture that supports a larger size. Also, the printf(3) format specifiers for explicitly-sized values, such as "%"PRIu64, are much more verbose (and less aesthetically appealing, IMHO) than simply using "%ju". [2] I've tried very hard to find "all [the] existing users of kernhist(9)" but it is possible that I've missed some of them. I would be glad to update any stragglers that anyone identifies.	2017-10-28 00:37:11 +00:00
joerg	e64612f440	Revert printf return value change.	2017-10-27 12:25:14 +00:00
utkarsh009	f11595bab5	[syzkaller] Cast all the printf's to (void) as a result of the new printf(9) declaration.	2017-10-27 09:59:16 +00:00
maya	18b796d442	Use C99 initializer for filterops Mostly done with spatch with touchups for indentation @@ expression a; identifier b,c,d; identifier p; @@ const struct filterops p = - { a, b, c, d + { + .f_isfd = a, + .f_attach = b, + .f_detach = c, + .f_event = d, };	2017-10-25 08:12:37 +00:00
riastradh	2a7a645aaa	Document lock order and locking rules.	2017-10-25 06:02:40 +00:00
jdolecek	d3e642e387	remove counter for 'journal I/O bufs biowait' - it's (total - async), so superfluous; adjust the description of the other counters a bit to make them clearer	2017-10-23 19:03:40 +00:00
riastradh	f7b8b20d17	Initialize the in/out parameter vmin. vmin is only an optional hint since we're not passing UVM_FLAG_FIXED, but that doesn't mean we should use uninitialized stack garbage as the hint. Noted by chs@.	2017-10-20 19:06:46 +00:00
riastradh	4691bf4bd7	Carve out KVA for execargs on boot from an exec_map like we used to. Candidate fix for PR kern/45718: `processes sometimes get stuck and spin in vm_map', a problem that has been plaguing all our 32-bit ports for years. Since we currently use large (256k) buffers for execargs, and since nobody has stepped up to tackle breaking them into bite-sized (or at least page-sized) chunks, after KVA gets sufficiently fragmented we can't allocate new execargs buffers from kernel_map. Until 2008, we always carved out KVA for execargs on boot with a uvm submap exec_map of kernel_map. Then ad@ found that the uvm_km_free call, to discard them when done, cost about 100us, which a pool avoided: https://mail-index.NetBSD.org/tech-kern/2008/06/25/msg001854.html https://mail-index.NetBSD.org/tech-kern/2008/06/26/msg001859.html ad@ _simultaneously_ introduced a pool _and_ eliminated the reserved KVA in the exec_map submap. This change preserves the pool, but restores exec_map (with less code, by putting it in MI code instead of copying it in every MD initialization routine). Patch proposed on tech-kern: https://mail-index.NetBSD.org/tech-kern/2017/10/19/msg022461.html Patch tested by bouyer@: https://mail-index.NetBSD.org/tech-kern/2017/10/20/msg022465.html I previously discussed the issue on tech-kern before I knew of the history around exec_map: https://mail-index.NetBSD.org/tech-kern/2012/12/09/msg014695.html The candidate workaround I proposed of using pool_setlowat to force preallocation of KVA would also force preallocation of physical RAM, which is a waste not incurred by using exec_map, and which is part of why I never committed it. 
There may remain a general problem that if thread A calls pool_get and tries to service that request by a uvm_km_alloc call that hangs because KVA is scarce, and thread B does pool_put, the pool_put in thread B will not notify the pool_get in thread A that it doesn't need to wait for KVA, and so thread A may continue to hang in uvm_km_alloc. However, (a) that won't apply here, because there is exactly as much KVA available in exec_map as exec_pool will ever try to use. (b) It is possible that it may not even matter in other cases as long as the page daemon eventually tries to shrink the pool, which will cause a uvm_km_free that can unhang the hung uvm_km_alloc. XXX pullup-8 XXX pullup-7 XXX pullup-6 XXX pullup-5, perhaps...	2017-10-20 14:48:43 +00:00
martin	f115d566a4	Make check_exec() errors print the name of the binary that fails to execute.	2017-10-20 12:11:34 +00:00
bouyer	d4ce271380	PR port-arm/52603: There is a race here, as seen on arm with FPU: LWP L is running but not on CPU, has its FPU state on CPU2 which has not been released yet, so fpexc still has VFP_FPEXC_EN set in the PCB copy. LWP L is scheduled on CPU1, CPU1 calls cpu_switchto() for L in mi_switch(). cpu_switchto() will set VFP_FPEXC_EN in the FPU's fpexc register per the PCB fpexc copy. Before CPU1 calls pcu_switchpoint() for L, CPU2 calls pcu_do_op(PCU_CMD_SAVE | PCU_CMD_RELEASE) for L because it still holds its FPU state and wants to load another lwp. This causes VFP_FPEXC_EN to be cleared in the PCB copy, but not in CPU1's register. L's l_pcu_cpu is set to NULL. When CPU1 calls pcu_switchpoint() for L, it sees l_pcu_cpu is NULL, and doesn't call the release callback. Now CPU1 has its FPU enabled but with the wrong FPU state. Fix by releasing the PCU even if l_pcu_cpu is NULL.	2017-10-16 15:03:57 +00:00
christos	bb321f6151	Setting AT_BASE on static binaries breaks TLS because they assume that it is 0, will fix it differently.	2017-10-16 01:50:55 +00:00
christos	3df3b581f3	For static PIE set the interpreter address to be the entry offset so we don't lose it.	2017-10-08 15:00:40 +00:00
maxv	252ca9c54a	Remove compat_linux32 from the autoload list and add an enable/disable sysctl, like compat_linux.	2017-09-29 17:47:29 +00:00
maxv	aef145dda9	Remove compat_linux from the autoload list, and add a sysctl to enable or disable it - which defaults to disabled. The following command is now required to use linux binaries: sysctl -w emul.linux.enabled=1 After a discussion on tech-kern@. All the other ideas to reduce the attack surface have drawbacks, and this sysctl seems to be the best option.	2017-09-29 17:08:00 +00:00
joerg	0e5b5aa88a	Fix non-DIAGNOSTIC build by adjusting _vstate_assert here too.	2017-09-22 06:05:20 +00:00
joerg	5db0939512	Change the VSTATE_ASSERT_UNLOCKED code by pushing the potential lock handling into the backend and doing an optimistic (unlocked) check first. Always taking the vnode interlock makes this assertion otherwise very heavy for multi-processor machines. Ride the kernel version bump.	2017-09-21 18:19:44 +00:00
jakllsch	6b34528ad5	Initialize ex_lock and ex_cv only in the not-EX_EARLY case.	2017-09-18 13:22:56 +00:00
christos	e7f0067cbe	more const	2017-09-16 23:55:33 +00:00
christos	c483c7cba9	more debug info	2017-09-16 23:55:16 +00:00
christos	9d349e2adb	add missing const	2017-09-16 23:25:34 +00:00
sevan	684872c792	Remove support for VERIFIED_EXEC_FP_RMD160, VERIFIED_EXEC_FP_SHA1, and VERIFIED_EXEC_FP_MD5 options. These algorithms are either broken or on their way to being broken. Discussed on tech-security http://mail-index.netbsd.org/tech-security/2017/08/21/msg000936.html ok riastradh	2017-09-13 22:24:42 +00:00
joerg	69ab70f077	Fix a race between sysctl_unpcblist and closef.	2017-09-09 14:41:19 +00:00
pgoyette	1c284673f6	When adding a new veriexec_file_entry, if an entry already exists with all the same values (except for the filename) just ignore it. Otherwise report the duplicate-entry error. This allows the user to create a signature file with veriexecgen(8) and not worry about duplicate entries (due to hard-linked files) which will otherwise cause /etc/rc.d/veriexec to report an error. Fixes PR kern/52512 XXX Pull-up for -8	2017-08-31 08:47:19 +00:00
pgoyette	c31e1d979d	Revert previous changes. They are wrong. The intended clean-up is already being handled by the call to veriexec_file_free() in the "out:" path.	2017-08-29 12:48:50 +00:00
pgoyette	f18bf91f4d	One more resource to release - the filename, if we kept it.	2017-08-29 10:23:12 +00:00
pgoyette	8bdb86c1df	Release any allocated resources if we take the error paths. As posted on tech-kern and discussed on IRC.	2017-08-29 10:19:54 +00:00
dholland	cec712d80b	If we go to allocate and find someone else has done so at the same time, don't trigger a refcount leak of the other guy's object. From mjg@freebsd. While here also remove a bogus use of lbolt on the same path.	2017-08-28 04:57:11 +00:00
kamil	a69b333e73	Remove the filesystem tracing feature. This is a legacy interface from 4.4BSD, and it was introduced to overcome shortcomings of ptrace(2) at that time, which are no longer relevant (performance). Today /proc/#/ctl offers a narrow subset of ptrace(2) commands and is not suitable for modern applications beyond simplistic tracing scenarios. This removal will simplify kernel internals. Users will still be able to use all the other /proc files. This change won't affect other procfs files nor Linux compat features within mount_procfs(8). /proc/#/ctl isn't available on Linux. Remove: - /proc/#/ctl from mount_procfs(8) - P_FSTRACE note from the documentation of ps(1) - /proc/#/ctl and filesystem tracing documentation from mount_procfs(8) - KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9) - source code file miscfs/procfs/procfs_ctl.c - PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h - KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h - PSL_FSTRACE (0x00010000) from sys/sys/proc.h - P_FSTRACE (0x00010000) from sys/sys/sysctl.h Reduce code complexity after removal of this functionality. Update TODO.ptrace accordingly: remove two entries about /proc tracing. Do not keep legacy notes as comments in the headers about removed PSL_FSTRACE / P_FSTRACE, as this interface had a small number of users (close to or equal to zero). Proposed on tech-kern@. All filesystem tracing utility users are encouraged to switch to ptrace(2). Sponsored by <The NetBSD Foundation>	2017-08-28 00:46:06 +00:00
kre	968e76ebe6	Build fix attempt ... changes affect !KERNEL (ie: userland, rump) version of this file only. Rather than adding meaningless {} around all uses of functions that are #defined to nothing for userland, #define the funcs to something that is functionally equivalent (but which appeases gcc). Also, define KASSERT() to nothing for userland, which avoids the need to add a #define for mutex_owned which would otherwise be needed, and simultaneously stops gcc from complaining about a lack of a prototype.	2017-08-24 17:18:55 +00:00
skrll	78070145bc	Whitespace fix	2017-08-24 11:37:25 +00:00
jmcneill	7de85ed29e	Add EX_EARLY flag for extent_create, which skips locking. Required for using extent subsystem in early bootstrap code, before caches are enabled. From skrll@	2017-08-24 11:33:28 +00:00
hannken	28650af9eb	Change forced unmount to revert open device vnodes to anonymous devices.	2017-08-21 09:00:21 +00:00
hannken	7801661c06	No need to cache anonymous device vnodes, they will never be looked up. Set key to (dead_rootmount, 0, NULL) and add assertions.	2017-08-21 08:56:45 +00:00
maxv	c778810068	Remove compat_svr4, compat_svr4_32 and compat_ibcs2 from the list of autoloaded modules. These options are disabled everywhere (except ibcs2 on Vax, but Vax does not support kernel modules, so doesn't matter), therefore there is no issue in removing them from the list. Interested users will now have to do a 'modload' first, or uncomment the entries in GENERIC.	2017-08-08 16:57:32 +00:00
maxv	1d68b497f2	Remove compat_freebsd from the list of autoloaded modules. Interested users will now have to type 'modload' to use it, or uncomment the entry in GENERIC. I should have removed it when I disabled COMPAT_FREEBSD by default, sorry about that.	2017-08-08 08:12:14 +00:00
christos	7869295617	use the same string for the log and uprintf.	2017-08-06 09:14:14 +00:00
mrg	65d1d4aa12	normalise a BIOHIST log message	2017-08-04 07:00:17 +00:00
riastradh	56272c962e	Don't walk off the end of the dirent buffer. From Ilja Van Sprundel.	2017-07-28 15:37:23 +00:00
riastradh	cf5a000fe5	Clamp the length we use, not the length we don't. Avoids uninitialized memory disclosure to userland. From Ilja Van Sprundel.	2017-07-28 15:16:39 +00:00
martin	f08cc415b0	Avoid integer overflow in kern_malloc(). Reported by Ilja Van Sprundel. XXX Time to kill malloc() completely!	2017-07-28 12:28:48 +00:00
skrll	111cbb5944	Add a condition variable (ex_flwanted) to struct extent so that ex_flags becomes an invariant. Remove strange locking for ex_flags as a result.	2017-07-24 19:56:07 +00:00
maxv	d245e6f22a	Should be loadfactor().	2017-07-14 13:23:48 +00:00
maxv	bcdfaccefa	Revert rev1.26. l_estcpu is increased by only one cpu, not all of them.	2017-07-14 13:02:20 +00:00
hannken	31624a0218	Regen.	2017-07-12 09:31:59 +00:00
hannken	d29c150b3b	As VOP_ADVLOCK() may block indefinitely we cannot take fstrans here. Fixes PR kern/52364: System hangs not much before showing the login prompt	2017-07-12 09:31:07 +00:00
dholland	9a94872476	Fix vnode leak on error, introduced by the openat family changes in -r1.200. From mjg@freebsd.	2017-07-09 22:48:44 +00:00
maxv	5dc461da23	explain a bit	2017-07-08 15:15:43 +00:00
christos	c85be1e9c7	move the timestamp stuff to uipc_socket.c because it already has the compat includes.	2017-07-06 17:42:39 +00:00
christos	2b50acc97b	Merge the two copies SO_TIMESTAMP/SO_OTIMESTAMP processing to a single function, and add a SOOPT_TIMESTAMP define reducing compat pollution from 5 places to 1.	2017-07-06 17:08:57 +00:00
christos	c3a5f17a00	don't print diagnostic for AF_LINK	2017-07-05 17:54:46 +00:00
riastradh	0a89dacf06	Add cv_timedwaitbt, cv_timedwaitbt_sig. Takes struct bintime maximum delay, and decrements it in place so that you can use it in a loop in case of spurious wakeup. Discussed on tech-kern a couple years ago: https://mail-index.netbsd.org/tech-kern/2015/03/23/msg018557.html Added a parameter for expressing desired precision -- not currently interpreted, but intended for a future tickless kernel with a choice of high-resolution timers.	2017-07-03 02:12:47 +00:00
riastradh	a18efaac6b	Nix trailing whitespace. No functional change.	2017-07-03 00:53:33 +00:00
joerg	5f391f4ae2	Export the guard size of the main thread via vm.guard_size. Add a complementary writable sysctl for the initial guard size of threads created via pthread_create. Let the existing attribute accessors do the right thing. Raise the default guard size for threads to 64KB.	2017-07-02 16:41:32 +00:00
christos	6d52cc85b8	don't warn about AF_LINK sockets with sa_len less than the size of the sockaddr	2017-07-02 02:39:18 +00:00
christos	c4aed00fad	fix file descriptor locking (from joerg). fixes kernel crashes by running go XXX: pullup-7	2017-07-01 20:08:56 +00:00
christos	7700e78cab	put the code that returns the sizeof the socket by family in one place.	2017-07-01 16:59:12 +00:00
snj	4e609ee710	fix typo	2017-06-25 04:10:47 +00:00
joerg	b77121f193	Recommit exec_subr.c revision 1.79: Always include a 1MB guard area beyond the end of stack. While ASLR will normally create a guard area as well, this provides a deterministic area for all binaries. Mitigates the rest of CVE-2017-1000374 and CVE-2017-1000375 from Qualys. Additionally, change VM_DEFAULT_ADDRESS_TOPDOWN to include user_stack_guard_size in the size reservation.	2017-06-23 21:28:38 +00:00
skrll	34397172e3	Unwrap two lines. NFC.	2017-06-22 09:05:09 +00:00
martin	8ee7e18703	Change a KASSERT to KASSERTMSG and print the xcall function to be invoked as a debugging help.	2017-06-21 07:39:04 +00:00
christos	f4961bd8ed	Change len type to be unsigned int for consistency with the input type. Don't check for negative; it does not matter, we clamp anyway. This broke the compat32 getsockname() where an uninitialized socklen_t ended up randomly negative, causing it to fail.	2017-06-20 20:34:49 +00:00
joerg	2e851f5508	Revert for the moment, creates problems on i386.	2017-06-19 19:02:16 +00:00
joerg	5bcc4a51d6	Always include a 1MB guard area beyond the end of stack. While ASLR will normally create a guard area as well, this provides a deterministic area for all binaries. Mitigates the rest of CVE-2017-1000374 and CVE-2017-1000375 from Qualys.	2017-06-19 15:53:16 +00:00
hannken	a94bf97d25	Make the fast path of fstrans_get_lwp_info() "static inline".	2017-06-18 14:00:17 +00:00
hannken	90e2dee24a	Clear fstrans entries whose mount is gone from the last fstrans_done() only.	2017-06-18 13:59:45 +00:00
chs	2b3f157429	create an nmap table for module symtabs too. needed by dtrace.	2017-06-14 00:52:37 +00:00
riastradh	26bd73f202	Add heading comment for private localcount_adjust subroutine.	2017-06-12 21:08:34 +00:00
riastradh	44df486bb8	Move forward declaration to top of file. Keep header comment above localcount_init adjoined to it. No functional change.	2017-06-12 21:07:14 +00:00
chs	20bf3061d4	define a copy of getnanotime() named dtrace_getnanotime() so that dtrace can know from the name that it should not allow setting fbt probes on it. needed by dtrace.	2017-06-09 01:16:33 +00:00
chs	3756187172	add some pool_allocators for pool item sizes larger than PAGE_SIZE. needed by dtrace.	2017-06-08 04:00:01 +00:00
chs	ec5ea71a90	move some buffer cache internals declarations from buf.h to vfs_bio.c. this is needed to avoid name conflicts with ZFS and also makes it clearer that other code shouldn't be messing with these. remove the LFS debug code that poked around in bufqueues and remove the BQ_EMPTY bufqueue since nothing uses it anymore. provide a function to let LFS and wapbl read the value of nbuf for now.	2017-06-08 01:23:01 +00:00
chs	67c81802f1	allow cv_signal() immediately followed by cv_destroy(). this sequence is used by ZFS in a couple places and by supporting it natively we can undo our local ZFS changes that avoided it. note that this is only legal when all of the waiters use cv_wait() and not any of the other variations, and lockdebug will catch any violations of this rule.	2017-06-08 01:09:52 +00:00
hannken	287643b0da	Operations fstrans_start() and fstrans_start_nowait() now always use FSTRANS_SHARED as lock type so remove the lock type argument. File system state FSTRANS_SUSPENDING is now unused so remove it. Regen vnode_if files. Ride 8.99.1 less than an hour ago.	2017-06-04 08:05:41 +00:00
hannken	775d23a76b	Operations fstrans_start() and fstrans_start_nowait() now always use FSTRANS_SHARED as lock type so remove the lock type argument.	2017-06-04 08:03:26 +00:00
hannken	f5647f853e	Locking a layer vnode using the regular bypass routine is no longer racy. Undo the change from 2017-03-30 11:16:52, commitid eurqbzuGxGRlryLz and make vi_lock a krwlock_t again.	2017-06-04 08:02:26 +00:00
hannken	48c67e7912	Regen.	2017-06-04 08:00:27 +00:00
hannken	dfcc54aa9c	Add "FSTRANS=LOCK" and "FSTRANS=UNLOCK" to vop_lock and vop_unlock. Add two "static inline" functions to vnode_if.c to handle MPSAFE and FSTRANS before and after the "VCALL()". Take FSTRANS and handle error before "VCALL(...vop_lock...)" and release it after "VCALL(...vop_unlock...)".	2017-06-04 07:59:17 +00:00
hannken	8e1cefd98c	A vnode is usually called "active", if it has an associated file system node and a usecount greater than zero. Therefore rename state "VS_ACTIVE" to "VS_LOADED" and add a new synthetic state "VS_ACTIVE" for VSTATE_ASSERT() to assert an active vnode. Add VSTATE_ASSERT_UNLOCKED() to be used with v_interlock unheld and move the state assertion macros to sys/vnode_impl.h.	2017-06-04 07:58:29 +00:00
chs	ffb3d80455	localcount_init() can't fail because percpu_alloc() can't fail. remove the check and change the return type to void.	2017-06-02 00:32:12 +00:00
chs	fd34ea77eb	remove checks for failure after memory allocation calls that cannot fail: kmem_alloc() with KM_SLEEP kmem_zalloc() with KM_SLEEP percpu_alloc() pserialize_create() psref_class_create() all of these paths include an assertion that the allocation has not failed, so callers should not assert that again.	2017-06-01 02:45:05 +00:00
chs	1f0e167178	vmem_alloc() with VM_SLEEP cannot fail, so percpu_alloc() cannot fail either.	2017-05-31 23:54:17 +00:00
chs	c85613c074	assert that vmem_alloc() with VM_SLEEP does not fail.	2017-05-31 23:53:30 +00:00
hannken	e4e82d96c7	Restrict vgone() to suspended file systems only. Welcome to 7.99.75, old file system modules would cause a diagnostic assertion with new kernel.	2017-05-28 16:39:41 +00:00
hannken	a8045334ce	Add a helper to propagate file system suspension for vrevoke(). Take care to retry suspension on interrupt as vrevoke must succeed.	2017-05-28 16:35:47 +00:00
bouyer	6e4cb2b9ab	merge the bouyer-socketcan branch to HEAD. CAN stands for Controller Area Network, a broadcast network used in automation and automotive fields. For example, the NMEA2000 standard developed for marine devices uses a CAN network as the link layer. This is an implementation of the linux socketcan API: https://www.kernel.org/doc/Documentation/networking/can.txt you can also see can(4). This adds a new socket family (AF_CAN) and protocol (PF_CAN), as well as the canconfig(8) utility, used to set timing parameters of CAN hardware. Also included is a driver for the CAN controller found in the Allwinner A20 SoC (I tested it with an Olimex lime2 board, connected with PIC18-based CAN devices). There is also the canloop(4) pseudo-device, which allows using the socketcan API without CAN hardware. At this time the CANFD part of the linux socketcan API is not implemented. Error frames are not implemented either. But I could get the cansend and canreceive utilities from the canutils package to build and run with minimal changes. tcpdump(8) can also be used to record frames, which can be decoded with ethereal.	2017-05-27 21:02:54 +00:00
riastradh	c921bd9b79	Check VOP_INACTIVE contract with a judicious assert.	2017-05-26 14:40:09 +00:00
riastradh	51e152b5ce	Clarify comment.	2017-05-26 14:39:20 +00:00
riastradh	93562e3f53	Eliminate crusty debugging sludge. We have a mostly sane vnode lifecycle now. If this needs debugging, it should be done once at the call site of VOP_RECLAIM.	2017-05-26 14:34:19 +00:00
riastradh	f4ad397b3e	regen	2017-05-26 14:21:54 +00:00
riastradh	7f7aad09bd	Make VOP_RECLAIM do the last unlock of the vnode. VOP_RECLAIM naturally has exclusive access to the vnode, so having it locked on entry is not strictly necessary -- but it means if there are any final operations that must be done on the vnode, such as ffs_update, requiring exclusive access to it, we can now kassert that the vnode is locked in those operations. We can't just have the caller release the last lock because some file systems don't use genfs_lock, and require the vnode to remain valid for VOP_UNLOCK to work, notably unionfs.	2017-05-26 14:20:59 +00:00
christos	9aa2075330	switch to a switch	2017-05-25 20:42:36 +00:00
pgoyette	3b2df19edf	When logging a history record for biowait(), include the return address as a parameter, to identify to which of the many calls to biowait() the record refers.	2017-05-25 02:28:07 +00:00
hannken	69174779b1	With dounmount() working on a suspended file system remove no longer needed fields mnt_busynest and mnt_unmounting from struct mount. Welcome to 7.99.73	2017-05-24 09:53:55 +00:00
hannken	c2c49e1ed2	Remove the syncer dance from dounmount(). The syncer skips unmounting file systems as they are suspended. Remove now unused syncer_mutex.	2017-05-24 09:52:59 +00:00
pgoyette	cb99404632	Fix a comment - in localcount_fini(), we don't care whether it was the caller or some other code that drained the localcount; all we care is that it has been drained.	2017-05-19 02:20:24 +00:00
pgoyette	a372bceac2	Introduce new localcount(9) reference-count primitives.	2017-05-19 00:01:33 +00:00
hannken	9fc3ca45b3	Suspend file system while revoking a vnode. This way no operations run on the mounted file system during revoke and all operations see the state before or after the revoke.	2017-05-17 12:46:14 +00:00
hannken	677cf1d8b4	Suspend file system while unmounting. This way no operations run on the mounted file system during unmount and all operations see the state before or after the (possibly failed) unmount.	2017-05-17 12:45:03 +00:00
christos	f6b964d39b	protect against NULL, from PaulG	2017-05-11 23:50:17 +00:00
nat	5e34165f16	Explicitly set the flags instead of masking set values in. This fixes FNONBLOCK weirdness seen in audio.c OK christos@ and martin@.	2017-05-11 22:38:56 +00:00
riastradh	9c32900485	regen	2017-05-10 06:19:47 +00:00
riastradh	913618cd04	Forward-declare `struct lwp' so we can use `struct lwp *' here.	2017-05-10 06:08:56 +00:00
christos	21e6c9452c	fp == NULL in the DIAGNOSTIC, so use the real fp and also print the errno.	2017-05-09 21:18:51 +00:00
christos	1e7fb326f1	de-triplicate.	2017-05-07 22:54:54 +00:00
hannken	4f4cfe27b2	Enter fstrans from _vfs_busy() and leave from vfs_unbusy(). Adapt sched_sync() and do_sys_sync().	2017-05-07 08:26:58 +00:00
hannken	01d31ceb6d	Return ENOENT if trying to suspend an unmounted file system.	2017-05-07 08:25:54 +00:00
hannken	c18a56f135	Move fstrans initialization to vfs_mountalloc().	2017-05-07 08:24:20 +00:00
hannken	12ad3b05fd	Handle the case where the mount is gone and its mnt_transinfo is NULL.	2017-05-07 08:23:28 +00:00
hannken	853d034c97	Remove now invalid comment.	2017-05-07 08:21:08 +00:00
joerg	4f77b889d0	Extend the mmap(2) interface to allow requesting protections for later use with mprotect(2), but without enabling them immediately. Extend the mremap(2) interface to allow duplicating mappings, i.e. create a second range of virtual addresses that references the same physical pages. Duplicated mappings can have different effective protections. Adjust PAX mprotect logic to disallow effective protections of W&X, but allow one mapping W and another X protections. This obsoletes using temporary files for purposes like JIT. Adjust PAX logic for mmap(2) and mprotect(2) to fail if W&X is requested and not silently drop the X protection. Improve test cases to ensure correct operation of the changed interfaces.	2017-05-06 21:34:51 +00:00
kamil	1627fdf3a4	Set clear comment about EI_OSABI and EI_ABIVERSION /* * NetBSD sets generic SYSV OSABI and ABI version 0 * Native ELF files are distinguishable with NetBSD specific notes */ No functional change.	2017-05-04 11:12:23 +00:00
kamil	ec80600208	Use consistently "bufq_private(bufq)" instead of "bufq->bq_private" No functional change.	2017-05-04 11:03:27 +00:00
kamil	df97a42593	Correct typo in the comment No functional change.	2017-05-04 11:01:16 +00:00
kamil	88e477a387	Fix kernel panic triggered with LLDB PT_SETSTEP and PT_CLEARSTEP in the current design must unlock proc_lock and t->p_lock. These functions use lwp_delref() for a tracee with more than one LWP. This function internally locks (t->)p_lock, which is a lock against self. New ATF tests with PT_*STEP and multiple LWPs are coming to catch these bugs in future changes. Sponsored by <The NetBSD Foundation>	2017-05-03 15:53:31 +00:00
pgoyette	48e395b1b8	Introduce mutex_ownable() to determine if it is possible for the current process to acquire a mutex.	2017-05-01 21:35:25 +00:00
ryo	d9ee24f798	whitespace police	2017-05-01 10:00:43 +00:00
abhinav	39132b9e2d	Rearrange the if conditions in order to get rid of unnecessary indentation. No functional change intended. ok christos@	2017-04-27 16:52:22 +00:00
riastradh	8e5c8dbff1	regen	2017-04-26 03:04:24 +00:00
riastradh	6fa7b15833	Change VOP_REMOVE and VOP_RMDIR to preserve lock/ref on dvp. No change to vp -- the plan is to replace the node by the componentname in the vop parameters, and let all directory vops do lookups internally. Proposed on tech-kern with no objections: https://mail-index.netbsd.org/tech-kern/2017/04/17/msg021825.html	2017-04-26 03:02:47 +00:00
pgoyette	ca22f64915	Add a check to ensure that a new sysctl node was attached in the tree at the place we expected it to be attached! As mentioned several times (on tech-kern@ mailing list) over the past 18 months or so, I've seen a few instances where this will trigger, although I've been unable to reproduce them. Hopefully some wider exposure will reveal the underlying cause of this rare phenomenon. Commit was proposed on tech-kern list, and no objections raised.	2017-04-25 22:07:10 +00:00
pgoyette	ab5e69493e	Use __func__ for routine name in printf() calls. NFC intended.	2017-04-25 08:46:38 +00:00
kamil	795febebbd	Try to fix build of sys_lwp.c lwp_create() has acquired more arguments, and the latest one was missing here. Per analogiam with changes in the same commit to other source files, go for &SS_INIT.	2017-04-21 19:38:35 +00:00
christos	d7746f2ee3	- Propagate the signal mask from the ucontext_t to the newly created thread as specified by _lwp_create(2) - Reset the signal stack for threads created with _lwp_create(2)	2017-04-21 15:10:34 +00:00
kamil	34e270cb64	Enhance verbosity of debug message for ELF magic mismatch Print e_ident[EI_MAG3] (it was missed) Print e_ident[EI_CLASS] as it is used to determine correct ELF magic. No functional change for non-debug (without option DEBUG_ELF) build.	2017-04-21 13:17:42 +00:00
christos	5d75b0065e	simplify.	2017-04-19 15:54:45 +00:00
pgoyette	05aa8c5f12	Be consistent about checking for text section address being 0, and don't ignore errors by falling through to the next section(s). As discussed on tech-kern@	2017-04-19 06:19:02 +00:00
christos	6ef342f61a	PR/52174: Remove root test, it is too verbose. XXX: need to come up with something better.	2017-04-18 18:07:29 +00:00
hannken	bd152b56b5	Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.	2017-04-17 08:34:27 +00:00
hannken	eb8533a8b6	No need to keep a not yet visible mount busy. Move vfs_busy() from vfs_mountalloc() to vfs_rootmountalloc(). XXX: Do we really need to vfs_busy() for vfs_mountroot?	2017-04-17 08:32:55 +00:00
hannken	20bb034f5b	Remove unused argument "nextp" from vfs_busy() and vfs_unbusy(). Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.	2017-04-17 08:32:00 +00:00
hannken	ebb8f73b4b	Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace incrementing mp->mnt_refcnt with vfs_ref(mp).	2017-04-17 08:31:01 +00:00
hannken	256581e1f9	Cleanup after mountlist iterator: - remove now unused field mnt_list. - rename mount_list to mountlist and make it local to vfs_mount.c. - make mountlist_lock local to vfs_mount.c. Change pstat.c to retrieve vnodes by lru lists.	2017-04-17 08:29:58 +00:00
riastradh	629022bd8f	regen to confirm no functional change	2017-04-16 17:18:54 +00:00
riastradh	f2ed57297a	Count vnode arguments correctly. Don't count arguments that have WILLRELE/WILLPUT; count arguments that are struct vnode *. No functional change currently because it happens that every released or put vnode argument comes first or after other ones.	2017-04-16 17:18:28 +00:00
riastradh	d08e9ec7c8	regen	2017-04-16 16:49:25 +00:00
riastradh	6f8a4faacd	Back out previous. Breaks file systems for which VOP_UNLOCK doesn't work on a reclaimed vnode. The only case in tree right now is sys/fs/union -- most file systems use genfs_unlock, which does work on a reclaimed vnode. Maybe we can work around this -- and still enable VOP_RECLAIM's callees to assert lock ownership -- by having VOP_RECLAIM unlock the vnode instead.	2017-04-16 16:48:08 +00:00
riastradh	5a3d793f2a	regen to confirm no functional change	2017-04-15 23:21:46 +00:00
riastradh	ce1c68db98	Keep vnode locked during VOP_RECLAIM. No bump because it wouldn't have been possible to acquire the lock in VOP_RECLAIM anyway -- instant deadlock because vn_lock waits to transition out of the RECLAIMING state first. Benefit is that we can now assert ownership of the lock in any operations called by VOP_RECLAIM. Discussed on tech-kern: https://mail-index.netbsd.org/tech-kern/2017/04/01/msg021751.html	2017-04-15 23:16:53 +00:00
skrll	070497e366	Paranoia... keep vmspace reference while doing pmap_procwr	2017-04-13 07:58:45 +00:00
christos	cd306a0c3c	use opt_kmem.h for the KMEM_ variables.	2017-04-12 20:05:54 +00:00
hannken	e08a8c4104	Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator. Add a helper to retrieve a mount with "highest generation < arg" and use it from vfs_unmount_forceone() and vfs_unmountall1().	2017-04-12 10:35:10 +00:00
hannken	6058fea9b5	Switch veriexec_dump() and veriexec_flush() to mountlist iterator.	2017-04-12 10:30:02 +00:00
hannken	a315c73868	Switch do_sys_sync() and do_sys_getvfsstat() to mountlist iterator.	2017-04-12 10:28:39 +00:00
hannken	3137e0cee1	Switch vfs_vnode_lock_print() and printlockedvnodes() to _mountlist_next(). Switch sched_sync() and sysctl_kern_vnode() to mountlist iterator.	2017-04-12 10:26:33 +00:00
hannken	5ff843c227	Switch fstrans_dump() to _mountlist_next().	2017-04-12 10:23:35 +00:00
christos	d8c52c37b1	use a different root vnode variable to appease the rump gods.	2017-04-11 21:15:57 +00:00
riastradh	6d3ccf9762	Simplify: eliminate a now-needless unlock/lock cycle.	2017-04-11 14:45:46 +00:00
christos	b23251f1fa	return EPERM like the other failures.	2017-04-11 14:37:07 +00:00
christos	e85d5cbc14	Don't try to autoload modules before root is mounted.	2017-04-11 14:31:55 +00:00
riastradh	b7fb52a55b	regen to confirm no functional change	2017-04-11 14:30:33 +00:00
riastradh	d20cc14aa7	Eliminate now-unused WILLUNLOCK vop flag.	2017-04-11 14:29:32 +00:00
riastradh	2b4f5f70bd	regen	2017-04-11 14:26:13 +00:00
riastradh	87fb32292e	Make VOP_INACTIVE preserve vnode lock on return. Discussed on tech-kern: https://mail-index.netbsd.org/tech-kern/2017/04/01/msg021751.html Ride 7.99.68, a bumpy bus of incremental vfs improvements!	2017-04-11 14:24:59 +00:00
hannken	2f4fa4f94f	Add an iterator over the currently mounted file systems. Ride 7.99.68	2017-04-11 07:46:37 +00:00
jdolecek	6ef596151b	rename allow_fuadpo to allow_dpofua, so it's the same order as the SCSI flag	2017-04-10 21:36:05 +00:00
jdolecek	75f6d4fd1a	improve performance of journal writes by parallelizing the I/O - use 4 bufs by default, add sysctl vfs.wapbl.journal_iobufs to control it this also removes need to allocate iobuf during commit, so it might help to avoid deadlock during memory shortages like PR kern/47030	2017-04-10 21:34:37 +00:00
jdolecek	946ca69f6d	change b_wapbllist to TAILQ, to preserve the LRU order	2017-04-10 19:52:38 +00:00
kamil	05ffc73c35	Add new ptrace(2) API: PT_SETSTEP & PT_CLEARSTEP These operations allow marking a thread as single-stepping. This allows, among other things: - single step and emit a signal (PT_SETSTEP & PT_CONTINUE) - single step and trace syscall entry and exit (PT_SETSTEP & PT_SYSCALL) The former is useful for debuggers like GDB or LLDB. The latter can be used to singlestep a usermode kernel. These examples don't limit use-cases of this interface. Define PT_*STEP only for platforms defining PT_STEP. Add new ATF tests setstep[1234]. These ptrace(2) operations first appeared in FreeBSD. Sponsored by <The NetBSD Foundation>	2017-04-08 00:25:49 +00:00
jdolecek	046f6d9783	optionally use FUA instead of full cache sync, and DPO for journal writes, when supported by disk device; controlled by sysctl vfs.wapbl.allow_fuadpo, default off for now discussed on tech-kern	2017-04-05 20:38:53 +00:00
jdolecek	6801660c77	expose disk device FUA/DPO support via DIOCGCACHE, and allow the flags to be set for I/O; implement support in sd(4) and nvme(4) discussed on tech-kern	2017-04-05 20:15:49 +00:00
skrll	bdf6985b50	spaces to tab	2017-03-31 08:50:54 +00:00
martin	1fd4f01ae0	PR kern/52117: move stop code for debugged children after fork into MI code. XXX we might want to revisit this when handling the same event for vfork better.	2017-03-31 08:47:04 +00:00
msaitoh	eabd5e1de9	Remove extra 0x. This bug was added when replacing bitmask_snprintf(9) with snprintb(3) (in between NetBSD 5 and 6). Old bitmask_snprintf(9) didn't add "0x" automatically for hexadecimal values, so old code used it with "0x%s".	2017-03-31 08:38:13 +00:00
msaitoh	913e06bbd4	Remove extra 0x in m_print().	2017-03-31 05:44:05 +00:00
christos	6e0bd5329a	factor out getauxv code.	2017-03-30 20:17:11 +00:00
hannken	d0dc55acf0	Locking a layer vnode is racy as it may become reclaimed before calling the operation on the lower vnode. Replace vi_lock with a rw_obj and change layered file systems to share the lock with the lower vnode. Layered file systems now use genfs_lock()/_unlock/_islocked(). Welcome to 7.99.67	2017-03-30 09:16:52 +00:00
hannken	799c5cfefa	Change the operations vector before changing the mount. Vnode operations enter the mount before using the vector.	2017-03-30 09:15:51 +00:00
hannken	1a31dbf3eb	Change vrelel() to defer the test for a reclaimed vnode until we hold both the interlock and the vnode lock. Add a common operation to deallocate a vnode in state LOADING.	2017-03-30 09:14:59 +00:00
hannken	cf9ded4af4	Add flag VRELEL_FORCE_RELE to vrelel() to force release and use it from vdrain_vrele() and vrele_flush() to prevent a possible live lock from vrele_flush().	2017-03-30 09:14:08 +00:00
hannken	a644d1ecc8	Lock the vnode before changing its writecount.	2017-03-30 09:13:37 +00:00
hannken	dd67c605a3	Change _fstrans_start() to allocate per lwp info for layered file systems to get a reference on the mount. Set mnt_lower on successful mount only.	2017-03-30 09:13:01 +00:00
hannken	3332a1029e	Change last users of FSTRANS_LAZY to FSTRANS_SHARED and change genfs_suspendctl() to move from FSTRANS_NORMAL to FSTRANS_SUSPENDED and vice versa.	2017-03-30 09:12:21 +00:00
kamil	7c54169f6c	Revert previous. Christos Zoulas pointed out that the ELF_AUX_ENTRIES * sizeof(AuxInfo) assumption is incomplete. There is emulation code that can use different values (smaller and larger).	2017-03-29 22:48:03 +00:00

10132 Commits