Commit Graph

8499 Commits

Author SHA1 Message Date
msaitoh
50699a2b3c Fix memory leak in the following cases when a device is attached or detached:
- No one has opened drvctl.
- kmem_alloc() failed in devmon_insert().

XXX pullup to both netbsd-5 and netbsd-6.
2013-04-26 09:04:43 +00:00
yamt
c7784f4d97 - make debug size check more strict
- add comments about debug features
2013-04-22 13:22:25 +00:00
yamt
470409eadf whitespace 2013-04-22 13:13:20 +00:00
uebayasi
38d7d2cac5 Whitespace. 2013-04-21 02:44:15 +00:00
christos
ea4869ad3c revert previous, you can run on mips 64 bit binaries with a 32 bit kernel. 2013-04-20 22:28:58 +00:00
christos
c91b1193e7 don't attempt to load elf64 on 32 bit machines 2013-04-20 18:04:41 +00:00
para
9c44086af0 addresses PR/47512
properly return NULL for failed allocations not 0x8 with size checks enabled.
2013-04-16 21:13:38 +00:00
skrll
38fd17d91a Fix PAX build. 2013-04-09 07:39:01 +00:00
skrll
94a59cc1db Remove some set but unused variables 2013-04-08 21:12:33 +00:00
chs
3f6811bc27 don't overwrite the CTF info with the symbol table. 2013-04-07 00:49:45 +00:00
rmind
2540bef8df xc_highpri: fix assert. 2013-04-07 00:31:40 +00:00
martin
9d0957eba7 Provide binary compatibility for architectures that (erroneously) had
a larger MAXPARTITIONS value (and thus larger struct disklabel).
2013-04-04 12:51:39 +00:00
christos
11c04fdfa0 undo previous and move the test to the timeout function since 0,0 means
disable timer/interval.
2013-04-01 16:37:22 +00:00
christos
7f7fe0a2eb do the timeout test centrally. 2013-04-01 15:46:46 +00:00
martin
cf1df18e92 ts2timo: return ETIMEDOUT instead of failing an assertion when the
calculated difference to the target time is zero.
2013-04-01 12:31:34 +00:00
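
The behaviour described in the ts2timo commit above (report ETIMEDOUT rather than assert when no time is left) can be sketched in userland C. This is a minimal illustration, not the kernel code: remaining_time() is a hypothetical helper, and it assumes both timespec values are already normalized (0 <= tv_nsec < 1000000000).

```c
#include <errno.h>
#include <time.h>

/*
 * Hypothetical helper, not the NetBSD ts2timo() implementation.
 * Computes deadline - now into *left and returns ETIMEDOUT when the
 * deadline has been reached or has already passed, 0 otherwise.
 * Assumes both inputs are normalized timespec values.
 */
static int
remaining_time(const struct timespec *now, const struct timespec *deadline,
    struct timespec *left)
{
	left->tv_sec = deadline->tv_sec - now->tv_sec;
	left->tv_nsec = deadline->tv_nsec - now->tv_nsec;
	if (left->tv_nsec < 0) {
		/* Borrow one second to keep tv_nsec non-negative. */
		left->tv_sec--;
		left->tv_nsec += 1000000000L;
	}
	/* Zero or negative difference: nothing left to wait for. */
	if (left->tv_sec < 0 || (left->tv_sec == 0 && left->tv_nsec == 0))
		return ETIMEDOUT;
	return 0;
}
```

A caller would treat ETIMEDOUT as "do not sleep at all" instead of converting a zero difference into a tick count.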
christos
2c8702f59e instead of doing the tests twice fix the *fix() routines to return ETIMEDOUT
if seconds are negative. According to TOG, this is not an error as Linux
claims. Also make an assert stricter.
2013-03-31 16:46:29 +00:00
christos
463c93b4e7 always return immediately on error, and if we passed negative seconds,
return with 0.
2013-03-31 16:45:06 +00:00
martin
573f2396f8 Move clock_gettime1() to subr_time.c (which is included in rump kernels) 2013-03-29 10:34:12 +00:00
christos
698e9d4d95 regen 2013-03-29 01:10:13 +00:00
christos
4cec95f0ea Centralize the computation of struct timespec to the int timo.
Make lwp_park take the regular arguments for specifying what kind
of timeout we supply like clock_nanosleep(), namely clockid_t and flags.
2013-03-29 01:08:17 +00:00
tls
88ad351cb1 Re-fix 'fix' for SA-2013-003. Because the original fix evaluated a flag
backwards, in low-entropy conditions there was a time interval in which
/dev/urandom could still output bits on an unacceptably short key.  Output
from /dev/random was *NOT* impacted.

Eliminate the flag in question -- it's safest to always fill the requested
key buffer with output from the entropy-pool, even if we let the caller
know we couldn't provide bytes with the full entropy it requested.

Advisory will be updated soon with a full worst-case analysis of the
/dev/urandom output path in the presence of either variant of the
SA-2013-003 bug.  Fortunately, because a large amount of other input
is mixed in before users can obtain any output, it doesn't look as dangerous
in practice as I'd feared it might be.
2013-03-28 18:06:48 +00:00
christos
fd5d831f1e downgrade an error to debug. 2013-03-24 22:06:37 +00:00
plunky
5ec364d4d9 C99 section 6.7.2.3 (Tags) Note 3 states that:
A type specifier of the form

	enum identifier

  without an enumerator list shall only appear after the type it
  specifies is complete.

which means that we cannot pass an "enum vtype" argument to
kauth_access_action() without fully specifying the type first.
Unfortunately there is a complicated include file loop which
makes that difficult, so convert this minimal function into a
macro (and capitalize it).

(ok elad@)
2013-03-18 19:35:35 +00:00
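
The workaround in the commit above relies on a macro body being checked only where it is expanded, so the macro can be defined before the enum type is complete. A minimal sketch of that idea follows; NODE_ACCESS_ACTION and enum node_type are hypothetical names chosen for illustration and are not the kauth(9) definitions.

```c
/*
 * Hypothetical illustration, not the kauth(9) code.
 * The macro mentions NODE_DIR before enum node_type is declared.
 * That is fine: the preprocessor only stores the tokens, and the
 * compiler sees them at the point of expansion, after the enum
 * has been fully defined.
 */
#define NODE_ACCESS_ACTION(type, dir_action, file_action) \
	((type) == NODE_DIR ? (dir_action) : (file_action))

/* The complete enum definition arrives later, e.g. from another header. */
enum node_type { NODE_FILE, NODE_DIR };
```

A function prototype taking "enum node_type" at the position of the macro would instead name a type that is not yet complete, which is the situation the quoted C99 text rules out.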
para
82aa1e7edd calculate vnode cache size based on the resource it gets allocated from
this stops setting kern.maxvnodes too high so it exhausts available space in kmem

http://mail-index.netbsd.org/tech-kern/2013/03/08/msg015095.html
2013-03-18 13:36:21 +00:00
gdt
de83e1acfd Add comment questioning lock asymmetry. 2013-03-14 19:13:17 +00:00
yamt
37fc08318c revert rev.1.37 for now.
PR/47634 from Ryo ONODERA.
while i have no idea how this change can break bge,
i don't have hardware and/or time to investigate right now.
2013-03-12 23:16:31 +00:00
pooka
83a2a556bf In pool_cache_put_slow(), pool_get() can block (it does mutex_enter()),
so we need to retry if curlwp took a context switch during the call.
Otherwise, CPU-local invariants can get screwed up:

    panic: kernel diagnostic assertion "cur->pcg_avail == cur->pcg_size" failed

This is (was) very easy to reproduce by just running:

  while : ; do RUMP_NCPU=32 ./a.out ; done

where a.out only calls rump_init().  But any situation where there's contention
and a pool doesn't have emptygroups would do.
2013-03-11 21:37:54 +00:00
pooka
55246528e8 At least pretend to not leak memory in sysctl initialization.
This commit message would be longer if it included opinions about
sysctllog vs. CTLFLAG_PERMANENT ...
2013-03-10 17:55:42 +00:00
christos
38cec6f03a more detailed/consistent error messages. 2013-03-10 04:25:06 +00:00
apb
f92c0e46b0 Properly differentiate between infinite timeout and zero timeout.
Local variable timo = -1 is used for zero timeout (non blocking mode).

Fixes PR 47625 from anthony.mallet
2013-03-08 09:32:59 +00:00
apb
7c5d63e1c6 In the timeout passed to sigtimedwait, NULL means an infinite timeout,
and {.tv_sec = 0, .tv_nsec=0} means do not block at all.  Add a comment
saying so.  The code incorrectly treats them both as an infinite timeout,
and that is not fixed by this commit.
2013-03-08 08:48:38 +00:00
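
The three timeout cases that the sigtimedwait comment distinguishes can be written down as a small classifier. This is a sketch under assumptions: classify_timeout() and enum wait_mode are hypothetical names, not part of the NetBSD sources.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical names for the three cases described in the commit. */
enum wait_mode { WAIT_FOREVER, WAIT_NONE, WAIT_BOUNDED };

/*
 * NULL means wait indefinitely; an all-zero timespec means do not
 * block at all; anything else is an ordinary bounded wait.
 */
static enum wait_mode
classify_timeout(const struct timespec *ts)
{
	if (ts == NULL)
		return WAIT_FOREVER;
	if (ts->tv_sec == 0 && ts->tv_nsec == 0)
		return WAIT_NONE;
	return WAIT_BOUNDED;
}
```

Keeping the NULL check separate from the zero check is what prevents the two cases from collapsing into the same "infinite" path.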
apb
90c6b7a188 also comment on the meaning of timo=0 for cv_timedwait_sig. 2013-03-08 08:36:37 +00:00
apb
e6dad85522 Add comments saying that a cv_timedwait and sleepq_block interpret
timo = 0 as an infinite timeout.  This is already documented in the
cv_timedwait(9) man page, and there is no sleepq_block(9) man page.
2013-03-08 08:35:09 +00:00
pooka
1578b4b049 make rump kernel syscalls through curproc->p_emul instead of rump_sysent 2013-03-07 19:17:46 +00:00
matt
17f82b93c2 Add a kern.configname sysctl object. 2013-03-07 18:02:54 +00:00
yamt
69f842b1d9 - use scaled calculations for avgcount
- sched_balance: account lwp which is currently running
- sched_balance: skip cpus w/o migratable lwps
2013-03-06 11:25:01 +00:00
yamt
0f92d1cdeb update comments 2013-03-06 11:20:10 +00:00
christos
e5a1aef5b9 remove extra chatty messages 2013-03-05 03:04:00 +00:00
christos
2a0d04a751 more debugging 2013-03-03 16:55:26 +00:00
pgoyette
e8ac3e27f9 Move boottime50 and its associated sysctl into the compat module. As
noted on tech-kern.  Should fix PR/47579.

OK christos@

Will request pull-up to 6.0 in a few days.
2013-02-21 01:39:54 +00:00
martin
b6d45f2118 Stopgap fix to make rump cooperate with pserialize, may be revisited later.
Patch from pooka, ok: rmind. No related regressions in a complete atf test
run (which works again with this, even on non x86 SMP machines).
2013-02-19 11:20:17 +00:00
christos
58523baaac PR/47569: Valery Ushakov: SOCK_NONBLOCK does not work because it does not
set SS_NBIO.
XXX: there are too many flags that mean the same thing in too many places,
and too many flags that mean the same thing and are different.
2013-02-14 21:57:58 +00:00
riastradh
322ad729b3 Fix some screw cases in cmsg file descriptor passing.
- Don't leave garbage in the control buffer if allocating file
descriptors fails in unp_externalize.

- Scrub the space between CMSG_LEN and CMSG_SPACE to avoid kernel
memory disclosure in unp_externalize.

- Don't read past cmsg_len when closing file descriptors that
couldn't get delivered, in free_rights.

ok christos
2013-02-14 01:00:07 +00:00
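
The second item in the commit above, scrubbing the bytes between CMSG_LEN and CMSG_SPACE, can be illustrated with the standard macros from <sys/socket.h>. This is a userland sketch only: cmsg_scrub_padding() is a hypothetical helper, and it assumes buf holds a single control message of at least CMSG_SPACE(datalen) bytes.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper, not the unp_externalize() code.
 * CMSG_LEN(datalen) is the header plus payload; CMSG_SPACE(datalen)
 * additionally includes trailing alignment padding. Zero that padding
 * so stale memory contents are not copied out with the message.
 */
static void
cmsg_scrub_padding(unsigned char *buf, size_t datalen)
{
	size_t used = CMSG_LEN(datalen);
	size_t total = CMSG_SPACE(datalen);

	if (total > used)
		memset(buf + used, 0, total - used);
}
```

Whether any padding exists depends on the platform's alignment rules, which is why the sketch compares the two sizes instead of assuming a fixed gap.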
hannken
9f9ac3cb83 Make the spec_node table implementation private to spec_vnops.c.
To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented.  Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire.  Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17
2013-02-13 14:03:48 +00:00
christos
0c9d7240de Can you please tell us the module name that had the wrong version? Thanks. 2013-02-12 19:14:50 +00:00
apb
bb0eb3bd82 Move the DDB-specific part of vpanic() to a new db_panic() function,
defined in ddb/db_panic.c and declared in ddb/ddbvar.h.  No functional
change.

The copyright years in db_panic.c are the years in which changes were
made to the code that has now been moved to db_panic.c.  No pre-NetBSD
copyright notice is needed because revision 1.12 of subr_prf.c had only
the trivial "#ifdef DDB \\ Debugger(); \\ #endif"
2013-02-10 11:04:19 +00:00
njoly
5fb876b9e0 Fix LOCKDEBUG build. 2013-02-09 11:04:32 +00:00
christos
4a90750c00 CID/980000: missing va_end() 2013-02-09 01:20:08 +00:00
christos
752baf2503 why didn't gcc find the formatting error? 2013-02-09 00:32:12 +00:00
christos
a67c3c8971 printflike maintenance. 2013-02-09 00:31:21 +00:00
skrll
8668323de3 Fix release of vmem_btag_lock (don't release twice in error path) 2013-02-08 09:30:01 +00:00
rmind
8ba0fc0dab - pserialize_switchpoint: check for passing twice, not more than needed.
- pserialize_perform: avoid a possible race with softint handler.
Reported by hannken@.
2013-02-07 23:37:58 +00:00
matt
06924b3fe7 Make the inclusion of <sys/cprng.h> a private matter for sysctl. No reason
to expose the rest of the kernel to it.
2013-02-02 14:02:09 +00:00
joerg
8f31aaa01e Add sockaddr_format to ease debugging code dealing with socket
addresses.
2013-01-31 14:30:47 +00:00
para
8e65446416 fix the sysctl_setup_func typedef 2013-01-29 23:00:31 +00:00
para
19d40baab3 make vmem(9) ready to be used early during bootstrap to replace extent(9)
pass memory for vmem structs into the initialization function and
do away with the static pool of vmem structs.
remove special bootstrapping of the quantum cache pools of the kmem_va_arena
as memory for pool_caches is allocated via pool_allocator_meta which is
fully operational at this point.
2013-01-29 21:26:24 +00:00
christos
a5d450d94b remove useless cast (Richard Hansen) 2013-01-29 19:56:43 +00:00
tls
f974bd2506 Tweak the previous a little: don't be so hasty to declare sources "fast"
and process them in bulk, but, always declare hardware RNGs and VM system
sources as "fast" since in these cases efficiency is important and data
will be abundant.
2013-01-26 22:22:07 +00:00
riastradh
80ae1f3144 Assert equality, not assignment, in selrecord.
Code inspection suggests that this fix is not likely to reveal any
latent problems.
2013-01-26 19:38:17 +00:00
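
The class of bug fixed above is easy to reproduce with the standard assert() macro. The sketch below uses a hypothetical check_owner() function; the kernel code uses its own assertion macros, which are not shown here.

```c
#include <assert.h>

/*
 * Hypothetical example of the bug class, not the selrecord() code.
 * Writing assert(owner = expected) would assign expected to owner and
 * then test the assigned value, so the check passes for any non-zero
 * expected and silently overwrites owner. The comparison below is the
 * intended form.
 */
static int
check_owner(int owner, int expected)
{
	assert(owner == expected);
	return owner;
}
```

Many compilers warn about an assignment used as a truth value, but only when the relevant warning options are enabled.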
tls
d391d2bf9a Rather than holding samples from each source until we have 64 at a time to
process, process them ASAP for low-rate sources, and for all sources if we
have not yet acquired initial entropy.
2013-01-26 19:05:11 +00:00
tls
b4e58a0a00 Fix a security issue: when we are reseeding a PRNG seeded early in boot
before we had ever had any entropy, if something else has consumed the
entropy that triggered the immediate reseed, we can reseed with as little
as sizeof(int) bytes of entropy.
2013-01-26 16:05:34 +00:00
para
39dafdefa9 revert previous commit; not yet fully functional, sorry 2013-01-26 15:18:00 +00:00
para
cca299e0a3 make vmem(9) ready to be used early during bootstrap to replace extent(9).
pass memory for vmem structs into the initialization functions and
do away with the static pools for this.
factor out the vmem internal structures into a private header.
remove special bootstrapping of the kmem_va_arena as all necessary memory
comes from pool_allocator_meta which is fully operational at this point.
2013-01-26 13:50:33 +00:00
riastradh
a4e65a34a6 Assert equality, not assignment, in rnd_hwrng_test.
Not tested, but by inspection, the only caller, rnd_process_events,
clearly guarantees the condition.
2013-01-24 14:23:45 +00:00
christos
131cc4df10 It is useless to check for sigcontext_vec and compat module loading for
PK_32 processes. The correct modules are already loaded, otherwise how
is the process running?
2013-01-22 01:45:59 +00:00
hannken
037fec6e9b Replace the rwlock based implementation with passive serialization
from pserialize(9) and mutex / condvar.

The fast paths (fstrans_start/fstrans_done on a file system not
suspended or suspending and fscow_run with no change pending) now
run without locks or other atomic operations.  Suspension and cow
handler insertion and removal is done with mutex / condvars.

The API remains unchanged.
2013-01-21 09:14:01 +00:00
rmind
d797bd3dba - physmap_map, physmap_map_fini: pmap_update() must be performed before
freeing the VA; otherwise there is a window when it can be re-used while
  stale TLB entries may be present.
- physmap_fill: use MIN() instead of min(), since vsize_t is used.
- Add RCS ID comment while here and prevent physmap.h inclusion in userland.
2013-01-19 01:04:51 +00:00
rmind
d3cb55ca37 Add m_ensure_contig() routine, which is equivalent to m_pullup, but does not
destroy the mbuf chain on failure (it is kept valid).
2013-01-19 00:51:52 +00:00
matt
beae54ff5a Contain support routines used to create and destroy lists of physical pages
from lists of pages or ranges of virtual address.  By using these physical
maps, the kernel can avoid mapping physical I/O in the kernel's address space
in most cases.
2013-01-18 06:42:16 +00:00
pooka
6f7f25db9f Include rumpuser_port.h in rump_syscalls.c when compiled for librumpclient 2013-01-17 21:30:30 +00:00
matt
4ffdc4bda5 Add a separate bool to indicate a symbol table has been loaded.
ksym_initted indicates whether the kmutex has been initted or not.
Add __cacheline_aligned to the kmutex.
2013-01-17 14:36:36 +00:00
msaitoh
394ebb1bff Set resource limit. The rnd_process_events() function is called every tick
and processes the sample queue. Without a limit, if a lot of rnd_add_*()
are called, all kernel memory may be eaten up.
2013-01-16 06:45:24 +00:00
dholland
ab137c90f7 Revert defective O_SEARCH implementation committed by manu@ along with
the *at system calls on November 18th of last year. Reasons to revert
it include:
   - it is incorrect in a whole variety of ways (but fortunately, one
     of them is that the missing and improper permission checks have
     no net effect);
   - it was committed without review or discussion;
   - core ruled that all the new O_* flags pertaining to the *at calls
     needed to wait until their semantics could be clarified.

manu was asked to revert it on these grounds but has ignored the request.

I have left O_SEARCH defined and visible and made open() explicitly
ignore it. This way, most code that tries to use it will continue to
build and run. I've also arranged lib/libc/c063/t_o_search.c so that
the tests that make use of the O_SEARCH semantics will disappear until
O_SEARCH comes back, and fixed some mistakes and/or incorrect hacks
that were causing some of these to succeed despite the broken O_SEARCH
implementation.
2013-01-13 08:15:02 +00:00
mlelstv
20911e3ae3 Also report attachment of pseudo-devices to userland. 2013-01-10 10:15:59 +00:00
rmind
ef8a266f76 - softint_dispatch: perform pserialize(9) switchpoint when softintr processing
finishes (without blocking).  Problem reported by hannken@, thanks!
- pserialize_read_enter: use splsoftserial(), not splsoftclock().
- pserialize_perform: add xcall(9) barrier as interrupts may be coalesced.
2013-01-07 23:21:31 +00:00
chs
8f9db9bc46 fix setrlimit(RLIMIT_STACK) for __MACHINE_STACK_GROWS_UP platforms. 2013-01-07 16:54:54 +00:00
para
493b8304e5 fix a lock order reversal during global boundary tag refill.
thanks to chuq@
xxx: request pullup
2013-01-04 08:28:38 +00:00
matt
92d08eb574 Remove a debugging printf 2012-12-31 01:20:05 +00:00
pooka
79f4679e52 size_t needs to be printed with %zu 2012-12-30 20:52:20 +00:00
hannken
caf1788f80 Always call brelse() on error for breadn() too. 2012-12-30 09:19:24 +00:00
christos
250d24d86f Always call brelse() on error. Otherwise a possible error from bread() will
cause the buffer to stay locked and we end up blocking forever in
VOP_CLOSE->spec_close->vinvalbuf->bbysy since the buffer is marked busy
but there is no I/O pending.
This caused my laptop to hang on boot_findwedge because:
    findroot: unable to read block 358331527 of dev dk0 (22)
2012-12-29 21:56:04 +00:00
mlelstv
b78aa16690 The sanity check prevented messages that carry only ancillary data. 2012-12-29 18:51:39 +00:00
mlelstv
bfb6412a0b If an untraced process sleeps in recvmsg/sendmsg, the syscall does not
allocate an iov structure for ktrace. When tracing is then enabled
and the process wakes up, it crashes the kernel.

Undo the last commit which introduced this error path.

Avoid the mentioned kmem_alloc assertion by adding a sanity check analogous
to similar code in sys_generic.c for I/O on file handles instead of
sockets.

This also causes the syscall to return EMSGSIZE if the msg_iovlen member
of the msg structure is less than or equal to 0, as defined in
recvmsg(2)/sendmsg(2).
2012-12-29 10:22:40 +00:00
matt
09ae87cfb2 Add support for kernel-based code to use a PCU. (for memory to memory
copies or in_cksum or ...)
2012-12-26 18:30:22 +00:00
njoly
0558d8206e One semicolon is enough. 2012-12-21 19:39:48 +00:00
hannken
312d89f0de Change bread() and breadn() to never return a buffer on
error and modify all callers to not brelse() on error.

Welcome to 6.99.16

PR kern/46282 (6.0_BETA crash: msdosfs_bmap -> pcbmap -> bread -> bio_doread)
2012-12-20 08:03:41 +00:00
dsl
682ac0728a The lwp_ids in a process are supposed to be non-zero and unique.
This stops being true once a process has allocated (and freed) 2^32 lwps.
(I've not timed it!)
There is also some code lurking (eg ld.elf_so) that doesn't expect the
  high bit to be set.
Once the lwp_id wraps, scan the list to find the first free id higher
  than the last one allocated.
Maintain the sort order to make this possible.
Note that if some lwp (but not all) are allocated numbers from the pid
  space it will go horribly wrong.
Tested by setting the limit to 128 and getting firefox to create threads.
2012-12-16 22:21:03 +00:00
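
The allocation strategy described above (after wrap-around, scan a sorted list for the first free id above the last one handed out) can be sketched over a plain sorted array. Everything here is an assumption made for illustration: next_free_id(), scan_from() and the array representation are hypothetical, and the kernel keeps its lwps in a list rather than an array.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: walk the sorted, unique array ids[] and return
 * the first value in [start, limit] that is not present, or 0 if the
 * whole range is taken. 0 is never a valid id.
 */
static uint32_t
scan_from(const uint32_t *ids, size_t n, uint32_t start, uint32_t limit)
{
	uint32_t cand = start;

	for (size_t i = 0; i < n && cand <= limit; i++) {
		if (ids[i] < cand)
			continue;	/* below the candidate, irrelevant */
		if (ids[i] > cand)
			break;		/* gap found: cand is free */
		cand++;			/* cand is in use, try the next one */
	}
	return cand <= limit ? cand : 0;
}

/*
 * First free id strictly above last; if none exists, wrap around and
 * search again from 1. Returns 0 when every id in [1, max_id] is used.
 */
static uint32_t
next_free_id(const uint32_t *ids, size_t n, uint32_t last, uint32_t max_id)
{
	uint32_t id = 0;

	if (last < max_id)
		id = scan_from(ids, n, last + 1, max_id);
	if (id == 0)
		id = scan_from(ids, n, 1, max_id);
	return id;
}
```

The single pass per scan only works because the ids are kept in ascending order, which matches the commit's note about maintaining the sort order. A small max_id, such as the 128 used for testing in the commit, makes the wrap-around path easy to exercise.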
pooka
8c45e7bd31 Adjust unmount prints to avoid "boothowto = AB_VERBOSE" from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
 done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG
2012-12-14 18:39:48 +00:00
yamt
38363c022a rw_vector_enter: reload owner in the case of no hand-off.
this fixes crashes in rw_oncpu().
2012-12-12 14:53:01 +00:00
pooka
37ca5a0657 Signed overflow is undefined behavior, and one version of gcc
clearly tells us:

	kern_rate.c:98: warning: assuming signed overflow does not
	occur when assuming that (X + c) > X is always true

Check value against INT_MAX instead.  Also, for good measure throw
in a __predict() to flag the assumed common case.
2012-12-12 11:10:56 +00:00
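
The fix described above replaces a test of the form (x + 1) > x, which the compiler may assume is always true for signed integers, with a comparison against INT_MAX before incrementing. A minimal sketch follows; counter_inc_saturating() is a hypothetical helper and not the kern_rate.c code.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not the kern_rate.c implementation.
 * Increments *counter without ever evaluating INT_MAX + 1, which
 * would be signed overflow and therefore undefined behavior.
 * Returns true if the counter was incremented, false if it was
 * already saturated at INT_MAX.
 */
static bool
counter_inc_saturating(int *counter)
{
	if (*counter == INT_MAX)
		return false;	/* would overflow; leave unchanged */
	(*counter)++;
	return true;
}
```

Checking the limit before the arithmetic keeps the test meaningful at every optimization level, since no undefined operation is ever evaluated.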
hannken
e0783eff3d Try to coalesce writes to the journal in MAXPHYS sized and aligned blocks.
Speeds up wapbl_flush() on raid5 by a factor of 3-4.

Discussed on tech-kern.

Needs pullup to NetBSD-6.
2012-12-08 07:24:42 +00:00
chs
11c69f2d20 adapt the cyclic module and profile dtrace provider to netbsd.
for now, just hook the cyclic callback into hardclock().
2012-12-02 01:05:16 +00:00
mbalmer
e3f283b63f Fix misspelling: accommodate is a long enough word to have room for two 'c's
and two 'm's.
2012-12-01 11:41:49 +00:00
njoly
6be7ae0a4a Apply fix from hannken to ensure that VOP_ACCESS() is called on a
locked vnode for fd_nameiat(), fd_nameiat_simple() and do_sys_openat().
Fix both PR/47226 and PR/47255.
2012-11-30 13:26:37 +00:00
christos
610818b251 expose ksem_t for fstat(8), and implement stat for future reference. 2012-11-25 01:05:04 +00:00
christos
4669372c9c - initialize kn_id
- in close, invalidate f_data and f_type early to prevent accidental re-use
- add a DIAGNOSTIC for when we use unsupported fd's and a KASSERT for f_event
  being NULL.
2012-11-24 15:14:32 +00:00
christos
7e72d438b2 Return EOPNOTSUPP for fnullop_kqfilter to prevent registration of unsupported
fds. XXX: We should really fix the fd's to be supported in the future.
Unsupported fd's have a NULL f_event, so registering crashes the kernel with
a NULL function dereference of f_event.
2012-11-24 15:07:44 +00:00
msaitoh
386462fd73 Pass correct wait channel string. 2012-11-20 11:06:27 +00:00
martin
daab85cca7 Use copyout to copy data from kernel out to userland!
Fixes PR kern/47217.
2012-11-19 15:01:17 +00:00
pooka
78b801a3d2 remove unused variable 2012-11-18 18:36:01 +00:00