Commit Graph

324 Commits

Author SHA1 Message Date
ozaki-r 040205ae93 Protect ifnet list with psz and psref
The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.
2016-05-12 02:24:16 +00:00
ozaki-r 2cf7873b92 Constify rtentry of if_output
We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (or psrefing) rtentries.
2016-04-28 00:16:56 +00:00
christos e7ae23fd9e include "ioconf.h" to get the 'void <driver>attach(int count);' prototype. 2015-08-20 14:40:16 +00:00
riastradh 6c3a21ccc3 <sys/rnd.h> not needed for pf_norm.c. 2015-04-13 16:35:33 +00:00
dholland f9228f4225 Add d_discard to all struct cdevsw instances I could find.
All have been set to "nodiscard"; some should get a real implementation.
2014-07-25 08:10:31 +00:00
ozaki-r de94e6c564 Unbreak the build of pf 2014-07-25 04:09:58 +00:00
rmind 60d350cf6d - Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
2014-06-05 23:48:16 +00:00
rmind 44b8265175 Fix previous. 2014-05-17 21:00:33 +00:00
rmind f7741dab17 - Move IFNET_*() macros under #ifdef _KERNEL.
- Replace TAILQ_FOREACH on ifnet with IFNET_FOREACH().
2014-05-17 20:44:24 +00:00
dholland a68f9396b6 Change (mostly mechanically) every cdevsw/bdevsw I can find to use
designated initializers.

I have not built every extant kernel so I have probably broken at
least one build; however I've also found and fixed some wrong
cdevsw/bdevsw entries so even if so I think we come out ahead.
2014-03-16 05:20:22 +00:00
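To illustrate the designated-initializer change described above, here is a minimal sketch. The `demo_devsw` structure and its handlers are hypothetical and much smaller than the real `struct cdevsw`; only the member names `d_open`, `d_close`, `d_discard` and `d_flag` are borrowed in spirit from the log.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified switch table; the real struct cdevsw has more members. */
struct demo_devsw {
	int (*d_open)(int unit);
	int (*d_close)(int unit);
	int (*d_discard)(int unit, long pos, long len);
	int d_flag;
};

static int demo_open(int unit) { (void)unit; return 0; }
static int demo_close(int unit) { (void)unit; return 0; }

/* Positional form: correctness depends on member order in the struct. */
static const struct demo_devsw demo_positional = {
	demo_open, demo_close, NULL, 0
};

/* Designated form: each value names its member, so inserting or
 * reordering members in the struct cannot silently shift entries. */
static const struct demo_devsw demo_designated = {
	.d_open = demo_open,
	.d_close = demo_close,
	.d_discard = NULL,	/* stands in for a "no discard" entry */
	.d_flag = 0,
};
```

With the designated form, a later addition such as a new member in the middle of the structure leaves existing tables correct, which is presumably why wrong entries became easier to spot during the conversion.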
nonaka fefa462b86 remove unused variable to avoid warning from gcc 4.8. 2014-03-06 15:21:58 +00:00
christos 3cf53c78f3 fix compiler warnings 2013-10-20 21:05:47 +00:00
skrll 34b5ada363 PFIL_HOOKS is dead. 2013-07-01 08:32:48 +00:00
njoly 8b89b15c25 Fix pf module build. Adjust pfil_remove_hook 3rd arguments. 2013-06-30 17:23:52 +00:00
rmind 430eae4e07 Update pf to pfil(9) changes. Missed in previous commit. 2013-06-30 14:58:48 +00:00
plunky ea7708e17f IPF 5.1.2 is in external/bsd/ipf and sys/external/bsd/ipf now;
these files are obsolete
2012-09-15 18:12:17 +00:00
drochner 364a06bb29 remove KAME IPSEC, replaced by FAST_IPSEC 2012-03-22 20:34:37 +00:00
riz f8a1d7977c Back out the recent import of IPFilter 5.1.1 for the upcoming branch,
which will now have IPFilter 4.1.34.  IPFilter 5.1.1 will be restored
post-branch.

ok: core, releng.
2012-02-15 17:55:03 +00:00
darrenr fbf74c8f7c PR kern/45929
ipnat does not remove rules with -r
2012-02-09 07:15:27 +00:00
darrenr a785f15818 PR kern/45907
#ifdef USE_INET6 guards are missing around IPv6 code
2012-02-01 17:11:46 +00:00
christos bae8623cee ansify 2012-02-01 16:46:28 +00:00
he 67c7d087a0 Don't refer to ipf_log_soft_destroy(), ipf_log_soft_create(),
ipf_log_soft_init(), and ipf_log_soft_fini() unless IPFILTER_LOG is
defined, since ipf_log.c won't be built unless that flag is defined,
ref. sys/netinet/files.ipfilter.
2012-02-01 10:18:04 +00:00
he 8d979ac389 Don't forward-declare ipf_dolog unless IPFILTER_LOG is defined,
since its implementation is under that ifdef.
2012-02-01 10:03:24 +00:00
christos 20c1167f54 ansify 2012-02-01 02:21:19 +00:00
darrenr 714fe0e37d PR bin/45894
ipftest core dumps when running tests
2012-01-31 09:41:37 +00:00
christos 762e4c5eaf fix printf formats 2012-01-30 20:51:50 +00:00
martin cfe24375a5 Initialize "match", it may be used uninitialized, and gcc complains
about it.
2012-01-30 20:40:39 +00:00
darrenr b01ead29e4 Merge IPFilter 5.1.1 into HEAD 2012-01-30 16:12:02 +00:00
darrenr 9c0e9659f2 Import IPFilter 5.1.1 2012-01-30 16:02:57 +00:00
drochner 0d96157461 protect "union sockaddr_union" from being defined twice by a CPP symbol
(copied from FreeBSD), allows coexistence of (FAST_)IPSEC and pf
2012-01-11 14:37:45 +00:00
drochner 496df2a91f do missing ipsec->kame_ipsec renames 2011-12-19 16:10:07 +00:00
tls 6e1dd068e9 Separate /dev/random pseudodevice implementation from kernel entropy pool
implementation.  Rewrite pseudodevice code to use cprng_strong(9).

The new pseudodevice is cloning, so each caller gets bits from a stream
generated with its own key.  Users of /dev/urandom get their generators
keyed on a "best effort" basis -- the kernel will rekey generators
whenever the entropy pool hits the high water mark -- while users of
/dev/random get their generators rekeyed every time key-length bits
are output.

The underlying cprng_strong API can use AES-256 or AES-128, but we use
AES-128 because of concerns about related-key attacks on AES-256.  This
improves performance (and reduces entropy pool depletion) significantly
for users of /dev/urandom but does cause users of /dev/random to rekey
twice as often.

Also fixes various bugs (including some missing locking and a reseed-counter
overflow in the CTR_DRBG code) found while testing this.

For long reads, this generator is approximately 20 times as fast as the
old generator (dd with bs=64K yields 53MB/sec on 2GHz Core2 instead of
2.5MB/sec) and also uses a separate mutex per instance so concurrency
is greatly improved.  For reads of typical key sizes for modern
cryptosystems (16-32 bytes) performance is about the same as the old
code: a little better for 32 bytes, a little worse for 16 bytes.
2011-12-17 20:05:38 +00:00
tls f27d6532f5 Remove arc4random() and arc4randbytes() from the kernel API. Replace
arc4random() hacks in rump with stubs that call the host arc4random() to
get numbers that are hopefully actually random (arc4random() keyed with
stack junk is not).  This should fix some of the currently failing anita
tests -- we should no longer generate duplicate "random" MAC addresses in
the test environment.
2011-11-28 08:05:05 +00:00
mbalmer da918447be Typo. 2011-11-27 10:53:07 +00:00
tls 3afd44cf08 First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>.  This change includes
the following:

	An initial cleanup and minor reorganization of the entropy pool
	code in sys/dev/rnd.c and sys/dev/rndpool.c.  Several bugs are
	fixed.  Some effort is made to accumulate entropy more quickly at
	boot time.

	A generic interface, "rndsink", is added, for stream generators to
	request that they be re-keyed with good quality entropy from the pool
	as soon as it is available.

	The arc4random()/arc4randbytes() implementation in libkern is
	adjusted to use the rndsink interface for rekeying, which helps
	address the problem of low-quality keys at boot time.

	An implementation of the FIPS 140-2 statistical tests for random
	number generator quality is provided (libkern/rngtest.c).  This
	is based on Greg Rose's implementation from Qualcomm.

	A new random stream generator, nist_ctr_drbg, is provided.  It is
	based on an implementation of the NIST SP800-90 CTR_DRBG by
	Henric Jungheim.  This generator uses AES in a modified counter
	mode to generate a backtracking-resistant random stream.

	An abstraction layer, "cprng", is provided for in-kernel consumers
	of randomness.  The arc4random/arc4randbytes API is deprecated for
	in-kernel use.  It is replaced by "cprng_strong".  The current
	cprng_fast implementation wraps the existing arc4random
	implementation.  The current cprng_strong implementation wraps the
	new CTR_DRBG implementation.  Both interfaces are rekeyed from
	the entropy pool automatically at intervals justifiable from best
	current cryptographic practice.

	In some quick tests, cprng_fast() is about the same speed as
	the old arc4randbytes(), and cprng_strong() is about 20% faster
	than rnd_extract_data().  Performance is expected to improve.

	The AES code in src/crypto/rijndael is no longer an optional
	kernel component, as it is required by cprng_strong, which is
	not an optional kernel component.

	The entropy pool output is subjected to the rngtest tests at
	startup time; if it fails, the system will reboot.  There is
	approximately a 3/10000 chance of a false positive from these
	tests.  Entropy pool _input_ from hardware random numbers is
	subjected to the rngtest tests at attach time, as well as the
	FIPS continuous-output test, to detect bad or stuck hardware
	RNGs; if any are detected, they are detached, but the system
	continues to run.

	A problem with rndctl(8) is fixed -- datastructures with
	pointers in arrays are no longer passed to userspace (this
	was not a security problem, but rather a major issue for
	compat32).  A new kernel will require a new rndctl.

	The sysctl kern.arandom() and kern.urandom() nodes are hooked
	up to the new generators, but the /dev/*random pseudodevices
	are not, yet.

	Manual pages for the new kernel interfaces are forthcoming.
2011-11-19 22:51:18 +00:00
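The continuous-output test mentioned above can be sketched roughly as follows. This is a simplified illustration, not the code in libkern/rngtest.c: the block size, the `demo_cont_test` structure and the `demo_cont_check` function are all assumptions made for the example. The idea is that each new block from the source is compared with the previous one, and an exact repeat is treated as a sign of a stuck generator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_BLOCK 8	/* hypothetical block size in bytes */

/* State kept between calls: the previously seen block, if any. */
struct demo_cont_test {
	unsigned char prev[DEMO_BLOCK];
	bool have_prev;
};

/*
 * Returns true if the block is acceptable, false if it is identical
 * to the block seen on the previous call. The new block is always
 * remembered for the next comparison.
 */
static bool
demo_cont_check(struct demo_cont_test *t, const unsigned char *block)
{
	bool ok = !t->have_prev || memcmp(t->prev, block, DEMO_BLOCK) != 0;

	memcpy(t->prev, block, DEMO_BLOCK);
	t->have_prev = true;
	return ok;
}
```

A caller in the spirit of the commit message would stop using a hardware source once `demo_cont_check` returns false, while letting the rest of the system continue to run.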
jmcneill 883cb292ab fix -Wshadow warnings when ALTQ is enabled 2011-08-30 19:05:12 +00:00
jmcneill 1f02a7ab53 build pf module with WARNS=3, and remove the need for -Wno-shadow 2011-08-29 09:50:04 +00:00
mrg fc8dfe2ed3 fix an uninitialised variable problem. large-ish function, but i
couldn't see how GCC 4.5 isn't wrong about this one.
2011-07-01 02:33:23 +00:00
drochner 31eddb04eb remove unused expression 2011-05-18 12:54:15 +00:00
hauke 51a9336da1 Commit the patch from
<http://mail-index.netbsd.org/current-users/2010/09/12/msg014289.html>,
fixing a "panic: pool 'pfrktable' is IPL_NONE, but called from
interrupt context" that occurred on NetBSD/sparc.
2011-05-11 12:22:34 +00:00
dyoung c2e43be1c5 Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires.  On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetime Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer.  Corresponding to each class
is an MSL, and a session uses the MSL of its class.  The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways).  Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote.  Loopback and local sessions
expire more quickly when MSLT is used.

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB".  VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion.  The memory both
for vestigial PCBs and for elements of the PCB hashtable comes from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer.  When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive.  It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.
2011-05-03 18:28:44 +00:00
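The MSLT class selection described in the message above might be sketched like this. The enum, the function names and the `same_host`/`same_link` inputs are hypothetical; only the three classes and the default MSLs of 2, 10 and 60 seconds come from the commit message.

```c
#include <assert.h>

/* The three peer classes named in the commit message. */
enum demo_msl_class {
	DEMO_MSL_LOOPBACK,	/* local host equals remote host */
	DEMO_MSL_LOCAL,		/* both hosts on the same link/subnet */
	DEMO_MSL_REMOTE		/* reached via one or more gateways */
};

/* Classify a peer; how same_host and same_link are determined is
 * left out and assumed to be computed elsewhere. */
static enum demo_msl_class
demo_classify(int same_host, int same_link)
{
	if (same_host)
		return DEMO_MSL_LOOPBACK;
	if (same_link)
		return DEMO_MSL_LOCAL;
	return DEMO_MSL_REMOTE;
}

/* Default MSL in seconds for each class, per the commit message. */
static int
demo_msl_seconds(enum demo_msl_class c)
{
	switch (c) {
	case DEMO_MSL_LOOPBACK:
		return 2;
	case DEMO_MSL_LOCAL:
		return 10;
	default:
		return 60;
	}
}
```

Under these defaults, a loopback session would leave TIME_WAIT far sooner than a remote one, which is where the savings for nearby peers come from.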
jakllsch 405a0ee48c Use %zu for size_t in debugging printf. 2011-03-05 21:51:17 +00:00
plunky eec5ecf4cc avoid #ifdef/#endif inside sprint() argument list, as with USE_FORT=yes
sprint becomes a macro
2011-02-24 18:35:40 +00:00
christos 57228dce84 Add 1 to the port range so the range is inclusive as documented. 2011-02-12 21:23:31 +00:00
christos e8a328b2a2 PR/44070: Avoid zero divide in modulo operations. 2011-02-12 18:14:21 +00:00
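The two fixes above (an inclusive port range and a guard against dividing by zero in a modulo) can be illustrated together. This is not the actual pf code; `demo_pick_port` is a hypothetical helper written only to show the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map a random value onto the inclusive port range [low, high].
 * Without the "+ 1", the value high could never be chosen, and
 * low == high would give a span of 0 and a division by zero in
 * the modulo below.
 */
static uint16_t
demo_pick_port(uint16_t low, uint16_t high, uint32_t rnd)
{
	if (low > high) {	/* tolerate a reversed range */
		uint16_t tmp = low;
		low = high;
		high = tmp;
	}

	/* At least 1 and at most 65536, so it fits in 32 bits. */
	uint32_t span = (uint32_t)high - (uint32_t)low + 1;

	return (uint16_t)(low + rnd % span);
}
```

Because `span` is computed in 32 bits, the full range 0 to 65535 yields 65536 rather than wrapping to zero.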
drochner 9b0c6e6540 make sure the "overload_tbl" member of "struct pf_rule" copied in
from userland is initialized (it is used by the kernel only)
fixes crash or data injection (CVE-2010-3830), usually by root user only
OpenBSD has rewritten the code to start with a zero'd struct and fills
in needed parts only - to be considered in case a newer pf version
is imported.
2011-01-19 19:58:02 +00:00
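The "start with a zero'd struct and fill in needed parts only" approach mentioned above could look roughly like the following. The `demo_rule` structure and `demo_rule_copyin` are hypothetical stand-ins, not the real `struct pf_rule` or any pf function; only the member name `overload_tbl` is taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily reduced rule structure. */
struct demo_rule {
	int action;		/* supplied by userland */
	int port;		/* supplied by userland */
	void *overload_tbl;	/* kernel-only; must never come from userland */
};

/*
 * Build the kernel's copy of a rule from an already copied-in user
 * structure: zero everything first, then take only the members that
 * userland is allowed to set.
 */
static void
demo_rule_copyin(struct demo_rule *dst, const struct demo_rule *user)
{
	memset(dst, 0, sizeof(*dst));
	dst->action = user->action;
	dst->port = user->port;
	/* dst->overload_tbl stays NULL regardless of user->overload_tbl. */
}
```

The design choice is that any kernel-only member added later is safe by default, instead of each one needing its own explicit reset.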
rmind c40af51a1a ip_randomid: make mechanism MP-safe and more modular.
OK matt@
2010-11-05 01:35:57 +00:00
mlelstv b707187b9f Fix mbuf corruption when sending ICMP errors for blocked IPv6
packets due to wrong buffer size computations. The corrupted
mbufs could lead to a panic.

Fix computation of link mtu where the link mtu itself is unspecified.

Limit ICMP error packets for IPv6 to MMTU as required by RFC4443. This
also avoids dropped errors when the length exceeds the link mtu.
2010-09-05 12:36:46 +00:00
pgoyette d591fa52b1 Revert previous - changes here are irrelevant to NetBSD
Need more caffeine.
2010-08-11 11:57:36 +00:00
pgoyette 99d809cb2e Keep condvar wmesg's within 8-char limit. 2010-08-11 11:40:51 +00:00