Commit Graph

142 Commits

Author SHA1 Message Date
dholland
f9228f4225 Add d_discard to all struct cdevsw instances I could find.
All have been set to "nodiscard"; some should get a real implementation.
2014-07-25 08:10:31 +00:00
ozaki-r
de94e6c564 Unbreak the build of pf 2014-07-25 04:09:58 +00:00
rmind
60d350cf6d - Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
2014-06-05 23:48:16 +00:00
rmind
44b8265175 Fix previous. 2014-05-17 21:00:33 +00:00
rmind
f7741dab17 - Move IFNET_*() macros under #ifdef _KERNEL.
- Replace TAILQ_FOREACH on ifnet with IFNET_FOREACH().
2014-05-17 20:44:24 +00:00
dholland
a68f9396b6 Change (mostly mechanically) every cdevsw/bdevsw I can find to use
designated initializers.

I have not built every extant kernel so I have probably broken at
least one build; however I've also found and fixed some wrong
cdevsw/bdevsw entries so even if so I think we come out ahead.
2014-03-16 05:20:22 +00:00
nonaka
fefa462b86 remove unused variable to avoid warning from gcc 4.8. 2014-03-06 15:21:58 +00:00
christos
3cf53c78f3 fix compiler warnings 2013-10-20 21:05:47 +00:00
skrll
34b5ada363 PFIL_HOOKS is dead. 2013-07-01 08:32:48 +00:00
njoly
8b89b15c25 Fix pf module build. Adjust pfil_remove_hook 3rd arguments. 2013-06-30 17:23:52 +00:00
rmind
430eae4e07 Update pf to pfil(9) changes. Missed in previous commit. 2013-06-30 14:58:48 +00:00
drochner
364a06bb29 remove KAME IPSEC, replaced by FAST_IPSEC 2012-03-22 20:34:37 +00:00
drochner
0d96157461 protect "union sockaddr_union" from being defined twice by a CPP symbol
(copied from FreeBSD), allows coexistence of (FAST_)IPSEC and pf
2012-01-11 14:37:45 +00:00
drochner
496df2a91f do missing ipsec->kame_ipsec renames 2011-12-19 16:10:07 +00:00
tls
6e1dd068e9 Separate /dev/random pseudodevice implemenation from kernel entropy pool
implementation.  Rewrite pseudodevice code to use cprng_strong(9).

The new pseudodevice is cloning, so each caller gets bits from a stream
generated with its own key.  Users of /dev/urandom get their generators
keyed on a "best effort" basis -- the kernel will rekey generators
whenever the entropy pool hits the high water mark -- while users of
/dev/random get their generators rekeyed every time key-length bits
are output.

The underlying cprng_strong API can use AES-256 or AES-128, but we use
AES-128 because of concerns about related-key attacks on AES-256.  This
improves performance (and reduces entropy pool depletion) significantly
for users of /dev/urandom but does cause users of /dev/random to rekey
twice as often.

Also fixes various bugs (including some missing locking and a reseed-counter
overflow in the CTR_DRBG code) found while testing this.

For long reads, this generator is approximately 20 times as fast as the
old generator (dd with bs=64K yields 53MB/sec on 2Ghz Core2 instead of
2.5MB/sec) and also uses a separate mutex per instance so concurrency
is greatly improved.  For reads of typical key sizes for modern
cryptosystems (16-32 bytes) performance is about the same as the old
code: a little better for 32 bytes, a little worse for 16 bytes.
2011-12-17 20:05:38 +00:00
tls
f27d6532f5 Remove arc4random() and arc4randbytes() from the kernel API. Replace
arc4random() hacks in rump with stubs that call the host arc4random() to
get numbers that are hopefully actually random (arc4random() keyed with
stack junk is not).  This should fix some of the currently failing anita
tests -- we should no longer generate duplicate "random" MAC addresses in
the test environment.
2011-11-28 08:05:05 +00:00
tls
3afd44cf08 First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>.  This change includes
the following:

	An initial cleanup and minor reorganization of the entropy pool
	code in sys/dev/rnd.c and sys/dev/rndpool.c.  Several bugs are
	fixed.  Some effort is made to accumulate entropy more quickly at
	boot time.

	A generic interface, "rndsink", is added, for stream generators to
	request that they be re-keyed with good quality entropy from the pool
	as soon as it is available.

	The arc4random()/arc4randbytes() implementation in libkern is
	adjusted to use the rndsink interface for rekeying, which helps
	address the problem of low-quality keys at boot time.

	An implementation of the FIPS 140-2 statistical tests for random
	number generator quality is provided (libkern/rngtest.c).  This
	is based on Greg Rose's implementation from Qualcomm.

	A new random stream generator, nist_ctr_drbg, is provided.  It is
	based on an implementation of the NIST SP800-90 CTR_DRBG by
	Henric Jungheim.  This generator users AES in a modified counter
	mode to generate a backtracking-resistant random stream.

	An abstraction layer, "cprng", is provided for in-kernel consumers
	of randomness.  The arc4random/arc4randbytes API is deprecated for
	in-kernel use.  It is replaced by "cprng_strong".  The current
	cprng_fast implementation wraps the existing arc4random
	implementation.  The current cprng_strong implementation wraps the
	new CTR_DRBG implementation.  Both interfaces are rekeyed from
	the entropy pool automatically at intervals justifiable from best
	current cryptographic practice.

	In some quick tests, cprng_fast() is about the same speed as
	the old arc4randbytes(), and cprng_strong() is about 20% faster
	than rnd_extract_data().  Performance is expected to improve.

	The AES code in src/crypto/rijndael is no longer an optional
	kernel component, as it is required by cprng_strong, which is
	not an optional kernel component.

	The entropy pool output is subjected to the rngtest tests at
	startup time; if it fails, the system will reboot.  There is
	approximately a 3/10000 chance of a false positive from these
	tests.  Entropy pool _input_ from hardware random numbers is
	subjected to the rngtest tests at attach time, as well as the
	FIPS continuous-output test, to detect bad or stuck hardware
	RNGs; if any are detected, they are detached, but the system
	continues to run.

	A problem with rndctl(8) is fixed -- datastructures with
	pointers in arrays are no longer passed to userspace (this
	was not a security problem, but rather a major issue for
	compat32).  A new kernel will require a new rndctl.

	The sysctl kern.arandom() and kern.urandom() nodes are hooked
	up to the new generators, but the /dev/*random pseudodevices
	are not, yet.

	Manual pages for the new kernel interfaces are forthcoming.
2011-11-19 22:51:18 +00:00
jmcneill
883cb292ab fix -Wshadow warnings when ALTQ is enabled 2011-08-30 19:05:12 +00:00
jmcneill
1f02a7ab53 build pf module with WARNS=3, and remove the need for -Wno-shadow 2011-08-29 09:50:04 +00:00
mrg
fc8dfe2ed3 fix an uninitialised variable problem. large-ish function, but i
couldn't see how GCC 4.5 isn't wrong about this one.
2011-07-01 02:33:23 +00:00
drochner
31eddb04eb remove unused expression 2011-05-18 12:54:15 +00:00
hauke
51a9336da1 Commit the patch from
<http://mail-index.netbsd.org/current-users/2010/09/12/msg014289.html>,
fixing a "panic: pool 'pfrktable' is IPL_NONE, but called from
interrupt context" that occurred on NetBSD/sparc.
2011-05-11 12:22:34 +00:00
dyoung
c2e43be1c5 Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires.  On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetimes Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer.  Corresponding to each class
is an MSL, and a session uses the MSL of its class.  The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways).  Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote.  Loopback and local sessions
expire more quickly when MSLT is used.

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB".  VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion.  The memory both
for vestigial PCBs and for elements of the PCB hashtable come from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer.  When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive.  It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.
2011-05-03 18:28:44 +00:00
drochner
9b0c6e6540 make sure the "overload_tbl" member of "struct pf_rule" copied in
from userland is initialized (it is used by the kernel only)
fixes crash or data injection (CVE-2010-3830), usually by root user only
OpenBSD has rewritten the code to start with a zero'd struct and fills
in needed parts only - to be considered in case a newer pf version
is imported.
2011-01-19 19:58:02 +00:00
rmind
c40af51a1a ip_randomid: make mechanism MP-safe and more modular.
OK matt@
2010-11-05 01:35:57 +00:00
degroote
ca38e323d1 Add support for pfs(8)
pfs(8) is a tool similar to ipfs(8) but for pf(4). It allows the admin to
dump internal configuration of pf, and restore at a latter point, after a
maintenance reboot for example, in a transparent way for user.

This work has been done mostly during my GSoC 2009

No objections on tech-net@
2010-05-07 17:41:57 +00:00
ahoka
8f356e922b Do not unload pf when enabled, not even manually. 2010-04-13 13:08:16 +00:00
ahoka
3bca1c92ed change module class to driver. 2010-04-13 11:53:18 +00:00
ahoka
b9e768f315 Do not auto unload pf if it's enabled. 2010-04-13 01:02:43 +00:00
ahoka
f6a8ba3d97 - Make the pf and pflog driver able to detach.
- Add code for module support.

Original patch from Jared McNeill
2010-04-12 13:57:38 +00:00
skrll
ea3fb23f81 Spello in comment. 2010-04-12 06:56:19 +00:00
joerg
58e867556f Push the bpf_ops usage back into bpf.h. Push the common ifp->if_bpf
check into the inline functions as well the fourth argument for
bpf_attach.
2010-04-05 07:19:28 +00:00
minskim
6f5cce7306 Fix a typo introduced by the bpf linkage change. 2010-01-23 01:17:23 +00:00
pooka
10fe49d72c Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client.  This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached.  However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff.  ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
2010-01-19 22:06:18 +00:00
elad
579013ded4 Replace uidinfo.h with kauth.h, should fix problems observed by tron@. 2009-12-30 19:47:15 +00:00
elad
67a179c338 Use the right member to store gid in the non-NetBSD case.
Pointed out by uebayasi@ and cegger@, thanks!
2009-12-30 16:49:02 +00:00
elad
cd2e2d5571 Get uid/gid from the socket's credentials. 2009-12-30 07:00:01 +00:00
dsl
835104494b If pfi_address_add() has to extend the buffer, copy the data in the
right direction!
Fixes PR/41939.
2009-12-06 16:46:11 +00:00
elad
6991fd9ea2 Move firewall/NAT policy back to respective subsystems (pf, ipf).
Note: the ipf code contains a lot of ifdefs, some of them for NetBSD
versions that are no longer maintained. It won't make the code more
readable, but we should consider removing them.
2009-10-03 00:37:01 +00:00
degroote
2d48ac808c Import pfsync support from OpenBSD 4.2
Pfsync interface exposes change in the pf(4) over a pseudo-interface, and can
be used to synchronise different pf.

This work was part of my 2009 GSoC

No objection on tech-net@
2009-09-14 10:36:48 +00:00
minskim
3c24e51c76 Remove LKM code from pf. 2009-07-28 18:15:26 +00:00
minskim
8221d4ac16 Reduce diff with OpenBSD. No functional change. 2009-06-16 05:15:41 +00:00
christos
ae0fe2262f Fix http://www.securityfocus.com/archive/1/502634, from OpenBSD.
XXX: should be pulled up to 5.x
2009-04-13 22:29:11 +00:00
cegger
e6e72079ad make this compile 2009-01-11 10:25:29 +00:00
cegger
dcf705893e use M_ZERO on malloc() and remove subsequent bzero(). 2008-12-19 18:49:37 +00:00
cegger
9c1c1ad122 pass M_NOWAIT instead of M_DONTWAIT to malloc. 2008-12-19 14:07:37 +00:00
dyoung
de87fe677d *** Summary ***
When a link-layer address changes (e.g., ifconfig ex0 link
02🇩🇪ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address.  (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior.  Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability.  KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR.  In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr.  That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR.  In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR.  For example, pull ..._init() out of any switch
statement that looks like this:

        switch (...->sa_family) {
        case ...:
                ..._init();
                ...
                break;
        ...
        default:
                ..._init();
                ...
                break;
        }

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

        switch (x & (IFF_UP|IFF_RUNNING)) {
        case 0:
                ...
                break;
        case IFF_RUNNING:
                ...
                break;
        case IFF_UP:
                ...
                break;
        case IFF_UP|IFF_RUNNING:
                ...
                break;
        }

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure.  Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls.  In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source.  In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively.  Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset.  Delete unnecessary casts to void *.  Use
sockaddr_in_init() and sockaddr_in6_init().  Compare pointers with
NULL instead of "testing truth".  Replace some instances of (type
*)0 with NULL.  Change some K&R prototypes to ANSI C, and join
lines.
2008-11-07 00:20:01 +00:00
pooka
7e5aba5af0 Move uidinfo to its own module in kern_uidinfo.c and include in rump.
No functional change to uidinfo.
2008-10-11 13:40:57 +00:00
peter
899faae713 Wrap definition of pfil6_wrapper in #ifdef INET6.
From Scott Ellis in PR/39007.
2008-06-22 11:36:33 +00:00
yamt
957749de49 remove pf42 branch's todo. 2008-06-19 03:37:57 +00:00