Commit Graph

2355 Commits

Author SHA1 Message Date
rmind 15d58f91b8 - Add NPF_DECISION_BLOCK and NPF_DECISION_PASS. Be more defensive in the
packet handler.  Change the default policy to block when the config is
  loaded and set it to pass when flush operation is performed.
- Use kmem_zalloc(9) instead of kmem_alloc(9) in few places.
- npf_rproc_{create,release}: use kmem_intr_{alloc,free} as the destruction
  of rule procedure might happen in the interrupt handler (under a very rare
  condition, if config reload races with the handler).
- npf_session_establish: check whether layer 3 and 4 are cached.
- npfctl_build_group: do not make groups as passing rules.
- Remove some unecessary header inclusion.
2012-02-20 00:18:19 +00:00
rmind 91c530f0ed rt_setkey: remove invalid assert, sockaddr_dup() may fail if no memory. 2012-02-18 23:47:48 +00:00
rmind 2d3c715fba - Split NPF rule procedure code into a separate module (no functional changes).
- Simplify some code, add more comments, some asserts.
- G/C unused rule hook code.
2012-02-06 23:30:14 +00:00
rmind f7fec0d2a4 Multiple NPF fixes, add better error reporting from kernel side, add some
asserts, bump the version.
2012-02-05 00:37:13 +00:00
christos 86cd0e8b2d PR/45764, PR/45914
Part 2:
Arrange so that the pointers that we free (ifp->if_afdata, dom->dom_ifqueues[i])
are set to NULL.
While I am here, add a continue.
2012-02-03 03:35:30 +00:00
matt 4b50cb788d Use proper ANSI prototypes for foo() -> foo(void)
Caught when compiling with -Wold-style-definition
2012-01-30 23:31:27 +00:00
christos 80398b9c60 - don't copy past the end of sockaddr if we are rounding, zero it out instead,
from mlelstv@
- put a comment explaining the 6 nuls.
2012-01-30 20:02:55 +00:00
christos 92b963447b Count length from the beginning of the structure not the sa_data portion.
From skrll@
2012-01-30 20:01:08 +00:00
rmind 833816ab75 Replace tun_lock with mutex(9). XXX: too far from being MP-safe yet. 2012-01-28 01:02:27 +00:00
rmind 4b85474b41 - Expire all sessions on flush.
- Enable checking for zero mask in IP{4,6}MATCH after npfctl changes.
- Make locking symmetric for npf_ruleset_inspect().
- Sync function prototypes in npf(3) man page with reality.
- Rename NPF_TABLE_RBTREE to NPF_TABLE_TREE.
2012-01-15 00:49:47 +00:00
christos 42c420856f - fix offsetof usage, and redundant defines
- kill pointer casts to 0
2011-12-31 20:41:58 +00:00
alnsn c55c73b80d Apply same bounds checks for BPF_LD|BPF_B|BPF_IND as for
BPF_LD|BPF_H|BPF_IND and BPF_LD|BPF_W|BPF_IND.

From FreeBSD r48548, the original r45574 had a typo.
2011-12-29 23:47:21 +00:00
christos b0874ea247 PR/45751: Alexander Nasonov: No overflow check in BPF_LD|BPF_ABS 2011-12-29 20:50:06 +00:00
dyoung b604e06e51 Fix ifpromisc() regression: if ifpromisc(ifp, 1) is called, do set
IFF_PROMISC whether ifp is IFF_UP or not, but do not call ifp->if_ioctl
unless ifp is IFF_UP.
2011-12-28 02:14:57 +00:00
christos 64f7c0e218 PR/45730: David Holland: Avoid having 2 copies of bpf.h in /usr/include.
This adds the missing entries from libpcap to make libpcap compile with
our bpf.h.
2011-12-21 19:04:18 +00:00
tls 6e1dd068e9 Separate /dev/random pseudodevice implemenation from kernel entropy pool
implementation.  Rewrite pseudodevice code to use cprng_strong(9).

The new pseudodevice is cloning, so each caller gets bits from a stream
generated with its own key.  Users of /dev/urandom get their generators
keyed on a "best effort" basis -- the kernel will rekey generators
whenever the entropy pool hits the high water mark -- while users of
/dev/random get their generators rekeyed every time key-length bits
are output.

The underlying cprng_strong API can use AES-256 or AES-128, but we use
AES-128 because of concerns about related-key attacks on AES-256.  This
improves performance (and reduces entropy pool depletion) significantly
for users of /dev/urandom but does cause users of /dev/random to rekey
twice as often.

Also fixes various bugs (including some missing locking and a reseed-counter
overflow in the CTR_DRBG code) found while testing this.

For long reads, this generator is approximately 20 times as fast as the
old generator (dd with bs=64K yields 53MB/sec on 2Ghz Core2 instead of
2.5MB/sec) and also uses a separate mutex per instance so concurrency
is greatly improved.  For reads of typical key sizes for modern
cryptosystems (16-32 bytes) performance is about the same as the old
code: a little better for 32 bytes, a little worse for 16 bytes.
2011-12-17 20:05:38 +00:00
christos 4bdfaa0aa3 make comment reflect reality 2011-12-16 03:05:23 +00:00
christos 811ac7bb4f don't leak mbufs. 2011-12-15 22:20:26 +00:00
roy 46df35d87e When adding or scrubbing a prefix, always notify userland even if the
prefix does not have IFA_ROUTE.
Don't scrub the interface in SIOCAIFADDR if the new address does't
have IFA_ROUTE. If more functions are added to in_ifscrub then this logic
might need to be revisited.

Fixes PR/26450.
2011-12-12 00:06:39 +00:00
rmind 215a4b5f51 - Explain the magic in npf_tcpfl2case().
- Use __unused instead of (void)cast; fix comment.
2011-12-08 23:36:57 +00:00
rmind f2701a6f1c - Add npf_tcpfl2case() and make TCP state table more compact.
- Adjust the state for FIN case on sim-SYN and SYN-RECEIVED.
2011-12-05 00:34:25 +00:00
rmind fad8b2d7a1 - Rework and improve TCP state tracking.
- Fix regressions after IPv6 patch merge.

Note: npfctl(8) rework will come soon.
2011-11-29 20:05:30 +00:00
drochner 2467eee7c7 sys/pcq.h isn't installed to userland, so only include it ifdef _KERNEL,
fixes glitch in kdump build
2011-11-29 17:28:45 +00:00
jakllsch 3a65f68183 We need a cv_destroy() here too. Fixes LOCKDEBUG panic on interface detachment. 2011-11-27 14:55:57 +00:00
kiyohara 6c04b3bca9 Fix build failed. Include if_inarp.h. 2011-11-20 12:15:38 +00:00
tls 3afd44cf08 First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>.  This change includes
the following:

	An initial cleanup and minor reorganization of the entropy pool
	code in sys/dev/rnd.c and sys/dev/rndpool.c.  Several bugs are
	fixed.  Some effort is made to accumulate entropy more quickly at
	boot time.

	A generic interface, "rndsink", is added, for stream generators to
	request that they be re-keyed with good quality entropy from the pool
	as soon as it is available.

	The arc4random()/arc4randbytes() implementation in libkern is
	adjusted to use the rndsink interface for rekeying, which helps
	address the problem of low-quality keys at boot time.

	An implementation of the FIPS 140-2 statistical tests for random
	number generator quality is provided (libkern/rngtest.c).  This
	is based on Greg Rose's implementation from Qualcomm.

	A new random stream generator, nist_ctr_drbg, is provided.  It is
	based on an implementation of the NIST SP800-90 CTR_DRBG by
	Henric Jungheim.  This generator users AES in a modified counter
	mode to generate a backtracking-resistant random stream.

	An abstraction layer, "cprng", is provided for in-kernel consumers
	of randomness.  The arc4random/arc4randbytes API is deprecated for
	in-kernel use.  It is replaced by "cprng_strong".  The current
	cprng_fast implementation wraps the existing arc4random
	implementation.  The current cprng_strong implementation wraps the
	new CTR_DRBG implementation.  Both interfaces are rekeyed from
	the entropy pool automatically at intervals justifiable from best
	current cryptographic practice.

	In some quick tests, cprng_fast() is about the same speed as
	the old arc4randbytes(), and cprng_strong() is about 20% faster
	than rnd_extract_data().  Performance is expected to improve.

	The AES code in src/crypto/rijndael is no longer an optional
	kernel component, as it is required by cprng_strong, which is
	not an optional kernel component.

	The entropy pool output is subjected to the rngtest tests at
	startup time; if it fails, the system will reboot.  There is
	approximately a 3/10000 chance of a false positive from these
	tests.  Entropy pool _input_ from hardware random numbers is
	subjected to the rngtest tests at attach time, as well as the
	FIPS continuous-output test, to detect bad or stuck hardware
	RNGs; if any are detected, they are detached, but the system
	continues to run.

	A problem with rndctl(8) is fixed -- datastructures with
	pointers in arrays are no longer passed to userspace (this
	was not a security problem, but rather a major issue for
	compat32).  A new kernel will require a new rndctl.

	The sysctl kern.arandom() and kern.urandom() nodes are hooked
	up to the new generators, but the /dev/*random pseudodevices
	are not, yet.

	Manual pages for the new kernel interfaces are forthcoming.
2011-11-19 22:51:18 +00:00
dyoung d74f0a643d Before freeing an ifnet_lock, destroy its mutex. Should help with
kern/43294.
2011-11-16 06:09:37 +00:00
jakllsch dacb12f218 Make a comment consistent with the code. 2011-11-12 14:51:41 +00:00
gdt c9bfbf1142 Move RTF_ANNOUNCE flag so that it no longer conflicts with RTF_PROTO2.
RTF_ANNOUNCE was defined as RTF_PROTO2.  The flag is used to indicated
that host should act as a proxy for a link level arp or ndp request.
(If RTF_PROTO2 is used as an experimental flag (as advertised),
various problems can occur.)

This commit provides a first-class definition with its own bit for
RTF_ANNOUNCE, removes the old aliasing definitions, and adds support
for the new RTF_ANNOUNCE flag to netstat(8) and route(8).,

Also, remove unused RTF_ flags that collide with RTF_PROTO1:
  netinet/icmp6.h defined RTF_PROBEMTU as RTF_PROTO1
  netinet/if_inarp.h defined RTF_USETRAILERS as RTF_PROTO1
  (Neither of these flags are used anywhere.  Both have been removed
  to reduce chances of collision with RTF_PROTO1.)

Figuring this out and the diff are the work of Beverly Schwartz of
BBN.

(Passed release build, boot in VM, with no apparently related atf
failures.)

Approved for Public Release, Distribution Unlimited
This material is based upon work supported by the Defense Advanced
Research Projects Agency and Space and Naval Warfare Systems Center,
Pacific, under Contract No. N66001-09-C-2073.
2011-11-11 15:09:32 +00:00
christos 60b08a4dfb cosmetic, no functional change:
- sizeof(*var) instead of sizeof(type)
- sort the event counters in the discard the same as alloc for readability
2011-11-09 19:43:22 +00:00
tron 2a04f59494 Change module class to driver as npf(4) is a pseudo device. 2011-11-06 13:08:04 +00:00
rmind 09cdfd6a19 Few fixes, KNF/style, bump the NPF version. 2011-11-06 02:49:03 +00:00
zoltan 7d78d5aecf When building the kernel without IPv6 support, compilation failed.
Fix that.
2011-11-05 10:23:26 +00:00
jakllsch 380d04da8a Use uint8_t instead of npf_netmask_t, as npf_netmask_t is a uint_fast8_t,
which is in many places is actually a uint32_t and thus incompatible with
prop_dictionary_get_uint8().  The correct type is noted in a comment.
2011-11-04 02:57:28 +00:00
zoltan 5a5d868dc5 Add IPv6 support for NPF. 2011-11-04 01:00:27 +00:00
dyoung 89986fc527 For simplicity's sake, use pcq(9) instead of my own circular-queue
implementation.  Saves 45 lines of code.
2011-11-02 01:17:59 +00:00
yamt 02a75580d4 remove an unnecessary cast 2011-10-31 12:50:50 +00:00
dyoung 53c8737e53 For these interfaces, the implementation of SIOCSIFDSTADDR is identical
to SIOCINITIFADDR, and SIOCSIFDSTADDR callers always fall back to
SIOCINITIFADDR, so just get rid of the SIOCSIFDSTADDR case.
2011-10-28 22:08:14 +00:00
dyoung 60d9d0608c Don't kauth-orize SIOCSIFMTU in pppsioctl() and stf_ioctl(), ifioctl()
has already done that for us.
2011-10-28 20:13:32 +00:00
dyoung 7609a81937 Userland may not change the IFF_CANTCHANGE flags, however, the kernel
may, so make sure if_flags_set() takes care of them.  Fixes a regression
in ifpromisc().
2011-10-28 20:11:58 +00:00
dyoung bca1ae2608 Don't kauth-orize SIOCDIFPHYADDR, SIOCSIFFLAGS, SIOCSIFMTU, or
SIOCSLIFPHYADDR, in gif_ioctl() or in gre_ioctl(), because those
operations are ordinarily kauth-orized already in ifioctl().

Kauth-orizing SIOCSIFFLAGS in gre_ioctl() caused a panic ("panic:
bpf_detachd: ifpromisc failed: 1") when tcpdump(8) was interrupted.
Somehow bpf(4) enables promiscuous mode using different credentials than
it uses to disable promiscuous mode, hence the ifpromisc failure.  This
may have something to do with privilege-separation in tcpdump(8).  I.e.,
an LWP with SIOCSIFFLAGS privilege opens /dev/bpf, but an LWP without
SIOCSIFFLAGS privilege closes it.
2011-10-28 16:42:52 +00:00
dyoung f7b2ead301 kauth isn't used in here, so don't #include <sys/kauth.h>. 2011-10-28 16:10:12 +00:00
dyoung 0275d524f1 Fix gif(4)/gre(4) operation over interfaces such as wm(4) that do IPv4
checksum-offload.  Note well: it really is necessary to clear the
csum_data.

While I'm here, remove the do-nothing case for SIOCSIFDSTADDR and let
ifioctl_common() or the protocol handle it.
2011-10-27 20:04:57 +00:00
dyoung b9e1bc4e41 Document the ifioctl locking in comments.
Add a missing percpu_free(9) call.
2011-10-25 22:26:18 +00:00
dyoung 3dbb17c433 Use if_flags_set() and if_mcast_op(). 2011-10-19 22:07:09 +00:00
dyoung d2e7867bc1 Get rid of gre's deadlock-prone, one-off ifioctl locking. The standard
ifioctl locking will do.
2011-10-19 21:59:38 +00:00
dyoung ea1b432e78 Fix userland compilation: pull the ifioctl lock-related data members
into a struct ifnet_lock that the ifnet has a pointer to.  In a
non-_KERNEL environment, don't #include <sys/percpu.h> et cetera, and
don't define the struct ifnet_lock but *do* declare it.
2011-10-19 21:29:51 +00:00
dyoung 82f65cfc5a Use if_flags_set() and if_addr_init() instead of ifp->if_ioctl(). 2011-10-19 01:49:50 +00:00
dyoung 96ddefeaea Use if_mcast_op() and if_flags_set() instead of calling ifp->if_ioctl(). 2011-10-19 01:48:30 +00:00
dyoung 9454d9b3cd Extract subroutines ifioctl_enter() and ifioctl_exit(). 2011-10-19 01:46:43 +00:00
dyoung ab5a4db2a3 Start to untangle the ifnet ioctls mess.
Add ifnet functions, if_mcast_op(), if_flags_set(), and if_addr_init()
for adding/deleting multicast addresses, modifying the if_flags,
and initializing local/remote addresses.  Make ifpromisc() use
if_flags_set().  Protocols and network drivers should use these
instead of ifp->if_ioctl() calls.  Subsequent commits will
replace ifp->if_ioctl(SIOCADDMULTI| SIOCDELMULTI| SIOCSIFDSTADDR|
SIOCINITIFADDR| SIOCSIFFLAGS) calls with calls to the new functions.

Use a mutex(9) to synchronize ifp->if_ioctl() calls originating in
userland.  Also synchronize ifp->if_ioctl() calls with ifnet detachment
and reclamation.
2011-10-19 01:34:37 +00:00
dyoung 0f201a09eb Cosmetic: remove whitespace at the end of line. 2011-10-07 16:34:31 +00:00
christos 04f8076084 Change obsolete CBSIZE constant (48), to a power of two constant (64) that
is close enough to match the original assumptions.
2011-09-23 15:29:08 +00:00
rjs 66914c95f9 Add support for RFC 4638 to pppoe(4).
The change to if_spppsubr.c moves the test for whether LCP should
request a mru change until after the pppoe device has picked up the
mtu of the underlying ethernet device.
2011-09-05 12:19:09 +00:00
rjs 8ae6b6e3af Typo in comment. 2011-08-30 22:23:06 +00:00
bouyer ccc8030189 Provide netbsd32 compat for bpf. Beside the ioctls, the structure
returned to userland by read(2) also needs to be converted.
For this, the bpf descriptor is flagged as compat32 (or not) in the
open and ioctl functions (where the user process's pid is also updated
in the descriptor). When the bpf buffer is filled in, the 32bits or native
header is used depending on the information stored in the descriptor.

This won't work if a 64bit binary does the open and ioctls, and then
exec a 32bit program which will do the read. But this is very
unlikely to happen in real life ...

Tested on i386 and loongson; with these changes my loongson can run
dhclient and tcpdump with a n32 userland.
2011-08-30 14:22:22 +00:00
jmcneill 1f02a7ab53 build pf module with WARNS=3, and remove the need for -Wno-shadow 2011-08-29 09:50:04 +00:00
dyoung f2c33a10eb Define if_free() for ixg(4) to use. 2011-08-12 22:09:36 +00:00
dyoung 63cfe0ec97 Declare if_free(). 2011-08-12 22:09:17 +00:00
rmind acd100f2ac Convert ppp_list_lock to mutex(9). 2011-08-07 13:51:37 +00:00
tron 11677c694e Fix weird hardware address assignment that GCC 4.5 complains about. 2011-07-19 19:42:27 +00:00
joerg 3eb244d801 Retire varargs.h support. Move machine/stdarg.h logic into MI
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
2011-07-17 20:54:30 +00:00
drochner 37cdb98bb0 back out previous - this should be unnecessary on NetBSD due to
the extra validation introduced in rev.1.42 (and pulled up to netbsd-5)
2011-07-14 12:44:10 +00:00
drochner d5aac921d3 clear the packet filter's scratch memory before running the filter
program, otherwise kernel memory can be leaked, from Guy Harris
per PR kern/45142
2011-07-14 10:43:55 +00:00
kefren 3837722c76 Avoid putting implicit null labels on the wire 2011-07-03 18:46:12 +00:00
joerg 017dd250ef Fix memset usage. 2011-07-01 02:46:24 +00:00
wiz 4cbd24b23f dependant -> dependent 2011-06-30 20:09:15 +00:00
kefren 561daf4fe9 make LSE prepend the rest of the shims in they exist 2011-06-22 19:08:29 +00:00
kefren bd098fd968 learn mpls interface how to prepend multiple shims by using a vector of
smpls_addrs in sockaddr_mpls. The number of smpls_addrs is found from
smpls_len. First label encountered is BoS.
XXX: need to do the same for LSE and this feature needs to be documented.
2011-06-21 14:30:19 +00:00
kefren f33cde5958 Avoid computing INET[6] cksums for MPLS packets 2011-06-20 09:43:27 +00:00
kefren a91123ebd3 teach loopback about MPLS. Prerequisite for MPLS tunnels 2011-06-17 09:15:24 +00:00
kefren 87fd7aebe4 use ETHERTYPE_MPLS only for unicast packets (RFC3032) 2011-06-16 19:47:30 +00:00
christos eb8da70733 setting things once is enough. 2011-06-10 00:10:35 +00:00
drochner 2cd69bbbad more "const" 2011-06-09 19:54:18 +00:00
drochner 0a8dabda40 pull in AES-GCM/GMAC support from OpenBSD
This is still somewhat experimental. Tested between 2 similar boxes
so far. There is much potential for performance improvement. For now,
I've changed the gmac code to accept any data alignment, as the "char *"
pointer suggests. As the code is practically used, 32-bit alignment
can be assumed, at the cost of data copies. I don't know whether
bytewise access or copies are worse performance-wise. For efficient
implementations using SSE2 instructions on x86, even stricter
alignment requirements might arise.
2011-05-26 21:50:02 +00:00
matt 1cce8ae3bd Add code to auto-deencapsulate 0 tagged VLANs. 2011-05-24 17:16:43 +00:00
joerg 7800ff71d5 Use proper format string 2011-05-24 16:37:49 +00:00
joerg 15e751808f simplify 2011-05-23 21:52:54 +00:00
drochner fefed2101c add IANA number for camellia-cbc, copied from FreeBSD 2011-05-05 17:46:48 +00:00
yamt 0cc7ac519a undefer csum in looutput.
looutput is used by various code (ether_output, mcast) to loopback packets.
2011-04-25 22:20:59 +00:00
yamt 022ceac2bd fix module build 2011-04-25 22:16:21 +00:00
yamt 21f7828965 use ETHER_IS_MULTICAST macro. no functional changes. 2011-04-25 22:14:45 +00:00
sborrill bfaa893b9f PR kern/38871
Fix LAN on bge(4), alc(4). Flag VLAN capability in ec_capenable as used by network
card drivers.
2011-04-08 13:56:51 +00:00
mbalmer 1571556be6 Fix misplaced parenthesis. From henning.petersen@t-online.de, thanks. 2011-04-02 08:11:31 +00:00
dyoung 060522dec8 Hide the radix-trie implementation of the forwarding table so that we
will have an easier time replacing it with something different, even if
it is a second radix-trie implementation.

sys/net/route.c and sys/net/rtsock.c no longer operate directly on
radix_nodes or radix_node_heads.

Hopefully this will reduce the temptation to implement multipath or
source-based routing using grotty hacks to the grotty old radix-trie
code, too. :-)
2011-03-31 19:40:51 +00:00
christos e826c9f234 lib/44807: something broken in stat(2), return that we are a character
device in st_mode.
2011-03-30 21:34:08 +00:00
bouyer 22637b9c37 Allocate buffers with (M_WAITOK | M_CANFAIL) instead of M_NOWAIT.
M_NOWAIT cause dhcpd on a low-memory server with lots of interfaces to
occasionally fail to start with ENOBUFS; (M_WAITOK | M_CANFAIL) seems to
fix this.
Tested on 3 different dhcp servers.
2011-03-30 18:04:27 +00:00
cegger 126af76cac add MBSS. From FreeBSD. 2011-02-20 13:51:17 +00:00
christos 42b61c5ccd delint. 2011-02-19 17:21:48 +00:00
enami 01ec12c085 Fix userland build. 2011-02-19 08:46:41 +00:00
christos 6f035a2d13 Use kmem instead of malloc. Requested by rmind. 2011-02-19 04:10:47 +00:00
matt 6e0e9b9067 Use __CTASSERT 2011-02-19 02:22:27 +00:00
christos 4a5bd76895 Avoid stack memory disclosure by keeping track during filter validation time
of initialized memory. Idea taken from linux.
2011-02-19 01:12:39 +00:00
kefren 159fa1bbe5 Allow changing route flags. Should fix PR/40455
OK'ed: dyoung@
2011-02-10 07:42:18 +00:00
rmind fba2c6b806 Bump NPF_VERSION. 2011-02-02 23:01:34 +00:00
rmind 07ac07d35f NPF checkpoint:
- Add libnpf(3) - a library to control NPF (configuration, ruleset, etc).
- Add NPF support for ftp-proxy(8).
- Add rc.d script for NPF.
- Convert npfctl(8) to use libnpf(3) and thus make it less depressive.
  Note: next clean-up step should be a parser, once dholland@ will finish it.
- Add more documentation.
- Various fixes.
2011-02-02 02:20:24 +00:00
chuck e3e22c95ba udpate license clauses on my code to match the new-style BSD licenses.
based on diff that rmind@ sent me.

no functional change with this commit.
2011-02-01 19:40:24 +00:00
matt 4d5d6d9aa5 Add a new AF/PF_ROUTE which is 64-bit clean which makes the routing socket
interface (and its associated sysctls) act identically for both 32 and 64 bit
programs.  The old unclean one remains for backward compatibility.
2011-02-01 01:39:19 +00:00
dyoung c2126ca4c3 Update comment on RTM_CHGADDR to describe better what it's for. 2011-01-26 00:58:36 +00:00
dyoung 7617f65929 Add some 10-gigabit media words used by Intel 82599. 2011-01-26 00:57:47 +00:00
christos 87c238c4a3 undo previous. Read the diff wrong. 2011-01-22 19:12:58 +00:00
christos 6c793dc721 fix comment 2011-01-22 16:54:48 +00:00
rmind f938371887 NPF checkpoint:
- Add the concept of rule procedure: separate normalization, logging and
  potentially other functions from the rule structure.  Rule procedure can be
  shared amongst the rules.  Separation is both at kernel level (npf_rproc_t)
  and configuration ("procedure" + "apply").
- Fix portmap sharing for NAT policy.
- Update TCP state tracking logic.  Use TCP FSM definitions.
- Add if_byindex(), OK by matt@.  Use in logging for the lookup.
- Fix traceroute ALG and many other bugs; misc clean-up.
2011-01-18 20:33:45 +00:00
tsutsui d6f76b4a7b Fix off by one in ether_aton_r(). Noticed by "arp info overwritten" warning.
(how could it be missed for months?)
2011-01-12 15:30:40 +00:00
pooka 8d1e86d12d Apply patch from PR kern/44369 by Wolfgang Stukenbrock. 2011-01-11 10:52:42 +00:00
christos d232460a0a kern/44310: Alexander Nasonov: write to /dev/bpf truncates size_t to int 2011-01-02 21:03:45 +00:00
uebayasi ddbd4f2fb0 Fix build. 2010-12-27 14:58:55 +00:00
christos d5760f00f3 merge the length getting code from rt_msg1 and rt_msg2 and make it fail
when the compatibility ifinfo is missing instead of returning junk.
2010-12-25 20:37:44 +00:00
rmind 628e094cdc NPF checkpoint:
- Add support for session saving/restoring.
- Add packet logging support (can tcpdump a pseudo-interface).
- Support reload without flushing of sessions; rework some locking.
- Revisit session mangement, replace linking with npf_sentry_t entries.
- Add some counters for statistics, using percpu(9).
- Add IP_DF flag cleansing.
- Fix various bugs; misc clean-up.
2010-12-18 01:07:25 +00:00
pooka 91a3d3404c linkset no more 2010-12-08 17:10:13 +00:00
pooka 55fde99bfe _KERNEL_TOP 2010-12-07 20:38:26 +00:00
mrg f705e59458 revert another part of bpf_filter 1.38 that broke the check for divide
by zero while validating the bpf program.

originally spotted by skrll@, and broke atf the month-old atf test for
this exact problem: net_bpf_t_div-by-zero_div_by_zero.
2010-12-05 22:40:56 +00:00
mrg b5dcdd394e consider BPF_ABS, BPF_IND and BPF_MSH as they used to be in rev 1.37.
this fixes dhclient, and i'm told dhcpcd as well.


this patch from skrll@netbsd.org, tested by me.
2010-12-05 09:42:20 +00:00
mrg 51b69d29cc apply the smallest hack to allow this to build without warnings again. 2010-12-05 08:45:46 +00:00
christos 70d66231fe make bpf_validate available in userland. 2010-12-05 02:40:40 +00:00
christos d639454cac constify 2010-12-05 00:34:21 +00:00
christos 420ea92013 PR/44131: Matthew Mondor: if_tap.c tap_dev_ioctl() not propagating error,
always returns 0.
2010-11-22 21:31:51 +00:00
dyoung 7ef5c7d564 Cosmetic: fix indentation. 2010-11-17 00:20:49 +00:00
pooka 6f2301fb3c Implement ifconfig linkstr as proposed on tech-net. 2010-11-15 22:42:36 +00:00
roy a4784ce051 Add RTM_CHGADDR to signal that an address on the interface has changed.
This is mainly used for notifying userland about active link address changes.
2010-11-12 16:30:26 +00:00
rmind 97b932f123 NPF checkpoint:
- Add proper TCP state tracking as described in Guido van Rooij paper,
  plus handle TCP Window Scaling option.
- Completely rework npf_cache_t, reduce granularity, simplify code.
- Add npf_addr_t as an abstraction, amend session handling code, as well
  as NAT code et al, to use it.  Now design is prepared for IPv6 support.
- Handle IPv4 fragments i.e. perform packet reassembly.
- Add support for IPv4 ID randomization and minimum TTL enforcement.
- Add support for TCP MSS "clamping".
- Random bits for IPv6.  Various fixes and clean-up.
2010-11-11 06:30:39 +00:00
christos 882dc7b484 remove unused variables. 2010-11-06 23:28:58 +00:00
christos 4b64d01d9d PR/44054: Onno van der Linden: Stacksmashing in handling of ioctl OOSIO*
parameter.
2010-11-06 17:17:13 +00:00
christos 0118eafd5d PR/44030: Onno van der Linden: ifreqn2o gets called with the parameters the
wrong way around in /sys/net/if.c
2010-11-02 22:34:21 +00:00
pooka 177b6d9664 Remove XXX comment with the text "going away soon". It was added
in September 1989 -- I think we passed "soon" around last week.
2010-10-20 15:02:17 +00:00
rmind e2137dd128 npf_packet_handler: clear M_CANFASTFWD flag, so inspection would work when
fast forwarding is enabled (e.g. with GATEWAY kernel option).  Thanks matt@
for the tip.
2010-10-10 15:29:01 +00:00
rmind dc69e25ffd - npf_session_gc: fix for previous RB-tree conversion.
- npf_session_free: rename (to singular).
2010-10-03 19:36:38 +00:00
rmind a21e0fbdda nbuf_advfetch: fix bug and change behaviour on error case. 2010-10-03 19:30:22 +00:00
matt 19e6c76b2d Rename rb.h to rbtree.h, as it is more appropriate (c.f. ptree.h). Also
helps find code that hasn't been updated to use the new rbtree API.
2010-09-25 01:42:38 +00:00
rmind 57fb328f93 Add nbuf_advfetch() and simplify some code slightly. 2010-09-25 00:25:31 +00:00
rmind 879d5dfb5e Fixes/improvements to RB-tree implementation:
1. Fix inverted node order, so that negative value from comparison operator
   would represent lower (left) node, and positive - higher (right) node.
2. Add an argument (i.e. "context"), passed to comparison operators.
3. Change rb_tree_insert_node() to return a node - either inserted one or
   already existing one.
4. Amend the interface to manipulate the actual object, instead of the
   rb_node (in a similar way as Patricia-tree interface does).
5. Update all RB-tree users accordingly.

XXX: Perhaps rename rb.h to rbtree.h, since cleaning-up..

1-3 address the PR/43488 by Jeremy Huddleston.

Passes RB-tree regression tests.
Reviewed by: matt@, christos@
2010-09-24 22:51:50 +00:00
christos 14032335ad prevent integer oveflow. From Maksymilian Arciemowicz 2010-09-23 21:16:42 +00:00
rmind 63012b51f1 NPF checkpoint:
- Add support for bi-directional NAT and redirection / port forwarding.
- Finish filtering on ICMP type/code and add filtering on TCP flags.
- Add support for TCP reset (RST) or ICMP destination unreachable on block.
- Fix a bunch of bugs; misc cleanup.
2010-09-16 04:53:27 +00:00
tls cb72c101ad From Coyote Point source tree: "fix" srt IPv4 lookup on little-endian
hosts.  IPv6 is probably still broken, and, actually, the lookup table
for mask values should be kept in network byte order, not host byte order
and the corresponding change to the srtconfig ioctl interface made.

But at least this works.
2010-09-09 03:24:57 +00:00
spz 445e6acd20 fix two bugs in the PFKEY interface:
1) RFC2367 says in 2.3.3 Address Extension: "All non-address
   information in the sockaddrs, such as sin_zero for AF_INET sockaddrs,
   and sin6_flowinfo for AF_INET6 sockaddrs, MUST be zeroed out."
   the IPSEC_NAT_T code was expecting the port information it needs
   to be conveyed in the sockaddr instead of exclusively by
   SADB_X_EXT_NAT_T_SPORT and SADB_X_EXT_NAT_T_DPORT,
   and was not zeroing out the port information in the non-nat-traversal
   case.
   Since it was expecting the port information to reside in the sockaddr
   it could get away with (re)setting the ports after starting to use them.
   -> Set the natt ports before setting the SA mature.

2) RFC3947 has two Original Address fields, initiator and responder,
   so we need SADB_X_EXT_NAT_T_OAI and SADB_X_EXT_NAT_T_OAR and not just
   SADB_X_EXT_NAT_T_OA

The change has been created using vanhu's patch for FreeBSD as reference.

Note that establishing actual nat-t sessions has not yet been tested.

Likely fixes the following:
PR bin/41757
PR net/42592
PR net/42606
2010-09-05 06:52:53 +00:00
rmind 2e6f2099c6 Import NPF - a packet filter. Some features:
- Designed to be fully MP-safe and highly efficient.

- Tables/IP sets (hash or red-black tree) for high performance lookups.

- Stateful filtering and Network Address Port Translation (NAPT).
  Framework for application level gateways (ALGs).

- Packet inspection engine called n-code processor - inspired by BPF -
  supporting generic RISC-like and specific CISC-like instructions for
  common patterns (e.g. IPv4 address matching).  See npf_ncode(9) manual.

- Convenient userland utility npfctl(8) with npf.conf(8).

NOTE: This is not yet a fully capable alternative to PF or IPFilter.
Further work (support for binat/rdr, return-rst/return-icmp, common ALGs,
state saving/restoring, logging, etc) is in progress.

Thanks a lot to Matt Thomas for various useful comments and code review.
Aye by: board@
2010-08-22 18:56:18 +00:00
pgoyette 23d5409e7e Update the rest of the kernel to conform to the module subsystem's new
locking protocol.
2010-08-21 13:19:39 +00:00
pgoyette 5ef3a6df9b Keep condvar wmesg within 8 char limit 2010-08-11 11:47:29 +00:00
kefren d4ccc143a1 we need to set rt_ifp even if ifa is the same. Fixes the case when one
changes route to a different ifp but wants to keep the same ifa
2010-06-28 17:26:11 +00:00
kefren 00914d6d55 Don't assume that rt_tag family is AF_MPLS but verify it.
This way rt_tag can be used for other future work also, not only MPLS
2010-06-27 13:39:11 +00:00
kefren aeb8fe1fa4 Style fix: Tab consistency with the lines around it 2010-06-27 06:48:40 +00:00
kefren 25133d6d8f Fix build for MPLS import: add options MPLS, changed pseudo-device mpls
to pseudo-device ifmpls
2010-06-26 15:17:56 +00:00
kefren 826653c190 Add MPLS support, proposed on tech-net@ a couple of days ago
Welcome to 5.99.33
2010-06-26 14:24:27 +00:00
skrll 6a978a976d Correct the argument order of ifreqn2o conversion.
Fixes PR/42585.
2010-06-12 08:12:03 +00:00
dyoung 0d73006091 Prevent if_detach() from crashing while it walks the routing table
to find and unlink routes that reference the detached ifnet: make
if_rt_walktree() return ERESTART whenever it has deleted a route.
Whenever rt_walktree() returns ERESTART, if_detach() restarts it.

I believe that this fix resembles one by Jonathan Kollasch or by someone
else, which has languished in a PR for too long.  Sorry!

Tested by me and by Jeff Rizzo.

XXX It's supposed to be safe for rn_walktree() to apply to the routing
XXX table a routine that may delete routes.  Why isn't it safe in
XXX practice?
2010-06-02 23:41:14 +00:00
mjf e0e10b0607 Add __cacheline_aligned and __read_mostly annotations.
These annotations help to mitigate false sharing on multiprocessor
systems.

Variables annotated with __cacheline_aligned are placed into the
.data.cacheline_aligned section in the kernel. Each item in this
section is aligned on a cachline boundary - this avoids false
sharing. Highly contended global locks are a good candidate for
__cacheline_aligned annotation.

Variables annotated with __read_mostly are packed together tightly
into a .data.read_mostly section in the kernel. The idea here is that
we can pack infrequently modified data items into a cacheline and
avoid having to purge the cache, which would happen if read mostly
data and write mostly data shared a cachline. Initialisation variables
are a prime candiate for __read_mostly annotations.
2010-06-01 22:13:30 +00:00
dyoung a5a3d4c202 Change sc_wrports from an int to a bool and "test truth" instead of
comparing with 0.

Add 'volatile' to several other state variables that need it.
2010-05-26 23:46:44 +00:00
christos e0734521a4 delint previous 2010-05-19 20:43:51 +00:00
christos 5cbb2aa881 Replace ether_nonstatic_aton with a
- better named one
- not suffering from buffer oveflow
- simpler
- handling different separators
- returning error codes for errors

Some ideas from one posted on tech-net by Jonathan A. Kollasch
2010-05-19 20:41:59 +00:00
jakllsch fdc252ea81 Changes to ether_nonstatic_aton():
Be more leinent on input string format.  Each nibble pair may optionally be
followed by any of ':', '-', '.' or ' '.

Make source string const and work on a temporary copy.  The caller may not
expect their string to be destroyed.
2010-05-19 18:58:22 +00:00
dyoung ffd45aaae7 Constify some ether_output() arguments so that it's clear that they
can never be re-assigned.
2010-05-05 18:12:24 +00:00
kefren f4183d10ff Permit the existence of a route with unlinked ifp and ifa,
enabling this way the posibility to send a packet on an interface with
source address from another interface.
2010-05-02 19:17:56 +00:00
drochner 14f78ca302 the correct check for BPF_K is with BPF_SRC for BPF_ALU ops, from
Guy Harris per PR kern/43185
fixes possible division-by-zero crashes by evil filter expressions
like "len / 0 = 1"
pullup candidate
2010-04-21 16:35:09 +00:00
jmcneill ce4300c675 COMPAT_50 support for SPPP[GS]ETIDLETO and SPPP[GS]ETKEEPALIVE, ok martin@ 2010-04-20 14:32:03 +00:00
pooka 735701ff27 Add a little comment on how bpf can be made unloadable, per pointer from ad. 2010-04-14 13:31:33 +00:00
joerg 58e867556f Push the bpf_ops usage back into bpf.h. Push the common ifp->if_bpf
check into the inline functions as well the fourth argument for
bpf_attach.
2010-04-05 07:19:28 +00:00
pgoyette b96bf61fb6 Now that fw_port.h is gone, we need to directly include <sys/select.h>
Fixes build break reported by myself.
2010-03-31 12:17:01 +00:00
kiyohara af09db112b Bye-bye fw_port.h. 2010-03-29 03:05:27 +00:00
christos 8bc5973709 add BIOC{G,S}FEEDBACK which allows one to receive injected outgoing packets
via bpf.
2010-03-13 20:38:48 +00:00
snj ccaf1e96be Fight the ever-increasing size of src checkouts by spelling "useful"
without an extra l.
2010-02-28 15:52:16 +00:00
darran 6cc8d64caf Propagate the IFCAP_TSOv6 property also. 2010-02-28 07:10:40 +00:00
dyoung 9554bb1e78 Take another stab at fixing the LOCKDEBUG panic reported in PR
kern/39940 and by Martti Kuparinen on current-users@:  replace the
ioctl lock with finer-grained locking.  Lock the ports list and
wait to if_clone_destroy() until all threads are out of the softc.

Thanks to Martti Kuparinen for testing these changes.
2010-02-08 17:59:06 +00:00
joerg 3d7916e198 Explicitly include opt_gateway.h when depending on GATEWAY. 2010-02-04 21:48:11 +00:00
mbalmer 0f58fac97e fix language 2010-01-28 14:12:11 +00:00
pooka de4f105d4a Include sys/atomic.h now that it's used but gets stealth-included
only on some archs.
2010-01-26 01:06:23 +00:00
pooka b2bb0f38d5 Make bpf dynamically loadable. 2010-01-25 22:18:17 +00:00
dyoung 53aaf4795c Spelling fix: correspoding -> corresponding. 2010-01-21 20:51:31 +00:00
pooka 64cb662564 fix pasto in previous 2010-01-19 23:11:10 +00:00
pooka 21958f98cc slap dis wit summah dat RCSId 2010-01-19 22:33:35 +00:00
pooka b014350f7f Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client.  This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached.  However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff.  ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
2010-01-19 22:08:16 +00:00
pooka 10fe49d72c Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client.  This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached.  However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff.  ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
2010-01-19 22:06:18 +00:00
pooka 64da563d90 Forward declare struct bpf_if and use that as the type for bpf_if
instead of "void *".  Buys us oo times the type-safety for 0 times
the price.
(no functional change)
2010-01-17 19:45:06 +00:00
pooka ec8068f5fb * remove just-for-kicks locking
* KNF
* remove outdated comment (quite a funny one to read in 2010, though)
2010-01-15 22:16:46 +00:00
dsl 2a54322c7b If a multithreaded app closes an fd while another thread is blocked in
read/write/accept, then the expectation is that the blocked thread will
exit and the close complete.
Since only one fd is affected, but many fd can refer to the same file,
the close code can only request the fs code unblock with ERESTART.
Fixed for pipes and sockets, ERESTART will only be generated after such
a close - so there should be no change for other programs.
Also rename fo_abort() to fo_restart() (this used to be fo_drain()).
Fixes PR/26567
2009-12-20 09:36:05 +00:00
dsl 7a42c833db Rename fo_drain() to fo_abort(), 'drain' is used to mean 'wait for output
do drain' in many places, whereas fo_drain() was called in order to force
blocking read()/write() etc calls to return to userspace so that a close()
call from a different thread can complete.
In the sockets code comment out the broken code in the inner function,
it was being called from compat code.
2009-12-09 21:32:58 +00:00
dyoung a439206784 KNF. 2009-12-09 00:44:26 +00:00
plunky 7f5b5a7b01 fix a potential leak on tap device close, purging the send queue
did not actually release the dequeued mbufs.

pointed out by Paul Forgey on tech-net
2009-11-29 10:44:23 +00:00
mbalmer 7404b55630 Fix function name that was changed by mistake in the previous whitespace
commit.
2009-11-28 09:20:37 +00:00
isaki d591783aff white space -> tab. 2009-11-28 02:58:21 +00:00
rmind dbd9b86792 Remove some unecessary includes sys/user.h header. 2009-11-23 02:13:44 +00:00
christos dd8534acfe ar_tha() can return NULL; treat this as an error. 2009-11-20 02:14:56 +00:00
joerg a5fad62a18 Simplify ifreq_setaddr:
- Drop the INET6 block. The commands are never given to this function
  and truncating the sockaddr is arguably not the desired result anyway.
- Clear the address before copying. This fixes SIOCGIFNETMASK and possible
  other ioctls for users that don't check sa_len. This includes
  COMPAT_43 and Linux emulation.

OK dyoung@
2009-11-13 23:11:08 +00:00
christos 314b0d9f8c PR/42285: PR/41559: Daniel Hagerty: if_stf doesn't count output bytes 2009-11-08 18:44:45 +00:00
dyoung fa8b0147c6 s/u_quad_t/uint64_t/. 2009-11-03 00:30:31 +00:00
dyoung bb960ead7a s/u_quad_t/uint64_t/ 2009-11-03 00:30:11 +00:00
cegger 5b494d7c82 buildfix: only declare sysctl_net_ifq_setup() if INET or INET6 is defined 2009-10-26 16:41:35 +00:00
dyoung 7b7a580067 Replace u_quad_t with uint64_t. u_quad_t is just a typedef for
uint64_t, so no ABI/API breakage will result from this change.
2009-10-05 21:25:05 +00:00
christos 14c3063365 add the error from ifpromisc to the panic. 2009-10-05 17:58:15 +00:00
elad 2bf6c7c405 We only care about KAUTH_NETWORK_ROUTE. 2009-10-03 02:22:22 +00:00
elad cee5cd7dd4 Move default network interface policy back to the subsystem. 2009-10-03 01:46:39 +00:00
elad 9f0d81cf10 Move routing socket security policy back to the subsystem. 2009-10-02 23:16:21 +00:00
skrll 2c50cb71cb Initialise index_gen_mtx before use. 2009-09-19 11:02:07 +00:00
pooka 11281f01a0 Replace a large number of link set based sysctl node creations with
calls from subsystem constructors.  Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL
2009-09-16 15:23:04 +00:00
jakllsch 1d3dca01b3 When working with address preferences, sockaddr_externalize() both
addresses before comparing them.

This allows IPv6 link-local addresses (which have an embedded scope id)
to have a preference set on them.

ok dyoung
2009-09-15 23:24:34 +00:00
drochner d70cb77245 fix undefined result of stat(), found by clang static analyzer 2009-09-15 19:38:15 +00:00
degroote 2d48ac808c Import pfsync support from OpenBSD 4.2
Pfsync interface exposes change in the pf(4) over a pseudo-interface, and can
be used to synchronise different pf.

This work was part of my 2009 GSoC

No objection on tech-net@
2009-09-14 10:36:48 +00:00
tsutsui c412ca027b Make this compile with options RTSOCK_DEBUG.
Noticed by PR kern/41842, but fixed differently.
2009-09-12 18:09:25 +00:00
dyoung c5d5f7697a Make ifconfig(8) set and display preference numbers for IPv6
addresses.  Make the kernel support SIOC[SG]IFADDRPREF for IPv6
interface addresses.

In in6ifa_ifpforlinklocal(), consult preference numbers before
making an otherwise arbitrary choice of in6_ifaddr.  Otherwise,
preference numbers are *not* consulted by the kernel, but that will
be rather easy for somebody with a little bit of free time to fix.

Please note that setting the preference number for a link-local
IPv6 address does not work right, yet, but that ought to be fixed
soon.

In support of the changes above,

1 Add a method to struct domain for "externalizing" a sockaddr, and
  provide an implementation for IPv6.  Expect more work in this area: it
  may be more proper to say that the IPv6 implementation "internalizes"
  a sockaddr.  Add sockaddr_externalize().

2 Add a subroutine, sofamily(), that returns a struct socket's address
  family or AF_UNSPEC.

3 Make a lot of IPv4-specific code generic, and move it from
  sys/netinet/ to sys/net/ for re-use by IPv6 parts of the kernel and
  ifconfig(8).
2009-09-11 22:06:29 +00:00
tls fd671f648a Add a direction argument to socket upcalls, so they can tell why they've
been called when, for example, they're waiting for space to write.  From
Ritesh Agrawal at Coyote Point.
2009-09-02 14:56:57 +00:00
dyoung 7a9941f8e8 Use sysctl(9) to expose to userland each interface transmission
queue's maximum length, current length, and number of drops.  E.g.,

% sysctl net.interfaces.bnx0
net.interfaces.bnx0.sndq.len = 0
net.interfaces.bnx0.sndq.maxlen = 509
net.interfaces.bnx0.sndq.drops = 0

Let userland adjust the maximum queue length.

While I'm here, add a 64-bit generation number, if_index_gen, to
ifnet; the pair [ifp->if_index, ifp->if_index_gen] can serve to
identify an ifnet for the lifetime of the system.  I will use this
in an upcoming change.

Ok matt@.
2009-08-13 00:23:31 +00:00