NetBSD

Author	SHA1	Message	Date
drochner	23e5beaef1	rename the IPSEC in-kernel CPP variable and config(8) option to KAME_IPSEC, and make IPSEC define it so that existing kernel config files work as before. Now the default can easily be changed to FAST_IPSEC just by setting the IPSEC alias to FAST_IPSEC.	2011-12-19 11:59:56 +00:00
tls	6e1dd068e9	Separate /dev/random pseudodevice implementation from kernel entropy pool implementation. Rewrite pseudodevice code to use cprng_strong(9). The new pseudodevice is cloning, so each caller gets bits from a stream generated with its own key. Users of /dev/urandom get their generators keyed on a "best effort" basis -- the kernel will rekey generators whenever the entropy pool hits the high water mark -- while users of /dev/random get their generators rekeyed every time key-length bits are output. The underlying cprng_strong API can use AES-256 or AES-128, but we use AES-128 because of concerns about related-key attacks on AES-256. This improves performance (and reduces entropy pool depletion) significantly for users of /dev/urandom but does cause users of /dev/random to rekey twice as often. Also fixes various bugs (including some missing locking and a reseed-counter overflow in the CTR_DRBG code) found while testing this. For long reads, this generator is approximately 20 times as fast as the old generator (dd with bs=64K yields 53MB/sec on 2GHz Core2 instead of 2.5MB/sec) and also uses a separate mutex per instance so concurrency is greatly improved. For reads of typical key sizes for modern cryptosystems (16-32 bytes) performance is about the same as the old code: a little better for 32 bytes, a little worse for 16 bytes.	2011-12-17 20:05:38 +00:00
roy	46df35d87e	When adding or scrubbing a prefix, always notify userland even if the prefix does not have IFA_ROUTE. Don't scrub the interface in SIOCAIFADDR if the new address doesn't have IFA_ROUTE. If more functions are added to in_ifscrub then this logic might need to be revisited. Fixes PR/26450.	2011-12-12 00:06:39 +00:00
christos	450535e4c0	u_int -> uint	2011-12-11 23:56:10 +00:00
roy	4d6bb52603	Add RDNSS and DNSSL support, RFC6106. Replace custom lists with TAILQ lists. Clean up plenty of signed vs unsigned warnings and set WARNS=4. Adapted from FreeBSD.	2011-12-10 19:14:29 +00:00
tls	3afd44cf08	First step of random number subsystem rework described in <20111022023242.BA26F14A158@mail.netbsd.org>. This change includes the following: An initial cleanup and minor reorganization of the entropy pool code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are fixed. Some effort is made to accumulate entropy more quickly at boot time. A generic interface, "rndsink", is added, for stream generators to request that they be re-keyed with good quality entropy from the pool as soon as it is available. The arc4random()/arc4randbytes() implementation in libkern is adjusted to use the rndsink interface for rekeying, which helps address the problem of low-quality keys at boot time. An implementation of the FIPS 140-2 statistical tests for random number generator quality is provided (libkern/rngtest.c). This is based on Greg Rose's implementation from Qualcomm. A new random stream generator, nist_ctr_drbg, is provided. It is based on an implementation of the NIST SP800-90 CTR_DRBG by Henric Jungheim. This generator uses AES in a modified counter mode to generate a backtracking-resistant random stream. An abstraction layer, "cprng", is provided for in-kernel consumers of randomness. The arc4random/arc4randbytes API is deprecated for in-kernel use. It is replaced by "cprng_strong". The current cprng_fast implementation wraps the existing arc4random implementation. The current cprng_strong implementation wraps the new CTR_DRBG implementation. Both interfaces are rekeyed from the entropy pool automatically at intervals justifiable from best current cryptographic practice. In some quick tests, cprng_fast() is about the same speed as the old arc4randbytes(), and cprng_strong() is about 20% faster than rnd_extract_data(). Performance is expected to improve. The AES code in src/crypto/rijndael is no longer an optional kernel component, as it is required by cprng_strong, which is not an optional kernel component.
The entropy pool output is subjected to the rngtest tests at startup time; if it fails, the system will reboot. There is approximately a 3/10000 chance of a false positive from these tests. Entropy pool _input_ from hardware random numbers is subjected to the rngtest tests at attach time, as well as the FIPS continuous-output test, to detect bad or stuck hardware RNGs; if any are detected, they are detached, but the system continues to run. A problem with rndctl(8) is fixed -- datastructures with pointers in arrays are no longer passed to userspace (this was not a security problem, but rather a major issue for compat32). A new kernel will require a new rndctl. The sysctl kern.arandom() and kern.urandom() nodes are hooked up to the new generators, but the /dev/*random pseudodevices are not, yet. Manual pages for the new kernel interfaces are forthcoming.	2011-11-19 22:51:18 +00:00
gdt	c9bfbf1142	Move RTF_ANNOUNCE flag so that it no longer conflicts with RTF_PROTO2. RTF_ANNOUNCE was defined as RTF_PROTO2. The flag is used to indicate that the host should act as a proxy for a link level arp or ndp request. (If RTF_PROTO2 is used as an experimental flag (as advertised), various problems can occur.) This commit provides a first-class definition with its own bit for RTF_ANNOUNCE, removes the old aliasing definitions, and adds support for the new RTF_ANNOUNCE flag to netstat(8) and route(8). Also, remove unused RTF_ flags that collide with RTF_PROTO1: netinet/icmp6.h defined RTF_PROBEMTU as RTF_PROTO1 netinet/if_inarp.h defined RTF_USETRAILERS as RTF_PROTO1 (Neither of these flags are used anywhere. Both have been removed to reduce chances of collision with RTF_PROTO1.) Figuring this out and the diff are the work of Beverly Schwartz of BBN. (Passed release build, boot in VM, with no apparently related atf failures.) Approved for Public Release, Distribution Unlimited This material is based upon work supported by the Defense Advanced Research Projects Agency and Space and Naval Warfare Systems Center, Pacific, under Contract No. N66001-09-C-2073.	2011-11-11 15:09:32 +00:00
yamt	d8e3880a8a	redo ip_output.c rev.1.206 and 1.207 differently. PR/43664. ok'ed by martin@	2011-10-31 13:16:01 +00:00
yamt	4fa9fc4940	fix a double unlock bug introduced by tcp_input.c rev.1.312.	2011-10-31 13:01:42 +00:00
yamt	ee68698439	tcp_drain: grab softnet_lock where appropriate	2011-10-31 12:56:45 +00:00
yamt	bf52753ac3	tcp_reass_unlock: assertion	2011-10-31 12:52:19 +00:00
dyoung	b16114789d	Remove the #if 1 / #endif around some code that appears to be responsible for deleting the 'first' AF_INET address on the interface if the target address has family == AF_UNSPEC.	2011-10-28 22:23:54 +00:00
dyoung	386d3978d1	Use if_addr_init() and if_mcast_op() instead of ifp->if_ioctl().	2011-10-19 01:52:22 +00:00
mrg	c9b22425b1	make this build without INET6. also, fix the rfc6056algo passed to sysctl_rfc6056_helper (it was backwards for inet4/inet6.)	2011-09-25 11:54:28 +00:00
christos	be63caec14	disable debugging	2011-09-24 18:32:23 +00:00
christos	afa4470578	install the header.	2011-09-24 17:54:19 +00:00
christos	bf93cf8726	Add inet4 part of the rfc6056 code contributed by Vlad Balan as part of Google SoC-2011	2011-09-24 17:18:17 +00:00
plunky	7f3d4048d7	NULL does not need a cast	2011-08-31 18:31:02 +00:00
christos	4460dc9ac3	Add 3 logging sysctls for arp from freebsd: 1. log_movements: do you want to log the arp overwritten message or not? 2. log_wrong_iface: do you want to log when an arp arrives at the wrong interface? 3. log_permanent_modify: do you want to log when an arp message attempts to overwrite a static entry? I did not call the sysctls log_arp like FreeBSD does, because we already have an arp sysctl level. The default is on for all three of them.	2011-08-27 09:05:54 +00:00
christos	20d3618cc7	Fill in missing IPTOS defines (from Linux/OpenBSD)	2011-07-24 18:06:08 +00:00
joerg	3eb244d801	Retire varargs.h support. Move machine/stdarg.h logic into MI sys/stdarg.h and expect compiler to provide proper builtins, defaulting to the GCC interface. lint still has a special fallback. Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and derive va_list as required by standards.	2011-07-17 20:54:30 +00:00
mrg	211896099a	avoid an uninitialised variable warning. this one seems a false positive, but since it's for some hacky workaround code anyway...	2011-07-03 09:03:32 +00:00
enami	6140cebc6c	Don't increment ip_nfragpackets when failed to allocate fragment queue. No one will decrement it on such case.	2011-06-27 00:45:50 +00:00
joerg	20443a4138	Be a bit cleaner and reduce the amount of namespace pollution	2011-06-07 22:51:32 +00:00
dyoung	f92ee3adbe	Don't allocate resources for vtw until/unless it is enabled. This will further help those machines where memory is in short supply. TBD: release resources after vtw is disabled and all entries have expired.	2011-06-06 19:15:43 +00:00
dyoung	272b9fa3d1	Don't sleep until memory becomes available. Use kmem_zalloc() instead of kmem_alloc() + bzero(). During initialization, try to get all of the memory we need for the vestigial time-wait structures before we set any of the structures up, and if any single allocation fails, release all of the memory. This should help low-memory hosts. A much better fix postpones allocating any memory until vtw is enabled through the sysctl.	2011-06-03 20:01:00 +00:00
dyoung	42fedb3481	Defer scheduling vtw_tick() and setting the vtw hooks until vtw_control() is called. In this way, vtw_tick() will be re-scheduled repeatedly while vtw is in use. Pay tcp_vtw_was_enabled no attention in vtw_earlyinit(), since it's always going to be 0 during initialization.	2011-06-03 17:11:34 +00:00
gdt	c238210804	Remove erroneous additional tick in RTO estimation. The variable ts_rtt is 1 plus the RTT, so that 0 can mean invalid measurement. However, the code failed to subtract the 1 back out before use. With this change, TCP from Massachusetts to France now typically has 1s RTO values, rather than 1.5s. This bug was found and fixed by Bev Schwartz of BBN. This material is based upon work supported by the Defense Advanced Research Projects Agency and Space and Naval Warfare Systems Center, Pacific, under Contract No. N66001-09-C-2073. Approved for Public Release, Distribution Unlimited	2011-05-25 23:20:57 +00:00
gdt	2377e629f8	Add comment urging a separation of TCP_RTT_SHIFT into separate defines describing the EWMA calculation and the storage representation. (No code change.)	2011-05-25 23:17:44 +00:00
gdt	0ca69791cc	Note units and current value for TCP_DELACK_TICKS.	2011-05-24 18:37:52 +00:00
spz	5f1fd2312c	RA flood mitigation via a limit on accepted routes: - introduce a limit for the routes accepted via IPv6 Router Advertisement: a common 2 interface client will have 6, the default limit is 100 and can be adjusted via sysctl - report the current number of routes installed via RA via sysctl - count discarded route additions. Note that one RA message is two routes. This is at present only across all interfaces even though per-interface would be more useful, since the per-interface structure complies with RFC2466 - bump kernel version due to the previous change - adjust netstat to use the new value (with netstat -p icmp6)	2011-05-24 18:07:11 +00:00
dholland	5d71a1f21c	typo in comment	2011-05-17 05:40:24 +00:00
drochner	4f6bdd19b5	use getmicrouptime(9) rather than microtime(9) for TIME_WAIT duration calculation, because this doesn't get confused by system time changes, and uses fewer CPU cycles. reviewed by dyoung	2011-05-11 15:08:59 +00:00
spz	18f5539bfc	update (unused) ND option identifiers and corresponding comments	2011-05-08 18:42:53 +00:00
drochner	060227a80a	remove an empty function	2011-05-06 12:52:43 +00:00
dyoung	6866464399	Remove #ifdef INET6 throughout.	2011-05-03 23:57:41 +00:00
dyoung	c2e43be1c5	Reduces the resources demanded by TCP sessions in TIME_WAIT-state using methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime Truncation (MSLT). MSLT and VTW were contributed by Coyote Point Systems, Inc. Even after a TCP session enters the TIME_WAIT state, its corresponding socket and protocol control blocks (PCBs) stick around until the TCP Maximum Segment Lifetime (MSL) expires. On a host whose workload necessarily creates and closes down many TCP sockets, the sockets & PCBs for TCP sessions in TIME_WAIT state amount to many megabytes of dead weight in RAM. Maximum Segment Lifetime Truncation (MSLT) assigns each TCP session to a class based on the nearness of the peer. Corresponding to each class is an MSL, and a session uses the MSL of its class. The classes are loopback (local host equals remote host), local (local host and remote host are on the same link/subnet), and remote (local host and remote host communicate via one or more gateways). Classes corresponding to nearer peers have lower MSLs by default: 2 seconds for loopback, 10 seconds for local, 60 seconds for remote. Loopback and local sessions expire more quickly when MSLT is used. Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket dead weight with a compact representation of the session, called a "vestigial PCB". VTW data structures are designed to be very fast and memory-efficient: for fast insertion and lookup of vestigial PCBs, the PCBs are stored in a hash table that is designed to minimize the number of cacheline visits per lookup/insertion. The memory both for vestigial PCBs and for elements of the PCB hashtable comes from fixed-size pools, and linked data structures exploit this to conserve memory by representing references with a narrow index/offset from the start of a pool instead of a pointer. When space for new vestigial PCBs runs out, VTW makes room by discarding old vestigial PCBs, oldest first. VTW cooperates with MSLT.
It may help to think of VTW as a "FIN cache" by analogy to the SYN cache. A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT sessions as fast as it can is approximately 17% idle when VTW is active versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM when VTW is active (approximately 64k vestigial PCBs are created) than when it is inactive.	2011-05-03 18:28:44 +00:00
dyoung	ac162b774b	_drain() routines may be called with locks held, so instead of doing any work in _drain(), set a drain-needed flag. Do the work in the fasttimo handler. Contributed by Coyote Point Systems, Inc.	2011-05-03 17:44:30 +00:00
dyoung	8e054749e4	arp_drain() may be called with locks held, so instead of doing any work in arp_drain(), set a drain-needed flag. Do the work in the fasttimo handler. Contributed by Coyote Point Systems, Inc.	2011-05-03 16:00:29 +00:00
yamt	0cc7ac519a	undefer csum in looutput. looutput is used by various code (ether_output, mcast) to loopback packets.	2011-04-25 22:20:59 +00:00
yamt	3e17d0f5a4	tcp_input: simplify redundant assignment. no functional changes.	2011-04-25 22:12:43 +00:00
yamt	45430a8699	ip_undefer_csum: - don't forget ntohs. - don't add hdrlen twice for l4 header offset. - use M_CSUM_DATA_IPv4_IPHL instead of extracting it from ip header. - simplify code. - KNF.	2011-04-25 22:11:31 +00:00
yamt	e86be17a4f	fix assertions	2011-04-25 22:04:32 +00:00
wiz	d8926a5a43	Fix typos.	2011-04-20 14:08:07 +00:00
gdt	f641bea548	Rewrite comments about TCP RTO calculations. Long ago, the storage representations of srtt and rttvar were changed from the 4.4BSD scheme, and the comments are out of sync with the code. This commit rewrites most of the comments that explain the RTO calculations, and points out some issues in the code. Joint work with Bev Schwartz of BBN (original analysis and comments), but I have rewritten and extended them, so errors are mine. This material is based upon work supported by the Defense Advanced Research Projects Agency and Space and Naval Warfare Systems Center, Pacific, under Contract No. N66001-09-C-2073. Approved for Public Release, Distribution Unlimited	2011-04-20 13:35:51 +00:00
dyoung	b34b1e2f1f	In ipintr(), don't overwrite ipintrq.ifq_maxlen with IFQ_MAXLEN. Initialize ipintrq.ifq_maxlen using IFQ_MAXLEN directly instead of using the global ipqmaxlen. Get rid of the global ipqmaxlen. Now it works again to override the maximum IP queue length with, for example, sysctl -w net.inet.ip.ifq.maxlen=5.	2011-04-14 20:32:04 +00:00
yamt	e3f6054711	simplify a compile-time assertion	2011-04-14 16:08:53 +00:00
yamt	41529ab272	- comments - g/c stale extern	2011-04-14 15:57:02 +00:00
yamt	37494bba21	comments	2011-04-14 15:55:46 +00:00
yamt	c9cf49ace7	- comments - whitespace	2011-04-14 15:54:31 +00:00

2052 Commits