Commit Graph

1730 Commits

Author SHA1 Message Date
manu
cddc307094 Fix build problem after recent NAT-T changes 2005-04-26 05:37:45 +00:00
manu
52786ce730 Don't sleep when handling ESP over UDP packets. 2005-04-25 20:37:06 +00:00
manu
455d55f55b Enhance IPSEC_NAT_T so that it can work with multiple machines behind the
same NAT.
2005-04-23 14:05:28 +00:00
yamt
23cd288d37 ip_output: handle the case M_CSUM_TSOv4 but !IFCAP_TSOv4. 2005-04-18 22:06:28 +00:00
yamt
fa67035590 add a function to handle M_CSUM_TSOv4 by software. 2005-04-18 21:55:06 +00:00
yamt
e5a2b5a4a4 fix problems related to loopback interface checksum omission. PR/29971.
- for ipv4, defer decision to ip layer as h/w checksum offloading does
  so that it can check the actual interface the packet is going to.
- for ipv6, disable it.
  (maybe will be revisited when it implements h/w checksum offloading.)

ok'ed by Jason Thorpe.
2005-04-18 21:50:25 +00:00
yamt
0b4d50d7bd when doing TSO, avoid to use duplicated ip_id heavily.
XXX ip_randomid
2005-04-07 12:22:47 +00:00
kurahone
f7707899c1 Added sysctl tunable limits for the number of maximum SACK holes
per connection and per system.

Idea taken from FreeBSD.
2005-04-05 01:07:17 +00:00
yamt
4b935040d0 tcp_input: update a comment to match with the code. 2005-04-03 05:02:46 +00:00
is
a0c9bc9616 Add IPv6 over GRE (contributed by Gert Doering in PR 29150). 2005-03-30 16:34:54 +00:00
yamt
73a5d8f913 s of sack is selective, not selection. pointed by Michael Eriksson. 2005-03-30 11:09:16 +00:00
yamt
8b0967ff45 protect tcpipqent with splvm. 2005-03-29 20:10:16 +00:00
yamt
c08e90ff51 tcp_output: lock reass queue when building sack. 2005-03-29 20:09:24 +00:00
yamt
2c742b20e6 ip_reass: clear stale csum_flags. 2005-03-29 09:37:08 +00:00
christos
3136f75efa defopt IPFILTER_DEFAULT_BLOCK 2005-03-26 18:08:42 +00:00
kurahone
0eb940bc75 TCP/SACK changes from FreeBSD.
Ignore the SACK option if
 * The packet is not an ACK.
 * The ACK is outside of snd_una -> snd_max
2005-03-18 21:25:09 +00:00
yamt
df05ca7085 simplify data receiver side sack processing.
- introduce t_segqlen, the number of segments in segq/timeq.
  the name is from freebsd.
- rather than maintaining a copy of sack blocks (rcv_sack_block[]),
  build it directly from the segment list when needed.
2005-03-16 00:39:56 +00:00
yamt
0446b7c3e3 - use full sized segments unless we actually have SACKs to send.
- avoid TSO duplicate D-SACK.
- send SACKs regardless of TF_ACKNOW.
- don't clear rcv_sack_num when transmitting.

discussed on tech-net@.
2005-03-16 00:38:27 +00:00
yamt
9482bc7356 don't try to use TSO to transmit a single segment.
- there's no benefit.
- rtl8169 seems to be stuck with it.
2005-03-12 07:53:08 +00:00
matt
7dfa1d8cf7 Set ip_len to 0 in the wm driver when TSO is being used. 2005-03-11 17:07:51 +00:00
atatat
5b8a6c916d Revert the change that made kern.file2 and net.*.*.pcblist into nodes
instead of structs.  It had other deleterious side-effects that are
rather nasty.  Another solution must be found.
2005-03-11 06:16:15 +00:00
thorpej
3901f760df In ip_fragment():
- Use the correct IP header length variable for other-than-first packets.
- Remove redundant setting of the original IP header length in the first
  packet's csum_data.  (It's already set before ip_fragment() is called
  in 1.147.)
2005-03-10 06:03:00 +00:00
atatat
d945605f5b Make this build without INET6 xor INET (hah!) again. 2005-03-10 05:49:14 +00:00
atatat
ca63da437a Change types of kern.file2 and net.*.*.pcblist to NODE 2005-03-10 05:43:25 +00:00
atatat
7c62c74d09 Add the following nodes to the sysctl tree:
net.local.stream.pcblist
	net.local.dgram.pcblist
	net.inet.tcp.pcblist
	net.inet.udp.pcblist
	net.inet.raw.pcblist
	net.inet6.tcp6.pcblist
	net.inet6.udp6.pcblist
	net.inet6.raw6.pcblist

which allow retrieval of the pcbs in use for those protocols.  The
struct involved is 32/64 bit clean and incorporates parts of struct
inpcb, struct unpcb, a bit of struct tcpcb, and two socket addresses.
2005-03-09 05:07:19 +00:00
atatat
76a9013c25 gc the tcp_sysctl() prototype since it's completely vestigial 2005-03-09 04:51:56 +00:00
simonb
e491fee6a5 s/quence/quench/. 2005-03-09 04:24:12 +00:00
simonb
3792275475 Add an extra `i' to notifes/notifed. 2005-03-09 04:23:33 +00:00
matt
47df382bfe Move all the hardware-assisted checksum/segment offload code together. 2005-03-09 03:39:27 +00:00
matt
ea3d151322 For AF_INET, always set m->m_pkthdr.csum_data. Don't or TSOv4, just set it. 2005-03-09 03:38:33 +00:00
yamt
a0f802e2ac tcp_sack_option: handle the case that the right-most sack'ed block is expanded.
a fix from Noritoshi Demizu (FreeBSD PR/78226) via Kentaro A. Kurahone.
2005-03-08 11:27:14 +00:00
yamt
e55b9169d1 tcp_sack_option: fix the cases that some sack blocks go into a hole. 2005-03-07 10:27:39 +00:00
yamt
ff614e1114 tcp_sack_option: fix a typo(?), which can cause to ignore valid blocks. 2005-03-07 09:40:35 +00:00
yamt
ed8b840f26 tcp_sack_option: the max number of sack blocks in a packet is 4, not 3. 2005-03-07 09:32:51 +00:00
yamt
e16a97f90b - unwrap short lines.
- remove unneeded parenthesis.
- whitespace.
2005-03-06 23:06:40 +00:00
yamt
fd5005e8d7 don't assume alignment of sack options. 2005-03-06 23:05:56 +00:00
yamt
1152380a6b wrap long lines. 2005-03-06 23:05:20 +00:00
yamt
2dc19239d5 update SYSCTL_DESCR; sack is implemented. 2005-03-06 10:15:30 +00:00
thorpej
1f89264732 Add a /*CONSTCOND*/ to last. 2005-03-06 03:41:36 +00:00
matt
c24b749deb Fix typo. Opposite of >= is <, not ==. 2005-03-06 00:52:25 +00:00
matt
9337b701be Replace some gotos with a do while (0) and breaks. No functional change. 2005-03-06 00:48:52 +00:00
matt
8e04817c50 Add IPv4/TCP hooks for TCP Segment Offload on transmit. 2005-03-06 00:35:07 +00:00
briggs
6fe1c07527 Fix checksum offload for fragmented packets. From John Heasley
on gnats-bugs in PR kern/29544.
Tested with an NFS client using default rwsize on an NFS server
with wm(4) interface configured IP4CSUM,TCP4CSUM,UDP4CSUM.
Prior revision required the server to have checksum offload disabled.
2005-03-05 02:46:38 +00:00
mycroft
5640dcbb4a Re-add callout_active(), in a way compatible with the FreeBSD version, and use
it in the TCP stack to test which of the REXMT or PERSIST timer is in use.
This fixes a race condition that could cause "panic: tcp_output REXMT".  See
tech-net for details.
2005-03-04 05:51:41 +00:00
mycroft
c9f058f65e Copyright maintenance. 2005-03-02 10:20:18 +00:00
jonathan
4ae1f36dc9 Commit TCP SACK patches from Kentaro A. Karahone's patch at:
http://www.sigusr1.org/~kurahone/tcp-sack-netbsd-02152005.diff.gz

Fixes in that patch for pre-existing TCP pcb initializations were already
committed to NetBSD-current, so are not included in this commit.

The SACK patch has been observed to correctly negotiate and respond,
to SACKs in wide-area traffic.

There are two indepenently-observed, as-yet-unresolved anomalies:
First, seeing unexplained delays between in fast retransmission
(potentially explainable by an 0.2sec RTT between adjacent
ethernet/wifi NICs); and second, peculiar and unepxlained TCP
retransmits observed over an ath0 card.

After discussion with several interested developers, I'm committing
this now, as-is, for more eyes to use and look over.  Current hypothesis
is that the anomalies above may in fact be due to link/level (hardware,
driver, HAL, firmware) abberations in the test setup, affecting  both
Kentaro's  wired-Ethernet NIC and in my two (different) WiFi NICs.
2005-02-28 16:20:59 +00:00
perry
f07677dd81 nuke trailing whitespace 2005-02-26 22:45:09 +00:00
peter
1c9b56c830 Add MKIPFILTER; if set to no, don't build and install the ipf(4) programs,
headers and LKM.

Add MKPF; if set to no, don't build and install the pf(4) programs,
headers, LKM and spamd.

Both options default to yes, so nothing changed in the default build.

Reviewed by lukem.
2005-02-22 14:39:58 +00:00
heas
0f8efdd552 My last change for pseudo-header checksums was flawed. The pseudo-header
checksum is always in the L4 header by the time we get to this point.  It
was occasionally not there due to a bug in tcp_respond, which has since
been fixed.
So, instead just stash the length of the L3 header in the high 16 bits of
csum_data.
2005-02-18 00:52:56 +00:00
briggs
da725d663a Initialize snd_high as part of tcp_sendseqinit().
From Kentaro A. Kurahone.
2005-02-16 15:00:47 +00:00
briggs
a825f3e77c Initialize t_partialacks in the tcpcb template.
From Kentaro A. Kurahone.
2005-02-16 14:59:40 +00:00
heas
2d4ced7c82 For controllers (eg: hme & gem) that can only perform linear hardware checksums
(from an offset to the end of the packet), the pseudo-header checksum must be
calculated by software.  So, provide it in the TCP/UDP header when
M_CSUM_NO_PSEUDOHDR is set in the interface's if_csum_flags_tx.

The start offset, the end of the IP header, is also provided in the high 16
bits of pkthdr.csum_data.  Such that the driver need not examine the packet
at all.

XXX At the request of Jonathan Stone, note that sharing of if_csum_flags_tx &
    pkthdr.csum_flags for checksum quirks should be re-evaluated.
2005-02-12 23:25:29 +00:00
manu
5c217c1a67 Add support for IPsec Network Address Translator traversal (NAT-T), as
described by RFC 3947 and 3948.
2005-02-12 12:31:07 +00:00
heas
52b0cd6b47 ntohs->htons for ip6 plen (payload length).
It is not technically necessary to set plen here, since ip6_output() starts
off by calculating it, but leaving it keeps it consistent with other code.
2005-02-12 01:24:07 +00:00
pk
237a0c2d85 Update tcp_trace() prototype to match implementation. 2005-02-06 20:13:09 +00:00
perry
b02c92c5bf ANSIfy function declarations 2005-02-03 23:50:33 +00:00
perry
870f206724 ANSIfy function declarations 2005-02-03 23:39:32 +00:00
perry
dcf288607c ANSIfy function declarations 2005-02-03 23:25:22 +00:00
perry
d5c8fcf31c ANSIfy function declarations 2005-02-03 23:13:20 +00:00
perry
71ef63c98f ANSIify function declarations 2005-02-03 23:08:43 +00:00
perry
402f8626b1 ANSIfy function declarations 2005-02-03 22:51:50 +00:00
perry
90789ef318 some ANSIfying, and remove an unsightly tab 2005-02-03 22:45:28 +00:00
perry
babe6a957c KNF + slightly ANSIfy 2005-02-03 22:43:34 +00:00
perry
51ad03a950 ANSIfy function prototypes. (Still have about 3/5ths of the C files in
netinet to go...)
2005-02-03 03:49:01 +00:00
perry
3494482345 de-__P -- will ANSIfy .c files later. 2005-02-02 21:41:55 +00:00
perry
695648ddc8 de-__P, do some ANSIfication. 2005-02-02 21:41:01 +00:00
he
1c9ef2aa0a Fix "unused local variable" warning/error if compiling without
bridge support by making variable declaration conditional.  Found
while compiling for shark.
2005-02-01 12:56:30 +00:00
kim
c9f56c04dc Add RFC 3378 EtherIP support, ported from OpenBSD to NetBSD by
Hans Rosenfeld (rosenfeld at grumpf.hope-2000.org)

This change makes it possible to add gif interfaces to bridges, which
will then send and receive IP protocol 97 packets. Packets are Ethernet
frames with an EtherIP header prepended.
2005-01-31 23:49:36 +00:00
mycroft
47759e6333 Several changes based on comparison with NS:
1) dupseg_fix_=true from NS: do not count a segment with completely duplicate
data as a duplicate ack.  This can occur due to duplicate packets in the
network, or due to fast retransmit from the other side.

2) dupack_reset_=false from NS: do not reset the duplicate ack counter or exit
fast recovery if we happen to get data or a window update along with a
duplicate ack.

3) In the "very old ack" case that itojun added, send an ACK before dropping
the segment, to try to update the other side's send sequence number.

4) Check the ssthresh crossover point with >= rather than >.  Otherwise we
start to do "exponential" growth immediately following recovery, where we
should be doing "linear".  This is what NS does.
2005-01-28 00:18:22 +00:00
mycroft
e236dc1c36 Whoops. Exit fast recovery when handling a timeout. 2005-01-27 18:45:41 +00:00
mycroft
746d109a3c There is no reason to adjust ts_recent_age for ts_timebase; it's strictly an
internal variable.
2005-01-27 17:14:04 +00:00
mycroft
470f2d0705 Do the other TCP_PAWS_IDLE check unsigned as well. It doesn't do us any harm,
and it could detect even older time stamps.  (Really, to be 100% correct, there
should be a timer that clears these out -- but it probably doesn't matter in
the real world.)
2005-01-27 17:10:07 +00:00
mycroft
42655f2a87 Also check whether an echoed RTT is very large -- this *could* cause the
smoothing function to overflow.  I use TCP_PAWS_IDLE (24 days) for this.
2005-01-27 16:56:06 +00:00
mycroft
7215a0b3f1 Introduce a new state variable, t_partialacks. It has 3 states:
* t_partialacks<0 means we are not in fast recovery.
* t_partialacks==0 means we are in fast recovery, but we have not received
  any partial acks yet.
* t_partialacks>0 means we are in fast recovery, and we have received
  partial acks.

This is used to implement 2 changes in RFC 3782:
* We keep the notion that we are in fast recovery separate from t_dupacks, so
  it is not reset due to out-of-order acks.  (This affects both the Reno and
  NewReno cases.)
* We only reset the retransmit timer on the first partial ack -- preventing us
  from possibly taking one RTO per segment once fast recovery is initiated.

As before, it is hard to measure any difference between Reno and NewReno in the
real-world cases that I've tested.
2005-01-27 03:39:36 +00:00
mycroft
5283ca74ad Fix two problems in our TCP stack:
1) If an echoed RFC 1323 time stamp appears to be later than the current time,
   ignore it and fall back to old-style RTT calculation.  This prevents ending
   up with a negative RTT and panicking later.

2) Fix NewReno.  This involves a few changes:

   a) Implement the send_high variable in RFC 2582.  Our implementation is
      subtly different; it is one *past* the last sequence number transmitted
      rather than being equal to it.  This simplifies some logic and makes
      the code smaller.  Additional logic was required to prevent sequence
      number wraparound problems; this is not mentioned in RFC 2582.

   b) Make sure we reset t_dupacks on new acks, but *not* on a partial ack.
      All of the new ack code is pushed out into tcp_newreno().  (Later this
      will probably be a pluggable function.)  Thus t_dupacks keeps track of
      whether we're in fast recovery all the time, with Reno or NewReno, which
      keeps some logic simpler.

   c) We do not need to update snd_recover when we're not in fast recovery.
      See tech-net for an explanation of this.

   d) In the gratuitous fast retransmit prevention case, do not send a packet.
      RFC 2582 specifically says that we should "do nothing".

   e) Do not inflate the congestion window on a partial ack.  (This is done by
      testing t_dupacks to see whether we're still in fast recovery.)

This brings the performance of NewReno back up to the same as Reno in a few
random test cases (e.g. transferring peer-to-peer over my wireless network).
I have not concocted a good test case for the behavior specific to NewReno.
2005-01-26 21:49:27 +00:00
matt
027c11539b Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them. 2005-01-24 21:25:09 +00:00
enami
f1b6d58e81 To fix bad pointer dereference on start up when gif is used,
- Allow rn_init() to be called multiple times, but do nothing except the
  first call.
- Include opt_inet.h so that #ifdef INET works.
- Call rn_init() from encap_init() explicitly rather than depending on the
  order of initialization.
2005-01-24 04:46:49 +00:00
itojun
fd232dd798 get zero-cleared field on malloc. kame-pr-856 2005-01-24 02:42:49 +00:00
matt
d341be30f4 Change initialzie of domains to use link sets. Switch to using STAILQ.
Add a convenience macro DOMAIN_FOREACH to interate through the domain.
2005-01-23 18:41:56 +00:00
manu
5ff6d3d572 Duplicate nested if statement in PIM code (from the OpenBSD tree) 2005-01-15 06:50:47 +00:00
drochner
aeae2d9c94 compile tcp_debug.c only if the TCP_DEBUG option is set,
and remove the "#ifdef TCP_DEBUG" around everything
2005-01-13 19:09:40 +00:00
heas
fe4b3cd078 In tcp_respond(), clear the m_pkthdr.csum_flags that was inherited from the
received packet so that the checksum is not performed twice.  Also,
tcp_respond() does not fill-in the m_pkthdr.csum_data, so a h/w checksum may
have the wrong offset.

OK from Jason Thorpe.
2005-01-03 19:47:30 +00:00
yamt
ffebedd625 factor out receive side tcp/udp checksum handling code so that they
can be used by eg. packet filters.

reviewed by Christos Zoulas on tech-net@.
(slightly tweaked since then to make tcp and udp similar.)
2004-12-21 05:51:31 +00:00
christos
77e7bdb8aa yamt's changes seem to fix all the checksumming issues. Turn the loopback
checksums back off so we can make sure that everything works.
2004-12-19 06:42:24 +00:00
yamt
ea04ddb694 udp6_input: correct loopback test. 2004-12-18 15:31:26 +00:00
yamt
6e353db6e4 tcp_input: add missing loopback checksum omission code for ipv6. 2004-12-18 07:30:17 +00:00
christos
60fb5c0ece Turn checksumming on loopback back on until we fix the bugs in it.
Connect over tcp on the loopback is broken:

  4729 amq      0.000007 CALL  connect(4,0x804f2a0,0x1c)
  4729 amq      75.007420 RET   connect -1 errno 60 Connection timed out
2004-12-17 22:54:52 +00:00
thorpej
7994b6f95e Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo
2004-12-15 04:25:19 +00:00
yamt
e745dd4766 remove TCPOPT_MD5SIGNATURE because no one in our tree uses it
and it's duplicated with TCPOPT_SIGNATURE.
i preferred TCPOPT_SIGNATURE because it's used by FreeBSD and OpenBSD.
2004-12-07 14:24:04 +00:00
peter
396b87b8c2 Convert lo(4) to a clonable device.
This also removes the loif array and changes all code to use the new
lo0ifp pointer which points to the lo0 ifnet structure.

Approved by christos.
2004-12-04 16:10:25 +00:00
christos
1ad35fcc9a PR/25749: Peter Postma: missing splx() in kernel. 2004-11-13 19:17:50 +00:00
thorpej
402ff2cf29 Slight simplification to IFA_STATS handling. 2004-10-06 05:42:24 +00:00
darrenr
0543239818 Add a comment to document what setting "srcrt" is really on about in ipintr() 2004-10-06 01:34:11 +00:00
yamt
2c46ccce37 move netinet/ip_lookup.h -> dist/ipf/netinet/ip_lookup.h. 2004-10-05 04:56:41 +00:00
yamt
8484dd9eed move ipf headers and add a comment. 2004-10-05 04:55:48 +00:00
jdolecek
46134b3da6 move ip_htable.h from sys/netinet/ to sys/dist/ipf/netinet/, it's ipfilter file 2004-10-02 07:59:14 +00:00
christos
722688d056 These are ipfilter files, although they don't have the same copyright.
Thanks jaromir.
2004-10-02 07:51:11 +00:00
christos
5976437e5f Move ipf to sys/dist/ipf; Note that I followed the pattern used for pf.
I think though that the files.ipfilter and Makefile glue should go to
the dist directory, not like it is done now.
2004-10-01 15:24:45 +00:00
christos
1b492809a0 PR/27082: Sean Boudreau: redundant assignment or NULL dereference in
in_pcbconnect()
2004-09-29 21:30:00 +00:00
christos
d790aa42d0 PR/27081: Sean Boudreau: ip_input() bad csum count not incremented on sw csum 2004-09-29 21:28:34 +00:00
christos
7059bc7962 PR/21902: Sean Boudreau: arplookup() incrementing arpstat.as_allocfail
erroneously.
2004-09-29 21:26:52 +00:00
yamt
0ea22c32fa fix ipqent pool corruption problems. make tcp reass code use
its own pool of ipqent rather than sharing it with ip reass code.
PR/24782.
2004-09-15 09:21:22 +00:00
yamt
d676f9e5b0 fr_check_wrapper: as ipf modifies application data as well when
doing application proxy, it's needed to ensure that the whole packet
is writable here.
2004-09-06 10:46:02 +00:00
yamt
d73bcfeb33 fr_check_wrapper, fr_check_wrapper6:
ensure that mbufs are writable beforehand as ipf assumes.
PR/26773 and PR/26850.
2004-09-06 10:00:43 +00:00
darrenr
9ec77d6329 Do not allow packets flagged with "out-of-window" (oow) to match "keep state"
rules and try to prevent such rules ("keep state with oow") from being loaded
into the kernel.

Pr: kern/26581
2004-09-06 09:55:13 +00:00
manu
85111f912e IPv4 PIM support, based on submission from Pavlin Radoslavov on tech-net@ :
two new files I forgot to add on the first cvs commit.
2004-09-04 23:32:29 +00:00
manu
6e3c639957 IPv4 PIM support, based on a submission from Pavlin Radoslavov posted on
tech-net@
2004-09-04 23:29:44 +00:00
darrenr
02c34673a3 add a per-socket counter for dropped UDP packets when the internal buffers
are full.
2004-09-03 18:14:09 +00:00
smb
57643d12c5 Don't try and add a state session if the packet has already been checked
and marked as out of window - trying to do the add will result in a failure
and the packet being blocked, incorrectly.

Committed By: darrenr
Tested By: smb
2004-09-03 04:18:09 +00:00
chs
34187f4589 fix m_pulldown() usage, it's different from m_pullup().
fixes PRs 26666 and 26701.
2004-08-22 21:38:21 +00:00
itojun
682ddb0274 initialize max_keylen for ip_encap.c earlier 2004-08-17 07:05:34 +00:00
yamt
28b17ac69e in_control: fix address leaks on error, which causes a panic
("no domain for AF 0") on if_detach.
- SIOCAIFADDR, SIOCSIFADDR: free an address on error.
- SIOCSIFNETMASK, SIOCSIFDSTADDR: reject operations for an interface which
  has no AF_INET addresses.

partly from OpenBSD and FreeBSD.
reviewed by Christos Zoulas on tech-net@.
2004-08-08 09:52:41 +00:00
christos
f3a2c3728b remove the avail = 0; assignment which is superfluous. pointed out by enami. 2004-08-04 03:55:06 +00:00
christos
5ab21dfa5d PR/26471: Arto Selonen: ipfilter 4.1.3 crashes the system every few hours
Remove extraneous m = NULL assignment that will cause a NULL dereference
later.
2004-08-03 16:16:30 +00:00
cube
19861ea4fe Remove a common (icmpstat). 2004-08-03 13:58:59 +00:00
yamt
48d156e320 call PFIL_NEWIF hooks at a correct place.
(on SIOCAIFADDR rather than SIOCGIFALIAS.)

from Peter Postma, PR/26402.
ok'ed by itojun.
2004-07-26 13:43:14 +00:00
martti
7ff15b917f Upgraded IPFilter to 4.1.3 2004-07-23 05:39:03 +00:00
martti
9e82a8bf0d Import IPFilter 4.1.3 2004-07-23 05:33:55 +00:00
yamt
4374881880 fix typos. PFIL_HOOK -> PFIL_HOOKS 2004-07-18 11:37:38 +00:00
itojun
5807e550e5 typo. Bruno Rohee 2004-07-09 09:15:02 +00:00
christos
d397fc692a Bring in flags from 4.1.2 to make things compile. 2004-07-08 02:52:02 +00:00
mycroft
cc559c8583 Fix SIOCSIFNETMASK -- it needs to use in_ifscrub() and in_ifinit() to update
the interface route and various internal state.  Also, it should use an ifreq,
not an if_aliasreq.  Addresses PR 9604.  (Nothing in our source tree uses
SIOCSIFNETMASK, though.  Perhaps it should be deprecated.)
2004-07-07 01:39:00 +00:00
minoura
c3ed038115 Remove broken code for now: getsockopt(s, IPPROTO_IP, IP_IPSEC_POLICY,...).
It returned EINVAL, now returns ENOPROTOOPT.
Ok'd by itojun.
2004-07-06 04:30:27 +00:00
heas
192b371d42 Adjust description for net.inet.udp.checksum; it does not controll checking,
only computing.
2004-07-02 18:19:51 +00:00
christos
01a2047486 PR/25999: Jeff Rizzo: ipf: ipnat is corrupting "bimap" translations in 2.0_BETA and -current 2004-06-29 22:44:59 +00:00
itojun
2aef0b1784 correct TCP-MD5 support. Jeff Rizzo 2004-06-26 03:29:15 +00:00
itojun
db45a6f189 icmp_reflect: check if m_pkthdr.rcvif is non-NULL before touching it.
icmp_reflect could be called from the output path, so m_pkthdr.rcvif may not
be set.  (found by panic when PF is configured "block return all")
2004-06-25 15:43:00 +00:00
itojun
59302fc979 be careful touching m_pkthdr.rcvif, it could be NULL if the packet was
generated from local node and icmp_error calls icmp_reflect.
2004-06-25 15:24:41 +00:00
itojun
047170b1cc prepare PF-related hooks. reviewed by matt, perry, christos 2004-06-22 12:50:41 +00:00
tron
c465794d70 Correct two errors in fr_check():
1.) Make sure that "pass" is always initialized.
2.) Make sure the code doesn't use a stale mbuf pointer after fr_makefrip()
    has been called. This fixes PR kern/25868.

Analyzed and reviewed by Steve Woodford.
2004-06-16 14:06:23 +00:00
tron
fcda778c8f Don't leak mbuf if ipfr_fastroute6() fails.
Reviewed by Steve Woodford.
2004-06-16 14:02:39 +00:00
itojun
b834441eb5 update mtu value if outgoing interface changes with ipsec ops
(draft-touch-vpn case only?)  iij seil team
2004-06-01 05:06:56 +00:00
itojun
b4ea6633c0 fix SIOC*LIFADDR for IPv4. markus friedl 2004-05-30 06:37:07 +00:00
atatat
4de3747b89 Sysctl descriptions under net subtree (net.key not done) 2004-05-25 04:33:59 +00:00
jonathan
349ad018c7 Remove now-unused variable. 2004-05-23 00:37:27 +00:00
jonathan
c8c7a6dbab With FAST_IPSEC, include <netipsec/key.h>, as Itojun's recent changes
now require KEY_FREESAV() to be in scope.
2004-05-20 22:59:02 +00:00
christos
bd67b97d6a PR/25622: IPV6 return RST and through cloned interfaces was broken.
- checksum was computed incorrectly.
- ipv6 packet was not initialized properly.
- fixed code to be more similar to the v4 counterpart.
2004-05-20 13:55:31 +00:00
christos
b78a596c7a PR/25646: Perry Metzger: Commit a patch that compiles awaiting feedback. 2004-05-20 13:54:19 +00:00
christos
c046c90643 - remove superfluous assignment
- rt_gateway is already a pointer to struct sockaddr; don't take its address
  when assigning it to struct sockaddr_in *
2004-05-18 21:47:45 +00:00
christos
0d17293b81 Fix buffer overrun in in_pcbopts() (FreeBSD PR/66386) 2004-05-18 16:47:08 +00:00
itojun
4ebcfcf29a fix MD5 signature support to actually validate inbound signature, and
drop packet if fails.
2004-05-18 14:44:14 +00:00
christos
540c75a594 PR/25103: Martin Husemann: IP Filter 4.4.1 breaks some connections when NATing
patch from Darren applied.
2004-05-10 12:10:31 +00:00
christos
f07e678b45 PR/24969: Arto Selonen: /usr/sbin/ipfs from ipfilter 4.1.1 does not work
patch applied.
2004-05-10 01:34:59 +00:00
taca
3657b758c0 Make it comiple without warning; void function fr_checkv4sum() and
fr_checkv6sum() should not return value.
2004-05-09 08:29:30 +00:00
christos
e982110b53 PR/24981: Steven M. Bellovin: ipfilter in 2.0 branch panics the system
patch applied.
2004-05-09 04:17:34 +00:00
christos
865c473c96 PR/25332: HIROSE yuuji: "fastroute(to)" in ipf.conf doesn't work; patch applied 2004-05-09 04:02:32 +00:00
christos
5592d4d1fa PR/25441: Matthew Green: IP-Filter uses M_TEMP when it already has M_IPFILTER 2004-05-09 03:54:43 +00:00
chs
bd3ff85ff7 work around an LP64 problem where we report an excessively large window
due to incorrect mixing of types.
2004-05-08 14:41:47 +00:00
kleink
542839207d Add definitions for the (currently unimplemented) ECN TCP flags;
from Chuck Swiger in PR standards/25058.
2004-05-07 20:11:52 +00:00
jonathan
85b3ba5bf1 Redo net.inet.* sysctl subtree for fast-ipsec from scratch.
Attach FAST-IPSEC statistics with 64-bit counters to new sysctl MIB.
Rework netstat to show FAST_IPSEC statistics, via sysctl,  for
netstat -p ipsec.

New kernel files:
	sys/netipsec/Makefile		(new file; install *_var.h includes)
	sys/netipsec/ipsec_var.h	(new 64-bit mib counter struct)

Changed kernel files:
	sys/Makefile			(recurse into sys/netipsec/)
	sys/netinet/in.h		(fake IP_PROTO name for fast_ipsec
					sysctl subtree.)
	sys/netipsec/ipsec.h		(minimal userspace inclusion)
	sys/netipsec/ipsec_osdep.h	(minimal userspace inclusion)
	sys/netipsec/ipsec_netbsd.c	(redo sysctl subtree from scratch)
	sys/netipsec/key*.c		(fix broken net.key subtree)

	sys/netipsec/ah_var.h		(increase all counters to 64 bits)
	sys/netipsec/esp_var.h		(increase all counters to 64 bits)
	sys/netipsec/ipip_var.h		(increase all counters to 64 bits)
	sys/netipsec/ipcomp_var.h	(increase all counters to 64 bits)

	sys/netipsec/ipsec.c		(add #include netipsec/ipsec_var.h)
	sys/netipsec/ipsec_mbuf.c	(add #include netipsec/ipsec_var.h)
	sys/netipsec/ipsec_output.c	(add #include netipsec/ipsec_var.h)

	sys/netinet/raw_ip.c		(add #include netipsec/ipsec_var.h)
	sys/netinet/tcp_input.c		(add #include netipsec/ipsec_var.h)
	sys/netinet/udp_usrreq.c	(add #include netipsec/ipsec_var.h)

Changes to usr.bin/netstat to print the new fast-ipsec sysctl tree
for "netstat -s -p ipsec":

New file:
	usr.bin/netstat/fast_ipsec.c	(print fast-ipsec counters)

Changed files:
	usr.bin/netstat/Makefile	(add fast_ipsec.c)
	usr.bin/netstat/netstat.h	(declarations for fast_ipsec.c)
	usr.bin/netstat/main.c		(call KAME-vs-fast-ipsec dispatcher)
2004-05-07 00:55:14 +00:00
skd
1b1b474faa Fix to update all references to mbuf. Fixes case where mbuf is freed twice. 2004-05-04 11:31:52 +00:00
darrenr
39ee9f396a at line 543, we do a pullup here of hlen bytes into the mbuf,
so these later ones are superfluous.
2004-05-02 05:02:53 +00:00
matt
c41eb5a6f6 defflag TCP_OUTPUT_COUNTERS and TCP_REASS_COUNTERS 2004-05-01 02:21:44 +00:00
matt
da67d85073 Use EVCNT_ATTACH_STATIC{,2} 2004-05-01 02:20:42 +00:00
ragge
79edf5fba0 Send an arp request before the arp entry times out if the entry is active,
to avoid deleting active entries.
Add sysctl support to tune the default arp timeout values.
2004-04-28 14:09:36 +00:00
matt
5a0de7507d When a packet is received that overlaps the left side of the window,
check for RST *before* trimming data and adjust its sequence number.
2004-04-27 14:46:07 +00:00
itojun
362e07a3c9 zero-clear ip6?pseudo before use 2004-04-26 05:18:13 +00:00
itojun
f103f9aee9 declare ip6_hdr_pseudo (for kernel only) and use it for TCP MD5 signature 2004-04-26 05:15:47 +00:00
itojun
67372cc454 sync comment with reality 2004-04-26 05:05:49 +00:00
itojun
e0395ac8f0 make TCP MD5 signature work with KAME IPSEC (#define IPSEC).
support IPv6 if KAME IPSEC (RFC is not explicit about how we make data stream
for checksum with IPv6, but i'm pretty sure using normal pseudo-header is the
right thing).

XXX
current TCP MD5 signature code has giant flaw:
it does not validate signature on input (can't believe it! what is the point?)
2004-04-26 03:54:28 +00:00
matt
5413745100 Remove #else clause of __STDC__ 2004-04-26 01:31:56 +00:00
jonathan
887b782b0b Initial commit of a port of the FreeBSD implementation of RFC 2385
(MD5 signatures for TCP, as used with BGP).  Credit for original
FreeBSD code goes to Bruce M. Simpson, with FreeBSD sponsorship
credited to sentex.net.  Shortening of the setsockopt() name
attributed to Vincent Jardin.

This commit is a minimal, working version of the FreeBSD code, as
MFC'ed to FreeBSD-4. It has received minimal testing with a ttcp
modified to set the TCP-MD5 option; BMS's additions to tcpdump-current
(tcpdump -M) confirm that the MD5 signatures are correct.  Committed
as-is for further testing between a NetBSD BGP speaker (e.g., quagga)
and industry-standard BGP speakers (e.g., Cisco, Juniper).


NOTE: This version has two potential flaws. First, I do see any code
that verifies recieved TCP-MD5 signatures.  Second, the TCP-MD5
options are internally padded and assumed to be 32-bit aligned. A more
space-efficient scheme is to pack all TCP options densely (and
possibly unaligned) into the TCP header ; then do one final padding to
a 4-byte boundary.  Pre-existing comments note that accounting for
TCP-option space when we add SACK is yet to be done. For now, I'm
punting on that; we can solve it properly, in a way that will handle
SACK blocks, as a separate exercise.

In case a pullup to NetBSD-2 is requested, this adds sys/netipsec/xform_tcp.c
,and modifies:

sys/net/pfkeyv2.h,v 1.15
sys/netinet/files.netinet,v 1.5
sys/netinet/ip.h,v 1.25
sys/netinet/tcp.h,v 1.15
sys/netinet/tcp_input.c,v 1.200
sys/netinet/tcp_output.c,v 1.109
sys/netinet/tcp_subr.c,v 1.165
sys/netinet/tcp_usrreq.c,v 1.89
sys/netinet/tcp_var.h,v 1.109
sys/netipsec/files.netipsec,v 1.3
sys/netipsec/ipsec.c,v 1.11
sys/netipsec/ipsec.h,v 1.7
sys/netipsec/key.c,v 1.11
share/man/man4/tcp.4,v 1.16
lib/libipsec/pfkey.c,v 1.20
lib/libipsec/pfkey_dump.c,v 1.17
lib/libipsec/policy_token.l,v 1.8
sbin/setkey/parse.y,v 1.14
sbin/setkey/setkey.8,v 1.27
sbin/setkey/token.l,v 1.15

Note that the preceding two revisions to tcp.4 will be
required to cleanly apply this diff.
2004-04-25 22:25:03 +00:00
simonb
b5d0e6bf06 Initialise (most) pools from a link set instead of explicit calls
to pool_init.  Untouched pools are ones that either in arch-specific
code, or aren't initialiased during initial system startup.

 Convert struct session, ucred and lockf to pools.
2004-04-25 16:42:40 +00:00
itojun
22bdfd729d fix how we send RST against ACK. markus@openbsd 2004-04-25 03:29:11 +00:00
itojun
8a0aba4304 indent for little bit better readability 2004-04-25 00:08:54 +00:00
itojun
3b87628cfb fix comment; we no longer move ip+tcp into the same mbuf 2004-04-24 23:59:13 +00:00
matt
41478e7f33 Always include <sys/param.h> first! 2004-04-24 19:59:19 +00:00
ragge
febf637b17 Avoid performance problem in tcp_reass() when appending mbufs to a chain
by keeping a pointer to the last mbuf in the chain.
2004-04-22 15:05:33 +00:00
tls
7eb2f214d5 Change the default state of two tunables; bring our TCP a little bit
closer to normal behaviour for the current century.

New Reno is now on by default (which is really the only reasonable
choice, since we don't do SACK); instead of an initial window of 1
for non-local nets, we now use Sally Floyd's magic 4K rule.
2004-04-22 02:19:39 +00:00
matt
e50668c7fa Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols) are loaded.
2004-04-22 01:01:40 +00:00
itojun
d2f1c029b9 kill sprintf, use snprintf 2004-04-21 18:40:37 +00:00
itojun
e133d13e80 kill some strcpy 2004-04-21 18:16:14 +00:00
itojun
0f06e31eb6 no space between function name and paren: foo (blah) -> foo(blah) 2004-04-21 17:49:46 +00:00
matt
e3b919c754 Constify if.c radix.c and route.c (and fix related fallout). 2004-04-21 04:17:28 +00:00
matt
30e63c6236 export tcpstates for _KERNEL and remove tcp_usrreq.c's incorrect
declartion.
2004-04-20 22:54:31 +00:00
itojun
6a16706746 follow draft-ietf-tcpm-tcpsecure-00.txt 3.2 (B):
if SYN is coming and RCV.NXT == SEG.SEQ, then ACK with value - 1.
2004-04-20 19:49:15 +00:00
itojun
f2e796b13f - respond to RST by ACK, as suggested in NISCC recommendation
- rate-limit ACKs against RSTs and SYNs
2004-04-20 16:52:12 +00:00
matt
5060b3b780 ANSI'fy and de __P 2004-04-18 23:35:56 +00:00
matt
db6a0b431a De __P() 2004-04-18 21:00:35 +00:00
matt
35b9f3ec72 If a segment is received with RST set and the segment is completely to the
left of the receive window, ignore it.  Add some additional comments to
the code that deals with received segemnts that are completely to the right
of the receive window.  If an invalid SYN is received, force an ACK and
drop it; if the other side really sent the SYN; it'll respond with a reset.
2004-04-17 23:35:37 +00:00
christos
90e1f431ca adjust to the sbreserve prototype change. 2004-04-17 15:18:53 +00:00
ragge
0a7fe37708 Add back one line which was accidentially removed (by me) a while ago.
Spotted by Markus Friedl (markus at openbsd.org).
2004-04-14 18:07:52 +00:00
christos
99d2bc9467 PR/22551: Invoking tcpcb's get erroneously free'd resulting in to_ticks <= 0
assertion. Approved by he.
2004-04-05 21:49:21 +00:00
matt
efc47093e2 In ip_reass_ttl_descr, make i signed since it's compared to >= 0 2004-04-01 22:47:55 +00:00
martin
8afe56f1c5 A few more ioctl vs. copyin changes, spotted by Bill Studenmund. 2004-04-01 21:54:41 +00:00
martin
9d16150a8e Untangle ioctl copyin/copyout confusion. IP-Filter now actually works
on sparc64 (and probably everywhere else).
2004-04-01 09:24:58 +00:00
dyoung
957f9ce691 Only #define COPYIN copyin, et cetera, in the kernel. That is, only
when when _KERNEL is defined.
2004-03-31 20:58:15 +00:00
darrenr
077337039d COPYIN/COPYOUT macros need to call copyin/out on NetBSD rather than just use
bcopy.
2004-03-31 11:41:45 +00:00
itojun
7cd01f1c20 clean previous commit (uh_sum != 0 check in IPv6) 2004-03-31 07:57:06 +00:00
itojun
8d81738de0 drop packet if IPv6 udp packet does not have checksum (checksum is mandatory
in IPv6).
2004-03-31 07:54:00 +00:00
christos
dc9378460c Make sure we disarm the persist timer before we arm the rexmit
timer, otherwise there is a tiny window where both timers are
active, and this is not correct according to the comments in the
code. I believe that this is the cause of the to_ticks <= 0 assertion
failure in callout_schedule() that I've been getting.
2004-03-30 19:58:14 +00:00
atatat
83b193a052 Make these compile without INET. tcp_input probably needs a lot more
work...
2004-03-29 04:59:02 +00:00
martin
665588c20c Cast 64 bit pointers only with (intptr_t) care. 2004-03-28 12:12:28 +00:00
martti
621e9bac7f Sync with official IPFilter 2004-03-28 09:01:26 +00:00
martti
24d567d60d Upgraded IPFilter to 4.1.1 2004-03-28 09:00:53 +00:00
martti
ad9b29ed97 Import IPFilter 4.1.1 2004-03-28 08:55:20 +00:00
atatat
19af35fd0d Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.
2004-03-24 15:34:46 +00:00
itojun
3811eef49d typo 2004-03-23 05:31:54 +00:00
drochner
6a4fbf616c fix tcp/udp checksum test in the M_CSUM_NO_PSEUDOHDR case
(this can never have worked)
now I can use a "bge" gigabit interface with hw checksumming
ttcp-t: 2147483648 bytes in 18.31 real seconds = 114527.11 KB/sec +++
woow!
2004-03-10 18:50:45 +00:00
wiz
e8f4f5ba76 No need to include netinet/ip_mroute.h twice.
Closes PR 24652 by Kailash Sethuraman.
2004-03-04 15:15:06 +00:00
thorpej
8387ab32c5 Use IPSEC_PCB_SKIP_IPSEC() to short-circuit calls to ipsec{4,6}_hdrsiz_tcp(). 2004-03-03 05:59:38 +00:00
thorpej
2803ff0955 Use the new IPSEC_PCB_SKIP_IPSEC() to bypass a socket policy lookup
when possible.  This shaves several cycles from the output path for
non-IPsec connections, even if the policy is cached in the PCB.
2004-03-02 02:28:28 +00:00
thorpej
00f100daae Call ipsec_pcbconn() and ipsec_pcbdisconn() for FAST_IPSEC, too. 2004-03-02 02:26:28 +00:00
thorpej
979f197a86 Define a sotoinpcb_hdr() macro (a'la sotoinpcb()). 2004-03-02 02:11:14 +00:00
itojun
8ef33296ff KNF 2004-02-26 02:34:59 +00:00
wiz
73e1501b98 parameter with two es. From Peter Postma. 2004-02-24 15:22:01 +00:00
wiz
f05e6f1a3a occured -> occurred. From Peter Postma. 2004-02-24 15:12:51 +00:00
itojun
d334411bcd deal with IPv6 path MTU < 1280 (RFC2460 section 5 last paragraph).
check if there really is room for TCP data.
2004-02-04 05:36:03 +00:00
abs
c02c2d8844 Allow DEF_NAT_AGE to be set in kernel config. 2004-01-16 09:01:22 +00:00
itojun
0146a277ba correct typo in 1.94 -> 1.95. pointed out by Shiva Shenoy 2004-01-15 05:13:17 +00:00
itojun
3ffdb9507a avoid deref-after-free.
http://sources.zabbadoz.net/freebsd/patchset/106-ipsec-pcb-discon.diff
2004-01-13 06:17:14 +00:00
matt
9196bdd1f8 When accepting a peer's MSS, never let it drop below 256 (SLIP + TCP will
be the lowest MSS we should ever enounter).
2004-01-07 19:15:43 +00:00
tron
784a553ad1 Remove extra tokens at end of #undef directive. 2004-01-03 22:34:38 +00:00
itojun
4fc59b19d5 no need for tmp = arc4randomid here 2004-01-02 20:51:51 +00:00
itojun
7cddb2827b whitespace 2004-01-02 15:51:45 +00:00
itojun
344b08b44b some corrections from markus@openbsd;
- callout_ack() was called with wrong argument
2004-01-02 15:51:04 +00:00
itojun
5377ace199 some corrections from markus@openbsd;
- callout_ack() was called with wrong argument
- no need for xor with timestamp as we are using arc4random()
- minor typo/cleanup
2004-01-02 12:01:39 +00:00
wiz
d46bc94200 Niels Provos kindly agreed to drop clauses 3 and 4 from the
license -- thanks.
Based on OpenBSD commit and hints by itojun.
2003-12-26 19:04:55 +00:00
abs
8724ebf7f9 Comment out #undef LARGE_NAT so LARGE_NAT can be set in a kernel config file
without having to edit this file as well.
2003-12-16 12:15:04 +00:00
thorpej
0c4c58a70b Fix syntax errors in CHECK_NMBCLUSTER_PARAMS(). 2003-12-14 01:14:24 +00:00
jonathan
9c1a5c5570 Second part of hashed IP_reassembly changes:
When under pressure for mbufs or we have too many fragments in the IP
reassembly queue, drop half of all fragments. This multiplicative-drop
strategy ensures we return to a healthy state, even under borderline
denial-of-service from extremely lossy NFS-over-UDP peers.
The multiplicative-drop phase currently drops 50% of fragments, but
has pre-placed support for implementing drop-fractions other than 50%

The threshhold for the `drop-half' phase is the new variable,
ip_maxfrags which is calculated as nmbclusters/4.

ip_input.c now keeps ip_nmbclusters, a cached copy of nmbclusters.
Before using limits derived from nmbclusters, we check if nmbclusters
and ip_nmclusters are equal. If not, we recompute Ip parameters
derived from nmbclusters.  Based on a suggestion by Jason Thorpe.
ip_maxfrags is currently auto-recalcuated.

The counters ip_nfrags and ip_nfragpacketsr are now declared static
and uninitialized (bss), to discourage tampering with them.
2003-12-14 00:09:24 +00:00
scw
6aec1d6812 Make fast-ipsec and ipflow (Fast Forwarding) interoperate.
The idea is that we only clear M_CANFASTFWD if an SPD exists
for the packet. Otherwise, it's safe to add a fast-forward
cache entry for the route.

To make this work properly, we invalidate the entire ipflow
cache if a fast-ipsec key is added or changed.
2003-12-12 21:17:59 +00:00
itojun
aa8a6718f0 use if_indexlim (instead of if_index) and ifindex2ifnet[x] != NULL
to check if interface exists, as (1) if_index has different meaning
(2) ifindex2ifnet could become NULL when interface gets destroyed,
since when we have introduced dynamically-created interfaces.  from kame
2003-12-10 11:46:33 +00:00
itojun
c81f32fe6c comment from niels provos;
- seed2 is necessary, but use it as "seed2 + x" not "seed2 ^ x".
- skipping number is not needed, so disable it for 16bit generator (makes
  the repetition period to 30000)
2003-12-10 05:22:18 +00:00
jonathan
626b230d59 Add new field ipq_nfrags to struct ipq. Maintain count of fragments
(fragments, not fragmented packets) in each queue entry.
Use ipq_nfrags to maintain a count of total fragments in reassembly queue.
2003-12-08 02:23:27 +00:00
jonathan
27171efb6d KNF: s/unsigned/u_int/, in a couple of places I missed. 2003-12-07 01:18:26 +00:00
jonathan
c56097abb8 Replace the single global IP reassembly list/listhead, with a
hashtable of list-heads. Independently re-invented, then reworked to
match similar code in FreeBSD.
2003-12-06 23:56:10 +00:00
atatat
13f8d2ce5f Dynamic sysctl.
Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al.  Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded.  Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment.  I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
2003-12-04 19:38:21 +00:00
christos
0aac876eae fix unused variable warnings when LARGE_NAT is defined. 2003-12-04 15:32:01 +00:00
scw
7ef39665ff ipflow (IP fast forwarding) is not compatible with FAST_IPSEC either.
XXX: The decision whether or not to fast forward should be made
XXX: dynamically. Using the current approach seriously reduces
XXX: routing performance on gateways with IPsec enabled.
2003-12-04 10:02:35 +00:00
itojun
a748550c99 always compile ip_id.c 2003-11-26 21:26:56 +00:00
itojun
326cfe57d2 define RANDOM_IP_ID by default (unifdef -DRANDOM_IP_ID).
one use remains in sys/netipsec, which is kept for freebsd source code compat.
2003-11-26 21:15:47 +00:00
itojun
0864b4939d "seed2" was ruining non-repeating property, so remove it. discussed on tech-net 2003-11-25 18:13:55 +00:00
itojun
f51095cf7c knf 2003-11-25 14:44:13 +00:00
scw
fd11abcb03 For FAST_IPSEC, ipfilter gets to see wire-format IPsec-encapsulated packets
only. Decapsulated packets bypass ipfilter. This mimics current behaviour
for Kame IPsec.
2003-11-24 20:54:59 +00:00
yamt
bba8d5af45 comments on tcp_outflags. 2003-11-20 16:21:48 +00:00
fvdl
f2fdecfc92 Correct number of arguments to sysctl_rdint. 2003-11-19 22:40:55 +00:00
jonathan
b6e73d53fb Footwork for fast-ipsec and IPv6: when compiling sys/netinet/tcp_input.c
for both FAST_IPSEC and INET6, include <netipsec/ipsec6.h>.
2003-11-19 20:47:00 +00:00
jonathan
130f3bfc26 Patch back support for (badly) randomized IP ids, by request:
* Include "opt_inet.h" everywhere IP-ids are generated with ip_newid(),
  so the RANDOM_IP_ID option is visible. Also in ip_id(), to ensure
  the prototype for ip_randomid() is made visible.

* Add new sysctl to enable randomized IP-ids, provided the kernel was
  configured with RANDOM_IP_ID. (The sysctl defaults to zero, and is
  a read-only zero if RANDOM_IP_ID is not configured).

Note that the implementation of randomized IP ids is still defective,
and should not be enabled at all (even if configured) without
very careful deliberation. Caveat emptor.
2003-11-19 18:39:34 +00:00
jonathan
de80d1419e Diff to netinet/ip_input.c (restore ip_id, initialize) for ip_id fix:
Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code.  Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP_ids is now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_random-id()_IP_ID if and only if configured
with RANDOM_IP_ID. A sysctl knob should be  provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.
2003-11-17 22:34:16 +00:00
jonathan
995c532c33 Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code.  Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP_ids is now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_random-id()_IP_ID if and only if configured
with RANDOM_IP_ID. A sysctl knob should be  provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.
2003-11-17 21:34:27 +00:00
jonathan
fa24e6f3f8 Add m_tag_delete_nonpesrsistent(), for deleting all packet tags on
mbuf chains which are recycled (e.g., ICMP reflection, loopback
interface).  A consensus was reached that such recycled packets should
behave (more-or-less) the same way if a new chain had been allocated
and the contents copied to that chain.

Some packet tags may in future be marked as "persistent" (e.g., for
mandatory access controls) and should persist across such deletion.
NetBSD as yet hos no persistent tags, so m_tag_delete_nonpersistent()
just deletes all tags. This should not be relied upon.
2003-11-13 01:48:12 +00:00
itojun
d46ad3421a KNF 2003-11-12 15:00:05 +00:00
ragge
4a9b211e76 Remove the FAST_MBSEARCH ifdef, send packet prediction is now default. 2003-11-12 10:48:04 +00:00
jonathan
79bf8521a5 Change global head-of-local-IP-address list from in_ifaddr to
in_ifaddrhead. Recent changes in struct names caused a namespace
collision in fast-ipsec, which are most cleanly fixed by using
"in_ifaddrhead" as the listhead name.
2003-11-11 20:25:26 +00:00
jonathan
b86d07f435 Allocate sysctl oid for ipv4 sysctl node "ifq", define symbolic name, and
bump IPCTL_MAXID. (Should have been committed with other ifq sysctl changes).
2003-11-10 20:50:29 +00:00
jonathan
88ba77e705 Make per-protocol network input queue stats visible to userland via
sysctl. Add a protocol-independent sysctl handler to show the per-protocol
"struct ifq' statistics. Add IP(v4) specific call to the handler.
Other protocols can show their per-protocol input statistics by
allocating a sysclt node and calling sysctl_ifq() with their own struct ifq *.

As posted to tech-kern plus improvements/cleanup suggested by Andrew Brown.
2003-11-10 20:03:29 +00:00
simonb
a2facef339 Remove some assigned-to but otherwise unused variables. 2003-10-30 01:43:08 +00:00
mycroft
d7f0f6de8f Do the previous differently. 2003-10-28 20:27:22 +00:00
provos
57755c156a use a hash table to bind to local ports; suggested by markus friedl
approved: fvdl@
2003-10-28 17:18:37 +00:00
thorpej
db71356cd1 - Change callout_setfunc() to require that the callout handle is already
initialized.  Update the txp(4) to compensate.
- Statically initialize the TCP timer callout handles in the tcpcb
  template.  We still use callout_setfunc(), but that call is now much
  less expensive.  Add a comment that the compiler is likely to unroll
  the loop (so don't sweat that it's there).
2003-10-27 16:52:01 +00:00
itojun
3fef2ba893 make it compilable with TCP_DEBUG defined 2003-10-27 07:43:01 +00:00
christos
2017bf9a94 Fix uninitialized variable warning 2003-10-25 18:31:59 +00:00
christos
649137925e initialize off 2003-10-25 08:13:28 +00:00
ragge
da20a11a23 Fix the bug in the tcp transmit prediction code.
During testing the prediction counters show a hit-rate on about 85% for
packets sent on a local LAN, and better than 99% for intercontinental
high-speed bulk traffic (!).
2003-10-24 10:25:40 +00:00
enami
935b3c7ad5 Make this file compile again when TCP_OUTPUT_COUNTERS defined. 2003-10-24 03:12:53 +00:00
mycroft
5a8b331f54 Remove all the code to maintain ia_inpcbs. This information was only used to
close sockets on address changes, which was deemed to be a bad idea and was
summarily removed, so there is no point in wasting effort on maintaining it
any more.
2003-10-23 20:55:08 +00:00
thorpej
e8a98ee63e Oops, FAST_MBSEARCH counters were swapped; fix it. Pointed out by yamt@. 2003-10-23 17:02:23 +00:00
thorpej
9e4220c00a Oops, a little to aggressive in the previous patch; TCP_TIMER_INIT()
still needs to be in tcp_newtcpcb(), for now.  Pointed out by enami.
2003-10-22 05:55:54 +00:00
thorpej
31923baa46 Rather than zeroing a tcpcb structure and filling in all the fields
individually, create a tcpcb template pre-initialized (and pre-zero'd)
with the static and mostly-static tcpcb parameters.  The template is
now copied into the new tcpcb, which zeros and initializes most of the
tcpcb in one pass.  The template is kept up-to-date as TCP sysctl
variables are changed.

Combined with the previous sb_max change, TCP socket creation is now
25% faster.
2003-10-22 02:45:57 +00:00
thorpej
861856caa0 Add event counters that measure FAST_MBSEARCH. 2003-10-21 21:17:20 +00:00
enami
e51f5c64e5 Fix indent. 2003-10-18 13:05:45 +00:00
enami
bae9643b84 Increment stats when packet is dropped since there is no room
to put all fragments in the interfaces's send queue.  Some large
UDP packets are dropped here and administrator may want to bump ifqmaxlen.
2003-10-17 20:31:12 +00:00
itojun
5e7b0c710b more correction to ip_fragment; free mbuf correctly if ENOBUFS is raised
during fragmenting.
2003-10-14 06:36:48 +00:00
itojun
00af50df1b avoid mbuf leak on ip_fragment(); obey 4.4bsd mbuf passing rule (mbuf passed
to a function must be freed by the called function on error).
pointed out by enami
2003-10-14 03:38:49 +00:00
mycroft
f2fc15d4b5 There is also no reason to use arc4random() here. 2003-10-07 21:24:56 +00:00
itojun
98d5598feb when dropping M_PKTHDR, need to free m_tag associated with it. 2003-10-03 20:56:11 +00:00
itojun
899b67c09a correct ip_fragment() wrt ip->ip_off handling.
do not send out incomplete fragment due to ENOBUFS (behavior change from 4.4BSD)
2003-10-01 23:54:40 +00:00
tls
b911732f2a Increase default socket-buffer sizes from 16K to 32K. This increases
throughput significantly in a wide variety of test cases, including
local gigabit ethernet with both jumbo and standard frames,
transcontinental (U.S.) connections with e2e bandwidths ranging from
10Mbit/sec to 155Mbit/sec, and on a variety of test connections
between the NetBSD Project public servers and machines in Australia.

The impact of this change is less dramatic for high-delay connections
when Path MTU is in use but still measurable.

For optimal performance on local gigabit networks, a higher socket
buffer size (at least 64K) will still yield a substantial improvement
in performance, but 32K gets us most of the way there in my test
cases, with only a cost of _doubling_ memory use per socket rather
than _quadrupling_ it.

N.B. Windows NT, at least since Win2k SP2, uses a default socket buffer
     size (or their analogue thereof) of 64K, which is a useful data
     point.
2003-09-29 21:39:35 +00:00
mycroft
ca96c7c4ec Remove some code that breaks AH tunnels completely. The comment describing
the purpose of this code appears to be on crack -- it's talking about
end-to-end authentication, but the purpose of an AH tunnel is NOT end-to-end
authentication; it's authentication of the tunnel endpoints.

NB: This does not fix the fact that IPsec leaks "packet tags."
2003-09-28 04:45:14 +00:00
mycroft
3114965161 Fix glaring errors in recent changes. 2003-09-25 00:59:31 +00:00
itojun
8d9a724638 on arplookup() failure, nuke cloned route - otherwise outsider could use massive
number of bogus ARPs for DoS attack.  FreeBSD-SA-03:14.arp
2003-09-24 06:52:47 +00:00
jonathan
5923dedaeb Fast-ipsec can call ip_output() with a null 'struct socket *so'
argument.  So check so is non-NULL before doing the pointer-chasing
dance to find the PCB. (Unless and until we rework fast-ipsec and
KAME, to pass a struct in_pcbhdr * instead of the struct socket *).
2003-09-19 00:27:56 +00:00
itojun
a3931fc5ab exp is reserved name under posix 2003-09-16 00:31:55 +00:00
itojun
6b33d95e22 send icmp admin prohibit if socket policy mismatches. 2003-09-12 09:55:22 +00:00
itojun
644a4857fb cut-and-paste error. Valeriy E. Ushakov 2003-09-10 01:46:27 +00:00
itojun
99bc41d6fd if IPsec inbound policy mismatches, respond to SYN with RST (instead of
just dropping it), allow client to react quickly.
2003-09-10 00:58:29 +00:00
itojun
495bd5ff91 initialize ip_hl for ipsec policy lookup. PR kern/22715 2003-09-08 02:06:34 +00:00
itojun
32e3deae21 randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields.  ip_id.c is from openbsd.  ip6_id.c is adapted by kame.
2003-09-06 03:36:30 +00:00
itojun
175c9afa3f clarify flowlabel handling 2003-09-06 03:12:51 +00:00
itojun
dd45bfac41 backout previous, we don't know if arc4random() corrides on reboot. 2003-09-06 00:24:54 +00:00
itojun
9636351c96 u_short -> u_int16_t 2003-09-05 23:02:40 +00:00
itojun
186bd1ad6a initialize fragment ID with arc4random, not by time.tv_sec 2003-09-05 22:09:38 +00:00
itojun
495906ca8e revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).
2003-09-04 09:16:57 +00:00
itojun
5c39f4aaa7 don't intiialize m by m0, m0 is not initialized (by introduction of ip_fragment) 2003-08-27 02:09:59 +00:00
itojun
3e76200c67 need sys/domain.h for FAST_IPSEC case; jonathan 2003-08-23 01:41:10 +00:00
itojun
a3bad645a4 make sure so is properly initialized 2003-08-22 22:49:34 +00:00
itojun
58f57a60fd tp could be null in tcp_respond() 2003-08-22 22:27:07 +00:00
itojun
4e6aca94c2 correct missing inclusion of opt_ipsec.h 2003-08-22 22:11:44 +00:00
itojun
11ede1ed88 remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output. 2003-08-22 22:00:36 +00:00
itojun
82eb4ce914 change the additional arg to be passed to ip{,6}_output to struct socket *.
this fixes KAME policy lookup which was broken by the previous commit.
2003-08-22 21:53:01 +00:00
jonathan
9339ef0381 Change KAME code for ip_output()/ip6_output() to obtain struct socket*
from the explicit inpcb*/in6pcb* argument.  set_socket() becomes redundant.
2003-08-22 20:29:00 +00:00
jonathan
902669955f Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec.  Reviewed by itojun.
2003-08-22 20:20:09 +00:00
jonathan
6196bbe72d Honour the M_CSUM_NO_PSEUDOHDR, if set on inbound TCP and UDP packets.
Tested against  bcm5700 with patched if_bge.c.
2003-08-21 14:49:49 +00:00
itojun
b83dd2f98b remove unneeded #ifdef __NetBSD__ 2003-08-19 08:00:54 +00:00
itojun
ade8129bdc make ip_fragment public (it is for coming PF integration) 2003-08-19 01:20:03 +00:00
christos
ae572737ba make ip_fragment static and add prototype. 2003-08-19 00:54:41 +00:00
itojun
4f8ba921cd correct ip_multicast_if fix to always set ifp (tnx Shiva) 2003-08-19 00:17:38 +00:00
itojun
449b5c43d4 since we cope with packets with addess on !IFF_UP interface in ip_input()
properly, IFF_UP check in INADDR_TO_IA is obsolete (or too much).
2003-08-18 22:28:51 +00:00
itojun
122edbc337 fix problem we can't drop membership on !IFF_UP interface.
reported by Shiva Shenoy

while we're here, fix another problem when the same interface address is
assigned to !IFF_MULTICAST and IFF_MULTICAST interface.  if ip_multicast_if()
returns the first one, join/leave will fail, which is not an desired effect.
2003-08-18 22:23:22 +00:00