ip_output(). This flag, if set, causes ip_output() to set
DF in the IP header if the MTU in the route is not locked.
This allows a bunch of redundant code, which I was never
really all that happy about adding in the first place, to
be eliminated.
Inspired by a similar change made by provos@openbsd.org when
he integrated NetBSD's Path MTU Discovery code into OpenBSD.
packets. PR 11082.
This is a short-term workaround. whenever new ipfilter comes out with
proper non-IPv4 support, we should migrate to the new ipfilter.
each in_ifaddr and delete it when an address is purged.
- Don't simply try to delete a multicast address record listed in the
ia_multiaddrs. It results a dangling pointer. Let who holds a
reference to it to delete it.
- when all the interface address is removed from an interface, and there's
multicast groups still left joined, keep it in kludge table.
- when an interface address is added again, recover multicast groups from
kludge table.
this will avoid problem with dangling in_ifaddr on pcmcia card removal,
due to the link from multicast group info (in_multi).
the code is basically from sys/netinet6/in6.c (jinmei@kame).
pointed out by: Shiva Shenoy <shiva_s@yahoo.com>
Without this, if a v6 address is placed before a v4 address in if_addrlist,
a PRU_PURGEIF request for v6 tcp protocol purges also v4 addresses and,
as a result, if_detach fails to request PRU_PURGEIF for v4 protocols
other than tcp.
this avoids too aggressive memory usage on heavy load web server, for example.
From: Kevin Lahey <kml@dotrocket.com>
release and reallocate t_template, if t_template->m_len changes.
(this happens if we connect to IPv4 mapped destination and then IPv6
destination, on a single AF_INET6 socket)
KAME 1.26 -> 1.28
rev 1.35 of ip_nat.c checks if packets are too short.
For ICMP packets, this packet length checking double counts
the length of an IP header contained in ICMP messages.
So, unless ICMP packets are long enough (such as echo-reply),
packets are mistakingly considered too short and are dropped.
multiple addresses from same prefix, onto single interface. PR 10427.
more info:
- 4.4BSD did not check return code from in_ifinit() at all.
4.4BSD does not support multiple address from same prefix.
- past KAME change passed in{,6}_ifinit() to upwards, toward ifconfig(8).
the behavior is filed as PR 10427.
- the commit inhibits EEXIST from rtinit(), hence partially recovers old
4.4BSD behavior.
- the right thing to happen is to properly support multiple address assignment
from the same prefix. KAME tree has more extensive change, however, it needs
much more time to get stabilized (rtentry refcnt change can cause serious
issue, we really need to bake it before bring it to netbsd)
fails, return EIO instead.
- iplioctl(): If performing a NAT operation, and IP Filter is not
yet initialized (e.g. by `ipf -E'), enable it implicitly before
doing the NAT operation.
basis. default: 100pps
set default value for net.inet.tcp.rstratelimit to 0 (disabled),
NOTE: it does not work right for smaller-than-1/hz interval. maybe we should
nuke it, or make it impossible to set smaller-than-1/hz value.
unspecified address (::) to mean "unbounded" or "unconnected",
and can be confused by packets from outside.
use of :: as source is not documented well in IPv6 specification.
not sure if it presents a real threat. the worst case scenario is a DoS
against TCP listening socket:
- outsider transmit TCP SYN with :: as IPv6 source
- receiving side creates TCP control block with:
local address = my addres
remote address = :: (meaning "unconnected")
state = SYN_RCVD
note that SYN ACK will not be sent due to ip6_output() filter.
this stays until it timeouts.
- the TCP control block prevents listening TCP control block from
being contacted (DoS).
udp6/raw6 socket may have similar problem, but as they are connectionless,
it may too much to filter it out.
reports a size smaller than the udp header; defends against bogosity
detected by Assar Westerlund.
This patch and the previous ip_icmp.c change were the joint work of
assar, itojun, and myself.
- allow it to work when icmpreturndatabytes is sufficiently large that the
icmp error message doesn't fit in a header mbuf.
- defend against mbuf chains shorter than their contained ip->ip_len.
- add protection mechanism against ND cache corruption due to bad NUD hints.
- more stats
- icmp6 pps limitation. TOOD: should implement ppsratecheck(9).
in ip_mroute.c for duplicate tunnel entries, too. Well, what
really needs to happen is that the mrouting code needs to be
changed to work w/ `gif' tunnels... but...
<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>
also includes a bunch of <vm/vm_page.h> include removals (due to redudancy
with <vm/vm.h>), and a scattering of other similar headers.
correct timestamp option validation (len and ptr upper/lower bound
based on RFC791).
fill "pointer" field for parameter problem in timestamp option processing.
to protocol handlers, based on src/dst (for ip proto #4/41).
see comment in ip_encap.c for details of the problem we have.
there are too many protocol specs for ip proto #4/41.
backward compatibility with MROUTING case is now provided in ip_encap.c.
fix ipip to work with gif (using ip_encap.c). sorry for breakage.
gif now uses ip_encap.c.
introduce stf pseudo interface (implements 6to4, another IPv6-over-IPv4 code
with ip proto #41).
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.
The old timeout()/untimeout() API has been removed from the kernel.
it was a bit too strong, and forbids multiple addresses from
same prefix to be assigned.
now the behavior is the same as previous - memory leak on interface address
addition failure.
http://orange.kame.net/dev/query-pr.cgi?pr=218
during KAME merge (this is part of WIDE's expeirmental reass code...)
NetBSD PR: 9412
From: Wolfgang Rupprecht <wolfgang@wsrcc.com>
Fix from: ho@crt.se
itojun was notified from: theo
between protocol handlers.
ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.
due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.
take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).
this will bump kernel version.
(as discussed in tech-net, tested in kame tree)
draft-ietf-ipngwg-icmp-name-lookups-04.txt.
There are certain bitfield change in 04 draft to 05 draft, which makes
04 "ping6 -a" and 05 "ping6 -a" not interoperable. sigh.
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
pfil information, instead, struct protosw now contains a structure
which caontains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.
- there's no match against addresses on IFF_UP interface,
send icmp unreach if I'm router. drop it if I'm host.
Revised version of PR: 9387 from nrt@iij.ad.jp. Discussed with thorpej+nrt.
Note, we're reusing the previously unused slot for "MTU discovery" (which
was moved to the "net.inet.ip" branch of the sysctl tree quite some time
ago).
- Filter out multicast destinations explicitly for every incoming packet,
not just SYNs. Previously, non-SYN multicast destination would be
filtered out as a side effect of PCB lookup. Remove now redundant
similar checks in the dropwithreset case and in syn_cache_add().
- Defer the TCP checksum until we know that we want to process the
packet (i.e. have a non-CLOSED connection or a listen socket).
increasing both of them will result in negative number on udp
"delivered" stat on netstat(8), since netstat computes number of delivered
packet by subtracting them from number of inbound packets.
- a packet is delivered to an address X,
- and the address X is configured on my !IFF_UP interface
- and ipforwarding=1
NetBSD PR: 9387
From: nrt@iij.ad.jp
RFC2553/2292-compliant header file path, now the following headers are
forbidden:
netinet6/ip6.h
netinet6/icmp6.h
netinet6/in6.h
if you want netinet6/{ip6,icmp6}.h, use netinet/{ip6,icmp6}.h.
if you want netinet6/in6.h, you just need to include netinet/in.h.
it pulls it in.
(we may need to integrate them into netinet/in.h, but for cross-BSD code
sharing i'd like to keep it like this for now)
(netinet6/{ip6,icmp6}.h is non-standard path - these files should go away)
it was not possible to use cvsmove in this case.
when you try to look at history, chase it toward netinet6/{ip6,icmp6}.h.
it is not supposed to work.
logging fix: add "\n" to some of log() in in6_prefix.c.
improve in6_ifdetach(). now almost all structure depend on ifnet
will be cleared up.
possible loose ends:
- cached route_in6 in static varaiables needs to be cleared as well
- there are ifaddr manipulation without reference counting,
which should be fixed
we still see panics after card removal, though... not sure what is left.
(sync with kame)
although this version has been changed somewhat:
- reference counting on ifaddrs isn't as complete as Bill's original
work was. This is hard to get right, and we should attack one
protocol at a time.
- This doesn't do reference counting or dynamic allocation of ifnets yet.
- This version introduces a new PRU -- PRU_PURGEADDR, which is used to
purge an ifaddr from a protocol. The old method Bill used didn't work
on all protocols, and it only worked on some because it was Very Lucky.
This mostly works ... i.e. works for my USB Ethernet, except for a dangling
ifaddr reference left by the IPv6 code; have not yet tracked this down.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited
XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.
XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
AF_INET6 wildcard listening socket. heavily documented in ip6(4).
net.inet6.ip6.bindv6only defines default value. default is 1.
"options INET6_BINDV6ONLY" removes any code fragment that supports
IPV6_BINDV6ONLY == 0 case (not defopt'ed as use of this is rare).
Patch from Darren send to the mailing list after he released 3.3.6 and
did a bad job with using the wrong way to update the NetBSD version
of ipfilter.
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.
TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach
(sorry for jumbo commit, I can't separate this any more...)
(if the definition is like in rfc2553) they are not supposed to be used.
XXX i'm trying to change rfc2553 sockaddr_storage definition to include
"ss_len" and "ss_family". see ipngwg. situation might change soon.
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.
The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
referenre purposes.
synchronization to latest KAME will take place on HEAD branch soon.
RTM_IFINFO is now 0xf, 0xe is RTM_OIFINFO which returns the old (if_msghdr14)
struct with 32bit counters (binary compat, conditioned on COMPAT_14).
Same for sysctl: node 3 is renamed NET_RT_OIFLIST, NET_RT_IFLIST is now node 4.
Change rt_msg1() to add an mbuf to the mbuf chain instead of just panic()
when the message is larger than MHLEN.
Avoid forwarding ip unicast packets which were contained inside
link-level multicast packets; having M_MCAST still set in the packet
header flags will mean that the packet will get multicast to a bogus
group instead of unicast to the next hop.
Malformed packets like this have occasionally been spotted "in the
wild" on a mediaone cable modem segment which also had multiple netbsd
machines running as router/NAT boxes.
Without this, any subnet with multiple netbsd routers receiving all
multicasts will generate a packet storm on receipt of such a
multicast. Note that we already do the same check here for link-level
broadcasts; ip6_forward already does this as well.
Note that multicast forwarding does not go through ip_forward().
Adding some code to if_ethersubr to sanity check link-level
vs. ip-level multicast addresses might also be worthwhile.
back into the packet. (ip_output() clears it since ipsec reuses that
packet field in the output path. by putting it back, we're going to
pretend we're back on the input path now).
This is important, because for most protocols, link level fragmentation is
used, but with different default effective MTUs. (e.g.: IPv4 default MTU
is 1500 octets, IPv6 default MTU is 9072 octets).
MSS advertisement must always be:
max(if mtu) - ip hdr siz - tcp hdr siz
We violated this in the previous code so it was fixed.
tcp_mss_to_advertise() now takes af (af on wire) as its argument,
to compute right ip hdr siz.
tcp_segsize() will take care of IPsec header size.
One thing I'm not really sure is how to handle IPsec header size in
*rxsegsizep (inbound segment size estimation).
The current code subtracts possible *outbound* IPsec size from *rxsegsizep,
hoping that the peer is using the same IPsec policy as me.
It may not be applicable, could TCP gulu please comment...
This situation happens on severe memory shortage. We may need more
improvements here and there.
- Grab IEEE802 address from IFT_ETHER card, even if the card is
inserted after bootup time. Is there any other card that can be
inserted afterwards? pcmcia fddi card? :-P
- RFC2373 u bit handling suggests that we SHOULD NOT copy interface id from
ethernet card to pseudo interface, when ethernet card has IEEE802/EUI64
with u bit != 0 (this means that IEEE802/EUI64 is not universally unique).
Do not use such address as, for example, interface id for gif interface.
(I have such an ethernet card myself)
This may change interface id for your gif interface. be careful upgrading
rc files.
(sync with recent KAME)
the member is used to pass struct socket to ip{,6}_output for ipsec decisions.
(i agree it is kind of ugly. we need to modify struct mbuf if we are
to do better - which seems to me a bit too much)
New Reno fast recovery code was being executed even when New Reno was
disabled, resulting in an unfortunate interaction with the traditional
fast recovery code, the end resulting being that the very condition
that would trigger the traditional fast recovery mechanism caused fast
recovery to be disabled!
Problem reported by Ted Lemon, and some analytical help from Charles Hannum.
Stale syn cache entries are useless because none of them will be used
if there is no listening socket, as tcp_input looks up listening socket by
in_pcblookup*() before looking into syn cache.
This fixes race condition due to dangling socket pointer from syn cache
entries to listening socket (this was introduced when ipsec is merged in).
This should preserve currently implemented behavior (but not 4.4BSD
behavior prior to syn cache).
Tested in KAME repository before commit, but we'd better run some
regression tests.
check that the packet if of the rigth protocol before giving it to the
proxy module, otherwise let the ipnat code handle it.
What happens in kern/7831 is that a router sends back a icmp message for
a TCP SYN, and ip_proxy.c forwards it to ip_ftp_pxy.c which can only
handle TCP packets. The icmp message is properly handled by ipnat, no need to
go to ip_ftp_pxy.c.
- Make sure that snd_recover is always at least snd_una. If we don't do
this, there can be confusion when sequence numbers wrap around on a
large loss-free data transfer.
- When doing a New Reno retransmit, snd_una hasn't been updated yet,
and the socket's send buffer has not yet dropped off ACK'd data, so
don't muddle with snd_una, so that tcp_output() gets the correct data
offset.
- When doing a New Reno retransmit, make sure the congestion window is
open one segment beyond the ACK'd data, so that we can actually perform
the retransmit.
Partially derived from, although more complete than, similar changes in
OpenBSD, which in turn originated from Tom Henderson <tomh@cs.berkeley.edu>.