$NetBSD: IMPLEMENTATION,v 1.13 2000/06/10 08:21:11 itojun Exp $
|
|
|
|
# NOTE: this is from original KAME distribution.
|
|
# Some portion of this document is not applicable to the code merged into
|
|
# NetBSD-current (for example, section 5). Check sys/netinet6/TODO as well.
|
|
|
|
Implementation Note
|
|
|
|
KAME Project
|
|
http://www.kame.net/
|
|
KAME Date: 2000/06/10 08:18:06
|
|
|
|
1. IPv6
|
|
|
|
1.1 Conformance
|
|
|
|
The KAME kit conforms, or tries to conform, to the latest set of IPv6
|
|
specifications. For future reference we list some of the relevant documents
|
|
below (NOTE: this is not a complete list - this is too hard to maintain...).
|
|
For details please refer to specific chapter in the document, RFCs, manpages
|
|
that come with KAME, or comments in the source code.
|
|
|
|
Conformance tests have been performed on past and latest KAME STABLE kit,
|
|
at TAHI project. Results can be viewed at http://www.tahi.org/report/KAME/.
|
|
We also attended Univ. of New Hampshire IOL tests (http://www.iol.unh.edu/)
|
|
in the past, with our past snapshots.
|
|
|
|
RFC1639: FTP Operation Over Big Address Records (FOOBAR)
|
|
* RFC2428 is preferred over RFC1639. ftp clients will first try RFC2428,
|
|
then RFC1639 if failed.
|
|
RFC1886: DNS Extensions to support IPv6
|
|
RFC1933: Transition Mechanisms for IPv6 Hosts and Routers
|
|
* IPv4 compatible address is not supported.
|
|
* automatic tunneling (4.3) is not supported.
|
|
* "gif" interface implements IPv[46]-over-IPv[46] tunnel in a generic way,
|
|
and it covers "configured tunnel" described in the spec.
|
|
See 1.5 in this document for details.
|
|
RFC1981: Path MTU Discovery for IPv6
|
|
RFC2080: RIPng for IPv6
|
|
* KAME-supplied route6d, bgpd and hroute6d support this.
|
|
RFC2283: Multiprotocol Extensions for BGP-4
|
|
* so-called "BGP4+".
|
|
* KAME-supplied bgpd supports this.
|
|
RFC2292: Advanced Sockets API for IPv6
|
|
* For supported library functions/kernel APIs, see sys/netinet6/ADVAPI.
|
|
RFC2362: Protocol Independent Multicast-Sparse Mode (PIM-SM)
|
|
* RFC2362 defines packet formats for PIM-SM. draft-ietf-pim-ipv6-01.txt
|
|
is written based on this.
|
|
RFC2373: IPv6 Addressing Architecture
|
|
* KAME supports node required addresses, and conforms to the scope
|
|
requirement.
|
|
RFC2374: An IPv6 Aggregatable Global Unicast Address Format
|
|
* KAME supports 64-bit length of Interface ID.
|
|
RFC2375: IPv6 Multicast Address Assignments
|
|
* Userland applications use the well-known addresses assigned in the RFC.
|
|
RFC2428: FTP Extensions for IPv6 and NATs
|
|
* RFC2428 is preferred over RFC1639. ftp clients will first try RFC2428,
|
|
then RFC1639 if failed.
|
|
RFC2460: IPv6 specification
|
|
RFC2461: Neighbor discovery for IPv6
|
|
* See 1.2 in this document for details.
|
|
RFC2462: IPv6 Stateless Address Autoconfiguration
|
|
* See 1.4 in this document for details.
|
|
RFC2463: ICMPv6 for IPv6 specification
|
|
* See 1.9 in this document for details.
|
|
RFC2464: Transmission of IPv6 Packets over Ethernet Networks
|
|
RFC2465: MIB for IPv6: Textual Conventions and General Group
|
|
* Necessary statistics are gathered by the kernel. Actual IPv6 MIB
|
|
support is provided as patchkit for ucd-snmp.
|
|
RFC2466: MIB for IPv6: ICMPv6 group
|
|
* Necessary statistics are gathered by the kernel. Actual IPv6 MIB
|
|
support is provided as patchkit for ucd-snmp.
|
|
RFC2467: Transmission of IPv6 Packets over FDDI Networks
|
|
RFC2472: IPv6 over PPP
|
|
RFC2492: IPv6 over ATM Networks
|
|
* only PVC is supported.
|
|
RFC2497: Transmission of IPv6 packet over ARCnet Networks
|
|
RFC2545: Use of BGP-4 Multiprotocol Extensions for IPv6 Inter-Domain Routing
|
|
RFC2553: Basic Socket Interface Extensions for IPv6
|
|
* IPv4 mapped address (3.7) and special behavior of IPv6 wildcard bind
|
|
socket (3.8) are,
|
|
- supported on KAME/FreeBSD3x,
|
|
- supported on KAME/NetBSD,
|
|
- supported on KAME/BSDI4,
|
|
- not supported on KAME/FreeBSD228, KAME/OpenBSD and KAME/BSDI3.
|
|
see 1.12 in this document for details.
|
|
RFC2675: IPv6 Jumbograms
|
|
* See 1.7 in this document for details.
|
|
RFC2710: Multicast Listener Discovery for IPv6
|
|
RFC2711: IPv6 router alert option
|
|
RFC2732: Format for Literal IPv6 Addresses in URL's
|
|
* The spec is implemented in programs that handle URLs
|
|
(like freebsd ftpio(3) and fetch(1), or netbsd ftp(1))
|
|
draft-ietf-ipngwg-router-renum-10: Router renumbering for IPv6
|
|
draft-ietf-ipngwg-icmp-name-lookups-05: IPv6 Name Lookups Through ICMP
|
|
draft-ietf-pim-ipv6-03.txt: PIM for IPv6
|
|
* pim6dd implements dense mode. pim6sd implements sparse mode.
|
|
draft-ietf-dhc-dhcpv6-15.txt: DHCPv6
|
|
draft-ietf-dhc-dhcpv6exts-12.txt: Extensions for DHCPv6
|
|
* kame/dhcp6 has test implementation, which will not be compiled in
|
|
default compilation.
|
|
draft-itojun-ipv6-tcp-to-anycast-00.txt:
|
|
Disconnecting TCP connection toward IPv6 anycast address
|
|
draft-ietf-ipngwg-scopedaddr-format-01.txt:
|
|
An Extension of Format for IPv6 Scoped Addresses
|
|
draft-ietf-ngtrans-tcpudp-relay-01.txt:
|
|
An IPv6-to-IPv4 transport relay translator
|
|
* FAITH tcp relay translator (faithd) implements this. See 3.1 for more
|
|
details.
|
|
draft-ietf-ngtrans-6to4-06.txt:
|
|
Connection of IPv6 Domains via IPv4 Clouds without Explicit Tunnels
|
|
* "stf" interface implements it. Be sure to read the next item before
|
|
configuring it; there are security issues.
|
|
http://playground.iijlab.net/i-d/draft-itojun-ipv6-transition-abuse-00.txt:
|
|
Possible abuse against IPv6 transition technologies
|
|
* KAME does not implement RFC1933 automatic tunnel.
|
|
* "stf" interface implements some address filters. Refer to stf(4)
|
|
for details. Since there's no way to make 6to4 interface 100% secure,
|
|
we do not include "stf" interface into GENERIC.v6 compilation.
|
|
* kame/openbsd completely disables IPv4 mapped address support.
|
|
* kame/netbsd makes IPv4 mapped address support off by default.
|
|
* See section 12.6 and 14 for more details.
|
|
|
|
1.2 Neighbor Discovery
|
|
|
|
Neighbor Discovery is fairly stable. Currently Address Resolution,
|
|
Duplicated Address Detection, and Neighbor Unreachability Detection
|
|
are supported. In the near future we will be adding Unsolicited Neighbor
|
|
Advertisement transmission command as admin tool.
|
|
|
|
Duplicated Address Detection (DAD) will be performed when an IPv6 address
|
|
is assigned to a network interface, or the network interface is enabled
|
|
(ifconfig up). It is documented in RFC2462 5.4.
|
|
If DAD fails, the address will be marked "duplicated" and a message will be
|
|
generated to syslog (and usually to console). The "duplicated" mark
|
|
can be checked with ifconfig. It is administrators' responsibility to check
|
|
for and recover from DAD failures. We may try to improve failure recovery
|
|
in future KAME code.
|
|
DAD procedure may not be effective on certain network interfaces/drivers.
|
|
If a network driver needs long initialization time (with wireless network
|
|
interfaces this situation is common), and the driver mistakenly raises
|
|
IFF_RUNNING before the driver becomes ready, DAD code will try to transmit
|
|
DAD probes to not-really-ready network driver and the packet will not go out
|
|
from the interface. In such cases, network drivers should be corrected.
|
|
|
|
Some network drivers loop multicast packets back to themselves,
|
|
even if instructed not to do so (especially in promiscuous mode).
|
|
In such cases DAD may fail, because DAD engine sees inbound NS packet
|
|
(actually from the node itself) and considers it as a sign of duplicate.
|
|
You may want to look at #if condition marked "heuristics" in
|
|
sys/netinet6/nd6_nbr.c:nd6_dad_timer() as workaround (note that the code
|
|
fragment in "heuristics" section is not spec conformant).
|
|
|
|
Neighbor Discovery specification (RFC2461) does not talk about neighbor
|
|
cache handling in the following cases:
|
|
(1) when there was no neighbor cache entry, node received unsolicited
|
|
RS/NS/NA/redirect packet without link-layer address
|
|
(2) neighbor cache handling on medium without link-layer address
|
|
(we need a neighbor cache entry for IsRouter bit)
|
|
For (1), we implemented workaround based on discussions on IETF ipngwg mailing
|
|
list. For more details, see the comments in the source code and email
|
|
thread started from (IPng 7155), dated Feb 6 1999.
|
|
|
|
IPv6 on-link determination rule (RFC2461) is quite different from assumptions
|
|
in BSD IPv4 network code. To implement behavior in RFC2461 section 5.2
|
|
(when default router list is empty), the kernel needs to know the default
|
|
outgoing interface. To configure the default outgoing interface, use
|
|
commands like "ndp -I de0" as root. Note that the spec misuses the words
|
|
"host" and "node" in several places in the section.
|
|
|
|
To avoid possible DoS attacks and infinite loops, KAME stack will accept
|
|
only 10 options on ND packet. Therefore, if you have 20 prefix options
|
|
attached to RA, only the first 10 prefixes will be recognized.
|
|
If this troubles you, please contact KAME team and/or modify
|
|
nd6_maxndopt in sys/netinet6/nd6.c. If there are high demands we may
|
|
provide sysctl knob for the variable.
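The option limit described above can be illustrated with a small userland
sketch. This is not the kernel code; it only assumes the standard ND option
layout (one type octet, one length octet counted in units of 8 octets) and a
cap named after nd6_maxndopt. The function name parse_nd_options is made up
for this example.

```python
ND6_MAXNDOPT = 10  # mirrors the default limit mentioned in the text


def parse_nd_options(buf, max_opts=ND6_MAXNDOPT):
    """Walk ND options in buf, keeping at most max_opts of them.

    Returns a list of (type, body) tuples. Options beyond the cap are
    silently ignored, which is why only the first 10 prefixes of a large
    RA would be recognized.
    """
    opts = []
    off = 0
    while off + 2 <= len(buf) and len(opts) < max_opts:
        opt_type = buf[off]
        size = buf[off + 1] * 8  # length field is in units of 8 octets
        # A zero length would never advance the offset (infinite loop),
        # and an oversized length would run past the packet.
        if size == 0 or off + size > len(buf):
            raise ValueError("malformed ND option")
        opts.append((opt_type, buf[off + 2:off + size]))
        off += size
    return opts
```

Feeding it 20 prefix information options (type 3, length 4) yields only the
first 10, matching the behavior described above.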
|
|
|
|
Proxy Neighbor Advertisement support is implemented in the kernel.
|
|
You can configure it by using the following command:
|
|
# ndp -s fe80:1::1234 0:1:2:3:4:5 proxy
|
|
You need to fill in scope index into the address - see 1.3.3.
|
|
There are certain limitations, though:
|
|
- It does not send unsolicited multicast NA on configuration. This is MAY
|
|
behavior in RFC2461.
|
|
- It does not add random delay before transmission of solicited NA. This is
|
|
SHOULD behavior in RFC2461.
|
|
- We cannot configure proxy NDP for off-link address. The target address for
|
|
proxying must be link-local address, or must be in prefixes configured to
|
|
node which does proxy NDP.
|
|
- RFC2461 is unclear about whether it is legal for a host to perform proxy ND.
|
|
We do not prohibit hosts from doing proxy ND, but there will be very limited
|
|
use in it.
|
|
|
|
Starting mid March 2000, we support Neighbor Unreachability Detection (NUD)
|
|
on p2p interfaces, including tunnel interfaces (gif). NUD is turned on by
|
|
default. Before March 2000 KAME stack did not perform NUD on p2p interfaces.
|
|
If the change raises any interoperability issues, you can turn off/on NUD
|
|
on a per-interface basis. Use "ndp -i interface -nud" to turn it off.
|
|
Consult ndp(8) for details.
|
|
|
|
1.3 Scope Index
|
|
|
|
IPv6 uses scoped addresses. It is therefore very important to
|
|
specify scope index (interface index for link-local address, or
|
|
site index for site-local address) with an IPv6 address. Without
|
|
scope index, a scoped IPv6 address is ambiguous to the kernel, and
|
|
the kernel will not be able to determine the outbound interface for a
|
|
packet. KAME code tries to address the issue in several ways.
|
|
|
|
Site-local address is very vaguely defined in the specs, and both specification
|
|
and KAME code need tons of improvements to enable its actual use.
|
|
For example, it is still very unclear how we define a site, or how we resolve
|
|
hostnames in a site. There is work underway to define behavior of routers
|
|
at site border, however, we have almost no code for site boundary node support
|
|
(neither forwarding nor routing) and we bet almost no one has.
|
|
At this moment we recommend that you use global addresses for experiments -
|
|
there are way too many pitfalls if you use site-local addresses.
|
|
|
|
1.3.1 Kernel internal
|
|
|
|
In the kernel, the interface index for a link-local scope address is
|
|
embedded into the 2nd 16bit-word (the 3rd and 4th bytes) in the IPv6
|
|
address.
|
|
For example, you may see something like:
|
|
fe80:1::200:f8ff:fe01:6317
|
|
in the routing table and interface address structure (struct
|
|
in6_ifaddr). The address above is a link-local unicast address
|
|
which belongs to a network interface whose interface index is 1.
|
|
The embedded index enables us to identify IPv6 link local
|
|
addresses over multiple interfaces effectively and with only a
|
|
little code change.
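As an illustration of the embedded form, the following userland sketch
converts between the standard form plus interface index and the
kernel-internal form. It is not taken from the KAME sources; the function
names are hypothetical, and it handles link-local unicast addresses only.

```python
import ipaddress


def embed_ifindex(addr, ifindex):
    """Return addr with ifindex stored in the 3rd and 4th bytes."""
    packed = bytearray(ipaddress.IPv6Address(addr).packed)
    # fe80::/10 check done by hand so the intent is explicit.
    if not (packed[0] == 0xFE and (packed[1] & 0xC0) == 0x80):
        raise ValueError("not a link-local unicast address")
    if packed[2] or packed[3]:
        raise ValueError("address already carries an embedded index")
    if not 0 < ifindex <= 0xFFFF:
        raise ValueError("interface index does not fit in 16 bits")
    packed[2:4] = ifindex.to_bytes(2, "big")
    return str(ipaddress.IPv6Address(bytes(packed)))


def extract_ifindex(addr):
    """Split an embedded-form address into (standard form, ifindex)."""
    packed = bytearray(ipaddress.IPv6Address(addr).packed)
    ifindex = int.from_bytes(packed[2:4], "big")
    packed[2:4] = b"\x00\x00"
    return str(ipaddress.IPv6Address(bytes(packed))), ifindex
```

For example, embed_ifindex("fe80::200:f8ff:fe01:6317", 1) gives the address
shown above, and extract_ifindex() reverses the operation.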
|
|
|
|
1.3.2 Interaction with API
|
|
|
|
Ordinary userland applications should use the advanced API (RFC2292)
|
|
to specify scope index, or interface index. For the similar purpose,
|
|
the sin6_scope_id member in the sockaddr_in6 structure is defined in
|
|
RFC2553. However, the semantics for sin6_scope_id is rather vague.
|
|
If you care about portability of your application, we suggest that you
|
|
use the advanced API rather than sin6_scope_id.
|
|
|
|
Routing daemons and configuration programs, like route6d and
|
|
ifconfig, will need to manipulate the "embedded" scope index.
|
|
These programs use routing sockets and ioctls (like SIOCGIFADDR_IN6)
|
|
and the kernel API will return IPv6 addresses with 2nd 16bit-word
|
|
filled in. The APIs are for manipulating kernel internal structure.
|
|
Programs that use these APIs have to be prepared for differences
|
|
in kernels anyway.
|
|
|
|
getaddrinfo(3) and getnameinfo(3) are modified to support extended numeric
|
|
IPv6 syntax, as documented in draft-ietf-ipngwg-scopedaddr-format-01.txt.
|
|
You can specify outgoing link, by using name of the outgoing interface
|
|
like "fe80::1%ne0". This way you will be able to specify link-local scoped
|
|
address without much trouble.
|
|
To use this extension in your program, you'll need to use getaddrinfo(3),
|
|
and getnameinfo(3) with NI_WITHSCOPEID.
|
|
The implementation currently assumes 1-to-1 relationship between a link and an
|
|
interface, which is stronger than what IPv6 specs say.
|
|
Other APIs like inet_pton(3) or getipnodebyname(3) are inherently unfriendly
|
|
with scoped addresses, since they are unable to annotate addresses with
|
|
scope identifier.
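To make the extended numeric syntax concrete, here is a small sketch that
splits a string such as "fe80::1%ne0" into its address and zone parts. It is
only an illustration of the textual format, written in Python rather than
against the C getaddrinfo(3) interface, and the helper name split_scoped is
an assumption of this example.

```python
import ipaddress


def split_scoped(text):
    """Split "addr%zone" into (addr, zone); zone is None when absent."""
    addr, sep, zone = text.partition("%")
    ipaddress.IPv6Address(addr)  # raises ValueError on a bad address
    if sep and not zone:
        raise ValueError("empty zone after '%'")
    return addr, (zone or None)
```

A caller could then map the zone name to an interface index (for instance
with socket.if_nametoindex()) and place it in the scope id field of the
socket address, under the 1-to-1 link/interface assumption noted above.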
|
|
|
|
1.3.3 Interaction with users (command line)
|
|
|
|
Some of the userland tools support extended numeric IPv6 syntax, as
|
|
documented in draft-ietf-ipngwg-scopedaddr-format-01.txt. In this case,
|
|
you can specify outgoing link, by using name of the outgoing interface like
|
|
"fe80::1%ne0".
|
|
|
|
When you specify scoped address to the command line, NEVER write the
|
|
embedded form (such as ff02:1::1 or fe80:2::fedc). This is not supposed
|
|
to work. Always use standard form, like ff02::1 or fe80::fedc, with
|
|
command line option for specifying interface (like "ping6 -I ne0 ff02::1").
|
|
In general, if a command does not have command line option to specify
|
|
outgoing interface, that command is not ready to accept scoped address.
|
|
This may seem to be opposite from IPv6's premise to support "dentist office"
|
|
situation. We believe that specifications need some improvements for this.
|
|
|
|
The only exception to the above rule would be when you configure routing table
|
|
manually by route(8), or ndp(8). Gateway portion of IPv6 routing entry must
|
|
be a link-local address (otherwise ICMPv6 redirect will not work), and in this
|
|
case you'll need to configure it by putting interface index into the address:
|
|
# route add -inet6 default fe80:2::9876:5432:1234:5678
|
|
(when interface index for outgoing interface = 2)
|
|
To avoid configuration mistakes, we suggest that you run dynamic routing instead
|
|
(like route6d(8)).
|
|
|
|
1.4 Plug and Play
|
|
|
|
The KAME kit implements most of the IPv6 stateless address
|
|
autoconfiguration in the kernel.
|
|
Neighbor Discovery functions are implemented in the kernel as a whole.
|
|
Router Advertisement (RA) input for hosts is implemented in the
|
|
kernel. Router Solicitation (RS) output for endhosts, RS input
|
|
for routers, and RA output for routers are implemented in the
|
|
userland.
|
|
|
|
1.4.1 Assignment of link-local, and special addresses
|
|
|
|
IPv6 link-local address is generated from IEEE802 address (ethernet MAC
|
|
address). Each interface is assigned an IPv6 link-local address
|
|
automatically, when the interface becomes up (IFF_UP). Also, direct route
|
|
for the link-local address is added to routing table.
|
|
|
|
Here is an output of netstat command:
|
|
|
|
Internet6:
Destination        Gateway        Flags   Netif  Expire
fe80::%ed0/64      link#1         UC      ed0
fe80::%ep0/64      link#2         UC      ep0
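The derivation of a link-local address from an IEEE802 address can be
sketched as follows. The sketch assumes the usual modified EUI-64 procedure
for Ethernet (flip the universal/local bit and insert ff:fe in the middle);
the function name and the sample MAC address are chosen for this example
only.

```python
import ipaddress


def link_local_from_mac(mac):
    """Build fe80::/64 + modified EUI-64 interface ID from a 48-bit MAC."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit IEEE802 address")
    # Flip the universal/local bit and insert ff:fe between the halves.
    iid = bytes([octets[0] ^ 0x02]) + octets[1:3] + b"\xff\xfe" + octets[3:]
    return str(ipaddress.IPv6Address(b"\xfe\x80" + bytes(6) + iid))
```

With the MAC address 00:00:f8:01:63:17 this produces
fe80::200:f8ff:fe01:6317, the form of address used in examples elsewhere in
this document.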
|
|
|
|
Interfaces that have no IEEE802 address (pseudo interfaces like tunnel
|
|
interfaces, or ppp interfaces) will borrow IEEE802 address from other
|
|
interfaces, such as ethernet interfaces, whenever possible.
|
|
If there is no IEEE802 hardware attached, last-resort pseudorandom value,
|
|
which is from MD5(hostname), will be used as source of link-local address.
|
|
If it is not suitable for your usage, you will need to configure the
|
|
link-local address manually.
|
|
|
|
If an interface is not capable of handling IPv6 (such as lack of multicast
|
|
support), link-local address will not be assigned to that interface.
|
|
See section 2 for details.
|
|
|
|
Each interface joins the solicited-node multicast address and the
|
|
link-local all-nodes multicast addresses (e.g. ff02::1:ff01:6317
|
|
and ff02::1, respectively, on the link the interface is attached).
|
|
In addition to a link-local address, the loopback address (::1) will be
|
|
assigned to the loopback interface. Also, ::1/128 and ff01::/32 are
|
|
automatically added to routing table, and loopback interface joins
|
|
node-local multicast group ff01::1.
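For reference, the solicited-node multicast address that an interface joins
is formed from the low-order 24 bits of the unicast address appended to the
prefix ff02::1:ff00:0/104 (RFC2373). A minimal sketch, with a function name
made up for this example:

```python
import ipaddress

# First 104 bits (13 octets) of the solicited-node prefix.
SOLICITED_PREFIX = ipaddress.IPv6Address("ff02::1:ff00:0").packed[:13]


def solicited_node(addr):
    """Return the solicited-node multicast address for a unicast address."""
    low24 = ipaddress.IPv6Address(addr).packed[-3:]
    return str(ipaddress.IPv6Address(SOLICITED_PREFIX + low24))
```

For the link-local address fe80::200:f8ff:fe01:6317 the result is
ff02::1:ff01:6317.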
|
|
|
|
1.4.2 Stateless address autoconfiguration on hosts
|
|
|
|
In IPv6 specification, nodes are separated into two categories:
|
|
routers and hosts. Routers forward packets addressed to others; hosts do
|
|
not forward the packets. net.inet6.ip6.forwarding defines whether this
|
|
node is router or host (router if it is 1, host if it is 0).
|
|
|
|
It is NOT recommended to change net.inet6.ip6.forwarding while the node
|
|
is in operation. IPv6 specification defines behavior for "host" and "router"
|
|
quite differently, and switching from one to another can cause serious
|
|
troubles. It is recommended to configure the variable at bootstrap time only.
|
|
|
|
The first step in stateless address configuration is Duplicated Address
|
|
Detection (DAD). See 1.2 for more detail on DAD.
|
|
|
|
When a host hears Router Advertisement from the router, a host may
|
|
autoconfigure itself by stateless address autoconfiguration.
|
|
This behavior can be controlled by net.inet6.ip6.accept_rtadv
|
|
(host autoconfigures itself if it is set to 1).
|
|
By autoconfiguration, network address prefix for the receiving interface
|
|
(usually global address prefix) is added. Default route is also configured.
|
|
Routers periodically generate Router Advertisement packets. To request
|
|
an adjacent router to generate RA packet, a host can transmit Router
|
|
Solicitation. To generate a RS packet at any time, use the "rtsol" command.
|
|
"rtsold" daemon is also available. "rtsold" generates Router Solicitation
|
|
whenever necessary, and it works great for nomadic usage (notebooks/laptops).
|
|
If one wishes to ignore Router Advertisements, use sysctl to set
|
|
net.inet6.ip6.accept_rtadv to 0.
|
|
|
|
To generate Router Advertisement from a router, use the "rtadvd" daemon.
|
|
|
|
Note that the IPv6 specification assumes the following items, and nonconforming
|
|
cases are left unspecified:
|
|
- Only hosts will listen to router advertisements
|
|
- Hosts have single network interface (except loopback)
|
|
Therefore, it is unwise to enable net.inet6.ip6.accept_rtadv on routers,
or on multi-interface hosts. A misconfigured node can behave strangely
|
|
(KAME code allows nonconforming configuration, for those who would like
|
|
to do some experiments).
|
|
|
|
To summarize the sysctl knob:
|
|
accept_rtadv  forwarding  role of the node
---           ---         ---
0             0           host (to be manually configured)
0             1           router
1             0           autoconfigured host
                          (spec assumes that host has single
                          interface only, autoconfigured host with
                          multiple interface is out-of-scope)
1             1           invalid, or experimental
                          (out-of-scope of spec)
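The same summary can be written as a lookup, which may be handy in a
configuration-checking script. This is only a sketch; the function name and
the role strings are invented for this example, and reading the actual
sysctl values is left to the caller.

```python
ROLES = {
    (0, 0): "host (to be manually configured)",
    (0, 1): "router",
    (1, 0): "autoconfigured host",
    (1, 1): "invalid, or experimental",
}


def node_role(accept_rtadv, forwarding):
    """Map the two sysctl values to the role described in the table."""
    return ROLES[(int(bool(accept_rtadv)), int(bool(forwarding)))]
```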
|
|
|
|
RFC2462 has validation rule against incoming RA prefix information option,
|
|
in 5.5.3 (e). This is to protect hosts from malicious (or misconfigured)
|
|
routers that advertise very short prefix lifetime.
|
|
There was an update from Jim Bound to ipngwg mailing list (look
|
|
for "(ipng 6712)" in the archive) and KAME implements Jim's update.
|
|
|
|
See 1.2 in the document for relationship between DAD and autoconfiguration.
|
|
|
|
1.4.3 DHCPv6
|
|
|
|
We supply a tiny DHCPv6 server/client in kame/dhcp6. However, the
|
|
implementation is very premature (for example, this does NOT
|
|
implement address lease/release), and it is not in default compilation
|
|
tree. If you want to do some experiment, compile it on your own.
|
|
|
|
The interaction between DHCPv6 and autoconfiguration also needs more work. "Managed" and "Other"
|
|
bits in RA have no special effect to stateful autoconfiguration procedure
|
|
in DHCPv6 client program ("Managed" bit actually prevents stateless
|
|
autoconfiguration, but no special action will be taken for DHCPv6 client).
|
|
|
|
1.5 Generic tunnel interface
|
|
|
|
GIF (Generic InterFace) is a pseudo interface for configured tunnel.
|
|
Details are described in gif(4) manpage.
|
|
Currently
|
|
v6 in v6
|
|
v6 in v4
|
|
v4 in v6
|
|
v4 in v4
|
|
are available. Use "gifconfig" to assign physical (outer) source
|
|
and destination address to gif interfaces.
|
|
Configuration that uses same address family for inner and outer IP
|
|
header (v4 in v4, or v6 in v6) is dangerous. It is very easy to
|
|
configure interfaces and routing tables to perform infinite level
|
|
of tunneling. Please be warned.
|
|
|
|
gif can be configured to be ECN-friendly. See 4.5 for ECN-friendliness
|
|
of tunnels, and gif(4) manpage for how to configure.
|
|
|
|
If you would like to configure an IPv4-in-IPv6 tunnel with gif interface,
|
|
read gif(4) carefully. You may need to remove IPv6 link-local address
|
|
automatically assigned to the gif interface.
|
|
|
|
1.6 Source Address Selection
|
|
|
|
KAME's source address selection takes care of the following
|
|
conditions:
|
|
- address scope
|
|
- prefix matching against the destination
|
|
- outgoing interface
|
|
- whether an address is deprecated
|
|
|
|
Roughly speaking, the selection policy is as follows:
|
|
- always use an address that belongs to the same scope zone as the
|
|
destination.
|
|
- addresses that have equal or larger scope than the scope of the
|
|
destination are preferred.
|
|
- if multiple addresses have equal scope, the one with the longest prefix
match against the destination is preferred.
|
|
- a deprecated address is not used in new communications if an
|
|
alternate (non-deprecated) address is available and has sufficient
|
|
scope.
|
|
- if none of above conditions tie-breaks, addresses assigned on the
|
|
outgoing interface are preferred.
|
|
|
|
For instance, ::1 is selected for ff01::1,
|
|
fe80::200:f8ff:fe01:6317%ne0 for fe80::2a0:24ff:feab:839b%ne0.
|
|
To see how longest-matching works, suppose that
|
|
3ffe:501:808:1:200:f8ff:fe01:6317 and 3ffe:2001:9:124:200:f8ff:fe01:6317
|
|
are given on the outgoing interface. Then the former is chosen as the
|
|
source for the destination 3ffe:501:800::1. Note that even if all
|
|
available addresses have smaller scope than the scope of the
|
|
destination, we choose one anyway. For example, if we have link-local
|
|
and site-local addresses only, we choose a site-local address for a
|
|
global destination. If the packet is going to break a site boundary,
|
|
the boundary router will return an ICMPv6 destination unreachable
|
|
error with code 2 - beyond scope of source address.
|
|
|
|
The precise description of the algorithm is quite complicated. To
|
|
describe the algorithm, we introduce the following notation:
|
|
|
|
For a given destination D,
|
|
samescope(D): A set of addresses that have the same scope as D.
|
|
largerscope(D): A set of addresses that have a larger scope than D.
|
|
smallerscope(D): A set of addresses that have a smaller scope than D.
|
|
|
|
For a given set of addresses A,
|
|
DEP(A): a set of deprecated addresses in A.
|
|
nonDEP(A): A - DEP(A).
|
|
|
|
Also, the algorithm assumes that the outgoing interface for the
|
|
destination D is determined. We call the interface "I".
|
|
|
|
The algorithm is as follows. Selection proceeds step by step as
|
|
described; for example, if an address is selected by item 1, item 2 or
|
|
later are not considered at all.
|
|
|
|
0. If there is no address in the same scope zone as D, just give up;
|
|
the packet will not be sent.
|
|
1. If nonDEP(samescope(D)) is not empty,
|
|
choose a longest matching address against D. If more than one
|
|
address is longest matching, choose arbitrary one provided that
|
|
an address on I is always preferred.
|
|
2. If nonDEP(largerscope(D)) is not empty,
|
|
choose an address that has the smallest scope. If more than one
|
|
address has the smallest scope, choose arbitrary one provided
|
|
that an address on I is always preferred.
|
|
3. If DEP(samescope(D)) is not empty,
|
|
choose a longest matching address against D. If more than one
|
|
address is longest matching, choose arbitrary one provided that
|
|
an address on I is always preferred.
|
|
4. If DEP(largerscope(D)) is not empty,
|
|
choose an address that has the smallest scope. If more than one
|
|
address has the smallest scope, choose arbitrary one provided
|
|
that an address on I is always preferred.
|
|
5. if nonDEP(smallerscope(D)) is not empty,
|
|
choose an address that has the largest scope. If more than one
|
|
address has the largest scope, choose arbitrary one provided
|
|
that an address on I is always preferred.
|
|
6. if DEP(smallerscope(D)) is not empty,
|
|
choose an address that has the largest scope. If more than one
|
|
address has the largest scope, choose arbitrary one provided
|
|
that an address on I is always preferred.
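Steps 1 through 6 above can be sketched in userland code as follows. This is
an illustration, not the kernel implementation. Several things are
assumptions of the sketch: the Candidate record, the numeric scope values
(1 node-local, 2 link-local, 5 site-local, 14 global, with ::1 treated as
node-local), and the omission of step 0, since all candidates are assumed to
be in the same scope zone as D.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    addr: str
    deprecated: bool = False
    on_outgoing_if: bool = True  # True if assigned on interface "I"


def scope(addr):
    a = ipaddress.IPv6Address(addr)
    if a.is_multicast:
        return a.packed[1] & 0x0F  # scope nibble of a multicast address
    if a == ipaddress.IPv6Address("::1"):
        return 1
    if a.is_link_local:
        return 2
    if a.is_site_local:
        return 5
    return 14


def match_len(a, b):
    """Number of leading bits shared by two addresses."""
    x = int(ipaddress.IPv6Address(a)) ^ int(ipaddress.IPv6Address(b))
    return 128 - x.bit_length()


def select_source(dst, candidates):
    d = scope(dst)

    def pick(pool, key):
        # Best key wins; an address on I breaks ties.
        return max(pool, key=lambda c: (key(c), c.on_outgoing_if))

    for dep in (False, True):  # steps 1-2, then steps 3-4
        pool = [c for c in candidates if c.deprecated == dep]
        same = [c for c in pool if scope(c.addr) == d]
        if same:
            return pick(same, lambda c: match_len(c.addr, dst))
        larger = [c for c in pool if scope(c.addr) > d]
        if larger:
            return pick(larger, lambda c: -scope(c.addr))
    for dep in (False, True):  # steps 5 and 6
        smaller = [c for c in candidates
                   if c.deprecated == dep and scope(c.addr) < d]
        if smaller:
            return pick(smaller, lambda c: scope(c.addr))
    return None
```

With the two 3ffe: addresses from the example earlier in this section and
the destination 3ffe:501:800::1, select_source() returns the candidate
3ffe:501:808:1:200:f8ff:fe01:6317.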
|
|
|
|
There exists a document about source address selection
|
|
(draft-ietf-ipngwg-default-addr-select-xx.txt). KAME's algorithm
|
|
described above takes a similar approach to the document, but there
|
|
are some differences. See the document for more details.
|
|
|
|
There are some cases where we do not use the above rule. One
|
|
example is connected TCP session, and we use the address kept in TCP
|
|
protocol control block (tcb) as the source.
|
|
Another example is source address for Neighbor Advertisement.
|
|
Under the spec (RFC2461 7.2.2) NA's source should be the target
|
|
address of the corresponding NS. In this case we follow
|
|
the spec rather than the above longest-match rule.
|
|
|
|
If you would like to prohibit the use of deprecated address for some
|
|
reason, configure net.inet6.ip6.use_deprecated to 0. The issue
|
|
related to deprecated address is described in RFC2462 5.5.4 (NOTE:
|
|
there is some debate underway in IETF ipngwg on how to use
|
|
"deprecated" address).
|
|
|
|
1.7 Jumbo Payload
|
|
|
|
KAME supports the Jumbo Payload hop-by-hop option used to send IPv6
|
|
packets with payloads longer than 65,535 octets. But since currently
|
|
KAME does not support any physical interface whose MTU is more than
|
|
65,535, such payloads can be seen only on the loopback interface (i.e.
|
|
lo0).
|
|
|
|
If you want to try jumbo payloads, you first have to reconfigure the
|
|
kernel so that the MTU of the loopback interface is more than 65,535
|
|
bytes; add the following to the kernel configuration file:
|
|
options "LARGE_LOMTU" #To test jumbo payload
|
|
and recompile the new kernel.
|
|
|
|
Then you can test jumbo payloads by the ping6 command with -b and -s
|
|
options. The -b option must be specified to enlarge the size of the
|
|
socket buffer and the -s option specifies the length of the packet,
|
|
which should be more than 65,535. For example, type as follows:
|
|
% ping6 -b 70000 -s 68000 ::1
|
|
|
|
The IPv6 specification requires that the Jumbo Payload option must not
|
|
be used in a packet that carries a fragment header. If this condition
|
|
is broken, an ICMPv6 Parameter Problem message must be sent to the
|
|
sender. KAME kernel follows the specification, but you cannot usually
|
|
see an ICMPv6 error caused by this requirement.
|
|
|
|
If KAME kernel receives an IPv6 packet, it checks the frame length of
|
|
the packet and compares it to the length specified in the payload
|
|
length field of the IPv6 header or in the value of the Jumbo Payload
|
|
option, if any. If the former is shorter than the latter, KAME kernel
|
|
discards the packet and increments the statistics. You can see the
|
|
statistics as output of netstat command with `-s -p ip6' option:
|
|
% netstat -s -p ip6
|
|
ip6:
|
|
(snip)
|
|
1 with data size < data length
|
|
|
|
So, KAME kernel does not send an ICMPv6 error unless the erroneous
|
|
packet is an actual Jumbo Payload, that is, its packet size is more
|
|
than 65,535 bytes. As described above, KAME kernel currently does not
|
|
support physical interface with such a huge MTU, so it rarely returns an
|
|
ICMPv6 error.
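The length comparison described above can be sketched as follows. The sketch
assumes that the frame length counts the 40-byte IPv6 header but not the
link-layer header, and that the Jumbo Payload value is consulted only when
the payload length field is zero (as RFC2675 specifies). The function names
are made up for this example.

```python
IP6_HDRLEN = 40  # fixed IPv6 header size in bytes


def declared_payload(ip6_plen, jumbo_len=None):
    """Payload length the packet claims to carry."""
    if ip6_plen == 0 and jumbo_len is not None:
        return jumbo_len
    return ip6_plen


def should_discard(frame_len, ip6_plen, jumbo_len=None):
    """True if the received frame is shorter than the declared length."""
    return frame_len - IP6_HDRLEN < declared_payload(ip6_plen, jumbo_len)
```

A packet rejected by this check would be counted under "with data size <
data length" in the netstat output shown above.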
|
|
|
|
TCP/UDP over jumbogram is not supported at this moment. This is because
|
|
we have no medium (other than loopback) to test this. Contact us if you
|
|
need this.
|
|
|
|
IPsec does not work on jumbograms. This is due to some specification twists
|
|
in supporting AH with jumbograms (AH header size influences payload length,
|
|
and this makes it really hard to authenticate inbound packet with jumbo payload
|
|
option as well as AH).
|
|
|
|
There are fundamental issues in *BSD support for jumbograms. We would like to
|
|
address those, but we need more time to finalize the task. To name a few:
|
|
- mbuf pkthdr.len field is typed as "int" in 4.4BSD, so it cannot hold
|
|
jumbogram with len > 2G on 32bit architecture CPUs. If we would like to
|
|
support jumbogram properly, the field must be expanded to hold 4G +
|
|
IPv6 header + link-layer header. Therefore, it must be expanded to at least
|
|
int64_t (u_int32_t is NOT enough).
|
|
- We mistakenly use "int" to hold packet length in many places. We need
|
|
to convert them into larger numeric type. It needs a great care, as we may
|
|
experience overflow during packet length computation.
|
|
- We mistakenly check for ip6_plen field of IPv6 header for packet payload
|
|
length in various places. We should be checking mbuf pkthdr.len instead.
|
|
ip6_input() will perform sanity check on jumbo payload option on input,
|
|
and we can safely use mbuf pkthdr.len afterwards.
|
|
- TCP code needs careful updates in a bunch of places, of course.
|
|
|
|
1.8 Loop prevention in header processing
|
|
|
|
IPv6 specification allows arbitrary number of extension headers to
|
|
be placed onto packets. If we implement IPv6 packet processing
|
|
code in the way BSD IPv4 code is implemented, kernel stack may
|
|
overflow due to long function call chain. KAME sys/netinet6 code
|
|
is carefully designed to avoid kernel stack overflow. Because of
|
|
this, KAME sys/netinet6 code defines its own protocol switch
|
|
structure, as "struct ip6protosw" (see netinet6/ip6protosw.h).
|
|
IPv4 part (sys/netinet) remains untouched for compatibility.
|
|
Because of this, if you receive IPsec-over-IPv4 packet with massive
|
|
number of IPsec headers, kernel stack may blow up. IPsec-over-IPv6 is okay.
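The idea of avoiding a deep call chain can be illustrated with a small
dispatch loop: each header handler returns the next header value and the new
offset instead of calling the next handler itself, so stack depth stays
constant however many headers are chained. This is a userland sketch only;
the handler names, the DONE marker and the simplified header layout are
assumptions of the example, not the ip6protosw interface.

```python
DONE = -1
HOPOPTS, UDP, DSTOPTS = 0, 17, 60


def ext_input(pkt, off):
    """Skip one extension header: [next header, (len/8)-1, data...]."""
    nxt = pkt[off]
    size = (pkt[off + 1] + 1) * 8
    return nxt, off + size


def udp_input(pkt, off):
    """Stand-in for a transport input routine; ends the chain."""
    return DONE, off


HANDLERS = {HOPOPTS: ext_input, DSTOPTS: ext_input, UDP: udp_input}


def ip6_dispatch(pkt, first_nxt, off=0):
    """Run handlers in a flat loop; returns how many were invoked."""
    nxt = first_nxt
    calls = 0
    while nxt != DONE:
        handler = HANDLERS.get(nxt)
        if handler is None:
            raise ValueError("unknown next header %d" % nxt)
        nxt, off = handler(pkt, off)
        calls += 1
    return calls
```

A recursive version, where each handler invokes the next one directly, would
grow the stack by one frame per header. The loop above handles thousands of
chained headers with constant stack depth.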
|
|
|
|
1.9 ICMPv6
|
|
|
|
After RFC2463 was published, IETF ipngwg has decided to disallow ICMPv6 error
|
|
packet against ICMPv6 redirect, to prevent ICMPv6 storm on a network medium.
|
|
KAME already implements this into the kernel.
|
|
|
|
1.10 Applications
|
|
|
|
For userland programming, we support IPv6 socket API as specified in
|
|
RFC2553, RFC2292 and upcoming internet drafts.
|
|
|
|
TCP/UDP over IPv6 is available and quite stable. You can enjoy "telnet",
"ftp", "rlogin", "rsh", "ssh", etc. These applications are protocol
independent. That is, they automatically choose IPv4 or IPv6
according to DNS.
|
|
|
|
1.11 Kernel Internals
|
|
|
|
(*) TCP/UDP part is handled differently between operating system platforms.
|
|
See 1.12 for details.
|
|
|
|
The current KAME has escaped from the IPv4 netinet logic. While
|
|
ip_forward() calls ip_output(), ip6_forward() directly calls
|
|
if_output() since routers must not divide IPv6 packets into fragments.
|
|
|
|
ICMPv6 should contain the original packet as long as possible up to
|
|
1280. UDP6/IP6 port unreach, for instance, should contain all
|
|
extension headers and the *unchanged* UDP6 and IP6 headers.
|
|
So, all IP6 functions except TCP6 never convert network byte
|
|
order into host byte order, to save the original packet.
|
|
|
|
tcp6_input(), udp6_input() and icmp6_input() can't assume that the IP6
header immediately precedes the transport headers, due to extension
headers. So, in6_cksum() was implemented to handle packets whose IP6
header and transport header are not contiguous. Neither a TCP/IP6 nor a
UDP/IP6 header structure exists for checksum calculation.
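
To show why no combined header structure is needed, here is a minimal
userland sketch of an upper-layer checksum computed from separate pieces:
the two addresses, the pseudo-header tail, and the transport data. This is
not in6_cksum() itself; the function names are hypothetical, and the sketch
assumes the data is shorter than 64KB so the 32-bit accumulator cannot
overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Add len bytes at p to a running sum of 16-bit big-endian words. */
static uint32_t sum_words(const uint8_t *p, size_t len, uint32_t sum)
{
    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)p[0] << 8;  /* pad a trailing odd byte with zero */
    return sum;
}

/* Checksum over the IPv6 pseudo-header plus upper-layer data. The
 * addresses and the upper-layer data may live in unrelated buffers. */
static uint16_t cksum6_sketch(const uint8_t src[16], const uint8_t dst[16],
                              uint8_t nxt, const uint8_t *upper, uint32_t len)
{
    /* Pseudo-header tail: 32-bit length, three zero bytes, next header. */
    uint8_t tail[8] = {
        (uint8_t)(len >> 24), (uint8_t)(len >> 16),
        (uint8_t)(len >> 8), (uint8_t)len, 0, 0, 0, nxt
    };
    uint32_t sum = 0;

    sum = sum_words(src, 16, sum);
    sum = sum_words(dst, 16, sum);
    sum = sum_words(tail, sizeof(tail), sum);
    sum = sum_words(upper, len, sum);
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);  /* fold carries back in */
    return (uint16_t)~sum;
}
```

A receiver can run the same function over data that already carries the
checksum; a result of 0 means the checksum verifies.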
|
|
|
|
To process IP6 header, extension headers and transport headers easily,
|
|
KAME requires network drivers to store packets in one internal mbuf or
|
|
one or more external mbufs. A typical old driver prepares two
|
|
internal mbufs for 100 - 208 bytes data, however, KAME's reference
|
|
implementation stores it in one external mbuf.
|
|
|
|
"netstat -s -p ip6" tells you whether or not your driver conforms to
KAME's requirement. In the following example, "cce0" violates the
requirement. (For more information, refer to Section 2.)
|
|
|
|
        Mbuf statistics:
                317 one mbuf
                two or more mbuf::
                        lo0 = 8
                        cce0 = 10
                3282 one ext mbuf
                0 two or more ext mbuf
|
|
|
|
Each input function calls IP6_EXTHDR_CHECK in the beginning to check
if the region between IP6 and its header is
contiguous. IP6_EXTHDR_CHECK calls m_pullup() only if the mbuf has the
M_LOOP flag, that is, the packet comes from the loopback
interface. m_pullup() is never called for packets coming from physical
network interfaces.
|
|
|
|
TCP6 reassembly makes use of the IP6 header to store reassembly
information. The IP6 header is not necessarily located just before TCP6, so
the ip6tcpreass structure has a pointer to the TCP6 header. Of course, it
also has a pointer back to the mbuf to avoid m_pullup().
|
|
|
|
Like TCP6, both IP and IP6 reassemble functions never call m_pullup().
|
|
|
|
xxx_ctlinput() calls in_mrejoin() on PRC_IFNEWADDR. We think this is
|
|
one of 4.4BSD implementation flaws. Since 4.4BSD keeps ia_multiaddrs
|
|
in in_ifaddr{}, it can't use multicast feature if the interface has no
|
|
unicast address. So, if an application joins to an interface and then
|
|
all unicast addresses are removed from the interface, the application
|
|
can't send/receive any multicast packets. Moreover, if a new unicast
|
|
address is assigned to the interface, in_mrejoin() must be called.
|
|
KAME's interfaces, however, ALWAYS have one link-local unicast
address. These extensions have thus not been implemented in KAME.
|
|
|
|
1.12 IPv4 mapped address and IPv6 wildcard socket
|
|
|
|
RFC2553 describes IPv4 mapped address (3.7) and special behavior
|
|
of IPv6 wildcard bind socket (3.8). The spec allows you to:
|
|
- Accept IPv4 connections by AF_INET6 wildcard bind socket.
|
|
- Transmit IPv4 packet over AF_INET6 socket by using special form of
|
|
the address like ::ffff:10.1.1.1.
|
|
but the spec itself is very complicated and does not specify how the
|
|
socket layer should behave.
|
|
Here we call the former one "listening side" and the latter one "initiating
|
|
side", for reference purposes.
|
|
|
|
Almost all KAME implementations treat tcp/udp port number space separately
|
|
between IPv4 and IPv6. You can perform wildcard bind on both of the address
|
|
families, on the same port.
|
|
|
|
There are some OS-platform differences in KAME code, as we use tcp/udp
|
|
code from different origin. The following table summarizes the behavior.
|
|
|
|
                listening side          initiating side
                (AF_INET6 wildcard      (connection to ::ffff:10.1.1.1)
                socket gets IPv4 conn.)
                ---                     ---
KAME/BSDI3      not supported           not supported
KAME/FreeBSD228 not supported           not supported
KAME/FreeBSD3x  configurable            supported
                default: enabled
KAME/NetBSD     configurable            supported
                default: disabled
KAME/BSDI4      enabled                 supported
KAME/OpenBSD    not supported           not supported
|
|
|
|
The following sections will give you more details, and how you can
|
|
configure the behavior.
|
|
|
|
Comments on listening side:
|
|
|
|
It looks like RFC2553 says too little on the wildcard bind issue,
specifically on (1) port space issue, (2) failure mode, (3) relationship
between AF_INET/INET6 wildcard bind like ordering constraint, and (4) behavior
when conflicting socket is opened/closed. There can be several separate
interpretations of this RFC which conform to it but behave differently.
So, to implement a portable application you should assume nothing
about the behavior in the kernel. Using getaddrinfo() is the safest way.
Port number space and wildcard bind issues were discussed in detail
on the ipv6imp mailing list, in mid March 1999, and it looks like there's
no concrete consensus (meaning it is up to implementers). You may want to
check the mailing list archives.
We supply a tool called "bindtest" that explores the behavior of
kernel bind(2). The tool will not be compiled by default.
|
|
|
|
If a server application would like to accept IPv4 and IPv6 connections,
|
|
it should use AF_INET and AF_INET6 socket (you'll need two sockets).
|
|
Use getaddrinfo() with AI_PASSIVE into ai_flags, and socket(2) and bind(2)
|
|
to all the addresses returned.
|
|
By opening multiple sockets, you can accept connections on the socket with
the proper address family. IPv4 connections will be accepted by the AF_INET
socket, and IPv6 connections will be accepted by the AF_INET6 socket (NOTE:
KAME/BSDI4 kernel sometimes violates this - we will fix it).
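
The server-side recipe above can be sketched as follows. This is a minimal
illustration, not code from the KAME kit: listen_all() is a hypothetical
helper, and the backlog of 5 is an arbitrary choice. A failing bind(2) is
skipped on purpose, because on some kernels the second wildcard bind on the
same port fails.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Open one listening socket per wildcard address that getaddrinfo()
 * returns. Stores at most maxfds descriptors in fds. Returns the number
 * of sockets opened, or -1 if name resolution fails. */
static int listen_all(const char *service, int *fds, int maxfds)
{
    struct addrinfo hints, *res, *ai;
    int n = 0;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* do not hardcode an address family */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE;      /* wildcard addresses for bind(2) */
    if (getaddrinfo(NULL, service, &hints, &res) != 0)
        return -1;

    for (ai = res; ai != NULL && n < maxfds; ai = ai->ai_next) {
        int s = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (s < 0)
            continue;
        if (bind(s, ai->ai_addr, ai->ai_addrlen) < 0 || listen(s, 5) < 0) {
            close(s);                 /* skip this address, try the next */
            continue;
        }
        fds[n++] = s;
    }
    freeaddrinfo(res);
    return n;
}
```

The caller would then select(2) or poll(2) across all returned descriptors
and accept(2) on whichever becomes readable.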
|
|
|
|
If you try to support IPv6 traffic only and would like to reject IPv4
|
|
traffic, always check the peer address when a connection is made toward
|
|
AF_INET6 listening socket. If the address is IPv4 mapped address, you may
|
|
want to reject the connection. You can check the condition by using
|
|
IN6_IS_ADDR_V4MAPPED() macro. This is one of the reasons the author of
|
|
the section (itojun) dislikes special behavior of AF_INET6 wildcard bind.
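
A minimal sketch of that check, assuming the peer address was obtained from
accept(2) or getpeername(2). The helper name peer_is_v4mapped() is
hypothetical.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Returns 1 if sa is an AF_INET6 address that holds an IPv4 mapped
 * address (::ffff:a.b.c.d), 0 otherwise. */
static int peer_is_v4mapped(const struct sockaddr *sa)
{
    const struct sockaddr_in6 *sin6;

    if (sa->sa_family != AF_INET6)
        return 0;
    sin6 = (const struct sockaddr_in6 *)sa;
    return IN6_IS_ADDR_V4MAPPED(&sin6->sin6_addr) ? 1 : 0;
}
```

An IPv6-only server would close the accepted socket when this returns 1.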
|
|
|
|
Comments on initiating side:
|
|
|
|
Advice to application implementers: to implement a portable IPv6 application
(which works on multiple IPv6 kernels), we believe that the following
is the key to success:
- NEVER hardcode AF_INET nor AF_INET6.
- Use getaddrinfo() and getnameinfo() throughout the system.
  Never use gethostby*(), getaddrby*(), inet_*() or getipnodeby*().
- If you would like to connect to a destination, use getaddrinfo() and try
  all the destinations returned, like telnet does.
- Some IPv6 stacks are shipped with a buggy getaddrinfo(). Ship a minimal
  working version with your application and use that as a last resort.
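
The "try all the destinations" rule can be sketched like this. It is a
minimal illustration with a hypothetical helper name (connect_any()), not
the telnet source.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Resolve host/service and try each returned address in order.
 * Returns a connected socket, or -1 if every attempt fails. */
static int connect_any(const char *host, const char *service)
{
    struct addrinfo hints, *res, *ai;
    int s = -1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6, whatever resolves */
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, service, &hints, &res) != 0)
        return -1;

    for (ai = res; ai != NULL; ai = ai->ai_next) {
        s = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (s < 0)
            continue;
        if (connect(s, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                    /* success, keep this socket */
        close(s);
        s = -1;
    }
    freeaddrinfo(res);
    return s;
}
```

No address family appears in the code apart from AF_UNSPEC, so the same
binary works on IPv4-only, IPv6-only and dual stack nodes.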
|
|
|
|
If you would like to use AF_INET6 socket for both IPv4 and IPv6 outgoing
|
|
connection, you will need tweaked implementation in DNS support libraries,
|
|
as documented in RFC2553 6.1. KAME libinet6 includes the tweak in
|
|
getipnodebyname(). Note that getipnodebyname() itself is not recommended as
|
|
it does not handle scoped IPv6 addresses at all. For IPv6 name resolution
|
|
getaddrinfo() is the preferred API. getaddrinfo() does not implement the
|
|
tweak.
|
|
|
|
When writing applications that make outgoing connections, the story becomes
much simpler if you treat AF_INET and AF_INET6 as totally separate address
families. The {set,get}sockopt issue becomes simpler, and the DNS issue will
be made simpler. We do not recommend that you rely upon IPv4 mapped addresses.
|
|
|
|
1.12.1 KAME/BSDI3 and KAME/FreeBSD228
|
|
|
|
The platforms do not support IPv4 mapped address at all (both listening side
|
|
and initiating side). AF_INET6 and AF_INET sockets are totally separated.
|
|
|
|
Port number space is totally separate between AF_INET and AF_INET6 sockets.
|
|
|
|
1.12.2 KAME/FreeBSD3x
|
|
|
|
KAME/FreeBSD3x uses shared tcp4/6 code (from sys/netinet/tcp*) and shared
|
|
udp4/6 code (from sys/netinet/udp*). It uses unified inpcb/in6pcb structure.
|
|
|
|
1.12.2.1 KAME/FreeBSD3x, listening side
|
|
|
|
The platform can be configured to support IPv4 mapped address/special
|
|
AF_INET6 wildcard bind (enabled by default). There is no kernel compilation
|
|
option to disable it. You can enable/disable the behavior with sysctl
|
|
(per-node), or setsockopt (per-socket).
|
|
|
|
Wildcard AF_INET6 socket grabs IPv4 connection if and only if the following
|
|
conditions are satisfied:
|
|
- there's no AF_INET socket that matches the IPv4 connection
|
|
- the AF_INET6 socket is configured to accept IPv4 traffic, i.e.
|
|
getsockopt(IPV6_BINDV6ONLY) returns 0.
|
|
|
|
(XXX need checking)
|
|
|
|
1.12.2.2 KAME/FreeBSD3x, initiating side
|
|
|
|
KAME/FreeBSD3x supports outgoing connection to IPv4 mapped address
|
|
(::ffff:10.1.1.1), if the node is configured to accept IPv4 connections
|
|
by AF_INET6 socket.
|
|
|
|
(XXX need checking)
|
|
|
|
1.12.3 KAME/NetBSD
|
|
|
|
KAME/NetBSD uses shared tcp4/6 code (from sys/netinet/tcp*) and shared
|
|
udp4/6 code (from sys/netinet/udp*). The implementation is made differently
|
|
from KAME/FreeBSD3x. KAME/NetBSD uses separate inpcb/in6pcb structures,
|
|
while KAME/FreeBSD3x uses merged inpcb structure.
|
|
|
|
1.12.3.1 KAME/NetBSD, listening side
|
|
|
|
The platform can be configured to support IPv4 mapped address/special AF_INET6
|
|
wildcard bind (disabled by default). Kernel behavior can be summarized as
|
|
follows:
|
|
- default: special support code will be compiled in, but is disabled by
|
|
default. It can be controlled by sysctl (net.inet6.ip6.bindv6only),
|
|
or setsockopt(IPV6_BINDV6ONLY).
|
|
- add "INET6_BINDV6ONLY": No special support code for AF_INET6 wildcard socket
  will be compiled in. AF_INET6 sockets and AF_INET sockets are totally
  separate. The behavior is similar to what is described in 1.12.1.
|
|
|
|
The sysctl setting will affect per-socket configuration at in6pcb creation
time only. In other words, per-socket configuration will be copied from sysctl
configuration at in6pcb creation time. To change per-socket behavior, you
must perform setsockopt or reopen the socket. A change in sysctl configuration
will not change the behavior of sockets that are already opened.
|
|
|
|
Wildcard AF_INET6 socket grabs IPv4 connection if and only if the following
|
|
conditions are satisfied:
|
|
- there's no AF_INET socket that matches the IPv4 connection
|
|
- the AF_INET6 socket is configured to accept IPv4 traffic, i.e.
|
|
getsockopt(IPV6_BINDV6ONLY) returns 0.
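
The per-socket setting can be sketched as below. This is an illustration
under an assumption: it uses the option name IPV6_V6ONLY, under which this
socket option was later standardized, and falls back to IPV6_BINDV6ONLY only
where the newer name is missing. v6only_socket() is a hypothetical helper.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IPV6_V6ONLY
#define IPV6_V6ONLY IPV6_BINDV6ONLY /* assumption: KAME-era header naming */
#endif

/* Create an AF_INET6 TCP socket that will not grab IPv4 connections.
 * Returns the descriptor, or -1 on failure. */
static int v6only_socket(void)
{
    int on = 1;
    int s = socket(AF_INET6, SOCK_STREAM, 0);

    if (s < 0)
        return -1;
    if (setsockopt(s, IPPROTO_IPV6, IPV6_V6ONLY, &on, sizeof(on)) < 0) {
        close(s);
        return -1;
    }
    return s;
}
```

Setting the option explicitly makes the program independent of the sysctl
default on the node where it runs.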
|
|
|
|
You cannot bind(2) with IPv4 mapped address. This is a workaround for port
|
|
number duplicate and other twists.
|
|
|
|
1.12.3.2 KAME/NetBSD, initiating side
|
|
|
|
When you initiate a connection, you can always connect to an IPv4 destination
over an AF_INET6 socket, using an IPv4 mapped address destination
(::ffff:10.1.1.1). This is enabled independently from the configuration for
the listening side, and is always enabled.
|
|
|
|
1.12.4 KAME/BSDI4
|
|
|
|
KAME/BSDI4 uses NRL-based TCP/UDP stack and inpcb source code,
which was derived from NRL IPv6/IPsec stack. I guess it supports IPv4 mapped
address and special AF_INET6 wildcard bind. The implementation is, again,
different from other KAME/*BSDs.
|
|
|
|
1.12.4.1 KAME/BSDI4, listening side
|
|
|
|
NRL inpcb layer supports special behavior of AF_INET6 wildcard socket.
|
|
There is no way to disable the behavior.
|
|
|
|
Wildcard AF_INET6 socket grabs IPv4 connection if and only if the following
|
|
condition is satisfied:
|
|
- there's no AF_INET socket that matches the IPv4 connection
|
|
|
|
1.12.4.2 KAME/BSDI4, initiating side
|
|
|
|
KAME/BSDI4 supports connection initiation to IPv4 mapped address
(like ::ffff:10.1.1.1).
|
|
|
|
1.12.5 KAME/OpenBSD
|
|
|
|
KAME/OpenBSD uses NRL-based TCP/UDP stack and inpcb source code,
|
|
which was derived from NRL IPv6/IPsec stack.
|
|
|
|
1.12.5.1 KAME/OpenBSD, listening side
|
|
|
|
KAME/OpenBSD disables special behavior on AF_INET6 wildcard bind for
|
|
security reasons (if IPv4 traffic toward AF_INET6 wildcard bind is allowed,
|
|
access control will become much harder). KAME/BSDI4 uses NRL-based TCP/UDP
|
|
stack as well, however, the behavior is different due to OpenBSD's security
|
|
policy.
|
|
|
|
As a result the behavior of KAME/OpenBSD is similar to KAME/BSDI3 and
|
|
KAME/FreeBSD228 (see 1.12.1 for more detail).
|
|
|
|
1.12.5.2 KAME/OpenBSD, initiating side
|
|
|
|
KAME/OpenBSD does not support connection initiation to IPv4 mapped address
|
|
(like ::ffff:10.1.1.1).
|
|
|
|
1.12.6 More issues
|
|
|
|
IPv4 mapped address support adds a big requirement to EVERY userland codebase.
|
|
Every userland code should check if an AF_INET6 sockaddr contains IPv4
|
|
mapped address or not. This adds many twists:
|
|
|
|
- Access control code becomes harder to write.
  For example, if you would like to reject packets from 10.0.0.0/8,
  you need to reject packets to AF_INET socket from 10.0.0.0/8,
  and to AF_INET6 socket from ::ffff:10.0.0.0/104.
- If a protocol on top of IPv4 is defined differently from IPv6, we will get
  a very tricky situation.
|
|
  For example, with FTP protocol, we cannot simply use sa_family to determine
  FTP command sets. The following example is incorrect:
        if (sa_family == AF_INET)
                use EPSV/EPRT or PASV/PORT;     /*IPv4*/
        else if (sa_family == AF_INET6)
                use EPSV/EPRT or LPSV/LPRT;     /*IPv6*/
        else
                error;
  Under SIIT environment, the correct code would be:
        if (sa_family == AF_INET)
                use EPSV/EPRT or PASV/PORT;     /*IPv4*/
        else if (sa_family == AF_INET6 && IPv4 mapped address)
                use EPSV/EPRT or PASV/PORT;     /*IPv4 command set on AF_INET6*/
        else if (sa_family == AF_INET6 && !IPv4 mapped address)
                use EPSV/EPRT or LPSV/LPRT;     /*IPv6*/
        else
                error;
  (not sure if the above code fragment is perfect for all situations)
|
|
- By enabling kernel support for IPv4 mapped address (outgoing direction),
|
|
servers on the kernel can be hosed by IPv6 native packet that has IPv4
|
|
mapped address in IPv6 header source, and can generate unwanted IPv4 packets.
|
|
http://playground.iijlab.net/i-d/draft-itojun-ipv6-transition-abuse-00.txt
|
|
talks more about this scenario.
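
The access control item in the list above can be sketched as one check that
covers both address families. This is a minimal illustration; from_net10()
is a hypothetical helper, not part of any KAME program.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Returns 1 if the peer is in 10.0.0.0/8, whether it arrives as a plain
 * AF_INET address or as ::ffff:10.x.x.x on an AF_INET6 socket. */
static int from_net10(const struct sockaddr *sa)
{
    if (sa->sa_family == AF_INET) {
        const struct sockaddr_in *sin = (const struct sockaddr_in *)sa;
        return (ntohl(sin->sin_addr.s_addr) >> 24) == 10;
    }
    if (sa->sa_family == AF_INET6) {
        const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)sa;
        if (!IN6_IS_ADDR_V4MAPPED(&sin6->sin6_addr))
            return 0;
        /* Bytes 12..15 hold the embedded IPv4 address. */
        return sin6->sin6_addr.s6_addr[12] == 10;
    }
    return 0;
}
```

Forgetting the AF_INET6 branch is exactly the kind of mistake that lets a
filter be bypassed on kernels that support IPv4 mapped addresses.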
|
|
|
|
Due to the above twists, some of KAME userland programs have restrictions on
the use of IPv4 mapped addresses:
|
|
- rshd/rlogind do not accept connections from IPv4 mapped address.
|
|
This is to avoid malicious use of IPv4 mapped address in IPv6 native
|
|
packet, to bypass source-address based authentication.
|
|
- ftp/ftpd does not support SIIT environment. IPv4 mapped address will be
|
|
decoded in userland, and will be passed to AF_INET sockets
|
|
(SIIT client should pass IPv4 mapped address as is, to AF_INET6 sockets).
|
|
|
|
1.13 sockaddr_storage
|
|
|
|
When RFC2553 was about to be finalized, there was discussion on how struct
sockaddr_storage members should be named. One proposal was to prepend "__" to
the members (like "__ss_len") as they should not be touched. The other proposal
was not to prepend it (like "ss_len") as we need to touch those members
directly. There was no clear consensus on it.
|
|
|
|
As a result, RFC2553 defines struct sockaddr_storage as follows:
        struct sockaddr_storage {
                u_char  __ss_len;       /* address length */
                u_char  __ss_family;    /* address family */
                /* and bunch of padding */
        };
In contrast, the XNET draft defines it as follows:
        struct sockaddr_storage {
                u_char  ss_len;         /* address length */
                u_char  ss_family;      /* address family */
                /* and bunch of padding */
        };
|
|
|
|
In December 1999, it was agreed that RFC2553bis should pick the latter (XNET)
|
|
definition.
|
|
|
|
KAME kit prior to December 1999 used RFC2553 definition. KAME kit after
|
|
December 1999 (including December) will conform to XNET definition,
|
|
based on RFC2553bis discussion.
|
|
|
|
If you look at multiple IPv6 implementations, you will be able to see
both definitions. As a userland programmer, the most portable way of
dealing with it is to:
(1) ensure ss_family and/or ss_len are available on the platform, by using
    GNU autoconf,
(2) have -Dss_family=__ss_family to unify all occurrences (including header
    file) into __ss_family, or
(3) never touch __ss_family. cast to sockaddr * and use sa_family like:
        struct sockaddr_storage ss;
        family = ((struct sockaddr *)&ss)->sa_family;
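
Option (3) can be wrapped in a small helper so that no other code needs to
know which member naming the platform uses. A minimal sketch; the name
ss_family_of() is hypothetical.

```c
#include <sys/socket.h>

/* Read the address family without touching ss_family or __ss_family. */
static int ss_family_of(const struct sockaddr_storage *ss)
{
    return ((const struct sockaddr *)ss)->sa_family;
}
```

This works with either definition because the family field sits at the
same position in struct sockaddr and struct sockaddr_storage.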
|
|
|
|
1.14 Invalid addresses on the wire
|
|
|
|
Some IPv6 transition technologies embed an IPv4 address into an IPv6 address.
These specifications themselves are fine, however, there can be a certain
set of attacks enabled by these specifications. Recent specification
documents cover those issues, however, there are already-published RFCs
that do not have protection against those (like using source address of
::ffff:127.0.0.1 to bypass "reject packet from remote" filter).
|
|
|
|
To name a few, these address ranges can be used to hose an IPv6 implementation,
|
|
or bypass security controls:
|
|
- IPv4 mapped address that embeds unspecified/multicast/loopback/broadcast
  IPv4 address (if they are in IPv6 native packet header, they are malicious)
        ::ffff:0.0.0.0/104      ::ffff:127.0.0.0/104
        ::ffff:224.0.0.0/100    ::ffff:255.0.0.0/104
- 6to4 prefix generated from unspecified/multicast/loopback/broadcast/private
  IPv4 address
        2002:0000::/24  2002:7f00::/24  2002:e000::/20
        2002:ff00::/24  2002:0a00::/24  2002:ac10::/28
        2002:c0a8::/32
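
The ranges listed above can be expressed as a classifier. The following is
a userland sketch with hypothetical names, not the kernel's actual checks.
It treats IPv4 multicast as 224.0.0.0/4, and applies the private-address
test only to 6to4 prefixes, following the two list items.

```c
#include <netinet/in.h>
#include <stdint.h>

/* v4 points at four bytes of an embedded IPv4 address. */
static int bad_embedded_v4(const uint8_t *v4, int check_private)
{
    if (v4[0] == 0 || v4[0] == 127 || v4[0] == 255)
        return 1;                         /* unspecified, loopback, broadcast */
    if ((v4[0] & 0xf0) == 0xe0)
        return 1;                         /* multicast, 224.0.0.0/4 */
    if (check_private) {
        if (v4[0] == 10)
            return 1;                     /* 10.0.0.0/8 */
        if (v4[0] == 172 && (v4[1] & 0xf0) == 16)
            return 1;                     /* 172.16.0.0/12 */
        if (v4[0] == 192 && v4[1] == 168)
            return 1;                     /* 192.168.0.0/16 */
    }
    return 0;
}

/* Returns 1 if the address falls into one of the suspicious ranges. */
static int suspicious_on_wire(const struct in6_addr *a)
{
    if (IN6_IS_ADDR_V4MAPPED(a))
        return bad_embedded_v4(&a->s6_addr[12], 0);
    if (a->s6_addr[0] == 0x20 && a->s6_addr[1] == 0x02)   /* 2002::/16 */
        return bad_embedded_v4(&a->s6_addr[2], 1);
    return 0;
}
```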
|
|
|
|
Also, since KAME does not support RFC1933 auto tunnels, seeing IPv4 compatible
addresses is very rare. You should take caution if you see those on the wire.
|
|
|
|
KAME code is carefully written to avoid such incidents. More specifically,
KAME kernel will reject packets with certain source/destination addresses in
the IPv6 base header, or IPv6 routing header. Also, KAME default configuration
file is written carefully, to avoid those attacks.
|
|
|
|
http://playground.iijlab.net/i-d/draft-itojun-ipv6-transition-abuse-00.txt
|
|
talks more about this.
|
|
|
|
1.15 Node's required addresses
|
|
|
|
RFC2373 section 2.8 talks about required addresses for an IPv6
node. This section talks about how the KAME stack manages those required
addresses.
|
|
|
|
1.15.1 Host case
|
|
|
|
The following items are automatically assigned to the node (or the node will
automatically join the group), at bootstrap time:
|
|
- Loopback address
|
|
- All-nodes multicast addresses (ff01::1)
|
|
|
|
The following items will be automatically handled when the interface becomes
|
|
IFF_UP:
|
|
- Its link-local address for each interface
|
|
- Solicited-node multicast address for link-local addresses
|
|
- Link-local allnodes multicast address (ff02::1)
|
|
|
|
The following items need to be configured manually by ifconfig(8) or prefix(8).
|
|
Alternatively, these can be autoconfigured by using stateless address
|
|
autoconfiguration.
|
|
- Assigned unicast/anycast addresses
|
|
- Solicited-Node multicast address for assigned unicast address
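
For reference, the solicited-node multicast address is formed from the low
24 bits of the unicast address appended to the prefix ff02::1:ff00:0/104.
A minimal sketch with a hypothetical function name:

```c
#include <netinet/in.h>
#include <string.h>

/* Build ff02::1:ffXX:XXXX from the low 24 bits of a unicast address. */
static void solicited_node(const struct in6_addr *uni, struct in6_addr *mc)
{
    memset(mc, 0, sizeof(*mc));
    mc->s6_addr[0] = 0xff;
    mc->s6_addr[1] = 0x02;
    mc->s6_addr[11] = 0x01;
    mc->s6_addr[12] = 0xff;
    memcpy(&mc->s6_addr[13], &uni->s6_addr[13], 3);
}
```

For example, fe80::202:b3ff:fe1e:8329 maps to ff02::1:ff1e:8329.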
|
|
|
|
Users can join groups by using appropriate system calls like setsockopt(2).
|
|
|
|
1.15.2 Router case
|
|
|
|
In addition to the above, routers need to handle the following items.
|
|
|
|
The following items need to be configured manually by using ifconfig(8).
|
|
o The subnet-router anycast addresses for the interfaces it is configured
|
|
to act as a router on (prefix::/64)
|
|
o All other anycast addresses with which the router has been configured
|
|
|
|
The router will join the following multicast group when rtadvd(8) is available
|
|
for the interface.
|
|
o All-Routers Multicast Addresses (ff02::2)
|
|
|
|
Routing daemons will join appropriate multicast groups, as necessary,
|
|
like ff02::9 for RIPng.
|
|
|
|
Users can join groups by using appropriate system calls like setsockopt(2).
|
|
|
|
2. Network Drivers
|
|
|
|
KAME requires two items to be added into the standard drivers:
|
|
|
|
(1) mbuf clustering requirement. In this stable release, we changed
|
|
MINCLSIZE into MHLEN+1 for all the operating systems in order to make
|
|
all the drivers behave as we expect.
|
|
|
|
(2) multicast. If "ifmcstat" yields no multicast group for an
interface, that interface has to be patched.
|
|
|
|
To avoid trouble, we suggest that you comment out the device drivers
for unsupported/unnecessary cards in the kernel configuration file.
If you accidentally enable unsupported drivers, some of the userland
tools may not work correctly (routing daemons are a typical example).
|
|
|
|
In the following sections, "official support" means that KAME developers
|
|
are using that ethernet card/driver frequently.
|
|
|
|
(NOTE: In the past we required all pcmcia drivers to have a call to
|
|
in6_ifattach(). We have no such requirement any more)
|
|
|
|
2.1 FreeBSD 2.2.x-RELEASE
|
|
|
|
Here is a list of FreeBSD 2.2.x-RELEASE drivers and their conditions:

        driver  mbuf(1)         multicast(2)    official support?
        ---     ---             ---             ---
        (Ethernet)
        ar      looks ok        -               -
        cnw     ok              ok              yes (*)
        ed      ok              ok              yes
        ep      ok              ok              yes
        fe      ok              ok              yes
        sn      looks ok        -               -   (*)
        vx      looks ok        -               -
        wlp     ok              ok              -   (*)
        xl      ok              ok              yes
        zp      ok              ok              -
        (FDDI)
        fpa     looks ok        ?               -
        (ATM)
        en      ok              ok              yes
        (Serial)
        lp      ?               -               not work
        sl      ?               -               not work
        sr      looks ok        ok              -   (**)
|
|
|
|
You may want to add an invocation of "rtsol" in "/etc/pccard_ether",
|
|
if you are using notebook computers and PCMCIA ethernet card.
|
|
|
|
(*) These drivers are distributed with PAO (http://www.jp.freebsd.org/PAO/).
|
|
|
|
(**) There was a report that, if you bring the sr driver up, down and
then up again, the kernel may hang. We have disabled frame-relay support in
the sr driver, and since then it looks to be working fine. If you need
frame-relay support to come back, please contact KAME developers.
|
|
|
|
2.2 BSD/OS 3.x
|
|
|
|
The following lists BSD/OS 3.x device drivers and their conditions:

        driver  mbuf(1)         multicast(2)    official support?
        ---     ---             ---             ---
        (Ethernet)
        cnw     ok              ok              yes
        de      ok              ok              -
        df      ok              ok              -
        eb      ok              ok              -
        ef      ok              ok              yes
        exp     ok              ok              -
        mz      ok              ok              yes
        ne      ok              ok              yes
        we      ok              ok              -
        (FDDI)
        fpa     ok              ok              -
        (ATM)
        en      maybe           ok              -
        (Serial)
        ntwo    ok              ok              yes
        sl      ?               -               not work
        appp    ?               -               not work
|
|
|
|
You may want to use "@insert" directive in /etc/pccard.conf to invoke
|
|
"rtsol" command right after dynamic insertion of PCMCIA ethernet cards.
|
|
|
|
2.3 NetBSD
|
|
|
|
The following table lists the network drivers we have tried so far.
|
|
|
|
        driver                  mbuf(1)         multicast(2)    official support?
        ---                     ---             ---             ---
        (Ethernet)
        awi     pcmcia/i386     ok              ok              -
        bah     zbus/amiga      NG(*)
        cnw     pcmcia/i386     ok              ok              yes
        ep      pcmcia/i386     ok              ok              -
        le      sbus/sparc      ok              ok              yes
        ne      pci/i386        ok              ok              yes
        ne      pcmcia/i386     ok              ok              yes
        wi      pcmcia/i386     ok              ok              yes
        (ATM)
        en      pci/i386        ok              ok              -
|
|
|
|
(*) This may need some fix, but I'm not sure what arcnet interfaces assume...
|
|
|
|
2.4 FreeBSD 3.x-RELEASE
|
|
|
|
Here is a list of FreeBSD 3.x-RELEASE drivers and their conditions:

        driver  mbuf(1)         multicast(2)    official support?
        ---     ---             ---             ---
        (Ethernet)
        cnw     ok              ok              -(*)
        ed      ?               ok              -
        ep      ok              ok              -
        fe      ok              ok              yes
        fxp     ?(**)
        lnc     ?               ok              -
        sn      ?               ?               -(*)
        wi      ok              ok              yes
        xl      ?               ok              -
|
|
|
|
(*) These drivers are distributed with PAO as PAO3
|
|
(http://www.jp.freebsd.org/PAO/).
|
|
(**) there are trouble reports with multicast filter initialization.
|
|
|
|
More drivers will just simply work on KAME FreeBSD 3.x-RELEASE but have not
|
|
been checked yet.
|
|
|
|
2.5 OpenBSD 2.x
|
|
|
|
Here is a list of OpenBSD 2.x drivers and their conditions:

        driver                  mbuf(1)         multicast(2)    official support?
        ---                     ---             ---             ---
        (Ethernet)
        de      pci/i386        ok              ok              yes
        fxp     pci/i386        ?(*)
        le      sbus/sparc      ok              ok              yes
        ne      pci/i386        ok              ok              yes
        ne      pcmcia/i386     ok              ok              yes
|
|
|
|
(*) There seems to be a problem in the driver with multicast filter
configuration. This happens with certain revisions of the chipset on the card.
It should be fixed by now by a workaround in sys/net/if.c, but we are still
not sure.
|
|
|
|
2.6 BSD/OS 4.x
|
|
|
|
The following lists BSD/OS 4.x device drivers and their conditions:

        driver  mbuf(1)         multicast(2)    official support?
        ---     ---             ---             ---
        (Ethernet)
        de      ok              ok              yes
        exp     (*)
|
|
|
|
You may want to use "@insert" directive in /etc/pccard.conf to invoke
|
|
"rtsol" command right after dynamic insertion of PCMCIA ethernet cards.
|
|
|
|
(*) exp driver has serious conflict with KAME initialization sequence.
|
|
A workaround is committed into sys/i386/pci/if_exp.c, and should be okay by now.
|
|
|
|
3. Translator
|
|
|
|
We categorize IPv4/IPv6 translator into 4 types.
|
|
|
|
Translator A --- It is used in the early stage of transition to make
|
|
it possible to establish a connection from an IPv6 host in an IPv6
|
|
island to an IPv4 host in the IPv4 ocean.
|
|
|
|
Translator B --- It is used in the early stage of transition to make
|
|
it possible to establish a connection from an IPv4 host in the IPv4
|
|
ocean to an IPv6 host in an IPv6 island.
|
|
|
|
Translator C --- It is used in the late stage of transition to make it
|
|
possible to establish a connection from an IPv4 host in an IPv4 island
|
|
to an IPv6 host in the IPv6 ocean.
|
|
|
|
Translator D --- It is used in the late stage of transition to make it
|
|
possible to establish a connection from an IPv6 host in the IPv6 ocean
|
|
to an IPv4 host in an IPv4 island.
|
|
|
|
KAME provides a TCP relay translator for category A. This is called
"FAITH". We also provide an IP header translator for category A.
|
|
|
|
3.1 FAITH TCP relay translator
|
|
|
|
FAITH system uses TCP relay daemon called "faithd" helped by the KAME kernel.
|
|
FAITH will reserve an IPv6 address prefix, and relay TCP connection
|
|
toward that prefix to IPv4 destination.
|
|
|
|
For example, if the reserved IPv6 prefix is 3ffe:0501:0200:ffff::, and
|
|
the IPv6 destination for TCP connection is 3ffe:0501:0200:ffff::163.221.202.12,
|
|
the connection will be relayed toward IPv4 destination 163.221.202.12.
|
|
|
|
        destination IPv4 node (163.221.202.12)
          ^
          | IPv4 tcp toward 163.221.202.12
        FAITH-relay dual stack node
          ^
          | IPv6 TCP toward 3ffe:0501:0200:ffff::163.221.202.12
        source IPv6 node
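
The address mapping in the example above can be sketched as follows. This
is an illustration, not faithd code: it assumes the reserved prefix is 96
bits long so that the IPv4 destination occupies the last 32 bits, and the
function name is hypothetical.

```c
#include <netinet/in.h>
#include <string.h>

/* If dst6 lies inside the reserved prefix (assumed to be /96), copy the
 * embedded IPv4 destination into dst4 and return 1. Otherwise return 0. */
static int faith_embedded_v4(const struct in6_addr *prefix,
                             const struct in6_addr *dst6,
                             struct in_addr *dst4)
{
    if (memcmp(prefix->s6_addr, dst6->s6_addr, 12) != 0)
        return 0;
    memcpy(&dst4->s_addr, &dst6->s6_addr[12], 4);  /* network byte order */
    return 1;
}
```

With the prefix 3ffe:0501:0200:ffff:: and the destination
3ffe:0501:0200:ffff::163.221.202.12, dst4 receives 163.221.202.12.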
|
|
|
|
faithd must be invoked on FAITH-relay dual stack node.
|
|
|
|
For more details, consult kame/kame/faithd/README and
|
|
draft-ietf-ngtrans-tcpudp-relay-01.txt.
|
|
|
|
3.2 IPv6-to-IPv4 header translator
|
|
|
|
# removed since it is not imported to NetBSD-current
|
|
|
|
4. IPsec
|
|
|
|
IPsec is implemented as the following three components.
|
|
|
|
(1) Policy Management
|
|
(2) Key Management
|
|
(3) AH, ESP and IPComp handling in kernel
|
|
|
|
Note that KAME/OpenBSD does NOT include support for KAME IPsec code,
|
|
as OpenBSD team has their home-brew IPsec stack and they have no plan
|
|
to replace it. IPv6 support for IPsec is, therefore, lacking on KAME/OpenBSD.
|
|
|
|
4.1 Policy Management
|
|
|
|
The kernel implements experimental policy management code. There are two ways
to manage security policy. One is to configure per-socket policy using
setsockopt(2). In this case, policy configuration is described in
ipsec_set_policy(3). The other is to configure kernel packet filter-based
policy using PF_KEY interface, via setkey(8).
|
|
|
|
The policy entry will be matched in order. The order of entries makes
|
|
difference in behavior.
|
|
|
|
4.2 Key Management
|
|
|
|
The key management code implemented in this kit (sys/netkey) is a
|
|
home-brew PFKEY v2 implementation. This conforms to RFC2367.
|
|
|
|
The home-brew IKE daemon, "racoon" is included in the kit (kame/kame/racoon,
|
|
or usr.sbin/racoon).
|
|
Basically you'll need to run racoon as daemon, then setup a policy
|
|
to require keys (like ping -P 'out ipsec esp/transport//use').
|
|
The kernel will contact racoon daemon as necessary to exchange keys.
|
|
|
|
In the IKE spec, there's ambiguity about the interpretation of a "tunnel"
proposal. For example, if we would like to propose the use of the following
packet:
        IP AH ESP IP payload
some implementations propose it as "AH transport and ESP tunnel", since
this is more logical from a packet construction point of view. Some
implementations propose it as "AH tunnel and ESP tunnel".
Racoon follows the former route.
This raises a real interoperability issue. We hope this will be resolved
quickly.
|
|
|
|
4.3 AH and ESP handling
|
|
|
|
IPsec module is implemented as "hooks" to the standard IPv4/IPv6
|
|
processing. When sending a packet, ip{,6}_output() checks if ESP/AH
|
|
processing is required by checking if a matching SPD (Security
|
|
Policy Database) is found. If ESP/AH is needed,
|
|
{esp,ah}{4,6}_output() will be called and mbuf will be updated
|
|
accordingly. When a packet is received, {esp,ah}4_input() will be
|
|
called based on protocol number, i.e. (*inetsw[proto])().
|
|
{esp,ah}4_input() will decrypt/check authenticity of the packet,
and strip off the daisy-chained header and padding for ESP/AH. It is
safe to strip off the ESP/AH header on packet reception, since we
will never use the received packet in "as is" form.
|
|
|
|
By using ESP/AH, TCP4/6 effective data segment size will be affected by
|
|
extra daisy-chained headers inserted by ESP/AH. Our code takes care of
|
|
the case.
|
|
|
|
Basic crypto functions can be found in directory "sys/crypto". ESP/AH
|
|
transform are listed in {esp,ah}_core.c with wrapper functions. If you
|
|
wish to add some algorithm, add wrapper function in {esp,ah}_core.c, and
|
|
add your crypto algorithm code into sys/crypto.
|
|
|
|
Tunnel mode works basically fine, but comes with the following restrictions:
|
|
- You cannot run routing daemon across IPsec tunnel, since we do not model
|
|
IPsec tunnel as pseudo interfaces.
|
|
- Authentication model for AH tunnel must be revisited. We'll need to
|
|
improve the policy management engine, eventually.
|
|
- Tunnelling for IPv6 IPsec is still incomplete. This is disabled by default.
|
|
If you need to perform experiments, add "options IPSEC_IPV6FWD" into
|
|
the kernel configuration file. Note that path MTU discovery does not work
|
|
across IPv6 IPsec tunnel gateway due to insufficient code.
|
|
|
|
The AH specification does not talk much about the "multiple AH on a packet"
case. We incrementally compute AH checksum, from inside to outside. Also, we
treat inner AH to be immutable.
For example, if we are to create the following packet:
        IP AH1 AH2 AH3 payload
we do it incrementally. As a result, we get crypto checksums like below:
        AH3 has checksum against "IP AH3' payload".
                where AH3' = AH3 with checksum field filled with 0.
        AH2 has checksum against "IP AH2' AH3 payload".
        AH1 has checksum against "IP AH1' AH2 AH3 payload".
Also note that AH3 has the smallest sequence number, and AH1 has the largest
sequence number.
|
|
|
|
4.4 IPComp handling
|
|
|
|
IPComp stands for IP payload compression protocol. This is aimed at
payload compression, not header compression like PPP VJ compression.
|
|
This may be useful when you are using slow serial link (say, cell phone)
|
|
with powerful CPU (well, recent notebook PCs are really powerful...).
|
|
The protocol design of IPComp is very similar to IPsec, though it was
|
|
defined separately from IPsec itself.
|
|
|
|
Here are some points to be noted:
- IPComp is treated as part of the IPsec protocol suite, and the SPI and
  CPI spaces are unified. The spec says that there is no relationship
  between the two, so the specs assume they are separate.
- IPComp association (IPCA) is kept in SAD.
- It is possible to use a well-known CPI (CPI=2 for DEFLATE, for example)
  for outbound/inbound packets, but for indexing purposes one element from
  the SPI/CPI space will be occupied anyway.
- pfkey is modified to support IPComp. However, there is no official
  SA type number assignment yet. Portability with other IPComp
  stacks is questionable (anyway, who else implements IPComp on UN*X?).
- The spec says that IPComp output processing must be performed before
  AH/ESP output processing, to achieve a better compression ratio and to
  "stir" the data stream before encryption. The most meaningful processing
  order is: (1) compress payload by IPComp, (2) encrypt payload by ESP,
  then (3) attach authentication data by AH.
  However, with manual SPD setting, you are able to violate the ordering
  (KAME code is too generic, maybe). Also, it is perfectly fine to use
  IPComp alone, without AH/ESP.
- Though the packet size can be significantly decreased by using IPComp, no
  special consideration is made about path MTU (the spec says nothing about
  MTU). IPComp seems to be designed for serial links, not ethernet-like
  media.
- You can change the compression ratio on outbound packets by changing
  deflate_policy in sys/netinet6/ipcomp_core.c. You can also change the
  outbound history buffer size by changing deflate_window_out in the same
  source file.
  (should it be sysctl accessible, or per-SAD configurable?)
- Tunnel mode IPComp is not working right. A KAME box can generate tunnelled
  IPComp packets, but cannot accept tunnelled IPComp packets.
- You can negotiate an IPComp association with the racoon IKE daemon.
- KAME code does not attach an Adler32 checksum to compressed data.
  See the ipsec wg mailing list discussion in Jan 2000 for details.

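To illustrate the last two DEFLATE-related points, here is a small Python
sketch using the standard zlib module. It is not the kernel code. The
"level" and "window_bits" parameters only play roles similar to
deflate_policy and deflate_window_out; how those kernel variables map onto
zlib parameters is an assumption here and should be checked against
ipcomp_core.c. A negative window size asks zlib for a raw DEFLATE stream,
that is, one without the zlib header and without the trailing Adler32
checksum.

```python
import zlib


def ipcomp_deflate(payload, level=6, window_bits=12):
    """Compress payload as a raw DEFLATE stream (no header, no Adler32)."""
    c = zlib.compressobj(level, zlib.DEFLATED, -window_bits)
    return c.compress(payload) + c.flush()


def ipcomp_inflate(data, window_bits=15):
    """Decompress a raw DEFLATE stream."""
    return zlib.decompress(data, -window_bits)


payload = b"GET /index.html HTTP/1.0\r\n" * 40
raw = ipcomp_deflate(payload)

# For comparison: the zlib format appends a big-endian Adler32 of the
# uncompressed data, which the raw stream above does not carry.
wrapped = zlib.compress(payload)
adler = zlib.adler32(payload).to_bytes(4, "big")
```

A larger level trades CPU time for a better ratio, and a larger window lets
the compressor refer further back in the data at the cost of memory.
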
4.5 Conformance to RFCs and IDs

The IPsec code in the kernel conforms (or, tries to conform) to the
following standards:
    "old IPsec" specification documented in rfc182[5-9].txt
    "new IPsec" specification documented in rfc240[1-6].txt, rfc241[01].txt,
        rfc2451.txt and draft-mcdonald-simple-ipsec-api-01.txt (draft
        expired, but you can take from
        ftp://ftp.kame.net/pub/internet-drafts/).
        (NOTE: IKE specifications, rfc240[7-9].txt are implemented in
        userland, as "racoon" IKE daemon)
    IPComp:
        RFC2393: IP Payload Compression Protocol (IPComp)

Currently supported algorithms are:
    old IPsec AH
        null crypto checksum (no document, just for debugging)
        keyed MD5 with 128bit crypto checksum (rfc1828.txt)
        keyed SHA1 with 128bit crypto checksum (no document)
        HMAC MD5 with 128bit crypto checksum (rfc2085.txt)
        HMAC SHA1 with 128bit crypto checksum (no document)
    old IPsec ESP
        null encryption (no document, similar to rfc2410.txt)
        DES-CBC mode (rfc1829.txt)
    new IPsec AH
        null crypto checksum (no document, just for debugging)
        keyed MD5 with 96bit crypto checksum (no document)
        keyed SHA1 with 96bit crypto checksum (no document)
        HMAC MD5 with 96bit crypto checksum (rfc2403.txt)
        HMAC SHA1 with 96bit crypto checksum (rfc2404.txt)
    new IPsec ESP
        null encryption (rfc2410.txt)
        DES-CBC with derived IV
            (draft-ietf-ipsec-ciph-des-derived-01.txt, draft expired)
        DES-CBC with explicit IV (rfc2405.txt)
        3DES-CBC with explicit IV (rfc2451.txt)
        BLOWFISH CBC (rfc2451.txt)
        CAST128 CBC (rfc2451.txt)
        RC5 CBC (rfc2451.txt)
        each of the above can be combined with:
            ESP authentication with HMAC-MD5(96bit)
            ESP authentication with HMAC-SHA1(96bit)
    IPComp
        RFC2394: IP Payload Compression Using DEFLATE

The following algorithms are NOT supported:
    old IPsec AH
        HMAC MD5 with 128bit crypto checksum + 64bit replay prevention
            (rfc2085.txt)
        keyed SHA1 with 160bit crypto checksum + 32bit padding (rfc1852.txt)

The key/policy management API is based on the following document, with a
fair amount of extensions:
    RFC2367: PF_KEY key management API

4.6 ECN consideration on IPsec tunnels

KAME IPsec implements the ECN-friendly IPsec tunnel described in
draft-ietf-ipsec-ecn-02.txt.
The normal IPsec tunnel is described in RFC2401. On encapsulation, the
IPv4 TOS field (or the IPv6 traffic class field) is copied from the inner
IP header to the outer IP header. On decapsulation, the outer IP header
is simply dropped. This decapsulation rule is not compatible
with ECN, since the ECN bit in the outer IP TOS/traffic class field is
lost.
To make an IPsec tunnel ECN-friendly, we need to modify the encapsulation
and decapsulation procedures. This is described in
draft-ietf-ipsec-ecn-02.txt, chapter 3.3.

The KAME IPsec tunnel implementation can give you three behaviors, selected
by setting net.inet.ipsec.ecn (or net.inet6.ipsec6.ecn) to one of the
following values:
    - RFC2401: no consideration for ECN (sysctl value -1)
    - ECN forbidden (sysctl value 0)
    - ECN allowed (sysctl value 1)
Note that the behavior is configurable per node, not per SA
(draft-ietf-ipsec-ecn-02 wants per-SA configuration, but that looks like
too much to me).

The behavior is summarized as follows (see source code for more detail):

                encapsulate                     decapsulate
                ---                             ---
RFC2401         copy all TOS bits               drop TOS bits on outer
                from inner to outer.            (use inner TOS bits as is)

ECN forbidden   copy TOS bits except for ECN    drop TOS bits on outer
                (masked with 0xfc) from inner   (use inner TOS bits as is)
                to outer.  set ECN bits to 0.

ECN allowed     copy TOS bits except for ECN    use inner TOS bits with some
                CE (masked with 0xfe) from      change.  if outer ECN CE bit
                inner to outer.                 is 1, enable ECN CE bit on
                set ECN CE bit to 0.            the inner.

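The table can be expressed as a short Python sketch. This follows only the
summary above, not the kernel source, so any extra checks made by the real
code are not reflected. The function and constant names are made up for the
illustration; the CE bit is taken to be the lowest bit of the TOS/traffic
class octet, which matches the 0xfe mask in the table.

```python
# sysctl values for net.inet.ipsec.ecn / net.inet6.ipsec6.ecn
RFC2401, ECN_FORBIDDEN, ECN_ALLOWED = -1, 0, 1

ECN_MASK = 0x03   # both ECN bits (cleared by the 0xfc mask)
ECN_CE = 0x01     # ECN CE bit (cleared by the 0xfe mask)


def encapsulate(mode, inner_tos):
    """Return the TOS/traffic class value to put in the outer header."""
    if mode == RFC2401:
        return inner_tos                  # copy all TOS bits
    if mode == ECN_FORBIDDEN:
        return inner_tos & ~ECN_MASK      # ECN bits set to 0
    if mode == ECN_ALLOWED:
        return inner_tos & ~ECN_CE        # ECN CE bit set to 0
    raise ValueError("unknown mode: %r" % (mode,))


def decapsulate(mode, outer_tos, inner_tos):
    """Return the inner TOS/traffic class after the outer header is dropped."""
    if mode == ECN_ALLOWED and outer_tos & ECN_CE:
        return inner_tos | ECN_CE         # propagate congestion mark inward
    return inner_tos                      # use inner TOS bits as is
```

For example, with "ECN allowed", a packet whose outer header was marked with
CE by a router inside the tunnel leaves the tunnel with CE set on the inner
header, while the other two modes silently lose that mark.
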
The general strategy for configuration is as follows:
- if both IPsec tunnel endpoints are capable of ECN-friendly behavior,
  you should configure both ends to "ECN allowed" (sysctl value 1).
- if the other end is very strict about TOS bits, use "RFC2401"
  (sysctl value -1).
- in other cases, use "ECN forbidden" (sysctl value 0).
The default behavior is "ECN forbidden" (sysctl value 0).

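The selection rule above is simple enough to write down as a small Python
helper. The function name and its two boolean arguments are hypothetical;
only the sysctl names and values come from the text.

```python
def choose_ecn_sysctl(peer_ecn_friendly, peer_strict_about_tos):
    """Pick a value for net.inet.ipsec.ecn following the strategy above."""
    if peer_ecn_friendly:
        return 1      # "ECN allowed"
    if peer_strict_about_tos:
        return -1     # "RFC2401"
    return 0          # "ECN forbidden", also the default


if __name__ == "__main__":
    value = choose_ecn_sysctl(peer_ecn_friendly=True,
                              peer_strict_about_tos=False)
    # Print the settings to apply; the IPv6 variable is set separately.
    print("net.inet.ipsec.ecn=%d" % value)
    print("net.inet6.ipsec6.ecn=%d" % value)
```
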
For more information, please refer to:
    draft-ietf-ipsec-ecn-02.txt
    RFC2481 (Explicit Congestion Notification)
    KAME sys/netinet6/{ah,esp}_input.c

(Thanks go to Kenjiro Cho <kjc@csl.sony.co.jp> for detailed analysis)

4.7 Interoperability

IPsec, IPComp (in kernel) and IKE (in userland as "racoon") have been tested
at several interoperability test events, and they are known to interoperate
well with many other implementations. Also, KAME IPsec has quite wide
coverage of the IPsec crypto algorithms documented in RFCs (we do not cover
algorithms with intellectual property issues, though).

Here are (some of) the platforms with which we have tested IPsec/IKE
interoperability in the past, in no particular order. Note that both ends
(KAME and others) may have modified their implementations since then, so use
the following list just for reference purposes.
    Altiga, Ashley-laurent (vpcom.com), Data Fellows (F-Secure),
    BlueSteel, CISCO, Ericsson, ACC, Fitel, FreeS/WAN, HITACHI, IBM
    AIX, IIJ, Intel, Microsoft WinNT, NAI PGPnet,
    NIST (linux IPsec + plutoplus), Netscreen, OpenBSD isakmpd, Radguard,
    RedCreek, Routerware, SSH, Secure Computing, Soliton, Toshiba,
    TIS/NAI Gauntlet, VPNet, Yamaha RT100i

Here are (some of) the platforms with which we have tested IPComp/IKE
interoperability in the past, in no particular order.
    IRE

5. ALTQ

# removed since it is not imported to NetBSD-current

6. mobile-ip6

# removed since it is not imported to NetBSD-current

<end of IMPLEMENTATION>