$NetBSD: IMPLEMENTATION,v 1.2 1999/07/03 21:30:17 thorpej Exp $

# NOTE: this is from original KAME distribution.
# Some portion of this document is not applicable to the code merged into
# NetBSD-current (especially tcp part). Check sys/netinet6/TODO as well.

                        Implementation Note

                KAME Project
                http://www.kame.net/
                $Date: 1999/07/03 21:30:17 $

1. IPv6

1.1 Conformance

The KAME kit conforms, or tries to conform, to the latest set of IPv6
specifications. For future reference we list some of the relevant documents
below (NOTE: this is not a complete list - this is too hard to maintain...).
For details, please refer to the specific section in this document, the RFCs,
the manpages that come with KAME, or comments in the source code.

Conformance tests have been performed on the previous KAME STABLE kit by the
TAHI project.
Results can be viewed at http://www.tahi.org/report/KAME/freebsd-stable-9903/.
We also attended the Univ. of New Hampshire IOL tests (http://www.iol.unh.edu/)
in the past, with our past snapshots.

RFC1639: FTP Operation Over Big Address Records (FOOBAR)
    * RFC2428 is preferred over RFC1639. ftp clients will first try RFC2428,
      then RFC1639 if failed.
RFC1933: Transition Mechanisms for IPv6 Hosts and Routers
    * IPv4 compatible address is not supported.
    * automatic tunnelling (4.3) is not supported.
    * "gif" interface implements IPv[46]-over-IPv[46] tunnel in a generic way,
      and it covers "configured tunnel" described in the spec.
      See 1.5 in this document for details.
RFC1981: Path MTU Discovery for IPv6
RFC2080: RIPng for IPv6
    * KAME-supplied route6d, bgpd and hroute6d support this.
RFC2283: Multiprotocol Extensions for BGP-4
    * so-called "BGP4+".
    * KAME-supplied bgpd supports this.
RFC2292: Advanced Sockets API for IPv6
    * For supported library functions/kernel APIs, see sys/netinet6/ADVAPI
      (KAME/FreeBSD228 only)
RFC2362: Protocol Independent Multicast-Sparse Mode (PIM-SM)
    * RFC2362 defines packet formats for PIM-SM. draft-ietf-pim-ipv6-01.txt
      is written based on this.
RFC2373: IPv6 Addressing Architecture
    * KAME supports node required addresses, and conforms to the scope
      requirement.
RFC2374: An IPv6 Aggregatable Global Unicast Address Format
    * KAME supports 64-bit length of Interface ID.
RFC2375: IPv6 Multicast Address Assignments
    * Userland applications use the well-known addresses assigned in the RFC.
RFC2428: FTP Extensions for IPv6 and NATs
    * RFC2428 is preferred over RFC1639. ftp clients will first try RFC2428,
      then RFC1639 if failed.
RFC2460: IPv6 specification
RFC2461: Neighbor discovery for IPv6
    * See 1.2 in this document for details.
RFC2462: IPv6 Stateless Address Autoconfiguration
    * See 1.4 in this document for details.
RFC2463: ICMPv6 for IPv6 specification
    * See 1.9 in this document for details.
RFC2464: Transmission of IPv6 Packets over Ethernet Networks
RFC2467: Transmission of IPv6 Packets over FDDI Networks
RFC2472: IPv6 over PPP
RFC2492: IPv6 over ATM Networks
    * only PVC is supported.
RFC2545: Use of BGP-4 Multiprotocol Extensions for IPv6 Inter-Domain Routing
RFC2553: Basic Socket Interface Extensions for IPv6
    * IPv4 mapped address (3.7) and special behavior of IPv6 wildcard bind
      socket (3.8) are:
        - supported on KAME/FreeBSD31, and they seem to be working well.
        - also supported on KAME/FreeBSD228, but not tested well and
          so disabled by default.
        - not supported on KAME/NetBSD and KAME/BSDI.
      See 1.12 in this document for details.
draft-ietf-ipngwg-ipv6router-alert-04: IPv6 router alert option
draft-ietf-ipngwg-router-renum-08: Router renumbering for IPv6
draft-ietf-ipngwg-icmp-namelookups-02: IPv6 Name Lookups Through ICMP
draft-ietf-ipngwg-icmp-name-lookups-03: IPv6 Name Lookups Through ICMP
draft-ietf-ipngwg-aaaa-03: DNS Extensions to support IPv6
draft-ietf-ipngwg-mld-01: Multicast Listener Discovery for IPv6
draft-ietf-ipngwg-jumbo-00: The IPv6 Jumbo Payload Option
    * See 1.7 in this document for details.
draft-ietf-ipngwg-jumbograms-00: IPv6 Jumbograms
    * See 1.7 in this document for details.
draft-ietf-pim-ipv6-01.txt: PIM for IPv6
    * no sparse mode support. pim6dd implements dense mode only.
draft-itojun-ipv6-tcp-to-anycast-00:
    Disconnecting TCP connection toward IPv6 anycast address
draft-yamamoto-wideipv6-comm-model-00
    * See 1.6 in this document for details.

1.2 Neighbor Discovery

Neighbor Discovery is fairly stable. Currently Address Resolution,
Duplicate Address Detection, and Neighbor Unreachability Detection
are supported. In the near future we will be adding Proxy Neighbor
Advertisement support in the kernel and an Unsolicited Neighbor Advertisement
transmission command as an admin tool.

If DAD fails, the address will be marked "duplicated" and a message will be
sent to syslog (and usually to the console). The "duplicated" mark
can be checked with ifconfig. It is the administrator's responsibility to
check for and recover from DAD failures.
The behavior should be improved in the near future.

Some network drivers loop multicast packets back to the node itself,
even if instructed not to do so (especially in promiscuous mode).
In such cases DAD may fail, because the DAD engine sees an inbound NS packet
(actually sent by the node itself) and considers it a sign of a duplicate.

The Neighbor Discovery specification (RFC2461) does not say how the neighbor
cache should be handled in the following cases:
(1) there is no neighbor cache entry, and an unsolicited RS/NS/NA/redirect
    packet without a link-layer address is received
(2) neighbor cache handling on a medium without link-layer addresses
    (we need a neighbor cache entry for the IsRouter bit)
For (1), we implemented a workaround based on discussions on the IETF ipngwg
mailing list. For more details, see the comments in the source code and the
email thread started from (IPng 7155), dated Feb 6 1999.

IPv6 address autoconfiguration assumes that a host has only a single network
interface (a single non-loopback interface). This is because the spec assumes
that there is no routing table on an IPv6 host (an IPv6 host has only a
default router list and a prefix list).
Because of this, if you accept RAs on a node with multiple network interfaces,
the node will not behave as you might expect. RFC2461 Appendix A talks
a little bit about this issue.

The IPv6 on-link determination rule (RFC2461) is quite different from the
assumptions in BSD network code. At this moment, KAME does not implement the
on-link determination rule when the default router list is empty (RFC2461,
section 5.2, last sentence in 2nd paragraph - note that the spec misuses the
words "host" and "node" in several places in the section).

To avoid possible DoS attacks and infinite loops, the KAME stack will accept
only 10 options on an ND packet. Therefore, if you have 20 prefix options
attached to an RA, only the first 10 prefixes will be recognized.
If this troubles you, please contact the KAME team and/or modify
sys/netinet6/nd6.c:nd6_options().
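
The option cap can be pictured with a minimal C sketch of an ND option walk.
It is not the code in nd6_options(): the function name count_prefix_options(),
the ND_OPT_LIMIT constant and the choice to silently skip options past the
cap are assumptions for illustration. It relies only on the RFC2461 option
format (a type octet, then a length octet counting 8-octet units; RFC2461
requires packets with a zero-length option to be discarded).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cap, mirroring the limit described above. */
#define ND_OPT_LIMIT 10

/* RFC2461 option type for Prefix Information. */
#define ND_OPT_PREFIX_INFO 3

/*
 * Walk the ND options that follow the fixed part of an ND message.
 * Each option is a TLV: type (1 octet), length in units of 8 octets
 * (1 octet), then the body.  Returns the number of Prefix Information
 * options recognized, or -1 if an option is malformed.
 */
int count_prefix_options(const uint8_t *opts, size_t len)
{
    size_t off = 0;
    int seen = 0, prefixes = 0;

    while (off + 2 <= len) {
        size_t optlen = (size_t)opts[off + 1] * 8;

        if (optlen == 0 || off + optlen > len)
            return -1;          /* zero length would never advance */
        if (++seen > ND_OPT_LIMIT)
            break;              /* ignore everything past the cap */
        if (opts[off] == ND_OPT_PREFIX_INFO)
            prefixes++;
        off += optlen;
    }
    return prefixes;
}
```

With 20 prefix options (each 32 octets) in the buffer, the sketch counts only
the first 10, matching the behavior described above.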

1.3 Scope Index

IPv6 uses scoped addresses. Therefore, it is very important to
specify the scope index (interface index for a link-local address, or
site index for a site-local address) with an IPv6 address. Without a
scope index, a scoped IPv6 address is ambiguous to the kernel, and
the kernel will not be able to determine the outbound interface for a
packet.

Ordinary userland applications should use the advanced API (RFC2292) to
specify the scope index, or interface index. For a similar purpose, the
sin6_scope_id member in the sockaddr_in6 structure is defined in RFC2553.
However, the semantics of sin6_scope_id are rather vague. If you
care about the portability of your application, we suggest that you use
the advanced API rather than sin6_scope_id.

In the kernel, the interface index for a link-local scoped address is
embedded into in6_addr.s6_addr16[1]. For example, you may see
something like:

        fe80:1::200:f8ff:fe01:6317

in the routing table and interface address structure (struct
in6_ifaddr). The address above is a link-local unicast address
which belongs to a network interface whose interface index is
1. The embedded index enables us to identify IPv6 link-local
addresses over multiple interfaces effectively and with only a
little code change.
Routing daemons and configuration programs, like route6d and
ifconfig, will need to manipulate the "embedded" scope index.
These programs use routing sockets and ioctls (like SIOCGIFADDR_IN6),
and the kernel API will return IPv6 addresses with in6_addr.s6_addr16[1]
filled in. These APIs are for manipulating kernel internal structures.
Programs that use these APIs have to be prepared for differences
between kernels anyway.
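
A minimal C sketch of the embedded form, assuming the index is stored in
network byte order in the second 16-bit word of the address (the
s6_addr16[1] position described above). The helper names embed_scope() and
extract_scope() are hypothetical; the sketch uses the portable s6_addr byte
array because s6_addr16 is not available on every system. Callers are
assumed to apply it only to link-local scoped addresses.

```c
#include <stdint.h>
#include <string.h>      /* strcmp, used by the usage example */
#include <netinet/in.h>
#include <arpa/inet.h>   /* inet_pton/inet_ntop, used by the usage example */

/*
 * Write a 16-bit interface index into bytes 2-3 of the address
 * (the s6_addr16[1] word), high-order byte first.
 */
void embed_scope(struct in6_addr *a, uint16_t ifindex)
{
    a->s6_addr[2] = (uint8_t)(ifindex >> 8);
    a->s6_addr[3] = (uint8_t)(ifindex & 0xff);
}

/* Return the embedded index and clear it, recovering the standard form. */
uint16_t extract_scope(struct in6_addr *a)
{
    uint16_t ifindex = (uint16_t)((a->s6_addr[2] << 8) | a->s6_addr[3]);

    a->s6_addr[2] = 0;
    a->s6_addr[3] = 0;
    return ifindex;
}
```

Embedding index 1 into fe80::200:f8ff:fe01:6317 gives the
fe80:1::200:f8ff:fe01:6317 form seen in the routing table, and
extract_scope() turns it back into the standard form that userland should
present.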

When you specify a scoped address on the command line, NEVER write the
embedded form (such as ff02:1::1 or fe80:2::fedc). This is not supposed
to work. Always use the standard form, like ff02::1 or fe80::fedc, with a
command line option for specifying the interface (like "ping6 -I ne0 ff02::1").
In general, if a command does not have a command line option to specify the
outgoing interface, that command is not ready to accept scoped addresses.
This may seem to be the opposite of IPv6's premise to support the "dentist
office" situation. We believe that the specifications need some improvements
for this.

1.4 Plug and Play

The KAME kit implements most of the IPv6 stateless address
autoconfiguration in the kernel.
Neighbor Discovery functions are implemented in the kernel as a whole.
Router Advertisement (RA) input for hosts is implemented in the
kernel. Router Solicitation (RS) output for endhosts, RS input
for routers, and RA output for routers are implemented in the
userland.

When the kernel boots up, an IPv6 link-local address is assigned to each
interface. Also, a direct route for the link-local address is added to the
routing table. Here is the output of the netstat command:

        Internet6:
        Destination            Gateway          Flags     Netif Expire
        fe80:1::/64            link#1           UC          ed0
        fe80:2::/64            link#2           UC          ep0
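
This document does not say how the interface identifier of the link-local
address is chosen. On Ethernet, RFC2464 derives it from the 48-bit MAC
address via EUI-64, and the sketch below assumes that rule. Under it, the
address fe80::200:f8ff:fe01:6317 used in the examples corresponds to MAC
00:00:f8:01:63:17 (an inference from the address, not something this
document states). linklocal_from_mac() is a hypothetical helper name.

```c
#include <stdint.h>
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>   /* inet_ntop, used by the usage example */

/*
 * Form a link-local address (fe80::/64) from a 48-bit Ethernet MAC
 * address using the EUI-64 mapping of RFC2464: insert 0xff 0xfe in the
 * middle of the MAC and invert the universal/local bit (0x02) of the
 * first octet.
 */
void linklocal_from_mac(const uint8_t mac[6], struct in6_addr *out)
{
    memset(out, 0, sizeof(*out));
    out->s6_addr[0]  = 0xfe;
    out->s6_addr[1]  = 0x80;
    out->s6_addr[8]  = (uint8_t)(mac[0] ^ 0x02);
    out->s6_addr[9]  = mac[1];
    out->s6_addr[10] = mac[2];
    out->s6_addr[11] = 0xff;
    out->s6_addr[12] = 0xfe;
    out->s6_addr[13] = mac[3];
    out->s6_addr[14] = mac[4];
    out->s6_addr[15] = mac[5];
}
```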

Each interface joins the solicited-node multicast address and the
link-local all-nodes multicast address (e.g. ff02:1::1:ff01:6317
and ff02:1::1, respectively). In addition to a link-local address,
the loopback address (::1) is assigned to the loopback interface. The
kernel also adds ::1/128 and ff01::/32 to the routing table and joins ff01::1.
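
The solicited-node group is derived from the unicast address: the prefix
ff02::1:ff00:0/104 followed by the low-order 24 bits of the address
(RFC2373). A minimal sketch, with solicited_node() as a hypothetical helper
name; it produces the standard form, so the kernel's embedded interface
index would still have to be added to obtain ff02:1::1:ff01:6317.

```c
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>   /* inet_pton/inet_ntop, used by the usage example */

/*
 * Compute the solicited-node multicast address ff02::1:ffXX:XXXX,
 * where XX:XXXX are the low-order 24 bits of the unicast address.
 */
void solicited_node(const struct in6_addr *unicast, struct in6_addr *out)
{
    memset(out, 0, sizeof(*out));
    out->s6_addr[0]  = 0xff;   /* multicast, link-local scope */
    out->s6_addr[1]  = 0x02;
    out->s6_addr[11] = 0x01;
    out->s6_addr[12] = 0xff;
    memcpy(&out->s6_addr[13], &unicast->s6_addr[13], 3);
}
```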

When a host hears a Router Advertisement from a router, a default route
and the network address prefix (usually a global address prefix) are added.
To generate a Router Solicitation packet at any time, use the "rtsol" command.
The "rtsold" daemon is also available. It generates Router Solicitations
whenever necessary, and it works well for nomadic usage (notebooks/laptops).
If you wish to ignore Router Advertisements, use sysctl to set
net.inet6.ip6.accept_rtadv to 0.

To generate Router Advertisements from a router, use the "rtadvd" daemon.

RFC2462 has a validation rule for the incoming RA prefix information option,
in 5.5.3 (e). This is to protect hosts from malicious (or misconfigured)
routers that advertise a very short prefix lifetime.
There was an update from Jim Bound to the ipngwg mailing list (look
for "(ipng 6712)" in the archive) and KAME implements Jim's update.
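
To make the 5.5.3 (e) rule concrete, here is a minimal sketch of the lifetime
update as written in RFC2462 itself. It does NOT include Jim Bound's update
that KAME implements, whose details are not given here. The function and
parameter names are hypothetical, and lifetimes are in seconds.

```c
#include <stdint.h>

#define TWO_HOURS 7200u  /* seconds */

/*
 * RFC2462 5.5.3 (e), as written in the RFC: given the lifetime currently
 * stored for an autoconfigured address and the Valid Lifetime in a newly
 * received prefix option, return the lifetime that should be stored.
 * "authenticated" says whether the RA was authenticated (e.g. via IPsec).
 */
uint32_t update_valid_lifetime(uint32_t stored, uint32_t received,
                               int authenticated)
{
    /* 1) longer than two hours, or longer than what we have: accept */
    if (received > TWO_HOURS || received > stored)
        return received;
    /* 2) short stored lifetime not being extended: ignore unless
     *    the RA was authenticated */
    if (stored <= TWO_HOURS)
        return authenticated ? received : stored;
    /* 3) otherwise, clamp to two hours */
    return TWO_HOURS;
}
```

For example, a stored lifetime of one day and a received lifetime of 60
seconds yields 7200, so an unauthenticated RA can shorten the lifetime only
to two hours, not to 60 seconds.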

See 1.2 in this document for the relationship between DAD and
autoconfiguration.

DHCPv6 server/client is not implemented yet. The "Managed" and "Other" bits
in RA have no special effect on the stateful autoconfiguration procedure
(the "Managed" bit actually prevents stateless autoconfiguration, but no
special action will be taken for a DHCPv6 client).

1.5 Generic tunnel interface

GIF (Generic InterFace) is a pseudo interface for configured tunnels.
Details are described in the gif(4) manpage.
Currently
        v6 in v6
        v6 in v4
        v4 in v6
        v4 in v4
are available. Use "gifconfig" to assign the physical (outer) source
and destination addresses to gif interfaces.
A configuration that uses the same address family for the inner and outer IP
header (v4 in v4, or v6 in v6) is dangerous. It is very easy to
configure interfaces and routing tables to perform an infinite level
of tunneling. Please be warned.

gif can be configured to be ECN-friendly. See 4.5 for ECN-friendliness
of tunnels, and the gif(4) manpage for how to configure it.

1.6 Source Address Selection

Source selection of KAME is scope oriented (there are some exceptions -
see below). For a given destination, a source IPv6 address is selected
by the following rules:
    1. If the source address is explicitly specified by the user
       (e.g. via the advanced API), the specified address is used.
    2. If there is an address assigned to the outgoing interface
       (which is usually determined by looking up the routing table)
       that has the same scope as the destination address, that address
       is used.
       This is the most typical case.
    3. If there is no address that satisfies the above condition,
       choose a global address assigned to one of the interfaces
       on the sending node.
    4. If there is no address that satisfies the above conditions and
       there is no global address on the sending node, choose the
       address associated with the routing table entry for the destination.
       This is the last resort, which may cause a scope violation.

For instance, ::1 is selected for ff01::1, and fe80:1::200:f8ff:fe01:6317
for fe80:1::2a0:24ff:feab:839b. If the outgoing interface has
multiple addresses for the scope, a source is selected on a longest-match
basis (this refines rule 2). Suppose 3ffe:501:808:1:200:f8ff:fe01:6317 and
3ffe:2001:9:124:200:f8ff:fe01:6317 are assigned to the outgoing
interface. 3ffe:501:808:1:200:f8ff:fe01:6317 is chosen as the source
for the destination 3ffe:501:800::1.
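
Rules 2 and 3, including the longest-match tie-break, can be sketched in C as
below. This is a simplification under stated assumptions, not KAME's kernel
code: a single candidate array stands in for both the outgoing interface's
addresses and the node's global addresses, rules 1 and 4 are left to the
caller, and the scope classification (multicast scope nibble, loopback as
node-local, link-local, site-local, otherwise global) is this sketch's own.
pick_source() and its helpers are hypothetical names.

```c
#include <stddef.h>
#include <netinet/in.h>
#include <arpa/inet.h>   /* inet_pton, used by the usage example */

enum scope { SCOPE_NODE = 1, SCOPE_LINK = 2, SCOPE_SITE = 5, SCOPE_GLOBAL = 14 };

/* Simplified scope classification. */
static int addr_scope(const struct in6_addr *a)
{
    if (a->s6_addr[0] == 0xff)              /* multicast: scope nibble */
        return a->s6_addr[1] & 0x0f;
    if (IN6_IS_ADDR_LOOPBACK(a))
        return SCOPE_NODE;
    if (IN6_IS_ADDR_LINKLOCAL(a))
        return SCOPE_LINK;
    if (IN6_IS_ADDR_SITELOCAL(a))
        return SCOPE_SITE;
    return SCOPE_GLOBAL;
}

/* Number of leading bits the two addresses have in common. */
static int common_prefix_len(const struct in6_addr *a, const struct in6_addr *b)
{
    int bits = 0;

    for (int i = 0; i < 16; i++) {
        unsigned char x = a->s6_addr[i] ^ b->s6_addr[i];
        if (x == 0) {
            bits += 8;
            continue;
        }
        while (!(x & 0x80)) {               /* count matching high bits */
            bits++;
            x <<= 1;
        }
        break;
    }
    return bits;
}

/*
 * Prefer a candidate with the same scope as dst, breaking ties by longest
 * match (rule 2); otherwise fall back to the first global candidate
 * (rule 3).  Returns an index into cand[], or -1.
 */
int pick_source(const struct in6_addr *dst, const struct in6_addr *cand, size_t n)
{
    int best = -1, best_len = -1, global = -1;

    for (size_t i = 0; i < n; i++) {
        if (addr_scope(&cand[i]) == SCOPE_GLOBAL && global < 0)
            global = (int)i;
        if (addr_scope(&cand[i]) != addr_scope(dst))
            continue;
        int len = common_prefix_len(&cand[i], dst);
        if (len > best_len) {
            best = (int)i;
            best_len = len;
        }
    }
    return best >= 0 ? best : global;
}
```

With the two 3ffe: addresses above as candidates, the first shares 44
leading bits with 3ffe:501:800::1 and the second only 18, so index 0 is
chosen; for ff01::1, the node-local ::1 wins over a link-local address.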

Note that the above rule is not documented in the IPv6 spec. It is
considered an "up to implementation" item.
There are some cases where we do not use the above rule. One
example is a connected TCP session, where we use the address kept in the
tcb as the source.
Another example is the source address for a Neighbor Advertisement.
Under the spec (RFC2461 7.2.2), the NA's source should be the target
address of the corresponding NS. In this case we follow
the spec rather than the above longest-match rule.

1.7 Jumbo Payload

KAME supports the Jumbo Payload hop-by-hop option used to send IPv6
packets with payloads longer than 65,535 octets. But since KAME currently
does not support any physical interface whose MTU is more than
65,535, such payloads can be seen only on the loopback interface (i.e.
lo0).

If you want to try jumbo payloads, you first have to reconfigure the
kernel so that the MTU of the loopback interface is more than 65,535
bytes; add the following to the kernel configuration file:

        options "LARGE_LOMTU"           #To test jumbo payload

and recompile the new kernel.

Then you can test jumbo payloads with the ping6 command using the -b and -s
options. The -b option must be specified to enlarge the size of the
socket buffer and the -s option specifies the length of the packet,
which should be more than 65,535. For example, type as follows:

        % ping6 -b 70000 -s 68000 ::1

The IPv6 specification requires that the Jumbo Payload option must not
be used in a packet that carries a fragment header. If this condition
is broken, an ICMPv6 Parameter Problem message must be sent to the
sender. The KAME kernel follows the specification, but you cannot usually
see an ICMPv6 error caused by this requirement.

If the KAME kernel receives an IPv6 packet, it checks the frame length of
the packet and compares it to the length specified in the payload
length field of the IPv6 header or in the value of the Jumbo Payload
option, if any. If the former is shorter than the latter, the KAME kernel
discards the packet and increments the statistics. You can see the
statistics in the output of the netstat command with the `-s -p ip6' option:

        % netstat -s -p ip6
        ip6:
                (snip)
                1 with data size < data length

So, the KAME kernel does not send an ICMPv6 error unless the erroneous
packet is an actual Jumbo Payload, that is, its packet size is more
than 65,535 bytes. As described above, the KAME kernel currently does not
support physical interfaces with such a huge MTU, so it rarely returns an
ICMPv6 error.
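
The length checks in this section can be summarized as a small decision
function. This is a sketch under assumptions: the name check_payload_len(),
its parameters and the three verdicts are hypothetical, and it covers only
the cases discussed here (the Jumbo Payload option requires a zero Payload
Length field, a value above 65,535 and no Fragment header; a frame shorter
than the advertised length is dropped and counted rather than answered with
an ICMPv6 error). Other error cases of the jumbogram specification are
omitted.

```c
#include <stddef.h>
#include <stdint.h>

#define IPV6_HDRLEN    40
#define IPV6_MAXPACKET 65535

enum plen_verdict { PLEN_OK, PLEN_TRUNCATED, PLEN_PARAM_PROBLEM };

/*
 *  frame_len: bytes actually received (including the 40-byte header)
 *  plen:      Payload Length field of the IPv6 header
 *  jumbo:     non-zero if a Jumbo Payload option was found
 *  jumbo_len: value of that option (meaningful only if jumbo != 0)
 *  has_frag:  non-zero if the packet carries a Fragment header
 */
enum plen_verdict check_payload_len(size_t frame_len, uint16_t plen,
                                    int jumbo, uint32_t jumbo_len,
                                    int has_frag)
{
    size_t expected;

    if (jumbo) {
        /* misuse of the option: answer with Parameter Problem */
        if (plen != 0 || jumbo_len <= IPV6_MAXPACKET || has_frag)
            return PLEN_PARAM_PROBLEM;
        expected = jumbo_len;
    } else {
        expected = plen;
    }
    /* frame shorter than advertised: drop silently and count it */
    if (frame_len < IPV6_HDRLEN + expected)
        return PLEN_TRUNCATED;
    return PLEN_OK;
}
```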

TCP/UDP over jumbograms is not supported at this moment. This is because
we have no medium (other than loopback) to test this. Contact us if you
need this.

IPsec does not work on jumbograms. This is due to some specification twists
in supporting AH with jumbograms.

1.8 Loop prevention in header processing

The IPv6 specification allows an arbitrary number of extension headers to
be placed onto packets. If we implemented IPv6 packet processing
code in the way BSD IPv4 code is implemented, the kernel stack might
overflow due to a deep function call chain. KAME sys/netinet6 code
is carefully designed to avoid kernel stack overflow. Because of
this, KAME sys/netinet6 code defines its own protocol switch
structure, "struct ip6protosw" (see netinet6/ip6protosw.h).
The IPv4 part (sys/netinet) remains untouched for compatibility.
Because of this, if you receive an IPsec-over-IPv4 packet with a massive
number of IPsec headers, the kernel stack may blow up. IPsec-over-IPv6 is okay.
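
The idea behind avoiding a deep call chain can be shown with a minimal
sketch of the general technique, not KAME's code: instead of each header
handler calling the next one (which grows the stack with every header),
each handler returns the next header value to a single loop. The names
walk_headers(), ext_input() and NXT_DONE are hypothetical, and the sketch
understands only the hop-by-hop, routing and destination options headers,
which share the same length format.

```c
#include <stddef.h>
#include <stdint.h>

#define NXT_HOPOPTS 0
#define NXT_ROUTING 43
#define NXT_DSTOPTS 60
#define NXT_DONE    (-1)   /* hypothetical "stop" value */

/*
 * Handler for one extension header: next header (1 octet), then header
 * length in 8-octet units not counting the first 8 (1 octet).  Advances
 * *off past the header and returns the next header value, or NXT_DONE
 * if the header is truncated.
 */
static int ext_input(const uint8_t *pkt, size_t len, size_t *off)
{
    if (*off + 8 > len)
        return NXT_DONE;
    int nxt = pkt[*off];
    size_t hlen = ((size_t)pkt[*off + 1] + 1) * 8;
    if (*off + hlen > len)
        return NXT_DONE;
    *off += hlen;
    return nxt;
}

/*
 * Walk the header chain iteratively.  Stack depth stays constant no
 * matter how many headers are chained, and each step advances at least
 * 8 octets, so the loop is bounded by the packet length.  Returns the
 * upper-layer protocol (offset left in *off), or NXT_DONE.
 */
int walk_headers(const uint8_t *pkt, size_t len, int nxt, size_t *off)
{
    while (nxt == NXT_HOPOPTS || nxt == NXT_ROUTING || nxt == NXT_DSTOPTS)
        nxt = ext_input(pkt, len, off);
    return nxt;
}
```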

1.9 ICMPv6

After RFC2463 was published, the IETF ipngwg decided to disallow ICMPv6 error
packets in response to ICMPv6 redirects, to prevent ICMPv6 storms on a network
medium. KAME already implements this in the kernel.
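
A minimal sketch of the check, assuming the ICMPv6 type of the offending
packet is already known. It uses the ICMP6_INFOMSG_MASK and ND_REDIRECT
constants from <netinet/icmp6.h>; icmp6_error_allowed() is a hypothetical
name, and other RFC2463 exceptions (for example packets sent to multicast
addresses) are not modeled.

```c
#include <stdint.h>
#include <netinet/in.h>
#include <netinet/icmp6.h>

/*
 * May an ICMPv6 error be generated in response to an offending ICMPv6
 * packet of this type?  RFC2463 already forbids errors in response to
 * ICMPv6 error messages (types 0-127); the later ipngwg decision adds
 * redirects.
 */
int icmp6_error_allowed(uint8_t type)
{
    if (!(type & ICMP6_INFOMSG_MASK))
        return 0;           /* never answer an error with an error */
    if (type == ND_REDIRECT)
        return 0;           /* avoid storms triggered by redirects */
    return 1;
}
```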

1.10 Applications

For userland programming, we support the IPv6 socket API as specified in
RFC2553, RFC2292 and upcoming internet drafts.

TCP/UDP over IPv6 is available and quite stable. You can enjoy "telnet",
"ftp", "rlogin", "rsh", "ssh", etc. These applications are protocol
independent. That is, they automatically choose IPv4 or IPv6
according to DNS.
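
Protocol independence of this kind typically comes from resolving names with
getaddrinfo() and trying each returned address in turn, as the advice in 1.12
also recommends. A minimal client-side sketch; connect_any() is a
hypothetical helper, and the first address that accepts the connection wins,
so the address order is whatever getaddrinfo() returns.

```c
#include <stdio.h>       /* snprintf, used by the usage example */
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

/*
 * Resolve host/service with getaddrinfo() and try every returned
 * address (IPv6 or IPv4) until one connects.  Returns a connected
 * socket, or -1.
 */
int connect_any(const char *host, const char *service)
{
    struct addrinfo hints, *res, *ai;
    int s = -1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;        /* no hardcoded AF_INET/AF_INET6 */
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, service, &hints, &res) != 0)
        return -1;

    for (ai = res; ai != NULL; ai = ai->ai_next) {
        s = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (s < 0)
            continue;                   /* e.g. AF not supported here */
        if (connect(s, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                      /* success */
        close(s);
        s = -1;
    }
    freeaddrinfo(res);
    return s;
}
```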

1.11 Kernel Internals

 (*) The TCP stack is merged between netinet and netinet6 on KAME/FreeBSD31,
     so if your platform is FreeBSD31, then there is no difference
     between tcp6 and tcp in the following description.

The current KAME has escaped from the IPv4 netinet logic. While
ip_forward() calls ip_output(), ip6_forward() directly calls
if_output() since routers must not divide IPv6 packets into fragments.

ICMPv6 should contain as much of the original packet as possible, up to
1280 octets. A UDP6/IP6 port unreachable, for instance, should contain all
extension headers and the *unchanged* UDP6 and IP6 headers.
So, all IP6 functions except TCP6 never convert network byte
order into host byte order, to save the original packet.

tcp6_input(), udp6_input() and icmp6_input() can't assume that the IP6
header immediately precedes the transport headers, due to extension
headers. So, in6_cksum() was implemented to handle packets whose IP6
header and transport header are not contiguous. Neither TCP/IP6 nor UDP/IP6
header structures exist for checksum calculation.
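
The reason in6_cksum() must cope with non-contiguous data is that the
checksum covers a pseudo-header (source and destination addresses,
upper-layer length, next header) plus the transport header and payload,
which may live in different places. The sketch below computes the Internet
checksum over an arbitrary list of pieces; struct chunk and cksum_chunks()
are hypothetical stand-ins for an mbuf chain and in6_cksum(), and a caller
would pass the pseudo-header as its own chunk.

```c
#include <stddef.h>
#include <stdint.h>

struct chunk {              /* hypothetical stand-in for one mbuf's data */
    const uint8_t *data;
    size_t len;
};

/*
 * Internet checksum (one's complement of the one's complement sum of
 * 16-bit words) over data split across non-contiguous chunks.  A chunk
 * may end in the middle of a 16-bit word, so track whether the next
 * byte is the high or the low half of the current word.
 */
uint16_t cksum_chunks(const struct chunk *c, size_t n)
{
    uint64_t sum = 0;
    int odd = 0;                    /* next byte is the low-order half */

    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < c[i].len; j++) {
            sum += odd ? (uint64_t)c[i].data[j]
                       : (uint64_t)c[i].data[j] << 8;
            odd = !odd;
        }
    }
    while (sum >> 16)               /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

Splitting the RFC1071 example bytes 00 01 f2 03 f4 f5 f6 f7 at odd offsets
still yields 0x220d, the same result as summing them contiguously.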

To process the IP6 header, extension headers and transport headers easily,
KAME requires network drivers to store packets in one internal mbuf or
one or more external mbufs. A typical old driver prepares two
internal mbufs for 100 - 208 bytes of data; however, KAME's reference
implementation stores it in one external mbuf.

"netstat -s -p ip6" tells you whether or not your driver conforms to
KAME's requirement. In the following example, "cce0" violates the
requirement. (For more information, refer to Section 2.)

        Mbuf statistics:
                317 one mbuf
                two or more mbuf::
                        lo0 = 8
                        cce0 = 10
                3282 one ext mbuf
                0 two or more ext mbuf

Each input function calls IP6_EXTHDR_CHECK at the beginning to check
whether the region between the IP6 header and the header it processes is
contiguous. IP6_EXTHDR_CHECK calls m_pullup() only if the mbuf has the
M_LOOP flag, that is, the packet comes from the loopback
interface. m_pullup() is never called for packets coming from physical
network interfaces.

TCP6 reassembly makes use of the IP6 header to store reassembly
information. IP6 is not supposed to be just before TCP6, so the
ip6tcpreass structure has a pointer to the TCP6 header. Of course, it
also has a pointer back to the mbuf to avoid m_pullup().

Like TCP6, both the IP and IP6 reassembly functions never call m_pullup().

xxx_ctlinput() calls in_mrejoin() on PRC_IFNEWADDR. We think this is
one of the 4.4BSD implementation flaws. Since 4.4BSD keeps ia_multiaddrs
in in_ifaddr{}, it can't use the multicast feature if the interface has no
unicast address. So, if an application joins a group on an interface and
then all unicast addresses are removed from the interface, the application
can't send/receive any multicast packets. Moreover, if a new unicast
address is assigned to the interface, in_mrejoin() must be called.
KAME's interfaces, however, ALWAYS have one link-local unicast
address. These extensions have thus not been implemented in KAME.

1.12 IPv4 mapped address and IPv6 wildcard socket

RFC2553 describes the IPv4 mapped address (3.7) and special behavior
of the IPv6 wildcard bind socket (3.8). The spec allows you to:
    - Transmit IPv4 packets over an AF_INET6 socket by using a special form
      of the address like ::ffff:10.1.1.1.
    - Accept IPv4 connections on an AF_INET6 wildcard bind socket.
but it may look like a complicated spec.
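
The mapped form is simply ::ffff:0:0/96 followed by the 32-bit IPv4 address.
A minimal sketch of building and recognizing it; v4mapped() and
v4unmapped() are hypothetical helpers, while IN6_IS_ADDR_V4MAPPED() is the
standard macro from <netinet/in.h>.

```c
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>   /* inet_pton/inet_ntop, used by the usage example */

/* Build ::ffff:a.b.c.d from an IPv4 address (network byte order). */
void v4mapped(const struct in_addr *v4, struct in6_addr *out)
{
    memset(out, 0, sizeof(*out));
    out->s6_addr[10] = 0xff;
    out->s6_addr[11] = 0xff;
    memcpy(&out->s6_addr[12], v4, 4);
}

/* Recover the IPv4 address; returns 0 if `a` is not IPv4-mapped. */
int v4unmapped(const struct in6_addr *a, struct in_addr *v4)
{
    if (!IN6_IS_ADDR_V4MAPPED(a))
        return 0;
    memcpy(v4, &a->s6_addr[12], 4);
    return 1;
}
```

Printing the result of v4mapped() for 10.1.1.1 with inet_ntop() typically
gives the ::ffff:10.1.1.1 form mentioned above.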

We, the KAME team, have 4 OS platforms right now, and the behavior is
slightly different between them. To summarize:
    - All KAME implementations treat the tcp/udp port number space
      separately between IPv4 and IPv6.
    - KAME/NetBSD and KAME/BSDI do not support IPv4 mapped addresses, nor
      special wildcard bind on AF_INET6.
    - KAME/FreeBSD31 supports IPv4 mapped addresses and special wildcard bind
      on AF_INET6. They seem to be working well and are enabled by default.
      You can disable those two by runtime and kernel compile configuration.
      (you can't enable only one of them: they come together)
    - KAME/FreeBSD228 supports both, but they are not tested well and so are
      disabled by default.
      You can enable those two by runtime and kernel compile configuration.
      (you can't enable only one of them: they come together)
    - KAME/FreeBSD228 and KAME/FreeBSD31 implement slightly different
      behavior for special wildcard bind on AF_INET6.

The following sections will give you the details, and how you can
configure the behavior on KAME/FreeBSD{228,31}.

Advice to application implementers: to implement a portable IPv6 application
(which works on multiple IPv6 kernels), we believe that the following
is the key to success:
    - NEVER hardcode AF_INET or AF_INET6.
    - Use getaddrinfo() and getnameinfo() throughout the system.
      Never use gethostby*(), getaddrby*(), inet_*() or getipnodeby*().
    - If you would like to listen to connections, use getaddrinfo() (maybe
      with AI_PASSIVE), and make sockets for all the "struct addrinfo"
      returned.
    - If you would like to connect to a destination, use getaddrinfo() and
      try all the destinations returned, like telnet does.
    - Some IPv6 stacks are shipped with a buggy getaddrinfo(). Ship a minimal
      working version with your application and use that as a last resort.
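
For the listening side of this advice, a minimal sketch that opens one
listening socket per addrinfo returned for AI_PASSIVE; open_listeners() is a
hypothetical helper. It sets IPV6_V6ONLY on AF_INET6 sockets when that option
exists. The option was standardized later (RFC3493) and is not available on
the KAME platforms described here, hence the #ifdef; where present, it makes
the AF_INET6 socket behave like the separate port spaces of 1.12.1, so the
AF_INET socket can bind the same port.

```c
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>

/*
 * Open one listening socket per wildcard address that getaddrinfo()
 * returns for `service`.  Stores up to `max` descriptors in fds[] and
 * returns how many were opened.
 */
int open_listeners(const char *service, int fds[], int max)
{
    struct addrinfo hints, *res, *ai;
    int n = 0;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE;        /* wildcard addresses */
    if (getaddrinfo(NULL, service, &hints, &res) != 0)
        return 0;

    for (ai = res; ai != NULL && n < max; ai = ai->ai_next) {
        int s = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (s < 0)
            continue;                   /* e.g. no IPv6 in this kernel */
#ifdef IPV6_V6ONLY
        if (ai->ai_family == AF_INET6) {
            int on = 1;                 /* keep the port spaces separate */
            setsockopt(s, IPPROTO_IPV6, IPV6_V6ONLY, &on, sizeof(on));
        }
#endif
        if (bind(s, ai->ai_addr, ai->ai_addrlen) < 0 || listen(s, 5) < 0) {
            close(s);
            continue;
        }
        fds[n++] = s;
    }
    freeaddrinfo(res);
    return n;
}
```

A server would then wait on all returned descriptors (for example with
select()) and accept connections on whichever becomes readable.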

It seems that RFC2553 says too little on the wildcard bind issue,
especially on the port space issue, the failure mode and the relationship
between AF_INET and AF_INET6 wildcard binds. There can be several separate
interpretations of this RFC (see 1.12.2 - we have two different
implementations of this, and RFC2553 seems to fit both of them).
So, to implement a portable application you should assume nothing
about the behavior in the kernel. Using getaddrinfo() is the safest way.
Port number space and wildcard bind issues were discussed in detail
on the ipv6imp mailing list in mid March 1999, and it looks like there is
no concrete consensus (meaning it is up to implementers). You may want to
check the mailing list archives.

1.12.1 KAME/NetBSD and KAME/BSDI

These platforms do not support IPv4 mapped addresses.
IPv4 mapped address support needs a tweaked implementation in the
DNS support libraries, as documented in RFC2553 6.1. However, since
these platforms do not support this, you do not need to worry about
RFC2553 6.1 and the story is much simpler.
(The KAME library actually implements the tweaks, but it is safe to ignore
that.)

The port number space is totally separate between AF_INET and
AF_INET6 sockets. You can always perform a wildcard bind on both of
the address families, on the same port.

If a server application would like to accept IPv4 and IPv6 connections,
it should use an AF_INET and an AF_INET6 socket (you'll need two sockets).
Applications should use the proper socket for connections. IPv4 connections
must be made on an AF_INET socket, and IPv6 connections must be made
on an AF_INET6 socket. The getaddrinfo() library helps you in writing
AF-independent applications, and in managing sockets with different AFs.
(some of the implementers think that we should totally get rid of the
gethostby* family of functions and migrate to get{addr,name}info, since
it is very clean and helps you support new AFs in the future)

1.12.2 KAME/FreeBSD31

The platform can be configured to support IPv4 mapped addresses/special
AF_INET6 wildcard bind (enabled by default). If you disable it, it behaves
as described in 1.12.1.

IPv4 mapped address support needs a tweaked implementation in the
DNS support libraries. This is documented in RFC2553 6.1.
KAME libraries (namely libinet6.a) actually support that.

RFC2553 does not talk about how the port number space should be designed
(i.e. should it be separate between AF_INET and AF_INET6, or
should it be common).
In KAME with the behavior enabled, the port number space is separate
between AF_INET and AF_INET6 sockets in most cases. The only
exception is the wildcard bind socket, where the special behavior appears.

If a server application would like to accept IPv4 and IPv6 connections,
it can use an IPv6 socket with wildcard bind, or use two sockets (for
AF_INET6 and AF_INET). You can handle both IPv4 and IPv6 connections
by using an AF_INET6 socket. Porting an application can be simpler in
this case, like:
    - change AF_INET into AF_INET6
    - use gethostbyname2(hostname, AF_INET6) or getipnodebyname(), instead
      of gethostbyname(hostname)
    - use struct sockaddr_in6 instead of sockaddr_in

To provide services to both IPv4 and IPv6 clients, you will run a single
server which binds to a single AF_INET6 wildcard socket. This server will
accept both IPv4 and IPv6 connections to the tcp/udp port.
If you run two daemons, which bind to an AF_INET6 wildcard socket and an
AF_INET socket (say, sendmail4 and sendmail6), the story starts to look
a bit complicated. The next sections have the details.

(* the following paragraphs are different between KAME/FreeBSD31 and 228)

Wildcard bind on AF_INET6 behaves like "wildcard bind between two
address families". It will grab an IPv4 connection if and only if
there is no socket that binds to a more specific destination.
Here, wildcard bind on AF_INET is regarded as a "more specific bind"
than wildcard bind on AF_INET6.
In other words, wildcard bind on AF_INET6 is the only thing that
has special behavior. It will not affect wildcard bind on AF_INET.
If the following events happen, the IPv4 connection will be routed
to application B, and the IPv6 connection will be routed to application A.
    - application A performs wildcard bind on AF_INET6, port X
    - application B performs wildcard bind on AF_INET, port X
    - IPv4 connection arrives
    - IPv6 connection arrives
If the following events happen, the behavior is the same. The IPv4
connection will be routed to application B, and the IPv6 connection
will be routed to application A.
    - application B performs wildcard bind on AF_INET, port X
    - application A performs wildcard bind on AF_INET6, port X
    - IPv4 connection arrives
    - IPv6 connection arrives

If the following events happen, KAME/FreeBSD31 will behave like this:
    - sendmail4 is running. It is doing wildcard bind on AF_INET.
    - Invoke sendmail6. It will do a wildcard bind on AF_INET6.
    - Stop sendmail4. Here, on KAME/FreeBSD31, IPv4 and IPv6 connections
      will be routed to sendmail6. This is different from KAME/FreeBSD228.

1.12.3 KAME/FreeBSD228

The platform can be configured to support IPv4 mapped addresses/special
AF_INET6 wildcard bind (disabled by default). When it is disabled, it
behaves as described in 1.12.1.

KAME/FreeBSD228 works mostly the same as KAME/FreeBSD31, except for the
paragraphs under the (*) mark in 1.12.2. Please refer to 1.12.2 for the
other parts.

Wildcard bind on AF_INET6 affects the behavior of wildcard bind on
AF_INET.
If events happen in the following order, application B fails to
bind. Therefore, both IPv4 and IPv6 connections will be routed to
application A.
    - application A performs wildcard bind on AF_INET6, port X
    - application B performs wildcard bind on AF_INET, port X -> fail
    - IPv4 connection arrives
    - IPv6 connection arrives
If the following events happen, the IPv4 connection will be routed to
application B, and the IPv6 connection will be routed to application
A. This behavior is the same as described in 1.12.1.
    - application B performs wildcard bind on AF_INET, port X
    - application A performs wildcard bind on AF_INET6, port X
    - IPv4 connection arrives
    - IPv6 connection arrives

If the following events happen, KAME/FreeBSD228 will behave like this:
    - sendmail4 is running. It is doing wildcard bind on AF_INET.
    - Invoke sendmail6. It will do a wildcard bind on AF_INET6.
    - Stop sendmail4. Here, on KAME/FreeBSD228, sendmail6 will be able to
      get IPv6 connections only. This is different from KAME/FreeBSD31.
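
The FreeBSD31 and FreeBSD228 differences can be condensed into a toy
decision model. This is an illustration derived only from the scenarios in
1.12.2 and 1.12.3, not kernel code: ipv4_conn_owner(), inet_bind_allowed()
and the inet6_grabs_v4 flag are hypothetical. The key assumption is that on
FreeBSD228, whether an AF_INET6 wildcard socket takes IPv4 traffic is fixed
when it binds (it does so only if no AF_INET wildcard existed then), while on
FreeBSD31 it is decided when each IPv4 connection arrives.

```c
#include <stdbool.h>

enum platform { FREEBSD31, FREEBSD228 };
enum owner { NOBODY, APP_INET, APP_INET6 };

/*
 * Who receives an incoming IPv4 connection on port X, with the mapped
 * address behavior enabled?
 *   inet_bound:     an AF_INET wildcard socket is bound to X
 *   inet6_bound:    an AF_INET6 wildcard socket is bound to X
 *   inet6_grabs_v4: (FreeBSD228 only) the AF_INET6 socket was allowed to
 *                   take IPv4 traffic at the time it was bound
 */
enum owner ipv4_conn_owner(enum platform p, bool inet_bound,
                           bool inet6_bound, bool inet6_grabs_v4)
{
    if (inet_bound)
        return APP_INET;            /* AF_INET is the "more specific" bind */
    if (!inet6_bound)
        return NOBODY;
    if (p == FREEBSD31)
        return APP_INET6;           /* decided at connection time */
    return inet6_grabs_v4 ? APP_INET6 : NOBODY;   /* fixed at bind time */
}

/*
 * On FreeBSD228, an AF_INET wildcard bind fails if an AF_INET6 wildcard
 * socket that takes IPv4 traffic already holds the port.
 */
bool inet_bind_allowed(enum platform p, bool inet6_bound, bool inet6_grabs_v4)
{
    return !(p == FREEBSD228 && inet6_bound && inet6_grabs_v4);
}
```

For the sendmail4/sendmail6 scenario, after sendmail4 stops the model gives
APP_INET6 on FreeBSD31 and NOBODY on FreeBSD228, matching the text.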

1.12.4 Configuration and implementation

On KAME/FreeBSD31 and KAME/FreeBSD228, the behavior is
configurable by the following procedure. To enable it:
    - Add the "MAPPED_ADDR_ENABLED" kernel config option into your
      kernel config file (see the "sys/i386/conf/GENERIC.v6" sample file)
      and build your kernel, and
    - set the sysctl variable appropriately, like:
        # sysctl -w net.inet6.ip6.mapped_addr=1
Note that, to enable the behavior, you'll need to do both of the above.
If you do not do both, the behavior is disabled.

The behavior is enabled by default on KAME/FreeBSD31, but disabled
by default on KAME/FreeBSD228, because those functions are not well
tested on KAME/FreeBSD228.
The implementation is quite different between KAME/FreeBSD31 and
KAME/FreeBSD228, because the tcp stack code and protocol control block
(pcb) implementation are merged between netinet and netinet6 in the
KAME/FreeBSD31 code, but they are separate in the KAME/FreeBSD228 code.
This is the reason for the different behavior in the two OSes.

2. Network Drivers

KAME requires three items to be added to the standard drivers:

(1) mbuf clustering requirement. In this stable release, we changed
MINCLSIZE to MHLEN+1 for all the operating systems in order to make
all the drivers behave as we expect.

(2) multicast. If "ifmcstat" yields no multicast group for an
interface, that interface has to be patched.

(3) If you are using notebook PCs, you'll need a one-line patch to
the PCMCIA driver to initialize your interface when your card is
plugged in.
(NOTE: KAME/NetBSD does not need this)

To avoid trouble, we suggest you comment out the device drivers
for unsupported/unnecessary cards in the configuration file.
If you accidentally enable unsupported drivers, some of the userland
tools may not work correctly (routing daemons are a typical example).

In the following sections, "official support" means that KAME developers
are using that Ethernet card/driver frequently.

2.1 FreeBSD 2.2.x-RELEASE

Here is a list of FreeBSD 2.2.x-RELEASE drivers and their status:

driver    mbuf(1)    multicast(2)  PCMCIA    official
                                   patch(3)  support?
---       ---        ---           ---       ---
(Ethernet)
ar        looks ok   -             -         -
cnw       ok         ok            ok        yes (*)
ed        ok         ok            ok        yes
ep        ok         ok            ok        yes
fe        ok         ok            ok        yes
sn        looks ok   -             ok        - (*)
vx        looks ok   -             -         -
wlp       ok         ok            ok        - (*)
xl        ok         ok            -         yes
zp        ok         ok            -         -
(FDDI)
fpa       looks ok   ?             -         -
(ATM)
en        ok         ok            -         yes
(Serial)
lp        ?          -             -         not work
sl        ?          -             -         not work
sr        looks ok   ok            -         - (**)

You may want to add an invocation of "rtsol" in "/etc/pccard_ether"
if you are using a notebook computer with a PCMCIA Ethernet card.

(*) These drivers are distributed with PAO (http://www.jp.freebsd.org/PAO/).

(**) There was a report that, if you bring the sr driver up, then down,
and then up again, the kernel may hang. We have disabled frame-relay
support in the sr driver, and since then it looks to be working fine.
If you need frame-relay support back, please contact the KAME developers.

2.2 BSD/OS

The following lists BSD/OS device drivers and their status:

driver    mbuf(1)    multicast(2)  PCMCIA    official
                                   patch(3)  support?
---       ---        ---           ---       ---
(Ethernet)
cnw       ok         ok            ok        yes
de        ok         ok            no need   -
ef        ok         ok            ok        yes
exp       ok         ok            no need   -
mz        ok         ok            ok        yes
ne        ok         ok            ok        yes
we        ok         ok            ok        -
(FDDI)
fpa       ok         ok            -         -
(ATM)
en        maybe      ok            -         -
(Serial)
ntwo      ok         ok            -         yes
sl        ?          -             -         not work
appp      ?          -             -         not work

You may want to use the "@insert" directive in /etc/pccard.conf to invoke
the "rtsol" command right after dynamic insertion of PCMCIA Ethernet cards.

2.3 NetBSD

The following table lists the network drivers we have tried so far.
Note that, for NetBSD, we do not need a one-line patch to the PCMCIA driver.

driver           mbuf(1)    multicast(2)  official
                                          support?
---              ---        ---           ---
(Ethernet)
ne pci/i386      ok         ok            yes
ep pcmcia/i386   ok         ok            -
le sbus/sparc    ok         ok            yes
(ATM)
en pci/i386      ok         ok            -

2.4 FreeBSD 3.x-RELEASE

Here is a list of FreeBSD 3.x-RELEASE drivers and their status:

driver    mbuf(1)    multicast(2)  PCMCIA    official
                                   patch(3)  support?
---       ---        ---           ---       ---
(Ethernet)
fe        ok         ok            ?         yes
fxp       ok         ok            -         yes
lnc       ?          ok            -         -
cnw       ok         ok            ok(*)     -(**)
ep        ok         ok            ok(*)     -
sn        ?          ?             ?         -(**)

(*) With PAO3 patch applied (http://www.jp.freebsd.org/PAO/).
(**) These drivers are distributed with PAO as PAO3
(http://www.jp.freebsd.org/PAO/).

More drivers will probably just work on KAME FreeBSD 3.x-RELEASE, but
they have not been checked yet.

3. Translator

We categorize IPv4/IPv6 translators into four types.

Translator A --- It is used in the early stage of transition to make
it possible to establish a connection from an IPv6 host in an IPv6
island to an IPv4 host in the IPv4 ocean.

Translator B --- It is used in the early stage of transition to make
it possible to establish a connection from an IPv4 host in the IPv4
ocean to an IPv6 host in an IPv6 island.

Translator C --- It is used in the late stage of transition to make it
possible to establish a connection from an IPv4 host in an IPv4 island
to an IPv6 host in the IPv6 ocean.

Translator D --- It is used in the late stage of transition to make it
possible to establish a connection from an IPv6 host in the IPv6 ocean
to an IPv4 host in an IPv4 island.

KAME provides a TCP relay translator for category A. This is called
"FAITH". We also provide an IP header translator for category A.

3.1 FAITH TCP relay translator

The FAITH system uses a TCP relay daemon called "faithd", helped by the
KAME kernel. FAITH reserves an IPv6 address prefix, and relays TCP
connections toward that prefix to the IPv4 destination.

For example, if the reserved IPv6 prefix is 3ffe:0501:0200:ffff::, and
the IPv6 destination for a TCP connection is 3ffe:0501:0200:ffff::163.221.202.12,
the connection will be relayed toward the IPv4 destination 163.221.202.12.

	destination IPv4 node (163.221.202.12)
	  ^
	  | IPv4 TCP toward 163.221.202.12
	FAITH-relay dual stack node
	  ^
	  | IPv6 TCP toward 3ffe:0501:0200:ffff::163.221.202.12
	source IPv6 node
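
The address arithmetic in this example is simple enough to sketch. Assuming
the reserved prefix is a /96, so that the low 32 bits of the IPv6
destination carry the IPv4 address (as in the example above), a relay could
recover its IPv4 target like this. This is an illustration only;
faith_relay_target() and FAITH_PREFIX_BYTES are hypothetical names, not
faithd's code.

```c
#include <assert.h>
#include <string.h>
#include <netinet/in.h>

#define FAITH_PREFIX_BYTES 12	/* assumption: the FAITH prefix is a /96 */

/*
 * If dst6 lies inside the reserved prefix, store the embedded IPv4
 * destination in *dst4 and return 1; otherwise return 0.
 */
static int
faith_relay_target(const struct in6_addr *prefix, const struct in6_addr *dst6,
    struct in_addr *dst4)
{
	assert(prefix != NULL && dst6 != NULL && dst4 != NULL);
	/* compare the upper 96 bits with the reserved prefix */
	if (memcmp(prefix->s6_addr, dst6->s6_addr, FAITH_PREFIX_BYTES) != 0)
		return 0;
	/* the low 32 bits are the IPv4 address, already in network order */
	memcpy(&dst4->s_addr, &dst6->s6_addr[FAITH_PREFIX_BYTES], 4);
	return 1;
}
```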

faithd must be invoked on the FAITH-relay dual stack node.

For more details, consult kit/src/faithd/README.

3.2 IPv6-to-IPv4 header translator

(to be written)

4. IPsec

IPsec is mainly organized into three components:

(1) Policy Management
(2) Key Management
(3) AH and ESP handling

4.1 Policy Management

The policy management code is experimental, but it mostly conforms
to RFC2401. You can manage the SPD both with the administrative
command and through socket operations.
With the "setkey" command, there are three directives:

"discard" means to discard the packet.
"none" means not to apply IPsec.
"ipsec" means to apply IPsec.

"ipsec" is followed by IPsec requests of the form "protocol/level".

protocol is either "ah" or "esp".
level is one of "default", "use" or "require".

"use" means: for outbound packets, use a security association (SA) if
one is available, otherwise request one from the key management daemon
through PF_KEY v2; accept any inbound packets.

"require" means: if no SA exists for outbound traffic, acquire one and
discard packets until an SA is found; require inbound packets to be
protected by IPsec.

"default" means that the system-wide default, defined by the following
sysctl MIBs, is consulted:
  net.inet.ipsec.esp_trans_deflev
  net.inet.ipsec.esp_net_deflev
  net.inet.ipsec.ah_trans_deflev
  net.inet.ipsec.ah_net_deflev
  net.inet6.ipsec6.esp_trans_deflev
  net.inet6.ipsec6.esp_net_deflev
  net.inet6.ipsec6.ah_trans_deflev
  net.inet6.ipsec6.ah_net_deflev

Their values are 1 (use) or 2 (require).
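
The interaction between an explicit level, the "default" level and the
sysctl values above can be modelled in a few lines. The enum and function
names below are hypothetical, not KAME's. The sketch also assumes, as
setkey(8) describes, that "use" keeps sending packets without IPsec while
no SA is available; that fallback is not spelled out in the text above.

```c
#include <assert.h>

/* Hypothetical encodings; 1 and 2 match the sysctl values in the text. */
enum ipsec_level { LEVEL_DEFAULT = 0, LEVEL_USE = 1, LEVEL_REQUIRE = 2 };

enum out_action {
	SEND_PROTECTED,		/* an SA exists: apply AH/ESP */
	SEND_CLEAR_AND_ACQUIRE,	/* "use" without an SA (assumed fallback) */
	DROP_AND_ACQUIRE	/* "require" without an SA */
};

/* Replace "default" by the matching *_deflev sysctl value. */
static enum ipsec_level
resolve_level(enum ipsec_level requested, int sysctl_deflev)
{
	if (requested != LEVEL_DEFAULT)
		return requested;
	assert(sysctl_deflev == LEVEL_USE || sysctl_deflev == LEVEL_REQUIRE);
	return (enum ipsec_level)sysctl_deflev;
}

/* Decide what happens to an outbound packet for one resolved request. */
static enum out_action
outbound_action(enum ipsec_level level, int sa_available)
{
	assert(level == LEVEL_USE || level == LEVEL_REQUIRE);
	if (sa_available)
		return SEND_PROTECTED;
	/* in both cases an SA is requested from the key management daemon */
	return level == LEVEL_USE ? SEND_CLEAR_AND_ACQUIRE : DROP_AND_ACQUIRE;
}
```

For example, with net.inet.ipsec.esp_trans_deflev set to 2, an
"esp/default" request behaves exactly like "esp/require".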

To set up IPsec tunnel mode, use the format "protocol/level/peer",
where "peer" is the IP address of the tunnel end-point.

Through socket operations, there are three directives:
"ipsec" means to apply IPsec; see above.
"entrust" means to consult the SPD defined by the command.
"bypass" means not to apply IPsec. This is for privileged sockets.

If the kernel doesn't find a matching policy entry, the system-wide
default is applied. You can set the system-wide default either to
discard the packet or not to apply IPsec:

  net.inet.ipsec.def_policy
  net.inet6.ipsec6.def_policy

Their values are 0 (discard) or 1 (none).

These values are defined in netinet6/ipsec.h.

Policy entries are not re-ordered by their indexes, so the order in
which you add entries is very significant.
But we think this should be fixed in the future.
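
Because entries are not re-ordered, lookup effectively behaves as a first
match in insertion order, with def_policy applied when nothing matches.
The simplified sketch below assumes exactly that. It matches IPv4
source/destination prefixes only, keeps addresses in host byte order for
brevity, and its types and names (spd_entry, spd_lookup) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum spd_action { SPD_DISCARD, SPD_NONE, SPD_IPSEC };

/* Hypothetical, simplified SPD entry (host byte order, IPv4 only). */
struct spd_entry {
	uint32_t src, src_mask;
	uint32_t dst, dst_mask;
	enum spd_action action;
};

/*
 * Scan entries in the order they were added; the first match wins, so an
 * earlier, broader entry shadows a later, more specific one. With no match,
 * fall back to def_policy (0: discard, 1: none).
 */
static enum spd_action
spd_lookup(const struct spd_entry *spd, size_t n, uint32_t src, uint32_t dst,
    int def_policy)
{
	size_t i;

	assert(def_policy == 0 || def_policy == 1);
	for (i = 0; i < n; i++) {
		if ((src & spd[i].src_mask) == spd[i].src &&
		    (dst & spd[i].dst_mask) == spd[i].dst)
			return spd[i].action;
	}
	return def_policy == 0 ? SPD_DISCARD : SPD_NONE;
}
```

Adding a "none" entry for 10.0.0.0/8 before an "ipsec" entry for 10.0.0.1
therefore leaves traffic to 10.0.0.1 unprotected; adding them in the
opposite order protects it.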

4.2 Key Management

The key management code implemented in this kit (sys/netkey) is a
home-brew PFKEY v2 implementation. It conforms to RFC2367.

The home-brew IKE daemon, "racoon", is included in the kit
(kit/src/racoon). It can perform key exchanges under some limited
conditions; however, it may take some more time to stabilize.

4.3 AH and ESP handling

The IPsec module is implemented as "hooks" to the standard IPv4/IPv6
processing. When sending a packet, ip{,6}_output() checks whether ESP/AH
processing is required by checking if a matching SPD (Security
Policy Database) entry is found. If ESP/AH is needed,
{esp,ah}{4,6}_output() will be called and the mbuf will be updated
accordingly. When a packet is received, {esp,ah}4_input() will be
called based on the protocol number, i.e. (*inetsw[proto])().
{esp,ah}4_input() decrypts/checks the authenticity of the packet,
and strips off the daisy-chained header and padding for ESP/AH. It is
safe to strip off the ESP/AH header on packet reception, since we
will never use the received packet in "as is" form.
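
The daisy-chained header stripped on input is self-describing. For new-style
AH (RFC2402), the first byte names the next protocol, which is then
dispatched through the protocol switch in the same way, and the second byte
gives the AH length in 32-bit words minus 2. A minimal sketch of the
stripping step alone; ah_payload_offset() is a hypothetical name, and ICV
verification is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: locate the payload that follows an AH header.
 * RFC2402 layout: next header (1 byte), payload length (1 byte, AH length
 * in 32-bit words minus 2), reserved (2), SPI (4), sequence number (4),
 * then the authentication data (ICV). The ICV is NOT checked here; a real
 * implementation must verify it before trusting the inner payload.
 * Returns the offset of the inner payload, or 0 if the header is invalid
 * or truncated.
 */
static size_t
ah_payload_offset(const unsigned char *ah, size_t len, unsigned char *next_proto)
{
	size_t ahlen;

	assert(ah != NULL && next_proto != NULL);
	if (len < 12)
		return 0;			/* shorter than the fixed part */
	ahlen = ((size_t)ah[1] + 2) * 4;
	if (ahlen < 12 || ahlen > len)
		return 0;
	*next_proto = ah[0];			/* e.g. 6 for TCP */
	return ahlen;
}
```

With HMAC-MD5-96 or HMAC-SHA1-96 the ICV is 12 bytes, so the payload
length field is 4 and the inner payload starts 24 bytes into the AH header.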

When ESP/AH is used, the TCP4/6 MSS is affected by the extra daisy-chained
headers inserted by ESP/AH. Our code takes care of this case.
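
To see how large the effect is, the worst-case MSS in transport mode can be
computed from the header layouts. This sketch assumes new-style ESP (4-byte
SPI, 4-byte sequence number, explicit IV, 2-byte trailer, payload padded to
the cipher block size, optional ICV) and new-style AH (12-byte fixed header
plus ICV), and ignores IP options and extension headers. The function names
are hypothetical; this is not the formula in KAME's code.

```c
#include <assert.h>

/* Largest TCP MSS whose ESP transport-mode packet still fits in mtu. */
static int
esp_transport_mss(int mtu, int ip_hlen, int tcp_hlen, int iv_len,
    int block_len, int icv_len)
{
	int room, enc;

	assert(block_len > 0);
	/* space left for the encrypted part after IP header, SPI+seq, IV, ICV */
	room = mtu - ip_hlen - 8 - iv_len - icv_len;
	/* encrypted part (segment + padding + 2-byte trailer) is block-aligned */
	enc = (room / block_len) * block_len;
	return enc - 2 - tcp_hlen;
}

/* AH adds a fixed 12-byte header plus the ICV; no padding in this case. */
static int
ah_transport_mss(int mtu, int ip_hlen, int tcp_hlen, int icv_len)
{
	return mtu - ip_hlen - (12 + icv_len) - tcp_hlen;
}
```

For DES-CBC with explicit IV (8-byte IV and block) plus HMAC-96
authentication over a 1500-byte MTU, this gives an MSS of 1426 over IPv4
and 1410 over IPv6, versus 1460 and 1440 without IPsec.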

Basic crypto functions can be found in the directory "sys/crypto". ESP/AH
transforms are listed in {esp,ah}_core.c with wrapper functions. If you
wish to add an algorithm, add a wrapper function in {esp,ah}_core.c, and
add your crypto algorithm code into sys/crypto.

Tunnel mode is partially supported in this release, with the following
restrictions:
- IPsec tunnel is not combined with the GIF generic tunneling interface.
  It needs great care because we may create an infinite loop between
  ip_output() and tunnelifp->if_output(). Opinions vary on whether it is
  better to unify them or not.
- MTU and Don't Fragment bit (IPv4) considerations need more checking, but
  basically work fine.
- The authentication model for AH tunnels must be revisited. We'll need to
  improve the policy management engine, eventually.

4.4 Conformance to RFCs and IDs

The IPsec code in the kernel conforms (or tries to conform) to the
following standards:
  "old IPsec" specification documented in rfc182[5-9].txt
  "new IPsec" specification documented in rfc240[1-6].txt, rfc241[01].txt,
    rfc2451.txt and draft-mcdonald-simple-ipsec-api-01.txt (draft expired,
    but you can get it from ftp://ftp.kame.net/pub/internet-drafts/).
  (NOTE: the IKE specifications, rfc240[7-9].txt, are implemented in
  userland as the "racoon" IKE daemon)

Currently supported algorithms are:
  old IPsec AH
    null crypto checksum (no document, just for debugging)
    keyed MD5 with 128bit crypto checksum (rfc1828.txt)
    keyed SHA1 with 128bit crypto checksum (no document)
    HMAC MD5 with 128bit crypto checksum (rfc2085.txt)
    HMAC SHA1 with 128bit crypto checksum (no document)
  old IPsec ESP
    null encryption (no document, similar to rfc2410.txt)
    DES-CBC mode (rfc1829.txt)
  new IPsec AH
    null crypto checksum (no document, just for debugging)
    keyed MD5 with 96bit crypto checksum (no document)
    keyed SHA1 with 96bit crypto checksum (no document)
    HMAC MD5 with 96bit crypto checksum (rfc2403.txt)
    HMAC SHA1 with 96bit crypto checksum (rfc2404.txt)
  new IPsec ESP
    null encryption (rfc2410.txt)
    DES-CBC with derived IV
      (draft-ietf-ipsec-ciph-des-derived-01.txt, draft expired)
    DES-CBC with explicit IV (rfc2405.txt)
    3DES-CBC with explicit IV (rfc2451.txt)
    BLOWFISH CBC (rfc2451.txt)
    CAST128 CBC (rfc2451.txt)
    RC5 CBC (rfc2451.txt)
    each of the above can be combined with:
      ESP authentication with HMAC-MD5(96bit)
      ESP authentication with HMAC-SHA1(96bit)

The following algorithms are NOT supported:
  old IPsec AH
    HMAC MD5 with 128bit crypto checksum + 64bit replay prevention
      (rfc2085.txt)
    keyed SHA1 with 160bit crypto checksum + 32bit padding (rfc1852.txt)

4.5 ECN consideration on IPsec tunnels

KAME IPsec implements the ECN-friendly IPsec tunnel described in
draft-ipsec-ecn-00.txt.
The normal IPsec tunnel is described in RFC2401. On encapsulation, the
IPv4 TOS field (or the IPv6 traffic class field) is copied from the inner
IP header to the outer IP header. On decapsulation, the outer IP header
is simply dropped. This decapsulation rule is not compatible
with ECN, since the ECN bits in the outer IP TOS/traffic class field are
lost.
To make the IPsec tunnel ECN-friendly, we need to modify the encapsulation
and decapsulation procedures. This is described in
http://www.aciri.org/floyd/papers/draft-ipsec-ecn-00.txt, chapter 3.

The KAME IPsec tunnel implementation offers three behaviors, selected by
setting net.inet.ipsec.ecn (or net.inet6.ipsec6.ecn) to one of these values:
- RFC2401: no consideration for ECN (sysctl value -1)
- ECN forbidden (sysctl value 0)
- ECN allowed (sysctl value 1)
Note that the behavior is configurable per node, not per SA
(draft-ipsec-ecn-00 wants per-SA configuration, but that looks like too
much for me).

The behavior is summarized as follows (see the source code for more detail):

                encapsulate                       decapsulate
                ---                               ---
RFC2401         copy all TOS bits                 drop TOS bits on outer
                from inner to outer.              (use inner TOS bits as is)

ECN forbidden   copy TOS bits except for ECN      drop TOS bits on outer
                (masked with 0xfc) from inner     (use inner TOS bits as is)
                to outer. Set ECN bits to 0.

ECN allowed     copy TOS bits except for ECN      use inner TOS bits with some
                CE (masked with 0xfe) from        change. If outer ECN CE bit
                inner to outer.                   is 1, enable ECN CE bit on
                Set ECN CE bit to 0.              the inner.
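
The table translates directly into two small functions. This is a sketch
for illustration, not the code in {ah,esp}_input.c; the macro and function
names are hypothetical. It assumes the RFC2481 bit layout, where CE is the
least significant bit (0x01) of the TOS/traffic class octet and ECT is
0x02, which matches the 0xfc and 0xfe masks above.

```c
#include <assert.h>
#include <stdint.h>

/* Values of net.inet.ipsec.ecn / net.inet6.ipsec6.ecn, as in the text. */
#define ECN_RFC2401	(-1)
#define ECN_FORBIDDEN	0
#define ECN_ALLOWED	1

#define TOS_ECN_CE	0x01	/* CE bit */
#define TOS_ECN_MASK	0x03	/* ECT + CE */

/* Outer TOS/traffic class computed on encapsulation. */
static uint8_t
ecn_encap(int mode, uint8_t inner)
{
	switch (mode) {
	case ECN_RFC2401:
		return inner;				/* copy all bits */
	case ECN_FORBIDDEN:
		return inner & (uint8_t)~TOS_ECN_MASK;	/* mask 0xfc */
	case ECN_ALLOWED:
		return inner & (uint8_t)~TOS_ECN_CE;	/* mask 0xfe */
	default:
		assert(0 && "unknown ecn mode");
		return inner;
	}
}

/* Inner TOS after decapsulation; the outer header itself is dropped. */
static uint8_t
ecn_decap(int mode, uint8_t outer, uint8_t inner)
{
	/* only "ECN allowed" propagates a CE mark set inside the tunnel */
	if (mode == ECN_ALLOWED && (outer & TOS_ECN_CE))
		return inner | TOS_ECN_CE;
	return inner;
}
```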

General strategy for configuration is as follows:
- if both IPsec tunnel endpoints are capable of ECN-friendly behavior,
  you'd better configure both ends to "ECN allowed" (sysctl value 1).
- if the other end is very strict about TOS bits, use "RFC2401"
  (sysctl value -1).
- in other cases, use "ECN forbidden" (sysctl value 0).
The default behavior is "ECN forbidden" (sysctl value 0).

For more information, please refer to:
  http://www.aciri.org/floyd/papers/draft-ipsec-ecn-00.txt
  RFC2481 (Explicit Congestion Notification)
  KAME sys/netinet6/{ah,esp}_input.c

(Thanks go to Kenjiro Cho <kjc@csl.sony.co.jp> for detailed analysis)

5. IPComp

IPComp stands for IP Payload Compression Protocol. It is aimed at
payload compression, not header compression like PPP VJ compression.
It may be useful when you are using a slow serial link (say, a cell phone)
with a powerful CPU (well, recent notebook PCs are really powerful...).
The protocol design of IPComp is very similar to IPsec.

KAME implements the following specifications:
- RFC2393: IP Payload Compression Protocol (IPComp)
- RFC2394: IP Payload Compression Using DEFLATE

Here are some points to be noted:
- IPComp is treated as part of the IPsec protocol suite, and the SPI and
  CPI spaces are unified. The spec says that there is no relationship
  between the two, so they are assumed to be separate.
- The IPComp association (IPCA) is kept in the SAD.
- It is possible to use a well-known CPI (for example, CPI=2 for DEFLATE)
  for outbound/inbound packets, but for indexing purposes one element of
  the SPI/CPI space will be occupied anyway.
- pfkey is modified to support IPComp. However, there is no official
  SA type number assignment yet. Portability with other IPComp
  stacks is questionable (anyway, who else implements IPComp on UN*X?).
- The spec says that IPComp output must be performed before IPsec output
  processing. However, with manual SPD settings, you can violate this
  ordering requirement (KAME code is too generic, maybe).
- Though MTU can be significantly decreased by using IPComp, no special
  consideration is made about path MTU (the spec says nothing about MTU
  considerations). It seems IPComp is designed for serial links, not
  Ethernet-like media.
- You can change the compression ratio for outbound packets by changing
  deflate_policy in sys/netinet6/ipcomp_core.c (should it be sysctl
  accessible? or per-SAD configurable?)
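
On the size question above, RFC2393 at least guarantees that IPComp never
enlarges a datagram: if the compressed payload plus the 4-byte IPComp
header is not smaller than the original payload, the packet must be sent
uncompressed. The sketch below shows that check together with building the
IPComp header (next header, flags, 16-bit CPI). The function names are
hypothetical, and the compressor itself (DEFLATE) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPCOMP_HDRLEN		4	/* next header (1), flags (1), CPI (2) */
#define IPCOMP_CPI_DEFLATE	2	/* well-known CPI for DEFLATE */

/* Non-expansion rule: send the compressed form only if it saves space. */
static int
ipcomp_should_compress(size_t orig_len, size_t comp_len)
{
	return comp_len + IPCOMP_HDRLEN < orig_len;
}

/* Fill a 4-byte IPComp header; the CPI is written in network byte order. */
static void
ipcomp_build_header(uint8_t hdr[IPCOMP_HDRLEN], uint8_t next_proto, uint16_t cpi)
{
	assert(hdr != NULL);
	hdr[0] = next_proto;
	hdr[1] = 0;			/* flags: reserved, set to zero */
	hdr[2] = (uint8_t)(cpi >> 8);
	hdr[3] = (uint8_t)(cpi & 0xff);
}
```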

<end of IMPLEMENTATION>