Commit Graph

91 Commits

Author SHA1 Message Date
andvar 2e0bf311b3 fix multiplei repetitive typos in comments, messages and documentation. mainly because copy paste code big amount of files are affected. 2021-08-17 22:00:26 +00:00
ryo 798cc6a0c1 flowlabel will never return anything other than 1 or 0.
s/&&/&/
2021-03-11 11:10:22 +00:00
christos 0b3745dbe8 no need for ip6_id.c... 2021-03-08 18:22:16 +00:00
christos 4b58b6c56b netinet/netinet6: Add necessary includes to make these standalone.
(from riastradh)
2021-03-07 15:01:00 +00:00
ozaki-r 9e214c7fd5 inet6: reduce silent packet discards 2020-08-28 06:32:24 +00:00
ozaki-r 4c639cc739 inet6: pass rcvif to ip6_forward to avoid extra psref_acquire 2020-08-28 06:28:58 +00:00
ozaki-r c1e00d7df1 inet, inet6: count packets dropped by IPsec
The counters count packets dropped due to security policy checks.
2020-08-28 06:19:13 +00:00
maxv 0551fa110c localify 2020-06-19 16:08:06 +00:00
roy b05648aa26 Remove in-kernel handling of Router Advertisements
This is much better handled by a user-land tool.
Proposed on tech-net here:
https://mail-index.netbsd.org/tech-net/2020/04/22/msg007766.html

Note that the ioctl SIOCGIFINFO_IN6 no longer sets flags. That now
needs to be done using the pre-existing SIOCSIFINFO_FLAGS ioctl.

Compat is fully provided where it makes sense, but trying to turn on
RA handling will obviously throw an error as it no longer exists.

Note that if you use IPv6 temporary addresses, this now needs to be
turned on in dhcpcd.conf(5) rather than in sysctl.conf(5).
2020-06-12 11:04:44 +00:00
ozaki-r 6d8eb4f9d2 Count packets dropped by pfil 2019-05-13 07:47:59 +00:00
ozaki-r 42cd9a0569 Introduce and use ip_dad_enabled() and ip6_dad_enabled() functions 2018-11-29 09:51:20 +00:00
maxv f574a1384a Re-make ip6_nexthdr global, it will be used in soon-to-be-added code... 2018-02-14 05:29:39 +00:00
maxv 4e9bde34d6 Style, localify, remove dead code, and fix typos. No functional change. 2018-01-30 15:54:02 +00:00
maxv 71ad96023a Fix a buffer overflow in ip6_get_prevhdr. Doing
mtod(m, char *) + len

is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.

The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.

But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.

However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.

As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.

Then there are the more easy cases, if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.

Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.

This place is still fragile.
2018-01-30 14:49:25 +00:00
maxv 6f5024f576 Start cleaning up ip6_input.c. Several pieces of code have evolved but
their neighboring comments were not updated. So update them, and remove
code that has been disabled for years (it has no use anyway).
2018-01-29 10:57:13 +00:00
maxv e3090ee154 Several changes:
* Move the structure definitions into frag6.c, they should not be used
   elsewhere.

 * Rename ip6af_mff -> ip6af_more, and switch it to bool, easier to
   understand.

 * Remove IP6_REASS_MBUF, no point in keeping this.

 * Remove ip6q_arrive and ip6q_nxtp, unused.

 * Style.
2018-01-25 15:33:06 +00:00
knakahara 4ab3af3e3e add ipsec(4) interface, which is used for route-based VPN.
man and ATF are added later, please see man for details.

reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
2018-01-10 10:56:30 +00:00
ozaki-r 2495e7a0c7 Pass inpcb/in6pcb instead of socket to ip_output/ip6_output
- Passing a socket to Layer 3 is layer violation and even unnecessary
- The change makes codes of callers and IPsec a bit simple
2017-03-03 07:13:06 +00:00
ozaki-r 36ae5d22b0 Make usages of ifp MP-safe in some functions of IP multicast 2017-03-02 05:24:23 +00:00
ozaki-r 3f909d1769 Do ND in L2_output in the same manner as arpresolve
The benefits of this change are:
- The flow is consistent with IPv4 (and FreeBSD and OpenBSD)
  - old: ip6_output => nd6_output (do ND if needed) => L2_output (lookup a stored cache)
  - new: ip6_output => L2_output (lookup a cache. Do ND if cache not found)
- We can remove some workarounds in nd6_output
- We can move L2 specific operations to their own place
- The performance slightly improves because one cache lookup is reduced
2017-02-14 03:05:06 +00:00
ozaki-r 4c25fb2f83 Add rtcache_unref to release points of rtentry stemming from rtcache
In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
points. So we need to protect rtentries somehow say by reference couting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes difficult to debug by bisecting.
2016-12-08 05:16:33 +00:00
ozaki-r d0432711b6 Tidy up in6_select*
This change tidies up in6_select* functions, especially
selectroute.

selectroute is annoying because:
- It returns both/either of a rtentry and/or an ifp
  - Yes, it may return only an ifp!
    - It is valid but selectroute shouldn't handle the case
  - Such conditional behavior makes it difficult
    to apply locking/psref thingy
- It may return a rtentry even if error
- It may use opt->ip6po_nextroute rtcache implicitly
  - The caller can know if it is used
    by rtcache_validate(&opt->ip6po_nextroute)
    but it's racy in MP-safe world
  - Even if it uses opt->ip6po_nextroute, it may
    return a rtentry that isn't derived from the rtcache

The change includes:
- Rename selectroute to in6_selectroute
  - Let a remaining caller of selectroute, in6_selectif,
    use in6_selectroute instead
- Let in6_selectroute return only an rtentry
  - If error, it doesn't return an rtentry
  - A caller gets an ifp from a returned rtentry
- Allow in6_selectroute to modify a passed rtcache
  and a caller can know if opt->ip6po_nextroute is
  used via the rtcache
- Let callers (ip6_output and in6_selectif) handle
  the case that only an ifp is required

Inspired by OpenBSD
Proposed on tech-kern and tech-net
LGTM by roy@
2016-11-10 04:13:53 +00:00
ozaki-r 0f3a44863e Fix race condition of in6_selectsrc
in6_selectsrc returned a pointer to in6_addr that wan't guaranteed to be
safe by pserialize (or psref), which was racy. Let callers pass a pointer
to in6_addr and in6_selectsrc copy a result to it inside pserialize
critical sections.
2016-10-31 04:16:25 +00:00
knakahara 74c24413b3 improve fast-forward performance when the number of flows exceeds ip6_maxflows.
This is porting of ip_flow.c:r1.76

In ip6flow case, the before degradation is about 45%, the after degradation is
bout 55%.
2016-08-23 09:59:20 +00:00
ozaki-r 4badfc204a Make sure returning ifp from in6_select* functions psref-ed
To this end, callers need to pass struct psref to the functions
and the fuctions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.
2016-06-21 10:25:27 +00:00
ozaki-r 43c5ab376f Replace ifp of ip_moptions and ip6_moptions with if_index
The motivation is the same as the mbuf's rcvif case; avoid having a pointer
of an ifnet object in ip_moptions and ip6_moptions, which is not MP-safe.

ip_moptions and ip6_moptions can be stored in a PCB for inet or inet6
that's life time is different from ifnet one and so an ifnet object can be
disappeared anytime we get it via them. Thus we need to look up an ifnet
object by if_index every time for safe.
2016-06-21 03:28:27 +00:00
ozaki-r fe6d427551 Avoid storing a pointer of an interface in a mbuf
Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of an
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places where are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
2016-06-10 13:31:43 +00:00
roy 9daa8a6db0 Add net.inet6.ip6.prefer_tempaddr sysctl knob so that we can prefer
IPv6 temporary addresses as the source address.

Fixes PR kern/47100 based on a patch by Dieter Roelants.
2015-01-20 21:27:36 +00:00
christos 4f85a755f8 Refactor the multicast membership code so that we can handle v4 mapped
addresses using the v6 membership ioctls.
2014-10-12 19:00:21 +00:00
rmind 60d350cf6d - Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
2014-06-05 23:48:16 +00:00
rmind 4ae03c1815 - Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.
2014-05-19 02:51:24 +00:00
rmind 39bd8dee77 Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw.  Welcome to 6.99.62!
2014-05-18 14:46:15 +00:00
christos 443eb0a284 4 new sysctls to avoid ipv6 DoS attacks from OpenBSD 2012-06-23 03:13:41 +00:00
liamjfoy b723329891 Remove ip6f_start from ip6f struct 2012-01-19 13:19:34 +00:00
drochner cf21c579f1 add patch from Arnaud Degroote to handle IPv6 extended options with
(FAST_)IPSEC, tested lightly with a DSTOPTS header consisting
of PAD1
2012-01-10 20:01:56 +00:00
zoltan 766dd565c7 Change the IPv6 reassembly mechanism to use mutex(9).
Also add ip6_reass_packet() to be used by NPF.
2011-11-04 00:22:33 +00:00
spz 5f1fd2312c RA flood mitigation via a limit on accepted routes:
- introduce a limit for the routes accepted via IPv6 Router Advertisement:
  a common 2 interface client will have 6, the default limit is 100 and
  can be adjusted via sysctl
- report the current number of routes installed via RA via sysctl
- count discarded route additions. Note that one RA message is two routes.
  This is at present only across all interfaces even though per-interface
  would be more useful, since the per-interface structure complies to RFC2466
- bump kernel version due to the previous change
- adjust netstat to use the new value (with netstat -p icmp6)
2011-05-24 18:07:11 +00:00
dyoung ac162b774b *_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag.  Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.
2011-05-03 17:44:30 +00:00
elad 4188b89914 Remove some usage of "priv" and "privileged" variables and instead pass
around credentials. Also push down kauth(9) calls closer to where the
operation is done.

Mailing list reference:

	http://mail-index.netbsd.org/tech-net/2009/04/30/msg001270.html
2009-05-06 21:41:59 +00:00
liamjfoy 29f894919e Init ip6flow pool dynamically instead of using a linkset. 2009-03-23 18:43:20 +00:00
plunky fd7356a917 Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core
2008-08-06 15:01:23 +00:00
ad 15e29e981b Merge the socket locking patch:
- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
2008-04-24 11:38:36 +00:00
thorpej 0dd41b37de Make ip6 and icmp6 stats per-cpu. 2008-04-15 03:57:04 +00:00
thorpej 3f466bce48 Change IPv6 stats from a structure to an array of uint64_t's.
Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.
2008-04-08 23:37:43 +00:00
dyoung ff82b311dd No code ever sets struct ip6_pktopts member ip6po_m, so get rid of
it.
2008-03-19 08:10:18 +00:00
dyoung 6709ce7762 The IPv6 stack labels incoming packets with an m_tag whose payload
is a struct ip6aux.  A struct ip6aux used to contain a pointer to
an in6_ifaddr, but that pointer could become a dangling reference
in the lifetime of the m_tag, because ip6_setdstifaddr() did not
increase the in6_ifaddr's reference count.  I have removed the
pointer from ip6aux.  I load it with the interesting fields from
the in6_ifaddr (an IPv6 address, a scope ID, and some flags),
instead.
2007-10-29 16:54:42 +00:00
dyoung 08e6f22226 Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

        Introduce rt_walktree() for walking the routing table and
        applying a function to each rtentry.  Replace most
        rn_walktree() calls with it.

        Use rt_getkey()/rt_setkey() to get/set a route's destination.
        Keep a pointer to the sockaddr key in the rtentry, so that
        rtentry users do not have to grovel in the radix_node for
        the key.

        Add a RTM_GET method to rtrequest.  Use that instead of
        radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

        Constify.  KNF.  Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
        et cetera.  Use NULL instead of 0 for null pointers.  Use
        __arraycount().  Reduce gratuitous parenthesization.

        Stop using variadic arguments for rip6_output(), it is
        unnecessary.

        Remove the unnecessary rtentry member rt_genmask and the
        code to maintain it, since nothing actually used it.

        Make rt_maskedcopy() easier to read by using meaningful variable
        names.

        Extract a subroutine intern_netmask() for looking up a netmask in
        the masks table.

        Start converting backslash-ridden IPv6 macros in
        sys/netinet6/in6_var.h into inline subroutines that one
        can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
2007-07-19 20:48:52 +00:00
yamt c8a34d8e58 remove net.inet6.ip6.rht0 sysctl.
it's too dangerous compared to its benefit.

strongly requested by itojun@.  ok'ed by core@.
2007-05-17 11:48:42 +00:00
dyoung 72f0a6dfb0 Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principle benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing.  Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously.  Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs.  I have
  introduced routines for allocating, copying, and duplicating,
  and freeing sockaddrs:

        struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
        struct sockaddr *sockaddr_copy(struct sockaddr *dst,
                                       const struct sockaddr *src);
        struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
        void sockaddr_free(struct sockaddr *sa);

  sockaddr_alloc() returns either a sockaddr from the pool belonging
  to the specified family, or NULL if the pool is exhausted.  The
  returned sockaddr has the right size for that family; sa_family
  and sa_len fields are initialized to the family and sockaddr
  length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
  sockaddr_in).  sockaddr_free() puts the given sockaddr back into
  its family's pool.

  sockaddr_dup() and sockaddr_copy() work analogously to strdup()
  and strcpy(), respectively.  sockaddr_copy() KASSERTs that the
  family of the destination and source sockaddrs are alike.

  The 'flags' argumet for sockaddr_alloc() and sockaddr_dup() is
  passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
  family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
  etc.  They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more.  All protocol families
  use struct route.  I have changed the route cache, 'struct route',
  so that it does not contain storage space for a sockaddr.  Instead,
  struct route points to a sockaddr coming from the pool the sockaddr
  belongs to.  I added a new method to struct route, rtcache_setdst(),
  for setting the cache destination:

        int rtcache_setdst(struct route *, const struct sockaddr *);

  rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
  available to create the sockaddr storage.

  It is now possible for rtcache_getdst() to return NULL if, say,
  rtcache_setdst() failed.  I check the return value for NULL
  everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
  caches, dom_rtcache.  rtflushall(sa_family_t af) looks up the
  domain indicated by 'af', walks the domain's list of route caches
  and invalidates each one.
2007-05-02 20:40:22 +00:00
christos 30921e7925 fix typo. 2007-04-22 20:06:07 +00:00