netinet6/nd6 from the kernel. Now, the minimal compat code can
be successfully loaded and unloaded along with the rest of the
COMPAT_90 code.
XXX pullup-10 - hopefully before RC2
RFC 7048 Section 3 says in the UNREACHABLE state packets continue to be
sent to the link-layer address and then backoff exponentially.
We adjust this slightly and move to the INCOMPLETE state after
`nd_mmaxtries` probes and then start backing off.
This results in simpler code whilst providing a more robust model which
doubles the time to failure over what we did before.
We don't want to be back to the old ARP model where no unreachability
errors are returned because very few applications would look at
unreachability hints provided such as ND_LLINFO_UNREACHABLE or RTM_MISS.
This is much better handled by a user-land tool.
Proposed on tech-net here:
https://mail-index.netbsd.org/tech-net/2020/04/22/msg007766.html
Note that the ioctl SIOCGIFINFO_IN6 no longer sets flags. That now
needs to be done using the pre-existing SIOCSIFINFO_FLAGS ioctl.
Compat is fully provided where it makes sense, but trying to turn on
RA handling will obviously throw an error as it no longer exists.
Note that if you use IPv6 temporary addresses, this now needs to be
turned on in dhcpcd.conf(5) rather than in sysctl.conf(5).
The original code initialized each component in non-init functions such as
arp_dad_start and nd6_dad_find, conditionally based on a global flag for each.
However, it was racy because the flag and the code around it were not
protected by a lock and could cause a kernel panic at worst.
Fix the issue by initializing the components in bootup as usual.
Once we've sent nd6_mmaxtries NS messages, send RTM_MISS and move to the
ND6_LLINFO_WAITDELETE state rather than freeing the llentry right away.
Wait for a probe cycle and then free the llentry.
If a connection attempts to re-use the llentry during ND6_LLINFO_WAITDELETE,
return EHOSTDOWN (or EHOSTUNREACH if a gateway) to match inet behaviour.
Continue to ND6_LLINFO_INCOMPLETE and send another NS probe in hope of a
reply. Rinse and repeat.
This reverts part of nd6.c r1.14 - an 18 year old commit!
Takes the same approach as when adding a new address - we no longer
announce the new lladdr right away but we announce the result.
This will either be RTM_ADD or RTM_MISS.
RTM_DELETE is only sent if we have a lladdr assigned OR gc'ed.
This results in less messages via route(4) and tells us when a new
lladdr has been added (RTM_ADD), changed (RTM_CHANGE), deleted (RTM_DELETED)
or has failed to been resolved (RTM_MISS). The latter case can be
interpreted as unreachable.
This hasn't been the case for a long time if you're a dhcpcd
user with a default config. As such, it's possible for the default
IPv6 router as set by dhcpcd could be erroneously gc'ed by nd6_free.
This reduces the scope of the ND6_WLOCK taken as well as fixing an
issue where we write to ln->ln_state without a lock being held.
On sending a packet over a STALE cache, the cache should be tried a reachability
confirmation, which is described in RFC 2461/4861 7.3.3. On the fast path in
nd6_resolve, however, the treatment for STALE caches has been skipped
accidentally. So STALE caches never be back to the REACHABLE state.
To fix the issue, branch to the fast path only when the cache entry is the
REACHABLE state and leave other caches to the slow path that includes the
treatment. To this end we need to allow to return a link-layer address if a
valid address is available on the slow path too, which is the same behavior as
FreeBSD and OpenBSD.
Unless it's lo0, where we then flush the lot.
The maintains the status-quo with ndp(8) and allows dhcpcd(8) to at least
try and work with kernel RA on one interface and dhcpcd on another.
Some callers of rtrequest1(RTM_ADD) adjust rt_ifa of an rtentry created by
rtrequest1 that may change rt_ifa (in ifa_rtrequest) with another ifa that is
different from requested one. It's wasteful and even worse introduces a race
condition. rtrequest1 should just use a passed ifa as is if a caller hopes so.
These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.
HOWEVER! Some subsystems have
#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))
even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.
To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.
I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:
cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))
It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.
Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
callout_reset and callout_halt can cancel a pending callout without telling us.
Detect a cancel and remove a reference by using callout_pending and
callout_stop (it's a bit tricy though, we can detect it).
While here, we can remove remaining abuses of mutex_owned for softnet_lock.
allocate, rather than relying on an arbitrary length passed in from
userland.
Allow copyout() of partial results if the user buffer is too small, to
be consistent with the way sysctl(3) is documented.
Garbage-collect now-unused third parrameter in the fill_[pd]rlist()
functions.
As discussed on IRC.
OK kamil@ and christos@
XXX Needs pull-up to netbsd-8 branch.
It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
It happens because rtalloc1 is called from lltable with holding
IF_AFDATA_WLOCK.
If a route update is in action, rtalloc1 would wait for its completion with
holding IF_AFDATA_WLOCK. At the same moment, a softint (e.g., arpintr) may try
to take IF_AFDATA_WLOCK and get stuck on it. Unfortunately the stuck softint
prevents the route update from progressing because the route update calls
psref_target_destroy that needs the softint to complete.
A resource allocation graph of the senario looks like this:
route update =(psref_target_destroy)=> softint => IF_AFDATA_WLOCK
=(rt_update_wait)=> route update
Fix the deadlock by pulling rtalloc1 out of the lltable codes inside
IF_AFDATA_WLOCK.
Note that the deadlock happens only if NET_MPSAFE is enabled.
It was created to copy FreeBSD, however actually the cache isn't
necessary. Remove it to simplify the code and reduce the cost to
maintain it (e.g., keep a consistency with a corresponding local
route).