example etype 0x88CA (TIPC (Transparent Inter Process Communication,)
or 0x893A (IEEE 1905).
Classify them as dropped like Linux does (FreeBSD just ignores them). From RVP.
to get status of l2tp(4) added to lagg
NOTE:
SIOCGLAGGPORT is based on FreeBSD implementation.
And, currently, it is not used in NetBSD kernel/userland.
This causes the following KASSERT failure in pppoe server.
- sppp_rcr_event()
- sppp_ipcp_confreq()
- sppp_get_ip_addrs()
- psref_release()
After if_spppsubr.c:1.227, sppp_ipcp_confreq() is done in workqueue
instead of softint.
- Accept "ifconfig lagg* laggport l2tp*"
- Set promiscuous mode when the added interface is l2tp*
- check IFF_UP in addition to IFF_RUNNING on
SIOCSIFFLAGS to a child interface.
lagg(4) uses a configured link-layer (MAC) address instead
of a random MAC address generated on creating.
The configured MAC address is copied to all child interface
and used for a system id of LACP.
to link-state change hook
The callback is registered in every vlan I/F even if the parent
interface is the same. Therefore it is not needed to search the
vlan I/F by the parent interface unlike the previous callback.
agr(4) and lagg(4) can not be used on the same interface so that
if_agrprivate and if_lagg are not used at the same time.
For resolve this wasteful, if_lagg is used in not only lagg(4)
but also agr(4).
After this modification, if_lagg has 3 states:
1. if_lagg == NULL
- Both agr(4) and lagg(4) are not running on the interface
2. if_lagg != NULL && ifp->if_type != IFT_IEEE8023ADLAG
- agr(4) is running on the I/F
3. if_lagg != NULL && ifp->if_type == IFT_IEEE8023ADLAG
- lagg(4) is running on the I/F
define a flag FILTEROP_ISFD that has the meaning of the prior f_isfd.
Field and flag name aligned with OpenBSD.
This does not constitute a functional or ABI change, as the field location
and size, and the value placed in that field, are the same as the previous
code, but we're bumping __NetBSD_Version__ so 3rd-party module source code
can adapt, as needed.
NetBSD 9.99.89
Doing "d->bd_promisc = 0" is that bpf_detach() does not call
ifpromisc(ifp, 0). Currently, there is no reason for
this behavior so that it is removed.
In addition to the change, the workaround for it in vlan(4)
is also removed.
These were originally made failable back in 2017 when if_initialize
allocated a softint in every interface for link state changes, so
that it could fail gracefully instead of panicking:
https://mail-index.NetBSD.org/source-changes/2017/10/23/msg089053.html
However, this spawned many seldom- or never-tested error branches,
which are risky to have around. And that softint in every interface
has since been replaced by a single global workqueue, because link
state changes require thread context but not low latency or high
throughput:
https://mail-index.NetBSD.org/source-changes/2020/02/06/msg113759.html
So there is no longer any reason for if_initialize to fail. (The
subroutine if_stats_init can't fail because percpu_alloc can't fail
either.)
There is a snag: the softint_establish in if_percpuq_create could
fail, potentially leading to bad consequences later on trying to use
the softint. This change doesn't introduce any new bugs because of
the snag -- if_percpuq_attach was already broken. However, the snag
can be better addressed without spawning error branches, either by
using a single softint or making softints less scarce.
(Separate commit will change the signatures of if_attach and
if_initialize to return void, scheduled to ride whatever is the next
convenient kernel bump.)
Patch and testing on amd64 and evbmips64-eb by maya@; commit message
soliloquy, and compile-testing on evbppc/i386/earmv7hf, by me.
that allows a driver to track listeners attaching/detaching from tap
points.
This is usefull for drivers that would have to do extra work for some
taps and can not easily decide (at the driver level) if the work would
be needed further up the stack.
An example is providing radiotap headers for IEEE 802.11 frames.
even a reconnection is scheduled
The queue for events in if_spppsubr.c is not possible
to enqueue the same event. So, The close event caused
while a close event and open event are enqueued for
reconnection is not possible to stop interface.
To solve this issue, The open event after
"ifconfig pppoe? down" is dropped.
Almost network interface do not use if_down() even when there is no
connectivity. So, pppoe(4) is also made be not used it.
This behavior can be rollbacked by SPPP_IFDOWN_RECONNECT option.
In if_spppsubr.c down and up do not mean that LCP is stopping
or running, but mean that the lower layer of LCP is up or down.
And, restarting of LCP is had to use close event and open event.
Clang is stricter than GCC when it comes to nonliteral format strings.
sys/net/lagg/if_lagg.c:2372:12: error:
format string is not a string literal [-Werror,-Wformat-nonliteral]
A authentication failed by TO+ event between RCA and RCR events
1. RCA event in REQ-SENT state
- REQ-SENT => ACK-RCVD
2. TO+ event
- ACK-RCVD => REQ-SENT
3. RCR+ event
- REQ-SENT => ACK-SENT
By moving RCA after RCR, the state is transisted to OPENED
1. RCR event
- REQ-SENT => ACK-SENT
2. TO+ event
- state is not changed
3. RCA event
- ACK-SENT => OPENED
and add initialization of saved_hisaddr for safety
0.0.0.0 was sometimes configured to destination address when
ipcp close was occurred before ipcp tlu.
Following messages will be appeared when the issue is encountered and
debug for pppoe(4) is enabled.
tc-so:[ 1.890005] pppoe0: ipcp close(starting)
(snip)
tc-so:[ 1.890005] pppoe0: ipcp_open(): no IP interface
In principle this might just push a real problem around, but this is
unlikely to be a real problem because:
1. The large stack frames are really only in the setup state machine
message handlers, which run at the top loop of a thread with a
shallow stack anyway.
2. If these are inlined, gcc might create multiple nonoverlapping
stack buffers, whereas if not inlined, the stack frames from
consecutive or alternative procedure calls would overlap anyway.
(I haven't investigated exactly what's going on leading to ~5 KB-byte
stack frames, but this shuts gcc up, at least, and the hypotheses
sound plausible to me!)
from softint context
When the pases were processed in softint, the state machine
in if_spppsubr.c had been broken by simultaneous events
on rare occasions.
Example:
1. Do ifconfig pppoe* up
- lcp open event is enqueued to workqueue
2. Receive conf-ack, and parse the packet
- save mru to sp->lcp.their_mru
- lcp RCR+ event is enqueued to workqueue
3. Process lcp open event
- initialize data including sp->lcp.their_mru
4. Process lcp RCR+ event
- Use sp->lcp.their_mru
- but it was initialized
in myauthproto and hisauthproto
When the flag is enabled, a authentication protocol notified
at LCP negotiation is used as my authentication protocol.
When the flags is NOT enabled, my authentication protoco is
not changed at LCP negotiation.
The locks were held while callout_halt() and workqueue_wait()
without reason.
And the locks also were held at callout and workqueue handler
so that the handler kicked by those function couldn't acquire
the lock.
The reasons why those are unneccesary are:
- Items of callout_t are protected by callout_lock
- Items of struct workqueue and struct work are protected
by q_mutex in struct workqueue
- Items of struct sppp_work protected by atomic_cas(3)
- struct pppoe_softc does not free before workqueue_wait() and
callout_halt() even if the locks are not held
correct because it works with non-primitive types and provides the ABI
alignment for the type the compiler will use.
- Remove all the *_HDR_ALIGNMENT macros and asserts
- Replace POINTER_ALIGNED_P with ACCESSIBLE_POINTER which is identical to
ALIGNED_POINTER, but returns that the pointer is always aligned if the
CPU supports unaligned accesses.
[ as proposed in tech-kern ]
- use a single macro to align pointers and expose the alignment, instead
of hard-coding 3 in 1/2 the macros.
- fix an issue in the ipv6 lt2p where it was aligning for ipv4 and pulling
for ipv6.
This flag was only set for virtual interfaces.
All virtual interfaces have a means of knowing if they are going to work
or not and as such now support link state changes.
If we want this flag back, it should be used as an indicator that
the interfaces does not support link state changes that userland can use
so it can make a decision on what to do when the link state is UNKNOWN.
The exact flow is detatch addresses, join bridge and then mark detached
addresses as tentative.
This ensures that Duplicate Address Detection for the joining interface
are performed across all members of the bridge.
The vether interface simulates a normal Ethernet interface by encapsulating
standard network frames with an Ethernet header, specifically for use as
a member in a bridge(4).
To use vether the administrator needs to configure an address onto the
interface so that packets can be routed to it. An Ethernet header will
be prepended and, if the vether interface is a member of a bridge(4),
the frame will show up there.
Taken from OpenBSD.
If any member is LINK_STATE_UP then it's LINK_STATE_UP.
Otherwise if any member is LINK_STATE_UNKNOWN then it's LINK_STATE_UNKNOWN.
Otherwise it's LINK_STATE_DOWN.
Link state changes are not dependant on the interface being up, but we also
need to guard against more link state changes being scheduled when the
interface is being detached.
We do this by clearing the link queue but keeping if_link_sheduled = true.
We can check for this in both if_link_state_change() and
if_link_state_change_work() to abort early as there is no point in doing
anything if the interface is being detached because if_down() is called
in if_detach() after the workqueue has been drained to the same overall
effect.
For AF_LINK addrs from getifaddrs(2), ifa_data is struct if_data.
This in turn holds ifi_link_state which we can use to report
link status if the interface does not support media where it's normally
reported.
Based on OpenBSD.
RFC 7048 Section 3 says in the UNREACHABLE state packets continue to be
sent to the link-layer address and then backoff exponentially.
We adjust this slightly and move to the INCOMPLETE state after
`nd_mmaxtries` probes and then start backing off.
This results in simpler code whilst providing a more robust model which
doubles the time to failure over what we did before.
We don't want to be back to the old ARP model where no unreachability
errors are returned because very few applications would look at
unreachability hints provided such as ND_LLINFO_UNREACHABLE or RTM_MISS.
While here, remove the IFQ_CLASSIFY bottleneck (takes the ifq lock,
so it would serialize all transmission to all peers on a single wg(4)
interface).
altq can be disabled at compile-time or at run-time; even if included
at comple-time the run-time impact should be negligible if disabled.
Should really fix workqueue(9) so workqueue_create can be done before
CPUs have been detected in configure, but this will serve as a stop-
gap measure.
This brings the benefit of Neighbour Unreachability Detection which is
something ARP sorely lacks.
The new timings mirror those of IPv6 and are adjustable via sysctl(8).
Unlike IPv6 ND, these are global and not per interface.
Otherwise pktqueues can't be created before all CPUs are detected --
they will have a queue only for the primary CPU, not for others.
This will also be necessary if we want to add CPU hotplug (still need
some way to block hotplug during pktq_set_maxlen but it's a start).
- This is safe because wgp_endpoint_changing locks out any attempts
to change the endpoint until the draining is complete.
- This is necessary to avoid a deadlock where the handshake thread
holds a psref and awaits mutex_enter(wgp->wgp_lock).
XXX The same deadlock may occur in wg_destroy_session. Not clear
that it's safe to just release wgp_lock there; may need to create a
new session state, say WGS_STATE_DRAINING, while we wait for
psref_target_destroy. But this needs a little more thought; a new
state may not be necessary, and would be nice to avoid if not
necessary.
- Using threadpool(9) job per interface to receive incoming handshake
messages gives the same concurrency for active interfaces but
doesn't waste kthreads for inactive ones.
=> Can't really do this with a global workqueue(9) because there's
no bound on the amount of time wg_receive_packets() might run
for; we really need separate threads or threadpool jobs in order
to avoid having one interface starve all the others.
- Using a global workqueue(9) for asynchronous peer tasks avoids
creating unnecessary kthreads.
=> Each task does a more or less bounded amount of work, so it's OK
to share a global workqueue -- there's no advantage to adding
concurrency for what is almost certainly going to be CPU-bound
asymmetric crypto.
=> This way we don't need a thread per peer or iteration over a
list of all peers, so the task mechanism should no longer be a
bottleneck to scaling to thousands of peers.
XXX This doesn't distribute the load across CPUs -- it keeps it on
the same CPU where the packet came in. Should consider doing
something to balance the load -- maybe note if the current CPU is
loaded, and if so, sort CPUs by queue length or some other measure of
load and pick the least loaded one or something.
- Improves scalability -- won't hit limit on softints no matter how
many peers there are.
- Improves parallelism -- softint was kernel-locked to serialize
access to the pcq.
- Requires per-peer queue on handshake init to avoid dropping first
packet.
. Per-peer queue is currently a single packet -- should serve well
enough for pings, dns queries, tcp connections, &c.
Summary: Access to a stable established session is still allowed via
psref; all other access to peer and session state is now serialized
by struct wg_peer::wgp_lock, with no dancing around a per-session
lock. This way, the handshake paths are locked, while the data
transmission paths are pserialized.
- Eliminate struct wg_session::wgs_lock.
- Eliminate wg_get_unstable_session -- access to the unstable session
is allowed only with struct wgp_peer::wgp_lock held.
- Push INIT_PASSIVE->ESTABLISHED transition down into a thread task.
- Push rekey down into a thread task.
- Allocate session indices only on transition from UNKNOWN and free
them only on transition back to UNKNOWN.
- Be a little more explicit about allowed state transitions, and
reject some nonsensical ones.
- Sprinkle assertions and comments.
- Reduce atomic r/m/w swap operations that can just as well be
store-release.