Rip6 entry point could see a garbage Hop6 option.
Not a big issue, since it's a clean panic only triggerable if the socket
has the IN6P_DSTOPTS/IN6P_RTHDR option.
Reported-by: syzbot+3b07b3511b4ceb8bf1e2@syzkaller.appspotmail.com
ip6_temp_preferred_lifetime is used to calculate the interval at which
temporary addresses are regenerated:
TEMP_PREFERRED_LIFETIME - REGEN_ADVANCE - DESYNC_FACTOR
as per RFC 3041 3.5. So it must be greater than (REGEN_ADVANCE +
DESYNC_FACTOR); otherwise the interval becomes negative and things go
wrong, for example the KASSERT(to_ticks >= 0) in callout_schedule_locked
fails.
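A minimal sketch of the check this implies; the helper name, the
REGEN_ADVANCE macro and the constant values below are illustrative, not
the committed code:
#include <sys/types.h>
#include <sys/errno.h>

#define TEMPADDR_REGEN_ADVANCE    5     /* REGEN_ADVANCE, in seconds */

static int ip6_temp_preferred_lifetime = 86400; /* TEMP_PREFERRED_LIFETIME */
static int ip6_desync_factor = 600;             /* DESYNC_FACTOR (randomized) */

/* Hypothetical validation helper for the sysctl handler. */
static int
set_temp_preferred_lifetime(int newval)
{
    /*
     * The regeneration interval is
     * TEMP_PREFERRED_LIFETIME - REGEN_ADVANCE - DESYNC_FACTOR,
     * so reject any value that would make it non-positive.
     */
    if (newval <= TEMPADDR_REGEN_ADVANCE + ip6_desync_factor)
        return EINVAL;
    ip6_temp_preferred_lifetime = newval;
    return 0;
}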
- Move the related functions to where in6_tmpaddrtimer_ch exists
- Hide global variable in6_tmpaddrtimer_ch
- Rename ip6_init2 to in6_tmpaddrtimer_init
- Reduce callers of callout_reset
- Use callout_schedule (see the sketch below)
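A minimal sketch of the resulting callout_setfunc/callout_schedule
pairing; the interval and the timer body below are placeholders:
#include <sys/callout.h>
#include <sys/kernel.h>     /* hz */

static callout_t in6_tmpaddrtimer_ch;

static void
in6_tmpaddrtimer(void *arg)
{
    (void)arg;
    /* ... regenerate temporary addresses here, then rearm ... */
    callout_schedule(&in6_tmpaddrtimer_ch, hz);
}

void
in6_tmpaddrtimer_init(void)
{
    callout_init(&in6_tmpaddrtimer_ch, CALLOUT_MPSAFE);
    /* Register the callback and its argument once... */
    callout_setfunc(&in6_tmpaddrtimer_ch, in6_tmpaddrtimer, NULL);
    /* ...so later rearming only needs the interval, not func/arg. */
    callout_schedule(&in6_tmpaddrtimer_ch, hz);
}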
percpu(9) has a chunk of memory storage for each CPU and hands it out to
users piece by piece. If the storage runs short, percpu(9) enlarges it by
allocating a new, larger memory area, switching over to it and destroying
the old one. So a percpu storage referenced by a pointer obtained via
percpu_getref can be destroyed by this mechanism once the running thread
sleeps, even if percpu_putref has not been called yet.
Using a rtcache, i.e., packet processing, typically involves sleepable
operations such as taking a rwlock, so we must avoid dereferencing a
rtcache that is stored directly in a percpu storage during packet
processing. Address this by keeping just a pointer to a rtcache in the
percpu storage instead.
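A minimal sketch of the idea, not the committed diff; the names below
are made up for illustration:
#include <sys/percpu.h>
#include <sys/kmem.h>
#include <sys/cpu.h>
#include <net/route.h>

static percpu_t *rtcache_percpu;    /* each CPU slot holds a struct route * */

static void
rtcache_percpu_init_cpu(void *p, void *arg, struct cpu_info *ci)
{
    struct route **rop = p;

    /* The rtcache itself is allocated outside the percpu area... */
    *rop = kmem_zalloc(sizeof(**rop), KM_SLEEP);
}

void
rtcache_percpu_init(void)
{
    /* ...and only a pointer to it is kept per CPU. */
    rtcache_percpu = percpu_alloc(sizeof(struct route *));
    percpu_foreach(rtcache_percpu, rtcache_percpu_init_cpu, NULL);
}

struct route *
rtcache_percpu_get(void)
{
    struct route **rop, *ro;

    rop = percpu_getref(rtcache_percpu);
    ro = *rop;
    percpu_putref(rtcache_percpu);
    /* ro remains valid even if percpu(9) later reallocates its storage. */
    return ro;
}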
Reviewed by knakahara@ and yamaguchi@
a bool for clarity. Optimize the function: if M_CANFASTFWD is not there
(because it was already removed by the firewall), leave now.
Makes it easier to see that M_CANFASTFWD is not removed on IPv6.
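A minimal sketch of the early-return shape; the function and its body
are placeholders, not the actual fast-forward code:
#include <sys/types.h>
#include <sys/mbuf.h>

static bool
fastforward_sketch(struct mbuf *m)
{
    /*
     * A firewall that wants the slow path clears M_CANFASTFWD; if the
     * flag is already gone there is nothing for us to do, so leave now.
     */
    if ((m->m_flags & M_CANFASTFWD) == 0)
        return false;

    /* ... the actual fast-forwarding work would go here ... */
    return true;
}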
packet to ensure it is not malformed. Call this function at "points of
interest", namely the IPv4/IPv6/IPsec entry points. There could be more.
We use M_VERIFY_PACKET(m), declared under DIAGNOSTIC only.
This function should not be called everywhere, especially not in places
that temporarily manipulate (and clobber) the mbuf structure; once they're
done they put the mbuf back in a correct format.
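The macro is presumably wired up along these lines; a sketch, not
necessarily the exact definition in the tree:
#include <sys/mbuf.h>

#ifdef DIAGNOSTIC
#define M_VERIFY_PACKET(m)  m_verify_packet(m)
#else
#define M_VERIFY_PACKET(m)  do { } while (0)
#endif
An entry point then simply calls M_VERIFY_PACKET(m) on the received mbuf;
in non-DIAGNOSTIC kernels the check compiles away.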
A deadlock occurs because the lock ordering rule is violated:
softnet_lock is taken while IFNET_LOCK is already held.
To avoid the deadlock, replace softnet_lock in in_control and in6_control
with KERNEL_LOCK.
We also need to add some KERNEL_LOCKs to reliably protect the network
stack.
This is required, for example, for PR kern/51356.
Fix PR kern/53043
mtod(m, char *) + len
is wrong: an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.
The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.
But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.
However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.
As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.
Then there are the easier cases, if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.
Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.
This place is still fragile.
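A sketch of the resulting pattern; the helper below is hypothetical and
assumes ip6_get_prevhdr now returns an offset into the chain:
#include <sys/mbuf.h>
#include <netinet/in.h>
#include <netinet/ip6.h>
#include <netinet6/ip6_var.h>

/*
 * Hypothetical helper: rewrite the "next header" field that precedes the
 * extension header at offset 'off', without assuming that field lies in
 * the first mbuf of the chain.
 */
static void
rewrite_prevhdr(struct mbuf *m, int off, uint8_t proto)
{
    uint8_t *prvnxtp;
    int prevoff;

    /* ip6_get_prevhdr returns an offset into the chain... */
    prevoff = ip6_get_prevhdr(m, off);

    /*
     * ...and IP6_EXTHDR_GET turns that offset into a writable pointer,
     * calling m_pulldown as needed; it leaves M_PKTHDR untouched.
     */
    IP6_EXTHDR_GET(prvnxtp, uint8_t *, m, prevoff, sizeof(*prvnxtp));
    if (prvnxtp == NULL)
        return;     /* the chain was freed on failure */

    *prvnxtp = proto;
}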
It reduces copy-and-pasted code such as "#ifndef NET_MPSAFE
KERNEL_LOCK(1, NULL); ..." scattered all over the source code and makes
it easy to identify the remaining KERNEL_LOCK and/or softnet_lock
acquisitions that are held even if NET_MPSAFE is enabled.
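The wrappers presumably look something like the following; the macro
names and bodies are illustrative, not necessarily the committed ones:
#ifdef NET_MPSAFE
#define SOFTNET_LOCK_UNLESS_NET_MPSAFE()    do { } while (0)
#define SOFTNET_UNLOCK_UNLESS_NET_MPSAFE()  do { } while (0)
#else
#define SOFTNET_LOCK_UNLESS_NET_MPSAFE()    mutex_enter(softnet_lock)
#define SOFTNET_UNLOCK_UNLESS_NET_MPSAFE()  mutex_exit(softnet_lock)
#endif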
No functional change
Currently softnet_lock is taken unnecessarily in some cases (e.g.,
icmp_input and encap4_input from ip_input) and not taken where it is
needed in others (e.g., udp_input and tcp_input from
ipsec4_common_input_cb). Fix them.
NFC if NET_MPSAFE is disabled (default).
The following allocations never fail:
- kmem_alloc() with KM_SLEEP
- kmem_zalloc() with KM_SLEEP
- percpu_alloc()
- pserialize_create()
- psref_class_create()
All of these paths include an assertion that the allocation has not
failed, so callers should not assert that again.
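For instance, a check like the commented-out KASSERT below is redundant
(illustrative code, not from the tree):
#include <sys/kmem.h>

struct foo {
    int f_refcnt;
};

struct foo *
foo_create(void)
{
    struct foo *f;

    f = kmem_zalloc(sizeof(*f), KM_SLEEP);
    /* KASSERT(f != NULL);  -- redundant: KM_SLEEP never returns NULL */
    return f;
}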
In the MP-safe world, a rtentry stemming from a rtcache can be freed at
any point. So we need to protect rtentries somehow, say by reference
counting or passive references. Regardless of the method, we need to call
some release function on a rtentry after using it.
The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.
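The usage discipline callers are expected to follow looks roughly like
this; the lookup site is illustrative:
#include <sys/socket.h>
#include <net/route.h>

static void
route_and_release(struct route *ro, const struct sockaddr *dst)
{
    struct rtentry *rt;

    rt = rtcache_lookup(ro, dst);
    if (rt == NULL)
        return;

    /* ... use rt for forwarding/output decisions ... */

    /*
     * Release the reference obtained above.  For now rtcache_unref is a
     * no-op, but callers must already pair it with the lookup.
     */
    rtcache_unref(rt, ro);
}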
This change is a part of the changes for the MP-safe routing table. It is
separated out to avoid one big change that is difficult to debug by
bisecting.
If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
parts of the network stack such as the IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.
By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.
Reviewed by knakahara@