In IPsec Tx side, one Security Association can be used by multiple CPUs.
On the other hand, in IPsec Rx side, one Security Association is used
by only one CPU.
XXX pullup-{8,9}
must be serialized against the interrupts / soft-interrupts in which
they're manipulated, as well as protected from non-atomic 64-bit memory
loads on 32-bit platforms.
When key_timehandler_spd() spent over one second, the "now" argument of
key_timehandler_sad() could be older than sav->created. That caused SA
was expired immediately.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.
A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.
Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.
Reviewed by ozaki-r@ and yamaguchi@
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.
A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.
Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@
The update API updates an SA by creating a new SA and removing an existing SA.
The previous change removed a newly added SA wrongly if an existing SA had been
created by the getspi API.
An sav can be removed from belonging list(s) twice resulting in an assertion
failure of pslist. It can occur if the following two operations interleave:
(i) a deletion or a update of an SA via the API, and
(ii) a state change (key_sa_chgstate) of the same SA by the timer.
Note that even (ii) removes an sav once from its list(s) on a update.
The cause of the race condition is that the two operations are not serialized
and (i) doesn't get and remove an sav from belonging list(s) atomically. So
(ii) can be inserted between an acquisition and a removal of (i).
Avoid the race condition by making (i) atomic.
initialized, and the padding of the spidx structure is not initialized
either. This causes the memcmp() to wrongfully fail.
Change ipsec_setspidx() to always initialize spdix.dir and zero out the
padding.
ok ozaki-r@
MOBIKE Extensions for PF_KEY draft-schilcher-mobike-pfkey-extension-01.txt says
====================
5. SPD Update
// snip
SADB_X_SPDADD:
// snip
sadb_x_ipsecrequest_reqid:
An ID for that SA can be passed to the kernel in the
sadb_x_ipsecrequest_reqid field.
If tunnel mode is specified, the sadb_x_ipsecrequest structure is
followed by two sockaddr structures that define the tunnel
endpoint addresses. In the case that transport mode is used, no
additional addresses are specified.
====================
see: https://tools.ietf.org/html/draft-schilcher-mobike-pfkey-extension-01
ipsecif(4) uses transport mode, so it should not add addresses.
These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.
HOWEVER! Some subsystems have
#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))
even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.
To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.
I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:
cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))
It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.
Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
Use mutex here instead of pserialize because using mutex is simpler than
using psz+ref, which is another solution, and key_checkspidup isn't called in
any performance-sensitive paths.
- Remove unused *_NAMES macros for sysctl.
- Remove unused *_MAXID for sysctls.
- Move CTL_MACHDEP sysctl definitions for m68k into m68k/include/cpu.h and
use them on all m68k machines.
ip_undefer_csum -> in_undefer_cksum
in_delayed_cksum -> in_undefer_cksum_tcpudp
The two previous names were inconsistent and misleading.
Put the two functions into in_offload.c. Add comments to explain what
we're doing.
The same could be done for IPv6.
and now in PR/53334. Basically non-IKE markers come from a deprecated
draft, and our kernel code for them has never worked.
Setsockopt will now reject UDP_ENCAP_ESPINUDP_NON_IKE.
Perhaps we should also add a check in key_handle_natt_info(), to make
sure we also reject UDP_ENCAP_ESPINUDP_NON_IKE in the SADB.