a FreeBSD bug report, by Jason Mader.
The RFC specifies that under IPv6 the complete AH header must be 64bit-
aligned, and under IPv4 32bit-aligned. That's a rule we've never respected.
The other BSDs and MacOS never have either.
So respect it now.
This makes it possible to set up IPv6-AH between Linux and NetBSD, and also
probably between Windows and NetBSD.
Until now all the tests I made were between two *BSD hosts, and everything
worked "correctly" since both hosts were speaking the same non-standard
AHv6, so they could understand each other.
Tested with Fedora<->NetBSD, hmac-sha2-384.
esp_hdrsiz, and clarify.
Until now we were using ah_hdrsiz, and were relying on the fact that the
size of the AH header happens to be equal to that of the ESP trailer.
Now the size of the ESP trailer is added manually. This also fixes one
branch in esp_hdrsiz: we always append an ESP trailer, so it must always
be taken into account, and not just when an ICV is here.
Trailer (two bytes), and we were doing minus two all the time.
Declare 'tlen', which contains padlen+ESP_Trailer+ICV, and use 'struct
esptail' instead of hardcoding the construction of the trailer. 'padlen'
now indicates only the length of the padding, so no need to do -2.
a bool for clarity. Optimize the function: if M_CANFASTFWD is not there
(because already removed by the firewall) leave now.
Makes it easier to see that M_CANFASTFWD is not removed on IPv6.
already fixed half of the problem two months ago in rev1.67, back then I
thought it was not triggerable because each packet we emit is guaranteed
to have correctly formed IPv6 options; but it is actually triggerable via
IPv6 forwarding, we emit a packet we just received, and we don't sanitize
its options before invoking IPsec.
Since it would be wrong to just stop the iteration and continue the IPsec
processing, allow compute_ipsec_pos to fail, and when it does, drop the
packet entirely.
The tunnel interfaces are gif(4), l2tp(4), and ipsecif(4). They use mutex
itself in percpu area. When percpu_cpu_enlarge() run, the address of the
mutex in percpu area becomes different from the address which lockdebug
saved. That can cause "already initialized" false detection.
reasons. RH0 was already removed in the kernel's input path, but some
parts were still present in the output path: they are now removed.
Sent on tech-net@ a few days ago.
ensures 8 bytes are present), add an XXX (about the fact that it is
better to use m_copydata, because it is faster and less error-prone), and
improve two m_copybacks (remove useless casts).
net.inet6.esp6
net.inet6.ipcomp6
net.inet6.ah6
subtrees. They are aliases to net.inet6.ipsec6, but they are not
consistent with the original intended naming. (eg there was
net.inet6.esp6.esp_trans_deflev instead of net.inet6.esp6.trans_deflev).
net.inet.esp.trans_deflev = net.inet.ipsec.esp_trans_deflev
net.inet.esp.net_deflev = net.inet.ipsec.esp_net_deflev
net.inet.ah.cleartos = net.inet.ipsec.ah_cleartos
net.inet.ah.offsetmask = net.inet.ipsec.ah_offsetmask
net.inet.ah.trans_deflev = net.inet.ipsec.ah_trans_deflev
net.inet.ah.net_deflev = net.inet.ipsec.ah_net_deflev
Use the convention on the right. Discussed a month ago on tech-net@.
m_adj(m1, -(m1->m_len - roff));
if (m1 != m)
m->m_pkthdr.len -= (m1->m_len - roff);
This is wrong: m_adj will modify m1->m_len, so we're using a wrong value
when manually adjusting m->m_pkthdr.len.
Because of that, it is possible to exploit the attack I described in
uipc_mbuf.c::rev1.182. The exploit is more complicated, but works 100%
reliably.
should, but it looks like there are several places that can put M_PKTHDR
on secondary mbufs (PR/53189), so drop this assumption right now to
prevent further bugs.
The check is replaced by (m1 != m), which is equivalent to the previous
code: we want to modify m->m_pkthdr.len only when 'm' was not passed in
m_adj().
key_sad.sahlists doesn't work well for inbound packets because
its key includes source address. For the reason, the
look-up-table for the inbound packets is newly added.
The table has all sav whose state is MATURE or DYING and uses a
key calculated by destination address, protocol, and spi instead
of saidx.
reviewd ozaki-r@n.o, thanks.
An saidx of sah included in the list is unique so that
the search can use a hash list whose hash is calculated by
the saidx to find an sah quickly.
The hash list of the sahlits is used in FreeBSD, too.
reviewed by ozaki-r@n.o, thanks.
packet to ensure it is not malformed. Call this function in "points of
interest", that are the IPv4/IPv6/IPsec entry points. There could be more.
We use M_VERIFY_PACKET(m), declared under DIAGNOSTIC only.
This function should not be called everywhere, especially not in places
that temporarily manipulate (and clobber) the mbuf structure; once they're
done they put the mbuf back in a correct format.
key_prefered_oldsa flag can change the sa to use if an sah
has multiple sav. However the multiple saves whose protocol
is ah, esp, or tcp cannot exist because their duplications
are checked by the spi value. Although the multiple saves
can exist in the case of ipcomp, the values using in the
post processing are same between the saves.
For those reasons, it is no need to select an sav by its
lifetime.
In addition, FreeBSD has already remove this.
reviewed by ozaki-r@n.o, thanks.
Here is an example of the operation which causes this problem.
# ifconfig ipsec0 create link0
# ifconfig ipsec0 tunnel fc00:1001::2,4500 fc00:1001::1,4501
# ifconfig ipsec0 tunnel fc00:1001::2,4500 fc00:1001::1,4502
This modification reduces packet loss of fragmented packets on a
network where reordering occurs.
Alghough this modification has been applied, IPv4 ID is not set for
the packet smaller then IP_MINFRAGSIZE. According to RFC 6864, that
must not cause problems.
XXX pullup-8
since we used only ipsec_copy_pcbpolicy, and it was a no-op.
Originally we were using ipsec_copy_policy to optimize the IPsec-PCB
cache: when an ACK was received in response to a SYN, we used to copy the
SP cached in the SYN's PCB into the ACK's PCB, so that
ipsec_getpolicybysock could use the cached SP instead of requerying it.
Then we switched to ipsec_copy_pcbpolicy which has always been a no-op. As
a result the SP cached in the SYN was/is not copied in the ACK, and the
first call to ipsec_getpolicybysock had to query the SP and cache it
itself. It's not totally clear to me why this change was made.
But it has been this way for years, and after a conversation with Ryota
Ozaki it turns out the optimization is not valid anymore due to
MP-ification, so it won't be re-enabled.
ok ozaki-r@
Add soroverflow() which increments the overflow counter, sets so_error
to ENOBUFS and wakes the receive socket up.
Replace all code that manually increments this counter with soroverflow().
Add soroverflow() to raw_input().
This allows userland to detect route(4) overflows so it can re-sync
with the current state.
We don't make them percpu(9) directly because the structure is exposed to
userland and we don't want to break ABI. So we add another member variable
for percpu(9) and use it internally. When we export them to userland, they
are converted to the original format.
ipsec4_get_policy and ipsec6_get_policy
ipsec4_delete_pcbpolicy and ipsec6_delete_pcbpolicy
The already-existing ipsec_get_policy() function is inlined in the new
one.
ipsec4_set_policy and ipsec6_set_policy
ipsec4_get_policy and ipsec6_get_policy
ipsec4_delete_pcbpolicy and ipsec6_delete_pcbpolicy
No real functional change.