syn_cache_unreach() should remove the entry, or just continue on.
Algorithm is to only remove the entry if we've had more than one unreach
error and have retransmitted 3 or more times. This prevents the following
scenario, as noted in PR #5909 (PR from Ty Sarna, scenario from
Charles Hannum):
* Host A sends a SYN.
* Host A retransmits the SYN.
* Host B gets the first SYN and sends a SYN-ACK.
* Host B gets the second SYN and sends a SYN-ACK.
* One of the SYN-ACK bounces with an
ICMP unreachable, causing the `SYN cache' entry to be
removed with no notification.
* Host A receives the other SYN-ACK, sends an ACK, and goes to
ESTABLISHED state.
Should fix PR #5909.
- Don't use home-grown queue manipulation. Use <sys/queue.h> instead. The
data structures are a little larger, but we are otherwise wasting the
memory chunk anyway (we're already a 64-byte malloc bucket).
- Fix a bug in the cache-is-full case: if the oldest element removed from
the first non-empty bucket was the only element in the bucket, the
bucket wouldn't be removed from the bucket cache, causing queue corruption
later.
- Optimize the syn cache timers by using PRT timers rather than home-grown
decrement-and-propagate timers.
This code is now a fair bit smaller, and significantly easier to read
and understand.
rule was to update the timestamp if the sequence numbers are in range. New
rule adds a check that the timestamp is advancing, thus preventing our notion
of the most recent timestamp from incorrectly moving backwards.
TCP connections by using the MTU of the interface. Also added
a knob, mss_ifmtu, to force all connections to use the MTU of
the interface to calculate the advertised MSS.
to ACK immediately any packet that arrived with PSH set. This breaks
delayed ACKs in a few specific common cases that delayed ACKs were
supposed to help, and ends up not making much (if any) difference in
the case where where this ACK-on-PSH change was supposed to help.
Per discussion with several members of the TCPIMPL and TCPSAT IETF
working groups.
code, as clarified in the TCPIMPL WG meeting at IETF #41: If the SYN
(active open) or SYN,ACK (passive open) was retransmitted, the initial
congestion window for the first slow start of that connection must be
one segment.
RTO estimation changes. Under some circumstances it would return a value
of 0, while the old Van Jacobson RTO code would return a minimum of 3.
This would result in 12 retransmissions, each 1 second apart.
This takes care of those instances, and ensures that t_rttmin is
used everywhere as a lower bound.
case. Sending an RST to ourselves is a little silly, considering that
we'll just attempt to remove a non-existent compressed state entry and
then drop the packet anyway.
socket:
- If we received a SYN,ACK, send an RST.
- If we received a SYN, and the connection attempt appears to come from
itself, send an RST, since it cannot possibly be valid.
- Don't overload t_maxseg. Previous behavior was to set it to the min
of the peer's advertised MSS, our advertised MSS, and tcp_mssdflt
(for non-local networks). This breaks PMTU discovery running on
either host. Instead, remember the MSS we advertise, and use it
as appropriate (in silly window avoidance).
- Per last bullet, split tcp_mss() into several functions for handling
MSS (ours and peer's), and performing various tasks when a connection
becomes ESTABLISHED.
- Introduce a new function, tcp_segsize(), which computes the max size
for every segment transmitted in tcp_output(). This will eventually
be used to hook in PMTU discovery.
which happen to have a TCB in TIME_WAIT, where an mbuf which had been
advanced past the IP+TCP headers and TCP options would be reused as if
it had not been advanced. Problem found by Juergen Hannken-Illjes, who
also suggested a work-around on which this fix is based.
fixed in FreeBSD by John Polstra:
Fix a bug (apparently very old) that can cause a TCP connection to
be dropped when it has an unusual traffic pattern. For full details
as well as a test case that demonstrates the failure, see the
referenced PR (FreeBSD's kern/3998).
Under certain circumstances involving the persist state, it is
possible for the receive side's tp->rcv_nxt to advance beyond its
tp->rcv_adv. This causes (tp->rcv_adv - tp->rcv_nxt) to become
negative. However, in the code affected by this fix, that difference
was interpreted as an unsigned number by max(). Since it was
negative, it was taken as a huge unsigned number. The effect was
to cause the receiver to believe that its receive window had negative
size, thereby rejecting all received segments including ACKs. As
the test case shows, this led to fruitless retransmissions and
eventually to a dropped connection. Even connections using the
loopback interface could be dropped. The fix substitutes the signed
imax() for the unsigned max() function.
Bill informs me that his research indicates this bug appeared in Reno.