diff mbox

NAT stops forwarding ACKs after PMTU discovery

Message ID 1376870592.4226.27.camel@edumazet-glaptop
State Superseded
Headers show

Commit Message

Eric Dumazet Aug. 19, 2013, 12:03 a.m. UTC
On Sun, 2013-08-18 at 17:00 -0700, Eric Dumazet wrote:

> Code like this seems very suspect to me :
> 
> before(sack, receiver->td_end + 1)
> 

My suggestion would be to try :



--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Corey Hickey Aug. 19, 2013, 8:43 a.m. UTC | #1
On 2013-08-18 17:03, Eric Dumazet wrote:
> On Sun, 2013-08-18 at 17:00 -0700, Eric Dumazet wrote:
> 
>> Code like this seems very suspect to me :
>>
>> before(sack, receiver->td_end + 1)
>>
> 
> My suggestion would be to try :
> 
> diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c

[...]

Thanks for all your suggestions--I really wasn't expecting so much on a
weekend. Here's all the data I have for tonight.


I tried the linux-next kernel, then linux-next with your patch applied.
Neither of them fix the problem, unfortunately. I have taken tcpdumps
for a working SSH and a failing SSH.

http://fatooh.org/files/tmp/linux-next-patch1.tar.bz2

[localhost]
sudo tcpdump -ni br0 -s 0 -w /tmp/local.pcap 'host 10.15.24.13 or icmp'

[router eth0]
tcpdump -ni eth0 -s 0 -w /tmp/eth0.pcap \
    'host 10.15.24.13 or (icmp and host not 69.78.33.132)'
* the exclusion here is just to remove some unrelated clutter

[router tun0]
tcpdump -ni tun0 -s 0 -w /tmp/tun0.pcap -s 0 'host 10.15.24.13 or icmp'

[remote]
tcpdump -ni eth0 -s 0 -w remote.pcap 'host 192.168.61.56'


Some notes:

1. I tested the new kernels only on the Linux router, assuming that is
where it was intended.

2. I take back what I wrote earlier about every connection that involves
PMTU discovery failed; I may have been observing this wrong. For now,
the situation is that some connections stop forwarding packets from the
remote host immediately after the retransmit, while other work fine.

3. From local.pcap, you can see that my localhost doesn't actually
transmit a large packet, yet the router's eth0 sees a large packet come
in. I think this is due to TSO, but I'm not completely sure.

4. For some reason, I cannot reproduce this when SSHing to a host at
work that is running Debian sid with 3.10-1-amd64, but I can reproduce
it when SSHing to hosts running Centos 6.4 with
2.6.32-358.6.1.el6.x86_64 (which surely has a ton of patches applied,
for whatever that's worth).

5. I have only a vague understanding of SACK; I will be reading up on
this soon. I will also look into packetdrill for reproducing the
problem, if the SSH results aren't good enough.

6. If I reduce the MTU on localhost to match the path MTU, the problem
does go away.

Thanks again for all the help,
Corey
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Paasch Aug. 19, 2013, 12:33 p.m. UTC | #2
Hello,

I would say, the problem is due to a sequence-number rewriting
middlebox, who does not correctly handle the SACK-blocks.

In remote.pcap, you see:
Packet#10: A Dup-ACK: ACK 1005503816, SACK: 1005505184-1005505648
The SACK actually covers Packet#9

In tun0.pcap, you have:
Packet#9: The one that is SACK'ed: SEQ: 3514869772
Packet#11: The Dup-ACK: ACK: 3514868404, SACK: 3570452498-3570452962

You can see that the SACK-block is not really aligned with the ACK-numbers.

Netfilter is probably dropping the Dup-ACK, because the SACK-block is
acknowledging unseen data.


There are middleboxes out there that randomize the sequence numbers, due to
an old bug in Windows, where the initial sequence number was not
sufficiently randomized. There are some of these middleboxes, who simply do
not support SACK, thus modify only the sequence numbers of the TCP-header
(https://supportforums.cisco.com/docs/DOC-12668#TCP_Sequence_Number_Randomization_and_SACK).

This results in a TCP retransmission timeout on the sender, because of
the invalid SACK-blocks, the duplicate ACKs are not accounted. This
completly kills the performance, as you can see in our presentation given at
IETF87: http://tools.ietf.org/agenda/87/slides/slides-87-tcpm-11.pdf

We have a patch that accounts DUP-ACKs with invalid SACK-blocks effectively
as duplicate acknowledgments. I can send the patches, if the
netdev-community is interested in accepting these upstream.


Cheers,
Christoph



On 19/08/13 - 01:43:18, Corey Hickey wrote:
> On 2013-08-18 17:03, Eric Dumazet wrote:
> > On Sun, 2013-08-18 at 17:00 -0700, Eric Dumazet wrote:
> > 
> >> Code like this seems very suspect to me :
> >>
> >> before(sack, receiver->td_end + 1)
> >>
> > 
> > My suggestion would be to try :
> > 
> > diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
> 
> [...]
> 
> Thanks for all your suggestions--I really wasn't expecting so much on a
> weekend. Here's all the data I have for tonight.
> 
> 
> I tried the linux-next kernel, then linux-next with your patch applied.
> Neither of them fix the problem, unfortunately. I have taken tcpdumps
> for a working SSH and a failing SSH.
> 
> http://fatooh.org/files/tmp/linux-next-patch1.tar.bz2
> 
> [localhost]
> sudo tcpdump -ni br0 -s 0 -w /tmp/local.pcap 'host 10.15.24.13 or icmp'
> 
> [router eth0]
> tcpdump -ni eth0 -s 0 -w /tmp/eth0.pcap \
>     'host 10.15.24.13 or (icmp and host not 69.78.33.132)'
> * the exclusion here is just to remove some unrelated clutter
> 
> [router tun0]
> tcpdump -ni tun0 -s 0 -w /tmp/tun0.pcap -s 0 'host 10.15.24.13 or icmp'
> 
> [remote]
> tcpdump -ni eth0 -s 0 -w remote.pcap 'host 192.168.61.56'
> 
> 
> Some notes:
> 
> 1. I tested the new kernels only on the Linux router, assuming that is
> where it was intended.
> 
> 2. I take back what I wrote earlier about every connection that involves
> PMTU discovery failed; I may have been observing this wrong. For now,
> the situation is that some connections stop forwarding packets from the
> remote host immediately after the retransmit, while other work fine.
> 
> 3. From local.pcap, you can see that my localhost doesn't actually
> transmit a large packet, yet the router's eth0 sees a large packet come
> in. I think this is due to TSO, but I'm not completely sure.
> 
> 4. For some reason, I cannot reproduce this when SSHing to a host at
> work that is running Debian sid with 3.10-1-amd64, but I can reproduce
> it when SSHing to hosts running Centos 6.4 with
> 2.6.32-358.6.1.el6.x86_64 (which surely has a ton of patches applied,
> for whatever that's worth).
> 
> 5. I have only a vague understanding of SACK; I will be reading up on
> this soon. I will also look into packetdrill for reproducing the
> problem, if the SSH results aren't good enough.
> 
> 6. If I reduce the MTU on localhost to match the path MTU, the problem
> does go away.
> 
> Thanks again for all the help,
> Corey
> --
> To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Aug. 19, 2013, 1:24 p.m. UTC | #3
On Mon, 2013-08-19 at 14:33 +0200, Christoph Paasch wrote:
> Hello,
> 
> I would say, the problem is due to a sequence-number rewriting
> middlebox, who does not correctly handle the SACK-blocks.
> 
> In remote.pcap, you see:
> Packet#10: A Dup-ACK: ACK 1005503816, SACK: 1005505184-1005505648
> The SACK actually covers Packet#9
> 
> In tun0.pcap, you have:
> Packet#9: The one that is SACK'ed: SEQ: 3514869772
> Packet#11: The Dup-ACK: ACK: 3514868404, SACK: 3570452498-3570452962
> 
> You can see that the SACK-block is not really aligned with the ACK-numbers.
> 
> Netfilter is probably dropping the Dup-ACK, because the SACK-block is
> acknowledging unseen data.
> 
> 
> There are middleboxes out there that randomize the sequence numbers, due to
> an old bug in Windows, where the initial sequence number was not
> sufficiently randomized. There are some of these middleboxes, who simply do
> not support SACK, thus modify only the sequence numbers of the TCP-header
> (https://supportforums.cisco.com/docs/DOC-12668#TCP_Sequence_Number_Randomization_and_SACK).
> 
> This results in a TCP retransmission timeout on the sender, because of
> the invalid SACK-blocks, the duplicate ACKs are not accounted. This
> completly kills the performance, as you can see in our presentation given at
> IETF87: http://tools.ietf.org/agenda/87/slides/slides-87-tcpm-11.pdf
> 
> We have a patch that accounts DUP-ACKs with invalid SACK-blocks effectively
> as duplicate acknowledgments. I can send the patches, if the
> netdev-community is interested in accepting these upstream.
> 

You mean a netfilter patch, a tcp patch, or both ?



--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Paasch Aug. 19, 2013, 1:49 p.m. UTC | #4
On 19/08/13 - 06:24:17, Eric Dumazet wrote:
> On Mon, 2013-08-19 at 14:33 +0200, Christoph Paasch wrote:
> > Hello,
> > 
> > I would say, the problem is due to a sequence-number rewriting
> > middlebox, who does not correctly handle the SACK-blocks.
> > 
> > In remote.pcap, you see:
> > Packet#10: A Dup-ACK: ACK 1005503816, SACK: 1005505184-1005505648
> > The SACK actually covers Packet#9
> > 
> > In tun0.pcap, you have:
> > Packet#9: The one that is SACK'ed: SEQ: 3514869772
> > Packet#11: The Dup-ACK: ACK: 3514868404, SACK: 3570452498-3570452962
> > 
> > You can see that the SACK-block is not really aligned with the ACK-numbers.
> > 
> > Netfilter is probably dropping the Dup-ACK, because the SACK-block is
> > acknowledging unseen data.
> > 
> > 
> > There are middleboxes out there that randomize the sequence numbers, due to
> > an old bug in Windows, where the initial sequence number was not
> > sufficiently randomized. There are some of these middleboxes, who simply do
> > not support SACK, thus modify only the sequence numbers of the TCP-header
> > (https://supportforums.cisco.com/docs/DOC-12668#TCP_Sequence_Number_Randomization_and_SACK).
> > 
> > This results in a TCP retransmission timeout on the sender, because of
> > the invalid SACK-blocks, the duplicate ACKs are not accounted. This
> > completly kills the performance, as you can see in our presentation given at
> > IETF87: http://tools.ietf.org/agenda/87/slides/slides-87-tcpm-11.pdf
> > 
> > We have a patch that accounts DUP-ACKs with invalid SACK-blocks effectively
> > as duplicate acknowledgments. I can send the patches, if the
> > netdev-community is interested in accepting these upstream.
> > 
> 
> You mean a netfilter patch, a tcp patch, or both ?

It's a TCP-patch, that interprets duplicate-acks with invalid SACK-blocks as
duplicate acks in tcp_sock->sacked_out.


Christoph

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Aug. 19, 2013, 1:58 p.m. UTC | #5
On Mon, 2013-08-19 at 15:49 +0200, Christoph Paasch wrote:

> 
> It's a TCP-patch, that interprets duplicate-acks with invalid SACK-blocks as
> duplicate acks in tcp_sock->sacked_out.

Yeah, but here, this is conntrack who is blocking the thing.

TCP receiver has no chance to 'fix' it.

See conntrack is one of those buggy middle box as well.

So if you want to properly handle this mess, you'll also have to fix
conntrack.



--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Phil Oester Aug. 19, 2013, 2:35 p.m. UTC | #6
On Mon, Aug 19, 2013 at 06:58:05AM -0700, Eric Dumazet wrote:
> Yeah, but here, this is conntrack who is blocking the thing.
> 
> TCP receiver has no chance to 'fix' it.
> 
> See conntrack is one of those buggy middle box as well.
> 
> So if you want to properly handle this mess, you'll also have to fix
> conntrack.

Better to fix the box which is randomizing the acks and ignoring
the SACKs.  Usually these are older Cisco PIX/ASA devices which just
need a code upgrade.  

Phil
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Paasch Aug. 19, 2013, 2:43 p.m. UTC | #7
On 19/08/13 - 06:58:05, Eric Dumazet wrote:
> On Mon, 2013-08-19 at 15:49 +0200, Christoph Paasch wrote:
> > It's a TCP-patch, that interprets duplicate-acks with invalid SACK-blocks as
> > duplicate acks in tcp_sock->sacked_out.
> 
> Yeah, but here, this is conntrack who is blocking the thing.

Yes, of course. I know that here conntrack is blocking the dup-ACKs.

Sorry for having added TCP-stack specific things in a conntrack-thread.
I just wanted to highlight that the TCP-stack has also problems with
sequence-number rewriting middleboxes.

> TCP receiver has no chance to 'fix' it.
> 
> See conntrack is one of those buggy middle box as well.
> 
> So if you want to properly handle this mess, you'll also have to fix
> conntrack.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Aug. 19, 2013, 3:32 p.m. UTC | #8
On Mon, 2013-08-19 at 07:35 -0700, Phil Oester wrote:
> On Mon, Aug 19, 2013 at 06:58:05AM -0700, Eric Dumazet wrote:
> > Yeah, but here, this is conntrack who is blocking the thing.
> > 
> > TCP receiver has no chance to 'fix' it.
> > 
> > See conntrack is one of those buggy middle box as well.
> > 
> > So if you want to properly handle this mess, you'll also have to fix
> > conntrack.
> 
> Better to fix the box which is randomizing the acks and ignoring
> the SACKs.  Usually these are older Cisco PIX/ASA devices which just
> need a code upgrade.  


Having a patch to relax things could be evaluated, if not adding new
problems ( http://en.wikipedia.org/wiki/Robustness_principle ) 



--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Paasch Aug. 19, 2013, 3:33 p.m. UTC | #9
On 19/08/13 - 07:35:09, Phil Oester wrote:
> On Mon, Aug 19, 2013 at 06:58:05AM -0700, Eric Dumazet wrote:
> > Yeah, but here, this is conntrack who is blocking the thing.
> > 
> > TCP receiver has no chance to 'fix' it.
> > 
> > See conntrack is one of those buggy middle box as well.
> > 
> > So if you want to properly handle this mess, you'll also have to fix
> > conntrack.
> 
> Better to fix the box which is randomizing the acks and ignoring
> the SACKs.  Usually these are older Cisco PIX/ASA devices which just
> need a code upgrade.  

I agree, the problem are these horrid middleboxes...

Unfortunately, they will hardly go away in the near futur. Rather the
opposite is the case.


If you have a public server running, I would be interested in the count of
invalid SACK-blocks received (netstat -s | grep TCPSACKDiscard). This is an
indication for such kind of middlebox between your server and the client,
implying that these connections cannot benefit from TCP-FastRetransmission
and each packet-loss will require an RTO to recover.


Cheers,
Christoph

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Aug. 19, 2013, 4 p.m. UTC | #10
On Mon, 2013-08-19 at 17:33 +0200, Christoph Paasch wrote:

> 
> Unfortunately, they will hardly go away in the near futur. Rather the
> opposite is the case.
> 
> 
> If you have a public server running, I would be interested in the count of
> invalid SACK-blocks received (netstat -s | grep TCPSACKDiscard). This is an
> indication for such kind of middlebox between your server and the client,
> implying that these connections cannot benefit from TCP-FastRetransmission
> and each packet-loss will require an RTO to recover.
> 

If the (random) sequence offset is small rather than completely out of
window, it's going to be hard to detect all problems.

Show us your patch ;)



--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Paasch Aug. 19, 2013, 5:15 p.m. UTC | #11
On 19/08/13 - 09:00:31, Eric Dumazet wrote:
> On Mon, 2013-08-19 at 17:33 +0200, Christoph Paasch wrote:
> > Unfortunately, they will hardly go away in the near futur. Rather the
> > opposite is the case.
> > 
> > 
> > If you have a public server running, I would be interested in the count of
> > invalid SACK-blocks received (netstat -s | grep TCPSACKDiscard). This is an
> > indication for such kind of middlebox between your server and the client,
> > implying that these connections cannot benefit from TCP-FastRetransmission
> > and each packet-loss will require an RTO to recover.
> > 
> 
> If the (random) sequence offset is small rather than completely out of
> window, it's going to be hard to detect all problems.

Yes, it is not possible to make it 100% perfect. But considering the size of
the seq-space, the probability is rather low that the SACK-block falls
in-window.

> Show us your patch ;)

Will send it soon. Have to rebase on net-next,... :)
(it's several months ago that I did this)


Christoph
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Phil Oester Aug. 19, 2013, 6 p.m. UTC | #12
On Mon, Aug 19, 2013 at 07:15:19PM +0200, Christoph Paasch wrote:
> On 19/08/13 - 09:00:31, Eric Dumazet wrote:
> > On Mon, 2013-08-19 at 17:33 +0200, Christoph Paasch wrote:
> > > Unfortunately, they will hardly go away in the near futur. Rather the
> > > opposite is the case.
> > > 
> > > 
> > > If you have a public server running, I would be interested in the count of
> > > invalid SACK-blocks received (netstat -s | grep TCPSACKDiscard). This is an
> > > indication for such kind of middlebox between your server and the client,
> > > implying that these connections cannot benefit from TCP-FastRetransmission
> > > and each packet-loss will require an RTO to recover.
> > > 
> > 
> > If the (random) sequence offset is small rather than completely out of
> > window, it's going to be hard to detect all problems.

It is not small unfortunately.  The bug is that with ISN randomization enabled
the PIX leaves the SACK sequence numbers untouched.  For reference, the cisco
bug is CSCse14419.

> Yes, it is not possible to make it 100% perfect. But considering the size of
> the seq-space, the probability is rather low that the SACK-block falls
> in-window.

s/rather low/nil/

> > Show us your patch ;)

Or just disable SACK on the box behind the problematic PIX.

Phil
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Aug. 19, 2013, 6:10 p.m. UTC | #13
On Mon, 2013-08-19 at 19:15 +0200, Christoph Paasch wrote:

> 
> Will send it soon. Have to rebase on net-next,... :)
> (it's several months ago that I did this)

No hurry.

This is a very unlikely problem anyway.

Trace taken on a random Google server :

TcpExtTCPSACKDiscard 44666
TcpExtTCPSACKReneging 173593
TcpExtTCPSACKReorder 1331306
TcpExtTCPSackFailures 791292
TcpExtTCPSackMerged 261702691
TcpExtTCPSackRecovery 57267494
TcpExtTCPSackRecoveryFail 2008511
TcpExtTCPSackShiftFallback 841017416
TcpExtTCPSackShifted 619337144
...
TcpOutSegs 42215528084
TcpRetransSegs 441868477


--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Corey Hickey Aug. 19, 2013, 6:22 p.m. UTC | #14
On 2013-08-19 05:33, Christoph Paasch wrote:
> Hello,
> 
> I would say, the problem is due to a sequence-number rewriting
> middlebox, who does not correctly handle the SACK-blocks.
> 
> In remote.pcap, you see:
> Packet#10: A Dup-ACK: ACK 1005503816, SACK: 1005505184-1005505648
> The SACK actually covers Packet#9
> 
> In tun0.pcap, you have:
> Packet#9: The one that is SACK'ed: SEQ: 3514869772
> Packet#11: The Dup-ACK: ACK: 3514868404, SACK: 3570452498-3570452962
> 
> You can see that the SACK-block is not really aligned with the ACK-numbers.
> 
> Netfilter is probably dropping the Dup-ACK, because the SACK-block is
> acknowledging unseen data.
> 
> 
> There are middleboxes out there that randomize the sequence numbers, due to
> an old bug in Windows, where the initial sequence number was not
> sufficiently randomized. There are some of these middleboxes, who simply do
> not support SACK, thus modify only the sequence numbers of the TCP-header
> (https://supportforums.cisco.com/docs/DOC-12668#TCP_Sequence_Number_Randomization_and_SACK).
> 
> This results in a TCP retransmission timeout on the sender, because of
> the invalid SACK-blocks, the duplicate ACKs are not accounted. This
> completly kills the performance, as you can see in our presentation given at
> IETF87: http://tools.ietf.org/agenda/87/slides/slides-87-tcpm-11.pdf

This makes good sense. I normally look at these in wireshark with
relative sequence numbers turned on, and I see in bad/tun0.pcap that the
SACK values are very high, but are normal in bad/remote.pcap.

I see the same thing in good/tun0.pcap, though, so I don't fully
understand what is making it work sometimes and not others.

I will show this thread and the Cisco docs to the network engineer at
work, and maybe we can get the SEQ randomization turned off (or at least
test it).

> We have a patch that accounts DUP-ACKs with invalid SACK-blocks effectively
> as duplicate acknowledgments. I can send the patches, if the
> netdev-community is interested in accepting these upstream.

I'll keep my eye out and test them if they come up.


Thanks,
Corey
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jozsef Kadlecsik Aug. 19, 2013, 8:13 p.m. UTC | #15
On Mon, 19 Aug 2013, Eric Dumazet wrote:

> On Mon, 2013-08-19 at 15:49 +0200, Christoph Paasch wrote:
>
> > It's a TCP-patch, that interprets duplicate-acks with invalid SACK-blocks as
> > duplicate acks in tcp_sock->sacked_out.
> 
> Yeah, but here, this is conntrack who is blocking the thing.
> 
> TCP receiver has no chance to 'fix' it.
> 
> See conntrack is one of those buggy middle box as well.
> 
> So if you want to properly handle this mess, you'll also have to fix
> conntrack.

I beg you pardon: why conntrack should be relaxed, when it is expected
to do more strict TCP checkings (RFC5961, Section 5.).

Also, it's clearly a broken middle box. Don't shoot the messenger.

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
          H-1525 Budapest 114, POB. 49, Hungary
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Paasch Aug. 19, 2013, 8:43 p.m. UTC | #16
On 19/08/13 - 22:13:59, Jozsef Kadlecsik wrote:
> On Mon, 19 Aug 2013, Eric Dumazet wrote:
> 
> > On Mon, 2013-08-19 at 15:49 +0200, Christoph Paasch wrote:
> >
> > > It's a TCP-patch, that interprets duplicate-acks with invalid SACK-blocks as
> > > duplicate acks in tcp_sock->sacked_out.
> > 
> > Yeah, but here, this is conntrack who is blocking the thing.
> > 
> > TCP receiver has no chance to 'fix' it.
> > 
> > See conntrack is one of those buggy middle box as well.
> > 
> > So if you want to properly handle this mess, you'll also have to fix
> > conntrack.
> 
> I beg you pardon: why conntrack should be relaxed, when it is expected
> to do more strict TCP checkings (RFC5961, Section 5.).

There is no mention of SACK in this RFC.
The duplicate ACKs with invalid SACK-blocks are valid with respect to
RFC5961, Section 5.

Actually, no RFC says that dup-ACKs with invalid SACK-blocks should
be discarded.


Cheers,
Christoph

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Aug. 19, 2013, 9:08 p.m. UTC | #17
On Mon, 2013-08-19 at 22:13 +0200, Jozsef Kadlecsik wrote:
> On Mon, 19 Aug 2013, Eric Dumazet wrote:
> 
> > On Mon, 2013-08-19 at 15:49 +0200, Christoph Paasch wrote:
> >
> > > It's a TCP-patch, that interprets duplicate-acks with invalid SACK-blocks as
> > > duplicate acks in tcp_sock->sacked_out.
> > 
> > Yeah, but here, this is conntrack who is blocking the thing.
> > 
> > TCP receiver has no chance to 'fix' it.
> > 
> > See conntrack is one of those buggy middle box as well.
> > 
> > So if you want to properly handle this mess, you'll also have to fix
> > conntrack.
> 
> I beg you pardon: why conntrack should be relaxed, when it is expected
> to do more strict TCP checkings (RFC5961, Section 5.).
> 
> Also, it's clearly a broken middle box. Don't shoot the messenger.

Frames are dropped by conntrack, before TCP receiver can even have a
choice.

So Christoph patch would be of no use for Corey.

I do not think I shot anyone, only stated the truth.

We have workarounds in our stack to 'fix' bugs from others, there
is no shame in this.

Glad to see you are interested in RFC 5961 support, as conntrack is
known to break the ACK challenges in response to RST messages (section
3)



--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index 2f80107..1862902 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -656,12 +656,12 @@  static bool tcp_in_window(const struct nf_conn *ct,
 	pr_debug("tcp_in_window: I=%i II=%i III=%i IV=%i\n",
 		 before(seq, sender->td_maxend + 1),
 		 (in_recv_win ? 1 : 0),
-		 before(sack, receiver->td_end + 1),
+		 before(sack, receiver->td_end + MAXACKWINDOW(sender) + 1),
 		 after(sack, receiver->td_end - MAXACKWINDOW(sender) - 1));
 
 	if (before(seq, sender->td_maxend + 1) &&
 	    in_recv_win &&
-	    before(sack, receiver->td_end + 1) &&
+	    before(sack, receiver->td_end + MAXACKWINDOW(sender) + 1) &&
 	    after(sack, receiver->td_end - MAXACKWINDOW(sender) - 1)) {
 		/*
 		 * Take into account window scaling (RFC 1323).