[stable,4.4,0/9] fix SegmentSmack in stable branch (CVE-2018-5390)

Message ID 1534387810-121428-1-git-send-email-maowenan@huawei.com

Message

maowenan Aug. 16, 2018, 2:50 a.m. UTC
There are five patches to fix CVE-2018-5390 in the latest mainline
branch, but only two of them exist in stable 4.4 and 3.18:
dc6ae4d tcp: detect malicious patterns in tcp_collapse_ofo_queue()
5fbec48 tcp: avoid collapses in tcp_prune_queue() if possible
I have tested with the stable 4.4 kernel and found that CPU usage was still very high,
so I think these two patches alone can't fix CVE-2018-5390.
test results:
with fix patches:    78.2%   ksoftirqd
without fix patches: 90%     ksoftirqd

Then I tried to imitate 72cd43ba (tcp: free batches of packets in tcp_prune_ofo_queue())
and drop at least 12.5% of sk_rcvbuf to avoid malicious attacks with the simple queue
instead of an RB tree. The result was not very good.
 
After analysing the stable 4.4 code and debugging the system, it turns out that
searching the ofo queue (TCP ofo uses a simple linked queue) costs the most cycles.

So I tried to backport "tcp: use an RB tree for ooo receive queue" to use an RB tree
instead of the simple queue, and then backported Eric Dumazet's five fix patches from
mainline. The good news is that ksoftirqd drops to about 20%, the same as mainline now.

Stable 4.4 has already backported two patches:
f4a3313d (tcp: avoid collapses in tcp_prune_queue() if possible)
3d4bf93a (tcp: detect malicious patterns in tcp_collapse_ofo_queue())
If we want to replace the simple queue with an RB tree to resolve this for good, we
should first apply the earlier patch 9f5afeae (tcp: use an RB tree for ooo receive
queue). But 9f5afeae has many conflicts with 3d4bf93a and f4a3313d, which are part of
Eric's mainline patch series fixing CVE-2018-5390, so I need to first revert those two
patches in stable 4.4, then apply 9f5afeae, and then reapply the five patches from Eric.

Eric Dumazet (6):
  tcp: increment sk_drops for dropped rx packets
  tcp: free batches of packets in tcp_prune_ofo_queue()
  tcp: avoid collapses in tcp_prune_queue() if possible
  tcp: detect malicious patterns in tcp_collapse_ofo_queue()
  tcp: call tcp_drop() from tcp_data_queue_ofo()
  tcp: add tcp_ooo_try_coalesce() helper

Mao Wenan (2):
  Revert "tcp: detect malicious patterns in tcp_collapse_ofo_queue()"
  Revert "tcp: avoid collapses in tcp_prune_queue() if possible"

Yaogong Wang (1):
  tcp: use an RB tree for ooo receive queue

 include/linux/skbuff.h   |   8 +
 include/linux/tcp.h      |   7 +-
 include/net/sock.h       |   7 +
 include/net/tcp.h        |   2 +-
 net/core/skbuff.c        |  19 +++
 net/ipv4/tcp.c           |   4 +-
 net/ipv4/tcp_input.c     | 412 +++++++++++++++++++++++++++++------------------
 net/ipv4/tcp_ipv4.c      |   3 +-
 net/ipv4/tcp_minisocks.c |   1 -
 net/ipv6/tcp_ipv6.c      |   1 +
 10 files changed, 294 insertions(+), 170 deletions(-)

Comments

Michal Kubecek Aug. 16, 2018, 6:16 a.m. UTC | #1
On Thu, Aug 16, 2018 at 10:50:01AM +0800, Mao Wenan wrote:
> There are five patches to fix CVE-2018-5390 in latest mainline 
> branch, but only two patches exist in stable 4.4 and 3.18: 
> dc6ae4d tcp: detect malicious patterns in tcp_collapse_ofo_queue()
> 5fbec48 tcp: avoid collapses in tcp_prune_queue() if possible
> I have tested with stable 4.4 kernel, and found the cpu usage was very high.
> So I think only two patches can't fix the CVE-2018-5390.
> test results:
> with fix patch:     78.2%   ksoftirqd
> withoutfix patch:   90%     ksoftirqd
> 
> Then I try to imitate 72cd43ba(tcp: free batches of packets in tcp_prune_ofo_queue())
> to drop at least 12.5 % of sk_rcvbuf to avoid malicious attacks with simple queue 
> instead of RB tree. The result is not very well.
>  
> After analysing the codes of stable 4.4, and debuging the 
> system, shows that search of ofo_queue(tcp ofo using a simple queue) cost more cycles.
> 
> So I try to backport "tcp: use an RB tree for ooo receive queue" using RB tree 
> instead of simple queue, then backport Eric Dumazet 5 fixed patches in mainline,
> good news is that ksoftirqd is turn to about 20%, which is the same with mainline now.
> 
> Stable 4.4 have already back port two patches, 
> f4a3313d(tcp: avoid collapses in tcp_prune_queue() if possible)
> 3d4bf93a(tcp: detect malicious patterns in tcp_collapse_ofo_queue())
> If we want to change simple queue to RB tree to finally resolve, we should apply previous 
> patch 9f5afeae(tcp: use an RB tree for ooo receive queue.) firstly, but 9f5afeae have many 
> conflicts with 3d4bf93a and f4a3313d, which are part of patch series from Eric in 
> mainline to fix CVE-2018-5390, so I need revert part of patches in stable 4.4 firstly, 
> then apply 9f5afeae, and reapply five patches from Eric.

There seems to be an obvious mistake in one of the backports. Could you
check the results with Takashi's follow-up fix submitted at

  http://lkml.kernel.org/r/20180815095846.7734-1-tiwai@suse.de

(I would try myself but you don't mention what test you ran.)

Michal Kubecek
maowenan Aug. 16, 2018, 6:42 a.m. UTC | #2
On 2018/8/16 14:16, Michal Kubecek wrote:
> On Thu, Aug 16, 2018 at 10:50:01AM +0800, Mao Wenan wrote:
>> There are five patches to fix CVE-2018-5390 in latest mainline 
>> branch, but only two patches exist in stable 4.4 and 3.18: 
>> dc6ae4d tcp: detect malicious patterns in tcp_collapse_ofo_queue()
>> 5fbec48 tcp: avoid collapses in tcp_prune_queue() if possible
>> I have tested with stable 4.4 kernel, and found the cpu usage was very high.
>> So I think only two patches can't fix the CVE-2018-5390.
>> test results:
>> with fix patch:     78.2%   ksoftirqd
>> withoutfix patch:   90%     ksoftirqd
>>
>> Then I try to imitate 72cd43ba(tcp: free batches of packets in tcp_prune_ofo_queue())
>> to drop at least 12.5 % of sk_rcvbuf to avoid malicious attacks with simple queue 
>> instead of RB tree. The result is not very well.
>>  
>> After analysing the codes of stable 4.4, and debuging the 
>> system, shows that search of ofo_queue(tcp ofo using a simple queue) cost more cycles.
>>
>> So I try to backport "tcp: use an RB tree for ooo receive queue" using RB tree 
>> instead of simple queue, then backport Eric Dumazet 5 fixed patches in mainline,
>> good news is that ksoftirqd is turn to about 20%, which is the same with mainline now.
>>
>> Stable 4.4 have already back port two patches, 
>> f4a3313d(tcp: avoid collapses in tcp_prune_queue() if possible)
>> 3d4bf93a(tcp: detect malicious patterns in tcp_collapse_ofo_queue())
>> If we want to change simple queue to RB tree to finally resolve, we should apply previous 
>> patch 9f5afeae(tcp: use an RB tree for ooo receive queue.) firstly, but 9f5afeae have many 
>> conflicts with 3d4bf93a and f4a3313d, which are part of patch series from Eric in 
>> mainline to fix CVE-2018-5390, so I need revert part of patches in stable 4.4 firstly, 
>> then apply 9f5afeae, and reapply five patches from Eric.
> 
> There seems to be an obvious mistake in one of the backports. Could you
> check the results with Takashi's follow-up fix submitted at
> 
>   http://lkml.kernel.org/r/20180815095846.7734-1-tiwai@suse.de
> 
> (I would try myself but you don't mention what test you ran.)

I have backported the RB tree into stable 4.4; tcp_collapse_ofo_queue() has been
reworked to match mainline, so there seems to be no problem applying Eric's patch
3d4bf93a (tcp: detect malicious patterns in tcp_collapse_ofo_queue()).

I also noticed that "range_truesize != head->truesize" will always be false, as
mentioned at your URL, but that is only true of the current stable 4.4 code. If the RB
tree patch 9f5afeae (tcp: use an RB tree for ooo receive queue) is applied first, then
after applying 3d4bf93a the code becomes "range_truesize += skb->truesize", and
"range_truesize != head->truesize" can be true.

One PoC program (named smack-for-ficora) sends large numbers of out-of-order packets
to an established TCP connection; the system's CPU usage is then checked with the top
command.

> 
> Michal Kubecek
> 
> .
>
Michal Kubecek Aug. 16, 2018, 6:52 a.m. UTC | #3
On Thu, Aug 16, 2018 at 02:42:32PM +0800, maowenan wrote:
> On 2018/8/16 14:16, Michal Kubecek wrote:
> > On Thu, Aug 16, 2018 at 10:50:01AM +0800, Mao Wenan wrote:
> >> There are five patches to fix CVE-2018-5390 in latest mainline 
> >> branch, but only two patches exist in stable 4.4 and 3.18: 
> >> dc6ae4d tcp: detect malicious patterns in tcp_collapse_ofo_queue()
> >> 5fbec48 tcp: avoid collapses in tcp_prune_queue() if possible
> >> I have tested with stable 4.4 kernel, and found the cpu usage was very high.
> >> So I think only two patches can't fix the CVE-2018-5390.
> >> test results:
> >> with fix patch:     78.2%   ksoftirqd
> >> withoutfix patch:   90%     ksoftirqd
> >>
> >> Then I try to imitate 72cd43ba(tcp: free batches of packets in tcp_prune_ofo_queue())
> >> to drop at least 12.5 % of sk_rcvbuf to avoid malicious attacks with simple queue 
> >> instead of RB tree. The result is not very well.
> >>  
> >> After analysing the codes of stable 4.4, and debuging the 
> >> system, shows that search of ofo_queue(tcp ofo using a simple queue) cost more cycles.
> >>
> >> So I try to backport "tcp: use an RB tree for ooo receive queue" using RB tree 
> >> instead of simple queue, then backport Eric Dumazet 5 fixed patches in mainline,
> >> good news is that ksoftirqd is turn to about 20%, which is the same with mainline now.
> >>
> >> Stable 4.4 have already back port two patches, 
> >> f4a3313d(tcp: avoid collapses in tcp_prune_queue() if possible)
> >> 3d4bf93a(tcp: detect malicious patterns in tcp_collapse_ofo_queue())
> >> If we want to change simple queue to RB tree to finally resolve, we should apply previous 
> >> patch 9f5afeae(tcp: use an RB tree for ooo receive queue.) firstly, but 9f5afeae have many 
> >> conflicts with 3d4bf93a and f4a3313d, which are part of patch series from Eric in 
> >> mainline to fix CVE-2018-5390, so I need revert part of patches in stable 4.4 firstly, 
> >> then apply 9f5afeae, and reapply five patches from Eric.
> > 
> > There seems to be an obvious mistake in one of the backports. Could you
> > check the results with Takashi's follow-up fix submitted at
> > 
> >   http://lkml.kernel.org/r/20180815095846.7734-1-tiwai@suse.de
> > 
> > (I would try myself but you don't mention what test you ran.)
> 
> I have backport RB tree in stable 4.4, function
> tcp_collapse_ofo_queue() has been refined, which keep the same with
> mainline, so it seems no problem when apply Eric's patch 3d4bf93a(tcp:
> detect malicious patterns in tcp_collapse_ofo_queue()).
> 
> I also noticed that range_truesize != head->truesize will be always
> false which mentioned in your URL, but this only based on stable 4.4's
> codes, If I applied RB tree's patch 9f5afeae(tcp: use an RB tree for
> ooo receive queue), and after apply 3d4bf93a,the codes should be
> range_truesize += skb->truesize, and range_truesize != head->truesize
> can be true.

My point is that backporting all of this into stable 4.4 is quite intrusive,
so if we can achieve similar results with a simple fix of an obvious
omission, that would be preferable.

Michal Kubecek
maowenan Aug. 16, 2018, 7:19 a.m. UTC | #4
On 2018/8/16 14:52, Michal Kubecek wrote:
> On Thu, Aug 16, 2018 at 02:42:32PM +0800, maowenan wrote:
>> On 2018/8/16 14:16, Michal Kubecek wrote:
>>> On Thu, Aug 16, 2018 at 10:50:01AM +0800, Mao Wenan wrote:
>>>> There are five patches to fix CVE-2018-5390 in latest mainline 
>>>> branch, but only two patches exist in stable 4.4 and 3.18: 
>>>> dc6ae4d tcp: detect malicious patterns in tcp_collapse_ofo_queue()
>>>> 5fbec48 tcp: avoid collapses in tcp_prune_queue() if possible
>>>> I have tested with stable 4.4 kernel, and found the cpu usage was very high.
>>>> So I think only two patches can't fix the CVE-2018-5390.
>>>> test results:
>>>> with fix patch:     78.2%   ksoftirqd
>>>> withoutfix patch:   90%     ksoftirqd
>>>>
>>>> Then I try to imitate 72cd43ba(tcp: free batches of packets in tcp_prune_ofo_queue())
>>>> to drop at least 12.5 % of sk_rcvbuf to avoid malicious attacks with simple queue 
>>>> instead of RB tree. The result is not very well.
>>>>  
>>>> After analysing the codes of stable 4.4, and debuging the 
>>>> system, shows that search of ofo_queue(tcp ofo using a simple queue) cost more cycles.
>>>>
>>>> So I try to backport "tcp: use an RB tree for ooo receive queue" using RB tree 
>>>> instead of simple queue, then backport Eric Dumazet 5 fixed patches in mainline,
>>>> good news is that ksoftirqd is turn to about 20%, which is the same with mainline now.
>>>>
>>>> Stable 4.4 have already back port two patches, 
>>>> f4a3313d(tcp: avoid collapses in tcp_prune_queue() if possible)
>>>> 3d4bf93a(tcp: detect malicious patterns in tcp_collapse_ofo_queue())
>>>> If we want to change simple queue to RB tree to finally resolve, we should apply previous 
>>>> patch 9f5afeae(tcp: use an RB tree for ooo receive queue.) firstly, but 9f5afeae have many 
>>>> conflicts with 3d4bf93a and f4a3313d, which are part of patch series from Eric in 
>>>> mainline to fix CVE-2018-5390, so I need revert part of patches in stable 4.4 firstly, 
>>>> then apply 9f5afeae, and reapply five patches from Eric.
>>>
>>> There seems to be an obvious mistake in one of the backports. Could you
>>> check the results with Takashi's follow-up fix submitted at
>>>
>>>   http://lkml.kernel.org/r/20180815095846.7734-1-tiwai@suse.de
>>>
>>> (I would try myself but you don't mention what test you ran.)
>>
>> I have backport RB tree in stable 4.4, function
>> tcp_collapse_ofo_queue() has been refined, which keep the same with
>> mainline, so it seems no problem when apply Eric's patch 3d4bf93a(tcp:
>> detect malicious patterns in tcp_collapse_ofo_queue()).
>>
>> I also noticed that range_truesize != head->truesize will be always
>> false which mentioned in your URL, but this only based on stable 4.4's
>> codes, If I applied RB tree's patch 9f5afeae(tcp: use an RB tree for
>> ooo receive queue), and after apply 3d4bf93a,the codes should be
>> range_truesize += skb->truesize, and range_truesize != head->truesize
>> can be true.
> 
> My point is that backporting all this into stable 4.4 is quite intrusive
> so that if we can achieve similar results with a simple fix of an
> obvious omission, it would be preferrable.

There are five patches in mainline to fix this CVE; the two backported patches alone
have no effect on stable 4.4, the important reason being that 4.4 uses a simple queue
while mainline uses an RB tree.

I have tried my best to fix this the easy way, by dropping at least 12.5% (or some
other fraction) of packets based on the simple queue, but the result was not very
good, so the RB tree is needed, and the tested result is what I was after.

If we only backport two patches and they don't fix the issue, I think they don't make
any sense.

> 
> Michal Kubecek
> 
> .
>
Michal Kubecek Aug. 16, 2018, 7:23 a.m. UTC | #5
On Thu, Aug 16, 2018 at 03:19:12PM +0800, maowenan wrote:
> On 2018/8/16 14:52, Michal Kubecek wrote:
> > 
> > My point is that backporting all this into stable 4.4 is quite intrusive
> > so that if we can achieve similar results with a simple fix of an
> > obvious omission, it would be preferrable.
> 
> There are five patches in mainline to fix this CVE, only two patches
> have no effect on stable 4.4, the important reason is 4.4 use simple
> queue but mainline use RB tree.
> 
> I have tried my best to use easy way to fix this with dropping packets
> 12.5%(or other value) based on simple queue, but the result is not
> very well, so the RB tree is needed and tested result is my desire.
> 
> If we only back port two patches but they don't fix the issue, I think
> they don't make any sense.

There is an obvious omission in one of the two patches and Takashi's
patch fixes it. If his follow-up fix (applied on top of what is in
stable 4.4 now) addresses the problem, I would certainly prefer using it
over backporting the whole series.

Michal Kubecek
maowenan Aug. 16, 2018, 7:39 a.m. UTC | #6
On 2018/8/16 15:23, Michal Kubecek wrote:
> On Thu, Aug 16, 2018 at 03:19:12PM +0800, maowenan wrote:
>> On 2018/8/16 14:52, Michal Kubecek wrote:
>>>
>>> My point is that backporting all this into stable 4.4 is quite intrusive
>>> so that if we can achieve similar results with a simple fix of an
>>> obvious omission, it would be preferrable.
>>
>> There are five patches in mainline to fix this CVE, only two patches
>> have no effect on stable 4.4, the important reason is 4.4 use simple
>> queue but mainline use RB tree.
>>
>> I have tried my best to use easy way to fix this with dropping packets
>> 12.5%(or other value) based on simple queue, but the result is not
>> very well, so the RB tree is needed and tested result is my desire.
>>
>> If we only back port two patches but they don't fix the issue, I think
>> they don't make any sense.
> 
> There is an obvious omission in one of the two patches and Takashi's
> patch fixes it. If his follow-up fix (applied on top of what is in
> stable 4.4 now) addresses the problem, I would certainly prefer using it
> over backporting the whole series.

Do you mean the code below from Takashi can fix this CVE?
I already tested exactly this two days ago; it did not have a good effect.

Could you try to test with the PoC program mentioned in the previous mail, in case I
made a mistake?

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 4a261e078082..9c4c6cd0316e 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4835,6 +4835,7 @@ static void tcp_collapse_ofo_queue(struct sock *sk)
 			end = TCP_SKB_CB(skb)->end_seq;
 			range_truesize = skb->truesize;
 		} else {
+			range_truesize += skb->truesize;
 			if (before(TCP_SKB_CB(skb)->seq, start))
 				start = TCP_SKB_CB(skb)->seq;
 			if (after(TCP_SKB_CB(skb)->end_seq, end))
Michal Kubecek Aug. 16, 2018, 7:44 a.m. UTC | #7
On Thu, Aug 16, 2018 at 03:39:14PM +0800, maowenan wrote:
> 
> 
> On 2018/8/16 15:23, Michal Kubecek wrote:
> > On Thu, Aug 16, 2018 at 03:19:12PM +0800, maowenan wrote:
> >> On 2018/8/16 14:52, Michal Kubecek wrote:
> >>>
> >>> My point is that backporting all this into stable 4.4 is quite intrusive
> >>> so that if we can achieve similar results with a simple fix of an
> >>> obvious omission, it would be preferrable.
> >>
> >> There are five patches in mainline to fix this CVE, only two patches
> >> have no effect on stable 4.4, the important reason is 4.4 use simple
> >> queue but mainline use RB tree.
> >>
> >> I have tried my best to use easy way to fix this with dropping packets
> >> 12.5%(or other value) based on simple queue, but the result is not
> >> very well, so the RB tree is needed and tested result is my desire.
> >>
> >> If we only back port two patches but they don't fix the issue, I think
> >> they don't make any sense.
> > 
> > There is an obvious omission in one of the two patches and Takashi's
> > patch fixes it. If his follow-up fix (applied on top of what is in
> > stable 4.4 now) addresses the problem, I would certainly prefer using it
> > over backporting the whole series.
> 
> Do you mean below codes from Takashi can fix this CVE?
> But I have already tested like this two days ago, it is not good effect.

IIRC what you proposed was different: you proposed to replace the "=" in
the other branch with "+=".

Michal Kubecek


> 
> Could you try to test with POC programme mentioned previous mail in case I made mistake?
> 
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 4a261e078082..9c4c6cd0316e 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -4835,6 +4835,7 @@ static void tcp_collapse_ofo_queue(struct sock *sk)
>  			end = TCP_SKB_CB(skb)->end_seq;
>  			range_truesize = skb->truesize;
>  		} else {
> +			range_truesize += skb->truesize;
>  			if (before(TCP_SKB_CB(skb)->seq, start))
>  				start = TCP_SKB_CB(skb)->seq;
>  			if (after(TCP_SKB_CB(skb)->end_seq, end))
> -- 
> 
> 
> > 
> > Michal Kubecek
> > 
> > 
> > .
> > 
>
maowenan Aug. 16, 2018, 7:55 a.m. UTC | #8
On 2018/8/16 15:44, Michal Kubecek wrote:
> On Thu, Aug 16, 2018 at 03:39:14PM +0800, maowenan wrote:
>>
>>
>> On 2018/8/16 15:23, Michal Kubecek wrote:
>>> On Thu, Aug 16, 2018 at 03:19:12PM +0800, maowenan wrote:
>>>> On 2018/8/16 14:52, Michal Kubecek wrote:
>>>>>
>>>>> My point is that backporting all this into stable 4.4 is quite intrusive
>>>>> so that if we can achieve similar results with a simple fix of an
>>>>> obvious omission, it would be preferrable.
>>>>
>>>> There are five patches in mainline to fix this CVE, only two patches
>>>> have no effect on stable 4.4, the important reason is 4.4 use simple
>>>> queue but mainline use RB tree.
>>>>
>>>> I have tried my best to use easy way to fix this with dropping packets
>>>> 12.5%(or other value) based on simple queue, but the result is not
>>>> very well, so the RB tree is needed and tested result is my desire.
>>>>
>>>> If we only back port two patches but they don't fix the issue, I think
>>>> they don't make any sense.
>>>
>>> There is an obvious omission in one of the two patches and Takashi's
>>> patch fixes it. If his follow-up fix (applied on top of what is in
>>> stable 4.4 now) addresses the problem, I would certainly prefer using it
>>> over backporting the whole series.
>>
>> Do you mean below codes from Takashi can fix this CVE?
>> But I have already tested like this two days ago, it is not good effect.
> 
> IIRC what you proposed was different, you proposed to replace the "=" in
> the other branch by "+=".

No, I think you didn't get what I meant: I have already tested stable 4.4 based on
commit dc6ae4d with the code changed as in Takashi's patch, which didn't contain any
of the code I sent in this patch series.

I suggest someone test again on top of Takashi's fix.

dc6ae4d tcp: detect malicious patterns in tcp_collapse_ofo_queue()
5fbec48 tcp: avoid collapses in tcp_prune_queue() if possible
255924e tcp: do not delay ACK in DCTCP upon CE status change
0b1d40e tcp: do not cancel delay-AcK on DCTCP special ACK
17fea38e7 tcp: helpers to send special DCTCP ack
500e03f tcp: fix dctcp delayed ACK schedule
b04c9a0 rtnetlink: add rtnl_link_state check in rtnl_configure_link
73dad08 net/mlx4_core: Save the qpn from the input modifier in RST2INIT wrapper
48f41c0 ip: hash fragments consistently
54a634c MIPS: ath79: fix register address in ath79_ddr_wb_flush()
762b585 Linux 4.4.144


> 
> Michal Kubecek
> 
> 
>>
>> Could you try to test with POC programme mentioned previous mail in case I made mistake?
>>
>> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
>> index 4a261e078082..9c4c6cd0316e 100644
>> --- a/net/ipv4/tcp_input.c
>> +++ b/net/ipv4/tcp_input.c
>> @@ -4835,6 +4835,7 @@ static void tcp_collapse_ofo_queue(struct sock *sk)
>>  			end = TCP_SKB_CB(skb)->end_seq;
>>  			range_truesize = skb->truesize;
>>  		} else {
>> +			range_truesize += skb->truesize;
>>  			if (before(TCP_SKB_CB(skb)->seq, start))
>>  				start = TCP_SKB_CB(skb)->seq;
>>  			if (after(TCP_SKB_CB(skb)->end_seq, end))
>> -- 
>>
>>
>>>
>>> Michal Kubecek
>>>
>>>
>>> .
>>>
>>
> 
> .
>
Michal Kubecek Aug. 16, 2018, 11:39 a.m. UTC | #9
On Thu, Aug 16, 2018 at 03:55:16PM +0800, maowenan wrote:
> On 2018/8/16 15:44, Michal Kubecek wrote:
> > On Thu, Aug 16, 2018 at 03:39:14PM +0800, maowenan wrote:
> >>
> >>
> >> On 2018/8/16 15:23, Michal Kubecek wrote:
> >>> On Thu, Aug 16, 2018 at 03:19:12PM +0800, maowenan wrote:
> >>>> On 2018/8/16 14:52, Michal Kubecek wrote:
> >>>>>
> >>>>> My point is that backporting all this into stable 4.4 is quite intrusive
> >>>>> so that if we can achieve similar results with a simple fix of an
> >>>>> obvious omission, it would be preferrable.
> >>>>
> >>>> There are five patches in mainline to fix this CVE, only two patches
> >>>> have no effect on stable 4.4, the important reason is 4.4 use simple
> >>>> queue but mainline use RB tree.
> >>>>
> >>>> I have tried my best to use easy way to fix this with dropping packets
> >>>> 12.5%(or other value) based on simple queue, but the result is not
> >>>> very well, so the RB tree is needed and tested result is my desire.
> >>>>
> >>>> If we only back port two patches but they don't fix the issue, I think
> >>>> they don't make any sense.
> >>>
> >>> There is an obvious omission in one of the two patches and Takashi's
> >>> patch fixes it. If his follow-up fix (applied on top of what is in
> >>> stable 4.4 now) addresses the problem, I would certainly prefer using it
> >>> over backporting the whole series.
> >>
> >> Do you mean below codes from Takashi can fix this CVE?
> >> But I have already tested like this two days ago, it is not good effect.
> > 
> > IIRC what you proposed was different, you proposed to replace the "=" in
> > the other branch by "+=".
> 
> No, I think you don't get what I mean, I have already tested stable 4.4,
> based on commit dc6ae4d, and change the codes like Takashi, which didn't
> contain any codes I have sent in this patch series.

I suspect you may be doing something wrong with your tests. I checked
the segmentsmack testcase and the CPU utilization on receiving side
(with sending 10 times as many packets as default) went down from ~100%
to ~3% even when comparing what is in stable 4.4 now against older 4.4
kernel.

This is actually not surprising: the testcase only sends 1-byte segments
starting at even offsets, so the receiver never gets two adjacent
segments and the "range_truesize != head->truesize" condition would
never be satisfied, whether you fix the backport or not.

Where the missing "range_truesize += skb->truesize" makes a difference
is in the case when there are some adjacent out of order segments, e.g.
if you omitted 1:10 and then sent 10:11, 11:12, 12:13, 13:14, ...
Then the missing update of range_truesize would prevent collapsing the
queue until the total length of the range would exceed the value of
SKB_WITH_OVERHEAD(SK_MEM_QUANTUM) (i.e. a bit less than 4 KB).

Michal Kubecek
maowenan Aug. 16, 2018, 12:05 p.m. UTC | #10
On 2018/8/16 19:39, Michal Kubecek wrote:
> On Thu, Aug 16, 2018 at 03:55:16PM +0800, maowenan wrote:
>> On 2018/8/16 15:44, Michal Kubecek wrote:
>>> On Thu, Aug 16, 2018 at 03:39:14PM +0800, maowenan wrote:
>>>>
>>>>
>>>> On 2018/8/16 15:23, Michal Kubecek wrote:
>>>>> On Thu, Aug 16, 2018 at 03:19:12PM +0800, maowenan wrote:
>>>>>> On 2018/8/16 14:52, Michal Kubecek wrote:
>>>>>>>
>>>>>>> My point is that backporting all this into stable 4.4 is quite intrusive
>>>>>>> so that if we can achieve similar results with a simple fix of an
>>>>>>> obvious omission, it would be preferrable.
>>>>>>
>>>>>> There are five patches in mainline to fix this CVE, only two patches
>>>>>> have no effect on stable 4.4, the important reason is 4.4 use simple
>>>>>> queue but mainline use RB tree.
>>>>>>
>>>>>> I have tried my best to use easy way to fix this with dropping packets
>>>>>> 12.5%(or other value) based on simple queue, but the result is not
>>>>>> very well, so the RB tree is needed and tested result is my desire.
>>>>>>
>>>>>> If we only back port two patches but they don't fix the issue, I think
>>>>>> they don't make any sense.
>>>>>
>>>>> There is an obvious omission in one of the two patches and Takashi's
>>>>> patch fixes it. If his follow-up fix (applied on top of what is in
>>>>> stable 4.4 now) addresses the problem, I would certainly prefer using it
>>>>> over backporting the whole series.
>>>>
>>>> Do you mean below codes from Takashi can fix this CVE?
>>>> But I have already tested like this two days ago, it is not good effect.
>>>
>>> IIRC what you proposed was different, you proposed to replace the "=" in
>>> the other branch by "+=".
>>
>> No, I think you don't get what I mean, I have already tested stable 4.4,
>> based on commit dc6ae4d, and change the codes like Takashi, which didn't
>> contain any codes I have sent in this patch series.
> 
> I suspect you may be doing something wrong with your tests. I checked
> the segmentsmack testcase and the CPU utilization on receiving side
> (with sending 10 times as many packets as default) went down from ~100%
> to ~3% even when comparing what is in stable 4.4 now against older 4.4
> kernel.

There seems to be no obvious problem when you send packets with the SegmentSmack PoC's
default parameters, which also depends heavily on your server's hardware
configuration. Please try the parameters below to create OFO packets; then you will
see that CPU usage is high, and perf top shows tcp_data_queue costing about 55.6% CPU.
If the dst port is 22, you will see sshd at about 95%.

int main(int argc, char **argv)
{
  // Adjust dst_mac, src_mac and dst_ip to match source and target!
  // Adjust dst_port to match the target, needs to be an open port!
  char dst_mac[6] = {0xb8,0x27,0xeb,0x54,0x23,0x4a};
  char src_mac[6] = {0x08,0x00,0x27,0xbc,0x91,0x93};
  uint32_t dst_ip = (192<<24)|(168<<16)|(1<<8)|225;
  uint32_t src_ip = 0;
  uint16_t dst_port = 22;   //attack existed ssh link
  uint16_t src_port = 0;

  ......

    for (j = 0; j < 102400*10; j++)    //10240->102400
    {
      for (i = 0; i < 1024; i++)      // 128->1024
      {
        tcp_set_ack_on(only_tcp[i]);
        tcp_set_src_port(only_tcp[i], src_port);
        tcp_set_dst_port(only_tcp[i], dst_port);
        tcp_set_seq_number(only_tcp[i], isn+2+2*(rand()%16384));
        //tcp_set_seq_number(only_tcp[i], isn+2);
        tcp_set_ack_number(only_tcp[i], other_isn+1);
        tcp_set_data_offset(only_tcp[i], 20);
        tcp_set_window(only_tcp[i], 65535);
        tcp_set_cksum_calc(ip, 20, only_tcp[i], sizeof(only_tcp[i]));
      }
      ret = ldp_out_inject_chunk(outq, pkt_tbl_chunk, 1024);  //128->1024
      printf("sent %d packets\n", ret);
      ldp_out_txsync(outq);
      usleep(10*1000); // Adjust this and packet count to match the target!, sleep 100ms->10ms
    }
  ......

> 
> This is actually not surprising. the testcase only sends 1-byte segments
> starting at even offsets so that receiver never gets two adjacent
> segments and the "range_truesize != head->truesize" condition would
> never be satisfied, whether you fix the backport or not.
> 
> Where the missing "range_truesize += skb->truesize" makes a difference
> is in the case when there are some adjacent out of order segments, e.g.
> if you omitted 1:10 and then sent 10:11, 11:12, 12:13, 13:14, ...
> Then the missing update of range_truesize would prevent collapsing the
> queue until the total length of the range would exceed the value of
> SKB_WITH_OVERHEAD(SK_MEM_QUANTUM) (i.e. a bit less than 4 KB).
> 
> Michal Kubecek
> 
> 
> .
>
Michal Kubecek Aug. 16, 2018, 12:33 p.m. UTC | #11
On Thu, Aug 16, 2018 at 08:05:50PM +0800, maowenan wrote:
> On 2018/8/16 19:39, Michal Kubecek wrote:
> > 
> > I suspect you may be doing something wrong with your tests. I checked
> > the segmentsmack testcase and the CPU utilization on receiving side
> > (with sending 10 times as many packets as default) went down from ~100%
> > to ~3% even when comparing what is in stable 4.4 now against older 4.4
> > kernel.
> 
> There seems no obvious problem when you send packets with default
> parameter in Segmentsmack POC, Which is also very related with your
> server's hardware configuration. Please try with below parameter to
> form OFO packets

I did, and even with these (questionable, see below) changes, I did not
get more than 10% (of one core) in the receiving ksoftirqd.

>       for (i = 0; i < 1024; i++)      // 128->1024
...
>       usleep(10*1000); // Adjust this and packet count to match the target!, sleep 100ms->10ms

The comment in the testcase source suggests to do _one_ of these two
changes so that you generate 10 times as many packets as the original
testcase. You did both so that you end up sending 102400 packets per
second. With 55 byte long packets, this kind of attack requires at least
5.5 MB/s (44 Mb/s) of throughput. This is no longer a "low packet rate
DoS", I'm afraid.

Anyway, even at this rate, I only get ~10% of one core (Intel E5-2697).

What I can see, though, is that with current stable 4.4 code, modified
testcase which sends something like

  2:3, 3:4, ..., 3001:3002, 3003:3004, 3004:3005, ... 6001:6002, ...

I quickly eat 6 MB of memory for receive queue of one socket while
earlier 4.4 kernels only take 200-300 KB. I didn't test latest 4.4 with
Takashi's follow-up yet but I'm pretty sure it will help while
preserving nice performance when using the original segmentsmack
testcase (with increased packet ratio).
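The pattern above can be sketched as follows. This is an illustrative reconstruction, not the actual testcase code: runs of ~3000 adjacent one-byte segments separated by one-byte holes, so each run forms a long collapsible range while the holes keep every segment in the out-of-order queue:

```python
# Illustrative reconstruction of the segment pattern described above:
# runs of 3000 adjacent one-byte segments separated by one-byte holes.
# Each run is a long collapsible range; the holes keep everything in the
# out-of-order queue.

def segment_pattern(runs=3, run_len=3000, start=2):
    segs = []
    seq = start
    for _ in range(runs):
        for _ in range(run_len):
            segs.append((seq, seq + 1))  # one-byte segment seq:seq+1
            seq += 1
        seq += 1                         # one-byte hole between runs
    return segs

segs = segment_pattern()
print(segs[0], segs[1], segs[2999], segs[3000])
# (2, 3) (3, 4) (3001, 3002) (3003, 3004)
```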

Michal Kubecek
Greg KH Aug. 16, 2018, 3:24 p.m. UTC | #12
On Thu, Aug 16, 2018 at 02:33:56PM +0200, Michal Kubecek wrote:
> On Thu, Aug 16, 2018 at 08:05:50PM +0800, maowenan wrote:
> > On 2018/8/16 19:39, Michal Kubecek wrote:
> > > 
> > > I suspect you may be doing something wrong with your tests. I checked
> > > the segmentsmack testcase and the CPU utilization on receiving side
> > > (with sending 10 times as many packets as default) went down from ~100%
> > > to ~3% even when comparing what is in stable 4.4 now against older 4.4
> > > kernel.
> > 
> > There seems no obvious problem when you send packets with default
> > parameter in Segmentsmack POC, Which is also very related with your
> > server's hardware configuration. Please try with below parameter to
> > form OFO packets
> 
> I did and even with these (questionable, see below) changes, I did not
> get more than 10% (of one core) by receiving ksoftirqd.
> 
> >       for (i = 0; i < 1024; i++)      // 128->1024
> ...
> >       usleep(10*1000); // Adjust this and packet count to match the target!, sleep 100ms->10ms
> 
> The comment in the testcase source suggests to do _one_ of these two
> changes so that you generate 10 times as many packets as the original
> testcase. You did both so that you end up sending 102400 packets per
> second. With 55 byte long packets, this kind of attack requires at least
> 5.5 MB/s (44 Mb/s) of throughput. This is no longer a "low packet rate
> DoS", I'm afraid.
> 
> Anyway, even at this rate, I only get ~10% of one core (Intel E5-2697).
> 
> What I can see, though, is that with current stable 4.4 code, modified
> testcase which sends something like
> 
>   2:3, 3:4, ..., 3001:3002, 3003:3004, 3004:3005, ... 6001:6002, ...
> 
> I quickly eat 6 MB of memory for receive queue of one socket while
> earlier 4.4 kernels only take 200-300 KB. I didn't test latest 4.4 with
> Takashi's follow-up yet but I'm pretty sure it will help while
> preserving nice performance when using the original segmentsmack
> testcase (with increased packet ratio).

Ok, for now I've applied Takashi's fix to the 4.4 stable queue and will
push out a new 4.4-rc later tonight.  Can everyone standardize on that
and test and let me know if it does, or does not, fix the reported
issues?

If not, we can go from there and evaluate this much larger patch series.
But let's try the simple thing first.

thanks,

greg k-h
Michal Kubecek Aug. 16, 2018, 4:06 p.m. UTC | #13
On Thu, Aug 16, 2018 at 05:24:09PM +0200, Greg KH wrote:
> On Thu, Aug 16, 2018 at 02:33:56PM +0200, Michal Kubecek wrote:
> > 
> > Anyway, even at this rate, I only get ~10% of one core (Intel E5-2697).
> > 
> > What I can see, though, is that with current stable 4.4 code, modified
> > testcase which sends something like
> > 
> >   2:3, 3:4, ..., 3001:3002, 3003:3004, 3004:3005, ... 6001:6002, ...
> > 
> > I quickly eat 6 MB of memory for receive queue of one socket while
> > earlier 4.4 kernels only take 200-300 KB. I didn't test latest 4.4 with
> > Takashi's follow-up yet but I'm pretty sure it will help while
> > preserving nice performance when using the original segmentsmack
> > testcase (with increased packet ratio).
> 
> Ok, for now I've applied Takashi's fix to the 4.4 stable queue and will
> push out a new 4.4-rc later tonight.  Can everyone standardize on that
> and test and let me know if it does, or does not, fix the reported
> issues?

I did repeat the tests with Takashi's fix and the CPU utilization is
similar to what we have now, i.e. 3-5% with 10K pkt/s. I could still
saturate one CPU somewhere around 50K pkt/s but that already requires
2.75 MB/s (22 Mb/s) of throughput. (My previous tests with Mao Wenan's
changes in fact used lower speeds as the change from 128 to 1024 would
need to be done in two places.)

Where Takashi's patch does help is that it does not prevent collapsing
of ranges of adjacent segments with total length shorter than ~4KB. It
took more time to verify: it cannot be checked by watching the socket
memory consumption with ss as tcp_collapse_ofo_queue isn't called until
we reach the limits. So I needed to trace when and how tcp_collapse() is
called with both current stable 4.4 code and one with Takashi's fix.
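For reference, the ~4 KB threshold mentioned earlier comes from SKB_WITH_OVERHEAD(SK_MEM_QUANTUM). A rough model of that computation; the skb_shared_info size and cache-line size below are assumed typical x86_64 values, not taken from a specific 4.4 build:

```python
# Rough model of the kernel's SKB_WITH_OVERHEAD(SK_MEM_QUANTUM) threshold.
# sizeof(struct skb_shared_info) and the cache-line size are assumed
# typical x86_64 figures, not taken from a specific 4.4 build.

SMP_CACHE_BYTES = 64                  # assumed cache-line size
SIZEOF_SKB_SHARED_INFO = 320          # assumed; varies with config
SK_MEM_QUANTUM = 4096                 # PAGE_SIZE on x86_64

def skb_data_align(size):
    """Round size up to a cache-line multiple, like SKB_DATA_ALIGN."""
    return (size + SMP_CACHE_BYTES - 1) & ~(SMP_CACHE_BYTES - 1)

def skb_with_overhead(size):
    """size minus the aligned skb_shared_info overhead."""
    return size - skb_data_align(SIZEOF_SKB_SHARED_INFO)

print(skb_with_overhead(SK_MEM_QUANTUM))  # a bit less than 4096
```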
  
> If not, we can go from there and evaluate this much larger patch
> series.  But let's try the simple thing first.

At high packet rates (say 30K pkt/s and more), we can still saturate the
CPU. This is also mentioned in the announcement, with the claim that a
switch to an rbtree-based queue would be necessary to fully address it. My
tests seem to confirm that, but I'm still not sure it is worth backporting
something as intrusive into stable 4.4.

Michal Kubecek
Greg KH Aug. 16, 2018, 4:20 p.m. UTC | #14
On Thu, Aug 16, 2018 at 06:06:16PM +0200, Michal Kubecek wrote:
> > If not, we can go from there and evaluate this much larger patch
> > series.  But let's try the simple thing first.
> 
> At high packet rates (say 30K pkt/s and more), we can still saturate the
> CPU. This is also mentioned in the announcement with claim that switch
> to rbtree based queue would be necessary to fully address that. My tests
> seem to confirm that but I'm still not sure it is worth backporting
> something as intrusive into stable 4.4.

No, it is not.  If you worry about those things, you should not be
running a 4.4 kernel, use 4.14 or newer please.

thanks,

greg k-h
maowenan Aug. 17, 2018, 2:48 a.m. UTC | #15
On 2018/8/17 0:06, Michal Kubecek wrote:
> On Thu, Aug 16, 2018 at 05:24:09PM +0200, Greg KH wrote:
>> On Thu, Aug 16, 2018 at 02:33:56PM +0200, Michal Kubecek wrote:
>>>
>>> Anyway, even at this rate, I only get ~10% of one core (Intel E5-2697).
>>>
>>> What I can see, though, is that with current stable 4.4 code, modified
>>> testcase which sends something like
>>>
>>>   2:3, 3:4, ..., 3001:3002, 3003:3004, 3004:3005, ... 6001:6002, ...
>>>
>>> I quickly eat 6 MB of memory for receive queue of one socket while
>>> earlier 4.4 kernels only take 200-300 KB. I didn't test latest 4.4 with
>>> Takashi's follow-up yet but I'm pretty sure it will help while
>>> preserving nice performance when using the original segmentsmack
>>> testcase (with increased packet ratio).
>>
>> Ok, for now I've applied Takashi's fix to the 4.4 stable queue and will
>> push out a new 4.4-rc later tonight.  Can everyone standardize on that
>> and test and let me know if it does, or does not, fix the reported
>> issues?
> 
> I did repeat the tests with Takashi's fix and the CPU utilization is
> similar to what we have now, i.e. 3-5% with 10K pkt/s. I could still
> saturate one CPU somewhere around 50K pkt/s but that already requires
> 2.75 MB/s (22 Mb/s) of throughput. (My previous tests with Mao Wenan's
> changes in fact used lower speeds as the change from 128 to 1024 would
> need to be done in two places.)
> 
> Where Takashi's patch does help is that it does not prevent collapsing
> of ranges of adjacent segments with total length shorter than ~4KB. It
> took more time to verify: it cannot be checked by watching the socket
> memory consumption with ss as tcp_collapse_ofo_queue isn't called until
> we reach the limits. So I needed to trace when and how tcp_collapse() is
> called with both current stable 4.4 code and one with Takashi's fix.

The POC defaults are aimed at a Raspberry Pi system, whose CPU performance
is low, so the default parameters are not aggressive. We enlarged the
parameters to test on our Intel Skylake system (with higher performance);
without doing this, the CPU usage shows no obvious difference between the
patched and unpatched kernels, and you cannot tell whether a patch really
fixes the issue or not.

I have run a series of tests here, covering a low-rate attack (128 packets
per burst, 100 ms interval) and a high-rate attack (1024 packets per
burst, 10 ms interval), against the original 4.4 kernel, the kernel with
only Takashi's patch, and the kernel with only Mao Wenan's patches,
checking the CPU usage of ksoftirqd:

            original   Takashi   Mao Wenan
low rate    3%         2%        2%
high rate   50%        49%       ~10%

So I cannot confirm that Takashi's patch really fixes the underlying
issue; I think the root cause lies in the simple queue. Eric's patch
72cd43ba (tcp: free batches of packets in tcp_prune_ofo_queue()) can fix
this completely, and it is already included in my patch series. That patch
requires changing the simple queue to an RB tree, which makes searching
and dropping packets efficient and avoids large amounts of TCP
retransmission, so the CPU usage falls.
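A toy comparison-count model of the point being made here (not the kernel code; it simply contrasts a linear queue scan per insert with a balanced-tree lookup):

```python
# Toy model of why an RB tree helps the ofo queue: inserting n segments
# where each insert must scan the existing queue costs O(n) per insert
# (O(n^2) total) on a plain linked list, versus O(log n) on a balanced
# tree. Pure comparison counting, not the kernel algorithm.

import math

def list_scan_comparisons(n):
    # the i-th insert walks past the i segments already queued
    return sum(range(n))                 # 0 + 1 + ... + (n - 1)

def tree_comparisons(n):
    # the i-th insert into a balanced tree costs about log2(i) comparisons
    return sum(max(1, math.ceil(math.log2(i + 1))) for i in range(1, n))

n = 10000
print(list_scan_comparisons(n))          # 49995000
print(list_scan_comparisons(n) // tree_comparisons(n))
```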

>   
>> If not, we can go from there and evaluate this much larger patch
>> series.  But let's try the simple thing first.
> 
> At high packet rates (say 30K pkt/s and more), we can still saturate the
> CPU. This is also mentioned in the announcement with claim that switch
> to rbtree based queue would be necessary to fully address that. My tests
> seem to confirm that but I'm still not sure it is worth backporting
> something as intrusive into stable 4.4.
> 
> Michal Kubecek
> 
Greg KH Sept. 13, 2018, 12:32 p.m. UTC | #16
On Thu, Aug 16, 2018 at 05:24:09PM +0200, Greg KH wrote:
> On Thu, Aug 16, 2018 at 02:33:56PM +0200, Michal Kubecek wrote:
> > On Thu, Aug 16, 2018 at 08:05:50PM +0800, maowenan wrote:
> > > On 2018/8/16 19:39, Michal Kubecek wrote:
> > > > 
> > > > I suspect you may be doing something wrong with your tests. I checked
> > > > the segmentsmack testcase and the CPU utilization on receiving side
> > > > (with sending 10 times as many packets as default) went down from ~100%
> > > > to ~3% even when comparing what is in stable 4.4 now against older 4.4
> > > > kernel.
> > > 
> > > There seems no obvious problem when you send packets with default
> > > parameter in Segmentsmack POC, Which is also very related with your
> > > server's hardware configuration. Please try with below parameter to
> > > form OFO packets
> > 
> > I did and even with these (questionable, see below) changes, I did not
> > get more than 10% (of one core) by receiving ksoftirqd.
> > 
> > >       for (i = 0; i < 1024; i++)      // 128->1024
> > ...
> > >       usleep(10*1000); // Adjust this and packet count to match the target!, sleep 100ms->10ms
> > 
> > The comment in the testcase source suggests to do _one_ of these two
> > changes so that you generate 10 times as many packets as the original
> > testcase. You did both so that you end up sending 102400 packets per
> > second. With 55 byte long packets, this kind of attack requires at least
> > 5.5 MB/s (44 Mb/s) of throughput. This is no longer a "low packet rate
> > DoS", I'm afraid.
> > 
> > Anyway, even at this rate, I only get ~10% of one core (Intel E5-2697).
> > 
> > What I can see, though, is that with current stable 4.4 code, modified
> > testcase which sends something like
> > 
> >   2:3, 3:4, ..., 3001:3002, 3003:3004, 3004:3005, ... 6001:6002, ...
> > 
> > I quickly eat 6 MB of memory for receive queue of one socket while
> > earlier 4.4 kernels only take 200-300 KB. I didn't test latest 4.4 with
> > Takashi's follow-up yet but I'm pretty sure it will help while
> > preserving nice performance when using the original segmentsmack
> > testcase (with increased packet ratio).
> 
> Ok, for now I've applied Takashi's fix to the 4.4 stable queue and will
> push out a new 4.4-rc later tonight.  Can everyone standardize on that
> and test and let me know if it does, or does not, fix the reported
> issues?
> 
> If not, we can go from there and evaluate this much larger patch series.
> But let's try the simple thing first.

So, is the issue still present on the latest 4.4 release?  Has anyone
tested it?  If not, I'm more than willing to look at backported patches,
but I want to ensure that they really are needed here.

thanks,

greg k-h
Eric Dumazet Sept. 13, 2018, 12:44 p.m. UTC | #17
On Thu, Sep 13, 2018 at 5:32 AM Greg KH <gregkh@linux-foundation.org> wrote:
>
> On Thu, Aug 16, 2018 at 05:24:09PM +0200, Greg KH wrote:
> > On Thu, Aug 16, 2018 at 02:33:56PM +0200, Michal Kubecek wrote:
> > > On Thu, Aug 16, 2018 at 08:05:50PM +0800, maowenan wrote:
> > > > On 2018/8/16 19:39, Michal Kubecek wrote:
> > > > >
> > > > > I suspect you may be doing something wrong with your tests. I checked
> > > > > the segmentsmack testcase and the CPU utilization on receiving side
> > > > > (with sending 10 times as many packets as default) went down from ~100%
> > > > > to ~3% even when comparing what is in stable 4.4 now against older 4.4
> > > > > kernel.
> > > >
> > > > There seems no obvious problem when you send packets with default
> > > > parameter in Segmentsmack POC, Which is also very related with your
> > > > server's hardware configuration. Please try with below parameter to
> > > > form OFO packets
> > >
> > > I did and even with these (questionable, see below) changes, I did not
> > > get more than 10% (of one core) by receiving ksoftirqd.
> > >
> > > >       for (i = 0; i < 1024; i++)      // 128->1024
> > > ...
> > > >       usleep(10*1000); // Adjust this and packet count to match the target!, sleep 100ms->10ms
> > >
> > > The comment in the testcase source suggests to do _one_ of these two
> > > changes so that you generate 10 times as many packets as the original
> > > testcase. You did both so that you end up sending 102400 packets per
> > > second. With 55 byte long packets, this kind of attack requires at least
> > > 5.5 MB/s (44 Mb/s) of throughput. This is no longer a "low packet rate
> > > DoS", I'm afraid.
> > >
> > > Anyway, even at this rate, I only get ~10% of one core (Intel E5-2697).
> > >
> > > What I can see, though, is that with current stable 4.4 code, modified
> > > testcase which sends something like
> > >
> > >   2:3, 3:4, ..., 3001:3002, 3003:3004, 3004:3005, ... 6001:6002, ...
> > >
> > > I quickly eat 6 MB of memory for receive queue of one socket while
> > > earlier 4.4 kernels only take 200-300 KB. I didn't test latest 4.4 with
> > > Takashi's follow-up yet but I'm pretty sure it will help while
> > > preserving nice performance when using the original segmentsmack
> > > testcase (with increased packet ratio).
> >
> > Ok, for now I've applied Takashi's fix to the 4.4 stable queue and will
> > push out a new 4.4-rc later tonight.  Can everyone standardize on that
> > and test and let me know if it does, or does not, fix the reported
> > issues?
> >
> > If not, we can go from there and evaluate this much larger patch series.
> > But let's try the simple thing first.
>
> So, is the issue still present on the latest 4.4 release?  Has anyone
> tested it?  If not, I'm more than willing to look at backported patches,
> but I want to ensure that they really are needed here.
>
> thanks,

Honestly, a TCP stack without an rb-tree for the OOO queue is vulnerable
even with a non-malicious sender, given a big enough TCP receive window
and an unfavorable network.

So a malicious peer can definitely send the packets needed to make the
TCP stack behave in O(N), which is pretty bad if N is big...

9f5afeae51526b3ad7b7cb21ee8b145ce6ea7a7a ("tcp: use an RB tree for ooo
receive queue")
was proven to be almost bug free [1], and should be backported if possible.

[1] bug fixed :
76f0dcbb5ae1a7c3dbeec13dd98233b8e6b0b32a tcp: fix a stale ooo_last_skb
after a replace
maowenan Sept. 14, 2018, 2:24 a.m. UTC | #18
On 2018/9/13 20:44, Eric Dumazet wrote:
> On Thu, Sep 13, 2018 at 5:32 AM Greg KH <gregkh@linux-foundation.org> wrote:
>>
>> On Thu, Aug 16, 2018 at 05:24:09PM +0200, Greg KH wrote:
>>> On Thu, Aug 16, 2018 at 02:33:56PM +0200, Michal Kubecek wrote:
>>>> On Thu, Aug 16, 2018 at 08:05:50PM +0800, maowenan wrote:
>>>>> On 2018/8/16 19:39, Michal Kubecek wrote:
>>>>>>
>>>>>> I suspect you may be doing something wrong with your tests. I checked
>>>>>> the segmentsmack testcase and the CPU utilization on receiving side
>>>>>> (with sending 10 times as many packets as default) went down from ~100%
>>>>>> to ~3% even when comparing what is in stable 4.4 now against older 4.4
>>>>>> kernel.
>>>>>
>>>>> There seems no obvious problem when you send packets with default
>>>>> parameter in Segmentsmack POC, Which is also very related with your
>>>>> server's hardware configuration. Please try with below parameter to
>>>>> form OFO packets
>>>>
>>>> I did and even with these (questionable, see below) changes, I did not
>>>> get more than 10% (of one core) by receiving ksoftirqd.
>>>>
>>>>>       for (i = 0; i < 1024; i++)      // 128->1024
>>>> ...
>>>>>       usleep(10*1000); // Adjust this and packet count to match the target!, sleep 100ms->10ms
>>>>
>>>> The comment in the testcase source suggests to do _one_ of these two
>>>> changes so that you generate 10 times as many packets as the original
>>>> testcase. You did both so that you end up sending 102400 packets per
>>>> second. With 55 byte long packets, this kind of attack requires at least
>>>> 5.5 MB/s (44 Mb/s) of throughput. This is no longer a "low packet rate
>>>> DoS", I'm afraid.
>>>>
>>>> Anyway, even at this rate, I only get ~10% of one core (Intel E5-2697).
>>>>
>>>> What I can see, though, is that with current stable 4.4 code, modified
>>>> testcase which sends something like
>>>>
>>>>   2:3, 3:4, ..., 3001:3002, 3003:3004, 3004:3005, ... 6001:6002, ...
>>>>
>>>> I quickly eat 6 MB of memory for receive queue of one socket while
>>>> earlier 4.4 kernels only take 200-300 KB. I didn't test latest 4.4 with
>>>> Takashi's follow-up yet but I'm pretty sure it will help while
>>>> preserving nice performance when using the original segmentsmack
>>>> testcase (with increased packet ratio).
>>>
>>> Ok, for now I've applied Takashi's fix to the 4.4 stable queue and will
>>> push out a new 4.4-rc later tonight.  Can everyone standardize on that
>>> and test and let me know if it does, or does not, fix the reported
>>> issues?
>>>
>>> If not, we can go from there and evaluate this much larger patch series.
>>> But let's try the simple thing first.
>>
>> So, is the issue still present on the latest 4.4 release?  Has anyone
>> tested it?  If not, I'm more than willing to look at backported patches,
>> but I want to ensure that they really are needed here.
>>
>> thanks,
> 
> Honestly, TCP stack without rb-tree for the OOO queue is vulnerable,
> even with non malicious sender,
> but with big enough TCP receive window and a not favorable network.
> 
> So a malicious peer can definitely send packets needed to make TCP
> stack behave in O(N), which is pretty bad if N is big...
> 
> 9f5afeae51526b3ad7b7cb21ee8b145ce6ea7a7a ("tcp: use an RB tree for ooo
> receive queue")
> was proven to be almost bug free [1], and should be backported if possible.
> 
> [1] bug fixed :
> 76f0dcbb5ae1a7c3dbeec13dd98233b8e6b0b32a tcp: fix a stale ooo_last_skb
> after a replace

Thank you for the suggestion, Eric; I will work on backporting them.