diff mbox

[RFC] net: decrease the length of backlog queue immediately after it's detached from sk

Message ID 1459315001-3448-1-git-send-email-yangyingliang@huawei.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Yang Yingliang March 30, 2016, 5:16 a.m. UTC
When task A holds the socket lock in tcp_sendmsg(), incoming packets
are added to the backlog queue. These packets are handled in
release_sock(), called from tcp_sendmsg(). When the sk_backlog queue
is detached from the sk, its length does not decrease until all the
packets in the queue have been handled. This may lead to new packets
being dropped because the length is too big. So set the length to 0
immediately after the queue is detached from the sk.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 net/core/sock.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Eric Dumazet March 30, 2016, 5:25 a.m. UTC | #1
On Wed, 2016-03-30 at 13:16 +0800, Yang Yingliang wrote:
> When task A hold the sk owned in tcp_sendmsg, if lots of packets
> arrive and the packets will be added to backlog queue. The packets
> will be handled in release_sock called from tcp_sendmsg. When the
> sk_backlog is removed from sk, the length will not decrease until
> all the packets in backlog queue are handled. This may leads to the
> new packets be dropped because the lenth is too big. So set the
> lenth to 0 immediately after it's detached from sk.
> 
> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
> ---
>  net/core/sock.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 47fc8bb..108be05 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1933,6 +1933,7 @@ static void __release_sock(struct sock *sk)
>  
>  	do {
>  		sk->sk_backlog.head = sk->sk_backlog.tail = NULL;
> +		sk->sk_backlog.len = 0;
>  		bh_unlock_sock(sk);
>  
>  		do {

Certainly not.

Have you really missed the comment ?

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8eae939f1400326b06d0c9afe53d2a484a326871


I do not believe the case you describe can happen, unless a misbehaving
driver cooks fat skbs (with skb->truesize being far bigger than
skb->len).
Eric Dumazet March 30, 2016, 5:34 a.m. UTC | #2
On Tue, 2016-03-29 at 22:25 -0700, Eric Dumazet wrote:
> On Wed, 2016-03-30 at 13:16 +0800, Yang Yingliang wrote:
> > When task A hold the sk owned in tcp_sendmsg, if lots of packets
> > arrive and the packets will be added to backlog queue. The packets
> > will be handled in release_sock called from tcp_sendmsg. When the
> > sk_backlog is removed from sk, the length will not decrease until
> > all the packets in backlog queue are handled. This may leads to the
> > new packets be dropped because the lenth is too big. So set the
> > lenth to 0 immediately after it's detached from sk.
> > 
> > Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
> > ---
> >  net/core/sock.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/net/core/sock.c b/net/core/sock.c
> > index 47fc8bb..108be05 100644
> > --- a/net/core/sock.c
> > +++ b/net/core/sock.c
> > @@ -1933,6 +1933,7 @@ static void __release_sock(struct sock *sk)
> >  
> >  	do {
> >  		sk->sk_backlog.head = sk->sk_backlog.tail = NULL;
> > +		sk->sk_backlog.len = 0;
> >  		bh_unlock_sock(sk);
> >  
> >  		do {
> 
> Certainly not.
> 
> Have you really missed the comment ?
> 
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8eae939f1400326b06d0c9afe53d2a484a326871
> 
> 
> I do not believe the case you describe can happen, unless a misbehaving
> driver cooks fat skb (with skb->truesize being far more bigger than
> skb->len)
> 

And also make sure you backported
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=da882c1f2ecadb0ed582628ec1585e36b137c0f0
Yang Yingliang March 30, 2016, 5:38 a.m. UTC | #3
On 2016/3/30 13:25, Eric Dumazet wrote:
> On Wed, 2016-03-30 at 13:16 +0800, Yang Yingliang wrote:
>> When task A hold the sk owned in tcp_sendmsg, if lots of packets
>> arrive and the packets will be added to backlog queue. The packets
>> will be handled in release_sock called from tcp_sendmsg. When the
>> sk_backlog is removed from sk, the length will not decrease until
>> all the packets in backlog queue are handled. This may leads to the
>> new packets be dropped because the lenth is too big. So set the
>> lenth to 0 immediately after it's detached from sk.
>>
>> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
>> ---
>>   net/core/sock.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/net/core/sock.c b/net/core/sock.c
>> index 47fc8bb..108be05 100644
>> --- a/net/core/sock.c
>> +++ b/net/core/sock.c
>> @@ -1933,6 +1933,7 @@ static void __release_sock(struct sock *sk)
>>
>>   	do {
>>   		sk->sk_backlog.head = sk->sk_backlog.tail = NULL;
>> +		sk->sk_backlog.len = 0;
>>   		bh_unlock_sock(sk);
>>
>>   		do {
>
> Certainly not.
>
> Have you really missed the comment ?
>
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8eae939f1400326b06d0c9afe53d2a484a326871

My kernel is 4.1 LTS; it seems it doesn't have this patch. I will try
the patch later.

Thanks
Yang
>
>
> I do not believe the case you describe can happen, unless a misbehaving
> driver cooks fat skb (with skb->truesize being far more bigger than
> skb->len)
>
Yang Yingliang March 30, 2016, 5:56 a.m. UTC | #4
On 2016/3/30 13:34, Eric Dumazet wrote:
> On Tue, 2016-03-29 at 22:25 -0700, Eric Dumazet wrote:
>> On Wed, 2016-03-30 at 13:16 +0800, Yang Yingliang wrote:
>>> When task A hold the sk owned in tcp_sendmsg, if lots of packets
>>> arrive and the packets will be added to backlog queue. The packets
>>> will be handled in release_sock called from tcp_sendmsg. When the
>>> sk_backlog is removed from sk, the length will not decrease until
>>> all the packets in backlog queue are handled. This may leads to the
>>> new packets be dropped because the lenth is too big. So set the
>>> lenth to 0 immediately after it's detached from sk.
>>>
>>> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
>>> ---
>>>   net/core/sock.c | 1 +
>>>   1 file changed, 1 insertion(+)
>>>
>>> diff --git a/net/core/sock.c b/net/core/sock.c
>>> index 47fc8bb..108be05 100644
>>> --- a/net/core/sock.c
>>> +++ b/net/core/sock.c
>>> @@ -1933,6 +1933,7 @@ static void __release_sock(struct sock *sk)
>>>
>>>   	do {
>>>   		sk->sk_backlog.head = sk->sk_backlog.tail = NULL;
>>> +		sk->sk_backlog.len = 0;
>>>   		bh_unlock_sock(sk);
>>>
>>>   		do {
>>
>> Certainly not.
>>
>> Have you really missed the comment ?
>>
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8eae939f1400326b06d0c9afe53d2a484a326871
>>
>>
>> I do not believe the case you describe can happen, unless a misbehaving
>> driver cooks fat skb (with skb->truesize being far more bigger than
>> skb->len)
>>
>
> And also make sure you backported
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=da882c1f2ecadb0ed582628ec1585e36b137c0f0

Sorry, I made a mistake. I am quite sure my kernel has these two patches.
And I can still see some packets being dropped on a 10Gb Ethernet interface.

# netstat -s | grep -i backlog
     TCPBacklogDrop: 4135
# netstat -s | grep -i backlog
     TCPBacklogDrop: 4167


Sergei Shtylyov March 30, 2016, 12:56 p.m. UTC | #5
Hello.

On 3/30/2016 8:16 AM, Yang Yingliang wrote:

> When task A hold the sk owned in tcp_sendmsg, if lots of packets
> arrive and the packets will be added to backlog queue. The packets
> will be handled in release_sock called from tcp_sendmsg. When the
> sk_backlog is removed from sk, the length will not decrease until
> all the packets in backlog queue are handled. This may leads to the
> new packets be dropped because the lenth is too big. So set the
> lenth to 0 immediately after it's detached from sk.

    Length?

> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
[...]

MBR, Sergei
Eric Dumazet March 30, 2016, 1:47 p.m. UTC | #6
On Wed, 2016-03-30 at 13:56 +0800, Yang Yingliang wrote:

> Sorry, I made a mistake. I am very sure my kernel has these two patches.
> And I can get some dropping of the packets in 10Gb eth.
> 
> # netstat -s | grep -i backlog
>      TCPBacklogDrop: 4135
> # netstat -s | grep -i backlog
>      TCPBacklogDrop: 4167

Sender will retransmit and the receiver backlog will likely be emptied
before the packets arrive again.

Are you sure these are TCP drops ?

Which 10Gb NIC is it ? (ethtool -i eth0)

What is the max size of the sendmsg() chunks generated by your apps ?

Are they forcing small SO_RCVBUF or SO_SNDBUF ?

What percentage of drops do you have ?

Here (at Google), we have less than one backlog drop per billion
packets, on hosts facing the public Internet.

If a TCP sender sends a burst of tiny packets because it is misbehaving,
you absolutely will drop packets, especially if applications use
sendmsg() with very big lengths and big SO_SNDBUF.

Trying to not drop these hostile packets as you did is simply opening
your host to DOS attacks.

Eventually, we should even drop earlier in TCP stack (before taking
socket lock).
Yang Yingliang April 7, 2016, 6:01 a.m. UTC | #7
On 2016/3/30 20:56, Sergei Shtylyov wrote:
> Hello.
>
> On 3/30/2016 8:16 AM, Yang Yingliang wrote:
>
>> When task A hold the sk owned in tcp_sendmsg, if lots of packets
>> arrive and the packets will be added to backlog queue. The packets
>> will be handled in release_sock called from tcp_sendmsg. When the
>> sk_backlog is removed from sk, the length will not decrease until
>> all the packets in backlog queue are handled. This may leads to the
>> new packets be dropped because the lenth is too big. So set the
>> lenth to 0 immediately after it's detached from sk.
>
>     Length?
>
>> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
> [...]
>
> MBR, Sergei
Yes. It's a typo.

Thanks
Yang

Patch

diff --git a/net/core/sock.c b/net/core/sock.c
index 47fc8bb..108be05 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1933,6 +1933,7 @@ static void __release_sock(struct sock *sk)
 
 	do {
 		sk->sk_backlog.head = sk->sk_backlog.tail = NULL;
+		sk->sk_backlog.len = 0;
 		bh_unlock_sock(sk);
 
 		do {