diff mbox

[V2,2/7] tcp: use limited socket backlog

Message ID 1267605389-7369-2-git-send-email-yi.zhu@intel.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Zhu Yi March 3, 2010, 8:36 a.m. UTC
Make tcp adapt to the limited socket backlog change.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
---
 net/ipv4/tcp_ipv4.c |    6 ++++--
 net/ipv6/tcp_ipv6.c |    6 ++++--
 2 files changed, 8 insertions(+), 4 deletions(-)

Comments

Eric Dumazet March 3, 2010, 8:53 a.m. UTC | #1
Le mercredi 03 mars 2010 à 16:36 +0800, Zhu Yi a écrit :
> Make tcp adapt to the limited socket backlog change.
> 
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
> Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi>
> Cc: Patrick McHardy <kaber@trash.net>
> Signed-off-by: Zhu Yi <yi.zhu@intel.com>
> ---
>  net/ipv4/tcp_ipv4.c |    6 ++++--
>  net/ipv6/tcp_ipv6.c |    6 ++++--
>  2 files changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index c3588b4..4baf194 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1682,8 +1682,10 @@ process:
>  			if (!tcp_prequeue(sk, skb))
>  				ret = tcp_v4_do_rcv(sk, skb);
>  		}
> -	} else
> -		sk_add_backlog(sk, skb);
> +	} else if (sk_add_backlog_limited(sk, skb)) {
> +		bh_unlock_sock(sk);
> +		goto discard_and_relse;
> +	}
>  	bh_unlock_sock(sk);
>  
>  	sock_put(sk);

So no counter is incremented to reflect this loss, sk->sk_drops (local
counter) or SNMP ?


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zhu Yi March 3, 2010, 9:06 a.m. UTC | #2
On Wed, 2010-03-03 at 16:53 +0800, Eric Dumazet wrote:
> > @@ -1682,8 +1682,10 @@ process:
> >                       if (!tcp_prequeue(sk, skb))
> >                               ret = tcp_v4_do_rcv(sk, skb);
> >               }
> > -     } else
> > -             sk_add_backlog(sk, skb);
> > +     } else if (sk_add_backlog_limited(sk, skb)) {
> > +             bh_unlock_sock(sk);
> > +             goto discard_and_relse;
> > +     }
> >       bh_unlock_sock(sk);
> >  
> >       sock_put(sk);
> 
> So no counter is incremented to reflect this loss, sk->sk_drops (local
> counter) or SNMP ? 

I simply follow how the code is originally written. As you can see,
tcp_v4_do_rcv() doesn't always do so. And in the backlog queuing place,
we don't even bother to check.

Thanks,
-yi

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet March 3, 2010, 10:07 a.m. UTC | #3
Le mercredi 03 mars 2010 à 17:06 +0800, Zhu Yi a écrit :

> I simply follow how the code is originally written. As you can see,
> tcp_v4_do_rcv() doesn't always do so. And in the backlog queuing place,
> we don't even bother to check.

You add a new point where a packet can be dropped, this should be
accounted for, so that admins can have a clue whats going on.

Previously, packet was always queued, and dropped later (and accounted)

Not everybody runs drop monitor :)


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zhu Yi March 3, 2010, 2:12 p.m. UTC | #4
Eric Dumazet [mailto:eric.dumazet@gmail.com] wrote:
> Le mercredi 03 mars 2010 à 17:06 +0800, Zhu Yi a écrit :


>> I simply follow how the code is originally written. As you can see,

>> tcp_v4_do_rcv() doesn't always do so. And in the backlog queuing place,

>> we don't even bother to check.


> You add a new point where a packet can be dropped, this should be

> accounted for, so that admins can have a clue whats going on.


> Previously, packet was always queued, and dropped later (and accounted)


In case of the skb doesn't have a MD5 option while we are expecting one, or we
failed to find the sk for the skb connection request, etc, the skb is dropped silently in
tcp_v4_do_rcv(). No?

> Not everybody runs drop monitor :)


Thanks,
-yi
Eric Dumazet March 3, 2010, 2:31 p.m. UTC | #5
Le mercredi 03 mars 2010 à 22:12 +0800, Zhu, Yi a écrit :
> Eric Dumazet [mailto:eric.dumazet@gmail.com] wrote:
> > Le mercredi 03 mars 2010 à 17:06 +0800, Zhu Yi a écrit :
> 
> >> I simply follow how the code is originally written. As you can see,
> >> tcp_v4_do_rcv() doesn't always do so. And in the backlog queuing place,
> >> we don't even bother to check.
> 
> > You add a new point where a packet can be dropped, this should be
> > accounted for, so that admins can have a clue whats going on.
> 
> > Previously, packet was always queued, and dropped later (and accounted)
> 
> In case of the skb doesn't have a MD5 option while we are expecting one, or we
> failed to find the sk for the skb connection request, etc, the skb is dropped silently in
> tcp_v4_do_rcv(). No?

Then its a separate bug. MD5 support added so many bugs its not even
funny.

Existing bugs are not an excuse for adding new ones, we try the reverse.
No ?



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zhu Yi March 4, 2010, 5:21 a.m. UTC | #6
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Then its a separate bug. MD5 support added so many bugs its not even

> funny.


> Existing bugs are not an excuse for adding new ones, we try the reverse.

> No ?


Can you show me where sk_drops is used by TCP and what SNMP MIB value
should I use for backlog dropping? TCP_MIB_INERRS doesn't seem correct.

Thanks,
-yi
Eric Dumazet March 4, 2010, 6 a.m. UTC | #7
Le jeudi 04 mars 2010 à 13:21 +0800, Zhu, Yi a écrit :
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > Then its a separate bug. MD5 support added so many bugs its not even
> > funny.
> 
> > Existing bugs are not an excuse for adding new ones, we try the reverse.
> > No ?
> 
> Can you show me where sk_drops is used by TCP and what SNMP MIB value
> should I use for backlog dropping? TCP_MIB_INERRS doesn't seem correct.

sk_drop is not yet used by TCP, because when backlog processing is
performed, TCP state machine has much finer grain capabilities to show
why a packet is dropped. In our backlog drop, we dont examine details of
packet and drop it at a lower level.

Please add a new counter, say LINUX_MIB_TCPBACKLOGDROP :

3 added lines:
- one in "include/linux/snmp.h" to define the MIB name
- one to define its string
- one to perform the increment when actual drop occurs)

A good starting point would be to study recent commit
72032fdbcde8b333e65b3430e1bcb4358e2d6716 from Jamal
(xfrm: Introduce LINUX_MIB_XFRMFWDHDRERROR)

For sk_drop, just increment it and we can get it at generic sock level.

Thanks !


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zhu Yi March 4, 2010, 11:04 a.m. UTC | #8
Eric Dumazet <eric.dumazet@gmail.com> wrote:

>> Can you show me where sk_drops is used by TCP and what SNMP MIB value

>> should I use for backlog dropping? TCP_MIB_INERRS doesn't seem correct.


> sk_drop is not yet used by TCP, because when backlog processing is

> performed, TCP state machine has much finer grain capabilities to show

> why a packet is dropped. In our backlog drop, we dont examine details of

> packet and drop it at a lower level.


> Please add a new counter, say LINUX_MIB_TCPBACKLOGDROP :


> 3 added lines:

> - one in "include/linux/snmp.h" to define the MIB name

> - one to define its string

> - one to perform the increment when actual drop occurs)


> A good starting point would be to study recent commit

> 72032fdbcde8b333e65b3430e1bcb4358e2d6716 from Jamal

> (xfrm: Introduce LINUX_MIB_XFRMFWDHDRERROR)


> For sk_drop, just increment it and we can get it at generic sock level.


Since neither sk_drops nor the new MIB value are used by TCP currently.
How about I keep the tcp backlog limit patch as is and you implement
the above in another patch?

Thanks,
-yi
Eric Dumazet March 4, 2010, 2:56 p.m. UTC | #9
Le jeudi 04 mars 2010 à 19:04 +0800, Zhu, Yi a écrit :

> 
> Since neither sk_drops nor the new MIB value are used by TCP currently.
> How about I keep the tcp backlog limit patch as is and you implement
> the above in another patch?

Sure, I will do that


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index c3588b4..4baf194 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1682,8 +1682,10 @@  process:
 			if (!tcp_prequeue(sk, skb))
 				ret = tcp_v4_do_rcv(sk, skb);
 		}
-	} else
-		sk_add_backlog(sk, skb);
+	} else if (sk_add_backlog_limited(sk, skb)) {
+		bh_unlock_sock(sk);
+		goto discard_and_relse;
+	}
 	bh_unlock_sock(sk);
 
 	sock_put(sk);
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 6963a6b..c4ea9d5 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1740,8 +1740,10 @@  process:
 			if (!tcp_prequeue(sk, skb))
 				ret = tcp_v6_do_rcv(sk, skb);
 		}
-	} else
-		sk_add_backlog(sk, skb);
+	} else if (sk_add_backlog_limited(sk, skb)) {
+		bh_unlock_sock(sk);
+		goto discard_and_relse;
+	}
 	bh_unlock_sock(sk);
 
 	sock_put(sk);