Message ID | 1267605389-7369-2-git-send-email-yi.zhu@intel.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
Le mercredi 03 mars 2010 à 16:36 +0800, Zhu Yi a écrit : > Make tcp adapt to the limited socket backlog change. > > Cc: "David S. Miller" <davem@davemloft.net> > Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> > Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi> > Cc: Patrick McHardy <kaber@trash.net> > Signed-off-by: Zhu Yi <yi.zhu@intel.com> > --- > net/ipv4/tcp_ipv4.c | 6 ++++-- > net/ipv6/tcp_ipv6.c | 6 ++++-- > 2 files changed, 8 insertions(+), 4 deletions(-) > > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c > index c3588b4..4baf194 100644 > --- a/net/ipv4/tcp_ipv4.c > +++ b/net/ipv4/tcp_ipv4.c > @@ -1682,8 +1682,10 @@ process: > if (!tcp_prequeue(sk, skb)) > ret = tcp_v4_do_rcv(sk, skb); > } > - } else > - sk_add_backlog(sk, skb); > + } else if (sk_add_backlog_limited(sk, skb)) { > + bh_unlock_sock(sk); > + goto discard_and_relse; > + } > bh_unlock_sock(sk); > > sock_put(sk); So no counter is incremented to reflect this loss, sk->sk_drops (local counter) or SNMP ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2010-03-03 at 16:53 +0800, Eric Dumazet wrote: > > @@ -1682,8 +1682,10 @@ process: > > if (!tcp_prequeue(sk, skb)) > > ret = tcp_v4_do_rcv(sk, skb); > > } > > - } else > > - sk_add_backlog(sk, skb); > > + } else if (sk_add_backlog_limited(sk, skb)) { > > + bh_unlock_sock(sk); > > + goto discard_and_relse; > > + } > > bh_unlock_sock(sk); > > > > sock_put(sk); > > So no counter is incremented to reflect this loss, sk->sk_drops (local > counter) or SNMP ? I simply follow how the code is originally written. As you can see, tcp_v4_do_rcv() doesn't always do so. And in the backlog queuing place, we don't even bother to check. Thanks, -yi -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le mercredi 03 mars 2010 à 17:06 +0800, Zhu Yi a écrit : > I simply follow how the code is originally written. As you can see, > tcp_v4_do_rcv() doesn't always do so. And in the backlog queuing place, > we don't even bother to check. You add a new point where a packet can be dropped, this should be accounted for, so that admins can have a clue whats going on. Previously, packet was always queued, and dropped later (and accounted) Not everybody runs drop monitor :) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric Dumazet [mailto:eric.dumazet@gmail.com] wrote: > Le mercredi 03 mars 2010 à 17:06 +0800, Zhu Yi a écrit : >> I simply follow how the code is originally written. As you can see, >> tcp_v4_do_rcv() doesn't always do so. And in the backlog queuing place, >> we don't even bother to check. > You add a new point where a packet can be dropped, this should be > accounted for, so that admins can have a clue whats going on. > Previously, packet was always queued, and dropped later (and accounted) In case of the skb doesn't have a MD5 option while we are expecting one, or we failed to find the sk for the skb connection request, etc, the skb is dropped silently in tcp_v4_do_rcv(). No? > Not everybody runs drop monitor :) Thanks, -yi
Le mercredi 03 mars 2010 à 22:12 +0800, Zhu, Yi a écrit : > Eric Dumazet [mailto:eric.dumazet@gmail.com] wrote: > > Le mercredi 03 mars 2010 à 17:06 +0800, Zhu Yi a écrit : > > >> I simply follow how the code is originally written. As you can see, > >> tcp_v4_do_rcv() doesn't always do so. And in the backlog queuing place, > >> we don't even bother to check. > > > You add a new point where a packet can be dropped, this should be > > accounted for, so that admins can have a clue whats going on. > > > Previously, packet was always queued, and dropped later (and accounted) > > In case of the skb doesn't have a MD5 option while we are expecting one, or we > failed to find the sk for the skb connection request, etc, the skb is dropped silently in > tcp_v4_do_rcv(). No? Then its a separate bug. MD5 support added so many bugs its not even funny. Existing bugs are not an excuse for adding new ones, we try the reverse. No ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric Dumazet <eric.dumazet@gmail.com> wrote: > Then its a separate bug. MD5 support added so many bugs its not even > funny. > Existing bugs are not an excuse for adding new ones, we try the reverse. > No ? Can you show me where sk_drops is used by TCP and what SNMP MIB value should I use for backlog dropping? TCP_MIB_INERRS doesn't seem correct. Thanks, -yi
Le jeudi 04 mars 2010 à 13:21 +0800, Zhu, Yi a écrit : > Eric Dumazet <eric.dumazet@gmail.com> wrote: > > > Then its a separate bug. MD5 support added so many bugs its not even > > funny. > > > Existing bugs are not an excuse for adding new ones, we try the reverse. > > No ? > > Can you show me where sk_drops is used by TCP and what SNMP MIB value > should I use for backlog dropping? TCP_MIB_INERRS doesn't seem correct. sk_drop is not yet used by TCP, because when backlog processing is performed, TCP state machine has much finer grain capabilities to show why a packet is dropped. In our backlog drop, we dont examine details of packet and drop it at a lower level. Please add a new counter, say LINUX_MIB_TCPBACKLOGDROP : 3 added lines: - one in "include/linux/snmp.h" to define the MIB name - one to define its string - one to perform the increment when actual drop occurs) A good starting point would be to study recent commit 72032fdbcde8b333e65b3430e1bcb4358e2d6716 from Jamal (xfrm: Introduce LINUX_MIB_XFRMFWDHDRERROR) For sk_drop, just increment it and we can get it at generic sock level. Thanks ! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric Dumazet <eric.dumazet@gmail.com> wrote: >> Can you show me where sk_drops is used by TCP and what SNMP MIB value >> should I use for backlog dropping? TCP_MIB_INERRS doesn't seem correct. > sk_drop is not yet used by TCP, because when backlog processing is > performed, TCP state machine has much finer grain capabilities to show > why a packet is dropped. In our backlog drop, we dont examine details of > packet and drop it at a lower level. > Please add a new counter, say LINUX_MIB_TCPBACKLOGDROP : > 3 added lines: > - one in "include/linux/snmp.h" to define the MIB name > - one to define its string > - one to perform the increment when actual drop occurs) > A good starting point would be to study recent commit > 72032fdbcde8b333e65b3430e1bcb4358e2d6716 from Jamal > (xfrm: Introduce LINUX_MIB_XFRMFWDHDRERROR) > For sk_drop, just increment it and we can get it at generic sock level. Since neither sk_drops nor the new MIB value are used by TCP currently. How about I keep the tcp backlog limit patch as is and you implement the above in another patch? Thanks, -yi
Le jeudi 04 mars 2010 à 19:04 +0800, Zhu, Yi a écrit : > > Since neither sk_drops nor the new MIB value are used by TCP currently. > How about I keep the tcp backlog limit patch as is and you implement > the above in another patch? Sure, I will do that -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index c3588b4..4baf194 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1682,8 +1682,10 @@ process: if (!tcp_prequeue(sk, skb)) ret = tcp_v4_do_rcv(sk, skb); } - } else - sk_add_backlog(sk, skb); + } else if (sk_add_backlog_limited(sk, skb)) { + bh_unlock_sock(sk); + goto discard_and_relse; + } bh_unlock_sock(sk); sock_put(sk); diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 6963a6b..c4ea9d5 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -1740,8 +1740,10 @@ process: if (!tcp_prequeue(sk, skb)) ret = tcp_v6_do_rcv(sk, skb); } - } else - sk_add_backlog(sk, skb); + } else if (sk_add_backlog_limited(sk, skb)) { + bh_unlock_sock(sk); + goto discard_and_relse; + } bh_unlock_sock(sk); sock_put(sk);
Make tcp adapt to the limited socket backlog change. Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Zhu Yi <yi.zhu@intel.com> --- net/ipv4/tcp_ipv4.c | 6 ++++-- net/ipv6/tcp_ipv6.c | 6 ++++-- 2 files changed, 8 insertions(+), 4 deletions(-)