Message ID | 4A930DEF.5000008@gmail.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On Tue, 2009-08-25 at 00:02 +0200, Eric Dumazet wrote: > Christoph Lameter a écrit : > > On Mon, 17 Aug 2009, Sridhar Samudrala wrote: > > > >> So it is possible that there is some other place in the stack where the packets > >> are gettting dropped but not counted. > > > > Such a deed occurs in ip_push_pending_frames(): > > > > /* Netfilter gets whole the not fragmented skb. */ > > err = ip_local_out(skb); > > if (err) { > > if (err > 0) > > err = inet->recverr ? net_xmit_errno(err) : 0; > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > if (err) > > goto error; > > } > > > > out: > > ip_cork_release(inet); > > return err; > > > > error: > > IP_INC_STATS(net, IPSTATS_MIB_OUTDISCARDS); > > goto out; > > > > > > So if ip_local_out returns NET_XMIT_DROP then its simply going to be > > replaced by 0. Then we check err again and there is no error!!!! Christoph, So are you hitting this case with your workload and does this account for all the packet losses you are seeing? If we are dropping the packet and returing NET_XMIT_DROP, should we also increment qdisc drop stats (sch->qstats.drops)? In dev_queue_xmit(), we have if (q->enqueue) { spinlock_t *root_lock = qdisc_lock(q); spin_lock(root_lock); if (unlikely(test_bit(__QDISC_STATE_DEACTIVATED, &q->state))) { kfree_skb(skb); rc = NET_XMIT_DROP; } else { rc = qdisc_enqueue_root(skb, q); qdisc_run(q); } spin_unlock(root_lock); goto out; } Here, if QDISC_STATE_DEACTIVATED is true, the skb is dropped and NET_XMIT_DROP is returned, but not accounted in qdisc drop stats. However it is incremented when NET_XMIT_DROP is returned via qdisc_drop(). If we count these drops as qdisc drops, should we also count them as IP OUTDISCARDS? Thanks Sridhar > > > NET_XMIT_CN strikes again :) > > Well, if ip_local_out() returns a negative error (say -EPERM for example), > your patch disables OUTDISCARDS increments. > > Maybe a simpler patch like this one ? > > [PATCH] net: correctly updates OUTDISCARDS in ip_push_pending_frames() > > ip_push_pending_frames() can fail to send a frame because of a congestioned > device. In this case, we increment SNMP OUTDISCARDS only if user set > IP_RECVERR, which is not RFC conformant. > > Only case where we should not update OUTDISCARDS is when > ip_local_output() return value is NET_XMIT_CN (meaning > skb was xmitted but future frames might be dropped) > > Signed-off-by: Christoph Lameter <cl@linux-foundation.org> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > --- > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c > index 7d08210..27a5b79 100644 > --- a/net/ipv4/ip_output.c > +++ b/net/ipv4/ip_output.c > @@ -1301,19 +1301,15 @@ int ip_push_pending_frames(struct sock *sk) > /* Netfilter gets whole the not fragmented skb. */ > err = ip_local_out(skb); > if (err) { > + if (err != NET_XMIT_CN) > + IP_INC_STATS(net, IPSTATS_MIB_OUTDISCARDS); > if (err > 0) > err = inet->recverr ? net_xmit_errno(err) : 0; > - if (err) > - goto error; > } > > out: > ip_cork_release(inet); > return err; > - > -error: > - IP_INC_STATS(net, IPSTATS_MIB_OUTDISCARDS); > - goto out; > } > > /* -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 25 Aug 2009, Eric Dumazet wrote: > NET_XMIT_CN strikes again :) > > Well, if ip_local_out() returns a negative error (say -EPERM for example), > your patch disables OUTDISCARDS increments. > > Maybe a simpler patch like this one ? Yes looks good. Can we get this in soon? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Christoph Lameter a écrit : > On Tue, 25 Aug 2009, Eric Dumazet wrote: > >> NET_XMIT_CN strikes again :) >> >> Well, if ip_local_out() returns a negative error (say -EPERM for example), >> your patch disables OUTDISCARDS increments. >> >> Maybe a simpler patch like this one ? > > Yes looks good. Can we get this in soon? > Please hold on, I would like to fully understand what's happening, and test the patch :) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 24 Aug 2009, Sridhar Samudrala wrote: > So are you hitting this case with your workload and does this account for all the > packet losses you are seeing? Yes. > If we are dropping the packet and returing NET_XMIT_DROP, should > we also increment qdisc drop stats (sch->qstats.drops)? I think so but I am no expert. I was surprised to not even see counter increments at that level. But I was content to fix up the higher level tracking to have at least one counter that showed the packet loss. > If we count these drops as qdisc drops, should we also count them as IP OUTDISCARDS? Yes. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 25 Aug 2009, Eric Dumazet wrote: > Please hold on, I would like to fully understand what's happening, > and test the patch :) Ok. It would be good if the drops would also be somehow noted by the UDP subsystem (one should see something with netstat -su) and may be even the socket. I see a drops column in /proc/net/udp. rx_drops, tx_drops? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Christoph Lameter a écrit : > On Tue, 25 Aug 2009, Eric Dumazet wrote: > >> Please hold on, I would like to fully understand what's happening, >> and test the patch :) > > Ok. It would be good if the drops would also be somehow noted by the UDP > subsystem (one should see something with netstat -su) and may be even the > socket. I see a drops column in /proc/net/udp. rx_drops, tx_drops? This /proc/net/udp column is for rx_drops currently and was recently added... -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 25 Aug 2009, Eric Dumazet wrote: > Christoph Lameter a ?crit : > > On Tue, 25 Aug 2009, Eric Dumazet wrote: > > > >> Please hold on, I would like to fully understand what's happening, > >> and test the patch :) > > > > Ok. It would be good if the drops would also be somehow noted by the UDP > > subsystem (one should see something with netstat -su) and may be even the > > socket. I see a drops column in /proc/net/udp. rx_drops, tx_drops? > > This /proc/net/udp column is for rx_drops currently and was recently added... So lets rename it to rx_drops and then add tx_drops? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Christoph Lameter a écrit : > On Tue, 25 Aug 2009, Eric Dumazet wrote: > >> Christoph Lameter a ?crit : >>> On Tue, 25 Aug 2009, Eric Dumazet wrote: >>> >>>> Please hold on, I would like to fully understand what's happening, >>>> and test the patch :) >>> Ok. It would be good if the drops would also be somehow noted by the UDP >>> subsystem (one should see something with netstat -su) and may be even the >>> socket. I see a drops column in /proc/net/udp. rx_drops, tx_drops? >> This /proc/net/udp column is for rx_drops currently and was recently added... > > So lets rename it to rx_drops and then add tx_drops? > It wont be very nice, because it'll add yet another 32bits counter in each socket structure, for a unlikely use. While rx_drops can happen if application is slow. Also, tx_drops might be done later and not noticed. Please read this old (and usefull) thread, with Alexey words... http://oss.sgi.com/archives/netdev/2002-10/msg00612.html http://oss.sgi.com/archives/netdev/2002-10/msg00617.html So I bet your best choice is to set IP_RECVERR, as mentioned in 2002 by Jamal and Alexey :) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 25 Aug 2009, Eric Dumazet wrote: > It wont be very nice, because it'll add yet another 32bits counter in each socket > structure, for a unlikely use. While rx_drops can happen if application is slow. tx_drops happen if the application sends too fast. TX drop tracking is important due to the braindamaged throttling logic during send. If SO_SNDBUF is less than what happens to fit in the TX ring then the application will be throttled and no packet loss happens. If SO_SNDBUF is set high then the TX ring will overflow and packets are dropped. We need some way to diagnose TX drops per socket as long as we have that mind boggling issue. TX drops means that one should reduce the size of the sendbuffer in order to get better throttling which reduces packet loss. > Also, tx_drops might be done later and not noticed. > > Please read this old (and usefull) thread, with Alexey words... > > http://oss.sgi.com/archives/netdev/2002-10/msg00612.html > > http://oss.sgi.com/archives/netdev/2002-10/msg00617.html > > > So I bet your best choice is to set IP_RECVERR, as mentioned in 2002 by Jamal and Alexey :) I read this just yesterday. IP_RECVERR means that the application wants to see details on each loss. We just want some counters that give us accurate statistics to gauge where packet loss is occurring. Applications are usually not interested in tracking the fate of each packet. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Christoph Lameter a écrit : > On Tue, 25 Aug 2009, Eric Dumazet wrote: > >> It wont be very nice, because it'll add yet another 32bits counter in each socket >> structure, for a unlikely use. While rx_drops can happen if application is slow. > > tx_drops happen if the application sends too fast. > > TX drop tracking is important due to the braindamaged throttling logic > during send. If SO_SNDBUF is less than what happens to fit in the TX ring then the > application will be throttled and no packet loss happens. If SO_SNDBUF is > set high then the TX ring will overflow and packets are dropped. > > We need some way to diagnose TX drops per socket as long as we have > that mind boggling issue. TX drops means that one should reduce the size > of the sendbuffer in order to get better throttling which reduces packet > loss. > >> Also, tx_drops might be done later and not noticed. >> >> Please read this old (and usefull) thread, with Alexey words... >> >> http://oss.sgi.com/archives/netdev/2002-10/msg00612.html >> >> http://oss.sgi.com/archives/netdev/2002-10/msg00617.html >> >> >> So I bet your best choice is to set IP_RECVERR, as mentioned in 2002 by Jamal and Alexey :) > > I read this just yesterday. IP_RECVERR means that the application wants to > see details on each loss. We just want some counters that give us accurate > statistics to gauge where packet loss is occurring. Applications are > usually not interested in tracking the fate of each packet. Yep, but IP_RECVERR also has the side effect of letting kernel returns -ENOBUFS error in sending and congestion, which was your initial point :) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 25 Aug 2009, Eric Dumazet wrote: > > I read this just yesterday. IP_RECVERR means that the application wants to > > see details on each loss. We just want some counters that give us accurate > > statistics to gauge where packet loss is occurring. Applications are > > usually not interested in tracking the fate of each packet. > > Yep, but IP_RECVERR also has the side effect of letting kernel returns -ENOBUFS error > in sending and congestion, which was your initial point :) The initial point was that the SNMP counters are not updated if IP_RECVERR is not set which is clearly a bug and your and my patch addresses that. Then Sridhar noted that there are other tx drop counters. qdisc counters are also not updated. Wish we would maintain tx drops counters there as well so that we can track down which NIC drops it. Then came the wishlist of UDP counters for tx drops and socket based tx_drop accounting for tuning and tracking down which app is sending too fast .... ;-) The apps could be third party apps. Just need to be able to troubleshoot packet loss. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Christoph Lameter <cl@linux-foundation.org> wrote on 08/25/2009 06:48:24 AM: > On Mon, 24 Aug 2009, Sridhar Samudrala wrote: > > If we count these drops as qdisc drops, should we also count them as IP OUTDISCARDS? > > Yes. Actually, no. (!) IP_OUTDISCARDS should count the packets IP dropped, not anything dropped at a lower layer (which, in general, it is not aware of). If you count these in multiple layers, then you don't really know who dropped it. +-DLS -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: David Stevens <dlstevens@us.ibm.com> Date: Tue, 25 Aug 2009 12:03:58 -0700 > IP_OUTDISCARDS should count the packets IP dropped, not > anything dropped at a lower layer (which, in general, it > is not aware of). If you count these in multiple layers, > then you don't really know who dropped it. Right. We are in danger of going from one extreme to the other. Previously we lacked some drop detection capabilities but now we've filled most of these holes and ON TOP of all of that we have Neil's SKB drop tracer. Let's not get carried away over-accounting this stuff. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 25 Aug 2009, David Stevens wrote: > Christoph Lameter <cl@linux-foundation.org> wrote on 08/25/2009 06:48:24 > AM: > > > On Mon, 24 Aug 2009, Sridhar Samudrala wrote: > > > > If we count these drops as qdisc drops, should we also count them as > IP OUTDISCARDS? > > > > Yes. > > Actually, no. (!) > > IP_OUTDISCARDS should count the packets IP dropped, not > anything dropped at a lower layer (which, in general, it > is not aware of). If you count these in multiple layers, > then you don't really know who dropped it. You are right. I skipped that IP OUTDICARDS reference. They need to be accounted at the qdisc level though. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2009-08-25 at 15:15 -0400, Christoph Lameter wrote: > On Tue, 25 Aug 2009, David Stevens wrote: > > Christoph Lameter <cl@linux-foundation.org> wrote on 08/25/2009 06:48:24 > > AM: > > > On Mon, 24 Aug 2009, Sridhar Samudrala wrote: > > > > If we count these drops as qdisc drops, should we also count them as > > IP OUTDISCARDS? > > > Yes. > > Actually, no. (!) > > IP_OUTDISCARDS should count the packets IP dropped, not > > anything dropped at a lower layer (which, in general, it > > is not aware of). If you count these in multiple layers, > > then you don't really know who dropped it. > They need to be accounted at the qdisc level though. It's probably useful to be able to know when packets and payloads are dropped. It may not be necessary though. It's probably not a fundamental. Chariot, LANforge and apps like it might care, but most all other apps might not care at all. Maybe these should be allowed with a CONFIG. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 7d08210..27a5b79 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -1301,19 +1301,15 @@ int ip_push_pending_frames(struct sock *sk) /* Netfilter gets whole the not fragmented skb. */ err = ip_local_out(skb); if (err) { + if (err != NET_XMIT_CN) + IP_INC_STATS(net, IPSTATS_MIB_OUTDISCARDS); if (err > 0) err = inet->recverr ? net_xmit_errno(err) : 0; - if (err) - goto error; } out: ip_cork_release(inet); return err; - -error: - IP_INC_STATS(net, IPSTATS_MIB_OUTDISCARDS); - goto out; } /*