Message ID | 4A98132C.8090105@gmail.com |
---|---|
State | Rejected, archived |
Delegated to: | David Miller |
Headers | show |
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Fri, 28 Aug 2009 19:26:04 +0200 > [PATCH] ip: Report qdisc packet drops > > Christoph Lameter pointed out that packet drops at qdisc level where not > accounted in SNMP counters. Only if application sets IP_RECVERR, drops > are reported to user and SNMP counters updated. > > IP_RECVERR is used to enable extended reliable error message passing. > In case of tx drops at qdisc level, no error packet will be generated. > It seems un-necessary to hide the qdisc drops for non IP_RECVERR enabled > sockets (as probably most sockets are) > > By removing the check of IP_RECVERR enabled sockets in ip_push_pending_frames()/ > raw_send_hdrinc() / ip6_push_pending_frames() / rawv6_send_hdrinc(), > we can properly update IPSTATS_MIB_OUTDISCARDS, and in case of UDP, update > UDP_MIB_SNDBUFERRORS SNMP counters. > > Application send() syscalls, instead of returning an OK status (thus lying), > will return -ENOBUFS error. > > Note : send() manual page explicitly says for -ENOBUFS error : > > "The output queue for a network interface was full. > This generally indicates that the interface has stopped sending, > but may be caused by transient congestion. > (Normally, this does not occur in Linux. Packets are just silently > dropped when a device queue overflows.) " > > This was not true for IP_RECVERR enabled sockets for < 2.6.32 linuxes, > and starting from linux 2.6.32, last part wont be true at all. > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > Signed-off-by: Christoph Lameter <cl@linux-foundation.org> The core question in all of this is what IP_RECVERR means. As far as I remember Alexey Kuznetsov's intentions, it means that the application is interested in learning about errors caused by the infrastructure of the network between local and remote stacks. Reporting a qdisc level drop to the application by default has the potential to break applications, because BSD and other stacks do not do this. I can see why we might be able to get away with making this change now. And I also can see the benefits of it, for sure. Let me think about this some more. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 7d08210..afae0cb 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -1302,7 +1302,7 @@ int ip_push_pending_frames(struct sock *sk) err = ip_local_out(skb); if (err) { if (err > 0) - err = inet->recverr ? net_xmit_errno(err) : 0; + err = net_xmit_errno(err); if (err) goto error; } diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c index 2979f14..80ff607 100644 --- a/net/ipv4/raw.c +++ b/net/ipv4/raw.c @@ -375,7 +375,7 @@ static int raw_send_hdrinc(struct sock *sk, void *from, size_t length, err = NF_HOOK(PF_INET, NF_INET_LOCAL_OUT, skb, NULL, rt->u.dst.dev, dst_output); if (err > 0) - err = inet->recverr ? net_xmit_errno(err) : 0; + err = net_xmit_errno(err); if (err) goto error; out: diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 6ad5aad..537e8cf 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1511,7 +1511,7 @@ int ip6_push_pending_frames(struct sock *sk) err = ip6_local_out(skb); if (err) { if (err > 0) - err = np->recverr ? net_xmit_errno(err) : 0; + err = net_xmit_errno(err); if (err) goto error; } diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c index 5068410..1f7ee61 100644 --- a/net/ipv6/raw.c +++ b/net/ipv6/raw.c @@ -642,7 +642,7 @@ static int rawv6_send_hdrinc(struct sock *sk, void *from, int length, err = NF_HOOK(PF_INET6, NF_INET_LOCAL_OUT, skb, NULL, rt->u.dst.dev, dst_output); if (err > 0) - err = np->recverr ? net_xmit_errno(err) : 0; + err = net_xmit_errno(err); if (err) goto error; out: