diff mbox

[net-next-2.6] ip: Report qdisc packet drops

Message ID 4A98132C.8090105@gmail.com
State Rejected, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet Aug. 28, 2009, 5:26 p.m. UTC
Christoph Lameter a écrit :
> On Fri, 28 Aug 2009, Eric Dumazet wrote:
>> Only change you want is eventually to account for the UDP drop (SndbufErrors).
> 
> That is only a counter at the UDP layer. That one does not allow you to
> identify which NIC it was nor which application caused it.

We dont have per-application SNMP counters, so application is responsible
to get proper syscall return check logic / accounting if necessary.

NIC level, just forget it, it makes no sense.

> 
> But its already a big improvement to see TX drops at all. Could we get
> your latest patch merged soon?

I officially submit it, but David have to take it or reject it :)

Thanks

[PATCH] ip: Report qdisc packet drops

Christoph Lameter pointed out that packet drops at qdisc level where not
accounted in SNMP counters. Only if application sets IP_RECVERR, drops
are reported to user and SNMP counters updated.

IP_RECVERR is used to enable extended reliable error message passing.
In case of tx drops at qdisc level, no error packet will be generated.
It seems un-necessary to hide the qdisc drops for non IP_RECVERR enabled
sockets (as probably most sockets are)

By removing the check of IP_RECVERR enabled sockets in ip_push_pending_frames()/
raw_send_hdrinc() / ip6_push_pending_frames() / rawv6_send_hdrinc(),
we can properly update IPSTATS_MIB_OUTDISCARDS, and in case of UDP, update
UDP_MIB_SNDBUFERRORS SNMP counters.

Application send() syscalls, instead of returning an OK status (thus lying),
will return -ENOBUFS error.

Note : send() manual page explicitly says for -ENOBUFS error :

 "The output queue for a network interface was full.
  This generally indicates that the interface has stopped sending,
  but may be caused by transient congestion.
  (Normally, this does not occur in Linux. Packets are just silently
  dropped when a device queue overflows.) "

This was not true for IP_RECVERR enabled sockets for < 2.6.32 linuxes,
and starting from linux 2.6.32, last part wont be true at all.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
---
 net/ipv4/ip_output.c  |    2 +-
 net/ipv4/raw.c        |    2 +-
 net/ipv6/ip6_output.c |    2 +-
 net/ipv6/raw.c        |    2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller Aug. 29, 2009, 6:38 a.m. UTC | #1
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 28 Aug 2009 19:26:04 +0200

> [PATCH] ip: Report qdisc packet drops
> 
> Christoph Lameter pointed out that packet drops at qdisc level where not
> accounted in SNMP counters. Only if application sets IP_RECVERR, drops
> are reported to user and SNMP counters updated.
> 
> IP_RECVERR is used to enable extended reliable error message passing.
> In case of tx drops at qdisc level, no error packet will be generated.
> It seems un-necessary to hide the qdisc drops for non IP_RECVERR enabled
> sockets (as probably most sockets are)
> 
> By removing the check of IP_RECVERR enabled sockets in ip_push_pending_frames()/
> raw_send_hdrinc() / ip6_push_pending_frames() / rawv6_send_hdrinc(),
> we can properly update IPSTATS_MIB_OUTDISCARDS, and in case of UDP, update
> UDP_MIB_SNDBUFERRORS SNMP counters.
> 
> Application send() syscalls, instead of returning an OK status (thus lying),
> will return -ENOBUFS error.
> 
> Note : send() manual page explicitly says for -ENOBUFS error :
> 
>  "The output queue for a network interface was full.
>   This generally indicates that the interface has stopped sending,
>   but may be caused by transient congestion.
>   (Normally, this does not occur in Linux. Packets are just silently
>   dropped when a device queue overflows.) "
> 
> This was not true for IP_RECVERR enabled sockets for < 2.6.32 linuxes,
> and starting from linux 2.6.32, last part wont be true at all.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Signed-off-by: Christoph Lameter <cl@linux-foundation.org>

The core question in all of this is what IP_RECVERR means.

As far as I remember Alexey Kuznetsov's intentions, it means that the
application is interested in learning about errors caused by the
infrastructure of the network between local and remote stacks.

Reporting a qdisc level drop to the application by default has the
potential to break applications, because BSD and other stacks do not
do this.

I can see why we might be able to get away with making this change
now.  And I also can see the benefits of it, for sure.

Let me think about this some more.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 7d08210..afae0cb 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1302,7 +1302,7 @@  int ip_push_pending_frames(struct sock *sk)
 	err = ip_local_out(skb);
 	if (err) {
 		if (err > 0)
-			err = inet->recverr ? net_xmit_errno(err) : 0;
+			err = net_xmit_errno(err);
 		if (err)
 			goto error;
 	}
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 2979f14..80ff607 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -375,7 +375,7 @@  static int raw_send_hdrinc(struct sock *sk, void *from, size_t length,
 	err = NF_HOOK(PF_INET, NF_INET_LOCAL_OUT, skb, NULL, rt->u.dst.dev,
 		      dst_output);
 	if (err > 0)
-		err = inet->recverr ? net_xmit_errno(err) : 0;
+		err = net_xmit_errno(err);
 	if (err)
 		goto error;
 out:
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 6ad5aad..537e8cf 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1511,7 +1511,7 @@  int ip6_push_pending_frames(struct sock *sk)
 	err = ip6_local_out(skb);
 	if (err) {
 		if (err > 0)
-			err = np->recverr ? net_xmit_errno(err) : 0;
+			err = net_xmit_errno(err);
 		if (err)
 			goto error;
 	}
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 5068410..1f7ee61 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -642,7 +642,7 @@  static int rawv6_send_hdrinc(struct sock *sk, void *from, int length,
 	err = NF_HOOK(PF_INET6, NF_INET_LOCAL_OUT, skb, NULL, rt->u.dst.dev,
 		      dst_output);
 	if (err > 0)
-		err = np->recverr ? net_xmit_errno(err) : 0;
+		err = net_xmit_errno(err);
 	if (err)
 		goto error;
 out: