diff mbox

[net-next-2.6] ip: Report qdisc packet drops

Message ID 4A9BBD8E.2010303@gmail.com
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet Aug. 31, 2009, 12:09 p.m. UTC
David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Fri, 28 Aug 2009 19:26:04 +0200
> 
>> [PATCH] ip: Report qdisc packet drops
>>
>> Christoph Lameter pointed out that packet drops at qdisc level where not
>> accounted in SNMP counters. Only if application sets IP_RECVERR, drops
>> are reported to user and SNMP counters updated.
>>
>> IP_RECVERR is used to enable extended reliable error message passing.
>> In case of tx drops at qdisc level, no error packet will be generated.
>> It seems un-necessary to hide the qdisc drops for non IP_RECVERR enabled
>> sockets (as probably most sockets are)
>>
>> By removing the check of IP_RECVERR enabled sockets in ip_push_pending_frames()/
>> raw_send_hdrinc() / ip6_push_pending_frames() / rawv6_send_hdrinc(),
>> we can properly update IPSTATS_MIB_OUTDISCARDS, and in case of UDP, update
>> UDP_MIB_SNDBUFERRORS SNMP counters.
>>
>> Application send() syscalls, instead of returning an OK status (thus lying),
>> will return -ENOBUFS error.
>>
>> Note : send() manual page explicitly says for -ENOBUFS error :
>>
>>  "The output queue for a network interface was full.
>>   This generally indicates that the interface has stopped sending,
>>   but may be caused by transient congestion.
>>   (Normally, this does not occur in Linux. Packets are just silently
>>   dropped when a device queue overflows.) "
>>
>> This was not true for IP_RECVERR enabled sockets for < 2.6.32 linuxes,
>> and starting from linux 2.6.32, last part wont be true at all.
>>
>> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
>> Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
> 
> The core question in all of this is what IP_RECVERR means.
> 
> As far as I remember Alexey Kuznetsov's intentions, it means that the
> application is interested in learning about errors caused by the
> infrastructure of the network between local and remote stacks.
> 
> Reporting a qdisc level drop to the application by default has the
> potential to break applications, because BSD and other stacks do not
> do this.
> 
> I can see why we might be able to get away with making this change
> now.  And I also can see the benefits of it, for sure.
> 
> Let me think about this some more.

Yes I do agree this is risky.

Re-reading again this stuff, I realized ip6_push_pending_frames()
was not updating IPSTATS_MIB_OUTDISCARDS, even if IP_RECVERR was set.

May I suggest following path :

1) Correct ip6_push_pending_frames() to properly
account for dropped-by-qdisc frames when IP_RECVERR is set

2) Submit a patch to account for qdisc-dropped frames in SNMP counters
but still return a OK to user application, to not break them ?

Thanks

[PATCH] ipv6: ip6_push_pending_frames() should increment IPSTATS_MIB_OUTDISCARDS

qdisc drops should be notified to IP_RECVERR enabled sockets, as done in IPV4.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller Sept. 2, 2009, 1:41 a.m. UTC | #1
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 31 Aug 2009 14:09:50 +0200

> Re-reading again this stuff, I realized ip6_push_pending_frames()
> was not updating IPSTATS_MIB_OUTDISCARDS, even if IP_RECVERR was set.
> 
> May I suggest following path :
> 
> 1) Correct ip6_push_pending_frames() to properly
> account for dropped-by-qdisc frames when IP_RECVERR is set

Your patch is  applied to net-next-2.6, thanks!

> 2) Submit a patch to account for qdisc-dropped frames in SNMP counters
> but still return a OK to user application, to not break them ?

Sounds good.

I think if you sample random UDP applications, you will find that such
errors will upset them terribly, make them log tons of crap to
/var/log/messages et al., and consume tons of CPU.

And in such cases silent ignoring of drops is entirely appropriate and
optimal, which supports our current behavior.

If we are to make such applications "more sophisticated" such
converted apps can be indicated simply their use of IP_RECVERR.

If you want to be notified of all asynchronous errors we can detect,
you use this, end of story.  It is the only way to handle this
situation without breaking the world.

As usual, Alexey Kuznetsov's analysis of this situation is timeless,
accurate, and wise.  And he understood all of this 10+ years ago.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Lameter Sept. 2, 2009, 6:22 p.m. UTC | #2
On Tue, 1 Sep 2009, David Miller wrote:

> > 2) Submit a patch to account for qdisc-dropped frames in SNMP counters
> > but still return a OK to user application, to not break them ?
>
> Sounds good.

Great. That was my initial suggestion and it would ensure that no apps
break.

> If we are to make such applications "more sophisticated" such
> converted apps can be indicated simply their use of IP_RECVERR.

There may be a minor issue here in that IP_RECVERR sometimes sends error
packets that have to be intercepted using special code. Or can those be
simply ignored? If so then I will ask UDP app vendors to use IP_RECVERR.

> As usual, Alexey Kuznetsov's analysis of this situation is timeless,
> accurate, and wise.  And he understood all of this 10+ years ago.

His code was just slightly buggy .... ;-)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Sept. 3, 2009, 1:09 a.m. UTC | #3
From: Christoph Lameter <cl@linux-foundation.org>
Date: Wed, 2 Sep 2009 13:22:25 -0500 (CDT)

> There may be a minor issue here in that IP_RECVERR sometimes sends error
> packets that have to be intercepted using special code. Or can those be
> simply ignored? If so then I will ask UDP app vendors to use IP_RECVERR.

If you don't set MSG_ERRQUEUE, no special error reports will be
given to the application.

And this only matters for recvmsg() handling.

On send, the only behavior difference is this error code reporting
(and before Eric's patch, SNMP statistics handling).
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 6ad5aad..a931229 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1520,6 +1520,7 @@  out:
 	ip6_cork_release(inet, np);
 	return err;
 error:
+	IP6_INC_STATS(net, rt->rt6i_idev, IPSTATS_MIB_OUTDISCARDS);
 	goto out;
 }