diff mbox

IP: Increment INADDRERRORS if routing for a packet is not successful

Message ID alpine.DEB.2.00.1006011609280.9438@router.home
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Christoph Lameter June 1, 2010, 9:13 p.m. UTC
Something like this would have been very helpful during recent debugging
of multicast issues. Silent discards are bad.


If the kernel perceives that something is wrong with an incoming packet then the
IP stack currently silently discards packets. This makes it difficult to diagnose
problems with the network configurations (such as a misbehaving kernel
subsystem discarding multicast packets because the reverse path filter
does not like multicast subscriptions on the second NIC with rp_filter=1).

It is also necessary to know how many inbound packets are discarded to
assess networking issues in general with a NIC.

Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>

---
 net/ipv4/route.c |    3 +++
 1 file changed, 3 insertions(+)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Eric Dumazet June 1, 2010, 10:07 p.m. UTC | #1
Le mardi 01 juin 2010 à 16:13 -0500, Christoph Lameter a écrit :
> Something like this would have been very helpful during recent debugging
> of multicast issues. Silent discards are bad.
> 

> 
> If the kernel perceives that something is wrong with an incoming packet then the
> IP stack currently silently discards packets. This makes it difficult to diagnose
> problems with the network configurations (such as a misbehaving kernel
> subsystem discarding multicast packets because the reverse path filter
> does not like multicast subscriptions on the second NIC with rp_filter=1).
> 
> It is also necessary to know how many inbound packets are discarded to
> assess networking issues in general with a NIC.
> 
> Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
> Acked-by: Stephen Hemminger <shemminger@vyatta.com>
> 

I disagree with this patch.

IPSTATS_MIB_INADDRERRORS has a strong meaning, part of RFCS.

In this path, we simulate the routing of a virtual packet, not its
delivery.

This should not affect IPSTATS SNMP entries.

You should use another MIB entry, say LINUX_MIB_INROUTEERRORS ?

Dont inet_rtm_getroute() caller gets an error status anyway ?

> ---
>  net/ipv4/route.c |    3 +++
>  1 file changed, 3 insertions(+)
> 
> Index: linux-2.6/net/ipv4/route.c
> ===================================================================
> --- linux-2.6.orig/net/ipv4/route.c	2010-06-01 11:46:10.000000000 -0500
> +++ linux-2.6/net/ipv4/route.c	2010-06-01 11:52:55.000000000 -0500
> @@ -2981,6 +2981,9 @@ static int inet_rtm_getroute(struct sk_b
>  		rt = skb_rtable(skb);
>  		if (err == 0 && rt->u.dst.error)
>  			err = -rt->u.dst.error;
> +		if (err)
> +			IP_INC_STATS_BH(dev_net(skb->dev),
> +					IPSTATS_MIB_INADDRERRORS);
>  	} else {
>  		struct flowi fl = {
>  			.nl_u = {
> 
> 


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller June 1, 2010, 10:23 p.m. UTC | #2
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 02 Jun 2010 00:07:34 +0200

> IPSTATS_MIB_INADDRERRORS has a strong meaning, part of RFCS.
> 
> In this path, we simulate the routing of a virtual packet, not its
> delivery.
> 
> This should not affect IPSTATS SNMP entries.
> 
> You should use another MIB entry, say LINUX_MIB_INROUTEERRORS ?

Agreed.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Lameter June 2, 2010, 3:27 p.m. UTC | #3
On Wed, 2 Jun 2010, Eric Dumazet wrote:

> Le mardi 01 juin 2010 à 16:13 -0500, Christoph Lameter a écrit :
> > Something like this would have been very helpful during recent debugging
> > of multicast issues. Silent discards are bad.
> >
>
> >
> > If the kernel perceives that something is wrong with an incoming packet then the
> > IP stack currently silently discards packets. This makes it difficult to diagnose
> > problems with the network configurations (such as a misbehaving kernel
> > subsystem discarding multicast packets because the reverse path filter
> > does not like multicast subscriptions on the second NIC with rp_filter=1).
> >
> > It is also necessary to know how many inbound packets are discarded to
> > assess networking issues in general with a NIC.
> >
> > Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
> > Acked-by: Stephen Hemminger <shemminger@vyatta.com>
> >
>
> I disagree with this patch.
>
> IPSTATS_MIB_INADDRERRORS has a strong meaning, part of RFCS.
>
> In this path, we simulate the routing of a virtual packet, not its
> delivery.
>
> This should not affect IPSTATS SNMP entries.
>
> You should use another MIB entry, say LINUX_MIB_INROUTEERRORS ?
>
> Dont inet_rtm_getroute() caller gets an error status anyway ?

Yes but they are not increment any counter. If packets are dropped because
of the rp_filter setting interfering f.e. then the packets vanish without
any accounting.

LINUX_MIB_INROUTEERRORS? Does it mean I can create a series of new
counters that allow us to diagnose and distinguish all the different
causes of packet loss? We would love to have that.
David Miller June 2, 2010, 3:29 p.m. UTC | #4
From: Christoph Lameter <cl@linux-foundation.org>
Date: Wed, 2 Jun 2010 10:27:13 -0500 (CDT)

> LINUX_MIB_INROUTEERRORS? Does it mean I can create a series of new
> counters that allow us to diagnose and distinguish all the different
> causes of packet loss? We would love to have that.

Within reason.

If you're going to spam the tree with something like 10 or 20 new
stat counters getting bumped all over the place, that's not what
we're trying to suggest here.

Consolidate as much as possible, add new things when absolutely
nothing existing fits the bill.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet June 2, 2010, 3:32 p.m. UTC | #5
Le mercredi 02 juin 2010 à 10:27 -0500, Christoph Lameter a écrit :

> Yes but they are not increment any counter. If packets are dropped because
> of the rp_filter setting interfering f.e. then the packets vanish without
> any accounting.
> 

But packets are vanishing either way.

> LINUX_MIB_INROUTEERRORS? Does it mean I can create a series of new
> counters that allow us to diagnose and distinguish all the different
> causes of packet loss? We would love to have that.
> 

For an example, you could take a look at commit 907cdda5205b
(tcp: Add SNMP counter for DEFER_ACCEPT)

Its pretty straigthforward, and wont conflict with monitoring apps.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Lameter June 2, 2010, 4:12 p.m. UTC | #6
On Wed, 2 Jun 2010, Eric Dumazet wrote:

> Le mercredi 02 juin 2010 à 10:27 -0500, Christoph Lameter a écrit :
>
> > Yes but they are not increment any counter. If packets are dropped because
> > of the rp_filter setting interfering f.e. then the packets vanish without
> > any accounting.
> >
>
> But packets are vanishing either way.

Its important to know why drops occur (any drops for that matter, drops
mean retransmit which means latency). The symptom here was that
multicast traffic on secondary interfaces was not being forwarded to the application.
The rp_filter just dropped them and there was no way to easily track down
the issue for the people experiencing the problem.

In 2.6.31 the rp_filter was fixed to work properly with multicast and now
it considers multicast traffic to secondary interfaces to have the wrong
reverse path even though the multicast subscription occurred on the
secondary interface. So it drops multicast traffic. The rp_filter has to
be switched off when using multiple NICs for multicast load balancing.

> > LINUX_MIB_INROUTEERRORS? Does it mean I can create a series of new
> > counters that allow us to diagnose and distinguish all the different
> > causes of packet loss? We would love to have that.
> >
>
> For an example, you could take a look at commit 907cdda5205b
> (tcp: Add SNMP counter for DEFER_ACCEPT)

Well these are all TCP counters. I would add IP and UDP counters?
David Miller June 2, 2010, 4:19 p.m. UTC | #7
From: Christoph Lameter <cl@linux-foundation.org>
Date: Wed, 2 Jun 2010 11:12:14 -0500 (CDT)

> Its important to know why drops occur (any drops for that matter, drops
> mean retransmit which means latency).

We know, that's why there is a networking tracepoint that allows you
to see where all drops occur. :-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Lameter June 2, 2010, 4:27 p.m. UTC | #8
On Wed, 2 Jun 2010, David Miller wrote:

> From: Christoph Lameter <cl@linux-foundation.org>
> Date: Wed, 2 Jun 2010 11:12:14 -0500 (CDT)
>
> > Its important to know why drops occur (any drops for that matter, drops
> > mean retransmit which means latency).
>
> We know, that's why there is a networking tracepoint that allows you
> to see where all drops occur. :-)

Where can I find out more about the network tracepoint?

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet June 2, 2010, 4:28 p.m. UTC | #9
Le mercredi 02 juin 2010 à 11:12 -0500, Christoph Lameter a écrit :
> On Wed, 2 Jun 2010, Eric Dumazet wrote:
> 
> > Le mercredi 02 juin 2010 à 10:27 -0500, Christoph Lameter a écrit :
> >
> > > Yes but they are not increment any counter. If packets are dropped because
> > > of the rp_filter setting interfering f.e. then the packets vanish without
> > > any accounting.
> > >
> >
> > But packets are vanishing either way.
> 
> Its important to know why drops occur (any drops for that matter, drops
> mean retransmit which means latency). The symptom here was that
> multicast traffic on secondary interfaces was not being forwarded to the application.
> The rp_filter just dropped them and there was no way to easily track down
> the issue for the people experiencing the problem.
> 

I just dont follow you.

Your patch has nothing to do with dropped packets because of rp_filter. 

You inserted a counter increment in a path that is not taken at all by
packet delivery.

Maybe I missed something really obvious with your patch ?

It should only matters for the admin doing following command :

ip route get 1.2.3.4 from 192.168.0.1 iif eth0

That is a probe, not a 'packet delivery', this is why I said
incrementing an official MIB counter was probably wrong.

> In 2.6.31 the rp_filter was fixed to work properly with multicast and now
> it considers multicast traffic to secondary interfaces to have the wrong
> reverse path even though the multicast subscription occurred on the
> secondary interface. So it drops multicast traffic. The rp_filter has to
> be switched off when using multiple NICs for multicast load balancing.
> 

Have you considered CONFIG_NET_DROP_MONITOR ?
This one catches all possible cases, a developper doesnt have to patch
his kernel to add SNMP counters everywhere...

config NET_DROP_MONITOR
        boolean "Network packet drop alerting service"
        depends on INET && EXPERIMENTAL && TRACEPOINTS
        ---help---
        This feature provides an alerting service to userspace in the
        event that packets are discarded in the network stack.  Alerts
        are broadcast via netlink socket to any listening user space
        process.  If you don't need network drop alerts, or if you are ok
        just checking the various proc files and other utilities for
        drop statistics, say N here.


> > > LINUX_MIB_INROUTEERRORS? Does it mean I can create a series of new
> > > counters that allow us to diagnose and distinguish all the different
> > > causes of packet loss? We would love to have that.
> > >
> >
> > For an example, you could take a look at commit 907cdda5205b
> > (tcp: Add SNMP counter for DEFER_ACCEPT)
> 
> Well these are all TCP counters. I would add IP and UDP counters?

Why not ?

But as David said, it should be motivated by real use case ;)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet June 2, 2010, 4:33 p.m. UTC | #10
Le mercredi 02 juin 2010 à 11:27 -0500, Christoph Lameter a écrit :
> On Wed, 2 Jun 2010, David Miller wrote:
> 
> > From: Christoph Lameter <cl@linux-foundation.org>
> > Date: Wed, 2 Jun 2010 11:12:14 -0500 (CDT)
> >
> > > Its important to know why drops occur (any drops for that matter, drops
> > > mean retransmit which means latency).
> >
> > We know, that's why there is a networking tracepoint that allows you
> > to see where all drops occur. :-)
> 
> Where can I find out more about the network tracepoint?
> 

take a look at 

http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/html/SystemTap_Beginners_Guide/useful-systemtap-scripts.html#dropwatch



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Lameter June 2, 2010, 4:35 p.m. UTC | #11
On Wed, 2 Jun 2010, Eric Dumazet wrote:

> Your patch has nothing to do with dropped packets because of rp_filter.
>
> You inserted a counter increment in a path that is not taken at all by
> packet delivery.
>
> Maybe I missed something really obvious with your patch ?

rp_filter rejects a route and packets are dropped then.

> Have you considered CONFIG_NET_DROP_MONITOR ?
> This one catches all possible cases, a developper doesnt have to patch
> his kernel to add SNMP counters everywhere...

Just looking at it. Great. A hook into skb_free. That will do.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Lameter June 2, 2010, 4:49 p.m. UTC | #12
On Wed, 2 Jun 2010, Eric Dumazet wrote:

> take a look at
>
> http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/html/SystemTap_Beginners_Guide/useful-systemtap-scripts.html#dropwatch

System tap? Oh no. Also skbs may be freed for legitimate reasons. The
point is that the loss detection needs to be usable for a regular mortal.
With the counters in /proc/net/snmp you have something that is easy to
handle.

The approach with systemtap will need lots of work to both get the tracing
environment setup (local competence in systemtap) and participation by a
developer to figure out what the output means. Then there is also the
additional code overhead that you do not want by default in the kernel. So
we would need a different kernel for diagnostics.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller June 2, 2010, 5:12 p.m. UTC | #13
From: Christoph Lameter <cl@linux-foundation.org>
Date: Wed, 2 Jun 2010 11:49:18 -0500 (CDT)

> On Wed, 2 Jun 2010, Eric Dumazet wrote:
> 
>> take a look at
>>
>> http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/html/SystemTap_Beginners_Guide/useful-systemtap-scripts.html#dropwatch
> 
> System tap?

You don't need to use system tap, just the normal tracing stuff using
sysfs files suffices.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet June 2, 2010, 5:19 p.m. UTC | #14
Le mercredi 02 juin 2010 à 10:12 -0700, David Miller a écrit :
> From: Christoph Lameter <cl@linux-foundation.org>
> Date: Wed, 2 Jun 2010 11:49:18 -0500 (CDT)
> 
> > On Wed, 2 Jun 2010, Eric Dumazet wrote:
> > 
> >> take a look at
> >>
> >> http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/html/SystemTap_Beginners_Guide/useful-systemtap-scripts.html#dropwatch
> > 
> > System tap?
> 
> You don't need to use system tap, just the normal tracing stuff using
> sysfs files suffices.
> 

It would be good if Neil could gave us a man page or something ;)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller June 2, 2010, 5:31 p.m. UTC | #15
Just in case people are really so clueless as to be unable to figure
this out:

echo 1 >/sys/kernel/debug/tracing/events/skb/kfree_skb/enable
...do some stuff...
cat /sys/kernel/debug/tracing/trace

You can even trace it using 'perf' by passing "skb:kfree_skb"
as the event specifier.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Neil Horman June 2, 2010, 5:41 p.m. UTC | #16
On Wed, Jun 02, 2010 at 07:19:10PM +0200, Eric Dumazet wrote:
> Le mercredi 02 juin 2010 à 10:12 -0700, David Miller a écrit :
> > From: Christoph Lameter <cl@linux-foundation.org>
> > Date: Wed, 2 Jun 2010 11:49:18 -0500 (CDT)
> > 
> > > On Wed, 2 Jun 2010, Eric Dumazet wrote:
> > > 
> > >> take a look at
> > >>
> > >> http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/html/SystemTap_Beginners_Guide/useful-systemtap-scripts.html#dropwatch
> > > 
> > > System tap?
> > 
> > You don't need to use system tap, just the normal tracing stuff using
> > sysfs files suffices.
> > 
> 
> It would be good if Neil could gave us a man page or something ;)
> 
That stap script was really meant to be a stopgap measure.  As mentioned, you
can use the debugfs interface to turn tracepoints on and use them anyway you
wish.  Or, if you want to use the kfree_skb and napi_poll tracepoints in a more
formalized way, you can use the dropwatch user space utility:

https://fedorahosted.org/dropwatch/

Which includes a man page on usage :)

I also recently updated it so that this utility can query /proc/kallsyms to
translate program counter values into symbollic names and offsets for you. :)

Regards
Neil

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bill Fink June 3, 2010, 3:50 a.m. UTC | #17
On Wed, 02 Jun 2010, David Miller wrote:

> Just in case people are really so clueless as to be unable to figure
> this out:
> 
> echo 1 >/sys/kernel/debug/tracing/events/skb/kfree_skb/enable
> ...do some stuff...
> cat /sys/kernel/debug/tracing/trace
> 
> You can even trace it using 'perf' by passing "skb:kfree_skb"
> as the event specifier.

Could someone remind me where to get documentation for obtaining
and using perf.

						-Thanks

						-Bill
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet June 3, 2010, 3:54 a.m. UTC | #18
Le mercredi 02 juin 2010 à 23:50 -0400, Bill Fink a écrit :
> On Wed, 02 Jun 2010, David Miller wrote:
> 
> > Just in case people are really so clueless as to be unable to figure
> > this out:
> > 
> > echo 1 >/sys/kernel/debug/tracing/events/skb/kfree_skb/enable
> > ...do some stuff...
> > cat /sys/kernel/debug/tracing/trace
> > 
> > You can even trace it using 'perf' by passing "skb:kfree_skb"
> > as the event specifier.
> 
> Could someone remind me where to get documentation for obtaining
> and using perf.
> 

Its sources are included in kernel tree

cd tools/perf
make
./perf --help


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bill Fink June 3, 2010, 4:56 a.m. UTC | #19
On Thu, 03 Jun 2010, Eric Dumazet wrote:

> Le mercredi 02 juin 2010 à 23:50 -0400, Bill Fink a écrit :
> > On Wed, 02 Jun 2010, David Miller wrote:
> > 
> > > Just in case people are really so clueless as to be unable to figure
> > > this out:
> > > 
> > > echo 1 >/sys/kernel/debug/tracing/events/skb/kfree_skb/enable
> > > ...do some stuff...
> > > cat /sys/kernel/debug/tracing/trace
> > > 
> > > You can even trace it using 'perf' by passing "skb:kfree_skb"
> > > as the event specifier.
> > 
> > Could someone remind me where to get documentation for obtaining
> > and using perf.
> > 
> 
> Its sources are included in kernel tree
> 
> cd tools/perf
> make
> ./perf --help

Thanks.  That seems easy enough.  From some gitweb digging, it
appears perf first appeared in 2.6.31 kernels, so I'm going to
have to do a kernel upgrade from our current 2.6.30.10 kernels.

For others, I also ran across a perf wiki page at:

	https://perf.wiki.kernel.org/index.php/Main_Page

						-Bill
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

Index: linux-2.6/net/ipv4/route.c
===================================================================
--- linux-2.6.orig/net/ipv4/route.c	2010-06-01 11:46:10.000000000 -0500
+++ linux-2.6/net/ipv4/route.c	2010-06-01 11:52:55.000000000 -0500
@@ -2981,6 +2981,9 @@  static int inet_rtm_getroute(struct sk_b
 		rt = skb_rtable(skb);
 		if (err == 0 && rt->u.dst.error)
 			err = -rt->u.dst.error;
+		if (err)
+			IP_INC_STATS_BH(dev_net(skb->dev),
+					IPSTATS_MIB_INADDRERRORS);
 	} else {
 		struct flowi fl = {
 			.nl_u = {