Message ID | alpine.DEB.2.00.1006011609280.9438@router.home |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
Le mardi 01 juin 2010 à 16:13 -0500, Christoph Lameter a écrit : > Something like this would have been very helpful during recent debugging > of multicast issues. Silent discards are bad. > > > If the kernel perceives that something is wrong with an incoming packet then the > IP stack currently silently discards packets. This makes it difficult to diagnose > problems with the network configurations (such as a misbehaving kernel > subsystem discarding multicast packets because the reverse path filter > does not like multicast subscriptions on the second NIC with rp_filter=1). > > It is also necessary to know how many inbound packets are discarded to > assess networking issues in general with a NIC. > > Signed-off-by: Christoph Lameter <cl@linux-foundation.org> > Acked-by: Stephen Hemminger <shemminger@vyatta.com> > I disagree with this patch. IPSTATS_MIB_INADDRERRORS has a strong meaning, part of RFCS. In this path, we simulate the routing of a virtual packet, not its delivery. This should not affect IPSTATS SNMP entries. You should use another MIB entry, say LINUX_MIB_INROUTEERRORS ? Dont inet_rtm_getroute() caller gets an error status anyway ? > --- > net/ipv4/route.c | 3 +++ > 1 file changed, 3 insertions(+) > > Index: linux-2.6/net/ipv4/route.c > =================================================================== > --- linux-2.6.orig/net/ipv4/route.c 2010-06-01 11:46:10.000000000 -0500 > +++ linux-2.6/net/ipv4/route.c 2010-06-01 11:52:55.000000000 -0500 > @@ -2981,6 +2981,9 @@ static int inet_rtm_getroute(struct sk_b > rt = skb_rtable(skb); > if (err == 0 && rt->u.dst.error) > err = -rt->u.dst.error; > + if (err) > + IP_INC_STATS_BH(dev_net(skb->dev), > + IPSTATS_MIB_INADDRERRORS); > } else { > struct flowi fl = { > .nl_u = { > > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Wed, 02 Jun 2010 00:07:34 +0200 > IPSTATS_MIB_INADDRERRORS has a strong meaning, part of RFCS. > > In this path, we simulate the routing of a virtual packet, not its > delivery. > > This should not affect IPSTATS SNMP entries. > > You should use another MIB entry, say LINUX_MIB_INROUTEERRORS ? Agreed. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2 Jun 2010, Eric Dumazet wrote: > Le mardi 01 juin 2010 à 16:13 -0500, Christoph Lameter a écrit : > > Something like this would have been very helpful during recent debugging > > of multicast issues. Silent discards are bad. > > > > > > > If the kernel perceives that something is wrong with an incoming packet then the > > IP stack currently silently discards packets. This makes it difficult to diagnose > > problems with the network configurations (such as a misbehaving kernel > > subsystem discarding multicast packets because the reverse path filter > > does not like multicast subscriptions on the second NIC with rp_filter=1). > > > > It is also necessary to know how many inbound packets are discarded to > > assess networking issues in general with a NIC. > > > > Signed-off-by: Christoph Lameter <cl@linux-foundation.org> > > Acked-by: Stephen Hemminger <shemminger@vyatta.com> > > > > I disagree with this patch. > > IPSTATS_MIB_INADDRERRORS has a strong meaning, part of RFCS. > > In this path, we simulate the routing of a virtual packet, not its > delivery. > > This should not affect IPSTATS SNMP entries. > > You should use another MIB entry, say LINUX_MIB_INROUTEERRORS ? > > Dont inet_rtm_getroute() caller gets an error status anyway ? Yes but they are not increment any counter. If packets are dropped because of the rp_filter setting interfering f.e. then the packets vanish without any accounting. LINUX_MIB_INROUTEERRORS? Does it mean I can create a series of new counters that allow us to diagnose and distinguish all the different causes of packet loss? We would love to have that.
From: Christoph Lameter <cl@linux-foundation.org> Date: Wed, 2 Jun 2010 10:27:13 -0500 (CDT) > LINUX_MIB_INROUTEERRORS? Does it mean I can create a series of new > counters that allow us to diagnose and distinguish all the different > causes of packet loss? We would love to have that. Within reason. If you're going to spam the tree with something like 10 or 20 new stat counters getting bumped all over the place, that's not what we're trying to suggest here. Consolidate as much as possible, add new things when absolutely nothing existing fits the bill. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le mercredi 02 juin 2010 à 10:27 -0500, Christoph Lameter a écrit : > Yes but they are not increment any counter. If packets are dropped because > of the rp_filter setting interfering f.e. then the packets vanish without > any accounting. > But packets are vanishing either way. > LINUX_MIB_INROUTEERRORS? Does it mean I can create a series of new > counters that allow us to diagnose and distinguish all the different > causes of packet loss? We would love to have that. > For an example, you could take a look at commit 907cdda5205b (tcp: Add SNMP counter for DEFER_ACCEPT) Its pretty straigthforward, and wont conflict with monitoring apps. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2 Jun 2010, Eric Dumazet wrote: > Le mercredi 02 juin 2010 à 10:27 -0500, Christoph Lameter a écrit : > > > Yes but they are not increment any counter. If packets are dropped because > > of the rp_filter setting interfering f.e. then the packets vanish without > > any accounting. > > > > But packets are vanishing either way. Its important to know why drops occur (any drops for that matter, drops mean retransmit which means latency). The symptom here was that multicast traffic on secondary interfaces was not being forwarded to the application. The rp_filter just dropped them and there was no way to easily track down the issue for the people experiencing the problem. In 2.6.31 the rp_filter was fixed to work properly with multicast and now it considers multicast traffic to secondary interfaces to have the wrong reverse path even though the multicast subscription occurred on the secondary interface. So it drops multicast traffic. The rp_filter has to be switched off when using multiple NICs for multicast load balancing. > > LINUX_MIB_INROUTEERRORS? Does it mean I can create a series of new > > counters that allow us to diagnose and distinguish all the different > > causes of packet loss? We would love to have that. > > > > For an example, you could take a look at commit 907cdda5205b > (tcp: Add SNMP counter for DEFER_ACCEPT) Well these are all TCP counters. I would add IP and UDP counters?
From: Christoph Lameter <cl@linux-foundation.org> Date: Wed, 2 Jun 2010 11:12:14 -0500 (CDT) > Its important to know why drops occur (any drops for that matter, drops > mean retransmit which means latency). We know, that's why there is a networking tracepoint that allows you to see where all drops occur. :-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2 Jun 2010, David Miller wrote: > From: Christoph Lameter <cl@linux-foundation.org> > Date: Wed, 2 Jun 2010 11:12:14 -0500 (CDT) > > > Its important to know why drops occur (any drops for that matter, drops > > mean retransmit which means latency). > > We know, that's why there is a networking tracepoint that allows you > to see where all drops occur. :-) Where can I find out more about the network tracepoint? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le mercredi 02 juin 2010 à 11:12 -0500, Christoph Lameter a écrit : > On Wed, 2 Jun 2010, Eric Dumazet wrote: > > > Le mercredi 02 juin 2010 à 10:27 -0500, Christoph Lameter a écrit : > > > > > Yes but they are not increment any counter. If packets are dropped because > > > of the rp_filter setting interfering f.e. then the packets vanish without > > > any accounting. > > > > > > > But packets are vanishing either way. > > Its important to know why drops occur (any drops for that matter, drops > mean retransmit which means latency). The symptom here was that > multicast traffic on secondary interfaces was not being forwarded to the application. > The rp_filter just dropped them and there was no way to easily track down > the issue for the people experiencing the problem. > I just dont follow you. Your patch has nothing to do with dropped packets because of rp_filter. You inserted a counter increment in a path that is not taken at all by packet delivery. Maybe I missed something really obvious with your patch ? It should only matters for the admin doing following command : ip route get 1.2.3.4 from 192.168.0.1 iif eth0 That is a probe, not a 'packet delivery', this is why I said incrementing an official MIB counter was probably wrong. > In 2.6.31 the rp_filter was fixed to work properly with multicast and now > it considers multicast traffic to secondary interfaces to have the wrong > reverse path even though the multicast subscription occurred on the > secondary interface. So it drops multicast traffic. The rp_filter has to > be switched off when using multiple NICs for multicast load balancing. > Have you considered CONFIG_NET_DROP_MONITOR ? This one catches all possible cases, a developper doesnt have to patch his kernel to add SNMP counters everywhere... config NET_DROP_MONITOR boolean "Network packet drop alerting service" depends on INET && EXPERIMENTAL && TRACEPOINTS ---help--- This feature provides an alerting service to userspace in the event that packets are discarded in the network stack. Alerts are broadcast via netlink socket to any listening user space process. If you don't need network drop alerts, or if you are ok just checking the various proc files and other utilities for drop statistics, say N here. > > > LINUX_MIB_INROUTEERRORS? Does it mean I can create a series of new > > > counters that allow us to diagnose and distinguish all the different > > > causes of packet loss? We would love to have that. > > > > > > > For an example, you could take a look at commit 907cdda5205b > > (tcp: Add SNMP counter for DEFER_ACCEPT) > > Well these are all TCP counters. I would add IP and UDP counters? Why not ? But as David said, it should be motivated by real use case ;) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le mercredi 02 juin 2010 à 11:27 -0500, Christoph Lameter a écrit : > On Wed, 2 Jun 2010, David Miller wrote: > > > From: Christoph Lameter <cl@linux-foundation.org> > > Date: Wed, 2 Jun 2010 11:12:14 -0500 (CDT) > > > > > Its important to know why drops occur (any drops for that matter, drops > > > mean retransmit which means latency). > > > > We know, that's why there is a networking tracepoint that allows you > > to see where all drops occur. :-) > > Where can I find out more about the network tracepoint? > take a look at http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/html/SystemTap_Beginners_Guide/useful-systemtap-scripts.html#dropwatch -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2 Jun 2010, Eric Dumazet wrote: > Your patch has nothing to do with dropped packets because of rp_filter. > > You inserted a counter increment in a path that is not taken at all by > packet delivery. > > Maybe I missed something really obvious with your patch ? rp_filter rejects a route and packets are dropped then. > Have you considered CONFIG_NET_DROP_MONITOR ? > This one catches all possible cases, a developper doesnt have to patch > his kernel to add SNMP counters everywhere... Just looking at it. Great. A hook into skb_free. That will do. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2 Jun 2010, Eric Dumazet wrote: > take a look at > > http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/html/SystemTap_Beginners_Guide/useful-systemtap-scripts.html#dropwatch System tap? Oh no. Also skbs may be freed for legitimate reasons. The point is that the loss detection needs to be usable for a regular mortal. With the counters in /proc/net/snmp you have something that is easy to handle. The approach with systemtap will need lots of work to both get the tracing environment setup (local competence in systemtap) and participation by a developer to figure out what the output means. Then there is also the additional code overhead that you do not want by default in the kernel. So we would need a different kernel for diagnostics. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Christoph Lameter <cl@linux-foundation.org> Date: Wed, 2 Jun 2010 11:49:18 -0500 (CDT) > On Wed, 2 Jun 2010, Eric Dumazet wrote: > >> take a look at >> >> http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/html/SystemTap_Beginners_Guide/useful-systemtap-scripts.html#dropwatch > > System tap? You don't need to use system tap, just the normal tracing stuff using sysfs files suffices. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le mercredi 02 juin 2010 à 10:12 -0700, David Miller a écrit : > From: Christoph Lameter <cl@linux-foundation.org> > Date: Wed, 2 Jun 2010 11:49:18 -0500 (CDT) > > > On Wed, 2 Jun 2010, Eric Dumazet wrote: > > > >> take a look at > >> > >> http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/html/SystemTap_Beginners_Guide/useful-systemtap-scripts.html#dropwatch > > > > System tap? > > You don't need to use system tap, just the normal tracing stuff using > sysfs files suffices. > It would be good if Neil could gave us a man page or something ;) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Just in case people are really so clueless as to be unable to figure this out: echo 1 >/sys/kernel/debug/tracing/events/skb/kfree_skb/enable ...do some stuff... cat /sys/kernel/debug/tracing/trace You can even trace it using 'perf' by passing "skb:kfree_skb" as the event specifier. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jun 02, 2010 at 07:19:10PM +0200, Eric Dumazet wrote: > Le mercredi 02 juin 2010 à 10:12 -0700, David Miller a écrit : > > From: Christoph Lameter <cl@linux-foundation.org> > > Date: Wed, 2 Jun 2010 11:49:18 -0500 (CDT) > > > > > On Wed, 2 Jun 2010, Eric Dumazet wrote: > > > > > >> take a look at > > >> > > >> http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/html/SystemTap_Beginners_Guide/useful-systemtap-scripts.html#dropwatch > > > > > > System tap? > > > > You don't need to use system tap, just the normal tracing stuff using > > sysfs files suffices. > > > > It would be good if Neil could gave us a man page or something ;) > That stap script was really meant to be a stopgap measure. As mentioned, you can use the debugfs interface to turn tracepoints on and use them anyway you wish. Or, if you want to use the kfree_skb and napi_poll tracepoints in a more formalized way, you can use the dropwatch user space utility: https://fedorahosted.org/dropwatch/ Which includes a man page on usage :) I also recently updated it so that this utility can query /proc/kallsyms to translate program counter values into symbollic names and offsets for you. :) Regards Neil -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 02 Jun 2010, David Miller wrote: > Just in case people are really so clueless as to be unable to figure > this out: > > echo 1 >/sys/kernel/debug/tracing/events/skb/kfree_skb/enable > ...do some stuff... > cat /sys/kernel/debug/tracing/trace > > You can even trace it using 'perf' by passing "skb:kfree_skb" > as the event specifier. Could someone remind me where to get documentation for obtaining and using perf. -Thanks -Bill -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le mercredi 02 juin 2010 à 23:50 -0400, Bill Fink a écrit : > On Wed, 02 Jun 2010, David Miller wrote: > > > Just in case people are really so clueless as to be unable to figure > > this out: > > > > echo 1 >/sys/kernel/debug/tracing/events/skb/kfree_skb/enable > > ...do some stuff... > > cat /sys/kernel/debug/tracing/trace > > > > You can even trace it using 'perf' by passing "skb:kfree_skb" > > as the event specifier. > > Could someone remind me where to get documentation for obtaining > and using perf. > Its sources are included in kernel tree cd tools/perf make ./perf --help -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 03 Jun 2010, Eric Dumazet wrote: > Le mercredi 02 juin 2010 à 23:50 -0400, Bill Fink a écrit : > > On Wed, 02 Jun 2010, David Miller wrote: > > > > > Just in case people are really so clueless as to be unable to figure > > > this out: > > > > > > echo 1 >/sys/kernel/debug/tracing/events/skb/kfree_skb/enable > > > ...do some stuff... > > > cat /sys/kernel/debug/tracing/trace > > > > > > You can even trace it using 'perf' by passing "skb:kfree_skb" > > > as the event specifier. > > > > Could someone remind me where to get documentation for obtaining > > and using perf. > > > > Its sources are included in kernel tree > > cd tools/perf > make > ./perf --help Thanks. That seems easy enough. From some gitweb digging, it appears perf first appeared in 2.6.31 kernels, so I'm going to have to do a kernel upgrade from our current 2.6.30.10 kernels. For others, I also ran across a perf wiki page at: https://perf.wiki.kernel.org/index.php/Main_Page -Bill -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Index: linux-2.6/net/ipv4/route.c =================================================================== --- linux-2.6.orig/net/ipv4/route.c 2010-06-01 11:46:10.000000000 -0500 +++ linux-2.6/net/ipv4/route.c 2010-06-01 11:52:55.000000000 -0500 @@ -2981,6 +2981,9 @@ static int inet_rtm_getroute(struct sk_b rt = skb_rtable(skb); if (err == 0 && rt->u.dst.error) err = -rt->u.dst.error; + if (err) + IP_INC_STATS_BH(dev_net(skb->dev), + IPSTATS_MIB_INADDRERRORS); } else { struct flowi fl = { .nl_u = {