diff mbox

[net,v3,2/4] net: add rx_nohandler stat counter

Message ID 1454370667-2328-3-git-send-email-jarod@redhat.com
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Jarod Wilson Feb. 1, 2016, 11:51 p.m. UTC
This adds an rx_nohandler stat counter, along with a sysfs statistics
node, and copies the counter out via netlink as well.

CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jiri Pirko <jiri@mellanox.com>
CC: Daniel Borkmann <daniel@iogearbox.net>
CC: Tom Herbert <tom@herbertland.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <gospo@cumulusnetworks.com>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
 include/linux/netdevice.h    | 3 +++
 include/uapi/linux/if_link.h | 4 ++++
 net/core/dev.c               | 6 +++++-
 net/core/net-sysfs.c         | 2 ++
 net/core/rtnetlink.c         | 2 ++
 5 files changed, 16 insertions(+), 1 deletion(-)

Comments

Stephen Hemminger Feb. 7, 2016, 7:37 p.m. UTC | #1
On Mon,  1 Feb 2016 18:51:05 -0500
Jarod Wilson <jarod@redhat.com> wrote:

> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -35,6 +35,8 @@ struct rtnl_link_stats {
>  	/* for cslip etc */
>  	__u32	rx_compressed;
>  	__u32	tx_compressed;
> +
> +	__u32	rx_nohandler;		/* dropped, no handler found	*/
>  };
>  
>  /* The main device statistics structure */
> @@ -68,6 +70,8 @@ struct rtnl_link_stats64 {
>  	/* for cslip etc */
>  	__u64	rx_compressed;
>  	__u64	tx_compressed;
> +
> +	__u64	rx_nohandler;		/* dropped, no handler found	*/
>  };

Why was this userspace ABI change allowed?
The stats structure is exposed to user space via netlink
and changing the size of responses will break iproute2 commands.

The code will be expecting one size and the response will vary and
break existing code.  Yes, the code should check the size
of the response, but it doesn't and I am sure iproute2 is not
the only code that does this.
David Miller Feb. 7, 2016, 7:46 p.m. UTC | #2
From: Stephen Hemminger <stephen@networkplumber.org>
Date: Sun, 7 Feb 2016 11:37:32 -0800

> On Mon,  1 Feb 2016 18:51:05 -0500
> Jarod Wilson <jarod@redhat.com> wrote:
> 
>> --- a/include/uapi/linux/if_link.h
>> +++ b/include/uapi/linux/if_link.h
>> @@ -35,6 +35,8 @@ struct rtnl_link_stats {
>>  	/* for cslip etc */
>>  	__u32	rx_compressed;
>>  	__u32	tx_compressed;
>> +
>> +	__u32	rx_nohandler;		/* dropped, no handler found	*/
>>  };
>>  
>>  /* The main device statistics structure */
>> @@ -68,6 +70,8 @@ struct rtnl_link_stats64 {
>>  	/* for cslip etc */
>>  	__u64	rx_compressed;
>>  	__u64	tx_compressed;
>> +
>> +	__u64	rx_nohandler;		/* dropped, no handler found	*/
>>  };
> 
> Why was this userspace ABI change allowed?
> The stats structure is exposed to user space via netlink
> and changing the size of responses will break iproute2 commands.
> 
> The code will be expecting one size and the response will vary and
> break existing code.  Yes, the code should check the size
> of the response, but it doesn't and I am sure iproute2 is not
> the only code that does this.

Jarod, please look into this.
Eric Dumazet Feb. 7, 2016, 8:19 p.m. UTC | #3
On Sun, 2016-02-07 at 14:46 -0500, David Miller wrote:

> > Why was this userspace ABI change allowed?
> > The stats structure is exposed to user space via netlink
> > and changing the size of responses will break iproute2 commands.

I do not think it breaks anything.

iproute2 always assumed kernel was sending at least 23 u64, and does not
check at all if the kernel sends more. (or less, so iproute2 can print
garbage if kernel is malicious)

an iproute2 patch will be needed to automatically detect if new kernels
are sending more data and print it accordingly.

> > 
> > The code will be expecting one size and the response will vary and
> > break existing code.  Yes, the code should check the size
> > of the response, but it doesn't and I am sure iproute2 is not
> > the only code that does this.
> 
> Jarod, please look into this.

Running latest net-next, and old iproute2 is just fine.

# ip -s link sh dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq portid
001a11fffec30d80 state UP mode DEFAULT group default qlen 16000
    link/ether 00:1a:11:c3:0d:7f brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast   
    533766     1875     0       0       0       135     
    TX: bytes  packets  errors  dropped carrier collsns 
    209204     1858     0       0       0       0
Jarod Wilson Feb. 8, 2016, 6:32 p.m. UTC | #4
On Sun, Feb 07, 2016 at 12:19:28PM -0800, Eric Dumazet wrote:
> On Sun, 2016-02-07 at 14:46 -0500, David Miller wrote:
> 
> > > Why was this userspace ABI change allowed?
> > > The stats structure is exposed to user space via netlink
> > > and changing the size of responses will break iproute2 commands.
> 
> I do not think it breaks anything.
> 
> iproute2 always assumed kernel was sending at least 23 u64, and does not
> check at all if the kernel sends more. (or less, so iproute2 can print
> garbage if kernel is malicious)
> 
> an iproute2 patch will be needed to automatically detect if new kernels
> are sending more data and print it accordingly.

My TODO list did include poking at iproute2 to expose the new info, I can
take a closer look for possible issues as well, but...

> > > The code will be expecting one size and the response will vary and
> > > break existing code.  Yes, the code should check the size
> > > of the response, but it doesn't and I am sure iproute2 is not
> > > the only code that does this.
> > 
> > Jarod, please look into this.
> 
> Running latest net-next, and old iproute2 is just fine.

...I haven't run into anything that didn't work with current iproute2
either while testing out functionality of these patches. If there's
something in particular that seems most suspect that I perhaps simply
haven't tried, I can give that a go as well.

In any case, I'm definitely due to take a look at iproute2 as it relates
to this patchset.
Stephen Hemminger Feb. 8, 2016, 7:38 p.m. UTC | #5
On Mon, 8 Feb 2016 13:32:54 -0500
Jarod Wilson <jarod@redhat.com> wrote:

> On Sun, Feb 07, 2016 at 12:19:28PM -0800, Eric Dumazet wrote:
> > On Sun, 2016-02-07 at 14:46 -0500, David Miller wrote:
> > 
> > > > Why was this userspace ABI change allowed?
> > > > The stats structure is exposed to user space via netlink
> > > > and changing the size of responses will break iproute2 commands.
> > 
> > I do not think it breaks anything.
> > 
> > iproute2 always assumed kernel was sending at least 23 u64, and does not
> > check at all if the kernel sends more. (or less, so iproute2 can print
> > garbage if kernel is malicious)
> > 
> > an iproute2 patch will be needed to automatically detect if new kernels
> > are sending more data and print it accordingly.
> 
> My TODO list did include poking at iproute2 to expose the new info, I can
> take a closer look for possible issues as well, but...
> 
> > > > The code will be expecting one size and the response will vary and
> > > > break existing code.  Yes, the code should check the size
> > > > of the response, but it doesn't and I am sure iproute2 is not
> > > > the only code that does this.
> > > 
> > > Jarod, please look into this.
> > 
> > Running latest net-next, and old iproute2 is just fine.
> 
> ...I haven't run into anything that didn't work with current iproute2
> either while testing out functionality of these patches. If there's
> something in particular that seems most suspect that I perhaps simply
> haven't tried, I can give that a go as well.
> 
> In any case, I'm definitely due to take a look at iproute2 as it relates
> to this patchset.
> 

The iproute2 command can be fixed, but adding dependency on size of response
gets gross fast.  Imagine when 4 more fields get added, this doesn't scale well.

Also, the definition of userspace ABI is that structures can't change.
There are many other utilities that are not visible that may get broken.
Traditionally Linux has guaranteed that programs will continue to work
no matter how they were coded.
Eric Dumazet Feb. 8, 2016, 10:57 p.m. UTC | #6
On Mon, 2016-02-08 at 11:38 -0800, Stephen Hemminger wrote:

> The iproute2 command can be fixed, but adding dependency on size of response
> gets gross fast.  Imagine when 4 more fields get added, this doesn't scale well.

Really ? I see no problem at all doing the proper tests.

> 
> Also, the definition of userspace ABI is that structures can't change.
> There are many other utilities that are not visible that may get broken.
> Traditionally Linux has guaranteed that programs will continue to work
> no matter how they were coded.

Stephen, we have been doing that for years.

Whole point of TLV is that it allows us to add new fields at the end of
the structures.

Yes, some buggy programs might need a fix.

Look at iproute2, you were the one adding in 2004 code to cope with
various tcp_info sizes.

So 12 years later, you cannot say it does not work anymore.
David Miller Feb. 9, 2016, 8:40 a.m. UTC | #7
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 08 Feb 2016 14:57:40 -0800

> Whole point of TLV is that it allows us to add new fields at the end of
> the structures.
 ...
> Look at iproute2, you were the one adding in 2004 code to cope with
> various tcp_info sizes.
> 
> So 12 years later, you cannot say it does not work anymore.

+1
Jamal Hadi Salim Feb. 9, 2016, 10:56 a.m. UTC | #8
On 16-02-09 03:40 AM, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Mon, 08 Feb 2016 14:57:40 -0800
>
>> Whole point of TLV is that it allows us to add new fields at the end of
>> the structures.
>   ...
>> Look at iproute2, you were the one adding in 2004 code to cope with
>> various tcp_info sizes.
>>
>> So 12 years later, you cannot say it does not work anymore.
>
> +1
>

The TLV L should be canonical way to determine length. i.e should be
sufficient to just look at L and understand that content has changed.
But:
Using sizeof could be dangerous unless the data is packed to be
32-bit aligned. Looking INET_DIAG_INFO check for sizeof
there is a small 8 bit hole in tcp_info I think between
these two fields:

----
__u8    tcpi_snd_wscale : 4, tcpi_rcv_wscale : 4;
__u32   tcpi_rto;
---

The kernel will pad to make sure the TLV data is 32-bit aligned.
I am not sure if that will be the same length as sizeof() in all
hardware + compilers... For this case,
it is almost safe to just add a version field - probably in the hole.
Or have a #define to say what the expected length should be. Or add
an 8 bit pad.

In general adding new fields that are non-optional is problematic. i.e
by non-optional i mean always expected to be present.
I think a good test is old kernel with new iproute2. If the new field
is non-optional, it will fail (example iproute2 may try to print a value
that it expects but because old kernel doesnt understand it; it is 
non-existent).

cheers,
jamal
diff mbox

Patch

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 289c231..78a20ce 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1397,6 +1397,8 @@  enum netdev_priv_flags {
  *			do not use this in drivers
  *	@tx_dropped:	Dropped packets by core network,
  *			do not use this in drivers
+ *	@rx_nohandler:	nohandler dropped packets by core network on
+ *			inactive devices, do not use this in drivers
  *
  *	@wireless_handlers:	List of functions to handle Wireless Extensions,
  *				instead of ioctl,
@@ -1611,6 +1613,7 @@  struct net_device {
 
 	atomic_long_t		rx_dropped;
 	atomic_long_t		tx_dropped;
+	atomic_long_t		rx_nohandler;
 
 #ifdef CONFIG_WIRELESS_EXT
 	const struct iw_handler_def *	wireless_handlers;
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index a30b780..d3e90b9 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -35,6 +35,8 @@  struct rtnl_link_stats {
 	/* for cslip etc */
 	__u32	rx_compressed;
 	__u32	tx_compressed;
+
+	__u32	rx_nohandler;		/* dropped, no handler found	*/
 };
 
 /* The main device statistics structure */
@@ -68,6 +70,8 @@  struct rtnl_link_stats64 {
 	/* for cslip etc */
 	__u64	rx_compressed;
 	__u64	tx_compressed;
+
+	__u64	rx_nohandler;		/* dropped, no handler found	*/
 };
 
 /* The struct should be in sync with struct ifmap */
diff --git a/net/core/dev.c b/net/core/dev.c
index 65863e5..f128483 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4154,7 +4154,10 @@  ncls:
 			ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
 	} else {
 drop:
-		atomic_long_inc(&skb->dev->rx_dropped);
+		if (!deliver_exact)
+			atomic_long_inc(&skb->dev->rx_dropped);
+		else
+			atomic_long_inc(&skb->dev->rx_nohandler);
 		kfree_skb(skb);
 		/* Jamal, now you will not able to escape explaining
 		 * me how you were going to use this. :-)
@@ -7307,6 +7310,7 @@  struct rtnl_link_stats64 *dev_get_stats(struct net_device *dev,
 	}
 	storage->rx_dropped += atomic_long_read(&dev->rx_dropped);
 	storage->tx_dropped += atomic_long_read(&dev->tx_dropped);
+	storage->rx_nohandler += atomic_long_read(&dev->rx_nohandler);
 	return storage;
 }
 EXPORT_SYMBOL(dev_get_stats);
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index b6c8a66..da7dbc2 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -574,6 +574,7 @@  NETSTAT_ENTRY(tx_heartbeat_errors);
 NETSTAT_ENTRY(tx_window_errors);
 NETSTAT_ENTRY(rx_compressed);
 NETSTAT_ENTRY(tx_compressed);
+NETSTAT_ENTRY(rx_nohandler);
 
 static struct attribute *netstat_attrs[] = {
 	&dev_attr_rx_packets.attr,
@@ -599,6 +600,7 @@  static struct attribute *netstat_attrs[] = {
 	&dev_attr_tx_window_errors.attr,
 	&dev_attr_rx_compressed.attr,
 	&dev_attr_tx_compressed.attr,
+	&dev_attr_rx_nohandler.attr,
 	NULL
 };
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index d735e85..20d7135 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -804,6 +804,8 @@  static void copy_rtnl_link_stats(struct rtnl_link_stats *a,
 
 	a->rx_compressed = b->rx_compressed;
 	a->tx_compressed = b->tx_compressed;
+
+	a->rx_nohandler = b->rx_nohandler;
 }
 
 static void copy_rtnl_link_stats64(void *v, const struct rtnl_link_stats64 *b)