Message ID | 1454370667-2328-3-git-send-email-jarod@redhat.com |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
On Mon, 1 Feb 2016 18:51:05 -0500 Jarod Wilson <jarod@redhat.com> wrote: > --- a/include/uapi/linux/if_link.h > +++ b/include/uapi/linux/if_link.h > @@ -35,6 +35,8 @@ struct rtnl_link_stats { > /* for cslip etc */ > __u32 rx_compressed; > __u32 tx_compressed; > + > + __u32 rx_nohandler; /* dropped, no handler found */ > }; > > /* The main device statistics structure */ > @@ -68,6 +70,8 @@ struct rtnl_link_stats64 { > /* for cslip etc */ > __u64 rx_compressed; > __u64 tx_compressed; > + > + __u64 rx_nohandler; /* dropped, no handler found */ > }; Why was this userspace ABI change allowed? The stats structure is exposed to user space via netlink and changing the size of responses will break iproute2 commands. The code will be expecting one size and the response will vary and break existing code. Yes, the code should check the size of the response, but it doesn't and I am sure iproute2 is not the only code that does this.
From: Stephen Hemminger <stephen@networkplumber.org> Date: Sun, 7 Feb 2016 11:37:32 -0800 > On Mon, 1 Feb 2016 18:51:05 -0500 > Jarod Wilson <jarod@redhat.com> wrote: > >> --- a/include/uapi/linux/if_link.h >> +++ b/include/uapi/linux/if_link.h >> @@ -35,6 +35,8 @@ struct rtnl_link_stats { >> /* for cslip etc */ >> __u32 rx_compressed; >> __u32 tx_compressed; >> + >> + __u32 rx_nohandler; /* dropped, no handler found */ >> }; >> >> /* The main device statistics structure */ >> @@ -68,6 +70,8 @@ struct rtnl_link_stats64 { >> /* for cslip etc */ >> __u64 rx_compressed; >> __u64 tx_compressed; >> + >> + __u64 rx_nohandler; /* dropped, no handler found */ >> }; > > Why was this userspace ABI change allowed? > The stats structure is exposed to user space via netlink > and changing the size of responses will break iproute2 commands. > > The code will be expecting one size and the response will vary and > break existing code. Yes, the code should check the size > of the response, but it doesn't and I am sure iproute2 is not > the only code that does this. Jarod, please look into this.
On Sun, 2016-02-07 at 14:46 -0500, David Miller wrote: > > Why was this userspace ABI change allowed? > > The stats structure is exposed to user space via netlink > > and changing the size of responses will break iproute2 commands. I do not think it breaks anything. iproute2 always assumed kernel was sending at least 23 u64, and does not check at all if the kernel sends more. (or less, so iproute2 can print garbage if kernel is malicious) an iproute2 patch will be needed to automatically detect if new kernels are sending more data and print it accordingly. > > > > The code will be expecting one size and the response will vary and > > break existing code. Yes, the code should check the size > > of the response, but it doesn't and I am sure iproute2 is not > > the only code that does this. > > Jarod, please look into this. Running latest net-next, and old iproute2 is just fine. # ip -s link sh dev eth0 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq portid 001a11fffec30d80 state UP mode DEFAULT group default qlen 16000 link/ether 00:1a:11:c3:0d:7f brd ff:ff:ff:ff:ff:ff RX: bytes packets errors dropped overrun mcast 533766 1875 0 0 0 135 TX: bytes packets errors dropped carrier collsns 209204 1858 0 0 0 0
On Sun, Feb 07, 2016 at 12:19:28PM -0800, Eric Dumazet wrote: > On Sun, 2016-02-07 at 14:46 -0500, David Miller wrote: > > > > Why was this userspace ABI change allowed? > > > The stats structure is exposed to user space via netlink > > > and changing the size of responses will break iproute2 commands. > > I do not think it breaks anything. > > iproute2 always assumed kernel was sending at least 23 u64, and does not > check at all if the kernel sends more. (or less, so iproute2 can print > garbage if kernel is malicious) > > an iproute2 patch will be needed to automatically detect if new kernels > are sending more data and print it accordingly. My TODO list did include poking at iproute2 to expose the new info, I can take a closer look for possible issues as well, but... > > > The code will be expecting one size and the response will vary and > > > break existing code. Yes, the code should check the size > > > of the response, but it doesn't and I am sure iproute2 is not > > > the only code that does this. > > > > Jarod, please look into this. > > Running latest net-next, and old iproute2 is just fine. ...I haven't run into anything that didn't work with current iproute2 either while testing out functionality of these patches. If there's something in particular that seems most suspect that I perhaps simply haven't tried, I can give that a go as well. In any case, I'm definitely due to take a look at iproute2 as it relates to this patchset.
On Mon, 8 Feb 2016 13:32:54 -0500 Jarod Wilson <jarod@redhat.com> wrote: > On Sun, Feb 07, 2016 at 12:19:28PM -0800, Eric Dumazet wrote: > > On Sun, 2016-02-07 at 14:46 -0500, David Miller wrote: > > > > > > Why was this userspace ABI change allowed? > > > > The stats structure is exposed to user space via netlink > > > > and changing the size of responses will break iproute2 commands. > > > > I do not think it breaks anything. > > > > iproute2 always assumed kernel was sending at least 23 u64, and does not > > check at all if the kernel sends more. (or less, so iproute2 can print > > garbage if kernel is malicious) > > > > an iproute2 patch will be needed to automatically detect if new kernels > > are sending more data and print it accordingly. > > My TODO list did include poking at iproute2 to expose the new info, I can > take a closer look for possible issues as well, but... > > > > > The code will be expecting one size and the response will vary and > > > > break existing code. Yes, the code should check the size > > > > of the response, but it doesn't and I am sure iproute2 is not > > > > the only code that does this. > > > > > > Jarod, please look into this. > > > > Running latest net-next, and old iproute2 is just fine. > > ...I haven't run into anything that didn't work with current iproute2 > either while testing out functionality of these patches. If there's > something in particular that seems most suspect that I perhaps simply > haven't tried, I can give that a go as well. > > In any case, I'm definitely due to take a look at iproute2 as it relates > to this patchset. > The iproute2 command can be fixed, but adding dependency on size of response gets gross fast. Imagine when 4 more fields get added, this doesn't scale well. Also, the definition of userspace ABI is that structures can't change. There are many other utilities that are not visible that may get broken. Traditionally Linux has guaranteed that programs will continue to work no matter how they were coded.
On Mon, 2016-02-08 at 11:38 -0800, Stephen Hemminger wrote: > The iproute2 command can be fixed, but adding dependency on size of response > gets gross fast. Imagine when 4 more fields get added, this doesn't scale well. Really ? I see no problem at all doing the proper tests. > > Also, the definition of userspace ABI is that structures can't change. > There are many other utilities that are not visible that may get broken. > Traditionally Linux has guaranteed that programs will continue to work > no matter how they were coded. Stephen, we have been doing that for years. Whole point of TLV is that it allows us to add new fields at the end of the structures. Yes, some buggy programs might need a fix. Look at iproute2, you were the one adding in 2004 code to cope with various tcp_info sizes. So 12 years later, you cannot say it does not work anymore.
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Mon, 08 Feb 2016 14:57:40 -0800 > Whole point of TLV is that it allows us to add new fields at the end of > the structures. ... > Look at iproute2, you were the one adding in 2004 code to cope with > various tcp_info sizes. > > So 12 years later, you cannot say it does not work anymore. +1
On 16-02-09 03:40 AM, David Miller wrote: > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Mon, 08 Feb 2016 14:57:40 -0800 > >> Whole point of TLV is that it allows us to add new fields at the end of >> the structures. > ... >> Look at iproute2, you were the one adding in 2004 code to cope with >> various tcp_info sizes. >> >> So 12 years later, you cannot say it does not work anymore. > > +1 > The TLV L should be canonical way to determine length. i.e should be sufficient to just look at L and understand that content has changed. But: Using sizeof could be dangerous unless the data is packed to be 32-bit aligned. Looking INET_DIAG_INFO check for sizeof there is a small 8 bit hole in tcp_info I think between these two fields: ---- __u8 tcpi_snd_wscale : 4, tcpi_rcv_wscale : 4; __u32 tcpi_rto; --- The kernel will pad to make sure the TLV data is 32-bit aligned. I am not sure if that will be the same length as sizeof() in all hardware + compilers... For this case, it is almost safe to just add a version field - probably in the hole. Or have a #define to say what the expected length should be. Or add an 8 bit pad. In general adding new fields that are non-optional is problematic. i.e by non-optional i mean always expected to be present. I think a good test is old kernel with new iproute2. If the new field is non-optional, it will fail (example iproute2 may try to print a value that it expects but because old kernel doesnt understand it; it is non-existent). cheers, jamal
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 289c231..78a20ce 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1397,6 +1397,8 @@ enum netdev_priv_flags { * do not use this in drivers * @tx_dropped: Dropped packets by core network, * do not use this in drivers + * @rx_nohandler: nohandler dropped packets by core network on + * inactive devices, do not use this in drivers * * @wireless_handlers: List of functions to handle Wireless Extensions, * instead of ioctl, @@ -1611,6 +1613,7 @@ struct net_device { atomic_long_t rx_dropped; atomic_long_t tx_dropped; + atomic_long_t rx_nohandler; #ifdef CONFIG_WIRELESS_EXT const struct iw_handler_def * wireless_handlers; diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index a30b780..d3e90b9 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -35,6 +35,8 @@ struct rtnl_link_stats { /* for cslip etc */ __u32 rx_compressed; __u32 tx_compressed; + + __u32 rx_nohandler; /* dropped, no handler found */ }; /* The main device statistics structure */ @@ -68,6 +70,8 @@ struct rtnl_link_stats64 { /* for cslip etc */ __u64 rx_compressed; __u64 tx_compressed; + + __u64 rx_nohandler; /* dropped, no handler found */ }; /* The struct should be in sync with struct ifmap */ diff --git a/net/core/dev.c b/net/core/dev.c index 65863e5..f128483 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4154,7 +4154,10 @@ ncls: ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev); } else { drop: - atomic_long_inc(&skb->dev->rx_dropped); + if (!deliver_exact) + atomic_long_inc(&skb->dev->rx_dropped); + else + atomic_long_inc(&skb->dev->rx_nohandler); kfree_skb(skb); /* Jamal, now you will not able to escape explaining * me how you were going to use this. :-) @@ -7307,6 +7310,7 @@ struct rtnl_link_stats64 *dev_get_stats(struct net_device *dev, } storage->rx_dropped += atomic_long_read(&dev->rx_dropped); storage->tx_dropped += atomic_long_read(&dev->tx_dropped); + storage->rx_nohandler += atomic_long_read(&dev->rx_nohandler); return storage; } EXPORT_SYMBOL(dev_get_stats); diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c index b6c8a66..da7dbc2 100644 --- a/net/core/net-sysfs.c +++ b/net/core/net-sysfs.c @@ -574,6 +574,7 @@ NETSTAT_ENTRY(tx_heartbeat_errors); NETSTAT_ENTRY(tx_window_errors); NETSTAT_ENTRY(rx_compressed); NETSTAT_ENTRY(tx_compressed); +NETSTAT_ENTRY(rx_nohandler); static struct attribute *netstat_attrs[] = { &dev_attr_rx_packets.attr, @@ -599,6 +600,7 @@ static struct attribute *netstat_attrs[] = { &dev_attr_tx_window_errors.attr, &dev_attr_rx_compressed.attr, &dev_attr_tx_compressed.attr, + &dev_attr_rx_nohandler.attr, NULL }; diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index d735e85..20d7135 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -804,6 +804,8 @@ static void copy_rtnl_link_stats(struct rtnl_link_stats *a, a->rx_compressed = b->rx_compressed; a->tx_compressed = b->tx_compressed; + + a->rx_nohandler = b->rx_nohandler; } static void copy_rtnl_link_stats64(void *v, const struct rtnl_link_stats64 *b)
This adds an rx_nohandler stat counter, along with a sysfs statistics node, and copies the counter out via netlink as well. CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Jiri Pirko <jiri@mellanox.com> CC: Daniel Borkmann <daniel@iogearbox.net> CC: Tom Herbert <tom@herbertland.com> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <gospo@cumulusnetworks.com> CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> --- include/linux/netdevice.h | 3 +++ include/uapi/linux/if_link.h | 4 ++++ net/core/dev.c | 6 +++++- net/core/net-sysfs.c | 2 ++ net/core/rtnetlink.c | 2 ++ 5 files changed, 16 insertions(+), 1 deletion(-)