[RFC,00/14] netlink/hierarchical stats

Message ID 20190128234507.32028-1-jakub.kicinski@netronome.com

Jakub Kicinski Jan. 28, 2019, 11:44 p.m. UTC
Hi!

As I tried to explain in my slides at netconf 2018 we are lacking
an expressive, standard API to report device statistics.

Networking silicon generally maintains some IEEE 802.3 and/or RMON
statistics.  Today those all end up in ethtool -S.  Here is a simple
attempt (admittedly very imprecise) at counting how many names driver
authors invented for the IETF RFC 2819 etherStatsPkts512to1023Octets
statistic (RX and TX):

$ git grep '".*512.*1023.*"' -- drivers/net/ | \
    sed -e 's/.*"\(.*\)".*/\1/' | sort | uniq | wc -l
63

Interestingly only two drivers in the tree use the name the standard
gave us (etherStatsPkts512to1023, modulo case).

I set out to work on this set in an attempt to give drivers a way
to clearly express standard-compliant counters to user space.

Second most common use for custom statistics is per-queue counters.
This is where the "hierarchical" part of this set comes in, as
groups can be nested, and user space tools can handle the aggregation
inside the groups if needed.
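
A minimal sketch of what such user space aggregation could look like,
in Python.  The Group class, its fields and the counter names are
hypothetical and chosen only for illustration; they are not taken from
the patches or the uAPI.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Group:
    """Hypothetical in-memory model of one statistics group."""
    name: str
    stats: Dict[str, int] = field(default_factory=dict)
    children: List["Group"] = field(default_factory=list)


def aggregate(group: Group) -> Dict[str, int]:
    """Sum each counter over a group and every group nested below it."""
    totals = dict(group.stats)
    for child in group.children:
        for stat, value in aggregate(child).items():
            totals[stat] = totals.get(stat, 0) + value
    return totals


# An RX group holding two per-queue groups (made-up values).
rx = Group("RX", children=[
    Group("q0", {"pkts": 10, "bytes": 1000}),
    Group("q1", {"pkts": 5, "bytes": 700}),
])
print(aggregate(rx))
```

The kernel only has to report the leaves; totals per direction or per
device fall out of the nesting.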

This set also tries to address the problem of users not knowing if
a statistic is reported by hardware or the driver.  Many modern drivers
use some prefix in ethtool -S to indicate MAC/PHY stats.  At a quick
glance: Netronome uses "mac.", Intel "port." and Mellanox "_phy".
In this set, netlink attributes describe whether a group of statistics
is RX or TX, maintained by device or driver.

The purpose of this patch set is _not_ to replace ethtool -S.  It is
an incredibly useful tool, and we will certainly continue using it.
However, for standard-based and commonly maintained statistics a more
structured API seems warranted.

There are two things missing from these patches, which I initially
planned to address as well: filtering, and refresh rate control.

Filtering doesn't need much explanation: users should be able to request
only a subset of statistics (like only SW stats or only given ID).  The
bitmap of statistics in each group is there for filtering later on.
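
As a rough illustration of how a per-group bitmap could drive
filtering, here is a small Python sketch.  The statistic IDs and helper
names are hypothetical; the real IDs are defined by the series' uAPI
header, and the filtering itself is not implemented in these patches.

```python
from typing import Dict

# Hypothetical statistic IDs, for illustration only.
STAT_RX_PKTS = 0
STAT_RX_BYTES = 1
STAT_RX_ERRORS = 2


def make_bitmap(*stat_ids: int) -> int:
    """Build a bitmap with one bit set per statistic ID."""
    bitmap = 0
    for stat_id in stat_ids:
        bitmap |= 1 << stat_id
    return bitmap


def filter_group(values: Dict[int, int], present: int, wanted: int) -> Dict[int, int]:
    """Keep only statistics the group reports AND the user asked for."""
    selected = present & wanted
    return {sid: val for sid, val in values.items() if selected & (1 << sid)}


group_values = {STAT_RX_PKTS: 100, STAT_RX_BYTES: 6400, STAT_RX_ERRORS: 3}
group_bitmap = make_bitmap(STAT_RX_PKTS, STAT_RX_BYTES, STAT_RX_ERRORS)
request = make_bitmap(STAT_RX_PKTS, STAT_RX_ERRORS)
print(filter_group(group_values, group_bitmap, request))
```

Intersecting the two bitmaps up front also lets a whole group be
skipped when the result is zero.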

By refresh control I mean the ability for user space to indicate how
"fresh" it expects the values to be.  Sometimes reading the HW counters
requires slow register reads or FW communication; in such cases drivers
may cache the result.  (Privileged) user space should be able to add a
"not older than" timestamp to indicate how fresh it expects the
statistics to be.  And vice versa, drivers can then also put the
timestamp of when the statistics were last refreshed in the dump for
more precise bandwidth estimation.
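
A sketch of the refresh control idea, in Python, under the assumption
that the driver keeps one cached snapshot plus the time it was taken.
CachedStats, read_hw and rate_bps are hypothetical names; none of this
exists in the patches.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CachedStats:
    """Cache of slow-to-read counters with a 'not older than' check."""

    def __init__(self, read_hw: Callable[[], Dict[str, int]],
                 clock: Callable[[], float] = time.monotonic):
        self._read_hw = read_hw
        self._clock = clock
        self._values: Optional[Dict[str, int]] = None
        self._stamp: Optional[float] = None

    def get(self, not_older_than: Optional[float] = None) -> Tuple[Dict[str, int], float]:
        """Return (values, refresh timestamp), re-reading HW only if needed."""
        stale = self._stamp is None or (
            not_older_than is not None and self._stamp < not_older_than)
        if stale:
            self._values = self._read_hw()
            self._stamp = self._clock()
        return self._values, self._stamp


def rate_bps(bytes_old: int, stamp_old: float,
             bytes_new: int, stamp_new: float) -> float:
    """Bandwidth estimate from two snapshots and their refresh timestamps."""
    return (bytes_new - bytes_old) * 8 / (stamp_new - stamp_old)
```

Using the refresh timestamps rather than the time of the request is
what makes the bandwidth estimate precise when results are cached.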

Jakub Kicinski (14):
  nfp: remove unused structure
  nfp: constify parameter to nfp_port_from_netdev()
  net: hstats: add basic/core functionality
  net: hstats: allow hierarchies to be built
  nfp: very basic hstat support
  net: hstats: allow iterators
  net: hstats: help in iteration over directions
  nfp: hstats: make use of iteration for direction
  nfp: hstats: add driver and device per queue statistics
  net: hstats: add IEEE 802.3 and common IETF MIB/RMON stats
  nfp: hstats: add IEEE/RMON ethernet port/MAC stats
  net: hstats: add markers for partial groups
  nfp: hstats: add a partial group of per-8021Q prio stats
  Documentation: networking: describe new hstat API

 Documentation/networking/hstats.rst           | 590 +++++++++++++++
 .../networking/hstats_flow_example.dot        |  11 +
 Documentation/networking/index.rst            |   1 +
 drivers/net/ethernet/netronome/nfp/Makefile   |   1 +
 .../net/ethernet/netronome/nfp/nfp_hstat.c    | 474 ++++++++++++
 drivers/net/ethernet/netronome/nfp/nfp_main.c |   1 +
 drivers/net/ethernet/netronome/nfp/nfp_main.h |   2 +
 drivers/net/ethernet/netronome/nfp/nfp_net.h  |  10 +-
 .../ethernet/netronome/nfp/nfp_net_common.c   |   1 +
 .../net/ethernet/netronome/nfp/nfp_net_repr.h |   2 +-
 drivers/net/ethernet/netronome/nfp/nfp_port.c |   2 +-
 drivers/net/ethernet/netronome/nfp/nfp_port.h |   2 +-
 include/linux/netdevice.h                     |   9 +
 include/net/hstats.h                          | 176 +++++
 include/uapi/linux/if_link.h                  | 107 +++
 net/core/Makefile                             |   2 +-
 net/core/hstats.c                             | 682 ++++++++++++++++++
 net/core/rtnetlink.c                          |  21 +
 18 files changed, 2084 insertions(+), 10 deletions(-)
 create mode 100644 Documentation/networking/hstats.rst
 create mode 100644 Documentation/networking/hstats_flow_example.dot
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_hstat.c
 create mode 100644 include/net/hstats.h
 create mode 100644 net/core/hstats.c

Roopa Prabhu Jan. 30, 2019, 10:14 p.m. UTC | #1
Jakub, glad to see hw stats in the RTM_*STATS API. I do see you mention
'partial' support for ethtool stats. I understand the reason you say
it's partial.
But while at it, why not also include the ability to have driver
extensible stats here, i.e. make it complete? We have talked about
making all hw stats available via the RTM_*STATS API in the past, so I
just want to make sure the new HSTATS infra you are adding to the
RTM_*STATS API covers, or at least makes it possible to include, driver
extensible stats in the future, where the driver gets to define the
stats ID + value (this is very useful).
It would be nice if you can account for that in this new HSTATS API.
Jakub Kicinski Jan. 31, 2019, 12:24 a.m. UTC | #2
On Wed, 30 Jan 2019 14:14:34 -0800, Roopa Prabhu wrote:
> Jakub, Glad to see hw stats in the RTM_*STATS api. I do see you
> mention 'partial' support for ethtool stats. I understand the reason
> you say its partial.
> But while at it, why not also include the ability to have driver
> extensible stats here ? ie make it complete. We have talked about
> making all hw stats available
> via the RTM_*STATS api in the past..., so just want to make sure the
> new HSTATS infra you are adding to the RTM_*STATS api
> covers or at-least makes it possible to include driver extensible
> stats in the future where the driver gets to define the stats id +
> value (This is very useful).
>  It would be nice if you can account for that in this new HSTATS API.

My thinking was that we should leave truly custom/strange stats to
the ethtool API, which works quite well for that, and at the same time
be very accepting of people adding new IDs to HSTAT (the only
requirement is basically defining the meaning very clearly).

For the first stab I looked at two drivers and added all the stats that
were common.

Given this set is identifying statistics by ID - how would we make that
extensible to drivers?  Would we go back to strings or have some
"driver specific" ID space?

Is there any particular type of statistic you'd expect drivers to want
to add?  For NICs I think IEEE/RMON should pretty much cover the
silicon ones, but I don't know much about switches :)
Roopa Prabhu Jan. 31, 2019, 4:16 p.m. UTC | #3
On Wed, Jan 30, 2019 at 4:24 PM Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> On Wed, 30 Jan 2019 14:14:34 -0800, Roopa Prabhu wrote:
> > Jakub, Glad to see hw stats in the RTM_*STATS api. I do see you
> > mention 'partial' support for ethtool stats. I understand the reason
> > you say its partial.
> > But while at it, why not also include the ability to have driver
> > extensible stats here ? ie make it complete. We have talked about
> > making all hw stats available
> > via the RTM_*STATS api in the past..., so just want to make sure the
> > new HSTATS infra you are adding to the RTM_*STATS api
> > covers or at-least makes it possible to include driver extensible
> > stats in the future where the driver gets to define the stats id +
> > value (This is very useful).
> >  It would be nice if you can account for that in this new HSTATS API.
>
> My thinking was that we should leave truly custom/strange stats to
> ethtool API which works quite well for that and at the same time be
> very accepting of people adding new IDs to HSTAT (only requirement is
> basically defining the meaning very clearly).

That sounds reasonable. But 'defining the meaning clearly' gets tricky
sometimes.
The vendor who gets their ID or meaning in first wins :) and the rest
will have to live with ethtool and explain to the rest of the world
that ethtool is more reliable for their hardware :)

I am also concerned that getting an ID into the common HSTAT ID space
will slow down the process of adding new counters for vendors, which
will lead to vendors sticking with the ethtool API. It would be great
if people could get all stats in one place and not rely on another API
for 'more'.

>
> For the first stab I looked at two drivers and added all the stats that
> were common.
>
> Given this set is identifying statistics by ID - how would we make that
> extensible to drivers?  Would we go back to strings or have some
> "driver specific" ID space?

I was looking for ideas from you really, to see if you had considered
this. Agreed, a per-driver ID space seems ugly.
ethtool strings are great today... if we can control the duplication.
But thinking some more, I did see some patches recently for a
vendor-specific parameter (with ID) space in devlink. Maybe something
like that would be reasonable?

>
> Is there any particular type of statistic you'd expect drivers to want
> to add?  For NICs I think IEEE/RMON should pretty much cover the
> silicon ones, but I don't know much about switches :)

I will have to go through the list. But switch ASICs do support
flexible stats/counters that can be attached at various points,
and new chip versions come with more support. Having the flexibility
to expose/extend such stats incrementally is very valuable on a per
hardware/vendor basis.
Roopa Prabhu Jan. 31, 2019, 4:31 p.m. UTC | #4
Just want to clarify that I am suggesting a nested HSTATS extension
infra for drivers (just like ethtool).
'Common stats' stays at the top-level.
Jakub Kicinski Jan. 31, 2019, 7:30 p.m. UTC | #5
On Thu, 31 Jan 2019 08:31:51 -0800, Roopa Prabhu wrote:
> On Thu, Jan 31, 2019 at 8:16 AM Roopa Prabhu wrote:
> > On Wed, Jan 30, 2019 at 4:24 PM Jakub Kicinski wrote:  
> > > On Wed, 30 Jan 2019 14:14:34 -0800, Roopa Prabhu wrote:  
> > >
> > > My thinking was that we should leave truly custom/strange stats to
> > > ethtool API which works quite well for that and at the same time be
> > > very accepting of people adding new IDs to HSTAT (only requirement is
> > > basically defining the meaning very clearly).  
> >
> > that sounds reasonable. But the 'defining meaning clearly' gets tricky
> > sometimes.
> > The vendor who gets their ID or meaning first wins :) and the rest
> > will have to live with
> > ethtool and explain to rest of the world that ethtool is more reliable
> > for their hardware :)

Right, that's the trade off inherent to standardization.  I don't see
any way to work around the fact that the definition may not fit all.

What I want as an end user and what I want for my customers is the
ability to switch the NIC on their system and not spend two months
"integrating" into their automation :(  If the definition of statistics
is not solid we're back to square one.

> > I am also concerned that this getting the ID into common HSTAT ID
> > space will  slow down the process of adding new counters
> > for vendors. Which will lead to vendors sticking with ethtool API. 

I feel like whatever we do here will end up looking much like the
ethtool interface, which is why I decided to leave that part out.
Ethtool -S works pretty well for custom stats.  Standard and structured
stats don't fit with it in any way; the two seem best left separate.

> > It would be great if people can get all stats in one place and not
> > rely on another API for 'more'.

One place in the driver or for the user?  I'm happy to add the code to
ethtool to also dump hstats and render them in a standard way.  In fact
the tool I have for testing has a "simplified" output format which
looks exactly like ethtool -S.

One place for the driver to report is hard; as I said, I think the
custom stats are best left with ethtool.  That also adds an extra
incentive to standardize.

> > > For the first stab I looked at two drivers and added all the stats that
> > > were common.
> > >
> > > Given this set is identifying statistics by ID - how would we make that
> > > extensible to drivers?  Would we go back to strings or have some
> > > "driver specific" ID space?  
> >
> > I was looking for ideas from you really, to see if you had considered
> > this. agree per driver ID space seems ugly.
> > ethtool strings are great today...if we can control the duplication.
> > But thinking some more..., i did see some
> > patches recently for vendor specific parameter (with ID) space in
> > devlink. maybe something like that will be
> > reasonable ?

I thought about this for a year and I basically came to the conclusion
I can't find any perfect solution, if there is one.

The devlink parameters are useful, but as anticipated they became the
laziest excuse of an ABI... Don't get me started ;)

> > > Is there any particular type of statistic you'd expect drivers to want
> > > to add?  For NICs I think IEEE/RMON should pretty much cover the
> > > silicon ones, but I don't know much about switches :)  
> >
> > I will have to go through the list. But switch asics do support
> > flexible stats/counters that can be attached at various points.
> > And new chip versions come with more support. Having that flexibility
> > to expose/extend such stats incrementally is very valuable on a per
> > hardware/vendor basis.  

Yes, I'm not too familiar with those counters.  Do they need to be
enabled to start counting?  Do they have a performance impact?  Can they
"sample" events perf-style?  How is the condition on which they trigger
defined?  Is it maybe just "match a packet and increment a counter"?
Would such counters benefit from a hierarchical structure?

I was trying to cover the long-standing use cases - namely the
IEEE/RMON stats which all MACs have had for years and per-queue stats
which all drivers have had for years.  But if we can cater to more
cases I'm open.

> Just want to clarify that I am suggesting a nested HSTATS extension
> infra for drivers (just like ethtool).
> 'Common stats' stays at the top-level.

I have a concept of groups here.  The dump generally looks like this:

[root group A (say MAC stats)]
  [sub group RX]
  [sub group TX]
[root group B (say PCIe stats)]
  [sub group RX]
  [sub group TX]
[root group C (say per-q driver stats)]
  [sub group RX]
    [q1 group]
    [q2 group]
    [q3 group]
  [sub group TX]
    [q1 group]
    [q2 group]
    [q3 group]

Each root group represents a "point in the pipeline".

So it's not too hard to add a root group with whatever, the question
is more how would it benefit over existing ethtool if the stats are
custom anyway?  Hm..
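
To make the group layout above concrete, here is a small Python sketch
that models a dump as nested dictionaries and flattens it into a
one-name-per-line listing in the spirit of ethtool -S.  The group names,
counter names, values and the "." separator are all made up for
illustration; the real dump is a set of nested netlink attributes.

```python
from typing import Any, Dict, Iterator, Tuple


def flatten(node: Dict[str, Any],
            prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[str, int]]:
    """Walk nested groups depth-first and yield (dotted name, value)."""
    for key, value in node.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            # A nested group: descend, carrying the path so far.
            yield from flatten(value, path)
        else:
            # A leaf statistic.
            yield ".".join(path), value


# Root groups are "points in the pipeline", sub groups are directions,
# and the per-queue root group nests one more level.
dump = {
    "mac": {"rx": {"pkts": 15}, "tx": {"pkts": 9}},
    "queues": {
        "rx": {"q1": {"pkts": 10}, "q2": {"pkts": 5}},
        "tx": {"q1": {"pkts": 9}},
    },
}

for name, value in flatten(dump):
    print(f"{name}: {value}")
```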
Roopa Prabhu Feb. 2, 2019, 11:14 p.m. UTC | #6
On Thu, Jan 31, 2019 at 11:31 AM Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> On Thu, 31 Jan 2019 08:31:51 -0800, Roopa Prabhu wrote:
> > On Thu, Jan 31, 2019 at 8:16 AM Roopa Prabhu wrote:
> > > On Wed, Jan 30, 2019 at 4:24 PM Jakub Kicinski wrote:
> > > > On Wed, 30 Jan 2019 14:14:34 -0800, Roopa Prabhu wrote:
> > > >
> > > > My thinking was that we should leave truly custom/strange stats to
> > > > ethtool API which works quite well for that and at the same time be
> > > > very accepting of people adding new IDs to HSTAT (only requirement is
> > > > basically defining the meaning very clearly).
> > >
> > > that sounds reasonable. But the 'defining meaning clearly' gets tricky
> > > sometimes.
> > > The vendor who gets their ID or meaning first wins :) and the rest
> > > will have to live with
> > > ethtool and explain to rest of the world that ethtool is more reliable
> > > for their hardware :)
>
> Right, that's the trade off inherent to standardization.  I don't see
> any way to work around the fact that the definition may not fit all.
>
> What I want as a end user and what I want for my customers is the
> ability to switch the NIC on their system and not spend two months
> "integrating" into their automation :(  If the definition of statistics
> is not solid we're back to square one.

agree. And I am with you on standardizing them.

>
> > > I am also concerned that this getting the ID into common HSTAT ID
> > > space will  slow down the process of adding new counters
> > > for vendors. Which will lead to vendors sticking with ethtool API.
>
> I feel like whatever we did here will end up looking much like the
> ethtool interface, which is why I decided to leave that part out.
> Ethtool -S works pretty well for custom stats.  Standard and structured
> stats don't fit with it in any way, the two seem best left separate.

understand the fear. My only point was getting them together in a
single API is better so that they don't get developed separately and
we don't end up with duplicate stats code.

>
> > > It would be great if people can get all stats in one place and not
> > > rely on another API for 'more'.
>
> One place in the driver or for the user?  I'm happy to add the code to
> ethtool to also dump hstats and render them in a standard way.  In fact
> the tool I have for testing has a "simplified" output format which
> looks exactly like ethtool -S.
>
> One place for the driver to report is hard, as I said I think the
> custom stats are best left with ethtool.  Adding an extra incentive to
> standardize.
>
> > > > For the first stab I looked at two drivers and added all the stats that
> > > > were common.
> > > >
> > > > Given this set is identifying statistics by ID - how would we make that
> > > > extensible to drivers?  Would we go back to strings or have some
> > > > "driver specific" ID space?
> > >
> > > I was looking for ideas from you really, to see if you had considered
> > > this. agree per driver ID space seems ugly.
> > > ethtool strings are great today...if we can control the duplication.
> > > But thinking some more..., i did see some
> > > patches recently for vendor specific parameter (with ID) space in
> > > devlink. maybe something like that will be
> > > reasonable ?
>
> I thought about this for a year and I basically came to the conclusion
> I can't find any perfect solution, if there is one.
>
> The devlink parameters are useful, but as anticipated they became the
> laziest excuse of an ABI... Don't get me started ;)
>
> > > > Is there any particular type of statistic you'd expect drivers to want
> > > > to add?  For NICs I think IEEE/RMON should pretty much cover the
> > > > silicon ones, but I don't know much about switches :)
> > >
> > > I will have to go through the list. But switch asics do support
> > > flexible stats/counters that can be attached at various points.
> > > And new chip versions come with more support. Having that flexibility
> > > to expose/extend such stats incrementally is very valuable on a per
> > > hardware/vendor basis.
>
> Yes, I'm not too familiar with those counters.  Do they need to be
> enabled to start counting?

yes correct.

> Do they have performance impact?

I have not heard of any performance impact...but they are not enabled
by default because of limited counter resource pool.

> Can they
> "sample" events perf-style?

I don't think so

> How is the condition on which they trigger
> defined?  Is it maybe just "match a packet and increment a counter"?

yes, something like that.

> Would such counters benefit from hierarchical structure?

hmm not sure.


One thing though, for most of these flexible counters and their
attachment points in hardware, we can count them on logical devices or
other objects in software like per vlan, vni, route stats etc.

>
> I was trying to cover the long standing use cases - namely the
> IEEE/RMON stats which all MACs have had for years and per queue stats
> which all drivers have had for years.  But if we can cater to more
> cases, I'm open.
>
> > Just want to clarify that I am suggesting a nested HSTATS extension
> > infra for drivers (just like ethtool).
> > 'Common stats' stays at the top-level.
>
> I got a concept of groups here.  The dump generally looks like this:
>
> [root group A (say MAC stats)]
>   [sub group RX]
>   [sub group TX]
> [root group B (say PCIe stats)]
>   [sub group RX]
>   [sub group TX]
> [root group C (say per-q driver stats)]
>   [sub group RX]
>     [q1 group]
>     [q2 group]
>     [q3 group]
>   [sub group TX]
>     [q1 group]
>     [q2 group]
>     [q3 group]
>
> Each root group representing a "point in the pipeline".
>
> So it's not too hard to add a root group with whatever, the question
> is more how it would benefit over existing ethtool if the stats are
> custom anyway?  Hm..

It wouldn't. I am only saying that the netlink stats API is the new
place to move stats.
Ethtool stats will have to move to netlink some day and I don't see a
need to draw a hard line on saying no to ethtool custom stats moving to
the netlink based common stats API. And unless there is a good
migration path for a new hardware stats API that is all inclusive,
there is a higher chance of continued development on the older
hardware stats API.
I have no objections to having a standard set of stats (this is
essentially what we have for software stats too).

I don't want to block your series from going forward without HW custom
stats extensions. And IIUC your API is extensible and does not
preclude anyone from adding the ability to include HW custom stats
extensions in the future with enough justification. That is good for
me.

To take a random example, we expose the following stats on our
switches via ethtool. I have not used them personally but they
correspond to respective hardware counters. Is there any room for such
stats in the new HSTATS netlink API or will they have to stick to
ethtool?
I believe people will need per-queue counters for this.

     HwIfOutWredDrops: 0
     HwIfOutQ0WredDrops: 0
     HwIfOutQ1WredDrops: 0
     HwIfOutQ2WredDrops: 0
     HwIfOutQ3WredDrops: 0
     HwIfOutQ4WredDrops: 0
     HwIfOutQ5WredDrops: 0
     HwIfOutQ6WredDrops: 0
     HwIfOutQ7WredDrops: 0
     HwIfOutQ8WredDrops: 0
     HwIfOutQ9WredDrops: 0
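
To make the per-queue question concrete, here is a minimal sketch of how a
tool could split flat ethtool-style names like the ones above into a
port-level counter plus a per-queue group.  The regular expression and the
`split_wred` helper are hypothetical, and the sample values are invented
(the real dump above shows all zeros).

```python
import re
from typing import Dict, Tuple

# Matches only the per-queue variant, e.g. "HwIfOutQ3WredDrops".
PER_QUEUE = re.compile(r"^HwIfOutQ(\d+)WredDrops$")


def split_wred(flat: Dict[str, int]) -> Tuple[int, Dict[int, int]]:
    """Return (port-level drops, {queue index: drops}) from flat names."""
    port_total = flat["HwIfOutWredDrops"]
    per_queue: Dict[int, int] = {}
    for name, value in flat.items():
        match = PER_QUEUE.match(name)
        if match:
            per_queue[int(match.group(1))] = value
    return port_total, per_queue


# Invented sample values, for illustration only.
flat = {
    "HwIfOutWredDrops": 6,
    "HwIfOutQ0WredDrops": 1,
    "HwIfOutQ1WredDrops": 2,
    "HwIfOutQ2WredDrops": 3,
}
port_total, per_queue = split_wred(flat)
print(port_total, per_queue)
```

With a hierarchical API the queue index would be carried by the group
structure, so this kind of name parsing in user space would not be needed.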
Jakub Kicinski Feb. 6, 2019, 4:45 a.m. UTC | #7
On Sat, 2 Feb 2019 15:14:44 -0800, Roopa Prabhu wrote:
> On Thu, Jan 31, 2019 at 11:31 AM Jakub Kicinski wrote:
> > On Thu, 31 Jan 2019 08:31:51 -0800, Roopa Prabhu wrote:  
> > > On Thu, Jan 31, 2019 at 8:16 AM Roopa Prabhu wrote:  
> > > > On Wed, Jan 30, 2019 at 4:24 PM Jakub Kicinski wrote:  
> > > > > On Wed, 30 Jan 2019 14:14:34 -0800, Roopa Prabhu wrote:
> > > > >
> > > > > My thinking was that we should leave truly custom/strange stats to
> > > > > ethtool API which works quite well for that and at the same time be
> > > > > very accepting of people adding new IDs to HSTAT (only requirement is
> > > > > basically defining the meaning very clearly).  
> > > >
> > > > that sounds reasonable. But the 'defining meaning clearly' gets tricky
> > > > sometimes.
> > > > The vendor who gets their ID or meaning first wins :) and the rest
> > > > will have to live with
> > > > ethtool and explain to rest of the world that ethtool is more reliable
> > > > for their hardware :)  
> >
> > Right, that's the trade off inherent to standardization.  I don't see
> > any way to work around the fact that the definition may not fit all.
> >
> > What I want as a end user and what I want for my customers is the
> > ability to switch the NIC on their system and not spend two months
> > "integrating" into their automation :(  If the definition of statistics
> > is not solid we're back to square one.  
> 
> agree. And I am with you on standardizing them.
> 
> >  
> > > > I am also concerned that this getting the ID into common HSTAT ID
> > > > space will  slow down the process of adding new counters
> > > > for vendors. Which will lead to vendors sticking with ethtool API.  
> >
> > I feel like whatever we did here will end up looking much like the
> > ethtool interface, which is why I decided to leave that part out.
> > Ethtool -S works pretty well for custom stats.  Standard and structured
> > stats don't fit with it in any way, the two seem best left separate.  
> 
> understand the fear. My only point was getting them together in a
> single API is better so that they don't get developed separately and
> we don't end up with duplicate stats code.
> 
> >  
> > > > It would be great if people can get all stats in one place and not
> > > > rely on another API for 'more'.  
> >
> > One place in the driver or for the user?  I'm happy to add the code to
> > ethtool to also dump hstats and render them in a standard way.  In fact
> > the tool I have for testing has a "simplified" output format which
> > looks exactly like ethtool -S.
> >
> > One place for the driver to report is hard, as I said I think the
> > custom stats are best left with ethtool.  Adding an extra incentive to
> > standardize.
> >  
> > > > > For the first stab I looked at two drivers and added all the stats that
> > > > > were common.
> > > > >
> > > > > Given this set is identifying statistics by ID - how would we make that
> > > > > extensible to drivers?  Would we go back to strings or have some
> > > > > "driver specific" ID space?  
> > > >
> > > > I was looking for ideas from you really, to see if you had considered
> > > > this. agree per driver ID space seems ugly.
> > > > ethtool strings are great today...if we can control the duplication.
> > > > But thinking some more..., i did see some
> > > > patches recently for vendor specific parameter (with ID) space in
> > > > devlink. maybe something like that will be
> > > > reasonable ?  
> >
> > I thought about this for a year and I basically came to the conclusion
> > I can't find any perfect solution, if there is one.
> >
> > The devlink parameters are useful, but as anticipated they became the
> > laziest excuse of an ABI... Don't get me started ;)
> >  
> > > > > Is there any particular type of statistic you'd expect drivers to want
> > > > > to add?  For NICs I think IEEE/RMON should pretty much cover the
> > > > > silicon ones, but I don't know much about switches :)  
> > > >
> > > > I will have to go through the list. But switch asics do support
> > > > flexible stats/counters that can be attached at various points.
> > > > And new chip versions come with more support. Having that flexibility
> > > > to expose/extend such stats incrementally is very valuable on a per
> > > > hardware/vendor basis.  
> >
> > Yes, I'm not too familiar with those counters.  Do they need to be
> > enabled to start counting?  
> 
> yes correct.
> 
> > Do they have performance impact?  
> 
> I have not heard of any performance impact...but they are not enabled
> by default because of limited counter resource pool.

I see.. I'd personally see that as something that we probably either
support via perf, or new devlink perf creation.  Those are perf events,
not stats to me.  Devlink would probably suit fixed HW better, and
perf could feel slightly more natural to certain NICs (*ekhm* perf
traces of offloaded BPF programs).

> > Can they
> > "sample" events perf-style?
> 
> I don't think so
> 
> > How is the condition on which they trigger
> > defined?  Is it maybe just "match a packet and increment a counter"?  
> 
> yes, something like that.
> 
> > Would such counters benefit from hierarchical structure?  
> 
> hmm not sure.
> 
> 
> One thing though, for most of these flexible counters and their
> attachment points in hardware, we can count them on logical devices or
> other objects in software like per vlan, vni, route stats etc.
> 
> >
> > I was trying to cover the long standing use cases - namely the
> > IEEE/RMON stats which all MACs have had for years and per queue stats
> > which all drivers have had for years.  But if we can cater to more
> > cases, I'm open.
> >  
> > > Just want to clarify that I am suggesting a nested HSTATS extension
> > > infra for drivers (just like ethtool).
> > > 'Common stats' stays at the top-level.  
> >
> > I got a concept of groups here.  The dump generally looks like this:
> >
> > [root group A (say MAC stats)]
> >   [sub group RX]
> >   [sub group TX]
> > [root group B (say PCIe stats)]
> >   [sub group RX]
> >   [sub group TX]
> > [root group C (say per-q driver stats)]
> >   [sub group RX]
> >     [q1 group]
> >     [q2 group]
> >     [q3 group]
> >   [sub group TX]
> >     [q1 group]
> >     [q2 group]
> >     [q3 group]
> >
> > Each root group representing a "point in the pipeline".
> >
> > So it's not too hard to add a root group with whatever, the question
> > is more how it would benefit over existing ethtool if the stats are
> > custom anyway?  Hm..
> 
> It wouldn't. I am only saying that the netlink stats API is the new
> place to move stats.
> Ethtool stats will have to move to netlink some day and I don't see a
> need to draw a hard line on saying no to ethtool custom stats moving to
> the netlink based common stats API. And unless there is a good
> migration path for a new hardware stats API that is all inclusive,
> there is a higher chance of continued development on the older
> hardware stats API.
> I have no objections to having a standard set of stats (this is
> essentially what we have for software stats too).
> 
> I don't want to block your series from going forward without HW custom
> stats extensions. And IIUC your API is extensible and does not
> preclude anyone from adding the ability to include HW custom stats
> extensions in the future with enough justification. That is good for
> me.

Would you be more interested in seeing the similarity in API on the
driver side or on the netlink side?  I was hoping to leave the legacy
stats in ethtool (soon to be running over netlink as well) for the
time being.  I wish we had some form of library on the iproute2 side 
we could evolve together with the kernel libbpf-style :(

> To take a random example, we expose the following stats on our
> switches via ethtool. I have not used them personally but they
> correspond to respective hardware counters. Is there any room for such
> stats in the new HSTATS netlink API or will they have to stick to
> ethtool?
> I believe people will need per-queue counters for this.
> 
>      HwIfOutWredDrops: 0
>      HwIfOutQ0WredDrops: 0
>      HwIfOutQ1WredDrops: 0
>      HwIfOutQ2WredDrops: 0
>      HwIfOutQ3WredDrops: 0
>      HwIfOutQ4WredDrops: 0
>      HwIfOutQ5WredDrops: 0
>      HwIfOutQ6WredDrops: 0
>      HwIfOutQ7WredDrops: 0
>      HwIfOutQ8WredDrops: 0
>      HwIfOutQ9WredDrops: 0

Well, yes, these are clearly enough defined stats, and I'd be very
happy to add an ID for you for those... unless they should rather be
reported in the stats of the tc RED qdisc, which is what should be used
to configure WRED :(
Florian Fainelli Feb. 6, 2019, 8:12 p.m. UTC | #8
On 1/28/19 3:44 PM, Jakub Kicinski wrote:
> Hi!
> 
> As I tried to explain in my slides at netconf 2018 we are lacking
> an expressive, standard API to report device statistics.
> 
> Networking silicon generally maintains some IEEE 802.3 and/or RMON
> statistics.  Today those all end up in ethtool -S.  Here is a simple
> attempt (admittedly very imprecise) of counting how many names driver
> authors invented for IETF RFC2819 etherStatsPkts512to1023Octets
> statistics (RX and TX):
> 
> $ git grep '".*512.*1023.*"' -- drivers/net/ | \
>     sed -e 's/.*"\(.*\)".*/\1/' | sort | uniq | wc -l
> 63
> 
> Interestingly only two drivers in the tree use the name the standard
> gave us (etherStatsPkts512to1023, modulo case).
> 
> I set out to working on this set in an attempt to give drivers a way
> to express clearly to user space standard-compliant counters.
> 
> Second most common use for custom statistics is per-queue counters.
> This is where the "hierarchical" part of this set comes in, as
> groups can be nested, and user space tools can handle the aggregation
> inside the groups if needed.
> 
> This set also tries to address the problem of users not knowing if
> a statistic is reported by hardware or the driver.  Many modern drivers
> use some prefix in ethtool -S to indicate MAC/PHY stats.  At a quick
> glance: Netronome uses "mac.", Intel "port." and Mellanox "_phy".
> In this set, netlink attributes describe whether a group of statistics
> is RX or TX, maintained by device or driver.
> 
> The purpose of this patch set is _not_ to replace ethtool -S.  It is
> an incredibly useful tool, and we will certainly continue using it.
> However, for standard-based and commonly maintained statistics a more
> structured API seems warranted.
> 
> There are two things missing from these patches, which I initially
> planned to address as well: filtering, and refresh rate control.
> 
> Filtering doesn't need much explanation, users should be able to request
> only a subset of statistics (like only SW stats or only given ID).  The
> bitmap of statistics in each group is there for filtering later on.
> 
> By refresh control I mean the ability for user space to indicate how
> "fresh" values it expects.  Sometimes reading the HW counters requires
> slow register reads or FW communication, in such cases drivers may cache
> the result.  (Privileged) user space should be able to add a "not older
> than" timestamp to indicate how fresh statistics it expects.  And vice
> versa, drivers can then also put the timestamp of when the statistics
> were last refreshed in the dump for more precise bandwidth estimation.

Another thing that we cannot quite do with ethtool right now, at least
not easily, is something like the following use case.

You have some filtering/classification capable hardware, and the HW can
count the number of times a rule has been hit/missed. The number of
rules programmed into the HW is dynamic and depends on use case so
dumping them all is not convenient with, e.g., hundreds/thousands of rules.

You would want to return only the rules that are active/enabled, and not
the full possible range of rules. With ethtool, this is not possible
because you have to define the strings first, and in a second call, you
are going to get the dump and fill in the data returned to user-space...
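
A small sketch of the difference described above, under stated
assumptions: `two_step_dump` models only the shape of a "strings first,
values second" interface over a fixed-size rule table, and
`active_only_dump` models a self-describing dump that carries the rule ID
with each value.  Both helpers, the `rule_<n>_hits` naming and the table
size of 1024 are hypothetical and not taken from ethtool or the series.

```python
from typing import Dict, List, Tuple


def two_step_dump(hits: Dict[int, int],
                  table_size: int) -> Tuple[List[str], List[int]]:
    """Fixed layout: one name and one value per possible rule slot."""
    names = [f"rule_{i}_hits" for i in range(table_size)]
    values = [hits.get(i, 0) for i in range(table_size)]
    return names, values


def active_only_dump(hits: Dict[int, int]) -> List[Tuple[int, int]]:
    """Self-describing layout: (rule id, hit count) for active rules only."""
    return [(rule_id, hits[rule_id]) for rule_id in sorted(hits)]


# Three rules currently programmed out of a hypothetical 1024-entry table.
active = {3: 17, 42: 0, 900: 5}

names, values = two_step_dump(active, table_size=1024)
compact = active_only_dump(active)
print(len(names), len(compact))
```

The second form also avoids having to keep a name table and a value array
consistent across two separate calls while the rule set is changing.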

I will review more in depth, but the idea looks great so far.

> 
> Jakub Kicinski (14):
>   nfp: remove unused structure
>   nfp: constify parameter to nfp_port_from_netdev()
>   net: hstats: add basic/core functionality
>   net: hstats: allow hierarchies to be built
>   nfp: very basic hstat support
>   net: hstats: allow iterators
>   net: hstats: help in iteration over directions
>   nfp: hstats: make use of iteration for direction
>   nfp: hstats: add driver and device per queue statistics
>   net: hstats: add IEEE 802.3 and common IETF MIB/RMON stats
>   nfp: hstats: add IEEE/RMON ethernet port/MAC stats
>   net: hstats: add markers for partial groups
>   nfp: hstats: add a partial group of per-8021Q prio stats
>   Documentation: networking: describe new hstat API
> 
>  Documentation/networking/hstats.rst           | 590 +++++++++++++++
>  .../networking/hstats_flow_example.dot        |  11 +
>  Documentation/networking/index.rst            |   1 +
>  drivers/net/ethernet/netronome/nfp/Makefile   |   1 +
>  .../net/ethernet/netronome/nfp/nfp_hstat.c    | 474 ++++++++++++
>  drivers/net/ethernet/netronome/nfp/nfp_main.c |   1 +
>  drivers/net/ethernet/netronome/nfp/nfp_main.h |   2 +
>  drivers/net/ethernet/netronome/nfp/nfp_net.h  |  10 +-
>  .../ethernet/netronome/nfp/nfp_net_common.c   |   1 +
>  .../net/ethernet/netronome/nfp/nfp_net_repr.h |   2 +-
>  drivers/net/ethernet/netronome/nfp/nfp_port.c |   2 +-
>  drivers/net/ethernet/netronome/nfp/nfp_port.h |   2 +-
>  include/linux/netdevice.h                     |   9 +
>  include/net/hstats.h                          | 176 +++++
>  include/uapi/linux/if_link.h                  | 107 +++
>  net/core/Makefile                             |   2 +-
>  net/core/hstats.c                             | 682 ++++++++++++++++++
>  net/core/rtnetlink.c                          |  21 +
>  18 files changed, 2084 insertions(+), 10 deletions(-)
>  create mode 100644 Documentation/networking/hstats.rst
>  create mode 100644 Documentation/networking/hstats_flow_example.dot
>  create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_hstat.c
>  create mode 100644 include/net/hstats.h
>  create mode 100644 net/core/hstats.c
>
Jakub Kicinski Feb. 7, 2019, 4:23 p.m. UTC | #9
On Wed, 6 Feb 2019 12:12:39 -0800, Florian Fainelli wrote:
> > By refresh control I mean the ability for user space to indicate how
> > "fresh" values it expects.  Sometimes reading the HW counters requires
> > slow register reads or FW communication, in such cases drivers may cache
> > the result.  (Privileged) user space should be able to add a "not older
> > than" timestamp to indicate how fresh statistics it expects.  And vice
> > versa, drivers can then also put the timestamp of when the statistics
> > were last refreshed in the dump for more precise bandwidth estimation.  
> 
> Another thing that we cannot quite do with ethtool right now, at least
> not easily, is something like the following use case.
> 
> You have some filtering/classification capable hardware, and the HW can
> count the number of times a rule has been hit/missed. The number of
> rules programmed into the HW is dynamic and depends on use case so
> dumping them all is not convenient with, e.g., hundreds/thousands of rules.

That raises the inevitable question of what is the source of the rules
i.e. which API has been used to configure them?

> You would want to return only the rules that are active/enabled, and not
> the full possible range of rules. With ethtool, this is not possible
> because you have to define the strings first, and in a second call, you
> are going to get the dump and fill in the data returned to user-space...

Interesting, if the driver is caching the stats it can remember both
last refresh and last change and return only the statistics which
changed since time X.  Would the "last changed" time stamp be of any
use to user space?  Probably not, right?
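
For illustration, a minimal sketch of such a cache, assuming a
hypothetical `read_hw` callback standing in for the slow register or FW
read.  It tracks one "last refresh" time for the whole cache and a "last
changed" time per statistic, honours a "not older than" request, and can
return only the statistics that changed since time X.  None of these
names come from the series.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class CachedStat:
    value: int
    last_changed: float  # time of the refresh that last saw a new value


class StatsCache:
    def __init__(self, read_hw: Callable[[], Dict[str, int]]):
        self._read_hw = read_hw          # hypothetical slow HW/FW read
        self._stats: Dict[str, CachedStat] = {}
        self.last_refresh: Optional[float] = None

    def refresh(self, now: float) -> None:
        for name, value in self._read_hw().items():
            old = self._stats.get(name)
            if old is None or old.value != value:
                self._stats[name] = CachedStat(value, now)
        self.last_refresh = now

    def dump(self, now: float,
             not_older_than: Optional[float] = None,
             changed_since: Optional[float] = None) -> Dict[str, int]:
        # Refresh on first use, or when the cache is older than requested.
        stale = self.last_refresh is None or (
            not_older_than is not None and self.last_refresh < not_older_than)
        if stale:
            self.refresh(now)
        return {name: stat.value for name, stat in self._stats.items()
                if changed_since is None or stat.last_changed > changed_since}


# Fake hardware counters; values are invented.
hw = {"rx_pkts": 1, "tx_pkts": 1}
cache = StatsCache(lambda: dict(hw))

first = cache.dump(now=1.0)                        # populates the cache
hw["rx_pkts"] = 5
cached = cache.dump(now=2.0)                       # served from the cache
fresh = cache.dump(now=3.0, not_older_than=2.5,    # forces a refresh
                   changed_since=1.0)
print(first, cached, fresh)
```

Note that "last changed" here has only the resolution of the refresh
interval, which may limit how useful it is to user space.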

> I will review more in depth, but the idea looks great so far.

Thanks!