
[net-next,00/10] net: allow user specify TC filter HW stats type

Message ID 20200221095643.6642-1-jiri@resnulli.us

Message

Jiri Pirko Feb. 21, 2020, 9:56 a.m. UTC
From: Jiri Pirko <jiri@mellanox.com>

Currently, when user adds a TC filter and the filter gets offloaded,
the user expects the HW stats to be counted and included in stats dump.
However, since drivers may implement different types of counting, there
is no way to specify which one the user is interested in.

For example for mlx5, only delayed counters are available as the driver
periodically polls for updated stats.

In case of mlxsw, the counters are queried at dump time. However, the
HW resources for this type of counter are quite limited (a couple of
thousand). This significantly limits the number of offloaded filters
that can be supported. Without a counter assigned, the HW is capable
of carrying millions of them.

On top of that, mlxsw HW is able to support delayed counters as well in
greater numbers. That is going to be added in a follow-up patch.

This patchset allows the user to specify one of the following types of HW
stats for an added filter:
any - current default, user does not care about the type, just expects
      any type of stats.
immediate - queried during dump time
delayed - polled from HW periodically or sent by HW in async manner
disabled - no stats needed

Examples:
$ tc filter add dev enp0s16np28 ingress proto ip handle 1 pref 1 flower hw_stats disabled dst_ip 192.168.1.1 action drop
$ tc -s filter show dev enp0s16np28 ingress
filter protocol ip pref 1 flower chain 0 
filter protocol ip pref 1 flower chain 0 handle 0x1 
  eth_type ipv4
  dst_ip 192.168.1.1
  in_hw in_hw_count 2
  hw_stats disabled
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1 installed 10 sec used 2 sec
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
        backlog 0b 0p requeues 0

$ tc filter add dev enp0s16np28 ingress proto ip handle 1 pref 1 flower hw_stats immediate dst_ip 192.168.1.1 action drop
$ tc -s filter show dev enp0s16np28 ingress
filter protocol ip pref 1 flower chain 0 
filter protocol ip pref 1 flower chain 0 handle 0x1 
  eth_type ipv4
  dst_ip 192.168.1.1
  in_hw in_hw_count 2
  hw_stats immediate
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1 installed 14 sec used 7 sec
        Action statistics:
        Sent 102 bytes 1 pkt (dropped 1, overlimits 0 requeues 0) 
        Sent software 0 bytes 0 pkt
        Sent hardware 102 bytes 1 pkt
        backlog 0b 0p requeues 0
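For illustration, the four hw_stats types above can be modelled as capability bits that a driver advertises and a request is matched against. This is a hedged sketch with invented names and values; the actual uapi encoding in the patches may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit encoding of the four types from the cover letter;
 * the real uapi names and values may differ. */
enum hw_stats_type {
	HW_STATS_DISABLED  = 0,      /* no stats needed, no counter allocated */
	HW_STATS_IMMEDIATE = 1 << 0, /* queried at dump time */
	HW_STATS_DELAYED   = 1 << 1, /* polled/pushed by HW periodically */
	HW_STATS_ANY       = HW_STATS_IMMEDIATE | HW_STATS_DELAYED,
};

/* Can a driver that provides only the "supported" types satisfy a
 * request? "any" is satisfied by whatever the driver offers, and
 * "disabled" is always satisfiable since no counter is needed. */
static bool hw_stats_request_ok(unsigned int requested, unsigned int supported)
{
	if (requested == HW_STATS_DISABLED)
		return true;
	return (requested & supported) != 0;
}
```

Under this model, mlx5 (delayed only) would accept "any" and "delayed" requests but reject "immediate".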

Jiri Pirko (10):
  net: rename tc_cls_can_offload_and_chain0() to
    tc_cls_can_offload_basic()
  iavf: use tc_cls_can_offload_basic() instead of chain check
  flow_offload: Introduce offload of HW stats type
  net: extend tc_cls_can_offload_basic() to check HW stats type
  mlx5: restrict supported HW stats type to "any"
  mlxsw: restrict supported HW stats type to "any"
  flow_offload: introduce "immediate" HW stats type and allow it in
    mlxsw
  flow_offload: introduce "delayed" HW stats type and allow it in mlx5
  flow_offload: introduce "disabled" HW stats type and allow it in mlxsw
  sched: cls_flower: allow user to specify type of HW stats for a filter

 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |  2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c |  2 +-
 .../net/ethernet/chelsio/cxgb4/cxgb4_main.c   |  4 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c   |  2 +-
 drivers/net/ethernet/intel/iavf/iavf_main.c   |  8 ++--
 drivers/net/ethernet/intel/igb/igb_main.c     |  2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  2 +-
 .../net/ethernet/mellanox/mlx5/core/en_tc.c   |  8 +++-
 .../net/ethernet/mellanox/mlxsw/spectrum.c    |  3 +-
 .../ethernet/mellanox/mlxsw/spectrum_flower.c | 31 +++++++++++-----
 drivers/net/ethernet/mscc/ocelot_flower.c     |  2 +-
 drivers/net/ethernet/mscc/ocelot_tc.c         |  2 +-
 drivers/net/ethernet/netronome/nfp/abm/cls.c  |  2 +-
 drivers/net/ethernet/netronome/nfp/bpf/main.c |  2 +-
 .../ethernet/netronome/nfp/flower/offload.c   |  2 +-
 drivers/net/ethernet/qlogic/qede/qede_main.c  |  2 +-
 .../net/ethernet/stmicro/stmmac/stmmac_main.c |  2 +-
 drivers/net/netdevsim/bpf.c                   |  2 +-
 include/net/flow_offload.h                    |  8 ++++
 include/net/pkt_cls.h                         | 37 ++++++++++++++++++-
 include/uapi/linux/pkt_cls.h                  | 27 ++++++++++++++
 net/sched/cls_flower.c                        | 12 ++++++
 22 files changed, 132 insertions(+), 32 deletions(-)

Comments

Jakub Kicinski Feb. 21, 2020, 6:22 p.m. UTC | #1
On Fri, 21 Feb 2020 10:56:33 +0100 Jiri Pirko wrote:
> From: Jiri Pirko <jiri@mellanox.com>
> 
> Currently, when user adds a TC filter and the filter gets offloaded,
> the user expects the HW stats to be counted and included in stats dump.
> However, since drivers may implement different types of counting, there
> is no way to specify which one the user is interested in.
> 
> For example for mlx5, only delayed counters are available as the driver
> periodically polls for updated stats.
> 
> In case of mlxsw, the counters are queried on dump time. However, the
> HW resources for this type of counters is quite limited (couple of
> thousands). This limits the amount of supported offloaded filters
> significantly. Without counter assigned, the HW is capable to carry
> millions of those.
> 
> On top of that, mlxsw HW is able to support delayed counters as well in
> greater numbers. That is going to be added in a follow-up patch.
> 
> This patchset allows user to specify one of the following types of HW
> stats for added filter:
> any - current default, user does not care about the type, just expects
>       any type of stats.
> immediate - queried during dump time
> delayed - polled from HW periodically or sent by HW in async manner
> disabled - no stats needed

Hmm, but the statistics are on actions, it feels a little bit like we
are perpetuating the mistake of counting on filters here.

Would it not work to share actions between filters which don't need
fine grained stats if HW can do more filters than stats?

Let's CC Ed on this.
Jiri Pirko Feb. 22, 2020, 6:38 a.m. UTC | #2
Fri, Feb 21, 2020 at 07:22:00PM CET, kuba@kernel.org wrote:
>On Fri, 21 Feb 2020 10:56:33 +0100 Jiri Pirko wrote:
>> From: Jiri Pirko <jiri@mellanox.com>
>> 
>> Currently, when user adds a TC filter and the filter gets offloaded,
>> the user expects the HW stats to be counted and included in stats dump.
>> However, since drivers may implement different types of counting, there
>> is no way to specify which one the user is interested in.
>> 
>> For example for mlx5, only delayed counters are available as the driver
>> periodically polls for updated stats.
>> 
>> In case of mlxsw, the counters are queried on dump time. However, the
>> HW resources for this type of counters is quite limited (couple of
>> thousands). This limits the amount of supported offloaded filters
>> significantly. Without counter assigned, the HW is capable to carry
>> millions of those.
>> 
>> On top of that, mlxsw HW is able to support delayed counters as well in
>> greater numbers. That is going to be added in a follow-up patch.
>> 
>> This patchset allows user to specify one of the following types of HW
>> stats for added filter:
>> any - current default, user does not care about the type, just expects
>>       any type of stats.
>> immediate - queried during dump time
>> delayed - polled from HW periodically or sent by HW in async manner
>> disabled - no stats needed
>
>Hmm, but the statistics are on actions, it feels a little bit like we
>are perpetuating the mistake of counting on filters here.

You are right, the stats in tc are per-action. However, in both mlxsw
and mlx5, the stats are per-filter. What hw_stats does is
basically allow the user to set the type for all the actions in the
filter at once.

Could you imagine a use case that would use different HW stats types for
different actions in one action chain?

Plus, if the fine-grained setup is ever needed, hw_stats could in the
future easily be extended to be specified per-action too, overriding
the per-filter setup only for a specific action. What do you think?
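The fallback Jiri describes here, a per-action value overriding a per-filter default, could be sketched as below. The sentinel and function names are invented for illustration and are not taken from the patchset.

```c
#include <assert.h>

/* Hypothetical sentinel meaning "this action did not set its own type". */
#define HW_STATS_UNSET (-1)

/* Resolve the effective stats type for one action: a per-action
 * setting, when present, overrides the per-filter default. */
static int effective_hw_stats(int filter_hw_stats, int action_hw_stats)
{
	if (action_hw_stats != HW_STATS_UNSET)
		return action_hw_stats;
	return filter_hw_stats;
}
```

This keeps today's single per-filter knob working unchanged while leaving room for per-action overrides later.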


>
>Would it not work to share actions between filters which don't need
>fine grained stats if HW can do more filters than stats?

For mlxsw it would work if the action chains were identical. For
that you don't need per-action granularity when setting the hw_stats type.

>
>Let's CC Ed on this.
>
Edward Cree Feb. 24, 2020, 11:38 a.m. UTC | #3
On 22/02/2020 06:38, Jiri Pirko wrote:
> Fri, Feb 21, 2020 at 07:22:00PM CET, kuba@kernel.org wrote:
>> Hmm, but the statistics are on actions, it feels a little bit like we
>> are perpetuating the mistake of counting on filters here.
> You are right, the stats in tc are per-action. However, in both mlxsw
> and mlx5, the stats are per-filter. What hw_stats does is that it
> basically allows the user to set the type for all the actions in the
> filter at once.
>
> Could you imagine a use case that would use different HW stats types for
> different actions in one action-chain?
Potentially a user could only want stats on one action and disable them
 on another.  For instance, if their action chain does delivery to the
 'customer' and also mirrors the packets for monitoring, they might only
 want stats on the first delivery (e.g. for billing) and not want to
 waste a counter on the mirror.

> Plus, if the fine grained setup is ever needed, the hw_stats could be in
> future easily extended to be specified per-action too, overriding
> the per-filter setup only for specific action. What do you think?
I think this is improper; the stats type should be defined per-action in
 the uapi, even if userland initially only has UI to set it across the
 entire filter.  (I imagine `tc action` would grow the corresponding UI
 pretty quickly.)  Then on the driver side, you must determine whether
 your hardware can support the user's request, and if not, return an
 appropriate error code.
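The driver-side check Edward Cree describes, validating each action's requested stats type against hardware capabilities and rejecting the offload with an appropriate error, might look roughly like this. The struct and function names are illustrative only, not the kernel's actual offload callback.

```c
#include <assert.h>
#include <errno.h> /* EOPNOTSUPP */

/* Toy stand-in for an action in an offload request; only the field
 * relevant here is modelled. */
struct fake_action {
	unsigned int hw_stats; /* requested type bits; 0 == disabled */
};

/* Walk the action chain; any action whose requested stats type the
 * hardware cannot provide causes the whole offload to be rejected. */
static int offload_check_hw_stats(const struct fake_action *acts, int n,
				  unsigned int hw_supported)
{
	int i;

	for (i = 0; i < n; i++) {
		unsigned int req = acts[i].hw_stats;

		/* "disabled" (0) never needs a counter; anything else
		 * must overlap with what the hardware can provide. */
		if (req && !(req & hw_supported))
			return -EOPNOTSUPP;
	}
	return 0;
}
```

In the kernel the rejection would typically also set an extack message so userspace sees why the filter stayed in software.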

Let's try not to encourage people (users, future HW & driver developers)
 to keep thinking of stats as per-filter entities.

-ed
Jiri Pirko Feb. 24, 2020, 1:11 p.m. UTC | #4
Mon, Feb 24, 2020 at 12:38:20PM CET, ecree@solarflare.com wrote:
>On 22/02/2020 06:38, Jiri Pirko wrote:
>> Fri, Feb 21, 2020 at 07:22:00PM CET, kuba@kernel.org wrote:
>>> Hmm, but the statistics are on actions, it feels a little bit like we
>>> are perpetuating the mistake of counting on filters here.
>> You are right, the stats in tc are per-action. However, in both mlxsw
>> and mlx5, the stats are per-filter. What hw_stats does is that it
>> basically allows the user to set the type for all the actions in the
>> filter at once.
>>
>> Could you imagine a use case that would use different HW stats types for
>> different actions in one action-chain?
>Potentially a user could only want stats on one action and disable them
> on another.  For instance, if their action chain does delivery to the
> 'customer' and also mirrors the packets for monitoring, they might only
> want stats on the first delivery (e.g. for billing) and not want to
> waste a counter on the mirror.

Okay.


>
>> Plus, if the fine grained setup is ever needed, the hw_stats could be in
>> future easily extended to be specified per-action too, overriding
>> the per-filter setup only for specific action. What do you think?
>I think this is improper; the stats type should be defined per-action in
> the uapi, even if userland initially only has UI to set it across the
> entire filter.  (I imagine `tc action` would grow the corresponding UI
> pretty quickly.)  Then on the driver side, you must determine whether
> your hardware can support the user's request, and if not, return an
> appropriate error code.
>
>Let's try not to encourage people (users, future HW & driver developers)
> to keep thinking of stats as per-filter entities.

Okay, but in that case, it does not make sense to have a "UI to set it
across the entire filter". The UI would have to set it per-action too.
Otherwise it would make people think it is a per-filter entity.
But I guess the tc cmdline is chatty already and it can take other
repetitive cmdline options.

What do you think?


Thanks!
Jamal Hadi Salim Feb. 24, 2020, 3:45 p.m. UTC | #5
On 2020-02-24 8:11 a.m., Jiri Pirko wrote:
> Mon, Feb 24, 2020 at 12:38:20PM CET, ecree@solarflare.com wrote:
>> On 22/02/2020 06:38, Jiri Pirko wrote:

[..]
>> Potentially a user could only want stats on one action and disable them
>>   on another.  For instance, if their action chain does delivery to the
>>   'customer' and also mirrors the packets for monitoring, they might only
>>   want stats on the first delivery (e.g. for billing) and not want to
>>   waste a counter on the mirror.
> 
> Okay.
> 

+1, very important for the telco billing use cases I am familiar with.

Ancient ACL implementations that had only one filter and
one action (drop/accept) didn't need more than one counter.
But not anymore, in my opinion.

There's also a requirement for the concept of "sharing" - think
"family plans" or "small business plans".
Counters may be shared across multiple filter-action chains, for example.

> 
>>
>>> Plus, if the fine grained setup is ever needed, the hw_stats could be in
>>> future easily extended to be specified per-action too, overriding
>>> the per-filter setup only for specific action. What do you think?
>> I think this is improper; the stats type should be defined per-action in
>>   the uapi, even if userland initially only has UI to set it across the
>>   entire filter.  (I imagine `tc action` would grow the corresponding UI
>>   pretty quickly.)  Then on the driver side, you must determine whether
>>   your hardware can support the user's request, and if not, return an
>>   appropriate error code.
>>
>> Let's try not to encourage people (users, future HW & driver developers)
>>   to keep thinking of stats as per-filter entities.
> > Okay, but in that case, it does not make sense to have "UI to set it
> across the entire filter". The UI would have to set it per-action too.
> Otherwise it would make people think it is a per-filter entity.
> But I guess the tc cmdline is chatty already and it can take other
> repetitive cmdline options.
> 

I normally visualize policy as a DAG composed of filter(s) +
actions. The UI and uAPI have to be able to express that.

I am not sure if mlxsw works this way, but:
most hardware I have encountered tends to have a separate
stats/counter table. The stats table is indexed.

Going backwards and looking at your example in this stanza:
---
   in_hw in_hw_count 2
   hw_stats immediate
         action order 1: gact action drop
          random type none pass val 0
          index 1 ref 1 bind 1 installed 14 sec used 7 sec
         Action statistics:
----

Guessing from "in_hw in_hw_count 2" - 2 is a hw stats table index?
If you have enough counters in hardware, the "stats table index"
essentially can be directly mapped to the action "index" attribute.

Sharing then becomes a matter of specifying the same drop action
with the correct index across multiple filters.

If you don't have enough hw counters - then perhaps a scheme to show
a separate hardware counter index and software counter index (aka action
index) is needed.

cheers,
jamal

> What do you think?
> 
> 
> Thanks!
>
Edward Cree Feb. 24, 2020, 3:50 p.m. UTC | #6
On 24/02/2020 15:45, Jamal Hadi Salim wrote:
> Going backwards and looking at your example in this stanza:
> ---
>   in_hw in_hw_count 2
>   hw_stats immediate
>         action order 1: gact action drop
>          random type none pass val 0
>          index 1 ref 1 bind 1 installed 14 sec used 7 sec
>         Action statistics:
> ----
>
> Guessing from "in_hw in_hw_count 2" - 2 is a hw stats table index?
AIUI in_hw_count is a reference count of hardware devices that have
 offloaded the rule.  Nothing to do with stats "counters".

-ed
Jamal Hadi Salim Feb. 24, 2020, 3:55 p.m. UTC | #7
On 2020-02-24 10:50 a.m., Edward Cree wrote:
> On 24/02/2020 15:45, Jamal Hadi Salim wrote:
>> Going backwards and looking at your example in this stanza:
>> ---
>>    in_hw in_hw_count 2
>>    hw_stats immediate
>>          action order 1: gact action drop
>>           random type none pass val 0
>>           index 1 ref 1 bind 1 installed 14 sec used 7 sec
>>          Action statistics:
>> ----
>>
>> Guessing from "in_hw in_hw_count 2" - 2 is a hw stats table index?
> AIUI in_hw_count is a reference count of hardware devices that have
>   offloaded the rule.  Nothing to do with stats "counters".

ah, ok;->. A more descriptive noun (hw_ref_count) would be nice.

Is it fair to assume that there's some form of stats table which is
indexed? And that the hw table index is accessible to the driver?

cheers,
jamal
Jiri Pirko Feb. 24, 2020, 4:25 p.m. UTC | #8
Mon, Feb 24, 2020 at 04:45:57PM CET, jhs@mojatatu.com wrote:
>On 2020-02-24 8:11 a.m., Jiri Pirko wrote:
>> Mon, Feb 24, 2020 at 12:38:20PM CET, ecree@solarflare.com wrote:
>> > On 22/02/2020 06:38, Jiri Pirko wrote:
>
>[..]
>> > Potentially a user could only want stats on one action and disable them
>> >   on another.  For instance, if their action chain does delivery to the
>> >   'customer' and also mirrors the packets for monitoring, they might only
>> >   want stats on the first delivery (e.g. for billing) and not want to
>> >   waste a counter on the mirror.
>> 
>> Okay.
>> 
>
>+1  very important for telco billing use cases i am familiar with.
>
>Ancient ACL implementations that had only one filter and
>one action (drop/accept) didnt need more than one counter.
>But not anymore in my opinion.
>
>There's also a requirement for the concept of "sharing" - think
>"family plans" or "small bussiness plan".
>Counters may be shared across multiple filter-action chains for example.

In hardware, we have a separate "counter" action with counter index.
You can reuse this index in multiple counter action instances.
However, in tc there is implicit separate counter for every action.

The counter is a limited resource. So we overcome this mismatch in mlxsw
by having a "counter" action always first for every rule inserted:
rule->action_counter,the_actual_action,the_actual_action2,...the_actual_actionN

and we report stats from action_counter for all the_actual_actionX.
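The mlxsw scheme described here, one HW counter action prepended per rule with its value reported for every tc action in the chain, can be modelled in miniature as follows. Struct and function names are invented for illustration.

```c
#include <assert.h>

/* One shared HW counter, standing in for the prepended "counter"
 * action described above. */
struct hw_counter {
	unsigned long long packets;
	unsigned long long bytes;
};

struct offloaded_rule {
	struct hw_counter counter; /* rule->action_counter */
	int n_actions;             /* the_actual_action1 .. N */
};

/* Stats dump: every action in the rule reports the rule's single
 * shared counter, regardless of its position in the chain. */
static struct hw_counter rule_action_stats(const struct offloaded_rule *rule,
					   int action_index)
{
	assert(action_index >= 0 && action_index < rule->n_actions);
	return rule->counter;
}
```

This also makes visible the limitation Jamal raises later: if the chain branches, one shared counter cannot distinguish which actions actually ran.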


>
>> 
>> > 
>> > > Plus, if the fine grained setup is ever needed, the hw_stats could be in
>> > > future easily extended to be specified per-action too, overriding
>> > > the per-filter setup only for specific action. What do you think?
>> > I think this is improper; the stats type should be defined per-action in
>> >   the uapi, even if userland initially only has UI to set it across the
>> >   entire filter.  (I imagine `tc action` would grow the corresponding UI
>> >   pretty quickly.)  Then on the driver side, you must determine whether
>> >   your hardware can support the user's request, and if not, return an
>> >   appropriate error code.
>> > 
>> > Let's try not to encourage people (users, future HW & driver developers)
>> >   to keep thinking of stats as per-filter entities.
>> > Okay, but in that case, it does not make sense to have "UI to set it
>> across the entire filter". The UI would have to set it per-action too.
>> Otherwise it would make people think it is a per-filter entity.
>> But I guess the tc cmdline is chatty already and it can take other
>> repetitive cmdline options.
>> 
>
>I normally visualize policy as a dag composed of filter(s) +
>actions. The UI and uAPI has to be able to express that.
>
>I am not sure if mlxsw works this way, but:
>Most hardware i have encountered tends to have a separate
>stats/counter table. The stats table is indexed.
>
>Going backwards and looking at your example in this stanza:
>---
>  in_hw in_hw_count 2
>  hw_stats immediate
>        action order 1: gact action drop
>         random type none pass val 0
>         index 1 ref 1 bind 1 installed 14 sec used 7 sec
>        Action statistics:
>----
>
>Guessing from "in_hw in_hw_count 2" - 2 is a hw stats table index?
>If you have enough counters in hardware, the "stats table index"
>essentially can be directly mapped to the action "index" attribute.

>
>Sharing then becomes a matter of specifying the same drop action
>with the correct index across multiple filters.
>
>If you dont have enough hw counters - then perhaps a scheme to show
>separate hardware counter index and software counter index (aka action
>index) is needed.

I don't; that is the purpose of this patchset - to be able to avoid the
"action_counter" from the example I wrote above.

Note that I don't want to share; there is still a separate "last_hit"
record in hw that I expose as "used X sec". Interestingly enough, in
Spectrum-1 this is per rule, in Spectrum-2/3 this is per action block :)


>
>cheers,
>jamal
>
>> What do you think?
>> 
>> 
>> Thanks!
>> 
>
Jamal Hadi Salim Feb. 25, 2020, 4:01 p.m. UTC | #9
+Cc Marian.

On 2020-02-24 11:25 a.m., Jiri Pirko wrote:
> Mon, Feb 24, 2020 at 04:45:57PM CET, jhs@mojatatu.com wrote:
>> On 2020-02-24 8:11 a.m., Jiri Pirko wrote:
>>> Mon, Feb 24, 2020 at 12:38:20PM CET, ecree@solarflare.com wrote:
>>>> On 22/02/2020 06:38, Jiri Pirko wrote:
>>

>> There's also a requirement for the concept of "sharing" - think
>> "family plans" or "small bussiness plan".
>> Counters may be shared across multiple filter-action chains for example.
> 
> In hardware, we have a separate "counter" action with counter index.

Ok, so it is similar semantics.
In your case, you abstract it as a special action, but in most
abstractions (including P4) it looks like an indexed table.
From a tc perspective you could abstract the equivalent of
your "counter action" as a gact "ok" or "pipe", etc., depending
on your policy goal. The counter index becomes the gact index
if there is no conflict.
In most actions the "index" attribute is really mapped to a
"counter" index. An exception would be actions with state
(like the policer).

> You can reuse this index in multiple counter action instances.

That is great because it maps to tc semantics. When you create
an action of the same type, you can specify the index and it
is re-used. Example:

sudo tc filter add dev lo parent ffff: protocol ip prio 8 u32 \
match ip dst 127.0.0.8/32 flowid 1:8 \
action vlan push id 8 protocol 802.1q index 8\
action mirred egress mirror dev eth0 index 111

sudo tc filter add dev lo parent ffff: protocol ip prio 8 u32 \
match ip dst 127.0.0.15/32 flowid 1:10 \
action vlan push id 15 protocol 802.1q index 15 \
action mirred egress mirror index 111 \
action drop index 111

So for the shared mirror action the counter is shared
by virtue of specifying index 111.

What tc _doesn't allow_ is re-using the same
counter index across different types of actions (for example,
mirror index 111 is not the same instance as drop 111).
That's why I was asking if you are exposing the hw index.
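Jamal's point, that a tc action instance is identified by the pair (kind, index) and an index is never shared across kinds, can be modelled with a small lookup keyed on both fields. This is a toy sketch, not the kernel's actual idr-based action lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* tc identifies an action instance by (kind, index): "mirred index 111"
 * and "gact index 111" are different instances even though the numeric
 * index matches. */
struct tc_act {
	const char *kind; /* e.g. "mirred", "gact", "vlan" */
	int index;
};

/* An existing instance (and thus its implicit counter) is reused only
 * when both the kind and the index match. */
static struct tc_act *act_lookup(struct tc_act *tab, int n,
				 const char *kind, int index)
{
	int i;

	for (i = 0; i < n; i++)
		if (tab[i].index == index && strcmp(tab[i].kind, kind) == 0)
			return &tab[i];
	return NULL;
}
```

So in the shell examples above, the second filter's `mirred ... index 111` binds to the first filter's mirred instance, while a `drop index 111` creates a separate gact instance.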

> However, in tc there is implicit separate counter for every action.
> 

Yes, and it is proving to be a challenge for hw. In s/w it makes sense.
It seemed sensible at the time; existing hardware back then
(the Broadcom 5691 family, and I can't remember the other vendor, iirc)
hard-coded the actions with counters. Mind you, they would
only support one action per match.

Some rethinking of this status quo is needed.
So maybe syntactically having an index for s/w vs h/w may make
sense.
Knowing the indices is important. As an example, for telemetry
you may be very interested in dumping only the counter stats
instead of the rule. Dumping gact has always made it easy in
my use cases because it doesn't have a lot of attributes. But it
could make sense to introduce a new semantic like "dump action stats .."

> The counter is limited resource. So we overcome this mismatch in mlxsw
> by having action "counter" always first for every rule inserted:
> rule->action_counter,the_actual_action,the_actual_action2,...the_actual_actionN
>

So I am guessing the hw can't support "branching", i.e. based on
some action state, sometimes you may execute action foo and other times
action bar. Those kinds of scenarios would need multiple counters.
> and we report stats from action_counter for all the_actual_actionX.

This may not be accurate if you are branching - for example,
a policer or quota enforcer which either accepts or drops or sends next
to a marker action, etc.
IMO, this was fine in the old days when you had one action per match.
Best is to leave it to whoever creates the policy to decide what to
count. IOW, I think model it as a pipe or ok or drop or continue,
and let it be placed anywhere in the policy graph instead of at the beginning.

>> Sharing then becomes a matter of specifying the same drop action
>> with the correct index across multiple filters.
>>
>> If you dont have enough hw counters - then perhaps a scheme to show
>> separate hardware counter index and software counter index (aka action
>> index) is needed.
> 
> I don't, that is the purpose of this patchset, to be able to avoid the
> "action_counter" from the example I wrote above.

IMO, it would make sense to reuse the existing gact, for example, and
annotate s/w vs h/w indices as a starting point. It keeps the
existing approach intact.

> Note that I don't want to share, there is still separate "last_hit"
> record in hw I expose in "used X sec". Interestingly enough, in
> Spectrum-1 this is per rule, in Spectrum-2,3 this is per action block :)

I didn't understand this one...

cheers,
jamal
Jiri Pirko Feb. 25, 2020, 4:22 p.m. UTC | #10
Tue, Feb 25, 2020 at 05:01:05PM CET, jhs@mojatatu.com wrote:
>+Cc Marian.
>
>On 2020-02-24 11:25 a.m., Jiri Pirko wrote:
>> Mon, Feb 24, 2020 at 04:45:57PM CET, jhs@mojatatu.com wrote:
>> > On 2020-02-24 8:11 a.m., Jiri Pirko wrote:
>> > > Mon, Feb 24, 2020 at 12:38:20PM CET, ecree@solarflare.com wrote:
>> > > > On 22/02/2020 06:38, Jiri Pirko wrote:
>> > 
>
>> > There's also a requirement for the concept of "sharing" - think
>> > "family plans" or "small bussiness plan".
>> > Counters may be shared across multiple filter-action chains for example.
>> 
>> In hardware, we have a separate "counter" action with counter index.
>
>Ok, so it is similar semantics.
>In your case, you abstract it as a special action, but in most
>abstractions (including P4) it looks like an indexed table.
>From a tc perspective you could abstract the equivalent to
>your "counter action" as a gact "ok" or "pipe",etc depending
>on your policy goal. The counter index becomes the gact index
>if there is no conflict.
>In most actions "index" attribute is really mapped to a
>"counter" index. Exception would be actions with state
>(like policer).
>
>> You can reuse this index in multiple counter action instances.
>
>That is great because it maps to tc semantics. When you create
>an action of the same type, you can specify the index and it
>is re-used. Example:
>
>sudo tc filter add dev lo parent ffff: protocol ip prio 8 u32 \
>match ip dst 127.0.0.8/32 flowid 1:8 \
>action vlan push id 8 protocol 802.1q index 8\
>action mirred egress mirror dev eth0 index 111
>
>sudo tc filter add dev lo parent ffff: protocol ip prio 8 u32 \
>match ip dst 127.0.0.15/32 flowid 1:10 \
>action vlan push id 15 protocol 802.1q index 15 \
>action mirred egress mirror index 111 \
>action drop index 111
>
>So for the shared mirror action the counter is shared
>by virtue of specifying index 111.
>
>What tc _doesn't allow_ is re-using the same
>counter index across different types of actions (for example,
>mirror index 111 is not the same instance as drop 111).
>That's why I was asking if you are exposing the hw index.

The user does not care about any "hw index". That should be abstracted out
by the driver.


>
>> However, in tc there is implicit separate counter for every action.
>> 
>
>Yes, and is proving to be a challenge for hw. In s/w it makes sense.
>It seemed sensible at the time; existing hardware back then
>(the Broadcom 5691 family, and I can't remember the other vendor, iirc)
>hard coded the actions with counters. Mind you they would
>only support one action per match.
>
>Some rethinking is needed of this status quo.
>So maybe syntactically having an index for s/w vs h/w may make
>sense.
>Knowing the indices is important. As an example, for telemetry
>you may be very interested in dumping only the counter stats
>instead of the rule. Dumping gact has always made it easy in
>my use cases because it doesn't have a lot of attributes. But it
>could make sense to introduce a new semantic like "dump action stats .."
>
>> The counter is limited resource. So we overcome this mismatch in mlxsw
>> by having action "counter" always first for every rule inserted:
>> rule->action_counter,the_actual_action,the_actual_action2,...the_actual_actionN
>> 
>
>So I am guessing the hw can't support "branching", i.e. based on
>some action state, sometimes you may execute action foo and other times
>action bar. Those kinds of scenarios would need multiple counters.

We don't and when/if we do, we need to put another counter to the
branch point.


>> and we report stats from action_counter for all the_actual_actionX.
>
>This may not be accurate if you are branching - for example
>a policer or quota enforcer which either accepts or drops or sends next
>to a marker action etc .
>IMO, this was fine in the old days when you had one action per match.
>Best is to leave it to whoever creates the policy to decide what to
>count. IOW, I think modelling it as a pipe or ok or drop or continue
>and be placed anywhere in the policy graph instead of at the beginning.

Eh, that is not that simple. The existing users are used to the fact
that the actions provide counters by themselves. Having an
explicit counter action like this would break that expectation.
Also, I think it should be up to the driver implementation. Some HW
might only support stats per rule, not per action. The driver should fit
into the existing abstraction; I think it is fine.


>
>> > Sharing then becomes a matter of specifying the same drop action
>> > with the correct index across multiple filters.
>> > 
>> > If you dont have enough hw counters - then perhaps a scheme to show
>> > separate hardware counter index and software counter index (aka action
>> > index) is needed.
>> 
>> I don't, that is the purpose of this patchset, to be able to avoid the
>> "action_counter" from the example I wrote above.
>
>IMO, it would make sense to reuse existing gact for example and
>annotate s/w vs h/w indices as a starting point. It keeps the
>existing approach intact.
>
>> Note that I don't want to share, there is still separate "last_hit"
>> record in hw I expose in "used X sec". Interestingly enough, in
>> Spectrum-1 this is per rule, in Spectrum-2,3 this is per action block :)
>
>I didnt understand this one..

It's not "stats", it's an information about how long ago the act was
used.


>
>cheers,
>jamal
Jakub Kicinski Feb. 25, 2020, 6:06 p.m. UTC | #11
On Tue, 25 Feb 2020 17:22:03 +0100 Jiri Pirko wrote:
> >> You can reuse this index in multiple counter action instances.  
> >
> >That is great because it maps to tc semantics. When you create
> >an action of the same type, you can specify the index and it
> >is re-used. Example:
> >
> >sudo tc filter add dev lo parent ffff: protocol ip prio 8 u32 \
> >match ip dst 127.0.0.8/32 flowid 1:8 \
> >action vlan push id 8 protocol 802.1q index 8\
> >action mirred egress mirror dev eth0 index 111
> >
> >sudo tc filter add dev lo parent ffff: protocol ip prio 8 u32 \
> >match ip dst 127.0.0.15/32 flowid 1:10 \
> >action vlan push id 15 protocol 802.1q index 15 \
> >action mirred egress mirror index 111 \
> >action drop index 111
> >
> >So for the shared mirror action the counter is shared
> >by virtue of specifying index 111.
> >
> >What tc _doesn't allow_ is re-using the same
> >counter index across different types of actions (for example,
> >mirror index 111 is not the same instance as drop 111).
> >That's why I was asking if you are exposing the hw index.
> 
> User does not care about any "hw index". That should be abstracted out
> by the driver.

+1

> >> and we report stats from action_counter for all the_actual_actionX.  
> >
> >This may not be accurate if you are branching - for example
> >a policer or quota enforcer which either accepts or drops or sends next
> >to a marker action etc .
> >IMO, this was fine in the old days when you had one action per match.
> >Best is to leave it to whoever creates the policy to decide what to
> >count. IOW, I think modelling it as a pipe or ok or drop or continue
> >and be placed anywhere in the policy graph instead of the begining.  
> 
> Eh, that is not that simple. The existing users are used to the fact
> that the actions are providing counters by themselves. Having and
> explicit counter action like this would break that expectation.
> Also, I think it should be up to the driver implementation. Some HW
> might only support stats per rule, not the actions. Driver should fit
> into the existing abstraction, I think it is fine.

+1
Jamal Hadi Salim Feb. 26, 2020, 12:52 p.m. UTC | #12
On 2020-02-25 11:22 a.m., Jiri Pirko wrote:
> Tue, Feb 25, 2020 at 05:01:05PM CET, jhs@mojatatu.com wrote:
>> +Cc Marian.
>>



>> So for the shared mirror action the counter is shared
>> by virtue of specifying index 111.
>>
>> What tc _doesnt allow_ is to re-use the same
>> counter index across different types of actions (example
>> mirror index 111 is not the same instance as drop 111).
>> Thats why i was asking if you are exposing the hw index.
> 
> User does not care about any "hw index". That should be abstracted out
> by the driver.
> 

My main motivation is proper accounting (which is important
for billing and debugging, of course). Example:
if I say "get stats" I should know whether it is the sum of both
h/w + s/w stats, or the rules should be clear in regards to how
to retrieve each and sum them or differentiate them.
If your patch takes care of summing things up etc., then I agree.
Or if the rules for accounting are consistent, then we are fine
as well.

>> So i am guessing the hw cant support "branching" i.e based on in
>> some action state sometime you may execute action foo and other times
>> action bar. Those kind of scenarios would need multiple counters.
> 
> We don't and when/if we do, we need to put another counter to the
> branch point.
>

Ok, that would work.
> 
>>> and we report stats from action_counter for all the_actual_actionX.
>>
>> This may not be accurate if you are branching - for example
>> a policer or quota enforcer which either accepts or drops or sends next
>> to a marker action etc .
>> IMO, this was fine in the old days when you had one action per match.
>> Best is to leave it to whoever creates the policy to decide what to
>> count. IOW, I think modelling it as a pipe or ok or drop or continue
>> and be placed anywhere in the policy graph instead of the begining.
> 
> Eh, that is not that simple. The existing users are used to the fact
> that the actions are providing counters by themselves. Having and
> explicit counter action like this would break that expectation.
 >
> Also, I think it should be up to the driver implementation. Some HW
> might only support stats per rule, not the actions. Driver should fit
> into the existing abstraction, I think it is fine.
>

Reasonable point.
So "count" action is only useful for h/w?

>>> Note that I don't want to share, there is still separate "last_hit"
>>> record in hw I expose in "used X sec". Interestingly enough, in
>>> Spectrum-1 this is per rule, in Spectrum-2,3 this is per action block :)
>>
>> I didnt understand this one..
> 
> It's not "stats", it's an information about how long ago the act was
> used.

ah. Given tc has one of those per action, are you looking to introduce
a new "last used" action?

cheers,
jamal
Jiri Pirko Feb. 26, 2020, 1:56 p.m. UTC | #13
Wed, Feb 26, 2020 at 01:52:20PM CET, jhs@mojatatu.com wrote:
>On 2020-02-25 11:22 a.m., Jiri Pirko wrote:
>> Tue, Feb 25, 2020 at 05:01:05PM CET, jhs@mojatatu.com wrote:
>> > +Cc Marian.
>> > 
>
>
>
>> > So for the shared mirror action the counter is shared
>> > by virtue of specifying index 111.
>> > 
>> > What tc _doesnt allow_ is to re-use the same
>> > counter index across different types of actions (example
>> > mirror index 111 is not the same instance as drop 111).
>> > Thats why i was asking if you are exposing the hw index.
>> 
>> User does not care about any "hw index". That should be abstracted out
>> by the driver.
>> 
>
>My main motivation is proper accounting (which is important
>for the billing and debugging of course). Example:
>if i say "get stats" I should know it is the sum of both
>h/w + s/w stats or the rules are clear in regards to how
>to retrieve each and sum them or differentiate them.
>If your patch takes care of summing up things etc, then i agree.

The current state implemented in the code is summing up the stats. My
patchset has no relation to that.


>Or if the rules for accounting are consistent then we are fine
>as well.
>
>> > So i am guessing the hw cant support "branching" i.e based on in
>> > some action state sometime you may execute action foo and other times
>> > action bar. Those kind of scenarios would need multiple counters.
>> 
>> We don't and when/if we do, we need to put another counter to the
>> branch point.
>> 
>
>Ok, that would work.
>> 
>> > > and we report stats from action_counter for all the_actual_actionX.
>> > 
>> > This may not be accurate if you are branching - for example
>> > a policer or quota enforcer which either accepts or drops or sends next
>> > to a marker action etc .
>> > IMO, this was fine in the old days when you had one action per match.
>> > Best is to leave it to whoever creates the policy to decide what to
>> > count. IOW, I think modelling it as a pipe or ok or drop or continue
>> > and be placed anywhere in the policy graph instead of the begining.
>> 
>> Eh, that is not that simple. The existing users are used to the fact
>> that the actions are providing counters by themselves. Having and
>> explicit counter action like this would break that expectation.
>>
>> Also, I think it should be up to the driver implementation. Some HW
>> might only support stats per rule, not the actions. Driver should fit
>> into the existing abstraction, I think it is fine.
>> 
>
>Reasonable point.
>So "count" action is only useful for h/w?

There is no "count" action, and there should not be one.


>
>> > > Note that I don't want to share, there is still separate "last_hit"
>> > > record in hw I expose in "used X sec". Interestingly enough, in
>> > > Spectrum-1 this is per rule, in Spectrum-2,3 this is per action block :)
>> > 
>> > I didnt understand this one..
>> 
>> It's not "stats", it's an information about how long ago the act was
>> used.
>
>ah. Given tc has one of those per action, are you looking to introduce
>a new "last used" action?

No.


>
>cheers,
>jamal
Edward Cree Feb. 27, 2020, 3:57 p.m. UTC | #14
On 24/02/2020 16:25, Jiri Pirko wrote:
> In hardware, we have a separate "counter" action with counter index.
> You can reuse this index in multiple counter action instances.
> However, in tc there is implicit separate counter for every action.
Importantly, not all actions have counters.  Only those with a
.stats_update() method in their struct tc_action_ops have an implicit
counter (which requires hardware offload), and to a first approximation
those are the 'deliverish' actions (pass, drop, mirred).
This informs the sfc design below.

> The counter is limited resource. So we overcome this mismatch in mlxsw
> by having action "counter" always first for every rule inserted:
> rule->action_counter,the_actual_action,the_actual_action2,...the_actual_actionN
>
> and we report stats from action_counter for all the_actual_actionX.
For comparison, here's how we're doing it in the upcoming sfc hardware:
Rather than a sequence of actions in an arbitrary order (applied
cumulatively to the original packet), we have a concept of an 'action
set', which is some subset of a fixed sequence of actions, the last of
which is a delivery.  (One of these actions is 'count', which takes one
or more counter IDs as its metadata.)  Then the result of a rule match
is an _action set list_, one or more action sets each of which is
applied to a separate copy of the original packet.
This works because the delivery (hw) action corresponds to the only (tc)
action that wants stats, combined with some cleverness in the defined
order of our fixed action sequence.  And it means that we can properly
support per-action counters, including making stats type be a per-action
property (if one of the mirreds doesn't want stats, just don't put a
'count' in that action-set).
For shared counters we just use the same counter index.  Mapping from
tc action index to hw counter index is handled in the driver (transparent
to userspace).

So from our perspective, making stats-type a property of the TC action is
the Right Thing and fits in with the existing design of TC.  See also my
old RFC series restoring per-action stats to flow_offload [1] which (in
patch 4/4) added a 'want_stats' field to struct flow_action_entry; this
would presumably become an enum flow_cls_hw_stats_type to support these
new stats-type semantics, and would be set to
FLOW_CLS_HW_STATS_TYPE_DISABLED if act->ops->stats_update == NULL
regardless of the stats-type specified by the user.

-ed

[1] http://patchwork.ozlabs.org/cover/1110071/
Pablo Neira Ayuso Feb. 28, 2020, 7:59 p.m. UTC | #15
Hi Pirko,

On Tue, Feb 25, 2020 at 05:22:03PM +0100, Jiri Pirko wrote:
[...]
> Eh, that is not that simple. The existing users are used to the fact
> that the actions are providing counters by themselves. Having and
> explicit counter action like this would break that expectation.
> Also, I think it should be up to the driver implementation. Some HW
> might only support stats per rule, not the actions. Driver should fit
> into the existing abstraction, I think it is fine.

Something like the sketch patch that I'm attaching?

The idea behind it is to provide a counter action through the
flow_action API. Then, tc_setup_flow_action() checks if this action
comes with counters; in that case the counter action is added.

My patch assumes tcf_vlan_counter() tells us what counter
type the user wants; I just introduced this to provide an example.

Thank you.
Jiri Pirko Feb. 29, 2020, 8:01 a.m. UTC | #16
Fri, Feb 28, 2020 at 08:59:09PM CET, pablo@netfilter.org wrote:
>Hi Pirko,
>
>On Tue, Feb 25, 2020 at 05:22:03PM +0100, Jiri Pirko wrote:
>[...]
>> Eh, that is not that simple. The existing users are used to the fact
>> that the actions are providing counters by themselves. Having and
>> explicit counter action like this would break that expectation.
>> Also, I think it should be up to the driver implementation. Some HW
>> might only support stats per rule, not the actions. Driver should fit
>> into the existing abstraction, I think it is fine.
>
>Something like the sketch patch that I'm attaching?

But why? Actions are separate entities, with separate counters. The
driver is either able to offload that or not. Up to the driver to
abstract this out.


>
>The idea behind it is to provide a counter action through the
>flow_action API. Then, tc_setup_flow_action() checks if this action
>comes with counters in that case the counter action is added.
>
>My patch assumes tcf_vlan_counter() provides tells us what counter
>type the user wants, I just introduced this to provide an example.
>
>Thank you.

>diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
>index c6f7bd22db60..1a5006091edc 100644
>--- a/include/net/flow_offload.h
>+++ b/include/net/flow_offload.h
>@@ -138,9 +138,16 @@ enum flow_action_id {
> 	FLOW_ACTION_MPLS_PUSH,
> 	FLOW_ACTION_MPLS_POP,
> 	FLOW_ACTION_MPLS_MANGLE,
>+	FLOW_ACTION_COUNTER,
> 	NUM_FLOW_ACTIONS,
> };
> 
>+enum flow_action_counter_type {
>+	FLOW_COUNTER_DISABLED		= 0,
>+	FLOW_COUNTER_DELAYED,
>+	FLOW_COUNTER_IMMEDIATE,
>+};
>+
> /* This is mirroring enum pedit_header_type definition for easy mapping between
>  * tc pedit action. Legacy TCA_PEDIT_KEY_EX_HDR_TYPE_NETWORK is mapped to
>  * FLOW_ACT_MANGLE_UNSPEC, which is supported by no driver.
>@@ -213,6 +220,9 @@ struct flow_action_entry {
> 			u8		bos;
> 			u8		ttl;
> 		} mpls_mangle;
>+		struct {				/* FLOW_ACTION_COUNTER */
>+			enum flow_action_counter_type	type;
>+		} counter;
> 	};
> };
> 
>diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
>index 13c33eaf1ca1..984f2129c760 100644
>--- a/net/sched/cls_api.c
>+++ b/net/sched/cls_api.c
>@@ -3435,6 +3435,7 @@ static void tcf_sample_get_group(struct flow_action_entry *entry,
> int tc_setup_flow_action(struct flow_action *flow_action,
> 			 const struct tcf_exts *exts)
> {
>+	enum flow_action_counter_type counter = FLOW_COUNTER_DISABLED;
> 	struct tc_action *act;
> 	int i, j, k, err = 0;
> 
>@@ -3489,6 +3490,7 @@ int tc_setup_flow_action(struct flow_action *flow_action,
> 				err = -EOPNOTSUPP;
> 				goto err_out_locked;
> 			}
>+			counter = tcf_vlan_counter(act);
> 		} else if (is_tcf_tunnel_set(act)) {
> 			entry->id = FLOW_ACTION_TUNNEL_ENCAP;
> 			err = tcf_tunnel_encap_get_tunnel(entry, act);
>@@ -3567,10 +3569,19 @@ int tc_setup_flow_action(struct flow_action *flow_action,
> 			err = -EOPNOTSUPP;
> 			goto err_out_locked;
> 		}
>-		spin_unlock_bh(&act->tcfa_lock);
> 
> 		if (!is_tcf_pedit(act))
> 			j++;
>+
>+		if (counter) {
>+			struct flow_action_entry *entry;
>+
>+			entry = &flow_action->entries[j++];
>+			entry->id = FLOW_ACTION_COUNTER;
>+			entry->counter.type = counter;
>+			counter = FLOW_COUNTER_DISABLED;
>+		}
>+		spin_unlock_bh(&act->tcfa_lock);
> 	}
> 
> err_out:
Pablo Neira Ayuso Feb. 29, 2020, 7:56 p.m. UTC | #17
On Sat, Feb 29, 2020 at 09:01:20AM +0100, Jiri Pirko wrote:
> Fri, Feb 28, 2020 at 08:59:09PM CET, pablo@netfilter.org wrote:
> >Hi Pirko,
> >
> >On Tue, Feb 25, 2020 at 05:22:03PM +0100, Jiri Pirko wrote:
> >[...]
> >> Eh, that is not that simple. The existing users are used to the fact
> >> that the actions are providing counters by themselves. Having and
> >> explicit counter action like this would break that expectation.
> >> Also, I think it should be up to the driver implementation. Some HW
> >> might only support stats per rule, not the actions. Driver should fit
> >> into the existing abstraction, I think it is fine.
> >
> >Something like the sketch patch that I'm attaching?
> 
> But why? Actions are separate entities, with separate counters. The
> driver is either able to offload that or not. Up to the driver to
> abstract this out.

You can add one counter for each action through FLOW_ACTION_COUNTER.
Then, one single tc action maps to two flow_actions, the action itself
and the counter action. Hence, the tc frontend only needs to append a
counter after the action.

Why might this be a problem from the driver side? From reading this
thread, my understanding is that:

1) Some drivers have the ability to attach to immediate/delayed
   counters.
2) The immediate counters might be a limited resource while delayed
   counters are abundant.
3) Some drivers have counters per-filter, not per-action. In that
   case, for each counter action in the rule, the driver might decide
   to make all counter actions refer to the per-filter counter.

Thank you!
Edward Cree March 2, 2020, 4:07 p.m. UTC | #18
On 29/02/2020 08:01, Jiri Pirko wrote:
> Fri, Feb 28, 2020 at 08:59:09PM CET, pablo@netfilter.org wrote:
>> Something like the sketch patch that I'm attaching?
> But why? Actions are separate entities, with separate counters. The
> driver is either able to offload that or not. Up to the driver to
> abstract this out.
+1

If any of this "fake up a counter action" stuff is common to several
drivers, we could maybe have some library functions to help them with
it, but the tc<->driver API shouldn't change (since the change adds
no expressiveness).

-ed