mbox series

[net-next,v2,00/12] net: sched: propagate extack to cls offloads on destroy and only with skip_sw

Message ID 20180124205424.6976-1-jakub.kicinski@netronome.com
Headers show
Series net: sched: propagate extack to cls offloads on destroy and only with skip_sw | expand

Message

Jakub Kicinski Jan. 24, 2018, 8:54 p.m. UTC
Hi!

This series some of Jiri's comments and the fact that today drivers
may produce extack even if there is no skip_sw flag (meaning the
driver failure is not really a problem), and warning messages will
only confuse the users.

First patch propagates extack to destroy as requested by Jiri, extack
is then propagated to the driver callback for each classifier.  I chose
not to provide the extack on error paths.  As a rule of thumb it seems
best to keep the extack of the condition which caused the error.  E.g.

     err = this_will_fail(arg, extack);
     if (err) {
        undo_things(arg, NULL /* don't pass extack */);
	return err;
     }

Note that NL_SET_ERR_MSG() will ignore the message if extack is NULL.
I was pondering whether we should make NL_SET_ERR_MSG() refuse to
overwrite the msg, but there seem to be cases in the tree where extack
is set like this:

     err = this_will_fail(arg, extack);
     if (err) {
        undo_things(arg, NULL /* don't pass extack */);
	NL_SET_ERR_MSG(extack, "extack is set after undo call :/");
	return err;
     }

I think not passing extack to undo calls is reasonable.

v2:
 - rename the temporary tc_cls_common_offload_init().

Jakub Kicinski (12):
  net: sched: propagate extack to cls->destroy callbacks
  net: sched: prepare for reimplementation of
    tc_cls_common_offload_init()
  cls_bpf: remove gen_flags from bpf_offload
  cls_bpf: pass offload flags to tc_cls_common_offload_init()
  cls_bpf: propagate extack to offload delete callback
  cls_matchall: pass offload flags to tc_cls_common_offload_init()
  cls_matchall: propagate extack to delete callback
  cls_flower: pass offload flags to tc_cls_common_offload_init()
  cls_flower: propagate extack to delete callback
  cls_u32: pass offload flags to tc_cls_common_offload_init()
  cls_u32: propagate extack to delete callback
  net: sched: remove tc_cls_common_offload_init_deprecated()

 include/net/pkt_cls.h     | 24 ++++++++++++------------
 include/net/sch_generic.h |  3 ++-
 net/sched/cls_api.c       | 15 ++++++++-------
 net/sched/cls_basic.c     |  2 +-
 net/sched/cls_bpf.c       | 24 +++++++++++++-----------
 net/sched/cls_cgroup.c    |  3 ++-
 net/sched/cls_flow.c      |  2 +-
 net/sched/cls_flower.c    | 24 +++++++++++++-----------
 net/sched/cls_fw.c        |  2 +-
 net/sched/cls_matchall.c  | 13 +++++++------
 net/sched/cls_route.c     |  2 +-
 net/sched/cls_rsvp.h      |  2 +-
 net/sched/cls_tcindex.c   |  3 ++-
 net/sched/cls_u32.c       | 42 ++++++++++++++++++++++++------------------
 14 files changed, 88 insertions(+), 73 deletions(-)

Comments

David Miller Jan. 24, 2018, 9:03 p.m. UTC | #1
From: Jakub Kicinski <jakub.kicinski@netronome.com>
Date: Wed, 24 Jan 2018 12:54:12 -0800

> This series some of Jiri's comments and the fact that today drivers
> may produce extack even if there is no skip_sw flag (meaning the
> driver failure is not really a problem), and warning messages will
> only confuse the users.
...
> v2:
>  - rename the temporary tc_cls_common_offload_init().

This looks better, series applied, thanks Jakub.
Jiri Pirko Jan. 24, 2018, 9:04 p.m. UTC | #2
Wed, Jan 24, 2018 at 09:54:12PM CET, jakub.kicinski@netronome.com wrote:
>Hi!
>
>This series some of Jiri's comments and the fact that today drivers
>may produce extack even if there is no skip_sw flag (meaning the
>driver failure is not really a problem), and warning messages will
>only confuse the users.
>
>First patch propagates extack to destroy as requested by Jiri, extack
>is then propagated to the driver callback for each classifier.  I chose
>not to provide the extack on error paths.  As a rule of thumb it seems
>best to keep the extack of the condition which caused the error.  E.g.
>
>     err = this_will_fail(arg, extack);
>     if (err) {
>        undo_things(arg, NULL /* don't pass extack */);
>	return err;
>     }
>
>Note that NL_SET_ERR_MSG() will ignore the message if extack is NULL.
>I was pondering whether we should make NL_SET_ERR_MSG() refuse to
>overwrite the msg, but there seem to be cases in the tree where extack
>is set like this:
>
>     err = this_will_fail(arg, extack);
>     if (err) {
>        undo_things(arg, NULL /* don't pass extack */);
>	NL_SET_ERR_MSG(extack, "extack is set after undo call :/");
>	return err;
>     }
>
>I think not passing extack to undo calls is reasonable.
>
>v2:
> - rename the temporary tc_cls_common_offload_init().
>
>Jakub Kicinski (12):
>  net: sched: propagate extack to cls->destroy callbacks
>  net: sched: prepare for reimplementation of
>    tc_cls_common_offload_init()
>  cls_bpf: remove gen_flags from bpf_offload
>  cls_bpf: pass offload flags to tc_cls_common_offload_init()
>  cls_bpf: propagate extack to offload delete callback
>  cls_matchall: pass offload flags to tc_cls_common_offload_init()
>  cls_matchall: propagate extack to delete callback
>  cls_flower: pass offload flags to tc_cls_common_offload_init()
>  cls_flower: propagate extack to delete callback
>  cls_u32: pass offload flags to tc_cls_common_offload_init()
>  cls_u32: propagate extack to delete callback
>  net: sched: remove tc_cls_common_offload_init_deprecated()

For the record, I still think it is odd to have 6 patches just to add
one arg to a function. I wonder where this unnecessary patch splits
would lead to in the future.

Anyway, since apparently no one really cares, and the code result looks
good to me, for whole patchset:
Acked-by: Jiri Pirko <jiri@mellanox.com>
David Ahern Jan. 24, 2018, 9:07 p.m. UTC | #3
On 1/24/18 2:04 PM, Jiri Pirko wrote:
> For the record, I still think it is odd to have 6 patches just to add
> one arg to a function. I wonder where this unnecessary patch splits
> would lead to in the future.

I think it made the review much easier than 1 really long patch.
Jiri Pirko Jan. 24, 2018, 9:15 p.m. UTC | #4
Wed, Jan 24, 2018 at 10:07:25PM CET, dsahern@gmail.com wrote:
>On 1/24/18 2:04 PM, Jiri Pirko wrote:
>> For the record, I still think it is odd to have 6 patches just to add
>> one arg to a function. I wonder where this unnecessary patch splits
>> would lead to in the future.
>
>I think it made the review much easier than 1 really long patch.

Even squashed, the patch is quite small. Doing the same thing in every
hunk.

On contrary, the split made it more complicated for me, because when
I looked at patch 1 and the function duplication with another arg,
I did not understand what is going on. Only the last patch actually
explained it. But perhaps I'm slow.
Jakub Kicinski Jan. 24, 2018, 9:49 p.m. UTC | #5
On Wed, 24 Jan 2018 22:15:00 +0100, Jiri Pirko wrote:
> Wed, Jan 24, 2018 at 10:07:25PM CET, dsahern@gmail.com wrote:
> >On 1/24/18 2:04 PM, Jiri Pirko wrote:  
> >> For the record, I still think it is odd to have 6 patches just to add
> >> one arg to a function. I wonder where this unnecessary patch splits
> >> would lead to in the future.  
> >
> >I think it made the review much easier than 1 really long patch.  
> 
> Even squashed, the patch is quite small. Doing the same thing in every
> hunk.
> 
> On contrary, the split made it more complicated for me, because when
> I looked at patch 1 and the function duplication with another arg,
> I did not understand what is going on. Only the last patch actually
> explained it. But perhaps I'm slow.

Next time I'll do a better job explaining things in commit logs, sorry!
Marcelo Ricardo Leitner Jan. 25, 2018, 3:11 p.m. UTC | #6
On Wed, Jan 24, 2018 at 12:54:12PM -0800, Jakub Kicinski wrote:
> Hi!
> 
> This series some of Jiri's comments and the fact that today drivers
> may produce extack even if there is no skip_sw flag (meaning the
> driver failure is not really a problem), and warning messages will
> only confuse the users.

It's a fair point, but I think this is not the best solution. How will
the user then know why it failed to install in hw? Will have to
install a new rule, just with skip_sw, and hope that it fails with the
same reason?

Maybe it's better to let tc/ovs/etc only exhibit this information
under a certain log/debug level.

  Marcelo
Jakub Kicinski Jan. 25, 2018, 10:57 p.m. UTC | #7
On Thu, 25 Jan 2018 13:11:57 -0200, Marcelo Ricardo Leitner wrote:
> On Wed, Jan 24, 2018 at 12:54:12PM -0800, Jakub Kicinski wrote:
> > Hi!
> > 
> > This series some of Jiri's comments and the fact that today drivers
> > may produce extack even if there is no skip_sw flag (meaning the
> > driver failure is not really a problem), and warning messages will
> > only confuse the users.  
> 
> It's a fair point, but I think this is not the best solution. How will
> the user then know why it failed to install in hw? Will have to
> install a new rule, just with skip_sw, and hope that it fails with the
> same reason?
>
> Maybe it's better to let tc/ovs/etc only exhibit this information
> under a certain log/debug level.

What you say does make sense in case of classifiers which are basically
HW offload vehicles.  But for cls_bpf, which people are actually using
heavily as a software solution, I don't want any warning to be produced
just because someone happened to run the command on a Netronome
card :(  Say someone swaps an old NIC for a NFP, and runs their old
cls_bpf scripts and suddenly there are warnings they don't care about
and have no way of silencing.

I do think skip_sw will fail for the same reason, unless someone adds
extacks for IO or memory allocation problems or other transient things.

Do I understand correctly that OvS TC does not set skip_sw?  We could
add a "verbose offload" flag to the API or filter the bad extacks at
the user space level.  Although, again, my preference would be not to
filter at the user space level, because user space can't differentiate
between a probably-doesn't-matter-but-HW-offload-failed warning or legit
something-is-not-right-in-the-software-but-command-succeeded warning :S
So if there is a major use for non-forced offload failure warnings I
would lean towards a new flag.
Marcelo Ricardo Leitner Jan. 28, 2018, 10:39 p.m. UTC | #8
On Thu, Jan 25, 2018 at 02:57:17PM -0800, Jakub Kicinski wrote:
> On Thu, 25 Jan 2018 13:11:57 -0200, Marcelo Ricardo Leitner wrote:
> > On Wed, Jan 24, 2018 at 12:54:12PM -0800, Jakub Kicinski wrote:
> > > Hi!
> > > 
> > > This series some of Jiri's comments and the fact that today drivers
> > > may produce extack even if there is no skip_sw flag (meaning the
> > > driver failure is not really a problem), and warning messages will
> > > only confuse the users.  
> > 
> > It's a fair point, but I think this is not the best solution. How will
> > the user then know why it failed to install in hw? Will have to
> > install a new rule, just with skip_sw, and hope that it fails with the
> > same reason?
> >
> > Maybe it's better to let tc/ovs/etc only exhibit this information
> > under a certain log/debug level.
> 
> What you say does make sense in case of classifiers which are basically
> HW offload vehicles.  But for cls_bpf, which people are actually using
> heavily as a software solution, I don't want any warning to be produced
> just because someone happened to run the command on a Netronome
> card :(  Say someone swaps an old NIC for a NFP, and runs their old
> cls_bpf scripts and suddenly there are warnings they don't care about
> and have no way of silencing.

(Sorry for the delay on replying, btw. I'm still traveling.)

Makes sense. I agree that at least it shouldn't be displayed in a way
that may lead the user to think it was a big/fatal error.

> 
> I do think skip_sw will fail for the same reason, unless someone adds
> extacks for IO or memory allocation problems or other transient things.

I don't really follow this one. Fail you mean, fail to report the
actual reason? If so, ok, but that's something easily fixable I think,
especially because with skip_sw, if such an error happens, it's fatal
for the operation so the error reporting is consistent.

> 
> Do I understand correctly that OvS TC does not set skip_sw?  We could

Yes.

> add a "verbose offload" flag to the API or filter the bad extacks at
> the user space level.  Although, again, my preference would be not to
> filter at the user space level, because user space can't differentiate
> between a probably-doesn't-matter-but-HW-offload-failed warning or legit
> something-is-not-right-in-the-software-but-command-succeeded warning :S

But userspace is the original requester. It should know what the rule
is intended for and how to act upon it. For ovs, for example, it could
just log a message and move on, while tc could report "hey, ok, but
please note that the rule couldn't be offloaded".

> So if there is a major use for non-forced offload failure warnings I
> would lean towards a new flag.

I'm thinking about this, still undecided. In the end maybe a counter
somewhere could be enough and such reporting is not needed. Thinking..

  Marcelo
Jakub Kicinski Jan. 29, 2018, 11:36 p.m. UTC | #9
On Sun, 28 Jan 2018 20:39:02 -0200, Marcelo Ricardo Leitner wrote:
> On Thu, Jan 25, 2018 at 02:57:17PM -0800, Jakub Kicinski wrote:
> > On Thu, 25 Jan 2018 13:11:57 -0200, Marcelo Ricardo Leitner wrote:  
> > > On Wed, Jan 24, 2018 at 12:54:12PM -0800, Jakub Kicinski wrote:  
> > > > Hi!
> > > > 
> > > > This series some of Jiri's comments and the fact that today drivers
> > > > may produce extack even if there is no skip_sw flag (meaning the
> > > > driver failure is not really a problem), and warning messages will
> > > > only confuse the users.    
> > > 
> > > It's a fair point, but I think this is not the best solution. How will
> > > the user then know why it failed to install in hw? Will have to
> > > install a new rule, just with skip_sw, and hope that it fails with the
> > > same reason?
> > >
> > > Maybe it's better to let tc/ovs/etc only exhibit this information
> > > under a certain log/debug level.  
> > 
> > What you say does make sense in case of classifiers which are basically
> > HW offload vehicles.  But for cls_bpf, which people are actually using
> > heavily as a software solution, I don't want any warning to be produced
> > just because someone happened to run the command on a Netronome
> > card :(  Say someone swaps an old NIC for a NFP, and runs their old
> > cls_bpf scripts and suddenly there are warnings they don't care about
> > and have no way of silencing.  
> 
> (Sorry for the delay on replying, btw. I'm still traveling.)
> 
> Makes sense. I agree that at least it shouldn't be displayed in a way
> that may lead the user to think it was a big/fatal error.
> 
> > I do think skip_sw will fail for the same reason, unless someone adds
> > extacks for IO or memory allocation problems or other transient things.  
> 
> I don't really follow this one. Fail you mean, fail to report the
> actual reason? If so, ok, but that's something easily fixable I think,
> especially because with skip_sw, if such an error happens, it's fatal
> for the operation so the error reporting is consistent.

I was referring to your question: "Will have to install a new rule,
just with skip_sw, and hope that it fails with the same reason?"
So yes, currently the only way to get the extack would be to retry the
command with skip_sw.

> > add a "verbose offload" flag to the API or filter the bad extacks at
> > the user space level.  Although, again, my preference would be not to
> > filter at the user space level, because user space can't differentiate
> > between a probably-doesn't-matter-but-HW-offload-failed warning or legit
> > something-is-not-right-in-the-software-but-command-succeeded warning :S  
> 
> But userspace is the original requester. It should know what the rule
> is intended for and how to act upon it. For ovs, for example, it could
> just log a message and move on, while tc could report "hey, ok, but
> please note that the rule couldn't be offloaded".
> 
> > So if there is a major use for non-forced offload failure warnings I
> > would lean towards a new flag.  
> 
> I'm thinking about this, still undecided. In the end maybe a counter
> somewhere could be enough and such reporting is not needed. Thinking..

Hm, special counters for failure reasons or just for all failures to
offload?  FWIW user space can dump the filters and if the filter is
offloaded there will be an in_hw flag Or added a few releases back.  
Let us know what your thoughts are :)