Message ID | 20180124205424.6976-1-jakub.kicinski@netronome.com |
---|---|
Headers | show |
Series | net: sched: propagate extack to cls offloads on destroy and only with skip_sw | expand |
From: Jakub Kicinski <jakub.kicinski@netronome.com> Date: Wed, 24 Jan 2018 12:54:12 -0800 > This series some of Jiri's comments and the fact that today drivers > may produce extack even if there is no skip_sw flag (meaning the > driver failure is not really a problem), and warning messages will > only confuse the users. ... > v2: > - rename the temporary tc_cls_common_offload_init(). This looks better, series applied, thanks Jakub.
Wed, Jan 24, 2018 at 09:54:12PM CET, jakub.kicinski@netronome.com wrote: >Hi! > >This series some of Jiri's comments and the fact that today drivers >may produce extack even if there is no skip_sw flag (meaning the >driver failure is not really a problem), and warning messages will >only confuse the users. > >First patch propagates extack to destroy as requested by Jiri, extack >is then propagated to the driver callback for each classifier. I chose >not to provide the extack on error paths. As a rule of thumb it seems >best to keep the extack of the condition which caused the error. E.g. > > err = this_will_fail(arg, extack); > if (err) { > undo_things(arg, NULL /* don't pass extack */); > return err; > } > >Note that NL_SET_ERR_MSG() will ignore the message if extack is NULL. >I was pondering whether we should make NL_SET_ERR_MSG() refuse to >overwrite the msg, but there seem to be cases in the tree where extack >is set like this: > > err = this_will_fail(arg, extack); > if (err) { > undo_things(arg, NULL /* don't pass extack */); > NL_SET_ERR_MSG(extack, "extack is set after undo call :/"); > return err; > } > >I think not passing extack to undo calls is reasonable. > >v2: > - rename the temporary tc_cls_common_offload_init(). > >Jakub Kicinski (12): > net: sched: propagate extack to cls->destroy callbacks > net: sched: prepare for reimplementation of > tc_cls_common_offload_init() > cls_bpf: remove gen_flags from bpf_offload > cls_bpf: pass offload flags to tc_cls_common_offload_init() > cls_bpf: propagate extack to offload delete callback > cls_matchall: pass offload flags to tc_cls_common_offload_init() > cls_matchall: propagate extack to delete callback > cls_flower: pass offload flags to tc_cls_common_offload_init() > cls_flower: propagate extack to delete callback > cls_u32: pass offload flags to tc_cls_common_offload_init() > cls_u32: propagate extack to delete callback > net: sched: remove tc_cls_common_offload_init_deprecated() For the record, I still think it is odd to have 6 patches just to add one arg to a function. I wonder where this unnecessary patch splits would lead to in the future. Anyway, since apparently no one really cares, and the code result looks good to me, for whole patchset: Acked-by: Jiri Pirko <jiri@mellanox.com>
On 1/24/18 2:04 PM, Jiri Pirko wrote: > For the record, I still think it is odd to have 6 patches just to add > one arg to a function. I wonder where this unnecessary patch splits > would lead to in the future. I think it made the review much easier than 1 really long patch.
Wed, Jan 24, 2018 at 10:07:25PM CET, dsahern@gmail.com wrote: >On 1/24/18 2:04 PM, Jiri Pirko wrote: >> For the record, I still think it is odd to have 6 patches just to add >> one arg to a function. I wonder where this unnecessary patch splits >> would lead to in the future. > >I think it made the review much easier than 1 really long patch. Even squashed, the patch is quite small. Doing the same thing in every hunk. On contrary, the split made it more complicated for me, because when I looked at patch 1 and the function duplication with another arg, I did not understand what is going on. Only the last patch actually explained it. But perhaps I'm slow.
On Wed, 24 Jan 2018 22:15:00 +0100, Jiri Pirko wrote: > Wed, Jan 24, 2018 at 10:07:25PM CET, dsahern@gmail.com wrote: > >On 1/24/18 2:04 PM, Jiri Pirko wrote: > >> For the record, I still think it is odd to have 6 patches just to add > >> one arg to a function. I wonder where this unnecessary patch splits > >> would lead to in the future. > > > >I think it made the review much easier than 1 really long patch. > > Even squashed, the patch is quite small. Doing the same thing in every > hunk. > > On contrary, the split made it more complicated for me, because when > I looked at patch 1 and the function duplication with another arg, > I did not understand what is going on. Only the last patch actually > explained it. But perhaps I'm slow. Next time I'll do a better job explaining things in commit logs, sorry!
On Wed, Jan 24, 2018 at 12:54:12PM -0800, Jakub Kicinski wrote: > Hi! > > This series some of Jiri's comments and the fact that today drivers > may produce extack even if there is no skip_sw flag (meaning the > driver failure is not really a problem), and warning messages will > only confuse the users. It's a fair point, but I think this is not the best solution. How will the user then know why it failed to install in hw? Will have to install a new rule, just with skip_sw, and hope that it fails with the same reason? Maybe it's better to let tc/ovs/etc only exhibit this information under a certain log/debug level. Marcelo
On Thu, 25 Jan 2018 13:11:57 -0200, Marcelo Ricardo Leitner wrote: > On Wed, Jan 24, 2018 at 12:54:12PM -0800, Jakub Kicinski wrote: > > Hi! > > > > This series some of Jiri's comments and the fact that today drivers > > may produce extack even if there is no skip_sw flag (meaning the > > driver failure is not really a problem), and warning messages will > > only confuse the users. > > It's a fair point, but I think this is not the best solution. How will > the user then know why it failed to install in hw? Will have to > install a new rule, just with skip_sw, and hope that it fails with the > same reason? > > Maybe it's better to let tc/ovs/etc only exhibit this information > under a certain log/debug level. What you say does make sense in case of classifiers which are basically HW offload vehicles. But for cls_bpf, which people are actually using heavily as a software solution, I don't want any warning to be produced just because someone happened to run the command on a Netronome card :( Say someone swaps an old NIC for a NFP, and runs their old cls_bpf scripts and suddenly there are warnings they don't care about and have no way of silencing. I do think skip_sw will fail for the same reason, unless someone adds extacks for IO or memory allocation problems or other transient things. Do I understand correctly that OvS TC does not set skip_sw? We could add a "verbose offload" flag to the API or filter the bad extacks at the user space level. Although, again, my preference would be not to filter at the user space level, because user space can't differentiate between a probably-doesn't-matter-but-HW-offload-failed warning or legit something-is-not-right-in-the-software-but-command-succeeded warning :S So if there is a major use for non-forced offload failure warnings I would lean towards a new flag.
On Thu, Jan 25, 2018 at 02:57:17PM -0800, Jakub Kicinski wrote: > On Thu, 25 Jan 2018 13:11:57 -0200, Marcelo Ricardo Leitner wrote: > > On Wed, Jan 24, 2018 at 12:54:12PM -0800, Jakub Kicinski wrote: > > > Hi! > > > > > > This series some of Jiri's comments and the fact that today drivers > > > may produce extack even if there is no skip_sw flag (meaning the > > > driver failure is not really a problem), and warning messages will > > > only confuse the users. > > > > It's a fair point, but I think this is not the best solution. How will > > the user then know why it failed to install in hw? Will have to > > install a new rule, just with skip_sw, and hope that it fails with the > > same reason? > > > > Maybe it's better to let tc/ovs/etc only exhibit this information > > under a certain log/debug level. > > What you say does make sense in case of classifiers which are basically > HW offload vehicles. But for cls_bpf, which people are actually using > heavily as a software solution, I don't want any warning to be produced > just because someone happened to run the command on a Netronome > card :( Say someone swaps an old NIC for a NFP, and runs their old > cls_bpf scripts and suddenly there are warnings they don't care about > and have no way of silencing. (Sorry for the delay on replying, btw. I'm still traveling.) Makes sense. I agree that at least it shouldn't be displayed in a way that may lead the user to think it was a big/fatal error. > > I do think skip_sw will fail for the same reason, unless someone adds > extacks for IO or memory allocation problems or other transient things. I don't really follow this one. Fail you mean, fail to report the actual reason? If so, ok, but that's something easily fixable I think, especially because with skip_sw, if such an error happens, it's fatal for the operation so the error reporting is consistent. > > Do I understand correctly that OvS TC does not set skip_sw? We could Yes. > add a "verbose offload" flag to the API or filter the bad extacks at > the user space level. Although, again, my preference would be not to > filter at the user space level, because user space can't differentiate > between a probably-doesn't-matter-but-HW-offload-failed warning or legit > something-is-not-right-in-the-software-but-command-succeeded warning :S But userspace is the original requester. It should know what the rule is intended for and how to act upon it. For ovs, for example, it could just log a message and move on, while tc could report "hey, ok, but please note that the rule couldn't be offloaded". > So if there is a major use for non-forced offload failure warnings I > would lean towards a new flag. I'm thinking about this, still undecided. In the end maybe a counter somewhere could be enough and such reporting is not needed. Thinking.. Marcelo
On Sun, 28 Jan 2018 20:39:02 -0200, Marcelo Ricardo Leitner wrote: > On Thu, Jan 25, 2018 at 02:57:17PM -0800, Jakub Kicinski wrote: > > On Thu, 25 Jan 2018 13:11:57 -0200, Marcelo Ricardo Leitner wrote: > > > On Wed, Jan 24, 2018 at 12:54:12PM -0800, Jakub Kicinski wrote: > > > > Hi! > > > > > > > > This series some of Jiri's comments and the fact that today drivers > > > > may produce extack even if there is no skip_sw flag (meaning the > > > > driver failure is not really a problem), and warning messages will > > > > only confuse the users. > > > > > > It's a fair point, but I think this is not the best solution. How will > > > the user then know why it failed to install in hw? Will have to > > > install a new rule, just with skip_sw, and hope that it fails with the > > > same reason? > > > > > > Maybe it's better to let tc/ovs/etc only exhibit this information > > > under a certain log/debug level. > > > > What you say does make sense in case of classifiers which are basically > > HW offload vehicles. But for cls_bpf, which people are actually using > > heavily as a software solution, I don't want any warning to be produced > > just because someone happened to run the command on a Netronome > > card :( Say someone swaps an old NIC for a NFP, and runs their old > > cls_bpf scripts and suddenly there are warnings they don't care about > > and have no way of silencing. > > (Sorry for the delay on replying, btw. I'm still traveling.) > > Makes sense. I agree that at least it shouldn't be displayed in a way > that may lead the user to think it was a big/fatal error. > > > I do think skip_sw will fail for the same reason, unless someone adds > > extacks for IO or memory allocation problems or other transient things. > > I don't really follow this one. Fail you mean, fail to report the > actual reason? If so, ok, but that's something easily fixable I think, > especially because with skip_sw, if such an error happens, it's fatal > for the operation so the error reporting is consistent. I was referring to your question: "Will have to install a new rule, just with skip_sw, and hope that it fails with the same reason?" So yes, currently the only way to get the extack would be to retry the command with skip_sw. > > add a "verbose offload" flag to the API or filter the bad extacks at > > the user space level. Although, again, my preference would be not to > > filter at the user space level, because user space can't differentiate > > between a probably-doesn't-matter-but-HW-offload-failed warning or legit > > something-is-not-right-in-the-software-but-command-succeeded warning :S > > But userspace is the original requester. It should know what the rule > is intended for and how to act upon it. For ovs, for example, it could > just log a message and move on, while tc could report "hey, ok, but > please note that the rule couldn't be offloaded". > > > So if there is a major use for non-forced offload failure warnings I > > would lean towards a new flag. > > I'm thinking about this, still undecided. In the end maybe a counter > somewhere could be enough and such reporting is not needed. Thinking.. Hm, special counters for failure reasons or just for all failures to offload? FWIW user space can dump the filters and if the filter is offloaded there will be an in_hw flag Or added a few releases back. Let us know what your thoughts are :)