diff mbox series

[net,v2] bonding: fix feature flag setting at init time

Message ID 20201202173053.13800-1-jarod@redhat.com
State Not Applicable
Headers show
Series [net,v2] bonding: fix feature flag setting at init time | expand

Commit Message

Jarod Wilson Dec. 2, 2020, 5:30 p.m. UTC
Don't try to adjust XFRM support flags if the bond device isn't yet
registered. Bad things can currently happen when netdev_change_features()
is called without having wanted_features fully filled in yet. Basically,
this code was racing against register_netdevice() filling in
wanted_features, and when it got there first, the empty wanted_features
led to features also getting emptied out, which was definitely not the
intended behavior, so prevent that from happening.

Originally, I'd hoped to stop adjusting wanted_features at all in the
bonding driver, as it's documented as being something only the network
core should touch, but we actually do need to do this to properly update
both the features and wanted_features fields when changing the bond type,
or we get to a situation where ethtool sees:

    esp-hw-offload: off [requested on]

I do think we should be using netdev_update_features instead of
netdev_change_features here though, so we only send notifiers when the
features actually changed.

v2: rework based on further testing and suggestions from ivecera

Fixes: a3b658cfb664 ("bonding: allow xfrm offload setup post-module-load")
Reported-by: Ivan Vecera <ivecera@redhat.com>
Suggested-by: Ivan Vecera <ivecera@redhat.com>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Thomas Davis <tadavis@lbl.gov>
Cc: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
 drivers/net/bonding/bond_main.c    | 10 ++++------
 drivers/net/bonding/bond_options.c |  6 +++++-
 2 files changed, 9 insertions(+), 7 deletions(-)

Comments

Ivan Vecera Dec. 2, 2020, 5:41 p.m. UTC | #1
On Wed,  2 Dec 2020 12:30:53 -0500
Jarod Wilson <jarod@redhat.com> wrote:

> Don't try to adjust XFRM support flags if the bond device isn't yet
> registered. Bad things can currently happen when netdev_change_features()
> is called without having wanted_features fully filled in yet. Basically,
> this code was racing against register_netdevice() filling in
> wanted_features, and when it got there first, the empty wanted_features
> led to features also getting emptied out, which was definitely not the
> intended behavior, so prevent that from happening.
> 
> Originally, I'd hoped to stop adjusting wanted_features at all in the
> bonding driver, as it's documented as being something only the network
> core should touch, but we actually do need to do this to properly update
> both the features and wanted_features fields when changing the bond type,
> or we get to a situation where ethtool sees:
> 
>     esp-hw-offload: off [requested on]
> 
> I do think we should be using netdev_update_features instead of
> netdev_change_features here though, so we only send notifiers when the
> features actually changed.
> 
> v2: rework based on further testing and suggestions from ivecera
> 
> Fixes: a3b658cfb664 ("bonding: allow xfrm offload setup post-module-load")
> Reported-by: Ivan Vecera <ivecera@redhat.com>
> Suggested-by: Ivan Vecera <ivecera@redhat.com>
> Cc: Jay Vosburgh <j.vosburgh@gmail.com>
> Cc: Veaceslav Falico <vfalico@gmail.com>
> Cc: Andy Gospodarek <andy@greyhouse.net>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Thomas Davis <tadavis@lbl.gov>
> Cc: netdev@vger.kernel.org
> Signed-off-by: Jarod Wilson <jarod@redhat.com>
> ---
>  drivers/net/bonding/bond_main.c    | 10 ++++------
>  drivers/net/bonding/bond_options.c |  6 +++++-
>  2 files changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index e0880a3840d7..5fe5232cc3f3 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -4746,15 +4746,13 @@ void bond_setup(struct net_device *bond_dev)
>  				NETIF_F_HW_VLAN_CTAG_FILTER;
>  
>  	bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL;
> -#ifdef CONFIG_XFRM_OFFLOAD
> -	bond_dev->hw_features |= BOND_XFRM_FEATURES;
> -#endif /* CONFIG_XFRM_OFFLOAD */
>  	bond_dev->features |= bond_dev->hw_features;
>  	bond_dev->features |= NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX;
>  #ifdef CONFIG_XFRM_OFFLOAD
> -	/* Disable XFRM features if this isn't an active-backup config */
> -	if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
> -		bond_dev->features &= ~BOND_XFRM_FEATURES;
> +	bond_dev->hw_features |= BOND_XFRM_FEATURES;
> +	/* Only enable XFRM features if this is an active-backup config */
> +	if (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP)
> +		bond_dev->features |= BOND_XFRM_FEATURES;
>  #endif /* CONFIG_XFRM_OFFLOAD */
>  }
>  
> diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
> index 9abfaae1c6f7..19205cfac751 100644
> --- a/drivers/net/bonding/bond_options.c
> +++ b/drivers/net/bonding/bond_options.c
> @@ -768,11 +768,15 @@ static int bond_option_mode_set(struct bonding *bond,
>  		bond->params.tlb_dynamic_lb = 1;
>  
>  #ifdef CONFIG_XFRM_OFFLOAD
> +	if (bond->dev->reg_state != NETREG_REGISTERED)
> +		goto noreg;
> +
>  	if (newval->value == BOND_MODE_ACTIVEBACKUP)
>  		bond->dev->wanted_features |= BOND_XFRM_FEATURES;
>  	else
>  		bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
> -	netdev_change_features(bond->dev);
> +	netdev_update_features(bond->dev);
> +noreg:
>  #endif /* CONFIG_XFRM_OFFLOAD */
>  
>  	/* don't cache arp_validate between modes */

Tested-by: Ivan Vecera <ivecera@redhat.com>
Jakub Kicinski Dec. 2, 2020, 5:53 p.m. UTC | #2
On Wed,  2 Dec 2020 12:30:53 -0500 Jarod Wilson wrote:
> +	if (bond->dev->reg_state != NETREG_REGISTERED)
> +		goto noreg;
> +
>  	if (newval->value == BOND_MODE_ACTIVEBACKUP)
>  		bond->dev->wanted_features |= BOND_XFRM_FEATURES;
>  	else
>  		bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
> -	netdev_change_features(bond->dev);
> +	netdev_update_features(bond->dev);
> +noreg:

Why the goto?
Jay Vosburgh Dec. 2, 2020, 5:55 p.m. UTC | #3
Jarod Wilson <jarod@redhat.com> wrote:

>Don't try to adjust XFRM support flags if the bond device isn't yet
>registered. Bad things can currently happen when netdev_change_features()
>is called without having wanted_features fully filled in yet. Basically,
>this code was racing against register_netdevice() filling in
>wanted_features, and when it got there first, the empty wanted_features
>led to features also getting emptied out, which was definitely not the
>intended behavior, so prevent that from happening.

	Is this an actual race?  Reading Ivan's prior message, it sounds
like it's an ordering problem (in that bond_newlink calls
register_netdevice after bond_changelink).

	The change to bond_option_mode_set tests against reg_state, so
presumably it wants to skip the first(?) time through, before the
register_netdevice call; is that right?

	-J

>Originally, I'd hoped to stop adjusting wanted_features at all in the
>bonding driver, as it's documented as being something only the network
>core should touch, but we actually do need to do this to properly update
>both the features and wanted_features fields when changing the bond type,
>or we get to a situation where ethtool sees:
>
>    esp-hw-offload: off [requested on]
>
>I do think we should be using netdev_update_features instead of
>netdev_change_features here though, so we only send notifiers when the
>features actually changed.
>
>v2: rework based on further testing and suggestions from ivecera
>
>Fixes: a3b658cfb664 ("bonding: allow xfrm offload setup post-module-load")
>Reported-by: Ivan Vecera <ivecera@redhat.com>
>Suggested-by: Ivan Vecera <ivecera@redhat.com>
>Cc: Jay Vosburgh <j.vosburgh@gmail.com>
>Cc: Veaceslav Falico <vfalico@gmail.com>
>Cc: Andy Gospodarek <andy@greyhouse.net>
>Cc: "David S. Miller" <davem@davemloft.net>
>Cc: Jakub Kicinski <kuba@kernel.org>
>Cc: Thomas Davis <tadavis@lbl.gov>
>Cc: netdev@vger.kernel.org
>Signed-off-by: Jarod Wilson <jarod@redhat.com>
>---
> drivers/net/bonding/bond_main.c    | 10 ++++------
> drivers/net/bonding/bond_options.c |  6 +++++-
> 2 files changed, 9 insertions(+), 7 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index e0880a3840d7..5fe5232cc3f3 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -4746,15 +4746,13 @@ void bond_setup(struct net_device *bond_dev)
> 				NETIF_F_HW_VLAN_CTAG_FILTER;
> 
> 	bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL;
>-#ifdef CONFIG_XFRM_OFFLOAD
>-	bond_dev->hw_features |= BOND_XFRM_FEATURES;
>-#endif /* CONFIG_XFRM_OFFLOAD */
> 	bond_dev->features |= bond_dev->hw_features;
> 	bond_dev->features |= NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX;
> #ifdef CONFIG_XFRM_OFFLOAD
>-	/* Disable XFRM features if this isn't an active-backup config */
>-	if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
>-		bond_dev->features &= ~BOND_XFRM_FEATURES;
>+	bond_dev->hw_features |= BOND_XFRM_FEATURES;
>+	/* Only enable XFRM features if this is an active-backup config */
>+	if (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP)
>+		bond_dev->features |= BOND_XFRM_FEATURES;
> #endif /* CONFIG_XFRM_OFFLOAD */
> }
> 
>diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
>index 9abfaae1c6f7..19205cfac751 100644
>--- a/drivers/net/bonding/bond_options.c
>+++ b/drivers/net/bonding/bond_options.c
>@@ -768,11 +768,15 @@ static int bond_option_mode_set(struct bonding *bond,
> 		bond->params.tlb_dynamic_lb = 1;
> 
> #ifdef CONFIG_XFRM_OFFLOAD
>+	if (bond->dev->reg_state != NETREG_REGISTERED)
>+		goto noreg;
>+
> 	if (newval->value == BOND_MODE_ACTIVEBACKUP)
> 		bond->dev->wanted_features |= BOND_XFRM_FEATURES;
> 	else
> 		bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
>-	netdev_change_features(bond->dev);
>+	netdev_update_features(bond->dev);
>+noreg:
>
> #endif /* CONFIG_XFRM_OFFLOAD */
> 
> 	/* don't cache arp_validate between modes */
>-- 
>2.28.0
>

---
	-Jay Vosburgh, jay.vosburgh@canonical.com
Jarod Wilson Dec. 2, 2020, 7:03 p.m. UTC | #4
On Wed, Dec 2, 2020 at 12:53 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed,  2 Dec 2020 12:30:53 -0500 Jarod Wilson wrote:
> > +     if (bond->dev->reg_state != NETREG_REGISTERED)
> > +             goto noreg;
> > +
> >       if (newval->value == BOND_MODE_ACTIVEBACKUP)
> >               bond->dev->wanted_features |= BOND_XFRM_FEATURES;
> >       else
> >               bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
> > -     netdev_change_features(bond->dev);
> > +     netdev_update_features(bond->dev);
> > +noreg:
>
> Why the goto?

Seemed cleaner to prevent an extra level of indentation of the code
following the goto and before the label, but I'm not that attached to
it if it's not wanted for coding style reasons.
Jakub Kicinski Dec. 2, 2020, 7:22 p.m. UTC | #5
On Wed, 2 Dec 2020 14:03:53 -0500 Jarod Wilson wrote:
> On Wed, Dec 2, 2020 at 12:53 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Wed,  2 Dec 2020 12:30:53 -0500 Jarod Wilson wrote:  
> > > +     if (bond->dev->reg_state != NETREG_REGISTERED)
> > > +             goto noreg;
> > > +
> > >       if (newval->value == BOND_MODE_ACTIVEBACKUP)
> > >               bond->dev->wanted_features |= BOND_XFRM_FEATURES;
> > >       else
> > >               bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
> > > -     netdev_change_features(bond->dev);
> > > +     netdev_update_features(bond->dev);
> > > +noreg:  
> >
> > Why the goto?  
> 
> Seemed cleaner to prevent an extra level of indentation of the code
> following the goto and before the label, but I'm not that attached to
> it if it's not wanted for coding style reasons.

Yes, please don't use gotos where a normal if statement is sufficient.
If you must avoid the indentation move the code to a helper.

Also - this patch did not apply to net, please make sure you're
developing on the correct base.
Jarod Wilson Dec. 2, 2020, 7:23 p.m. UTC | #6
On Wed, Dec 2, 2020 at 12:55 PM Jay Vosburgh <jay.vosburgh@canonical.com> wrote:
>
> Jarod Wilson <jarod@redhat.com> wrote:
>
> >Don't try to adjust XFRM support flags if the bond device isn't yet
> >registered. Bad things can currently happen when netdev_change_features()
> >is called without having wanted_features fully filled in yet. Basically,
> >this code was racing against register_netdevice() filling in
> >wanted_features, and when it got there first, the empty wanted_features
> >led to features also getting emptied out, which was definitely not the
> >intended behavior, so prevent that from happening.
>
>         Is this an actual race?  Reading Ivan's prior message, it sounds
> like it's an ordering problem (in that bond_newlink calls
> register_netdevice after bond_changelink).

Sorry, yeah, this is not actually a race condition, just an ordering
issue, bond_check_params() gets called at init time, which leads to
bond_option_mode_set() being called, and does so prior to
bond_create() running, which is where we actually call
register_netdevice().

>         The change to bond_option_mode_set tests against reg_state, so
> presumably it wants to skip the first(?) time through, before the
> register_netdevice call; is that right?

Correct. Later on, when the bonding driver is already loaded, and
parameter changes are made, bond_option_mode_set() gets called and if
the mode changes to or from active-backup, we do need/want this code
to run to update wanted and features flags properly.
Jarod Wilson Dec. 2, 2020, 7:39 p.m. UTC | #7
On Wed, Dec 2, 2020 at 2:23 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 2 Dec 2020 14:03:53 -0500 Jarod Wilson wrote:
> > On Wed, Dec 2, 2020 at 12:53 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > >
> > > On Wed,  2 Dec 2020 12:30:53 -0500 Jarod Wilson wrote:
> > > > +     if (bond->dev->reg_state != NETREG_REGISTERED)
> > > > +             goto noreg;
> > > > +
> > > >       if (newval->value == BOND_MODE_ACTIVEBACKUP)
> > > >               bond->dev->wanted_features |= BOND_XFRM_FEATURES;
> > > >       else
> > > >               bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
> > > > -     netdev_change_features(bond->dev);
> > > > +     netdev_update_features(bond->dev);
> > > > +noreg:
> > >
> > > Why the goto?
> >
> > Seemed cleaner to prevent an extra level of indentation of the code
> > following the goto and before the label, but I'm not that attached to
> > it if it's not wanted for coding style reasons.
>
> Yes, please don't use gotos where a normal if statement is sufficient.
> If you must avoid the indentation move the code to a helper.
>
> Also - this patch did not apply to net, please make sure you're
> developing on the correct base.

Argh, I must have been working in net-next instead of net, apologies.
Okay, I'll clarify the description per what Jay pointed out and adjust
the code to not include a goto, then make it on the right branch.
Jay Vosburgh Dec. 2, 2020, 8:17 p.m. UTC | #8
Jarod Wilson <jarod@redhat.com> wrote:

>On Wed, Dec 2, 2020 at 12:55 PM Jay Vosburgh <jay.vosburgh@canonical.com> wrote:
>>
>> Jarod Wilson <jarod@redhat.com> wrote:
>>
>> >Don't try to adjust XFRM support flags if the bond device isn't yet
>> >registered. Bad things can currently happen when netdev_change_features()
>> >is called without having wanted_features fully filled in yet. Basically,
>> >this code was racing against register_netdevice() filling in
>> >wanted_features, and when it got there first, the empty wanted_features
>> >led to features also getting emptied out, which was definitely not the
>> >intended behavior, so prevent that from happening.
>>
>>         Is this an actual race?  Reading Ivan's prior message, it sounds
>> like it's an ordering problem (in that bond_newlink calls
>> register_netdevice after bond_changelink).
>
>Sorry, yeah, this is not actually a race condition, just an ordering
>issue, bond_check_params() gets called at init time, which leads to
>bond_option_mode_set() being called, and does so prior to
>bond_create() running, which is where we actually call
>register_netdevice().

	So this only happens if there's a "mode" module parameter?  That
doesn't sound like the call path that Ivan described (coming in via
bond_newlink).

	-J

>>         The change to bond_option_mode_set tests against reg_state, so
>> presumably it wants to skip the first(?) time through, before the
>> register_netdevice call; is that right?
>
>Correct. Later on, when the bonding driver is already loaded, and
>parameter changes are made, bond_option_mode_set() gets called and if
>the mode changes to or from active-backup, we do need/want this code
>to run to update wanted and features flags properly.
>
>
>-- 
>Jarod Wilson
>jarod@redhat.com

---
	-Jay Vosburgh, jay.vosburgh@canonical.com
Jarod Wilson Dec. 2, 2020, 8:54 p.m. UTC | #9
On Wed, Dec 2, 2020 at 3:17 PM Jay Vosburgh <jay.vosburgh@canonical.com> wrote:
>
> Jarod Wilson <jarod@redhat.com> wrote:
>
> >On Wed, Dec 2, 2020 at 12:55 PM Jay Vosburgh <jay.vosburgh@canonical.com> wrote:
> >>
> >> Jarod Wilson <jarod@redhat.com> wrote:
> >>
> >> >Don't try to adjust XFRM support flags if the bond device isn't yet
> >> >registered. Bad things can currently happen when netdev_change_features()
> >> >is called without having wanted_features fully filled in yet. Basically,
> >> >this code was racing against register_netdevice() filling in
> >> >wanted_features, and when it got there first, the empty wanted_features
> >> >led to features also getting emptied out, which was definitely not the
> >> >intended behavior, so prevent that from happening.
> >>
> >>         Is this an actual race?  Reading Ivan's prior message, it sounds
> >> like it's an ordering problem (in that bond_newlink calls
> >> register_netdevice after bond_changelink).
> >
> >Sorry, yeah, this is not actually a race condition, just an ordering
> >issue, bond_check_params() gets called at init time, which leads to
> >bond_option_mode_set() being called, and does so prior to
> >bond_create() running, which is where we actually call
> >register_netdevice().
>
>         So this only happens if there's a "mode" module parameter?  That
> doesn't sound like the call path that Ivan described (coming in via
> bond_newlink).

Ah. I think there's actually two different pathways that can trigger
this. The first is for bonds created at module load time, which I was
describing, the second is for a new bond created via bond_newlink()
after the bonding module is already loaded, as described by Ivan. Both
have the problem of bond_option_mode_set() running prior to
register_netdevice(). Of course, that would suggest every bond
currently comes up with unintentionally neutered flags, which I
neglected to catch in earlier testing and development.
diff mbox series

Patch

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index e0880a3840d7..5fe5232cc3f3 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4746,15 +4746,13 @@  void bond_setup(struct net_device *bond_dev)
 				NETIF_F_HW_VLAN_CTAG_FILTER;
 
 	bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL;
-#ifdef CONFIG_XFRM_OFFLOAD
-	bond_dev->hw_features |= BOND_XFRM_FEATURES;
-#endif /* CONFIG_XFRM_OFFLOAD */
 	bond_dev->features |= bond_dev->hw_features;
 	bond_dev->features |= NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX;
 #ifdef CONFIG_XFRM_OFFLOAD
-	/* Disable XFRM features if this isn't an active-backup config */
-	if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
-		bond_dev->features &= ~BOND_XFRM_FEATURES;
+	bond_dev->hw_features |= BOND_XFRM_FEATURES;
+	/* Only enable XFRM features if this is an active-backup config */
+	if (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP)
+		bond_dev->features |= BOND_XFRM_FEATURES;
 #endif /* CONFIG_XFRM_OFFLOAD */
 }
 
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 9abfaae1c6f7..19205cfac751 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -768,11 +768,15 @@  static int bond_option_mode_set(struct bonding *bond,
 		bond->params.tlb_dynamic_lb = 1;
 
 #ifdef CONFIG_XFRM_OFFLOAD
+	if (bond->dev->reg_state != NETREG_REGISTERED)
+		goto noreg;
+
 	if (newval->value == BOND_MODE_ACTIVEBACKUP)
 		bond->dev->wanted_features |= BOND_XFRM_FEATURES;
 	else
 		bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
-	netdev_change_features(bond->dev);
+	netdev_update_features(bond->dev);
+noreg:
 #endif /* CONFIG_XFRM_OFFLOAD */
 
 	/* don't cache arp_validate between modes */