diff mbox

[v2,net-next] net/core: generic support for disabling netdev features down stack

Message ID 1446519359-21400-1-git-send-email-jarod@redhat.com
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Jarod Wilson Nov. 3, 2015, 2:55 a.m. UTC
There are some netdev features, which when disabled on an upper device,
such as a bonding master or a bridge, must be disabled and cannot be
re-enabled on underlying devices.

This is a rework of an earlier more heavy-handed appraoch, which simply
disables and prevents re-enabling of netdev features listed in a new
define in include/net/netdev_features.h, NETIF_F_UPPER_DISABLES. Any upper
device that disables a flag in that feature mask, the disabling will
propagate down the stack, and any lower device that has any upper device
with one of those flags disabled should not be able to enable said flag.

Initially, only LRO is included for proof of concept, and because this
code effectively does the same thing as dev_disable_lro(), though it will
also activate from the ethtool path, which was one of the goals here.

[root@dell-per730-01 ~]# ethtool -k bond0 |grep large
large-receive-offload: on
[root@dell-per730-01 ~]# ethtool -k p5p1 |grep large
large-receive-offload: on
[root@dell-per730-01 ~]# ethtool -K bond0 lro off
[root@dell-per730-01 ~]# ethtool -k bond0 |grep large
large-receive-offload: off
[root@dell-per730-01 ~]# ethtool -k p5p1 |grep large
large-receive-offload: off

dmesg dump:

[ 1033.277986] bond0: Disabling feature 0x0000000000008000 on lower dev p5p2.
[ 1034.067949] bnx2x 0000:06:00.1 p5p2: using MSI-X  IRQs: sp 74  fp[0] 76 ... fp[7] 83
[ 1034.753612] bond0: Disabling feature 0x0000000000008000 on lower dev p5p1.
[ 1035.591019] bnx2x 0000:06:00.0 p5p1: using MSI-X  IRQs: sp 62  fp[0] 64 ... fp[7] 71

This has been successfully tested with bnx2x, qlcnic and netxen network
cards as slaves in a bond interface. Turning LRO on or off on the master
also turns it on or off on each of the slaves, new slaves are added with
LRO in the same state as the master, and LRO can't be toggled on the
slaves.

Also, this should largely remove the need for dev_disable_lro(), and most,
if not all, of its call sites can be replaced by simply making sure
NETIF_F_LRO isn't included in the relevant device's feature flags.

Note that this patch is driven by bug reports from users saying it was
confusing that bonds and slaves had different settings for the same
features, and while it won't be 100% in sync if a lower device doesn't
support a feature like LRO, I think this is a good step in the right
direction.

CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <gospo@cumulusnetworks.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Nikolay Aleksandrov <razor@blackwall.org>
CC: Michal Kubecek <mkubecek@suse.cz>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
v2: Move manipulation of lower devices after ndo_set_features so the upper
device gets fully set up first and remove now redundant flag clearing, as
suggested by Alex Duyck. Filtering features on a current lower device
needs to be done prior to ndo_set_features, but both are now in
__netdev_update_features().

 include/linux/netdev_features.h | 11 +++++++++
 net/core/dev.c                  | 50 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 61 insertions(+)

Comments

David Miller Nov. 3, 2015, 4:41 a.m. UTC | #1
From: Jarod Wilson <jarod@redhat.com>
Date: Mon,  2 Nov 2015 21:55:59 -0500

> There are some netdev features, which when disabled on an upper device,
> such as a bonding master or a bridge, must be disabled and cannot be
> re-enabled on underlying devices.
> 
> This is a rework of an earlier more heavy-handed appraoch, which simply
> disables and prevents re-enabling of netdev features listed in a new
> define in include/net/netdev_features.h, NETIF_F_UPPER_DISABLES. Any upper
> device that disables a flag in that feature mask, the disabling will
> propagate down the stack, and any lower device that has any upper device
> with one of those flags disabled should not be able to enable said flag.
> 
> Initially, only LRO is included for proof of concept, and because this
> code effectively does the same thing as dev_disable_lro(), though it will
> also activate from the ethtool path, which was one of the goals here.
 ...
> This has been successfully tested with bnx2x, qlcnic and netxen network
> cards as slaves in a bond interface. Turning LRO on or off on the master
> also turns it on or off on each of the slaves, new slaves are added with
> LRO in the same state as the master, and LRO can't be toggled on the
> slaves.
> 
> Also, this should largely remove the need for dev_disable_lro(), and most,
> if not all, of its call sites can be replaced by simply making sure
> NETIF_F_LRO isn't included in the relevant device's feature flags.
> 
> Note that this patch is driven by bug reports from users saying it was
> confusing that bonds and slaves had different settings for the same
> features, and while it won't be 100% in sync if a lower device doesn't
> support a feature like LRO, I think this is a good step in the right
> direction.
 ...
> Signed-off-by: Jarod Wilson <jarod@redhat.com>

Looks good to me, applied, thanks Jarod.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nikolay Aleksandrov Nov. 3, 2015, 10:03 a.m. UTC | #2
On 11/03/2015 03:55 AM, Jarod Wilson wrote:
[snip]
>  
> +#define for_each_netdev_feature(mask_addr, feature)				\
> +	int bit;								\
> +	for_each_set_bit(bit, (unsigned long *)mask_addr, NETDEV_FEATURE_COUNT)	\
> +		feature = __NETIF_F_BIT(bit);
> +
^
This is broken, it will not work for more than a single feature.

>  /* Features valid for ethtool to change */
>  /* = all defined minus driver/device-class-related */
>  #define NETIF_F_NEVER_CHANGE	(NETIF_F_VLAN_CHALLENGED | \
> @@ -167,6 +172,12 @@ enum {
>   */
>  #define NETIF_F_ALL_FOR_ALL	(NETIF_F_NOCACHE_COPY | NETIF_F_FSO)
>  
> +/*
> + * If upper/master device has these features disabled, they must be disabled
> + * on all lower/slave devices as well.
> + */
> +#define NETIF_F_UPPER_DISABLES	NETIF_F_LRO
> +
[snip]
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Geert Uytterhoeven Nov. 3, 2015, 1:52 p.m. UTC | #3
On Tue, Nov 3, 2015 at 11:03 AM, Nikolay Aleksandrov
<nikolay@cumulusnetworks.com> wrote:
> On 11/03/2015 03:55 AM, Jarod Wilson wrote:
> [snip]
>>
>> +#define for_each_netdev_feature(mask_addr, feature)                          \
>> +     int bit;                                                                \
>> +     for_each_set_bit(bit, (unsigned long *)mask_addr, NETDEV_FEATURE_COUNT) \
>> +             feature = __NETIF_F_BIT(bit);
>> +
> ^
> This is broken, it will not work for more than a single feature.

Indeed it is.

This is used as:

        for_each_netdev_feature(&upper_disables, feature) {
        ...
        }

which expands to:

        int bit;
        for_each_set_bit(bit, (unsigned long *)mask_addr, NETDEV_FEATURE_COUNT)
                feature = __NETIF_F_BIT(bit);
        {
                ...
        }

Note the assignment to "feature" happens outside the {}-delimited block.
And the block is always executed once.

Interestingly, arm-unknown-linux-gnueabi-gcc 4.9.0 warns about this in
some of my configs, but not in all of them:

net/core/dev.c: In function '__netdev_update_features':
net/core/dev.c:6302:16: warning: 'feature' may be used uninitialized
in this function [-Wmaybe-uninitialized]
    features &= ~feature;
                ^
net/core/dev.c:6295:20: note: 'feature' was declared here
  netdev_features_t feature;
                    ^
Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarod Wilson Nov. 3, 2015, 1:57 p.m. UTC | #4
Geert Uytterhoeven wrote:
> On Tue, Nov 3, 2015 at 11:03 AM, Nikolay Aleksandrov
> <nikolay@cumulusnetworks.com>  wrote:
>> On 11/03/2015 03:55 AM, Jarod Wilson wrote:
>> [snip]
>>> +#define for_each_netdev_feature(mask_addr, feature)                          \
>>> +     int bit;                                                                \
>>> +     for_each_set_bit(bit, (unsigned long *)mask_addr, NETDEV_FEATURE_COUNT) \
>>> +             feature = __NETIF_F_BIT(bit);
>>> +
>> ^
>> This is broken, it will not work for more than a single feature.
>
> Indeed it is.
>
> This is used as:
>
>          for_each_netdev_feature(&upper_disables, feature) {
>          ...
>          }
>
> which expands to:
>
>          int bit;
>          for_each_set_bit(bit, (unsigned long *)mask_addr, NETDEV_FEATURE_COUNT)
>                  feature = __NETIF_F_BIT(bit);
>          {
>                  ...
>          }
>
> Note the assignment to "feature" happens outside the {}-delimited block.
> And the block is always executed once.

Bah, crap, I was still staring at the code not seeing it, thank you for 
the detailed cluebat. I'll fix that up right now.
Nikolay Aleksandrov Nov. 3, 2015, 2:05 p.m. UTC | #5
On 11/03/2015 02:57 PM, Jarod Wilson wrote:
> Geert Uytterhoeven wrote:
>> On Tue, Nov 3, 2015 at 11:03 AM, Nikolay Aleksandrov
>> <nikolay@cumulusnetworks.com>  wrote:
>>> On 11/03/2015 03:55 AM, Jarod Wilson wrote:
>>> [snip]
>>>> +#define for_each_netdev_feature(mask_addr, feature)                          \
>>>> +     int bit;                                                                \
>>>> +     for_each_set_bit(bit, (unsigned long *)mask_addr, NETDEV_FEATURE_COUNT) \
>>>> +             feature = __NETIF_F_BIT(bit);
>>>> +
>>> ^
>>> This is broken, it will not work for more than a single feature.
>>
>> Indeed it is.
>>
>> This is used as:
>>
>>          for_each_netdev_feature(&upper_disables, feature) {
>>          ...
>>          }
>>
>> which expands to:
>>
>>          int bit;
>>          for_each_set_bit(bit, (unsigned long *)mask_addr, NETDEV_FEATURE_COUNT)
>>                  feature = __NETIF_F_BIT(bit);
>>          {
>>                  ...
>>          }
>>
>> Note the assignment to "feature" happens outside the {}-delimited block.
>> And the block is always executed once.
> 
> Bah, crap, I was still staring at the code not seeing it, thank you for the detailed cluebat. I'll fix that up right now.
> 

Yeah, sorry for not elaborating, I wrote it in a hurry. :-)
Thanks Geert!

By the way since you'll be changing this code, I don't know if it's okay to
declare caller-visible hidden local variables in a macro like this, at the very
least please consider renaming it to something that's much less common, I can see
"bit" being used here and there. IMO either try to find a way to avoid it
altogether or add another argument to the macro so it's explicitly passed.

Cheers,
 Nik


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarod Wilson Nov. 3, 2015, 3:18 p.m. UTC | #6
Nikolay Aleksandrov wrote:
> On 11/03/2015 02:57 PM, Jarod Wilson wrote:
>> Geert Uytterhoeven wrote:
>>> On Tue, Nov 3, 2015 at 11:03 AM, Nikolay Aleksandrov
>>> <nikolay@cumulusnetworks.com>   wrote:
>>>> On 11/03/2015 03:55 AM, Jarod Wilson wrote:
>>>> [snip]
>>>>> +#define for_each_netdev_feature(mask_addr, feature)                          \
>>>>> +     int bit;                                                                \
>>>>> +     for_each_set_bit(bit, (unsigned long *)mask_addr, NETDEV_FEATURE_COUNT) \
>>>>> +             feature = __NETIF_F_BIT(bit);
>>>>> +
>>>> ^
>>>> This is broken, it will not work for more than a single feature.
>>> Indeed it is.
>>>
>>> This is used as:
>>>
>>>           for_each_netdev_feature(&upper_disables, feature) {
>>>           ...
>>>           }
>>>
>>> which expands to:
>>>
>>>           int bit;
>>>           for_each_set_bit(bit, (unsigned long *)mask_addr, NETDEV_FEATURE_COUNT)
>>>                   feature = __NETIF_F_BIT(bit);
>>>           {
>>>                   ...
>>>           }
>>>
>>> Note the assignment to "feature" happens outside the {}-delimited block.
>>> And the block is always executed once.
>> Bah, crap, I was still staring at the code not seeing it, thank you for the detailed cluebat. I'll fix that up right now.
>>
>
> Yeah, sorry for not elaborating, I wrote it in a hurry. :-)
> Thanks Geert!
>
> By the way since you'll be changing this code, I don't know if it's okay to
> declare caller-visible hidden local variables in a macro like this, at the very
> least please consider renaming it to something that's much less common, I can see
> "bit" being used here and there. IMO either try to find a way to avoid it
> altogether or add another argument to the macro so it's explicitly passed.

Just posted a follow-up that removes the macro-internal use of bit and 
doesn't botch up assigning feature. It's not as pretty, but it works 
correctly with multiple feature bits.
=?ISO-8859-2?Q?Micha=B3_Miros=B3aw?= April 2, 2016, 2:21 a.m. UTC | #7
Hi,

Sorry for digging up an old patch, but... ;-)

dev_disable_lro() is a leftover from ancient times. If you read commit
27660515a,
there is a hint where it should go. Please, read on if you'd like to
fix this properly.

2015-11-03 3:55 GMT+01:00 Jarod Wilson <jarod@redhat.com>:
> There are some netdev features, which when disabled on an upper device,
> such as a bonding master or a bridge, must be disabled and cannot be
> re-enabled on underlying devices.
[...]
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6288,6 +6288,44 @@ static void rollback_registered(struct net_device *dev)
>         list_del(&single);
>  }
>
> +static netdev_features_t netdev_sync_upper_features(struct net_device *lower,
> +       struct net_device *upper, netdev_features_t features)
> +{
> +       netdev_features_t upper_disables = NETIF_F_UPPER_DISABLES;
> +       netdev_features_t feature;
> +
> +       for_each_netdev_feature(&upper_disables, feature) {
> +               if (!(upper->wanted_features & feature)
> +                   && (features & feature)) {
> +                       netdev_dbg(lower, "Dropping feature %pNF, upper dev %s has it off.\n",
> +                                  &feature, upper->name);
> +                       features &= ~feature;
> +               }
> +       }

You could do this once:

disable_features = ~upper->features & features & NETIF_F_UPPER_DISABLES;
if (features & disable_features)
  netdev_dbg(...)
features &= ~disable_features;

> +
> +       return features;
> +}
[...]
> @@ -6370,6 +6410,10 @@ int __netdev_update_features(struct net_device *dev)
>         /* driver might be less strict about feature dependencies */
>         features = netdev_fix_features(dev, features);
>
> +       /* some features can't be enabled if they're off an an upper device */
> +       netdev_for_each_upper_dev_rcu(dev, upper, iter)
> +               features = netdev_sync_upper_features(dev, upper, features);
> +
>         if (dev->features == features)
>                 return 0;
>

This should go into netdev_fix_features(), as it is a one single place,
where are feature dependencies are verified.

[...]
> @@ -6386,6 +6430,12 @@ int __netdev_update_features(struct net_device *dev)
>                 return -1;
>         }
>
> +       /* some features must be disabled on lower devices when disabled
> +        * on an upper device (think: bonding master or bridge)
> +        */
> +       netdev_for_each_lower_dev(dev, lower, iter)
> +               netdev_sync_lower_features(dev, lower, features);
> +
[...]

This should be equal in resulting flags to:

   netdev_for_each_lower_dev(dev, lower...)
     netdev_update_features(lower);

because netdev_fix_features(lower) will see already changed dev->features.

Best Regards,
Michał Mirosław
diff mbox

Patch

diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index 9672781..0f5837a 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -125,6 +125,11 @@  enum {
 #define NETIF_F_HW_L2FW_DOFFLOAD	__NETIF_F(HW_L2FW_DOFFLOAD)
 #define NETIF_F_BUSY_POLL	__NETIF_F(BUSY_POLL)
 
+#define for_each_netdev_feature(mask_addr, feature)				\
+	int bit;								\
+	for_each_set_bit(bit, (unsigned long *)mask_addr, NETDEV_FEATURE_COUNT)	\
+		feature = __NETIF_F_BIT(bit);
+
 /* Features valid for ethtool to change */
 /* = all defined minus driver/device-class-related */
 #define NETIF_F_NEVER_CHANGE	(NETIF_F_VLAN_CHALLENGED | \
@@ -167,6 +172,12 @@  enum {
  */
 #define NETIF_F_ALL_FOR_ALL	(NETIF_F_NOCACHE_COPY | NETIF_F_FSO)
 
+/*
+ * If upper/master device has these features disabled, they must be disabled
+ * on all lower/slave devices as well.
+ */
+#define NETIF_F_UPPER_DISABLES	NETIF_F_LRO
+
 /* changeable features with no special hardware requirements */
 #define NETIF_F_SOFT_FEATURES	(NETIF_F_GSO | NETIF_F_GRO)
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 13f49f8..c4d2b43 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6288,6 +6288,44 @@  static void rollback_registered(struct net_device *dev)
 	list_del(&single);
 }
 
+static netdev_features_t netdev_sync_upper_features(struct net_device *lower,
+	struct net_device *upper, netdev_features_t features)
+{
+	netdev_features_t upper_disables = NETIF_F_UPPER_DISABLES;
+	netdev_features_t feature;
+
+	for_each_netdev_feature(&upper_disables, feature) {
+		if (!(upper->wanted_features & feature)
+		    && (features & feature)) {
+			netdev_dbg(lower, "Dropping feature %pNF, upper dev %s has it off.\n",
+				   &feature, upper->name);
+			features &= ~feature;
+		}
+	}
+
+	return features;
+}
+
+static void netdev_sync_lower_features(struct net_device *upper,
+	struct net_device *lower, netdev_features_t features)
+{
+	netdev_features_t upper_disables = NETIF_F_UPPER_DISABLES;
+	netdev_features_t feature;
+
+	for_each_netdev_feature(&upper_disables, feature) {
+		if (!(features & feature) && (lower->features & feature)) {
+			netdev_dbg(upper, "Disabling feature %pNF on lower dev %s.\n",
+				   &feature, lower->name);
+			lower->wanted_features &= ~feature;
+			netdev_update_features(lower);
+
+			if (unlikely(lower->features & feature))
+				netdev_WARN(upper, "failed to disable %pNF on %s!\n",
+					    &feature, lower->name);
+		}
+	}
+}
+
 static netdev_features_t netdev_fix_features(struct net_device *dev,
 	netdev_features_t features)
 {
@@ -6357,7 +6395,9 @@  static netdev_features_t netdev_fix_features(struct net_device *dev,
 
 int __netdev_update_features(struct net_device *dev)
 {
+	struct net_device *upper, *lower;
 	netdev_features_t features;
+	struct list_head *iter;
 	int err = 0;
 
 	ASSERT_RTNL();
@@ -6370,6 +6410,10 @@  int __netdev_update_features(struct net_device *dev)
 	/* driver might be less strict about feature dependencies */
 	features = netdev_fix_features(dev, features);
 
+	/* some features can't be enabled if they're off an an upper device */
+	netdev_for_each_upper_dev_rcu(dev, upper, iter)
+		features = netdev_sync_upper_features(dev, upper, features);
+
 	if (dev->features == features)
 		return 0;
 
@@ -6386,6 +6430,12 @@  int __netdev_update_features(struct net_device *dev)
 		return -1;
 	}
 
+	/* some features must be disabled on lower devices when disabled
+	 * on an upper device (think: bonding master or bridge)
+	 */
+	netdev_for_each_lower_dev(dev, lower, iter)
+		netdev_sync_lower_features(dev, lower, features);
+
 	if (!err)
 		dev->features = features;