[net-next,v2,1/5] net: Introduce NETIF_F_GRO_HW.

Message ID 1512633815-25037-2-git-send-email-michael.chan@broadcom.com
State Changes Requested
Delegated to: David Miller
Headers show
Series
  • Introduce NETIF_F_GRO_HW
Related show

Commit Message

Michael Chan Dec. 7, 2017, 8:03 a.m.
Introduce NETIF_F_GRO_HW feature flag for NICs that support hardware
GRO.  With this flag, we can now independently turn on or off hardware
GRO when GRO is on.  Previously, drivers were using NETIF_F_GRO to
control hardware GRO and so it cannot be independently turned on or
off without affecting GRO.

Hardware GRO (just like GRO) guarantees that packets can be re-segmented
by TSO/GSO to reconstruct the original packet stream.  It is a subset of
NETIF_F_GRO and depends on it

Since NETIF_F_GRO is not propagated between upper and lower devices,
NETIF_F_GRO_HW should follow suit since it is a subset of GRO.  In other
words, a lower device can independent have GRO/GRO_HW enabled or disabled
and no feature propagation is required.  This will preserve the current
GRO behavior.

Cc: Ariel Elior <Ariel.Elior@cavium.com>
Cc: everest-linux-l2@cavium.com
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 Documentation/networking/netdev-features.txt |  8 ++++++++
 include/linux/netdev_features.h              |  3 +++
 net/core/dev.c                               | 17 +++++++++++++++++
 net/core/ethtool.c                           |  1 +
 4 files changed, 29 insertions(+)

Comments

Alexander Duyck Dec. 7, 2017, 6:13 p.m. | #1
On Thu, Dec 7, 2017 at 12:03 AM, Michael Chan <michael.chan@broadcom.com> wrote:
> Introduce NETIF_F_GRO_HW feature flag for NICs that support hardware
> GRO.  With this flag, we can now independently turn on or off hardware
> GRO when GRO is on.  Previously, drivers were using NETIF_F_GRO to
> control hardware GRO and so it cannot be independently turned on or
> off without affecting GRO.
>
> Hardware GRO (just like GRO) guarantees that packets can be re-segmented
> by TSO/GSO to reconstruct the original packet stream.  It is a subset of
> NETIF_F_GRO and depends on it
>
> Since NETIF_F_GRO is not propagated between upper and lower devices,
> NETIF_F_GRO_HW should follow suit since it is a subset of GRO.  In other
> words, a lower device can independent have GRO/GRO_HW enabled or disabled
> and no feature propagation is required.  This will preserve the current
> GRO behavior.
>
> Cc: Ariel Elior <Ariel.Elior@cavium.com>
> Cc: everest-linux-l2@cavium.com
> Signed-off-by: Michael Chan <michael.chan@broadcom.com>
> ---
>  Documentation/networking/netdev-features.txt |  8 ++++++++
>  include/linux/netdev_features.h              |  3 +++
>  net/core/dev.c                               | 17 +++++++++++++++++
>  net/core/ethtool.c                           |  1 +
>  4 files changed, 29 insertions(+)
>
> diff --git a/Documentation/networking/netdev-features.txt b/Documentation/networking/netdev-features.txt
> index 7413eb0..8f36527 100644
> --- a/Documentation/networking/netdev-features.txt
> +++ b/Documentation/networking/netdev-features.txt
> @@ -163,3 +163,11 @@ This requests that the NIC receive all possible frames, including errored
>  frames (such as bad FCS, etc).  This can be helpful when sniffing a link with
>  bad packets on it.  Some NICs may receive more packets if also put into normal
>  PROMISC mode.
> +
> +*  rx-gro-hw
> +
> +This requests that the NIC enables Hardware GRO (generic receive offload).
> +Hardware GRO is basically the exact reverse of TSO, and is generally
> +stricter than Hardware LRO.  A packet stream merged by Hardware GRO must
> +be re-segmentable by GSO or TSO back to the exact original packet stream.
> +Hardware GRO is dependent on GRO and RXCSUM.
> diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
> index b1b0ca7..db84c51 100644
> --- a/include/linux/netdev_features.h
> +++ b/include/linux/netdev_features.h
> @@ -78,6 +78,8 @@ enum {
>         NETIF_F_HW_ESP_TX_CSUM_BIT,     /* ESP with TX checksum offload */
>         NETIF_F_RX_UDP_TUNNEL_PORT_BIT, /* Offload of RX port for UDP tunnels */
>
> +       NETIF_F_GRO_HW_BIT,             /* Hardware Generic receive offload */
> +
>         /*
>          * Add your fresh new feature above and remember to update
>          * netdev_features_strings[] in net/core/ethtool.c and maybe
> @@ -97,6 +99,7 @@ enum {
>  #define NETIF_F_FRAGLIST       __NETIF_F(FRAGLIST)
>  #define NETIF_F_FSO            __NETIF_F(FSO)
>  #define NETIF_F_GRO            __NETIF_F(GRO)
> +#define NETIF_F_GRO_HW         __NETIF_F(GRO_HW)
>  #define NETIF_F_GSO            __NETIF_F(GSO)
>  #define NETIF_F_GSO_ROBUST     __NETIF_F(GSO_ROBUST)
>  #define NETIF_F_HIGHDMA                __NETIF_F(HIGHDMA)
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 6bea893..7242e5e 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -7405,6 +7405,23 @@ static netdev_features_t netdev_fix_features(struct net_device *dev,
>                 features &= ~dev->gso_partial_features;
>         }
>
> +       if (features & NETIF_F_GRO_HW) {
> +               /* Hardware GRO depends on GRO and RXCSUM. */
> +               if (!(features & NETIF_F_GRO)) {
> +                       netdev_dbg(dev, "Dropping NETIF_F_GSO_HW since no GRO feature.\n");
> +                       features &= ~NETIF_F_GRO_HW;
> +               }

I'm not sure I agree with this part. The hardware offload should not
be dependent on the software offload. If you are concerned with the
XDP case and such you should probably just look at handling it more
like how TSO is handled with a group of flags that aggregate GRO, LRO,
and GRO_HW into a group of features that must be disabled to support
XDP.

> +               if (!(features & NETIF_F_RXCSUM)) {
> +                       netdev_dbg(dev, "Dropping NETIF_F_GSO_HW since no RXCSUM feature.\n");
> +                       features &= ~NETIF_F_GRO_HW;
> +               }
> +               /* Hardware GRO and LRO are mutually exclusive. */
> +               if (features & NETIF_F_LRO) {
> +                       netdev_dbg(dev, "Dropping NETIF_F_LRO since GRO_HW is set.\n");
> +                       features &= ~NETIF_F_LRO;
> +               }

This is going to be problematic. What if you bond one interface that
has LRO and one that has GRO_HW? It seems like this is going to cause
the bond interface to lie and say LRO isn't there when it actually is,
or worse yet it will force features off when they shouldn't be.

> +       }
> +
>         return features;
>  }
>
> diff --git a/net/core/ethtool.c b/net/core/ethtool.c
> index f8fcf45..50a7920 100644
> --- a/net/core/ethtool.c
> +++ b/net/core/ethtool.c
> @@ -73,6 +73,7 @@ int ethtool_op_get_ts_info(struct net_device *dev, struct ethtool_ts_info *info)
>         [NETIF_F_LLTX_BIT] =             "tx-lockless",
>         [NETIF_F_NETNS_LOCAL_BIT] =      "netns-local",
>         [NETIF_F_GRO_BIT] =              "rx-gro",
> +       [NETIF_F_GRO_HW_BIT] =           "rx-gro-hw",
>         [NETIF_F_LRO_BIT] =              "rx-lro",
>
>         [NETIF_F_TSO_BIT] =              "tx-tcp-segmentation",
> --
> 1.8.3.1
>
Michael Chan Dec. 7, 2017, 6:44 p.m. | #2
On Thu, Dec 7, 2017 at 10:13 AM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On Thu, Dec 7, 2017 at 12:03 AM, Michael Chan <michael.chan@broadcom.com> wrote:
>> @@ -7405,6 +7405,23 @@ static netdev_features_t netdev_fix_features(struct net_device *dev,
>>                 features &= ~dev->gso_partial_features;
>>         }
>>
>> +       if (features & NETIF_F_GRO_HW) {
>> +               /* Hardware GRO depends on GRO and RXCSUM. */
>> +               if (!(features & NETIF_F_GRO)) {
>> +                       netdev_dbg(dev, "Dropping NETIF_F_GSO_HW since no GRO feature.\n");
>> +                       features &= ~NETIF_F_GRO_HW;
>> +               }
>
> I'm not sure I agree with this part. The hardware offload should not
> be dependent on the software offload.

As I said before, GRO_HW is basically hardware accelerated GRO. To me,
it makes perfect sense that GRO_HW depends on GRO.

> If you are concerned with the
> XDP case and such you should probably just look at handling it more
> like how TSO is handled with a group of flags that aggregate GRO, LRO,
> and GRO_HW into a group of features that must be disabled to support
> XDP.
>
>> +               if (!(features & NETIF_F_RXCSUM)) {
>> +                       netdev_dbg(dev, "Dropping NETIF_F_GSO_HW since no RXCSUM feature.\n");
>> +                       features &= ~NETIF_F_GRO_HW;
>> +               }
>> +               /* Hardware GRO and LRO are mutually exclusive. */
>> +               if (features & NETIF_F_LRO) {
>> +                       netdev_dbg(dev, "Dropping NETIF_F_LRO since GRO_HW is set.\n");
>> +                       features &= ~NETIF_F_LRO;
>> +               }
>
> This is going to be problematic. What if you bond one interface that
> has LRO and one that has GRO_HW? It seems like this is going to cause
> the bond interface to lie and say LRO isn't there when it actually is,
> or worse yet it will force features off when they shouldn't be.

This is just dropping incompatible features similar to how other
incompatible feature flags are dropped in netdev_fix_features().
Hardware cannot aggregate in both LRO and GRO_HW modes at the same
time.  It's one or the other.

LRO and GRO_HW are different.  We may never agree on this but they are
different.  If a bond needs to disable LRO, it will be propagated to
all slaves.  But each slave can have GRO enabled.  If GRO is enabled,
GRO_HW, being a subset of GRO, can be enabled.
Alexander Duyck Dec. 7, 2017, 9:35 p.m. | #3
On Thu, Dec 7, 2017 at 10:44 AM, Michael Chan <michael.chan@broadcom.com> wrote:
> On Thu, Dec 7, 2017 at 10:13 AM, Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
>> On Thu, Dec 7, 2017 at 12:03 AM, Michael Chan <michael.chan@broadcom.com> wrote:
>>> @@ -7405,6 +7405,23 @@ static netdev_features_t netdev_fix_features(struct net_device *dev,
>>>                 features &= ~dev->gso_partial_features;
>>>         }
>>>
>>> +       if (features & NETIF_F_GRO_HW) {
>>> +               /* Hardware GRO depends on GRO and RXCSUM. */
>>> +               if (!(features & NETIF_F_GRO)) {
>>> +                       netdev_dbg(dev, "Dropping NETIF_F_GSO_HW since no GRO feature.\n");
>>> +                       features &= ~NETIF_F_GRO_HW;
>>> +               }
>>
>> I'm not sure I agree with this part. The hardware offload should not
>> be dependent on the software offload.
>
> As I said before, GRO_HW is basically hardware accelerated GRO. To me,
> it makes perfect sense that GRO_HW depends on GRO.
>
>> If you are concerned with the
>> XDP case and such you should probably just look at handling it more
>> like how TSO is handled with a group of flags that aggregate GRO, LRO,
>> and GRO_HW into a group of features that must be disabled to support
>> XDP.
>>
>>> +               if (!(features & NETIF_F_RXCSUM)) {
>>> +                       netdev_dbg(dev, "Dropping NETIF_F_GSO_HW since no RXCSUM feature.\n");
>>> +                       features &= ~NETIF_F_GRO_HW;
>>> +               }
>>> +               /* Hardware GRO and LRO are mutually exclusive. */
>>> +               if (features & NETIF_F_LRO) {
>>> +                       netdev_dbg(dev, "Dropping NETIF_F_LRO since GRO_HW is set.\n");
>>> +                       features &= ~NETIF_F_LRO;
>>> +               }
>>
>> This is going to be problematic. What if you bond one interface that
>> has LRO and one that has GRO_HW? It seems like this is going to cause
>> the bond interface to lie and say LRO isn't there when it actually is,
>> or worse yet it will force features off when they shouldn't be.
>
> This is just dropping incompatible features similar to how other
> incompatible feature flags are dropped in netdev_fix_features().
> Hardware cannot aggregate in both LRO and GRO_HW modes at the same
> time.  It's one or the other.

I'm not saying this is on a single device. The problem is in the case
of bonding you have combined 2 devices. We don't want to have an upper
device saying LRO isn't there when it is. A bonding setup is supposed
to advertise LRO as being enabled if a lower device has it present.
You are now making this an either/or thing at the driver level
generically when that isn't the case due to things like bridging and
bonding. This is why I was arguing that it would be easier to just
make this a super-set of LRO with one form that is reversible and one
that isn't. As is you are introducing all sorts of regressions that
are going to be problematic in any sort of mixed environment.

> LRO and GRO_HW are different.  We may never agree on this but they are
> different.  If a bond needs to disable LRO, it will be propagated to
> all slaves.  But each slave can have GRO enabled.  If GRO is enabled,
> GRO_HW, being a subset of GRO, can be enabled.

I get that they are very different. At a driver level it makes sense
to say you get one or the other at a physical driver level. The point
I was trying to make is that the bond is only able to set one or the
other when you might have a mix of devices. On the bonding device you
should be able to have both LRO and GRO_HW enabled.
Michael Chan Dec. 7, 2017, 10:08 p.m. | #4
On Thu, Dec 7, 2017 at 1:35 PM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On Thu, Dec 7, 2017 at 10:44 AM, Michael Chan <michael.chan@broadcom.com> wrote:
>> On Thu, Dec 7, 2017 at 10:13 AM, Alexander Duyck
>> <alexander.duyck@gmail.com> wrote:
>>> On Thu, Dec 7, 2017 at 12:03 AM, Michael Chan <michael.chan@broadcom.com> wrote:
>>>> @@ -7405,6 +7405,23 @@ static netdev_features_t netdev_fix_features(struct net_device *dev,
>>>>                 features &= ~dev->gso_partial_features;
>>>>         }
>>>>
>>>> +       if (features & NETIF_F_GRO_HW) {
>>>> +               /* Hardware GRO depends on GRO and RXCSUM. */
>>>> +               if (!(features & NETIF_F_GRO)) {
>>>> +                       netdev_dbg(dev, "Dropping NETIF_F_GSO_HW since no GRO feature.\n");
>>>> +                       features &= ~NETIF_F_GRO_HW;
>>>> +               }
>>>
>>> I'm not sure I agree with this part. The hardware offload should not
>>> be dependent on the software offload.
>>
>> As I said before, GRO_HW is basically hardware accelerated GRO. To me,
>> it makes perfect sense that GRO_HW depends on GRO.
>>
>>> If you are concerned with the
>>> XDP case and such you should probably just look at handling it more
>>> like how TSO is handled with a group of flags that aggregate GRO, LRO,
>>> and GRO_HW into a group of features that must be disabled to support
>>> XDP.
>>>
>>>> +               if (!(features & NETIF_F_RXCSUM)) {
>>>> +                       netdev_dbg(dev, "Dropping NETIF_F_GSO_HW since no RXCSUM feature.\n");
>>>> +                       features &= ~NETIF_F_GRO_HW;
>>>> +               }
>>>> +               /* Hardware GRO and LRO are mutually exclusive. */
>>>> +               if (features & NETIF_F_LRO) {
>>>> +                       netdev_dbg(dev, "Dropping NETIF_F_LRO since GRO_HW is set.\n");
>>>> +                       features &= ~NETIF_F_LRO;
>>>> +               }
>>>
>>> This is going to be problematic. What if you bond one interface that
>>> has LRO and one that has GRO_HW? It seems like this is going to cause
>>> the bond interface to lie and say LRO isn't there when it actually is,
>>> or worse yet it will force features off when they shouldn't be.
>>
>> This is just dropping incompatible features similar to how other
>> incompatible feature flags are dropped in netdev_fix_features().
>> Hardware cannot aggregate in both LRO and GRO_HW modes at the same
>> time.  It's one or the other.
>
> I'm not saying this is on a single device. The problem is in the case
> of bonding you have combined 2 devices. We don't want to have an upper
> device saying LRO isn't there when it is. A bonding setup is supposed
> to advertise LRO as being enabled if a lower device has it present.
> You are now making this an either/or thing at the driver level
> generically when that isn't the case due to things like bridging and
> bonding. This is why I was arguing that it would be easier to just
> make this a super-set of LRO with one form that is reversible and one
> that isn't. As is you are introducing all sorts of regressions that
> are going to be problematic in any sort of mixed environment.

Are you saying that if LRO is dropped on a upper device (by the new
code added in this patch), all lower devices will need to drop it too?

If that's what you are saying, it's already taken care of.  There is
netdev_sync_upper_features() and netdev_sync_lower_features() that
will automatically do that as part of __netdev_update_features().

>
>> LRO and GRO_HW are different.  We may never agree on this but they are
>> different.  If a bond needs to disable LRO, it will be propagated to
>> all slaves.  But each slave can have GRO enabled.  If GRO is enabled,
>> GRO_HW, being a subset of GRO, can be enabled.
>
> I get that they are very different. At a driver level it makes sense
> to say you get one or the other at a physical driver level. The point
> I was trying to make is that the bond is only able to set one or the
> other when you might have a mix of devices. On the bonding device you
> should be able to have both LRO and GRO_HW enabled.

On the bond, you can have LRO enabled and it is propagated to lower
devices so that lower devices will enable LRO if it is supported.  If
LRO is disabled on the bond (e.g. due to bridging), It will be
propagated so all lower devices will disable LRO if it is enabled.

GRO/GRO_HW is different and it is not propagated from upper devices to
lower devices like LRO.  Lower devices can have GRO/GRO_HW
enabled/disabled regardless of upper device.  This is no different
from say RXCSUM.  Lower devices can have RXCSUM on/off regardless.
Alexander Duyck Dec. 7, 2017, 10:43 p.m. | #5
On Thu, Dec 7, 2017 at 2:08 PM, Michael Chan <michael.chan@broadcom.com> wrote:
> On Thu, Dec 7, 2017 at 1:35 PM, Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
>> On Thu, Dec 7, 2017 at 10:44 AM, Michael Chan <michael.chan@broadcom.com> wrote:
>>> On Thu, Dec 7, 2017 at 10:13 AM, Alexander Duyck
>>> <alexander.duyck@gmail.com> wrote:
>>>> On Thu, Dec 7, 2017 at 12:03 AM, Michael Chan <michael.chan@broadcom.com> wrote:
>>>>> @@ -7405,6 +7405,23 @@ static netdev_features_t netdev_fix_features(struct net_device *dev,
>>>>>                 features &= ~dev->gso_partial_features;
>>>>>         }
>>>>>
>>>>> +       if (features & NETIF_F_GRO_HW) {
>>>>> +               /* Hardware GRO depends on GRO and RXCSUM. */
>>>>> +               if (!(features & NETIF_F_GRO)) {
>>>>> +                       netdev_dbg(dev, "Dropping NETIF_F_GSO_HW since no GRO feature.\n");
>>>>> +                       features &= ~NETIF_F_GRO_HW;
>>>>> +               }
>>>>
>>>> I'm not sure I agree with this part. The hardware offload should not
>>>> be dependent on the software offload.
>>>
>>> As I said before, GRO_HW is basically hardware accelerated GRO. To me,
>>> it makes perfect sense that GRO_HW depends on GRO.
>>>
>>>> If you are concerned with the
>>>> XDP case and such you should probably just look at handling it more
>>>> like how TSO is handled with a group of flags that aggregate GRO, LRO,
>>>> and GRO_HW into a group of features that must be disabled to support
>>>> XDP.
>>>>
>>>>> +               if (!(features & NETIF_F_RXCSUM)) {
>>>>> +                       netdev_dbg(dev, "Dropping NETIF_F_GSO_HW since no RXCSUM feature.\n");
>>>>> +                       features &= ~NETIF_F_GRO_HW;
>>>>> +               }
>>>>> +               /* Hardware GRO and LRO are mutually exclusive. */
>>>>> +               if (features & NETIF_F_LRO) {
>>>>> +                       netdev_dbg(dev, "Dropping NETIF_F_LRO since GRO_HW is set.\n");
>>>>> +                       features &= ~NETIF_F_LRO;
>>>>> +               }
>>>>
>>>> This is going to be problematic. What if you bond one interface that
>>>> has LRO and one that has GRO_HW? It seems like this is going to cause
>>>> the bond interface to lie and say LRO isn't there when it actually is,
>>>> or worse yet it will force features off when they shouldn't be.
>>>
>>> This is just dropping incompatible features similar to how other
>>> incompatible feature flags are dropped in netdev_fix_features().
>>> Hardware cannot aggregate in both LRO and GRO_HW modes at the same
>>> time.  It's one or the other.
>>
>> I'm not saying this is on a single device. The problem is in the case
>> of bonding you have combined 2 devices. We don't want to have an upper
>> device saying LRO isn't there when it is. A bonding setup is supposed
>> to advertise LRO as being enabled if a lower device has it present.
>> You are now making this an either/or thing at the driver level
>> generically when that isn't the case due to things like bridging and
>> bonding. This is why I was arguing that it would be easier to just
>> make this a super-set of LRO with one form that is reversible and one
>> that isn't. As is you are introducing all sorts of regressions that
>> are going to be problematic in any sort of mixed environment.
>
> Are you saying that if LRO is dropped on a upper device (by the new
> code added in this patch), all lower devices will need to drop it too?
>
> If that's what you are saying, it's already taken care of.  There is
> netdev_sync_upper_features() and netdev_sync_lower_features() that
> will automatically do that as part of __netdev_update_features().
>
>>
>>> LRO and GRO_HW are different.  We may never agree on this but they are
>>> different.  If a bond needs to disable LRO, it will be propagated to
>>> all slaves.  But each slave can have GRO enabled.  If GRO is enabled,
>>> GRO_HW, being a subset of GRO, can be enabled.
>>
>> I get that they are very different. At a driver level it makes sense
>> to say you get one or the other at a physical driver level. The point
>> I was trying to make is that the bond is only able to set one or the
>> other when you might have a mix of devices. On the bonding device you
>> should be able to have both LRO and GRO_HW enabled.
>
> On the bond, you can have LRO enabled and it is propagated to lower
> devices so that lower devices will enable LRO if it is supported.  If
> LRO is disabled on the bond (e.g. due to bridging), It will be
> propagated so all lower devices will disable LRO if it is enabled.
>
> GRO/GRO_HW is different and it is not propagated from upper devices to
> lower devices like LRO.  Lower devices can have GRO/GRO_HW
> enabled/disabled regardless of upper device.  This is no different
> from say RXCSUM.  Lower devices can have RXCSUM on/off regardless.

So on the bond device you should be able to have LRO, GRO, and GRO_HW
enabled all at the same time and that means devices below it can have
any combination of those enabled. If I recall enabling it on the lower
device shouldn't be allowed if an upper device has it disabled. If it
is enabled on the upper device it can be either enabled or disabled on
the lower device depending on the state the device prefers.

The GRO and GRO_HW should be propagated just like RXCSUM and
everything else. Basically if I turn it off on the bond all of the
lower devices should have it disabled. Only if it is enabled should
lower devices be allowed to have it either enabled or disabled. The
easiest way to think of it is on the bond the features are like a mask
of what can be enabled/disabled for the lower devices. If a feature is
not present in the bond it should not be present on any of the devices
below it. That is why I am saying you cannot say LRO and GRO_HW are
mutually exclusive for all devices. For physical devices I would say
yes it can be mutually exclusive, but for things like bond it cannot
be since you can have one device doing LRO and one device doing GRO_HW
getting added to a bond and both features should be advertised there.

My concern is if I have a bond with both GRO_HW and LRO enabled on
lower devices, and then disable something like TSO we are going to see
LRO disabled on the bond and propagated to all the lower devices. That
is a bad behavior and shouldn't happen. In addition I would want to be
able to disable GRO_HW from the bond should I encounter a buggy
implementation of the GRO hardware offload at some point in the
future. I don't like it being tied so closely with GRO since all it
takes is one hardware failure and then we are dealing with people
disabling GRO due to reports of it being buggy instead of it being
tied specifically to GRO_HW.

- Alex
Michael Chan Dec. 7, 2017, 11:17 p.m. | #6
On Thu, Dec 7, 2017 at 2:43 PM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On Thu, Dec 7, 2017 at 2:08 PM, Michael Chan <michael.chan@broadcom.com> wrote:
>> On the bond, you can have LRO enabled and it is propagated to lower
>> devices so that lower devices will enable LRO if it is supported.  If
>> LRO is disabled on the bond (e.g. due to bridging), It will be
>> propagated so all lower devices will disable LRO if it is enabled.
>>
>> GRO/GRO_HW is different and it is not propagated from upper devices to
>> lower devices like LRO.  Lower devices can have GRO/GRO_HW
>> enabled/disabled regardless of upper device.  This is no different
>> from say RXCSUM.  Lower devices can have RXCSUM on/off regardless.
>
> So on the bond device you should be able to have LRO, GRO, and GRO_HW
> enabled all at the same time and that means devices below it can have
> any combination of those enabled. If I recall enabling it on the lower
> device shouldn't be allowed if an upper device has it disabled. If it
> is enabled on the upper device it can be either enabled or disabled on
> the lower device depending on the state the device prefers.

Only LRO behaves the way you describe.  I believe LRO is treated like
this differently because bridging or routing needs to be able to
disable it completely.

GRO is not like that.  RXCSUM is not like that.

>
> The GRO and GRO_HW should be propagated just like RXCSUM and
> everything else. Basically if I turn it off on the bond all of the
> lower devices should have it disabled. Only if it is enabled should
> lower devices be allowed to have it either enabled or disabled. The
> easiest way to think of it is on the bond the features are like a mask
> of what can be enabled/disabled for the lower devices. If a feature is
> not present in the bond it should not be present on any of the devices
> below it. That is why I am saying you cannot say LRO and GRO_HW are
> mutually exclusive for all devices. For physical devices I would say
> yes it can be mutually exclusive, but for things like bond it cannot
> be since you can have one device doing LRO and one device doing GRO_HW
> getting added to a bond and both features should be advertised there.

I think you keep thinking that whatever we do with LRO, we must do the
same thing with GRO_HW.

But that's not true.  GRO/GRO_HW don't have to be disabled when
bridging/routing are enabled.  That's why upper devices don't have to
treat GRO/GRO_HW like LRO.  It can be treated just like RXCSUM which
is don't care.

>
> My concern is if I have a bond with both GRO_HW and LRO enabled on
> lower devices, and then disable something like TSO we are going to see
> LRO disabled on the bond and propagated to all the lower devices.

I don't get this.  I don't see how TSO is related.

> That
> is a bad behavior and shouldn't happen. In addition I would want to be
> able to disable GRO_HW from the bond should I encounter a buggy
> implementation of the GRO hardware offload at some point in the
> future. I don't like it being tied so closely with GRO since all it
> takes is one hardware failure and then we are dealing with people
> disabling GRO due to reports of it being buggy instead of it being
> tied specifically to GRO_HW.
>

Today, GRO, RXCSUM, NTUPLE, etc on a bond do not get propagated.  You
can propose and make them propagate if you want.

So GRO_HW just naturally follows GRO since it is a true subset of it.
Alexander Duyck Dec. 7, 2017, 11:35 p.m. | #7
On Thu, Dec 7, 2017 at 3:17 PM, Michael Chan <michael.chan@broadcom.com> wrote:
> On Thu, Dec 7, 2017 at 2:43 PM, Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
>> On Thu, Dec 7, 2017 at 2:08 PM, Michael Chan <michael.chan@broadcom.com> wrote:
>>> On the bond, you can have LRO enabled and it is propagated to lower
>>> devices so that lower devices will enable LRO if it is supported.  If
>>> LRO is disabled on the bond (e.g. due to bridging), It will be
>>> propagated so all lower devices will disable LRO if it is enabled.
>>>
>>> GRO/GRO_HW is different and it is not propagated from upper devices to
>>> lower devices like LRO.  Lower devices can have GRO/GRO_HW
>>> enabled/disabled regardless of upper device.  This is no different
>>> from say RXCSUM.  Lower devices can have RXCSUM on/off regardless.
>>
>> So on the bond device you should be able to have LRO, GRO, and GRO_HW
>> enabled all at the same time and that means devices below it can have
>> any combination of those enabled. If I recall enabling it on the lower
>> device shouldn't be allowed if an upper device has it disabled. If it
>> is enabled on the upper device it can be either enabled or disabled on
>> the lower device depending on the state the device prefers.
>
> Only LRO behaves the way you describe.  I believe LRO is treated like
> this differently because bridging or routing needs to be able to
> disable it completely.
>
> GRO is not like that.  RXCSUM is not like that.

I think that what your saying is your patch has exposed a bug. If I
disable GRO or GRO_HW on the upper device I would expect to have it
disabled on all lower devices. If anything it should probably be added
to NETIF_F_UPPER_DISABLES. Otherwise we are going to be breaking
things like XPS which you pointed out earlier.

>> The GRO and GRO_HW should be propagated just like RXCSUM and
>> everything else. Basically if I turn it off on the bond all of the
>> lower devices should have it disabled. Only if it is enabled should
>> lower devices be allowed to have it either enabled or disabled. The
>> easiest way to think of it is on the bond the features are like a mask
>> of what can be enabled/disabled for the lower devices. If a feature is
>> not present in the bond it should not be present on any of the devices
>> below it. That is why I am saying you cannot say LRO and GRO_HW are
>> mutually exclusive for all devices. For physical devices I would say
>> yes it can be mutually exclusive, but for things like bond it cannot
>> be since you can have one device doing LRO and one device doing GRO_HW
>> getting added to a bond and both features should be advertised there.
>
> I think you keep thinking that whatever we do with LRO, we must do the
> same thing with GRO_HW.
>
> But that's not true.  GRO/GRO_HW don't have to be disabled when
> bridging/routing are enabled.  That's why upper devices don't have to
> treat GRO/GRO_HW like LRO.  It can be treated just like RXCSUM which
> is don't care.
>
>>
>> My concern is if I have a bond with both GRO_HW and LRO enabled on
>> lower devices, and then disable something like TSO we are going to see
>> LRO disabled on the bond and propagated to all the lower devices.
>
> I don't get this.  I don't see how TSO is related.

It isn't. That is the point. If I change ANY feature it will trigger
fix_features to get called. It will then see we have GRO_HW and LRO
since we have one interface that supports one and one that supports
another. It will trigger this code and cause LRO to be disabled which
then will be propagated down.

>> That
>> is a bad behavior and shouldn't happen. In addition I would want to be
>> able to disable GRO_HW from the bond should I encounter a buggy
>> implementation of the GRO hardware offload at some point in the
>> future. I don't like it being tied so closely with GRO since all it
>> takes is one hardware failure and then we are dealing with people
>> disabling GRO due to reports of it being buggy instead of it being
>> tied specifically to GRO_HW.
>>
>
> Today, GRO, RXCSUM, NTUPLE, etc on a bond do not get propagated.  You
> can propose and make them propagate if you want.

Or you can fix your patches so you take it into account now instead of
putting things in a broken state and asking others to fix it.

> So GRO_HW just naturally follows GRO since it is a true subset of it.

Right, and I am saying you have exposed a bug. If I don't want GRO on
the bond it should be propagated down. It doesn't make sense to allow
lower devices to assemble frames when the bond doesn't want them and
GSO won't kick in before it gets to the bond.
Michael Chan Dec. 8, 2017, 12:05 a.m. | #8
On Thu, Dec 7, 2017 at 3:35 PM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On Thu, Dec 7, 2017 at 3:17 PM, Michael Chan <michael.chan@broadcom.com> wrote:
>> I don't get this.  I don't see how TSO is related.
>
> It isn't. That is the point. If I change ANY feature it will trigger
> fix_features to get called. It will then see we have GRO_HW and LRO
> since we have one interface that supports one and one that supports
> another. It will trigger this code and cause LRO to be disabled which
> then will be propagated down.

I see.  But this won't happen.  Because the bonding driver is not
advertising NETIF_F_GRO_HW in its hw_features.  It is not advertising
NETIF_F_GRO either but I think it gets added automatically since it is
a software feature.  So LRO won't get disabled on the bond when a
unrelated feature is changed.

But I think I see your point.  I can make it so that it is up to
individual driver's .ndo_fix_features() to drop LRO/GRO_HW as it sees
fit, instead of doing it in the common netdev_fix_features().  That
way, it is more flexible at least.

>
>>> That
>>> is a bad behavior and shouldn't happen. In addition I would want to be
>>> able to disable GRO_HW from the bond should I encounter a buggy
>>> implementation of the GRO hardware offload at some point in the
>>> future. I don't like it being tied so closely with GRO since all it
>>> takes is one hardware failure and then we are dealing with people
>>> disabling GRO due to reports of it being buggy instead of it being
>>> tied specifically to GRO_HW.
>>>
>>
>> Today, GRO, RXCSUM, NTUPLE, etc on a bond do not get propagated.  You
>> can propose and make them propagate if you want.
>
> Or you can fix your patches so you take it into account now instead of
> putting things in a broken state and asking others to fix it.

I don't think that things are necessarily broken today.  LRO truly
needs to be propagated.  It's debatable whether other features like
GRO/RXCSUM/NTUPLE should be centrally set by the upper device or not.

>
>> So GRO_HW just naturally follows GRO since it is a true subset of it.
>
> Right, and I am saying you have exposed a bug. If I don't want GRO on
> the bond it should be propagated down. It doesn't make sense to allow
> lower devices to assemble frames when the bond doesn't want them and
> GSO won't kick in before it gets to the bond.

GRO kicks in at the lower device before it gets to the bond if the
lower device calls napi_gro_receive() and GRO is enabled.
Alexander Duyck Dec. 8, 2017, 2:36 a.m. | #9
On Thu, Dec 7, 2017 at 4:05 PM, Michael Chan <michael.chan@broadcom.com> wrote:
> On Thu, Dec 7, 2017 at 3:35 PM, Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
>> On Thu, Dec 7, 2017 at 3:17 PM, Michael Chan <michael.chan@broadcom.com> wrote:
>>> I don't get this.  I don't see how TSO is related.
>>
>> It isn't. That is the point. If I change ANY feature it will trigger
>> fix_features to get called. It will then see we have GRO_HW and LRO
>> since we have one interface that supports one and one that supports
>> another. It will trigger this code and cause LRO to be disabled which
>> then will be propagated down.
>
> I see.  But this won't happen.  Because the bonding driver is not
> advertising NETIF_F_GRO_HW in its hw_features.  It is not advertising
> NETIF_F_GRO either but I think it gets added automatically since it is
> a software feature.  So LRO won't get disabled on the bond when a
> unrelated feature is changed.
>
> But I think I see your point.  I can make it so that it is up to
> individual driver's .ndo_fix_features() to drop LRO/GRO_HW as it sees
> fit, instead of doing it in the common netdev_fix_features().  That
> way, it is more flexible at least.

Thank you.

>>
>>>> That
>>>> is a bad behavior and shouldn't happen. In addition I would want to be
>>>> able to disable GRO_HW from the bond should I encounter a buggy
>>>> implementation of the GRO hardware offload at some point in the
>>>> future. I don't like it being tied so closely with GRO since all it
>>>> takes is one hardware failure and then we are dealing with people
>>>> disabling GRO due to reports of it being buggy instead of it being
>>>> tied specifically to GRO_HW.
>>>>
>>>
>>> Today, GRO, RXCSUM, NTUPLE, etc on a bond do not get propagated.  You
>>> can propose and make them propagate if you want.
>>
>> Or you can fix your patches so you take it into account now instead of
>> putting things in a broken state and asking others to fix it.
>
> I don't think that things are necessarily broken today.  LRO truly
> needs to be propagated.  It's debatable whether other features like
> GRO/RXCSUM/NTUPLE should be centrally set by the upper device or not.

So I can agree with the NTUPLE not being propagated since it doesn't
actually effect upper devices. Really the functionality only really
has effects locally since the functionality consists of route to a
specific queue/device or drop the packet.

I'm not sure why RXCSUM isn't being propagated. It seems like that is
something that would make sense to have passed all the way down to the
lower devices since a single device that is doing bad Rx checksum
offloads could potentially corrupt all traffic in a bond. Seems like
that one should definitely be included.

>>
>>> So GRO_HW just naturally follows GRO since it is a true subset of it.
>>
>> Right, and I am saying you have exposed a bug. If I don't want GRO on
>> the bond it should be propagated down. It doesn't make sense to allow
>> lower devices to assemble frames when the bond doesn't want them and
>> GSO won't kick in before it gets to the bond.
>
> GRO kicks in at the lower device before it gets to the bond if the
> lower device calls napi_gro_receive() and GRO is enabled.

I get that. I assume the reason why the bond doesn't have it enabled
is because we don't want it to kick in at every given netdev, there
isn't any point to do GRO more than once. The problem is GRO_HW isn't
a pure software offload like GRO is. Call me a pessimist, but when we
end up encountering a buggy implementation that has to be disabled we
will want the right infrastructure in place to handle it. It becomes
another argument for why we might want to split GRO_HW and GRO without
tying them together. It would make sense to expose GRO_HW in a bond,
but not GRO. It might be something where we want to do any close tying
together of the GRO flag and GRO_HW at the driver as well. Basically
the legacy devices that transition over to GRO_HW from using just the
GRO flag could do that to maintain existing functionality, and new
drivers that implement it could opt in to the same behavior or just
handle GRO_HW as a separate flag.

Actually I just had a thought. What if we consider this a separate GRO
stage instead of just a hardware offload? Our standard GRO is a post
receive from the driver perspective, basically the packet is assembled
after we have handed it to the stack. What you are doing with GRO_HW
is essentially providing an early reassembly before it is handed to
the stack. What if we were to rename GRO_HW to something like
GRO_LOWER, GRO_EARLY, GRO_PRE, or pick your name (I'm lousy at
naming), and used it as a way to indicate that we want to perform GRO
before we begin receive processing on the frame in our driver? Then
for stacked devices you could use this new flag to indicate you don't
want to perform GRO on the lower levels below this device, and could
then use the regular GRO flag to control if we do it ourselves. Doing
that should provide stacked devices with a good way to control GRO on
the lower devices and would resolve what you need to indicate as well.
The only real changes needed might be a rename and to add the
necessary bit shifting for the upper and lower dev sync code. If you
aren't interested in the idea I can probably spend a couple of hours
getting to it tomorrow since I think this might be a much better way
to go as it solves multiple issues.
Michael Chan Dec. 8, 2017, 4:02 a.m. | #10
On Thu, Dec 7, 2017 at 6:36 PM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On Thu, Dec 7, 2017 at 4:05 PM, Michael Chan <michael.chan@broadcom.com> wrote:
>> I see.  But this won't happen.  Because the bonding driver is not
>> advertising NETIF_F_GRO_HW in its hw_features.  It is not advertising
>> NETIF_F_GRO either but I think it gets added automatically since it is
>> a software feature.  So LRO won't get disabled on the bond when a
>> unrelated feature is changed.
>>
>> But I think I see your point.  I can make it so that it is up to
>> individual driver's .ndo_fix_features() to drop LRO/GRO_HW as it sees
>> fit, instead of doing it in the common netdev_fix_features().  That
>> way, it is more flexible at least.
>
> Thank you.

OK.  I will make this change for V3.

>>
>> I don't think that things are necessarily broken today.  LRO truly
>> needs to be propagated.  It's debatable whether other features like
>> GRO/RXCSUM/NTUPLE should be centrally set by the upper device or not.
>
> So I can agree with the NTUPLE not being propagated since it doesn't
> actually effect upper devices. Really the functionality only really
> has effects locally since the functionality consists of route to a
> specific queue/device or drop the packet.
>
> I'm not sure why RXCSUM isn't being propagated. It seems like that is
> something that would make sense to have passed all the way down to the
> lower devices since a single device that is doing bad Rx checksum
> offloads could potentially corrupt all traffic in a bond. Seems like
> that one should definitely be included.

This is a separate discussion that goes beyond GRO.  There are pros
and cons for the upper device to propagate every single feature flag.

>>
>> GRO kicks in at the lower device before it gets to the bond if the
>> lower device calls napi_gro_receive() and GRO is enabled.
>
> I get that. I assume the reason why the bond doesn't have it enabled
> is because we don't want it to kick in at every given netdev, there
> isn't any point to do GRO more than once. The problem is GRO_HW isn't
> a pure software offload like GRO is. Call me a pessimist, but when we
> end up encountering a buggy implementation that has to be disabled we
> will want the right infrastructure in place to handle it. It becomes
> another argument for why we might want to split GRO_HW and GRO without
> tying them together. It would make sense to expose GRO_HW in a bond,
> but not GRO. It might be something where we want to do any close tying
> together of the GRO flag and GRO_HW at the driver as well. Basically
> the legacy devices that transition over to GRO_HW from using just the
> GRO flag could do that to maintain existing functionality, and new
> drivers that implement it could opt in to the same behavior or just
> handle GRO_HW as a separate flag.

To me, making GRO_HW dependent on GRO makes the most intuitive sense.
Separating them is just confusing.  The possibility of GRO_HW being
enabled without GRO enabled makes no sense to me.

>
> Actually I just had a thought. What if we consider this a separate GRO
> stage instead of just a hardware offload? Our standard GRO is a post
> receive from the driver perspective, basically the packet is assembled
> after we have handed it to the stack. What you are doing with GRO_HW
> is essentially providing an early reassembly before it is handed to
> the stack. What if we were to rename GRO_HW to something like
> GRO_LOWER, GRO_EARLY, GRO_PRE, or pick your name (I'm lousy at
> naming), and used it as a way to indicate that we want to perform GRO
> before we begin receive processing on the frame in our driver? Then
> for stacked devices you could use this new flag to indicate you don't
> want to perform GRO on the lower levels below this device, and could
> then use the regular GRO flag to control if we do it ourselves. Doing
> that should provide stacked devices with a good way to control GRO on
> the lower devices and would resolve what you need to indicate as well.
> The only real changes needed might be a rename and to add the
> necessary bit shifting for the upper and lower dev sync code. If you
> aren't interested in the idea I can probably spend a couple of hours
> getting to it tomorrow since I think this might be a much better way
> to go as it solves multiple issues.

I really don't see what a different name will buy us.  If you want to
propagate GRO/GRO_HW, we can do that if others agree.  I only feel
strongly that GRO/GRO_HW should be tied.  I don't feel strongly
whether GRO/GRO_HW should be propagated or not propagated.  Again, LRO
needs to be propagated out of necessity (e.g. when a bond is added to
a bridge).

Patch

diff --git a/Documentation/networking/netdev-features.txt b/Documentation/networking/netdev-features.txt
index 7413eb0..8f36527 100644
--- a/Documentation/networking/netdev-features.txt
+++ b/Documentation/networking/netdev-features.txt
@@ -163,3 +163,11 @@  This requests that the NIC receive all possible frames, including errored
 frames (such as bad FCS, etc).  This can be helpful when sniffing a link with
 bad packets on it.  Some NICs may receive more packets if also put into normal
 PROMISC mode.
+
+*  rx-gro-hw
+
+This requests that the NIC enables Hardware GRO (generic receive offload).
+Hardware GRO is basically the exact reverse of TSO, and is generally
+stricter than Hardware LRO.  A packet stream merged by Hardware GRO must
+be re-segmentable by GSO or TSO back to the exact original packet stream.
+Hardware GRO is dependent on GRO and RXCSUM.
diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index b1b0ca7..db84c51 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -78,6 +78,8 @@  enum {
 	NETIF_F_HW_ESP_TX_CSUM_BIT,	/* ESP with TX checksum offload */
 	NETIF_F_RX_UDP_TUNNEL_PORT_BIT, /* Offload of RX port for UDP tunnels */
 
+	NETIF_F_GRO_HW_BIT,		/* Hardware Generic receive offload */
+
 	/*
 	 * Add your fresh new feature above and remember to update
 	 * netdev_features_strings[] in net/core/ethtool.c and maybe
@@ -97,6 +99,7 @@  enum {
 #define NETIF_F_FRAGLIST	__NETIF_F(FRAGLIST)
 #define NETIF_F_FSO		__NETIF_F(FSO)
 #define NETIF_F_GRO		__NETIF_F(GRO)
+#define NETIF_F_GRO_HW		__NETIF_F(GRO_HW)
 #define NETIF_F_GSO		__NETIF_F(GSO)
 #define NETIF_F_GSO_ROBUST	__NETIF_F(GSO_ROBUST)
 #define NETIF_F_HIGHDMA		__NETIF_F(HIGHDMA)
diff --git a/net/core/dev.c b/net/core/dev.c
index 6bea893..7242e5e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7405,6 +7405,23 @@  static netdev_features_t netdev_fix_features(struct net_device *dev,
 		features &= ~dev->gso_partial_features;
 	}
 
+	if (features & NETIF_F_GRO_HW) {
+		/* Hardware GRO depends on GRO and RXCSUM. */
+		if (!(features & NETIF_F_GRO)) {
+			netdev_dbg(dev, "Dropping NETIF_F_GSO_HW since no GRO feature.\n");
+			features &= ~NETIF_F_GRO_HW;
+		}
+		if (!(features & NETIF_F_RXCSUM)) {
+			netdev_dbg(dev, "Dropping NETIF_F_GSO_HW since no RXCSUM feature.\n");
+			features &= ~NETIF_F_GRO_HW;
+		}
+		/* Hardware GRO and LRO are mutually exclusive. */
+		if (features & NETIF_F_LRO) {
+			netdev_dbg(dev, "Dropping NETIF_F_LRO since GRO_HW is set.\n");
+			features &= ~NETIF_F_LRO;
+		}
+	}
+
 	return features;
 }
 
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index f8fcf45..50a7920 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -73,6 +73,7 @@  int ethtool_op_get_ts_info(struct net_device *dev, struct ethtool_ts_info *info)
 	[NETIF_F_LLTX_BIT] =             "tx-lockless",
 	[NETIF_F_NETNS_LOCAL_BIT] =      "netns-local",
 	[NETIF_F_GRO_BIT] =              "rx-gro",
+	[NETIF_F_GRO_HW_BIT] =           "rx-gro-hw",
 	[NETIF_F_LRO_BIT] =              "rx-lro",
 
 	[NETIF_F_TSO_BIT] =              "tx-tcp-segmentation",