[V2,net] netdevice: Include NETIF_F_HW_CSUM when intersecting features

Submitted by Vlad Yasevich on April 20, 2017, 7:17 p.m.

Details

Message ID 1492715844-30273-1-git-send-email-vyasevic@redhat.com
State Changes Requested
Delegated to: David Miller
Headers show

Commit Message

Vlad Yasevich April 20, 2017, 7:17 p.m.
While hardware device use either NETIF_F_(IP|IPV6)_CSUM or
NETIF_F_HW_CSUM, all of the software devices use HW_CSUM.
This results in an interesting situation when the software
device is configured on top of hw device using (IP|IPV6)_CSUM.
In this situation, the user can't turn off checksum offloading
features on the software device.

This patch resolves that by adding NETIF_F_HW_CSUM to the mask
if a feature set includes only IP|IPV6 csum.  This allows the user
to control the upper (software) device checksum, while at the same
time correctly propagating lower device changes up.

CC: Michal Kubecek <mkubecek@suse.cz>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Tom Herbert <tom@herbertland.com>
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>

---

V2: Addressed comments from Alex Duyck.  I tested this with hacked virtio
device that set IP|IPV6 checksums instead of HW.  Configuring a vlan on
top gave the vlan device with 'ip-generic: on' setting (using HW checksum).
This allows me to change vlan checksum offloads independent of virt-io nic.
Changes to virtio-nic propagated up to vlan, turning off the offloading
correctly.

 include/linux/netdevice.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Comments

Alexander Duyck April 20, 2017, 10:31 p.m.
On Thu, Apr 20, 2017 at 12:17 PM, Vladislav Yasevich
<vyasevich@gmail.com> wrote:
> While hardware device use either NETIF_F_(IP|IPV6)_CSUM or
> NETIF_F_HW_CSUM, all of the software devices use HW_CSUM.
> This results in an interesting situation when the software
> device is configured on top of hw device using (IP|IPV6)_CSUM.
> In this situation, the user can't turn off checksum offloading
> features on the software device.

Why wouldn't they be able to? It seems like the software device
shouldn't be setting IP_CSUM or IPV6_CSUM feature flags, but they will
be set in the intersect features. The result won't have
NETIF_F_HW_CSUM set, but it should advertise the features of the lower
device instead.

> This patch resolves that by adding NETIF_F_HW_CSUM to the mask
> if a feature set includes only IP|IPV6 csum.  This allows the user
> to control the upper (software) device checksum, while at the same
> time correctly propagating lower device changes up.

You can't go this way. HW_CSUM is an all inclusive feature flag,
whereas IP_CSUM and IPV6_CSUM specify only specific offload types.
With your change the lower device could disable IPV6_CSUM for instance
but you would still end up advertising checksum offload via HW_CSUM.

> CC: Michal Kubecek <mkubecek@suse.cz>
> CC: Alexander Duyck <alexander.duyck@gmail.com>
> CC: Tom Herbert <tom@herbertland.com>
> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
>
> ---
>
> V2: Addressed comments from Alex Duyck.  I tested this with hacked virtio
> device that set IP|IPV6 checksums instead of HW.  Configuring a vlan on
> top gave the vlan device with 'ip-generic: on' setting (using HW checksum).
> This allows me to change vlan checksum offloads independent of virt-io nic.
> Changes to virtio-nic propagated up to vlan, turning off the offloading
> correctly.
>
>  include/linux/netdevice.h | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index b0aa089..81aed2f 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -4009,10 +4009,10 @@ static inline netdev_features_t netdev_intersect_features(netdev_features_t f1,
>                                                           netdev_features_t f2)
>  {
>         if ((f1 ^ f2) & NETIF_F_HW_CSUM) {
> -               if (f1 & NETIF_F_HW_CSUM)
> -                       f1 |= (NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM);
> -               else
> -                       f2 |= (NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM);
> +               if (f1 & (NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM))
> +                       f1 |= NETIF_F_HW_CSUM;
> +               if(f2 & (NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM))
> +                       f2 |= NETIF_F_HW_CSUM;
>         }
>
>         return f1 & f2;
> --
> 2.7.4
>

This doesn't work. "NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM" doesn't
equate to "NETIF_F_HW_CSUM". The problem is NETIF_F_HW_CSUM is a
catch-all, the IP and IPV6 CSUM flags are not. Right now you would
introduce escapes on devices that enable IP but not IPv6, or the other
way around.

Can you point to the exact case where this code is an issue? It seems
like maybe you are wanting to have NETIF_F_HW_CSUM set if you support
offloading a given protocol. You might want to look at how we dealt
with this in the GSO code path so that if we could offload the
checksum we set NETIF_F_HW_CSUM based on protocol and the CSUM offload
feature bit for that protocol.

- Alex
Vlad Yasevich April 20, 2017, 11:19 p.m.
On 04/20/2017 06:31 PM, Alexander Duyck wrote:
> On Thu, Apr 20, 2017 at 12:17 PM, Vladislav Yasevich
> <vyasevich@gmail.com> wrote:
>> While hardware device use either NETIF_F_(IP|IPV6)_CSUM or
>> NETIF_F_HW_CSUM, all of the software devices use HW_CSUM.
>> This results in an interesting situation when the software
>> device is configured on top of hw device using (IP|IPV6)_CSUM.
>> In this situation, the user can't turn off checksum offloading
>> features on the software device.
> 
> Why wouldn't they be able to? It seems like the software device
> shouldn't be setting IP_CSUM or IPV6_CSUM feature flags, but they will
> be set in the intersect features. The result won't have
> NETIF_F_HW_CSUM set, but it should advertise the features of the lower
> device instead.

The can't because the upper software devices typically has HW_CSUM set in
hw_features and features.  When we intersect with a lower device which has
IP|IPV6 set, we end up with a software device that has IP|IPV6 set.  However,
the the hw_features have HW_CSUM, you end with a "fixed" setting of IP|IPV6
which can't be controlled now on the software device.

I've had a situation where trying to debug something, to turn off checksum
offloading on a vlan, I had to turn it of on the hw itself which effects all
traffic now.

We should be able to control it properly.

> 
>> This patch resolves that by adding NETIF_F_HW_CSUM to the mask
>> if a feature set includes only IP|IPV6 csum.  This allows the user
>> to control the upper (software) device checksum, while at the same
>> time correctly propagating lower device changes up.
> 
> You can't go this way. HW_CSUM is an all inclusive feature flag,
> whereas IP_CSUM and IPV6_CSUM specify only specific offload types.
> With your change the lower device could disable IPV6_CSUM for instance
> but you would still end up advertising checksum offload via HW_CSUM.
> 
>> CC: Michal Kubecek <mkubecek@suse.cz>
>> CC: Alexander Duyck <alexander.duyck@gmail.com>
>> CC: Tom Herbert <tom@herbertland.com>
>> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
>>
>> ---
>>
>> V2: Addressed comments from Alex Duyck.  I tested this with hacked virtio
>> device that set IP|IPV6 checksums instead of HW.  Configuring a vlan on
>> top gave the vlan device with 'ip-generic: on' setting (using HW checksum).
>> This allows me to change vlan checksum offloads independent of virt-io nic.
>> Changes to virtio-nic propagated up to vlan, turning off the offloading
>> correctly.
>>
>>  include/linux/netdevice.h | 8 ++++----
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index b0aa089..81aed2f 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -4009,10 +4009,10 @@ static inline netdev_features_t netdev_intersect_features(netdev_features_t f1,
>>                                                           netdev_features_t f2)
>>  {
>>         if ((f1 ^ f2) & NETIF_F_HW_CSUM) {
>> -               if (f1 & NETIF_F_HW_CSUM)
>> -                       f1 |= (NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM);
>> -               else
>> -                       f2 |= (NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM);
>> +               if (f1 & (NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM))
>> +                       f1 |= NETIF_F_HW_CSUM;
>> +               if(f2 & (NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM))
>> +                       f2 |= NETIF_F_HW_CSUM;
>>         }
>>
>>         return f1 & f2;
>> --
>> 2.7.4
>>
> 
> This doesn't work. "NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM" doesn't
> equate to "NETIF_F_HW_CSUM". The problem is NETIF_F_HW_CSUM is a
> catch-all, the IP and IPV6 CSUM flags are not. Right now you would
> introduce escapes on devices that enable IP but not IPv6, or the other
> way around.
> 
> Can you point to the exact case where this code is an issue? It seems
> like maybe you are wanting to have NETIF_F_HW_CSUM set if you support
> offloading a given protocol. You might want to look at how we dealt
> with this in the GSO code path so that if we could offload the
> checksum we set NETIF_F_HW_CSUM based on protocol and the CSUM offload
> feature bit for that protocol.
> 

What I'd like to be able to do is control features on the software device
without having to turn them off on the HW.  As it stands right now, we can't
do that.  If you want to try, configure a vlan on top of any device that
sets IP|IPV6 csum features.

I was trying to fix it in the common place.  It actually works correctly
since the software checksum gets computed correctly lower in the stack,
so there is no actual escape.

Having said that, the other alternative is to inherit hw_features from
lower devices.  BTW, bonding I think has a similar "issue" you are
describing since it prefers HW_CSUM if any of the slaves have it set.

-vlad

> - Alex
>
Alexander Duyck April 21, 2017, 3:06 a.m.
On Thu, Apr 20, 2017 at 4:19 PM, Vlad Yasevich <vyasevic@redhat.com> wrote:
> On 04/20/2017 06:31 PM, Alexander Duyck wrote:
>> On Thu, Apr 20, 2017 at 12:17 PM, Vladislav Yasevich
>> <vyasevich@gmail.com> wrote:
>>> While hardware device use either NETIF_F_(IP|IPV6)_CSUM or
>>> NETIF_F_HW_CSUM, all of the software devices use HW_CSUM.
>>> This results in an interesting situation when the software
>>> device is configured on top of hw device using (IP|IPV6)_CSUM.
>>> In this situation, the user can't turn off checksum offloading
>>> features on the software device.
>>
>> Why wouldn't they be able to? It seems like the software device
>> shouldn't be setting IP_CSUM or IPV6_CSUM feature flags, but they will
>> be set in the intersect features. The result won't have
>> NETIF_F_HW_CSUM set, but it should advertise the features of the lower
>> device instead.
>
> The can't because the upper software devices typically has HW_CSUM set in
> hw_features and features.  When we intersect with a lower device which has
> IP|IPV6 set, we end up with a software device that has IP|IPV6 set.  However,
> the the hw_features have HW_CSUM, you end with a "fixed" setting of IP|IPV6
> which can't be controlled now on the software device.

Okay so it sounds like we have 2 competing needs here. My concern is
this gets used in netif_skb_features. We don't want to break that.

My thought is you may want to look at instead moving this change into
vlan_dev_fix_features. If you add one extra line there to explicitly
strop the IP_CSUM and IPV6_CSUM bits out of features you should have
the issue resolved. Maybe an additional line or two beyond that for a
comment indicating that the change is needed since we leak the two
flags in as a result of having support for HW_CSUM. You might need to
also add NETIF_F_HW_CSUM to the values you pull in based off of
old_features.

> I've had a situation where trying to debug something, to turn off checksum
> offloading on a vlan, I had to turn it of on the hw itself which effects all
> traffic now.
>
> We should be able to control it properly.

Agreed, I just don't think this was quite the right spot to do it. We
need to advertise support for what we have. The reason why this code
is the way it is is a result of the fact that HW_CSUM and IP_CSUM
should only result in us advertising support for IP_CSUM.

>>
>>> This patch resolves that by adding NETIF_F_HW_CSUM to the mask
>>> if a feature set includes only IP|IPV6 csum.  This allows the user
>>> to control the upper (software) device checksum, while at the same
>>> time correctly propagating lower device changes up.
>>
>> You can't go this way. HW_CSUM is an all inclusive feature flag,
>> whereas IP_CSUM and IPV6_CSUM specify only specific offload types.
>> With your change the lower device could disable IPV6_CSUM for instance
>> but you would still end up advertising checksum offload via HW_CSUM.
>>
>>> CC: Michal Kubecek <mkubecek@suse.cz>
>>> CC: Alexander Duyck <alexander.duyck@gmail.com>
>>> CC: Tom Herbert <tom@herbertland.com>
>>> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
>>>
>>> ---
>>>
>>> V2: Addressed comments from Alex Duyck.  I tested this with hacked virtio
>>> device that set IP|IPV6 checksums instead of HW.  Configuring a vlan on
>>> top gave the vlan device with 'ip-generic: on' setting (using HW checksum).
>>> This allows me to change vlan checksum offloads independent of virt-io nic.
>>> Changes to virtio-nic propagated up to vlan, turning off the offloading
>>> correctly.
>>>
>>>  include/linux/netdevice.h | 8 ++++----
>>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>> index b0aa089..81aed2f 100644
>>> --- a/include/linux/netdevice.h
>>> +++ b/include/linux/netdevice.h
>>> @@ -4009,10 +4009,10 @@ static inline netdev_features_t netdev_intersect_features(netdev_features_t f1,
>>>                                                           netdev_features_t f2)
>>>  {
>>>         if ((f1 ^ f2) & NETIF_F_HW_CSUM) {
>>> -               if (f1 & NETIF_F_HW_CSUM)
>>> -                       f1 |= (NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM);
>>> -               else
>>> -                       f2 |= (NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM);
>>> +               if (f1 & (NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM))
>>> +                       f1 |= NETIF_F_HW_CSUM;
>>> +               if(f2 & (NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM))
>>> +                       f2 |= NETIF_F_HW_CSUM;
>>>         }
>>>
>>>         return f1 & f2;
>>> --
>>> 2.7.4
>>>
>>
>> This doesn't work. "NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM" doesn't
>> equate to "NETIF_F_HW_CSUM". The problem is NETIF_F_HW_CSUM is a
>> catch-all, the IP and IPV6 CSUM flags are not. Right now you would
>> introduce escapes on devices that enable IP but not IPv6, or the other
>> way around.
>>
>> Can you point to the exact case where this code is an issue? It seems
>> like maybe you are wanting to have NETIF_F_HW_CSUM set if you support
>> offloading a given protocol. You might want to look at how we dealt
>> with this in the GSO code path so that if we could offload the
>> checksum we set NETIF_F_HW_CSUM based on protocol and the CSUM offload
>> feature bit for that protocol.
>>
>
> What I'd like to be able to do is control features on the software device
> without having to turn them off on the HW.  As it stands right now, we can't
> do that.  If you want to try, configure a vlan on top of any device that
> sets IP|IPV6 csum features.
>
> I was trying to fix it in the common place.  It actually works correctly
> since the software checksum gets computed correctly lower in the stack,
> so there is no actual escape.
>
> Having said that, the other alternative is to inherit hw_features from
> lower devices.  BTW, bonding I think has a similar "issue" you are
> describing since it prefers HW_CSUM if any of the slaves have it set.
>
> -vlad

So it looks like you probably need to fix this in the VLAN driver
instead of here. From what I can tell that is the only spot that
really has any issues with this code.

- Alex
Michal Kubeček April 21, 2017, 5:33 a.m.
On Thu, Apr 20, 2017 at 07:19:55PM -0400, Vlad Yasevich wrote:
> 
> Having said that, the other alternative is to inherit hw_features from
> lower devices.  BTW, bonding I think has a similar "issue" you are
> describing since it prefers HW_CSUM if any of the slaves have it set.

It does but bonding uses netdev_increment_features() to combine slave
features and this function handles checksumming like "or", not "and"
(not only checksumming, also flags in NETIF_F_ONE_FOR_ALL).

That said, it's legitimate to ask if we want some of the features to be
handled differently when computing features for a vlan device. My point
before was that if the helper is called netdev_intersect_features(), it
shouldn't return any features that are not supported by both argument
sets, even if all its current users would benefit from slightly
different behaviour. If it does, it's a trap that someone might one day
fall in.

                                                         Michal Kubecek
Vlad Yasevich April 21, 2017, 5:33 p.m.
On Fri, Apr 21, 2017 at 1:33 AM, Michal Kubecek <mkubecek@suse.cz> wrote:
> On Thu, Apr 20, 2017 at 07:19:55PM -0400, Vlad Yasevich wrote:
>>
>> Having said that, the other alternative is to inherit hw_features from
>> lower devices.  BTW, bonding I think has a similar "issue" you are
>> describing since it prefers HW_CSUM if any of the slaves have it set.
>
> It does but bonding uses netdev_increment_features() to combine slave
> features and this function handles checksumming like "or", not "and"
> (not only checksumming, also flags in NETIF_F_ONE_FOR_ALL).
>

I am not saying that it doesn't.  What I am saying is that if you
form a bond with two devices: one with only NETIF_F_IP_CSUM and the
other with NETIF_F_HW_CSUM, then the bonding device will have NETIF_F_HW_CSUM
set.  This is similar to what is being proposed in the patch.

Alex's objection, at least as I understand it, is that we never want to
allow the above condition.  However, it looks like we already allow it
and correctly handle it.

> That said, it's legitimate to ask if we want some of the features to be
> handled differently when computing features for a vlan device. My point
> before was that if the helper is called netdev_intersect_features(), it
> shouldn't return any features that are not supported by both argument
> sets, even if all its current users would benefit from slightly
> different behaviour. If it does, it's a trap that someone might one day
> fall in.

Ok.  I think I understand, but we've always handled the checksum intersection
stangely.  I'll see what I can figure out.

Thanks
-vlad

>
>                                                          Michal Kubecek
>
Alexander Duyck April 21, 2017, 6:53 p.m.
On Fri, Apr 21, 2017 at 10:33 AM, Vladislav Yasevich
<vyasevich@gmail.com> wrote:
> On Fri, Apr 21, 2017 at 1:33 AM, Michal Kubecek <mkubecek@suse.cz> wrote:
>> On Thu, Apr 20, 2017 at 07:19:55PM -0400, Vlad Yasevich wrote:
>>>
>>> Having said that, the other alternative is to inherit hw_features from
>>> lower devices.  BTW, bonding I think has a similar "issue" you are
>>> describing since it prefers HW_CSUM if any of the slaves have it set.
>>
>> It does but bonding uses netdev_increment_features() to combine slave
>> features and this function handles checksumming like "or", not "and"
>> (not only checksumming, also flags in NETIF_F_ONE_FOR_ALL).
>>
>
> I am not saying that it doesn't.  What I am saying is that if you
> form a bond with two devices: one with only NETIF_F_IP_CSUM and the
> other with NETIF_F_HW_CSUM, then the bonding device will have NETIF_F_HW_CSUM
> set.  This is similar to what is being proposed in the patch.
>
> Alex's objection, at least as I understand it, is that we never want to
> allow the above condition.  However, it looks like we already allow it
> and correctly handle it.

My objection is that the change as you have proposed it doesn't work
that way. It is one thing to advertise NETIF_F_HW_CSUM on an upper
device, the problem is netdev_intersect_features isn't used on just
the upper device. It is used to perform what is essentially a logical
AND of the features. What you are doing changes that logic. That is
why I suggested fixing this in the VLAN driver code instead.

>> That said, it's legitimate to ask if we want some of the features to be
>> handled differently when computing features for a vlan device. My point
>> before was that if the helper is called netdev_intersect_features(), it
>> shouldn't return any features that are not supported by both argument
>> sets, even if all its current users would benefit from slightly
>> different behaviour. If it does, it's a trap that someone might one day
>> fall in.
>
> Ok.  I think I understand, but we've always handled the checksum intersection
> stangely.  I'll see what I can figure out.
>

I'm okay with us setting NETIF_F_HW_CSUM if the lower device supports
any checksum offload. I just don't want the change to impact
netif_skb_features() in any way that could cause us to advertise
offload support that isn't there. That was why I suggested updating
vlan_dev_fix_features so that it would zero out the IP_CSUM and
IP6_CSUM flags and set the HW_CSUM if any offload was supported.
Basically it would just consist of adding the following lines after
the calls to netdev_intersect_features:

if (features & NETIF_F_CSUM_MASK) {
        features &= ~NETIF_F_CSUM_MASK;
        features |= NETIF_F_HW_CSUM;
}

Just that should be enough to resolve the issues you were seeing and
make it so that you always advertise NETIF_F_HW_CSUM instead of
IP_CSUM or IPV6_CSUM.

- Alex

Patch hide | download patch | download mbox

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index b0aa089..81aed2f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4009,10 +4009,10 @@  static inline netdev_features_t netdev_intersect_features(netdev_features_t f1,
 							  netdev_features_t f2)
 {
 	if ((f1 ^ f2) & NETIF_F_HW_CSUM) {
-		if (f1 & NETIF_F_HW_CSUM)
-			f1 |= (NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM);
-		else
-			f2 |= (NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM);
+		if (f1 & (NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM))
+			f1 |= NETIF_F_HW_CSUM;
+		if(f2 & (NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM))
+			f2 |= NETIF_F_HW_CSUM;
 	}
 
 	return f1 & f2;