diff mbox series

[ovs-dev,2/2] tc: Support masks for tunnel destination ports.

Message ID 20220704224505.1117988-3-i.maximets@ovn.org
State Superseded
Headers show
Series tc: Fixes for tunnel offloading. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/intel-ovs-compilation success test: success

Commit Message

Ilya Maximets July 4, 2022, 10:45 p.m. UTC
TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK is supported by TC, and OVS may
create a masked match on this field.

This change is important as we're not clearing the masks which wasn't
really used, so if OVS requests match on ports, we should use the
mask and clear, otherwise offloading will fail.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
---
 lib/netdev-offload-tc.c |  4 ++++
 lib/tc.c                | 11 ++++++++---
 2 files changed, 12 insertions(+), 3 deletions(-)

Comments

Eelco Chaudron July 6, 2022, 10:26 a.m. UTC | #1
On 5 Jul 2022, at 0:45, Ilya Maximets wrote:

> TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK is supported by TC, and OVS may
> create a masked match on this field.
>
> This change is important as we're not clearing the masks which wasn't
> really used, so if OVS requests match on ports, we should use the
> mask and clear, otherwise offloading will fail.
>
> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

Changes look good to me, and all system-traffic.at tests that were passing without the change are still passing, including the failing ERSPAN ones.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Roi Dayan July 6, 2022, 12:34 p.m. UTC | #2
On 2022-07-06 1:26 PM, Eelco Chaudron wrote:
> 
> 
> On 5 Jul 2022, at 0:45, Ilya Maximets wrote:
> 
>> TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK is supported by TC, and OVS may
>> create a masked match on this field.
>>
>> This change is important as we're not clearing the masks which wasn't
>> really used, so if OVS requests match on ports, we should use the
>> mask and clear, otherwise offloading will fail.
>>
>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
> 
> Changes look good to me, and all system-traffic.at tests that were passing without the change are still passing, including the failing ERSPAN ones.
> 
> Acked-by: Eelco Chaudron <echaudro@redhat.com>
> 

can we postpone to merge this?
After this patch simple vxlan rules won't have enc dst port on decap
rules and currently mlx5 driver must have enc dst port to do the
offload of the decap rule.
I'm checking about this limitation and if we can remove it but
if we can postpone a bit so we won't break users with our nics..

thanks,
Roi
Roi Dayan July 6, 2022, 2 p.m. UTC | #3
On 2022-07-06 3:34 PM, Roi Dayan wrote:
> 
> 
> On 2022-07-06 1:26 PM, Eelco Chaudron wrote:
>>
>>
>> On 5 Jul 2022, at 0:45, Ilya Maximets wrote:
>>
>>> TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK is supported by TC, and OVS may
>>> create a masked match on this field.
>>>
>>> This change is important as we're not clearing the masks which wasn't
>>> really used, so if OVS requests match on ports, we should use the
>>> mask and clear, otherwise offloading will fail.
>>>
>>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>>
>> Changes look good to me, and all system-traffic.at tests that were 
>> passing without the change are still passing, including the failing 
>> ERSPAN ones.
>>
>> Acked-by: Eelco Chaudron <echaudro@redhat.com>
>>
> 
> can we postpone to merge this?
> After this patch simple vxlan rules won't have enc dst port on decap
> rules and currently mlx5 driver must have enc dst port to do the
> offload of the decap rule.
> I'm checking about this limitation and if we can remove it but
> if we can postpone a bit so we won't break users with our nics..
> 
> thanks,
> Roi


I finished with the testing and checking I needed.
I don't have a problem with the patch. if something else will raise i'll
start a different discussion.

Acked-by: Roi Dayan <roid@nvidia.com>
Roi Dayan July 7, 2022, 6:48 a.m. UTC | #4
On 2022-07-06 5:00 PM, Roi Dayan wrote:
> 
> 
> On 2022-07-06 3:34 PM, Roi Dayan wrote:
>>
>>
>> On 2022-07-06 1:26 PM, Eelco Chaudron wrote:
>>>
>>>
>>> On 5 Jul 2022, at 0:45, Ilya Maximets wrote:
>>>
>>>> TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK is supported by TC, and OVS may
>>>> create a masked match on this field.
>>>>
>>>> This change is important as we're not clearing the masks which wasn't
>>>> really used, so if OVS requests match on ports, we should use the
>>>> mask and clear, otherwise offloading will fail.
>>>>
>>>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>>>
>>> Changes look good to me, and all system-traffic.at tests that were 
>>> passing without the change are still passing, including the failing 
>>> ERSPAN ones.
>>>
>>> Acked-by: Eelco Chaudron <echaudro@redhat.com>
>>>
>>
>> can we postpone to merge this?
>> After this patch simple vxlan rules won't have enc dst port on decap
>> rules and currently mlx5 driver must have enc dst port to do the
>> offload of the decap rule.
>> I'm checking about this limitation and if we can remove it but
>> if we can postpone a bit so we won't break users with our nics..
>>
>> thanks,
>> Roi
> 
> 
> I finished with the testing and checking I needed.
> I don't have a problem with the patch. if something else will raise i'll
> start a different discussion.
> 
> Acked-by: Roi Dayan <roid@nvidia.com>


I have a concern now. today for an offload driver to offload
a tunnel decap rule it registers to TC indirect callback.

Adding a TC rule on a tunnel device calls all registered drivers
to offload the rule.
With a single tunnel on a device this is ok but if a system have 
multiple tunnels with same src/dst ip but different enc dst port,
adding a tc rule on one of the tunnel devices without matching
enc dst port will call a driver to offload such a rule in the hw,
but that rule is potentially matching all sw tunnels. a driver
will have to match implicit on the the enc_dst_port to distinguish
between the tunnels.

OVS doesn't match explicit on enc_dst_port also when having multiple
tunnels as the in SW the packets will be on the correct
tunnel device and then pass tc or ovs datapath.

After this patch offload drivers must to implicit match the
enc_dst_port to avoid having hw rule matching all the tunnels
with same src/dst ip. taking the tunnel maybe.
e.g. for vxlan tunnel from vxlan->cfg.dst_port

What do you think?
Roi Dayan July 7, 2022, 7:01 a.m. UTC | #5
On 2022-07-07 9:48 AM, Roi Dayan wrote:
> 
> 
> On 2022-07-06 5:00 PM, Roi Dayan wrote:
>>
>>
>> On 2022-07-06 3:34 PM, Roi Dayan wrote:
>>>
>>>
>>> On 2022-07-06 1:26 PM, Eelco Chaudron wrote:
>>>>
>>>>
>>>> On 5 Jul 2022, at 0:45, Ilya Maximets wrote:
>>>>
>>>>> TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK is supported by TC, and OVS may
>>>>> create a masked match on this field.
>>>>>
>>>>> This change is important as we're not clearing the masks which wasn't
>>>>> really used, so if OVS requests match on ports, we should use the
>>>>> mask and clear, otherwise offloading will fail.
>>>>>
>>>>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>>>>
>>>> Changes look good to me, and all system-traffic.at tests that were 
>>>> passing without the change are still passing, including the failing 
>>>> ERSPAN ones.
>>>>
>>>> Acked-by: Eelco Chaudron <echaudro@redhat.com>
>>>>
>>>
>>> can we postpone to merge this?
>>> After this patch simple vxlan rules won't have enc dst port on decap
>>> rules and currently mlx5 driver must have enc dst port to do the
>>> offload of the decap rule.
>>> I'm checking about this limitation and if we can remove it but
>>> if we can postpone a bit so we won't break users with our nics..
>>>
>>> thanks,
>>> Roi
>>
>>
>> I finished with the testing and checking I needed.
>> I don't have a problem with the patch. if something else will raise i'll
>> start a different discussion.
>>
>> Acked-by: Roi Dayan <roid@nvidia.com>
> 
> 
> I have a concern now. today for an offload driver to offload
> a tunnel decap rule it registers to TC indirect callback.
> 
> Adding a TC rule on a tunnel device calls all registered drivers
> to offload the rule.
> With a single tunnel on a device this is ok but if a system have 
> multiple tunnels with same src/dst ip but different enc dst port,
> adding a tc rule on one of the tunnel devices without matching
> enc dst port will call a driver to offload such a rule in the hw,
> but that rule is potentially matching all sw tunnels. a driver
> will have to match implicit on the the enc_dst_port to distinguish
> between the tunnels.
> 
> OVS doesn't match explicit on enc_dst_port also when having multiple
> tunnels as the in SW the packets will be on the correct
> tunnel device and then pass tc or ovs datapath.
> 
> After this patch offload drivers must to implicit match the
> enc_dst_port to avoid having hw rule matching all the tunnels
> with same src/dst ip. taking the tunnel maybe.
> e.g. for vxlan tunnel from vxlan->cfg.dst_port
> 
> What do you think?

to support what i said about implicit match for geneve and mpls for
example will require kernel change to actually expose their struct
in a header file.

I think we should keep ovs TC to keep explicit match on enc_dst_port.
We can add a comment explaining why.
Ilya Maximets July 12, 2022, 9:23 a.m. UTC | #6
On 7/7/22 09:01, Roi Dayan wrote:
> 
> 
> On 2022-07-07 9:48 AM, Roi Dayan wrote:
>>
>>
>> On 2022-07-06 5:00 PM, Roi Dayan wrote:
>>>
>>>
>>> On 2022-07-06 3:34 PM, Roi Dayan wrote:
>>>>
>>>>
>>>> On 2022-07-06 1:26 PM, Eelco Chaudron wrote:
>>>>>
>>>>>
>>>>> On 5 Jul 2022, at 0:45, Ilya Maximets wrote:
>>>>>
>>>>>> TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK is supported by TC, and OVS may
>>>>>> create a masked match on this field.
>>>>>>
>>>>>> This change is important as we're not clearing the masks which wasn't
>>>>>> really used, so if OVS requests match on ports, we should use the
>>>>>> mask and clear, otherwise offloading will fail.
>>>>>>
>>>>>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>>>>>
>>>>> Changes look good to me, and all system-traffic.at tests that were passing without the change are still passing, including the failing ERSPAN ones.
>>>>>
>>>>> Acked-by: Eelco Chaudron <echaudro@redhat.com>
>>>>>
>>>>
>>>> can we postpone to merge this?
>>>> After this patch simple vxlan rules won't have enc dst port on decap
>>>> rules and currently mlx5 driver must have enc dst port to do the
>>>> offload of the decap rule.
>>>> I'm checking about this limitation and if we can remove it but
>>>> if we can postpone a bit so we won't break users with our nics..
>>>>
>>>> thanks,
>>>> Roi
>>>
>>>
>>> I finished with the testing and checking I needed.
>>> I don't have a problem with the patch. if something else will raise i'll
>>> start a different discussion.
>>>
>>> Acked-by: Roi Dayan <roid@nvidia.com>
>>
>>
>> I have a concern now. today for an offload driver to offload
>> a tunnel decap rule it registers to TC indirect callback.
>>
>> Adding a TC rule on a tunnel device calls all registered drivers
>> to offload the rule.
>> With a single tunnel on a device this is ok but if a system have multiple tunnels with same src/dst ip but different enc dst port,
>> adding a tc rule on one of the tunnel devices without matching
>> enc dst port will call a driver to offload such a rule in the hw,
>> but that rule is potentially matching all sw tunnels. a driver
>> will have to match implicit on the the enc_dst_port to distinguish
>> between the tunnels.
>>
>> OVS doesn't match explicit on enc_dst_port also when having multiple
>> tunnels as the in SW the packets will be on the correct
>> tunnel device and then pass tc or ovs datapath.
>>
>> After this patch offload drivers must to implicit match the
>> enc_dst_port to avoid having hw rule matching all the tunnels
>> with same src/dst ip. taking the tunnel maybe.
>> e.g. for vxlan tunnel from vxlan->cfg.dst_port
>>
>> What do you think?
> 
> to support what i said about implicit match for geneve and mpls for
> example will require kernel change to actually expose their struct
> in a header file.

But the match contains the input port (tunnel port) right?  And that
should be sufficient to distinguish tunnels.  Does it mean that HW
offload in the kernel ignores that match?  Or am I reading that wrong?

> 
> I think we should keep ovs TC to keep explicit match on enc_dst_port.
> We can add a comment explaining why.
Roi Dayan July 12, 2022, 9:47 a.m. UTC | #7
On 2022-07-12 12:23 PM, Ilya Maximets wrote:
> On 7/7/22 09:01, Roi Dayan wrote:
>>
>>
>> On 2022-07-07 9:48 AM, Roi Dayan wrote:
>>>
>>>
>>> On 2022-07-06 5:00 PM, Roi Dayan wrote:
>>>>
>>>>
>>>> On 2022-07-06 3:34 PM, Roi Dayan wrote:
>>>>>
>>>>>
>>>>> On 2022-07-06 1:26 PM, Eelco Chaudron wrote:
>>>>>>
>>>>>>
>>>>>> On 5 Jul 2022, at 0:45, Ilya Maximets wrote:
>>>>>>
>>>>>>> TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK is supported by TC, and OVS may
>>>>>>> create a masked match on this field.
>>>>>>>
>>>>>>> This change is important as we're not clearing the masks which wasn't
>>>>>>> really used, so if OVS requests match on ports, we should use the
>>>>>>> mask and clear, otherwise offloading will fail.
>>>>>>>
>>>>>>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>>>>>>
>>>>>> Changes look good to me, and all system-traffic.at tests that were passing without the change are still passing, including the failing ERSPAN ones.
>>>>>>
>>>>>> Acked-by: Eelco Chaudron <echaudro@redhat.com>
>>>>>>
>>>>>
>>>>> can we postpone to merge this?
>>>>> After this patch simple vxlan rules won't have enc dst port on decap
>>>>> rules and currently mlx5 driver must have enc dst port to do the
>>>>> offload of the decap rule.
>>>>> I'm checking about this limitation and if we can remove it but
>>>>> if we can postpone a bit so we won't break users with our nics..
>>>>>
>>>>> thanks,
>>>>> Roi
>>>>
>>>>
>>>> I finished with the testing and checking I needed.
>>>> I don't have a problem with the patch. if something else will raise i'll
>>>> start a different discussion.
>>>>
>>>> Acked-by: Roi Dayan <roid@nvidia.com>
>>>
>>>
>>> I have a concern now. today for an offload driver to offload
>>> a tunnel decap rule it registers to TC indirect callback.
>>>
>>> Adding a TC rule on a tunnel device calls all registered drivers
>>> to offload the rule.
>>> With a single tunnel on a device this is ok but if a system have multiple tunnels with same src/dst ip but different enc dst port,
>>> adding a tc rule on one of the tunnel devices without matching
>>> enc dst port will call a driver to offload such a rule in the hw,
>>> but that rule is potentially matching all sw tunnels. a driver
>>> will have to match implicit on the the enc_dst_port to distinguish
>>> between the tunnels.
>>>
>>> OVS doesn't match explicit on enc_dst_port also when having multiple
>>> tunnels as the in SW the packets will be on the correct
>>> tunnel device and then pass tc or ovs datapath.
>>>
>>> After this patch offload drivers must to implicit match the
>>> enc_dst_port to avoid having hw rule matching all the tunnels
>>> with same src/dst ip. taking the tunnel maybe.
>>> e.g. for vxlan tunnel from vxlan->cfg.dst_port
>>>
>>> What do you think?
>>
>> to support what i said about implicit match for geneve and mpls for
>> example will require kernel change to actually expose their struct
>> in a header file.
> 
> But the match contains the input port (tunnel port) right?  And that
> should be sufficient to distinguish tunnels.  Does it mean that HW
> offload in the kernel ignores that match?  Or am I reading that wrong?
> 

in sw the matching of the rules is done after the packet is already
placed on the sw port. so rule on vxlan1 will patch only packets on
vxlan1.

for offloaded rule, the input port is ignored for tunnels as once
offloaded, the hw match an incoming packet and is not aware of the
sw ports (tunnels).

>>
>> I think we should keep ovs TC to keep explicit match on enc_dst_port.
>> We can add a comment explaining why.
>
Ilya Maximets July 13, 2022, 4:46 p.m. UTC | #8
On 7/12/22 11:47, Roi Dayan wrote:
> 
> 
> On 2022-07-12 12:23 PM, Ilya Maximets wrote:
>> On 7/7/22 09:01, Roi Dayan wrote:
>>>
>>>
>>> On 2022-07-07 9:48 AM, Roi Dayan wrote:
>>>>
>>>>
>>>> On 2022-07-06 5:00 PM, Roi Dayan wrote:
>>>>>
>>>>>
>>>>> On 2022-07-06 3:34 PM, Roi Dayan wrote:
>>>>>>
>>>>>>
>>>>>> On 2022-07-06 1:26 PM, Eelco Chaudron wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 5 Jul 2022, at 0:45, Ilya Maximets wrote:
>>>>>>>
>>>>>>>> TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK is supported by TC, and OVS may
>>>>>>>> create a masked match on this field.
>>>>>>>>
>>>>>>>> This change is important as we're not clearing the masks which wasn't
>>>>>>>> really used, so if OVS requests match on ports, we should use the
>>>>>>>> mask and clear, otherwise offloading will fail.
>>>>>>>>
>>>>>>>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>>>>>>>
>>>>>>> Changes look good to me, and all system-traffic.at tests that were passing without the change are still passing, including the failing ERSPAN ones.
>>>>>>>
>>>>>>> Acked-by: Eelco Chaudron <echaudro@redhat.com>
>>>>>>>
>>>>>>
>>>>>> can we postpone to merge this?
>>>>>> After this patch simple vxlan rules won't have enc dst port on decap
>>>>>> rules and currently mlx5 driver must have enc dst port to do the
>>>>>> offload of the decap rule.
>>>>>> I'm checking about this limitation and if we can remove it but
>>>>>> if we can postpone a bit so we won't break users with our nics..
>>>>>>
>>>>>> thanks,
>>>>>> Roi
>>>>>
>>>>>
>>>>> I finished with the testing and checking I needed.
>>>>> I don't have a problem with the patch. if something else will raise i'll
>>>>> start a different discussion.
>>>>>
>>>>> Acked-by: Roi Dayan <roid@nvidia.com>
>>>>
>>>>
>>>> I have a concern now. today for an offload driver to offload
>>>> a tunnel decap rule it registers to TC indirect callback.
>>>>
>>>> Adding a TC rule on a tunnel device calls all registered drivers
>>>> to offload the rule.
>>>> With a single tunnel on a device this is ok but if a system have multiple tunnels with same src/dst ip but different enc dst port,
>>>> adding a tc rule on one of the tunnel devices without matching
>>>> enc dst port will call a driver to offload such a rule in the hw,
>>>> but that rule is potentially matching all sw tunnels. a driver
>>>> will have to match implicit on the the enc_dst_port to distinguish
>>>> between the tunnels.
>>>>
>>>> OVS doesn't match explicit on enc_dst_port also when having multiple
>>>> tunnels as the in SW the packets will be on the correct
>>>> tunnel device and then pass tc or ovs datapath.
>>>>
>>>> After this patch offload drivers must to implicit match the
>>>> enc_dst_port to avoid having hw rule matching all the tunnels
>>>> with same src/dst ip. taking the tunnel maybe.
>>>> e.g. for vxlan tunnel from vxlan->cfg.dst_port
>>>>
>>>> What do you think?
>>>
>>> to support what i said about implicit match for geneve and mpls for
>>> example will require kernel change to actually expose their struct
>>> in a header file.
>>
>> But the match contains the input port (tunnel port) right?  And that
>> should be sufficient to distinguish tunnels.  Does it mean that HW
>> offload in the kernel ignores that match?  Or am I reading that wrong?
>>
> 
> in sw the matching of the rules is done after the packet is already
> placed on the sw port. so rule on vxlan1 will patch only packets on
> vxlan1.
> 
> for offloaded rule, the input port is ignored for tunnels as once
> offloaded, the hw match an incoming packet and is not aware of the
> sw ports (tunnels).

Sounds like an architectural bug in the kernel.  Should the driver
reject offload of rules with non-exact match on tunnel dst port in
this case?

OTOH, doesn't the original flow that is matching on the tunnel
packet have a match on tp_dst?  I'm not 100% sure that is always
true though, but it probably is in most cases.

> 
>>>
>>> I think we should keep ovs TC to keep explicit match on enc_dst_port.
>>> We can add a comment explaining why.
>>
Roi Dayan July 14, 2022, 12:23 p.m. UTC | #9
On 2022-07-13 7:46 PM, Ilya Maximets wrote:
> On 7/12/22 11:47, Roi Dayan wrote:
>>
>>
>> On 2022-07-12 12:23 PM, Ilya Maximets wrote:
>>> On 7/7/22 09:01, Roi Dayan wrote:
>>>>
>>>>
>>>> On 2022-07-07 9:48 AM, Roi Dayan wrote:
>>>>>
>>>>>
>>>>> On 2022-07-06 5:00 PM, Roi Dayan wrote:
>>>>>>
>>>>>>
>>>>>> On 2022-07-06 3:34 PM, Roi Dayan wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 2022-07-06 1:26 PM, Eelco Chaudron wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 5 Jul 2022, at 0:45, Ilya Maximets wrote:
>>>>>>>>
>>>>>>>>> TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK is supported by TC, and OVS may
>>>>>>>>> create a masked match on this field.
>>>>>>>>>
>>>>>>>>> This change is important as we're not clearing the masks which wasn't
>>>>>>>>> really used, so if OVS requests match on ports, we should use the
>>>>>>>>> mask and clear, otherwise offloading will fail.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>>>>>>>>
>>>>>>>> Changes look good to me, and all system-traffic.at tests that were passing without the change are still passing, including the failing ERSPAN ones.
>>>>>>>>
>>>>>>>> Acked-by: Eelco Chaudron <echaudro@redhat.com>
>>>>>>>>
>>>>>>>
>>>>>>> can we postpone to merge this?
>>>>>>> After this patch simple vxlan rules won't have enc dst port on decap
>>>>>>> rules and currently mlx5 driver must have enc dst port to do the
>>>>>>> offload of the decap rule.
>>>>>>> I'm checking about this limitation and if we can remove it but
>>>>>>> if we can postpone a bit so we won't break users with our nics..
>>>>>>>
>>>>>>> thanks,
>>>>>>> Roi
>>>>>>
>>>>>>
>>>>>> I finished with the testing and checking I needed.
>>>>>> I don't have a problem with the patch. if something else will raise i'll
>>>>>> start a different discussion.
>>>>>>
>>>>>> Acked-by: Roi Dayan <roid@nvidia.com>
>>>>>
>>>>>
>>>>> I have a concern now. today for an offload driver to offload
>>>>> a tunnel decap rule it registers to TC indirect callback.
>>>>>
>>>>> Adding a TC rule on a tunnel device calls all registered drivers
>>>>> to offload the rule.
>>>>> With a single tunnel on a device this is ok but if a system have multiple tunnels with same src/dst ip but different enc dst port,
>>>>> adding a tc rule on one of the tunnel devices without matching
>>>>> enc dst port will call a driver to offload such a rule in the hw,
>>>>> but that rule is potentially matching all sw tunnels. a driver
>>>>> will have to match implicit on the the enc_dst_port to distinguish
>>>>> between the tunnels.
>>>>>
>>>>> OVS doesn't match explicit on enc_dst_port also when having multiple
>>>>> tunnels as the in SW the packets will be on the correct
>>>>> tunnel device and then pass tc or ovs datapath.
>>>>>
>>>>> After this patch offload drivers must to implicit match the
>>>>> enc_dst_port to avoid having hw rule matching all the tunnels
>>>>> with same src/dst ip. taking the tunnel maybe.
>>>>> e.g. for vxlan tunnel from vxlan->cfg.dst_port
>>>>>
>>>>> What do you think?
>>>>
>>>> to support what i said about implicit match for geneve and mpls for
>>>> example will require kernel change to actually expose their struct
>>>> in a header file.
>>>
>>> But the match contains the input port (tunnel port) right?  And that
>>> should be sufficient to distinguish tunnels.  Does it mean that HW
>>> offload in the kernel ignores that match?  Or am I reading that wrong?
>>>
>>
>> in sw the matching of the rules is done after the packet is already
>> placed on the sw port. so rule on vxlan1 will patch only packets on
>> vxlan1.
>>
>> for offloaded rule, the input port is ignored for tunnels as once
>> offloaded, the hw match an incoming packet and is not aware of the
>> sw ports (tunnels).
> 
> Sounds like an architectural bug in the kernel.  Should the driver
> reject offload of rules with non-exact match on tunnel dst port in
> this case?

The drivers do reject such rules to avoid the issue described.

> 
> OTOH, doesn't the original flow that is matching on the tunnel
> packet have a match on tp_dst?  I'm not 100% sure that is always
> true though, but it probably is in most cases.
> 

I didn't understand the question. what original flow?
with this commit the match doesn't pass tp_dst match to tc.

>>
>>>>
>>>> I think we should keep ovs TC to keep explicit match on enc_dst_port.
>>>> We can add a comment explaining why.
>>>
>
Ilya Maximets July 14, 2022, 12:34 p.m. UTC | #10
On 7/14/22 14:23, Roi Dayan wrote:
> 
> 
> On 2022-07-13 7:46 PM, Ilya Maximets wrote:
>> On 7/12/22 11:47, Roi Dayan wrote:
>>>
>>>
>>> On 2022-07-12 12:23 PM, Ilya Maximets wrote:
>>>> On 7/7/22 09:01, Roi Dayan wrote:
>>>>>
>>>>>
>>>>> On 2022-07-07 9:48 AM, Roi Dayan wrote:
>>>>>>
>>>>>>
>>>>>> On 2022-07-06 5:00 PM, Roi Dayan wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 2022-07-06 3:34 PM, Roi Dayan wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2022-07-06 1:26 PM, Eelco Chaudron wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 5 Jul 2022, at 0:45, Ilya Maximets wrote:
>>>>>>>>>
>>>>>>>>>> TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK is supported by TC, and OVS may
>>>>>>>>>> create a masked match on this field.
>>>>>>>>>>
>>>>>>>>>> This change is important as we're not clearing the masks which wasn't
>>>>>>>>>> really used, so if OVS requests match on ports, we should use the
>>>>>>>>>> mask and clear, otherwise offloading will fail.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>>>>>>>>>
>>>>>>>>> Changes look good to me, and all system-traffic.at tests that were passing without the change are still passing, including the failing ERSPAN ones.
>>>>>>>>>
>>>>>>>>> Acked-by: Eelco Chaudron <echaudro@redhat.com>
>>>>>>>>>
>>>>>>>>
>>>>>>>> can we postpone to merge this?
>>>>>>>> After this patch simple vxlan rules won't have enc dst port on decap
>>>>>>>> rules and currently mlx5 driver must have enc dst port to do the
>>>>>>>> offload of the decap rule.
>>>>>>>> I'm checking about this limitation and if we can remove it but
>>>>>>>> if we can postpone a bit so we won't break users with our nics..
>>>>>>>>
>>>>>>>> thanks,
>>>>>>>> Roi
>>>>>>>
>>>>>>>
>>>>>>> I finished with the testing and checking I needed.
>>>>>>> I don't have a problem with the patch. if something else will raise i'll
>>>>>>> start a different discussion.
>>>>>>>
>>>>>>> Acked-by: Roi Dayan <roid@nvidia.com>
>>>>>>
>>>>>>
>>>>>> I have a concern now. today for an offload driver to offload
>>>>>> a tunnel decap rule it registers to TC indirect callback.
>>>>>>
>>>>>> Adding a TC rule on a tunnel device calls all registered drivers
>>>>>> to offload the rule.
>>>>>> With a single tunnel on a device this is ok but if a system have multiple tunnels with same src/dst ip but different enc dst port,
>>>>>> adding a tc rule on one of the tunnel devices without matching
>>>>>> enc dst port will call a driver to offload such a rule in the hw,
>>>>>> but that rule is potentially matching all sw tunnels. a driver
>>>>>> will have to match implicit on the the enc_dst_port to distinguish
>>>>>> between the tunnels.
>>>>>>
>>>>>> OVS doesn't match explicit on enc_dst_port also when having multiple
>>>>>> tunnels as the in SW the packets will be on the correct
>>>>>> tunnel device and then pass tc or ovs datapath.
>>>>>>
>>>>>> After this patch offload drivers must to implicit match the
>>>>>> enc_dst_port to avoid having hw rule matching all the tunnels
>>>>>> with same src/dst ip. taking the tunnel maybe.
>>>>>> e.g. for vxlan tunnel from vxlan->cfg.dst_port
>>>>>>
>>>>>> What do you think?
>>>>>
>>>>> to support what i said about implicit match for geneve and mpls for
>>>>> example will require kernel change to actually expose their struct
>>>>> in a header file.
>>>>
>>>> But the match contains the input port (tunnel port) right?  And that
>>>> should be sufficient to distinguish tunnels.  Does it mean that HW
>>>> offload in the kernel ignores that match?  Or am I reading that wrong?
>>>>
>>>
>>> in sw the matching of the rules is done after the packet is already
>>> placed on the sw port. so rule on vxlan1 will patch only packets on
>>> vxlan1.
>>>
>>> for offloaded rule, the input port is ignored for tunnels as once
>>> offloaded, the hw match an incoming packet and is not aware of the
>>> sw ports (tunnels).
>>
>> Sounds like an architectural bug in the kernel.  Should the driver
>> reject offload of rules with non-exact match on tunnel dst port in
>> this case?
> 
> The drivers do reject such rules to avoid the issue described.

So the behavior is correct, it just won't be offloaded.  OK.

> 
>>
>> OTOH, doesn't the original flow that is matching on the tunnel
>> packet have a match on tp_dst?  I'm not 100% sure that is always
>> true though, but it probably is in most cases.
>>
> 
> I didn't understand the question. what original flow?
> with this commit the match doesn't pass tp_dst match to tc.

I meant 'tp_dst' match in the first datapath flow that matches
on the encapsulated packet and sends it to a tunnel port for
decapsulation.  The second flow that matches on tunnel metadata
after decapsulation will have tunnel metadata tp_dst with a zero
mask, while the first flow will likely have an exact match on
the tp_dst, because it was used to decide to which tunnel port
this packet should go for decapsulation.

I just was thinking that if the driver/HW can view these two
flows as one, there should be sufficient data to not swallow
the unrelated traffic.  But I don't really know that looks like
on a driver/HW level.

> 
>>>
>>>>>
>>>>> I think we should keep ovs TC to keep explicit match on enc_dst_port.
>>>>> We can add a comment explaining why.
>>>>
>>
diff mbox series

Patch

diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
index e62687c82..3f166599c 100644
--- a/lib/netdev-offload-tc.c
+++ b/lib/netdev-offload-tc.c
@@ -1651,6 +1651,8 @@  netdev_tc_flow_put(struct netdev *netdev, struct match *match,
         flower.mask.tunnel.ipv6.ipv6_dst = tnl_mask->ipv6_dst;
         flower.mask.tunnel.tos = tnl_mask->ip_tos;
         flower.mask.tunnel.ttl = tnl_mask->ip_ttl;
+        flower.mask.tunnel.tp_src = tnl_mask->tp_src;
+        flower.mask.tunnel.tp_dst = tnl_mask->tp_dst;
         flower.mask.tunnel.id = (tnl->flags & FLOW_TNL_F_KEY) ? tnl_mask->tun_id : 0;
         memset(&tnl_mask->tun_id, 0, sizeof tnl_mask->tun_id);
         memset(&tnl_mask->ip_src, 0, sizeof tnl_mask->ip_src);
@@ -1659,6 +1661,8 @@  netdev_tc_flow_put(struct netdev *netdev, struct match *match,
         memset(&tnl_mask->ipv6_dst, 0, sizeof tnl_mask->ipv6_dst);
         memset(&tnl_mask->ip_tos, 0, sizeof tnl_mask->ip_tos);
         memset(&tnl_mask->ip_ttl, 0, sizeof tnl_mask->ip_ttl);
+        memset(&tnl_mask->tp_src, 0, sizeof tnl_mask->tp_src);
+        memset(&tnl_mask->tp_dst, 0, sizeof tnl_mask->tp_dst);
         tnl_mask->flags &= ~FLOW_TNL_F_KEY;
 
         /* XXX: This is wrong!  We're ignoring DF and CSUM flags configuration
diff --git a/lib/tc.c b/lib/tc.c
index bbb8c86f7..17df1ee77 100644
--- a/lib/tc.c
+++ b/lib/tc.c
@@ -760,9 +760,11 @@  nl_parse_flower_tunnel(struct nlattr **attrs, struct tc_flower *flower)
         flower->key.tunnel.ipv6.ipv6_dst =
             nl_attr_get_in6_addr(attrs[TCA_FLOWER_KEY_ENC_IPV6_DST]);
     }
-    if (attrs[TCA_FLOWER_KEY_ENC_UDP_DST_PORT]) {
+    if (attrs[TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK]) {
         flower->key.tunnel.tp_dst =
             nl_attr_get_be16(attrs[TCA_FLOWER_KEY_ENC_UDP_DST_PORT]);
+        flower->mask.tunnel.tp_dst =
+            nl_attr_get_be16(attrs[TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK]);
     }
     if (attrs[TCA_FLOWER_KEY_ENC_IP_TOS_MASK]) {
         flower->key.tunnel.tos =
@@ -2849,13 +2851,14 @@  nl_msg_put_flower_tunnel(struct ofpbuf *request, struct tc_flower *flower)
     struct in6_addr *ipv6_dst_mask = &flower->mask.tunnel.ipv6.ipv6_dst;
     struct in6_addr *ipv6_src = &flower->key.tunnel.ipv6.ipv6_src;
     struct in6_addr *ipv6_dst = &flower->key.tunnel.ipv6.ipv6_dst;
-    ovs_be16 tp_dst = flower->key.tunnel.tp_dst;
     ovs_be32 id = be64_to_be32(flower->key.tunnel.id);
+    ovs_be16 tp_dst = flower->key.tunnel.tp_dst;
     uint8_t tos = flower->key.tunnel.tos;
     uint8_t ttl = flower->key.tunnel.ttl;
     uint8_t tos_mask = flower->mask.tunnel.tos;
     uint8_t ttl_mask = flower->mask.tunnel.ttl;
     ovs_be64 id_mask = flower->mask.tunnel.id;
+    ovs_be16 tp_dst_mask = flower->mask.tunnel.tp_dst;
 
     if (ipv4_dst_mask || ipv4_src_mask) {
         nl_msg_put_be32(request, TCA_FLOWER_KEY_ENC_IPV4_DST_MASK,
@@ -2881,8 +2884,10 @@  nl_msg_put_flower_tunnel(struct ofpbuf *request, struct tc_flower *flower)
         nl_msg_put_u8(request, TCA_FLOWER_KEY_ENC_IP_TTL, ttl);
         nl_msg_put_u8(request, TCA_FLOWER_KEY_ENC_IP_TTL_MASK, ttl_mask);
     }
-    if (tp_dst) {
+    if (tp_dst_mask) {
         nl_msg_put_be16(request, TCA_FLOWER_KEY_ENC_UDP_DST_PORT, tp_dst);
+        nl_msg_put_be16(request, TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK,
+                        tp_dst_mask);
     }
     if (id_mask) {
         nl_msg_put_be32(request, TCA_FLOWER_KEY_ENC_KEY_ID, id);