diff mbox series

[ovs-dev,v2] Fix redundant datapath set ethernet action with NSH Decap

Message ID 20210517134548.2083-1-martinvarghesenokia@gmail.com
State Accepted
Headers show
Series [ovs-dev,v2] Fix redundant datapath set ethernet action with NSH Decap | expand

Commit Message

Martin Varghese May 17, 2021, 1:45 p.m. UTC
From: Martin Varghese <martin.varghese@nokia.com>

When a decap action is applied on NSH header encapsulatiing a
ethernet packet a redundant set mac address action is programmed
to the datapath.

Fixes: f839892a206a ("OF support and translation of generic encap and decap")
Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Acked-by: Jan Scheurich <jan.scheurich@ericsson.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
---
Changes in v2:
  - Fixed code styling
  - Added Ack from jan.scheurich@ericsson.com
  - Added Ack from echaudro@redhat.com

 lib/odp-util.c               | 3 ++-
 ofproto/ofproto-dpif-xlate.c | 2 ++
 tests/nsh.at                 | 8 ++++----
 3 files changed, 8 insertions(+), 5 deletions(-)

Comments

Ilya Maximets May 18, 2021, 8:03 p.m. UTC | #1
On 5/17/21 3:45 PM, Martin Varghese wrote:
> From: Martin Varghese <martin.varghese@nokia.com>
> 
> When a decap action is applied on NSH header encapsulatiing a
> ethernet packet a redundant set mac address action is programmed
> to the datapath.
> 
> Fixes: f839892a206a ("OF support and translation of generic encap and decap")
> Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
> Acked-by: Jan Scheurich <jan.scheurich@ericsson.com>
> Acked-by: Eelco Chaudron <echaudro@redhat.com>
> ---
> Changes in v2:
>   - Fixed code styling
>   - Added Ack from jan.scheurich@ericsson.com
>   - Added Ack from echaudro@redhat.com
> 

Hi, Martin.
For some reason this patch triggers frequent failures of the following
unit test:
  
2314. packet-type-aware.at:619: testing ptap - L3 over patch port
...
stdout:
warped
./packet-type-aware.at:726:
    ovs-appctl dpctl/dump-flows --names dummy@ovs-dummy | strip_used | grep -v ipv6 | sort

--- -   2021-05-18 21:57:56.810513366 +0200
+++ /home/i.maximets/work/git/ovs/tests/testsuite.dir/at-groups/2314/stdout     2021-05-18 21:57:56.806609814 +0200
@@ -1,3 +1,3 @@
 flow-dump from the main thread:
-recirc_id(0),in_port(n0),packet_type(ns=0,id=0),eth(src=3a:6d:d2:09:9c:ab,dst=1e:2c:e9:2a:66:9e),eth_type(0x0800),ipv4(tos=0/0x3,frag=no), packets:1, bytes:98, used:0.0s, actions:pop_eth,clone(tnl_push(tnl_port(gre_sys),header(size=38,type=3,eth(dst=de:af:be:ef:ba:be,src=aa:55:00:00:00:02,dl_type=0x0800),ipv4(src=10.0.0.1,dst=10.0.0.2,proto=47,tos=0,ttl=64,frag=0x4000),gre((flags=0x0,proto=0x800))),out_port(br2)),n2)
+recirc_id(0),in_port(n0),packet_type(ns=0,id=0),eth(src=3a:6d:d2:09:9c:ab,dst=1e:2c:e9:2a:66:9e),eth_type(0x0800),ipv4(tos=0/0x3,frag=no), packets:1, bytes:98, used:0.0s, actions:drop


It fails very frequently in GitHub Actions, but it's harder to make it fail
on my local machine.  Following change to the test allows to reproduce the
failure almost always on my local machine:

diff --git a/tests/packet-type-aware.at b/tests/packet-type-aware.at
index 540cf98f3..01dbc8030 100644
--- a/tests/packet-type-aware.at
+++ b/tests/packet-type-aware.at
@@ -721,7 +721,7 @@ AT_CHECK([
     ovs-appctl netdev-dummy/receive n0 1e2ce92a669e3a6dd2099cab0800450000548a83400040011aadc0a80a0ac0a80a1e0800b7170a4d0002fd509a5800000000de1c020000000000101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f3031323334353637
 ], [0], [ignore])
 
-ovs-appctl time/warp 1000
+ovs-appctl time/warp 1000 100
 
 AT_CHECK([
     ovs-appctl dpctl/dump-flows --names dummy@ovs-dummy | strip_used | grep -v ipv6 | sort
--

Without your patch I can not make it fail locally even with above wrapping
change applied.

Could you, please, take a look?

Best regards, Ilya Maximets.
Martin Varghese May 19, 2021, 3:26 a.m. UTC | #2
On Tue, May 18, 2021 at 10:03:39PM +0200, Ilya Maximets wrote:
> On 5/17/21 3:45 PM, Martin Varghese wrote:
> > From: Martin Varghese <martin.varghese@nokia.com>
> > 
> > When a decap action is applied on NSH header encapsulatiing a
> > ethernet packet a redundant set mac address action is programmed
> > to the datapath.
> > 
> > Fixes: f839892a206a ("OF support and translation of generic encap and decap")
> > Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
> > Acked-by: Jan Scheurich <jan.scheurich@ericsson.com>
> > Acked-by: Eelco Chaudron <echaudro@redhat.com>
> > ---
> > Changes in v2:
> >   - Fixed code styling
> >   - Added Ack from jan.scheurich@ericsson.com
> >   - Added Ack from echaudro@redhat.com
> > 
> 
> Hi, Martin.
> For some reason this patch triggers frequent failures of the following
> unit test:
>   
> 2314. packet-type-aware.at:619: testing ptap - L3 over patch port
> ...
> stdout:
> warped
> ./packet-type-aware.at:726:
>     ovs-appctl dpctl/dump-flows --names dummy@ovs-dummy | strip_used | grep -v ipv6 | sort
> 
> --- -   2021-05-18 21:57:56.810513366 +0200
> +++ /home/i.maximets/work/git/ovs/tests/testsuite.dir/at-groups/2314/stdout     2021-05-18 21:57:56.806609814 +0200
> @@ -1,3 +1,3 @@
>  flow-dump from the main thread:
> -recirc_id(0),in_port(n0),packet_type(ns=0,id=0),eth(src=3a:6d:d2:09:9c:ab,dst=1e:2c:e9:2a:66:9e),eth_type(0x0800),ipv4(tos=0/0x3,frag=no), packets:1, bytes:98, used:0.0s, actions:pop_eth,clone(tnl_push(tnl_port(gre_sys),header(size=38,type=3,eth(dst=de:af:be:ef:ba:be,src=aa:55:00:00:00:02,dl_type=0x0800),ipv4(src=10.0.0.1,dst=10.0.0.2,proto=47,tos=0,ttl=64,frag=0x4000),gre((flags=0x0,proto=0x800))),out_port(br2)),n2)
> +recirc_id(0),in_port(n0),packet_type(ns=0,id=0),eth(src=3a:6d:d2:09:9c:ab,dst=1e:2c:e9:2a:66:9e),eth_type(0x0800),ipv4(tos=0/0x3,frag=no), packets:1, bytes:98, used:0.0s, actions:drop
> 
> 
> It fails very frequently in GitHub Actions, but it's harder to make it fail
> on my local machine.  Following change to the test allows to reproduce the
> failure almost always on my local machine:
> 
> diff --git a/tests/packet-type-aware.at b/tests/packet-type-aware.at
> index 540cf98f3..01dbc8030 100644
> --- a/tests/packet-type-aware.at
> +++ b/tests/packet-type-aware.at
> @@ -721,7 +721,7 @@ AT_CHECK([
>      ovs-appctl netdev-dummy/receive n0 1e2ce92a669e3a6dd2099cab0800450000548a83400040011aadc0a80a0ac0a80a1e0800b7170a4d0002fd509a5800000000de1c020000000000101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f3031323334353637
>  ], [0], [ignore])
>  
> -ovs-appctl time/warp 1000
> +ovs-appctl time/warp 1000 100
>  
>  AT_CHECK([
>      ovs-appctl dpctl/dump-flows --names dummy@ovs-dummy | strip_used | grep -v ipv6 | sort
> --
> 
> Without your patch I can not make it fail locally even with above wrapping
> change applied.
> 
> Could you, please, take a look?
>

Hi Ilya

Travis CI was good. i will rebase & try again
https://travis-ci.org/github/martinpattara/ovs/builds/770919003

Regards,
Martin
Ilya Maximets May 19, 2021, 10:26 a.m. UTC | #3
On 5/19/21 5:26 AM, Martin Varghese wrote:
> On Tue, May 18, 2021 at 10:03:39PM +0200, Ilya Maximets wrote:
>> On 5/17/21 3:45 PM, Martin Varghese wrote:
>>> From: Martin Varghese <martin.varghese@nokia.com>
>>>
>>> When a decap action is applied on NSH header encapsulatiing a
>>> ethernet packet a redundant set mac address action is programmed
>>> to the datapath.
>>>
>>> Fixes: f839892a206a ("OF support and translation of generic encap and decap")
>>> Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
>>> Acked-by: Jan Scheurich <jan.scheurich@ericsson.com>
>>> Acked-by: Eelco Chaudron <echaudro@redhat.com>
>>> ---
>>> Changes in v2:
>>>   - Fixed code styling
>>>   - Added Ack from jan.scheurich@ericsson.com
>>>   - Added Ack from echaudro@redhat.com
>>>
>>
>> Hi, Martin.
>> For some reason this patch triggers frequent failures of the following
>> unit test:
>>   
>> 2314. packet-type-aware.at:619: testing ptap - L3 over patch port
>> ...
>> stdout:
>> warped
>> ./packet-type-aware.at:726:
>>     ovs-appctl dpctl/dump-flows --names dummy@ovs-dummy | strip_used | grep -v ipv6 | sort
>>
>> --- -   2021-05-18 21:57:56.810513366 +0200
>> +++ /home/i.maximets/work/git/ovs/tests/testsuite.dir/at-groups/2314/stdout     2021-05-18 21:57:56.806609814 +0200
>> @@ -1,3 +1,3 @@
>>  flow-dump from the main thread:
>> -recirc_id(0),in_port(n0),packet_type(ns=0,id=0),eth(src=3a:6d:d2:09:9c:ab,dst=1e:2c:e9:2a:66:9e),eth_type(0x0800),ipv4(tos=0/0x3,frag=no), packets:1, bytes:98, used:0.0s, actions:pop_eth,clone(tnl_push(tnl_port(gre_sys),header(size=38,type=3,eth(dst=de:af:be:ef:ba:be,src=aa:55:00:00:00:02,dl_type=0x0800),ipv4(src=10.0.0.1,dst=10.0.0.2,proto=47,tos=0,ttl=64,frag=0x4000),gre((flags=0x0,proto=0x800))),out_port(br2)),n2)
>> +recirc_id(0),in_port(n0),packet_type(ns=0,id=0),eth(src=3a:6d:d2:09:9c:ab,dst=1e:2c:e9:2a:66:9e),eth_type(0x0800),ipv4(tos=0/0x3,frag=no), packets:1, bytes:98, used:0.0s, actions:drop
>>
>>
>> It fails very frequently in GitHub Actions, but it's harder to make it fail
>> on my local machine.  Following change to the test allows to reproduce the
>> failure almost always on my local machine:
>>
>> diff --git a/tests/packet-type-aware.at b/tests/packet-type-aware.at
>> index 540cf98f3..01dbc8030 100644
>> --- a/tests/packet-type-aware.at
>> +++ b/tests/packet-type-aware.at
>> @@ -721,7 +721,7 @@ AT_CHECK([
>>      ovs-appctl netdev-dummy/receive n0 1e2ce92a669e3a6dd2099cab0800450000548a83400040011aadc0a80a0ac0a80a1e0800b7170a4d0002fd509a5800000000de1c020000000000101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f3031323334353637
>>  ], [0], [ignore])
>>  
>> -ovs-appctl time/warp 1000
>> +ovs-appctl time/warp 1000 100
>>  
>>  AT_CHECK([
>>      ovs-appctl dpctl/dump-flows --names dummy@ovs-dummy | strip_used | grep -v ipv6 | sort
>> --
>>
>> Without your patch I can not make it fail locally even with above wrapping
>> change applied.
>>
>> Could you, please, take a look?
>>
> 
> Hi Ilya
> 
> Travis CI was good. i will rebase & try again
> https://travis-ci.org/github/martinpattara/ovs/builds/770919003

Travis has only one job with tests enabled and it tests on arm.
GitHub Actions (which is our main CI now) wasn't good:
  https://github.com/martinpattara/ovs/runs/2567454405

Best regards, Ilya Maximets.
Martin Varghese June 7, 2021, 2:47 p.m. UTC | #4
On Wed, May 19, 2021 at 12:26:40PM +0200, Ilya Maximets wrote:
> On 5/19/21 5:26 AM, Martin Varghese wrote:
> > On Tue, May 18, 2021 at 10:03:39PM +0200, Ilya Maximets wrote:
> >> On 5/17/21 3:45 PM, Martin Varghese wrote:
> >>> From: Martin Varghese <martin.varghese@nokia.com>
> >>>
> >>> When a decap action is applied on NSH header encapsulatiing a
> >>> ethernet packet a redundant set mac address action is programmed
> >>> to the datapath.
> >>>
> >>> Fixes: f839892a206a ("OF support and translation of generic encap and decap")
> >>> Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
> >>> Acked-by: Jan Scheurich <jan.scheurich@ericsson.com>
> >>> Acked-by: Eelco Chaudron <echaudro@redhat.com>
> >>> ---
> >>> Changes in v2:
> >>>   - Fixed code styling
> >>>   - Added Ack from jan.scheurich@ericsson.com
> >>>   - Added Ack from echaudro@redhat.com
> >>>
> >>
> >> Hi, Martin.
> >> For some reason this patch triggers frequent failures of the following
> >> unit test:
> >>   
> >> 2314. packet-type-aware.at:619: testing ptap - L3 over patch port
> >> ...

The test is failing as, during revalidation, NORMAL action is dropping packets.
With these changes, the mac address in flow structures get cleared with decap
action. Hence the NORMAL action drops the packet assuming a loop (SRC and DST mac address are zero). I assume NORMAL action handling in xlate_push_stats_entry is not adapted for l3 packet. The timing at which revalidator gets triggered explains the sporadicity of the issue. The issue is never seen as the MAC addresses in flow structure were not cleared with decap before.

So can we use NORMAL action with a L3 packet ?  Does OVS handle all the L3
use cases with Normal action ? If not, shouldn't we not use NORMAL action in this test case
 
Comments? 


> >> stdout:
> >> warped
> >> ./packet-type-aware.at:726:
> >>     ovs-appctl dpctl/dump-flows --names dummy@ovs-dummy | strip_used | grep -v ipv6 | sort
> >>
> >> --- -   2021-05-18 21:57:56.810513366 +0200
> >> +++ /home/i.maximets/work/git/ovs/tests/testsuite.dir/at-groups/2314/stdout     2021-05-18 21:57:56.806609814 +0200
> >> @@ -1,3 +1,3 @@
> >>  flow-dump from the main thread:
> >> -recirc_id(0),in_port(n0),packet_type(ns=0,id=0),eth(src=3a:6d:d2:09:9c:ab,dst=1e:2c:e9:2a:66:9e),eth_type(0x0800),ipv4(tos=0/0x3,frag=no), packets:1, bytes:98, used:0.0s, actions:pop_eth,clone(tnl_push(tnl_port(gre_sys),header(size=38,type=3,eth(dst=de:af:be:ef:ba:be,src=aa:55:00:00:00:02,dl_type=0x0800),ipv4(src=10.0.0.1,dst=10.0.0.2,proto=47,tos=0,ttl=64,frag=0x4000),gre((flags=0x0,proto=0x800))),out_port(br2)),n2)
> >> +recirc_id(0),in_port(n0),packet_type(ns=0,id=0),eth(src=3a:6d:d2:09:9c:ab,dst=1e:2c:e9:2a:66:9e),eth_type(0x0800),ipv4(tos=0/0x3,frag=no), packets:1, bytes:98, used:0.0s, actions:drop
> >>
> >>
> >> It fails very frequently in GitHub Actions, but it's harder to make it fail
> >> on my local machine.  Following change to the test allows to reproduce the
> >> failure almost always on my local machine:
> >>
> >> diff --git a/tests/packet-type-aware.at b/tests/packet-type-aware.at
> >> index 540cf98f3..01dbc8030 100644
> >> --- a/tests/packet-type-aware.at
> >> +++ b/tests/packet-type-aware.at
> >> @@ -721,7 +721,7 @@ AT_CHECK([
> >>      ovs-appctl netdev-dummy/receive n0 1e2ce92a669e3a6dd2099cab0800450000548a83400040011aadc0a80a0ac0a80a1e0800b7170a4d0002fd509a5800000000de1c020000000000101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f3031323334353637
> >>  ], [0], [ignore])
> >>  
> >> -ovs-appctl time/warp 1000
> >> +ovs-appctl time/warp 1000 100
> >>  
> >>  AT_CHECK([
> >>      ovs-appctl dpctl/dump-flows --names dummy@ovs-dummy | strip_used | grep -v ipv6 | sort
> >> --
> >>
> >> Without your patch I can not make it fail locally even with above wrapping
> >> change applied.
> >>
> >> Could you, please, take a look?
> >>
> > 
> > Hi Ilya
> > 
> > Travis CI was good. i will rebase & try again
> > https://travis-ci.org/github/martinpattara/ovs/builds/770919003
> 
> Travis has only one job with tests enabled and it tests on arm.
> GitHub Actions (which is our main CI now) wasn't good:
>   https://github.com/martinpattara/ovs/runs/2567454405
> 
> Best regards, Ilya Maximets.
Jan Scheurich June 7, 2021, 4:12 p.m. UTC | #5
> -----Original Message-----
> From: Martin Varghese <martinvarghesenokia@gmail.com>
> Sent: Monday, 7 June, 2021 16:47
> To: Ilya Maximets <i.maximets@ovn.org>
> Cc: dev@openvswitch.org; echaudro@redhat.com; Jan Scheurich
> <jan.scheurich@ericsson.com>; Martin Varghese
> <martin.varghese@nokia.com>
> Subject: Re: [ovs-dev] [PATCH v2] Fix redundant datapath set ethernet action
> with NSH Decap
> 
> On Wed, May 19, 2021 at 12:26:40PM +0200, Ilya Maximets wrote:
> > On 5/19/21 5:26 AM, Martin Varghese wrote:
> > > On Tue, May 18, 2021 at 10:03:39PM +0200, Ilya Maximets wrote:
> > >> On 5/17/21 3:45 PM, Martin Varghese wrote:
> > >>> From: Martin Varghese <martin.varghese@nokia.com>
> > >>>
> > >>> When a decap action is applied on NSH header encapsulatiing a
> > >>> ethernet packet a redundant set mac address action is programmed
> > >>> to the datapath.
> > >>>
> > >>> Fixes: f839892a206a ("OF support and translation of generic encap
> > >>> and decap")
> > >>> Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
> > >>> Acked-by: Jan Scheurich <jan.scheurich@ericsson.com>
> > >>> Acked-by: Eelco Chaudron <echaudro@redhat.com>
> > >>> ---
> > >>> Changes in v2:
> > >>>   - Fixed code styling
> > >>>   - Added Ack from jan.scheurich@ericsson.com
> > >>>   - Added Ack from echaudro@redhat.com
> > >>>
> > >>
> > >> Hi, Martin.
> > >> For some reason this patch triggers frequent failures of the
> > >> following unit test:
> > >>
> > >> 2314. packet-type-aware.at:619: testing ptap - L3 over patch port
> > >> ...
> 
> The test is failing as, during revalidation, NORMAL action is dropping packets.
> With these changes, the mac address in flow structures get cleared with decap
> action. Hence the NORMAL action drops the packet assuming a loop (SRC and
> DST mac address are zero). I assume NORMAL action handling in
> xlate_push_stats_entry is not adapted for l3 packet. The timing at which
> revalidator gets triggered explains the sporadicity of the issue. The issue is
> never seen as the MAC addresses in flow structure were not cleared with decap
> before.
> 
> So can we use NORMAL action with a L3 packet ?  Does OVS handle all the L3
> use cases with Normal action ? If not, shouldn't we not use NORMAL action in
> this test case
> 
> Comments?
> 

Good catch! Normal flow L2 bridging is of course nonsense for the use case of forwarding an L3 packet. I am surprised that the packet was forwarded at all in the first place. That in itself can be considered a bug. Correctly, a Normal flow should drop non-Ethernet packets, I would say.

To fix the test case I suggest to replace the Normal action in br1 with "output:gre1" in line 700.

> 
> > >> stdout:
> > >> warped
> > >> ./packet-type-aware.at:726:
> > >>     ovs-appctl dpctl/dump-flows --names dummy@ovs-dummy |
> > >> strip_used | grep -v ipv6 | sort
> > >>
> > >> --- -   2021-05-18 21:57:56.810513366 +0200
> > >> +++ /home/i.maximets/work/git/ovs/tests/testsuite.dir/at-
> groups/2314/stdout     2021-05-18 21:57:56.806609814 +0200
> > >> @@ -1,3 +1,3 @@
> > >>  flow-dump from the main thread:
> > >> -recirc_id(0),in_port(n0),packet_type(ns=0,id=0),eth(src=3a:6d:d2:0
> > >> 9:9c:ab,dst=1e:2c:e9:2a:66:9e),eth_type(0x0800),ipv4(tos=0/0x3,frag
> > >> =no), packets:1, bytes:98, used:0.0s,
> > >> actions:pop_eth,clone(tnl_push(tnl_port(gre_sys),header(size=38,typ
> > >> e=3,eth(dst=de:af:be:ef:ba:be,src=aa:55:00:00:00:02,dl_type=0x0800)
> > >> ,ipv4(src=10.0.0.1,dst=10.0.0.2,proto=47,tos=0,ttl=64,frag=0x4000),
> > >> gre((flags=0x0,proto=0x800))),out_port(br2)),n2)
> > >> +recirc_id(0),in_port(n0),packet_type(ns=0,id=0),eth(src=3a:6d:d2:0
> > >> +9:9c:ab,dst=1e:2c:e9:2a:66:9e),eth_type(0x0800),ipv4(tos=0/0x3,fra
> > >> +g=no), packets:1, bytes:98, used:0.0s, actions:drop
> > >>
> > >>
> > >> It fails very frequently in GitHub Actions, but it's harder to make
> > >> it fail on my local machine.  Following change to the test allows
> > >> to reproduce the failure almost always on my local machine:
> > >>
> > >> diff --git a/tests/packet-type-aware.at
> > >> b/tests/packet-type-aware.at index 540cf98f3..01dbc8030 100644
> > >> --- a/tests/packet-type-aware.at
> > >> +++ b/tests/packet-type-aware.at
> > >> @@ -721,7 +721,7 @@ AT_CHECK([
> > >>      ovs-appctl netdev-dummy/receive n0
> > >>
> 1e2ce92a669e3a6dd2099cab0800450000548a83400040011aadc0a80a0ac0a80
> a1
> > >>
> e0800b7170a4d0002fd509a5800000000de1c0200000000001011121314151617
> 18
> > >>
> 191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f3031323334353637
> > >>  ], [0], [ignore])
> > >>
> > >> -ovs-appctl time/warp 1000
> > >> +ovs-appctl time/warp 1000 100
> > >>
> > >>  AT_CHECK([
> > >>      ovs-appctl dpctl/dump-flows --names dummy@ovs-dummy |
> > >> strip_used | grep -v ipv6 | sort
> > >> --
> > >>
> > >> Without your patch I can not make it fail locally even with above
> > >> wrapping change applied.
> > >>
> > >> Could you, please, take a look?
> > >>
> > >
> > > Hi Ilya
> > >
> > > Travis CI was good. i will rebase & try again
> > > https://protect2.fireeye.com/v1/url?k=8ec813e4-d1532ae4-8ec8537f-86f
> > > c6812c361-3273f254661f9c75&q=1&e=c83c3bb8-d3c9-439a-8031-
> 72634d0fecc
> > > 2&u=https%3A%2F%2Ftravis-
> ci.org%2Fgithub%2Fmartinpattara%2Fovs%2Fbui
> > > lds%2F770919003
> >
> > Travis has only one job with tests enabled and it tests on arm.
> > GitHub Actions (which is our main CI now) wasn't good:
> >
> > https://protect2.fireeye.com/v1/url?k=c1443ad3-9edf03d3-c1447a48-86fc6
> > 812c361-35a33eb8a7c0b489&q=1&e=c83c3bb8-d3c9-439a-8031-
> 72634d0fecc2&u=
> >
> https%3A%2F%2Fgithub.com%2Fmartinpattara%2Fovs%2Fruns%2F2567454405
> >
> > Best regards, Ilya Maximets.
Ilya Maximets June 16, 2021, 12:13 p.m. UTC | #6
On 6/7/21 6:12 PM, Jan Scheurich wrote:
>> -----Original Message-----
>> From: Martin Varghese <martinvarghesenokia@gmail.com>
>> Sent: Monday, 7 June, 2021 16:47
>> To: Ilya Maximets <i.maximets@ovn.org>
>> Cc: dev@openvswitch.org; echaudro@redhat.com; Jan Scheurich
>> <jan.scheurich@ericsson.com>; Martin Varghese
>> <martin.varghese@nokia.com>
>> Subject: Re: [ovs-dev] [PATCH v2] Fix redundant datapath set ethernet action
>> with NSH Decap
>>
>> On Wed, May 19, 2021 at 12:26:40PM +0200, Ilya Maximets wrote:
>>> On 5/19/21 5:26 AM, Martin Varghese wrote:
>>>> On Tue, May 18, 2021 at 10:03:39PM +0200, Ilya Maximets wrote:
>>>>> On 5/17/21 3:45 PM, Martin Varghese wrote:
>>>>>> From: Martin Varghese <martin.varghese@nokia.com>
>>>>>>
>>>>>> When a decap action is applied on NSH header encapsulatiing a
>>>>>> ethernet packet a redundant set mac address action is programmed
>>>>>> to the datapath.
>>>>>>
>>>>>> Fixes: f839892a206a ("OF support and translation of generic encap
>>>>>> and decap")
>>>>>> Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
>>>>>> Acked-by: Jan Scheurich <jan.scheurich@ericsson.com>
>>>>>> Acked-by: Eelco Chaudron <echaudro@redhat.com>
>>>>>> ---
>>>>>> Changes in v2:
>>>>>>   - Fixed code styling
>>>>>>   - Added Ack from jan.scheurich@ericsson.com
>>>>>>   - Added Ack from echaudro@redhat.com
>>>>>>
>>>>>
>>>>> Hi, Martin.
>>>>> For some reason this patch triggers frequent failures of the
>>>>> following unit test:
>>>>>
>>>>> 2314. packet-type-aware.at:619: testing ptap - L3 over patch port
>>>>> ...
>>
>> The test is failing as, during revalidation, NORMAL action is dropping packets.
>> With these changes, the mac address in flow structures get cleared with decap
>> action. Hence the NORMAL action drops the packet assuming a loop (SRC and
>> DST mac address are zero). I assume NORMAL action handling in
>> xlate_push_stats_entry is not adapted for l3 packet. The timing at which
>> revalidator gets triggered explains the sporadicity of the issue. The issue is
>> never seen as the MAC addresses in flow structure were not cleared with decap
>> before.
>>
>> So can we use NORMAL action with a L3 packet ?  Does OVS handle all the L3
>> use cases with Normal action ? If not, shouldn't we not use NORMAL action in
>> this test case
>>
>> Comments?
>>
> 
> Good catch! Normal flow L2 bridging is of course nonsense for the use case of forwarding an L3 packet. I am surprised that the packet was forwarded at all in the first place. That in itself can be considered a bug. Correctly, a Normal flow should drop non-Ethernet packets, I would say.
> 
> To fix the test case I suggest to replace the Normal action in br1 with "output:gre1" in line 700.

OK.  With the test case fix that I applied earlier today, this patch
works fine for me.  So, applied to master.

Best regards, Ilya Maximets.
diff mbox series

Patch

diff --git a/lib/odp-util.c b/lib/odp-util.c
index e1199d1da..e2ac241d7 100644
--- a/lib/odp-util.c
+++ b/lib/odp-util.c
@@ -7830,7 +7830,8 @@  commit_set_ether_action(const struct flow *flow, struct flow *base_flow,
     struct offsetof_sizeof ovs_key_ethernet_offsetof_sizeof_arr[] =
         OVS_KEY_ETHERNET_OFFSETOF_SIZEOF_ARR;
 
-    if (flow->packet_type != htonl(PT_ETH)) {
+    if (flow->packet_type != htonl(PT_ETH) ||
+        base_flow->packet_type != htonl(PT_ETH)) {
         return;
     }
 
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 7108c8a30..a6f4ea334 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -6549,6 +6549,8 @@  xlate_generic_decap_action(struct xlate_ctx *ctx,
                  * Delay generating pop_eth to the next commit. */
                 flow->packet_type = htonl(PACKET_TYPE(OFPHTN_ETHERTYPE,
                                                       ntohs(flow->dl_type)));
+                flow->dl_src = eth_addr_zero;
+                flow->dl_dst = eth_addr_zero;
                 ctx->wc->masks.dl_type = OVS_BE16_MAX;
             }
             return false;
diff --git a/tests/nsh.at b/tests/nsh.at
index d5c772ff0..e84134e42 100644
--- a/tests/nsh.at
+++ b/tests/nsh.at
@@ -105,7 +105,7 @@  bridge("br0")
 
 Final flow: in_port=1,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=11:22:33:44:55:66,dl_type=0x894f,nsh_flags=0,nsh_ttl=63,nsh_mdtype=1,nsh_np=3,nsh_spi=0x1234,nsh_si=255,nsh_c1=0x11223344,nsh_c2=0x0,nsh_c3=0x0,nsh_c4=0x0,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0
 Megaflow: recirc_id=0,eth,ip,in_port=1,dl_dst=66:77:88:99:aa:bb,nw_frag=no
-Datapath actions: push_nsh(flags=0,ttl=63,mdtype=1,np=3,spi=0x1234,si=255,c1=0x11223344,c2=0x0,c3=0x0,c4=0x0),push_eth(src=00:00:00:00:00:00,dst=11:22:33:44:55:66),pop_eth,pop_nsh(),set(eth(dst=11:22:33:44:55:66)),recirc(0x1)
+Datapath actions: push_nsh(flags=0,ttl=63,mdtype=1,np=3,spi=0x1234,si=255,c1=0x11223344,c2=0x0,c3=0x0,c4=0x0),push_eth(src=00:00:00:00:00:00,dst=11:22:33:44:55:66),pop_eth,pop_nsh(),recirc(0x1)
 ])
 
 AT_CHECK([
@@ -139,7 +139,7 @@  ovs-appctl time/warp 1000
 AT_CHECK([
     ovs-appctl dpctl/dump-flows dummy@ovs-dummy | strip_used | grep -v ipv6 | sort
 ], [0], [flow-dump from the main thread:
-recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(dst=1e:2c:e9:2a:66:9e),eth_type(0x0800),ipv4(frag=no), packets:1, bytes:98, used:0.0s, actions:push_nsh(flags=0,ttl=63,mdtype=1,np=3,spi=0x1234,si=255,c1=0x11223344,c2=0x0,c3=0x0,c4=0x0),push_eth(src=00:00:00:00:00:00,dst=11:22:33:44:55:66),pop_eth,pop_nsh(),set(eth(dst=11:22:33:44:55:66)),recirc(0x3)
+recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(dst=1e:2c:e9:2a:66:9e),eth_type(0x0800),ipv4(frag=no), packets:1, bytes:98, used:0.0s, actions:push_nsh(flags=0,ttl=63,mdtype=1,np=3,spi=0x1234,si=255,c1=0x11223344,c2=0x0,c3=0x0,c4=0x0),push_eth(src=00:00:00:00:00:00,dst=11:22:33:44:55:66),pop_eth,pop_nsh(),recirc(0x3)
 recirc_id(0x3),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:1, bytes:98, used:0.0s, actions:2
 ])
 
@@ -232,7 +232,7 @@  bridge("br0")
 
 Final flow: in_port=1,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=11:22:33:44:55:66,dl_type=0x894f,nsh_flags=0,nsh_ttl=63,nsh_mdtype=2,nsh_np=3,nsh_spi=0x1234,nsh_si=255,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0
 Megaflow: recirc_id=0,eth,ip,in_port=1,dl_dst=66:77:88:99:aa:bb,nw_frag=no
-Datapath actions: push_nsh(flags=0,ttl=63,mdtype=2,np=3,spi=0x1234,si=255,md2=0x10000a041234567820001408fedcba9876543210),push_eth(src=00:00:00:00:00:00,dst=11:22:33:44:55:66),pop_eth,pop_nsh(),set(eth(dst=11:22:33:44:55:66)),recirc(0x1)
+Datapath actions: push_nsh(flags=0,ttl=63,mdtype=2,np=3,spi=0x1234,si=255,md2=0x10000a041234567820001408fedcba9876543210),push_eth(src=00:00:00:00:00:00,dst=11:22:33:44:55:66),pop_eth,pop_nsh(),recirc(0x1)
 ])
 
 AT_CHECK([
@@ -266,7 +266,7 @@  ovs-appctl time/warp 1000
 AT_CHECK([
     ovs-appctl dpctl/dump-flows dummy@ovs-dummy | strip_used | grep -v ipv6 | sort
 ], [0], [flow-dump from the main thread:
-recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(dst=1e:2c:e9:2a:66:9e),eth_type(0x0800),ipv4(frag=no), packets:1, bytes:98, used:0.0s, actions:push_nsh(flags=0,ttl=63,mdtype=2,np=3,spi=0x1234,si=255,md2=0x10000a041234567820001408fedcba9876543210),push_eth(src=00:00:00:00:00:00,dst=11:22:33:44:55:66),pop_eth,pop_nsh(),set(eth(dst=11:22:33:44:55:66)),recirc(0x3)
+recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(dst=1e:2c:e9:2a:66:9e),eth_type(0x0800),ipv4(frag=no), packets:1, bytes:98, used:0.0s, actions:push_nsh(flags=0,ttl=63,mdtype=2,np=3,spi=0x1234,si=255,md2=0x10000a041234567820001408fedcba9876543210),push_eth(src=00:00:00:00:00:00,dst=11:22:33:44:55:66),pop_eth,pop_nsh(),recirc(0x3)
 recirc_id(0x3),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:1, bytes:98, used:0.0s, actions:2
 ])