diff mbox series

[ovs-dev] dpif-netdev: Fix crash when PACKET_OUT is metered

Message ID 20210616000424.29894-1-tony.vanderpeet@alliedtelesis.co.nz
State Changes Requested
Headers show
Series [ovs-dev] dpif-netdev: Fix crash when PACKET_OUT is metered | expand

Commit Message

Tony van der Peet June 16, 2021, 12:04 a.m. UTC
From: Tony van der Peet <tony.vanderpeet@gmail.com>

When a PACKET_OUT has output port of OFPP_TABLE, and the rule
table includes a meter and this causes the packet to be deleted,
stop the packet from being deleted twice by cloning it and setting
it up to be stolen in execution.

Add a test to verify this condition.

Signed-off-by: Tony van der Peet <tony.vanderpeet@gmail.com>
---
 lib/dpif-netdev.c     |  9 ++++++---
 tests/ofproto-dpif.at | 20 ++++++++++++++++++++
 2 files changed, 26 insertions(+), 3 deletions(-)

Comments

Ilya Maximets June 16, 2021, 8:05 p.m. UTC | #1
On 6/16/21 2:04 AM, Tony van der Peet wrote:
> From: Tony van der Peet <tony.vanderpeet@gmail.com>
> 
> When a PACKET_OUT has output port of OFPP_TABLE, and the rule
> table includes a meter and this causes the packet to be deleted,
> stop the packet from being deleted twice by cloning it and setting
> it up to be stolen in execution.
> 
> Add a test to verify this condition.
> 
> Signed-off-by: Tony van der Peet <tony.vanderpeet@gmail.com>

Thanks for the patch!  OVS seems to work fine with this change,
but for some reason several OVN unit tests are failing if it
uses OVS with this change applied.

Trying to figure out why...

Best regards, Ilya Maximets.
Tony van der Peet June 16, 2021, 9:54 p.m. UTC | #2
Thanks Ilya. For what it's worth, besides running the OVS unit tests, I put
this new code through our (enhanced) version of oftest (500 test cases)
including a couple I wrote just for this situation.

Tony

On Thu, Jun 17, 2021 at 8:05 AM Ilya Maximets <i.maximets@ovn.org> wrote:

> On 6/16/21 2:04 AM, Tony van der Peet wrote:
> > From: Tony van der Peet <tony.vanderpeet@gmail.com>
> >
> > When a PACKET_OUT has output port of OFPP_TABLE, and the rule
> > table includes a meter and this causes the packet to be deleted,
> > stop the packet from being deleted twice by cloning it and setting
> > it up to be stolen in execution.
> >
> > Add a test to verify this condition.
> >
> > Signed-off-by: Tony van der Peet <tony.vanderpeet@gmail.com>
>
> Thanks for the patch!  OVS seems to work fine with this change,
> but for some reason several OVN unit tests are failing if it
> uses OVS with this change applied.
>
> Trying to figure out why...
>
> Best regards, Ilya Maximets.
>
Ilya Maximets June 17, 2021, 5:47 p.m. UTC | #3
On 6/16/21 11:54 PM, Tony van der Peet wrote:
> Thanks Ilya. For what it's worth, besides running the OVS unit tests, I put this new code through our (enhanced) version of oftest (500 test cases) including a couple I wrote just for this situation.
> 
> Tony
> 
> On Thu, Jun 17, 2021 at 8:05 AM Ilya Maximets <i.maximets@ovn.org <mailto:i.maximets@ovn.org>> wrote:
> 
>     On 6/16/21 2:04 AM, Tony van der Peet wrote:
>     > From: Tony van der Peet <tony.vanderpeet@gmail.com <mailto:tony.vanderpeet@gmail.com>>
>     >
>     > When a PACKET_OUT has output port of OFPP_TABLE, and the rule
>     > table includes a meter and this causes the packet to be deleted,
>     > stop the packet from being deleted twice by cloning it and setting
>     > it up to be stolen in execution.
>     >
>     > Add a test to verify this condition.
>     >
>     > Signed-off-by: Tony van der Peet <tony.vanderpeet@gmail.com <mailto:tony.vanderpeet@gmail.com>>
> 
>     Thanks for the patch!  OVS seems to work fine with this change,
>     but for some reason several OVN unit tests are failing if it
>     uses OVS with this change applied.
> 
>     Trying to figure out why...
> 
>     Best regards, Ilya Maximets.
> 

OK, I've spent most of a day trying to figure out what is going wrong there
(mostly because OVN tests are insanely complex and very hard to debug).

In short: dpif_execute() expected to modify the packet by higher layers and
by cloning the packet inside dp_netdev_execute_actions() all the
modifications applied to the copy and not propagated to the original packet.

The call stack looks something like this:

1. handle_packet_out()
2. --> ofproto_packet_out_finish()
3.     --> packet_execute()
4.         --> dpif_execute()
5.             --> dpif_operate()
6.                 --> dpif_execute_with_help()
7.                     --> odp_execute_actions()
8.                         ** For each action on a list **:
9.                         --> dpif_execute_helper_cb()
10.                            --> dpif_execute()
11.                                --> dpif_operate()
12.                                    --> dp_netdev_execute()

The problem is on a line 8.  odp_execute_actions() executes actions one by
one expecting the datapath to modify the packet.  In case of OVN unit tests
PACKET_OUT resulted in a flow with 2 actions: tunnel push + output.
So, the first dp_netdev_execute() is called to push the tunnel header to the
packet and the second time it's called to execute OUTPUT action and send the
packet to the destination.  And since we're cloning the packet, tunnel header
was lost and bare packet was sent out.  This caused failure of OVN unit tests.

It's understandable why odp_execute_actions() executes actions one by one.
Some actions requires datapath assistance and others could be executed in
userspace, some actions could be executed in userspace only.  So, we have
to use it this way.  But that means that we need to propagate information
about stolen packet at least to the level of odp_execute_actions().  I tried
to implement that but, it doesn't look good...

So, technically, what I'm suggesting is to copy the content of the packet
back after the execution.  Still kind of ugly, but, at least localized.
We also need to report an error condition if the copy was stolen, so upper
layers will be aware that actions was not successful.
Something like this:

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 08d93c277..860a6c3e7 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -199,6 +199,7 @@ struct dp_packet *dp_packet_clone_data_with_headroom(const void *, size_t,
 void dp_packet_resize(struct dp_packet *b, size_t new_headroom,
                       size_t new_tailroom);
 static inline void dp_packet_delete(struct dp_packet *);
+static inline void dp_packet_swap(struct dp_packet *, struct dp_packet *);
 
 static inline void *dp_packet_at(const struct dp_packet *, size_t offset,
                                  size_t size);
@@ -256,6 +257,16 @@ dp_packet_delete(struct dp_packet *b)
     }
 }
 
+/* Swaps content of two packets. */
+static inline void
+dp_packet_swap(struct dp_packet *a, struct dp_packet *b)
+{
+    struct dp_packet c = *a;
+
+    *a = *b;
+    *b = c;
+}
+
 /* If 'b' contains at least 'offset + size' bytes of data, returns a pointer to
  * byte 'offset'.  Otherwise, returns a null pointer. */
 static inline void *
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 8fa7eb6d4..a660a2fd6 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -4168,7 +4168,11 @@ dpif_netdev_execute(struct dpif *dpif, struct dpif_execute *execute)
                                flow_hash_5tuple(execute->flow, 0));
     }
 
-    dp_packet_batch_init_packet(&pp, execute->packet);
+    /* Making a copy because the packet might be stolen during the execution
+     * and caller might still need it.  */
+    struct dp_packet *packet_clone = dp_packet_clone(execute->packet);
+    dp_packet_batch_init_packet(&pp, packet_clone);
+
     dp_netdev_execute_actions(pmd, &pp, false, execute->flow,
                               execute->actions, execute->actions_len);
     dp_netdev_pmd_flush_output_packets(pmd, true);
@@ -4178,6 +4182,20 @@ dpif_netdev_execute(struct dpif *dpif, struct dpif_execute *execute)
         dp_netdev_pmd_unref(pmd);
     }
 
+    if (dp_packet_batch_size(&pp)) {
+        /* Packet wasn't dropped during the execution.  Swapping content with
+         * the original packet, because the caller might expect actions to
+         * modify it. */
+        dp_packet_swap(execute->packet, packet_clone);
+        dp_packet_delete_batch(&pp, true);
+    } else {
+        /* Packet was stolen during the execution due to some error.  We need
+         * to flag that for the caller, so it will not proceed with other
+         * actions on this packet.  Returning EAGAIN because we just don't
+         * know what execlty happened. */
+        return EAGAIN;
+    }
+
     return 0;
 }
 
---

For the test, following addition will be needed, because the error now will
be printed:

diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index 522cc1318..1b55aa0c5 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -9537,7 +9537,9 @@ meter=1 pktps bands=
 type=drop rate=1
 ])
 
-OVS_VSWITCHD_STOP
+OVS_WAIT_UNTIL([grep 'execute meter(0),2 failed' ovs-vswitchd.log])
+
+OVS_VSWITCHD_STOP(["/execute meter(0),2 failed/d"])
 AT_CLEANUP
 
 AT_SETUP([ofproto-dpif - ICMPv6])
---

With above changes both OVS and OVN testsuites works fine.

Any thoughts?

Best regards, Ilya Maximets.
Ilya Maximets June 17, 2021, 6:26 p.m. UTC | #4
On 6/17/21 7:47 PM, Ilya Maximets wrote:
> On 6/16/21 11:54 PM, Tony van der Peet wrote:
>> Thanks Ilya. For what it's worth, besides running the OVS unit tests, I put this new code through our (enhanced) version of oftest (500 test cases) including a couple I wrote just for this situation.
>>
>> Tony
>>
>> On Thu, Jun 17, 2021 at 8:05 AM Ilya Maximets <i.maximets@ovn.org <mailto:i.maximets@ovn.org>> wrote:
>>
>>     On 6/16/21 2:04 AM, Tony van der Peet wrote:
>>     > From: Tony van der Peet <tony.vanderpeet@gmail.com <mailto:tony.vanderpeet@gmail.com>>
>>     >
>>     > When a PACKET_OUT has output port of OFPP_TABLE, and the rule
>>     > table includes a meter and this causes the packet to be deleted,
>>     > stop the packet from being deleted twice by cloning it and setting
>>     > it up to be stolen in execution.
>>     >
>>     > Add a test to verify this condition.
>>     >
>>     > Signed-off-by: Tony van der Peet <tony.vanderpeet@gmail.com <mailto:tony.vanderpeet@gmail.com>>
>>
>>     Thanks for the patch!  OVS seems to work fine with this change,
>>     but for some reason several OVN unit tests are failing if it
>>     uses OVS with this change applied.
>>
>>     Trying to figure out why...
>>
>>     Best regards, Ilya Maximets.
>>
> 
> OK, I've spent most of a day trying to figure out what is going wrong there
> (mostly because OVN tests are insanely complex and very hard to debug).
> 
> In short: dpif_execute() expected to modify the packet by higher layers and
> by cloning the packet inside dp_netdev_execute_actions() all the
> modifications applied to the copy and not propagated to the original packet.
> 
> The call stack looks something like this:
> 
> 1. handle_packet_out()
> 2. --> ofproto_packet_out_finish()
> 3.     --> packet_execute()
> 4.         --> dpif_execute()
> 5.             --> dpif_operate()
> 6.                 --> dpif_execute_with_help()
> 7.                     --> odp_execute_actions()
> 8.                         ** For each action on a list **:
> 9.                         --> dpif_execute_helper_cb()
> 10.                            --> dpif_execute()
> 11.                                --> dpif_operate()
> 12.                                    --> dp_netdev_execute()
> 
> The problem is on a line 8.  odp_execute_actions() executes actions one by
> one expecting the datapath to modify the packet.  In case of OVN unit tests
> PACKET_OUT resulted in a flow with 2 actions: tunnel push + output.
> So, the first dp_netdev_execute() is called to push the tunnel header to the
> packet and the second time it's called to execute OUTPUT action and send the
> packet to the destination.  And since we're cloning the packet, tunnel header
> was lost and bare packet was sent out.  This caused failure of OVN unit tests.
> 
> It's understandable why odp_execute_actions() executes actions one by one.
> Some actions requires datapath assistance and others could be executed in
> userspace, some actions could be executed in userspace only.  So, we have
> to use it this way.  But that means that we need to propagate information
> about stolen packet at least to the level of odp_execute_actions().  I tried
> to implement that but, it doesn't look good...
> 
> So, technically, what I'm suggesting is to copy the content of the packet
> back after the execution.  Still kind of ugly, but, at least localized.
> We also need to report an error condition if the copy was stolen, so upper
> layers will be aware that actions was not successful.
> Something like this:
> 
> diff --git a/lib/dp-packet.h b/lib/dp-packet.h
> index 08d93c277..860a6c3e7 100644
> --- a/lib/dp-packet.h
> +++ b/lib/dp-packet.h
> @@ -199,6 +199,7 @@ struct dp_packet *dp_packet_clone_data_with_headroom(const void *, size_t,
>  void dp_packet_resize(struct dp_packet *b, size_t new_headroom,
>                        size_t new_tailroom);
>  static inline void dp_packet_delete(struct dp_packet *);
> +static inline void dp_packet_swap(struct dp_packet *, struct dp_packet *);
>  
>  static inline void *dp_packet_at(const struct dp_packet *, size_t offset,
>                                   size_t size);
> @@ -256,6 +257,16 @@ dp_packet_delete(struct dp_packet *b)
>      }
>  }
>  
> +/* Swaps content of two packets. */
> +static inline void
> +dp_packet_swap(struct dp_packet *a, struct dp_packet *b)
> +{

Ugh.  This function can not be used for packets originated from
afxdp or dpdk.  Luckily, I think, that dpif_netdev_execute() should
never work with such packets, but a bunch of assertions here is
needed.  DPBUF_STACK also doesn't sound like a good idea, because
it's intended to be immutable.  So:

    ovs_assert(a->source == DPBUF_MALLOC || a->source == DPBUF_STUB);
    ovs_assert(b->source == DPBUF_MALLOC || b->source == DPBUF_STUB);

> +    struct dp_packet c = *a;
> +
> +    *a = *b;
> +    *b = c;
> +}
> +
>  /* If 'b' contains at least 'offset + size' bytes of data, returns a pointer to
>   * byte 'offset'.  Otherwise, returns a null pointer. */
>  static inline void *
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 8fa7eb6d4..a660a2fd6 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -4168,7 +4168,11 @@ dpif_netdev_execute(struct dpif *dpif, struct dpif_execute *execute)
>                                 flow_hash_5tuple(execute->flow, 0));
>      }
>  
> -    dp_packet_batch_init_packet(&pp, execute->packet);
> +    /* Making a copy because the packet might be stolen during the execution
> +     * and caller might still need it.  */
> +    struct dp_packet *packet_clone = dp_packet_clone(execute->packet);
> +    dp_packet_batch_init_packet(&pp, packet_clone);
> +
>      dp_netdev_execute_actions(pmd, &pp, false, execute->flow,
>                                execute->actions, execute->actions_len);
>      dp_netdev_pmd_flush_output_packets(pmd, true);
> @@ -4178,6 +4182,20 @@ dpif_netdev_execute(struct dpif *dpif, struct dpif_execute *execute)
>          dp_netdev_pmd_unref(pmd);
>      }
>  
> +    if (dp_packet_batch_size(&pp)) {
> +        /* Packet wasn't dropped during the execution.  Swapping content with
> +         * the original packet, because the caller might expect actions to
> +         * modify it. */
> +        dp_packet_swap(execute->packet, packet_clone);
> +        dp_packet_delete_batch(&pp, true);
> +    } else {
> +        /* Packet was stolen during the execution due to some error.  We need
> +         * to flag that for the caller, so it will not proceed with other
> +         * actions on this packet.  Returning EAGAIN because we just don't
> +         * know what execlty happened. */
> +        return EAGAIN;
> +    }
> +
>      return 0;
>  }
>  
> ---
> 
> For the test, following addition will be needed, because the error now will
> be printed:
> 
> diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
> index 522cc1318..1b55aa0c5 100644
> --- a/tests/ofproto-dpif.at
> +++ b/tests/ofproto-dpif.at
> @@ -9537,7 +9537,9 @@ meter=1 pktps bands=
>  type=drop rate=1
>  ])
>  
> -OVS_VSWITCHD_STOP
> +OVS_WAIT_UNTIL([grep 'execute meter(0),2 failed' ovs-vswitchd.log])
> +
> +OVS_VSWITCHD_STOP(["/execute meter(0),2 failed/d"])
>  AT_CLEANUP
>  
>  AT_SETUP([ofproto-dpif - ICMPv6])
> ---
> 
> With above changes both OVS and OVN testsuites works fine.
> 
> Any thoughts?
> 
> Best regards, Ilya Maximets.
>
Ben Pfaff June 17, 2021, 9:59 p.m. UTC | #5
All these flags for stealing, allowing stealing, blah blah, are just
ways to do some kind of dumb reference counting without actually have a
reference count.  When it gets super complex like this, maybe
introducing a reference count is the way to go.  It would be a bigger
change, but perhaps more maintainable over time.
Ilya Maximets June 18, 2021, 1:54 p.m. UTC | #6
On 6/17/21 11:59 PM, Ben Pfaff wrote:
> All these flags for stealing, allowing stealing, blah blah, are just
> ways to do some kind of dumb reference counting without actually have a
> reference count.  When it gets super complex like this, maybe
> introducing a reference count is the way to go.  It would be a bigger
> change, but perhaps more maintainable over time.
> 

Yes, I absolutely agree.  We just removed one of these hacky flags from
the struct dp_packet_batch and we need to get rid of the rest of them.

One thing that bothers me is that some parts of the code treats
'should_steal=false' as "do not modify".  For example, I don't really
understand why these functions are carrying cutlen around and clones
the packet before truncating if 'should_steal=false'.

Some action executors also have optimizations that allows to not clone
the packet before the fatal action if this action is the last one.

So, in general, yes, we need to get rid of these flags and accurately
re-work all the packet paths.  And yes, this would be not a small
change.

For now, I think, we need to find a less ugly as possible solution for
existing crash (maybe the one that I suggested), so we will be able to
backport it and continue working on correct reference counting.

What do you think?  Tony?  Aaron?

Best regards, Ilya Maximets.
Aaron Conole June 18, 2021, 2:50 p.m. UTC | #7
Ilya Maximets <i.maximets@ovn.org> writes:

> On 6/17/21 11:59 PM, Ben Pfaff wrote:
>> All these flags for stealing, allowing stealing, blah blah, are just
>> ways to do some kind of dumb reference counting without actually have a
>> reference count.  When it gets super complex like this, maybe
>> introducing a reference count is the way to go.  It would be a bigger
>> change, but perhaps more maintainable over time.
>> 
>
> Yes, I absolutely agree.  We just removed one of these hacky flags from
> the struct dp_packet_batch and we need to get rid of the rest of them.

+1 from me - it's a bigger scoped change, but over-all, I think it's a
better one, and makes the most sense.

> One thing that bothers me is that some parts of the code treats
> 'should_steal=false' as "do not modify".  For example, I don't really
> understand why these functions are carrying cutlen around and clones
> the packet before truncating if 'should_steal=false'.
>
> Some action executors also have optimizations that allows to not clone
> the packet before the fatal action if this action is the last one.
>
> So, in general, yes, we need to get rid of these flags and accurately
> re-work all the packet paths.  And yes, this would be not a small
> change.
>
> For now, I think, we need to find a less ugly as possible solution for
> existing crash (maybe the one that I suggested), so we will be able to
> backport it and continue working on correct reference counting.
>
> What do you think?  Tony?  Aaron?

That makes sense to me, and I hope we will actually work on the
ref-count stuff.  I had started taking a look a few weeks back, but the
idea of 'steal' is pretty ingrained throughout the code, so getting a
ref-count semantic correct is a big effort.  As an example, the
odp-execute area expects that each sub-action will have its own copy to
modify (and this is documented with the 'should_steal' semantics).
Taking that out will be a big rework in that area.  I think it makes
the most sense, and we probably could implement something like COW to
cover those cases where we really need to modify a packet without
propagating those changes back up.

> Best regards, Ilya Maximets.
Tony van der Peet June 19, 2021, 2:11 a.m. UTC | #8
Hi all

I am in favour of a better fix, bandaids tend to come back and bite you in the end anyway. So, will be happy to help with this effort, though I am probably going to have to defer to the rest of you when it comes to actually knowing what to do in this area.

Cheers
Tony
Ilya Maximets June 21, 2021, 8:58 p.m. UTC | #9
On 6/19/21 4:11 AM, Tony van der Peet wrote:
> Hi all
> 
> I am in favour of a better fix, bandaids tend to come back and bite you in the end anyway. So, will be happy to help with this effort, though I am probably going to have to defer to the rest of you when it comes to actually knowing what to do in this area.
> 

It's great to have a full correct solution, but unfortunately, I don't
think that we can come up with something like a full and correct
reference counting in a short period of time.  But we still need to fix
a crash that is pretty easy to trigger.  I'm also nervous that OVN is
using meters more and more (e.g. control plane protection patch set)
and they might actually hit this issue at the high load.  Another point
is that we need a fix that we can backport to LTS, and I don't think that
reference counting would be a small change that we can easily backport
without worrying about breaking something else.

All in all, I think that we need to come up with a "bandaid" solution
and work further on correct reference counting after that.

And we also need to create a unit test that will mimic packet-out that
I encountered in OVN tests, so we can test this kind of behavior in OVS.

For the reference, the packet-out generated by OVN controller had a few
set() actions and the resubmit() to a different table. And this
table had rules leading to packet output to a tunnel port, resulting
in a tunnel push + output datapath actions.

> Cheers
> Tony
> ________________________________________
> From: Aaron Conole <aconole@redhat.com>
> Sent: Saturday, 19 June 2021 2:50 a.m.
> To: Ilya Maximets
> Cc: Ben Pfaff; Tony van der Peet; dev@openvswitch.org; Tony van der Peet
> Subject: Re: [ovs-dev] [PATCH] dpif-netdev: Fix crash when PACKET_OUT is metered
> 
> Ilya Maximets <i.maximets@ovn.org> writes:
> 
>> On 6/17/21 11:59 PM, Ben Pfaff wrote:
>>> All these flags for stealing, allowing stealing, blah blah, are just
>>> ways to do some kind of dumb reference counting without actually have a
>>> reference count.  When it gets super complex like this, maybe
>>> introducing a reference count is the way to go.  It would be a bigger
>>> change, but perhaps more maintainable over time.
>>>
>>
>> Yes, I absolutely agree.  We just removed one of these hacky flags from
>> the struct dp_packet_batch and we need to get rid of the rest of them.
> 
> +1 from me - it's a bigger scoped change, but over-all, I think it's a
> better one, and makes the most sense.
> 
>> One thing that bothers me is that some parts of the code treats
>> 'should_steal=false' as "do not modify".  For example, I don't really
>> understand why these functions are carrying cutlen around and clones
>> the packet before truncating if 'should_steal=false'.
>>
>> Some action executors also have optimizations that allows to not clone
>> the packet before the fatal action if this action is the last one.
>>
>> So, in general, yes, we need to get rid of these flags and accurately
>> re-work all the packet paths.  And yes, this would be not a small
>> change.
>>
>> For now, I think, we need to find a less ugly as possible solution for
>> existing crash (maybe the one that I suggested), so we will be able to
>> backport it and continue working on correct reference counting.
>>
>> What do you think?  Tony?  Aaron?
> 
> That makes sense to me, and I hope we will actually work on the
> ref-count stuff.  I had started taking a look a few weeks back, but the
> idea of 'steal' is pretty ingrained throughout the code, so getting a
> ref-count semantic correct is a big effort.  As an example, the
> odp-execute area expects that each sub-action will have its own copy to
> modify (and this is documented with the 'should_steal' semantics).
> Taking that out will be a big rework in that area.  I think it makes
> the most sense, and we probably could implement something like COW to
> cover those cases where we really need to modify a packet without
> propagating those changes back up.
> 
>> Best regards, Ilya Maximets.
>
Ilya Maximets Aug. 11, 2021, 10:51 p.m. UTC | #10
Hi.

It's been some time since the last email in this thread.  I spent more
time debugging and analyzing the code and, I think, I got to the bottom
of this.  First of all, I had a question why the set of actions in that
particular unit test in OVN needs help for execution, so it hits the
dpif_execute_with_help() path?  The answer is the action set that looks
like this:

actions:set(eth(dst=ff:ff:ff:ff:ff:ff)),set(arp(sip=172.16.1.1)),
        clone(tnl_push(tnl_port(6081),...,out_port(1)),101)

Here the 'set(arp())' action is always marked as SLOW_ACTION and leads
to 'needs_help == true'.  Changes of arp fields is not supported by
kernel.  We may argue that userspace datapath supports it and we should
be able to execute these actions inside the datapath avoiding the
slow path execution process, but there are other cases where slow path
execution can be requested.  One of the most obvious is a direct request
of OFPACT_DEBUG_SLOW.  So, it's hard to completely avoid slow path
execution even for a userspace datapath.

Anyway, dpif_execute_with_help() or, more precisely, the callback
dpif_execute_helper_cb() seems to be the source of all issues.  The
overall design of the actions execution seems to be flawed here.
This function makes several opposite assumptions:

1. It assumes that dpif_execute() can't modify packets and that
   there is no way to get result of the execution from the datapath.

2. It assumes that dpif_execute() can and will modify the packet.

Assumption 1 leads to a workaround for 'meter' actions which are
accumulated from several calls and pushed along with the first terminal
action (output, userspace or ct).  This assumption is true for the
kernel datapath, because the packet basically gets copied to the kernel.

Assumption 2 at the same time allows to implement tunneling for the
userpsace datapath.  dpif_execute_helper_cb() relies on the fact
that dpif_execute(OVS_ACTION_ATTR_TUNNEL_PUSH) will modify the packet
by pushing the tunnel header.

So, this inconsistent behavior for different actions breaks the
dpif abstraction and doesn't allow us to correctly implement the
dpif_execute() interface consistently across different datapath
interfaces.

Even though I see what is going on here, I don't currently see a way
to fix this architectural/design flaw.  Definitely packet reference
counters will still help in correct tracking of dp_packets and we
need that.  I know that Aaron started working on this, but it will
take a while before we will have a working solution.  I'll try
to think how to actually re-work the actions execution path so we
will not have this kind of self-contradicting functions and APIs
(these inconsistencies are really the main thing that complicates
the code and doesn't allow to understand it without a deep dive).
If someone has ideas on that front, I'd like to hear.

Meanwhile, I still think that we need to get the "bandaid" workaround
(clone + data copy) in, so we can fix the actual crash.  As far as
I understand now, change like this should not have any unexpected
side effect.

Tony, do you want to sent the workaround patch?  Or else I can do that.

Bets regards, Ilya Maximets.

On 6/21/21 10:58 PM, Ilya Maximets wrote:
> On 6/19/21 4:11 AM, Tony van der Peet wrote:
>> Hi all
>>
>> I am in favour of a better fix, bandaids tend to come back and bite you in the end anyway. So, will be happy to help with this effort, though I am probably going to have to defer to the rest of you when it comes to actually knowing what to do in this area.
>>
> 
> It's great to have a full correct solution, but unfortunately, I don't
> think that we can come up with something like a full and correct
> reference counting in a short period of time.  But we still need to fix
> a crash that is pretty easy to trigger.  I'm also nervous that OVN is
> using meters more and more (e.g. control plane protection patch set)
> and they might actually hit this issue at the high load.  Another point
> is that we need a fix that we can backport to LTS, and I don't think that
> reference counting would be a small change that we can easily backport
> without worrying about breaking something else.
> 
> All in all, I think that we need to come up with a "bandaid" solution
> and work further on correct reference counting after that.
> 
> And we also need to create a unit test that will mimic packet-out that
> I encountered in OVN tests, so we can test this kind of behavior in OVS.
> 
> For the reference, the packet-out generated by OVN controller had a few
> set() actions and the resubmit() to a different table. And this
> table had rules leading to packet output to a tunnel port, resulting
> in a tunnel push + output datapath actions.
> 
>> Cheers
>> Tony
>> ________________________________________
>> From: Aaron Conole <aconole@redhat.com>
>> Sent: Saturday, 19 June 2021 2:50 a.m.
>> To: Ilya Maximets
>> Cc: Ben Pfaff; Tony van der Peet; dev@openvswitch.org; Tony van der Peet
>> Subject: Re: [ovs-dev] [PATCH] dpif-netdev: Fix crash when PACKET_OUT is metered
>>
>> Ilya Maximets <i.maximets@ovn.org> writes:
>>
>>> On 6/17/21 11:59 PM, Ben Pfaff wrote:
>>>> All these flags for stealing, allowing stealing, blah blah, are just
>>>> ways to do some kind of dumb reference counting without actually have a
>>>> reference count.  When it gets super complex like this, maybe
>>>> introducing a reference count is the way to go.  It would be a bigger
>>>> change, but perhaps more maintainable over time.
>>>>
>>>
>>> Yes, I absolutely agree.  We just removed one of these hacky flags from
>>> the struct dp_packet_batch and we need to get rid of the rest of them.
>>
>> +1 from me - it's a bigger scoped change, but over-all, I think it's a
>> better one, and makes the most sense.
>>
>>> One thing that bothers me is that some parts of the code treats
>>> 'should_steal=false' as "do not modify".  For example, I don't really
>>> understand why these functions are carrying cutlen around and clones
>>> the packet before truncating if 'should_steal=false'.
>>>
>>> Some action executors also have optimizations that allows to not clone
>>> the packet before the fatal action if this action is the last one.
>>>
>>> So, in general, yes, we need to get rid of these flags and accurately
>>> re-work all the packet paths.  And yes, this would be not a small
>>> change.
>>>
>>> For now, I think, we need to find a less ugly as possible solution for
>>> existing crash (maybe the one that I suggested), so we will be able to
>>> backport it and continue working on correct reference counting.
>>>
>>> What do you think?  Tony?  Aaron?
>>
>> That makes sense to me, and I hope we will actually work on the
>> ref-count stuff.  I had started taking a look a few weeks back, but the
>> idea of 'steal' is pretty ingrained throughout the code, so getting a
>> ref-count semantic correct is a big effort.  As an example, the
>> odp-execute area expects that each sub-action will have its own copy to
>> modify (and this is documented with the 'should_steal' semantics).
>> Taking that out will be a big rework in that area.  I think it makes
>> the most sense, and we probably could implement something like COW to
>> cover those cases where we really need to modify a packet without
>> propagating those changes back up.
>>
>>> Best regards, Ilya Maximets.
>>
>
Tony van der Peet Aug. 13, 2021, 4:25 a.m. UTC | #11
Hi Ilya et al

I can do that, but it may take a few days to get to it. Thanks for the
detailed analysis, I'll try to get my head around that too and contribute
to a fix.

Tony

On Thu, Aug 12, 2021 at 10:51 AM Ilya Maximets <i.maximets@ovn.org> wrote:

> Hi.
>
> It's been some time since the last email in this thread.  I spent more
> time debugging and analyzing the code and, I think, I got to the bottom
> of this.  First of all, I had a question why the set of actions in that
> particular unit test in OVN needs help for execution, so it hits the
> dpif_execute_with_help() path?  The answer is the action set that looks
> like this:
>
> actions:set(eth(dst=ff:ff:ff:ff:ff:ff)),set(arp(sip=172.16.1.1)),
>         clone(tnl_push(tnl_port(6081),...,out_port(1)),101)
>
> Here the 'set(arp())' action is always marked as SLOW_ACTION and leads
> to 'needs_help == true'.  Changes of arp fields is not supported by
> kernel.  We may argue that userspace datapath supports it and we should
> be able to execute these actions inside the datapath avoiding the
> slow path execution process, but there are other cases where slow path
> execution can be requested.  One of the most obvious is a direct request
> of OFPACT_DEBUG_SLOW.  So, it's hard to completely avoid slow path
> execution even for a userspace datapath.
>
> Anyway, dpif_execute_with_help() or, more precisely, the callback
> dpif_execute_helper_cb() seems to be the source of all issues.  The
> overall design of the actions execution seems to be flawed here.
> This function makes several opposite assumptions:
>
> 1. It assumes that dpif_execute() can't modify packets and that
>    there is no way to get result of the execution from the datapath.
>
> 2. It assumes that dpif_execute() can and will modify the packet.
>
> Assumption 1 leads to a workaround for 'meter' actions which are
> accumulated from several calls and pushed along with the first terminal
> action (output, userspace or ct).  This assumption is true for the
> kernel datapath, because the packet basically gets copied to the kernel.
>
> Assumption 2 at the same time allows to implement tunneling for the
> userpsace datapath.  dpif_execute_helper_cb() relies on the fact
> that dpif_execute(OVS_ACTION_ATTR_TUNNEL_PUSH) will modify the packet
> by pushing the tunnel header.
>
> So, this inconsistent behavior for different actions breaks the
> dpif abstraction and doesn't allow us to correctly implement the
> dpif_execute() interface consistently across different datapath
> interfaces.
>
> Even though I see what is going on here, I don't currently see a way
> to fix this architectural/design flaw.  Definitely packet reference
> counters will still help in correct tracking of dp_packets and we
> need that.  I know that Aaron started working on this, but it will
> take a while before we will have a working solution.  I'll try
> to think how to actually re-work the actions execution path so we
> will not have this kind of self-contradicting functions and APIs
> (these inconsistencies are really the main thing that complicates
> the code and doesn't allow to understand it without a deep dive).
> If someone has ideas on that front, I'd like to hear.
>
> Meanwhile, I still think that we need to get the "bandaid" workaround
> (clone + data copy) in, so we can fix the actual crash.  As far as
> I understand now, change like this should not have any unexpected
> side effect.
>
> Tony, do you want to sent the workaround patch?  Or else I can do that.
>
> Bets regards, Ilya Maximets.
>
> On 6/21/21 10:58 PM, Ilya Maximets wrote:
> > On 6/19/21 4:11 AM, Tony van der Peet wrote:
> >> Hi all
> >>
> >> I am in favour of a better fix, bandaids tend to come back and bite you
> in the end anyway. So, will be happy to help with this effort, though I am
> probably going to have to defer to the rest of you when it comes to
> actually knowing what to do in this area.
> >>
> >
> > It's great to have a full correct solution, but unfortunately, I don't
> > think that we can come up with something like a full and correct
> > reference counting in a short period of time.  But we still need to fix
> > a crash that is pretty easy to trigger.  I'm also nervous that OVN is
> > using meters more and more (e.g. control plane protection patch set)
> > and they might actually hit this issue at the high load.  Another point
> > is that we need a fix that we can backport to LTS, and I don't think that
> > reference counting would be a small change that we can easily backport
> > without worrying about breaking something else.
> >
> > All in all, I think that we need to come up with a "bandaid" solution
> > and work further on correct reference counting after that.
> >
> > And we also need to create a unit test that will mimic packet-out that
> > I encountered in OVN tests, so we can test this kind of behavior in OVS.
> >
> > For the reference, the packet-out generated by OVN controller had a few
> > set() actions and the resubmit() to a different table. And this
> > table had rules leading to packet output to a tunnel port, resulting
> > in a tunnel push + output datapath actions.
> >
> >> Cheers
> >> Tony
> >> ________________________________________
> >> From: Aaron Conole <aconole@redhat.com>
> >> Sent: Saturday, 19 June 2021 2:50 a.m.
> >> To: Ilya Maximets
> >> Cc: Ben Pfaff; Tony van der Peet; dev@openvswitch.org; Tony van der
> Peet
> >> Subject: Re: [ovs-dev] [PATCH] dpif-netdev: Fix crash when PACKET_OUT
> is metered
> >>
> >> Ilya Maximets <i.maximets@ovn.org> writes:
> >>
> >>> On 6/17/21 11:59 PM, Ben Pfaff wrote:
> >>>> All these flags for stealing, allowing stealing, blah blah, are just
> >>>> ways to do some kind of dumb reference counting without actually have
> a
> >>>> reference count.  When it gets super complex like this, maybe
> >>>> introducing a reference count is the way to go.  It would be a bigger
> >>>> change, but perhaps more maintainable over time.
> >>>>
> >>>
> >>> Yes, I absolutely agree.  We just removed one of these hacky flags from
> >>> the struct dp_packet_batch and we need to get rid of the rest of them.
> >>
> >> +1 from me - it's a bigger scoped change, but over-all, I think it's a
> >> better one, and makes the most sense.
> >>
> >>> One thing that bothers me is that some parts of the code treats
> >>> 'should_steal=false' as "do not modify".  For example, I don't really
> >>> understand why these functions are carrying cutlen around and clones
> >>> the packet before truncating if 'should_steal=false'.
> >>>
> >>> Some action executors also have optimizations that allows to not clone
> >>> the packet before the fatal action if this action is the last one.
> >>>
> >>> So, in general, yes, we need to get rid of these flags and accurately
> >>> re-work all the packet paths.  And yes, this would be not a small
> >>> change.
> >>>
> >>> For now, I think, we need to find a less ugly as possible solution for
> >>> existing crash (maybe the one that I suggested), so we will be able to
> >>> backport it and continue working on correct reference counting.
> >>>
> >>> What do you think?  Tony?  Aaron?
> >>
> >> That makes sense to me, and I hope we will actually work on the
> >> ref-count stuff.  I had started taking a look a few weeks back, but the
> >> idea of 'steal' is pretty ingrained throughout the code, so getting a
> >> ref-count semantic correct is a big effort.  As an example, the
> >> odp-execute area expects that each sub-action will have its own copy to
> >> modify (and this is documented with the 'should_steal' semantics).
> >> Taking that out will be a big rework in that area.  I think it makes
> >> the most sense, and we probably could implement something like COW to
> >> cover those cases where we really need to modify a packet without
> >> propagating those changes back up.
> >>
> >>> Best regards, Ilya Maximets.
> >>
> >
>
>
Ilya Maximets Aug. 13, 2021, 5:26 p.m. UTC | #12
On 8/13/21 6:25 AM, Tony van der Peet wrote:
> Hi Ilya et al
> 
> I can do that, but it may take a few days to get to it. Thanks for the detailed analysis, I'll try to get my head around that too and contribute to a fix.

Thanks!  I'm planning to make a series of stable releases in a couple
of weeks after the 2.16 release (which is on Monday). It would be great
to have a fix for this crash there.  So, we have a bit of time, no rush.
In any case, if you have any questions feel free to ask.

BTW, here is the test for the tunnel push issue:

diff --git a/tests/tunnel-push-pop.at b/tests/tunnel-push-pop.at
index 48c5de9d1..2ba39cff5 100644
--- a/tests/tunnel-push-pop.at
+++ b/tests/tunnel-push-pop.at
@@ -595,6 +595,62 @@ OVS_WAIT_UNTIL([test `ovs-pcap p0.pcap | grep 50540000000a5054000000091235 | wc
 OVS_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([tunnel_push_pop - packet_out debug_slow])
+
+OVS_VSWITCHD_START(
+    [add-port br0 p0 dnl
+     -- set Interface p0 type=dummy ofport_request=1 dnl
+                         other-config:hwaddr=aa:55:aa:55:00:00])
+AT_CHECK([ovs-appctl vlog/set dpif_netdev:dbg])
+AT_CHECK([ovs-vsctl add-br int-br -- set bridge int-br datapath_type=dummy])
+AT_CHECK([ovs-vsctl add-port int-br t2 dnl
+          -- set Interface t2 type=geneve options:remote_ip=1.1.2.92 dnl
+                              options:key=123 ofport_request=2])
+
+dnl First setup dummy interface IP address, then add the route
+dnl so that tnl-port table can get valid IP address for the device.
+AT_CHECK([ovs-appctl netdev-dummy/ip4addr br0 1.1.2.88/24], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add 1.1.2.92/24 br0], [0], [OK
+])
+AT_CHECK([ovs-ofctl add-flow br0 action=normal])
+
+dnl This ARP reply from p0 has two effects:
+dnl 1. The ARP cache will learn that 1.1.2.92 is at f8:bc:12:44:34:b6.
+dnl 2. The br0 mac learning will learn that f8:bc:12:44:34:b6 is on p0.
+AT_CHECK([
+  ovs-appctl netdev-dummy/receive p0 dnl
+      'recirc_id(0),in_port(2),dnl
+       eth(src=f8:bc:12:44:34:b6,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),dnl
+       arp(sip=1.1.2.92,tip=1.1.2.88,op=2,sha=f8:bc:12:44:34:b6,tha=00:00:00:00:00:00)'
+])
+
+AT_CHECK([ovs-vsctl -- set Interface p0 options:tx_pcap=p0.pcap])
+
+packet=50540000000a505400000009123
+encap=f8bc124434b6aa55aa5500000800450000320000400040113406010102580101025c83a917c1001e00000000655800007b00
+
+dnl Output to tunnel from a int-br internal port.
+dnl Checking that the packet arrived and it was correctly encapsulated.
+AT_CHECK([ovs-ofctl add-flow int-br "in_port=LOCAL,actions=debug_slow,output:2"])
+AT_CHECK([ovs-appctl netdev-dummy/receive int-br "${packet}4"])
+OVS_WAIT_UNTIL([test `ovs-pcap p0.pcap | grep "${encap}${packet}4" | wc -l` -ge 1])
+dnl Sending again to exercise the non-miss upcall path.
+AT_CHECK([ovs-appctl netdev-dummy/receive int-br "${packet}4"])
+OVS_WAIT_UNTIL([test `ovs-pcap p0.pcap | grep "${encap}${packet}4" | wc -l` -ge 2])
+
+dnl Output to tunnel from the controller.
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out int-br CONTROLLER "debug_slow,output:2" "${packet}5"])
+OVS_WAIT_UNTIL([test `ovs-pcap p0.pcap | grep "${encap}${packet}5" | wc -l` -ge 1])
+
+dnl Datapath actions should not have tunnel push action.
+AT_CHECK([ovs-appctl dpctl/dump-flows | grep -q tnl_push], [1])
+dnl There should be slow_path action instead.
+AT_CHECK([ovs-appctl dpctl/dump-flows | grep -q 'slow_path(action)'], [0])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
+
 AT_SETUP([tunnel_push_pop - underlay bridge match])
 
 OVS_VSWITCHD_START([add-port br0 p0 -- set Interface p0 type=dummy ofport_request=1 other-config:hwaddr=aa:55:aa:55:00:00])
---

Basically, it triggers the following PACKET_OUT:
OFPT_PACKET_OUT (OF1.3) (xid=0x6): in_port=CONTROLLER actions=debug_slow,output:2 data_len=14
vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,dl_type=0x1235

And it results in sending a packet through the tunnel with the datapath actions:
  tnl_push(tnl_port(6081),header(...),out_port(100)),1

The trick is that 'debug_slow' forces slow path execution, i.e. dpif_execute_with_help().
Therefore dpif_execute() called twice, first time with actions:tnl_push(...) and the
second time with actions:1 expecting that packet got modified between the calls.

If we will clone the packet inside the dpif_netdev_execute() and not copy the
data back, packet will not be modified and we will have original packet without
tunnel headers in the pcap file.

Interestingly, this happens only for PACKET_OUT packets and never happens
for the packets originated inside the datapath (received from the actual port).
This is because on upcall (both miss and actions upcalls), dp->upcall_cb()
receives both actual 'actions' and 'put_actions'.  In this case, for the first
packet entering the datapath handle_packet_upcall will trigger DPIF_UC_MISS
upcall and it will receive following:

actions    :  tnl_push(tnl_port(6081),header(...),out_port(100)),1
put_actions:  userspace(pid=0,slow_path(action))

It will execute the 'actions' inside the datapath avoiding the problem, but
will install 'put_actions' into the classifier.

The second packet in the userspace datapath will have a match in the datapath
classifier and datapath will execute 'userspace(pid=0,slow_path(action))'
actions.  It will do that by triggering DPIF_UC_ACTIONS upcall and it will
receive 'tnl_push(tnl_port(6081),header(...),out_port(100)),1' as 'actions'.
Again they will be executed inside the datapath avoiding the problem.

But the test above exercises this path too just in case.

Best regards, Ilya Maximets.

> 
> Tony
> 
> On Thu, Aug 12, 2021 at 10:51 AM Ilya Maximets <i.maximets@ovn.org <mailto:i.maximets@ovn.org>> wrote:
> 
>     Hi.
> 
>     It's been some time since the last email in this thread.  I spent more
>     time debugging and analyzing the code and, I think, I got to the bottom
>     of this.  First of all, I had a question why the set of actions in that
>     particular unit test in OVN needs help for execution, so it hits the
>     dpif_execute_with_help() path?  The answer is the action set that looks
>     like this:
> 
>     actions:set(eth(dst=ff:ff:ff:ff:ff:ff)),set(arp(sip=172.16.1.1)),
>             clone(tnl_push(tnl_port(6081),...,out_port(1)),101)
> 
>     Here the 'set(arp())' action is always marked as SLOW_ACTION and leads
>     to 'needs_help == true'.  Changes of arp fields is not supported by
>     kernel.  We may argue that userspace datapath supports it and we should
>     be able to execute these actions inside the datapath avoiding the
>     slow path execution process, but there are other cases where slow path
>     execution can be requested.  One of the most obvious is a direct request
>     of OFPACT_DEBUG_SLOW.  So, it's hard to completely avoid slow path
>     execution even for a userspace datapath.
> 
>     Anyway, dpif_execute_with_help() or, more precisely, the callback
>     dpif_execute_helper_cb() seems to be the source of all issues.  The
>     overall design of the actions execution seems to be flawed here.
>     This function makes several opposite assumptions:
> 
>     1. It assumes that dpif_execute() can't modify packets and that
>        there is no way to get result of the execution from the datapath.
> 
>     2. It assumes that dpif_execute() can and will modify the packet.
> 
>     Assumption 1 leads to a workaround for 'meter' actions which are
>     accumulated from several calls and pushed along with the first terminal
>     action (output, userspace or ct).  This assumption is true for the
>     kernel datapath, because the packet basically gets copied to the kernel.
> 
>     Assumption 2 at the same time allows to implement tunneling for the
>     userpsace datapath.  dpif_execute_helper_cb() relies on the fact
>     that dpif_execute(OVS_ACTION_ATTR_TUNNEL_PUSH) will modify the packet
>     by pushing the tunnel header.
> 
>     So, this inconsistent behavior for different actions breaks the
>     dpif abstraction and doesn't allow us to correctly implement the
>     dpif_execute() interface consistently across different datapath
>     interfaces.
> 
>     Even though I see what is going on here, I don't currently see a way
>     to fix this architectural/design flaw.  Definitely packet reference
>     counters will still help in correct tracking of dp_packets and we
>     need that.  I know that Aaron started working on this, but it will
>     take a while before we will have a working solution.  I'll try
>     to think how to actually re-work the actions execution path so we
>     will not have this kind of self-contradicting functions and APIs
>     (these inconsistencies are really the main thing that complicates
>     the code and doesn't allow to understand it without a deep dive).
>     If someone has ideas on that front, I'd like to hear.
> 
>     Meanwhile, I still think that we need to get the "bandaid" workaround
>     (clone + data copy) in, so we can fix the actual crash.  As far as
>     I understand now, change like this should not have any unexpected
>     side effect.
> 
>     Tony, do you want to sent the workaround patch?  Or else I can do that.
> 
>     Bets regards, Ilya Maximets.
> 
>     On 6/21/21 10:58 PM, Ilya Maximets wrote:
>     > On 6/19/21 4:11 AM, Tony van der Peet wrote:
>     >> Hi all
>     >>
>     >> I am in favour of a better fix, bandaids tend to come back and bite you in the end anyway. So, will be happy to help with this effort, though I am probably going to have to defer to the rest of you when it comes to actually knowing what to do in this area.
>     >>
>     >
>     > It's great to have a full correct solution, but unfortunately, I don't
>     > think that we can come up with something like a full and correct
>     > reference counting in a short period of time.  But we still need to fix
>     > a crash that is pretty easy to trigger.  I'm also nervous that OVN is
>     > using meters more and more (e.g. control plane protection patch set)
>     > and they might actually hit this issue at the high load.  Another point
>     > is that we need a fix that we can backport to LTS, and I don't think that
>     > reference counting would be a small change that we can easily backport
>     > without worrying about breaking something else.
>     >
>     > All in all, I think that we need to come up with a "bandaid" solution
>     > and work further on correct reference counting after that.
>     >
>     > And we also need to create a unit test that will mimic packet-out that
>     > I encountered in OVN tests, so we can test this kind of behavior in OVS.
>     >
>     > For the reference, the packet-out generated by OVN controller had a few
>     > set() actions and the resubmit() to a different table. And this
>     > table had rules leading to packet output to a tunnel port, resulting
>     > in a tunnel push + output datapath actions.
>     >
>     >> Cheers
>     >> Tony
>     >> ________________________________________
>     >> From: Aaron Conole <aconole@redhat.com <mailto:aconole@redhat.com>>
>     >> Sent: Saturday, 19 June 2021 2:50 a.m.
>     >> To: Ilya Maximets
>     >> Cc: Ben Pfaff; Tony van der Peet; dev@openvswitch.org <mailto:dev@openvswitch.org>; Tony van der Peet
>     >> Subject: Re: [ovs-dev] [PATCH] dpif-netdev: Fix crash when PACKET_OUT is metered
>     >>
>     >> Ilya Maximets <i.maximets@ovn.org <mailto:i.maximets@ovn.org>> writes:
>     >>
>     >>> On 6/17/21 11:59 PM, Ben Pfaff wrote:
>     >>>> All these flags for stealing, allowing stealing, blah blah, are just
>     >>>> ways to do some kind of dumb reference counting without actually have a
>     >>>> reference count.  When it gets super complex like this, maybe
>     >>>> introducing a reference count is the way to go.  It would be a bigger
>     >>>> change, but perhaps more maintainable over time.
>     >>>>
>     >>>
>     >>> Yes, I absolutely agree.  We just removed one of these hacky flags from
>     >>> the struct dp_packet_batch and we need to get rid of the rest of them.
>     >>
>     >> +1 from me - it's a bigger scoped change, but over-all, I think it's a
>     >> better one, and makes the most sense.
>     >>
>     >>> One thing that bothers me is that some parts of the code treats
>     >>> 'should_steal=false' as "do not modify".  For example, I don't really
>     >>> understand why these functions are carrying cutlen around and clones
>     >>> the packet before truncating if 'should_steal=false'.
>     >>>
>     >>> Some action executors also have optimizations that allows to not clone
>     >>> the packet before the fatal action if this action is the last one.
>     >>>
>     >>> So, in general, yes, we need to get rid of these flags and accurately
>     >>> re-work all the packet paths.  And yes, this would be not a small
>     >>> change.
>     >>>
>     >>> For now, I think, we need to find a less ugly as possible solution for
>     >>> existing crash (maybe the one that I suggested), so we will be able to
>     >>> backport it and continue working on correct reference counting.
>     >>>
>     >>> What do you think?  Tony?  Aaron?
>     >>
>     >> That makes sense to me, and I hope we will actually work on the
>     >> ref-count stuff.  I had started taking a look a few weeks back, but the
>     >> idea of 'steal' is pretty ingrained throughout the code, so getting a
>     >> ref-count semantic correct is a big effort.  As an example, the
>     >> odp-execute area expects that each sub-action will have its own copy to
>     >> modify (and this is documented with the 'should_steal' semantics).
>     >> Taking that out will be a big rework in that area.  I think it makes
>     >> the most sense, and we probably could implement something like COW to
>     >> cover those cases where we really need to modify a packet without
>     >> propagating those changes back up.
>     >>
>     >>> Best regards, Ilya Maximets.
>     >>
>     >
>
diff mbox series

Patch

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 650e67ab3..d7dc815d8 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -4168,9 +4168,12 @@  dpif_netdev_execute(struct dpif *dpif, struct dpif_execute *execute)
                                flow_hash_5tuple(execute->flow, 0));
     }
 
-    dp_packet_batch_init_packet(&pp, execute->packet);
-    pp.do_not_steal = true;
-    dp_netdev_execute_actions(pmd, &pp, false, execute->flow,
+    /* Batch a copy of the packet for execution and expect this
+     * packet to be stolne.
+     */
+    dp_packet_batch_init_packet(&pp, dp_packet_clone(execute->packet));
+    pp.do_not_steal = false;
+    dp_netdev_execute_actions(pmd, &pp, true, execute->flow,
                               execute->actions, execute->actions_len);
     dp_netdev_pmd_flush_output_packets(pmd, true);
 
diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index 31064ed95..522cc1318 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -9520,6 +9520,26 @@  OFPST_TABLE reply (OF1.3) (xid=0x2):
 OVS_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([ofproto-dpif packet-out table meter drop])
+OVS_VSWITCHD_START
+add_of_ports br0 1 2
+
+AT_CHECK([ovs-ofctl -O OpenFlow13 add-meter br0 'meter=1 pktps bands=type=drop rate=1'])
+AT_CHECK([ovs-ofctl -O OpenFlow13 add-flow br0 'in_port=1 action=meter:1,output:2'])
+
+ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000400080000 actions=resubmit(,0)"
+ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000400080000 actions=resubmit(,0)"
+
+# Check that vswitchd hasn't crashed by dumping the meter added above
+AT_CHECK([ovs-ofctl -O OpenFlow13 dump-meters br0 | ofctl_strip], [0], [dnl
+OFPST_METER_CONFIG reply (OF1.3):
+meter=1 pktps bands=
+type=drop rate=1
+])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
+
 AT_SETUP([ofproto-dpif - ICMPv6])
 OVS_VSWITCHD_START
 add_of_ports br0 1