diff mbox series

[ovs-dev] odp-util.c: Fix dp_hash execution with slowpath actions.

Message ID 1589527067-91901-1-git-send-email-hzhou@ovn.org
State Accepted
Commit 29b1dd934f8d0c4cf3d58abc2c10aa9d0ae68277
Headers show
Series [ovs-dev] odp-util.c: Fix dp_hash execution with slowpath actions. | expand

Commit Message

Han Zhou May 15, 2020, 7:17 a.m. UTC
When dp_hash is executed with slowpath actions, it results in endless
recirc loop in kernel datapath, and finally drops the packet, with
kernel logs:

openvswitch: ovs-system: deferred action limit reached, drop recirc action

The root cause is that the dp_hash value calculated by slowpath is not
passed to datapath when executing the recirc action, thus when the recirced
packet miss upcall comes to userspace again, it generates the dp_hash
and recirc action again, with same recirc_id, which in turn generates
a megaflow with recirc action with the recird_id same as the recirc_id in
its match condition, which causes a loop in datapath.

For example, this can be reproduced with below setup of OVN environment:

                         LS1            LS2
                          |              |
                          |------R1------|
        VIF--LS0---R0-----|              |------R3
                          |------R2------|

Assume there is a route from the VIF to R3: R0 -> R1 -> R3, and there are two
routes (ECMP) from R3 to the VIF:
R3 -> R1 -> R0
R3 -> R2 -> R0

Now if we ping from the VIF to R3, the OVS flow execution on the HV of the VIF
will hit the R3's datapath which has flows that responds to the ICMP packet
by setting ICMP fields, which requires slowpath actions, and in later flow
tables it will hit the "group" action that selects between the ECMP routes.

By default OVN uses "dp_hash" method for the "group" action.

For the first miss upcall packet, dp_hash value is empty, so the group action
will be translated to "dp_hash" and "recirc".

During action execution, because of the previous actions that sets ICMP fields,
the whole execution requires slowpath, so it tries to execute all actions in
userspace in odp_execute_actions(), including dp_hash action, except the
recirc action, which can only be executed in datapath. So the dp_hash value
is calculated in userspace, and then the packet is injected to datapath for
recirc action execution.

However, the dp_hash calculated by the userspace is not passed to datapath.

Because of this, the packet recirc in datapath doesn't have dp_hash value,
and the miss upcall for the recirced packet hits the same flow tables and
triggers same "dp_hash" and "recirc" action again, with exactly same recirc_id!

This time, the new upcall doesn't require any slowpath execution, so both
the dp_hash and recirc actions are executed in datapath, after creating a
datapath megaflow like:

recirc_id(XYZ),..., actions:hash(l4(0)),recirc(XYZ)

with match recirc_id equals the recirc id in the action, thus creating a loop.

This patch fixes the problem by passing the calculated dp_hash value to
datapath in odp_key_from_dp_packet().

Signed-off-by: Han Zhou <hzhou@ovn.org>
---
 lib/odp-util.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Han Zhou May 15, 2020, 6:55 p.m. UTC | #1
On Fri, May 15, 2020 at 12:18 AM Han Zhou <hzhou@ovn.org> wrote:
>
> When dp_hash is executed with slowpath actions, it results in endless
> recirc loop in kernel datapath, and finally drops the packet, with
> kernel logs:
>
> openvswitch: ovs-system: deferred action limit reached, drop recirc action
>
> The root cause is that the dp_hash value calculated by slowpath is not
> passed to datapath when executing the recirc action, thus when the
recirced
> packet miss upcall comes to userspace again, it generates the dp_hash
> and recirc action again, with same recirc_id, which in turn generates
> a megaflow with recirc action with the recird_id same as the recirc_id in
> its match condition, which causes a loop in datapath.
>
> For example, this can be reproduced with below setup of OVN environment:
>
>                          LS1            LS2
>                           |              |
>                           |------R1------|
>         VIF--LS0---R0-----|              |------R3
>                           |------R2------|
>
> Assume there is a route from the VIF to R3: R0 -> R1 -> R3, and there are
two
> routes (ECMP) from R3 to the VIF:
> R3 -> R1 -> R0
> R3 -> R2 -> R0
>
> Now if we ping from the VIF to R3, the OVS flow execution on the HV of
the VIF
> will hit the R3's datapath which has flows that responds to the ICMP
packet
> by setting ICMP fields, which requires slowpath actions, and in later flow
> tables it will hit the "group" action that selects between the ECMP
routes.
>
> By default OVN uses "dp_hash" method for the "group" action.
>
> For the first miss upcall packet, dp_hash value is empty, so the group
action
> will be translated to "dp_hash" and "recirc".
>
> During action execution, because of the previous actions that sets ICMP
fields,
> the whole execution requires slowpath, so it tries to execute all actions
in
> userspace in odp_execute_actions(), including dp_hash action, except the
> recirc action, which can only be executed in datapath. So the dp_hash
value
> is calculated in userspace, and then the packet is injected to datapath
for
> recirc action execution.
>
> However, the dp_hash calculated by the userspace is not passed to
datapath.
>
> Because of this, the packet recirc in datapath doesn't have dp_hash value,
> and the miss upcall for the recirced packet hits the same flow tables and
> triggers same "dp_hash" and "recirc" action again, with exactly same
recirc_id!
>
> This time, the new upcall doesn't require any slowpath execution, so both
> the dp_hash and recirc actions are executed in datapath, after creating a
> datapath megaflow like:
>
> recirc_id(XYZ),..., actions:hash(l4(0)),recirc(XYZ)
>
> with match recirc_id equals the recirc id in the action, thus creating a
loop.
>
> This patch fixes the problem by passing the calculated dp_hash value to
> datapath in odp_key_from_dp_packet().
>
> Signed-off-by: Han Zhou <hzhou@ovn.org>
> ---
>  lib/odp-util.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/lib/odp-util.c b/lib/odp-util.c
> index b66d266..ac532fe 100644
> --- a/lib/odp-util.c
> +++ b/lib/odp-util.c
> @@ -6392,6 +6392,10 @@ odp_key_from_dp_packet(struct ofpbuf *buf, const
struct dp_packet *packet)
>
>      nl_msg_put_u32(buf, OVS_KEY_ATTR_PRIORITY, md->skb_priority);
>
> +    if (md->dp_hash) {
> +        nl_msg_put_u32(buf, OVS_KEY_ATTR_DP_HASH, md->dp_hash);
> +    }
> +
>      if (flow_tnl_dst_is_set(&md->tunnel)) {
>          tun_key_to_attr(buf, &md->tunnel, &md->tunnel, NULL, NULL);
>      }
> --
> 2.1.0
>

Ben and Ilya, this is the fix to the dp_hash problem we discussed in
yesterday's meeting. The actual fix is simpler that I thought it would be.
I didn't take the approach of executing dp_hash in datapath because in this
case since the flow is required to be slowpathed, all the following packets
for this flow will anyway get upcalled. If all the dp_hash for the flow is
executed in slowpath then there is no consistency problem. So I think it is
ok to keep the calculation in userspace and the fix is simple. Let me know
if you think differently.

Thanks,
Han
aginwala May 29, 2020, 12:06 a.m. UTC | #2
Thanks Han for the fix:

I verified it in the internal ecmp environment and Ping works fine with no
more drop recirc action error messages as per example above.

Maybe it would be good to show if R3 has a VIF too in the diagram and tweak
the commit message a bit stating VIFs under R3 can ping R3 including VIFs
in R0 <-> VIFs in R3 too; with/without this patch and no 'drop recirc
action' error message. What do you think? Just my two cents.

Tested-by: Aliasgar Ginwala <aginwala@ebay.com>


On Fri, May 15, 2020 at 11:55 AM Han Zhou <hzhou@ovn.org> wrote:

> On Fri, May 15, 2020 at 12:18 AM Han Zhou <hzhou@ovn.org> wrote:
> >
> > When dp_hash is executed with slowpath actions, it results in endless
> > recirc loop in kernel datapath, and finally drops the packet, with
> > kernel logs:
> >
> > openvswitch: ovs-system: deferred action limit reached, drop recirc
> action
> >
> > The root cause is that the dp_hash value calculated by slowpath is not
> > passed to datapath when executing the recirc action, thus when the
> recirced
> > packet miss upcall comes to userspace again, it generates the dp_hash
> > and recirc action again, with same recirc_id, which in turn generates
> > a megaflow with recirc action with the recird_id same as the recirc_id in
> > its match condition, which causes a loop in datapath.
> >
> > For example, this can be reproduced with below setup of OVN environment:
> >
> >                          LS1            LS2
> >                           |              |
> >                           |------R1------|
> >         VIF--LS0---R0-----|              |------R3
> >                           |------R2------|
> >
> > Assume there is a route from the VIF to R3: R0 -> R1 -> R3, and there are
> two
> > routes (ECMP) from R3 to the VIF:
> > R3 -> R1 -> R0
> > R3 -> R2 -> R0
> >
> > Now if we ping from the VIF to R3, the OVS flow execution on the HV of
> the VIF
> > will hit the R3's datapath which has flows that responds to the ICMP
> packet
> > by setting ICMP fields, which requires slowpath actions, and in later
> flow
> > tables it will hit the "group" action that selects between the ECMP
> routes.
> >
> > By default OVN uses "dp_hash" method for the "group" action.
> >
> > For the first miss upcall packet, dp_hash value is empty, so the group
> action
> > will be translated to "dp_hash" and "recirc".
> >
> > During action execution, because of the previous actions that sets ICMP
> fields,
> > the whole execution requires slowpath, so it tries to execute all actions
> in
> > userspace in odp_execute_actions(), including dp_hash action, except the
> > recirc action, which can only be executed in datapath. So the dp_hash
> value
> > is calculated in userspace, and then the packet is injected to datapath
> for
> > recirc action execution.
> >
> > However, the dp_hash calculated by the userspace is not passed to
> datapath.
> >
> > Because of this, the packet recirc in datapath doesn't have dp_hash
> value,
> > and the miss upcall for the recirced packet hits the same flow tables and
> > triggers same "dp_hash" and "recirc" action again, with exactly same
> recirc_id!
> >
> > This time, the new upcall doesn't require any slowpath execution, so both
> > the dp_hash and recirc actions are executed in datapath, after creating a
> > datapath megaflow like:
> >
> > recirc_id(XYZ),..., actions:hash(l4(0)),recirc(XYZ)
> >
> > with match recirc_id equals the recirc id in the action, thus creating a
> loop.
> >
> > This patch fixes the problem by passing the calculated dp_hash value to
> > datapath in odp_key_from_dp_packet().
> >
> > Signed-off-by: Han Zhou <hzhou@ovn.org>
> > ---
> >  lib/odp-util.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/lib/odp-util.c b/lib/odp-util.c
> > index b66d266..ac532fe 100644
> > --- a/lib/odp-util.c
> > +++ b/lib/odp-util.c
> > @@ -6392,6 +6392,10 @@ odp_key_from_dp_packet(struct ofpbuf *buf, const
> struct dp_packet *packet)
> >
> >      nl_msg_put_u32(buf, OVS_KEY_ATTR_PRIORITY, md->skb_priority);
> >
> > +    if (md->dp_hash) {
> > +        nl_msg_put_u32(buf, OVS_KEY_ATTR_DP_HASH, md->dp_hash);
> > +    }
> > +
> >      if (flow_tnl_dst_is_set(&md->tunnel)) {
> >          tun_key_to_attr(buf, &md->tunnel, &md->tunnel, NULL, NULL);
> >      }
> > --
> > 2.1.0
> >
>
> Ben and Ilya, this is the fix to the dp_hash problem we discussed in
> yesterday's meeting. The actual fix is simpler that I thought it would be.
> I didn't take the approach of executing dp_hash in datapath because in this
> case since the flow is required to be slowpathed, all the following packets
> for this flow will anyway get upcalled. If all the dp_hash for the flow is
> executed in slowpath then there is no consistency problem. So I think it is
> ok to keep the calculation in userspace and the fix is simple. Let me know
> if you think differently.
>
> Thanks,
> Han
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Ilya Maximets June 11, 2020, 1:15 p.m. UTC | #3
On 5/15/20 8:55 PM, Han Zhou wrote:
> 
> 
> On Fri, May 15, 2020 at 12:18 AM Han Zhou <hzhou@ovn.org <mailto:hzhou@ovn.org>> wrote:
>>
>> When dp_hash is executed with slowpath actions, it results in endless
>> recirc loop in kernel datapath, and finally drops the packet, with
>> kernel logs:
>>
>> openvswitch: ovs-system: deferred action limit reached, drop recirc action
>>
>> The root cause is that the dp_hash value calculated by slowpath is not
>> passed to datapath when executing the recirc action, thus when the recirced
>> packet miss upcall comes to userspace again, it generates the dp_hash
>> and recirc action again, with same recirc_id, which in turn generates
>> a megaflow with recirc action with the recird_id same as the recirc_id in
>> its match condition, which causes a loop in datapath.
>>
>> For example, this can be reproduced with below setup of OVN environment:
>>
>>                          LS1            LS2
>>                           |              |
>>                           |------R1------|
>>         VIF--LS0---R0-----|              |------R3
>>                           |------R2------|
>>
>> Assume there is a route from the VIF to R3: R0 -> R1 -> R3, and there are two
>> routes (ECMP) from R3 to the VIF:
>> R3 -> R1 -> R0
>> R3 -> R2 -> R0
>>
>> Now if we ping from the VIF to R3, the OVS flow execution on the HV of the VIF
>> will hit the R3's datapath which has flows that responds to the ICMP packet
>> by setting ICMP fields, which requires slowpath actions, and in later flow
>> tables it will hit the "group" action that selects between the ECMP routes.
>>
>> By default OVN uses "dp_hash" method for the "group" action.
>>
>> For the first miss upcall packet, dp_hash value is empty, so the group action
>> will be translated to "dp_hash" and "recirc".
>>
>> During action execution, because of the previous actions that sets ICMP fields,
>> the whole execution requires slowpath, so it tries to execute all actions in
>> userspace in odp_execute_actions(), including dp_hash action, except the
>> recirc action, which can only be executed in datapath. So the dp_hash value
>> is calculated in userspace, and then the packet is injected to datapath for
>> recirc action execution.
>>
>> However, the dp_hash calculated by the userspace is not passed to datapath.
>>
>> Because of this, the packet recirc in datapath doesn't have dp_hash value,
>> and the miss upcall for the recirced packet hits the same flow tables and
>> triggers same "dp_hash" and "recirc" action again, with exactly same recirc_id!
>>
>> This time, the new upcall doesn't require any slowpath execution, so both
>> the dp_hash and recirc actions are executed in datapath, after creating a
>> datapath megaflow like:
>>
>> recirc_id(XYZ),..., actions:hash(l4(0)),recirc(XYZ)
>>
>> with match recirc_id equals the recirc id in the action, thus creating a loop.
>>
>> This patch fixes the problem by passing the calculated dp_hash value to
>> datapath in odp_key_from_dp_packet().
>>
>> Signed-off-by: Han Zhou <hzhou@ovn.org <mailto:hzhou@ovn.org>>
>> ---
>>  lib/odp-util.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/lib/odp-util.c b/lib/odp-util.c
>> index b66d266..ac532fe 100644
>> --- a/lib/odp-util.c
>> +++ b/lib/odp-util.c
>> @@ -6392,6 +6392,10 @@ odp_key_from_dp_packet(struct ofpbuf *buf, const struct dp_packet *packet)
>>
>>      nl_msg_put_u32(buf, OVS_KEY_ATTR_PRIORITY, md->skb_priority);
>>
>> +    if (md->dp_hash) {
>> +        nl_msg_put_u32(buf, OVS_KEY_ATTR_DP_HASH, md->dp_hash);
>> +    }
>> +
>>      if (flow_tnl_dst_is_set(&md->tunnel)) {
>>          tun_key_to_attr(buf, &md->tunnel, &md->tunnel, NULL, NULL);
>>      }
>> --
>> 2.1.0
>>
> 
> Ben and Ilya, this is the fix to the dp_hash problem we discussed in yesterday's meeting. The actual fix is simpler that I thought it would be. I didn't take the approach of executing dp_hash in datapath because in this case since the flow is required to be slowpathed, all the following packets for this flow will anyway get upcalled. If all the dp_hash for the flow is executed in slowpath then there is no consistency problem. So I think it is ok to keep the calculation in userspace and the fix is simple. Let me know if you think differently.

Hi.  Sorry for it took so long to reply.

I understand that this patch fixes this particular case, however I still
think it's dangerous to pass the hash calculated in userspace to kernel
since it might cause mismatch for the later packets in case where the
flow doesn't have actions that requires sending to userspace.

One more thing.  We have such a comment in odp-execute.c:

        /* Calculate a hash value directly. This might not match the         
         * value computed by the datapath, but it is much less expensive,    
         * and the current use case (bonding) does not require a strict      
         * match to work properly. */

But we're using dp_hash not only for bonding for a long time now.
And this doesn't look correct.  Even for bonding I'm not sure if that
is a fully correct assumption.

From the other side, AFAIU, OVS is not able to execute any "non-terminal"
actions in datapath, get results and continue to execute further actions.
The option here is to assume that dp_hash + recirc always goes together.

I'm not sure how to proceed here.
Ben, what do you think?

Best regards, Ilya Maximets.
Han Zhou June 11, 2020, 5:41 p.m. UTC | #4
On Thu, Jun 11, 2020 at 6:15 AM Ilya Maximets <i.maximets@ovn.org> wrote:
>
> On 5/15/20 8:55 PM, Han Zhou wrote:
> >
> >
> > On Fri, May 15, 2020 at 12:18 AM Han Zhou <hzhou@ovn.org <mailto:
hzhou@ovn.org>> wrote:
> >>
> >> When dp_hash is executed with slowpath actions, it results in endless
> >> recirc loop in kernel datapath, and finally drops the packet, with
> >> kernel logs:
> >>
> >> openvswitch: ovs-system: deferred action limit reached, drop recirc
action
> >>
> >> The root cause is that the dp_hash value calculated by slowpath is not
> >> passed to datapath when executing the recirc action, thus when the
recirced
> >> packet miss upcall comes to userspace again, it generates the dp_hash
> >> and recirc action again, with same recirc_id, which in turn generates
> >> a megaflow with recirc action with the recird_id same as the recirc_id
in
> >> its match condition, which causes a loop in datapath.
> >>
> >> For example, this can be reproduced with below setup of OVN
environment:
> >>
> >>                          LS1            LS2
> >>                           |              |
> >>                           |------R1------|
> >>         VIF--LS0---R0-----|              |------R3
> >>                           |------R2------|
> >>
> >> Assume there is a route from the VIF to R3: R0 -> R1 -> R3, and there
are two
> >> routes (ECMP) from R3 to the VIF:
> >> R3 -> R1 -> R0
> >> R3 -> R2 -> R0
> >>
> >> Now if we ping from the VIF to R3, the OVS flow execution on the HV of
the VIF
> >> will hit the R3's datapath which has flows that responds to the ICMP
packet
> >> by setting ICMP fields, which requires slowpath actions, and in later
flow
> >> tables it will hit the "group" action that selects between the ECMP
routes.
> >>
> >> By default OVN uses "dp_hash" method for the "group" action.
> >>
> >> For the first miss upcall packet, dp_hash value is empty, so the group
action
> >> will be translated to "dp_hash" and "recirc".
> >>
> >> During action execution, because of the previous actions that sets
ICMP fields,
> >> the whole execution requires slowpath, so it tries to execute all
actions in
> >> userspace in odp_execute_actions(), including dp_hash action, except
the
> >> recirc action, which can only be executed in datapath. So the dp_hash
value
> >> is calculated in userspace, and then the packet is injected to
datapath for
> >> recirc action execution.
> >>
> >> However, the dp_hash calculated by the userspace is not passed to
datapath.
> >>
> >> Because of this, the packet recirc in datapath doesn't have dp_hash
value,
> >> and the miss upcall for the recirced packet hits the same flow tables
and
> >> triggers same "dp_hash" and "recirc" action again, with exactly same
recirc_id!
> >>
> >> This time, the new upcall doesn't require any slowpath execution, so
both
> >> the dp_hash and recirc actions are executed in datapath, after
creating a
> >> datapath megaflow like:
> >>
> >> recirc_id(XYZ),..., actions:hash(l4(0)),recirc(XYZ)
> >>
> >> with match recirc_id equals the recirc id in the action, thus creating
a loop.
> >>
> >> This patch fixes the problem by passing the calculated dp_hash value to
> >> datapath in odp_key_from_dp_packet().
> >>
> >> Signed-off-by: Han Zhou <hzhou@ovn.org <mailto:hzhou@ovn.org>>
> >> ---
> >>  lib/odp-util.c | 4 ++++
> >>  1 file changed, 4 insertions(+)
> >>
> >> diff --git a/lib/odp-util.c b/lib/odp-util.c
> >> index b66d266..ac532fe 100644
> >> --- a/lib/odp-util.c
> >> +++ b/lib/odp-util.c
> >> @@ -6392,6 +6392,10 @@ odp_key_from_dp_packet(struct ofpbuf *buf,
const struct dp_packet *packet)
> >>
> >>      nl_msg_put_u32(buf, OVS_KEY_ATTR_PRIORITY, md->skb_priority);
> >>
> >> +    if (md->dp_hash) {
> >> +        nl_msg_put_u32(buf, OVS_KEY_ATTR_DP_HASH, md->dp_hash);
> >> +    }
> >> +
> >>      if (flow_tnl_dst_is_set(&md->tunnel)) {
> >>          tun_key_to_attr(buf, &md->tunnel, &md->tunnel, NULL, NULL);
> >>      }
> >> --
> >> 2.1.0
> >>
> >
> > Ben and Ilya, this is the fix to the dp_hash problem we discussed in
yesterday's meeting. The actual fix is simpler that I thought it would be.
I didn't take the approach of executing dp_hash in datapath because in this
case since the flow is required to be slowpathed, all the following packets
for this flow will anyway get upcalled. If all the dp_hash for the flow is
executed in slowpath then there is no consistency problem. So I think it is
ok to keep the calculation in userspace and the fix is simple. Let me know
if you think differently.
>
> Hi.  Sorry for it took so long to reply.
>
Hi Ilya, thanks for the review!

> I understand that this patch fixes this particular case, however I still
> think it's dangerous to pass the hash calculated in userspace to kernel
> since it might cause mismatch for the later packets in case where the
> flow doesn't have actions that requires sending to userspace.
>

If the flow doesn't have actions that require slowpath, the dp_hash will
not get processed in userspace for the first upcall. This problem happens
only if it requires both slowpath actions (in the example of the commit
message it is the ICMP field setting) and dp_hash, and in such case dp_hash
is always caculated in userspace for this flow. The fix is just to make
this scenario work without endless recirculation. Please correct me if I am
wrong.

> One more thing.  We have such a comment in odp-execute.c:
>
>         /* Calculate a hash value directly. This might not match the

>          * value computed by the datapath, but it is much less expensive,

>          * and the current use case (bonding) does not require a strict

>          * match to work properly. */
>
> But we're using dp_hash not only for bonding for a long time now.
> And this doesn't look correct.  Even for bonding I'm not sure if that
> is a fully correct assumption.

Yes, I saw this comment and got confused. I think this out-of-date comment
may be fixed in a separate patch.

>
> From the other side, AFAIU, OVS is not able to execute any "non-terminal"
> actions in datapath, get results and continue to execute further actions.
> The option here is to assume that dp_hash + recirc always goes together.
>
Sorry, I didn't catch this point. What do you mean by "non-terminal"
actions?

Here's how I think about this problem:
1. The current design is that when the flow has actions that require
slowpath, it tries to execute as many actions as possible in userspace
(see dpif_execute_with_help()),
and only execute actions in datapath if it can't be handled in userspace
(e.g. recirc). For this design, the current implementation has a bug: it
didn't pass the hash value calculated by userspace when sending to datapath
to continue executing "recirc", which caused the endless loop of recirc.
This patch just fix this bug.

2. As you pointed out, it may be better to always execute dp_hash in
datapath, which is a design change. While it seems more reasonable, but
maybe not really necessary. I think we can continue the discussion, and we
can address it with another patch if we conclude that it is necessary.

So do you think it makes sense to fix 1) with this patch and then continue
on 2)? (I can't promise that I can address 2) very soon, but I will try if
I have time)

Thanks,
Han

> I'm not sure how to proceed here.
> Ben, what do you think?
>
> Best regards, Ilya Maximets.
Ben Pfaff June 16, 2020, 9:40 p.m. UTC | #5
On Fri, May 15, 2020 at 12:17:47AM -0700, Han Zhou wrote:
> When dp_hash is executed with slowpath actions, it results in endless
> recirc loop in kernel datapath, and finally drops the packet, with
> kernel logs:
> 
> openvswitch: ovs-system: deferred action limit reached, drop recirc action
> 
> The root cause is that the dp_hash value calculated by slowpath is not
> passed to datapath when executing the recirc action, thus when the recirced
> packet miss upcall comes to userspace again, it generates the dp_hash
> and recirc action again, with same recirc_id, which in turn generates
> a megaflow with recirc action with the recird_id same as the recirc_id in
> its match condition, which causes a loop in datapath.
> 
> For example, this can be reproduced with below setup of OVN environment:
> 
>                          LS1            LS2
>                           |              |
>                           |------R1------|
>         VIF--LS0---R0-----|              |------R3
>                           |------R2------|
> 
> Assume there is a route from the VIF to R3: R0 -> R1 -> R3, and there are two
> routes (ECMP) from R3 to the VIF:
> R3 -> R1 -> R0
> R3 -> R2 -> R0
> 
> Now if we ping from the VIF to R3, the OVS flow execution on the HV of the VIF
> will hit the R3's datapath which has flows that responds to the ICMP packet
> by setting ICMP fields, which requires slowpath actions, and in later flow
> tables it will hit the "group" action that selects between the ECMP routes.
> 
> By default OVN uses "dp_hash" method for the "group" action.
> 
> For the first miss upcall packet, dp_hash value is empty, so the group action
> will be translated to "dp_hash" and "recirc".
> 
> During action execution, because of the previous actions that sets ICMP fields,
> the whole execution requires slowpath, so it tries to execute all actions in
> userspace in odp_execute_actions(), including dp_hash action, except the
> recirc action, which can only be executed in datapath. So the dp_hash value
> is calculated in userspace, and then the packet is injected to datapath for
> recirc action execution.
> 
> However, the dp_hash calculated by the userspace is not passed to datapath.
> 
> Because of this, the packet recirc in datapath doesn't have dp_hash value,
> and the miss upcall for the recirced packet hits the same flow tables and
> triggers same "dp_hash" and "recirc" action again, with exactly same recirc_id!
> 
> This time, the new upcall doesn't require any slowpath execution, so both
> the dp_hash and recirc actions are executed in datapath, after creating a
> datapath megaflow like:
> 
> recirc_id(XYZ),..., actions:hash(l4(0)),recirc(XYZ)
> 
> with match recirc_id equals the recirc id in the action, thus creating a loop.
> 
> This patch fixes the problem by passing the calculated dp_hash value to
> datapath in odp_key_from_dp_packet().
> 
> Signed-off-by: Han Zhou <hzhou@ovn.org>

I remember that we discussed whether dp_hash should be passed to and
from the kernel back when we introduced the field, in commit
572f732ab078 ("dpif-netdev: user space datapath recirculation").  The
concern was that userspace and the datapath might compute different hash
values, which could cause a problem for flows that ended up being
handled two different ways with different hash values.  However, not
doing that caused the problem that this commit solves.

I added a Fixes: tag for that original commit, ran the tests, and pushed
this to master.  Thank you!

Do you want this backported at all?
Ben Pfaff June 16, 2020, 9:45 p.m. UTC | #6
On Thu, Jun 11, 2020 at 03:15:13PM +0200, Ilya Maximets wrote:
> I understand that this patch fixes this particular case, however I still
> think it's dangerous to pass the hash calculated in userspace to kernel
> since it might cause mismatch for the later packets in case where the
> flow doesn't have actions that requires sending to userspace.
> 
> One more thing.  We have such a comment in odp-execute.c:
> 
>         /* Calculate a hash value directly. This might not match the         
>          * value computed by the datapath, but it is much less expensive,    
>          * and the current use case (bonding) does not require a strict      
>          * match to work properly. */
> 
> But we're using dp_hash not only for bonding for a long time now.
> And this doesn't look correct.  Even for bonding I'm not sure if that
> is a fully correct assumption.

It's been a problem a long time.  We were aware of the issue from the
day we introduced dp_hash, and we've come up with various workarounds
over the years.  It was never clear before that the problem caused by
not passing dp_hash around was such a big deal.

I think that a real solution would be to reconcile the hash functions.
I guess that this would have to amount to userspace adopting the kernel
hash function, at least for the cases where it matters, since we can't
expect the kernel to change.  I don't know how hard that would be (I do
know that userspace could check that it was correct by probing some
sample packets).
Han Zhou June 16, 2020, 10:51 p.m. UTC | #7
On Tue, Jun 16, 2020 at 2:40 PM Ben Pfaff <blp@ovn.org> wrote:
>
> On Fri, May 15, 2020 at 12:17:47AM -0700, Han Zhou wrote:
> > When dp_hash is executed with slowpath actions, it results in endless
> > recirc loop in kernel datapath, and finally drops the packet, with
> > kernel logs:
> >
> > openvswitch: ovs-system: deferred action limit reached, drop recirc
action
> >
> > The root cause is that the dp_hash value calculated by slowpath is not
> > passed to datapath when executing the recirc action, thus when the
recirced
> > packet miss upcall comes to userspace again, it generates the dp_hash
> > and recirc action again, with same recirc_id, which in turn generates
> > a megaflow with recirc action with the recird_id same as the recirc_id
in
> > its match condition, which causes a loop in datapath.
> >
> > For example, this can be reproduced with below setup of OVN environment:
> >
> >                          LS1            LS2
> >                           |              |
> >                           |------R1------|
> >         VIF--LS0---R0-----|              |------R3
> >                           |------R2------|
> >
> > Assume there is a route from the VIF to R3: R0 -> R1 -> R3, and there
are two
> > routes (ECMP) from R3 to the VIF:
> > R3 -> R1 -> R0
> > R3 -> R2 -> R0
> >
> > Now if we ping from the VIF to R3, the OVS flow execution on the HV of
the VIF
> > will hit the R3's datapath which has flows that responds to the ICMP
packet
> > by setting ICMP fields, which requires slowpath actions, and in later
flow
> > tables it will hit the "group" action that selects between the ECMP
routes.
> >
> > By default OVN uses "dp_hash" method for the "group" action.
> >
> > For the first miss upcall packet, dp_hash value is empty, so the group
action
> > will be translated to "dp_hash" and "recirc".
> >
> > During action execution, because of the previous actions that sets ICMP
fields,
> > the whole execution requires slowpath, so it tries to execute all
actions in
> > userspace in odp_execute_actions(), including dp_hash action, except the
> > recirc action, which can only be executed in datapath. So the dp_hash
value
> > is calculated in userspace, and then the packet is injected to datapath
for
> > recirc action execution.
> >
> > However, the dp_hash calculated by the userspace is not passed to
datapath.
> >
> > Because of this, the packet recirc in datapath doesn't have dp_hash
value,
> > and the miss upcall for the recirced packet hits the same flow tables
and
> > triggers same "dp_hash" and "recirc" action again, with exactly same
recirc_id!
> >
> > This time, the new upcall doesn't require any slowpath execution, so
both
> > the dp_hash and recirc actions are executed in datapath, after creating
a
> > datapath megaflow like:
> >
> > recirc_id(XYZ),..., actions:hash(l4(0)),recirc(XYZ)
> >
> > with match recirc_id equals the recirc id in the action, thus creating
a loop.
> >
> > This patch fixes the problem by passing the calculated dp_hash value to
> > datapath in odp_key_from_dp_packet().
> >
> > Signed-off-by: Han Zhou <hzhou@ovn.org>
>
> I remember that we discussed whether dp_hash should be passed to and
> from the kernel back when we introduced the field, in commit
> 572f732ab078 ("dpif-netdev: user space datapath recirculation").  The
> concern was that userspace and the datapath might compute different hash
> values, which could cause a problem for flows that ended up being
> handled two different ways with different hash values.  However, not
> doing that caused the problem that this commit solves.
>
> I added a Fixes: tag for that original commit, ran the tests, and pushed
> this to master.  Thank you!
>
> Do you want this backported at all?

Thanks Ben. I think it is better to be backported to at least 2.12, 2.11. I
am not sure about older branches.
Ben Pfaff July 6, 2020, 8:42 p.m. UTC | #8
On Tue, Jun 16, 2020 at 03:51:06PM -0700, Han Zhou wrote:
> Thanks Ben. I think it is better to be backported to at least 2.12, 2.11. I
> am not sure about older branches.

I guess 2.13, then, as well.

Would you mind posting backport patches?  I see that there are patch
rejects, although they are probably easy to resolve.
Han Zhou July 6, 2020, 11:06 p.m. UTC | #9
On Mon, Jul 6, 2020 at 1:42 PM Ben Pfaff <blp@ovn.org> wrote:
>
> On Tue, Jun 16, 2020 at 03:51:06PM -0700, Han Zhou wrote:
> > Thanks Ben. I think it is better to be backported to at least 2.12,
2.11. I
> > am not sure about older branches.
>
> I guess 2.13, then, as well.

Yes, sorry I forgot to mention 2.13.

>
> Would you mind posting backport patches?  I see that there are patch
> rejects, although they are probably easy to resolve.

I just tried cherry-picking the commit on all 3 branches (2.13, 2.12 and
2.11) but I didn't see any conflicts (all branches are up-to-date). Could
you double check?

$ git cherry-pick  29b1dd934f8d0c4cf3d58abc2c10aa9d0ae68277
[branch-2.13 4e9b1e3] odp-util.c: Fix dp_hash execution with slowpath
actions.
 Date: Fri May 15 00:17:47 2020 -0700
 1 file changed, 4 insertions(+)

$ git cherry-pick  29b1dd934f8d0c4cf3d58abc2c10aa9d0ae68277
[branch-2.12 b625241] odp-util.c: Fix dp_hash execution with slowpath
actions.
 Date: Fri May 15 00:17:47 2020 -0700
 1 file changed, 4 insertions(+)

$ git cherry-pick  29b1dd934f8d0c4cf3d58abc2c10aa9d0ae68277
[branch-2.11 1671c85] odp-util.c: Fix dp_hash execution with slowpath
actions.
 Date: Fri May 15 00:17:47 2020 -0700
 1 file changed, 4 insertions(+)

Thanks,
Han
Ben Pfaff July 6, 2020, 11:20 p.m. UTC | #10
On Mon, Jul 06, 2020 at 04:06:48PM -0700, Han Zhou wrote:
> On Mon, Jul 6, 2020 at 1:42 PM Ben Pfaff <blp@ovn.org> wrote:
> >
> > On Tue, Jun 16, 2020 at 03:51:06PM -0700, Han Zhou wrote:
> > > Thanks Ben. I think it is better to be backported to at least 2.12,
> 2.11. I
> > > am not sure about older branches.
> >
> > I guess 2.13, then, as well.
> 
> Yes, sorry I forgot to mention 2.13.
> 
> >
> > Would you mind posting backport patches?  I see that there are patch
> > rejects, although they are probably easy to resolve.
> 
> I just tried cherry-picking the commit on all 3 branches (2.13, 2.12 and
> 2.11) but I didn't see any conflicts (all branches are up-to-date). Could
> you double check?

Hmm, my apologies.  I must have picked the wrong commit before.  Thank
you for the correction.

I applied this comit to 2.13, 2.12, and 2.11.
diff mbox series

Patch

diff --git a/lib/odp-util.c b/lib/odp-util.c
index b66d266..ac532fe 100644
--- a/lib/odp-util.c
+++ b/lib/odp-util.c
@@ -6392,6 +6392,10 @@  odp_key_from_dp_packet(struct ofpbuf *buf, const struct dp_packet *packet)
 
     nl_msg_put_u32(buf, OVS_KEY_ATTR_PRIORITY, md->skb_priority);
 
+    if (md->dp_hash) {
+        nl_msg_put_u32(buf, OVS_KEY_ATTR_DP_HASH, md->dp_hash);
+    }
+
     if (flow_tnl_dst_is_set(&md->tunnel)) {
         tun_key_to_attr(buf, &md->tunnel, &md->tunnel, NULL, NULL);
     }