diff mbox series

[ovs-dev] odp-util.c: Fix dp_hash execution with slowpath actions.

Message ID 1589527067-91901-1-git-send-email-hzhou@ovn.org
State New
Headers show
Series [ovs-dev] odp-util.c: Fix dp_hash execution with slowpath actions. | expand

Commit Message

Han Zhou May 15, 2020, 7:17 a.m. UTC
When dp_hash is executed with slowpath actions, it results in endless
recirc loop in kernel datapath, and finally drops the packet, with
kernel logs:

openvswitch: ovs-system: deferred action limit reached, drop recirc action

The root cause is that the dp_hash value calculated by slowpath is not
passed to datapath when executing the recirc action, thus when the recirced
packet miss upcall comes to userspace again, it generates the dp_hash
and recirc action again, with same recirc_id, which in turn generates
a megaflow with recirc action with the recird_id same as the recirc_id in
its match condition, which causes a loop in datapath.

For example, this can be reproduced with below setup of OVN environment:

                         LS1            LS2
                          |              |
                          |------R1------|
        VIF--LS0---R0-----|              |------R3
                          |------R2------|

Assume there is a route from the VIF to R3: R0 -> R1 -> R3, and there are two
routes (ECMP) from R3 to the VIF:
R3 -> R1 -> R0
R3 -> R2 -> R0

Now if we ping from the VIF to R3, the OVS flow execution on the HV of the VIF
will hit the R3's datapath which has flows that responds to the ICMP packet
by setting ICMP fields, which requires slowpath actions, and in later flow
tables it will hit the "group" action that selects between the ECMP routes.

By default OVN uses "dp_hash" method for the "group" action.

For the first miss upcall packet, dp_hash value is empty, so the group action
will be translated to "dp_hash" and "recirc".

During action execution, because of the previous actions that sets ICMP fields,
the whole execution requires slowpath, so it tries to execute all actions in
userspace in odp_execute_actions(), including dp_hash action, except the
recirc action, which can only be executed in datapath. So the dp_hash value
is calculated in userspace, and then the packet is injected to datapath for
recirc action execution.

However, the dp_hash calculated by the userspace is not passed to datapath.

Because of this, the packet recirc in datapath doesn't have dp_hash value,
and the miss upcall for the recirced packet hits the same flow tables and
triggers same "dp_hash" and "recirc" action again, with exactly same recirc_id!

This time, the new upcall doesn't require any slowpath execution, so both
the dp_hash and recirc actions are executed in datapath, after creating a
datapath megaflow like:

recirc_id(XYZ),..., actions:hash(l4(0)),recirc(XYZ)

with match recirc_id equals the recirc id in the action, thus creating a loop.

This patch fixes the problem by passing the calculated dp_hash value to
datapath in odp_key_from_dp_packet().

Signed-off-by: Han Zhou <hzhou@ovn.org>
---
 lib/odp-util.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Han Zhou May 15, 2020, 6:55 p.m. UTC | #1
On Fri, May 15, 2020 at 12:18 AM Han Zhou <hzhou@ovn.org> wrote:
>
> When dp_hash is executed with slowpath actions, it results in endless
> recirc loop in kernel datapath, and finally drops the packet, with
> kernel logs:
>
> openvswitch: ovs-system: deferred action limit reached, drop recirc action
>
> The root cause is that the dp_hash value calculated by slowpath is not
> passed to datapath when executing the recirc action, thus when the
recirced
> packet miss upcall comes to userspace again, it generates the dp_hash
> and recirc action again, with same recirc_id, which in turn generates
> a megaflow with recirc action with the recird_id same as the recirc_id in
> its match condition, which causes a loop in datapath.
>
> For example, this can be reproduced with below setup of OVN environment:
>
>                          LS1            LS2
>                           |              |
>                           |------R1------|
>         VIF--LS0---R0-----|              |------R3
>                           |------R2------|
>
> Assume there is a route from the VIF to R3: R0 -> R1 -> R3, and there are
two
> routes (ECMP) from R3 to the VIF:
> R3 -> R1 -> R0
> R3 -> R2 -> R0
>
> Now if we ping from the VIF to R3, the OVS flow execution on the HV of
the VIF
> will hit the R3's datapath which has flows that responds to the ICMP
packet
> by setting ICMP fields, which requires slowpath actions, and in later flow
> tables it will hit the "group" action that selects between the ECMP
routes.
>
> By default OVN uses "dp_hash" method for the "group" action.
>
> For the first miss upcall packet, dp_hash value is empty, so the group
action
> will be translated to "dp_hash" and "recirc".
>
> During action execution, because of the previous actions that sets ICMP
fields,
> the whole execution requires slowpath, so it tries to execute all actions
in
> userspace in odp_execute_actions(), including dp_hash action, except the
> recirc action, which can only be executed in datapath. So the dp_hash
value
> is calculated in userspace, and then the packet is injected to datapath
for
> recirc action execution.
>
> However, the dp_hash calculated by the userspace is not passed to
datapath.
>
> Because of this, the packet recirc in datapath doesn't have dp_hash value,
> and the miss upcall for the recirced packet hits the same flow tables and
> triggers same "dp_hash" and "recirc" action again, with exactly same
recirc_id!
>
> This time, the new upcall doesn't require any slowpath execution, so both
> the dp_hash and recirc actions are executed in datapath, after creating a
> datapath megaflow like:
>
> recirc_id(XYZ),..., actions:hash(l4(0)),recirc(XYZ)
>
> with match recirc_id equals the recirc id in the action, thus creating a
loop.
>
> This patch fixes the problem by passing the calculated dp_hash value to
> datapath in odp_key_from_dp_packet().
>
> Signed-off-by: Han Zhou <hzhou@ovn.org>
> ---
>  lib/odp-util.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/lib/odp-util.c b/lib/odp-util.c
> index b66d266..ac532fe 100644
> --- a/lib/odp-util.c
> +++ b/lib/odp-util.c
> @@ -6392,6 +6392,10 @@ odp_key_from_dp_packet(struct ofpbuf *buf, const
struct dp_packet *packet)
>
>      nl_msg_put_u32(buf, OVS_KEY_ATTR_PRIORITY, md->skb_priority);
>
> +    if (md->dp_hash) {
> +        nl_msg_put_u32(buf, OVS_KEY_ATTR_DP_HASH, md->dp_hash);
> +    }
> +
>      if (flow_tnl_dst_is_set(&md->tunnel)) {
>          tun_key_to_attr(buf, &md->tunnel, &md->tunnel, NULL, NULL);
>      }
> --
> 2.1.0
>

Ben and Ilya, this is the fix to the dp_hash problem we discussed in
yesterday's meeting. The actual fix is simpler that I thought it would be.
I didn't take the approach of executing dp_hash in datapath because in this
case since the flow is required to be slowpathed, all the following packets
for this flow will anyway get upcalled. If all the dp_hash for the flow is
executed in slowpath then there is no consistency problem. So I think it is
ok to keep the calculation in userspace and the fix is simple. Let me know
if you think differently.

Thanks,
Han
aginwala May 29, 2020, 12:06 a.m. UTC | #2
Thanks Han for the fix:

I verified it in the internal ecmp environment and Ping works fine with no
more drop recirc action error messages as per example above.

Maybe it would be good to show if R3 has a VIF too in the diagram and tweak
the commit message a bit stating VIFs under R3 can ping R3 including VIFs
in R0 <-> VIFs in R3 too; with/without this patch and no 'drop recirc
action' error message. What do you think? Just my two cents.

Tested-by: Aliasgar Ginwala <aginwala@ebay.com>


On Fri, May 15, 2020 at 11:55 AM Han Zhou <hzhou@ovn.org> wrote:

> On Fri, May 15, 2020 at 12:18 AM Han Zhou <hzhou@ovn.org> wrote:
> >
> > When dp_hash is executed with slowpath actions, it results in endless
> > recirc loop in kernel datapath, and finally drops the packet, with
> > kernel logs:
> >
> > openvswitch: ovs-system: deferred action limit reached, drop recirc
> action
> >
> > The root cause is that the dp_hash value calculated by slowpath is not
> > passed to datapath when executing the recirc action, thus when the
> recirced
> > packet miss upcall comes to userspace again, it generates the dp_hash
> > and recirc action again, with same recirc_id, which in turn generates
> > a megaflow with recirc action with the recird_id same as the recirc_id in
> > its match condition, which causes a loop in datapath.
> >
> > For example, this can be reproduced with below setup of OVN environment:
> >
> >                          LS1            LS2
> >                           |              |
> >                           |------R1------|
> >         VIF--LS0---R0-----|              |------R3
> >                           |------R2------|
> >
> > Assume there is a route from the VIF to R3: R0 -> R1 -> R3, and there are
> two
> > routes (ECMP) from R3 to the VIF:
> > R3 -> R1 -> R0
> > R3 -> R2 -> R0
> >
> > Now if we ping from the VIF to R3, the OVS flow execution on the HV of
> the VIF
> > will hit the R3's datapath which has flows that responds to the ICMP
> packet
> > by setting ICMP fields, which requires slowpath actions, and in later
> flow
> > tables it will hit the "group" action that selects between the ECMP
> routes.
> >
> > By default OVN uses "dp_hash" method for the "group" action.
> >
> > For the first miss upcall packet, dp_hash value is empty, so the group
> action
> > will be translated to "dp_hash" and "recirc".
> >
> > During action execution, because of the previous actions that sets ICMP
> fields,
> > the whole execution requires slowpath, so it tries to execute all actions
> in
> > userspace in odp_execute_actions(), including dp_hash action, except the
> > recirc action, which can only be executed in datapath. So the dp_hash
> value
> > is calculated in userspace, and then the packet is injected to datapath
> for
> > recirc action execution.
> >
> > However, the dp_hash calculated by the userspace is not passed to
> datapath.
> >
> > Because of this, the packet recirc in datapath doesn't have dp_hash
> value,
> > and the miss upcall for the recirced packet hits the same flow tables and
> > triggers same "dp_hash" and "recirc" action again, with exactly same
> recirc_id!
> >
> > This time, the new upcall doesn't require any slowpath execution, so both
> > the dp_hash and recirc actions are executed in datapath, after creating a
> > datapath megaflow like:
> >
> > recirc_id(XYZ),..., actions:hash(l4(0)),recirc(XYZ)
> >
> > with match recirc_id equals the recirc id in the action, thus creating a
> loop.
> >
> > This patch fixes the problem by passing the calculated dp_hash value to
> > datapath in odp_key_from_dp_packet().
> >
> > Signed-off-by: Han Zhou <hzhou@ovn.org>
> > ---
> >  lib/odp-util.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/lib/odp-util.c b/lib/odp-util.c
> > index b66d266..ac532fe 100644
> > --- a/lib/odp-util.c
> > +++ b/lib/odp-util.c
> > @@ -6392,6 +6392,10 @@ odp_key_from_dp_packet(struct ofpbuf *buf, const
> struct dp_packet *packet)
> >
> >      nl_msg_put_u32(buf, OVS_KEY_ATTR_PRIORITY, md->skb_priority);
> >
> > +    if (md->dp_hash) {
> > +        nl_msg_put_u32(buf, OVS_KEY_ATTR_DP_HASH, md->dp_hash);
> > +    }
> > +
> >      if (flow_tnl_dst_is_set(&md->tunnel)) {
> >          tun_key_to_attr(buf, &md->tunnel, &md->tunnel, NULL, NULL);
> >      }
> > --
> > 2.1.0
> >
>
> Ben and Ilya, this is the fix to the dp_hash problem we discussed in
> yesterday's meeting. The actual fix is simpler that I thought it would be.
> I didn't take the approach of executing dp_hash in datapath because in this
> case since the flow is required to be slowpathed, all the following packets
> for this flow will anyway get upcalled. If all the dp_hash for the flow is
> executed in slowpath then there is no consistency problem. So I think it is
> ok to keep the calculation in userspace and the fix is simple. Let me know
> if you think differently.
>
> Thanks,
> Han
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
diff mbox series

Patch

diff --git a/lib/odp-util.c b/lib/odp-util.c
index b66d266..ac532fe 100644
--- a/lib/odp-util.c
+++ b/lib/odp-util.c
@@ -6392,6 +6392,10 @@  odp_key_from_dp_packet(struct ofpbuf *buf, const struct dp_packet *packet)
 
     nl_msg_put_u32(buf, OVS_KEY_ATTR_PRIORITY, md->skb_priority);
 
+    if (md->dp_hash) {
+        nl_msg_put_u32(buf, OVS_KEY_ATTR_DP_HASH, md->dp_hash);
+    }
+
     if (flow_tnl_dst_is_set(&md->tunnel)) {
         tun_key_to_attr(buf, &md->tunnel, &md->tunnel, NULL, NULL);
     }