diff mbox series

[ovs-dev,v2] northd: Add lflows to send all pkts to conntrack if LB is configured on a lswitch.

Message ID 20200911091013.694297-1-numans@ovn.org
State Accepted
Headers show
Series [ovs-dev,v2] northd: Add lflows to send all pkts to conntrack if LB is configured on a lswitch. | expand

Commit Message

Numan Siddique Sept. 11, 2020, 9:10 a.m. UTC
From: Numan Siddique <numans@ovn.org>

Prior to this patch, if a load balancer is configured on a logical switch but with no
ACLs with allow-related configured, then in the ingress pipeline only the packets
with ip.dst = VIP will be sent to conntrack using the zone id of the source logical port.

If the backend of the load balancer, sends an invalid packet (for example invalid tcp
sequence number), then such packets will be delivered to the source logical port VIF
without unDNATting. This causes the source to reset the connection.

This patch fixes this issue by sending all the packets to conntrack if a load balancer
is configured on the logical switch. Because of this, any invalid (ct.inv) packets will
be dropped in the ingress pipeline itself.

Unfortunately this impacts the performance as now there will be extra recirculations
because of ct and ct_commit (for new connections) actions.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1870359
Reported-by: Tim Rozet (trozet@redhat.com)
Signed-off-by: Numan Siddique <numans@ovn.org>
---
v1 -> v2
 ----
  * Addressed the review comments from Dumitru and Mark.

 northd/ovn-northd.8.xml | 41 +++++++++++++++++-----
 northd/ovn-northd.c     | 77 +++++++++++++++++++++++++----------------
 2 files changed, 80 insertions(+), 38 deletions(-)

Comments

Dumitru Ceara Sept. 11, 2020, 9:32 a.m. UTC | #1
On 9/11/20 11:10 AM, numans@ovn.org wrote:
> From: Numan Siddique <numans@ovn.org>
> 
> Prior to this patch, if a load balancer is configured on a logical switch but with no
> ACLs with allow-related configured, then in the ingress pipeline only the packets
> with ip.dst = VIP will be sent to conntrack using the zone id of the source logical port.
> 
> If the backend of the load balancer, sends an invalid packet (for example invalid tcp
> sequence number), then such packets will be delivered to the source logical port VIF
> without unDNATting. This causes the source to reset the connection.
> 
> This patch fixes this issue by sending all the packets to conntrack if a load balancer
> is configured on the logical switch. Because of this, any invalid (ct.inv) packets will
> be dropped in the ingress pipeline itself.
> 
> Unfortunately this impacts the performance as now there will be extra recirculations
> because of ct and ct_commit (for new connections) actions.
> 
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1870359
> Reported-by: Tim Rozet (trozet@redhat.com)
> Signed-off-by: Numan Siddique <numans@ovn.org>
> ---
> v1 -> v2
>  ----
>   * Addressed the review comments from Dumitru and Mark.
> 
>  northd/ovn-northd.8.xml | 41 +++++++++++++++++-----
>  northd/ovn-northd.c     | 77 +++++++++++++++++++++++++----------------
>  2 files changed, 80 insertions(+), 38 deletions(-)
> 
> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
> index a275442a82..a9826e8e9e 100644
> --- a/northd/ovn-northd.8.xml
> +++ b/northd/ovn-northd.8.xml
> @@ -340,15 +340,12 @@
>        it contains a priority-110 flow to move IPv6 Neighbor Discovery traffic
>        to the next table. If load balancing rules with virtual IP addresses
>        (and ports) are configured in <code>OVN_Northbound</code> database for a
> -      logical switch datapath, a priority-100 flow is added for each configured
> -      virtual IP address <var>VIP</var>. For IPv4 <var>VIPs</var>, the match is
> -      <code>ip &amp;&amp; ip4.dst == <var>VIP</var></code>. For IPv6
> -      <var>VIPs</var>, the match is <code>ip &amp;&amp;
> -      ip6.dst == <var>VIP</var></code>. The flow sets an action
> +      logical switch datapath, a priority-100 flow is added with the match
> +      <code>ip</code> to match on IP packets and sets the action
>        <code>reg0[0] = 1; next;</code> to act as a hint for table
>        <code>Pre-stateful</code> to send IP packets to the connection tracker
> -      for packet de-fragmentation before eventually advancing to ingress table
> -      <code>LB</code>.
> +      for packet de-fragmentation before eventually advancing to ingress
> +      table <code>LB</code>.
>        If controller_event has been enabled and load balancing rules with
>        empty backends have been added in <code>OVN_Northbound</code>, a 130 flow
>        is added to trigger ovn-controller events whenever the chassis receives a
> @@ -356,6 +353,32 @@
>        previously created, it will be associated to the empty_lb logical flow
>      </p>
>  
> +    <p>
> +      Prior to <code>OVN 20.09</code> we were setting the
> +      <code>reg0[0] = 1</code> only if the IP destination matches the load
> +      balancer VIP. However this had few issues cases where a logical switch
> +      doesn't have any ACLs with <code>allow-related</code> action.
> +      To understand the issue lets a take a TCP load balancer -
> +      <code>10.0.0.10:80=10.0.0.3:80</code>. If a logical port - p1 with
> +      IP - 10.0.0.5 opens a TCP connection with the VIP - 10.0.0.10, then the
> +      packet in the ingress pipeline of 'p1' is sent to the p1's conntrack zone
> +      id and the packet is load balanced to the backend - 10.0.0.3. For the
> +      reply packet from the backend lport, it is not sent to the conntrack of
> +      backend lport's zone id. This is fine as long as the packet is valid.
> +      Suppose the backend lport sends an invalid TCP packet (like incorrect
> +      sequence number), the packet gets delivered to the lport 'p1' without
> +      unDNATing the packet to the VIP - 10.0.0.10. And this causes the
> +      connection to be reset by the lport p1's VIF.
> +    </p>
> +
> +    <p>
> +      We can't fix this issue by adding a logical flow to drop ct.inv packets
> +      in the egress pipeline since it will drop all other connections not
> +      destined to the load balancers. To fix this issue, we send all the
> +      packets to the conntrack in the ingress pipeline if a load balancer is
> +      configured. We can now add a lflow to drop ct.inv packets.
> +    </p>
> +
>      <p>
>        This table also has a priority-110 flow with the match
>        <code>eth.dst == <var>E</var></code> for all logical switch
> @@ -430,8 +453,8 @@
>      <p>
>        This table also contains a priority 0 flow with action
>        <code>next;</code>, so that ACLs allow packets by default.  If the
> -      logical datapath has a statetful ACL, the following flows will
> -      also be added:
> +      logical datapath has a statetful ACL or a load balancer with VIP
> +      configured, the following flows will also be added:

Tiny nit: s/statetful/stateful

With this addressed:

Acked-by: Dumitru Ceara <dceara@redhat.com>

Thanks,
Dumitru
0-day Robot Sept. 11, 2020, 10:03 a.m. UTC | #2
Bleep bloop.  Greetings Numan Siddique, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


git-am:
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch' to see the failed patch
Patch failed at 0001 northd: Add lflows to send all pkts to conntrack if LB is configured on a lswitch.
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".


Please check this out.  If you feel there has been an error, please email aconole@redhat.com

Thanks,
0-day Robot
Mark Michelson Sept. 11, 2020, 2:43 p.m. UTC | #3
Acked-by: Mark Michelson <mmichels@redhat.com>

Thanks Numan!

On 9/11/20 5:10 AM, numans@ovn.org wrote:
> From: Numan Siddique <numans@ovn.org>
> 
> Prior to this patch, if a load balancer is configured on a logical switch but with no
> ACLs with allow-related configured, then in the ingress pipeline only the packets
> with ip.dst = VIP will be sent to conntrack using the zone id of the source logical port.
> 
> If the backend of the load balancer, sends an invalid packet (for example invalid tcp
> sequence number), then such packets will be delivered to the source logical port VIF
> without unDNATting. This causes the source to reset the connection.
> 
> This patch fixes this issue by sending all the packets to conntrack if a load balancer
> is configured on the logical switch. Because of this, any invalid (ct.inv) packets will
> be dropped in the ingress pipeline itself.
> 
> Unfortunately this impacts the performance as now there will be extra recirculations
> because of ct and ct_commit (for new connections) actions.
> 
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1870359
> Reported-by: Tim Rozet (trozet@redhat.com)
> Signed-off-by: Numan Siddique <numans@ovn.org>
> ---
> v1 -> v2
>   ----
>    * Addressed the review comments from Dumitru and Mark.
> 
>   northd/ovn-northd.8.xml | 41 +++++++++++++++++-----
>   northd/ovn-northd.c     | 77 +++++++++++++++++++++++++----------------
>   2 files changed, 80 insertions(+), 38 deletions(-)
> 
> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
> index a275442a82..a9826e8e9e 100644
> --- a/northd/ovn-northd.8.xml
> +++ b/northd/ovn-northd.8.xml
> @@ -340,15 +340,12 @@
>         it contains a priority-110 flow to move IPv6 Neighbor Discovery traffic
>         to the next table. If load balancing rules with virtual IP addresses
>         (and ports) are configured in <code>OVN_Northbound</code> database for a
> -      logical switch datapath, a priority-100 flow is added for each configured
> -      virtual IP address <var>VIP</var>. For IPv4 <var>VIPs</var>, the match is
> -      <code>ip &amp;&amp; ip4.dst == <var>VIP</var></code>. For IPv6
> -      <var>VIPs</var>, the match is <code>ip &amp;&amp;
> -      ip6.dst == <var>VIP</var></code>. The flow sets an action
> +      logical switch datapath, a priority-100 flow is added with the match
> +      <code>ip</code> to match on IP packets and sets the action
>         <code>reg0[0] = 1; next;</code> to act as a hint for table
>         <code>Pre-stateful</code> to send IP packets to the connection tracker
> -      for packet de-fragmentation before eventually advancing to ingress table
> -      <code>LB</code>.
> +      for packet de-fragmentation before eventually advancing to ingress
> +      table <code>LB</code>.
>         If controller_event has been enabled and load balancing rules with
>         empty backends have been added in <code>OVN_Northbound</code>, a 130 flow
>         is added to trigger ovn-controller events whenever the chassis receives a
> @@ -356,6 +353,32 @@
>         previously created, it will be associated to the empty_lb logical flow
>       </p>
>   
> +    <p>
> +      Prior to <code>OVN 20.09</code> we were setting the
> +      <code>reg0[0] = 1</code> only if the IP destination matches the load
> +      balancer VIP. However this had few issues cases where a logical switch
> +      doesn't have any ACLs with <code>allow-related</code> action.
> +      To understand the issue lets a take a TCP load balancer -
> +      <code>10.0.0.10:80=10.0.0.3:80</code>. If a logical port - p1 with
> +      IP - 10.0.0.5 opens a TCP connection with the VIP - 10.0.0.10, then the
> +      packet in the ingress pipeline of 'p1' is sent to the p1's conntrack zone
> +      id and the packet is load balanced to the backend - 10.0.0.3. For the
> +      reply packet from the backend lport, it is not sent to the conntrack of
> +      backend lport's zone id. This is fine as long as the packet is valid.
> +      Suppose the backend lport sends an invalid TCP packet (like incorrect
> +      sequence number), the packet gets delivered to the lport 'p1' without
> +      unDNATing the packet to the VIP - 10.0.0.10. And this causes the
> +      connection to be reset by the lport p1's VIF.
> +    </p>
> +
> +    <p>
> +      We can't fix this issue by adding a logical flow to drop ct.inv packets
> +      in the egress pipeline since it will drop all other connections not
> +      destined to the load balancers. To fix this issue, we send all the
> +      packets to the conntrack in the ingress pipeline if a load balancer is
> +      configured. We can now add a lflow to drop ct.inv packets.
> +    </p>
> +
>       <p>
>         This table also has a priority-110 flow with the match
>         <code>eth.dst == <var>E</var></code> for all logical switch
> @@ -430,8 +453,8 @@
>       <p>
>         This table also contains a priority 0 flow with action
>         <code>next;</code>, so that ACLs allow packets by default.  If the
> -      logical datapath has a statetful ACL, the following flows will
> -      also be added:
> +      logical datapath has a statetful ACL or a load balancer with VIP
> +      configured, the following flows will also be added:
>       </p>
>   
>       <ul>
> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> index b95d6cd8a1..3967bae569 100644
> --- a/northd/ovn-northd.c
> +++ b/northd/ovn-northd.c
> @@ -5034,6 +5034,19 @@ build_empty_lb_event_flow(struct ovn_datapath *od, struct hmap *lflows,
>       free(action);
>   }
>   
> +static bool
> +has_lb_vip(struct ovn_datapath *od)
> +{
> +    for (int i = 0; i < od->nbs->n_load_balancer; i++) {
> +        struct nbrec_load_balancer *nb_lb = od->nbs->load_balancer[i];
> +        if (!smap_is_empty(&nb_lb->vips)) {
> +            return true;
> +        }
> +    }
> +
> +    return false;
> +}
> +
>   static void
>   build_pre_lb(struct ovn_datapath *od, struct hmap *lflows,
>                struct shash *meter_groups, struct hmap *lbs)
> @@ -5072,8 +5085,6 @@ build_pre_lb(struct ovn_datapath *od, struct hmap *lflows,
>                                    110, lflows);
>       }
>   
> -    struct sset all_ips_v4 = SSET_INITIALIZER(&all_ips_v4);
> -    struct sset all_ips_v6 = SSET_INITIALIZER(&all_ips_v6);
>       bool vip_configured = false;
>       for (int i = 0; i < od->nbs->n_load_balancer; i++) {
>           struct nbrec_load_balancer *nb_lb = od->nbs->load_balancer[i];
> @@ -5083,12 +5094,6 @@ build_pre_lb(struct ovn_datapath *od, struct hmap *lflows,
>   
>           for (size_t j = 0; j < lb->n_vips; j++) {
>               struct lb_vip *lb_vip = &lb->vips[j];
> -            if (lb_vip->addr_family == AF_INET) {
> -                sset_add(&all_ips_v4, lb_vip->vip);
> -            } else {
> -                sset_add(&all_ips_v6, lb_vip->vip);
> -            }
> -
>               build_empty_lb_event_flow(od, lflows, lb_vip, nb_lb,
>                                         S_SWITCH_IN_PRE_LB, meter_groups);
>   
> @@ -5102,26 +5107,37 @@ build_pre_lb(struct ovn_datapath *od, struct hmap *lflows,
>       }
>   
>       /* 'REGBIT_CONNTRACK_DEFRAG' is set to let the pre-stateful table send
> -     * packet to conntrack for defragmentation. */
> -    const char *ip_address;
> -    SSET_FOR_EACH (ip_address, &all_ips_v4) {
> -        char *match = xasprintf("ip && ip4.dst == %s", ip_address);
> -        ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_LB,
> -                      100, match, REGBIT_CONNTRACK_DEFRAG" = 1; next;");
> -        free(match);
> -    }
> -
> -    SSET_FOR_EACH (ip_address, &all_ips_v6) {
> -        char *match = xasprintf("ip && ip6.dst == %s", ip_address);
> -        ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_LB,
> -                      100, match, REGBIT_CONNTRACK_DEFRAG" = 1; next;");
> -        free(match);
> -    }
> -
> -    sset_destroy(&all_ips_v4);
> -    sset_destroy(&all_ips_v6);
> -
> +     * packet to conntrack for defragmentation.
> +     *
> +     * Send all the packets to conntrack in the ingress pipeline if the
> +     * logical switch has a load balancer with VIP configured. Earlier
> +     * we used to set the REGBIT_CONNTRACK_DEFRAG flag in the ingress pipeline
> +     * if the IP destination matches the VIP. But this causes few issues when
> +     * a logical switch has no ACLs configured with allow-related.
> +     * To understand the issue, lets a take a TCP load balancer -
> +     * 10.0.0.10:80=10.0.0.3:80.
> +     * If a logical port - p1 with IP - 10.0.0.5 opens a TCP connection with
> +     * the VIP - 10.0.0.10, then the packet in the ingress pipeline of 'p1'
> +     * is sent to the p1's conntrack zone id and the packet is load balanced
> +     * to the backend - 10.0.0.3. For the reply packet from the backend lport,
> +     * it is not sent to the conntrack of backend lport's zone id. This is fine
> +     * as long as the packet is valid. Suppose the backend lport sends an
> +     *  invalid TCP packet (like incorrect sequence number), the packet gets
> +     * delivered to the lport 'p1' without unDNATing the packet to the
> +     * VIP - 10.0.0.10. And this causes the connection to be reset by the
> +     * lport p1's VIF.
> +     *
> +     * We can't fix this issue by adding a logical flow to drop ct.inv packets
> +     * in the egress pipeline since it will drop all other connections not
> +     * destined to the load balancers.
> +     *
> +     * To fix this issue, we send all the packets to the conntrack in the
> +     * ingress pipeline if a load balancer is configured. We can now
> +     * add a lflow to drop ct.inv packets.
> +     */
>       if (vip_configured) {
> +        ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_LB,
> +                      100, "ip", REGBIT_CONNTRACK_DEFRAG" = 1; next;");
>           ovn_lflow_add(lflows, od, S_SWITCH_OUT_PRE_LB,
>                         100, "ip", REGBIT_CONNTRACK_DEFRAG" = 1; next;");
>       }
> @@ -5482,7 +5498,7 @@ static void
>   build_acls(struct ovn_datapath *od, struct hmap *lflows,
>              struct hmap *port_groups)
>   {
> -    bool has_stateful = has_stateful_acl(od);
> +    bool has_stateful = (has_stateful_acl(od) || has_lb_vip(od));
>   
>       /* Ingress and Egress ACL Table (Priority 0): Packets are allowed by
>        * default.  A related rule at priority 1 is added below if there
> @@ -5746,12 +5762,15 @@ build_lb(struct ovn_datapath *od, struct hmap *lflows)
>       ovn_lflow_add(lflows, od, S_SWITCH_IN_LB, 0, "1", "next;");
>       ovn_lflow_add(lflows, od, S_SWITCH_OUT_LB, 0, "1", "next;");
>   
> -    if (od->nbs->load_balancer) {
> +    if (od->nbs->n_load_balancer) {
>           for (size_t i = 0; i < od->n_router_ports; i++) {
>               skip_port_from_conntrack(od, od->router_ports[i],
>                                        S_SWITCH_IN_LB, S_SWITCH_OUT_LB,
>                                        UINT16_MAX, lflows);
>           }
> +    }
> +
> +    if (has_lb_vip(od)) {
>           /* Ingress and Egress LB Table (Priority 65534).
>            *
>            * Send established traffic through conntrack for just NAT. */
>
Numan Siddique Sept. 11, 2020, 3:15 p.m. UTC | #4
On Fri, Sep 11, 2020 at 8:14 PM Mark Michelson <mmichels@redhat.com> wrote:

> Acked-by: Mark Michelson <mmichels@redhat.com>
>

Thanks Dumitru and Mark. I applied this patch to master.

Numan


>
> Thanks Numan!
>
> On 9/11/20 5:10 AM, numans@ovn.org wrote:
> > From: Numan Siddique <numans@ovn.org>
> >
> > Prior to this patch, if a load balancer is configured on a logical
> switch but with no
> > ACLs with allow-related configured, then in the ingress pipeline only
> the packets
> > with ip.dst = VIP will be sent to conntrack using the zone id of the
> source logical port.
> >
> > If the backend of the load balancer, sends an invalid packet (for
> example invalid tcp
> > sequence number), then such packets will be delivered to the source
> logical port VIF
> > without unDNATting. This causes the source to reset the connection.
> >
> > This patch fixes this issue by sending all the packets to conntrack if a
> load balancer
> > is configured on the logical switch. Because of this, any invalid
> (ct.inv) packets will
> > be dropped in the ingress pipeline itself.
> >
> > Unfortunately this impacts the performance as now there will be extra
> recirculations
> > because of ct and ct_commit (for new connections) actions.
> >
> > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1870359
> > Reported-by: Tim Rozet (trozet@redhat.com)
> > Signed-off-by: Numan Siddique <numans@ovn.org>
> > ---
> > v1 -> v2
> >   ----
> >    * Addressed the review comments from Dumitru and Mark.
> >
> >   northd/ovn-northd.8.xml | 41 +++++++++++++++++-----
> >   northd/ovn-northd.c     | 77 +++++++++++++++++++++++++----------------
> >   2 files changed, 80 insertions(+), 38 deletions(-)
> >
> > diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
> > index a275442a82..a9826e8e9e 100644
> > --- a/northd/ovn-northd.8.xml
> > +++ b/northd/ovn-northd.8.xml
> > @@ -340,15 +340,12 @@
> >         it contains a priority-110 flow to move IPv6 Neighbor Discovery
> traffic
> >         to the next table. If load balancing rules with virtual IP
> addresses
> >         (and ports) are configured in <code>OVN_Northbound</code>
> database for a
> > -      logical switch datapath, a priority-100 flow is added for each
> configured
> > -      virtual IP address <var>VIP</var>. For IPv4 <var>VIPs</var>, the
> match is
> > -      <code>ip &amp;&amp; ip4.dst == <var>VIP</var></code>. For IPv6
> > -      <var>VIPs</var>, the match is <code>ip &amp;&amp;
> > -      ip6.dst == <var>VIP</var></code>. The flow sets an action
> > +      logical switch datapath, a priority-100 flow is added with the
> match
> > +      <code>ip</code> to match on IP packets and sets the action
> >         <code>reg0[0] = 1; next;</code> to act as a hint for table
> >         <code>Pre-stateful</code> to send IP packets to the connection
> tracker
> > -      for packet de-fragmentation before eventually advancing to
> ingress table
> > -      <code>LB</code>.
> > +      for packet de-fragmentation before eventually advancing to ingress
> > +      table <code>LB</code>.
> >         If controller_event has been enabled and load balancing rules
> with
> >         empty backends have been added in <code>OVN_Northbound</code>, a
> 130 flow
> >         is added to trigger ovn-controller events whenever the chassis
> receives a
> > @@ -356,6 +353,32 @@
> >         previously created, it will be associated to the empty_lb
> logical flow
> >       </p>
> >
> > +    <p>
> > +      Prior to <code>OVN 20.09</code> we were setting the
> > +      <code>reg0[0] = 1</code> only if the IP destination matches the
> load
> > +      balancer VIP. However this had few issues cases where a logical
> switch
> > +      doesn't have any ACLs with <code>allow-related</code> action.
> > +      To understand the issue lets a take a TCP load balancer -
> > +      <code>10.0.0.10:80=10.0.0.3:80</code>. If a logical port - p1
> with
> > +      IP - 10.0.0.5 opens a TCP connection with the VIP - 10.0.0.10,
> then the
> > +      packet in the ingress pipeline of 'p1' is sent to the p1's
> conntrack zone
> > +      id and the packet is load balanced to the backend - 10.0.0.3. For
> the
> > +      reply packet from the backend lport, it is not sent to the
> conntrack of
> > +      backend lport's zone id. This is fine as long as the packet is
> valid.
> > +      Suppose the backend lport sends an invalid TCP packet (like
> incorrect
> > +      sequence number), the packet gets delivered to the lport 'p1'
> without
> > +      unDNATing the packet to the VIP - 10.0.0.10. And this causes the
> > +      connection to be reset by the lport p1's VIF.
> > +    </p>
> > +
> > +    <p>
> > +      We can't fix this issue by adding a logical flow to drop ct.inv
> packets
> > +      in the egress pipeline since it will drop all other connections
> not
> > +      destined to the load balancers. To fix this issue, we send all the
> > +      packets to the conntrack in the ingress pipeline if a load
> balancer is
> > +      configured. We can now add a lflow to drop ct.inv packets.
> > +    </p>
> > +
> >       <p>
> >         This table also has a priority-110 flow with the match
> >         <code>eth.dst == <var>E</var></code> for all logical switch
> > @@ -430,8 +453,8 @@
> >       <p>
> >         This table also contains a priority 0 flow with action
> >         <code>next;</code>, so that ACLs allow packets by default.  If
> the
> > -      logical datapath has a statetful ACL, the following flows will
> > -      also be added:
> > +      logical datapath has a statetful ACL or a load balancer with VIP
> > +      configured, the following flows will also be added:
> >       </p>
> >
> >       <ul>
> > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> > index b95d6cd8a1..3967bae569 100644
> > --- a/northd/ovn-northd.c
> > +++ b/northd/ovn-northd.c
> > @@ -5034,6 +5034,19 @@ build_empty_lb_event_flow(struct ovn_datapath
> *od, struct hmap *lflows,
> >       free(action);
> >   }
> >
> > +static bool
> > +has_lb_vip(struct ovn_datapath *od)
> > +{
> > +    for (int i = 0; i < od->nbs->n_load_balancer; i++) {
> > +        struct nbrec_load_balancer *nb_lb = od->nbs->load_balancer[i];
> > +        if (!smap_is_empty(&nb_lb->vips)) {
> > +            return true;
> > +        }
> > +    }
> > +
> > +    return false;
> > +}
> > +
> >   static void
> >   build_pre_lb(struct ovn_datapath *od, struct hmap *lflows,
> >                struct shash *meter_groups, struct hmap *lbs)
> > @@ -5072,8 +5085,6 @@ build_pre_lb(struct ovn_datapath *od, struct hmap
> *lflows,
> >                                    110, lflows);
> >       }
> >
> > -    struct sset all_ips_v4 = SSET_INITIALIZER(&all_ips_v4);
> > -    struct sset all_ips_v6 = SSET_INITIALIZER(&all_ips_v6);
> >       bool vip_configured = false;
> >       for (int i = 0; i < od->nbs->n_load_balancer; i++) {
> >           struct nbrec_load_balancer *nb_lb = od->nbs->load_balancer[i];
> > @@ -5083,12 +5094,6 @@ build_pre_lb(struct ovn_datapath *od, struct hmap
> *lflows,
> >
> >           for (size_t j = 0; j < lb->n_vips; j++) {
> >               struct lb_vip *lb_vip = &lb->vips[j];
> > -            if (lb_vip->addr_family == AF_INET) {
> > -                sset_add(&all_ips_v4, lb_vip->vip);
> > -            } else {
> > -                sset_add(&all_ips_v6, lb_vip->vip);
> > -            }
> > -
> >               build_empty_lb_event_flow(od, lflows, lb_vip, nb_lb,
> >                                         S_SWITCH_IN_PRE_LB,
> meter_groups);
> >
> > @@ -5102,26 +5107,37 @@ build_pre_lb(struct ovn_datapath *od, struct
> hmap *lflows,
> >       }
> >
> >       /* 'REGBIT_CONNTRACK_DEFRAG' is set to let the pre-stateful table
> send
> > -     * packet to conntrack for defragmentation. */
> > -    const char *ip_address;
> > -    SSET_FOR_EACH (ip_address, &all_ips_v4) {
> > -        char *match = xasprintf("ip && ip4.dst == %s", ip_address);
> > -        ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_LB,
> > -                      100, match, REGBIT_CONNTRACK_DEFRAG" = 1; next;");
> > -        free(match);
> > -    }
> > -
> > -    SSET_FOR_EACH (ip_address, &all_ips_v6) {
> > -        char *match = xasprintf("ip && ip6.dst == %s", ip_address);
> > -        ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_LB,
> > -                      100, match, REGBIT_CONNTRACK_DEFRAG" = 1; next;");
> > -        free(match);
> > -    }
> > -
> > -    sset_destroy(&all_ips_v4);
> > -    sset_destroy(&all_ips_v6);
> > -
> > +     * packet to conntrack for defragmentation.
> > +     *
> > +     * Send all the packets to conntrack in the ingress pipeline if the
> > +     * logical switch has a load balancer with VIP configured. Earlier
> > +     * we used to set the REGBIT_CONNTRACK_DEFRAG flag in the ingress
> pipeline
> > +     * if the IP destination matches the VIP. But this causes few
> issues when
> > +     * a logical switch has no ACLs configured with allow-related.
> > +     * To understand the issue, lets a take a TCP load balancer -
> > +     * 10.0.0.10:80=10.0.0.3:80.
> > +     * If a logical port - p1 with IP - 10.0.0.5 opens a TCP connection
> with
> > +     * the VIP - 10.0.0.10, then the packet in the ingress pipeline of
> 'p1'
> > +     * is sent to the p1's conntrack zone id and the packet is load
> balanced
> > +     * to the backend - 10.0.0.3. For the reply packet from the backend
> lport,
> > +     * it is not sent to the conntrack of backend lport's zone id. This
> is fine
> > +     * as long as the packet is valid. Suppose the backend lport sends
> an
> > +     *  invalid TCP packet (like incorrect sequence number), the packet
> gets
> > +     * delivered to the lport 'p1' without unDNATing the packet to the
> > +     * VIP - 10.0.0.10. And this causes the connection to be reset by
> the
> > +     * lport p1's VIF.
> > +     *
> > +     * We can't fix this issue by adding a logical flow to drop ct.inv
> packets
> > +     * in the egress pipeline since it will drop all other connections
> not
> > +     * destined to the load balancers.
> > +     *
> > +     * To fix this issue, we send all the packets to the conntrack in
> the
> > +     * ingress pipeline if a load balancer is configured. We can now
> > +     * add a lflow to drop ct.inv packets.
> > +     */
> >       if (vip_configured) {
> > +        ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_LB,
> > +                      100, "ip", REGBIT_CONNTRACK_DEFRAG" = 1; next;");
> >           ovn_lflow_add(lflows, od, S_SWITCH_OUT_PRE_LB,
> >                         100, "ip", REGBIT_CONNTRACK_DEFRAG" = 1; next;");
> >       }
> > @@ -5482,7 +5498,7 @@ static void
> >   build_acls(struct ovn_datapath *od, struct hmap *lflows,
> >              struct hmap *port_groups)
> >   {
> > -    bool has_stateful = has_stateful_acl(od);
> > +    bool has_stateful = (has_stateful_acl(od) || has_lb_vip(od));
> >
> >       /* Ingress and Egress ACL Table (Priority 0): Packets are allowed
> by
> >        * default.  A related rule at priority 1 is added below if there
> > @@ -5746,12 +5762,15 @@ build_lb(struct ovn_datapath *od, struct hmap
> *lflows)
> >       ovn_lflow_add(lflows, od, S_SWITCH_IN_LB, 0, "1", "next;");
> >       ovn_lflow_add(lflows, od, S_SWITCH_OUT_LB, 0, "1", "next;");
> >
> > -    if (od->nbs->load_balancer) {
> > +    if (od->nbs->n_load_balancer) {
> >           for (size_t i = 0; i < od->n_router_ports; i++) {
> >               skip_port_from_conntrack(od, od->router_ports[i],
> >                                        S_SWITCH_IN_LB, S_SWITCH_OUT_LB,
> >                                        UINT16_MAX, lflows);
> >           }
> > +    }
> > +
> > +    if (has_lb_vip(od)) {
> >           /* Ingress and Egress LB Table (Priority 65534).
> >            *
> >            * Send established traffic through conntrack for just NAT. */
> >
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
>
diff mbox series

Patch

diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
index a275442a82..a9826e8e9e 100644
--- a/northd/ovn-northd.8.xml
+++ b/northd/ovn-northd.8.xml
@@ -340,15 +340,12 @@ 
       it contains a priority-110 flow to move IPv6 Neighbor Discovery traffic
       to the next table. If load balancing rules with virtual IP addresses
       (and ports) are configured in <code>OVN_Northbound</code> database for a
-      logical switch datapath, a priority-100 flow is added for each configured
-      virtual IP address <var>VIP</var>. For IPv4 <var>VIPs</var>, the match is
-      <code>ip &amp;&amp; ip4.dst == <var>VIP</var></code>. For IPv6
-      <var>VIPs</var>, the match is <code>ip &amp;&amp;
-      ip6.dst == <var>VIP</var></code>. The flow sets an action
+      logical switch datapath, a priority-100 flow is added with the match
+      <code>ip</code> to match on IP packets and sets the action
       <code>reg0[0] = 1; next;</code> to act as a hint for table
       <code>Pre-stateful</code> to send IP packets to the connection tracker
-      for packet de-fragmentation before eventually advancing to ingress table
-      <code>LB</code>.
+      for packet de-fragmentation before eventually advancing to ingress
+      table <code>LB</code>.
       If controller_event has been enabled and load balancing rules with
       empty backends have been added in <code>OVN_Northbound</code>, a 130 flow
       is added to trigger ovn-controller events whenever the chassis receives a
@@ -356,6 +353,32 @@ 
       previously created, it will be associated to the empty_lb logical flow
     </p>
 
+    <p>
+      Prior to <code>OVN 20.09</code> we were setting the
+      <code>reg0[0] = 1</code> only if the IP destination matches the load
+      balancer VIP. However this had few issues cases where a logical switch
+      doesn't have any ACLs with <code>allow-related</code> action.
+      To understand the issue lets a take a TCP load balancer -
+      <code>10.0.0.10:80=10.0.0.3:80</code>. If a logical port - p1 with
+      IP - 10.0.0.5 opens a TCP connection with the VIP - 10.0.0.10, then the
+      packet in the ingress pipeline of 'p1' is sent to the p1's conntrack zone
+      id and the packet is load balanced to the backend - 10.0.0.3. For the
+      reply packet from the backend lport, it is not sent to the conntrack of
+      backend lport's zone id. This is fine as long as the packet is valid.
+      Suppose the backend lport sends an invalid TCP packet (like incorrect
+      sequence number), the packet gets delivered to the lport 'p1' without
+      unDNATing the packet to the VIP - 10.0.0.10. And this causes the
+      connection to be reset by the lport p1's VIF.
+    </p>
+
+    <p>
+      We can't fix this issue by adding a logical flow to drop ct.inv packets
+      in the egress pipeline since it will drop all other connections not
+      destined to the load balancers. To fix this issue, we send all the
+      packets to the conntrack in the ingress pipeline if a load balancer is
+      configured. We can now add a lflow to drop ct.inv packets.
+    </p>
+
     <p>
       This table also has a priority-110 flow with the match
       <code>eth.dst == <var>E</var></code> for all logical switch
@@ -430,8 +453,8 @@ 
     <p>
       This table also contains a priority 0 flow with action
       <code>next;</code>, so that ACLs allow packets by default.  If the
-      logical datapath has a statetful ACL, the following flows will
-      also be added:
+      logical datapath has a statetful ACL or a load balancer with VIP
+      configured, the following flows will also be added:
     </p>
 
     <ul>
diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index b95d6cd8a1..3967bae569 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -5034,6 +5034,19 @@  build_empty_lb_event_flow(struct ovn_datapath *od, struct hmap *lflows,
     free(action);
 }
 
+static bool
+has_lb_vip(struct ovn_datapath *od)
+{
+    for (int i = 0; i < od->nbs->n_load_balancer; i++) {
+        struct nbrec_load_balancer *nb_lb = od->nbs->load_balancer[i];
+        if (!smap_is_empty(&nb_lb->vips)) {
+            return true;
+        }
+    }
+
+    return false;
+}
+
 static void
 build_pre_lb(struct ovn_datapath *od, struct hmap *lflows,
              struct shash *meter_groups, struct hmap *lbs)
@@ -5072,8 +5085,6 @@  build_pre_lb(struct ovn_datapath *od, struct hmap *lflows,
                                  110, lflows);
     }
 
-    struct sset all_ips_v4 = SSET_INITIALIZER(&all_ips_v4);
-    struct sset all_ips_v6 = SSET_INITIALIZER(&all_ips_v6);
     bool vip_configured = false;
     for (int i = 0; i < od->nbs->n_load_balancer; i++) {
         struct nbrec_load_balancer *nb_lb = od->nbs->load_balancer[i];
@@ -5083,12 +5094,6 @@  build_pre_lb(struct ovn_datapath *od, struct hmap *lflows,
 
         for (size_t j = 0; j < lb->n_vips; j++) {
             struct lb_vip *lb_vip = &lb->vips[j];
-            if (lb_vip->addr_family == AF_INET) {
-                sset_add(&all_ips_v4, lb_vip->vip);
-            } else {
-                sset_add(&all_ips_v6, lb_vip->vip);
-            }
-
             build_empty_lb_event_flow(od, lflows, lb_vip, nb_lb,
                                       S_SWITCH_IN_PRE_LB, meter_groups);
 
@@ -5102,26 +5107,37 @@  build_pre_lb(struct ovn_datapath *od, struct hmap *lflows,
     }
 
     /* 'REGBIT_CONNTRACK_DEFRAG' is set to let the pre-stateful table send
-     * packet to conntrack for defragmentation. */
-    const char *ip_address;
-    SSET_FOR_EACH (ip_address, &all_ips_v4) {
-        char *match = xasprintf("ip && ip4.dst == %s", ip_address);
-        ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_LB,
-                      100, match, REGBIT_CONNTRACK_DEFRAG" = 1; next;");
-        free(match);
-    }
-
-    SSET_FOR_EACH (ip_address, &all_ips_v6) {
-        char *match = xasprintf("ip && ip6.dst == %s", ip_address);
-        ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_LB,
-                      100, match, REGBIT_CONNTRACK_DEFRAG" = 1; next;");
-        free(match);
-    }
-
-    sset_destroy(&all_ips_v4);
-    sset_destroy(&all_ips_v6);
-
+     * packet to conntrack for defragmentation.
+     *
+     * Send all the packets to conntrack in the ingress pipeline if the
+     * logical switch has a load balancer with VIP configured. Earlier
+     * we used to set the REGBIT_CONNTRACK_DEFRAG flag in the ingress pipeline
+     * if the IP destination matches the VIP. But this causes few issues when
+     * a logical switch has no ACLs configured with allow-related.
+     * To understand the issue, lets a take a TCP load balancer -
+     * 10.0.0.10:80=10.0.0.3:80.
+     * If a logical port - p1 with IP - 10.0.0.5 opens a TCP connection with
+     * the VIP - 10.0.0.10, then the packet in the ingress pipeline of 'p1'
+     * is sent to the p1's conntrack zone id and the packet is load balanced
+     * to the backend - 10.0.0.3. For the reply packet from the backend lport,
+     * it is not sent to the conntrack of backend lport's zone id. This is fine
+     * as long as the packet is valid. Suppose the backend lport sends an
+     *  invalid TCP packet (like incorrect sequence number), the packet gets
+     * delivered to the lport 'p1' without unDNATing the packet to the
+     * VIP - 10.0.0.10. And this causes the connection to be reset by the
+     * lport p1's VIF.
+     *
+     * We can't fix this issue by adding a logical flow to drop ct.inv packets
+     * in the egress pipeline since it will drop all other connections not
+     * destined to the load balancers.
+     *
+     * To fix this issue, we send all the packets to the conntrack in the
+     * ingress pipeline if a load balancer is configured. We can now
+     * add a lflow to drop ct.inv packets.
+     */
     if (vip_configured) {
+        ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_LB,
+                      100, "ip", REGBIT_CONNTRACK_DEFRAG" = 1; next;");
         ovn_lflow_add(lflows, od, S_SWITCH_OUT_PRE_LB,
                       100, "ip", REGBIT_CONNTRACK_DEFRAG" = 1; next;");
     }
@@ -5482,7 +5498,7 @@  static void
 build_acls(struct ovn_datapath *od, struct hmap *lflows,
            struct hmap *port_groups)
 {
-    bool has_stateful = has_stateful_acl(od);
+    bool has_stateful = (has_stateful_acl(od) || has_lb_vip(od));
 
     /* Ingress and Egress ACL Table (Priority 0): Packets are allowed by
      * default.  A related rule at priority 1 is added below if there
@@ -5746,12 +5762,15 @@  build_lb(struct ovn_datapath *od, struct hmap *lflows)
     ovn_lflow_add(lflows, od, S_SWITCH_IN_LB, 0, "1", "next;");
     ovn_lflow_add(lflows, od, S_SWITCH_OUT_LB, 0, "1", "next;");
 
-    if (od->nbs->load_balancer) {
+    if (od->nbs->n_load_balancer) {
         for (size_t i = 0; i < od->n_router_ports; i++) {
             skip_port_from_conntrack(od, od->router_ports[i],
                                      S_SWITCH_IN_LB, S_SWITCH_OUT_LB,
                                      UINT16_MAX, lflows);
         }
+    }
+
+    if (has_lb_vip(od)) {
         /* Ingress and Egress LB Table (Priority 65534).
          *
          * Send established traffic through conntrack for just NAT. */