[ovs-dev] ovn.at: A "peer" is only for interconnected routers.
diff mbox

Message ID 1468825230-10504-1-git-send-email-guru@ovn.org
State Accepted
Headers show

Commit Message

Guru Shetty July 18, 2016, 7 a.m. UTC
We should not use "peer" while connecting a router to a switch.
(Doing so, will cause ovn-northd to constantly create and destroy
port_binding records which causes CPU utilization of ovn-controller to
spike up.)

Fixes: 31114af758c7e6 ("ovn-nbctl: Update logical router port commands.")
Signed-off-by: Gurucharan Shetty <guru@ovn.org>
---
 tests/ovn.at | 20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

Comments

Ben Pfaff July 18, 2016, 5:35 p.m. UTC | #1
On Mon, Jul 18, 2016 at 12:00:30AM -0700, Gurucharan Shetty wrote:
> We should not use "peer" while connecting a router to a switch.
> (Doing so, will cause ovn-northd to constantly create and destroy
> port_binding records which causes CPU utilization of ovn-controller to
> spike up.)
> 
> Fixes: 31114af758c7e6 ("ovn-nbctl: Update logical router port commands.")
> Signed-off-by: Gurucharan Shetty <guru@ovn.org>

This seems like a bug in ovn-northd.  Did you investigate why it
happens?

Acked-by: Ben Pfaff <blp@ovn.org>
Flavio Fernandes July 18, 2016, 5:38 p.m. UTC | #2
On Mon, Jul 18, 2016 at 3:00 AM, Gurucharan Shetty <guru@ovn.org> wrote:

> We should not use "peer" while connecting a router to a switch.
> (Doing so, will cause ovn-northd to constantly create and destroy
> port_binding records which causes CPU utilization of ovn-controller to
> spike up.)
>
> Fixes: 31114af758c7e6 ("ovn-nbctl: Update logical router port commands.")
> Signed-off-by: Gurucharan Shetty <guru@ovn.org>
>

Acked-by: Flavio Fernandes <flavio@flaviof.com>
Guru Shetty July 18, 2016, 6:34 p.m. UTC | #3
On 18 July 2016 at 10:35, Ben Pfaff <blp@ovn.org> wrote:

> On Mon, Jul 18, 2016 at 12:00:30AM -0700, Gurucharan Shetty wrote:
> > We should not use "peer" while connecting a router to a switch.
> > (Doing so, will cause ovn-northd to constantly create and destroy
> > port_binding records which causes CPU utilization of ovn-controller to
> > spike up.)
> >
> > Fixes: 31114af758c7e6 ("ovn-nbctl: Update logical router port commands.")
> > Signed-off-by: Gurucharan Shetty <guru@ovn.org>
>
> I updated the commit message to correctly say:
    We should not use "peer" while connecting a router to a switch.
    (Doing so, will cause ovn-northd to constantly create and destroy
    logical_flow records which causes CPU utilization of ovn-controller to
    spike up.)


> This seems like a bug in ovn-northd.  Did you investigate why it
> happens?
>
I think I know ( I haven't put a debugger to confirm). We create a logical
flow incorrectly and add it via ovn_lflow_add() with a switch's datapath
and a router's pipeline stage (S_ROUTER_IN_ARP_RESOLVE) here (This is when
we incorrectly set router's peer as a switch):
https://github.com/openvswitch/ovs/blob/master/ovn/northd/ovn-northd.c#L2614

In build_lflows, we go through each record of SB database's logical flows
and then do a ovn_lflow_find() which returns a negative if the data was
wrongly inserted, so it goes ahead and deleted the record from SB database.

In the next block, it will insert it back into the SB database. I will send
one additional fix. But, I think we should also assert in ovn_lflow_add if
we add a datapath with a different datapath's pipeline - not sure what is a
nice way to do that.


>
> Acked-by: Ben Pfaff <blp@ovn.org>
>
Guru Shetty July 18, 2016, 8:57 p.m. UTC | #4
On 18 July 2016 at 13:11, Ben Pfaff <blp@ovn.org> wrote:

> On Mon, Jul 18, 2016 at 11:34:54AM -0700, Guru Shetty wrote:
> > On 18 July 2016 at 10:35, Ben Pfaff <blp@ovn.org> wrote:
> >
> > > On Mon, Jul 18, 2016 at 12:00:30AM -0700, Gurucharan Shetty wrote:
> > > > We should not use "peer" while connecting a router to a switch.
> > > > (Doing so, will cause ovn-northd to constantly create and destroy
> > > > port_binding records which causes CPU utilization of ovn-controller
> to
> > > > spike up.)
> > > >
> > > > Fixes: 31114af758c7e6 ("ovn-nbctl: Update logical router port
> commands.")
> > > > Signed-off-by: Gurucharan Shetty <guru@ovn.org>
> > >
> > > I updated the commit message to correctly say:
> >     We should not use "peer" while connecting a router to a switch.
> >     (Doing so, will cause ovn-northd to constantly create and destroy
> >     logical_flow records which causes CPU utilization of ovn-controller
> to
> >     spike up.)
> >
> >
> > > This seems like a bug in ovn-northd.  Did you investigate why it
> > > happens?
> > >
> > I think I know ( I haven't put a debugger to confirm). We create a
> logical
> > flow incorrectly and add it via ovn_lflow_add() with a switch's datapath
> > and a router's pipeline stage (S_ROUTER_IN_ARP_RESOLVE) here (This is
> when
> > we incorrectly set router's peer as a switch):
> >
> https://github.com/openvswitch/ovs/blob/master/ovn/northd/ovn-northd.c#L2614
>
> Oh, good spotting, can we fix it something like this?
>
> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> index 7ce509d..97e3271 100644
> --- a/ovn/northd/ovn-northd.c
> +++ b/ovn/northd/ovn-northd.c
> @@ -715,7 +736,10 @@ join_logical_ports(struct northd_context *ctx,
>                  sizeof *op->od->router_ports * (op->od->n_router_ports +
> 1));
>              op->od->router_ports[op->od->n_router_ports++] = op;
>          } else if (op->nbr && op->nbr->peer) {
> -            op->peer = ovn_port_find(ports, op->nbr->peer);
> +            struct ovn_port *peer = ovn_port_find(ports, op->nbr->peer);
> +            if (peer && peer->nbr) {
> +                op->peer = peer;
> +            }
>          }
>      }
>  }
>
Since the original patch in question fixed faulty tests independently, I
had applied it as soon as I got your Ack. The above code snippet is good
underlying fix. So you have my:
Acked-by: Gurucharan Shetty <guru@ovn.org>


>
> > In build_lflows, we go through each record of SB database's logical flows
> > and then do a ovn_lflow_find() which returns a negative if the data was
> > wrongly inserted, so it goes ahead and deleted the record from SB
> database.
> >
> > In the next block, it will insert it back into the SB database. I will
> send
> > one additional fix. But, I think we should also assert in ovn_lflow_add
> if
> > we add a datapath with a different datapath's pipeline - not sure what
> is a
> > nice way to do that.
>
> Here's an idea (it compiles but I didn't test it):
>
Nice. I tested the below and the tests asserted with a wrong configuration.
You have my:
Acked-by: Gurucharan Shetty <guru@ovn.org>

I think the below patch which Ryan Acked is probably still relevant to
apply?
https://patchwork.ozlabs.org/patch/649723/

>
> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> index 7ce509d..905f6a7 100644
> --- a/ovn/northd/ovn-northd.c
> +++ b/ovn/northd/ovn-northd.c
> @@ -175,6 +175,20 @@ ovn_stage_to_str(enum ovn_stage stage)
>          default: return "<unknown>";
>      }
>  }
> +
> +/* Returns the type of the datapath to which a flow with the given
> 'stage' may
> + * be added. */
> +static enum ovn_datapath_type
> +ovn_stage_to_datapath_type(enum ovn_stage stage)
> +{
> +    switch (stage) {
> +#define PIPELINE_STAGE(DP_TYPE, PIPELINE, STAGE, TABLE, NAME)       \
> +        case S_##DP_TYPE##_##PIPELINE##_##STAGE: return DP_##DP_TYPE;
> +    PIPELINE_STAGES
> +#undef PIPELINE_STAGE
> +    default: OVS_NOT_REACHED();
> +    }
> +}
>
>  static void
>  usage(void)
> @@ -303,6 +317,13 @@ ovn_datapath_destroy(struct hmap *datapaths, struct
> ovn_datapath *od)
>      }
>  }
>
> +/* Returns 'od''s datapath type. */
> +static enum ovn_datapath_type
> +ovn_datapath_get_type(const struct ovn_datapath *od)
> +{
> +    return od->nbs ? DP_SWITCH : DP_ROUTER;
> +}
> +
>  static struct ovn_datapath *
>  ovn_datapath_find(struct hmap *datapaths, const struct uuid *uuid)
>  {
> @@ -985,6 +1006,8 @@ ovn_lflow_add(struct hmap *lflow_map, struct
> ovn_datapath *od,
>                enum ovn_stage stage, uint16_t priority,
>                const char *match, const char *actions)
>  {
> +    ovs_assert(ovn_stage_to_datapath_type(stage) ==
> ovn_datapath_get_type(od));
> +
>      struct ovn_lflow *lflow = xmalloc(sizeof *lflow);
>      ovn_lflow_init(lflow, od, stage, priority,
>                     xstrdup(match), xstrdup(actions));
>
> Thanks,
>
> Ben.
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev
>
Ben Pfaff July 19, 2016, 3:44 p.m. UTC | #5
On Mon, Jul 18, 2016 at 01:57:53PM -0700, Guru Shetty wrote:
> On 18 July 2016 at 13:11, Ben Pfaff <blp@ovn.org> wrote:
> 
> > On Mon, Jul 18, 2016 at 11:34:54AM -0700, Guru Shetty wrote:
> > > On 18 July 2016 at 10:35, Ben Pfaff <blp@ovn.org> wrote:
> > >
> > > > On Mon, Jul 18, 2016 at 12:00:30AM -0700, Gurucharan Shetty wrote:
> > > > > We should not use "peer" while connecting a router to a switch.
> > > > > (Doing so, will cause ovn-northd to constantly create and destroy
> > > > > port_binding records which causes CPU utilization of ovn-controller
> > to
> > > > > spike up.)
> > > > >
> > > > > Fixes: 31114af758c7e6 ("ovn-nbctl: Update logical router port
> > commands.")
> > > > > Signed-off-by: Gurucharan Shetty <guru@ovn.org>
> > > >
> > > > I updated the commit message to correctly say:
> > >     We should not use "peer" while connecting a router to a switch.
> > >     (Doing so, will cause ovn-northd to constantly create and destroy
> > >     logical_flow records which causes CPU utilization of ovn-controller
> > to
> > >     spike up.)
> > >
> > >
> > > > This seems like a bug in ovn-northd.  Did you investigate why it
> > > > happens?
> > > >
> > > I think I know ( I haven't put a debugger to confirm). We create a
> > logical
> > > flow incorrectly and add it via ovn_lflow_add() with a switch's datapath
> > > and a router's pipeline stage (S_ROUTER_IN_ARP_RESOLVE) here (This is
> > when
> > > we incorrectly set router's peer as a switch):
> > >
> > https://github.com/openvswitch/ovs/blob/master/ovn/northd/ovn-northd.c#L2614
> >
> > Oh, good spotting, can we fix it something like this?
> >
> > diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> > index 7ce509d..97e3271 100644
> > --- a/ovn/northd/ovn-northd.c
> > +++ b/ovn/northd/ovn-northd.c
> > @@ -715,7 +736,10 @@ join_logical_ports(struct northd_context *ctx,
> >                  sizeof *op->od->router_ports * (op->od->n_router_ports +
> > 1));
> >              op->od->router_ports[op->od->n_router_ports++] = op;
> >          } else if (op->nbr && op->nbr->peer) {
> > -            op->peer = ovn_port_find(ports, op->nbr->peer);
> > +            struct ovn_port *peer = ovn_port_find(ports, op->nbr->peer);
> > +            if (peer && peer->nbr) {
> > +                op->peer = peer;
> > +            }
> >          }
> >      }
> >  }
> >
> Since the original patch in question fixed faulty tests independently, I
> had applied it as soon as I got your Ack. The above code snippet is good
> underlying fix. So you have my:
> Acked-by: Gurucharan Shetty <guru@ovn.org>

Thanks for the ack!

> 
> >
> > > In build_lflows, we go through each record of SB database's logical flows
> > > and then do a ovn_lflow_find() which returns a negative if the data was
> > > wrongly inserted, so it goes ahead and deleted the record from SB
> > database.
> > >
> > > In the next block, it will insert it back into the SB database. I will
> > send
> > > one additional fix. But, I think we should also assert in ovn_lflow_add
> > if
> > > we add a datapath with a different datapath's pipeline - not sure what
> > is a
> > > nice way to do that.
> >
> > Here's an idea (it compiles but I didn't test it):
> >
> Nice. I tested the below and the tests asserted with a wrong configuration.
> You have my:
> Acked-by: Gurucharan Shetty <guru@ovn.org>

Thanks for this ack too.

I posted these as proper patches in case anyone has a comment:
        https://patchwork.ozlabs.org/patch/650263/
        https://patchwork.ozlabs.org/patch/650264/

I forgot to apply your acks to them, but I'll do that in a day or so and
commit them if no one else comments.

> I think the below patch which Ryan Acked is probably still relevant to
> apply?
> https://patchwork.ozlabs.org/patch/649723/

I'll take a look, thanks.
Ben Pfaff July 19, 2016, 4:08 p.m. UTC | #6
On Tue, Jul 19, 2016 at 08:44:23AM -0700, Ben Pfaff wrote:
> On Mon, Jul 18, 2016 at 01:57:53PM -0700, Guru Shetty wrote:
> I posted these as proper patches in case anyone has a comment:
>         https://patchwork.ozlabs.org/patch/650263/
>         https://patchwork.ozlabs.org/patch/650264/
> 
> I forgot to apply your acks to them, but I'll do that in a day or so and
> commit them if no one else comments.
> 
> > I think the below patch which Ryan Acked is probably still relevant to
> > apply?
> > https://patchwork.ozlabs.org/patch/649723/
> 
> I'll take a look, thanks.

I ended up posting a new, revised series:
        https://patchwork.ozlabs.org/patch/650269/
        https://patchwork.ozlabs.org/patch/650270/
        https://patchwork.ozlabs.org/patch/650271/
Ryan Moats July 19, 2016, 4:10 p.m. UTC | #7
"dev" <dev-bounces@openvswitch.org> wrote on 07/19/2016 11:08:10 AM:

> From: Ben Pfaff <blp@ovn.org>
> To: Guru Shetty <guru@ovn.org>
> Cc: ovs dev <dev@openvswitch.org>
> Date: 07/19/2016 11:08 AM
> Subject: Re: [ovs-dev] [PATCH] ovn.at: A "peer" is only for
> interconnected routers.
> Sent by: "dev" <dev-bounces@openvswitch.org>
>
> On Tue, Jul 19, 2016 at 08:44:23AM -0700, Ben Pfaff wrote:
> > On Mon, Jul 18, 2016 at 01:57:53PM -0700, Guru Shetty wrote:
> > I posted these as proper patches in case anyone has a comment:
> >         https://patchwork.ozlabs.org/patch/650263/
> >         https://patchwork.ozlabs.org/patch/650264/
> >
> > I forgot to apply your acks to them, but I'll do that in a day or so
and
> > commit them if no one else comments.
> >
> > > I think the below patch which Ryan Acked is probably still relevant
to
> > > apply?
> > > https://patchwork.ozlabs.org/patch/649723/
> >
> > I'll take a look, thanks.
>
> I ended up posting a new, revised series:
>         https://patchwork.ozlabs.org/patch/650269/
>         https://patchwork.ozlabs.org/patch/650270/
>         https://patchwork.ozlabs.org/patch/650271/

I'll take a look and see if anything bothers me :)

Ryan

Patch
diff mbox

diff --git a/tests/ovn.at b/tests/ovn.at
index 12de125..4f077d4 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -1508,8 +1508,7 @@  done
 ovn-nbctl lr-add lr0
 for i in 1 2 3; do
     for j in 1 2 3; do
-        ovn-nbctl lrp-add lr0 lrp$i$j 00:00:00:00:ff:$i$j 192.168.$i$j.254/24 \
-            peer=lrp$i$j-attachment
+        ovn-nbctl lrp-add lr0 lrp$i$j 00:00:00:00:ff:$i$j 192.168.$i$j.254/24
         ovn-nbctl \
             -- lsp-add ls$i lrp$i$j-attachment \
             -- set Logical_Switch_Port lrp$i$j-attachment type=router \
@@ -2305,13 +2304,13 @@  ovn-nbctl ls-add ls1
 ovn-nbctl ls-add ls2
 
 # Connect ls1 to R1
-ovn-nbctl lrp-add R1 ls1 00:00:00:01:02:03 192.168.1.1/24 peer=rp-ls1
+ovn-nbctl lrp-add R1 ls1 00:00:00:01:02:03 192.168.1.1/24
 
 ovn-nbctl lsp-add ls1 rp-ls1 -- set Logical_Switch_Port rp-ls1 type=router \
   options:router-port=ls1 addresses=\"00:00:00:01:02:03\"
 
 # Connect ls2 to R2
-ovn-nbctl lrp-add R2 ls2 00:00:00:01:02:04 172.16.1.1/24 peer=rp-ls2
+ovn-nbctl lrp-add R2 ls2 00:00:00:01:02:04 172.16.1.1/24
 
 ovn-nbctl lsp-add ls2 rp-ls2 -- set Logical_Switch_Port rp-ls2 type=router \
   options:router-port=ls2 addresses=\"00:00:00:01:02:04\"
@@ -2430,8 +2429,7 @@  ovn-nbctl lr-add R1
 ovn-nbctl ls-add ls1
 
 # Connect ls1 to R1
-ovn-nbctl lrp-add R1 ls1 00:00:00:01:02:03 192.168.1.1/24 172.16.1.1/24 \
-          peer=rp-ls1
+ovn-nbctl lrp-add R1 ls1 00:00:00:01:02:03 192.168.1.1/24 172.16.1.1/24
 ovn-nbctl lsp-add ls1 rp-ls1 -- set Logical_Switch_Port rp-ls1 type=router \
           options:router-port=ls1 addresses=\"00:00:00:01:02:03\"
 
@@ -2562,12 +2560,12 @@  ovn-nbctl ls-add ls1
 ovn-nbctl ls-add ls2
 
 # Connect ls1 to R1
-ovn-nbctl lrp-add R1 ls1 00:00:00:01:02:03 192.168.1.1/24 peer=rp-ls1
+ovn-nbctl lrp-add R1 ls1 00:00:00:01:02:03 192.168.1.1/24
 ovn-nbctl lsp-add ls1 rp-ls1 -- set Logical_Switch_Port rp-ls1 type=router \
           options:router-port=ls1 addresses=\"00:00:00:01:02:03\"
 
 # Connect ls2 to R1
-ovn-nbctl lrp-add R1 ls2 00:00:00:01:02:04 172.16.1.1/24 peer=rp-ls2
+ovn-nbctl lrp-add R1 ls2 00:00:00:01:02:04 172.16.1.1/24
 ovn-nbctl lsp-add ls2 rp-ls2 -- set Logical_Switch_Port rp-ls2 type=router \
           options:router-port=ls2 addresses=\"00:00:00:01:02:04\"
 
@@ -2688,17 +2686,17 @@  ovn-nbctl ls-add alice
 ovn-nbctl ls-add bob
 
 # Connect foo to R1
-ovn-nbctl lrp-add R1 foo 00:00:00:01:02:03 192.168.1.1/24 peer=rp-foo
+ovn-nbctl lrp-add R1 foo 00:00:00:01:02:03 192.168.1.1/24
 ovn-nbctl lsp-add foo rp-foo -- set Logical_Switch_Port rp-foo type=router \
           options:router-port=foo addresses=\"00:00:00:01:02:03\"
 
 # Connect alice to R2
-ovn-nbctl lrp-add R2 alice 00:00:00:01:02:04 172.16.1.1/24 peer=rp-alice
+ovn-nbctl lrp-add R2 alice 00:00:00:01:02:04 172.16.1.1/24
 ovn-nbctl lsp-add alice rp-alice -- set Logical_Switch_Port rp-alice \
           type=router options:router-port=alice addresses=\"00:00:00:01:02:04\"
 
 # Connect bob to R2
-ovn-nbctl lrp-add R2 bob 00:00:00:01:02:05 172.16.2.1/24 peer=rp-bob
+ovn-nbctl lrp-add R2 bob 00:00:00:01:02:05 172.16.2.1/24
 ovn-nbctl lsp-add bob rp-bob -- set Logical_Switch_Port rp-bob type=router \
           options:router-port=bob addresses=\"00:00:00:01:02:05\"