mbox series

[ovs-dev,v2,0/3] ovn-controller: Fix and refactor chassis ovn-sbdb record init

Message ID 20190708100643.25904.50437.stgit@dceara.remote.csb
Headers show
Series ovn-controller: Fix and refactor chassis ovn-sbdb record init | expand

Message

Dumitru Ceara July 8, 2019, 10:06 a.m. UTC
The chassis_run code didn't take into account the scenario when the
system-id was changed in the Open_vSwitch table. Due to this the code
was trying to insert a new Chassis record in the OVN_Southbound DB with
the same Encaps as the previous Chassis record. The transaction used
to insert the new records was aborting due to the ["type", "ip"]
index constraint violation as we were creating new Encap entries with
the same "type" and "ip" as the old ones.

This series:
1. introduces a (partial) fix for the reported issue by storing in
ovn-controller memory the last successfully configured chassis-id.
2. refactors the code in chassis.c to abstract out the string
processing and make available the ovs configuration to the code.
that looks up stale chassis entries in the OVN_Southbound DB
3. completes the fix at point 1 above by taking care of the
scenario when ovn-controller restarts after stopping forcefully
(e.g., due to a crash) and the OVS system-id has changed in the
meantime.

Dumitru Ceara (3):
      ovn-controller: Fix chassis ovn-sbdb record init
      ovn-controller: Refactor chassis.c to abstract the string parsing
      ovn-controller: Update stale chassis entry at init


 ovn/controller/chassis.c        |  676 +++++++++++++++++++++++++++------------
 ovn/controller/chassis.h        |    2 
 ovn/controller/ovn-controller.c |   26 +-
 tests/ovn-controller.at         |    9 +
 4 files changed, 501 insertions(+), 212 deletions(-)


---
v2: rebase

Comments

Ben Pfaff July 8, 2019, 9:05 p.m. UTC | #1
On Mon, Jul 08, 2019 at 12:06:45PM +0200, Dumitru Ceara wrote:
> The chassis_run code didn't take into account the scenario when the
> system-id was changed in the Open_vSwitch table. Due to this the code
> was trying to insert a new Chassis record in the OVN_Southbound DB with
> the same Encaps as the previous Chassis record. The transaction used
> to insert the new records was aborting due to the ["type", "ip"]
> index constraint violation as we were creating new Encap entries with
> the same "type" and "ip" as the old ones.

Thanks.  I applied this series to master.
Han Zhou July 24, 2019, 10:50 p.m. UTC | #2
On Mon, Jul 8, 2019 at 2:11 PM Ben Pfaff <blp@ovn.org> wrote:
>
> On Mon, Jul 08, 2019 at 12:06:45PM +0200, Dumitru Ceara wrote:
> > The chassis_run code didn't take into account the scenario when the
> > system-id was changed in the Open_vSwitch table. Due to this the code
> > was trying to insert a new Chassis record in the OVN_Southbound DB with
> > the same Encaps as the previous Chassis record. The transaction used
> > to insert the new records was aborting due to the ["type", "ip"]
> > index constraint violation as we were creating new Encap entries with
> > the same "type" and "ip" as the old ones.
>
> Thanks.  I applied this series to master.
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Hi Dumitru,

When reviewing Numan's fix "ovn-controller: Fix the chassis row recreation
issue" I found this original change and I have a question here regarding
this series. I tried this feature when SSL & RBAC is enabled, and it seems
not working as this patch declared. I used the OVN sandbox (which uses SSL
by default) to test.

Initially:
$ ovn-sbctl show
Chassis "chassis-1"
    hostname: sandbox
    Encap geneve
        ip: "127.0.0.1"
        options: {csum="true"}

Then update chassis id:
$ ovs-vsctl set open . external_ids:system-id="chassis-2"

The SB DB didn't get updated, and there are warn logs:
2019-07-24T08:28:51.036Z|00015|ovsdb_idl|WARN|transaction error:
{"details":"RBAC rules for client \"chassis-1\" role \"ovn-controller\"
prohibit modification of table \"Chassis\".","error":"permission error"}
2019-07-24T08:28:51.036Z|00016|chassis|WARN|Could not find Chassis : stored
(chassis-2) ovs (chassis-2)

This seems to be expected, because otherwise RBAC is malfunctioning.
However, I am not sure what is the goal of this patch. Is it supposed to
solve the problem only when HV uses TCP but not for SSL? If so, shall this
behaviour be clarified in some documents? Or did I misunderstood something?
(Sorry that I was not able to post the question during the patch review.)

Thanks,
Han
Dumitru Ceara July 25, 2019, 8:08 a.m. UTC | #3
On Thu, Jul 25, 2019 at 12:51 AM Han Zhou <zhouhan@gmail.com> wrote:
>
>
>
> On Mon, Jul 8, 2019 at 2:11 PM Ben Pfaff <blp@ovn.org> wrote:
> >
> > On Mon, Jul 08, 2019 at 12:06:45PM +0200, Dumitru Ceara wrote:
> > > The chassis_run code didn't take into account the scenario when the
> > > system-id was changed in the Open_vSwitch table. Due to this the code
> > > was trying to insert a new Chassis record in the OVN_Southbound DB with
> > > the same Encaps as the previous Chassis record. The transaction used
> > > to insert the new records was aborting due to the ["type", "ip"]
> > > index constraint violation as we were creating new Encap entries with
> > > the same "type" and "ip" as the old ones.
> >
> > Thanks.  I applied this series to master.
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
> Hi Dumitru,
>
> When reviewing Numan's fix "ovn-controller: Fix the chassis row recreation issue" I found this original change and I have a question here regarding this series. I tried this feature when SSL & RBAC is enabled, and it seems not working as this patch declared. I used the OVN sandbox (which uses SSL by default) to test.
>
> Initially:
> $ ovn-sbctl show
> Chassis "chassis-1"
>     hostname: sandbox
>     Encap geneve
>         ip: "127.0.0.1"
>         options: {csum="true"}
>
> Then update chassis id:
> $ ovs-vsctl set open . external_ids:system-id="chassis-2"
>
> The SB DB didn't get updated, and there are warn logs:
> 2019-07-24T08:28:51.036Z|00015|ovsdb_idl|WARN|transaction error: {"details":"RBAC rules for client \"chassis-1\" role \"ovn-controller\" prohibit modification of table \"Chassis\".","error":"permission error"}
> 2019-07-24T08:28:51.036Z|00016|chassis|WARN|Could not find Chassis : stored (chassis-2) ovs (chassis-2)
>
> This seems to be expected, because otherwise RBAC is malfunctioning. However, I am not sure what is the goal of this patch. Is it supposed to solve the problem only when HV uses TCP but not for SSL? If so, shall this behaviour be clarified in some documents? Or did I misunderstood something? (Sorry that I was not able to post the question during the patch review.)
>
> Thanks,
> Han

Hi Han,

You're right, changing the OVS system-id when using SSL won't work due
to RBAC and that's indeed expected. This was the behavior for
ovn-controller also before the patch. It would be good though to
document that and maybe provide the steps on how to change the ovs
system-id when using SSL: I guess that means stopping ovn-controller,
regenerating certificates and starting ovn-controller. I'll put it on
my TODO list and try to handle it soon.

Thanks,
Dumitru