[ovs-dev,RFC,1/3] OVN: introduce Controller_Event table

Message ID 0b49e766325c79cb22444bf26ba843bf101e8718.1558021382.git.lorenzo.bianconi@redhat.com
State RFC
Series OVN: add Controller Events

Commit Message

Lorenzo Bianconi May 16, 2019, 4:05 p.m. UTC
Add a Controller_Event table to the OVN SBDB in order to
report CMS-related events.
Introduce an event_table hashmap array and controller_event-related
structures to ovn-controller in order to track pending events
forwarded by ovs-vswitchd. Moreover, integrate the event_table hashmap
array with the Controller_Event SBDB table.

Signed-off-by: Mark Michelson <mmichels@redhat.com>
Co-authored-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
---
 include/ovn/logical-fields.h    |  17 +++++
 ovn/controller/ovn-controller.c |   2 +
 ovn/controller/pinctrl.c        | 130 ++++++++++++++++++++++++++++++++
 ovn/controller/pinctrl.h        |   2 +
 ovn/ovn-sb.ovsschema            |  16 +++-
 ovn/ovn-sb.xml                  |  33 ++++++++
 6 files changed, 197 insertions(+), 3 deletions(-)

Comments

Ben Pfaff May 24, 2019, 5:39 p.m. UTC | #1
On Thu, May 16, 2019 at 06:05:24PM +0200, Lorenzo Bianconi wrote:
> Add a Controller_Event table to the OVN SBDB in order to
> report CMS-related events.
> Introduce an event_table hashmap array and controller_event-related
> structures to ovn-controller in order to track pending events
> forwarded by ovs-vswitchd. Moreover, integrate the event_table hashmap
> array with the Controller_Event SBDB table.
> 
> Signed-off-by: Mark Michelson <mmichels@redhat.com>
> Co-authored-by: Mark Michelson <mmichels@redhat.com>
> Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>

This seems like one potentially reasonable way for ovn-controller to
report events to the CMS.  I want to point out some design aspects that
might need to be considered:

1. Controller_Event doesn't include a column to attribute an event to a
   particular Chassis.  This might be important for accounting: how can
   an ovn-controller tell which events it owns and should eventually
   delete?  Currently it looks like every ovn-controller always iterates
   over every Controller_Event; is that really desirable and scalable?
   Such a column would also allow update_sb_monitors() to monitor only
   the rows associated with the current chassis, increasing scalability
   (see the sketch after this list).

   Keep in mind that ovn-controller probably can't just keep in memory
   which events belong to it, because it might be restarted and needs to
   be able to recover that information.

2. Are these events (or each individual event, anyway) single-consumer?
   I believe that this design only works for single-consumer events
   because only one consumer can mark them as 'handled'.

3. Is there value in having 'handled', instead of just having the CMS
   delete rows that it has processed?  Using 'handled' requires an extra
   round-trip between ovn-controller and the database.

4. What is the tolerance for events that are never delivered or that are
   delivered more than once?  What can actually be guaranteed, given
   that the database can die and that ovn-controller can die?  (Also,
   OVSDB transactions cannot guarantee exactly-once semantics in corner
   cases unless the transactions are idempotent.)

5. How is the number of per-controller rows limited?
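
For point 1, a minimal sketch of the monitor conditioning, assuming a
'chassis' column is added to Controller_Event; the
sbrec_controller_event_* helpers here are hypothetical until ovsdb-idlc
generates them from the new schema:

struct ovsdb_idl_condition ce = OVSDB_IDL_CONDITION_INIT(&ce);
if (chassis) {
    /* Only watch events owned by the local chassis. */
    sbrec_controller_event_add_clause_chassis(&ce, OVSDB_F_EQ,
                                              &chassis->header_.uuid);
} else {
    ovsdb_idl_condition_add_clause_true(&ce);
}
sbrec_controller_event_set_condition(ovnsb_idl, &ce);
ovsdb_idl_condition_destroy(&ce);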

I think the workflow for these events should be clearly specified.  I
guess it's something like this:

1. ovn-controller detects that an event has occurred and adds a
   corresponding row to the Controller_Event table with false 'handled'.

2. CMS consumes the row and acts on it.

3. CMS changes 'handled' to true.

4. ovn-controller deletes row.

although I think that "3. CMS deletes row." seems more straightforward.
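
As a sketch of that simpler variant, the consumer-side loop could look
roughly like this, reusing the sbrec IDL macros this patch already uses;
handle_empty_lb_backends() is a hypothetical hook standing in for the
consumer's reaction:

const struct sbrec_controller_event *ev, *next;
SBREC_CONTROLLER_EVENT_TABLE_FOR_EACH_SAFE (ev, next, ce_table) {
    if (!strcmp(ev->event_type, "empty_lb_backends")) {
        /* Act on the event... */
        handle_empty_lb_backends(&ev->event_info);
        /* ...then delete the row instead of setting 'handled'. */
        sbrec_controller_event_delete(ev);
    }
}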

Did you consider the inter-version compatibility issues of making
event_type an enum?  It will force an upgrade order of first ovn-northd,
then the database schema, then ovn-controller, because any other order
will make it possible that ovn-controller tries to add an invalid event
type (which ovsdb-server will reject) or that an event that ovn-northd
doesn't understand nevertheless reaches it.  However,
Documentation/intro/install/ovn-upgrades.rst recommends a different
order.  In any case the upgrade implications need to be considered.

Some typo fixes:

diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml
index ae5225e18fd0..a010c95b221b 100644
--- a/ovn/ovn-sb.xml
+++ b/ovn/ovn-sb.xml
@@ -3476,8 +3476,8 @@ tcp.flags = RST;
   </table>
   <table name="Controller_Event" title="Controller Event table">
     <p>
-      Database table used by ovn-controller to report CMS related
-      events
+      Database table used by <code>ovn-controller</code> to report CMS related
+      events.  The workflow for event processing is:
     </p>
     <column name="event_type"
             type='{"type": "string", "enum": ["set", ["empty_lb_backends"]]}'>
@@ -3485,7 +3485,7 @@ tcp.flags = RST;
     </column>
     <column name="event_info">
     <p>
-      Key-value pairs used to spcify evento info to the CMS.
+      Key-value pairs used to specify event info to the CMS.
       Possible values are:
     </p>
       <ul>
@@ -3498,7 +3498,7 @@ tcp.flags = RST;
           <code>empty_lb_backends</code> event
         </li>
         <li>
-          <code>load_balancer</code>: UUID fo the load balancer reported for
+          <code>load_balancer</code>: UUID of the load balancer reported for
           the <code>empty_lb_backends</code> event
         </li>
       </ul>
Lorenzo Bianconi May 29, 2019, 2:05 p.m. UTC | #2
> On Thu, May 16, 2019 at 06:05:24PM +0200, Lorenzo Bianconi wrote:
> > Add a Controller_Event table to the OVN SBDB in order to
> > report CMS-related events.
> > Introduce an event_table hashmap array and controller_event-related
> > structures to ovn-controller in order to track pending events
> > forwarded by ovs-vswitchd. Moreover, integrate the event_table hashmap
> > array with the Controller_Event SBDB table.
> > 
> > Signed-off-by: Mark Michelson <mmichels@redhat.com>
> > Co-authored-by: Mark Michelson <mmichels@redhat.com>
> > Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>

Hi Ben,

thanks for the review, a few comments inline.

Regards,
Lorenzo

> 
> This seems like one potentially reasonable way for ovn-controller to
> report events to the CMS.  I want to point out some design aspects that
> might need to be considered:
> 
> 1. Controller_Event doesn't include a column to attribute an event to a
>    particular Chassis.  This might be important for accounting: how can
>    an ovn-controller tell which events it owns and should eventually
>    delete?  Currently it looks like every ovn-controller always iterates
>    over every Controller_Event; is that really desirable and scalable?
>    Such a column would also allow update_sb_monitors() to monitor only
>    the rows associated with the current chassis, increasing scalability.

ack, I will add it in v2

> 
>    Keep in mind that ovn-controller probably can't just keep in memory
>    which events belong to it, because it might be restarted and needs to
>    be able to recover that information.
> 
> 2. Are these events (or each individual event, anyway) single-consumer?
>    I believe that this design only works for single-consumer events
>    because only one consumer can mark them as 'handled'.

Yes, I guess a single-consumer approach would work, at least for the
architecture we were thinking about.

> 
> 3. Is there value in having 'handled', instead of just having the CMS
>    delete rows that it has processed?  Using 'handled' requires an extra
>    round-trip between ovn-controller and the database.

The main reason is to manage the Controller_Event rows in the controller, but I
guess the CMS could delete the 'handled' rows itself instead. I will fix it in v2.

> 
> 4. What is the tolerance for events that are never delivered or that are
>    delivered more than once?  What can actually be guaranteed, given
>    that the database can die and that ovn-controller can die?  (Also,
>    OVSDB transactions cannot guarantee exactly-once semantics in corner
>    cases unless the transactions are idempotent.)

If the ovn-controller dies I think there is not much we can do: events will
be lost until the controller restarts properly.
If ovn-northd or the connection to the db dies, controller_event_run() will not
manage the Controller_Event table and pinctrl_handle_event() will queue the
pending events in the event_table hash until the upper limit is reached.
We can probably add a garbage collector for the pending events in the table.
What do you think?
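
For reference, a sketch of such a garbage collector, assuming a new
'timestamp' field in struct empty_lb_backends_event (set with time_msec()
when the event is queued) and a hypothetical EVENT_TIMEOUT_MS threshold:

#define EVENT_TIMEOUT_MS 60000   /* hypothetical threshold */

static void
empty_lb_backends_event_gc(void)
{
    long long int now = time_msec();
    struct empty_lb_backends_event *ce, *next;

    HMAP_FOR_EACH_SAFE (ce, next, hmap_node,
                        &event_table[OVN_EVENT_EMPTY_LB_BACKENDS]) {
        if (now - ce->timestamp > EVENT_TIMEOUT_MS) {
            /* Drop events that have been pending too long. */
            hmap_remove(&event_table[OVN_EVENT_EMPTY_LB_BACKENDS],
                        &ce->hmap_node);
            free(ce->vip);
            free(ce->protocol);
            free(ce->load_balancer);
            free(ce);
        }
    }
}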

> 
> 5. How is the number of per-controller rows limited?

Is it enough to add a limit to the controller hash size
(https://github.com/LorenzoBianconi/ovs/blob/ovn_unidling/ovn/controller/pinctrl.c#L3488)
or do you mean adding a limit on per-controller rows in the Controller_Event
table?
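
To illustrate the first option, a sketch of a cap enforced in
pinctrl_handle_event() before queueing, reusing the
pinctrl_drop_controller_event coverage counter this patch defines;
EVENT_TABLE_MAX is a hypothetical constant:

#define EVENT_TABLE_MAX 1024   /* hypothetical per-type cap */

/* In pinctrl_handle_event(), before queueing a new event: */
if (hmap_count(&event_table[OVN_EVENT_EMPTY_LB_BACKENDS])
    >= EVENT_TABLE_MAX) {
    COVERAGE_INC(pinctrl_drop_controller_event);
    return;
}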

> 
> I think the workflow for these events should be clearly specified.  I
> guess it's something like this:
> 
> 1. ovn-controller detects that an event has occurred and adds a
>    corresponding row to the Controller_Event table with false 'handled'.
> 
> 2. CMS consumes the row and acts on it.
> 
> 3. CMS changes 'handled' to true.
> 
> 4. ovn-controller deletes row.
> 
> although I think that "3. CMS deletes row." seems more straightforward.
> 
> Did you consider the inter-version compatibility issues of making
> event_type an enum?  It will force an upgrade order of first ovn-northd,
> then the database schema, then ovn-controller, because any other order
> will make it possible that ovn-controller tries to add an invalid event
> type (which ovsdb-server will reject) or that an event that ovn-northd
> doesn't understand nevertheless reaches it.  However,
> Documentation/intro/install/ovn-upgrades.rst recommends a different
> order.  In any case the upgrade implications need to be considered.

I guess there are no compatibility issues during the upgrade, since:
- if we upgrade ovn-northd first and then the controller, 'unknown' events
  will not be forwarded by ovn-controller
- if we upgrade ovn-controller first and then ovn-northd, 'unknown' events will
  not be forwarded since the related logical flows are missing in ovn-northd

Am I missing something?

> 
> Some typo fixes:

applied, thx

> 
> diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml
> index ae5225e18fd0..a010c95b221b 100644
> --- a/ovn/ovn-sb.xml
> +++ b/ovn/ovn-sb.xml
> @@ -3476,8 +3476,8 @@ tcp.flags = RST;
>    </table>
>    <table name="Controller_Event" title="Controller Event table">
>      <p>
> -      Database table used by ovn-controller to report CMS related
> -      events
> +      Database table used by <code>ovn-controller</code> to report CMS related
> +      events.  The workflow for event processing is:
>      </p>
>      <column name="event_type"
>              type='{"type": "string", "enum": ["set", ["empty_lb_backends"]]}'>
> @@ -3485,7 +3485,7 @@ tcp.flags = RST;
>      </column>
>      <column name="event_info">
>      <p>
> -      Key-value pairs used to spcify evento info to the CMS.
> +      Key-value pairs used to specify event info to the CMS.
>        Possible values are:
>      </p>
>        <ul>
> @@ -3498,7 +3498,7 @@ tcp.flags = RST;
>            <code>empty_lb_backends</code> event
>          </li>
>          <li>
> -          <code>load_balancer</code>: UUID fo the load balancer reported for
> +          <code>load_balancer</code>: UUID of the load balancer reported for
>            the <code>empty_lb_backends</code> event
>          </li>
>        </ul>
Ben Pfaff June 6, 2019, 1:40 a.m. UTC | #3
On Wed, May 29, 2019 at 04:05:07PM +0200, Lorenzo Bianconi wrote:
> > On Thu, May 16, 2019 at 06:05:24PM +0200, Lorenzo Bianconi wrote:
> > > Add a Controller_Event table to the OVN SBDB in order to
> > > report CMS-related events.
> > > Introduce an event_table hashmap array and controller_event-related
> > > structures to ovn-controller in order to track pending events
> > > forwarded by ovs-vswitchd. Moreover, integrate the event_table hashmap
> > > array with the Controller_Event SBDB table.
> > > 
> > > Signed-off-by: Mark Michelson <mmichels@redhat.com>
> > > Co-authored-by: Mark Michelson <mmichels@redhat.com>
> > > Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>

...

> > 4. What is the tolerance for events that are never delivered or that are
> >    delivered more than once?  What can actually be guaranteed, given
> >    that the database can die and that ovn-controller can die?  (Also,
> >    OVSDB transactions cannot guarantee exactly-once semantics in corner
> >    cases unless the transactions are idempotent.)
> 
> If the ovn-controller dies I think there is not much we can do: events will
> be lost until the controller restarts properly.
> If ovn-northd or the connection to the db dies, controller_event_run() will not
> manage the Controller_Event table and pinctrl_handle_event() will queue the
> pending events in the event_table hash until the upper limit is reached.
> We can probably add a garbage collector for the pending events in the table.
> What do you think?

What's the consequence if an event is missed?  What's the consequence if
an event is pushed two or more times?  It's easiest to design a
distributed system so that it's OK if an event is delivered zero times
or multiple times.  It's a little harder to design so that an event is
delivered one or more times.  It's hardest to design so that an event is
delivered exactly one time.

There are the following obvious points of failure from these points of
view:

1. ovn-controller.  If it dies, it might not push an event that it
   should.  When it comes back up, will it know to push the event that
   it missed?  What about if it dies while it is pushing an event; is it
   possible that it will push it again when it comes up?

2. The OVSDB protocol.  If the OVSDB connection dies after
   ovn-controller's transaction is committed but before ovn-controller
   receives the acknowledgment, then when it reconnects ovn-controller
   might retry it, which could lead to an event being pushed two or more
   times.

3. ovsdb-server.  Clients don't typically use the OVSDB protocol feature
   that ensures that a transaction is committed to stable storage before
   it is acknowledged, so an event could get lost if ovsdb-server dies
   after acknowledging a transaction but before it gets written to disk.
   (Clustered OVSDB always does sync to stable storage though.)

4. ovn-northd.  There is a race between ovn-northd acting on an event
   and marking it handled (or deleting it).  There are also the same
   OVSDB protocol and ovsdb-server races in the reverse direction.

We may be able to work around some or all of these issues if necessary.
Have you considered them?  How important are they?
Lorenzo Bianconi June 11, 2019, 9:54 a.m. UTC | #4
> On Wed, May 29, 2019 at 04:05:07PM +0200, Lorenzo Bianconi wrote:
> > > On Thu, May 16, 2019 at 06:05:24PM +0200, Lorenzo Bianconi wrote:
> > > > Add a Controller_Event table to the OVN SBDB in order to
> > > > report CMS-related events.
> > > > Introduce an event_table hashmap array and controller_event-related
> > > > structures to ovn-controller in order to track pending events
> > > > forwarded by ovs-vswitchd. Moreover, integrate the event_table hashmap
> > > > array with the Controller_Event SBDB table.
> > > > 
> > > > Signed-off-by: Mark Michelson <mmichels@redhat.com>
> > > > Co-authored-by: Mark Michelson <mmichels@redhat.com>
> > > > Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
> 
> ...
> 
> > > 4. What is the tolerance for events that are never delivered or that are
> > >    delivered more than once?  What can actually be guaranteed, given
> > >    that the database can die and that ovn-controller can die?  (Also,
> > >    OVSDB transactions cannot guarantee exactly-once semantics in corner
> > >    cases unless the transactions are idempotent.)
> > 
> > If the ovn-controller dies I think there is not much we can do: events will
> > be lost until the controller restarts properly.
> > If ovn-northd or the connection to the db dies, controller_event_run() will not
> > manage the Controller_Event table and pinctrl_handle_event() will queue the
> > pending events in the event_table hash until the upper limit is reached.
> > We can probably add a garbage collector for the pending events in the table.
> > What do you think?
> 
> What's the consequence if an event is missed?  What's the consequence if
> an event is pushed two or more times?  It's easiest to design a
> distributed system so that it's OK if an event is delivered zero times
> or multiple times.  It's a little harder to design so that an event is
> delivered one or more times.  It's hardest to design so that an event is
> delivered exactly one time.

Hi Ben,

thx a lot for your comments,

> 
> There are the following obvious points of failure from these points of
> view:
> 
> 1. ovn-controller.  If it dies, it might not push an event that it
>    should.  When it comes back up, will it know to push the event that
>    it missed?  What about if it dies while it is pushing an event; is it
>    possible that it will push it again when it comes up?

This is probably not an issue: if the event is lost because the controller
is dead, we will receive a new one when the controller comes back.
If the controller dies after sending the event to the db, it will not receive
new events when it comes back.

> 
> 2. The OVSDB protocol.  If the OVSDB connection dies after
>    ovn-controller's transaction is committed but before ovn-controller
>    receives the acknowledgment, then when it reconnects ovn-controller
>    might retry it, which could lead to an event being pushed two or more
>    times.
> 
> 3. ovsdb-server.  Clients don't typically use the OVSDB protocol feature
>    that ensures that a transaction is committed to stable storage before
>    it is acknowledged, so an event could get lost if ovsdb-server dies
>    after acknowledging a transaction but before it gets written to disk.
>    (Clustered OVSDB always does sync to stable storage though.)
> 
> 4. ovn-northd.  There is a race between ovn-northd acting on an event
>    and marking it handled (or deleting it).  There are also the same
>    OVSDB protocol and ovsdb-server races in the reverse direction.

I agree with you: even if these kinds of events (duplicated events or duplicated
rows in the db) are quite unlikely, since controller_event processing is done holding
pinctrl_mutex, they can happen. However, I think they can be managed
by the CMS, since the controller does not have a 'history' of already handled events.
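
As an illustration only (the 'seen' bookkeeping and handle_empty_lb() hook
are hypothetical), a consumer can key on the event_info contents rather
than the row UUID, so a row pushed twice is acted on once:

struct sset seen = SSET_INITIALIZER(&seen);
const struct sbrec_controller_event *ev;

SBREC_CONTROLLER_EVENT_TABLE_FOR_EACH (ev, ce_table) {
    char *key = xasprintf("%s/%s", ev->event_type,
                          smap_get_def(&ev->event_info, "vip", ""));
    /* sset_add() returns NULL if the key is already present. */
    if (sset_add(&seen, key)) {
        handle_empty_lb(ev);
    }
    free(key);
}
sset_destroy(&seen);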

Regards,
Lorenzo

> 
> We may be able to work around some or all of these issues if necessary.
> Have you considered them?  How important are they?
Ben Pfaff June 11, 2019, 2:57 p.m. UTC | #5
On Tue, Jun 11, 2019 at 11:54:31AM +0200, Lorenzo Bianconi wrote:
> > On Wed, May 29, 2019 at 04:05:07PM +0200, Lorenzo Bianconi wrote:
> > > > On Thu, May 16, 2019 at 06:05:24PM +0200, Lorenzo Bianconi wrote:
> > > > > Add a Controller_Event table to the OVN SBDB in order to
> > > > > report CMS-related events.
> > > > > Introduce an event_table hashmap array and controller_event-related
> > > > > structures to ovn-controller in order to track pending events
> > > > > forwarded by ovs-vswitchd. Moreover, integrate the event_table hashmap
> > > > > array with the Controller_Event SBDB table.
> > > > > 
> > > > > Signed-off-by: Mark Michelson <mmichels@redhat.com>
> > > > > Co-authored-by: Mark Michelson <mmichels@redhat.com>
> > > > > Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
> > 
> > ...
> > 
> > > > 4. What is the tolerance for events that are never delivered or that are
> > > >    delivered more than once?  What can actually be guaranteed, given
> > > >    that the database can die and that ovn-controller can die?  (Also,
> > > >    OVSDB transactions cannot guarantee exactly-once semantics in corner
> > > >    cases unless the transactions are idempotent.)
> > > 
> > > If the ovn-controller dies I think there is not much we can do: events will
> > > be lost until the controller restarts properly.
> > > If ovn-northd or the connection to the db dies, controller_event_run() will not
> > > manage the Controller_Event table and pinctrl_handle_event() will queue the
> > > pending events in the event_table hash until the upper limit is reached.
> > > We can probably add a garbage collector for the pending events in the table.
> > > What do you think?
> > 
> > What's the consequence if an event is missed?  What's the consequence if
> > an event is pushed two or more times?  It's easiest to design a
> > distributed system so that it's OK if an event is delivered zero times
> > or multiple times.  It's a little harder to design so that an event is
> > delivered one or more times.  It's hardest to design so that an event is
> > delivered exactly one time.
> 
> Hi Ben,
> 
> thx a lot for your comments,
> 
> > 
> > There are the following obvious points of failure from these points of
> > view:
> > 
> > 1. ovn-controller.  If it dies, it might not push an event that it
> >    should.  When it comes back up, will it know to push the event that
> >    it missed?  What about if it dies while it is pushing an event; is it
> >    possible that it will push it again when it comes up?
> 
> This is probably not an issue: if the event is lost because the controller
> is dead, we will receive a new one when the controller comes back.
> If the controller dies after sending the event to the db, it will not receive
> new events when it comes back.
> 
> > 
> > 2. The OVSDB protocol.  If the OVSDB connection dies after
> >    ovn-controller's transaction is committed but before ovn-controller
> >    receives the acknowledgment, then when it reconnects ovn-controller
> >    might retry it, which could lead to an event being pushed two or more
> >    times.
> > 
> > 3. ovsdb-server.  Clients don't typically use the OVSDB protocol feature
> >    that ensures that a transaction is committed to stable storage before
> >    it is acknowledged, so an event could get lost if ovsdb-server dies
> >    after acknowledging a transaction but before it gets written to disk.
> >    (Clustered OVSDB always does sync to stable storage though.)
> > 
> > 4. ovn-northd.  There is a race between ovn-northd acting on an event
> >    and marking it handled (or deleting it).  There are also the same
> >    OVSDB protocol and ovsdb-server races in the reverse direction.
> 
> I agree with you: even if these kinds of events (duplicated events or duplicated
> rows in the db) are quite unlikely, since controller_event processing is done holding
> pinctrl_mutex, they can happen. However, I think they can be managed
> by the CMS, since the controller does not have a 'history' of already handled events.

OK.

Will you please document what is (not) guaranteed in the documentation
somewhere?  It's important to write these things down or people are
likely to make bad assumptions later.
Lorenzo Bianconi June 11, 2019, 3:09 p.m. UTC | #6
> On Tue, Jun 11, 2019 at 11:54:31AM +0200, Lorenzo Bianconi wrote:
> > > On Wed, May 29, 2019 at 04:05:07PM +0200, Lorenzo Bianconi wrote:
> > > > > On Thu, May 16, 2019 at 06:05:24PM +0200, Lorenzo Bianconi wrote:
> > > > > > Add a Controller_Event table to the OVN SBDB in order to
> > > > > > report CMS-related events.
> > > > > > Introduce an event_table hashmap array and controller_event-related
> > > > > > structures to ovn-controller in order to track pending events
> > > > > > forwarded by ovs-vswitchd. Moreover, integrate the event_table hashmap
> > > > > > array with the Controller_Event SBDB table.
> > > > > > 
> > > > > > Signed-off-by: Mark Michelson <mmichels@redhat.com>
> > > > > > Co-authored-by: Mark Michelson <mmichels@redhat.com>
> > > > > > Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
> > > 
> > > ...
> > > 
> > > > > 4. What is the tolerance for events that are never delivered or that are
> > > > >    delivered more than once?  What can actually be guaranteed, given
> > > > >    that the database can die and that ovn-controller can die?  (Also,
> > > > >    OVSDB transactions cannot guarantee exactly-once semantics in corner
> > > > >    cases unless the transactions are idempotent.)
> > > > 
> > > > If the ovn-controller dies I think there is not much we can do: events will
> > > > be lost until the controller restarts properly.
> > > > If ovn-northd or the connection to the db dies, controller_event_run() will not
> > > > manage the Controller_Event table and pinctrl_handle_event() will queue the
> > > > pending events in the event_table hash until the upper limit is reached.
> > > > We can probably add a garbage collector for the pending events in the table.
> > > > What do you think?
> > > 
> > > What's the consequence if an event is missed?  What's the consequence if
> > > an event is pushed two or more times?  It's easiest to design a
> > > distributed system so that it's OK if an event is delivered zero times
> > > or multiple times.  It's a little harder to design so that an event is
> > > delivered one or more times.  It's hardest to design so that an event is
> > > delivered exactly one time.
> > 
> > Hi Ben,
> > 
> > thx a lot for your comments,
> > 
> > > 
> > > There are the following obvious points of failure from these points of
> > > view:
> > > 
> > > 1. ovn-controller.  If it dies, it might not push an event that it
> > >    should.  When it comes back up, will it know to push the event that
> > >    it missed?  What about if it dies while it is pushing an event; is it
> > >    possible that it will push it again when it comes up?
> > 
> > This is probably not an issue: if the event is lost because the controller
> > is dead, we will receive a new one when the controller comes back.
> > If the controller dies after sending the event to the db, it will not receive
> > new events when it comes back.
> > 
> > > 
> > > 2. The OVSDB protocol.  If the OVSDB connection dies after
> > >    ovn-controller's transaction is committed but before ovn-controller
> > >    receives the acknowledgment, then when it reconnects ovn-controller
> > >    might retry it, which could lead to an event being pushed two or more
> > >    times.
> > > 
> > > 3. ovsdb-server.  Clients don't typically use the OVSDB protocol feature
> > >    that ensures that a transaction is committed to stable storage before
> > >    it is acknowledged, so an event could get lost if ovsdb-server dies
> > >    after acknowledging a transaction but before it gets written to disk.
> > >    (Clustered OVSDB always does sync to stable storage though.)
> > > 
> > > 4. ovn-northd.  There is a race between ovn-northd acting on an event
> > >    and marking it handled (or deleting it).  There are also the same
> > >    OVSDB protocol and ovsdb-server races in the reverse direction.
> > 
> > I agree with you: even if these kinds of events (duplicated events or duplicated
> > rows in the db) are quite unlikely, since controller_event processing is done holding
> > pinctrl_mutex, they can happen. However, I think they can be managed
> > by the CMS, since the controller does not have a 'history' of already handled events.
> 
> OK.
> 
> Will you please document what is (not) guaranteed in the documentation
> somewhere?  It's important to write these things down or people are
> likely to make bad assumptions later.

ack, I will do that when posting a formal series.

Regards,
Lorenzo

Patch

diff --git a/include/ovn/logical-fields.h b/include/ovn/logical-fields.h
index 164b338b5..431ad03d0 100644
--- a/include/ovn/logical-fields.h
+++ b/include/ovn/logical-fields.h
@@ -20,6 +20,23 @@ 
 
 struct shash;
 
+enum ovn_controller_event {
+    OVN_EVENT_EMPTY_LB_BACKENDS = 0,
+    OVN_EVENT_MAX,
+};
+
+static inline char *
+event_to_string(enum ovn_controller_event event)
+{
+    switch (event) {
+    case OVN_EVENT_EMPTY_LB_BACKENDS:
+        return "empty_lb_backends";
+    case OVN_EVENT_MAX:
+    default:
+        return "";
+    }
+}
+
 /* Logical fields.
  *
  * These values are documented in ovn-architecture(7), please update the
diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c
index 69eeee5dc..d6494590b 100644
--- a/ovn/controller/ovn-controller.c
+++ b/ovn/controller/ovn-controller.c
@@ -765,6 +765,8 @@  main(int argc, char *argv[])
                                 sbrec_port_binding_by_name,
                                 sbrec_mac_binding_by_lport_ip,
                                 sbrec_dns_table_get(ovnsb_idl_loop.idl),
+                                sbrec_controller_event_table_get(
+                                    ovnsb_idl_loop.idl),
                                 br_int, chassis,
                                 &local_datapaths, &active_tunnels);
                     update_ct_zones(&local_lports, &local_datapaths, &ct_zones,
diff --git a/ovn/controller/pinctrl.c b/ovn/controller/pinctrl.c
index 8ae1f9bd6..ca191d961 100644
--- a/ovn/controller/pinctrl.c
+++ b/ovn/controller/pinctrl.c
@@ -223,6 +223,132 @@  static bool may_inject_pkts(void);
 
 COVERAGE_DEFINE(pinctrl_drop_put_mac_binding);
 COVERAGE_DEFINE(pinctrl_drop_buffered_packets_map);
+COVERAGE_DEFINE(pinctrl_drop_controller_event);
+
+struct empty_lb_backends_event {
+    struct hmap_node hmap_node;
+    char *vip;
+    char *protocol;
+    char *load_balancer;
+};
+
+static struct hmap event_table[OVN_EVENT_MAX];
+
+static void init_event_table(void)
+{
+    for (size_t i = 0; i < OVN_EVENT_MAX; i++) {
+        hmap_init(&event_table[i]);
+    }
+}
+
+static void
+empty_lb_backends_event_flush(void)
+{
+    struct empty_lb_backends_event *ce;
+    HMAP_FOR_EACH_POP (ce, hmap_node,
+                       &event_table[OVN_EVENT_EMPTY_LB_BACKENDS]) {
+        free(ce->vip);
+        free(ce->protocol);
+        free(ce->load_balancer);
+        free(ce);
+    }
+}
+
+static void event_table_flush(void)
+{
+    empty_lb_backends_event_flush();
+}
+
+static void event_table_destroy(void)
+{
+    event_table_flush();
+    for (size_t i = 0; i < OVN_EVENT_MAX; i++) {
+        hmap_destroy(&event_table[i]);
+    }
+}
+
+static struct empty_lb_backends_event *
+pinctrl_find_empty_lb_backends_event(char *vip, char *protocol,
+                                     char *load_balancer, uint32_t hash)
+{
+    struct empty_lb_backends_event *ce;
+    HMAP_FOR_EACH_WITH_HASH (ce, hmap_node, hash,
+                             &event_table[OVN_EVENT_EMPTY_LB_BACKENDS]) {
+        if (!strcmp(ce->vip, vip) &&
+            !strcmp(ce->protocol, protocol) &&
+            !strcmp(ce->load_balancer, load_balancer)) {
+            return ce;
+        }
+    }
+    return NULL;
+}
+
+static const struct sbrec_controller_event *
+empty_lb_backends_lookup(struct empty_lb_backends_event *event,
+                         const struct sbrec_controller_event_table *ce_table)
+{
+    const struct sbrec_controller_event *sbrec_event;
+    const char *event_type = event_to_string(OVN_EVENT_EMPTY_LB_BACKENDS);
+    SBREC_CONTROLLER_EVENT_TABLE_FOR_EACH (sbrec_event, ce_table) {
+        if (strcmp(sbrec_event->event_type, event_type)) {
+            continue;
+        }
+
+        const char *vip = smap_get(&sbrec_event->event_info, "vip");
+        const char *protocol = smap_get(&sbrec_event->event_info, "protocol");
+        const char *load_balancer = smap_get(&sbrec_event->event_info,
+                                             "load_balancer");
+
+        if (!strcmp(event->vip, vip) &&
+            !strcmp(event->protocol, protocol) &&
+            !strcmp(event->load_balancer, load_balancer)) {
+            return sbrec_event;
+        }
+    }
+
+    return NULL;
+}
+
+static void
+controller_event_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
+                     const struct sbrec_controller_event_table *ce_table)
+    OVS_REQUIRES(pinctrl_mutex)
+{
+    if (!ovnsb_idl_txn) {
+        return;
+    }
+
+    struct empty_lb_backends_event *empty_lbs;
+    HMAP_FOR_EACH (empty_lbs, hmap_node,
+                   &event_table[OVN_EVENT_EMPTY_LB_BACKENDS]) {
+        const struct sbrec_controller_event *event;
+
+        event = empty_lb_backends_lookup(empty_lbs, ce_table);
+        if (!event) {
+            struct smap event_info = SMAP_INITIALIZER(&event_info);
+
+            smap_add(&event_info, "vip", empty_lbs->vip);
+            smap_add(&event_info, "protocol", empty_lbs->protocol);
+            smap_add(&event_info, "load_balancer", empty_lbs->load_balancer);
+
+            event = sbrec_controller_event_insert(ovnsb_idl_txn);
+            sbrec_controller_event_set_event_type(event,
+                    event_to_string(OVN_EVENT_EMPTY_LB_BACKENDS));
+            sbrec_controller_event_set_event_info(event, &event_info);
+            sbrec_controller_event_set_handled(event, false);
+        }
+    }
+    event_table_flush();
+
+    const struct sbrec_controller_event *cur_event, *next_event;
+    /* flush 'handled' rows */
+    SBREC_CONTROLLER_EVENT_TABLE_FOR_EACH_SAFE (cur_event, next_event,
+                                                ce_table) {
+        if (cur_event->handled) {
+            sbrec_controller_event_delete(cur_event);
+        }
+    }
+}
 
 void
 pinctrl_init(void)
@@ -231,6 +357,7 @@  pinctrl_init(void)
     init_send_garps();
     init_ipv6_ras();
     init_buffered_packets_map();
+    init_event_table();
     pinctrl.br_int_name = NULL;
     pinctrl_handler_seq = seq_create();
     pinctrl_main_seq = seq_create();
@@ -1891,6 +2018,7 @@  pinctrl_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
             struct ovsdb_idl_index *sbrec_port_binding_by_name,
             struct ovsdb_idl_index *sbrec_mac_binding_by_lport_ip,
             const struct sbrec_dns_table *dns_table,
+            const struct sbrec_controller_event_table *ce_table,
             const struct ovsrec_bridge *br_int,
             const struct sbrec_chassis *chassis,
             const struct hmap *local_datapaths,
@@ -1916,6 +2044,7 @@  pinctrl_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
     prepare_ipv6_ras(sbrec_port_binding_by_datapath,
                      sbrec_port_binding_by_name, local_datapaths);
     sync_dns_cache(dns_table);
+    controller_event_run(ovnsb_idl_txn, ce_table);
     run_buffered_binding(sbrec_port_binding_by_datapath,
                          sbrec_mac_binding_by_lport_ip,
                          local_datapaths);
@@ -2264,6 +2393,7 @@  pinctrl_destroy(void)
     destroy_send_garps();
     destroy_ipv6_ras();
     destroy_buffered_packets_map();
+    event_table_destroy();
     destroy_put_mac_bindings();
     destroy_dns_cache();
     seq_destroy(pinctrl_main_seq);
diff --git a/ovn/controller/pinctrl.h b/ovn/controller/pinctrl.h
index f61d7056e..fdef27a6d 100644
--- a/ovn/controller/pinctrl.h
+++ b/ovn/controller/pinctrl.h
@@ -29,6 +29,7 @@  struct ovsdb_idl_txn;
 struct ovsrec_bridge;
 struct sbrec_chassis;
 struct sbrec_dns_table;
+struct sbrec_controller_event_table;
 
 void pinctrl_init(void);
 void pinctrl_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
@@ -38,6 +39,7 @@  void pinctrl_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
                  struct ovsdb_idl_index *sbrec_port_binding_by_name,
                  struct ovsdb_idl_index *sbrec_mac_binding_by_lport_ip,
                  const struct sbrec_dns_table *,
+                 const struct sbrec_controller_event_table *,
                  const struct ovsrec_bridge *, const struct sbrec_chassis *,
                  const struct hmap *local_datapaths,
                  const struct sset *active_tunnels);
diff --git a/ovn/ovn-sb.ovsschema b/ovn/ovn-sb.ovsschema
index 2b543c6f5..2a481b625 100644
--- a/ovn/ovn-sb.ovsschema
+++ b/ovn/ovn-sb.ovsschema
@@ -1,7 +1,7 @@ 
 {
     "name": "OVN_Southbound",
-    "version": "2.3.0",
-    "cksum": "3092285199 17409",
+    "version": "2.4.0",
+    "cksum": "674838740 17844",
     "tables": {
         "SB_Global": {
             "columns": {
@@ -349,4 +349,14 @@ 
                     "type": {"key": "string", "value": "string",
                              "min": 0, "max": "unlimited"}}},
             "indexes": [["name"]],
-            "isRoot": true}}}
+            "isRoot": true},
+        "Controller_Event": {
+            "columns": {
+                "event_type": {"type": {"key": {"type": "string",
+                               "enum": ["set", ["empty_lb_backends"]]}}},
+                "event_info": {"type": {"key": "string", "value": "string",
+                               "min": 0, "max": "unlimited"}},
+                "handled": {"type": "boolean"}
+            },
+            "isRoot": true
+         }}}
diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml
index 1a2bc1da9..ae5225e18 100644
--- a/ovn/ovn-sb.xml
+++ b/ovn/ovn-sb.xml
@@ -3474,4 +3474,37 @@  tcp.flags = RST;
       </column>
     </group>
   </table>
+  <table name="Controller_Event" title="Controller Event table">
+    <p>
+      Database table used by ovn-controller to report CMS related
+      events
+    </p>
+    <column name="event_type"
+            type='{"type": "string", "enum": ["set", ["empty_lb_backends"]]}'>
+      Event type occurred
+    </column>
+    <column name="event_info">
+    <p>
+      Key-value pairs used to spcify evento info to the CMS.
+      Possible values are:
+    </p>
+      <ul>
+        <li>
+         <code>vip</code>: VIP reported for the <code>empty_lb_backends</code>
+         event
+        </li>
+        <li>
+          <code>protocol</code>: Transport protocol reported for the
+          <code>empty_lb_backends</code> event
+        </li>
+        <li>
+          <code>load_balancer</code>: UUID fo the load balancer reported for
+          the <code>empty_lb_backends</code> event
+        </li>
+      </ul>
+    </column>
+    <column name="handled" type='{"type": "boolean"}'>
+      Value used to indicate if the event has been consumed by the CMS
+    </column>
+  </table>
 </database>