[ovs-dev,0/7] OVSDB 2-Tier deployment.

Message ID 20210501005548.3071269-1-i.maximets@ovn.org

Message

Ilya Maximets May 1, 2021, 12:55 a.m. UTC
Replication can be used to scale out read-only access to the database,
but there are clients that are not read-only, just read-mostly.
One of the main examples is ovn-controller, which mostly monitors
updates from the Southbound DB but needs to claim ports by sending
transactions that change some database tables.

The Southbound database serves lots of connections: all connections
from ovn-controllers and some service connections from the cloud
infrastructure, e.g. OpenStack agents monitoring updates.
At high scale and with a big database, ovsdb-server spends too much
time processing monitor updates, and this load needs to be moved
somewhere else.  This patch set aims to introduce the functionality
required to scale out read-mostly connections by replication.

Replication mode natively supports both standalone and clustered
databases, so it will work for any type of OVN deployment.

There are 3 missing parts in the existing replication mode:

1. Ability to handle transactions that aim to modify the data.
   Obviously, a replica is not allowed to execute this kind of
   transaction.  The solution is to implement transaction forwarding,
   i.e. to let the replication server act as a proxy by forwarding
   transactions to the primary server and forwarding replies back
   to the client.  All read-only transactions and monitors are
   still fully served by the replica itself.

2. In the case where a replica replicates a member of a raft cluster,
   the client needs to know the state of that cluster member in order
   to decide whether to re-connect to another server.
   This is solved by replicating the Database table of the _Server
   database from the replication source, so clients are able to check
   the clustered database state as usual.

   ** Another solution to this problem is to allow the replication
   server itself to have multiple remotes and re-connect the way a
   client would.  However, this would be a significant behavioral
   change for the current implementation of the active-backup schema,
   where the backup stays connected no matter what.  It would also
   require a huge rewrite of the replication state machine and would
   likely bring lots of code duplication with the ovsdb-cs module.
   We might end up re-writing the replication code on top of ovsdb-cs
   (which might be a good thing, though) and refactoring ovsdb-cs
   itself, but that would be much more work.

3. The client needs to know whether the replica is currently connected
   to the replication source.  For example, if one of the replicas
   loses its connection to the primary server, the client should be
   able to re-connect to another replica.
   This is implemented by reflecting the connection state in the
   'connected' field of the row in the Database table of the _Server
   database.  Currently, for active-backup, it is always set to 'true'.
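
The dispatch decision behind missing part #1 can be illustrated with a
small sketch.  This is not the actual C implementation from
ovsdb/transaction-forward.c; the set of operations treated as read-only
and the 'execute_locally'/'forward_to_primary' callables are assumptions
for illustration only:

```python
# Illustrative sketch of a replica deciding whether to serve a
# JSON-RPC "transact" request itself or forward it to the primary.

READ_ONLY_OPS = {"select", "wait", "comment"}  # assumed set, for illustration

def is_read_only(operations):
    """A transaction is read-only if every operation in it is."""
    return all(op.get("op") in READ_ONLY_OPS for op in operations)

def dispatch(request, execute_locally, forward_to_primary):
    """Serve read-only transactions locally; proxy everything else.

    'execute_locally' and 'forward_to_primary' are hypothetical
    callables supplied by the server.
    """
    ops = request["params"][1:]   # params = [db-name, op1, op2, ...]
    if is_read_only(ops):
        return execute_locally(request)
    # The primary's reply is relayed back to the client unchanged,
    # so the client cannot tell it talked to a proxy.
    return forward_to_primary(request)
```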

This patch set consists of 4 parts:

Patch #1 - Implementation of transaction forwarding.  Fully
           independent from the rest of the series and the only
           mandatory change for a 2-Tier deployment.  The rest of the
           set propagates status fields and implements correct
           failover on the client side.

Patches #2-5 - Solution for missing part #2:  replication of the
               _Server database and its handling on the client side.

Patch #6 - Solution for problem #3.

Patch #7 - A slightly unrelated fix, bringing one missing
           re-connection fix from the C version to the Python IDL.
           Mostly to add more tests.
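
The client-side failover behavior enabled by parts #2 and #3 can be
sketched roughly as follows.  This is a simplification of what
lib/ovsdb-cs.c does; the exact field names and the 'healthy' criterion
used here are stand-ins for illustration, not the real API:

```python
def replica_usable(db_row):
    """Decide whether a client should stay on this replica or move on.

    'db_row' stands for the row of the replicated database in the
    replica's _Server Database table (simplified to a plain dict).
    With this series, the replica mirrors the cluster member's state
    and reports its own link to the primary in 'connected'.
    """
    # Part #3: if the replica lost its replication source, fail over.
    if not db_row.get("connected", False):
        return False
    # Part #2: for a clustered source, the replicated cluster member
    # itself must be healthy ('healthy' is a hypothetical stand-in
    # for whatever cluster-state check the client performs).
    if db_row.get("model") == "clustered":
        return db_row.get("healthy", False)
    return True
```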

Note: in order to replicate a clustered Sb DB, ephemeral columns in
the ovn-sb schema must be manually converted to persistent ones before
creating a database file for the replica; otherwise there will be a
schema mismatch and replication will fail.
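
The manual conversion mentioned in the note could be done with a few
lines of Python on the schema JSON before creating the replica's
database file.  This is only a sketch, not a tool shipped with the
series; it relies on the fact that OVSDB marks an ephemeral column
with an "ephemeral": true key in its definition, which defaults to
false when absent:

```python
import json

def make_columns_persistent(schema):
    """Return a copy of an OVSDB schema with every column persistent.

    Dropping the "ephemeral" key from a column definition makes the
    column persistent, since the key defaults to false.
    """
    schema = json.loads(json.dumps(schema))   # cheap deep copy
    for table in schema.get("tables", {}).values():
        for column in table.get("columns", {}).values():
            column.pop("ephemeral", None)
    return schema
```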

Ilya Maximets (7):
  ovsdb: Add support for transaction forwarding to the replication mode.
  ovsdb: Add extra internal tables to databases for replication
    purposes.
  replication: Allow replication of _Server database.
  ovsdb-cs: Monitor _synced_Database table.
  python: idl: Monitor _synced_Database table.
  ovsdb: Report connection state for replication server.
  python: idl: Allow retry even when using a single remote.

 Documentation/ref/ovsdb-server.7.rst |   8 ++
 Documentation/ref/ovsdb.7.rst        |   8 ++
 NEWS                                 |  11 ++
 lib/ovsdb-cs.c                       | 116 ++++++++++++-----
 ovsdb/_server.ovsschema              |   5 +-
 ovsdb/_server.xml                    |  28 ++--
 ovsdb/automake.mk                    |   2 +
 ovsdb/execution.c                    |  14 +-
 ovsdb/jsonrpc-server.c               |  61 +++++++--
 ovsdb/jsonrpc-server.h               |   6 +-
 ovsdb/ovsdb-doc                      |   3 +-
 ovsdb/ovsdb-server.c                 |  72 +++++++----
 ovsdb/ovsdb.c                        |  19 ++-
 ovsdb/ovsdb.h                        |   2 +-
 ovsdb/replication.c                  | 108 +++++++++++++++-
 ovsdb/replication.h                  |   7 +-
 ovsdb/table.c                        |  20 ++-
 ovsdb/table.h                        |   4 +-
 ovsdb/transaction-forward.c          | 187 +++++++++++++++++++++++++++
 ovsdb/transaction-forward.h          |  42 ++++++
 ovsdb/trigger.c                      |  62 +++++++--
 ovsdb/trigger.h                      |  45 ++++---
 python/ovs/db/idl.py                 |  73 ++++++++---
 python/ovs/db/schema.py              |  20 ++-
 tests/ovsdb-cluster.at               |  49 ++++++-
 tests/ovsdb-server.at                |  69 +++++++++-
 tests/test-ovsdb.c                   |   2 +-
 tests/test-ovsdb.py                  |   2 +-
 28 files changed, 893 insertions(+), 152 deletions(-)
 create mode 100644 ovsdb/transaction-forward.c
 create mode 100644 ovsdb/transaction-forward.h

Comments

Ben Pfaff May 5, 2021, 9:40 p.m. UTC | #1
On Sat, May 01, 2021 at 02:55:41AM +0200, Ilya Maximets wrote:
> Replication can be used to scale out read-only access to the database.
> But there are clients that are not read-only, but read-mostly.
> One of the main examples is ovn-controller that mostly monitors
> updates from the Southbound DB, but needs to claim ports by sending
> transactions that changes some database tables.
> 
> Southbound database serves lots of connections: all connections
> from ovn-controllers and some service connections from cloud
> infrastructure, e.g. some OpenStack agents are monitoring updates.
> At a high scale and with a big size of the database ovsdb-server
> spends too much time processing monitor updates and it's required
> to move this load somewhere else.  This patch-set aims to introduce
> required functionality to scale out read-mostly connections by
> replication.
> 
> Replication mode natively supports replication of standalone and
> clustered databases, so it will work for any type of OVN deployment.

I think that this series has the details that one needs to understand it
at a high level, but I didn't see a high-level overview of its intended
use.  I think the idea is that, for a use case like OVN, one would
deploy a clustered OVS database, and then the second tier would be a
collection of replicas on top of that cluster.  If that's the case, I
think that adding a paragraph to an appropriate high-level documentation
file for OVSDB explaining it would be helpful, and possibly a (ASCII
art?) diagram.
Ilya Maximets May 5, 2021, 10:34 p.m. UTC | #2
On 5/5/21 11:40 PM, Ben Pfaff wrote:
> On Sat, May 01, 2021 at 02:55:41AM +0200, Ilya Maximets wrote:
>> Replication can be used to scale out read-only access to the database.
>> But there are clients that are not read-only, but read-mostly.
>> One of the main examples is ovn-controller that mostly monitors
>> updates from the Southbound DB, but needs to claim ports by sending
>> transactions that changes some database tables.
>>
>> Southbound database serves lots of connections: all connections
>> from ovn-controllers and some service connections from cloud
>> infrastructure, e.g. some OpenStack agents are monitoring updates.
>> At a high scale and with a big size of the database ovsdb-server
>> spends too much time processing monitor updates and it's required
>> to move this load somewhere else.  This patch-set aims to introduce
>> required functionality to scale out read-mostly connections by
>> replication.
>>
>> Replication mode natively supports replication of standalone and
>> clustered databases, so it will work for any type of OVN deployment.
> 
> I think that this series has the details that one needs to understand it
> at a high level, but I didn't see a high-level overview of its intended
> use.  I think the idea is that, for a use case like OVN, one would
> deploy a clustered OVS database, and then the second tier would be a
> collection of replicas on top of that cluster.  If that's the case, I
> think that adding a paragraph to an appropriate high-level documentation
> file for OVSDB explaining it would be helpful, and possibly a (ASCII
> art?) diagram.
> 

Yes.  That is the intended use case.  And I agree that the current
series has only a bare minimum of documentation.  I'll include a patch
that adds a new paragraph to Documentation/topics/ovsdb-replication.rst
describing how to scale out read access (existing functionality that is
documented with a single sentence in ovsdb(7): "A set of replicas that
do serve clients could be used to scale out read access to the primary
database.") and how to use the same schema for an OVN deployment with
transaction forwarding enabled (new feature).  This could be done in
v2 or as a separate patch.

I have the following ASCII art for the deployment schema:


        +---------------------------------------------------------+
        | RAFT CLUSTER                                            |
        |           +---------+ ovsdb-server-1 +------+           |
        |           |             +                   |           |
        |           +             |                   +           |
        | +--+ovsdb-server-2 +----|----------+ ovsdb-server-3+--+ |
        | |                       |                             | |
        +-|-----------------------|-----------------------------|-+
          |                       |                             |
          |               +-------+------------+      +---------+
          +               |       +            |      |         +
  +- ovsdb-replica-1 -+   2  ovsdb-replica-3   4 ... N-1  ovsdb-replica-N
  |        |          |      |       |     |               |     |     |
  +        +          +                                                +
client-1 client-2 client-3      ....           ....          ....  client-M

For OVN setup: ovsdb-server-{1,2,3} - clustered Southbound DB
               ovsdb-replica-{1..N} - replication servers
               client-{1..M}        - ovn-controllers

This schema also reflects the fact that a replica doesn't replicate the
cluster, but a particular server in the cluster (see "missing part #2"
in the cover letter).  This should also be reflected in the high-level
doc.

Best regards, Ilya Maximets.
Ben Pfaff May 5, 2021, 11:12 p.m. UTC | #3
On Thu, May 06, 2021 at 12:34:00AM +0200, Ilya Maximets wrote:
> On 5/5/21 11:40 PM, Ben Pfaff wrote:
> > I think that this series has the details that one needs to understand it
> > at a high level, but I didn't see a high-level overview of its intended
> > use.  I think the idea is that, for a use case like OVN, one would
> > deploy a clustered OVS database, and then the second tier would be a
> > collection of replicas on top of that cluster.  If that's the case, I
> > think that adding a paragraph to an appropriate high-level documentation
> > file for OVSDB explaining it would be helpful, and possibly a (ASCII
> > art?) diagram.
> > 
> 
> Yes.  That is the intended use case.   And I agree that current series
> has only bare minimum of documentation.  I'll include the patch that
> adds a new paragraph to Documentation/topics/ovsdb-replication.rst to
> describe how to scale out read access (existing functionality that is
> documented with a single sentence in ovsdb(7) "A set of replicas that do  
> serve clients could be used to scale out read access to the primary
> database.") and how to use the same schema for an OVN deployment with
> transaction forwarding enabled (new features).  This could be done in
> v2 or as a separate patch.
> 
> I have a following ASCII art for the deployment schema:
> [...]

OK, great!

OVS and OVN are usually good at documenting details, but not the high
level.  It's a syndrome of starting out as a project that implemented a
standard (OpenFlow): we didn't document the high level because we
assumed everyone was already familiar with OpenFlow and what it was good
for at a high level.
Dumitru Ceara May 10, 2021, 12:36 p.m. UTC | #4
On 5/1/21 2:55 AM, Ilya Maximets wrote:
> Replication can be used to scale out read-only access to the database.
> But there are clients that are not read-only, but read-mostly.
> One of the main examples is ovn-controller that mostly monitors
> updates from the Southbound DB, but needs to claim ports by sending
> transactions that changes some database tables.
> [...]

Hi Ilya,

I had a look at the series, and the changes look good to me; I acked
most of the patches.  However, I don't feel confident enough about the
ovsdb-server side, so I hope other reviewers will share their opinions
on this feature before it's accepted.

Regards,
Dumitru