
[ovs-dev,v2,9/9] docs: Add documentation for ovsdb relay mode.

Message ID 20210612020008.3944088-10-i.maximets@ovn.org
State Superseded
Series OVSDB Relay Service Model. (Was: OVSDB 2-Tier deployment)

Commit Message

Ilya Maximets June 12, 2021, 2 a.m. UTC
Main documentation for the service model and tutorial with the use case
and configuration examples.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
---
 Documentation/automake.mk            |   1 +
 Documentation/ref/ovsdb.7.rst        |  62 ++++++++++++--
 Documentation/topics/index.rst       |   1 +
 Documentation/topics/ovsdb-relay.rst | 124 +++++++++++++++++++++++++++
 NEWS                                 |   3 +
 ovsdb/ovsdb-server.1.in              |  27 +++---
 6 files changed, 200 insertions(+), 18 deletions(-)
 create mode 100644 Documentation/topics/ovsdb-relay.rst

Comments

Dumitru Ceara June 25, 2021, 1:35 p.m. UTC | #1
On 6/12/21 4:00 AM, Ilya Maximets wrote:
> Main documentation for the service model and tutorial with the use case
> and configuration examples.
> 
> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
> ---

I left a few minor comments below.  With them addressed:

Acked-by: Dumitru Ceara <dceara@redhat.com>

Thanks!

>  Documentation/automake.mk            |   1 +
>  Documentation/ref/ovsdb.7.rst        |  62 ++++++++++++--
>  Documentation/topics/index.rst       |   1 +
>  Documentation/topics/ovsdb-relay.rst | 124 +++++++++++++++++++++++++++
>  NEWS                                 |   3 +
>  ovsdb/ovsdb-server.1.in              |  27 +++---
>  6 files changed, 200 insertions(+), 18 deletions(-)
>  create mode 100644 Documentation/topics/ovsdb-relay.rst
> 
> diff --git a/Documentation/automake.mk b/Documentation/automake.mk
> index bc30f94c5..213d9c867 100644
> --- a/Documentation/automake.mk
> +++ b/Documentation/automake.mk
> @@ -52,6 +52,7 @@ DOC_SOURCE = \
>  	Documentation/topics/networking-namespaces.rst \
>  	Documentation/topics/openflow.rst \
>  	Documentation/topics/ovs-extensions.rst \
> +	Documentation/topics/ovsdb-relay.rst \
>  	Documentation/topics/ovsdb-replication.rst \
>  	Documentation/topics/porting.rst \
>  	Documentation/topics/record-replay.rst \
> diff --git a/Documentation/ref/ovsdb.7.rst b/Documentation/ref/ovsdb.7.rst
> index e4f1bf766..a5b8a9c33 100644
> --- a/Documentation/ref/ovsdb.7.rst
> +++ b/Documentation/ref/ovsdb.7.rst
> @@ -121,13 +121,14 @@ schema checksum from a schema or database file, respectively.
>  Service Models
>  ==============
>  
> -OVSDB supports three service models for databases: **standalone**,
> -**active-backup**, and **clustered**.  The service models provide different
> -compromises among consistency, availability, and partition tolerance.  They
> -also differ in the number of servers required and in terms of performance.  The
> -standalone and active-backup database service models share one on-disk format,
> -and clustered databases use a different format, but the OVSDB programs work
> -with both formats.  ``ovsdb(5)`` documents these file formats.
> +OVSDB supports four service models for databases: **standalone**,
> +**active-backup**, **relay** and **clustered**.  The service models provide
> +different compromises among consistency, availability, and partition tolerance.
> +They also differ in the number of servers required and in terms of performance.
> +The standalone and active-backup database service models share one on-disk
> +format, and clustered databases use a different format, but the OVSDB programs
> +work with both formats.  ``ovsdb(5)`` documents these file formats.  Relay
> +databases has no on-disk storage.

s/has/have

>  
>  RFC 7047, which specifies the OVSDB protocol, does not mandate or specify
>  any particular service model.
> @@ -406,6 +407,50 @@ following consequences:
>    that the client previously read.  The OVSDB client library in Open vSwitch
>    uses this feature to avoid servers with stale data.
>  
> +Relay Service Model
> +-------------------
> +
> +A **relay** database is a way to scale out read-mostly access to the
> +existing database working in any service model including relay.
> +
> +Relay database creates and maintains an OVSDB connection with other OVSDB
> +server.  It uses this connection to maintain in-memory copy of the remote
> +database (a.k.a. the ``relay source``) keeping the copy up-to-date as the
> +database content changes on relay source in the real time.
> +
> +The purpose of relay server is to scale out the number of database clients.
> +Read-only transactions and monitor requests are fully handled by the relay
> +server itself.  For the transactions that requests database modifications,

s/requests/request

> +relay works as a proxy between the client and the relay source, i.e. it
> +forwards transactions and replies between them.
> +
> +Compared to a clustered and active-backup models, relay service model provides

s/Compared to a/Compared to the

> +read and write access to the database similarly to a clustered database (and
> +even more scalable), but with generally insignificant performance overhead of

Joke: citation needed

> +an active-backup model.  At the same time it doesn't increase availability that
> +needs to be covered by the service model of the relay source.
> +
> +Relay database has no on-disk storage and therefore cannot be converted to
> +any other service model.
> +
> +If there is already a database started in any service model, to start a relay
> +database server use ``ovsdb-server relay:<DB_NAME>:<relay source>``, where
> +``<DB_NAME>`` is the database name as specified in the schema of the database
> +that existing server runs, and ``<relay source>`` is an OVSDB connection method
> +(see `Connection Methods`_ below) that connects to the existing database
> +server.  ``<relay source>`` could contain a comma-separated list of connection
> +methods, e.g. to connect to any server of the clustered database.
> +Multiple relay servers could be started for the same relay source.
> +
> +Since the way how relay handles read and write transactions is very similar
> +to the clustered model where "cluster" means "set or relay servers connected
> +to the same relay source", "follower" means "relay server" and the "leader"
> +means "relay source", same consistency consequences as for the clustered
> +model applies to relay as well (See `Understanding Cluster Consistency`_
> +above).
> +
> +Open vSwitch 2.16 introduced support for relay service model.
> +
>  Database Replication
>  ====================
>  
> @@ -414,7 +459,8 @@ Replication, in this context, means to make, and keep up-to-date, a read-only
>  copy of the contents of a database (the ``replica``).  One use of replication
>  is to keep an up-to-date backup of a database.  A replica used solely for
>  backup would not need to support clients of its own.  A set of replicas that do
> -serve clients could be used to scale out read access to the primary database.
> +serve clients could be used to scale out read access to the primary database,
> +however `Relay Service Model`_ is more suitable for that purpose.
>  
>  A database replica is set up in the same way as a backup server in an
>  active-backup pair, with the difference that the replica is never promoted to
> diff --git a/Documentation/topics/index.rst b/Documentation/topics/index.rst
> index 0036567eb..d8ccbd757 100644
> --- a/Documentation/topics/index.rst
> +++ b/Documentation/topics/index.rst
> @@ -44,6 +44,7 @@ OVS
>     openflow
>     bonding
>     networking-namespaces
> +   ovsdb-relay
>     ovsdb-replication
>     dpdk/index
>     windows
> diff --git a/Documentation/topics/ovsdb-relay.rst b/Documentation/topics/ovsdb-relay.rst
> new file mode 100644
> index 000000000..40d294c55
> --- /dev/null
> +++ b/Documentation/topics/ovsdb-relay.rst
> @@ -0,0 +1,124 @@
> +..
> +      Copyright 2021, Red Hat, Inc.
> +
> +      Licensed under the Apache License, Version 2.0 (the "License"); you may
> +      not use this file except in compliance with the License. You may obtain
> +      a copy of the License at
> +
> +          http://www.apache.org/licenses/LICENSE-2.0
> +
> +      Unless required by applicable law or agreed to in writing, software
> +      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
> +      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
> +      License for the specific language governing permissions and limitations
> +      under the License.
> +
> +      Convention for heading levels in Open vSwitch documentation:
> +
> +      =======  Heading 0 (reserved for the title in a document)
> +      -------  Heading 1
> +      ~~~~~~~  Heading 2
> +      +++++++  Heading 3
> +      '''''''  Heading 4
> +
> +      Avoid deeper levels because they do not render well.
> +
> +===============================
> +Scaling OVSDB Access With Relay
> +===============================
> +
> +Open vSwitch 2.16 introduced support for OVSDB Relay mode with the goal to
> +increase database scalability for a big deployments.  Mainly, OVN (Open Virtual
> +Network) Southbound Database deployments.  This document describes the main
> +concept and provides the configuration examples.
> +
> +What is OVSDB Relay?
> +--------------------
> +
> +Relay is a database service model in which one ``ovsdb-server`` (``relay``)
> +connects to another standalone or clustered database server
> +(``relay source``) and maintains in-memory copy of its data, receiving
> +all the updates via this OVSDB connection.  Relay server handles all the
> +read-only requests (monitors and transactions) on its own and forwards all the
> +transactions that requires database modifications to the relay source.
> +
> +Why is this needed?
> +-------------------
> +
> +Some OVN deployment could have hundreds or even thousands nodes, on each of
> +these nodes there is an ovn-controller, which is connected to the
> +OVN_Southbound database that is served by a standalone or clustered OVSDB.
> +Standalone database is handled by a single ovsdb-server process and clustered
> +could consist of 3 to 5 ovsdb-server processes.  For the clustered database,
> +higher number of servers may significantly increase transaction latency due
> +to necessity for these servers to reach consensus.  So, in the end limited
> +number of ovsdb-server processes serves ever growing number of clients and this
> +leads to performance issues.
> +
> +Read-only access could be scaled up with OVSDB replication on top of
> +active-backup service model, but ovn-controller is a read-mostly client, not
> +a read-only, i.e. it needs to execute write transactions from time to time.
> +Here relay service model comes into play.
> +
> +2-Tier Deployment
> +-----------------
> +
> +Solution for the scaling issue could look like a 2-tier deployment, where
> +a set of relay servers is connected to the main database cluster
> +(OVN_Southbound) and clients (ovn-conrtoller) connected to these relay
> +servers::
> +
> +                                    172.16.0.1
> +   +--------------------+   +----+ ovsdb-relay-1 +--+---+ client-1
> +   |                    |   |                       |
> +   |    Clustered       |   |                       +---+ client-2
> +   |     Database       |   |                        ...
> +   |                    |   |                       +---+ client-N
> +   |    10.0.0.2        |   |
> +   |  ovsdb-server-2    |   |       172.16.0.2
> +   |   +        +       |   +----+ ovsdb-relay-2 +--+---+ client-N+1
> +   |   |        |       |   |                       |
> +   |   |        +       +---+                       +---+ client-N+2
> +   |   |   10.0.0.1     |   |                        ...
> +   |   | ovsdb-server-1 |   |                       +---+ client-2N
> +   |   |        +       |   |
> +   |   |        |       |   |
> +   |   +        +       |   +      ... ... ... ... ...
> +   |  ovsdb-server-3    |   |
> +   |    10.0.0.3        |   |                       +---+ client-KN-1
> +   |                    |   |       172.16.0.K      |
> +   +--------------------+   +----+ ovsdb-relay-K +--+---+ client-KN
> +
> +In practice, the picture might look a bit more complex, because all relay
> +servers might connect to any member of a main cluster and clients might
> +connect to any relay server of their choice.
> +
> +Assuming that servers of a main cluster started like this::
> +
> +  $ ovsdb-server --remote=ptcp:10.0.0.1:6642 ovn-sb-1.db
> +
> +The same for other two servers.  In this case relay servers could be
> +started like this::
> +
> +  $ REMOTES=tcp:10.0.0.1:6642,tcp:10.0.0.2:6642,tcp:10.0.0.3:6642
> +  $ ovsdb-server --remote=ptcp:172.16.0.1:6642 relay:OVN_Southbound:$REMOTES
> +  $ ...
> +  $ ovsdb-server --remote=ptcp:172.16.0.K:6642 relay:OVN_Southbound:$REMOTES
> +
> +Every relay server could connect to any of the cluster members of their choice,
> +fairness of load distribution is achieved by shuffling remotes.
> +
> +For the actual clients, they could be configured to connect to any of the
> +relay servers.  For ovn-controllers the configuration could look like this::
> +
> +  $ REMOTES=tcp:172.16.0.1:6642,...,tcp:172.16.0.K:6642
> +  $ ovs-vsctl set Open_vSwitch . external-ids:ovn-remote=$REMOTES
> +
> +Setup like this allows the system to serve ``K * N`` clients while having only
> +``K`` actual connections on the main clustered database keeping it in a
> +stable state.
> +
> +It's also possible to create multi-tier deployments by connecting one set
> +of relay servers to another (smaller) set of relay servers, or even create
> +tree-like structures by the cost of increased latency for write transactions,

s/by/with

> +because they will be forwarded multiple times.
> diff --git a/NEWS b/NEWS
> index ebba17b22..391b0abba 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -1,6 +1,9 @@
>  Post-v2.15.0
>  ---------------------
>     - OVSDB:
> +     * Introduced new database service model - "relay".  Targeted to scale out
> +       read-mostly access (ovn-controller) to existing databases.
> +       For more information: ovsdb(7) and Documentation/topics/ovsdb-relay.rst
>       * New command line options --record/--replay for ovsdb-server and
>         ovsdb-client to record and replay all the incoming transactions,
>         monitors, etc.  More datails in Documentation/topics/record-replay.rst.
> diff --git a/ovsdb/ovsdb-server.1.in b/ovsdb/ovsdb-server.1.in
> index fdd52e8f6..dac0f02cb 100644
> --- a/ovsdb/ovsdb-server.1.in
> +++ b/ovsdb/ovsdb-server.1.in
> @@ -10,6 +10,7 @@ ovsdb\-server \- Open vSwitch database server
>  .SH SYNOPSIS
>  \fBovsdb\-server\fR
>  [\fIdatabase\fR]\&...
> +[\fIrelay:schema_name:remote\fR]\&...
>  [\fB\-\-remote=\fIremote\fR]\&...
>  [\fB\-\-run=\fIcommand\fR]
>  .so lib/daemon-syn.man
> @@ -35,12 +36,15 @@ For an introduction to OVSDB and its implementation in Open vSwitch,
>  see \fBovsdb\fR(7).
>  .PP
>  Each OVSDB file may be specified on the command line as \fIdatabase\fR.
> -If none is specified, the default is \fB@DBDIR@/conf.db\fR.  The database
> -files must already have been created and initialized using, for
> -example, \fBovsdb\-tool\fR's \fBcreate\fR, \fBcreate\-cluster\fR, or
> -\fBjoin\-cluster\fR command.
> +Relay databases may be specified on the command line as
> +\fIrelay:schema_name:remote\fR.  For a detailed description of relay database
> +argument, see \fBovsdb\fR(7).
> +If none of database files or relay databases is specified, the default is
> +\fB@DBDIR@/conf.db\fR.  The database files must already have been created and
> +initialized using, for example, \fBovsdb\-tool\fR's \fBcreate\fR,
> +\fBcreate\-cluster\fR, or \fBjoin\-cluster\fR command.
>  .PP
> -This OVSDB implementation supports standalone, active-backup, and
> +This OVSDB implementation supports standalone, active-backup, relay and
>  clustered database service models, as well as database replication.
>  See the Service Models section of \fBovsdb\fR(7) for more information.
>  .PP
> @@ -50,7 +54,9 @@ successfully join a cluster (if the database file is freshly created
>  with \fBovsdb\-tool join\-cluster\fR) or connect to a cluster that it
>  has already joined.  Use \fBovsdb\-client wait\fR (see
>  \fBovsdb\-client\fR(1)) to wait until the server has successfully
> -joined and connected to a cluster.
> +joined and connected to a cluster.  The same is true for relay databases.
> +Same commands could be used to wait for a relay database to connect to
> +the relay source (remote).
>  .PP
>  In addition to user-specified databases, \fBovsdb\-server\fR version
>  2.9 and later also always hosts a built-in database named
> @@ -243,10 +249,11 @@ not list remotes added indirectly because they were read from the
>  database by configuring a
>  \fBdb:\fIdb\fB,\fItable\fB,\fIcolumn\fR remote.
>  .
> -.IP "\fBovsdb\-server/add\-db \fIdatabase\fR"
> -Adds the \fIdatabase\fR to the running \fBovsdb\-server\fR.  The database
> -file must already have been created and initialized using, for example,
> -\fBovsdb\-tool create\fR.
> +.IP "\fBovsdb\-server/add\-db \fIdatabase\fR
> +Adds the \fIdatabase\fR to the running \fBovsdb\-server\fR.  \fIdatabase\fR
> +could be a database file or a relay description in the following format:
> +\fIrelay:schema_name:remote\fR.  The database file must already have been
> +created and initialized using, for example, \fBovsdb\-tool create\fR.
>  .
>  .IP "\fBovsdb\-server/remove\-db \fIdatabase\fR"
>  Removes \fIdatabase\fR from the running \fBovsdb\-server\fR.  \fIdatabase\fR
>
Mark Gray July 2, 2021, 11:05 a.m. UTC | #2
On 12/06/2021 03:00, Ilya Maximets wrote:
> Main documentation for the service model and tutorial with the use case
> and configuration examples.
> 
> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
> ---
>  Documentation/automake.mk            |   1 +
>  Documentation/ref/ovsdb.7.rst        |  62 ++++++++++++--
>  Documentation/topics/index.rst       |   1 +
>  Documentation/topics/ovsdb-relay.rst | 124 +++++++++++++++++++++++++++
>  NEWS                                 |   3 +
>  ovsdb/ovsdb-server.1.in              |  27 +++---
>  6 files changed, 200 insertions(+), 18 deletions(-)
>  create mode 100644 Documentation/topics/ovsdb-relay.rst
> 
> diff --git a/Documentation/automake.mk b/Documentation/automake.mk
> index bc30f94c5..213d9c867 100644
> --- a/Documentation/automake.mk
> +++ b/Documentation/automake.mk
> @@ -52,6 +52,7 @@ DOC_SOURCE = \
>  	Documentation/topics/networking-namespaces.rst \
>  	Documentation/topics/openflow.rst \
>  	Documentation/topics/ovs-extensions.rst \
> +	Documentation/topics/ovsdb-relay.rst \
>  	Documentation/topics/ovsdb-replication.rst \
>  	Documentation/topics/porting.rst \
>  	Documentation/topics/record-replay.rst \
> diff --git a/Documentation/ref/ovsdb.7.rst b/Documentation/ref/ovsdb.7.rst
> index e4f1bf766..a5b8a9c33 100644
> --- a/Documentation/ref/ovsdb.7.rst
> +++ b/Documentation/ref/ovsdb.7.rst
> @@ -121,13 +121,14 @@ schema checksum from a schema or database file, respectively.
>  Service Models
>  ==============
>  
> -OVSDB supports three service models for databases: **standalone**,
> -**active-backup**, and **clustered**.  The service models provide different
> -compromises among consistency, availability, and partition tolerance.  They
> -also differ in the number of servers required and in terms of performance.  The
> -standalone and active-backup database service models share one on-disk format,
> -and clustered databases use a different format, but the OVSDB programs work
> -with both formats.  ``ovsdb(5)`` documents these file formats.
> +OVSDB supports four service models for databases: **standalone**,
> +**active-backup**, **relay** and **clustered**.  The service models provide
> +different compromises among consistency, availability, and partition tolerance.
> +They also differ in the number of servers required and in terms of performance.
> +The standalone and active-backup database service models share one on-disk
> +format, and clustered databases use a different format, but the OVSDB programs
> +work with both formats.  ``ovsdb(5)`` documents these file formats.  Relay
> +databases has no on-disk storage.

s/has/have

>  
>  RFC 7047, which specifies the OVSDB protocol, does not mandate or specify
>  any particular service model.
> @@ -406,6 +407,50 @@ following consequences:
>    that the client previously read.  The OVSDB client library in Open vSwitch
>    uses this feature to avoid servers with stale data.
>  
> +Relay Service Model
> +-------------------
> +
> +A **relay** database is a way to scale out read-mostly access to the
> +existing database working in any service model including relay.
> +
> +Relay database creates and maintains an OVSDB connection with other OVSDB

s/other/another

> +server.  It uses this connection to maintain in-memory copy of the remote

s/maintain/maintain an/

> +database (a.k.a. the ``relay source``) keeping the copy up-to-date as the
> +database content changes on relay source in the real time.

s/on/on the/

> +
> +The purpose of relay server is to scale out the number of database clients.
> +Read-only transactions and monitor requests are fully handled by the relay
> +server itself.  For the transactions that requests database modifications,
> +relay works as a proxy between the client and the relay source, i.e. it
> +forwards transactions and replies between them.
> +
> +Compared to a clustered and active-backup models, relay service model provides
> +read and write access to the database similarly to a clustered database (and
> +even more scalable), but with generally insignificant performance overhead of
> +an active-backup model.  At the same time it doesn't increase availability that
> +needs to be covered by the service model of the relay source.
> +
> +Relay database has no on-disk storage and therefore cannot be converted to
> +any other service model.
> +
> +If there is already a database started in any service model, to start a relay
> +database server use ``ovsdb-server relay:<DB_NAME>:<relay source>``, where
> +``<DB_NAME>`` is the database name as specified in the schema of the database
> +that existing server runs, and ``<relay source>`` is an OVSDB connection method
> +(see `Connection Methods`_ below) that connects to the existing database
> +server.  ``<relay source>`` could contain a comma-separated list of connection
> +methods, e.g. to connect to any server of the clustered database.
> +Multiple relay servers could be started for the same relay source.
> +
> +Since the way how relay handles read and write transactions is very similar

s/the way how relay handles/the way relays handle/

> +to the clustered model where "cluster" means "set or relay servers connected

Do you mean "set of" here?

> +to the same relay source", "follower" means "relay server" and the "leader"
> +means "relay source", same consistency consequences as for the clustered
> +model applies to relay as well (See `Understanding Cluster Consistency`_
> +above).
> +
> +Open vSwitch 2.16 introduced support for relay service model.
> +
>  Database Replication
>  ====================
>  
> @@ -414,7 +459,8 @@ Replication, in this context, means to make, and keep up-to-date, a read-only
>  copy of the contents of a database (the ``replica``).  One use of replication
>  is to keep an up-to-date backup of a database.  A replica used solely for
>  backup would not need to support clients of its own.  A set of replicas that do
> -serve clients could be used to scale out read access to the primary database.
> +serve clients could be used to scale out read access to the primary database,
> +however `Relay Service Model`_ is more suitable for that purpose.
>  
>  A database replica is set up in the same way as a backup server in an
>  active-backup pair, with the difference that the replica is never promoted to
> diff --git a/Documentation/topics/index.rst b/Documentation/topics/index.rst
> index 0036567eb..d8ccbd757 100644
> --- a/Documentation/topics/index.rst
> +++ b/Documentation/topics/index.rst
> @@ -44,6 +44,7 @@ OVS
>     openflow
>     bonding
>     networking-namespaces
> +   ovsdb-relay
>     ovsdb-replication
>     dpdk/index
>     windows
> diff --git a/Documentation/topics/ovsdb-relay.rst b/Documentation/topics/ovsdb-relay.rst
> new file mode 100644
> index 000000000..40d294c55
> --- /dev/null
> +++ b/Documentation/topics/ovsdb-relay.rst
> @@ -0,0 +1,124 @@
> +..
> +      Copyright 2021, Red Hat, Inc.
> +
> +      Licensed under the Apache License, Version 2.0 (the "License"); you may
> +      not use this file except in compliance with the License. You may obtain
> +      a copy of the License at
> +
> +          http://www.apache.org/licenses/LICENSE-2.0
> +
> +      Unless required by applicable law or agreed to in writing, software
> +      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
> +      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
> +      License for the specific language governing permissions and limitations
> +      under the License.
> +
> +      Convention for heading levels in Open vSwitch documentation:
> +
> +      =======  Heading 0 (reserved for the title in a document)
> +      -------  Heading 1
> +      ~~~~~~~  Heading 2
> +      +++++++  Heading 3
> +      '''''''  Heading 4
> +
> +      Avoid deeper levels because they do not render well.
> +
> +===============================
> +Scaling OVSDB Access With Relay
> +===============================
> +
> +Open vSwitch 2.16 introduced support for OVSDB Relay mode with the goal to
> +increase database scalability for a big deployments.  Mainly, OVN (Open Virtual
> +Network) Southbound Database deployments.  This document describes the main
> +concept and provides the configuration examples.
> +
> +What is OVSDB Relay?
> +--------------------
> +
> +Relay is a database service model in which one ``ovsdb-server`` (``relay``)
> +connects to another standalone or clustered database server
> +(``relay source``) and maintains in-memory copy of its data, receiving
> +all the updates via this OVSDB connection.  Relay server handles all the
> +read-only requests (monitors and transactions) on its own and forwards all the
> +transactions that requires database modifications to the relay source.

s/that requires/that require/

> +
> +Why is this needed?
> +-------------------
> +
> +Some OVN deployment could have hundreds or even thousands nodes, on each of

s/nodes,/of nodes. On/

> +these nodes there is an ovn-controller, which is connected to the
> +OVN_Southbound database that is served by a standalone or clustered OVSDB.
> +Standalone database is handled by a single ovsdb-server process and clustered
> +could consist of 3 to 5 ovsdb-server processes.  For the clustered database,
> +higher number of servers may significantly increase transaction latency due
> +to necessity for these servers to reach consensus.  So, in the end limited
> +number of ovsdb-server processes serves ever growing number of clients and this
> +leads to performance issues.
> +
> +Read-only access could be scaled up with OVSDB replication on top of
> +active-backup service model, but ovn-controller is a read-mostly client, not
> +a read-only, i.e. it needs to execute write transactions from time to time.
> +Here relay service model comes into play.
> +
> +2-Tier Deployment
> +-----------------
> +
> +Solution for the scaling issue could look like a 2-tier deployment, where
> +a set of relay servers is connected to the main database cluster
> +(OVN_Southbound) and clients (ovn-conrtoller) connected to these relay
> +servers::
> +
> +                                    172.16.0.1
> +   +--------------------+   +----+ ovsdb-relay-1 +--+---+ client-1
> +   |                    |   |                       |
> +   |    Clustered       |   |                       +---+ client-2
> +   |     Database       |   |                        ...
> +   |                    |   |                       +---+ client-N
> +   |    10.0.0.2        |   |
> +   |  ovsdb-server-2    |   |       172.16.0.2
> +   |   +        +       |   +----+ ovsdb-relay-2 +--+---+ client-N+1
> +   |   |        |       |   |                       |
> +   |   |        +       +---+                       +---+ client-N+2
> +   |   |   10.0.0.1     |   |                        ...
> +   |   | ovsdb-server-1 |   |                       +---+ client-2N
> +   |   |        +       |   |
> +   |   |        |       |   |
> +   |   +        +       |   +      ... ... ... ... ...
> +   |  ovsdb-server-3    |   |
> +   |    10.0.0.3        |   |                       +---+ client-KN-1
> +   |                    |   |       172.16.0.K      |
> +   +--------------------+   +----+ ovsdb-relay-K +--+---+ client-KN
> +
> +In practice, the picture might look a bit more complex, because all relay
> +servers might connect to any member of a main cluster and clients might
> +connect to any relay server of their choice.
> +
> +Assuming that servers of a main cluster started like this::
> +
> +  $ ovsdb-server --remote=ptcp:10.0.0.1:6642 ovn-sb-1.db
> +
> +The same for other two servers.  In this case relay servers could be
> +started like this::
> +
> +  $ REMOTES=tcp:10.0.0.1:6642,tcp:10.0.0.2:6642,tcp:10.0.0.3:6642
> +  $ ovsdb-server --remote=ptcp:172.16.0.1:6642 relay:OVN_Southbound:$REMOTES
> +  $ ...
> +  $ ovsdb-server --remote=ptcp:172.16.0.K:6642 relay:OVN_Southbound:$REMOTES
> +
> +Every relay server could connect to any of the cluster members of their choice,
> +fairness of load distribution is achieved by shuffling remotes.

I guess this assumes a large number of remotes? What I mean here is that
there is no mechanism actively shuffling; it is dependent on a large
number of clients connecting to randomly selected nodes?

As relays are meant to be ephemeral, what would happen if we brought one
down for some reason? I presume that all connections would then migrate
to the next relay in their list? In this case, it is quite likely
that they all have the same list, which would cause them all to migrate
to the same relay?

> +
> +For the actual clients, they could be configured to connect to any of the
> +relay servers.  For ovn-controllers the configuration could look like this::
> +
> +  $ REMOTES=tcp:172.16.0.1:6642,...,tcp:172.16.0.K:6642
> +  $ ovs-vsctl set Open_vSwitch . external-ids:ovn-remote=$REMOTES
> +
> +Setup like this allows the system to serve ``K * N`` clients while having only
> +``K`` actual connections on the main clustered database keeping it in a
> +stable state.
> +
> +It's also possible to create multi-tier deployments by connecting one set
> +of relay servers to another (smaller) set of relay servers, or even create
> +tree-like structures by the cost of increased latency for write transactions,
> +because they will be forwarded multiple times.
> diff --git a/NEWS b/NEWS
> index ebba17b22..391b0abba 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -1,6 +1,9 @@
>  Post-v2.15.0
>  ---------------------
>     - OVSDB:
> +     * Introduced new database service model - "relay".  Targeted to scale out
> +       read-mostly access (ovn-controller) to existing databases.
> +       For more information: ovsdb(7) and Documentation/topics/ovsdb-relay.rst
>       * New command line options --record/--replay for ovsdb-server and
>         ovsdb-client to record and replay all the incoming transactions,
>         monitors, etc.  More datails in Documentation/topics/record-replay.rst.
> diff --git a/ovsdb/ovsdb-server.1.in b/ovsdb/ovsdb-server.1.in
> index fdd52e8f6..dac0f02cb 100644
> --- a/ovsdb/ovsdb-server.1.in
> +++ b/ovsdb/ovsdb-server.1.in
> @@ -10,6 +10,7 @@ ovsdb\-server \- Open vSwitch database server
>  .SH SYNOPSIS
>  \fBovsdb\-server\fR
>  [\fIdatabase\fR]\&...
> +[\fIrelay:schema_name:remote\fR]\&...
>  [\fB\-\-remote=\fIremote\fR]\&...
>  [\fB\-\-run=\fIcommand\fR]
>  .so lib/daemon-syn.man
> @@ -35,12 +36,15 @@ For an introduction to OVSDB and its implementation in Open vSwitch,
>  see \fBovsdb\fR(7).
>  .PP
>  Each OVSDB file may be specified on the command line as \fIdatabase\fR.
> -If none is specified, the default is \fB@DBDIR@/conf.db\fR.  The database
> -files must already have been created and initialized using, for
> -example, \fBovsdb\-tool\fR's \fBcreate\fR, \fBcreate\-cluster\fR, or
> -\fBjoin\-cluster\fR command.
> +Relay databases may be specified on the command line as
> +\fIrelay:schema_name:remote\fR.  For a detailed description of relay database
> +argument, see \fBovsdb\fR(7).
> +If none of database files or relay databases is specified, the default is
> +\fB@DBDIR@/conf.db\fR.  The database files must already have been created and
> +initialized using, for example, \fBovsdb\-tool\fR's \fBcreate\fR,
> +\fBcreate\-cluster\fR, or \fBjoin\-cluster\fR command.
>  .PP
> -This OVSDB implementation supports standalone, active-backup, and
> +This OVSDB implementation supports standalone, active-backup, relay and
>  clustered database service models, as well as database replication.
>  See the Service Models section of \fBovsdb\fR(7) for more information.
>  .PP
> @@ -50,7 +54,9 @@ successfully join a cluster (if the database file is freshly created
>  with \fBovsdb\-tool join\-cluster\fR) or connect to a cluster that it
>  has already joined.  Use \fBovsdb\-client wait\fR (see
>  \fBovsdb\-client\fR(1)) to wait until the server has successfully
> -joined and connected to a cluster.
> +joined and connected to a cluster.  The same is true for relay databases.
> +Same commands could be used to wait for a relay database to connect to
> +the relay source (remote).
>  .PP
>  In addition to user-specified databases, \fBovsdb\-server\fR version
>  2.9 and later also always hosts a built-in database named
> @@ -243,10 +249,11 @@ not list remotes added indirectly because they were read from the
>  database by configuring a
>  \fBdb:\fIdb\fB,\fItable\fB,\fIcolumn\fR remote.
>  .
> -.IP "\fBovsdb\-server/add\-db \fIdatabase\fR"
> -Adds the \fIdatabase\fR to the running \fBovsdb\-server\fR.  The database
> -file must already have been created and initialized using, for example,
> -\fBovsdb\-tool create\fR.
> +.IP "\fBovsdb\-server/add\-db \fIdatabase\fR
> +Adds the \fIdatabase\fR to the running \fBovsdb\-server\fR.  \fIdatabase\fR
> +could be a database file or a relay description in the following format:
> +\fIrelay:schema_name:remote\fR.  The database file must already have been
> +created and initialized using, for example, \fBovsdb\-tool create\fR.
>  .
>  .IP "\fBovsdb\-server/remove\-db \fIdatabase\fR"
>  Removes \fIdatabase\fR from the running \fBovsdb\-server\fR.  \fIdatabase\fR
>
Ilya Maximets July 12, 2021, 11:48 p.m. UTC | #3
On 6/25/21 3:35 PM, Dumitru Ceara wrote:
> On 6/12/21 4:00 AM, Ilya Maximets wrote:
>> Main documentation for the service model and tutorial with the use case
>> and configuration examples.
>>
>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>> ---
> 
> I left a few minor comments below.  With them addressed:
> 
> Acked-by: Dumitru Ceara <dceara@redhat.com>
> 
> Thanks!
> 
>>  Documentation/automake.mk            |   1 +
>>  Documentation/ref/ovsdb.7.rst        |  62 ++++++++++++--
>>  Documentation/topics/index.rst       |   1 +
>>  Documentation/topics/ovsdb-relay.rst | 124 +++++++++++++++++++++++++++
>>  NEWS                                 |   3 +
>>  ovsdb/ovsdb-server.1.in              |  27 +++---
>>  6 files changed, 200 insertions(+), 18 deletions(-)
>>  create mode 100644 Documentation/topics/ovsdb-relay.rst
>>
>> diff --git a/Documentation/automake.mk b/Documentation/automake.mk
>> index bc30f94c5..213d9c867 100644
>> --- a/Documentation/automake.mk
>> +++ b/Documentation/automake.mk
>> @@ -52,6 +52,7 @@ DOC_SOURCE = \
>>  	Documentation/topics/networking-namespaces.rst \
>>  	Documentation/topics/openflow.rst \
>>  	Documentation/topics/ovs-extensions.rst \
>> +	Documentation/topics/ovsdb-relay.rst \
>>  	Documentation/topics/ovsdb-replication.rst \
>>  	Documentation/topics/porting.rst \
>>  	Documentation/topics/record-replay.rst \
>> diff --git a/Documentation/ref/ovsdb.7.rst b/Documentation/ref/ovsdb.7.rst
>> index e4f1bf766..a5b8a9c33 100644
>> --- a/Documentation/ref/ovsdb.7.rst
>> +++ b/Documentation/ref/ovsdb.7.rst
>> @@ -121,13 +121,14 @@ schema checksum from a schema or database file, respectively.
>>  Service Models
>>  ==============
>>  
>> -OVSDB supports three service models for databases: **standalone**,
>> -**active-backup**, and **clustered**.  The service models provide different
>> -compromises among consistency, availability, and partition tolerance.  They
>> -also differ in the number of servers required and in terms of performance.  The
>> -standalone and active-backup database service models share one on-disk format,
>> -and clustered databases use a different format, but the OVSDB programs work
>> -with both formats.  ``ovsdb(5)`` documents these file formats.
>> +OVSDB supports four service models for databases: **standalone**,
>> +**active-backup**, **relay** and **clustered**.  The service models provide
>> +different compromises among consistency, availability, and partition tolerance.
>> +They also differ in the number of servers required and in terms of performance.
>> +The standalone and active-backup database service models share one on-disk
>> +format, and clustered databases use a different format, but the OVSDB programs
>> +work with both formats.  ``ovsdb(5)`` documents these file formats.  Relay
>> +databases has no on-disk storage.
> 
> s/has/have

OK.

> 
>>  
>>  RFC 7047, which specifies the OVSDB protocol, does not mandate or specify
>>  any particular service model.
>> @@ -406,6 +407,50 @@ following consequences:
>>    that the client previously read.  The OVSDB client library in Open vSwitch
>>    uses this feature to avoid servers with stale data.
>>  
>> +Relay Service Model
>> +-------------------
>> +
>> +A **relay** database is a way to scale out read-mostly access to the
>> +existing database working in any service model including relay.
>> +
>> +Relay database creates and maintains an OVSDB connection with other OVSDB
>> +server.  It uses this connection to maintain in-memory copy of the remote
>> +database (a.k.a. the ``relay source``) keeping the copy up-to-date as the
>> +database content changes on relay source in the real time.
>> +
>> +The purpose of relay server is to scale out the number of database clients.
>> +Read-only transactions and monitor requests are fully handled by the relay
>> +server itself.  For the transactions that requests database modifications,
> 
> s/requests/request

OK.

> 
>> +relay works as a proxy between the client and the relay source, i.e. it
>> +forwards transactions and replies between them.
>> +
>> +Compared to a clustered and active-backup models, relay service model provides
> 
> s/Compared to a/Compared to the

OK.

> 
>> +read and write access to the database similarly to a clustered database (and
>> +even more scalable), but with generally insignificant performance overhead of
> 
> Joke: citation needed

:)

> 
>> +an active-backup model.  At the same time it doesn't increase availability that
>> +needs to be covered by the service model of the relay source.
>> +
>> +Relay database has no on-disk storage and therefore cannot be converted to
>> +any other service model.
>> +
>> +If there is already a database started in any service model, to start a relay
>> +database server use ``ovsdb-server relay:<DB_NAME>:<relay source>``, where
>> +``<DB_NAME>`` is the database name as specified in the schema of the database
>> +that existing server runs, and ``<relay source>`` is an OVSDB connection method
>> +(see `Connection Methods`_ below) that connects to the existing database
>> +server.  ``<relay source>`` could contain a comma-separated list of connection
>> +methods, e.g. to connect to any server of the clustered database.
>> +Multiple relay servers could be started for the same relay source.
>> +
>> +Since the way how relay handles read and write transactions is very similar
>> +to the clustered model where "cluster" means "set or relay servers connected
>> +to the same relay source", "follower" means "relay server" and the "leader"
>> +means "relay source", same consistency consequences as for the clustered
>> +model applies to relay as well (See `Understanding Cluster Consistency`_
>> +above).
>> +
>> +Open vSwitch 2.16 introduced support for relay service model.
>> +
>>  Database Replication
>>  ====================
>>  
>> @@ -414,7 +459,8 @@ Replication, in this context, means to make, and keep up-to-date, a read-only
>>  copy of the contents of a database (the ``replica``).  One use of replication
>>  is to keep an up-to-date backup of a database.  A replica used solely for
>>  backup would not need to support clients of its own.  A set of replicas that do
>> -serve clients could be used to scale out read access to the primary database.
>> +serve clients could be used to scale out read access to the primary database,
>> +however `Relay Service Model`_ is more suitable for that purpose.
>>  
>>  A database replica is set up in the same way as a backup server in an
>>  active-backup pair, with the difference that the replica is never promoted to
>> diff --git a/Documentation/topics/index.rst b/Documentation/topics/index.rst
>> index 0036567eb..d8ccbd757 100644
>> --- a/Documentation/topics/index.rst
>> +++ b/Documentation/topics/index.rst
>> @@ -44,6 +44,7 @@ OVS
>>     openflow
>>     bonding
>>     networking-namespaces
>> +   ovsdb-relay
>>     ovsdb-replication
>>     dpdk/index
>>     windows
>> diff --git a/Documentation/topics/ovsdb-relay.rst b/Documentation/topics/ovsdb-relay.rst
>> new file mode 100644
>> index 000000000..40d294c55
>> --- /dev/null
>> +++ b/Documentation/topics/ovsdb-relay.rst
>> @@ -0,0 +1,124 @@
>> +..
>> +      Copyright 2021, Red Hat, Inc.
>> +
>> +      Licensed under the Apache License, Version 2.0 (the "License"); you may
>> +      not use this file except in compliance with the License. You may obtain
>> +      a copy of the License at
>> +
>> +          http://www.apache.org/licenses/LICENSE-2.0
>> +
>> +      Unless required by applicable law or agreed to in writing, software
>> +      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
>> +      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
>> +      License for the specific language governing permissions and limitations
>> +      under the License.
>> +
>> +      Convention for heading levels in Open vSwitch documentation:
>> +
>> +      =======  Heading 0 (reserved for the title in a document)
>> +      -------  Heading 1
>> +      ~~~~~~~  Heading 2
>> +      +++++++  Heading 3
>> +      '''''''  Heading 4
>> +
>> +      Avoid deeper levels because they do not render well.
>> +
>> +===============================
>> +Scaling OVSDB Access With Relay
>> +===============================
>> +
>> +Open vSwitch 2.16 introduced support for OVSDB Relay mode with the goal to
>> +increase database scalability for a big deployments.  Mainly, OVN (Open Virtual
>> +Network) Southbound Database deployments.  This document describes the main
>> +concept and provides the configuration examples.
>> +
>> +What is OVSDB Relay?
>> +--------------------
>> +
>> +Relay is a database service model in which one ``ovsdb-server`` (``relay``)
>> +connects to another standalone or clustered database server
>> +(``relay source``) and maintains in-memory copy of its data, receiving
>> +all the updates via this OVSDB connection.  Relay server handles all the
>> +read-only requests (monitors and transactions) on its own and forwards all the
>> +transactions that requires database modifications to the relay source.
>> +
>> +Why is this needed?
>> +-------------------
>> +
>> +Some OVN deployment could have hundreds or even thousands nodes, on each of
>> +these nodes there is an ovn-controller, which is connected to the
>> +OVN_Southbound database that is served by a standalone or clustered OVSDB.
>> +Standalone database is handled by a single ovsdb-server process and clustered
>> +could consist of 3 to 5 ovsdb-server processes.  For the clustered database,
>> +higher number of servers may significantly increase transaction latency due
>> +to necessity for these servers to reach consensus.  So, in the end limited
>> +number of ovsdb-server processes serves ever growing number of clients and this
>> +leads to performance issues.
>> +
>> +Read-only access could be scaled up with OVSDB replication on top of
>> +active-backup service model, but ovn-controller is a read-mostly client, not
>> +a read-only, i.e. it needs to execute write transactions from time to time.
>> +Here relay service model comes into play.
>> +
>> +2-Tier Deployment
>> +-----------------
>> +
>> +Solution for the scaling issue could look like a 2-tier deployment, where
>> +a set of relay servers is connected to the main database cluster
>> +(OVN_Southbound) and clients (ovn-conrtoller) connected to these relay
>> +servers::
>> +
>> +                                    172.16.0.1
>> +   +--------------------+   +----+ ovsdb-relay-1 +--+---+ client-1
>> +   |                    |   |                       |
>> +   |    Clustered       |   |                       +---+ client-2
>> +   |     Database       |   |                        ...
>> +   |                    |   |                       +---+ client-N
>> +   |    10.0.0.2        |   |
>> +   |  ovsdb-server-2    |   |       172.16.0.2
>> +   |   +        +       |   +----+ ovsdb-relay-2 +--+---+ client-N+1
>> +   |   |        |       |   |                       |
>> +   |   |        +       +---+                       +---+ client-N+2
>> +   |   |   10.0.0.1     |   |                        ...
>> +   |   | ovsdb-server-1 |   |                       +---+ client-2N
>> +   |   |        +       |   |
>> +   |   |        |       |   |
>> +   |   +        +       |   +      ... ... ... ... ...
>> +   |  ovsdb-server-3    |   |
>> +   |    10.0.0.3        |   |                       +---+ client-KN-1
>> +   |                    |   |       172.16.0.K      |
>> +   +--------------------+   +----+ ovsdb-relay-K +--+---+ client-KN
>> +
>> +In practice, the picture might look a bit more complex, because all relay
>> +servers might connect to any member of a main cluster and clients might
>> +connect to any relay server of their choice.
>> +
>> +Assuming that servers of a main cluster started like this::
>> +
>> +  $ ovsdb-server --remote=ptcp:10.0.0.1:6642 ovn-sb-1.db
>> +
>> +The same for other two servers.  In this case relay servers could be
>> +started like this::
>> +
>> +  $ REMOTES=tcp:10.0.0.1:6642,tcp:10.0.0.2:6642,tcp:10.0.0.3:6642
>> +  $ ovsdb-server --remote=ptcp:172.16.0.1:6642 relay:OVN_Southbound:$REMOTES
>> +  $ ...
>> +  $ ovsdb-server --remote=ptcp:172.16.0.K:6642 relay:OVN_Southbound:$REMOTES

I also fixed the way passive connections are configured (the port should go before the IP).
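A minimal sketch of the corrected invocations, assuming the standard
ptcp:port[:ip] passive form and reusing the addresses from the example above:

  $ REMOTES=tcp:10.0.0.1:6642,tcp:10.0.0.2:6642,tcp:10.0.0.3:6642
  # passive remote: the listening port comes first, then the relay's own IP
  $ ovsdb-server --remote=ptcp:6642:172.16.0.1 relay:OVN_Southbound:$REMOTES
  $ ...
  $ ovsdb-server --remote=ptcp:6642:172.16.0.K relay:OVN_Southbound:$REMOTES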

>> +
>> +Every relay server could connect to any of the cluster members of their choice,
>> +fairness of load distribution is achieved by shuffling remotes.
>> +
>> +For the actual clients, they could be configured to connect to any of the
>> +relay servers.  For ovn-controllers the configuration could look like this::
>> +
>> +  $ REMOTES=tcp:172.16.0.1:6642,...,tcp:172.16.0.K:6642
>> +  $ ovs-vsctl set Open_vSwitch . external-ids:ovn-remote=$REMOTES
>> +
>> +Setup like this allows the system to serve ``K * N`` clients while having only
>> +``K`` actual connections on the main clustered database keeping it in a
>> +stable state.
>> +
>> +It's also possible to create multi-tier deployments by connecting one set
>> +of relay servers to another (smaller) set of relay servers, or even create
>> +tree-like structures by the cost of increased latency for write transactions,
> 
> s/by/with

OK.

> 
>> +because they will be forwarded multiple times.
>> diff --git a/NEWS b/NEWS
>> index ebba17b22..391b0abba 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -1,6 +1,9 @@
>>  Post-v2.15.0
>>  ---------------------
>>     - OVSDB:
>> +     * Introduced new database service model - "relay".  Targeted to scale out
>> +       read-mostly access (ovn-controller) to existing databases.
>> +       For more information: ovsdb(7) and Documentation/topics/ovsdb-relay.rst
>>       * New command line options --record/--replay for ovsdb-server and
>>         ovsdb-client to record and replay all the incoming transactions,
>>         monitors, etc.  More datails in Documentation/topics/record-replay.rst.
>> diff --git a/ovsdb/ovsdb-server.1.in b/ovsdb/ovsdb-server.1.in
>> index fdd52e8f6..dac0f02cb 100644
>> --- a/ovsdb/ovsdb-server.1.in
>> +++ b/ovsdb/ovsdb-server.1.in
>> @@ -10,6 +10,7 @@ ovsdb\-server \- Open vSwitch database server
>>  .SH SYNOPSIS
>>  \fBovsdb\-server\fR
>>  [\fIdatabase\fR]\&...
>> +[\fIrelay:schema_name:remote\fR]\&...
>>  [\fB\-\-remote=\fIremote\fR]\&...
>>  [\fB\-\-run=\fIcommand\fR]
>>  .so lib/daemon-syn.man
>> @@ -35,12 +36,15 @@ For an introduction to OVSDB and its implementation in Open vSwitch,
>>  see \fBovsdb\fR(7).
>>  .PP
>>  Each OVSDB file may be specified on the command line as \fIdatabase\fR.
>> -If none is specified, the default is \fB@DBDIR@/conf.db\fR.  The database
>> -files must already have been created and initialized using, for
>> -example, \fBovsdb\-tool\fR's \fBcreate\fR, \fBcreate\-cluster\fR, or
>> -\fBjoin\-cluster\fR command.
>> +Relay databases may be specified on the command line as
>> +\fIrelay:schema_name:remote\fR.  For a detailed description of relay database
>> +argument, see \fBovsdb\fR(7).
>> +If none of database files or relay databases is specified, the default is
>> +\fB@DBDIR@/conf.db\fR.  The database files must already have been created and
>> +initialized using, for example, \fBovsdb\-tool\fR's \fBcreate\fR,
>> +\fBcreate\-cluster\fR, or \fBjoin\-cluster\fR command.
>>  .PP
>> -This OVSDB implementation supports standalone, active-backup, and
>> +This OVSDB implementation supports standalone, active-backup, relay and
>>  clustered database service models, as well as database replication.
>>  See the Service Models section of \fBovsdb\fR(7) for more information.
>>  .PP
>> @@ -50,7 +54,9 @@ successfully join a cluster (if the database file is freshly created
>>  with \fBovsdb\-tool join\-cluster\fR) or connect to a cluster that it
>>  has already joined.  Use \fBovsdb\-client wait\fR (see
>>  \fBovsdb\-client\fR(1)) to wait until the server has successfully
>> -joined and connected to a cluster.
>> +joined and connected to a cluster.  The same is true for relay databases.
>> +Same commands could be used to wait for a relay database to connect to
>> +the relay source (remote).
>>  .PP
>>  In addition to user-specified databases, \fBovsdb\-server\fR version
>>  2.9 and later also always hosts a built-in database named
>> @@ -243,10 +249,11 @@ not list remotes added indirectly because they were read from the
>>  database by configuring a
>>  \fBdb:\fIdb\fB,\fItable\fB,\fIcolumn\fR remote.
>>  .
>> -.IP "\fBovsdb\-server/add\-db \fIdatabase\fR"
>> -Adds the \fIdatabase\fR to the running \fBovsdb\-server\fR.  The database
>> -file must already have been created and initialized using, for example,
>> -\fBovsdb\-tool create\fR.
>> +.IP "\fBovsdb\-server/add\-db \fIdatabase\fR
>> +Adds the \fIdatabase\fR to the running \fBovsdb\-server\fR.  \fIdatabase\fR
>> +could be a database file or a relay description in the following format:
>> +\fIrelay:schema_name:remote\fR.  The database file must already have been
>> +created and initialized using, for example, \fBovsdb\-tool create\fR.
>>  .
>>  .IP "\fBovsdb\-server/remove\-db \fIdatabase\fR"
>>  Removes \fIdatabase\fR from the running \fBovsdb\-server\fR.  \fIdatabase\fR
>>
>
Ilya Maximets July 12, 2021, 11:55 p.m. UTC | #4
On 7/2/21 1:05 PM, Mark Gray wrote:
> On 12/06/2021 03:00, Ilya Maximets wrote:
>> Main documentation for the service model and tutorial with the use case
>> and configuration examples.
>>
>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>> ---
>>  Documentation/automake.mk            |   1 +
>>  Documentation/ref/ovsdb.7.rst        |  62 ++++++++++++--
>>  Documentation/topics/index.rst       |   1 +
>>  Documentation/topics/ovsdb-relay.rst | 124 +++++++++++++++++++++++++++
>>  NEWS                                 |   3 +
>>  ovsdb/ovsdb-server.1.in              |  27 +++---
>>  6 files changed, 200 insertions(+), 18 deletions(-)
>>  create mode 100644 Documentation/topics/ovsdb-relay.rst
>>
>> diff --git a/Documentation/automake.mk b/Documentation/automake.mk
>> index bc30f94c5..213d9c867 100644
>> --- a/Documentation/automake.mk
>> +++ b/Documentation/automake.mk
>> @@ -52,6 +52,7 @@ DOC_SOURCE = \
>>  	Documentation/topics/networking-namespaces.rst \
>>  	Documentation/topics/openflow.rst \
>>  	Documentation/topics/ovs-extensions.rst \
>> +	Documentation/topics/ovsdb-relay.rst \
>>  	Documentation/topics/ovsdb-replication.rst \
>>  	Documentation/topics/porting.rst \
>>  	Documentation/topics/record-replay.rst \
>> diff --git a/Documentation/ref/ovsdb.7.rst b/Documentation/ref/ovsdb.7.rst
>> index e4f1bf766..a5b8a9c33 100644
>> --- a/Documentation/ref/ovsdb.7.rst
>> +++ b/Documentation/ref/ovsdb.7.rst
>> @@ -121,13 +121,14 @@ schema checksum from a schema or database file, respectively.
>>  Service Models
>>  ==============
>>  
>> -OVSDB supports three service models for databases: **standalone**,
>> -**active-backup**, and **clustered**.  The service models provide different
>> -compromises among consistency, availability, and partition tolerance.  They
>> -also differ in the number of servers required and in terms of performance.  The
>> -standalone and active-backup database service models share one on-disk format,
>> -and clustered databases use a different format, but the OVSDB programs work
>> -with both formats.  ``ovsdb(5)`` documents these file formats.
>> +OVSDB supports four service models for databases: **standalone**,
>> +**active-backup**, **relay** and **clustered**.  The service models provide
>> +different compromises among consistency, availability, and partition tolerance.
>> +They also differ in the number of servers required and in terms of performance.
>> +The standalone and active-backup database service models share one on-disk
>> +format, and clustered databases use a different format, but the OVSDB programs
>> +work with both formats.  ``ovsdb(5)`` documents these file formats.  Relay
>> +databases has no on-disk storage.
> 
> s/has/have

OK.

> 
>>  
>>  RFC 7047, which specifies the OVSDB protocol, does not mandate or specify
>>  any particular service model.
>> @@ -406,6 +407,50 @@ following consequences:
>>    that the client previously read.  The OVSDB client library in Open vSwitch
>>    uses this feature to avoid servers with stale data.
>>  
>> +Relay Service Model
>> +-------------------
>> +
>> +A **relay** database is a way to scale out read-mostly access to the
>> +existing database working in any service model including relay.
>> +
>> +Relay database creates and maintains an OVSDB connection with other OVSDB
> 
> s/other/another

OK.

> 
>> +server.  It uses this connection to maintain in-memory copy of the remote
> 
> s/maintain/maintain an/

OK.

> 
>> +database (a.k.a. the ``relay source``) keeping the copy up-to-date as the
>> +database content changes on relay source in the real time.
> 
> s/on/on the/

OK.

> 
>> +
>> +The purpose of relay server is to scale out the number of database clients.
>> +Read-only transactions and monitor requests are fully handled by the relay
>> +server itself.  For the transactions that requests database modifications,
>> +relay works as a proxy between the client and the relay source, i.e. it
>> +forwards transactions and replies between them.
>> +
>> +Compared to a clustered and active-backup models, relay service model provides
>> +read and write access to the database similarly to a clustered database (and
>> +even more scalable), but with generally insignificant performance overhead of
>> +an active-backup model.  At the same time it doesn't increase availability that
>> +needs to be covered by the service model of the relay source.
>> +
>> +Relay database has no on-disk storage and therefore cannot be converted to
>> +any other service model.
>> +
>> +If there is already a database started in any service model, to start a relay
>> +database server use ``ovsdb-server relay:<DB_NAME>:<relay source>``, where
>> +``<DB_NAME>`` is the database name as specified in the schema of the database
>> +that existing server runs, and ``<relay source>`` is an OVSDB connection method
>> +(see `Connection Methods`_ below) that connects to the existing database
>> +server.  ``<relay source>`` could contain a comma-separated list of connection
>> +methods, e.g. to connect to any server of the clustered database.
>> +Multiple relay servers could be started for the same relay source.
>> +
>> +Since the way how relay handles read and write transactions is very similar
> 
> s/the way how relay handles/the way relays handle/

OK.

> 
>> +to the clustered model where "cluster" means "set or relay servers connected
> 
> Do you mean "set of" here?

Yep.  Just a typo.

> 
>> +to the same relay source", "follower" means "relay server" and the "leader"
>> +means "relay source", same consistency consequences as for the clustered
>> +model applies to relay as well (See `Understanding Cluster Consistency`_
>> +above).
>> +
>> +Open vSwitch 2.16 introduced support for relay service model.
>> +
>>  Database Replication
>>  ====================
>>  
>> @@ -414,7 +459,8 @@ Replication, in this context, means to make, and keep up-to-date, a read-only
>>  copy of the contents of a database (the ``replica``).  One use of replication
>>  is to keep an up-to-date backup of a database.  A replica used solely for
>>  backup would not need to support clients of its own.  A set of replicas that do
>> -serve clients could be used to scale out read access to the primary database.
>> +serve clients could be used to scale out read access to the primary database,
>> +however `Relay Service Model`_ is more suitable for that purpose.
>>  
>>  A database replica is set up in the same way as a backup server in an
>>  active-backup pair, with the difference that the replica is never promoted to
>> diff --git a/Documentation/topics/index.rst b/Documentation/topics/index.rst
>> index 0036567eb..d8ccbd757 100644
>> --- a/Documentation/topics/index.rst
>> +++ b/Documentation/topics/index.rst
>> @@ -44,6 +44,7 @@ OVS
>>     openflow
>>     bonding
>>     networking-namespaces
>> +   ovsdb-relay
>>     ovsdb-replication
>>     dpdk/index
>>     windows
>> diff --git a/Documentation/topics/ovsdb-relay.rst b/Documentation/topics/ovsdb-relay.rst
>> new file mode 100644
>> index 000000000..40d294c55
>> --- /dev/null
>> +++ b/Documentation/topics/ovsdb-relay.rst
>> @@ -0,0 +1,124 @@
>> +..
>> +      Copyright 2021, Red Hat, Inc.
>> +
>> +      Licensed under the Apache License, Version 2.0 (the "License"); you may
>> +      not use this file except in compliance with the License. You may obtain
>> +      a copy of the License at
>> +
>> +          http://www.apache.org/licenses/LICENSE-2.0
>> +
>> +      Unless required by applicable law or agreed to in writing, software
>> +      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
>> +      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
>> +      License for the specific language governing permissions and limitations
>> +      under the License.
>> +
>> +      Convention for heading levels in Open vSwitch documentation:
>> +
>> +      =======  Heading 0 (reserved for the title in a document)
>> +      -------  Heading 1
>> +      ~~~~~~~  Heading 2
>> +      +++++++  Heading 3
>> +      '''''''  Heading 4
>> +
>> +      Avoid deeper levels because they do not render well.
>> +
>> +===============================
>> +Scaling OVSDB Access With Relay
>> +===============================
>> +
>> +Open vSwitch 2.16 introduced support for OVSDB Relay mode with the goal to
>> +increase database scalability for a big deployments.  Mainly, OVN (Open Virtual
>> +Network) Southbound Database deployments.  This document describes the main
>> +concept and provides the configuration examples.
>> +
>> +What is OVSDB Relay?
>> +--------------------
>> +
>> +Relay is a database service model in which one ``ovsdb-server`` (``relay``)
>> +connects to another standalone or clustered database server
>> +(``relay source``) and maintains in-memory copy of its data, receiving
>> +all the updates via this OVSDB connection.  Relay server handles all the
>> +read-only requests (monitors and transactions) on its own and forwards all the
>> +transactions that requires database modifications to the relay source.
> 
> s/that requires/that require/

OK.

> 
>> +
>> +Why is this needed?
>> +-------------------
>> +
>> +Some OVN deployment could have hundreds or even thousands nodes, on each of
> 
> s/nodes,/of nodes. On/

OK.

> 
>> +these nodes there is an ovn-controller, which is connected to the
>> +OVN_Southbound database that is served by a standalone or clustered OVSDB.
>> +Standalone database is handled by a single ovsdb-server process and clustered
>> +could consist of 3 to 5 ovsdb-server processes.  For the clustered database,
>> +higher number of servers may significantly increase transaction latency due
>> +to necessity for these servers to reach consensus.  So, in the end limited
>> +number of ovsdb-server processes serves ever growing number of clients and this
>> +leads to performance issues.
>> +
>> +Read-only access could be scaled up with OVSDB replication on top of
>> +active-backup service model, but ovn-controller is a read-mostly client, not
>> +a read-only, i.e. it needs to execute write transactions from time to time.
>> +Here relay service model comes into play.
>> +
>> +2-Tier Deployment
>> +-----------------
>> +
>> +Solution for the scaling issue could look like a 2-tier deployment, where
>> +a set of relay servers is connected to the main database cluster
>> +(OVN_Southbound) and clients (ovn-conrtoller) connected to these relay
>> +servers::
>> +
>> +                                    172.16.0.1
>> +   +--------------------+   +----+ ovsdb-relay-1 +--+---+ client-1
>> +   |                    |   |                       |
>> +   |    Clustered       |   |                       +---+ client-2
>> +   |     Database       |   |                        ...
>> +   |                    |   |                       +---+ client-N
>> +   |    10.0.0.2        |   |
>> +   |  ovsdb-server-2    |   |       172.16.0.2
>> +   |   +        +       |   +----+ ovsdb-relay-2 +--+---+ client-N+1
>> +   |   |        |       |   |                       |
>> +   |   |        +       +---+                       +---+ client-N+2
>> +   |   |   10.0.0.1     |   |                        ...
>> +   |   | ovsdb-server-1 |   |                       +---+ client-2N
>> +   |   |        +       |   |
>> +   |   |        |       |   |
>> +   |   +        +       |   +      ... ... ... ... ...
>> +   |  ovsdb-server-3    |   |
>> +   |    10.0.0.3        |   |                       +---+ client-KN-1
>> +   |                    |   |       172.16.0.K      |
>> +   +--------------------+   +----+ ovsdb-relay-K +--+---+ client-KN
>> +
>> +In practice, the picture might look a bit more complex, because all relay
>> +servers might connect to any member of a main cluster and clients might
>> +connect to any relay server of their choice.
>> +
>> +Assuming that servers of a main cluster started like this::
>> +
>> +  $ ovsdb-server --remote=ptcp:10.0.0.1:6642 ovn-sb-1.db
>> +
>> +The same for other two servers.  In this case relay servers could be
>> +started like this::
>> +
>> +  $ REMOTES=tcp:10.0.0.1:6642,tcp:10.0.0.2:6642,tcp:10.0.0.3:6642
>> +  $ ovsdb-server --remote=ptcp:172.16.0.1:6642 relay:OVN_Southbound:$REMOTES
>> +  $ ...
>> +  $ ovsdb-server --remote=ptcp:172.16.0.K:6642 relay:OVN_Southbound:$REMOTES
>> +
>> +Every relay server could connect to any of the cluster members of their choice,
>> +fairness of load distribution is achieved by shuffling remotes.
> 
> I guess this assumes a large number of remotes? What I mean here is
> there is no mechanism actively shuffling - it is dependent on a large
> number of clients connecting to randomly selected nodes?

Yes.  Each relay has a list of remotes (e.g. addresses of 3 raft members)
and the list is shuffled once before the first connection.

> 
> As relays are meant to be ephemeral, what would happen if we brought one
> down for some reason? I presume that all connections would then migrate
> to the next client in their list? In this case, it is probably likely
> that they all have the same list which would cause them all to propagate
> to the same relay?

Each client shuffles the list once before using it.  So, each client
has its own random list of remotes.  Therefore, the next remote will be
different (as far as randomization allows) for each client if they
are currently connected to the same relay.
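
In other words, giving every relay and every ovn-controller the identical
list is enough.  E.g. (just a sketch, re-using the values from the tutorial;
the shuffling itself happens internally before the first connection):

  $ REMOTES=tcp:10.0.0.1:6642,tcp:10.0.0.2:6642,tcp:10.0.0.3:6642
  $ ovsdb-server --remote=ptcp:172.16.0.1:6642 relay:OVN_Southbound:$REMOTES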

> 
>> +
>> +For the actual clients, they could be configured to connect to any of the
>> +relay servers.  For ovn-controllers the configuration could look like this::
>> +
>> +  $ REMOTES=tcp:172.16.0.1:6642,...,tcp:172.16.0.K:6642
>> +  $ ovs-vsctl set Open_vSwitch . external-ids:ovn-remote=$REMOTES
>> +
>> +Setup like this allows the system to serve ``K * N`` clients while having only
>> +``K`` actual connections on the main clustered database keeping it in a
>> +stable state.
>> +
>> +It's also possible to create multi-tier deployments by connecting one set
>> +of relay servers to another (smaller) set of relay servers, or even create
>> +tree-like structures by the cost of increased latency for write transactions,
>> +because they will be forwarded multiple times.
>> diff --git a/NEWS b/NEWS
>> index ebba17b22..391b0abba 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -1,6 +1,9 @@
>>  Post-v2.15.0
>>  ---------------------
>>     - OVSDB:
>> +     * Introduced new database service model - "relay".  Targeted to scale out
>> +       read-mostly access (ovn-controller) to existing databases.
>> +       For more information: ovsdb(7) and Documentation/topics/ovsdb-relay.rst
>>       * New command line options --record/--replay for ovsdb-server and
>>         ovsdb-client to record and replay all the incoming transactions,
>>         monitors, etc.  More datails in Documentation/topics/record-replay.rst.
>> diff --git a/ovsdb/ovsdb-server.1.in b/ovsdb/ovsdb-server.1.in
>> index fdd52e8f6..dac0f02cb 100644
>> --- a/ovsdb/ovsdb-server.1.in
>> +++ b/ovsdb/ovsdb-server.1.in
>> @@ -10,6 +10,7 @@ ovsdb\-server \- Open vSwitch database server
>>  .SH SYNOPSIS
>>  \fBovsdb\-server\fR
>>  [\fIdatabase\fR]\&...
>> +[\fIrelay:schema_name:remote\fR]\&...
>>  [\fB\-\-remote=\fIremote\fR]\&...
>>  [\fB\-\-run=\fIcommand\fR]
>>  .so lib/daemon-syn.man
>> @@ -35,12 +36,15 @@ For an introduction to OVSDB and its implementation in Open vSwitch,
>>  see \fBovsdb\fR(7).
>>  .PP
>>  Each OVSDB file may be specified on the command line as \fIdatabase\fR.
>> -If none is specified, the default is \fB@DBDIR@/conf.db\fR.  The database
>> -files must already have been created and initialized using, for
>> -example, \fBovsdb\-tool\fR's \fBcreate\fR, \fBcreate\-cluster\fR, or
>> -\fBjoin\-cluster\fR command.
>> +Relay databases may be specified on the command line as
>> +\fIrelay:schema_name:remote\fR.  For a detailed description of relay database
>> +argument, see \fBovsdb\fR(7).
>> +If none of database files or relay databases is specified, the default is
>> +\fB@DBDIR@/conf.db\fR.  The database files must already have been created and
>> +initialized using, for example, \fBovsdb\-tool\fR's \fBcreate\fR,
>> +\fBcreate\-cluster\fR, or \fBjoin\-cluster\fR command.
>>  .PP
>> -This OVSDB implementation supports standalone, active-backup, and
>> +This OVSDB implementation supports standalone, active-backup, relay and
>>  clustered database service models, as well as database replication.
>>  See the Service Models section of \fBovsdb\fR(7) for more information.
>>  .PP
>> @@ -50,7 +54,9 @@ successfully join a cluster (if the database file is freshly created
>>  with \fBovsdb\-tool join\-cluster\fR) or connect to a cluster that it
>>  has already joined.  Use \fBovsdb\-client wait\fR (see
>>  \fBovsdb\-client\fR(1)) to wait until the server has successfully
>> -joined and connected to a cluster.
>> +joined and connected to a cluster.  The same is true for relay databases.
>> +Same commands could be used to wait for a relay database to connect to
>> +the relay source (remote).
>>  .PP
>>  In addition to user-specified databases, \fBovsdb\-server\fR version
>>  2.9 and later also always hosts a built-in database named
>> @@ -243,10 +249,11 @@ not list remotes added indirectly because they were read from the
>>  database by configuring a
>>  \fBdb:\fIdb\fB,\fItable\fB,\fIcolumn\fR remote.
>>  .
>> -.IP "\fBovsdb\-server/add\-db \fIdatabase\fR"
>> -Adds the \fIdatabase\fR to the running \fBovsdb\-server\fR.  The database
>> -file must already have been created and initialized using, for example,
>> -\fBovsdb\-tool create\fR.
>> +.IP "\fBovsdb\-server/add\-db \fIdatabase\fR
>> +Adds the \fIdatabase\fR to the running \fBovsdb\-server\fR.  \fIdatabase\fR
>> +could be a database file or a relay description in the following format:
>> +\fIrelay:schema_name:remote\fR.  The database file must already have been
>> +created and initialized using, for example, \fBovsdb\-tool create\fR.
>>  .
>>  .IP "\fBovsdb\-server/remove\-db \fIdatabase\fR"
>>  Removes \fIdatabase\fR from the running \fBovsdb\-server\fR.  \fIdatabase\fR
>>
>

Patch

diff --git a/Documentation/automake.mk b/Documentation/automake.mk
index bc30f94c5..213d9c867 100644
--- a/Documentation/automake.mk
+++ b/Documentation/automake.mk
@@ -52,6 +52,7 @@  DOC_SOURCE = \
 	Documentation/topics/networking-namespaces.rst \
 	Documentation/topics/openflow.rst \
 	Documentation/topics/ovs-extensions.rst \
+	Documentation/topics/ovsdb-relay.rst \
 	Documentation/topics/ovsdb-replication.rst \
 	Documentation/topics/porting.rst \
 	Documentation/topics/record-replay.rst \
diff --git a/Documentation/ref/ovsdb.7.rst b/Documentation/ref/ovsdb.7.rst
index e4f1bf766..a5b8a9c33 100644
--- a/Documentation/ref/ovsdb.7.rst
+++ b/Documentation/ref/ovsdb.7.rst
@@ -121,13 +121,14 @@  schema checksum from a schema or database file, respectively.
 Service Models
 ==============
 
-OVSDB supports three service models for databases: **standalone**,
-**active-backup**, and **clustered**.  The service models provide different
-compromises among consistency, availability, and partition tolerance.  They
-also differ in the number of servers required and in terms of performance.  The
-standalone and active-backup database service models share one on-disk format,
-and clustered databases use a different format, but the OVSDB programs work
-with both formats.  ``ovsdb(5)`` documents these file formats.
+OVSDB supports four service models for databases: **standalone**,
+**active-backup**, **relay**, and **clustered**.  The service models provide
+different compromises among consistency, availability, and partition tolerance.
+They also differ in the number of servers required and in terms of performance.
+The standalone and active-backup database service models share one on-disk
+format, and clustered databases use a different format, but the OVSDB programs
+work with both formats.  ``ovsdb(5)`` documents these file formats.  Relay
+databases have no on-disk storage.
 
 RFC 7047, which specifies the OVSDB protocol, does not mandate or specify
 any particular service model.
@@ -406,6 +407,50 @@  following consequences:
   that the client previously read.  The OVSDB client library in Open vSwitch
   uses this feature to avoid servers with stale data.
 
+Relay Service Model
+-------------------
+
+A **relay** database is a way to scale out read-mostly access to an
+existing database working in any service model, including relay.
+
+A relay database creates and maintains an OVSDB connection with another OVSDB
+server.  It uses this connection to maintain an in-memory copy of the remote
+database (a.k.a. the ``relay source``), keeping the copy up-to-date as the
+database content changes on the relay source in real time.
+
+The purpose of a relay server is to scale out the number of database clients.
+Read-only transactions and monitor requests are fully handled by the relay
+server itself.  For transactions that request database modifications, the
+relay works as a proxy between the client and the relay source, i.e. it
+forwards transactions and replies between them.
+
+Compared to the clustered and active-backup models, the relay service model
+provides read and write access to the database similarly to a clustered
+database (and scales out even further), but with the generally insignificant
+performance overhead of an active-backup model.  At the same time, it does
+not increase availability; that must be covered by the relay source's model.
+
+A relay database has no on-disk storage and therefore cannot be converted to
+any other service model.
+
+If there is already a database started in any service model, to start a relay
+database server use ``ovsdb-server relay:<DB_NAME>:<relay source>``, where
+``<DB_NAME>`` is the database name as specified in the schema of the database
+that the existing server runs, and ``<relay source>`` is an OVSDB connection
+method (see `Connection Methods`_ below) that connects to the existing
+database server.  ``<relay source>`` may contain a comma-separated list of
+connection methods, e.g. to connect to any server of a clustered database.
+Multiple relay servers may be started for the same relay source.
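+
+For example, a relay serving a hypothetical clustered ``OVN_Southbound``
+database could be started like this (the addresses are only illustrative)::
+
+  $ SOURCES=tcp:10.0.0.1:6642,tcp:10.0.0.2:6642,tcp:10.0.0.3:6642
+  $ ovsdb-server relay:OVN_Southbound:$SOURCES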
+
+Since the way relays handle read and write transactions is very similar to
+the clustered model, where "cluster" means "set of relay servers connected
+to the same relay source", "follower" means "relay server" and "leader"
+means "relay source", the same consistency consequences as for the clustered
+model apply to relays as well (see `Understanding Cluster Consistency`_
+above).
+
+Open vSwitch 2.16 introduced support for the relay service model.
+
 Database Replication
 ====================
 
@@ -414,7 +459,8 @@  Replication, in this context, means to make, and keep up-to-date, a read-only
 copy of the contents of a database (the ``replica``).  One use of replication
 is to keep an up-to-date backup of a database.  A replica used solely for
 backup would not need to support clients of its own.  A set of replicas that do
-serve clients could be used to scale out read access to the primary database.
+serve clients could be used to scale out read access to the primary database;
+however, the `Relay Service Model`_ is more suitable for that purpose.
 
 A database replica is set up in the same way as a backup server in an
 active-backup pair, with the difference that the replica is never promoted to
diff --git a/Documentation/topics/index.rst b/Documentation/topics/index.rst
index 0036567eb..d8ccbd757 100644
--- a/Documentation/topics/index.rst
+++ b/Documentation/topics/index.rst
@@ -44,6 +44,7 @@  OVS
    openflow
    bonding
    networking-namespaces
+   ovsdb-relay
    ovsdb-replication
    dpdk/index
    windows
diff --git a/Documentation/topics/ovsdb-relay.rst b/Documentation/topics/ovsdb-relay.rst
new file mode 100644
index 000000000..40d294c55
--- /dev/null
+++ b/Documentation/topics/ovsdb-relay.rst
@@ -0,0 +1,124 @@ 
+..
+      Copyright 2021, Red Hat, Inc.
+
+      Licensed under the Apache License, Version 2.0 (the "License"); you may
+      not use this file except in compliance with the License. You may obtain
+      a copy of the License at
+
+          http://www.apache.org/licenses/LICENSE-2.0
+
+      Unless required by applicable law or agreed to in writing, software
+      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+      License for the specific language governing permissions and limitations
+      under the License.
+
+      Convention for heading levels in Open vSwitch documentation:
+
+      =======  Heading 0 (reserved for the title in a document)
+      -------  Heading 1
+      ~~~~~~~  Heading 2
+      +++++++  Heading 3
+      '''''''  Heading 4
+
+      Avoid deeper levels because they do not render well.
+
+===============================
+Scaling OVSDB Access With Relay
+===============================
+
+Open vSwitch 2.16 introduced support for OVSDB Relay mode with the goal of
+increasing database scalability for big deployments, mainly OVN (Open Virtual
+Network) Southbound Database deployments.  This document describes the main
+concept and provides configuration examples.
+
+What is OVSDB Relay?
+--------------------
+
+Relay is a database service model in which one ``ovsdb-server`` (``relay``)
+connects to another standalone or clustered database server
+(``relay source``) and maintains an in-memory copy of its data, receiving
+all the updates via this OVSDB connection.  The relay server handles all the
+read-only requests (monitors and transactions) on its own and forwards all the
+transactions that require database modifications to the relay source.
+
+Why is this needed?
+-------------------
+
+Some OVN deployments could have hundreds or even thousands of nodes.  On each
+of these nodes there is an ovn-controller, which is connected to the
+OVN_Southbound database that is served by a standalone or clustered OVSDB.
+A standalone database is handled by a single ovsdb-server process, while a
+clustered one usually consists of 3 to 5 ovsdb-server processes.  For a
+clustered database, a higher number of servers may significantly increase
+transaction latency due to the need for these servers to reach consensus.
+So, in the end, a limited number of ovsdb-server processes serves an
+ever-growing number of clients, and this leads to performance issues.
+
+Read-only access could be scaled out with OVSDB replication on top of the
+active-backup service model, but ovn-controller is a read-mostly client, not
+a read-only one, i.e. it needs to execute write transactions from time to
+time.  This is where the relay service model comes into play.
+
+2-Tier Deployment
+-----------------
+
+A solution for the scaling issue could be a 2-tier deployment, where a set
+of relay servers is connected to the main database cluster (OVN_Southbound)
+and clients (ovn-controller) are connected to these relay servers, as shown
+in the diagram below::
+
+                                    172.16.0.1
+   +--------------------+   +----+ ovsdb-relay-1 +--+---+ client-1
+   |                    |   |                       |
+   |    Clustered       |   |                       +---+ client-2
+   |     Database       |   |                        ...
+   |                    |   |                       +---+ client-N
+   |    10.0.0.2        |   |
+   |  ovsdb-server-2    |   |       172.16.0.2
+   |   +        +       |   +----+ ovsdb-relay-2 +--+---+ client-N+1
+   |   |        |       |   |                       |
+   |   |        +       +---+                       +---+ client-N+2
+   |   |   10.0.0.1     |   |                        ...
+   |   | ovsdb-server-1 |   |                       +---+ client-2N
+   |   |        +       |   |
+   |   |        |       |   |
+   |   +        +       |   +      ... ... ... ... ...
+   |  ovsdb-server-3    |   |
+   |    10.0.0.3        |   |                       +---+ client-KN-1
+   |                    |   |       172.16.0.K      |
+   +--------------------+   +----+ ovsdb-relay-K +--+---+ client-KN
+
+In practice, the picture might look a bit more complex, because any relay
+server might connect to any member of the main cluster and clients might
+connect to any relay server of their choice.
+
+Assuming that the servers of the main cluster were started like this::
+
+  $ ovsdb-server --remote=ptcp:10.0.0.1:6642 ovn-sb-1.db
+
+The same goes for the other two servers.  In this case, relay servers could
+be started like this::
+
+  $ REMOTES=tcp:10.0.0.1:6642,tcp:10.0.0.2:6642,tcp:10.0.0.3:6642
+  $ ovsdb-server --remote=ptcp:172.16.0.1:6642 relay:OVN_Southbound:$REMOTES
+  $ ...
+  $ ovsdb-server --remote=ptcp:172.16.0.K:6642 relay:OVN_Southbound:$REMOTES
+
+Every relay server may connect to any of the cluster members of its choice;
+fairness of load distribution is achieved by each server shuffling its own
+list of remotes once before the first connection.
+
+The actual clients could be configured to connect to any of the relay
+servers.  For ovn-controllers, the configuration could look like this::
+
+  $ REMOTES=tcp:172.16.0.1:6642,...,tcp:172.16.0.K:6642
+  $ ovs-vsctl set Open_vSwitch . external-ids:ovn-remote=$REMOTES
+
+A setup like this allows the system to serve ``K * N`` clients while having
+only ``K`` actual connections to the main clustered database, keeping it in
+a stable state.
+
+It's also possible to create multi-tier deployments by connecting one set
+of relay servers to another (smaller) set of relay servers, or even to
+create tree-like structures, at the cost of increased latency for write
+transactions, because they will be forwarded multiple times.
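+
+For example, a second-tier relay could point at the first-tier relays instead
+of the main cluster (the addresses below are only illustrative)::
+
+  $ RELAYS=tcp:172.16.0.1:6642,tcp:172.16.0.2:6642
+  $ ovsdb-server --remote=ptcp:172.17.0.1:6642 relay:OVN_Southbound:$RELAYS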
diff --git a/NEWS b/NEWS
index ebba17b22..391b0abba 100644
--- a/NEWS
+++ b/NEWS
@@ -1,6 +1,9 @@ 
 Post-v2.15.0
 ---------------------
    - OVSDB:
+     * Introduced a new database service model, "relay", targeted at scaling
+       out read-mostly access (e.g., ovn-controller) to existing databases.
+       See ovsdb(7) and Documentation/topics/ovsdb-relay.rst for details.
      * New command line options --record/--replay for ovsdb-server and
        ovsdb-client to record and replay all the incoming transactions,
        monitors, etc.  More datails in Documentation/topics/record-replay.rst.
diff --git a/ovsdb/ovsdb-server.1.in b/ovsdb/ovsdb-server.1.in
index fdd52e8f6..dac0f02cb 100644
--- a/ovsdb/ovsdb-server.1.in
+++ b/ovsdb/ovsdb-server.1.in
@@ -10,6 +10,7 @@  ovsdb\-server \- Open vSwitch database server
 .SH SYNOPSIS
 \fBovsdb\-server\fR
 [\fIdatabase\fR]\&...
+[\fIrelay:schema_name:remote\fR]\&...
 [\fB\-\-remote=\fIremote\fR]\&...
 [\fB\-\-run=\fIcommand\fR]
 .so lib/daemon-syn.man
@@ -35,12 +36,15 @@  For an introduction to OVSDB and its implementation in Open vSwitch,
 see \fBovsdb\fR(7).
 .PP
 Each OVSDB file may be specified on the command line as \fIdatabase\fR.
-If none is specified, the default is \fB@DBDIR@/conf.db\fR.  The database
-files must already have been created and initialized using, for
-example, \fBovsdb\-tool\fR's \fBcreate\fR, \fBcreate\-cluster\fR, or
-\fBjoin\-cluster\fR command.
+Relay databases may be specified on the command line as
+\fIrelay:schema_name:remote\fR.  For a detailed description of the relay
+database argument, see \fBovsdb\fR(7).
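+For example, \fIrelay:OVN_Southbound:tcp:10.0.0.1:6642\fR would make this
+server a relay for a hypothetical \fBOVN_Southbound\fR database served at
+that address (both the name and the address are only illustrative).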
+If no database files or relay databases are specified, the default is
+\fB@DBDIR@/conf.db\fR.  The database files must already have been created and
+initialized using, for example, \fBovsdb\-tool\fR's \fBcreate\fR,
+\fBcreate\-cluster\fR, or \fBjoin\-cluster\fR command.
 .PP
-This OVSDB implementation supports standalone, active-backup, and
+This OVSDB implementation supports standalone, active-backup, relay, and
 clustered database service models, as well as database replication.
 See the Service Models section of \fBovsdb\fR(7) for more information.
 .PP
@@ -50,7 +54,9 @@  successfully join a cluster (if the database file is freshly created
 with \fBovsdb\-tool join\-cluster\fR) or connect to a cluster that it
 has already joined.  Use \fBovsdb\-client wait\fR (see
 \fBovsdb\-client\fR(1)) to wait until the server has successfully
-joined and connected to a cluster.
+joined and connected to a cluster.  The same is true for relay databases:
+the same command can be used to wait for a relay database to connect to
+its relay source (remote).
 .PP
 In addition to user-specified databases, \fBovsdb\-server\fR version
 2.9 and later also always hosts a built-in database named
@@ -243,10 +249,11 @@  not list remotes added indirectly because they were read from the
 database by configuring a
 \fBdb:\fIdb\fB,\fItable\fB,\fIcolumn\fR remote.
 .
-.IP "\fBovsdb\-server/add\-db \fIdatabase\fR"
-Adds the \fIdatabase\fR to the running \fBovsdb\-server\fR.  The database
-file must already have been created and initialized using, for example,
-\fBovsdb\-tool create\fR.
+.IP "\fBovsdb\-server/add\-db \fIdatabase\fR
+Adds the \fIdatabase\fR to the running \fBovsdb\-server\fR.  \fIdatabase\fR
+could be a database file or a relay description in the following format:
+\fIrelay:schema_name:remote\fR.  The database file must already have been
+created and initialized using, for example, \fBovsdb\-tool create\fR.
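+For example, \fBovs\-appctl \-t ovsdb\-server ovsdb\-server/add\-db
+relay:OVN_Southbound:tcp:10.0.0.1:6642\fR would add a relay database for a
+hypothetical \fBOVN_Southbound\fR relay source (both the database name and
+the address are only illustrative).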
 .
 .IP "\fBovsdb\-server/remove\-db \fIdatabase\fR"
 Removes \fIdatabase\fR from the running \fBovsdb\-server\fR.  \fIdatabase\fR