[ovs-dev,v2,0/9] OVSDB Relay Service Model. (Was: OVSDB 2-Tier deployment)

Message ID 20210612020008.3944088-1-i.maximets@ovn.org

Message

Ilya Maximets June 12, 2021, 1:59 a.m. UTC
Replication can be used to scale out read-only access to the database.
But some clients are not read-only, but read-mostly.
One of the main examples is ovn-controller, which mostly monitors
updates from the Southbound DB but needs to claim ports by sending
transactions that change some database tables.

The Southbound database serves lots of connections: all connections
from ovn-controllers and some service connections from the cloud
infrastructure, e.g. OpenStack agents monitoring updates.
At high scale and with a large database, ovsdb-server spends too
much time processing monitor updates, and this load needs to be
moved somewhere else.  This patch-set aims to introduce the
functionality required to scale out read-mostly connections by
introducing a new OVSDB 'relay' service model.

In this new service model, ovsdb-server connects to an existing OVSDB
server and maintains an in-memory copy of the database.  It serves
read-only transactions and monitor requests on its own, but forwards
write transactions to the relay source.
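
Schematically (an illustrative sketch; clients read from and monitor
their relay, while write transactions are forwarded up to the relay
source):

  clients ----> relay 1 ----+
  clients ----> relay 2 ----+----> relay source (e.g. a clustered db)
  clients ----> relay 3 ----+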

Key differences from the active-backup replication:
- support for "write" transactions.
- no on-disk storage. (likely faster operation)
- support for multiple remotes (connect to the clustered db).
- doesn't try to keep the connection alive as long as possible, but
  instead reconnects faster to other remotes to avoid missing updates.
- no need to know the complete database schema beforehand,
  only the schema name.
- can be used along with other standalone and clustered databases
  served by the same ovsdb-server process. (doesn't turn the whole
  jsonrpc server into read-only mode)
- supports the modern version of monitors (monitor_cond_since),
  because it is based on ovsdb-cs.
- can be chained, i.e. multiple relays can be connected to one
  another in a chain or in a tree-like form (see the sketch below).
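
For example, two tiers of relays could look like this (an illustrative
sketch; the addresses, ports, and socket paths are made up):

  # First-tier relay, connected to the main clustered database.
  ovsdb-server --remote=ptcp:16642 relay:OVN_Southbound:tcp:10.0.0.1:6642

  # Second-tier relay, connected to the first-tier relay at 10.0.0.2.
  ovsdb-server --remote=punix:db.sock relay:OVN_Southbound:tcp:10.0.0.2:16642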

Bringing all of the above functionality to the existing active-backup
replication doesn't look right, as it would make replication less
reliable for the actual backup use case.  It would also be much harder
from the implementation point of view, because the current replication
code is not based on ovsdb-cs or the idl, so all the required features
would likely be duplicated, or replication would have to be fully
re-written on top of ovsdb-cs with severe modifications.

Relay sits somewhere between active-backup replication and the
clustered model, taking a lot from both, and is therefore hard to
implement on top of either.

To run ovsdb-server in relay mode, a user simply needs to run:

  ovsdb-server --remote=punix:db.sock relay:<schema-name>:<remotes>

e.g.

  ovsdb-server --remote=punix:db.sock relay:OVN_Southbound:tcp:127.0.0.1:6642
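
or, assuming the usual comma-separated list of remotes, connecting a
relay to all members of a 3-server cluster (addresses are made up):

  ovsdb-server --remote=punix:db.sock \
      relay:OVN_Southbound:tcp:10.0.0.1:6642,tcp:10.0.0.2:6642,tcp:10.0.0.3:6642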

More details and examples can be found in the documentation added
in the last patch of the series.
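
Patch 7/9 reflects the relay-to-source connection status in the
_Server database, so a quick health check of a relay could look like
this (an illustrative example; the exact set of columns depends on the
updated _Server schema):

  ovsdb-client dump unix:db.sock _Server Database name model connected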

I actually tried to implement transaction forwarding on top of
active-backup replication in v1 of this series, but it required
a lot of tricky changes, including schema format changes, in order
to bring the required information to the end clients, so I decided
to fully rewrite the functionality in v2 with a different approach.

Future work:
- Add support for transaction history (it could simply be inherited
  from the transaction ids received from the relay source).  This
  will allow clients to utilize monitor_cond_since while working
  with a relay.
- Possibly try to inherit min_index from the relay source to give
  clients the ability to detect relays with stale data.
- Probably, add support for both of the above in standalone databases,
  so relays will be able to inherit them not only from clustered ones.

Version 2:
  * Dropped implementation on top of active-backup replication.
  * Implemented new 'relay' service model.
  * Updated documentation and wrote a separate topic with examples
    and ascii-graphics.  That's why v2 seems larger.

Ilya Maximets (9):
  jsonrpc-server: Wake up jsonrpc session if there are completed
    triggers.
  ovsdb: storage: Allow setting the name for the unbacked storage.
  ovsdb: table: Expose functions to execute operations on ovsdb tables.
  ovsdb: row: Add support for xor-based row updates.
  ovsdb: New ovsdb 'relay' service model.
  ovsdb: relay: Add support for transaction forwarding.
  ovsdb: relay: Reflect connection status in _Server database.
  ovsdb: Make clients aware of relay service model.
  docs: Add documentation for ovsdb relay mode.

 Documentation/automake.mk            |   1 +
 Documentation/ref/ovsdb.7.rst        |  62 ++++-
 Documentation/topics/index.rst       |   1 +
 Documentation/topics/ovsdb-relay.rst | 124 +++++++++
 NEWS                                 |   3 +
 lib/ovsdb-cs.c                       |  15 +-
 ovsdb/_server.ovsschema              |   7 +-
 ovsdb/_server.xml                    |  33 +--
 ovsdb/automake.mk                    |   4 +
 ovsdb/execution.c                    |  18 +-
 ovsdb/file.c                         |   2 +-
 ovsdb/jsonrpc-server.c               |   3 +-
 ovsdb/ovsdb-client.c                 |   2 +-
 ovsdb/ovsdb-server.1.in              |  27 +-
 ovsdb/ovsdb-server.c                 | 102 ++++---
 ovsdb/ovsdb.c                        |  11 +
 ovsdb/ovsdb.h                        |   9 +-
 ovsdb/relay.c                        | 381 +++++++++++++++++++++++++++
 ovsdb/relay.h                        |  38 +++
 ovsdb/replication.c                  |  83 +-----
 ovsdb/row.c                          |  30 ++-
 ovsdb/row.h                          |   6 +-
 ovsdb/storage.c                      |  13 +-
 ovsdb/storage.h                      |   2 +-
 ovsdb/table.c                        |  70 +++++
 ovsdb/table.h                        |  14 +
 ovsdb/transaction-forward.c          | 182 +++++++++++++
 ovsdb/transaction-forward.h          |  44 ++++
 ovsdb/trigger.c                      |  48 +++-
 ovsdb/trigger.h                      |  41 +--
 python/ovs/db/idl.py                 |  16 ++
 tests/ovsdb-server.at                |  85 +++++-
 tests/test-ovsdb.c                   |   6 +-
 33 files changed, 1287 insertions(+), 196 deletions(-)
 create mode 100644 Documentation/topics/ovsdb-relay.rst
 create mode 100644 ovsdb/relay.c
 create mode 100644 ovsdb/relay.h
 create mode 100644 ovsdb/transaction-forward.c
 create mode 100644 ovsdb/transaction-forward.h

Comments

Dumitru Ceara June 25, 2021, 1:33 p.m. UTC | #1
On 6/12/21 3:59 AM, Ilya Maximets wrote:
> Replication can be used to scale out read-only access to the database.
> But some clients are not read-only, but read-mostly.
> One of the main examples is ovn-controller, which mostly monitors
> updates from the Southbound DB but needs to claim ports by sending
> transactions that change some database tables.
> 
> The Southbound database serves lots of connections: all connections
> from ovn-controllers and some service connections from the cloud
> infrastructure, e.g. OpenStack agents monitoring updates.
> At high scale and with a large database, ovsdb-server spends too
> much time processing monitor updates, and this load needs to be
> moved somewhere else.  This patch-set aims to introduce the
> functionality required to scale out read-mostly connections by
> introducing a new OVSDB 'relay' service model.
> 
> In this new service model, ovsdb-server connects to an existing OVSDB
> server and maintains an in-memory copy of the database.  It serves
> read-only transactions and monitor requests on its own, but forwards
> write transactions to the relay source.
> 
> Key differences from the active-backup replication:
> - support for "write" transactions.
> - no on-disk storage. (likely faster operation)
> - support for multiple remotes (connect to the clustered db).
> - doesn't try to keep the connection alive as long as possible, but
>   instead reconnects faster to other remotes to avoid missing updates.
> - no need to know the complete database schema beforehand,
>   only the schema name.
> - can be used along with other standalone and clustered databases
>   served by the same ovsdb-server process. (doesn't turn the whole
>   jsonrpc server into read-only mode)
> - supports the modern version of monitors (monitor_cond_since),
>   because it is based on ovsdb-cs.
> - can be chained, i.e. multiple relays can be connected to one
>   another in a chain or in a tree-like form.
> 
> Bringing all of the above functionality to the existing active-backup
> replication doesn't look right, as it would make replication less
> reliable for the actual backup use case.  It would also be much harder
> from the implementation point of view, because the current replication
> code is not based on ovsdb-cs or the idl, so all the required features
> would likely be duplicated, or replication would have to be fully
> re-written on top of ovsdb-cs with severe modifications.
> 
> Relay sits somewhere between active-backup replication and the
> clustered model, taking a lot from both, and is therefore hard to
> implement on top of either.
> 
> To run ovsdb-server in relay mode, a user simply needs to run:
> 
>   ovsdb-server --remote=punix:db.sock relay:<schema-name>:<remotes>
> 
> e.g.
> 
>   ovsdb-server --remote=punix:db.sock relay:OVN_Southbound:tcp:127.0.0.1:6642
> 
> More details and examples can be found in the documentation added
> in the last patch of the series.
> 
> I actually tried to implement transaction forwarding on top of
> active-backup replication in v1 of this series, but it required
> a lot of tricky changes, including schema format changes, in order
> to bring the required information to the end clients, so I decided
> to fully rewrite the functionality in v2 with a different approach.
> 
> Future work:
> - Add support for transaction history (it could simply be inherited
>   from the transaction ids received from the relay source).  This
>   will allow clients to utilize monitor_cond_since while working
>   with a relay.

Hi Ilya,

I acked most of the patches in the series (except 7/9 which I think
might need a rather straightforward change) and I saw Mark also left
some comments.

I wonder though if the lack of monitor_cond_since will be a show stopper
for deploying this in production?  Or do you expect reconnects to happen
less often due to the multi-tier nature of new deployments?

I guess we need some scale test data with this deployed to have a better
idea.

In any case, very nice work!

Regards,
Dumitru
Ilya Maximets July 13, 2021, 12:16 a.m. UTC | #2
On 6/25/21 3:33 PM, Dumitru Ceara wrote:
> On 6/12/21 3:59 AM, Ilya Maximets wrote:
>> Replication can be used to scale out read-only access to the database.
>> But some clients are not read-only, but read-mostly.
>> One of the main examples is ovn-controller, which mostly monitors
>> updates from the Southbound DB but needs to claim ports by sending
>> transactions that change some database tables.
>>
>> The Southbound database serves lots of connections: all connections
>> from ovn-controllers and some service connections from the cloud
>> infrastructure, e.g. OpenStack agents monitoring updates.
>> At high scale and with a large database, ovsdb-server spends too
>> much time processing monitor updates, and this load needs to be
>> moved somewhere else.  This patch-set aims to introduce the
>> functionality required to scale out read-mostly connections by
>> introducing a new OVSDB 'relay' service model.
>>
>> In this new service model, ovsdb-server connects to an existing OVSDB
>> server and maintains an in-memory copy of the database.  It serves
>> read-only transactions and monitor requests on its own, but forwards
>> write transactions to the relay source.
>>
>> Key differences from the active-backup replication:
>> - support for "write" transactions.
>> - no on-disk storage. (likely faster operation)
>> - support for multiple remotes (connect to the clustered db).
>> - doesn't try to keep the connection alive as long as possible, but
>>   instead reconnects faster to other remotes to avoid missing updates.
>> - no need to know the complete database schema beforehand,
>>   only the schema name.
>> - can be used along with other standalone and clustered databases
>>   served by the same ovsdb-server process. (doesn't turn the whole
>>   jsonrpc server into read-only mode)
>> - supports the modern version of monitors (monitor_cond_since),
>>   because it is based on ovsdb-cs.
>> - can be chained, i.e. multiple relays can be connected to one
>>   another in a chain or in a tree-like form.
>>
>> Bringing all of the above functionality to the existing active-backup
>> replication doesn't look right, as it would make replication less
>> reliable for the actual backup use case.  It would also be much harder
>> from the implementation point of view, because the current replication
>> code is not based on ovsdb-cs or the idl, so all the required features
>> would likely be duplicated, or replication would have to be fully
>> re-written on top of ovsdb-cs with severe modifications.
>>
>> Relay sits somewhere between active-backup replication and the
>> clustered model, taking a lot from both, and is therefore hard to
>> implement on top of either.
>>
>> To run ovsdb-server in relay mode, a user simply needs to run:
>>
>>   ovsdb-server --remote=punix:db.sock relay:<schema-name>:<remotes>
>>
>> e.g.
>>
>>   ovsdb-server --remote=punix:db.sock relay:OVN_Southbound:tcp:127.0.0.1:6642
>>
>> More details and examples can be found in the documentation added
>> in the last patch of the series.
>>
>> I actually tried to implement transaction forwarding on top of
>> active-backup replication in v1 of this series, but it required
>> a lot of tricky changes, including schema format changes, in order
>> to bring the required information to the end clients, so I decided
>> to fully rewrite the functionality in v2 with a different approach.
>>
>> Future work:
>> - Add support for transaction history (it could simply be inherited
>>   from the transaction ids received from the relay source).  This
>>   will allow clients to utilize monitor_cond_since while working
>>   with a relay.
> 
> Hi Ilya,
> 
> I acked most of the patches in the series (except 7/9 which I think
> might need a rather straightforward change) and I saw Mark also left
> some comments.
> 
> I wonder though if the lack of monitor_cond_since will be a show stopper
> for deploying this in production?  Or do you expect reconnects to happen
>> less often due to the multi-tier nature of new deployments?

I do expect that relays will hide most of the re-connections, so clients
will have more stable connections.  In this case it should be fine to not
have monitor_cond_since for clients.  For sure, I'll work on adding
support for it.

Another factor is that deployments will likely have more relays
than main servers, so it should be easier to handle the extra
load of downloading the whole database, if required.

> 
> I guess we need some scale test data with this deployed to have a better
> idea.

Sure, I collected some data from the scale tests and will include it
in the cover letter for v3.

> 
> In any case, very nice work!

Thanks!

> 
> Regards,
> Dumitru
>
Dumitru Ceara July 13, 2021, 4:29 p.m. UTC | #3
On 7/13/21 2:16 AM, Ilya Maximets wrote:
> On 6/25/21 3:33 PM, Dumitru Ceara wrote:
>> On 6/12/21 3:59 AM, Ilya Maximets wrote:
>>> Replication can be used to scale out read-only access to the database.
>>> But some clients are not read-only, but read-mostly.
>>> One of the main examples is ovn-controller, which mostly monitors
>>> updates from the Southbound DB but needs to claim ports by sending
>>> transactions that change some database tables.
>>>
>>> The Southbound database serves lots of connections: all connections
>>> from ovn-controllers and some service connections from the cloud
>>> infrastructure, e.g. OpenStack agents monitoring updates.
>>> At high scale and with a large database, ovsdb-server spends too
>>> much time processing monitor updates, and this load needs to be
>>> moved somewhere else.  This patch-set aims to introduce the
>>> functionality required to scale out read-mostly connections by
>>> introducing a new OVSDB 'relay' service model.
>>>
>>> In this new service model, ovsdb-server connects to an existing OVSDB
>>> server and maintains an in-memory copy of the database.  It serves
>>> read-only transactions and monitor requests on its own, but forwards
>>> write transactions to the relay source.
>>>
>>> Key differences from the active-backup replication:
>>> - support for "write" transactions.
>>> - no on-disk storage. (likely faster operation)
>>> - support for multiple remotes (connect to the clustered db).
>>> - doesn't try to keep the connection alive as long as possible, but
>>>   instead reconnects faster to other remotes to avoid missing updates.
>>> - no need to know the complete database schema beforehand,
>>>   only the schema name.
>>> - can be used along with other standalone and clustered databases
>>>   served by the same ovsdb-server process. (doesn't turn the whole
>>>   jsonrpc server into read-only mode)
>>> - supports the modern version of monitors (monitor_cond_since),
>>>   because it is based on ovsdb-cs.
>>> - can be chained, i.e. multiple relays can be connected to one
>>>   another in a chain or in a tree-like form.
>>>
>>> Bringing all of the above functionality to the existing active-backup
>>> replication doesn't look right, as it would make replication less
>>> reliable for the actual backup use case.  It would also be much harder
>>> from the implementation point of view, because the current replication
>>> code is not based on ovsdb-cs or the idl, so all the required features
>>> would likely be duplicated, or replication would have to be fully
>>> re-written on top of ovsdb-cs with severe modifications.
>>>
>>> Relay sits somewhere between active-backup replication and the
>>> clustered model, taking a lot from both, and is therefore hard to
>>> implement on top of either.
>>>
>>> To run ovsdb-server in relay mode, a user simply needs to run:
>>>
>>>   ovsdb-server --remote=punix:db.sock relay:<schema-name>:<remotes>
>>>
>>> e.g.
>>>
>>>   ovsdb-server --remote=punix:db.sock relay:OVN_Southbound:tcp:127.0.0.1:6642
>>>
>>> More details and examples can be found in the documentation added
>>> in the last patch of the series.
>>>
>>> I actually tried to implement transaction forwarding on top of
>>> active-backup replication in v1 of this series, but it required
>>> a lot of tricky changes, including schema format changes, in order
>>> to bring the required information to the end clients, so I decided
>>> to fully rewrite the functionality in v2 with a different approach.
>>>
>>> Future work:
>>> - Add support for transaction history (it could simply be inherited
>>>   from the transaction ids received from the relay source).  This
>>>   will allow clients to utilize monitor_cond_since while working
>>>   with a relay.
>>
>> Hi Ilya,
>>
>> I acked most of the patches in the series (except 7/9 which I think
>> might need a rather straightforward change) and I saw Mark also left
>> some comments.
>>
>> I wonder though if the lack of monitor_cond_since will be a show stopper
>> for deploying this in production?  Or do you expect reconnects to happen
>> less often due to the multi-tier nature of new deployments?
> 
> I do expect that relays will hide most of the re-connections, so clients
> will have more stable connections.  In this case it should be fine to not
> have monitor_cond_since for clients.  For sure, I'll work on adding
> support for it.
> 
> Another factor is that deployments will likely have more relays
> than main servers, so it should be easier to handle the extra
> load of downloading the whole database, if required.
> 

Sure, but the SB database in OVN can grow reasonably large, and all
that data (if conditional monitoring is disabled) will also have to be
sent on the wire and, moreover, will cause the OVN IDL clients to flush
and repopulate their in-memory view of the database.

I think that can turn out to have some impact.  But I agree, it
shouldn't be a blocker for the feature; it's something that can be
added later.

>>
>> I guess we need some scale test data with this deployed to have a better
>> idea.
> 
> Sure, I collected some data from the scale tests and will include it
> in the cover letter for v3.
> 
>>
>> In any case, very nice work!
> 
> Thanks!
> 
>>
>> Regards,
>> Dumitru
>>
> 

Thanks,
Dumitru