[ovs-dev] python: idl: Fix last-id update from a monitor reply.

Message ID 20230909021913.929606-1-i.maximets@ovn.org
State Accepted
Commit 0896dc19efb5825e3dce0ade09df1c31c0297c74
Series [ovs-dev] python: idl: Fix last-id update from a monitor reply.

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/intel-ovs-compilation success test: success

Commit Message

Ilya Maximets Sept. 9, 2023, 2:18 a.m. UTC
While sending a reply to a monitor_cond_since request, the server
includes the last transaction ID, and it sends new IDs with each
subsequent update.  The current implementation doesn't use the ID
supplied with the monitor reply and only takes into account IDs
provided with monitor updates.  That may cause various issues:

1. Performance: During initialization, the last-id is set to zero.
   If a re-connection happens after receiving a monitor reply,
   but before any monitor update, the client will send a new
   monitor request with an all-zero last-id and will re-download
   the whole database.

2. Data inconsistency: Assume one of the clients sends a
   transaction, but our python client disconnects before receiving
   a monitor update for this transaction.  The last-id will point
   to a database state before this transaction.  On re-connection,
   this last-id will be sent and the monitor reply will contain
   a diff with the new data from that transaction.  But if another
   disconnection happens right after that, on the second
   re-connection our python client will send another
   monitor_cond_since with exactly the same last-id.  That will
   cause the same set of updates to be received again.  And since
   it's an update2 message with a diff of the data, the client
   will undo the previously applied result of the transaction.
   At this point it will have a database view different from the
   server's, potentially leading to all sorts of data
   inconsistency problems.
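
To illustrate the inconsistency above: a toy sketch in plain Python
(not the actual ovs.db.idl diff-application code) of why re-applying
the same update2 diff cancels the original change, assuming the
symmetric-difference semantics that update2 uses for set-type columns:

    def apply_set_diff(current, diff):
        # update2 encodes a set-column change as the symmetric
        # difference between the old and new values; the client
        # applies it the same way.
        return current ^ diff

    ports = {"p1"}
    diff = {"p2"}                        # a transaction added port "p2"
    ports = apply_set_diff(ports, diff)  # first apply: {"p1", "p2"}
    ports = apply_set_diff(ports, diff)  # same diff again: back to
                                         # {"p1"}, the change is lost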

Fix that by always updating the last-id from the latest monitor
reply.

Fixes: 46d44cf3be0d ("python: idl: Add monitor_cond_since support.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
---
 python/ovs/db/idl.py |  1 +
 tests/ovsdb-idl.at   | 22 +++++++++++++++++++++-
 2 files changed, 22 insertions(+), 1 deletion(-)
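
For context, a minimal sketch of the reply handling after this change
(hypothetical helper names, not the real ovs.db.idl internals); per the
OVSDB protocol, a monitor_cond_since reply carries
[found, last-txn-id, updates]:

    def handle_monitor_cond_since_reply(idl, result):
        found, last_txn_id, updates = result
        if not found:
            # The server could not compute a diff from our last-id and
            # sent the full database contents instead, so drop the
            # locally cached copy before applying the update.
            idl.clear()
        idl.parse_update(updates)      # apply the update3-style payload
        idl.last_id = last_txn_id      # the fix: always record the
                                       # server-provided transaction ID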

Comments

Simon Horman Sept. 12, 2023, 2:40 p.m. UTC | #1
On Sat, Sep 09, 2023 at 04:18:36AM +0200, Ilya Maximets wrote:
> While sending a reply to the monitor_cond_since request, server
> includes the last transaction ID.  And it sends new IDs with each
> subsequent update.  Current implementation doesn't use the one
> supplied with a monitor reply, and only takes into account IDs
> provided with monitor updates.  That may cause various issues:
> 
> 1. Performance: During initialization, the last-id is set to zero.
>    If re-connection will happen after receiving a monitor reply,
>    but before any monitor update, the client will send a new
>    monitor request with an all-zero last-id and will re-download
>    the whole database again.
> 
> 2. Data inconsistency: Assuming one of the clients sends a
>    transaction, but our python client disconnects before receiving
>    a monitor update for this transaction.  The las-id will point
>    to a database state before this transaction.  On re-connection,
>    this last-id will be sent and the monitor reply will contain
>    a diff with a new data from that transaction.  But if another
>    disconnection happens right after that, on second re-connection
>    our python client will send another monitor_cond_since with
>    exactly the same last-id.  That will cause receiving the same
>    set of updates again.  And since it's an update2 message with
>    a diff of the data, the client will remove previously applied
>    result of the transaction.  At this point it will have a
>    different database view with the server potentially leading
>    to all sorts of data inconsistency problems.
> 
> Fix that by always updating the last-id from the latest monitor
> reply.
> 
> Fixes: 46d44cf3be0d ("python: idl: Add monitor_cond_since support.")
> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

Acked-by: Simon Horman <horms@ovn.org>
Han Zhou Sept. 12, 2023, 6:10 p.m. UTC | #2
Thanks Ilya!

On Fri, Sep 8, 2023 at 7:18 PM Ilya Maximets <i.maximets@ovn.org> wrote:
>
> While sending a reply to the monitor_cond_since request, server
> includes the last transaction ID.  And it sends new IDs with each
> subsequent update.  Current implementation doesn't use the one
> supplied with a monitor reply, and only takes into account IDs
> provided with monitor updates.  That may cause various issues:
>
> 1. Performance: During initialization, the last-id is set to zero.
>    If re-connection will happen after receiving a monitor reply,
>    but before any monitor update, the client will send a new
>    monitor request with an all-zero last-id and will re-download
>    the whole database again.
>
> 2. Data inconsistency: Assuming one of the clients sends a
>    transaction, but our python client disconnects before receiving
>    a monitor update for this transaction.  The las-id will point

nit: s/las-id/last-id

This example clearly shows the problem, but I think it may be even more
simplified: the client doesn't have to send a transaction to hit the
problem. If there are any transactions (sent by any clients) committed to
the DB between the disconnection and reconnection of the client, then there
will be such an inconsistency problem if a second disconnection (and later
reconnect again) happens immediately after receiving the monitor reply.

Acked-by: Han Zhou <hzhou@ovn.org>

>    to a database state before this transaction.  On re-connection,
>    this last-id will be sent and the monitor reply will contain
>    a diff with a new data from that transaction.  But if another
>    disconnection happens right after that, on second re-connection
>    our python client will send another monitor_cond_since with
>    exactly the same last-id.  That will cause receiving the same
>    set of updates again.  And since it's an update2 message with
>    a diff of the data, the client will remove previously applied
>    result of the transaction.  At this point it will have a
>    different database view with the server potentially leading
>    to all sorts of data inconsistency problems.
>
> Fix that by always updating the last-id from the latest monitor
> reply.
>
> Fixes: 46d44cf3be0d ("python: idl: Add monitor_cond_since support.")
> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
> ---
>  python/ovs/db/idl.py |  1 +
>  tests/ovsdb-idl.at   | 22 +++++++++++++++++++++-
>  2 files changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/python/ovs/db/idl.py b/python/ovs/db/idl.py
> index 9fc2159b0..16ece0334 100644
> --- a/python/ovs/db/idl.py
> +++ b/python/ovs/db/idl.py
> @@ -494,6 +494,7 @@ class Idl(object):
>                          if not msg.result[0]:
>                              self.__clear()
>                          self.__parse_update(msg.result[2], OVSDB_UPDATE3)
> +                        self.last_id = msg.result[1]
>                      elif self.state == self.IDL_S_DATA_MONITOR_COND_REQUESTED:
>                          self.__clear()
>                          self.__parse_update(msg.result, OVSDB_UPDATE2)
> diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
> index df5a9d2fd..1028b0237 100644
> --- a/tests/ovsdb-idl.at
> +++ b/tests/ovsdb-idl.at
> @@ -2332,6 +2332,23 @@ CHECK_STREAM_OPEN_BLOCK([Python3], [$PYTHON3 $srcdir/test-stream.py],
>  CHECK_STREAM_OPEN_BLOCK([Python3], [$PYTHON3 $srcdir/test-stream.py],
>                          [ssl6], [[[::1]]])
>
> +dnl OVSDB_CLUSTER_CHECK_MONITOR_COND_SINCE_TXN_IDS(LOG)
> +dnl
> +dnl Looks up transaction IDs in the log of OVSDB client application.
> +dnl All-zero UUID should not be sent within a monitor request more than once,
> +dnl unless some database requests were lost (not replied).
> +m4_define([OVSDB_CLUSTER_CHECK_MONITOR_COND_SINCE_TXN_IDS],
> +[
> +   requests=$(grep -c 'send request' $1)
> +   replies=$(grep -c 'received reply' $1)
> +
> +   if test "$requests" -eq "$replies"; then
> +     AT_CHECK([grep 'monitor_cond_since' $1 \
> +                | grep -c "00000000-0000-0000-0000-000000000000" | tr -d '\n'],
> +              [0], [1])
> +   fi
> +])
> +
>  # same as OVSDB_CHECK_IDL but uses Python IDL implementation with tcp
>  # with multiple remotes to assert the idl connects to the leader of the Raft cluster
>  m4_define([OVSDB_CHECK_IDL_LEADER_ONLY_PY],
> @@ -2347,10 +2364,11 @@ m4_define([OVSDB_CHECK_IDL_LEADER_ONLY_PY],
>     pids=$(cat s2.pid s3.pid s1.pid | tr '\n' ',')
>     echo $pids
>     AT_CHECK([$PYTHON3 $srcdir/test-ovsdb.py  -t30 idl-cluster $srcdir/idltest.ovsschema $remotes $pids $3],
> -        [0], [stdout], [ignore])
> +        [0], [stdout], [stderr])
>     remote=$(ovsdb_cluster_leader $remotes "idltest")
>     leader=$(echo $remote | cut -d'|' -f 1)
>     AT_CHECK([grep -F -- "${leader}" stdout], [0], [ignore])
> +   OVSDB_CLUSTER_CHECK_MONITOR_COND_SINCE_TXN_IDS([stderr])
>     AT_CLEANUP])
>
>  OVSDB_CHECK_IDL_LEADER_ONLY_PY([Check Python IDL connects to leader], 3, ['remote'])
> @@ -2393,6 +2411,7 @@ m4_define([OVSDB_CHECK_CLUSTER_IDL_C],
>     AT_CHECK([sort stdout | uuidfilt]m4_if([$7],,, [[| $7]]),
>              [0], [$5])
>     m4_ifval([$8], [AT_CHECK([grep '$8' stderr], [1])], [], [])
> +   OVSDB_CLUSTER_CHECK_MONITOR_COND_SINCE_TXN_IDS([stderr])
>     AT_CLEANUP])
>
>  # Same as OVSDB_CHECK_CLUSTER_IDL_C but uses the Python IDL implementation.
> @@ -2413,6 +2432,7 @@ m4_define([OVSDB_CHECK_CLUSTER_IDL_PY],
>     AT_CHECK([sort stdout | uuidfilt]m4_if([$7],,, [[| $7]]),
>              [0], [$5])
>     m4_if([$8], [AT_CHECK([grep '$8' stderr], [1])], [], [])
> +   OVSDB_CLUSTER_CHECK_MONITOR_COND_SINCE_TXN_IDS([stderr])
>     AT_CLEANUP])
>
>  m4_define([OVSDB_CHECK_CLUSTER_IDL],
> --
> 2.41.0
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Ilya Maximets Sept. 18, 2023, 7:51 p.m. UTC | #3
On 9/12/23 20:10, Han Zhou wrote:
> Thanks Ilya!
> 
> On Fri, Sep 8, 2023 at 7:18 PM Ilya Maximets <i.maximets@ovn.org <mailto:i.maximets@ovn.org>> wrote:
>>
>> While sending a reply to the monitor_cond_since request, server
>> includes the last transaction ID.  And it sends new IDs with each
>> subsequent update.  Current implementation doesn't use the one
>> supplied with a monitor reply, and only takes into account IDs
>> provided with monitor updates.  That may cause various issues:
>>
>> 1. Performance: During initialization, the last-id is set to zero.
>>    If re-connection will happen after receiving a monitor reply,
>>    but before any monitor update, the client will send a new
>>    monitor request with an all-zero last-id and will re-download
>>    the whole database again.
>>
>> 2. Data inconsistency: Assuming one of the clients sends a
>>    transaction, but our python client disconnects before receiving
>>    a monitor update for this transaction.  The las-id will point
> 
> nit: s/las-id/last-id
> 
> This example clearly shows the problem, but I think it may be even more simplified: the client doesn't have to send a transaction to hit the problem. If there are any transactions (sent by any clients) committed to the DB between the disconnection and reconnection of the client, then there will be such an inconsistency problem if a second disconnection (and later reconnect again) happens immediately after receiving the monitor reply.

That's true.  I tried to highlight this possibility by using words
"one of the clients" vs "our python client".  But maybe that didn't
work as well as I hoped. :)

> Acked-by: Han Zhou <hzhou@ovn.org <mailto:hzhou@ovn.org>>

Thanks, Han and Simon!

I fixed the typo and applied the patch.  Also backported down to 2.17.

Best regards, Ilya Maximets.

Patch

diff --git a/python/ovs/db/idl.py b/python/ovs/db/idl.py
index 9fc2159b0..16ece0334 100644
--- a/python/ovs/db/idl.py
+++ b/python/ovs/db/idl.py
@@ -494,6 +494,7 @@  class Idl(object):
                         if not msg.result[0]:
                             self.__clear()
                         self.__parse_update(msg.result[2], OVSDB_UPDATE3)
+                        self.last_id = msg.result[1]
                     elif self.state == self.IDL_S_DATA_MONITOR_COND_REQUESTED:
                         self.__clear()
                         self.__parse_update(msg.result, OVSDB_UPDATE2)
diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
index df5a9d2fd..1028b0237 100644
--- a/tests/ovsdb-idl.at
+++ b/tests/ovsdb-idl.at
@@ -2332,6 +2332,23 @@  CHECK_STREAM_OPEN_BLOCK([Python3], [$PYTHON3 $srcdir/test-stream.py],
 CHECK_STREAM_OPEN_BLOCK([Python3], [$PYTHON3 $srcdir/test-stream.py],
                         [ssl6], [[[::1]]])
 
+dnl OVSDB_CLUSTER_CHECK_MONITOR_COND_SINCE_TXN_IDS(LOG)
+dnl
+dnl Looks up transaction IDs in the log of OVSDB client application.
+dnl All-zero UUID should not be sent within a monitor request more than once,
+dnl unless some database requests were lost (not replied).
+m4_define([OVSDB_CLUSTER_CHECK_MONITOR_COND_SINCE_TXN_IDS],
+[
+   requests=$(grep -c 'send request' $1)
+   replies=$(grep -c 'received reply' $1)
+
+   if test "$requests" -eq "$replies"; then
+     AT_CHECK([grep 'monitor_cond_since' $1 \
+                | grep -c "00000000-0000-0000-0000-000000000000" | tr -d '\n'],
+              [0], [1])
+   fi
+])
+
 # same as OVSDB_CHECK_IDL but uses Python IDL implementation with tcp
 # with multiple remotes to assert the idl connects to the leader of the Raft cluster
 m4_define([OVSDB_CHECK_IDL_LEADER_ONLY_PY],
@@ -2347,10 +2364,11 @@  m4_define([OVSDB_CHECK_IDL_LEADER_ONLY_PY],
    pids=$(cat s2.pid s3.pid s1.pid | tr '\n' ',')
    echo $pids
    AT_CHECK([$PYTHON3 $srcdir/test-ovsdb.py  -t30 idl-cluster $srcdir/idltest.ovsschema $remotes $pids $3],
-        [0], [stdout], [ignore])
+        [0], [stdout], [stderr])
    remote=$(ovsdb_cluster_leader $remotes "idltest")
    leader=$(echo $remote | cut -d'|' -f 1)
    AT_CHECK([grep -F -- "${leader}" stdout], [0], [ignore])
+   OVSDB_CLUSTER_CHECK_MONITOR_COND_SINCE_TXN_IDS([stderr])
    AT_CLEANUP])
 
 OVSDB_CHECK_IDL_LEADER_ONLY_PY([Check Python IDL connects to leader], 3, ['remote'])
@@ -2393,6 +2411,7 @@  m4_define([OVSDB_CHECK_CLUSTER_IDL_C],
    AT_CHECK([sort stdout | uuidfilt]m4_if([$7],,, [[| $7]]),
             [0], [$5])
    m4_ifval([$8], [AT_CHECK([grep '$8' stderr], [1])], [], [])
+   OVSDB_CLUSTER_CHECK_MONITOR_COND_SINCE_TXN_IDS([stderr])
    AT_CLEANUP])
 
 # Same as OVSDB_CHECK_CLUSTER_IDL_C but uses the Python IDL implementation.
@@ -2413,6 +2432,7 @@  m4_define([OVSDB_CHECK_CLUSTER_IDL_PY],
    AT_CHECK([sort stdout | uuidfilt]m4_if([$7],,, [[| $7]]),
             [0], [$5])
    m4_if([$8], [AT_CHECK([grep '$8' stderr], [1])], [], [])
+   OVSDB_CLUSTER_CHECK_MONITOR_COND_SINCE_TXN_IDS([stderr])
    AT_CLEANUP])
 
 m4_define([OVSDB_CHECK_CLUSTER_IDL],