diff mbox series

[ovs-dev] Revert "ovsdb-idl: Fix NULL deref reported by Coverity."

Message ID 1597126510-78635-1-git-send-email-hzhou@ovn.org
State Accepted
Delegated to: Ilya Maximets
Headers show
Series [ovs-dev] Revert "ovsdb-idl: Fix NULL deref reported by Coverity." | expand

Commit Message

Han Zhou Aug. 11, 2020, 6:15 a.m. UTC
This reverts commit 68bc6f88a3a36549fcd3b6248c25c5e2e6deb8f3.
The commit causes a regression in OVN scale test. ovn-northd's CPU
more than doubled for the test scenario: create and bind 12k ports.
Below are some perf data of ovn-northd when running command:
  ovn-nbctl --wait=sb sync

Before reverting this commit:
-   92.42%     0.62%  ovn-northd  ovn-northd          [.] main
   - 91.80% main
      + 68.93% ovn_db_run (inlined)
      + 22.45% ovsdb_idl_loop_commit_and_wait

After reverting this commit:
-   92.84%     0.60%  ovn-northd  ovn-northd          [.] main
   - 92.24% main
      + 92.03% ovn_db_run (inlined)

Reverting this commit avoided 22.45% of the CPU caused by
ovsdb_idl_loop_commit_and_wait().

The commit changed the logic of ovsdb_idl_txn_write__() by adding
the check "datum->keys && datum->values" before discarding unchanged
data in a transaction. However, it is normal for OVSDB clients (
such as ovn-northd) to try to set columns with same empty data
as it is before the transaction. IDL would discard these changes
and avoid sending big transactions to server (which would end up as
no-op on server side). In the ovn scale test scenario mentioned above,
each iteration of ovn-northd would send a transaction to server that
includes all rows of the huge Port_Binding table, which caused the
significant CPU increase of ovn-northd (and also the OVN SB DB server),
resulted in longer end to end latency of OVN configuration changes.

For the original problem the commit 68bc6f88 was trying to fix, it
doesn't seem to be a real problem. The NULL deref reported by
Coverity may be addressed in a future patch using a different approach,
if necessary.

Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Han Zhou <hzhou@ovn.org>
---
 lib/ovsdb-idl.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Comments

Ilya Maximets Aug. 11, 2020, 6:35 p.m. UTC | #1
On 8/11/20 8:15 AM, Han Zhou wrote:
> This reverts commit 68bc6f88a3a36549fcd3b6248c25c5e2e6deb8f3.
> The commit causes a regression in OVN scale test. ovn-northd's CPU
> more than doubled for the test scenario: create and bind 12k ports.
> Below are some perf data of ovn-northd when running command:
>   ovn-nbctl --wait=sb sync
> 
> Before reverting this commit:
> -   92.42%     0.62%  ovn-northd  ovn-northd          [.] main
>    - 91.80% main
>       + 68.93% ovn_db_run (inlined)
>       + 22.45% ovsdb_idl_loop_commit_and_wait
> 
> After reverting this commit:
> -   92.84%     0.60%  ovn-northd  ovn-northd          [.] main
>    - 92.24% main
>       + 92.03% ovn_db_run (inlined)
> 
> Reverting this commit avoided 22.45% of the CPU caused by
> ovsdb_idl_loop_commit_and_wait().
> 
> The commit changed the logic of ovsdb_idl_txn_write__() by adding
> the check "datum->keys && datum->values" before discarding unchanged
> data in a transaction. However, it is normal for OVSDB clients (
> such as ovn-northd) to try to set columns with same empty data
> as it is before the transaction. IDL would discard these changes
> and avoid sending big transactions to server (which would end up as
> no-op on server side). In the ovn scale test scenario mentioned above,
> each iteration of ovn-northd would send a transaction to server that
> includes all rows of the huge Port_Binding table, which caused the
> significant CPU increase of ovn-northd (and also the OVN SB DB server),
> resulted in longer end to end latency of OVN configuration changes.

Good catch!

So, if the row has at least one empty column, the whole row will be sent
as update and rejected by ovsdb-server because nothing really changed.
Is it correct?

> 
> For the original problem the commit 68bc6f88 was trying to fix, it
> doesn't seem to be a real problem. The NULL deref reported by
> Coverity may be addressed in a future patch using a different approach,
> if necessary.
> 
> Cc: William Tu <u9012063@gmail.com>
> Signed-off-by: Han Zhou <hzhou@ovn.org>
> ---
>  lib/ovsdb-idl.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
> index ef3b97b..d8f221c 100644
> --- a/lib/ovsdb-idl.c
> +++ b/lib/ovsdb-idl.c
> @@ -4631,8 +4631,7 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row *row_,
>       * transaction only does writes of existing values, without making any real
>       * changes, we will drop the whole transaction later in
>       * ovsdb_idl_txn_commit().) */
> -    if (datum->keys && datum->values &&
> -        write_only && ovsdb_datum_equals(ovsdb_idl_read(row, column),
> +    if (write_only && ovsdb_datum_equals(ovsdb_idl_read(row, column),
>                                           datum, &column->type)) {
>          goto discard_datum;
>      }
>
Han Zhou Aug. 11, 2020, 7:08 p.m. UTC | #2
On Tue, Aug 11, 2020 at 11:35 AM Ilya Maximets <i.maximets@ovn.org> wrote:
>
> On 8/11/20 8:15 AM, Han Zhou wrote:
> > This reverts commit 68bc6f88a3a36549fcd3b6248c25c5e2e6deb8f3.
> > The commit causes a regression in OVN scale test. ovn-northd's CPU
> > more than doubled for the test scenario: create and bind 12k ports.
> > Below are some perf data of ovn-northd when running command:
> >   ovn-nbctl --wait=sb sync
> >
> > Before reverting this commit:
> > -   92.42%     0.62%  ovn-northd  ovn-northd          [.] main
> >    - 91.80% main
> >       + 68.93% ovn_db_run (inlined)
> >       + 22.45% ovsdb_idl_loop_commit_and_wait
> >
> > After reverting this commit:
> > -   92.84%     0.60%  ovn-northd  ovn-northd          [.] main
> >    - 92.24% main
> >       + 92.03% ovn_db_run (inlined)
> >
> > Reverting this commit avoided 22.45% of the CPU caused by
> > ovsdb_idl_loop_commit_and_wait().
> >
> > The commit changed the logic of ovsdb_idl_txn_write__() by adding
> > the check "datum->keys && datum->values" before discarding unchanged
> > data in a transaction. However, it is normal for OVSDB clients (
> > such as ovn-northd) to try to set columns with same empty data
> > as it is before the transaction. IDL would discard these changes
> > and avoid sending big transactions to server (which would end up as
> > no-op on server side). In the ovn scale test scenario mentioned above,
> > each iteration of ovn-northd would send a transaction to server that
> > includes all rows of the huge Port_Binding table, which caused the
> > significant CPU increase of ovn-northd (and also the OVN SB DB server),
> > resulted in longer end to end latency of OVN configuration changes.
>
> Good catch!
>
> So, if the row has at least one empty column, the whole row will be sent
> as update and rejected by ovsdb-server because nothing really changed.
> Is it correct?
>

To be more accurate, any columns that are set by the transaction (using one
of the xxx_set_xxx APIs) will be sent to server, even if they are not
changed at all. The columns that are not set by the transaction won't be
sent.
The columns doesn't have to be empty, since in most cases datum->values are
empty (if it is not a map), even if the column is not empty.
The server wouldn't reject, instead, it makes no change and simply returns
success to the client. There is no correctness issue, but this impacts
performance - waste of CPU and bandwidth.

> >
> > For the original problem the commit 68bc6f88 was trying to fix, it
> > doesn't seem to be a real problem. The NULL deref reported by
> > Coverity may be addressed in a future patch using a different approach,
> > if necessary.
> >
> > Cc: William Tu <u9012063@gmail.com>
> > Signed-off-by: Han Zhou <hzhou@ovn.org>
> > ---
> >  lib/ovsdb-idl.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
> > index ef3b97b..d8f221c 100644
> > --- a/lib/ovsdb-idl.c
> > +++ b/lib/ovsdb-idl.c
> > @@ -4631,8 +4631,7 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row
*row_,
> >       * transaction only does writes of existing values, without making
any real
> >       * changes, we will drop the whole transaction later in
> >       * ovsdb_idl_txn_commit().) */
> > -    if (datum->keys && datum->values &&
> > -        write_only && ovsdb_datum_equals(ovsdb_idl_read(row, column),
> > +    if (write_only && ovsdb_datum_equals(ovsdb_idl_read(row, column),
> >                                           datum, &column->type)) {
> >          goto discard_datum;
> >      }
> >
>
Ilya Maximets Aug. 12, 2020, 6:41 p.m. UTC | #3
On 8/11/20 9:08 PM, Han Zhou wrote:
> 
> 
> On Tue, Aug 11, 2020 at 11:35 AM Ilya Maximets <i.maximets@ovn.org <mailto:i.maximets@ovn.org>> wrote:
>>
>> On 8/11/20 8:15 AM, Han Zhou wrote:
>> > This reverts commit 68bc6f88a3a36549fcd3b6248c25c5e2e6deb8f3.
>> > The commit causes a regression in OVN scale test. ovn-northd's CPU
>> > more than doubled for the test scenario: create and bind 12k ports.
>> > Below are some perf data of ovn-northd when running command:
>> >   ovn-nbctl --wait=sb sync
>> >
>> > Before reverting this commit:
>> > -   92.42%     0.62%  ovn-northd  ovn-northd          [.] main
>> >    - 91.80% main
>> >       + 68.93% ovn_db_run (inlined)
>> >       + 22.45% ovsdb_idl_loop_commit_and_wait
>> >
>> > After reverting this commit:
>> > -   92.84%     0.60%  ovn-northd  ovn-northd          [.] main
>> >    - 92.24% main
>> >       + 92.03% ovn_db_run (inlined)
>> >
>> > Reverting this commit avoided 22.45% of the CPU caused by
>> > ovsdb_idl_loop_commit_and_wait().
>> >
>> > The commit changed the logic of ovsdb_idl_txn_write__() by adding
>> > the check "datum->keys && datum->values" before discarding unchanged
>> > data in a transaction. However, it is normal for OVSDB clients (
>> > such as ovn-northd) to try to set columns with same empty data
>> > as it is before the transaction. IDL would discard these changes
>> > and avoid sending big transactions to server (which would end up as
>> > no-op on server side). In the ovn scale test scenario mentioned above,
>> > each iteration of ovn-northd would send a transaction to server that
>> > includes all rows of the huge Port_Binding table, which caused the
>> > significant CPU increase of ovn-northd (and also the OVN SB DB server),
>> > resulted in longer end to end latency of OVN configuration changes.
>>
>> Good catch!
>>
>> So, if the row has at least one empty column, the whole row will be sent
>> as update and rejected by ovsdb-server because nothing really changed.
>> Is it correct?
>>
> 
> To be more accurate, any columns that are set by the transaction (using one of the xxx_set_xxx APIs) will be sent to server, even if they are not changed at all. The columns that are not set by the transaction won't be sent.
> The columns doesn't have to be empty, since in most cases datum->values are empty (if it is not a map), even if the column is not empty.
> The server wouldn't reject, instead, it makes no change and simply returns success to the client. There is no correctness issue, but this impacts performance - waste of CPU and bandwidth.

Oh.  I missed that 'values' are almost always NULL since most of
datum types doesn't have them.  So, the issue is even worse than
I though initially.

> 
>> >
>> > For the original problem the commit 68bc6f88 was trying to fix, it
>> > doesn't seem to be a real problem. The NULL deref reported by
>> > Coverity may be addressed in a future patch using a different approach,
>> > if necessary.

I looked through the code and this coverity defect looks like a false positive
indeed, because ovsdb_datum_equals() or any other function here always checks
for the type before using and NULL might be only in OVSDB_TYPE_VOID case or if
n == 0, which is also checked.


Thanks, Han!
Applied to master and branch-2.14.

Best regards, Ilya Maximets.
diff mbox series

Patch

diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
index ef3b97b..d8f221c 100644
--- a/lib/ovsdb-idl.c
+++ b/lib/ovsdb-idl.c
@@ -4631,8 +4631,7 @@  ovsdb_idl_txn_write__(const struct ovsdb_idl_row *row_,
      * transaction only does writes of existing values, without making any real
      * changes, we will drop the whole transaction later in
      * ovsdb_idl_txn_commit().) */
-    if (datum->keys && datum->values &&
-        write_only && ovsdb_datum_equals(ovsdb_idl_read(row, column),
+    if (write_only && ovsdb_datum_equals(ovsdb_idl_read(row, column),
                                          datum, &column->type)) {
         goto discard_datum;
     }