Message ID | 1597126510-78635-1-git-send-email-hzhou@ovn.org |
---|---|
State | Accepted |
Delegated to: | Ilya Maximets |
Headers | show |
Series | [ovs-dev] Revert "ovsdb-idl: Fix NULL deref reported by Coverity." | expand |
On 8/11/20 8:15 AM, Han Zhou wrote: > This reverts commit 68bc6f88a3a36549fcd3b6248c25c5e2e6deb8f3. > The commit causes a regression in OVN scale test. ovn-northd's CPU > more than doubled for the test scenario: create and bind 12k ports. > Below are some perf data of ovn-northd when running command: > ovn-nbctl --wait=sb sync > > Before reverting this commit: > - 92.42% 0.62% ovn-northd ovn-northd [.] main > - 91.80% main > + 68.93% ovn_db_run (inlined) > + 22.45% ovsdb_idl_loop_commit_and_wait > > After reverting this commit: > - 92.84% 0.60% ovn-northd ovn-northd [.] main > - 92.24% main > + 92.03% ovn_db_run (inlined) > > Reverting this commit avoided 22.45% of the CPU caused by > ovsdb_idl_loop_commit_and_wait(). > > The commit changed the logic of ovsdb_idl_txn_write__() by adding > the check "datum->keys && datum->values" before discarding unchanged > data in a transaction. However, it is normal for OVSDB clients ( > such as ovn-northd) to try to set columns with same empty data > as it is before the transaction. IDL would discard these changes > and avoid sending big transactions to server (which would end up as > no-op on server side). In the ovn scale test scenario mentioned above, > each iteration of ovn-northd would send a transaction to server that > includes all rows of the huge Port_Binding table, which caused the > significant CPU increase of ovn-northd (and also the OVN SB DB server), > resulted in longer end to end latency of OVN configuration changes. Good catch! So, if the row has at least one empty column, the whole row will be sent as update and rejected by ovsdb-server because nothing really changed. Is it correct? > > For the original problem the commit 68bc6f88 was trying to fix, it > doesn't seem to be a real problem. The NULL deref reported by > Coverity may be addressed in a future patch using a different approach, > if necessary. > > Cc: William Tu <u9012063@gmail.com> > Signed-off-by: Han Zhou <hzhou@ovn.org> > --- > lib/ovsdb-idl.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c > index ef3b97b..d8f221c 100644 > --- a/lib/ovsdb-idl.c > +++ b/lib/ovsdb-idl.c > @@ -4631,8 +4631,7 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row *row_, > * transaction only does writes of existing values, without making any real > * changes, we will drop the whole transaction later in > * ovsdb_idl_txn_commit().) */ > - if (datum->keys && datum->values && > - write_only && ovsdb_datum_equals(ovsdb_idl_read(row, column), > + if (write_only && ovsdb_datum_equals(ovsdb_idl_read(row, column), > datum, &column->type)) { > goto discard_datum; > } >
On Tue, Aug 11, 2020 at 11:35 AM Ilya Maximets <i.maximets@ovn.org> wrote: > > On 8/11/20 8:15 AM, Han Zhou wrote: > > This reverts commit 68bc6f88a3a36549fcd3b6248c25c5e2e6deb8f3. > > The commit causes a regression in OVN scale test. ovn-northd's CPU > > more than doubled for the test scenario: create and bind 12k ports. > > Below are some perf data of ovn-northd when running command: > > ovn-nbctl --wait=sb sync > > > > Before reverting this commit: > > - 92.42% 0.62% ovn-northd ovn-northd [.] main > > - 91.80% main > > + 68.93% ovn_db_run (inlined) > > + 22.45% ovsdb_idl_loop_commit_and_wait > > > > After reverting this commit: > > - 92.84% 0.60% ovn-northd ovn-northd [.] main > > - 92.24% main > > + 92.03% ovn_db_run (inlined) > > > > Reverting this commit avoided 22.45% of the CPU caused by > > ovsdb_idl_loop_commit_and_wait(). > > > > The commit changed the logic of ovsdb_idl_txn_write__() by adding > > the check "datum->keys && datum->values" before discarding unchanged > > data in a transaction. However, it is normal for OVSDB clients ( > > such as ovn-northd) to try to set columns with same empty data > > as it is before the transaction. IDL would discard these changes > > and avoid sending big transactions to server (which would end up as > > no-op on server side). In the ovn scale test scenario mentioned above, > > each iteration of ovn-northd would send a transaction to server that > > includes all rows of the huge Port_Binding table, which caused the > > significant CPU increase of ovn-northd (and also the OVN SB DB server), > > resulted in longer end to end latency of OVN configuration changes. > > Good catch! > > So, if the row has at least one empty column, the whole row will be sent > as update and rejected by ovsdb-server because nothing really changed. > Is it correct? > To be more accurate, any columns that are set by the transaction (using one of the xxx_set_xxx APIs) will be sent to server, even if they are not changed at all. The columns that are not set by the transaction won't be sent. The columns doesn't have to be empty, since in most cases datum->values are empty (if it is not a map), even if the column is not empty. The server wouldn't reject, instead, it makes no change and simply returns success to the client. There is no correctness issue, but this impacts performance - waste of CPU and bandwidth. > > > > For the original problem the commit 68bc6f88 was trying to fix, it > > doesn't seem to be a real problem. The NULL deref reported by > > Coverity may be addressed in a future patch using a different approach, > > if necessary. > > > > Cc: William Tu <u9012063@gmail.com> > > Signed-off-by: Han Zhou <hzhou@ovn.org> > > --- > > lib/ovsdb-idl.c | 3 +-- > > 1 file changed, 1 insertion(+), 2 deletions(-) > > > > diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c > > index ef3b97b..d8f221c 100644 > > --- a/lib/ovsdb-idl.c > > +++ b/lib/ovsdb-idl.c > > @@ -4631,8 +4631,7 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row *row_, > > * transaction only does writes of existing values, without making any real > > * changes, we will drop the whole transaction later in > > * ovsdb_idl_txn_commit().) */ > > - if (datum->keys && datum->values && > > - write_only && ovsdb_datum_equals(ovsdb_idl_read(row, column), > > + if (write_only && ovsdb_datum_equals(ovsdb_idl_read(row, column), > > datum, &column->type)) { > > goto discard_datum; > > } > > >
On 8/11/20 9:08 PM, Han Zhou wrote: > > > On Tue, Aug 11, 2020 at 11:35 AM Ilya Maximets <i.maximets@ovn.org <mailto:i.maximets@ovn.org>> wrote: >> >> On 8/11/20 8:15 AM, Han Zhou wrote: >> > This reverts commit 68bc6f88a3a36549fcd3b6248c25c5e2e6deb8f3. >> > The commit causes a regression in OVN scale test. ovn-northd's CPU >> > more than doubled for the test scenario: create and bind 12k ports. >> > Below are some perf data of ovn-northd when running command: >> > ovn-nbctl --wait=sb sync >> > >> > Before reverting this commit: >> > - 92.42% 0.62% ovn-northd ovn-northd [.] main >> > - 91.80% main >> > + 68.93% ovn_db_run (inlined) >> > + 22.45% ovsdb_idl_loop_commit_and_wait >> > >> > After reverting this commit: >> > - 92.84% 0.60% ovn-northd ovn-northd [.] main >> > - 92.24% main >> > + 92.03% ovn_db_run (inlined) >> > >> > Reverting this commit avoided 22.45% of the CPU caused by >> > ovsdb_idl_loop_commit_and_wait(). >> > >> > The commit changed the logic of ovsdb_idl_txn_write__() by adding >> > the check "datum->keys && datum->values" before discarding unchanged >> > data in a transaction. However, it is normal for OVSDB clients ( >> > such as ovn-northd) to try to set columns with same empty data >> > as it is before the transaction. IDL would discard these changes >> > and avoid sending big transactions to server (which would end up as >> > no-op on server side). In the ovn scale test scenario mentioned above, >> > each iteration of ovn-northd would send a transaction to server that >> > includes all rows of the huge Port_Binding table, which caused the >> > significant CPU increase of ovn-northd (and also the OVN SB DB server), >> > resulted in longer end to end latency of OVN configuration changes. >> >> Good catch! >> >> So, if the row has at least one empty column, the whole row will be sent >> as update and rejected by ovsdb-server because nothing really changed. >> Is it correct? >> > > To be more accurate, any columns that are set by the transaction (using one of the xxx_set_xxx APIs) will be sent to server, even if they are not changed at all. The columns that are not set by the transaction won't be sent. > The columns doesn't have to be empty, since in most cases datum->values are empty (if it is not a map), even if the column is not empty. > The server wouldn't reject, instead, it makes no change and simply returns success to the client. There is no correctness issue, but this impacts performance - waste of CPU and bandwidth. Oh. I missed that 'values' are almost always NULL since most of datum types doesn't have them. So, the issue is even worse than I though initially. > >> > >> > For the original problem the commit 68bc6f88 was trying to fix, it >> > doesn't seem to be a real problem. The NULL deref reported by >> > Coverity may be addressed in a future patch using a different approach, >> > if necessary. I looked through the code and this coverity defect looks like a false positive indeed, because ovsdb_datum_equals() or any other function here always checks for the type before using and NULL might be only in OVSDB_TYPE_VOID case or if n == 0, which is also checked. Thanks, Han! Applied to master and branch-2.14. Best regards, Ilya Maximets.
diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c index ef3b97b..d8f221c 100644 --- a/lib/ovsdb-idl.c +++ b/lib/ovsdb-idl.c @@ -4631,8 +4631,7 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row *row_, * transaction only does writes of existing values, without making any real * changes, we will drop the whole transaction later in * ovsdb_idl_txn_commit().) */ - if (datum->keys && datum->values && - write_only && ovsdb_datum_equals(ovsdb_idl_read(row, column), + if (write_only && ovsdb_datum_equals(ovsdb_idl_read(row, column), datum, &column->type)) { goto discard_datum; }
This reverts commit 68bc6f88a3a36549fcd3b6248c25c5e2e6deb8f3. The commit causes a regression in OVN scale test. ovn-northd's CPU more than doubled for the test scenario: create and bind 12k ports. Below are some perf data of ovn-northd when running command: ovn-nbctl --wait=sb sync Before reverting this commit: - 92.42% 0.62% ovn-northd ovn-northd [.] main - 91.80% main + 68.93% ovn_db_run (inlined) + 22.45% ovsdb_idl_loop_commit_and_wait After reverting this commit: - 92.84% 0.60% ovn-northd ovn-northd [.] main - 92.24% main + 92.03% ovn_db_run (inlined) Reverting this commit avoided 22.45% of the CPU caused by ovsdb_idl_loop_commit_and_wait(). The commit changed the logic of ovsdb_idl_txn_write__() by adding the check "datum->keys && datum->values" before discarding unchanged data in a transaction. However, it is normal for OVSDB clients ( such as ovn-northd) to try to set columns with same empty data as it is before the transaction. IDL would discard these changes and avoid sending big transactions to server (which would end up as no-op on server side). In the ovn scale test scenario mentioned above, each iteration of ovn-northd would send a transaction to server that includes all rows of the huge Port_Binding table, which caused the significant CPU increase of ovn-northd (and also the OVN SB DB server), resulted in longer end to end latency of OVN configuration changes. For the original problem the commit 68bc6f88 was trying to fix, it doesn't seem to be a real problem. The NULL deref reported by Coverity may be addressed in a future patch using a different approach, if necessary. Cc: William Tu <u9012063@gmail.com> Signed-off-by: Han Zhou <hzhou@ovn.org> --- lib/ovsdb-idl.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)