From patchwork Thu Jun 30 23:34:06 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ilya Maximets X-Patchwork-Id: 1650918 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.137; helo=smtp4.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from smtp4.osuosl.org (smtp4.osuosl.org [140.211.166.137]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4LYvn95NFVz9sGt for ; Fri, 1 Jul 2022 09:34:37 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp4.osuosl.org (Postfix) with ESMTP id 8BBB242438; Thu, 30 Jun 2022 23:34:33 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp4.osuosl.org 8BBB242438 X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp4.osuosl.org ([127.0.0.1]) by localhost (smtp4.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id HYC2EFa3KiQu; Thu, 30 Jun 2022 23:34:28 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [IPv6:2605:bc80:3010:104::8cd3:938]) by smtp4.osuosl.org (Postfix) with ESMTPS id 3E2D5423D3; Thu, 30 Jun 2022 23:34:25 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp4.osuosl.org 3E2D5423D3 Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id B4268C007F; Thu, 30 Jun 2022 23:34:23 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp2.osuosl.org (smtp2.osuosl.org [IPv6:2605:bc80:3010::133]) by lists.linuxfoundation.org (Postfix) with ESMTP id A5D0CC007A for ; Thu, 30 Jun 2022 23:34:21 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp2.osuosl.org (Postfix) with ESMTP id 405FE404C0 for ; Thu, 30 Jun 2022 23:34:21 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp2.osuosl.org 405FE404C0 X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp2.osuosl.org ([127.0.0.1]) by localhost (smtp2.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 9qyv3TWqpVMA for ; Thu, 30 Jun 2022 23:34:16 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.8.0 DKIM-Filter: OpenDKIM Filter v2.11.0 smtp2.osuosl.org 4E27B40B49 Received: from relay11.mail.gandi.net (relay11.mail.gandi.net [IPv6:2001:4b98:dc4:8::231]) by smtp2.osuosl.org (Postfix) with ESMTPS id 4E27B40B49 for ; Thu, 30 Jun 2022 23:34:16 +0000 (UTC) Received: (Authenticated sender: i.maximets@ovn.org) by mail.gandi.net (Postfix) with ESMTPSA id C4933100002; Thu, 30 Jun 2022 23:34:13 +0000 (UTC) From: Ilya Maximets To: ovs-dev@openvswitch.org Date: Fri, 1 Jul 2022 01:34:06 +0200 Message-Id: <20220630233407.623049-2-i.maximets@ovn.org> X-Mailer: git-send-email 2.34.3 In-Reply-To: <20220630233407.623049-1-i.maximets@ovn.org> References: <20220630233407.623049-1-i.maximets@ovn.org> MIME-Version: 1.0 Cc: Ilya Maximets , Dumitru Ceara Subject: [ovs-dev] [PATCH 1/2] ovsdb: Add lazy-copy support for ovsdb_datum objects. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" Currently ovsdb-server is using shallow copies of some JSON objects by keeping a reference counter. JSON string objects are also used directly as ovsdb atoms in database rows to avoid extra copies. Taking this approach one step further ovsdb_datum objects can also be mostly deduplicated by postponing the copy until it actually needed. datum object itself contains a type and 2 pointers to data arrays. Adding a one more pointer to a reference counter we may create a shallow copy of the datum by simply copying type and pointers and increasing the reference counter. Before modifying the datum, special function needs to be called to perform an actual copy of the object, a.k.a. unshare it. Most of the datum modifications are performed inside the special functions in ovsdb-data.c, so that is not very hard to track. A few places like idl and column mutations in ovsdb are accessing and changing the data directly, so a few extra unshare() calls has to be added there. This change doesn't affect the maximum memory consumption too much, because most of the copies are short-living. However, not actually performing these copies saves up to 40% of CPU time on operations with large sets. Reported-at: https://bugzilla.redhat.com/2069089 Signed-off-by: Ilya Maximets Acked-by: Dumitru Ceara --- lib/db-ctl-base.c | 4 +- lib/ovsdb-data.c | 115 ++++++++++++++++++++++++++++++++------------ lib/ovsdb-data.h | 13 ++--- lib/ovsdb-idl.c | 11 +++-- lib/ovsdb-types.c | 3 +- ovsdb/condition.c | 4 +- ovsdb/monitor.c | 5 +- ovsdb/mutation.c | 4 +- ovsdb/ovsdb-idlc.in | 35 +++++++------- ovsdb/ovsdb-util.c | 6 ++- ovsdb/row.c | 6 +-- ovsdb/transaction.c | 8 ++- tests/test-ovsdb.c | 1 + 13 files changed, 135 insertions(+), 80 deletions(-) diff --git a/lib/db-ctl-base.c b/lib/db-ctl-base.c index 707456158..bc85e9921 100644 --- a/lib/db-ctl-base.c +++ b/lib/db-ctl-base.c @@ -1503,7 +1503,7 @@ cmd_add(struct ctl_context *ctx) } type = &column->type; - ovsdb_datum_clone(&old, ovsdb_idl_read(row, column), &column->type); + ovsdb_datum_clone(&old, ovsdb_idl_read(row, column)); for (i = 4; i < ctx->argc; i++) { struct ovsdb_type add_type; struct ovsdb_datum add; @@ -1588,7 +1588,7 @@ cmd_remove(struct ctl_context *ctx) } type = &column->type; - ovsdb_datum_clone(&old, ovsdb_idl_read(row, column), &column->type); + ovsdb_datum_clone(&old, ovsdb_idl_read(row, column)); for (i = 4; i < ctx->argc; i++) { struct ovsdb_type rm_type; struct ovsdb_datum rm; diff --git a/lib/ovsdb-data.c b/lib/ovsdb-data.c index 61ad7679a..d4209e1d4 100644 --- a/lib/ovsdb-data.c +++ b/lib/ovsdb-data.c @@ -896,6 +896,7 @@ ovsdb_datum_init_empty(struct ovsdb_datum *datum) datum->n = 0; datum->keys = NULL; datum->values = NULL; + datum->refcnt = NULL; } /* Initializes 'datum' as a datum that has the default value for 'type'. @@ -917,6 +918,7 @@ ovsdb_datum_init_default(struct ovsdb_datum *datum, datum->n = type->n_min; datum->keys = alloc_default_atoms(type->key.type, datum->n); datum->values = alloc_default_atoms(type->value.type, datum->n); + datum->refcnt = NULL; } /* Returns a read-only datum of the given 'type' that has the default value for @@ -928,10 +930,12 @@ const struct ovsdb_datum * ovsdb_datum_default(const struct ovsdb_type *type) { if (type->n_min == 0) { - static const struct ovsdb_datum empty; + static unsigned int refcnt = 1; + static const struct ovsdb_datum empty = { .refcnt = &refcnt }; return ∅ } else if (type->n_min == 1) { static struct ovsdb_datum default_data[OVSDB_N_TYPES][OVSDB_N_TYPES]; + static unsigned int refcnt[OVSDB_N_TYPES][OVSDB_N_TYPES]; struct ovsdb_datum *d; int kt = type->key.type; int vt = type->value.type; @@ -946,6 +950,8 @@ ovsdb_datum_default(const struct ovsdb_type *type) d->values = CONST_CAST(union ovsdb_atom *, ovsdb_atom_default(vt)); } + d->refcnt = &refcnt[kt][vt]; + *d->refcnt = 1; } return d; } else { @@ -999,18 +1005,23 @@ clone_atoms(const union ovsdb_atom *old, enum ovsdb_atomic_type type, size_t n) } } -/* Initializes 'new' as a copy of 'old', with the given 'type'. +/* Initializes 'new' as a shallow copy of 'old_'. * * The caller must eventually arrange for 'new' to be destroyed (with - * ovsdb_datum_destroy()). */ + * ovsdb_datum_destroy()). The caller must call ovsdb_datum_unshare() + * before attempting direct modifications of the 'new' or 'old_', i.e. + * modifications outside of the ovsdb_datum_* API. */ void -ovsdb_datum_clone(struct ovsdb_datum *new, const struct ovsdb_datum *old, - const struct ovsdb_type *type) +ovsdb_datum_clone(struct ovsdb_datum *new, const struct ovsdb_datum *old_) { - unsigned int n = old->n; - new->n = n; - new->keys = clone_atoms(old->keys, type->key.type, n); - new->values = clone_atoms(old->values, type->value.type, n); + struct ovsdb_datum *old = CONST_CAST(struct ovsdb_datum *, old_); + + if (!old->refcnt) { + old->refcnt = xmalloc(sizeof *old->refcnt); + *old->refcnt = 1; + } + memcpy(new, old, sizeof *new); + (*new->refcnt)++; } static void @@ -1037,8 +1048,23 @@ free_data(enum ovsdb_atomic_type type, void ovsdb_datum_destroy(struct ovsdb_datum *datum, const struct ovsdb_type *type) { - free_data(type->key.type, datum->keys, datum->n); - free_data(type->value.type, datum->values, datum->n); + if (!datum->refcnt || !--(*datum->refcnt)) { + free_data(type->key.type, datum->keys, datum->n); + free_data(type->value.type, datum->values, datum->n); + free(datum->refcnt); + } +} + +void +ovsdb_datum_unshare(struct ovsdb_datum *datum, const struct ovsdb_type *type) +{ + if (!datum->refcnt || *datum->refcnt == 1) { + return; + } + datum->keys = clone_atoms(datum->keys, type->key.type, datum->n); + datum->values = clone_atoms(datum->values, type->value.type, datum->n); + (*datum->refcnt)--; + datum->refcnt = NULL; } /* Swaps the contents of 'a' and 'b', which need not have the same type. */ @@ -1107,7 +1133,7 @@ ovsdb_datum_sort__(struct ovsdb_datum *datum, enum ovsdb_atomic_type key_type, * caller must free the returned error when it is no longer needed. On error, * 'datum' is sorted but not unique. */ struct ovsdb_error * -ovsdb_datum_sort(struct ovsdb_datum *datum, enum ovsdb_atomic_type key_type) +ovsdb_datum_sort(struct ovsdb_datum *datum, const struct ovsdb_type *type) { size_t i; @@ -1115,11 +1141,13 @@ ovsdb_datum_sort(struct ovsdb_datum *datum, enum ovsdb_atomic_type key_type) return NULL; } - ovsdb_datum_sort__(datum, key_type, OVSDB_TYPE_VOID); + ovsdb_datum_unshare(datum, type); + + ovsdb_datum_sort__(datum, type->key.type, OVSDB_TYPE_VOID); for (i = 0; i < datum->n - 1; i++) { if (ovsdb_atom_equals(&datum->keys[i], &datum->keys[i + 1], - key_type)) { + type->key.type)) { if (datum->values) { return ovsdb_error(NULL, "map contains duplicate key"); } else { @@ -1135,9 +1163,9 @@ ovsdb_datum_sort(struct ovsdb_datum *datum, enum ovsdb_atomic_type key_type) * this function assert-fails if it actually does. */ void ovsdb_datum_sort_assert(struct ovsdb_datum *datum, - enum ovsdb_atomic_type key_type) + const struct ovsdb_type *type) { - struct ovsdb_error *error = ovsdb_datum_sort(datum, key_type); + struct ovsdb_error *error = ovsdb_datum_sort(datum, type); if (error) { OVS_NOT_REACHED(); } @@ -1150,8 +1178,7 @@ ovsdb_datum_sort_assert(struct ovsdb_datum *datum, * Returns the number of keys (or pairs) that were dropped. */ size_t ovsdb_datum_sort_unique(struct ovsdb_datum *datum, - enum ovsdb_atomic_type key_type, - enum ovsdb_atomic_type value_type) + const struct ovsdb_type *type) { size_t src, dst; @@ -1159,20 +1186,21 @@ ovsdb_datum_sort_unique(struct ovsdb_datum *datum, return 0; } - ovsdb_datum_sort__(datum, key_type, value_type); + ovsdb_datum_unshare(datum, type); + ovsdb_datum_sort__(datum, type->key.type, type->value.type); dst = 1; for (src = 1; src < datum->n; src++) { if (ovsdb_atom_equals(&datum->keys[src], &datum->keys[dst - 1], - key_type)) { - ovsdb_atom_destroy(&datum->keys[src], key_type); - if (value_type != OVSDB_TYPE_VOID) { - ovsdb_atom_destroy(&datum->values[src], value_type); + type->key.type)) { + ovsdb_atom_destroy(&datum->keys[src], type->key.type); + if (type->value.type != OVSDB_TYPE_VOID) { + ovsdb_atom_destroy(&datum->values[src], type->value.type); } } else { if (src != dst) { datum->keys[dst] = datum->keys[src]; - if (value_type != OVSDB_TYPE_VOID) { + if (type->value.type != OVSDB_TYPE_VOID) { datum->values[dst] = datum->values[src]; } } @@ -1258,6 +1286,7 @@ ovsdb_datum_from_json__(struct ovsdb_datum *datum, datum->n = 0; datum->keys = xmalloc(n * sizeof *datum->keys); datum->values = is_map ? xmalloc(n * sizeof *datum->values) : NULL; + datum->refcnt = NULL; for (i = 0; i < n; i++) { const struct json *element = inner->array.elems[i]; const struct json *key = NULL; @@ -1298,6 +1327,7 @@ ovsdb_datum_from_json__(struct ovsdb_datum *datum, datum->n = 1; datum->keys = xmalloc(sizeof *datum->keys); datum->values = NULL; + datum->refcnt = NULL; error = ovsdb_atom_from_json(&datum->keys[0], &type->key, json, symtab); @@ -1331,7 +1361,7 @@ ovsdb_datum_from_json(struct ovsdb_datum *datum, return error; } - error = ovsdb_datum_sort(datum, type->key.type); + error = ovsdb_datum_sort(datum, type); if (error) { ovsdb_datum_destroy(datum, type); } @@ -1609,7 +1639,7 @@ ovsdb_datum_from_string(struct ovsdb_datum *datum, goto error; } - dberror = ovsdb_datum_sort(datum, type->key.type); + dberror = ovsdb_datum_sort(datum, type); if (dberror) { ovsdb_error_destroy(dberror); if (ovsdb_type_is_map(type)) { @@ -1687,6 +1717,7 @@ ovsdb_datum_from_smap(struct ovsdb_datum *datum, const struct smap *smap) datum->n = smap_count(smap); datum->keys = xmalloc(datum->n * sizeof *datum->keys); datum->values = xmalloc(datum->n * sizeof *datum->values); + datum->refcnt = NULL; struct smap_node *node; size_t i = 0; @@ -1697,7 +1728,11 @@ ovsdb_datum_from_smap(struct ovsdb_datum *datum, const struct smap *smap) } ovs_assert(i == datum->n); - ovsdb_datum_sort_unique(datum, OVSDB_TYPE_STRING, OVSDB_TYPE_STRING); + struct ovsdb_type type = { + OVSDB_BASE_STRING_INIT, OVSDB_BASE_STRING_INIT, + 0, UINT_MAX + }; + ovsdb_datum_sort_unique(datum, &type); } struct ovsdb_error * OVS_WARN_UNUSED_RESULT @@ -1774,6 +1809,10 @@ ovsdb_datum_compare_3way(const struct ovsdb_datum *a, return a->n < b->n ? -1 : 1; } + if (a->refcnt && a->refcnt == b->refcnt) { + return 0; + } + cmp = atom_arrays_compare_3way(a->keys, b->keys, type->key.type, a->n); if (cmp) { return cmp; @@ -1893,6 +1932,8 @@ static void ovsdb_datum_reallocate(struct ovsdb_datum *a, const struct ovsdb_type *type, unsigned int capacity) { + ovsdb_datum_unshare(a, type); + a->keys = xrealloc(a->keys, capacity * sizeof *a->keys); if (type->value.type != OVSDB_TYPE_VOID) { a->values = xrealloc(a->values, capacity * sizeof *a->values); @@ -1909,6 +1950,8 @@ void ovsdb_datum_remove_unsafe(struct ovsdb_datum *datum, size_t idx, const struct ovsdb_type *type) { + ovsdb_datum_unshare(datum, type); + ovsdb_atom_destroy(&datum->keys[idx], type->key.type); datum->keys[idx] = datum->keys[datum->n - 1]; if (type->value.type != OVSDB_TYPE_VOID) { @@ -1940,6 +1983,9 @@ ovsdb_datum_add_unsafe(struct ovsdb_datum *datum, const union ovsdb_atom *range_end_atom) { size_t idx = datum->n; + + ovsdb_datum_unshare(datum, type); + datum->n += range_end_atom ? (range_end_atom->integer - key->integer + 1) : 1; datum->keys = xrealloc(datum->keys, datum->n * sizeof *datum->keys); @@ -1984,6 +2030,8 @@ ovsdb_datum_push_unsafe(struct ovsdb_datum *dst, return; } + ovsdb_datum_unshare(dst, type); + memcpy(&dst->keys[dst->n], &src->keys[start_idx], n * sizeof src->keys[0]); if (type->value.type != OVSDB_TYPE_VOID) { memcpy(&dst->values[dst->n], &src->values[start_idx], @@ -1999,6 +2047,7 @@ ovsdb_datum_union(struct ovsdb_datum *a, const struct ovsdb_datum *b, struct ovsdb_datum result; unsigned int copied, pos; + ovsdb_datum_unshare(a, type); ovsdb_datum_init_empty(&result); copied = 0; @@ -2050,6 +2099,8 @@ ovsdb_datum_subtract(struct ovsdb_datum *a, const struct ovsdb_type *a_type, ovs_assert(a_type->value.type == b_type->value.type || b_type->value.type == OVSDB_TYPE_VOID); + ovsdb_datum_unshare(a, a_type); + idx = xmalloc(b->n * sizeof *idx); n_idx = 0; for (size_t bi = 0; bi < b->n; bi++) { @@ -2168,8 +2219,8 @@ ovsdb_datum_added_removed(struct ovsdb_datum *added, ovsdb_datum_init_empty(added); ovsdb_datum_init_empty(removed); if (!ovsdb_type_is_composite(type)) { - ovsdb_datum_clone(removed, old, type); - ovsdb_datum_clone(added, new, type); + ovsdb_datum_clone(removed, old); + ovsdb_datum_clone(added, new); return; } @@ -2228,7 +2279,7 @@ ovsdb_datum_diff(struct ovsdb_datum *diff, ovsdb_datum_init_empty(diff); if (!ovsdb_type_is_composite(type)) { - ovsdb_datum_clone(diff, new, type); + ovsdb_datum_clone(diff, new); return; } @@ -2283,10 +2334,12 @@ ovsdb_datum_apply_diff_in_place(struct ovsdb_datum *a, if (!ovsdb_type_is_composite(type)) { ovsdb_datum_destroy(a, type); - ovsdb_datum_clone(a, diff, type); + ovsdb_datum_clone(a, diff); return NULL; } + ovsdb_datum_unshare(a, type); + operation = xmalloc(diff->n * sizeof *operation); idx = xmalloc(diff->n * sizeof *idx); new_size = a->n; diff --git a/lib/ovsdb-data.h b/lib/ovsdb-data.h index ba5d179a6..dcb620513 100644 --- a/lib/ovsdb-data.h +++ b/lib/ovsdb-data.h @@ -146,6 +146,8 @@ struct ovsdb_datum { unsigned int n; /* Number of 'keys' and 'values'. */ union ovsdb_atom *keys; /* Each of the ovsdb_type's 'key_type'. */ union ovsdb_atom *values; /* Each of the ovsdb_type's 'value_type'. */ + unsigned int *refcnt; /* Number of copies with the same + * 'keys' and 'values'. */ }; #define OVSDB_DATUM_INITIALIZER { 0, NULL, NULL } @@ -155,22 +157,21 @@ void ovsdb_datum_init_default(struct ovsdb_datum *, const struct ovsdb_type *); bool ovsdb_datum_is_default(const struct ovsdb_datum *, const struct ovsdb_type *); const struct ovsdb_datum *ovsdb_datum_default(const struct ovsdb_type *); -void ovsdb_datum_clone(struct ovsdb_datum *, const struct ovsdb_datum *, - const struct ovsdb_type *); +void ovsdb_datum_clone(struct ovsdb_datum *, const struct ovsdb_datum *); void ovsdb_datum_destroy(struct ovsdb_datum *, const struct ovsdb_type *); +void ovsdb_datum_unshare(struct ovsdb_datum *, const struct ovsdb_type *); void ovsdb_datum_swap(struct ovsdb_datum *, struct ovsdb_datum *); /* Checking and maintaining invariants. */ struct ovsdb_error *ovsdb_datum_sort(struct ovsdb_datum *, - enum ovsdb_atomic_type key_type) + const struct ovsdb_type *type) OVS_WARN_UNUSED_RESULT; void ovsdb_datum_sort_assert(struct ovsdb_datum *, - enum ovsdb_atomic_type key_type); + const struct ovsdb_type *type); size_t ovsdb_datum_sort_unique(struct ovsdb_datum *, - enum ovsdb_atomic_type key_type, - enum ovsdb_atomic_type value_type); + const struct ovsdb_type *type); struct ovsdb_error *ovsdb_datum_check_constraints( const struct ovsdb_datum *, const struct ovsdb_type *) diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c index 3b8741408..51dae1d02 100644 --- a/lib/ovsdb-idl.c +++ b/lib/ovsdb-idl.c @@ -1088,7 +1088,7 @@ ovsdb_idl_condition_add_clause__(struct ovsdb_idl_condition *condition, struct ovsdb_idl_clause *clause = xmalloc(sizeof *clause); clause->function = src->function; clause->column = src->column; - ovsdb_datum_clone(&clause->arg, &src->arg, &src->column->type); + ovsdb_datum_clone(&clause->arg, &src->arg); hmap_insert(&condition->clauses, &clause->hmap_node, hash); } @@ -1128,12 +1128,14 @@ ovsdb_idl_condition_add_clause(struct ovsdb_idl_condition *condition, struct ovsdb_idl_clause clause = { .function = function, .column = column, - .arg = *arg, }; + ovsdb_datum_clone(&clause.arg, arg); + uint32_t hash = ovsdb_idl_clause_hash(&clause); if (!ovsdb_idl_condition_find_clause(condition, &clause, hash)) { ovsdb_idl_condition_add_clause__(condition, &clause, hash); } + ovsdb_datum_destroy(&clause.arg, &column->type); } } @@ -3611,7 +3613,7 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row *row_, if (owns_datum) { row->new_datum[column_idx] = *datum; } else { - ovsdb_datum_clone(&row->new_datum[column_idx], datum, &column->type); + ovsdb_datum_clone(&row->new_datum[column_idx], datum); } (column->unparse)(row); (column->parse)(row, &row->new_datum[column_idx]); @@ -3650,8 +3652,7 @@ ovsdb_idl_txn_write(const struct ovsdb_idl_row *row, const struct ovsdb_idl_column *column, struct ovsdb_datum *datum) { - ovsdb_datum_sort_unique(datum, - column->type.key.type, column->type.value.type); + ovsdb_datum_sort_unique(datum, &column->type); ovsdb_idl_txn_write__(row, column, datum, true); } diff --git a/lib/ovsdb-types.c b/lib/ovsdb-types.c index 24ccdd1cc..61efe59cf 100644 --- a/lib/ovsdb-types.c +++ b/lib/ovsdb-types.c @@ -189,8 +189,7 @@ ovsdb_base_type_clone(struct ovsdb_base_type *dst, if (src->enum_) { dst->enum_ = xmalloc(sizeof *dst->enum_); - ovsdb_datum_clone(dst->enum_, src->enum_, - ovsdb_base_type_get_enum_type(dst->type)); + ovsdb_datum_clone(dst->enum_, src->enum_); } switch (dst->type) { diff --git a/ovsdb/condition.c b/ovsdb/condition.c index 9aa3788db..d0016fa7f 100644 --- a/ovsdb/condition.c +++ b/ovsdb/condition.c @@ -376,9 +376,7 @@ ovsdb_clause_clone(struct ovsdb_clause *new, struct ovsdb_clause *old) { new->function = old->function; new->column = old->column; - ovsdb_datum_clone(&new->arg, - &old->arg, - &old->column->type); + ovsdb_datum_clone(&new->arg, &old->arg); } bool diff --git a/ovsdb/monitor.c b/ovsdb/monitor.c index 952fa902e..191befcae 100644 --- a/ovsdb/monitor.c +++ b/ovsdb/monitor.c @@ -334,9 +334,8 @@ clone_monitor_row_data(const struct ovsdb_monitor_table *mt, const struct ovsdb_column *c = mt->columns[i].column; const struct ovsdb_datum *src = &row->fields[c->index]; struct ovsdb_datum *dst = &data[i]; - const struct ovsdb_type *type = &c->type; - ovsdb_datum_clone(dst, src, type); + ovsdb_datum_clone(dst, src); } return data; } @@ -359,7 +358,7 @@ update_monitor_row_data(const struct ovsdb_monitor_table *mt, if (!ovsdb_datum_equals(src, dst, type)) { ovsdb_datum_destroy(dst, type); - ovsdb_datum_clone(dst, src, type); + ovsdb_datum_clone(dst, src); } } } diff --git a/ovsdb/mutation.c b/ovsdb/mutation.c index 03d1c3499..cbc71bc49 100644 --- a/ovsdb/mutation.c +++ b/ovsdb/mutation.c @@ -287,6 +287,8 @@ mutate_scalar(const struct ovsdb_type *dst_type, struct ovsdb_datum *dst, struct ovsdb_error *error; unsigned int i; + ovsdb_datum_unshare(dst, dst_type); + if (base->type == OVSDB_TYPE_INTEGER) { int64_t y = arg->integer; for (i = 0; i < dst->n; i++) { @@ -322,7 +324,7 @@ mutate_scalar(const struct ovsdb_type *dst_type, struct ovsdb_datum *dst, } } - error = ovsdb_datum_sort(dst, dst_type->key.type); + error = ovsdb_datum_sort(dst, dst_type); if (error) { ovsdb_error_destroy(error); return ovsdb_error("constraint violation", diff --git a/ovsdb/ovsdb-idlc.in b/ovsdb/ovsdb-idlc.in index 13c535939..5a97a8ea3 100755 --- a/ovsdb/ovsdb-idlc.in +++ b/ovsdb/ovsdb-idlc.in @@ -951,7 +951,10 @@ void 'c': columnName, 'args': ', '.join(['%(type)s%(name)s' % m for m in members])}) + print() + print(" datum.refcnt = NULL;") if type.n_min == 1 and type.n_max == 1: + print() print(" union ovsdb_atom *key = xmalloc(sizeof *key);") if type.value: print(" union ovsdb_atom *value = xmalloc(sizeof *value);") @@ -1004,10 +1007,6 @@ void if type.value: print(" " + type.value.copyCValue("datum.values[i].%s" % type.value.type.to_lvalue_string(), "%s[i]" % valueVar)) print(" }") - if type.value: - valueType = type.value.toAtomicType() - else: - valueType = "OVSDB_TYPE_VOID" txn_write_func = "ovsdb_idl_txn_write" print(" %(f)s(&row->header_, &%(s)s_col_%(c)s, &datum);" \ % {'f': txn_write_func, @@ -1035,6 +1034,7 @@ void datum->n = 1; datum->keys = xmalloc(datum->n * sizeof *datum->keys); datum->values = xmalloc(datum->n * sizeof *datum->values); + datum->refcnt = NULL; ''' % {'s': structName, 'c': columnName,'coltype':column.type.key.to_const_c_type(prefix), 'valtype':column.type.value.to_const_c_type(prefix), 'S': structName.upper(), 'C': columnName.upper(), 't': tableName}) @@ -1060,6 +1060,7 @@ void datum->n = 1; datum->keys = xmalloc(datum->n * sizeof *datum->keys); datum->values = NULL; + datum->refcnt = NULL; ''' % {'s': structName, 'c': columnName,'coltype':column.type.key.to_const_c_type(prefix), 'valtype':column.type.value.to_const_c_type(prefix), 'S': structName.upper(), 'C': columnName.upper(), 't': tableName}) @@ -1087,6 +1088,7 @@ void datum->n = 1; datum->keys = xmalloc(datum->n * sizeof *datum->values); datum->values = NULL; + datum->refcnt = NULL; ''' % {'s': structName, 'c': columnName, 'valtype':column.type.key.to_const_c_type(prefix), 't': tableName}) print(" " + type.key.copyCValue("datum->keys[0].%s" % type.key.type.to_lvalue_string(), "new_value")) @@ -1110,6 +1112,7 @@ void datum->n = 1; datum->keys = xmalloc(datum->n * sizeof *datum->values); datum->values = NULL; + datum->refcnt = NULL; ''' % {'s': structName, 'c': columnName,'coltype':column.type.key.to_const_c_type(prefix), 'valtype':column.type.key.to_const_c_type(prefix), 'S': structName.upper(), 'C': columnName.upper(), 't': tableName}) @@ -1179,8 +1182,11 @@ void 'args': ', '.join(['%(type)s%(name)s' % m for m in members])}) print("{") print(" struct ovsdb_datum datum;") + print("") + print(" datum.refcnt = NULL;") free = [] if type.n_min == 1 and type.n_max == 1: + print() print(" union ovsdb_atom *key = xmalloc(sizeof *key);") if type.value: print(" union ovsdb_atom *value = xmalloc(sizeof *value);") @@ -1228,12 +1234,8 @@ void if type.value: print(" " + type.value.copyCValue("datum.values[i].%s" % type.value.type.to_lvalue_string(), "%s[i]" % valueVar, refTable=False)) print(" }") - if type.value: - valueType = type.value.toAtomicType() - else: - valueType = "OVSDB_TYPE_VOID" - print(" ovsdb_datum_sort_unique(&datum, %s, %s);" % ( - type.key.toAtomicType(), valueType)) + print(" ovsdb_datum_sort_unique(&datum, &%(s)s_col_%(c)s.type);" \ + % {'s': structName, 'c': columnName}) print(""" ovsdb_idl_condition_add_clause(cond, function, @@ -1341,6 +1343,7 @@ struct %(s)s * datum->n = smap_count(%(c)s); datum->keys = xmalloc(datum->n * sizeof *datum->keys); datum->values = xmalloc(datum->n * sizeof *datum->values); + datum->refcnt = NULL; i = 0; SMAP_FOR_EACH (node, %(c)s) { @@ -1348,7 +1351,7 @@ struct %(s)s * datum->values[i].s = ovsdb_atom_string_create(node->value); i++; } - ovsdb_datum_sort_unique(datum, OVSDB_TYPE_STRING, OVSDB_TYPE_STRING); + ovsdb_datum_sort_unique(datum, &%(s)s_col_%(c)s.type); } else { ovsdb_datum_init_empty(datum); } @@ -1387,6 +1390,8 @@ struct %(s)s * 'args': ', '.join(['%(type)s%(name)s' % m for m in members])}) print("{") print(" struct ovsdb_datum datum;") + print() + print(" datum.refcnt = NULL;") if type.n_min == 1 and type.n_max == 1: print(" union ovsdb_atom *key = xmalloc(sizeof(union ovsdb_atom));") if type.value: @@ -1443,12 +1448,8 @@ struct %(s)s * if type.value: print(" " + type.value.copyCValue("datum.values[i].%s" % type.value.type.to_lvalue_string(), "%s[i]" % valueVar)) print(" }") - if type.value: - valueType = type.value.toAtomicType() - else: - valueType = "OVSDB_TYPE_VOID" - print(" ovsdb_datum_sort_unique(&datum, %s, %s);" % ( - type.key.toAtomicType(), valueType)) + print(" ovsdb_datum_sort_unique(&datum, &%(s)s_col_%(c)s.type);" + % {'s': structName, 'c': columnName}) txn_write_func = "ovsdb_idl_index_write" print(" %(f)s(CONST_CAST(struct ovsdb_idl_row *, &row->header_), &%(s)s_columns[ %(S)s_COL_%(C)s ], &datum, &%(p)stable_classes[%(P)sTABLE_%(T)s]);" \ % {'f': txn_write_func, diff --git a/ovsdb/ovsdb-util.c b/ovsdb/ovsdb-util.c index 6d7be066b..303191dc8 100644 --- a/ovsdb/ovsdb-util.c +++ b/ovsdb/ovsdb-util.c @@ -221,6 +221,8 @@ ovsdb_util_write_singleton(struct ovsdb_row *row, const char *column_name, return; } + ovsdb_datum_unshare(datum, &column->type); + if (datum->n == 1) { if (ovsdb_atom_equals(&datum->keys[0], atom, type)) { return; @@ -231,6 +233,7 @@ ovsdb_util_write_singleton(struct ovsdb_row *row, const char *column_name, datum->n = 1; datum->keys = xmalloc(sizeof *datum->keys); datum->values = NULL; + datum->refcnt = NULL; } ovsdb_atom_clone(&datum->keys[0], atom, type); } @@ -305,6 +308,7 @@ ovsdb_util_write_string_string_column(struct ovsdb_row *row, datum->n = n; datum->keys = xmalloc(n * sizeof *datum->keys); datum->values = xmalloc(n * sizeof *datum->values); + datum->refcnt = NULL; for (i = 0; i < n; ++i) { datum->keys[i].s = ovsdb_atom_string_create_nocopy(keys[i]); @@ -312,5 +316,5 @@ ovsdb_util_write_string_string_column(struct ovsdb_row *row, } /* Sort and check constraints. */ - ovsdb_datum_sort_assert(datum, column->type.key.type); + ovsdb_datum_sort_assert(datum, &column->type); } diff --git a/ovsdb/row.c b/ovsdb/row.c index f93fe2306..fd50c7e7b 100644 --- a/ovsdb/row.c +++ b/ovsdb/row.c @@ -143,8 +143,7 @@ ovsdb_row_clone(const struct ovsdb_row *old) SHASH_FOR_EACH (node, &table->schema->columns) { const struct ovsdb_column *column = node->data; ovsdb_datum_clone(&new->fields[column->index], - &old->fields[column->index], - &column->type); + &old->fields[column->index]); } struct ovsdb_weak_ref *weak, *clone; @@ -257,8 +256,7 @@ ovsdb_row_update_columns(struct ovsdb_row *dst, } else { ovsdb_datum_destroy(&dst->fields[column->index], &column->type); ovsdb_datum_clone(&dst->fields[column->index], - &src->fields[column->index], - &column->type); + &src->fields[column->index]); } } return NULL; diff --git a/ovsdb/transaction.c b/ovsdb/transaction.c index ab7bb0eb1..6864fe099 100644 --- a/ovsdb/transaction.c +++ b/ovsdb/transaction.c @@ -693,8 +693,7 @@ assess_weak_refs(struct ovsdb_txn *txn, struct ovsdb_txn_row *txn_row) ovs_list_init(&weak->src_node); } } - ovsdb_datum_sort_unique(&deleted_refs, column->type.key.type, - column->type.value.type); + ovsdb_datum_sort_unique(&deleted_refs, &column->type); /* Removing elements that references deleted rows. */ ovsdb_datum_subtract(datum, &column->type, @@ -708,7 +707,7 @@ assess_weak_refs(struct ovsdb_txn *txn, struct ovsdb_txn_row *txn_row) datum, &column->type); } else { ovsdb_datum_init_empty(&removed); - ovsdb_datum_clone(&added, datum, &column->type); + ovsdb_datum_clone(&added, datum); } /* Checking added data and creating new references. */ @@ -732,8 +731,7 @@ assess_weak_refs(struct ovsdb_txn *txn, struct ovsdb_txn_row *txn_row) } if (deleted_refs.n) { /* Removing all the references that doesn't point to valid rows. */ - ovsdb_datum_sort_unique(&deleted_refs, column->type.key.type, - column->type.value.type); + ovsdb_datum_sort_unique(&deleted_refs, &column->type); ovsdb_datum_subtract(datum, &column->type, &deleted_refs, &column->type); ovsdb_datum_destroy(&deleted_refs, &column->type); diff --git a/tests/test-ovsdb.c b/tests/test-ovsdb.c index 0fef1f978..343833151 100644 --- a/tests/test-ovsdb.c +++ b/tests/test-ovsdb.c @@ -1679,6 +1679,7 @@ do_transact_set_integer(struct ovsdb_row *row, const char *column_name, column = ovsdb_table_schema_get_column(do_transact_table->schema, column_name); + ovsdb_datum_unshare(&row->fields[column->index], &column->type); row->fields[column->index].keys[0].integer = integer; } } From patchwork Thu Jun 30 23:34:07 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ilya Maximets X-Patchwork-Id: 1650917 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::137; helo=smtp4.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from smtp4.osuosl.org (smtp4.osuosl.org [IPv6:2605:bc80:3010::137]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4LYvn52lkdz9sGt for ; Fri, 1 Jul 2022 09:34:33 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp4.osuosl.org (Postfix) with ESMTP id C3AA342423; Thu, 30 Jun 2022 23:34:30 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp4.osuosl.org C3AA342423 X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp4.osuosl.org ([127.0.0.1]) by localhost (smtp4.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id WJIQh6897JAN; Thu, 30 Jun 2022 23:34:26 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [IPv6:2605:bc80:3010:104::8cd3:938]) by smtp4.osuosl.org (Postfix) with ESMTPS id CD5B6423C1; Thu, 30 Jun 2022 23:34:23 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp4.osuosl.org CD5B6423C1 Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id F292AC002D; Thu, 30 Jun 2022 23:34:22 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) by lists.linuxfoundation.org (Postfix) with ESMTP id 9D896C0011 for ; Thu, 30 Jun 2022 23:34:21 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 3C9CF60ACE for ; Thu, 30 Jun 2022 23:34:21 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp3.osuosl.org 3C9CF60ACE X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id IJtoOO_sKDDa for ; Thu, 30 Jun 2022 23:34:19 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 DKIM-Filter: OpenDKIM Filter v2.11.0 smtp3.osuosl.org 70CAD607C9 Received: from relay11.mail.gandi.net (relay11.mail.gandi.net [217.70.178.231]) by smtp3.osuosl.org (Postfix) with ESMTPS id 70CAD607C9 for ; Thu, 30 Jun 2022 23:34:19 +0000 (UTC) Received: (Authenticated sender: i.maximets@ovn.org) by mail.gandi.net (Postfix) with ESMTPSA id 212BD100002; Thu, 30 Jun 2022 23:34:16 +0000 (UTC) From: Ilya Maximets To: ovs-dev@openvswitch.org Date: Fri, 1 Jul 2022 01:34:07 +0200 Message-Id: <20220630233407.623049-3-i.maximets@ovn.org> X-Mailer: git-send-email 2.34.3 In-Reply-To: <20220630233407.623049-1-i.maximets@ovn.org> References: <20220630233407.623049-1-i.maximets@ovn.org> MIME-Version: 1.0 Cc: Ilya Maximets , Dumitru Ceara Subject: [ovs-dev] [PATCH 2/2] ovsdb: Prepare snapshot JSON in a separate thread. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" Conversion of the database data into JSON object, serialization and destruction of that object are the most heavy operations during the database compaction. If these operations are moved to a separate thread, the main thread can continue processing database requests in the meantime. With this change, the compaction is split in 3 phases: 1. Initialization: - Create a copy of the database. - Remember current database index. - Start a separate thread to convert a copy of the database into serialized JSON object. 2. Wait: - Continue normal operation until compaction thread is done. - Meanwhile, compaction thread: * Convert database copy to JSON. * Serialize resulted JSON. * Destroy original JSON object. 3. Finish: - Destroy the database copy. - Take the snapshot created by the thread. - Write on disk. The key for this schema to be fast is the ability to create a shallow copy of the database. This doesn't take too much time allowing the thread to do most of work. Database copy is created and destroyed only by the main thread, so there is no need for synchronization. Such solution allows to reduce the time main thread is blocked by compaction by 80-90%. For example, in ovn-heater tests with 120 node density-heavy scenario, where compaction normally takes 5-6 seconds at the end of a test, measured compaction times was all below 1 second with the change applied. Also, note that these measured times are the sum of phases 1 and 3, so actual poll intervals are about half a second in this case. Only implemented for raft storage for now. The implementation for standalone databases can be added later by using a file offset as a database index and copying newly added changes from the old file to a new one during ovsdb_log_replace(). Reported-at: https://bugzilla.redhat.com/2069108 Signed-off-by: Ilya Maximets Acked-by: Dumitru Ceara --- ovsdb/ovsdb-server.c | 18 +++++- ovsdb/ovsdb.c | 143 +++++++++++++++++++++++++++++++++++++++---- ovsdb/ovsdb.h | 24 ++++++++ ovsdb/raft.c | 8 ++- ovsdb/raft.h | 3 +- ovsdb/row.c | 17 +++++ ovsdb/row.h | 1 + ovsdb/storage.c | 11 ++-- ovsdb/storage.h | 3 +- 9 files changed, 204 insertions(+), 24 deletions(-) diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c index 5549b4e3a..eae2f6679 100644 --- a/ovsdb/ovsdb-server.c +++ b/ovsdb/ovsdb-server.c @@ -252,7 +252,9 @@ main_loop(struct server_config *config, remove_db(config, node, xasprintf("removing database %s because storage " "disconnected permanently", node->name)); - } else if (ovsdb_storage_should_snapshot(db->db->storage)) { + } else if (!ovsdb_snapshot_in_progress(db->db) + && (ovsdb_storage_should_snapshot(db->db->storage) || + ovsdb_snapshot_ready(db->db))) { log_and_free_error(ovsdb_snapshot(db->db, trim_memory)); } } @@ -287,6 +289,7 @@ main_loop(struct server_config *config, ovsdb_trigger_wait(db->db, time_msec()); ovsdb_storage_wait(db->db->storage); ovsdb_storage_read_wait(db->db->storage); + ovsdb_snapshot_wait(db->db); } if (run_process) { process_wait(run_process); @@ -1552,11 +1555,20 @@ ovsdb_server_compact(struct unixctl_conn *conn, int argc, ? !strcmp(node->name, db_name) : node->name[0] != '_') { if (db->db) { + struct ovsdb_error *error = NULL; + VLOG_INFO("compacting %s database by user request", node->name); - struct ovsdb_error *error = ovsdb_snapshot(db->db, - trim_memory); + error = ovsdb_snapshot(db->db, trim_memory); + if (!error && ovsdb_snapshot_in_progress(db->db)) { + while (ovsdb_snapshot_in_progress(db->db)) { + ovsdb_snapshot_wait(db->db); + poll_block(); + } + error = ovsdb_snapshot(db->db, trim_memory); + } + if (error) { char *s = ovsdb_error_to_string(error); ds_put_format(&reply, "%s\n", s); diff --git a/ovsdb/ovsdb.c b/ovsdb/ovsdb.c index 91b4a01af..8cbefbe3d 100644 --- a/ovsdb/ovsdb.c +++ b/ovsdb/ovsdb.c @@ -25,9 +25,13 @@ #include "file.h" #include "monitor.h" #include "openvswitch/json.h" +#include "openvswitch/poll-loop.h" +#include "ovs-thread.h" #include "ovsdb-error.h" #include "ovsdb-parser.h" #include "ovsdb-types.h" +#include "row.h" +#include "seq.h" #include "simap.h" #include "storage.h" #include "table.h" @@ -461,6 +465,21 @@ ovsdb_destroy(struct ovsdb *db) if (db) { struct shash_node *node; + /* Need to wait for compaction thread to finish the work. */ + while (ovsdb_snapshot_in_progress(db)) { + ovsdb_snapshot_wait(db); + poll_block(); + } + if (ovsdb_snapshot_ready(db)) { + struct ovsdb_error *error = ovsdb_snapshot(db, false); + + if (error) { + char *s = ovsdb_error_to_string_free(error); + VLOG_INFO("%s: %s", db->name, s); + free(s); + } + } + /* Close the log. */ ovsdb_storage_close(db->storage); @@ -535,20 +554,119 @@ ovsdb_get_table(const struct ovsdb *db, const char *name) return shash_find_data(&db->tables, name); } +static struct ovsdb * +ovsdb_clone_data(const struct ovsdb *db) +{ + struct ovsdb *new = ovsdb_create(ovsdb_schema_clone(db->schema), NULL); + + struct shash_node *node; + SHASH_FOR_EACH (node, &db->tables) { + struct ovsdb_table *table = node->data; + struct ovsdb_table *new_table = shash_find_data(&new->tables, + node->name); + struct ovsdb_row *row, *new_row; + + hmap_reserve(&new_table->rows, hmap_count(&table->rows)); + HMAP_FOR_EACH (row, hmap_node, &table->rows) { + new_row = ovsdb_row_datum_clone(row); + hmap_insert(&new_table->rows, &new_row->hmap_node, + ovsdb_row_hash(new_row)); + } + } + + return new; +} + +static void * +compaction_thread(void *aux) +{ + struct ovsdb_compaction_state *state = aux; + uint64_t start_time = time_msec(); + struct json *data; + + VLOG_DBG("%s: Compaction thread started.", state->db->name); + data = ovsdb_to_txn_json(state->db, "compacting database online"); + state->data = json_serialized_object_create(data); + json_destroy(data); + + state->thread_time = time_msec() - start_time; + + VLOG_DBG("%s: Compaction thread finished in %"PRIu64" ms.", + state->db->name, state->thread_time); + seq_change(state->done); + return NULL; +} + +void +ovsdb_snapshot_wait(struct ovsdb *db) +{ + if (db->snap_state) { + seq_wait(db->snap_state->done, db->snap_state->seqno); + } +} + +bool +ovsdb_snapshot_in_progress(struct ovsdb *db) +{ + return db->snap_state && + seq_read(db->snap_state->done) == db->snap_state->seqno; +} + +bool +ovsdb_snapshot_ready(struct ovsdb *db) +{ + return db->snap_state && + seq_read(db->snap_state->done) != db->snap_state->seqno; +} + struct ovsdb_error * OVS_WARN_UNUSED_RESULT ovsdb_snapshot(struct ovsdb *db, bool trim_memory OVS_UNUSED) { - if (!db->storage) { + if (!db->storage || ovsdb_snapshot_in_progress(db)) { return NULL; } + uint64_t applied_index = ovsdb_storage_get_applied_index(db->storage); uint64_t elapsed, start_time = time_msec(); - struct json *schema = ovsdb_schema_to_json(db->schema); - struct json *data = ovsdb_to_txn_json(db, "compacting database online"); - struct ovsdb_error *error = ovsdb_storage_store_snapshot(db->storage, - schema, data); - json_destroy(schema); - json_destroy(data); + struct ovsdb_compaction_state *state; + + if (!applied_index) { + /* Parallel compaction is not supported for standalone databases. */ + state = xzalloc(sizeof *state); + state->data = ovsdb_to_txn_json(db, "compacting database online"); + state->schema = ovsdb_schema_to_json(db->schema); + } else if (ovsdb_snapshot_ready(db)) { + xpthread_join(db->snap_state->thread, NULL); + + state = db->snap_state; + db->snap_state = NULL; + + ovsdb_destroy(state->db); + seq_destroy(state->done); + } else { + /* Creating a thread. */ + ovs_assert(!db->snap_state); + state = xzalloc(sizeof *state); + + state->db = ovsdb_clone_data(db); + state->schema = ovsdb_schema_to_json(db->schema); + state->applied_index = applied_index; + state->done = seq_create(); + state->seqno = seq_read(state->done); + state->thread = ovs_thread_create("compaction", + compaction_thread, state); + state->init_time = time_msec() - start_time; + + db->snap_state = state; + return NULL; + } + + struct ovsdb_error *error; + + error = ovsdb_storage_store_snapshot(db->storage, state->schema, + state->data, state->applied_index); + json_destroy(state->schema); + json_destroy(state->data); #if HAVE_DECL_MALLOC_TRIM if (!error && trim_memory) { @@ -557,10 +675,13 @@ ovsdb_snapshot(struct ovsdb *db, bool trim_memory OVS_UNUSED) #endif elapsed = time_msec() - start_time; - if (elapsed > 1000) { - VLOG_INFO("%s: Database compaction took %"PRIu64"ms", - db->name, elapsed); - } + VLOG(elapsed > 1000 ? VLL_INFO : VLL_DBG, + "%s: Database compaction took %"PRIu64"ms " + "(init: %"PRIu64"ms, write: %"PRIu64"ms, thread: %"PRIu64"ms)", + db->name, elapsed + state->init_time, + state->init_time, elapsed, state->thread_time); + + free(state); return error; } diff --git a/ovsdb/ovsdb.h b/ovsdb/ovsdb.h index ec2d235ec..2f77821e0 100644 --- a/ovsdb/ovsdb.h +++ b/ovsdb/ovsdb.h @@ -72,6 +72,24 @@ struct ovsdb_txn_history_node { struct ovsdb_txn *txn; }; +struct ovsdb_compaction_state { + pthread_t thread; /* Thread handle. */ + + struct ovsdb *db; /* Copy of a database data to compact. */ + + struct json *data; /* 'db' as a serialized json. */ + struct json *schema; /* 'db' schema json. */ + uint64_t applied_index; /* Last applied index reported by the storage + * at the moment of a database copy. */ + + /* Completion signaling. */ + struct seq *done; + uint64_t seqno; + + uint64_t init_time; /* Time spent by the main thread preparing. */ + uint64_t thread_time; /* Time spent for compaction by the thread. */ +}; + struct ovsdb { char *name; struct ovsdb_schema *schema; @@ -101,6 +119,9 @@ struct ovsdb { struct ovs_list txn_forward_new; /* Hash map for transactions that are already sent and waits for reply. */ struct hmap txn_forward_sent; + + /* Database compaction. */ + struct ovsdb_compaction_state *snap_state; }; struct ovsdb *ovsdb_create(struct ovsdb_schema *, struct ovsdb_storage *); @@ -124,6 +145,9 @@ struct json *ovsdb_execute(struct ovsdb *, const struct ovsdb_session *, struct ovsdb_error *ovsdb_snapshot(struct ovsdb *, bool trim_memory) OVS_WARN_UNUSED_RESULT; +void ovsdb_snapshot_wait(struct ovsdb *); +bool ovsdb_snapshot_in_progress(struct ovsdb *); +bool ovsdb_snapshot_ready(struct ovsdb *); void ovsdb_replace(struct ovsdb *dst, struct ovsdb *src); diff --git a/ovsdb/raft.c b/ovsdb/raft.c index 856d083f2..b2c21e70f 100644 --- a/ovsdb/raft.c +++ b/ovsdb/raft.c @@ -4295,7 +4295,8 @@ raft_notify_snapshot_recommended(struct raft *raft) * only valuable to call it if raft_get_log_length() is significant and * especially if raft_grew_lots() returns true. */ struct ovsdb_error * OVS_WARN_UNUSED_RESULT -raft_store_snapshot(struct raft *raft, const struct json *new_snapshot_data) +raft_store_snapshot(struct raft *raft, const struct json *new_snapshot_data, + uint64_t applied_index) { if (raft->joining) { return ovsdb_error(NULL, @@ -4311,11 +4312,12 @@ raft_store_snapshot(struct raft *raft, const struct json *new_snapshot_data) "cannot store a snapshot following failure"); } - if (raft->last_applied < raft->log_start) { + uint64_t new_log_start = applied_index ? applied_index + 1 + : raft->last_applied + 1; + if (new_log_start <= raft->log_start) { return ovsdb_error(NULL, "not storing a duplicate snapshot"); } - uint64_t new_log_start = raft->last_applied + 1; struct raft_entry new_snapshot = { .term = raft_get_term(raft, new_log_start - 1), .eid = *raft_get_eid(raft, new_log_start - 1), diff --git a/ovsdb/raft.h b/ovsdb/raft.h index 599bc0ae8..403ed3dd7 100644 --- a/ovsdb/raft.h +++ b/ovsdb/raft.h @@ -180,7 +180,8 @@ uint64_t raft_get_log_length(const struct raft *); bool raft_may_snapshot(const struct raft *); void raft_notify_snapshot_recommended(struct raft *); struct ovsdb_error *raft_store_snapshot(struct raft *, - const struct json *new_snapshot) + const struct json *new_snapshot, + uint64_t applied_index) OVS_WARN_UNUSED_RESULT; /* Cluster management. */ diff --git a/ovsdb/row.c b/ovsdb/row.c index fd50c7e7b..3f0bb8acf 100644 --- a/ovsdb/row.c +++ b/ovsdb/row.c @@ -155,6 +155,23 @@ ovsdb_row_clone(const struct ovsdb_row *old) return new; } +struct ovsdb_row * +ovsdb_row_datum_clone(const struct ovsdb_row *old) +{ + const struct ovsdb_table *table = old->table; + const struct shash_node *node; + struct ovsdb_row *new; + + new = allocate_row(table); + SHASH_FOR_EACH (node, &table->schema->columns) { + const struct ovsdb_column *column = node->data; + ovsdb_datum_clone(&new->fields[column->index], + &old->fields[column->index]); + } + return new; +} + + /* The caller is responsible for ensuring that 'row' has been removed from its * table and that it is not participating in a transaction. */ void diff --git a/ovsdb/row.h b/ovsdb/row.h index 4d3c17afc..ff91288fe 100644 --- a/ovsdb/row.h +++ b/ovsdb/row.h @@ -93,6 +93,7 @@ void ovsdb_weak_ref_destroy(struct ovsdb_weak_ref *); struct ovsdb_row *ovsdb_row_create(const struct ovsdb_table *); struct ovsdb_row *ovsdb_row_clone(const struct ovsdb_row *); +struct ovsdb_row *ovsdb_row_datum_clone(const struct ovsdb_row *); void ovsdb_row_destroy(struct ovsdb_row *); uint32_t ovsdb_row_hash_columns(const struct ovsdb_row *, diff --git a/ovsdb/storage.c b/ovsdb/storage.c index d4984be25..e8f95ce64 100644 --- a/ovsdb/storage.c +++ b/ovsdb/storage.c @@ -576,7 +576,7 @@ ovsdb_storage_should_snapshot(struct ovsdb_storage *storage) static struct ovsdb_error * OVS_WARN_UNUSED_RESULT ovsdb_storage_store_snapshot__(struct ovsdb_storage *storage, const struct json *schema, - const struct json *data) + const struct json *data, uint64_t index) { if (storage->raft) { struct json *entries = json_array_create_empty(); @@ -587,7 +587,7 @@ ovsdb_storage_store_snapshot__(struct ovsdb_storage *storage, json_array_add(entries, json_clone(data)); } struct ovsdb_error *error = raft_store_snapshot(storage->raft, - entries); + entries, index); json_destroy(entries); return error; } else if (storage->log) { @@ -611,10 +611,11 @@ ovsdb_storage_store_snapshot__(struct ovsdb_storage *storage, struct ovsdb_error * OVS_WARN_UNUSED_RESULT ovsdb_storage_store_snapshot(struct ovsdb_storage *storage, const struct json *schema, - const struct json *data) + const struct json *data, uint64_t index) { struct ovsdb_error *error = ovsdb_storage_store_snapshot__(storage, - schema, data); + schema, data, + index); bool retry_quickly = error != NULL; schedule_next_snapshot(storage, retry_quickly); return error; @@ -638,7 +639,7 @@ ovsdb_storage_write_schema_change(struct ovsdb_storage *storage, prereq, &result); json_destroy(txn_json); } else if (storage->log) { - w->error = ovsdb_storage_store_snapshot__(storage, schema, data); + w->error = ovsdb_storage_store_snapshot__(storage, schema, data, 0); } else { /* When 'error' and 'command' are both null, it indicates that the * command is complete. This is fine since this unbacked storage drops diff --git a/ovsdb/storage.h b/ovsdb/storage.h index ff026b77f..a1fdaa564 100644 --- a/ovsdb/storage.h +++ b/ovsdb/storage.h @@ -79,7 +79,8 @@ void ovsdb_write_destroy(struct ovsdb_write *); bool ovsdb_storage_should_snapshot(struct ovsdb_storage *); struct ovsdb_error *ovsdb_storage_store_snapshot(struct ovsdb_storage *storage, const struct json *schema, - const struct json *snapshot) + const struct json *snapshot, + uint64_t applied_index) OVS_WARN_UNUSED_RESULT; struct ovsdb_write *ovsdb_storage_write_schema_change(