From patchwork Fri Mar 1 18:56:37 2019
X-Patchwork-Submitter: Han Zhou
X-Patchwork-Id: 1050361
From: Han Zhou
To: dev@openvswitch.org
Date: Fri, 1 Mar 2019 10:56:37 -0800
Message-Id: <1551466597-87991-2-git-send-email-hzhou8@ebay.com>
In-Reply-To: <1551466597-87991-1-git-send-email-hzhou8@ebay.com>
References: <1551466597-87991-1-git-send-email-hzhou8@ebay.com>
Subject: [ovs-dev] [PATCH 2/2] ovsdb raft: Precheck prereq before proposing commit.

From: Han Zhou

In the current OVSDB Raft design, when multiple transactions are
pending, whether from the same server node or from different nodes in
the cluster, only the first one can succeed at a time. The following
ones fail the prerequisite check on the leader node, because the first
one updates the expected prerequisite eid on the leader, while the
prerequisite used for proposing a commit has to be a committed eid. A
node therefore cannot propose a commit with the latest prerequisite
expected by the leader until the latest transaction has been committed
by the leader and the committed_index has been updated on that node.

The current implementation proposes the commit as soon as the client
requests the transaction, which results in continuous retries that
cause high CPU load and waste. In particular, even if all clients use
leader_only to connect only to the leader, prereq check failures still
happen frequently when a batch of transactions is pending on the leader
node: the leader proposes a batch of commits using the same committed
eid as prerequisite and updates the expected prereq as soon as the
first one is in progress, but it needs time to append to followers and
wait for a majority of replies before updating the committed_index. As
a result, the following transactions proposed by the leader itself are
retried continuously and uselessly.
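
To make the failure mode above concrete, here is a minimal toy model in
C. It is only a sketch under simplifying assumptions, not OVSDB code:
eids are plain integers instead of UUIDs, and every name (toy_leader,
toy_leader_propose, toy_propose_batch, toy_leader_commit) is
hypothetical. It shows that when a batch of proposals all carry the
same committed eid as prerequisite, only the first one passes the
leader's check until the commit catches up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an entry ID (eid); real OVSDB uses struct uuid. */
typedef unsigned int toy_eid;

/* Hypothetical model of the leader-side state described above. */
struct toy_leader {
    toy_eid committed_eid;  /* eid of the last committed entry. */
    toy_eid expected_eid;   /* prereq expected for the next proposal. */
    toy_eid next_eid;       /* source of fresh eids for new entries. */
};

/* Accepts a proposal only if its prereq matches the leader's expectation.
 * Accepting moves the expectation forward right away, before the new
 * entry is committed. */
bool
toy_leader_propose(struct toy_leader *l, toy_eid prereq)
{
    if (prereq != l->expected_eid) {
        return false;               /* Prereq check failure. */
    }
    l->expected_eid = l->next_eid++;
    return true;
}

/* Proposes 'n' transactions that all use the committed eid as prereq,
 * as a node must do.  Returns how many proposals were accepted. */
size_t
toy_propose_batch(struct toy_leader *l, size_t n)
{
    size_t accepted = 0;
    for (size_t i = 0; i < n; i++) {
        if (toy_leader_propose(l, l->committed_eid)) {
            accepted++;
        }
    }
    return accepted;
}

/* Models a majority acknowledging the entry: the committed eid catches
 * up with the expected prereq. */
void
toy_leader_commit(struct toy_leader *l)
{
    l->committed_eid = l->expected_eid;
}
```

In this model, a batch of five proposals against a fresh leader yields
one acceptance and four failures, and any further batch fails entirely
until toy_leader_commit() is called. Those failures correspond to the
useless retries described above.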

This patch doesn't change the design but simply pre-checks whether the
current eid is the same as the prereq before proposing the commit, to
avoid wasting CPU cycles, on both leader and followers. When clients
use leader_only mode, this patch completely eliminates the prereq check
failures.

In a scale test of OVN with 1k HVs, creating and binding 10k lports,
the patch resulted in a 90% CPU cost reduction on the leader and a >80%
CPU cost reduction on followers. (The test was run with the leader
election base time set to 10000ms, because otherwise the test couldn't
complete due to frequent leader re-election.)

This is just one of the related performance problems of the prereq
checking mechanism discussed at:

https://mail.openvswitch.org/pipermail/ovs-discuss/2019-February/048243.html

Signed-off-by: Han Zhou
---
 ovsdb/TODO.rst      |  3 ---
 ovsdb/raft.c        |  2 +-
 ovsdb/raft.h        |  1 +
 ovsdb/storage.c     |  9 +++++++++
 ovsdb/storage.h     |  2 ++
 ovsdb/transaction.c | 10 ++++++++++
 ovsdb/transaction.h |  1 +
 ovsdb/trigger.c     |  4 ++++
 8 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/ovsdb/TODO.rst b/ovsdb/TODO.rst
index 3bd4e76..fb4a50f 100644
--- a/ovsdb/TODO.rst
+++ b/ovsdb/TODO.rst
@@ -39,9 +39,6 @@ OVSDB Clustering To-do List
 
 * Include index with monitor update?
 
-* Back off when transaction fails to commit? Definitely back off until
-  the eid changes for prereq failures
-
 * Testing with replication.
 
 * Handling bad transactions in read_db(). (Kill the database?)
diff --git a/ovsdb/raft.c b/ovsdb/raft.c
index 68b527c..eee4f33 100644
--- a/ovsdb/raft.c
+++ b/ovsdb/raft.c
@@ -1906,7 +1906,7 @@ raft_get_eid(const struct raft *raft, uint64_t index)
     return &raft->snap.eid;
 }
 
-static const struct uuid *
+const struct uuid *
 raft_current_eid(const struct raft *raft)
 {
     return raft_get_eid(raft, raft->log_end - 1);
diff --git a/ovsdb/raft.h b/ovsdb/raft.h
index cd16782..3d44899 100644
--- a/ovsdb/raft.h
+++ b/ovsdb/raft.h
@@ -180,4 +180,5 @@ struct ovsdb_error *raft_store_snapshot(struct raft *,
 void raft_take_leadership(struct raft *);
 void raft_transfer_leadership(struct raft *, const char *reason);
 
+const struct uuid *raft_current_eid(const struct raft *);
 #endif /* lib/raft.h */
diff --git a/ovsdb/storage.c b/ovsdb/storage.c
index b810bff..e26252b 100644
--- a/ovsdb/storage.c
+++ b/ovsdb/storage.c
@@ -601,3 +601,12 @@ ovsdb_storage_write_schema_change(struct ovsdb_storage *storage,
     }
     return w;
 }
+
+const struct uuid *
+ovsdb_storage_peek_last_eid(struct ovsdb_storage *storage)
+{
+    if (!storage->raft) {
+        return NULL;
+    }
+    return raft_current_eid(storage->raft);
+}
diff --git a/ovsdb/storage.h b/ovsdb/storage.h
index 4a01fde..8a9bbab 100644
--- a/ovsdb/storage.h
+++ b/ovsdb/storage.h
@@ -91,4 +91,6 @@ struct ovsdb_storage *ovsdb_storage_open_standalone(const char *filename,
                                                     bool rw);
 struct ovsdb_schema *ovsdb_storage_read_schema(struct ovsdb_storage *);
 
+const struct uuid *ovsdb_storage_peek_last_eid(struct ovsdb_storage *);
+
 #endif /* ovsdb/storage.h */
diff --git a/ovsdb/transaction.c b/ovsdb/transaction.c
index 9fc1fd7..67ea771 100644
--- a/ovsdb/transaction.c
+++ b/ovsdb/transaction.c
@@ -1011,6 +1011,16 @@ struct ovsdb_txn_progress {
     struct ovsdb_storage *storage;
 };
 
+bool
+ovsdb_txn_precheck_prereq(const struct ovsdb *db)
+{
+    const struct uuid *eid = ovsdb_storage_peek_last_eid(db->storage);
+    if (!eid) {
+        return true;
+    }
+    return uuid_equals(&db->prereq, eid);
+}
+
 struct ovsdb_txn_progress *
 ovsdb_txn_propose_schema_change(struct ovsdb *db,
                                 const struct json *schema,
diff --git a/ovsdb/transaction.h b/ovsdb/transaction.h
index c819373..c21871a 100644
--- a/ovsdb/transaction.h
+++ b/ovsdb/transaction.h
@@ -29,6 +29,7 @@ void ovsdb_txn_set_txnid(const struct uuid *, struct ovsdb_txn *);
 const struct uuid *ovsdb_txn_get_txnid(const struct ovsdb_txn *);
 void ovsdb_txn_abort(struct ovsdb_txn *);
 
+bool ovsdb_txn_precheck_prereq(const struct ovsdb *db);
 struct ovsdb_error *ovsdb_txn_replay_commit(struct ovsdb_txn *)
     OVS_WARN_UNUSED_RESULT;
 struct ovsdb_txn_progress *ovsdb_txn_propose_commit(struct ovsdb_txn *,
diff --git a/ovsdb/trigger.c b/ovsdb/trigger.c
index 3f62dc9..6f4ed96 100644
--- a/ovsdb/trigger.c
+++ b/ovsdb/trigger.c
@@ -194,6 +194,10 @@ ovsdb_trigger_try(struct ovsdb_trigger *t, long long int now)
     struct ovsdb_txn *txn = NULL;
     struct ovsdb *newdb = NULL;
     if (!strcmp(t->request->method, "transact")) {
+        if (!ovsdb_txn_precheck_prereq(t->db)) {
+            return false;
+        }
+
         bool durable;
 
         struct json *result;
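
For readers who want to see the effect of the precheck in isolation,
the following standalone C sketch mimics the shape of
ovsdb_txn_precheck_prereq() and its call site in ovsdb_trigger_try().
It is an illustration only: the toy_* types and functions are
hypothetical, eids are integers rather than UUIDs, and the assumption
that a successful proposal immediately advances the newest log eid is
a simplification of what the Raft log does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ovsdb_storage. */
struct toy_storage {
    bool clustered;          /* false models a standalone database. */
    unsigned int last_eid;   /* eid of the newest log entry, committed
                              * or not. */
};

/* Hypothetical stand-in for struct ovsdb. */
struct toy_db {
    struct toy_storage *storage;
    unsigned int prereq;     /* eid the local database state reflects. */
    unsigned int n_proposed; /* Number of commits actually proposed. */
};

/* Returns the newest eid, or NULL for a standalone database. */
const unsigned int *
toy_storage_peek_last_eid(const struct toy_storage *s)
{
    return s->clustered ? &s->last_eid : NULL;
}

/* Returns true if proposing a commit now has a chance to succeed. */
bool
toy_precheck_prereq(const struct toy_db *db)
{
    const unsigned int *eid = toy_storage_peek_last_eid(db->storage);
    return !eid || *eid == db->prereq;
}

/* One attempt to run a pending transaction.  Returns true if a commit
 * was proposed, false if the transaction stays pending. */
bool
toy_trigger_try(struct toy_db *db)
{
    if (!toy_precheck_prereq(db)) {
        return false;               /* Skip the doomed proposal. */
    }
    db->n_proposed++;
    if (db->storage->clustered) {
        db->storage->last_eid++;    /* New entry appended to the log. */
    }
    return true;
}

/* Models the newest entry being committed and applied locally. */
void
toy_db_apply_committed(struct toy_db *db)
{
    db->prereq = db->storage->last_eid;
}
```

In this model, repeated calls to toy_trigger_try() on a clustered
database propose exactly one commit until toy_db_apply_committed()
runs, while a standalone database is never held back, matching the
NULL eid case in the patch.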