From patchwork Wed Jul 29 04:40:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Ruffell X-Patchwork-Id: 1338148 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=canonical.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4BGgp750kjz9sTZ; Wed, 29 Jul 2020 14:40:31 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1k0dtD-0008L4-7M; Wed, 29 Jul 2020 04:40:27 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1k0dtA-0008KJ-5j for kernel-team@lists.ubuntu.com; Wed, 29 Jul 2020 04:40:24 +0000 Received: from mail-pf1-f198.google.com ([209.85.210.198]) by youngberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1k0dt9-0008VN-Oy for kernel-team@lists.ubuntu.com; Wed, 29 Jul 2020 04:40:23 +0000 Received: by mail-pf1-f198.google.com with SMTP id p127so16443012pfb.18 for ; Tue, 28 Jul 2020 21:40:23 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=d0M5WctTiP89gPYIJ1+fNA+xtoM8T+YQFwUdEs9VNt0=; 
b=Zz7NRd6FL1Mc79Px7Fpo9Uw/sdVMCQnxRjXC+lg3I5RsAFsCwDHcunzIjv6HWBu8Xi mTm6F2Jw9MPzQh7vtLiCJumMavX4/H1vuK024lPikcgG9mcr/ToK7fsv+V6cFIAloEAx f2PTTwxCT7CjWIUN5tbG5u4uCICQlfU+mcTEURsd26Z+OYx78A74j8iJO0qK5XIRz1lW mrHX71G6+vym0y0hH/HfUDmx6g9v4qxMFwExnHybUQuTHA/9dfYMsKPbEoLNANvKKhBa MIg2NyVL+8a+VEruHM9aTG0Tp1bGmN85UXAUZexMkFnOTWPYiCLJGiSOZK8bYyPdwOgB us8g== X-Gm-Message-State: AOAM533E7o2x4RWn1Xa/tU5+qpN9RrkabQlAqWyOd4nuCd02g+q40SLI UJ8bNUZ7hvTD8dUFjdjub9WU8i0IkBBjHzU3km+gRh1vpmDhQUIoP91Z8GblAEoIWGaN4U/6uSR osG9cjPbA12JMXYalM3hhv5dgAkcwI/hEHKs1f2zzkw== X-Received: by 2002:a17:90b:2350:: with SMTP id ms16mr8111189pjb.224.1595997622081; Tue, 28 Jul 2020 21:40:22 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwfchjJA5RRCROXo4bNnTmyRDNI5xteTsL965q7Jq0OUavkM0nlWnBh6qSoBmbbUH/x9UiGvw== X-Received: by 2002:a17:90b:2350:: with SMTP id ms16mr8111156pjb.224.1595997621574; Tue, 28 Jul 2020 21:40:21 -0700 (PDT) Received: from localhost.localdomain (222-152-178-139-fibre.sparkbb.co.nz. [222.152.178.139]) by smtp.gmail.com with ESMTPSA id a2sm606062pgf.53.2020.07.28.21.40.20 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Jul 2020 21:40:21 -0700 (PDT) From: Matthew Ruffell To: kernel-team@lists.ubuntu.com Subject: [SRU][Bionic][PATCH 1/3] NFSv4.1: Avoid false retries when RPC calls are interrupted Date: Wed, 29 Jul 2020 16:40:00 +1200 Message-Id: <20200729044002.18762-2-matthew.ruffell@canonical.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200729044002.18762-1-matthew.ruffell@canonical.com> References: <20200729044002.18762-1-matthew.ruffell@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Trond Myklebust BugLink: https://bugs.launchpad.net/bugs/1887607 A 'false retry' in NFSv4.1 occurs when the client attempts to 
transmit a new RPC call using a slot+sequence number combination that references an already cached one. Currently, the Linux NFS client will do this if a user process interrupts an RPC call that is in progress. The problem with doing so is that we defeat the main mechanism used by the server to differentiate between a new call and a replayed one. Even if the server is able to perfectly cache the arguments of the old call, it cannot know if the client intended to replay or send a new call. The obvious fix is to bump the sequence number pre-emptively if an RPC call is interrupted, but in order to deal with the corner cases where the interrupted call is not actually received and processed by the server, we need to interpret the error NFS4ERR_SEQ_MISORDERED as a sign that we need to either wait or locate a correct sequence number that lies between the value we sent, and the last value that was acked by a SEQUENCE call on that slot. Signed-off-by: Trond Myklebust Tested-by: Jason Tibbitts (backported from commit 3453d5708b33efe76f40eca1c0ed60923094b971) [mruffell: fixup deletion of nfs4_sequence_process_interrupted] Signed-off-by: Matthew Ruffell --- fs/nfs/nfs4proc.c | 105 ++++++++++++++++++++----------------------- fs/nfs/nfs4session.c | 5 ++- fs/nfs/nfs4session.h | 5 ++- 3 files changed, 55 insertions(+), 60 deletions(-) diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index f26c3f68cc0d..3aa2643a8f53 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -697,13 +697,25 @@ static void nfs41_sequence_free_slot(struct nfs4_sequence_res *res) res->sr_slot = NULL; } +static void nfs4_slot_sequence_record_sent(struct nfs4_slot *slot, + u32 seqnr) +{ + if ((s32)(seqnr - slot->seq_nr_highest_sent) > 0) + slot->seq_nr_highest_sent = seqnr; +} +static void nfs4_slot_sequence_acked(struct nfs4_slot *slot, + u32 seqnr) +{ + slot->seq_nr_highest_sent = seqnr; + slot->seq_nr_last_acked = seqnr; +} + static int nfs41_sequence_process(struct rpc_task *task, struct 
nfs4_sequence_res *res) { struct nfs4_session *session; struct nfs4_slot *slot = res->sr_slot; struct nfs_client *clp; - bool interrupted = false; int ret = 1; if (slot == NULL) @@ -714,16 +726,12 @@ static int nfs41_sequence_process(struct rpc_task *task, session = slot->table->session; - if (slot->interrupted) { - if (res->sr_status != -NFS4ERR_DELAY) - slot->interrupted = 0; - interrupted = true; - } - trace_nfs4_sequence_done(session, res); /* Check the SEQUENCE operation status */ switch (res->sr_status) { case 0: + /* Mark this sequence number as having been acked */ + nfs4_slot_sequence_acked(slot, slot->seq_nr); /* Update the slot's sequence and clientid lease timer */ slot->seq_done = 1; clp = session->clp; @@ -738,9 +746,9 @@ static int nfs41_sequence_process(struct rpc_task *task, * sr_status remains 1 if an RPC level error occurred. * The server may or may not have processed the sequence * operation.. - * Mark the slot as having hosted an interrupted RPC call. */ - slot->interrupted = 1; + nfs4_slot_sequence_record_sent(slot, slot->seq_nr); + slot->seq_done = 1; goto out; case -NFS4ERR_DELAY: /* The server detected a resend of the RPC call and @@ -751,6 +759,7 @@ static int nfs41_sequence_process(struct rpc_task *task, __func__, slot->slot_nr, slot->seq_nr); + nfs4_slot_sequence_acked(slot, slot->seq_nr); goto out_retry; case -NFS4ERR_RETRY_UNCACHED_REP: case -NFS4ERR_SEQ_FALSE_RETRY: @@ -758,6 +767,7 @@ static int nfs41_sequence_process(struct rpc_task *task, * The server thinks we tried to replay a request. * Retry the call after bumping the sequence ID. */ + nfs4_slot_sequence_acked(slot, slot->seq_nr); goto retry_new_seq; case -NFS4ERR_BADSLOT: /* @@ -768,21 +778,28 @@ static int nfs41_sequence_process(struct rpc_task *task, goto session_recover; goto retry_nowait; case -NFS4ERR_SEQ_MISORDERED: + nfs4_slot_sequence_record_sent(slot, slot->seq_nr); /* - * Was the last operation on this sequence interrupted? 
- * If so, retry after bumping the sequence number. + * Were one or more calls using this slot interrupted? + * If the server never received the request, then our + * transmitted slot sequence number may be too high. */ - if (interrupted) - goto retry_new_seq; - /* - * Could this slot have been previously retired? - * If so, then the server may be expecting seq_nr = 1! - */ - if (slot->seq_nr != 1) { - slot->seq_nr = 1; + if ((s32)(slot->seq_nr - slot->seq_nr_last_acked) > 1) { + slot->seq_nr--; goto retry_nowait; } - goto session_recover; + /* + * RFC5661: + * A retry might be sent while the original request is + * still in progress on the replier. The replier SHOULD + * deal with the issue by returning NFS4ERR_DELAY as the + * reply to SEQUENCE or CB_SEQUENCE operation, but + * implementations MAY return NFS4ERR_SEQ_MISORDERED. + * + * Restart the search after a delay. + */ + slot->seq_nr = slot->seq_nr_highest_sent; + goto out_retry; default: /* Just update the slot sequence no. */ slot->seq_done = 1; @@ -873,17 +890,6 @@ static const struct rpc_call_ops nfs41_call_sync_ops = { .rpc_call_done = nfs41_call_sync_done, }; -static void -nfs4_sequence_process_interrupted(struct nfs_client *client, - struct nfs4_slot *slot, struct rpc_cred *cred) -{ - struct rpc_task *task; - - task = _nfs41_proc_sequence(client, cred, slot, true); - if (!IS_ERR(task)) - rpc_put_task_async(task); -} - #else /* !CONFIG_NFS_V4_1 */ static int nfs4_sequence_process(struct rpc_task *task, struct nfs4_sequence_res *res) @@ -904,14 +910,6 @@ int nfs4_sequence_done(struct rpc_task *task, } EXPORT_SYMBOL_GPL(nfs4_sequence_done); -static void -nfs4_sequence_process_interrupted(struct nfs_client *client, - struct nfs4_slot *slot, struct rpc_cred *cred) -{ - WARN_ON_ONCE(1); - slot->interrupted = 0; -} - #endif /* !CONFIG_NFS_V4_1 */ static void nfs41_sequence_res_init(struct nfs4_sequence_res *res) @@ -952,26 +950,19 @@ int nfs4_setup_sequence(struct nfs_client *client, task->tk_timeout = 0; } 
- for (;;) { - spin_lock(&tbl->slot_tbl_lock); - /* The state manager will wait until the slot table is empty */ - if (nfs4_slot_tbl_draining(tbl) && !args->sa_privileged) - goto out_sleep; - - slot = nfs4_alloc_slot(tbl); - if (IS_ERR(slot)) { - /* Try again in 1/4 second */ - if (slot == ERR_PTR(-ENOMEM)) - task->tk_timeout = HZ >> 2; - goto out_sleep; - } - spin_unlock(&tbl->slot_tbl_lock); + spin_lock(&tbl->slot_tbl_lock); + /* The state manager will wait until the slot table is empty */ + if (nfs4_slot_tbl_draining(tbl) && !args->sa_privileged) + goto out_sleep; - if (likely(!slot->interrupted)) - break; - nfs4_sequence_process_interrupted(client, - slot, task->tk_msg.rpc_cred); + slot = nfs4_alloc_slot(tbl); + if (IS_ERR(slot)) { + /* Try again in 1/4 second */ + if (slot == ERR_PTR(-ENOMEM)) + task->tk_timeout = HZ >> 2; + goto out_sleep; } + spin_unlock(&tbl->slot_tbl_lock); nfs4_sequence_attach_slot(args, res, slot); diff --git a/fs/nfs/nfs4session.c b/fs/nfs/nfs4session.c index 769b85655c4b..fdb75da5d349 100644 --- a/fs/nfs/nfs4session.c +++ b/fs/nfs/nfs4session.c @@ -110,6 +110,8 @@ static struct nfs4_slot *nfs4_new_slot(struct nfs4_slot_table *tbl, slot->table = tbl; slot->slot_nr = slotid; slot->seq_nr = seq_init; + slot->seq_nr_highest_sent = seq_init; + slot->seq_nr_last_acked = seq_init - 1; } return slot; } @@ -276,7 +278,8 @@ static void nfs4_reset_slot_table(struct nfs4_slot_table *tbl, p = &tbl->slots; while (*p) { (*p)->seq_nr = ivalue; - (*p)->interrupted = 0; + (*p)->seq_nr_highest_sent = ivalue; + (*p)->seq_nr_last_acked = ivalue - 1; p = &(*p)->next; } tbl->highest_used_slotid = NFS4_NO_SLOT; diff --git a/fs/nfs/nfs4session.h b/fs/nfs/nfs4session.h index 3c550f297561..230509b77121 100644 --- a/fs/nfs/nfs4session.h +++ b/fs/nfs/nfs4session.h @@ -23,8 +23,9 @@ struct nfs4_slot { unsigned long generation; u32 slot_nr; u32 seq_nr; - unsigned int interrupted : 1, - privileged : 1, + u32 seq_nr_last_acked; + u32 seq_nr_highest_sent; + unsigned 
int privileged : 1, seq_done : 1; }; From patchwork Wed Jul 29 04:40:01 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Ruffell X-Patchwork-Id: 1338147 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=canonical.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4BGgp80p6wz9sTg; Wed, 29 Jul 2020 14:40:31 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1k0dtE-0008Lp-DE; Wed, 29 Jul 2020 04:40:28 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1k0dtB-0008Kf-Mj for kernel-team@lists.ubuntu.com; Wed, 29 Jul 2020 04:40:25 +0000 Received: from mail-pg1-f197.google.com ([209.85.215.197]) by youngberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1k0dtB-0008VU-7x for kernel-team@lists.ubuntu.com; Wed, 29 Jul 2020 04:40:25 +0000 Received: by mail-pg1-f197.google.com with SMTP id y28so17274629pge.23 for ; Tue, 28 Jul 2020 21:40:25 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; 
bh=K/rt8jfxpgUYb/K9m4D0jouIT4GVolqBm/7wzYXafoQ=; b=t5joGedALdcDWmlRM92ylWPcvEsO4gOtstFljQs+e90uob9T4sPhK3aXkwgQsJyh1R EIwiuYXC2ORLPLUEakLlO09PCtK/9tUs+iIO6Ijb3jufkXlYgNB0BuJ76FEzOWTjDGbu QjYPylylsO5xaFEgHTqD0AlrqGcIM7jcPAZV4h5qM68Un0mb1u+jJ+hRvRkmRUidAM32 qyV0PwC7Z4RAqxSs45gsF0bZilD/mSNjF6+0Ccb3DhntzAlkDtW5a8deeJdfFdnft0kw f8lL3M8sXiL/4vwno5sTXdYnuosk2cduJs0t7N0zbxzLFQtG48XS42GmohfIi4ELAMox QEGw== X-Gm-Message-State: AOAM530GmMBPLHNBoInBFSRu6kwOS3d1//mANhi+V45MJG+7yAzW6j7v 1onFPOIrsmm1dcg7gmlxIIiqblo57neYxXXPzcEPqRT3WOqtP/j7kePIY+aJ/UromnTDGb3FKAJ weVwlOu8CZEVQT1+H7RRXNMRTmKsgqHQ0TA0O0k0rIw== X-Received: by 2002:a17:90b:112:: with SMTP id p18mr7970039pjz.92.1595997623778; Tue, 28 Jul 2020 21:40:23 -0700 (PDT) X-Google-Smtp-Source: ABdhPJw0LX3fFKRTj424JH2u5smDIsv/zA2wXIktoNzKbKM2TvLLMyL2lGoqzweC3ZjdDvNGPLFm4w== X-Received: by 2002:a17:90b:112:: with SMTP id p18mr7970022pjz.92.1595997623495; Tue, 28 Jul 2020 21:40:23 -0700 (PDT) Received: from localhost.localdomain (222-152-178-139-fibre.sparkbb.co.nz. 
[222.152.178.139]) by smtp.gmail.com with ESMTPSA id a2sm606062pgf.53.2020.07.28.21.40.21 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Jul 2020 21:40:23 -0700 (PDT) From: Matthew Ruffell To: kernel-team@lists.ubuntu.com Subject: [SRU][Bionic][PATCH 2/3] NFSv4.x: Handle bad/dead sessions correctly in nfs41_sequence_process() Date: Wed, 29 Jul 2020 16:40:01 +1200 Message-Id: <20200729044002.18762-3-matthew.ruffell@canonical.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200729044002.18762-1-matthew.ruffell@canonical.com> References: <20200729044002.18762-1-matthew.ruffell@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Trond Myklebust BugLink: https://bugs.launchpad.net/bugs/1887607 If the server returns a bad or dead session error, then we don't want to update the session slot number, but just immediately schedule recovery and allow it to proceed. 
We can/should then remove handling in other places Fixes: 3453d5708b33 ("NFSv4.1: Avoid false retries when RPC calls are interrupted") Signed-off-by: Trond Myklebust (cherry picked from commit 5c441544f045e679afd6c3c6d9f7aaf5fa5f37b0) Signed-off-by: Matthew Ruffell --- fs/nfs/nfs4proc.c | 34 +++++++++++++++++++++++++--------- 1 file changed, 25 insertions(+), 9 deletions(-) diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 3aa2643a8f53..a9a7610e8bee 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -454,9 +454,7 @@ static int nfs4_do_handle_exception(struct nfs_server *server, case -NFS4ERR_DEADSESSION: case -NFS4ERR_SEQ_FALSE_RETRY: case -NFS4ERR_SEQ_MISORDERED: - dprintk("%s ERROR: %d Reset session\n", __func__, - errorcode); - nfs4_schedule_session_recovery(clp->cl_session, errorcode); + /* Handled in nfs41_sequence_process() */ goto wait_on_recovery; #endif /* defined(CONFIG_NFS_V4_1) */ case -NFS4ERR_FILE_OPEN: @@ -716,6 +714,7 @@ static int nfs41_sequence_process(struct rpc_task *task, struct nfs4_session *session; struct nfs4_slot *slot = res->sr_slot; struct nfs_client *clp; + int status; int ret = 1; if (slot == NULL) @@ -727,8 +726,13 @@ static int nfs41_sequence_process(struct rpc_task *task, session = slot->table->session; trace_nfs4_sequence_done(session, res); + + status = res->sr_status; + if (task->tk_status == -NFS4ERR_DEADSESSION) + status = -NFS4ERR_DEADSESSION; + /* Check the SEQUENCE operation status */ - switch (res->sr_status) { + switch (status) { case 0: /* Mark this sequence number as having been acked */ nfs4_slot_sequence_acked(slot, slot->seq_nr); @@ -800,6 +804,10 @@ static int nfs41_sequence_process(struct rpc_task *task, */ slot->seq_nr = slot->seq_nr_highest_sent; goto out_retry; + case -NFS4ERR_BADSESSION: + case -NFS4ERR_DEADSESSION: + case -NFS4ERR_CONN_NOT_BOUND_TO_SESSION: + goto session_recover; default: /* Just update the slot sequence no. 
*/ slot->seq_done = 1; @@ -810,8 +818,10 @@ static int nfs41_sequence_process(struct rpc_task *task, out_noaction: return ret; session_recover: - nfs4_schedule_session_recovery(session, res->sr_status); - goto retry_nowait; + nfs4_schedule_session_recovery(session, status); + dprintk("%s ERROR: %d Reset session\n", __func__, status); + nfs41_sequence_free_slot(res); + goto out; retry_new_seq: ++slot->seq_nr; retry_nowait: @@ -2047,7 +2057,6 @@ static int nfs4_handle_delegation_recall_error(struct nfs_server *server, struct case -NFS4ERR_BAD_HIGH_SLOT: case -NFS4ERR_CONN_NOT_BOUND_TO_SESSION: case -NFS4ERR_DEADSESSION: - nfs4_schedule_session_recovery(server->nfs_client->cl_session, err); return -EAGAIN; case -NFS4ERR_STALE_CLIENTID: case -NFS4ERR_STALE_STATEID: @@ -7381,6 +7390,15 @@ nfs41_same_server_scope(struct nfs41_server_scope *a, static void nfs4_bind_one_conn_to_session_done(struct rpc_task *task, void *calldata) { + struct nfs41_bind_conn_to_session_args *args = task->tk_msg.rpc_argp; + struct nfs_client *clp = args->client; + + switch (task->tk_status) { + case -NFS4ERR_BADSESSION: + case -NFS4ERR_DEADSESSION: + nfs4_schedule_session_recovery(clp->cl_session, + task->tk_status); + } } static const struct rpc_call_ops nfs4_bind_one_conn_to_session_ops = { @@ -8427,8 +8445,6 @@ static int nfs41_reclaim_complete_handle_errors(struct rpc_task *task, struct nf case -NFS4ERR_BADSESSION: case -NFS4ERR_DEADSESSION: case -NFS4ERR_CONN_NOT_BOUND_TO_SESSION: - nfs4_schedule_session_recovery(clp->cl_session, - task->tk_status); break; default: nfs4_schedule_lease_recovery(clp); From patchwork Wed Jul 29 04:40:02 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Ruffell X-Patchwork-Id: 1338149 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com 
(client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=canonical.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4BGgp94mPwz9sRN; Wed, 29 Jul 2020 14:40:33 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1k0dtF-0008Mw-Pz; Wed, 29 Jul 2020 04:40:29 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1k0dtD-0008L9-HX for kernel-team@lists.ubuntu.com; Wed, 29 Jul 2020 04:40:27 +0000 Received: from mail-pj1-f70.google.com ([209.85.216.70]) by youngberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1k0dtD-0008Vb-0Y for kernel-team@lists.ubuntu.com; Wed, 29 Jul 2020 04:40:27 +0000 Received: by mail-pj1-f70.google.com with SMTP id k4so1310586pjs.1 for ; Tue, 28 Jul 2020 21:40:26 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=R6zxosDTn/oVlhu2//9q6z622BhAE2ph9Gahl9XLIVw=; b=G6PB5V0+u8IMAYk/2mTlvjVI1fZerjXWPJs8iV1UnEKxXLBB+wb2kOAnsE2hJYTeh5 sgASHZHSIGPGkOfGyIN+x+lWlrnYx8T4I2BBGM2npJ2YpDXxDswOMBJqFO6KmqsPkZwd 8/dbgv16uU5eZIa/adXY2/CGZGs0o+ieDZfe/utnsKf/lnpTxVTLvN5M7xUKCi4MbH2n Wvi6FgI8ZTSZvF5MXv1z3gRmpTDHC1xK7FuUv6G8UfzwQGT6KSnOc88UwLhZdPRKFVbY +8JRoBph8FcQjo0EN7DXlCs1TQZMFoDGdaDcuTCAEODMkFnVqi2PIBTR3hO2s9RfeFZN Mmpg== X-Gm-Message-State: AOAM531vHHWRQjc1xKe2y3OM6zNCfl/g6NaM1l2AlwS3GCr/LOR7RxEu 
umuNegrY+NN+Z63WTLE5CoGhMMsbJvwe/ZdcNn4bjKlfmiVmIEA0q7tO40CAS0eEfzS7Zkg0jBk Vw56dJbLVcqYE0nrlbEcGcPS4hDiON/hL97UP3mvt0Q== X-Received: by 2002:a62:7c4f:: with SMTP id x76mr13052278pfc.124.1595997625544; Tue, 28 Jul 2020 21:40:25 -0700 (PDT) X-Google-Smtp-Source: ABdhPJz6BZAqA+KOztKGZec8e8zWos0jfSRaiukTYXXUaNHtbDLnHtH1QjzZbCy/zlXmS/nsnWx9Og== X-Received: by 2002:a62:7c4f:: with SMTP id x76mr13052266pfc.124.1595997625277; Tue, 28 Jul 2020 21:40:25 -0700 (PDT) Received: from localhost.localdomain (222-152-178-139-fibre.sparkbb.co.nz. [222.152.178.139]) by smtp.gmail.com with ESMTPSA id a2sm606062pgf.53.2020.07.28.21.40.23 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Jul 2020 21:40:24 -0700 (PDT) From: Matthew Ruffell To: kernel-team@lists.ubuntu.com Subject: [SRU][Bionic][PATCH 3/3] NFS: Fix interrupted slots by sending a solo SEQUENCE operation Date: Wed, 29 Jul 2020 16:40:02 +1200 Message-Id: <20200729044002.18762-4-matthew.ruffell@canonical.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200729044002.18762-1-matthew.ruffell@canonical.com> References: <20200729044002.18762-1-matthew.ruffell@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Anna Schumaker BugLink: https://bugs.launchpad.net/bugs/1887607 We used to do this before 3453d5708b33, but this was changed to better handle the NFS4ERR_SEQ_MISORDERED error code. This commit fixed the slot re-use case when the server doesn't receive the interrupted operation, but if the server does receive the operation then it could still end up replying to the client with mis-matched operations from the reply cache. 
We can fix this by sending a SEQUENCE to the server while recovering from a SEQ_MISORDERED error when we detect that we are in an interrupted slot situation. Fixes: 3453d5708b33 (NFSv4.1: Avoid false retries when RPC calls are interrupted) Signed-off-by: Anna Schumaker (backported from commit 913fadc5b105c3619d9e8d0fe8899ff1593cc737) [mruffell: change const struct cred to struct rpc_cred in nfs4_probe_sequence] Signed-off-by: Matthew Ruffell --- fs/nfs/nfs4proc.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index a9a7610e8bee..d72963177f68 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -708,6 +708,14 @@ static void nfs4_slot_sequence_acked(struct nfs4_slot *slot, slot->seq_nr_last_acked = seqnr; } +static void nfs4_probe_sequence(struct nfs_client *client, struct rpc_cred *cred, + struct nfs4_slot *slot) +{ + struct rpc_task *task = _nfs41_proc_sequence(client, cred, slot, true); + if (!IS_ERR(task)) + rpc_put_task_async(task); +} + static int nfs41_sequence_process(struct rpc_task *task, struct nfs4_sequence_res *res) { @@ -724,6 +732,7 @@ static int nfs41_sequence_process(struct rpc_task *task, goto out; session = slot->table->session; + clp = session->clp; trace_nfs4_sequence_done(session, res); @@ -738,7 +747,6 @@ static int nfs41_sequence_process(struct rpc_task *task, nfs4_slot_sequence_acked(slot, slot->seq_nr); /* Update the slot's sequence and clientid lease timer */ slot->seq_done = 1; - clp = session->clp; do_renew_lease(clp, res->sr_timestamp); /* Check sequence flags */ nfs41_handle_sequence_flag_errors(clp, res->sr_status_flags, @@ -786,10 +794,18 @@ static int nfs41_sequence_process(struct rpc_task *task, /* * Were one or more calls using this slot interrupted? * If the server never received the request, then our - * transmitted slot sequence number may be too high. + * transmitted slot sequence number may be too high. 
However, + * if the server did receive the request then it might + * accidentally give us a reply with a mismatched operation. + * We can sort this out by sending a lone sequence operation + * to the server on the same slot. */ if ((s32)(slot->seq_nr - slot->seq_nr_last_acked) > 1) { slot->seq_nr--; + if (task->tk_msg.rpc_proc != &nfs4_procedures[NFSPROC4_CLNT_SEQUENCE]) { + nfs4_probe_sequence(clp, task->tk_msg.rpc_cred, slot); + res->sr_slot = NULL; + } goto retry_nowait; } /*