From patchwork Wed Jul 29 04:40:02 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Ruffell
X-Patchwork-Id: 1338149
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org;
 spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com
 (client-ip=91.189.94.19; helo=huckleberry.canonical.com;
 envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=)
Authentication-Results: ozlabs.org;
 dmarc=fail (p=none dis=none) header.from=canonical.com
Received: from huckleberry.canonical.com (huckleberry.canonical.com
 [91.189.94.19])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by ozlabs.org (Postfix) with ESMTPS id 4BGgp94mPwz9sRN;
 Wed, 29 Jul 2020 14:40:33 +1000 (AEST)
Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com)
 by huckleberry.canonical.com with esmtp (Exim 4.86_2)
 (envelope-from )
 id 1k0dtF-0008Mw-Pz; Wed, 29 Jul 2020 04:40:29 +0000
Received: from youngberry.canonical.com ([91.189.89.112])
 by huckleberry.canonical.com with esmtps
 (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2)
 (envelope-from )
 id 1k0dtD-0008L9-HX
 for kernel-team@lists.ubuntu.com; Wed, 29 Jul 2020 04:40:27 +0000
Received: from mail-pj1-f70.google.com ([209.85.216.70])
 by youngberry.canonical.com with esmtps
 (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2)
 (envelope-from )
 id 1k0dtD-0008Vb-0Y
 for kernel-team@lists.ubuntu.com; Wed, 29 Jul 2020 04:40:27 +0000
Received: by mail-pj1-f70.google.com with SMTP id k4so1310586pjs.1
 for ; Tue, 28 Jul 2020 21:40:26 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to
 :references:mime-version:content-transfer-encoding;
 bh=R6zxosDTn/oVlhu2//9q6z622BhAE2ph9Gahl9XLIVw=;
 b=G6PB5V0+u8IMAYk/2mTlvjVI1fZerjXWPJs8iV1UnEKxXLBB+wb2kOAnsE2hJYTeh5
 sgASHZHSIGPGkOfGyIN+x+lWlrnYx8T4I2BBGM2npJ2YpDXxDswOMBJqFO6KmqsPkZwd
 8/dbgv16uU5eZIa/adXY2/CGZGs0o+ieDZfe/utnsKf/lnpTxVTLvN5M7xUKCi4MbH2n
 Wvi6FgI8ZTSZvF5MXv1z3gRmpTDHC1xK7FuUv6G8UfzwQGT6KSnOc88UwLhZdPRKFVbY
 +8JRoBph8FcQjo0EN7DXlCs1TQZMFoDGdaDcuTCAEODMkFnVqi2PIBTR3hO2s9RfeFZN
 Mmpg==
X-Gm-Message-State: AOAM531vHHWRQjc1xKe2y3OM6zNCfl/g6NaM1l2AlwS3GCr/LOR7RxEu
 umuNegrY+NN+Z63WTLE5CoGhMMsbJvwe/ZdcNn4bjKlfmiVmIEA0q7tO40CAS0eEfzS7Zkg0jBk
 Vw56dJbLVcqYE0nrlbEcGcPS4hDiON/hL97UP3mvt0Q==
X-Received: by 2002:a62:7c4f:: with SMTP id x76mr13052278pfc.124.1595997625544;
 Tue, 28 Jul 2020 21:40:25 -0700 (PDT)
X-Google-Smtp-Source: ABdhPJz6BZAqA+KOztKGZec8e8zWos0jfSRaiukTYXXUaNHtbDLnHtH1QjzZbCy/zlXmS/nsnWx9Og==
X-Received: by 2002:a62:7c4f:: with SMTP id x76mr13052266pfc.124.1595997625277;
 Tue, 28 Jul 2020 21:40:25 -0700 (PDT)
Received: from localhost.localdomain
 (222-152-178-139-fibre.sparkbb.co.nz. [222.152.178.139])
 by smtp.gmail.com with ESMTPSA id a2sm606062pgf.53.2020.07.28.21.40.23
 for
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Tue, 28 Jul 2020 21:40:24 -0700 (PDT)
From: Matthew Ruffell
To: kernel-team@lists.ubuntu.com
Subject: [SRU][Bionic][PATCH 3/3] NFS: Fix interrupted slots by sending a solo
 SEQUENCE operation
Date: Wed, 29 Jul 2020 16:40:02 +1200
Message-Id: <20200729044002.18762-4-matthew.ruffell@canonical.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20200729044002.18762-1-matthew.ruffell@canonical.com>
References: <20200729044002.18762-1-matthew.ruffell@canonical.com>
MIME-Version: 1.0
X-BeenThere: kernel-team@lists.ubuntu.com
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Kernel team discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: kernel-team-bounces@lists.ubuntu.com
Sender: "kernel-team"

From: Anna Schumaker

BugLink: https://bugs.launchpad.net/bugs/1887607

We used to do this before 3453d5708b33, but this was changed to
better handle the NFS4ERR_SEQ_MISORDERED error code. This commit fixed the
slot re-use case when the server doesn't receive the interrupted operation,
but if the server does receive the operation then it could still end up
replying to the client with mis-matched operations from the reply cache.

We can fix this by sending a SEQUENCE to the server while recovering from
a SEQ_MISORDERED error when we detect that we are in an interrupted slot
situation.

Fixes: 3453d5708b33 (NFSv4.1: Avoid false retries when RPC calls are interrupted)
Signed-off-by: Anna Schumaker
(backported from commit 913fadc5b105c3619d9e8d0fe8899ff1593cc737)
[mruffell: change const struct cred to struct rpc_cred in nfs4_probe_sequence]
Signed-off-by: Matthew Ruffell
---
 fs/nfs/nfs4proc.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index a9a7610e8bee..d72963177f68 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -708,6 +708,14 @@ static void nfs4_slot_sequence_acked(struct nfs4_slot *slot,
 	slot->seq_nr_last_acked = seqnr;
 }
 
+static void nfs4_probe_sequence(struct nfs_client *client, struct rpc_cred *cred,
+				struct nfs4_slot *slot)
+{
+	struct rpc_task *task = _nfs41_proc_sequence(client, cred, slot, true);
+	if (!IS_ERR(task))
+		rpc_put_task_async(task);
+}
+
 static int nfs41_sequence_process(struct rpc_task *task,
 		struct nfs4_sequence_res *res)
 {
@@ -724,6 +732,7 @@ static int nfs41_sequence_process(struct rpc_task *task,
 		goto out;
 
 	session = slot->table->session;
+	clp = session->clp;
 
 	trace_nfs4_sequence_done(session, res);
 
@@ -738,7 +747,6 @@ static int nfs41_sequence_process(struct rpc_task *task,
 		nfs4_slot_sequence_acked(slot, slot->seq_nr);
 		/* Update the slot's sequence and clientid lease timer */
 		slot->seq_done = 1;
-		clp = session->clp;
 		do_renew_lease(clp, res->sr_timestamp);
 		/* Check sequence flags */
 		nfs41_handle_sequence_flag_errors(clp, res->sr_status_flags,
@@ -786,10 +794,18 @@ static int nfs41_sequence_process(struct rpc_task *task,
 		/*
 		 * Were one or more calls using this slot interrupted?
 		 * If the server never received the request, then our
-		 * transmitted slot sequence number may be too high.
+		 * transmitted slot sequence number may be too high. However,
+		 * if the server did receive the request then it might
+		 * accidentally give us a reply with a mismatched operation.
+		 * We can sort this out by sending a lone sequence operation
+		 * to the server on the same slot.
 		 */
 		if ((s32)(slot->seq_nr - slot->seq_nr_last_acked) > 1) {
 			slot->seq_nr--;
+			if (task->tk_msg.rpc_proc != &nfs4_procedures[NFSPROC4_CLNT_SEQUENCE]) {
+				nfs4_probe_sequence(clp, task->tk_msg.rpc_cred, slot);
+				res->sr_slot = NULL;
+			}
 			goto retry_nowait;
 		}
 		/*