From patchwork Sat Dec 2 21:14:25 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Keno Fischer
X-Patchwork-Id: 843929
Date: Sat, 2 Dec 2017 16:14:25 -0500
From: Keno Fischer
To: qemu-devel@nongnu.org
Cc: groug@kaod.org, aneesh.kumar@linux.vnet.ibm.com
Subject: [Qemu-devel] [PATCH] 9pfs: Correctly handle cancelled requests
Message-ID: <20171202211425.GA13189@juliacomputing.com>

# Background

I was investigating spurious, non-deterministic EINTR returns from various 9p file system operations in a Linux guest served from the qemu 9p server.

## EINTR, ERESTARTSYS and the Linux kernel

When a signal arrives that the Linux kernel needs to deliver to user space while a given thread is blocked (in the 9p case, waiting for a reply to its request in p9_client_rpc -> wait_event_interruptible), it asks whatever driver is currently running to abort its current operation (in the 9p case, causing the submission of a TFLUSH message) and return to user space. In these situations, the error code reported is generally ERESTARTSYS. If the user space process specified SA_RESTART, this means that the system call will get restarted once the signal handler has completed (assuming the signal handler doesn't modify the process state in complicated ways not relevant here). If SA_RESTART is not specified, ERESTARTSYS gets translated to EINTR and user space is expected to handle the restart itself.

## The 9p TFLUSH command

The 9p TFLUSH command requests that the server abort an ongoing operation. The man page [1] specifies:

```
If it recognizes oldtag as the tag of a pending transaction, it should abort any pending response and discard that tag.
[...]
When the client sends a Tflush, it must wait to receive the corresponding Rflush before reusing oldtag for subsequent messages.
If a response to the flushed request is received before the Rflush, the client must honor the response as if it had not been flushed, since the completed request may signify a state change in the server
```

In particular, this means that the server must not send a reply with the original tag in response to the cancellation request, because the client is obligated to interpret such a reply as a coincidental reply to the original request.

# The bug

When qemu receives a Tflush request, it sets the `cancelled` flag on the relevant pdu. This flag is periodically checked, e.g. in `v9fs_co_name_to_path`, and if set, the operation is aborted and the error is set to EINTR. However, the server then violates the spec by returning an Rerror response to the client rather than discarding the message entirely. As a result, the client is required to assume that said Rerror response is a result of the original request, not a result of the cancellation, and thus passes the EINTR error back to user space. This is not the worst thing it could do; however, as discussed above, the correct error code would have been ERESTARTSYS, such that user space programs with SA_RESTART set get correctly restarted upon completion of the signal handler. Instead, such programs get spurious EINTR results that they were not expecting to handle.

It should be noted that there are plenty of user space programs that do not set SA_RESTART and do not correctly handle EINTR either. However, that is then a user space bug. It should also be noted that this bug has been mitigated by a recent commit to the Linux kernel [2], which essentially prevents the kernel from sending Tflush requests unless the process is about to die (in which case the process likely doesn't care about the response). Nevertheless, for older kernels and to comply with the spec, I believe this change is beneficial.

# Implementation

The fix is fairly simple, just skipping notification of a reply if the pdu was previously cancelled.
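
As a rough illustration of that decision in isolation, here is a minimal, self-contained sketch. The names `sketch_pdu` and `sketch_should_reply` are hypothetical stand-ins invented for this example and are not part of qemu; the struct models only the `cancelled` flag, and `len` follows the convention of `pdu_complete`, where a negative value is a negated errno.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical stand-in for the one V9fsPDU field this decision reads. */
struct sketch_pdu {
    bool cancelled; /* set once a Tflush has named this request's tag */
};

/*
 * Returns false only when the request was cancelled AND the operation
 * actually aborted with EINTR; in that case no message with the original
 * tag may be sent. Any other outcome (success, or an unrelated error) is
 * still reported, even if a Tflush arrived in the meantime.
 */
bool sketch_should_reply(const struct sketch_pdu *pdu, ssize_t len)
{
    if (pdu->cancelled && len == -EINTR) {
        return false;
    }
    return true;
}
```

Note that a cancelled request that nevertheless ran to completion still gets its reply in this sketch, which matches the spec's requirement that the client honor a response arriving before the Rflush.
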
I also added a new trace event to distinguish operations that caused an error reply from those that were cancelled.

One complication is that we only omit sending the message on EINTR errors, in order to avoid confusing the rest of the code (which may assume that a client knows about a fid if it successfully passed it off to pdu_complete without checking for cancellation status). This does mean that if the server acts upon the cancellation flag, it always needs to set err to EINTR. I believe this is true of the current code.

[1] https://9fans.github.io/plan9port/man/man9/flush.html
[2] https://github.com/torvalds/linux/commit/9523feac272ccad2ad8186ba4fcc89103754de52

Signed-off-by: Keno Fischer
---
 hw/9pfs/9p.c         | 17 +++++++++++++++++
 hw/9pfs/trace-events |  1 +
 2 files changed, 18 insertions(+)

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index 710cd91..46f406b 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -648,6 +648,22 @@ static void coroutine_fn pdu_complete(V9fsPDU *pdu, ssize_t len)
     V9fsState *s = pdu->s;
     int ret;
 
+    /*
+     * The 9p spec requires that successfully cancelled pdus receive no reply.
+     * Sending a reply would confuse clients because they would
+     * assume that any EINTR is the actual result of the operation,
+     * rather than a consequence of the cancellation. However, if
+     * the operation completed (successfully or with an error other
+     * than one caused by cancellation), we do send out that reply, both
+     * for efficiency and to avoid confusing the rest of the state machine
+     * that assumes passing a non-error here will mean a successful
+     * transmission of the reply.
+     */
+    if (pdu->cancelled && len == -EINTR) {
+        trace_v9fs_rcancel(pdu->tag, pdu->id);
+        goto out_wakeup;
+    }
+
     if (len < 0) {
         int err = -len;
         len = 7;
@@ -690,6 +706,7 @@ static void coroutine_fn pdu_complete(V9fsPDU *pdu, ssize_t len)
 out_notify:
     pdu->s->transport->push_and_notify(pdu);
 
+out_wakeup:
     /* Now wakeup anybody waiting in flush for this request */
     if (!qemu_co_queue_next(&pdu->complete)) {
         pdu_free(pdu);
diff --git a/hw/9pfs/trace-events b/hw/9pfs/trace-events
index 08a4abf..1aee350 100644
--- a/hw/9pfs/trace-events
+++ b/hw/9pfs/trace-events
@@ -1,6 +1,7 @@
 # See docs/devel/tracing.txt for syntax documentation.
 
 # hw/9pfs/virtio-9p.c
+v9fs_rcancel(uint16_t tag, uint8_t id) "tag %d id %d"
 v9fs_rerror(uint16_t tag, uint8_t id, int err) "tag %d id %d err %d"
 v9fs_version(uint16_t tag, uint8_t id, int32_t msize, char* version) "tag %d id %d msize %d version %s"
 v9fs_version_return(uint16_t tag, uint8_t id, int32_t msize, char* version) "tag %d id %d msize %d version %s"