From patchwork Mon Nov 14 22:48:34 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Blake X-Patchwork-Id: 1703787 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=DiLJIHkJ; dkim-atps=neutral Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4NB5Hc2DBcz20KC for ; Tue, 15 Nov 2022 10:34:20 +1100 (AEDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ouie4-0002Lh-Tq; Mon, 14 Nov 2022 18:13:40 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ouids-0001eb-SA for qemu-devel@nongnu.org; Mon, 14 Nov 2022 18:13:28 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ouiGC-0002Lv-BW for qemu-devel@nongnu.org; Mon, 14 Nov 2022 17:49:02 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1668466139; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=kzTEBYqs7XYX4CjhiYRYgBnv2AzlV5X742T2Py1Whn8=; b=DiLJIHkJ17SkiiDpKQAyedNJApetQB/rkfXCxxT1/jFEKv41vmQXMcEiCCoPRmkX4wBYPo kiueSPyT1PBjUlzbEkromRc579zuvzIFfIUvUWrQsuLYNWtSZ1TPgJmyfQJy3W8PHeIF6F 34e8f6n3yfIr+/qLH0wVAlqXJxvNdRs= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-584-2mtY7N2BO5aHhvGgVXZN8g-1; Mon, 14 Nov 2022 17:48:55 -0500 X-MC-Unique: 2mtY7N2BO5aHhvGgVXZN8g-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 2B2BC85A59D; Mon, 14 Nov 2022 22:48:55 +0000 (UTC) Received: from green.redhat.com (unknown [10.2.16.240]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8B91940B48EA; Mon, 14 Nov 2022 22:48:54 +0000 (UTC) From: Eric Blake To: qemu-devel@nongnu.org Cc: qemu-block@nongnu.org, libguestfs@redhat.com, nbd@other.debian.org, Vladimir Sementsov-Ogievskiy Subject: [PATCH v2 01/15] nbd/client: Add safety check on chunk payload length Date: Mon, 14 Nov 2022 16:48:34 -0600 Message-Id: <20221114224848.2186298-2-eblake@redhat.com> In-Reply-To: <20221114224848.2186298-1-eblake@redhat.com> References: <20221114224141.cm5jgyxfmvie5xb5@redhat.com> <20221114224848.2186298-1-eblake@redhat.com> MIME-Version: 1.0 Content-type: text/plain X-Scanned-By: MIMEDefang 3.1 on 10.11.54.2 Received-SPF: pass client-ip=170.10.133.124; envelope-from=eblake@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Our existing use of structured replies either reads into a qiov capped at 32M (NBD_CMD_READ) or caps allocation to 1000 bytes (see NBD_MAX_MALLOC_PAYLOAD in block/nbd.c). But the existing length checks are rather late; if we encounter a buggy (or malicious) server that sends a super-large payload length, we should drop the connection right then rather than assuming the layer on top will be careful. This becomes more important when we permit 64-bit lengths which are even more likely to have the potential for attempted denial of service abuse. Signed-off-by: Eric Blake --- nbd/client.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/nbd/client.c b/nbd/client.c index 90a6b7b38b..cd97a2aa09 100644 --- a/nbd/client.c +++ b/nbd/client.c @@ -1412,6 +1412,18 @@ static int nbd_receive_structured_reply_chunk(QIOChannel *ioc, chunk->handle = be64_to_cpu(chunk->handle); chunk->length = be32_to_cpu(chunk->length); + /* + * Because we use BLOCK_STATUS with REQ_ONE, and cap READ requests + * at 32M, no valid server should send us payload larger than + * this. Even if we stopped using REQ_ONE, sane servers will cap + * the number of extents they return for block status. + */ + if (chunk->length > NBD_MAX_BUFFER_SIZE + sizeof(NBDStructuredReadData)) { + error_setg(errp, "server chunk %" PRIu32 " (%s) payload is too long", + chunk->type, nbd_rep_lookup(chunk->type)); + return -EINVAL; + } + return 0; } From patchwork Mon Nov 14 22:46:51 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Blake X-Patchwork-Id: 1703772 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=e1mQ+lrH; dkim-atps=neutral Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4NB4sN0f8gz23mY for ; Tue, 15 Nov 2022 10:15:02 +1100 (AEDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ouie9-0002ZI-Fa; Mon, 14 Nov 2022 18:13:45 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ouidu-0001eb-3j for qemu-devel@nongnu.org; Mon, 14 Nov 2022 18:13:30 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ouiEK-0002C6-EI for qemu-devel@nongnu.org; Mon, 14 Nov 2022 17:47:07 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1668466023; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=N+e9YSpNcasjSd/2vWE8941Sgr54UDwzaslyclKxDqQ=; b=e1mQ+lrH3FfQtbiXBKn9J0AGDAOu3Qj8hwNrQ/GQ5cJn7NM5aGNKSuyySbBD3+ZqYcTDzX epjDLciZgXP9/0LARfbTgxwdDDj1Ej2n5sSWB3oWVejuLW2eE4rGbPPoT/HwIYdqE3jfld mDi97/veoUqVzN9Jj1vaWgAtVM2/sso= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-327-5FSCGvSyMaOhcr0VHvAfrQ-1; Mon, 14 Nov 2022 17:47:00 -0500 X-MC-Unique: 5FSCGvSyMaOhcr0VHvAfrQ-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id EDF3485A59D; Mon, 14 Nov 2022 22:46:59 +0000 (UTC) Received: from green.redhat.com (unknown [10.2.16.240]) by smtp.corp.redhat.com (Postfix) with ESMTP id 871682024CC0; Mon, 14 Nov 2022 22:46:59 +0000 (UTC) From: Eric Blake To: nbd@other.debian.org Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org, libguestfs@redhat.com Subject: [PATCH v2 2/6] spec: Tweak description of maximum block size Date: Mon, 14 Nov 2022 16:46:51 -0600 Message-Id: <20221114224655.2186173-3-eblake@redhat.com> In-Reply-To: <20221114224655.2186173-1-eblake@redhat.com> References: <20221114224141.cm5jgyxfmvie5xb5@redhat.com> <20221114224655.2186173-1-eblake@redhat.com> MIME-Version: 1.0 Content-type: text/plain X-Scanned-By: MIMEDefang 3.1 on 10.11.54.4 Received-SPF: pass client-ip=170.10.129.124; envelope-from=eblake@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -27 X-Spam_score: -2.8 X-Spam_bar: -- X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=unavailable autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Commit 9f30fedb improved the spec to allow non-payload requests that exceed any advertised maximum block size. Take this one step further by permitting the server to use NBD_EOVERFLOW as a hint to the client when a request is oversize (while permitting NBD_EINVAL for back-compat), and by rewording the text to explicitly call out that what is being advertised is the maximum payload length, not maximum block size. This becomes more important when we add 64-bit extensions, where it becomes possible to extend `NBD_CMD_BLOCK_STATUS` to have both an effect length (how much of the image does the client want status on - may be larger than 32 bits) and an optional payload length (a way to filter the response to a subset of negotiated metadata contexts). In the shorter term, it means that a server may (but not must) accept a read request larger than the maximum block size if it can use structured replies to keep each chunk of the response under the maximum payload limits. --- doc/proto.md | 127 +++++++++++++++++++++++++++++---------------------- 1 file changed, 73 insertions(+), 54 deletions(-) diff --git a/doc/proto.md b/doc/proto.md index 8f08583..53c334a 100644 --- a/doc/proto.md +++ b/doc/proto.md @@ -745,8 +745,8 @@ text unless the client insists on TLS. During transmission phase, several operations are constrained by the export size sent by the final `NBD_OPT_EXPORT_NAME` or `NBD_OPT_GO`, -as well as by three block size constraints defined here (minimum, -preferred, and maximum). +as well as by three block size constraints defined here (minimum +block, preferred block, and maximum payload). If a client can honour server block size constraints (as set out below and under `NBD_INFO_BLOCK_SIZE`), it SHOULD announce this during the @@ -772,15 +772,15 @@ learn the server's constraints without committing to them. If block size constraints have not been advertised or agreed on externally, then a server SHOULD support a default minimum block size -of 1, a preferred block size of 2^12 (4,096), and a maximum block size -that is effectively unlimited (0xffffffff, or the export size if that -is smaller), while a client desiring maximum interoperability SHOULD -constrain its requests to a minimum block size of 2^9 (512), and limit -`NBD_CMD_READ` and `NBD_CMD_WRITE` commands to a maximum block size of -2^25 (33,554,432). A server that wants to enforce block sizes other -than the defaults specified here MAY refuse to go into transmission -phase with a client that uses `NBD_OPT_EXPORT_NAME` (via a hard -disconnect) or which uses `NBD_OPT_GO` without requesting +of 1, a preferred block size of 2^12 (4,096), and a maximum block +payload size that is at least 2^25 (33,554,432) (even if the export +size is smaller); while a client desiring maximum interoperability +SHOULD constrain its requests to a minimum block size of 2^9 (512), +and limit `NBD_CMD_READ` and `NBD_CMD_WRITE` commands to a maximum +block size of 2^25 (33,554,432). A server that wants to enforce block +sizes other than the defaults specified here MAY refuse to go into +transmission phase with a client that uses `NBD_OPT_EXPORT_NAME` (via +a hard disconnect) or which uses `NBD_OPT_GO` without requesting `NBD_INFO_BLOCK_SIZE` (via an error reply of `NBD_REP_ERR_BLOCK_SIZE_REQD`); but servers SHOULD NOT refuse clients that do not request sizing information when the server supports @@ -818,17 +818,40 @@ the preferred block size for that export. The server MAY advertise an export size that is not an integer multiple of the preferred block size. -The maximum block size represents the maximum length that the server -is willing to handle in one request. If advertised, it MAY be -something other than a power of 2, but MUST be either an integer -multiple of the minimum block size or the value 0xffffffff for no -inherent limit, MUST be at least as large as the smaller of the +The maximum block payload size represents the maximum payload length +that the server is willing to handle in one request. If advertised, +it MAY be something other than a power of 2, but MUST be either an +integer multiple of the minimum block size or the value 0xffffffff for +no inherent limit, MUST be at least as large as the smaller of the preferred block size or export size, and SHOULD be at least 2^20 (1,048,576) if the export is that large. For convenience, the server -MAY advertise a maximum block size that is larger than the export -size, although in that case, the client MUST treat the export size as -the effective maximum block size (as further constrained by a nonzero -offset). +MAY advertise a maximum block payload size that is larger than the +export size, although in that case, the client MUST treat the export +size as an effective maximum block size (as further constrained by a +nonzero offset). Notwithstanding any maximum block size advertised, +either the server or the client MAY initiate a hard disconnect if a +payload length of either a request or a reply would be large enough to +be deemed a denial of service attack; however, for maximum +portability, any payload *length* not exceeding 2^25 (33,554,432) +bytes SHOULD NOT be considered a denial of service attack, even if +that length is larger than the advertised maximum block payload size. + +For commands that require a payload in either direction and where the +client controls the payload length (`NBD_CMD_WRITE`, or `NBD_CMD_READ` +without structured replies), the client MUST NOT use a length larger +than the maximum block size. For replies where the payload length is +controlled by the server (`NBD_CMD_BLOCK_STATUS` without the flag +`NBD_CMD_FLAG_REQ_ONE`, or `NBD_CMD_READ`, both when structured +replies are negotiated), the server MUST NOT send an individual reply +chunk larger than the maximum payload size, although it may split the +overall reply among several chunks. For commands that do not require +a payload in either direction (such as `NBD_CMD_TRIM`), the client MAY +request a length larger than the maximum block size; the server SHOULD +NOT disconnect, but MAY reply with an `NBD_EOVERFLOW` or `NBD_EINVAL` +error if the oversize request would require more server resources than +the same command operating on only a maximum block size (such as some +implementations of `NBD_CMD_WRITE_ZEROES` without the flag +`NBD_CMD_FLAG_FAST_ZERO`, or `NBD_CMD_CACHE`). Where a transmission request can have a nonzero *offset* and/or *length* (such as `NBD_CMD_READ`, `NBD_CMD_WRITE`, or `NBD_CMD_TRIM`), @@ -836,24 +859,9 @@ the client MUST ensure that *offset* and *length* are integer multiples of any advertised minimum block size, and SHOULD use integer multiples of any advertised preferred block size where possible. For those requests, the client MUST NOT use a *length* which, when added to -*offset*, would exceed the export size. Also for NBD_CMD_READ, -NBD_CMD_WRITE, NBD_CMD_CACHE and NBD_CMD_WRITE_ZEROES (except for -when NBD_CMD_FLAG_FAST_ZERO is set), the client MUST NOT use a *length* -larger than any advertised maximum block size. -The server SHOULD report an `NBD_EINVAL` error if -the client's request is not aligned to advertised minimum block size -boundaries, or is larger than the advertised maximum block size. -Notwithstanding any maximum block size advertised, either the server -or the client MAY initiate a hard disconnect if the payload of an -`NBD_CMD_WRITE` request or `NBD_CMD_READ` reply would be large enough -to be deemed a denial of service attack; however, for maximum -portability, any *length* less than 2^25 (33,554,432) bytes SHOULD NOT -be considered a denial of service attack (even if the advertised -maximum block size is smaller). For all other commands, where the -*length* is not reflected in the payload (such as `NBD_CMD_TRIM` or -`NBD_CMD_WRITE_ZEROES`), a server SHOULD merely fail the command with -an `NBD_EINVAL` error for a client that exceeds the maximum block size, -rather than initiating a hard disconnect. +*offset*, would exceed the export size. The server SHOULD report an +`NBD_EINVAL` error if the client's request is not aligned to advertised +minimum block size boundaries or would exceed the export size. ## Metadata querying @@ -1595,7 +1603,7 @@ during option haggling in the fixed newstyle negotiation. - 16 bits, `NBD_INFO_BLOCK_SIZE` - 32 bits, minimum block size - 32 bits, preferred block size - - 32 bits, maximum block size + - 32 bits, maximum block payload size * `NBD_REP_META_CONTEXT` (4) @@ -1743,7 +1751,8 @@ Some chunk types can additionally be categorized by role, such as *error chunks* or *content chunks*. Each type determines how to interpret the "length" bytes of payload. If the client receives an unknown or unexpected type, other than an *error chunk*, it -MUST initiate a hard disconnect. +MUST initiate a hard disconnect. A server MUST NOT send a chunk +larger than any advertised maximum block payload size. * `NBD_REPLY_TYPE_NONE` (0) @@ -1906,7 +1915,9 @@ The following request types exist: If structured replies were not negotiated, then a read request MUST always be answered by a simple reply, as documented above (using magic 0x67446698 `NBD_SIMPLE_REPLY_MAGIC`, and containing - length bytes of data according to the client's request). + length bytes of data according to the client's request), which in + turn means any client request with a length larger than the + maximum block payload size will fail. If an error occurs, the server SHOULD set the appropriate error code in the error field. The server MAY then initiate a hard disconnect. @@ -1936,13 +1947,18 @@ The following request types exist: request, but MAY send the content chunks in any order (the client MUST reassemble content chunks into the correct order), and MAY send additional content chunks even after reporting an error - chunk. Note that a request for more than 2^32 - 8 bytes (if - permitted by block size constraints) MUST be split into at least - two chunks, so as not to overflow the length field of a reply - while still allowing space for the offset of each chunk. When no - error is detected, the server MUST send enough data chunks to - cover the entire region described by the offset and length of the - client's request. + chunk. A server MAY support read requests larger than the maximum + block payload size by splitting the response across multiple + chunks (in particular, a request for more than 2^32 - 8 bytes + containing data rather than holes MUST be split to avoid + overflowing the `NBD_REPLY_TYPE_OFFSET_DATA` length field); + however, the server is also permitted to reject large read + requests up front, so a client should be prepared to retry with + smaller requests if a large request fails. + + When no error is detected, the server MUST send enough data chunks + to cover the entire region described by the offset and length of + the client's request. To minimize traffic, the server MAY use a content or error chunk as the final chunk by setting the `NBD_REPLY_FLAG_DONE` flag, but @@ -2115,11 +2131,12 @@ The following request types exist: If the server advertised `NBD_FLAG_SEND_FAST_ZERO` but `NBD_CMD_FLAG_FAST_ZERO` is not set, then the server MUST NOT fail with `NBD_ENOTSUP`, even if the operation is no faster than a - corresponding `NBD_CMD_WRITE`. Conversely, if - `NBD_CMD_FLAG_FAST_ZERO` is set, the server MUST fail quickly with - `NBD_ENOTSUP` unless the request can be serviced in less time than - a corresponding `NBD_CMD_WRITE`, and SHOULD NOT alter the contents - of the export when returning this failure. The server's + corresponding `NBD_CMD_WRITE`. Conversely, if `NBD_CMD_FLAG_FAST_ZERO` + is set, the server SHOULD NOT fail with `NBD_EOVERFLOW` regardless of + the client length, MUST fail quickly with `NBD_ENOTSUP` unless the + request can be serviced in less time than a corresponding + `NBD_CMD_WRITE`, and SHOULD NOT alter the contents of the export when + returning an `NBD_ENOTSUP` failure. The server's determination on whether to fail a fast request MAY depend on a number of factors, such as whether the request was suitably aligned, on whether the `NBD_CMD_FLAG_NO_HOLE` flag was present, @@ -2272,7 +2289,9 @@ SHOULD return `NBD_EPERM` if it receives a write or trim request on a read-only export. The server SHOULD NOT return `NBD_EOVERFLOW` except as documented in -response to `NBD_CMD_READ` when `NBD_CMD_FLAG_DF` is supported. +response to `NBD_CMD_READ` when `NBD_CMD_FLAG_DF` is supported, or when +a command without payload requests a length larger than an advertised +maximum block payload length. The server SHOULD NOT return `NBD_ENOTSUP` except as documented in response to `NBD_CMD_WRITE_ZEROES` when `NBD_CMD_FLAG_FAST_ZERO` is From patchwork Mon Nov 14 22:46:52 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Blake X-Patchwork-Id: 1703774 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=esytavvu; dkim-atps=neutral Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4NB4tS4BWgz20KC for ; Tue, 15 Nov 2022 10:16:00 +1100 (AEDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ouieZ-00046U-6l; Mon, 14 Nov 2022 18:14:11 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ouidu-0001hU-3i for qemu-devel@nongnu.org; Mon, 14 Nov 2022 18:13:30 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ouiEJ-0002Bj-Ku for qemu-devel@nongnu.org; Mon, 14 Nov 2022 17:47:06 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1668466022; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=CUp5obJtvpmNtSidYS8qWBe5fkx0ql+rljzG1ZWCr2I=; b=esytavvuiWhEgWl3j0WYM9ZA7rADzrwUHW//coCLnHZ6le4/aMnDMyso36D6AN1aO5bePS IaP18JZl411ga7NhKvVDC/fcVqUxanPPD2N8yqc74rB51ZkFkQW+wDMTApWlW1UsxKULJq z7nwo0ejINCqsf8nOoWQRovqka66gDk= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-529-QZGgsDmWNa2vQyc_I6EgCg-1; Mon, 14 Nov 2022 17:47:00 -0500 X-MC-Unique: QZGgsDmWNa2vQyc_I6EgCg-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 928143C01DF3; Mon, 14 Nov 2022 22:47:00 +0000 (UTC) Received: from green.redhat.com (unknown [10.2.16.240]) by smtp.corp.redhat.com (Postfix) with ESMTP id 1B2692024CC0; Mon, 14 Nov 2022 22:47:00 +0000 (UTC) From: Eric Blake To: nbd@other.debian.org Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org, libguestfs@redhat.com Subject: [PATCH v2 3/6] spec: Add NBD_OPT_EXTENDED_HEADERS Date: Mon, 14 Nov 2022 16:46:52 -0600 Message-Id: <20221114224655.2186173-4-eblake@redhat.com> In-Reply-To: <20221114224655.2186173-1-eblake@redhat.com> References: <20221114224141.cm5jgyxfmvie5xb5@redhat.com> <20221114224655.2186173-1-eblake@redhat.com> MIME-Version: 1.0 Content-type: text/plain X-Scanned-By: MIMEDefang 3.1 on 10.11.54.4 Received-SPF: pass client-ip=170.10.129.124; envelope-from=eblake@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -27 X-Spam_score: -2.8 X-Spam_bar: -- X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Add a new negotiation feature where the client and server agree to use larger packet headers on every packet sent during transmission phase. This has two purposes: first, it makes it possible to perform operations like trim, write zeroes, and block status on more than 2^32 bytes in a single command. For NBD_CMD_READ, replies are still implicitly capped by the maximum block payload limits (generally 32M); if you want to know if a hole larger than 32 bits can represent, you'll use BLOCK_STATUS instead of hoping that a large READ will either return a hole or report overflow. But for NBD_CMD_BLOCK_STATUS, it is very useful to be able to report a status extent with a size larger than 32-bits, in some cases even if the client's request was for smaller than 32-bits (such as when it is known that the entire image is not sparse). Thus, the wording chosen here is careful to permit a server to use either flavor status chunk type in its reply, and clients requesting extended headers must be prepared for both reply types. Second, when structured replies are active, clients have to deal with the difference between 16- and 20-byte headers of simple vs. structured replies, which impacts performance if the client must perform multiple syscalls to first read the magic before knowing if there are even additional bytes to read to learn a payload length. In extended header mode, all headers are the same width and there are no simple replies permitted. The layout of the reply header is more like the request header; and including the client's offset in the reply makes it easier to convert between absolute and relative offsets for replies to NBD_CMD_READ. Similarly, by having extended mode use a power-of-2 sizing, it becomes easier to manipulate arrays of headers without worrying about an individual header crossing a cache line. However, note that this change only affects the headers; data payloads can still be unaligned (for example, a client performing 1-byte reads or writes). We would need to negotiate yet another extension if we wanted to ensure that all NBD transmission packets started on an 8-byte boundary after option haggling has completed. This spec addition was done in parallel with proof of concept implementations in qemu (server and client), libnbd (client), and nbdkit (server). Signed-off-by: Eric Blake --- doc/proto.md | 481 ++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 358 insertions(+), 123 deletions(-) diff --git a/doc/proto.md b/doc/proto.md index 53c334a..fde1e70 100644 --- a/doc/proto.md +++ b/doc/proto.md @@ -280,34 +280,53 @@ a soft disconnect. ### Transmission -There are three message types in the transmission phase: the request, -the simple reply, and the structured reply chunk. The +There are two general message types in the transmission phase: the +request (simple or extended), and the reply (simple, structured, or +extended). The determination of which message headers to use is +determined during handshaking phase, based on whether +`NBD_OPT_STRUCTURED_REPLY` or `NBD_OPT_EXTENDED_HEADERS` was requested +by the client and given a successful response by the server. The transmission phase consists of a series of transactions, where the client submits requests and the server sends corresponding replies with either a single simple reply or a series of one or more -structured reply chunks per request. The phase continues until -either side terminates transmission; this can be performed cleanly -only by the client. +structured or extended reply chunks per request. The phase continues +until either side terminates transmission; this can be performed +cleanly only by the client. Note that without client negotiation, the server MUST use only simple replies, and that it is impossible to tell by reading the server traffic in isolation whether a data field will be present; the simple reply is also problematic for error handling of the `NBD_CMD_READ` -request. Therefore, structured replies can be used to create a -a context-free server stream; see below. +request. Therefore, structured or extended replies can be used to +create a a context-free server stream; see below. + +The results of client negotiation also determine whether the client +and server will utilize only compact requests and replies, or whether +both sides will use only extended packets. Compact messages are the +default, but inherently limit single transactions to a 32-bit window +starting at a 64-bit offset. Extended messages make it possible to +perform 64-bit transactions (although typically only for commands that +do not include a data payload). Furthermore, when only structured +replies have been negotiated, compact messages require the client to +perform partial reads to determine which reply packet style (16-byte +simple or 20-byte structured) is on the wire before knowing the length +of the rest of the reply, which can reduce client performance. With +extended messages, all packet headers have a fixed length of 32 bytes, +and although this results in more traffic over the network, the +resulting layout is friendlier for performance. Replies need not be sent in the same order as requests (i.e., requests -may be handled by the server asynchronously), and structured reply -chunks from one request may be interleaved with reply messages from -other requests; however, there may be constraints that prevent -arbitrary reordering of structured reply chunks within a given reply. +may be handled by the server asynchronously), and structured or +extended reply chunks from one request may be interleaved with reply +messages from other requests; however, there may be constraints that +prevent arbitrary reordering of reply chunks within a given reply. Clients SHOULD use a handle that is distinct from all other currently pending transactions, but MAY reuse handles that are no longer in flight; handles need not be consecutive. In each reply message -(whether simple or structured), the server MUST use the same value for -handle as was sent by the client in the corresponding request. In -this way, the client can correlate which request is receiving a -response. +(whether simple, structured, or extended), the server MUST use the +same value for handle as was sent by the client in the corresponding +request. In this way, the client can correlate which request is +receiving a response. #### Ordering of messages and writes @@ -344,7 +363,10 @@ may be useful. #### Request message -The request message, sent by the client, looks as follows: +The compact request message is sent by the client when extended +transactions are not in use (either the client did not request +extended headers during negotiation, or the server responded that +`NBD_OPT_EXTENDED_HEADERS` is unsupported), and looks as follows: C: 32 bits, 0x25609513, magic (`NBD_REQUEST_MAGIC`) C: 16 bits, command flags @@ -354,19 +376,54 @@ C: 64 bits, offset (unsigned) C: 32 bits, length (unsigned) C: (*length* bytes of data if the request is of type `NBD_CMD_WRITE`) +In the compact style, the client MUST NOT use the +`NBD_CMD_FLAG_PAYLOAD_LEN` flag; and the only command where *length* +represents payload length instead of effect length is `NBD_CMD_WRITE`. + +If negotiation agreed on extended transactions with +`NBD_OPT_EXTENDED_HEADERS`, the client instead uses extended requests: + +C: 32 bits, 0x21e41c71, magic (`NBD_EXTENDED_REQUEST_MAGIC`) +C: 16 bits, command flags +C: 16 bits, type +C: 64 bits, handle +C: 64 bits, offset (unsigned) +C: 64 bits, payload/effect length (unsigned) +C: (*length* bytes of data if *flags* includes `NBD_CMD_FLAG_PAYLOAD_LEN`) + +With extended headers, the meaning of the *length* field depends on +whether *flags* contains `NBD_CMD_FLAG_PAYLOAD_LEN` (that many +additional bytes of payload are present), or if the flag is absent +(there is no payload, and *length* instead is an effect length +describing how much of the image the request operates on). The +command `NBD_CMD_WRITE` MUST use the flag `NBD_CMD_FLAG_PAYLOAD_LEN` +in this mode; while other commands SHOULD avoid the flag if the +server has not indicated extension suppport for payloads on that +command. A server SHOULD initiate hard disconnect if a client sets +the `NBD_CMD_FLAG_PAYLOAD_LEN` flag and uses a *length* larger than +a server's advertised or default maximum payload length (capped at +32 bits by the constraints of `NBD_INFO_BLOCK_SIZE`); in all other +cases, a server SHOULD gracefully consume *length* bytes of payload +(even if it then replies with an `NBD_EINVAL` failure because the +particular command was not expecting a payload), and proceed with +the next client command. Thus, only when *length* is used as an +effective length will it utilize a full 64-bit value. + #### Simple reply message The simple reply message MUST be sent by the server in response to all -requests if structured replies have not been negotiated using -`NBD_OPT_STRUCTURED_REPLY`. If structured replies have been negotiated, a simple +requests if neither structured replies (`NBD_OPT_STRUCTURED_REPLY`) +nor extended headers (`NBD_OPT_EXTENDED_HEADERS`) have been +negotiated. If structured replies have been negotiated, a simple reply MAY be used as a reply to any request other than `NBD_CMD_READ`, -but only if the reply has no data payload. The message looks as -follows: +but only if the reply has no data payload. If extended headers have +been negotiated, a simple reply MUST NOT be used. The message looks +as follows: S: 32 bits, 0x67446698, magic (`NBD_SIMPLE_REPLY_MAGIC`; used to be `NBD_REPLY_MAGIC`) S: 32 bits, error (MAY be zero) -S: 64 bits, handle +S: 64 bits, handle (MUST match the request) S: (*length* bytes of data if the request is of type `NBD_CMD_READ` and *error* is zero) @@ -416,9 +473,9 @@ on the chunks received. A structured reply chunk message looks as follows: S: 32 bits, 0x668e33ef, magic (`NBD_STRUCTURED_REPLY_MAGIC`) -S: 16 bits, flags +S: 16 bits, reply flags S: 16 bits, type -S: 64 bits, handle +S: 64 bits, handle (MUST match the request) S: 32 bits, length of payload (unsigned) S: *length* bytes of payload data (if *length* is nonzero) @@ -426,6 +483,45 @@ The use of *length* in the reply allows context-free division of the overall server traffic into individual reply messages; the *type* field describes how to further interpret the payload. +#### Extended reply chunk message + +One implementation drawback of the original structured replies via +`NBD_OPT_STRUCTURED_REPLY` is that a client must read the magic number +to know whether it will then be reading a 16-byte or 20-byte header +from the server. Another is that an array of 20-byte structures is +not always cache-line friendly, compared to power-of-2 sizing. +Therefore, a client may instead attempt to negotiate extended headers +via `NBD_OPT_EXTENDED_HEADERS` in place of structured replies. + +If extended headers are negotiated, all server replies MUST use an +extended reply; the server MUST NOT use simple replies even when there +is no payload. Most fields in the extended reply have the same +semantics as their counterparts in structured replies (including the +use of `NBD_REPLY_FLAG_DONE` to end a sequence of one or more chunks +comprising the overall reply). The extended reply has an additional +field *offset* which MUST match the *offset* of the client's request +(even in reply types such as `NBD_REPLY_TYPE_OFFSET_DATA` where the +payload itself contains a different offset representing the middle of +the buffer). + +The other difference between extended replies and structured replies +is that the *length* field is 64 bits to match the layout of the +extended request header. However, note that while the request flags +determine whether a request uses a 32-bit payload length or a 64-bit +effect length, at this time the reply length is used solely for +payload lengths and so in practice will never exceed a 32-bit value +(see `NBD_INFO_BLOCK_SIZE`). + +An extended reply chunk message looks as follows: + +S: 32 bits, 0x6e8a278c, magic (`NBD_EXTENDED_REPLY_MAGIC`) +S: 16 bits, reply flags +S: 16 bits, type +S: 64 bits, handle (MUST match the request) +S: 64 bits, offset (MUST match the request) +S: 64 bits, length of payload (unsigned) +S: *length* bytes of payload data (if *length* is nonzero) + #### Terminating the transmission phase There are two methods of terminating the transmission phase: @@ -838,18 +934,19 @@ that length is larger than the advertised maximum block payload size. For commands that require a payload in either direction and where the client controls the payload length (`NBD_CMD_WRITE`, or `NBD_CMD_READ` -without structured replies), the client MUST NOT use a length larger -than the maximum block size. For replies where the payload length is -controlled by the server (`NBD_CMD_BLOCK_STATUS` without the flag -`NBD_CMD_FLAG_REQ_ONE`, or `NBD_CMD_READ`, both when structured -replies are negotiated), the server MUST NOT send an individual reply -chunk larger than the maximum payload size, although it may split the -overall reply among several chunks. For commands that do not require -a payload in either direction (such as `NBD_CMD_TRIM`), the client MAY -request a length larger than the maximum block size; the server SHOULD -NOT disconnect, but MAY reply with an `NBD_EOVERFLOW` or `NBD_EINVAL` -error if the oversize request would require more server resources than -the same command operating on only a maximum block size (such as some +without structured or extended replies), the client MUST NOT use a +length larger than the maximum block size. For replies where the +payload length is controlled by the server (`NBD_CMD_BLOCK_STATUS` +without the flag `NBD_CMD_FLAG_REQ_ONE`, or `NBD_CMD_READ`, both when +structured replies or extended headers are negotiated), the server +MUST NOT send an individual reply chunk larger than the maximum +payload size, although it may split the overall reply among several +chunks. For commands that do not require a payload in either +direction (such as `NBD_CMD_TRIM`), the client MAY request a length +larger than the maximum block size; the server SHOULD NOT disconnect, +but MAY reply with an `NBD_EOVERFLOW` or `NBD_EINVAL` error if the +oversize request would require more server resources than the same +command operating on only a maximum block size (such as some implementations of `NBD_CMD_WRITE_ZEROES` without the flag `NBD_CMD_FLAG_FAST_ZERO`, or `NBD_CMD_CACHE`). @@ -894,12 +991,11 @@ The procedure works as follows: - During transmission, a client can then indicate interest in metadata for a given region by way of the `NBD_CMD_BLOCK_STATUS` command, where *offset* and *length* indicate the area of interest. On - success, the server MUST respond with one structured reply chunk of - type `NBD_REPLY_TYPE_BLOCK_STATUS` per metadata context selected - during negotiation, where each reply chunk is a list of one or more - consecutive extents for that context. Each extent comes with a - *flags* field, the semantics of which are defined by the metadata - context. + success, the server MUST respond with one status type reply chunk + per metadata context selected during negotiation, where each reply + chunk is a list of one or more consecutive extents for that context. + Each extent comes with a *flags* field, the semantics of which are + defined by the metadata context. The client's requested *length* is only a hint to the server, so the cumulative extent length contained in a chunk of the server's reply @@ -920,9 +1016,10 @@ same export name for the subsequent `NBD_OPT_GO` (or sending `NBD_CMD_BLOCK_STATUS` without selecting at least one metadata context. -The reply to the `NBD_CMD_BLOCK_STATUS` request MUST be sent as a -structured reply; this implies that in order to use metadata querying, -structured replies MUST be negotiated first. +The reply to a successful `NBD_CMD_BLOCK_STATUS` request MUST be sent +via reply chunks; this implies that in order to use metadata querying, +either structured replies or extended headers MUST be negotiated +first. Metadata contexts are identified by their names. The name MUST consist of a namespace, followed by a colon, followed by a leaf-name. The @@ -1097,12 +1194,12 @@ The field has the following format: - bit 5, `NBD_FLAG_SEND_TRIM`: exposes support for `NBD_CMD_TRIM`. - bit 6, `NBD_FLAG_SEND_WRITE_ZEROES`: exposes support for `NBD_CMD_WRITE_ZEROES` and `NBD_CMD_FLAG_NO_HOLE`. -- bit 7, `NBD_FLAG_SEND_DF`: do not fragment a structured reply. The - server MUST set this transmission flag to 1 if the - `NBD_CMD_READ` request supports the `NBD_CMD_FLAG_DF` flag, and - MUST leave this flag clear if structured replies have not been - negotiated. Clients MUST NOT set the `NBD_CMD_FLAG_DF` request - flag unless this transmission flag is set. +- bit 7, `NBD_FLAG_SEND_DF`: do not fragment a structured or extended + reply. The server MUST set this transmission flag to 1 if the + `NBD_CMD_READ` request supports the `NBD_CMD_FLAG_DF` flag, and MUST + leave this flag clear if neither structured replies nor extended + headers have been negotiated. Clients MUST NOT set the + `NBD_CMD_FLAG_DF` request flag unless this transmission flag is set. - bit 8, `NBD_FLAG_CAN_MULTI_CONN`: Indicates that the server operates entirely without cache, or that the cache it uses is shared among all connections to the given device. In particular, if this flag is @@ -1212,10 +1309,10 @@ of the newstyle negotiation. When this command succeeds, the server MUST NOT preserve any negotiation state (such as a request for - `NBD_OPT_STRUCTURED_REPLY`, or metadata contexts from - `NBD_OPT_SET_META_CONTEXT`) issued before this command. A client - SHOULD defer all stateful option requests until after it - determines whether encryption is available. + `NBD_OPT_STRUCTURED_REPLY` or `NBD_OPT_EXTENDED_HEADERS`, or + metadata contexts from `NBD_OPT_SET_META_CONTEXT`) issued before + this command. A client SHOULD defer all stateful option requests + until after it determines whether encryption is available. See the section on TLS above for further details. @@ -1296,6 +1393,10 @@ of the newstyle negotiation. `NBD_REP_ERR_BLOCK_SIZE_REQD` error SHOULD ensure it first sends an `NBD_INFO_BLOCK_SIZE` information reply in order to help avoid a potentially unnecessary round trip. + - `NBD_REP_ERR_EXT_HEADER_REQD`: The server requires the client to + utilize extended headers. While this may make it easier to + implement a server with fewer considerations for backwards + compatibility, it limits connections to only recent clients. Additionally, if TLS has not been initiated, the server MAY reply with `NBD_REP_ERR_TLS_REQD` (instead of `NBD_REP_ERR_UNKNOWN`) to @@ -1350,6 +1451,9 @@ of the newstyle negotiation. server MUST use structured replies to the `NBD_CMD_READ` transmission request. Other extensions that require structured replies may now be negotiated. + - `NBD_REP_ERR_EXT_HEADER_REQD`: The client has already + successfully negotiated extended headers, which takes precedence + over this option. - For backwards compatibility, clients SHOULD be prepared to also handle `NBD_REP_ERR_UNSUP`; in this case, no structured replies will be sent. @@ -1357,9 +1461,10 @@ of the newstyle negotiation. It is envisioned that future extensions will add other new requests that may require a data payload in the reply. A server that supports such extensions SHOULD NOT advertise those - extensions until the client negotiates structured replies; and a - client MUST NOT make use of those extensions without first - enabling the `NBD_OPT_STRUCTURED_REPLY` extension. + extensions until the client has negotiated either structured + replies or extended headers; and a client MUST NOT make use of + those extensions without first enabling support for reply + payloads. If the client requests `NBD_OPT_STARTTLS` after this option, it MUST renegotiate structured replies and any other dependent @@ -1370,9 +1475,10 @@ of the newstyle negotiation. Return a list of `NBD_REP_META_CONTEXT` replies, one per context, followed by an `NBD_REP_ACK` or an error. - This option SHOULD NOT be requested unless structured replies have - been negotiated first. If a client attempts to do so, a server - MAY send `NBD_REP_ERR_INVALID`. + This option SHOULD NOT be requested unless structured replies or + extended headers have been negotiated first. If a client attempts + to do so, a server MAY send `NBD_REP_ERR_INVALID` or + `NBD_REP_ERR_EXT_HEADER_REQD`. Data: - 32 bits, length of export name. @@ -1454,9 +1560,10 @@ of the newstyle negotiation. they are interested in are selected with the final query that they sent. - This option MUST NOT be requested unless structured replies have - been negotiated first. If a client attempts to do so, a server - SHOULD send `NBD_REP_ERR_INVALID`. + This option MUST NOT be requested unless structured replies or + extended headers have been negotiated first. If a client attempts + to do so, a server SHOULD send `NBD_REP_ERR_INVALID` or + `NBD_REP_ERR_EXT_HEADER_REQD`. A client MUST NOT send `NBD_CMD_BLOCK_STATUS` unless within the negotiation phase it sent `NBD_OPT_SET_META_CONTEXT` at least @@ -1493,6 +1600,46 @@ of the newstyle negotiation. option does not select any metadata context, provided the client then does not attempt to issue `NBD_CMD_BLOCK_STATUS` commands. +* `NBD_OPT_EXTENDED_HEADERS` (11) + + The client wishes to use extended headers during the transmission + phase. The client MUST NOT send any additional data with the + option, and the server SHOULD reject a request that includes data + with `NBD_REP_ERR_INVALID`. + + When successful, this option takes precedence over structured + replies. A client MAY request structured replies first, although + a server SHOULD support this option even if structured replies are + not negotiated. + + It is envisioned that future extensions will add other new + requests that support a data payload in the request or reply. A + server that supports such extensions SHOULD NOT advertise those + extensions until the client has negotiated extended headers; and a + client MUST NOT make use of those extensions without first + enabling support for reply payloads. + + The server replies with the following, or with an error permitted + elsewhere in this document: + + - `NBD_REP_ACK`: Extended headers have been negotiated; the client + MUST use the 32-byte extended request header, with proper use of + `NBD_CMD_FLAG_PAYLOAD_LEN` for all commands sending a payload; + and the server MUST use the 32-byte extended reply header. + - For backwards compatibility, clients SHOULD be prepared to also + handle `NBD_REP_ERR_UNSUP`; in this case, only the compact + transmission headers will be used. + + Note that a response of `NBD_REP_ERR_BLOCK_SIZE_REQD` does not + make sense in response to this command, but a server MAY fail with + that error for a later `NBD_OPT_GO` without a client request for + `NBD_INFO_BLOCK_SIZE`, since the use of extended headers provides + more incentive for a client to promise to obey block size + constraints. + + If the client requests `NBD_OPT_STARTTLS` after this option, it + MUST renegotiate extended headers. + #### Option reply types These values are used in the "reply type" field, sent by the server @@ -1672,6 +1819,16 @@ case that data is an error message string suitable for display to the user. The request or the reply is too large to process. +* `NBD_REP_ERR_EXT_HEADER_REQD` (2^31 + 10) + + The server is unwilling to proceed with the given request (such as + `NBD_OPT_GO` or `NBD_OPT_SET_META_CONTEXT`) for a client that has + not yet requested extended headers (via + `NBD_OPT_EXTENDED_HEADERS`). This error is also used to indicate + that the server is unable to downgrade to structured replies (via + `NBD_OPT_STRUCTURED_REPLY`) if extended headers have already been + enabled. + ### Transmission phase #### Flag fields @@ -1725,6 +1882,17 @@ valid may depend on negotiation during the handshake phase. `NBD_CMD_WRITE`, then the server MUST fail quickly with an error of `NBD_ENOTSUP`. The client MUST NOT set this unless the server advertised `NBD_FLAG_SEND_FAST_ZERO`. +- bit 5, `NBD_CMD_FLAG_PAYLOAD_LEN`; only valid if extended headers + were negotiated via `NBD_OPT_EXTENDED_HEADERS`. If set, the + *length* field of the header describes a (nonzero) payload length of + data to be sent alongside the header; if the flag is clear, the + header *length* is instead an effect length for the command's + interaction with the export, and there is no payload after the + header. With extended headers, the flag MUST be set for + `NBD_CMD_WRITE` (as the write command always sends a payload of the + bytes to be written); for other commands, the flag will trigger an + `NBD_EINVAL` error unless the server has advertised support for an + extension payload form for the command. ##### Structured reply flags @@ -1746,13 +1914,15 @@ unrecognized flags. #### Structured reply types -These values are used in the "type" field of a structured reply. -Some chunk types can additionally be categorized by role, such as -*error chunks* or *content chunks*. Each type determines how to -interpret the "length" bytes of payload. If the client receives -an unknown or unexpected type, other than an *error chunk*, it -MUST initiate a hard disconnect. A server MUST NOT send a chunk -larger than any advertised maximum block payload size. +These values are used in the "type" field of a structured reply. Some +chunk types can additionally be categorized by role, such as *error +chunks*, *content chunks*, or *status chunks*. Each type determines +how to interpret the "length" bytes of payload. If the client +receives an unknown or unexpected type, other than an *error chunk*, +it MAY initiate a hard disconnect on the grounds that the client is +uncertain whether the server handled the request as desired. A server +MUST NOT send a chunk larger than any advertised maximum block payload +size. * `NBD_REPLY_TYPE_NONE` (0) @@ -1795,13 +1965,28 @@ larger than any advertised maximum block payload size. 64 bits: offset (unsigned) 32 bits: hole size (unsigned, MUST be nonzero) + At this time, although servers that support extended headers are + permitted to accept client requests for `NBD_CMD_READ` with an + effect length larger than any advertised maximum block payload size + by splitting the reply into multiple chunks, portable clients SHOULD + NOT request a read *length* larger than 32 bits (corresponding to + the maximum block payload constraint implied by + `NBD_INFO_BLOCK_SIZE`), and therefore a 32-bit constraint on the + *hole size* does not represent an arbitrary limitation. Should a + future scenario arise where it can be demonstrated that a client and + server would benefit from an extension allowing a maximum block + payload size to be larger than 32 bits, that extension would also + introduce a counterpart reply type that can express a 64-bit *hole + size*. + * `NBD_REPLY_TYPE_BLOCK_STATUS` (5) - *length* MUST be 4 + (a positive integer multiple of 8). This reply - represents a series of consecutive block descriptors where the sum - of the length fields within the descriptors is subject to further - constraints documented below. A successful block status request MUST - have exactly one status chunk per negotiated metadata context ID. + This chunk type is in the status chunk category. *length* MUST be + 4 + (a positive integer multiple of 8). This reply represents a + series of consecutive block descriptors where the sum of the length + fields within the descriptors is subject to further constraints + documented below. A successful block status request MUST have + exactly one status chunk per negotiated metadata context ID. The payload starts with: @@ -1843,6 +2028,43 @@ larger than any advertised maximum block payload size. extent information at the first offset not covered by a reduced-length reply. +* `NBD_REPLY_TYPE_BLOCK_STATUS_EXT` (6) + + This chunk type is in the status chunk category. *length* MUST be + 8 + (a positive multiple of 16). The semantics of this chunk mirror + those of `NBD_REPLY_TYPE_BLOCK_STATUS`, other than the use of a + larger *extent length* field, added padding in each descriptor to + ease alignment, and the addition of a *descriptor count* field that + can be used for easier client processing. This chunk type MUST NOT + be used unless extended headers were negotiated with + `NBD_OPT_EXTENDED_HEADERS`. + + If the *descriptor count* field contains 0, the number of subsequent + descriptors is determined solely by the *length* field of the reply + header. However, the server MAY populate the *descriptor count* + field with the number of descriptors that follow; when doing this, + the server MUST ensure that the header *length* is equal to + *descriptor count* * 16 + 8. + + The payload starts with: + + 32 bits, metadata context ID + 32 bits, descriptor count + + and is followed by a list of one or more descriptors, each with this + layout: + + 64 bits, length of the extent to which the status below + applies (unsigned, MUST be nonzero) + 32 bits, padding (MUST be zero) + 32 bits, status flags + + Note that even when extended headers are in use, the client MUST be + prepared for the server to use either the compact or extended chunk + type, regardless of whether the client's hinted effect length was + more or less than 32 bits; but the server MUST use exactly one of + the two chunk types per negotiated metacontext ID. + All error chunk types have bit 15 set, and begin with the same *error*, *message length*, and optional *message* fields as `NBD_REPLY_TYPE_ERROR`. If nonzero, *message length* indicates @@ -1857,8 +2079,9 @@ remaining structured fields at the end. and the client MAY NOT make any assumptions about partial success. This type SHOULD NOT be used more than once in a structured reply. Valid as a reply to any request. Note that - *message length* MUST NOT exceed the 4096 bytes string length - limit. + *message length* MUST NOT exceed the 4096 bytes string length limit, + and therefore there is no need for a counterpart extended-length + error chunk type. The payload is structured as: @@ -1905,19 +2128,21 @@ The following request types exist: A read request. Length and offset define the data to be read. The server MUST reply with either a simple reply or a structured - reply, according to whether the structured replies have been - negotiated using `NBD_OPT_STRUCTURED_REPLY`. The client SHOULD NOT - request a read length of 0; the behavior of a server on such a + reply, according to whether structured replies + (`NBD_OPT_STRUCTURED_REPLY`) or extended headers + (`NBD_OPT_EXTENDED_HEADERS`) were negotiated. The client SHOULD + NOT request a read length of 0; the behavior of a server on such a request is unspecified although the server SHOULD NOT disconnect. *Simple replies* - If structured replies were not negotiated, then a read request - MUST always be answered by a simple reply, as documented above - (using magic 0x67446698 `NBD_SIMPLE_REPLY_MAGIC`, and containing - length bytes of data according to the client's request), which in - turn means any client request with a length larger than the - maximum block payload size will fail. + If neither structured replies nor extended headers were + negotiated, then a read request MUST always be answered by a + simple reply, as documented above (using magic 0x67446698 + `NBD_SIMPLE_REPLY_MAGIC`, and containing length bytes of data + according to the client's request), which in turn means any client + request with a length larger than the maximum block payload size + will fail. If an error occurs, the server SHOULD set the appropriate error code in the error field. The server MAY then initiate a hard disconnect. @@ -1930,13 +2155,15 @@ The following request types exist: *Structured replies* - If structured replies are negotiated, then a read request MUST - result in a structured reply with one or more chunks (each using - magic 0x668e33ef `NBD_STRUCTURED_REPLY_MAGIC`), where the final - chunk has the flag `NBD_REPLY_FLAG_DONE`, and with the following - additional constraints. + If structured replies or extended headers are negotiated, then a + read request MUST result in a reply with one or more structured + chunks (each using `NBD_STRUCTURED_REPLY_MAGIC` or + `NBD_EXTENDED_REPLY_MAGIC` according to what was negotiated), + where the final chunk has the flag `NBD_REPLY_FLAG_DONE`, and with + the following additional constraints. - The server MAY split the reply into any number of content chunks; + The server MAY split the reply into any number of content chunks + (`NBD_REPLY_TYPE_OFFSET_DATA` and `NBD_REPLY_TYPE_OFFSET_HOLE`); each chunk MUST describe at least one byte, although to minimize overhead, the server SHOULD use chunks with lengths and offsets as an integer multiple of 512 bytes, where possible (the first and @@ -1949,12 +2176,13 @@ The following request types exist: send additional content chunks even after reporting an error chunk. A server MAY support read requests larger than the maximum block payload size by splitting the response across multiple - chunks (in particular, a request for more than 2^32 - 8 bytes - containing data rather than holes MUST be split to avoid - overflowing the `NBD_REPLY_TYPE_OFFSET_DATA` length field); - however, the server is also permitted to reject large read - requests up front, so a client should be prepared to retry with - smaller requests if a large request fails. + chunks (in particular, if extended headers are not in use, a + request for more than 2^32 - 8 bytes containing data rather than + holes MUST be split to avoid overflowing the 32-bit + `NBD_REPLY_TYPE_OFFSET_DATA` length field); however, the server is + also permitted to reject large read requests up front, so a client + should be prepared to retry with smaller requests if a large + request fails. When no error is detected, the server MUST send enough data chunks to cover the entire region described by the offset and length of @@ -2180,10 +2408,10 @@ The following request types exist: * `NBD_CMD_BLOCK_STATUS` (7) - A block status query request. Length and offset define the range - of interest. The client SHOULD NOT request a status length of 0; - the behavior of a server on such a request is unspecified although - the server SHOULD NOT disconnect. + A block status query request. The effect length is a hint to the + server about the range of interest. The client SHOULD NOT request + a status length of 0; the behavior of a server on such a request + is unspecified although the server SHOULD NOT disconnect. A client MUST NOT send `NBD_CMD_BLOCK_STATUS` unless within the negotiation phase it sent `NBD_OPT_SET_META_CONTEXT` at least @@ -2191,28 +2419,29 @@ The following request types exist: same export name used to enter transmission phase, and where the server returned at least one metadata context without an error. This in turn requires the client to first negotiate structured - replies. For a successful return, the server MUST use a structured - reply, containing exactly one chunk of type - `NBD_REPLY_TYPE_BLOCK_STATUS` per selected context id, where the - status field of each descriptor is determined by the flags field - as defined by the metadata context. The server MAY send chunks in - a different order than the context ids were assigned in reply to - `NBD_OPT_SET_META_CONTEXT`. + replies or extended headers. For a successful return, the server + MUST use one reply chunk per selected context id (only + `NBD_REPLY_TYPE_BLOCK_STATUS` for structured replies, and either + `NBD_REPLY_TYPE_BLOCK_STATUS` or `NBD_REPLY_TYPE_BLOCK_STATUS_EXT` + for extended headers). The status field of each descriptor is + determined by the flags field as defined by the metadata context. + The server MAY send chunks in a different order than the context + ids were assigned in reply to `NBD_OPT_SET_META_CONTEXT`. - The list of block status descriptors within the - `NBD_REPLY_TYPE_BLOCK_STATUS` chunk represent consecutive portions - of the file starting from specified *offset*. If the client used - the `NBD_CMD_FLAG_REQ_ONE` flag, each chunk contains exactly one - descriptor where the *length* of the descriptor MUST NOT be - greater than the *length* of the request; otherwise, a chunk MAY - contain multiple descriptors, and the final descriptor MAY extend - beyond the original requested size if the server can determine a - larger length without additional effort. On the other hand, the - server MAY return less data than requested. In particular, a - server SHOULD NOT send more than 2^20 status descriptors in a - single chunk. However the server MUST return at least one status - descriptor, and since each status descriptor has a non-zero - length, a client can always make progress on a successful return. + The list of block status descriptors within a given status chunk + represent consecutive portions of the file starting from specified + *offset*. If the client used the `NBD_CMD_FLAG_REQ_ONE` flag, + each chunk contains exactly one descriptor where the *length* of + the descriptor MUST NOT be greater than the *length* of the + request; otherwise, a chunk MAY contain multiple descriptors, and + the final descriptor MAY extend beyond the original requested size + if the server can determine a larger length without additional + effort. On the other hand, the server MAY return less data than + requested. In particular, a server SHOULD NOT send more than 2^20 + status descriptors in a single chunk. However the server MUST + return at least one status descriptor, and since each status + descriptor has a non-zero length, a client can always make + progress on a successful return. The server SHOULD use different *status* values between consecutive descriptors where feasible, although the client SHOULD @@ -2412,6 +2641,10 @@ implement the following features: Clients that do not implement querying for block size constraints SHOULD abide by the rules laid out in the section "Block size constraints", above. +- Servers that implement extended headers but desire interoperability + with older client implementations SHOULD NOT use the + `NBD_REP_ERR_EXT_HEADER_REQD` error response to `NBD_OPT_GO`, but + should gracefully support compact headers. ### Future considerations @@ -2421,6 +2654,8 @@ implementations are not yet ready to support them: - Structured replies; the Linux kernel currently does not yet implement them. +- Extended headers; these are similar to structured replies, but is + new enough that many clients and servers do not yet implement them. ## About this file From patchwork Mon Nov 14 22:46:53 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Blake X-Patchwork-Id: 1703773 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=TzpYaCRB; dkim-atps=neutral Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4NB4sV3Qchz20KC for ; Tue, 15 Nov 2022 10:15:10 +1100 (AEDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ouieW-0003tg-NA; Mon, 14 Nov 2022 18:14:09 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ouidu-0001m0-1t for qemu-devel@nongnu.org; Mon, 14 Nov 2022 18:13:30 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ouiEL-0002CM-Ea for qemu-devel@nongnu.org; Mon, 14 Nov 2022 17:47:07 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1668466024; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=uuaSAbLULdKS6n+Mze1ExTD67+B/CL/4oJ/FeniV1kM=; b=TzpYaCRBnu7b9EHLY5wjcyRqXPDOq4glz4MTf23dEXCv4a9YXyN8bpzzJtqZL3+pnLuzqO YpiH2s2FNbZHtr+hCKdsM2XaK7hAk3F6drGFdvu8ipVjB136q61G2k9/LfhPeNYzQrgWBc 7y//cytOcMqR0TeOLNmkRaV8kgaNJ7s= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-533-dgxc94uaPS2864B-M07Ixw-1; Mon, 14 Nov 2022 17:47:01 -0500 X-MC-Unique: dgxc94uaPS2864B-M07Ixw-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 23FBD811E84; Mon, 14 Nov 2022 22:47:01 +0000 (UTC) Received: from green.redhat.com (unknown [10.2.16.240]) by smtp.corp.redhat.com (Postfix) with ESMTP id B1DA12024CC0; Mon, 14 Nov 2022 22:47:00 +0000 (UTC) From: Eric Blake To: nbd@other.debian.org Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org, libguestfs@redhat.com Subject: [PATCH v2 4/6] spec: Allow 64-bit block status results Date: Mon, 14 Nov 2022 16:46:53 -0600 Message-Id: <20221114224655.2186173-5-eblake@redhat.com> In-Reply-To: <20221114224655.2186173-1-eblake@redhat.com> References: <20221114224141.cm5jgyxfmvie5xb5@redhat.com> <20221114224655.2186173-1-eblake@redhat.com> MIME-Version: 1.0 Content-type: text/plain X-Scanned-By: MIMEDefang 3.1 on 10.11.54.4 Received-SPF: pass client-ip=170.10.129.124; envelope-from=eblake@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -27 X-Spam_score: -2.8 X-Spam_bar: -- X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org There are some potential extension metadata contexts that would benefit from a 64-bit status value. For example, Zoned Block Devices (see https://zonedstorage.io/docs/linux/zbd-api) may want to return the relative offset of where the next write will occur within the zone, where a zone may be larger than 4G; creating a metacontext "zbd:offset" that returns a 64-bit offset seems nicer than creating two metacontexts "zbd:offset_lo" and "zbd:offset_hi" that each return only 32 bits of the answer. While the addition of extended headers superficially justified leaving room in NBD_REPLY_TYPE_BLOCK_STATUS_EXT for the purpose of alignment, it also has the nice benefit of being useful to allow extension metadata contexts that can actually take advantage of the padding (and remembering that since network byte order is big-endian, the padding is in the correct location). To ensure maximum backwards compatibility, require that all contexts in the "base:" namespace (so far, just "base:allocation") will only utilize 32-bit status. --- doc/proto.md | 62 +++++++++++++++++++++++++++++++++++++++++----------- 1 file changed, 49 insertions(+), 13 deletions(-) diff --git a/doc/proto.md b/doc/proto.md index fde1e70..14af48d 100644 --- a/doc/proto.md +++ b/doc/proto.md @@ -987,7 +987,10 @@ The procedure works as follows: during transmission, the client MUST select one or more metadata contexts with the `NBD_OPT_SET_META_CONTEXT` command. If needed, the client can use `NBD_OPT_LIST_META_CONTEXT` to list contexts that the - server supports. + server supports. Most metadata contexts expose no more than 32 bits + of information, but some metadata contexts have associated data that + is 64 bits in length; using such contexts requires the client to + first negotiate extended headers with `NBD_OPT_EXTENDED_HEADERS`. - During transmission, a client can then indicate interest in metadata for a given region by way of the `NBD_CMD_BLOCK_STATUS` command, where *offset* and *length* indicate the area of interest. On @@ -1045,7 +1048,7 @@ third-party namespaces are currently registered: Save in respect of the `base:` namespace described below, this specification requires no specific semantics of metadata contexts, except that all the information they provide MUST be representable within the flags field as -defined for `NBD_REPLY_TYPE_BLOCK_STATUS`. Likewise, save in respect of +defined for `NBD_REPLY_TYPE_BLOCK_STATUS_EXT`. Likewise, save in respect of the `base:` namespace, the syntax of query strings is not specified by this document, other than the recommendation that the empty leaf-name makes sense as a wildcard for a client query during `NBD_OPT_LIST_META_CONTEXT`, @@ -1112,7 +1115,9 @@ should make no assumption as to its contents or stability. For the `base:allocation` context, the remainder of the flags field is reserved. Servers SHOULD set it to all-zero; clients MUST ignore -unknown flags. +unknown flags. Because fewer than 32 flags are defined, this metadata +context does not require the use of `NBD_OPT_EXTENDED_HEADERS`, and a +server can use `NBD_REPLY_TYPE_BLOCK_STATUS` to return results. ## Values @@ -1480,6 +1485,18 @@ of the newstyle negotiation. to do so, a server MAY send `NBD_REP_ERR_INVALID` or `NBD_REP_ERR_EXT_HEADER_REQD`. + A server MAY support extension contexts that produce status values + that require more than 32 bits. The server MAY advertise such + contexts even if the client has not yet negotiated extended + headers, although it SHOULD then conclude the overall response + with the `NBD_REP_ERR_EXT_HEADER_REQD` error to inform the client + that extended headers are required to make full use of all + contexts advertised. However, since none of the contexts defined + in the "base:" namespace provide more than 32 bits of status, a + server MUST NOT use this failure mode when the response is limited + to the "base:" namespace; nor may the server use this failure mode + when the client has already negotiated extended headers. + Data: - 32 bits, length of export name. - String, name of export for which we wish to list metadata @@ -1565,6 +1582,13 @@ of the newstyle negotiation. to do so, a server SHOULD send `NBD_REP_ERR_INVALID` or `NBD_REP_ERR_EXT_HEADER_REQD`. + If a client requests a metadata context that utilizes 64-bit + status, but has not yet negotiated extended headers, the server + MUST either omit that context from its successful reply, or else + fail the request with `NBD_REP_ERR_EXT_HEADER_REQD`. The server + MUST NOT use this failure for a client request that is limited to + contexts in the "base:" namespace. + A client MUST NOT send `NBD_CMD_BLOCK_STATUS` unless within the negotiation phase it sent `NBD_OPT_SET_META_CONTEXT` at least once, and where the final time it was sent, it referred to the @@ -2028,16 +2052,23 @@ size. extent information at the first offset not covered by a reduced-length reply. + For an extension metadata context that documents that the status + value may potentially occupy 64 bits, a server MUST NOT use this + reply type unless the most-significant 32 bits of all *status* + values included in this reply are all zeroes. Note that if the + client did not negotiate extended headers, then the server already + guaranteed during the handshake phase that no metadata contexts + utilizing a 64-bit status value were negotiated. + * `NBD_REPLY_TYPE_BLOCK_STATUS_EXT` (6) This chunk type is in the status chunk category. *length* MUST be 8 + (a positive multiple of 16). The semantics of this chunk mirror those of `NBD_REPLY_TYPE_BLOCK_STATUS`, other than the use of a - larger *extent length* field, added padding in each descriptor to - ease alignment, and the addition of a *descriptor count* field that - can be used for easier client processing. This chunk type MUST NOT - be used unless extended headers were negotiated with - `NBD_OPT_EXTENDED_HEADERS`. + larger *extent length* field and a 64-bit *status* field, and the + addition of a *descriptor count* field that can be used for easier + client processing. This chunk type MUST NOT be used unless extended + headers were negotiated with `NBD_OPT_EXTENDED_HEADERS`. If the *descriptor count* field contains 0, the number of subsequent descriptors is determined solely by the *length* field of the reply @@ -2056,14 +2087,19 @@ size. 64 bits, length of the extent to which the status below applies (unsigned, MUST be nonzero) - 32 bits, padding (MUST be zero) - 32 bits, status flags + 64 bits, status flags Note that even when extended headers are in use, the client MUST be prepared for the server to use either the compact or extended chunk - type, regardless of whether the client's hinted effect length was - more or less than 32 bits; but the server MUST use exactly one of - the two chunk types per negotiated metacontext ID. + type for metadata contexts, regardless of whether the client's + hinted effect length was more or less than 32 bits; but the server + MUST use exactly one of the two chunk types per negotiated + metacontext ID. However, the server MUST use the extended chunk + type when responding to an extension metadata context that utilizes + a 64-bit status code where the resulting *status* value is not + representable in 32 bits. For metadata contexts that only return a + 32-bit status (including all contexts in the "base:" namespace), the + most-significant 32 bits of *status* MUST be all zeroes. All error chunk types have bit 15 set, and begin with the same *error*, *message length*, and optional *message* fields as From patchwork Mon Nov 14 22:48:38 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Blake X-Patchwork-Id: 1703808 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=C8/Lw3vy; dkim-atps=neutral Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4NB5ps6blFz23mH for ; Tue, 15 Nov 2022 10:57:56 +1100 (AEDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ouieR-0003cQ-Bq; Mon, 14 Nov 2022 18:14:03 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ouids-0001o2-Dj for qemu-devel@nongnu.org; Mon, 14 Nov 2022 18:13:28 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ouiGF-0002Mr-0b for qemu-devel@nongnu.org; Mon, 14 Nov 2022 17:49:04 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1668466142; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=1NrkZVHAocJOLpKWjaSIw4pPOaz/H1tbWxK3Giv3VKM=; b=C8/Lw3vyDOgpxP8QdJTkqgtzQsNxoMq3D6FBJ/5gvpNsCk9XKofZG7QSBgpeWp5w/F4nHN IexFLZwH0QnLm2A/jYg+9mSEiXuCyyTkfYHr6S7H6Ey+KRItZ+alh98ldo72sJx4eaIoFj 28RZcmXFxA+xVzkVXl7/tjqL7cT3fLI= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-183-ngUCwPe3OpGx83CvUIPf1w-1; Mon, 14 Nov 2022 17:48:58 -0500 X-MC-Unique: ngUCwPe3OpGx83CvUIPf1w-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 7FB3B101AA45; Mon, 14 Nov 2022 22:48:58 +0000 (UTC) Received: from green.redhat.com (unknown [10.2.16.240]) by smtp.corp.redhat.com (Postfix) with ESMTP id 0940D40AE7EF; Mon, 14 Nov 2022 22:48:57 +0000 (UTC) From: Eric Blake To: qemu-devel@nongnu.org Cc: qemu-block@nongnu.org, libguestfs@redhat.com, nbd@other.debian.org, Vladimir Sementsov-Ogievskiy Subject: [PATCH v2 05/15] nbd/server: Refactor handling of request payload Date: Mon, 14 Nov 2022 16:48:38 -0600 Message-Id: <20221114224848.2186298-6-eblake@redhat.com> In-Reply-To: <20221114224848.2186298-1-eblake@redhat.com> References: <20221114224141.cm5jgyxfmvie5xb5@redhat.com> <20221114224848.2186298-1-eblake@redhat.com> MIME-Version: 1.0 Content-type: text/plain X-Scanned-By: MIMEDefang 3.1 on 10.11.54.2 Received-SPF: pass client-ip=170.10.129.124; envelope-from=eblake@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -27 X-Spam_score: -2.8 X-Spam_bar: -- X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Upcoming additions to support NBD 64-bit effect lengths allow for the possibility to distinguish between payload length (capped at 32M) and effect length (up to 63 bits). Without that extension, only the NBD_CMD_WRITE request has a payload; but with the extension, it makes sense to allow at least NBD_CMD_BLOCK_STATUS to have both a payload and effect length (where the payload is a limited-size struct that in turns gives the real effect length as well as a subset of known ids for which status is requested). Other future NBD commands may also have a request payload, so the 64-bit extension introduces a new NBD_CMD_FLAG_PAYLOAD_LEN that distinguishes between whether the header length is a payload length or an effect length, rather than hard-coding the decision based on the command. Note that we do not support the payload version of BLOCK_STATUS yet. For this patch, no semantic change is intended for a compliant client. For a non-compliant client, it is possible that the error behavior changes (a different message, a change on whether the connection is killed or remains alive for the next command, or so forth), but all errors should still be handled gracefully. Signed-off-by: Eric Blake --- nbd/server.c | 53 ++++++++++++++++++++++++++++++++---------------- nbd/trace-events | 1 + 2 files changed, 37 insertions(+), 17 deletions(-) diff --git a/nbd/server.c b/nbd/server.c index 7738f5f899..ad5c2052b5 100644 --- a/nbd/server.c +++ b/nbd/server.c @@ -2316,6 +2316,8 @@ static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request, Error **errp) { NBDClient *client = req->client; + bool extended_with_payload; + int payload_len = 0; int valid_flags; int ret; @@ -2329,27 +2331,40 @@ static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request, trace_nbd_co_receive_request_decode_type(request->handle, request->type, nbd_cmd_lookup(request->type)); - if (request->type != NBD_CMD_WRITE) { - /* No payload, we are ready to read the next request. */ - req->complete = true; - } - if (request->type == NBD_CMD_DISC) { /* Special case: we're going to disconnect without a reply, * whether or not flags, from, or len are bogus */ + req->complete = true; return -EIO; } + /* Payload and buffer handling. */ + extended_with_payload = client->extended_headers && + (request->flags & NBD_CMD_FLAG_PAYLOAD_LEN); if (request->type == NBD_CMD_READ || request->type == NBD_CMD_WRITE || - request->type == NBD_CMD_CACHE) - { + request->type == NBD_CMD_CACHE || extended_with_payload) { if (request->len > NBD_MAX_BUFFER_SIZE) { error_setg(errp, "len (%" PRIu64" ) is larger than max len (%u)", request->len, NBD_MAX_BUFFER_SIZE); return -EINVAL; } - if (request->type != NBD_CMD_CACHE) { + if (request->type == NBD_CMD_WRITE || extended_with_payload) { + payload_len = request->len; + if (request->type != NBD_CMD_WRITE) { + /* + * For now, we don't support payloads on other + * commands; but we can keep the connection alive. + */ + request->len = 0; + } else if (client->extended_headers && !extended_with_payload) { + /* The client is noncompliant. Trace it, but proceed. */ + trace_nbd_co_receive_ext_payload_compliance(request->from, + request->len); + } + } + + if (request->type == NBD_CMD_WRITE || request->type == NBD_CMD_READ) { req->data = blk_try_blockalign(client->exp->common.blk, request->len); if (req->data == NULL) { @@ -2359,18 +2374,20 @@ static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request, } } - if (request->type == NBD_CMD_WRITE) { - assert(request->len <= NBD_MAX_BUFFER_SIZE); - if (nbd_read(client->ioc, req->data, request->len, "CMD_WRITE data", - errp) < 0) - { + if (payload_len) { + if (req->data) { + ret = nbd_read(client->ioc, req->data, payload_len, + "CMD_WRITE data", errp); + } else { + ret = nbd_drop(client->ioc, payload_len, errp); + } + if (ret < 0) { return -EIO; } - req->complete = true; - trace_nbd_co_receive_request_payload_received(request->handle, - request->len); + payload_len); } + req->complete = true; /* Sanity checks. */ if (client->exp->nbdflags & NBD_FLAG_READ_ONLY && @@ -2400,7 +2417,9 @@ static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request, client->check_align); } valid_flags = NBD_CMD_FLAG_FUA; - if (request->type == NBD_CMD_READ && client->structured_reply) { + if (request->type == NBD_CMD_WRITE && client->extended_headers) { + valid_flags |= NBD_CMD_FLAG_PAYLOAD_LEN; + } else if (request->type == NBD_CMD_READ && client->structured_reply) { valid_flags |= NBD_CMD_FLAG_DF; } else if (request->type == NBD_CMD_WRITE_ZEROES) { valid_flags |= NBD_CMD_FLAG_NO_HOLE | NBD_CMD_FLAG_FAST_ZERO; diff --git a/nbd/trace-events b/nbd/trace-events index e2c1d68688..adf5666e20 100644 --- a/nbd/trace-events +++ b/nbd/trace-events @@ -71,6 +71,7 @@ nbd_co_send_extents(uint64_t handle, unsigned int extents, uint32_t id, uint64_t nbd_co_send_structured_error(uint64_t handle, int err, const char *errname, const char *msg) "Send structured error reply: handle = %" PRIu64 ", error = %d (%s), msg = '%s'" nbd_co_receive_request_decode_type(uint64_t handle, uint16_t type, const char *name) "Decoding type: handle = %" PRIu64 ", type = %" PRIu16 " (%s)" nbd_co_receive_request_payload_received(uint64_t handle, uint64_t len) "Payload received: handle = %" PRIu64 ", len = %" PRIu64 +nbd_co_receive_ext_payload_compliance(uint64_t from, uint64_t len) "client sent non-compliant write without payload flag: from=0x%" PRIx64 ", len=0x%" PRIx64 nbd_co_receive_align_compliance(const char *op, uint64_t from, uint64_t len, uint32_t align) "client sent non-compliant unaligned %s request: from=0x%" PRIx64 ", len=0x%" PRIx64 ", align=0x%" PRIx32 nbd_trip(void) "Reading request" From patchwork Mon Nov 14 22:48:39 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Blake X-Patchwork-Id: 1703822 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=HOYPRgc7; dkim-atps=neutral Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4NB6RQ5WWrz23mH for ; Tue, 15 Nov 2022 11:26:10 +1100 (AEDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ouie3-0002Hg-S2; Mon, 14 Nov 2022 18:13:39 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ouidq-0001df-Fe for qemu-devel@nongnu.org; Mon, 14 Nov 2022 18:13:26 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ouiGS-0002RT-Qf for qemu-devel@nongnu.org; Mon, 14 Nov 2022 17:49:18 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1668466156; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=jhXWmjjMMvetiJJGQOaMHI4cWIKTWMJVS4R37MRRDFY=; b=HOYPRgc7I/irY9HFg6PFajmibb2GPme/jYwHjLLKrRFxbE+JfONmviRpcTi4Z0bBN1pFpb tw2J3MFhiHSOFn8V1q3MNEAEWEJGEA1NFxVzCALCzOe4qiEEmk4o8ZFXHrXxvbo1zKz6v4 ZaM8Ag23rpdD9H6jyrihoA729i3ZV9k= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-557-HOZzzlkEMVK76COZWP2WeA-1; Mon, 14 Nov 2022 17:48:59 -0500 X-MC-Unique: HOZzzlkEMVK76COZWP2WeA-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 2E52D1C08792; Mon, 14 Nov 2022 22:48:59 +0000 (UTC) Received: from green.redhat.com (unknown [10.2.16.240]) by smtp.corp.redhat.com (Postfix) with ESMTP id AC33040AE7EF; Mon, 14 Nov 2022 22:48:58 +0000 (UTC) From: Eric Blake To: qemu-devel@nongnu.org Cc: qemu-block@nongnu.org, libguestfs@redhat.com, nbd@other.debian.org, Vladimir Sementsov-Ogievskiy Subject: [PATCH v2 06/15] nbd/server: Refactor to pass full request around Date: Mon, 14 Nov 2022 16:48:39 -0600 Message-Id: <20221114224848.2186298-7-eblake@redhat.com> In-Reply-To: <20221114224848.2186298-1-eblake@redhat.com> References: <20221114224141.cm5jgyxfmvie5xb5@redhat.com> <20221114224848.2186298-1-eblake@redhat.com> MIME-Version: 1.0 Content-type: text/plain X-Scanned-By: MIMEDefang 3.1 on 10.11.54.2 Received-SPF: pass client-ip=170.10.129.124; envelope-from=eblake@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -27 X-Spam_score: -2.8 X-Spam_bar: -- X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=unavailable autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Part of NBD's 64-bit headers extension involves passing the client's requested offset back as part of the reply header (one reason for this change: converting absolute offsets stored in NBD_REPLY_TYPE_OFFSET_DATA to relative offsets within the buffer is easier if the absolute offset of the buffer is also available). This is a refactoring patch to pass the full request around the reply stack, rather than just the handle, so that later patches can then access request->from when extended headers are active. But for this patch, there are no semantic changes. Signed-off-by: Eric Blake --- nbd/server.c | 109 ++++++++++++++++++++++++++------------------------- 1 file changed, 55 insertions(+), 54 deletions(-) diff --git a/nbd/server.c b/nbd/server.c index ad5c2052b5..4d1400430b 100644 --- a/nbd/server.c +++ b/nbd/server.c @@ -1890,18 +1890,17 @@ static int coroutine_fn nbd_co_send_iov(NBDClient *client, struct iovec *iov, } static inline void set_be_simple_reply(NBDClient *client, struct iovec *iov, - uint64_t error, uint64_t handle) + uint64_t error, NBDRequest *request) { NBDSimpleReply *reply = iov->iov_base; iov->iov_len = sizeof(*reply); stl_be_p(&reply->magic, NBD_SIMPLE_REPLY_MAGIC); stl_be_p(&reply->error, error); - stq_be_p(&reply->handle, handle); + stq_be_p(&reply->handle, request->handle); } -static int nbd_co_send_simple_reply(NBDClient *client, - uint64_t handle, +static int nbd_co_send_simple_reply(NBDClient *client, NBDRequest *request, uint32_t error, void *data, size_t len, @@ -1914,16 +1913,16 @@ static int nbd_co_send_simple_reply(NBDClient *client, {.iov_base = data, .iov_len = len} }; - trace_nbd_co_send_simple_reply(handle, nbd_err, nbd_err_lookup(nbd_err), - len); - set_be_simple_reply(client, &iov[0], nbd_err, handle); + trace_nbd_co_send_simple_reply(request->handle, nbd_err, + nbd_err_lookup(nbd_err), len); + set_be_simple_reply(client, &iov[0], nbd_err, request); return nbd_co_send_iov(client, iov, len ? 2 : 1, errp); } static inline void set_be_chunk(NBDClient *client, struct iovec *iov, uint16_t flags, uint16_t type, - uint64_t handle, uint32_t length) + NBDRequest *request, uint32_t length) { NBDStructuredReplyChunk *chunk = iov->iov_base; @@ -1931,12 +1930,12 @@ static inline void set_be_chunk(NBDClient *client, struct iovec *iov, stl_be_p(&chunk->magic, NBD_STRUCTURED_REPLY_MAGIC); stw_be_p(&chunk->flags, flags); stw_be_p(&chunk->type, type); - stq_be_p(&chunk->handle, handle); + stq_be_p(&chunk->handle, request->handle); stl_be_p(&chunk->length, length); } static int coroutine_fn nbd_co_send_structured_done(NBDClient *client, - uint64_t handle, + NBDRequest *request, Error **errp) { NBDReply hdr; @@ -1944,15 +1943,15 @@ static int coroutine_fn nbd_co_send_structured_done(NBDClient *client, {.iov_base = &hdr}, }; - trace_nbd_co_send_structured_done(handle); + trace_nbd_co_send_structured_done(request->handle); set_be_chunk(client, &iov[0], NBD_REPLY_FLAG_DONE, - NBD_REPLY_TYPE_NONE, handle, 0); + NBD_REPLY_TYPE_NONE, request, 0); return nbd_co_send_iov(client, iov, 1, errp); } static int coroutine_fn nbd_co_send_structured_read(NBDClient *client, - uint64_t handle, + NBDRequest *request, uint64_t offset, void *data, size_t size, @@ -1968,16 +1967,16 @@ static int coroutine_fn nbd_co_send_structured_read(NBDClient *client, }; assert(size); - trace_nbd_co_send_structured_read(handle, offset, data, size); + trace_nbd_co_send_structured_read(request->handle, offset, data, size); set_be_chunk(client, &iov[0], final ? NBD_REPLY_FLAG_DONE : 0, - NBD_REPLY_TYPE_OFFSET_DATA, handle, iov[1].iov_len + size); + NBD_REPLY_TYPE_OFFSET_DATA, request, iov[1].iov_len + size); stq_be_p(&chunk.offset, offset); return nbd_co_send_iov(client, iov, 3, errp); } static int coroutine_fn nbd_co_send_structured_error(NBDClient *client, - uint64_t handle, + NBDRequest *request, uint32_t error, const char *msg, Error **errp) @@ -1992,10 +1991,11 @@ static int coroutine_fn nbd_co_send_structured_error(NBDClient *client, }; assert(nbd_err); - trace_nbd_co_send_structured_error(handle, nbd_err, + trace_nbd_co_send_structured_error(request->handle, nbd_err, nbd_err_lookup(nbd_err), msg ? msg : ""); set_be_chunk(client, &iov[0], NBD_REPLY_FLAG_DONE, - NBD_REPLY_TYPE_ERROR, handle, iov[1].iov_len + iov[2].iov_len); + NBD_REPLY_TYPE_ERROR, request, + iov[1].iov_len + iov[2].iov_len); stl_be_p(&chunk.error, nbd_err); stw_be_p(&chunk.message_length, iov[2].iov_len); @@ -2007,7 +2007,7 @@ static int coroutine_fn nbd_co_send_structured_error(NBDClient *client, * reported to the client, at which point this function succeeds. */ static int coroutine_fn nbd_co_send_sparse_read(NBDClient *client, - uint64_t handle, + NBDRequest *request, uint64_t offset, uint8_t *data, size_t size, @@ -2029,7 +2029,7 @@ static int coroutine_fn nbd_co_send_sparse_read(NBDClient *client, char *msg = g_strdup_printf("unable to check for holes: %s", strerror(-status)); - ret = nbd_co_send_structured_error(client, handle, -status, msg, + ret = nbd_co_send_structured_error(client, request, -status, msg, errp); g_free(msg); return ret; @@ -2044,12 +2044,12 @@ static int coroutine_fn nbd_co_send_sparse_read(NBDClient *client, {.iov_base = &chunk, .iov_len = sizeof(chunk)}, }; - trace_nbd_co_send_structured_read_hole(handle, offset + progress, + trace_nbd_co_send_structured_read_hole(request->handle, + offset + progress, pnum); set_be_chunk(client, &iov[0], final ? NBD_REPLY_FLAG_DONE : 0, - NBD_REPLY_TYPE_OFFSET_HOLE, - handle, iov[1].iov_len); + NBD_REPLY_TYPE_OFFSET_HOLE, request, iov[1].iov_len); stq_be_p(&chunk.offset, offset + progress); stl_be_p(&chunk.length, pnum); ret = nbd_co_send_iov(client, iov, 2, errp); @@ -2060,7 +2060,8 @@ static int coroutine_fn nbd_co_send_sparse_read(NBDClient *client, error_setg_errno(errp, -ret, "reading from file failed"); break; } - ret = nbd_co_send_structured_read(client, handle, offset + progress, + ret = nbd_co_send_structured_read(client, request, + offset + progress, data + progress, pnum, final, errp); } @@ -2212,7 +2213,7 @@ static int blockalloc_to_extents(BlockDriverState *bs, uint64_t offset, * @ea is converted to BE by the function * @last controls whether NBD_REPLY_FLAG_DONE is sent. */ -static int nbd_co_send_extents(NBDClient *client, uint64_t handle, +static int nbd_co_send_extents(NBDClient *client, NBDRequest *request, NBDExtentArray *ea, bool last, uint32_t context_id, Error **errp) { @@ -2226,18 +2227,18 @@ static int nbd_co_send_extents(NBDClient *client, uint64_t handle, nbd_extent_array_convert_to_be(ea); - trace_nbd_co_send_extents(handle, ea->count, context_id, ea->total_length, - last); + trace_nbd_co_send_extents(request->handle, ea->count, context_id, + ea->total_length, last); set_be_chunk(client, &iov[0], last ? NBD_REPLY_FLAG_DONE : 0, NBD_REPLY_TYPE_BLOCK_STATUS, - handle, iov[1].iov_len + iov[2].iov_len); + request, iov[1].iov_len + iov[2].iov_len); stl_be_p(&chunk.context_id, context_id); return nbd_co_send_iov(client, iov, 3, errp); } /* Get block status from the exported device and send it to the client */ -static int nbd_co_send_block_status(NBDClient *client, uint64_t handle, +static int nbd_co_send_block_status(NBDClient *client, NBDRequest *request, BlockDriverState *bs, uint64_t offset, uint32_t length, bool dont_fragment, bool last, uint32_t context_id, @@ -2254,10 +2255,10 @@ static int nbd_co_send_block_status(NBDClient *client, uint64_t handle, } if (ret < 0) { return nbd_co_send_structured_error( - client, handle, -ret, "can't get block status", errp); + client, request, -ret, "can't get block status", errp); } - return nbd_co_send_extents(client, handle, ea, last, context_id, errp); + return nbd_co_send_extents(client, request, ea, last, context_id, errp); } /* Populate @ea from a dirty bitmap. */ @@ -2292,7 +2293,7 @@ static void bitmap_to_extents(BdrvDirtyBitmap *bitmap, bdrv_dirty_bitmap_unlock(bitmap); } -static int nbd_co_send_bitmap(NBDClient *client, uint64_t handle, +static int nbd_co_send_bitmap(NBDClient *client, NBDRequest *request, BdrvDirtyBitmap *bitmap, uint64_t offset, uint32_t length, bool dont_fragment, bool last, uint32_t context_id, Error **errp) @@ -2302,7 +2303,7 @@ static int nbd_co_send_bitmap(NBDClient *client, uint64_t handle, bitmap_to_extents(bitmap, offset, length, ea); - return nbd_co_send_extents(client, handle, ea, last, context_id, errp); + return nbd_co_send_extents(client, request, ea, last, context_id, errp); } /* nbd_co_receive_request @@ -2440,16 +2441,16 @@ static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request, * Returns 0 if connection is still live, -errno on failure to talk to client */ static coroutine_fn int nbd_send_generic_reply(NBDClient *client, - uint64_t handle, + NBDRequest *request, int ret, const char *error_msg, Error **errp) { if (client->structured_reply && ret < 0) { - return nbd_co_send_structured_error(client, handle, -ret, error_msg, + return nbd_co_send_structured_error(client, request, -ret, error_msg, errp); } else { - return nbd_co_send_simple_reply(client, handle, ret < 0 ? -ret : 0, + return nbd_co_send_simple_reply(client, request, ret < 0 ? -ret : 0, NULL, 0, errp); } } @@ -2470,7 +2471,7 @@ static coroutine_fn int nbd_do_cmd_read(NBDClient *client, NBDRequest *request, if (request->flags & NBD_CMD_FLAG_FUA) { ret = blk_co_flush(exp->common.blk); if (ret < 0) { - return nbd_send_generic_reply(client, request->handle, ret, + return nbd_send_generic_reply(client, request, ret, "flush failed", errp); } } @@ -2478,26 +2479,26 @@ static coroutine_fn int nbd_do_cmd_read(NBDClient *client, NBDRequest *request, if (client->structured_reply && !(request->flags & NBD_CMD_FLAG_DF) && request->len) { - return nbd_co_send_sparse_read(client, request->handle, request->from, + return nbd_co_send_sparse_read(client, request, request->from, data, request->len, errp); } ret = blk_pread(exp->common.blk, request->from, request->len, data, 0); if (ret < 0) { - return nbd_send_generic_reply(client, request->handle, ret, + return nbd_send_generic_reply(client, request, ret, "reading from file failed", errp); } if (client->structured_reply) { if (request->len) { - return nbd_co_send_structured_read(client, request->handle, + return nbd_co_send_structured_read(client, request, request->from, data, request->len, true, errp); } else { - return nbd_co_send_structured_done(client, request->handle, errp); + return nbd_co_send_structured_done(client, request, errp); } } else { - return nbd_co_send_simple_reply(client, request->handle, 0, + return nbd_co_send_simple_reply(client, request, 0, data, request->len, errp); } } @@ -2521,7 +2522,7 @@ static coroutine_fn int nbd_do_cmd_cache(NBDClient *client, NBDRequest *request, ret = blk_co_preadv(exp->common.blk, request->from, request->len, NULL, BDRV_REQ_COPY_ON_READ | BDRV_REQ_PREFETCH); - return nbd_send_generic_reply(client, request->handle, ret, + return nbd_send_generic_reply(client, request, ret, "caching data failed", errp); } @@ -2553,7 +2554,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client, assert(request->len <= NBD_MAX_BUFFER_SIZE); ret = blk_pwrite(exp->common.blk, request->from, request->len, data, flags); - return nbd_send_generic_reply(client, request->handle, ret, + return nbd_send_generic_reply(client, request, ret, "writing to file failed", errp); case NBD_CMD_WRITE_ZEROES: @@ -2569,7 +2570,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client, } ret = blk_pwrite_zeroes(exp->common.blk, request->from, request->len, flags); - return nbd_send_generic_reply(client, request->handle, ret, + return nbd_send_generic_reply(client, request, ret, "writing to file failed", errp); case NBD_CMD_DISC: @@ -2578,7 +2579,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client, case NBD_CMD_FLUSH: ret = blk_co_flush(exp->common.blk); - return nbd_send_generic_reply(client, request->handle, ret, + return nbd_send_generic_reply(client, request, ret, "flush failed", errp); case NBD_CMD_TRIM: @@ -2586,12 +2587,12 @@ static coroutine_fn int nbd_handle_request(NBDClient *client, if (ret >= 0 && request->flags & NBD_CMD_FLAG_FUA) { ret = blk_co_flush(exp->common.blk); } - return nbd_send_generic_reply(client, request->handle, ret, + return nbd_send_generic_reply(client, request, ret, "discard failed", errp); case NBD_CMD_BLOCK_STATUS: if (!request->len) { - return nbd_send_generic_reply(client, request->handle, -EINVAL, + return nbd_send_generic_reply(client, request, -EINVAL, "need non-zero length", errp); } assert(request->len <= UINT32_MAX); @@ -2600,7 +2601,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client, int contexts_remaining = client->export_meta.count; if (client->export_meta.base_allocation) { - ret = nbd_co_send_block_status(client, request->handle, + ret = nbd_co_send_block_status(client, request, blk_bs(exp->common.blk), request->from, request->len, dont_fragment, @@ -2613,7 +2614,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client, } if (client->export_meta.allocation_depth) { - ret = nbd_co_send_block_status(client, request->handle, + ret = nbd_co_send_block_status(client, request, blk_bs(exp->common.blk), request->from, request->len, dont_fragment, @@ -2629,7 +2630,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client, if (!client->export_meta.bitmaps[i]) { continue; } - ret = nbd_co_send_bitmap(client, request->handle, + ret = nbd_co_send_bitmap(client, request, client->exp->export_bitmaps[i], request->from, request->len, dont_fragment, !--contexts_remaining, @@ -2643,7 +2644,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client, return 0; } else { - return nbd_send_generic_reply(client, request->handle, -EINVAL, + return nbd_send_generic_reply(client, request, -EINVAL, "CMD_BLOCK_STATUS not negotiated", errp); } @@ -2651,7 +2652,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client, default: msg = g_strdup_printf("invalid request type (%" PRIu32 ") received", request->type); - ret = nbd_send_generic_reply(client, request->handle, -EINVAL, msg, + ret = nbd_send_generic_reply(client, request, -EINVAL, msg, errp); g_free(msg); return ret; @@ -2712,7 +2713,7 @@ static coroutine_fn void nbd_trip(void *opaque) Error *export_err = local_err; local_err = NULL; - ret = nbd_send_generic_reply(client, request.handle, -EINVAL, + ret = nbd_send_generic_reply(client, &request, -EINVAL, error_get_pretty(export_err), &local_err); error_free(export_err); } else { From patchwork Mon Nov 14 22:48:40 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Blake X-Patchwork-Id: 1703782 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=ZAWo4+K3; dkim-atps=neutral Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4NB57V1txbz23mH for ; Tue, 15 Nov 2022 10:27:17 +1100 (AEDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ouieC-0002jN-Gx; Mon, 14 Nov 2022 18:13:48 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ouidr-0001df-Vm for qemu-devel@nongnu.org; Mon, 14 Nov 2022 18:13:28 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ouiGI-0002OZ-E4 for qemu-devel@nongnu.org; Mon, 14 Nov 2022 17:49:09 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1668466145; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=zhjaJStUuLPK/R3vMxs8VI5rugu1a9ay15tYrbDlKAg=; b=ZAWo4+K3LUMODgmZdygtrBRlOr56IN5A0cWwtiJPVoCoFbMpWESpgQcNd9kKEGFun60zPX W/UM0PQ3KTTuRops/v3iZNpcXG8mop63XGpw3jDwTqa2bnBXx/Y0tWa4cxcn7QrMfXxUU0 GMalw3nMrgtO2xjSdbgQEHZQL+i6rbk= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-86-c9vmJOToNCSgezP9DKJmmg-1; Mon, 14 Nov 2022 17:49:00 -0500 X-MC-Unique: c9vmJOToNCSgezP9DKJmmg-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id D1B791C087B7; Mon, 14 Nov 2022 22:48:59 +0000 (UTC) Received: from green.redhat.com (unknown [10.2.16.240]) by smtp.corp.redhat.com (Postfix) with ESMTP id 5A32F40AE7F0; Mon, 14 Nov 2022 22:48:59 +0000 (UTC) From: Eric Blake To: qemu-devel@nongnu.org Cc: qemu-block@nongnu.org, libguestfs@redhat.com, nbd@other.debian.org, Vladimir Sementsov-Ogievskiy Subject: [PATCH v2 07/15] nbd/server: Initial support for extended headers Date: Mon, 14 Nov 2022 16:48:40 -0600 Message-Id: <20221114224848.2186298-8-eblake@redhat.com> In-Reply-To: <20221114224848.2186298-1-eblake@redhat.com> References: <20221114224141.cm5jgyxfmvie5xb5@redhat.com> <20221114224848.2186298-1-eblake@redhat.com> MIME-Version: 1.0 Content-type: text/plain X-Scanned-By: MIMEDefang 3.1 on 10.11.54.2 Received-SPF: pass client-ip=170.10.133.124; envelope-from=eblake@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=unavailable autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Even though the NBD spec has been altered to allow us to accept NBD_CMD_READ larger than the max payload size (provided our response is a hole or broken up over more than one data chunk), we are not planning to take advantage of that, and continue to cap NBD_CMD_READ to 32M regardless of header size. For NBD_CMD_WRITE_ZEROES and NBD_CMD_TRIM, the block layer already supports 64-bit operations without any effort on our part. For NBD_CMD_BLOCK_STATUS, the client's length is a hint; the easiest approach for now is to truncate our answer back to 32 bits, which lets us delay the effort of implementing NBD_REPLY_TYPE_BLOCK_STATUS_EXT to a separate patch. Signed-off-by: Eric Blake --- nbd/nbd-internal.h | 7 ++- nbd/server.c | 132 +++++++++++++++++++++++++++++++++++---------- 2 files changed, 108 insertions(+), 31 deletions(-) diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h index 0016793ff4..f9fe0b6ce3 100644 --- a/nbd/nbd-internal.h +++ b/nbd/nbd-internal.h @@ -1,7 +1,7 @@ /* * NBD Internal Declarations * - * Copyright (C) 2016-2021 Red Hat, Inc. + * Copyright (C) 2016-2022 Red Hat, Inc. * * This work is licensed under the terms of the GNU GPL, version 2 or later. * See the COPYING file in the top-level directory. @@ -35,8 +35,11 @@ * https://github.com/yoe/nbd/blob/master/doc/proto.md */ -/* Size of all NBD_OPT_*, without payload */ +/* Size of all compact NBD_CMD_*, without payload */ #define NBD_REQUEST_SIZE (4 + 2 + 2 + 8 + 8 + 4) +/* Size of all extended NBD_CMD_*, without payload */ +#define NBD_EXTENDED_REQUEST_SIZE (4 + 2 + 2 + 8 + 8 + 8) + /* Size of all NBD_REP_* sent in answer to most NBD_OPT_*, without payload */ #define NBD_REPLY_SIZE (4 + 4 + 8) /* Size of reply to NBD_OPT_EXPORT_NAME */ diff --git a/nbd/server.c b/nbd/server.c index 4d1400430b..b46655b4d8 100644 --- a/nbd/server.c +++ b/nbd/server.c @@ -141,7 +141,7 @@ struct NBDClient { uint32_t check_align; /* If non-zero, check for aligned client requests */ - bool structured_reply; + bool structured_reply; /* also set true if extended_headers is set */ bool extended_headers; NBDExportMetaContexts export_meta; @@ -1260,6 +1260,10 @@ static int nbd_negotiate_options(NBDClient *client, Error **errp) case NBD_OPT_STRUCTURED_REPLY: if (length) { ret = nbd_reject_length(client, false, errp); + } else if (client->extended_headers) { + ret = nbd_negotiate_send_rep_err( + client, NBD_REP_ERR_EXT_HEADER_REQD, errp, + "extended headers already negotiated"); } else if (client->structured_reply) { ret = nbd_negotiate_send_rep_err( client, NBD_REP_ERR_INVALID, errp, @@ -1276,6 +1280,19 @@ static int nbd_negotiate_options(NBDClient *client, Error **errp) errp); break; + case NBD_OPT_EXTENDED_HEADERS: + if (length) { + ret = nbd_reject_length(client, false, errp); + } else if (client->extended_headers) { + ret = nbd_negotiate_send_rep_err( + client, NBD_REP_ERR_INVALID, errp, + "extended headers already negotiated"); + } else { + ret = nbd_negotiate_send_rep(client, NBD_REP_ACK, errp); + client->structured_reply = client->extended_headers = true; + } + break; + default: ret = nbd_opt_drop(client, NBD_REP_ERR_UNSUP, errp, "Unsupported option %" PRIu32 " (%s)", @@ -1411,11 +1428,13 @@ nbd_read_eof(NBDClient *client, void *buffer, size_t size, Error **errp) static int nbd_receive_request(NBDClient *client, NBDRequest *request, Error **errp) { - uint8_t buf[NBD_REQUEST_SIZE]; - uint32_t magic; + uint8_t buf[NBD_EXTENDED_REQUEST_SIZE]; + uint32_t magic, expect; int ret; + size_t size = client->extended_headers ? NBD_EXTENDED_REQUEST_SIZE + : NBD_REQUEST_SIZE; - ret = nbd_read_eof(client, buf, sizeof(buf), errp); + ret = nbd_read_eof(client, buf, size, errp); if (ret < 0) { return ret; } @@ -1423,13 +1442,21 @@ static int nbd_receive_request(NBDClient *client, NBDRequest *request, return -EIO; } - /* Request - [ 0 .. 3] magic (NBD_REQUEST_MAGIC) - [ 4 .. 5] flags (NBD_CMD_FLAG_FUA, ...) - [ 6 .. 7] type (NBD_CMD_READ, ...) - [ 8 .. 15] handle - [16 .. 23] from - [24 .. 27] len + /* + * Compact request + * [ 0 .. 3] magic (NBD_REQUEST_MAGIC) + * [ 4 .. 5] flags (NBD_CMD_FLAG_FUA, ...) + * [ 6 .. 7] type (NBD_CMD_READ, ...) + * [ 8 .. 15] handle + * [16 .. 23] from + * [24 .. 27] len + * Extended request + * [ 0 .. 3] magic (NBD_EXTENDED_REQUEST_MAGIC) + * [ 4 .. 5] flags (NBD_CMD_FLAG_FUA, NBD_CMD_FLAG_PAYLOAD_LEN, ...) + * [ 6 .. 7] type (NBD_CMD_READ, ...) + * [ 8 .. 15] handle + * [16 .. 23] from + * [24 .. 31] len */ magic = ldl_be_p(buf); @@ -1437,12 +1464,18 @@ static int nbd_receive_request(NBDClient *client, NBDRequest *request, request->type = lduw_be_p(buf + 6); request->handle = ldq_be_p(buf + 8); request->from = ldq_be_p(buf + 16); - request->len = ldl_be_p(buf + 24); /* widen 32 to 64 bits */ + if (client->extended_headers) { + request->len = ldq_be_p(buf + 24); + expect = NBD_EXTENDED_REQUEST_MAGIC; + } else { + request->len = ldl_be_p(buf + 24); /* widen 32 to 64 bits */ + expect = NBD_REQUEST_MAGIC; + } trace_nbd_receive_request(magic, request->flags, request->type, request->from, request->len); - if (magic != NBD_REQUEST_MAGIC) { + if (magic != expect) { error_setg(errp, "invalid magic (got 0x%" PRIx32 ")", magic); return -EINVAL; } @@ -1890,14 +1923,37 @@ static int coroutine_fn nbd_co_send_iov(NBDClient *client, struct iovec *iov, } static inline void set_be_simple_reply(NBDClient *client, struct iovec *iov, - uint64_t error, NBDRequest *request) + uint32_t error, NBDStructuredError *err, + NBDRequest *request) { - NBDSimpleReply *reply = iov->iov_base; + if (client->extended_headers) { + NBDExtendedReplyChunk *chunk = iov->iov_base; - iov->iov_len = sizeof(*reply); - stl_be_p(&reply->magic, NBD_SIMPLE_REPLY_MAGIC); - stl_be_p(&reply->error, error); - stq_be_p(&reply->handle, request->handle); + iov->iov_len = sizeof(*chunk); + stl_be_p(&chunk->magic, NBD_EXTENDED_REPLY_MAGIC); + stw_be_p(&chunk->flags, NBD_REPLY_FLAG_DONE); + stq_be_p(&chunk->handle, request->handle); + stq_be_p(&chunk->offset, request->from); + if (error) { + assert(!iov[1].iov_base); + iov[1].iov_base = err; + iov[1].iov_len = sizeof(*err); + stw_be_p(&chunk->type, NBD_REPLY_TYPE_ERROR); + stq_be_p(&chunk->length, sizeof(*err)); + stl_be_p(&err->error, error); + stw_be_p(&err->message_length, 0); + } else { + stw_be_p(&chunk->type, NBD_REPLY_TYPE_NONE); + stq_be_p(&chunk->length, 0); + } + } else { + NBDSimpleReply *reply = iov->iov_base; + + iov->iov_len = sizeof(*reply); + stl_be_p(&reply->magic, NBD_SIMPLE_REPLY_MAGIC); + stl_be_p(&reply->error, error); + stq_be_p(&reply->handle, request->handle); + } } static int nbd_co_send_simple_reply(NBDClient *client, NBDRequest *request, @@ -1908,30 +1964,44 @@ static int nbd_co_send_simple_reply(NBDClient *client, NBDRequest *request, { NBDReply hdr; int nbd_err = system_errno_to_nbd_errno(error); + NBDStructuredError err; struct iovec iov[] = { {.iov_base = &hdr}, {.iov_base = data, .iov_len = len} }; + assert(!len || !nbd_err); trace_nbd_co_send_simple_reply(request->handle, nbd_err, nbd_err_lookup(nbd_err), len); - set_be_simple_reply(client, &iov[0], nbd_err, request); + set_be_simple_reply(client, &iov[0], nbd_err, &err, request); - return nbd_co_send_iov(client, iov, len ? 2 : 1, errp); + return nbd_co_send_iov(client, iov, iov[1].iov_len ? 2 : 1, errp); } static inline void set_be_chunk(NBDClient *client, struct iovec *iov, uint16_t flags, uint16_t type, NBDRequest *request, uint32_t length) { - NBDStructuredReplyChunk *chunk = iov->iov_base; + if (client->extended_headers) { + NBDExtendedReplyChunk *chunk = iov->iov_base; - iov->iov_len = sizeof(*chunk); - stl_be_p(&chunk->magic, NBD_STRUCTURED_REPLY_MAGIC); - stw_be_p(&chunk->flags, flags); - stw_be_p(&chunk->type, type); - stq_be_p(&chunk->handle, request->handle); - stl_be_p(&chunk->length, length); + iov->iov_len = sizeof(*chunk); + stl_be_p(&chunk->magic, NBD_EXTENDED_REPLY_MAGIC); + stw_be_p(&chunk->flags, flags); + stw_be_p(&chunk->type, type); + stq_be_p(&chunk->handle, request->handle); + stq_be_p(&chunk->offset, request->from); + stq_be_p(&chunk->length, length); + } else { + NBDStructuredReplyChunk *chunk = iov->iov_base; + + iov->iov_len = sizeof(*chunk); + stl_be_p(&chunk->magic, NBD_STRUCTURED_REPLY_MAGIC); + stw_be_p(&chunk->flags, flags); + stw_be_p(&chunk->type, type); + stq_be_p(&chunk->handle, request->handle); + stl_be_p(&chunk->length, length); + } } static int coroutine_fn nbd_co_send_structured_done(NBDClient *client, @@ -2595,7 +2665,11 @@ static coroutine_fn int nbd_handle_request(NBDClient *client, return nbd_send_generic_reply(client, request, -EINVAL, "need non-zero length", errp); } - assert(request->len <= UINT32_MAX); + if (request->len > UINT32_MAX) { + /* For now, truncate our response to a 32 bit window */ + request->len = QEMU_ALIGN_DOWN(BDRV_REQUEST_MAX_BYTES, + client->check_align ?: 1); + } if (client->export_meta.count) { bool dont_fragment = request->flags & NBD_CMD_FLAG_REQ_ONE; int contexts_remaining = client->export_meta.count; From patchwork Mon Nov 14 22:48:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Blake X-Patchwork-Id: 1703776 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=YQJg2KGs; dkim-atps=neutral Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4NB4yR4rn8z20KC for ; Tue, 15 Nov 2022 10:19:27 +1100 (AEDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ouieP-0003XR-94; Mon, 14 Nov 2022 18:14:01 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ouids-0001hU-Da for qemu-devel@nongnu.org; Mon, 14 Nov 2022 18:13:28 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ouiGE-0002Mp-RV for qemu-devel@nongnu.org; Mon, 14 Nov 2022 17:49:04 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1668466142; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=arqevMePL9jAQyKPEZW3LJy1uVj4/5W9Ai4SJGjTr2w=; b=YQJg2KGsDpCfrKQioAY42WpgV2XDPbwo0yoTbYS+GbPFKe39ZvYw/f/LbqMWwxRPaE2IJU v/skCYKJfodtFTN9+uVL2xp7R6B/8Pl3yKBZJnE8j/OELaxSTI/p0uF9jePezOYfYAO6S2 GjozBqK6X+kPmSddEH+tbbCn+Re2stQ= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-610-Kw0mlnUeNdWlBOVKCDboWw-1; Mon, 14 Nov 2022 17:49:00 -0500 X-MC-Unique: Kw0mlnUeNdWlBOVKCDboWw-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 7E7913C0DDD0; Mon, 14 Nov 2022 22:49:00 +0000 (UTC) Received: from green.redhat.com (unknown [10.2.16.240]) by smtp.corp.redhat.com (Postfix) with ESMTP id 0840C40AE7EF; Mon, 14 Nov 2022 22:48:59 +0000 (UTC) From: Eric Blake To: qemu-devel@nongnu.org Cc: qemu-block@nongnu.org, libguestfs@redhat.com, nbd@other.debian.org, Vladimir Sementsov-Ogievskiy Subject: [PATCH v2 08/15] nbd/server: Support 64-bit block status Date: Mon, 14 Nov 2022 16:48:41 -0600 Message-Id: <20221114224848.2186298-9-eblake@redhat.com> In-Reply-To: <20221114224848.2186298-1-eblake@redhat.com> References: <20221114224141.cm5jgyxfmvie5xb5@redhat.com> <20221114224848.2186298-1-eblake@redhat.com> MIME-Version: 1.0 Content-type: text/plain X-Scanned-By: MIMEDefang 3.1 on 10.11.54.2 Received-SPF: pass client-ip=170.10.133.124; envelope-from=eblake@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org The previous patch handled extended headers by truncating large block status requests from the client back to 32 bits. But this is not ideal; for cases where we can truly determine the status of the entire image quickly (for example, when reporting the entire image as non-sparse because we lack the ability to probe for holes), this causes more network traffic for the client to iterate through 4G chunks than for us to just report the entire image at once. For ease of implementation, if extended headers were negotiated, then we always reply with 64-bit block status replies, even when the result could have fit in the older 32-bit block status chunk (clients supporting extended headers have to be prepared for either chunk type, so temporarily reverting this patch proves whether a client is compliant). For now, all metacontexts that we know how to export never populate more than 32 bits of information, so we don't have to worry about NBD_REP_ERR_EXT_HEADER_REQD or filtering during handshake, and we always send all zeroes for the upper 32 bits of status during NBD_CMD_BLOCK_STATUS. Note that we previously had some interesting size-juggling on call chains, such as: nbd_co_send_block_status(uint32_t length) -> blockstatus_to_extents(uint32_t bytes) -> bdrv_block_status_above(bytes, &uint64_t num) -> nbd_extent_array_add(uint64_t num) -> store num in 32-bit length But we were lucky that it never overflowed: bdrv_block_status_above never sets num larger than bytes, and we had previously been capping 'bytes' at 32 bits (either by the protocol, or in the previous patch with an explicit truncation). This patch adds some assertions that ensure we continue to avoid overflowing 32 bits for a narrow client, while fully utilizing 64-bits all the way through when the client understands that. Signed-off-by: Eric Blake --- nbd/server.c | 86 +++++++++++++++++++++++++++++++++++----------------- 1 file changed, 59 insertions(+), 27 deletions(-) diff --git a/nbd/server.c b/nbd/server.c index b46655b4d8..f21f8098c1 100644 --- a/nbd/server.c +++ b/nbd/server.c @@ -2145,20 +2145,30 @@ static int coroutine_fn nbd_co_send_sparse_read(NBDClient *client, } typedef struct NBDExtentArray { - NBDExtent *extents; + union { + NBDStructuredMeta id; + NBDStructuredMetaExt meta; + }; + union { + NBDExtent *narrow; + NBDExtentExt *extents; + }; unsigned int nb_alloc; unsigned int count; uint64_t total_length; + bool extended; /* Whether 64-bit extents are allowed */ bool can_add; bool converted_to_be; } NBDExtentArray; -static NBDExtentArray *nbd_extent_array_new(unsigned int nb_alloc) +static NBDExtentArray *nbd_extent_array_new(unsigned int nb_alloc, + bool extended) { NBDExtentArray *ea = g_new0(NBDExtentArray, 1); ea->nb_alloc = nb_alloc; - ea->extents = g_new(NBDExtent, nb_alloc); + ea->extents = g_new(NBDExtentExt, nb_alloc); + ea->extended = extended; ea->can_add = true; return ea; @@ -2172,17 +2182,37 @@ static void nbd_extent_array_free(NBDExtentArray *ea) G_DEFINE_AUTOPTR_CLEANUP_FUNC(NBDExtentArray, nbd_extent_array_free) /* Further modifications of the array after conversion are abandoned */ -static void nbd_extent_array_convert_to_be(NBDExtentArray *ea) +static void nbd_extent_array_convert_to_be(NBDExtentArray *ea, + uint32_t context_id, + struct iovec *iov) { int i; assert(!ea->converted_to_be); + assert(iov[0].iov_base == &ea->meta); + assert(iov[1].iov_base == ea->extents); ea->can_add = false; ea->converted_to_be = true; - for (i = 0; i < ea->count; i++) { - ea->extents[i].flags = cpu_to_be32(ea->extents[i].flags); - ea->extents[i].length = cpu_to_be32(ea->extents[i].length); + stl_be_p(&ea->meta.context_id, context_id); + if (ea->extended) { + stl_be_p(&ea->meta.count, ea->count); + for (i = 0; i < ea->count; i++) { + ea->extents[i].length = cpu_to_be64(ea->extents[i].length); + ea->extents[i].flags = cpu_to_be64(ea->extents[i].flags); + } + iov[0].iov_len = sizeof(ea->meta); + iov[1].iov_len = ea->count * sizeof(ea->extents[0]); + } else { + /* Conversion reduces memory usage, order of iteration matters */ + for (i = 0; i < ea->count; i++) { + assert(ea->extents[i].length <= UINT32_MAX); + assert((uint32_t) ea->extents[i].flags == ea->extents[i].flags); + ea->narrow[i].length = cpu_to_be32(ea->extents[i].length); + ea->narrow[i].flags = cpu_to_be32(ea->extents[i].flags); + } + iov[0].iov_len = sizeof(ea->id); + iov[1].iov_len = ea->count * sizeof(ea->narrow[0]); } } @@ -2196,19 +2226,23 @@ static void nbd_extent_array_convert_to_be(NBDExtentArray *ea) * would result in an incorrect range reported to the client) */ static int nbd_extent_array_add(NBDExtentArray *ea, - uint32_t length, uint32_t flags) + uint64_t length, uint32_t flags) { assert(ea->can_add); if (!length) { return 0; } + if (!ea->extended) { + assert(length <= UINT32_MAX); + } /* Extend previous extent if flags are the same */ if (ea->count > 0 && flags == ea->extents[ea->count - 1].flags) { - uint64_t sum = (uint64_t)length + ea->extents[ea->count - 1].length; + uint64_t sum = length + ea->extents[ea->count - 1].length; - if (sum <= UINT32_MAX) { + assert(sum >= length); + if (sum <= UINT32_MAX || ea->extended) { ea->extents[ea->count - 1].length = sum; ea->total_length += length; return 0; @@ -2221,7 +2255,7 @@ static int nbd_extent_array_add(NBDExtentArray *ea, } ea->total_length += length; - ea->extents[ea->count] = (NBDExtent) {.length = length, .flags = flags}; + ea->extents[ea->count] = (NBDExtentExt) {.length = length, .flags = flags}; ea->count++; return 0; @@ -2288,21 +2322,20 @@ static int nbd_co_send_extents(NBDClient *client, NBDRequest *request, bool last, uint32_t context_id, Error **errp) { NBDReply hdr; - NBDStructuredMeta chunk; struct iovec iov[] = { {.iov_base = &hdr}, - {.iov_base = &chunk, .iov_len = sizeof(chunk)}, - {.iov_base = ea->extents, .iov_len = ea->count * sizeof(ea->extents[0])} + {.iov_base = &ea->meta}, + {.iov_base = ea->extents} }; - nbd_extent_array_convert_to_be(ea); + nbd_extent_array_convert_to_be(ea, context_id, &iov[1]); trace_nbd_co_send_extents(request->handle, ea->count, context_id, ea->total_length, last); set_be_chunk(client, &iov[0], last ? NBD_REPLY_FLAG_DONE : 0, - NBD_REPLY_TYPE_BLOCK_STATUS, + client->extended_headers ? NBD_REPLY_TYPE_BLOCK_STATUS_EXT + : NBD_REPLY_TYPE_BLOCK_STATUS, request, iov[1].iov_len + iov[2].iov_len); - stl_be_p(&chunk.context_id, context_id); return nbd_co_send_iov(client, iov, 3, errp); } @@ -2310,13 +2343,14 @@ static int nbd_co_send_extents(NBDClient *client, NBDRequest *request, /* Get block status from the exported device and send it to the client */ static int nbd_co_send_block_status(NBDClient *client, NBDRequest *request, BlockDriverState *bs, uint64_t offset, - uint32_t length, bool dont_fragment, + uint64_t length, bool dont_fragment, bool last, uint32_t context_id, Error **errp) { int ret; unsigned int nb_extents = dont_fragment ? 1 : NBD_MAX_BLOCK_STATUS_EXTENTS; - g_autoptr(NBDExtentArray) ea = nbd_extent_array_new(nb_extents); + g_autoptr(NBDExtentArray) ea = + nbd_extent_array_new(nb_extents, client->extended_headers); if (context_id == NBD_META_ID_BASE_ALLOCATION) { ret = blockstatus_to_extents(bs, offset, length, ea); @@ -2343,7 +2377,8 @@ static void bitmap_to_extents(BdrvDirtyBitmap *bitmap, bdrv_dirty_bitmap_lock(bitmap); for (start = offset; - bdrv_dirty_bitmap_next_dirty_area(bitmap, start, end, INT32_MAX, + bdrv_dirty_bitmap_next_dirty_area(bitmap, start, end, + es->extended ? INT64_MAX : INT32_MAX, &dirty_start, &dirty_count); start = dirty_start + dirty_count) { @@ -2365,11 +2400,12 @@ static void bitmap_to_extents(BdrvDirtyBitmap *bitmap, static int nbd_co_send_bitmap(NBDClient *client, NBDRequest *request, BdrvDirtyBitmap *bitmap, uint64_t offset, - uint32_t length, bool dont_fragment, bool last, + uint64_t length, bool dont_fragment, bool last, uint32_t context_id, Error **errp) { unsigned int nb_extents = dont_fragment ? 1 : NBD_MAX_BLOCK_STATUS_EXTENTS; - g_autoptr(NBDExtentArray) ea = nbd_extent_array_new(nb_extents); + g_autoptr(NBDExtentArray) ea = + nbd_extent_array_new(nb_extents, client->extended_headers); bitmap_to_extents(bitmap, offset, length, ea); @@ -2665,11 +2701,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client, return nbd_send_generic_reply(client, request, -EINVAL, "need non-zero length", errp); } - if (request->len > UINT32_MAX) { - /* For now, truncate our response to a 32 bit window */ - request->len = QEMU_ALIGN_DOWN(BDRV_REQUEST_MAX_BYTES, - client->check_align ?: 1); - } + assert(client->extended_headers || request->len <= UINT32_MAX); if (client->export_meta.count) { bool dont_fragment = request->flags & NBD_CMD_FLAG_REQ_ONE; int contexts_remaining = client->export_meta.count; From patchwork Mon Nov 14 22:48:42 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Blake X-Patchwork-Id: 1703790 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=SQ+uNEuZ; dkim-atps=neutral Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4NB5JD6XB8z23mY for ; Tue, 15 Nov 2022 10:34:52 +1100 (AEDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ouieD-0002p8-RJ; Mon, 14 Nov 2022 18:13:49 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ouidr-0001eb-VR for qemu-devel@nongnu.org; Mon, 14 Nov 2022 18:13:28 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ouiGH-0002OJ-RT for qemu-devel@nongnu.org; Mon, 14 Nov 2022 17:49:07 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1668466145; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=PS+9G0eQgrAzbNlU7enH1i26FFIUC7sSv8GZMmvWhks=; b=SQ+uNEuZXyWm6Qpz95tbZ4isdo07OOppHU1IAE/yomHsYImPOxt8Z+ddFUDjigvpGvnhKy 5cqc1fknOkjSC5CVf4SXtiku3s1YudXYiLivPcHZKzMLGHnfJ2Itp6BljQCMbm0LDNMy4m dA7nPOenfawk3M0rB3QPj6RBwLmYafU= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-262-wRz522GnMMapbMDV41tsog-1; Mon, 14 Nov 2022 17:49:01 -0500 X-MC-Unique: wRz522GnMMapbMDV41tsog-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 5A4B8101E14C; Mon, 14 Nov 2022 22:49:01 +0000 (UTC) Received: from green.redhat.com (unknown [10.2.16.240]) by smtp.corp.redhat.com (Postfix) with ESMTP id B81E440E42F2; Mon, 14 Nov 2022 22:49:00 +0000 (UTC) From: Eric Blake To: qemu-devel@nongnu.org Cc: qemu-block@nongnu.org, libguestfs@redhat.com, nbd@other.debian.org, Vladimir Sementsov-Ogievskiy , Kevin Wolf , Hanna Reitz Subject: [PATCH v2 09/15] nbd/client: Initial support for extended headers Date: Mon, 14 Nov 2022 16:48:42 -0600 Message-Id: <20221114224848.2186298-10-eblake@redhat.com> In-Reply-To: <20221114224848.2186298-1-eblake@redhat.com> References: <20221114224141.cm5jgyxfmvie5xb5@redhat.com> <20221114224848.2186298-1-eblake@redhat.com> MIME-Version: 1.0 Content-type: text/plain X-Scanned-By: MIMEDefang 3.1 on 10.11.54.2 Received-SPF: pass client-ip=170.10.133.124; envelope-from=eblake@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=unavailable autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Update the client code to be able to send an extended request, and parse an extended header from the server. Note that since we reject any structured reply with a too-large payload, we can always normalize a valid header back into the compact form, so that the caller need not deal with two branches of a union. Still, until a later patch lets the client negotiate extended headers, the code added here should not be reached. Note that because of the different magic numbers, it is just as easy to trace and then tolerate a non-compliant server sending the wrong header reply as it would be to insist that the server is compliant. The only caller to nbd_receive_reply() always passed NULL for errp; since we are changing the signature anyways, I decided to sink the decision to ignore errors one layer lower. Signed-off-by: Eric Blake --- include/block/nbd.h | 2 +- block/nbd.c | 3 +- nbd/client.c | 84 ++++++++++++++++++++++++++++++--------------- nbd/trace-events | 1 + 4 files changed, 61 insertions(+), 29 deletions(-) diff --git a/include/block/nbd.h b/include/block/nbd.h index 357121ce76..02e31b2261 100644 --- a/include/block/nbd.h +++ b/include/block/nbd.h @@ -363,7 +363,7 @@ int nbd_init(int fd, QIOChannelSocket *sioc, NBDExportInfo *info, Error **errp); int nbd_send_request(QIOChannel *ioc, NBDRequest *request, bool ext_hdr); int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc, - NBDReply *reply, Error **errp); + NBDReply *reply, bool ext_hdrs); int nbd_client(int fd); int nbd_disconnect(int fd); int nbd_errno_to_system_errno(int err); diff --git a/block/nbd.c b/block/nbd.c index 32681d2867..a8b1bc1054 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -457,7 +457,8 @@ static coroutine_fn int nbd_receive_replies(BDRVNBDState *s, uint64_t handle) /* We are under mutex and handle is 0. We have to do the dirty work. */ assert(s->reply.handle == 0); - ret = nbd_receive_reply(s->bs, s->ioc, &s->reply, NULL); + ret = nbd_receive_reply(s->bs, s->ioc, &s->reply, + s->info.extended_headers); if (ret <= 0) { ret = ret ? ret : -EIO; nbd_channel_error(s, ret); diff --git a/nbd/client.c b/nbd/client.c index 2480a48ec6..70f06ce637 100644 --- a/nbd/client.c +++ b/nbd/client.c @@ -1348,22 +1348,28 @@ int nbd_disconnect(int fd) int nbd_send_request(QIOChannel *ioc, NBDRequest *request, bool ext_hdr) { - uint8_t buf[NBD_REQUEST_SIZE]; + uint8_t buf[NBD_EXTENDED_REQUEST_SIZE]; + size_t len; - assert(!ext_hdr); - assert(request->len <= UINT32_MAX); trace_nbd_send_request(request->from, request->len, request->handle, request->flags, request->type, nbd_cmd_lookup(request->type)); - stl_be_p(buf, NBD_REQUEST_MAGIC); + stl_be_p(buf, ext_hdr ? NBD_EXTENDED_REQUEST_MAGIC : NBD_REQUEST_MAGIC); stw_be_p(buf + 4, request->flags); stw_be_p(buf + 6, request->type); stq_be_p(buf + 8, request->handle); stq_be_p(buf + 16, request->from); - stl_be_p(buf + 24, request->len); + if (ext_hdr) { + stq_be_p(buf + 24, request->len); + len = NBD_EXTENDED_REQUEST_SIZE; + } else { + assert(request->len <= UINT32_MAX); + stl_be_p(buf + 24, request->len); + len = NBD_REQUEST_SIZE; + } - return nbd_write(ioc, buf, sizeof(buf), NULL); + return nbd_write(ioc, buf, len, NULL); } /* nbd_receive_simple_reply @@ -1392,28 +1398,34 @@ static int nbd_receive_simple_reply(QIOChannel *ioc, NBDSimpleReply *reply, /* nbd_receive_structured_reply_chunk * Read structured reply chunk except magic field (which should be already - * read). + * read). Normalize into the compact form. * Payload is not read. */ -static int nbd_receive_structured_reply_chunk(QIOChannel *ioc, - NBDStructuredReplyChunk *chunk, +static int nbd_receive_structured_reply_chunk(QIOChannel *ioc, NBDReply *chunk, Error **errp) { int ret; + size_t len; + uint64_t payload_len; - assert(chunk->magic == NBD_STRUCTURED_REPLY_MAGIC); + if (chunk->magic == NBD_STRUCTURED_REPLY_MAGIC) { + len = sizeof(chunk->structured); + } else { + assert(chunk->magic == NBD_EXTENDED_REPLY_MAGIC); + len = sizeof(chunk->extended); + } ret = nbd_read(ioc, (uint8_t *)chunk + sizeof(chunk->magic), - sizeof(*chunk) - sizeof(chunk->magic), "structured chunk", + len - sizeof(chunk->magic), "structured chunk", errp); if (ret < 0) { return ret; } - chunk->flags = be16_to_cpu(chunk->flags); - chunk->type = be16_to_cpu(chunk->type); - chunk->handle = be64_to_cpu(chunk->handle); - chunk->length = be32_to_cpu(chunk->length); + /* flags, type, and handle occupy same space between forms */ + chunk->structured.flags = be16_to_cpu(chunk->structured.flags); + chunk->structured.type = be16_to_cpu(chunk->structured.type); + chunk->structured.handle = be64_to_cpu(chunk->structured.handle); /* * Because we use BLOCK_STATUS with REQ_ONE, and cap READ requests @@ -1421,11 +1433,20 @@ static int nbd_receive_structured_reply_chunk(QIOChannel *ioc, * this. Even if we stopped using REQ_ONE, sane servers will cap * the number of extents they return for block status. */ - if (chunk->length > NBD_MAX_BUFFER_SIZE + sizeof(NBDStructuredReadData)) { + if (chunk->magic == NBD_STRUCTURED_REPLY_MAGIC) { + payload_len = be32_to_cpu(chunk->structured.length); + } else { + /* For now, we are ignoring the extended header offset. */ + payload_len = be64_to_cpu(chunk->extended.length); + chunk->magic = NBD_STRUCTURED_REPLY_MAGIC; + } + if (payload_len > NBD_MAX_BUFFER_SIZE + sizeof(NBDStructuredReadData)) { error_setg(errp, "server chunk %" PRIu32 " (%s) payload is too long", - chunk->type, nbd_rep_lookup(chunk->type)); + chunk->structured.type, + nbd_rep_lookup(chunk->structured.type)); return -EINVAL; } + chunk->structured.length = payload_len; return 0; } @@ -1472,30 +1493,35 @@ nbd_read_eof(BlockDriverState *bs, QIOChannel *ioc, void *buffer, size_t size, /* nbd_receive_reply * - * Decreases bs->in_flight while waiting for a new reply. This yield is where - * we wait indefinitely and the coroutine must be able to be safely reentered - * for nbd_client_attach_aio_context(). + * Wait for a new reply. If this yields, the coroutine must be able to be + * safely reentered for nbd_client_attach_aio_context(). @ext_hdrs determines + * which reply magic we are expecting, although this normalizes the result + * so that the caller only has to work with compact headers. * * Returns 1 on success - * 0 on eof, when no data was read (errp is not set) - * negative errno on failure (errp is set) + * 0 on eof, when no data was read + * negative errno on failure */ int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc, - NBDReply *reply, Error **errp) + NBDReply *reply, bool ext_hdrs) { int ret; const char *type; - ret = nbd_read_eof(bs, ioc, &reply->magic, sizeof(reply->magic), errp); + ret = nbd_read_eof(bs, ioc, &reply->magic, sizeof(reply->magic), NULL); if (ret <= 0) { return ret; } reply->magic = be32_to_cpu(reply->magic); + /* Diagnose but accept wrong-width header */ switch (reply->magic) { case NBD_SIMPLE_REPLY_MAGIC: - ret = nbd_receive_simple_reply(ioc, &reply->simple, errp); + if (ext_hdrs) { + trace_nbd_receive_wrong_header(reply->magic); + } + ret = nbd_receive_simple_reply(ioc, &reply->simple, NULL); if (ret < 0) { break; } @@ -1504,7 +1530,11 @@ int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc, reply->handle); break; case NBD_STRUCTURED_REPLY_MAGIC: - ret = nbd_receive_structured_reply_chunk(ioc, &reply->structured, errp); + case NBD_EXTENDED_REPLY_MAGIC: + if (ext_hdrs != (reply->magic == NBD_EXTENDED_REPLY_MAGIC)) { + trace_nbd_receive_wrong_header(reply->magic); + } + ret = nbd_receive_structured_reply_chunk(ioc, reply, NULL); if (ret < 0) { break; } @@ -1515,7 +1545,7 @@ int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc, reply->structured.length); break; default: - error_setg(errp, "invalid magic (got 0x%" PRIx32 ")", reply->magic); + trace_nbd_receive_wrong_header(reply->magic); return -EINVAL; } if (ret < 0) { diff --git a/nbd/trace-events b/nbd/trace-events index adf5666e20..c20df33a43 100644 --- a/nbd/trace-events +++ b/nbd/trace-events @@ -34,6 +34,7 @@ nbd_client_clear_socket(void) "Clearing NBD socket" nbd_send_request(uint64_t from, uint64_t len, uint64_t handle, uint16_t flags, uint16_t type, const char *name) "Sending request to server: { .from = %" PRIu64", .len = %" PRIu64 ", .handle = %" PRIu64 ", .flags = 0x%" PRIx16 ", .type = %" PRIu16 " (%s) }" nbd_receive_simple_reply(int32_t error, const char *errname, uint64_t handle) "Got simple reply: { .error = %" PRId32 " (%s), handle = %" PRIu64" }" nbd_receive_structured_reply_chunk(uint16_t flags, uint16_t type, const char *name, uint64_t handle, uint32_t length) "Got structured reply chunk: { flags = 0x%" PRIx16 ", type = %d (%s), handle = %" PRIu64 ", length = %" PRIu32 " }" +nbd_receive_wrong_header(uint32_t magic) "Server sent unexpected magic 0x%" PRIx32 # common.c nbd_unknown_error(int err) "Squashing unexpected error %d to EINVAL" From patchwork Mon Nov 14 22:48:43 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Blake X-Patchwork-Id: 1703796 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=MLFIW/im; dkim-atps=neutral Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4NB5Nm58ZCz23lt for ; Tue, 15 Nov 2022 10:38:48 +1100 (AEDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ouieF-0002vb-CM; Mon, 14 Nov 2022 18:13:51 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ouidr-0001m0-VS for qemu-devel@nongnu.org; Mon, 14 Nov 2022 18:13:28 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ouiGH-0002OD-Fk for qemu-devel@nongnu.org; Mon, 14 Nov 2022 17:49:08 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1668466145; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=dNpwGX13xWfOmVQIRvlGoXlq2hyxspUY7BXP43GdQ8I=; b=MLFIW/imWoze90C1UdDWFt64FcFEoWg2OSg+Shtpxoej1HJI+Niws+TzqMP34z0eTi8mXY MJyje0Wis1DMrj46v3z5QfSKIspMRWHs5AUtFe4mESE9Dqo0tU4kWjWf/RFTECnHPxUtHx lt6Szk6n+Mj0ypetMi334sIHm4ZRHOE= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-330-TFGQOHzJMxOLsHPn4RMPlw-1; Mon, 14 Nov 2022 17:49:02 -0500 X-MC-Unique: TFGQOHzJMxOLsHPn4RMPlw-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 321EE38164C1; Mon, 14 Nov 2022 22:49:02 +0000 (UTC) Received: from green.redhat.com (unknown [10.2.16.240]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8E85140E42E3; Mon, 14 Nov 2022 22:49:01 +0000 (UTC) From: Eric Blake To: qemu-devel@nongnu.org Cc: qemu-block@nongnu.org, libguestfs@redhat.com, nbd@other.debian.org, Vladimir Sementsov-Ogievskiy , Kevin Wolf , Hanna Reitz Subject: [PATCH v2 10/15] nbd/client: Accept 64-bit block status chunks Date: Mon, 14 Nov 2022 16:48:43 -0600 Message-Id: <20221114224848.2186298-11-eblake@redhat.com> In-Reply-To: <20221114224848.2186298-1-eblake@redhat.com> References: <20221114224141.cm5jgyxfmvie5xb5@redhat.com> <20221114224848.2186298-1-eblake@redhat.com> MIME-Version: 1.0 Content-type: text/plain X-Scanned-By: MIMEDefang 3.1 on 10.11.54.2 Received-SPF: pass client-ip=170.10.133.124; envelope-from=eblake@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Because we use NBD_CMD_FLAG_REQ_ONE with NBD_CMD_BLOCK_STATUS, a client in narrow mode should not be able to provoke a server into sending a block status result larger than the client's 32-bit request. But in extended mode, a 64-bit status request must be able to handle a 64-bit status result, once a future patch enables the client requesting extended mode. We can also tolerate a non-compliant server sending the new chunk even when it should not. In normal execution, we are only requesting "base:allocation" which never exceeds 32 bits. But during testing with x-dirty-bitmap, we can force qemu to connect to some other context that might have 64-bit status bit; however, we ignore those upper bits (other than mapping qemu:allocation-depth into something that 'qemu-img map --output=json' can expose), and since it is only testing, we really don't bother with checking whether more than the two least-significant bits are set. Signed-off-by: Eric Blake --- block/nbd.c | 38 +++++++++++++++++++++++++++----------- block/trace-events | 1 + 2 files changed, 28 insertions(+), 11 deletions(-) diff --git a/block/nbd.c b/block/nbd.c index a8b1bc1054..44ab5437ea 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -608,13 +608,16 @@ static int nbd_parse_offset_hole_payload(BDRVNBDState *s, */ static int nbd_parse_blockstatus_payload(BDRVNBDState *s, NBDStructuredReplyChunk *chunk, - uint8_t *payload, uint64_t orig_length, - NBDExtent *extent, Error **errp) + uint8_t *payload, bool wide, + uint64_t orig_length, + NBDExtentExt *extent, Error **errp) { uint32_t context_id; + uint32_t count = 0; + size_t len = wide ? sizeof(*extent) : sizeof(NBDExtent); /* The server succeeded, so it must have sent [at least] one extent */ - if (chunk->length < sizeof(context_id) + sizeof(*extent)) { + if (chunk->length < sizeof(context_id) + wide * sizeof(count) + len) { error_setg(errp, "Protocol error: invalid payload for " "NBD_REPLY_TYPE_BLOCK_STATUS"); return -EINVAL; @@ -629,8 +632,14 @@ static int nbd_parse_blockstatus_payload(BDRVNBDState *s, return -EINVAL; } - extent->length = payload_advance32(&payload); - extent->flags = payload_advance32(&payload); + if (wide) { + count = payload_advance32(&payload); + extent->length = payload_advance64(&payload); + extent->flags = payload_advance64(&payload); + } else { + extent->length = payload_advance32(&payload); + extent->flags = payload_advance32(&payload); + } if (extent->length == 0) { error_setg(errp, "Protocol error: server sent status chunk with " @@ -670,7 +679,8 @@ static int nbd_parse_blockstatus_payload(BDRVNBDState *s, * connection; just ignore trailing extents, and clamp things to * the length of our request. */ - if (chunk->length > sizeof(context_id) + sizeof(*extent)) { + if (count > 1 || + chunk->length > sizeof(context_id) + wide * sizeof(count) + len) { trace_nbd_parse_blockstatus_compliance("more than one extent"); } if (extent->length > orig_length) { @@ -1114,7 +1124,7 @@ static int coroutine_fn nbd_co_receive_cmdread_reply(BDRVNBDState *s, uint64_t h static int coroutine_fn nbd_co_receive_blockstatus_reply(BDRVNBDState *s, uint64_t handle, uint64_t length, - NBDExtent *extent, + NBDExtentExt *extent, int *request_ret, Error **errp) { NBDReplyChunkIter iter; @@ -1131,6 +1141,11 @@ static int coroutine_fn nbd_co_receive_blockstatus_reply(BDRVNBDState *s, assert(nbd_reply_is_structured(&reply)); switch (chunk->type) { + case NBD_REPLY_TYPE_BLOCK_STATUS_EXT: + if (!s->info.extended_headers) { + trace_nbd_extended_headers_compliance("block_status_ext"); + } + /* fallthrough */ case NBD_REPLY_TYPE_BLOCK_STATUS: if (received) { nbd_channel_error(s, -EINVAL); @@ -1139,9 +1154,10 @@ static int coroutine_fn nbd_co_receive_blockstatus_reply(BDRVNBDState *s, } received = true; - ret = nbd_parse_blockstatus_payload(s, &reply.structured, - payload, length, extent, - &local_err); + ret = nbd_parse_blockstatus_payload( + s, &reply.structured, payload, + chunk->type == NBD_REPLY_TYPE_BLOCK_STATUS_EXT, + length, extent, &local_err); if (ret < 0) { nbd_channel_error(s, ret); nbd_iter_channel_error(&iter, ret, &local_err); @@ -1369,7 +1385,7 @@ static int coroutine_fn nbd_client_co_block_status( int64_t *pnum, int64_t *map, BlockDriverState **file) { int ret, request_ret; - NBDExtent extent = { 0 }; + NBDExtentExt extent = { 0 }; BDRVNBDState *s = (BDRVNBDState *)bs->opaque; Error *local_err = NULL; diff --git a/block/trace-events b/block/trace-events index 48dbf10c66..b84a5155a3 100644 --- a/block/trace-events +++ b/block/trace-events @@ -168,6 +168,7 @@ iscsi_xcopy(void *src_lun, uint64_t src_off, void *dst_lun, uint64_t dst_off, ui # nbd.c nbd_parse_blockstatus_compliance(const char *err) "ignoring extra data from non-compliant server: %s" nbd_structured_read_compliance(const char *type) "server sent non-compliant unaligned read %s chunk" +nbd_extended_headers_compliance(const char *type) "server sent non-compliant %s chunk without extended headers" nbd_read_reply_entry_fail(int ret, const char *err) "ret = %d, err: %s" nbd_co_request_fail(uint64_t from, uint32_t len, uint64_t handle, uint16_t flags, uint16_t type, const char *name, int ret, const char *err) "Request failed { .from = %" PRIu64", .len = %" PRIu32 ", .handle = %" PRIu64 ", .flags = 0x%" PRIx16 ", .type = %" PRIu16 " (%s) } ret = %d, err: %s" nbd_client_handshake(const char *export_name) "export '%s'" From patchwork Mon Nov 14 22:51:46 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Blake X-Patchwork-Id: 1703777 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=IG3yljzp; dkim-atps=neutral Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4NB4yp1QDRz20KC for ; Tue, 15 Nov 2022 10:19:46 +1100 (AEDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ouidw-0001z3-Hw; Mon, 14 Nov 2022 18:13:32 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ouidn-0001X6-U1 for qemu-devel@nongnu.org; Mon, 14 Nov 2022 18:13:23 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ouiJI-00035l-MB for qemu-devel@nongnu.org; Mon, 14 Nov 2022 17:52:21 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1668466331; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3wuXOMmcxaCQ0KVeHxANmBOlmlfS1/Yq3oJL5YkSZMs=; b=IG3yljzpQqOgwq7fqCbztGQ0WRg+p+IkjP8N/Wx15722rUozfOypmCfNEYFYsdgD80uO17 zZkqP+acGWM5dcMhmUjz8tEmZ4i7M0FHP+g2Yey5vxtB3KhAdA313/dx3QWjjrjHnAe5Vx ogm0olehnoIg+F1FNJ24ioJlNWOwDmE= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-139-PQSktL5oN4-3cO2YzQB8Sg-1; Mon, 14 Nov 2022 17:52:08 -0500 X-MC-Unique: PQSktL5oN4-3cO2YzQB8Sg-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 4A6B1101A54E; Mon, 14 Nov 2022 22:52:08 +0000 (UTC) Received: from green.redhat.com (unknown [10.2.16.240]) by smtp.corp.redhat.com (Postfix) with ESMTP id DA37F40E9787; Mon, 14 Nov 2022 22:52:07 +0000 (UTC) From: Eric Blake To: libguestfs@redhat.com Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org, nbd@other.debian.org Subject: [libnbd PATCH v2 11/23] api: Add several functions for controlling extended headers Date: Mon, 14 Nov 2022 16:51:46 -0600 Message-Id: <20221114225158.2186742-12-eblake@redhat.com> In-Reply-To: <20221114225158.2186742-1-eblake@redhat.com> References: <20221114224141.cm5jgyxfmvie5xb5@redhat.com> <20221114225158.2186742-1-eblake@redhat.com> MIME-Version: 1.0 Content-type: text/plain X-Scanned-By: MIMEDefang 3.1 on 10.11.54.2 Received-SPF: pass client-ip=170.10.129.124; envelope-from=eblake@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -27 X-Spam_score: -2.8 X-Spam_bar: -- X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=unavailable autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org The new NBD_OPT_EXTENDED_HEADERS feature is worth using by default, but there may be cases where the user explicitly wants to stick with the older 32-bit headers. nbd_set_request_extended_headers() will let the client override the default, nbd_get_request_extended_headers() determines the current state of the request, and nbd_get_extended_headers_negotiated() determines what the client and server actually settled on. These use nbd_set_request_structured_replies() and friends as a template. Note that this patch just adds the API for controlling the defaults, but ignores the state variable; a later one will then tweak the state machine to actually request structured headers when the state variable is set, as well as add nbd_opt_extended_headers() for manual control. The intent is that because extended headers takes priority over structured replies, the functions will interact as follows during the automatic handshaking that takes place prior to nbd_opt_set_mode() relinquishing control to the client in negotiation mode: 1. client default: request_eh=1, request_sr=1 => try EXTENDED_HEADERS - server supports it: nothing further to do, use extended headers, but also report structured replies as active - server lacks it: behave like case 2 2. request_eh=0 (explicit client downgrade), request_sr=1 (left at default) => try STRUCTURED_REPLY - server supports it: expect structured replies - server lacks it: expect simple replies 3. request_sr=0 (explicit client downgrade), request_eh ignored => don't try either handshake - expect simple replies Client code that wants to manually force simple replies only has to disable structured replies; and only code that wants to disable extended headers but still use structured replies has to use the new nbd_set_request_extended_headers() API. Until a later patch adds an explicit API nbd_opt_extended_headers(), there is no way to request extended headers after structured replies are already negotiated. --- lib/internal.h | 1 + generator/API.ml | 115 ++++++++++++++++-- generator/states-newstyle-opt-starttls.c | 1 + .../states-newstyle-opt-structured-reply.c | 3 +- lib/handle.c | 23 ++++ python/t/110-defaults.py | 1 + python/t/120-set-non-defaults.py | 2 + ocaml/tests/test_110_defaults.ml | 2 + ocaml/tests/test_120_set_non_defaults.ml | 3 + golang/libnbd_110_defaults_test.go | 8 ++ golang/libnbd_120_set_non_defaults_test.go | 12 ++ 11 files changed, 160 insertions(+), 11 deletions(-) diff --git a/lib/internal.h b/lib/internal.h index b91fe6f6..1abb21cb 100644 --- a/lib/internal.h +++ b/lib/internal.h @@ -110,6 +110,7 @@ struct nbd_handle { char *tls_psk_file; /* PSK filename, NULL = no PSK */ /* Extended headers. */ + bool request_eh; /* Whether to request extended headers */ bool extended_headers; /* If we negotiated NBD_OPT_EXTENDED_HEADERS */ /* Desired metadata contexts. */ diff --git a/generator/API.ml b/generator/API.ml index aeee41fb..3d0289f6 100644 --- a/generator/API.ml +++ b/generator/API.ml @@ -808,6 +808,77 @@ "get_tls_psk_file", { }; *) + "set_request_extended_headers", { + default_call with + args = [Bool "request"]; ret = RErr; + permitted_states = [ Created ]; + shortdesc = "control use of extended headers"; + longdesc = "\ +By default, libnbd tries to negotiate extended headers with the +server, as this protocol extension permits the use of 64-bit +zero, trim, and block status actions. However, +for integration testing, it can be useful to clear this flag +rather than find a way to alter the server to fail the negotiation +request. + +For backwards compatibility, the setting of this knob is ignored +if L is also set to false, +since the use of extended headers implies structured replies."; + see_also = [Link "get_request_extended_headers"; + Link "set_handshake_flags"; Link "set_strict_mode"; + Link "get_extended_headers_negotiated"; + Link "zero"; Link "trim"; Link "cache"; + Link "block_status_64"; + Link "set_request_structured_replies"]; + }; + + "get_request_extended_headers", { + default_call with + args = []; ret = RBool; + may_set_error = false; + shortdesc = "see if extended headers are attempted"; + longdesc = "\ +Return the state of the request extended headers flag on this +handle. + +B If you want to find out if extended headers were actually +negotiated on a particular connection use +L instead."; + see_also = [Link "set_request_extended_headers"; + Link "get_extended_headers_negotiated"; + Link "get_request_extended_headers"]; + }; + + "get_extended_headers_negotiated", { + default_call with + args = []; ret = RBool; + permitted_states = [ Negotiating; Connected; Closed ]; + shortdesc = "see if extended headers are in use"; + longdesc = "\ +After connecting you may call this to find out if the connection is +using extended headers. + +When extended headers are not in use, commands are limited to a +32-bit length, even when the libnbd API uses a 64-bit parameter +to express the length. But even when extended headers are +supported, the server may enforce other limits, visible through +L. + +Note that when extended headers are negotiated, you should +prefer the use of L instead of +L if any of the meta contexts you requested +via L might return 64-bit status +values; however, all of the well-known meta contexts covered +by current C constants only return 32-bit +status."; + see_also = [Link "set_request_extended_headers"; + Link "get_request_extended_headers"; + Link "zero"; Link "trim"; Link "cache"; + Link "block_status_64"; Link "get_block_size"; + Link "get_protocol"; + Link "get_structured_replies_negotiated"]; + }; + "set_request_structured_replies", { default_call with args = [Bool "request"]; ret = RErr; @@ -821,12 +892,16 @@ "set_request_structured_replies", { rather than find a way to alter the server to fail the negotiation request. It is also useful to set this to false prior to using L if it is desired to control when to send -L during negotiation."; +L during negotiation. + +Note that setting this knob to false also disables any automatic +request for extended headers."; see_also = [Link "get_request_structured_replies"; Link "set_handshake_flags"; Link "set_strict_mode"; Link "opt_structured_reply"; Link "get_structured_replies_negotiated"; - Link "can_meta_context"; Link "can_df"]; + Link "can_meta_context"; Link "can_df"; + Link "set_request_extended_headers"]; }; "get_request_structured_replies", { @@ -842,7 +917,8 @@ "get_request_structured_replies", { negotiated on a particular connection use L instead."; see_also = [Link "set_request_structured_replies"; - Link "get_structured_replies_negotiated"]; + Link "get_structured_replies_negotiated"; + Link "get_request_extended_headers"]; }; "get_structured_replies_negotiated", { @@ -854,11 +930,17 @@ "get_structured_replies_negotiated", { After connecting you may call this to find out if the connection is using structured replies. Note that this setting is sticky; this can return true even after a second L -returns false because the server detected a duplicate request."; +returns false because the server detected a duplicate request. + +Note that if the connection negotiates extended headers, this +function returns true (as extended headers imply structured +replies) even if no explicit request for structured replies was +attempted."; see_also = [Link "set_request_structured_replies"; Link "get_request_structured_replies"; Link "opt_structured_reply"; - Link "get_protocol"]; + Link "get_protocol"; + Link "get_extended_headers_negotiated"]; }; "set_request_meta_context", { @@ -2575,7 +2657,9 @@ "trim", { or there is an error. Note this will generally return an error if L is false or L is true. -Note that not all servers can support a C of 4GiB or larger. +Note that not all servers can support a C of 4GiB or larger; +L indicates which servers +will parse a request larger than 32 bits. The NBD protocol does not yet have a way for a client to learn if the server will enforce an even smaller maximum trim size, although a future extension may add a constraint visible in @@ -2606,7 +2690,9 @@ "cache", { this command. Note this will generally return an error if L is false. -Note that not all servers can support a C of 4GiB or larger. +Note that not all servers can support a C of 4GiB or larger; +L indicates which servers +will parse a request larger than 32 bits. The NBD protocol does not yet have a way for a client to learn if the server will enforce an even smaller maximum cache size, although a future extension may add a constraint visible in @@ -2635,7 +2721,9 @@ "zero", { or there is an error. Note this will generally return an error if L is false or L is true. -Note that not all servers can support a C of 4GiB or larger. +Note that not all servers can support a C of 4GiB or larger; +L indicates which servers +will parse a request larger than 32 bits. The NBD protocol does not yet have a way for a client to learn if the server will enforce an even smaller maximum zero size, although a future extension may add a constraint visible in @@ -2675,7 +2763,9 @@ "block_status", { are supported, the number of blocks and cumulative length of those blocks need not be identical between contexts. -Note that not all servers can support a C of 4GiB or larger. +Note that not all servers can support a C of 4GiB or larger; +L indicates which servers +will parse a request larger than 32 bits. The NBD protocol does not yet have a way for a client to learn if the server will enforce an even smaller maximum block status size, although a future extension may add a constraint visible in @@ -2753,7 +2843,9 @@ "block_status_64", { are supported, the number of blocks and cumulative length of those blocks need not be identical between contexts. -Note that not all servers can support a C of 4GiB or larger. +Note that not all servers can support a C of 4GiB or larger; +L indicates which servers +will parse a request larger than 32 bits. The NBD protocol does not yet have a way for a client to learn if the server will enforce an even smaller maximum block status size, although a future extension may add a constraint visible in @@ -3972,6 +4064,9 @@ let first_version = "aio_opt_starttls", (1, 16); "block_status_64", (1, 16); "aio_block_status_64", (1, 16); + "set_request_extended_headers", (1, 16); + "get_request_extended_headers", (1, 16); + "get_extended_headers_negotiated", (1, 16); (* These calls are proposed for a future version of libnbd, but * have not been added to any released version so far. diff --git a/generator/states-newstyle-opt-starttls.c b/generator/states-newstyle-opt-starttls.c index 75fce085..6d528672 100644 --- a/generator/states-newstyle-opt-starttls.c +++ b/generator/states-newstyle-opt-starttls.c @@ -86,6 +86,7 @@ NEWSTYLE.OPT_STARTTLS.CHECK_REPLY: } nbd_internal_reset_size_and_flags (h); h->structured_replies = false; + h->extended_headers = false; h->meta_valid = false; new_sock = nbd_internal_crypto_create_session (h, h->sock); if (new_sock == NULL) { diff --git a/generator/states-newstyle-opt-structured-reply.c b/generator/states-newstyle-opt-structured-reply.c index 46b4161b..4608a38a 100644 --- a/generator/states-newstyle-opt-structured-reply.c +++ b/generator/states-newstyle-opt-structured-reply.c @@ -25,7 +25,7 @@ NEWSTYLE.OPT_STRUCTURED_REPLY.START: assert (h->opt_mode); else { assert (CALLBACK_IS_NULL (h->opt_cb.completion)); - if (!h->request_sr) { + if (!h->request_sr || h->structured_replies) { if (h->opt_mode) SET_NEXT_STATE (%.NEGOTIATING); else @@ -84,6 +84,7 @@ NEWSTYLE.OPT_STRUCTURED_REPLY.CHECK_REPLY: err = 0; break; case NBD_REP_ERR_INVALID: + case NBD_REP_ERR_EXT_HEADER_REQD: err = EINVAL; /* fallthrough */ default: diff --git a/lib/handle.c b/lib/handle.c index 41e442e7..5cf7956e 100644 --- a/lib/handle.c +++ b/lib/handle.c @@ -63,6 +63,7 @@ nbd_create (void) h->unique = 1; h->tls_verify_peer = true; + h->request_eh = true; h->request_sr = true; h->request_meta = true; h->request_block_size = true; @@ -384,6 +385,28 @@ nbd_unlocked_clear_meta_contexts (struct nbd_handle *h) return 0; } + +int +nbd_unlocked_set_request_extended_headers (struct nbd_handle *h, + bool request) +{ + h->request_eh = request; + return 0; +} + +/* NB: may_set_error = false. */ +int +nbd_unlocked_get_request_extended_headers (struct nbd_handle *h) +{ + return h->request_eh; +} + +int +nbd_unlocked_get_extended_headers_negotiated (struct nbd_handle *h) +{ + return h->extended_headers; +} + int nbd_unlocked_set_request_structured_replies (struct nbd_handle *h, bool request) diff --git a/python/t/110-defaults.py b/python/t/110-defaults.py index 6b62c8a4..bada47de 100644 --- a/python/t/110-defaults.py +++ b/python/t/110-defaults.py @@ -21,6 +21,7 @@ assert h.get_export_name() == "" assert h.get_full_info() is False assert h.get_tls() == nbd.TLS_DISABLE +assert h.get_request_extended_headers() is True assert h.get_request_structured_replies() is True assert h.get_request_meta_context() is True assert h.get_request_block_size() is True diff --git a/python/t/120-set-non-defaults.py b/python/t/120-set-non-defaults.py index 8b48b4ad..2a7bd928 100644 --- a/python/t/120-set-non-defaults.py +++ b/python/t/120-set-non-defaults.py @@ -31,6 +31,8 @@ if h.supports_tls(): h.set_tls(nbd.TLS_ALLOW) assert h.get_tls() == nbd.TLS_ALLOW +h.set_request_extended_headers(False) +assert h.get_request_extended_headers() is False h.set_request_structured_replies(False) assert h.get_request_structured_replies() is False h.set_request_meta_context(False) diff --git a/ocaml/tests/test_110_defaults.ml b/ocaml/tests/test_110_defaults.ml index 768ad636..cc0b492e 100644 --- a/ocaml/tests/test_110_defaults.ml +++ b/ocaml/tests/test_110_defaults.ml @@ -26,6 +26,8 @@ let assert (not info); let tls = NBD.get_tls nbd in assert (tls = NBD.TLS.DISABLE); + let eh = NBD.get_request_extended_headers nbd in + assert (eh = true); let sr = NBD.get_request_structured_replies nbd in assert sr; let meta = NBD.get_request_meta_context nbd in diff --git a/ocaml/tests/test_120_set_non_defaults.ml b/ocaml/tests/test_120_set_non_defaults.ml index 76ff82fb..efb6817a 100644 --- a/ocaml/tests/test_120_set_non_defaults.ml +++ b/ocaml/tests/test_120_set_non_defaults.ml @@ -39,6 +39,9 @@ let let tls = NBD.get_tls nbd in assert (tls = NBD.TLS.ALLOW); ); + NBD.set_request_extended_headers nbd false; + let eh = NBD.get_request_extended_headers nbd in + assert (eh = false); NBD.set_request_structured_replies nbd false; let sr = NBD.get_request_structured_replies nbd in assert (not sr); diff --git a/golang/libnbd_110_defaults_test.go b/golang/libnbd_110_defaults_test.go index d7ad319c..bba94df8 100644 --- a/golang/libnbd_110_defaults_test.go +++ b/golang/libnbd_110_defaults_test.go @@ -51,6 +51,14 @@ func Test110Defaults(t *testing.T) { t.Fatalf("unexpected tls state") } + eh, err := h.GetRequestExtendedHeaders() + if err != nil { + t.Fatalf("could not get extended headers state: %s", err) + } + if eh != true { + t.Fatalf("unexpected extended headers state") + } + sr, err := h.GetRequestStructuredReplies() if err != nil { t.Fatalf("could not get structured replies state: %s", err) diff --git a/golang/libnbd_120_set_non_defaults_test.go b/golang/libnbd_120_set_non_defaults_test.go index 06bb29d5..5b63bc7b 100644 --- a/golang/libnbd_120_set_non_defaults_test.go +++ b/golang/libnbd_120_set_non_defaults_test.go @@ -81,6 +81,18 @@ func Test120SetNonDefaults(t *testing.T) { } } + err = h.SetRequestExtendedHeaders(false) + if err != nil { + t.Fatalf("could not set extended headers state: %s", err) + } + eh, err := h.GetRequestExtendedHeaders() + if err != nil { + t.Fatalf("could not get extended headers state: %s", err) + } + if eh != false { + t.Fatalf("unexpected extended headers state") + } + err = h.SetRequestStructuredReplies(false) if err != nil { t.Fatalf("could not set structured replies state: %s", err) From patchwork Mon Nov 14 22:48:45 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Blake X-Patchwork-Id: 1703824 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=ShKMFgve; dkim-atps=neutral Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4NB6Zw5xNjz20KC for ; Tue, 15 Nov 2022 11:32:40 +1100 (AEDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ouie6-0002Qt-O3; Mon, 14 Nov 2022 18:13:42 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ouidr-0001hU-6l for qemu-devel@nongnu.org; Mon, 14 Nov 2022 18:13:27 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ouiGM-0002Pq-9v for qemu-devel@nongnu.org; Mon, 14 Nov 2022 17:49:12 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1668466149; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=XyzxNnPOskLIUQT0IYlcm+4U9bzcrXQAt9wSC9NhVmI=; b=ShKMFgve7yckXHNcwDGuQcLKfJExKwA4DlzT5AU1ry5o0T5BEjT2wvBRsv3u3ykjcitDlm pcY9nF5i8PX6XDf7c1fxXvuaRN5WBip+sgSNW+DvWlMOk2RFA75wrg11njcRVZRuwycdB4 dNet9eUW+YaNzxjZNezc15BRdLBilew= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-631-MElPXoAmM-2NVGbsZaveEQ-1; Mon, 14 Nov 2022 17:49:04 -0500 X-MC-Unique: MElPXoAmM-2NVGbsZaveEQ-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id DDF7B1C0878D; Mon, 14 Nov 2022 22:49:03 +0000 (UTC) Received: from green.redhat.com (unknown [10.2.16.240]) by smtp.corp.redhat.com (Postfix) with ESMTP id 48CD140E42EF; Mon, 14 Nov 2022 22:49:03 +0000 (UTC) From: Eric Blake To: qemu-devel@nongnu.org Cc: qemu-block@nongnu.org, libguestfs@redhat.com, nbd@other.debian.org, Vladimir Sementsov-Ogievskiy , Kevin Wolf , Hanna Reitz Subject: [PATCH v2 12/15] nbd/server: Prepare for per-request filtering of BLOCK_STATUS Date: Mon, 14 Nov 2022 16:48:45 -0600 Message-Id: <20221114224848.2186298-13-eblake@redhat.com> In-Reply-To: <20221114224848.2186298-1-eblake@redhat.com> References: <20221114224141.cm5jgyxfmvie5xb5@redhat.com> <20221114224848.2186298-1-eblake@redhat.com> MIME-Version: 1.0 Content-type: text/plain X-Scanned-By: MIMEDefang 3.1 on 10.11.54.2 Received-SPF: pass client-ip=170.10.129.124; envelope-from=eblake@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -27 X-Spam_score: -2.8 X-Spam_bar: -- X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=unavailable autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org The next commit will add support for the new addition of NBD_CMD_FLAG_PAYLOAD during NBD_CMD_BLOCK_STATUS, where the client can request that the server only return a subset of negotiated contexts, rather than all contexts. To make that task easier, this patch populates the list of contexts to return on a per-command basis (for now, identical to the full set of negotiated contexts). Signed-off-by: Eric Blake --- include/block/nbd.h | 20 +++++++- nbd/server.c | 108 +++++++++++++++++++++++--------------------- 2 files changed, 75 insertions(+), 53 deletions(-) diff --git a/include/block/nbd.h b/include/block/nbd.h index 02e31b2261..9a8ac1c8a5 100644 --- a/include/block/nbd.h +++ b/include/block/nbd.h @@ -50,8 +50,23 @@ typedef struct NBDOptionReplyMetaContext { /* metadata context name follows */ } QEMU_PACKED NBDOptionReplyMetaContext; -/* Transmission phase structs - * +/* Transmission phase structs */ + +/* + * NBDMetaContexts represents a list of meta contexts in use, as + * selected by NBD_OPT_SET_META_CONTEXT. Also used for + * NBD_OPT_LIST_META_CONTEXT, and payload filtering in + * NBD_CMD_BLOCK_STATUS. + */ +typedef struct NBDMetaContexts { + size_t count; /* number of negotiated contexts */ + bool base_allocation; /* export base:allocation context (block status) */ + bool allocation_depth; /* export qemu:allocation-depth */ + size_t nr_bitmaps; /* Length of bitmaps array */ + bool *bitmaps; /* export qemu:dirty-bitmap: */ +} NBDMetaContexts; + +/* * Note: NBDRequest is _NOT_ the same as the network representation of an NBD * request! */ @@ -61,6 +76,7 @@ typedef struct NBDRequest { uint64_t len; /* Effect length; 32 bit limit without extended headers */ uint16_t flags; /* NBD_CMD_FLAG_* */ uint16_t type; /* NBD_CMD_* */ + NBDMetaContexts contexts; /* Used by NBD_CMD_BLOCK_STATUS */ } NBDRequest; typedef struct NBDSimpleReply { diff --git a/nbd/server.c b/nbd/server.c index f21f8098c1..1fd1f32028 100644 --- a/nbd/server.c +++ b/nbd/server.c @@ -103,20 +103,6 @@ struct NBDExport { static QTAILQ_HEAD(, NBDExport) exports = QTAILQ_HEAD_INITIALIZER(exports); -/* NBDExportMetaContexts represents a list of contexts to be exported, - * as selected by NBD_OPT_SET_META_CONTEXT. Also used for - * NBD_OPT_LIST_META_CONTEXT. */ -typedef struct NBDExportMetaContexts { - NBDExport *exp; - size_t count; /* number of negotiated contexts */ - bool base_allocation; /* export base:allocation context (block status) */ - bool allocation_depth; /* export qemu:allocation-depth */ - bool *bitmaps; /* - * export qemu:dirty-bitmap:, - * sized by exp->nr_export_bitmaps - */ -} NBDExportMetaContexts; - struct NBDClient { int refcount; void (*close_fn)(NBDClient *client, bool negotiated); @@ -143,7 +129,8 @@ struct NBDClient { bool structured_reply; /* also set true if extended_headers is set */ bool extended_headers; - NBDExportMetaContexts export_meta; + NBDExport *context_exp; /* export of last OPT_SET_META_CONTEXT */ + NBDMetaContexts contexts; /* Negotiated meta contexts */ uint32_t opt; /* Current option being negotiated */ uint32_t optlen; /* remaining length of data in ioc for the option being @@ -456,8 +443,8 @@ static int nbd_negotiate_handle_list(NBDClient *client, Error **errp) static void nbd_check_meta_export(NBDClient *client) { - if (client->exp != client->export_meta.exp) { - client->export_meta.count = 0; + if (client->exp != client->context_exp) { + client->contexts.count = 0; } } @@ -847,7 +834,7 @@ static bool nbd_strshift(const char **str, const char *prefix) * Handle queries to 'base' namespace. For now, only the base:allocation * context is available. Return true if @query has been handled. */ -static bool nbd_meta_base_query(NBDClient *client, NBDExportMetaContexts *meta, +static bool nbd_meta_base_query(NBDClient *client, NBDMetaContexts *meta, const char *query) { if (!nbd_strshift(&query, "base:")) { @@ -867,8 +854,8 @@ static bool nbd_meta_base_query(NBDClient *client, NBDExportMetaContexts *meta, * and qemu:allocation-depth contexts are available. Return true if @query * has been handled. */ -static bool nbd_meta_qemu_query(NBDClient *client, NBDExportMetaContexts *meta, - const char *query) +static bool nbd_meta_qemu_query(NBDClient *client, NBDExport *exp, + NBDMetaContexts *meta, const char *query) { size_t i; @@ -879,9 +866,9 @@ static bool nbd_meta_qemu_query(NBDClient *client, NBDExportMetaContexts *meta, if (!*query) { if (client->opt == NBD_OPT_LIST_META_CONTEXT) { - meta->allocation_depth = meta->exp->allocation_depth; - if (meta->exp->nr_export_bitmaps) { - memset(meta->bitmaps, 1, meta->exp->nr_export_bitmaps); + meta->allocation_depth = exp->allocation_depth; + if (meta->nr_bitmaps) { + memset(meta->bitmaps, 1, meta->nr_bitmaps); } } trace_nbd_negotiate_meta_query_parse("empty"); @@ -890,7 +877,7 @@ static bool nbd_meta_qemu_query(NBDClient *client, NBDExportMetaContexts *meta, if (strcmp(query, "allocation-depth") == 0) { trace_nbd_negotiate_meta_query_parse("allocation-depth"); - meta->allocation_depth = meta->exp->allocation_depth; + meta->allocation_depth = exp->allocation_depth; return true; } @@ -898,17 +885,17 @@ static bool nbd_meta_qemu_query(NBDClient *client, NBDExportMetaContexts *meta, trace_nbd_negotiate_meta_query_parse("dirty-bitmap:"); if (!*query) { if (client->opt == NBD_OPT_LIST_META_CONTEXT && - meta->exp->nr_export_bitmaps) { - memset(meta->bitmaps, 1, meta->exp->nr_export_bitmaps); + exp->nr_export_bitmaps) { + memset(meta->bitmaps, 1, exp->nr_export_bitmaps); } trace_nbd_negotiate_meta_query_parse("empty"); return true; } - for (i = 0; i < meta->exp->nr_export_bitmaps; i++) { + for (i = 0; i < meta->nr_bitmaps; i++) { const char *bm_name; - bm_name = bdrv_dirty_bitmap_name(meta->exp->export_bitmaps[i]); + bm_name = bdrv_dirty_bitmap_name(exp->export_bitmaps[i]); if (strcmp(bm_name, query) == 0) { meta->bitmaps[i] = true; trace_nbd_negotiate_meta_query_parse(query); @@ -932,8 +919,8 @@ static bool nbd_meta_qemu_query(NBDClient *client, NBDExportMetaContexts *meta, * * Return -errno on I/O error, 0 if option was completely handled by * sending a reply about inconsistent lengths, or 1 on success. */ -static int nbd_negotiate_meta_query(NBDClient *client, - NBDExportMetaContexts *meta, Error **errp) +static int nbd_negotiate_meta_query(NBDClient *client, NBDExport *exp, + NBDMetaContexts *meta, Error **errp) { int ret; g_autofree char *query = NULL; @@ -960,7 +947,7 @@ static int nbd_negotiate_meta_query(NBDClient *client, if (nbd_meta_base_query(client, meta, query)) { return 1; } - if (nbd_meta_qemu_query(client, meta, query)) { + if (nbd_meta_qemu_query(client, exp, meta, query)) { return 1; } @@ -972,14 +959,15 @@ static int nbd_negotiate_meta_query(NBDClient *client, * Handle NBD_OPT_LIST_META_CONTEXT and NBD_OPT_SET_META_CONTEXT * * Return -errno on I/O error, or 0 if option was completely handled. */ -static int nbd_negotiate_meta_queries(NBDClient *client, - NBDExportMetaContexts *meta, Error **errp) +static int nbd_negotiate_meta_queries(NBDClient *client, Error **errp) { int ret; g_autofree char *export_name = NULL; /* Mark unused to work around https://bugs.llvm.org/show_bug.cgi?id=3888 */ g_autofree G_GNUC_UNUSED bool *bitmaps = NULL; - NBDExportMetaContexts local_meta = {0}; + NBDMetaContexts local_meta = {0}; + NBDMetaContexts *meta; + NBDExport *exp; uint32_t nb_queries; size_t i; size_t count = 0; @@ -994,6 +982,9 @@ static int nbd_negotiate_meta_queries(NBDClient *client, if (client->opt == NBD_OPT_LIST_META_CONTEXT) { /* Only change the caller's meta on SET. */ meta = &local_meta; + } else { + meta = &client->contexts; + client->context_exp = NULL; } g_free(meta->bitmaps); @@ -1004,14 +995,15 @@ static int nbd_negotiate_meta_queries(NBDClient *client, return ret; } - meta->exp = nbd_export_find(export_name); - if (meta->exp == NULL) { + exp = nbd_export_find(export_name); + if (exp == NULL) { g_autofree char *sane_name = nbd_sanitize_name(export_name); return nbd_opt_drop(client, NBD_REP_ERR_UNKNOWN, errp, "export '%s' not present", sane_name); } - meta->bitmaps = g_new0(bool, meta->exp->nr_export_bitmaps); + meta->nr_bitmaps = exp->nr_export_bitmaps; + meta->bitmaps = g_new0(bool, exp->nr_export_bitmaps); if (client->opt == NBD_OPT_LIST_META_CONTEXT) { bitmaps = meta->bitmaps; } @@ -1027,13 +1019,13 @@ static int nbd_negotiate_meta_queries(NBDClient *client, if (client->opt == NBD_OPT_LIST_META_CONTEXT && !nb_queries) { /* enable all known contexts */ meta->base_allocation = true; - meta->allocation_depth = meta->exp->allocation_depth; - if (meta->exp->nr_export_bitmaps) { - memset(meta->bitmaps, 1, meta->exp->nr_export_bitmaps); + meta->allocation_depth = exp->allocation_depth; + if (exp->nr_export_bitmaps) { + memset(meta->bitmaps, 1, meta->nr_bitmaps); } } else { for (i = 0; i < nb_queries; ++i) { - ret = nbd_negotiate_meta_query(client, meta, errp); + ret = nbd_negotiate_meta_query(client, exp, meta, errp); if (ret <= 0) { return ret; } @@ -1060,7 +1052,7 @@ static int nbd_negotiate_meta_queries(NBDClient *client, count++; } - for (i = 0; i < meta->exp->nr_export_bitmaps; i++) { + for (i = 0; i < meta->nr_bitmaps; i++) { const char *bm_name; g_autofree char *context = NULL; @@ -1068,7 +1060,7 @@ static int nbd_negotiate_meta_queries(NBDClient *client, continue; } - bm_name = bdrv_dirty_bitmap_name(meta->exp->export_bitmaps[i]); + bm_name = bdrv_dirty_bitmap_name(exp->export_bitmaps[i]); context = g_strdup_printf("qemu:dirty-bitmap:%s", bm_name); ret = nbd_negotiate_send_meta_context(client, context, @@ -1083,6 +1075,9 @@ static int nbd_negotiate_meta_queries(NBDClient *client, ret = nbd_negotiate_send_rep(client, NBD_REP_ACK, errp); if (ret == 0) { meta->count = count; + if (client->opt == NBD_OPT_SET_META_CONTEXT) { + client->context_exp = exp; + } } return ret; @@ -1276,8 +1271,7 @@ static int nbd_negotiate_options(NBDClient *client, Error **errp) case NBD_OPT_LIST_META_CONTEXT: case NBD_OPT_SET_META_CONTEXT: - ret = nbd_negotiate_meta_queries(client, &client->export_meta, - errp); + ret = nbd_negotiate_meta_queries(client, errp); break; case NBD_OPT_EXTENDED_HEADERS: @@ -1508,7 +1502,7 @@ void nbd_client_put(NBDClient *client) QTAILQ_REMOVE(&client->exp->clients, client, next); blk_exp_unref(&client->exp->common); } - g_free(client->export_meta.bitmaps); + g_free(client->contexts.bitmaps); g_free(client); } } @@ -2479,6 +2473,8 @@ static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request, return -ENOMEM; } } + } else if (request->type == NBD_CMD_BLOCK_STATUS) { + request->contexts = client->contexts; } if (payload_len) { @@ -2702,11 +2698,11 @@ static coroutine_fn int nbd_handle_request(NBDClient *client, "need non-zero length", errp); } assert(client->extended_headers || request->len <= UINT32_MAX); - if (client->export_meta.count) { + if (request->contexts.count) { bool dont_fragment = request->flags & NBD_CMD_FLAG_REQ_ONE; - int contexts_remaining = client->export_meta.count; + int contexts_remaining = request->contexts.count; - if (client->export_meta.base_allocation) { + if (request->contexts.base_allocation) { ret = nbd_co_send_block_status(client, request, blk_bs(exp->common.blk), request->from, @@ -2719,7 +2715,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client, } } - if (client->export_meta.allocation_depth) { + if (request->contexts.allocation_depth) { ret = nbd_co_send_block_status(client, request, blk_bs(exp->common.blk), request->from, request->len, @@ -2732,8 +2728,10 @@ static coroutine_fn int nbd_handle_request(NBDClient *client, } } + assert(request->contexts.nr_bitmaps == + client->exp->nr_export_bitmaps); for (i = 0; i < client->exp->nr_export_bitmaps; i++) { - if (!client->export_meta.bitmaps[i]) { + if (!request->contexts.bitmaps[i]) { continue; } ret = nbd_co_send_bitmap(client, request, @@ -2749,6 +2747,10 @@ static coroutine_fn int nbd_handle_request(NBDClient *client, assert(!contexts_remaining); return 0; + } else if (client->contexts.count) { + return nbd_send_generic_reply(client, request, -EINVAL, + "CMD_BLOCK_STATUS payload not valid", + errp); } else { return nbd_send_generic_reply(client, request, -EINVAL, "CMD_BLOCK_STATUS not negotiated", @@ -2825,6 +2827,10 @@ static coroutine_fn void nbd_trip(void *opaque) } else { ret = nbd_handle_request(client, &request, req->data, &local_err); } + if (request.type == NBD_CMD_BLOCK_STATUS && + request.contexts.bitmaps != client->contexts.bitmaps) { + g_free(request.contexts.bitmaps); + } if (ret < 0) { error_prepend(&local_err, "Failed to send reply: "); goto disconnect; From patchwork Mon Nov 14 22:48:46 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Blake X-Patchwork-Id: 1703823 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=Uy1mFv+i; dkim-atps=neutral Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4NB6ZV2YNcz20KC for ; Tue, 15 Nov 2022 11:32:18 +1100 (AEDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ouie6-0002QQ-JK; Mon, 14 Nov 2022 18:13:42 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ouidr-0001df-IM for qemu-devel@nongnu.org; Mon, 14 Nov 2022 18:13:27 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ouiGL-0002PN-2S for qemu-devel@nongnu.org; Mon, 14 Nov 2022 17:49:11 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1668466148; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=c7SMkdLnmr8zQDA/YEAOaatXmRMb68RmmXxsoqFCaBY=; b=Uy1mFv+iwV/gtyp6jAWSTzmgpNEYu9QmcFUW/CaltOrvkOAF4kkRT+PwliZFf/KAIvgWnF SSN3fn8PtkxlKUck1RwXq0ZnPnckgH1K6DU8mUK2LHOOapxga6Mp/JhQSYVrhoarSS+ULZ 9UBgw8J5BWx0m7cslqLsz8rmjLmSevs= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-593-1zK3yDK6OMyfoYX5N6GVJg-1; Mon, 14 Nov 2022 17:49:05 -0500 X-MC-Unique: 1zK3yDK6OMyfoYX5N6GVJg-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id C724C3C0DDDB; Mon, 14 Nov 2022 22:49:04 +0000 (UTC) Received: from green.redhat.com (unknown [10.2.16.240]) by smtp.corp.redhat.com (Postfix) with ESMTP id 2E07E40E42E3; Mon, 14 Nov 2022 22:49:04 +0000 (UTC) From: Eric Blake To: qemu-devel@nongnu.org Cc: qemu-block@nongnu.org, libguestfs@redhat.com, nbd@other.debian.org, Vladimir Sementsov-Ogievskiy , Kevin Wolf , Hanna Reitz Subject: [PATCH v2 13/15] nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS Date: Mon, 14 Nov 2022 16:48:46 -0600 Message-Id: <20221114224848.2186298-14-eblake@redhat.com> In-Reply-To: <20221114224848.2186298-1-eblake@redhat.com> References: <20221114224141.cm5jgyxfmvie5xb5@redhat.com> <20221114224848.2186298-1-eblake@redhat.com> MIME-Version: 1.0 Content-type: text/plain X-Scanned-By: MIMEDefang 3.1 on 10.11.54.2 Received-SPF: pass client-ip=170.10.129.124; envelope-from=eblake@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -27 X-Spam_score: -2.8 X-Spam_bar: -- X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Allow a client to request a subset of negotiated meta contexts. For example, a client may ask to use a single connection to learn about both block status and dirty bitmaps, but where the dirty bitmap queries only need to be performed on a subset of the disk; forcing the server to compute that information on block status queries in the rest of the disk is wasted effort (both at the server, and on the amount of traffic sent over the wire to be parsed and ignored by the client). Qemu as an NBD client never requests to use more than one meta context, so it has no need to use block status payloads. Testing this instead requires support from libnbd, which CAN access multiple meta contexts in parallel from a single NBD connection; an interop test submitted to the libnbd project at the same time as this patch demonstrates the feature working, as well as testing some corner cases (for example, when the payload length is longer than the export length), although other corner cases (like passing the same id duplicated) requires a protocol fuzzer because libnbd is not wired up to break the protocol that badly. This also includes tweaks to 'qemu-nbd --list' to show when a server is advertising the capability, and to the testsuite to reflect the addition to that output. Signed-off-by: Eric Blake --- docs/interop/nbd.txt | 2 +- include/block/nbd.h | 32 ++++-- nbd/server.c | 106 +++++++++++++++++- qemu-nbd.c | 1 + nbd/trace-events | 1 + tests/qemu-iotests/223.out | 12 +- tests/qemu-iotests/307.out | 10 +- .../tests/nbd-qemu-allocation.out | 2 +- 8 files changed, 136 insertions(+), 30 deletions(-) diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt index 988c072697..b7893043a3 100644 --- a/docs/interop/nbd.txt +++ b/docs/interop/nbd.txt @@ -69,4 +69,4 @@ NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE NBD_CMD_FLAG_FAST_ZERO * 5.2: NBD_CMD_BLOCK_STATUS for "qemu:allocation-depth" * 7.1: NBD_FLAG_CAN_MULTI_CONN for shareable writable exports -* 7.2: NBD_OPT_EXTENDED_HEADERS +* 7.2: NBD_OPT_EXTENDED_HEADERS, NBD_FLAG_BLOCK_STATUS_PAYLOAD diff --git a/include/block/nbd.h b/include/block/nbd.h index 9a8ac1c8a5..2a65c606c9 100644 --- a/include/block/nbd.h +++ b/include/block/nbd.h @@ -167,6 +167,12 @@ typedef struct NBDExtentExt { uint64_t flags; /* NBD_STATE_* */ } QEMU_PACKED NBDExtentExt; +/* Client payload for limiting NBD_CMD_BLOCK_STATUS reply */ +typedef struct NBDBlockStatusPayload { + uint64_t effect_length; + /* uint32_t ids[] follows, array length implied by header */ +} QEMU_PACKED NBDBlockStatusPayload; + /* Transmission (export) flags: sent from server to client during handshake, but describe what will happen during transmission */ enum { @@ -183,20 +189,22 @@ enum { NBD_FLAG_SEND_RESIZE_BIT = 9, /* Send resize */ NBD_FLAG_SEND_CACHE_BIT = 10, /* Send CACHE (prefetch) */ NBD_FLAG_SEND_FAST_ZERO_BIT = 11, /* FAST_ZERO flag for WRITE_ZEROES */ + NBD_FLAG_BLOCK_STAT_PAYLOAD_BIT = 12, /* PAYLOAD flag for BLOCK_STATUS */ }; -#define NBD_FLAG_HAS_FLAGS (1 << NBD_FLAG_HAS_FLAGS_BIT) -#define NBD_FLAG_READ_ONLY (1 << NBD_FLAG_READ_ONLY_BIT) -#define NBD_FLAG_SEND_FLUSH (1 << NBD_FLAG_SEND_FLUSH_BIT) -#define NBD_FLAG_SEND_FUA (1 << NBD_FLAG_SEND_FUA_BIT) -#define NBD_FLAG_ROTATIONAL (1 << NBD_FLAG_ROTATIONAL_BIT) -#define NBD_FLAG_SEND_TRIM (1 << NBD_FLAG_SEND_TRIM_BIT) -#define NBD_FLAG_SEND_WRITE_ZEROES (1 << NBD_FLAG_SEND_WRITE_ZEROES_BIT) -#define NBD_FLAG_SEND_DF (1 << NBD_FLAG_SEND_DF_BIT) -#define NBD_FLAG_CAN_MULTI_CONN (1 << NBD_FLAG_CAN_MULTI_CONN_BIT) -#define NBD_FLAG_SEND_RESIZE (1 << NBD_FLAG_SEND_RESIZE_BIT) -#define NBD_FLAG_SEND_CACHE (1 << NBD_FLAG_SEND_CACHE_BIT) -#define NBD_FLAG_SEND_FAST_ZERO (1 << NBD_FLAG_SEND_FAST_ZERO_BIT) +#define NBD_FLAG_HAS_FLAGS (1 << NBD_FLAG_HAS_FLAGS_BIT) +#define NBD_FLAG_READ_ONLY (1 << NBD_FLAG_READ_ONLY_BIT) +#define NBD_FLAG_SEND_FLUSH (1 << NBD_FLAG_SEND_FLUSH_BIT) +#define NBD_FLAG_SEND_FUA (1 << NBD_FLAG_SEND_FUA_BIT) +#define NBD_FLAG_ROTATIONAL (1 << NBD_FLAG_ROTATIONAL_BIT) +#define NBD_FLAG_SEND_TRIM (1 << NBD_FLAG_SEND_TRIM_BIT) +#define NBD_FLAG_SEND_WRITE_ZEROES (1 << NBD_FLAG_SEND_WRITE_ZEROES_BIT) +#define NBD_FLAG_SEND_DF (1 << NBD_FLAG_SEND_DF_BIT) +#define NBD_FLAG_CAN_MULTI_CONN (1 << NBD_FLAG_CAN_MULTI_CONN_BIT) +#define NBD_FLAG_SEND_RESIZE (1 << NBD_FLAG_SEND_RESIZE_BIT) +#define NBD_FLAG_SEND_CACHE (1 << NBD_FLAG_SEND_CACHE_BIT) +#define NBD_FLAG_SEND_FAST_ZERO (1 << NBD_FLAG_SEND_FAST_ZERO_BIT) +#define NBD_FLAG_BLOCK_STAT_PAYLOAD (1 << NBD_FLAG_BLOCK_STAT_PAYLOAD_BIT) /* New-style handshake (global) flags, sent from server to client, and control what will happen during handshake phase. */ diff --git a/nbd/server.c b/nbd/server.c index 1fd1f32028..cd280f1721 100644 --- a/nbd/server.c +++ b/nbd/server.c @@ -441,9 +441,9 @@ static int nbd_negotiate_handle_list(NBDClient *client, Error **errp) return nbd_negotiate_send_rep(client, NBD_REP_ACK, errp); } -static void nbd_check_meta_export(NBDClient *client) +static void nbd_check_meta_export(NBDClient *client, NBDExport *exp) { - if (client->exp != client->context_exp) { + if (exp != client->context_exp) { client->contexts.count = 0; } } @@ -486,11 +486,15 @@ static int nbd_negotiate_handle_export_name(NBDClient *client, bool no_zeroes, error_setg(errp, "export not found"); return -EINVAL; } + nbd_check_meta_export(client, client->exp); myflags = client->exp->nbdflags; if (client->structured_reply) { myflags |= NBD_FLAG_SEND_DF; } + if (client->extended_headers && client->contexts.count) { + myflags |= NBD_FLAG_BLOCK_STAT_PAYLOAD; + } trace_nbd_negotiate_new_style_size_flags(client->exp->size, myflags); stq_be_p(buf, client->exp->size); stw_be_p(buf + 8, myflags); @@ -503,7 +507,6 @@ static int nbd_negotiate_handle_export_name(NBDClient *client, bool no_zeroes, QTAILQ_INSERT_TAIL(&client->exp->clients, client, next); blk_exp_ref(&client->exp->common); - nbd_check_meta_export(client); return 0; } @@ -623,6 +626,9 @@ static int nbd_negotiate_handle_info(NBDClient *client, Error **errp) errp, "export '%s' not present", sane_name); } + if (client->opt == NBD_OPT_GO) { + nbd_check_meta_export(client, exp); + } /* Don't bother sending NBD_INFO_NAME unless client requested it */ if (sendname) { @@ -676,6 +682,10 @@ static int nbd_negotiate_handle_info(NBDClient *client, Error **errp) if (client->structured_reply) { myflags |= NBD_FLAG_SEND_DF; } + if (client->extended_headers && + (client->contexts.count || client->opt == NBD_OPT_INFO)) { + myflags |= NBD_FLAG_BLOCK_STAT_PAYLOAD; + } trace_nbd_negotiate_new_style_size_flags(exp->size, myflags); stq_be_p(buf, exp->size); stw_be_p(buf + 8, myflags); @@ -711,7 +721,6 @@ static int nbd_negotiate_handle_info(NBDClient *client, Error **errp) client->check_align = check_align; QTAILQ_INSERT_TAIL(&client->exp->clients, client, next); blk_exp_ref(&client->exp->common); - nbd_check_meta_export(client); rc = 1; } return rc; @@ -2406,6 +2415,83 @@ static int nbd_co_send_bitmap(NBDClient *client, NBDRequest *request, return nbd_co_send_extents(client, request, ea, last, context_id, errp); } +/* + * nbd_co_block_status_payload_read + * Called when a client wants a subset of negotiated contexts via a + * BLOCK_STATUS payload. Check the payload for valid length and + * contents. On success, return 0 with request updated to effective + * length. If request was invalid but payload consumed, return 0 with + * request->len and request->contexts.count set to 0 (which will + * trigger an appropriate NBD_EINVAL response later on). On I/O + * error, return -EIO. + */ +static int +nbd_co_block_status_payload_read(NBDClient *client, NBDRequest *request, + Error **errp) +{ + int payload_len = request->len; + g_autofree char *buf = NULL; + g_autofree bool *bitmaps = NULL; + size_t count, i; + uint32_t id; + + assert(request->len <= NBD_MAX_BUFFER_SIZE); + if (payload_len % sizeof(uint32_t) || + payload_len < sizeof(NBDBlockStatusPayload) || + payload_len > (sizeof(NBDBlockStatusPayload) + + sizeof(id) * client->contexts.count)) { + goto skip; + } + + buf = g_malloc(payload_len); + if (nbd_read(client->ioc, buf, payload_len, + "CMD_BLOCK_STATUS data", errp) < 0) { + return -EIO; + } + trace_nbd_co_receive_request_payload_received(request->handle, + payload_len); + memset(&request->contexts, 0, sizeof(request->contexts)); + request->contexts.nr_bitmaps = client->context_exp->nr_export_bitmaps; + bitmaps = g_new0(bool, request->contexts.nr_bitmaps); + count = (payload_len - sizeof(NBDBlockStatusPayload)) / sizeof(id); + payload_len = 0; + + for (i = 0; i < count; i++) { + + id = ldl_be_p(buf + sizeof(NBDBlockStatusPayload) + sizeof(id) * i); + if (id == NBD_META_ID_BASE_ALLOCATION) { + if (request->contexts.base_allocation) { + goto skip; + } + request->contexts.base_allocation = true; + } else if (id == NBD_META_ID_ALLOCATION_DEPTH) { + if (request->contexts.allocation_depth) { + goto skip; + } + request->contexts.allocation_depth = true; + } else { + if (id - NBD_META_ID_DIRTY_BITMAP > + request->contexts.nr_bitmaps || + bitmaps[id - NBD_META_ID_DIRTY_BITMAP]) { + goto skip; + } + bitmaps[id - NBD_META_ID_DIRTY_BITMAP] = true; + } + } + + request->len = ldq_be_p(buf); + request->contexts.count = count; + request->contexts.bitmaps = bitmaps; + bitmaps = NULL; + return 0; + + skip: + trace_nbd_co_receive_block_status_payload_compliance(request->from, + request->len); + request->len = request->contexts.count = 0; + return nbd_drop(client->ioc, payload_len, errp); +} + /* nbd_co_receive_request * Collect a client request. Return 0 if request looks valid, -EIO to drop * connection right away, -EAGAIN to indicate we were interrupted and the @@ -2452,7 +2538,14 @@ static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request, if (request->type == NBD_CMD_WRITE || extended_with_payload) { payload_len = request->len; - if (request->type != NBD_CMD_WRITE) { + if (request->type == NBD_CMD_BLOCK_STATUS) { + payload_len = nbd_co_block_status_payload_read(client, + request, + errp); + if (payload_len < 0) { + return -EIO; + } + } else if (request->type != NBD_CMD_WRITE) { /* * For now, we don't support payloads on other * commands; but we can keep the connection alive. @@ -2528,6 +2621,9 @@ static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request, valid_flags |= NBD_CMD_FLAG_NO_HOLE | NBD_CMD_FLAG_FAST_ZERO; } else if (request->type == NBD_CMD_BLOCK_STATUS) { valid_flags |= NBD_CMD_FLAG_REQ_ONE; + if (client->extended_headers && client->contexts.count) { + valid_flags |= NBD_CMD_FLAG_PAYLOAD_LEN; + } } if (request->flags & ~valid_flags) { error_setg(errp, "unsupported flags for command %s (got 0x%x)", diff --git a/qemu-nbd.c b/qemu-nbd.c index 9404164487..4ae5c5e73e 100644 --- a/qemu-nbd.c +++ b/qemu-nbd.c @@ -222,6 +222,7 @@ static int qemu_nbd_client_list(SocketAddress *saddr, QCryptoTLSCreds *tls, [NBD_FLAG_SEND_RESIZE_BIT] = "resize", [NBD_FLAG_SEND_CACHE_BIT] = "cache", [NBD_FLAG_SEND_FAST_ZERO_BIT] = "fast-zero", + [NBD_FLAG_BLOCK_STAT_PAYLOAD_BIT] = "block-status-payload", }; printf(" size: %" PRIu64 "\n", list[i].size); diff --git a/nbd/trace-events b/nbd/trace-events index c20df33a43..da92fe1b56 100644 --- a/nbd/trace-events +++ b/nbd/trace-events @@ -70,6 +70,7 @@ nbd_co_send_structured_read(uint64_t handle, uint64_t offset, void *data, size_t nbd_co_send_structured_read_hole(uint64_t handle, uint64_t offset, size_t size) "Send structured read hole reply: handle = %" PRIu64 ", offset = %" PRIu64 ", len = %zu" nbd_co_send_extents(uint64_t handle, unsigned int extents, uint32_t id, uint64_t length, int last) "Send block status reply: handle = %" PRIu64 ", extents = %u, context = %d (extents cover %" PRIu64 " bytes, last chunk = %d)" nbd_co_send_structured_error(uint64_t handle, int err, const char *errname, const char *msg) "Send structured error reply: handle = %" PRIu64 ", error = %d (%s), msg = '%s'" +nbd_co_receive_block_status_payload_compliance(uint64_t from, int len) "client sent unusable block status payload: from=0x%" PRIx64 ", len=0x%x" nbd_co_receive_request_decode_type(uint64_t handle, uint16_t type, const char *name) "Decoding type: handle = %" PRIu64 ", type = %" PRIu16 " (%s)" nbd_co_receive_request_payload_received(uint64_t handle, uint64_t len) "Payload received: handle = %" PRIu64 ", len = %" PRIu64 nbd_co_receive_ext_payload_compliance(uint64_t from, uint64_t len) "client sent non-compliant write without payload flag: from=0x%" PRIx64 ", len=0x%" PRIx64 diff --git a/tests/qemu-iotests/223.out b/tests/qemu-iotests/223.out index b98582c38e..b38f0b7963 100644 --- a/tests/qemu-iotests/223.out +++ b/tests/qemu-iotests/223.out @@ -83,7 +83,7 @@ exports available: 0 exports available: 3 export: 'n' size: 4194304 - flags: 0x58f ( readonly flush fua df multi cache ) + flags: 0x158f ( readonly flush fua df multi cache block-status-payload ) min block: 1 opt block: 4096 max block: 33554432 @@ -94,7 +94,7 @@ exports available: 3 export: 'n2' description: some text size: 4194304 - flags: 0xded ( flush fua trim zeroes df multi cache fast-zero ) + flags: 0x1ded ( flush fua trim zeroes df multi cache fast-zero block-status-payload ) min block: 1 opt block: 4096 max block: 33554432 @@ -104,7 +104,7 @@ exports available: 3 qemu:dirty-bitmap:b2 export: 'n3' size: 4194304 - flags: 0x58f ( readonly flush fua df multi cache ) + flags: 0x158f ( readonly flush fua df multi cache block-status-payload ) min block: 1 opt block: 4096 max block: 33554432 @@ -205,7 +205,7 @@ exports available: 0 exports available: 3 export: 'n' size: 4194304 - flags: 0x58f ( readonly flush fua df multi cache ) + flags: 0x158f ( readonly flush fua df multi cache block-status-payload ) min block: 1 opt block: 4096 max block: 33554432 @@ -216,7 +216,7 @@ exports available: 3 export: 'n2' description: some text size: 4194304 - flags: 0xded ( flush fua trim zeroes df multi cache fast-zero ) + flags: 0x1ded ( flush fua trim zeroes df multi cache fast-zero block-status-payload ) min block: 1 opt block: 4096 max block: 33554432 @@ -226,7 +226,7 @@ exports available: 3 qemu:dirty-bitmap:b2 export: 'n3' size: 4194304 - flags: 0x58f ( readonly flush fua df multi cache ) + flags: 0x158f ( readonly flush fua df multi cache block-status-payload ) min block: 1 opt block: 4096 max block: 33554432 diff --git a/tests/qemu-iotests/307.out b/tests/qemu-iotests/307.out index 2b9a6a67a1..f645f3315f 100644 --- a/tests/qemu-iotests/307.out +++ b/tests/qemu-iotests/307.out @@ -15,7 +15,7 @@ wrote 4096/4096 bytes at offset 0 exports available: 1 export: 'fmt' size: 67108864 - flags: 0x58f ( readonly flush fua df multi cache ) + flags: 0x158f ( readonly flush fua df multi cache block-status-payload ) min block: XXX opt block: XXX max block: XXX @@ -44,7 +44,7 @@ exports available: 1 exports available: 1 export: 'fmt' size: 67108864 - flags: 0x58f ( readonly flush fua df multi cache ) + flags: 0x158f ( readonly flush fua df multi cache block-status-payload ) min block: XXX opt block: XXX max block: XXX @@ -76,7 +76,7 @@ exports available: 1 exports available: 2 export: 'fmt' size: 67108864 - flags: 0x58f ( readonly flush fua df multi cache ) + flags: 0x158f ( readonly flush fua df multi cache block-status-payload ) min block: XXX opt block: XXX max block: XXX @@ -86,7 +86,7 @@ exports available: 2 export: 'export1' description: This is the writable second export size: 67108864 - flags: 0xded ( flush fua trim zeroes df multi cache fast-zero ) + flags: 0x1ded ( flush fua trim zeroes df multi cache fast-zero block-status-payload ) min block: XXX opt block: XXX max block: XXX @@ -113,7 +113,7 @@ exports available: 1 export: 'export1' description: This is the writable second export size: 67108864 - flags: 0xded ( flush fua trim zeroes df multi cache fast-zero ) + flags: 0x1ded ( flush fua trim zeroes df multi cache fast-zero block-status-payload ) min block: XXX opt block: XXX max block: XXX diff --git a/tests/qemu-iotests/tests/nbd-qemu-allocation.out b/tests/qemu-iotests/tests/nbd-qemu-allocation.out index 659276032b..794d1bfce6 100644 --- a/tests/qemu-iotests/tests/nbd-qemu-allocation.out +++ b/tests/qemu-iotests/tests/nbd-qemu-allocation.out @@ -17,7 +17,7 @@ wrote 2097152/2097152 bytes at offset 1048576 exports available: 1 export: '' size: 4194304 - flags: 0x48f ( readonly flush fua df cache ) + flags: 0x148f ( readonly flush fua df cache block-status-payload ) min block: 1 opt block: 4096 max block: 33554432 From patchwork Mon Nov 14 22:48:47 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Blake X-Patchwork-Id: 1703771 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=fMWcp1f/; dkim-atps=neutral Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4NB4sN0Tx4z23lt for ; Tue, 15 Nov 2022 10:15:02 +1100 (AEDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ouieJ-00039M-3Z; Mon, 14 Nov 2022 18:13:55 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ouidr-0001X6-6a for qemu-devel@nongnu.org; Mon, 14 Nov 2022 18:13:27 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ouiGM-0002Pu-HD for qemu-devel@nongnu.org; Mon, 14 Nov 2022 17:49:12 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1668466149; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=PSKFsd3WLLrsYRtlYZAlFwgEDN2comkk7iEfIBxymQM=; b=fMWcp1f/Xw6EwH0DWkNDimoEsX9EWAfid/gtr5jeTUdUTy9TxpelHyjfN6919SzmO4HByZ y/WVzFgWyUHOuymRsqyCqgVG+LILAfTJeaYySfFBm2kakkDCTiFJKA4fBIljgWC+9DXwVm wdML+C567BE/hQ8dI9KpIHc/Lfd9MQ4= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-317-WJwDpn3APeGXsE-6Q8_8Dw-1; Mon, 14 Nov 2022 17:49:06 -0500 X-MC-Unique: WJwDpn3APeGXsE-6Q8_8Dw-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 9AEB4101A528; Mon, 14 Nov 2022 22:49:05 +0000 (UTC) Received: from green.redhat.com (unknown [10.2.16.240]) by smtp.corp.redhat.com (Postfix) with ESMTP id 0566940E42E5; Mon, 14 Nov 2022 22:49:04 +0000 (UTC) From: Eric Blake To: qemu-devel@nongnu.org Cc: qemu-block@nongnu.org, libguestfs@redhat.com, nbd@other.debian.org, Vladimir Sementsov-Ogievskiy , Kevin Wolf , Hanna Reitz Subject: [PATCH v2 14/15] RFC: nbd/client: Accept 64-bit hole chunks Date: Mon, 14 Nov 2022 16:48:47 -0600 Message-Id: <20221114224848.2186298-15-eblake@redhat.com> In-Reply-To: <20221114224848.2186298-1-eblake@redhat.com> References: <20221114224141.cm5jgyxfmvie5xb5@redhat.com> <20221114224848.2186298-1-eblake@redhat.com> MIME-Version: 1.0 Content-type: text/plain X-Scanned-By: MIMEDefang 3.1 on 10.11.54.2 Received-SPF: pass client-ip=170.10.129.124; envelope-from=eblake@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -27 X-Spam_score: -2.8 X-Spam_bar: -- X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=unavailable autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org As part of adding extended headers, the NBD spec debated about adding support for reading 64-bit holes. It was documented in a separate upstream commit XXX[*] to make it easier to decide whether 64-bit holes should be required of all clients supporting extended headers, or whether it is an unneeded feature; hence, the qemu work to support it is also pulled out into a separate commit. Note that we can also tolerate a non-compliant server sending the new chunk even when it should not. Signed-off-by: Eric Blake --- [*] Fix commit id if we actually go with idea --- include/block/nbd.h | 8 ++++++++ block/nbd.c | 26 ++++++++++++++++++++------ nbd/common.c | 4 +++- 3 files changed, 31 insertions(+), 7 deletions(-) diff --git a/include/block/nbd.h b/include/block/nbd.h index 2a65c606c9..18b6bad038 100644 --- a/include/block/nbd.h +++ b/include/block/nbd.h @@ -133,6 +133,13 @@ typedef struct NBDStructuredReadHole { uint32_t length; } QEMU_PACKED NBDStructuredReadHole; +/* Complete chunk for NBD_REPLY_TYPE_OFFSET_HOLE_EXT */ +typedef struct NBDStructuredReadHoleExt { + /* header's length == 16 */ + uint64_t offset; + uint64_t length; +} QEMU_PACKED NBDStructuredReadHoleExt; + /* Header of all NBD_REPLY_TYPE_ERROR* errors */ typedef struct NBDStructuredError { /* header's length >= 6 */ @@ -309,6 +316,7 @@ enum { #define NBD_REPLY_TYPE_NONE 0 #define NBD_REPLY_TYPE_OFFSET_DATA 1 #define NBD_REPLY_TYPE_OFFSET_HOLE 2 +#define NBD_REPLY_TYPE_OFFSET_HOLE_EXT 3 #define NBD_REPLY_TYPE_BLOCK_STATUS 5 #define NBD_REPLY_TYPE_BLOCK_STATUS_EXT 6 #define NBD_REPLY_TYPE_ERROR NBD_REPLY_ERR(1) diff --git a/block/nbd.c b/block/nbd.c index 44ab5437ea..968d5d8a37 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -570,20 +570,26 @@ static inline uint64_t payload_advance64(uint8_t **payload) static int nbd_parse_offset_hole_payload(BDRVNBDState *s, NBDStructuredReplyChunk *chunk, - uint8_t *payload, uint64_t orig_offset, + uint8_t *payload, bool wide, + uint64_t orig_offset, QEMUIOVector *qiov, Error **errp) { uint64_t offset; - uint32_t hole_size; + uint64_t hole_size; + size_t len = wide ? sizeof(hole_size) : sizeof(uint32_t); - if (chunk->length != sizeof(offset) + sizeof(hole_size)) { + if (chunk->length != sizeof(offset) + len) { error_setg(errp, "Protocol error: invalid payload for " "NBD_REPLY_TYPE_OFFSET_HOLE"); return -EINVAL; } offset = payload_advance64(&payload); - hole_size = payload_advance32(&payload); + if (wide) { + hole_size = payload_advance64(&payload); + } else { + hole_size = payload_advance32(&payload); + } if (!hole_size || offset < orig_offset || hole_size > qiov->size || offset > orig_offset + qiov->size - hole_size) { @@ -596,6 +602,7 @@ static int nbd_parse_offset_hole_payload(BDRVNBDState *s, trace_nbd_structured_read_compliance("hole"); } + assert(hole_size <= SIZE_MAX); qemu_iovec_memset(qiov, offset - orig_offset, 0, hole_size); return 0; @@ -1094,9 +1101,16 @@ static int coroutine_fn nbd_co_receive_cmdread_reply(BDRVNBDState *s, uint64_t h * in qiov */ break; + case NBD_REPLY_TYPE_OFFSET_HOLE_EXT: + if (!s->info.extended_headers) { + trace_nbd_extended_headers_compliance("hole_ext"); + } + /* fallthrough */ case NBD_REPLY_TYPE_OFFSET_HOLE: - ret = nbd_parse_offset_hole_payload(s, &reply.structured, payload, - offset, qiov, &local_err); + ret = nbd_parse_offset_hole_payload( + s, &reply.structured, payload, + chunk->type == NBD_REPLY_TYPE_OFFSET_HOLE_EXT, + offset, qiov, &local_err); if (ret < 0) { nbd_channel_error(s, ret); nbd_iter_channel_error(&iter, ret, &local_err); diff --git a/nbd/common.c b/nbd/common.c index 137466defd..54f7d6a4fd 100644 --- a/nbd/common.c +++ b/nbd/common.c @@ -174,7 +174,9 @@ const char *nbd_reply_type_lookup(uint16_t type) case NBD_REPLY_TYPE_OFFSET_DATA: return "data"; case NBD_REPLY_TYPE_OFFSET_HOLE: - return "hole"; + return "hole (32-bit)"; + case NBD_REPLY_TYPE_OFFSET_HOLE_EXT: + return "hole (64-bit)"; case NBD_REPLY_TYPE_BLOCK_STATUS: return "block status (32-bit)"; case NBD_REPLY_TYPE_BLOCK_STATUS_EXT: From patchwork Mon Nov 14 22:48:48 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Blake X-Patchwork-Id: 1703795 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=eflHQfT6; dkim-atps=neutral Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4NB5Nj1FXNz23mY for ; Tue, 15 Nov 2022 10:38:45 +1100 (AEDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ouie1-0002CA-R5; Mon, 14 Nov 2022 18:13:37 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ouidr-0001df-6E for qemu-devel@nongnu.org; Mon, 14 Nov 2022 18:13:27 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ouiGM-0002QB-Kx for qemu-devel@nongnu.org; Mon, 14 Nov 2022 17:49:12 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1668466150; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=CHsBQkgh32mZYDlsY0mZ2u45YAKMiJgOqFfn5u30FUM=; b=eflHQfT6fgXpf9vgPxWhOSlcUIKdvxYwaL75zSzK4Rg5rXXbh/b08rOoIMk2S/zMR1DiGp RvPQxRvTHqzFSIgz0kGCuQcIRPsBU8HniwIDWM3X8chyDGIggJpcbLsEf8MgJZS/PUHvMV Mmquj8sxsFxLNNok/Zobfo1wBrO1HcQ= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-644-Y1SxJHrLPdCwHXu-gqXVEA-1; Mon, 14 Nov 2022 17:49:06 -0500 X-MC-Unique: Y1SxJHrLPdCwHXu-gqXVEA-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 47AFC101E989; Mon, 14 Nov 2022 22:49:06 +0000 (UTC) Received: from green.redhat.com (unknown [10.2.16.240]) by smtp.corp.redhat.com (Postfix) with ESMTP id C3AAE40E42E3; Mon, 14 Nov 2022 22:49:05 +0000 (UTC) From: Eric Blake To: qemu-devel@nongnu.org Cc: qemu-block@nongnu.org, libguestfs@redhat.com, nbd@other.debian.org, Vladimir Sementsov-Ogievskiy Subject: [PATCH v2 15/15] RFC: nbd/server: Send 64-bit hole chunk Date: Mon, 14 Nov 2022 16:48:48 -0600 Message-Id: <20221114224848.2186298-16-eblake@redhat.com> In-Reply-To: <20221114224848.2186298-1-eblake@redhat.com> References: <20221114224141.cm5jgyxfmvie5xb5@redhat.com> <20221114224848.2186298-1-eblake@redhat.com> MIME-Version: 1.0 Content-type: text/plain X-Scanned-By: MIMEDefang 3.1 on 10.11.54.2 Received-SPF: pass client-ip=170.10.133.124; envelope-from=eblake@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Since we cap NBD_CMD_READ requests to 32M, we never have a reason to send a 64-bit chunk type for a hole; but it is worth producing these for interoperability testing of clients that want extended headers. --- nbd/server.c | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/nbd/server.c b/nbd/server.c index cd280f1721..04cb172f97 100644 --- a/nbd/server.c +++ b/nbd/server.c @@ -2112,9 +2112,13 @@ static int coroutine_fn nbd_co_send_sparse_read(NBDClient *client, if (status & BDRV_BLOCK_ZERO) { NBDReply hdr; NBDStructuredReadHole chunk; + NBDStructuredReadHoleExt chunk_ext; struct iovec iov[] = { {.iov_base = &hdr}, - {.iov_base = &chunk, .iov_len = sizeof(chunk)}, + {.iov_base = client->extended_headers ? &chunk_ext + : (void *) &chunk, + .iov_len = client->extended_headers ? sizeof(chunk_ext) + : sizeof(chunk)}, }; trace_nbd_co_send_structured_read_hole(request->handle, @@ -2122,9 +2126,17 @@ static int coroutine_fn nbd_co_send_sparse_read(NBDClient *client, pnum); set_be_chunk(client, &iov[0], final ? NBD_REPLY_FLAG_DONE : 0, - NBD_REPLY_TYPE_OFFSET_HOLE, request, iov[1].iov_len); - stq_be_p(&chunk.offset, offset + progress); - stl_be_p(&chunk.length, pnum); + client->extended_headers + ? NBD_REPLY_TYPE_OFFSET_HOLE_EXT + : NBD_REPLY_TYPE_OFFSET_HOLE, + request, iov[1].iov_len); + if (client->extended_headers) { + stq_be_p(&chunk_ext.offset, offset + progress); + stq_be_p(&chunk_ext.length, pnum); + } else { + stq_be_p(&chunk.offset, offset + progress); + stl_be_p(&chunk.length, pnum); + } ret = nbd_co_send_iov(client, iov, 2, errp); } else { ret = blk_pread(exp->common.blk, offset + progress, pnum, From patchwork Mon Nov 14 22:51:53 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Blake X-Patchwork-Id: 1703781 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=CUCi04B0; dkim-atps=neutral Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4NB53g3kFyz23lt for ; Tue, 15 Nov 2022 10:23:59 +1100 (AEDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ouidw-0001zr-RN; Mon, 14 Nov 2022 18:13:33 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ouidm-0001X6-92 for qemu-devel@nongnu.org; Mon, 14 Nov 2022 18:13:22 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ouiJP-00036V-DK for qemu-devel@nongnu.org; Mon, 14 Nov 2022 17:52:29 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1668466334; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=yGGwQbuqD48ui6XJeENuPlPZt4N0uynde43JY+1oSfw=; b=CUCi04B0F7JWlMQW9LnQg8JcR3aL0DnT1OSGfqz9g4euWU5YTnUjfdoMUHXEplk7G2IuyG kg/vX72uKSu0Btjp26ee2JzsOl4LTEbGORW6RaoRQKOX2pTpy2zUwxSi/5U4NB85DUTgWy bITmfdf1T8oOtXb3p7ajU5e62Ot+Xxg= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-388-5yyGsRK_Nq2mNK1KuWqcdg-1; Mon, 14 Nov 2022 17:52:12 -0500 X-MC-Unique: 5yyGsRK_Nq2mNK1KuWqcdg-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 2A1AD1C08972; Mon, 14 Nov 2022 22:52:12 +0000 (UTC) Received: from green.redhat.com (unknown [10.2.16.240]) by smtp.corp.redhat.com (Postfix) with ESMTP id BADB340E9786; Mon, 14 Nov 2022 22:52:11 +0000 (UTC) From: Eric Blake To: libguestfs@redhat.com Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org, nbd@other.debian.org Subject: [libnbd PATCH v2 18/23] generator: Actually request extended headers Date: Mon, 14 Nov 2022 16:51:53 -0600 Message-Id: <20221114225158.2186742-19-eblake@redhat.com> In-Reply-To: <20221114225158.2186742-1-eblake@redhat.com> References: <20221114224141.cm5jgyxfmvie5xb5@redhat.com> <20221114225158.2186742-1-eblake@redhat.com> MIME-Version: 1.0 Content-type: text/plain X-Scanned-By: MIMEDefang 3.1 on 10.11.54.2 Received-SPF: pass client-ip=170.10.133.124; envelope-from=eblake@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=unavailable autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org This is the culmination of the previous patches' preparation work for using extended headers when possible. The new states in the state machine are copied extensively from our handling of OPT_STRUCTURED_REPLY. The next patch will then expose a new API nbd_opt_extended_headers() for manual control. At the same time I posted this patch, I had patches for qemu-nbd to support extended headers as server (nbdkit is a bit tougher). The next patches will add some interop tests that pass when using a new enough qemu-nbd, showing that we have cross-project interoperability and therefore an extension worth standardizing. --- generator/API.ml | 87 ++++++++--------- generator/Makefile.am | 3 +- generator/state_machine.ml | 41 ++++++++ .../states-newstyle-opt-extended-headers.c | 94 +++++++++++++++++++ generator/states-newstyle-opt-starttls.c | 6 +- 5 files changed, 185 insertions(+), 46 deletions(-) create mode 100644 generator/states-newstyle-opt-extended-headers.c diff --git a/generator/API.ml b/generator/API.ml index 3d0289f6..570f15f4 100644 --- a/generator/API.ml +++ b/generator/API.ml @@ -953,23 +953,24 @@ "set_request_meta_context", { (all C calls when L is false, or L and L when option mode is enabled) will also try to issue NBD_OPT_SET_META_CONTEXT when -the server supports structured replies and any contexts were -registered by L. The default setting -is true; however the extra step of negotiating meta contexts is -not always desirable: performing both info and go on the same -export works without needing to re-negotiate contexts on the -second call; integration testing of other servers may benefit -from manual invocation of L at -other times in the negotiation sequence; and even when using just -L, it can be faster to collect the server's +the server supports structured replies or extended headers and +any contexts were registered by L. The +default setting is true; however the extra step of negotiating +meta contexts is not always desirable: performing both info and +go on the same export works without needing to re-negotiate +contexts on the second call; integration testing of other servers +may benefit from manual invocation of L +at other times in the negotiation sequence; and even when using +just L, it can be faster to collect the server's results by relying on the callback function passed to L than a series of post-process calls to L. Note that this control has no effect if the server does not -negotiate structured replies, or if the client did not request -any contexts via L. Setting this -control to false may cause L to fail."; +negotiate structured replies or extended headers, or if the +client did not request any contexts via L. +Setting this control to false may cause L +to fail."; see_also = [Link "set_opt_mode"; Link "opt_go"; Link "opt_info"; Link "opt_list_meta_context"; Link "opt_set_meta_context"; Link "get_structured_replies_negotiated"; @@ -1404,11 +1405,11 @@ "opt_info", { If successful, functions like L and L will report details about that export. If L is set (the default) and -structured replies were negotiated, it is also valid to use -L after this call. However, it may be -more efficient to clear that setting and manually utilize -L with its callback approach, for -learning which contexts an export supports. In general, if +structured replies or extended headers were negotiated, it is also +valid to use L after this call. However, +it may be more efficient to clear that setting and manually +utilize L with its callback approach, +for learning which contexts an export supports. In general, if L is called next, that call will likely succeed with the details remaining the same, although this is not guaranteed by all servers. @@ -1538,12 +1539,12 @@ "opt_set_meta_context", { recent L or L. This can only be used if L enabled option mode. Normally, this function is redundant, as L -automatically does the same task if structured replies have -already been negotiated. But manual control over meta context -requests can be useful for fine-grained testing of how a server -handles unusual negotiation sequences. Often, use of this -function is coupled with L to -bypass the automatic context request normally performed by +automatically does the same task if structured replies or extended +headers have already been negotiated. But manual control over +meta context requests can be useful for fine-grained testing of +how a server handles unusual negotiation sequences. Often, use +of this function is coupled with L +to bypass the automatic context request normally performed by L. The NBD protocol allows a client to decide how many queries to ask @@ -1597,12 +1598,13 @@ "opt_set_meta_context_queries", { or L. This can only be used if L enabled option mode. Normally, this function is redundant, as L automatically does -the same task if structured replies have already been -negotiated. But manual control over meta context requests can -be useful for fine-grained testing of how a server handles -unusual negotiation sequences. Often, use of this function is -coupled with L to bypass the -automatic context request normally performed by L. +the same task if structured replies or extended headers have +already been negotiated. But manual control over meta context +requests can be useful for fine-grained testing of how a server +handles unusual negotiation sequences. Often, use of this +function is coupled with L to +bypass the automatic context request normally performed by +L. The NBD protocol allows a client to decide how many queries to ask the server. This function takes an explicit list of queries; to @@ -3233,13 +3235,13 @@ "aio_opt_set_meta_context", { recent L or L. This can only be used if L enabled option mode. Normally, this function is redundant, as L -automatically does the same task if structured replies have -already been negotiated. But manual control over meta context -requests can be useful for fine-grained testing of how a server -handles unusual negotiation sequences. Often, use of this -function is coupled with L to -bypass the automatic context request normally performed by -L. +automatically does the same task if structured replies or +extended headers have already been negotiated. But manual +control over meta context requests can be useful for fine-grained +testing of how a server handles unusual negotiation sequences. +Often, use of this function is coupled with +L to bypass the automatic +context request normally performed by L. To determine when the request completes, wait for L to return false. Or supply the optional @@ -3266,12 +3268,13 @@ "aio_opt_set_meta_context_queries", { or L. This can only be used if L enabled option mode. Normally, this function is redundant, as L automatically does -the same task if structured replies have already been -negotiated. But manual control over meta context requests can -be useful for fine-grained testing of how a server handles -unusual negotiation sequences. Often, use of this function is -coupled with L to bypass the -automatic context request normally performed by L. +the same task if structured replies or extended headers have +already been negotiated. But manual control over meta context +requests can be useful for fine-grained testing of how a server +handles unusual negotiation sequences. Often, use of this +function is coupled with L to +bypass the automatic context request normally performed by +L. To determine when the request completes, wait for L to return false. Or supply the optional diff --git a/generator/Makefile.am b/generator/Makefile.am index 3193c733..09739449 100644 --- a/generator/Makefile.am +++ b/generator/Makefile.am @@ -1,5 +1,5 @@ # nbd client library in userspace -# Copyright (C) 2013-2020 Red Hat Inc. +# Copyright (C) 2013-2022 Red Hat Inc. # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public @@ -30,6 +30,7 @@ states_code = \ states-issue-command.c \ states-magic.c \ states-newstyle-opt-export-name.c \ + states-newstyle-opt-extended-headers.c \ states-newstyle-opt-list.c \ states-newstyle-opt-go.c \ states-newstyle-opt-meta-context.c \ diff --git a/generator/state_machine.ml b/generator/state_machine.ml index d2c326f3..c9e38a5e 100644 --- a/generator/state_machine.ml +++ b/generator/state_machine.ml @@ -297,6 +297,7 @@ and * NEGOTIATING after OPT_STRUCTURED_REPLY or any failed OPT_GO. *) Group ("OPT_STARTTLS", newstyle_opt_starttls_state_machine); + Group ("OPT_EXTENDED_HEADERS", newstyle_opt_extended_headers_state_machine); Group ("OPT_STRUCTURED_REPLY", newstyle_opt_structured_reply_state_machine); Group ("OPT_META_CONTEXT", newstyle_opt_meta_context_state_machine); Group ("OPT_GO", newstyle_opt_go_state_machine); @@ -441,6 +442,46 @@ and }; ] +(* Fixed newstyle NBD_OPT_EXTENDED_HEADERS option. + * Implementation: generator/states-newstyle-opt-extended-headers.c + *) +and newstyle_opt_extended_headers_state_machine = [ + State { + default_state with + name = "START"; + comment = "Try to negotiate newstyle NBD_OPT_EXTENDED_HEADERS"; + external_events = []; + }; + + State { + default_state with + name = "SEND"; + comment = "Send newstyle NBD_OPT_EXTENDED_HEADERS negotiation request"; + external_events = [ NotifyWrite, "" ]; + }; + + State { + default_state with + name = "RECV_REPLY"; + comment = "Receive newstyle NBD_OPT_EXTENDED_HEADERS option reply"; + external_events = [ NotifyRead, "" ]; + }; + + State { + default_state with + name = "RECV_REPLY_PAYLOAD"; + comment = "Receive any newstyle NBD_OPT_EXTENDED_HEADERS reply payload"; + external_events = [ NotifyRead, "" ]; + }; + + State { + default_state with + name = "CHECK_REPLY"; + comment = "Check newstyle NBD_OPT_EXTENDED_HEADERS option reply"; + external_events = []; + }; +] + (* Fixed newstyle NBD_OPT_STRUCTURED_REPLY option. * Implementation: generator/states-newstyle-opt-structured-reply.c *) diff --git a/generator/states-newstyle-opt-extended-headers.c b/generator/states-newstyle-opt-extended-headers.c new file mode 100644 index 00000000..151787bf --- /dev/null +++ b/generator/states-newstyle-opt-extended-headers.c @@ -0,0 +1,94 @@ +/* nbd client library in userspace: state machine + * Copyright (C) 2013-2022 Red Hat Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +/* State machine for negotiating NBD_OPT_EXTENDED_HEADERS. */ + +STATE_MACHINE { + NEWSTYLE.OPT_EXTENDED_HEADERS.START: + assert (h->gflags & LIBNBD_HANDSHAKE_FLAG_FIXED_NEWSTYLE); + assert (h->opt_current != NBD_OPT_EXTENDED_HEADERS); + assert (CALLBACK_IS_NULL (h->opt_cb.completion)); + if (!h->request_eh || !h->request_sr) { + SET_NEXT_STATE (%^OPT_STRUCTURED_REPLY.START); + return 0; + } + + h->sbuf.option.version = htobe64 (NBD_NEW_VERSION); + h->sbuf.option.option = htobe32 (NBD_OPT_EXTENDED_HEADERS); + h->sbuf.option.optlen = htobe32 (0); + h->chunks_sent++; + h->wbuf = &h->sbuf; + h->wlen = sizeof h->sbuf.option; + SET_NEXT_STATE (%SEND); + return 0; + + NEWSTYLE.OPT_EXTENDED_HEADERS.SEND: + switch (send_from_wbuf (h)) { + case -1: SET_NEXT_STATE (%.DEAD); return 0; + case 0: + h->rbuf = &h->sbuf; + h->rlen = sizeof h->sbuf.or.option_reply; + SET_NEXT_STATE (%RECV_REPLY); + } + return 0; + + NEWSTYLE.OPT_EXTENDED_HEADERS.RECV_REPLY: + switch (recv_into_rbuf (h)) { + case -1: SET_NEXT_STATE (%.DEAD); return 0; + case 0: + if (prepare_for_reply_payload (h, NBD_OPT_EXTENDED_HEADERS) == -1) { + SET_NEXT_STATE (%.DEAD); + return 0; + } + SET_NEXT_STATE (%RECV_REPLY_PAYLOAD); + } + return 0; + + NEWSTYLE.OPT_EXTENDED_HEADERS.RECV_REPLY_PAYLOAD: + switch (recv_into_rbuf (h)) { + case -1: SET_NEXT_STATE (%.DEAD); return 0; + case 0: SET_NEXT_STATE (%CHECK_REPLY); + } + return 0; + + NEWSTYLE.OPT_EXTENDED_HEADERS.CHECK_REPLY: + uint32_t reply; + + reply = be32toh (h->sbuf.or.option_reply.reply); + switch (reply) { + case NBD_REP_ACK: + debug (h, "negotiated extended headers on this connection"); + h->extended_headers = true; + /* Extended headers trump structured replies, so skip ahead. */ + h->structured_replies = true; + break; + default: + if (handle_reply_error (h) == -1) { + SET_NEXT_STATE (%.DEAD); + return 0; + } + + debug (h, "extended headers are not supported by this server"); + break; + } + + /* Next option. */ + SET_NEXT_STATE (%^OPT_STRUCTURED_REPLY.START); + return 0; + +} /* END STATE MACHINE */ diff --git a/generator/states-newstyle-opt-starttls.c b/generator/states-newstyle-opt-starttls.c index 6d528672..3b509841 100644 --- a/generator/states-newstyle-opt-starttls.c +++ b/generator/states-newstyle-opt-starttls.c @@ -26,7 +26,7 @@ NEWSTYLE.OPT_STARTTLS.START: else { /* If TLS was not requested we skip this option and go to the next one. */ if (h->tls == LIBNBD_TLS_DISABLE) { - SET_NEXT_STATE (%^OPT_STRUCTURED_REPLY.START); + SET_NEXT_STATE (%^OPT_EXTENDED_HEADERS.START); return 0; } assert (CALLBACK_IS_NULL (h->opt_cb.completion)); @@ -128,7 +128,7 @@ NEWSTYLE.OPT_STARTTLS.CHECK_REPLY: SET_NEXT_STATE (%.NEGOTIATING); else { debug (h, "continuing with unencrypted connection"); - SET_NEXT_STATE (%^OPT_STRUCTURED_REPLY.START); + SET_NEXT_STATE (%^OPT_EXTENDED_HEADERS.START); } return 0; } @@ -185,7 +185,7 @@ NEWSTYLE.OPT_STARTTLS.TLS_HANDSHAKE_DONE: if (h->opt_current == NBD_OPT_STARTTLS) SET_NEXT_STATE (%.NEGOTIATING); else - SET_NEXT_STATE (%^OPT_STRUCTURED_REPLY.START); + SET_NEXT_STATE (%^OPT_EXTENDED_HEADERS.START); return 0; } /* END STATE MACHINE */