[{"id":3675868,"web_url":"http://patchwork.ozlabs.org/comment/3675868/","msgid":"<20260410124121.GA485553@fedora>","list_archive_url":null,"date":"2026-04-10T12:41:21","subject":"Re: [PATCH v2] hw/nvme: add namespace hotplug support","submitter":{"id":17227,"url":"http://patchwork.ozlabs.org/api/people/17227/","name":"Stefan Hajnoczi","email":"stefanha@redhat.com"},"content":"On Thu, Apr 09, 2026 at 11:34:51PM +0200, mr-083 wrote:\n> Add hotplug support for nvme-ns devices on the NvmeBus. This enables\n> namespace-level hot-swap without removing the NVMe controller, matching\n> the behavior of physical NVMe drives hot-swapped in the same PCIe slot.\n\nIf we rely purely on NVMe's AEN then this is not equivalent to swapping\nphysical drives in the same PCIe slot. Maybe adjust the wording to\nreflect that this is NVMe-level Namespace hotplug?\n\n> Mark nvme-ns devices as hotpluggable and register the NvmeBus as a\n> hotplug handler with proper plug and unplug callbacks:\n> \n> - plug: attach namespace to all started controllers and send an\n>   Asynchronous Event Notification (AEN) with NS_ATTR_CHANGED so\n>   the guest kernel rescans namespaces and adds the block device\n> - unplug: detach from all controllers, send AEN, remove from\n>   subsystem, then unrealize the device. The guest kernel rescans\n>   and removes the block device.\n> \n> The plug handler skips controllers that haven't started yet\n> (qs_created == false) to avoid interfering with boot-time namespace\n> attachment in nvme_start_ctrl().\n> \n> Both the controller bus and subsystem bus are configured as hotplug\n> handlers via qbus_set_bus_hotplug_handler() since nvme-ns devices\n> may reparent to the subsystem bus during realize.\n> \n> Example hot-swap sequence using the NVMe subsystem model:\n> \n>   # Boot with: -device nvme-subsys,id=subsys0\n>   #            -device nvme,id=ctrl0,subsys=subsys0\n>   #            -device nvme-ns,id=ns0,drive=drv0,bus=ctrl0,nsid=1\n> \n>   device_del ns0             # guest receives AEN, removes /dev/nvme0n1\n>   drive_del drv0\n>   drive_add 0 file=disk.qcow2,format=qcow2,id=drv0,if=none\n>   device_add nvme-ns,id=ns0,drive=drv0,bus=ctrl0,nsid=1\n>                               # guest receives AEN, adds /dev/nvme0n1\n> \n> Tested with Linux 6.1 guest (NVMe driver processes AEN and rescans\n> namespace list automatically).\n\nDid you test a Windows Server guest? If not, I can try that next week in\ncase there are any surprises.\n\n> \n> Signed-off-by: Matthieu Receveur <matthieu@min.io>\n> ---\n>  hw/nvme/ctrl.c   | 85 ++++++++++++++++++++++++++++++++++++++++++++++++\n>  hw/nvme/ns.c     |  1 +\n>  hw/nvme/subsys.c |  2 ++\n>  3 files changed, 88 insertions(+)\n> \n> diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c\n> index be6c7028cb..5502e4ea2b 100644\n> --- a/hw/nvme/ctrl.c\n> +++ b/hw/nvme/ctrl.c\n> @@ -206,6 +206,7 @@\n>  #include \"system/hostmem.h\"\n>  #include \"hw/pci/msix.h\"\n>  #include \"hw/pci/pcie_sriov.h\"\n> +#include \"hw/core/qdev.h\"\n>  #include \"system/spdm-socket.h\"\n>  #include \"migration/vmstate.h\"\n>  \n> @@ -9293,6 +9294,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)\n>      }\n>  \n>      qbus_init(&n->bus, sizeof(NvmeBus), TYPE_NVME_BUS, dev, dev->id);\n> +    qbus_set_bus_hotplug_handler(BUS(&n->bus));\n>  \n>      if (nvme_init_subsys(n, errp)) {\n>          return;\n> @@ -9553,10 +9555,93 @@ static const TypeInfo nvme_info = {\n>      },\n>  };\n>  \n> +static void nvme_ns_hot_plug(HotplugHandler *hotplug_dev, DeviceState *dev,\n> +                              Error **errp)\n> +{\n> +    NvmeNamespace *ns = NVME_NS(dev);\n> +    NvmeSubsystem *subsys = ns->subsys;\n> +    uint32_t nsid = ns->params.nsid;\n> +    int i;\n> +\n> +    /*\n> +     * Attach to all started controllers and notify via AEN.\n> +     * Skip controllers that haven't started yet (boot-time realize) —\n> +     * nvme_start_ctrl() will attach namespaces during controller init.\n> +     */\n> +    for (i = 0; i < NVME_MAX_CONTROLLERS; i++) {\n> +        NvmeCtrl *ctrl = nvme_subsys_ctrl(subsys, i);\n> +        if (!ctrl || !ctrl->qs_created) {\n> +            continue;\n> +        }\n> +\n> +        if (nvme_csi_supported(ctrl, ns->csi) && !ns->params.detached) {\n> +            nvme_attach_ns(ctrl, ns);\n> +            nvme_update_dsm_limits(ctrl, ns);\n> +\n> +            if (!test_and_set_bit(nsid, ctrl->changed_nsids)) {\n> +                nvme_enqueue_event(ctrl, NVME_AER_TYPE_NOTICE,\n> +                                   NVME_AER_INFO_NOTICE_NS_ATTR_CHANGED,\n> +                                   NVME_LOG_CHANGED_NSLIST);\n> +            }\n> +        }\n> +    }\n> +}\n> +\n> +static void nvme_ns_hot_unplug(HotplugHandler *hotplug_dev, DeviceState *dev,\n> +                               Error **errp)\n> +{\n> +    NvmeNamespace *ns = NVME_NS(dev);\n> +    NvmeSubsystem *subsys = ns->subsys;\n> +    uint32_t nsid = ns->params.nsid;\n> +    int i;\n\nWhile there is qdev_unrealize -> nvme_ns_unrealize -> nvme_ns_drain ->\nblk_drain to quiesce I/O requests at the end of this function, I wonder\nwhether it's safe to start removing the namespace before I/O has been\ndrained.\n\nDid you test hot unplug while the Namespace is under heavy I/O (e.g. fio\njob running inside the guest with lots of queued I/O requests)?\n\nIt might be necessary to stop I/O first before tearing down the\nnamespace.\n\n> +\n> +    /*\n> +     * Detach from all controllers and notify the guest via AEN.\n> +     * Must happen before unrealize to avoid use-after-free when the\n> +     * guest sends I/O to a freed namespace.\n> +     */\n> +    for (i = 0; i < NVME_MAX_CONTROLLERS; i++) {\n> +        NvmeCtrl *ctrl = nvme_subsys_ctrl(subsys, i);\n> +        if (!ctrl || !nvme_ns(ctrl, nsid)) {\n> +            continue;\n> +        }\n> +\n> +        nvme_detach_ns(ctrl, ns);\n> +        nvme_update_dsm_limits(ctrl, NULL);\n> +\n> +        if (!test_and_set_bit(nsid, ctrl->changed_nsids)) {\n> +            nvme_enqueue_event(ctrl, NVME_AER_TYPE_NOTICE,\n> +                               NVME_AER_INFO_NOTICE_NS_ATTR_CHANGED,\n> +                               NVME_LOG_CHANGED_NSLIST);\n> +        }\n> +    }\n> +\n> +    /* Remove from subsystem namespace list. */\n> +    subsys->namespaces[nsid] = NULL;\n\nThe dual of this operation is done in nvme_ns_realize():\n\n  subsys->namespaces[nsid] = ns;\n\nMaybe nvme_ns_unrealize() should remove the namespace from the\nsubsystem for consistency? I guess the lack of removal was never an\nissue before hot unplug, but now it would be nice to implement the\nlifecycle.\n\n> +\n> +    /*\n> +     * Unrealize: drain I/O, flush, cleanup structures, remove from QOM.\n> +     * nvme_ns_unrealize() handles drain/shutdown/cleanup internally.\n> +     */\n> +    qdev_unrealize(dev);\n> +}\n> +\n> +static void nvme_bus_class_init(ObjectClass *klass, const void *data)\n> +{\n> +    HotplugHandlerClass *hc = HOTPLUG_HANDLER_CLASS(klass);\n> +    hc->plug = nvme_ns_hot_plug;\n> +    hc->unplug = nvme_ns_hot_unplug;\n> +}\n> +\n>  static const TypeInfo nvme_bus_info = {\n>      .name = TYPE_NVME_BUS,\n>      .parent = TYPE_BUS,\n>      .instance_size = sizeof(NvmeBus),\n> +    .class_init = nvme_bus_class_init,\n> +    .interfaces = (const InterfaceInfo[]) {\n> +        { TYPE_HOTPLUG_HANDLER },\n> +        { }\n> +    },\n>  };\n>  \n>  static void nvme_register_types(void)\n> diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c\n> index b0106eaa5c..eb628c0734 100644\n> --- a/hw/nvme/ns.c\n> +++ b/hw/nvme/ns.c\n> @@ -937,6 +937,7 @@ static void nvme_ns_class_init(ObjectClass *oc, const void *data)\n>      dc->bus_type = TYPE_NVME_BUS;\n>      dc->realize = nvme_ns_realize;\n>      dc->unrealize = nvme_ns_unrealize;\n> +    dc->hotpluggable = true;\n>      device_class_set_props(dc, nvme_ns_props);\n>      dc->desc = \"Virtual NVMe namespace\";\n>  }\n> diff --git a/hw/nvme/subsys.c b/hw/nvme/subsys.c\n> index 777e1c620f..fa35055d3c 100644\n> --- a/hw/nvme/subsys.c\n> +++ b/hw/nvme/subsys.c\n> @@ -9,6 +9,7 @@\n>  #include \"qemu/osdep.h\"\n>  #include \"qemu/units.h\"\n>  #include \"qapi/error.h\"\n> +#include \"hw/core/qdev.h\"\n>  \n>  #include \"nvme.h\"\n>  \n> @@ -205,6 +206,7 @@ static void nvme_subsys_realize(DeviceState *dev, Error **errp)\n>      NvmeSubsystem *subsys = NVME_SUBSYS(dev);\n>  \n>      qbus_init(&subsys->bus, sizeof(NvmeBus), TYPE_NVME_BUS, dev, dev->id);\n> +    qbus_set_bus_hotplug_handler(BUS(&subsys->bus));\n>  \n>      nvme_subsys_setup(subsys, errp);\n>  }\n> -- \n> 2.53.0\n>","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (1024-bit key;\n unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256\n header.s=mimecast20190719 header.b=KU6YhQB6;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org\n (client-ip=209.51.188.17; helo=lists.gnu.org;\n envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n receiver=patchwork.ozlabs.org)"],"Received":["from lists.gnu.org (lists1p.gnu.org [209.51.188.17])\n\t(using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4fsbzT5yvTz1yGb\n\tfor <incoming@patchwork.ozlabs.org>; Fri, 10 Apr 2026 22:42:11 +1000 (AEST)","from localhost ([::1] helo=lists1p.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.90_1)\n\t(envelope-from <qemu-devel-bounces@nongnu.org>)\n\tid 1wBBB1-0002Ib-TC; Fri, 10 Apr 2026 08:41:35 -0400","from eggs.gnu.org ([2001:470:142:3::10])\n by lists1p.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <stefanha@redhat.com>)\n id 1wBBB0-0002Hq-5j\n for qemu-devel@nongnu.org; Fri, 10 Apr 2026 08:41:34 -0400","from us-smtp-delivery-124.mimecast.com ([170.10.133.124])\n by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <stefanha@redhat.com>)\n id 1wBBAy-0007Rs-03\n for qemu-devel@nongnu.org; Fri, 10 Apr 2026 08:41:33 -0400","from mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com\n (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by\n relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3,\n cipher=TLS_AES_256_GCM_SHA384) id us-mta-668-5VEcGlopPG26FGig9nsY0g-1; Fri,\n 10 Apr 2026 08:41:27 -0400","from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com\n (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4])\n (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest\n SHA256)\n (No client certificate requested)\n by mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS\n id 2E52919560AA; Fri, 10 Apr 2026 12:41:25 +0000 (UTC)","from localhost (unknown [10.44.33.180])\n by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP\n id 8F5C93000C16; Fri, 10 Apr 2026 12:41:23 +0000 (UTC)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;\n s=mimecast20190719; t=1775824890;\n h=from:from:reply-to:subject:subject:date:date:message-id:message-id:\n to:to:cc:cc:mime-version:mime-version:content-type:content-type:\n in-reply-to:in-reply-to:references:references;\n bh=OkFro9RzyLlMJ1jNDT8zK4zo475sncVAQoU2I6wG3yk=;\n b=KU6YhQB6erdEJwlVSQ+P58OUeBUzFKpXBDE4EsPrLtPjf+iXuRhAIWrVTE86Nqd5/Am0ze\n lC40iAnIGJ18OyfFVAV30C3593Rmwl/7DzfGS+qVyA3kz3yKgW0WpyLaMpSjbYhArmoYGG\n GD/p28Ui6zzYPVlqE9NACsaSxa3bixE=","X-MC-Unique":"5VEcGlopPG26FGig9nsY0g-1","X-Mimecast-MFC-AGG-ID":"5VEcGlopPG26FGig9nsY0g_1775824885","Date":"Fri, 10 Apr 2026 08:41:21 -0400","From":"Stefan Hajnoczi <stefanha@redhat.com>","To":"mr-083 <matthieu@minio.io>","Cc":"qemu-devel@nongnu.org, qemu-block@nongnu.org, its@irrelevant.dk,\n kbusch@kernel.org, mr-083 <matthieu@min.io>","Subject":"Re: [PATCH v2] hw/nvme: add namespace hotplug support","Message-ID":"<20260410124121.GA485553@fedora>","References":"<20260409060155.94704-1-matthieu@min.io>\n <20260409213454.5186-1-matthieu@min.io>","MIME-Version":"1.0","Content-Type":"multipart/signed; micalg=pgp-sha512;\n protocol=\"application/pgp-signature\"; boundary=\"MjL36GCxI9d0/xOb\"","Content-Disposition":"inline","In-Reply-To":"<20260409213454.5186-1-matthieu@min.io>","X-Scanned-By":"MIMEDefang 3.4.1 on 10.30.177.4","Received-SPF":"pass client-ip=170.10.133.124;\n envelope-from=stefanha@redhat.com;\n helo=us-smtp-delivery-124.mimecast.com","X-Spam_score_int":"7","X-Spam_score":"0.7","X-Spam_bar":"/","X-Spam_report":"(0.7 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.54,\n DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1,\n RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=0.001, RCVD_IN_SBL_CSS=3.335,\n RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, RCVD_IN_VALIDITY_SAFE_BLOCKED=0.001,\n SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no","X-Spam_action":"no action","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"qemu development <qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n <mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<https://lists.nongnu.org/archive/html/qemu-devel>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n <mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org"}}]