
hw/block/nvme: re-enable NVMe PCI hotplug

Message ID 20210511073511.32511-1-hare@suse.de
State New
Series hw/block/nvme: re-enable NVMe PCI hotplug

Commit Message

Hannes Reinecke May 11, 2021, 7:35 a.m. UTC
Ever since commit e570768566 ("hw/block/nvme: support for shared
namespace in subsystem") NVMe PCI hotplug is broken, as the PCI
hotplug infrastructure will only work for the nvme devices (which
are PCI devices), but not for any attached namespaces.
So when re-adding the NVMe PCI device via 'device_add' the NVMe
controller is added, but all namespaces are missing.
This patch adds device hotplug hooks for NVMe namespaces, such that one
can call 'device_add nvme-ns' to (re-)attach the namespaces after
the PCI NVMe device 'device_add nvme' hotplug call.

Fixes: e570768566 ("hw/block/nvme: support for shared namespace in subsystem")
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 capstone               |  2 +-
 hw/block/nvme-ns.c     | 31 ++++++++++++++++++++++
 hw/block/nvme-subsys.c | 12 +++++++++
 hw/block/nvme-subsys.h |  1 +
 hw/block/nvme.c        | 60 +++++++++++++++++++++++++++++++-----------
 hw/block/nvme.h        |  1 +
 roms/SLOF              |  2 +-
 roms/openbios          |  2 +-
 roms/u-boot            |  2 +-
 9 files changed, 93 insertions(+), 20 deletions(-)
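
For illustration, the hotplug cycle this enables would look roughly like the
following HMP session (controller, drive and namespace ids are hypothetical;
the nvme-ns device attaches to the NvmeBus named after the controller's id):

  (qemu) device_del nvme0
  (qemu) device_add nvme,id=nvme0,serial=deadbeef
  (qemu) device_add nvme-ns,bus=nvme0,drive=nvm0,nsid=1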

Comments

no-reply@patchew.org May 11, 2021, 7:45 a.m. UTC | #1
Patchew URL: https://patchew.org/QEMU/20210511073511.32511-1-hare@suse.de/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20210511073511.32511-1-hare@suse.de
Subject: [PATCH] hw/block/nvme: re-enable NVMe PCI hotplug

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
   e58c7a3..e4f3ede  master     -> master
 * [new tag]         patchew/20210511073511.32511-1-hare@suse.de -> patchew/20210511073511.32511-1-hare@suse.de
Switched to a new branch 'test'
1d24622 hw/block/nvme: re-enable NVMe PCI hotplug

=== OUTPUT BEGIN ===
ERROR: braces {} are necessary for all arms of this statement
#101: FILE: hw/block/nvme-subsys.c:50:
+    if (!n->subsys)
[...]

total: 1 errors, 0 warnings, 182 lines checked

Commit 1d24622c5d19 (hw/block/nvme: re-enable NVMe PCI hotplug) has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20210511073511.32511-1-hare@suse.de/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
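
For reference, the flagged hunk is the unbraced early return in
nvme_subsys_unregister_ctrl(); QEMU's coding style requires braces on all
arms, i.e.:

  if (!n->subsys) {
      return;
  }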
Philippe Mathieu-Daudé May 11, 2021, 8:05 a.m. UTC | #2
Cc'ing Klaus (maintainer)

On 5/11/21 9:35 AM, Hannes Reinecke wrote:
> Ever since commit e570768566 ("hw/block/nvme: support for shared
> namespace in subsystem") NVMe PCI hotplug is broken, as the PCI
> hotplug infrastructure will only work for the nvme devices (which
> are PCI devices), but not for any attached namespaces.
> So when re-adding the NVMe PCI device via 'device_add' the NVMe
> controller is added, but all namespaces are missing.
> This patch adds device hotplug hooks for NVMe namespaces, such that one
> can call 'device_add nvme-ns' to (re-)attach the namespaces after
> the PCI NVMe device 'device_add nvme' hotplug call.
> 
> Fixes: e570768566 ("hw/block/nvme: support for shared namespace in subsystem")
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  capstone               |  2 +-

>  roms/SLOF              |  2 +-
>  roms/openbios          |  2 +-
>  roms/u-boot            |  2 +-
>  9 files changed, 93 insertions(+), 20 deletions(-)
> 
> diff --git a/capstone b/capstone
> index f8b1b83301..22ead3e0bf 160000
> --- a/capstone
> +++ b/capstone
> @@ -1 +1 @@
> -Subproject commit f8b1b833015a4ae47110ed068e0deb7106ced66d
> +Subproject commit 22ead3e0bfdb87516656453336160e0a37b066bf

> index 33a7322de1..e18ddad851 160000
> --- a/roms/SLOF
> +++ b/roms/SLOF
> @@ -1 +1 @@
> -Subproject commit 33a7322de13e9dca4b38851a345a58d37e7a441d
> +Subproject commit e18ddad8516ff2cfe36ec130200318f7251aa78c
> diff --git a/roms/openbios b/roms/openbios
> index 4a0041107b..7f28286f5c 160000
> --- a/roms/openbios
> +++ b/roms/openbios
> @@ -1 +1 @@
> -Subproject commit 4a0041107b8ef77e0e8337bfcb5f8078887261a7
> +Subproject commit 7f28286f5cb1ca682e3ba0a8706d8884f12bc49e
> diff --git a/roms/u-boot b/roms/u-boot
> index b46dd116ce..d3689267f9 160000
> --- a/roms/u-boot
> +++ b/roms/u-boot
> @@ -1 +1 @@
> -Subproject commit b46dd116ce03e235f2a7d4843c6278e1da44b5e1
> +Subproject commit d3689267f92c5956e09cc7d1baa4700141662bff
> 

Submodule changes unlikely related.
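
Stray submodule bumps like these typically sneak in when the local submodule
checkouts are out of date at commit time; assuming a standard git workflow,
something like the following would drop them from the patch again:

  git checkout HEAD~1 -- capstone roms/SLOF roms/openbios roms/u-boot
  git commit --amend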
Klaus Jensen May 11, 2021, 12:22 p.m. UTC | #3
On May 11 09:35, Hannes Reinecke wrote:
>Ever since commit e570768566 ("hw/block/nvme: support for shared
>namespace in subsystem") NVMe PCI hotplug is broken, as the PCI
>hotplug infrastructure will only work for the nvme devices (which
>are PCI devices), but not for any attached namespaces.
>So when re-adding the NVMe PCI device via 'device_add' the NVMe
>controller is added, but all namespaces are missing.
>This patch adds device hotplug hooks for NVMe namespaces, such that one
>can call 'device_add nvme-ns' to (re-)attach the namespaces after
>the PCI NVMe device 'device_add nvme' hotplug call.
>

Hi Hannes,

Thanks for this.

The real fix here is that namespaces are properly detached from other 
controllers that they may be shared on.

But is this really the behavior we want? That nvme-ns devices always 
"belongs to" (in QEMU qdev terms) an nvme device is an artifact of the 
Bus/Device architecture and not really how an NVM subsystem should 
behave. Removing a controller should not cause shared namespaces to 
disappear from other controllers.

I have a WIP that instead adds an NvmeBus to the nvme-subsys device and 
reparents the nvme-ns devices to that if the parent controller is linked 
to a subsystem. This way, nvme-ns devices won't be unrealized under the 
feet of other controllers.

The hotplug fix looks good - I'll post a series that tries to integrate 
both.

>[ .. ]
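
For illustration only, the reparenting Klaus describes might boil down to
something like this at nvme-ns realize time. This is a guess, not the actual
WIP: the 'bus' member on NvmeSubsystem is hypothetical, and note that
qdev_set_parent_bus() took two arguments in the QEMU of this era:

  /* hypothetical sketch: move the namespace onto a bus owned by the
   * nvme-subsys device, so that unrealizing a controller no longer
   * unrealizes shared namespaces along with it */
  if (n->subsys) {
      qdev_set_parent_bus(DEVICE(ns), BUS(&n->subsys->bus));
  }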
Hannes Reinecke May 11, 2021, 1:12 p.m. UTC | #4
On 5/11/21 2:22 PM, Klaus Jensen wrote:
> On May 11 09:35, Hannes Reinecke wrote:
>> Ever since commit e570768566 ("hw/block/nvme: support for shared
>> namespace in subsystem") NVMe PCI hotplug is broken, as the PCI
>> hotplug infrastructure will only work for the nvme devices (which
>> are PCI devices), but not for any attached namespaces.
>> So when re-adding the NVMe PCI device via 'device_add' the NVMe
>> controller is added, but all namespaces are missing.
>> This patch adds device hotplug hooks for NVMe namespaces, such that one
>> can call 'device_add nvme-ns' to (re-)attach the namespaces after
>> the PCI NVMe device 'device_add nvme' hotplug call.
>>
> 
> Hi Hannes,
> 
> Thanks for this.
> 
> The real fix here is that namespaces are properly detached from other
> controllers that they may be shared on.
> 
> But is this really the behavior we want? That nvme-ns devices always
> "belongs to" (in QEMU qdev terms) an nvme device is an artifact of the
> Bus/Device architecture and not really how an NVM subsystem should
> behave. Removing a controller should not cause shared namespaces to
> disappear from other controllers.
> 
> I have a WIP that instead adds an NvmeBus to the nvme-subsys device and
> reparents the nvme-ns devices to that if the parent controller is linked
> to a subsystem. This way, nvme-ns devices won't be unrealized under the
> feet of other controllers.
> 
That would be the other direction I thought of; _technically_ NVMe
namespaces are objects of the subsystem, and 'controllers' are just
temporary objects providing access to the namespaces presented by the
subsystem.
So if you are going to rework it I'd rather make the namespaces children
objects of the subsystem, and have nsid maps per controller detailing
which nsids are accessible from the individual controllers.
That would probably be a simple memcpy() to start with, but it would allow
us to modify that map via NVMe-MI and stuff.

However, if you do that you'll find that subsystems can't be hotplugged
either; but I'm sure you'll be able to fix it up :-)

> The hotplug fix looks good - I'll post a series that tries to integrate
> both.
> 
Ta.

The more I think about it, the more I think we should be looking into
reparenting the namespaces to the subsystem.
That would have the _immediate_ benefit that 'device_del' and
'device_add' becomes symmetric (ie one doesn't have to do a separate
'device_add nvme-ns'), as the nvme namespace is not affected by the
hotplug event.

This really was a quick hack to demonstrate a shortcoming in the linux
NVMe stack (cf 'nvme-mpath: delete disk after last connection' if you
are interested in details), so I'm sure there is room for improvement.

And the prime reason for sending it out was to gauge interest by
qemu-devel; I have a somewhat mixed experience when sending patches to
the qemu ML ...

Cheers,

Hannes
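
A minimal sketch of the per-controller NSID map Hannes suggests, assuming the
subsystem holds the authoritative namespace array and that both arrays share
the same size and layout (the function name is made up):

  /* seed the controller's namespace view from the subsystem; NVMe-MI
   * (or device properties) could later mask individual entries on a
   * per-controller basis */
  static void nvme_ctrl_seed_nsid_map(NvmeCtrl *n)
  {
      memcpy(n->namespaces, n->subsys->namespaces, sizeof(n->namespaces));
  }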
Klaus Jensen May 11, 2021, 1:37 p.m. UTC | #5
On May 11 15:12, Hannes Reinecke wrote:
>On 5/11/21 2:22 PM, Klaus Jensen wrote:
>> [ .. ]
>That would be the other direction I thought of; _technically_ NVMe
>namespaces are objects of the subsystem, and 'controllers' are just
>temporary objects providing access to the namespaces presented by the
>subsystem.
>So if you are going to rework it I'd rather make the namespaces children
>objects of the subsystem, and have nsid maps per controller detailing
>which nsids are accessible from the individual controllers.
>That would probably be a simple memcpy() to start with, but it would allow
>us to modify that map via NVMe-MI and stuff.
>
>However, if you do that you'll find that subsystems can't be hotplugged
>either; but I'm sure you'll be able to fix it up :-)
>
>> The hotplug fix looks good - I'll post a series that tries to integrate
>> both.
>>
>Ta.
>
>The more I think about it, the more I think we should be looking into
>reparenting the namespaces to the subsystem.
>That would have the _immediate_ benefit that 'device_del' and
>'device_add' becomes symmetric (ie one doesn't have to do a separate
>'device_add nvme-ns'), as the nvme namespace is not affected by the
>hotplug event.
>

I have that working, but I'm struggling with a QEMU API technicality in 
that I apparently cannot simply move the NvmeBus creation to the 
nvme-subsys device. For some reason the bus is not available for the 
nvme-ns devices. That is, if one does something like this:

   -device nvme-subsys,...
   -device nvme-ns,...

Then I get an error that "no 'nvme-bus' bus found for device 'nvme-ns'". 
This is probably just me not grok'ing the qdev well enough, so I'll keep 
trying to fix that. What works now is to have the regular setup:

   -device nvme-subsys,...
   -device nvme,...
   -device nvme-ns,...

And the nvme-ns device will then reparent to the NvmeBus on nvme-subsys 
(which magically now IS available when nvme-ns is realized). This has 
the same end result, but I really would like that the namespaces could 
be specified as children of the subsys directly.

Any help from qdev experts would be appreciated! :)

>This really was a quick hack to demonstrate a shortcoming in the linux
>NVMe stack (cf 'nvme-mpath: delete disk after last connection' if you
>are interested in details), so I'm sure there is room for improvement.
>

I follow linux-nvme, so I saw the patch ;)

>And the prime reason for sending it out was to gauge interest by
>qemu-devel; I have a somewhat mixed experience when sending patches to
>the qemu ML ...
>

Your contribution is very much appreciated :)
Hannes Reinecke May 11, 2021, 2:54 p.m. UTC | #6
On 5/11/21 3:37 PM, Klaus Jensen wrote:
> On May 11 15:12, Hannes Reinecke wrote:
>> On 5/11/21 2:22 PM, Klaus Jensen wrote:
[ .. ]
>>> The hotplug fix looks good - I'll post a series that tries to integrate
>>> both.
>>>
>> Ta.
>>
>> The more I think about it, the more I think we should be looking into
>> reparenting the namespaces to the subsystem.
>> That would have the _immediate_ benefit that 'device_del' and
>> 'device_add' becomes symmetric (ie one doesn't have to do a separate
>> 'device_add nvme-ns'), as the nvme namespace is not affected by the
>> hotplug event.
>>
> 
> I have that working, but I'm struggling with a QEMU API technicality in
> that I apparently cannot simply move the NvmeBus creation to the
> nvme-subsys device. For some reason the bus is not available for the
> nvme-ns devices. That is, if one does something like this:
> 
>   -device nvme-subsys,...
>   -device nvme-ns,...
> 
> Then I get an error that "no 'nvme-bus' bus found for device 'nvme-ns'".
> This is probably just me not grok'ing the qdev well enough, so I'll keep
> trying to fix that. What works now is to have the regular setup:
> 
_Normally_ the 'id' of the parent device spans a bus, so the syntax
should be

-device nvme-subsys,id=subsys1,...
-device nvme-ns,bus=subsys1,...

As for the nvme device I would initially expose any namespace from the
subsystem to the controller; the nvme spec has some concept of 'active'
or 'inactive' namespaces which would allow us to blank out individual
namespaces on a per-controller basis, but I fear that's not easy to
model with qdev and the structure above.

>   -device nvme-subsys,...
>   -device nvme,...
>   -device nvme-ns,...
> 
> And the nvme-ns device will then reparent to the NvmeBus on nvme-subsys
> (which magically now IS available when nvme-ns is realized). This has
> the same end result, but I really would like that the namespaces could
> be specified as children of the subsys directly.
> 
Shudder.
Automatic reparenting.
To my understanding from qdev that shouldn't even be possible.
Please don't.

Cheers,

Hannes
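
The 'active'/'inactive' namespace state Hannes refers to is what a guest
toggles through the Namespace Attachment admin command (handled by
nvme_ns_attachment() in the patch); with nvme-cli inside the guest that is
roughly:

  # make NSID 1 inactive on controller 0, then active again
  nvme detach-ns /dev/nvme0 --namespace-id=1 --controllers=0
  nvme attach-ns /dev/nvme0 --namespace-id=1 --controllers=0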
Klaus Jensen May 11, 2021, 4:03 p.m. UTC | #7
On May 11 16:54, Hannes Reinecke wrote:
>On 5/11/21 3:37 PM, Klaus Jensen wrote:
>> [ .. ]
>>
>> I have that working, but I'm struggling with a QEMU API technicality in
>> that I apparently cannot simply move the NvmeBus creation to the
>> nvme-subsys device. For some reason the bus is not available for the
>> nvme-ns devices. That is, if one does something like this:
>>
>>   -device nvme-subsys,...
>>   -device nvme-ns,...
>>
>> Then I get an error that "no 'nvme-bus' bus found for device 'nvme-ns'".
>> This is probably just me not grok'ing the qdev well enough, so I'll keep
>> trying to fix that. What works now is to have the regular setup:
>>
>_Normally_ the 'id' of the parent device spans a bus, so the syntax
>should be
>
>-device nvme-subsys,id=subsys1,...
>-device nvme-ns,bus=subsys1,...

Yeah, I know, I just oversimplified the example. This *is* how I wanted 
it to work ;)

>
>As for the nvme device I would initially expose any namespace from the
>subsystem to the controller; the nvme spec has some concept of 'active'
>or 'inactive' namespaces which would allow us to blank out individual
>namespaces on a per-controller basis, but I fear that's not easy to
>model with qdev and the structure above.
>

The nvme-ns device already supports the boolean 'detached' parameter to 
support the concept of an inactive namespace.

>>   -device nvme-subsys,...
>>   -device nvme,...
>>   -device nvme-ns,...
>>
>> And the nvme-ns device will then reparent to the NvmeBus on nvme-subsys
>> (which magically now IS available when nvme-ns is realized). This has
>> the same end result, but I really would like that the namespaces could
>> be specified as children of the subsys directly.
>>
>Shudder.
>Automatic reparenting.
>To my understanding from qdev that shouldn't even be possible.
>Please don't.
>

It's perfectly possible with the API and used to implement stuff like 
failover. We are not changing the parent object, we are changing the 
parent bus. hw/sd does something like this (but does mention that it's a 
bit of a hack). In this case I'd say we could argue to get away with it 
as well. 

Allowing the nvme-ns device to be a child of the controller allows the 
initially attached controller of non-shared namespaces to be easily 
expressible. But I agree, the approach is a bit wacky, which is why I 
haven't posted anything yet.
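
For reference, the 'detached' parameter Klaus mentions is given per namespace
on the command line; a minimal sketch of an initially inactive namespace in a
subsystem (ids and paths hypothetical):

  -device nvme-subsys,id=subsys0
  -device nvme,serial=deadbeef,subsys=subsys0
  -drive id=nvm0,if=none,file=nvm.img,format=raw
  -device nvme-ns,drive=nvm0,nsid=1,detached=true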
Hannes Reinecke May 11, 2021, 4:12 p.m. UTC | #8
On 5/11/21 6:03 PM, Klaus Jensen wrote:
> On May 11 16:54, Hannes Reinecke wrote:
>> On 5/11/21 3:37 PM, Klaus Jensen wrote:
>>> On May 11 15:12, Hannes Reinecke wrote:
>>>> On 5/11/21 2:22 PM, Klaus Jensen wrote:
>> [ .. ]
>>> [ .. ]
>>>
>> _Normally_ the 'id' of the parent device spans a bus, so the syntax
>> should be
>>
>> -device nvme-subsys,id=subsys1,...
>> -device nvme-ns,bus=subsys1,...
> 
> Yeah, I know, I just oversimplified the example. This *is* how I wanted 
> it to work ;)
> 
>>
>> As for the nvme device I would initially expose any namespace from the
>> subsystem to the controller; the nvme spec has some concept of 'active'
>> or 'inactive' namespaces which would allow us to blank out individual
>> namespaces on a per-controller basis, but I fear that's not easy to
>> model with qdev and the structure above.
>>
> 
> The nvme-ns device already supports the boolean 'detached' parameter to 
> support the concept of an inactive namespace.
> 
Yeah, but that doesn't really work if we move the namespace to the 
subsystem; the 'detached' parameter is for the controller<->namespace
relationship.
That's why I meant we have to have some sort of NSID map for the 
controller such that the controller knows which NSIDs it can access.
I guess we can copy the trick with the NSID array, and reverse the 
operation we have now wrt subsystem; keep the main NSID array in the 
subsystem, and per-controller NSID arrays holding those which can be 
accessed.

And ignore the commandline for now; figure that one out later.

Cheers,

Hannes
>>>   -device nvme-subsys,...
>>>   -device nvme,...
>>>   -device nvme-ns,...
>>>
>>> And the nvme-ns device will then reparent to the NvmeBus on nvme-subsys
>>> (which magically now IS available when nvme-ns is realized). This has
>>> the same end result, but I really would like that the namespaces could
>>> be specified as children of the subsys directly.
>>>
>> Shudder.
>> Automatic reparenting.
>> To my understanding from qdev that shouldn't even be possible.
>> Please don't.
>>
> 
> It's perfectly possible with the API and used to implement stuff like 
> failover. We are not changing the parent object, we are changing the 
> parent bus. hw/sd does something like this (but does mention that it's a
> bit of a hack). In this case I'd say we could argue to get away with it 
> as well.
> Allowing the nvme-ns device to be a child of the controller allows the 
> initially attached controller of non-shared namespaces to be easily 
> expressible. But I agree, the approach is a bit wacky, which is why I 
> haven't posted anything yet.

Yep, I did try to implement multipathing for SCSI at one point, and that 
became patently horrible as it would run up against qdev where everything 
is ordered within a tree.

Cheers,

Hannes
Daniel Wagner Oct. 10, 2022, 5:01 p.m. UTC | #9
On Tue, May 11, 2021 at 06:12:47PM +0200, Hannes Reinecke wrote:
> [ .. ]

Sorry to ask, but has there been any progress on this topic? I just ran
into the same issue: adding an nvme device at runtime does not show any
namespaces.

Thanks,
Daniel
Klaus Jensen Oct. 10, 2022, 5:15 p.m. UTC | #10
On Okt 10 19:01, Daniel Wagner wrote:
> On Tue, May 11, 2021 at 06:12:47PM +0200, Hannes Reinecke wrote:
> > [ .. ]
> 
> Sorry to ask, but has there been any progress on this topic? I just ran
> into the same issue: adding an nvme device at runtime does not show any
> namespaces.
> 

This is all upstream. Namespaces with 'shared=on' *should* all be
automatically attached to any hotplugged controller devices.

With what setup is this not working for you?
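
A minimal sketch of the setup Klaus describes (ids hypothetical): link the
controllers to a subsystem and mark the namespace shared, and the namespace
should then show up on a hotplugged controller automatically:

  -device nvme-subsys,id=subsys0
  -device nvme,serial=a,subsys=subsys0
  -device nvme-ns,drive=nvm0,nsid=1,shared=on

  (qemu) device_add nvme,id=nvme1,serial=b,subsys=subsys0,bus=root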
Hannes Reinecke Oct. 12, 2022, 6:24 a.m. UTC | #11
On 10/10/22 19:01, Daniel Wagner wrote:
> On Tue, May 11, 2021 at 06:12:47PM +0200, Hannes Reinecke wrote:
>> [ .. ]
> 
> Sorry to ask, but has there been any progress on this topic? I just ran
> into the same issue: adding an nvme device at runtime does not show any
> namespaces.
> 
I _thought_ that the pci hotplug fixes have now been merged into qemu 
upstream. Klaus?

Cheers,

Hannes
Klaus Jensen Oct. 12, 2022, 8:06 a.m. UTC | #12
On Okt 12 08:24, Hannes Reinecke wrote:
> On 10/10/22 19:01, Daniel Wagner wrote:
> > On Tue, May 11, 2021 at 06:12:47PM +0200, Hannes Reinecke wrote:
> > > [ .. ]
> > 
> > Sorry to ask, but has there been any progress on this topic? I just ran
> > into the same issue: adding an nvme device at runtime does not show any
> > namespaces.
> > 
> I _thought_ that the pci hotplug fixes have now been merged into qemu
> upstream. Klaus?
> 

Yup. It's all upstream.
Daniel Wagner Oct. 18, 2022, 8:15 a.m. UTC | #13
On Mon, Oct 10, 2022 at 07:15:08PM +0200, Klaus Jensen wrote:
> This is all upstream. Namespaces with 'shared=on' *should* all be
> automatically attached to any hotplugged controller devices.
> 
> With what setup is this not working for you?

Ah okay, I missed the 'shared=on' bit. Let me try again.
Daniel Wagner Oct. 21, 2022, 12:59 p.m. UTC | #14
On Tue, Oct 18, 2022 at 10:15:57AM +0200, Daniel Wagner wrote:
> On Mon, Oct 10, 2022 at 07:15:08PM +0200, Klaus Jensen wrote:
> > This is all upstream. Namespaces with 'shared=on' *should* all be
> > automatically attached to any hotplugged controller devices.
> > 
> > With what setup is this not working for you?
> 
> Ah okay, I missed the 'shared=on' bit. Let me try again.

Nope, that's not enough. Maybe my setup is not okay?

  <qemu:commandline>
    <qemu:arg value='-device'/>
    <qemu:arg value='pcie-root-port,id=root,slot=1'/>
  </qemu:commandline>

  qemu-monitor-command tw0 --hmp drive_add 0 if=none,file=/tmp/nvme1.qcow2,format=qcow2,id=nvme1
  qemu-monitor-command tw0 --hmp device_add nvme,id=nvme1,serial=nvme-1,bus=root
  qemu-monitor-command tw0 --hmp device_add nvme-ns,drive=nvme1,nsid=1,shared=on
  Error: Bus 'nvme1' does not support hotplugging

With the patch below the hotplugging works.


diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 87aeba0564..a09a698873 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -5711,8 +5711,6 @@ static uint16_t nvme_ns_attachment(NvmeCtrl *n, NvmeRequest *req)
             }
 
             nvme_attach_ns(ctrl, ns);
-            nvme_select_iocs_ns(ctrl, ns);
-
             break;
 
         case NVME_NS_ATTACHMENT_DETACH:
@@ -5720,26 +5718,12 @@ static uint16_t nvme_ns_attachment(NvmeCtrl *n, NvmeRequest *req)
                 return NVME_NS_NOT_ATTACHED | NVME_DNR;
             }
 
-            ctrl->namespaces[nsid] = NULL;
-            ns->attached--;
-
-            nvme_update_dmrsl(ctrl);
-
+            nvme_detach_ns(ctrl, ns);
             break;
 
         default:
             return NVME_INVALID_FIELD | NVME_DNR;
         }
-
-        /*
-         * Add namespace id to the changed namespace id list for event clearing
-         * via Get Log Page command.
-         */
-        if (!test_and_set_bit(nsid, ctrl->changed_nsids)) {
-            nvme_enqueue_event(ctrl, NVME_AER_TYPE_NOTICE,
-                               NVME_AER_INFO_NOTICE_NS_ATTR_CHANGED,
-                               NVME_LOG_CHANGED_NSLIST);
-        }
     }
 
     return NVME_SUCCESS;
@@ -7564,6 +7548,34 @@ void nvme_attach_ns(NvmeCtrl *n, NvmeNamespace *ns)
 
     n->dmrsl = MIN_NON_ZERO(n->dmrsl,
                             BDRV_REQUEST_MAX_BYTES / nvme_l2b(ns, 1));
+    if (NVME_CC_EN(n->bar.cc)) {
+        /* Ctrl is live */
+        nvme_select_iocs_ns(n, ns);
+        if (!test_and_set_bit(nsid, n->changed_nsids)) {
+            nvme_enqueue_event(n, NVME_AER_TYPE_NOTICE,
+                               NVME_AER_INFO_NOTICE_NS_ATTR_CHANGED,
+                               NVME_LOG_CHANGED_NSLIST);
+        }
+    }
+}
+
+void nvme_detach_ns(NvmeCtrl *n, NvmeNamespace *ns)
+{
+    uint32_t nsid = ns->params.nsid;
+
+    if (ns->attached) {
+        n->namespaces[nsid - 1] = NULL;
+        ns->attached--;
+    }
+    nvme_update_dmrsl(n);
+    if (NVME_CC_EN(n->bar.cc)) {
+        /* Ctrl is live */
+        if (!test_and_set_bit(nsid, n->changed_nsids)) {
+            nvme_enqueue_event(n, NVME_AER_TYPE_NOTICE,
+                               NVME_AER_INFO_NOTICE_NS_ATTR_CHANGED,
+                               NVME_LOG_CHANGED_NSLIST);
+        }
+    }
 }
 
 static void nvme_realize(PCIDevice *pci_dev, Error **errp)
@@ -7600,6 +7612,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
         return;
     }
     nvme_init_ctrl(n, pci_dev);
+    qbus_set_bus_hotplug_handler(BUS(&n->bus));
 
     /* setup a namespace if the controller drive property was given */
     if (n->namespace.blkconf.blk) {
@@ -7817,10 +7830,22 @@ static const TypeInfo nvme_info = {
     },
 };
 
+static void nvme_bus_class_init(ObjectClass *klass, void *data)
+{
+    HotplugHandlerClass *hc = HOTPLUG_HANDLER_CLASS(klass);
+
+    hc->unplug = qdev_simple_device_unplug_cb;
+}
+
 static const TypeInfo nvme_bus_info = {
     .name = TYPE_NVME_BUS,
     .parent = TYPE_BUS,
     .instance_size = sizeof(NvmeBus),
+    .class_init = nvme_bus_class_init,
+    .interfaces = (InterfaceInfo[]) {
+        { TYPE_HOTPLUG_HANDLER },
+        { }
+    }
 };
 
 static void nvme_register_types(void)
diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index 62a1f97be0..69ba08b0e7 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -530,10 +530,31 @@ void nvme_ns_cleanup(NvmeNamespace *ns)
 static void nvme_ns_unrealize(DeviceState *dev)
 {
     NvmeNamespace *ns = NVME_NS(dev);
+    BusState *s = qdev_get_parent_bus(dev);
+    NvmeCtrl *n = NVME(s->parent);
+    NvmeSubsystem *subsys = n->subsys;
+    uint32_t nsid = ns->params.nsid;
+    int i;
 
     nvme_ns_drain(ns);
     nvme_ns_shutdown(ns);
     nvme_ns_cleanup(ns);
+
+    if (subsys) {
+        subsys->namespaces[nsid] = NULL;
+
+        if (ns->params.shared) {
+            for (i = 0; i < ARRAY_SIZE(subsys->ctrls); i++) {
+                NvmeCtrl *ctrl = subsys->ctrls[i];
+
+                if (ctrl) {
+                    nvme_detach_ns(ctrl, ns);
+                }
+            }
+            return;
+        }
+    }
+    nvme_detach_ns(n, ns);
 }
 
 static void nvme_ns_realize(DeviceState *dev, Error **errp)
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 79f5c281c2..7d9fc2ab28 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -576,6 +576,7 @@ static inline NvmeSecCtrlEntry *nvme_sctrl_for_cntlid(NvmeCtrl *n,
 }
 
 void nvme_attach_ns(NvmeCtrl *n, NvmeNamespace *ns);
+void nvme_detach_ns(NvmeCtrl *n, NvmeNamespace *ns);
 uint16_t nvme_bounce_data(NvmeCtrl *n, void *ptr, uint32_t len,
                           NvmeTxDirection dir, NvmeRequest *req);
 uint16_t nvme_bounce_mdata(NvmeCtrl *n, void *ptr, uint32_t len,
Klaus Jensen Oct. 31, 2022, 7:30 a.m. UTC | #15
On Oct 21 14:59, Daniel Wagner wrote:
> On Tue, Oct 18, 2022 at 10:15:57AM +0200, Daniel Wagner wrote:
> > On Mon, Oct 10, 2022 at 07:15:08PM +0200, Klaus Jensen wrote:
> > > This is all upstream. Namespaces with 'shared=on' *should* all be
> > > automatically attached to any hotplugged controller devices.
> > > 
> > > With what setup is this not working for you?
> > 
> > Ah okay, I missed the 'shared=on' bit. Let me try again.
> 
> Nope, that's not enough. Maybe my setup is not okay?
> 
>   <qemu:commandline>
>     <qemu:arg value='-device'/>
>     <qemu:arg value='pcie-root-port,id=root,slot=1'/>
>   </qemu:commandline>
> 
>   qemu-monitor-command tw0 --hmp drive_add 0 if=none,file=/tmp/nvme1.qcow2,format=qcow2,id=nvme1
>   qemu-monitor-command tw0 --hmp device_add nvme,id=nvme1,serial=nvme-1,bus=root
>   qemu-monitor-command tw0 --hmp device_add nvme-ns,drive=nvme1,nsid=1,shared=on
>   Error: Bus 'nvme1' does not support hotplugging
> 
> With the patch below the hotplugging works.
> 

Looks like you are not configuring a subsystem device (nvme-subsys) and
attaching the nvme controller to it.
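
Concretely, Daniel's sequence would presumably need an nvme-subsys device on
the command line and a subsys= link on the hotplugged controller, along the
lines of (untested sketch):

    <qemu:arg value='-device'/>
    <qemu:arg value='nvme-subsys,id=subsys0'/>

  qemu-monitor-command tw0 --hmp device_add nvme,id=nvme1,serial=nvme-1,bus=root,subsys=subsys0
  qemu-monitor-command tw0 --hmp device_add nvme-ns,drive=nvme1,nsid=1,shared=on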

Patch

diff --git a/capstone b/capstone
index f8b1b83301..22ead3e0bf 160000
--- a/capstone
+++ b/capstone
@@ -1 +1 @@ 
-Subproject commit f8b1b833015a4ae47110ed068e0deb7106ced66d
+Subproject commit 22ead3e0bfdb87516656453336160e0a37b066bf
diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 7bb618f182..3a7e01f10f 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -526,6 +526,36 @@  static void nvme_ns_realize(DeviceState *dev, Error **errp)
     nvme_attach_ns(n, ns);
 }
 
+static void nvme_ns_unrealize(DeviceState *dev)
+{
+    NvmeNamespace *ns = NVME_NS(dev);
+    BusState *s = qdev_get_parent_bus(dev);
+    NvmeCtrl *n = NVME(s->parent);
+    NvmeSubsystem *subsys = n->subsys;
+    uint32_t nsid = ns->params.nsid;
+    int i;
+
+    nvme_ns_drain(ns);
+    nvme_ns_shutdown(ns);
+    nvme_ns_cleanup(ns);
+
+    if (subsys) {
+        subsys->namespaces[nsid] = NULL;
+
+        if (ns->params.shared) {
+            for (i = 0; i < ARRAY_SIZE(subsys->ctrls); i++) {
+                NvmeCtrl *ctrl = subsys->ctrls[i];
+
+                if (ctrl) {
+                    nvme_detach_ns(ctrl, ns);
+                }
+            }
+            return;
+        }
+    }
+    nvme_detach_ns(n, ns);
+}
+
 static Property nvme_ns_props[] = {
     DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf),
     DEFINE_PROP_BOOL("detached", NvmeNamespace, params.detached, false),
@@ -563,6 +593,7 @@  static void nvme_ns_class_init(ObjectClass *oc, void *data)
 
     dc->bus_type = TYPE_NVME_BUS;
     dc->realize = nvme_ns_realize;
+    dc->unrealize = nvme_ns_unrealize;
     device_class_set_props(dc, nvme_ns_props);
     dc->desc = "Virtual NVMe namespace";
 }
diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c
index 9604c19117..1c00508f33 100644
--- a/hw/block/nvme-subsys.c
+++ b/hw/block/nvme-subsys.c
@@ -42,6 +42,18 @@  int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp)
     return cntlid;
 }
 
+void nvme_subsys_unregister_ctrl(NvmeCtrl *n)
+{
+    NvmeSubsystem *subsys = n->subsys;
+    int cntlid = n->cntlid;
+
+    if (!n->subsys)
+        return;
+    assert(cntlid < ARRAY_SIZE(subsys->ctrls));
+    subsys->ctrls[cntlid] = NULL;
+    n->cntlid = -1;
+}
+
 static void nvme_subsys_setup(NvmeSubsystem *subsys)
 {
     const char *nqn = subsys->params.nqn ?
diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index 7d7ef5f7f1..2d8a146c4f 100644
--- a/hw/block/nvme-subsys.h
+++ b/hw/block/nvme-subsys.h
@@ -32,6 +32,7 @@  typedef struct NvmeSubsystem {
 } NvmeSubsystem;
 
 int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp);
+void nvme_subsys_unregister_ctrl(NvmeCtrl *n);
 
 static inline NvmeCtrl *nvme_subsys_ctrl(NvmeSubsystem *subsys,
         uint32_t cntlid)
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 5fe082ec34..515678b686 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -4963,26 +4963,12 @@  static uint16_t nvme_ns_attachment(NvmeCtrl *n, NvmeRequest *req)
             }
 
             nvme_attach_ns(ctrl, ns);
-            __nvme_select_ns_iocs(ctrl, ns);
         } else {
             if (!nvme_ns(ctrl, nsid)) {
                 return NVME_NS_NOT_ATTACHED | NVME_DNR;
             }
 
-            ctrl->namespaces[nsid - 1] = NULL;
-            ns->attached--;
-
-            nvme_update_dmrsl(ctrl);
-        }
-
-        /*
-         * Add namespace id to the changed namespace id list for event clearing
-         * via Get Log Page command.
-         */
-        if (!test_and_set_bit(nsid, ctrl->changed_nsids)) {
-            nvme_enqueue_event(ctrl, NVME_AER_TYPE_NOTICE,
-                               NVME_AER_INFO_NOTICE_NS_ATTR_CHANGED,
-                               NVME_LOG_CHANGED_NSLIST);
+            nvme_detach_ns(ctrl, ns);
         }
     }
 
@@ -6166,6 +6152,34 @@  void nvme_attach_ns(NvmeCtrl *n, NvmeNamespace *ns)
 
     n->dmrsl = MIN_NON_ZERO(n->dmrsl,
                             BDRV_REQUEST_MAX_BYTES / nvme_l2b(ns, 1));
+    if (NVME_CC_EN(n->bar.cc)) {
+        /* Ctrl is live */
+        __nvme_select_ns_iocs(n, ns);
+        if (!test_and_set_bit(nsid, n->changed_nsids)) {
+            nvme_enqueue_event(n, NVME_AER_TYPE_NOTICE,
+                               NVME_AER_INFO_NOTICE_NS_ATTR_CHANGED,
+                               NVME_LOG_CHANGED_NSLIST);
+        }
+    }
+}
+
+void nvme_detach_ns(NvmeCtrl *n, NvmeNamespace *ns)
+{
+    uint32_t nsid = ns->params.nsid;
+
+    if (ns->attached) {
+        n->namespaces[nsid - 1] = NULL;
+        ns->attached--;
+    }
+    nvme_update_dmrsl(n);
+    if (NVME_CC_EN(n->bar.cc)) {
+        /* Ctrl is live */
+        if (!test_and_set_bit(nsid, n->changed_nsids)) {
+            nvme_enqueue_event(n, NVME_AER_TYPE_NOTICE,
+                               NVME_AER_INFO_NOTICE_NS_ATTR_CHANGED,
+                               NVME_LOG_CHANGED_NSLIST);
+        }
+    }
 }
 
 static void nvme_realize(PCIDevice *pci_dev, Error **errp)
@@ -6193,7 +6207,7 @@  static void nvme_realize(PCIDevice *pci_dev, Error **errp)
         return;
     }
     nvme_init_ctrl(n, pci_dev);
-
+    qbus_set_bus_hotplug_handler(BUS(&n->bus));
     /* setup a namespace if the controller drive property was given */
     if (n->namespace.blkconf.blk) {
         ns = &n->namespace;
@@ -6224,6 +6238,8 @@  static void nvme_exit(PCIDevice *pci_dev)
         nvme_ns_cleanup(ns);
     }
 
+    nvme_subsys_unregister_ctrl(n);
+
     g_free(n->cq);
     g_free(n->sq);
     g_free(n->aer_reqs);
@@ -6348,10 +6364,22 @@  static const TypeInfo nvme_info = {
     },
 };
 
+static void nvme_bus_class_init(ObjectClass *klass, void *data)
+{
+    HotplugHandlerClass *hc = HOTPLUG_HANDLER_CLASS(klass);
+
+    hc->unplug = qdev_simple_device_unplug_cb;
+}
+
 static const TypeInfo nvme_bus_info = {
     .name = TYPE_NVME_BUS,
     .parent = TYPE_BUS,
     .instance_size = sizeof(NvmeBus),
+    .class_init = nvme_bus_class_init,
+    .interfaces = (InterfaceInfo[]) {
+        { TYPE_HOTPLUG_HANDLER },
+        { }
+    }
 };
 
 static void nvme_register_types(void)
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 5d05ec368f..4fc06f58a4 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -255,6 +255,7 @@  typedef enum NvmeTxDirection {
 } NvmeTxDirection;
 
 void nvme_attach_ns(NvmeCtrl *n, NvmeNamespace *ns);
+void nvme_detach_ns(NvmeCtrl *n, NvmeNamespace *ns);
 uint16_t nvme_bounce_data(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
                           NvmeTxDirection dir, NvmeRequest *req);
 uint16_t nvme_bounce_mdata(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
diff --git a/roms/SLOF b/roms/SLOF
index 33a7322de1..e18ddad851 160000
--- a/roms/SLOF
+++ b/roms/SLOF
@@ -1 +1 @@ 
-Subproject commit 33a7322de13e9dca4b38851a345a58d37e7a441d
+Subproject commit e18ddad8516ff2cfe36ec130200318f7251aa78c
diff --git a/roms/openbios b/roms/openbios
index 4a0041107b..7f28286f5c 160000
--- a/roms/openbios
+++ b/roms/openbios
@@ -1 +1 @@ 
-Subproject commit 4a0041107b8ef77e0e8337bfcb5f8078887261a7
+Subproject commit 7f28286f5cb1ca682e3ba0a8706d8884f12bc49e
diff --git a/roms/u-boot b/roms/u-boot
index b46dd116ce..d3689267f9 160000
--- a/roms/u-boot
+++ b/roms/u-boot
@@ -1 +1 @@ 
-Subproject commit b46dd116ce03e235f2a7d4843c6278e1da44b5e1
+Subproject commit d3689267f92c5956e09cc7d1baa4700141662bff