[RFC,V2,00/11] hw/block/nvme: support multi-path for ctrl/ns

Message ID 20210117145341.23310-1-minwoo.im.dev@gmail.com

Minwoo Im Jan. 17, 2021, 2:53 p.m. UTC
Hello,

This patch series introduces an NVMe subsystem device to support
multi-path I/O in the NVMe device model.  Two use cases are supported by
this series: multi-controller and namespace sharing.

The V1 RFC was discussed with Klaus and Keith.  I really appreciate
their help in giving this patch series a proper direction [1].

The first five patches are preparatory refactoring so that the nvme-ns
device no longer always relies on an nvme controller.  Because nvme-ns
must be mappable at the subsystem level, not only to a single
controller, it should provide generic initialization code:
nvme_ns_setup() without NvmeCtrl.  To that end, the first five patches
remove the NvmeCtrl * instance argument from nvme_ns_setup().  I'd be
happy if they are picked!

For controller and namespace devices, a 'subsys' property has been
introduced to map them to a subsystem.  If multiple controllers are
needed in one subsystem, specify the same 'subsys' value on each
controller.

For a namespace device, if 'subsys' is not given, as before, the 'bus'
parameter has to be provided to specify the nvme controller device to
attach to; that is, the two are mutually exclusive.  To share a
namespace among controllers, nvme-ns should set the 'subsys' property
to a single nvme subsystem instance.  To make a namespace private,
specify the 'bus' property rather than 'subsys'.
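
The 'subsys'/'bus' rule above can be sketched as a small check.  This is
only a Python model of the rule as worded in this cover letter, not the
QEMU realize code; the function name classify_namespace() is
hypothetical, and rejecting a namespace that has neither property is an
assumption of this sketch.

```python
from typing import Optional


def classify_namespace(subsys: Optional[str], bus: Optional[str]) -> str:
    """Return 'shared' or 'private' for an nvme-ns device, or raise ValueError.

    Models only the rule stated in the cover letter; it is not QEMU code.
    """
    if subsys is not None and bus is not None:
        # 'subsys' and 'bus' are mutually exclusive.
        raise ValueError("'subsys' and 'bus' are mutually exclusive")
    if subsys is not None:
        # Mapped to a subsystem: shared among that subsystem's controllers.
        return "shared"
    if bus is not None:
        # Attached to a single controller: private namespace.
        return "private"
    # Assumption of this sketch: one of the two must be given explicitly.
    raise ValueError("either 'subsys' or 'bus' must be given")
```

For the example below, ns1 (subsys=subsys0) would be classified as
shared, and ns2 (bus=nvme2) as private.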

Of course, this series does not require existing users to update their
run commands.

Please refer to the following example with nvme-cli output:

QEMU Run:
  -device nvme-subsys,id=subsys0 \
  -device nvme,serial=foo,id=nvme0,subsys=subsys0 \
  -device nvme,serial=bar,id=nvme1,subsys=subsys0 \
  -device nvme,serial=baz,id=nvme2,subsys=subsys0 \
  -device nvme-ns,id=ns1,drive=drv10,nsid=1,subsys=subsys0 \
  -device nvme-ns,id=ns2,drive=drv11,nsid=2,bus=nvme2 \
  \
  -device nvme,serial=qux,id=nvme3 \
  -device nvme-ns,id=ns3,drive=drv12,nsid=3,bus=nvme3

nvme-cli:
  root@vm:~/work# nvme list -v
  NVM Express Subsystems

  Subsystem        Subsystem-NQN                                                                                    Controllers
  ---------------- ------------------------------------------------------------------------------------------------ ----------------
  nvme-subsys1     nqn.2019-08.org.qemu:subsys0                                                                     nvme0, nvme1, nvme2
  nvme-subsys3     nqn.2019-08.org.qemu:qux                                                                         nvme3

  NVM Express Controllers

  Device   SN                   MN                                       FR       TxPort Address        Subsystem    Namespaces
  -------- -------------------- ---------------------------------------- -------- ------ -------------- ------------ ----------------
  nvme0    foo                  QEMU NVMe Ctrl                           1.0      pcie   0000:00:06.0   nvme-subsys1 nvme1n1
  nvme1    bar                  QEMU NVMe Ctrl                           1.0      pcie   0000:00:07.0   nvme-subsys1 nvme1n1
  nvme2    baz                  QEMU NVMe Ctrl                           1.0      pcie   0000:00:08.0   nvme-subsys1 nvme1n1, nvme1n2
  nvme3    qux                  QEMU NVMe Ctrl                           1.0      pcie   0000:00:09.0   nvme-subsys3

  NVM Express Namespaces

  Device       NSID     Usage                      Format           Controllers
  ------------ -------- -------------------------- ---------------- ----------------
  nvme1n1      1        134.22  MB / 134.22  MB    512   B +  0 B   nvme0, nvme1, nvme2
  nvme1n2      2        268.44  MB / 268.44  MB    512   B +  0 B   nvme2
  nvme3n1      3        268.44  MB / 268.44  MB    512   B +  0 B   nvme3
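
To make the mapping between the QEMU options and the nvme-cli tables
easier to follow, here is a minimal Python model of the example.  It is
only an illustration: the data layout and the helper
visible_namespaces() are hypothetical and do not mirror the QEMU data
structures.

```python
from collections import defaultdict

# Controllers from the example, mapped to their subsystem (None = no 'subsys').
controllers = {
    "nvme0": "subsys0",
    "nvme1": "subsys0",
    "nvme2": "subsys0",
    "nvme3": None,
}

# Namespaces as (nsid, 'subsys' value or None, 'bus' value or None).
namespaces = [
    (1, "subsys0", None),  # ns1: shared through subsys0
    (2, None, "nvme2"),    # ns2: private to nvme2
    (3, None, "nvme3"),    # ns3: private to nvme3
]


def visible_namespaces(controllers, namespaces):
    """Return {controller: sorted list of NSIDs attached to it}."""
    seen = defaultdict(list)
    for nsid, subsys, bus in namespaces:
        for ctrl, ctrl_subsys in controllers.items():
            shared = subsys is not None and ctrl_subsys == subsys
            private = bus == ctrl
            if shared or private:
                seen[ctrl].append(nsid)
    return {ctrl: sorted(seen[ctrl]) for ctrl in controllers}


topology = visible_namespaces(controllers, namespaces)
```

The resulting mapping has NSID 1 on nvme0, nvme1 and nvme2, NSID 2 only
on nvme2, and NSID 3 only on nvme3, which matches the "Controllers"
column of the namespace table above.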

Summary:
  - Refactored nvme-ns device not to rely on controller during the
    setup.  [1/11 - 5/11]
  - Introduced an nvme-subsys device model. [6/11]
  - Created the subsystem NQN based on the subsystem. [7/11]
  - Introduced multi-controller model. [8/11 - 9/11]
  - Updated namespace sharing scheme to be based on nvme-subsys
    hierarchy. [10/11 - 11/11]
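
For the NQN item, the nvme-cli output above shows
nqn.2019-08.org.qemu:subsys0 for the controllers mapped to subsys0 and
nqn.2019-08.org.qemu:qux for nvme3, which has no 'subsys'.  A rough
Python sketch of that naming, inferred only from this output (the
function name and the fallback to the serial are assumptions; patch
7/11 may implement it differently):

```python
from typing import Optional

# Prefix as it appears in the nvme-cli output of the example.
NQN_PREFIX = "nqn.2019-08.org.qemu:"


def subsystem_nqn(serial: str, subsys_id: Optional[str] = None) -> str:
    """Build the subsystem NQN string seen in the example output.

    Assumption: the subsystem id is used when the controller is mapped to
    a subsystem, otherwise the controller serial is used.
    """
    suffix = subsys_id if subsys_id is not None else serial
    return NQN_PREFIX + suffix
```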

Since RFC V1:
  - Updated namespace sharing scheme to be based on nvme-subsys
    hierarchy.

Thanks,

[1] https://lists.gnu.org/archive/html/qemu-block/2021-01/msg00425.html

Minwoo Im (11):
  hw/block/nvme: remove unused argument in nvme_ns_init_zoned
  hw/block/nvme: open code for volatile write cache
  hw/block/nvme: remove unused argument in nvme_ns_init_blk
  hw/block/nvme: split setup and register for namespace
  hw/block/nvme: remove unused argument in nvme_ns_setup
  hw/block/nvme: introduce nvme-subsys device
  hw/block/nvme: support to map controller to a subsystem
  hw/block/nvme: add CMIC enum value for Identify Controller
  hw/block/nvme: support for multi-controller in subsystem
  hw/block/nvme: add NMIC enum value for Identify Namespace
  hw/block/nvme: support for shared namespace in subsystem

 hw/block/meson.build   |   2 +-
 hw/block/nvme-ns.c     |  40 ++++++++++------
 hw/block/nvme-ns.h     |   9 +++-
 hw/block/nvme-subsys.c | 101 +++++++++++++++++++++++++++++++++++++++++
 hw/block/nvme-subsys.h |  30 ++++++++++++
 hw/block/nvme.c        |  92 ++++++++++++++++++++++++++++++++-----
 hw/block/nvme.h        |   5 +-
 include/block/nvme.h   |   8 ++++
 8 files changed, 259 insertions(+), 28 deletions(-)
 create mode 100644 hw/block/nvme-subsys.c
 create mode 100644 hw/block/nvme-subsys.h

Klaus Jensen Jan. 18, 2021, 9:14 p.m. UTC | #1
On Jan 17 23:53, Minwoo Im wrote:
> [...]

Great stuff Minwoo. Thanks!

I'll pick up [01-05/11] directly since they are pretty trivial.

The subsystem model looks pretty much like it should, I don't have a lot
of comments.

One thing that I considered, is if we should reverse the "registration"
and think about it as namespace attachment. The spec is about
controllers attaching to namespaces, not the other way around.
Basically, let the namespaces be configured first and register on the
subsystem (accumulating in a "namespaces" array), then have the
controllers register with the subsystem and attach to all "non-detached"
namespaces. This allows detached namespaces to "linger" in the subsystem
to be attached later on. If there are any private namespaces (like ns2
in your example above), it will be defined after the controller with the
bus=ctrlX parameter like normal.

Minwoo Im Jan. 19, 2021, 3:21 a.m. UTC | #2
On 21-01-18 22:14:45, Klaus Jensen wrote:
> On Jan 17 23:53, Minwoo Im wrote:
> > [...]
> 
> Great stuff Minwoo. Thanks!
> 
> I'll pick up [01-05/11] directly since they are pretty trivial.

Thanks!  I will prepare the next series based on that.

> The subsystem model looks pretty much like it should, I don't have a lot
> of comments.
> 
> One thing that I considered, is if we should reverse the "registration"
> and think about it as namespace attachment. The spec is about
> controllers attaching to namespaces, not the other way around.
> Basically, let the namespaces be configured first and register on the
> subsystem (accumulating in a "namespaces" array), then have the
> controllers register with the subsystem and attach to all "non-detached"
> namespaces. This allows detached namespaces to "linger" in the subsystem
> to be attached later on. If there are any private namespaces (like ns2
> in your example above), it will be defined after the controller with the
> bus=ctrlX parameter like normal.

I revisited the spec.  5.19 says "The Namespace Attachment command is
used to attach and detach controllers from a namespace.", and 5.20 says
"Host software uses the Namespace Attachment command to attach or detach
a namespace to or from a controller. The create operation does not attach
the namespace to a controller."

	-device nvme-subsys,id=subsys0
	-device nvme-ns,id=ns1,drive=<drv>,nsid=1,subsys=subsys0
	-device nvme,id=nvme0,serial=foo,subsys=subsys0

In this case, the 'nvme0' controller will have no namespace at
boot-up.  'nvme0' can be attached to the namespace 'ns1' with the
namespace attach command.  The 'nvme-ns' device is the same as the
'create-ns' operation in an NVMe subsystem.  This makes sense as spec
5.19 says "from a namespace".

	-device nvme,id=nvme1,serial=bar,subsys=subsys0b
	-device nvme-ns,id=ns2,drive=<drv>,nsid=1,bus=nvme1

This case is for a private namespace directly attached to a controller.
This makes sense as spec 5.20 says "to or from a controller".

All looks fine to me, but one thing I am wondering is how we can
attach a controller to shared namespace(s) at the initial time?

Klaus Jensen Jan. 19, 2021, 6:04 a.m. UTC | #3
On Jan 19 12:21, Minwoo Im wrote:
> [...]
> 
> I revisited the spec.  5.19 says "The Namespace Attachment command is
> used to attach and detach controllers from a namespace.", and 5.20 says
> "Host software uses the Namespace Attachment command to attach or detach
> a namespace to or from a controller. The create operation does not attach
> the namespace to a controller."
> 

Yeah ok, that is pretty inconsistent.

> 	-device nvme-subsys,id=subsys0
> 	-device nvme-ns,id=ns1,drive=<drv>,nsid=1,subsys=subsys0
> 	-device nvme,id=nvme0,serial=foo,subsys=subsys0
> 
> In this case, the 'nvme0' controller will have no namespace at
> boot-up.  'nvme0' can be attached to the namespace 'ns1' with the
> namespace attach command.  The 'nvme-ns' device is the same as the
> 'create-ns' operation in an NVMe subsystem.  This makes sense as spec
> 5.19 says "from a namespace".
> 
> 	-device nvme,id=nvme1,serial=bar,subsys=subsys0b
> 	-device nvme-ns,id=ns2,drive=<drv>,nsid=1,bus=nvme1
> 
> This case is for a private namespace directly attached to a controller.
> This makes sense as spec 5.20 says "to or from a controller".
> 
> All looks fine to me, but one thing I am wondering is how we can
> attach a controller to shared namespace(s) at the initial time?
> 

Ok, nevermind. I think we can get 'detached' functionality in either
case, so no need to increase complexity by requiring a change of define
order.

Supporting CNS 0x12 and 0x13 (Identify, Controller List), we need the
controllers registered and stored in the subsystem anyway.

So, can we add a 'namespaces' array on the subsystem to keep a list of
namespaces and add a 'detached' parameter on the nvme-ns device? If that
parameter is given, the device is not registered with the controllers.
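
A rough Python sketch of that idea, just to illustrate the intended
behaviour.  All class and method names here are hypothetical and this
is not how the QEMU code has to look; it only models controllers being
registered first and namespaces being recorded on the subsystem, with
'detached' namespaces skipped when registering with the controllers.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Namespace:
    nsid: int
    detached: bool = False  # models the proposed 'detached' parameter


@dataclass
class Controller:
    name: str
    attached: List[int] = field(default_factory=list)  # attached NSIDs


@dataclass
class Subsystem:
    controllers: List[Controller] = field(default_factory=list)
    namespaces: List[Namespace] = field(default_factory=list)

    def register_controller(self, ctrl: Controller) -> None:
        # Controllers are stored in the subsystem (also needed for the
        # Controller List identify data mentioned above).
        self.controllers.append(ctrl)

    def register_namespace(self, ns: Namespace) -> None:
        # Every shared namespace is kept in the subsystem's list ...
        self.namespaces.append(ns)
        if ns.detached:
            # ... but a detached one is not registered with controllers.
            return
        for ctrl in self.controllers:
            ctrl.attached.append(ns.nsid)
```

With two controllers registered, a namespace with detached=False ends
up attached to both, while a detached one only appears in the
subsystem's 'namespaces' list and can be attached later.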

Minwoo Im Jan. 19, 2021, 7:51 a.m. UTC | #4
On 21-01-19 07:04:04, Klaus Jensen wrote:
> [...]
> 
> Ok, nevermind. I think we can get 'detached' functionality in either
> case, so no need to increase complexity by requiring a change of define
> order.
> 
> Supporting CNS 0x12 and 0x13 (Identify, Controller List), we need the
> controllers registered and stored in the subsystem anyway.
> 
> So, can we add a 'namespaces' array on the subsystem to keep a list of
> namespaces and add a 'detached' parameter on the nvme-ns device? If that
> parameter is given, the device is not registered with the controllers.

Sure, will do that.  Please let me prepare the V3 series.  Thanks!