Patchwork [2/2] NVMe: Implement PCI-e reset notification callback

login
register
mail settings
Submitter Keith Busch
Date April 8, 2014, 11:42 p.m.
Message ID <1397000541-1085-2-git-send-email-keith.busch@intel.com>
Download mbox | patch
Permalink /patch/337749/
State Changes Requested
Headers show

Comments

Keith Busch - April 8, 2014, 11:42 p.m.
Disable and shutdown the device prior to reset, and restart the device
after.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/block/nvme-core.c |   11 +++++++++++
 1 file changed, 11 insertions(+)
Learner Study - April 22, 2014, 1:34 a.m.
Hi Keith,

I think NVMe Reset should apply to PF mode driver only, and not to VF
mode driver.
Is that understanding correct? Does the NVMe driver know which mode
its running in?

Thanks!


On Tue, Apr 8, 2014 at 4:42 PM, Keith Busch <keith.busch@intel.com> wrote:
>
> Disable and shutdown the device prior to reset, and restart the device
> after.
>
> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
>  drivers/block/nvme-core.c |   11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
> index 625259d..273ff12 100644
> --- a/drivers/block/nvme-core.c
> +++ b/drivers/block/nvme-core.c
> @@ -2605,6 +2605,16 @@ static void nvme_reset_failed_dev(struct work_struct *ws)
>         nvme_dev_reset(dev);
>  }
>
> +static void nvme_reset_notify(struct pci_dev *pdev, int prepare)
> +{
> +       struct nvme_dev *dev = pci_get_drvdata(pdev);
> +
> +       if (prepare)
> +               nvme_dev_shutdown(dev);
> +       else
> +               nvme_dev_resume(dev);
> +}
> +
>  static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  {
>         int result = -ENOMEM;
> @@ -2744,6 +2754,7 @@ static const struct pci_error_handlers nvme_err_handler = {
>         .link_reset     = nvme_link_reset,
>         .slot_reset     = nvme_slot_reset,
>         .resume         = nvme_error_resume,
> +       .reset_notify   = nvme_reset_notify,
>  };
>
>  /* Move to pci_ids.h later */
> --
> 1.7.10.4
>
>
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme@lists.infradead.org
> http://merlin.infradead.org/mailman/listinfo/linux-nvme
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Keith Busch - April 22, 2014, 1:57 a.m.
On Mon, 21 Apr 2014, Learner Study wrote:
> Hi Keith,
>
> I think NVMe Reset should apply to PF mode driver only, and not to VF
> mode driver.
> Is that understanding correct? Does the NVMe driver know which mode
> its running in?

Oh, this driver doesn't enable SR-IOV and has no PF/VF awareness. Shame
on us, I'll add it to my list unless someone beats me to it.

Still, I think we'd like to be able to reset a VF if only because it
gets the queues back to a pristine state. A VF reset should not affect
any of the other functions.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Learner Study - April 22, 2014, 2:45 a.m.
But Won't resetting from a VF impact functionality of other VFs?

On Apr 21, 2014, at 6:57 PM, Keith Busch <keith.busch@intel.com> wrote:

> On Mon, 21 Apr 2014, Learner Study wrote:
>> Hi Keith,
>> 
>> I think NVMe Reset should apply to PF mode driver only, and not to VF
>> mode driver.
>> Is that understanding correct? Does the NVMe driver know which mode
>> its running in?
> 
> Oh, this driver doesn't enable SR-IOV and has no PF/VF awareness. Shame
> on us, I'll add it to my list unless someone beats me to it.
> 
> Still, I think we'd like to be able to reset a VF if only because it
> gets the queues back to a pristine state. A VF reset should not affect
> any of the other functions.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Learner Study - April 22, 2014, 3:34 a.m.
Per the spec, "When an NVM Subsystem Reset occurs, the entire NVM
subsystem is reset"....so all VFs would get impacted, if a VF does a
Reset. So, I think Reset kind of control should be in PF mode only.

On Mon, Apr 21, 2014 at 7:45 PM, Learner <learner.study@gmail.com> wrote:
> But Won't resetting from a VF impact functionality of other VFs?
>
> On Apr 21, 2014, at 6:57 PM, Keith Busch <keith.busch@intel.com> wrote:
>
>> On Mon, 21 Apr 2014, Learner Study wrote:
>>> Hi Keith,
>>>
>>> I think NVMe Reset should apply to PF mode driver only, and not to VF
>>> mode driver.
>>> Is that understanding correct? Does the NVMe driver know which mode
>>> its running in?
>>
>> Oh, this driver doesn't enable SR-IOV and has no PF/VF awareness. Shame
>> on us, I'll add it to my list unless someone beats me to it.
>>
>> Still, I think we'd like to be able to reset a VF if only because it
>> gets the queues back to a pristine state. A VF reset should not affect
>> any of the other functions.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mayes, Barrett N - April 22, 2014, 5:02 a.m.
Section 8.5 of the NVMe 1.1 spec contains the following:  "While the details associated with implementing a controller that supports SR-IOV are outside the scope of this specification, such a controller shall implement fully compliant NVM Express Virtual Functions (VFs). This ensures that the same host software developed for non-virtualized environments is capable of running unmodified within an SI. No such requirement exists for the Physical Function (PF)."

If a VF is a fully NVMe compliant device then it must at least act like it implements NVMe subsystem reset.  How the SR-IOV-capable controller actually implements this is left up to the vendor.  But the spec does not require that a reset of one VF initiate a reset of other VF.

-----Original Message-----
From: Linux-nvme [mailto:linux-nvme-bounces@lists.infradead.org] On Behalf Of Learner Study
Sent: Monday, April 21, 2014 8:34 PM
To: Busch, Keith
Cc: bhelgaas@google.com; linux-pci@vger.kernel.org; linux-nvme
Subject: Re: [PATCH 2/2] NVMe: Implement PCI-e reset notification callback

Per the spec, "When an NVM Subsystem Reset occurs, the entire NVM subsystem is reset"....so all VFs would get impacted, if a VF does a Reset. So, I think Reset kind of control should be in PF mode only.

On Mon, Apr 21, 2014 at 7:45 PM, Learner <learner.study@gmail.com> wrote:
> But Won't resetting from a VF impact functionality of other VFs?
>
> On Apr 21, 2014, at 6:57 PM, Keith Busch <keith.busch@intel.com> wrote:
>
>> On Mon, 21 Apr 2014, Learner Study wrote:
>>> Hi Keith,
>>>
>>> I think NVMe Reset should apply to PF mode driver only, and not to 
>>> VF mode driver.
>>> Is that understanding correct? Does the NVMe driver know which mode 
>>> its running in?
>>
>> Oh, this driver doesn't enable SR-IOV and has no PF/VF awareness. 
>> Shame on us, I'll add it to my list unless someone beats me to it.
>>
>> Still, I think we'd like to be able to reset a VF if only because it 
>> gets the queues back to a pristine state. A VF reset should not 
>> affect any of the other functions.
Learner Study - April 22, 2014, 10:07 a.m.
It would be interesting to see which way the standard open source
driver decides to go...would we allow Reset in one VF to impact
operations in other VFs (as per spec 7.3.1 of NVMe 1.1 spec, "When an
NVM Subsystem Reset occurs, the entire NVM subsystem is reset"...)

On Mon, Apr 21, 2014 at 10:02 PM, Mayes, Barrett N
<barrett.n.mayes@intel.com> wrote:
> Section 8.5 of the NVMe 1.1 spec contains the following:  "While the details associated with implementing a controller that supports SR-IOV are outside the scope of this specification, such a controller shall implement fully compliant NVM Express Virtual Functions (VFs). This ensures that the same host software developed for non-virtualized environments is capable of running unmodified within an SI. No such requirement exists for the Physical Function (PF)."
>
> If a VF is a fully NVMe compliant device then it must at least act like it implements NVMe subsystem reset.  How the SR-IOV-capable controller actually implements this is left up to the vendor.  But the spec does not require that a reset of one VF initiate a reset of other VF.
>
> -----Original Message-----
> From: Linux-nvme [mailto:linux-nvme-bounces@lists.infradead.org] On Behalf Of Learner Study
> Sent: Monday, April 21, 2014 8:34 PM
> To: Busch, Keith
> Cc: bhelgaas@google.com; linux-pci@vger.kernel.org; linux-nvme
> Subject: Re: [PATCH 2/2] NVMe: Implement PCI-e reset notification callback
>
> Per the spec, "When an NVM Subsystem Reset occurs, the entire NVM subsystem is reset"....so all VFs would get impacted, if a VF does a Reset. So, I think Reset kind of control should be in PF mode only.
>
> On Mon, Apr 21, 2014 at 7:45 PM, Learner <learner.study@gmail.com> wrote:
>> But Won't resetting from a VF impact functionality of other VFs?
>>
>> On Apr 21, 2014, at 6:57 PM, Keith Busch <keith.busch@intel.com> wrote:
>>
>>> On Mon, 21 Apr 2014, Learner Study wrote:
>>>> Hi Keith,
>>>>
>>>> I think NVMe Reset should apply to PF mode driver only, and not to
>>>> VF mode driver.
>>>> Is that understanding correct? Does the NVMe driver know which mode
>>>> its running in?
>>>
>>> Oh, this driver doesn't enable SR-IOV and has no PF/VF awareness.
>>> Shame on us, I'll add it to my list unless someone beats me to it.
>>>
>>> Still, I think we'd like to be able to reset a VF if only because it
>>> gets the queues back to a pristine state. A VF reset should not
>>> affect any of the other functions.
>
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Keith Busch - April 22, 2014, 2:22 p.m.
On Mon, 21 Apr 2014, Mayes, Barrett N wrote:
> Section 8.5 of the NVMe 1.1 spec contains the following:  "While the details
> associated with implementing a controller that supports SR-IOV are outside
> the scope of this specification, such a controller shall implement fully
> compliant NVM Express Virtual Functions (VFs). This ensures that the same
> host software developed for non-virtualized environments is capable of
> running unmodified within an SI. No such requirement exists for the Physical
> Function (PF)."
>
> If a VF is a fully NVMe compliant device then it must at least act like it
> implements NVMe subsystem reset.  How the SR-IOV-capable controller actually
> implements this is left up to the vendor.  But the spec does not require that
> a reset of one VF initiate a reset of other VF.

The NVMe spec does not, but PCI does.

In case it wasn't clear from PATCH 1/2, this proposed callback is for a
function level reset. The PCI SR-IOV spec says all VF's implement FLR and
that this reset does not affect any other functions, from section 2.2.2:

   VFs must support Function Level Reset (FLR).

   Note: Software may use FLR to reset a VF. FLR to a VF affects a VF’s
   state but does not affect its existence in PCI Configuration Space or
   PCI Bus address space. The VFs BARn values (see Section 3.3.14) and
   VF MSE (see Section 3.3.3.4) in the PF’s SR-IOV extended capability
   are unaffected by FLRs issued to the VF.

Further, an NVMe subsystem reset is not the same as a controller or
function level reset. I have not proposed doing a subsytem reset here.

Since this is a call-back invoked from an FLR that happens outside the
driver whether the driver wants it to happen or not, it's better the
driver is aware and prepared that this occured rather than not knowing
and left to wonder why the heck the controller stopped responding.
Learner Study - April 22, 2014, 3 p.m.
Thanks for the clarification that the patch is for FLR, and not NVMe
subsystem Reset event.
Any thoughts on how NVMe subsystem Reset would be handled in future?

Thanks!

On Tue, Apr 22, 2014 at 7:22 AM, Keith Busch <keith.busch@intel.com> wrote:
> On Mon, 21 Apr 2014, Mayes, Barrett N wrote:
>>
>> Section 8.5 of the NVMe 1.1 spec contains the following:  "While the
>> details
>> associated with implementing a controller that supports SR-IOV are outside
>> the scope of this specification, such a controller shall implement fully
>> compliant NVM Express Virtual Functions (VFs). This ensures that the same
>> host software developed for non-virtualized environments is capable of
>> running unmodified within an SI. No such requirement exists for the
>> Physical
>> Function (PF)."
>>
>> If a VF is a fully NVMe compliant device then it must at least act like it
>> implements NVMe subsystem reset.  How the SR-IOV-capable controller
>> actually
>> implements this is left up to the vendor.  But the spec does not require
>> that
>> a reset of one VF initiate a reset of other VF.
>
>
> The NVMe spec does not, but PCI does.
>
> In case it wasn't clear from PATCH 1/2, this proposed callback is for a
> function level reset. The PCI SR-IOV spec says all VF's implement FLR and
> that this reset does not affect any other functions, from section 2.2.2:
>
>   VFs must support Function Level Reset (FLR).
>
>   Note: Software may use FLR to reset a VF. FLR to a VF affects a VF’s
>   state but does not affect its existence in PCI Configuration Space or
>   PCI Bus address space. The VFs BARn values (see Section 3.3.14) and
>   VF MSE (see Section 3.3.3.4) in the PF’s SR-IOV extended capability
>   are unaffected by FLRs issued to the VF.
>
> Further, an NVMe subsystem reset is not the same as a controller or
> function level reset. I have not proposed doing a subsytem reset here.
>
> Since this is a call-back invoked from an FLR that happens outside the
> driver whether the driver wants it to happen or not, it's better the
> driver is aware and prepared that this occured rather than not knowing
> and left to wonder why the heck the controller stopped responding.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas - April 22, 2014, 3:57 p.m.
On Tue, Apr 22, 2014 at 5:33 AM, liaohengquan1986
<liaohengquan1986@163.com> wrote:
>
> Hi, everyone,
>             I want to ask that weather the NVMe driver has been tried on the
> IBM or Lenovo server?
>             I use it on IBM(Lenovo) server with suse 11 SP3, but the MSI-X
> irq is always could not be got by cpu(may be it is masked).
>             Has anyone got this kind of problem?

I think your original email contained non-plain text, so it won't
appear on the mailing list, and some recipients may auto-discard it as
well.  See http://vger.kernel.org/majordomo-info.html

If you can reproduce the problem with an upstream kernel, this is the
right place to debug it.  We'd need a complete dmesg log and "lspci
-vv" output to start with.

If the problem only happens with SUSE, then you'd want to work with
SUSE to figure it out.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Learner Study - April 22, 2014, 4:08 p.m.
I'd like to test out the latest NVMe driver. Could someone recommend a
NVMe controller should I use? I haven't been  able to find anything on
Amazon yet...

And perhaps, which version of kernel (OS distribution) should I pick?

Thanks


On Tue, Apr 22, 2014 at 8:57 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Tue, Apr 22, 2014 at 5:33 AM, liaohengquan1986
> <liaohengquan1986@163.com> wrote:
>>
>> Hi, everyone,
>>             I want to ask that weather the NVMe driver has been tried on the
>> IBM or Lenovo server?
>>             I use it on IBM(Lenovo) server with suse 11 SP3, but the MSI-X
>> irq is always could not be got by cpu(may be it is masked).
>>             Has anyone got this kind of problem?
>
> I think your original email contained non-plain text, so it won't
> appear on the mailing list, and some recipients may auto-discard it as
> well.  See http://vger.kernel.org/majordomo-info.html
>
> If you can reproduce the problem with an upstream kernel, this is the
> right place to debug it.  We'd need a complete dmesg log and "lspci
> -vv" output to start with.
>
> If the problem only happens with SUSE, then you'd want to work with
> SUSE to figure it out.
>
> Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
index 625259d..273ff12 100644
--- a/drivers/block/nvme-core.c
+++ b/drivers/block/nvme-core.c
@@ -2605,6 +2605,16 @@  static void nvme_reset_failed_dev(struct work_struct *ws)
 	nvme_dev_reset(dev);
 }
 
+static void nvme_reset_notify(struct pci_dev *pdev, int prepare)
+{
+	struct nvme_dev *dev = pci_get_drvdata(pdev);
+
+	if (prepare)
+		nvme_dev_shutdown(dev);
+	else
+		nvme_dev_resume(dev);
+}
+
 static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
 	int result = -ENOMEM;
@@ -2744,6 +2754,7 @@  static const struct pci_error_handlers nvme_err_handler = {
 	.link_reset	= nvme_link_reset,
 	.slot_reset	= nvme_slot_reset,
 	.resume		= nvme_error_resume,
+	.reset_notify	= nvme_reset_notify,
 };
 
 /* Move to pci_ids.h later */