[1/5] nvme: PCI/e configuration from specification

Message ID 20180622112237.2131-1-gersner@gmail.com
State New
Series [1/5] nvme: PCI/e configuration from specification

Commit Message

Gersner June 22, 2018, 11:22 a.m. UTC
The PCI/e configuration currently does not meet the specification.

This patch includes various configuration changes to bring it in line with
the specification:
- BAR2 returns zero when read while CMD.IOSE is not set.
- Expose the NVMe configuration through I/O space (optional).
- PCI Power Management v1.2.
- PCIe Function Level Reset.
- Disable QEMU's default use of the PCIe Link Status DLLLA bit.
- Add the missing PCIe AOC compliance flag.
- Mask PCIe End-to-End TLP Prefix as RO (left unspecified by the specification).

Change-Id: I9057f56266db16b013fa194730c998d4779cede3
Signed-off-by: Shimi Gersner <gersner@gmail.com>
---
 hw/block/nvme.c           | 59 +++++++++++++++++++++++++++++++++++++--
 include/hw/pci/pci_regs.h |  1 +
 2 files changed, 58 insertions(+), 2 deletions(-)

Comments

Kevin Wolf July 12, 2018, 11:47 a.m. UTC | #1
Am 22.06.2018 um 13:22 hat Shimi Gersner geschrieben:
> PCI/e configuration currently does not meets specifications.
> 
> Patch includes various configuration changes to support specifications
> - BAR2 to return zero when read and CMD.IOSE is not set.
> - Expose NVME configuration through IO space (Optional).
> - PCI Power Management v1.2.
> - PCIe Function Level Reset.
> - Disable QEMUs default use of PCIe Link Status (DLLLA).
> - PCIe missing AOC compliance flag.
> - Mask PCIe End-to-End TLP as RO (Unspecified by specification).
> 
> Change-Id: I9057f56266db16b013fa194730c998d4779cede3

What is the meaning of this Change-Id tag?

> Signed-off-by: Shimi Gersner <gersner@gmail.com>

Keith, can you have a look at this series?

Kevin
David Sariel July 13, 2018, 7:40 a.m. UTC | #2
Our bad. The Change-Id tag snuck in from Gerrit
(https://review.gerrithub.io/c/davidsaOpenu/qemu/+/415434).

I took a note to drop that line for "[PATCH v2]", but I guess it makes sense
to wait in case additional comments follow, right?

Thanks for taking a look.

On 12 July 2018 at 14:47, Kevin Wolf <kwolf@redhat.com> wrote:

> Am 22.06.2018 um 13:22 hat Shimi Gersner geschrieben:
> > PCI/e configuration currently does not meets specifications.
> >
> > Patch includes various configuration changes to support specifications
> > - BAR2 to return zero when read and CMD.IOSE is not set.
> > - Expose NVME configuration through IO space (Optional).
> > - PCI Power Management v1.2.
> > - PCIe Function Level Reset.
> > - Disable QEMUs default use of PCIe Link Status (DLLLA).
> > - PCIe missing AOC compliance flag.
> > - Mask PCIe End-to-End TLP as RO (Unspecified by specification).
> >
> > Change-Id: I9057f56266db16b013fa194730c998d4779cede3
>
> What is the meaning of this Change-Id tag?
>
> > Signed-off-by: Shimi Gersner <gersner@gmail.com>
>
> Keith, can you have a look at this series?
>
> Kevin
>
Daniel Verkamp July 15, 2018, 6:20 a.m. UTC | #3
On Fri, Jun 22, 2018 at 4:22 AM, Shimi Gersner <gersner@gmail.com> wrote:
> PCI/e configuration currently does not meets specifications.
>
> Patch includes various configuration changes to support specifications
> - BAR2 to return zero when read and CMD.IOSE is not set.
> - Expose NVME configuration through IO space (Optional).
> - PCI Power Management v1.2.
> - PCIe Function Level Reset.
> - Disable QEMUs default use of PCIe Link Status (DLLLA).
> - PCIe missing AOC compliance flag.
> - Mask PCIe End-to-End TLP as RO (Unspecified by specification).
[...]
>      n->num_namespaces = 1;
>      n->num_queues = 64;
> -    n->reg_size = pow2ceil(0x1004 + 2 * (n->num_queues + 1) * 4);
> +    n->reg_size = pow2ceil(0x2000 + 2 * (n->num_queues + 1) * 4);

The existing math looks OK to me (maybe already 4 bytes larger than
necessary?). The controller registers should be 0x1000 bytes for the
fixed register set, then 2 * 4 bytes for each doorbell pair (+ 1 is
for the admin queue, and CAP.DSTRD is set to 0, so there is no extra
padding between queues). The result is also rounded up to a power of
two, so the result will typically be 8 KB.  What is the rationale for
this change?
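
For illustration (my arithmetic, assuming the default num_queues = 64), the
existing expression works out to:

    0x1000 + 2 * (64 + 1) * 4 = 0x1208    /* fixed register set + SQ/CQ doorbell pairs */
    pow2ceil(0x1208)          = 0x2000    /* i.e. 8 KB */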

>      n->ns_size = bs_size / (uint64_t)n->num_namespaces;
>
>      n->namespaces = g_new0(NvmeNamespace, n->num_namespaces);
> @@ -1245,6 +1295,10 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
>      pci_register_bar(&n->parent_obj, 0,
>          PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
>          &n->iomem);
> +
> +    // Expose the NVME memory through Address Space IO (Optional by spec)
> +    pci_register_bar(&n->parent_obj, 2, PCI_BASE_ADDRESS_SPACE_IO, &n->iomem);

This looks like it will register the whole 4KB+ NVMe register set
(n->iomem) as an I/O space BAR, but this doesn't match the spec (see
NVMe 1.3c section 3.2, "Index/Data Pair registers"). The I/O BAR (if
implemented) should just contain two 4-byte registers, Index and Data,
that can be used to indirectly access the NVMe register set.  (Just
for curiosity, do you know of any software that uses this feature? It
could be useful for testing the implementation.)
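
For reference, my reading of the Index/Data Pair layout in that section (a
sketch of the spec, not something in the patch):

    BAR2 + 0x0: IDX - 4-byte index register selecting an offset in the controller register set
    BAR2 + 0x4: DAT - 4-byte data window that reads/writes the register selected by IDX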

Other minor nitpicks:
- Comment style is inconsistent with the rest of the file (// vs /* */)
- PCIe and NVMe are incorrectly capitalized a few times in comments

Thanks,
-- Daniel
Gersner Aug. 26, 2018, 9:49 p.m. UTC | #4
Hi Daniel,
Thanks for taking a look. Comments are inline.

Gersner.

On Sun, Jul 15, 2018 at 9:21 AM Daniel Verkamp <daniel@drv.nu> wrote:

> On Fri, Jun 22, 2018 at 4:22 AM, Shimi Gersner <gersner@gmail.com> wrote:
> > PCI/e configuration currently does not meets specifications.
> >
> > Patch includes various configuration changes to support specifications
> > - BAR2 to return zero when read and CMD.IOSE is not set.
> > - Expose NVME configuration through IO space (Optional).
> > - PCI Power Management v1.2.
> > - PCIe Function Level Reset.
> > - Disable QEMUs default use of PCIe Link Status (DLLLA).
> > - PCIe missing AOC compliance flag.
> > - Mask PCIe End-to-End TLP as RO (Unspecified by specification).
> [...]
> >      n->num_namespaces = 1;
> >      n->num_queues = 64;
> > -    n->reg_size = pow2ceil(0x1004 + 2 * (n->num_queues + 1) * 4);
> > +    n->reg_size = pow2ceil(0x2000 + 2 * (n->num_queues + 1) * 4);
>
> The existing math looks OK to me (maybe already 4 bytes larger than
> necessary?). The controller registers should be 0x1000 bytes for the
> fixed register set, then 2 * 4 bytes for each doorbell pair (+ 1 is
> for the admin queue, and CAP.DSTRD is set to 0, so there is no extra
> padding between queues). The result is also rounded up to a power of
> two, so the result will typically be 8 KB.  What is the rationale for
> this change?
>
You are correct, this change shouldn't be here. It was made while running the
NVMe compliance tests, which failed here.

The specification states that bits 0 to 13 are RO, which implicitly suggests
that the minimal size of this BAR should be at least 16K, since this is the
standard way to determine a BAR's size on PCI (AFAIK). On the other hand, that
doesn't fit very well with the memory map laid out in section 3.1 (Register
Definition) of the NVMe 1.3b spec. Any idea?

Additionally, it does look like there is an extra 4 bytes.


>
> >      n->ns_size = bs_size / (uint64_t)n->num_namespaces;
> >
> >      n->namespaces = g_new0(NvmeNamespace, n->num_namespaces);
> > @@ -1245,6 +1295,10 @@ static void nvme_realize(PCIDevice *pci_dev,
> Error **errp)
> >      pci_register_bar(&n->parent_obj, 0,
> >          PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
> >          &n->iomem);
> > +
> > +    // Expose the NVME memory through Address Space IO (Optional by
> spec)
> > +    pci_register_bar(&n->parent_obj, 2, PCI_BASE_ADDRESS_SPACE_IO,
> &n->iomem);
>
> This looks like it will register the whole 4KB+ NVMe register set
> (n->iomem) as an I/O space BAR, but this doesn't match the spec (see
> NVMe 1.3c section 3.2, "Index/Data Pair registers"). The I/O BAR (if
> implemented) should just contain two 4-byte registers, Index and Data,
> that can be used to indirectly access the NVMe register set.  (Just
> for curiosity, do you know of any software that uses this feature? It
> could be useful for testing the implementation.)
>
Very nice catch. We indeed completely misinterpreted the specification here.

We use the compliance suite for sanity checking, but it doesn't actually use
this feature; it just verifies the configuration of the registers.

We will roll this feature back, as it is optional.

Question - I'm having a hard time determining this from the specification:
since this is optional, how do device drivers determine whether it is
available? Is it simply the CMD.IOSE flag from PCI config space? And if so,
what is the standard way in QEMU to disable the flag completely?


>
> Other minor nitpicks:
> - Comment style is inconsistent with the rest of the file (// vs /* */)
> - PCIe and NVMe are incorrectly capitalized a few times in comments
>

Thanks.


>
> Thanks,
> -- Daniel
>
Daniel Verkamp Aug. 30, 2018, 3:45 p.m. UTC | #5
Hi Shimi,

On Sun, Aug 26, 2018 at 2:50 PM Gersner <gersner@gmail.com> wrote:
>
> Hi Daniel,
> Thanks for taking a look. Comments are inline.
>
> Gersner.
>
> On Sun, Jul 15, 2018 at 9:21 AM Daniel Verkamp <daniel@drv.nu> wrote:
>>
>> On Fri, Jun 22, 2018 at 4:22 AM, Shimi Gersner <gersner@gmail.com> wrote:
>> > PCI/e configuration currently does not meets specifications.
>> >
>> > Patch includes various configuration changes to support specifications
>> > - BAR2 to return zero when read and CMD.IOSE is not set.
>> > - Expose NVME configuration through IO space (Optional).
>> > - PCI Power Management v1.2.
>> > - PCIe Function Level Reset.
>> > - Disable QEMUs default use of PCIe Link Status (DLLLA).
>> > - PCIe missing AOC compliance flag.
>> > - Mask PCIe End-to-End TLP as RO (Unspecified by specification).
>> [...]
>> >      n->num_namespaces = 1;
>> >      n->num_queues = 64;
>> > -    n->reg_size = pow2ceil(0x1004 + 2 * (n->num_queues + 1) * 4);
>> > +    n->reg_size = pow2ceil(0x2000 + 2 * (n->num_queues + 1) * 4);
>>
>> The existing math looks OK to me (maybe already 4 bytes larger than
>> necessary?). The controller registers should be 0x1000 bytes for the
>> fixed register set, then 2 * 4 bytes for each doorbell pair (+ 1 is
>> for the admin queue, and CAP.DSTRD is set to 0, so there is no extra
>> padding between queues). The result is also rounded up to a power of
>> two, so the result will typically be 8 KB.  What is the rationale for
>> this change?
>
> You are correct, this change shouldn't be here. It was made during tests against the
> nvme compliance which failed here.

If the NVMe compliance test requires it, that is probably sufficient
reason to change it - although it would be interesting to hear their
rationale.

> The specification states that bits 0 to 13 are RO, which implicitly suggests
> that the minimal size of this BAR should be at least 16K as this is a standard
> way to determine the BAR size on PCI (AFAIK). On the contrary it doesn't
> fit very well with the memory mapped laid out on 3.1 Register Definition
> of the 1.3b nvme spec. Any idea?

You're right, the MLBAR section of the NVMe spec does seem to indicate
that up to bit 13 should be reserved/RO.  This is also probably a good
enough reason to make the change, as long as this reason is provided.
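
For what it's worth (my arithmetic, assuming the default num_queues = 64), the
patched expression does end up satisfying that:

    pow2ceil(0x2000 + 2 * (64 + 1) * 4) = pow2ceil(0x2208) = 0x4000

i.e. a 16 KB BAR, so address bits 0-13 naturally read back as zero.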

>
> Additionally it does look like it has an extra 4 bytes.
>
>>
>>
>> >      n->ns_size = bs_size / (uint64_t)n->num_namespaces;
>> >
>> >      n->namespaces = g_new0(NvmeNamespace, n->num_namespaces);
>> > @@ -1245,6 +1295,10 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
>> >      pci_register_bar(&n->parent_obj, 0,
>> >          PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
>> >          &n->iomem);
>> > +
>> > +    // Expose the NVME memory through Address Space IO (Optional by spec)
>> > +    pci_register_bar(&n->parent_obj, 2, PCI_BASE_ADDRESS_SPACE_IO, &n->iomem);
>>
>> This looks like it will register the whole 4KB+ NVMe register set
>> (n->iomem) as an I/O space BAR, but this doesn't match the spec (see
>> NVMe 1.3c section 3.2, "Index/Data Pair registers"). The I/O BAR (if
>> implemented) should just contain two 4-byte registers, Index and Data,
>> that can be used to indirectly access the NVMe register set.  (Just
>> for curiosity, do you know of any software that uses this feature? It
>> could be useful for testing the implementation.)
>
> Very nice catch. We indeed totally miss-interpreted the specification here.
>
> We use the compliance suit for sanity, but it doesn't actually use this feature,
> just verifies the configuration of the registers.
>
> We will go with rolling back this feature as this is optional.

This is probably the right move; I don't know of any real hardware
devices that implement it (and therefore I doubt any software will
require it).  However, it should not be too difficult to add an
implementation of this feature; if you are interested, take a look at
the ahci_idp_read/ahci_idp_write functions - the NVMe index/data pair
should be very similar.
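
A rough sketch of what that could look like for NVMe - entirely illustrative:
the nvme_idp_* handlers and the idp_index field below are hypothetical, and
the sketch assumes it lives in hw/block/nvme.c next to the existing
nvme_mmio_read/nvme_mmio_write handlers:

static uint64_t nvme_idp_read(void *opaque, hwaddr addr, unsigned size)
{
    NvmeCtrl *n = opaque;

    if (addr == 0) {
        return n->idp_index;                        /* IDX: current index */
    } else if (addr == 4) {
        /* DAT: indirect read of the register selected by IDX */
        return nvme_mmio_read(n, n->idp_index, size);
    }
    return 0;
}

static void nvme_idp_write(void *opaque, hwaddr addr, uint64_t data, unsigned size)
{
    NvmeCtrl *n = opaque;

    if (addr == 0) {
        n->idp_index = data;                        /* IDX: select a register */
    } else if (addr == 4) {
        /* DAT: indirect write to the register selected by IDX */
        nvme_mmio_write(n, n->idp_index, data, size);
    }
}

An 8-byte MemoryRegion backed by ops like these would then be registered as
BAR2 instead of reusing n->iomem.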

> Question - I'm having hard time to determine from the specification - As
> this is optional, how device drivers determine if it is available? Does it
> simply the CMD.IOSE flag from the PCI? And if so, what is
> the standard way in QEMU to disable the flag completely?

Based on the wording in NVMe 1.3 section 3.2, it seems like the only
indication of support for Index/Data Pair is whether BAR2 is filled
out.  I believe PCI defines unused BARs to be all 0s, so software
should be able to use this to determine whether the feature is
provided by a particular NVMe PCIe device, and removing the
pci_register_bar() call should be enough to accomplish this from the
QEMU side.
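
From the driver side, the detection would presumably look something like this
(a Linux-flavoured sketch of my own, not taken from any real driver):

/* Hypothetical probe-time check: the Index/Data Pair is only available if
 * BAR2 is implemented as an I/O BAR (unimplemented BARs read back as zero). */
if (pci_resource_len(pdev, 2) &&
    (pci_resource_flags(pdev, 2) & IORESOURCE_IO)) {
    /* Index/Data Pair supported */
}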

Thanks,
-- Daniel
Gersner Sept. 12, 2018, 7:53 p.m. UTC | #6
Hi Daniel,

Sorry for the long round-trips; we had a busy month. We have implemented all
the changes and are waiting for a final clarification.

Should the new patches be posted in this thread or a new one?

Thanks for your time.

Gersner.

On Thu, Aug 30, 2018 at 6:45 PM Daniel Verkamp <daniel@drv.nu> wrote:

> Hi Shimi,
>
> On Sun, Aug 26, 2018 at 2:50 PM Gersner <gersner@gmail.com> wrote:
> >
> > Hi Daniel,
> > Thanks for taking a look. Comments are inline.
> >
> > Gersner.
> >
> > On Sun, Jul 15, 2018 at 9:21 AM Daniel Verkamp <daniel@drv.nu> wrote:
> >>
> >> On Fri, Jun 22, 2018 at 4:22 AM, Shimi Gersner <gersner@gmail.com>
> wrote:
> >> > PCI/e configuration currently does not meets specifications.
> >> >
> >> > Patch includes various configuration changes to support specifications
> >> > - BAR2 to return zero when read and CMD.IOSE is not set.
> >> > - Expose NVME configuration through IO space (Optional).
> >> > - PCI Power Management v1.2.
> >> > - PCIe Function Level Reset.
> >> > - Disable QEMUs default use of PCIe Link Status (DLLLA).
> >> > - PCIe missing AOC compliance flag.
> >> > - Mask PCIe End-to-End TLP as RO (Unspecified by specification).
> >> [...]
> >> >      n->num_namespaces = 1;
> >> >      n->num_queues = 64;
> >> > -    n->reg_size = pow2ceil(0x1004 + 2 * (n->num_queues + 1) * 4);
> >> > +    n->reg_size = pow2ceil(0x2000 + 2 * (n->num_queues + 1) * 4);
> >>
> >> The existing math looks OK to me (maybe already 4 bytes larger than
> >> necessary?). The controller registers should be 0x1000 bytes for the
> >> fixed register set, then 2 * 4 bytes for each doorbell pair (+ 1 is
> >> for the admin queue, and CAP.DSTRD is set to 0, so there is no extra
> >> padding between queues). The result is also rounded up to a power of
> >> two, so the result will typically be 8 KB.  What is the rationale for
> >> this change?
> >
> > You are correct, this change shouldn't be here. It was made during tests
> against the
> > nvme compliance which failed here.
>
> If the NVMe compliance test requires it, that is probably sufficient
> reason to change it - although it would be interesting to hear their
> rationale.
>
> > The specification states that bits 0 to 13 are RO, which implicitly
> suggests
> > that the minimal size of this BAR should be at least 16K as this is a
> standard
> > way to determine the BAR size on PCI (AFAIK). On the contrary it doesn't
> > fit very well with the memory mapped laid out on 3.1 Register Definition
> > of the 1.3b nvme spec. Any idea?
>
> You're right, the MLBAR section of the NVMe spec does seem to indicate
> that up to bit 13 should be reserved/RO.  This is also probably a good
> enough reason to make the change, as long as this reason is provided.
>
> >
> > Additionally it does look like it has an extra 4 bytes.
> >
> >>
> >>
> >> >      n->ns_size = bs_size / (uint64_t)n->num_namespaces;
> >> >
> >> >      n->namespaces = g_new0(NvmeNamespace, n->num_namespaces);
> >> > @@ -1245,6 +1295,10 @@ static void nvme_realize(PCIDevice *pci_dev,
> Error **errp)
> >> >      pci_register_bar(&n->parent_obj, 0,
> >> >          PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
> >> >          &n->iomem);
> >> > +
> >> > +    // Expose the NVME memory through Address Space IO (Optional by
> spec)
> >> > +    pci_register_bar(&n->parent_obj, 2, PCI_BASE_ADDRESS_SPACE_IO,
> &n->iomem);
> >>
> >> This looks like it will register the whole 4KB+ NVMe register set
> >> (n->iomem) as an I/O space BAR, but this doesn't match the spec (see
> >> NVMe 1.3c section 3.2, "Index/Data Pair registers"). The I/O BAR (if
> >> implemented) should just contain two 4-byte registers, Index and Data,
> >> that can be used to indirectly access the NVMe register set.  (Just
> >> for curiosity, do you know of any software that uses this feature? It
> >> could be useful for testing the implementation.)
> >
> > Very nice catch. We indeed totally miss-interpreted the specification
> here.
> >
> > We use the compliance suit for sanity, but it doesn't actually use this
> feature,
> > just verifies the configuration of the registers.
> >
> > We will go with rolling back this feature as this is optional.
>
> This is probably the right move; I don't know of any real hardware
> devices that implement it (and therefore I doubt any software will
> require it).  However, it should not be too difficult to add an
> implementation of this feature; if you are interested, take a look at
> the ahci_idp_read/ahci_idp_write functions - the NVMe index/data pair
> should be very similar.
>
Nice, it does seem easy to implement. We decided to go without it due to the
lack of real-world software we could test the new facility against.


>
> > Question - I'm having hard time to determine from the specification - As
> > this is optional, how device drivers determine if it is available? Does
> it
> > simply the CMD.IOSE flag from the PCI? And if so, what is
> > the standard way in QEMU to disable the flag completely?
>
> Based on the wording in NVMe 1.3 section 3.2, it seems like the only
> indication of support for Index/Data Pair is whether BAR2 is filled
> out.  I believe PCI defines unused BARs to be all 0s, so software
> should be able to use this to determine whether the feature is
> provided by a particular NVMe PCIe device, and removing the
> pci_register_bar() call should be enough to accomplish this from the
> QEMU side.
>
Agreed, that is also what I understood from the spec. On the other hand, for
some reason the compliance tests
<https://github.com/nvmecompliance/tnvme/blob/30a19c6829b571b2c7bdf854daf0351bfe038032/GrpPciRegisters/allPciRegs_r10b.cpp#L382>
assume that IOSE==1 means that the Index/Data pair is enabled. They probably
had a good reason. I'm not even sure what turns the IOSE flag on; I found no
relevant code in QEMU.


>
> Thanks,
> -- Daniel
>
Eric Blake Sept. 12, 2018, 9:21 p.m. UTC | #7
On 9/12/18 2:53 PM, Gersner wrote:
> Hi Daniel,
> 
> Sorry for the long round-trips, we had a busy month. We have implemented
> all the changes. Waiting for a final clarification.
> 
> Should the new patches be posted on this thread or a new one?

Best to post a v2 as a new top-level thread (our CI tools don't spot 
patch revisions buried in reply to an existing thread).

Also, it's best to avoid top-posting on technical lists, as it's harder 
to follow the flow of your email.

Patch

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index d5bf95b79b..9d5414c80f 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -25,6 +25,7 @@ 
  */
 
 #include "qemu/osdep.h"
+#include "qemu/range.h"
 #include "hw/block/block.h"
 #include "hw/hw.h"
 #include "hw/pci/msix.h"
@@ -39,6 +40,9 @@ 
 #include "trace.h"
 #include "nvme.h"
 
+#define PCI_EXP_LNKCAP_AOC 0x00400000 /* ASPM Optionality Compliance (AOC) */
+#define PCI_EXP_DEVCAP2_CTDS 0x10 /* Completion Timeout Disable Supported (CTDS) */
+
 #define NVME_GUEST_ERR(trace, fmt, ...) \
     do { \
         (trace_##trace)(__VA_ARGS__); \
@@ -1195,14 +1199,31 @@  static const MemoryRegionOps nvme_cmb_ops = {
     },
 };
 
+static uint32_t nvme_pci_read_config(PCIDevice *pci_dev, uint32_t addr, int len)
+{
+    uint32_t val; /* Value to be returned */
+
+    val = pci_default_read_config(pci_dev, addr, len);
+    if (ranges_overlap(addr, len, PCI_BASE_ADDRESS_2, 4) && (!(pci_dev->config[PCI_COMMAND] & PCI_COMMAND_IO))) {
+        /* When CMD.IOSE is not set */
+        val = 0 ;
+    }
+
+    return val;
+}
+
 static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 {
+    static const uint16_t nvme_pm_offset = 0x80;
+    static const uint16_t nvme_pcie_offset = nvme_pm_offset + PCI_PM_SIZEOF;
+
     NvmeCtrl *n = NVME(pci_dev);
     NvmeIdCtrl *id = &n->id_ctrl;
 
     int i;
     int64_t bs_size;
     uint8_t *pci_conf;
+    uint8_t *pci_wmask;
 
     if (!n->conf.blk) {
         error_setg(errp, "drive property not set");
@@ -1226,14 +1247,43 @@  static void nvme_realize(PCIDevice *pci_dev, Error **errp)
     }
 
     pci_conf = pci_dev->config;
+    pci_wmask = pci_dev->wmask;
     pci_conf[PCI_INTERRUPT_PIN] = 1;
     pci_config_set_prog_interface(pci_dev->config, 0x2);
     pci_config_set_class(pci_dev->config, PCI_CLASS_STORAGE_EXPRESS);
-    pcie_endpoint_cap_init(&n->parent_obj, 0x80);
+
+    // Configure the PMC capability
+    (void)pci_add_capability(pci_dev, PCI_CAP_ID_PM, nvme_pm_offset, PCI_PM_SIZEOF, errp);
+    if (NULL != *errp) {
+        return;
+    }
+
+    //  - PCI Power Management v1.2, No PME support, No Soft Reset, Make state writeable
+    pci_set_word(pci_conf + nvme_pm_offset + PCI_PM_PMC, PCI_PM_CAP_VER_1_2);
+    pci_set_word(pci_conf + nvme_pm_offset + PCI_PM_CTRL, PCI_PM_CTRL_NO_SOFT_RESET);
+    pci_set_word(pci_wmask + nvme_pm_offset + PCI_PM_CTRL, PCI_PM_CTRL_STATE_MASK);
+
+    // Disable QEMU default QEMU_PCIE_LNKSTA_DLLLA to disabled active flag in the Link Status Register of PCIE
+    pci_dev->cap_present &= ~(QEMU_PCIE_LNKSTA_DLLLA);
+
+    // PCIE Capability
+    pcie_endpoint_cap_init(&n->parent_obj, nvme_pcie_offset);
+
+    // PCIE Function Level Reset (FLRC) as required by 1.2 spec
+    pcie_cap_flr_init(&n->parent_obj);
+
+    // PCIE Configured with L0s by default by QEMU, configure missing AOC flag required by compliance
+    pci_long_test_and_set_mask(pci_conf + pci_dev->exp.exp_cap + PCI_EXP_LNKCAP, PCI_EXP_LNKCAP_AOC);
+
+    // Compliance requires Completion Timeout Disable Supported (CTDS).
+    pci_long_test_and_set_mask(pci_conf + pci_dev->exp.exp_cap + PCI_EXP_DEVCAP2, PCI_EXP_DEVCAP2_CTDS);
+
+    // Make the End-End TLP Prefix readonly as NVME spec doesnt acknowledge this field
+    pci_word_test_and_clear_mask(pci_wmask + pci_dev->exp.exp_cap + PCI_EXP_DEVCTL2, PCI_EXP_DEVCTL2_EETLPPB);
 
     n->num_namespaces = 1;
     n->num_queues = 64;
-    n->reg_size = pow2ceil(0x1004 + 2 * (n->num_queues + 1) * 4);
+    n->reg_size = pow2ceil(0x2000 + 2 * (n->num_queues + 1) * 4);
     n->ns_size = bs_size / (uint64_t)n->num_namespaces;
 
     n->namespaces = g_new0(NvmeNamespace, n->num_namespaces);
@@ -1245,6 +1295,10 @@  static void nvme_realize(PCIDevice *pci_dev, Error **errp)
     pci_register_bar(&n->parent_obj, 0,
         PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
         &n->iomem);
+
+    // Expose the NVME memory through Address Space IO (Optional by spec)
+    pci_register_bar(&n->parent_obj, 2, PCI_BASE_ADDRESS_SPACE_IO, &n->iomem);
+
     msix_init_exclusive_bar(&n->parent_obj, n->num_queues, 4, NULL);
 
     id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID));
@@ -1355,6 +1409,7 @@  static void nvme_class_init(ObjectClass *oc, void *data)
     PCIDeviceClass *pc = PCI_DEVICE_CLASS(oc);
 
     pc->realize = nvme_realize;
+    pc->config_read = nvme_pci_read_config;
     pc->exit = nvme_exit;
     pc->class_id = PCI_CLASS_STORAGE_EXPRESS;
     pc->vendor_id = PCI_VENDOR_ID_INTEL;
diff --git a/include/hw/pci/pci_regs.h b/include/hw/pci/pci_regs.h
index 7a83142578..54dc918f54 100644
--- a/include/hw/pci/pci_regs.h
+++ b/include/hw/pci/pci_regs.h
@@ -1,3 +1,4 @@ 
 #include "standard-headers/linux/pci_regs.h"
 
 #define  PCI_PM_CAP_VER_1_1     0x0002  /* PCI PM spec ver. 1.1 */
+#define  PCI_PM_CAP_VER_1_2     0x0003  /* PCI PM spec ver. 1.2 */