[v2,3/4] nvdimm: add a boolean option "restrict"

Message ID 20170606072229.9302-4-haozhong.zhang@intel.com
State New

Commit Message

Haozhong Zhang June 6, 2017, 7:22 a.m. UTC
If a vNVDIMM device is not backed by a DAX device and its "restrict"
option is enabled, bit 3 of state flags in its region mapping
structure will be set, in order to notify the guest of the lack of
write persistence guarantee. Once this bit is set, the guest OS may
mark the vNVDIMM device as read-only.

This option is disabled by default for backwards compatibility. Enabling
it is recommended for production use.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 hw/acpi/nvdimm.c        | 16 ++++++++++++++++
 hw/mem/nvdimm.c         | 38 +++++++++++++++++++++++++++++++++++++-
 include/hw/mem/nvdimm.h |  5 +++++
 3 files changed, 58 insertions(+), 1 deletion(-)

Comments

Stefan Hajnoczi June 7, 2017, 3:27 p.m. UTC | #1
On Tue, Jun 06, 2017 at 03:22:28PM +0800, Haozhong Zhang wrote:
> If a vNVDIMM device is not backed by a DAX device and its "restrict"
> option is enabled, bit 3 of state flags in its region mapping
> structure will be set, in order to notify the guest of the lack of
> write persistence guarantee. Once this bit is set, the guest OS may
> mark the vNVDIMM device as read-only.
> 
> This option is disabled by default for backwards compatibility. It's
> recommended to enable for the formal usage.

Good idea.  I think the following is cleaner:

DEFINE_PROP_ON_OFF_AUTO("readonly") on the 'nvdimm' device.  The
following states are available:

 * 'on' - ACPI_NFIT_MEM_NOT_ARMED is set
 * 'off' - ACPI_NFIT_MEM_NOT_ARMED is clear
 * 'auto' - ACPI_NFIT_MEM_NOT_ARMED set if backend is not persistent
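
A minimal sketch of how the three states above could map to the flag,
assuming a hypothetical ReadonlyMode enum and a hypothetical
nfit_state_flags() helper (neither exists in QEMU; only the
ACPI_NFIT_MEM_NOT_ARMED value is taken from the patch):

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 3 of the state flags, as defined in the patch. */
#define ACPI_NFIT_MEM_NOT_ARMED (1 << 3)

/* Hypothetical stand-in for an on/off/auto device property. */
typedef enum { READONLY_AUTO, READONLY_ON, READONLY_OFF } ReadonlyMode;

/* Return the state flags to advertise for one vNVDIMM region. */
uint16_t nfit_state_flags(ReadonlyMode mode, bool backend_persistent)
{
    switch (mode) {
    case READONLY_ON:
        return ACPI_NFIT_MEM_NOT_ARMED;   /* always set */
    case READONLY_OFF:
        return 0;                         /* always clear */
    case READONLY_AUTO:
    default:
        /* set only when the backend cannot guarantee persistence */
        return backend_persistent ? 0 : ACPI_NFIT_MEM_NOT_ARMED;
    }
}
```

With this shape, the v2 patch's behaviour corresponds to 'auto' when
"restrict" is enabled and to 'off' when it is disabled.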

This new property defaults to 'auto'.  Machine types older than
pc-i440fx-2.10 and pc-q35-2.10 default to 'on'.

Haozhong Zhang June 8, 2017, 1:45 a.m. UTC | #2
On 06/07/17 16:27 +0100, Stefan Hajnoczi wrote:
> On Tue, Jun 06, 2017 at 03:22:28PM +0800, Haozhong Zhang wrote:
> > If a vNVDIMM device is not backed by a DAX device and its "restrict"
> > option is enabled, bit 3 of state flags in its region mapping
> > structure will be set, in order to notify the guest of the lack of
> > write persistence guarantee. Once this bit is set, the guest OS may
> > mark the vNVDIMM device as read-only.
> > 
> > This option is disabled by default for backwards compatibility. It's
> > recommended to enable for the formal usage.
> 
> Good idea.  I think the following is cleaner:
> 
> DEFINE_PROP_ON_OFF_AUTO("readonly") on the 'nvdimm' device.  The
> following states are available:
> 
>  * 'on' - ACPI_NFIT_MEM_NOT_ARMED is set
>  * 'off' - ACPI_NFIT_MEM_NOT_ARMED is clear
>  * 'auto' - ACPI_NFIT_MEM_NOT_ARMED set if backend is not persistent
> 
> This new property defaults to 'auto'.  Machine types older than
> pc-i440fx-2.10 and pc-q35-2.10 default to 'on'.

Shouldn't it be 'off' on older machine types? The older machine types
and older QEMU never check the backend and never set ACPI_NFIT_MEM_NOT_ARMED.

Haozhong

Haozhong Zhang June 8, 2017, 6:39 a.m. UTC | #3
On 06/07/17 16:27 +0100, Stefan Hajnoczi wrote:
> On Tue, Jun 06, 2017 at 03:22:28PM +0800, Haozhong Zhang wrote:
> > If a vNVDIMM device is not backed by a DAX device and its "restrict"
> > option is enabled, bit 3 of state flags in its region mapping
> > structure will be set, in order to notify the guest of the lack of
> > write persistence guarantee. Once this bit is set, the guest OS may
> > mark the vNVDIMM device as read-only.
> > 
> > This option is disabled by default for backwards compatibility. It's
> > recommended to enable for the formal usage.
> 
> Good idea.  I think the following is cleaner:
> 
> DEFINE_PROP_ON_OFF_AUTO("readonly") on the 'nvdimm' device.  The
> following states are available:
> 
>  * 'on' - ACPI_NFIT_MEM_NOT_ARMED is set
>  * 'off' - ACPI_NFIT_MEM_NOT_ARMED is clear
>  * 'auto' - ACPI_NFIT_MEM_NOT_ARMED set if backend is not persistent
> 
> This new property defaults to 'auto'.  Machine types older than
> pc-i440fx-2.10 and pc-q35-2.10 default to 'on'.

I think the name "readonly" is not precise, because QEMU only sets
one bit and does not prevent guest writes. It's the guest's decision to
treat the vNVDIMM devices as read-only (e.g. Linux kernel).
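
To illustrate the point that the bit is only a hint, here is a rough
sketch of what a guest-side check might look like. The helper names
load_le16() and guest_treats_region_as_readonly() are hypothetical and
are not taken from any real guest kernel; the policy shown is just one
choice a guest could make.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 3 of the state flags, as defined in the patch. */
#define ACPI_NFIT_MEM_NOT_ARMED (1 << 3)

/* Decode a 16-bit little-endian field regardless of host byte order. */
uint16_t load_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Hypothetical guest policy: treat the region as read-only when the
 * firmware reports it as not armed. Nothing on the QEMU side enforces
 * this; a guest that ignores the bit can still write to the region.
 */
bool guest_treats_region_as_readonly(const uint8_t *state_flags_le)
{
    return (load_le16(state_flags_le) & ACPI_NFIT_MEM_NOT_ARMED) != 0;
}
```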

We may use "unsafe-write" instead.

Haozhong

Stefan Hajnoczi June 8, 2017, 10:18 a.m. UTC | #4
On Thu, Jun 08, 2017 at 02:39:48PM +0800, Haozhong Zhang wrote:
> On 06/07/17 16:27 +0100, Stefan Hajnoczi wrote:
> > On Tue, Jun 06, 2017 at 03:22:28PM +0800, Haozhong Zhang wrote:
> > > If a vNVDIMM device is not backed by a DAX device and its "restrict"
> > > option is enabled, bit 3 of state flags in its region mapping
> > > structure will be set, in order to notify the guest of the lack of
> > > write persistence guarantee. Once this bit is set, the guest OS may
> > > mark the vNVDIMM device as read-only.
> > > 
> > > This option is disabled by default for backwards compatibility. It's
> > > recommended to enable for the formal usage.
> > 
> > Good idea.  I think the following is cleaner:
> > 
> > DEFINE_PROP_ON_OFF_AUTO("readonly") on the 'nvdimm' device.  The
> > following states are available:
> > 
> >  * 'on' - ACPI_NFIT_MEM_NOT_ARMED is set
> >  * 'off' - ACPI_NFIT_MEM_NOT_ARMED is clear
> >  * 'auto' - ACPI_NFIT_MEM_NOT_ARMED set if backend is not persistent
> > 
> > This new property defaults to 'auto'.  Machine types older than
> > pc-i440fx-2.10 and pc-q35-2.10 default to 'on'.
> 
> I think the name "readonly" is not precise, because QEMU only sets
> one bit and does not prevent guest writes. It's the guest's decision to
> treat the vNVDIMM devices as read-only (e.g. Linux kernel).
> 
> We may use "unsafe-write" instead.

I agree, "readonly" isn't accurate.  I would use the NFIT terminology
and call it "armed".

Stefan Hajnoczi June 8, 2017, 10:19 a.m. UTC | #5
On Thu, Jun 08, 2017 at 09:45:29AM +0800, Haozhong Zhang wrote:
> On 06/07/17 16:27 +0100, Stefan Hajnoczi wrote:
> > On Tue, Jun 06, 2017 at 03:22:28PM +0800, Haozhong Zhang wrote:
> > > If a vNVDIMM device is not backed by a DAX device and its "restrict"
> > > option is enabled, bit 3 of state flags in its region mapping
> > > structure will be set, in order to notify the guest of the lack of
> > > write persistence guarantee. Once this bit is set, the guest OS may
> > > mark the vNVDIMM device as read-only.
> > > 
> > > This option is disabled by default for backwards compatibility. It's
> > > recommended to enable for the formal usage.
> > 
> > Good idea.  I think the following is cleaner:
> > 
> > DEFINE_PROP_ON_OFF_AUTO("readonly") on the 'nvdimm' device.  The
> > following states are available:
> > 
> >  * 'on' - ACPI_NFIT_MEM_NOT_ARMED is set
> >  * 'off' - ACPI_NFIT_MEM_NOT_ARMED is clear
> >  * 'auto' - ACPI_NFIT_MEM_NOT_ARMED set if backend is not persistent
> > 
> > This new property defaults to 'auto'.  Machine types older than
> > pc-i440fx-2.10 and pc-q35-2.10 default to 'on'.
> 
> Shouldn't it be 'off' on older machine types? The older machine types
> and older QEMU never check the backend and never set ACPI_NFIT_MEM_NOT_ARMED.

You are right.  'readonly' should be 'off' for older machine types.

Michael S. Tsirkin June 8, 2017, 12:56 p.m. UTC | #6
On Tue, Jun 06, 2017 at 03:22:28PM +0800, Haozhong Zhang wrote:
> If a vNVDIMM device is not backed by a DAX device and its "restrict"
> option is enabled, bit 3 of state flags in its region mapping
> structure will be set, in order to notify the guest of the lack of
> write persistence guarantee. Once this bit is set, the guest OS may
> mark the vNVDIMM device as read-only.
> 
> This option is disabled by default for backwards compatibility. It's
> recommended to enable for the formal usage.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>

Seems wrong to me. E.g. it won't work in a nested
virt setup. What if backend is dax but is not armed?
Can't the armed bit of the backing device be tested?
Name "restrict" is also confusing. Can we reuse cache=
options? E.g. cache=unsafe etc.

> ---
>  hw/acpi/nvdimm.c        | 16 ++++++++++++++++
>  hw/mem/nvdimm.c         | 38 +++++++++++++++++++++++++++++++++++++-
>  include/hw/mem/nvdimm.h |  5 +++++
>  3 files changed, 58 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
> index 8e7d6ec034..fd1ef6dc65 100644
> --- a/hw/acpi/nvdimm.c
> +++ b/hw/acpi/nvdimm.c
> @@ -138,6 +138,8 @@ struct NvdimmNfitMemDev {
>  } QEMU_PACKED;
>  typedef struct NvdimmNfitMemDev NvdimmNfitMemDev;
>  
> +#define ACPI_NFIT_MEM_NOT_ARMED    (1 << 3)
> +
>  /*
>   * NVDIMM Control Region Structure
>   *
> @@ -289,6 +291,10 @@ nvdimm_build_structure_memdev(GArray *structures, DeviceState *dev)
>      int slot = object_property_get_int(OBJECT(dev), PC_DIMM_SLOT_PROP,
>                                              NULL);
>      uint32_t handle = nvdimm_slot_to_handle(slot);
> +    bool dev_dax = object_property_get_bool(OBJECT(dev), NVDIMM_DEV_DAX_PROP,
> +                                            NULL);
> +    bool restrict_mode = object_property_get_bool(OBJECT(dev),
> +                                                  NVDIMM_RESTRICT_PROP, NULL);
>  
>      nfit_memdev = acpi_data_push(structures, sizeof(*nfit_memdev));
>  
> @@ -312,6 +318,16 @@ nvdimm_build_structure_memdev(GArray *structures, DeviceState *dev)
>  
>      /* Only one interleave for PMEM. */
>      nfit_memdev->interleave_ways = cpu_to_le16(1);
> +
> +    /*
> +     * If a vNVDIMM device in the restrict mode and is not backed by a
> +     * DAX device, QEMU will set ACPI_NFIT_MEM_NOT_ARMED bit of state
> +     * flags in its region mapping structure, in order to notify the
> +     * guest of the lack of write persistence guarantee.
> +     */
> +    if (!dev_dax && restrict_mode) {
> +        nfit_memdev->flags = cpu_to_le16(ACPI_NFIT_MEM_NOT_ARMED);
> +    }
>  }
>  
>  /*
> diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
> index b23542fbdf..cda416e5c8 100644
> --- a/hw/mem/nvdimm.c
> +++ b/hw/mem/nvdimm.c
> @@ -65,11 +65,46 @@ out:
>      error_propagate(errp, local_err);
>  }
>  
> +static bool nvdimm_get_backend_dev_dax(Object *obj, Error **errp)
> +{
> +    NVDIMMDevice *nvdimm = NVDIMM(obj);
> +
> +    return nvdimm->backend_dev_dax;
> +}
> +
> +static bool nvdimm_get_restrict(Object *obj, Error **errp)
> +{
> +    NVDIMMDevice *nvdimm = NVDIMM(obj);
> +
> +    return nvdimm->restrict_mode;
> +}
> +
> +static void nvdimm_set_restrict(Object *obj, bool val, Error **errp)
> +{
> +    DeviceState *dev = DEVICE(obj);
> +    NVDIMMDevice *nvdimm = NVDIMM(obj);
> +    Error *local_err = NULL;
> +
> +    if (dev->realized) {
> +        error_setg(&local_err, "cannot change property value");
> +        goto out;
> +    }
> +
> +    nvdimm->restrict_mode = val;
> +
> + out:
> +    error_propagate(errp, local_err);
> +}
> +
>  static void nvdimm_init(Object *obj)
>  {
>      object_property_add(obj, NVDIMM_LABEL_SIZE_PROP, "int",
>                          nvdimm_get_label_size, nvdimm_set_label_size, NULL,
>                          NULL, NULL);
> +    object_property_add_bool(obj, NVDIMM_DEV_DAX_PROP,
> +                             nvdimm_get_backend_dev_dax, NULL, NULL);
> +    object_property_add_bool(obj, NVDIMM_RESTRICT_PROP,
> +                             nvdimm_get_restrict, nvdimm_set_restrict, NULL);
>  }
>  
>  static MemoryRegion *nvdimm_get_memory_region(PCDIMMDevice *dimm)
> @@ -85,7 +120,8 @@ static void nvdimm_realize(PCDIMMDevice *dimm, Error **errp)
>      NVDIMMDevice *nvdimm = NVDIMM(dimm);
>      uint64_t align, pmem_size, size = memory_region_size(mr);
>  
> -    if (!qemu_fd_is_dev_dax(memory_region_get_fd(mr))) {
> +    nvdimm->backend_dev_dax = qemu_fd_is_dev_dax(memory_region_get_fd(mr));
> +    if (!nvdimm->backend_dev_dax) {
>          error_report("warning: nvdimm backend does not look like a DAX device, "
>                       "unable to guarantee persistence of guest writes");
>      }
> diff --git a/include/hw/mem/nvdimm.h b/include/hw/mem/nvdimm.h
> index f1f3987055..2fbe0d7858 100644
> --- a/include/hw/mem/nvdimm.h
> +++ b/include/hw/mem/nvdimm.h
> @@ -49,6 +49,8 @@
>                                                 TYPE_NVDIMM)
>  
>  #define NVDIMM_LABEL_SIZE_PROP "label-size"
> +#define NVDIMM_DEV_DAX_PROP    "dev-dax"
> +#define NVDIMM_RESTRICT_PROP   "restrict"
>  
>  struct NVDIMMDevice {
>      /* private */
> @@ -74,6 +76,9 @@ struct NVDIMMDevice {
>       * guest via ACPI NFIT and _FIT method if NVDIMM hotplug is supported.
>       */
>      MemoryRegion nvdimm_mr;
> +
> +    bool backend_dev_dax;
> +    bool restrict_mode;
>  };
>  typedef struct NVDIMMDevice NVDIMMDevice;
>  
> -- 
> 2.11.0

Stefan Hajnoczi June 9, 2017, 9:34 a.m. UTC | #7
On Thu, Jun 08, 2017 at 03:56:42PM +0300, Michael S. Tsirkin wrote:
> On Tue, Jun 06, 2017 at 03:22:28PM +0800, Haozhong Zhang wrote:
> > If a vNVDIMM device is not backed by a DAX device and its "restrict"
> > option is enabled, bit 3 of state flags in its region mapping
> > structure will be set, in order to notify the guest of the lack of
> > write persistence guarantee. Once this bit is set, the guest OS may
> > mark the vNVDIMM device as read-only.
> > 
> > This option is disabled by default for backwards compatibility. It's
> > recommended to enable for the formal usage.
> > 
> > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> 
> Seems wrong to me. E.g. it won't work in a nested
> virt setup. What if backend is dax but is not armed?
> Can't the armed bit of the backing device be tested?
> Name "restrict" is also confusing. Can we reuse cache=
> options? E.g. cache=unsafe etc.

The -drive cache= options (writeback, writethrough, none, directsync,
unsafe) are confusing and considered legacy options.  The new options
are -drive
cache.writeback=on|off,cache.direct=on|off,cache.no-flush=on|off.
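
For reference, the legacy modes can be written as a lookup table over
the three newer flags. The values below follow the mapping given in the
QEMU documentation as far as I know; the CacheMode struct and
lookup_cache_mode() are hypothetical names used only for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record: one legacy cache= mode and its three flags. */
typedef struct {
    const char *legacy_mode;
    bool writeback;   /* cache.writeback */
    bool direct;      /* cache.direct    */
    bool no_flush;    /* cache.no-flush  */
} CacheMode;

static const CacheMode cache_modes[] = {
    { "writeback",    true,  false, false },
    { "writethrough", false, false, false },
    { "none",         true,  true,  false },
    { "directsync",   false, true,  false },
    { "unsafe",       true,  false, true  },
};

/* Return the entry for a legacy mode name, or NULL if it is unknown. */
const CacheMode *lookup_cache_mode(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(cache_modes) / sizeof(cache_modes[0]); i++) {
        if (strcmp(cache_modes[i].legacy_mode, name) == 0) {
            return &cache_modes[i];
        }
    }
    return NULL;
}
```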

I suggested to call the option -device nvdimm,armed=auto|on|off in
another email.  "Armed" is the term used by the NVDIMM/NFIT
specification and it has an NVDIMM-specific meaning.
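
One thing to watch with this name is polarity: the NFIT bit is "not
armed", so the option and the flag are inverted relative to each other.
The sketch below assumes armed=on clears the bit, armed=off sets it, and
armed=auto follows the backend; the thread does not spell this out, and
ArmedMode, parse_armed() and armed_to_state_flags() are hypothetical
names.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Bit 3 of the state flags, as defined in the patch. */
#define ACPI_NFIT_MEM_NOT_ARMED (1 << 3)

/* Hypothetical representation of armed=auto|on|off. */
typedef enum { ARMED_AUTO, ARMED_ON, ARMED_OFF } ArmedMode;

/* Parse the option value; return false for anything unrecognised. */
bool parse_armed(const char *value, ArmedMode *out)
{
    if (strcmp(value, "auto") == 0) {
        *out = ARMED_AUTO;
    } else if (strcmp(value, "on") == 0) {
        *out = ARMED_ON;
    } else if (strcmp(value, "off") == 0) {
        *out = ARMED_OFF;
    } else {
        return false;
    }
    return true;
}

/* Assumed polarity: "armed" is the inverse of the NOT_ARMED flag. */
uint16_t armed_to_state_flags(ArmedMode mode, bool backend_persistent)
{
    bool armed = (mode == ARMED_ON) ||
                 (mode == ARMED_AUTO && backend_persistent);

    return armed ? 0 : ACPI_NFIT_MEM_NOT_ARMED;
}
```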

Stefan

Haozhong Zhang June 12, 2017, 1:18 a.m. UTC | #8
On 06/08/17 15:56 +0300, Michael S. Tsirkin wrote:
> On Tue, Jun 06, 2017 at 03:22:28PM +0800, Haozhong Zhang wrote:
> > If a vNVDIMM device is not backed by a DAX device and its "restrict"
> > option is enabled, bit 3 of state flags in its region mapping
> > structure will be set, in order to notify the guest of the lack of
> > write persistence guarantee. Once this bit is set, the guest OS may
> > mark the vNVDIMM device as read-only.
> > 
> > This option is disabled by default for backwards compatibility. It's
> > recommended to enable for the formal usage.
> > 
> > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> 
> Seems wrong to me. E.g. it won't work in a nested
> virt setup. What if backend is dax but is not armed?
> Can't the armed bit of the backing device be tested?

If the not-armed bit of a host NVDIMM region is set, the Linux NVDIMM driver
will make it read-only, and QEMU will fail when it tries to mmap it
with flags (PROT_READ | PROT_WRITE). Thus, we don't need to check
whether the host region is not armed.
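
The general rule behind this can be shown with an ordinary file. This is
only an analogy and does not reproduce the NVDIMM driver path: the
sketch opens a temporary file read-only and then asks for a shared
read/write mapping, which POSIX says fails with EACCES. The function
name and the temporary path are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Request a MAP_SHARED mapping with PROT_READ | PROT_WRITE on a file
 * descriptor that was opened O_RDONLY. Returns 0 if the mapping
 * succeeded, otherwise the errno of the step that failed.
 */
int shared_rw_map_of_readonly_fd(void)
{
    char path[] = "/tmp/nvdimm-sketch-XXXXXX";
    const size_t len = 4096;
    void *p;
    int fd, err;

    /* Create a small backing file, then drop the writable descriptor. */
    fd = mkstemp(path);
    if (fd < 0) {
        return errno;
    }
    err = ftruncate(fd, (off_t)len) < 0 ? errno : 0;
    close(fd);
    if (err) {
        unlink(path);
        return err;
    }

    /* Reopen read-only; the name is no longer needed after that. */
    fd = open(path, O_RDONLY);
    err = fd < 0 ? errno : 0;
    unlink(path);
    if (err) {
        return err;
    }

    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        err = errno;
    } else {
        err = 0;
        munmap(p, len);
    }
    close(fd);
    return err;
}
```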

> Name "restrict" is also confusing. Can we reuse cache=
> options? E.g. cache=unsafe etc.

I agree the name is confusing, and would like to use the name 'armed'
suggested by Stefan.

Thanks,
Haozhong

> 
> > ---
> >  hw/acpi/nvdimm.c        | 16 ++++++++++++++++
> >  hw/mem/nvdimm.c         | 38 +++++++++++++++++++++++++++++++++++++-
> >  include/hw/mem/nvdimm.h |  5 +++++
> >  3 files changed, 58 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
> > index 8e7d6ec034..fd1ef6dc65 100644
> > --- a/hw/acpi/nvdimm.c
> > +++ b/hw/acpi/nvdimm.c
> > @@ -138,6 +138,8 @@ struct NvdimmNfitMemDev {
> >  } QEMU_PACKED;
> >  typedef struct NvdimmNfitMemDev NvdimmNfitMemDev;
> >  
> > +#define ACPI_NFIT_MEM_NOT_ARMED    (1 << 3)
> > +
> >  /*
> >   * NVDIMM Control Region Structure
> >   *
> > @@ -289,6 +291,10 @@ nvdimm_build_structure_memdev(GArray *structures, DeviceState *dev)
> >      int slot = object_property_get_int(OBJECT(dev), PC_DIMM_SLOT_PROP,
> >                                              NULL);
> >      uint32_t handle = nvdimm_slot_to_handle(slot);
> > +    bool dev_dax = object_property_get_bool(OBJECT(dev), NVDIMM_DEV_DAX_PROP,
> > +                                            NULL);
> > +    bool restrict_mode = object_property_get_bool(OBJECT(dev),
> > +                                                  NVDIMM_RESTRICT_PROP, NULL);
> >  
> >      nfit_memdev = acpi_data_push(structures, sizeof(*nfit_memdev));
> >  
> > @@ -312,6 +318,16 @@ nvdimm_build_structure_memdev(GArray *structures, DeviceState *dev)
> >  
> >      /* Only one interleave for PMEM. */
> >      nfit_memdev->interleave_ways = cpu_to_le16(1);
> > +
> > +    /*
> > +     * If a vNVDIMM device in the restrict mode and is not backed by a
> > +     * DAX device, QEMU will set ACPI_NFIT_MEM_NOT_ARMED bit of state
> > +     * flags in its region mapping structure, in order to notify the
> > +     * guest of the lack of write persistence guarantee.
> > +     */
> > +    if (!dev_dax && restrict_mode) {
> > +        nfit_memdev->flags = cpu_to_le16(ACPI_NFIT_MEM_NOT_ARMED);
> > +    }
> >  }
> >  
> >  /*
> > diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
> > index b23542fbdf..cda416e5c8 100644
> > --- a/hw/mem/nvdimm.c
> > +++ b/hw/mem/nvdimm.c
> > @@ -65,11 +65,46 @@ out:
> >      error_propagate(errp, local_err);
> >  }
> >  
> > +static bool nvdimm_get_backend_dev_dax(Object *obj, Error **errp)
> > +{
> > +    NVDIMMDevice *nvdimm = NVDIMM(obj);
> > +
> > +    return nvdimm->backend_dev_dax;
> > +}
> > +
> > +static bool nvdimm_get_restrict(Object *obj, Error **errp)
> > +{
> > +    NVDIMMDevice *nvdimm = NVDIMM(obj);
> > +
> > +    return nvdimm->restrict_mode;
> > +}
> > +
> > +static void nvdimm_set_restrict(Object *obj, bool val, Error **errp)
> > +{
> > +    DeviceState *dev = DEVICE(obj);
> > +    NVDIMMDevice *nvdimm = NVDIMM(obj);
> > +    Error *local_err = NULL;
> > +
> > +    if (dev->realized) {
> > +        error_setg(&local_err, "cannot change property value");
> > +        goto out;
> > +    }
> > +
> > +    nvdimm->restrict_mode = val;
> > +
> > + out:
> > +    error_propagate(errp, local_err);
> > +}
> > +
> >  static void nvdimm_init(Object *obj)
> >  {
> >      object_property_add(obj, NVDIMM_LABEL_SIZE_PROP, "int",
> >                          nvdimm_get_label_size, nvdimm_set_label_size, NULL,
> >                          NULL, NULL);
> > +    object_property_add_bool(obj, NVDIMM_DEV_DAX_PROP,
> > +                             nvdimm_get_backend_dev_dax, NULL, NULL);
> > +    object_property_add_bool(obj, NVDIMM_RESTRICT_PROP,
> > +                             nvdimm_get_restrict, nvdimm_set_restrict, NULL);
> >  }
> >  
> >  static MemoryRegion *nvdimm_get_memory_region(PCDIMMDevice *dimm)
> > @@ -85,7 +120,8 @@ static void nvdimm_realize(PCDIMMDevice *dimm, Error **errp)
> >      NVDIMMDevice *nvdimm = NVDIMM(dimm);
> >      uint64_t align, pmem_size, size = memory_region_size(mr);
> >  
> > -    if (!qemu_fd_is_dev_dax(memory_region_get_fd(mr))) {
> > +    nvdimm->backend_dev_dax = qemu_fd_is_dev_dax(memory_region_get_fd(mr));
> > +    if (!nvdimm->backend_dev_dax) {
> >          error_report("warning: nvdimm backend does not look like a DAX device, "
> >                       "unable to guarantee persistence of guest writes");
> >      }
> > diff --git a/include/hw/mem/nvdimm.h b/include/hw/mem/nvdimm.h
> > index f1f3987055..2fbe0d7858 100644
> > --- a/include/hw/mem/nvdimm.h
> > +++ b/include/hw/mem/nvdimm.h
> > @@ -49,6 +49,8 @@
> >                                                 TYPE_NVDIMM)
> >  
> >  #define NVDIMM_LABEL_SIZE_PROP "label-size"
> > +#define NVDIMM_DEV_DAX_PROP    "dev-dax"
> > +#define NVDIMM_RESTRICT_PROP   "restrict"
> >  
> >  struct NVDIMMDevice {
> >      /* private */
> > @@ -74,6 +76,9 @@ struct NVDIMMDevice {
> >       * guest via ACPI NFIT and _FIT method if NVDIMM hotplug is supported.
> >       */
> >      MemoryRegion nvdimm_mr;
> > +
> > +    bool backend_dev_dax;
> > +    bool restrict_mode;
> >  };
> >  typedef struct NVDIMMDevice NVDIMMDevice;
> >  
> > -- 
> > 2.11.0

Patch

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 8e7d6ec034..fd1ef6dc65 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -138,6 +138,8 @@  struct NvdimmNfitMemDev {
 } QEMU_PACKED;
 typedef struct NvdimmNfitMemDev NvdimmNfitMemDev;
 
+#define ACPI_NFIT_MEM_NOT_ARMED    (1 << 3)
+
 /*
  * NVDIMM Control Region Structure
  *
@@ -289,6 +291,10 @@  nvdimm_build_structure_memdev(GArray *structures, DeviceState *dev)
     int slot = object_property_get_int(OBJECT(dev), PC_DIMM_SLOT_PROP,
                                             NULL);
     uint32_t handle = nvdimm_slot_to_handle(slot);
+    bool dev_dax = object_property_get_bool(OBJECT(dev), NVDIMM_DEV_DAX_PROP,
+                                            NULL);
+    bool restrict_mode = object_property_get_bool(OBJECT(dev),
+                                                  NVDIMM_RESTRICT_PROP, NULL);
 
     nfit_memdev = acpi_data_push(structures, sizeof(*nfit_memdev));
 
@@ -312,6 +318,16 @@  nvdimm_build_structure_memdev(GArray *structures, DeviceState *dev)
 
     /* Only one interleave for PMEM. */
     nfit_memdev->interleave_ways = cpu_to_le16(1);
+
+    /*
+     * If a vNVDIMM device is in restrict mode and is not backed by a
+     * DAX device, QEMU sets the ACPI_NFIT_MEM_NOT_ARMED bit of the state
+     * flags in its region mapping structure, in order to notify the
+     * guest of the lack of write persistence guarantee.
+     */
+    if (!dev_dax && restrict_mode) {
+        nfit_memdev->flags = cpu_to_le16(ACPI_NFIT_MEM_NOT_ARMED);
+    }
 }
 
 /*
diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
index b23542fbdf..cda416e5c8 100644
--- a/hw/mem/nvdimm.c
+++ b/hw/mem/nvdimm.c
@@ -65,11 +65,46 @@  out:
     error_propagate(errp, local_err);
 }
 
+static bool nvdimm_get_backend_dev_dax(Object *obj, Error **errp)
+{
+    NVDIMMDevice *nvdimm = NVDIMM(obj);
+
+    return nvdimm->backend_dev_dax;
+}
+
+static bool nvdimm_get_restrict(Object *obj, Error **errp)
+{
+    NVDIMMDevice *nvdimm = NVDIMM(obj);
+
+    return nvdimm->restrict_mode;
+}
+
+static void nvdimm_set_restrict(Object *obj, bool val, Error **errp)
+{
+    DeviceState *dev = DEVICE(obj);
+    NVDIMMDevice *nvdimm = NVDIMM(obj);
+    Error *local_err = NULL;
+
+    if (dev->realized) {
+        error_setg(&local_err, "cannot change property value");
+        goto out;
+    }
+
+    nvdimm->restrict_mode = val;
+
+ out:
+    error_propagate(errp, local_err);
+}
+
 static void nvdimm_init(Object *obj)
 {
     object_property_add(obj, NVDIMM_LABEL_SIZE_PROP, "int",
                         nvdimm_get_label_size, nvdimm_set_label_size, NULL,
                         NULL, NULL);
+    object_property_add_bool(obj, NVDIMM_DEV_DAX_PROP,
+                             nvdimm_get_backend_dev_dax, NULL, NULL);
+    object_property_add_bool(obj, NVDIMM_RESTRICT_PROP,
+                             nvdimm_get_restrict, nvdimm_set_restrict, NULL);
 }
 
 static MemoryRegion *nvdimm_get_memory_region(PCDIMMDevice *dimm)
@@ -85,7 +120,8 @@  static void nvdimm_realize(PCDIMMDevice *dimm, Error **errp)
     NVDIMMDevice *nvdimm = NVDIMM(dimm);
     uint64_t align, pmem_size, size = memory_region_size(mr);
 
-    if (!qemu_fd_is_dev_dax(memory_region_get_fd(mr))) {
+    nvdimm->backend_dev_dax = qemu_fd_is_dev_dax(memory_region_get_fd(mr));
+    if (!nvdimm->backend_dev_dax) {
         error_report("warning: nvdimm backend does not look like a DAX device, "
                      "unable to guarantee persistence of guest writes");
     }
diff --git a/include/hw/mem/nvdimm.h b/include/hw/mem/nvdimm.h
index f1f3987055..2fbe0d7858 100644
--- a/include/hw/mem/nvdimm.h
+++ b/include/hw/mem/nvdimm.h
@@ -49,6 +49,8 @@ 
                                                TYPE_NVDIMM)
 
 #define NVDIMM_LABEL_SIZE_PROP "label-size"
+#define NVDIMM_DEV_DAX_PROP    "dev-dax"
+#define NVDIMM_RESTRICT_PROP   "restrict"
 
 struct NVDIMMDevice {
     /* private */
@@ -74,6 +76,9 @@  struct NVDIMMDevice {
      * guest via ACPI NFIT and _FIT method if NVDIMM hotplug is supported.
      */
     MemoryRegion nvdimm_mr;
+
+    bool backend_dev_dax;
+    bool restrict_mode;
 };
 typedef struct NVDIMMDevice NVDIMMDevice;