diff mbox series

[v2,4/8] mem/nvdimm: ensure write persistence to PMEM in label emulation

Message ID 20180207073331.14158-5-haozhong.zhang@intel.com
State New
Headers show
Series nvdimm: guarantee persistence of QEMU writes to persistent memory | expand

Commit Message

Haozhong Zhang Feb. 7, 2018, 7:33 a.m. UTC
Guest writes to vNVDIMM labels are intercepted and performed on the
backend by QEMU. When the backend is a real persistent memort, QEMU
needs to take proper operations to ensure its write persistence on the
persistent memory. Otherwise, a host power failure may result in the
loss of guest label configurations.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 hw/mem/nvdimm.c     |  9 ++++++++-
 include/qemu/pmem.h | 31 +++++++++++++++++++++++++++++++
 2 files changed, 39 insertions(+), 1 deletion(-)
 create mode 100644 include/qemu/pmem.h

Comments

Stefan Hajnoczi Feb. 9, 2018, 2:27 p.m. UTC | #1
On Wed, Feb 07, 2018 at 03:33:27PM +0800, Haozhong Zhang wrote:
> @@ -156,11 +157,17 @@ static void nvdimm_write_label_data(NVDIMMDevice *nvdimm, const void *buf,
>  {
>      MemoryRegion *mr;
>      PCDIMMDevice *dimm = PC_DIMM(nvdimm);
> +    bool is_pmem = object_property_get_bool(OBJECT(dimm->hostmem),
> +                                            "pmem", NULL);
>      uint64_t backend_offset;
>  
>      nvdimm_validate_rw_label_data(nvdimm, size, offset);
>  
> -    memcpy(nvdimm->label_data + offset, buf, size);
> +    if (!is_pmem) {
> +        memcpy(nvdimm->label_data + offset, buf, size);
> +    } else {
> +        pmem_memcpy_persist(nvdimm->label_data + offset, buf, size);
> +    }

Is this enough to prevent label corruption in case of power failure?

pmem_memcpy_persist() is not atomic.  Power failure can result in a mix
of the old and new label data.

If we want this operation to be 100% safe there needs to be some kind of
update protocol that makes the change atomic, like a Label A and Label B
area with a single Label Index field that can be updated atomically to
point to the active Label A/B area.

Stefan
Haozhong Zhang Feb. 9, 2018, 2:57 p.m. UTC | #2
On 02/09/18 14:27 +0000, Stefan Hajnoczi wrote:
> On Wed, Feb 07, 2018 at 03:33:27PM +0800, Haozhong Zhang wrote:
> > @@ -156,11 +157,17 @@ static void nvdimm_write_label_data(NVDIMMDevice *nvdimm, const void *buf,
> >  {
> >      MemoryRegion *mr;
> >      PCDIMMDevice *dimm = PC_DIMM(nvdimm);
> > +    bool is_pmem = object_property_get_bool(OBJECT(dimm->hostmem),
> > +                                            "pmem", NULL);
> >      uint64_t backend_offset;
> >  
> >      nvdimm_validate_rw_label_data(nvdimm, size, offset);
> >  
> > -    memcpy(nvdimm->label_data + offset, buf, size);
> > +    if (!is_pmem) {
> > +        memcpy(nvdimm->label_data + offset, buf, size);
> > +    } else {
> > +        pmem_memcpy_persist(nvdimm->label_data + offset, buf, size);
> > +    }
> 
> Is this enough to prevent label corruption in case of power failure?
> 
> pmem_memcpy_persist() is not atomic.  Power failure can result in a mix
> of the old and new label data.
> 
> If we want this operation to be 100% safe there needs to be some kind of
> update protocol that makes the change atomic, like a Label A and Label B
> area with a single Label Index field that can be updated atomically to
> point to the active Label A/B area.

All this patch series is to guarantee: if the guest is still alive and
running, all its previous writes to pmem, which were performed by
QEMU, will be still persistent on pmem.

If a power failure happens before QEMU returns to the guest, e.g., in
the middle of above pmem_memcpy_persist(), yes, the guest label data
may be in an inconsistent state, but the guest also has no chance to
progress.  And, that is what could happen in the non-virtualization
environment as well, and it's the responsibility of the (guest) SW to
defend such failures, e.g., by the protocol you mentioned.

Haozhong
Stefan Hajnoczi Feb. 12, 2018, 1:55 p.m. UTC | #3
On Fri, Feb 09, 2018 at 10:57:26PM +0800, Haozhong Zhang wrote:
> On 02/09/18 14:27 +0000, Stefan Hajnoczi wrote:
> > On Wed, Feb 07, 2018 at 03:33:27PM +0800, Haozhong Zhang wrote:
> > > @@ -156,11 +157,17 @@ static void nvdimm_write_label_data(NVDIMMDevice *nvdimm, const void *buf,
> > >  {
> > >      MemoryRegion *mr;
> > >      PCDIMMDevice *dimm = PC_DIMM(nvdimm);
> > > +    bool is_pmem = object_property_get_bool(OBJECT(dimm->hostmem),
> > > +                                            "pmem", NULL);
> > >      uint64_t backend_offset;
> > >  
> > >      nvdimm_validate_rw_label_data(nvdimm, size, offset);
> > >  
> > > -    memcpy(nvdimm->label_data + offset, buf, size);
> > > +    if (!is_pmem) {
> > > +        memcpy(nvdimm->label_data + offset, buf, size);
> > > +    } else {
> > > +        pmem_memcpy_persist(nvdimm->label_data + offset, buf, size);
> > > +    }
> > 
> > Is this enough to prevent label corruption in case of power failure?
> > 
> > pmem_memcpy_persist() is not atomic.  Power failure can result in a mix
> > of the old and new label data.
> > 
> > If we want this operation to be 100% safe there needs to be some kind of
> > update protocol that makes the change atomic, like a Label A and Label B
> > area with a single Label Index field that can be updated atomically to
> > point to the active Label A/B area.
> 
> All this patch series is to guarantee: if the guest is still alive and
> running, all its previous writes to pmem, which were performed by
> QEMU, will be still persistent on pmem.
> 
> If a power failure happens before QEMU returns to the guest, e.g., in
> the middle of above pmem_memcpy_persist(), yes, the guest label data
> may be in an inconsistent state, but the guest also has no chance to
> progress.  And, that is what could happen in the non-virtualization
> environment as well, and it's the responsibility of the (guest) SW to
> defend such failures, e.g., by the protocol you mentioned.

Thanks for explaining!  I thought the atomic update is done by the ACPI
_LSW DSM method but it turns out the driver must do it.

For anyone else who is interested, a detailed description of the Label
Storage Area is on page 652 of the UEFI Specification version 2.7 and
the Linux driver implements the atomic update algorithm in
drivers/nvdimm/label.c:__pmem_label_update().

Stefan
diff mbox series

Patch

diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
index 61e677f92f..18861d1a7a 100644
--- a/hw/mem/nvdimm.c
+++ b/hw/mem/nvdimm.c
@@ -23,6 +23,7 @@ 
  */
 
 #include "qemu/osdep.h"
+#include "qemu/pmem.h"
 #include "qapi/error.h"
 #include "qapi/visitor.h"
 #include "qapi-visit.h"
@@ -156,11 +157,17 @@  static void nvdimm_write_label_data(NVDIMMDevice *nvdimm, const void *buf,
 {
     MemoryRegion *mr;
     PCDIMMDevice *dimm = PC_DIMM(nvdimm);
+    bool is_pmem = object_property_get_bool(OBJECT(dimm->hostmem),
+                                            "pmem", NULL);
     uint64_t backend_offset;
 
     nvdimm_validate_rw_label_data(nvdimm, size, offset);
 
-    memcpy(nvdimm->label_data + offset, buf, size);
+    if (!is_pmem) {
+        memcpy(nvdimm->label_data + offset, buf, size);
+    } else {
+        pmem_memcpy_persist(nvdimm->label_data + offset, buf, size);
+    }
 
     mr = host_memory_backend_get_memory(dimm->hostmem, &error_abort);
     backend_offset = memory_region_size(mr) - nvdimm->label_size + offset;
diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h
new file mode 100644
index 0000000000..9017596ff0
--- /dev/null
+++ b/include/qemu/pmem.h
@@ -0,0 +1,31 @@ 
+/*
+ * Stub functions for libpmem.
+ *
+ * Copyright (c) 2018 Intel Corporation.
+ *
+ * Author: Haozhong Zhang <haozhong.zhang@intel.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_PMEM_H
+#define QEMU_PMEM_H
+
+#ifdef CONFIG_LIBPMEM
+#include <libpmem.h>
+#else  /* !CONFIG_LIBPMEM */
+
+#include <string.h>
+
+/* Stubs */
+
+static inline void *
+pmem_memcpy_persist(void *pmemdest, const void *src, size_t len)
+{
+    return memcpy(pmemdest, src, len);
+}
+
+#endif /* CONFIG_LIBPMEM */
+
+#endif /* !QEMU_PMEM_H */