diff mbox series

[v2,2/4] Memory: Enable writeback for given memory region

Message ID 20191105234100.22052-3-beata.michalska@linaro.org
State New
Headers show
Series target/arm: Support for Data Cache Clean up to PoP | expand

Commit Message

Beata Michalska Nov. 5, 2019, 11:40 p.m. UTC
Add an option to trigger memory writeback to sync given memory region
with the corresponding backing store, case one is available.
This extends the support for persistent memory, allowing syncing on-demand.

Signed-off-by: Beata Michalska <beata.michalska@linaro.org>
---
 exec.c                  | 43 +++++++++++++++++++++++++++++++++++++++++++
 include/exec/memory.h   |  6 ++++++
 include/exec/ram_addr.h |  8 ++++++++
 include/qemu/cutils.h   |  1 +
 memory.c                | 12 ++++++++++++
 util/cutils.c           | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 117 insertions(+)

Comments

Richard Henderson Nov. 6, 2019, 12:19 p.m. UTC | #1
On 11/6/19 12:40 AM, Beata Michalska wrote:
> +void qemu_ram_writeback(RAMBlock *block, ram_addr_t start, ram_addr_t length)
> +{
> +    void *addr = ramblock_ptr(block, start);
> +
> +    /*
> +     * The requested range might spread up to the very end of the block
> +     */
> +    if ((start + length) > block->used_length) {
> +        qemu_log("%s: sync range outside the block boundaries: "
> +                     "start: " RAM_ADDR_FMT " length: " RAM_ADDR_FMT
> +                     " block length: " RAM_ADDR_FMT " Narrowing down ..." ,
> +                     __func__, start, length, block->used_length);
> +        length = block->used_length - start;
> +    }

qemu_log_mask w/ GUEST_ERROR?  How do we expect the length to overflow?

> +#ifdef CONFIG_LIBPMEM
> +    /* The lack of support for pmem should not block the sync */
> +    if (ramblock_is_pmem(block)) {
> +        pmem_persist(addr, length);
> +    } else
> +#endif

Perhaps better to return out of that if block than have the dangling else.

> +/**
> + * Sync changes made to the memory mapped file back to the backing
> + * storage. For POSIX compliant systems this will simply fallback
> + * to regular msync call (thus the required alignment). Otherwise
> + * it will trigger whole file sync (including the metadata case
> + * there is no support to skip that otherwise)
> + *
> + * @addr   - start of the memory area to be synced
> + * @length - length of the are to be synced
> + * @align  - alignment (expected to be PAGE_SIZE)
> + * @fd     - file descriptor for the file to be synced
> + *           (mandatory only for POSIX non-compliant systems)
> + */
> +int qemu_msync(void *addr, size_t length, size_t align, int fd)
> +{
> +#ifdef CONFIG_POSIX
> +    size_t align_mask;
> +
> +    /* Bare minimum of sanity checks on the alignment */
> +    /* The start address needs to be a multiple of PAGE_SIZE */
> +    align = MAX(align, qemu_real_host_page_size);
> +    align_mask = ~(qemu_real_host_page_size - 1);
> +    align = (align + ~align_mask) & align_mask;
> +
> +    align_mask = ~(align - 1);

I don't understand what you're trying to do with align.

You pass in qemu_host_page_size from the one caller, and then adjust it for
qemu_real_host_page_size?

Why pass in anything at all, and just use qemu_real_host_page_mask?

> +    /**
> +     * There are no strict reqs as per the length of mapping
> +     * to be synced. Still the length needs to follow the address
> +     * alignment changes. Additionally - round the size to the multiple
> +     * of requested alignment (expected as PAGE_SIZE)
> +     */
> +    length += ((uintptr_t)addr & (align - 1));
> +    length = (length + ~align_mask) & align_mask;
> +
> +    addr = (void *)((uintptr_t)addr & align_mask);
> +
> +    return msync(addr, length, MS_SYNC);
> +#else /* CONFIG_POSIX */
> +    /**
> +     * Perform the sync based on the file descriptor
> +     * The sync range will most probably be wider than the one
> +     * requested - but it will still get the job done
> +     */
> +    return qemu_fdatasync(fd);
> +#endif /* CONFIG_POSIX */
> +}


r~
Beata Michalska Nov. 7, 2019, 2:41 p.m. UTC | #2
On Wed, 6 Nov 2019 at 12:20, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 11/6/19 12:40 AM, Beata Michalska wrote:
> > +void qemu_ram_writeback(RAMBlock *block, ram_addr_t start, ram_addr_t length)
> > +{
> > +    void *addr = ramblock_ptr(block, start);
> > +
> > +    /*
> > +     * The requested range might spread up to the very end of the block
> > +     */
> > +    if ((start + length) > block->used_length) {
> > +        qemu_log("%s: sync range outside the block boundaries: "
> > +                     "start: " RAM_ADDR_FMT " length: " RAM_ADDR_FMT
> > +                     " block length: " RAM_ADDR_FMT " Narrowing down ..." ,
> > +                     __func__, start, length, block->used_length);
> > +        length = block->used_length - start;
> > +    }
>
> qemu_log_mask w/ GUEST_ERROR?  How do we expect the length to overflow?

In theory it shouldn't, at least with current usage.
I guess the probe_access will make sure of that.
This was more of a precaution to enable catching potential/future misuses
aka debugging purpose. I can get rid of that it that's playing too safe.

>
> > +#ifdef CONFIG_LIBPMEM
> > +    /* The lack of support for pmem should not block the sync */
> > +    if (ramblock_is_pmem(block)) {
> > +        pmem_persist(addr, length);
> > +    } else
> > +#endif
>
> Perhaps better to return out of that if block than have the dangling else.

Good idea
>
> > +/**
> > + * Sync changes made to the memory mapped file back to the backing
> > + * storage. For POSIX compliant systems this will simply fallback
> > + * to regular msync call (thus the required alignment). Otherwise
> > + * it will trigger whole file sync (including the metadata case
> > + * there is no support to skip that otherwise)
> > + *
> > + * @addr   - start of the memory area to be synced
> > + * @length - length of the are to be synced
> > + * @align  - alignment (expected to be PAGE_SIZE)
> > + * @fd     - file descriptor for the file to be synced
> > + *           (mandatory only for POSIX non-compliant systems)
> > + */
> > +int qemu_msync(void *addr, size_t length, size_t align, int fd)
> > +{
> > +#ifdef CONFIG_POSIX
> > +    size_t align_mask;
> > +
> > +    /* Bare minimum of sanity checks on the alignment */
> > +    /* The start address needs to be a multiple of PAGE_SIZE */
> > +    align = MAX(align, qemu_real_host_page_size);
> > +    align_mask = ~(qemu_real_host_page_size - 1);
> > +    align = (align + ~align_mask) & align_mask;
> > +
> > +    align_mask = ~(align - 1);
>
> I don't understand what you're trying to do with align.
>
> You pass in qemu_host_page_size from the one caller, and then adjust it for
> qemu_real_host_page_size?
>
> Why pass in anything at all, and just use qemu_real_host_page_mask?

The qemu_msync was supposed to be generic and not tied to current use case
without any assumptions on the alignment and whether that would  be an actual
host page size. So that was just to make sure it will be a multiple of that.
I can get rid of that with assumption all will be using the same alignment.

BR
Beata
>
> > +    /**
> > +     * There are no strict reqs as per the length of mapping
> > +     * to be synced. Still the length needs to follow the address
> > +     * alignment changes. Additionally - round the size to the multiple
> > +     * of requested alignment (expected as PAGE_SIZE)
> > +     */
> > +    length += ((uintptr_t)addr & (align - 1));
> > +    length = (length + ~align_mask) & align_mask;
> > +
> > +    addr = (void *)((uintptr_t)addr & align_mask);
> > +
> > +    return msync(addr, length, MS_SYNC);
> > +#else /* CONFIG_POSIX */
> > +    /**
> > +     * Perform the sync based on the file descriptor
> > +     * The sync range will most probably be wider than the one
> > +     * requested - but it will still get the job done
> > +     */
> > +    return qemu_fdatasync(fd);
> > +#endif /* CONFIG_POSIX */
> > +}
>
>
> r~
>
Alex Bennée Nov. 7, 2019, 4:57 p.m. UTC | #3
Beata Michalska <beata.michalska@linaro.org> writes:

> On Wed, 6 Nov 2019 at 12:20, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> On 11/6/19 12:40 AM, Beata Michalska wrote:
>> > +void qemu_ram_writeback(RAMBlock *block, ram_addr_t start, ram_addr_t length)
>> > +{
>> > +    void *addr = ramblock_ptr(block, start);
>> > +
>> > +    /*
>> > +     * The requested range might spread up to the very end of the block
>> > +     */
>> > +    if ((start + length) > block->used_length) {
>> > +        qemu_log("%s: sync range outside the block boundaries: "
>> > +                     "start: " RAM_ADDR_FMT " length: " RAM_ADDR_FMT
>> > +                     " block length: " RAM_ADDR_FMT " Narrowing down ..." ,
>> > +                     __func__, start, length, block->used_length);
>> > +        length = block->used_length - start;
>> > +    }
>>
>> qemu_log_mask w/ GUEST_ERROR?  How do we expect the length to overflow?
>
> In theory it shouldn't, at least with current usage.
> I guess the probe_access will make sure of that.
> This was more of a precaution to enable catching potential/future misuses
> aka debugging purpose. I can get rid of that it that's playing too
> safe.

If the internal code might get it wrong and that would be a bug then the
g_assert(), if the values are ultimately from the guest then log with
GUEST_ERROR as Richard suggests.

<snip>

--
Alex Bennée
Peter Maydell Nov. 7, 2019, 4:59 p.m. UTC | #4
On Thu, 7 Nov 2019 at 16:57, Alex Bennée <alex.bennee@linaro.org> wrote:
>
>
> Beata Michalska <beata.michalska@linaro.org> writes:
>
> > On Wed, 6 Nov 2019 at 12:20, Richard Henderson
> > <richard.henderson@linaro.org> wrote:
> >> qemu_log_mask w/ GUEST_ERROR?  How do we expect the length to overflow?
> >
> > In theory it shouldn't, at least with current usage.
> > I guess the probe_access will make sure of that.
> > This was more of a precaution to enable catching potential/future misuses
> > aka debugging purpose. I can get rid of that it that's playing too
> > safe.
>
> If the internal code might get it wrong and that would be a bug then the
> g_assert(), if the values are ultimately from the guest then log with
> GUEST_ERROR as Richard suggests.

...or consider asserting at this level and making the target
specific calling code sanitize and do the GUEST_ERROR logging
(it can do a better job of it because it knows what the
target-specific operation that the guest just got wrong was).

thanks
-- PMM
diff mbox series

Patch

diff --git a/exec.c b/exec.c
index ffdb518..e1f06de 100644
--- a/exec.c
+++ b/exec.c
@@ -65,6 +65,8 @@ 
 #include "exec/ram_addr.h"
 #include "exec/log.h"
 
+#include "qemu/pmem.h"
+
 #include "migration/vmstate.h"
 
 #include "qemu/range.h"
@@ -2156,6 +2158,47 @@  int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
     return 0;
 }
 
+/*
+ * Trigger sync on the given ram block for range [start, start + length]
+ * with the backing store if one is available.
+ * Otherwise no-op.
+ * @Note: this is supposed to be a synchronous op.
+ */
+void qemu_ram_writeback(RAMBlock *block, ram_addr_t start, ram_addr_t length)
+{
+    void *addr = ramblock_ptr(block, start);
+
+    /*
+     * The requested range might spread up to the very end of the block
+     */
+    if ((start + length) > block->used_length) {
+        qemu_log("%s: sync range outside the block boundaries: "
+                     "start: " RAM_ADDR_FMT " length: " RAM_ADDR_FMT
+                     " block length: " RAM_ADDR_FMT " Narrowing down ..." ,
+                     __func__, start, length, block->used_length);
+        length = block->used_length - start;
+    }
+
+#ifdef CONFIG_LIBPMEM
+    /* The lack of support for pmem should not block the sync */
+    if (ramblock_is_pmem(block)) {
+        pmem_persist(addr, length);
+    } else
+#endif
+    if (block->fd >= 0) {
+        /**
+         * Case there is no support for PMEM or the memory has not been
+         * specified as persistent (or is not one) - use the msync.
+         * Less optimal but still achieves the same goal
+         */
+        if (qemu_msync(addr, length, qemu_host_page_size, block->fd)) {
+            warn_report("%s: failed to sync memory range: start: "
+                    RAM_ADDR_FMT " length: " RAM_ADDR_FMT,
+                    __func__, start, length);
+        }
+    }
+}
+
 /* Called with ram_list.mutex held */
 static void dirty_memory_extend(ram_addr_t old_ram_size,
                                 ram_addr_t new_ram_size)
diff --git a/include/exec/memory.h b/include/exec/memory.h
index e499dc2..27a84e0 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1265,6 +1265,12 @@  void *memory_region_get_ram_ptr(MemoryRegion *mr);
  */
 void memory_region_ram_resize(MemoryRegion *mr, ram_addr_t newsize,
                               Error **errp);
+/**
+ * memory_region_do_writeback: Trigger writeback for selected address range
+ * [addr, addr + size]
+ *
+ */
+void memory_region_do_writeback(MemoryRegion *mr, hwaddr addr, hwaddr size);
 
 /**
  * memory_region_set_log: Turn dirty logging on or off for a region.
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index bed0554..5adebb0 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -174,6 +174,14 @@  void qemu_ram_free(RAMBlock *block);
 
 int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp);
 
+void qemu_ram_writeback(RAMBlock *block, ram_addr_t start, ram_addr_t length);
+
+/* Clear whole block of mem */
+static inline void qemu_ram_block_writeback(RAMBlock *block)
+{
+    qemu_ram_writeback(block, 0, block->used_length);
+}
+
 #define DIRTY_CLIENTS_ALL     ((1 << DIRTY_MEMORY_NUM) - 1)
 #define DIRTY_CLIENTS_NOCODE  (DIRTY_CLIENTS_ALL & ~(1 << DIRTY_MEMORY_CODE))
 
diff --git a/include/qemu/cutils.h b/include/qemu/cutils.h
index b54c847..41c5fa9 100644
--- a/include/qemu/cutils.h
+++ b/include/qemu/cutils.h
@@ -130,6 +130,7 @@  const char *qemu_strchrnul(const char *s, int c);
 #endif
 time_t mktimegm(struct tm *tm);
 int qemu_fdatasync(int fd);
+int qemu_msync(void *addr, size_t length, size_t alignment, int fd);
 int fcntl_setfl(int fd, int flag);
 int qemu_parse_fd(const char *param);
 int qemu_strtoi(const char *nptr, const char **endptr, int base,
diff --git a/memory.c b/memory.c
index c952eab..15734a0 100644
--- a/memory.c
+++ b/memory.c
@@ -2214,6 +2214,18 @@  void memory_region_ram_resize(MemoryRegion *mr, ram_addr_t newsize, Error **errp
     qemu_ram_resize(mr->ram_block, newsize, errp);
 }
 
+
+void memory_region_do_writeback(MemoryRegion *mr, hwaddr addr, hwaddr size)
+{
+    /*
+     * Might be extended case needed to cover
+     * different types of memory regions
+     */
+    if (mr->ram_block && mr->dirty_log_mask) {
+        qemu_ram_writeback(mr->ram_block, addr, size);
+    }
+}
+
 /*
  * Call proper memory listeners about the change on the newly
  * added/removed CoalescedMemoryRange.
diff --git a/util/cutils.c b/util/cutils.c
index fd591ca..7a50dbc 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -164,6 +164,53 @@  int qemu_fdatasync(int fd)
 #endif
 }
 
+/**
+ * Sync changes made to the memory mapped file back to the backing
+ * storage. For POSIX compliant systems this will simply fallback
+ * to regular msync call (thus the required alignment). Otherwise
+ * it will trigger whole file sync (including the metadata case
+ * there is no support to skip that otherwise)
+ *
+ * @addr   - start of the memory area to be synced
+ * @length - length of the are to be synced
+ * @align  - alignment (expected to be PAGE_SIZE)
+ * @fd     - file descriptor for the file to be synced
+ *           (mandatory only for POSIX non-compliant systems)
+ */
+int qemu_msync(void *addr, size_t length, size_t align, int fd)
+{
+#ifdef CONFIG_POSIX
+    size_t align_mask;
+
+    /* Bare minimum of sanity checks on the alignment */
+    /* The start address needs to be a multiple of PAGE_SIZE */
+    align = MAX(align, qemu_real_host_page_size);
+    align_mask = ~(qemu_real_host_page_size - 1);
+    align = (align + ~align_mask) & align_mask;
+
+    align_mask = ~(align - 1);
+    /**
+     * There are no strict reqs as per the length of mapping
+     * to be synced. Still the length needs to follow the address
+     * alignment changes. Additionally - round the size to the multiple
+     * of requested alignment (expected as PAGE_SIZE)
+     */
+    length += ((uintptr_t)addr & (align - 1));
+    length = (length + ~align_mask) & align_mask;
+
+    addr = (void *)((uintptr_t)addr & align_mask);
+
+    return msync(addr, length, MS_SYNC);
+#else /* CONFIG_POSIX */
+    /**
+     * Perform the sync based on the file descriptor
+     * The sync range will most probably be wider than the one
+     * requested - but it will still get the job done
+     */
+    return qemu_fdatasync(fd);
+#endif /* CONFIG_POSIX */
+}
+
 #ifndef _WIN32
 /* Sets a specific flag */
 int fcntl_setfl(int fd, int flag)