diff mbox series

[V2] memory: RAM_NAMED_FILE flag

Message ID 1675796613-235716-1-git-send-email-steven.sistare@oracle.com
State New
Headers show
Series [V2] memory: RAM_NAMED_FILE flag | expand

Commit Message

Steven Sistare Feb. 7, 2023, 7:03 p.m. UTC
migrate_ignore_shared() is an optimization that avoids copying memory
that is visible and can be mapped on the target.  However, a
memory-backend-ram or a memory-backend-memfd block with the RAM_SHARED
flag set is not migrated when migrate_ignore_shared() is true.  This is
wrong, because the block has no named backing store, and its contents will
be lost.  To fix, ignore shared memory iff it is a named file.  Define a
new flag RAM_NAMED_FILE to distinguish this case.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 backends/hostmem-file.c   | 1 +
 include/exec/cpu-common.h | 1 +
 include/exec/memory.h     | 3 +++
 migration/ram.c           | 3 ++-
 softmmu/physmem.c         | 7 ++++++-
 5 files changed, 13 insertions(+), 2 deletions(-)

Comments

Peter Xu Feb. 7, 2023, 8:23 p.m. UTC | #1
On Tue, Feb 07, 2023 at 11:03:33AM -0800, Steve Sistare wrote:
> migrate_ignore_shared() is an optimization that avoids copying memory
> that is visible and can be mapped on the target.  However, a
> memory-backend-ram or a memory-backend-memfd block with the RAM_SHARED
> flag set is not migrated when migrate_ignore_shared() is true.  This is
> wrong, because the block has no named backing store, and its contents will
> be lost.  To fix, ignore shared memory iff it is a named file.  Define a
> new flag RAM_NAMED_FILE to distinguish this case.

There's also TYPE_MEMORY_BACKEND_EPC.  Reading the commit message it seems
it can still be used in similar ways.  Pasting commit message from c6c0232:

    Because of its unique requirements, Linux manages EPC separately from
    normal memory.  Similar to memfd, the device /dev/sgx_vepc can be
    opened to obtain a file descriptor which can in turn be used to mmap()
    EPC memory.

I'm not sure whether it means that should apply for RAM_NAMED_FILE too,
neither do I think it's super important..  Still better to define it
properly.

Another comment is, AFAIK this patch will modify senamtics of the old
capability "x-ignore-shared".  But I'd say in a sensible way.  Maybe worth
directly modify qapi/migration.json to reflect it (especially it's x-
prefixed) to something like:

# @x-ignore-shared: If enabled, QEMU will not migrate named shared memory
#                   (since 4.0) 

Thanks,
Steven Sistare Feb. 8, 2023, 6:34 p.m. UTC | #2
On 2/7/2023 3:23 PM, Peter Xu wrote:
> On Tue, Feb 07, 2023 at 11:03:33AM -0800, Steve Sistare wrote:
>> migrate_ignore_shared() is an optimization that avoids copying memory
>> that is visible and can be mapped on the target.  However, a
>> memory-backend-ram or a memory-backend-memfd block with the RAM_SHARED
>> flag set is not migrated when migrate_ignore_shared() is true.  This is
>> wrong, because the block has no named backing store, and its contents will
>> be lost.  To fix, ignore shared memory iff it is a named file.  Define a
>> new flag RAM_NAMED_FILE to distinguish this case.
> 
> There's also TYPE_MEMORY_BACKEND_EPC.  Reading the commit message it seems
> it can still be used in similar ways.  Pasting commit message from c6c0232:
> 
>     Because of its unique requirements, Linux manages EPC separately from
>     normal memory.  Similar to memfd, the device /dev/sgx_vepc can be
>     opened to obtain a file descriptor which can in turn be used to mmap()
>     EPC memory.
> 
> I'm not sure whether it means that should apply for RAM_NAMED_FILE too,
> neither do I think it's super important..  Still better to define it
> properly.

The RAM_NAMED_FILE flag will be false for TYPE_MEMORY_BACKEND_EPC, so 
ramblock_is_ignored will return false, and the contents will be copied
over the socket to the target, and the segment will be recreated there.

However, perhaps I do not understand your point.

> Another comment is, AFAIK this patch will modify senamtics of the old
> capability "x-ignore-shared".  But I'd say in a sensible way.  Maybe worth
> directly modify qapi/migration.json to reflect it (especially it's x-
> prefixed) to something like:
> 
> # @x-ignore-shared: If enabled, QEMU will not migrate named shared memory
> #                   (since 4.0) 

Good idea.  I propose:

# @x-ignore-shared: If enabled, QEMU will not migrate shared memory that is
#                   accessible on the target. (since 4.0)

- Steve
Peter Xu Feb. 8, 2023, 7:22 p.m. UTC | #3
On Wed, Feb 08, 2023 at 01:34:18PM -0500, Steven Sistare wrote:
> On 2/7/2023 3:23 PM, Peter Xu wrote:
> > On Tue, Feb 07, 2023 at 11:03:33AM -0800, Steve Sistare wrote:
> >> migrate_ignore_shared() is an optimization that avoids copying memory
> >> that is visible and can be mapped on the target.  However, a
> >> memory-backend-ram or a memory-backend-memfd block with the RAM_SHARED
> >> flag set is not migrated when migrate_ignore_shared() is true.  This is
> >> wrong, because the block has no named backing store, and its contents will
> >> be lost.  To fix, ignore shared memory iff it is a named file.  Define a
> >> new flag RAM_NAMED_FILE to distinguish this case.
> > 
> > There's also TYPE_MEMORY_BACKEND_EPC.  Reading the commit message it seems
> > it can still be used in similar ways.  Pasting commit message from c6c0232:
> > 
> >     Because of its unique requirements, Linux manages EPC separately from
> >     normal memory.  Similar to memfd, the device /dev/sgx_vepc can be
> >     opened to obtain a file descriptor which can in turn be used to mmap()
> >     EPC memory.
> > 
> > I'm not sure whether it means that should apply for RAM_NAMED_FILE too,
> > neither do I think it's super important..  Still better to define it
> > properly.
> 
> The RAM_NAMED_FILE flag will be false for TYPE_MEMORY_BACKEND_EPC, so 
> ramblock_is_ignored will return false, and the contents will be copied
> over the socket to the target, and the segment will be recreated there.
> 
> However, perhaps I do not understand your point.

My point was it looked like it should apply RAM_NAMED_FILE too, because
it's also a named file.

But..  I don't think another QEMU can share the same data if opening the
same file.  Based on my zero knowledge on it... I quickly looked up
sgx_vepc_open() in the Linux impl where sgx_vepc.page_array plays a role of
page cache iiuc, while it's private per vma even if VM_SHARED.

So please ignore my comment..

> 
> > Another comment is, AFAIK this patch will modify senamtics of the old
> > capability "x-ignore-shared".  But I'd say in a sensible way.  Maybe worth
> > directly modify qapi/migration.json to reflect it (especially it's x-
> > prefixed) to something like:
> > 
> > # @x-ignore-shared: If enabled, QEMU will not migrate named shared memory
> > #                   (since 4.0) 
> 
> Good idea.  I propose:
> 
> # @x-ignore-shared: If enabled, QEMU will not migrate shared memory that is
> #                   accessible on the target. (since 4.0)

I would use s/target/destination host/ because target can be misread as the
VM itself (at least a general term in migration code base).  Other than
that looks good to me.

Thanks,
Steven Sistare June 7, 2023, 2:05 p.m. UTC | #4
On 2/8/2023 2:22 PM, Peter Xu wrote:
> On Wed, Feb 08, 2023 at 01:34:18PM -0500, Steven Sistare wrote:
>> On 2/7/2023 3:23 PM, Peter Xu wrote:
>>> On Tue, Feb 07, 2023 at 11:03:33AM -0800, Steve Sistare wrote:
>>>> migrate_ignore_shared() is an optimization that avoids copying memory
>>>> that is visible and can be mapped on the target.  However, a
>>>> memory-backend-ram or a memory-backend-memfd block with the RAM_SHARED
>>>> flag set is not migrated when migrate_ignore_shared() is true.  This is
>>>> wrong, because the block has no named backing store, and its contents will
>>>> be lost.  To fix, ignore shared memory iff it is a named file.  Define a
>>>> new flag RAM_NAMED_FILE to distinguish this case.
>>>
>>> There's also TYPE_MEMORY_BACKEND_EPC.  Reading the commit message it seems
>>> it can still be used in similar ways.  Pasting commit message from c6c0232:
>>>
>>>     Because of its unique requirements, Linux manages EPC separately from
>>>     normal memory.  Similar to memfd, the device /dev/sgx_vepc can be
>>>     opened to obtain a file descriptor which can in turn be used to mmap()
>>>     EPC memory.
>>>
>>> I'm not sure whether it means that should apply for RAM_NAMED_FILE too,
>>> neither do I think it's super important..  Still better to define it
>>> properly.
>>
>> The RAM_NAMED_FILE flag will be false for TYPE_MEMORY_BACKEND_EPC, so 
>> ramblock_is_ignored will return false, and the contents will be copied
>> over the socket to the target, and the segment will be recreated there.
>>
>> However, perhaps I do not understand your point.
> 
> My point was it looked like it should apply RAM_NAMED_FILE too, because
> it's also a named file.
> 
> But..  I don't think another QEMU can share the same data if opening the
> same file.  Based on my zero knowledge on it... I quickly looked up
> sgx_vepc_open() in the Linux impl where sgx_vepc.page_array plays a role of
> page cache iiuc, while it's private per vma even if VM_SHARED.
> 
> So please ignore my comment..
> 
>>
>>> Another comment is, AFAIK this patch will modify senamtics of the old
>>> capability "x-ignore-shared".  But I'd say in a sensible way.  Maybe worth
>>> directly modify qapi/migration.json to reflect it (especially it's x-
>>> prefixed) to something like:
>>>
>>> # @x-ignore-shared: If enabled, QEMU will not migrate named shared memory
>>> #                   (since 4.0) 
>>
>> Good idea.  I propose:
>>
>> # @x-ignore-shared: If enabled, QEMU will not migrate shared memory that is
>> #                   accessible on the target. (since 4.0)
> 
> I would use s/target/destination host/ because target can be misread as the
> VM itself (at least a general term in migration code base).  Other than
> that looks good to me.

Hi Peter, I will rebase to the tip, modify the wording, and repost.  
Can I add your RB?

- Steve
Peter Xu June 7, 2023, 2:48 p.m. UTC | #5
On Wed, Jun 07, 2023 at 10:05:11AM -0400, Steven Sistare wrote:
> On 2/8/2023 2:22 PM, Peter Xu wrote:
> > On Wed, Feb 08, 2023 at 01:34:18PM -0500, Steven Sistare wrote:
> >> On 2/7/2023 3:23 PM, Peter Xu wrote:
> >>> On Tue, Feb 07, 2023 at 11:03:33AM -0800, Steve Sistare wrote:
> >>>> migrate_ignore_shared() is an optimization that avoids copying memory
> >>>> that is visible and can be mapped on the target.  However, a
> >>>> memory-backend-ram or a memory-backend-memfd block with the RAM_SHARED
> >>>> flag set is not migrated when migrate_ignore_shared() is true.  This is
> >>>> wrong, because the block has no named backing store, and its contents will
> >>>> be lost.  To fix, ignore shared memory iff it is a named file.  Define a
> >>>> new flag RAM_NAMED_FILE to distinguish this case.
> >>>
> >>> There's also TYPE_MEMORY_BACKEND_EPC.  Reading the commit message it seems
> >>> it can still be used in similar ways.  Pasting commit message from c6c0232:
> >>>
> >>>     Because of its unique requirements, Linux manages EPC separately from
> >>>     normal memory.  Similar to memfd, the device /dev/sgx_vepc can be
> >>>     opened to obtain a file descriptor which can in turn be used to mmap()
> >>>     EPC memory.
> >>>
> >>> I'm not sure whether it means that should apply for RAM_NAMED_FILE too,
> >>> neither do I think it's super important..  Still better to define it
> >>> properly.
> >>
> >> The RAM_NAMED_FILE flag will be false for TYPE_MEMORY_BACKEND_EPC, so 
> >> ramblock_is_ignored will return false, and the contents will be copied
> >> over the socket to the target, and the segment will be recreated there.
> >>
> >> However, perhaps I do not understand your point.
> > 
> > My point was it looked like it should apply RAM_NAMED_FILE too, because
> > it's also a named file.
> > 
> > But..  I don't think another QEMU can share the same data if opening the
> > same file.  Based on my zero knowledge on it... I quickly looked up
> > sgx_vepc_open() in the Linux impl where sgx_vepc.page_array plays a role of
> > page cache iiuc, while it's private per vma even if VM_SHARED.
> > 
> > So please ignore my comment..
> > 
> >>
> >>> Another comment is, AFAIK this patch will modify senamtics of the old
> >>> capability "x-ignore-shared".  But I'd say in a sensible way.  Maybe worth
> >>> directly modify qapi/migration.json to reflect it (especially it's x-
> >>> prefixed) to something like:
> >>>
> >>> # @x-ignore-shared: If enabled, QEMU will not migrate named shared memory
> >>> #                   (since 4.0) 
> >>
> >> Good idea.  I propose:
> >>
> >> # @x-ignore-shared: If enabled, QEMU will not migrate shared memory that is
> >> #                   accessible on the target. (since 4.0)
> > 
> > I would use s/target/destination host/ because target can be misread as the
> > VM itself (at least a general term in migration code base).  Other than
> > that looks good to me.
> 
> Hi Peter, I will rebase to the tip, modify the wording, and repost.  
> Can I add your RB?

Yes. thanks,
diff mbox series

Patch

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 2514128..486d77d 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -56,6 +56,7 @@  file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
     ram_flags = backend->share ? RAM_SHARED : 0;
     ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
     ram_flags |= fb->is_pmem ? RAM_PMEM : 0;
+    ram_flags |= RAM_NAMED_FILE;
     memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), name,
                                      backend->size, fb->align, ram_flags,
                                      fb->mem_path, fb->readonly, errp);
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 6feaa40..8173e3b 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -92,6 +92,7 @@  void qemu_ram_set_uf_zeroable(RAMBlock *rb);
 bool qemu_ram_is_migratable(RAMBlock *rb);
 void qemu_ram_set_migratable(RAMBlock *rb);
 void qemu_ram_unset_migratable(RAMBlock *rb);
+bool qemu_ram_is_named_file(RAMBlock *rb);
 int qemu_ram_get_fd(RAMBlock *rb);
 
 size_t qemu_ram_pagesize(RAMBlock *block);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 2e602a2..3224e09 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -232,6 +232,9 @@  typedef struct IOMMUTLBEvent {
 /* RAM that isn't accessible through normal means. */
 #define RAM_PROTECTED (1 << 8)
 
+/* RAM is an mmap-ed named file */
+#define RAM_NAMED_FILE (1 << 9)
+
 static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
                                        IOMMUNotifierFlag flags,
                                        hwaddr start, hwaddr end,
diff --git a/migration/ram.c b/migration/ram.c
index 334309f..03f8131 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -190,7 +190,8 @@  static bool postcopy_preempt_active(void)
 bool ramblock_is_ignored(RAMBlock *block)
 {
     return !qemu_ram_is_migratable(block) ||
-           (migrate_ignore_shared() && qemu_ram_is_shared(block));
+           (migrate_ignore_shared() && qemu_ram_is_shared(block) &&
+            qemu_ram_is_named_file(block));
 }
 
 #undef RAMBLOCK_FOREACH
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index cb998cd..767a5f0 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -1749,6 +1749,11 @@  void qemu_ram_unset_migratable(RAMBlock *rb)
     rb->flags &= ~RAM_MIGRATABLE;
 }
 
+bool qemu_ram_is_named_file(RAMBlock *rb)
+{
+    return rb->flags & RAM_NAMED_FILE;
+}
+
 int qemu_ram_get_fd(RAMBlock *rb)
 {
     return rb->fd;
@@ -2059,7 +2064,7 @@  RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
 
     /* Just support these ram flags by now. */
     assert((ram_flags & ~(RAM_SHARED | RAM_PMEM | RAM_NORESERVE |
-                          RAM_PROTECTED)) == 0);
+                          RAM_PROTECTED | RAM_NAMED_FILE)) == 0);
 
     if (xen_enabled()) {
         error_setg(errp, "-mem-path not supported with Xen");