diff mbox series

[for-9.1,v2,09/11] hostmem: add a new memory backend based on POSIX shm_open()

Message ID 20240326133936.125332-10-sgarzare@redhat.com
State New
Headers show
Series vhost-user: support any POSIX system (tested on macOS, FreeBSD, OpenBSD) | expand

Commit Message

Stefano Garzarella March 26, 2024, 1:39 p.m. UTC
shm_open() creates and opens a new POSIX shared memory object.
A POSIX shared memory object allows creating memory backend with an
associated file descriptor that can be shared with external processes
(e.g. vhost-user).

The new `memory-backend-shm` can be used as an alternative when
`memory-backend-memfd` is not available (Linux only), since shm_open()
should be provided by any POSIX-compliant operating system.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 docs/system/devices/vhost-user.rst |   5 +-
 qapi/qom.json                      |  17 +++++
 backends/hostmem-shm.c             | 118 +++++++++++++++++++++++++++++
 backends/meson.build               |   1 +
 qemu-options.hx                    |  10 +++
 5 files changed, 149 insertions(+), 2 deletions(-)
 create mode 100644 backends/hostmem-shm.c

Comments

David Hildenbrand March 26, 2024, 2:45 p.m. UTC | #1
> +    mode = 0;
> +    oflag = O_RDWR | O_CREAT | O_EXCL;
> +    backend_name = host_memory_backend_get_name(backend);
> +
> +    /*
> +     * Some operating systems allow creating anonymous POSIX shared memory
> +     * objects (e.g. FreeBSD provides the SHM_ANON constant), but this is not
> +     * defined by POSIX, so let's create a unique name.
> +     *
> +     * From Linux's shm_open(3) man-page:
> +     *   For  portable  use,  a shared  memory  object should be identified
> +     *   by a name of the form /somename;"
> +     */
> +    g_string_printf(shm_name, "/qemu-" FMT_pid "-shm-%s", getpid(),
> +                    backend_name);

Hm, shouldn't we just let the user specify a name, and if no name was 
specified, generate one ourselves?

I'm also not quite sure if "host_memory_backend_get_name()" should be 
used for the purpose here.
Stefano Garzarella March 27, 2024, 10:23 a.m. UTC | #2
On Tue, Mar 26, 2024 at 03:45:52PM +0100, David Hildenbrand wrote:
>>+    mode = 0;
>>+    oflag = O_RDWR | O_CREAT | O_EXCL;
>>+    backend_name = host_memory_backend_get_name(backend);
>>+
>>+    /*
>>+     * Some operating systems allow creating anonymous POSIX shared memory
>>+     * objects (e.g. FreeBSD provides the SHM_ANON constant), but this is not
>>+     * defined by POSIX, so let's create a unique name.
>>+     *
>>+     * From Linux's shm_open(3) man-page:
>>+     *   For  portable  use,  a shared  memory  object should be identified
>>+     *   by a name of the form /somename;"
>>+     */
>>+    g_string_printf(shm_name, "/qemu-" FMT_pid "-shm-%s", getpid(),
>>+                    backend_name);
>
>Hm, shouldn't we just let the user specify a name, and if no name was 
>specified, generate one ourselves?

I thought about it and initially did it that way, but then some problems 
came up so I tried to keep it as simple as possible for the user and for 
our use case (having an fd associated with memory and sharing it with 
other processes).

The problems I had were:

- what mode_t to use if the object does not exist and needs to be 
   created?

- exclude O_EXCL if the user passes the name since they may have already 
   created it?

- call shm_unlink() only at object deallocation?

So I thought that for now we only support the "anonymous" mode, and if 
in the future we have a use case where the user wants to specify the 
name, we can add it later.

That said, if you think it's already useful from the beginning, I can 
add the name as an optional parameter.

>
>I'm also not quite sure if "host_memory_backend_get_name()" should be 
>used for the purpose here.

What problem do you see? As an alternative I thought of a static 
counter.

Thanks,
Stefano
David Hildenbrand March 27, 2024, 11:51 a.m. UTC | #3
On 27.03.24 11:23, Stefano Garzarella wrote:
> On Tue, Mar 26, 2024 at 03:45:52PM +0100, David Hildenbrand wrote:
>>> +    mode = 0;
>>> +    oflag = O_RDWR | O_CREAT | O_EXCL;
>>> +    backend_name = host_memory_backend_get_name(backend);
>>> +
>>> +    /*
>>> +     * Some operating systems allow creating anonymous POSIX shared memory
>>> +     * objects (e.g. FreeBSD provides the SHM_ANON constant), but this is not
>>> +     * defined by POSIX, so let's create a unique name.
>>> +     *
>>> +     * From Linux's shm_open(3) man-page:
>>> +     *   For  portable  use,  a shared  memory  object should be identified
>>> +     *   by a name of the form /somename;"
>>> +     */
>>> +    g_string_printf(shm_name, "/qemu-" FMT_pid "-shm-%s", getpid(),
>>> +                    backend_name);
>>
>> Hm, shouldn't we just let the user specify a name, and if no name was
>> specified, generate one ourselves?
> 
> I thought about it and initially did it that way, but then some problems
> came up so I tried to keep it as simple as possible for the user and for
> our use case (having an fd associated with memory and sharing it with
> other processes).
> 
> The problems I had were:
> 
> - what mode_t to use if the object does not exist and needs to be
>     created? >
> - exclude O_EXCL if the user passes the name since they may have already
>     created it?

I'd handle both like we handle files. But I understand now that you 
really just want to "simulate" memfd.

> 
> - call shm_unlink() only at object deallocation?

Right, we don't really do that for ordinary files, they keep existing. 
We only have the "discard-data" property to free up memory.

For memfd, it just happens "automatically". They cannot be looked up by 
name, and once the last user closes the fd, it gets destroyed.

> 
> So I thought that for now we only support the "anonymous" mode, and if
> in the future we have a use case where the user wants to specify the
> name, we can add it later.

Okay, so for now you really only want an anonymous fd that behaves like 
a memfd, got it.

Likely we should somehow fail if the "POSIX shared memory object" 
already exists. Hmm.

> 
> That said, if you think it's already useful from the beginning, I can
> add the name as an optional parameter.
> 
>>
>> I'm also not quite sure if "host_memory_backend_get_name()" should be
>> used for the purpose here.
> 
> What problem do you see? As an alternative I thought of a static
> counter.

If it's really just an "internal UUID", as we'll remove the file using 
shm_unlink(), then the name in fact shouldn't matter and this would be 
fine. Correct?


So I assume if we ever want to have non-anonymous fds here, we'd pass in 
a new property "name", and change the logic how we open/unlink. Makes sense.
Stefano Garzarella March 27, 2024, 3:40 p.m. UTC | #4
On Wed, Mar 27, 2024 at 12:51:51PM +0100, David Hildenbrand wrote:
>On 27.03.24 11:23, Stefano Garzarella wrote:
>>On Tue, Mar 26, 2024 at 03:45:52PM +0100, David Hildenbrand wrote:
>>>>+    mode = 0;
>>>>+    oflag = O_RDWR | O_CREAT | O_EXCL;
>>>>+    backend_name = host_memory_backend_get_name(backend);
>>>>+
>>>>+    /*
>>>>+     * Some operating systems allow creating anonymous POSIX shared memory
>>>>+     * objects (e.g. FreeBSD provides the SHM_ANON constant), but this is not
>>>>+     * defined by POSIX, so let's create a unique name.
>>>>+     *
>>>>+     * From Linux's shm_open(3) man-page:
>>>>+     *   For  portable  use,  a shared  memory  object should be identified
>>>>+     *   by a name of the form /somename;"
>>>>+     */
>>>>+    g_string_printf(shm_name, "/qemu-" FMT_pid "-shm-%s", getpid(),
>>>>+                    backend_name);
>>>
>>>Hm, shouldn't we just let the user specify a name, and if no name was
>>>specified, generate one ourselves?
>>
>>I thought about it and initially did it that way, but then some problems
>>came up so I tried to keep it as simple as possible for the user and for
>>our use case (having an fd associated with memory and sharing it with
>>other processes).
>>
>>The problems I had were:
>>
>>- what mode_t to use if the object does not exist and needs to be
>>    created? >
>>- exclude O_EXCL if the user passes the name since they may have already
>>    created it?
>
>I'd handle both like we handle files. But I understand now that you 
>really just want to "simulate" memfd.

Right, maybe I should write that in the commit message and 
documentation.

>
>>
>>- call shm_unlink() only at object deallocation?
>
>Right, we don't really do that for ordinary files, they keep existing. 
>We only have the "discard-data" property to free up memory.
>
>For memfd, it just happens "automatically". They cannot be looked up 
>by name, and once the last user closes the fd, it gets destroyed.

Yep, I see.

>
>>
>>So I thought that for now we only support the "anonymous" mode, and if
>>in the future we have a use case where the user wants to specify the
>>name, we can add it later.
>
>Okay, so for now you really only want an anonymous fd that behaves like 
>a memfd, got it.
>
>Likely we should somehow fail if the "POSIX shared memory object" 
>already exists. Hmm.

`O_CREAT | O_EXCL` should provide just this behavior.
 From shm_open(3) manpage:

     O_EXCL  If O_CREAT was also specified, and a shared memory object
             with the given name already exists, return an error.  The
             check for the existence of the object, and its creation if
             it does not exist, are performed atomically.

>
>>
>>That said, if you think it's already useful from the beginning, I can
>>add the name as an optional parameter.
>>
>>>
>>>I'm also not quite sure if "host_memory_backend_get_name()" should be
>>>used for the purpose here.
>>
>>What problem do you see? As an alternative I thought of a static
>>counter.
>
>If it's really just an "internal UUID", as we'll remove the file using 
>shm_unlink(), then the name in fact shouldn't matter and this would be 
>fine. Correct?

Right. It's a name that will only "appear" in the system for a very 
short time since we call shm_unlink() right after shm_open(). So I just 
need the unique name to prevent several QEMUs launched at the same time 
from colliding with the name.

>
>So I assume if we ever want to have non-anonymous fds here, we'd pass 
>in a new property "name", and change the logic how we open/unlink.  
>Makes sense.

Exactly! I can clarify this in the commit description as well.

Thanks for the helpful comments!
If there is anything I need to change for v3, please tell me ;-)

Stefano
David Hildenbrand March 27, 2024, 3:42 p.m. UTC | #5
>>>
>>> So I thought that for now we only support the "anonymous" mode, and if
>>> in the future we have a use case where the user wants to specify the
>>> name, we can add it later.
>>
>> Okay, so for now you really only want an anonymous fd that behaves like
>> a memfd, got it.
>>
>> Likely we should somehow fail if the "POSIX shared memory object"
>> already exists. Hmm.
> 
> `O_CREAT | O_EXCL` should provide just this behavior.
>   From shm_open(3) manpage:
> 
>       O_EXCL  If O_CREAT was also specified, and a shared memory object
>               with the given name already exists, return an error.  The
>               check for the existence of the object, and its creation if
>               it does not exist, are performed atomically.
> 

Cool!

>>
>>>
>>> That said, if you think it's already useful from the beginning, I can
>>> add the name as an optional parameter.
>>>
>>>>
>>>> I'm also not quite sure if "host_memory_backend_get_name()" should be
>>>> used for the purpose here.
>>>
>>> What problem do you see? As an alternative I thought of a static
>>> counter.
>>
>> If it's really just an "internal UUID", as we'll remove the file using
>> shm_unlink(), then the name in fact shouldn't matter and this would be
>> fine. Correct?
> 
> Right. It's a name that will only "appear" in the system for a very
> short time since we call shm_unlink() right after shm_open(). So I just
> need the unique name to prevent several QEMUs launched at the same time
> from colliding with the name.

Okay, essentially emulating tmpfile(), but for weird shmem_open() :)

> 
>>
>> So I assume if we ever want to have non-anonymous fds here, we'd pass
>> in a new property "name", and change the logic how we open/unlink.
>> Makes sense.
> 
> Exactly! I can clarify this in the commit description as well.
> 
> Thanks for the helpful comments!
> If there is anything I need to change for v3, please tell me ;-)

Besides some patch description extension, makes sense to me :)
diff mbox series

Patch

diff --git a/docs/system/devices/vhost-user.rst b/docs/system/devices/vhost-user.rst
index 9b2da106ce..35259d8ec7 100644
--- a/docs/system/devices/vhost-user.rst
+++ b/docs/system/devices/vhost-user.rst
@@ -98,8 +98,9 @@  Shared memory object
 
 In order for the daemon to access the VirtIO queues to process the
 requests it needs access to the guest's address space. This is
-achieved via the ``memory-backend-file`` or ``memory-backend-memfd``
-objects. A reference to a file-descriptor which can access this object
+achieved via the ``memory-backend-file``, ``memory-backend-memfd``, or
+``memory-backend-shm`` objects.
+A reference to a file-descriptor which can access this object
 will be passed via the socket as part of the protocol negotiation.
 
 Currently the shared memory object needs to match the size of the main
diff --git a/qapi/qom.json b/qapi/qom.json
index baae3a183f..88136b861d 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -720,6 +720,19 @@ 
             '*hugetlbsize': 'size',
             '*seal': 'bool' } }
 
+##
+# @MemoryBackendShmProperties:
+#
+# Properties for memory-backend-shm objects.
+#
+# The @share boolean option is true by default with shm.
+#
+# Since: 9.1
+##
+{ 'struct': 'MemoryBackendShmProperties',
+  'base': 'MemoryBackendProperties',
+  'data': { } }
+
 ##
 # @MemoryBackendEpcProperties:
 #
@@ -976,6 +989,8 @@ 
     { 'name': 'memory-backend-memfd',
       'if': 'CONFIG_LINUX' },
     'memory-backend-ram',
+    { 'name': 'memory-backend-shm',
+      'if': 'CONFIG_POSIX' },
     'pef-guest',
     { 'name': 'pr-manager-helper',
       'if': 'CONFIG_LINUX' },
@@ -1047,6 +1062,8 @@ 
       'memory-backend-memfd':       { 'type': 'MemoryBackendMemfdProperties',
                                       'if': 'CONFIG_LINUX' },
       'memory-backend-ram':         'MemoryBackendProperties',
+      'memory-backend-shm':         { 'type': 'MemoryBackendShmProperties',
+                                      'if': 'CONFIG_POSIX' },
       'pr-manager-helper':          { 'type': 'PrManagerHelperProperties',
                                       'if': 'CONFIG_LINUX' },
       'qtest':                      'QtestProperties',
diff --git a/backends/hostmem-shm.c b/backends/hostmem-shm.c
new file mode 100644
index 0000000000..7595204d29
--- /dev/null
+++ b/backends/hostmem-shm.c
@@ -0,0 +1,118 @@ 
+/*
+ * QEMU host POSIX shared memory object backend
+ *
+ * Copyright (C) 2024 Red Hat Inc
+ *
+ * Authors:
+ *   Stefano Garzarella <sgarzare@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/hostmem.h"
+#include "qapi/error.h"
+
+#define TYPE_MEMORY_BACKEND_SHM "memory-backend-shm"
+
+OBJECT_DECLARE_SIMPLE_TYPE(HostMemoryBackendShm, MEMORY_BACKEND_SHM)
+
+struct HostMemoryBackendShm {
+    HostMemoryBackend parent_obj;
+};
+
+static bool
+shm_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
+{
+    g_autoptr(GString) shm_name = g_string_new(NULL);
+    g_autofree char *backend_name = NULL;
+    uint32_t ram_flags;
+    int fd, oflag;
+    mode_t mode;
+
+    if (!backend->size) {
+        error_setg(errp, "can't create backend with size 0");
+        return false;
+    }
+
+    /*
+     * Let's use `mode = 0` because we don't want other processes to open our
+     * memory unless we share the file descriptor with them.
+     */
+    mode = 0;
+    oflag = O_RDWR | O_CREAT | O_EXCL;
+    backend_name = host_memory_backend_get_name(backend);
+
+    /*
+     * Some operating systems allow creating anonymous POSIX shared memory
+     * objects (e.g. FreeBSD provides the SHM_ANON constant), but this is not
+     * defined by POSIX, so let's create a unique name.
+     *
+     * From Linux's shm_open(3) man-page:
+     *   For  portable  use,  a shared  memory  object should be identified
+     *   by a name of the form /somename;"
+     */
+    g_string_printf(shm_name, "/qemu-" FMT_pid "-shm-%s", getpid(),
+                    backend_name);
+
+    fd = shm_open(shm_name->str, oflag, mode);
+    if (fd < 0) {
+        error_setg_errno(errp, errno,
+                         "failed to create POSIX shared memory");
+        return false;
+    }
+
+    /*
+     * We have the file descriptor, so we no longer need to expose the
+     * POSIX shared memory object. However it will remain allocated as long as
+     * there are file descriptors pointing to it.
+     */
+    shm_unlink(shm_name->str);
+
+    if (ftruncate(fd, backend->size) == -1) {
+        error_setg_errno(errp, errno,
+                         "failed to resize POSIX shared memory to %" PRIu64,
+                         backend->size);
+        close(fd);
+        return false;
+    }
+
+    ram_flags = backend->share ? RAM_SHARED : 0;
+    ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
+
+    return memory_region_init_ram_from_fd(&backend->mr, OBJECT(backend),
+                                              backend_name, backend->size,
+                                              ram_flags, fd, 0, errp);
+}
+
+static void
+shm_backend_instance_init(Object *obj)
+{
+    HostMemoryBackendShm *m = MEMORY_BACKEND_SHM(obj);
+
+    MEMORY_BACKEND(m)->share = true;
+}
+
+static void
+shm_backend_class_init(ObjectClass *oc, void *data)
+{
+    HostMemoryBackendClass *bc = MEMORY_BACKEND_CLASS(oc);
+
+    bc->alloc = shm_backend_memory_alloc;
+}
+
+static const TypeInfo shm_backend_info = {
+    .name = TYPE_MEMORY_BACKEND_SHM,
+    .parent = TYPE_MEMORY_BACKEND,
+    .instance_init = shm_backend_instance_init,
+    .class_init = shm_backend_class_init,
+    .instance_size = sizeof(HostMemoryBackendShm),
+};
+
+static void register_types(void)
+{
+    type_register_static(&shm_backend_info);
+}
+
+type_init(register_types);
diff --git a/backends/meson.build b/backends/meson.build
index 8b2b111497..3867b0d363 100644
--- a/backends/meson.build
+++ b/backends/meson.build
@@ -13,6 +13,7 @@  system_ss.add([files(
 if host_os != 'windows'
   system_ss.add(files('rng-random.c'))
   system_ss.add(files('hostmem-file.c'))
+  system_ss.add([files('hostmem-shm.c'), rt])
 endif
 if host_os == 'linux'
   system_ss.add(files('hostmem-memfd.c'))
diff --git a/qemu-options.hx b/qemu-options.hx
index 7fd1713fa8..70fa98ba19 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -5230,6 +5230,16 @@  SRST
 
         The ``share`` boolean option is on by default with memfd.
 
+    ``-object memory-backend-shm,id=id,merge=on|off,dump=on|off,share=on|off,prealloc=on|off,size=size,host-nodes=host-nodes,policy=default|preferred|bind|interleave``
+        Creates a POSIX shared memory backend object, which allows
+        QEMU to share the memory with an external process (e.g. when
+        using vhost-user).
+
+        Please refer to ``memory-backend-file`` for a description of the
+        options.
+
+        The ``share`` boolean option is on by default with shm.
+
     ``-object iommufd,id=id[,fd=fd]``
         Creates an iommufd backend which allows control of DMA mapping
         through the ``/dev/iommu`` device.