From patchwork Tue May 28 10:38:23 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefano Garzarella X-Patchwork-Id: 1940403 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=h2KsgAqm; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=patchwork.ozlabs.org) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4VpTX25MFxz20Q3 for ; Tue, 28 May 2024 20:38:58 +1000 (AEST) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sBuE2-0001sH-9B; Tue, 28 May 2024 06:38:38 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sBuE0-0001jl-90 for qemu-devel@nongnu.org; Tue, 28 May 2024 06:38:36 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sBuDy-0000FF-2K for qemu-devel@nongnu.org; Tue, 28 May 2024 06:38:35 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1716892713; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=db29oXrpbJH0UickSPvavWdNOAkMBJOnZvy+b0VtUIk=; b=h2KsgAqmddPc6NzTSug6EjjV4HpOc+JrCgsN2Mk1LYKZ1rxSI/W3hmuh0FeH5gTn8xXZ2S 2e1Nd1Gr5W1yiU51rxadPayUKHOkhAWYyFM+4EatnYaInKue434UWTotsA06tT7ijzIekN gxiVyh4C+LCKq9ebaGmD/1dB0njH2pE= Received: from mail-wm1-f70.google.com (mail-wm1-f70.google.com [209.85.128.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-515-boqFkDbqNdWKtOH8yR6Y1A-1; Tue, 28 May 2024 06:38:31 -0400 X-MC-Unique: boqFkDbqNdWKtOH8yR6Y1A-1 Received: by mail-wm1-f70.google.com with SMTP id 5b1f17b1804b1-4206b3500f5so4669105e9.1 for ; Tue, 28 May 2024 03:38:31 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1716892708; x=1717497508; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=db29oXrpbJH0UickSPvavWdNOAkMBJOnZvy+b0VtUIk=; b=SKp8JI5GTmbozrV2MF10SpYdEz8bGKa0OW7Simv8IrOh1Qwu8o5Gnu842JFco7Rl8z kU3cAkFWYORdsMy+zGiTTUeuxyNNrpsXJBmE+EKvjKG708Dmcc6xTXYNWRH5+9eg+fkF 5Lr7j/og5g1GpDT1eSrLhGMg3U1f0XfTN1QnJlfy9dsncEx4chJpS62Qy1Xa09CHS/KR KPKRwISJ+ZTa+txzZP0dqWcsTqVP5/kavo68DOYI4/ymZMB1NUxScPkfzfx0diFIqTaQ JN29bgW+PYyC7AaBYZWuROfCKyDGGAEaQ0BNiDUo1YoYtlfguPd2btwQu/OXpCAajSpn 67oA== X-Gm-Message-State: AOJu0YxScamXpnX2049DgoiAjL2oG6em9h6PhYnGqzGo6ZPWPKjvBXpq Su+8nSYaDGQdmOteb27kAn6SUlkMtx7sdHGx4yp5/7Nvqb5qBkxQFH7zCL48U9t9YZXSG2X9iP4 Iv7XF6HL8YS1qMj2XovWw9QUFVhbUmkXRH9STjGosCxm3TFJ0HLAa6fp1Z3FlREJxEGcceXH4Ky t+uvgtDLi3dCmm9suhltUrYCYjOozHjy5opfMp X-Received: by 2002:a05:600c:19d0:b0:421:20ac:1244 with SMTP id 5b1f17b1804b1-42120ac12d2mr7432825e9.22.1716892708480; Tue, 28 May 2024 03:38:28 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGIEkv8TeVdmI6D54Z9twJYFgWC0lyFFvHOuiDyS4mUfcYF64wZIRv6qFyYnZA14DR0BqfaKg== X-Received: by 2002:a05:600c:19d0:b0:421:20ac:1244 with SMTP id 5b1f17b1804b1-42120ac12d2mr7432415e9.22.1716892707946; Tue, 28 May 2024 03:38:27 -0700 (PDT) Received: from step1.redhat.com (host-79-53-30-109.retail.telecomitalia.it. [79.53.30.109]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-4211efcf256sm14722245e9.0.2024.05.28.03.38.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 May 2024 03:38:25 -0700 (PDT) From: Stefano Garzarella To: qemu-devel@nongnu.org Cc: Hanna Reitz , Eric Blake , Markus Armbruster , =?utf-8?q?Marc-Andr=C3=A9_Lureau?= , =?utf-8?q?Daniel_P=2E_Berrang=C3=A9?= , Gerd Hoffmann , gmaglione@redhat.com, Raphael Norwitz , Laurent Vivier , Brad Smith , slp@redhat.com, stefanha@redhat.com, Igor Mammedov , Eduardo Habkost , David Hildenbrand , qemu-block@nongnu.org, Kevin Wolf , Thomas Huth , Coiby Xu , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Jason Wang , Paolo Bonzini , "Michael S. Tsirkin" , Stefano Garzarella Subject: [PATCH v6 10/12] hostmem: add a new memory backend based on POSIX shm_open() Date: Tue, 28 May 2024 12:38:23 +0200 Message-ID: <20240528103823.146231-1-sgarzare@redhat.com> X-Mailer: git-send-email 2.45.1 In-Reply-To: <20240528103543.145412-1-sgarzare@redhat.com> References: <20240528103543.145412-1-sgarzare@redhat.com> MIME-Version: 1.0 Received-SPF: pass client-ip=170.10.129.124; envelope-from=sgarzare@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.034, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org shm_open() creates and opens a new POSIX shared memory object. A POSIX shared memory object allows creating memory backend with an associated file descriptor that can be shared with external processes (e.g. vhost-user). The new `memory-backend-shm` can be used as an alternative when `memory-backend-memfd` is not available (Linux only), since shm_open() should be provided by any POSIX-compliant operating system. This backend mimics memfd, allocating memory that is practically anonymous. In theory shm_open() requires a name, but this is allocated for a short time interval and shm_unlink() is called right after shm_open(). After that, only fd is shared with external processes (e.g., vhost-user) as if it were associated with anonymous memory. In the future we may also allow the user to specify the name to be passed to shm_open(), but for now we keep the backend simple, mimicking anonymous memory such as memfd. Acked-by: David Hildenbrand Acked-by: Stefan Hajnoczi Signed-off-by: Stefano Garzarella --- v5 - fixed documentation in qapi/qom.json and qemu-options.hx [Markus] v4 - fail if we find "share=off" in shm_backend_memory_alloc() [David] v3 - enriched commit message and documentation to highlight that we want to mimic memfd (David) --- docs/system/devices/vhost-user.rst | 5 +- qapi/qom.json | 19 +++++ backends/hostmem-shm.c | 123 +++++++++++++++++++++++++++++ backends/meson.build | 1 + qemu-options.hx | 16 ++++ 5 files changed, 162 insertions(+), 2 deletions(-) create mode 100644 backends/hostmem-shm.c diff --git a/docs/system/devices/vhost-user.rst b/docs/system/devices/vhost-user.rst index 9b2da106ce..35259d8ec7 100644 --- a/docs/system/devices/vhost-user.rst +++ b/docs/system/devices/vhost-user.rst @@ -98,8 +98,9 @@ Shared memory object In order for the daemon to access the VirtIO queues to process the requests it needs access to the guest's address space. This is -achieved via the ``memory-backend-file`` or ``memory-backend-memfd`` -objects. A reference to a file-descriptor which can access this object +achieved via the ``memory-backend-file``, ``memory-backend-memfd``, or +``memory-backend-shm`` objects. +A reference to a file-descriptor which can access this object will be passed via the socket as part of the protocol negotiation. Currently the shared memory object needs to match the size of the main diff --git a/qapi/qom.json b/qapi/qom.json index 38dde6d785..d40592d863 100644 --- a/qapi/qom.json +++ b/qapi/qom.json @@ -721,6 +721,21 @@ '*hugetlbsize': 'size', '*seal': 'bool' } } +## +# @MemoryBackendShmProperties: +# +# Properties for memory-backend-shm objects. +# +# The @share boolean option is true by default with shm. Setting it to false +# will cause a failure during allocation because it is not supported by this +# backend. +# +# Since: 9.1 +## +{ 'struct': 'MemoryBackendShmProperties', + 'base': 'MemoryBackendProperties', + 'data': { } } + ## # @MemoryBackendEpcProperties: # @@ -985,6 +1000,8 @@ { 'name': 'memory-backend-memfd', 'if': 'CONFIG_LINUX' }, 'memory-backend-ram', + { 'name': 'memory-backend-shm', + 'if': 'CONFIG_POSIX' }, 'pef-guest', { 'name': 'pr-manager-helper', 'if': 'CONFIG_LINUX' }, @@ -1056,6 +1073,8 @@ 'memory-backend-memfd': { 'type': 'MemoryBackendMemfdProperties', 'if': 'CONFIG_LINUX' }, 'memory-backend-ram': 'MemoryBackendProperties', + 'memory-backend-shm': { 'type': 'MemoryBackendShmProperties', + 'if': 'CONFIG_POSIX' }, 'pr-manager-helper': { 'type': 'PrManagerHelperProperties', 'if': 'CONFIG_LINUX' }, 'qtest': 'QtestProperties', diff --git a/backends/hostmem-shm.c b/backends/hostmem-shm.c new file mode 100644 index 0000000000..374edc3db8 --- /dev/null +++ b/backends/hostmem-shm.c @@ -0,0 +1,123 @@ +/* + * QEMU host POSIX shared memory object backend + * + * Copyright (C) 2024 Red Hat Inc + * + * Authors: + * Stefano Garzarella + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ + +#include "qemu/osdep.h" +#include "sysemu/hostmem.h" +#include "qapi/error.h" + +#define TYPE_MEMORY_BACKEND_SHM "memory-backend-shm" + +OBJECT_DECLARE_SIMPLE_TYPE(HostMemoryBackendShm, MEMORY_BACKEND_SHM) + +struct HostMemoryBackendShm { + HostMemoryBackend parent_obj; +}; + +static bool +shm_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) +{ + g_autoptr(GString) shm_name = g_string_new(NULL); + g_autofree char *backend_name = NULL; + uint32_t ram_flags; + int fd, oflag; + mode_t mode; + + if (!backend->size) { + error_setg(errp, "can't create shm backend with size 0"); + return false; + } + + if (!backend->share) { + error_setg(errp, "can't create shm backend with `share=off`"); + return false; + } + + /* + * Let's use `mode = 0` because we don't want other processes to open our + * memory unless we share the file descriptor with them. + */ + mode = 0; + oflag = O_RDWR | O_CREAT | O_EXCL; + backend_name = host_memory_backend_get_name(backend); + + /* + * Some operating systems allow creating anonymous POSIX shared memory + * objects (e.g. FreeBSD provides the SHM_ANON constant), but this is not + * defined by POSIX, so let's create a unique name. + * + * From Linux's shm_open(3) man-page: + * For portable use, a shared memory object should be identified + * by a name of the form /somename;" + */ + g_string_printf(shm_name, "/qemu-" FMT_pid "-shm-%s", getpid(), + backend_name); + + fd = shm_open(shm_name->str, oflag, mode); + if (fd < 0) { + error_setg_errno(errp, errno, + "failed to create POSIX shared memory"); + return false; + } + + /* + * We have the file descriptor, so we no longer need to expose the + * POSIX shared memory object. However it will remain allocated as long as + * there are file descriptors pointing to it. + */ + shm_unlink(shm_name->str); + + if (ftruncate(fd, backend->size) == -1) { + error_setg_errno(errp, errno, + "failed to resize POSIX shared memory to %" PRIu64, + backend->size); + close(fd); + return false; + } + + ram_flags = RAM_SHARED; + ram_flags |= backend->reserve ? 0 : RAM_NORESERVE; + + return memory_region_init_ram_from_fd(&backend->mr, OBJECT(backend), + backend_name, backend->size, + ram_flags, fd, 0, errp); +} + +static void +shm_backend_instance_init(Object *obj) +{ + HostMemoryBackendShm *m = MEMORY_BACKEND_SHM(obj); + + MEMORY_BACKEND(m)->share = true; +} + +static void +shm_backend_class_init(ObjectClass *oc, void *data) +{ + HostMemoryBackendClass *bc = MEMORY_BACKEND_CLASS(oc); + + bc->alloc = shm_backend_memory_alloc; +} + +static const TypeInfo shm_backend_info = { + .name = TYPE_MEMORY_BACKEND_SHM, + .parent = TYPE_MEMORY_BACKEND, + .instance_init = shm_backend_instance_init, + .class_init = shm_backend_class_init, + .instance_size = sizeof(HostMemoryBackendShm), +}; + +static void register_types(void) +{ + type_register_static(&shm_backend_info); +} + +type_init(register_types); diff --git a/backends/meson.build b/backends/meson.build index 8b2b111497..3867b0d363 100644 --- a/backends/meson.build +++ b/backends/meson.build @@ -13,6 +13,7 @@ system_ss.add([files( if host_os != 'windows' system_ss.add(files('rng-random.c')) system_ss.add(files('hostmem-file.c')) + system_ss.add([files('hostmem-shm.c'), rt]) endif if host_os == 'linux' system_ss.add(files('hostmem-memfd.c')) diff --git a/qemu-options.hx b/qemu-options.hx index 8ca7f34ef0..ad6521ef5e 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -5240,6 +5240,22 @@ SRST The ``share`` boolean option is on by default with memfd. + ``-object memory-backend-shm,id=id,merge=on|off,dump=on|off,share=on|off,prealloc=on|off,size=size,host-nodes=host-nodes,policy=default|preferred|bind|interleave`` + Creates a POSIX shared memory backend object, which allows + QEMU to share the memory with an external process (e.g. when + using vhost-user). + + ``memory-backend-shm`` is a more portable and less featureful version + of ``memory-backend-memfd``. It can then be used in any POSIX system, + especially when memfd is not supported. + + Please refer to ``memory-backend-file`` for a description of the + options. + + The ``share`` boolean option is on by default with shm. Setting it to + off will cause a failure during allocation because it is not supported + by this backend. + ``-object iommufd,id=id[,fd=fd]`` Creates an iommufd backend which allows control of DMA mapping through the ``/dev/iommu`` device.