From patchwork Fri Dec 17 13:46:04 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 1570039 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=dX6zeBlN; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4JFr176bH8z9t6S for ; Sat, 18 Dec 2021 00:48:43 +1100 (AEDT) Received: from localhost ([::1]:38438 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1myDbF-0005bk-Mh for incoming@patchwork.ozlabs.org; Fri, 17 Dec 2021 08:48:41 -0500 Received: from eggs.gnu.org ([209.51.188.92]:53974) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1myDZ2-0005Xg-Ot for qemu-devel@nongnu.org; Fri, 17 Dec 2021 08:46:29 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:36046) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1myDZ1-0008Di-9b for qemu-devel@nongnu.org; Fri, 17 Dec 2021 08:46:24 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1639748782; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=f2PlSZziVpHM5bJpNBPE/X7Wwud5Knv8DhA+h96y4dE=; b=dX6zeBlNnAOkULJgDhlAT1o/L6t9fvSOX46Cpta47v/w25+JwqQsCYscKACDtpiuIGfJuh ovvhHM3tCTTPO4bM9DkCO5tptDtxN2YoeyzfnviDbQsYQnraIr/3ecrDHksk7s4PujUxfo LBBDS2lCt5BzTyuz/UtyHKyJuSJki8o= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-402-YUWBcdfLOb27Y2zaiwYkKg-1; Fri, 17 Dec 2021 08:46:19 -0500 X-MC-Unique: YUWBcdfLOb27Y2zaiwYkKg-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 9C1A31015DBB; Fri, 17 Dec 2021 13:46:18 +0000 (UTC) Received: from t480s.redhat.com (unknown [10.39.193.204]) by smtp.corp.redhat.com (Postfix) with ESMTP id E55D75E482; Fri, 17 Dec 2021 13:46:15 +0000 (UTC) From: David Hildenbrand To: qemu-devel@nongnu.org Subject: [PATCH v2 1/8] util/oslib-posix: Let touch_all_pages() return an error Date: Fri, 17 Dec 2021 14:46:04 +0100 Message-Id: <20211217134611.31172-2-david@redhat.com> In-Reply-To: <20211217134611.31172-1-david@redhat.com> References: <20211217134611.31172-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=david@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Received-SPF: pass client-ip=170.10.133.124; envelope-from=david@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -34 X-Spam_score: -3.5 X-Spam_bar: --- X-Spam_report: (-3.5 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.718, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Pankaj Gupta , Gavin Shan , =?utf-8?q?Daniel_P_=2E_Berrang=C3=A9?= , Eduardo Habkost , "Michael S. Tsirkin" , =?utf-8?b?TWljaGFsIFByw612b3puw61r?= , David Hildenbrand , "Dr . David Alan Gilbert" , Sebastien Boeuf , Igor Mammedov , Paolo Bonzini , Hui Zhu Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Let's prepare touch_all_pages() for returning differing errors. Return an error from the thread and report the last processed error. Translate SIGBUS to -EFAULT, as a SIGBUS can mean all different kind of things (memory error, read error, out of memory). When allocating memory fails via the current SIGBUS-based mechanism, we'll get: os_mem_prealloc: preallocating memory failed: Bad address Reviewed-by: Daniel P. Berrangé Reviewed-by: Michal Privoznik Signed-off-by: David Hildenbrand --- util/oslib-posix.c | 28 ++++++++++++++++------------ 1 file changed, 16 insertions(+), 12 deletions(-) diff --git a/util/oslib-posix.c b/util/oslib-posix.c index e8bdb02e1d..b146beef78 100644 --- a/util/oslib-posix.c +++ b/util/oslib-posix.c @@ -84,7 +84,6 @@ typedef struct MemsetThread MemsetThread; static MemsetThread *memset_thread; static int memset_num_threads; -static bool memset_thread_failed; static QemuMutex page_mutex; static QemuCond page_cond; @@ -452,6 +451,7 @@ static void *do_touch_pages(void *arg) { MemsetThread *memset_args = (MemsetThread *)arg; sigset_t set, oldset; + int ret = 0; /* * On Linux, the page faults from the loop below can cause mmap_sem @@ -470,7 +470,7 @@ static void *do_touch_pages(void *arg) pthread_sigmask(SIG_UNBLOCK, &set, &oldset); if (sigsetjmp(memset_args->env, 1)) { - memset_thread_failed = true; + ret = -EFAULT; } else { char *addr = memset_args->addr; size_t numpages = memset_args->numpages; @@ -494,7 +494,7 @@ static void *do_touch_pages(void *arg) } } pthread_sigmask(SIG_SETMASK, &oldset, NULL); - return NULL; + return (void *)(uintptr_t)ret; } static inline int get_memset_num_threads(int smp_cpus) @@ -509,13 +509,13 @@ static inline int get_memset_num_threads(int smp_cpus) return ret; } -static bool touch_all_pages(char *area, size_t hpagesize, size_t numpages, - int smp_cpus) +static int touch_all_pages(char *area, size_t hpagesize, size_t numpages, + int smp_cpus) { static gsize initialized = 0; size_t numpages_per_thread, leftover; + int ret = 0, i = 0; char *addr = area; - int i = 0; if (g_once_init_enter(&initialized)) { qemu_mutex_init(&page_mutex); @@ -523,7 +523,6 @@ static bool touch_all_pages(char *area, size_t hpagesize, size_t numpages, g_once_init_leave(&initialized, 1); } - memset_thread_failed = false; threads_created_flag = false; memset_num_threads = get_memset_num_threads(smp_cpus); memset_thread = g_new0(MemsetThread, memset_num_threads); @@ -545,12 +544,16 @@ static bool touch_all_pages(char *area, size_t hpagesize, size_t numpages, qemu_mutex_unlock(&page_mutex); for (i = 0; i < memset_num_threads; i++) { - qemu_thread_join(&memset_thread[i].pgthread); + int tmp = (uintptr_t)qemu_thread_join(&memset_thread[i].pgthread); + + if (tmp) { + ret = tmp; + } } g_free(memset_thread); memset_thread = NULL; - return memset_thread_failed; + return ret; } void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus, @@ -573,9 +576,10 @@ void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus, } /* touch pages simultaneously */ - if (touch_all_pages(area, hpagesize, numpages, smp_cpus)) { - error_setg(errp, "os_mem_prealloc: Insufficient free host memory " - "pages available to allocate guest RAM"); + ret = touch_all_pages(area, hpagesize, numpages, smp_cpus); + if (ret) { + error_setg_errno(errp, -ret, + "os_mem_prealloc: preallocating memory failed"); } ret = sigaction(SIGBUS, &oldact, NULL); From patchwork Fri Dec 17 13:46:05 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 1570042 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=d7QN4u6w; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4JFr6H3Y9cz9t6S for ; Sat, 18 Dec 2021 00:53:11 +1100 (AEDT) Received: from localhost ([::1]:47112 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1myDfV-0003ow-HN for incoming@patchwork.ozlabs.org; Fri, 17 Dec 2021 08:53:08 -0500 Received: from eggs.gnu.org ([209.51.188.92]:54010) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1myDZ8-0005YO-Is for qemu-devel@nongnu.org; Fri, 17 Dec 2021 08:46:32 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:37557) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1myDZ5-0008EZ-Ut for qemu-devel@nongnu.org; Fri, 17 Dec 2021 08:46:29 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1639748787; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=nRAI2HEPBW7S+0GYkiI9WAhdb4aMzzmsMdVOd8bcCsc=; b=d7QN4u6wX695LzG1vVVlJxqS+enyfUvh6Pv/8WSBjMqJ8AUY2kagjTscVipMxMmgCQ7aWI opJgaKH0wb8ZUe6CZijuWti6AdlU1DSt8x+gLsbGSLPYcrUhK+H8UIC0N0M6zqEIXql6py XIWiADA3jCCXliHv3oT3Vojix5pFJ/E= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-554-QEqTgWQkPLCymDdGp1sLiw-1; Fri, 17 Dec 2021 08:46:22 -0500 X-MC-Unique: QEqTgWQkPLCymDdGp1sLiw-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id BAEC81015DA0; Fri, 17 Dec 2021 13:46:21 +0000 (UTC) Received: from t480s.redhat.com (unknown [10.39.193.204]) by smtp.corp.redhat.com (Postfix) with ESMTP id EB6375E482; Fri, 17 Dec 2021 13:46:18 +0000 (UTC) From: David Hildenbrand To: qemu-devel@nongnu.org Subject: [PATCH v2 2/8] util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc() Date: Fri, 17 Dec 2021 14:46:05 +0100 Message-Id: <20211217134611.31172-3-david@redhat.com> In-Reply-To: <20211217134611.31172-1-david@redhat.com> References: <20211217134611.31172-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=david@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Received-SPF: pass client-ip=170.10.133.124; envelope-from=david@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -34 X-Spam_score: -3.5 X-Spam_bar: --- X-Spam_report: (-3.5 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.718, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Pankaj Gupta , Gavin Shan , =?utf-8?q?Daniel_P_=2E_Berrang=C3=A9?= , Eduardo Habkost , "Michael S. Tsirkin" , =?utf-8?b?TWljaGFsIFByw612b3puw61r?= , David Hildenbrand , "Dr . David Alan Gilbert" , Pankaj Gupta , Sebastien Boeuf , Igor Mammedov , Paolo Bonzini , Hui Zhu Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Let's sense support and use it for preallocation. MADV_POPULATE_WRITE does not require a SIGBUS handler, doesn't actually touch page content, and avoids context switches; it is, therefore, faster and easier to handle than our current approach. While MADV_POPULATE_WRITE is, in general, faster than manual prefaulting, and especially faster with 4k pages, there is still value in prefaulting using multiple threads to speed up preallocation. More details on MADV_POPULATE_WRITE can be found in the Linux commits 4ca9b3859dac ("mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault page tables") and eb2faa513c24 ("mm/madvise: report SIGBUS as -EFAULT for MADV_POPULATE_(READ|WRITE)"), and in the man page proposal [1]. This resolves the TODO in do_touch_pages(). In the future, we might want to look into using fallocate(), eventually combined with MADV_POPULATE_READ, when dealing with shared file/fd mappings and not caring about memory bindings. [1] https://lkml.kernel.org/r/20210816081922.5155-1-david@redhat.com Reviewed-by: Pankaj Gupta Reviewed-by: Daniel P. Berrangé Reviewed-by: Michal Privoznik Signed-off-by: David Hildenbrand --- include/qemu/osdep.h | 7 ++++ util/oslib-posix.c | 83 +++++++++++++++++++++++++++++++++----------- 2 files changed, 69 insertions(+), 21 deletions(-) diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h index 60718fc342..d1660d67fa 100644 --- a/include/qemu/osdep.h +++ b/include/qemu/osdep.h @@ -471,6 +471,11 @@ static inline void qemu_cleanup_generic_vfree(void *p) #else #define QEMU_MADV_REMOVE QEMU_MADV_DONTNEED #endif +#ifdef MADV_POPULATE_WRITE +#define QEMU_MADV_POPULATE_WRITE MADV_POPULATE_WRITE +#else +#define QEMU_MADV_POPULATE_WRITE QEMU_MADV_INVALID +#endif #elif defined(CONFIG_POSIX_MADVISE) @@ -484,6 +489,7 @@ static inline void qemu_cleanup_generic_vfree(void *p) #define QEMU_MADV_HUGEPAGE QEMU_MADV_INVALID #define QEMU_MADV_NOHUGEPAGE QEMU_MADV_INVALID #define QEMU_MADV_REMOVE QEMU_MADV_DONTNEED +#define QEMU_MADV_POPULATE_WRITE QEMU_MADV_INVALID #else /* no-op */ @@ -497,6 +503,7 @@ static inline void qemu_cleanup_generic_vfree(void *p) #define QEMU_MADV_HUGEPAGE QEMU_MADV_INVALID #define QEMU_MADV_NOHUGEPAGE QEMU_MADV_INVALID #define QEMU_MADV_REMOVE QEMU_MADV_INVALID +#define QEMU_MADV_POPULATE_WRITE QEMU_MADV_INVALID #endif diff --git a/util/oslib-posix.c b/util/oslib-posix.c index b146beef78..cb89e07770 100644 --- a/util/oslib-posix.c +++ b/util/oslib-posix.c @@ -484,10 +484,6 @@ static void *do_touch_pages(void *arg) * * 'volatile' to stop compiler optimizing this away * to a no-op - * - * TODO: get a better solution from kernel so we - * don't need to write at all so we don't cause - * wear on the storage backing the region... */ *(volatile char *)addr = *addr; addr += hpagesize; @@ -497,6 +493,26 @@ static void *do_touch_pages(void *arg) return (void *)(uintptr_t)ret; } +static void *do_madv_populate_write_pages(void *arg) +{ + MemsetThread *memset_args = (MemsetThread *)arg; + const size_t size = memset_args->numpages * memset_args->hpagesize; + char * const addr = memset_args->addr; + int ret = 0; + + /* See do_touch_pages(). */ + qemu_mutex_lock(&page_mutex); + while (!threads_created_flag) { + qemu_cond_wait(&page_cond, &page_mutex); + } + qemu_mutex_unlock(&page_mutex); + + if (size && qemu_madvise(addr, size, QEMU_MADV_POPULATE_WRITE)) { + ret = -errno; + } + return (void *)(uintptr_t)ret; +} + static inline int get_memset_num_threads(int smp_cpus) { long host_procs = sysconf(_SC_NPROCESSORS_ONLN); @@ -510,10 +526,11 @@ static inline int get_memset_num_threads(int smp_cpus) } static int touch_all_pages(char *area, size_t hpagesize, size_t numpages, - int smp_cpus) + int smp_cpus, bool use_madv_populate_write) { static gsize initialized = 0; size_t numpages_per_thread, leftover; + void *(*touch_fn)(void *); int ret = 0, i = 0; char *addr = area; @@ -523,6 +540,12 @@ static int touch_all_pages(char *area, size_t hpagesize, size_t numpages, g_once_init_leave(&initialized, 1); } + if (use_madv_populate_write) { + touch_fn = do_madv_populate_write_pages; + } else { + touch_fn = do_touch_pages; + } + threads_created_flag = false; memset_num_threads = get_memset_num_threads(smp_cpus); memset_thread = g_new0(MemsetThread, memset_num_threads); @@ -533,7 +556,7 @@ static int touch_all_pages(char *area, size_t hpagesize, size_t numpages, memset_thread[i].numpages = numpages_per_thread + (i < leftover); memset_thread[i].hpagesize = hpagesize; qemu_thread_create(&memset_thread[i].pgthread, "touch_pages", - do_touch_pages, &memset_thread[i], + touch_fn, &memset_thread[i], QEMU_THREAD_JOINABLE); addr += memset_thread[i].numpages * hpagesize; } @@ -556,6 +579,12 @@ static int touch_all_pages(char *area, size_t hpagesize, size_t numpages, return ret; } +static bool madv_populate_write_possible(char *area, size_t pagesize) +{ + return !qemu_madvise(area, pagesize, QEMU_MADV_POPULATE_WRITE) || + errno != EINVAL; +} + void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus, Error **errp) { @@ -563,30 +592,42 @@ void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus, struct sigaction act, oldact; size_t hpagesize = qemu_fd_getpagesize(fd); size_t numpages = DIV_ROUND_UP(memory, hpagesize); + bool use_madv_populate_write; - memset(&act, 0, sizeof(act)); - act.sa_handler = &sigbus_handler; - act.sa_flags = 0; - - ret = sigaction(SIGBUS, &act, &oldact); - if (ret) { - error_setg_errno(errp, errno, - "os_mem_prealloc: failed to install signal handler"); - return; + /* + * Sense on every invocation, as MADV_POPULATE_WRITE cannot be used for + * some special mappings, such as mapping /dev/mem. + */ + use_madv_populate_write = madv_populate_write_possible(area, hpagesize); + + if (!use_madv_populate_write) { + memset(&act, 0, sizeof(act)); + act.sa_handler = &sigbus_handler; + act.sa_flags = 0; + + ret = sigaction(SIGBUS, &act, &oldact); + if (ret) { + error_setg_errno(errp, errno, + "os_mem_prealloc: failed to install signal handler"); + return; + } } /* touch pages simultaneously */ - ret = touch_all_pages(area, hpagesize, numpages, smp_cpus); + ret = touch_all_pages(area, hpagesize, numpages, smp_cpus, + use_madv_populate_write); if (ret) { error_setg_errno(errp, -ret, "os_mem_prealloc: preallocating memory failed"); } - ret = sigaction(SIGBUS, &oldact, NULL); - if (ret) { - /* Terminate QEMU since it can't recover from error */ - perror("os_mem_prealloc: failed to reinstall signal handler"); - exit(1); + if (!use_madv_populate_write) { + ret = sigaction(SIGBUS, &oldact, NULL); + if (ret) { + /* Terminate QEMU since it can't recover from error */ + perror("os_mem_prealloc: failed to reinstall signal handler"); + exit(1); + } } } From patchwork Fri Dec 17 13:46:06 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 1570048 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=M3XEUdgo; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4JFrNP4t4Kz9t2p for ; Sat, 18 Dec 2021 01:05:25 +1100 (AEDT) Received: from localhost ([::1]:44522 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1myDrN-0004Ue-UW for incoming@patchwork.ozlabs.org; Fri, 17 Dec 2021 09:05:23 -0500 Received: from eggs.gnu.org ([209.51.188.92]:54074) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1myDZY-0005gx-HT for qemu-devel@nongnu.org; Fri, 17 Dec 2021 08:47:00 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]:34818) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1myDZW-0008H9-Pk for qemu-devel@nongnu.org; Fri, 17 Dec 2021 08:46:56 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1639748811; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=8MoK7IAUV+XtkoXhjaKgQXMQRFu9GSldmPcnqlrMHGg=; b=M3XEUdgoRHs4lmBNYxu7SP+uTmA0Qj1vXIZYVkILk6+FyvLE3bDhrNW8iH62AZnSYTf0EV f9GK60Wsm3GCJAKohSb05CEZ+rIZIka4B9ofpgtK/2ZpI3sxNdRjbodT5ex24QtdY9mdzq Lh87o9n79+phrBizQBBhPK9AwjkiisI= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-145-1rC3appUOt-FuATW1OBVaw-1; Fri, 17 Dec 2021 08:46:47 -0500 X-MC-Unique: 1rC3appUOt-FuATW1OBVaw-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id B1B29101F7A3; Fri, 17 Dec 2021 13:46:46 +0000 (UTC) Received: from t480s.redhat.com (unknown [10.39.193.204]) by smtp.corp.redhat.com (Postfix) with ESMTP id 1C0C05E482; Fri, 17 Dec 2021 13:46:21 +0000 (UTC) From: David Hildenbrand To: qemu-devel@nongnu.org Subject: [PATCH v2 3/8] util/oslib-posix: Introduce and use MemsetContext for touch_all_pages() Date: Fri, 17 Dec 2021 14:46:06 +0100 Message-Id: <20211217134611.31172-4-david@redhat.com> In-Reply-To: <20211217134611.31172-1-david@redhat.com> References: <20211217134611.31172-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=david@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Received-SPF: pass client-ip=170.10.129.124; envelope-from=david@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -34 X-Spam_score: -3.5 X-Spam_bar: --- X-Spam_report: (-3.5 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.718, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Pankaj Gupta , Gavin Shan , =?utf-8?q?Daniel_P_=2E_Berrang=C3=A9?= , Eduardo Habkost , "Michael S. Tsirkin" , =?utf-8?b?TWljaGFsIFByw612b3puw61r?= , David Hildenbrand , "Dr . David Alan Gilbert" , Sebastien Boeuf , Igor Mammedov , Paolo Bonzini , Hui Zhu Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Let's minimize the number of global variables to prepare for os_mem_prealloc() getting called concurrently and make the code a bit easier to read. The only consumer that really needs a global variable is the sigbus handler, which will require protection via a mutex in the future either way as we cannot concurrently mess with the SIGBUS handler. Reviewed-by: Daniel P. Berrangé Reviewed-by: Michal Privoznik Signed-off-by: David Hildenbrand --- util/oslib-posix.c | 73 +++++++++++++++++++++++++++++----------------- 1 file changed, 47 insertions(+), 26 deletions(-) diff --git a/util/oslib-posix.c b/util/oslib-posix.c index cb89e07770..cf2ead54ad 100644 --- a/util/oslib-posix.c +++ b/util/oslib-posix.c @@ -73,21 +73,30 @@ #define MAX_MEM_PREALLOC_THREAD_COUNT 16 +struct MemsetThread; + +typedef struct MemsetContext { + bool all_threads_created; + bool any_thread_failed; + struct MemsetThread *threads; + int num_threads; +} MemsetContext; + struct MemsetThread { char *addr; size_t numpages; size_t hpagesize; QemuThread pgthread; sigjmp_buf env; + MemsetContext *context; }; typedef struct MemsetThread MemsetThread; -static MemsetThread *memset_thread; -static int memset_num_threads; +/* used by sigbus_handler() */ +static MemsetContext *sigbus_memset_context; static QemuMutex page_mutex; static QemuCond page_cond; -static bool threads_created_flag; int qemu_get_thread_id(void) { @@ -438,10 +447,13 @@ const char *qemu_get_exec_dir(void) static void sigbus_handler(int signal) { int i; - if (memset_thread) { - for (i = 0; i < memset_num_threads; i++) { - if (qemu_thread_is_self(&memset_thread[i].pgthread)) { - siglongjmp(memset_thread[i].env, 1); + + if (sigbus_memset_context) { + for (i = 0; i < sigbus_memset_context->num_threads; i++) { + MemsetThread *thread = &sigbus_memset_context->threads[i]; + + if (qemu_thread_is_self(&thread->pgthread)) { + siglongjmp(thread->env, 1); } } } @@ -459,7 +471,7 @@ static void *do_touch_pages(void *arg) * clearing until all threads have been created. */ qemu_mutex_lock(&page_mutex); - while(!threads_created_flag){ + while (!memset_args->context->all_threads_created) { qemu_cond_wait(&page_cond, &page_mutex); } qemu_mutex_unlock(&page_mutex); @@ -502,7 +514,7 @@ static void *do_madv_populate_write_pages(void *arg) /* See do_touch_pages(). */ qemu_mutex_lock(&page_mutex); - while (!threads_created_flag) { + while (!memset_args->context->all_threads_created) { qemu_cond_wait(&page_cond, &page_mutex); } qemu_mutex_unlock(&page_mutex); @@ -529,6 +541,9 @@ static int touch_all_pages(char *area, size_t hpagesize, size_t numpages, int smp_cpus, bool use_madv_populate_write) { static gsize initialized = 0; + MemsetContext context = { + .num_threads = get_memset_num_threads(smp_cpus), + }; size_t numpages_per_thread, leftover; void *(*touch_fn)(void *); int ret = 0, i = 0; @@ -546,35 +561,41 @@ static int touch_all_pages(char *area, size_t hpagesize, size_t numpages, touch_fn = do_touch_pages; } - threads_created_flag = false; - memset_num_threads = get_memset_num_threads(smp_cpus); - memset_thread = g_new0(MemsetThread, memset_num_threads); - numpages_per_thread = numpages / memset_num_threads; - leftover = numpages % memset_num_threads; - for (i = 0; i < memset_num_threads; i++) { - memset_thread[i].addr = addr; - memset_thread[i].numpages = numpages_per_thread + (i < leftover); - memset_thread[i].hpagesize = hpagesize; - qemu_thread_create(&memset_thread[i].pgthread, "touch_pages", - touch_fn, &memset_thread[i], + context.threads = g_new0(MemsetThread, context.num_threads); + numpages_per_thread = numpages / context.num_threads; + leftover = numpages % context.num_threads; + for (i = 0; i < context.num_threads; i++) { + context.threads[i].addr = addr; + context.threads[i].numpages = numpages_per_thread + (i < leftover); + context.threads[i].hpagesize = hpagesize; + context.threads[i].context = &context; + qemu_thread_create(&context.threads[i].pgthread, "touch_pages", + touch_fn, &context.threads[i], QEMU_THREAD_JOINABLE); - addr += memset_thread[i].numpages * hpagesize; + addr += context.threads[i].numpages * hpagesize; + } + + if (!use_madv_populate_write) { + sigbus_memset_context = &context; } qemu_mutex_lock(&page_mutex); - threads_created_flag = true; + context.all_threads_created = true; qemu_cond_broadcast(&page_cond); qemu_mutex_unlock(&page_mutex); - for (i = 0; i < memset_num_threads; i++) { - int tmp = (uintptr_t)qemu_thread_join(&memset_thread[i].pgthread); + for (i = 0; i < context.num_threads; i++) { + int tmp = (uintptr_t)qemu_thread_join(&context.threads[i].pgthread); if (tmp) { ret = tmp; } } - g_free(memset_thread); - memset_thread = NULL; + + if (!use_madv_populate_write) { + sigbus_memset_context = NULL; + } + g_free(context.threads); return ret; } From patchwork Fri Dec 17 13:46:07 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 1570047 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=KWfYU1gm; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4JFrMJ63B3z9t2p for ; Sat, 18 Dec 2021 01:04:28 +1100 (AEDT) Received: from localhost ([::1]:42930 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1myDqU-0003Pt-MD for incoming@patchwork.ozlabs.org; Fri, 17 Dec 2021 09:04:26 -0500 Received: from eggs.gnu.org ([209.51.188.92]:54112) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1myDZj-0005kp-LG for qemu-devel@nongnu.org; Fri, 17 Dec 2021 08:47:09 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]:41972) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1myDZi-0008ME-0b for qemu-devel@nongnu.org; Fri, 17 Dec 2021 08:47:07 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1639748825; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=hFc7fj3ByAzhOQCUJkIyjkyCsudyMWMEgKTKmRh5kWM=; b=KWfYU1gmvW8RNQSR61odK0nqm47oGqpNBA+2zRu3TtLW/TDEqDmkRMnN1NfazvLD6FJ2nj lNSDrDp4Ktl7rGYi6jdhvIKT+mCIWYcpMYfWkMlUBbOJLmhns25LVLQCB4lYfPApJrPP2N HXJzwMJonlRyeTIIFdmZg5ocMuI+IbM= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-671-6QTiFyijNquUO8alX_H_NQ-1; Fri, 17 Dec 2021 08:47:02 -0500 X-MC-Unique: 6QTiFyijNquUO8alX_H_NQ-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 3E5161015DA4; Fri, 17 Dec 2021 13:47:01 +0000 (UTC) Received: from t480s.redhat.com (unknown [10.39.193.204]) by smtp.corp.redhat.com (Postfix) with ESMTP id 0CF465E482; Fri, 17 Dec 2021 13:46:46 +0000 (UTC) From: David Hildenbrand To: qemu-devel@nongnu.org Subject: [PATCH v2 4/8] util/oslib-posix: Don't create too many threads with small memory or little pages Date: Fri, 17 Dec 2021 14:46:07 +0100 Message-Id: <20211217134611.31172-5-david@redhat.com> In-Reply-To: <20211217134611.31172-1-david@redhat.com> References: <20211217134611.31172-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=david@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Received-SPF: pass client-ip=170.10.129.124; envelope-from=david@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -34 X-Spam_score: -3.5 X-Spam_bar: --- X-Spam_report: (-3.5 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.718, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Pankaj Gupta , Gavin Shan , =?utf-8?q?Daniel_P_=2E_Berrang=C3=A9?= , Eduardo Habkost , "Michael S. Tsirkin" , =?utf-8?b?TWljaGFsIFByw612b3puw61r?= , David Hildenbrand , "Dr . David Alan Gilbert" , Pankaj Gupta , Sebastien Boeuf , Igor Mammedov , Paolo Bonzini , Hui Zhu Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Let's limit the number of threads to something sane, especially that - We don't have more threads than the number of pages we have - We don't have threads that initialize small (< 64 MiB) memory Reviewed-by: Pankaj Gupta Reviewed-by: Dr. David Alan Gilbert Reviewed-by: Daniel P. Berrangé Reviewed-by: Michal Privoznik Signed-off-by: David Hildenbrand --- util/oslib-posix.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/util/oslib-posix.c b/util/oslib-posix.c index cf2ead54ad..67c08a425e 100644 --- a/util/oslib-posix.c +++ b/util/oslib-posix.c @@ -40,6 +40,7 @@ #include #include "qemu/cutils.h" #include "qemu/compiler.h" +#include "qemu/units.h" #ifdef CONFIG_LINUX #include @@ -525,7 +526,8 @@ static void *do_madv_populate_write_pages(void *arg) return (void *)(uintptr_t)ret; } -static inline int get_memset_num_threads(int smp_cpus) +static inline int get_memset_num_threads(size_t hpagesize, size_t numpages, + int smp_cpus) { long host_procs = sysconf(_SC_NPROCESSORS_ONLN); int ret = 1; @@ -533,6 +535,12 @@ static inline int get_memset_num_threads(int smp_cpus) if (host_procs > 0) { ret = MIN(MIN(host_procs, MAX_MEM_PREALLOC_THREAD_COUNT), smp_cpus); } + + /* Especially with gigantic pages, don't create more threads than pages. */ + ret = MIN(ret, numpages); + /* Don't start threads to prealloc comparatively little memory. */ + ret = MIN(ret, MAX(1, hpagesize * numpages / (64 * MiB))); + /* In case sysconf() fails, we fall back to single threaded */ return ret; } @@ -542,7 +550,7 @@ static int touch_all_pages(char *area, size_t hpagesize, size_t numpages, { static gsize initialized = 0; MemsetContext context = { - .num_threads = get_memset_num_threads(smp_cpus), + .num_threads = get_memset_num_threads(hpagesize, numpages, smp_cpus), }; size_t numpages_per_thread, leftover; void *(*touch_fn)(void *); From patchwork Fri Dec 17 13:46:08 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 1570040 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=TWJ2R+ok; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4JFr4X4wlQz9t6S for ; Sat, 18 Dec 2021 00:51:39 +1100 (AEDT) Received: from localhost ([::1]:44164 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1myDe3-0001Ns-2F for incoming@patchwork.ozlabs.org; Fri, 17 Dec 2021 08:51:35 -0500 Received: from eggs.gnu.org ([209.51.188.92]:54144) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1myDZm-0005me-Gx for qemu-devel@nongnu.org; Fri, 17 Dec 2021 08:47:11 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:25804) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1myDZl-0008N8-2Y for qemu-devel@nongnu.org; Fri, 17 Dec 2021 08:47:10 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1639748828; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=kzq1+LNic1KF/MEspKD0a9iTSfwrsblw57W1b4UDQ9Y=; b=TWJ2R+okuu2JmMfiklW7so54zFgLsdFfHCjGCGaQnRxQYLwPGa6azAEUlxR8DRCaUudP4t +9BdxZ37yvsrmPbUnel7EUxNnUhUUPaZzOC7zo5mYwHfg5+SbZWj7KXXCoUxW+ONR8YqvO 1cpe+oV9RZlIF7plspr1LdGvX2GATLA= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-194-jMIdMK2mMreKWTJqrRMm6w-1; Fri, 17 Dec 2021 08:47:05 -0500 X-MC-Unique: jMIdMK2mMreKWTJqrRMm6w-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 8220D801962; Fri, 17 Dec 2021 13:47:04 +0000 (UTC) Received: from t480s.redhat.com (unknown [10.39.193.204]) by smtp.corp.redhat.com (Postfix) with ESMTP id 936345E482; Fri, 17 Dec 2021 13:47:01 +0000 (UTC) From: David Hildenbrand To: qemu-devel@nongnu.org Subject: [PATCH v2 5/8] util/oslib-posix: Avoid creating a single thread with MADV_POPULATE_WRITE Date: Fri, 17 Dec 2021 14:46:08 +0100 Message-Id: <20211217134611.31172-6-david@redhat.com> In-Reply-To: <20211217134611.31172-1-david@redhat.com> References: <20211217134611.31172-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=david@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Received-SPF: pass client-ip=170.10.133.124; envelope-from=david@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -34 X-Spam_score: -3.5 X-Spam_bar: --- X-Spam_report: (-3.5 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.718, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Pankaj Gupta , Gavin Shan , =?utf-8?q?Daniel_P_=2E_Berrang=C3=A9?= , Eduardo Habkost , "Michael S. Tsirkin" , =?utf-8?b?TWljaGFsIFByw612b3puw61r?= , David Hildenbrand , "Dr . David Alan Gilbert" , Pankaj Gupta , Sebastien Boeuf , Igor Mammedov , Paolo Bonzini , Hui Zhu Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Let's simplify the case when we only want a single thread and don't have to mess with signal handlers. Reviewed-by: Pankaj Gupta Reviewed-by: Daniel P. Berrangé Reviewed-by: Michal Privoznik Signed-off-by: David Hildenbrand --- util/oslib-posix.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/util/oslib-posix.c b/util/oslib-posix.c index 67c08a425e..efa4f96d56 100644 --- a/util/oslib-posix.c +++ b/util/oslib-posix.c @@ -564,6 +564,14 @@ static int touch_all_pages(char *area, size_t hpagesize, size_t numpages, } if (use_madv_populate_write) { + /* Avoid creating a single thread for MADV_POPULATE_WRITE */ + if (context.num_threads == 1) { + if (qemu_madvise(area, hpagesize * numpages, + QEMU_MADV_POPULATE_WRITE)) { + return -errno; + } + return 0; + } touch_fn = do_madv_populate_write_pages; } else { touch_fn = do_touch_pages; From patchwork Fri Dec 17 13:46:09 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 1570044 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=SiO7qe3a; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4JFrB00wBvz9t6S for ; Sat, 18 Dec 2021 00:56:24 +1100 (AEDT) Received: from localhost ([::1]:55836 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1myDif-0001Is-U5 for incoming@patchwork.ozlabs.org; Fri, 17 Dec 2021 08:56:21 -0500 Received: from eggs.gnu.org ([209.51.188.92]:54176) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1myDa1-0005vw-9z for qemu-devel@nongnu.org; Fri, 17 Dec 2021 08:47:31 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]:20993) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1myDZy-0008Oc-PC for qemu-devel@nongnu.org; Fri, 17 Dec 2021 08:47:25 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1639748832; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=oK7N/JLH1vH5Wlk2zn5oWnKUC7prf7gO4iOXi3lsnLg=; b=SiO7qe3aHVgcTcrNRqArdHYLUaGdkbPTumPqO7LsuHxieE3TWHKY5CYzU4xj8ZqzSl5VVA gj1wZcwa5WX4Wn3gGuYjZwut5E5rRSYgM5MTMzZnQyleDTMyUFw0qrgSxcwFjTXFfHtqSU 5BTWawGoNKJGDq//qkN439wiiLOC8Yg= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-27-7maYqgKIP9i_3hik0-w_Aw-1; Fri, 17 Dec 2021 08:47:09 -0500 X-MC-Unique: 7maYqgKIP9i_3hik0-w_Aw-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id D433E101F7A1; Fri, 17 Dec 2021 13:47:07 +0000 (UTC) Received: from t480s.redhat.com (unknown [10.39.193.204]) by smtp.corp.redhat.com (Postfix) with ESMTP id D054A5E482; Fri, 17 Dec 2021 13:47:04 +0000 (UTC) From: David Hildenbrand To: qemu-devel@nongnu.org Subject: [PATCH v2 6/8] util/oslib-posix: Support concurrent os_mem_prealloc() invocation Date: Fri, 17 Dec 2021 14:46:09 +0100 Message-Id: <20211217134611.31172-7-david@redhat.com> In-Reply-To: <20211217134611.31172-1-david@redhat.com> References: <20211217134611.31172-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=david@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Received-SPF: pass client-ip=170.10.129.124; envelope-from=david@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -34 X-Spam_score: -3.5 X-Spam_bar: --- X-Spam_report: (-3.5 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.718, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Pankaj Gupta , Gavin Shan , =?utf-8?q?Daniel_P_=2E_Berrang=C3=A9?= , Eduardo Habkost , "Michael S. Tsirkin" , =?utf-8?b?TWljaGFsIFByw612b3puw61r?= , David Hildenbrand , "Dr . David Alan Gilbert" , Pankaj Gupta , Sebastien Boeuf , Igor Mammedov , Paolo Bonzini , Hui Zhu Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Add a mutex to protect the SIGBUS case, as we cannot mess concurrently with the sigbus handler and we have to manage the global variable sigbus_memset_context. The MADV_POPULATE_WRITE path can run concurrently. Note that page_mutex and page_cond are shared between concurrent invocations, which shouldn't be a problem. This is a preparation for future virtio-mem prealloc code, which will call os_mem_prealloc() asynchronously from an iothread when handling guest requests. Reviewed-by: Pankaj Gupta Reviewed-by: Daniel P. Berrangé Reviewed-by: Michal Privoznik Signed-off-by: David Hildenbrand --- util/oslib-posix.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/util/oslib-posix.c b/util/oslib-posix.c index efa4f96d56..9829149e4b 100644 --- a/util/oslib-posix.c +++ b/util/oslib-posix.c @@ -95,6 +95,7 @@ typedef struct MemsetThread MemsetThread; /* used by sigbus_handler() */ static MemsetContext *sigbus_memset_context; +static QemuMutex sigbus_mutex; static QemuMutex page_mutex; static QemuCond page_cond; @@ -625,6 +626,7 @@ static bool madv_populate_write_possible(char *area, size_t pagesize) void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus, Error **errp) { + static gsize initialized; int ret; struct sigaction act, oldact; size_t hpagesize = qemu_fd_getpagesize(fd); @@ -638,6 +640,12 @@ void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus, use_madv_populate_write = madv_populate_write_possible(area, hpagesize); if (!use_madv_populate_write) { + if (g_once_init_enter(&initialized)) { + qemu_mutex_init(&sigbus_mutex); + g_once_init_leave(&initialized, 1); + } + + qemu_mutex_lock(&sigbus_mutex); memset(&act, 0, sizeof(act)); act.sa_handler = &sigbus_handler; act.sa_flags = 0; @@ -665,6 +673,7 @@ void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus, perror("os_mem_prealloc: failed to reinstall signal handler"); exit(1); } + qemu_mutex_unlock(&sigbus_mutex); } } From patchwork Fri Dec 17 13:46:10 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 1570043 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=ToBz8S5G; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4JFr8s1D5Bz9t6S for ; Sat, 18 Dec 2021 00:55:25 +1100 (AEDT) Received: from localhost ([::1]:52804 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1myDhi-0007iS-VC for incoming@patchwork.ozlabs.org; Fri, 17 Dec 2021 08:55:22 -0500 Received: from eggs.gnu.org ([209.51.188.92]:54208) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1myDa4-0005wI-Rt for qemu-devel@nongnu.org; Fri, 17 Dec 2021 08:47:32 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:36723) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1myDa3-0008RN-9D for qemu-devel@nongnu.org; Fri, 17 Dec 2021 08:47:28 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1639748846; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=G85zT+TFbFoBwScNrsx/BhmkMJmVVEL/McD5uvqKsCk=; b=ToBz8S5GslqPUT3600upfzFUjuPObbQgR3uUI6dgEbYEA1MUDD007XWymFQS5DZ8V9nPWp KJWSqwWLENKV28urkI/v7sQEaB7LoiDpemItWLXOUxTksDIPIMou/QrwTppLyHEmMRiyLq GYf88lV0F2Z9+KZ+52xbVE/BuxTHjN4= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-646-8VNCxXLKOPKBeaKwBtzn7g-1; Fri, 17 Dec 2021 08:47:23 -0500 X-MC-Unique: 8VNCxXLKOPKBeaKwBtzn7g-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id BA95F1923B84; Fri, 17 Dec 2021 13:47:22 +0000 (UTC) Received: from t480s.redhat.com (unknown [10.39.193.204]) by smtp.corp.redhat.com (Postfix) with ESMTP id 2EEC54E2BF; Fri, 17 Dec 2021 13:47:08 +0000 (UTC) From: David Hildenbrand To: qemu-devel@nongnu.org Subject: [PATCH v2 7/8] util/oslib-posix: Forward SIGBUS to MCE handler under Linux Date: Fri, 17 Dec 2021 14:46:10 +0100 Message-Id: <20211217134611.31172-8-david@redhat.com> In-Reply-To: <20211217134611.31172-1-david@redhat.com> References: <20211217134611.31172-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=david@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Received-SPF: pass client-ip=170.10.133.124; envelope-from=david@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -34 X-Spam_score: -3.5 X-Spam_bar: --- X-Spam_report: (-3.5 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.718, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Pankaj Gupta , Gavin Shan , =?utf-8?q?Daniel_P_=2E_Berrang=C3=A9?= , Eduardo Habkost , "Michael S. Tsirkin" , =?utf-8?b?TWljaGFsIFByw612b3puw61r?= , David Hildenbrand , "Dr . David Alan Gilbert" , Sebastien Boeuf , Igor Mammedov , Paolo Bonzini , Hui Zhu Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Temporarily modifying the SIGBUS handler is really nasty, as we might be unlucky and receive an MCE SIGBUS while having our handler registered. Unfortunately, there is no way around messing with SIGBUS when MADV_POPULATE_WRITE is not applicable or not around. Let's forward SIGBUS that don't belong to us to the already registered handler and document the situation. Reviewed-by: Daniel P. Berrangé Reviewed-by: Michal Privoznik Signed-off-by: David Hildenbrand --- softmmu/cpus.c | 4 ++++ util/oslib-posix.c | 36 +++++++++++++++++++++++++++++++++--- 2 files changed, 37 insertions(+), 3 deletions(-) diff --git a/softmmu/cpus.c b/softmmu/cpus.c index 071085f840..23bca46b07 100644 --- a/softmmu/cpus.c +++ b/softmmu/cpus.c @@ -352,6 +352,10 @@ static void qemu_init_sigbus(void) { struct sigaction action; + /* + * ALERT: when modifying this, take care that SIGBUS forwarding in + * os_mem_prealloc() will continue working as expected. + */ memset(&action, 0, sizeof(action)); action.sa_flags = SA_SIGINFO; action.sa_sigaction = sigbus_handler; diff --git a/util/oslib-posix.c b/util/oslib-posix.c index 9829149e4b..5c47aa9cb7 100644 --- a/util/oslib-posix.c +++ b/util/oslib-posix.c @@ -95,6 +95,7 @@ typedef struct MemsetThread MemsetThread; /* used by sigbus_handler() */ static MemsetContext *sigbus_memset_context; +struct sigaction sigbus_oldact; static QemuMutex sigbus_mutex; static QemuMutex page_mutex; @@ -446,7 +447,11 @@ const char *qemu_get_exec_dir(void) return exec_dir; } +#ifdef CONFIG_LINUX +static void sigbus_handler(int signal, siginfo_t *siginfo, void *ctx) +#else /* CONFIG_LINUX */ static void sigbus_handler(int signal) +#endif /* CONFIG_LINUX */ { int i; @@ -459,6 +464,26 @@ static void sigbus_handler(int signal) } } } + +#ifdef CONFIG_LINUX + /* + * We assume that the MCE SIGBUS handler could have been registered. We + * should never receive BUS_MCEERR_AO on any of our threads, but only on + * the main thread registered for PR_MCE_KILL_EARLY. Further, we should not + * receive BUS_MCEERR_AR triggered by action of other threads on one of + * our threads. So, no need to check for unrelated SIGBUS when seeing one + * for our threads. + * + * We will forward to the MCE handler, which will either handle the SIGBUS + * or reinstall the default SIGBUS handler and reraise the SIGBUS. The + * default SIGBUS handler will crash the process, so we don't care. + */ + if (sigbus_oldact.sa_flags & SA_SIGINFO) { + sigbus_oldact.sa_sigaction(signal, siginfo, ctx); + return; + } +#endif /* CONFIG_LINUX */ + warn_report("os_mem_prealloc: unrelated SIGBUS detected and ignored"); } static void *do_touch_pages(void *arg) @@ -628,10 +653,10 @@ void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus, { static gsize initialized; int ret; - struct sigaction act, oldact; size_t hpagesize = qemu_fd_getpagesize(fd); size_t numpages = DIV_ROUND_UP(memory, hpagesize); bool use_madv_populate_write; + struct sigaction act; /* * Sense on every invocation, as MADV_POPULATE_WRITE cannot be used for @@ -647,10 +672,15 @@ void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus, qemu_mutex_lock(&sigbus_mutex); memset(&act, 0, sizeof(act)); +#ifdef CONFIG_LINUX + act.sa_sigaction = &sigbus_handler; + act.sa_flags = SA_SIGINFO; +#else /* CONFIG_LINUX */ act.sa_handler = &sigbus_handler; act.sa_flags = 0; +#endif /* CONFIG_LINUX */ - ret = sigaction(SIGBUS, &act, &oldact); + ret = sigaction(SIGBUS, &act, &sigbus_oldact); if (ret) { error_setg_errno(errp, errno, "os_mem_prealloc: failed to install signal handler"); @@ -667,7 +697,7 @@ void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus, } if (!use_madv_populate_write) { - ret = sigaction(SIGBUS, &oldact, NULL); + ret = sigaction(SIGBUS, &sigbus_oldact, NULL); if (ret) { /* Terminate QEMU since it can't recover from error */ perror("os_mem_prealloc: failed to reinstall signal handler"); From patchwork Fri Dec 17 13:46:11 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 1570049 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=i/fZLK6z; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4JFrRF0sNpz9t2p for ; Sat, 18 Dec 2021 01:07:53 +1100 (AEDT) Received: from localhost ([::1]:50952 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1myDtm-0000Yf-Kb for incoming@patchwork.ozlabs.org; Fri, 17 Dec 2021 09:07:50 -0500 Received: from eggs.gnu.org ([209.51.188.92]:54234) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1myDa7-0005wv-JX for qemu-devel@nongnu.org; Fri, 17 Dec 2021 08:47:32 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]:56831) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1myDa5-0008Rh-Uo for qemu-devel@nongnu.org; Fri, 17 Dec 2021 08:47:31 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1639748849; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=x23m5FuvyW9Sjwg9MEHWl6eooYHKZxW1JdcPxnU5wgI=; b=i/fZLK6z5mE7Nl7qE08dFPAYAuFcm82FAg+5WHFJHJRcUz0tKn0NSQHD8TUCRO2hGoBTK0 9DTp4j5NrIAU+SPKE47yBJHm+lzvTLJNU/SCM2vCJFAelfl7/fpmhgEK4rv+RKBlTjk4ic KR9ZicjraMV2GCD9r5Aa3FNDeBZE1bQ= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-163-OZz5gxxIMziH1ykWpH_W7w-1; Fri, 17 Dec 2021 08:47:27 -0500 X-MC-Unique: OZz5gxxIMziH1ykWpH_W7w-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 6BEA081CCBD; Fri, 17 Dec 2021 13:47:26 +0000 (UTC) Received: from t480s.redhat.com (unknown [10.39.193.204]) by smtp.corp.redhat.com (Postfix) with ESMTP id 1B1605E483; Fri, 17 Dec 2021 13:47:22 +0000 (UTC) From: David Hildenbrand To: qemu-devel@nongnu.org Subject: [PATCH v2 8/8] virtio-mem: Support "prealloc=on" option Date: Fri, 17 Dec 2021 14:46:11 +0100 Message-Id: <20211217134611.31172-9-david@redhat.com> In-Reply-To: <20211217134611.31172-1-david@redhat.com> References: <20211217134611.31172-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=david@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Received-SPF: pass client-ip=170.10.129.124; envelope-from=david@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -34 X-Spam_score: -3.5 X-Spam_bar: --- X-Spam_report: (-3.5 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.718, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Pankaj Gupta , Gavin Shan , =?utf-8?q?Daniel_P_=2E_Berrang=C3=A9?= , Eduardo Habkost , "Michael S. Tsirkin" , =?utf-8?b?TWljaGFsIFByw612b3puw61r?= , David Hildenbrand , "Dr . David Alan Gilbert" , Sebastien Boeuf , Igor Mammedov , Paolo Bonzini , Hui Zhu Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" For scarce memory resources, such as hugetlb, we want to be able to prealloc such memory resources in order to not crash later on access. On simple user errors we could otherwise easily run out of memory resources an crash the VM -- pretty much undesired. For ordinary memory devices, such as DIMMs, we preallocate memory via the memory backend for such use cases; however, with virtio-mem we're dealing with sparse memory backends; preallocating the whole memory backend destroys the whole purpose of virtio-mem. Instead, we want to preallocate memory when actually exposing memory to the VM dynamically, and fail plugging memory gracefully + warn the user in case preallocation fails. A common use case for hugetlb will be using "reserve=off,prealloc=off" for the memory backend and "prealloc=on" for the virtio-mem device. This way, no huge pages will be reserved for the process, but we can recover if there are no actual huge pages when plugging memory. Libvirt is already prepared for this. Note that preallocation cannot protect from the OOM killer -- which holds true for any kind of preallocation in QEMU. It's primarily useful only for scarce memory resources such as hugetlb, or shared file-backed memory. It's of little use for ordinary anonymous memory that can be swapped, KSM merged, ... but we won't forbid it. Reviewed-by: Michal Privoznik Signed-off-by: David Hildenbrand --- hw/virtio/virtio-mem.c | 39 ++++++++++++++++++++++++++++++---- include/hw/virtio/virtio-mem.h | 4 ++++ 2 files changed, 39 insertions(+), 4 deletions(-) diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c index a5d26d414f..33c7884aa0 100644 --- a/hw/virtio/virtio-mem.c +++ b/hw/virtio/virtio-mem.c @@ -450,10 +450,40 @@ static int virtio_mem_set_block_state(VirtIOMEM *vmem, uint64_t start_gpa, return -EBUSY; } virtio_mem_notify_unplug(vmem, offset, size); - } else if (virtio_mem_notify_plug(vmem, offset, size)) { - /* Could be a mapping attempt resulted in memory getting populated. */ - ram_block_discard_range(vmem->memdev->mr.ram_block, offset, size); - return -EBUSY; + } else { + int ret = 0; + + if (vmem->prealloc) { + void *area = memory_region_get_ram_ptr(&vmem->memdev->mr) + offset; + int fd = memory_region_get_fd(&vmem->memdev->mr); + Error *local_err = NULL; + + os_mem_prealloc(fd, area, size, 1, &local_err); + if (local_err) { + static bool warned; + + /* + * Warn only once, we don't want to fill the log with these + * warnings. + */ + if (!warned) { + warn_report_err(local_err); + warned = true; + } else { + error_free(local_err); + } + ret = -EBUSY; + } + } + if (!ret) { + ret = virtio_mem_notify_plug(vmem, offset, size); + } + + if (ret) { + /* Could be preallocation or a notifier populated memory. */ + ram_block_discard_range(vmem->memdev->mr.ram_block, offset, size); + return -EBUSY; + } } virtio_mem_set_bitmap(vmem, start_gpa, size, plug); return 0; @@ -1165,6 +1195,7 @@ static void virtio_mem_instance_init(Object *obj) static Property virtio_mem_properties[] = { DEFINE_PROP_UINT64(VIRTIO_MEM_ADDR_PROP, VirtIOMEM, addr, 0), DEFINE_PROP_UINT32(VIRTIO_MEM_NODE_PROP, VirtIOMEM, node, 0), + DEFINE_PROP_BOOL(VIRTIO_MEM_PREALLOC_PROP, VirtIOMEM, prealloc, false), DEFINE_PROP_LINK(VIRTIO_MEM_MEMDEV_PROP, VirtIOMEM, memdev, TYPE_MEMORY_BACKEND, HostMemoryBackend *), #if defined(VIRTIO_MEM_HAS_LEGACY_GUESTS) diff --git a/include/hw/virtio/virtio-mem.h b/include/hw/virtio/virtio-mem.h index 38c67a89f2..7745cfc1a3 100644 --- a/include/hw/virtio/virtio-mem.h +++ b/include/hw/virtio/virtio-mem.h @@ -31,6 +31,7 @@ OBJECT_DECLARE_TYPE(VirtIOMEM, VirtIOMEMClass, #define VIRTIO_MEM_BLOCK_SIZE_PROP "block-size" #define VIRTIO_MEM_ADDR_PROP "memaddr" #define VIRTIO_MEM_UNPLUGGED_INACCESSIBLE_PROP "unplugged-inaccessible" +#define VIRTIO_MEM_PREALLOC_PROP "prealloc" struct VirtIOMEM { VirtIODevice parent_obj; @@ -70,6 +71,9 @@ struct VirtIOMEM { */ OnOffAuto unplugged_inaccessible; + /* whether to prealloc memory when plugging new blocks */ + bool prealloc; + /* notifiers to notify when "size" changes */ NotifierList size_change_notifiers;