From patchwork Tue Jul 11 10:35:38 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Tobias Burnus <tobias@codesourcery.com>
X-Patchwork-Id: 1806233
Return-Path: <gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@legolas.ozlabs.org
Authentication-Results: legolas.ozlabs.org;
 spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org
 (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org;
 envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org;
 receiver=<UNKNOWN>)
Received: from server2.sourceware.org (server2.sourceware.org
 [IPv6:2620:52:3:1:0:246e:9693:128c])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384)
	(No client certificate requested)
	by legolas.ozlabs.org (Postfix) with ESMTPS id 4R0cjN220Bz20b9
	for <incoming@patchwork.ozlabs.org>; Tue, 11 Jul 2023 20:36:08 +1000 (AEST)
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 524F23854E82
	for <incoming@patchwork.ozlabs.org>; Tue, 11 Jul 2023 10:36:06 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98])
 by sourceware.org (Postfix) with ESMTPS id 9F2DA385734D
 for <gcc-patches@gcc.gnu.org>; Tue, 11 Jul 2023 10:35:51 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 9F2DA385734D
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=codesourcery.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com
X-IronPort-AV: E=Sophos;i="6.01,196,1684828800";
 d="diff'?c'?scan'208";a="12862570"
Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165])
 by esa2.mentor.iphmx.com with ESMTP; 11 Jul 2023 02:35:50 -0800
IronPort-SDR: 
 oCnTKNYwzQwOIrqdQiLkGfPx49lCbtuUF4r4MG3yONYpK49E3j+d6F2E8xmFRwCovkrBAycVfz
 hNQfF937dLBTNMlXcqgOLHR43qNfm4ibPaD2ZIRqxO5xWAT5IpkkGyU4Q9wdco0vqMnvyJwDaa
 IeZu1c51LoLh03d0mxIXobx1/2H7RkD+PoImILqlQ4C9NMyPIiofxTnrTEKcoCU6qTAt13T+cq
 DVbIIqQ5Uxivy6guQL6oLsacQr6bXhf3E6F3g7ct4yRlELD/mil7CZnwufTBUUOfK0vVDeLFdg
 DGM=
Message-ID: <34fce57b-69a0-a9fd-f8ff-671ee7f94227@codesourcery.com>
Date: Tue, 11 Jul 2023 12:35:38 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.13.0
Content-Language: en-US
To: gcc-patches <gcc-patches@gcc.gnu.org>, Jakub Jelinek <jakub@redhat.com>
From: Tobias Burnus <tobias@codesourcery.com>
Subject: [Patch] libgomp: Use libnuma for OpenMP's partition=nearest
 allocation trait
X-Originating-IP: [137.202.0.90]
X-ClientProxiedBy: svr-ies-mbx-12.mgc.mentorg.com (139.181.222.12) To
 svr-ies-mbx-12.mgc.mentorg.com (139.181.222.12)
X-Spam-Status: No, score=-11.3 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, KAM_SHORT, SPF_HELO_PASS,
 SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org>

While by default 'malloc' allocates memory on the same node as the calling
process/thread ('numactl --show' shows 'preferred node: current',
Linux kernel memory policy MPOL_DEFAULT), this can be changed.
For instance, when running the program as follows, 'malloc' now
prefers to allocate on the second node:
   numactl --preferred=1 ./myproc

Thus, it seems to be sensible to provide a means to ensure the 'nearest'
allocation.  The MPOL_LOCAL policy does so, as provided by
libnuma's numa_alloc_local. (Which is just wrapper around the syscalls
mmap and mbind.) As with (lib)memkind, there is a run-time dlopen check
for (lib)numa - and no numa*.h is required when bulding GCC.

The patch assumes that yesterday's patch
   'libgomp: Update OpenMP memory allocation doc, fix omp_high_bw_mem_space'
   https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624030.html
has already been applied. (Which is mostly a .texi only patch, except
for one 'return' -> 'break' change.)

This patch has been bootstrapped and manually tested on x86-64.
It also passed "make check".

Comments, remarks, thoughts?

[I really dislike committing patches without any feedback from others,
but I still intent to do so, if no one comments. This applies to this patch
and the other one.]

Tobias

PS: I have attached a testcase, but as it needs -lnuma, I do not intent
to commit it.  An alternative which could be to do the same as we do in
the patch itself; namely, to use the dlopen handle to obtain the two
libnuma library calls. - I am unsure whether I should do so or
whether I should just leave out the testcase.

Thoughts?
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

libgomp: Use libnuma for OpenMP's partition=nearest allocation trait

libgomp/ChangeLog:

	* allocator.c: Add ifdef for LIBGOMP_USE_LIBNUMA.
	(enum gomp_numa_memkind_kind): Renamed from gomp_memkind_kind;
	add GOMP_MEMKIND_LIBNUMA.
	(struct gomp_libnuma_data, gomp_init_libnuma, gomp_get_libnuma): New.
	(omp_init_allocator): Handle partition=nearest with libnuma if avail.
	(omp_aligned_alloc, omp_free, omp_aligned_calloc, omp_realloc): Add
	numa_alloc_local (+ memset), numa_free, and numa_realloc calls as
	needed.
	* config/linux/allocator.c (LIBGOMP_USE_LIBNUMA): Define
	* libgomp.texi (Memory allocation): Renamed from 'Memory allocation
	with libmemkind'; updated for libnuma usage.

 libgomp/allocator.c              | 202 +++++++++++++++++++++++++++++++++------
 libgomp/config/linux/allocator.c |   1 +
 libgomp/libgomp.texi             |  22 ++++-
 3 files changed, 195 insertions(+), 30 deletions(-)
diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index 25c0f150302..2632f16e132 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -31,13 +31,13 @@
 #include "libgomp.h"
 #include <stdlib.h>
 #include <string.h>
-#ifdef LIBGOMP_USE_MEMKIND
+#if defined(LIBGOMP_USE_MEMKIND) || defined(LIBGOMP_USE_LIBNUMA)
 #include <dlfcn.h>
 #endif
 
 #define omp_max_predefined_alloc omp_thread_mem_alloc
 
-enum gomp_memkind_kind
+enum gomp_numa_memkind_kind
 {
   GOMP_MEMKIND_NONE = 0,
 #define GOMP_MEMKIND_KINDS \
@@ -50,7 +50,8 @@ enum gomp_memkind_kind
 #define GOMP_MEMKIND_KIND(kind) GOMP_MEMKIND_##kind
   GOMP_MEMKIND_KINDS,
 #undef GOMP_MEMKIND_KIND
-  GOMP_MEMKIND_COUNT
+  GOMP_MEMKIND_COUNT,
+  GOMP_MEMKIND_LIBNUMA = GOMP_MEMKIND_COUNT
 };
 
 struct omp_allocator_data
@@ -65,7 +66,7 @@ struct omp_allocator_data
   unsigned int fallback : 8;
   unsigned int pinned : 1;
   unsigned int partition : 7;
-#ifdef LIBGOMP_USE_MEMKIND
+#if defined(LIBGOMP_USE_MEMKIND) || defined(LIBGOMP_USE_LIBNUMA)
   unsigned int memkind : 8;
 #endif
 #ifndef HAVE_SYNC_BUILTINS
@@ -81,6 +82,14 @@ struct omp_mem_header
   void *pad;
 };
 
+struct gomp_libnuma_data
+{
+  void *numa_handle;
+  void *(*numa_alloc_local) (size_t);
+  void *(*numa_realloc) (void *, size_t, size_t);
+  void (*numa_free) (void *, size_t);
+};
+
 struct gomp_memkind_data
 {
   void *memkind_handle;
@@ -92,6 +101,50 @@ struct gomp_memkind_data
   void **kinds[GOMP_MEMKIND_COUNT];
 };
 
+#ifdef LIBGOMP_USE_LIBNUMA
+static struct gomp_libnuma_data *libnuma_data;
+static pthread_once_t libnuma_data_once = PTHREAD_ONCE_INIT;
+
+static void
+gomp_init_libnuma (void)
+{
+  void *handle = dlopen ("libnuma.so.1", RTLD_LAZY);
+  struct gomp_libnuma_data *data;
+
+  data = calloc (1, sizeof (struct gomp_libnuma_data));
+  if (data == NULL)
+    {
+      if (handle)
+	dlclose (handle);
+      return;
+    }
+  if (!handle)
+    {
+      __atomic_store_n (&libnuma_data, data, MEMMODEL_RELEASE);
+      return;
+    }
+  data->numa_handle = handle;
+  data->numa_alloc_local
+    = (__typeof (data->numa_alloc_local)) dlsym (handle, "numa_alloc_local");
+  data->numa_realloc
+    = (__typeof (data->numa_realloc)) dlsym (handle, "numa_realloc");
+  data->numa_free
+    = (__typeof (data->numa_free)) dlsym (handle, "numa_free");
+  __atomic_store_n (&libnuma_data, data, MEMMODEL_RELEASE);
+}
+
+static struct gomp_libnuma_data *
+gomp_get_libnuma (void)
+{
+  struct gomp_libnuma_data *data
+    = __atomic_load_n (&libnuma_data, MEMMODEL_ACQUIRE);
+  if (data)
+    return data;
+  pthread_once (&libnuma_data_once, gomp_init_libnuma);
+  return __atomic_load_n (&libnuma_data, MEMMODEL_ACQUIRE);
+}
+#endif
+
 #ifdef LIBGOMP_USE_MEMKIND
 static struct gomp_memkind_data *memkind_data;
 static pthread_once_t memkind_data_once = PTHREAD_ONCE_INIT;
@@ -166,7 +219,7 @@ omp_init_allocator (omp_memspace_handle_t memspace, int ntraits,
   struct omp_allocator_data data
     = { memspace, 1, ~(uintptr_t) 0, 0, 0, omp_atv_contended, omp_atv_all,
 	omp_atv_default_mem_fb, omp_atv_false, omp_atv_environment,
-#ifdef LIBGOMP_USE_MEMKIND
+#if defined(LIBGOMP_USE_MEMKIND) || defined(LIBGOMP_USE_LIBNUMA)
 	GOMP_MEMKIND_NONE
 #endif
       };
@@ -285,8 +338,8 @@ omp_init_allocator (omp_memspace_handle_t memspace, int ntraits,
 
   switch (memspace)
     {
-    case omp_high_bw_mem_space:
 #ifdef LIBGOMP_USE_MEMKIND
+    case omp_high_bw_mem_space:
       struct gomp_memkind_data *memkind_data;
       memkind_data = gomp_get_memkind ();
       if (data.partition == omp_atv_interleaved
@@ -300,17 +353,15 @@ omp_init_allocator (omp_memspace_handle_t memspace, int ntraits,
 	  data.memkind = GOMP_MEMKIND_HBW_PREFERRED;
 	  break;
 	}
-#endif
       break;
     case omp_large_cap_mem_space:
-#ifdef LIBGOMP_USE_MEMKIND
       memkind_data = gomp_get_memkind ();
       if (memkind_data->kinds[GOMP_MEMKIND_DAX_KMEM_ALL])
 	data.memkind = GOMP_MEMKIND_DAX_KMEM_ALL;
       else if (memkind_data->kinds[GOMP_MEMKIND_DAX_KMEM])
 	data.memkind = GOMP_MEMKIND_DAX_KMEM;
-#endif
       break;
+#endif
     default:
 #ifdef LIBGOMP_USE_MEMKIND
       if (data.partition == omp_atv_interleaved)
@@ -323,6 +374,14 @@ omp_init_allocator (omp_memspace_handle_t memspace, int ntraits,
       break;
     }
 
+#ifdef LIBGOMP_USE_LIBNUMA
+  if (data.memkind == GOMP_MEMKIND_NONE && data.partition == omp_atv_nearest)
+    {
+      data.memkind = GOMP_MEMKIND_LIBNUMA;
+      libnuma_data = gomp_get_libnuma ();
+    }
+#endif
+
   /* No support for this so far.  */
   if (data.pinned)
     return omp_null_allocator;
@@ -357,8 +416,8 @@ omp_aligned_alloc (size_t alignment, size_t size,
   struct omp_allocator_data *allocator_data;
   size_t new_size, new_alignment;
   void *ptr, *ret;
-#ifdef LIBGOMP_USE_MEMKIND
-  enum gomp_memkind_kind memkind;
+#if defined(LIBGOMP_USE_MEMKIND) || defined(LIBGOMP_USE_LIBNUMA)
+  enum gomp_numa_memkind_kind memkind;
 #endif
 
   if (__builtin_expect (size == 0, 0))
@@ -379,7 +438,7 @@ retry:
       allocator_data = (struct omp_allocator_data *) allocator;
       if (new_alignment < allocator_data->alignment)
 	new_alignment = allocator_data->alignment;
-#ifdef LIBGOMP_USE_MEMKIND
+#if defined(LIBGOMP_USE_MEMKIND) || defined(LIBGOMP_USE_LIBNUMA)
       memkind = allocator_data->memkind;
 #endif
     }
@@ -388,8 +447,10 @@ retry:
       allocator_data = NULL;
       if (new_alignment < sizeof (void *))
 	new_alignment = sizeof (void *);
-#ifdef LIBGOMP_USE_MEMKIND
+#if defined(LIBGOMP_USE_MEMKIND) || defined(LIBGOMP_USE_LIBNUMA)
       memkind = GOMP_MEMKIND_NONE;
+#endif
+#ifdef LIBGOMP_USE_MEMKIND
       if (allocator == omp_high_bw_mem_alloc)
 	memkind = GOMP_MEMKIND_HBW_PREFERRED;
       else if (allocator == omp_large_cap_mem_alloc)
@@ -444,6 +505,13 @@ retry:
       allocator_data->used_pool_size = used_pool_size;
       gomp_mutex_unlock (&allocator_data->lock);
 #endif
+#ifdef LIBGOMP_USE_LIBNUMA
+      if (memkind == GOMP_MEMKIND_LIBNUMA)
+	ptr = libnuma_data->numa_alloc_local (new_size);
+# ifdef LIBGOMP_USE_MEMKIND
+      else
+# endif
+#endif
 #ifdef LIBGOMP_USE_MEMKIND
       if (memkind)
 	{
@@ -469,6 +537,13 @@ retry:
     }
   else
     {
+#ifdef LIBGOMP_USE_LIBNUMA
+      if (memkind == GOMP_MEMKIND_LIBNUMA)
+	ptr = libnuma_data->numa_alloc_local (new_size);
+# ifdef LIBGOMP_USE_MEMKIND
+      else
+# endif
+#endif
 #ifdef LIBGOMP_USE_MEMKIND
       if (memkind)
 	{
@@ -502,7 +577,7 @@ fail:
 	{
 	case omp_atv_default_mem_fb:
 	  if ((new_alignment > sizeof (void *) && new_alignment > alignment)
-#ifdef LIBGOMP_USE_MEMKIND
+#if defined(LIBGOMP_USE_MEMKIND) || defined(LIBGOMP_USE_LIBNUMA)
 	      || memkind
 #endif
 	      || (allocator_data
@@ -577,6 +652,16 @@ omp_free (void *ptr, omp_allocator_handle_t allocator)
 	  gomp_mutex_unlock (&allocator_data->lock);
 #endif
 	}
+#ifdef LIBGOMP_USE_LIBNUMA
+      if (allocator_data->memkind == GOMP_MEMKIND_LIBNUMA)
+	{
+	  libnuma_data->numa_free (data->ptr, data->size);
+	  return;
+	}
+# ifdef LIBGOMP_USE_MEMKIND
+      else
+# endif
+#endif
 #ifdef LIBGOMP_USE_MEMKIND
       if (allocator_data->memkind)
 	{
@@ -590,7 +675,7 @@ omp_free (void *ptr, omp_allocator_handle_t allocator)
 #ifdef LIBGOMP_USE_MEMKIND
   else
     {
-      enum gomp_memkind_kind memkind = GOMP_MEMKIND_NONE;
+      enum gomp_numa_memkind_kind memkind = GOMP_MEMKIND_NONE;
       if (data->allocator == omp_high_bw_mem_alloc)
 	memkind = GOMP_MEMKIND_HBW_PREFERRED;
       else if (data->allocator == omp_large_cap_mem_alloc)
@@ -625,8 +710,8 @@ omp_aligned_calloc (size_t alignment, size_t nmemb, size_t size,
   struct omp_allocator_data *allocator_data;
   size_t new_size, size_temp, new_alignment;
   void *ptr, *ret;
-#ifdef LIBGOMP_USE_MEMKIND
-  enum gomp_memkind_kind memkind;
+#if defined(LIBGOMP_USE_MEMKIND) || defined(LIBGOMP_USE_LIBNUMA)
+  enum gomp_numa_memkind_kind memkind;
 #endif
 
   if (__builtin_expect (size == 0 || nmemb == 0, 0))
@@ -647,7 +732,7 @@ retry:
       allocator_data = (struct omp_allocator_data *) allocator;
       if (new_alignment < allocator_data->alignment)
 	new_alignment = allocator_data->alignment;
-#ifdef LIBGOMP_USE_MEMKIND
+#if defined(LIBGOMP_USE_MEMKIND) || defined(LIBGOMP_USE_LIBNUMA)
       memkind = allocator_data->memkind;
 #endif
     }
@@ -656,8 +741,10 @@ retry:
       allocator_data = NULL;
       if (new_alignment < sizeof (void *))
 	new_alignment = sizeof (void *);
-#ifdef LIBGOMP_USE_MEMKIND
+#if defined(LIBGOMP_USE_MEMKIND) || defined(LIBGOMP_USE_LIBNUMA)
       memkind = GOMP_MEMKIND_NONE;
+#endif
+#ifdef LIBGOMP_USE_MEMKIND
       if (allocator == omp_high_bw_mem_alloc)
 	memkind = GOMP_MEMKIND_HBW_PREFERRED;
       else if (allocator == omp_large_cap_mem_alloc)
@@ -714,6 +801,16 @@ retry:
       allocator_data->used_pool_size = used_pool_size;
       gomp_mutex_unlock (&allocator_data->lock);
 #endif
+#ifdef LIBGOMP_USE_LIBNUMA
+      if (memkind == GOMP_MEMKIND_LIBNUMA)
+	{
+	  ptr = libnuma_data->numa_alloc_local (new_size);
+	  memset (ptr, '\0', new_size);
+	}
+# ifdef LIBGOMP_USE_MEMKIND
+      else
+# endif
+#endif
 #ifdef LIBGOMP_USE_MEMKIND
       if (memkind)
 	{
@@ -739,6 +836,16 @@ retry:
     }
   else
     {
+#ifdef LIBGOMP_USE_LIBNUMA
+      if (memkind == GOMP_MEMKIND_LIBNUMA)
+	{
+	  ptr = libnuma_data->numa_alloc_local (new_size);
+	  memset (ptr, '\0', new_size);
+	}
+# ifdef LIBGOMP_USE_MEMKIND
+      else
+# endif
+#endif
 #ifdef LIBGOMP_USE_MEMKIND
       if (memkind)
 	{
@@ -772,7 +879,7 @@ fail:
 	{
 	case omp_atv_default_mem_fb:
 	  if ((new_alignment > sizeof (void *) && new_alignment > alignment)
-#ifdef LIBGOMP_USE_MEMKIND
+#if defined(LIBGOMP_USE_MEMKIND) || defined(LIBGOMP_USE_LIBNUMA)
 	      || memkind
 #endif
 	      || (allocator_data
@@ -815,8 +922,8 @@ omp_realloc (void *ptr, size_t size, omp_allocator_handle_t allocator,
   size_t new_size, old_size, new_alignment, old_alignment;
   void *new_ptr, *ret;
   struct omp_mem_header *data;
-#ifdef LIBGOMP_USE_MEMKIND
-  enum gomp_memkind_kind memkind, free_memkind;
+#if defined(LIBGOMP_USE_MEMKIND) || defined(LIBGOMP_USE_LIBNUMA)
+  enum gomp_numa_memkind_kind memkind, free_memkind;
 #endif
 
   if (__builtin_expect (ptr == NULL, 0))
@@ -841,15 +948,17 @@ retry:
       allocator_data = (struct omp_allocator_data *) allocator;
       if (new_alignment < allocator_data->alignment)
 	new_alignment = allocator_data->alignment;
-#ifdef LIBGOMP_USE_MEMKIND
+#if defined(LIBGOMP_USE_MEMKIND) || defined(LIBGOMP_USE_LIBNUMA)
       memkind = allocator_data->memkind;
 #endif
     }
   else
     {
       allocator_data = NULL;
-#ifdef LIBGOMP_USE_MEMKIND
+#if defined(LIBGOMP_USE_MEMKIND) || defined(LIBGOMP_USE_LIBNUMA)
       memkind = GOMP_MEMKIND_NONE;
+#endif
+#ifdef LIBGOMP_USE_MEMKIND
       if (allocator == omp_high_bw_mem_alloc)
 	memkind = GOMP_MEMKIND_HBW_PREFERRED;
       else if (allocator == omp_large_cap_mem_alloc)
@@ -865,15 +974,17 @@ retry:
   if (free_allocator > omp_max_predefined_alloc)
     {
       free_allocator_data = (struct omp_allocator_data *) free_allocator;
-#ifdef LIBGOMP_USE_MEMKIND
+#if defined(LIBGOMP_USE_MEMKIND) || defined(LIBGOMP_USE_LIBNUMA)
       free_memkind = free_allocator_data->memkind;
 #endif
     }
   else
     {
       free_allocator_data = NULL;
-#ifdef LIBGOMP_USE_MEMKIND
+#if defined(LIBGOMP_USE_MEMKIND) || defined(LIBGOMP_USE_LIBNUMA)
       free_memkind = GOMP_MEMKIND_NONE;
+#endif
+#ifdef LIBGOMP_USE_MEMKIND
       if (free_allocator == omp_high_bw_mem_alloc)
 	free_memkind = GOMP_MEMKIND_HBW_PREFERRED;
       else if (free_allocator == omp_large_cap_mem_alloc)
@@ -953,6 +1064,19 @@ retry:
       allocator_data->used_pool_size = used_pool_size;
       gomp_mutex_unlock (&allocator_data->lock);
 #endif
+#ifdef LIBGOMP_USE_LIBNUMA
+      if (memkind == GOMP_MEMKIND_LIBNUMA)
+	{
+	  if (prev_size)
+	    new_ptr = libnuma_data->numa_realloc (data->ptr, data->size,
+						  new_size);
+	  else
+	    new_ptr = libnuma_data->numa_alloc_local (new_size);
+	}
+# ifdef LIBGOMP_USE_MEMKIND
+      else
+# endif
+#endif
 #ifdef LIBGOMP_USE_MEMKIND
       if (memkind)
 	{
@@ -1000,6 +1124,13 @@ retry:
 	   && (free_allocator_data == NULL
 	       || free_allocator_data->pool_size == ~(uintptr_t) 0))
     {
+#ifdef LIBGOMP_USE_LIBNUMA
+      if (memkind == GOMP_MEMKIND_LIBNUMA)
+	new_ptr = libnuma_data->numa_realloc (data->ptr, data->size, new_size);
+# ifdef LIBGOMP_USE_MEMKIND
+      else
+# endif
+#endif
 #ifdef LIBGOMP_USE_MEMKIND
       if (memkind)
 	{
@@ -1021,6 +1152,13 @@ retry:
     }
   else
     {
+#ifdef LIBGOMP_USE_LIBNUMA
+      if (memkind == GOMP_MEMKIND_LIBNUMA)
+	new_ptr = libnuma_data->numa_alloc_local (new_size);
+# ifdef LIBGOMP_USE_MEMKIND
+      else
+# endif
+#endif
 #ifdef LIBGOMP_USE_MEMKIND
       if (memkind)
 	{
@@ -1060,6 +1198,16 @@ retry:
       gomp_mutex_unlock (&free_allocator_data->lock);
 #endif
     }
+#ifdef LIBGOMP_USE_LIBNUMA
+      if (memkind == GOMP_MEMKIND_LIBNUMA)
+	{
+	  libnuma_data->numa_free (data->ptr, data->size);
+	  return ret;
+	}
+# ifdef LIBGOMP_USE_MEMKIND
+      else
+# endif
+#endif
 #ifdef LIBGOMP_USE_MEMKIND
   if (free_memkind)
     {
@@ -1079,7 +1227,7 @@ fail:
 	{
 	case omp_atv_default_mem_fb:
 	  if (new_alignment > sizeof (void *)
-#ifdef LIBGOMP_USE_MEMKIND
+#if defined(LIBGOMP_USE_MEMKIND) || defined(LIBGOMP_USE_LIBNUMA)
 	      || memkind
 #endif
 	      || (allocator_data
diff --git a/libgomp/config/linux/allocator.c b/libgomp/config/linux/allocator.c
index 15babcd1ada..64b1b4b9623 100644
--- a/libgomp/config/linux/allocator.c
+++ b/libgomp/config/linux/allocator.c
@@ -31,6 +31,7 @@
 #include "libgomp.h"
 #if defined(PLUGIN_SUPPORT) && defined(LIBGOMP_USE_PTHREADS)
 #define LIBGOMP_USE_MEMKIND
+#define LIBGOMP_USE_LIBNUMA
 #endif
 
 #include "../../allocator.c"
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index b1f58e74903..40328456a1d 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -4584,7 +4584,7 @@ offloading devices (it's not clear if they should be):
 @menu
 * Implementation-defined ICV Initialization::
 * OpenMP Context Selectors::
-* Memory allocation with libmemkind::
+* Memory allocation::
 @end menu
 
 @node Implementation-defined ICV Initialization
@@ -4631,8 +4631,8 @@ smaller number.  On non-host devices, the value of the
       @tab See @code{-march=} in ``Nvidia PTX Options''
 @end multitable
 
-@node Memory allocation with libmemkind
-@section Memory allocation with libmemkind
+@node Memory allocation
+@section Memory allocation
 
 For the memory spaces, the following applies:
 @itemize
@@ -4656,6 +4656,22 @@ creating memory allocators requesting
       @code{omp_large_cap_mem_space} the allocation will not be interleaved
 @end itemize
 
+On Linux systems, where the @uref{https://github.com/numactl/numactl, numa
+library} (@code{libnuma.so.1}) is available at runtime, it used when creating
+memory allocators requesting
+
+@itemize
+@item the partition trait @code{omp_atv_nearest}, except when the libmemkind
+library is available and the memory space is either
+@code{omp_large_cap_mem_space} or @code{omp_high_bw_mem_space}
+@end itemize
+
+Note that the numa library will round up the allocation size to a multiple of
+the system page size; therefore, consider using it only with large data or
+by sharing allocations by using the @code{pool_size} trait.  Additionally,
+the numa library does not guarantee that for reallocations the same node will
+be used.
+
 Additional notes:
 @itemize
 @item The @code{pinned} trait is unsupported.