diff mbox series

[ovs-dev,v9,2/2] netdev-afxdp: NUMA-aware memory allocation for XSK related memory

Message ID 20200104011326.92161-2-yihung.wei@gmail.com
State Accepted
Delegated to: Ilya Maximets
Headers show
Series [ovs-dev,v9,1/2] netdev-linux: Detect numa node id. | expand

Commit Message

Yi-Hung Wei Jan. 4, 2020, 1:13 a.m. UTC
Currently, the AF_XDP socket (XSK) related memory are allocated by main
thread in the main thread's NUMA domain.  With the patch that detects
netdev-linux's NUMA node id, the PMD thread of AF_XDP port will be run on
the AF_XDP netdev's NUMA domain.  If the net device's NUMA domain
is different from the main thread's NUMA domain, we will have two
cross-NUMA memory accesses (netdev <-> memory, memory <-> CPU).

This patch addresses the aforementioned issue by allocating
the memory in the net device's NUMA domain.

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
---
v9:
  - Addreess review comments from Ilya in patch 2.
    * Add check on numa_available()
    * Properly restore memory policy with get/set_mempolicy.

v8:
  - Addreess review comments from Eelco and Ilya in patch 2.
    * Use OVS_FIND_DEPENDENCY().
    * Avoid the locking issue when calling netdev_get_numa_id().
    * Check NETDEV_NUMA_UNSPEC.
    * Use return value from netdev_get_numa_id() directly, and
      check NETDEV_NUMA_UNSPEC case.
    * Use numa_set_preferred().

---
 Documentation/intro/install/afxdp.rst |  2 +-
 acinclude.m4                          |  2 ++
 include/sparse/automake.mk            |  1 +
 include/sparse/numa.h                 | 27 +++++++++++++++++++++++++++
 lib/netdev-afxdp.c                    | 26 ++++++++++++++++++++++++++
 5 files changed, 57 insertions(+), 1 deletion(-)
 create mode 100644 include/sparse/numa.h

Comments

William Tu Jan. 16, 2020, 7:01 p.m. UTC | #1
On Fri, Jan 3, 2020 at 5:13 PM Yi-Hung Wei <yihung.wei@gmail.com> wrote:
>
> Currently, the AF_XDP socket (XSK) related memory are allocated by main
> thread in the main thread's NUMA domain.  With the patch that detects
> netdev-linux's NUMA node id, the PMD thread of AF_XDP port will be run on
> the AF_XDP netdev's NUMA domain.  If the net device's NUMA domain
> is different from the main thread's NUMA domain, we will have two
> cross-NUMA memory accesses (netdev <-> memory, memory <-> CPU).
>
> This patch addresses the aforementioned issue by allocating
> the memory in the net device's NUMA domain.
>
> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>

Thanks for working on this!
Acked-by: William Tu <u9012063@gmail.com>
Ilya Maximets Jan. 17, 2020, 10:58 p.m. UTC | #2
On 04.01.2020 02:13, Yi-Hung Wei wrote:
> Currently, the AF_XDP socket (XSK) related memory are allocated by main
> thread in the main thread's NUMA domain.  With the patch that detects
> netdev-linux's NUMA node id, the PMD thread of AF_XDP port will be run on
> the AF_XDP netdev's NUMA domain.  If the net device's NUMA domain
> is different from the main thread's NUMA domain, we will have two
> cross-NUMA memory accesses (netdev <-> memory, memory <-> CPU).
> 
> This patch addresses the aforementioned issue by allocating
> the memory in the net device's NUMA domain.
> 
> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>


Suggesting following incremental for both patches:

---
diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
index 0e43c09ee..ae55944d4 100644
--- a/lib/netdev-afxdp.c
+++ b/lib/netdev-afxdp.c
@@ -674,7 +674,7 @@ netdev_afxdp_reconfigure(struct netdev *netdev)
     /* Allocate all the xsk related memory in the netdev's NUMA domain. */
     struct bitmask *old_bm = NULL;
     int old_policy, numa_id;
-    if (numa_available() != -1) {
+    if (numa_available() != -1 && ovs_numa_get_n_numas() > 1) {
         numa_id = netdev_get_numa_id(netdev);
         if (numa_id != NETDEV_NUMA_UNSPEC) {
             old_bm = numa_allocate_nodemask();
@@ -723,6 +723,9 @@ out:
     if (old_bm) {
         if (set_mempolicy(old_policy, old_bm->maskp, old_bm->size + 1)) {
             VLOG_WARN("Failed to restore NUMA memory policy.");
+            /* Can't restore correctly.  Try to use localalloc as the most
+             * likely default memory policy. */
+            numa_set_localalloc();
         }
         numa_bitmask_free(old_bm);
     }
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index af2a34aa9..e1ef58bef 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -1459,6 +1459,11 @@ netdev_linux_get_numa_id__(struct netdev_linux *netdev)
     netdev->numa_id = 0;
     netdev->cache_valid |= VALID_NUMA_ID;
 
+    if (ovs_numa_get_n_numas() < 2) {
+        /* No need to check on system with a single NUMA node. */
+        return 0;
+    }
+
     name = netdev_get_name(&netdev->up);
     if (strpbrk(name, "/\\")) {
         VLOG_ERR_RL(&rl, "\"%s\" is not a valid name for a port. "

---


It solves 3 issues:
1. Avoiding warning while using physical device on system without NUMA topology.
2. Attempt to restore memory policy to default and less likely to fail in case
   real restoring failed.
3. Saving some time by avoiding checking NUMA node on system without NUMA.


If you're OK with that, I could squash this in while applying the patch.

Best regards, Ilya Maximets.
William Tu Jan. 17, 2020, 11:19 p.m. UTC | #3
On Fri, Jan 17, 2020 at 2:58 PM Ilya Maximets <i.maximets@ovn.org> wrote:
>
> On 04.01.2020 02:13, Yi-Hung Wei wrote:
> > Currently, the AF_XDP socket (XSK) related memory are allocated by main
> > thread in the main thread's NUMA domain.  With the patch that detects
> > netdev-linux's NUMA node id, the PMD thread of AF_XDP port will be run on
> > the AF_XDP netdev's NUMA domain.  If the net device's NUMA domain
> > is different from the main thread's NUMA domain, we will have two
> > cross-NUMA memory accesses (netdev <-> memory, memory <-> CPU).
> >
> > This patch addresses the aforementioned issue by allocating
> > the memory in the net device's NUMA domain.
> >
> > Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
>
>
> Suggesting following incremental for both patches:
>
> ---
> diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
> index 0e43c09ee..ae55944d4 100644
> --- a/lib/netdev-afxdp.c
> +++ b/lib/netdev-afxdp.c
> @@ -674,7 +674,7 @@ netdev_afxdp_reconfigure(struct netdev *netdev)
>      /* Allocate all the xsk related memory in the netdev's NUMA domain. */
>      struct bitmask *old_bm = NULL;
>      int old_policy, numa_id;
> -    if (numa_available() != -1) {
> +    if (numa_available() != -1 && ovs_numa_get_n_numas() > 1) {
>          numa_id = netdev_get_numa_id(netdev);
>          if (numa_id != NETDEV_NUMA_UNSPEC) {
>              old_bm = numa_allocate_nodemask();
> @@ -723,6 +723,9 @@ out:
>      if (old_bm) {
>          if (set_mempolicy(old_policy, old_bm->maskp, old_bm->size + 1)) {
>              VLOG_WARN("Failed to restore NUMA memory policy.");
> +            /* Can't restore correctly.  Try to use localalloc as the most
> +             * likely default memory policy. */
> +            numa_set_localalloc();
>          }
>          numa_bitmask_free(old_bm);
>      }
> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
> index af2a34aa9..e1ef58bef 100644
> --- a/lib/netdev-linux.c
> +++ b/lib/netdev-linux.c
> @@ -1459,6 +1459,11 @@ netdev_linux_get_numa_id__(struct netdev_linux *netdev)
>      netdev->numa_id = 0;
>      netdev->cache_valid |= VALID_NUMA_ID;
>
> +    if (ovs_numa_get_n_numas() < 2) {
> +        /* No need to check on system with a single NUMA node. */
> +        return 0;
> +    }
> +
>      name = netdev_get_name(&netdev->up);
>      if (strpbrk(name, "/\\")) {
>          VLOG_ERR_RL(&rl, "\"%s\" is not a valid name for a port. "
>
> ---
>
>
> It solves 3 issues:
> 1. Avoiding warning while using physical device on system without NUMA topology.
> 2. Attempt to restore memory policy to default and less likely to fail in case
>    real restoring failed.
> 3. Saving some time by avoiding checking NUMA node on system without NUMA.
>
>
> If you're OK with that, I could squash this in while applying the patch.
>

Hi Ilya,

Thanks for the diff!
Yi-Hung is on vacation, and yes, the change looks good to me.

Regards,
William
Ilya Maximets Jan. 18, 2020, 1:23 a.m. UTC | #4
On 18.01.2020 00:19, William Tu wrote:
> On Fri, Jan 17, 2020 at 2:58 PM Ilya Maximets <i.maximets@ovn.org> wrote:
>>
>> On 04.01.2020 02:13, Yi-Hung Wei wrote:
>>> Currently, the AF_XDP socket (XSK) related memory are allocated by main
>>> thread in the main thread's NUMA domain.  With the patch that detects
>>> netdev-linux's NUMA node id, the PMD thread of AF_XDP port will be run on
>>> the AF_XDP netdev's NUMA domain.  If the net device's NUMA domain
>>> is different from the main thread's NUMA domain, we will have two
>>> cross-NUMA memory accesses (netdev <-> memory, memory <-> CPU).
>>>
>>> This patch addresses the aforementioned issue by allocating
>>> the memory in the net device's NUMA domain.
>>>
>>> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
>>
>>
>> Suggesting following incremental for both patches:
>>
>> ---
>> diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
>> index 0e43c09ee..ae55944d4 100644
>> --- a/lib/netdev-afxdp.c
>> +++ b/lib/netdev-afxdp.c
>> @@ -674,7 +674,7 @@ netdev_afxdp_reconfigure(struct netdev *netdev)
>>      /* Allocate all the xsk related memory in the netdev's NUMA domain. */
>>      struct bitmask *old_bm = NULL;
>>      int old_policy, numa_id;
>> -    if (numa_available() != -1) {
>> +    if (numa_available() != -1 && ovs_numa_get_n_numas() > 1) {
>>          numa_id = netdev_get_numa_id(netdev);
>>          if (numa_id != NETDEV_NUMA_UNSPEC) {
>>              old_bm = numa_allocate_nodemask();
>> @@ -723,6 +723,9 @@ out:
>>      if (old_bm) {
>>          if (set_mempolicy(old_policy, old_bm->maskp, old_bm->size + 1)) {
>>              VLOG_WARN("Failed to restore NUMA memory policy.");
>> +            /* Can't restore correctly.  Try to use localalloc as the most
>> +             * likely default memory policy. */
>> +            numa_set_localalloc();
>>          }
>>          numa_bitmask_free(old_bm);
>>      }
>> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
>> index af2a34aa9..e1ef58bef 100644
>> --- a/lib/netdev-linux.c
>> +++ b/lib/netdev-linux.c
>> @@ -1459,6 +1459,11 @@ netdev_linux_get_numa_id__(struct netdev_linux *netdev)
>>      netdev->numa_id = 0;
>>      netdev->cache_valid |= VALID_NUMA_ID;
>>
>> +    if (ovs_numa_get_n_numas() < 2) {
>> +        /* No need to check on system with a single NUMA node. */
>> +        return 0;
>> +    }
>> +
>>      name = netdev_get_name(&netdev->up);
>>      if (strpbrk(name, "/\\")) {
>>          VLOG_ERR_RL(&rl, "\"%s\" is not a valid name for a port. "
>>
>> ---
>>
>>
>> It solves 3 issues:
>> 1. Avoiding warning while using physical device on system without NUMA topology.
>> 2. Attempt to restore memory policy to default and less likely to fail in case
>>    real restoring failed.
>> 3. Saving some time by avoiding checking NUMA node on system without NUMA.
>>
>>
>> If you're OK with that, I could squash this in while applying the patch.
>>
> 
> Hi Ilya,
> 
> Thanks for the diff!
> Yi-Hung is on vacation, and yes, the change looks good to me.

Thanks!  I folded this in and applied both patches to master.

Best regards, Ilya Maximets.
diff mbox series

Patch

diff --git a/Documentation/intro/install/afxdp.rst b/Documentation/intro/install/afxdp.rst
index 7b0736c96114..c4685fa7ebac 100644
--- a/Documentation/intro/install/afxdp.rst
+++ b/Documentation/intro/install/afxdp.rst
@@ -164,7 +164,7 @@  If a test case fails, check the log at::
 
 Setup AF_XDP netdev
 -------------------
-Before running OVS with AF_XDP, make sure the libbpf and libelf are
+Before running OVS with AF_XDP, make sure the libbpf, libelf, and libnuma are
 set-up right::
 
   ldd vswitchd/ovs-vswitchd
diff --git a/acinclude.m4 b/acinclude.m4
index 542637ac8cb8..f73dc9bf7e3c 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -286,6 +286,8 @@  AC_DEFUN([OVS_CHECK_LINUX_AF_XDP], [
     AC_CHECK_FUNCS([pthread_spin_lock], [],
       [AC_MSG_ERROR([unable to find pthread_spin_lock for AF_XDP support])])
 
+    OVS_FIND_DEPENDENCY([numa_alloc_onnode], [numa], [libnuma])
+
     AC_DEFINE([HAVE_AF_XDP], [1],
               [Define to 1 if AF_XDP support is available and enabled.])
     LIBBPF_LDADD=" -lbpf -lelf"
diff --git a/include/sparse/automake.mk b/include/sparse/automake.mk
index 073631e8c082..974ad3fe55f7 100644
--- a/include/sparse/automake.mk
+++ b/include/sparse/automake.mk
@@ -5,6 +5,7 @@  noinst_HEADERS += \
         include/sparse/bits/floatn.h \
         include/sparse/assert.h \
         include/sparse/math.h \
+        include/sparse/numa.h \
         include/sparse/netinet/in.h \
         include/sparse/netinet/ip6.h \
         include/sparse/netpacket/packet.h \
diff --git a/include/sparse/numa.h b/include/sparse/numa.h
new file mode 100644
index 000000000000..3691a0eaf729
--- /dev/null
+++ b/include/sparse/numa.h
@@ -0,0 +1,27 @@ 
+/*
+ * Copyright (c) 2019 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef __CHECKER__
+#error "Use this header only with sparse.  It is not a correct implementation."
+#endif
+
+/* Avoid sparse warning: non-ANSI function declaration of function" */
+#define numa_get_membind_compat() numa_get_membind_compat(void)
+#define numa_get_interleave_mask_compat() numa_get_interleave_mask_compat(void)
+#define numa_get_run_node_mask_compat() numa_get_run_node_mask_compat(void)
+
+/* Get actual <numa.h> definitions for us to annotate and build on. */
+#include_next<numa.h>
diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
index 426dce944977..fad0aab9d550 100644
--- a/lib/netdev-afxdp.c
+++ b/lib/netdev-afxdp.c
@@ -26,6 +26,8 @@ 
 #include <linux/rtnetlink.h>
 #include <linux/if_xdp.h>
 #include <net/if.h>
+#include <numa.h>
+#include <numaif.h>
 #include <poll.h>
 #include <stdlib.h>
 #include <sys/resource.h>
@@ -669,6 +671,24 @@  netdev_afxdp_reconfigure(struct netdev *netdev)
     struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
     int err = 0;
 
+    /* Allocate all the xsk related memory in the netdev's NUMA domain. */
+    struct bitmask *old_bm = NULL;
+    int old_policy, numa_id;
+    if (numa_available() != -1) {
+        numa_id = netdev_get_numa_id(netdev);
+        if (numa_id != NETDEV_NUMA_UNSPEC) {
+            old_bm = numa_allocate_nodemask();
+            if (get_mempolicy(&old_policy, old_bm->maskp, old_bm->size + 1,
+                              NULL, 0)) {
+                VLOG_INFO("Failed to get NUMA memory policy.");
+                numa_bitmask_free(old_bm);
+                old_bm = NULL;
+            } else {
+                numa_set_preferred(numa_id);
+            }
+        }
+    }
+
     ovs_mutex_lock(&dev->mutex);
 
     if (netdev->n_rxq == dev->requested_n_rxq
@@ -700,6 +720,12 @@  netdev_afxdp_reconfigure(struct netdev *netdev)
     netdev_change_seq_changed(netdev);
 out:
     ovs_mutex_unlock(&dev->mutex);
+    if (old_bm) {
+        if (set_mempolicy(old_policy, old_bm->maskp, old_bm->size + 1)) {
+            VLOG_WARN("Failed to restore NUMA memory policy.");
+        }
+        numa_bitmask_free(old_bm);
+    }
     return err;
 }