
[ovs-dev,v6,1/5] netdev-dpdk: fix management of pre-existing mempools.

Message ID 1508159716-3656-2-git-send-email-antonio.fischetti@intel.com
State Superseded
Series netdev-dpdk: Fix mempool management and other cleanup.

Commit Message

Fischetti, Antonio Oct. 16, 2017, 1:15 p.m. UTC
Fix issues on reconfiguration of pre-existing mempools.
This patch avoids calling dpdk_mp_put() - and erroneously
releasing the mempool - when it already exists.
Create mempool names by also considering the NUMA socket number,
so that a name reflects which socket the mempool is allocated on.
This change is needed for the NUMA-awareness feature.

CC: Mark B Kavanagh <mark.b.kavanagh@intel.com>
CC: Kevin Traynor <ktraynor@redhat.com>
CC: Aaron Conole <aconole@redhat.com>
CC: Darrell Ball <dlu998@gmail.com>
Reported-by: Ciara Loftus <ciara.loftus@intel.com>
Tested-by: Ciara Loftus <ciara.loftus@intel.com>
Reported-by: Róbert Mulik <robert.mulik@ericsson.com>
Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each port.")
Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
---
 Test on releasing pre-existing mempools
 =======================================
I've tested this patch by
  - changing the number of Rx queues at run-time:
      ovs-vsctl set Interface dpdk0 type=dpdk options:n_rxq=4

  - reducing the MTU of the dpdk ports by 1 byte to force
    the configuration of an existing mempool:
      ovs-vsctl set Interface dpdk0 mtu_request=1499

This issue was observed in a PVP test topology with dpdkvhostuserclient
ports. It can also happen with dpdk type ports, e.g. by reducing the MTU
by 1 byte.

To replicate the bug scenario in the PVP case it is sufficient to
set up 1 dpdkvhostuserclient port and just boot the VM.

Below are some more details on my test setup.

 PVP test setup
 --------------
CLIENT_SOCK_DIR=/tmp
SOCK0=dpdkvhostuser0
SOCK1=dpdkvhostuser1

1 PMD
Add 2 dpdk ports, n_rxq=1
Add 2 vhu ports both of type dpdkvhostuserclient and specify vhost-server-path
 ovs-vsctl set Interface dpdkvhostuser0 options:vhost-server-path="$CLIENT_SOCK_DIR/$SOCK0"
 ovs-vsctl set Interface dpdkvhostuser1 options:vhost-server-path="$CLIENT_SOCK_DIR/$SOCK1"

Set port-based rules: dpdk0 <--> vhu0 and dpdk1 <--> vhu1
 add-flow br0 in_port=1,action=output:3
 add-flow br0 in_port=3,action=output:1
 add-flow br0 in_port=4,action=output:2
 add-flow br0 in_port=2,action=output:4

 Launch QEMU
 -----------
As OvS vhu ports are acting as clients, we must specify 'server' in the next command.
VM_IMAGE=<path/to/your/vm/image>

 sudo -E taskset 0x3F00 $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64 -name us-vhost-vm1 -cpu host -enable-kvm -m 4096M -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on -numa node,memdev=mem -mem-prealloc -smp 4 -drive file=$VM_IMAGE -chardev socket,id=char0,path=$CLIENT_SOCK_DIR/$SOCK0,server -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=off -chardev socket,id=char1,path=$CLIENT_SOCK_DIR/$SOCK1,server -netdev type=vhost-user,id=mynet2,chardev=char1,vhostforce -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2,mrg_rxbuf=off --nographic

 Expected behavior
 -----------------
With this fix OvS shouldn't crash.

 Test NUMA-Awareness feature
 ===========================
Mempool names now contain the requested socket id and look like:
"ovs_4adb057e_1_2030_20512".

Tested with DPDK 17.05.2 (from dpdk-stable branch).
NUMA-awareness feature enabled (DPDK/config/common_base).

Created a single dpdkvhostuser port.
OvS pmd-cpu-mask=FF00003     # enable cores on both numa nodes
QEMU core mask = 0xFC000     # cores for qemu on numa node 1 only

 Before launching the VM:
 ------------------------
ovs-appctl dpif-netdev/pmd-rxq-show
shows core #1 is serving the vhu port.

pmd thread numa_id 0 core_id 1:
        isolated : false
        port: dpdkvhostuser0    queue-id: 0

 After launching the VM:
 -----------------------
the vhu port is now managed by core #27:
pmd thread numa_id 1 core_id 27:
        isolated : false
        port: dpdkvhostuser0    queue-id: 0

and the log shows a new mempool is allocated on NUMA node 1, while
the previous one is deleted:

2017-10-06T14:04:55Z|00105|netdev_dpdk|DBG|Allocated "ovs_4adb057e_1_2030_20512" mempool with 20512 mbufs
2017-10-06T14:04:55Z|00106|netdev_dpdk|DBG|Releasing "ovs_4adb057e_0_2030_20512" mempool
---
 lib/netdev-dpdk.c | 42 +++++++++++++++++++++++++-----------------
 1 file changed, 25 insertions(+), 17 deletions(-)

Comments

Mark Kavanagh Oct. 17, 2017, 1:33 p.m. UTC | #1
>From: Fischetti, Antonio

>Sent: Monday, October 16, 2017 2:15 PM

>To: dev@openvswitch.org

>Cc: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Kevin Traynor

><ktraynor@redhat.com>; Aaron Conole <aconole@redhat.com>; Darrell Ball

><dlu998@gmail.com>; Fischetti, Antonio <antonio.fischetti@intel.com>

>Subject: [PATCH v6 1/5] netdev-dpdk: fix management of pre-existing mempools.

>

>Fix issues on reconfiguration of pre-existing mempools.

>This patch avoids to call dpdk_mp_put() - and erroneously

>release the mempool - when it already exists.

>Create mempool names by considering also the NUMA socket number.

>So a name reflects what socket the mempool is allocated on.

>This change is needed for the NUMA-awareness feature.


Hi Antonio,

Is there any particular reason why you've combined patches 1 and 2 of the previous series in a single patch here?

I would have thought that these two separate issues would warrant two individual patches (particularly with respect to the reported-by, tested-by tags).

Maybe it's not a big deal, but noted here nonetheless.

Apart from that, there are some comments inline.

Thanks again,
Mark

>

>CC: Mark B Kavanagh <mark.b.kavanagh@intel.com>

>CC: Kevin Traynor <ktraynor@redhat.com>

>CC: Aaron Conole <aconole@redhat.com>

>CC: Darrell Ball <dlu998@gmail.com>

>Reported-by: Ciara Loftus <ciara.loftus@intel.com>

>Tested-by: Ciara Loftus <ciara.loftus@intel.com>

>Reported-by: Róbert Mulik <robert.mulik@ericsson.com>

>Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each

>port.")

>Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>

>---

> Test on releasing pre-existing mempools

> =======================================

>I've tested this patch by

>  - changing at run-time the number of Rx queues:

>      ovs-vsctl set Interface dpdk0 type=dpdk options:n_rxq=4

>

>  - reducing the MTU of the dpdk ports of 1 byte to force

>    the configuration of an existing mempool:

>      ovs-vsctl set Interface dpdk0 mtu_request=1499

>

>This issue was observed in a PVP test topology with dpdkvhostuserclient

>ports. It can happen also with dpdk type ports, eg by reducing the MTU

>of 1 byte.

>

>To replicate the bug scenario in the PVP case it's sufficient to

>set 1 dpdkvhostuserclient port, and just boot the VM.

>

>Below some more details on my own test setup.

>

> PVP test setup

> --------------

>CLIENT_SOCK_DIR=/tmp

>SOCK0=dpdkvhostuser0

>SOCK1=dpdkvhostuser1

>

>1 PMD

>Add 2 dpdk ports, n_rxq=1

>Add 2 vhu ports both of type dpdkvhostuserclient and specify vhost-server-path

> ovs-vsctl set Interface dpdkvhostuser0 options:vhost-server-

>path="$CLIENT_SOCK_DIR/$SOCK0"

> ovs-vsctl set Interface dpdkvhostuser1 options:vhost-server-

>path="$CLIENT_SOCK_DIR/$SOCK1"

>

>Set port-based rules: dpdk0 <--> vhu0 and dpdk1 <--> vhu1

> add-flow br0 in_port=1,action=output:3

> add-flow br0 in_port=3,action=output:1

> add-flow br0 in_port=4,action=output:2

> add-flow br0 in_port=2,action=output:4

>

> Launch QEMU

> -----------

>As OvS vhu ports are acting as clients, we must specify 'server' in the next

>command.

>VM_IMAGE=<path/to/your/vm/image>

>

> sudo -E taskset 0x3F00 $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64 -name us-

>vhost-vm1 -cpu host -enable-kvm -m 4096M -object memory-backend-

>file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on -numa node,memdev=mem

>-mem-prealloc -smp 4 -drive file=$VM_IMAGE -chardev

>socket,id=char0,path=$CLIENT_SOCK_DIR/$SOCK0,server -netdev type=vhost-

>user,id=mynet1,chardev=char0,vhostforce -device virtio-net-

>pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=off -chardev

>socket,id=char1,path=$CLIENT_SOCK_DIR/$SOCK1,server -netdev type=vhost-

>user,id=mynet2,chardev=char1,vhostforce -device virtio-net-

>pci,mac=00:00:00:00:00:02,netdev=mynet2,mrg_rxbuf=off --nographic

>

> Expected behavior

> -----------------

>With this fix OvS shouldn't crash.

>

> Test NUMA-Awareness feature

> ===========================

>Mempool names now contains the requested socket id and become like:

>"ovs_4adb057e_1_2030_20512".

>

>Tested with DPDK 17.05.2 (from dpdk-stable branch).

>NUMA-awareness feature enabled (DPDK/config/common_base).

>

>Created 1 single dpdkvhostuser port type.

>OvS pmd-cpu-mask=FF00003     # enable cores on both numa nodes

>QEMU core mask = 0xFC000     # cores for qemu on numa node 1 only

>

> Before launching the VM:

> ------------------------

>ovs-appctl dpif-netdev/pmd-rxq-show

>shows core #1 is serving the vhu port.

>

>pmd thread numa_id 0 core_id 1:

>        isolated : false

>        port: dpdkvhostuser0    queue-id: 0

>

> After launching the VM:

> -----------------------

>the vhu port is now managed by core #27

>pmd thread numa_id 1 core_id 27:

>        isolated : false

>        port: dpdkvhostuser0    queue-id: 0

>

>and the log shows a new mempool is allocated on NUMA node 1, while

>the previous one is deleted:

>

>2017-10-06T14:04:55Z|00105|netdev_dpdk|DBG|Allocated

>"ovs_4adb057e_1_2030_20512" mempool with 20512 mbufs

>2017-10-06T14:04:55Z|00106|netdev_dpdk|DBG|Releasing

>"ovs_4adb057e_0_2030_20512" mempool

>---

> lib/netdev-dpdk.c | 42 +++++++++++++++++++++++++-----------------

> 1 file changed, 25 insertions(+), 17 deletions(-)

>

>diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c

>index c60f46f..7f2d7ed 100644

>--- a/lib/netdev-dpdk.c

>+++ b/lib/netdev-dpdk.c

>@@ -499,8 +499,8 @@ dpdk_mp_name(struct dpdk_mp *dmp)

> {

>     uint32_t h = hash_string(dmp->if_name, 0);

>     char *mp_name = xcalloc(RTE_MEMPOOL_NAMESIZE, sizeof *mp_name);

>-    int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, "ovs_%x_%d_%u",

>-                       h, dmp->mtu, dmp->mp_size);

>+    int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, "ovs_%x_%d_%d_%u",

>+                       h, dmp->socket_id, dmp->mtu, dmp->mp_size);

>     if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {

>         return NULL;

>     }

>@@ -508,12 +508,13 @@ dpdk_mp_name(struct dpdk_mp *dmp)

> }

>

> static struct dpdk_mp *

>-dpdk_mp_create(struct netdev_dpdk *dev, int mtu)

>+dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool *mp_exists)

> {

>     struct dpdk_mp *dmp = dpdk_rte_mzalloc(sizeof *dmp);

>     if (!dmp) {

>         return NULL;

>     }

>+    *mp_exists = false;

>     dmp->socket_id = dev->requested_socket_id;

>     dmp->mtu = mtu;

>     ovs_strzcpy(dmp->if_name, dev->up.name, IFNAMSIZ);

>@@ -530,15 +531,14 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)

>             + MIN(RTE_MAX_LCORE, dev->requested_n_rxq) * NETDEV_MAX_BURST

>             + MIN_NB_MBUF;

>

>-    bool mp_exists = false;

>-

>     do {

>         char *mp_name = dpdk_mp_name(dmp);

>

>         VLOG_DBG("Requesting a mempool of %u mbufs for netdev %s "

>-                 "with %d Rx and %d Tx queues.",

>+                 "with %d Rx and %d Tx queues, socket id:%d.",

>                  dmp->mp_size, dev->up.name,

>-                 dev->requested_n_rxq, dev->requested_n_txq);

>+                 dev->requested_n_rxq, dev->requested_n_txq,

>+                 dev->requested_socket_id);

>

>         dmp->mp = rte_pktmbuf_pool_create(mp_name, dmp->mp_size,

>                                           MP_CACHE_SZ,

>@@ -559,7 +559,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)

>             /* As the mempool create returned EEXIST we can expect the

>              * lookup has returned a valid pointer.  If for some reason

>              * that's not the case we keep track of it. */

>-            mp_exists = true;

>+            *mp_exists = true;

>         } else {

>             VLOG_ERR("Failed mempool \"%s\" create request of %u mbufs",

>                      mp_name, dmp->mp_size);

>@@ -573,7 +573,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)

>             rte_mempool_obj_iter(dmp->mp, ovs_rte_pktmbuf_init, NULL);

>             return dmp;

>         }

>-    } while (!mp_exists &&

>+    } while (!(*mp_exists) &&

>             (rte_errno == ENOMEM && (dmp->mp_size /= 2) >= MIN_NB_MBUF));

>

>     rte_free(dmp);

>@@ -581,12 +581,12 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)

> }

>

> static struct dpdk_mp *

>-dpdk_mp_get(struct netdev_dpdk *dev, int mtu)

>+dpdk_mp_get(struct netdev_dpdk *dev, int mtu, bool *mp_exists)

> {

>     struct dpdk_mp *dmp;

>

>     ovs_mutex_lock(&dpdk_mp_mutex);

>-    dmp = dpdk_mp_create(dev, mtu);

>+    dmp = dpdk_mp_create(dev, mtu, mp_exists);

>     ovs_mutex_unlock(&dpdk_mp_mutex);

>

>     return dmp;

>@@ -620,14 +620,22 @@ netdev_dpdk_mempool_configure(struct netdev_dpdk *dev)


Apologies for not noticing this in my previous review, but the comment for this function needs to be updated.

It currently reads:
   " /* Tries to allocate new mempool on requested_socket_id with
	 * mbuf size corresponding to requested_mtu.
	 * On success new configuration will be applied.
	 * On error, device will be left unchanged. */"

It should be updated to reflect the fact that it tries to allocate a new mempool, or reuse an existing one, where appropriate.
Furthermore, if the error value is EEXIST, the new configuration (i.e. dev->mtu, dev->max_packet_len) is applied, so the device is not unchanged, as the comment suggests.
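
For illustration only, the updated comment could read something along these
lines (a sketch of the intent, not the final wording):

    /* Tries to allocate a new mempool, or re-uses an existing one, on
     * requested_socket_id with an mbuf size corresponding to requested_mtu.
     * On success - or if an existing mempool is re-used (EEXIST) - the new
     * configuration is applied.
     * On any other error, the device is left unchanged. */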

> {

>     uint32_t buf_size = dpdk_buf_size(dev->requested_mtu);

>     struct dpdk_mp *mp;

>+    bool mp_exists;


You don't need the 'mp_exists' variable.

If, as part of the dpdk_mp_get() call stack, rte_mempool_lookup() fails, then dpdk_mp_create() returns NULL, which is already handled by netdev_dpdk_mempool_configure(),
(marked as [1], below).

So, to handle the case where the mempool already exists - i.e. when, in the call stack, rte_pktmbuf_pool_create() failed and set rte_errno to EEXIST - you only need to check rte_errno here;
(see [2], below).

>

>-    mp = dpdk_mp_get(dev, FRAME_LEN_TO_MTU(buf_size));

>+    mp = dpdk_mp_get(dev, FRAME_LEN_TO_MTU(buf_size), &mp_exists);


[1]
>     if (!mp) {

>         VLOG_ERR("Failed to create memory pool for netdev "

>                  "%s, with MTU %d on socket %d: %s\n",

>                  dev->up.name, dev->requested_mtu, dev->requested_socket_id,

>                  rte_strerror(rte_errno));

>         return rte_errno;

>+    } else if (mp_exists) {


[2]
"if (rte_errno == EEXIST)" is sufficient here

>+        /* If a new MTU was requested and its rounded value equals the one

>+         * that is currently used, then the existing mempool is returned.

>+         * Update dev with the new values. */

>+        dev->mtu = dev->requested_mtu;

>+        dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);

>+        return EEXIST;


Replace with "return rte_errno" for consistency with the previous return (since the value of rte_errno is guaranteed to be EEXIST here). 

>     } else {

>         dpdk_mp_put(dev->dpdk_mp);

>         dev->dpdk_mp = mp;

>@@ -3207,7 +3215,7 @@ netdev_dpdk_reconfigure(struct netdev *netdev)

>     rte_eth_dev_stop(dev->port_id);

>

>     err = netdev_dpdk_mempool_configure(dev);

>-    if (err) {

>+    if (err && err != EEXIST) {

>         goto out;

>     }

>

>@@ -3247,12 +3255,12 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev)

>     netdev_dpdk_remap_txqs(dev);

>

>     err = netdev_dpdk_mempool_configure(dev);

>-    if (err) {

>-        return err;

>-    } else {

>+    if (!err) {

>+        /* A new mempool was created. */

>         netdev_change_seq_changed(&dev->up);

>+    } else if (err != EEXIST){

>+        return err;

>     }

>-

>     if (netdev_dpdk_get_vid(dev) >= 0) {

>         if (dev->vhost_reconfigured == false) {

>             dev->vhost_reconfigured = true;

>--

>2.4.11
Fischetti, Antonio Oct. 17, 2017, 5:04 p.m. UTC | #2
Thanks Mark, comments inline.

-Antonio

> -----Original Message-----

> From: Kavanagh, Mark B

> Sent: Tuesday, October 17, 2017 2:34 PM

> To: Fischetti, Antonio <antonio.fischetti@intel.com>; dev@openvswitch.org

> Cc: Kevin Traynor <ktraynor@redhat.com>; Aaron Conole <aconole@redhat.com>;

> Darrell Ball <dlu998@gmail.com>

> Subject: RE: [PATCH v6 1/5] netdev-dpdk: fix management of pre-existing

> mempools.

> 

> >From: Fischetti, Antonio

> >Sent: Monday, October 16, 2017 2:15 PM

> >To: dev@openvswitch.org

> >Cc: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Kevin Traynor

> ><ktraynor@redhat.com>; Aaron Conole <aconole@redhat.com>; Darrell Ball

> ><dlu998@gmail.com>; Fischetti, Antonio <antonio.fischetti@intel.com>

> >Subject: [PATCH v6 1/5] netdev-dpdk: fix management of pre-existing mempools.

> >

> >Fix issues on reconfiguration of pre-existing mempools.

> >This patch avoids to call dpdk_mp_put() - and erroneously

> >release the mempool - when it already exists.

> >Create mempool names by considering also the NUMA socket number.

> >So a name reflects what socket the mempool is allocated on.

> >This change is needed for the NUMA-awareness feature.

> 

> Hi Antonio,

> 

> Is there any particular reason why you've combined patches 1 and 2 of the

> previous series in a single patch here?

> 

> I would have thought that these two separate issues would warrant two

> individual patches (particularly with respect to the reported-by, tested-by

> tags).


[Antonio]
I guess I misunderstood your previous review where you asked to squash patches 
1 and 3 into one patch.
I instead understood it as squashing the first 2 patches, because they were both bug-fixes.
In the next version v7 I'll restore the 2 separate patches.

> 

> Maybe it's not a big deal, but noted here nonetheless.

> 

> Apart from that, there are some comments inline.

> 

> Thanks again,

> Mark

> 

> >

> >CC: Mark B Kavanagh <mark.b.kavanagh@intel.com>

> >CC: Kevin Traynor <ktraynor@redhat.com>

> >CC: Aaron Conole <aconole@redhat.com>

> >CC: Darrell Ball <dlu998@gmail.com>

> >Reported-by: Ciara Loftus <ciara.loftus@intel.com>

> >Tested-by: Ciara Loftus <ciara.loftus@intel.com>

> >Reported-by: Róbert Mulik <robert.mulik@ericsson.com>

> >Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each

> >port.")

> >Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>

> >---

> > Test on releasing pre-existing mempools

> > =======================================

> >I've tested this patch by

> >  - changing at run-time the number of Rx queues:

> >      ovs-vsctl set Interface dpdk0 type=dpdk options:n_rxq=4

> >

> >  - reducing the MTU of the dpdk ports of 1 byte to force

> >    the configuration of an existing mempool:

> >      ovs-vsctl set Interface dpdk0 mtu_request=1499

> >

> >This issue was observed in a PVP test topology with dpdkvhostuserclient

> >ports. It can happen also with dpdk type ports, eg by reducing the MTU

> >of 1 byte.

> >

> >To replicate the bug scenario in the PVP case it's sufficient to

> >set 1 dpdkvhostuserclient port, and just boot the VM.

> >

> >Below some more details on my own test setup.

> >

> > PVP test setup

> > --------------

> >CLIENT_SOCK_DIR=/tmp

> >SOCK0=dpdkvhostuser0

> >SOCK1=dpdkvhostuser1

> >

> >1 PMD

> >Add 2 dpdk ports, n_rxq=1

> >Add 2 vhu ports both of type dpdkvhostuserclient and specify vhost-server-path

> > ovs-vsctl set Interface dpdkvhostuser0 options:vhost-server-

> >path="$CLIENT_SOCK_DIR/$SOCK0"

> > ovs-vsctl set Interface dpdkvhostuser1 options:vhost-server-

> >path="$CLIENT_SOCK_DIR/$SOCK1"

> >

> >Set port-based rules: dpdk0 <--> vhu0 and dpdk1 <--> vhu1

> > add-flow br0 in_port=1,action=output:3

> > add-flow br0 in_port=3,action=output:1

> > add-flow br0 in_port=4,action=output:2

> > add-flow br0 in_port=2,action=output:4

> >

> > Launch QEMU

> > -----------

> >As OvS vhu ports are acting as clients, we must specify 'server' in the next

> >command.

> >VM_IMAGE=<path/to/your/vm/image>

> >

> > sudo -E taskset 0x3F00 $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64 -name us-

> >vhost-vm1 -cpu host -enable-kvm -m 4096M -object memory-backend-

> >file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on -numa node,memdev=mem

> >-mem-prealloc -smp 4 -drive file=$VM_IMAGE -chardev

> >socket,id=char0,path=$CLIENT_SOCK_DIR/$SOCK0,server -netdev type=vhost-

> >user,id=mynet1,chardev=char0,vhostforce -device virtio-net-

> >pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=off -chardev

> >socket,id=char1,path=$CLIENT_SOCK_DIR/$SOCK1,server -netdev type=vhost-

> >user,id=mynet2,chardev=char1,vhostforce -device virtio-net-

> >pci,mac=00:00:00:00:00:02,netdev=mynet2,mrg_rxbuf=off --nographic

> >

> > Expected behavior

> > -----------------

> >With this fix OvS shouldn't crash.

> >

> > Test NUMA-Awareness feature

> > ===========================

> >Mempool names now contains the requested socket id and become like:

> >"ovs_4adb057e_1_2030_20512".

> >

> >Tested with DPDK 17.05.2 (from dpdk-stable branch).

> >NUMA-awareness feature enabled (DPDK/config/common_base).

> >

> >Created 1 single dpdkvhostuser port type.

> >OvS pmd-cpu-mask=FF00003     # enable cores on both numa nodes

> >QEMU core mask = 0xFC000     # cores for qemu on numa node 1 only

> >

> > Before launching the VM:

> > ------------------------

> >ovs-appctl dpif-netdev/pmd-rxq-show

> >shows core #1 is serving the vhu port.

> >

> >pmd thread numa_id 0 core_id 1:

> >        isolated : false

> >        port: dpdkvhostuser0    queue-id: 0

> >

> > After launching the VM:

> > -----------------------

> >the vhu port is now managed by core #27

> >pmd thread numa_id 1 core_id 27:

> >        isolated : false

> >        port: dpdkvhostuser0    queue-id: 0

> >

> >and the log shows a new mempool is allocated on NUMA node 1, while

> >the previous one is deleted:

> >

> >2017-10-06T14:04:55Z|00105|netdev_dpdk|DBG|Allocated

> >"ovs_4adb057e_1_2030_20512" mempool with 20512 mbufs

> >2017-10-06T14:04:55Z|00106|netdev_dpdk|DBG|Releasing

> >"ovs_4adb057e_0_2030_20512" mempool

> >---

> > lib/netdev-dpdk.c | 42 +++++++++++++++++++++++++-----------------

> > 1 file changed, 25 insertions(+), 17 deletions(-)

> >

> >diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c

> >index c60f46f..7f2d7ed 100644

> >--- a/lib/netdev-dpdk.c

> >+++ b/lib/netdev-dpdk.c

> >@@ -499,8 +499,8 @@ dpdk_mp_name(struct dpdk_mp *dmp)

> > {

> >     uint32_t h = hash_string(dmp->if_name, 0);

> >     char *mp_name = xcalloc(RTE_MEMPOOL_NAMESIZE, sizeof *mp_name);

> >-    int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, "ovs_%x_%d_%u",

> >-                       h, dmp->mtu, dmp->mp_size);

> >+    int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, "ovs_%x_%d_%d_%u",

> >+                       h, dmp->socket_id, dmp->mtu, dmp->mp_size);

> >     if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {

> >         return NULL;

> >     }

> >@@ -508,12 +508,13 @@ dpdk_mp_name(struct dpdk_mp *dmp)

> > }

> >

> > static struct dpdk_mp *

> >-dpdk_mp_create(struct netdev_dpdk *dev, int mtu)

> >+dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool *mp_exists)

> > {

> >     struct dpdk_mp *dmp = dpdk_rte_mzalloc(sizeof *dmp);

> >     if (!dmp) {

> >         return NULL;

> >     }

> >+    *mp_exists = false;

> >     dmp->socket_id = dev->requested_socket_id;

> >     dmp->mtu = mtu;

> >     ovs_strzcpy(dmp->if_name, dev->up.name, IFNAMSIZ);

> >@@ -530,15 +531,14 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)

> >             + MIN(RTE_MAX_LCORE, dev->requested_n_rxq) * NETDEV_MAX_BURST

> >             + MIN_NB_MBUF;

> >

> >-    bool mp_exists = false;

> >-

> >     do {

> >         char *mp_name = dpdk_mp_name(dmp);

> >

> >         VLOG_DBG("Requesting a mempool of %u mbufs for netdev %s "

> >-                 "with %d Rx and %d Tx queues.",

> >+                 "with %d Rx and %d Tx queues, socket id:%d.",

> >                  dmp->mp_size, dev->up.name,

> >-                 dev->requested_n_rxq, dev->requested_n_txq);

> >+                 dev->requested_n_rxq, dev->requested_n_txq,

> >+                 dev->requested_socket_id);

> >

> >         dmp->mp = rte_pktmbuf_pool_create(mp_name, dmp->mp_size,

> >                                           MP_CACHE_SZ,

> >@@ -559,7 +559,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)

> >             /* As the mempool create returned EEXIST we can expect the

> >              * lookup has returned a valid pointer.  If for some reason

> >              * that's not the case we keep track of it. */

> >-            mp_exists = true;

> >+            *mp_exists = true;

> >         } else {

> >             VLOG_ERR("Failed mempool \"%s\" create request of %u mbufs",

> >                      mp_name, dmp->mp_size);

> >@@ -573,7 +573,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)

> >             rte_mempool_obj_iter(dmp->mp, ovs_rte_pktmbuf_init, NULL);

> >             return dmp;

> >         }

> >-    } while (!mp_exists &&

> >+    } while (!(*mp_exists) &&

> >             (rte_errno == ENOMEM && (dmp->mp_size /= 2) >= MIN_NB_MBUF));

> >

> >     rte_free(dmp);

> >@@ -581,12 +581,12 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)

> > }

> >

> > static struct dpdk_mp *

> >-dpdk_mp_get(struct netdev_dpdk *dev, int mtu)

> >+dpdk_mp_get(struct netdev_dpdk *dev, int mtu, bool *mp_exists)

> > {

> >     struct dpdk_mp *dmp;

> >

> >     ovs_mutex_lock(&dpdk_mp_mutex);

> >-    dmp = dpdk_mp_create(dev, mtu);

> >+    dmp = dpdk_mp_create(dev, mtu, mp_exists);

> >     ovs_mutex_unlock(&dpdk_mp_mutex);

> >

> >     return dmp;

> >@@ -620,14 +620,22 @@ netdev_dpdk_mempool_configure(struct netdev_dpdk *dev)

> 

> Apologies for not noticing this in my previous review, but the comment for this

> function needs to be updated.

> 

> It currently reads:

>    " /* Tries to allocate new mempool on requested_socket_id with

> 	 * mbuf size corresponding to requested_mtu.

> 	 * On success new configuration will be applied.

> 	 * On error, device will be left unchanged. */"

> 

> It should be updated to reflect the fact that it tries to allocate a new

> mempool, or reuse an existing one, where appropriate.

> Furthermore, if the error value is EEXIST, the new configuration (i.e. dev-

> >mtu, dev->max_packet_len) is applied, so the device is not unchanged, as the

> comment suggests.


[Antonio] Thanks, I'll change that.


> 

> > {

> >     uint32_t buf_size = dpdk_buf_size(dev->requested_mtu);

> >     struct dpdk_mp *mp;

> >+    bool mp_exists;

> 

> You don't need the 'mp_exists' variable.

> 

> If, as part of the dpdk_mp_get() call stack, rte_mempool_lookup() fails, then

> dpdk_mp_create() returns NULL, which is already handled by

> netdev_dpdk_mempool_configure(),

> (marked as [1], below).


[Antonio]
If rte_mempool_lookup() fails, NULL is returned but rte_errno == ENOENT, so it is no longer EEXIST.
This means that in netdev_dpdk_mempool_configure() I would have to check both error codes, like:
if (rte_errno == EEXIST || rte_errno == ENOENT)
Moreover, who knows whether rte_mempool_lookup() will be extended in the future to return
even more error codes?
Also, I think it's fine to rely on rte_errno as long as you test it immediately, i.e.
right after the API call. In the case of netdev_dpdk_mempool_configure() the rte_errno
was updated by dpdk_mp_create(), which is 2 levels below in the call stack.
That's why I'd prefer to keep track, in an OVS variable, of the fact that we are re-using
a mempool, and be less dependent on the RTE return codes.

BTW I could have added the mp_exists variable to struct dpdk_mp, but I avoided that because
it ends up inside struct netdev_dpdk and I didn't want to increase its size. So I preferred
to add it as a new input parameter for the functions.

> 

> So, to handle the case where the mempool already exists - i.e. when in the

> callstack, rte_pkt_mbuf_pool_create() failed and set rte_errno to EEXIST - you

> only need to check rte_errno here;

> (see [2], below).

> 

> >

> >-    mp = dpdk_mp_get(dev, FRAME_LEN_TO_MTU(buf_size));

> >+    mp = dpdk_mp_get(dev, FRAME_LEN_TO_MTU(buf_size), &mp_exists);

> 

> [1]

> >     if (!mp) {

> >         VLOG_ERR("Failed to create memory pool for netdev "

> >                  "%s, with MTU %d on socket %d: %s\n",

> >                  dev->up.name, dev->requested_mtu, dev->requested_socket_id,

> >                  rte_strerror(rte_errno));

> >         return rte_errno;

> >+    } else if (mp_exists) {

> 

> [2]

> "if (rte_errno == EEXIST)" is sufficient here

> 

> >+        /* If a new MTU was requested and its rounded value equals the one

> >+         * that is currently used, then the existing mempool is returned.

> >+         * Update dev with the new values. */

> >+        dev->mtu = dev->requested_mtu;

> >+        dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);

> >+        return EEXIST;

> 

> Replace with "return rte_errno" for consistency with the previous return (since

> the value of rte_errno is guaranteed to be EEXIST here).


[Antonio]
All the callers - like dpdk_vhost_reconfigure_helper() - behave depending on the
outcome of rte_pktmbuf_pool_create(), which could return an error for insufficient
memory, for example.
That's why in the previous return I'm returning rte_errno as it is. Here, instead,
I'm overriding what could be returned by rte_mempool_lookup() and returning EEXIST,
because that's the information we want to report to the caller.

 
> 

> >     } else {

> >         dpdk_mp_put(dev->dpdk_mp);

> >         dev->dpdk_mp = mp;

> >@@ -3207,7 +3215,7 @@ netdev_dpdk_reconfigure(struct netdev *netdev)

> >     rte_eth_dev_stop(dev->port_id);

> >

> >     err = netdev_dpdk_mempool_configure(dev);

> >-    if (err) {

> >+    if (err && err != EEXIST) {

> >         goto out;

> >     }

> >

> >@@ -3247,12 +3255,12 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev)

> >     netdev_dpdk_remap_txqs(dev);

> >

> >     err = netdev_dpdk_mempool_configure(dev);

> >-    if (err) {

> >-        return err;

> >-    } else {

> >+    if (!err) {

> >+        /* A new mempool was created. */

> >         netdev_change_seq_changed(&dev->up);

> >+    } else if (err != EEXIST){

> >+        return err;

> >     }

> >-

> >     if (netdev_dpdk_get_vid(dev) >= 0) {

> >         if (dev->vhost_reconfigured == false) {

> >             dev->vhost_reconfigured = true;

> >--

> >2.4.11
Mark Kavanagh Oct. 17, 2017, 9:13 p.m. UTC | #3
>From: Fischetti, Antonio

>Sent: Tuesday, October 17, 2017 6:04 PM

>To: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; dev@openvswitch.org

>Cc: Kevin Traynor <ktraynor@redhat.com>; Aaron Conole <aconole@redhat.com>;

>Darrell Ball <dlu998@gmail.com>

>Subject: RE: [PATCH v6 1/5] netdev-dpdk: fix management of pre-existing

>mempools.

>

>Thanks Mark, comments inline.

>

>-Antonio

>

>> -----Original Message-----

>> From: Kavanagh, Mark B

>> Sent: Tuesday, October 17, 2017 2:34 PM

>> To: Fischetti, Antonio <antonio.fischetti@intel.com>; dev@openvswitch.org

>> Cc: Kevin Traynor <ktraynor@redhat.com>; Aaron Conole <aconole@redhat.com>;

>> Darrell Ball <dlu998@gmail.com>

>> Subject: RE: [PATCH v6 1/5] netdev-dpdk: fix management of pre-existing

>> mempools.

>>

>> >From: Fischetti, Antonio

>> >Sent: Monday, October 16, 2017 2:15 PM

>> >To: dev@openvswitch.org

>> >Cc: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Kevin Traynor

>> ><ktraynor@redhat.com>; Aaron Conole <aconole@redhat.com>; Darrell Ball

>> ><dlu998@gmail.com>; Fischetti, Antonio <antonio.fischetti@intel.com>

>> >Subject: [PATCH v6 1/5] netdev-dpdk: fix management of pre-existing

>mempools.

>> >

>> >Fix issues on reconfiguration of pre-existing mempools.

>> >This patch avoids to call dpdk_mp_put() - and erroneously

>> >release the mempool - when it already exists.

>> >Create mempool names by considering also the NUMA socket number.

>> >So a name reflects what socket the mempool is allocated on.

>> >This change is needed for the NUMA-awareness feature.

>>

>> Hi Antonio,

>>

>> Is there any particular reason why you've combined patches 1 and 2 of the

>> previous series in a single patch here?

>>

>> I would have thought that these two separate issues would warrant two

>> individual patches (particularly with respect to the reported-by, tested-by

>> tags).

>

>[Antonio]

>I guess I misunderstood your previous review where you asked to squash patches

>1 and 3 into one patch.


Hi Antonio,

I figured as much ;)

>I understood instead to squash the first 2 patches because they were both bug-

>fixes.

>In the next version v7 I'll restore the 2 separate patches.


Thanks - I think that's a much cleaner approach.

>

>>

>> Maybe it's not a big deal, but noted here nonetheless.

>>

>> Apart from that, there are some comments inline.

>>

>> Thanks again,

>> Mark

>>

>> >

>> >CC: Mark B Kavanagh <mark.b.kavanagh@intel.com>

>> >CC: Kevin Traynor <ktraynor@redhat.com>

>> >CC: Aaron Conole <aconole@redhat.com>

>> >CC: Darrell Ball <dlu998@gmail.com>

>> >Reported-by: Ciara Loftus <ciara.loftus@intel.com>

>> >Tested-by: Ciara Loftus <ciara.loftus@intel.com>

>> >Reported-by: Róbert Mulik <robert.mulik@ericsson.com>

>> >Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each

>> >port.")

>> >Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>

>> >---

>> > Test on releasing pre-existing mempools

>> > =======================================

>> >I've tested this patch by

>> >  - changing at run-time the number of Rx queues:

>> >      ovs-vsctl set Interface dpdk0 type=dpdk options:n_rxq=4

>> >

>> >  - reducing the MTU of the dpdk ports of 1 byte to force

>> >    the configuration of an existing mempool:

>> >      ovs-vsctl set Interface dpdk0 mtu_request=1499

>> >

>> >This issue was observed in a PVP test topology with dpdkvhostuserclient

>> >ports. It can happen also with dpdk type ports, eg by reducing the MTU

>> >of 1 byte.

>> >

>> >To replicate the bug scenario in the PVP case it's sufficient to

>> >set 1 dpdkvhostuserclient port, and just boot the VM.

>> >

>> >Below some more details on my own test setup.

>> >

>> > PVP test setup

>> > --------------

>> >CLIENT_SOCK_DIR=/tmp

>> >SOCK0=dpdkvhostuser0

>> >SOCK1=dpdkvhostuser1

>> >

>> >1 PMD

>> >Add 2 dpdk ports, n_rxq=1

>> >Add 2 vhu ports both of type dpdkvhostuserclient and specify vhost-server-

>path

>> > ovs-vsctl set Interface dpdkvhostuser0 options:vhost-server-

>> >path="$CLIENT_SOCK_DIR/$SOCK0"

>> > ovs-vsctl set Interface dpdkvhostuser1 options:vhost-server-

>> >path="$CLIENT_SOCK_DIR/$SOCK1"

>> >

>> >Set port-based rules: dpdk0 <--> vhu0 and dpdk1 <--> vhu1

>> > add-flow br0 in_port=1,action=output:3

>> > add-flow br0 in_port=3,action=output:1

>> > add-flow br0 in_port=4,action=output:2

>> > add-flow br0 in_port=2,action=output:4

>> >

>> > Launch QEMU

>> > -----------

>> >As OvS vhu ports are acting as clients, we must specify 'server' in the

>next

>> >command.

>> >VM_IMAGE=<path/to/your/vm/image>

>> >

>> > sudo -E taskset 0x3F00 $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64 -name

>us-

>> >vhost-vm1 -cpu host -enable-kvm -m 4096M -object memory-backend-

>> >file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on -numa

>node,memdev=mem

>> >-mem-prealloc -smp 4 -drive file=$VM_IMAGE -chardev

>> >socket,id=char0,path=$CLIENT_SOCK_DIR/$SOCK0,server -netdev type=vhost-

>> >user,id=mynet1,chardev=char0,vhostforce -device virtio-net-

>> >pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=off -chardev

>> >socket,id=char1,path=$CLIENT_SOCK_DIR/$SOCK1,server -netdev type=vhost-

>> >user,id=mynet2,chardev=char1,vhostforce -device virtio-net-

>> >pci,mac=00:00:00:00:00:02,netdev=mynet2,mrg_rxbuf=off --nographic

>> >

>> > Expected behavior

>> > -----------------

>> >With this fix OvS shouldn't crash.

>> >

>> > Test NUMA-Awareness feature

>> > ===========================

>> >Mempool names now contains the requested socket id and become like:

>> >"ovs_4adb057e_1_2030_20512".

>> >

>> >Tested with DPDK 17.05.2 (from dpdk-stable branch).

>> >NUMA-awareness feature enabled (DPDK/config/common_base).

>> >

>> >Created 1 single dpdkvhostuser port type.

>> >OvS pmd-cpu-mask=FF00003     # enable cores on both numa nodes

>> >QEMU core mask = 0xFC000     # cores for qemu on numa node 1 only

>> >

>> > Before launching the VM:

>> > ------------------------

>> >ovs-appctl dpif-netdev/pmd-rxq-show

>> >shows core #1 is serving the vhu port.

>> >

>> >pmd thread numa_id 0 core_id 1:

>> >        isolated : false

>> >        port: dpdkvhostuser0    queue-id: 0

>> >

>> > After launching the VM:

>> > -----------------------

>> >the vhu port is now managed by core #27

>> >pmd thread numa_id 1 core_id 27:

>> >        isolated : false

>> >        port: dpdkvhostuser0    queue-id: 0

>> >

>> >and the log shows a new mempool is allocated on NUMA node 1, while

>> >the previous one is deleted:

>> >

>> >2017-10-06T14:04:55Z|00105|netdev_dpdk|DBG|Allocated

>> >"ovs_4adb057e_1_2030_20512" mempool with 20512 mbufs

>> >2017-10-06T14:04:55Z|00106|netdev_dpdk|DBG|Releasing

>> >"ovs_4adb057e_0_2030_20512" mempool

>> >---

>> > lib/netdev-dpdk.c | 42 +++++++++++++++++++++++++-----------------

>> > 1 file changed, 25 insertions(+), 17 deletions(-)

>> >

>> >diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c

>> >index c60f46f..7f2d7ed 100644

>> >--- a/lib/netdev-dpdk.c

>> >+++ b/lib/netdev-dpdk.c

>> >@@ -499,8 +499,8 @@ dpdk_mp_name(struct dpdk_mp *dmp)

>> > {

>> >     uint32_t h = hash_string(dmp->if_name, 0);

>> >     char *mp_name = xcalloc(RTE_MEMPOOL_NAMESIZE, sizeof *mp_name);

>> >-    int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, "ovs_%x_%d_%u",

>> >-                       h, dmp->mtu, dmp->mp_size);

>> >+    int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, "ovs_%x_%d_%d_%u",

>> >+                       h, dmp->socket_id, dmp->mtu, dmp->mp_size);

>> >     if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {

>> >         return NULL;

>> >     }

>> >@@ -508,12 +508,13 @@ dpdk_mp_name(struct dpdk_mp *dmp)

>> > }

>> >

>> > static struct dpdk_mp *

>> >-dpdk_mp_create(struct netdev_dpdk *dev, int mtu)

>> >+dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool *mp_exists)

>> > {

>> >     struct dpdk_mp *dmp = dpdk_rte_mzalloc(sizeof *dmp);

>> >     if (!dmp) {

>> >         return NULL;

>> >     }

>> >+    *mp_exists = false;

>> >     dmp->socket_id = dev->requested_socket_id;

>> >     dmp->mtu = mtu;

>> >     ovs_strzcpy(dmp->if_name, dev->up.name, IFNAMSIZ);

>> >@@ -530,15 +531,14 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)

>> >             + MIN(RTE_MAX_LCORE, dev->requested_n_rxq) * NETDEV_MAX_BURST

>> >             + MIN_NB_MBUF;

>> >

>> >-    bool mp_exists = false;

>> >-

>> >     do {

>> >         char *mp_name = dpdk_mp_name(dmp);

>> >

>> >         VLOG_DBG("Requesting a mempool of %u mbufs for netdev %s "

>> >-                 "with %d Rx and %d Tx queues.",

>> >+                 "with %d Rx and %d Tx queues, socket id:%d.",

>> >                  dmp->mp_size, dev->up.name,

>> >-                 dev->requested_n_rxq, dev->requested_n_txq);

>> >+                 dev->requested_n_rxq, dev->requested_n_txq,

>> >+                 dev->requested_socket_id);

>> >

>> >         dmp->mp = rte_pktmbuf_pool_create(mp_name, dmp->mp_size,

>> >                                           MP_CACHE_SZ,

>> >@@ -559,7 +559,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)

>> >             /* As the mempool create returned EEXIST we can expect the

>> >              * lookup has returned a valid pointer.  If for some reason

>> >              * that's not the case we keep track of it. */

>> >-            mp_exists = true;

>> >+            *mp_exists = true;

>> >         } else {

>> >             VLOG_ERR("Failed mempool \"%s\" create request of %u mbufs",

>> >                      mp_name, dmp->mp_size);

>> >@@ -573,7 +573,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)

>> >             rte_mempool_obj_iter(dmp->mp, ovs_rte_pktmbuf_init, NULL);

>> >             return dmp;

>> >         }

>> >-    } while (!mp_exists &&

>> >+    } while (!(*mp_exists) &&

>> >             (rte_errno == ENOMEM && (dmp->mp_size /= 2) >= MIN_NB_MBUF));

>> >

>> >     rte_free(dmp);

>> >@@ -581,12 +581,12 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)

>> > }

>> >

>> > static struct dpdk_mp *

>> >-dpdk_mp_get(struct netdev_dpdk *dev, int mtu)

>> >+dpdk_mp_get(struct netdev_dpdk *dev, int mtu, bool *mp_exists)

>> > {

>> >     struct dpdk_mp *dmp;

>> >

>> >     ovs_mutex_lock(&dpdk_mp_mutex);

>> >-    dmp = dpdk_mp_create(dev, mtu);

>> >+    dmp = dpdk_mp_create(dev, mtu, mp_exists);

>> >     ovs_mutex_unlock(&dpdk_mp_mutex);

>> >

>> >     return dmp;

>> >@@ -620,14 +620,22 @@ netdev_dpdk_mempool_configure(struct netdev_dpdk

>*dev)

>>

>> Apologies for not noticing this in my previous review, but the comment for

>this

>> function needs to be updated.

>>

>> It currently reads:

>>    " /* Tries to allocate new mempool on requested_socket_id with

>> 	 * mbuf size corresponding to requested_mtu.

>> 	 * On success new configuration will be applied.

>> 	 * On error, device will be left unchanged. */"

>>

>> It should be updated to reflect the fact that it tries to allocate a new

>> mempool, or reuse an existing one, where appropriate.

>> Furthermore, if the error value is EEXIST, the new configuration (i.e. dev-

>> >mtu, dev->max_packet_len) is applied, so the device is not unchanged, as

>the

>> comment suggests.

>

>[Antonio] Thanks, I'll change that.

>

>

>>

>> > {

>> >     uint32_t buf_size = dpdk_buf_size(dev->requested_mtu);

>> >     struct dpdk_mp *mp;

>> >+    bool mp_exists;

>>

>> You don't need the 'mp_exists' variable.

>>

>> If, as part of the dpdk_mp_get() call stack, rte_mempool_lookup() fails,

>then

>> dpdk_mp_create() returns NULL, which is already handled by

>> netdev_dpdk_mempool_configure(),

>> (marked as [1], below).

>

>[Antonio]

>If rte_mempool_lookup() fails a NULL is returned but rte_errno == ENOENT, so

>it's no more EEXIST.


Yes, agreed.

>This means that in netdev_dpdk_mempool_configure() I should check both error

>codes like:

>if (rte_errno == EEXIST || rte_errno == ENOENT)


You won't need to do that - hopefully the following will make things clearer.

Consider the callstack, starting from netdev_dpdk_mempool_configure:

[netdev_dpdk_mempool_configure]
 mp = dpdk_mp_get()

	[dpdk_mp_get]
	mp = dpdk_mp_create()

		[dpdk_mp_create]
		dmp->mp = rte_pktmbuf_pool_create(....)     
		# assume this returns NULL and sets rte_errno to EEXIST,
		# as an appropriate mempool already exists

			if (dmp->mp) {
				# skip
			} else if (rte_errno == EEXIST) {
				dmp->mp = rte_mempool_lookup(...)
				# There are two possible scenarios here -
				# let's name them A and B, respectively, and follow their paths.

--------------------------------------------------------------------------------------

				### Scenario A - rte_mempool_lookup() returns NULL and sets ###
				### rte_errno to ENOENT                                     ###  

				mp_exists = true;
			} else {
				# skip
			}
			if (dmp->mp) {
				# skip
			}
		} while (!mp_exists &&      # condition fails, as mp_exists is true
		...
		return NULL;

	[dpdk_mp_get]
	return NULL

[netdev_dpdk_mempool_configure]
if (!mp) {
	VLOG_ERR(...)  
	return rte_errno        # returns ENOENT

-------------------------------------------------------------------------------------

				### Scenario B - rte_mempool_lookup() returns a pointer to ###
				### the mempool; rte_errno remains set to EEXIST           ###  
				mp_exists = true;
			} else {
				# skip
			}
			if (dmp->mp) {
				return dmp;   # return a pointer to the mempool
			}
	[dpdk_mp_get]
	return a pointer to the mempool

[netdev_dpdk_mempool_configure]
if (!mp) {
	# skip
} else if (rte_errno == EEXIST) {
	dev->mtu = dev->requested_mtu;
	dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
	return rte_errno;              # this returns EEXIST
} 

>moreover who knows if in the future the rte_mempool_lookup() will be enriched

>to return

>even more error codes?


That point is moot IMO, as we should handle the API as it behaves currently, and not how it may do so at some potential future date.
Just my $0.02 ;)

>Also, I think it's good to rely on rte_errno as long as you test it

>immediately, I mean

>right after the API call. In the case of netdev_dpdk_mempool_configure() the

>rte_errno

>was updated by dpdk_mp_create(), which is 2 levels below in the call stack.

>That's why I'd prefer to keep track that we are re-using a mempool into an OVS

>variable

>and be less dependent by RTE return codes.


That value is set in the same thread in a defined call-stack; I don't see a disadvantage to referencing it here tbh.

>

>BTW I could have added mp_exists variable into struct dpdk_mp but I avoided

>that because

>that goes inside struct netdev_dpdk and I didn't want to increase its size. So

>I preferred

>to add it as a new input parameter for the functions.

>

>>

>> So, to handle the case where the mempool already exists - i.e. when in the

>> callstack, rte_pkt_mbuf_pool_create() failed and set rte_errno to EEXIST -

>you

>> only need to check rte_errno here;

>> (see [2], below).

>>

>> >

>> >-    mp = dpdk_mp_get(dev, FRAME_LEN_TO_MTU(buf_size));

>> >+    mp = dpdk_mp_get(dev, FRAME_LEN_TO_MTU(buf_size), &mp_exists);

>>

>> [1]

>> >     if (!mp) {

>> >         VLOG_ERR("Failed to create memory pool for netdev "

>> >                  "%s, with MTU %d on socket %d: %s\n",

>> >                  dev->up.name, dev->requested_mtu, dev-

>>requested_socket_id,

>> >                  rte_strerror(rte_errno));

>> >         return rte_errno;

>> >+    } else if (mp_exists) {

>>

>> [2]

>> "if (rte_errno == EEXIST)" is sufficient here

>>

>> >+        /* If a new MTU was requested and its rounded value equals the one

>> >+         * that is currently used, then the existing mempool is returned.

>> >+         * Update dev with the new values. */

>> >+        dev->mtu = dev->requested_mtu;

>> >+        dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);

>> >+        return EEXIST;

>>

>> Replace with "return rte_errno" for consistency with the previous return

>(since

>> the value of rte_errno is guaranteed to be EEXIST here).

>

>[Antonio]

>All the callers - like dpdk_vhost_reconfigure_helper() - behave depending on

>the

>outcome of rte_pktmbuf_pool_create(), that could return an error for

>insufficient

>memory for example.

>That's why in the previous return I'm returning rte_errno as it is. Instead

>here

>I'm overriding what could be returned by rte_mempool_lookup() and returning

>EEXIST

>because that's the information we want to report to the caller.


As described previously.

Thanks,
Mark

>

>

>>

>> >     } else {

>> >         dpdk_mp_put(dev->dpdk_mp);

>> >         dev->dpdk_mp = mp;

>> >@@ -3207,7 +3215,7 @@ netdev_dpdk_reconfigure(struct netdev *netdev)

>> >     rte_eth_dev_stop(dev->port_id);

>> >

>> >     err = netdev_dpdk_mempool_configure(dev);

>> >-    if (err) {

>> >+    if (err && err != EEXIST) {

>> >         goto out;

>> >     }

>> >

>> >@@ -3247,12 +3255,12 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk

>*dev)

>> >     netdev_dpdk_remap_txqs(dev);

>> >

>> >     err = netdev_dpdk_mempool_configure(dev);

>> >-    if (err) {

>> >-        return err;

>> >-    } else {

>> >+    if (!err) {

>> >+        /* A new mempool was created. */

>> >         netdev_change_seq_changed(&dev->up);

>> >+    } else if (err != EEXIST){

>> >+        return err;

>> >     }

>> >-

>> >     if (netdev_dpdk_get_vid(dev) >= 0) {

>> >         if (dev->vhost_reconfigured == false) {

>> >             dev->vhost_reconfigured = true;

>> >--

>> >2.4.11
Fischetti, Antonio Oct. 18, 2017, 9:43 a.m. UTC | #4
> -----Original Message-----

> From: Kavanagh, Mark B

> Sent: Tuesday, October 17, 2017 10:14 PM

> To: Fischetti, Antonio <antonio.fischetti@intel.com>; dev@openvswitch.org

> Cc: Kevin Traynor <ktraynor@redhat.com>; Aaron Conole <aconole@redhat.com>;

> Darrell Ball <dlu998@gmail.com>

> Subject: RE: [PATCH v6 1/5] netdev-dpdk: fix management of pre-existing

> mempools.

> 

> >From: Fischetti, Antonio

> >Sent: Tuesday, October 17, 2017 6:04 PM

> >To: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; dev@openvswitch.org

> >Cc: Kevin Traynor <ktraynor@redhat.com>; Aaron Conole <aconole@redhat.com>;

> >Darrell Ball <dlu998@gmail.com>

> >Subject: RE: [PATCH v6 1/5] netdev-dpdk: fix management of pre-existing

> >mempools.

> >

> >Thanks Mark, comments inline.

> >

> >-Antonio

> >

> >> -----Original Message-----

> >> From: Kavanagh, Mark B

> >> Sent: Tuesday, October 17, 2017 2:34 PM

> >> To: Fischetti, Antonio <antonio.fischetti@intel.com>; dev@openvswitch.org

> >> Cc: Kevin Traynor <ktraynor@redhat.com>; Aaron Conole <aconole@redhat.com>;

> >> Darrell Ball <dlu998@gmail.com>

> >> Subject: RE: [PATCH v6 1/5] netdev-dpdk: fix management of pre-existing

> >> mempools.

> >>

> >> >From: Fischetti, Antonio

> >> >Sent: Monday, October 16, 2017 2:15 PM

> >> >To: dev@openvswitch.org

> >> >Cc: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Kevin Traynor

> >> ><ktraynor@redhat.com>; Aaron Conole <aconole@redhat.com>; Darrell Ball

> >> ><dlu998@gmail.com>; Fischetti, Antonio <antonio.fischetti@intel.com>

> >> >Subject: [PATCH v6 1/5] netdev-dpdk: fix management of pre-existing

> >mempools.

> >> >

> >> >Fix issues on reconfiguration of pre-existing mempools.

> >> >This patch avoids to call dpdk_mp_put() - and erroneously

> >> >release the mempool - when it already exists.

> >> >Create mempool names by considering also the NUMA socket number.

> >> >So a name reflects what socket the mempool is allocated on.

> >> >This change is needed for the NUMA-awareness feature.

> >>

> >> Hi Antonio,

> >>

> >> Is there any particular reason why you've combined patches 1 and 2 of the

> >> previous series in a single patch here?

> >>

> >> I would have thought that these two separate issues would warrant two

> >> individual patches (particularly with respect to the reported-by, tested-by

> >> tags).

> >

> >[Antonio]

> >I guess I misunderstood your previous review where you asked to squash patches

> >1 and 3 into one patch.

> 

> Hi Antonio,

> 

> I figured as much ;)

> 

> >I understood instead to squash the first 2 patches because they were both bug-

> >fixes.

> >In the next version v7 I'll restore the 2 separate patches.

> 

> Thanks - I think that's a much cleaner approach.

> 

> >

> >>

> >> Maybe it's not a big deal, but noted here nonetheless.

> >>

> >> Apart from that, there are some comments inline.

> >>

> >> Thanks again,

> >> Mark

> >>

> >> >

> >> >CC: Mark B Kavanagh <mark.b.kavanagh@intel.com>

> >> >CC: Kevin Traynor <ktraynor@redhat.com>

> >> >CC: Aaron Conole <aconole@redhat.com>

> >> >CC: Darrell Ball <dlu998@gmail.com>

> >> >Reported-by: Ciara Loftus <ciara.loftus@intel.com>

> >> >Tested-by: Ciara Loftus <ciara.loftus@intel.com>

> >> >Reported-by: Róbert Mulik <robert.mulik@ericsson.com>

> >> >Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each

> >> >port.")

> >> >Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>

> >> >---

> >> > Test on releasing pre-existing mempools

> >> > =======================================

> >> >I've tested this patch by

> >> >  - changing at run-time the number of Rx queues:

> >> >      ovs-vsctl set Interface dpdk0 type=dpdk options:n_rxq=4

> >> >

> >> >  - reducing the MTU of the dpdk ports of 1 byte to force

> >> >    the configuration of an existing mempool:

> >> >      ovs-vsctl set Interface dpdk0 mtu_request=1499

> >> >

> >> >This issue was observed in a PVP test topology with dpdkvhostuserclient

> >> >ports. It can happen also with dpdk type ports, eg by reducing the MTU

> >> >of 1 byte.

> >> >

> >> >To replicate the bug scenario in the PVP case it's sufficient to

> >> >set 1 dpdkvhostuserclient port, and just boot the VM.

> >> >

> >> >Below some more details on my own test setup.

> >> >

> >> > PVP test setup

> >> > --------------

> >> >CLIENT_SOCK_DIR=/tmp

> >> >SOCK0=dpdkvhostuser0

> >> >SOCK1=dpdkvhostuser1

> >> >

> >> >1 PMD

> >> >Add 2 dpdk ports, n_rxq=1

> >> >Add 2 vhu ports both of type dpdkvhostuserclient and specify vhost-server-

> >path

> >> > ovs-vsctl set Interface dpdkvhostuser0 options:vhost-server-

> >> >path="$CLIENT_SOCK_DIR/$SOCK0"

> >> > ovs-vsctl set Interface dpdkvhostuser1 options:vhost-server-

> >> >path="$CLIENT_SOCK_DIR/$SOCK1"

> >> >

> >> >Set port-based rules: dpdk0 <--> vhu0 and dpdk1 <--> vhu1

> >> > add-flow br0 in_port=1,action=output:3

> >> > add-flow br0 in_port=3,action=output:1

> >> > add-flow br0 in_port=4,action=output:2

> >> > add-flow br0 in_port=2,action=output:4

> >> >

> >> > Launch QEMU

> >> > -----------

> >> >As OvS vhu ports are acting as clients, we must specify 'server' in the

> >next

> >> >command.

> >> >VM_IMAGE=<path/to/your/vm/image>

> >> >

> >> > sudo -E taskset 0x3F00 $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64 -name

> >us-

> >> >vhost-vm1 -cpu host -enable-kvm -m 4096M -object memory-backend-

> >> >file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on -numa

> >node,memdev=mem

> >> >-mem-prealloc -smp 4 -drive file=$VM_IMAGE -chardev

> >> >socket,id=char0,path=$CLIENT_SOCK_DIR/$SOCK0,server -netdev type=vhost-

> >> >user,id=mynet1,chardev=char0,vhostforce -device virtio-net-

> >> >pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=off -chardev

> >> >socket,id=char1,path=$CLIENT_SOCK_DIR/$SOCK1,server -netdev type=vhost-

> >> >user,id=mynet2,chardev=char1,vhostforce -device virtio-net-

> >> >pci,mac=00:00:00:00:00:02,netdev=mynet2,mrg_rxbuf=off --nographic

> >> >

> >> > Expected behavior

> >> > -----------------

> >> >With this fix OvS shouldn't crash.

> >> >

> >> > Test NUMA-Awareness feature

> >> > ===========================

> >> >Mempool names now contains the requested socket id and become like:

> >> >"ovs_4adb057e_1_2030_20512".

> >> >

> >> >Tested with DPDK 17.05.2 (from dpdk-stable branch).

> >> >NUMA-awareness feature enabled (DPDK/config/common_base).

> >> >

> >> >Created 1 single dpdkvhostuser port type.

> >> >OvS pmd-cpu-mask=FF00003     # enable cores on both numa nodes

> >> >QEMU core mask = 0xFC000     # cores for qemu on numa node 1 only

> >> >

> >> > Before launching the VM:

> >> > ------------------------

> >> >ovs-appctl dpif-netdev/pmd-rxq-show

> >> >shows core #1 is serving the vhu port.

> >> >

> >> >pmd thread numa_id 0 core_id 1:

> >> >        isolated : false

> >> >        port: dpdkvhostuser0    queue-id: 0

> >> >

> >> > After launching the VM:

> >> > -----------------------

> >> >the vhu port is now managed by core #27

> >> >pmd thread numa_id 1 core_id 27:

> >> >        isolated : false

> >> >        port: dpdkvhostuser0    queue-id: 0

> >> >

> >> >and the log shows a new mempool is allocated on NUMA node 1, while

> >> >the previous one is deleted:

> >> >

> >> >2017-10-06T14:04:55Z|00105|netdev_dpdk|DBG|Allocated

> >> >"ovs_4adb057e_1_2030_20512" mempool with 20512 mbufs

> >> >2017-10-06T14:04:55Z|00106|netdev_dpdk|DBG|Releasing

> >> >"ovs_4adb057e_0_2030_20512" mempool

> >> >---

> >> > lib/netdev-dpdk.c | 42 +++++++++++++++++++++++++-----------------

> >> > 1 file changed, 25 insertions(+), 17 deletions(-)

> >> >

> >> >diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c

> >> >index c60f46f..7f2d7ed 100644

> >> >--- a/lib/netdev-dpdk.c

> >> >+++ b/lib/netdev-dpdk.c

> >> >@@ -499,8 +499,8 @@ dpdk_mp_name(struct dpdk_mp *dmp)

> >> > {

> >> >     uint32_t h = hash_string(dmp->if_name, 0);

> >> >     char *mp_name = xcalloc(RTE_MEMPOOL_NAMESIZE, sizeof *mp_name);

> >> >-    int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, "ovs_%x_%d_%u",

> >> >-                       h, dmp->mtu, dmp->mp_size);

> >> >+    int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, "ovs_%x_%d_%d_%u",

> >> >+                       h, dmp->socket_id, dmp->mtu, dmp->mp_size);

> >> >     if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {

> >> >         return NULL;

> >> >     }

> >> >@@ -508,12 +508,13 @@ dpdk_mp_name(struct dpdk_mp *dmp)

> >> > }

> >> >

> >> > static struct dpdk_mp *

> >> >-dpdk_mp_create(struct netdev_dpdk *dev, int mtu)

> >> >+dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool *mp_exists)

> >> > {

> >> >     struct dpdk_mp *dmp = dpdk_rte_mzalloc(sizeof *dmp);

> >> >     if (!dmp) {

> >> >         return NULL;

> >> >     }

> >> >+    *mp_exists = false;

> >> >     dmp->socket_id = dev->requested_socket_id;

> >> >     dmp->mtu = mtu;

> >> >     ovs_strzcpy(dmp->if_name, dev->up.name, IFNAMSIZ);

> >> >@@ -530,15 +531,14 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)

> >> >             + MIN(RTE_MAX_LCORE, dev->requested_n_rxq) * NETDEV_MAX_BURST

> >> >             + MIN_NB_MBUF;

> >> >

> >> >-    bool mp_exists = false;

> >> >-

> >> >     do {

> >> >         char *mp_name = dpdk_mp_name(dmp);

> >> >

> >> >         VLOG_DBG("Requesting a mempool of %u mbufs for netdev %s "

> >> >-                 "with %d Rx and %d Tx queues.",

> >> >+                 "with %d Rx and %d Tx queues, socket id:%d.",

> >> >                  dmp->mp_size, dev->up.name,

> >> >-                 dev->requested_n_rxq, dev->requested_n_txq);

> >> >+                 dev->requested_n_rxq, dev->requested_n_txq,

> >> >+                 dev->requested_socket_id);

> >> >

> >> >         dmp->mp = rte_pktmbuf_pool_create(mp_name, dmp->mp_size,

> >> >                                           MP_CACHE_SZ,

> >> >@@ -559,7 +559,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)

> >> >             /* As the mempool create returned EEXIST we can expect the

> >> >              * lookup has returned a valid pointer.  If for some reason

> >> >              * that's not the case we keep track of it. */

> >> >-            mp_exists = true;

> >> >+            *mp_exists = true;

> >> >         } else {

> >> >             VLOG_ERR("Failed mempool \"%s\" create request of %u mbufs",

> >> >                      mp_name, dmp->mp_size);

> >> >@@ -573,7 +573,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)

> >> >             rte_mempool_obj_iter(dmp->mp, ovs_rte_pktmbuf_init, NULL);

> >> >             return dmp;

> >> >         }

> >> >-    } while (!mp_exists &&

> >> >+    } while (!(*mp_exists) &&

> >> >             (rte_errno == ENOMEM && (dmp->mp_size /= 2) >= MIN_NB_MBUF));

> >> >

> >> >     rte_free(dmp);

> >> >@@ -581,12 +581,12 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)

> >> > }

> >> >

> >> > static struct dpdk_mp *

> >> >-dpdk_mp_get(struct netdev_dpdk *dev, int mtu)

> >> >+dpdk_mp_get(struct netdev_dpdk *dev, int mtu, bool *mp_exists)

> >> > {

> >> >     struct dpdk_mp *dmp;

> >> >

> >> >     ovs_mutex_lock(&dpdk_mp_mutex);

> >> >-    dmp = dpdk_mp_create(dev, mtu);

> >> >+    dmp = dpdk_mp_create(dev, mtu, mp_exists);

> >> >     ovs_mutex_unlock(&dpdk_mp_mutex);

> >> >

> >> >     return dmp;

> >> >@@ -620,14 +620,22 @@ netdev_dpdk_mempool_configure(struct netdev_dpdk

> >*dev)

> >>

> >> Apologies for not noticing this in my previous review, but the comment for

> >this

> >> function needs to be updated.

> >>

> >> It currently reads:

> >>    " /* Tries to allocate new mempool on requested_socket_id with

> >> 	 * mbuf size corresponding to requested_mtu.

> >> 	 * On success new configuration will be applied.

> >> 	 * On error, device will be left unchanged. */"

> >>

> >> It should be updated to reflect the fact that it tries to allocate a new

> >> mempool, or reuse an existing one, where appropriate.

> >> Furthermore, if the error value is EEXIST, the new configuration (i.e. dev-

> >> >mtu, dev->max_packet_len) is applied, so the device is not unchanged, as

> >the

> >> comment suggests.

> >

> >[Antonio] Thanks, I'll change that.

> >

> >

> >>

> >> > {

> >> >     uint32_t buf_size = dpdk_buf_size(dev->requested_mtu);

> >> >     struct dpdk_mp *mp;

> >> >+    bool mp_exists;

> >>

> >> You don't need the 'mp_exists' variable.

> >>

> >> If, as part of the dpdk_mp_get() call stack, rte_mempool_lookup() fails,

> >then

> >> dpdk_mp_create() returns NULL, which is already handled by

> >> netdev_dpdk_mempool_configure(),

> >> (marked as [1], below).

> >

> >[Antonio]

> >If rte_mempool_lookup() fails a NULL is returned but rte_errno == ENOENT, so

> >it's no more EEXIST.

> 

> Yes, agreed.

> 

> >This means that in netdev_dpdk_mempool_configure() I should check both error

> >codes like:

> >if (rte_errno == EEXIST || rte_errno == ENOENT)

> 

> You won't need to do that - hopefully the following will make things clearer.

> 

> Consider the callstack, starting from netdev_dpdk_mempool_configure:


[Antonio] I get your point now. 
Cool, I'll follow your suggestion. 
Thanks for all your explanation!


> 

> [netdev_dpdk_mempool_configure]

>  mp = dpdk_mp_get()

> 

> 	[dpdk_mp_get]

> 	mp = dpdk_mp_create()

> 

> 		[dpdk_mp_create]

> 		dmp->mp = rte_pktmbuf_pool_create(....)

> 		# assume this returns NULL and sets rte_errno to EEXIST,

> 		# as an appropriate mempool already exists

> 

> 			if (dmp->mp) {

> 				# skip

> 			} else if (rte_errno == EEXIST) {

> 				dmp->mp = rte_mempool_lookup(...)

> 				# There are two possible scenarios here -

> 				# let's name them A and B, respectively, and

> follow their paths.

> 

> -------------------------------------------------------------------------------

> -------

> 

> 				### Scenario A - rte_mempool_lookup() returns NULL

> and sets ###

> 				### rte_errno to ENOENT

> ###

> 

> 				mp_exists = true;

>       	     } else {

>            			# skip

> 			}

> 			if (dmp->mp) {

> 				# skip

> 			}

> 		} while (!mp_exists) &&     # condition fails, as mp_exists is

> true

> 		...

> 		return NULL;

> 

> 	[dpdk_mp_get]

> 	return NULL

> 

> [netdev_dpdk_mempool_configure]

> if (!mp) {

> 	VLOG_ERR(...)

> 	return rte_errno        # return ENOENT

> 

> -------------------------------------------------------------------------------

> ------

> 

> 				### Scenario B - rte_mempool_lookup() returns a

> pointer to ###

> 				### the mempool; rte_errno remains set to EEXIST

> ###

> 				mp_exists = true;

>       	     } else {

>            			# skip

> 			}

> 			if (dmp->mp) {

> 				return dmp;   # return a pointer to the mempool

> 			}

> 	[dpdk_mp_get]

> 	return a pointer to the mempool

> 

> [netdev_dpdk_mempool_configure]

> if (!mp) {

> 	# skip

> } else if (rte_errno == EEXIST) {

> 	dev->mtu = dev->requested_mtu;

> 	dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);

> 	return rte_errno;              # this returns EEXIST

> }

> 

> >moreover who knows if in the future the rte_mempool_lookup() will be enriched

> >to return

> >even more error codes?

> 

> That point is moot IMO, as we should handle the API as it behaves currently,

> and not how it may do so at some potential future date.

> Just my $0.02 ;)

> 

> >Also, I think it's good to rely on rte_errno as long as you test it

> >immediately, I mean

> >right after the API call. In the case of netdev_dpdk_mempool_configure() the

> >rte_errno

> >was updated by dpdk_mp_create(), which is 2 levels below in the call stack.

> >That's why I'd prefer to keep track that we are re-using a mempool into an OVS

> >variable

> >and be less dependent by RTE return codes.

> 

> That value is set in the same thread in a defined call-stack; I don't see a

> disadvantage to referencing it here tbh.


[Antonio] ok.


> 

> >

> >BTW I could have added mp_exists variable into struct dpdk_mp but I avoided

> >that because

> >that goes inside struct netdev_dpdk and I didn't want to increase its size. So

> >I preferred

> >to add it as a new input parameter for the functions.

> >

> >>

> >> So, to handle the case where the mempool already exists - i.e. when in the

> >> callstack, rte_pktmbuf_pool_create() failed and set rte_errno to EEXIST -

> >you

> >> only need to check rte_errno here;

> >> (see [2], below).

> >>

> >> >

> >> >-    mp = dpdk_mp_get(dev, FRAME_LEN_TO_MTU(buf_size));

> >> >+    mp = dpdk_mp_get(dev, FRAME_LEN_TO_MTU(buf_size), &mp_exists);

> >>

> >> [1]

> >> >     if (!mp) {

> >> >         VLOG_ERR("Failed to create memory pool for netdev "

> >> >                  "%s, with MTU %d on socket %d: %s\n",

> >> >                  dev->up.name, dev->requested_mtu, dev-

> >>requested_socket_id,

> >> >                  rte_strerror(rte_errno));

> >> >         return rte_errno;

> >> >+    } else if (mp_exists) {

> >>

> >> [2]

> >> "if (rte_errno == EEXIST)" is sufficient here

> >>

> >> >+        /* If a new MTU was requested and its rounded value equals the one

> >> >+         * that is currently used, then the existing mempool is returned.

> >> >+         * Update dev with the new values. */

> >> >+        dev->mtu = dev->requested_mtu;

> >> >+        dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);

> >> >+        return EEXIST;

> >>

> >> Replace with "return rte_errno" for consistency with the previous return

> >(since

> >> the value of rte_errno is guaranteed to be EEXIST here).

> >

> >[Antonio]

> >All the callers - like dpdk_vhost_reconfigure_helper() - behave depending on

> >the

> >outcome of rte_pktmbuf_pool_create(), that could return an error for

> >insufficient

> >memory for example.

> >That's why in the previous return I'm returning rte_errno as it is. Instead

> >here

> >I'm overriding what could be returned by rte_mempool_lookup() and returning

> >EEXIST

> >because that's the information we want to report to the caller.

> 

> As described previously.


[Antonio] ok.


> 

> Thanks,

> Mark

> 

> >

> >

> >>

> >> >     } else {

> >> >         dpdk_mp_put(dev->dpdk_mp);

> >> >         dev->dpdk_mp = mp;

> >> >@@ -3207,7 +3215,7 @@ netdev_dpdk_reconfigure(struct netdev *netdev)

> >> >     rte_eth_dev_stop(dev->port_id);

> >> >

> >> >     err = netdev_dpdk_mempool_configure(dev);

> >> >-    if (err) {

> >> >+    if (err && err != EEXIST) {

> >> >         goto out;

> >> >     }

> >> >

> >> >@@ -3247,12 +3255,12 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk

> >*dev)

> >> >     netdev_dpdk_remap_txqs(dev);

> >> >

> >> >     err = netdev_dpdk_mempool_configure(dev);

> >> >-    if (err) {

> >> >-        return err;

> >> >-    } else {

> >> >+    if (!err) {

> >> >+        /* A new mempool was created. */

> >> >         netdev_change_seq_changed(&dev->up);

> >> >+    } else if (err != EEXIST){

> >> >+        return err;

> >> >     }

> >> >-

> >> >     if (netdev_dpdk_get_vid(dev) >= 0) {

> >> >         if (dev->vhost_reconfigured == false) {

> >> >             dev->vhost_reconfigured = true;

> >> >--

> >> >2.4.11
Fischetti, Antonio Oct. 18, 2017, 11:46 a.m. UTC | #5
> -----Original Message-----

> From: ovs-dev-bounces@openvswitch.org [mailto:ovs-dev-bounces@openvswitch.org]

> On Behalf Of Fischetti, Antonio

> Sent: Wednesday, October 18, 2017 10:43 AM

> To: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; dev@openvswitch.org

> Subject: Re: [ovs-dev] [PATCH v6 1/5] netdev-dpdk: fix management of pre-

> existing mempools.

> 

> 

> 

> > -----Original Message-----

> > From: Kavanagh, Mark B

> > Sent: Tuesday, October 17, 2017 10:14 PM

> > To: Fischetti, Antonio <antonio.fischetti@intel.com>; dev@openvswitch.org

> > Cc: Kevin Traynor <ktraynor@redhat.com>; Aaron Conole <aconole@redhat.com>;

> > Darrell Ball <dlu998@gmail.com>

> > Subject: RE: [PATCH v6 1/5] netdev-dpdk: fix management of pre-existing

> > mempools.

> >

> > >From: Fischetti, Antonio

> > >Sent: Tuesday, October 17, 2017 6:04 PM

> > >To: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; dev@openvswitch.org

> > >Cc: Kevin Traynor <ktraynor@redhat.com>; Aaron Conole <aconole@redhat.com>;

> > >Darrell Ball <dlu998@gmail.com>

> > >Subject: RE: [PATCH v6 1/5] netdev-dpdk: fix management of pre-existing

> > >mempools.

> > >

> > >Thanks Mark, comments inline.

> > >

> > >-Antonio

> > >

> > >> -----Original Message-----

> > >> From: Kavanagh, Mark B

> > >> Sent: Tuesday, October 17, 2017 2:34 PM

> > >> To: Fischetti, Antonio <antonio.fischetti@intel.com>; dev@openvswitch.org

> > >> Cc: Kevin Traynor <ktraynor@redhat.com>; Aaron Conole

> <aconole@redhat.com>;

> > >> Darrell Ball <dlu998@gmail.com>

> > >> Subject: RE: [PATCH v6 1/5] netdev-dpdk: fix management of pre-existing

> > >> mempools.

> > >>

> > >> >From: Fischetti, Antonio

> > >> >Sent: Monday, October 16, 2017 2:15 PM

> > >> >To: dev@openvswitch.org

> > >> >Cc: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Kevin Traynor

> > >> ><ktraynor@redhat.com>; Aaron Conole <aconole@redhat.com>; Darrell Ball

> > >> ><dlu998@gmail.com>; Fischetti, Antonio <antonio.fischetti@intel.com>

> > >> >Subject: [PATCH v6 1/5] netdev-dpdk: fix management of pre-existing

> > >mempools.

> > >> >

> > >> >Fix issues on reconfiguration of pre-existing mempools.

> > >> >This patch avoids to call dpdk_mp_put() - and erroneously

> > >> >release the mempool - when it already exists.

> > >> >Create mempool names by considering also the NUMA socket number.

> > >> >So a name reflects what socket the mempool is allocated on.

> > >> >This change is needed for the NUMA-awareness feature.

> > >>

> > >> Hi Antonio,

> > >>

> > >> Is there any particular reason why you've combined patches 1 and 2 of the

> > >> previous series in a single patch here?

> > >>

> > >> I would have thought that these two separate issues would warrant two

> > >> individual patches (particularly with respect to the reported-by, tested-

> by

> > >> tags).

> > >

> > >[Antonio]

> > >I guess I misunderstood your previous review where you asked to squash

> patches

> > >1 and 3 into one patch.

> >

> > Hi Antonio,

> >

> > I figured as much ;)

> >

> > >I understood instead to squash the first 2 patches because they were both

> bug-

> > >fixes.

> > >In the next version v7 I'll restore the 2 separate patches.

> >

> > Thanks - I think that's a much cleaner approach.

> >

> > >

> > >>

> > >> Maybe it's not a big deal, but noted here nonetheless.

> > >>

> > >> Apart from that, there are some comments inline.

> > >>

> > >> Thanks again,

> > >> Mark

> > >>

> > >> >

> > >> >CC: Mark B Kavanagh <mark.b.kavanagh@intel.com>

> > >> >CC: Kevin Traynor <ktraynor@redhat.com>

> > >> >CC: Aaron Conole <aconole@redhat.com>

> > >> >CC: Darrell Ball <dlu998@gmail.com>

> > >> >Reported-by: Ciara Loftus <ciara.loftus@intel.com>

> > >> >Tested-by: Ciara Loftus <ciara.loftus@intel.com>

> > >> >Reported-by: Róbert Mulik <robert.mulik@ericsson.com>

> > >> >Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each

> > >> >port.")

> > >> >Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>

> > >> >---

> > >> > Test on releasing pre-existing mempools

> > >> > =======================================

> > >> >I've tested this patch by

> > >> >  - changing at run-time the number of Rx queues:

> > >> >      ovs-vsctl set Interface dpdk0 type=dpdk options:n_rxq=4

> > >> >

> > >> >  - reducing the MTU of the dpdk ports of 1 byte to force

> > >> >    the configuration of an existing mempool:

> > >> >      ovs-vsctl set Interface dpdk0 mtu_request=1499

> > >> >

> > >> >This issue was observed in a PVP test topology with dpdkvhostuserclient

> > >> >ports. It can happen also with dpdk type ports, eg by reducing the MTU

> > >> >of 1 byte.

> > >> >

> > >> >To replicate the bug scenario in the PVP case it's sufficient to

> > >> >set 1 dpdkvhostuserclient port, and just boot the VM.

> > >> >

> > >> >Below some more details on my own test setup.

> > >> >

> > >> > PVP test setup

> > >> > --------------

> > >> >CLIENT_SOCK_DIR=/tmp

> > >> >SOCK0=dpdkvhostuser0

> > >> >SOCK1=dpdkvhostuser1

> > >> >

> > >> >1 PMD

> > >> >Add 2 dpdk ports, n_rxq=1

> > >> >Add 2 vhu ports both of type dpdkvhostuserclient and specify vhost-

> server-

> > >path

> > >> > ovs-vsctl set Interface dpdkvhostuser0 options:vhost-server-

> > >> >path="$CLIENT_SOCK_DIR/$SOCK0"

> > >> > ovs-vsctl set Interface dpdkvhostuser1 options:vhost-server-

> > >> >path="$CLIENT_SOCK_DIR/$SOCK1"

> > >> >

> > >> >Set port-based rules: dpdk0 <--> vhu0 and dpdk1 <--> vhu1

> > >> > add-flow br0 in_port=1,action=output:3

> > >> > add-flow br0 in_port=3,action=output:1

> > >> > add-flow br0 in_port=4,action=output:2

> > >> > add-flow br0 in_port=2,action=output:4

> > >> >

> > >> > Launch QEMU

> > >> > -----------

> > >> >As OvS vhu ports are acting as clients, we must specify 'server' in the

> > >next

> > >> >command.

> > >> >VM_IMAGE=<path/to/your/vm/image>

> > >> >

> > >> > sudo -E taskset 0x3F00 $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64 -name

> > >us-

> > >> >vhost-vm1 -cpu host -enable-kvm -m 4096M -object memory-backend-

> > >> >file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on -numa

> > >node,memdev=mem

> > >> >-mem-prealloc -smp 4 -drive file=$VM_IMAGE -chardev

> > >> >socket,id=char0,path=$CLIENT_SOCK_DIR/$SOCK0,server -netdev type=vhost-

> > >> >user,id=mynet1,chardev=char0,vhostforce -device virtio-net-

> > >> >pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=off -chardev

> > >> >socket,id=char1,path=$CLIENT_SOCK_DIR/$SOCK1,server -netdev type=vhost-

> > >> >user,id=mynet2,chardev=char1,vhostforce -device virtio-net-

> > >> >pci,mac=00:00:00:00:00:02,netdev=mynet2,mrg_rxbuf=off --nographic

> > >> >

> > >> > Expected behavior

> > >> > -----------------

> > >> >With this fix OvS shouldn't crash.

> > >> >

> > >> > Test NUMA-Awareness feature

> > >> > ===========================

> > >> >Mempool names now contains the requested socket id and become like:

> > >> >"ovs_4adb057e_1_2030_20512".

> > >> >

> > >> >Tested with DPDK 17.05.2 (from dpdk-stable branch).

> > >> >NUMA-awareness feature enabled (DPDK/config/common_base).

> > >> >

> > >> >Created 1 single dpdkvhostuser port type.

> > >> >OvS pmd-cpu-mask=FF00003     # enable cores on both numa nodes

> > >> >QEMU core mask = 0xFC000     # cores for qemu on numa node 1 only

> > >> >

> > >> > Before launching the VM:

> > >> > ------------------------

> > >> >ovs-appctl dpif-netdev/pmd-rxq-show

> > >> >shows core #1 is serving the vhu port.

> > >> >

> > >> >pmd thread numa_id 0 core_id 1:

> > >> >        isolated : false

> > >> >        port: dpdkvhostuser0    queue-id: 0

> > >> >

> > >> > After launching the VM:

> > >> > -----------------------

> > >> >the vhu port is now managed by core #27

> > >> >pmd thread numa_id 1 core_id 27:

> > >> >        isolated : false

> > >> >        port: dpdkvhostuser0    queue-id: 0

> > >> >

> > >> >and the log shows a new mempool is allocated on NUMA node 1, while

> > >> >the previous one is deleted:

> > >> >

> > >> >2017-10-06T14:04:55Z|00105|netdev_dpdk|DBG|Allocated

> > >> >"ovs_4adb057e_1_2030_20512" mempool with 20512 mbufs

> > >> >2017-10-06T14:04:55Z|00106|netdev_dpdk|DBG|Releasing

> > >> >"ovs_4adb057e_0_2030_20512" mempool

> > >> >---

> > >> > lib/netdev-dpdk.c | 42 +++++++++++++++++++++++++-----------------

> > >> > 1 file changed, 25 insertions(+), 17 deletions(-)

> > >> >

> > >> >diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c

> > >> >index c60f46f..7f2d7ed 100644

> > >> >--- a/lib/netdev-dpdk.c

> > >> >+++ b/lib/netdev-dpdk.c

> > >> >@@ -499,8 +499,8 @@ dpdk_mp_name(struct dpdk_mp *dmp)

> > >> > {

> > >> >     uint32_t h = hash_string(dmp->if_name, 0);

> > >> >     char *mp_name = xcalloc(RTE_MEMPOOL_NAMESIZE, sizeof *mp_name);

> > >> >-    int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, "ovs_%x_%d_%u",

> > >> >-                       h, dmp->mtu, dmp->mp_size);

> > >> >+    int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, "ovs_%x_%d_%d_%u",

> > >> >+                       h, dmp->socket_id, dmp->mtu, dmp->mp_size);

> > >> >     if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {

> > >> >         return NULL;

> > >> >     }

> > >> >@@ -508,12 +508,13 @@ dpdk_mp_name(struct dpdk_mp *dmp)

> > >> > }

> > >> >

> > >> > static struct dpdk_mp *

> > >> >-dpdk_mp_create(struct netdev_dpdk *dev, int mtu)

> > >> >+dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool *mp_exists)

> > >> > {

> > >> >     struct dpdk_mp *dmp = dpdk_rte_mzalloc(sizeof *dmp);

> > >> >     if (!dmp) {

> > >> >         return NULL;

> > >> >     }

> > >> >+    *mp_exists = false;

> > >> >     dmp->socket_id = dev->requested_socket_id;

> > >> >     dmp->mtu = mtu;

> > >> >     ovs_strzcpy(dmp->if_name, dev->up.name, IFNAMSIZ);

> > >> >@@ -530,15 +531,14 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)

> > >> >             + MIN(RTE_MAX_LCORE, dev->requested_n_rxq) *

> NETDEV_MAX_BURST

> > >> >             + MIN_NB_MBUF;

> > >> >

> > >> >-    bool mp_exists = false;

> > >> >-

> > >> >     do {

> > >> >         char *mp_name = dpdk_mp_name(dmp);

> > >> >

> > >> >         VLOG_DBG("Requesting a mempool of %u mbufs for netdev %s "

> > >> >-                 "with %d Rx and %d Tx queues.",

> > >> >+                 "with %d Rx and %d Tx queues, socket id:%d.",

> > >> >                  dmp->mp_size, dev->up.name,

> > >> >-                 dev->requested_n_rxq, dev->requested_n_txq);

> > >> >+                 dev->requested_n_rxq, dev->requested_n_txq,

> > >> >+                 dev->requested_socket_id);

> > >> >

> > >> >         dmp->mp = rte_pktmbuf_pool_create(mp_name, dmp->mp_size,

> > >> >                                           MP_CACHE_SZ,

> > >> >@@ -559,7 +559,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)

> > >> >             /* As the mempool create returned EEXIST we can expect the

> > >> >              * lookup has returned a valid pointer.  If for some reason

> > >> >              * that's not the case we keep track of it. */

> > >> >-            mp_exists = true;

> > >> >+            *mp_exists = true;

> > >> >         } else {

> > >> >             VLOG_ERR("Failed mempool \"%s\" create request of %u mbufs",

> > >> >                      mp_name, dmp->mp_size);

> > >> >@@ -573,7 +573,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)

> > >> >             rte_mempool_obj_iter(dmp->mp, ovs_rte_pktmbuf_init, NULL);

> > >> >             return dmp;

> > >> >         }

> > >> >-    } while (!mp_exists &&

> > >> >+    } while (!(*mp_exists) &&

> > >> >             (rte_errno == ENOMEM && (dmp->mp_size /= 2) >=

> MIN_NB_MBUF));

> > >> >

> > >> >     rte_free(dmp);

> > >> >@@ -581,12 +581,12 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)

> > >> > }

> > >> >

> > >> > static struct dpdk_mp *

> > >> >-dpdk_mp_get(struct netdev_dpdk *dev, int mtu)

> > >> >+dpdk_mp_get(struct netdev_dpdk *dev, int mtu, bool *mp_exists)

> > >> > {

> > >> >     struct dpdk_mp *dmp;

> > >> >

> > >> >     ovs_mutex_lock(&dpdk_mp_mutex);

> > >> >-    dmp = dpdk_mp_create(dev, mtu);

> > >> >+    dmp = dpdk_mp_create(dev, mtu, mp_exists);

> > >> >     ovs_mutex_unlock(&dpdk_mp_mutex);

> > >> >

> > >> >     return dmp;

> > >> >@@ -620,14 +620,22 @@ netdev_dpdk_mempool_configure(struct netdev_dpdk

> > >*dev)

> > >>

> > >> Apologies for not noticing this in my previous review, but the comment for

> > >this

> > >> function needs to be updated.

> > >>

> > >> It currently reads:

> > >>    " /* Tries to allocate new mempool on requested_socket_id with

> > >> 	 * mbuf size corresponding to requested_mtu.

> > >> 	 * On success new configuration will be applied.

> > >> 	 * On error, device will be left unchanged. */"

> > >>

> > >> It should be updated to reflect the fact that it tries to allocate a new

> > >> mempool, or reuse an existing one, where appropriate.

> > >> Furthermore, if the error value is EEXIST, the new configuration (i.e.

> dev-

> > >> >mtu, dev->max_packet_len) is applied, so the device is not unchanged, as

> > >the

> > >> comment suggests.

> > >

> > >[Antonio] Thanks, I'll change that.

> > >

> > >

> > >>

> > >> > {

> > >> >     uint32_t buf_size = dpdk_buf_size(dev->requested_mtu);

> > >> >     struct dpdk_mp *mp;

> > >> >+    bool mp_exists;

> > >>

> > >> You don't need the 'mp_exists' variable.

> > >>

> > >> If, as part of the dpdk_mp_get() call stack, rte_mempool_lookup() fails,

> > >then

> > >> dpdk_mp_create() returns NULL, which is already handled by

> > >> netdev_dpdk_mempool_configure(),

> > >> (marked as [1], below).

> > >

> > >[Antonio]

> > >If rte_mempool_lookup() fails a NULL is returned but rte_errno == ENOENT, so

> > >it's no more EEXIST.

> >

> > Yes, agreed.

> >

> > >This means that in netdev_dpdk_mempool_configure() I should check both error

> > >codes like:

> > >if (rte_errno == EEXIST || rte_errno == ENOENT)

> >

> > You won't need to do that - hopefully the following will make things clearer.

> >

> > Consider the callstack, starting from netdev_dpdk_mempool_configure:

> 

> [Antonio] I get your point now.

> Cool, I'll follow your suggestion.

> Thanks for all your explanation!


[Antonio] Hmm, unfortunately it doesn't work, because rte_errno is only
updated when an API call fails.
If an API call succeeds, rte_errno is not touched and still contains
an old value.
In fact I've seen cases where I requested a bigger MTU - so as to force the
creation of a new mempool - and even though rte_pktmbuf_pool_create() succeeds
and does create a new mempool, rte_errno is still EEXIST (=17) because it refers
to a previous call to rte_pktmbuf_pool_create() that failed.

So, in netdev_dpdk_mempool_configure(), even when
	mp = dpdk_mp_get(...

returns a valid mp pointer, I can't rely on the rte_errno value to distinguish
a brand new mempool from a pre-existing one.
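
To make that concrete, here's a tiny standalone C sketch (plain C, not DPDK or
OVS code; pool_create() and fake_rte_errno are made-up names for illustration)
showing how an error variable that is written only on failure keeps a stale
EEXIST across a later successful call:

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

static int fake_rte_errno;   /* stand-in for rte_errno: written only on failure */

/* Pretend pool creator: 'name_taken' simulates a pre-existing mempool. */
static bool
pool_create(const char *name, bool name_taken)
{
    (void) name;
    if (name_taken) {
        fake_rte_errno = EEXIST;  /* only the failure path touches the error */
        return false;
    }
    return true;                  /* success leaves fake_rte_errno untouched */
}

int
main(void)
{
    bool created;

    pool_create("ovs_mp_1500", true);             /* 1st request: pool exists */
    created = pool_create("ovs_mp_9000", false);  /* bigger MTU: brand new pool */

    /* Creation succeeded, yet the stale error value still reads EEXIST. */
    printf("created=%d, stale EEXIST=%d\n", created, fake_rte_errno == EEXIST);
    return 0;
}

That is the same situation as above: a valid pointer back from dpdk_mp_get()
while rte_errno still holds 17, so an explicit mp_exists out-parameter is the
reliable signal.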


> 

> 

> >

> > [netdev_dpdk_mempool_configure]

> >  mp = dpdk_mp_get()

> >

> > 	[dpdk_mp_get]

> > 	mp = dpdk_mp_create()

> >

> > 		[dpdk_mp_create]

> > 		dmp->mp = rte_pktmbuf_pool_create(....)

> > 		# assume this returns NULL and sets rte_errno to EEXIST,

> > 		# as an appropriate mempool already exists

> >

> > 			if (dmp->mp) {

> > 				# skip

> > 			} else if (rte_errno == EEXIST) {

> > 				dmp->mp = rte_mempool_lookup(...)

> > 				# There are two possible scenarios here -

> > 				# let's name them A and B, respectively, and

> > follow their paths.

> >

> > -----------------------------------------------------------------------------

> --

> > -------

> >

> > 				### Scenario A - rte_mempool_lookup() returns NULL

> > and sets ###

> > 				### rte_errno to ENOENT

> > ###

> >

> > 				mp_exists = true;

> >       	     } else {

> >            			# skip

> > 			}

> > 			if (dmp->mp) {

> > 				# skip

> > 			}

> > 		} while (!mp_exists) &&     # condition fails, as mp_exists is

> > true

> > 		...

> > 		return NULL;

> >

> > 	[dpdk_mp_get]

> > 	return NULL

> >

> > [netdev_dpdk_mempool_configure]

> > if (!mp) {

> > 	VLOG_ERR(...)

> > 	return rte_errno        # return ENOENT

> >

> > -----------------------------------------------------------------------------

> --

> > ------

> >

> > 				### Scenario B - rte_mempool_lookup() returns a

> > pointer to ###

> > 				### the mempool; rte_errno remains set to EEXIST

> > ###

> > 				mp_exists = true;

> >       	     } else {

> >            			# skip

> > 			}

> > 			if (dmp->mp) {

> > 				return dmp;   # return a pointer to the mempool

> > 			}

> > 	[dpdk_mp_get]

> > 	return a pointer to the mempool

> >

> > [netdev_dpdk_mempool_configure]

> > if (!mp) {

> > 	# skip

> > } else if (rte_errno == EEXIST) {

> > 	dev->mtu = dev->requested_mtu;

> > 	dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);

> > 	return rte_errno;              # this returns EEXIST

> > }

> >

> > >moreover who knows if in the future the rte_mempool_lookup() will be

> enriched

> > >to return

> > >even more error codes?

> >

> > That point is moot IMO, as we should handle the API as it behaves currently,

> > and not how it may do so at some potential future date.

> > Just my $0.02 ;)

> >

> > >Also, I think it's good to rely on rte_errno as long as you test it

> > >immediately, I mean

> > >right after the API call. In the case of netdev_dpdk_mempool_configure() the

> > >rte_errno

> > >was updated by dpdk_mp_create(), which is 2 levels below in the call stack.

> > >That's why I'd prefer to keep track that we are re-using a mempool into an

> OVS

> > >variable

> > >and be less dependent by RTE return codes.

> >

> > That value is set in the same thread in a defined call-stack; I don't see a

> > disadvantage to referencing it here tbh.

> 

> [Antonio] ok.

> 

> 

> >

> > >

> > >BTW I could have added mp_exists variable into struct dpdk_mp but I avoided

> > >that because

> > >that goes inside struct netdev_dpdk and I didn't want to increase its size.

> So

> > >I preferred

> > >to add it as a new input parameter for the functions.

> > >

> > >>

> > >> So, to handle the case where the mempool already exists - i.e. when in the

> > >> callstack, rte_pktmbuf_pool_create() failed and set rte_errno to EEXIST -

> > >you

> > >> only need to check rte_errno here;

> > >> (see [2], below).

> > >>

> > >> >

> > >> >-    mp = dpdk_mp_get(dev, FRAME_LEN_TO_MTU(buf_size));

> > >> >+    mp = dpdk_mp_get(dev, FRAME_LEN_TO_MTU(buf_size), &mp_exists);

> > >>

> > >> [1]

> > >> >     if (!mp) {

> > >> >         VLOG_ERR("Failed to create memory pool for netdev "

> > >> >                  "%s, with MTU %d on socket %d: %s\n",

> > >> >                  dev->up.name, dev->requested_mtu, dev-

> > >>requested_socket_id,

> > >> >                  rte_strerror(rte_errno));

> > >> >         return rte_errno;

> > >> >+    } else if (mp_exists) {

> > >>

> > >> [2]

> > >> "if (rte_errno == EEXIST)" is sufficient here

> > >>

> > >> >+        /* If a new MTU was requested and its rounded value equals the

> one

> > >> >+         * that is currently used, then the existing mempool is

> returned.

> > >> >+         * Update dev with the new values. */

> > >> >+        dev->mtu = dev->requested_mtu;

> > >> >+        dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);

> > >> >+        return EEXIST;

> > >>

> > >> Replace with "return rte_errno" for consistency with the previous return

> > >(since

> > >> the value of rte_errno is guaranteed to be EEXIST here).

> > >

> > >[Antonio]

> > >All the callers - like dpdk_vhost_reconfigure_helper() - behave depending on

> > >the

> > >outcome of rte_pktmbuf_pool_create(), that could return an error for

> > >insufficient

> > >memory for example.

> > >That's why in the previous return I'm returning rte_errno as it is. Instead

> > >here

> > >I'm overriding what could be returned by rte_mempool_lookup() and returning

> > >EEXIST

> > >because that's the information we want to report to the caller.

> >

> > As described previously.

> 

> [Antonio] ok.

> 

> 

> >

> > Thanks,

> > Mark

> >

> > >

> > >

> > >>

> > >> >     } else {

> > >> >         dpdk_mp_put(dev->dpdk_mp);

> > >> >         dev->dpdk_mp = mp;

> > >> >@@ -3207,7 +3215,7 @@ netdev_dpdk_reconfigure(struct netdev *netdev)

> > >> >     rte_eth_dev_stop(dev->port_id);

> > >> >

> > >> >     err = netdev_dpdk_mempool_configure(dev);

> > >> >-    if (err) {

> > >> >+    if (err && err != EEXIST) {

> > >> >         goto out;

> > >> >     }

> > >> >

> > >> >@@ -3247,12 +3255,12 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk

> > >*dev)

> > >> >     netdev_dpdk_remap_txqs(dev);

> > >> >

> > >> >     err = netdev_dpdk_mempool_configure(dev);

> > >> >-    if (err) {

> > >> >-        return err;

> > >> >-    } else {

> > >> >+    if (!err) {

> > >> >+        /* A new mempool was created. */

> > >> >         netdev_change_seq_changed(&dev->up);

> > >> >+    } else if (err != EEXIST){

> > >> >+        return err;

> > >> >     }

> > >> >-

> > >> >     if (netdev_dpdk_get_vid(dev) >= 0) {

> > >> >         if (dev->vhost_reconfigured == false) {

> > >> >             dev->vhost_reconfigured = true;

> > >> >--

> > >> >2.4.11

> 

> _______________________________________________

> dev mailing list

> dev@openvswitch.org

> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Mark Kavanagh Oct. 18, 2017, 11:52 a.m. UTC | #6
>From: Fischetti, Antonio

>Sent: Wednesday, October 18, 2017 12:47 PM

>To: Fischetti, Antonio <antonio.fischetti@intel.com>; Kavanagh, Mark B

><mark.b.kavanagh@intel.com>; dev@openvswitch.org

>Subject: RE: [ovs-dev] [PATCH v6 1/5] netdev-dpdk: fix management of pre-

>existing mempools.

>

>

>

>> -----Original Message-----

>> From: ovs-dev-bounces@openvswitch.org [mailto:ovs-dev-

>bounces@openvswitch.org]

>> On Behalf Of Fischetti, Antonio

>> Sent: Wednesday, October 18, 2017 10:43 AM

>> To: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; dev@openvswitch.org

>> Subject: Re: [ovs-dev] [PATCH v6 1/5] netdev-dpdk: fix management of pre-

>> existing mempools.

>>

>>

>>

>> > -----Original Message-----

>> > From: Kavanagh, Mark B

>> > Sent: Tuesday, October 17, 2017 10:14 PM

>> > To: Fischetti, Antonio <antonio.fischetti@intel.com>; dev@openvswitch.org

>> > Cc: Kevin Traynor <ktraynor@redhat.com>; Aaron Conole

><aconole@redhat.com>;

>> > Darrell Ball <dlu998@gmail.com>

>> > Subject: RE: [PATCH v6 1/5] netdev-dpdk: fix management of pre-existing

>> > mempools.

>> >

>> > >From: Fischetti, Antonio

>> > >Sent: Tuesday, October 17, 2017 6:04 PM

>> > >To: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; dev@openvswitch.org

>> > >Cc: Kevin Traynor <ktraynor@redhat.com>; Aaron Conole

><aconole@redhat.com>;

>> > >Darrell Ball <dlu998@gmail.com>

>> > >Subject: RE: [PATCH v6 1/5] netdev-dpdk: fix management of pre-existing

>> > >mempools.

>> > >

>> > >Thanks Mark, comments inline.

>> > >

>> > >-Antonio

>> > >

>> > >> -----Original Message-----

>> > >> From: Kavanagh, Mark B

>> > >> Sent: Tuesday, October 17, 2017 2:34 PM

>> > >> To: Fischetti, Antonio <antonio.fischetti@intel.com>;

>dev@openvswitch.org

>> > >> Cc: Kevin Traynor <ktraynor@redhat.com>; Aaron Conole

>> <aconole@redhat.com>;

>> > >> Darrell Ball <dlu998@gmail.com>

>> > >> Subject: RE: [PATCH v6 1/5] netdev-dpdk: fix management of pre-existing

>> > >> mempools.

>> > >>

>> > >> >From: Fischetti, Antonio

>> > >> >Sent: Monday, October 16, 2017 2:15 PM

>> > >> >To: dev@openvswitch.org

>> > >> >Cc: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Kevin Traynor

>> > >> ><ktraynor@redhat.com>; Aaron Conole <aconole@redhat.com>; Darrell Ball

>> > >> ><dlu998@gmail.com>; Fischetti, Antonio <antonio.fischetti@intel.com>

>> > >> >Subject: [PATCH v6 1/5] netdev-dpdk: fix management of pre-existing

>> > >mempools.

>> > >> >

>> > >> >Fix issues on reconfiguration of pre-existing mempools.

>> > >> >This patch avoids to call dpdk_mp_put() - and erroneously

>> > >> >release the mempool - when it already exists.

>> > >> >Create mempool names by considering also the NUMA socket number.

>> > >> >So a name reflects what socket the mempool is allocated on.

>> > >> >This change is needed for the NUMA-awareness feature.

>> > >>

>> > >> Hi Antonio,

>> > >>

>> > >> Is there any particular reason why you've combined patches 1 and 2 of

>the

>> > >> previous series in a single patch here?

>> > >>

>> > >> I would have thought that these two separate issues would warrant two

>> > >> individual patches (particularly with respect to the reported-by,

>tested-

>> by

>> > >> tags).

>> > >

>> > >[Antonio]

>> > >I guess I misunderstood your previous review where you asked to squash

>> patches

>> > >1 and 3 into one patch.

>> >

>> > Hi Antonio,

>> >

>> > I figured as much ;)

>> >

>> > >I understood instead to squash the first 2 patches because they were both

>> bug-

>> > >fixes.

>> > >In the next version v7 I'll restore the 2 separate patches.

>> >

>> > Thanks - I think that's a much cleaner approach.

>> >

>> > >

>> > >>

>> > >> Maybe it's not a big deal, but noted here nonetheless.

>> > >>

>> > >> Apart from that, there are some comments inline.

>> > >>

>> > >> Thanks again,

>> > >> Mark

>> > >>

>> > >> >

>> > >> >CC: Mark B Kavanagh <mark.b.kavanagh@intel.com>

>> > >> >CC: Kevin Traynor <ktraynor@redhat.com>

>> > >> >CC: Aaron Conole <aconole@redhat.com>

>> > >> >CC: Darrell Ball <dlu998@gmail.com>

>> > >> >Reported-by: Ciara Loftus <ciara.loftus@intel.com>

>> > >> >Tested-by: Ciara Loftus <ciara.loftus@intel.com>

>> > >> >Reported-by: Róbert Mulik <robert.mulik@ericsson.com>

>> > >> >Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for

>each

>> > >> >port.")

>> > >> >Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>

>> > >> >---

>> > >> > Test on releasing pre-existing mempools

>> > >> > =======================================

>> > >> >I've tested this patch by

>> > >> >  - changing at run-time the number of Rx queues:

>> > >> >      ovs-vsctl set Interface dpdk0 type=dpdk options:n_rxq=4

>> > >> >

>> > >> >  - reducing the MTU of the dpdk ports of 1 byte to force

>> > >> >    the configuration of an existing mempool:

>> > >> >      ovs-vsctl set Interface dpdk0 mtu_request=1499

>> > >> >

>> > >> >This issue was observed in a PVP test topology with

>dpdkvhostuserclient

>> > >> >ports. It can happen also with dpdk type ports, eg by reducing the MTU

>> > >> >of 1 byte.

>> > >> >

>> > >> >To replicate the bug scenario in the PVP case it's sufficient to

>> > >> >set 1 dpdkvhostuserclient port, and just boot the VM.

>> > >> >

>> > >> >Below some more details on my own test setup.

>> > >> >

>> > >> > PVP test setup

>> > >> > --------------

>> > >> >CLIENT_SOCK_DIR=/tmp

>> > >> >SOCK0=dpdkvhostuser0

>> > >> >SOCK1=dpdkvhostuser1

>> > >> >

>> > >> >1 PMD

>> > >> >Add 2 dpdk ports, n_rxq=1

>> > >> >Add 2 vhu ports both of type dpdkvhostuserclient and specify vhost-

>> server-

>> > >path

>> > >> > ovs-vsctl set Interface dpdkvhostuser0 options:vhost-server-

>> > >> >path="$CLIENT_SOCK_DIR/$SOCK0"

>> > >> > ovs-vsctl set Interface dpdkvhostuser1 options:vhost-server-

>> > >> >path="$CLIENT_SOCK_DIR/$SOCK1"

>> > >> >

>> > >> >Set port-based rules: dpdk0 <--> vhu0 and dpdk1 <--> vhu1

>> > >> > add-flow br0 in_port=1,action=output:3

>> > >> > add-flow br0 in_port=3,action=output:1

>> > >> > add-flow br0 in_port=4,action=output:2

>> > >> > add-flow br0 in_port=2,action=output:4

>> > >> >

>> > >> > Launch QEMU

>> > >> > -----------

>> > >> >As OvS vhu ports are acting as clients, we must specify 'server' in

>the

>> > >next

>> > >> >command.

>> > >> >VM_IMAGE=<path/to/your/vm/image>

>> > >> >

>> > >> > sudo -E taskset 0x3F00 $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64 -

>name

>> > >us-

>> > >> >vhost-vm1 -cpu host -enable-kvm -m 4096M -object memory-backend-

>> > >> >file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on -numa

>> > >node,memdev=mem

>> > >> >-mem-prealloc -smp 4 -drive file=$VM_IMAGE -chardev

>> > >> >socket,id=char0,path=$CLIENT_SOCK_DIR/$SOCK0,server -netdev

>type=vhost-

>> > >> >user,id=mynet1,chardev=char0,vhostforce -device virtio-net-

>> > >> >pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=off -chardev

>> > >> >socket,id=char1,path=$CLIENT_SOCK_DIR/$SOCK1,server -netdev

>type=vhost-

>> > >> >user,id=mynet2,chardev=char1,vhostforce -device virtio-net-

>> > >> >pci,mac=00:00:00:00:00:02,netdev=mynet2,mrg_rxbuf=off --nographic

>> > >> >

>> > >> > Expected behavior

>> > >> > -----------------

>> > >> >With this fix OvS shouldn't crash.

>> > >> >

>> > >> > Test NUMA-Awareness feature

>> > >> > ===========================

>> > >> >Mempool names now contains the requested socket id and become like:

>> > >> >"ovs_4adb057e_1_2030_20512".

>> > >> >

>> > >> >Tested with DPDK 17.05.2 (from dpdk-stable branch).

>> > >> >NUMA-awareness feature enabled (DPDK/config/common_base).

>> > >> >

>> > >> >Created 1 single dpdkvhostuser port type.

>> > >> >OvS pmd-cpu-mask=FF00003     # enable cores on both numa nodes

>> > >> >QEMU core mask = 0xFC000     # cores for qemu on numa node 1 only

>> > >> >

>> > >> > Before launching the VM:

>> > >> > ------------------------

>> > >> >ovs-appctl dpif-netdev/pmd-rxq-show

>> > >> >shows core #1 is serving the vhu port.

>> > >> >

>> > >> >pmd thread numa_id 0 core_id 1:

>> > >> >        isolated : false

>> > >> >        port: dpdkvhostuser0    queue-id: 0

>> > >> >

>> > >> > After launching the VM:

>> > >> > -----------------------

>> > >> >the vhu port is now managed by core #27

>> > >> >pmd thread numa_id 1 core_id 27:

>> > >> >        isolated : false

>> > >> >        port: dpdkvhostuser0    queue-id: 0

>> > >> >

>> > >> >and the log shows a new mempool is allocated on NUMA node 1, while

>> > >> >the previous one is deleted:

>> > >> >

>> > >> >2017-10-06T14:04:55Z|00105|netdev_dpdk|DBG|Allocated

>> > >> >"ovs_4adb057e_1_2030_20512" mempool with 20512 mbufs

>> > >> >2017-10-06T14:04:55Z|00106|netdev_dpdk|DBG|Releasing

>> > >> >"ovs_4adb057e_0_2030_20512" mempool

>> > >> >---

>> > >> > lib/netdev-dpdk.c | 42 +++++++++++++++++++++++++-----------------

>> > >> > 1 file changed, 25 insertions(+), 17 deletions(-)

>> > >> >

>> > >> >diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c

>> > >> >index c60f46f..7f2d7ed 100644

>> > >> >--- a/lib/netdev-dpdk.c

>> > >> >+++ b/lib/netdev-dpdk.c

>> > >> >@@ -499,8 +499,8 @@ dpdk_mp_name(struct dpdk_mp *dmp)

>> > >> > {

>> > >> >     uint32_t h = hash_string(dmp->if_name, 0);

>> > >> >     char *mp_name = xcalloc(RTE_MEMPOOL_NAMESIZE, sizeof *mp_name);

>> > >> >-    int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, "ovs_%x_%d_%u",

>> > >> >-                       h, dmp->mtu, dmp->mp_size);

>> > >> >+    int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE,

>"ovs_%x_%d_%d_%u",

>> > >> >+                       h, dmp->socket_id, dmp->mtu, dmp->mp_size);

>> > >> >     if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {

>> > >> >         return NULL;

>> > >> >     }

>> > >> >@@ -508,12 +508,13 @@ dpdk_mp_name(struct dpdk_mp *dmp)

>> > >> > }

>> > >> >

>> > >> > static struct dpdk_mp *

>> > >> >-dpdk_mp_create(struct netdev_dpdk *dev, int mtu)

>> > >> >+dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool *mp_exists)

>> > >> > {

>> > >> >     struct dpdk_mp *dmp = dpdk_rte_mzalloc(sizeof *dmp);

>> > >> >     if (!dmp) {

>> > >> >         return NULL;

>> > >> >     }

>> > >> >+    *mp_exists = false;

>> > >> >     dmp->socket_id = dev->requested_socket_id;

>> > >> >     dmp->mtu = mtu;

>> > >> >     ovs_strzcpy(dmp->if_name, dev->up.name, IFNAMSIZ);

>> > >> >@@ -530,15 +531,14 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)

>> > >> >             + MIN(RTE_MAX_LCORE, dev->requested_n_rxq) *

>> NETDEV_MAX_BURST

>> > >> >             + MIN_NB_MBUF;

>> > >> >

>> > >> >-    bool mp_exists = false;

>> > >> >-

>> > >> >     do {

>> > >> >         char *mp_name = dpdk_mp_name(dmp);

>> > >> >

>> > >> >         VLOG_DBG("Requesting a mempool of %u mbufs for netdev %s "

>> > >> >-                 "with %d Rx and %d Tx queues.",

>> > >> >+                 "with %d Rx and %d Tx queues, socket id:%d.",

>> > >> >                  dmp->mp_size, dev->up.name,

>> > >> >-                 dev->requested_n_rxq, dev->requested_n_txq);

>> > >> >+                 dev->requested_n_rxq, dev->requested_n_txq,

>> > >> >+                 dev->requested_socket_id);

>> > >> >

>> > >> >         dmp->mp = rte_pktmbuf_pool_create(mp_name, dmp->mp_size,

>> > >> >                                           MP_CACHE_SZ,

>> > >> >@@ -559,7 +559,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)

>> > >> >             /* As the mempool create returned EEXIST we can expect

>the

>> > >> >              * lookup has returned a valid pointer.  If for some

>reason

>> > >> >              * that's not the case we keep track of it. */

>> > >> >-            mp_exists = true;

>> > >> >+            *mp_exists = true;

>> > >> >         } else {

>> > >> >             VLOG_ERR("Failed mempool \"%s\" create request of %u

>mbufs",

>> > >> >                      mp_name, dmp->mp_size);

>> > >> >@@ -573,7 +573,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)

>> > >> >             rte_mempool_obj_iter(dmp->mp, ovs_rte_pktmbuf_init,

>NULL);

>> > >> >             return dmp;

>> > >> >         }

>> > >> >-    } while (!mp_exists &&

>> > >> >+    } while (!(*mp_exists) &&

>> > >> >             (rte_errno == ENOMEM && (dmp->mp_size /= 2) >=

>> MIN_NB_MBUF));

>> > >> >

>> > >> >     rte_free(dmp);

>> > >> >@@ -581,12 +581,12 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)

>> > >> > }

>> > >> >

>> > >> > static struct dpdk_mp *

>> > >> >-dpdk_mp_get(struct netdev_dpdk *dev, int mtu)

>> > >> >+dpdk_mp_get(struct netdev_dpdk *dev, int mtu, bool *mp_exists)

>> > >> > {

>> > >> >     struct dpdk_mp *dmp;

>> > >> >

>> > >> >     ovs_mutex_lock(&dpdk_mp_mutex);

>> > >> >-    dmp = dpdk_mp_create(dev, mtu);

>> > >> >+    dmp = dpdk_mp_create(dev, mtu, mp_exists);

>> > >> >     ovs_mutex_unlock(&dpdk_mp_mutex);

>> > >> >

>> > >> >     return dmp;

>> > >> >@@ -620,14 +620,22 @@ netdev_dpdk_mempool_configure(struct netdev_dpdk

>> > >*dev)

>> > >>

>> > >> Apologies for not noticing this in my previous review, but the comment

>for

>> > >this

>> > >> function needs to be updated.

>> > >>

>> > >> It currently reads:

>> > >>    " /* Tries to allocate new mempool on requested_socket_id with

>> > >> 	 * mbuf size corresponding to requested_mtu.

>> > >> 	 * On success new configuration will be applied.

>> > >> 	 * On error, device will be left unchanged. */"

>> > >>

>> > >> It should be updated to reflect the fact that it tries to allocate a

>new

>> > >> mempool, or reuse an existing one, where appropriate.

>> > >> Furthermore, if the error value is EEXIST, the new configuration (i.e.

>> dev-

>> > >> >mtu, dev->max_packet_len) is applied, so the device is not unchanged,

>as

>> > >the

>> > >> comment suggests.

>> > >

>> > >[Antonio] Thanks, I'll change that.

>> > >

>> > >

>> > >>

>> > >> > {

>> > >> >     uint32_t buf_size = dpdk_buf_size(dev->requested_mtu);

>> > >> >     struct dpdk_mp *mp;

>> > >> >+    bool mp_exists;

>> > >>

>> > >> You don't need the 'mp_exists' variable.

>> > >>

>> > >> If, as part of the dpdk_mp_get() call stack, rte_mempool_lookup()

>fails,

>> > >then

>> > >> dpdk_mp_create() returns NULL, which is already handled by

>> > >> netdev_dpdk_mempool_configure(),

>> > >> (marked as [1], below).

>> > >

>> > >[Antonio]

>> > >If rte_mempool_lookup() fails a NULL is returned but rte_errno == ENOENT,

>so

>> > >it's no more EEXIST.

>> >

>> > Yes, agreed.

>> >

>> > >This means that in netdev_dpdk_mempool_configure() I should check both

>error

>> > >codes like:

>> > >if (rte_errno == EEXIST || rte_errno == ENOENT)

>> >

>> > You won't need to do that - hopefully the following will make things

>clearer.

>> >

>> > Consider the callstack, starting from netdev_dpdk_mempool_configure:

>>

>> [Antonio] I get your point now.

>> Cool, I'll follow your suggestion.

>> Thanks for all your explanation!

>

>[Antonio] Hmm, unfortunately it doesn't work, because rte_errno is only

>updated when an API call fails.

>If an API call succeeds, rte_errno is not touched and still contains

>an old value.

>In fact I've seen cases where I requested a bigger MTU - so as to force the

>creation of a new mempool - and even though rte_pktmbuf_pool_create() succeeds

>and does create a new mempool, rte_errno is still EEXIST (=17) because it

>refers to a previous call to rte_pktmbuf_pool_create() that failed.

>

>So, in netdev_dpdk_mempool_configure(), even when

>	mp = dpdk_mp_get(...

>

>returns a valid mp pointer, I can't rely on the rte_errno value to

>distinguish a brand new mempool from a pre-existing one.


Ah, I see. In that case, I'm happy to go with your original approach - thanks for checking this out nonetheless.
-Mark


>

>

>>

>>

>> >

>> > [netdev_dpdk_mempool_configure]

>> >  mp = dpdk_mp_get()

>> >

>> > 	[dpdk_mp_get]

>> > 	mp = dpdk_mp_create()

>> >

>> > 		[dpdk_mp_create]

>> > 		dmp->mp = rte_pktmbuf_pool_create(....)

>> > 		# assume this returns NULL and sets rte_errno to EEXIST,

>> > 		# as an appropriate mempool already exists

>> >

>> > 			if (dmp->mp) {

>> > 				# skip

>> > 			} else if (rte_errno == EEXIST) {

>> > 				dmp->mp = rte_mempool_lookup(...)

>> > 				# There are two possible scenarios here -

>> > 				# let's name them A and B, respectively, and

>> > follow their paths.

>> >

>> > --------------------------------------------------------------------------

>---

>> --

>> > -------

>> >

>> > 				### Scenario A - rte_mempool_lookup() returns NULL

>> > and sets ###

>> > 				### rte_errno to ENOENT

>> > ###

>> >

>> > 				mp_exists = true;

>> >       	     } else {

>> >            			# skip

>> > 			}

>> > 			if (dmp->mp) {

>> > 				# skip

>> > 			}

>> > 		} while (!mp_exists) &&     # condition fails, as mp_exists is

>> > true

>> > 		...

>> > 		return NULL;

>> >

>> > 	[dpdk_mp_get]

>> > 	return NULL

>> >

>> > [netdev_dpdk_mempool_configure]

>> > if (!mp) {

>> > 	VLOG_ERR(...)

>> > 	return rte_errno        # return ENOENT

>> >

>> > --------------------------------------------------------------------------

>---

>> --

>> > ------

>> >

>> > 				### Scenario B - rte_mempool_lookup() returns a

>> > pointer to ###

>> > 				### the mempool; rte_errno remains set to EEXIST

>> > ###

>> > 				mp_exists = true;

>> >       	     } else {

>> >            			# skip

>> > 			}

>> > 			if (dmp->mp) {

>> > 				return dmp;   # return a pointer to the mempool

>> > 			}

>> > 	[dpdk_mp_get]

>> > 	return a pointer to the mempool

>> >

>> > [netdev_dpdk_mempool_configure]

>> > if (!mp) {

>> > 	# skip

>> > } else if (rte_errno == EEXIST) {

>> > 	dev->mtu = dev->requested_mtu;

>> > 	dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);

>> > 	return rte_errno;              # this returns EEXIST

>> > }

>> >

>> > >moreover who knows if in the future the rte_mempool_lookup() will be

>> enriched

>> > >to return

>> > >even more error codes?

>> >

>> > That point is moot IMO, as we should handle the API as it behaves

>currently,

>> > and not how it may do so at some potential future date.

>> > Just my $0.02 ;)

>> >

>> > >Also, I think it's good to rely on rte_errno as long as you test it

>> > >immediately, I mean

>> > >right after the API call. In the case of netdev_dpdk_mempool_configure()

>the

>> > >rte_errno

>> > >was updated by dpdk_mp_create(), which is 2 levels below in the call

>stack.

>> > >That's why I'd prefer to keep track that we are re-using a mempool into

>an

>> OVS

>> > >variable

>> > >and be less dependent by RTE return codes.

>> >

>> > That value is set in the same thread in a defined call-stack; I don't see

>a

>> > disadvantage to referencing it here tbh.

>>

>> [Antonio] ok.

>>

>>

>> >

>> > >

>> > >BTW I could have added mp_exists variable into struct dpdk_mp but I

>avoided

>> > >that because

>> > >that goes inside struct netdev_dpdk and I didn't want to increase its

>size.

>> So

>> > >I preferred

>> > >to add it as a new input parameter for the functions.

>> > >

>> > >>

>> > >> So, to handle the case where the mempool already exists - i.e. when in

>the

>> > >> callstack, rte_pktmbuf_pool_create() failed and set rte_errno to

>EEXIST -

>> > >you

>> > >> only need to check rte_errno here;

>> > >> (see [2], below).

>> > >>

>> > >> >

>> > >> >-    mp = dpdk_mp_get(dev, FRAME_LEN_TO_MTU(buf_size));

>> > >> >+    mp = dpdk_mp_get(dev, FRAME_LEN_TO_MTU(buf_size), &mp_exists);

>> > >>

>> > >> [1]

>> > >> >     if (!mp) {

>> > >> >         VLOG_ERR("Failed to create memory pool for netdev "

>> > >> >                  "%s, with MTU %d on socket %d: %s\n",

>> > >> >                  dev->up.name, dev->requested_mtu, dev-

>> > >>requested_socket_id,

>> > >> >                  rte_strerror(rte_errno));

>> > >> >         return rte_errno;

>> > >> >+    } else if (mp_exists) {

>> > >>

>> > >> [2]

>> > >> "if (rte_errno == EEXIST)" is sufficient here

>> > >>

>> > >> >+        /* If a new MTU was requested and its rounded value equals

>the

>> one

>> > >> >+         * that is currently used, then the existing mempool is

>> returned.

>> > >> >+         * Update dev with the new values. */

>> > >> >+        dev->mtu = dev->requested_mtu;

>> > >> >+        dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);

>> > >> >+        return EEXIST;

>> > >>

>> > >> Replace with "return rte_errno" for consistency with the previous

>return

>> > >(since

>> > >> the value of rte_errno is guaranteed to be EEXIST here).

>> > >

>> > >[Antonio]

>> > >All the callers - like dpdk_vhost_reconfigure_helper() - behave depending

>on

>> > >the

>> > >outcome of rte_pktmbuf_pool_create(), that could return an error for

>> > >insufficient

>> > >memory for example.

>> > >That's why in the previous return I'm returning rte_errno as it is.

>Instead

>> > >here

>> > >I'm overriding what could be returned by rte_mempool_lookup() and

>returning

>> > >EEXIST

>> > >because that's the information we want to report to the caller.

>> >

>> > As described previously.

>>

>> [Antonio] ok.

>>

>>

>> >

>> > Thanks,

>> > Mark

>> >

>> > >

>> > >

>> > >>

>> > >> >     } else {

>> > >> >         dpdk_mp_put(dev->dpdk_mp);

>> > >> >         dev->dpdk_mp = mp;

>> > >> >@@ -3207,7 +3215,7 @@ netdev_dpdk_reconfigure(struct netdev *netdev)

>> > >> >     rte_eth_dev_stop(dev->port_id);

>> > >> >

>> > >> >     err = netdev_dpdk_mempool_configure(dev);

>> > >> >-    if (err) {

>> > >> >+    if (err && err != EEXIST) {

>> > >> >         goto out;

>> > >> >     }

>> > >> >

>> > >> >@@ -3247,12 +3255,12 @@ dpdk_vhost_reconfigure_helper(struct

>netdev_dpdk

>> > >*dev)

>> > >> >     netdev_dpdk_remap_txqs(dev);

>> > >> >

>> > >> >     err = netdev_dpdk_mempool_configure(dev);

>> > >> >-    if (err) {

>> > >> >-        return err;

>> > >> >-    } else {

>> > >> >+    if (!err) {

>> > >> >+        /* A new mempool was created. */

>> > >> >         netdev_change_seq_changed(&dev->up);

>> > >> >+    } else if (err != EEXIST){

>> > >> >+        return err;

>> > >> >     }

>> > >> >-

>> > >> >     if (netdev_dpdk_get_vid(dev) >= 0) {

>> > >> >         if (dev->vhost_reconfigured == false) {

>> > >> >             dev->vhost_reconfigured = true;

>> > >> >--

>> > >> >2.4.11

>>

>> _______________________________________________

>> dev mailing list

>> dev@openvswitch.org

>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
diff mbox series

Patch

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index c60f46f..7f2d7ed 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -499,8 +499,8 @@  dpdk_mp_name(struct dpdk_mp *dmp)
 {
     uint32_t h = hash_string(dmp->if_name, 0);
     char *mp_name = xcalloc(RTE_MEMPOOL_NAMESIZE, sizeof *mp_name);
-    int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, "ovs_%x_%d_%u",
-                       h, dmp->mtu, dmp->mp_size);
+    int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, "ovs_%x_%d_%d_%u",
+                       h, dmp->socket_id, dmp->mtu, dmp->mp_size);
     if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {
         return NULL;
     }
@@ -508,12 +508,13 @@  dpdk_mp_name(struct dpdk_mp *dmp)
 }
 
 static struct dpdk_mp *
-dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
+dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool *mp_exists)
 {
     struct dpdk_mp *dmp = dpdk_rte_mzalloc(sizeof *dmp);
     if (!dmp) {
         return NULL;
     }
+    *mp_exists = false;
     dmp->socket_id = dev->requested_socket_id;
     dmp->mtu = mtu;
     ovs_strzcpy(dmp->if_name, dev->up.name, IFNAMSIZ);
@@ -530,15 +531,14 @@  dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
             + MIN(RTE_MAX_LCORE, dev->requested_n_rxq) * NETDEV_MAX_BURST
             + MIN_NB_MBUF;
 
-    bool mp_exists = false;
-
     do {
         char *mp_name = dpdk_mp_name(dmp);
 
         VLOG_DBG("Requesting a mempool of %u mbufs for netdev %s "
-                 "with %d Rx and %d Tx queues.",
+                 "with %d Rx and %d Tx queues, socket id:%d.",
                  dmp->mp_size, dev->up.name,
-                 dev->requested_n_rxq, dev->requested_n_txq);
+                 dev->requested_n_rxq, dev->requested_n_txq,
+                 dev->requested_socket_id);
 
         dmp->mp = rte_pktmbuf_pool_create(mp_name, dmp->mp_size,
                                           MP_CACHE_SZ,
@@ -559,7 +559,7 @@  dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
             /* As the mempool create returned EEXIST we can expect the
              * lookup has returned a valid pointer.  If for some reason
              * that's not the case we keep track of it. */
-            mp_exists = true;
+            *mp_exists = true;
         } else {
             VLOG_ERR("Failed mempool \"%s\" create request of %u mbufs",
                      mp_name, dmp->mp_size);
@@ -573,7 +573,7 @@  dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
             rte_mempool_obj_iter(dmp->mp, ovs_rte_pktmbuf_init, NULL);
             return dmp;
         }
-    } while (!mp_exists &&
+    } while (!(*mp_exists) &&
             (rte_errno == ENOMEM && (dmp->mp_size /= 2) >= MIN_NB_MBUF));
 
     rte_free(dmp);
@@ -581,12 +581,12 @@  dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
 }
 
 static struct dpdk_mp *
-dpdk_mp_get(struct netdev_dpdk *dev, int mtu)
+dpdk_mp_get(struct netdev_dpdk *dev, int mtu, bool *mp_exists)
 {
     struct dpdk_mp *dmp;
 
     ovs_mutex_lock(&dpdk_mp_mutex);
-    dmp = dpdk_mp_create(dev, mtu);
+    dmp = dpdk_mp_create(dev, mtu, mp_exists);
     ovs_mutex_unlock(&dpdk_mp_mutex);
 
     return dmp;
@@ -620,14 +620,22 @@  netdev_dpdk_mempool_configure(struct netdev_dpdk *dev)
 {
     uint32_t buf_size = dpdk_buf_size(dev->requested_mtu);
     struct dpdk_mp *mp;
+    bool mp_exists;
 
-    mp = dpdk_mp_get(dev, FRAME_LEN_TO_MTU(buf_size));
+    mp = dpdk_mp_get(dev, FRAME_LEN_TO_MTU(buf_size), &mp_exists);
     if (!mp) {
         VLOG_ERR("Failed to create memory pool for netdev "
                  "%s, with MTU %d on socket %d: %s\n",
                  dev->up.name, dev->requested_mtu, dev->requested_socket_id,
                  rte_strerror(rte_errno));
         return rte_errno;
+    } else if (mp_exists) {
+        /* If a new MTU was requested and its rounded value equals the one
+         * that is currently used, then the existing mempool is returned.
+         * Update dev with the new values. */
+        dev->mtu = dev->requested_mtu;
+        dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
+        return EEXIST;
     } else {
         dpdk_mp_put(dev->dpdk_mp);
         dev->dpdk_mp = mp;
@@ -3207,7 +3215,7 @@  netdev_dpdk_reconfigure(struct netdev *netdev)
     rte_eth_dev_stop(dev->port_id);
 
     err = netdev_dpdk_mempool_configure(dev);
-    if (err) {
+    if (err && err != EEXIST) {
         goto out;
     }
 
@@ -3247,12 +3255,12 @@  dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev)
     netdev_dpdk_remap_txqs(dev);
 
     err = netdev_dpdk_mempool_configure(dev);
-    if (err) {
-        return err;
-    } else {
+    if (!err) {
+        /* A new mempool was created. */
         netdev_change_seq_changed(&dev->up);
+    } else if (err != EEXIST){
+        return err;
     }
-
     if (netdev_dpdk_get_vid(dev) >= 0) {
         if (dev->vhost_reconfigured == false) {
             dev->vhost_reconfigured = true;
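
For reference, a standalone sketch (not OVS code; the hash, socket id, MTU and
mbuf count are simply the example figures quoted earlier in this thread) of how
the reworked dpdk_mp_name() format "ovs_%x_%d_%d_%u" embeds the NUMA socket id,
producing names like the one seen in the test log:

#include <stdio.h>

int
main(void)
{
    unsigned int name_hash = 0x4adb057e;  /* hash of the port name */
    int socket_id = 1;                    /* NUMA socket the pool is allocated on */
    int mtu = 2030;                       /* rounded MTU */
    unsigned int n_mbufs = 20512;         /* requested number of mbufs */
    char mp_name[64];                     /* sized generously here; the real code
                                           * uses RTE_MEMPOOL_NAMESIZE */

    snprintf(mp_name, sizeof mp_name, "ovs_%x_%d_%d_%u",
             name_hash, socket_id, mtu, n_mbufs);
    printf("%s\n", mp_name);              /* prints ovs_4adb057e_1_2030_20512 */
    return 0;
}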