
[ovs-dev,v5] netdev-dpdk: Set pmd thread priority

Message ID 1469621532-26142-1-git-send-email-bhanuprakash.bodireddy@intel.com
State Superseded
Delegated to: Daniele Di Proietto

Commit Message

Bodireddy, Bhanuprakash July 27, 2016, 12:12 p.m. UTC
Set the DPDK pmd thread scheduling policy to SCHED_RR and its static
priority to the highest value the policy allows. This deals with the pmd
thread starvation case where another CPU-hogging process gets
scheduled/affinitized onto the same core the pmd thread is running on,
thereby significantly degrading datapath performance.

Setting the realtime scheduling policy for the pmd threads is one step
towards Fastpath Service Assurance in OVS DPDK.

The realtime scheduling policy is applied only when a CPU mask is passed
to 'pmd-cpu-mask'. For example:

    * In the absence of pmd-cpu-mask, one pmd thread is created and the
      default scheduling policy and priority are applied.

    * If pmd-cpu-mask is specified, one or more pmd threads are spawned
      on the corresponding core(s) in the mask, and the real-time
      scheduling policy SCHED_RR with the highest priority of that policy
      is applied to the pmd thread(s).

To reproduce the pmd thread starvation case:

ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6
taskset 0x2 cat /dev/zero > /dev/null &
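
Whether the pmd thread really runs with SCHED_RR (and how badly the hog
starves it) can be checked with standard tools, along these lines (the
thread id is whatever ps reports on the system; chrt is part of
util-linux):

ps -eLo tid,comm,cls,rtprio | grep pmd
chrt -p <pmd thread tid>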

With this commit, it is recommended that the OVS control threads and pmd
threads not be pinned to the same core ('dpdk-lcore-mask' and
'pmd-cpu-mask' should be non-overlapping). Note also that other processes
with the same affinity as a PMD thread will become unresponsive.

Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
---
v4->v5:
* Reword Note section in DPDK-ADVANCED.md

v3->v4:
* Document update
* Use ovs_strerror for reporting errors in lib-numa.c
 
v2->v3:
* Move set_priority() function to lib/ovs-numa.c
* Apply realtime scheduling policy and priority to pmd thread only if
  pmd-cpu-mask is passed.
* Update INSTALL.DPDK-ADVANCED.

v1->v2:
* Removed #ifdef and introduced dummy function "pmd_thread_setpriority" 
  in netdev-dpdk.h
* Rebase

 INSTALL.DPDK-ADVANCED.md | 17 +++++++++++++----
 lib/dpif-netdev.c        |  9 +++++++++
 lib/ovs-numa.c           | 18 ++++++++++++++++++
 lib/ovs-numa.h           |  1 +
 4 files changed, 41 insertions(+), 4 deletions(-)

Comments

Mark Kavanagh July 27, 2016, 12:28 p.m. UTC | #1

LGTM - Acked-by: mark.b.kavanagh@intel.com

Daniele Di Proietto July 27, 2016, 9:09 p.m. UTC | #2
Thanks for the patch, the implementation looks good to me too.

During testing I kept noticing that it's way too easy to make OVS
completely unresponsive.  As you point out in the documentation by having
dpdk-lcore-mask the same as pmd-cpu-mask, OVS cannot even be killed (a kill
-9 is required).  I wonder what happens if one tries to set pmd-cpu-mask to
every core in the system.

As a way to mitigate the risk perhaps we can avoid setting the main thread
affinity to the first core in dpdk-lcore-mask by _always_ restoring it in
dpdk_init__(), also if auto_determine is false.
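
A minimal sketch of that mitigation (hypothetical -- this is not the
actual dpdk_init__() code and the function name below is made up for
illustration; it only shows the save/restore pattern):

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <rte_eal.h>

    static void
    eal_init_restoring_affinity(int argc, char **argv)
    {
        cpu_set_t saved;

        CPU_ZERO(&saved);
        /* Remember the affinity the main thread had before EAL init. */
        pthread_getaffinity_np(pthread_self(), sizeof saved, &saved);

        rte_eal_init(argc, argv);  /* pins the calling thread to the first lcore */

        /* Always restore the original affinity, so the main (control) thread
         * is not left pinned to a core that may also run a pmd thread. */
        pthread_setaffinity_np(pthread_self(), sizeof saved, &saved);
    }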

Perhaps we should start explicitly prohibiting creating a pmd thread on the
first core in dpdk-lcore-mask (I get why previous version of this didn't do
it on core 0.  Perhaps we can generalize that to the first core in
dpdk-lcore-mask).

What's the behavior of other DPDK applications?

Thanks,

Daniele

Bodireddy, Bhanuprakash July 28, 2016, 3:39 p.m. UTC | #3
>-----Original Message-----
>From: Daniele Di Proietto [mailto:diproiettod@ovn.org]
>Sent: Wednesday, July 27, 2016 10:10 PM
>To: Kavanagh, Mark B <mark.b.kavanagh@intel.com>
>Cc: Bodireddy, Bhanuprakash <bhanuprakash.bodireddy@intel.com>;
>dev@openvswitch.org
>Subject: Re: [PATCH v5] netdev-dpdk: Set pmd thread priority
>
>Thanks for the patch, the implementation looks good to me too.
>During testing I kept noticing that it's way too easy to make OVS completely
>unresponsive.  As you point out in the documentation by having dpdk-lcore-
>mask the same as pmd-cpu-mask, OVS cannot even be killed (a kill -9 is
>required).  I wonder what happens if one tries to set pmd-cpu-mask to every
>core in the system.
>As a way to mitigate the risk perhaps we can avoid setting the main thread
>affinity to the first core in dpdk-lcore-mask by _always_ restoring it in
>dpdk_init__(), also if auto_determine is false.
>Perhaps we should start explicitly prohibiting creating a pmd thread on the
>first core in dpdk-lcore-mask (I get why previous version of this didn't do it on
>core 0.  Perhaps we can generalize that to the first core in dpdk-lcore-mask).

I will look into this and get back to you sometime next week.

Regards,
Bhanuprakash.

>
>What's the behavior of other DPDK applications?
>
>Thanks,
>Daniele

Flavio Leitner July 28, 2016, 7:26 p.m. UTC | #4
On Thu, Jul 28, 2016 at 03:39:58PM +0000, Bodireddy, Bhanuprakash wrote:
> >-----Original Message-----
> >From: Daniele Di Proietto [mailto:diproiettod@ovn.org]
> >Sent: Wednesday, July 27, 2016 10:10 PM
> >To: Kavanagh, Mark B <mark.b.kavanagh@intel.com>
> >Cc: Bodireddy, Bhanuprakash <bhanuprakash.bodireddy@intel.com>;
> >dev@openvswitch.org
> >Subject: Re: [PATCH v5] netdev-dpdk: Set pmd thread priority
> >
> >Thanks for the patch, the implementation looks good to me too.
> >During testing I kept noticing that it's way too easy to make OVS completely
> >unresponsive.  As you point out in the documentation by having dpdk-lcore-
> >mask the same as pmd-cpu-mask, OVS cannot even be killed (a kill -9 is
> >required).  I wonder what happens if one tries to set pmd-cpu-mask to every
> >core in the system.
> >As a way to mitigate the risk perhaps we can avoid setting the main thread
> >affinity to the first core in dpdk-lcore-mask by _always_ restoring it in
> >dpdk_init__(), also if auto_determine is false.
> >Perhaps we should start explicitly prohibiting creating a pmd thread on the
> >first core in dpdk-lcore-mask (I get why previous version of this didn't do it on
> >core 0.  Perhaps we can generalize that to the first core in dpdk-lcore-mask).
> I will look in to this and get back to you sometime next week.

Isn't it enough to just increase the priority to the max with
setpriority(2)?  I know it is not the same as SCHED_RR, but for users who
don't know how to tune it properly this seems less dangerous while still
providing a good share of CPU to the PMD thread.
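
A rough sketch of that alternative (illustrative only, not code from this
patch): bump the calling thread's nice value to the most favourable
setting while keeping the default SCHED_OTHER policy, so other tasks on
the same core still get some CPU time.  Raising priority this way needs
CAP_SYS_NICE or a suitable RLIMIT_NICE.

    #include <sys/resource.h>
    #include <sys/syscall.h>
    #include <sys/time.h>
    #include <unistd.h>

    static int
    pmd_thread_bump_priority(void)
    {
        /* On Linux the nice value is per-thread, so address the thread id. */
        pid_t tid = syscall(SYS_gettid);

        return setpriority(PRIO_PROCESS, tid, -20);  /* -20 = highest */
    }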

For instance, I recall having seen OVS revalidation threads running
with PMD on the same CPU, but that might have been 2.5 only.

Thanks,
Bodireddy, Bhanuprakash Aug. 15, 2016, 3:55 p.m. UTC | #5
>-----Original Message-----
>From: Daniele Di Proietto [mailto:diproiettod@ovn.org]
>Sent: Wednesday, July 27, 2016 10:10 PM
>To: Kavanagh, Mark B <mark.b.kavanagh@intel.com>
>Cc: Bodireddy, Bhanuprakash <bhanuprakash.bodireddy@intel.com>;
>dev@openvswitch.org
>Subject: Re: [PATCH v5] netdev-dpdk: Set pmd thread priority
>
>Thanks for the patch, the implementation looks good to me too.
>During testing I kept noticing that it's way too easy to make OVS completely
>unresponsive.  As you point out in the documentation by having dpdk-lcore-
>mask the same as pmd-cpu-mask, OVS cannot even be killed (a kill -9 is
>required).  I wonder what happens if one tries to set pmd-cpu-mask to every
>core in the system.
>As a way to mitigate the risk perhaps we can avoid setting the main thread
>affinity to the first core in dpdk-lcore-mask by _always_ restoring it in
>dpdk_init__(), also if auto_determine is false.

I followed this approach, but during testing I found that the revalidator thread
sometimes gets blocked when the pmd threads are scheduled. This is likely because
once the pmd threads (with the SCHED_RR policy applied) kick in, the other threads
never get scheduled and may eventually block.

>Perhaps we should start explicitly prohibiting creating a pmd thread on the
>first core in dpdk-lcore-mask (I get why previous version of this didn't do it on
>core 0.  Perhaps we can generalize that to the first core in dpdk-lcore-mask).

I thought this was the better approach and implemented it in V6 of the patch. I tested
various cases and it works as expected.
http://openvswitch.org/pipermail/dev/2016-August/078001.html

>
>What's the behavior of other DPDK applications?

I spoke to the DPDK team here and they confirmed our observations. Given the affinity
mask, the control threads always get pinned to the lowest core of the mask; this can be
reproduced with any DPDK application. I am following up with the DPDK team to see
whether this behavior can be changed.

Regards,
Bhanu Prakash.

>Thanks,
>Daniele

Bodireddy, Bhanuprakash Aug. 16, 2016, 9:17 a.m. UTC | #6
Hello Flavio,

Thanks for your feedback; unfortunately I missed this mail due to my Outlook filter settings. Please see my comments inline.

>-----Original Message-----
>From: Flavio Leitner [mailto:fbl@sysclose.org]
>Sent: Thursday, July 28, 2016 8:27 PM
>To: Bodireddy, Bhanuprakash <bhanuprakash.bodireddy@intel.com>
>Cc: Daniele Di Proietto <diproiettod@ovn.org>; Kavanagh, Mark B
><mark.b.kavanagh@intel.com>; dev@openvswitch.org
>Subject: Re: [ovs-dev] [PATCH v5] netdev-dpdk: Set pmd thread priority
>
>On Thu, Jul 28, 2016 at 03:39:58PM +0000, Bodireddy, Bhanuprakash wrote:
>> >-----Original Message-----
>> >From: Daniele Di Proietto [mailto:diproiettod@ovn.org]
>> >Sent: Wednesday, July 27, 2016 10:10 PM
>> >To: Kavanagh, Mark B <mark.b.kavanagh@intel.com>
>> >Cc: Bodireddy, Bhanuprakash <bhanuprakash.bodireddy@intel.com>;
>> >dev@openvswitch.org
>> >Subject: Re: [PATCH v5] netdev-dpdk: Set pmd thread priority
>> >
>> >Thanks for the patch, the implementation looks good to me too.
>> >During testing I kept noticing that it's way too easy to make OVS
>> >completely unresponsive.  As you point out in the documentation by
>> >having dpdk-lcore- mask the same as pmd-cpu-mask, OVS cannot even be
>> >killed (a kill -9 is required).  I wonder what happens if one tries
>> >to set pmd-cpu-mask to every core in the system.
>> >As a way to mitigate the risk perhaps we can avoid setting the main
>> >thread affinity to the first core in dpdk-lcore-mask by _always_
>> >restoring it in dpdk_init__(), also if auto_determine is false.
>> >Perhaps we should start explicitly prohibiting creating a pmd thread
>> >on the first core in dpdk-lcore-mask (I get why previous version of
>> >this didn't do it on core 0.  Perhaps we can generalize that to the first core
>in dpdk-lcore-mask).
>> I will look in to this and get back to you sometime next week.
>
>Isn't enough to just increase priority to the max with setpriority(2)?
>I know it is not the same as SCHED_RR  but for those users that don't know
>how to tune it properly, this seems to be less dangerous while still providing a
>good share of CPU to the PMD thread.

I agree with your suggestion here. Though my initial patch was to address the pmd thread starvation case, I see
now that I have put a restriction on the pmd-cpu-mask affinity setting by not spawning a pmd thread
on the first core of the dpdk-lcore-mask, due to the way DPDK handles thread pinning.
You can check the patch here:  http://openvswitch.org/pipermail/dev/2016-August/078001.html

At this point in time I would bump up the pmd thread priority instead of changing the policy to SCHED_RR, as this
is less dangerous and would still allow users with fewer cores to set pmd-cpu-mask.

>
>For instance, I recall to have seen OVS revalidation threads running with PMD
>on the same CPU, but that might have been 2.5 only.
This is seen even with the latest OVS master. Here is how to reproduce it; from the output below you can see
that the ovs-vswitchd and revalidator threads are running on the same core as the pmd thread.

$ ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=F0
$ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=10

$ ps -eLo tid,psr,comm | grep -e lcore -e revalidator -e ovs-vswitchd -e pmd
 89966   4 ovs-vswitchd
 89969   5 lcore-slave-5
 89970   6 lcore-slave-6
 89971   7 lcore-slave-7
 90038   4 revalidator37
 90039   4 revalidator52
 90040   4 revalidator42
 90041   4 revalidator38
 90042   4 revalidator39
 90043   4 revalidator45
 90044   4 revalidator53
 90045   4 revalidator54
 90047   4 pmd61
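
For comparison, a non-overlapping setup along these lines (masks chosen
purely for illustration) keeps the control threads and the pmd thread on
different cores:

$ ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0x8
$ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
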
>
>Thanks,
>--
>fbl

Patch

diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
index 9ae536d..d76cb4e 100644
--- a/INSTALL.DPDK-ADVANCED.md
+++ b/INSTALL.DPDK-ADVANCED.md
@@ -205,8 +205,10 @@  needs to be affinitized accordingly.
     pmd thread is CPU bound, and needs to be affinitized to isolated
     cores for optimum performance.
 
-    By setting a bit in the mask, a pmd thread is created and pinned
-    to the corresponding CPU core. e.g. to run a pmd thread on core 2
+    By setting a bit in the mask, a pmd thread is created, pinned
+    to the corresponding CPU core and the scheduling policy SCHED_RR
+    along with maximum priority of the policy applied to the pmd thread.
+    e.g. to pin a pmd thread on core 2
 
     `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=4`
 
@@ -234,8 +236,10 @@  needs to be affinitized accordingly.
   responsible for different ports/rxq's. Assignment of ports/rxq's to
   pmd threads is done automatically.
 
-  A set bit in the mask means a pmd thread is created and pinned
-  to the corresponding CPU core. e.g. to run pmd threads on core 1 and 2
+  A set bit in the mask means a pmd thread is created, pinned to the
+  corresponding CPU core and the scheduling policy SCHED_RR with highest
+  priority of the scheduling policy applied to pmd thread.
+  e.g. to run pmd threads on core 1 and 2
 
   `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6`
 
@@ -246,6 +250,11 @@  needs to be affinitized accordingly.
 
   NIC port0 <-> OVS <-> VM <-> OVS <-> NIC port 1
 
+  Note: It is recommended that the OVS control thread and pmd thread
+  shouldn't be pinned to same core i.e 'dpdk-lcore-mask' and 'pmd-cpu-mask'
+  cpu mask settings should be non-overlapping. Also other processes with
+  same affinity as PMD thread will be unresponsive.
+
 ### 4.3 DPDK physical port Rx Queues
 
   `ovs-vsctl set Interface <DPDK interface> options:n_rxq=<integer>`
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index f05ca4e..b85600b 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -2841,6 +2841,15 @@  pmd_thread_main(void *f_)
     ovs_numa_thread_setaffinity_core(pmd->core_id);
     dpdk_set_lcore_id(pmd->core_id);
     poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list);
+
+    /* When cpu affinity mask explicitly set using pmd-cpu-mask, pmd thread's
+     * scheduling policy is set to SCHED_RR and the priority to highest priority
+     * of SCHED_RR policy.  In the absence of pmd-cpu-mask, default scheduling
+     * policy and priority shall apply to pmd thread.
+     */
+    if (pmd->dp->pmd_cmask) {
+        ovs_numa_thread_setpriority(SCHED_RR);
+    }
 reload:
     emc_cache_init(&pmd->flow_cache);
 
diff --git a/lib/ovs-numa.c b/lib/ovs-numa.c
index c8173e0..428f274 100644
--- a/lib/ovs-numa.c
+++ b/lib/ovs-numa.c
@@ -613,3 +613,21 @@  int ovs_numa_thread_setaffinity_core(unsigned core_id OVS_UNUSED)
     return EOPNOTSUPP;
 #endif /* __linux__ */
 }
+
+void
+ovs_numa_thread_setpriority(int policy)
+{
+    if (dummy_numa) {
+        return;
+    }
+
+    struct sched_param threadparam;
+    int err;
+
+    memset(&threadparam, 0, sizeof(threadparam));
+    threadparam.sched_priority = sched_get_priority_max(policy);
+    err = pthread_setschedparam(pthread_self(), policy, &threadparam);
+    if (err) {
+        VLOG_ERR("Thread priority error %s",ovs_strerror(err));
+    }
+}
diff --git a/lib/ovs-numa.h b/lib/ovs-numa.h
index be836b2..94f0884 100644
--- a/lib/ovs-numa.h
+++ b/lib/ovs-numa.h
@@ -56,6 +56,7 @@  void ovs_numa_unpin_core(unsigned core_id);
 struct ovs_numa_dump *ovs_numa_dump_cores_on_numa(int numa_id);
 void ovs_numa_dump_destroy(struct ovs_numa_dump *);
 int ovs_numa_thread_setaffinity_core(unsigned core_id);
+void ovs_numa_thread_setpriority(int policy);
 
 #define FOR_EACH_CORE_ON_NUMA(ITER, DUMP)                    \
     LIST_FOR_EACH((ITER), list_node, &(DUMP)->dump)