diff mbox

[ovs-dev] netdev-dpdk: Set pmd thread priority

Message ID 1461251761-4878-1-git-send-email-bhanuprakash.bodireddy@intel.com
State Changes Requested
Headers show

Commit Message

Bodireddy, Bhanuprakash April 21, 2016, 3:16 p.m. UTC
Set the DPDK pmd thread scheduling policy to SCHED_RR and static
priority to highest priority value of the policy. This is to deal with
pmd thread starvation case where another cpu hogging process can get
scheduled/affinitized to the same core where pmd is running there by
significantly impacting the datapath performance.

The realtime scheduling policy is applied only when CPU mask is passed
to 'pmd-cpu-mask'. The exception to this is 'pmd-cpu-mask=1', where the
policy and priority shall not be applied to pmd thread spawned on core0.
For example:

    * In the absence of pmd-cpu-mask or if pmd-cpu-mask=1, one pmd
      thread shall be created and affinitized to 'core 0' with default
      scheduling policy and priority applied.

    * If pmd-cpu-mask is specified with CPU mask > 1, one or more pmd
      threads shall be spawned on the corresponding core(s) in the mask
      and real time scheduling policy SCHED_RR and highest static
      priority is applied to the pmd thread(s).

To reproduce use following commands:

ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6
taskset 0x2 cat /dev/zero > /dev/null &

Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
---
 lib/dpif-netdev.c |  9 +++++++++
 lib/netdev-dpdk.c | 14 ++++++++++++++
 lib/netdev-dpdk.h |  1 +
 3 files changed, 24 insertions(+)

Comments

Kevin Traynor April 25, 2016, 5:07 p.m. UTC | #1
On 21/04/2016 16:16, Bhanuprakash Bodireddy wrote:
> Set the DPDK pmd thread scheduling policy to SCHED_RR and static
> priority to highest priority value of the policy. This is to deal with
> pmd thread starvation case where another cpu hogging process can get
> scheduled/affinitized to the same core where pmd is running there by
> significantly impacting the datapath performance.
>
> The realtime scheduling policy is applied only when CPU mask is passed
> to 'pmd-cpu-mask'. The exception to this is 'pmd-cpu-mask=1', where the
> policy and priority shall not be applied to pmd thread spawned on core0.
> For example:
>
>      * In the absence of pmd-cpu-mask or if pmd-cpu-mask=1, one pmd
>        thread shall be created and affinitized to 'core 0' with default
>        scheduling policy and priority applied.
>
>      * If pmd-cpu-mask is specified with CPU mask > 1, one or more pmd
>        threads shall be spawned on the corresponding core(s) in the mask
>        and real time scheduling policy SCHED_RR and highest static
>        priority is applied to the pmd thread(s).
>
> To reproduce use following commands:
>
> ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6
> taskset 0x2 cat /dev/zero > /dev/null &

Even though it seems the most likely case - I'm not sure that we can 
always assume the user who put the non-OVS process on the core did so by 
mistake and would want us to increase our priority.

>
> Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
> ---
>   lib/dpif-netdev.c |  9 +++++++++
>   lib/netdev-dpdk.c | 14 ++++++++++++++
>   lib/netdev-dpdk.h |  1 +
>   3 files changed, 24 insertions(+)
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 1e8a37c..4a46816 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -2670,6 +2670,15 @@ pmd_thread_main(void *f_)
>       /* Stores the pmd thread's 'pmd' to 'per_pmd_key'. */
>       ovsthread_setspecific(pmd->dp->per_pmd_key, pmd);
>       pmd_thread_setaffinity_cpu(pmd->core_id);
> +
> +#ifdef DPDK_NETDEV
> +    /* Set pmd thread's scheduling policy to SCHED_RR and priority to
> +     * highest priority of SCHED_RR policy, In absence of pmd-cpu-mask (or)
> +     * pmd-cpu-mask=1, default scheduling policy and priority shall
> +     * apply to pmd thread */
> +     if (pmd->core_id)
> +        pmd_thread_setpriority();

Similar to above, I don't think we can assume anything special about 
core 0. This type of change sounds like something that would be better 
done at a layer above vswitch which has more system wide knowledge.

fwiw, it would be cleaner to remove the #ifdef from here and create a 
dummy fn in netdev-dpdk.h, also the 'if' needs {}

> +#endif
>   reload:
>       emc_cache_init(&pmd->flow_cache);
>
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 208c5f5..6518c87 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -2926,6 +2926,20 @@ pmd_thread_setaffinity_cpu(unsigned cpu)
>       return 0;
>   }
>
> +void
> +pmd_thread_setpriority(void)
> +{
> +    struct sched_param threadparam;
> +    int err;
> +
> +    memset(&threadparam, 0, sizeof(threadparam));
> +    threadparam.sched_priority = sched_get_priority_max(SCHED_RR);
> +    err = pthread_setschedparam(pthread_self(), SCHED_RR, &threadparam);
> +    if (err) {
> +        VLOG_WARN("Thread priority error %d",err);
> +    }
> +}
> +
>   static bool
>   dpdk_thread_is_pmd(void)
>   {
> diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h
> index 646d3e2..168673b 100644
> --- a/lib/netdev-dpdk.h
> +++ b/lib/netdev-dpdk.h
> @@ -26,6 +26,7 @@ int dpdk_init(int argc, char **argv);
>   void netdev_dpdk_register(void);
>   void free_dpdk_buf(struct dp_packet *);
>   int pmd_thread_setaffinity_cpu(unsigned cpu);
> +void pmd_thread_setpriority(void);
>
>   #else
>
>
Bodireddy, Bhanuprakash April 25, 2016, 8:09 p.m. UTC | #2
Thanks for looking in to the patch Kevin, please see my reply inline.

> -----Original Message-----

> From: Traynor, Kevin

> Sent: Monday, April 25, 2016 6:08 PM

> To: Bodireddy, Bhanuprakash <bhanuprakash.bodireddy@intel.com>;

> dev@openvswitch.org

> Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Set pmd thread priority

> 

> On 21/04/2016 16:16, Bhanuprakash Bodireddy wrote:

> > Set the DPDK pmd thread scheduling policy to SCHED_RR and static

> > priority to highest priority value of the policy. This is to deal with

> > pmd thread starvation case where another cpu hogging process can get

> > scheduled/affinitized to the same core where pmd is running there by

> > significantly impacting the datapath performance.

> >

> > The realtime scheduling policy is applied only when CPU mask is passed

> > to 'pmd-cpu-mask'. The exception to this is 'pmd-cpu-mask=1', where

> > the policy and priority shall not be applied to pmd thread spawned on

> core0.

> > For example:

> >

> >      * In the absence of pmd-cpu-mask or if pmd-cpu-mask=1, one pmd

> >        thread shall be created and affinitized to 'core 0' with default

> >        scheduling policy and priority applied.

> >

> >      * If pmd-cpu-mask is specified with CPU mask > 1, one or more pmd

> >        threads shall be spawned on the corresponding core(s) in the mask

> >        and real time scheduling policy SCHED_RR and highest static

> >        priority is applied to the pmd thread(s).

> >

> > To reproduce use following commands:

> >

> > ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6 taskset 0x2

> > cat /dev/zero > /dev/null &

> 

> Even though it seems the most likely case - I'm not sure that we can always

> assume the user who put the non-OVS process on the core did so by mistake

> and would want us to increase our priority.


You've got a point.  When the user explicitly sets pmd-cpu-mask, he is likely looking at
Performance and wants to pin  pmd threads to  dedicated cores and unlikely to run
any other process on the pmd cores.

Having said that, I have come across cases with HT enabled, users ran in to issues
understanding 'thread_sliblings_list' and wrongly affinitize qemu threads 
to the cores running pmds using 'taskset' there by significantly impacting the 
datapath performance. This patch will mitigate such issues. 

> 

> >

> > Signed-off-by: Bhanuprakash Bodireddy

> <bhanuprakash.bodireddy@intel.com>

> > ---

> >   lib/dpif-netdev.c |  9 +++++++++

> >   lib/netdev-dpdk.c | 14 ++++++++++++++

> >   lib/netdev-dpdk.h |  1 +

> >   3 files changed, 24 insertions(+)

> >

> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c

> > index 1e8a37c..4a46816 100644

> > --- a/lib/dpif-netdev.c

> > +++ b/lib/dpif-netdev.c

> > @@ -2670,6 +2670,15 @@ pmd_thread_main(void *f_)

> >       /* Stores the pmd thread's 'pmd' to 'per_pmd_key'. */

> >       ovsthread_setspecific(pmd->dp->per_pmd_key, pmd);

> >       pmd_thread_setaffinity_cpu(pmd->core_id);

> > +

> > +#ifdef DPDK_NETDEV

> > +    /* Set pmd thread's scheduling policy to SCHED_RR and priority to

> > +     * highest priority of SCHED_RR policy, In absence of pmd-cpu-mask (or)

> > +     * pmd-cpu-mask=1, default scheduling policy and priority shall

> > +     * apply to pmd thread */

> > +     if (pmd->core_id)

> > +        pmd_thread_setpriority();

> 

> Similar to above, I don't think we can assume anything special about

> core 0. This type of change sounds like something that would be better

> done at a layer above vswitch which has more system wide knowledge.


I stand to be corrected,  Its very uncommon to isolate core0 or run HPC apps/threads
on the core 0. Also on multicore systems to improve application performance and mitigate 
Interrupts, IRQs get explicity affinitized to Core 0.  For this reasons I treat Core 0 special.

> fwiw, it would be cleaner to remove the #ifdef from here and create a

> dummy fn in netdev-dpdk.h, also the 'if' needs {}


Agree.
 
> > +#endif

> >   reload:

> >       emc_cache_init(&pmd->flow_cache);

> >

> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c

> > index 208c5f5..6518c87 100644

> > --- a/lib/netdev-dpdk.c

> > +++ b/lib/netdev-dpdk.c

> > @@ -2926,6 +2926,20 @@ pmd_thread_setaffinity_cpu(unsigned cpu)

> >       return 0;

> >   }

> >

> > +void

> > +pmd_thread_setpriority(void)

> > +{

> > +    struct sched_param threadparam;

> > +    int err;

> > +

> > +    memset(&threadparam, 0, sizeof(threadparam));

> > +    threadparam.sched_priority = sched_get_priority_max(SCHED_RR);

> > +    err = pthread_setschedparam(pthread_self(), SCHED_RR,

> &threadparam);

> > +    if (err) {

> > +        VLOG_WARN("Thread priority error %d",err);

> > +    }

> > +}

> > +

> >   static bool

> >   dpdk_thread_is_pmd(void)

> >   {

> > diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h

> > index 646d3e2..168673b 100644

> > --- a/lib/netdev-dpdk.h

> > +++ b/lib/netdev-dpdk.h

> > @@ -26,6 +26,7 @@ int dpdk_init(int argc, char **argv);

> >   void netdev_dpdk_register(void);

> >   void free_dpdk_buf(struct dp_packet *);

> >   int pmd_thread_setaffinity_cpu(unsigned cpu);

> > +void pmd_thread_setpriority(void);

> >

> >   #else

> >

> >
diff mbox

Patch

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 1e8a37c..4a46816 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -2670,6 +2670,15 @@  pmd_thread_main(void *f_)
     /* Stores the pmd thread's 'pmd' to 'per_pmd_key'. */
     ovsthread_setspecific(pmd->dp->per_pmd_key, pmd);
     pmd_thread_setaffinity_cpu(pmd->core_id);
+
+#ifdef DPDK_NETDEV
+    /* Set pmd thread's scheduling policy to SCHED_RR and priority to
+     * highest priority of SCHED_RR policy, In absence of pmd-cpu-mask (or)
+     * pmd-cpu-mask=1, default scheduling policy and priority shall
+     * apply to pmd thread */
+     if (pmd->core_id)
+        pmd_thread_setpriority();
+#endif
 reload:
     emc_cache_init(&pmd->flow_cache);
 
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 208c5f5..6518c87 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2926,6 +2926,20 @@  pmd_thread_setaffinity_cpu(unsigned cpu)
     return 0;
 }
 
+void
+pmd_thread_setpriority(void)
+{
+    struct sched_param threadparam;
+    int err;
+
+    memset(&threadparam, 0, sizeof(threadparam));
+    threadparam.sched_priority = sched_get_priority_max(SCHED_RR);
+    err = pthread_setschedparam(pthread_self(), SCHED_RR, &threadparam);
+    if (err) {
+        VLOG_WARN("Thread priority error %d",err);
+    }
+}
+
 static bool
 dpdk_thread_is_pmd(void)
 {
diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h
index 646d3e2..168673b 100644
--- a/lib/netdev-dpdk.h
+++ b/lib/netdev-dpdk.h
@@ -26,6 +26,7 @@  int dpdk_init(int argc, char **argv);
 void netdev_dpdk_register(void);
 void free_dpdk_buf(struct dp_packet *);
 int pmd_thread_setaffinity_cpu(unsigned cpu);
+void pmd_thread_setpriority(void);
 
 #else