[ovs-dev,v3,4/6] dpif-netdev: Change rxq_scheduling to use rxq processing cycles.

Submitted by Kevin Traynor on Aug. 1, 2017, 3:58 p.m.

Details

Message ID 1501603092-6287-5-git-send-email-ktraynor@redhat.com
State New
Headers show

Commit Message

Kevin Traynor Aug. 1, 2017, 3:58 p.m.
Previously rxqs were assigned to pmds by round robin in
port/queue order.

Now that we have the processing cycles used for existing rxqs,
use that information to try and produced a better balanced
distribution of rxqs across pmds. i.e. given multiple pmds, the
rxqs which have consumed the largest amount of processing cycles
will be placed on different pmds.

The rxqs are sorted by their processing cycles and assigned (in
sorted order) round robin across pmds.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
---
 Documentation/howto/dpdk.rst |  7 +++++
 lib/dpif-netdev.c            | 73 +++++++++++++++++++++++++++++++++++---------
 2 files changed, 66 insertions(+), 14 deletions(-)

Comments

Greg Rose Aug. 4, 2017, 9:31 p.m.
On 08/01/2017 08:58 AM, Kevin Traynor wrote:
> Previously rxqs were assigned to pmds by round robin in
> port/queue order.
>
> Now that we have the processing cycles used for existing rxqs,
> use that information to try and produced a better balanced
> distribution of rxqs across pmds. i.e. given multiple pmds, the
> rxqs which have consumed the largest amount of processing cycles
> will be placed on different pmds.
>
> The rxqs are sorted by their processing cycles and assigned (in
> sorted order) round robin across pmds.
>
> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>

Kevin,

Thanks for your work on openvswitch.

Unfortunately, this patch fails when applied to the master branch.  I didn't see a 'From:' sha id so I don't know which commit to base this series upon.

Here's the error:

Applying: dpif-netdev: Count the rxq processing cycles for an rxq.
Applying patch #796309 using 'git am'
Description: [ovs-dev,v3,4/6] dpif-netdev: Change rxq_scheduling to use rxq processing cycles.
Applying: dpif-netdev: Change rxq_scheduling to use rxq processing cycles.
error: patch failed: lib/dpif-netdev.c:3306
error: lib/dpif-netdev.c: patch does not apply
Patch failed at 0001 dpif-netdev: Change rxq_scheduling to use rxq processing cycles.

I guess we need the series to be rebased.

Thanks,

- Greg

> ---
>   Documentation/howto/dpdk.rst |  7 +++++
>   lib/dpif-netdev.c            | 73 +++++++++++++++++++++++++++++++++++---------
>   2 files changed, 66 insertions(+), 14 deletions(-)
>
> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> index af01d3e..a969285 100644
> --- a/Documentation/howto/dpdk.rst
> +++ b/Documentation/howto/dpdk.rst
> @@ -119,4 +119,11 @@ After that PMD threads on cores where RX queues was pinned will become
>     thread.
>
> +If pmd-rxq-affinity is not set for rxqs, they will be assigned to pmds (cores)
> +automatically. The processing cycles that have been required for each rxq
> +will be used where known to assign rxqs with the highest consumption of
> +processing cycles to different pmds.
> +
> +Rxq to pmds assignment takes place whenever there are configuration changes.
> +
>   QoS
>   ---
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 25a521a..a05e586 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -3295,8 +3295,29 @@ rr_numa_list_destroy(struct rr_numa_list *rr)
>   }
>
> +/* Sort Rx Queues by the processing cycles they are consuming. */
> +static int
> +rxq_cycle_sort(const void *a, const void *b)
> +{
> +    struct dp_netdev_rxq * qa;
> +    struct dp_netdev_rxq * qb;
> +
> +    qa = *(struct dp_netdev_rxq **) a;
> +    qb = *(struct dp_netdev_rxq **) b;
> +
> +    if (dp_netdev_rxq_get_cycles(qa, RXQ_CYCLES_PROC_LAST) >=
> +            dp_netdev_rxq_get_cycles(qb, RXQ_CYCLES_PROC_LAST)) {
> +        return -1;
> +    }
> +
> +    return 1;
> +}
> +
>   /* Assign pmds to queues.  If 'pinned' is true, assign pmds to pinned
>    * queues and marks the pmds as isolated.  Otherwise, assign non isolated
>    * pmds to unpinned queues.
>    *
> + * If 'pinned' is false queues will be sorted by processing cycles they are
> + * consuming and then assigned to pmds in round robin order.
> + *
>    * The function doesn't touch the pmd threads, it just stores the assignment
>    * in the 'pmd' member of each rxq. */
> @@ -3306,18 +3327,14 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
>       struct dp_netdev_port *port;
>       struct rr_numa_list rr;
> -
> -    rr_numa_list_populate(dp, &rr);
> +    struct dp_netdev_rxq ** rxqs = NULL;
> +    int i, n_rxqs = 0;
> +    struct rr_numa *numa = NULL;
> +    int numa_id;
>
>       HMAP_FOR_EACH (port, node, &dp->ports) {
> -        struct rr_numa *numa;
> -        int numa_id;
> -
>           if (!netdev_is_pmd(port->netdev)) {
>               continue;
>           }
>
> -        numa_id = netdev_get_numa_id(port->netdev);
> -        numa = rr_numa_list_lookup(&rr, numa_id);
> -
>           for (int qid = 0; qid < port->n_rxq; qid++) {
>               struct dp_netdev_rxq *q = &port->rxqs[qid];
> @@ -3337,17 +3354,45 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
>                   }
>               } else if (!pinned && q->core_id == OVS_CORE_UNSPEC) {
> -                if (!numa) {
> -                    VLOG_WARN("There's no available (non isolated) pmd thread "
> -                              "on numa node %d. Queue %d on port \'%s\' will "
> -                              "not be polled.",
> -                              numa_id, qid, netdev_get_name(port->netdev));
> +                if (n_rxqs == 0) {
> +                    rxqs = xmalloc(sizeof *rxqs);
>                   } else {
> -                    q->pmd = rr_numa_get_pmd(numa);
> +                    rxqs = xrealloc(rxqs, sizeof *rxqs * (n_rxqs + 1));
>                   }
> +                /* Store the queue. */
> +                rxqs[n_rxqs++] = q;
>               }
>           }
>       }
>
> +    if (n_rxqs > 1) {
> +        /* Sort the queues in order of the processing cycles
> +         * they consumed during their last pmd interval. */
> +        qsort(rxqs, n_rxqs, sizeof *rxqs, rxq_cycle_sort);
> +    }
> +
> +    rr_numa_list_populate(dp, &rr);
> +    /* Assign the sorted queues to pmds in round robin. */
> +    for (i = 0; i < n_rxqs; i++) {
> +        numa_id = netdev_get_numa_id(rxqs[i]->port->netdev);
> +        numa = rr_numa_list_lookup(&rr, numa_id);
> +        if (!numa) {
> +            VLOG_WARN("There's no available (non isolated) pmd thread "
> +                      "on numa node %d. Queue %d on port \'%s\' will "
> +                      "not be polled.",
> +                      numa_id, netdev_rxq_get_queue_id(rxqs[i]->rx),
> +                      netdev_get_name(rxqs[i]->port->netdev));
> +            continue;
> +        }
> +        rxqs[i]->pmd = rr_numa_get_pmd(numa);
> +        VLOG_INFO("Core %d on numa node %d assigned port \'%s\' "
> +                  "rx queue %d (measured processing cycles %"PRIu64").",
> +                  rxqs[i]->pmd->core_id, numa_id,
> +                  netdev_rxq_get_name(rxqs[i]->rx),
> +                  netdev_rxq_get_queue_id(rxqs[i]->rx),
> +                  dp_netdev_rxq_get_cycles(rxqs[i], RXQ_CYCLES_PROC_LAST));
> +    }
> +
>       rr_numa_list_destroy(&rr);
> +    free(rxqs);
>   }
>
>
Kevin Traynor Aug. 4, 2017, 9:56 p.m.
On 08/04/2017 10:31 PM, Greg Rose wrote:
> On 08/01/2017 08:58 AM, Kevin Traynor wrote:
>> Previously rxqs were assigned to pmds by round robin in
>> port/queue order.
>>
>> Now that we have the processing cycles used for existing rxqs,
>> use that information to try and produced a better balanced
>> distribution of rxqs across pmds. i.e. given multiple pmds, the
>> rxqs which have consumed the largest amount of processing cycles
>> will be placed on different pmds.
>>
>> The rxqs are sorted by their processing cycles and assigned (in
>> sorted order) round robin across pmds.
>>
>> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
> 
> Kevin,
> 
> Thanks for your work on openvswitch.
> 
> Unfortunately, this patch fails when applied to the master branch.  I
> didn't see a 'From:' sha id so I don't know which commit to base this
> series upon.
> 
> Here's the error:
> 
> Applying: dpif-netdev: Count the rxq processing cycles for an rxq.
> Applying patch #796309 using 'git am'
> Description: [ovs-dev,v3,4/6] dpif-netdev: Change rxq_scheduling to use
> rxq processing cycles.
> Applying: dpif-netdev: Change rxq_scheduling to use rxq processing cycles.
> error: patch failed: lib/dpif-netdev.c:3306
> error: lib/dpif-netdev.c: patch does not apply
> Patch failed at 0001 dpif-netdev: Change rxq_scheduling to use rxq
> processing cycles.
> 
> I guess we need the series to be rebased.
> 

Hi Greg,

Thanks for your mail. A commit merged [1] in the last couple of days
that touches some of the same code, so you're right I need to do a
rebase and re-run a few tests. fyi, I submitted earlier in the week
based on commit 8014f465e272 if you were interested in seeing it before
rebase. Even though both changes are touching the same code, the
functionality is independent and they fit together nicely.

Kevin.

[1]
commit c37813fdb030b4270d05ad61943754f67021a50d
Author: Billy O'Mahony <billy.o.mahony@intel.com>
Date:   Tue Aug 1 14:38:43 2017 -0700

    dpif-netdev: Assign ports to pmds on non-local numa node.


> Thanks,
> 
> - Greg
> 
>> ---
>>   Documentation/howto/dpdk.rst |  7 +++++
>>   lib/dpif-netdev.c            | 73
>> +++++++++++++++++++++++++++++++++++---------
>>   2 files changed, 66 insertions(+), 14 deletions(-)
>>
>> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
>> index af01d3e..a969285 100644
>> --- a/Documentation/howto/dpdk.rst
>> +++ b/Documentation/howto/dpdk.rst
>> @@ -119,4 +119,11 @@ After that PMD threads on cores where RX queues
>> was pinned will become
>>     thread.
>>
>> +If pmd-rxq-affinity is not set for rxqs, they will be assigned to
>> pmds (cores)
>> +automatically. The processing cycles that have been required for each
>> rxq
>> +will be used where known to assign rxqs with the highest consumption of
>> +processing cycles to different pmds.
>> +
>> +Rxq to pmds assignment takes place whenever there are configuration
>> changes.
>> +
>>   QoS
>>   ---
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index 25a521a..a05e586 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -3295,8 +3295,29 @@ rr_numa_list_destroy(struct rr_numa_list *rr)
>>   }
>>
>> +/* Sort Rx Queues by the processing cycles they are consuming. */
>> +static int
>> +rxq_cycle_sort(const void *a, const void *b)
>> +{
>> +    struct dp_netdev_rxq * qa;
>> +    struct dp_netdev_rxq * qb;
>> +
>> +    qa = *(struct dp_netdev_rxq **) a;
>> +    qb = *(struct dp_netdev_rxq **) b;
>> +
>> +    if (dp_netdev_rxq_get_cycles(qa, RXQ_CYCLES_PROC_LAST) >=
>> +            dp_netdev_rxq_get_cycles(qb, RXQ_CYCLES_PROC_LAST)) {
>> +        return -1;
>> +    }
>> +
>> +    return 1;
>> +}
>> +
>>   /* Assign pmds to queues.  If 'pinned' is true, assign pmds to pinned
>>    * queues and marks the pmds as isolated.  Otherwise, assign non
>> isolated
>>    * pmds to unpinned queues.
>>    *
>> + * If 'pinned' is false queues will be sorted by processing cycles
>> they are
>> + * consuming and then assigned to pmds in round robin order.
>> + *
>>    * The function doesn't touch the pmd threads, it just stores the
>> assignment
>>    * in the 'pmd' member of each rxq. */
>> @@ -3306,18 +3327,14 @@ rxq_scheduling(struct dp_netdev *dp, bool
>> pinned) OVS_REQUIRES(dp->port_mutex)
>>       struct dp_netdev_port *port;
>>       struct rr_numa_list rr;
>> -
>> -    rr_numa_list_populate(dp, &rr);
>> +    struct dp_netdev_rxq ** rxqs = NULL;
>> +    int i, n_rxqs = 0;
>> +    struct rr_numa *numa = NULL;
>> +    int numa_id;
>>
>>       HMAP_FOR_EACH (port, node, &dp->ports) {
>> -        struct rr_numa *numa;
>> -        int numa_id;
>> -
>>           if (!netdev_is_pmd(port->netdev)) {
>>               continue;
>>           }
>>
>> -        numa_id = netdev_get_numa_id(port->netdev);
>> -        numa = rr_numa_list_lookup(&rr, numa_id);
>> -
>>           for (int qid = 0; qid < port->n_rxq; qid++) {
>>               struct dp_netdev_rxq *q = &port->rxqs[qid];
>> @@ -3337,17 +3354,45 @@ rxq_scheduling(struct dp_netdev *dp, bool
>> pinned) OVS_REQUIRES(dp->port_mutex)
>>                   }
>>               } else if (!pinned && q->core_id == OVS_CORE_UNSPEC) {
>> -                if (!numa) {
>> -                    VLOG_WARN("There's no available (non isolated)
>> pmd thread "
>> -                              "on numa node %d. Queue %d on port
>> \'%s\' will "
>> -                              "not be polled.",
>> -                              numa_id, qid,
>> netdev_get_name(port->netdev));
>> +                if (n_rxqs == 0) {
>> +                    rxqs = xmalloc(sizeof *rxqs);
>>                   } else {
>> -                    q->pmd = rr_numa_get_pmd(numa);
>> +                    rxqs = xrealloc(rxqs, sizeof *rxqs * (n_rxqs + 1));
>>                   }
>> +                /* Store the queue. */
>> +                rxqs[n_rxqs++] = q;
>>               }
>>           }
>>       }
>>
>> +    if (n_rxqs > 1) {
>> +        /* Sort the queues in order of the processing cycles
>> +         * they consumed during their last pmd interval. */
>> +        qsort(rxqs, n_rxqs, sizeof *rxqs, rxq_cycle_sort);
>> +    }
>> +
>> +    rr_numa_list_populate(dp, &rr);
>> +    /* Assign the sorted queues to pmds in round robin. */
>> +    for (i = 0; i < n_rxqs; i++) {
>> +        numa_id = netdev_get_numa_id(rxqs[i]->port->netdev);
>> +        numa = rr_numa_list_lookup(&rr, numa_id);
>> +        if (!numa) {
>> +            VLOG_WARN("There's no available (non isolated) pmd thread "
>> +                      "on numa node %d. Queue %d on port \'%s\' will "
>> +                      "not be polled.",
>> +                      numa_id, netdev_rxq_get_queue_id(rxqs[i]->rx),
>> +                      netdev_get_name(rxqs[i]->port->netdev));
>> +            continue;
>> +        }
>> +        rxqs[i]->pmd = rr_numa_get_pmd(numa);
>> +        VLOG_INFO("Core %d on numa node %d assigned port \'%s\' "
>> +                  "rx queue %d (measured processing cycles %"PRIu64").",
>> +                  rxqs[i]->pmd->core_id, numa_id,
>> +                  netdev_rxq_get_name(rxqs[i]->rx),
>> +                  netdev_rxq_get_queue_id(rxqs[i]->rx),
>> +                  dp_netdev_rxq_get_cycles(rxqs[i],
>> RXQ_CYCLES_PROC_LAST));
>> +    }
>> +
>>       rr_numa_list_destroy(&rr);
>> +    free(rxqs);
>>   }
>>
>>
>
Greg Rose Aug. 4, 2017, 10:13 p.m.
On 08/04/2017 02:56 PM, Kevin Traynor wrote:
> On 08/04/2017 10:31 PM, Greg Rose wrote:
> > On 08/01/2017 08:58 AM, Kevin Traynor wrote:
> >> Previously rxqs were assigned to pmds by round robin in
> >> port/queue order.
> >>
> >> Now that we have the processing cycles used for existing rxqs,
> >> use that information to try and produced a better balanced
> >> distribution of rxqs across pmds. i.e. given multiple pmds, the
> >> rxqs which have consumed the largest amount of processing cycles
> >> will be placed on different pmds.
> >>
> >> The rxqs are sorted by their processing cycles and assigned (in
> >> sorted order) round robin across pmds.
> >>
> >> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
> >
> > Kevin,
> >
> > Thanks for your work on openvswitch.
> >
> > Unfortunately, this patch fails when applied to the master branch.  I
> > didn't see a 'From:' sha id so I don't know which commit to base this
> > series upon.
> >
> > Here's the error:
> >
> > Applying: dpif-netdev: Count the rxq processing cycles for an rxq.
> > Applying patch #796309 using 'git am'
> > Description: [ovs-dev,v3,4/6] dpif-netdev: Change rxq_scheduling to use
> > rxq processing cycles.
> > Applying: dpif-netdev: Change rxq_scheduling to use rxq processing cycles.
> > error: patch failed: lib/dpif-netdev.c:3306
> > error: lib/dpif-netdev.c: patch does not apply
> > Patch failed at 0001 dpif-netdev: Change rxq_scheduling to use rxq
> > processing cycles.
> >
> > I guess we need the series to be rebased.
> >
>
> Hi Greg,
>
> Thanks for your mail. A commit merged [1] in the last couple of days
> that touches some of the same code, so you're right I need to do a
> rebase and re-run a few tests. fyi, I submitted earlier in the week
> based on commit 8014f465e272 if you were interested in seeing it before
> rebase. Even though both changes are touching the same code, the
> functionality is independent and they fit together nicely.
>
> Kevin.

Great, I'll give it a whirl.

Thanks for the prompt response!

- Greg

>
> [1]
> commit c37813fdb030b4270d05ad61943754f67021a50d
> Author: Billy O'Mahony <billy.o.mahony@intel.com>
> Date:   Tue Aug 1 14:38:43 2017 -0700
>
>      dpif-netdev: Assign ports to pmds on non-local numa node.
>
>
> > Thanks,
> >
> > - Greg
> >
> >> ---
> >>    Documentation/howto/dpdk.rst |  7 +++++
> >>    lib/dpif-netdev.c            | 73
> >> +++++++++++++++++++++++++++++++++++---------
> >>    2 files changed, 66 insertions(+), 14 deletions(-)
> >>
> >> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> >> index af01d3e..a969285 100644
> >> --- a/Documentation/howto/dpdk.rst
> >> +++ b/Documentation/howto/dpdk.rst
> >> @@ -119,4 +119,11 @@ After that PMD threads on cores where RX queues
> >> was pinned will become
> >>      thread.
> >>
> >> +If pmd-rxq-affinity is not set for rxqs, they will be assigned to
> >> pmds (cores)
> >> +automatically. The processing cycles that have been required for each
> >> rxq
> >> +will be used where known to assign rxqs with the highest consumption of
> >> +processing cycles to different pmds.
> >> +
> >> +Rxq to pmds assignment takes place whenever there are configuration
> >> changes.
> >> +
> >>    QoS
> >>    ---
> >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> >> index 25a521a..a05e586 100644
> >> --- a/lib/dpif-netdev.c
> >> +++ b/lib/dpif-netdev.c
> >> @@ -3295,8 +3295,29 @@ rr_numa_list_destroy(struct rr_numa_list *rr)
> >>    }
> >>
> >> +/* Sort Rx Queues by the processing cycles they are consuming. */
> >> +static int
> >> +rxq_cycle_sort(const void *a, const void *b)
> >> +{
> >> +    struct dp_netdev_rxq * qa;
> >> +    struct dp_netdev_rxq * qb;
> >> +
> >> +    qa = *(struct dp_netdev_rxq **) a;
> >> +    qb = *(struct dp_netdev_rxq **) b;
> >> +
> >> +    if (dp_netdev_rxq_get_cycles(qa, RXQ_CYCLES_PROC_LAST) >=
> >> +            dp_netdev_rxq_get_cycles(qb, RXQ_CYCLES_PROC_LAST)) {
> >> +        return -1;
> >> +    }
> >> +
> >> +    return 1;
> >> +}
> >> +
> >>    /* Assign pmds to queues.  If 'pinned' is true, assign pmds to pinned
> >>     * queues and marks the pmds as isolated.  Otherwise, assign non
> >> isolated
> >>     * pmds to unpinned queues.
> >>     *
> >> + * If 'pinned' is false queues will be sorted by processing cycles
> >> they are
> >> + * consuming and then assigned to pmds in round robin order.
> >> + *
> >>     * The function doesn't touch the pmd threads, it just stores the
> >> assignment
> >>     * in the 'pmd' member of each rxq. */
> >> @@ -3306,18 +3327,14 @@ rxq_scheduling(struct dp_netdev *dp, bool
> >> pinned) OVS_REQUIRES(dp->port_mutex)
> >>        struct dp_netdev_port *port;
> >>        struct rr_numa_list rr;
> >> -
> >> -    rr_numa_list_populate(dp, &rr);
> >> +    struct dp_netdev_rxq ** rxqs = NULL;
> >> +    int i, n_rxqs = 0;
> >> +    struct rr_numa *numa = NULL;
> >> +    int numa_id;
> >>
> >>        HMAP_FOR_EACH (port, node, &dp->ports) {
> >> -        struct rr_numa *numa;
> >> -        int numa_id;
> >> -
> >>            if (!netdev_is_pmd(port->netdev)) {
> >>                continue;
> >>            }
> >>
> >> -        numa_id = netdev_get_numa_id(port->netdev);
> >> -        numa = rr_numa_list_lookup(&rr, numa_id);
> >> -
> >>            for (int qid = 0; qid < port->n_rxq; qid++) {
> >>                struct dp_netdev_rxq *q = &port->rxqs[qid];
> >> @@ -3337,17 +3354,45 @@ rxq_scheduling(struct dp_netdev *dp, bool
> >> pinned) OVS_REQUIRES(dp->port_mutex)
> >>                    }
> >>                } else if (!pinned && q->core_id == OVS_CORE_UNSPEC) {
> >> -                if (!numa) {
> >> -                    VLOG_WARN("There's no available (non isolated)
> >> pmd thread "
> >> -                              "on numa node %d. Queue %d on port
> >> \'%s\' will "
> >> -                              "not be polled.",
> >> -                              numa_id, qid,
> >> netdev_get_name(port->netdev));
> >> +                if (n_rxqs == 0) {
> >> +                    rxqs = xmalloc(sizeof *rxqs);
> >>                    } else {
> >> -                    q->pmd = rr_numa_get_pmd(numa);
> >> +                    rxqs = xrealloc(rxqs, sizeof *rxqs * (n_rxqs + 1));
> >>                    }
> >> +                /* Store the queue. */
> >> +                rxqs[n_rxqs++] = q;
> >>                }
> >>            }
> >>        }
> >>
> >> +    if (n_rxqs > 1) {
> >> +        /* Sort the queues in order of the processing cycles
> >> +         * they consumed during their last pmd interval. */
> >> +        qsort(rxqs, n_rxqs, sizeof *rxqs, rxq_cycle_sort);
> >> +    }
> >> +
> >> +    rr_numa_list_populate(dp, &rr);
> >> +    /* Assign the sorted queues to pmds in round robin. */
> >> +    for (i = 0; i < n_rxqs; i++) {
> >> +        numa_id = netdev_get_numa_id(rxqs[i]->port->netdev);
> >> +        numa = rr_numa_list_lookup(&rr, numa_id);
> >> +        if (!numa) {
> >> +            VLOG_WARN("There's no available (non isolated) pmd thread "
> >> +                      "on numa node %d. Queue %d on port \'%s\' will "
> >> +                      "not be polled.",
> >> +                      numa_id, netdev_rxq_get_queue_id(rxqs[i]->rx),
> >> +                      netdev_get_name(rxqs[i]->port->netdev));
> >> +            continue;
> >> +        }
> >> +        rxqs[i]->pmd = rr_numa_get_pmd(numa);
> >> +        VLOG_INFO("Core %d on numa node %d assigned port \'%s\' "
> >> +                  "rx queue %d (measured processing cycles %"PRIu64").",
> >> +                  rxqs[i]->pmd->core_id, numa_id,
> >> +                  netdev_rxq_get_name(rxqs[i]->rx),
> >> +                  netdev_rxq_get_queue_id(rxqs[i]->rx),
> >> +                  dp_netdev_rxq_get_cycles(rxqs[i],
> >> RXQ_CYCLES_PROC_LAST));
> >> +    }
> >> +
> >>        rr_numa_list_destroy(&rr);
> >> +    free(rxqs);
> >>    }
> >>
> >>
> >
>
Greg Rose Aug. 8, 2017, 6:15 p.m.
On 08/01/2017 08:58 AM, Kevin Traynor wrote:
> Previously rxqs were assigned to pmds by round robin in
> port/queue order.
>
> Now that we have the processing cycles used for existing rxqs,
> use that information to try and produced a better balanced
> distribution of rxqs across pmds. i.e. given multiple pmds, the
> rxqs which have consumed the largest amount of processing cycles
> will be placed on different pmds.
>
> The rxqs are sorted by their processing cycles and assigned (in
> sorted order) round robin across pmds.
>
> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
> ---
>   Documentation/howto/dpdk.rst |  7 +++++
>   lib/dpif-netdev.c            | 73 +++++++++++++++++++++++++++++++++++---------
>   2 files changed, 66 insertions(+), 14 deletions(-)
>
> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> index af01d3e..a969285 100644
> --- a/Documentation/howto/dpdk.rst
> +++ b/Documentation/howto/dpdk.rst
> @@ -119,4 +119,11 @@ After that PMD threads on cores where RX queues was pinned will become
>     thread.
>
> +If pmd-rxq-affinity is not set for rxqs, they will be assigned to pmds (cores)
> +automatically. The processing cycles that have been required for each rxq
> +will be used where known to assign rxqs with the highest consumption of
> +processing cycles to different pmds.
> +
> +Rxq to pmds assignment takes place whenever there are configuration changes.
> +
>   QoS
>   ---
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 25a521a..a05e586 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -3295,8 +3295,29 @@ rr_numa_list_destroy(struct rr_numa_list *rr)
>   }
>
> +/* Sort Rx Queues by the processing cycles they are consuming. */
> +static int
> +rxq_cycle_sort(const void *a, const void *b)
> +{
> +    struct dp_netdev_rxq * qa;
> +    struct dp_netdev_rxq * qb;
> +
> +    qa = *(struct dp_netdev_rxq **) a;
> +    qb = *(struct dp_netdev_rxq **) b;
> +
> +    if (dp_netdev_rxq_get_cycles(qa, RXQ_CYCLES_PROC_LAST) >=
> +            dp_netdev_rxq_get_cycles(qb, RXQ_CYCLES_PROC_LAST)) {
> +        return -1;
> +    }
> +
> +    return 1;
> +}
> +
>   /* Assign pmds to queues.  If 'pinned' is true, assign pmds to pinned
>    * queues and marks the pmds as isolated.  Otherwise, assign non isolated
>    * pmds to unpinned queues.
>    *
> + * If 'pinned' is false queues will be sorted by processing cycles they are
> + * consuming and then assigned to pmds in round robin order.
> + *
>    * The function doesn't touch the pmd threads, it just stores the assignment
>    * in the 'pmd' member of each rxq. */
> @@ -3306,18 +3327,14 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
>       struct dp_netdev_port *port;
>       struct rr_numa_list rr;
> -
> -    rr_numa_list_populate(dp, &rr);
> +    struct dp_netdev_rxq ** rxqs = NULL;
> +    int i, n_rxqs = 0;
> +    struct rr_numa *numa = NULL;
> +    int numa_id;
>
>       HMAP_FOR_EACH (port, node, &dp->ports) {
> -        struct rr_numa *numa;
> -        int numa_id;
> -
>           if (!netdev_is_pmd(port->netdev)) {
>               continue;
>           }
>
> -        numa_id = netdev_get_numa_id(port->netdev);
> -        numa = rr_numa_list_lookup(&rr, numa_id);
> -
>           for (int qid = 0; qid < port->n_rxq; qid++) {
>               struct dp_netdev_rxq *q = &port->rxqs[qid];
> @@ -3337,17 +3354,45 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
>                   }
>               } else if (!pinned && q->core_id == OVS_CORE_UNSPEC) {
> -                if (!numa) {
> -                    VLOG_WARN("There's no available (non isolated) pmd thread "
> -                              "on numa node %d. Queue %d on port \'%s\' will "
> -                              "not be polled.",
> -                              numa_id, qid, netdev_get_name(port->netdev));
> +                if (n_rxqs == 0) {
> +                    rxqs = xmalloc(sizeof *rxqs);
>                   } else {
> -                    q->pmd = rr_numa_get_pmd(numa);
> +                    rxqs = xrealloc(rxqs, sizeof *rxqs * (n_rxqs + 1));
>                   }
> +                /* Store the queue. */
> +                rxqs[n_rxqs++] = q;
>               }
>           }
>       }
>
> +    if (n_rxqs > 1) {
> +        /* Sort the queues in order of the processing cycles
> +         * they consumed during their last pmd interval. */
> +        qsort(rxqs, n_rxqs, sizeof *rxqs, rxq_cycle_sort);
> +    }
> +
> +    rr_numa_list_populate(dp, &rr);
> +    /* Assign the sorted queues to pmds in round robin. */
> +    for (i = 0; i < n_rxqs; i++) {
> +        numa_id = netdev_get_numa_id(rxqs[i]->port->netdev);
> +        numa = rr_numa_list_lookup(&rr, numa_id);
> +        if (!numa) {
> +            VLOG_WARN("There's no available (non isolated) pmd thread "
> +                      "on numa node %d. Queue %d on port \'%s\' will "
> +                      "not be polled.",
> +                      numa_id, netdev_rxq_get_queue_id(rxqs[i]->rx),
> +                      netdev_get_name(rxqs[i]->port->netdev));
> +            continue;
> +        }
> +        rxqs[i]->pmd = rr_numa_get_pmd(numa);
> +        VLOG_INFO("Core %d on numa node %d assigned port \'%s\' "
> +                  "rx queue %d (measured processing cycles %"PRIu64").",
> +                  rxqs[i]->pmd->core_id, numa_id,
> +                  netdev_rxq_get_name(rxqs[i]->rx),
> +                  netdev_rxq_get_queue_id(rxqs[i]->rx),
> +                  dp_netdev_rxq_get_cycles(rxqs[i], RXQ_CYCLES_PROC_LAST));

Kevin,

I've been reviewing and testing this code and found something odd.  The measured processing cycles are
always zero in my setup.

sample log output:

2017-08-08T12:48:25.871Z|00417|dpif_netdev|INFO|Core 6 on numa node 0 assigned port 'port-em2' rx queue 5 (measured processing cycles 10011304791).
2017-08-08T12:48:25.871Z|00418|dpif_netdev|INFO|Core 6 on numa node 0 assigned port 'port-em2' rx queue 4 (measured processing cycles 0).

Initially I configure my setup with 16 rxq's and a PMD CPU mask of 0x1FFFE.  Then I've been testing by running
iperf traffic with multiple ports 8 or 16 (-P option) to allow 'processing cycles' to count up.  Or at least I think that's
what should be happening.  But when I reconfigure the rxq's and cpu mask the processing cycles is always
zero.

How are you testing this?  Perhaps it's just my test harness or something else.

Initial setup:

ovs-vsctl set Interface port-em2 options:n_rxq=16
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x1FFFE

(Note that I do not set affinity - I have read your patch to infer that this is for cases without affinitization.)

After getting traffic I then run this setup:

ovs-vsctl set Interface port-em2 options:n_rxq=4
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x1E

Any advice or comment?

Thanks,

- Greg

> +    }
> +
>       rr_numa_list_destroy(&rr);
> +    free(rxqs);
>   }
>
>
Kevin Traynor Aug. 9, 2017, 3:47 p.m.
On 08/08/2017 07:15 PM, Greg Rose wrote:
> On 08/01/2017 08:58 AM, Kevin Traynor wrote:
>> Previously rxqs were assigned to pmds by round robin in
>> port/queue order.
>>
>> Now that we have the processing cycles used for existing rxqs,
>> use that information to try and produced a better balanced
>> distribution of rxqs across pmds. i.e. given multiple pmds, the
>> rxqs which have consumed the largest amount of processing cycles
>> will be placed on different pmds.
>>
>> The rxqs are sorted by their processing cycles and assigned (in
>> sorted order) round robin across pmds.
>>
>> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
>> ---
>>   Documentation/howto/dpdk.rst |  7 +++++
>>   lib/dpif-netdev.c            | 73
>> +++++++++++++++++++++++++++++++++++---------
>>   2 files changed, 66 insertions(+), 14 deletions(-)
>>
>> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
>> index af01d3e..a969285 100644
>> --- a/Documentation/howto/dpdk.rst
>> +++ b/Documentation/howto/dpdk.rst
>> @@ -119,4 +119,11 @@ After that PMD threads on cores where RX queues
>> was pinned will become
>>     thread.
>>
>> +If pmd-rxq-affinity is not set for rxqs, they will be assigned to
>> pmds (cores)
>> +automatically. The processing cycles that have been required for each
>> rxq
>> +will be used where known to assign rxqs with the highest consumption of
>> +processing cycles to different pmds.
>> +
>> +Rxq to pmds assignment takes place whenever there are configuration
>> changes.
>> +
>>   QoS
>>   ---
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index 25a521a..a05e586 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -3295,8 +3295,29 @@ rr_numa_list_destroy(struct rr_numa_list *rr)
>>   }
>>
>> +/* Sort Rx Queues by the processing cycles they are consuming. */
>> +static int
>> +rxq_cycle_sort(const void *a, const void *b)
>> +{
>> +    struct dp_netdev_rxq * qa;
>> +    struct dp_netdev_rxq * qb;
>> +
>> +    qa = *(struct dp_netdev_rxq **) a;
>> +    qb = *(struct dp_netdev_rxq **) b;
>> +
>> +    if (dp_netdev_rxq_get_cycles(qa, RXQ_CYCLES_PROC_LAST) >=
>> +            dp_netdev_rxq_get_cycles(qb, RXQ_CYCLES_PROC_LAST)) {
>> +        return -1;
>> +    }
>> +
>> +    return 1;
>> +}
>> +
>>   /* Assign pmds to queues.  If 'pinned' is true, assign pmds to pinned
>>    * queues and marks the pmds as isolated.  Otherwise, assign non
>> isolated
>>    * pmds to unpinned queues.
>>    *
>> + * If 'pinned' is false queues will be sorted by processing cycles
>> they are
>> + * consuming and then assigned to pmds in round robin order.
>> + *
>>    * The function doesn't touch the pmd threads, it just stores the
>> assignment
>>    * in the 'pmd' member of each rxq. */
>> @@ -3306,18 +3327,14 @@ rxq_scheduling(struct dp_netdev *dp, bool
>> pinned) OVS_REQUIRES(dp->port_mutex)
>>       struct dp_netdev_port *port;
>>       struct rr_numa_list rr;
>> -
>> -    rr_numa_list_populate(dp, &rr);
>> +    struct dp_netdev_rxq ** rxqs = NULL;
>> +    int i, n_rxqs = 0;
>> +    struct rr_numa *numa = NULL;
>> +    int numa_id;
>>
>>       HMAP_FOR_EACH (port, node, &dp->ports) {
>> -        struct rr_numa *numa;
>> -        int numa_id;
>> -
>>           if (!netdev_is_pmd(port->netdev)) {
>>               continue;
>>           }
>>
>> -        numa_id = netdev_get_numa_id(port->netdev);
>> -        numa = rr_numa_list_lookup(&rr, numa_id);
>> -
>>           for (int qid = 0; qid < port->n_rxq; qid++) {
>>               struct dp_netdev_rxq *q = &port->rxqs[qid];
>> @@ -3337,17 +3354,45 @@ rxq_scheduling(struct dp_netdev *dp, bool
>> pinned) OVS_REQUIRES(dp->port_mutex)
>>                   }
>>               } else if (!pinned && q->core_id == OVS_CORE_UNSPEC) {
>> -                if (!numa) {
>> -                    VLOG_WARN("There's no available (non isolated)
>> pmd thread "
>> -                              "on numa node %d. Queue %d on port
>> \'%s\' will "
>> -                              "not be polled.",
>> -                              numa_id, qid,
>> netdev_get_name(port->netdev));
>> +                if (n_rxqs == 0) {
>> +                    rxqs = xmalloc(sizeof *rxqs);
>>                   } else {
>> -                    q->pmd = rr_numa_get_pmd(numa);
>> +                    rxqs = xrealloc(rxqs, sizeof *rxqs * (n_rxqs + 1));
>>                   }
>> +                /* Store the queue. */
>> +                rxqs[n_rxqs++] = q;
>>               }
>>           }
>>       }
>>
>> +    if (n_rxqs > 1) {
>> +        /* Sort the queues in order of the processing cycles
>> +         * they consumed during their last pmd interval. */
>> +        qsort(rxqs, n_rxqs, sizeof *rxqs, rxq_cycle_sort);
>> +    }
>> +
>> +    rr_numa_list_populate(dp, &rr);
>> +    /* Assign the sorted queues to pmds in round robin. */
>> +    for (i = 0; i < n_rxqs; i++) {
>> +        numa_id = netdev_get_numa_id(rxqs[i]->port->netdev);
>> +        numa = rr_numa_list_lookup(&rr, numa_id);
>> +        if (!numa) {
>> +            VLOG_WARN("There's no available (non isolated) pmd thread "
>> +                      "on numa node %d. Queue %d on port \'%s\' will "
>> +                      "not be polled.",
>> +                      numa_id, netdev_rxq_get_queue_id(rxqs[i]->rx),
>> +                      netdev_get_name(rxqs[i]->port->netdev));
>> +            continue;
>> +        }
>> +        rxqs[i]->pmd = rr_numa_get_pmd(numa);
>> +        VLOG_INFO("Core %d on numa node %d assigned port \'%s\' "
>> +                  "rx queue %d (measured processing cycles %"PRIu64").",
>> +                  rxqs[i]->pmd->core_id, numa_id,
>> +                  netdev_rxq_get_name(rxqs[i]->rx),
>> +                  netdev_rxq_get_queue_id(rxqs[i]->rx),
>> +                  dp_netdev_rxq_get_cycles(rxqs[i],
>> RXQ_CYCLES_PROC_LAST));
> 
> Kevin,
> 
> I've been reviewing and testing this code and found something odd.  The
> measured processing cycles are
> always zero in my setup.
> 
> sample log output:
> 
> 2017-08-08T12:48:25.871Z|00417|dpif_netdev|INFO|Core 6 on numa node 0
> assigned port 'port-em2' rx queue 5 (measured processing cycles
> 10011304791).
> 2017-08-08T12:48:25.871Z|00418|dpif_netdev|INFO|Core 6 on numa node 0
> assigned port 'port-em2' rx queue 4 (measured processing cycles 0).
> 
> Initially I configure my setup with 16 rxq's and a PMD CPU mask of
> 0x1FFFE.  Then I've been testing by running
> iperf traffic with multiple ports 8 or 16 (-P option) to allow
> 'processing cycles' to count up.  Or at least I think that's
> what should be happening.  But when I reconfigure the rxq's and cpu mask
> the processing cycles is always
> zero.
> 

Hi Greg, thanks for trying it out. I see that rxq 5 has measured cycles
so it appears to be just on some queues.

The stat that is showing is the processing cycles that was counted for
the rxq during the last 1 sec run while it was on a pmd. "processing
cycles" counts time to fetch packets and process them but it does not
count time spent polling when there are no rx packets.

There's a few reasons it could be 0:
- The queue is newly added
- There is no rx traffic on that interface
- The interface has not distributed the traffic to that particular rxq
so there is no "processing cycles" done for that queue.

Given the rxq number in the log, I would hazard a guess that it's the
last issue. You could confirm this by setting pmds > total rxqs, so that
each pmd has a max of 1 rxq. Then the pmds stats then can indicate if
there are packets being received on that pmd, and hence rxq. You can
check that setup with
ovs-appctl dpif-netdev/pmd-rxq-show
ovs-appctl dpif-netdev/pmd-stats-clear
ovs-appctl dpif-netdev/pmd-stats-show

If you increase the number of flows so that the RSS in the NIC (IIRC
relies on 5-tuple) can split them across the full range of rxq's it
should solve that issue. Of course there could always be a bug somewhere
too!

> How are you testing this?  Perhaps it's just my test harness or
> something else.
> 

I'm using 2 dpdk ports with flows added to send between them. Externally
I have pktgen-dpdk connected and sending 1K flows so I hit all queues.
Then varying traffic rates, pmds and queue numbers and also using
ovs-appctl dpif-netdev/pmd-rxq-rebalance from 6/6.

> Initial setup:
> 
> ovs-vsctl set Interface port-em2 options:n_rxq=16
> ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x1FFFE
> 
> (Note that I do not set affinity - I have read your patch to infer that
> this is for cases without affinitization.)
> 

That's correct, and manual affinitization takes precedence (I need to
add in docs if I haven't). The patchset only changes the how the
non-affinitized rxqs are distributed.

> After getting traffic I then run this setup:
> 
> ovs-vsctl set Interface port-em2 options:n_rxq=4
> ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x1E
> 
> Any advice or comment?
> 
> Thanks,
> 
> - Greg
> 

Just sent a v4 also with rebase for head of master.

thanks,
Kevin.

>> +    }
>> +
>>       rr_numa_list_destroy(&rr);
>> +    free(rxqs);
>>   }
>>
>>
>
Greg Rose Aug. 10, 2017, 12:42 a.m.
On 08/09/2017 08:47 AM, Kevin Traynor wrote:
> On 08/08/2017 07:15 PM, Greg Rose wrote:
> > On 08/01/2017 08:58 AM, Kevin Traynor wrote:
> >> Previously rxqs were assigned to pmds by round robin in
> >> port/queue order.
> >>
> >> Now that we have the processing cycles used for existing rxqs,
> >> use that information to try and produced a better balanced
> >> distribution of rxqs across pmds. i.e. given multiple pmds, the
> >> rxqs which have consumed the largest amount of processing cycles
> >> will be placed on different pmds.
> >>
> >> The rxqs are sorted by their processing cycles and assigned (in
> >> sorted order) round robin across pmds.
> >>
> >> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
> >> ---
> >>    Documentation/howto/dpdk.rst |  7 +++++
> >>    lib/dpif-netdev.c            | 73
> >> +++++++++++++++++++++++++++++++++++---------
> >>    2 files changed, 66 insertions(+), 14 deletions(-)
> >>
> >> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> >> index af01d3e..a969285 100644
> >> --- a/Documentation/howto/dpdk.rst
> >> +++ b/Documentation/howto/dpdk.rst
> >> @@ -119,4 +119,11 @@ After that PMD threads on cores where RX queues
> >> was pinned will become
> >>      thread.
> >>
> >> +If pmd-rxq-affinity is not set for rxqs, they will be assigned to
> >> pmds (cores)
> >> +automatically. The processing cycles that have been required for each
> >> rxq
> >> +will be used where known to assign rxqs with the highest consumption of
> >> +processing cycles to different pmds.
> >> +
> >> +Rxq to pmds assignment takes place whenever there are configuration
> >> changes.
> >> +
> >>    QoS
> >>    ---
> >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> >> index 25a521a..a05e586 100644
> >> --- a/lib/dpif-netdev.c
> >> +++ b/lib/dpif-netdev.c
> >> @@ -3295,8 +3295,29 @@ rr_numa_list_destroy(struct rr_numa_list *rr)
> >>    }
> >>
> >> +/* Sort Rx Queues by the processing cycles they are consuming. */
> >> +static int
> >> +rxq_cycle_sort(const void *a, const void *b)
> >> +{
> >> +    struct dp_netdev_rxq * qa;
> >> +    struct dp_netdev_rxq * qb;
> >> +
> >> +    qa = *(struct dp_netdev_rxq **) a;
> >> +    qb = *(struct dp_netdev_rxq **) b;
> >> +
> >> +    if (dp_netdev_rxq_get_cycles(qa, RXQ_CYCLES_PROC_LAST) >=
> >> +            dp_netdev_rxq_get_cycles(qb, RXQ_CYCLES_PROC_LAST)) {
> >> +        return -1;
> >> +    }
> >> +
> >> +    return 1;
> >> +}
> >> +
> >>    /* Assign pmds to queues.  If 'pinned' is true, assign pmds to pinned
> >>     * queues and marks the pmds as isolated.  Otherwise, assign non
> >> isolated
> >>     * pmds to unpinned queues.
> >>     *
> >> + * If 'pinned' is false queues will be sorted by processing cycles
> >> they are
> >> + * consuming and then assigned to pmds in round robin order.
> >> + *
> >>     * The function doesn't touch the pmd threads, it just stores the
> >> assignment
> >>     * in the 'pmd' member of each rxq. */
> >> @@ -3306,18 +3327,14 @@ rxq_scheduling(struct dp_netdev *dp, bool
> >> pinned) OVS_REQUIRES(dp->port_mutex)
> >>        struct dp_netdev_port *port;
> >>        struct rr_numa_list rr;
> >> -
> >> -    rr_numa_list_populate(dp, &rr);
> >> +    struct dp_netdev_rxq ** rxqs = NULL;
> >> +    int i, n_rxqs = 0;
> >> +    struct rr_numa *numa = NULL;
> >> +    int numa_id;
> >>
> >>        HMAP_FOR_EACH (port, node, &dp->ports) {
> >> -        struct rr_numa *numa;
> >> -        int numa_id;
> >> -
> >>            if (!netdev_is_pmd(port->netdev)) {
> >>                continue;
> >>            }
> >>
> >> -        numa_id = netdev_get_numa_id(port->netdev);
> >> -        numa = rr_numa_list_lookup(&rr, numa_id);
> >> -
> >>            for (int qid = 0; qid < port->n_rxq; qid++) {
> >>                struct dp_netdev_rxq *q = &port->rxqs[qid];
> >> @@ -3337,17 +3354,45 @@ rxq_scheduling(struct dp_netdev *dp, bool
> >> pinned) OVS_REQUIRES(dp->port_mutex)
> >>                    }
> >>                } else if (!pinned && q->core_id == OVS_CORE_UNSPEC) {
> >> -                if (!numa) {
> >> -                    VLOG_WARN("There's no available (non isolated)
> >> pmd thread "
> >> -                              "on numa node %d. Queue %d on port
> >> \'%s\' will "
> >> -                              "not be polled.",
> >> -                              numa_id, qid,
> >> netdev_get_name(port->netdev));
> >> +                if (n_rxqs == 0) {
> >> +                    rxqs = xmalloc(sizeof *rxqs);
> >>                    } else {
> >> -                    q->pmd = rr_numa_get_pmd(numa);
> >> +                    rxqs = xrealloc(rxqs, sizeof *rxqs * (n_rxqs + 1));
> >>                    }
> >> +                /* Store the queue. */
> >> +                rxqs[n_rxqs++] = q;
> >>                }
> >>            }
> >>        }
> >>
> >> +    if (n_rxqs > 1) {
> >> +        /* Sort the queues in order of the processing cycles
> >> +         * they consumed during their last pmd interval. */
> >> +        qsort(rxqs, n_rxqs, sizeof *rxqs, rxq_cycle_sort);
> >> +    }
> >> +
> >> +    rr_numa_list_populate(dp, &rr);
> >> +    /* Assign the sorted queues to pmds in round robin. */
> >> +    for (i = 0; i < n_rxqs; i++) {
> >> +        numa_id = netdev_get_numa_id(rxqs[i]->port->netdev);
> >> +        numa = rr_numa_list_lookup(&rr, numa_id);
> >> +        if (!numa) {
> >> +            VLOG_WARN("There's no available (non isolated) pmd thread "
> >> +                      "on numa node %d. Queue %d on port \'%s\' will "
> >> +                      "not be polled.",
> >> +                      numa_id, netdev_rxq_get_queue_id(rxqs[i]->rx),
> >> +                      netdev_get_name(rxqs[i]->port->netdev));
> >> +            continue;
> >> +        }
> >> +        rxqs[i]->pmd = rr_numa_get_pmd(numa);
> >> +        VLOG_INFO("Core %d on numa node %d assigned port \'%s\' "
> >> +                  "rx queue %d (measured processing cycles %"PRIu64").",
> >> +                  rxqs[i]->pmd->core_id, numa_id,
> >> +                  netdev_rxq_get_name(rxqs[i]->rx),
> >> +                  netdev_rxq_get_queue_id(rxqs[i]->rx),
> >> +                  dp_netdev_rxq_get_cycles(rxqs[i],
> >> RXQ_CYCLES_PROC_LAST));
> >
> > Kevin,
> >
> > I've been reviewing and testing this code and found something odd.  The
> > measured processing cycles are
> > always zero in my setup.
> >
> > sample log output:
> >
> > 2017-08-08T12:48:25.871Z|00417|dpif_netdev|INFO|Core 6 on numa node 0
> > assigned port 'port-em2' rx queue 5 (measured processing cycles
> > 10011304791).
> > 2017-08-08T12:48:25.871Z|00418|dpif_netdev|INFO|Core 6 on numa node 0
> > assigned port 'port-em2' rx queue 4 (measured processing cycles 0).
> >
> > Initially I configure my setup with 16 rxq's and a PMD CPU mask of
> > 0x1FFFE.  Then I've been testing by running
> > iperf traffic with multiple ports 8 or 16 (-P option) to allow
> > 'processing cycles' to count up.  Or at least I think that's
> > what should be happening.  But when I reconfigure the rxq's and cpu mask
> > the processing cycles is always
> > zero.
> >
>
> Hi Greg, thanks for trying it out. I see that rxq 5 has measured cycles
> so it appears to be just on some queues.
>
> The stat that is showing is the processing cycles that was counted for
> the rxq during the last 1 sec run while it was on a pmd. "processing
> cycles" counts time to fetch packets and process them but it does not
> count time spent polling when there are no rx packets.
>
> There's a few reasons it could be 0:
> - The queue is newly added
> - There is no rx traffic on that interface
> - The interface has not distributed the traffic to that particular rxq
> so there is no "processing cycles" done for that queue.
>
> Given the rxq number in the log, I would hazard a guess that it's the
> last issue. You could confirm this by setting pmds > total rxqs, so that
> each pmd has a max of 1 rxq. Then the pmds stats then can indicate if
> there are packets being received on that pmd, and hence rxq. You can
> check that setup with
> ovs-appctl dpif-netdev/pmd-rxq-show
> ovs-appctl dpif-netdev/pmd-stats-clear
> ovs-appctl dpif-netdev/pmd-stats-show

I'll give that a try with your latest revision of the patch series.  Thanks for the
clues!

- Greg

Patch hide | download patch | download mbox

diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
index af01d3e..a969285 100644
--- a/Documentation/howto/dpdk.rst
+++ b/Documentation/howto/dpdk.rst
@@ -119,4 +119,11 @@  After that PMD threads on cores where RX queues was pinned will become
   thread.
 
+If pmd-rxq-affinity is not set for rxqs, they will be assigned to pmds (cores)
+automatically. The processing cycles that have been required for each rxq
+will be used where known to assign rxqs with the highest consumption of
+processing cycles to different pmds.
+
+Rxq to pmds assignment takes place whenever there are configuration changes.
+
 QoS
 ---
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 25a521a..a05e586 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3295,8 +3295,29 @@  rr_numa_list_destroy(struct rr_numa_list *rr)
 }
 
+/* Sort Rx Queues by the processing cycles they are consuming. */
+static int
+rxq_cycle_sort(const void *a, const void *b)
+{
+    struct dp_netdev_rxq * qa;
+    struct dp_netdev_rxq * qb;
+
+    qa = *(struct dp_netdev_rxq **) a;
+    qb = *(struct dp_netdev_rxq **) b;
+
+    if (dp_netdev_rxq_get_cycles(qa, RXQ_CYCLES_PROC_LAST) >=
+            dp_netdev_rxq_get_cycles(qb, RXQ_CYCLES_PROC_LAST)) {
+        return -1;
+    }
+
+    return 1;
+}
+
 /* Assign pmds to queues.  If 'pinned' is true, assign pmds to pinned
  * queues and marks the pmds as isolated.  Otherwise, assign non isolated
  * pmds to unpinned queues.
  *
+ * If 'pinned' is false queues will be sorted by processing cycles they are
+ * consuming and then assigned to pmds in round robin order.
+ *
  * The function doesn't touch the pmd threads, it just stores the assignment
  * in the 'pmd' member of each rxq. */
@@ -3306,18 +3327,14 @@  rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
     struct dp_netdev_port *port;
     struct rr_numa_list rr;
-
-    rr_numa_list_populate(dp, &rr);
+    struct dp_netdev_rxq ** rxqs = NULL;
+    int i, n_rxqs = 0;
+    struct rr_numa *numa = NULL;
+    int numa_id;
 
     HMAP_FOR_EACH (port, node, &dp->ports) {
-        struct rr_numa *numa;
-        int numa_id;
-
         if (!netdev_is_pmd(port->netdev)) {
             continue;
         }
 
-        numa_id = netdev_get_numa_id(port->netdev);
-        numa = rr_numa_list_lookup(&rr, numa_id);
-
         for (int qid = 0; qid < port->n_rxq; qid++) {
             struct dp_netdev_rxq *q = &port->rxqs[qid];
@@ -3337,17 +3354,45 @@  rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
                 }
             } else if (!pinned && q->core_id == OVS_CORE_UNSPEC) {
-                if (!numa) {
-                    VLOG_WARN("There's no available (non isolated) pmd thread "
-                              "on numa node %d. Queue %d on port \'%s\' will "
-                              "not be polled.",
-                              numa_id, qid, netdev_get_name(port->netdev));
+                if (n_rxqs == 0) {
+                    rxqs = xmalloc(sizeof *rxqs);
                 } else {
-                    q->pmd = rr_numa_get_pmd(numa);
+                    rxqs = xrealloc(rxqs, sizeof *rxqs * (n_rxqs + 1));
                 }
+                /* Store the queue. */
+                rxqs[n_rxqs++] = q;
             }
         }
     }
 
+    if (n_rxqs > 1) {
+        /* Sort the queues in order of the processing cycles
+         * they consumed during their last pmd interval. */
+        qsort(rxqs, n_rxqs, sizeof *rxqs, rxq_cycle_sort);
+    }
+
+    rr_numa_list_populate(dp, &rr);
+    /* Assign the sorted queues to pmds in round robin. */
+    for (i = 0; i < n_rxqs; i++) {
+        numa_id = netdev_get_numa_id(rxqs[i]->port->netdev);
+        numa = rr_numa_list_lookup(&rr, numa_id);
+        if (!numa) {
+            VLOG_WARN("There's no available (non isolated) pmd thread "
+                      "on numa node %d. Queue %d on port \'%s\' will "
+                      "not be polled.",
+                      numa_id, netdev_rxq_get_queue_id(rxqs[i]->rx),
+                      netdev_get_name(rxqs[i]->port->netdev));
+            continue;
+        }
+        rxqs[i]->pmd = rr_numa_get_pmd(numa);
+        VLOG_INFO("Core %d on numa node %d assigned port \'%s\' "
+                  "rx queue %d (measured processing cycles %"PRIu64").",
+                  rxqs[i]->pmd->core_id, numa_id,
+                  netdev_rxq_get_name(rxqs[i]->rx),
+                  netdev_rxq_get_queue_id(rxqs[i]->rx),
+                  dp_netdev_rxq_get_cycles(rxqs[i], RXQ_CYCLES_PROC_LAST));
+    }
+
     rr_numa_list_destroy(&rr);
+    free(rxqs);
 }