Message ID | 1535705275-28626-1-git-send-email-ktraynor@redhat.com
---|---
State | Accepted
Delegated to | Ian Stokes
Series | [ovs-dev,v5] dpif-netdev: Add round-robin based rxq to pmd assignment.
On 31 Aug 2018, at 10:47, Kevin Traynor wrote:

> Prior to OVS 2.9 automatic assignment of Rxqs to PMDs
> (i.e. CPUs) was done by round-robin.
>
> That was changed in OVS 2.9 to ordering the Rxqs based on
> their measured processing cycles. This was to assign the
> busiest Rxqs to different PMDs, improving aggregate
> throughput.
>
> For the most part the new scheme should be better, but
> there could be situations where a user prefers a simple
> round-robin scheme because Rxqs from a single port are
> more likely to be spread across multiple PMDs, and/or
> traffic is very bursty/unpredictable.
>
> Add 'pmd-rxq-assign' config to allow a user to select
> round-robin based assignment.
>
> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
> Acked-by: Eelco Chaudron <echaudro@redhat.com>

You kept my ack, but anyway I looked at it again…

Acked-by: Eelco Chaudron <echaudro@redhat.com>

> ---
>
> V5:
> - Squashed NEWS into main patch (Ben)
>
> V4:
> - Modified warning log (Ilya)
>
> V3:
> - Rolled in some style and vswitch.xml changes (Ilya)
> - Set cycles mode by default on wrong config (Ilya)
>
> V2:
> - simplified nextpmd change (Eelco)
> - removed confusing doc sentence (Eelco)
> - fixed commit msg (Ilya)
> - made change in pmd-rxq-assign value also perform re-assignment (Ilya)
> - renamed to roundrobin mode (Ilya)
> - moved vswitch.xml changes to right config section (Ilya)
> - comment/log updates
> - moved NEWS update to separate patch as it's been changing on master
>
>  Documentation/topics/dpdk/pmd.rst | 33 +++++++++++++---
>  NEWS                              |  4 +-
>  lib/dpif-netdev.c                 | 83 +++++++++++++++++++++++++++++----------
>  tests/pmd.at                      | 12 +++++-
>  vswitchd/vswitch.xml              | 24 +++++++++++
>  5 files changed, 126 insertions(+), 30 deletions(-)
>
> diff --git a/Documentation/topics/dpdk/pmd.rst b/Documentation/topics/dpdk/pmd.rst
> index 5f0671e..dd9172d 100644
> --- a/Documentation/topics/dpdk/pmd.rst
> +++ b/Documentation/topics/dpdk/pmd.rst
> @@ -113,10 +113,15 @@ means that this thread will only poll the *pinned* Rx queues.
>
>  If ``pmd-rxq-affinity`` is not set for Rx queues, they will be assigned to PMDs
> -(cores) automatically. Where known, the processing cycles that have been stored
> -for each Rx queue will be used to assign Rx queue to PMDs based on a round
> -robin of the sorted Rx queues. For example, take the following example, where
> -there are five Rx queues and three cores - 3, 7, and 8 - available and the
> -measured usage of core cycles per Rx queue over the last interval is seen to
> -be:
> +(cores) automatically.
> +
> +The algorithm used to automatically assign Rxqs to PMDs can be set by::
> +
> +    $ ovs-vsctl set Open_vSwitch . other_config:pmd-rxq-assign=<assignment>
> +
> +By default, ``cycles`` assignment is used where the Rxqs will be ordered by
> +their measured processing cycles, and then be evenly assigned in descending
> +order to PMDs based on an up/down walk of the PMDs. For example, where there
> +are five Rx queues and three cores - 3, 7, and 8 - available and the measured
> +usage of core cycles per Rx queue over the last interval is seen to be:
>
>  - Queue #0: 30%
> @@ -132,4 +137,20 @@ The Rx queues will be assigned to the cores in the following order::
>      Core 8: Q3 (60%) | Q0 (30%)
>
> +Alternatively, ``roundrobin`` assignment can be used, where the Rxqs are
> +assigned to PMDs in a round-robined fashion. This algorithm was used by
> +default prior to OVS 2.9. For example, given the following ports and queues:
> +
> +- Port #0 Queue #0 (P0Q0)
> +- Port #0 Queue #1 (P0Q1)
> +- Port #1 Queue #0 (P1Q0)
> +- Port #1 Queue #1 (P1Q1)
> +- Port #1 Queue #2 (P1Q2)
> +
> +The Rx queues may be assigned to the cores in the following order::
> +
> +    Core 3: P0Q0 | P1Q1
> +    Core 7: P0Q1 | P1Q2
> +    Core 8: P1Q0 |
> +
>  To see the current measured usage history of PMD core cycles for each Rx
>  queue::
> diff --git a/NEWS b/NEWS
> index 33b4d8a..33b3638 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -9,5 +9,7 @@ Post-v2.10.0
>     - ovn:
>       * ovn-ctl: allow passing user:group ids to the OVN daemons.
> -
> +   - DPDK:
> +     * Add option for simple round-robin based Rxq to PMD assignment.
> +       It can be set with pmd-rxq-assign.
>
>  v2.10.0 - xx xxx xxxx
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 807a462..ae24e6a 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -349,4 +349,6 @@ struct dp_netdev {
>      struct id_pool *tx_qid_pool;
>      struct ovs_mutex tx_qid_pool_mutex;
> +    /* Use measured cycles for rxq to pmd assignment. */
> +    bool pmd_rxq_assign_cyc;
>
>      /* Protects the access of the 'struct dp_netdev_pmd_thread'
> @@ -1500,4 +1502,5 @@ create_dp_netdev(const char *name, const struct dpif_class *class,
>
>      cmap_init(&dp->poll_threads);
> +    dp->pmd_rxq_assign_cyc = true;
>
>      ovs_mutex_init(&dp->tx_qid_pool_mutex);
> @@ -3724,4 +3727,6 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
>      struct dp_netdev *dp = get_dp_netdev(dpif);
>      const char *cmask = smap_get(other_config, "pmd-cpu-mask");
> +    const char *pmd_rxq_assign = smap_get_def(other_config, "pmd-rxq-assign",
> +                                              "cycles");
>      unsigned long long insert_prob =
>          smap_get_ullong(other_config, "emc-insert-inv-prob",
> @@ -3786,4 +3791,18 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
>          }
>      }
> +
> +    bool pmd_rxq_assign_cyc = !strcmp(pmd_rxq_assign, "cycles");
> +    if (!pmd_rxq_assign_cyc && strcmp(pmd_rxq_assign, "roundrobin")) {
> +        VLOG_WARN("Unsupported Rxq to PMD assignment mode in pmd-rxq-assign. "
> +                      "Defaulting to 'cycles'.");
> +        pmd_rxq_assign_cyc = true;
> +        pmd_rxq_assign = "cycles";
> +    }
> +    if (dp->pmd_rxq_assign_cyc != pmd_rxq_assign_cyc) {
> +        dp->pmd_rxq_assign_cyc = pmd_rxq_assign_cyc;
> +        VLOG_INFO("Rxq to PMD assignment mode changed to: \'%s\'.",
> +                  pmd_rxq_assign);
> +        dp_netdev_request_reconfigure(dp);
> +    }
>      return 0;
>  }
> @@ -4256,8 +4275,16 @@ rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr)
>  }
>
> -/* Returns the next pmd from the numa node in
> - * incrementing or decrementing order. */
> +/*
> + * Returns the next pmd from the numa node.
> + *
> + * If 'updown' is 'true' it will alternate between selecting the next pmd in
> + * either an up or down walk, switching between up/down when the first or last
> + * core is reached. e.g. 1,2,3,3,2,1,1,2...
> + *
> + * If 'updown' is 'false' it will select the next pmd wrapping around when last
> + * core reached. e.g. 1,2,3,1,2,3,1,2...
> + */
>  static struct dp_netdev_pmd_thread *
> -rr_numa_get_pmd(struct rr_numa *numa)
> +rr_numa_get_pmd(struct rr_numa *numa, bool updown)
>  {
>      int numa_idx = numa->cur_index;
> @@ -4267,5 +4294,9 @@ rr_numa_get_pmd(struct rr_numa *numa)
>          if (numa->cur_index == numa->n_pmds-1) {
>              /* Reached the last pmd. */
> -            numa->idx_inc = false;
> +            if (updown) {
> +                numa->idx_inc = false;
> +            } else {
> +                numa->cur_index = 0;
> +            }
>          } else {
>              numa->cur_index++;
> @@ -4330,7 +4361,4 @@ compare_rxq_cycles(const void *a, const void *b)
>   * pmds to unpinned queues.
>   *
> - * If 'pinned' is false queues will be sorted by processing cycles they are
> - * consuming and then assigned to pmds in round robin order.
> - *
>   * The function doesn't touch the pmd threads, it just stores the assignment
>   * in the 'pmd' member of each rxq. */
> @@ -4345,4 +4373,5 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
>      struct rr_numa *numa = NULL;
>      int numa_id;
> +    bool assign_cyc = dp->pmd_rxq_assign_cyc;
>
>      HMAP_FOR_EACH (port, node, &dp->ports) {
> @@ -4375,10 +4404,13 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
>                      rxqs = xrealloc(rxqs, sizeof *rxqs * (n_rxqs + 1));
>                  }
> -                /* Sum the queue intervals and store the cycle history. */
> -                for (unsigned i = 0; i < PMD_RXQ_INTERVAL_MAX; i++) {
> -                    cycle_hist += dp_netdev_rxq_get_intrvl_cycles(q, i);
> -                }
> -                dp_netdev_rxq_set_cycles(q, RXQ_CYCLES_PROC_HIST, cycle_hist);
>
> +                if (assign_cyc) {
> +                    /* Sum the queue intervals and store the cycle history. */
> +                    for (unsigned i = 0; i < PMD_RXQ_INTERVAL_MAX; i++) {
> +                        cycle_hist += dp_netdev_rxq_get_intrvl_cycles(q, i);
> +                    }
> +                    dp_netdev_rxq_set_cycles(q, RXQ_CYCLES_PROC_HIST,
> +                                             cycle_hist);
> +                }
>                  /* Store the queue. */
>                  rxqs[n_rxqs++] = q;
> @@ -4387,5 +4419,5 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
>      }
>
> -    if (n_rxqs > 1) {
> +    if (n_rxqs > 1 && assign_cyc) {
>          /* Sort the queues in order of the processing cycles
>           * they consumed during their last pmd interval. */
> @@ -4411,5 +4443,5 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
>                  continue;
>              }
> -            rxqs[i]->pmd = rr_numa_get_pmd(non_local_numa);
> +            rxqs[i]->pmd = rr_numa_get_pmd(non_local_numa, assign_cyc);
>              VLOG_WARN("There's no available (non-isolated) pmd thread "
>                        "on numa node %d. Queue %d on port \'%s\' will "
> @@ -4420,11 +4452,20 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
>                        rxqs[i]->pmd->core_id, rxqs[i]->pmd->numa_id);
>          } else {
> -            rxqs[i]->pmd = rr_numa_get_pmd(numa);
> -            VLOG_INFO("Core %d on numa node %d assigned port \'%s\' "
> -                      "rx queue %d (measured processing cycles %"PRIu64").",
> -                      rxqs[i]->pmd->core_id, numa_id,
> -                      netdev_rxq_get_name(rxqs[i]->rx),
> -                      netdev_rxq_get_queue_id(rxqs[i]->rx),
> -                      dp_netdev_rxq_get_cycles(rxqs[i], RXQ_CYCLES_PROC_HIST));
> +            rxqs[i]->pmd = rr_numa_get_pmd(numa, assign_cyc);
> +            if (assign_cyc) {
> +                VLOG_INFO("Core %d on numa node %d assigned port \'%s\' "
> +                          "rx queue %d "
> +                          "(measured processing cycles %"PRIu64").",
> +                          rxqs[i]->pmd->core_id, numa_id,
> +                          netdev_rxq_get_name(rxqs[i]->rx),
> +                          netdev_rxq_get_queue_id(rxqs[i]->rx),
> +                          dp_netdev_rxq_get_cycles(rxqs[i],
> +                                                   RXQ_CYCLES_PROC_HIST));
> +            } else {
> +                VLOG_INFO("Core %d on numa node %d assigned port \'%s\' "
> +                          "rx queue %d.", rxqs[i]->pmd->core_id, numa_id,
> +                          netdev_rxq_get_name(rxqs[i]->rx),
> +                          netdev_rxq_get_queue_id(rxqs[i]->rx));
> +            }
>          }
>      }
> diff --git a/tests/pmd.at b/tests/pmd.at
> index 4cae6c8..1f952f3 100644
> --- a/tests/pmd.at
> +++ b/tests/pmd.at
> @@ -62,5 +62,6 @@ m4_define([CHECK_PMD_THREADS_CREATED], [
>
>  m4_define([SED_NUMA_CORE_PATTERN], ["s/\(numa_id \)[[0-9]]*\( core_id \)[[0-9]]*:/\1<cleared>\2<cleared>:/"])
> -m4_define([SED_NUMA_CORE_QUEUE_PATTERN], ["s/1 2 5 6/<group>/;s/0 3 4 7/<group>/"])
> +m4_define([SED_NUMA_CORE_QUEUE_CYC_PATTERN], ["s/1 2 5 6/<group>/;s/0 3 4 7/<group>/"])
> +m4_define([SED_NUMA_CORE_QUEUE_PQ_PATTERN], ["s/1 3 5 7/<group>/;s/0 2 4 6/<group>/"])
>  m4_define([DUMMY_NUMA], [--dummy-numa="0,0,0,0"])
>
> @@ -146,9 +147,16 @@ pmd thread numa_id <cleared> core_id <cleared>:
>  ])
>
> +AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:pmd-rxq-assign=cycles])
>  TMP=$(cat ovs-vswitchd.log | wc -l | tr -d [[:blank:]])
>  AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x3])
>  CHECK_PMD_THREADS_CREATED([2], [], [+$TMP])
>
> -AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | sed ':a;/AVAIL$/{N;s/\n//;ba;}' | parse_pmd_rxq_show_group | sed SED_NUMA_CORE_QUEUE_PATTERN], [0], [dnl
> +AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | sed ':a;/AVAIL$/{N;s/\n//;ba;}' | parse_pmd_rxq_show_group | sed SED_NUMA_CORE_QUEUE_CYC_PATTERN], [0], [dnl
> +port: p0 queue-id: <group>
> +port: p0 queue-id: <group>
> +])
> +
> +AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:pmd-rxq-assign=roundrobin])
> +AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | sed ':a;/AVAIL$/{N;s/\n//;ba;}' | parse_pmd_rxq_show_group | sed SED_NUMA_CORE_QUEUE_PQ_PATTERN], [0], [dnl
>  port: p0 queue-id: <group>
>  port: p0 queue-id: <group>
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index e318151..91d132d 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -433,4 +433,28 @@
>        </column>
>
> +      <column name="other_config" key="pmd-rxq-assign"
> +              type='{"type": "string",
> +                     "enum": ["set", ["cycles", "roundrobin"]]}'>
> +        <p>
> +          Specifies how RX queues will be automatically assigned to CPU cores.
> +          Options:
> +          <dl>
> +            <dt><code>cycles</code></dt>
> +            <dd>Rxqs will be sorted by order of measured processing cycles
> +            before being assigned to CPU cores.</dd>
> +            <dt><code>roundrobin</code></dt>
> +            <dd>Rxqs will be round-robined across CPU cores.</dd>
> +          </dl>
> +        </p>
> +        <p>
> +          The default value is <code>cycles</code>.
> +        </p>
> +        <p>
> +          Changing this value will affect an automatic re-assignment of Rxqs to
> +          CPUs. Note: Rxqs mapped to CPU cores with
> +          <code>pmd-rxq-affinity</code> are unaffected.
> +        </p>
> +      </column>
> +
>        <column name="other_config" key="n-handler-threads"
>                type='{"type": "integer", "minInteger": 1}'>
> --
> 1.8.3.1
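[Editor's illustration] The two walk orders that the patched rr_numa_get_pmd() comment describes (up/down for 'cycles', wrap-around for 'roundrobin') can be sketched as a standalone model. This is not OVS code; the function and variable names below are illustrative only:

```python
def pmd_walk(cores, n, updown):
    """Yield n core picks, modeling the two modes of rr_numa_get_pmd().

    updown=True : up/down walk used by 'cycles' assignment,
                  e.g. 1,2,3,3,2,1,1,2...
    updown=False: plain wrap-around used by 'roundrobin',
                  e.g. 1,2,3,1,2,3,1,2...
    """
    idx, inc = 0, True
    picks = []
    for _ in range(n):
        picks.append(cores[idx])       # return pmd at current index
        if inc:
            if idx == len(cores) - 1:  # reached the last pmd
                if updown:
                    inc = False        # turn around and walk down
                else:
                    idx = 0            # wrap back to the first pmd
            else:
                idx += 1
        else:
            if idx == 0:               # reached the first pmd
                inc = True             # turn around and walk up
            else:
                idx -= 1
    return picks

print(pmd_walk([1, 2, 3], 8, updown=True))   # [1, 2, 3, 3, 2, 1, 1, 2]
print(pmd_walk([1, 2, 3], 8, updown=False))  # [1, 2, 3, 1, 2, 3, 1, 2]
```

The up/down turn-around is what lets 'cycles' mode pair the heaviest remaining queue with the lightest-loaded core on each pass.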
On 08/31/2018 09:56 AM, Eelco Chaudron wrote:
> On 31 Aug 2018, at 10:47, Kevin Traynor wrote:
>> [...]
>> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
>> Acked-by: Eelco Chaudron <echaudro@redhat.com>
>
> You kept my ack, but anyway I looked at it again…
>
> Acked-by: Eelco Chaudron <echaudro@redhat.com>

thanks!
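[Editor's illustration] The roundrobin example from the pmd.rst hunk above (P0Q0..P1Q2 onto cores 3, 7, 8) can be reproduced with a small standalone sketch; the helper name is hypothetical, not part of OVS:

```python
def roundrobin_assign(rxqs, cores):
    """Assign rx queues to cores in simple round-robin order, as
    'pmd-rxq-assign=roundrobin' does (ignoring NUMA placement)."""
    assignment = {core: [] for core in cores}
    for i, q in enumerate(rxqs):
        # i-th queue goes to the i-th core, wrapping around.
        assignment[cores[i % len(cores)]].append(q)
    return assignment

rxqs = ["P0Q0", "P0Q1", "P1Q0", "P1Q1", "P1Q2"]
for core, queues in roundrobin_assign(rxqs, [3, 7, 8]).items():
    print(f"Core {core}: {' | '.join(queues)}")
```

This prints the same distribution as the documentation example: Core 3 gets P0Q0 and P1Q1, Core 7 gets P0Q1 and P1Q2, Core 8 gets P1Q0.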
Looks good to me.

Acked-by: Ilya Maximets <i.maximets@samsung.com>

On 31.08.2018 11:47, Kevin Traynor wrote:
> Prior to OVS 2.9 automatic assignment of Rxqs to PMDs
> (i.e. CPUs) was done by round-robin.
>
> That was changed in OVS 2.9 to ordering the Rxqs based on
> their measured processing cycles. This was to assign the
> busiest Rxqs to different PMDs, improving aggregate
> throughput.
>
> For the most part the new scheme should be better, but
> there could be situations where a user prefers a simple
> round-robin scheme because Rxqs from a single port are
> more likely to be spread across multiple PMDs, and/or
> traffic is very bursty/unpredictable.
>
> Add 'pmd-rxq-assign' config to allow a user to select
> round-robin based assignment.
>
> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
> Acked-by: Eelco Chaudron <echaudro@redhat.com>
> [...]
\)[[0-9]]*\( core_id \)[[0-9]]*:/\1<cleared>\2<cleared>:/"]) > -m4_define([SED_NUMA_CORE_QUEUE_PATTERN], ["s/1 2 5 6/<group>/;s/0 3 4 7/<group>/"]) > +m4_define([SED_NUMA_CORE_QUEUE_CYC_PATTERN], ["s/1 2 5 6/<group>/;s/0 3 4 7/<group>/"]) > +m4_define([SED_NUMA_CORE_QUEUE_PQ_PATTERN], ["s/1 3 5 7/<group>/;s/0 2 4 6/<group>/"]) > m4_define([DUMMY_NUMA], [--dummy-numa="0,0,0,0"]) > > @@ -146,9 +147,16 @@ pmd thread numa_id <cleared> core_id <cleared>: > ]) > > +AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:pmd-rxq-assign=cycles]) > TMP=$(cat ovs-vswitchd.log | wc -l | tr -d [[:blank:]]) > AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x3]) > CHECK_PMD_THREADS_CREATED([2], [], [+$TMP]) > > -AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | sed ':a;/AVAIL$/{N;s/\n//;ba;}' | parse_pmd_rxq_show_group | sed SED_NUMA_CORE_QUEUE_PATTERN], [0], [dnl > +AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | sed ':a;/AVAIL$/{N;s/\n//;ba;}' | parse_pmd_rxq_show_group | sed SED_NUMA_CORE_QUEUE_CYC_PATTERN], [0], [dnl > +port: p0 queue-id: <group> > +port: p0 queue-id: <group> > +]) > + > +AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:pmd-rxq-assign=roundrobin]) > +AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | sed ':a;/AVAIL$/{N;s/\n//;ba;}' | parse_pmd_rxq_show_group | sed SED_NUMA_CORE_QUEUE_PQ_PATTERN], [0], [dnl > port: p0 queue-id: <group> > port: p0 queue-id: <group> > diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml > index e318151..91d132d 100644 > --- a/vswitchd/vswitch.xml > +++ b/vswitchd/vswitch.xml > @@ -433,4 +433,28 @@ > </column> > > + <column name="other_config" key="pmd-rxq-assign" > + type='{"type": "string", > + "enum": ["set", ["cycles", "roundrobin"]]}'> > + <p> > + Specifies how RX queues will be automatically assigned to CPU cores. 
> + Options: > + <dl> > + <dt><code>cycles</code></dt> > + <dd>Rxqs will be sorted by order of measured processing cycles > + before being assigned to CPU cores.</dd> > + <dt><code>roundrobin</code></dt> > + <dd>Rxqs will be round-robined across CPU cores.</dd> > + </dl> > + </p> > + <p> > + The default value is <code>cycles</code>. > + </p> > + <p> > + Changing this value will affect an automatic re-assignment of Rxqs to > + CPUs. Note: Rxqs mapped to CPU cores with > + <code>pmd-rxq-affinity</code> are unaffected. > + </p> > + </column> > + > <column name="other_config" key="n-handler-threads" > type='{"type": "integer", "minInteger": 1}'> >
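The up/down walk described in the `rr_numa_get_pmd()` comment above can be checked with a small simulation. This is an illustrative sketch only, not OVS code: the `RRNuma` class and `next_pmd()` name are invented here, mirroring the patch's `cur_index`/`idx_inc` state.

```python
class RRNuma:
    """Toy model of the patch's per-NUMA round-robin state."""

    def __init__(self, pmds):
        self.pmds = pmds        # e.g. core ids on one NUMA node
        self.cur_index = 0
        self.idx_inc = True     # walking up (True) or down (False)

    def next_pmd(self, updown):
        """Return the next pmd, as in rr_numa_get_pmd(numa, updown)."""
        pmd = self.pmds[self.cur_index]
        if self.idx_inc:
            if self.cur_index == len(self.pmds) - 1:
                if updown:
                    self.idx_inc = False    # bounce: repeat the last pmd next
                else:
                    self.cur_index = 0      # plain wraparound
            else:
                self.cur_index += 1
        else:
            if self.cur_index == 0:
                self.idx_inc = True         # bounce off the first pmd
            else:
                self.cur_index -= 1
        return pmd
```

With pmds `[1, 2, 3]`, `updown=True` yields `1, 2, 3, 3, 2, 1, 1, 2, ...` and `updown=False` yields `1, 2, 3, 1, 2, 3, ...`, matching the examples in the function comment.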
> Looks good to me.
>
> Acked-by: Ilya Maximets <i.maximets@samsung.com>

I'll pull this in with this week's pull request, thanks all for the effort.

Thanks
Ian
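The end-to-end effect of the two `pmd-rxq-assign` modes on `rxq_scheduling()` can be sketched as follows. This is a toy model, not the OVS implementation: the `schedule()` helper and the sample cycle counts are invented for illustration. It sorts by measured cycles and uses the up/down walk in `cycles` mode, and a plain wraparound in `roundrobin` mode.

```python
def schedule(rxqs, cores, assign_cyc):
    """Assign (name, cycles) rxqs to cores, loosely mimicking rxq_scheduling().

    assign_cyc=True  -> sort busiest first, up/down walk of cores.
    assign_cyc=False -> keep port/queue order, simple round-robin of cores.
    """
    if assign_cyc:
        # 'cycles' mode: busiest queues are handed out first.
        rxqs = sorted(rxqs, key=lambda q: q[1], reverse=True)
    assignment = {c: [] for c in cores}
    idx, step = 0, 1
    for name, _cycles in rxqs:
        assignment[cores[idx]].append(name)
        if assign_cyc:
            # Up/down walk: bounce at either end, repeating the end core once.
            if (step == 1 and idx == len(cores) - 1) or (step == -1 and idx == 0):
                step = -step
            else:
                idx += step
        else:
            # 'roundrobin' mode: simple wraparound.
            idx = (idx + 1) % len(cores)
    return assignment
```

Running the roundrobin case with five queues and cores 3, 7, 8 reproduces the ordering from the documentation hunk (Core 3: P0Q0, P1Q1; Core 7: P0Q1, P1Q2; Core 8: P1Q0), while the cycles case pairs the busiest queues with the quietest cores.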