From patchwork Thu Jan 4 12:07:44 2018
X-Patchwork-Submitter: Jan Scheurich
X-Patchwork-Id: 855576
X-Patchwork-Delegate: ian.stokes@intel.com
From: Jan Scheurich
To: dev@openvswitch.org
Date: Thu, 4 Jan 2018 13:07:44 +0100
Message-Id: <1515067666-12398-2-git-send-email-jan.scheurich@ericsson.com>
In-Reply-To: <1515067666-12398-1-git-send-email-jan.scheurich@ericsson.com>
References: <1515067666-12398-1-git-send-email-jan.scheurich@ericsson.com>
Subject: [ovs-dev] [PATCH v5 1/3] dpif-netdev: Refactor PMD performance into dpif-netdev-perf

Add module dpif-netdev-perf to host all PMD performance-related data structures and functions in dpif-netdev.
Refactor the PMD stats handling in dpif-netdev and delegate as much as possible to the new module, using clean interfaces to shield dpif-netdev from the implementation details. Accordingly, all PMD statistics members are moved from the main struct dp_netdev_pmd_thread into a dedicated member of type struct pmd_perf_stats.

Include Darrell's prior refactoring of PMD stats contained in [PATCH v5 2/3] "dpif-netdev: Refactor some pmd stats":

1. The cycles per packet counts are now based on packets received rather than packet passes through the datapath.

2. Packet counters are now kept for packets received and packets recirculated. These are kept as separate counters for maintainability reasons. The cost of incrementing these counters is negligible. These new counters are also displayed to the user.

3. A display statistic is added for the average number of datapath passes per packet. This should be useful for user debugging and understanding of packet processing.

4. The user-visible 'miss' counter now counts only successful upcalls, rather than the sum of successful and unsuccessful upcalls. Hence, this becomes what the user historically understands by an OVS 'miss upcall'. The user display is annotated to make this clear as well.

5. The user-visible 'lost' counter remains as failed upcalls, but is annotated to make its meaning clear.

6. The enum pmd_stat_type is annotated to make the usage of the stats counters clear.

7. The subtable lookup stat is renamed to make it clear that it relates to masked lookups.

8. The PMD stats test is updated to handle the new user stats of packets received, packets recirculated and average number of datapath passes per packet.

On top of that, introduce a "-pmd <core>" option to the PMD info commands to filter the output for a single PMD.
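The clear/read scheme used by pmd_perf_stats_clear() and pmd_perf_read_counters() below can be sketched in miniature. This is a simplified, single-threaded sketch with plain uint64_t instead of OVS's relaxed atomics; the names (N_STATS, perf_clear, perf_read) are illustrative, not the patch's API:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N_STATS 4   /* stands in for PMD_N_STATS */

/* Live counters plus a baseline snapshot, mirroring struct pmd_counters. */
struct perf_counters {
    uint64_t n[N_STATS];    /* written only by the PMD thread itself */
    uint64_t zero[N_STATS]; /* baseline taken when the user clears stats */
};

/* "Clearing" never zeroes the live counters (they also feed datapath-wide
 * stats); it just records the current values as the new baseline. */
static void
perf_clear(struct perf_counters *c)
{
    memcpy(c->zero, c->n, sizeof c->zero);
}

/* Reading reports the delta since the last clear, clamped at zero in case
 * a baseline happens to be newer than the counter value just read. */
static void
perf_read(const struct perf_counters *c, uint64_t out[N_STATS])
{
    for (int i = 0; i < N_STATS; i++) {
        out[i] = c->n[i] > c->zero[i] ? c->n[i] - c->zero[i] : 0;
    }
}
```

In the patch the loads and stores go through atomic_read_relaxed()/atomic_store_relaxed() because the main thread reads while the PMD thread writes; the clamping in perf_read() corresponds to the comment in pmd_perf_read_counters().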
Signed-off-by: Jan Scheurich Co-authored-by: Darrell Ball Signed-off-by: Darrell Ball --- lib/automake.mk | 2 + lib/dpif-netdev-perf.c | 66 +++++++++ lib/dpif-netdev-perf.h | 129 ++++++++++++++++++ lib/dpif-netdev.c | 353 ++++++++++++++++++++----------------------------- tests/pmd.at | 22 +-- 5 files changed, 357 insertions(+), 215 deletions(-) create mode 100644 lib/dpif-netdev-perf.c create mode 100644 lib/dpif-netdev-perf.h diff --git a/lib/automake.mk b/lib/automake.mk index 4b38a11..159319f 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -80,6 +80,8 @@ lib_libopenvswitch_la_SOURCES = \ lib/dpdk.h \ lib/dpif-netdev.c \ lib/dpif-netdev.h \ + lib/dpif-netdev-perf.c \ + lib/dpif-netdev-perf.h \ lib/dpif-provider.h \ lib/dpif.c \ lib/dpif.h \ diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c new file mode 100644 index 0000000..7d8b7b2 --- /dev/null +++ b/lib/dpif-netdev-perf.c @@ -0,0 +1,66 @@ +/* + * Copyright (c) 2017 Ericsson AB. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +#include <config.h> + +#ifdef DPDK_NETDEV +#include <rte_cycles.h> +#endif + +#include "openvswitch/dynamic-string.h" +#include "openvswitch/vlog.h" +#include "dpif-netdev-perf.h" +#include "timeval.h" + +VLOG_DEFINE_THIS_MODULE(pmd_perf); + +void +pmd_perf_stats_init(struct pmd_perf_stats *s) { + memset(s, 0, sizeof(*s)); + s->start_ms = time_msec(); +} + +void +pmd_perf_read_counters(struct pmd_perf_stats *s, + uint64_t stats[PMD_N_STATS]) +{ + uint64_t val; + + /* This loop subtracts reference values ('.zero[*]') from the counters. + * Since loads and stores are relaxed, it might be possible for a '.zero[*]' + * value to be more recent than the current value we're reading from the + * counter. This is not a big problem, since these numbers are not + * supposed to be too accurate, but we should at least make sure that + * the result is not negative. */ + for (int i = 0; i < PMD_N_STATS; i++) { + atomic_read_relaxed(&s->counters.n[i], &val); + if (val > s->counters.zero[i]) { + stats[i] = val - s->counters.zero[i]; + } else { + stats[i] = 0; + } + } +} + +void +pmd_perf_stats_clear(struct pmd_perf_stats *s) +{ + s->start_ms = 0; /* Marks start of clearing. */ + for (int i = 0; i < PMD_N_STATS; i++) { + atomic_read_relaxed(&s->counters.n[i], &s->counters.zero[i]); + } + s->start_ms = time_msec(); /* Clearing finished. */ +} diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h new file mode 100644 index 0000000..fa5cf37 --- /dev/null +++ b/lib/dpif-netdev-perf.h @@ -0,0 +1,129 @@ +/* + * Copyright (c) 2017 Ericsson AB. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifndef DPIF_NETDEV_PERF_H +#define DPIF_NETDEV_PERF_H 1 + +#include +#include +#include +#include +#include + +#include "openvswitch/vlog.h" +#include "ovs-atomic.h" +#include "timeval.h" +#include "unixctl.h" +#include "util.h" + +#ifdef __cplusplus
extern "C" { +#endif + +enum pmd_stat_type { + PMD_STAT_EXACT_HIT, /* Packets that had an exact match (emc). */ + PMD_STAT_MASKED_HIT, /* Packets that matched in the flow table. */ + PMD_STAT_MISS, /* Packets that did not match and upcall was ok. */ + PMD_STAT_LOST, /* Packets that did not match and upcall failed. */ + /* The above statistics account for the total + * number of packet passes through the datapath + * pipeline and should not be overlapping with each + * other. */ + PMD_STAT_MASKED_LOOKUP, /* Number of subtable lookups for flow table + hits. Each MASKED_HIT will have >= 1 + MASKED_LOOKUP(s). */ + PMD_STAT_RECV, /* Packets entering the datapath pipeline from an + * interface. */ + PMD_STAT_RECIRC, /* Packets reentering the datapath pipeline due to + * recirculation. */ + PMD_STAT_SENT_PKTS, /* Packets that have been sent. */ + PMD_STAT_SENT_BATCHES, /* Number of batches sent. */ + PMD_CYCLES_POLL_IDLE, /* Cycles spent in unsuccessful polling. */ + PMD_CYCLES_POLL_BUSY, /* Cycles spent successfully polling and + * processing polled packets. */ + PMD_CYCLES_OVERHEAD, /* Cycles spent for other tasks. */ + PMD_CYCLES_UPCALL, /* Cycles spent in upcalls. Included in + PMD_CYCLES_POLL_BUSY. */ + PMD_CYCLES_ITER_IDLE, /* Cycles spent in idle iterations. */ + PMD_CYCLES_ITER_BUSY, /* Cycles spent in busy iterations. */ + PMD_N_STATS +}; + +struct pmd_counters { + atomic_uint64_t n[PMD_N_STATS]; /* Indexed by PMD_STAT_*. 
*/ + uint64_t zero[PMD_N_STATS]; +}; + +struct pmd_perf_stats { + uint64_t start_ms; + uint64_t last_tsc; + struct pmd_counters counters; +}; + +enum pmd_info_type; +struct dp_netdev_pmd_thread; + +struct pmd_perf_params { + int command_type; + bool histograms; + size_t iter_hist_len; + size_t ms_hist_len; +}; + +void pmd_perf_stats_init(struct pmd_perf_stats *s); +void pmd_perf_stats_clear(struct pmd_perf_stats *s); +void pmd_perf_read_counters(struct pmd_perf_stats *s, + uint64_t stats[PMD_N_STATS]); + +static inline void +pmd_perf_update_counter(struct pmd_perf_stats *s, + enum pmd_stat_type counter, int delta) +{ + uint64_t tmp; + atomic_read_relaxed(&s->counters.n[counter], &tmp); + tmp += delta; + atomic_store_relaxed(&s->counters.n[counter], tmp); +} + +static inline void +pmd_perf_start_iteration(struct pmd_perf_stats *s, uint64_t now_tsc) +{ + if (OVS_UNLIKELY(s->start_ms == 0)) { + /* Stats not yet fully initialised. */ + return; + } + s->last_tsc = now_tsc; +} + +static inline void +pmd_perf_end_iteration(struct pmd_perf_stats *s, uint64_t now_tsc, + int packets) +{ + uint64_t cycles = now_tsc - s->last_tsc; + + /* No need for atomic updates as only invoked within PMD thread. */ + if (packets > 0) { + s->counters.n[PMD_CYCLES_ITER_BUSY] += cycles; + } else { + s->counters.n[PMD_CYCLES_ITER_IDLE] += cycles; + } +} + +#ifdef __cplusplus +} +#endif + +#endif /* DPIF_NETDEV_PERF_H */ diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 349e98a..8b9698a 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -44,6 +44,7 @@ #include "csum.h" #include "dp-packet.h" #include "dpif.h" +#include "dpif-netdev-perf.h" #include "dpif-provider.h" #include "dummy.h" #include "fat-rwlock.h" @@ -331,25 +332,6 @@ static struct dp_netdev_port *dp_netdev_lookup_port(const struct dp_netdev *dp, odp_port_t) OVS_REQUIRES(dp->port_mutex); -enum dp_stat_type { - DP_STAT_EXACT_HIT, /* Packets that had an exact match (emc). 
*/ - DP_STAT_MASKED_HIT, /* Packets that matched in the flow table. */ - DP_STAT_MISS, /* Packets that did not match. */ - DP_STAT_LOST, /* Packets not passed up to the client. */ - DP_STAT_LOOKUP_HIT, /* Number of subtable lookups for flow table - hits */ - DP_STAT_SENT_PKTS, /* Packets that has been sent. */ - DP_STAT_SENT_BATCHES, /* Number of batches sent. */ - DP_N_STATS -}; - -enum pmd_cycles_counter_type { - PMD_CYCLES_IDLE, /* Cycles spent idle or unsuccessful polling */ - PMD_CYCLES_PROCESSING, /* Cycles spent successfully polling and - * processing polled packets */ - PMD_N_CYCLES -}; - enum rxq_cycles_counter_type { RXQ_CYCLES_PROC_CURR, /* Cycles spent successfully polling and processing packets during the current @@ -499,21 +481,6 @@ struct dp_netdev_actions *dp_netdev_flow_get_actions( const struct dp_netdev_flow *); static void dp_netdev_actions_free(struct dp_netdev_actions *); -/* Contained by struct dp_netdev_pmd_thread's 'stats' member. */ -struct dp_netdev_pmd_stats { - /* Indexed by DP_STAT_*. */ - atomic_ullong n[DP_N_STATS]; -}; - -/* Contained by struct dp_netdev_pmd_thread's 'cycle' member. */ -struct dp_netdev_pmd_cycles { - /* Indexed by PMD_CYCLES_*. */ - atomic_ullong n[PMD_N_CYCLES]; -}; - -static void dp_netdev_count_packet(struct dp_netdev_pmd_thread *, - enum dp_stat_type type, int cnt); - struct polled_queue { struct dp_netdev_rxq *rxq; odp_port_t port_no; @@ -595,12 +562,6 @@ struct dp_netdev_pmd_thread { are stored for each polled rxq. */ long long int rxq_next_cycle_store; - /* Statistics. */ - struct dp_netdev_pmd_stats stats; - - /* Cycles counters */ - struct dp_netdev_pmd_cycles cycles; - /* Current context of the PMD thread. */ struct dp_netdev_pmd_thread_ctx ctx; @@ -638,12 +599,8 @@ struct dp_netdev_pmd_thread { struct hmap tnl_port_cache; struct hmap send_port_cache; - /* Only a pmd thread can write on its own 'cycles' and 'stats'. 
- * The main thread keeps 'stats_zero' and 'cycles_zero' as base - * values and subtracts them from 'stats' and 'cycles' before - * reporting to the user */ - unsigned long long stats_zero[DP_N_STATS]; - uint64_t cycles_zero[PMD_N_CYCLES]; + /* Keep track of detailed PMD performance statistics. */ + struct pmd_perf_stats perf_stats; /* Set to true if the pmd thread needs to be reloaded. */ bool need_reload; @@ -833,47 +790,10 @@ enum pmd_info_type { }; static void -pmd_info_show_stats(struct ds *reply, - struct dp_netdev_pmd_thread *pmd, - unsigned long long stats[DP_N_STATS], - uint64_t cycles[PMD_N_CYCLES]) +format_pmd_thread(struct ds *reply, struct dp_netdev_pmd_thread *pmd) { - unsigned long long total_packets; - uint64_t total_cycles = 0; - double lookups_per_hit = 0, packets_per_batch = 0; - int i; - - /* These loops subtracts reference values ('*_zero') from the counters. - * Since loads and stores are relaxed, it might be possible for a '*_zero' - * value to be more recent than the current value we're reading from the - * counter. This is not a big problem, since these numbers are not - * supposed to be too accurate, but we should at least make sure that - * the result is not negative. */ - for (i = 0; i < DP_N_STATS; i++) { - if (stats[i] > pmd->stats_zero[i]) { - stats[i] -= pmd->stats_zero[i]; - } else { - stats[i] = 0; - } - } - - /* Sum of all the matched and not matched packets gives the total. */ - total_packets = stats[DP_STAT_EXACT_HIT] + stats[DP_STAT_MASKED_HIT] - + stats[DP_STAT_MISS]; - - for (i = 0; i < PMD_N_CYCLES; i++) { - if (cycles[i] > pmd->cycles_zero[i]) { - cycles[i] -= pmd->cycles_zero[i]; - } else { - cycles[i] = 0; - } - - total_cycles += cycles[i]; - } - ds_put_cstr(reply, (pmd->core_id == NON_PMD_CORE_ID) ? 
"main thread" : "pmd thread"); - if (pmd->numa_id != OVS_NUMA_UNSPEC) { ds_put_format(reply, " numa_id %d", pmd->numa_id); } @@ -881,23 +801,51 @@ pmd_info_show_stats(struct ds *reply, ds_put_format(reply, " core_id %u", pmd->core_id); } ds_put_cstr(reply, ":\n"); +} + +static void +pmd_info_show_stats(struct ds *reply, + struct dp_netdev_pmd_thread *pmd) +{ + uint64_t stats[PMD_N_STATS]; + uint64_t total_cycles = 0; + double passes_per_pkt = 0; + double lookups_per_hit = 0; + double packets_per_batch = 0; + + pmd_perf_read_counters(&pmd->perf_stats, stats); + total_cycles = stats[PMD_CYCLES_ITER_IDLE] + + stats[PMD_CYCLES_ITER_BUSY]; - if (stats[DP_STAT_MASKED_HIT] > 0) { - lookups_per_hit = stats[DP_STAT_LOOKUP_HIT] - / (double) stats[DP_STAT_MASKED_HIT]; + format_pmd_thread(reply, pmd); + + if (stats[PMD_STAT_RECV] > 0) { + passes_per_pkt = (stats[PMD_STAT_RECV] + stats[PMD_STAT_RECIRC]) + / (double) stats[PMD_STAT_RECV]; + } + if (stats[PMD_STAT_MASKED_HIT] > 0) { + lookups_per_hit = stats[PMD_STAT_MASKED_LOOKUP] + / (double) stats[PMD_STAT_MASKED_HIT]; } - if (stats[DP_STAT_SENT_BATCHES] > 0) { - packets_per_batch = stats[DP_STAT_SENT_PKTS] - / (double) stats[DP_STAT_SENT_BATCHES]; + if (stats[PMD_STAT_SENT_BATCHES] > 0) { + packets_per_batch = stats[PMD_STAT_SENT_PKTS] + / (double) stats[PMD_STAT_SENT_BATCHES]; } ds_put_format(reply, - "\temc hits:%llu\n\tmegaflow hits:%llu\n" - "\tavg. subtable lookups per hit:%.2f\n" - "\tmiss:%llu\n\tlost:%llu\n" + "\tpackets received:%"PRIu64"\n" + "\tpacket recirculations:%"PRIu64"\n" + "\tavg. datapath passes per packet:%.2f\n" + "\temc hits:%"PRIu64"\n" + "\tmegaflow hits:%"PRIu64"\n" + "\tavg. subtable lookups per megaflow hit:%.2f\n" + "\tmiss(i.e. lookup miss with success upcall):%"PRIu64"\n" + "\tlost(i.e. lookup miss with failed upcall):%"PRIu64"\n" "\tavg. 
packets per output batch: %.2f\n", - stats[DP_STAT_EXACT_HIT], stats[DP_STAT_MASKED_HIT], - lookups_per_hit, stats[DP_STAT_MISS], stats[DP_STAT_LOST], + stats[PMD_STAT_RECV], stats[PMD_STAT_RECIRC], + passes_per_pkt, stats[PMD_STAT_EXACT_HIT], + stats[PMD_STAT_MASKED_HIT], lookups_per_hit, + stats[PMD_STAT_MISS], stats[PMD_STAT_LOST], packets_per_batch); if (total_cycles == 0) { @@ -907,46 +855,26 @@ pmd_info_show_stats(struct ds *reply, ds_put_format(reply, "\tidle cycles:%"PRIu64" (%.02f%%)\n" "\tprocessing cycles:%"PRIu64" (%.02f%%)\n", - cycles[PMD_CYCLES_IDLE], - cycles[PMD_CYCLES_IDLE] / (double)total_cycles * 100, - cycles[PMD_CYCLES_PROCESSING], - cycles[PMD_CYCLES_PROCESSING] / (double)total_cycles * 100); + stats[PMD_CYCLES_ITER_IDLE], + stats[PMD_CYCLES_ITER_IDLE] / (double)total_cycles * 100, + stats[PMD_CYCLES_ITER_BUSY], + stats[PMD_CYCLES_ITER_BUSY] / (double)total_cycles * 100); + uint64_t total_packets = stats[PMD_STAT_RECV]; if (total_packets == 0) { return; } ds_put_format(reply, - "\tavg cycles per packet: %.02f (%"PRIu64"/%llu)\n", + "\tavg cycles per packet: %.02f (%"PRIu64"/%"PRIu64")\n", total_cycles / (double)total_packets, total_cycles, total_packets); ds_put_format(reply, "\tavg processing cycles per packet: " - "%.02f (%"PRIu64"/%llu)\n", - cycles[PMD_CYCLES_PROCESSING] / (double)total_packets, - cycles[PMD_CYCLES_PROCESSING], total_packets); -} - -static void -pmd_info_clear_stats(struct ds *reply OVS_UNUSED, - struct dp_netdev_pmd_thread *pmd, - unsigned long long stats[DP_N_STATS], - uint64_t cycles[PMD_N_CYCLES]) -{ - int i; - - /* We cannot write 'stats' and 'cycles' (because they're written by other - * threads) and we shouldn't change 'stats' (because they're used to count - * datapath stats, which must not be cleared here). 
Instead, we save the - * current values and subtract them from the values to be displayed in the - * future */ - for (i = 0; i < DP_N_STATS; i++) { - pmd->stats_zero[i] = stats[i]; - } - for (i = 0; i < PMD_N_CYCLES; i++) { - pmd->cycles_zero[i] = cycles[i]; - } + "%.02f (%"PRIu64"/%"PRIu64")\n", + stats[PMD_CYCLES_POLL_BUSY] / (double)total_packets, + stats[PMD_CYCLES_POLL_BUSY], total_packets); } static int @@ -1106,23 +1034,34 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[], struct ds reply = DS_EMPTY_INITIALIZER; struct dp_netdev_pmd_thread **pmd_list; struct dp_netdev *dp = NULL; - size_t n; enum pmd_info_type type = *(enum pmd_info_type *) aux; + int core_id = -1; + size_t n; ovs_mutex_lock(&dp_netdev_mutex); - if (argc == 2) { - dp = shash_find_data(&dp_netdevs, argv[1]); - } else if (shash_count(&dp_netdevs) == 1) { - /* There's only one datapath */ - dp = shash_first(&dp_netdevs)->data; + while (argc > 1) { + if (!strcmp(argv[1], "-pmd")) { + core_id = strtol(argv[2], NULL, 10); + argc -= 2; + argv += 2; + } else { + dp = shash_find_data(&dp_netdevs, argv[1]); + argc -= 1; + argv += 1; + } } if (!dp) { - ovs_mutex_unlock(&dp_netdev_mutex); - unixctl_command_reply_error(conn, - "please specify an existing datapath"); - return; + if (shash_count(&dp_netdevs) == 1) { + /* There's only one datapath */ + dp = shash_first(&dp_netdevs)->data; + } else { + ovs_mutex_unlock(&dp_netdev_mutex); + unixctl_command_reply_error(conn, + "please specify an existing datapath"); + return; + } } sorted_poll_thread_list(dp, &pmd_list, &n); @@ -1131,26 +1070,15 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[], if (!pmd) { break; } - + if (core_id != -1 && pmd->core_id != core_id) { + continue; + } if (type == PMD_INFO_SHOW_RXQ) { pmd_info_show_rxq(&reply, pmd); - } else { - unsigned long long stats[DP_N_STATS]; - uint64_t cycles[PMD_N_CYCLES]; - - /* Read current stats and cycle counters */ - for (size_t j = 0; j < 
ARRAY_SIZE(stats); j++) { - atomic_read_relaxed(&pmd->stats.n[j], &stats[j]); - } - for (size_t j = 0; j < ARRAY_SIZE(cycles); j++) { - atomic_read_relaxed(&pmd->cycles.n[j], &cycles[j]); - } - - if (type == PMD_INFO_CLEAR_STATS) { - pmd_info_clear_stats(&reply, pmd, stats, cycles); - } else if (type == PMD_INFO_SHOW_STATS) { - pmd_info_show_stats(&reply, pmd, stats, cycles); - } + } else if (type == PMD_INFO_CLEAR_STATS) { + pmd_perf_stats_clear(&pmd->perf_stats); + } else if (type == PMD_INFO_SHOW_STATS) { + pmd_info_show_stats(&reply, pmd); } } free(pmd_list); @@ -1168,14 +1096,14 @@ dpif_netdev_init(void) clear_aux = PMD_INFO_CLEAR_STATS, poll_aux = PMD_INFO_SHOW_RXQ; - unixctl_command_register("dpif-netdev/pmd-stats-show", "[dp]", - 0, 1, dpif_netdev_pmd_info, + unixctl_command_register("dpif-netdev/pmd-stats-show", "[-pmd core | dp]", + 0, 2, dpif_netdev_pmd_info, (void *)&show_aux); - unixctl_command_register("dpif-netdev/pmd-stats-clear", "[dp]", - 0, 1, dpif_netdev_pmd_info, + unixctl_command_register("dpif-netdev/pmd-stats-clear", "[-pmd core | dp]", + 0, 2, dpif_netdev_pmd_info, (void *)&clear_aux); - unixctl_command_register("dpif-netdev/pmd-rxq-show", "[dp]", - 0, 1, dpif_netdev_pmd_info, + unixctl_command_register("dpif-netdev/pmd-rxq-show", "[-pmd core | dp]", + 0, 2, dpif_netdev_pmd_info, (void *)&poll_aux); unixctl_command_register("dpif-netdev/pmd-rxq-rebalance", "[dp]", 0, 1, dpif_netdev_pmd_rebalance, @@ -1511,23 +1439,19 @@ dpif_netdev_get_stats(const struct dpif *dpif, struct dpif_dp_stats *stats) { struct dp_netdev *dp = get_dp_netdev(dpif); struct dp_netdev_pmd_thread *pmd; + uint64_t pmd_stats[PMD_N_STATS]; - stats->n_flows = stats->n_hit = stats->n_missed = stats->n_lost = 0; + stats->n_flows = stats->n_hit = + stats->n_mask_hit = stats->n_missed = stats->n_lost = 0; CMAP_FOR_EACH (pmd, node, &dp->poll_threads) { - unsigned long long n; stats->n_flows += cmap_count(&pmd->flow_table); - - 
atomic_read_relaxed(&pmd->stats.n[DP_STAT_MASKED_HIT], &n); - stats->n_hit += n; - atomic_read_relaxed(&pmd->stats.n[DP_STAT_EXACT_HIT], &n); - stats->n_hit += n; - atomic_read_relaxed(&pmd->stats.n[DP_STAT_MISS], &n); - stats->n_missed += n; - atomic_read_relaxed(&pmd->stats.n[DP_STAT_LOST], &n); - stats->n_lost += n; + pmd_perf_read_counters(&pmd->perf_stats, pmd_stats); + stats->n_hit += pmd_stats[PMD_STAT_EXACT_HIT]; + stats->n_mask_hit += pmd_stats[PMD_STAT_MASKED_HIT]; + stats->n_missed += pmd_stats[PMD_STAT_MISS]; + stats->n_lost += pmd_stats[PMD_STAT_LOST]; } stats->n_masks = UINT32_MAX; - stats->n_mask_hit = UINT64_MAX; return 0; } @@ -3209,28 +3133,28 @@ cycles_count_start(struct dp_netdev_pmd_thread *pmd) /* Stop counting cycles and add them to the counter 'type' */ static inline void cycles_count_end(struct dp_netdev_pmd_thread *pmd, - enum pmd_cycles_counter_type type) + enum pmd_stat_type type) OVS_RELEASES(&cycles_counter_fake_mutex) OVS_NO_THREAD_SAFETY_ANALYSIS { unsigned long long interval = cycles_counter() - pmd->ctx.last_cycles; - non_atomic_ullong_add(&pmd->cycles.n[type], interval); + pmd_perf_update_counter(&pmd->perf_stats, type, interval); } /* Calculate the intermediate cycle result and add to the counter 'type' */ static inline void cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd, struct dp_netdev_rxq *rxq, - enum pmd_cycles_counter_type type) + enum pmd_stat_type type) OVS_NO_THREAD_SAFETY_ANALYSIS { unsigned long long new_cycles = cycles_counter(); unsigned long long interval = new_cycles - pmd->ctx.last_cycles; pmd->ctx.last_cycles = new_cycles; - non_atomic_ullong_add(&pmd->cycles.n[type], interval); - if (rxq && (type == PMD_CYCLES_PROCESSING)) { + pmd_perf_update_counter(&pmd->perf_stats, type, interval); + if (rxq && (type == PMD_CYCLES_POLL_BUSY)) { /* Add to the amount of current processing cycles. 
*/ non_atomic_ullong_add(&rxq->cycles[RXQ_CYCLES_PROC_CURR], interval); } @@ -3289,8 +3213,10 @@ dp_netdev_pmd_flush_output_on_port(struct dp_netdev_pmd_thread *pmd, netdev_send(p->port->netdev, tx_qid, &p->output_pkts, dynamic_txqs); dp_packet_batch_init(&p->output_pkts); - dp_netdev_count_packet(pmd, DP_STAT_SENT_PKTS, output_cnt); - dp_netdev_count_packet(pmd, DP_STAT_SENT_BATCHES, 1); + pmd_perf_update_counter(&pmd->perf_stats, + PMD_STAT_SENT_PKTS, output_cnt); + pmd_perf_update_counter(&pmd->perf_stats, + PMD_STAT_SENT_BATCHES, 1); } static void @@ -3971,12 +3897,12 @@ dpif_netdev_run(struct dpif *dpif) port->port_no); cycles_count_intermediate(non_pmd, NULL, process_packets - ? PMD_CYCLES_PROCESSING - : PMD_CYCLES_IDLE); + ? PMD_CYCLES_POLL_BUSY + : PMD_CYCLES_POLL_IDLE); } } } - cycles_count_end(non_pmd, PMD_CYCLES_IDLE); + cycles_count_end(non_pmd, PMD_CYCLES_POLL_IDLE); pmd_thread_ctx_time_update(non_pmd); dpif_netdev_xps_revalidate_pmd(non_pmd, false); ovs_mutex_unlock(&dp->non_pmd_mutex); @@ -4121,6 +4047,7 @@ static void * pmd_thread_main(void *f_) { struct dp_netdev_pmd_thread *pmd = f_; + struct pmd_perf_stats *s = &pmd->perf_stats; unsigned int lc = 0; struct polled_queue *poll_list; bool exiting; @@ -4156,13 +4083,17 @@ reload: cycles_count_start(pmd); for (;;) { + uint64_t iter_packets = 0; + pmd_perf_start_iteration(s, pmd->ctx.last_cycles); for (i = 0; i < poll_cnt; i++) { process_packets = dp_netdev_process_rxq_port(pmd, poll_list[i].rxq->rx, poll_list[i].port_no); cycles_count_intermediate(pmd, poll_list[i].rxq, - process_packets ? PMD_CYCLES_PROCESSING - : PMD_CYCLES_IDLE); + process_packets + ? 
PMD_CYCLES_POLL_BUSY + : PMD_CYCLES_POLL_IDLE); + iter_packets += process_packets; } if (lc++ > 1024) { @@ -4183,10 +4114,12 @@ reload: if (reload) { break; } + cycles_count_intermediate(pmd, NULL, PMD_CYCLES_OVERHEAD); } + pmd_perf_end_iteration(s, pmd->ctx.last_cycles, iter_packets); } - cycles_count_end(pmd, PMD_CYCLES_IDLE); + cycles_count_end(pmd, PMD_CYCLES_OVERHEAD); poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list); exiting = latch_is_set(&pmd->exit_latch); @@ -4640,6 +4573,7 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp, } cmap_insert(&dp->poll_threads, CONST_CAST(struct cmap_node *, &pmd->node), hash_int(core_id, 0)); + pmd_perf_stats_init(&pmd->perf_stats); } static void @@ -4838,13 +4772,6 @@ dp_netdev_flow_used(struct dp_netdev_flow *netdev_flow, int cnt, int size, atomic_store_relaxed(&netdev_flow->stats.tcp_flags, flags); } -static void -dp_netdev_count_packet(struct dp_netdev_pmd_thread *pmd, - enum dp_stat_type type, int cnt) -{ - non_atomic_ullong_add(&pmd->stats.n[type], cnt); -} - static int dp_netdev_upcall(struct dp_netdev_pmd_thread *pmd, struct dp_packet *packet_, struct flow *flow, struct flow_wildcards *wc, ovs_u128 *ufid, @@ -5017,6 +4944,9 @@ emc_processing(struct dp_netdev_pmd_thread *pmd, int i; atomic_read_relaxed(&pmd->dp->emc_insert_min, &cur_min); + pmd_perf_update_counter(&pmd->perf_stats, + md_is_valid ? 
PMD_STAT_RECIRC : PMD_STAT_RECV, + cnt); DP_PACKET_BATCH_REFILL_FOR_EACH (i, cnt, packet, packets_) { struct dp_netdev_flow *flow; @@ -5065,18 +4995,17 @@ emc_processing(struct dp_netdev_pmd_thread *pmd, } } - dp_netdev_count_packet(pmd, DP_STAT_EXACT_HIT, - cnt - n_dropped - n_missed); + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_EXACT_HIT, + cnt - n_dropped - n_missed); return dp_packet_batch_size(packets_); } -static inline void +static inline int handle_packet_upcall(struct dp_netdev_pmd_thread *pmd, struct dp_packet *packet, const struct netdev_flow_key *key, - struct ofpbuf *actions, struct ofpbuf *put_actions, - int *lost_cnt) + struct ofpbuf *actions, struct ofpbuf *put_actions) { struct ofpbuf *add_actions; struct dp_packet_batch b; @@ -5096,8 +5025,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd, put_actions); if (OVS_UNLIKELY(error && error != ENOSPC)) { dp_packet_delete(packet); - (*lost_cnt)++; - return; + return error; } /* The Netlink encoding of datapath flow keys cannot express @@ -5137,6 +5065,9 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd, ovs_mutex_unlock(&pmd->flow_mutex); emc_probabilistic_insert(pmd, key, netdev_flow); } + /* Only error ENOSPC can reach here. We process the packet but do not + * install a datapath flow. Treat as successful. 
*/ + return 0; } static inline void @@ -5158,7 +5089,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd, struct dpcls *cls; struct dpcls_rule *rules[PKT_ARRAY_SIZE]; struct dp_netdev *dp = pmd->dp; - int miss_cnt = 0, lost_cnt = 0; + int upcall_ok_cnt = 0, upcall_fail_cnt = 0; int lookup_cnt = 0, add_lookup_cnt; bool any_miss; size_t i; @@ -5200,9 +5131,14 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd, continue; } - miss_cnt++; - handle_packet_upcall(pmd, packet, &keys[i], &actions, - &put_actions, &lost_cnt); + int error = handle_packet_upcall(pmd, packet, &keys[i], + &actions, &put_actions); + + if (OVS_UNLIKELY(error)) { + upcall_fail_cnt++; + } else { + upcall_ok_cnt++; + } } ofpbuf_uninit(&actions); @@ -5212,8 +5148,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd, DP_PACKET_BATCH_FOR_EACH (packet, packets_) { if (OVS_UNLIKELY(!rules[i])) { dp_packet_delete(packet); - lost_cnt++; - miss_cnt++; + upcall_fail_cnt++; } } } @@ -5231,10 +5166,14 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd, dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches, n_batches); } - dp_netdev_count_packet(pmd, DP_STAT_MASKED_HIT, cnt - miss_cnt); - dp_netdev_count_packet(pmd, DP_STAT_LOOKUP_HIT, lookup_cnt); - dp_netdev_count_packet(pmd, DP_STAT_MISS, miss_cnt); - dp_netdev_count_packet(pmd, DP_STAT_LOST, lost_cnt); + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_HIT, + cnt - upcall_ok_cnt - upcall_fail_cnt); + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_LOOKUP, + lookup_cnt); + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MISS, + upcall_ok_cnt); + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_LOST, + upcall_fail_cnt); } /* Packets enter the datapath from a port (or from recirculation) here. 
diff --git a/tests/pmd.at b/tests/pmd.at index e39a23a..0356f87 100644 --- a/tests/pmd.at +++ b/tests/pmd.at @@ -170,13 +170,16 @@ dummy@ovs-dummy: hit:0 missed:0 p0 7/1: (dummy-pmd: configured_rx_queues=4, configured_tx_queues=, requested_rx_queues=4, requested_tx_queues=) ]) -AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl +AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 8], [0], [dnl pmd thread numa_id core_id : + packets received:0 + packet recirculations:0 + avg. datapath passes per packet:0.00 emc hits:0 megaflow hits:0 - avg. subtable lookups per hit:0.00 - miss:0 - lost:0 + avg. subtable lookups per megaflow hit:0.00 + miss(i.e. lookup miss with success upcall):0 + lost(i.e. lookup miss with failed upcall):0 ]) ovs-appctl time/stop @@ -197,13 +200,16 @@ AT_CHECK([cat ovs-vswitchd.log | filter_flow_install | strip_xout], [0], [dnl recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(src=50:54:00:00:00:77,dst=50:54:00:00:01:78),eth_type(0x0800),ipv4(frag=no), actions: ]) -AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl +AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 8], [0], [dnl pmd thread numa_id core_id : + packets received:20 + packet recirculations:0 + avg. datapath passes per packet:1.00 emc hits:19 megaflow hits:0 - avg. subtable lookups per hit:0.00 - miss:1 - lost:0 + avg. subtable lookups per megaflow hit:0.00 + miss(i.e. lookup miss with success upcall):1 + lost(i.e. 
lookup miss with failed upcall):0
])

OVS_VSWITCHD_STOP

From patchwork Thu Jan 4 12:07:45 2018
X-Patchwork-Submitter: Jan Scheurich
X-Patchwork-Id: 855574
X-Patchwork-Delegate: ian.stokes@intel.com
From: Jan Scheurich
To: dev@openvswitch.org
Date: Thu, 4 Jan 2018 13:07:45 +0100
Message-Id: <1515067666-12398-3-git-send-email-jan.scheurich@ericsson.com>
In-Reply-To: <1515067666-12398-1-git-send-email-jan.scheurich@ericsson.com>
References: <1515067666-12398-1-git-send-email-jan.scheurich@ericsson.com>
Subject: [ovs-dev] [PATCH v5 2/3] dpif-netdev: Detailed performance stats for PMDs

This patch instruments the dpif-netdev datapath to record detailed
statistics of what is happening in every iteration of a PMD thread.
The collection of detailed statistics can be controlled by a new
configuration parameter "other_config:pmd-perf-metrics". By default it
is disabled. The run-time overhead, when enabled, is in the order of 1%.

The covered metrics per iteration are:
  - cycles
  - packets
  - (rx) batches
  - packets/batch
  - max. vhostuser qlen
  - upcalls
  - cycles spent in upcalls

This raw recorded data is used threefold:

1. In histograms for each of the following metrics:
   - cycles/iteration (log.)
   - packets/iteration (log.)
   - cycles/packet
   - packets/batch
   - max. vhostuser qlen (log.)
   - upcalls
   - cycles/upcall (log.)
   The histogram bins are divided linearly or logarithmically.

2. A cyclic history of the above statistics for the last 1000 iterations.

3. A cyclic history of the cumulative/average values per millisecond
   wall clock for the last 1000 milliseconds:
   - number of iterations
   - avg. cycles/iteration
   - packets (Kpps)
   - avg. packets/batch
   - avg. max vhost qlen
   - upcalls
   - avg. cycles/upcall

The gathered performance statistics can be printed at any time with the
new CLI command

   ovs-appctl dpif-netdev/pmd-perf-show [-nh] [-it iter_len] [-ms ms_len]
       [-pmd core] [dp]

The options are
   -nh:           Suppress the histograms
   -it iter_len:  Display the last iter_len iteration stats
   -ms ms_len:    Display the last ms_len millisecond stats
   -pmd core:     Display only the specified PMD

The performance statistics are reset with the existing
dpif-netdev/pmd-stats-clear command.
The output always contains the following global PMD statistics, similar to the pmd-stats-show command: Time: 15:24:55.270 Measurement duration: 1.008 s pmd thread numa_id 0 core_id 1: Cycles: 2419034712 (2.40 GHz) Iterations: 572817 (1.76 us/it) - idle: 486808 (15.9 % cycles) - busy: 86009 (84.1 % cycles) Packets: 2399607 (2381 Kpps, 848 cycles/pkt) Datapath passes: 3599415 (1.50 passes/pkt) - EMC hits: 336472 ( 9.3 %) - Megaflow hits: 3262943 (90.7 %, 1.00 subtbl lookups/hit) - Upcalls: 0 ( 0.0 %, 0.0 us/upcall) - Lost upcalls: 0 ( 0.0 %) Signed-off-by: Jan Scheurich --- lib/dp-packet.h | 2 + lib/dpif-netdev-perf.c | 309 ++++++++++++++++++++++++++++++++++++++++++++++++- lib/dpif-netdev-perf.h | 173 ++++++++++++++++++++++++++- lib/dpif-netdev.c | 158 +++++++++++++++++++++++-- lib/netdev-dpdk.c | 23 +++- lib/netdev-dpdk.h | 14 +++ ofproto/ofproto-dpif.c | 3 +- 7 files changed, 664 insertions(+), 18 deletions(-) diff --git a/lib/dp-packet.h b/lib/dp-packet.h index b4b721c..7950247 100644 --- a/lib/dp-packet.h +++ b/lib/dp-packet.h @@ -695,8 +695,10 @@ enum { NETDEV_MAX_BURST = 32 }; /* Maximum number packets in a batch. */ struct dp_packet_batch { size_t count; + size_t qfill; /* Number of packets in queue when reading rx burst. */ bool trunc; /* true if the batch needs truncate. 
*/ struct dp_packet *packets[NETDEV_MAX_BURST]; + }; static inline void diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c index 7d8b7b2..a66a48c 100644 --- a/lib/dpif-netdev-perf.c +++ b/lib/dpif-netdev-perf.c @@ -15,6 +15,7 @@ */ #include +#include #ifdef DPDK_NETDEV #include @@ -27,13 +28,307 @@ VLOG_DEFINE_THIS_MODULE(pmd_perf); +#ifdef DPDK_NETDEV +static uint64_t +get_tsc_hz(void) +{ + return rte_get_tsc_hz(); +} +#else +static uint64_t +read_tsc(void) +{ + register uint64_t tsc asm("eax"); + asm volatile (".byte 15, 49" : : : "eax", "edx"); + return tsc; +} + +static uint64_t +get_tsc_hz(void) +{ + uint64_t tsc_start = read_tsc(); + xsleep(1); + return read_tsc() - tsc_start; +} +#endif + +static void +histogram_walls_set_lin(struct histogram *hist, uint32_t min, uint32_t max) +{ + int i; + + ovs_assert(min < max); + for (i = 0; i < NUM_BINS-1; i++) { + hist->wall[i] = min + (i * (max - min)) / (NUM_BINS - 2); + } + hist->wall[NUM_BINS-1] = UINT32_MAX; +} + +static void +histogram_walls_set_log(struct histogram *hist, uint32_t min, uint32_t max) +{ + int i, start, bins, wall; + double log_min, log_max; + + ovs_assert(min < max); + if (min > 0) { + log_min = log(min); + log_max = log(max); + start = 0; + bins = NUM_BINS - 1; + } else { + hist->wall[0] = 0; + log_min = log(1); + log_max = log(max); + start = 1; + bins = NUM_BINS - 2; + } + wall = start; + for (i = 0; i < bins; i++) { + /* Make sure each wall is monotonically increasing. 
*/ + wall = MAX(wall, exp(log_min + (i * (log_max - log_min)) / (bins-1))); + hist->wall[start + i] = wall++; + } + if (hist->wall[NUM_BINS-2] < max) { + hist->wall[NUM_BINS-2] = max; + } + hist->wall[NUM_BINS-1] = UINT32_MAX; +} + +uint64_t +histogram_samples(const struct histogram *hist) +{ + uint64_t samples = 0; + + for (int i = 0; i < NUM_BINS; i++) { + samples += hist->bin[i]; + } + return samples; +} + +static void +histogram_clear(struct histogram *hist) +{ + int i; + + for (i = 0; i < NUM_BINS; i++) { + hist->bin[i] = 0; + } +} + +static void +history_init(struct history *h) +{ + memset(h, 0, sizeof(*h)); +} + void pmd_perf_stats_init(struct pmd_perf_stats *s) { - memset(s, 0 , sizeof(*s)); + /* The struct has been zeroed at allocation. */ + histogram_walls_set_log(&s->cycles, 500, 24000000); + histogram_walls_set_log(&s->pkts, 0, 1000); + histogram_walls_set_lin(&s->cycles_per_pkt, 100, 30000); + histogram_walls_set_lin(&s->pkts_per_batch, 0, 32); + histogram_walls_set_lin(&s->upcalls, 0, 30); + histogram_walls_set_log(&s->cycles_per_upcall, 1000, 1000000); + histogram_walls_set_log(&s->max_vhost_qfill, 0, 512); s->start_ms = time_msec(); } void +pmd_perf_format_overall_stats(struct ds *str, struct pmd_perf_stats *s, + double duration) +{ + uint64_t stats[PMD_N_STATS]; + double us_per_cycle = 1000000.0 / get_tsc_hz(); + + if (duration == 0) { + return; + } + + pmd_perf_read_counters(s, stats); + uint64_t tot_cycles = stats[PMD_CYCLES_ITER_IDLE] + + stats[PMD_CYCLES_ITER_BUSY]; + uint64_t packets = stats[PMD_STAT_RECV]; + uint64_t passes = stats[PMD_STAT_RECV] + + stats[PMD_STAT_RECIRC]; + uint64_t upcalls = stats[PMD_STAT_MISS]; + uint64_t upcall_cycles = stats[PMD_CYCLES_UPCALL]; + uint64_t tot_iter = histogram_samples(&s->pkts); + uint64_t idle_iter = s->pkts.bin[0]; + uint64_t busy_iter = tot_iter >= idle_iter ? 
tot_iter - idle_iter : 0; + + ds_put_format(str, + " Cycles: %12"PRIu64" (%.2f GHz)\n" + " Iterations: %12"PRIu64" (%.2f us/it)\n" + " - idle: %12"PRIu64" (%4.1f %% cycles)\n" + " - busy: %12"PRIu64" (%4.1f %% cycles)\n", + tot_cycles, (tot_cycles / duration) / 1E9, + tot_iter, tot_cycles * us_per_cycle / tot_iter, + idle_iter, + 100.0 * stats[PMD_CYCLES_ITER_IDLE] / tot_cycles, + busy_iter, + 100.0 * stats[PMD_CYCLES_ITER_BUSY] / tot_cycles); + if (packets > 0) { + ds_put_format(str, + " Packets: %12"PRIu64" (%.0f Kpps, %.0f cycles/pkt)\n" + " Datapath passes: %12"PRIu64" (%.2f passes/pkt)\n" + " - EMC hits: %12"PRIu64" (%4.1f %%)\n" + " - Megaflow hits: %12"PRIu64" (%4.1f %%, %.2f subtbl lookups/" + "hit)\n" + " - Upcalls: %12"PRIu64" (%4.1f %%, %.1f us/upcall)\n" + " - Lost upcalls: %12"PRIu64" (%4.1f %%)\n" + "\n", + packets, (packets / duration) / 1000, + 1.0 * stats[PMD_CYCLES_ITER_BUSY] / packets, + passes, packets ? 1.0 * passes / packets : 0, + stats[PMD_STAT_EXACT_HIT], + 100.0 * stats[PMD_STAT_EXACT_HIT] / passes, + stats[PMD_STAT_MASKED_HIT], + 100.0 * stats[PMD_STAT_MASKED_HIT] / passes, + stats[PMD_STAT_MASKED_HIT] + ? 1.0 * stats[PMD_STAT_MASKED_LOOKUP] / stats[PMD_STAT_MASKED_HIT] + : 0, + upcalls, 100.0 * upcalls / passes, + upcalls ? 
(upcall_cycles * us_per_cycle) / upcalls : 0, + stats[PMD_STAT_LOST], + 100.0 * stats[PMD_STAT_LOST] / passes); + } else { + ds_put_format(str, + " Packets: %12"PRIu64"\n" + "\n", + 0UL); + } +} + +void +pmd_perf_format_histograms(struct ds *str, struct pmd_perf_stats *s) +{ + int i; + + ds_put_cstr(str, "Histograms\n"); + ds_put_format(str, + " %-21s %-21s %-21s %-21s %-21s %-21s %-21s\n", + "cycles/it", "packets/it", "cycles/pkt", "pkts/batch", + "max vhost qlen", "upcalls/it", "cycles/upcall"); + for (i = 0; i < NUM_BINS-1; i++) { + ds_put_format(str, + " %-9d %-11"PRIu64" %-9d %-11"PRIu64" %-9d %-11"PRIu64 + " %-9d %-11"PRIu64" %-9d %-11"PRIu64" %-9d %-11"PRIu64 + " %-9d %-11"PRIu64"\n", + s->cycles.wall[i], s->cycles.bin[i], + s->pkts.wall[i],s->pkts.bin[i], + s->cycles_per_pkt.wall[i], s->cycles_per_pkt.bin[i], + s->pkts_per_batch.wall[i], s->pkts_per_batch.bin[i], + s->max_vhost_qfill.wall[i], s->max_vhost_qfill.bin[i], + s->upcalls.wall[i], s->upcalls.bin[i], + s->cycles_per_upcall.wall[i], s->cycles_per_upcall.bin[i]); + } + ds_put_format(str, + " %-9s %-11"PRIu64" %-9s %-11"PRIu64" %-9s %-11"PRIu64 + " %-9s %-11"PRIu64" %-9s %-11"PRIu64" %-9s %-11"PRIu64 + " %-9s %-11"PRIu64"\n", + ">", s->cycles.bin[i], + ">", s->pkts.bin[i], + ">", s->cycles_per_pkt.bin[i], + ">", s->pkts_per_batch.bin[i], + ">", s->max_vhost_qfill.bin[i], + ">", s->upcalls.bin[i], + ">", s->cycles_per_upcall.bin[i]); + if (s->totals.iterations > 0) { + ds_put_cstr(str, + "-----------------------------------------------------" + "-----------------------------------------------------" + "------------------------------------------------\n"); + ds_put_format(str, + " %-21s %-21s %-21s %-21s %-21s %-21s %-21s\n", + "cycles/it", "packets/it", "cycles/pkt", "pkts/batch", + "vhost qlen", "upcalls/it", "cycles/upcall"); + ds_put_format(str, + " %-21"PRIu64" %-21.5f %-21"PRIu64 + " %-21.5f %-21.5f %-21.5f %-21"PRIu32"\n", + s->totals.cycles / s->totals.iterations, + 1.0 * s->totals.pkts / 
s->totals.iterations, + s->totals.pkts + ? s->totals.busy_cycles / s->totals.pkts : 0, + s->totals.batches + ? 1.0 * s->totals.pkts / s->totals.batches : 0, + 1.0 * s->totals.max_vhost_qfill / s->totals.iterations, + 1.0 * s->totals.upcalls / s->totals.iterations, + s->totals.upcalls + ? s->totals.upcall_cycles / s->totals.upcalls : 0); + } +} + +void +pmd_perf_format_iteration_history(struct ds *str, struct pmd_perf_stats *s, + int n_iter) +{ + struct iter_stats *is; + size_t index; + int i; + + if (n_iter == 0) { + return; + } + ds_put_format(str, " %-17s %-10s %-10s %-10s %-10s " + "%-10s %-10s %-10s\n", + "tsc", "cycles", "packets", "cycles/pkt", "pkts/batch", + "vhost qlen", "upcalls", "cycles/upcall"); + for (i = 1; i <= n_iter; i++) { + index = (s->iterations.idx + HISTORY_LEN - i) % HISTORY_LEN; + is = &s->iterations.sample[index]; + ds_put_format(str, + " %-17"PRIu64" %-11"PRIu64" %-11"PRIu32 + " %-11"PRIu64" %-11"PRIu32" %-11"PRIu32 + " %-11"PRIu32" %-11"PRIu32"\n", + is->timestamp, + is->cycles, + is->pkts, + is->pkts ? is->cycles / is->pkts : 0, + is->batches ? is->pkts / is->batches : 0, + is->max_vhost_qfill, + is->upcalls, + is->upcalls ? is->upcall_cycles / is->upcalls : 0); + } +} + +void +pmd_perf_format_ms_history(struct ds *str, struct pmd_perf_stats *s, int n_ms) +{ + struct iter_stats *is; + size_t index; + int i; + + if (n_ms == 0) { + return; + } + ds_put_format(str, + " %-12s %-10s %-10s %-10s %-10s" + " %-10s %-10s %-10s %-10s\n", + "ms", "iterations", "cycles/it", "Kpps", "cycles/pkt", + "pkts/batch", "vhost qlen", "upcalls", "cycles/upcall"); + for (i = 1; i <= n_ms; i++) { + index = (s->milliseconds.idx + HISTORY_LEN - i) % HISTORY_LEN; + is = &s->milliseconds.sample[index]; + ds_put_format(str, + " %-12"PRIu64" %-11"PRIu32" %-11"PRIu64 + " %-11"PRIu32" %-11"PRIu64" %-11"PRIu32 + " %-11"PRIu32" %-11"PRIu32" %-11"PRIu32"\n", + is->timestamp, + is->iterations, + is->iterations ? is->cycles / is->iterations : 0, + is->pkts, + is->pkts ? 
is->busy_cycles / is->pkts : 0, + is->batches ? is->pkts / is->batches : 0, + is->iterations + ? is->max_vhost_qfill / is->iterations : 0, + is->upcalls, + is->upcalls ? is->upcall_cycles / is->upcalls : 0); + } +} + +void pmd_perf_read_counters(struct pmd_perf_stats *s, uint64_t stats[PMD_N_STATS]) { @@ -59,8 +354,20 @@ void pmd_perf_stats_clear(struct pmd_perf_stats *s) { s->start_ms = 0; /* Marks start of clearing. */ + memset(&s->current, 0 , sizeof(struct iter_stats)); + memset(&s->totals, 0 , sizeof(struct iter_stats)); for (int i = 0; i < PMD_N_STATS; i++) { atomic_read_relaxed(&s->counters.n[i], &s->counters.zero[i]); } + histogram_clear(&s->cycles); + histogram_clear(&s->pkts); + histogram_clear(&s->cycles_per_pkt); + histogram_clear(&s->upcalls); + histogram_clear(&s->cycles_per_upcall); + histogram_clear(&s->pkts_per_batch); + histogram_clear(&s->max_vhost_qfill); + history_init(&s->iterations); + history_init(&s->milliseconds); s->start_ms = time_msec(); /* Clearing finished. */ + s->milliseconds.sample[0].timestamp = s->start_ms; } diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h index fa5cf37..3611723 100644 --- a/lib/dpif-netdev-perf.h +++ b/lib/dpif-netdev-perf.h @@ -33,6 +33,11 @@ extern "C" { #endif +#define NUM_BINS 32 /* Number of histogram bins. */ +#define HISTORY_LEN 1000 /* Length of recorded history + (iterations and ms). */ +#define DEF_HIST_SHOW 20 /* Default number of history samples. */ + enum pmd_stat_type { PMD_STAT_EXACT_HIT, /* Packets that had an exact match (emc). */ PMD_STAT_MASKED_HIT, /* Packets that matched in the flow table. 
*/ @@ -62,6 +67,28 @@ enum pmd_stat_type { PMD_N_STATS }; +struct histogram { + uint32_t wall[NUM_BINS]; + uint64_t bin[NUM_BINS]; +}; + +struct iter_stats { + uint64_t timestamp; + uint64_t cycles; + uint64_t busy_cycles; + uint32_t iterations; + uint32_t pkts; + uint32_t upcalls; + uint32_t upcall_cycles; + uint32_t batches; + uint32_t max_vhost_qfill; +}; + +struct history { + uint64_t idx; + struct iter_stats sample[HISTORY_LEN]; +}; + struct pmd_counters { atomic_uint64_t n[PMD_N_STATS]; /* Indexed by PMD_STAT_*. */ uint64_t zero[PMD_N_STATS]; @@ -71,6 +98,17 @@ struct pmd_perf_stats { uint64_t start_ms; uint64_t last_tsc; struct pmd_counters counters; + struct iter_stats current; + struct iter_stats totals; + struct histogram cycles; + struct histogram pkts; + struct histogram cycles_per_pkt; + struct histogram upcalls; + struct histogram cycles_per_upcall; + struct histogram pkts_per_batch; + struct histogram max_vhost_qfill; + struct history iterations; + struct history milliseconds; }; enum pmd_info_type; @@ -98,6 +136,63 @@ pmd_perf_update_counter(struct pmd_perf_stats *s, atomic_store_relaxed(&s->counters.n[counter], tmp); } +void pmd_perf_format_overall_stats(struct ds *str, struct pmd_perf_stats *s, + double duration); +void pmd_perf_format_histograms(struct ds *str, struct pmd_perf_stats *s); +void pmd_perf_format_iteration_history(struct ds *str, + struct pmd_perf_stats *s, + int n_iter); +void pmd_perf_format_ms_history(struct ds *str, struct pmd_perf_stats *s, + int n_ms); + +void pmd_perf_show(struct unixctl_conn *conn, int argc, + const char *argv[], + void *aux OVS_UNUSED); +static inline void +histogram_add_sample(struct histogram *hist, uint32_t val) +{ + /* TODO: Can do better with binary search? 
*/ + for (int i = 0; i < NUM_BINS-1; i++) { + if (val <= hist->wall[i]) { + hist->bin[i]++; + return; + } + } + hist->bin[NUM_BINS-1]++; +} + +uint64_t histogram_samples(const struct histogram *hist); + +static inline struct iter_stats * +history_current(struct history *h) +{ + return &h->sample[h->idx]; +} + +static inline struct iter_stats * +history_next(struct history *h) +{ + struct iter_stats *next; + + h->idx++; + if (h->idx == HISTORY_LEN) { + h->idx = 0; + } + next = &h->sample[h->idx]; + memset(next, 0, sizeof(*next)); + return next; +} + +static inline struct iter_stats * +history_store(struct history *h, struct iter_stats *is) +{ + if (is) { + h->sample[h->idx] = *is; + } + /* Advance the history pointer */ + return history_next(h); +} + static inline void pmd_perf_start_iteration(struct pmd_perf_stats *s, uint64_t now_tsc) { @@ -105,14 +200,25 @@ pmd_perf_start_iteration(struct pmd_perf_stats *s, uint64_t now_tsc) /* Stats not yet fully initialised. */ return; } - s->last_tsc = now_tsc; + memset(&s->current, 0, sizeof(struct iter_stats)); + s->current.timestamp = now_tsc; } static inline void pmd_perf_end_iteration(struct pmd_perf_stats *s, uint64_t now_tsc, - int packets) + int packets, bool full_metrics) { - uint64_t cycles = now_tsc - s->last_tsc; + struct iter_stats *cum_ms; + uint64_t cycles, cycles_per_pkt = 0; + + if (OVS_UNLIKELY(s->current.timestamp == 0)) { + /* Stats were cleared during the ongoing iteration. */ + return; + } + + cycles = now_tsc - s->current.timestamp; + s->current.cycles = cycles; + s->current.pkts = packets; /* No need for atomic updates as only invoked within PMD thread. */ if (packets > 0) { @@ -120,7 +226,68 @@ pmd_perf_end_iteration(struct pmd_perf_stats *s, uint64_t now_tsc, } else { s->counters.n[PMD_CYCLES_ITER_IDLE] += cycles; } + /* Add iteration sample to histograms. 
*/ + histogram_add_sample(&s->cycles, cycles); + histogram_add_sample(&s->pkts, packets); + + if (!full_metrics) { + return; + } + + s->counters.n[PMD_CYCLES_UPCALL] += s->current.upcall_cycles; + + if (packets > 0) { + cycles_per_pkt = cycles / packets; + histogram_add_sample(&s->cycles_per_pkt, cycles_per_pkt); + } + if (s->current.batches > 0) { + histogram_add_sample(&s->pkts_per_batch, packets / s->current.batches); + } + histogram_add_sample(&s->upcalls, s->current.upcalls); + if (s->current.upcalls > 0) { + histogram_add_sample(&s->cycles_per_upcall, + s->current.upcall_cycles / s->current.upcalls); + } + histogram_add_sample(&s->max_vhost_qfill, s->current.max_vhost_qfill); + + /* Add iteration samples to millisecond stats. */ + cum_ms = history_current(&s->milliseconds); + cum_ms->iterations++; + cum_ms->cycles += cycles; + if (packets > 0) { + cum_ms->busy_cycles += cycles; + } + cum_ms->pkts += s->current.pkts; + cum_ms->upcalls += s->current.upcalls; + cum_ms->upcall_cycles += s->current.upcall_cycles; + cum_ms->batches += s->current.batches; + cum_ms->max_vhost_qfill += s->current.max_vhost_qfill; + + /* Store in iteration history. */ + history_store(&s->iterations, &s->current); + if (now_tsc - s->last_tsc > 10000) { + /* Check if ms is completed and store in milliseconds history. */ + uint64_t now = time_msec(); + if (now != cum_ms->timestamp) { + /* Add ms stats to totals. 
*/ + s->totals.iterations += cum_ms->iterations; + s->totals.cycles += cum_ms->cycles; + s->totals.busy_cycles += cum_ms->busy_cycles; + s->totals.pkts += cum_ms->pkts; + s->totals.upcalls += cum_ms->upcalls; + s->totals.upcall_cycles += cum_ms->upcall_cycles; + s->totals.batches += cum_ms->batches; + s->totals.max_vhost_qfill += cum_ms->max_vhost_qfill; + cum_ms = history_next(&s->milliseconds); + cum_ms->timestamp = now; + } + s->last_tsc = now_tsc; + } +} + +#ifdef __cplusplus } +#endif #ifdef __cplusplus } diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 8b9698a..8f64df3 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -53,6 +53,7 @@ #include "id-pool.h" #include "latch.h" #include "netdev.h" +#include "netdev-provider.h" #include "netdev-vport.h" #include "netlink.h" #include "odp-execute.h" @@ -278,6 +279,8 @@ struct dp_netdev { /* Probability of EMC insertions is a factor of 'emc_insert_min'.*/ OVS_ALIGNED_VAR(CACHE_LINE_SIZE) atomic_uint32_t emc_insert_min; + /* Enable collection of PMD performance metrics. */ + ATOMIC(bool) pmd_perf_metrics; /* Protects access to ofproto-dpif-upcall interface during revalidator * thread synchronization. */ @@ -703,6 +706,7 @@ static inline bool emc_entry_alive(struct emc_entry *ce); static void emc_clear_entry(struct emc_entry *ce); static void dp_netdev_request_reconfigure(struct dp_netdev *dp); +static inline bool pmd_perf_metrics_enabled(const struct dp_netdev_pmd_thread *pmd); static void emc_cache_init(struct emc_cache *flow_cache) @@ -786,7 +790,8 @@ get_dp_netdev(const struct dpif *dpif) enum pmd_info_type { PMD_INFO_SHOW_STATS, /* Show how cpu cycles are spent. */ PMD_INFO_CLEAR_STATS, /* Set the cycles count to 0. */ - PMD_INFO_SHOW_RXQ /* Show poll-lists of pmd threads. */ + PMD_INFO_SHOW_RXQ, /* Show poll-lists of pmd threads. */ + PMD_INFO_PERF_SHOW, /* Show pmd performance details. 
*/ }; static void @@ -873,8 +878,46 @@ pmd_info_show_stats(struct ds *reply, ds_put_format(reply, "\tavg processing cycles per packet: " "%.02f (%"PRIu64"/%"PRIu64")\n", - stats[PMD_CYCLES_POLL_BUSY] / (double)total_packets, - stats[PMD_CYCLES_POLL_BUSY], total_packets); + stats[PMD_CYCLES_ITER_BUSY] / (double)total_packets, + stats[PMD_CYCLES_ITER_BUSY], total_packets); +} + +static void +pmd_info_show_perf(struct ds *reply, + struct dp_netdev_pmd_thread *pmd, + struct pmd_perf_params *par) +{ + if (pmd->core_id != NON_PMD_CORE_ID) { + char *time_str = + xastrftime_msec("%H:%M:%S.###", time_wall_msec(), true); + long long now = time_msec(); + double duration = (now - pmd->perf_stats.start_ms) / 1000.0; + + ds_put_cstr(reply, "\n"); + ds_put_format(reply, "Time: %s\n", time_str); + ds_put_format(reply, "Measurement duration: %.3f s\n", duration); + ds_put_cstr(reply, "\n"); + format_pmd_thread(reply, pmd); + ds_put_cstr(reply, "\n"); + pmd_perf_format_overall_stats(reply, &pmd->perf_stats, duration); + if (pmd_perf_metrics_enabled(pmd)) { + if (par->histograms) { + ds_put_cstr(reply, "\n"); + pmd_perf_format_histograms(reply, &pmd->perf_stats); + } + if (par->iter_hist_len > 0) { + ds_put_cstr(reply, "\n"); + pmd_perf_format_iteration_history(reply, &pmd->perf_stats, + par->iter_hist_len); + } + if (par->ms_hist_len > 0) { + ds_put_cstr(reply, "\n"); + pmd_perf_format_ms_history(reply, &pmd->perf_stats, + par->ms_hist_len); + } + } + free(time_str); + } } static int @@ -1079,6 +1122,8 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[], pmd_perf_stats_clear(&pmd->perf_stats); } else if (type == PMD_INFO_SHOW_STATS) { pmd_info_show_stats(&reply, pmd); + } else if (type == PMD_INFO_PERF_SHOW) { + pmd_info_show_perf(&reply, pmd, (struct pmd_perf_params *)aux); } } free(pmd_list); @@ -1088,6 +1133,48 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[], unixctl_command_reply(conn, ds_cstr(&reply)); 
ds_destroy(&reply); } + +static void +pmd_perf_show_cmd(struct unixctl_conn *conn, int argc, + const char *argv[], + void *aux OVS_UNUSED) +{ + struct pmd_perf_params par; + long int it_hist = 0, ms_hist = 0; + par.histograms = true; + + while (argc > 1) { + if (!strcmp(argv[1], "-nh")) { + par.histograms = false; + argc -= 1; + argv += 1; + } else if (!strcmp(argv[1], "-it") && argc > 2) { + it_hist = strtol(argv[2], NULL, 10); + if (it_hist < 0) { + it_hist = 0; + } else if (it_hist > HISTORY_LEN) { + it_hist = HISTORY_LEN; + } + argc -= 2; + argv += 2; + } else if (!strcmp(argv[1], "-ms") && argc > 2) { + ms_hist = strtol(argv[2], NULL, 10); + if (ms_hist < 0) { + ms_hist = 0; + } else if (ms_hist > HISTORY_LEN) { + ms_hist = HISTORY_LEN; + } + argc -= 2; + argv += 2; + } else { + break; + } + } + par.iter_hist_len = it_hist; + par.ms_hist_len = ms_hist; + par.command_type = PMD_INFO_PERF_SHOW; + dpif_netdev_pmd_info(conn, argc, argv, &par); +} static int dpif_netdev_init(void) @@ -1105,6 +1192,12 @@ dpif_netdev_init(void) unixctl_command_register("dpif-netdev/pmd-rxq-show", "[-pmd core | dp]", 0, 2, dpif_netdev_pmd_info, (void *)&poll_aux); + unixctl_command_register("dpif-netdev/pmd-perf-show", + "[-nh] [-it iter-history-len]" + " [-ms ms-history-len]" + " [-pmd core | dp]", + 0, 7, pmd_perf_show_cmd, + NULL); unixctl_command_register("dpif-netdev/pmd-rxq-rebalance", "[dp]", 0, 1, dpif_netdev_pmd_rebalance, NULL); @@ -2980,6 +3073,18 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config) } } + bool perf_enabled = smap_get_bool(other_config, "pmd-perf-metrics", false); + bool cur_perf_enabled; + atomic_read_relaxed(&dp->pmd_perf_metrics, &cur_perf_enabled); + if (perf_enabled != cur_perf_enabled) { + atomic_store_relaxed(&dp->pmd_perf_metrics, perf_enabled); + if (perf_enabled) { + VLOG_INFO("PMD performance metrics collection enabled"); + } else { + VLOG_INFO("PMD performance metrics collection disabled"); + } + } + return 0; } @@ 
-3160,6 +3265,14 @@ cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd, } } +static inline bool +pmd_perf_metrics_enabled(const struct dp_netdev_pmd_thread *pmd) +{ + bool pmd_perf_enabled; + atomic_read_relaxed(&pmd->dp->pmd_perf_metrics, &pmd_perf_enabled); + return pmd_perf_enabled; +} + static void dp_netdev_rxq_set_cycles(struct dp_netdev_rxq *rx, enum rxq_cycles_counter_type type, @@ -3236,6 +3349,7 @@ dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd, struct netdev_rxq *rx, odp_port_t port_no) { + struct pmd_perf_stats *s = &pmd->perf_stats; struct dp_packet_batch batch; int error; int batch_cnt = 0; @@ -3245,8 +3359,25 @@ dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd, if (!error) { *recirc_depth_get() = 0; pmd_thread_ctx_time_update(pmd); - batch_cnt = batch.count; + if (pmd_perf_metrics_enabled(pmd)) { + /* Update batch histogram. */ + s->current.batches++; + histogram_add_sample(&s->pkts_per_batch, batch_cnt); + /* Update the maximum Rx queue fill level. */ + uint32_t qfill = batch.qfill; + switch (netdev_dpdk_get_type(netdev_rxq_get_netdev(rx))) { + case DPDK_DEV_VHOST: + if (qfill > s->current.max_vhost_qfill) { + s->current.max_vhost_qfill = qfill; + } + break; + case DPDK_DEV_ETH: + default: + break; + } + } + /* Process packet batch. 
*/ dp_netdev_input(pmd, &batch, port_no); dp_netdev_pmd_flush_output_packets(pmd); } else if (error != EAGAIN && error != EOPNOTSUPP) { @@ -4084,7 +4215,9 @@ reload: cycles_count_start(pmd); for (;;) { uint64_t iter_packets = 0; + pmd_perf_start_iteration(s, pmd->ctx.last_cycles); + for (i = 0; i < poll_cnt; i++) { process_packets = dp_netdev_process_rxq_port(pmd, poll_list[i].rxq->rx, @@ -4116,7 +4249,9 @@ reload: } cycles_count_intermediate(pmd, NULL, PMD_CYCLES_OVERHEAD); } - pmd_perf_end_iteration(s, pmd->ctx.last_cycles, iter_packets); + + pmd_perf_end_iteration(s, pmd->ctx.last_cycles, iter_packets, + pmd_perf_metrics_enabled(pmd)); } cycles_count_end(pmd, PMD_CYCLES_OVERHEAD); @@ -5012,6 +5147,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd, struct match match; ovs_u128 ufid; int error; + uint64_t cycles = cycles_counter(); match.tun_md.valid = false; miniflow_expand(&key->mf, &match.flow); @@ -5065,9 +5201,15 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd, ovs_mutex_unlock(&pmd->flow_mutex); emc_probabilistic_insert(pmd, key, netdev_flow); } - /* Only error ENOSPC can reach here. We process the packet but do not - * install a datapath flow. Treat as successful. */ - return 0; + if (pmd_perf_metrics_enabled(pmd)) { + /* Update upcall stats. */ + cycles = cycles_counter() - cycles; + struct pmd_perf_stats *s = &pmd->perf_stats; + s->current.upcalls++; + s->current.upcall_cycles += cycles; + histogram_add_sample(&s->cycles_per_upcall, cycles); + } + return error; } static inline void diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 364f545..d701621 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -36,6 +36,7 @@ #include #include #include +#include #include #include @@ -188,11 +189,6 @@ enum { DPDK_RING_SIZE = 256 }; BUILD_ASSERT_DECL(IS_POW2(DPDK_RING_SIZE)); enum { DRAIN_TSC = 200000ULL }; -enum dpdk_dev_type { - DPDK_DEV_ETH = 0, - DPDK_DEV_VHOST = 1, -}; - /* Quality of Service */ /* An instance of a QoS configuration. 
Always associated with a particular @@ -835,6 +831,13 @@ netdev_dpdk_cast(const struct netdev *netdev) return CONTAINER_OF(netdev, struct netdev_dpdk, up); } +enum dpdk_dev_type +netdev_dpdk_get_type(const struct netdev *netdev) +{ + struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); + return dev->type; +} + static struct netdev * netdev_dpdk_alloc(void) { @@ -1644,6 +1647,14 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq, return EAGAIN; } + batch->qfill = nb_rx; + + if (OVS_UNLIKELY(nb_rx == NETDEV_MAX_BURST)) { + batch->qfill += rte_vhost_rx_queue_count(netdev_dpdk_get_vid(dev), + qid * VIRTIO_QNUM + + VIRTIO_TXQ); + } + if (policer) { dropped = nb_rx; nb_rx = ingress_policer_run(policer, @@ -1683,6 +1694,8 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct dp_packet_batch *batch) return EAGAIN; } + batch->qfill = nb_rx; + if (policer) { dropped = nb_rx; nb_rx = ingress_policer_run(policer, diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h index b7d02a7..2b357db 100644 --- a/lib/netdev-dpdk.h +++ b/lib/netdev-dpdk.h @@ -22,11 +22,18 @@ #include "openvswitch/compiler.h" struct dp_packet; +struct netdev; + +enum dpdk_dev_type { + DPDK_DEV_ETH = 0, + DPDK_DEV_VHOST = 1, +}; #ifdef DPDK_NETDEV void netdev_dpdk_register(void); void free_dpdk_buf(struct dp_packet *); +enum dpdk_dev_type netdev_dpdk_get_type(const struct netdev *netdev); #else @@ -41,6 +48,13 @@ free_dpdk_buf(struct dp_packet *buf OVS_UNUSED) /* Nothing */ } +static inline enum dpdk_dev_type +netdev_dpdk_get_type(const struct netdev *netdev OVS_UNUSED) +{ + /* Nothing to do. Return value zero to make compiler happy. 
*/ + return DPDK_DEV_ETH; +} + #endif #endif /* netdev-dpdk.h */ diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c index 43b7b89..3845b12 100644 --- a/ofproto/ofproto-dpif.c +++ b/ofproto/ofproto-dpif.c @@ -5317,7 +5317,8 @@ dpif_show_backer(const struct dpif_backer *backer, struct ds *ds) dpif_get_dp_stats(backer->dpif, &dp_stats); ds_put_format(ds, "%s: hit:%"PRIu64" missed:%"PRIu64"\n", - dpif_name(backer->dpif), dp_stats.n_hit, dp_stats.n_missed); + dpif_name(backer->dpif), + dp_stats.n_hit + dp_stats.n_mask_hit, dp_stats.n_missed); shash_init(&ofproto_shash); ofprotos = get_ofprotos(&ofproto_shash);

From patchwork Thu Jan 4 12:07:46 2018
X-Patchwork-Submitter: Jan Scheurich
X-Patchwork-Id: 855573
X-Patchwork-Delegate: ian.stokes@intel.com
From: Jan Scheurich
To: dev@openvswitch.org
Date: Thu, 4 Jan 2018 13:07:46 +0100
Message-Id: <1515067666-12398-4-git-send-email-jan.scheurich@ericsson.com>
In-Reply-To: <1515067666-12398-1-git-send-email-jan.scheurich@ericsson.com>
References: <1515067666-12398-1-git-send-email-jan.scheurich@ericsson.com>
Subject: [ovs-dev] [PATCH v5 3/3] dpif-netdev: Detection and logging of suspicious PMD iterations

This patch enhances dpif-netdev-perf to detect iterations with suspicious statistics according to the following criteria:

- The iteration lasts longer than US_THR microseconds (default 250). This can be used to capture events where a PMD is blocked or interrupted long enough to risk dropped packets on any of its Rx queues.

- The max vhost qlen exceeds a threshold Q_THR (default 128). This can be used to infer virtio queue overruns and dropped packets inside a VM, which are otherwise not visible in OVS.

Such suspicious iterations can be logged together with their iteration statistics so that they can be correlated with packet drops or other events outside OVS.

A new command is introduced to enable/disable logging at run-time and to adjust the above thresholds for suspicious iterations:

ovs-appctl dpif-netdev/pmd-perf-log-set [on|off] [-b before] [-a after] [-us usec] [-q qlen]

Turn logging on or off at run-time (on|off).
-b before: The number of iterations to log before the suspicious iteration (default 5).
-a after: The number of iterations to log after the suspicious iteration (default 5).
-q qlen: Suspicious vhost queue fill level threshold. Increase this to 512 if the Qemu supports a virtio queue length of 1024 (default 128).
-us usec: Change the duration threshold for a suspicious iteration (default 250 us).

If more than 100 iterations before or after a suspicious iteration have been logged once, OVS falls back to the safe default values (5/5) to prevent the logging itself from causing continuous further logging.
Signed-off-by: Jan Scheurich --- lib/dpif-netdev-perf.c | 142 +++++++++++++++++++++++++++++++++++++++++++++++++ lib/dpif-netdev-perf.h | 32 +++++++++++ lib/dpif-netdev.c | 5 ++ 3 files changed, 179 insertions(+) diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c index a66a48c..3fd19b0 100644 --- a/lib/dpif-netdev-perf.c +++ b/lib/dpif-netdev-perf.c @@ -28,6 +28,23 @@ VLOG_DEFINE_THIS_MODULE(pmd_perf); +#define ITER_US_THRESHOLD 250 /* Warning threshold for iteration duration + in microseconds. */ +#define VHOST_QUEUE_FULL 128 /* Size of the virtio TX queue. */ +#define LOG_IT_BEFORE 5 /* Number of iteration to log before + suspicious iteration. */ +#define LOG_IT_AFTER 5 /* Number of iteration to log after + suspicious iteration. */ + +bool log_on = false; +static uint32_t log_it_before = LOG_IT_BEFORE; +static uint32_t log_it_after = LOG_IT_AFTER; +static uint32_t log_us_thr = ITER_US_THRESHOLD; +uint32_t log_q_thr = VHOST_QUEUE_FULL; +uint64_t iter_cycle_threshold; + +static struct vlog_rate_limit latency_rl = VLOG_RATE_LIMIT_INIT(600, 600); + #ifdef DPDK_NETDEV static uint64_t get_tsc_hz(void) @@ -133,6 +150,8 @@ pmd_perf_stats_init(struct pmd_perf_stats *s) { histogram_walls_set_log(&s->cycles_per_upcall, 1000, 1000000); histogram_walls_set_log(&s->max_vhost_qfill, 0, 512); s->start_ms = time_msec(); + s->log_begin_it = UINT64_MAX; + s->log_end_it = UINT64_MAX; } void @@ -368,6 +387,129 @@ pmd_perf_stats_clear(struct pmd_perf_stats *s) histogram_clear(&s->max_vhost_qfill); history_init(&s->iterations); history_init(&s->milliseconds); + s->log_begin_it = UINT64_MAX; + s->log_end_it = UINT64_MAX; s->start_ms = time_msec(); /* Clearing finished. 
*/ s->milliseconds.sample[0].timestamp = s->start_ms; } + +void +pmd_perf_log_suspicious_iteration(struct pmd_perf_stats *s, + uint64_t cycles, + char *reason) +{ + VLOG_WARN_RL(&latency_rl, + "Suspicious iteration (%s): tsc=%"PRIu64 + " duration=%"PRIu64" us\n", + reason, s->current.timestamp, + (1000000L * cycles) / get_tsc_hz()); + if (log_it_before + log_it_after > 0) { + if (s->log_begin_it == UINT64_MAX) { + s->log_begin_it = + (s->iterations.idx + HISTORY_LEN - log_it_before) + % HISTORY_LEN; + } + s->log_end_it = + (s->iterations.idx + log_it_after) % HISTORY_LEN; + } +} + +void +pmd_perf_log_iteration_neighborhood(struct pmd_perf_stats *s) +{ + struct ds log = DS_EMPTY_INITIALIZER; + pmd_perf_format_iteration_history(&log, s, + (s->log_end_it + HISTORY_LEN - s->log_begin_it) % HISTORY_LEN); + VLOG_WARN_RL(&latency_rl, + "Neighborhood of suspicious iteration:\n" + "%s", ds_cstr(&log)); + ds_destroy(&log); + s->log_end_it = s->log_begin_it = UINT64_MAX; + if (log_it_before > 100 || log_it_after > 100) { + /* Reset to safe default values to avoid disturbance. 
*/ + log_it_before = LOG_IT_BEFORE; + log_it_after = LOG_IT_AFTER; + } +} + +void +pmd_perf_log_set_cmd(struct unixctl_conn *conn, + int argc, const char *argv[], + void *aux OVS_UNUSED) +{ + int it_before, it_after, us_thr, q_thr; + bool on; + bool usage = false; + + on = log_on; + it_before = log_it_before; + it_after = log_it_after; + q_thr = log_q_thr; + us_thr = log_us_thr; + + while (argc > 1) { + if (!strcmp(argv[1], "on")) { + on = true; + argc--; + argv++; + } else if (!strcmp(argv[1], "off")) { + on = false; + argc--; + argv++; + } else if (!strcmp(argv[1], "-a")) { + it_after = strtol(argv[2], NULL, 10); + if (it_after < 0) { + it_after = 0; + } else if (it_before + it_after > HISTORY_LEN-1) { + it_after = HISTORY_LEN-1 - it_before; + } + argc -= 2; + argv += 2; + } else if (!strcmp(argv[1], "-b")) { + it_before = strtol(argv[2], NULL, 10); + if (it_before < 0) { + it_before = 0; + } else if (it_before > HISTORY_LEN-10) { + it_before = HISTORY_LEN-10; + } + argc -= 2; + argv += 2; + } else if (!strcmp(argv[1], "-q")) { + q_thr = strtol(argv[2], NULL, 10); + if (q_thr < 0) { + q_thr = 0; + } + argc -= 2; + argv += 2; + } else if (!strcmp(argv[1], "-us")) { + us_thr = strtol(argv[2], NULL, 10); + if (us_thr < 0) { + us_thr = 0; + } + argc -= 2; + argv += 2; + } else { + usage = true; + break; + } + } + + if (usage) { + unixctl_command_reply_error(conn, + "Usage: ovs-appctl dpif-netdev/pmd-perf-log-set " + "[on|off] [-b before] [-a after] [-us usec] [-q qlen]"); + return; + } + + VLOG_INFO("pmd-perf-log-set: %s, before=%d, after=%d, us_thr=%d," + " q_thr=%d\n", + on ? 
"on" : "off", it_before, it_after, us_thr, q_thr); + log_on = on; + log_it_before = it_before; + log_it_after = it_after; + log_q_thr = q_thr; + log_us_thr = us_thr; + iter_cycle_threshold = (log_us_thr * get_tsc_hz()) / 1000000L; + + unixctl_command_reply(conn, ""); +} diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h index 3611723..392ca80 100644 --- a/lib/dpif-netdev-perf.h +++ b/lib/dpif-netdev-perf.h @@ -109,6 +109,8 @@ struct pmd_perf_stats { struct histogram max_vhost_qfill; struct history iterations; struct history milliseconds; + uint64_t log_begin_it; + uint64_t log_end_it; }; enum pmd_info_type; @@ -121,6 +123,10 @@ struct pmd_perf_params { size_t ms_hist_len; }; +extern bool log_on; +extern uint32_t log_q_thr; +extern uint64_t iter_cycle_threshold; + void pmd_perf_stats_init(struct pmd_perf_stats *s); void pmd_perf_stats_clear(struct pmd_perf_stats *s); void pmd_perf_read_counters(struct pmd_perf_stats *s, @@ -145,9 +151,17 @@ void pmd_perf_format_iteration_history(struct ds *str, void pmd_perf_format_ms_history(struct ds *str, struct pmd_perf_stats *s, int n_ms); +void pmd_perf_log_suspicious_iteration(struct pmd_perf_stats *s, + uint64_t cycles, + char *reason); +void pmd_perf_log_iteration_neighborhood(struct pmd_perf_stats *s); void pmd_perf_show(struct unixctl_conn *conn, int argc, const char *argv[], void *aux OVS_UNUSED); +void pmd_perf_log_set_cmd(struct unixctl_conn *conn, + int argc, const char *argv[], + void *aux OVS_UNUSED); + static inline void histogram_add_sample(struct histogram *hist, uint32_t val) { @@ -210,6 +224,7 @@ pmd_perf_end_iteration(struct pmd_perf_stats *s, uint64_t now_tsc, { struct iter_stats *cum_ms; uint64_t cycles, cycles_per_pkt = 0; + char *reason = NULL; if (OVS_UNLIKELY(s->current.timestamp == 0)) { /* Stats were cleared during the ongoing iteration. 
*/ @@ -263,6 +278,23 @@ pmd_perf_end_iteration(struct pmd_perf_stats *s, uint64_t now_tsc, cum_ms->batches += s->current.batches; cum_ms->max_vhost_qfill += s->current.max_vhost_qfill; + if (log_on) { + /* Log suspicious iterations. */ + if (cycles > iter_cycle_threshold) { + reason = "Excessive total cycles"; + } else if (s->current.max_vhost_qfill >= log_q_thr) { + reason = "Vhost RX queue full"; + } + if (OVS_UNLIKELY(reason)) { + pmd_perf_log_suspicious_iteration(s, cycles, reason); + } + + /* Log iteration stats interval when required. */ + if (OVS_UNLIKELY(s->log_end_it == s->iterations.idx)) { + pmd_perf_log_iteration_neighborhood(s); + } + } + /* Store in iteration history. */ history_store(&s->iterations, &s->current); if (now_tsc - s->last_tsc > 10000) { diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 8f64df3..96cc4d5 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -1201,6 +1201,11 @@ dpif_netdev_init(void) unixctl_command_register("dpif-netdev/pmd-rxq-rebalance", "[dp]", 0, 1, dpif_netdev_pmd_rebalance, NULL); + unixctl_command_register("dpif-netdev/pmd-perf-log-set", + "[on|off] [-b before] [-a after] " + "[-us usec] [-q qlen]", + 0, 10, pmd_perf_log_set_cmd, + NULL); return 0; }