From patchwork Wed Nov 22 12:11:45 2017
X-Patchwork-Submitter: Ilya Maximets
X-Patchwork-Id: 840392
X-Patchwork-Delegate: ian.stokes@intel.com
From: Ilya Maximets
To: ovs-dev@openvswitch.org, Ben Pfaff, Bhanuprakash Bodireddy
Cc: Ilya Maximets, Heetae Ahn
Date: Wed, 22 Nov 2017 15:11:45 +0300
Message-id: <1511352705-2125-1-git-send-email-i.maximets@samsung.com>
X-Mailer: git-send-email 2.7.4
Subject: [ovs-dev] [PATCH] Revert "dpif_netdev: Refactor dp_netdev_pmd_thread structure."

This reverts commit a807c15796ddc43ba1ffb2a6b0bd2ad4e2b73941.

Padding and alignment of the dp_netdev_pmd_thread structure members is
useless, broken in several ways, and only greatly degrades maintainability
and extensibility of the structure.

Issues:

  1. It doesn't work, because all instances of struct dp_netdev_pmd_thread
     are allocated with plain malloc and the memory is not aligned to
     cache lines, so the structure almost never starts at an aligned
     address. This means that any further padding and alignment inside
     the structure is completely useless.
     For example:

         Breakpoint 1, pmd_thread_main
         (gdb) p pmd
         $49 = (struct dp_netdev_pmd_thread *) 0x1b1af20
         (gdb) p &pmd->cacheline1
         $51 = (OVS_CACHE_LINE_MARKER *) 0x1b1af60
         (gdb) p &pmd->cacheline0
         $52 = (OVS_CACHE_LINE_MARKER *) 0x1b1af20
         (gdb) p &pmd->flow_cache
         $53 = (struct emc_cache *) 0x1b1afe0

     All of the above addresses are shifted 32 bytes past the start of a
     cache line.

     Can we fix this properly? No. OVS currently has no suitable API for
     allocating aligned memory. The best candidate is 'xmalloc_cacheline()',
     but it clearly states that "The memory returned will not be at the
     start of a cache line, though, so don't assume such alignment", and it
     will never return aligned memory on Windows or MacOS. (A small
     standalone sketch of this issue is included after the diffstat below.)

  2. CACHE_LINE_SIZE is not a constant. Different architectures have
     different cache line sizes, but the code assumes that CACHE_LINE_SIZE
     is always 64 bytes: all structure members are grouped by 64 bytes and
     padded to CACHE_LINE_SIZE. This creates huge holes in the structure
     whenever CACHE_LINE_SIZE differs from 64, which is the opposite of
     portability. If I want good cmap performance, I need CACHE_LINE_SIZE
     to match the real cache line size, but then I get huge holes in the
     structures. struct rte_mbuf in DPDK uses two defines,
     RTE_CACHE_LINE_SIZE and RTE_CACHE_LINE_MIN_SIZE, precisely to avoid
     such holes in the mbuf structure.

  3. The sizes of system/libc-defined types are not the same on all
     systems. For example, sizeof(pthread_mutex_t) == 48 on my ARMv8
     machine, but only 40 on x86. The difference could be much bigger on
     Windows or MacOS. Yet the code assumes that sizeof(struct ovs_mutex)
     is always 48 bytes, which can lead to broken alignment, big holes from
     padding, and wrong comments about the number of free pad bytes. (The
     second sketch after the diffstat below illustrates issues 2 and 3.)

  4. The sizes of many fields in the structure depend on defines such as
     DP_N_STATS, PMD_N_CYCLES, EM_FLOW_HASH_ENTRIES and so on. Any change
     to these defines, or to any structure embedded in the thread
     structure, requires a non-trivial refactoring of the whole
     dp_netdev_pmd_thread structure. This greatly reduces maintainability
     and complicates the development of new features.

  5. There is no reason to align the flow_cache member: it is too big, and
     we usually access random entries in it from a single thread only.

So, the padding/alignment only creates the appearance of a performance
optimization but does nothing useful in reality. It only complicates
maintenance and adds huge holes on non-x86 architectures and non-Linux
systems. The performance improvement stated in the original commit message
is most likely noise; I see no performance difference.

Most of the above issues also apply to other padded/aligned structures such
as 'struct netdev_dpdk'. They will be treated separately.

CC: Bhanuprakash Bodireddy
CC: Ben Pfaff
Signed-off-by: Ilya Maximets
Acked-by: Jan Scheurich
---
 lib/dpif-netdev.c | 160 +++++++++++++++++++++++-------------------------------
 1 file changed, 69 insertions(+), 91 deletions(-)
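The alignment problem from issue 1 can be reproduced outside of OVS. Below
is a minimal, self-contained sketch; 'struct padded_pmd' and
FAKE_CACHE_LINE_SIZE are made-up stand-ins for the real structure and
macro, not OVS code. It pads members to 64-byte boundaries and then checks
whether a plain malloc'ed instance actually starts on a 64-byte boundary.
On common 64-bit ABIs malloc only guarantees 16-byte alignment, so the base
offset is frequently non-zero and the internal padding buys nothing:

    /*
     * Standalone sketch: internal cache-line padding is useless when the
     * allocation base itself is not cache-line aligned.
     */
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define FAKE_CACHE_LINE_SIZE 64

    struct padded_pmd {
        /* "cacheline0": a pointer plus padding up to 64 bytes. */
        void *dp;
        char pad0[FAKE_CACHE_LINE_SIZE - sizeof(void *)];
        /* "cacheline1": more state, again padded to 64 bytes. */
        uint32_t core_id;
        char pad1[FAKE_CACHE_LINE_SIZE - sizeof(uint32_t)];
    };

    int
    main(void)
    {
        struct padded_pmd *pmd = malloc(sizeof *pmd);
        uintptr_t base = (uintptr_t) pmd;

        /* offsetof(core_id) is a multiple of 64 thanks to pad0, but if the
         * allocation base is not 64-byte aligned, 'core_id' still does not
         * start a real cache line. */
        printf("allocation offset within a cache line: %" PRIuPTR " bytes\n",
               base % FAKE_CACHE_LINE_SIZE);
        printf("&core_id offset within a cache line:   %" PRIuPTR " bytes\n",
               ((uintptr_t) &pmd->core_id) % FAKE_CACHE_LINE_SIZE);

        free(pmd);
        return 0;
    }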
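Issues 2 and 3 can be demonstrated with a second standalone sketch. Again,
EXAMPLE_CACHE_LINE_SIZE and 'struct mutex_group' are illustrative stand-ins,
not the real CACHE_LINE_SIZE macro or 'struct ovs_mutex'. The pad size
depends both on the assumed cache line size and on sizeof(pthread_mutex_t),
so any hard-coded "N pad bytes" comment is only correct for one particular
platform:

    /*
     * Standalone sketch: the number of pad bytes in a "cache line sized"
     * group changes with the cache line size and with libc type sizes.
     */
    #include <pthread.h>
    #include <stdio.h>

    #ifndef EXAMPLE_CACHE_LINE_SIZE
    #define EXAMPLE_CACHE_LINE_SIZE 64   /* 128 on some ARM/POWER systems. */
    #endif

    /* One group holding a mutex, similar to how the reverted layout grouped
     * 'struct ovs_mutex' members and documented the leftover pad bytes. */
    struct mutex_group {
        pthread_mutex_t mutex;
        char pad[EXAMPLE_CACHE_LINE_SIZE > sizeof(pthread_mutex_t)
                 ? EXAMPLE_CACHE_LINE_SIZE - sizeof(pthread_mutex_t) : 1];
    };

    int
    main(void)
    {
        /* 40 bytes on x86_64/glibc, 48 on ARMv8, different again elsewhere,
         * so the pad byte count below changes from system to system. */
        printf("sizeof(pthread_mutex_t)   = %zu\n", sizeof(pthread_mutex_t));
        printf("pad bytes in mutex_group  = %zu\n",
               sizeof(struct mutex_group) - sizeof(pthread_mutex_t));
        return 0;
    }

Building it with, for example, 'cc -DEXAMPLE_CACHE_LINE_SIZE=128 sketch.c'
shows how the holes grow as soon as the assumed line size changes.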
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 0a62630..6784269 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -547,31 +547,18 @@ struct tx_port {
  * actions in either case.
  * */
 struct dp_netdev_pmd_thread {
-    PADDED_MEMBERS_CACHELINE_MARKER(CACHE_LINE_SIZE, cacheline0,
-        struct dp_netdev *dp;
-        struct cmap_node node;          /* In 'dp->poll_threads'. */
-        pthread_cond_t cond;            /* For synchronizing pmd thread
-                                           reload. */
-    );
-
-    PADDED_MEMBERS_CACHELINE_MARKER(CACHE_LINE_SIZE, cacheline1,
-        struct ovs_mutex cond_mutex;    /* Mutex for condition variable. */
-        pthread_t thread;
-        unsigned core_id;               /* CPU core id of this pmd thread. */
-        int numa_id;                    /* numa node id of this pmd thread. */
-    );
+    struct dp_netdev *dp;
+    struct ovs_refcount ref_cnt;    /* Every reference must be refcount'ed. */
+    struct cmap_node node;          /* In 'dp->poll_threads'. */
+
+    pthread_cond_t cond;            /* For synchronizing pmd thread reload. */
+    struct ovs_mutex cond_mutex;    /* Mutex for condition variable. */
 
     /* Per thread exact-match cache.  Note, the instance for cpu core
      * NON_PMD_CORE_ID can be accessed by multiple threads, and thusly
      * need to be protected by 'non_pmd_mutex'.  Every other instance
      * will only be accessed by its own pmd thread. */
-    OVS_ALIGNED_VAR(CACHE_LINE_SIZE) struct emc_cache flow_cache;
-    struct ovs_refcount ref_cnt;    /* Every reference must be refcount'ed. */
-
-    /* Queue id used by this pmd thread to send packets on all netdevs if
-     * XPS disabled for this netdev. All static_tx_qid's are unique and less
-     * than 'cmap_count(dp->poll_threads)'. */
-    uint32_t static_tx_qid;
+    struct emc_cache flow_cache;
 
     /* Flow-Table and classifiers
      *
@@ -580,77 +567,68 @@ struct dp_netdev_pmd_thread {
      * 'flow_mutex'. */
     struct ovs_mutex flow_mutex;
 
-    PADDED_MEMBERS(CACHE_LINE_SIZE,
-        struct cmap flow_table OVS_GUARDED; /* Flow table. */
-
-        /* One classifier per in_port polled by the pmd */
-        struct cmap classifiers;
-        /* Periodically sort subtable vectors according to hit frequencies */
-        long long int next_optimization;
-        /* End of the next time interval for which processing cycles
-           are stored for each polled rxq. */
-        long long int rxq_next_cycle_store;
-
-        /* Cycles counters */
-        struct dp_netdev_pmd_cycles cycles;
-
-        /* Used to count cycles. See 'cycles_counter_end()'. */
-        unsigned long long last_cycles;
-        struct latch exit_latch;        /* For terminating the pmd thread. */
-    );
-
-    PADDED_MEMBERS(CACHE_LINE_SIZE,
-        /* Statistics. */
-        struct dp_netdev_pmd_stats stats;
-
-        struct seq *reload_seq;
-        uint64_t last_reload_seq;
-        atomic_bool reload;             /* Do we need to reload ports? */
-        bool isolated;
-
-        /* Set to true if the pmd thread needs to be reloaded. */
-        bool need_reload;
-        /* 5 pad bytes. */
-    );
-
-    PADDED_MEMBERS(CACHE_LINE_SIZE,
-        struct ovs_mutex port_mutex;    /* Mutex for 'poll_list'
-                                           and 'tx_ports'. */
-        /* 16 pad bytes. */
-    );
-    PADDED_MEMBERS(CACHE_LINE_SIZE,
-        /* List of rx queues to poll. */
-        struct hmap poll_list OVS_GUARDED;
-        /* Map of 'tx_port's used for transmission.  Written by the main
-         * thread, read by the pmd thread. */
-        struct hmap tx_ports OVS_GUARDED;
-    );
-    PADDED_MEMBERS(CACHE_LINE_SIZE,
-        /* These are thread-local copies of 'tx_ports'.  One contains only
-         * tunnel ports (that support push_tunnel/pop_tunnel), the other
-         * contains ports with at least one txq (that support send).
-         * A port can be in both.
-         *
-         * There are two separate maps to make sure that we don't try to
-         * execute OUTPUT on a device which has 0 txqs or PUSH/POP on a
-         * non-tunnel device.
-         *
-         * The instances for cpu core NON_PMD_CORE_ID can be accessed by
-         * multiple threads and thusly need to be protected by 'non_pmd_mutex'.
-         * Every other instance will only be accessed by its own pmd thread. */
-        struct hmap tnl_port_cache;
-        struct hmap send_port_cache;
-    );
-
-    PADDED_MEMBERS(CACHE_LINE_SIZE,
-        /* Only a pmd thread can write on its own 'cycles' and 'stats'.
-         * The main thread keeps 'stats_zero' and 'cycles_zero' as base
-         * values and subtracts them from 'stats' and 'cycles' before
-         * reporting to the user */
-        unsigned long long stats_zero[DP_N_STATS];
-        uint64_t cycles_zero[PMD_N_CYCLES];
-        /* 8 pad bytes. */
-    );
+    struct cmap flow_table OVS_GUARDED; /* Flow table. */
+
+    /* One classifier per in_port polled by the pmd */
+    struct cmap classifiers;
+    /* Periodically sort subtable vectors according to hit frequencies */
+    long long int next_optimization;
+    /* End of the next time interval for which processing cycles
+       are stored for each polled rxq. */
+    long long int rxq_next_cycle_store;
+
+    /* Statistics. */
+    struct dp_netdev_pmd_stats stats;
+
+    /* Cycles counters */
+    struct dp_netdev_pmd_cycles cycles;
+
+    /* Used to count cicles. See 'cycles_counter_end()' */
+    unsigned long long last_cycles;
+
+    struct latch exit_latch;        /* For terminating the pmd thread. */
+    struct seq *reload_seq;
+    uint64_t last_reload_seq;
+    atomic_bool reload;             /* Do we need to reload ports? */
+    pthread_t thread;
+    unsigned core_id;               /* CPU core id of this pmd thread. */
+    int numa_id;                    /* numa node id of this pmd thread. */
+    bool isolated;
+
+    /* Queue id used by this pmd thread to send packets on all netdevs if
+     * XPS disabled for this netdev. All static_tx_qid's are unique and less
+     * than 'cmap_count(dp->poll_threads)'. */
+    uint32_t static_tx_qid;
+
+    struct ovs_mutex port_mutex;    /* Mutex for 'poll_list' and 'tx_ports'. */
+    /* List of rx queues to poll. */
+    struct hmap poll_list OVS_GUARDED;
+    /* Map of 'tx_port's used for transmission.  Written by the main thread,
+     * read by the pmd thread. */
+    struct hmap tx_ports OVS_GUARDED;
+
+    /* These are thread-local copies of 'tx_ports'.  One contains only tunnel
+     * ports (that support push_tunnel/pop_tunnel), the other contains ports
+     * with at least one txq (that support send).  A port can be in both.
+     *
+     * There are two separate maps to make sure that we don't try to execute
+     * OUTPUT on a device which has 0 txqs or PUSH/POP on a non-tunnel device.
+     *
+     * The instances for cpu core NON_PMD_CORE_ID can be accessed by multiple
+     * threads, and thusly need to be protected by 'non_pmd_mutex'.  Every
+     * other instance will only be accessed by its own pmd thread. */
+    struct hmap tnl_port_cache;
+    struct hmap send_port_cache;
+
+    /* Only a pmd thread can write on its own 'cycles' and 'stats'.
+     * The main thread keeps 'stats_zero' and 'cycles_zero' as base
+     * values and subtracts them from 'stats' and 'cycles' before
+     * reporting to the user */
+    unsigned long long stats_zero[DP_N_STATS];
+    uint64_t cycles_zero[PMD_N_CYCLES];
+
+    /* Set to true if the pmd thread needs to be reloaded. */
+    bool need_reload;
 };
 
 /* Interface to netdev-based datapath. */