From patchwork Thu Apr 12 15:32:12 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Scheurich X-Patchwork-Id: 897721 X-Patchwork-Delegate: ian.stokes@intel.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=ericsson.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=ericsson.com header.i=@ericsson.com header.b="Mw1P5R7B"; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=ericsson.com header.i=@ericsson.com header.b="iBccJ7gr"; dkim-atps=neutral Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 40MQ040Hl3z9s27 for ; Fri, 13 Apr 2018 01:32:51 +1000 (AEST) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id EE407D53; Thu, 12 Apr 2018 15:32:48 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id A279AD32 for ; Thu, 12 Apr 2018 15:32:47 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from sesbmg22.ericsson.net (sesbmg22.ericsson.net [193.180.251.48]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 92C3968B for ; Thu, 12 Apr 2018 15:32:43 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; d=ericsson.com; s=mailgw201801; c=relaxed/simple; q=dns/txt; i=@ericsson.com; t=1523547161; h=From:Sender:Reply-To:Subject:Date:Message-ID:To:CC:MIME-Version:Content-Type: Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:In-Reply-To:References:List-Id: List-Help:List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive; bh=6ZtdXpZrdNXmjiftxpfSIxa8ycOd8E5eyysFcx7IYkY=; b=Mw1P5R7BiIRC2IRWemjED3xfyAWW2u7NyR63jqSgz9G4nJaCC6tiFlP2+H/yKo3O nuEk1XrZrNLrrSVYbOOsstUKTkxU0bNSKiFbq9DXCUGY0iMhMIbIlpu7RhgdKjPw VeDbN6J/Ya5+cpYLVXgiKgoETDI6i5SjHfbDKwgBeIw=; X-AuditID: c1b4fb30-a81ff70000005e22-4c-5acf7c146827 Received: from ESESSHC022.ericsson.se (Unknown_Domain [153.88.183.84]) by sesbmg22.ericsson.net (Symantec Mail Security) with SMTP id B8.28.24098.41C7FCA5; Thu, 12 Apr 2018 17:32:36 +0200 (CEST) Received: from ESESSMB502.ericsson.se (153.88.183.163) by ESESSHC022.ericsson.se (153.88.183.84) with Microsoft SMTP Server (TLS) id 14.3.382.0; Thu, 12 Apr 2018 17:32:35 +0200 Received: from ESESSMB502.ericsson.se (153.88.183.163) by ESESSMB502.ericsson.se (153.88.183.163) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id 15.1.1034.26; Thu, 12 Apr 2018 17:32:35 +0200 Received: from EUR01-DB5-obe.outbound.protection.outlook.com (153.88.183.157) by ESESSMB502.ericsson.se (153.88.183.163) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id 15.1.1034.26 via Frontend Transport; Thu, 12 Apr 2018 17:32:34 +0200 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ericsson.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version; bh=QlJa0H8EZsY5qyKT7cnCrgxXLJO1VSSmAZecV413aKU=; b=iBccJ7graewNwIkXWHHxYK9bK3iNEp4/IY+OxF1yw6XnKVmcbdcDGnnsBY3r9v0k1AZNmcPkkpRM7KHtrRoyIuHEdrjwfYT+wxStn3hTJDBNjcnngtmd3VAJA+rjshA9zvWze3vjKiBH+P2hPo+Dn/R+4urb+ZcST41relo0+hE= Authentication-Results: spf=none (sender IP is ) smtp.mailfrom=jan.scheurich@ericsson.com; Received: from ubuntu.eed.ericsson.se (129.192.10.2) by DB3PR07MB0650.eurprd07.prod.outlook.com (2a01:111:e400:943d::16) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.675.4; Thu, 12 Apr 2018 15:32:32 +0000 From: Jan Scheurich To: Date: Thu, 12 Apr 2018 17:32:12 +0200 Message-ID: <1523547133-14055-3-git-send-email-jan.scheurich@ericsson.com> X-Mailer: git-send-email 1.9.1 In-Reply-To: <1523547133-14055-1-git-send-email-jan.scheurich@ericsson.com> References: <1523547133-14055-1-git-send-email-jan.scheurich@ericsson.com> MIME-Version: 1.0 X-Originating-IP: [129.192.10.2] X-ClientProxiedBy: HE1PR05CA0381.eurprd05.prod.outlook.com (2603:10a6:7:94::40) To DB3PR07MB0650.eurprd07.prod.outlook.com (2a01:111:e400:943d::16) X-MS-PublicTrafficType: Email X-Microsoft-Antispam: UriScan:; BCL:0; PCL:0; RULEID:(7020095)(4652020)(5600026)(4534165)(4627221)(201703031133081)(201702281549075)(2017052603328)(7153060)(7193020); SRVR:DB3PR07MB0650; X-Microsoft-Exchange-Diagnostics: 1; DB3PR07MB0650; 3:0iKbNCy7NY69Tl/oHE9+ko84l0inOeFcJrXdHqljFlMv0GJeBZykNKcs06rAT4SuKPfnEMf6QfZ0reYX6ZgBWHc3mRsobaIb+tnIeAm654oM1FLZgHswzbajXpCGVc9anKHVWdtLbh+JQCpF0v3gcUice7QHBt+PpAV46ntaDZIh/CB4jo4vCV6K1eZzkIu9hhcL8zWZ9t3iGeP9W6gMzdi+Md6pUz3xP+OFGnw1M4j3f9srP/Ya0hep1ebq0rtr; 25:6+FCSNPExQ+dmCUcptfRJ4B4GBrjNL5kLMCFh0oCfRT+cJqP1gnWPKhzTa8OIF077Zwp7BZz6B0Ai0LSuxbbO5JCPJ8ij1rFtHfgHYZbSKD4YzB3HRAh37VK+aCffCW9cCuwPKVWvxgMrLLk99M2oVAac1BTcZmoTI4X6OS0HXJCXOLw+QtOh4u5LcJ9lvmE3JhU3ZsgpG+dRENC9Vgd+ZAaGQdr4RRvluvCnss2KZb0FCk90rqSXbTWt35rKVS5CTQjmItPGEwo0XAIN9s4EQBZrHg0Nns0T/sLI0q+ptndYHdPX/+OXkChwYXOxHFsapEvpPSXG/tF5zOw0WjBPw==; 31:sxVbBnVwheFkzQiHDA7k+l4oZWh7qct9wBpEJe1HVk4VaiidljXANgNYVfD4j7YOAe6VK8nrWd+jJJBjEv508VeCY/0RnlXFCoumHLplbzLTpziycs++RGEx95vXKkITdVnp3+b1w/bzwNp84q32bCKBCpRC/j5+VFSRJR9fvLxZIhvwJ9UiZIbDLQWto/q/3u2O5HgjUFn2ZpDe4uMNyOWiwK1G0n8Vi7oNJvYvZwk= X-MS-TrafficTypeDiagnostic: DB3PR07MB0650: X-Microsoft-Exchange-Diagnostics: 1; DB3PR07MB0650; 20:olo6InMgdKYk9qwGkN95Ywc2R2IJFPAHCa8PzSNDDQ01bLq7MPhtByImUtnsxKEshmcXpX0yjKyUbd8pFBpAcPfF2bBgNDgIo5+sgPleYGhpYEL9eFPL14HpXHd+Fz+b9pJ/go9CGCvcG1kdFbu2dL03YzUECBKybTmNlFNF4UY09/AH8HPWYcLWjtAKTzHLzFTEG9sBKZ8ffadhvNobTgiokK/NRpkPqmBY27JyLNEcsQArlEwjMBAC3n3wa2CsyJm0rkt9sIFta5EyUvwneaAqlt6Pot4YdYOCzdmKQzjpwhbZ+ujBUIOI2FmozPrsZ2zoxImQRGh60+wE/OWag5UxZJqmCuBUiMWwmiRI4GRk1w6AeDZN9sZT7WvK0lJXkm0FGEgYmb5/tEjmYpa6XAZC1twcDhOJOmOSsaAP4uQlRtyzlpXo1YPLsokhBpFCNtnHPEA/P5D3dUFCT23L+N5hsGdgSV5cg58JUtr+PnSQh5NPJVBqcoIEECCwmONc; 4:z8jFwW8yhoxgLTq1NnqB/T38hRJDX/eoEAcsgicUpW9EDq+wa0mOl1vKuBdpH/RVVaU8qJOXiJL7v6ByGNBIj7tBfkVhWwCWoX7ZkNv1ypz0NEHvrP+KvSQiQRwhtclx6c+y5PBNIjYaeMM0L3t4LUI1gsigY1+AWnr5I/MqBHc6oyrfYUR0YUqahYdQrRO5DqaGxlsxlukQQgN9EHm95FI9l3MuKwKSIAu8syVzbO2C1TXcm8lYBd8606BwAFXF+k0z7aAdxP43eyw1utZlzsZNlVNP0RLmM3ZN5VnFea6bZ2S5v67YbtQY2w/Z/PHyAtDWNy9hoV0zP2+vfIC0vG9ieWrfDqzepW7hbbuIl/0= X-Microsoft-Antispam-PRVS: X-Exchange-Antispam-Report-Test: UriScan:(37575265505322)(228905959029699); X-Exchange-Antispam-Report-CFA-Test: BCL:0; PCL:0; RULEID:(8211001083)(6040522)(2401047)(8121501046)(5005006)(3002001)(10201501046)(3231221)(944501327)(52105095)(93006095)(93001095)(6041310)(201703131423095)(201702281528075)(20161123555045)(201703061421075)(201703061406153)(20161123560045)(20161123564045)(20161123562045)(20161123558120)(6072148)(201708071742011); SRVR:DB3PR07MB0650; BCL:0; PCL:0; RULEID:; SRVR:DB3PR07MB0650; X-Forefront-PRVS: 06400060E1 X-Forefront-Antispam-Report: SFV:NSPM; SFS:(10009020)(39380400002)(396003)(39860400002)(366004)(346002)(376002)(189003)(199004)(186003)(6486002)(16586007)(11346002)(52116002)(316002)(7736002)(956004)(386003)(305945005)(2361001)(47776003)(50466002)(59450400001)(6506007)(76176011)(2351001)(2616005)(476003)(16526019)(26005)(8936002)(44832011)(48376002)(105586002)(86362001)(6916009)(575784001)(36756003)(50226002)(8676002)(478600001)(51416003)(551984002)(3846002)(53936002)(4326008)(53946003)(446003)(6512007)(6116002)(81156014)(486006)(81166006)(66066001)(6666003)(5660300001)(97736004)(68736007)(106356001)(2906002)(25786009)(107886003)(579004)(559001); DIR:OUT; SFP:1101; SCL:1; SRVR:DB3PR07MB0650; H:ubuntu.eed.ericsson.se; FPR:; SPF:None; LANG:en; PTR:InfoNoRecords; MX:1; A:1; Received-SPF: None (protection.outlook.com: ericsson.com does not designate permitted sender hosts) X-Microsoft-Exchange-Diagnostics: =?us-ascii?Q?1; DB3PR07MB0650; 23:Ege7pOwAb7No8JPLCifOQjvqJ5Sm/hNWF6+e4VBtk?= YSfs1Kunp6iUp1/ocfuasVXtyLThQi07nLWuOaBXEODYZsvdmlJiOQtF1YfOSzH7afGclPVR0z0Up9Aq/WZe5aZ8DKSN2ptg9cTWoVfcR3b9uJLp4Xc7PKmu2OiyCHuzueBYXV1bmVKGUEQG+KZthWFwz/eHzwVg5woCAoDzWetzccIIMr/Nuv8RSGmyChW7QHWloTSy+15CT+llmOPgR45FGbZPvBbsM0GwAB0hlLmajj4hHrYD+TJovKXhgXewFoXsyopXHTZIoUSgcLRWMH+Y/TSRa1w331qhTM3SVvbjPgy1YWP7pq9nUs8p8FS1rc6tb6g+ffXhPsJ6ORUXtPKPjxkNzggEzhHSZxvynnAFDUcfRfarxrnrUorfz5biXHJh0z/8ZUamLJ2JZLVZSS7K/vnxrrWnhbHYNcCTD3rhKTFokPH+b9W6aO2TgmwZpNyBMePowgYruM+WIquv80HG8edcyqYJw0QztpOXg2bJZO3SqzVA9ndmVaFw9laGy397aAKC+Tlj2VCa8dMrdwo3jCWElCBBCalvvfQ72LrAmjjWZO+oXd/hh5BbPwbDSMFgVRHMh7X0gzDCnSOow0ub/xCcihtMTWZ74sYv1F9Px/lfPlmkVOZBh4INdrXYYU+BOUqSqG9qgaWPcbE8YJd04Bn8fxdcxpJLXuF/ZJxgW2CKRa2UNJQBGug0bZIPzhRis2qnBlhIKaIkHotij1IPRXwUru5OKw/T3PZWfH0odRIjhMKGjCdgngVzIJgIRK/JQncfcO0T8CRiLxUri5LwzQZjJFH36nQ9ZO9lTnQ0SpOPMjwZF0J+9EhyhIBpDkMQT10QceRkTPC4XqcFX6Goy629JthwyxbfmTUWdMn9nkgEA9lm2JVVC9LHTu7NwCvJvVOqzKwj67oaIiT6lxDp7xhot1P2K6c0BsiSZk0RkhwH1B3j+FMhTFa7pRXNb8YWwg57N58nIf3phitqncbfUKiCzKgwAQ1SU+4TqjsVaJpkCPOj3kqkVnvlRxWS+mdwDCNoisVSxkk50JB9siRcn0bl2dcpuXtX3Yyph+dmHDcZUZWXeHqpL0AZKMgi9YUYXGRfTHwglpfCSNCuJGh6L1LJHXkYVryTmaJGz2eyStGCsgdJ50QQYpb3QKWV4rlZ7JokIyx/U8pMhHRFgdAUHwUW9kHpVGLWmeiJlaQcYAutWqB64CX8eVxPrUOvHfe6H6/TXcKaFzCcjSYrc5Bh6aYj5pgWyXpgUyflr0jMYQuEcsjQJ3tEgo4op3lvwmKFPL9nxGZPoQLnsbqj+L3c+gkt3iSMcAjhp+XSp3WVf2ITqG6Uuy1bGvYJ8u8y3U= X-Microsoft-Antispam-Message-Info: k7L0lAJM76NSsmDoibSLfile2hHdpuyeSgF0kI5I+dej6Ws6hlKyE55zMjXXqCWG8NTABpKa7tV1JShrniwjA0qp9DAgjRzjti5F/ByndYu3PRHA5l/lrczESlzEt6FCMz7r+V7yI6VxsWiUHi+4F/ToIzC5YU9hGjWl11IwkWQpu9NKhuNbQA2GcZh9/Smx X-Microsoft-Exchange-Diagnostics: 1; DB3PR07MB0650; 6:7Dm/KU6fMdVRZzj6f/mvFvToiH4aY12GI2T/oLtfuSIlzg/ysxOyoVx9ZiqR2eQG36l/4k2/SNpe7DZCZ6ez6XuVqJU1i5JcD6CVMvuAAT/xpqODHpcGeHtzQyzGOY66+uswFD/aqQFz8tJdGFPA4xro8dLw9dZjIbIOKJmGGMfgX6OeZ6nEWaENAahXSGiIOvO6Me2fTRFsnwR7+8KACuAAu1yBzif8iO82tmD/zJiOk/+z/Rkw4xQwJ604+6bR8r4jGLY5v6b3NdOjAXtO2jgSJ5rXbFrBevMbvIr8rlLIEveMIz1F7rv+SlUglv8mOAlFSKdjIKgGiYUA7Luy2kjDUNDeEv7MDnMClPXlUQ6l16qaFOv/dzSF3d7hnl+ZuFOkLRWqmzRRAojDdk7uoESQKOBAO2kxLcrBjTOcC/0S0teVrWunAQXWG50KSd3219ZdlVCtLIuPrUhFnNpV/w==; 5:uFALmtfEXUmVJwJ/H4MY/TNVUn2EcZ3ibur0WHYFR/x+hcicT/K4bsyjZEUhHHmAgCkymQPeon1Zkp7lS3hUZwL0ibNFmEqsozaiLTaQpyDIa+ZeaXRN2vqQE9BeTHtyDiWFATmxXCbHXowMAVh67ZEu6YJN5n/nUG3rdeFE9Hk=; 24:Xp8Ns3cgWq4KGtU83MYHQVMuJypEOpXKN5TE8GDv7NGVhdSUYmKbwp+qrNpUT7nfdtBpU7Vsku7onl/67M28gngtJLdvPUP1/kmrzkYXChI= SpamDiagnosticOutput: 1:99 SpamDiagnosticMetadata: NSPM X-Microsoft-Exchange-Diagnostics: 1; DB3PR07MB0650; 7:K04ixuS76nCZ8KxOwGuwIYaKzg5XwWva3mXpp+5RmttnJOEFIGyz1tDNWZPsdCPE49pk66A1KFMLsKk3if6b8QM9ssonyEC2OWz6sfxby2etF9uNZLF3PhvTLxDpHh7OyTAssxw8QO8DdsrBk7fuCSxxh5qerTx3/3d2QrEkrn5Rp5Nwq3iIRODOXpsZ7qqfAU00MtjhA4ONAEXElpREMaPeEgATtkmHGIyJu65BicNAtdYoUKa8C7dagSi8h0gB X-MS-Office365-Filtering-Correlation-Id: 3cdb7bd1-23f2-4ff1-f8d6-08d5a08a9c45 X-MS-Exchange-CrossTenant-OriginalArrivalTime: 12 Apr 2018 15:32:32.2742 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 3cdb7bd1-23f2-4ff1-f8d6-08d5a08a9c45 X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 92e84ceb-fbfd-47ab-be52-080c6b87953f X-MS-Exchange-Transport-CrossTenantHeadersStamped: DB3PR07MB0650 X-OriginatorOrg: ericsson.com X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFupileLIzCtJLcpLzFFi42KZGbE9RFey5nyUQetNfYsda5cyWhw9vYfZ 4kr7T3aLjQ/PslqsPfSB3YHVY/Gel0wez27+Z/R4v+8qm0ffllWMASxRXDYpqTmZZalF+nYJ XBn9x78zFazZwFyx+4R9A+PsnUxdjJwcEgImEpf2PWLuYuTiEBI4wihxt2krG4SzhVHizK1L UJnvjBLvnz9khXCWMkmsvLsVzGER+MQkcXjSdhaQYYwCsRItW6cyQVR1MUnsffgSqIqDg03A QGL2bgeQGhEBSYl/i7eA7WAWmMoo0fz8G9glwgKeEofntbGC2CwCqhK/7rwHs3kFvCSmn37K DHGtnMTJY5PB4pwC3hKfNzcyg8wXAqqZv0YJolxQ4uTMJ2D3MAtISBx88QKqVUFi45aZLCB7 JQSmM0ocf34KbK+QgLrE30NfWSCKZCWOnp0DVbSNUaL72j52CKeBXeLB9avQMNOSuDzxHjuE vYRdomu+IYSdLfGq8x8jhO0t0fR7B5S9lFni6b0yCFtG4vSUjVBDG9kk1h3ewz6BUW8WktNn ITl9ASPTKkbR4tTipNx0IyO91KLM5OLi/Dy9vNSSTYzAFHJwy2+DHYwvnzseYhTgYFTi4V2b ez5KiDWxrLgy9xCjBAezkgjvHDugEG9KYmVValF+fFFpTmrxIUZpDhYlcV4Lv81RQgLpiSWp 2ampBalFMFkmDk6pBsaCFlUF0TeG7K5BYYJbVi1uOhU9QU1Irvn7xzW1HpMmvv91e6I3657k u3xhhaU3v/tffXQ+t/Km2mmf+0m2aaoXGBeGvprw5vKXKYdrmYv/Ml3c+1GO406EP6f3mw+7 KgVjPxq7qYeubWeU7jHeF9A39VP1mSClB5P82L+yc5lej7xRu0YxRkeJpTgj0VCLuag4EQAf B0/QHQMAAA== X-Spam-Status: No, score=-4.3 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Cc: i.maximets@samsung.com Subject: [ovs-dev] [PATCH v11 2/3] dpif-netdev: Detailed performance stats for PMDs X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org This patch instruments the dpif-netdev datapath to record detailed statistics of what is happening in every iteration of a PMD thread. The collection of detailed statistics can be controlled by a new Open_vSwitch configuration parameter "other_config:pmd-perf-metrics". By default it is disabled. The run-time overhead, when enabled, is in the order of 1%. The covered metrics per iteration are: - cycles - packets - (rx) batches - packets/batch - max. vhostuser qlen - upcalls - cycles spent in upcalls This raw recorded data is used threefold: 1. In histograms for each of the following metrics: - cycles/iteration (log.) - packets/iteration (log.) - cycles/packet - packets/batch - max. vhostuser qlen (log.) - upcalls - cycles/upcall (log) The histograms bins are divided linear or logarithmic. 2. A cyclic history of the above statistics for 999 iterations 3. A cyclic history of the cummulative/average values per millisecond wall clock for the last 1000 milliseconds: - number of iterations - avg. cycles/iteration - packets (Kpps) - avg. packets/batch - avg. max vhost qlen - upcalls - avg. cycles/upcall The gathered performance metrics can be printed at any time with the new CLI command ovs-appctl dpif-netdev/pmd-perf-show [-nh] [-it iter_len] [-ms ms_len] [-pmd core] [dp] The options are -nh: Suppress the histograms -it iter_len: Display the last iter_len iteration stats -ms ms_len: Display the last ms_len millisecond stats -pmd core: Display only the specified PMD The performance statistics are reset with the existing dpif-netdev/pmd-stats-clear command. The output always contains the following global PMD statistics, similar to the pmd-stats-show command: Time: 15:24:55.270 Measurement duration: 1.008 s pmd thread numa_id 0 core_id 1: Cycles: 2419034712 (2.40 GHz) Iterations: 572817 (1.76 us/it) - idle: 486808 (15.9 % cycles) - busy: 86009 (84.1 % cycles) Rx packets: 2399607 (2381 Kpps, 848 cycles/pkt) Datapath passes: 3599415 (1.50 passes/pkt) - EMC hits: 336472 ( 9.3 %) - Megaflow hits: 3262943 (90.7 %, 1.00 subtbl lookups/hit) - Upcalls: 0 ( 0.0 %, 0.0 us/upcall) - Lost upcalls: 0 ( 0.0 %) Tx packets: 2399607 (2381 Kpps) Tx batches: 171400 (14.00 pkts/batch) Signed-off-by: Jan Scheurich Acked-by: Billy O'Mahony --- NEWS | 4 + lib/automake.mk | 1 + lib/dpif-netdev-perf.c | 462 +++++++++++++++++++++++++++++++++++++++++++- lib/dpif-netdev-perf.h | 197 ++++++++++++++++--- lib/dpif-netdev-unixctl.man | 157 +++++++++++++++ lib/dpif-netdev.c | 187 ++++++++++++++++-- manpages.mk | 2 + vswitchd/ovs-vswitchd.8.in | 27 +-- vswitchd/vswitch.xml | 12 ++ 9 files changed, 985 insertions(+), 64 deletions(-) create mode 100644 lib/dpif-netdev-unixctl.man diff --git a/NEWS b/NEWS index 757d648..ff81a9f 100644 --- a/NEWS +++ b/NEWS @@ -19,6 +19,10 @@ Post-v2.9.0 * implemented icmp4/icmp6/tcp_reset actions in order to drop the packet and reply with a RST for TCP or ICMPv4/ICMPv6 unreachable message for other IPv4/IPv6-based protocols whenever a reject ACL rule is hit. + - Userspace datapath: + * Commands ovs-appctl dpif-netdev/pmd-*-show can now work on a single PMD + * Detailed PMD performance metrics available with new command + ovs-appctl dpif-netdev/pmd-perf-show v2.9.0 - 19 Feb 2018 -------------------- diff --git a/lib/automake.mk b/lib/automake.mk index 915a33b..3276aaa 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -491,6 +491,7 @@ MAN_FRAGMENTS += \ lib/dpctl.man \ lib/memory-unixctl.man \ lib/netdev-dpdk-unixctl.man \ + lib/dpif-netdev-unixctl.man \ lib/ofp-version.man \ lib/ovs.tmac \ lib/service.man \ diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c index f06991a..caa0e27 100644 --- a/lib/dpif-netdev-perf.c +++ b/lib/dpif-netdev-perf.c @@ -15,18 +15,333 @@ */ #include +#include +#include "dpif-netdev-perf.h" #include "openvswitch/dynamic-string.h" #include "openvswitch/vlog.h" -#include "dpif-netdev-perf.h" +#include "ovs-thread.h" #include "timeval.h" VLOG_DEFINE_THIS_MODULE(pmd_perf); +#ifdef DPDK_NETDEV +static uint64_t +get_tsc_hz(void) +{ + return rte_get_tsc_hz(); +} +#else +/* This function is only invoked from PMD threads which depend on DPDK. + * A dummy function is sufficient when building without DPDK_NETDEV. */ +static uint64_t +get_tsc_hz(void) +{ + return 1; +} +#endif + +/* Histogram functions. */ + +static void +histogram_walls_set_lin(struct histogram *hist, uint32_t min, uint32_t max) +{ + int i; + + ovs_assert(min < max); + for (i = 0; i < NUM_BINS-1; i++) { + hist->wall[i] = min + (i * (max - min)) / (NUM_BINS - 2); + } + hist->wall[NUM_BINS-1] = UINT32_MAX; +} + +static void +histogram_walls_set_log(struct histogram *hist, uint32_t min, uint32_t max) +{ + int i, start, bins, wall; + double log_min, log_max; + + ovs_assert(min < max); + if (min > 0) { + log_min = log(min); + log_max = log(max); + start = 0; + bins = NUM_BINS - 1; + } else { + hist->wall[0] = 0; + log_min = log(1); + log_max = log(max); + start = 1; + bins = NUM_BINS - 2; + } + wall = start; + for (i = 0; i < bins; i++) { + /* Make sure each wall is monotonically increasing. */ + wall = MAX(wall, exp(log_min + (i * (log_max - log_min)) / (bins-1))); + hist->wall[start + i] = wall++; + } + if (hist->wall[NUM_BINS-2] < max) { + hist->wall[NUM_BINS-2] = max; + } + hist->wall[NUM_BINS-1] = UINT32_MAX; +} + +uint64_t +histogram_samples(const struct histogram *hist) +{ + uint64_t samples = 0; + + for (int i = 0; i < NUM_BINS; i++) { + samples += hist->bin[i]; + } + return samples; +} + +static void +histogram_clear(struct histogram *hist) +{ + int i; + + for (i = 0; i < NUM_BINS; i++) { + hist->bin[i] = 0; + } +} + +static void +history_init(struct history *h) +{ + memset(h, 0, sizeof(*h)); +} + void pmd_perf_stats_init(struct pmd_perf_stats *s) { - memset(s, 0 , sizeof(*s)); + memset(s, 0, sizeof(*s)); + ovs_mutex_init(&s->stats_mutex); + ovs_mutex_init(&s->clear_mutex); + /* Logarithmic histogram for cycles/it ranging from 500 to 24M + * (corresponding to 200 ns to 9.6 ms at 2.5 GHz TSC clock). */ + histogram_walls_set_log(&s->cycles, 500, 24000000); + /* Logarithmic histogram for pkts/it ranging from 0 to 1000. */ + histogram_walls_set_log(&s->pkts, 0, 1000); + /* Linear histogram for cycles/pkt ranging from 100 to 30K. */ + histogram_walls_set_lin(&s->cycles_per_pkt, 100, 30000); + /* Linear histogram for pkts/rx batch ranging from 0 to 32, + * the maximum rx batch size in OVS. */ + histogram_walls_set_lin(&s->pkts_per_batch, 0, 32); + /* Linear histogram for upcalls/it ranging from 0 to 30. */ + histogram_walls_set_lin(&s->upcalls, 0, 30); + /* Logarithmic histogram for cycles/upcall ranging from 1000 to 1M + * (corresponding to 400 ns to 400 us at 2.5 GHz TSC clock). */ + histogram_walls_set_log(&s->cycles_per_upcall, 1000, 1000000); + /* Log. histogram for max vhostuser queue fill level from 0 to 512. + * 512 is the maximum fill level for a virtio queue with 1024 + * descriptors (maximum configurable length in Qemu), with the + * DPDK 17.11 virtio PMD in the guest. */ + histogram_walls_set_log(&s->max_vhost_qfill, 0, 512); + s->iteration_cnt = 0; + s->start_ms = time_msec(); +} + +void +pmd_perf_format_overall_stats(struct ds *str, struct pmd_perf_stats *s, + double duration) +{ + uint64_t stats[PMD_N_STATS]; + double us_per_cycle = 1000000.0 / get_tsc_hz(); + + if (duration == 0) { + return; + } + + pmd_perf_read_counters(s, stats); + uint64_t tot_cycles = stats[PMD_CYCLES_ITER_IDLE] + + stats[PMD_CYCLES_ITER_BUSY]; + uint64_t rx_packets = stats[PMD_STAT_RECV]; + uint64_t tx_packets = stats[PMD_STAT_SENT_PKTS]; + uint64_t tx_batches = stats[PMD_STAT_SENT_BATCHES]; + uint64_t passes = stats[PMD_STAT_RECV] + + stats[PMD_STAT_RECIRC]; + uint64_t upcalls = stats[PMD_STAT_MISS]; + uint64_t upcall_cycles = stats[PMD_CYCLES_UPCALL]; + uint64_t tot_iter = histogram_samples(&s->pkts); + uint64_t idle_iter = s->pkts.bin[0]; + uint64_t busy_iter = tot_iter >= idle_iter ? tot_iter - idle_iter : 0; + + ds_put_format(str, + " Cycles: %12"PRIu64" (%.2f GHz)\n" + " Iterations: %12"PRIu64" (%.2f us/it)\n" + " - idle: %12"PRIu64" (%4.1f %% cycles)\n" + " - busy: %12"PRIu64" (%4.1f %% cycles)\n", + tot_cycles, (tot_cycles / duration) / 1E9, + tot_iter, tot_cycles * us_per_cycle / tot_iter, + idle_iter, + 100.0 * stats[PMD_CYCLES_ITER_IDLE] / tot_cycles, + busy_iter, + 100.0 * stats[PMD_CYCLES_ITER_BUSY] / tot_cycles); + if (rx_packets > 0) { + ds_put_format(str, + " Rx packets: %12"PRIu64" (%.0f Kpps, %.0f cycles/pkt)\n" + " Datapath passes: %12"PRIu64" (%.2f passes/pkt)\n" + " - EMC hits: %12"PRIu64" (%4.1f %%)\n" + " - Megaflow hits: %12"PRIu64" (%4.1f %%, %.2f subtbl lookups/" + "hit)\n" + " - Upcalls: %12"PRIu64" (%4.1f %%, %.1f us/upcall)\n" + " - Lost upcalls: %12"PRIu64" (%4.1f %%)\n", + rx_packets, (rx_packets / duration) / 1000, + 1.0 * stats[PMD_CYCLES_ITER_BUSY] / rx_packets, + passes, rx_packets ? 1.0 * passes / rx_packets : 0, + stats[PMD_STAT_EXACT_HIT], + 100.0 * stats[PMD_STAT_EXACT_HIT] / passes, + stats[PMD_STAT_MASKED_HIT], + 100.0 * stats[PMD_STAT_MASKED_HIT] / passes, + stats[PMD_STAT_MASKED_HIT] + ? 1.0 * stats[PMD_STAT_MASKED_LOOKUP] / stats[PMD_STAT_MASKED_HIT] + : 0, + upcalls, 100.0 * upcalls / passes, + upcalls ? (upcall_cycles * us_per_cycle) / upcalls : 0, + stats[PMD_STAT_LOST], + 100.0 * stats[PMD_STAT_LOST] / passes); + } else { + ds_put_format(str, " Rx packets: %12d\n", 0); + } + if (tx_packets > 0) { + ds_put_format(str, + " Tx packets: %12"PRIu64" (%.0f Kpps)\n" + " Tx batches: %12"PRIu64" (%.2f pkts/batch)" + "\n", + tx_packets, (tx_packets / duration) / 1000, + tx_batches, 1.0 * tx_packets / tx_batches); + } else { + ds_put_format(str, " Tx packets: %12d\n\n", 0); + } +} + +void +pmd_perf_format_histograms(struct ds *str, struct pmd_perf_stats *s) +{ + int i; + + ds_put_cstr(str, "Histograms\n"); + ds_put_format(str, + " %-21s %-21s %-21s %-21s %-21s %-21s %-21s\n", + "cycles/it", "packets/it", "cycles/pkt", "pkts/batch", + "max vhost qlen", "upcalls/it", "cycles/upcall"); + for (i = 0; i < NUM_BINS-1; i++) { + ds_put_format(str, + " %-9d %-11"PRIu64" %-9d %-11"PRIu64" %-9d %-11"PRIu64 + " %-9d %-11"PRIu64" %-9d %-11"PRIu64" %-9d %-11"PRIu64 + " %-9d %-11"PRIu64"\n", + s->cycles.wall[i], s->cycles.bin[i], + s->pkts.wall[i],s->pkts.bin[i], + s->cycles_per_pkt.wall[i], s->cycles_per_pkt.bin[i], + s->pkts_per_batch.wall[i], s->pkts_per_batch.bin[i], + s->max_vhost_qfill.wall[i], s->max_vhost_qfill.bin[i], + s->upcalls.wall[i], s->upcalls.bin[i], + s->cycles_per_upcall.wall[i], s->cycles_per_upcall.bin[i]); + } + ds_put_format(str, + " %-9s %-11"PRIu64" %-9s %-11"PRIu64" %-9s %-11"PRIu64 + " %-9s %-11"PRIu64" %-9s %-11"PRIu64" %-9s %-11"PRIu64 + " %-9s %-11"PRIu64"\n", + ">", s->cycles.bin[i], + ">", s->pkts.bin[i], + ">", s->cycles_per_pkt.bin[i], + ">", s->pkts_per_batch.bin[i], + ">", s->max_vhost_qfill.bin[i], + ">", s->upcalls.bin[i], + ">", s->cycles_per_upcall.bin[i]); + if (s->totals.iterations > 0) { + ds_put_cstr(str, + "-----------------------------------------------------" + "-----------------------------------------------------" + "------------------------------------------------\n"); + ds_put_format(str, + " %-21s %-21s %-21s %-21s %-21s %-21s %-21s\n", + "cycles/it", "packets/it", "cycles/pkt", "pkts/batch", + "vhost qlen", "upcalls/it", "cycles/upcall"); + ds_put_format(str, + " %-21"PRIu64" %-21.5f %-21"PRIu64 + " %-21.5f %-21.5f %-21.5f %-21"PRIu32"\n", + s->totals.cycles / s->totals.iterations, + 1.0 * s->totals.pkts / s->totals.iterations, + s->totals.pkts + ? s->totals.busy_cycles / s->totals.pkts : 0, + s->totals.batches + ? 1.0 * s->totals.pkts / s->totals.batches : 0, + 1.0 * s->totals.max_vhost_qfill / s->totals.iterations, + 1.0 * s->totals.upcalls / s->totals.iterations, + s->totals.upcalls + ? s->totals.upcall_cycles / s->totals.upcalls : 0); + } +} + +void +pmd_perf_format_iteration_history(struct ds *str, struct pmd_perf_stats *s, + int n_iter) +{ + struct iter_stats *is; + size_t index; + int i; + + if (n_iter == 0) { + return; + } + ds_put_format(str, " %-17s %-10s %-10s %-10s %-10s " + "%-10s %-10s %-10s\n", + "iter", "cycles", "packets", "cycles/pkt", "pkts/batch", + "vhost qlen", "upcalls", "cycles/upcall"); + for (i = 1; i <= n_iter; i++) { + index = history_sub(s->iterations.idx, i); + is = &s->iterations.sample[index]; + ds_put_format(str, + " %-17"PRIu64" %-11"PRIu64" %-11"PRIu32 + " %-11"PRIu64" %-11"PRIu32" %-11"PRIu32 + " %-11"PRIu32" %-11"PRIu32"\n", + is->timestamp, + is->cycles, + is->pkts, + is->pkts ? is->cycles / is->pkts : 0, + is->batches ? is->pkts / is->batches : 0, + is->max_vhost_qfill, + is->upcalls, + is->upcalls ? is->upcall_cycles / is->upcalls : 0); + } +} + +void +pmd_perf_format_ms_history(struct ds *str, struct pmd_perf_stats *s, int n_ms) +{ + struct iter_stats *is; + size_t index; + int i; + + if (n_ms == 0) { + return; + } + ds_put_format(str, + " %-12s %-10s %-10s %-10s %-10s" + " %-10s %-10s %-10s %-10s\n", + "ms", "iterations", "cycles/it", "Kpps", "cycles/pkt", + "pkts/batch", "vhost qlen", "upcalls", "cycles/upcall"); + for (i = 1; i <= n_ms; i++) { + index = history_sub(s->milliseconds.idx, i); + is = &s->milliseconds.sample[index]; + ds_put_format(str, + " %-12"PRIu64" %-11"PRIu32" %-11"PRIu64 + " %-11"PRIu32" %-11"PRIu64" %-11"PRIu32 + " %-11"PRIu32" %-11"PRIu32" %-11"PRIu32"\n", + is->timestamp, + is->iterations, + is->iterations ? is->cycles / is->iterations : 0, + is->pkts, + is->pkts ? is->busy_cycles / is->pkts : 0, + is->batches ? is->pkts / is->batches : 0, + is->iterations + ? is->max_vhost_qfill / is->iterations : 0, + is->upcalls, + is->upcalls ? is->upcall_cycles / is->upcalls : 0); + } } void @@ -51,10 +366,151 @@ pmd_perf_read_counters(struct pmd_perf_stats *s, } } +/* This function clears the PMD performance counters from within the PMD + * thread or from another thread when the PMD thread is not executing its + * poll loop. */ void -pmd_perf_stats_clear(struct pmd_perf_stats *s) +pmd_perf_stats_clear_lock(struct pmd_perf_stats *s) + OVS_REQUIRES(s->stats_mutex) { + ovs_mutex_lock(&s->clear_mutex); for (int i = 0; i < PMD_N_STATS; i++) { atomic_read_relaxed(&s->counters.n[i], &s->counters.zero[i]); } + /* The following stats are only applicable in PMD thread and */ + memset(&s->current, 0 , sizeof(struct iter_stats)); + memset(&s->totals, 0 , sizeof(struct iter_stats)); + histogram_clear(&s->cycles); + histogram_clear(&s->pkts); + histogram_clear(&s->cycles_per_pkt); + histogram_clear(&s->upcalls); + histogram_clear(&s->cycles_per_upcall); + histogram_clear(&s->pkts_per_batch); + histogram_clear(&s->max_vhost_qfill); + history_init(&s->iterations); + history_init(&s->milliseconds); + s->start_ms = time_msec(); + s->milliseconds.sample[0].timestamp = s->start_ms; + /* Clearing finished. */ + s->clear = false; + ovs_mutex_unlock(&s->clear_mutex); +} + +/* This function can be called from the anywhere to clear the stats + * of PMD and non-PMD threads. */ +void +pmd_perf_stats_clear(struct pmd_perf_stats *s) +{ + if (ovs_mutex_trylock(&s->stats_mutex) == 0) { + /* Locking successful. PMD not polling. */ + pmd_perf_stats_clear_lock(s); + ovs_mutex_unlock(&s->stats_mutex); + } else { + /* Request the polling PMD to clear the stats. There is no need to + * block here as stats retrieval is prevented during clearing. */ + s->clear = true; + } +} + +/* Functions recording PMD metrics per iteration. */ + +inline void +pmd_perf_start_iteration(struct pmd_perf_stats *s) +{ + if (s->clear) { + /* Clear the PMD stats before starting next iteration. */ + pmd_perf_stats_clear_lock(s); + } + s->iteration_cnt++; + /* Initialize the current interval stats. */ + memset(&s->current, 0, sizeof(struct iter_stats)); + if (OVS_LIKELY(s->last_tsc)) { + /* We assume here that last_tsc was updated immediately prior at + * the end of the previous iteration, or just before the first + * iteration. */ + s->start_tsc = s->last_tsc; + } else { + /* In case last_tsc has never been set before. */ + s->start_tsc = cycles_counter_update(s); + } +} + +inline void +pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets, + int tx_packets, bool full_metrics) +{ + uint64_t now_tsc = cycles_counter_update(s); + struct iter_stats *cum_ms; + uint64_t cycles, cycles_per_pkt = 0; + + cycles = now_tsc - s->start_tsc; + s->current.timestamp = s->iteration_cnt; + s->current.cycles = cycles; + s->current.pkts = rx_packets; + + if (rx_packets + tx_packets > 0) { + pmd_perf_update_counter(s, PMD_CYCLES_ITER_BUSY, cycles); + } else { + pmd_perf_update_counter(s, PMD_CYCLES_ITER_IDLE, cycles); + } + /* Add iteration samples to histograms. */ + histogram_add_sample(&s->cycles, cycles); + histogram_add_sample(&s->pkts, rx_packets); + + if (!full_metrics) { + return; + } + + s->counters.n[PMD_CYCLES_UPCALL] += s->current.upcall_cycles; + + if (rx_packets > 0) { + cycles_per_pkt = cycles / rx_packets; + histogram_add_sample(&s->cycles_per_pkt, cycles_per_pkt); + } + if (s->current.batches > 0) { + histogram_add_sample(&s->pkts_per_batch, + rx_packets / s->current.batches); + } + histogram_add_sample(&s->upcalls, s->current.upcalls); + if (s->current.upcalls > 0) { + histogram_add_sample(&s->cycles_per_upcall, + s->current.upcall_cycles / s->current.upcalls); + } + histogram_add_sample(&s->max_vhost_qfill, s->current.max_vhost_qfill); + + /* Add iteration samples to millisecond stats. */ + cum_ms = history_current(&s->milliseconds); + cum_ms->iterations++; + cum_ms->cycles += cycles; + if (rx_packets > 0) { + cum_ms->busy_cycles += cycles; + } + cum_ms->pkts += s->current.pkts; + cum_ms->upcalls += s->current.upcalls; + cum_ms->upcall_cycles += s->current.upcall_cycles; + cum_ms->batches += s->current.batches; + cum_ms->max_vhost_qfill += s->current.max_vhost_qfill; + + /* Store in iteration history. This advances the iteration idx and + * clears the next slot in the iteration history. */ + history_store(&s->iterations, &s->current); + if (now_tsc > s->next_check_tsc) { + /* Check if ms is completed and store in milliseconds history. */ + uint64_t now = time_msec(); + if (now != cum_ms->timestamp) { + /* Add ms stats to totals. */ + s->totals.iterations += cum_ms->iterations; + s->totals.cycles += cum_ms->cycles; + s->totals.busy_cycles += cum_ms->busy_cycles; + s->totals.pkts += cum_ms->pkts; + s->totals.upcalls += cum_ms->upcalls; + s->totals.upcall_cycles += cum_ms->upcall_cycles; + s->totals.batches += cum_ms->batches; + s->totals.max_vhost_qfill += cum_ms->max_vhost_qfill; + cum_ms = history_next(&s->milliseconds); + cum_ms->timestamp = now; + } + /* Do the next check after 10K cycles (4 us at 2.5 GHz TSC clock). */ + s->next_check_tsc = cycles_counter_update(s) + 10000; + } } diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h index 5993c25..e225d10 100644 --- a/lib/dpif-netdev-perf.h +++ b/lib/dpif-netdev-perf.h @@ -38,10 +38,18 @@ extern "C" { #endif -/* This module encapsulates data structures and functions to maintain PMD - * performance metrics such as packet counters, execution cycles. It - * provides a clean API for dpif-netdev to initialize, update and read and +/* This module encapsulates data structures and functions to maintain basic PMD + * performance metrics such as packet counters, execution cycles as well as + * histograms and time series recording for more detailed PMD metrics. + * + * It provides a clean API for dpif-netdev to initialize, update and read and * reset these metrics. + * + * The basic set of PMD counters is implemented as atomic_uint64_t variables + * to guarantee correct read also in 32-bit systems. + * + * The detailed PMD performance metrics are only supported on 64-bit systems + * with atomic 64-bit read and store semantics for plain uint64_t counters. */ /* Set of counter types maintained in pmd_perf_stats. */ @@ -66,6 +74,7 @@ enum pmd_stat_type { PMD_STAT_SENT_BATCHES, /* Number of batches sent. */ PMD_CYCLES_ITER_IDLE, /* Cycles spent in idle iterations. */ PMD_CYCLES_ITER_BUSY, /* Cycles spent in busy iterations. */ + PMD_CYCLES_UPCALL, /* Cycles spent processing upcalls. */ PMD_N_STATS }; @@ -81,18 +90,91 @@ struct pmd_counters { uint64_t zero[PMD_N_STATS]; /* Value at last _clear(). */ }; -/* Container for all performance metrics of a PMD. - * Part of the struct dp_netdev_pmd_thread. */ +/* Data structure to collect statistical distribution of an integer measurement + * type in form of a histogram. The wall[] array contains the inclusive + * upper boundaries of the bins, while the bin[] array contains the actual + * counters per bin. The histogram walls are typically set automatically + * using the functions provided below.*/ + +#define NUM_BINS 32 /* Number of histogram bins. */ + +struct histogram { + uint32_t wall[NUM_BINS]; + uint64_t bin[NUM_BINS]; +}; + +/* Data structure to record details PMD execution metrics per iteration for + * a history period of up to HISTORY_LEN iterations in circular buffer. + * Also used to record up to HISTORY_LEN millisecond averages/totals of these + * metrics.*/ + +struct iter_stats { + uint64_t timestamp; /* Iteration no. or millisecond. */ + uint64_t cycles; /* Number of TSC cycles spent in it. or ms. */ + uint64_t busy_cycles; /* Cycles spent in busy iterations or ms. */ + uint32_t iterations; /* Iterations in ms. */ + uint32_t pkts; /* Packets processed in iteration or ms. */ + uint32_t upcalls; /* Number of upcalls in iteration or ms. */ + uint32_t upcall_cycles; /* Cycles spent in upcalls in it. or ms. */ + uint32_t batches; /* Number of rx batches in iteration or ms. */ + uint32_t max_vhost_qfill; /* Maximum fill level in iteration or ms. */ +}; + +#define HISTORY_LEN 1000 /* Length of recorded history + (iterations and ms). */ +#define DEF_HIST_SHOW 20 /* Default number of history samples to + display. */ + +struct history { + size_t idx; /* Slot to which next call to history_store() + will write. */ + struct iter_stats sample[HISTORY_LEN]; +}; + +/* Container for all performance metrics of a PMD within the struct + * dp_netdev_pmd_thread. The metrics must be updated from within the PMD + * thread but can be read from any thread. The basic PMD counters in + * struct pmd_counters can be read without protection against concurrent + * clearing. The other metrics may only be safely read with the clear_mutex + * held to protect against concurrent clearing. */ struct pmd_perf_stats { - /* Start of the current PMD iteration in TSC cycles.*/ - uint64_t start_it_tsc; + /* Prevents interference between PMD polling and stats clearing. */ + struct ovs_mutex stats_mutex; + /* Set by CLI thread to order clearing of PMD stats. */ + volatile bool clear; + /* Prevents stats retrieval while clearing is in progress. */ + struct ovs_mutex clear_mutex; + /* Start of the current performance measurement period. */ + uint64_t start_ms; + /* Counter for PMD iterations. */ + uint64_t iteration_cnt; + /* Start of the current iteration. */ + uint64_t start_tsc; /* Latest TSC time stamp taken in PMD. */ uint64_t last_tsc; + /* Used to space certain checks in time. */ + uint64_t next_check_tsc; /* If non-NULL, outermost cycle timer currently running in PMD. */ struct cycle_timer *cur_timer; /* Set of PMD counters with their zero offsets. */ struct pmd_counters counters; + /* Statistics of the current iteration. */ + struct iter_stats current; + /* Totals for the current millisecond. */ + struct iter_stats totals; + /* Histograms for the PMD metrics. */ + struct histogram cycles; + struct histogram pkts; + struct histogram cycles_per_pkt; + struct histogram upcalls; + struct histogram cycles_per_upcall; + struct histogram pkts_per_batch; + struct histogram max_vhost_qfill; + /* Iteration history buffer. */ + struct history iterations; + /* Millisecond history buffer. */ + struct history milliseconds; }; /* Support for accurate timing of PMD execution on TSC clock cycle level. @@ -175,8 +257,14 @@ cycle_timer_stop(struct pmd_perf_stats *s, return now - timer->start; } +/* Functions to initialize and reset the PMD performance metrics. */ + void pmd_perf_stats_init(struct pmd_perf_stats *s); void pmd_perf_stats_clear(struct pmd_perf_stats *s); +void pmd_perf_stats_clear_lock(struct pmd_perf_stats *s); + +/* Functions to read and update PMD counters. */ + void pmd_perf_read_counters(struct pmd_perf_stats *s, uint64_t stats[PMD_N_STATS]); @@ -199,32 +287,95 @@ pmd_perf_update_counter(struct pmd_perf_stats *s, atomic_store_relaxed(&s->counters.n[counter], tmp); } +/* Functions to manipulate a sample history. */ + static inline void -pmd_perf_start_iteration(struct pmd_perf_stats *s) +histogram_add_sample(struct histogram *hist, uint32_t val) { - if (OVS_LIKELY(s->last_tsc)) { - /* We assume here that last_tsc was updated immediately prior at - * the end of the previous iteration, or just before the first - * iteration. */ - s->start_it_tsc = s->last_tsc; - } else { - /* In case last_tsc has never been set before. */ - s->start_it_tsc = cycles_counter_update(s); + /* TODO: Can do better with binary search? */ + for (int i = 0; i < NUM_BINS-1; i++) { + if (val <= hist->wall[i]) { + hist->bin[i]++; + return; + } } + hist->bin[NUM_BINS-1]++; } -static inline void -pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets) +uint64_t histogram_samples(const struct histogram *hist); + +/* This function is used to advance the given history index by positive + * offset in the circular history buffer. */ +static inline uint32_t +history_add(uint32_t idx, uint32_t offset) { - uint64_t cycles = cycles_counter_update(s) - s->start_it_tsc; + return (idx + offset) % HISTORY_LEN; +} + +/* This function computes the difference between two indices into the + * circular history buffer. The result is always positive in the range + * 0 .. HISTORY_LEN-1 and specifies the number of steps to reach idx1 + * starting from idx2. It can also be used to retreat the history index + * idx1 by idx2 steps. */ +static inline uint32_t +history_sub(uint32_t idx1, uint32_t idx2) +{ + return (idx1 + HISTORY_LEN - idx2) % HISTORY_LEN; +} + +static inline struct iter_stats * +history_current(struct history *h) +{ + return &h->sample[h->idx]; +} + +static inline struct iter_stats * +history_next(struct history *h) +{ + size_t next_idx = history_add(h->idx, 1); + struct iter_stats *next = &h->sample[next_idx]; - if (rx_packets > 0) { - pmd_perf_update_counter(s, PMD_CYCLES_ITER_BUSY, cycles); - } else { - pmd_perf_update_counter(s, PMD_CYCLES_ITER_IDLE, cycles); + memset(next, 0, sizeof(*next)); + h->idx = next_idx; + return next; +} + +static inline struct iter_stats * +history_store(struct history *h, struct iter_stats *is) +{ + if (is) { + h->sample[h->idx] = *is; } + /* Advance the history pointer */ + return history_next(h); } +/* Functions recording PMD metrics per iteration. */ + +void +pmd_perf_start_iteration(struct pmd_perf_stats *s); +void +pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets, + int tx_packets, bool full_metrics); + +/* Formatting the output of commands. */ + +struct pmd_perf_params { + int command_type; + bool histograms; + size_t iter_hist_len; + size_t ms_hist_len; +}; + +void pmd_perf_format_overall_stats(struct ds *str, struct pmd_perf_stats *s, + double duration); +void pmd_perf_format_histograms(struct ds *str, struct pmd_perf_stats *s); +void pmd_perf_format_iteration_history(struct ds *str, + struct pmd_perf_stats *s, + int n_iter); +void pmd_perf_format_ms_history(struct ds *str, struct pmd_perf_stats *s, + int n_ms); + #ifdef __cplusplus } #endif diff --git a/lib/dpif-netdev-unixctl.man b/lib/dpif-netdev-unixctl.man new file mode 100644 index 0000000..76c3e4e --- /dev/null +++ b/lib/dpif-netdev-unixctl.man @@ -0,0 +1,157 @@ +.SS "DPIF-NETDEV COMMANDS" +These commands are used to expose internal information (mostly statistics) +about the "dpif-netdev" userspace datapath. If there is only one datapath +(as is often the case, unless \fBdpctl/\fR commands are used), the \fIdp\fR +argument can be omitted. By default the commands present data for all pmd +threads in the datapath. By specifying the "-pmd Core" option one can filter +the output for a single pmd in the datapath. +. +.IP "\fBdpif-netdev/pmd-stats-show\fR [\fB-pmd\fR \fIcore\fR] [\fIdp\fR]" +Shows performance statistics for one or all pmd threads of the datapath +\fIdp\fR. The special thread "main" sums up the statistics of every non pmd +thread. + +The sum of "emc hits", "masked hits" and "miss" is the number of +packet lookups performed by the datapath. Beware that a recirculated packet +experiences one additional lookup per recirculation, so there may be +more lookups than forwarded packets in the datapath. + +Cycles are counted using the TSC or similar facilities (when available on +the platform). The duration of one cycle depends on the processing platform. + +"idle cycles" refers to cycles spent in PMD iterations not forwarding any +any packets. "processing cycles" refers to cycles spent in PMD iterations +forwarding at least one packet, including the cost for polling, processing and +transmitting said packets. + +To reset these counters use \fBdpif-netdev/pmd-stats-clear\fR. +. +.IP "\fBdpif-netdev/pmd-stats-clear\fR [\fIdp\fR]" +Resets to zero the per pmd thread performance numbers shown by the +\fBdpif-netdev/pmd-stats-show\fR and \fBdpif-netdev/pmd-perf-show\fR commands. +It will NOT reset datapath or bridge statistics, only the values shown by +the above commands. +. +.IP "\fBdpif-netdev/pmd-perf-show\fR [\fB-nh\fR] [\fB-it\fR \fIiter_len\fR] \ +[\fB-ms\fR \fIms_len\fR] [\fB-pmd\fR \fIcore\fR] [\fIdp\fR]" +Shows detailed performance metrics for one or all pmds threads of the +user space datapath. + +The collection of detailed statistics can be controlled by a new +configuration parameter "other_config:pmd-perf-metrics". By default it +is disabled. The run-time overhead, when enabled, is in the order of 1%. + +.RS +.IP +.PD .4v +.IP \(em +used cycles +.IP \(em +forwared packets +.IP \(em +number of rx batches +.IP \(em +packets/rx batch +.IP \(em +max. vhostuser queue fill level +.IP \(em +number of upcalls +.IP \(em +cycles spent in upcalls +.PD +.RE +.IP +This raw recorded data is used threefold: + +.RS +.IP +.PD .4v +.IP 1. +In histograms for each of the following metrics: +.RS +.IP \(em +cycles/iteration (logarithmic) +.IP \(em +packets/iteration (logarithmic) +.IP \(em +cycles/packet +.IP \(em +packets/batch +.IP \(em +max. vhostuser qlen (logarithmic) +.IP \(em +upcalls +.IP \(em +cycles/upcall (logarithmic) +The histograms bins are divided linear or logarithmic. +.RE +.IP 2. +A cyclic history of the above metrics for 1024 iterations +.IP 3. +A cyclic history of the cummulative/average values per millisecond wall +clock for the last 1024 milliseconds: +.RS +.IP \(em +number of iterations +.IP \(em +avg. cycles/iteration +.IP \(em +packets (Kpps) +.IP \(em +avg. packets/batch +.IP \(em +avg. max vhost qlen +.IP \(em +upcalls +.IP \(em +avg. cycles/upcall +.RE +.PD +.RE +.IP +. +The command options are: +.RS +.IP "\fB-nh\fR" +Suppress the histograms +.IP "\fB-it\fR \fIiter_len\fR" +Display the last iter_len iteration stats +.IP "\fB-ms\fR \fIms_len\fR" +Display the last ms_len millisecond stats +.RE +.IP +The output always contains the following global PMD statistics: +.RS +.IP +Time: 15:24:55.270 .br +Measurement duration: 1.008 s + +pmd thread numa_id 0 core_id 1: + + Cycles: 2419034712 (2.40 GHz) + Iterations: 572817 (1.76 us/it) + - idle: 486808 (15.9 % cycles) + - busy: 86009 (84.1 % cycles) + Rx packets: 2399607 (2381 Kpps, 848 cycles/pkt) + Datapath passes: 3599415 (1.50 passes/pkt) + - EMC hits: 336472 ( 9.3 %) + - Megaflow hits: 3262943 (90.7 %, 1.00 subtbl lookups/hit) + - Upcalls: 0 ( 0.0 %, 0.0 us/upcall) + - Lost upcalls: 0 ( 0.0 %) + Tx packets: 2399607 (2381 Kpps) + Tx batches: 171400 (14.00 pkts/batch) +.RE +.IP +Here "Rx packets" actually reflects the number of packets forwarded by the +datapath. "Datapath passes" matches the number of packet lookups as +reported by the \fBdpif-netdev/pmd-stats-show\fR command. + +To reset the counters and start a new measurement use +\fBdpif-netdev/pmd-stats-clear\fR. +. +.IP "\fBdpif-netdev/pmd-rxq-show\fR [\fB-pmd\fR \fIcore\fR] [\fIdp\fR]" +For one or all pmd threads of the datapath \fIdp\fR show the list of queue-ids +with port names, which this thread polls. +. +.IP "\fBdpif-netdev/pmd-rxq-rebalance\fR [\fIdp\fR]" +Reassigns rxqs to pmds in the datapath \fIdp\fR based on their current usage. diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 7ce3943..6242968 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -49,6 +49,7 @@ #include "id-pool.h" #include "latch.h" #include "netdev.h" +#include "netdev-provider.h" #include "netdev-vport.h" #include "netlink.h" #include "odp-execute.h" @@ -281,6 +282,8 @@ struct dp_netdev { /* Probability of EMC insertions is a factor of 'emc_insert_min'.*/ OVS_ALIGNED_VAR(CACHE_LINE_SIZE) atomic_uint32_t emc_insert_min; + /* Enable collection of PMD performance metrics. */ + atomic_bool pmd_perf_metrics; /* Protects access to ofproto-dpif-upcall interface during revalidator * thread synchronization. */ @@ -356,6 +359,7 @@ struct dp_netdev_rxq { particular core. */ unsigned intrvl_idx; /* Write index for 'cycles_intrvl'. */ struct dp_netdev_pmd_thread *pmd; /* pmd thread that polls this queue. */ + bool is_vhost; /* Is rxq of a vhost port. */ /* Counters of cycles spent successfully polling and processing pkts. */ atomic_ullong cycles[RXQ_N_CYCLES]; @@ -717,6 +721,8 @@ static inline bool emc_entry_alive(struct emc_entry *ce); static void emc_clear_entry(struct emc_entry *ce); static void dp_netdev_request_reconfigure(struct dp_netdev *dp); +static inline bool +pmd_perf_metrics_enabled(const struct dp_netdev_pmd_thread *pmd); static void emc_cache_init(struct emc_cache *flow_cache) @@ -800,7 +806,8 @@ get_dp_netdev(const struct dpif *dpif) enum pmd_info_type { PMD_INFO_SHOW_STATS, /* Show how cpu cycles are spent. */ PMD_INFO_CLEAR_STATS, /* Set the cycles count to 0. */ - PMD_INFO_SHOW_RXQ /* Show poll-lists of pmd threads. */ + PMD_INFO_SHOW_RXQ, /* Show poll lists of pmd threads. */ + PMD_INFO_PERF_SHOW, /* Show pmd performance details. */ }; static void @@ -891,6 +898,47 @@ pmd_info_show_stats(struct ds *reply, stats[PMD_CYCLES_ITER_BUSY], total_packets); } +static void +pmd_info_show_perf(struct ds *reply, + struct dp_netdev_pmd_thread *pmd, + struct pmd_perf_params *par) +{ + if (pmd->core_id != NON_PMD_CORE_ID) { + char *time_str = + xastrftime_msec("%H:%M:%S.###", time_wall_msec(), true); + long long now = time_msec(); + double duration = (now - pmd->perf_stats.start_ms) / 1000.0; + + ds_put_cstr(reply, "\n"); + ds_put_format(reply, "Time: %s\n", time_str); + ds_put_format(reply, "Measurement duration: %.3f s\n", duration); + ds_put_cstr(reply, "\n"); + format_pmd_thread(reply, pmd); + ds_put_cstr(reply, "\n"); + pmd_perf_format_overall_stats(reply, &pmd->perf_stats, duration); + if (pmd_perf_metrics_enabled(pmd)) { + /* Prevent parallel clearing of perf metrics. */ + ovs_mutex_lock(&pmd->perf_stats.clear_mutex); + if (par->histograms) { + ds_put_cstr(reply, "\n"); + pmd_perf_format_histograms(reply, &pmd->perf_stats); + } + if (par->iter_hist_len > 0) { + ds_put_cstr(reply, "\n"); + pmd_perf_format_iteration_history(reply, &pmd->perf_stats, + par->iter_hist_len); + } + if (par->ms_hist_len > 0) { + ds_put_cstr(reply, "\n"); + pmd_perf_format_ms_history(reply, &pmd->perf_stats, + par->ms_hist_len); + } + ovs_mutex_unlock(&pmd->perf_stats.clear_mutex); + } + free(time_str); + } +} + static int compare_poll_list(const void *a_, const void *b_) { @@ -1068,7 +1116,7 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[], ovs_mutex_lock(&dp_netdev_mutex); while (argc > 1) { - if (!strcmp(argv[1], "-pmd") && argc >= 3) { + if (!strcmp(argv[1], "-pmd") && argc > 2) { if (str_to_uint(argv[2], 10, &core_id)) { filter_on_pmd = true; } @@ -1108,6 +1156,8 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[], pmd_perf_stats_clear(&pmd->perf_stats); } else if (type == PMD_INFO_SHOW_STATS) { pmd_info_show_stats(&reply, pmd); + } else if (type == PMD_INFO_PERF_SHOW) { + pmd_info_show_perf(&reply, pmd, (struct pmd_perf_params *)aux); } } free(pmd_list); @@ -1117,6 +1167,48 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[], unixctl_command_reply(conn, ds_cstr(&reply)); ds_destroy(&reply); } + +static void +pmd_perf_show_cmd(struct unixctl_conn *conn, int argc, + const char *argv[], + void *aux OVS_UNUSED) +{ + struct pmd_perf_params par; + long int it_hist = 0, ms_hist = 0; + par.histograms = true; + + while (argc > 1) { + if (!strcmp(argv[1], "-nh")) { + par.histograms = false; + argc -= 1; + argv += 1; + } else if (!strcmp(argv[1], "-it") && argc > 2) { + it_hist = strtol(argv[2], NULL, 10); + if (it_hist < 0) { + it_hist = 0; + } else if (it_hist > HISTORY_LEN) { + it_hist = HISTORY_LEN; + } + argc -= 2; + argv += 2; + } else if (!strcmp(argv[1], "-ms") && argc > 2) { + ms_hist = strtol(argv[2], NULL, 10); + if (ms_hist < 0) { + ms_hist = 0; + } else if (ms_hist > HISTORY_LEN) { + ms_hist = HISTORY_LEN; + } + argc -= 2; + argv += 2; + } else { + break; + } + } + par.iter_hist_len = it_hist; + par.ms_hist_len = ms_hist; + par.command_type = PMD_INFO_PERF_SHOW; + dpif_netdev_pmd_info(conn, argc, argv, &par); +} static int dpif_netdev_init(void) @@ -1134,6 +1226,12 @@ dpif_netdev_init(void) unixctl_command_register("dpif-netdev/pmd-rxq-show", "[-pmd core] [dp]", 0, 3, dpif_netdev_pmd_info, (void *)&poll_aux); + unixctl_command_register("dpif-netdev/pmd-perf-show", + "[-nh] [-it iter-history-len]" + " [-ms ms-history-len]" + " [-pmd core] [dp]", + 0, 8, pmd_perf_show_cmd, + NULL); unixctl_command_register("dpif-netdev/pmd-rxq-rebalance", "[dp]", 0, 1, dpif_netdev_pmd_rebalance, NULL); @@ -3021,6 +3119,18 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config) } } + bool perf_enabled = smap_get_bool(other_config, "pmd-perf-metrics", false); + bool cur_perf_enabled; + atomic_read_relaxed(&dp->pmd_perf_metrics, &cur_perf_enabled); + if (perf_enabled != cur_perf_enabled) { + atomic_store_relaxed(&dp->pmd_perf_metrics, perf_enabled); + if (perf_enabled) { + VLOG_INFO("PMD performance metrics collection enabled"); + } else { + VLOG_INFO("PMD performance metrics collection disabled"); + } + } + return 0; } @@ -3190,6 +3300,25 @@ dp_netdev_rxq_get_intrvl_cycles(struct dp_netdev_rxq *rx, unsigned idx) return processing_cycles; } +#if ATOMIC_ALWAYS_LOCK_FREE_8B +static inline bool +pmd_perf_metrics_enabled(const struct dp_netdev_pmd_thread *pmd) +{ + bool pmd_perf_enabled; + atomic_read_relaxed(&pmd->dp->pmd_perf_metrics, &pmd_perf_enabled); + return pmd_perf_enabled; +} +#else +/* If stores and reads of 64-bit integers are not atomic, the full PMD + * performance metrics are not available as locked access to 64 bit + * integers would be prohibitively expensive. */ +static inline bool +pmd_perf_metrics_enabled(const struct dp_netdev_pmd_thread *pmd OVS_UNUSED) +{ + return false; +} +#endif + static int dp_netdev_pmd_flush_output_on_port(struct dp_netdev_pmd_thread *pmd, struct tx_port *p) @@ -3265,10 +3394,12 @@ dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd, struct dp_netdev_rxq *rxq, odp_port_t port_no) { + struct pmd_perf_stats *s = &pmd->perf_stats; struct dp_packet_batch batch; struct cycle_timer timer; int error; - int batch_cnt = 0, output_cnt = 0; + int batch_cnt = 0; + int rem_qlen = 0, *qlen_p = NULL; uint64_t cycles; /* Measure duration for polling and processing rx burst. */ @@ -3277,20 +3408,37 @@ dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd, pmd->ctx.last_rxq = rxq; dp_packet_batch_init(&batch); - error = netdev_rxq_recv(rxq->rx, &batch, NULL); + /* Fetch the rx queue length only for vhostuser ports. */ + if (pmd_perf_metrics_enabled(pmd) && rxq->is_vhost) { + qlen_p = &rem_qlen; + } + + error = netdev_rxq_recv(rxq->rx, &batch, qlen_p); if (!error) { /* At least one packet received. */ *recirc_depth_get() = 0; pmd_thread_ctx_time_update(pmd); - batch_cnt = batch.count; + if (pmd_perf_metrics_enabled(pmd)) { + /* Update batch histogram. */ + s->current.batches++; + histogram_add_sample(&s->pkts_per_batch, batch_cnt); + /* Update the maximum vhost rx queue fill level. */ + if (rxq->is_vhost && rem_qlen >= 0) { + uint32_t qfill = batch_cnt + rem_qlen; + if (qfill > s->current.max_vhost_qfill) { + s->current.max_vhost_qfill = qfill; + } + } + } + /* Process packet batch. */ dp_netdev_input(pmd, &batch, port_no); /* Assign processing cycles to rx queue. */ cycles = cycle_timer_stop(&pmd->perf_stats, &timer); dp_netdev_rxq_add_cycles(rxq, RXQ_CYCLES_PROC_CURR, cycles); - output_cnt = dp_netdev_pmd_flush_output_packets(pmd, false); + dp_netdev_pmd_flush_output_packets(pmd, false); } else { /* Discard cycles. */ cycle_timer_stop(&pmd->perf_stats, &timer); @@ -3304,7 +3452,7 @@ dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd, pmd->ctx.last_rxq = NULL; - return batch_cnt + output_cnt; + return batch_cnt; } static struct tx_port * @@ -3360,6 +3508,7 @@ port_reconfigure(struct dp_netdev_port *port) } port->rxqs[i].port = port; + port->rxqs[i].is_vhost = !strncmp(port->type, "dpdkvhost", 9); err = netdev_rxq_open(netdev, &port->rxqs[i].rx, i); if (err) { @@ -4138,23 +4287,26 @@ reload: pmd->intrvl_tsc_prev = 0; atomic_store_relaxed(&pmd->intrvl_cycles, 0); cycles_counter_update(s); + /* Protect pmd stats from external clearing while polling. */ + ovs_mutex_lock(&pmd->perf_stats.stats_mutex); for (;;) { - uint64_t iter_packets = 0; + uint64_t rx_packets = 0, tx_packets = 0; pmd_perf_start_iteration(s); + for (i = 0; i < poll_cnt; i++) { process_packets = dp_netdev_process_rxq_port(pmd, poll_list[i].rxq, poll_list[i].port_no); - iter_packets += process_packets; + rx_packets += process_packets; } - if (!iter_packets) { + if (!rx_packets) { /* We didn't receive anything in the process loop. * Check if we need to send something. * There was no time updates on current iteration. */ pmd_thread_ctx_time_update(pmd); - iter_packets += dp_netdev_pmd_flush_output_packets(pmd, false); + tx_packets = dp_netdev_pmd_flush_output_packets(pmd, false); } if (lc++ > 1024) { @@ -4173,8 +4325,10 @@ reload: break; } } - pmd_perf_end_iteration(s, iter_packets); + pmd_perf_end_iteration(s, rx_packets, tx_packets, + pmd_perf_metrics_enabled(pmd)); } + ovs_mutex_unlock(&pmd->perf_stats.stats_mutex); poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list); exiting = latch_is_set(&pmd->exit_latch); @@ -5069,6 +5223,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd, struct match match; ovs_u128 ufid; int error; + uint64_t cycles = cycles_counter_update(&pmd->perf_stats); match.tun_md.valid = false; miniflow_expand(&key->mf, &match.flow); @@ -5122,6 +5277,14 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd, ovs_mutex_unlock(&pmd->flow_mutex); emc_probabilistic_insert(pmd, key, netdev_flow); } + if (pmd_perf_metrics_enabled(pmd)) { + /* Update upcall stats. */ + cycles = cycles_counter_update(&pmd->perf_stats) - cycles; + struct pmd_perf_stats *s = &pmd->perf_stats; + s->current.upcalls++; + s->current.upcall_cycles += cycles; + histogram_add_sample(&s->cycles_per_upcall, cycles); + } return error; } diff --git a/manpages.mk b/manpages.mk index d4bf0ec..aaf8bc2 100644 --- a/manpages.mk +++ b/manpages.mk @@ -250,6 +250,7 @@ vswitchd/ovs-vswitchd.8: \ lib/coverage-unixctl.man \ lib/daemon.man \ lib/dpctl.man \ + lib/dpif-netdev-unixctl.man \ lib/memory-unixctl.man \ lib/netdev-dpdk-unixctl.man \ lib/service.man \ @@ -266,6 +267,7 @@ lib/common.man: lib/coverage-unixctl.man: lib/daemon.man: lib/dpctl.man: +lib/dpif-netdev-unixctl.man: lib/memory-unixctl.man: lib/netdev-dpdk-unixctl.man: lib/service.man: diff --git a/vswitchd/ovs-vswitchd.8.in b/vswitchd/ovs-vswitchd.8.in index 80e5f53..8b4034d 100644 --- a/vswitchd/ovs-vswitchd.8.in +++ b/vswitchd/ovs-vswitchd.8.in @@ -256,32 +256,7 @@ type). .. .so lib/dpctl.man . -.SS "DPIF-NETDEV COMMANDS" -These commands are used to expose internal information (mostly statistics) -about the ``dpif-netdev'' userspace datapath. If there is only one datapath -(as is often the case, unless \fBdpctl/\fR commands are used), the \fIdp\fR -argument can be omitted. -.IP "\fBdpif-netdev/pmd-stats-show\fR [\fIdp\fR]" -Shows performance statistics for each pmd thread of the datapath \fIdp\fR. -The special thread ``main'' sums up the statistics of every non pmd thread. -The sum of ``emc hits'', ``masked hits'' and ``miss'' is the number of -packets received by the datapath. Cycles are counted using the TSC or similar -facilities (when available on the platform). To reset these counters use -\fBdpif-netdev/pmd-stats-clear\fR. The duration of one cycle depends on the -measuring infrastructure. ``idle cycles'' refers to cycles spent polling -devices but not receiving any packets. ``processing cycles'' refers to cycles -spent polling devices and successfully receiving packets, plus the cycles -spent processing said packets. -.IP "\fBdpif-netdev/pmd-stats-clear\fR [\fIdp\fR]" -Resets to zero the per pmd thread performance numbers shown by the -\fBdpif-netdev/pmd-stats-show\fR command. It will NOT reset datapath or -bridge statistics, only the values shown by the above command. -.IP "\fBdpif-netdev/pmd-rxq-show\fR [\fIdp\fR]" -For each pmd thread of the datapath \fIdp\fR shows list of queue-ids with -port names, which this thread polls. -.IP "\fBdpif-netdev/pmd-rxq-rebalance\fR [\fIdp\fR]" -Reassigns rxqs to pmds in the datapath \fIdp\fR based on their current usage. -. +.so lib/dpif-netdev-unixctl.man .so lib/netdev-dpdk-unixctl.man .so ofproto/ofproto-dpif-unixctl.man .so ofproto/ofproto-unixctl.man diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index 9c2a826..4d3dd1c 100644 --- a/vswitchd/vswitch.xml +++ b/vswitchd/vswitch.xml @@ -375,6 +375,18 @@

+ +

+ Enables recording of detailed PMD performance metrics for analysis + and trouble-shooting. This can have a performance impact in the + order of 1%. +

+

+ Defaults to false but can be changed at any time. +

+
+