From patchwork Thu Apr 12 15:32:13 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Scheurich X-Patchwork-Id: 897722 X-Patchwork-Delegate: ian.stokes@intel.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=ericsson.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=ericsson.com header.i=@ericsson.com header.b="RUe0jB55"; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=ericsson.com header.i=@ericsson.com header.b="K5Gnl6Vg"; dkim-atps=neutral Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 40MQ0l2F5dz9s27 for ; Fri, 13 Apr 2018 01:33:27 +1000 (AEST) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 04C4DDA1; Thu, 12 Apr 2018 15:32:51 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id EEDC1D70 for ; Thu, 12 Apr 2018 15:32:49 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from sesbmg22.ericsson.net (sesbmg22.ericsson.net [193.180.251.48]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 0B34A689 for ; Thu, 12 Apr 2018 15:32:47 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; d=ericsson.com; s=mailgw201801; c=relaxed/simple; q=dns/txt; i=@ericsson.com; t=1523547162; h=From:Sender:Reply-To:Subject:Date:Message-ID:To:CC:MIME-Version:Content-Type: Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:In-Reply-To:References:List-Id: List-Help:List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive; bh=ihYcratXF/0Vi2dh9uwX9m2F+Veuu+EZM5vjGKyWkvw=; b=RUe0jB55FAw1i37uWL0nLx/ujbuDPwiUErfv+zOh47W15Q7hanp1hN37kq9mWvxH sWpbIAT/fFJUaBN24JKKmfkUcmdXOXHkIl2ENtPHCwKHaUF0XM7WYM/QlIyk8fod EsieJDNKxvigHsTPnxJukxGvSY4wjJGh5kUumOZo0zw=; X-AuditID: c1b4fb30-e6c349c000005e22-50-5acf7c14b171 Received: from ESESSHC022.ericsson.se (Unknown_Domain [153.88.183.84]) by sesbmg22.ericsson.net (Symantec Mail Security) with SMTP id 7A.28.24098.41C7FCA5; Thu, 12 Apr 2018 17:32:36 +0200 (CEST) Received: from ESESSMB502.ericsson.se (153.88.183.163) by ESESSHC022.ericsson.se (153.88.183.84) with Microsoft SMTP Server (TLS) id 14.3.382.0; Thu, 12 Apr 2018 17:32:36 +0200 Received: from ESESSMB502.ericsson.se (153.88.183.163) by ESESSMB502.ericsson.se (153.88.183.163) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id 15.1.1034.26; Thu, 12 Apr 2018 17:32:35 +0200 Received: from EUR01-DB5-obe.outbound.protection.outlook.com (153.88.183.157) by ESESSMB502.ericsson.se (153.88.183.163) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id 15.1.1034.26 via Frontend Transport; Thu, 12 Apr 2018 17:32:35 +0200 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ericsson.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version; bh=aURcAu21xNhs36711j+UxhzOEAn4+CvDYpV4TvkwAFk=; b=K5Gnl6VgJnYFPqqtiTAD6g9b7CvMeCmvO5w5ZhoHuYNLu8MDsQV3YwLihgBAXv3PAwAzBSmHGphXCyoWV65xt3dDA17P64qCIR0SGSQR9uTJjq9Uh/uXEebyRAZ9e0Qipj7f+17A99txnYNPkQtUOr+qN9h3dbl+EM5JuLraEgg= Authentication-Results: spf=none (sender IP is ) smtp.mailfrom=jan.scheurich@ericsson.com; Received: from ubuntu.eed.ericsson.se (129.192.10.2) by DB3PR07MB0650.eurprd07.prod.outlook.com (2a01:111:e400:943d::16) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.675.4; Thu, 12 Apr 2018 15:32:33 +0000 From: Jan Scheurich To: Date: Thu, 12 Apr 2018 17:32:13 +0200 Message-ID: <1523547133-14055-4-git-send-email-jan.scheurich@ericsson.com> X-Mailer: git-send-email 1.9.1 In-Reply-To: <1523547133-14055-1-git-send-email-jan.scheurich@ericsson.com> References: <1523547133-14055-1-git-send-email-jan.scheurich@ericsson.com> MIME-Version: 1.0 X-Originating-IP: [129.192.10.2] X-ClientProxiedBy: HE1PR05CA0381.eurprd05.prod.outlook.com (2603:10a6:7:94::40) To DB3PR07MB0650.eurprd07.prod.outlook.com (2a01:111:e400:943d::16) X-MS-PublicTrafficType: Email X-Microsoft-Antispam: UriScan:; BCL:0; PCL:0; RULEID:(7020095)(4652020)(5600026)(4534165)(4627221)(201703031133081)(201702281549075)(2017052603328)(7153060)(7193020); SRVR:DB3PR07MB0650; X-Microsoft-Exchange-Diagnostics: 1; DB3PR07MB0650; 3:4tQ0o7GuUNX9qlbyLnP2rzZ5kyKhOqdaBXeDc4QEce04OQYJcdsdqKGCVMEeLnWA6Gl3IoPZblEUT6flVDPBh5R4H+hCg2VS34Mtd4qJsQXjEYRkzMP+Os56cmyFicTAXO6BRO8F0Z1LBOqhGAl4c0R499THJfbADPG7700kUxp49mXSwYz+jjkn1xXCGGm+33aa3ie18gXs1H6zGNDREY5GpIaIhBsWc58EZZhHOZGVELnOCjhbgEHXUnIA2TJL; 25:3xO5BGxiOzTKemYuzPqnTKEt5nSK0SpyR3gkm1lbOAru/wW8qKN4jnBV2XYBNUZOyRI80J4XMSC1vUZp0o2sclIykUipDcgjW2RwMB4vP8YQ0SBO9SOPrNT40xgClzdBaAey9+qBkkT69T4Oaotx/sH7TUK+Sk1OrhCqp5KygDTKpsZhHPfuvd2aNsFkXZBl0W1hOzF6Gz07GZvyNxNenkjN9OlDIELqoPn/9rHvXcCaR6ZNwI5QmLbWmwkwr8snd3YhhDAJlBYoxPl2cCrpEJmDpeoMqDt0jVih0n15Is+oiuZXZv9hJIrZPpMaWKmkaihBMaSqY+3xQt6YsXygvw==; 31:FA9Y+o5doENDwn9uVX4Fh27ay6CyslOim5wvxT41pZtEY/QeeSi3N4SmkoTJktX/aaaANwvOM3xGgyF7jjbkV1IbnuaqpC22ftELyOwM4YSjvsw/fF6wMvDKVSNNNWu5RpwHKFv4Mp9CWjFS64y7elV8Ub0EZPZA8Snu2u55he7K1bCm0cUvwbOHLAcutpiifakc+sHymL7tTJJ2dc3PdsdxvTf6QbZUNPdd7D3hsmc= X-MS-TrafficTypeDiagnostic: DB3PR07MB0650: X-Microsoft-Exchange-Diagnostics: 1; DB3PR07MB0650; 20:GRIRg5NzdlBsb72+14rdioxrbYe+xylnDip9whH+s8os4jk4If3lIUFUw3HwiaTBeppzNi9rGV3XNDCyjYE0aDRbITb0JAlq24aScfLfg9m1j/i/QgpTqMtjhofQvEQTnx9XclXvbnJkAMRkiV2gJ1Vd1/bFtWGc4kr9aGlqh3qW39zQL2XL+QQROQ18RuGCEsd5bFXnSq0zGcqYSpy7SzNaY89zHDxhoSsY4nUx3zAcLHtmAGM4ww/8YpN95PSO0P3hFJcLiFlZ0vRL551gpCLrtOoHG1we3O46k6g/3CMF3zaC+/CU+575p5qPpWrI6/0HDursrkjZxIVjMTfBKcrDjY/Zk6Vd5pB1IuiK38MuW2d6Xwi1uVwPlVgOxMwmkpskqP2zejLt4DX9WDJogOuF+E27Cg6XsBcm0Hch0jzGef2IZGmA7/aY3z6a+kqR6ZNmc3ig2A6JAIxm60kbc7x9Nx0Rj1dSRHWxqyDjD3LIT2eMKqyXuFc6ff8vONT2; 4:QmyzTX2/K3LNVhz5ZKqLGeDgpid4sTwu5TJibt4dwf0/Tv3jBbbsNwrW67AYa/hOOLtqurS5mZY9mZEjySEOgyvxR3FQ9ZB8XFrowmi4vM0B9HrYGdl/sMh1KY3RtpiTdYhgPJ+x/KWR/2dVRB/xSPgvpxroXYSqVNZurUk8or8gyYMaiBhVk1JStq80aQLmBcDn8y6q9lDcubkt+nIJ5jSde7TQQJC50NVRwJEpBm9+bcQu/SMzTcDEv51tngTXzKJ+9PVLyYE3etHbQqbULBaa5q9eqLxF9oUO8/kxTUc/yd4WfquEpuntaTOv8qLVKgdR9i8il3mQ1EPw/05+GIAxG4VowU0MirtZ/HSmOvdYwmBfEujsCsgtsEYjPG7a X-Microsoft-Antispam-PRVS: X-Exchange-Antispam-Report-Test: UriScan:(37575265505322)(228905959029699)(17755550239193); X-Exchange-Antispam-Report-CFA-Test: BCL:0; PCL:0; RULEID:(8211001083)(6040522)(2401047)(8121501046)(5005006)(3002001)(10201501046)(3231221)(944501327)(52105095)(93006095)(93001095)(6041310)(201703131423095)(201702281528075)(20161123555045)(201703061421075)(201703061406153)(20161123560045)(20161123564045)(20161123562045)(20161123558120)(6072148)(201708071742011); SRVR:DB3PR07MB0650; BCL:0; PCL:0; RULEID:; SRVR:DB3PR07MB0650; X-Forefront-PRVS: 06400060E1 X-Forefront-Antispam-Report: SFV:NSPM; SFS:(10009020)(39380400002)(396003)(39860400002)(366004)(346002)(376002)(189003)(199004)(186003)(6486002)(16586007)(11346002)(52116002)(316002)(7736002)(956004)(386003)(305945005)(2361001)(47776003)(50466002)(59450400001)(6506007)(76176011)(2351001)(2616005)(476003)(16526019)(26005)(8936002)(44832011)(48376002)(105586002)(86362001)(6916009)(575784001)(36756003)(50226002)(8676002)(478600001)(51416003)(3846002)(53936002)(4326008)(446003)(6512007)(6116002)(81156014)(486006)(81166006)(66066001)(6666003)(5660300001)(97736004)(68736007)(106356001)(2906002)(25786009)(107886003); DIR:OUT; SFP:1101; SCL:1; SRVR:DB3PR07MB0650; H:ubuntu.eed.ericsson.se; FPR:; SPF:None; LANG:en; PTR:InfoNoRecords; MX:1; A:1; Received-SPF: None (protection.outlook.com: ericsson.com does not designate permitted sender hosts) X-Microsoft-Exchange-Diagnostics: =?us-ascii?Q?1; DB3PR07MB0650; 23:DHqvCqWFdLQRbQrOAgZNmrFbbc+wGMlyBQOk/a+XK?= OQZk50Tw948jcr9Ro4/mQVeYplBiKeWZLNXdzCPk9c+0aFS0UYLoxgwhNm6pb9heDcLqh7NsfVVHe+nhe2us1ZxrcLpMBPFVLk2zjwwP8rqJ1dlzeTuRQJu475rScmCUPoMFuvV6+D20fNriCExIfTpIlR3pLaSzayIJRkUikSX+8/H5/SBVUsz0t2GmSJNsk7t9qRgtIHy0tg5O/ao0WrCnM9ZdNdiYS5tnYn1EvTXn3TmzJyKm5pUMQhPkrABQUXqqHV6DBxwMhy7Gj20/QeoR6NuST/UEYJX5tLnniJrPciV2v3/5LV15OR0L8VrI7VwWkwrAetlGO82VSkc4sjUGut8Ey52RXvIfRPgIB9jLWgqF81gbQclKifxMdc6FfNMjo+a4W3cKf5pAhiLHFlLFVmWfzKt6mCcGOJGKaVr7IBxEs8zWRkQmB9cCE21xdLmO1+IhLXGulmXI1q/+4vGqjr4VL8+OJ53FD5zNAiSmjWKo7TLu7yIBhqFWQDQuksGQwDlVMt8Jhi109yaS4iV0pbMl3dZLaTtK2ezvK2fGhEelbRmH9b/qN2pbXze7g5dy6dg5eQhgfHpFpBYJItJHDWJol4yKWNKGY6YGIisNOgIcO/kfLS6DQe++2KdmD1Phm8lmftEoEkEl+VH8Gt5b0ZtEnTTTeFgjKbRHYfzcReFkCYF6/2Z2WILAZH3LIO5Mqn7NqQHVajpJfyOS2Lbythek/uaNkNj5xuf75da5V21cbDWhkKUpAtHGX0oBtJRAs0pDuXCFWGt3I/iuurgBwvaF/ajPNa0ZCbYyRSjMbp9lUmE3S5OGoNsGH2MDvnAtBRJUFy+rMgwHJ8LnEurP1vQBphtPQV7e94bDgLUojF+S+KH4Gk2kUrI4/ZxvHFbWrlrWqX6ICeTubgRdQuiMlORWlvkeW51KTZL2MROPU57MS8GA6cFWn3voYoaO+5B64JPX/1rSUU536A4kKI2gTioIhy309ig8htKazcCqvEpCBZ0hLzOT6Vk8Tp1X06LRoIySUqZuIeCj8fBGJgeVJvaB81tgw64OowN+p4BVb5Ow7q5vbesxz9VxUDYNb6csDV3zdx9au/s0mUlV8iXxAmOtRxMaU1/uuC5jDCrFZDjflrSX8hhNs2NTU1Zfenw2k/RcpUkZkCuGl6XYEbd217w4wKL3EoMrwVXfbtnvDxcvXQD78GK4gQgRJjW4d+HgQbPLzSyHlx8G6cWf/7kEo1cFCRIzKWN+r+4UZCBfW5MdbYEMsQI1pcwD5cTz68= X-Microsoft-Antispam-Message-Info: 6ujynY9/4Vip3J9vvgjeiPHBxCu9tTHIJXcQNMvN12RljHUGZtcdOBztk6KvviHYbB0a6QIAPX+C8KnP9ckXA526a7R29i38iSCjGnKMm8CDBpaUDDz3g/+H6n9ayNz2/b7kR3smxUqr8iXPj0jz8DIVbeuNGPIEzl8UjMLXyDjtntcEcmaw6G/R1F67mu3Q X-Microsoft-Exchange-Diagnostics: 1; DB3PR07MB0650; 6:Rg71mrqIqxtmK8fwKJ5iDBwwSpmBhPAfgv7xyCRYkeurd33egf5WCByYRgntJUiETh6N+HEQ/tgS+UE666Ggh6i/6Svxkc+WD/cqTYrDyz9maaVDuGETR7QXHCyTJz8mZCggNtHXFp3k9ktUWAe1rxcRaRLEP6uaO40nlT0qAZBcFnjJ8ILA6ZLTb3w8ieDAjDmln2kC0O4HfhJm78t1HwDleVzNqqjrOBLagt1qJgG+UTeHSHhUgXwF0qV08CDRRRiVFKzWuYVMbryS4mzNt5qSHSQ4u3DHltZRSS9ALgjCCJhse9KI4KiXYV1r5cbeD3JnZujoiXgI6OJ+aWf+vRVOhpPne+I0DSPcpoS2GGzXT9MRVyi6xSaZKwelPsSq+zM2Qswt+qYA1QyapzXgezYYc/eOPy+yGs4xcp5b4UKRPiESaleosS6mLq7XYk0s8JiUz9kgIXnK6qYwGfNTQQ==; 5:rJvTEPI3NC+vOW1dD+mVjtLq45dpLHvExT/Wjk8hk7BH1Nvrg595SedsNlosMM84C8QEeMgeWhyzsh+VVa1UtflXm3FHioEouWLrooN2SH0OGgXuguJxsBsfCkEcpKTqFyYJLw1oilRRU5tmBefUlKiy87IiONv8jW8GePEEj6Q=; 24:KL11TtkKykT3GQYVKLTYRWX+iRGQl8vAJUXbiIrC/vxebFomHNjY3KodriyDChr5OKKKpCEf6eYRQCFH5PjJD9Vnl5b7XXiwAg8ERpMi1gQ= SpamDiagnosticOutput: 1:99 SpamDiagnosticMetadata: NSPM X-Microsoft-Exchange-Diagnostics: 1; DB3PR07MB0650; 7:8wv1o6JgpY0vYv1rtBEFZW2mTkjyZpgO+ZKMNaKrXv7zCsDsQjTaYbmo1KISm+gwIJtP4ety7gdiC8BuaQGhGWaiwl0rJ8vYT5yKkbMG18hZ4oWcJ91smohgXYAiMSGsulexhRuWY4WfMZlfiWCm3Whe0PuYhpp+qb8HXo0vBWk59TEzskcRy2Cn28I6ZBjmgHTj1fsIkyCZCPp/6jBRAtMi1kr82DPriHHs7zK6hHjiTk65FuLkI3qcZJlJZJ+z X-MS-Office365-Filtering-Correlation-Id: d966be37-25ea-4afa-197d-08d5a08a9ce8 X-MS-Exchange-CrossTenant-OriginalArrivalTime: 12 Apr 2018 15:32:33.3850 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: d966be37-25ea-4afa-197d-08d5a08a9ce8 X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 92e84ceb-fbfd-47ab-be52-080c6b87953f X-MS-Exchange-Transport-CrossTenantHeadersStamped: DB3PR07MB0650 X-OriginatorOrg: ericsson.com X-Brightmail-Tracker: H4sIAAAAAAAAA01Sa0iTYRjl/S7b53L1ulY9rpsMIhOyZQX7Ec7AaCVERIEsQmd+6lCnbVMy +jE1CBVvpMlMKzLFpCXmNFGRnFPxhtK8UIblKKc/amrOMCnz81vgv3Oec87znB8PQ0oKaBmj 05tYg16bIheIKEv022vHZfdGNQpX2S5lm7UWKXuHOknl+IM1obJpdoRWWu2LwghaXdO5QKjn PmwgtadrQqAusjWgK5RGdDaeTdFlsoYT4bGipCJnniB9vhDdyc3rFZiRJyEf+TGAT8N756Ig H4kYCXYg6KgY9hEbgkr3GMGTXwg87lmaJ7UEdJdkCzlC4WUCCh0/BNwyhG/C/ZZyXySfgB6X ZTPCMAKsgMcdEZxHigPhb41t6waJyxHkulcJTtiNoyFnsA5xmMJH4EXx+NZcjC/BU/dzId/2 EAz0PaQ57Iej4GdzNsntl3CeV3LeHgADlq8Uh0kM0D0/T/LRIGiyWSjuLuAKBP3uwa39EnwU /ti9FG86CL0jVT5TK4KCyS4hT8xCeOKqIXhXCDhLZ3xCmxAmGmdpXkiGd/X5iMc3IHu4UcDj WhK+zWTy+AAMlTX5wp9oWH5UKixBoZXbuldu6/4MEQ1oj5E1xqUmhoWFsgbdLaMxTR+qZ01v 0OaLdNvWFW1owX3OjjCD5P5ia+qoRkJrM41ZqXYEDCmXiqvCN0fieG3WXdaQFmPISGGNdrSf oeT7xMrLzRoJTtSa2GSWTWcN/1WC8ZOZUbK352V71JIsbvpz8ORh//a6SJW0f6jMLlZdjYn9 /noweGNBWTh2vSJ6JDahb81qK26pV/VsfImalO/Imoif20v3LbkSS2VXvbLqlYGuM6ZVlSPY kTE1vea9wHgIbJLuXE8zmOXVgcdWLupaclqnlAGLCuep37cjq8+XfBQFySljkvZkCGkwav8B aO0MZx4DAAA= X-Spam-Status: No, score=-4.3 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Cc: i.maximets@samsung.com Subject: [ovs-dev] [PATCH v11 3/3] dpif-netdev: Detection and logging of suspicious PMD iterations X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org This patch enhances dpif-netdev-perf to detect iterations with suspicious statistics according to the following criteria: - iteration lasts longer than US_THR microseconds (default 250). This can be used to capture events where a PMD is blocked or interrupted for such a period of time that there is a risk for dropped packets on any of its Rx queues. - max vhost qlen exceeds a threshold Q_THR (default 128). This can be used to infer virtio queue overruns and dropped packets inside a VM, which are not visible in OVS otherwise. Such suspicious iterations can be logged together with their iteration statistics to be able to correlate them to packet drop or other events outside OVS. A new command is introduced to enable/disable logging at run-time and to adjust the above thresholds for suspicious iterations: ovs-appctl dpif-netdev/pmd-perf-log-set on | off [-b before] [-a after] [-e|-ne] [-us usec] [-q qlen] Turn logging on or off at run-time (on|off). -b before: The number of iterations before the suspicious iteration to be logged (default 5). -a after: The number of iterations after the suspicious iteration to be logged (default 5). -e: Extend logging interval if another suspicious iteration is detected before logging occurs. -ne: Do not extend logging interval (default). -q qlen: Suspicious vhost queue fill level threshold. Increase this to 512 if the Qemu supports 1024 virtio queue length. (default 128). -us usec: change the duration threshold for a suspicious iteration (default 250 us). Note: Logging of suspicious iterations itself consumes a considerable amount of processing cycles of a PMD which may be visible in the iteration history. In the worst case this can lead OVS to detect another suspicious iteration caused by logging. If more than 100 iterations around a suspicious iteration have been logged once, OVS falls back to the safe default values (-b 5/-a 5/-ne) to avoid that logging itself causes continuos further logging. Signed-off-by: Jan Scheurich Acked-by: Billy O'Mahony --- NEWS | 2 + lib/dpif-netdev-perf.c | 223 ++++++++++++++++++++++++++++++++++++++++++++ lib/dpif-netdev-perf.h | 21 +++++ lib/dpif-netdev-unixctl.man | 59 ++++++++++++ lib/dpif-netdev.c | 5 + 5 files changed, 310 insertions(+) diff --git a/NEWS b/NEWS index ff81a9f..f11ef64 100644 --- a/NEWS +++ b/NEWS @@ -23,6 +23,8 @@ Post-v2.9.0 * Commands ovs-appctl dpif-netdev/pmd-*-show can now work on a single PMD * Detailed PMD performance metrics available with new command ovs-appctl dpif-netdev/pmd-perf-show + * Supervision of PMD performance metrics and logging of suspicious + iterations v2.9.0 - 19 Feb 2018 -------------------- diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c index caa0e27..47ce2c2 100644 --- a/lib/dpif-netdev-perf.c +++ b/lib/dpif-netdev-perf.c @@ -25,6 +25,24 @@ VLOG_DEFINE_THIS_MODULE(pmd_perf); +#define ITER_US_THRESHOLD 250 /* Warning threshold for iteration duration + in microseconds. */ +#define VHOST_QUEUE_FULL 128 /* Size of the virtio TX queue. */ +#define LOG_IT_BEFORE 5 /* Number of iterations to log before + suspicious iteration. */ +#define LOG_IT_AFTER 5 /* Number of iterations to log after + suspicious iteration. */ + +bool log_enabled = false; +bool log_extend = false; +static uint32_t log_it_before = LOG_IT_BEFORE; +static uint32_t log_it_after = LOG_IT_AFTER; +static uint32_t log_us_thr = ITER_US_THRESHOLD; +uint32_t log_q_thr = VHOST_QUEUE_FULL; +uint64_t iter_cycle_threshold; + +static struct vlog_rate_limit latency_rl = VLOG_RATE_LIMIT_INIT(600, 600); + #ifdef DPDK_NETDEV static uint64_t get_tsc_hz(void) @@ -141,6 +159,10 @@ pmd_perf_stats_init(struct pmd_perf_stats *s) histogram_walls_set_log(&s->max_vhost_qfill, 0, 512); s->iteration_cnt = 0; s->start_ms = time_msec(); + s->log_susp_it = UINT32_MAX; + s->log_begin_it = UINT32_MAX; + s->log_end_it = UINT32_MAX; + s->log_reason = NULL; } void @@ -391,6 +413,10 @@ pmd_perf_stats_clear_lock(struct pmd_perf_stats *s) history_init(&s->milliseconds); s->start_ms = time_msec(); s->milliseconds.sample[0].timestamp = s->start_ms; + s->log_susp_it = UINT32_MAX; + s->log_begin_it = UINT32_MAX; + s->log_end_it = UINT32_MAX; + s->log_reason = NULL; /* Clearing finished. */ s->clear = false; ovs_mutex_unlock(&s->clear_mutex); @@ -442,6 +468,7 @@ pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets, uint64_t now_tsc = cycles_counter_update(s); struct iter_stats *cum_ms; uint64_t cycles, cycles_per_pkt = 0; + char *reason = NULL; cycles = now_tsc - s->start_tsc; s->current.timestamp = s->iteration_cnt; @@ -491,9 +518,30 @@ pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets, cum_ms->batches += s->current.batches; cum_ms->max_vhost_qfill += s->current.max_vhost_qfill; + if (log_enabled) { + /* Log suspicious iterations. */ + if (cycles > iter_cycle_threshold) { + reason = "Excessive total cycles"; + } else if (s->current.max_vhost_qfill >= log_q_thr) { + reason = "Vhost RX queue full"; + } + if (OVS_UNLIKELY(reason)) { + pmd_perf_set_log_susp_iteration(s, reason); + cycles_counter_update(s); + } + + /* Log iteration interval around suspicious iteration when reaching + * the end of the range to be logged. */ + if (OVS_UNLIKELY(s->log_end_it == s->iterations.idx)) { + pmd_perf_log_susp_iteration_neighborhood(s); + cycles_counter_update(s); + } + } + /* Store in iteration history. This advances the iteration idx and * clears the next slot in the iteration history. */ history_store(&s->iterations, &s->current); + if (now_tsc > s->next_check_tsc) { /* Check if ms is completed and store in milliseconds history. */ uint64_t now = time_msec(); @@ -514,3 +562,178 @@ pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets, s->next_check_tsc = cycles_counter_update(s) + 10000; } } + +/* Delay logging of the suspicious iteration and the range of iterations + * around it until after the last iteration in the range to be logged. + * This avoids any distortion of the measurements through the cost of + * logging itself. */ + +void +pmd_perf_set_log_susp_iteration(struct pmd_perf_stats *s, + char *reason) +{ + if (s->log_susp_it == UINT32_MAX) { + /* No logging scheduled yet. */ + s->log_susp_it = s->iterations.idx; + s->log_reason = reason; + s->log_begin_it = history_sub(s->iterations.idx, log_it_before); + s->log_end_it = history_add(s->iterations.idx, log_it_after + 1); + } else if (log_extend) { + /* Logging was initiated earlier, we log the previous suspicious + * iteration now and extend the logging interval, if possible. */ + struct iter_stats *susp = &s->iterations.sample[s->log_susp_it]; + uint32_t new_end_it, old_range, new_range; + + VLOG_WARN_RL(&latency_rl, + "Suspicious iteration (%s): iter=%"PRIu64 + " duration=%"PRIu64" us\n", + s->log_reason, + susp->timestamp, + (1000000L * susp->cycles) / get_tsc_hz()); + + new_end_it = history_add(s->iterations.idx, log_it_after + 1); + new_range = history_sub(new_end_it, s->log_begin_it); + old_range = history_sub(s->log_end_it, s->log_begin_it); + if (new_range < old_range) { + /* Extended range exceeds history length. */ + new_end_it = s->log_begin_it; + } + s->log_susp_it = s->iterations.idx; + s->log_reason = reason; + s->log_end_it = new_end_it; + } +} + +void +pmd_perf_log_susp_iteration_neighborhood(struct pmd_perf_stats *s) +{ + ovs_assert(s->log_reason != NULL); + ovs_assert(s->log_susp_it != UINT32_MAX); + + struct ds log = DS_EMPTY_INITIALIZER; + struct iter_stats *susp = &s->iterations.sample[s->log_susp_it]; + uint32_t range = history_sub(s->log_end_it, s->log_begin_it); + + VLOG_WARN_RL(&latency_rl, + "Suspicious iteration (%s): iter=%"PRIu64 + " duration=%"PRIu64" us\n", + s->log_reason, + susp->timestamp, + (1000000L * susp->cycles) / get_tsc_hz()); + + pmd_perf_format_iteration_history(&log, s, range); + VLOG_WARN_RL(&latency_rl, + "Neighborhood of suspicious iteration:\n" + "%s", ds_cstr(&log)); + ds_destroy(&log); + s->log_susp_it = s->log_end_it = s->log_begin_it = UINT32_MAX; + s->log_reason = NULL; + + if (range > 100) { + /* Reset to safe default values to avoid disturbance. */ + log_it_before = LOG_IT_BEFORE; + log_it_after = LOG_IT_AFTER; + log_extend = false; + } +} + +void +pmd_perf_log_set_cmd(struct unixctl_conn *conn, + int argc, const char *argv[], + void *aux OVS_UNUSED) +{ + unsigned int it_before, it_after, us_thr, q_thr; + bool on, extend; + bool usage = false; + + on = log_enabled; + extend = log_extend; + it_before = log_it_before; + it_after = log_it_after; + q_thr = log_q_thr; + us_thr = log_us_thr; + + while (argc > 1) { + if (!strcmp(argv[1], "on")) { + on = true; + argc--; + argv++; + } else if (!strcmp(argv[1], "off")) { + on = false; + argc--; + argv++; + } else if (!strcmp(argv[1], "-e")) { + extend = true; + argc--; + argv++; + } else if (!strcmp(argv[1], "-ne")) { + extend = false; + argc--; + argv++; + } else if (!strcmp(argv[1], "-a") && argc > 2) { + if (str_to_uint(argv[2], 10, &it_after)) { + if (it_after > HISTORY_LEN - 2) { + it_after = HISTORY_LEN - 2; + } + } else { + usage = true; + break; + } + argc -= 2; + argv += 2; + } else if (!strcmp(argv[1], "-b") && argc > 2) { + if (str_to_uint(argv[2], 10, &it_before)) { + if (it_before > HISTORY_LEN - 2) { + it_before = HISTORY_LEN - 2; + } + } else { + usage = true; + break; + } + argc -= 2; + argv += 2; + } else if (!strcmp(argv[1], "-q") && argc > 2) { + if (!str_to_uint(argv[2], 10, &q_thr)) { + usage = true; + break; + } + argc -= 2; + argv += 2; + } else if (!strcmp(argv[1], "-us") && argc > 2) { + if (!str_to_uint(argv[2], 10, &us_thr)) { + usage = true; + break; + } + argc -= 2; + argv += 2; + } else { + usage = true; + break; + } + } + if (it_before + it_after > HISTORY_LEN - 2) { + it_after = HISTORY_LEN - 2 - it_before; + } + + if (usage) { + unixctl_command_reply_error(conn, + "Usage: ovs-appctl dpif-netdev/pmd-perf-log-set " + "[on|off] [-b before] [-a after] [-e|-ne] " + "[-us usec] [-q qlen]"); + return; + } + + VLOG_INFO("pmd-perf-log-set: %s, before=%d, after=%d, extend=%s, " + "us_thr=%d, q_thr=%d\n", + on ? "on" : "off", it_before, it_after, + extend ? "true" : "false", us_thr, q_thr); + log_enabled = on; + log_extend = extend; + log_it_before = it_before; + log_it_after = it_after; + log_q_thr = q_thr; + log_us_thr = us_thr; + iter_cycle_threshold = (log_us_thr * get_tsc_hz()) / 1000000L; + + unixctl_command_reply(conn, ""); +} diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h index e225d10..b8aa4e3 100644 --- a/lib/dpif-netdev-perf.h +++ b/lib/dpif-netdev-perf.h @@ -175,6 +175,14 @@ struct pmd_perf_stats { struct history iterations; /* Millisecond history buffer. */ struct history milliseconds; + /* Suspicious iteration log. */ + uint32_t log_susp_it; + /* Start of iteration range to log. */ + uint32_t log_begin_it; + /* End of iteration range to log. */ + uint32_t log_end_it; + /* Reason for logging suspicious iteration. */ + char *log_reason; }; /* Support for accurate timing of PMD execution on TSC clock cycle level. @@ -350,6 +358,16 @@ history_store(struct history *h, struct iter_stats *is) return history_next(h); } +/* Data and function related to logging of suspicious iterations. */ + +extern bool log_enabled; +extern bool log_extend; +extern uint32_t log_q_thr; +extern uint64_t iter_cycle_threshold; + +void pmd_perf_set_log_susp_iteration(struct pmd_perf_stats *s, char *reason); +void pmd_perf_log_susp_iteration_neighborhood(struct pmd_perf_stats *s); + /* Functions recording PMD metrics per iteration. */ void @@ -375,6 +393,9 @@ void pmd_perf_format_iteration_history(struct ds *str, int n_iter); void pmd_perf_format_ms_history(struct ds *str, struct pmd_perf_stats *s, int n_ms); +void pmd_perf_log_set_cmd(struct unixctl_conn *conn, + int argc, const char *argv[], + void *aux OVS_UNUSED); #ifdef __cplusplus } diff --git a/lib/dpif-netdev-unixctl.man b/lib/dpif-netdev-unixctl.man index 76c3e4e..ab8619e 100644 --- a/lib/dpif-netdev-unixctl.man +++ b/lib/dpif-netdev-unixctl.man @@ -149,6 +149,65 @@ reported by the \fBdpif-netdev/pmd-stats-show\fR command. To reset the counters and start a new measurement use \fBdpif-netdev/pmd-stats-clear\fR. . +.IP "\fBdpif-netdev/pmd-perf-log-set\fR \fBon\fR|\fBoff\fR \ +[\fB-b\fR \fIbefore\fR] [\fB-a\fR \fIafter\fR] [\fB-e\fR|\fB-ne\fR] \ +[\fB-us\fR \fIusec\fR] [\fB-q\fR \fIqlen\fR]" +. +The userspace "netdev" datapath is able to supervise the PMD performance +metrics and detect iterations with suspicious statistics according to the +following criteria: +.RS +.IP \(em +The iteration lasts longer than \fIusec\fR microseconds (default 250). +This can be used to capture events where a PMD is blocked or interrupted for +such a period of time that there is a risk for dropped packets on any of its Rx +queues. +.IP \(em +The max vhost qlen exceeds a threshold \fIqlen\fR (default 128). This can be +used to infer virtio queue overruns and dropped packets inside a VM, which are +not visible in OVS otherwise. +.RE +.IP +Such suspicious iterations can be logged together with their iteration +statistics in the \fBovs-vswitchd.log\fR to be able to correlate them to +packet drop or other events outside OVS. + +The above command enables (\fBon\fR) or disables (\fBoff\fR) supervision and +logging at run-time and can be used to adjust the above thresholds for +detecting suspicious iterations. By default supervision and logging is +disabled. + +The command options are: +.RS +.IP "\fB-b\fR \fIbefore\fR" +The number of iterations before the suspicious iteration to be logged +(default 5). +.IP "\fB-a\fR \fIafter\fR" +The number of iterations after the suspicious iteration to be logged +(default 5). +.IP "\fB-e\fR" +Extend logging interval if another suspicious iteration is detected +before logging occurs. +.IP "\fB-ne\fR" +Do not extend logging interval if another suspicious iteration is detected +before logging occurs (default). +.IP "\fB-q\fR \fIqlen\fR" +Suspicious vhost queue fill level threshold. Increase this to 512 if the Qemu +supports 1024 virtio queue length (default 128). +.IP "\fB-us\fR \fIusec\fR" +Change the duration threshold for a suspicious iteration (default 250 us). +.RE + +Note: Logging of suspicious iterations itself consumes a considerable amount +of processing cycles of a PMD which may be visible in the iteration history. +In the worst case this can lead OVS to detect another suspicious iteration +caused by logging. + +If more than 100 iterations around a suspicious iteration have been logged +once, OVS falls back to the safe default values (-b 5 -a 5 -ne) to avoid +that logging itself continuously causes logging of further suspicious +iterations. +. .IP "\fBdpif-netdev/pmd-rxq-show\fR [\fB-pmd\fR \fIcore\fR] [\fIdp\fR]" For one or all pmd threads of the datapath \fIdp\fR show the list of queue-ids with port names, which this thread polls. diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 6242968..f86ed2a 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -1235,6 +1235,11 @@ dpif_netdev_init(void) unixctl_command_register("dpif-netdev/pmd-rxq-rebalance", "[dp]", 0, 1, dpif_netdev_pmd_rebalance, NULL); + unixctl_command_register("dpif-netdev/pmd-perf-log-set", + "on|off [-b before] [-a after] [-e|-ne] " + "[-us usec] [-q qlen]", + 0, 10, pmd_perf_log_set_cmd, + NULL); return 0; }