From patchwork Wed Jun 5 23:24:50 2019
X-Patchwork-Submitter: Saeed Mahameed
X-Patchwork-Id: 1110823
X-Patchwork-Delegate: davem@davemloft.net
From: Saeed Mahameed
To: "David S. Miller", Jason Gunthorpe, Doug Ledford
CC: Michael Chan, Andy Gospodarek, Tal Gilboa,
 "linux-rdma@vger.kernel.org", "netdev@vger.kernel.org",
 Yamin Friedman, Max Gurtovoy, Saeed Mahameed
Subject: [for-next 8/9] linux/dim: Implement rdma_dim
Date: Wed, 5 Jun 2019 23:24:50 +0000
Message-ID: <20190605232348.6452-9-saeedm@mellanox.com>
References: <20190605232348.6452-1-saeedm@mellanox.com>
In-Reply-To: <20190605232348.6452-1-saeedm@mellanox.com>

From: Yamin Friedman

rdma_dim implements a different algorithm than net_dim: it is based on
completions, which is how interrupt moderation can be implemented in RDMA.

The algorithm optimizes for the number of completions and for the ratio
between completions and events. It can also quickly reduce the moderation
level when the traffic changes so that high moderation is no longer
required, in order to avoid long latencies.

rdma_dim() will be called from the ib_core module.

Signed-off-by: Yamin Friedman
Reviewed-by: Max Gurtovoy
Signed-off-by: Saeed Mahameed
---
 MAINTAINERS              |   1 +
 include/linux/rdma_dim.h |  28 +++++++
 lib/dim/Makefile         |   7 +-
 lib/dim/rdma_dim.c       | 162 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 197 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/rdma_dim.h
 create mode 100644 lib/dim/rdma_dim.c

diff --git a/MAINTAINERS b/MAINTAINERS
index cb621d5cf223..86e4698ab390 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5589,6 +5589,7 @@ DYNAMIC INTERRUPT MODERATION
 M:	Tal Gilboa
 S:	Maintained
 F:	include/linux/net_dim.h
+F:	include/linux/rdma_dim.h
 F:	include/linux/dim.h
 F:	lib/dim/
 
diff --git a/include/linux/rdma_dim.h b/include/linux/rdma_dim.h
new file mode 100644
index 000000000000..0623ea5a1e78
--- /dev/null
+++ b/include/linux/rdma_dim.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2019 Mellanox Technologies. */
+
+#ifndef RDMA_DIM_H
+#define RDMA_DIM_H
+
+#include
+#include
+
+#define RDMA_DIM_PARAMS_NUM_PROFILES 9
+#define RDMA_DIM_START_PROFILE 0
+
+static const struct dim_cq_moder
+rdma_dim_prof[RDMA_DIM_PARAMS_NUM_PROFILES] = {
+	{1, 0, 1, 0},
+	{1, 0, 4, 0},
+	{2, 0, 4, 0},
+	{2, 0, 8, 0},
+	{4, 0, 8, 0},
+	{16, 0, 8, 0},
+	{16, 0, 16, 0},
+	{32, 0, 16, 0},
+	{32, 0, 32, 0},
+};
+
+void rdma_dim(struct dim *dim, u64 completions);
+
+#endif /* RDMA_DIM_H */
diff --git a/lib/dim/Makefile b/lib/dim/Makefile
index 160afe288df0..73ddd0c64661 100644
--- a/lib/dim/Makefile
+++ b/lib/dim/Makefile
@@ -2,8 +2,13 @@
 # DIM Dynamic Interrupt Moderation library
 #
 
-obj-$(CONFIG_DIMLIB) = net_dim.o
+obj-$(CONFIG_DIMLIB) += net_dim.o
+obj-$(CONFIG_DIMLIB) += rdma_dim.o
 
 net_dim-y = \
 	dim.o \
 	net_dim.o
+
+rdma_dim-y = \
+	dim.o \
+	rdma_dim.o
diff --git a/lib/dim/rdma_dim.c b/lib/dim/rdma_dim.c
new file mode 100644
index 000000000000..503881ec5614
--- /dev/null
+++ b/lib/dim/rdma_dim.c
@@ -0,0 +1,162 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/*
+ * Copyright (c) 2019, Mellanox Technologies inc.  All rights reserved.
+ */
+
+#include
+
+/**
+ * rdma_dim_step - Moves the moderation profile one step.
+ * @dim: The moderation struct.
+ *
+ * Description: Moves the moderation profile of @dim by one step. If the
+ * profile is already at the edge of the range, returns DIM_ON_EDGE
+ * without moving.
+ */
+static int rdma_dim_step(struct dim *dim)
+{
+	switch (dim->tune_state) {
+	case DIM_PARKING_ON_TOP:
+		/* fall through */
+	case DIM_PARKING_TIRED:
+		break;
+	case DIM_GOING_RIGHT:
+		if (dim->profile_ix == (RDMA_DIM_PARAMS_NUM_PROFILES - 1))
+			return DIM_ON_EDGE;
+		dim->profile_ix++;
+		dim->steps_right++;
+		break;
+	case DIM_GOING_LEFT:
+		if (dim->profile_ix == 0)
+			return DIM_ON_EDGE;
+		dim->profile_ix--;
+		dim->steps_left++;
+		break;
+	}
+
+	return DIM_STEPPED;
+}
+
+/**
+ * rdma_dim_stats_compare - Compares the current stats to the previous stats.
+ * @curr: The current dim stats.
+ * @prev: The previous dim stats.
+ *
+ * Description: Checks whether the completion rate has changed
+ * significantly. If it has not, checks whether the completion-to-event
+ * ratio has changed significantly. The first measurement always
+ * compares as DIM_STATS_SAME.
+ */
+static int rdma_dim_stats_compare(struct dim_stats *curr,
+				  struct dim_stats *prev)
+{
+	/* first stat */
+	if (!prev->cpms)
+		return DIM_STATS_SAME;
+
+	if (IS_SIGNIFICANT_DIFF(curr->cpms, prev->cpms))
+		return (curr->cpms > prev->cpms) ? DIM_STATS_BETTER :
+						DIM_STATS_WORSE;
+
+	if (IS_SIGNIFICANT_DIFF(curr->cpe_ratio, prev->cpe_ratio))
+		return (curr->cpe_ratio > prev->cpe_ratio) ? DIM_STATS_BETTER :
+						DIM_STATS_WORSE;
+
+	return DIM_STATS_SAME;
+}
+
+/**
+ * rdma_dim_decision - Decides the next moderation level.
+ * @curr_stats: The current dim stats.
+ * @dim: The moderation struct.
+ *
+ * Description: Uses rdma_dim_stats_compare to decide what the next
+ * moderation level should be. If the completion-to-event ratio is low
+ * compared to the current level, resets the moderation to keep latency low.
+ */
+static bool rdma_dim_decision(struct dim_stats *curr_stats, struct dim *dim)
+{
+	int prev_ix = dim->profile_ix;
+	int stats_res;
+	int step_res;
+
+	switch (dim->tune_state) {
+	case DIM_PARKING_ON_TOP:
+		/* fall through */
+	case DIM_PARKING_TIRED:
+		break;
+	case DIM_GOING_RIGHT:
+		/* fall through */
+	case DIM_GOING_LEFT:
+		stats_res = rdma_dim_stats_compare(curr_stats,
+						   &dim->prev_stats);
+
+		switch (stats_res) {
+		case DIM_STATS_SAME:
+			if (curr_stats->cpe_ratio <= 50 * prev_ix)
+				dim->profile_ix = 0;
+			break;
+		case DIM_STATS_WORSE:
+			dim_turn(dim);
+			/* fall through */
+		case DIM_STATS_BETTER:
+			step_res = rdma_dim_step(dim);
+			if (step_res == DIM_ON_EDGE)
+				dim_turn(dim);
+			break;
+		}
+		break;
+	}
+
+	dim->prev_stats = *curr_stats;
+
+	return dim->profile_ix != prev_ix;
+}
+
+/**
+ * rdma_dim - Runs the adaptive moderation.
+ * @dim: The moderation struct.
+ * @completions: The number of completions collected in this round.
+ *
+ * Description: Each call to rdma_dim adds the latest number of collected
+ * completions to the measuring sample and counts the call as one event.
+ * Once enough events have been collected, the algorithm decides on a new
+ * moderation level.
+ */
+void rdma_dim(struct dim *dim, u64 completions)
+{
+	struct dim_sample *curr_sample = &dim->measuring_sample;
+	struct dim_stats curr_stats;
+	u32 nevents;
+
+	dim_update_sample_with_comps(curr_sample->event_ctr + 1,
+				     curr_sample->pkt_ctr,
+				     curr_sample->byte_ctr,
+				     curr_sample->comp_ctr + completions,
+				     &dim->measuring_sample);
+
+	switch (dim->state) {
+	case DIM_MEASURE_IN_PROGRESS:
+		nevents = curr_sample->event_ctr - dim->start_sample.event_ctr;
+		if (nevents < DIM_NEVENTS)
+			break;
+		dim_calc_stats(&dim->start_sample, curr_sample, &curr_stats);
+		if (rdma_dim_decision(&curr_stats, dim)) {
+			dim->state = DIM_APPLY_NEW_PROFILE;
+			schedule_work(&dim->work);
+			break;
+		}
+		/* fall through */
+	case DIM_START_MEASURE:
+		dim->state = DIM_MEASURE_IN_PROGRESS;
+		dim_update_sample_with_comps(curr_sample->event_ctr,
+					     curr_sample->pkt_ctr,
+					     curr_sample->byte_ctr,
+					     curr_sample->comp_ctr,
+					     &dim->start_sample);
+		break;
+	case DIM_APPLY_NEW_PROFILE:
+		break;
+	}
+}
+EXPORT_SYMBOL(rdma_dim);
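
To make the tuning loop in rdma_dim_decision() and rdma_dim_step() easier to follow, here is a small user-space model of it. It is a sketch under stated assumptions, not kernel code. All sim_* names are hypothetical. The parking states and the sample/event bookkeeping done by rdma_dim() are left out, and the stats are fed in directly. The "significant difference" rule is assumed to be the more-than-10% check of IS_SIGNIFICANT_DIFF() in the dim headers (verify against your tree). The ref == 0 guard is an addition of this sketch.

```c
/*
 * User-space model of the rdma_dim tuning loop (a sketch, not kernel code).
 * All sim_* names are hypothetical; the parking states and the sample/event
 * bookkeeping done by rdma_dim() are deliberately left out.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define SIM_NUM_PROFILES 9 /* mirrors RDMA_DIM_PARAMS_NUM_PROFILES */

enum sim_dir { SIM_GOING_RIGHT, SIM_GOING_LEFT };
enum sim_cmp { SIM_STATS_WORSE, SIM_STATS_SAME, SIM_STATS_BETTER };

struct sim_stats {
	long cpms;      /* completions per msec */
	long cpe_ratio; /* completion-to-event ratio, supplied by the caller */
};

struct sim_dim {
	int profile_ix;        /* index into a 9-entry profile table */
	enum sim_dir dir;      /* current tuning direction */
	struct sim_stats prev; /* stats from the previous decision */
};

/* Assumed "more than 10% change" rule; the ref == 0 guard is our addition. */
static bool sim_significant(long val, long ref)
{
	return ref != 0 && (100 * labs(val - ref)) / ref > 10;
}

/* Same ordering as rdma_dim_stats_compare(): rate first, then ratio. */
static enum sim_cmp sim_compare(const struct sim_stats *curr,
				const struct sim_stats *prev)
{
	if (!prev->cpms) /* first measurement */
		return SIM_STATS_SAME;
	if (sim_significant(curr->cpms, prev->cpms))
		return curr->cpms > prev->cpms ? SIM_STATS_BETTER : SIM_STATS_WORSE;
	if (sim_significant(curr->cpe_ratio, prev->cpe_ratio))
		return curr->cpe_ratio > prev->cpe_ratio ? SIM_STATS_BETTER
							 : SIM_STATS_WORSE;
	return SIM_STATS_SAME;
}

static void sim_turn(struct sim_dim *d)
{
	d->dir = d->dir == SIM_GOING_RIGHT ? SIM_GOING_LEFT : SIM_GOING_RIGHT;
}

/* Moves one profile in the current direction; false means "on the edge". */
static bool sim_step(struct sim_dim *d)
{
	if (d->dir == SIM_GOING_RIGHT) {
		if (d->profile_ix == SIM_NUM_PROFILES - 1)
			return false;
		d->profile_ix++;
	} else {
		if (d->profile_ix == 0)
			return false;
		d->profile_ix--;
	}
	return true;
}

/* Models the non-parking branch of rdma_dim_decision(); true if ix moved. */
static bool sim_decide(struct sim_dim *d, struct sim_stats curr)
{
	int prev_ix = d->profile_ix;

	switch (sim_compare(&curr, &d->prev)) {
	case SIM_STATS_SAME:
		/* Fast reset: a low ratio for this level drops to profile 0. */
		if (curr.cpe_ratio <= 50L * prev_ix)
			d->profile_ix = 0;
		break;
	case SIM_STATS_WORSE:
		sim_turn(d);
		/* fall through */
	case SIM_STATS_BETTER:
		if (!sim_step(d))
			sim_turn(d);
		break;
	}

	d->prev = curr;
	assert(d->profile_ix >= 0 && d->profile_ix < SIM_NUM_PROFILES);
	return d->profile_ix != prev_ix;
}
```

For example, starting at profile 0 and going right, samples with cpms 1000, 1200 and 1500 and a constant ratio of 80 step the profile up to 2. Repeating the 1500/80 sample then compares as "same", and because 80 <= 50 * 2 the fast-reset path drops straight back to profile 0. At the last profile, a "better" sample only turns the direction without moving.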