From patchwork Tue Aug 25 18:12:26 2020
X-Patchwork-Submitter: Marcelo Henrique Cerri
X-Patchwork-Id: 1351213
From: Marcelo Henrique Cerri
To: kernel-team@lists.ubuntu.com
Subject: [bionic:linux-azure-4.15][PATCH 1/3] RDMA/rxe: Close a race after
 ib_register_device
Date: Tue, 25 Aug 2020 15:12:26 -0300
Message-Id: <20200825181228.3274075-2-marcelo.cerri@canonical.com>
In-Reply-To: <20200825181228.3274075-1-marcelo.cerri@canonical.com>
References: <20200825180923.3254380-1-marcelo.cerri@canonical.com>
 <20200825181228.3274075-1-marcelo.cerri@canonical.com>

From: Jason Gunthorpe

BugLink: https://bugs.launchpad.net/bugs/1874503

Since rxe allows unregistration from other threads, the rxe pointer can
become invalid at any moment after ib_register_device() returns. This
could cause a user-triggered use-after-free.

Add another driver callback to be called right after the device becomes
registered to complete any device setup required post-registration. This
callback has enough core locking to prevent the device from becoming
unregistered.

Signed-off-by: Jason Gunthorpe
(backported from commit ca22354b140853b8155692d5b2bc0110aa54e937)
Signed-off-by: Sultan Alsawaf
Signed-off-by: Marcelo Henrique Cerri
---
 drivers/infiniband/core/device.c      |  8 ++++++++
 drivers/infiniband/sw/rxe/rxe_net.c   |  8 ++++----
 drivers/infiniband/sw/rxe/rxe_net.h   |  2 +-
 drivers/infiniband/sw/rxe/rxe_sysfs.c |  8 ++------
 drivers/infiniband/sw/rxe/rxe_verbs.c | 14 ++++++++++++++
 include/rdma/ib_verbs.h               |  5 +++++
 6 files changed, 34 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 6b0d1d8609ca..664e497d2daf 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -547,6 +547,12 @@ int ib_register_device(struct ib_device *device,
 
 	device->reg_state = IB_DEV_REGISTERED;
 
+	if (device->enable_driver) {
+		ret = device->enable_driver(device);
+		if (ret)
+			goto sysfs_cleanup;
+	}
+
 	list_for_each_entry(client, &client_list, list)
 		if (!add_client_context(device, client) && client->add)
 			client->add(device);
@@ -558,6 +564,8 @@ int ib_register_device(struct ib_device *device,
 	mutex_unlock(&device_mutex);
 	return 0;
 
+sysfs_cleanup:
+	ib_device_unregister_sysfs(device);
 cg_cleanup:
 	ib_device_unregister_rdmacg(device);
 cache_cleanup:
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index 24a68a9da8be..d568dc12a00b 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -551,27 +551,27 @@ enum rdma_link_layer rxe_link_layer(struct rxe_dev *rxe, unsigned int port_num)
 	return IB_LINK_LAYER_ETHERNET;
 }
 
-struct rxe_dev *rxe_net_add(struct net_device *ndev)
+int rxe_net_add(struct net_device *ndev)
 {
 	int err;
 	struct rxe_dev *rxe = NULL;
 
 	rxe = (struct rxe_dev *)ib_alloc_device(sizeof(*rxe));
 	if (!rxe)
-		return NULL;
+		return -ENOMEM;
 
 	rxe->ndev = ndev;
 
 	err = rxe_add(rxe, ndev->mtu);
 	if (err) {
 		ib_dealloc_device(&rxe->ib_dev);
-		return NULL;
+		return err;
 	}
 
 	spin_lock_bh(&dev_list_lock);
 	list_add_tail(&rxe->list, &rxe_dev_list);
 	spin_unlock_bh(&dev_list_lock);
 
-	return rxe;
+	return 0;
 }
 
 void rxe_remove_all(void)
diff --git a/drivers/infiniband/sw/rxe/rxe_net.h b/drivers/infiniband/sw/rxe/rxe_net.h
index 1c06b3bfe1b6..0a889f21f64d 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.h
+++ b/drivers/infiniband/sw/rxe/rxe_net.h
@@ -47,7 +47,7 @@ extern struct rxe_recv_sockets recv_sockets;
 extern struct notifier_block rxe_net_notifier;
 void rxe_release_udp_tunnel(struct socket *sk);
 
-struct rxe_dev *rxe_net_add(struct net_device *ndev);
+int rxe_net_add(struct net_device *ndev);
 
 int rxe_net_init(void);
 void rxe_net_exit(void);
diff --git a/drivers/infiniband/sw/rxe/rxe_sysfs.c b/drivers/infiniband/sw/rxe/rxe_sysfs.c
index d5ed7571128f..08c9e9539977 100644
--- a/drivers/infiniband/sw/rxe/rxe_sysfs.c
+++ b/drivers/infiniband/sw/rxe/rxe_sysfs.c
@@ -75,7 +75,6 @@ static int rxe_param_set_add(const char *val, const struct kernel_param *kp)
 	int err = 0;
 	char intf[32];
 	struct net_device *ndev = NULL;
-	struct rxe_dev *rxe;
 
 	len = sanitize_arg(val, intf, sizeof(intf));
 	if (!len) {
@@ -97,15 +96,12 @@ static int rxe_param_set_add(const char *val, const struct kernel_param *kp)
 		goto err;
 	}
 
-	rxe = rxe_net_add(ndev);
-	if (!rxe) {
+	err = rxe_net_add(ndev);
+	if (err) {
 		pr_err("failed to add %s\n", intf);
-		err = -EINVAL;
 		goto err;
 	}
 
-	rxe_set_port_state(ndev);
-	pr_info("added %s to %s\n", rxe->ib_dev.name, intf);
 err:
 	if (ndev)
 		dev_put(ndev);
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index f9cec72273cd..72f65a5a4ee2 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -1188,6 +1188,15 @@ static struct device_attribute *rxe_dev_attributes[] = {
 	&dev_attr_parent,
 };
 
+static int rxe_enable_driver(struct ib_device *ib_dev)
+{
+	struct rxe_dev *rxe = container_of(ib_dev, struct rxe_dev, ib_dev);
+
+	rxe_set_port_state(rxe);
+	dev_info(&rxe->ib_dev.dev, "added %s\n", netdev_name(rxe->ndev));
+	return 0;
+}
+
 int rxe_register_device(struct rxe_dev *rxe)
 {
 	int err;
@@ -1290,6 +1299,7 @@ int rxe_register_device(struct rxe_dev *rxe)
 	dev->detach_mcast = rxe_detach_mcast;
 	dev->get_hw_stats = rxe_ib_get_hw_stats;
 	dev->alloc_hw_stats = rxe_ib_alloc_hw_stats;
+	dev->enable_driver = rxe_enable_driver;
 
 	tfm = crypto_alloc_shash("crc32", 0, 0);
 	if (IS_ERR(tfm)) {
@@ -1321,6 +1331,10 @@ int rxe_register_device(struct rxe_dev *rxe)
 
 err1:
 	crypto_free_shash(rxe->tfm);
+	/*
+	 * Note that rxe may be invalid at this point if another thread
+	 * unregistered it.
+	 */
 	return err;
 }
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index ad7547c7ffc2..a9a4e15fd0ad 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2317,6 +2317,11 @@ struct ib_device {
 						struct ib_rwq_ind_table_init_attr *init_attr,
 						struct ib_udata *udata);
 	int (*destroy_rwq_ind_table)(struct ib_rwq_ind_table *wq_ind_table);
+	/*
+	 * Called after the device becomes registered, before clients are
+	 * attached
+	 */
+	int (*enable_driver)(struct ib_device *device);
 	/**
 	 * rdma netdev operation
 	 */
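To make review easier, here is the contract the new hook establishes,
condensed into a short illustrative sketch. It is not additional code to
apply: example_enable_driver() is a hypothetical name, while
ib_register_device(), ->enable_driver, and rxe_set_port_state() come from
the patch above.

/*
 * Sketch only: post-registration setup moves into ->enable_driver(),
 * which the core calls from ib_register_device() while it still holds
 * enough locking that a concurrent unregistration cannot free the
 * device underneath the callback.
 */
static int example_enable_driver(struct ib_device *ib_dev)
{
	/* ib_dev is guaranteed to stay registered for the duration. */
	struct rxe_dev *rxe = container_of(ib_dev, struct rxe_dev, ib_dev);

	rxe_set_port_state(rxe);
	return 0;
}

Previously rxe_param_set_add() did this work (and logged
rxe->ib_dev.name) after rxe_net_add() had returned, when another thread
could already have unregistered and freed the device; moving it into the
callback is what closes the window.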
From patchwork Tue Aug 25 18:12:27 2020
X-Patchwork-Submitter: Marcelo Henrique Cerri
X-Patchwork-Id: 1351214
From: Marcelo Henrique Cerri
To: kernel-team@lists.ubuntu.com
Subject: [bionic:linux-azure-4.15][PATCH 2/3] IB/mlx5: Align usage of QP1
 create flags with rest of mlx5 defines
Date: Tue, 25 Aug 2020 15:12:27 -0300
Message-Id: <20200825181228.3274075-3-marcelo.cerri@canonical.com>
In-Reply-To: <20200825181228.3274075-1-marcelo.cerri@canonical.com>
References: <20200825180923.3254380-1-marcelo.cerri@canonical.com>
 <20200825181228.3274075-1-marcelo.cerri@canonical.com>

From: Michael Guralnik

BugLink: https://bugs.launchpad.net/bugs/1874503

There is little value in keeping a separate function for one flag;
provide it directly, like any other mlx5 define.

Link: https://lore.kernel.org/r/20191020064400.8344-2-leon@kernel.org
Signed-off-by: Michael Guralnik
Signed-off-by: Leon Romanovsky
Reviewed-by: Jason Gunthorpe
Signed-off-by: Jason Gunthorpe
(cherry picked from commit 3f89b01f4bbab2dcd1a6e7e31e64faacb448e3be)
Signed-off-by: Sultan Alsawaf
Signed-off-by: Marcelo Henrique Cerri
---
 drivers/infiniband/hw/mlx5/gsi.c     | 2 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h | 7 +------
 drivers/infiniband/hw/mlx5/qp.c      | 8 ++++----
 3 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/gsi.c b/drivers/infiniband/hw/mlx5/gsi.c
index 262c18b2f525..153493b56a69 100644
--- a/drivers/infiniband/hw/mlx5/gsi.c
+++ b/drivers/infiniband/hw/mlx5/gsi.c
@@ -263,7 +263,7 @@ static struct ib_qp *create_gsi_ud_qp(struct mlx5_ib_gsi_qp *gsi)
 		},
 		.sq_sig_type = gsi->sq_sig_type,
 		.qp_type = IB_QPT_UD,
-		.create_flags = mlx5_ib_create_qp_sqpn_qp1(),
+		.create_flags = MLX5_IB_QP_CREATE_SQPN_QP1,
 	};
 
 	return ib_create_qp(pd, &init_attr);
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index c182784966a5..e941ea5a1f06 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -218,12 +218,7 @@ struct mlx5_ib_flow_db {
  * These flags are intended for internal use by the mlx5_ib driver, and they
  * rely on the range reserved for that use in the ib_qp_create_flags enum.
  */
-
-/* Create a UD QP whose source QP number is 1 */
-static inline enum ib_qp_create_flags mlx5_ib_create_qp_sqpn_qp1(void)
-{
-	return IB_QP_CREATE_RESERVED_START;
-}
+#define MLX5_IB_QP_CREATE_SQPN_QP1	IB_QP_CREATE_RESERVED_START
 
 struct wr_list {
 	u16	opcode;
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index daed5441c9c4..3c2ac2baa858 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -908,7 +908,7 @@ static int create_kernel_qp(struct mlx5_ib_dev *dev,
 					IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK |
 					IB_QP_CREATE_IPOIB_UD_LSO |
 					IB_QP_CREATE_NETIF_QP |
-					mlx5_ib_create_qp_sqpn_qp1()))
+					MLX5_IB_QP_CREATE_SQPN_QP1))
 		return -EINVAL;
 
 	if (init_attr->qp_type == MLX5_IB_QPT_REG_UMR)
@@ -955,7 +955,7 @@ static int create_kernel_qp(struct mlx5_ib_dev *dev,
 	MLX5_SET(qpc, qpc, fre, 1);
 	MLX5_SET(qpc, qpc, rlky, 1);
 
-	if (init_attr->create_flags & mlx5_ib_create_qp_sqpn_qp1()) {
+	if (init_attr->create_flags & MLX5_IB_QP_CREATE_SQPN_QP1) {
 		MLX5_SET(qpc, qpc, deth_sqpn, 1);
 		qp->flags |= MLX5_IB_QP_SQPN_QP1;
 	}
@@ -1704,7 +1704,7 @@ static int create_qp_common(struct mlx5_ib_dev *dev, struct ib_pd *pd,
 			return -EINVAL;
 		}
 		if (init_attr->create_flags &
-		    mlx5_ib_create_qp_sqpn_qp1()) {
+		    MLX5_IB_QP_CREATE_SQPN_QP1) {
 			mlx5_ib_dbg(dev, "user-space is not allowed to create UD QPs spoofing as QP1\n");
 			return -EINVAL;
 		}
@@ -4707,7 +4707,7 @@ int mlx5_ib_query_qp(struct ib_qp *ibqp, struct ib_qp_attr *qp_attr,
 	if (qp->flags & MLX5_IB_QP_MANAGED_RECV)
 		qp_init_attr->create_flags |= IB_QP_CREATE_MANAGED_RECV;
 	if (qp->flags & MLX5_IB_QP_SQPN_QP1)
-		qp_init_attr->create_flags |= mlx5_ib_create_qp_sqpn_qp1();
+		qp_init_attr->create_flags |= MLX5_IB_QP_CREATE_SQPN_QP1;
 
 	qp_init_attr->sq_sig_type = qp->sq_signal_bits & MLX5_WQE_CTRL_CQ_UPDATE ?
		IB_SIGNAL_ALL_WR : IB_SIGNAL_REQ_WR;
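One note on the define itself, since the next patch in this series stacks
a second flag on top of it: as the mlx5_ib.h comment says, these
driver-internal create flags live in the range of ib_qp_create_flags that
the core reserves for drivers, one bit per flag, starting at
IB_QP_CREATE_RESERVED_START. An illustrative sketch, not patch content:

/* Illustration: how the reserved range is consumed.  The first define
 * is this patch; the second is added by patch 3/3 of this series.
 * Because these bits alias the core's reserved range, user-space must
 * never be able to set them, which is why create_qp_common() rejects
 * MLX5_IB_QP_CREATE_SQPN_QP1 for user QPs in the hunk above.
 */
#define MLX5_IB_QP_CREATE_SQPN_QP1	IB_QP_CREATE_RESERVED_START
#define MLX5_IB_QP_CREATE_WC_TEST	(IB_QP_CREATE_RESERVED_START << 1)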
From patchwork Tue Aug 25 18:12:28 2020
X-Patchwork-Submitter: Marcelo Henrique Cerri
X-Patchwork-Id: 1351215
From: Marcelo Henrique Cerri
To: kernel-team@lists.ubuntu.com
Subject: [bionic:linux-azure-4.15][PATCH 3/3] IB/mlx5: Test write combining
 support
Date: Tue, 25 Aug 2020 15:12:28 -0300
Message-Id: <20200825181228.3274075-4-marcelo.cerri@canonical.com>
In-Reply-To: <20200825181228.3274075-1-marcelo.cerri@canonical.com>
References: <20200825180923.3254380-1-marcelo.cerri@canonical.com>
 <20200825181228.3274075-1-marcelo.cerri@canonical.com>

From: Michael Guralnik

BugLink: https://bugs.launchpad.net/bugs/1874503

Linux can run on all sorts of physical machines and VMs where write
combining may or may not be supported. Currently there is no way to
reliably tell whether the system supports WC. The driver uses WC to
optimize posting work to the HCA, and getting this wrong in either
direction can cause a significant performance loss.

Add a test to the mlx5_ib initialization process to check whether
write combining is supported on the machine. The test runs as part of
the enable_driver callback, so that it runs after the device is set up
and can create and modify the QP it needs, but before the device is
exposed to users.

The test opens a UD QP and posts NOP WQEs; the WQE written to the
BlueFlame differs from the WQE in memory, and a CQE is requested only
on the BlueFlame WQE. By checking whether we received a completion on
one of these WQEs, we can tell whether the BlueFlame write succeeded
and thus whether write combining is supported.

Change the reporting of BlueFlame support to depend on write-combining
support instead of the FW's guess as to what the machine can do.

Link: https://lore.kernel.org/r/20191027062234.10993-1-leon@kernel.org
Signed-off-by: Michael Guralnik
Signed-off-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe
(backported from commit 11f552e21755cb6f804572243a1502b6bbd008dd)
Signed-off-by: Sultan Alsawaf
Signed-off-by: Marcelo Henrique Cerri
---
 drivers/infiniband/hw/mlx5/main.c    |  15 +-
 drivers/infiniband/hw/mlx5/mem.c     | 197 +++++++++++++++++++++++++++
 drivers/infiniband/hw/mlx5/mlx5_ib.h |   5 +
 drivers/infiniband/hw/mlx5/qp.c      |   6 +-
 4 files changed, 220 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 4176d2b5b62e..c4f389edd2d2 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1397,7 +1397,7 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct ib_device *ibdev,
 		return ERR_PTR(-EINVAL);
 
 	resp.qp_tab_size = 1 << MLX5_CAP_GEN(dev->mdev, log_max_qp);
-	if (mlx5_core_is_pf(dev->mdev) && MLX5_CAP_GEN(dev->mdev, bf))
+	if (dev->wc_support)
 		resp.bf_reg_size = 1 << MLX5_CAP_GEN(dev->mdev, log_bf_reg_size);
 	resp.cache_line_size = cache_line_size();
 	resp.max_sq_desc_sz = MLX5_CAP_GEN(dev->mdev, max_wqe_sz_sq);
@@ -4072,6 +4072,7 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 	dev->ib_dev.get_port_immutable = mlx5_port_immutable;
 	dev->ib_dev.get_dev_fw_str = get_dev_fw_str;
 	dev->ib_dev.get_vector_affinity = mlx5_ib_get_vector_affinity;
+	dev->ib_dev.enable_driver = mlx5_ib_enable_driver;
 	if (MLX5_CAP_GEN(mdev, ipoib_enhanced_offloads))
 		dev->ib_dev.alloc_rdma_netdev = mlx5_ib_alloc_rdma_netdev;
 
@@ -4244,6 +4245,18 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 	return NULL;
 }
 
+int mlx5_ib_enable_driver(struct ib_device *dev)
+{
+	struct mlx5_ib_dev *mdev = to_mdev(dev);
+	int ret;
+
+	ret = mlx5_ib_test_wc(mdev);
+	mlx5_ib_dbg(mdev, "Write-Combining %s", mdev->wc_support ?
"supported" : "not supported"); + + return ret; +} + static void mlx5_ib_remove(struct mlx5_core_dev *mdev, void *context) { struct mlx5_ib_dev *dev = context; diff --git a/drivers/infiniband/hw/mlx5/mem.c b/drivers/infiniband/hw/mlx5/mem.c index f3dbd75a0a96..53d91073e999 100644 --- a/drivers/infiniband/hw/mlx5/mem.c +++ b/drivers/infiniband/hw/mlx5/mem.c @@ -34,6 +34,7 @@ #include #include #include "mlx5_ib.h" +#include /* @umem: umem object to scan * @addr: ib virtual address requested by the user @@ -232,3 +233,199 @@ int mlx5_ib_get_buf_offset(u64 addr, int page_shift, u32 *offset) *offset = buf_off >> ilog2(off_size); return 0; } + +#define WR_ID_BF 0xBF +#define WR_ID_END 0xBAD +#define TEST_WC_NUM_WQES 255 +#define TEST_WC_POLLING_MAX_TIME_JIFFIES msecs_to_jiffies(100) +static int post_send_nop(struct mlx5_ib_dev *dev, struct ib_qp *ibqp, u64 wr_id, + bool signaled) +{ + struct mlx5_ib_qp *qp = to_mqp(ibqp); + struct mlx5_wqe_ctrl_seg ctrl = {}; + struct mlx5_bf *bf = &qp->bf; + __be32 mmio_wqe[16] = {}; + unsigned long flags; + unsigned int idx; + int i; + + if (unlikely(dev->mdev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR)) + return -EIO; + + spin_lock_irqsave(&qp->sq.lock, flags); + + idx = qp->sq.cur_post & (qp->sq.wqe_cnt - 1); + + ctrl.fm_ce_se = signaled ? MLX5_WQE_CTRL_CQ_UPDATE : 0; + ctrl.opmod_idx_opcode = + cpu_to_be32(((u32)(qp->sq.cur_post) << 8) | MLX5_OPCODE_NOP); + ctrl.qpn_ds = cpu_to_be32((sizeof(struct mlx5_wqe_ctrl_seg) / 16) | + (qp->trans_qp.base.mqp.qpn << 8)); + + qp->sq.wrid[idx] = wr_id; + qp->sq.w_list[idx].opcode = MLX5_OPCODE_NOP; + qp->sq.wqe_head[idx] = qp->sq.head + 1; + qp->sq.cur_post += DIV_ROUND_UP(sizeof(struct mlx5_wqe_ctrl_seg), + MLX5_SEND_WQE_BB); + qp->sq.w_list[idx].next = qp->sq.cur_post; + qp->sq.head++; + + memcpy(mmio_wqe, &ctrl, sizeof(ctrl)); + ((struct mlx5_wqe_ctrl_seg *)&mmio_wqe)->fm_ce_se |= + MLX5_WQE_CTRL_CQ_UPDATE; + + /* Make sure that descriptors are written before + * updating doorbell record and ringing the doorbell + */ + wmb(); + + qp->db.db[MLX5_SND_DBR] = cpu_to_be32(qp->sq.cur_post); + + /* Make sure doorbell record is visible to the HCA before + * we hit doorbell + */ + wmb(); + for (i = 0; i < 8; i++) + mlx5_write64(&mmio_wqe[i * 2], + bf->bfreg->map + bf->offset + i * 8, NULL); + + bf->offset ^= bf->buf_size; + + spin_unlock_irqrestore(&qp->sq.lock, flags); + + return 0; +} + +static int test_wc_poll_cq_result(struct mlx5_ib_dev *dev, struct ib_cq *cq) +{ + int ret; + struct ib_wc wc = {}; + unsigned long end = jiffies + TEST_WC_POLLING_MAX_TIME_JIFFIES; + + do { + ret = ib_poll_cq(cq, 1, &wc); + if (ret < 0 || wc.status) + return ret < 0 ? 
+				       ret : -EINVAL;
+		if (ret)
+			break;
+	} while (!time_after(jiffies, end));
+
+	if (!ret)
+		return -ETIMEDOUT;
+
+	if (wc.wr_id != WR_ID_BF)
+		ret = 0;
+
+	return ret;
+}
+
+static int test_wc_do_send(struct mlx5_ib_dev *dev, struct ib_qp *qp)
+{
+	int err, i;
+
+	for (i = 0; i < TEST_WC_NUM_WQES; i++) {
+		err = post_send_nop(dev, qp, WR_ID_BF, false);
+		if (err)
+			return err;
+	}
+
+	return post_send_nop(dev, qp, WR_ID_END, true);
+}
+
+int mlx5_ib_test_wc(struct mlx5_ib_dev *dev)
+{
+	struct ib_cq_init_attr cq_attr = { .cqe = TEST_WC_NUM_WQES + 1 };
+	int port_type_cap = MLX5_CAP_GEN(dev->mdev, port_type);
+	struct ib_qp_init_attr qp_init_attr = {
+		.cap = { .max_send_wr = TEST_WC_NUM_WQES },
+		.qp_type = IB_QPT_UD,
+		.sq_sig_type = IB_SIGNAL_REQ_WR,
+		.create_flags = MLX5_IB_QP_CREATE_WC_TEST,
+	};
+	struct ib_qp_attr qp_attr = { .port_num = 1 };
+	struct ib_device *ibdev = &dev->ib_dev;
+	struct ib_qp *qp;
+	struct ib_cq *cq;
+	struct ib_pd *pd;
+	int ret;
+
+	if (!MLX5_CAP_GEN(dev->mdev, bf))
+		return 0;
+
+	if (!dev->mdev->roce.roce_en &&
+	    port_type_cap == MLX5_CAP_PORT_TYPE_ETH) {
+		if (mlx5_core_is_pf(dev->mdev))
+			dev->wc_support = true;
+		return 0;
+	}
+
+	ret = mlx5_alloc_bfreg(dev->mdev, &dev->wc_bfreg, true, false);
+	if (ret)
+		goto print_err;
+
+	if (!dev->wc_bfreg.wc)
+		goto out1;
+
+	pd = ib_alloc_pd(ibdev, 0);
+	if (IS_ERR(pd)) {
+		ret = PTR_ERR(pd);
+		goto out1;
+	}
+
+	cq = ib_create_cq(ibdev, NULL, NULL, NULL, &cq_attr);
+	if (IS_ERR(cq)) {
+		ret = PTR_ERR(cq);
+		goto out2;
+	}
+
+	qp_init_attr.recv_cq = cq;
+	qp_init_attr.send_cq = cq;
+	qp = ib_create_qp(pd, &qp_init_attr);
+	if (IS_ERR(qp)) {
+		ret = PTR_ERR(qp);
+		goto out3;
+	}
+
+	qp_attr.qp_state = IB_QPS_INIT;
+	ret = ib_modify_qp(qp, &qp_attr,
+			   IB_QP_STATE | IB_QP_PORT | IB_QP_PKEY_INDEX |
+			   IB_QP_QKEY);
+	if (ret)
+		goto out4;
+
+	qp_attr.qp_state = IB_QPS_RTR;
+	ret = ib_modify_qp(qp, &qp_attr, IB_QP_STATE);
+	if (ret)
+		goto out4;
+
+	qp_attr.qp_state = IB_QPS_RTS;
+	ret = ib_modify_qp(qp, &qp_attr, IB_QP_STATE | IB_QP_SQ_PSN);
+	if (ret)
+		goto out4;
+
+	ret = test_wc_do_send(dev, qp);
+	if (ret < 0)
+		goto out4;
+
+	ret = test_wc_poll_cq_result(dev, cq);
+	if (ret > 0) {
+		dev->wc_support = true;
+		ret = 0;
+	}
+
+out4:
+	ib_destroy_qp(qp);
+out3:
+	ib_destroy_cq(cq);
+out2:
+	ib_dealloc_pd(pd);
+out1:
+	mlx5_free_bfreg(dev->mdev, &dev->wc_bfreg);
+print_err:
+	if (ret)
+		mlx5_ib_err(
+			dev,
+			"Error %d while trying to test write-combining support\n",
+			ret);
+	return ret;
+}
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index e941ea5a1f06..b8447a857b14 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -219,6 +219,7 @@ struct mlx5_ib_flow_db {
  * rely on the range reserved for that use in the ib_qp_create_flags enum.
  */
 #define MLX5_IB_QP_CREATE_SQPN_QP1	IB_QP_CREATE_RESERVED_START
+#define MLX5_IB_QP_CREATE_WC_TEST	(IB_QP_CREATE_RESERVED_START << 1)
 
 struct wr_list {
 	u16	opcode;
@@ -714,6 +715,7 @@ struct mlx5_ib_dev {
 	 */
 	struct mutex		cap_mask_mutex;
 	bool			ib_active;
+	u8			wc_support:1;
 
 	struct umr_common	umrc;
 	/* sync used page count stats */
@@ -740,6 +742,7 @@ struct mlx5_ib_dev {
 	/* Array with num_ports elements */
 	struct mlx5_ib_port	*port;
 	struct mlx5_sq_bfreg	bfreg;
+	struct mlx5_sq_bfreg	wc_bfreg;
 	struct mlx5_sq_bfreg	fp_bfreg;
 	struct mlx5_ib_delay_drop	delay_drop;
 	struct mlx5_ib_dbg_cc_params	*dbg_cc_params;
@@ -1115,4 +1118,6 @@ static inline int get_num_uars(struct mlx5_ib_dev *dev,
 	return get_uars_per_sys_page(dev, bfregi->lib_uar_4k) *
 		bfregi->num_sys_pages;
 }
+int mlx5_ib_enable_driver(struct ib_device *dev);
+int mlx5_ib_test_wc(struct mlx5_ib_dev *dev);
 #endif /* MLX5_IB_H */
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 3c2ac2baa858..ff4eeac25348 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -908,11 +908,14 @@ static int create_kernel_qp(struct mlx5_ib_dev *dev,
 					IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK |
 					IB_QP_CREATE_IPOIB_UD_LSO |
 					IB_QP_CREATE_NETIF_QP |
-					MLX5_IB_QP_CREATE_SQPN_QP1))
+					MLX5_IB_QP_CREATE_SQPN_QP1 |
+					MLX5_IB_QP_CREATE_WC_TEST))
 		return -EINVAL;
 
 	if (init_attr->qp_type == MLX5_IB_QPT_REG_UMR)
 		qp->bf.bfreg = &dev->fp_bfreg;
+	else if (init_attr->create_flags & MLX5_IB_QP_CREATE_WC_TEST)
+		qp->bf.bfreg = &dev->wc_bfreg;
 	else
 		qp->bf.bfreg = &dev->bfreg;
 
@@ -4276,7 +4279,6 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 	 * we hit doorbell
 	 */
 	wmb();
-	/* currently we support only regular doorbells */
 	mlx5_write64((__be32 *)ctrl, bf->bfreg->map + bf->offset, NULL);
 	/* Make sure doorbells don't leak out of SQ spinlock
 	 * and reach the HCA out of order.
 	 */
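For reviewers who do not want to trace the whole QP setup above, the
detection logic reduces to the sketch below. It is illustrative only:
example_wc_detected() is a hypothetical wrapper, while post_send_nop(),
test_wc_poll_cq_result(), and the WR_ID_* constants come from the patch.
The key detail sits in post_send_nop(): the copy of each WQE pushed
through the BlueFlame (write-combining) register always requests a CQE,
while the copy in ordinary memory does so only for the final, signaled
WQE.

/*
 * Sketch only (error handling elided).  Unsignaled NOPs are pushed
 * through the BlueFlame register; their memory copies request no CQE,
 * so a completion with wr_id == WR_ID_BF can only come from a
 * BlueFlame write that reached the HCA intact.  If the first
 * completion seen is the signaled WR_ID_END marker instead, the WC
 * writes were lost and wc_support stays off.
 */
static bool example_wc_detected(struct mlx5_ib_dev *dev, struct ib_qp *qp,
				struct ib_cq *cq)
{
	int i;

	for (i = 0; i < TEST_WC_NUM_WQES; i++)
		post_send_nop(dev, qp, WR_ID_BF, false);	/* CQE via BlueFlame only */
	post_send_nop(dev, qp, WR_ID_END, true);		/* always completes */

	return test_wc_poll_cq_result(dev, cq) > 0;
}

Note also the early-exit paths in mlx5_ib_test_wc(): without BlueFlame
capability the test is skipped with wc_support left off, and on an
Ethernet port with RoCE disabled the PF check stands in for the test,
which matches the old PF-plus-bf reporting that the main.c hunk replaces.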