From patchwork Wed Mar 25 10:41:17 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Frederic Barrat
X-Patchwork-Id: 1261325
From: Frederic Barrat
To: mikey@neuling.org, oohall@gmail.com, skiboot@lists.ozlabs.org
Cc: clombard@linux.ibm.com, skiboot-stable@lists.ozlabs.org,
 andrew.donnellan@au1.ibm.com
Date: Wed, 25 Mar 2020 11:41:17 +0100
X-Mailer: git-send-email 2.25.1
Message-Id: <20200325104117.64381-1-fbarrat@linux.ibm.com>
Subject: [Skiboot] [PATCH v3] hw/phb4: Tune GPU direct performance on
 witherspoon in PCI mode
List-Id: Mailing list for skiboot development

Good GPU direct performance on witherspoon, with a Mellanox adapter
on the shared slot, requires reallocating some dma engines within
PEC2, "stealing" some from PHB4&5 and giving extras to PHB3. This is
currently done only when using CAPI mode, but the same is true if the
adapter stays in PCI mode.

In preparation for upcoming versions of MOFED, which may not use CAPI
mode, this patch reallocates dma engines even in PCI mode for a
series of Mellanox adapters that can be used with GPU direct, on
witherspoon and on the shared slot only.

The loss of dma engines for PHB4&5 on witherspoon has not shown
problems in testing, nor in current deployments where CAPI mode is
used.

Here is a comparison of the bandwidth numbers seen with the PHB in
PCI mode (no CAPI) with and without this patch. Variations on smaller
packet sizes can be attributed to jitter and are not that meaningful.

# OSU MPI-CUDA Bi-Directional Bandwidth Test v5.6.1
# Send Buffer on DEVICE (D) and Receive Buffer on DEVICE (D)
# Size      Bandwidth (MB/s)    Bandwidth (MB/s)
#           with patch          without patch
1                       1.29                1.48
2                       2.66                3.04
4                       5.34                5.93
8                      10.68               11.86
16                     21.39               23.71
32                     42.78               49.15
64                     85.43               97.67
128                   170.82              196.64
256                   385.47              383.02
512                   774.68              755.54
1024                 1535.14             1495.30
2048                 2599.31             2561.60
4096                 5192.31             5092.47
8192                 9930.30             9566.90
16384               18189.81            16803.42
32768               24671.48            21383.57
65536               28977.71            24104.50
131072              31110.55            25858.95
262144              32180.64            26470.61
524288              32842.23            26961.93
1048576             33184.87            27217.38
2097152             33342.67            27338.08

Signed-off-by: Frederic Barrat
Cc: skiboot-stable@lists.ozlabs.org # skiboot-op940.x
Reviewed-by: Andrew Donnellan
---

This is meant to be part of the next fix pack for witherspoon, so
cc'ing stable

Changelog:
v3: skip virtual PHBs when looking for Mellanox cards (Oliver)
v2: use same code from capi setup (Andrew)
    trigger setup from witherspoon platform file (Oliver)

 hw/phb4.c                      | 53 +++++++++++++++++++---------------
 include/phb4.h                 |  2 ++
 platforms/astbmc/witherspoon.c | 47 ++++++++++++++++++++++++++++++
 3 files changed, 78 insertions(+), 24 deletions(-)

diff --git a/hw/phb4.c b/hw/phb4.c
index f17625bb..60e797cf 100644
--- a/hw/phb4.c
+++ b/hw/phb4.c
@@ -809,6 +809,33 @@ static int64_t phb4_pcicfg_no_dstate(void *dev __unused,
 	return OPAL_PARTIAL;
 }
 
+void phb4_pec2_dma_engine_realloc(struct phb4 *p)
+{
+	uint64_t reg;
+
+	/*
+	 * Allocate 16 extra dma read engines to stack 0, to boost dma
+	 * performance for devices on stack 0 of PEC2, i.e PHB3.
+	 * It comes at a price of reduced read engine allocation for
+	 * devices on stack 1 and 2. The engine allocation becomes
+	 * 48/8/8 instead of the default 32/16/16.
+	 *
+	 * The reallocation magic value should be 0xffff0000ff008000,
+	 * but per the PCI designers, dma engine 32 (bit 0) has a
+	 * quirk, and 0x7fff80007F008000 has the same effect (engine
+	 * 32 goes to PHB4).
+	 */
+	if (p->index != 3) /* shared slot on PEC2 */
+		return;
+
+	PHBINF(p, "Allocating an extra 16 dma read engines on PEC2 stack0\n");
+	reg = 0x7fff80007F008000ULL;
+	xscom_write(p->chip_id,
+		    p->pci_xscom + XPEC_PCI_PRDSTKOVR, reg);
+	xscom_write(p->chip_id,
+		    p->pe_xscom + XPEC_NEST_READ_STACK_OVERRIDE, reg);
+}
+
 static void phb4_check_device_quirks(struct pci_device *dev)
 {
 	/* Some special adapter tweaks for devices directly under the PHB */
@@ -4415,30 +4442,8 @@ static int64_t enable_capi_mode(struct phb4 *p, uint64_t pe_number,
 	 * dma-read engines allocations to maximize the DMA read performance
 	 */
 	if ((p->index == CAPP1_PHB_INDEX) &&
-	    (capp_eng & CAPP_MAX_DMA_READ_ENGINES)) {
-
-		/*
-		 * Allocate Additional 16/8 dma read engines to stack0/stack1
-		 * respectively. Read engines 0:31 are anyways always assigned
-		 * to stack0. Also skip allocating DMA Read Engine-32 by
-		 * enabling Bit[0] in XPEC_NEST_READ_STACK_OVERRIDE register.
-		 * Enabling this bit seems cause a parity error reported in
-		 * NFIR[1]-nonbar_pe.
-		 */
-		reg = 0x7fff80007F008000ULL;
-
-		xscom_write(p->chip_id, p->pci_xscom + XPEC_PCI_PRDSTKOVR, reg);
-		xscom_write(p->chip_id, p->pe_xscom +
-			    XPEC_NEST_READ_STACK_OVERRIDE, reg);
-
-		/* Log this reallocation as it may impact dma performance of
-		 * other slots connected to PEC2
-		 */
-		PHBINF(p, "CAPP: Set %d dma-read engines for PEC2/stack-0\n",
-		       32 + __builtin_popcountll(reg & PPC_BITMASK(0, 31)));
-		PHBDBG(p, "CAPP: XPEC_NEST_READ_STACK_OVERRIDE: %016llx\n",
-		       reg);
-	}
+	    (capp_eng & CAPP_MAX_DMA_READ_ENGINES))
+		phb4_pec2_dma_engine_realloc(p);
 
 	/* PCI to PB data movement ignores the PB init signal. */
 	xscom_write_mask(p->chip_id, p->pe_xscom + XPEC_NEST_PBCQ_HW_CONFIG,
diff --git a/include/phb4.h b/include/phb4.h
index 6d5fd510..abba2d9c 100644
--- a/include/phb4.h
+++ b/include/phb4.h
@@ -257,4 +257,6 @@ static inline int phb4_get_opal_id(unsigned int chip_id, unsigned int index)
 	return chip_id * PHB4_MAX_PHBS_PER_CHIP_P9P + index;
 }
 
+void phb4_pec2_dma_engine_realloc(struct phb4 *p);
+
 #endif /* __PHB4_H */
diff --git a/platforms/astbmc/witherspoon.c b/platforms/astbmc/witherspoon.c
index 6387af48..39c3f161 100644
--- a/platforms/astbmc/witherspoon.c
+++ b/platforms/astbmc/witherspoon.c
@@ -192,6 +192,52 @@ static void witherspoon_shared_slot_fixup(void)
 	}
 }
 
+static int check_mlx_cards(struct phb *phb __unused, struct pci_device *dev,
+			   void *userdata __unused)
+{
+	uint16_t mlx_cards[] = {
+		0x1017, /* ConnectX-5 */
+		0x1019, /* ConnectX-5 Ex */
+		0x101b, /* ConnectX-6 */
+		0x101d, /* ConnectX-6 Dx */
+		0x101f, /* ConnectX-6 Lx */
+		0x1021, /* ConnectX-7 */
+	};
+
+	if (PCI_VENDOR_ID(dev->vdid) == 0x15b3) { /* Mellanox */
+		for (int i = 0; i < ARRAY_SIZE(mlx_cards); i++) {
+			if (mlx_cards[i] == PCI_DEVICE_ID(dev->vdid))
+				return 1;
+		}
+	}
+	return 0;
+}
+
+static void witherspoon_pci_probe_complete(void)
+{
+	struct pci_device *dev;
+	struct phb *phb;
+	struct phb4 *p;
+
+	/*
+	 * Reallocate dma engines between stacks in PEC2 if a Mellanox
+	 * card is found on the shared slot, as it is required to get
+	 * good GPU direct performance.
+	 */
+	for_each_phb(phb) {
+		/* skip the virtual PHBs */
+		if (phb->phb_type != phb_type_pcie_v4)
+			continue;
+		p = phb_to_phb4(phb);
+		/* Keep only the first PHB on PEC2 */
+		if (p->index != 3)
+			continue;
+		dev = pci_walk_dev(phb, NULL, check_mlx_cards, NULL);
+		if (dev)
+			phb4_pec2_dma_engine_realloc(p);
+	}
+}
+
 static void set_link_details(struct npu2 *npu, uint32_t link_index,
 			     uint32_t brick_index, enum npu2_dev_type type)
 {
@@ -533,6 +579,7 @@ DECLARE_PLATFORM(witherspoon) = {
 	.probe			= witherspoon_probe,
 	.init			= astbmc_init,
 	.pre_pci_fixup		= witherspoon_shared_slot_fixup,
+	.pci_probe_complete	= witherspoon_pci_probe_complete,
 	.start_preload_resource	= flash_start_preload_resource,
 	.resource_loaded	= flash_resource_loaded,
 	.bmc			= &bmc_plat_ast2500_openbmc,
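
A note on the "48" in the 48/8/8 split quoted in the new comment: the
PHBINF() call removed from enable_capi_mode() derived the stack 0 count
as 32 plus the number of bits set in bits 0:31 of the override value.
The standalone sketch below reproduces that arithmetic outside skiboot.
It is only an illustration: PPC_BIT and PPC_BITMASK are redefined
locally (assuming IBM MSB-0 bit numbering, as skiboot uses), and
stack0_read_engines() is a hypothetical helper name, not something this
patch adds.

```c
#include <stdint.h>

/*
 * Local re-definitions for this sketch (assumption: they mirror
 * skiboot's helpers). IBM numbering: bit 0 is the most significant bit.
 */
#define PPC_BIT(bit)		(0x8000000000000000ULL >> (bit))
#define PPC_BITMASK(bs, be)	((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))

/*
 * Hypothetical helper: number of dma read engines assigned to stack 0
 * for a given override value, using the formula from the removed
 * PHBINF() call: the 32 engines stack 0 always owns, plus one per bit
 * set in bits 0:31 of the register value.
 */
static int stack0_read_engines(uint64_t reg)
{
	return 32 + __builtin_popcountll(reg & PPC_BITMASK(0, 31));
}
```

Both the value the comment says "should be" used (0xffff0000ff008000)
and the quirk-avoiding one written by the patch (0x7fff80007F008000)
have 16 bits set in bits 0:31, so by this formula
stack0_read_engines() returns 32 + 16 = 48 for either.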