From patchwork Mon Dec 3 06:49:32 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Rashmica Gupta
X-Patchwork-Id: 1006678
From: Rashmica Gupta
To: skiboot@lists.ozlabs.org, aik@ozlabs.ru
Date: Mon, 3 Dec 2018 17:49:32 +1100
Message-Id: <20181203064932.27598-1-rashmica.g@gmail.com>
X-Mailer: git-send-email 2.17.2
Subject: [Skiboot] [PATCH v3] Add purging CPU L2 and L3 caches into NPU hreset.
List-Id: Mailing list for skiboot development
Cc: alistair@popple.id.au, andrew.donnellan@au1.ibm.com

If a GPU is passed through to a guest and the guest unexpectedly terminates,
there can be cache lines in CPUs that belong to the GPU. So purge the caches
as part of the reset sequence. L1 is write-through, so it doesn't need to be
purged.

This also needs to be called if the guest reboots, so call it in
npu2_dev_cfg_exp_devcap().

The sequence to purge the L2 and L3 caches from the hw team:

"L2 purge:
 (1) initiate purge
 putspy pu.ex EXP.L2.L2MISC.L2CERRS.PRD_PURGE_CMD_TYPE L2CAC_FLUSH -all
 putspy pu.ex EXP.L2.L2MISC.L2CERRS.PRD_PURGE_CMD_TRIGGER ON -all

 (2) check this is off in all caches to know purge completed
 getspy pu.ex EXP.L2.L2MISC.L2CERRS.PRD_PURGE_CMD_REG_BUSY -all

 (3) putspy pu.ex EXP.L2.L2MISC.L2CERRS.PRD_PURGE_CMD_TRIGGER OFF -all

L3 purge:
 1) Start the purge:
 putspy pu.ex EXP.L3.L3_MISC.L3CERRS.L3_PRD_PURGE_TTYPE FULL_PURGE -all
 putspy pu.ex EXP.L3.L3_MISC.L3CERRS.L3_PRD_PURGE_REQ ON -all

 2) Ensure that the purge has completed by checking the status bit:
 getspy pu.ex EXP.L3.L3_MISC.L3CERRS.L3_PRD_PURGE_REQ -all

 You should see it say OFF if it's done:
 p9n.ex k0:n0:s0:p00:c0
 EXP.L3.L3_MISC.L3CERRS.L3_PRD_PURGE_REQ
 OFF"

Suggested-by: Alistair Popple
Signed-off-by: Rashmica Gupta
Reviewed-by: Alexey Kardashevskiy
---

This is done synchronously for now as it doesn't seem to take *too* long
(purging the L2 and L3 caches after building the 4.16 Linux kernel on a p9
with 16 cores took 1.57 ms, 1.49 ms and 1.46 ms).

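For reviewers who want to see the L2 handshake quoted above in isolation
(select the purge type, set the trigger, poll the busy bit, clear the
trigger), here is a minimal sketch against a simulated register. It is not
the patch's code: fake_reg, fake_read(), fake_write(), purge() and
demo_purge() are hypothetical names, the BIT64() helper is a local stand-in
for IBM bit numbering, and a poll budget replaces the timebase deadline.
Only the bit positions are taken from the definitions in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* IBM bit numbering: bit 0 is the most significant bit of a 64-bit word. */
#define BIT64(n)	(1ULL << (63 - (n)))

#define CMD_TRIGGER	BIT64(0)
#define CMD_TYPE_MASK	(BIT64(1) | BIT64(2) | BIT64(3) | BIT64(4))
#define CMD_TYPE_FLUSH	0x0ULL
#define CMD_BUSY	0x0040000000000000ULL

/* Hypothetical simulated purge register; no real hardware access. */
struct fake_reg {
	uint64_t val;
	int busy_polls_left;	/* reads that still report BUSY */
};

static uint64_t fake_read(struct fake_reg *r)
{
	if (r->val & CMD_BUSY) {
		if (r->busy_polls_left > 0)
			r->busy_polls_left--;
		else
			r->val &= ~CMD_BUSY;	/* simulated purge finished */
	}
	return r->val;
}

static void fake_write(struct fake_reg *r, uint64_t v)
{
	int rising = (v & CMD_TRIGGER) && !(r->val & CMD_TRIGGER);

	/* BUSY is owned by the simulated hardware, not by the writer. */
	v = (v & ~CMD_BUSY) | (r->val & CMD_BUSY);
	if (rising)
		v |= CMD_BUSY;		/* a new trigger starts a purge */
	r->val = v;
}

/* Returns 0 on completion, -1 if BUSY never cleared within max_polls. */
static int purge(struct fake_reg *r, int max_polls)
{
	uint64_t v;

	assert(r);

	/* (1) select the purge type, then set the trigger */
	v = (fake_read(r) & ~CMD_TYPE_MASK) | CMD_TYPE_FLUSH;
	fake_write(r, v);
	fake_write(r, v | CMD_TRIGGER);

	/* (2) poll until BUSY clears or the budget runs out */
	do {
		if (max_polls-- <= 0)
			return -1;
		v = fake_read(r);
	} while (v & CMD_BUSY);

	/* (3) the trigger bit has to be cleared by software */
	fake_write(r, v & ~CMD_TRIGGER);
	return 0;
}

/* Demo helper: purge a register that stays busy for busy_polls reads. */
static int demo_purge(int busy_polls, int max_polls)
{
	struct fake_reg r = { .val = 0, .busy_polls_left = busy_polls };

	return purge(&r, max_polls);
}
```

In this model a register that stays busy for 3 reads completes with a budget
of 10 polls (demo_purge(3, 10) returns 0) and fails with a budget of 3
(demo_purge(3, 3) returns -1), which corresponds to the timeout path in the
patch.
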
 hw/npu2.c           | 135 +++++++++++++++++++++++++++++++++++++++++++-
 include/npu2-regs.h |  11 ++++
 2 files changed, 145 insertions(+), 1 deletion(-)

diff --git a/hw/npu2.c b/hw/npu2.c
index 30049f5b..9c0e6114 100644
--- a/hw/npu2.c
+++ b/hw/npu2.c
@@ -326,6 +326,136 @@ static int64_t npu2_dev_cfg_bar(void *dev, struct pci_cfg_reg_filter *pcrf,
 	return npu2_cfg_read_bar(ndev, pcrf, offset, len, data);
 }
 
+static int start_l2_purge(uint32_t chip_id, uint32_t core_id)
+{
+	int rc;
+	uint64_t addr = XSCOM_ADDR_P9_EX(core_id, L2_PRD_PURGE_CMD_REG);
+
+	rc = xscom_write_mask(chip_id, addr, L2CAC_FLUSH,
+			L2_PRD_PURGE_CMD_TYPE_MASK);
+	if (!rc)
+		rc = xscom_write_mask(chip_id, addr, L2_PRD_PURGE_CMD_TRIGGER,
+			L2_PRD_PURGE_CMD_TRIGGER);
+	if (rc)
+		prlog(PR_ERR, "PURGE L2 on core 0x%x: XSCOM write_mask "
+		      "failed %i\n", core_id, rc);
+	return rc;
+}
+
+static int wait_l2_purge(uint32_t chip_id, uint32_t core_id)
+{
+	int rc;
+	unsigned long now = mftb();
+	unsigned long end = now + msecs_to_tb(2);
+	uint64_t val = L2_PRD_PURGE_CMD_REG_BUSY;
+	uint64_t addr = XSCOM_ADDR_P9_EX(core_id, L2_PRD_PURGE_CMD_REG);
+
+	while (val & L2_PRD_PURGE_CMD_REG_BUSY) {
+		rc = xscom_read(chip_id, addr, &val);
+		if (rc) {
+			prlog(PR_ERR, "PURGE L2 on core 0x%x: XSCOM read "
+			      "failed %i\n", core_id, rc);
+			break;
+		}
+		if (!(val & L2_PRD_PURGE_CMD_REG_BUSY))
+			break;
+		now = mftb();
+		if (tb_compare(now, end) == TB_AAFTERB) {
+			prlog(PR_ERR, "PURGE L2 on core 0x%x timed out %i\n",
+			      core_id, rc);
+			return OPAL_BUSY;
+		}
+	}
+
+	/* We have to clear the trigger bit ourselves */
+	val &= ~L2_PRD_PURGE_CMD_TRIGGER;
+	rc = xscom_write(chip_id, addr, val);
+	if (rc)
+		prlog(PR_ERR, "PURGE L2 on core 0x%x: XSCOM write failed %i\n",
+		      core_id, rc);
+	return rc;
+}
+
+static int start_l3_purge(uint32_t chip_id, uint32_t core_id)
+{
+	int rc;
+	uint64_t addr = XSCOM_ADDR_P9_EX(core_id, L3_PRD_PURGE_REG);
+
+	rc = xscom_write_mask(chip_id, addr, L3_FULL_PURGE,
+			L3_PRD_PURGE_TTYPE_MASK);
+	if (!rc)
+		rc = xscom_write_mask(chip_id, addr, L3_PRD_PURGE_REQ,
+			L3_PRD_PURGE_REQ);
+	if (rc)
+		prlog(PR_ERR, "PURGE L3 on core 0x%x: XSCOM write_mask "
+		      "failed %i\n", core_id, rc);
+	return rc;
+}
+
+static int wait_l3_purge(uint32_t chip_id, uint32_t core_id)
+{
+	int rc;
+	unsigned long now = mftb();
+	unsigned long end = now + msecs_to_tb(2);
+	uint64_t val = L3_PRD_PURGE_REQ;
+	uint64_t addr = XSCOM_ADDR_P9_EX(core_id, L3_PRD_PURGE_REG);
+
+	/* Trigger bit is automatically set to zero when flushing is done */
+	while (val & L3_PRD_PURGE_REQ) {
+		rc = xscom_read(chip_id, addr, &val);
+		if (rc) {
+			prlog(PR_ERR, "PURGE L3 on core 0x%x: XSCOM read "
+			      "failed %i\n", core_id, rc);
+			break;
+		}
+		if (!(val & L3_PRD_PURGE_REQ))
+			break;
+		now = mftb();
+		if (tb_compare(now, end) == TB_AAFTERB) {
+			prlog(PR_ERR, "PURGE L3 on core 0x%x timed out %i\n",
+			      core_id, rc);
+			return OPAL_BUSY;
+		}
+	}
+	return rc;
+}
+
+static int64_t purge_l2_l3_caches(void)
+{
+	struct cpu_thread *t;
+	uint64_t core_id, prev_core_id = (uint64_t)-1;
+
+	for_each_ungarded_cpu(t) {
+		/* Only need to do it once per core chiplet */
+		core_id = pir_to_core_id(t->pir);
+		if (prev_core_id == core_id)
+			continue;
+		prev_core_id = core_id;
+		if (start_l2_purge(t->chip_id, core_id))
+			goto out;
+		if (start_l3_purge(t->chip_id, core_id))
+			goto out;
+	}
+
+	prev_core_id = (uint64_t)-1;
+	for_each_ungarded_cpu(t) {
+		/* Only need to do it once per core chiplet */
+		core_id = pir_to_core_id(t->pir);
+		if (prev_core_id == core_id)
+			continue;
+		prev_core_id = core_id;
+
+		if (wait_l2_purge(t->chip_id, core_id))
+			goto out;
+		if (wait_l3_purge(t->chip_id, core_id))
+			goto out;
+	}
+	return OPAL_SUCCESS;
+out:
+	prlog(PR_ERR, "Failed on core: 0x%llx\n", core_id);
+	return OPAL_BUSY_EVENT;
+}
+
 static int64_t npu2_dev_cfg_exp_devcap(void *dev,
 		struct pci_cfg_reg_filter *pcrf __unused,
 		uint32_t offset, uint32_t size,
@@ -346,6 +476,9 @@ static int64_t npu2_dev_cfg_exp_devcap(void *dev,
 	if (*data & PCICAP_EXP_DEVCTL_FUNC_RESET)
 		npu2_dev_procedure_reset(ndev);
 
+	if (purge_l2_l3_caches())
+		return OPAL_BUSY_EVENT;
+
 	return OPAL_PARTIAL;
 }
 
@@ -1125,7 +1258,7 @@ static int64_t npu2_hreset(struct pci_slot *slot __unused)
 			reset_ntl(ndev);
 		}
 	}
-	return OPAL_SUCCESS;
+	return purge_l2_l3_caches();
 }
 
 static int64_t npu2_freset(struct pci_slot *slot __unused)
diff --git a/include/npu2-regs.h b/include/npu2-regs.h
index 10a28166..8273b2be 100644
--- a/include/npu2-regs.h
+++ b/include/npu2-regs.h
@@ -756,4 +756,15 @@ void npu2_scom_write(uint64_t gcid, uint64_t scom_base,
 #define OB3_ODL0_ENDPOINT_INFO			0xC010832
 #define OB3_ODL1_ENDPOINT_INFO			0xC010833
 
+/* Registers and bits used to clear the L2 and L3 cache */
+#define L2_PRD_PURGE_CMD_REG			0x1080E
+#define L2_PRD_PURGE_CMD_REG_BUSY		0x0040000000000000
+#define L2_PRD_PURGE_CMD_TYPE_MASK		(PPC_BIT(1) | PPC_BIT(2) | PPC_BIT(3) | PPC_BIT(4))
+#define L2_PRD_PURGE_CMD_TRIGGER		PPC_BIT(0)
+#define L2CAC_FLUSH				0x0
+#define L3_PRD_PURGE_REG			0x1180E
+#define L3_PRD_PURGE_REQ			PPC_BIT(0)
+#define L3_PRD_PURGE_TTYPE_MASK			(PPC_BIT(1) | PPC_BIT(2) | PPC_BIT(3) | PPC_BIT(4))
+#define L3_FULL_PURGE				0x0
+
 #endif /* __NPU2_REGS_H */
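
A note on the bit definitions above, as a small worked example rather than
part of the patch. The sketch assumes the usual IBM numbering in which bit 0
is the most significant bit; PPC_BIT() is redefined locally here as a
stand-in that is assumed to match skiboot's macro, and TRIGGER, TYPE_MASK,
BUSY, ppc_bit_index() and set_type() are hypothetical names used only for
illustration. It shows which hex values the trigger, type mask and busy bits
correspond to, and how a masked update leaves the other bits alone.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in, assumed to match skiboot's PPC_BIT: bit 0 is the MSB. */
#define PPC_BIT(bit)	(0x8000000000000000ULL >> (bit))

#define TRIGGER		PPC_BIT(0)
#define TYPE_MASK	(PPC_BIT(1) | PPC_BIT(2) | PPC_BIT(3) | PPC_BIT(4))
#define BUSY		0x0040000000000000ULL

/* Return the IBM-style index of a single-bit value, or -1 if not one bit. */
static int ppc_bit_index(uint64_t v)
{
	int i;

	for (i = 0; i < 64; i++)
		if (v == PPC_BIT(i))
			return i;
	return -1;
}

/* Replace the purge-type field, leaving every other bit untouched. */
static uint64_t set_type(uint64_t reg, uint64_t type)
{
	/* The new type must fit entirely inside the field. */
	assert((type & ~TYPE_MASK) == 0);
	return (reg & ~TYPE_MASK) | type;
}
```

Under these assumptions TYPE_MASK is 0x7800000000000000, the busy value
0x0040000000000000 is bit 9, and writing type 0 into an all-ones register
yields 0x87FFFFFFFFFFFFFF. Keeping the mask parenthesized matters for
expressions such as ~TYPE_MASK, which would otherwise negate only the first
PPC_BIT() term.
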