From patchwork Wed Sep 16 00:41:50 2020
X-Patchwork-Submitter: Oliver O'Halloran
X-Patchwork-Id: 1364828
From: Oliver O'Halloran <oohall@gmail.com>
To: skiboot@lists.ozlabs.org
Cc: skiboot-stable@lists.ozlabs.org
Date: Wed, 16 Sep 2020 10:41:50 +1000
Message-Id: <20200916004150.2312623-1-oohall@gmail.com>
X-Mailer: git-send-email 2.26.2
Subject: [Skiboot] [PATCH] opal-prd: Have a worker process handle page offlining

The memory_error() hservice interface expects the call to accept the
offline request and return without actually offlining the memory.
Currently we attempt to offline the marked pages before returning to
HBRT. This can result in an excessively long time spent in the
memory_error() hservice call, which blocks HBRT from processing other
errors.

Fix this by adding a worker process which performs the page offlining
via the sysfs memory error interfaces.

Cc: skiboot-stable@lists.ozlabs.org
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
 external/opal-prd/opal-prd.c | 85 ++++++++++++++++++++++++++++--------
 1 file changed, 66 insertions(+), 19 deletions(-)

diff --git a/external/opal-prd/opal-prd.c b/external/opal-prd/opal-prd.c
index 33ea5f5a8f6e..f2861611fe7a 100644
--- a/external/opal-prd/opal-prd.c
+++ b/external/opal-prd/opal-prd.c
@@ -41,6 +41,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -701,13 +702,40 @@ out:
 	return rc;
 }
 
+static int memory_error_worker(const char *sysfsfile, const char *type,
+			       uint64_t i_start_addr, uint64_t i_endAddr)
+{
+	int memfd, rc, n, ret = 0;
+	char buf[ADDR_STRING_SZ];
+	uint64_t addr;
+
+	memfd = open(sysfsfile, O_WRONLY);
+	if (memfd < 0) {
+		pr_log(LOG_CRIT, "MEM: Failed to offline memory! "
+			"Unable to open sysfs node %s: %m", sysfsfile);
+		return -1;
+	}
+
+	for (addr = i_start_addr; addr <= i_endAddr; addr += ctx->page_size) {
+		n = snprintf(buf, ADDR_STRING_SZ, "0x%lx", addr);
+		rc = write(memfd, buf, n);
+		if (rc != n) {
+			pr_log(LOG_CRIT, "MEM: Failed to offline memory! "
+				"page addr: %016lx type: %s: %m",
+				addr, type);
+			ret = rc;
+		}
+	}
+
+	close(memfd);
+	return ret;
+}
+
 int hservice_memory_error(uint64_t i_start_addr, uint64_t i_endAddr,
 		enum MemoryError_t i_errorType)
 {
 	const char *sysfsfile, *typestr;
-	char buf[ADDR_STRING_SZ];
-	int memfd, rc, n, ret = 0;
-	uint64_t addr;
+	pid_t pid;
 
 	switch(i_errorType) {
 	case MEMORY_ERROR_CE:
@@ -727,26 +755,21 @@ int hservice_memory_error(uint64_t i_start_addr, uint64_t i_endAddr,
 	pr_log(LOG_ERR, "MEM: Memory error: range %016lx-%016lx, type: %s",
 			i_start_addr, i_endAddr, typestr);
 
+	/*
+	 * HBRT expects the memory offlining process to happen in the background
+	 * after the notification is delivered.
+	 */
+	pid = fork();
+	if (pid == 0)
+		exit(memory_error_worker(sysfsfile, typestr, i_start_addr, i_endAddr));
 
-	memfd = open(sysfsfile, O_WRONLY);
-	if (memfd < 0) {
-		pr_log(LOG_CRIT, "MEM: Failed to offline memory! "
-			"Unable to open sysfs node %s: %m", sysfsfile);
+	if (pid < 0) {
+		perror("MEM: unable to fork worker to offline memory!\n");
 		return -1;
 	}
 
-	for (addr = i_start_addr; addr <= i_endAddr; addr += ctx->page_size) {
-		n = snprintf(buf, ADDR_STRING_SZ, "0x%lx", addr);
-		rc = write(memfd, buf, n);
-		if (rc != n) {
-			pr_log(LOG_CRIT, "MEM: Failed to offline memory! "
-				"page addr: %016lx type: %d: %m",
-				addr, i_errorType);
-			ret = rc;
-		}
-	}
-
-	return ret;
+	pr_log(LOG_INFO, "MEM: forked off %d to handle mem error\n", pid);
+	return 0;
 }
 
 uint64_t hservice_get_interface_capabilities(uint64_t set)
@@ -2003,6 +2026,13 @@ out_send:
 	free(send_msg);
 }
 
+volatile bool worker_terminated;
+
+void signchild_handler(int sig)
+{
+	worker_terminated = true;
+}
+
 static int run_attn_loop(struct opal_prd_ctx *ctx)
 {
 	struct pollfd pollfds[2];
@@ -2049,6 +2079,23 @@ static int run_attn_loop(struct opal_prd_ctx *ctx)
 		process_msgq(ctx);
 
 		rc = poll(pollfds, 2, -1);
+
+		if (worker_terminated) {
+			pid_t pid;
+
+			worker_terminated = false;
+			do {
+				pid = waitpid(-1, NULL, WNOHANG);
+				if (pid > 0) {
+					pr_log(LOG_DEBUG, "reaped %d\n", pid);
+				} else if (pid == -1 && errno != ECHILD) {
+					pr_log(LOG_ERR, "error %m while reaping\n");
+					break;
+				}
+			} while (pid > 0);
+
+			continue;
+		}
 		if (rc < 0) {
 			pr_log(LOG_ERR, "FW: event poll failed: %m");
 			exit(EXIT_FAILURE);