From patchwork Fri May 11 13:58:43 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mahesh J Salgaonkar X-Patchwork-Id: 912001 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from lists.ozlabs.org (lists.ozlabs.org [203.11.71.2]) (using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 40jBXT57Bmz9s31 for ; Fri, 11 May 2018 23:59:05 +1000 (AEST) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.vnet.ibm.com Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 40jBXS471YzF2VT for ; Fri, 11 May 2018 23:59:04 +1000 (AEST) Authentication-Results: lists.ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.vnet.ibm.com X-Original-To: skiboot@lists.ozlabs.org Delivered-To: skiboot@lists.ozlabs.org Authentication-Results: lists.ozlabs.org; spf=none (mailfrom) smtp.mailfrom=linux.vnet.ibm.com (client-ip=148.163.156.1; helo=mx0a-001b2d01.pphosted.com; envelope-from=mahesh@linux.vnet.ibm.com; receiver=) Authentication-Results: lists.ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.vnet.ibm.com Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 40jBXF6367zF2VQ for ; Fri, 11 May 2018 23:58:52 +1000 (AEST) Received: from pps.filterd (m0098394.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.22/8.16.0.22) with SMTP id w4BDsma2134385 for ; Fri, 11 May 2018 09:58:50 -0400 Received: from e06smtp10.uk.ibm.com (e06smtp10.uk.ibm.com [195.75.94.106]) by mx0a-001b2d01.pphosted.com with ESMTP id 2hwbv5a3wf-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Fri, 11 May 2018 09:58:50 -0400 Received: from localhost by e06smtp10.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Fri, 11 May 2018 14:58:48 +0100 Received: from b06cxnps4076.portsmouth.uk.ibm.com (9.149.109.198) by e06smtp10.uk.ibm.com (192.168.101.140) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Fri, 11 May 2018 14:58:46 +0100 Received: from d06av23.portsmouth.uk.ibm.com (d06av23.portsmouth.uk.ibm.com [9.149.105.59]) by b06cxnps4076.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id w4BDwiFX62193716 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Fri, 11 May 2018 13:58:44 GMT Received: from d06av23.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 52CC1A4051; Fri, 11 May 2018 14:50:21 +0100 (BST) Received: from d06av23.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id A3905A4040; Fri, 11 May 2018 14:50:20 +0100 (BST) Received: from jupiter.in.ibm.com (unknown [9.85.74.90]) by d06av23.portsmouth.uk.ibm.com (Postfix) with ESMTP; Fri, 11 May 2018 14:50:20 +0100 (BST) From: Mahesh J Salgaonkar To: skiboot list Date: Fri, 11 May 2018 19:28:43 +0530 User-Agent: StGit/unknown-version MIME-Version: 1.0 X-TM-AS-GCONF: 00 x-cbid: 18051113-0040-0000-0000-000004389A4D X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 18051113-0041-0000-0000-0000263CE451 Message-Id: <152604712300.701.13496749985386675066.stgit@jupiter.in.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2018-05-11_06:, , signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=2 phishscore=0 bulkscore=0 spamscore=0 clxscore=1011 lowpriorityscore=0 impostorscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1709140000 definitions=main-1805110132 Subject: [Skiboot] [PATCH] opal-prd: Do not error out on first failure for soft/hard offline. X-BeenThere: skiboot@lists.ozlabs.org X-Mailman-Version: 2.1.26 Precedence: list List-Id: Mailing list for skiboot development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Jeremy Kerr Errors-To: skiboot-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org Sender: "Skiboot" From: Mahesh Salgaonkar The memory errors (CEs and UEs) that are detected as part of background memory scrubbing are reported by PRD asynchronously to opal-prd along with affected memory ranges. hservice_memory_error() converts these ranges into page granularity before hooking up them to soft/hard offline-ing infrastructure. But the current implementation of hservice_memory_error() does not hookup all the pages to soft/hard offline-ing if any of the page offline action fails. e.g hard offline can fail for: - Pages that are not part of buddy managed pool. - Pages that are reserved by kernel using memblock_reserved() - Pages that are in use by kernel. But for the pages that are in use by user space application, the hard offline marks the page as hwpoison, sends SIGBUS signal to kill the affected application as recovery action and returns success. Hence, It is possible that some of the pages in that memory range are in use by application or free. By stopping on first error we loose the opportunity to hwpoison the subsequent pages which may be free or in use by application. This patch fixes this issue. Signed-off-by: Mahesh Salgaonkar --- external/opal-prd/opal-prd.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/external/opal-prd/opal-prd.c b/external/opal-prd/opal-prd.c index bc092d186..e3b4439c6 100644 --- a/external/opal-prd/opal-prd.c +++ b/external/opal-prd/opal-prd.c @@ -701,7 +701,7 @@ int hservice_memory_error(uint64_t i_start_addr, uint64_t i_endAddr, { const char *sysfsfile, *typestr; char buf[ADDR_STRING_SZ]; - int memfd, rc, n; + int memfd, rc, n, ret = 0; uint64_t addr; switch(i_errorType) { @@ -737,11 +737,11 @@ int hservice_memory_error(uint64_t i_start_addr, uint64_t i_endAddr, pr_log(LOG_CRIT, "MEM: Failed to offline memory! " "page addr: %016lx type: %d: %m", addr, i_errorType); - return rc; + ret = rc; } } - return 0; + return ret; } uint64_t hservice_get_interface_capabilities(uint64_t set)