From patchwork Tue Sep 12 04:38:59 2017
X-Patchwork-Submitter: Balbir Singh
X-Patchwork-Id: 812688
From: Balbir Singh
To: mpe@ellerman.id.au
Subject: [PATCH v1 4/4] powerpc/mce: hookup memory_failure for UE errors
Date: Tue, 12 Sep 2017 14:38:59 +1000
Message-Id: <20170912043859.32473-5-bsingharora@gmail.com>
X-Mailer: git-send-email 2.9.5
In-Reply-To: <20170912043859.32473-1-bsingharora@gmail.com>
References:
<20170912043859.32473-1-bsingharora@gmail.com>
List-Id: Linux on PowerPC Developers Mail List
Cc: mahesh@linux.vnet.ibm.com, linuxppc-dev@lists.ozlabs.org, npiggin@gmail.com
Sender: "Linuxppc-dev"

If we are in user space and hit a UE error, we now have the basic
infrastructure to walk the page tables and find the effective address
that was accessed, since the DAR is not valid in this case.

We use a work queue context to hand the bad pfn to memory_failure();
any other context causes problems, since memory_failure() itself can
call into schedule() via the lru_add_drain bits.

We could probably poison the struct page to avoid a race between
detection and taking corrective action.

Signed-off-by: Balbir Singh
---
 arch/powerpc/kernel/mce.c | 63 ++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 60 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index f41a75d..d9be9e4 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -39,11 +39,21 @@ static DEFINE_PER_CPU(struct machine_check_event[MAX_MC_EVT], mce_event);
 static DEFINE_PER_CPU(int, mce_queue_count);
 static DEFINE_PER_CPU(struct machine_check_event[MAX_MC_EVT], mce_event_queue);
 
+/* Queue for delayed MCE UE events. */
+static DEFINE_PER_CPU(int, mce_ue_count);
+static DEFINE_PER_CPU(struct machine_check_event[MAX_MC_EVT],
+					mce_ue_event_queue);
+
 static void machine_check_process_queued_event(struct irq_work *work);
+void machine_check_ue_event(struct machine_check_event *evt);
+static void machine_process_ue_event(struct work_struct *work);
+
 static struct irq_work mce_event_process_work = {
 	.func = machine_check_process_queued_event,
 };
 
+DECLARE_WORK(mce_ue_event_work, machine_process_ue_event);
+
 static void mce_set_error_info(struct machine_check_event *mce,
 			       struct mce_error_info *mce_err)
 {
@@ -143,6 +153,7 @@ void save_mce_event(struct pt_regs *regs, long handled,
 		if (phys_addr != ULONG_MAX) {
 			mce->u.ue_error.physical_address_provided = true;
 			mce->u.ue_error.physical_address = phys_addr;
+			machine_check_ue_event(mce);
 		}
 	}
 	return;
@@ -197,6 +208,26 @@ void release_mce_event(void)
 	get_mce_event(NULL, true);
 }
 
+
+/*
+ * Queue up the MCE event which then can be handled later.
+ */
+void machine_check_ue_event(struct machine_check_event *evt)
+{
+	int index;
+
+	index = __this_cpu_inc_return(mce_ue_count) - 1;
+	/* If queue is full, just return for now. */
+	if (index >= MAX_MC_EVT) {
+		__this_cpu_dec(mce_ue_count);
+		return;
+	}
+	memcpy(this_cpu_ptr(&mce_ue_event_queue[index]), evt, sizeof(*evt));
+
+	/* Queue work to process this event later. */
+	schedule_work(&mce_ue_event_work);
+}
+
 /*
  * Queue up the MCE event which then can be handled later.
  */
@@ -219,7 +250,32 @@ void machine_check_queue_event(void)
 	/* Queue irq work to process this event later. */
 	irq_work_queue(&mce_event_process_work);
 }
-
+/*
+ * Process pending MCE UE events from the per-cpu UE event queue. This
+ * function runs later in work queue (process) context.
+ */
+static void machine_process_ue_event(struct work_struct *work)
+{
+	int index;
+	struct machine_check_event *evt;
+
+	while (__this_cpu_read(mce_ue_count) > 0) {
+		index = __this_cpu_read(mce_ue_count) - 1;
+		evt = this_cpu_ptr(&mce_ue_event_queue[index]);
+#ifdef CONFIG_MEMORY_FAILURE
+		/*
+		 * This should probably be queued elsewhere, but
+		 * oh! well
+		 */
+		if (evt->error_type == MCE_ERROR_TYPE_UE) {
+			if (evt->u.ue_error.physical_address_provided)
+				memory_failure(evt->u.ue_error.physical_address,
+					       SIGBUS, 0);
+		}
+#endif
+		__this_cpu_dec(mce_ue_count);
+	}
+}
 /*
  * process pending MCE event from the mce event queue. This function will be
  * called during syscall exit.
@@ -227,6 +283,7 @@ void machine_check_queue_event(void)
 static void machine_check_process_queued_event(struct irq_work *work)
 {
 	int index;
+	struct machine_check_event *evt;
 
 	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
 
@@ -236,8 +293,8 @@ static void machine_check_process_queued_event(struct irq_work *work)
 	 */
 	while (__this_cpu_read(mce_queue_count) > 0) {
 		index = __this_cpu_read(mce_queue_count) - 1;
-		machine_check_print_event_info(
-				this_cpu_ptr(&mce_event_queue[index]), false);
+		evt = this_cpu_ptr(&mce_event_queue[index]);
+		machine_check_print_event_info(evt, false);
 		__this_cpu_dec(mce_queue_count);
 	}
 }