
[1/1] powerpc/pseries: Enable RAS hotplug events late

Message ID: b3202827b5f22e9a7e8f145f83140698911641f7.1518394650.git.sam.bobroff@au1.ibm.com
State: Accepted
Commit: c9dccf1d074a67d36c510845f663980d69e3409b

Commit Message

Sam Bobroff Feb. 12, 2018, 12:19 a.m. UTC
Currently if the kernel receives a memory hot-unplug event early
enough, it may get stuck in an infinite loop in
dissolve_free_huge_pages(). This appears as a stall just after:

pseries-hotplug-mem: Attempting to hot-remove XX LMB(s) at YYYYYYYY

It appears to be caused by "minimum_order" being uninitialized, due to
init_ras_IRQ() executing before hugetlb_init().

To correct this, extract the part of init_ras_IRQ() that enables
hotplug event processing and place it in the machine_late_initcall
phase, which is guaranteed to be after hugetlb_init() is called.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
---
 arch/powerpc/platforms/pseries/ras.c | 29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)
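
The ordering argument in the commit message can be illustrated with a small user-space sketch. It is not kernel code: the names sketch_hugetlb_init(), sketch_enable_hotplug(), run_initcalls() and stride_seen_by_hotplug() are hypothetical, and the numeric levels (4 for subsys, 7 for late) are assumed to mirror the kernel's initcall levels. The sketch assumes that levels run in ascending order and that order within one level follows registration (link) order, which is why two subsys-level calls give no guarantee relative to each other, while a late-level call always runs after every subsys-level call.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to mirror the kernel's initcall levels; lower levels run first. */
enum initcall_level { LEVEL_SUBSYS = 4, LEVEL_LATE = 7, LEVEL_MAX = 7 };

struct initcall {
	enum initcall_level level;
	void (*fn)(void);
};

/* Hypothetical state standing in for what hugetlb_init() sets up. */
static int sketch_stride;
/* Value of sketch_stride observed at the moment hotplug is enabled. */
static int sketch_stride_at_enable;

static void sketch_hugetlb_init(void)
{
	sketch_stride = 16;
}

static void sketch_enable_hotplug(void)
{
	sketch_stride_at_enable = sketch_stride;
}

/* Run every level in ascending order; within a level, use table order. */
static void run_initcalls(const struct initcall *calls, size_t n)
{
	for (int level = 0; level <= LEVEL_MAX; level++) {
		for (size_t i = 0; i < n; i++) {
			assert(calls[i].fn != NULL);
			if ((int)calls[i].level == level)
				calls[i].fn();
		}
	}
}

/*
 * Register the hotplug enable step at the given level, deliberately listed
 * before the hugetlb step to model an unfavourable link order, and report
 * what the hotplug step saw.
 */
int stride_seen_by_hotplug(enum initcall_level hotplug_level)
{
	const struct initcall calls[] = {
		{ hotplug_level, sketch_enable_hotplug },
		{ LEVEL_SUBSYS, sketch_hugetlb_init },
	};

	sketch_stride = 0;
	sketch_stride_at_enable = -1;
	run_initcalls(calls, sizeof(calls) / sizeof(calls[0]));
	return sketch_stride_at_enable;
}
```

Calling stride_seen_by_hotplug(LEVEL_SUBSYS) returns the unset value 0, the analogue of hotplug processing starting before hugetlb_init(); calling stride_seen_by_hotplug(LEVEL_LATE) returns 16, because the late level cannot run until the subsys level has finished.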

Comments

Balbir Singh Feb. 12, 2018, 5:46 a.m. UTC | #1
On Mon, Feb 12, 2018 at 11:19 AM, Sam Bobroff <sam.bobroff@au1.ibm.com> wrote:
> Currently if the kernel receives a memory hot-unplug event early
> enough, it may get stuck in an infinite loop in
> dissolve_free_huge_pages(). This appears as a stall just after:
>
> pseries-hotplug-mem: Attempting to hot-remove XX LMB(s) at YYYYYYYY
>
> It appears to be caused by "minimum_order" being uninitialized, due to
> init_ras_IRQ() executing before hugetlb_init().
>
> To correct this, extract the part of init_ras_IRQ() that enables
> hotplug event processing and place it in the machine_late_initcall
> phase, which is guaranteed to be after hugetlb_init() is called.
>
> Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
> ---
>  arch/powerpc/platforms/pseries/ras.c | 29 +++++++++++++++++++++--------
>  1 file changed, 21 insertions(+), 8 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
> index 81d8614e7379..ba284949af06 100644
> --- a/arch/powerpc/platforms/pseries/ras.c
> +++ b/arch/powerpc/platforms/pseries/ras.c
> @@ -66,6 +66,26 @@ static int __init init_ras_IRQ(void)
>                 of_node_put(np);
>         }
>
> +       /* EPOW Events */
> +       np = of_find_node_by_path("/event-sources/epow-events");
> +       if (np != NULL) {
> +               request_event_sources_irqs(np, ras_epow_interrupt, "RAS_EPOW");
> +               of_node_put(np);
> +       }
> +
> +       return 0;
> +}
> +machine_subsys_initcall(pseries, init_ras_IRQ);
> +
> +/*
> + * Enable the hotplug interrupt late because processing them may touch other
> + * devices or systems (e.g. hugepages) that have not been initialized at the
> + * subsys stage.
> + */
> +int __init init_ras_hotplug_IRQ(void)
> +{
> +       struct device_node *np;
> +
>         /* Hotplug Events */
>         np = of_find_node_by_path("/event-sources/hot-plug-events");
>         if (np != NULL) {
> @@ -75,16 +95,9 @@ static int __init init_ras_IRQ(void)
>                 of_node_put(np);
>         }
>
> -       /* EPOW Events */
> -       np = of_find_node_by_path("/event-sources/epow-events");
> -       if (np != NULL) {
> -               request_event_sources_irqs(np, ras_epow_interrupt, "RAS_EPOW");
> -               of_node_put(np);
> -       }
> -
>         return 0;
>  }
> -machine_subsys_initcall(pseries, init_ras_IRQ);
> +machine_late_initcall(pseries, init_ras_hotplug_IRQ);
>

Seems reasonable to me, the other RAS events internal error and epow
seem like they are in the right place.

Acked-by: Balbir Singh <bsingharora@gmail.com>

Michael Ellerman Feb. 14, 2018, 5:43 a.m. UTC | #2
On Mon, 2018-02-12 at 00:19:29 UTC, Sam Bobroff wrote:
> Currently if the kernel receives a memory hot-unplug event early
> enough, it may get stuck in an infinite loop in
> dissolve_free_huge_pages(). This appears as a stall just after:
> 
> pseries-hotplug-mem: Attempting to hot-remove XX LMB(s) at YYYYYYYY
> 
> It appears to be caused by "minimum_order" being uninitialized, due to
> init_ras_IRQ() executing before hugetlb_init().
> 
> To correct this, extract the part of init_ras_IRQ() that enables
> hotplug event processing and place it in the machine_late_initcall
> phase, which is guaranteed to be after hugetlb_init() is called.
> 
> Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
> Acked-by: Balbir Singh <bsingharora@gmail.com>

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/c9dccf1d074a67d36c510845f66398

cheers

Patch

diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index 81d8614e7379..ba284949af06 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -66,6 +66,26 @@  static int __init init_ras_IRQ(void)
 		of_node_put(np);
 	}
 
+	/* EPOW Events */
+	np = of_find_node_by_path("/event-sources/epow-events");
+	if (np != NULL) {
+		request_event_sources_irqs(np, ras_epow_interrupt, "RAS_EPOW");
+		of_node_put(np);
+	}
+
+	return 0;
+}
+machine_subsys_initcall(pseries, init_ras_IRQ);
+
+/*
+ * Enable the hotplug interrupt late because processing them may touch other
+ * devices or systems (e.g. hugepages) that have not been initialized at the
+ * subsys stage.
+ */
+int __init init_ras_hotplug_IRQ(void)
+{
+	struct device_node *np;
+
 	/* Hotplug Events */
 	np = of_find_node_by_path("/event-sources/hot-plug-events");
 	if (np != NULL) {
@@ -75,16 +95,9 @@  static int __init init_ras_IRQ(void)
 		of_node_put(np);
 	}
 
-	/* EPOW Events */
-	np = of_find_node_by_path("/event-sources/epow-events");
-	if (np != NULL) {
-		request_event_sources_irqs(np, ras_epow_interrupt, "RAS_EPOW");
-		of_node_put(np);
-	}
-
 	return 0;
 }
-machine_subsys_initcall(pseries, init_ras_IRQ);
+machine_late_initcall(pseries, init_ras_hotplug_IRQ);
 
 #define EPOW_SHUTDOWN_NORMAL				1
 #define EPOW_SHUTDOWN_ON_UPS				2