diff mbox

Prevent flushing of locked PROM ITLB entry (RED State Exception or hang on reboot)

Message ID 201407271539.40156.cat.schulze@alice-dsl.net
State Changes Requested
Delegated to: David Miller
Headers show

Commit Message

Christopher Alexander Tobias Schulze July 27, 2014, 1:39 p.m. UTC
On UltraSPARC systems, the PROM establishes a locked ITLB entry that needs
to be present on return to the PROM (which happens when the system reboots).
If the kernel tries to reboot with this ITLB entry not being present, a
RED State Exception is caused, which *may* cause a restart as well, but some
internal state may be corrupted. (It was observed that PCI resource information
from the PROM was badly damaged when a linux kernel was started after such a
RED State Exception, causing subsequent resource allocation errors in the
kernel.) In other cases, the machine might also hang after the RED State Exception.

The locked PROM ITLB entry is flushed by flush_tlb_kernel_range(start, end) when
the flushed interval contains the PROM region from 0xf0000000 to 0x100000000UL.
This seems to happen when __vunmap() is called and triggers a __purge_vmap_area_lazy()
call, where both virtual mappings below 0xf0000000 (kernel modules) and above
0x100000000UL (data mappings) are to be flushed. The kernel at present does not consider
that there might also be a PROM mapping in between, and forces all existing mappings
in the interval determined by __purge_vmap_area_lazy() to be removed.

The flushing of the locked PROM ITLB entry happens in our case already during the boot
process when modules are loaded and unloaded for probing purposes.

With this patch, the affected SunBlade 2000 was able to reboot without problems again.

The patch was originally developed against a 3.13 backport kernel from Debian.
Both the patch against 3.13 and a recent 3.16-rc6 are included below. Please note
that I could only test the 3.13 version as I do not have access to the affected
machine anymore.

PATCH 1 - KERNEL VERSION 3.13 #####################################################

Alexander Schulze

--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller July 28, 2014, 11:02 p.m. UTC | #1
From: Christopher Alexander Tobias Schulze <cat.schulze@alice-dsl.net>
Date: Sun, 27 Jul 2014 15:39:39 +0200

As mentioned for your previous two patches, please format your Subject
line properly, and provide a proper "Signed-off-by: " tag.

> On UltraSPARC systems, the PROM establishes a locked ITLB entry that needs
> to be present on return to the PROM (which happens when the system reboots).
> If the kernel tries to reboot with this ITLB entry not being present, a
> RED State Exception is caused, which *may* cause a restart as well, but some
> internal state may be corrupted. (It was observed that PCI resource information
> from the PROM was badly damaged when a linux kernel was started after such a
> RED State Exception, causing subsequent resource allocation errors in the
> kernel.) In other cases, the machine might also hang after the RED State Exception.
> 
> The locked PROM ITLB entry is flushed by flush_tlb_kernel_range(start, end) when
> the flushed interval contains the PROM region from 0xf0000000 to 0x100000000UL.
> This seems to happen when __vunmap() is called and triggers a __purge_vmap_area_lazy()
> call, where both virtual mappings below 0xf0000000 (kernel modules) and above
> 0x100000000UL (data mappings) are to be flushed. The kernel at present does not consider
> that there might also be a PROM mapping in between, and forces all existing mappings
> in the interval determined by __purge_vmap_area_lazy() to be removed.
> 
> The flushing of the locked PROM ITLB entry happens in our case already during the boot
> process when modules are loaded and unloaded for probing purposes.
> 
> With this patch, the affected SunBlade 2000 was able to reboot without problems again.

There are stylistic problems with your change, but I would also really
like to know who calls flush_tlb_kernel_range() for an area covering
the firmware's protected area?

That's where the bug more likely is.

There are only calls to this function from outside of the sparc code, first:

mm/highmem.c

which is not relevant because sparc64 doesn't use highmem, and then we have:

mm/percpu-vm.c
mm/vmalloc.c

So it has to be one of the calls in these two files causing the problem.

For the vmalloc case that would be odd, since we clearly define the VMALLOC
range to be outside of the firmware's special range:

#define VMALLOC_START		_AC(0x0000000100000000,UL)
#define VMALLOC_END		_AC(0x0000010000000000,UL)

>  #define flush_tlb_kernel_range(start,end) \
> -do {   flush_tsb_kernel_range(start,end); \
> -       __flush_tlb_kernel_range(start,end); \
> +do { \
> +       if((start < 0x100000000UL) && (end > 0xf0000000)) { \

Space between "if" and "(" please.  Also, please use the existing macros
LOW_OBP_ADDRESS and HI_OBP_ADDRESS instead of magic address values.

> +               if(start < 0xf0000000) { \

Likewise.

> +                         flush_tsb_kernel_range(start, 0xf0000000); \
> +                       __flush_tlb_kernel_range(start, 0xf0000000); \

These are indented differently, they should be on the same column, and
please use TAB characters.

>  #define flush_tlb_kernel_range(start, end) \
> -do {   flush_tsb_kernel_range(start,end); \
> -       smp_flush_tlb_kernel_range(start, end); \
> +do { \

Likewise for all of this macro too.
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christopher Alexander Tobias Schulze Aug. 3, 2014, 12:58 p.m. UTC | #2
On Tuesday 29 July 2014 01:02, David Miller wrote:
> From: Christopher Alexander Tobias Schulze <cat.schulze@alice-dsl.net>
> Date: Sun, 27 Jul 2014 15:39:39 +0200
> 
> As mentioned for your previous two patches, please format your Subject
> line properly, and provide a proper "Signed-off-by: " tag.
> 
[snip]
> 
> There are stylistic problems with your change, but I would also really
> like to know who calls flush_tlb_kernel_range() for an area covering
> the firmware's protected area?
> 
> That's where the bug more likely is.

Yes, I am aware that the patch in the preprocessor define is not really
good style, and the patch should perhaps be pushed higher or lower in the
call chain. I decided to send what I had, however, as I am currently unable
to test modified patches due to lack of access to a Sun machine. (I do not
even have a SPARC64 cross toolchain, so I cannot validate any changes to my
original patches...)

> There are only calls to this function from outside of the sparc code, first:
> 
> mm/highmem.c
> 
> which is not relevant because sparc64 doesn't use highmem, and then we have:
> 
> mm/percpu-vm.c
> mm/vmalloc.c
> 
> So it has to be one of the calls in these two files causing the problem.

Fortunately, I have the call trace ending in the problematic call of
flush_tlb_kernel_range() at hand:

	__purge_vmap_area_lazy()
	free_vmap_area_noflush()
	remove_vm_area()
	__vunmap()
	n_tty_close()
	tty_ldisc_close()
	tty_ldisc_kill()
	tty_ldisc_release()
	tty_release()
	__fput()
	task_work_run()
	__handle_signal()

__purge_vmap_area_lazy() is in mm/vmalloc.c, as is __vunmap(). The line
discipline part is mostly irrelevant, but may be interesting, as I remember
having read on this mailing list that others tracked the reboot problem to
memory allocation changes in the ldisc layer. These changes may have
initially triggered the problematic __vunmap() call that kills the locked
PROM mapping.

The range to be flushed was

	[0x0000'0000'1000'6000, 0x0000'0001'0272'8000]

and this crosses the PROM area.

The LITLB entry (as read from the diagnostic ASI) was

	TAG:	0x0000'0000'f000'0000 -> VA: 0x0000'0000'f000'0000, CTX: 0
	DATA:	0xc000'0000'3ff8'0064 -> PA: 0x0000'0000'3ff8'0000, SZ: 512 kB, locked, priv, R/O, not global

> For the vmalloc case that would be odd, since we clearly define the VMALLOC
> range to be outside of the firmware's special range:
> 
> #define VMALLOC_START		_AC(0x0000000100000000,UL)
> #define VMALLOC_END		_AC(0x0000010000000000,UL)

Yes, but this "lazy flushing" seems to accumulate regions to be
flushed, and while each individual region does not cross the PROM
region, the "combined region" does so, and there was no check for this
situation and to split the "lazy flushing" into two calls to exclude
the PROM area from flushing.

It seems that the region to be flushed is a combination of a kernel
module allocation (below the PROM area) and a vmalloc allocation
(above the PROM area).

> >  #define flush_tlb_kernel_range(start,end) \
> > -do {   flush_tsb_kernel_range(start,end); \
> > -       __flush_tlb_kernel_range(start,end); \
> > +do { \
> > +       if((start < 0x100000000UL) && (end > 0xf0000000)) { \
> 
> Space between "if" and "(" please.  Also, please use the existing macros
> LOW_OBP_ADDRESS and HI_OBP_ADDRESS instead of magic address values.
> 
> > +               if(start < 0xf0000000) { \
> 
> Likewise.
> 
> > +                         flush_tsb_kernel_range(start, 0xf0000000); \
> > +                       __flush_tlb_kernel_range(start, 0xf0000000); \
> 
> These are indented differently, they should be on the same column, and
> please use TAB characters.
> 
> >  #define flush_tlb_kernel_range(start, end) \
> > -do {   flush_tsb_kernel_range(start,end); \
> > -       smp_flush_tlb_kernel_range(start, end); \
> > +do { \
> 
> Likewise for all of this macro too.

I will send reformatted patches shortly. However, I think that these checks
should be moved to somewhere else, as I do not regard bloating these
preprocessor defines is really good style. (The original patch was meant as
a hint what happened and needs to be changed, it was not meant to be merged
exactly in the form as presented.)

Regards,
Alexander Schulze
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Aug. 4, 2014, 11:26 p.m. UTC | #3
From: Christopher Alexander Tobias Schulze <cat.schulze@alice-dsl.net>
Date: Sun, 3 Aug 2014 14:58:36 +0200

> __purge_vmap_area_lazy() is in mm/vmalloc.c, as is __vunmap(). The line
> discipline part is mostly irrelevant, but may be interesting, as I remember
> having read on this mailing list that others tracked the reboot problem to
> memory allocation changes in the ldisc layer. These changes may have
> initially triggered the problematic __vunmap() call that kills the locked
> PROM mapping.

The problem was introduced by:

commit db64fe02258f1507e13fe5212a989922323685ce
Author: Nick Piggin <npiggin@suse.de>
Date:   Sat Oct 18 20:27:03 2008 -0700

    mm: rewrite vmap layer

which added the lazy VMAP tlb flushing.

The issue is that since and module and VMALLOC areas (both managed
by VMAP handling) are not contiguous, you can end up flushing across
the openfirmware region if f.e. a module is unloaded and some vfree()'s
occur.

I've sent an email to Nick Piggin who added this code, you were CC:'d.

If Nick doesn't agree to fix this in the VMAP lazy flushing layer, and
we implement a fix using your approach, I think we have to stop making
flush_tlb_kernel_range() a macro and instead implement it inside of
arch/sparc/kernel/init_64.c or similar as a real function.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff -Naupr linux-source-3.13-orig/arch/sparc/include/asm/tlbflush_64.h linux-source-3.13-patched/arch/sparc/include/asm/tlbflush_64.h
--- linux-source-3.13-orig/arch/sparc/include/asm/tlbflush_64.h 2014-04-14 15:48:24.000000000 +0200
+++ linux-source-3.13-patched/arch/sparc/include/asm/tlbflush_64.h      2014-07-27 14:29:58.000000000 +0200
@@ -49,8 +49,20 @@  extern void __flush_tlb_kernel_range(uns
 #ifndef CONFIG_SMP
 
 #define flush_tlb_kernel_range(start,end) \
-do {   flush_tsb_kernel_range(start,end); \
-       __flush_tlb_kernel_range(start,end); \
+do { \
+       if((start < 0x100000000UL) && (end > 0xf0000000)) { \
+               if(start < 0xf0000000) { \
+                         flush_tsb_kernel_range(start, 0xf0000000); \
+                       __flush_tlb_kernel_range(start, 0xf0000000); \
+               } \
+               if(end > 0x100000000UL) { \
+                         flush_tsb_kernel_range(0x100000000UL, end); \
+                       __flush_tlb_kernel_range(0x100000000UL, end); \
+               } \
+       } else { \
+                 flush_tsb_kernel_range(start, end); \
+               __flush_tlb_kernel_range(start, end); \
+       } \
 } while (0)
 
 static inline void global_flush_tlb_page(struct mm_struct *mm, unsigned long vaddr)
@@ -64,8 +76,20 @@  extern void smp_flush_tlb_kernel_range(u
 extern void smp_flush_tlb_page(struct mm_struct *mm, unsigned long vaddr);
 
 #define flush_tlb_kernel_range(start, end) \
-do {   flush_tsb_kernel_range(start,end); \
-       smp_flush_tlb_kernel_range(start, end); \
+do { \
+       if((start < 0x100000000UL) && (end > 0xf0000000)) { \
+               if(start < 0xf0000000) { \
+                           flush_tsb_kernel_range(start, 0xf0000000); \
+                       smp_flush_tlb_kernel_range(start, 0xf0000000); \
+               } \
+               if(end > 0x100000000UL) { \
+                           flush_tsb_kernel_range(0x100000000UL, end); \
+                       smp_flush_tlb_kernel_range(0x100000000UL, end); \
+               } \
+       } else { \
+                   flush_tsb_kernel_range(start, end); \
+               smp_flush_tlb_kernel_range(start, end); \
+       } \
 } while (0)
 
 #define global_flush_tlb_page(mm, vaddr) \

PATCH 2 - KERNEL VERSION 3.16 #####################################################

diff -Naupr linux-3.16-rc6-orig/arch/sparc/include/asm/tlbflush_64.h linux-3.16-rc6-patched/arch/sparc/include/asm/tlbflush_64.h
--- linux-3.16-rc6-orig/arch/sparc/include/asm/tlbflush_64.h    2014-07-27 11:46:05.000000000 +0200
+++ linux-3.16-rc6-patched/arch/sparc/include/asm/tlbflush_64.h 2014-07-27 14:28:12.000000000 +0200
@@ -49,8 +49,20 @@  void __flush_tlb_kernel_range(unsigned l
 #ifndef CONFIG_SMP
 
 #define flush_tlb_kernel_range(start,end) \
-do {   flush_tsb_kernel_range(start,end); \
-       __flush_tlb_kernel_range(start,end); \
+do { \
+       if((start < 0x100000000UL) && (end > 0xf0000000)) { \
+               if(start < 0xf0000000) { \
+                         flush_tsb_kernel_range(start, 0xf0000000); \
+                       __flush_tlb_kernel_range(start, 0xf0000000); \
+               } \
+               if(end > 0x100000000UL) { \
+                         flush_tsb_kernel_range(0x100000000UL, end); \
+                       __flush_tlb_kernel_range(0x100000000UL, end); \
+               } \
+       } else { \
+                 flush_tsb_kernel_range(start, end); \
+               __flush_tlb_kernel_range(start, end); \
+       } \
 } while (0)
 
 static inline void global_flush_tlb_page(struct mm_struct *mm, unsigned long vaddr)
@@ -64,8 +76,20 @@  void smp_flush_tlb_kernel_range(unsigned
 void smp_flush_tlb_page(struct mm_struct *mm, unsigned long vaddr);
 
 #define flush_tlb_kernel_range(start, end) \
-do {   flush_tsb_kernel_range(start,end); \
-       smp_flush_tlb_kernel_range(start, end); \
+do { \
+       if((start < 0x100000000UL) && (end > 0xf0000000)) { \
+               if(start < 0xf0000000) { \
+                           flush_tsb_kernel_range(start, 0xf0000000); \
+                       smp_flush_tlb_kernel_range(start, 0xf0000000); \
+               } \
+               if(end > 0x100000000UL) { \
+                           flush_tsb_kernel_range(0x100000000UL, end); \
+                       smp_flush_tlb_kernel_range(0x100000000UL, end); \
+               } \
+       } else { \
+                   flush_tsb_kernel_range(start, end); \
+               smp_flush_tlb_kernel_range(start, end); \
+       } \
 } while (0)
 
Regards,