diff mbox

NMI watchdog: BUG: soft lockup

Message ID 20161025.115059.1745729890762996484.davem@davemloft.net
State RFC
Delegated to: David Miller
Headers show

Commit Message

David Miller Oct. 25, 2016, 3:50 p.m. UTC
From: David Miller <davem@davemloft.net>

Date: Tue, 25 Oct 2016 11:22:31 -0400 (EDT)

> From: James Clarke <jrtc27@jrtc27.com>

> Date: Tue, 25 Oct 2016 16:11:52 +0100

> 

>> Yep, that fix is still there, but you will note that end *is* above start in

>> the call. Something is being allocated and freed right at the end of the malloc

>> area, so it’s iterating over almost the entire thing.

> 

> Ok, let me think about this some more.


So, for the TSB part we don't need to do anything fancy, something like
the patch below will suffice.

As per the TLB flush that's a bit more complicated.

For the older chips we need to do more work because they unfortunately
defined the context flush to even remove locked TLB entries.
Otherwise we could simply do a nucleus context flush if the range is
too large.  So we'll have to use diagnostic accesses to implement the
same functionality.

UltraSPARC-III and later provide more usable facilities for this
situation.  UltraSPARC-III/IV have a "flush all" which removes all
non-locked TLB entries.  And all of the sun4v chips have a more
reasonable context flush, which does not remove "permanent" entries.

I'll start hacking something up.

Comments

Jessica Clarke Oct. 25, 2016, 3:59 p.m. UTC | #1
> On 25 Oct 2016, at 16:50, David Miller <davem@davemloft.net> wrote:
> 
> From: David Miller <davem@davemloft.net>
> Date: Tue, 25 Oct 2016 11:22:31 -0400 (EDT)
> 
>> From: James Clarke <jrtc27@jrtc27.com>
>> Date: Tue, 25 Oct 2016 16:11:52 +0100
>> 
>>> Yep, that fix is still there, but you will note that end *is* above start in
>>> the call. Something is being allocated and freed right at the end of the malloc
>>> area, so it’s iterating over almost the entire thing.
>> 
>> Ok, let me think about this some more.
> 
> So, for the TSB part we don't need to do anything fancy, something like
> the patch below will suffice.
> 
> As per the TLB flush that's a bit more complicated.
> 
> For the older chips we need to do more work because they unfortunately
> defined the context flush to even remove locked TLB entries.
> Otherwise we could simply do a nucleus context flush if the range is
> too large.  So we'll have to use diagnostic accesses to implement the
> same functionality.
> 
> UltraSPARC-III and later provide more usable facilities for this
> situation.  UltraSPARC-III/IV have a "flush all" which removes all
> non-locked TLB entries.  And all of the sun4v chips have a more
> reasonable context flush, which does not remove "permanent" entries.
> 
> I'll start hacking something up.
> 
> diff --git a/arch/sparc/mm/tsb.c b/arch/sparc/mm/tsb.c
> index f2b7711..1f63411 100644
> --- a/arch/sparc/mm/tsb.c
> +++ b/arch/sparc/mm/tsb.c
> @@ -27,6 +27,20 @@ static inline int tag_compare(unsigned long tag, unsigned long vaddr)
> 	return (tag == (vaddr >> 22));
> }
> 
> +static void flush_tsb_kernel_range_scan(unsigned long start, unsigned long end)
> +{
> +	unsigned long idx;
> +
> +	start >> 22;
> +	end >> 22;
> +	for (idx = 0; idx < KERNEL_TSB_NENTRIES; idx++) {
> +		struct tsb *ent = &swapper_tsb[idx];
> +
> +		if (ent->tag >= start && end->tag < end)
> +			ent->tag = (1UL << TSB_TAG_INVALID_BIT);
> +	}
> +}
> +
> /* TSB flushes need only occur on the processor initiating the address
>  * space modification, not on each cpu the address space has run on.
>  * Only the TLB flush needs that treatment.
> @@ -36,6 +50,9 @@ void flush_tsb_kernel_range(unsigned long start, unsigned long end)
> {
> 	unsigned long v;
> 
> +	if ((end - start) >> PAGE_SHIFT >= 2 * KERNEL_TSB_NENTRIES)
> +		return flush_tsb_kernel_range_scan(start, end);
> +
> 	for (v = start; v < end; v += PAGE_SIZE) {
> 		unsigned long hash = tsb_hash(v, PAGE_SHIFT,
> 					      KERNEL_TSB_NENTRIES);

That’s basically the same as my patch, except this potentially flushes things
outside [start, end) if they’re not on 2^22-byte boundaries. Of course, the
performance hit of that may be better than my patch which also walks through
pages up to the first 2^22-aligned address after start, and those after the
last 2^22-aligned address before end, but won’t flush anything that’s outside
the range.

Can we do a similar thing for the TLB by iterating over all its entries? Surely
one of the ASIs lets you do that?

James

--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Oct. 25, 2016, 5:04 p.m. UTC | #2
From: James Clarke <jrtc27@jrtc27.com>

Date: Tue, 25 Oct 2016 16:59:04 +0100

> That’s basically the same as my patch, except this potentially flushes things

> outside [start, end) if they’re not on 2^22-byte boundaries.


Oh yes, I see, thanks for pointing that out.

We have to take the index into account for the purposes of virtual
address comparison.
David Miller Oct. 25, 2016, 5:06 p.m. UTC | #3
From: James Clarke <jrtc27@jrtc27.com>
Date: Tue, 25 Oct 2016 16:59:04 +0100

> Can we do a similar thing for the TLB by iterating over all its entries? Surely
> one of the ASIs lets you do that?

Yes, you can di it using diagnostic accesses.

But in my opinion for UltraSPARC-III/IV we should do an "all flush"
and for sun4v do a context 0 flush when the range is huge.

It's so much simpler, easier to audit and validate, and is probably
cheaper in the end.
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jessica Clarke Oct. 25, 2016, 5:09 p.m. UTC | #4
> On 25 Oct 2016, at 18:04, David Miller <davem@davemloft.net> wrote:
> 
> From: James Clarke <jrtc27@jrtc27.com>
> Date: Tue, 25 Oct 2016 16:59:04 +0100
> 
>> That’s basically the same as my patch, except this potentially flushes things
>> outside [start, end) if they’re not on 2^22-byte boundaries.
> 
> Oh yes, I see, thanks for pointing that out.
> 
> We have to take the index into account for the purposes of virtual
> address comparison.

I don’t quite follow what you’re trying to say here? idx is only used to index the TSB.

James

--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Oct. 25, 2016, 5:18 p.m. UTC | #5
From: James Clarke <jrtc27@jrtc27.com>

Date: Tue, 25 Oct 2016 18:09:12 +0100

>> On 25 Oct 2016, at 18:04, David Miller <davem@davemloft.net> wrote:

>> 

>> From: James Clarke <jrtc27@jrtc27.com>

>> Date: Tue, 25 Oct 2016 16:59:04 +0100

>> 

>>> That’s basically the same as my patch, except this potentially flushes things

>>> outside [start, end) if they’re not on 2^22-byte boundaries.

>> 

>> Oh yes, I see, thanks for pointing that out.

>> 

>> We have to take the index into account for the purposes of virtual

>> address comparison.

> 

> I don’t quite follow what you’re trying to say here? idx is only used to index the TSB.


It is also how bits "13-21" of the virtual address are matched.

Think about what happens when a TSB entry is looked up.

Since, by virtue being in TSB entry with index X the virtual address
bits 13-21 are implied, they don't need to be stored in the tag.

So the full virtual address comparison is something like:

	unsigned long compare = (tag >> 22) << 22; /* Clear CONTEXT bits */

	compare |= (tsb_index & (nentries - 1)) << 13;

	if (vaddr == compare)
		goto match;

The swapper TSB only stores PAGE_SIZE translations.
David Miller Oct. 25, 2016, 5:27 p.m. UTC | #6
From: David Miller <davem@davemloft.net>
Date: Tue, 25 Oct 2016 13:18:08 -0400 (EDT)

> So the full virtual address comparison is something like:
> 
> 	unsigned long compare = (tag >> 22) << 22; /* Clear CONTEXT bits */
> 
> 	compare |= (tsb_index & (nentries - 1)) << 13;
> 
> 	if (vaddr == compare)
> 		goto match;
> 
> The swapper TSB only stores PAGE_SIZE translations.

Ok, this should work:

static void flush_tsb_kernel_range_scan(unsigned long start, unsigned long end)
{
        unsigned long idx;

        for (idx = 0; idx < KERNEL_TSB_NENTRIES; idx++) {
                struct tsb *ent = &swapper_tsb[idx];
                unsigned long match = idx << 13;

                match |= (ent->tag << 22);
                if (match >= start && match < end)
                        ent->tag = (1UL << TSB_TAG_INVALID_BIT);
        }
}

Bits 13-"21+N" come from the TSB index, and the tag always stores
bits 22 and above.  So simply 'or'ing them together always gives
us a usable match value.
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jessica Clarke Oct. 25, 2016, 5:40 p.m. UTC | #7
> On 25 Oct 2016, at 18:27, David Miller <davem@davemloft.net> wrote:
> 
> From: David Miller <davem@davemloft.net>
> Date: Tue, 25 Oct 2016 13:18:08 -0400 (EDT)
> 
>> So the full virtual address comparison is something like:
>> 
>> 	unsigned long compare = (tag >> 22) << 22; /* Clear CONTEXT bits */
>> 
>> 	compare |= (tsb_index & (nentries - 1)) << 13;
>> 
>> 	if (vaddr == compare)
>> 		goto match;
>> 
>> The swapper TSB only stores PAGE_SIZE translations.
> 
> Ok, this should work:
> 
> static void flush_tsb_kernel_range_scan(unsigned long start, unsigned long end)
> {
>        unsigned long idx;
> 
>        for (idx = 0; idx < KERNEL_TSB_NENTRIES; idx++) {
>                struct tsb *ent = &swapper_tsb[idx];
>                unsigned long match = idx << 13;
> 
>                match |= (ent->tag << 22);
>                if (match >= start && match < end)
>                        ent->tag = (1UL << TSB_TAG_INVALID_BIT);
>        }
> }
> 
> Bits 13-"21+N" come from the TSB index, and the tag always stores
> bits 22 and above.  So simply 'or'ing them together always gives
> us a usable match value.

Ah, right, got it. That makes it much nicer than mine.

James

--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/sparc/mm/tsb.c b/arch/sparc/mm/tsb.c
index f2b7711..1f63411 100644
--- a/arch/sparc/mm/tsb.c
+++ b/arch/sparc/mm/tsb.c
@@ -27,6 +27,20 @@  static inline int tag_compare(unsigned long tag, unsigned long vaddr)
 	return (tag == (vaddr >> 22));
 }
 
+static void flush_tsb_kernel_range_scan(unsigned long start, unsigned long end)
+{
+	unsigned long idx;
+
+	start >> 22;
+	end >> 22;
+	for (idx = 0; idx < KERNEL_TSB_NENTRIES; idx++) {
+		struct tsb *ent = &swapper_tsb[idx];
+
+		if (ent->tag >= start && end->tag < end)
+			ent->tag = (1UL << TSB_TAG_INVALID_BIT);
+	}
+}
+
 /* TSB flushes need only occur on the processor initiating the address
  * space modification, not on each cpu the address space has run on.
  * Only the TLB flush needs that treatment.
@@ -36,6 +50,9 @@  void flush_tsb_kernel_range(unsigned long start, unsigned long end)
 {
 	unsigned long v;
 
+	if ((end - start) >> PAGE_SHIFT >= 2 * KERNEL_TSB_NENTRIES)
+		return flush_tsb_kernel_range_scan(start, end);
+
 	for (v = start; v < end; v += PAGE_SIZE) {
 		unsigned long hash = tsb_hash(v, PAGE_SHIFT,
 					      KERNEL_TSB_NENTRIES);