diff mbox

sparc64: Handle extremely large kernel TSB range flushes sanely.

Message ID 20161026200536.GA5465@Jamess-MacBook.local
State Deferred
Delegated to: David Miller
Headers show

Commit Message

Jessica Clarke Oct. 26, 2016, 8:05 p.m. UTC
On Wed, Oct 26, 2016 at 03:04:59PM -0400, David Miller wrote:
> From: James Clarke <jrtc27@jrtc27.com>
> Date: Wed, 26 Oct 2016 18:21:06 +0100
>
> >> On 26 Oct 2016, at 18:09, David Miller <davem@davemloft.net> wrote:
> >>
> >> From: James Clarke <jrtc27@jrtc27.com>
> >> Date: Wed, 26 Oct 2016 17:58:16 +0100
> >>
> >>>> On 26 Oct 2016, at 16:54, David Miller <davem@davemloft.net> wrote:
> >>>>
> >>>> From: James Clarke <jrtc27@jrtc27.com>
> >>>> Date: Wed, 26 Oct 2016 09:28:05 +0100
> >>>>
> >>>>> Any progress on TLB flushing?
> >>>>
> >>>> I'll keep plugging away at it today.
> >>>
> >>> Great; let me know if you need a guinea pig, as it’s pretty easy for me to
> >>> reproduce.
> >>
> >> Will do, what kind of cpus do you have?
> >
> > * UltraSparc T5 (Niagara5)
> > * UltraSparc T1 (Niagara)
> > * UltraSPARC IIIi
> >
> > The IIIi seems to be down at the moment though.
>
> James, here is what I have so far.  I only gave it a brief testing on
> sun4v, so no guarantees for the sun4u cases.  This is against the
> current sparc GIT tree.
>
> The cut-off is 32 pages, we can discuss whether that's a good value
> to use or not.  FWIW, x86_64 has similar code for this situation and
> uses a cut-off of 33.  Perhaps 64 is a better value, who knows.
>
> It might even make sense to use a different cut-off for the hypervisor
> case since the hypervisor trap we have to use to do the TLB operation
> adds even more expense to each iteration of the range loop.
>
> The policy implemented for huge range flushes below is:
>
> 1) Spitfire - Flush all non-locked entries, by hand using diagnostic
>    TLB accesses.
>
> 2) Cheetah - Flush all non-locked entries using "flush all" operation.
>
> 3) sun4v/hypervisor - Flush entire kernel context, which does not
>    remove locked or "permanent" entries.
>
> Anyways, let me know how it goes.

Thanks for this, it's now compiling. I'll let you know if it works
within the next 24 hours.

Before I forget, what do you think about the following patch? I know
Debian used to use the 64-bit kernel for a 32-bit sparc userland, and so
"Architecture: sparc" was correct, but obviously sparc64 also exists. It
seems more sane to make sparc64 default to "Architecture: sparc64", with
sparc users needing to override this with KBUILD_DEBARCH if they want
to, rather than providing a setup that's broken out of the box for
sparc64 users.

From: James Clarke <jrtc27@jrtc27.com>
Date: Wed, 26 Oct 2016 20:17:10 +0100
Subject: [PATCH] builddeb: Add support for sparc64

Signed-off-by: James Clarke <jrtc27@jrtc27.com>
---
 scripts/package/builddeb | 2 ++
 1 file changed, 2 insertions(+)

--
2.9.3

--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller Oct. 26, 2016, 9:02 p.m. UTC | #1
From: James Clarke <jrtc27@jrtc27.com>
Date: Wed, 26 Oct 2016 21:05:36 +0100

> Thanks for this, it's now compiling. I'll let you know if it works
> within the next 24 hours.

Thanks.

> Before I forget, what do you think about the following patch? I know
> Debian used to use the 64-bit kernel for a 32-bit sparc userland, and so
> "Architecture: sparc" was correct, but obviously sparc64 also exists. It
> seems more sane to make sparc64 default to "Architecture: sparc64", with
> sparc users needing to override this with KBUILD_DEBARCH if they want
> to, rather than providing a setup that's broken out of the box for
> sparc64 users.
> 
> From: James Clarke <jrtc27@jrtc27.com>
> Date: Wed, 26 Oct 2016 20:17:10 +0100
> Subject: [PATCH] builddeb: Add support for sparc64
> 
> Signed-off-by: James Clarke <jrtc27@jrtc27.com>

I don't know.

I still personally use a 32-bit userland on my sparc64 systems because
that is what performs the best and is what I will be using for as long
as I possibly can.

I've actually never used this target, is this for build the kernel or
userland components?
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jessica Clarke Oct. 27, 2016, 1:27 a.m. UTC | #2
> On 26 Oct 2016, at 22:02, David Miller <davem@davemloft.net> wrote:
> 
> From: James Clarke <jrtc27@jrtc27.com>
> Date: Wed, 26 Oct 2016 21:05:36 +0100
> 
>> Thanks for this, it's now compiling. I'll let you know if it works
>> within the next 24 hours.
> 
> Thanks.
> 
>> Before I forget, what do you think about the following patch? I know
>> Debian used to use the 64-bit kernel for a 32-bit sparc userland, and so
>> "Architecture: sparc" was correct, but obviously sparc64 also exists. It
>> seems more sane to make sparc64 default to "Architecture: sparc64", with
>> sparc users needing to override this with KBUILD_DEBARCH if they want
>> to, rather than providing a setup that's broken out of the box for
>> sparc64 users.
>> 
>> From: James Clarke <jrtc27@jrtc27.com>
>> Date: Wed, 26 Oct 2016 20:17:10 +0100
>> Subject: [PATCH] builddeb: Add support for sparc64
>> 
>> Signed-off-by: James Clarke <jrtc27@jrtc27.com>
> 
> I don't know.
> 
> I still personally use a 32-bit userland on my sparc64 systems because
> that is what performs the best and is what I will be using for as long
> as I possibly can.
> 
> I've actually never used this target, is this for build the kernel or
> userland components?

Yes, make pkg-deb builds kernel, firmware, headers and linux-libc packages.
By the way, the first build I made of 4.9 (using Debian’s 4.8 config as old
config) wouldn’t boot, since:

* sunvdc module needs _mcount
* sunvnet module needs _mcount and count_bits
* crc32c_sparc64 module needs _mcount and VISenter
[* raid6_pq module needs memcpy, though that’s just for a data partition]

The workaround is not to use CONFIG_MODVERSIONS, but this wasn’t at all clear
at first. This is because of d3867f0483, which moved these to being exported in
their .S.

Anyway, the new kernel is running now and being stress-tested.

James

--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jessica Clarke Oct. 27, 2016, 8:25 a.m. UTC | #3
> On 27 Oct 2016, at 02:27, James Clarke <jrtc27@jrtc27.com> wrote:
> 
>> On 26 Oct 2016, at 22:02, David Miller <davem@davemloft.net> wrote:
>> 
>> From: James Clarke <jrtc27@jrtc27.com>
>> Date: Wed, 26 Oct 2016 21:05:36 +0100
>> 
>>> Thanks for this, it's now compiling. I'll let you know if it works
>>> within the next 24 hours.
>> 
>> Thanks.
>> 
>>> Before I forget, what do you think about the following patch? I know
>>> Debian used to use the 64-bit kernel for a 32-bit sparc userland, and so
>>> "Architecture: sparc" was correct, but obviously sparc64 also exists. It
>>> seems more sane to make sparc64 default to "Architecture: sparc64", with
>>> sparc users needing to override this with KBUILD_DEBARCH if they want
>>> to, rather than providing a setup that's broken out of the box for
>>> sparc64 users.
>>> 
>>> From: James Clarke <jrtc27@jrtc27.com>
>>> Date: Wed, 26 Oct 2016 20:17:10 +0100
>>> Subject: [PATCH] builddeb: Add support for sparc64
>>> 
>>> Signed-off-by: James Clarke <jrtc27@jrtc27.com>
>> 
>> I don't know.
>> 
>> I still personally use a 32-bit userland on my sparc64 systems because
>> that is what performs the best and is what I will be using for as long
>> as I possibly can.
>> 
>> I've actually never used this target, is this for build the kernel or
>> userland components?
> 
> Yes, make pkg-deb builds kernel, firmware, headers and linux-libc packages.
> By the way, the first build I made of 4.9 (using Debian’s 4.8 config as old
> config) wouldn’t boot, since:
> 
> * sunvdc module needs _mcount
> * sunvnet module needs _mcount and count_bits
> * crc32c_sparc64 module needs _mcount and VISenter
> [* raid6_pq module needs memcpy, though that’s just for a data partition]
> 
> The workaround is not to use CONFIG_MODVERSIONS, but this wasn’t at all clear
> at first. This is because of d3867f0483, which moved these to being exported in
> their .S.
> 
> Anyway, the new kernel is running now and being stress-tested.

Hi David,
I’ve run it on the T5 and it seems to work without lockups:

[5948090.988821] vln_init: *vmap_lazy_nr is 32754
[5948090.989943] vln_init: lazy_max_pages() is 32768
[5948091.157381] TSB[insmod:261876]: DEBUG flush_tsb_kernel_range start=0000000010006000 end=00000000f0000000 PAGE_SIZE=2000
[5948091.157530] TSB[insmod:261876]: DEBUG flush_tsb_kernel_range start=0000000100000000 end=0005ffff8c000000 PAGE_SIZE=2000
[5948091.158240] vln_init: vmap_lazy_nr is caeb1c
[5948091.158252] vln_init: *vmap_lazy_nr is 0
[5948091.159311] vln_init: lazy_max_pages() is 32768
... continues on as normal ...

(again, that’s my debugging module to see how close the system is to a flush)

I can't (yet) vouch for the IIIi, but when it comes back up I’ll give it a go[1].
I'll also put it on the T1 at some point today, but it *should* also work since
it's using the same sun4v/hypervisor implementation as the T5.

Thanks,
James

[1] Not sure how long that will take...

--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jessica Clarke Oct. 27, 2016, 4:02 p.m. UTC | #4
> On 27 Oct 2016, at 16:51, David Miller <davem@davemloft.net> wrote:
> 
> From: James Clarke <jrtc27@jrtc27.com>
> Date: Thu, 27 Oct 2016 09:25:36 +0100
> 
>> I’ve run it on the T5 and it seems to work without lockups:
>> 
>> [5948090.988821] vln_init: *vmap_lazy_nr is 32754
>> [5948090.989943] vln_init: lazy_max_pages() is 32768
>> [5948091.157381] TSB[insmod:261876]: DEBUG flush_tsb_kernel_range start=0000000010006000 end=00000000f0000000 PAGE_SIZE=2000
>> [5948091.157530] TSB[insmod:261876]: DEBUG flush_tsb_kernel_range start=0000000100000000 end=0005ffff8c000000 PAGE_SIZE=2000
>> [5948091.158240] vln_init: vmap_lazy_nr is caeb1c
>> [5948091.158252] vln_init: *vmap_lazy_nr is 0
>> [5948091.159311] vln_init: lazy_max_pages() is 32768
>> ... continues on as normal ...
>> 
>> (again, that’s my debugging module to see how close the system is to a flush)
>> 
>> I can't (yet) vouch for the IIIi, but when it comes back up I’ll give it a go[1].
>> I'll also put it on the T1 at some point today, but it *should* also work since
>> it's using the same sun4v/hypervisor implementation as the T5.
> 
> I'm about to test it on my IIIi and will commit this if it seems to work properly.
> 
> I guess you have no opinion about the cut-off choosen? :-)
> 
> Anyways, we can fine tune it later.

I was just testing it on the IIIi when I got this. Anyway, it seems to work fine.
It hasn’t (yet) had one of the stupidly high allocations, but it did flush a block
of 3658 pages just fine (assuming the flush actually worked). Similarly for the T1.

The cut-off seems pretty arbitrary, and the only way to determine it properly would
be benchmarking (or finding out what the relevant delays are). Given x86 uses 33,
32 or 64 seem perfectly fine, but going into the hundreds doesn’t sound stupid
either... For such small numbers it’s probably hardly going to matter.

Tested-by: James Clarke <jrtc27@jrtc27.com>

James

--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Oct. 27, 2016, 4:14 p.m. UTC | #5
From: James Clarke <jrtc27@jrtc27.com>

Date: Thu, 27 Oct 2016 17:02:32 +0100

> I was just testing it on the IIIi when I got this. Anyway, it seems to work fine.

> It hasn’t (yet) had one of the stupidly high allocations, but it did flush a block

> of 3658 pages just fine (assuming the flush actually worked). Similarly for the T1.


Thanks for testing.  I'll post the final patch I committed.

> The cut-off seems pretty arbitrary, and the only way to determine it properly would

> be benchmarking (or finding out what the relevant delays are). Given x86 uses 33,

> 32 or 64 seem perfectly fine, but going into the hundreds doesn’t sound stupid

> either... For such small numbers it’s probably hardly going to matter.


It's not too hard to write a kernel module which just does dummy TLB flushes in
the loop and count the cycles using the %tick register.  And I plan to hack on
something like that soon'ish.

Another part of the equation is that it blows away, at a minimum, all kernel
TLB entries.  And that has a certain cost too.
diff mbox

Patch

diff --git a/scripts/package/builddeb b/scripts/package/builddeb
index 8ea9fd2..63b3112 100755
--- a/scripts/package/builddeb
+++ b/scripts/package/builddeb
@@ -41,6 +41,8 @@  set_debarch() {
 		debarch="$UTS_MACHINE" ;;
 	x86_64)
 		debarch=amd64 ;;
+	sparc64)
+		debarch=sparc64 ;;
 	sparc*)
 		debarch=sparc ;;
 	s390*)