[v2,3/9] powerpc/powernv: Remove real mode access limit for early allocations

Message ID 20170813013346.14002-3-npiggin@gmail.com
State New
Headers show

Commit Message

Nicholas Piggin Aug. 13, 2017, 1:33 a.m.
This removes the RMA limit on powernv platform, which constrains
early allocations such as PACAs and stacks. There are still other
restrictions that must be followed, such as bolted SLB limits, but
real mode addressing has no constraints.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/mm/hash_utils_64.c | 24 +++++++++++++++---------
 arch/powerpc/mm/pgtable-radix.c | 33 +++++++++++++++++----------------
 2 files changed, 32 insertions(+), 25 deletions(-)

Comments

Benjamin Herrenschmidt Aug. 14, 2017, 1:10 p.m. | #1
On Mon, 2017-08-14 at 22:49 +1000, Michael Ellerman wrote:
> Nicholas Piggin <npiggin@gmail.com> writes:
> 
> > This removes the RMA limit on powernv platform, which constrains
> > early allocations such as PACAs and stacks. There are still other
> > restrictions that must be followed, such as bolted SLB limits, but
> > real mode addressing has no constraints.

For radix, should we consider making the PACAs chip/node local ?

> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > ---
> >  arch/powerpc/mm/hash_utils_64.c | 24 +++++++++++++++---------
> >  arch/powerpc/mm/pgtable-radix.c | 33 +++++++++++++++++----------------
> 
> I missed that we'd duplicated this logic for radix vs hash [yes I know I
> merged the commit that did it :)]
> 
> > diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
> > index 671a45d86c18..61ca17d81737 100644
> > --- a/arch/powerpc/mm/pgtable-radix.c
> > +++ b/arch/powerpc/mm/pgtable-radix.c
> > @@ -598,22 +598,23 @@ void radix__setup_initial_memory_limit(phys_addr_t first_memblock_base,
> >  	 * physical on those processors
> >  	 */
> >  	BUG_ON(first_memblock_base != 0);
> > -	/*
> > -	 * We limit the allocation that depend on ppc64_rma_size
> > -	 * to first_memblock_size. We also clamp it to 1GB to
> > -	 * avoid some funky things such as RTAS bugs.
> 
> That comment about RTAS is 7 years old, and I'm pretty sure it was a
> historical note when it was written.
> 
> I'm inclined to drop it and if we discover new bugs with RTAS on Power9
> then we can always put it back.
> 
> > -	 *
> > -	 * On radix config we really don't have a limitation
> > -	 * on real mode access. But keeping it as above works
> > -	 * well enough.
> 
> Ergh.
> 
> > -	 */
> > -	ppc64_rma_size = min_t(u64, first_memblock_size, 0x40000000);
> > -	/*
> > -	 * Finally limit subsequent allocations. We really don't want
> > -	 * to limit the memblock allocations to rma_size. FIXME!! should
> > -	 * we even limit at all ?
> > -	 */
> 
> So I think we should just delete this function entirely.
> 
> Any objections?
> 
> cheers

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benjamin Herrenschmidt Aug. 14, 2017, 1:13 p.m. | #2
On Mon, 2017-08-14 at 22:49 +1000, Michael Ellerman wrote:
> > -     /*
> > -      * We limit the allocation that depend on ppc64_rma_size
> > -      * to first_memblock_size. We also clamp it to 1GB to
> > -      * avoid some funky things such as RTAS bugs.
> 
> That comment about RTAS is 7 years old, and I'm pretty sure it was a
> historical note when it was written.
> 
> I'm inclined to drop it and if we discover new bugs with RTAS on Power9
> then we can always put it back.

Arent' we using a 32-bit RTAS ? (Afaik there's a 64-bit one, we just
never used it ..). In this case we need to at least clamp to 2G (no
trust RTAS doing unsigned properly).

> > -      *
> > -      * On radix config we really don't have a limitation
> > -      * on real mode access. But keeping it as above works
> > -      * well enough.
> 
> Ergh.
> 
> > -      */
> > -     ppc64_rma_size = min_t(u64, first_memblock_size, 0x40000000);
> > -     /*
> > -      * Finally limit subsequent allocations. We really don't want
> > -      * to limit the memblock allocations to rma_size. FIXME!! should
> > -      * we even limit at all ?
> > -      */
> 
> So I think we should just delete this function entirely.
> 
> Any objections?

Well.. RTAS is quite sucky ... 

Ben.

> cheers

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nicholas Piggin Aug. 14, 2017, 2:51 p.m. | #3
On Mon, 14 Aug 2017 23:10:50 +1000
Benjamin Herrenschmidt <benh@au1.ibm.com> wrote:

> On Mon, 2017-08-14 at 22:49 +1000, Michael Ellerman wrote:
> > Nicholas Piggin <npiggin@gmail.com> writes:
> >   
> > > This removes the RMA limit on powernv platform, which constrains
> > > early allocations such as PACAs and stacks. There are still other
> > > restrictions that must be followed, such as bolted SLB limits, but
> > > real mode addressing has no constraints.  
> 
> For radix, should we consider making the PACAs chip/node local ?

Yes that's the main goal of the series. I had NUMAization patches
at the end. Just dropped them for now because some of them need
toplogy information that's not available (that's why I was asking
about moving unflattening earlier in boot, but we may be able to
move allocations later too).

Thanks,
Nick
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nicholas Piggin Aug. 15, 2017, 12:10 p.m. | #4
On Mon, 14 Aug 2017 23:13:07 +1000
Benjamin Herrenschmidt <benh@au1.ibm.com> wrote:

> On Mon, 2017-08-14 at 22:49 +1000, Michael Ellerman wrote:
> > > -     /*
> > > -      * We limit the allocation that depend on ppc64_rma_size
> > > -      * to first_memblock_size. We also clamp it to 1GB to
> > > -      * avoid some funky things such as RTAS bugs.  
> > 
> > That comment about RTAS is 7 years old, and I'm pretty sure it was a
> > historical note when it was written.
> > 
> > I'm inclined to drop it and if we discover new bugs with RTAS on Power9
> > then we can always put it back.  
> 
> Arent' we using a 32-bit RTAS ? (Afaik there's a 64-bit one, we just
> never used it ..). In this case we need to at least clamp to 2G (no
> trust RTAS doing unsigned properly).

Is there any allocation not covered by RTAS_INSTANTIATE_MAX?

Thanks,
Nick
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benjamin Herrenschmidt Aug. 15, 2017, 12:48 p.m. | #5
On Tue, 2017-08-15 at 22:10 +1000, Nicholas Piggin wrote:
> On Mon, 14 Aug 2017 23:13:07 +1000
> Benjamin Herrenschmidt <benh@au1.ibm.com> wrote:
> 
> > On Mon, 2017-08-14 at 22:49 +1000, Michael Ellerman wrote:
> > > > -     /*
> > > > -      * We limit the allocation that depend on ppc64_rma_size
> > > > -      * to first_memblock_size. We also clamp it to 1GB to
> > > > -      * avoid some funky things such as RTAS bugs.  
> > > 
> > > That comment about RTAS is 7 years old, and I'm pretty sure it was a
> > > historical note when it was written.
> > > 
> > > I'm inclined to drop it and if we discover new bugs with RTAS on Power9
> > > then we can always put it back.  
> > 
> > Arent' we using a 32-bit RTAS ? (Afaik there's a 64-bit one, we just
> > never used it ..). In this case we need to at least clamp to 2G (no
> > trust RTAS doing unsigned properly).
> 
> Is there any allocation not covered by RTAS_INSTANTIATE_MAX?

Not sure, we have to audit. Talking about all this with mpe today, I
think we just need to make sure that anything that has a restriction
uses a specific identifier for *that* restriction rather than just
blindly "rma". For example, seg0_limit for segment 0 in HPT. In the
case of PACAs, we would create a specific limit that is
min(seg0_limit,rma) for pseries and -1 for powernv. etc..

The RMA limit can then become either strictly a pseries thing, or be
initialized to -1 on powernv (or max mem).

Ben.

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nicholas Piggin Aug. 15, 2017, 1:02 p.m. | #6
On Tue, 15 Aug 2017 22:48:22 +1000
Benjamin Herrenschmidt <benh@au1.ibm.com> wrote:

> On Tue, 2017-08-15 at 22:10 +1000, Nicholas Piggin wrote:
> > On Mon, 14 Aug 2017 23:13:07 +1000
> > Benjamin Herrenschmidt <benh@au1.ibm.com> wrote:
> >   
> > > On Mon, 2017-08-14 at 22:49 +1000, Michael Ellerman wrote:  
> > > > > -     /*
> > > > > -      * We limit the allocation that depend on ppc64_rma_size
> > > > > -      * to first_memblock_size. We also clamp it to 1GB to
> > > > > -      * avoid some funky things such as RTAS bugs.    
> > > > 
> > > > That comment about RTAS is 7 years old, and I'm pretty sure it was a
> > > > historical note when it was written.
> > > > 
> > > > I'm inclined to drop it and if we discover new bugs with RTAS on Power9
> > > > then we can always put it back.    
> > > 
> > > Arent' we using a 32-bit RTAS ? (Afaik there's a 64-bit one, we just
> > > never used it ..). In this case we need to at least clamp to 2G (no
> > > trust RTAS doing unsigned properly).  
> > 
> > Is there any allocation not covered by RTAS_INSTANTIATE_MAX?  
> 
> Not sure, we have to audit.

Okay.

> Talking about all this with mpe today, I
> think we just need to make sure that anything that has a restriction
> uses a specific identifier for *that* restriction rather than just
> blindly "rma". For example, seg0_limit for segment 0 in HPT. In the
> case of PACAs, we would create a specific limit that is
> min(seg0_limit,rma) for pseries and -1 for powernv. etc..
> 
> The RMA limit can then become either strictly a pseries thing, or be
> initialized to -1 on powernv (or max mem).

Right, I'm trying to get there with the patch. Well really it breaks
into two different things -- RMA for real mode (which is unlimited for
powernv), and non faulting (of any kind, SLB or TLB on booke) for
virtual mode.

RTAS is still squished in there with RMA, but we do have that RTAS
limit so we should try to move it out to there.

Thanks,
Nick
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Aneesh Kumar K.V Sept. 12, 2017, 10:13 a.m. | #7
Benjamin Herrenschmidt <benh@au1.ibm.com> writes:

> On Mon, 2017-08-14 at 22:49 +1000, Michael Ellerman wrote:
>> > -     /*
>> > -      * We limit the allocation that depend on ppc64_rma_size
>> > -      * to first_memblock_size. We also clamp it to 1GB to
>> > -      * avoid some funky things such as RTAS bugs.
>> 
>> That comment about RTAS is 7 years old, and I'm pretty sure it was a
>> historical note when it was written.
>> 
>> I'm inclined to drop it and if we discover new bugs with RTAS on Power9
>> then we can always put it back.
>
> Arent' we using a 32-bit RTAS ? (Afaik there's a 64-bit one, we just
> never used it ..). In this case we need to at least clamp to 2G (no
> trust RTAS doing unsigned properly).
>

Yes. I added the limit to radix after I observed that we have MSR[SF] =
0.

IIRC it was PACA access that was causing it to crash on return from RTAS.

hmm the commit also explains that.

powerpc/mm/radix: Limit paca allocation in radix

On return from RTAS we access the paca variables and we have 64 bit
disabled. This requires us to limit paca in 32 bit range.

Fix this by setting ppc64_rma_size to first_memblock_size/1G range.

-aneesh

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benjamin Herrenschmidt Sept. 12, 2017, 10:26 a.m. | #8
On Tue, 2017-09-12 at 15:43 +0530, Aneesh Kumar K.V wrote:
> Yes. I added the limit to radix after I observed that we have MSR[SF] =
> 0.
> 
> IIRC it was PACA access that was causing it to crash on return from RTAS.
> 
> hmm the commit also explains that.
> 
> powerpc/mm/radix: Limit paca allocation in radix
> 
> On return from RTAS we access the paca variables and we have 64 bit
> disabled. This requires us to limit paca in 32 bit range.
> 
> Fix this by setting ppc64_rma_size to first_memblock_size/1G range.

That should be fixable with better use of temporaries...

Ben.

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 7a20669c19e7..d3da19cc4867 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -1824,16 +1824,22 @@  void hash__setup_initial_memory_limit(phys_addr_t first_memblock_base,
 	 */
 	BUG_ON(first_memblock_base != 0);
 
-	/* On LPAR systems, the first entry is our RMA region,
-	 * non-LPAR 64-bit hash MMU systems don't have a limitation
-	 * on real mode access, but using the first entry works well
-	 * enough. We also clamp it to 1G to avoid some funky things
-	 * such as RTAS bugs etc...
-	 */
-	ppc64_rma_size = min_t(u64, first_memblock_size, 0x40000000);
+	if (!early_cpu_has_feature(CPU_FTR_HVMODE)) {
+		/*
+		 * On virtualized systems, the first entry is our RMA region,
+		 * non-LPAR 64-bit hash MMU systems don't have a limitation
+		 * on real mode access.
+		 *
+		 * We also clamp it to 1G to avoid some funky things
+		 * such as RTAS bugs etc...
+		 */
+		ppc64_rma_size = min_t(u64, first_memblock_size, 0x40000000);
 
-	/* Finally limit subsequent allocations */
-	memblock_set_current_limit(ppc64_rma_size);
+		/* Finally limit subsequent allocations */
+		memblock_set_current_limit(ppc64_rma_size);
+	} else {
+		ppc64_rma_size = ULONG_MAX;
+	}
 }
 
 #ifdef CONFIG_DEBUG_FS
diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index 671a45d86c18..61ca17d81737 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -598,22 +598,23 @@  void radix__setup_initial_memory_limit(phys_addr_t first_memblock_base,
 	 * physical on those processors
 	 */
 	BUG_ON(first_memblock_base != 0);
-	/*
-	 * We limit the allocation that depend on ppc64_rma_size
-	 * to first_memblock_size. We also clamp it to 1GB to
-	 * avoid some funky things such as RTAS bugs.
-	 *
-	 * On radix config we really don't have a limitation
-	 * on real mode access. But keeping it as above works
-	 * well enough.
-	 */
-	ppc64_rma_size = min_t(u64, first_memblock_size, 0x40000000);
-	/*
-	 * Finally limit subsequent allocations. We really don't want
-	 * to limit the memblock allocations to rma_size. FIXME!! should
-	 * we even limit at all ?
-	 */
-	memblock_set_current_limit(first_memblock_base + first_memblock_size);
+
+	if (!early_cpu_has_feature(CPU_FTR_HVMODE)) {
+		/*
+		 * On virtualized systems, the first entry is our RMA region,
+		 * non-LPAR 64-bit hash MMU systems don't have a limitation
+		 * on real mode access.
+		 *
+		 * We also clamp it to 1G to avoid some funky things
+		 * such as RTAS bugs etc...
+		 */
+		ppc64_rma_size = min_t(u64, first_memblock_size, 0x40000000);
+
+		/* Finally limit subsequent allocations */
+		memblock_set_current_limit(ppc64_rma_size);
+	} else {
+		ppc64_rma_size = ULONG_MAX;
+	}
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG