
[RFC] Replaced tlbilx with tlbwe in the initialization code

Message ID 1360846595-3397-1-git-send-email-diana.craciun@freescale.com (mailing list archive)
State Accepted, archived
Commit ed2ddc56e758d516c5699260ada4d68434dfe1dc

Commit Message

Diana Craciun Feb. 14, 2013, 12:56 p.m. UTC
From: Diana Craciun <Diana.Craciun@freescale.com>

On Freescale e6500 cores, EPCR[DGTMI] controls whether guest supervisor
state can execute TLB management instructions. If EPCR[DGTMI]=0, tlbwe
and tlbilx are allowed to execute normally in guest state.

A hypervisor may choose to virtualize TLB1, and for this purpose it
may use IPROT to protect entries from being invalidated by the guest.
However, because the same bit controls the execution of both tlbwe and
tlbilx in guest state, it is not possible to let tlbwe execute in
guest state while tlbilx traps. When TLB management instructions are
allowed to execute in guest state, the guest therefore cannot use
tlbilx to invalidate its TLB1 entries.

Linux uses tlbilx in the boot code to invalidate the temporary
entries it creates when initializing the MMU. This patch replaces
the use of tlbilx in the initialization code with a tlbwe that has
the VALID bit cleared.

Linux also uses tlbilx in other contexts (such as huge pages or
indirect entries), but removing tlbilx from the initialization code
makes it possible to run under a hypervisor in configurations that
do not use huge pages or indirect entries.

Signed-off-by: Diana Craciun <Diana.Craciun@freescale.com>
---
 arch/powerpc/kernel/exceptions-64e.S | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

Comments

Benjamin Herrenschmidt Feb. 15, 2013, 12:11 a.m. UTC | #1
On Thu, 2013-02-14 at 14:56 +0200, Diana Craciun wrote:
> From: Diana Craciun <Diana.Craciun@freescale.com>
> 
> On Freescale e6500 cores EPCR[DGTMI] controls whether guest supervisor
> state can execute TLB management instructions. If EPCR[DGTMI]=0
> tlbwe and tlbilx are allowed to execute normally in the guest state.
> 
> A hypervisor may choose to virtualize TLB1 and for this purpose it
> may use IPROT to protect the entries for being invalidated by the
> guest. However, because tlbwe and tlbilx execution in the guest state
> are sharing the same bit, it is not possible to have a scenario where
> tlbwe is allowed to be executed in guest state and tlbilx traps. When
> guest TLB management instructions are allowed to be executed in guest
> state the guest cannot use tlbilx to invalidate TLB1 guest entries.

Sorry, I don't understand the explanation... can you be more detailed ?

> Linux is using tlbilx in the boot code to invalidate the temporary
> entries it creates when initializing the MMU. The patch is replacing
> the usage of tlbilx in initialization code with tlbwe with VALID bit
> cleared.
> 
> Linux is also using tlbilx in other contexts (like huge pages or
> indirect entries) but removing the tlbilx from the initialization code
> offers the possibility to have scenarios under hypervisor which are
> not using huge pages or indirect entries.
> 
> Signed-off-by: Diana Craciun <Diana.Craciun@freescale.com>
> ---
>  arch/powerpc/kernel/exceptions-64e.S | 10 ++--------
>  1 file changed, 2 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
> index 4684e33..1f0ae33 100644
> --- a/arch/powerpc/kernel/exceptions-64e.S
> +++ b/arch/powerpc/kernel/exceptions-64e.S
> @@ -1010,12 +1010,9 @@ skpinv:	addi	r6,r6,1				/* Increment */
>  	mtspr	SPRN_MAS0,r3
>  	tlbre
>  	mfspr	r6,SPRN_MAS1
> -	rlwinm	r6,r6,0,2,0	/* clear IPROT */
> +	rlwinm	r6,r6,0,2,31	/* clear IPROT and VALID */
>  	mtspr	SPRN_MAS1,r6
>  	tlbwe
> -
> -	/* Invalidate TLB1 */
> -	PPC_TLBILX_ALL(0,R0)
>  	sync
>  	isync
>  
> @@ -1069,12 +1066,9 @@ skpinv:	addi	r6,r6,1				/* Increment */
>  	mtspr	SPRN_MAS0,r4
>  	tlbre
>  	mfspr	r5,SPRN_MAS1
> -	rlwinm	r5,r5,0,2,0	/* clear IPROT */
> +	rlwinm	r5,r5,0,2,31	/* clear IPROT and VALID */
>  	mtspr	SPRN_MAS1,r5
>  	tlbwe
> -
> -	/* Invalidate TLB1 */
> -	PPC_TLBILX_ALL(0,R0)
>  	sync
>  	isync
>
Diana Craciun Feb. 15, 2013, 3:16 p.m. UTC | #2
On 02/15/2013 02:11 AM, Benjamin Herrenschmidt wrote:
> On Thu, 2013-02-14 at 14:56 +0200, Diana Craciun wrote:
>> From: Diana Craciun <Diana.Craciun@freescale.com>
>>
>> On Freescale e6500 cores EPCR[DGTMI] controls whether guest supervisor
>> state can execute TLB management instructions. If EPCR[DGTMI]=0
>> tlbwe and tlbilx are allowed to execute normally in the guest state.
>>
>> A hypervisor may choose to virtualize TLB1 and for this purpose it
>> may use IPROT to protect the entries for being invalidated by the
>> guest. However, because tlbwe and tlbilx execution in the guest state
>> are sharing the same bit, it is not possible to have a scenario where
>> tlbwe is allowed to be executed in guest state and tlbilx traps. When
>> guest TLB management instructions are allowed to be executed in guest
>> state the guest cannot use tlbilx to invalidate TLB1 guest entries.
> Sorry, I don't understand the explanation... can you be more detailed ?

TLB1 supports huge page sizes. The guest may see its memory as
contiguous, but what it sees is the guest physical memory presented by
the hypervisor. The real physical memory may be fragmented. In that
case the hypervisor can add more than one TLB1 entry for a single guest
request, and it keeps track of all the fragments. When the guest
performs a tlbilx, the hypervisor correctly invalidates all the
corresponding fragments, because both tlbwe and tlbilx trap and the
hypervisor has full control of the TLB management instructions
targeting TLB1.

On e6500 a single bit controls whether tlbwe and tlbilx trap to the
hypervisor. tlbwe targeting TLB1 always traps. But if we want to use
LRAT for TLB0, we have to configure tlbwe (targeting TLB0) to execute
directly in the guest. In that case tlbilx (which targets both TLBs)
will never trap.

If tlbilx does not trap, the guest can invalidate only one of the
(possibly several) fragments, and the hypervisor's record of which
entries are in TLB1 no longer matches the actual entries.

Diana
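
The loss of synchronization described above can be illustrated with a toy model in C. This is only a sketch under stated assumptions: the struct, the field names and the three functions are hypothetical and are not taken from any hypervisor or from the kernel. It models one guest TLB1 mapping that the hypervisor has split into several hardware entries ("fragments"), and compares an invalidation that traps with one that runs natively in the guest.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_FRAGS 4

/* One guest TLB1 mapping as a (hypothetical) hypervisor might track it. */
struct shadow_mapping {
    size_t nfrags;             /* hardware entries backing the guest entry */
    bool hw_valid[MAX_FRAGS];  /* actual state of each hardware entry */
    bool tracked;              /* hypervisor believes the mapping is live */
};

/* tlbilx traps: the hypervisor drops every fragment and its own record. */
void invalidate_trapped(struct shadow_mapping *m)
{
    for (size_t i = 0; i < m->nfrags; i++)
        m->hw_valid[i] = false;
    m->tracked = false;
}

/* tlbilx runs in the guest: only the hardware entry that matched is lost. */
void invalidate_untrapped(struct shadow_mapping *m, size_t hit)
{
    if (hit < m->nfrags)
        m->hw_valid[hit] = false;
    /* m->tracked is untouched: the hypervisor never saw the instruction. */
}

/* Bookkeeping is consistent when every hardware entry matches "tracked". */
bool in_sync(const struct shadow_mapping *m)
{
    for (size_t i = 0; i < m->nfrags; i++)
        if (m->hw_valid[i] != m->tracked)
            return false;
    return true;
}
```

Starting from a mapping with three valid fragments, invalidate_trapped() leaves the model in sync, while invalidate_untrapped() leaves two fragments valid and the hypervisor's record stale, which is the situation the patch tries to avoid at boot time.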
Scott Wood Feb. 19, 2013, 7:47 p.m. UTC | #3
On 02/15/2013 09:16:15 AM, Diana Craciun wrote:
> On 02/15/2013 02:11 AM, Benjamin Herrenschmidt wrote:
>> On Thu, 2013-02-14 at 14:56 +0200, Diana Craciun wrote:
>>> From: Diana Craciun <Diana.Craciun@freescale.com>
>>> 
>>> On Freescale e6500 cores EPCR[DGTMI] controls whether guest supervisor
>>> state can execute TLB management instructions. If EPCR[DGTMI]=0
>>> tlbwe and tlbilx are allowed to execute normally in the guest state.
>>>
>>> A hypervisor may choose to virtualize TLB1 and for this purpose it
>>> may use IPROT to protect the entries for being invalidated by the
>>> guest. However, because tlbwe and tlbilx execution in the guest state
>>> are sharing the same bit, it is not possible to have a scenario where
>>> tlbwe is allowed to be executed in guest state and tlbilx traps. When
>>> guest TLB management instructions are allowed to be executed in guest
>>> state the guest cannot use tlbilx to invalidate TLB1 guest entries.
>> Sorry, I don't understand the explanation... can you be more detailed ?
> 
> TLB1 supports huge page sizes. The guest may see the memory as  
> contiguous but it sees the guest physical memory as presented by the  
> hypervisor. In reality the real physical memory may be fragmented. In  
> this case the hypervisor can add more than one TLB1 entry for one  
> guest request and the hypervisor will keep track of all fragments.  
> When the guest performs a tlbilx, the hypervisor will correctly  
> invalidate all the corresponding fragments because both tlbwe and  
> tlbilx trap and has full control of tlb management instructions  
> targeting TLB1.
> 
> For e6500 a single bit controls if tlbwe and tlbilx trap to the  
> Hypervisor. tlbwe targeting TLB1 always traps. But if we want to use  
> LRAT for TLB0, we have to configure tlbwe (targeting TLB 0) to go  
> directly to the guest. But in this case tlbilx (which is targeting  
> both TLBs) will never trap.
> 
> If the tlbilx does not trap, the guest can invalidate only one of  
> (possible more) fragments and furthermore the synchronization between  
> what entries the hypervisor thinks there are in the TLB1 and what are  
> the actual entries is lost.

This patch addresses boot-time invalidations only.  How will you handle  
hugetlb invalidations (or indirect entry invalidations, once that  
becomes supported)?

-Scott
Diana Craciun Feb. 20, 2013, 9:22 a.m. UTC | #4
On 02/19/2013 09:47 PM, Scott Wood wrote:
> On 02/15/2013 09:16:15 AM, Diana Craciun wrote:
>> On 02/15/2013 02:11 AM, Benjamin Herrenschmidt wrote:
>>> On Thu, 2013-02-14 at 14:56 +0200, Diana Craciun wrote:
>>>> From: Diana Craciun <Diana.Craciun@freescale.com>
>>>>
>>>> On Freescale e6500 cores EPCR[DGTMI] controls whether guest supervisor
>>>> state can execute TLB management instructions. If EPCR[DGTMI]=0
>>>> tlbwe and tlbilx are allowed to execute normally in the guest state.
>>>>
>>>> A hypervisor may choose to virtualize TLB1 and for this purpose it
>>>> may use IPROT to protect the entries for being invalidated by the
>>>> guest. However, because tlbwe and tlbilx execution in the guest state
>>>> are sharing the same bit, it is not possible to have a scenario where
>>>> tlbwe is allowed to be executed in guest state and tlbilx traps. When
>>>> guest TLB management instructions are allowed to be executed in guest
>>>> state the guest cannot use tlbilx to invalidate TLB1 guest entries.
>>> Sorry, I don't understand the explanation... can you be more detailed ?
>> TLB1 supports huge page sizes. The guest may see the memory as
>> contiguous but it sees the guest physical memory as presented by the
>> hypervisor. In reality the real physical memory may be fragmented. In
>> this case the hypervisor can add more than one TLB1 entry for one
>> guest request and the hypervisor will keep track of all fragments.
>> When the guest performs a tlbilx, the hypervisor will correctly
>> invalidate all the corresponding fragments because both tlbwe and
>> tlbilx trap and has full control of tlb management instructions
>> targeting TLB1.
>>
>> For e6500 a single bit controls if tlbwe and tlbilx trap to the
>> Hypervisor. tlbwe targeting TLB1 always traps. But if we want to use
>> LRAT for TLB0, we have to configure tlbwe (targeting TLB 0) to go
>> directly to the guest. But in this case tlbilx (which is targeting
>> both TLBs) will never trap.
>>
>> If the tlbilx does not trap, the guest can invalidate only one of
>> (possible more) fragments and furthermore the synchronization between
>> what entries the hypervisor thinks there are in the TLB1 and what are
>> the actual entries is lost.
> This patch addresses boot-time invalidations only.  How will you handle
> hugetlb invalidations (or indirect entry invalidations, once that
> becomes supported)?
>
> -Scott

I will not handle them. This patch makes it possible to run Linux
under a hypervisor without using hugetlb or indirect entries. This only
matters when the TLB management instructions are configured to execute
in the guest; when they trap, everything already works.

If indirect entries are supported, we will most likely configure tlbilx
and tlbwe to trap. In that case LRAT will still be used through the
page table walk mechanism.

Diana
Stuart Yoder Feb. 20, 2013, 2:22 p.m. UTC | #5
On Tue, Feb 19, 2013 at 1:47 PM, Scott Wood <scottwood@freescale.com> wrote:
>
> This patch addresses boot-time invalidations only.  How will you handle
> hugetlb invalidations (or indirect entry invalidations, once that becomes
> supported)?

We do envision that "direct guest TLB management" is an opt-in option
that a guest can enable.

If LRAT is on, with TLB management directly handled by guests, the only
mechanism we have to do TLB1 invalidates is tlbwe. That is our only
option as far as I know. So, hugetlb and indirect entries will each
need to be addressed separately. The kernel code that handles these
either needs to be A) modified to unconditionally do all invalidates by
tlbwe or B) conditionally use tlbwe depending on whether this is a
guest that has enabled direct TLB management.

Stuart
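
Option B above amounts to a small decision made once and consulted on every TLB1 invalidation. A minimal C sketch of that decision follows. Everything in it is hypothetical: the enum, the guest_config struct and choose_tlb1_inval() are invented for illustration and do not correspond to existing kernel symbols, and how a guest would learn that it has opted in to direct TLB management is not specified in this thread.

```c
#include <stdbool.h>

/* The two ways of invalidating a TLB1 entry discussed in this thread. */
enum tlb1_inval_method {
    TLB1_INVAL_TLBILX,  /* invalidate with tlbilx */
    TLB1_INVAL_TLBWE    /* rewrite the entry with VALID cleared */
};

/* Hypothetical boot-time state; not an existing kernel structure. */
struct guest_config {
    bool running_as_guest;  /* running under a hypervisor */
    bool direct_tlb_mgmt;   /* guest opted in to direct TLB management */
};

/* Option B: use tlbwe only when the guest manages the TLB directly. */
enum tlb1_inval_method choose_tlb1_inval(const struct guest_config *cfg)
{
    if (cfg->running_as_guest && cfg->direct_tlb_mgmt)
        return TLB1_INVAL_TLBWE;
    return TLB1_INVAL_TLBILX;
}
```

Option A would correspond to returning TLB1_INVAL_TLBWE unconditionally, which avoids the extra state at the cost of changing behaviour on bare metal as well.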
Diana Craciun Feb. 20, 2013, 2:31 p.m. UTC | #6
On 02/20/2013 04:22 PM, Stuart Yoder wrote:
> On Tue, Feb 19, 2013 at 1:47 PM, Scott Wood <scottwood@freescale.com> wrote:
>> This patch addresses boot-time invalidations only.  How will you handle
>> hugetlb invalidations (or indirect entry invalidations, once that becomes
>> supported)?
> We do envision that "direct guest TLB management" is an opt-in option
> that a guest can enable.
>
> If LRAT is on, with TLB management directly handled by guests, the only
> mechanism we have to do TLB1 invalidates is tlbwe.   That is our only option
> as far as I know.   So, hugetlb and indirect entries will each need to be
> addressed separately.    The kernel code that handles these either needs
> to be A) modified to unconditionally do all invalidates by tlbwe or B)
> conditionally
> use tlbwe depending on whether this is a guest that has enabled direct
> TLB management.
>
> Stuart
>

In the case of indirect entries, I think we can configure tlbwe and
tlbilx to trap to the hypervisor. The guest should not mix tlbwe (for
TLB0) with the hardware page table walk, so we can support this
scenario without modifying the guest.

Diana

Patch

diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
index 4684e33..1f0ae33 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -1010,12 +1010,9 @@  skpinv:	addi	r6,r6,1				/* Increment */
 	mtspr	SPRN_MAS0,r3
 	tlbre
 	mfspr	r6,SPRN_MAS1
-	rlwinm	r6,r6,0,2,0	/* clear IPROT */
+	rlwinm	r6,r6,0,2,31	/* clear IPROT and VALID */
 	mtspr	SPRN_MAS1,r6
 	tlbwe
-
-	/* Invalidate TLB1 */
-	PPC_TLBILX_ALL(0,R0)
 	sync
 	isync
 
@@ -1069,12 +1066,9 @@  skpinv:	addi	r6,r6,1				/* Increment */
 	mtspr	SPRN_MAS0,r4
 	tlbre
 	mfspr	r5,SPRN_MAS1
-	rlwinm	r5,r5,0,2,0	/* clear IPROT */
+	rlwinm	r5,r5,0,2,31	/* clear IPROT and VALID */
 	mtspr	SPRN_MAS1,r5
 	tlbwe
-
-	/* Invalidate TLB1 */
-	PPC_TLBILX_ALL(0,R0)
 	sync
 	isync
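
The effect of changing the rlwinm mask end from 0 to 31 can be checked outside the kernel. The C sketch below models the mask that rlwinm builds from its MB and ME operands, using IBM bit numbering (bit 0 is the most significant bit) and wrapping around when MB > ME. The helper name rlwinm_mask is made up for this illustration, and the MAS1_VALID and MAS1_IPROT values are assumed to be bits 0 and 1 of the 32-bit MAS1 value, as the comments in the patch imply.

```c
#include <stdint.h>

/* Assumed MAS1 layout: VALID is bit 0 and IPROT is bit 1 (IBM numbering). */
#define MAS1_VALID 0x80000000u
#define MAS1_IPROT 0x40000000u

/*
 * Mask generated by rlwinm for the bit range MB..ME, both in 0..31.
 * When MB > ME the range wraps around through bit 31 back to bit 0.
 */
uint32_t rlwinm_mask(unsigned mb, unsigned me)
{
    uint32_t from_mb = 0xFFFFFFFFu >> mb;        /* bits mb..31 set */
    uint32_t to_me = 0xFFFFFFFFu << (31 - me);   /* bits 0..me set  */

    return (mb <= me) ? (from_mb & to_me) : (from_mb | to_me);
}
```

With these definitions, rlwinm_mask(2, 0) evaluates to 0xBFFFFFFF, which clears only IPROT, while rlwinm_mask(2, 31) evaluates to 0x3FFFFFFF, which clears both VALID and IPROT. That is why the following tlbwe is enough to invalidate the entry and the separate tlbilx can be dropped.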