Message ID | 1360846595-3397-1-git-send-email-diana.craciun@freescale.com (mailing list archive) |
---|---|
State | Accepted, archived |
Commit | ed2ddc56e758d516c5699260ada4d68434dfe1dc |
On Thu, 2013-02-14 at 14:56 +0200, Diana Craciun wrote:
> From: Diana Craciun <Diana.Craciun@freescale.com>
>
> On Freescale e6500 cores EPCR[DGTMI] controls whether guest supervisor
> state can execute TLB management instructions. If EPCR[DGTMI]=0
> tlbwe and tlbilx are allowed to execute normally in the guest state.
>
> A hypervisor may choose to virtualize TLB1 and for this purpose it
> may use IPROT to protect the entries for being invalidated by the
> guest. However, because tlbwe and tlbilx execution in the guest state
> are sharing the same bit, it is not possible to have a scenario where
> tlbwe is allowed to be executed in guest state and tlbilx traps. When
> guest TLB management instructions are allowed to be executed in guest
> state the guest cannot use tlbilx to invalidate TLB1 guest entries.

Sorry, I don't understand the explanation... can you be more detailed?

> Linux is using tlbilx in the boot code to invalidate the temporary
> entries it creates when initializing the MMU. The patch is replacing
> the usage of tlbilx in initialization code with tlbwe with VALID bit
> cleared.
>
> Linux is also using tlbilx in other contexts (like huge pages or
> indirect entries) but removing the tlbilx from the initialization code
> offers the possibility to have scenarios under hypervisor which are
> not using huge pages or indirect entries.
>
> Signed-off-by: Diana Craciun <Diana.Craciun@freescale.com>
> ---
>  arch/powerpc/kernel/exceptions-64e.S | 10 ++--------
>  1 file changed, 2 insertions(+), 8 deletions(-)
>
> diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
> index 4684e33..1f0ae33 100644
> --- a/arch/powerpc/kernel/exceptions-64e.S
> +++ b/arch/powerpc/kernel/exceptions-64e.S
> @@ -1010,12 +1010,9 @@ skpinv: addi r6,r6,1 /* Increment */
>         mtspr   SPRN_MAS0,r3
>         tlbre
>         mfspr   r6,SPRN_MAS1
> -       rlwinm  r6,r6,0,2,0     /* clear IPROT */
> +       rlwinm  r6,r6,0,2,31    /* clear IPROT and VALID */
>         mtspr   SPRN_MAS1,r6
>         tlbwe
> -
> -       /* Invalidate TLB1 */
> -       PPC_TLBILX_ALL(0,R0)
>         sync
>         isync
>
> @@ -1069,12 +1066,9 @@ skpinv: addi r6,r6,1 /* Increment */
>         mtspr   SPRN_MAS0,r4
>         tlbre
>         mfspr   r5,SPRN_MAS1
> -       rlwinm  r5,r5,0,2,0     /* clear IPROT */
> +       rlwinm  r5,r5,0,2,31    /* clear IPROT and VALID */
>         mtspr   SPRN_MAS1,r5
>         tlbwe
> -
> -       /* Invalidate TLB1 */
> -       PPC_TLBILX_ALL(0,R0)
>         sync
>         isync
>

On 02/15/2013 02:11 AM, Benjamin Herrenschmidt wrote:
> On Thu, 2013-02-14 at 14:56 +0200, Diana Craciun wrote:
>> From: Diana Craciun <Diana.Craciun@freescale.com>
>>
>> On Freescale e6500 cores EPCR[DGTMI] controls whether guest supervisor
>> state can execute TLB management instructions. If EPCR[DGTMI]=0
>> tlbwe and tlbilx are allowed to execute normally in the guest state.
>>
>> A hypervisor may choose to virtualize TLB1 and for this purpose it
>> may use IPROT to protect the entries for being invalidated by the
>> guest. However, because tlbwe and tlbilx execution in the guest state
>> are sharing the same bit, it is not possible to have a scenario where
>> tlbwe is allowed to be executed in guest state and tlbilx traps. When
>> guest TLB management instructions are allowed to be executed in guest
>> state the guest cannot use tlbilx to invalidate TLB1 guest entries.
> Sorry, I don't understand the explanation... can you be more detailed?

TLB1 supports huge page sizes. The guest may see the memory as
contiguous, but it sees the guest physical memory as presented by the
hypervisor. In reality the real physical memory may be fragmented. In
this case the hypervisor can add more than one TLB1 entry for one guest
request, and the hypervisor will keep track of all fragments. When the
guest performs a tlbilx, the hypervisor will correctly invalidate all
the corresponding fragments, because both tlbwe and tlbilx trap and the
hypervisor has full control of TLB management instructions targeting
TLB1.

For e6500 a single bit controls whether tlbwe and tlbilx trap to the
hypervisor. tlbwe targeting TLB1 always traps. But if we want to use
LRAT for TLB0, we have to configure tlbwe (targeting TLB0) to go
directly to the guest. But in this case tlbilx (which targets both
TLBs) will never trap.

If the tlbilx does not trap, the guest can invalidate only one of
(possibly more) fragments, and furthermore the synchronization between
the entries the hypervisor thinks are in TLB1 and the entries that are
actually there is lost.

Diana

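The bookkeeping problem described in this reply can be illustrated with a toy model. The sketch below is only an illustration under stated assumptions: `ToyHypervisor`, its fields, and the idea that an untrapped tlbilx removes exactly the first fragment are hypothetical simplifications, not code or behaviour taken from any real hypervisor or from the e6500 manual.

```python
class ToyHypervisor:
    """Toy model (hypothetical) of a hypervisor that virtualizes TLB1."""

    def __init__(self):
        # Entries actually present in the hardware TLB1.
        self.hw_tlb1 = set()
        # What the hypervisor believes it installed, per guest mapping.
        self.shadow = {}

    def guest_tlbwe_tlb1(self, guest_page, fragments):
        """tlbwe targeting TLB1 traps: one hardware entry per physical fragment."""
        entries = {(guest_page, i) for i in range(fragments)}
        self.hw_tlb1 |= entries
        self.shadow[guest_page] = entries

    def guest_tlbilx(self, guest_page, traps):
        """Guest invalidation, either trapped or executed directly."""
        if traps:
            # The hypervisor sees the request and drops every fragment.
            self.hw_tlb1 -= self.shadow.pop(guest_page, set())
        else:
            # Assumption: hardware removes a single matching entry and
            # the hypervisor's bookkeeping is never updated.
            self.hw_tlb1.discard((guest_page, 0))

    def in_sync(self):
        """True when the shadow bookkeeping matches the hardware TLB1."""
        tracked = set().union(*self.shadow.values())
        return tracked == self.hw_tlb1


if __name__ == "__main__":
    for traps in (True, False):
        hv = ToyHypervisor()
        hv.guest_tlbwe_tlb1("hugepage0", fragments=3)
        hv.guest_tlbilx("hugepage0", traps=traps)
        print(f"traps={traps}: stale entries={len(hv.hw_tlb1)}, in_sync={hv.in_sync()}")
```

With trapping, all three fragments disappear and the bookkeeping stays consistent. Without trapping, two stale fragments remain in the modelled TLB1 and the shadow state no longer matches the hardware, which is the loss of synchronization described above.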
On 02/15/2013 09:16:15 AM, Diana Craciun wrote:
> On 02/15/2013 02:11 AM, Benjamin Herrenschmidt wrote:
>> On Thu, 2013-02-14 at 14:56 +0200, Diana Craciun wrote:
>>> From: Diana Craciun <Diana.Craciun@freescale.com>
>>>
>>> On Freescale e6500 cores EPCR[DGTMI] controls whether guest
>>> supervisor state can execute TLB management instructions. If
>>> EPCR[DGTMI]=0 tlbwe and tlbilx are allowed to execute normally in
>>> the guest state.
>>>
>>> A hypervisor may choose to virtualize TLB1 and for this purpose it
>>> may use IPROT to protect the entries for being invalidated by the
>>> guest. However, because tlbwe and tlbilx execution in the guest
>>> state are sharing the same bit, it is not possible to have a
>>> scenario where tlbwe is allowed to be executed in guest state and
>>> tlbilx traps. When guest TLB management instructions are allowed to
>>> be executed in guest state the guest cannot use tlbilx to invalidate
>>> TLB1 guest entries.
>> Sorry, I don't understand the explanation... can you be more
>> detailed?
>
> TLB1 supports huge page sizes. The guest may see the memory as
> contiguous, but it sees the guest physical memory as presented by the
> hypervisor. In reality the real physical memory may be fragmented. In
> this case the hypervisor can add more than one TLB1 entry for one
> guest request, and the hypervisor will keep track of all fragments.
> When the guest performs a tlbilx, the hypervisor will correctly
> invalidate all the corresponding fragments, because both tlbwe and
> tlbilx trap and the hypervisor has full control of TLB management
> instructions targeting TLB1.
>
> For e6500 a single bit controls whether tlbwe and tlbilx trap to the
> hypervisor. tlbwe targeting TLB1 always traps. But if we want to use
> LRAT for TLB0, we have to configure tlbwe (targeting TLB0) to go
> directly to the guest. But in this case tlbilx (which targets both
> TLBs) will never trap.
>
> If the tlbilx does not trap, the guest can invalidate only one of
> (possibly more) fragments, and furthermore the synchronization between
> the entries the hypervisor thinks are in TLB1 and the entries that are
> actually there is lost.

This patch addresses boot-time invalidations only. How will you handle
hugetlb invalidations (or indirect entry invalidations, once that
becomes supported)?

-Scott

On 02/19/2013 09:47 PM, Scott Wood wrote:
> On 02/15/2013 09:16:15 AM, Diana Craciun wrote:
>> On 02/15/2013 02:11 AM, Benjamin Herrenschmidt wrote:
>>> On Thu, 2013-02-14 at 14:56 +0200, Diana Craciun wrote:
>>>> From: Diana Craciun <Diana.Craciun@freescale.com>
>>>>
>>>> On Freescale e6500 cores EPCR[DGTMI] controls whether guest
>>>> supervisor state can execute TLB management instructions. If
>>>> EPCR[DGTMI]=0 tlbwe and tlbilx are allowed to execute normally in
>>>> the guest state.
>>>>
>>>> A hypervisor may choose to virtualize TLB1 and for this purpose it
>>>> may use IPROT to protect the entries for being invalidated by the
>>>> guest. However, because tlbwe and tlbilx execution in the guest
>>>> state are sharing the same bit, it is not possible to have a
>>>> scenario where tlbwe is allowed to be executed in guest state and
>>>> tlbilx traps. When guest TLB management instructions are allowed
>>>> to be executed in guest state the guest cannot use tlbilx to
>>>> invalidate TLB1 guest entries.
>>> Sorry, I don't understand the explanation... can you be more
>>> detailed?
>> TLB1 supports huge page sizes. The guest may see the memory as
>> contiguous, but it sees the guest physical memory as presented by the
>> hypervisor. In reality the real physical memory may be fragmented. In
>> this case the hypervisor can add more than one TLB1 entry for one
>> guest request, and the hypervisor will keep track of all fragments.
>> When the guest performs a tlbilx, the hypervisor will correctly
>> invalidate all the corresponding fragments, because both tlbwe and
>> tlbilx trap and the hypervisor has full control of TLB management
>> instructions targeting TLB1.
>>
>> For e6500 a single bit controls whether tlbwe and tlbilx trap to the
>> hypervisor. tlbwe targeting TLB1 always traps. But if we want to use
>> LRAT for TLB0, we have to configure tlbwe (targeting TLB0) to go
>> directly to the guest. But in this case tlbilx (which targets both
>> TLBs) will never trap.
>>
>> If the tlbilx does not trap, the guest can invalidate only one of
>> (possibly more) fragments, and furthermore the synchronization between
>> the entries the hypervisor thinks are in TLB1 and the entries that are
>> actually there is lost.
> This patch addresses boot-time invalidations only. How will you handle
> hugetlb invalidations (or indirect entry invalidations, once that
> becomes supported)?
>
> -Scott

I will not handle them. This patch makes it possible to run Linux under
a hypervisor without using hugetlb or indirect entries. (This matters
only when TLB management instructions are configured to go directly to
the guest; when they trap, everything already works.)

If indirect entries are supported, most likely we will configure tlbilx
and tlbwe to trap. In this case LRAT will still be used through the
page table walk mechanism.

Diana

On Tue, Feb 19, 2013 at 1:47 PM, Scott Wood <scottwood@freescale.com> wrote:
>
> This patch addresses boot-time invalidations only. How will you handle
> hugetlb invalidations (or indirect entry invalidations, once that becomes
> supported)?

We do envision that "direct guest TLB management" is an opt-in option
that a guest can enable.

If LRAT is on, with TLB management directly handled by guests, the only
mechanism we have to do TLB1 invalidates is tlbwe. That is our only option
as far as I know. So, hugetlb and indirect entries will each need to be
addressed separately. The kernel code that handles these either needs
to be A) modified to unconditionally do all invalidates by tlbwe or B)
conditionally use tlbwe depending on whether this is a guest that has
enabled direct TLB management.

Stuart

On 02/20/2013 04:22 PM, Stuart Yoder wrote:
> On Tue, Feb 19, 2013 at 1:47 PM, Scott Wood <scottwood@freescale.com> wrote:
>> This patch addresses boot-time invalidations only. How will you handle
>> hugetlb invalidations (or indirect entry invalidations, once that becomes
>> supported)?
> We do envision that "direct guest TLB management" is an opt-in option
> that a guest can enable.
>
> If LRAT is on, with TLB management directly handled by guests, the only
> mechanism we have to do TLB1 invalidates is tlbwe. That is our only option
> as far as I know. So, hugetlb and indirect entries will each need to be
> addressed separately. The kernel code that handles these either needs
> to be A) modified to unconditionally do all invalidates by tlbwe or B)
> conditionally use tlbwe depending on whether this is a guest that has
> enabled direct TLB management.
>
> Stuart
>

In case of indirect entries I think we can configure tlbwe and tlbilx to
go to the hypervisor. The guest should not mix tlbwe (for TLB0) and
hardware page table walk, so we can support this scenario without
modifying the guest.

Diana

diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
index 4684e33..1f0ae33 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -1010,12 +1010,9 @@ skpinv: addi r6,r6,1 /* Increment */
        mtspr   SPRN_MAS0,r3
        tlbre
        mfspr   r6,SPRN_MAS1
-       rlwinm  r6,r6,0,2,0     /* clear IPROT */
+       rlwinm  r6,r6,0,2,31    /* clear IPROT and VALID */
        mtspr   SPRN_MAS1,r6
        tlbwe
-
-       /* Invalidate TLB1 */
-       PPC_TLBILX_ALL(0,R0)
        sync
        isync
 
@@ -1069,12 +1066,9 @@ skpinv: addi r6,r6,1 /* Increment */
        mtspr   SPRN_MAS0,r4
        tlbre
        mfspr   r5,SPRN_MAS1
-       rlwinm  r5,r5,0,2,0     /* clear IPROT */
+       rlwinm  r5,r5,0,2,31    /* clear IPROT and VALID */
        mtspr   SPRN_MAS1,r5
        tlbwe
-
-       /* Invalidate TLB1 */
-       PPC_TLBILX_ALL(0,R0)
        sync
        isync
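
The only functional change in each hunk is the last operand of rlwinm (mask end bit 0 becomes 31). As a sanity check, the masks can be modelled in Python. This is a minimal sketch, not kernel code: the helper names `rlwinm_mask` and `rlwinm` are made up for illustration, and the bit positions assume the usual 32-bit MAS1 layout with VALID as the most significant bit and IPROT next to it.

```python
# Assumed 32-bit MAS1 layout (IBM numbering, bit 0 is the MSB).
MAS1_VALID = 0x80000000  # bit 0
MAS1_IPROT = 0x40000000  # bit 1


def rlwinm_mask(mb, me):
    """Mask with ones from bit mb through bit me, wrapping when mb > me."""
    mask = 0
    i = mb
    while True:
        mask |= 1 << (31 - i)
        if i == me:
            break
        i = (i + 1) % 32
    return mask


def rlwinm(value, sh, mb, me):
    """Rotate a 32-bit value left by sh, then AND with the mb..me mask."""
    value &= 0xFFFFFFFF
    rotated = ((value << sh) | (value >> (32 - sh))) & 0xFFFFFFFF
    return rotated & rlwinm_mask(mb, me)


if __name__ == "__main__":
    # Hypothetical MAS1 value: valid, protected, plus some low-order bits.
    mas1 = MAS1_VALID | MAS1_IPROT | 0x00000A00
    print(hex(rlwinm(mas1, 0, 2, 0)))   # old code: IPROT cleared, VALID kept
    print(hex(rlwinm(mas1, 0, 2, 31)))  # new code: IPROT and VALID cleared
```

Under these assumptions the old operands (2,0) wrap around and keep bit 0, so the entry stayed valid and the separate tlbilx was needed. The new operands (2,31) drop both bits, so the following tlbwe writes the entry back as invalid.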