Patchwork [v2] powernv: kvm: make _PAGE_NUMA take effect

login
register
mail settings
Submitter Liu Ping Fan
Date Jan. 21, 2014, 8:15 a.m.
Message ID <1390292129-15871-1-git-send-email-pingfank@linux.vnet.ibm.com>
Download mbox | patch
Permalink /patch/312827/
State New
Headers show

Comments

Liu Ping Fan - Jan. 21, 2014, 8:15 a.m.
To make sure that on host, the pages marked with _PAGE_NUMA result in a fault
when guest access them, we should force the checking when guest uses hypercall
to setup hpte.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
v2:
  It should be the reply to "[PATCH 2/4] powernv: kvm: make _PAGE_NUMA take effect"
  And I imporve the changelog according to Aneesh's suggestion.
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Aneesh Kumar K.V - Jan. 21, 2014, 9:42 a.m.
Liu Ping Fan <kernelfans@gmail.com> writes:

> To make sure that on host, the pages marked with _PAGE_NUMA result in a fault
> when guest access them, we should force the checking when guest uses hypercall
> to setup hpte.
>
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

When we mark pte with _PAGE_NUMA we already call mmu_notifier_invalidate_range_start and
mmu_notifier_invalidate_range_end, which will mark existing guest hpte
entry as HPTE_V_ABSENT. Now we need to do that when we are inserting new
guest hpte entries. This patch does that. 

> ---
> v2:
>   It should be the reply to "[PATCH 2/4] powernv: kvm: make _PAGE_NUMA take effect"
>   And I imporve the changelog according to Aneesh's suggestion.
> ---
>  arch/powerpc/kvm/book3s_hv_rm_mmu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> index 9c51544..af8602d 100644
> --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> @@ -232,7 +232,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
>  		/* Look up the Linux PTE for the backing page */
>  		pte_size = psize;
>  		pte = lookup_linux_pte(pgdir, hva, writing, &pte_size);
> -		if (pte_present(pte)) {
> +		if (pte_present(pte) && !pte_numa(pte)) {
>  			if (writing && !pte_write(pte))
>  				/* make the actual HPTE be read-only */
>  				ptel = hpte_make_readonly(ptel);
> -- 
> 1.8.1.4

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf - Jan. 27, 2014, 9:11 a.m.
On 21.01.2014, at 10:42, Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> wrote:

> Liu Ping Fan <kernelfans@gmail.com> writes:
> 
>> To make sure that on host, the pages marked with _PAGE_NUMA result in a fault
>> when guest access them, we should force the checking when guest uses hypercall
>> to setup hpte.
>> 
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> 
> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> 
> When we mark pte with _PAGE_NUMA we already call mmu_notifier_invalidate_range_start and
> mmu_notifier_invalidate_range_end, which will mark existing guest hpte
> entry as HPTE_V_ABSENT. Now we need to do that when we are inserting new
> guest hpte entries. This patch does that. 

So what happens next? We insert a page into the HTAB without HPTE_V_VALID set, so the guest will fail to use it. If the guest does an H_READ on it it will suddenly turn to V_VALID though?

I might need a crash course in the use of HPTE_V_ABSENT.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Aneesh Kumar K.V - Jan. 27, 2014, 10:28 a.m.
Alexander Graf <agraf@suse.de> writes:

> On 21.01.2014, at 10:42, Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> wrote:
>
>> Liu Ping Fan <kernelfans@gmail.com> writes:
>> 
>>> To make sure that on host, the pages marked with _PAGE_NUMA result in a fault
>>> when guest access them, we should force the checking when guest uses hypercall
>>> to setup hpte.
>>> 
>>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>> 
>> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>> 
>> When we mark pte with _PAGE_NUMA we already call mmu_notifier_invalidate_range_start and
>> mmu_notifier_invalidate_range_end, which will mark existing guest hpte
>> entry as HPTE_V_ABSENT. Now we need to do that when we are inserting new
>> guest hpte entries. This patch does that. 
>
> So what happens next? We insert a page into the HTAB without
> HPTE_V_VALID set, so the guest will fail to use it. If the guest does
> an H_READ on it it will suddenly turn to V_VALID though?

As per the guest the entry is valid, so yes an hread should return a
valid entry. But in real hpte we would mark it not valid.

>
> I might need a crash course in the use of HPTE_V_ABSENT.

When guest tries to access the address, the host will handle the fault.

kvmppc_hpte_hv_fault should give more info

-aneesh

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul Mackerras - Jan. 27, 2014, 10:41 a.m.
On Mon, Jan 27, 2014 at 10:11:40AM +0100, Alexander Graf wrote:
> 
> On 21.01.2014, at 10:42, Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> wrote:
> 
> > Liu Ping Fan <kernelfans@gmail.com> writes:
> > 
> >> To make sure that on host, the pages marked with _PAGE_NUMA result in a fault
> >> when guest access them, we should force the checking when guest uses hypercall
> >> to setup hpte.
> >> 
> >> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> > 
> > Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> > 
> > When we mark pte with _PAGE_NUMA we already call mmu_notifier_invalidate_range_start and
> > mmu_notifier_invalidate_range_end, which will mark existing guest hpte
> > entry as HPTE_V_ABSENT. Now we need to do that when we are inserting new
> > guest hpte entries. This patch does that. 
> 
> So what happens next? We insert a page into the HTAB without HPTE_V_VALID set, so the guest will fail to use it. If the guest does an H_READ on it it will suddenly turn to V_VALID though?
> 
> I might need a crash course in the use of HPTE_V_ABSENT.

HPTE_V_ABSENT means present from the point of view of the guest but
not present from the host's point of view, so yes H_READ turns
HPTE_V_ABSENT into HPTE_V_VALID.

Paul.
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf - Jan. 27, 2014, 10:42 a.m.
On 27.01.2014, at 11:28, Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> wrote:

> Alexander Graf <agraf@suse.de> writes:
> 
>> On 21.01.2014, at 10:42, Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> wrote:
>> 
>>> Liu Ping Fan <kernelfans@gmail.com> writes:
>>> 
>>>> To make sure that on host, the pages marked with _PAGE_NUMA result in a fault
>>>> when guest access them, we should force the checking when guest uses hypercall
>>>> to setup hpte.
>>>> 
>>>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>> 
>>> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>>> 
>>> When we mark pte with _PAGE_NUMA we already call mmu_notifier_invalidate_range_start and
>>> mmu_notifier_invalidate_range_end, which will mark existing guest hpte
>>> entry as HPTE_V_ABSENT. Now we need to do that when we are inserting new
>>> guest hpte entries. This patch does that. 
>> 
>> So what happens next? We insert a page into the HTAB without
>> HPTE_V_VALID set, so the guest will fail to use it. If the guest does
>> an H_READ on it it will suddenly turn to V_VALID though?
> 
> As per the guest the entry is valid, so yes an hread should return a
> valid entry. But in real hpte we would mark it not valid.

Ah, yes.

> 
>> 
>> I might need a crash course in the use of HPTE_V_ABSENT.
> 
> When guest tries to access the address, the host will handle the fault.
> 
> kvmppc_hpte_hv_fault should give more info

Thanks for the pointer. So we fault it in lazily. Is there any particular reason we can't do that on h_enter already? After all this just means an additional roundtrip because the guest is pretty likely to use the page it just entered, no?


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Aneesh Kumar K.V - Jan. 27, 2014, 3:57 p.m.
Alexander Graf <agraf@suse.de> writes:

> On 27.01.2014, at 11:28, Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> wrote:
>
>> Alexander Graf <agraf@suse.de> writes:
>> 
>>> On 21.01.2014, at 10:42, Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> wrote:
>>> 
>>>> Liu Ping Fan <kernelfans@gmail.com> writes:
>>>> 
>>>>> To make sure that on host, the pages marked with _PAGE_NUMA result in a fault
>>>>> when guest access them, we should force the checking when guest uses hypercall
>>>>> to setup hpte.
>>>>> 
>>>>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>>> 
>>>> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>>>> 
>>>> When we mark pte with _PAGE_NUMA we already call mmu_notifier_invalidate_range_start and
>>>> mmu_notifier_invalidate_range_end, which will mark existing guest hpte
>>>> entry as HPTE_V_ABSENT. Now we need to do that when we are inserting new
>>>> guest hpte entries. This patch does that. 
>>> 
>>> So what happens next? We insert a page into the HTAB without
>>> HPTE_V_VALID set, so the guest will fail to use it. If the guest does
>>> an H_READ on it it will suddenly turn to V_VALID though?
>> 
>> As per the guest the entry is valid, so yes an hread should return a
>> valid entry. But in real hpte we would mark it not valid.
>
> Ah, yes.
>
>> 
>>> 
>>> I might need a crash course in the use of HPTE_V_ABSENT.
>> 
>> When guest tries to access the address, the host will handle the fault.
>> 
>> kvmppc_hpte_hv_fault should give more info
>
> Thanks for the pointer. So we fault it in lazily. Is there any
> particular reason we can't do that on h_enter already? After all this
> just means an additional roundtrip because the guest is pretty likely
> to use the page it just entered, no?

We could get wrong numa fault information if we didn't do h_enter from
the right node from which we faulted.

-aneesh

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Liu Ping Fan - April 3, 2014, 2:36 a.m.
Hi Alex, could you help to pick up this patch? since  v3.14 kernel can
enable numa fault for powerpc.

Thx,
Fan

On Mon, Jan 27, 2014 at 5:11 PM, Alexander Graf <agraf@suse.de> wrote:
>
> On 21.01.2014, at 10:42, Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> wrote:
>
>> Liu Ping Fan <kernelfans@gmail.com> writes:
>>
>>> To make sure that on host, the pages marked with _PAGE_NUMA result in a fault
>>> when guest access them, we should force the checking when guest uses hypercall
>>> to setup hpte.
>>>
>>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>
>> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>>
>> When we mark pte with _PAGE_NUMA we already call mmu_notifier_invalidate_range_start and
>> mmu_notifier_invalidate_range_end, which will mark existing guest hpte
>> entry as HPTE_V_ABSENT. Now we need to do that when we are inserting new
>> guest hpte entries. This patch does that.
>
> So what happens next? We insert a page into the HTAB without HPTE_V_VALID set, so the guest will fail to use it. If the guest does an H_READ on it it will suddenly turn to V_VALID though?
>
> I might need a crash course in the use of HPTE_V_ABSENT.
>
>
> Alex
>
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf - April 3, 2014, 11:36 a.m.
On 03.04.14 04:36, Liu ping fan wrote:
> Hi Alex, could you help to pick up this patch? since  v3.14 kernel can
> enable numa fault for powerpc.

What bad happens without this patch? We map a page even though it was 
declared to get NUMA migrated? What happens next?

I'm trying to figure out whether I need to mark this with a stable tag 
for 3.14.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf - April 3, 2014, 11:38 a.m.
On 03.04.14 13:36, Alexander Graf wrote:
>
> On 03.04.14 04:36, Liu ping fan wrote:
>> Hi Alex, could you help to pick up this patch? since  v3.14 kernel can
>> enable numa fault for powerpc.
>
> What bad happens without this patch? We map a page even though it was 
> declared to get NUMA migrated? What happens next?
>
> I'm trying to figure out whether I need to mark this with a stable tag 
> for 3.14.

Also, what about the other uses of pre_present()?


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf - April 3, 2014, 11:43 a.m.
On 03.04.14 13:38, Alexander Graf wrote:
>
> On 03.04.14 13:36, Alexander Graf wrote:
>>
>> On 03.04.14 04:36, Liu ping fan wrote:
>>> Hi Alex, could you help to pick up this patch? since  v3.14 kernel can
>>> enable numa fault for powerpc.
>>
>> What bad happens without this patch? We map a page even though it was 
>> declared to get NUMA migrated? What happens next?
>>
>> I'm trying to figure out whether I need to mark this with a stable 
>> tag for 3.14.
>
> Also, what about the other uses of pre_present()?

Oh how I love to reply to myself. The patch doesn't apply to the latest 
code anymore either. Also please rework the wording of your patch 
subject and patch description to explain the actual problem at hand.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Aneesh Kumar K.V - April 7, 2014, 7:42 a.m.
Alexander Graf <agraf@suse.com> writes:

> On 03.04.14 04:36, Liu ping fan wrote:
>> Hi Alex, could you help to pick up this patch? since  v3.14 kernel can
>> enable numa fault for powerpc.
>
> What bad happens without this patch? We map a page even though it was 
> declared to get NUMA migrated? What happens next?

Nothing much, we won't be properly accounting the numa access in the
host.  What we want to achieve is to convert a guest access of the page to
a host fault so that we can do proper numa access accounting in the
host. This would enable us to migrate the page to the correct numa
node. 

>
> I'm trying to figure out whether I need to mark this with a stable tag 
> for 3.14.
>
>

-aneesh

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf - April 7, 2014, 8:36 a.m.
On 07.04.14 09:42, Aneesh Kumar K.V wrote:
> Alexander Graf <agraf@suse.com> writes:
>
>> On 03.04.14 04:36, Liu ping fan wrote:
>>> Hi Alex, could you help to pick up this patch? since  v3.14 kernel can
>>> enable numa fault for powerpc.
>> What bad happens without this patch? We map a page even though it was
>> declared to get NUMA migrated? What happens next?
> Nothing much, we won't be properly accounting the numa access in the
> host.  What we want to achieve is to convert a guest access of the page to
> a host fault so that we can do proper numa access accounting in the
> host. This would enable us to migrate the page to the correct numa
> node.

Ok, so no breakages, just less performance. I wouldn't consider it 
stable material then :).


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Liu Ping Fan - April 10, 2014, 3:28 a.m.
On Mon, Apr 7, 2014 at 4:36 PM, Alexander Graf <agraf@suse.de> wrote:
>
> On 07.04.14 09:42, Aneesh Kumar K.V wrote:
>>
>> Alexander Graf <agraf@suse.com> writes:
>>
>>> On 03.04.14 04:36, Liu ping fan wrote:
>>>>
>>>> Hi Alex, could you help to pick up this patch? since  v3.14 kernel can
>>>> enable numa fault for powerpc.
>>>
>>> What bad happens without this patch? We map a page even though it was
>>> declared to get NUMA migrated? What happens next?
>>
>> Nothing much, we won't be properly accounting the numa access in the
>> host.  What we want to achieve is to convert a guest access of the page to
>> a host fault so that we can do proper numa access accounting in the
>> host. This would enable us to migrate the page to the correct numa
>> node.
>
>
> Ok, so no breakages, just less performance. I wouldn't consider it stable
> material then :).
>
Sorry to reply late, since I am half out of office during this period.
I am puzzling about you reply.    Without this patch, the guest can
NOT sense the numa changes and ask host to put the pages in right
place.  So the pages which is used by guest will be always misplaced.
The numa-fault method is inspired by real requirement to improve
performance, so we should also consider the performance drop of guest.
Right?

Regards,
Fan

>
> Alex
>
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf - April 10, 2014, 10:02 a.m.
On 10.04.14 05:28, Liu ping fan wrote:
> On Mon, Apr 7, 2014 at 4:36 PM, Alexander Graf <agraf@suse.de> wrote:
>> On 07.04.14 09:42, Aneesh Kumar K.V wrote:
>>> Alexander Graf <agraf@suse.com> writes:
>>>
>>>> On 03.04.14 04:36, Liu ping fan wrote:
>>>>> Hi Alex, could you help to pick up this patch? since  v3.14 kernel can
>>>>> enable numa fault for powerpc.
>>>> What bad happens without this patch? We map a page even though it was
>>>> declared to get NUMA migrated? What happens next?
>>> Nothing much, we won't be properly accounting the numa access in the
>>> host.  What we want to achieve is to convert a guest access of the page to
>>> a host fault so that we can do proper numa access accounting in the
>>> host. This would enable us to migrate the page to the correct numa
>>> node.
>>
>> Ok, so no breakages, just less performance. I wouldn't consider it stable
>> material then :).
>>
> Sorry to reply late, since I am half out of office during this period.
> I am puzzling about you reply.    Without this patch, the guest can
> NOT sense the numa changes and ask host to put the pages in right
> place.  So the pages which is used by guest will be always misplaced.
> The numa-fault method is inspired by real requirement to improve
> performance, so we should also consider the performance drop of guest.
> Right?

The patch will get into Linux, I just consider a non-working new feature 
not a regression that warrants us to CC stable@vger :). After all 
performance shouldn't be worse than without the numa migration feature, 
correct?


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 9c51544..af8602d 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -232,7 +232,7 @@  long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 		/* Look up the Linux PTE for the backing page */
 		pte_size = psize;
 		pte = lookup_linux_pte(pgdir, hva, writing, &pte_size);
-		if (pte_present(pte)) {
+		if (pte_present(pte) && !pte_numa(pte)) {
 			if (writing && !pte_write(pte))
 				/* make the actual HPTE be read-only */
 				ptel = hpte_make_readonly(ptel);