diff mbox

[v3] powerpc: kvm: make _PAGE_NUMA take effect

Message ID 1397216707-12795-1-git-send-email-pingfank@linux.vnet.ibm.com
State New, archived
Headers show

Commit Message

pingfank@linux.vnet.ibm.com April 11, 2014, 11:45 a.m. UTC
When we mark pte with _PAGE_NUMA we already call mmu_notifier_invalidate_range_start
and mmu_notifier_invalidate_range_end, which will mark existing guest hpte
entry as HPTE_V_ABSENT. Now we need to do that when we are inserting new
guest hpte entries.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
v3:
  rebased onto Alex's tree, branch kvm-ppc-next
  substitue commit log with the more clear description in Aneesh's reply (thanks, Aneesh)
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Alexander Graf April 11, 2014, 2:03 p.m. UTC | #1
On 11.04.2014, at 13:45, Liu Ping Fan <pingfank@linux.vnet.ibm.com> wrote:

> When we mark pte with _PAGE_NUMA we already call mmu_notifier_invalidate_range_start
> and mmu_notifier_invalidate_range_end, which will mark existing guest hpte
> entry as HPTE_V_ABSENT. Now we need to do that when we are inserting new
> guest hpte entries.

What happens when we don't? Why do we need the check? Why isn't it done implicitly? What happens when we treat a NUMA marked page as non-present? Why does it work out for us?

Assume you have no idea what PAGE_NUMA is, but try to figure out what this patch does and whether you need to cherry-pick it into your downstream kernel. The description as is still is not very helpful for that. It doesn't even explain what really changes with this patch applied.

> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
> v3:
>  rebased onto Alex's tree, branch kvm-ppc-next
>  substitue commit log with the more clear description in Aneesh's reply (thanks, Aneesh)
> ---
> arch/powerpc/kvm/book3s_hv_rm_mmu.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> index 1d6c56a..1117525 100644
> --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> @@ -234,7 +234,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
> 		pte_size = psize;
> 		pte = lookup_linux_pte_and_update(pgdir, hva, writing,
> 						  &pte_size);
> -		if (pte_present(pte)) {
> +		if (pte_present(pte)&&!pte_numa(pte)) {

Spaces missing


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Pingfan Liu April 13, 2014, 2:27 a.m. UTC | #2
On Fri, Apr 11, 2014 at 10:03 PM, Alexander Graf <agraf@suse.de> wrote:
>
> On 11.04.2014, at 13:45, Liu Ping Fan <pingfank@linux.vnet.ibm.com> wrote:
>
>> When we mark pte with _PAGE_NUMA we already call mmu_notifier_invalidate_range_start
>> and mmu_notifier_invalidate_range_end, which will mark existing guest hpte
>> entry as HPTE_V_ABSENT. Now we need to do that when we are inserting new
>> guest hpte entries.
>
> What happens when we don't? Why do we need the check? Why isn't it done implicitly? What happens when we treat a NUMA marked page as non-present? Why does it work out for us?
>
> Assume you have no idea what PAGE_NUMA is, but try to figure out what this patch does and whether you need to cherry-pick it into your downstream kernel. The description as is still is not very helpful for that. It doesn't even explain what really changes with this patch applied.
>
Yeah.  what about appending the following description?  Can it make
the context clear?
"Guest should not setup a hpte for the page whose pte is marked with
_PAGE_NUMA, so on the host, the numa-fault mechanism can take effect
to check whether the page is placed correctly or not."

>>
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>> ---
>> v3:
>>  rebased onto Alex's tree, branch kvm-ppc-next
>>  substitue commit log with the more clear description in Aneesh's reply (thanks, Aneesh)
>> ---
>> arch/powerpc/kvm/book3s_hv_rm_mmu.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
>> index 1d6c56a..1117525 100644
>> --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
>> +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
>> @@ -234,7 +234,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
>>               pte_size = psize;
>>               pte = lookup_linux_pte_and_update(pgdir, hva, writing,
>>                                                 &pte_size);
>> -             if (pte_present(pte)) {
>> +             if (pte_present(pte)&&!pte_numa(pte)) {
>
> Spaces missing
>
Will fix,

Thx,
Fan
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf April 14, 2014, 6:43 a.m. UTC | #3
On 13.04.14 04:27, Liu ping fan wrote:
> On Fri, Apr 11, 2014 at 10:03 PM, Alexander Graf <agraf@suse.de> wrote:
>> On 11.04.2014, at 13:45, Liu Ping Fan <pingfank@linux.vnet.ibm.com> wrote:
>>
>>> When we mark pte with _PAGE_NUMA we already call mmu_notifier_invalidate_range_start
>>> and mmu_notifier_invalidate_range_end, which will mark existing guest hpte
>>> entry as HPTE_V_ABSENT. Now we need to do that when we are inserting new
>>> guest hpte entries.
>> What happens when we don't? Why do we need the check? Why isn't it done implicitly? What happens when we treat a NUMA marked page as non-present? Why does it work out for us?
>>
>> Assume you have no idea what PAGE_NUMA is, but try to figure out what this patch does and whether you need to cherry-pick it into your downstream kernel. The description as is still is not very helpful for that. It doesn't even explain what really changes with this patch applied.
>>
> Yeah.  what about appending the following description?  Can it make
> the context clear?
> "Guest should not setup a hpte for the page whose pte is marked with
> _PAGE_NUMA, so on the host, the numa-fault mechanism can take effect
> to check whether the page is placed correctly or not."

Try to come up with a text that answers the following questions in order:

   - What does _PAGE_NUMA mean?
   - How does page migration with _PAGE_NUMA work?
   -> Why should we not map pages when _PAGE_NUMA is set?
   - Which part of what needs to be done did the previous _PAGE_NUMA 
patch address?
   - What's the situation without this patch?
   - Which scenario does this patch fix?

Once you have a text that answers those, you should have a good patch 
description :).

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
pingfan liu April 14, 2014, 8:08 a.m. UTC | #4
On Mon, Apr 14, 2014 at 2:43 PM, Alexander Graf <agraf@suse.de> wrote:
>
> On 13.04.14 04:27, Liu ping fan wrote:
>>
>> On Fri, Apr 11, 2014 at 10:03 PM, Alexander Graf <agraf@suse.de> wrote:
>>>
>>> On 11.04.2014, at 13:45, Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>> wrote:
>>>
>>>> When we mark pte with _PAGE_NUMA we already call
>>>> mmu_notifier_invalidate_range_start
>>>> and mmu_notifier_invalidate_range_end, which will mark existing guest
>>>> hpte
>>>> entry as HPTE_V_ABSENT. Now we need to do that when we are inserting new
>>>> guest hpte entries.
>>>
>>> What happens when we don't? Why do we need the check? Why isn't it done
>>> implicitly? What happens when we treat a NUMA marked page as non-present?
>>> Why does it work out for us?
>>>
>>> Assume you have no idea what PAGE_NUMA is, but try to figure out what
>>> this patch does and whether you need to cherry-pick it into your downstream
>>> kernel. The description as is still is not very helpful for that. It doesn't
>>> even explain what really changes with this patch applied.
>>>
>> Yeah.  what about appending the following description?  Can it make
>> the context clear?
>> "Guest should not setup a hpte for the page whose pte is marked with
>> _PAGE_NUMA, so on the host, the numa-fault mechanism can take effect
>> to check whether the page is placed correctly or not."
>
>
> Try to come up with a text that answers the following questions in order:
>
I divide them into 3 groups, and answer them by 3 sections. Seems that
it has the total story :)
Please take a look.

>   - What does _PAGE_NUMA mean?
Group 1 -> section 2

>   - How does page migration with _PAGE_NUMA work?
>   -> Why should we not map pages when _PAGE_NUMA is set?
Group 2 -> section 1
(Note: for the 1st question in this group, I am not sure about the
details, except that we can fix numa balancing by moving task or
moving page.  So I comment as " migration should be involved to cut
down the distance between the cpu and pages")

>   - Which part of what needs to be done did the previous _PAGE_NUMA patch
> address?
>   - What's the situation without this patch?
>   - Which scenario does this patch fix?
>
Group 3 -> section 3


Numa fault is a method which help to achieve auto numa balancing.
When such a page fault takes place, the page fault handler will check
whether the page is placed correctly. If not, migration should be
involved to cut down the distance between the cpu and pages.

A pte with _PAGE_NUMA help to implement numa fault. It means not to
allow the MMU to access the page directly. So a page fault is triggered
and numa fault handler gets the opportunity to run checker.

As for the access of MMU, we need special handling for the powernv's guest.
When we mark a pte with _PAGE_NUMA, we already call mmu_notifier to
invalidate it in guest's htab, but when we tried to re-insert them,
we firstly try to fix it in real-mode. Only after this fails, we fallback
to virt mode, and most of important, we run numa fault handler in virt
mode.  This patch guards the way of real-mode to ensure that if a pte is
marked with _PAGE_NUMA, it will NOT be fixed in real mode, instead, it will
be fixed in virt mode and have the opportunity to be checked with placement.


Thx,
Fan


> Once you have a text that answers those, you should have a good patch
> description :).
>
> Alex
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf April 14, 2014, 9:01 a.m. UTC | #5
On 14.04.14 10:08, liu ping fan wrote:
> On Mon, Apr 14, 2014 at 2:43 PM, Alexander Graf <agraf@suse.de> wrote:
>> On 13.04.14 04:27, Liu ping fan wrote:
>>> On Fri, Apr 11, 2014 at 10:03 PM, Alexander Graf <agraf@suse.de> wrote:
>>>> On 11.04.2014, at 13:45, Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>>> wrote:
>>>>
>>>>> When we mark pte with _PAGE_NUMA we already call
>>>>> mmu_notifier_invalidate_range_start
>>>>> and mmu_notifier_invalidate_range_end, which will mark existing guest
>>>>> hpte
>>>>> entry as HPTE_V_ABSENT. Now we need to do that when we are inserting new
>>>>> guest hpte entries.
>>>> What happens when we don't? Why do we need the check? Why isn't it done
>>>> implicitly? What happens when we treat a NUMA marked page as non-present?
>>>> Why does it work out for us?
>>>>
>>>> Assume you have no idea what PAGE_NUMA is, but try to figure out what
>>>> this patch does and whether you need to cherry-pick it into your downstream
>>>> kernel. The description as is still is not very helpful for that. It doesn't
>>>> even explain what really changes with this patch applied.
>>>>
>>> Yeah.  what about appending the following description?  Can it make
>>> the context clear?
>>> "Guest should not setup a hpte for the page whose pte is marked with
>>> _PAGE_NUMA, so on the host, the numa-fault mechanism can take effect
>>> to check whether the page is placed correctly or not."
>>
>> Try to come up with a text that answers the following questions in order:
>>
> I divide them into 3 groups, and answer them by 3 sections. Seems that
> it has the total story :)
> Please take a look.
>
>>    - What does _PAGE_NUMA mean?
> Group 1 -> section 2
>
>>    - How does page migration with _PAGE_NUMA work?
>>    -> Why should we not map pages when _PAGE_NUMA is set?
> Group 2 -> section 1
> (Note: for the 1st question in this group, I am not sure about the
> details, except that we can fix numa balancing by moving task or
> moving page.  So I comment as " migration should be involved to cut
> down the distance between the cpu and pages")
>
>>    - Which part of what needs to be done did the previous _PAGE_NUMA patch
>> address?
>>    - What's the situation without this patch?
>>    - Which scenario does this patch fix?
>>
> Group 3 -> section 3
>
>
> Numa fault is a method which help to achieve auto numa balancing.
> When such a page fault takes place, the page fault handler will check
> whether the page is placed correctly. If not, migration should be
> involved to cut down the distance between the cpu and pages.
>
> A pte with _PAGE_NUMA help to implement numa fault. It means not to
> allow the MMU to access the page directly. So a page fault is triggered
> and numa fault handler gets the opportunity to run checker.
>
> As for the access of MMU, we need special handling for the powernv's guest.
> When we mark a pte with _PAGE_NUMA, we already call mmu_notifier to
> invalidate it in guest's htab, but when we tried to re-insert them,
> we firstly try to fix it in real-mode. Only after this fails, we fallback
> to virt mode, and most of important, we run numa fault handler in virt
> mode.  This patch guards the way of real-mode to ensure that if a pte is
> marked with _PAGE_NUMA, it will NOT be fixed in real mode, instead, it will
> be fixed in virt mode and have the opportunity to be checked with placement.

s/fixed/mapped/g

Otherwise works as patch description for me :).


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 1d6c56a..1117525 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -234,7 +234,7 @@  long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 		pte_size = psize;
 		pte = lookup_linux_pte_and_update(pgdir, hva, writing,
 						  &pte_size);
-		if (pte_present(pte)) {
+		if (pte_present(pte)&&!pte_numa(pte)) {
 			if (writing && !pte_write(pte))
 				/* make the actual HPTE be read-only */
 				ptel = hpte_make_readonly(ptel);