Patchwork [19/38] KVM: PPC: Add cache flush on page map

login
register
mail settings
Submitter Alexander Graf
Date Aug. 14, 2012, 11:04 p.m.
Message ID <1344985483-7440-20-git-send-email-agraf@suse.de>
Download mbox | patch
Permalink /patch/177496/
State New
Headers show

Comments

Alexander Graf - Aug. 14, 2012, 11:04 p.m.
When we map a page that wasn't icache cleared before, do so when first
mapping it in KVM using the same information bits as the Linux mapping
logic. That way we are 100% sure that any page we map does not have stale
entries in the icache.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/include/asm/kvm_host.h   |    1 +
 arch/powerpc/include/asm/kvm_ppc.h    |   12 ++++++++++++
 arch/powerpc/kvm/book3s_32_mmu_host.c |    3 +++
 arch/powerpc/kvm/book3s_64_mmu_host.c |    2 ++
 arch/powerpc/kvm/e500_tlb.c           |    3 +++
 arch/powerpc/mm/mem.c                 |    1 +
 6 files changed, 22 insertions(+), 0 deletions(-)
Scott Wood - Aug. 15, 2012, 1:23 a.m.
On 08/14/2012 06:04 PM, Alexander Graf wrote:
> When we map a page that wasn't icache cleared before, do so when first
> mapping it in KVM using the same information bits as the Linux mapping
> logic. That way we are 100% sure that any page we map does not have stale
> entries in the icache.

We're not really 100% sure of that -- this only handles the case where
the kernel does the dirtying, not when it's done by QEMU or the guest.

-Scott


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf - Aug. 15, 2012, 9:52 a.m.
On 15.08.2012, at 03:23, Scott Wood wrote:

> On 08/14/2012 06:04 PM, Alexander Graf wrote:
>> When we map a page that wasn't icache cleared before, do so when first
>> mapping it in KVM using the same information bits as the Linux mapping
>> logic. That way we are 100% sure that any page we map does not have stale
>> entries in the icache.
> 
> We're not really 100% sure of that -- this only handles the case where
> the kernel does the dirtying, not when it's done by QEMU or the guest.

When the guest does it, the guest is responsible for clearing the icache. Same for QEMU. It needs to clear it when doing DMA.

However, what is still broken would be a direct /dev/mem map. There QEMU should probably clear the icache before starting the guest, in case another guest was running on that same memory before. Fortunately, we don't have that mode available in upstream QEMU :).


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Scott Wood - Aug. 15, 2012, 5:26 p.m.
On 08/15/2012 04:52 AM, Alexander Graf wrote:
> 
> On 15.08.2012, at 03:23, Scott Wood wrote:
> 
>> On 08/14/2012 06:04 PM, Alexander Graf wrote:
>>> When we map a page that wasn't icache cleared before, do so when first
>>> mapping it in KVM using the same information bits as the Linux mapping
>>> logic. That way we are 100% sure that any page we map does not have stale
>>> entries in the icache.
>>
>> We're not really 100% sure of that -- this only handles the case where
>> the kernel does the dirtying, not when it's done by QEMU or the guest.
> 
> When the guest does it, the guest is responsible for clearing the
> icache. Same for QEMU. It needs to clear it when doing DMA.

Sure.  I was just worried that that commit message could be taken the
wrong way, as in "we no longer need the QEMU icache flushing patch".

> However, what is still broken would be a direct /dev/mem map. There
> QEMU should probably clear the icache before starting the guest, in
> case another guest was running on that same memory before.
> Fortunately, we don't have that mode available in upstream QEMU :).

How is QEMU loading images different if it's /dev/mem versus ordinary
anonymous memory?  You probably won't have stale icache data in the
latter case (which makes it less likely to be a problem in pratice), but
in theory you could have data that still hasn't left the dcache.

-Scott


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf - Aug. 15, 2012, 5:27 p.m.
On 15.08.2012, at 19:26, Scott Wood wrote:

> On 08/15/2012 04:52 AM, Alexander Graf wrote:
>> 
>> On 15.08.2012, at 03:23, Scott Wood wrote:
>> 
>>> On 08/14/2012 06:04 PM, Alexander Graf wrote:
>>>> When we map a page that wasn't icache cleared before, do so when first
>>>> mapping it in KVM using the same information bits as the Linux mapping
>>>> logic. That way we are 100% sure that any page we map does not have stale
>>>> entries in the icache.
>>> 
>>> We're not really 100% sure of that -- this only handles the case where
>>> the kernel does the dirtying, not when it's done by QEMU or the guest.
>> 
>> When the guest does it, the guest is responsible for clearing the
>> icache. Same for QEMU. It needs to clear it when doing DMA.
> 
> Sure.  I was just worried that that commit message could be taken the
> wrong way, as in "we no longer need the QEMU icache flushing patch".
> 
>> However, what is still broken would be a direct /dev/mem map. There
>> QEMU should probably clear the icache before starting the guest, in
>> case another guest was running on that same memory before.
>> Fortunately, we don't have that mode available in upstream QEMU :).
> 
> How is QEMU loading images different if it's /dev/mem versus ordinary
> anonymous memory?  You probably won't have stale icache data in the
> latter case (which makes it less likely to be a problem in pratice), but
> in theory you could have data that still hasn't left the dcache.

It's the same. I just talked to Ben about this today in a different context and we should be safe :).


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Scott Wood - Aug. 15, 2012, 5:47 p.m.
On 08/15/2012 12:27 PM, Alexander Graf wrote:
> 
> On 15.08.2012, at 19:26, Scott Wood wrote:
> 
>> On 08/15/2012 04:52 AM, Alexander Graf wrote:
>>>
>>> On 15.08.2012, at 03:23, Scott Wood wrote:
>>>
>>>> On 08/14/2012 06:04 PM, Alexander Graf wrote:
>>>>> When we map a page that wasn't icache cleared before, do so when first
>>>>> mapping it in KVM using the same information bits as the Linux mapping
>>>>> logic. That way we are 100% sure that any page we map does not have stale
>>>>> entries in the icache.
>>>>
>>>> We're not really 100% sure of that -- this only handles the case where
>>>> the kernel does the dirtying, not when it's done by QEMU or the guest.
>>>
>>> When the guest does it, the guest is responsible for clearing the
>>> icache. Same for QEMU. It needs to clear it when doing DMA.
>>
>> Sure.  I was just worried that that commit message could be taken the
>> wrong way, as in "we no longer need the QEMU icache flushing patch".
>>
>>> However, what is still broken would be a direct /dev/mem map. There
>>> QEMU should probably clear the icache before starting the guest, in
>>> case another guest was running on that same memory before.
>>> Fortunately, we don't have that mode available in upstream QEMU :).
>>
>> How is QEMU loading images different if it's /dev/mem versus ordinary
>> anonymous memory?  You probably won't have stale icache data in the
>> latter case (which makes it less likely to be a problem in pratice), but
>> in theory you could have data that still hasn't left the dcache.
> 
> It's the same. I just talked to Ben about this today in a different context and we should be safe :).

Safe how?

If it's truly the same, we're definitely not safe, since I had problems
with this using /dev/mem (particularly when changing the kernel image
without a host reboot) before I put in the icache flush patch.

-Scott


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf - Aug. 15, 2012, 6:01 p.m.
On 15.08.2012, at 19:47, Scott Wood wrote:

> On 08/15/2012 12:27 PM, Alexander Graf wrote:
>> 
>> On 15.08.2012, at 19:26, Scott Wood wrote:
>> 
>>> On 08/15/2012 04:52 AM, Alexander Graf wrote:
>>>> 
>>>> On 15.08.2012, at 03:23, Scott Wood wrote:
>>>> 
>>>>> On 08/14/2012 06:04 PM, Alexander Graf wrote:
>>>>>> When we map a page that wasn't icache cleared before, do so when first
>>>>>> mapping it in KVM using the same information bits as the Linux mapping
>>>>>> logic. That way we are 100% sure that any page we map does not have stale
>>>>>> entries in the icache.
>>>>> 
>>>>> We're not really 100% sure of that -- this only handles the case where
>>>>> the kernel does the dirtying, not when it's done by QEMU or the guest.
>>>> 
>>>> When the guest does it, the guest is responsible for clearing the
>>>> icache. Same for QEMU. It needs to clear it when doing DMA.
>>> 
>>> Sure.  I was just worried that that commit message could be taken the
>>> wrong way, as in "we no longer need the QEMU icache flushing patch".
>>> 
>>>> However, what is still broken would be a direct /dev/mem map. There
>>>> QEMU should probably clear the icache before starting the guest, in
>>>> case another guest was running on that same memory before.
>>>> Fortunately, we don't have that mode available in upstream QEMU :).
>>> 
>>> How is QEMU loading images different if it's /dev/mem versus ordinary
>>> anonymous memory?  You probably won't have stale icache data in the
>>> latter case (which makes it less likely to be a problem in pratice), but
>>> in theory you could have data that still hasn't left the dcache.
>> 
>> It's the same. I just talked to Ben about this today in a different context and we should be safe :).
> 
> Safe how?
> 
> If it's truly the same, we're definitely not safe, since I had problems
> with this using /dev/mem (particularly when changing the kernel image
> without a host reboot) before I put in the icache flush patch.

QEMU needs to icache flush everything it puts into guest memory. Whatever blob comes next (SLOF for -M pseries, kernel for -M e500) assumes dirty icache for every page it maps itself. So SLOF will icache flush the kernel text segment, Linux will icache flush user space pages.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Scott Wood - Aug. 15, 2012, 6:16 p.m.
On 08/15/2012 01:01 PM, Alexander Graf wrote:
> 
> On 15.08.2012, at 19:47, Scott Wood wrote:
> 
>> On 08/15/2012 12:27 PM, Alexander Graf wrote:
>>>
>>> On 15.08.2012, at 19:26, Scott Wood wrote:
>>>
>>>> On 08/15/2012 04:52 AM, Alexander Graf wrote:
>>>>>
>>>>> On 15.08.2012, at 03:23, Scott Wood wrote:
>>>>>
>>>>>> On 08/14/2012 06:04 PM, Alexander Graf wrote:
>>>>>>> When we map a page that wasn't icache cleared before, do so when first
>>>>>>> mapping it in KVM using the same information bits as the Linux mapping
>>>>>>> logic. That way we are 100% sure that any page we map does not have stale
>>>>>>> entries in the icache.
>>>>>>
>>>>>> We're not really 100% sure of that -- this only handles the case where
>>>>>> the kernel does the dirtying, not when it's done by QEMU or the guest.
>>>>>
>>>>> When the guest does it, the guest is responsible for clearing the
>>>>> icache. Same for QEMU. It needs to clear it when doing DMA.
>>>>
>>>> Sure.  I was just worried that that commit message could be taken the
>>>> wrong way, as in "we no longer need the QEMU icache flushing patch".
>>>>
>>>>> However, what is still broken would be a direct /dev/mem map. There
>>>>> QEMU should probably clear the icache before starting the guest, in
>>>>> case another guest was running on that same memory before.
>>>>> Fortunately, we don't have that mode available in upstream QEMU :).
>>>>
>>>> How is QEMU loading images different if it's /dev/mem versus ordinary
>>>> anonymous memory?  You probably won't have stale icache data in the
>>>> latter case (which makes it less likely to be a problem in pratice), but
>>>> in theory you could have data that still hasn't left the dcache.
>>>
>>> It's the same. I just talked to Ben about this today in a different context and we should be safe :).
>>
>> Safe how?
>>
>> If it's truly the same, we're definitely not safe, since I had problems
>> with this using /dev/mem (particularly when changing the kernel image
>> without a host reboot) before I put in the icache flush patch.
> 
> QEMU needs to icache flush everything it puts into guest memory.

Yes.  I thought you meant we should be safe as things are now.

-Scott


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf - Aug. 15, 2012, 6:27 p.m.
On 15.08.2012, at 20:16, Scott Wood wrote:

> On 08/15/2012 01:01 PM, Alexander Graf wrote:
>> 
>> On 15.08.2012, at 19:47, Scott Wood wrote:
>> 
>>> On 08/15/2012 12:27 PM, Alexander Graf wrote:
>>>> 
>>>> On 15.08.2012, at 19:26, Scott Wood wrote:
>>>> 
>>>>> On 08/15/2012 04:52 AM, Alexander Graf wrote:
>>>>>> 
>>>>>> On 15.08.2012, at 03:23, Scott Wood wrote:
>>>>>> 
>>>>>>> On 08/14/2012 06:04 PM, Alexander Graf wrote:
>>>>>>>> When we map a page that wasn't icache cleared before, do so when first
>>>>>>>> mapping it in KVM using the same information bits as the Linux mapping
>>>>>>>> logic. That way we are 100% sure that any page we map does not have stale
>>>>>>>> entries in the icache.
>>>>>>> 
>>>>>>> We're not really 100% sure of that -- this only handles the case where
>>>>>>> the kernel does the dirtying, not when it's done by QEMU or the guest.
>>>>>> 
>>>>>> When the guest does it, the guest is responsible for clearing the
>>>>>> icache. Same for QEMU. It needs to clear it when doing DMA.
>>>>> 
>>>>> Sure.  I was just worried that that commit message could be taken the
>>>>> wrong way, as in "we no longer need the QEMU icache flushing patch".
>>>>> 
>>>>>> However, what is still broken would be a direct /dev/mem map. There
>>>>>> QEMU should probably clear the icache before starting the guest, in
>>>>>> case another guest was running on that same memory before.
>>>>>> Fortunately, we don't have that mode available in upstream QEMU :).
>>>>> 
>>>>> How is QEMU loading images different if it's /dev/mem versus ordinary
>>>>> anonymous memory?  You probably won't have stale icache data in the
>>>>> latter case (which makes it less likely to be a problem in pratice), but
>>>>> in theory you could have data that still hasn't left the dcache.
>>>> 
>>>> It's the same. I just talked to Ben about this today in a different context and we should be safe :).
>>> 
>>> Safe how?
>>> 
>>> If it's truly the same, we're definitely not safe, since I had problems
>>> with this using /dev/mem (particularly when changing the kernel image
>>> without a host reboot) before I put in the icache flush patch.
>> 
>> QEMU needs to icache flush everything it puts into guest memory.
> 
> Yes.  I thought you meant we should be safe as things are now.

Hrm. What happened to your patch that flushes the icache on cpu_physical_memory_rw?


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf - Aug. 15, 2012, 6:29 p.m.
On 15.08.2012, at 20:27, Alexander Graf wrote:

> 
> On 15.08.2012, at 20:16, Scott Wood wrote:
> 
>> On 08/15/2012 01:01 PM, Alexander Graf wrote:
>>> 
>>> On 15.08.2012, at 19:47, Scott Wood wrote:
>>> 
>>>> On 08/15/2012 12:27 PM, Alexander Graf wrote:
>>>>> 
>>>>> On 15.08.2012, at 19:26, Scott Wood wrote:
>>>>> 
>>>>>> On 08/15/2012 04:52 AM, Alexander Graf wrote:
>>>>>>> 
>>>>>>> On 15.08.2012, at 03:23, Scott Wood wrote:
>>>>>>> 
>>>>>>>> On 08/14/2012 06:04 PM, Alexander Graf wrote:
>>>>>>>>> When we map a page that wasn't icache cleared before, do so when first
>>>>>>>>> mapping it in KVM using the same information bits as the Linux mapping
>>>>>>>>> logic. That way we are 100% sure that any page we map does not have stale
>>>>>>>>> entries in the icache.
>>>>>>>> 
>>>>>>>> We're not really 100% sure of that -- this only handles the case where
>>>>>>>> the kernel does the dirtying, not when it's done by QEMU or the guest.
>>>>>>> 
>>>>>>> When the guest does it, the guest is responsible for clearing the
>>>>>>> icache. Same for QEMU. It needs to clear it when doing DMA.
>>>>>> 
>>>>>> Sure.  I was just worried that that commit message could be taken the
>>>>>> wrong way, as in "we no longer need the QEMU icache flushing patch".
>>>>>> 
>>>>>>> However, what is still broken would be a direct /dev/mem map. There
>>>>>>> QEMU should probably clear the icache before starting the guest, in
>>>>>>> case another guest was running on that same memory before.
>>>>>>> Fortunately, we don't have that mode available in upstream QEMU :).
>>>>>> 
>>>>>> How is QEMU loading images different if it's /dev/mem versus ordinary
>>>>>> anonymous memory?  You probably won't have stale icache data in the
>>>>>> latter case (which makes it less likely to be a problem in pratice), but
>>>>>> in theory you could have data that still hasn't left the dcache.
>>>>> 
>>>>> It's the same. I just talked to Ben about this today in a different context and we should be safe :).
>>>> 
>>>> Safe how?
>>>> 
>>>> If it's truly the same, we're definitely not safe, since I had problems
>>>> with this using /dev/mem (particularly when changing the kernel image
>>>> without a host reboot) before I put in the icache flush patch.
>>> 
>>> QEMU needs to icache flush everything it puts into guest memory.
>> 
>> Yes.  I thought you meant we should be safe as things are now.
> 
> Hrm. What happened to your patch that flushes the icache on cpu_physical_memory_rw?

Ah, if I read Ben's comment correctly we only need it for rom loads, not always for cpu_physical_memory_rw.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Scott Wood - Aug. 15, 2012, 6:33 p.m.
On 08/15/2012 01:29 PM, Alexander Graf wrote:
> 
> On 15.08.2012, at 20:27, Alexander Graf wrote:
> 
>>
>> On 15.08.2012, at 20:16, Scott Wood wrote:
>>
>>> On 08/15/2012 01:01 PM, Alexander Graf wrote:
>>>>
>>>> On 15.08.2012, at 19:47, Scott Wood wrote:
>>>>
>>>>> On 08/15/2012 12:27 PM, Alexander Graf wrote:
>>>>>>
>>>>>> On 15.08.2012, at 19:26, Scott Wood wrote:
>>>>>>
>>>>>>> On 08/15/2012 04:52 AM, Alexander Graf wrote:
>>>>>>>>
>>>>>>>> On 15.08.2012, at 03:23, Scott Wood wrote:
>>>>>>>>
>>>>>>>>> On 08/14/2012 06:04 PM, Alexander Graf wrote:
>>>>>>>>>> When we map a page that wasn't icache cleared before, do so when first
>>>>>>>>>> mapping it in KVM using the same information bits as the Linux mapping
>>>>>>>>>> logic. That way we are 100% sure that any page we map does not have stale
>>>>>>>>>> entries in the icache.
>>>>>>>>>
>>>>>>>>> We're not really 100% sure of that -- this only handles the case where
>>>>>>>>> the kernel does the dirtying, not when it's done by QEMU or the guest.
>>>>>>>>
>>>>>>>> When the guest does it, the guest is responsible for clearing the
>>>>>>>> icache. Same for QEMU. It needs to clear it when doing DMA.
>>>>>>>
>>>>>>> Sure.  I was just worried that that commit message could be taken the
>>>>>>> wrong way, as in "we no longer need the QEMU icache flushing patch".
>>>>>>>
>>>>>>>> However, what is still broken would be a direct /dev/mem map. There
>>>>>>>> QEMU should probably clear the icache before starting the guest, in
>>>>>>>> case another guest was running on that same memory before.
>>>>>>>> Fortunately, we don't have that mode available in upstream QEMU :).
>>>>>>>
>>>>>>> How is QEMU loading images different if it's /dev/mem versus ordinary
>>>>>>> anonymous memory?  You probably won't have stale icache data in the
>>>>>>> latter case (which makes it less likely to be a problem in pratice), but
>>>>>>> in theory you could have data that still hasn't left the dcache.
>>>>>>
>>>>>> It's the same. I just talked to Ben about this today in a different context and we should be safe :).
>>>>>
>>>>> Safe how?
>>>>>
>>>>> If it's truly the same, we're definitely not safe, since I had problems
>>>>> with this using /dev/mem (particularly when changing the kernel image
>>>>> without a host reboot) before I put in the icache flush patch.
>>>>
>>>> QEMU needs to icache flush everything it puts into guest memory.
>>>
>>> Yes.  I thought you meant we should be safe as things are now.
>>
>> Hrm. What happened to your patch that flushes the icache on cpu_physical_memory_rw?

IIRC Ben wanted it conditionalized to not slow things down on
icache-coherent systems, and I never got around to respinning it.

> Ah, if I read Ben's comment correctly we only need it for rom loads, not always for cpu_physical_memory_rw.

Why?

-Scott


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf - Aug. 15, 2012, 6:51 p.m.
On 15.08.2012, at 20:33, Scott Wood wrote:

> On 08/15/2012 01:29 PM, Alexander Graf wrote:
>> 
>> On 15.08.2012, at 20:27, Alexander Graf wrote:
>> 
>>> 
>>> On 15.08.2012, at 20:16, Scott Wood wrote:
>>> 
>>>> On 08/15/2012 01:01 PM, Alexander Graf wrote:
>>>>> 
>>>>> On 15.08.2012, at 19:47, Scott Wood wrote:
>>>>> 
>>>>>> On 08/15/2012 12:27 PM, Alexander Graf wrote:
>>>>>>> 
>>>>>>> On 15.08.2012, at 19:26, Scott Wood wrote:
>>>>>>> 
>>>>>>>> On 08/15/2012 04:52 AM, Alexander Graf wrote:
>>>>>>>>> 
>>>>>>>>> On 15.08.2012, at 03:23, Scott Wood wrote:
>>>>>>>>> 
>>>>>>>>>> On 08/14/2012 06:04 PM, Alexander Graf wrote:
>>>>>>>>>>> When we map a page that wasn't icache cleared before, do so when first
>>>>>>>>>>> mapping it in KVM using the same information bits as the Linux mapping
>>>>>>>>>>> logic. That way we are 100% sure that any page we map does not have stale
>>>>>>>>>>> entries in the icache.
>>>>>>>>>> 
>>>>>>>>>> We're not really 100% sure of that -- this only handles the case where
>>>>>>>>>> the kernel does the dirtying, not when it's done by QEMU or the guest.
>>>>>>>>> 
>>>>>>>>> When the guest does it, the guest is responsible for clearing the
>>>>>>>>> icache. Same for QEMU. It needs to clear it when doing DMA.
>>>>>>>> 
>>>>>>>> Sure.  I was just worried that that commit message could be taken the
>>>>>>>> wrong way, as in "we no longer need the QEMU icache flushing patch".
>>>>>>>> 
>>>>>>>>> However, what is still broken would be a direct /dev/mem map. There
>>>>>>>>> QEMU should probably clear the icache before starting the guest, in
>>>>>>>>> case another guest was running on that same memory before.
>>>>>>>>> Fortunately, we don't have that mode available in upstream QEMU :).
>>>>>>>> 
>>>>>>>> How is QEMU loading images different if it's /dev/mem versus ordinary
>>>>>>>> anonymous memory?  You probably won't have stale icache data in the
>>>>>>>> latter case (which makes it less likely to be a problem in pratice), but
>>>>>>>> in theory you could have data that still hasn't left the dcache.
>>>>>>> 
>>>>>>> It's the same. I just talked to Ben about this today in a different context and we should be safe :).
>>>>>> 
>>>>>> Safe how?
>>>>>> 
>>>>>> If it's truly the same, we're definitely not safe, since I had problems
>>>>>> with this using /dev/mem (particularly when changing the kernel image
>>>>>> without a host reboot) before I put in the icache flush patch.
>>>>> 
>>>>> QEMU needs to icache flush everything it puts into guest memory.
>>>> 
>>>> Yes.  I thought you meant we should be safe as things are now.
>>> 
>>> Hrm. What happened to your patch that flushes the icache on cpu_physical_memory_rw?
> 
> IIRC Ben wanted it conditionalized to not slow things down on
> icache-coherent systems, and I never got around to respinning it.

No, he was saying that DMA doesn't flush the icache:

  http://thread.gmane.org/gmane.comp.emulators.qemu/119022/focus=119086

> 
>> Ah, if I read Ben's comment correctly we only need it for rom loads, not always for cpu_physical_memory_rw.
> 
> Why?

Because guest Linux apparently assumes that DMA'd memory needs to be icache flushed.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Scott Wood - Aug. 15, 2012, 6:56 p.m.
On 08/15/2012 01:51 PM, Alexander Graf wrote:
> 
> On 15.08.2012, at 20:33, Scott Wood wrote:
> 
>> On 08/15/2012 01:29 PM, Alexander Graf wrote:
>>>
>>> On 15.08.2012, at 20:27, Alexander Graf wrote:
>>>
>>>>
>>>> On 15.08.2012, at 20:16, Scott Wood wrote:
>>>>
>>>>> On 08/15/2012 01:01 PM, Alexander Graf wrote:
>>>>>>
>>>>>> On 15.08.2012, at 19:47, Scott Wood wrote:
>>>>>>
>>>>>>> On 08/15/2012 12:27 PM, Alexander Graf wrote:
>>>>>>>>
>>>>>>>> On 15.08.2012, at 19:26, Scott Wood wrote:
>>>>>>>>
>>>>>>>>> On 08/15/2012 04:52 AM, Alexander Graf wrote:
>>>>>>>>>>
>>>>>>>>>> On 15.08.2012, at 03:23, Scott Wood wrote:
>>>>>>>>>>
>>>>>>>>>>> On 08/14/2012 06:04 PM, Alexander Graf wrote:
>>>>>>>>>>>> When we map a page that wasn't icache cleared before, do so when first
>>>>>>>>>>>> mapping it in KVM using the same information bits as the Linux mapping
>>>>>>>>>>>> logic. That way we are 100% sure that any page we map does not have stale
>>>>>>>>>>>> entries in the icache.
>>>>>>>>>>>
>>>>>>>>>>> We're not really 100% sure of that -- this only handles the case where
>>>>>>>>>>> the kernel does the dirtying, not when it's done by QEMU or the guest.
>>>>>>>>>>
>>>>>>>>>> When the guest does it, the guest is responsible for clearing the
>>>>>>>>>> icache. Same for QEMU. It needs to clear it when doing DMA.
>>>>>>>>>
>>>>>>>>> Sure.  I was just worried that that commit message could be taken the
>>>>>>>>> wrong way, as in "we no longer need the QEMU icache flushing patch".
>>>>>>>>>
>>>>>>>>>> However, what is still broken would be a direct /dev/mem map. There
>>>>>>>>>> QEMU should probably clear the icache before starting the guest, in
>>>>>>>>>> case another guest was running on that same memory before.
>>>>>>>>>> Fortunately, we don't have that mode available in upstream QEMU :).
>>>>>>>>>
>>>>>>>>> How is QEMU loading images different if it's /dev/mem versus ordinary
>>>>>>>>> anonymous memory?  You probably won't have stale icache data in the
>>>>>>>>> latter case (which makes it less likely to be a problem in pratice), but
>>>>>>>>> in theory you could have data that still hasn't left the dcache.
>>>>>>>>
>>>>>>>> It's the same. I just talked to Ben about this today in a different context and we should be safe :).
>>>>>>>
>>>>>>> Safe how?
>>>>>>>
>>>>>>> If it's truly the same, we're definitely not safe, since I had problems
>>>>>>> with this using /dev/mem (particularly when changing the kernel image
>>>>>>> without a host reboot) before I put in the icache flush patch.
>>>>>>
>>>>>> QEMU needs to icache flush everything it puts into guest memory.
>>>>>
>>>>> Yes.  I thought you meant we should be safe as things are now.
>>>>
>>>> Hrm. What happened to your patch that flushes the icache on cpu_physical_memory_rw?
>>
>> IIRC Ben wanted it conditionalized to not slow things down on
>> icache-coherent systems, and I never got around to respinning it.
> 
> No, he was saying that DMA doesn't flush the icache:
> 
>   http://thread.gmane.org/gmane.comp.emulators.qemu/119022/focus=119086

I recall someone asking for it to be made conditional, but I don't have
time to look it up right now -- I want to try to get some U-Boot stuff
done before the end of the merge window tomorrow.

>>> Ah, if I read Ben's comment correctly we only need it for rom loads, not always for cpu_physical_memory_rw.
>>
>> Why?
> 
> Because guest Linux apparently assumes that DMA'd memory needs to be icache flushed.

What about breakpoints and other debug modifications?

And it's possible (if not necessarily likely) that other guests are
different.

-Scott


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf - Aug. 15, 2012, 6:58 p.m.
On 15.08.2012, at 20:56, Scott Wood wrote:

> On 08/15/2012 01:51 PM, Alexander Graf wrote:
>> 
>> On 15.08.2012, at 20:33, Scott Wood wrote:
>> 
>>> On 08/15/2012 01:29 PM, Alexander Graf wrote:
>>>> 
>>>> On 15.08.2012, at 20:27, Alexander Graf wrote:
>>>> 
>>>>> 
>>>>> On 15.08.2012, at 20:16, Scott Wood wrote:
>>>>> 
>>>>>> On 08/15/2012 01:01 PM, Alexander Graf wrote:
>>>>>>> 
>>>>>>> On 15.08.2012, at 19:47, Scott Wood wrote:
>>>>>>> 
>>>>>>>> On 08/15/2012 12:27 PM, Alexander Graf wrote:
>>>>>>>>> 
>>>>>>>>> On 15.08.2012, at 19:26, Scott Wood wrote:
>>>>>>>>> 
>>>>>>>>>> On 08/15/2012 04:52 AM, Alexander Graf wrote:
>>>>>>>>>>> 
>>>>>>>>>>> On 15.08.2012, at 03:23, Scott Wood wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> On 08/14/2012 06:04 PM, Alexander Graf wrote:
>>>>>>>>>>>>> When we map a page that wasn't icache cleared before, do so when first
>>>>>>>>>>>>> mapping it in KVM using the same information bits as the Linux mapping
>>>>>>>>>>>>> logic. That way we are 100% sure that any page we map does not have stale
>>>>>>>>>>>>> entries in the icache.
>>>>>>>>>>>> 
>>>>>>>>>>>> We're not really 100% sure of that -- this only handles the case where
>>>>>>>>>>>> the kernel does the dirtying, not when it's done by QEMU or the guest.
>>>>>>>>>>> 
>>>>>>>>>>> When the guest does it, the guest is responsible for clearing the
>>>>>>>>>>> icache. Same for QEMU. It needs to clear it when doing DMA.
>>>>>>>>>> 
>>>>>>>>>> Sure.  I was just worried that that commit message could be taken the
>>>>>>>>>> wrong way, as in "we no longer need the QEMU icache flushing patch".
>>>>>>>>>> 
>>>>>>>>>>> However, what is still broken would be a direct /dev/mem map. There
>>>>>>>>>>> QEMU should probably clear the icache before starting the guest, in
>>>>>>>>>>> case another guest was running on that same memory before.
>>>>>>>>>>> Fortunately, we don't have that mode available in upstream QEMU :).
>>>>>>>>>> 
>>>>>>>>>> How is QEMU loading images different if it's /dev/mem versus ordinary
>>>>>>>>>> anonymous memory?  You probably won't have stale icache data in the
>>>>>>>>>> latter case (which makes it less likely to be a problem in pratice), but
>>>>>>>>>> in theory you could have data that still hasn't left the dcache.
>>>>>>>>> 
>>>>>>>>> It's the same. I just talked to Ben about this today in a different context and we should be safe :).
>>>>>>>> 
>>>>>>>> Safe how?
>>>>>>>> 
>>>>>>>> If it's truly the same, we're definitely not safe, since I had problems
>>>>>>>> with this using /dev/mem (particularly when changing the kernel image
>>>>>>>> without a host reboot) before I put in the icache flush patch.
>>>>>>> 
>>>>>>> QEMU needs to icache flush everything it puts into guest memory.
>>>>>> 
>>>>>> Yes.  I thought you meant we should be safe as things are now.
>>>>> 
>>>>> Hrm. What happened to your patch that flushes the icache on cpu_physical_memory_rw?
>>> 
>>> IIRC Ben wanted it conditionalized to not slow things down on
>>> icache-coherent systems, and I never got around to respinning it.
>> 
>> No, he was saying that DMA doesn't flush the icache:
>> 
>>  http://thread.gmane.org/gmane.comp.emulators.qemu/119022/focus=119086
> 
> I recall someone asking for it to be made conditional, but I don't have
> time to look it up right now -- I want to try to get some U-Boot stuff
> done before the end of the merge window tomorrow.

Sure :)

> 
>>>> Ah, if I read Ben's comment correctly we only need it for rom loads, not always for cpu_physical_memory_rw.
>>> 
>>> Why?
>> 
>> Because guest Linux apparently assumes that DMA'd memory needs to be icache flushed.
> 
> What about breakpoints and other debug modifications?

The breakpoint code is arch specific. We can just put an icache flush in there.

> And it's possible (if not necessarily likely) that other guests are
> different.

Does fsl hardware guarantee icache coherency from device DMA?


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Scott Wood - Aug. 15, 2012, 7:05 p.m.
On 08/15/2012 01:58 PM, Alexander Graf wrote:
> 
> On 15.08.2012, at 20:56, Scott Wood wrote:
> 
>> On 08/15/2012 01:51 PM, Alexander Graf wrote:
>>>
>>> On 15.08.2012, at 20:33, Scott Wood wrote:
>>>
>>>> On 08/15/2012 01:29 PM, Alexander Graf wrote:
>>>>> Ah, if I read Ben's comment correctly we only need it for rom loads, not always for cpu_physical_memory_rw.
>>>>
>>>> Why?
>>>
>>> Because guest Linux apparently assumes that DMA'd memory needs to be icache flushed.
>>
>> What about breakpoints and other debug modifications?
> 
> The breakpoint code is arch specific. We can just put an icache flush in there.

That doesn't cover other modifications that a debugger might do
(including manual poking at code done by a person at the command line).
 It's not really the breakpoint that's the special case, it's things
that the guest thinks of as DMA -- and differentiating that seems like a
questionable optimization.  If the guest is going to flush anyway, is
there any significant performance penalty to flushing twice?  The second
time would just be a no-op beyond doing the MMU/cache lookup.

>> And it's possible (if not necessarily likely) that other guests are
>> different.
> 
> Does fsl hardware guarantee icache coherency from device DMA?

I don't think so, but I don't know of any fsl hardware that leaves dirty
data in the dcache after DMA.  Even with stashing on our newer chips,
the data first goes to memory and then the core is told to prefetch it.

-Scott


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf - Aug. 15, 2012, 7:29 p.m.
On 15.08.2012, at 21:05, Scott Wood wrote:

> On 08/15/2012 01:58 PM, Alexander Graf wrote:
>> 
>> On 15.08.2012, at 20:56, Scott Wood wrote:
>> 
>>> On 08/15/2012 01:51 PM, Alexander Graf wrote:
>>>> 
>>>> On 15.08.2012, at 20:33, Scott Wood wrote:
>>>> 
>>>>> On 08/15/2012 01:29 PM, Alexander Graf wrote:
>>>>>> Ah, if I read Ben's comment correctly we only need it for rom loads, not always for cpu_physical_memory_rw.
>>>>> 
>>>>> Why?
>>>> 
>>>> Because guest Linux apparently assumes that DMA'd memory needs to be icache flushed.
>>> 
>>> What about breakpoints and other debug modifications?
>> 
>> The breakpoint code is arch specific. We can just put an icache flush in there.
> 
> That doesn't cover other modifications that a debugger might do
> (including manual poking at code done by a person at the command line).

Why not? This would go through gdbstub, where we can always put in an icache flush.

> It's not really the breakpoint that's the special case, it's things
> that the guest thinks of as DMA -- and differentiating that seems like a
> questionable optimization.  If the guest is going to flush anyway, is
> there any significant performance penalty to flushing twice?  The second
> time would just be a no-op beyond doing the MMU/cache lookup.

I would hope the guest is clever enough to only icache flush when we actually execute from these pages.

> 
>>> And it's possible (if not necessarily likely) that other guests are
>>> different.
>> 
>> Does fsl hardware guarantee icache coherency from device DMA?
> 
> I don't think so, but I don't know of any fsl hardware that leaves dirty
> data in the dcache after DMA.  Even with stashing on our newer chips,
> the data first goes to memory and then the core is told to prefetch it.

For Linux, I think we always flush the dcache when flushing the icache. However, that argument is reasonably valid. We probably want to flush the dache on DMA, so that a stale icache can fetch it from memory properly. But I don't see a reason why we would want to do so for the icache if hardware doesn't do it either.

But let's get Ben on board here :).


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Scott Wood - Aug. 15, 2012, 7:53 p.m.
On 08/15/2012 02:29 PM, Alexander Graf wrote:
> 
> On 15.08.2012, at 21:05, Scott Wood wrote:
> 
>> On 08/15/2012 01:58 PM, Alexander Graf wrote:
>>>
>>> On 15.08.2012, at 20:56, Scott Wood wrote:
>>>
>>>> On 08/15/2012 01:51 PM, Alexander Graf wrote:
>>>>>
>>>>> On 15.08.2012, at 20:33, Scott Wood wrote:
>>>>>
>>>>>> On 08/15/2012 01:29 PM, Alexander Graf wrote:
>>>>>>> Ah, if I read Ben's comment correctly we only need it for rom loads, not always for cpu_physical_memory_rw.
>>>>>>
>>>>>> Why?
>>>>>
>>>>> Because guest Linux apparently assumes that DMA'd memory needs to be icache flushed.
>>>>
>>>> What about breakpoints and other debug modifications?
>>>
>>> The breakpoint code is arch specific. We can just put an icache flush in there.
>>
>> That doesn't cover other modifications that a debugger might do
>> (including manual poking at code done by a person at the command line).
> 
> Why not? This would go through gdbstub,

Not necessarily.  I could be poking at memory from the QEMU command
line.  If there isn't a command for that, there should be. :-)

> where we can always put in an icache flush.

If you want to cover every individual place where this function is
called for non-DMA, fine, though I'd feel more comfortable with
something that specifically identifies the access as for DMA.

>>>>>> And it's possible (if not necessarily likely) that other guests are
>>>> different.
>>>
>>> Does fsl hardware guarantee icache coherency from device DMA?
>>
>> I don't think so, but I don't know of any fsl hardware that leaves dirty
>> data in the dcache after DMA.  Even with stashing on our newer chips,
>> the data first goes to memory and then the core is told to prefetch it.
> 
> For Linux, I think we always flush the dcache when flushing the
> icache. However, that argument is reasonably valid. We probably want
> to flush the dache on DMA, so that a stale icache can fetch it from
> memory properly. But I don't see a reason why we would want to do so
> for the icache if hardware doesn't do it either.
> 
> But let's get Ben on board here :).

The only reason to invalidate the icache on DMA accesses would be to
avoid introducing a special case in the QEMU code, unless we find
hardware to emulate that does invalidate icache on DMA writes but isn't
icache-coherent in general (it's fairly plausable -- icache would act on
snoops it sees on the bus, but icache fills wouldn't issue snoops of
their own).

-Scott


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index ff8d51c..cea9d3a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -33,6 +33,7 @@ 
 #include <asm/kvm_asm.h>
 #include <asm/processor.h>
 #include <asm/page.h>
+#include <asm/cacheflush.h>
 
 #define KVM_MAX_VCPUS		NR_CPUS
 #define KVM_MAX_VCORES		NR_CPUS
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index c38e824..88de314 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -220,4 +220,16 @@  void kvmppc_claim_lpid(long lpid);
 void kvmppc_free_lpid(long lpid);
 void kvmppc_init_lpid(unsigned long nr_lpids);
 
+static inline void kvmppc_mmu_flush_icache(pfn_t pfn)
+{
+	/* Clear i-cache for new pages */
+	struct page *page;
+	page = pfn_to_page(pfn);
+	if (!test_bit(PG_arch_1, &page->flags)) {
+		flush_dcache_icache_page(page);
+		set_bit(PG_arch_1, &page->flags);
+	}
+}
+
+
 #endif /* __POWERPC_KVM_PPC_H__ */
diff --git a/arch/powerpc/kvm/book3s_32_mmu_host.c b/arch/powerpc/kvm/book3s_32_mmu_host.c
index f922c29..837f13e 100644
--- a/arch/powerpc/kvm/book3s_32_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_32_mmu_host.c
@@ -211,6 +211,9 @@  next_pteg:
 		pteg1 |= PP_RWRX;
 	}
 
+	if (orig_pte->may_execute)
+		kvmppc_mmu_flush_icache(hpaddr >> PAGE_SHIFT);
+
 	local_irq_disable();
 
 	if (pteg[rr]) {
diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c b/arch/powerpc/kvm/book3s_64_mmu_host.c
index 10fc8ec..0688b6b 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
@@ -126,6 +126,8 @@  int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte)
 
 	if (!orig_pte->may_execute)
 		rflags |= HPTE_R_N;
+	else
+		kvmppc_mmu_flush_icache(hpaddr >> PAGE_SHIFT);
 
 	hash = hpt_hash(va, PTE_SIZE, MMU_SEGSIZE_256M);
 
diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c
index 06273a7..a6519ca 100644
--- a/arch/powerpc/kvm/e500_tlb.c
+++ b/arch/powerpc/kvm/e500_tlb.c
@@ -543,6 +543,9 @@  static inline void kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
 	kvmppc_e500_setup_stlbe(&vcpu_e500->vcpu, gtlbe, tsize,
 				ref, gvaddr, stlbe);
 
+	/* Clear i-cache for new pages */
+	kvmppc_mmu_flush_icache(pfn);
+
 	/* Drop refcount on page, so that mmu notifiers can clear it */
 	kvm_release_pfn_clean(pfn);
 }
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index baaafde..fbdad0e 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -469,6 +469,7 @@  void flush_dcache_icache_page(struct page *page)
 	__flush_dcache_icache_phys(page_to_pfn(page) << PAGE_SHIFT);
 #endif
 }
+EXPORT_SYMBOL(flush_dcache_icache_page);
 
 void clear_user_page(void *page, unsigned long vaddr, struct page *pg)
 {