
qemu-kvm: Role of flush_icache_range on PPC

Message ID 4E832DE3.40503@siemens.com
State New

Commit Message

Jan Kiszka Sept. 28, 2011, 2:23 p.m. UTC
Alex,

we have this diff in qemu-kvm:

diff --git a/exec.c b/exec.c
index c1e045d..f188549 100644
--- a/exec.c
+++ b/exec.c
@@ -3950,6 +3955,11 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
                     cpu_physical_memory_set_dirty_flags(
                         addr1, (0xff & ~CODE_DIRTY_FLAG));
                 }
+		/* qemu doesn't execute guest code directly, but kvm does
+		   therefore flush instruction caches */
+		if (kvm_enabled())
+		    flush_icache_range((unsigned long)ptr,
+				       ((unsigned long)ptr)+l);
                 qemu_put_ram_ptr(ptr);
             }
         } else {

flush_icache_range() is doing something only on PPC hosts. So do we need
this upstream?

Jan
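
[For reference, flush_icache_range() on a PPC host boils down to the architected dcbst/sync/icbi/isync sequence. A minimal sketch, assuming a fixed 128-byte cache line for brevity; a real implementation would detect the line size at runtime:]

    /* Sketch only: write dirty dcache lines back to memory, then
       invalidate the matching icache lines. Assumes 128-byte lines. */
    static inline void flush_icache_range(unsigned long start, unsigned long stop)
    {
        const unsigned long line = 128;
        unsigned long p;

        for (p = start & ~(line - 1); p < stop; p += line)
            asm volatile ("dcbst 0,%0" : : "r"(p) : "memory");
        asm volatile ("sync" : : : "memory");

        for (p = start & ~(line - 1); p < stop; p += line)
            asm volatile ("icbi 0,%0" : : "r"(p) : "memory");
        asm volatile ("sync; isync" : : : "memory");
    }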

Comments

Alexander Graf Sept. 28, 2011, 2:26 p.m. UTC | #1
On 28.09.2011, at 16:23, Jan Kiszka wrote:

> Alex,
> 
> we have this diff in qemu-kvm:
> 
> diff --git a/exec.c b/exec.c
> index c1e045d..f188549 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -3950,6 +3955,11 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>                     cpu_physical_memory_set_dirty_flags(
>                         addr1, (0xff & ~CODE_DIRTY_FLAG));
>                 }
> +		/* qemu doesn't execute guest code directly, but kvm does
> +		   therefore flush instruction caches */
> +		if (kvm_enabled())
> +		    flush_icache_range((unsigned long)ptr,
> +				       ((unsigned long)ptr)+l);
>                 qemu_put_ram_ptr(ptr);
>             }
>         } else {
> 
> 
> flush_icache_range() is doing something only on PPC hosts. So do we need
> this upstream?

This makes sure that when device emulation overwrites code that is already present in the cache of a CPU, it gets flushed from the icache. I'm fairly sure we want that :). But let's ask Ben and David as well.


Alex
Jan Kiszka Sept. 28, 2011, 2:45 p.m. UTC | #2
On 2011-09-28 16:26, Alexander Graf wrote:
>
> On 28.09.2011, at 16:23, Jan Kiszka wrote:
>
>> Alex,
>>
>> we have this diff in qemu-kvm:
>>
>> diff --git a/exec.c b/exec.c
>> index c1e045d..f188549 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -3950,6 +3955,11 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>>                      cpu_physical_memory_set_dirty_flags(
>>                          addr1, (0xff & ~CODE_DIRTY_FLAG));
>>                  }
>> +		/* qemu doesn't execute guest code directly, but kvm does
>> +		   therefore flush instruction caches */
>> +		if (kvm_enabled())
>> +		    flush_icache_range((unsigned long)ptr,
>> +				       ((unsigned long)ptr)+l);
>>                  qemu_put_ram_ptr(ptr);
>>              }
>>          } else {
>>
>>
>> flush_icache_range() is doing something only on PPC hosts. So do we need
>> this upstream?
>
> This makes sure that when device emulation overwrites code that is already present in the cache of a CPU, it gets flushed from the icache. I'm fairly sure we want that :). But let's ask Ben and David as well.

/me wondered which write scenario precisely needs this. It could only be 
something synchronous w.r.t. some VCPU. Which operations could trigger 
such a write? Does PPC inject software breakpoints in the form of trap 
operations or so?

Mmm, according to our ancient recordings, the hunk above was once 
introduced for the sake of IA64: 9dc99a2823. I skipped it in my removal 
patch as it has some non-IA64 effect, at least potentially.

Jan
Jan Kiszka Sept. 28, 2011, 2:49 p.m. UTC | #3
On 2011-09-28 16:45, Jan Kiszka wrote:
> On 2011-09-28 16:26, Alexander Graf wrote:
>>
>> On 28.09.2011, at 16:23, Jan Kiszka wrote:
>>
>>> Alex,
>>>
>>> we have this diff in qemu-kvm:
>>>
>>> diff --git a/exec.c b/exec.c
>>> index c1e045d..f188549 100644
>>> --- a/exec.c
>>> +++ b/exec.c
>>> @@ -3950,6 +3955,11 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>>>                      cpu_physical_memory_set_dirty_flags(
>>>                          addr1, (0xff & ~CODE_DIRTY_FLAG));
>>>                  }
>>> +		/* qemu doesn't execute guest code directly, but kvm does
>>> +		   therefore flush instruction caches */
>>> +		if (kvm_enabled())
>>> +		    flush_icache_range((unsigned long)ptr,
>>> +				       ((unsigned long)ptr)+l);
>>>                  qemu_put_ram_ptr(ptr);
>>>              }
>>>          } else {
>>>
>>>
>>> flush_icache_range() is doing something only on PPC hosts. So do we need
>>> this upstream?
>>
>> This makes sure that when device emulation overwrites code that is
>> already present in the cache of a CPU, it gets flushed from the
>> icache. I'm fairly sure we want that :). But let's ask Ben and David
>> as well.
>
> /me wondered which write scenario precisely needs this. It could only be
> something synchronous w.r.t. some VCPU. Which operations could trigger
> such a write? Does PPC inject software breakpoints in the form of trap
> operations or so?
>
> Mmm, according to our ancient recordings, the hunk above was once
> introduced for the sake of IA64: 9dc99a2823. I skipped it in my removal
> patch as it has some non-IA64 effect, at least potentially.

Correction: It was introduced by 6d3295f7c8, but generalized with 
f067512c06. The former commit talks about DMA operations on IA64 that 
also update/drop the icache in reality.

Jan
Alexander Graf Sept. 28, 2011, 2:57 p.m. UTC | #4
On 28.09.2011, at 16:49, Jan Kiszka <jan.kiszka@siemens.com> wrote:

> On 2011-09-28 16:45, Jan Kiszka wrote:
>> On 2011-09-28 16:26, Alexander Graf wrote:
>>> 
>>> On 28.09.2011, at 16:23, Jan Kiszka wrote:
>>> 
>>>> Alex,
>>>> 
>>>> we have this diff in qemu-kvm:
>>>> 
>>>> diff --git a/exec.c b/exec.c
>>>> index c1e045d..f188549 100644
>>>> --- a/exec.c
>>>> +++ b/exec.c
>>>> @@ -3950,6 +3955,11 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>>>>                      cpu_physical_memory_set_dirty_flags(
>>>>                          addr1, (0xff & ~CODE_DIRTY_FLAG));
>>>>                  }
>>>> +		/* qemu doesn't execute guest code directly, but kvm does
>>>> +		   therefore flush instruction caches */
>>>> +		if (kvm_enabled())
>>>> +		    flush_icache_range((unsigned long)ptr,
>>>> +				       ((unsigned long)ptr)+l);
>>>>                  qemu_put_ram_ptr(ptr);
>>>>              }
>>>>          } else {
>>>> 
>>>> 
>>>> flush_icache_range() is doing something only on PPC hosts. So do we need
>>>> this upstream?
>>> 
>>> This makes sure that when device emulation overwrites code that is
>>> already present in the cache of a CPU, it gets flushed from the
>>> icache. I'm fairly sure we want that :). But let's ask Ben and David
>>> as well.
>> 
>> /me wondered which write scenario precisely needs this. It could only be
>> something synchronous w.r.t. some VCPU. Which operations could trigger
>> such a write? Does PPC inject software breakpoints in the form of trap
>> operations or so?
>> 
>> Mmm, according to our ancient recordings, the hunk above was once
>> introduced for the sake of IA64: 9dc99a2823. I skipped it in my removal
>> patch as it has some non-IA64 effect, at least potentially.
> 
> Correction: It was introduced by 6d3295f7c8, but generalized with f067512c06. The former commit talks about DMA operations on IA64 that also update/drop the icache in reality.

Yeah I remember discussions around the topic. IIUC DMA invalidates the cache lines of the CPUs in the system for the region it's writing to. At least potentially. But again, I'll leave this to the IBM guys to answer :). They know best how their hardware works.

Alex

Scott Wood Sept. 28, 2011, 5:27 p.m. UTC | #5
On 09/28/2011 09:45 AM, Jan Kiszka wrote:
> On 2011-09-28 16:26, Alexander Graf wrote:
>>
>> On 28.09.2011, at 16:23, Jan Kiszka wrote:
>>
>>> Alex,
>>>
>>> we have this diff in qemu-kvm:
>>>
>>> diff --git a/exec.c b/exec.c
>>> index c1e045d..f188549 100644
>>> --- a/exec.c
>>> +++ b/exec.c
>>> @@ -3950,6 +3955,11 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>>>                      cpu_physical_memory_set_dirty_flags(
>>>                          addr1, (0xff & ~CODE_DIRTY_FLAG));
>>>                  }
>>> +        /* qemu doesn't execute guest code directly, but kvm does
>>> +           therefore flush instruction caches */
>>> +        if (kvm_enabled())
>>> +            flush_icache_range((unsigned long)ptr,
>>> +                       ((unsigned long)ptr)+l);
>>>                  qemu_put_ram_ptr(ptr);
>>>              }
>>>          } else {

Been meaning to send a poke about this one:

http://patchwork.ozlabs.org/patch/90403/

I've seen issues with this when the executable images are initially
loaded by qemu.

>>> flush_icache_range() is doing something only on PPC hosts. So do we need
>>> this upstream?
>>
>> This makes sure that when device emulation overwrites code that is
>> already present in the cache of a CPU, it gets flushed from the
>> icache. I'm fairly sure we want that :). But let's ask Ben and David
>> as well.
> 
> /me wondered which write scenario precisely needs this. It could only be
> something synchronous w.r.t. some VCPU.

Why would it need to be synchronous?  Even if it's asynchronous emulated
DMA, we don't want it sitting around only in a data cache that
instruction fetches won't snoop.

> Which operations could trigger
> such a write? Does PPC inject software breakpoints in the form of trap
> operations or so?

It's not implemented yet in mainline for powerpc (we have something
internal that is on the backlog of things to be cleaned up and sent
out), but this is what we'd do for kvm_arch_insert_sw_breakpoint().
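
[For concreteness, a hedged sketch of what such a hook could look like; hypothetical, since nothing was merged at the time. 0x7fe00008 encodes "tw 31,0,0", the unconditional PPC trap, and the structure mirrors the existing x86 hook:]

    /* Hypothetical kvm_arch_insert_sw_breakpoint() for PPC -- a sketch,
       not the mainline implementation. */
    int kvm_arch_insert_sw_breakpoint(CPUState *env, struct kvm_sw_breakpoint *bp)
    {
        static const uint32_t trap = 0x7fe00008;  /* tw 31,0,0 */

        /* Save the original instruction, then patch in the trap. */
        if (cpu_memory_rw_debug(env, bp->pc, (uint8_t *)&bp->saved_insn, 4, 0) ||
            cpu_memory_rw_debug(env, bp->pc, (uint8_t *)&trap, 4, 1)) {
            return -EINVAL;
        }
        /* Without an icache flush of the host mapping, the VCPU may keep
           executing the stale instruction instead of trapping. */
        return 0;
    }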

-Scott
Benjamin Herrenschmidt Sept. 28, 2011, 8:58 p.m. UTC | #6
On Wed, 2011-09-28 at 16:26 +0200, Alexander Graf wrote:
> 
> This makes sure that when device emulation overwrites code that is
> already present in the cache of a CPU, it gets flushed from the
> icache. I'm fairly sure we want that :). But let's ask Ben and David
> as well.

Hrm we don't need that. DMA doesn't flush the icache on power. The
kernel will take care of it if necessary.

The only case you do need it is when doing the initial load of the
kernel or SLOF image before you execute it.

Cheers,
Ben.
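
[To make the one required case concrete, a hedged sketch of the initial program load; the names kernel_base, kernel_image, and kernel_size are illustrative, not actual QEMU loader code:]

    /* Sketch: after qemu copies a kernel/SLOF image into guest RAM, the
       icache must be made coherent before the first VCPU entry. */
    void *host = qemu_get_ram_ptr(kernel_base);   /* host view of guest RAM */
    memcpy(host, kernel_image, kernel_size);      /* the "initial load" */
    flush_icache_range((unsigned long)host,
                       (unsigned long)host + kernel_size);
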
Benjamin Herrenschmidt Sept. 28, 2011, 9:02 p.m. UTC | #7
On Wed, 2011-09-28 at 12:27 -0500, Scott Wood wrote:

> Why would it need to be synchronous?  Even if it's asynchronous emulated
> DMA, we don't want it sitting around only in a data cache that
> instruction fetches won't snoop.

Except that this is exactly what happens on real HW :-)

The guest will do the necessary invalidations. DMA doesn't keep the
icache coherent on HW, so why should it on kvm/qemu?

> It's not implemented yet in mainline for powerpc (we have something
> internal that is on the backlog of things to be cleaned up and sent
> out), but this is what we'd do for kvm_arch_insert_sw_breakpoint().

Yes, breakpoints do need a flush, as does the initial program load.

Cheers,
Ben.
Scott Wood Sept. 28, 2011, 9:20 p.m. UTC | #8
On 09/28/2011 04:02 PM, Benjamin Herrenschmidt wrote:
> On Wed, 2011-09-28 at 12:27 -0500, Scott Wood wrote:
> 
>> Why would it need to be synchronous?  Even if it's asynchronous emulated
>> DMA, we don't want it sitting around only in a data cache that
>> instruction fetches won't snoop.
> 
> Except that this is exactly what happens on real HW :-)

DMA does not normally go straight to data cache, at least on hardware
I'm familiar with.

> The guest will do the necessary invalidations. DMA doesn't keep the
> icache coherent on HW, why should it on kvm/qemu ?

Sure, if there might be stale stuff in the icache, the guest will need
to invalidate that.  But when running on real hardware, an OS does not
need to flush it out of data cache after a DMA transaction[1].  So
technically we just want a flush_dcache_range() for DMA.
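
[A hedged sketch of what such a flush_dcache_range() could look like on
PPC; the helper is hypothetical, QEMU had no such function at the time.
It is just the writeback half of the full icache flush sequence:]

    /* Hypothetical flush_dcache_range(): push dirty data cache lines
       out to memory, mimicking the coherency a real DMA engine provides,
       without invalidating the icache. Assumes 128-byte lines. */
    static inline void flush_dcache_range(unsigned long start, unsigned long stop)
    {
        const unsigned long line = 128;
        unsigned long p;

        for (p = start & ~(line - 1); p < stop; p += line)
            asm volatile ("dcbst 0,%0" : : "r"(p) : "memory");
        asm volatile ("sync" : : : "memory");
    }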

It's moot unless we can distinguish DMA writes from breakpoint writes,
though.

-Scott

[1] Most OSes may do this anyway, to avoid needing to special case when
the dirtying is done entirely by DMA (or to avoid making assumptions
that could be broken by weird hardware), but that doesn't mean QEMU/KVM
should assume that -- maybe unless there's enough performance to be
gained by looking like the aforementioned "weird hardware" in certain
configurations.
Benjamin Herrenschmidt Sept. 28, 2011, 9:34 p.m. UTC | #9
On Wed, 2011-09-28 at 16:20 -0500, Scott Wood wrote:

> Sure, if there might be stale stuff in the icache, the guest will need
> to invalidate that.  But when running on real hardware, an OS does not
> need to flush it out of data cache after a DMA transaction[1].  So
> technically we just want a flush_dcache_range() for DMA.
>
> It's moot unless we can distinguish DMA writes from breakpoint writes,
> though.
> 
> -Scott
> 
> [1] Most OSes may do this anyway, to avoid needing to special case when
> the dirtying is done entirely by DMA (or to avoid making assumptions
> that could be broken by weird hardware), but that doesn't mean QEMU/KVM
> should assume that -- maybe unless there's enough performance to be
> gained by looking like the aforementioned "weird hardware" in certain
> configurations.

I see what you mean. A real DMA would have had an implicit cache flush,
while qemu memcpy'ing into the guest won't. Hrm.

I'm not sure any guest relies on that, since architecturally the HW is
permitted to do cache intervention tricks and, rather than flush,
transfer the data directly to the cache that originally contained the
lines (cache injection). We even support that on some embedded stuff.

In any case, we should then make that depend on a flag, because it's
certainly unnecessary on P5, P6, and P7, which have a snooping icache,
and the flush can be costly.

Cheers,
Ben.
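
[A hedged sketch of how the exec.c hunk could be gated on such a flag;
kvm_icache_snoops() is hypothetical, standing in for whatever capability
bit would report a snooping icache:]

    /* Hypothetical variant of the hunk: skip the flush on hosts such as
       POWER5/6/7 whose icache snoops stores. kvm_icache_snoops() does
       not exist; it is a stand-in for a real capability flag. */
    if (kvm_enabled() && !kvm_icache_snoops()) {
        flush_icache_range((unsigned long)ptr,
                           (unsigned long)ptr + l);
    }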

Patch

diff --git a/exec.c b/exec.c
index c1e045d..f188549 100644
--- a/exec.c
+++ b/exec.c
@@ -3950,6 +3955,11 @@  void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
                     cpu_physical_memory_set_dirty_flags(
                         addr1, (0xff & ~CODE_DIRTY_FLAG));
                 }
+		/* qemu doesn't execute guest code directly, but kvm does
+		   therefore flush instruction caches */
+		if (kvm_enabled())
+		    flush_icache_range((unsigned long)ptr,
+				       ((unsigned long)ptr)+l);
                 qemu_put_ram_ptr(ptr);
             }
         } else {