Message ID: 1386768216-33686-1-git-send-email-agraf@suse.de
State: New
On 11/12/2013 14:23, Alexander Graf wrote:
> +        if (kvm_enabled()) {
> +            /*
> +             * The guest may want to directly execute from the rom region,
> +             * so we better invalidate its icache
> +             */
> +            flush_icache_range((uintptr_t)ptr, (uintptr_t)ptr + l);
> +        }

Shouldn't KVM itself do that when a memslot is registered? There should
be no reason for non-TCG QEMU to flush the icache.

Paolo
On 11.12.2013, at 14:27, Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 11/12/2013 14:23, Alexander Graf wrote:
>> +        if (kvm_enabled()) {
>> +            /*
>> +             * The guest may want to directly execute from the rom region,
>> +             * so we better invalidate its icache
>> +             */
>> +            flush_icache_range((uintptr_t)ptr, (uintptr_t)ptr + l);
>> +        }
>
> Shouldn't KVM itself do that when a memslot is registered? There should
> be no reason for non-TCG QEMU to flush the icache.

How would KVM know when things changed inside of a memory region? It's
up to user space to manage the contents of a memory region, no?

Alex
On 11 December 2013 13:23, Alexander Graf <agraf@suse.de> wrote:
> The guest expects that its data and instruction cache view of the world
> is 100% consistent when it initially boots. This works just fine on
> initial rom population for the first boot.
>
> However, when we reboot and then repopulate the rom region there could
> be old code still stuck in the instruction cache, giving the guest an
> inconsistent view of the world when we're using kvm.
>
> So we need to invalidate the icache every time we write a rom into guest
> address space. We do not need to do this for every DMA since the guest
> expects it has to flush the icache manually in that case.

> @@ -2033,6 +2034,13 @@ void cpu_physical_memory_write_rom(hwaddr addr,
>          ptr = qemu_get_ram_ptr(addr1);
>          memcpy(ptr, buf, l);
>          invalidate_and_set_dirty(addr1, l);
> +        if (kvm_enabled()) {
> +            /*
> +             * The guest may want to directly execute from the rom region,
> +             * so we better invalidate its icache
> +             */
> +            flush_icache_range((uintptr_t)ptr, (uintptr_t)ptr + l);
> +        }

I bet these aren't the only places where code gets written to guest
memory. Also are you sure flush_icache_range() works correctly when
multiple threads (multiple vCPUs, potentially executing on different
host CPUs) are involved? The TCG case only needs to care about "this
thread writes code to memory that it will itself later execute", not
any kind of cross-host-CPU flushing.

There was a huge thread on kvmarm earlier this year
https://lists.cs.columbia.edu/pipermail/kvmarm/2013-August/006716.html
about a similar sort of issue, and I think the conclusion was that the
kernel basically had to deal with the problem itself [though the thread
is rather confusing...]. I've cc'd Marc Z in the hope he remembers the
ARM specific detail...

thanks
-- PMM
On 11/12/2013 14:35, Alexander Graf wrote:
>>> +        if (kvm_enabled()) {
>>> +            /*
>>> +             * The guest may want to directly execute from the rom region,
>>> +             * so we better invalidate its icache
>>> +             */
>>> +            flush_icache_range((uintptr_t)ptr, (uintptr_t)ptr + l);
>>> +        }
>>
>> Shouldn't KVM itself do that when a memslot is registered? There should
>> be no reason for non-TCG QEMU to flush the icache.
>
> How would KVM know when things changed inside of a memory region? It's
> up to user space to manage the contents of a memory region, no?

Yeah, that is true. BTW, shouldn't the same happen when you do
migration?

I'd prefer the above snippet to be replaced by a function in
kvm-stub.c/kvm-all.c (kvm_flush_icache_range).

I wonder if there would be a reason to add a KVM_FLUSH_ICACHE ioctl
though. Could a virtually-indexed/virtually-tagged icache require
flushing by guest address instead of host address?

Paolo
On 11 December 2013 13:35, Alexander Graf <agraf@suse.de> wrote:
> How would KVM know when things changed inside of a memory region?
> It's up to user space to manage the contents of a memory region, no?

If the architecture spec says that a freshly reset physical CPU has
coherent icache and dcache, then resetting the vCPU should also ensure
the icache and dcache are coherent, so one way to solve this would be
just to make sure that vcpu reset did the right thing.

-- PMM
On 11.12.2013, at 15:07, Peter Maydell <peter.maydell@linaro.org> wrote:

> On 11 December 2013 13:35, Alexander Graf <agraf@suse.de> wrote:
>> How would KVM know when things changed inside of a memory region?
>> It's up to user space to manage the contents of a memory region, no?
>
> If the architecture spec says that a freshly reset physical CPU has
> coherent icache and dcache, then resetting the vCPU should also
> ensure the icache and dcache are coherent, so one way to solve
> this would be just to make sure that vcpu reset did the right thing.

Well, this really is a simplified view of the world.

On real hardware the system boots up with caches disabled. Firmware is
then responsible for enabling caches and flushing things as it goes.
Firmware loads the kernel into ram, flushing the icache on those regions
it wrote to along the way. The kernel boots and every time it faults in
a page, it flushes caches for that page.

So really the problem is that we're skipping the "cache disabled
firmware" step. With this patch, we're simulating a bootloader's
behavior when writing a blob into guest memory, because a bootloader is
really what we're trying to behave like here.

Alex
> -----Original Message-----
> From: Peter Maydell [mailto:peter.maydell@linaro.org]
> Sent: Wednesday, December 11, 2013 4:07 PM
> To: Alexander Graf
> Cc: Paolo Bonzini; Vlad Bogdan-BOGVLAD1; QEMU Developers;
> qemu-ppc@nongnu.org; Sethi Varun-B16395; Wood Scott-B07421;
> Caraman Mihai Claudiu-B02008
> Subject: Re: [Qemu-devel] [PATCH] roms: Flush icache when writing roms
> to guest memory
>
> On 11 December 2013 13:35, Alexander Graf <agraf@suse.de> wrote:
>> How would KVM know when things changed inside of a memory region?
>> It's up to user space to manage the contents of a memory region, no?
>
> If the architecture spec says that a freshly reset physical CPU has
> coherent icache and dcache, then resetting the vCPU should also
> ensure the icache and dcache are coherent, so one way to solve
> this would be just to make sure that vcpu reset did the right thing.

This is not related to the reset operation. The Freescale e500 core
family does not ensure coherency between the data and instruction
caches. This is an extract from the reference manual:

'When a processor modifies any memory location that can contain an
instruction, software must ensure that the instruction cache is made
consistent with data memory and that the modifications are made visible
to the instruction fetching mechanism. This must be done even if the
cache is disabled or if the page is marked caching-inhibited.'

So it's the loader's duty to synchronize the instruction cache.

-Mike
On 11.12.2013, at 15:03, Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 11/12/2013 14:35, Alexander Graf wrote:
>>>> +        if (kvm_enabled()) {
>>>> +            /*
>>>> +             * The guest may want to directly execute from the rom region,
>>>> +             * so we better invalidate its icache
>>>> +             */
>>>> +            flush_icache_range((uintptr_t)ptr, (uintptr_t)ptr + l);
>>>> +        }
>>>
>>> Shouldn't KVM itself do that when a memslot is registered? There
>>> should be no reason for non-TCG QEMU to flush the icache.
>>
>> How would KVM know when things changed inside of a memory region? It's
>> up to user space to manage the contents of a memory region, no?
>
> Yeah, that is true. BTW, shouldn't the same happen when you do
> migration?

Fortunately no, because migration always starts from a clean slate, so
the icache is not populated yet for the regions that the guest's memory
gets written to :).

> I'd prefer the above snippet to be replaced by a function in
> kvm-stub.c/kvm-all.c (kvm_flush_icache_range).

That makes sense.

> I wonder if there would be a reason to add a KVM_FLUSH_ICACHE ioctl
> though. Could a virtually-indexed/virtually-tagged icache require
> flushing by guest address instead of host address?

No PPC platform I care about has a VI/VT icache. I don't know if ARM has
any - but I'd prefer to keep this as simple as possible for as long as
we can. Newer POWER chips even just do cache snooping and don't need all
this manual cache synchronization nonsense anymore.

Alex
On 11 December 2013 14:18, mihai.caraman@freescale.com
<mihai.caraman@freescale.com> wrote:
>> From: Peter Maydell [mailto:peter.maydell@linaro.org]
>> If the architecture spec says that a freshly reset physical CPU has
>> coherent icache and dcache, then resetting the vCPU should also
>> ensure the icache and dcache are coherent, so one way to solve
>> this would be just to make sure that vcpu reset did the right thing.
>
> This is not related to the reset operation. The Freescale e500 core
> family does not ensure coherency between the data and instruction
> caches. This is an extract from the reference manual:
>
> 'When a processor modifies any memory location that can contain an
> instruction, software must ensure that the instruction cache is made
> consistent with data memory and that the modifications are made visible
> to the instruction fetching mechanism. This must be done even if the
> cache is disabled or if the page is marked caching-inhibited.'
>
> So it's the loader's duty to synchronize the instruction cache.

But these are (emulated) ROMs, not an emulated bootloader. They ought to
work like actual ROMs: QEMU as the emulator of the system/devices
provides the contents of physical address space; KVM as the emulator of
the CPU provides a CPU which doesn't start up executing from rubbish in
its icache. (This matches how a real physical CPU executes its first
instruction by really going out to the ROM, not by looking at its
cache.)

thanks
-- PMM
On 11.12.2013, at 16:15, Alexander Graf <agraf@suse.de> wrote:

> Well, this really is a simplified view of the world.
>
> On real hardware the system boots up with caches disabled. Firmware is
> then responsible for enabling caches and flushing things as it goes.
> Firmware loads the kernel into ram, flushing the icache on those
> regions it wrote to along the way. The kernel boots and every time it
> faults in a page, it flushes caches for that page.
>
> So really the problem is that we're skipping the "cache disabled
> firmware" step. With this patch, we're simulating a bootloader's
> behavior when writing a blob into guest memory.

The cache synchronization is required by self-modifying code, not just
bootloaders.

-Mike
On 11.12.2013, at 15:25, Peter Maydell <peter.maydell@linaro.org> wrote:

> On 11 December 2013 14:18, mihai.caraman@freescale.com
> <mihai.caraman@freescale.com> wrote:
>> This is not related to the reset operation. The Freescale e500 core
>> family does not ensure coherency between the data and instruction
>> caches. This is an extract from the reference manual:
>>
>> 'When a processor modifies any memory location that can contain an
>> instruction, software must ensure that the instruction cache is made
>> consistent with data memory and that the modifications are made visible
>> to the instruction fetching mechanism. This must be done even if the
>> cache is disabled or if the page is marked caching-inhibited.'
>>
>> So it's the loader's duty to synchronize the instruction cache.
>
> But these are (emulated) ROMs, not an emulated bootloader.
> They ought to work like actual ROMs: QEMU as the emulator

No, they don't. Real ROMs lie in cache-inhibited memory and are only
copied / shadowed into RAM by firmware. We don't do that with QEMU.

> of the system/devices provides the contents of physical address
> space; KVM as the emulator of the CPU provides a CPU which
> doesn't start up executing from rubbish in its icache. (This matches
> how a real physical CPU executes its first instruction by really
> going out to the ROM, not by looking at its cache.)

KVM can't even execute from real ROM (MMIO) regions.

Alex
On 11.12.2013, at 15:07, Peter Maydell <peter.maydell@linaro.org> wrote:

> But these are (emulated) ROMs, not an emulated bootloader.
> They ought to work like actual ROMs: QEMU as the emulator
> of the system/devices provides the contents of physical address
> space; KVM as the emulator of the CPU provides a CPU which
> doesn't start up executing from rubbish in its icache. (This matches
> how a real physical CPU executes its first instruction by really
> going out to the ROM, not by looking at its cache.)

For the ppce500 machine, QEMU calls load_uimage2()/load_elf(),
effectively loading the image at address 0 instead of handling it as a
raw blob. We do not yet run a bootloader inside the VM.

-Mike
On Wed, 2013-12-11 at 13:56 +0000, Peter Maydell wrote:
> On 11 December 2013 13:23, Alexander Graf <agraf@suse.de> wrote:
>> The guest expects that its data and instruction cache view of the world
>> is 100% consistent when it initially boots. This works just fine on
>> initial rom population for the first boot.
>>
>> However, when we reboot and then repopulate the rom region there could
>> be old code still stuck in the instruction cache, giving the guest an
>> inconsistent view of the world when we're using kvm.
>>
>> So we need to invalidate the icache every time we write a rom into guest
>> address space. We do not need to do this for every DMA since the guest
>> expects it has to flush the icache manually in that case.
>>
>> @@ -2033,6 +2034,13 @@ void cpu_physical_memory_write_rom(hwaddr addr,
>>          ptr = qemu_get_ram_ptr(addr1);
>>          memcpy(ptr, buf, l);
>>          invalidate_and_set_dirty(addr1, l);
>> +        if (kvm_enabled()) {
>> +            /*
>> +             * The guest may want to directly execute from the rom region,
>> +             * so we better invalidate its icache
>> +             */
>> +            flush_icache_range((uintptr_t)ptr, (uintptr_t)ptr + l);
>> +        }
>
> I bet these aren't the only places where code gets written
> to guest memory. Also are you sure flush_icache_range()
> works correctly when multiple threads (multiple vCPUs,
> potentially executing on different host CPUs) are involved?

On PPC these cache operations broadcast, and are the architecturally
defined way of doing self-modifying code.

> The TCG case only needs to care about "this thread writes code
> to memory that it will itself later execute", not any kind of
> cross-host-CPU flushing.

Can't the TCG thread get migrated between CPUs?
> There was a huge thread on kvmarm earlier this year
> https://lists.cs.columbia.edu/pipermail/kvmarm/2013-August/006716.html
> about a similar sort of issue, and I think the conclusion was that
> the kernel basically had to deal with the problem itself [though
> the thread is rather confusing...]. I've cc'd Marc Z in the hope
> he remembers the ARM specific detail...

Hmm, a good point is raised in that thread regarding what happens if a
page is swapped out and then back in:
https://lists.cs.columbia.edu/pipermail/kvmarm/2013-August/006738.html

I think the usual mechanism PPC booke uses to handle this is currently
not effective with KVM, because we get the pages via
__get_user_pages_fast() rather than by enabling execute permission in an
ISI (see the second instance of set_pte_filter() in
arch/powerpc/mm/pgtable.c). Even if we fix that to invoke the cache
cleaning code when KVM acquires a page, though, QEMU would still need to
flush if it modifies/loads code on a page that may already be marked in
the kernel as having been cleaned.

-Scott
On 13/12/2013 20:18, Scott Wood wrote:
>> Also are you sure flush_icache_range()
>> works correctly when multiple threads (multiple vCPUs,
>> potentially executing on different host CPUs) are involved?
>
> On PPC these cache operations broadcast, and are the architecturally
> defined way of doing self-modifying code.

I expect that to be the same on any cache-coherent system. On a VIVT
cache with shadow paging, some kernel collaboration may be necessary
because you have to flush using guest addresses rather than host
addresses (or alternatively you have to flush a whole context id). But
we can fix the problem when it happens.

Paolo
On 14 December 2013 10:58, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 13/12/2013 20:18, Scott Wood wrote:
>>> Also are you sure flush_icache_range()
>>> works correctly when multiple threads (multiple vCPUs,
>>> potentially executing on different host CPUs) are involved?
>>
>> On PPC these cache operations broadcast, and are the architecturally
>> defined way of doing self-modifying code.
>
> I expect that to be the same on any cache-coherent system.

On ARM you have the choice of "clean/invalidate sufficient to run code
on this CPU" vs "sufficient to run code on any CPU" (the latter is
obviously a more expensive operation). As it happens, I've checked, and
the syscall/gcc primitive we use does the latter, not the former. But I
did have to check.

I think SPARC is also OK (the manual defines the 'flush' insn as working
for all CPUs). Haven't checked others.

thanks
-- PMM
diff --git a/exec.c b/exec.c
index f4b9ef2..cc63eb6 100644
--- a/exec.c
+++ b/exec.c
@@ -50,6 +50,7 @@
 #include "translate-all.h"

 #include "exec/memory-internal.h"
+#include "qemu/cache-utils.h"

 //#define DEBUG_SUBPAGE
@@ -2033,6 +2034,13 @@ void cpu_physical_memory_write_rom(hwaddr addr,
             ptr = qemu_get_ram_ptr(addr1);
             memcpy(ptr, buf, l);
             invalidate_and_set_dirty(addr1, l);
+            if (kvm_enabled()) {
+                /*
+                 * The guest may want to directly execute from the rom region,
+                 * so we better invalidate its icache
+                 */
+                flush_icache_range((uintptr_t)ptr, (uintptr_t)ptr + l);
+            }
         }
         len -= l;
         buf += l;
diff --git a/hw/core/loader.c b/hw/core/loader.c
index 60d2ebd..4f809f3 100644
--- a/hw/core/loader.c
+++ b/hw/core/loader.c
@@ -51,6 +51,7 @@
 #include "hw/nvram/fw_cfg.h"
 #include "exec/memory.h"
 #include "exec/address-spaces.h"
+#include "qemu/cache-utils.h"

 #include <zlib.h>
@@ -619,6 +620,7 @@ static void *rom_set_mr(Rom *rom, Object *owner, const char *name)

     data = memory_region_get_ram_ptr(rom->mr);
     memcpy(data, rom->data, rom->datasize);
+    flush_icache_range((uintptr_t)data, (uintptr_t)data + rom->datasize);

     return data;
 }
@@ -777,6 +779,14 @@ static void rom_reset(void *unused)
         if (rom->mr) {
             void *host = memory_region_get_ram_ptr(rom->mr);
             memcpy(host, rom->data, rom->datasize);
+            if (kvm_enabled()) {
+                /*
+                 * The guest may want to directly execute from the rom region,
+                 * so we better invalidate its icache
+                 */
+                flush_icache_range((uintptr_t)host,
+                                   (uintptr_t)host + rom->datasize);
+            }
         } else {
             cpu_physical_memory_write_rom(rom->addr, rom->data,
                                           rom->datasize);
         }
We use the rom infrastructure to write firmware and/or initial kernel
blobs into guest address space. So we're essentially the layer before
the first code that gets executed inside the guest.

The guest expects that its data and instruction cache view of the world
is 100% consistent when it initially boots. This works just fine on
initial rom population for the first boot.

However, when we reboot and then repopulate the rom region there could
be old code still stuck in the instruction cache, giving the guest an
inconsistent view of the world when we're using kvm.

So we need to invalidate the icache every time we write a rom into guest
address space. We do not need to do this for every DMA since the guest
expects it has to flush the icache manually in that case.

This fixes random reboot issues on e5500 (booke ppc) for me.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 exec.c           |  8 ++++++++
 hw/core/loader.c | 10 ++++++++++
 2 files changed, 18 insertions(+)