| Message ID | 5d48957b-780b-aa9c-7061-cba6808909b4@redhat.com |
|---|---|
| State | New |
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 02/08/2016 08:37, Alex Bennée wrote:
>>> - in notdirty_mem_write, care must be put in the ordering of
>>> tb_invalidate_phys_page_fast (which itself calls tlb_unprotect_code and
>>> takes the tb_lock in tb_invalidate_phys_page_range) and tlb_set_dirty.
>>> At least it seems to me that the call to tb_invalidate_phys_page_fast
>>> should be after the write, but that's not all. Perhaps merge this part
>>> of notdirty_mem_write:
>
> I looked at it again and you are already doing the right thing in patch 19.
> It's possible to simplify it a bit though like this:
>
> diff --git a/exec.c b/exec.c
> index c8389f9..7850c39 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1944,9 +1944,6 @@ ram_addr_t qemu_ram_addr_from_host(void *ptr)
>  static void notdirty_mem_write(void *opaque, hwaddr ram_addr,
>                                 uint64_t val, unsigned size)
>  {
> -    if (!cpu_physical_memory_get_dirty_flag(ram_addr, DIRTY_MEMORY_CODE)) {
> -        tb_invalidate_phys_page_fast(ram_addr, size);
> -    }
>      switch (size) {
>      case 1:
>          stb_p(qemu_map_ram_ptr(NULL, ram_addr), val);
> @@ -1960,11 +1957,19 @@ static void notdirty_mem_write(void *opaque, hwaddr ram_addr,
>       */
>      cpu_physical_memory_set_dirty_range(ram_addr, size,
>                                          DIRTY_CLIENTS_NOCODE);
> +    tb_lock();
> +    if (!cpu_physical_memory_get_dirty_flag(ram_addr, DIRTY_MEMORY_CODE)) {
> +        /* tb_invalidate_phys_page_range will call tlb_unprotect_code
> +         * once the last TB in this page is gone.
> +         */
> +        tb_invalidate_phys_page_fast(ram_addr, size);
> +    }
>      /* we remove the notdirty callback only if the code has been
>         flushed */
>      if (!cpu_physical_memory_is_clean(ram_addr)) {
>          tlb_set_dirty(current_cpu, current_cpu->mem_io_vaddr);
>      }
> +    tb_unlock();
>  }
>
>  static bool notdirty_mem_accepts(void *opaque, hwaddr addr,
>
> Anyhow, the next step is to merge either cmpxchg-based atomics
> or iothread-free single-threaded TCG. Either will do. :)

By iothread-free single-threaded TCG you mean dropping the need to grab
the BQL when we start the TCG thread and making the BQL purely an
on-demand/when-needed thing?

The cmpxchg stuff is looking good to me - I still have to do a pass over
rth's patch set since he re-based on the async safe work. In fact once
your updated PULL req is in, even better ;-)

> I think that even iothread-free single-threaded TCG requires this
> TLB stuff, because the iothread's address_space_write (and hence
> invalidate_and_set_dirty) can race against the TCG thread's
> code generation.

Yes.

> Thanks,
>
> Paolo

--
Alex Bennée
On Tue, Sep 27, 2016 at 18:16:45 +0200, Paolo Bonzini wrote:
> Anyhow, the next step is to merge either cmpxchg-based atomics
> or iothread-free single-threaded TCG. Either will do. :)
>
> I think that even iothread-free single-threaded TCG requires this
> TLB stuff, because the iothread's address_space_write (and hence
> invalidate_and_set_dirty) can race against the TCG thread's
> code generation.

What's a quick-and-dirty way to disable the fast-path TLB lookups?
Alex: you told me the monitor has an option for this, but I can't
find it. I'm looking for something that'd go in tcg/i386 to simply
bypass the fast path.

Forcing the slow TLB lookup would be an easy way to then implement
a per-TLB seqlock. I think TLB corruption might explain the crashes I
see when booting Ubuntu in a many-core guest (running on a many-core
host).

Thanks,

		Emilio
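The per-TLB seqlock mentioned above pairs an even/odd sequence counter with the protected data: a writer bumps the counter before and after an update, and a reader retries whenever it saw an odd count or the count changed under it. QEMU carries its own implementation in include/qemu/seqlock.h; the sketch below is a generic, self-contained C11 version for illustration only, and every name in it is assumed rather than taken from the QEMU tree:

```c
/* Minimal seqlock sketch in C11; illustrative only, not QEMU's
 * include/qemu/seqlock.h API.  All atomics use seq_cst for simplicity;
 * a production version would relax the orderings and would also need
 * the protected data itself read/written atomically (or via fences)
 * to avoid a formal data race.
 */
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_uint sequence;               /* odd while a write is in progress */
} SeqLock;

/* Writer side: the caller must already hold whatever lock serializes
 * concurrent writers; the seqlock only protects readers from writers. */
static void seqlock_write_begin(SeqLock *sl)
{
    atomic_fetch_add(&sl->sequence, 1); /* sequence becomes odd */
}

static void seqlock_write_end(SeqLock *sl)
{
    atomic_fetch_add(&sl->sequence, 1); /* sequence becomes even again */
}

/* Reader side: spin until no write is in progress, remember the count. */
static unsigned seqlock_read_begin(SeqLock *sl)
{
    unsigned seq;
    do {
        seq = atomic_load(&sl->sequence);
    } while (seq & 1);
    return seq;
}

/* True if a write overlapped the read section; the reader must retry. */
static bool seqlock_read_retry(SeqLock *sl, unsigned start)
{
    return atomic_load(&sl->sequence) != start;
}
```

Hypothetically wired into the TLB, a slow-path lookup would copy the CPUTLBEntry inside a read_begin/read_retry loop, while tlb_set_page_with_attrs() and the flush paths would bracket their stores with write_begin/write_end; none of that wiring existed in the tree at the time of this thread.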
Emilio G. Cota <cota@braap.org> writes:

> On Tue, Sep 27, 2016 at 18:16:45 +0200, Paolo Bonzini wrote:
>> Anyhow, the next step is to merge either cmpxchg-based atomics
>> or iothread-free single-threaded TCG. Either will do. :)
>>
>> I think that even iothread-free single-threaded TCG requires this
>> TLB stuff, because the iothread's address_space_write (and hence
>> invalidate_and_set_dirty) can race against the TCG thread's
>> code generation.
>
> What's a quick-and-dirty way to disable the fast-path TLB lookups?
> Alex: you told me the monitor has an option for this, but I can't
> find it. I'm looking for something that'd go in tcg/i386 to simply
> bypass the fast path.

Hack up tlb_set_page_with_attrs() to always set one of the TLB_FOO bits
(you might want to invent a new one, as the others do have meanings).

> Forcing the slow TLB lookup would be an easy way to then implement
> a per-TLB seqlock. I think TLB corruption might explain the crashes I
> see when booting Ubuntu in a many-core guest (running on a many-core
> host).

TLB corruption is suspected, but I've never come up with a clean test
case to force it. I find heavy compiles in a system image can do it,
but my SMC torture test never crashes.

> Thanks,
>
> Emilio

--
Alex Bennée
On 09/27/2016 03:29 PM, Emilio G. Cota wrote:
> What's a quick-and-dirty way to disable the fast-path TLB lookups?
> Alex: you told me the monitor has an option for this, but I can't
> find it. I'm looking for something that'd go in tcg/i386 to simply
> bypass the fast path.

There is no easy way. If you need that, you'd have to significantly
modify the tcg backend.


r~
Richard Henderson <rth@twiddle.net> writes:

> On 09/27/2016 03:29 PM, Emilio G. Cota wrote:
>> What's a quick-and-dirty way to disable the fast-path TLB lookups?
>> Alex: you told me the monitor has an option for this, but I can't
>> find it. I'm looking for something that'd go in tcg/i386 to simply
>> bypass the fast path.
>
> There is no easy way. If you need that, you'd have to significantly
> modify the tcg backend.

Surely all the backends force the slow path when any of the
TLB_FLAGS_MASK bits are set? Unless adding an extra bit is going to run
out of spare bits on some backends?

>
>
> r~
On 09/27/2016 04:32 PM, Alex Bennée wrote:
>
> Richard Henderson <rth@twiddle.net> writes:
>
>> On 09/27/2016 03:29 PM, Emilio G. Cota wrote:
>>> What's a quick-and-dirty way to disable the fast-path TLB lookups?
>>> Alex: you told me the monitor has an option for this, but I can't
>>> find it. I'm looking for something that'd go in tcg/i386 to simply
>>> bypass the fast path.
>>
>> There is no easy way. If you need that, you'd have to significantly
>> modify the tcg backend.
>
> Surely all the backends force the slow path when any of the
> TLB_FLAGS_MASK bits are set? Unless adding an extra bit is going to run
> out of spare bits on some backends?

You could do that, yes. You also need to adjust softmmu_template.h to
match.


r~
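Concretely, the hack converged on here might look something like the sketch below. It is untested; the TLB_FORCE_SLOW name, the bit position, and the hunk contexts are invented for illustration rather than taken from the thread:

```diff
diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ ... @@
 /* Set if TLB entry is an IO callback.  */
 #define TLB_MMIO        (1 << 5)
+/* Debug hack: force every access through the slow path.  */
+#define TLB_FORCE_SLOW  (1 << 2)
 
-#define TLB_FLAGS_MASK  (TLB_INVALID_MASK | TLB_NOTDIRTY | TLB_MMIO)
+#define TLB_FLAGS_MASK \
+    (TLB_INVALID_MASK | TLB_NOTDIRTY | TLB_MMIO | TLB_FORCE_SLOW)
diff --git a/cputlb.c b/cputlb.c
--- a/cputlb.c
+++ b/cputlb.c
@@ ... @@ (in tlb_set_page_with_attrs, where the entry's comparator is built)
+    /* Taint every entry so the generated fast-path compare never matches
+     * and every access falls through to the softmmu helpers.  */
+    address |= TLB_FORCE_SLOW;
```

Per Richard's point, the slow-path helpers in softmmu_template.h would also need their checks on `tlb_addr & ~TARGET_PAGE_MASK` taught to ignore the new bit, so that a forced-slow RAM access is not misrouted through the MMIO path.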