Message ID | 20111210095911.AE7BE75E5D@guava.gson.org |
---|---|
State | New |
Headers | show |
On Sat, 10 Dec 2011, Andreas Gustafsson wrote: > When the i386 cmpxchg instruction is executed with a memory operand > and the comparison result is "unequal", do the memory write before > changing the accumulator instead of the other way around, because > otherwise the new accumulator value will incorrectly be used in the > comparison when the instruction is restarted after a page fault. > > This bug was originally reported on 2010-04-25 as > https://bugs.launchpad.net/qemu/+bug/569760 I'm starting to lose count on how many times this patch resurfaces here. Please see http://web.archiveorange.com/archive/v/1XS1vaW9MlyMzguWl5fE and the links in the thread. > > Signed-off-by: Andreas Gustafsson <gson@gson.org> > --- > --- translate.c.orig 2011-12-09 18:21:28.000000000 +0200 > +++ translate.c 2011-12-09 18:21:24.000000000 +0200 > @@ -4870,20 +4870,23 @@ static target_ulong disas_insn(DisasCont > tcg_gen_sub_tl(t2, cpu_regs[R_EAX], t0); > gen_extu(ot, t2); > tcg_gen_brcondi_tl(TCG_COND_EQ, t2, 0, label1); > + label2 = gen_new_label(); > if (mod == 3) { > - label2 = gen_new_label(); > gen_op_mov_reg_v(ot, R_EAX, t0); > tcg_gen_br(label2); > gen_set_label(label1); > gen_op_mov_reg_v(ot, rm, t1); > - gen_set_label(label2); > } else { > - tcg_gen_mov_tl(t1, t0); > + /* perform no-op store cycle like physical cpu; must be > + before changing accumulator to ensure idempotency if > + the store faults and the instruction is restarted */ > + gen_op_st_v(ot + s->mem_index, t0, a0); > gen_op_mov_reg_v(ot, R_EAX, t0); > + tcg_gen_br(label2); > gen_set_label(label1); > - /* always store */ > gen_op_st_v(ot + s->mem_index, t1, a0); > } > + gen_set_label(label2); > tcg_gen_mov_tl(cpu_cc_src, t0); > tcg_gen_mov_tl(cpu_cc_dst, t2); > s->cc_op = CC_OP_SUBB + ot; >
malc wrote: > On Sat, 10 Dec 2011, Andreas Gustafsson wrote: > > > When the i386 cmpxchg instruction is executed with a memory operand > > and the comparison result is "unequal", do the memory write before > > changing the accumulator instead of the other way around, because > > otherwise the new accumulator value will incorrectly be used in the > > comparison when the instruction is restarted after a page fault. > > > > This bug was originally reported on 2010-04-25 as > > https://bugs.launchpad.net/qemu/+bug/569760 > > I'm starting to lose count on how many times this patch resurfaces here. This was the first time I posted the patch to the list. I originally attached it to the bug launchpad ticket since the instructions on the qemu web site at the time did not clearly indicate otherwise, and was then specifically asked to send it to the mailing list. It took me some time, but that's what I just did. > Please see http://web.archiveorange.com/archive/v/1XS1vaW9MlyMzguWl5fE > and the links in the thread. Thank you for the links. This is the first time I see them, as no one had CC:d me on that discussion, nor updated the launchpad ticket. I found <http://emulators.com/docs/nx33_qemu_0125.htm> especially illuminating. It should be required reading for anyone contemplating the use of qemu for x86 emulation. In your message of 8 Sep 2010 (which I never received), you said: > This is tab damaged. This has already been addressed in the version I sent to the list today. > Secondly it looks as if this addresses only a small part of a general > problem [1], Since the general problem still hasn't been addressed, the partial fix is still better than nothing. It is being applied as part of the local qemu patches in pkgsrc (www.pkgsrc.org), and has enabled the use of qemu for extensive automated regression testing of NetBSD which would not be possible without the patch. > also in a very naive and inefficient way, Inefficient in what way? The generated code only grows by a single unconditional branch. > while also opening a hole can of worms (should real parallel execution > for TCG be ever implemented) Would you care to explain what this can of worms is, and how it is not already open in the original code?
On Sat, 10 Dec 2011, Andreas Gustafsson wrote: > malc wrote: > > On Sat, 10 Dec 2011, Andreas Gustafsson wrote: > > > > > When the i386 cmpxchg instruction is executed with a memory operand > > > and the comparison result is "unequal", do the memory write before > > > changing the accumulator instead of the other way around, because > > > otherwise the new accumulator value will incorrectly be used in the > > > comparison when the instruction is restarted after a page fault. > > > > > > This bug was originally reported on 2010-04-25 as > > > https://bugs.launchpad.net/qemu/+bug/569760 > > > > I'm starting to lose count on how many times this patch resurfaces here. > > This was the first time I posted the patch to the list. I originally > attached it to the bug launchpad ticket since the instructions > on the qemu web site at the time did not clearly indicate otherwise, > and was then specifically asked to send it to the mailing list. It > took me some time, but that's what I just did. > I see. The posters before never mentioned the authorship of the patch so i assumed it was theirs.. > > Please see http://web.archiveorange.com/archive/v/1XS1vaW9MlyMzguWl5fE > > and the links in the thread. > > Thank you for the links. This is the first time I see them, as no one > had CC:d me on that discussion, nor updated the launchpad ticket. > > I found <http://emulators.com/docs/nx33_qemu_0125.htm> especially > illuminating. It should be required reading for anyone contemplating > the use of qemu for x86 emulation. > > In your message of 8 Sep 2010 (which I never received), you said: > > This is tab damaged. > > This has already been addressed in the version I sent to the list > today. > > > Secondly it looks as if this addresses only a small part of a general > > problem [1], > > Since the general problem still hasn't been addressed, the partial fix > is still better than nothing. It is being applied as part of the > local qemu patches in pkgsrc (www.pkgsrc.org), and has enabled the use > of qemu for extensive automated regression testing of NetBSD which > would not be possible without the patch. > Here's where we disagree. > > also in a very naive and inefficient way, > > Inefficient in what way? The generated code only grows by a single > unconditional branch. The generated code grows by a memory write (which is not what the hardware does). > > > while also opening a hole can of worms (should real parallel execution > > for TCG be ever implemented) > > Would you care to explain what this can of worms is, and how it is > not already open in the original code? > Not at the moment, the details are fuzzy. What it boilds down to, i think, is that this patch replaces something that's supposed to be atomic with thing which is not. I recall having discussion aboutit with Fabrcie (in private) and i blieve (and if my memory serves me) we came to the conclusion that there's a way forward w.r.t. to this issue i just never came around of implementing it, i can try to dig out the old mails and share the highlights with you if you are interested.
malc wrote: > > Inefficient in what way? The generated code only grows by a single > > unconditional branch. > > The generated code grows by a memory write Yes, an additional store instruction is generated, but the number of store instructions *executed* does not change. The original code already does a store in both the compare-equal and the compare-unequal case, by branching to the same store instruction; with my patch, a store is still done in both cases, but the store instruction executed is a separate one in each. > (which is not what the hardware does). The hardware does an atomic read-modify-write; qemu emulates this as a separate read and write, and like the hardware, always does the write part whether or not the "modify" part actually modified the data. My patch does *not* change this in any way. > I recall having discussion aboutit with Fabrcie (in private) and i > blieve (and if my memory serves me) we came to the conclusion that > there's a way forward w.r.t. to this issue i just never came around of > implementing it, i can try to dig out the old mails and share the > highlights with you if you are interested. Yes, please.
--- translate.c.orig 2011-12-09 18:21:28.000000000 +0200 +++ translate.c 2011-12-09 18:21:24.000000000 +0200 @@ -4870,20 +4870,23 @@ static target_ulong disas_insn(DisasCont tcg_gen_sub_tl(t2, cpu_regs[R_EAX], t0); gen_extu(ot, t2); tcg_gen_brcondi_tl(TCG_COND_EQ, t2, 0, label1); + label2 = gen_new_label(); if (mod == 3) { - label2 = gen_new_label(); gen_op_mov_reg_v(ot, R_EAX, t0); tcg_gen_br(label2); gen_set_label(label1); gen_op_mov_reg_v(ot, rm, t1); - gen_set_label(label2); } else { - tcg_gen_mov_tl(t1, t0); + /* perform no-op store cycle like physical cpu; must be + before changing accumulator to ensure idempotency if + the store faults and the instruction is restarted */ + gen_op_st_v(ot + s->mem_index, t0, a0); gen_op_mov_reg_v(ot, R_EAX, t0); + tcg_gen_br(label2); gen_set_label(label1); - /* always store */ gen_op_st_v(ot + s->mem_index, t1, a0); } + gen_set_label(label2); tcg_gen_mov_tl(cpu_cc_src, t0); tcg_gen_mov_tl(cpu_cc_dst, t2); s->cc_op = CC_OP_SUBB + ot;
When the i386 cmpxchg instruction is executed with a memory operand and the comparison result is "unequal", do the memory write before changing the accumulator instead of the other way around, because otherwise the new accumulator value will incorrectly be used in the comparison when the instruction is restarted after a page fault. This bug was originally reported on 2010-04-25 as https://bugs.launchpad.net/qemu/+bug/569760 Signed-off-by: Andreas Gustafsson <gson@gson.org> ---