Patchwork target-i386: fix cmpxchg instruction emulation

login
register
mail settings
Submitter Andreas Gustafsson
Date Dec. 10, 2011, 9:59 a.m.
Message ID <20111210095911.AE7BE75E5D@guava.gson.org>
Download mbox | patch
Permalink /patch/130517/
State New
Headers show

Comments

Andreas Gustafsson - Dec. 10, 2011, 9:59 a.m.
When the i386 cmpxchg instruction is executed with a memory operand
and the comparison result is "unequal", do the memory write before
changing the accumulator instead of the other way around, because
otherwise the new accumulator value will incorrectly be used in the
comparison when the instruction is restarted after a page fault.

This bug was originally reported on 2010-04-25 as
https://bugs.launchpad.net/qemu/+bug/569760 

Signed-off-by: Andreas Gustafsson <gson@gson.org>
---
malc - Dec. 10, 2011, 10:29 a.m.
On Sat, 10 Dec 2011, Andreas Gustafsson wrote:

> When the i386 cmpxchg instruction is executed with a memory operand
> and the comparison result is "unequal", do the memory write before
> changing the accumulator instead of the other way around, because
> otherwise the new accumulator value will incorrectly be used in the
> comparison when the instruction is restarted after a page fault.
> 
> This bug was originally reported on 2010-04-25 as
> https://bugs.launchpad.net/qemu/+bug/569760 

I'm starting to lose count on how many times this patch resurfaces here.
Please see http://web.archiveorange.com/archive/v/1XS1vaW9MlyMzguWl5fE
and the links in the thread.

> 
> Signed-off-by: Andreas Gustafsson <gson@gson.org>
> ---
> --- translate.c.orig	2011-12-09 18:21:28.000000000 +0200
> +++ translate.c	2011-12-09 18:21:24.000000000 +0200
> @@ -4870,20 +4870,23 @@ static target_ulong disas_insn(DisasCont
>              tcg_gen_sub_tl(t2, cpu_regs[R_EAX], t0);
>              gen_extu(ot, t2);
>              tcg_gen_brcondi_tl(TCG_COND_EQ, t2, 0, label1);
> +            label2 = gen_new_label();
>              if (mod == 3) {
> -                label2 = gen_new_label();
>                  gen_op_mov_reg_v(ot, R_EAX, t0);
>                  tcg_gen_br(label2);
>                  gen_set_label(label1);
>                  gen_op_mov_reg_v(ot, rm, t1);
> -                gen_set_label(label2);
>              } else {
> -                tcg_gen_mov_tl(t1, t0);
> +                /* perform no-op store cycle like physical cpu; must be
> +                   before changing accumulator to ensure idempotency if
> +                   the store faults and the instruction is restarted */
> +                gen_op_st_v(ot + s->mem_index, t0, a0);
>                  gen_op_mov_reg_v(ot, R_EAX, t0);
> +                tcg_gen_br(label2);
>                  gen_set_label(label1);
> -                /* always store */
>                  gen_op_st_v(ot + s->mem_index, t1, a0);
>              }
> +            gen_set_label(label2);
>              tcg_gen_mov_tl(cpu_cc_src, t0);
>              tcg_gen_mov_tl(cpu_cc_dst, t2);
>              s->cc_op = CC_OP_SUBB + ot;
>
Andreas Gustafsson - Dec. 10, 2011, 11:51 a.m.
malc wrote:
> On Sat, 10 Dec 2011, Andreas Gustafsson wrote:
> 
> > When the i386 cmpxchg instruction is executed with a memory operand
> > and the comparison result is "unequal", do the memory write before
> > changing the accumulator instead of the other way around, because
> > otherwise the new accumulator value will incorrectly be used in the
> > comparison when the instruction is restarted after a page fault.
> > 
> > This bug was originally reported on 2010-04-25 as
> > https://bugs.launchpad.net/qemu/+bug/569760 
> 
> I'm starting to lose count on how many times this patch resurfaces here.

This was the first time I posted the patch to the list.  I originally
attached it to the bug launchpad ticket since the instructions
on the qemu web site at the time did not clearly indicate otherwise,
and was then specifically asked to send it to the mailing list.  It
took me some time, but that's what I just did.

> Please see http://web.archiveorange.com/archive/v/1XS1vaW9MlyMzguWl5fE
> and the links in the thread.

Thank you for the links.  This is the first time I see them, as no one
had CC:d me on that discussion, nor updated the launchpad ticket.

I found <http://emulators.com/docs/nx33_qemu_0125.htm> especially
illuminating.  It should be required reading for anyone contemplating
the use of qemu for x86 emulation.

In your message of 8 Sep 2010 (which I never received), you said:
> This is tab damaged.

This has already been addressed in the version I sent to the list
today.

> Secondly it looks as if this addresses only a small part of a general
> problem [1],

Since the general problem still hasn't been addressed, the partial fix
is still better than nothing.  It is being applied as part of the
local qemu patches in pkgsrc (www.pkgsrc.org), and has enabled the use
of qemu for extensive automated regression testing of NetBSD which
would not be possible without the patch.

> also in a very naive and inefficient way,

Inefficient in what way?  The generated code only grows by a single
unconditional branch.

> while also opening a hole can of worms (should real parallel execution
> for TCG be ever implemented)

Would you care to explain what this can of worms is, and how it is
not already open in the original code?
malc - Dec. 10, 2011, 12:34 p.m.
On Sat, 10 Dec 2011, Andreas Gustafsson wrote:

> malc wrote:
> > On Sat, 10 Dec 2011, Andreas Gustafsson wrote:
> > 
> > > When the i386 cmpxchg instruction is executed with a memory operand
> > > and the comparison result is "unequal", do the memory write before
> > > changing the accumulator instead of the other way around, because
> > > otherwise the new accumulator value will incorrectly be used in the
> > > comparison when the instruction is restarted after a page fault.
> > > 
> > > This bug was originally reported on 2010-04-25 as
> > > https://bugs.launchpad.net/qemu/+bug/569760 
> > 
> > I'm starting to lose count on how many times this patch resurfaces here.
> 
> This was the first time I posted the patch to the list.  I originally
> attached it to the bug launchpad ticket since the instructions
> on the qemu web site at the time did not clearly indicate otherwise,
> and was then specifically asked to send it to the mailing list.  It
> took me some time, but that's what I just did.
> 

I see. The posters before never mentioned the authorship of the patch
so i assumed it was theirs..

> > Please see http://web.archiveorange.com/archive/v/1XS1vaW9MlyMzguWl5fE
> > and the links in the thread.
> 
> Thank you for the links.  This is the first time I see them, as no one
> had CC:d me on that discussion, nor updated the launchpad ticket.
> 
> I found <http://emulators.com/docs/nx33_qemu_0125.htm> especially
> illuminating.  It should be required reading for anyone contemplating
> the use of qemu for x86 emulation.
> 
> In your message of 8 Sep 2010 (which I never received), you said:
> > This is tab damaged.
> 
> This has already been addressed in the version I sent to the list
> today.
> 
> > Secondly it looks as if this addresses only a small part of a general
> > problem [1],
> 
> Since the general problem still hasn't been addressed, the partial fix
> is still better than nothing.  It is being applied as part of the
> local qemu patches in pkgsrc (www.pkgsrc.org), and has enabled the use
> of qemu for extensive automated regression testing of NetBSD which
> would not be possible without the patch.
> 

Here's where we disagree.

> > also in a very naive and inefficient way,
> 
> Inefficient in what way?  The generated code only grows by a single
> unconditional branch.

The generated code grows by a memory write (which is not what the hardware
does).

> 
> > while also opening a hole can of worms (should real parallel execution
> > for TCG be ever implemented)
> 
> Would you care to explain what this can of worms is, and how it is
> not already open in the original code?
> 

Not at the moment, the details are fuzzy. What it boilds down to, i think,
is that this patch replaces something that's supposed to be atomic with
thing which is not.

I recall having discussion aboutit with Fabrcie (in private) and i
blieve (and if my memory serves me) we came to the conclusion that
there's a way forward w.r.t. to this issue i just never came around of
implementing it, i can try to dig out the old mails and share the
highlights with you if you are interested.
Andreas Gustafsson - Dec. 10, 2011, 2:15 p.m.
malc wrote:
> > Inefficient in what way?  The generated code only grows by a single
> > unconditional branch.
> 
> The generated code grows by a memory write

Yes, an additional store instruction is generated, but the number of
store instructions *executed* does not change.  The original code
already does a store in both the compare-equal and the compare-unequal
case, by branching to the same store instruction; with my patch, a
store is still done in both cases, but the store instruction executed
is a separate one in each.

> (which is not what the hardware does).

The hardware does an atomic read-modify-write; qemu emulates this as a
separate read and write, and like the hardware, always does the write
part whether or not the "modify" part actually modified the data.  My
patch does *not* change this in any way.

> I recall having discussion aboutit with Fabrcie (in private) and i
> blieve (and if my memory serves me) we came to the conclusion that
> there's a way forward w.r.t. to this issue i just never came around of
> implementing it, i can try to dig out the old mails and share the
> highlights with you if you are interested.

Yes, please.

Patch

--- translate.c.orig	2011-12-09 18:21:28.000000000 +0200
+++ translate.c	2011-12-09 18:21:24.000000000 +0200
@@ -4870,20 +4870,23 @@  static target_ulong disas_insn(DisasCont
             tcg_gen_sub_tl(t2, cpu_regs[R_EAX], t0);
             gen_extu(ot, t2);
             tcg_gen_brcondi_tl(TCG_COND_EQ, t2, 0, label1);
+            label2 = gen_new_label();
             if (mod == 3) {
-                label2 = gen_new_label();
                 gen_op_mov_reg_v(ot, R_EAX, t0);
                 tcg_gen_br(label2);
                 gen_set_label(label1);
                 gen_op_mov_reg_v(ot, rm, t1);
-                gen_set_label(label2);
             } else {
-                tcg_gen_mov_tl(t1, t0);
+                /* perform no-op store cycle like physical cpu; must be
+                   before changing accumulator to ensure idempotency if
+                   the store faults and the instruction is restarted */
+                gen_op_st_v(ot + s->mem_index, t0, a0);
                 gen_op_mov_reg_v(ot, R_EAX, t0);
+                tcg_gen_br(label2);
                 gen_set_label(label1);
-                /* always store */
                 gen_op_st_v(ot + s->mem_index, t1, a0);
             }
+            gen_set_label(label2);
             tcg_gen_mov_tl(cpu_cc_src, t0);
             tcg_gen_mov_tl(cpu_cc_dst, t2);
             s->cc_op = CC_OP_SUBB + ot;