Patchwork [2/2] Support __ATOMIC_HLE_RELEASE for __atomic_clear/store_n

Submitter Uros Bizjak
Date Jan. 14, 2013, 4:48 p.m.
Message ID <CAFULd4bWUhZ2ZLJA7Cs+oxxdPvfHbzcmMDgB7b11ZhZbD6JF1g@mail.gmail.com>
Permalink /patch/211840/
State New

Comments

Uros Bizjak - Jan. 14, 2013, 4:48 p.m.
On Sun, Jan 13, 2013 at 11:29 PM, Andi Kleen <andi@firstfloor.org> wrote:
> On Sun, Jan 13, 2013 at 11:23:24PM +0100, Uros Bizjak wrote:
>> On Sun, Jan 13, 2013 at 11:12 PM, Andi Kleen <andi@firstfloor.org> wrote:
>> >> >> +(define_insn "atomic_store<mode>_1"
>> >> >> +  [(set (match_operand:ATOMIC 0 "memory_operand" "=m")
>> >> >> +     (unspec:ATOMIC [(match_operand:ATOMIC 1 "<nonmemory_operand>" "<r><i>")
>> >> >> +                     (match_operand:SI 2 "const_int_operand")]
>> >> >> +                    UNSPEC_MOVA))]
>> >> >> +  ""
>> >> >> +  "%K2mov{<imodesuffix>}\t{%1, %0|%0, %1}")
>> >> >
>> >> > Is that the updated pattern you wanted? It looks similar to mine.
>> >>
>> >> Yes the attached patch actually implements all proposed fixes.
>> >
>> > Ok great. Can you just commit it then? It looks good to me.
>>
>> No problem, but what about this part:
>
> Right now it just means it's silently ignored; no wrong code is generated.
> If people are ok with a new target hook I can add one.
> There are some more bugs in this area, like PR55947
>
> Giving a warning is imho less important than supporting this at all.
> So I would prefer to not delay this patch.
>
>>
>> diff --git a/gcc/builtins.c b/gcc/builtins.c
>> index 2b615a1..c283869 100644
>> --- a/gcc/builtins.c
>> +++ b/gcc/builtins.c
>> @@ -5556,6 +5556,8 @@ expand_builtin_atomic_clear (tree exp)
>>        return const0_rtx;
>>      }
>>
>> +  /* need target hook there to check for not hle acquire */
>> +
>>    if (HAVE_atomic_clear)
>>      {
>>        emit_insn (gen_atomic_clear (mem, model));
>>
>> Middle-end support should be implemented before target support is
>> committed. So, please figure out how to emit correct error on
>> unsupported models and get middle-end patch reviewed first. We do get
>> "Error: instruction `mov' after `xacquire' not allowed" assembler
>> error with "xacquire movb $0,mem" asm, though.
>
> The sync.md code is only called for the acquire bit.
>
> The only case where it may happen I guess if someone sets both.

This cannot happen; we reject code that sets both __HLE* flags.

2013-01-14  Uros Bizjak  <ubizjak@gmail.com>
	    Andi Kleen  <ak@linux.intel.com>

	PR target/55948
	* config/i386/sync.md (atomic_store<mode>_1): New pattern.
	(atomic_store<mode>): Call atomic_store<mode>_1 for IX86_HLE_RELEASE
	memmodel flag.

testsuite/ChangeLog

2013-01-14  Andi Kleen  <ak@linux.intel.com>

	PR target/55948
	* gcc.target/i386/hle-clear-rel.c: New file.
	* gcc.target/i386/hle-store-rel.c: New file.

I have committed the attached patch to mainline SVN, after re-testing it
on x86_64-pc-linux-gnu.

Uros.
Andi Kleen - Jan. 14, 2013, 6:06 p.m.
> This cannot happen, we reject code that sets both __HLE* flags.

Good thanks.

BTW I found more HLE bugs. It looks like some of the fetch_op_*
patterns do not always match and fall back to cmpxchg, which
does not generate HLE code correctly. Not fully sure what's
wrong; can you spot any obvious problems? You changed the

(define_insn "atomic_<logic><mode>"

pattern last.

The only one that should really fall back to cmpxchg is nand;
all the others can be generated directly.

This can be seen by commenting in the #if 0 case in the libstdc++
HLE patch test case I sent yesterday. 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966

-Andi
Uros Bizjak - Jan. 14, 2013, 6:40 p.m.
On Mon, Jan 14, 2013 at 7:06 PM, Andi Kleen <andi@firstfloor.org> wrote:
>> This cannot happen, we reject code that sets both __HLE* flags.
>
> BTW I found more HLE bugs, it looks like some of the fetch_op_*
> patterns do not match always and fall back to cmpxchg, which
> does not generate HLE code correctly. Not fully sure what's
> wrong, can you spot any obvious problems? You changed the
>
> (define_insn "atomic_<logic><mode>"
>
> pattern last.

I don't think this is a target problem; these insns work as expected
and are covered by an extensive testsuite in gcc.target/i386/hle-*.c.

Uros.
Andi Kleen - Jan. 14, 2013, 7:01 p.m.
On Mon, Jan 14, 2013 at 07:40:56PM +0100, Uros Bizjak wrote:
> On Mon, Jan 14, 2013 at 7:06 PM, Andi Kleen <andi@firstfloor.org> wrote:
> >> This cannot happen, we reject code that sets both __HLE* flags.
> >
> > BTW I found more HLE bugs, it looks like some of the fetch_op_*
> > patterns do not match always and fall back to cmpxchg, which
> > does not generate HLE code correctly. Not fully sure what's
> > wrong, can you spot any obvious problems? You changed the
> >
> > (define_insn "atomic_<logic><mode>"
> >
> > pattern last.
> 
> I don't think this is a target problem, these insns work as expected
> and are covered by extensive testsuite in gcc.target/i386/hle-*.c.

Well the C++ test cases I wrote didn't work. It may be related to 
how complex the program is. Simple calls as in the original
test suite seem to work.

E.g. instead of "xacquire lock and ..." it ended up with a cmpxchg loop
(which I think is a fallback path). The cmpxchg loop didn't include
a HLE prefix (and simply adding one is not enough; it would need more
changes for successful elision).

Before HLE the cmpxchg code was correct, just somewhat inefficient.
Even with HLE it is technically correct; it just never elides.

I think I would like to fix and, or, xor and disallow HLE for nand.

Here's a test case. Needs the libstdc++ HLE patch posted.

#include <atomic>

#define ACQ memory_order_acquire | __memory_order_hle_acquire
#define REL memory_order_release | __memory_order_hle_release

int main()
{
  using namespace std;
  atomic_ulong au = ATOMIC_VAR_INIT(0);

  if (!au.fetch_and(1, ACQ))
    au.fetch_and(-1, REL);

  unsigned lock = 0;
  __atomic_fetch_and(&lock, 1, __ATOMIC_HLE_ACQUIRE|__ATOMIC_ACQUIRE);

  return 0;
}

The first fetch_and generates: (wrong)


.L2:
        movq    %rax, %rcx
        movq    %rax, %rdx
        andl    $1, %ecx
        lock; cmpxchgq  %rcx, -24(%rsp)
        jne     .L2


The second __atomic_fetch_and generates (correct):


        lock; 
        .byte   0xf2
        andl    $1, -28(%rsp)
.LBE14:


-Andi
Uros Bizjak - Jan. 14, 2013, 7:24 p.m.
On Mon, Jan 14, 2013 at 8:01 PM, Andi Kleen <andi@firstfloor.org> wrote:

> Well the C++ test cases I wrote didn't work. It may be related to
> how complex the program is. Simple calls as in the original
> test suite seem to work.
>
> e.g.  instead of xacquire lock and ... it ended up with a cmpxchg loop
> (which I think is a fallback path). The cmpxchg loop didn't include
> a HLE prefix (and simply adding one is not enoigh, would need more
> changes for successfull elision)
>
> Before HLE the cmpxchg code was correct, just somewhat inefficient.
> Even with HLE it is technically correct, just it'll never elide.
>
> I think I would like to fix and,or,xor and disallow HLE for nand.
>
> Here's a test case. Needs the libstdc++ HLE patch posted.

Can you please attach _preprocessed_ (i.e. add -save-temps to compile
flags) source to a PR?

Uros.

Patch

Index: config/i386/sync.md
===================================================================
--- config/i386/sync.md	(revision 195152)
+++ config/i386/sync.md	(working copy)
@@ -224,8 +224,12 @@ 
 	  DONE;
 	}
 
-      /* Otherwise use a normal store.  */
-      emit_move_insn (operands[0], operands[1]);
+      /* Otherwise use a store.  */
+      if (INTVAL (operands[2]) & IX86_HLE_RELEASE)
+	emit_insn (gen_atomic_store<mode>_1 (operands[0], operands[1],
+					     operands[2]));
+      else
+	emit_move_insn (operands[0], operands[1]);
     }
   /* ... followed by an MFENCE, if required.  */
   if (model == MEMMODEL_SEQ_CST)
@@ -233,6 +237,14 @@ 
   DONE;
 })
 
+(define_insn "atomic_store<mode>_1"
+  [(set (match_operand:ATOMIC 0 "memory_operand" "=m")
+	(unspec:ATOMIC [(match_operand:ATOMIC 1 "<nonmemory_operand>" "<r><i>")
+			(match_operand:SI 2 "const_int_operand")]
+		       UNSPEC_MOVA))]
+  ""
+  "%K2mov{<imodesuffix>}\t{%1, %0|%0, %1}")
+
 (define_insn_and_split "atomic_storedi_fpu"
   [(set (match_operand:DI 0 "memory_operand" "=m,m,m")
 	(unspec:DI [(match_operand:DI 1 "register_operand" "x,m,?r")]
Index: testsuite/gcc.target/i386/hle-clear-rel.c
===================================================================
--- testsuite/gcc.target/i386/hle-clear-rel.c	(revision 0)
+++ testsuite/gcc.target/i386/hle-clear-rel.c	(working copy)
@@ -0,0 +1,9 @@ 
+/* { dg-do compile } */
+/* { dg-options "-mhle" } */
+/* { dg-final { scan-assembler "\[ \n\t\]+\(xrelease\|\.byte\[ \t\]+0xf3\)\[ \t\n\]+mov" } } */
+
+void
+hle_clear (char *p, int v)
+{
+  __atomic_clear (p, __ATOMIC_RELEASE | __ATOMIC_HLE_RELEASE);
+}
Index: testsuite/gcc.target/i386/hle-store-rel.c
===================================================================
--- testsuite/gcc.target/i386/hle-store-rel.c	(revision 0)
+++ testsuite/gcc.target/i386/hle-store-rel.c	(working copy)
@@ -0,0 +1,9 @@ 
+/* { dg-do compile } */
+/* { dg-options "-mhle" } */
+/* { dg-final { scan-assembler "\[ \n\t\]+\(xrelease\|\.byte\[ \t\]+0xf3\)\[ \t\n\]+mov" } } */
+
+void
+hle_store (int *p, int v)
+{
+  __atomic_store_n (p, v, __ATOMIC_RELEASE | __ATOMIC_HLE_RELEASE);
+}