Patchwork [libcpp] : Use x86 __builtin_ia32_pcmpestri128 instead of asm.

login
register
mail settings
Submitter Uros Bizjak
Date June 18, 2012, 8:19 p.m.
Message ID <CAFULd4a=HVQrmm9wEsoSgPEeVB6x9U+yJk1OjUOsTSeiiquJQw@mail.gmail.com>
Download mbox | patch
Permalink /patch/165585/
State New
Headers show

Comments

Uros Bizjak - June 18, 2012, 8:19 p.m.
Hello!

Following the patch that allows unaligned operands in pcmpestri [1],
we can substitute x86 asm in lex.c with equivalent builtin functions.

2011-06-18  Uros Bizjak  <ubizjak@gmail.com>

	* lex.c (search_line_sse42): Use __builtin_ia32_loaddqu and
	__builtin_ia32_pcmpestri128 instead of asm.

Bootstrapped and regression tested on x86_64-pc-linux-gnu SSE4.2
target. Also, I have checked that the same code is generated for
changed function.

OK for mainline?

[1] http://gcc.gnu.org/ml/gcc-patches/2012-06/msg01189.html

Uros.
Richard Henderson - June 18, 2012, 10:07 p.m.
On 2012-06-18 13:19, Uros Bizjak wrote:
>        /* ??? The builtin doesn't understand that the PCMPESTRI read from
>  	 memory need not be aligned.  */
> -      __asm ("%vpcmpestri $0, (%1), %2"
> -	     : "=c"(index) : "r"(s), "x"(search), "a"(4), "d"(16));
> +      sv = __builtin_ia32_loaddqu ((const char *) s);
> +      index = __builtin_ia32_pcmpestri128 (search, 4, sv, 16, 0);
> +


Surely the comment can be removed too then?


r~
Uros Bizjak - June 19, 2012, 6:38 a.m.
On Tue, Jun 19, 2012 at 12:07 AM, Richard Henderson <rth@redhat.com> wrote:
> On 2012-06-18 13:19, Uros Bizjak wrote:
>>        /* ??? The builtin doesn't understand that the PCMPESTRI read from
>>        memory need not be aligned.  */
>> -      __asm ("%vpcmpestri $0, (%1), %2"
>> -          : "=c"(index) : "r"(s), "x"(search), "a"(4), "d"(16));
>> +      sv = __builtin_ia32_loaddqu ((const char *) s);
>> +      index = __builtin_ia32_pcmpestri128 (search, 4, sv, 16, 0);
>> +
>
>
> Surely the comment can be removed too then?

I'm not sure there. The builtin, as defined, expects V16QI operand
with xm constraint. Using:

int test (const char *s1)
{
  const v16qi *p = (const v16qi *)(unsigned long) s1;
  return __builtin_ia32_pcmpistri128 (*p, ...);
}

will generate movdqa before pcmpistri.

With x86 pcmp[ie]str patch, we trick gcc to pass unaligned memory to
the pcmp[ie]str RTX, but we still need __builtin_ia32_loaddqu in front
of __builtin_ia32_pcmpestri128.

Uros.
Uros Bizjak - June 19, 2012, 7:29 a.m.
On Tue, Jun 19, 2012 at 8:38 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Tue, Jun 19, 2012 at 12:07 AM, Richard Henderson <rth@redhat.com> wrote:
>> On 2012-06-18 13:19, Uros Bizjak wrote:
>>>        /* ??? The builtin doesn't understand that the PCMPESTRI read from
>>>        memory need not be aligned.  */
>>> -      __asm ("%vpcmpestri $0, (%1), %2"
>>> -          : "=c"(index) : "r"(s), "x"(search), "a"(4), "d"(16));
>>> +      sv = __builtin_ia32_loaddqu ((const char *) s);
>>> +      index = __builtin_ia32_pcmpestri128 (search, 4, sv, 16, 0);
>>> +
>>
>>
>> Surely the comment can be removed too then?
>
> I'm not sure there. The builtin, as defined, expects V16QI operand
> with xm constraint. Using:
>
> int test (const char *s1)
> {
>  const v16qi *p = (const v16qi *)(unsigned long) s1;
>  return __builtin_ia32_pcmpistri128 (*p, ...);
> }
>
> will generate movdqa before pcmpistri.

Pedantic correction: __builtin_ia32_pcmpistri128 (v16qi_arg, *p, N);

movdqa in front of this builtin will be generated with -O0.

Uros.
Richard Henderson - June 19, 2012, 3:51 p.m.
On 2012-06-18 23:38, Uros Bizjak wrote:
> On Tue, Jun 19, 2012 at 12:07 AM, Richard Henderson <rth@redhat.com> wrote:
>> On 2012-06-18 13:19, Uros Bizjak wrote:
>>>        /* ??? The builtin doesn't understand that the PCMPESTRI read from
>>>        memory need not be aligned.  */
>>> -      __asm ("%vpcmpestri $0, (%1), %2"
>>> -          : "=c"(index) : "r"(s), "x"(search), "a"(4), "d"(16));
>>> +      sv = __builtin_ia32_loaddqu ((const char *) s);
>>> +      index = __builtin_ia32_pcmpestri128 (search, 4, sv, 16, 0);
>>> +
>>
>>
>> Surely the comment can be removed too then?
> 
> I'm not sure there. The builtin, as defined, expects V16QI operand
> with xm constraint.

Fair enough.  I'm ok with the patch as-is.


r~

Patch

Index: lex.c
===================================================================
--- lex.c	(revision 188750)
+++ lex.c	(working copy)
@@ -420,6 +420,7 @@  search_line_sse42 (const uchar *s, const uchar *en
 {
   typedef char v16qi __attribute__ ((__vector_size__ (16)));
   static const v16qi search = { '\n', '\r', '?', '\\' };
+  v16qi sv;
 
   uintptr_t si = (uintptr_t)s;
   uintptr_t index;
@@ -439,8 +440,9 @@  search_line_sse42 (const uchar *s, const uchar *en
 
       /* ??? The builtin doesn't understand that the PCMPESTRI read from
 	 memory need not be aligned.  */
-      __asm ("%vpcmpestri $0, (%1), %2"
-	     : "=c"(index) : "r"(s), "x"(search), "a"(4), "d"(16));
+      sv = __builtin_ia32_loaddqu ((const char *) s);
+      index = __builtin_ia32_pcmpestri128 (search, 4, sv, 16, 0);
+
       if (__builtin_expect (index < 16, 0))
 	goto found;