[AArch64] PR target/85512: Tighten SIMD right shift immediate constraints

Message ID: 5ADF4F77.1070508@foss.arm.com
State: New
Series: [AArch64] PR target/85512: Tighten SIMD right shift immediate constraints

Commit Message

Kyrill Tkachov April 24, 2018, 3:38 p.m. UTC
Hi all,

In this testcase it is possible to generate an invalid SISD shift of zero:
Error: immediate value out of range 1 to 64 at operand 3 -- `sshr v9.2s,v0.2s,0'

The SSHR and USHR instructions require a shift amount from 1 up to the element size.
However, our constraints on the scalar shifts that generate these patterns
allow a shift amount of zero as well, since the pure GP-reg ASR and LSR instructions accept a zero shift amount.

It is unlikely that a shift of zero will survive till the end of compilation, but it's not impossible, as this PR shows.

The patch tightens the constraints in the offending patterns by adding two new constraints
that allow shift amounts in the ranges [1,32] and [1,64], and uses them in *aarch64_ashr_sisd_or_int_<mode>3
and *aarch64_lshr_sisd_or_int_<mode>3.
The left-shift SISD instructions SHL and USHL allow a shift amount of zero, so they don't need adjustment.
The vector shift patterns that map down to SSHR and USHR already enforce the correct immediate range.
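
To make the substitution concrete, here is how the mode attributes instantiate
operand 2 of the logical right shift pattern for SImode (a sketch derived from
the cmode and cmode_simd mappings in the patch below; Uss is the pre-existing
SImode scalar shift-immediate constraint, and DImode uses Usd/Usj analogously):

;; SImode instantiation of operand 2 in *aarch64_lshr_sisd_or_int_si3:
;;   Us<cmode>      -> Uss  (existing scalar shift-immediate constraint)
;;   Us<cmode_simd> -> Usg  (new, accepts only [1,32])
(match_operand:QI 2 "aarch64_reg_or_shift_imm_si"
		     "Uss,r,Usg,w,0")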

Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for trunk?

The issue is latent on the release branches, so I would like to backport it there after some time on trunk.

Thanks,
Kyrill

2018-04-24  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR target/85512
     * config/aarch64/constraints.md (Usg, Usj): New constraints.
     * config/aarch64/iterators.md (cmode_simd): New mode attribute.
     * config/aarch64/aarch64.md (*aarch64_ashr_sisd_or_int_<mode>3):
     Use the above on operand 2.  Reindent.
     (*aarch64_lshr_sisd_or_int_<mode>3): Likewise.

2018-04-24  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR target/85512
     * gcc.dg/pr85512.c: New test.

Comments

James Greenhalgh April 24, 2018, 4:09 p.m. UTC | #1
On Tue, Apr 24, 2018 at 04:38:31PM +0100, Kyrill Tkachov wrote:
> Hi all,
> 
> In this testcase it is possible to generate an invalid SISD shift of zero:
> Error: immediate value out of range 1 to 64 at operand 3 -- `sshr v9.2s,v0.2s,0'
> 
> The SSHR and USHR instructions require a shift from 1 up to the element size.
> However our constraints on the scalar shifts that generate these patterns
> allow a shift amount of zero as well. The pure GP-reg ASR and LSR
> instructions allow a shift amount of zero.
> 
> It is unlikely that a shift of zero will survive till the end of compilation,
> but it's not impossible, as this PR shows.
> 
> The patch tightens up the constraints in the offending patterns by adding two
> new constraints that allow shift amounts [1,32] and [1,64] and using them in
> *aarch64_ashr_sisd_or_int_<mode>3
> and *aarch64_lshr_sisd_or_int_<mode>3.
> The left-shift SISD instructions SHL and USHL allow a shift amount of zero so
> don't need adjustment.  The vector shift patterns that map down to SSHR and
> USHR already enforce the correct immediate range.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Ok for trunk?

OK if the release managers are fine with this.

I was a little nervous that we were tightening restrictions on an instruction
which has caused trouble in the past, but this patch looks right to me.

Thanks,
James

> 2018-04-24  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 
>      PR target/85512
>      * config/aarch64/constraints.md (Usg, Usj): New constraints.
>      * config/aarch64/iterators.md (cmode_simd): New mode attribute.
>      * config/aarch64/aarch64.md (*aarch64_ashr_sisd_or_int_<mode>3):
>      Use the above on operand 2.  Reindent.
>      (*aarch64_lshr_sisd_or_int_<mode>3): Likewise.
> 
> 2018-04-24  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 
>      PR target/85512
>      * gcc.dg/pr85512.c: New test.
Kyrill Tkachov April 24, 2018, 4:22 p.m. UTC | #2
On 24/04/18 17:09, James Greenhalgh wrote:
> On Tue, Apr 24, 2018 at 04:38:31PM +0100, Kyrill Tkachov wrote:
>> Hi all,
>>
>> In this testcase it is possible to generate an invalid SISD shift of zero:
>> Error: immediate value out of range 1 to 64 at operand 3 -- `sshr v9.2s,v0.2s,0'
>>
>> The SSHR and USHR instructions require a shift from 1 up to the element size.
>> However our constraints on the scalar shifts that generate these patterns
>> allow a shift amount of zero as well. The pure GP-reg ASR and LSR
>> instructions allow a shift amount of zero.
>>
>> It is unlikely that a shift of zero will survive till the end of compilation,
>> but it's not impossible, as this PR shows.
>>
>> The patch tightens up the constraints in the offending patterns by adding two
>> new constraints that allow shift amounts [1,32] and [1,64] and using them in
>> *aarch64_ashr_sisd_or_int_<mode>3
>> and *aarch64_lshr_sisd_or_int_<mode>3.
>> The left-shift SISD instructions SHL and USHL allow a shift amount of zero so
>> don't need adjustment.  The vector shift patterns that map down to SSHR and
>> USHR already enforce the correct immediate range.
>>
>> Bootstrapped and tested on aarch64-none-linux-gnu.
>> Ok for trunk?
> OK if the release managers are fine with this.
>
> I was a little nervous that we were tightening restrictions on an instruction
> which has caused trouble in the past, but this patch looks right to me.

Thanks James.

I've cleaned up the testcase a bit to leave only the function that generates the invalid instruction,
making it shorter.

Jakub, is the patch ok to go in for GCC 8 from your perspective?

Thanks,
Kyrill

> Thanks,
> James
>
>> 2018-04-24  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>       PR target/85512
>>       * config/aarch64/constraints.md (Usg, Usj): New constraints.
>>       * config/aarch64/iterators.md (cmode_simd): New mode attribute.
>>       * config/aarch64/aarch64.md (*aarch64_ashr_sisd_or_int_<mode>3):
>>       Use the above on operand 2.  Reindent.
>>       (*aarch64_lshr_sisd_or_int_<mode>3): Likewise.
>>
>> 2018-04-24  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>       PR target/85512
>>       * gcc.dg/pr85512.c: New test.
commit 0856b82380edda94b4e70b19522366fb80ebecff
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Tue Apr 24 13:51:19 2018 +0100

    [AArch64] PR target/85512

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 4924168..953edb7 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4403,7 +4403,8 @@ (define_insn "*aarch64_lshr_sisd_or_int_<mode>3"
   [(set (match_operand:GPI 0 "register_operand" "=r,r,w,&w,&w")
 	(lshiftrt:GPI
 	 (match_operand:GPI 1 "register_operand" "r,r,w,w,w")
-	 (match_operand:QI 2 "aarch64_reg_or_shift_imm_<mode>" "Us<cmode>,r,Us<cmode>,w,0")))]
+	 (match_operand:QI 2 "aarch64_reg_or_shift_imm_<mode>"
+			      "Us<cmode>,r,Us<cmode_simd>,w,0")))]
   ""
   "@
    lsr\t%<w>0, %<w>1, %2
@@ -4448,9 +4449,10 @@ (define_split
 ;; Arithmetic right shift using SISD or Integer instruction
 (define_insn "*aarch64_ashr_sisd_or_int_<mode>3"
   [(set (match_operand:GPI 0 "register_operand" "=r,r,w,&w,&w")
-        (ashiftrt:GPI
-          (match_operand:GPI 1 "register_operand" "r,r,w,w,w")
-          (match_operand:QI 2 "aarch64_reg_or_shift_imm_di" "Us<cmode>,r,Us<cmode>,w,0")))]
+	(ashiftrt:GPI
+	  (match_operand:GPI 1 "register_operand" "r,r,w,w,w")
+	  (match_operand:QI 2 "aarch64_reg_or_shift_imm_di"
+			       "Us<cmode>,r,Us<cmode_simd>,w,0")))]
   ""
   "@
    asr\t%<w>0, %<w>1, %2
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index f052103..b5da997 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -153,6 +153,20 @@ (define_constraint "Usf"
        (match_test "!(aarch64_is_noplt_call_p (op)
 		      || aarch64_is_long_call_p (op))")))
 
+(define_constraint "Usg"
+  "@internal
+  A constraint that matches an immediate right shift constant in SImode
+  suitable for a SISD instruction."
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (ival, 1, 32)")))
+
+(define_constraint "Usj"
+  "@internal
+  A constraint that matches an immediate right shift constant in DImode
+  suitable for a SISD instruction."
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (ival, 1, 64)")))
+
 (define_constraint "UsM"
   "@internal
   A constraint that matches the immediate constant -1."
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 33933ec..bd97408 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -589,6 +589,9 @@ (define_mode_attr lconst2 [(SI "UsO") (DI "UsP")])
 ;; Map a mode to a specific constraint character.
 (define_mode_attr cmode [(QI "q") (HI "h") (SI "s") (DI "d")])
 
+;; Map modes to Usg and Usj constraints for SISD right shifts
+(define_mode_attr cmode_simd [(SI "g") (DI "j")])
+
 (define_mode_attr Vtype [(V8QI "8b") (V16QI "16b")
 			 (V4HI "4h") (V8HI  "8h")
                          (V2SI "2s") (V4SI  "4s")
diff --git a/gcc/testsuite/gcc.dg/pr85512.c b/gcc/testsuite/gcc.dg/pr85512.c
new file mode 100644
index 0000000..b581f83
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr85512.c
@@ -0,0 +1,47 @@
+/* { dg-do assemble } */
+/* { dg-options "-O -fno-if-conversion" } */
+
+typedef unsigned char u8;
+typedef unsigned short u16;
+typedef unsigned int u32;
+typedef unsigned long long u64;
+u64
+bar0(u8 u8_0, u16 u16_0, u32 u32_0, u64 u64_0, u8 u8_1, u16 u16_1, u32 u32_1, u64 u64_1, u8 u8_2, u16 u16_2, u32 u32_2, u64 u64_2, u8 u8_3, u16 u16_3, u32 u32_3, u64 u64_3);
+u64
+bar1(u8 u8_0, u16 u16_0, u32 u32_0, u64 u64_0, u8 u8_1, u16 u16_1, u32 u32_1, u64 u64_1, u8 u8_2, u16 u16_2, u32 u32_2, u64 u64_2, u8 u8_3, u16 u16_3, u32 u32_3, u64 u64_3);
+u64
+bar2(u8 u8_0, u16 u16_0, u32 u32_0, u64 u64_0, u8 u8_1, u16 u16_1, u32 u32_1, u64 u64_1, u8 u8_2, u16 u16_2, u32 u32_2, u64 u64_2, u8 u8_3, u16 u16_3, u32 u32_3, u64 u64_3);
+u64
+bar0(u8 u8_0, u16 u16_0, u32 u32_0, u64 u64_0, u8 u8_1, u16 u16_1, u32 u32_1, u64 u64_1, u8 u8_2, u16 u16_2, u32 u32_2, u64 u64_2, u8 u8_3, u16 u16_3, u32 u32_3, u64 u64_3)
+{
+l0:     u32_2 += __builtin_add_overflow_p((u32)(u64)u32_0, (u16)-(u32)u64_2, (u64)-(u64)(unsigned)__builtin_parityll((u16)(u16)(((u64)0x1a6cb5b10 << 0))));
+        u8_3 <<= __builtin_add_overflow((u16)~(u32)u64_2, (u16)(u8)(unsigned)__builtin_popcountll((u16)-(u8)(((u64)0x725582 << 0))), &u32_1);
+        u64_1 -= (u8)~(u8)(0);
+        u16_3 = (u16_3 >> ((u32)~(u8)(1) & 15)) | (u16_3 << ((16 - ((u32)~(u8)(1) & 15)) & 15));
+        u8_2 = __builtin_mul_overflow((u64)(u8)__builtin_bswap32((u32)-(u32)u16_0), (u64)(u16)u16_2, &u8_3) ? (u64)~(u8)u8_3 : (u16)(u16)(((u64)0x7ffffff << 0));
+        u32_0 *= (u8)(u8)(((u64)0x1ffffffffffffff << 0));
+        u32_1 >>= (u64)~(u64)(((u64)0x61bf860d09fb3a << 0)) >= (u8)(u16)(unsigned)__builtin_parityll((u16)(u8)(((u64)0x6 << 0)));
+        u16_0 >>= __builtin_add_overflow_p((u64)-(u8)(((u64)0x68b4dda55e3 << 0)), (u16)(u64)__builtin_bswap64((u16)~(u32)__builtin_bswap32((u32)(u32)u8_3)), (u64)(u16)u16_1);
+        u64_0 += (u8)-(u64)(((u64)0xcc88a5c0292b6ba0 << 0));
+        u32_0 += __builtin_mul_overflow((u8)-(u64)(((u64)0xc89172ea72a << 0)), (u64)(u64)u8_2, &u8_3);
+        u64_0 >>= __builtin_add_overflow((u32)-(u64)(0), (u32)-(u16)u8_1, &u8_2);
+        u16_1 >>= (u32)(u64)u16_1 & 15;
+        u16_3 ^= (u16)~(u16)(1);
+        u32_2 &= (u16)-(u32)(0);
+l1:     u32_3 = (u32_3 >> ((u64)(u32)u32_1 & 31)) | (u32_3 << ((32 - ((u64)(u32)u32_1 & 31)) & 31));
+        u64_1 |= (u64)~(u64)(unsigned)__builtin_parityll((u8)-(u32)u32_1);
+        u8_3 *= __builtin_add_overflow((u64)-(u32)(((u64)0xffff << 0)), (u32)~(u64)(((u64)0x117e3e << 0)), &u32_2);
+        u16_3 = (u16_3 << ((u64)~(u8)(((u64)0xf78e81 << 0)) & 15)) | (u16_3 >> ((16 - ((u64)~(u8)(((u64)0xf78e81 << 0)) & 15)) & 15));
+        u64_1 = (u64)(u16)bar1((u8)((u32)(u64)(((u64)0x3ffffff << 0))), (u16)((u8)(u16)(((u64)0x5b << 0))), (u32)((u32)~(u8)(1)), (u64)((u8)(u16)(unsigned)__builtin_clrsb((u32)~(u32)(unsigned)__builtin_clrsbll((u8)(u16)(((u64)0xffffffff << 0))))), (u8)((u8)-(u64)(((u64)0x3e43180756484 << 0))), (u16)((u8)(u16)(((u64)0x7 << 0))), (u32)((u64)(u32)(((u64)0x285fa35c89 << 0))), (u64)((u32)(u8)(((u64)0x3ffff << 0))), (u8)((u16)-(u32)(((u64)0x73d01 << 0))), (u16)((u16)-(u16)(((u64)0x1fffffffffffff << 0))), (u32)((u16)(u64)(0)), (u64)((u16)(u32)(((u64)0x4c << 0))), (u8)((u64)-(u64)(((u64)0x3fffffffffffff << 0))), (u16)((u16)~(u16)(((u64)0xfffffffff << 0))), (u32)((u64)(u16)(((u64)0x7edb0cc1c << 0))), (u64)((u32)(u64)(((u64)0x1ffffffffff << 0)))) > (u16)-(u64)(((u64)0x7 << 0)) ? (u16)(u8)u64_2 : (u64)(u16)u32_2;
+        u32_0 >>= (u8)(u16)(((u64)0x32 << 0)) != (u16)-(u64)u16_3;
+        u16_1 *= __builtin_mul_overflow_p((u64)(u32)u32_1, (u16)(u8)(((u64)0x4ad149d89bf0be6 << 0)), (u64)(u32)(((u64)0x1bd7589 << 0)));
+        u8_1 &= (u64)-(u64)u8_0;
+        u16_3 %= (u16)(u16)(unsigned)__builtin_clrsbll((u32)~(u32)(((u64)0x3db8721fd79 << 0)));
+        u8_3 >>= (u32)(u8)u8_1 & 7;
+        u64_1 |= (u8)-(u64)(unsigned)__builtin_ffsll((u32)-(u64)bar2((u8)((u16)(u16)(((u64)0x3 << 0))), (u16)((u32)-(u8)(((u64)0x86af5 << 0))), (u32)((u16)-(u64)__builtin_bswap64((u64)-(u64)(0))), (u64)((u16)(u16)(((u64)0x75138426ec84c6 << 0))), (u8)((u64)(u32)(((u64)0x7fffffffff << 0))), (u16)((u32)~(u8)(((u64)0x71aa939dbdf3 << 0))), (u32)((u16)(u32)(((u64)0x8776ee7dbb651a2d << 0))), (u64)((u8)(u64)(0)), (u8)((u16)(u8)(unsigned)__builtin_clrsbll((u16)~(u32)(((u64)0x8df94655ec8430 << 0)))), (u16)((u16)-(u64)(unsigned)__builtin_clrsbll((u32)(u64)(((u64)0x3090a532 << 0)))), (u32)((u16)~(u16)(1)), (u64)((u8)(u32)(((u64)0x7fffffffffff << 0))), (u8)((u32)~(u64)(0)), (u16)((u8)~(u8)(unsigned)__builtin_ffs((u64)(u64)(0))), (u32)((u16)-(u8)(((u64)0x5dfe702 << 0))), (u64)((u8)(u64)(((u64)0x68f2a584e0 << 0)))));
+        u32_3 >>= (u32)-(u32)u32_2 & 31;
+        u8_3 = (u8_3 >> ((u32)-(u8)u8_1 & 7)) | (u8_3 << ((8 - ((u32)-(u8)u8_1 & 7)) & 7));
+        u8_2 >>= (u16)-(u64)u64_3 & 7;
+        u32_1 = (u32_1 >> ((u16)(u16)(1) & 31)) | (u32_1 << ((32 - ((u16)(u16)(1) & 31)) & 31));
+        return u8_0 + u16_0 + u32_0 + u64_0 + u8_1 + u16_1 + u32_1 + u64_1 + u8_2 + u16_2 + u32_2 + u64_2 + u8_3 + u16_3 + u32_3 + u64_3;
+}
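
The practical effect on the insn from the PR (a hypothetical sketch; the
actual register allocation may differ) is that a zero shift count can now
only match the GP-reg alternative:

// Before: the RA could pick the SISD alternative for a zero count.
sshr	v9.2s, v0.2s, 0		// rejected by the assembler
// After: a const_int 0 satisfies only Us<cmode>, the GP-reg alternative.
asr	w9, w0, 0		// the GP-reg ASR accepts a zero shift amount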
Jakub Jelinek April 24, 2018, 4:41 p.m. UTC | #3
On Tue, Apr 24, 2018 at 05:22:15PM +0100, Kyrill Tkachov wrote:
> I've cleaned up the testcase a bit to leave only the function that generates the invalid instruction,
> making it shorter.
> 
> Jakub, is the patch ok to go in for GCC 8 from your perspective?

The PR is marked P1 now, so sure, please commit this for GCC 8, the sooner
the better.  We have only one other P1 left.

The only thing I'm unsure about is whether you want to make the top of the
range 32 and 64 rather than just 31 and 63; after all, the operand
predicate used there requires < 32 and < 64, and from the middle-end's POV
shifts by 32 or 64 are undefined (unless SHIFT_COUNT_TRUNCATED, but
aarch64 defines it to
#define SHIFT_COUNT_TRUNCATED (!TARGET_SIMD)).

So, using
(match_test "IN_RANGE (ival, 1, 31)")))
and
(match_test "IN_RANGE (ival, 1, 63)")))
would feel safer to me, but your call.
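
To illustrate the middle-end rule (a sketch in C; the point applies to shift
counts that only become known late in compilation):

/* For a 32-bit operand, a shift count of 32 is undefined both at the
   C level and in the RTL middle end, unless SHIFT_COUNT_TRUNCATED
   holds -- which it does not on AArch64 when TARGET_SIMD is set.  */
unsigned int
full_width_shift (unsigned int x, unsigned int n)
{
  return x >> n;	/* well-defined only for n in [0,31] */
}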

> > > 2018-04-24  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> > > 
> > >       PR target/85512
> > >       * config/aarch64/constraints.md (Usg, Usj): New constraints.
> > >       * config/aarch64/iterators.md (cmode_simd): New mode attribute.
> > >       * config/aarch64/aarch64.md (*aarch64_ashr_sisd_or_int_<mode>3):
> > >       Use the above on operand 2.  Reindent.
> > >       (*aarch64_lshr_sisd_or_int_<mode>3): Likewise.
> > > 
> > > 2018-04-24  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> > > 
> > >       PR target/85512
> > >       * gcc.dg/pr85512.c: New test.
> 


	Jakub
Kyrill Tkachov April 24, 2018, 4:59 p.m. UTC | #4
On 24/04/18 17:41, Jakub Jelinek wrote:
> On Tue, Apr 24, 2018 at 05:22:15PM +0100, Kyrill Tkachov wrote:
>> I've cleaned up the testcase a bit to leave only the function that generates the invalid instruction,
>> making it shorter.
>>
>> Jakub, is the patch ok to go in for GCC 8 from your perspective?
> The PR is marked P1 now, so sure, please commit this for GCC 8, the sooner
> the better.  We have only one other P1 left.

Thanks, I've committed it as 259614.

> The only thing I'm unsure about is whether you want to make the top of the
> range 32 and 64 rather than just 31 and 63, after all the operand
> predicate used there requires < 32 and < 64, and from middle-end's POV
> shifts by 32 or 64 are undefined (unless SHIFT_COUNT_TRUNCATED, but
> aarch64 defines it to
> #define SHIFT_COUNT_TRUNCATED (!TARGET_SIMD)
>
> So, using
> (match_test "IN_RANGE (ival, 1, 31)")))
> and
> (match_test "IN_RANGE (ival, 1, 63)")))
> would feel safer to me, but your call.

I don't think this is necessary.
We already nominally allow shifts up to 32/64 in the SIMD versions
(see aarch64_simd_ashr<mode> in aarch64-simd.md), though I imagine that if such
shifts are undefined in the midend then it will just not generate them, or at
least blow up somewhere along the way.

So I've left the range as is.
Thanks,
Kyrill

>>>> 2018-04-24  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>
>>>>        PR target/85512
>>>>        * config/aarch64/constraints.md (Usg, Usj): New constraints.
>>>>        * config/aarch64/iterators.md (cmode_simd): New mode attribute.
>>>>        * config/aarch64/aarch64.md (*aarch64_ashr_sisd_or_int_<mode>3):
>>>>        Use the above on operand 2.  Reindent.
>>>>        (*aarch64_lshr_sisd_or_int_<mode>3): Likewise.
>>>>
>>>> 2018-04-24  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>
>>>>        PR target/85512
>>>>        * gcc.dg/pr85512.c: New test.
>
> 	Jakub
Richard Sandiford April 26, 2018, 1:47 p.m. UTC | #5
Kyrill  Tkachov <kyrylo.tkachov@foss.arm.com> writes:
> On 24/04/18 17:41, Jakub Jelinek wrote:
>> On Tue, Apr 24, 2018 at 05:22:15PM +0100, Kyrill Tkachov wrote:
>>> I've cleaned up the testcase a bit to leave only the function that
>>> generates the invalid instruction,
>>> making it shorter.
>>>
>>> Jakub, is the patch ok to go in for GCC 8 from your perspective?
>> The PR is marked P1 now, so sure, please commit this for GCC 8, the sooner
>> the better.  We have only one other P1 left.
>
> Thanks, I've committed it as 259614.
>
>> The only thing I'm unsure about is whether you want to make the top of the
>> range 32 and 64 rather than just 31 and 63, after all the operand
>> predicate used there requires < 32 and < 64, and from middle-end's POV
>> shifts by 32 or 64 are undefined (unless SHIFT_COUNT_TRUNCATED, but
>> aarch64 defines it to
>> #define SHIFT_COUNT_TRUNCATED (!TARGET_SIMD)
>>
>> So, using
>> (match_test "IN_RANGE (ival, 1, 31)")))
>> and
>> (match_test "IN_RANGE (ival, 1, 63)")))
>> would feel safer to me, but your call.
>
> I don't think this is necessary.
> We already nominally allow shifts up to 32/64 in the SIMD versions
> (see aarch64_simd_ashr<mode> in aarch64-simd.md) though I imagine if such shifts
> are undefined in the midend then it will just not generate them or at least blow
> up somewhere along the way.
>
> So I've left the range as is.

But as Jakub says, it's wrong for the constraints to accept something
that the predicates don't.  That must never happen for reloadable
operands.  E.g. we could enter RA with a 32-bit shift by a register R
that is known to be equal to 32.  As it stands, the constraints allow
the RA to replace R with 32 (e.g. if R would otherwise be spilled) but
the predicates would reject the result.  We'd then get an unrecognisable
insn ICE.

So I think we either need to adjust the predicate to accept the wider
range, or adjust the constraint as above.  At this stage adjusting the
constraint seems safer.

Thanks,
Richard
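
A sketch in RTL terms of the failure mode Richard describes (hypothetical
register numbers; SImode shown):

;; Before RA: register shift count, matching the SISD "w,w" alternative.
(set (reg:SI v0) (lshiftrt:SI (reg:SI v1) (reg:QI x2)))
;; If the RA substitutes the known value 32 for x2:
(set (reg:SI v0) (lshiftrt:SI (reg:SI v1) (const_int 32)))
;; The Usg constraint (as first committed) accepts 32, but the operand
;; predicate aarch64_reg_or_shift_imm_si rejects it, so the insn no
;; longer matches and we get an unrecognizable insn ICE.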
Kyrill Tkachov April 27, 2018, 8:29 a.m. UTC | #6
On 26/04/18 14:47, Richard Sandiford wrote:
> Kyrill  Tkachov <kyrylo.tkachov@foss.arm.com> writes:
>> On 24/04/18 17:41, Jakub Jelinek wrote:
>>> On Tue, Apr 24, 2018 at 05:22:15PM +0100, Kyrill Tkachov wrote:
>>>> I've cleaned up the testcase a bit to leave only the function that
>>>> generates the invalid instruction,
>>>> making it shorter.
>>>>
>>>> Jakub, is the patch ok to go in for GCC 8 from your perspective?
>>> The PR is marked P1 now, so sure, please commit this for GCC 8, the sooner
>>> the better.  We have only one other P1 left.
>> Thanks, I've committed it as 259614.
>>
>>> The only thing I'm unsure about is whether you want to make the top of the
>>> range 32 and 64 rather than just 31 and 63, after all the operand
>>> predicate used there requires < 32 and < 64, and from middle-end's POV
>>> shifts by 32 or 64 are undefined (unless SHIFT_COUNT_TRUNCATED, but
>>> aarch64 defines it to
>>> #define SHIFT_COUNT_TRUNCATED (!TARGET_SIMD)
>>>
>>> So, using
>>> (match_test "IN_RANGE (ival, 1, 31)")))
>>> and
>>> (match_test "IN_RANGE (ival, 1, 63)")))
>>> would feel safer to me, but your call.
>> I don't think this is necessary.
>> We already nominally allow shifts up to 32/64 in the SIMD versions
>> (see aarch64_simd_ashr<mode> in aarch64-simd.md) though I imagine if such shifts
>> are undefined in the midend then it will just not generate them or at least blow
>> up somewhere along the way.
>>
>> So I've left the range as is.
> But as Jakub says, it's wrong for the constraints to accept something
> that the predicates don't.  That must never happen for reloadable
> operands.  E.g. we could enter RA with a 32-bit shift by a register R
> that is known to be equal to 32.  As it stands, the constraints allow
> the RA to replace R with 32 (e.g. if R would otherwise be spilled) but
> the predicates would reject the result.  We'd then get an unrecognisable
> insn ICE.
>
> So I think we either need to adjust the predicate to accept the wider
> range, or adjust the constraint as above.  At this stage adjusting the
> constraint seems safer.

Ok, thanks for the explanation. I think I misunderstood the issue initially.
This patch tightens the new constraints to 31 and 63 at the upper limit.

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?
Thanks,
Kyrill


2018-04-27  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * config/aarch64/constraints.md (Usg): Limit to 31.
     (Usj): Limit to 63.
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index b5da997e7bab1541ed1401a94a3b94a0cde8017f..32a0fa60a198c714f7c0b8b987da6bc26992845d 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -158,14 +158,14 @@ (define_constraint "Usg"
   A constraint that matches an immediate right shift constant in SImode
   suitable for a SISD instruction."
   (and (match_code "const_int")
-       (match_test "IN_RANGE (ival, 1, 32)")))
+       (match_test "IN_RANGE (ival, 1, 31)")))
 
 (define_constraint "Usj"
   "@internal
   A constraint that matches an immediate right shift constant in DImode
   suitable for a SISD instruction."
   (and (match_code "const_int")
-       (match_test "IN_RANGE (ival, 1, 64)")))
+       (match_test "IN_RANGE (ival, 1, 63)")))
 
 (define_constraint "UsM"
   "@internal
James Greenhalgh April 27, 2018, 8:33 a.m. UTC | #7
On Fri, Apr 27, 2018 at 09:29:28AM +0100, Kyrill Tkachov wrote:
> 
> On 26/04/18 14:47, Richard Sandiford wrote:
> > Kyrill  Tkachov <kyrylo.tkachov@foss.arm.com> writes:
> >> On 24/04/18 17:41, Jakub Jelinek wrote:
> >>> On Tue, Apr 24, 2018 at 05:22:15PM +0100, Kyrill Tkachov wrote:
> >>>> I've cleaned up the testcase a bit to leave only the function that
> >>>> generates the invalid instruction,
> >>>> making it shorter.
> >>>>
> >>>> Jakub, is the patch ok to go in for GCC 8 from your perspective?
> >>> The PR is marked P1 now, so sure, please commit this for GCC 8, the sooner
> >>> the better.  We have only one other P1 left.
> >> Thanks, I've committed it as 259614.
> >>
> >>> The only thing I'm unsure about is whether you want to make the top of the
> >>> range 32 and 64 rather than just 31 and 63, after all the operand
> >>> predicate used there requires < 32 and < 64, and from middle-end's POV
> >>> shifts by 32 or 64 are undefined (unless SHIFT_COUNT_TRUNCATED, but
> >>> aarch64 defines it to
> >>> #define SHIFT_COUNT_TRUNCATED (!TARGET_SIMD)
> >>>
> >>> So, using
> >>> (match_test "IN_RANGE (ival, 1, 31)")))
> >>> and
> >>> (match_test "IN_RANGE (ival, 1, 63)")))
> >>> would feel safer to me, but your call.
> >> I don't think this is necessary.
> >> We already nominally allow shifts up to 32/64 in the SIMD versions
> >> (see aarch64_simd_ashr<mode> in aarch64-simd.md) though I imagine if such shifts
> >> are undefined in the midend then it will just not generate them or at least blow
> >> up somewhere along the way.
> >>
> >> So I've left the range as is.
> > But as Jakub says, it's wrong for the constraints to accept something
> > that the predicates don't.  That must never happen for reloadable
> > operands.  E.g. we could enter RA with a 32-bit shift by a register R
> > that is known to be equal to 32.  As it stands, the constraints allow
> > the RA to replace R with 32 (e.g. if R would otherwise be spilled) but
> > the predicates would reject the result.  We'd then get an unrecognisable
> > insn ICE.
> >
> > So I think we either need to adjust the predicate to accept the wider
> > range, or adjust the constraint as above.  At this stage adjusting the
> > constraint seems safer.
> 
> Ok, thanks for the explanation. I think I misunderstood the issue initially.
> This patch tightens the new constraints to 31 and 63 at the upper limit.

Given the discussion, I'd say this is the obvious fix, but as you've
asked - this is OK.

Thanks,
James

> 2018-04-27  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 
>      * config/aarch64/constraints.md (Usg): Limit to 31.
>      (Usj): Limit to 63.
>

Patch

commit d8890f8682517336648b6c780087e131a8943f09
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Tue Apr 24 13:51:19 2018 +0100

    [AArch64] PR target/85512

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 4924168..953edb7 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4403,7 +4403,8 @@  (define_insn "*aarch64_lshr_sisd_or_int_<mode>3"
   [(set (match_operand:GPI 0 "register_operand" "=r,r,w,&w,&w")
 	(lshiftrt:GPI
 	 (match_operand:GPI 1 "register_operand" "r,r,w,w,w")
-	 (match_operand:QI 2 "aarch64_reg_or_shift_imm_<mode>" "Us<cmode>,r,Us<cmode>,w,0")))]
+	 (match_operand:QI 2 "aarch64_reg_or_shift_imm_<mode>"
+			      "Us<cmode>,r,Us<cmode_simd>,w,0")))]
   ""
   "@
    lsr\t%<w>0, %<w>1, %2
@@ -4448,9 +4449,10 @@  (define_split
 ;; Arithmetic right shift using SISD or Integer instruction
 (define_insn "*aarch64_ashr_sisd_or_int_<mode>3"
   [(set (match_operand:GPI 0 "register_operand" "=r,r,w,&w,&w")
-        (ashiftrt:GPI
-          (match_operand:GPI 1 "register_operand" "r,r,w,w,w")
-          (match_operand:QI 2 "aarch64_reg_or_shift_imm_di" "Us<cmode>,r,Us<cmode>,w,0")))]
+	(ashiftrt:GPI
+	  (match_operand:GPI 1 "register_operand" "r,r,w,w,w")
+	  (match_operand:QI 2 "aarch64_reg_or_shift_imm_di"
+			       "Us<cmode>,r,Us<cmode_simd>,w,0")))]
   ""
   "@
    asr\t%<w>0, %<w>1, %2
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index f052103..b5da997 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -153,6 +153,20 @@  (define_constraint "Usf"
        (match_test "!(aarch64_is_noplt_call_p (op)
 		      || aarch64_is_long_call_p (op))")))
 
+(define_constraint "Usg"
+  "@internal
+  A constraint that matches an immediate right shift constant in SImode
+  suitable for a SISD instruction."
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (ival, 1, 32)")))
+
+(define_constraint "Usj"
+  "@internal
+  A constraint that matches an immediate right shift constant in DImode
+  suitable for a SISD instruction."
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (ival, 1, 64)")))
+
 (define_constraint "UsM"
   "@internal
   A constraint that matches the immediate constant -1."
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 33933ec..bd97408 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -589,6 +589,9 @@  (define_mode_attr lconst2 [(SI "UsO") (DI "UsP")])
 ;; Map a mode to a specific constraint character.
 (define_mode_attr cmode [(QI "q") (HI "h") (SI "s") (DI "d")])
 
+;; Map modes to Usg and Usj constraints for SISD right shifts
+(define_mode_attr cmode_simd [(SI "g") (DI "j")])
+
 (define_mode_attr Vtype [(V8QI "8b") (V16QI "16b")
 			 (V4HI "4h") (V8HI  "8h")
                          (V2SI "2s") (V4SI  "4s")
diff --git a/gcc/testsuite/gcc.dg/pr85512.c b/gcc/testsuite/gcc.dg/pr85512.c
new file mode 100644
index 0000000..ce3cdb0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr85512.c
@@ -0,0 +1,118 @@ 
+/* { dg-do assemble } */
+/* { dg-options "-O -fno-if-conversion" } */
+
+typedef unsigned char u8;
+typedef unsigned short u16;
+typedef unsigned int u32;
+typedef unsigned long long u64;
+inline u64
+bar0(u8 u8_0, u16 u16_0, u32 u32_0, u64 u64_0, u8 u8_1, u16 u16_1, u32 u32_1, u64 u64_1, u8 u8_2, u16 u16_2, u32 u32_2, u64 u64_2, u8 u8_3, u16 u16_3, u32 u32_3, u64 u64_3);
+static __attribute__((noipa)) u64
+bar1(u8 u8_0, u16 u16_0, u32 u32_0, u64 u64_0, u8 u8_1, u16 u16_1, u32 u32_1, u64 u64_1, u8 u8_2, u16 u16_2, u32 u32_2, u64 u64_2, u8 u8_3, u16 u16_3, u32 u32_3, u64 u64_3);
+ __attribute__((noclone)) __attribute__((no_icf)) u64
+bar2(u8 u8_0, u16 u16_0, u32 u32_0, u64 u64_0, u8 u8_1, u16 u16_1, u32 u32_1, u64 u64_1, u8 u8_2, u16 u16_2, u32 u32_2, u64 u64_2, u8 u8_3, u16 u16_3, u32 u32_3, u64 u64_3);
+static u64
+bar3(u8 u8_0, u16 u16_0, u32 u32_0, u64 u64_0, u8 u8_1, u16 u16_1, u32 u32_1, u64 u64_1, u8 u8_2, u16 u16_2, u32 u32_2, u64 u64_2, u8 u8_3, u16 u16_3, u32 u32_3, u64 u64_3);
+inline __attribute__((noclone)) __attribute__((no_icf)) u64
+bar4(u8 u8_0, u16 u16_0, u32 u32_0, u64 u64_0, u8 u8_1, u16 u16_1, u32 u32_1, u64 u64_1, u8 u8_2, u16 u16_2, u32 u32_2, u64 u64_2, u8 u8_3, u16 u16_3, u32 u32_3, u64 u64_3);
+u64
+bar0(u8 u8_0, u16 u16_0, u32 u32_0, u64 u64_0, u8 u8_1, u16 u16_1, u32 u32_1, u64 u64_1, u8 u8_2, u16 u16_2, u32 u32_2, u64 u64_2, u8 u8_3, u16 u16_3, u32 u32_3, u64 u64_3)
+{
+l0:	u32_2 += __builtin_add_overflow_p((u32)(u64)u32_0, (u16)-(u32)u64_2, (u64)-(u64)(unsigned)__builtin_parityll((u16)(u16)(((u64)0x1a6cb5b10 << 0))));
+	u8_3 <<= __builtin_add_overflow((u16)~(u32)u64_2, (u16)(u8)(unsigned)__builtin_popcountll((u16)-(u8)(((u64)0x725582 << 0))), &u32_1);
+	u64_1 -= (u8)~(u8)(0);
+	u16_3 = (u16_3 >> ((u32)~(u8)(1) & 15)) | (u16_3 << ((16 - ((u32)~(u8)(1) & 15)) & 15));
+	u8_2 = __builtin_mul_overflow((u64)(u8)__builtin_bswap32((u32)-(u32)u16_0), (u64)(u16)u16_2, &u8_3) ? (u64)~(u8)u8_3 : (u16)(u16)(((u64)0x7ffffff << 0));
+	u32_0 *= (u8)(u8)(((u64)0x1ffffffffffffff << 0));
+	u32_1 >>= (u64)~(u64)(((u64)0x61bf860d09fb3a << 0)) >= (u8)(u16)(unsigned)__builtin_parityll((u16)(u8)(((u64)0x6 << 0)));
+	u16_0 >>= __builtin_add_overflow_p((u64)-(u8)(((u64)0x68b4dda55e3 << 0)), (u16)(u64)__builtin_bswap64((u16)~(u32)__builtin_bswap32((u32)(u32)u8_3)), (u64)(u16)u16_1);
+	u64_0 += (u8)-(u64)(((u64)0xcc88a5c0292b6ba0 << 0));
+	u32_0 += __builtin_mul_overflow((u8)-(u64)(((u64)0xc89172ea72a << 0)), (u64)(u64)u8_2, &u8_3);
+	u64_0 >>= __builtin_add_overflow((u32)-(u64)(0), (u32)-(u16)u8_1, &u8_2);
+	u16_1 >>= (u32)(u64)u16_1 & 15;
+	u16_3 ^= (u16)~(u16)(1);
+	u32_2 &= (u16)-(u32)(0);
+l1:	u32_3 = (u32_3 >> ((u64)(u32)u32_1 & 31)) | (u32_3 << ((32 - ((u64)(u32)u32_1 & 31)) & 31));
+	u64_1 |= (u64)~(u64)(unsigned)__builtin_parityll((u8)-(u32)u32_1);
+	u8_3 *= __builtin_add_overflow((u64)-(u32)(((u64)0xffff << 0)), (u32)~(u64)(((u64)0x117e3e << 0)), &u32_2);
+	u16_3 = (u16_3 << ((u64)~(u8)(((u64)0xf78e81 << 0)) & 15)) | (u16_3 >> ((16 - ((u64)~(u8)(((u64)0xf78e81 << 0)) & 15)) & 15));
+	u64_1 = (u64)(u16)bar1((u8)((u32)(u64)(((u64)0x3ffffff << 0))), (u16)((u8)(u16)(((u64)0x5b << 0))), (u32)((u32)~(u8)(1)), (u64)((u8)(u16)(unsigned)__builtin_clrsb((u32)~(u32)(unsigned)__builtin_clrsbll((u8)(u16)(((u64)0xffffffff << 0))))), (u8)((u8)-(u64)(((u64)0x3e43180756484 << 0))), (u16)((u8)(u16)(((u64)0x7 << 0))), (u32)((u64)(u32)(((u64)0x285fa35c89 << 0))), (u64)((u32)(u8)(((u64)0x3ffff << 0))), (u8)((u16)-(u32)(((u64)0x73d01 << 0))), (u16)((u16)-(u16)(((u64)0x1fffffffffffff << 0))), (u32)((u16)(u64)(0)), (u64)((u16)(u32)(((u64)0x4c << 0))), (u8)((u64)-(u64)(((u64)0x3fffffffffffff << 0))), (u16)((u16)~(u16)(((u64)0xfffffffff << 0))), (u32)((u64)(u16)(((u64)0x7edb0cc1c << 0))), (u64)((u32)(u64)(((u64)0x1ffffffffff << 0)))) > (u16)-(u64)(((u64)0x7 << 0)) ? (u16)(u8)u64_2 : (u64)(u16)u32_2;
+	u32_0 >>= (u8)(u16)(((u64)0x32 << 0)) != (u16)-(u64)u16_3;
+	u16_1 *= __builtin_mul_overflow_p((u64)(u32)u32_1, (u16)(u8)(((u64)0x4ad149d89bf0be6 << 0)), (u64)(u32)(((u64)0x1bd7589 << 0)));
+	u8_1 &= (u64)-(u64)u8_0;
+	u16_3 %= (u16)(u16)(unsigned)__builtin_clrsbll((u32)~(u32)(((u64)0x3db8721fd79 << 0)));
+	u8_3 >>= (u32)(u8)u8_1 & 7;
+	u64_1 |= (u8)-(u64)(unsigned)__builtin_ffsll((u32)-(u64)bar2((u8)((u16)(u16)(((u64)0x3 << 0))), (u16)((u32)-(u8)(((u64)0x86af5 << 0))), (u32)((u16)-(u64)__builtin_bswap64((u64)-(u64)(0))), (u64)((u16)(u16)(((u64)0x75138426ec84c6 << 0))), (u8)((u64)(u32)(((u64)0x7fffffffff << 0))), (u16)((u32)~(u8)(((u64)0x71aa939dbdf3 << 0))), (u32)((u16)(u32)(((u64)0x8776ee7dbb651a2d << 0))), (u64)((u8)(u64)(0)), (u8)((u16)(u8)(unsigned)__builtin_clrsbll((u16)~(u32)(((u64)0x8df94655ec8430 << 0)))), (u16)((u16)-(u64)(unsigned)__builtin_clrsbll((u32)(u64)(((u64)0x3090a532 << 0)))), (u32)((u16)~(u16)(1)), (u64)((u8)(u32)(((u64)0x7fffffffffff << 0))), (u8)((u32)~(u64)(0)), (u16)((u8)~(u8)(unsigned)__builtin_ffs((u64)(u64)(0))), (u32)((u16)-(u8)(((u64)0x5dfe702 << 0))), (u64)((u8)(u64)(((u64)0x68f2a584e0 << 0)))));
+	u32_3 >>= (u32)-(u32)u32_2 & 31;
+	u8_3 = (u8_3 >> ((u32)-(u8)u8_1 & 7)) | (u8_3 << ((8 - ((u32)-(u8)u8_1 & 7)) & 7));
+	u8_2 >>= (u16)-(u64)u64_3 & 7;
+	u32_1 = (u32_1 >> ((u16)(u16)(1) & 31)) | (u32_1 << ((32 - ((u16)(u16)(1) & 31)) & 31));
+	return u8_0 + u16_0 + u32_0 + u64_0 + u8_1 + u16_1 + u32_1 + u64_1 + u8_2 + u16_2 + u32_2 + u64_2 + u8_3 + u16_3 + u32_3 + u64_3;
+}
+
+u64
+bar1(u8 u8_0, u16 u16_0, u32 u32_0, u64 u64_0, u8 u8_1, u16 u16_1, u32 u32_1, u64 u64_1, u8 u8_2, u16 u16_2, u32 u32_2, u64 u64_2, u8 u8_3, u16 u16_3, u32 u32_3, u64 u64_3)
+{
+l0:	u32_3 -= __builtin_mul_overflow_p((u64)(u8)__builtin_bswap32((u8)(u32)u32_0), (u8)(u64)(((u64)0x3fffffffffffffff << 0)), (u16)~(u16)(((u64)0xea2 << 0)));
+	u16_0 = (u32)(u32)__builtin_bswap32((u16)-(u16)(((u64)0xb1e888e9 << 0))) > (u8)(u16)(((u64)0xffffffffffffff << 0)) ? (u64)-(u8)u64_2 : (u8)(u64)(1);
+	u32_2 ^= (u32)(u32)(((u64)0x3fffffffff << 0));
+l1:	u64_1 ^= (u32)(u8)u8_3 != (u64)(u16)u64_0;
+	u16_3 &= (u64)-(u32)(1);
+	u16_0 <<= __builtin_sub_overflow((u32)(u16)u32_0, (u64)(u64)(((u64)0xffffffffff << 0)), &u32_2);
+	u64_2 <<= __builtin_mul_overflow_p((u16)(u32)u16_2, (u16)(u64)(((u64)0x5c35c << 0)), (u64)(u32)u8_0);
+	u16_1 |= __builtin_add_overflow((u32)(u8)(0), (u16)(u32)u32_0, &u32_0);
+	return u8_0 + u16_0 + u32_0 + u64_0 + u8_1 + u16_1 + u32_1 + u64_1 + u8_2 + u16_2 + u32_2 + u64_2 + u8_3 + u16_3 + u32_3 + u64_3;
+}
+
+u64
+bar2(u8 u8_0, u16 u16_0, u32 u32_0, u64 u64_0, u8 u8_1, u16 u16_1, u32 u32_1, u64 u64_1, u8 u8_2, u16 u16_2, u32 u32_2, u64 u64_2, u8 u8_3, u16 u16_3, u32 u32_3, u64 u64_3)
+{
+l0:	u16_2 >>= (u8)~(u16)(1) & 15;
+	u64_1 |= __builtin_sub_overflow((u32)~(u8)u16_2, (u64)(u8)u16_1, &u32_1);
+	u64_2 <<= (u8)-(u32)(((u64)0xf << 0)) & 63;
+	u16_3 &= (u16)-(u16)(((u64)0x1d0309136d << 0)) != (u32)(u16)u64_0;
+l1:	u16_1 *= (u16)(u32)__builtin_bswap32((u16)(u64)u32_2) >= (u32)-(u16)u64_0;
+	u16_1 *= (u8)~(u8)(0);
+	u32_0 += __builtin_sub_overflow_p((u32)(u8)u16_3, (u16)-(u16)u64_1, (u16)~(u16)u8_1);
+	return u8_0 + u16_0 + u32_0 + u64_0 + u8_1 + u16_1 + u32_1 + u64_1 + u8_2 + u16_2 + u32_2 + u64_2 + u8_3 + u16_3 + u32_3 + u64_3;
+}
+
+u64
+bar3(u8 u8_0, u16 u16_0, u32 u32_0, u64 u64_0, u8 u8_1, u16 u16_1, u32 u32_1, u64 u64_1, u8 u8_2, u16 u16_2, u32 u32_2, u64 u64_2, u8 u8_3, u16 u16_3, u32 u32_3, u64 u64_3)
+{
+l0:	u16_2 >>= (u8)(u8)__builtin_bswap32((u16)(u32)u64_3) & 15;
+	u16_0 *= (u64)~(u32)(((u64)0x3ffffffffffffff << 0));
+	u32_1 = __builtin_mul_overflow_p((u8)(u32)u8_2, (u16)~(u16)u32_1, (u8)~(u16)(((u64)0xeb3715b6 << 0))) ? u32_1 : (u16)(u8)(0);
+	u32_0 %= (u64)(u8)(1);
+	u64_1 &= __builtin_sub_overflow((u16)(u32)u32_1, (u32)~(u32)u64_3, &u8_3);
+l1:	u32_3 |= (u32)~(u16)(((u64)0x67771c6cef68fffd << 0));
+	u8_0 <<= (u64)(u8)(unsigned)__builtin_ffs((u64)~(u16)u64_3) & 7;
+	u16_1 += __builtin_sub_overflow((u8)-(u64)(((u64)0x1a << 0)), (u8)-(u8)(((u64)0x43b5fa << 0)), &u64_2);
+	u64_0 &= (u64)(u32)u16_0;
+	u8_1 <<= __builtin_add_overflow_p((u32)~(u8)(((u64)0x4 << 0)), (u8)(u32)(((u64)0x1fffffffff << 0)), (u32)(u32)(((u64)0x65f66 << 0)));
+	return u8_0 + u16_0 + u32_0 + u64_0 + u8_1 + u16_1 + u32_1 + u64_1 + u8_2 + u16_2 + u32_2 + u64_2 + u8_3 + u16_3 + u32_3 + u64_3;
+}
+
+u64
+bar4(u8 u8_0, u16 u16_0, u32 u32_0, u64 u64_0, u8 u8_1, u16 u16_1, u32 u32_1, u64 u64_1, u8 u8_2, u16 u16_2, u32 u32_2, u64 u64_2, u8 u8_3, u16 u16_3, u32 u32_3, u64 u64_3)
+{
+l0:	u64_0 ^= (u32)-(u16)(unsigned)__builtin_popcount((u8)~(u8)__builtin_bswap16((u8)~(u8)u32_0)) < (u64)(u8)u16_1;
+	u64_0 *= __builtin_add_overflow_p((u16)-(u16)u32_2, (u64)(u8)u32_0, (u64)(u32)u64_0);
+	u16_0 &= (u16)~(u64)__builtin_bswap32((u8)(u16)u64_0);
+	u64_0 ^= __builtin_sub_overflow_p((u16)(u16)(unsigned)__builtin_parityll((u32)-(u8)u64_0), (u32)(u64)(unsigned)__builtin_clrsbll((u64)-(u16)(((u64)0x26beb5879cd5 << 0))), (u64)(u16)u8_3);
+	u16_1 >>= (u16)(u8)u8_3 & 15;
+l1:	u16_2 >>= __builtin_mul_overflow_p((u16)(u32)(0), (u16)-(u16)u8_1, (u8)-(u64)u32_1);
+	u16_0 -= (u16)(u8)(unsigned)__builtin_clrsb((u32)~(u64)u32_1);
+	u32_0 = __builtin_mul_overflow_p((u32)(u64)u64_0, (u8)(u32)(((u64)0x8a762207277 << 0)), (u32)(u8)u64_3) ? (u16)~(u32)__builtin_bswap64((u16)(u64)u32_1) : (u16)(u16)(((u64)0xffff << 0));
+	u64_0 -= __builtin_mul_overflow((u32)-(u8)u32_2, (u8)(u32)u32_2, &u16_2);
+	u16_3 /= (u64)~(u32)(((u64)0x724748d41286e6ef << 0));
+	return u8_0 + u16_0 + u32_0 + u64_0 + u8_1 + u16_1 + u32_1 + u64_1 + u8_2 + u16_2 + u32_2 + u64_2 + u8_3 + u16_3 + u32_3 + u64_3;
+}
+
+int main(void)
+{
+	u64 x = 0;
+	x += bar0(((u8)~(u16)(((u64)0x20242a7bde2a << 0))), ((u16)(u32)(((u64)0xdb737d << 0))), ((u32)~(u32)(((u64)0x1fff << 0))), ((u64)(u8)(0)), ((u8)-(u16)(((u64)0x7fffffffffffffff << 0))), ((u16)~(u8)(((u64)0x11b6030930062 << 0))), ((u32)-(u32)(((u64)0x891f360 << 0))), ((u64)(u8)(1)), ((u8)(u16)(((u64)0xffffffffffff << 0))), ((u16)(u64)(0)), ((u32)~(u16)(((u64)0xff000000 << 0))), ((u64)(u32)(0)), ((u8)(u8)(((u64)0x7fffffffffffff << 0))), ((u16)(u8)(0)), ((u32)(u16)(((u64)0x3ffffffffffffff << 0))), ((u64)(u16)(((u64)0x1ec5dd7 << 0))));
+	__builtin_printf("%016llx", (unsigned long long)(x >> 0)); __builtin_printf("\n");
+	return 0;
+}