
avoid aarch64 ICE on large vectors (PR 89797)

Message ID e7c5c284-6047-d78e-f82e-6bc72953dcf2@gmail.com
State New

Commit Message

Martin Sebor April 18, 2019, 2:38 a.m. UTC
The fix for pr89797 committed in r270326 was limited to targets
with NUM_POLY_INT_COEFFS == 1 which I think is all but aarch64.
The tests for the fix have been failing with an ICE on aarch64
because it suffers from more or less the same problem but in
its own target-specific code.  Attached is the patch I posted
yesterday that fixes the ICE, successfully bootstrapped and
regtested on x86_64-linux.  I also ran the dg.exp=*attr* and
aarch64.exp tests with an aarch64-linux-elf cross-compiler.
There are no ICEs but there are tons of errors in the latter
tests because many (most?) either expect to be able to find
libc headers or link executables (I have not built libc for
aarch64).

I'm around tomorrow but then traveling the next two weeks (with
no connectivity the first week) so I unfortunately won't be able
to fix whatever this change might break until the week of May 6.

Jeff, if you have an aarch64 tester that could verify this patch
tomorrow that would help give us some confidence.  Otherwise,
another option to consider for the time being is to xfail
the tests on aarch64.

Thanks
Martin

Comments

Richard Earnshaw (lists) April 18, 2019, 8:58 a.m. UTC | #1
On 18/04/2019 03:38, Martin Sebor wrote:
> The fix for pr89797 committed in r270326 was limited to targets
> with NUM_POLY_INT_COEFFS == 1 which I think is all but aarch64.
> The tests for the fix have been failing with an ICE on aarch64
> because it suffers from more or less the same problem but in
> its own target-specific code.  Attached is the patch I posted
> yesterday that fixes the ICE, successfully bootstrapped and
> regtested on x86_64-linux.  I also ran the dg.exp=*attr* and
> aarch64.exp tests with an aarch64-linux-elf cross-compiler.
> There are no ICEs but there are tons of errors in the latter
> tests because many (most?) either expect to be able to find
> libc headers or link executables (I have not built libc for
> aarch64).
> 
> I'm around tomorrow but then traveling the next two weeks (with
> no connectivity the first week) so I unfortunately won't be able
> to fix whatever this change might break until the week of May 6.
> 
> Jeff, if you have an aarch64 tester that could verify this patch
> tomorrow that would help give us some confidence.  Otherwise,
> another option to consider for the time being is to xfail
> the tests on aarch64.
> 
> Thanks
> Martin
> 
> gcc-89797.diff
> 
> PR middle-end/89797 - ICE on a vector_size (1LU << 33) int variable
> 
> gcc/ChangeLog:
> 
> 	PR middle-end/89797
> 	* tree.h (TYPE_VECTOR_SUBPARTS): Correct computation when
> 	NUM_POLY_INT_COEFFS == 2.  Use HOST_WIDE_INT_1U.
> 	* config/aarch64/aarch64.c (aarch64_simd_vector_alignment): Avoid
> 	assuming type size fits in SHWI.
> 
> Index: gcc/tree.h
> ===================================================================
> --- gcc/tree.h	(revision 270418)
> +++ gcc/tree.h	(working copy)
> @@ -3735,13 +3735,13 @@ TYPE_VECTOR_SUBPARTS (const_tree node)
>    if (NUM_POLY_INT_COEFFS == 2)
>      {
>        poly_uint64 res = 0;
> -      res.coeffs[0] = 1 << (precision & 0xff);
> +      res.coeffs[0] = HOST_WIDE_INT_1U << (precision & 0xff);

I'm not sure I understand this either before or after.  (precision &
0xff) gives a value in the range 0-255, which is way more than can be
accommodated in a left shift, even for a HWI.

>        if (precision & 0x100)
> -	res.coeffs[1] = 1 << (precision & 0xff);
> +	res.coeffs[1] = HOST_WIDE_INT_1U << ((precision & 0x100) >> 16);

And this equally doesn't make sense: >>16 will shift the masked value out
of the bottom of the result.

Also, if we set res.coeffs[1], don't we want res.coeffs[0] to be set to 0?

R.

>        return res;
>      }
>    else
> -    return (unsigned HOST_WIDE_INT)1 << precision;
> +    return HOST_WIDE_INT_1U << precision;
>  }
>  
>  /* Set the number of elements in VECTOR_TYPE NODE to SUBPARTS, which must
> Index: gcc/config/aarch64/aarch64.c
> ===================================================================
> --- gcc/config/aarch64/aarch64.c	(revision 270418)
> +++ gcc/config/aarch64/aarch64.c	(working copy)
> @@ -14924,7 +14924,10 @@ aarch64_simd_vector_alignment (const_tree type)
>         be set for non-predicate vectors of booleans.  Modes are the most
>         direct way we have of identifying real SVE predicate types.  */
>      return GET_MODE_CLASS (TYPE_MODE (type)) == MODE_VECTOR_BOOL ? 16 : 128;
> -  HOST_WIDE_INT align = tree_to_shwi (TYPE_SIZE (type));
> +  tree size = TYPE_SIZE (type);
> +  unsigned HOST_WIDE_INT align = 128;
> +  if (tree_fits_uhwi_p (size))
> +    align = tree_to_uhwi (TYPE_SIZE (type));
>    return MIN (align, 128);
>  }
>  
>
Richard Earnshaw (lists) April 18, 2019, 11:59 a.m. UTC | #2
On 18/04/2019 09:58, Richard Earnshaw (lists) wrote:
> On 18/04/2019 03:38, Martin Sebor wrote:
>> The fix for pr89797 committed in r270326 was limited to targets
>> with NUM_POLY_INT_COEFFS == 1 which I think is all but aarch64.
>> The tests for the fix have been failing with an ICE on aarch64
>> because it suffers from more or less the same problem but in
>> its own target-specific code.  Attached is the patch I posted
>> yesterday that fixes the ICE, successfully bootstrapped and
>> regtested on x86_64-linux.  I also ran the dg.exp=*attr* and
>> aarch64.exp tests with an aarch64-linux-elf cross-compiler.
>> There are no ICEs but there are tons of errors in the latter
>> tests because many (most?) either expect to be able to find
>> libc headers or link executables (I have not built libc for
>> aarch64).
>>
>> I'm around tomorrow but then traveling the next two weeks (with
>> no connectivity the first week) so I unfortunately won't be able
>> to fix whatever this change might break until the week of May 6.
>>
>> Jeff, if you have an aarch64 tester that could verify this patch
>> tomorrow that would help give us some confidence.  Otherwise,
>> another option to consider for the time being is to xfail
>> the tests on aarch64.
>>
>> Thanks
>> Martin
>>
>> gcc-89797.diff
>>
>> PR middle-end/89797 - ICE on a vector_size (1LU << 33) int variable
>>
>> gcc/ChangeLog:
>>
>> 	PR middle-end/89797
>> 	* tree.h (TYPE_VECTOR_SUBPARTS): Correct computation when
>> 	NUM_POLY_INT_COEFFS == 2.  Use HOST_WIDE_INT_1U.
>> 	* config/aarch64/aarch64.c (aarch64_simd_vector_alignment): Avoid
>> 	assuming type size fits in SHWI.
>>
>> Index: gcc/tree.h
>> ===================================================================
>> --- gcc/tree.h	(revision 270418)
>> +++ gcc/tree.h	(working copy)
>> @@ -3735,13 +3735,13 @@ TYPE_VECTOR_SUBPARTS (const_tree node)
>>    if (NUM_POLY_INT_COEFFS == 2)
>>      {
>>        poly_uint64 res = 0;
>> -      res.coeffs[0] = 1 << (precision & 0xff);
>> +      res.coeffs[0] = HOST_WIDE_INT_1U << (precision & 0xff);
> 
> I'm not sure I understand this either before or after.  (precision &
> 0xff) gives a value in the range 0-255, which is way more than can be
> accommodated in a left shift, even for a HWI.
> 
>>        if (precision & 0x100)
>> -	res.coeffs[1] = 1 << (precision & 0xff);
>> +	res.coeffs[1] = HOST_WIDE_INT_1U << ((precision & 0x100) >> 16);
> 
> And this equally doesn't make sense >>16 will shift the masked value out
> of the bottom of the result.
> 
> Also, if we set res.coeffs[1], don't we want res.coeffs[0] to be set to 0?

Maybe I've completely misread this, but shouldn't the code be:

  if (NUM_POLY_INT_COEFFS == 2)
    {
      poly_uint64 res = 0;
      if (precision < HOST_BITS_PER_WIDE_INT)
	{
	  res.coeffs[0] = HOST_WIDE_INT_1U << precision;
	  res.coeffs[1] = 0;
	}
      else
	{
	  res.coeffs[0] = 0;
	  res.coeffs[1] = HOST_WIDE_INT_1U << (precision
					       - HOST_BITS_PER_WIDE_INT);
	}
      return res;
    }

R.

> 
> R.
> 
>>        return res;
>>      }
>>    else
>> -    return (unsigned HOST_WIDE_INT)1 << precision;
>> +    return HOST_WIDE_INT_1U << precision;
>>  }
>>  
>>  /* Set the number of elements in VECTOR_TYPE NODE to SUBPARTS, which must
>> Index: gcc/config/aarch64/aarch64.c
>> ===================================================================
>> --- gcc/config/aarch64/aarch64.c	(revision 270418)
>> +++ gcc/config/aarch64/aarch64.c	(working copy)
>> @@ -14924,7 +14924,10 @@ aarch64_simd_vector_alignment (const_tree type)
>>         be set for non-predicate vectors of booleans.  Modes are the most
>>         direct way we have of identifying real SVE predicate types.  */
>>      return GET_MODE_CLASS (TYPE_MODE (type)) == MODE_VECTOR_BOOL ? 16 : 128;
>> -  HOST_WIDE_INT align = tree_to_shwi (TYPE_SIZE (type));
>> +  tree size = TYPE_SIZE (type);
>> +  unsigned HOST_WIDE_INT align = 128;
>> +  if (tree_fits_uhwi_p (size))
>> +    align = tree_to_uhwi (TYPE_SIZE (type));
>>    return MIN (align, 128);
>>  }
>>  
>>
>
Richard Sandiford April 18, 2019, 12:07 p.m. UTC | #3
"Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com> writes:
> On 18/04/2019 03:38, Martin Sebor wrote:
>> The fix for pr89797 committed in r270326 was limited to targets
>> with NUM_POLY_INT_COEFFS == 1 which I think is all but aarch64.
>> The tests for the fix have been failing with an ICE on aarch64
>> because it suffers from more or less the same problem but in
>> its own target-specific code.  Attached is the patch I posted
>> yesterday that fixes the ICE, successfully bootstrapped and
>> regtested on x86_64-linux.  I also ran the dg.exp=*attr* and
>> aarch64.exp tests with an aarch64-linux-elf cross-compiler.
>> There are no ICEs but there are tons of errors in the latter
>> tests because many (most?) either expect to be able to find
>> libc headers or link executables (I have not built libc for
>> aarch64).
>> 
>> I'm around tomorrow but then traveling the next two weeks (with
>> no connectivity the first week) so I unfortunately won't be able
>> to fix whatever this change might break until the week of May 6.
>> 
>> Jeff, if you have an aarch64 tester that could verify this patch
>> tomorrow that would help give us some confidence.  Otherwise,
>> another option to consider for the time being is to xfail
>> the tests on aarch64.
>> 
>> Thanks
>> Martin
>> 
>> gcc-89797.diff
>> 
>> PR middle-end/89797 - ICE on a vector_size (1LU << 33) int variable
>> 
>> gcc/ChangeLog:
>> 
>> 	PR middle-end/89797
>> 	* tree.h (TYPE_VECTOR_SUBPARTS): Correct computation when
>> 	NUM_POLY_INT_COEFFS == 2.  Use HOST_WIDE_INT_1U.
>> 	* config/aarch64/aarch64.c (aarch64_simd_vector_alignment): Avoid
>> 	assuming type size fits in SHWI.
>> 
>> Index: gcc/tree.h
>> ===================================================================
>> --- gcc/tree.h	(revision 270418)
>> +++ gcc/tree.h	(working copy)
>> @@ -3735,13 +3735,13 @@ TYPE_VECTOR_SUBPARTS (const_tree node)
>>    if (NUM_POLY_INT_COEFFS == 2)
>>      {
>>        poly_uint64 res = 0;
>> -      res.coeffs[0] = 1 << (precision & 0xff);
>> +      res.coeffs[0] = HOST_WIDE_INT_1U << (precision & 0xff);
>
> I'm not sure I understand this either before or after.  (precision &
> 0xff) gives a value in the range 0-255, which is way more than can be
> accommodated in a left shift, even for a HWI.

For the record: the problem is that we have two coefficients that are
logically both in the range 1<<[0,63], needing 6 bits per coefficient
and 12 in total.  But precision is only 10 bits. :-)  Fortunately,
coeffs[1] is in practice either 0 or equal to coeffs[0], so we can encode
it in a single bit.

A 0x100/0xff split seemed easier to optimise than a 0x40/0x3f split.
E.g. 0xff can be loaded directly from memory as a byte.  There's not
going to be much in it either way though.

>>        if (precision & 0x100)
>> -	res.coeffs[1] = 1 << (precision & 0xff);
>> +	res.coeffs[1] = HOST_WIDE_INT_1U << ((precision & 0x100) >> 16);
>
> And this equally doesn't make sense >>16 will shift the masked value out
> of the bottom of the result.

Yeah, we should keep the original shift amount and just do
s/1/HOST_WIDE_INT_1U/.

> Also, if we set res.coeffs[1], don't we want res.coeffs[0] to be set to 0?
>
> R.
>
>>        return res;
>>      }
>>    else
>> -    return (unsigned HOST_WIDE_INT)1 << precision;
>> +    return HOST_WIDE_INT_1U << precision;
>>  }
>>  
>>  /* Set the number of elements in VECTOR_TYPE NODE to SUBPARTS, which must
>> Index: gcc/config/aarch64/aarch64.c
>> ===================================================================
>> --- gcc/config/aarch64/aarch64.c	(revision 270418)
>> +++ gcc/config/aarch64/aarch64.c	(working copy)
>> @@ -14924,7 +14924,10 @@ aarch64_simd_vector_alignment (const_tree type)
>>         be set for non-predicate vectors of booleans.  Modes are the most
>>         direct way we have of identifying real SVE predicate types.  */
>>      return GET_MODE_CLASS (TYPE_MODE (type)) == MODE_VECTOR_BOOL ? 16 : 128;
>> -  HOST_WIDE_INT align = tree_to_shwi (TYPE_SIZE (type));
>> +  tree size = TYPE_SIZE (type);
>> +  unsigned HOST_WIDE_INT align = 128;
>> +  if (tree_fits_uhwi_p (size))
>> +    align = tree_to_uhwi (TYPE_SIZE (type));
>>    return MIN (align, 128);

I guess this is personal preference, sorry, but:

  return wi::umin (wi::to_wide (TYPE_SIZE (type)), 128).to_uhwi ();

seems more direct and obvious than:

  tree size = TYPE_SIZE (type);
  unsigned HOST_WIDE_INT align = 128;
  if (tree_fits_uhwi_p (size))
    align = tree_to_uhwi (TYPE_SIZE (type));
  return MIN (align, 128);

OK with those changes, thanks.  I've bootstrapped & regression-tested
it in that form on aarch64-linux-gnu.

Richard
Christophe Lyon April 18, 2019, 1:33 p.m. UTC | #4
On Thu, 18 Apr 2019 at 14:08, Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com> writes:
> > On 18/04/2019 03:38, Martin Sebor wrote:
> >> The fix for pr89797 committed in r270326 was limited to targets
> >> with NUM_POLY_INT_COEFFS == 1 which I think is all but aarch64.
> >> The tests for the fix have been failing with an ICE on aarch64
> >> because it suffers from more or less the same problem but in
> >> its own target-specific code.  Attached is the patch I posted
> >> yesterday that fixes the ICE, successfully bootstrapped and
> >> regtested on x86_64-linux.  I also ran the dg.exp=*attr* and
> >> aarch64.exp tests with an aarch64-linux-elf cross-compiler.
> >> There are no ICEs but there are tons of errors in the latter
> >> tests because many (most?) either expect to be able to find
> >> libc headers or link executables (I have not built libc for
> >> aarch64).
> >>
> >> I'm around tomorrow but then traveling the next two weeks (with
> >> no connectivity the first week) so I unfortunately won't be able
> >> to fix whatever this change might break until the week of May 6.
> >>
> >> Jeff, if you have an aarch64 tester that could verify this patch
> >> tomorrow that would help give us some confidence.  Otherwise,
> >> another option to consider for the time being is to xfail
> >> the tests on aarch64.
> >>
> >> Thanks
> >> Martin
> >>
> >> gcc-89797.diff
> >>
> >> PR middle-end/89797 - ICE on a vector_size (1LU << 33) int variable
> >>
> >> gcc/ChangeLog:
> >>
> >>      PR middle-end/89797
> >>      * tree.h (TYPE_VECTOR_SUBPARTS): Correct computation when
> >>      NUM_POLY_INT_COEFFS == 2.  Use HOST_WIDE_INT_1U.
> >>      * config/aarch64/aarch64.c (aarch64_simd_vector_alignment): Avoid
> >>      assuming type size fits in SHWI.
> >>
> >> Index: gcc/tree.h
> >> ===================================================================
> >> --- gcc/tree.h       (revision 270418)
> >> +++ gcc/tree.h       (working copy)
> >> @@ -3735,13 +3735,13 @@ TYPE_VECTOR_SUBPARTS (const_tree node)
> >>    if (NUM_POLY_INT_COEFFS == 2)
> >>      {
> >>        poly_uint64 res = 0;
> >> -      res.coeffs[0] = 1 << (precision & 0xff);
> >> +      res.coeffs[0] = HOST_WIDE_INT_1U << (precision & 0xff);
> >
> > I'm not sure I understand this either before or after.  (precision &
> > 0xff) gives a value in the range 0-255, which is way more than can be
> > accommodated in a left shift, even for a HWI.
>
> For the record: the problem is that we have two coefficients that are
> logically both in the range 1<<[0,63], needing 6 bits per coefficient
> and 12 in total.  But precision is only 10 bits. :-)  Fortunately,
> coeffs[1] is in practice either 0 or equal to coeffs[0], so we can encode
> it in a single bit.
>
> A 0x100/0xff split seemed easier to optimise than a 0x40/0x3f split.
> E.g. 0xff can be loaded directly from memory as a byte.  There's not
> going to be much in it either way though.
>
> >>        if (precision & 0x100)
> >> -    res.coeffs[1] = 1 << (precision & 0xff);
> >> +    res.coeffs[1] = HOST_WIDE_INT_1U << ((precision & 0x100) >> 16);
> >
> > And this equally doesn't make sense >>16 will shift the masked value out
> > of the bottom of the result.
>
> Yeah, we should keep the original shift amount and just do
> s/1/HOST_WIDE_INT_1U/.
>
> > Also, if we set res.coeffs[1], don't we want res.coeffs[0] to be set to 0?
> >
> > R.
> >
> >>        return res;
> >>      }
> >>    else
> >> -    return (unsigned HOST_WIDE_INT)1 << precision;
> >> +    return HOST_WIDE_INT_1U << precision;
> >>  }
> >>
> >>  /* Set the number of elements in VECTOR_TYPE NODE to SUBPARTS, which must
> >> Index: gcc/config/aarch64/aarch64.c
> >> ===================================================================
> >> --- gcc/config/aarch64/aarch64.c     (revision 270418)
> >> +++ gcc/config/aarch64/aarch64.c     (working copy)
> >> @@ -14924,7 +14924,10 @@ aarch64_simd_vector_alignment (const_tree type)
> >>         be set for non-predicate vectors of booleans.  Modes are the most
> >>         direct way we have of identifying real SVE predicate types.  */
> >>      return GET_MODE_CLASS (TYPE_MODE (type)) == MODE_VECTOR_BOOL ? 16 : 128;
> >> -  HOST_WIDE_INT align = tree_to_shwi (TYPE_SIZE (type));
> >> +  tree size = TYPE_SIZE (type);
> >> +  unsigned HOST_WIDE_INT align = 128;
> >> +  if (tree_fits_uhwi_p (size))
> >> +    align = tree_to_uhwi (TYPE_SIZE (type));
> >>    return MIN (align, 128);
>
> I guess this is personal preference, sorry, but:
>
>   return wi::umin (wi::to_wide (TYPE_SIZE (type)), 128).to_uhwi ();
>
> seems more direct and obvious than:
>
>   tree size = TYPE_SIZE (type);
>   unsigned HOST_WIDE_INT align = 128;
>   if (tree_fits_uhwi_p (size))
>     align = tree_to_uhwi (TYPE_SIZE (type));
>   return MIN (align, 128);
>
> OK with those changes, thanks.  I've bootstrapped & regression-tested
> it in that form on aarch64-linux-gnu.
>

I'm late to the party, and I confirm that Martin's original patch
introduced regressions:
  Executed from: gcc.target/aarch64/aarch64.exp
    gcc.target/aarch64/stack-check-cfa-3.c scan-assembler-times \\.cfi_def_cfa_register 11 1
    gcc.target/aarch64/stack-check-cfa-3.c scan-assembler-times \\.cfi_escape 0xf,0xc,0x8f,0,0x92,0x2e,0,.* 1
    gcc.target/aarch64/stack-check-cfa-3.c scan-assembler-times mov\\tx11, sp 1
    gcc.target/aarch64/stack-check-prologue-16.c scan-assembler-times cmp\\s+x[0-9]+, 61440 1
    gcc.target/aarch64/stack-check-prologue-16.c scan-assembler-times str\\s+xzr, \\[sp, 0\\] 1
    gcc.target/aarch64/stack-check-prologue-16.c scan-assembler-times sub\\s+x[0-9]+, x[0-9]+, 61440 1
and many in gcc.target/aarch64/sve/aarch64-sve.exp

Christophe


> Richard
Martin Sebor April 18, 2019, 3:06 p.m. UTC | #5
On 4/18/19 6:07 AM, Richard Sandiford wrote:
> "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com> writes:
>> On 18/04/2019 03:38, Martin Sebor wrote:
>>> The fix for pr89797 committed in r270326 was limited to targets
>>> with NUM_POLY_INT_COEFFS == 1 which I think is all but aarch64.
>>> The tests for the fix have been failing with an ICE on aarch64
>>> because it suffers from more or less the same problem but in
>>> its own target-specific code.  Attached is the patch I posted
>>> yesterday that fixes the ICE, successfully bootstrapped and
>>> regtested on x86_64-linux.  I also ran the dg.exp=*attr* and
>>> aarch64.exp tests with an aarch64-linux-elf cross-compiler.
>>> There are no ICEs but there are tons of errors in the latter
>>> tests because many (most?) either expect to be able to find
>>> libc headers or link executables (I have not built libc for
>>> aarch64).
>>>
>>> I'm around tomorrow but then traveling the next two weeks (with
>>> no connectivity the first week) so I unfortunately won't be able
>>> to fix whatever this change might break until the week of May 6.
>>>
>>> Jeff, if you have an aarch64 tester that could verify this patch
>>> tomorrow that would help give us some confidence.  Otherwise,
>>> another option to consider for the time being is to xfail
>>> the tests on aarch64.
>>>
>>> Thanks
>>> Martin
>>>
>>> gcc-89797.diff
>>>
>>> PR middle-end/89797 - ICE on a vector_size (1LU << 33) int variable
>>>
>>> gcc/ChangeLog:
>>>
>>> 	PR middle-end/89797
>>> 	* tree.h (TYPE_VECTOR_SUBPARTS): Correct computation when
>>> 	NUM_POLY_INT_COEFFS == 2.  Use HOST_WIDE_INT_1U.
>>> 	* config/aarch64/aarch64.c (aarch64_simd_vector_alignment): Avoid
>>> 	assuming type size fits in SHWI.
>>>
>>> Index: gcc/tree.h
>>> ===================================================================
>>> --- gcc/tree.h	(revision 270418)
>>> +++ gcc/tree.h	(working copy)
>>> @@ -3735,13 +3735,13 @@ TYPE_VECTOR_SUBPARTS (const_tree node)
>>>     if (NUM_POLY_INT_COEFFS == 2)
>>>       {
>>>         poly_uint64 res = 0;
>>> -      res.coeffs[0] = 1 << (precision & 0xff);
>>> +      res.coeffs[0] = HOST_WIDE_INT_1U << (precision & 0xff);
>>
>> I'm not sure I understand this either before or after.  (precision &
>> 0xff) gives a value in the range 0-255, which is way more than can be
>> accommodated in a left shift, even for a HWI.
> 
> For the record: the problem is that we have two coefficients that are
> logically both in the range 1<<[0,63], needing 6 bits per coefficient
> and 12 in total.  But precision is only 10 bits. :-)  Fortunately,
> coeffs[1] is in practice either 0 or equal to coeffs[0], so we can encode
> it in a single bit.
> 
> A 0x100/0xff split seemed easier to optimise than a 0x40/0x3f split.
> E.g. 0xff can be loaded directly from memory as a byte.  There's not
> going to be much in it either way though.
> 
>>>         if (precision & 0x100)
>>> -	res.coeffs[1] = 1 << (precision & 0xff);
>>> +	res.coeffs[1] = HOST_WIDE_INT_1U << ((precision & 0x100) >> 16);
>>
>> And this equally doesn't make sense >>16 will shift the masked value out
>> of the bottom of the result.
> 
> Yeah, we should keep the original shift amount and just do
> s/1/HOST_WIDE_INT_1U/.

Sorry, I confused things with the 16-bit shift thinko (I meant
to shift by 8 bits).  But I don't understand how poly_int works
at all, so the shift was just a wild guess, thinking the purpose
of the split was to allow for precisions in excess of (1 << 63).

But I'm afraid I still don't understand when (precision & 0x100)
could ever hold.  I put in an abort in that branch and ran
the test suite but didn't manage to trigger an ICE.  Can you
give me an example when it might trigger?  (I'd also like to
add a comment to the function to explain it.)

The test case I'm fixing is:

  attribute ((vector_size (1LLU << 62))) char v;

Greater values are rejected with:

   error: ‘vector_size’ attribute argument value ‘9223372036854775808’ 
exceeds 9223372036854775807

>> Also, if we set res.coeffs[1], don't we want res.coeffs[0] to be set to 0?
>>
>> R.
>>
>>>         return res;
>>>       }
>>>     else
>>> -    return (unsigned HOST_WIDE_INT)1 << precision;
>>> +    return HOST_WIDE_INT_1U << precision;
>>>   }
>>>   
>>>   /* Set the number of elements in VECTOR_TYPE NODE to SUBPARTS, which must
>>> Index: gcc/config/aarch64/aarch64.c
>>> ===================================================================
>>> --- gcc/config/aarch64/aarch64.c	(revision 270418)
>>> +++ gcc/config/aarch64/aarch64.c	(working copy)
>>> @@ -14924,7 +14924,10 @@ aarch64_simd_vector_alignment (const_tree type)
>>>          be set for non-predicate vectors of booleans.  Modes are the most
>>>          direct way we have of identifying real SVE predicate types.  */
>>>       return GET_MODE_CLASS (TYPE_MODE (type)) == MODE_VECTOR_BOOL ? 16 : 128;
>>> -  HOST_WIDE_INT align = tree_to_shwi (TYPE_SIZE (type));
>>> +  tree size = TYPE_SIZE (type);
>>> +  unsigned HOST_WIDE_INT align = 128;
>>> +  if (tree_fits_uhwi_p (size))
>>> +    align = tree_to_uhwi (TYPE_SIZE (type));
>>>     return MIN (align, 128);
> 
> I guess this is personal preference, sorry, but:
> 
>    return wi::umin (wi::to_wide (TYPE_SIZE (type)), 128).to_uhwi ();
> 
> seems more direct and obvious than:
> 
>    tree size = TYPE_SIZE (type);
>    unsigned HOST_WIDE_INT align = 128;
>    if (tree_fits_uhwi_p (size))
>      align = tree_to_uhwi (TYPE_SIZE (type));
>    return MIN (align, 128);
> 
> OK with those changes, thanks.  I've bootstrapped & regression-tested
> it in that form on aarch64-linux-gnu.

Okay.  I will hold off on committing until I hear back from you
on the question above.

Thanks
Martin
Richard Sandiford April 18, 2019, 3:59 p.m. UTC | #6
Martin Sebor <msebor@gmail.com> writes:
> On 4/18/19 6:07 AM, Richard Sandiford wrote:
>> "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com> writes:
>>> On 18/04/2019 03:38, Martin Sebor wrote:
>>>> The fix for pr89797 committed in r270326 was limited to targets
>>>> with NUM_POLY_INT_COEFFS == 1 which I think is all but aarch64.
>>>> The tests for the fix have been failing with an ICE on aarch64
>>>> because it suffers from more or less the same problem but in
>>>> its own target-specific code.  Attached is the patch I posted
>>>> yesterday that fixes the ICE, successfully bootstrapped and
>>>> regtested on x86_64-linux.  I also ran the dg.exp=*attr* and
>>>> aarch64.exp tests with an aarch64-linux-elf cross-compiler.
>>>> There are no ICEs but there are tons of errors in the latter
>>>> tests because many (most?) either expect to be able to find
>>>> libc headers or link executables (I have not built libc for
>>>> aarch64).
>>>>
>>>> I'm around tomorrow but then traveling the next two weeks (with
>>>> no connectivity the first week) so I unfortunately won't be able
>>>> to fix whatever this change might break until the week of May 6.
>>>>
>>>> Jeff, if you have an aarch64 tester that could verify this patch
>>>> tomorrow that would help give us some confidence.  Otherwise,
>>>> another option to consider for the time being is to xfail
>>>> the tests on aarch64.
>>>>
>>>> Thanks
>>>> Martin
>>>>
>>>> gcc-89797.diff
>>>>
>>>> PR middle-end/89797 - ICE on a vector_size (1LU << 33) int variable
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> 	PR middle-end/89797
>>>> 	* tree.h (TYPE_VECTOR_SUBPARTS): Correct computation when
>>>> 	NUM_POLY_INT_COEFFS == 2.  Use HOST_WIDE_INT_1U.
>>>> 	* config/aarch64/aarch64.c (aarch64_simd_vector_alignment): Avoid
>>>> 	assuming type size fits in SHWI.
>>>>
>>>> Index: gcc/tree.h
>>>> ===================================================================
>>>> --- gcc/tree.h	(revision 270418)
>>>> +++ gcc/tree.h	(working copy)
>>>> @@ -3735,13 +3735,13 @@ TYPE_VECTOR_SUBPARTS (const_tree node)
>>>>     if (NUM_POLY_INT_COEFFS == 2)
>>>>       {
>>>>         poly_uint64 res = 0;
>>>> -      res.coeffs[0] = 1 << (precision & 0xff);
>>>> +      res.coeffs[0] = HOST_WIDE_INT_1U << (precision & 0xff);
>>>
>>> I'm not sure I understand this either before or after.  (precision &
>>> 0xff) gives a value in the range 0-255, which is way more than can be
>>> accommodated in a left shift, even for a HWI.
>> 
>> For the record: the problem is that we have two coefficients that are
>> logically both in the range 1<<[0,63], needing 6 bits per coefficient
>> and 12 in total.  But precision is only 10 bits. :-)  Fortunately,
>> coeffs[1] is in practice either 0 or equal to coeffs[0], so we can encode
>> it in a single bit.
>> 
>> A 0x100/0xff split seemed easier to optimise than a 0x40/0x3f split.
>> E.g. 0xff can be loaded directly from memory as a byte.  There's not
>> going to be much in it either way though.
>> 
>>>>         if (precision & 0x100)
>>>> -	res.coeffs[1] = 1 << (precision & 0xff);
>>>> +	res.coeffs[1] = HOST_WIDE_INT_1U << ((precision & 0x100) >> 16);
>>>
>>> And this equally doesn't make sense >>16 will shift the masked value out
>>> of the bottom of the result.
>> 
>> Yeah, we should keep the original shift amount and just do
>> s/1/HOST_WIDE_INT_1U/.
>
> Sorry, I confused things with the 16 bit shift thinko (I meant
> to shift by 8 bits).  But I don't understand how poly_int works
> at all so the shift was just a wild guess thinking the purpose
> of the split was to allow for precisions in excess of (1 << 63).
>
> But I'm afraid I still don't understand when (precision & 0x100)
> could ever hold.  I put in an abort in that branch and ran
> the test suite but didn't manage to trigger an ICE.  Can you
> give me an example when it might trigger?  (I'd also like to
> add a comment to the function to explain it.)

The function is decoding the precision installed by:

inline void
SET_TYPE_VECTOR_SUBPARTS (tree node, poly_uint64 subparts)
{
  STATIC_ASSERT (NUM_POLY_INT_COEFFS <= 2);
  unsigned HOST_WIDE_INT coeff0 = subparts.coeffs[0];
  int index = exact_log2 (coeff0);
  gcc_assert (index >= 0);
  if (NUM_POLY_INT_COEFFS == 2)
    {
      unsigned HOST_WIDE_INT coeff1 = subparts.coeffs[1];
      gcc_assert (coeff1 == 0 || coeff1 == coeff0);
      VECTOR_TYPE_CHECK (node)->type_common.precision
	= index + (coeff1 != 0 ? 0x100 : 0);
    }
  else
    VECTOR_TYPE_CHECK (node)->type_common.precision = index;
}

Code protected by NUM_POLY_INT_COEFFS == 2 will only trigger on
AArch64 targets, so an x86 test run won't cover it.  The assert you
added would blow up on pretty much all of aarch64-sve.exp in an AArch64
test run though; see Christophe's results for what happened with the
original patch.

I certainly don't mind adding more commentary, but please keep that
separate from the PR fix, which is simply about changing "1 <<" to
"HOST_WIDE_INT_1U <<".  I think we went down a bit of a rabbit hole
with this encoding question. :-)

FWIW, the layout of "precision" is fundamental to the handling of length-
agnostic vector types and is covered by all SVE autovectorisation tests,
so I'm pretty confident that we're reading "precision" correctly.  (But of
course, both arms have the bug you're fixing, in which we were shifting
an int instead of a UHWI.)

Thanks,
Richard
Martin Sebor April 18, 2019, 7:30 p.m. UTC | #7
On 4/18/19 9:59 AM, Richard Sandiford wrote:
> Martin Sebor <msebor@gmail.com> writes:
>> On 4/18/19 6:07 AM, Richard Sandiford wrote:
>>> "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com> writes:
>>>> On 18/04/2019 03:38, Martin Sebor wrote:
>>>>> The fix for pr89797 committed in r270326 was limited to targets
>>>>> with NUM_POLY_INT_COEFFS == 1 which I think is all but aarch64.
>>>>> The tests for the fix have been failing with an ICE on aarch64
>>>>> because it suffers from more or less the same problem but in
>>>>> its own target-specific code.  Attached is the patch I posted
>>>>> yesterday that fixes the ICE, successfully bootstrapped and
>>>>> regtested on x86_64-linux.  I also ran the dg.exp=*attr* and
>>>>> aarch64.exp tests with an aarch64-linux-elf cross-compiler.
>>>>> There are no ICEs but there are tons of errors in the latter
>>>>> tests because many (most?) either expect to be able to find
>>>>> libc headers or link executables (I have not built libc for
>>>>> aarch64).
>>>>>
>>>>> I'm around tomorrow but then traveling the next two weeks (with
>>>>> no connectivity the first week) so I unfortunately won't be able
>>>>> to fix whatever this change might break until the week of May 6.
>>>>>
>>>>> Jeff, if you have an aarch64 tester that could verify this patch
>>>>> tomorrow that would help give us some confidence.  Otherwise,
>>>>> another option to consider for the time being is to xfail
>>>>> the tests on aarch64.
>>>>>
>>>>> Thanks
>>>>> Martin
>>>>>
>>>>> gcc-89797.diff
>>>>>
>>>>> PR middle-end/89797 - ICE on a vector_size (1LU << 33) int variable
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>> 	PR middle-end/89797
>>>>> 	* tree.h (TYPE_VECTOR_SUBPARTS): Correct computation when
>>>>> 	NUM_POLY_INT_COEFFS == 2.  Use HOST_WIDE_INT_1U.
>>>>> 	* config/aarch64/aarch64.c (aarch64_simd_vector_alignment): Avoid
>>>>> 	assuming type size fits in SHWI.
>>>>>
>>>>> Index: gcc/tree.h
>>>>> ===================================================================
>>>>> --- gcc/tree.h	(revision 270418)
>>>>> +++ gcc/tree.h	(working copy)
>>>>> @@ -3735,13 +3735,13 @@ TYPE_VECTOR_SUBPARTS (const_tree node)
>>>>>      if (NUM_POLY_INT_COEFFS == 2)
>>>>>        {
>>>>>          poly_uint64 res = 0;
>>>>> -      res.coeffs[0] = 1 << (precision & 0xff);
>>>>> +      res.coeffs[0] = HOST_WIDE_INT_1U << (precision & 0xff);
>>>>
>>>> I'm not sure I understand this either before or after.  (precision &
>>>> 0xff) gives a value in the range 0-255, which is way more than can be
>>>> accommodated in a left shift, even for a HWI.
>>>
>>> For the record: the problem is that we have two coefficients that are
>>> logically both in the range 1<<[0,63], needing 6 bits per coefficient
>>> and 12 in total.  But precision is only 10 bits. :-)  Fortunately,
>>> coeffs[1] is in practice either 0 or equal to coeffs[0], so we can encode
>>> it in a single bit.
>>>
>>> A 0x100/0xff split seemed easier to optimise than a 0x40/0x3f split.
>>> E.g. 0xff can be loaded directly from memory as a byte.  There's not
>>> going to be much in it either way though.
>>>
>>>>>          if (precision & 0x100)
>>>>> -	res.coeffs[1] = 1 << (precision & 0xff);
>>>>> +	res.coeffs[1] = HOST_WIDE_INT_1U << ((precision & 0x100) >> 16);
>>>>
>>>> And this equally doesn't make sense >>16 will shift the masked value out
>>>> of the bottom of the result.
>>>
>>> Yeah, we should keep the original shift amount and just do
>>> s/1/HOST_WIDE_INT_1U/.
>>
>> Sorry, I confused things with the 16 bit shift thinko (I meant
>> to shift by 8 bits).  But I don't understand how poly_int works
>> at all so the shift was just a wild guess thinking the purpose
>> of the split was to allow for precisions in excess of (1 << 63).
>>
>> But I'm afraid I still don't understand when (precision & 0x100)
>> could ever hold.  I put in an abort in that branch and ran
>> the test suite but didn't manage to trigger an ICE.  Can you
>> give me an example when it might trigger?  (I'd also like to
>> add a comment to the function to explain it.)
> 
> The function is decoding the precision installed by:
> 
> inline void
> SET_TYPE_VECTOR_SUBPARTS (tree node, poly_uint64 subparts)
> {
>    STATIC_ASSERT (NUM_POLY_INT_COEFFS <= 2);
>    unsigned HOST_WIDE_INT coeff0 = subparts.coeffs[0];
>    int index = exact_log2 (coeff0);
>    gcc_assert (index >= 0);
>    if (NUM_POLY_INT_COEFFS == 2)
>      {
>        unsigned HOST_WIDE_INT coeff1 = subparts.coeffs[1];
>        gcc_assert (coeff1 == 0 || coeff1 == coeff0);
>        VECTOR_TYPE_CHECK (node)->type_common.precision
> 	= index + (coeff1 != 0 ? 0x100 : 0);

Ugh.  It never occurred to me that precision was being used like
this.  Even though the functions are one right after the other,
a brief comment would have helped avoid this misunderstanding.
Let me propose one separately.

>      }
>    else
>      VECTOR_TYPE_CHECK (node)->type_common.precision = index;
> }
> 
> Code protected by NUM_POLY_INT_COEFFS == 2 will only trigger on
> AArch64 targets, so an x86 test run won't cover it.  The assert you
> added would blow up on pretty much all of aarch64-sve.exp in an AArch64
> test run though; see Christophe's results for what happened with the
> original patch.

Ah!  aarch64-sve.exp explains it.  I only ran aarch64.exp.  There,
there's just one test that triggers an ICE (I missed it at first):
   aarch64/stack-check-prologue-16.c
But many of the aarch64-sve.exp tests do fail with the abort there.

> 
> I certainly don't mind adding more commentary, but please keep that
> separate from the PR fix, which is simply about changing "1 <<" to
> "HOST_WIDE_INT_1U <<".  I think we went down a bit of a rabbit hole
> with this encoding question. :-)

Sorry, I was just baffled by the code and trying to understand
how it works.

I will commit the attached revision shortly.

Thanks for your help!

Martin

> 
> FWIW, the layout of "precision" is fundamental to the handling of length-
> agnostic vector types and is covered by all SVE autovectorisation tests,
> so I'm pretty confident that we're reading "precision" correctly.  (But of
> course, both arms have the bug you're fixing, in which we were shifting
> an int instead of a UHWI.)
> 
> Thanks,
> Richard
>
PR middle-end/89797 - ICE on a vector_size (1LU << 33) int variable

gcc/ChangeLog:

	PR middle-end/89797
	* tree.h (TYPE_VECTOR_SUBPARTS): Use HOST_WIDE_INT_1U.
	* config/aarch64/aarch64.c (aarch64_simd_vector_alignment): Avoid
	assuming type size fits in SHWI.

Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 270418)
+++ gcc/tree.h	(working copy)
@@ -3735,13 +3735,13 @@ TYPE_VECTOR_SUBPARTS (const_tree node)
   if (NUM_POLY_INT_COEFFS == 2)
     {
       poly_uint64 res = 0;
-      res.coeffs[0] = 1 << (precision & 0xff);
+      res.coeffs[0] = HOST_WIDE_INT_1U << (precision & 0xff);
       if (precision & 0x100)
-	res.coeffs[1] = 1 << (precision & 0xff);
+	res.coeffs[1] = HOST_WIDE_INT_1U << (precision & 0xff);
       return res;
     }
   else
-    return (unsigned HOST_WIDE_INT)1 << precision;
+    return HOST_WIDE_INT_1U << precision;
 }
 
 /* Set the number of elements in VECTOR_TYPE NODE to SUBPARTS, which must
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c	(revision 270418)
+++ gcc/config/aarch64/aarch64.c	(working copy)
@@ -14924,8 +14924,7 @@ aarch64_simd_vector_alignment (const_tree type)
        be set for non-predicate vectors of booleans.  Modes are the most
        direct way we have of identifying real SVE predicate types.  */
     return GET_MODE_CLASS (TYPE_MODE (type)) == MODE_VECTOR_BOOL ? 16 : 128;
-  HOST_WIDE_INT align = tree_to_shwi (TYPE_SIZE (type));
-  return MIN (align, 128);
+  return wi::umin (wi::to_wide (TYPE_SIZE (type)), 128).to_uhwi ();
 }
 
 /* Implement target hook TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT.  */

Patch

PR middle-end/89797 - ICE on a vector_size (1LU << 33) int variable

gcc/ChangeLog:

	PR middle-end/89797
	* tree.h (TYPE_VECTOR_SUBPARTS): Correct computation when
	NUM_POLY_INT_COEFFS == 2.  Use HOST_WIDE_INT_1U.
	* config/aarch64/aarch64.c (aarch64_simd_vector_alignment): Avoid
	assuming type size fits in SHWI.

Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 270418)
+++ gcc/tree.h	(working copy)
@@ -3735,13 +3735,13 @@  TYPE_VECTOR_SUBPARTS (const_tree node)
   if (NUM_POLY_INT_COEFFS == 2)
     {
       poly_uint64 res = 0;
-      res.coeffs[0] = 1 << (precision & 0xff);
+      res.coeffs[0] = HOST_WIDE_INT_1U << (precision & 0xff);
       if (precision & 0x100)
-	res.coeffs[1] = 1 << (precision & 0xff);
+	res.coeffs[1] = HOST_WIDE_INT_1U << ((precision & 0x100) >> 16);
       return res;
     }
   else
-    return (unsigned HOST_WIDE_INT)1 << precision;
+    return HOST_WIDE_INT_1U << precision;
 }
 
 /* Set the number of elements in VECTOR_TYPE NODE to SUBPARTS, which must
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c	(revision 270418)
+++ gcc/config/aarch64/aarch64.c	(working copy)
@@ -14924,7 +14924,10 @@  aarch64_simd_vector_alignment (const_tree type)
        be set for non-predicate vectors of booleans.  Modes are the most
        direct way we have of identifying real SVE predicate types.  */
     return GET_MODE_CLASS (TYPE_MODE (type)) == MODE_VECTOR_BOOL ? 16 : 128;
-  HOST_WIDE_INT align = tree_to_shwi (TYPE_SIZE (type));
+  tree size = TYPE_SIZE (type);
+  unsigned HOST_WIDE_INT align = 128;
+  if (tree_fits_uhwi_p (size))
+    align = tree_to_uhwi (TYPE_SIZE (type));
   return MIN (align, 128);
 }