Rework subreg_get_info

Message ID	87oa1g1ysh.fsf@e105548-lin.cambridge.arm.com
State	New
Headers	show Return-Path: <gcc-patches-return-441502-incoming=patchwork.ozlabs.org@gcc.gnu.org> DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:subject:date:message-id:mime-version:content-type; q=dns; s= default; b=TVF0dX1n3bvxKxpUuDOGsatSqgFtTS1BtMNMWU4qyT82ThFT4Xgy4 5LUDA3ebLUHjAso0SbyAzsOPKtLbgytQo2ULWzTqsP4aigpBg6xMYrRpCQ5mymhB AJA3H34LPSPzNBTvwtuN95BI/0Yf/xQfs3kS8/Z+0I8fk6jHX+2D6Q= Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk Sender: gcc-patches-owner@gcc.gnu.org From: Richard Sandiford <richard.sandiford@arm.com> To: gcc-patches@gcc.gnu.org Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com Subject: Rework subreg_get_info Date: Tue, 15 Nov 2016 16:41:02 +0000 Message-ID: <87oa1g1ysh.fsf@e105548-lin.cambridge.arm.com> User-Agent: Gnus/5.130012 (Ma Gnus v0.12) Emacs/24.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain

Richard Sandiford Nov. 15, 2016, 4:41 p.m. UTC

This isn't intended to change the behaviour, just rewrite the
existing logic in a different (and hopefully clearer) way.
The new form -- particularly the part based on the "block"
concept -- is easier to convert to polynomial sizes.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  <richard.sandiford@arm.com>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

	* rtlanal.c (subreg_get_info): Use more local variables.
	Remark that for HARD_REGNO_NREGS_HAS_PADDING, each scalar unit
	occupies at least one register.  Use byte_lowpart_offset to
	check for big-endian offsets unless REG_WORDS_BIG_ENDIAN !=
	WORDS_BIG_ENDIAN.  Share previously-duplicated if block.
	Rework the main handling so that it operates on independently-
	addressable YMODE-sized blocks.  Use subreg_size_lowpart_offset
	to check lowpart offsets, without trying to find an equivalent
	integer mode first.  Handle WORDS_BIG_ENDIAN != REG_WORDS_BIG_ENDIAN
	as a final register-endianness correction.

Richard Sandiford Nov. 15, 2016, 4:47 p.m. UTC | #1

Richard Sandiford <richard.sandiford@arm.com> writes:
> This isn't intended to change the behaviour, just rewrite the
> existing logic in a different (and hopefully clearer) way.
> The new form -- particularly the part based on the "block"
> concept -- is easier to convert to polynomial sizes.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Sorry, I should have said: this was also tested by compiling the
testsuite before and after the change at -O2 -ftree-vectorize on:

    aarch64-linux-gnueabi alpha-linux-gnu arc-elf arm-linux-gnueabi
    arm-linux-gnueabihf avr-elf bfin-elf c6x-elf cr16-elf cris-elf
    epiphany-elf fr30-elf frv-linux-gnu ft32-elf h8300-elf
    hppa64-hp-hpux11.23 ia64-linux-gnu i686-pc-linux-gnu
    i686-apple-darwin iq2000-elf lm32-elf m32c-elf m32r-elf
    m68k-linux-gnu mcore-elf microblaze-elf mips-linux-gnu
    mipsisa64-linux-gnu mmix mn10300-elf moxie-rtems msp430-elf
    nds32le-elf nios2-linux-gnu nvptx-none pdp11 powerpc-linux-gnu
    powerpc-eabispe powerpc64-linux-gnu powerpc-ibm-aix7.0 rl78-elf
    rx-elf s390-linux-gnu s390x-linux-gnu sh-linux-gnu sparc-linux-gnu
    sparc64-linux-gnu sparc-wrs-vxworks spu-elf tilegx-elf tilepro-elf
    xstormy16-elf v850-elf vax-netbsdelf visium-elf x86_64-darwin
    x86_64-linux-gnu xtensa-elf

There were no differences in assembly output.

Thanks,
Richard

Joseph Myers Nov. 16, 2016, 2:44 a.m. UTC | #2

On Tue, 15 Nov 2016, Richard Sandiford wrote:

> Richard Sandiford <richard.sandiford@arm.com> writes:
> > This isn't intended to change the behaviour, just rewrite the
> > existing logic in a different (and hopefully clearer) way.
> > The new form -- particularly the part based on the "block"
> > concept -- is easier to convert to polynomial sizes.
> >
> > Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
> 
> Sorry, I should have said: this was also tested by compiling the
> testsuite before and after the change at -O2 -ftree-vectorize on:
> 
>     aarch64-linux-gnueabi alpha-linux-gnu arc-elf arm-linux-gnueabi
>     arm-linux-gnueabihf avr-elf bfin-elf c6x-elf cr16-elf cris-elf
>     epiphany-elf fr30-elf frv-linux-gnu ft32-elf h8300-elf
>     hppa64-hp-hpux11.23 ia64-linux-gnu i686-pc-linux-gnu
>     i686-apple-darwin iq2000-elf lm32-elf m32c-elf m32r-elf
>     m68k-linux-gnu mcore-elf microblaze-elf mips-linux-gnu
>     mipsisa64-linux-gnu mmix mn10300-elf moxie-rtems msp430-elf
>     nds32le-elf nios2-linux-gnu nvptx-none pdp11 powerpc-linux-gnu
>     powerpc-eabispe powerpc64-linux-gnu powerpc-ibm-aix7.0 rl78-elf

e500 double (both DFmode and TFmode) was the case that motivated the 
original creation of subreg_get_info.  I think powerpc-linux-gnuspe 
--enable-e500-double would be a good case for verifying the patch doesn't 
change generated code.  (Though when I tried building it from mainline 
sources last week I got an ICE in LRA building __multc3, so testing it 
might be problematic at present.)

Eric Botcazou Nov. 16, 2016, 10:20 a.m. UTC | #3

> This isn't intended to change the behaviour, just rewrite the
> existing logic in a different (and hopefully clearer) way.

Yes, I agree that it's an improvement.  A few remarks below.

> diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
> index ca6cced..7c0acf5 100644
> --- a/gcc/rtlanal.c
> +++ b/gcc/rtlanal.c
> @@ -3601,31 +3601,28 @@ subreg_get_info (unsigned int xregno, machine_mode
> xmode, unsigned int offset, machine_mode ymode,
>  		 struct subreg_info *info)
>  {
> -  int nregs_xmode, nregs_ymode;
> -  int mode_multiple, nregs_multiple;
> -  int offset_adj, y_offset, y_offset_adj;
> -  int regsize_xmode, regsize_ymode;
> -  bool rknown;
> +  unsigned int nregs_xmode, nregs_ymode;
> 
>    gcc_assert (xregno < FIRST_PSEUDO_REGISTER);
> 
> -  rknown = false;
> +  unsigned int xsize = GET_MODE_SIZE (xmode);
> +  unsigned int ysize = GET_MODE_SIZE (ymode);
> +  bool rknown = false;
> 
>    /* If there are holes in a non-scalar mode in registers, we expect
> -     that it is made up of its units concatenated together.  */
> +     that it is made up of its units concatenated together.  Each scalar
> +     unit occupies at least one register.  */

Why using "scalar unit" while the first sentence uses "unit"?  Are they 
different units?  What's the status of the new sentence?  Assertion?  
Expectation?  I'd also make the first sentence grammatical.

>    if (HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode))
>      {
> -      machine_mode xmode_unit;
> -
>        nregs_xmode = HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode);
> -      xmode_unit = GET_MODE_INNER (xmode);
> +      unsigned int nunits = GET_MODE_NUNITS (xmode);
> +      machine_mode xmode_unit = GET_MODE_INNER (xmode);
>        gcc_assert (HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode_unit));
>        gcc_assert (nregs_xmode
> -		  == (GET_MODE_NUNITS (xmode)
> +		  == (nunits
>  		      * HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode_unit)));
>        gcc_assert (hard_regno_nregs[xregno][xmode]
> -		  == (hard_regno_nregs[xregno][xmode_unit]
> -		      * GET_MODE_NUNITS (xmode)));
> +		  == hard_regno_nregs[xregno][xmode_unit] * nunits);
> 
>        /* You can only ask for a SUBREG of a value with holes in the middle
>  	 if you don't cross the holes.  (Such a SUBREG should be done by
> @@ -3635,11 +3632,9 @@ subreg_get_info (unsigned int xregno, machine_mode
> xmode, 3 for each part, but in memory it's two 128-bit parts.
>  	 Padding is assumed to be at the end (not necessarily the 'high part')
>  	 of each unit.  */
> -      if ((offset / GET_MODE_SIZE (xmode_unit) + 1
> -	   < GET_MODE_NUNITS (xmode))
> +      if ((offset / GET_MODE_SIZE (xmode_unit) + 1 < nunits)
>  	  && (offset / GET_MODE_SIZE (xmode_unit)
> -	      != ((offset + GET_MODE_SIZE (ymode) - 1)
> -		  / GET_MODE_SIZE (xmode_unit))))
> +	      != ((offset + ysize - 1) / GET_MODE_SIZE (xmode_unit))))
>  	{
>  	  info->representable_p = false;
>  	  rknown = true;

This part is OK.

> @@ -3651,18 +3646,17 @@ subreg_get_info (unsigned int xregno, machine_mode
> xmode, nregs_ymode = hard_regno_nregs[xregno][ymode];
> 
>    /* Paradoxical subregs are otherwise valid.  */
> -  if (!rknown
> -      && offset == 0
> -      && GET_MODE_PRECISION (ymode) > GET_MODE_PRECISION (xmode))
> +  if (!rknown && offset == 0 && ysize > xsize)
>      {
>        info->representable_p = true;
>        /* If this is a big endian paradoxical subreg, which uses more
>  	 actual hard registers than the original register, we must
>  	 return a negative offset so that we find the proper highpart
>  	 of the register.  */
> -      if (GET_MODE_SIZE (ymode) > UNITS_PER_WORD
> -	  ? REG_WORDS_BIG_ENDIAN : BYTES_BIG_ENDIAN)
> -	info->offset = nregs_xmode - nregs_ymode;
> +      if (REG_WORDS_BIG_ENDIAN != WORDS_BIG_ENDIAN && ysize >
> UNITS_PER_WORD +	  ? REG_WORDS_BIG_ENDIAN
> +	  : byte_lowpart_offset (ymode, xmode) != 0)
> +	info->offset = (int) nregs_xmode - (int) nregs_ymode;
>        else
>  	info->offset = 0;
>        info->nregs = nregs_ymode;

This part is not OK, it turns straightforward code into convoluted code.

> @@ -3673,31 +3667,23 @@ subreg_get_info (unsigned int xregno, machine_mode
> xmode, modes, we cannot generally form this subreg.  */
>    if (!HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode)
>        && !HARD_REGNO_NREGS_HAS_PADDING (xregno, ymode)
> -      && (GET_MODE_SIZE (xmode) % nregs_xmode) == 0
> -      && (GET_MODE_SIZE (ymode) % nregs_ymode) == 0)
> +      && (xsize % nregs_xmode) == 0
> +      && (ysize % nregs_ymode) == 0)
>      {
> -      regsize_xmode = GET_MODE_SIZE (xmode) / nregs_xmode;
> -      regsize_ymode = GET_MODE_SIZE (ymode) / nregs_ymode;
> -      if (!rknown && regsize_xmode > regsize_ymode && nregs_ymode > 1)
> -	{
> -	  info->representable_p = false;
> -	  info->nregs
> -	    = (GET_MODE_SIZE (ymode) + regsize_xmode - 1) / regsize_xmode;
> -	  info->offset = offset / regsize_xmode;
> -	  return;
> -	}
> -      if (!rknown && regsize_ymode > regsize_xmode && nregs_xmode > 1)
> +      int regsize_xmode = xsize / nregs_xmode;
> +      int regsize_ymode = ysize / nregs_ymode;
> +      if (!rknown
> +	  && ((nregs_ymode > 1 && regsize_xmode > regsize_ymode)
> +	      || (nregs_xmode > 1 && regsize_ymode > regsize_xmode)))
>  	{
>  	  info->representable_p = false;
> -	  info->nregs
> -	    = (GET_MODE_SIZE (ymode) + regsize_xmode - 1) / regsize_xmode;
> +	  info->nregs = CEIL (ysize, regsize_xmode);
>  	  info->offset = offset / regsize_xmode;
>  	  return;
>  	}
>        /* It's not valid to extract a subreg of mode YMODE at OFFSET that
>  	 would go outside of XMODE.  */
> -      if (!rknown
> -	  && GET_MODE_SIZE (ymode) + offset > GET_MODE_SIZE (xmode))
> +      if (!rknown && ysize + offset > xsize)
>  	{
>  	  info->representable_p = false;
>  	  info->nregs = nregs_ymode;

This part is OK.

> @@ -3717,7 +3703,7 @@ subreg_get_info (unsigned int xregno, machine_mode
> xmode, info->representable_p = true;
>  	  info->nregs = nregs_ymode;
>  	  info->offset = offset / regsize_ymode;
> -	  gcc_assert (info->offset + info->nregs <= nregs_xmode);
> +	  gcc_assert (info->offset + nregs_ymode <= nregs_xmode);
>  	  return;
>  	}
>      }

This part is not OK, the assertion is intended to be on INFO.

> @@ -3736,47 +3722,39 @@ subreg_get_info (unsigned int xregno, machine_mode
> xmode, }
>      }
> 
> -  /* This should always pass, otherwise we don't know how to verify
> -     the constraint.  These conditions may be relaxed but
> -     subreg_regno_offset would need to be redesigned.  */
> -  gcc_assert ((GET_MODE_SIZE (xmode) % GET_MODE_SIZE (ymode)) == 0);
> +  /* Set NUM_BLOCKS to the number of independently-representable YMODE
> +     values there are in (reg:XMODE XREGNO).  We can view the register
> +     as consisting of this number of independent "blocks", where each
> +     block occupies NREGS_YMODE registers and contains exactly one
> +     representable YMODE value.  */
>    gcc_assert ((nregs_xmode % nregs_ymode) == 0);
> +  unsigned int num_blocks = nregs_xmode / nregs_ymode;

I find the "exactly" slightly confusing here, because it can make you think 
that it contains the number of bytes of a YMODE value; a possible better 
wording would be "a single" instead of "exactly one".

> -  if (WORDS_BIG_ENDIAN != REG_WORDS_BIG_ENDIAN
> -      && GET_MODE_SIZE (xmode) > UNITS_PER_WORD)
> -    {
> -      HOST_WIDE_INT xsize = GET_MODE_SIZE (xmode);
> -      HOST_WIDE_INT ysize = GET_MODE_SIZE (ymode);
> -      HOST_WIDE_INT off_low = offset & (ysize - 1);
> -      HOST_WIDE_INT off_high = offset & ~(ysize - 1);
> -      offset = (xsize - ysize - off_high) | off_low;
> -    }
> -  /* The XMODE value can be seen as a vector of NREGS_XMODE
> -     values.  The subreg must represent a lowpart of given field.
> -     Compute what field it is.  */
> -  offset_adj = offset;
> -  offset_adj -= subreg_lowpart_offset (ymode,
> -				       mode_for_size (GET_MODE_BITSIZE 
(xmode)
> -						      / nregs_xmode,
> -						      MODE_INT, 0));
> -
> -  /* Size of ymode must not be greater than the size of xmode.  */
> -  mode_multiple = GET_MODE_SIZE (xmode) / GET_MODE_SIZE (ymode);
> -  gcc_assert (mode_multiple != 0);
> -
> -  y_offset = offset / GET_MODE_SIZE (ymode);
> -  y_offset_adj = offset_adj / GET_MODE_SIZE (ymode);
> -  nregs_multiple = nregs_xmode / nregs_ymode;
> -
> -  gcc_assert ((offset_adj % GET_MODE_SIZE (ymode)) == 0);
> -  gcc_assert ((mode_multiple % nregs_multiple) == 0);
> +  /* Calculate the number of bytes in each block.  This must always
> +     be exact, otherwise we don't know how to verify the constraint.
> +     These conditions may be relaxed but subreg_regno_offset would
> +     need to be redesigned.  */
> +  gcc_assert ((xsize % num_blocks) == 0);
> +  unsigned int bytes_per_block = xsize / num_blocks;
> +
> +  /* Get the number of the first block that contains the subreg and the
> byte +     offset of the subreg from the start of that block.  */
> +  unsigned int block_number = offset / bytes_per_block;
> +  unsigned int subblock_offset = offset % bytes_per_block;
> 
>    if (!rknown)
>      {
> -      info->representable_p = (!(y_offset_adj % (mode_multiple /
> nregs_multiple))); +      /* Only the lowpart of each block is
> representable.  */
> +      info->representable_p
> +	= (subblock_offset
> +	   == subreg_size_lowpart_offset (ysize, bytes_per_block));
>        rknown = true;
>      }
> -  info->offset = (y_offset / (mode_multiple / nregs_multiple)) *
> nregs_ymode; +

This part is OK.

> +  if (WORDS_BIG_ENDIAN != REG_WORDS_BIG_ENDIAN)
> +    info->offset = (num_blocks - block_number - 1) * nregs_ymode;
> +  else
> +    info->offset = block_number * nregs_ymode;
>    info->nregs = nregs_ymode;
>  }

Why was the test on the mode size dropped?

Richard Sandiford Nov. 16, 2016, 12:33 p.m. UTC | #4

Eric Botcazou <ebotcazou@adacore.com> writes:
>> This isn't intended to change the behaviour, just rewrite the
>> existing logic in a different (and hopefully clearer) way.
>
> Yes, I agree that it's an improvement.  A few remarks below.
>
>> diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
>> index ca6cced..7c0acf5 100644
>> --- a/gcc/rtlanal.c
>> +++ b/gcc/rtlanal.c
>> @@ -3601,31 +3601,28 @@ subreg_get_info (unsigned int xregno, machine_mode
>> xmode, unsigned int offset, machine_mode ymode,
>>  		 struct subreg_info *info)
>>  {
>> -  int nregs_xmode, nregs_ymode;
>> -  int mode_multiple, nregs_multiple;
>> -  int offset_adj, y_offset, y_offset_adj;
>> -  int regsize_xmode, regsize_ymode;
>> -  bool rknown;
>> +  unsigned int nregs_xmode, nregs_ymode;
>> 
>>    gcc_assert (xregno < FIRST_PSEUDO_REGISTER);
>> 
>> -  rknown = false;
>> +  unsigned int xsize = GET_MODE_SIZE (xmode);
>> +  unsigned int ysize = GET_MODE_SIZE (ymode);
>> +  bool rknown = false;
>> 
>>    /* If there are holes in a non-scalar mode in registers, we expect
>> -     that it is made up of its units concatenated together.  */
>> +     that it is made up of its units concatenated together.  Each scalar
>> +     unit occupies at least one register.  */
>
> Why using "scalar unit" while the first sentence uses "unit"?  Are they 
> different units?

No, the same.  I should probably have updated both.  I added "scalar"
to make it clear that we were talking about units in the GET_MODE_NUNITS
sense rather than the UNITS_PER_WORD sense (i.e. number of scalar values
rather than "unit" as an abstraction of "byte").

> What's the status of the new sentence?  Assertion?  
> Expectation?

Assertion.  It was a necessary but not sufficient condition to satisfy
the pre-patch:

      gcc_assert (hard_regno_nregs[xregno][xmode]
		  == (hard_regno_nregs[xregno][xmode_unit]
		      * GET_MODE_NUNITS (xmode)));

> I'd also make the first sentence grammatical.

Well, I think it's probably grammatical, but how about:

  If the register representation of a non-scalar mode has holes in it,
  we expect the scalar units to be concatenated together, with the holes
  distributed evenly among the scalar units.  Each scalar unit must occupy
  at least one register.

>> @@ -3651,18 +3646,17 @@ subreg_get_info (unsigned int xregno, machine_mode
>> xmode, nregs_ymode = hard_regno_nregs[xregno][ymode];
>> 
>>    /* Paradoxical subregs are otherwise valid.  */
>> -  if (!rknown
>> -      && offset == 0
>> -      && GET_MODE_PRECISION (ymode) > GET_MODE_PRECISION (xmode))
>> +  if (!rknown && offset == 0 && ysize > xsize)
>>      {
>>        info->representable_p = true;
>>        /* If this is a big endian paradoxical subreg, which uses more
>>  	 actual hard registers than the original register, we must
>>  	 return a negative offset so that we find the proper highpart
>>  	 of the register.  */
>> -      if (GET_MODE_SIZE (ymode) > UNITS_PER_WORD
>> -	  ? REG_WORDS_BIG_ENDIAN : BYTES_BIG_ENDIAN)
>> -	info->offset = nregs_xmode - nregs_ymode;
>> +      if (REG_WORDS_BIG_ENDIAN != WORDS_BIG_ENDIAN && ysize >
>> UNITS_PER_WORD +	  ? REG_WORDS_BIG_ENDIAN
>> +	  : byte_lowpart_offset (ymode, xmode) != 0)
>> +	info->offset = (int) nregs_xmode - (int) nregs_ymode;
>>        else
>>  	info->offset = 0;
>>        info->nregs = nregs_ymode;
>
> This part is not OK, it turns straightforward code into convoluted code.

This actually was one of the more important changes :-)  In combination
with later patches, the idea is to move away from UNITS_PER_WORD tests
when endianness is regular (all big or all little) and only do them
when the distinction between bytes and words makes a real difference.

The specific motivating examples were SVE predicate registers, which
occupy VL*2 bytes for some runtime VL.  They are smaller than a word
when VL<4, word-sized when VL==4, and bigger than a word when VL>4.
We therefore can't calculate:

  GET_MODE_SIZE (ymode) > UNITS_PER_WORD

at compile time.  This is one of the patches that avoids forcing the
issue unless the answer really matters.

>> @@ -3717,7 +3703,7 @@ subreg_get_info (unsigned int xregno, machine_mode
>> xmode, info->representable_p = true;
>>  	  info->nregs = nregs_ymode;
>>  	  info->offset = offset / regsize_ymode;
>> -	  gcc_assert (info->offset + info->nregs <= nregs_xmode);
>> +	  gcc_assert (info->offset + nregs_ymode <= nregs_xmode);
>>  	  return;
>>  	}
>>      }
>
> This part is not OK, the assertion is intended to be on INFO.

This was a cheap way of avoiding having to add a cast.  I'll add the
cast instead.

>> @@ -3736,47 +3722,39 @@ subreg_get_info (unsigned int xregno, machine_mode
>> xmode, }
>>      }
>> 
>> -  /* This should always pass, otherwise we don't know how to verify
>> -     the constraint.  These conditions may be relaxed but
>> -     subreg_regno_offset would need to be redesigned.  */
>> -  gcc_assert ((GET_MODE_SIZE (xmode) % GET_MODE_SIZE (ymode)) == 0);
>> +  /* Set NUM_BLOCKS to the number of independently-representable YMODE
>> +     values there are in (reg:XMODE XREGNO).  We can view the register
>> +     as consisting of this number of independent "blocks", where each
>> +     block occupies NREGS_YMODE registers and contains exactly one
>> +     representable YMODE value.  */
>>    gcc_assert ((nregs_xmode % nregs_ymode) == 0);
>> +  unsigned int num_blocks = nregs_xmode / nregs_ymode;
>
> I find the "exactly" slightly confusing here, because it can make you think 
> that it contains the number of bytes of a YMODE value; a possible better 
> wording would be "a single" instead of "exactly one".

OK.

>> -  if (WORDS_BIG_ENDIAN != REG_WORDS_BIG_ENDIAN
>> -      && GET_MODE_SIZE (xmode) > UNITS_PER_WORD)
>> -    {
>> -      HOST_WIDE_INT xsize = GET_MODE_SIZE (xmode);
>> -      HOST_WIDE_INT ysize = GET_MODE_SIZE (ymode);
>> -      HOST_WIDE_INT off_low = offset & (ysize - 1);
>> -      HOST_WIDE_INT off_high = offset & ~(ysize - 1);
>> -      offset = (xsize - ysize - off_high) | off_low;
>> -    }
>> -  /* The XMODE value can be seen as a vector of NREGS_XMODE
>> -     values.  The subreg must represent a lowpart of given field.
>> -     Compute what field it is.  */
>> -  offset_adj = offset;
>> -  offset_adj -= subreg_lowpart_offset (ymode,
>> -				       mode_for_size (GET_MODE_BITSIZE 
> (xmode)
>> -						      / nregs_xmode,
>> -						      MODE_INT, 0));
>> -
>> -  /* Size of ymode must not be greater than the size of xmode.  */
>> -  mode_multiple = GET_MODE_SIZE (xmode) / GET_MODE_SIZE (ymode);
>> -  gcc_assert (mode_multiple != 0);
>> -
>> -  y_offset = offset / GET_MODE_SIZE (ymode);
>> -  y_offset_adj = offset_adj / GET_MODE_SIZE (ymode);
>> -  nregs_multiple = nregs_xmode / nregs_ymode;
>> -
>> -  gcc_assert ((offset_adj % GET_MODE_SIZE (ymode)) == 0);
>> -  gcc_assert ((mode_multiple % nregs_multiple) == 0);
>> +  /* Calculate the number of bytes in each block.  This must always
>> +     be exact, otherwise we don't know how to verify the constraint.
>> +     These conditions may be relaxed but subreg_regno_offset would
>> +     need to be redesigned.  */
>> +  gcc_assert ((xsize % num_blocks) == 0);
>> +  unsigned int bytes_per_block = xsize / num_blocks;
>> +
>> +  /* Get the number of the first block that contains the subreg and the
>> byte +     offset of the subreg from the start of that block.  */
>> +  unsigned int block_number = offset / bytes_per_block;
>> +  unsigned int subblock_offset = offset % bytes_per_block;
>> 
>>    if (!rknown)
>>      {
>> -      info->representable_p = (!(y_offset_adj % (mode_multiple /
>> nregs_multiple))); +      /* Only the lowpart of each block is
>> representable.  */
>> +      info->representable_p
>> +	= (subblock_offset
>> +	   == subreg_size_lowpart_offset (ysize, bytes_per_block));
>>        rknown = true;
>>      }
>> -  info->offset = (y_offset / (mode_multiple / nregs_multiple)) *
>> nregs_ymode; +
>
> This part is OK.
>
>> +  if (WORDS_BIG_ENDIAN != REG_WORDS_BIG_ENDIAN)
>> +    info->offset = (num_blocks - block_number - 1) * nregs_ymode;
>> +  else
>> +    info->offset = block_number * nregs_ymode;
>>    info->nregs = nregs_ymode;
>>  }
>
> Why was the test on the mode size dropped?

For WORDS_BIG_ENDIAN != REG_WORDS_BIG_ENDIAN targets?  In practice
the old code didn't handle the case in which a single word spans more
than one register: if xmode was bigger than a word, ymode was smaller
than a word, and the number of registers in a ymode was smaller than
the number of registers in a word, we would need to take "normal"
endianness into account to resolve the subword register offset while
using REG_WORDS_BIG_ENDIAN for the word component.  Instead the old
code reversed the endianness relative the size of ymode, regardless of
whether ymode was bigger than a word or smaller than a word.  In other
words, the assumption seems to have been that REG_WORDS_BIG_ENDIAN is
effectively "endianness across multiple registers" and there is no need
to subdivide register offsets into words and subwords.

In practice that was OK, since AFAICT no target with WORDS_BIG_ENDIAN !=
REG_WORDS_BIG_ENDIAN had subword-sized registers.  This in turn means
that "block endianness" is always word endianness for these targets.

But I suppose we should have an assert, even though it was already
an implicit assumption... :-)

Thanks,
Richard

Eric Botcazou Nov. 16, 2016, 2:52 p.m. UTC | #5

> Well, I think it's probably grammatical, but how about:
> 
>   If the register representation of a non-scalar mode has holes in it,
>   we expect the scalar units to be concatenated together, with the holes
>   distributed evenly among the scalar units.  Each scalar unit must occupy
>   at least one register.

Fine with me, thanks.

> This actually was one of the more important changes :-)  In combination
> with later patches, the idea is to move away from UNITS_PER_WORD tests
> when endianness is regular (all big or all little) and only do them
> when the distinction between bytes and words makes a real difference.
> 
> The specific motivating examples were SVE predicate registers, which
> occupy VL*2 bytes for some runtime VL.  They are smaller than a word
> when VL<4, word-sized when VL==4, and bigger than a word when VL>4.
> We therefore can't calculate:
> 
>   GET_MODE_SIZE (ymode) > UNITS_PER_WORD
> 
> at compile time.  This is one of the patches that avoids forcing the
> issue unless the answer really matters.

So you plan to modify it again to remove the ysize > UNITS_PER_WORD test?  Or 
you don't really care about the REG_WORDS_BIG_ENDIAN != WORDS_BIG_ENDIAN case?

I understand that this formulation is intended to hide various combinations of 
WORDS_BIG_ENDIAN and BYTES_BIG_ENDIAN, but it's a net loss for most targets 
where the old code boils down to an unconditional assignment to info->offset.
Would it be doable to use the same handling as in subreg_size_lowpart_offset?

> For WORDS_BIG_ENDIAN != REG_WORDS_BIG_ENDIAN targets?  In practice
> the old code didn't handle the case in which a single word spans more
> than one register: if xmode was bigger than a word, ymode was smaller
> than a word, and the number of registers in a ymode was smaller than
> the number of registers in a word, we would need to take "normal"
> endianness into account to resolve the subword register offset while
> using REG_WORDS_BIG_ENDIAN for the word component.  Instead the old
> code reversed the endianness relative the size of ymode, regardless of
> whether ymode was bigger than a word or smaller than a word.  In other
> words, the assumption seems to have been that REG_WORDS_BIG_ENDIAN is
> effectively "endianness across multiple registers" and there is no need
> to subdivide register offsets into words and subwords.
> 
> In practice that was OK, since AFAICT no target with WORDS_BIG_ENDIAN !=
> REG_WORDS_BIG_ENDIAN had subword-sized registers.  This in turn means
> that "block endianness" is always word endianness for these targets.

Since you tested on c6x-elf, that's OK, but I think that a comment before the 
if (WORDS_BIG_ENDIAN != REG_WORDS_BIG_ENDIAN) test would be in order, stating 
the implicit assumption made at this point.

Richard Sandiford Nov. 18, 2016, 1:51 p.m. UTC | #6

Joseph Myers <joseph@codesourcery.com> writes:
> On Tue, 15 Nov 2016, Richard Sandiford wrote:
>> Richard Sandiford <richard.sandiford@arm.com> writes:
>> > This isn't intended to change the behaviour, just rewrite the
>> > existing logic in a different (and hopefully clearer) way.
>> > The new form -- particularly the part based on the "block"
>> > concept -- is easier to convert to polynomial sizes.
>> >
>> > Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>> 
>> Sorry, I should have said: this was also tested by compiling the
>> testsuite before and after the change at -O2 -ftree-vectorize on:
>> 
>>     aarch64-linux-gnueabi alpha-linux-gnu arc-elf arm-linux-gnueabi
>>     arm-linux-gnueabihf avr-elf bfin-elf c6x-elf cr16-elf cris-elf
>>     epiphany-elf fr30-elf frv-linux-gnu ft32-elf h8300-elf
>>     hppa64-hp-hpux11.23 ia64-linux-gnu i686-pc-linux-gnu
>>     i686-apple-darwin iq2000-elf lm32-elf m32c-elf m32r-elf
>>     m68k-linux-gnu mcore-elf microblaze-elf mips-linux-gnu
>>     mipsisa64-linux-gnu mmix mn10300-elf moxie-rtems msp430-elf
>>     nds32le-elf nios2-linux-gnu nvptx-none pdp11 powerpc-linux-gnu
>>     powerpc-eabispe powerpc64-linux-gnu powerpc-ibm-aix7.0 rl78-elf
>
> e500 double (both DFmode and TFmode) was the case that motivated the 
> original creation of subreg_get_info.  I think powerpc-linux-gnuspe 
> --enable-e500-double would be a good case for verifying the patch doesn't 
> change generated code.  (Though when I tried building it from mainline 
> sources last week I got an ICE in LRA building __multc3, so testing it 
> might be problematic at present.)

Thanks for the pointer.  I've now tried comparing the testsuite
output on that combination and there were no differences.  31 tests
triggered an internal compiler error but >17800 compiled successfully,
so hopefully the test was at least somewhat meaningful.

(For the record: I got a build failure due to addsi3 FAILing,
but I hacked around that by converting it to a force_reg/DONE.)

Thanks,
Richard

Rework subreg_get_info

Commit Message

Comments

Patch