Message ID | 87oa1g1ysh.fsf@e105548-lin.cambridge.arm.com |
---|---|
State | New |
Headers | show |
Richard Sandiford <richard.sandiford@arm.com> writes: > This isn't intended to change the behaviour, just rewrite the > existing logic in a different (and hopefully clearer) way. > The new form -- particularly the part based on the "block" > concept -- is easier to convert to polynomial sizes. > > Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? Sorry, I should have said: this was also tested by compiling the testsuite before and after the change at -O2 -ftree-vectorize on: aarch64-linux-gnueabi alpha-linux-gnu arc-elf arm-linux-gnueabi arm-linux-gnueabihf avr-elf bfin-elf c6x-elf cr16-elf cris-elf epiphany-elf fr30-elf frv-linux-gnu ft32-elf h8300-elf hppa64-hp-hpux11.23 ia64-linux-gnu i686-pc-linux-gnu i686-apple-darwin iq2000-elf lm32-elf m32c-elf m32r-elf m68k-linux-gnu mcore-elf microblaze-elf mips-linux-gnu mipsisa64-linux-gnu mmix mn10300-elf moxie-rtems msp430-elf nds32le-elf nios2-linux-gnu nvptx-none pdp11 powerpc-linux-gnu powerpc-eabispe powerpc64-linux-gnu powerpc-ibm-aix7.0 rl78-elf rx-elf s390-linux-gnu s390x-linux-gnu sh-linux-gnu sparc-linux-gnu sparc64-linux-gnu sparc-wrs-vxworks spu-elf tilegx-elf tilepro-elf xstormy16-elf v850-elf vax-netbsdelf visium-elf x86_64-darwin x86_64-linux-gnu xtensa-elf There were no differences in assembly output. Thanks, Richard
On Tue, 15 Nov 2016, Richard Sandiford wrote: > Richard Sandiford <richard.sandiford@arm.com> writes: > > This isn't intended to change the behaviour, just rewrite the > > existing logic in a different (and hopefully clearer) way. > > The new form -- particularly the part based on the "block" > > concept -- is easier to convert to polynomial sizes. > > > > Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? > > Sorry, I should have said: this was also tested by compiling the > testsuite before and after the change at -O2 -ftree-vectorize on: > > aarch64-linux-gnueabi alpha-linux-gnu arc-elf arm-linux-gnueabi > arm-linux-gnueabihf avr-elf bfin-elf c6x-elf cr16-elf cris-elf > epiphany-elf fr30-elf frv-linux-gnu ft32-elf h8300-elf > hppa64-hp-hpux11.23 ia64-linux-gnu i686-pc-linux-gnu > i686-apple-darwin iq2000-elf lm32-elf m32c-elf m32r-elf > m68k-linux-gnu mcore-elf microblaze-elf mips-linux-gnu > mipsisa64-linux-gnu mmix mn10300-elf moxie-rtems msp430-elf > nds32le-elf nios2-linux-gnu nvptx-none pdp11 powerpc-linux-gnu > powerpc-eabispe powerpc64-linux-gnu powerpc-ibm-aix7.0 rl78-elf e500 double (both DFmode and TFmode) was the case that motivated the original creation of subreg_get_info. I think powerpc-linux-gnuspe --enable-e500-double would be a good case for verifying the patch doesn't change generated code. (Though when I tried building it from mainline sources last week I got an ICE in LRA building __multc3, so testing it might be problematic at present.)
> This isn't intended to change the behaviour, just rewrite the > existing logic in a different (and hopefully clearer) way. Yes, I agree that it's an improvement. A few remarks below. > diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c > index ca6cced..7c0acf5 100644 > --- a/gcc/rtlanal.c > +++ b/gcc/rtlanal.c > @@ -3601,31 +3601,28 @@ subreg_get_info (unsigned int xregno, machine_mode > xmode, unsigned int offset, machine_mode ymode, > struct subreg_info *info) > { > - int nregs_xmode, nregs_ymode; > - int mode_multiple, nregs_multiple; > - int offset_adj, y_offset, y_offset_adj; > - int regsize_xmode, regsize_ymode; > - bool rknown; > + unsigned int nregs_xmode, nregs_ymode; > > gcc_assert (xregno < FIRST_PSEUDO_REGISTER); > > - rknown = false; > + unsigned int xsize = GET_MODE_SIZE (xmode); > + unsigned int ysize = GET_MODE_SIZE (ymode); > + bool rknown = false; > > /* If there are holes in a non-scalar mode in registers, we expect > - that it is made up of its units concatenated together. */ > + that it is made up of its units concatenated together. Each scalar > + unit occupies at least one register. */ Why using "scalar unit" while the first sentence uses "unit"? Are they different units? What's the status of the new sentence? Assertion? Expectation? I'd also make the first sentence grammatical. > if (HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode)) > { > - machine_mode xmode_unit; > - > nregs_xmode = HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode); > - xmode_unit = GET_MODE_INNER (xmode); > + unsigned int nunits = GET_MODE_NUNITS (xmode); > + machine_mode xmode_unit = GET_MODE_INNER (xmode); > gcc_assert (HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode_unit)); > gcc_assert (nregs_xmode > - == (GET_MODE_NUNITS (xmode) > + == (nunits > * HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode_unit))); > gcc_assert (hard_regno_nregs[xregno][xmode] > - == (hard_regno_nregs[xregno][xmode_unit] > - * GET_MODE_NUNITS (xmode))); > + == hard_regno_nregs[xregno][xmode_unit] * nunits); > > /* You can only ask for a SUBREG of a value with holes in the middle > if you don't cross the holes. (Such a SUBREG should be done by > @@ -3635,11 +3632,9 @@ subreg_get_info (unsigned int xregno, machine_mode > xmode, 3 for each part, but in memory it's two 128-bit parts. > Padding is assumed to be at the end (not necessarily the 'high part') > of each unit. */ > - if ((offset / GET_MODE_SIZE (xmode_unit) + 1 > - < GET_MODE_NUNITS (xmode)) > + if ((offset / GET_MODE_SIZE (xmode_unit) + 1 < nunits) > && (offset / GET_MODE_SIZE (xmode_unit) > - != ((offset + GET_MODE_SIZE (ymode) - 1) > - / GET_MODE_SIZE (xmode_unit)))) > + != ((offset + ysize - 1) / GET_MODE_SIZE (xmode_unit)))) > { > info->representable_p = false; > rknown = true; This part is OK. > @@ -3651,18 +3646,17 @@ subreg_get_info (unsigned int xregno, machine_mode > xmode, nregs_ymode = hard_regno_nregs[xregno][ymode]; > > /* Paradoxical subregs are otherwise valid. */ > - if (!rknown > - && offset == 0 > - && GET_MODE_PRECISION (ymode) > GET_MODE_PRECISION (xmode)) > + if (!rknown && offset == 0 && ysize > xsize) > { > info->representable_p = true; > /* If this is a big endian paradoxical subreg, which uses more > actual hard registers than the original register, we must > return a negative offset so that we find the proper highpart > of the register. */ > - if (GET_MODE_SIZE (ymode) > UNITS_PER_WORD > - ? REG_WORDS_BIG_ENDIAN : BYTES_BIG_ENDIAN) > - info->offset = nregs_xmode - nregs_ymode; > + if (REG_WORDS_BIG_ENDIAN != WORDS_BIG_ENDIAN && ysize > > UNITS_PER_WORD + ? REG_WORDS_BIG_ENDIAN > + : byte_lowpart_offset (ymode, xmode) != 0) > + info->offset = (int) nregs_xmode - (int) nregs_ymode; > else > info->offset = 0; > info->nregs = nregs_ymode; This part is not OK, it turns straightforward code into convoluted code. > @@ -3673,31 +3667,23 @@ subreg_get_info (unsigned int xregno, machine_mode > xmode, modes, we cannot generally form this subreg. */ > if (!HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode) > && !HARD_REGNO_NREGS_HAS_PADDING (xregno, ymode) > - && (GET_MODE_SIZE (xmode) % nregs_xmode) == 0 > - && (GET_MODE_SIZE (ymode) % nregs_ymode) == 0) > + && (xsize % nregs_xmode) == 0 > + && (ysize % nregs_ymode) == 0) > { > - regsize_xmode = GET_MODE_SIZE (xmode) / nregs_xmode; > - regsize_ymode = GET_MODE_SIZE (ymode) / nregs_ymode; > - if (!rknown && regsize_xmode > regsize_ymode && nregs_ymode > 1) > - { > - info->representable_p = false; > - info->nregs > - = (GET_MODE_SIZE (ymode) + regsize_xmode - 1) / regsize_xmode; > - info->offset = offset / regsize_xmode; > - return; > - } > - if (!rknown && regsize_ymode > regsize_xmode && nregs_xmode > 1) > + int regsize_xmode = xsize / nregs_xmode; > + int regsize_ymode = ysize / nregs_ymode; > + if (!rknown > + && ((nregs_ymode > 1 && regsize_xmode > regsize_ymode) > + || (nregs_xmode > 1 && regsize_ymode > regsize_xmode))) > { > info->representable_p = false; > - info->nregs > - = (GET_MODE_SIZE (ymode) + regsize_xmode - 1) / regsize_xmode; > + info->nregs = CEIL (ysize, regsize_xmode); > info->offset = offset / regsize_xmode; > return; > } > /* It's not valid to extract a subreg of mode YMODE at OFFSET that > would go outside of XMODE. */ > - if (!rknown > - && GET_MODE_SIZE (ymode) + offset > GET_MODE_SIZE (xmode)) > + if (!rknown && ysize + offset > xsize) > { > info->representable_p = false; > info->nregs = nregs_ymode; This part is OK. > @@ -3717,7 +3703,7 @@ subreg_get_info (unsigned int xregno, machine_mode > xmode, info->representable_p = true; > info->nregs = nregs_ymode; > info->offset = offset / regsize_ymode; > - gcc_assert (info->offset + info->nregs <= nregs_xmode); > + gcc_assert (info->offset + nregs_ymode <= nregs_xmode); > return; > } > } This part is not OK, the assertion is intended to be on INFO. > @@ -3736,47 +3722,39 @@ subreg_get_info (unsigned int xregno, machine_mode > xmode, } > } > > - /* This should always pass, otherwise we don't know how to verify > - the constraint. These conditions may be relaxed but > - subreg_regno_offset would need to be redesigned. */ > - gcc_assert ((GET_MODE_SIZE (xmode) % GET_MODE_SIZE (ymode)) == 0); > + /* Set NUM_BLOCKS to the number of independently-representable YMODE > + values there are in (reg:XMODE XREGNO). We can view the register > + as consisting of this number of independent "blocks", where each > + block occupies NREGS_YMODE registers and contains exactly one > + representable YMODE value. */ > gcc_assert ((nregs_xmode % nregs_ymode) == 0); > + unsigned int num_blocks = nregs_xmode / nregs_ymode; I find the "exactly" slightly confusing here, because it can make you think that it contains the number of bytes of a YMODE value; a possible better wording would be "a single" instead of "exactly one". > - if (WORDS_BIG_ENDIAN != REG_WORDS_BIG_ENDIAN > - && GET_MODE_SIZE (xmode) > UNITS_PER_WORD) > - { > - HOST_WIDE_INT xsize = GET_MODE_SIZE (xmode); > - HOST_WIDE_INT ysize = GET_MODE_SIZE (ymode); > - HOST_WIDE_INT off_low = offset & (ysize - 1); > - HOST_WIDE_INT off_high = offset & ~(ysize - 1); > - offset = (xsize - ysize - off_high) | off_low; > - } > - /* The XMODE value can be seen as a vector of NREGS_XMODE > - values. The subreg must represent a lowpart of given field. > - Compute what field it is. */ > - offset_adj = offset; > - offset_adj -= subreg_lowpart_offset (ymode, > - mode_for_size (GET_MODE_BITSIZE (xmode) > - / nregs_xmode, > - MODE_INT, 0)); > - > - /* Size of ymode must not be greater than the size of xmode. */ > - mode_multiple = GET_MODE_SIZE (xmode) / GET_MODE_SIZE (ymode); > - gcc_assert (mode_multiple != 0); > - > - y_offset = offset / GET_MODE_SIZE (ymode); > - y_offset_adj = offset_adj / GET_MODE_SIZE (ymode); > - nregs_multiple = nregs_xmode / nregs_ymode; > - > - gcc_assert ((offset_adj % GET_MODE_SIZE (ymode)) == 0); > - gcc_assert ((mode_multiple % nregs_multiple) == 0); > + /* Calculate the number of bytes in each block. This must always > + be exact, otherwise we don't know how to verify the constraint. > + These conditions may be relaxed but subreg_regno_offset would > + need to be redesigned. */ > + gcc_assert ((xsize % num_blocks) == 0); > + unsigned int bytes_per_block = xsize / num_blocks; > + > + /* Get the number of the first block that contains the subreg and the > byte + offset of the subreg from the start of that block. */ > + unsigned int block_number = offset / bytes_per_block; > + unsigned int subblock_offset = offset % bytes_per_block; > > if (!rknown) > { > - info->representable_p = (!(y_offset_adj % (mode_multiple / > nregs_multiple))); + /* Only the lowpart of each block is > representable. */ > + info->representable_p > + = (subblock_offset > + == subreg_size_lowpart_offset (ysize, bytes_per_block)); > rknown = true; > } > - info->offset = (y_offset / (mode_multiple / nregs_multiple)) * > nregs_ymode; + This part is OK. > + if (WORDS_BIG_ENDIAN != REG_WORDS_BIG_ENDIAN) > + info->offset = (num_blocks - block_number - 1) * nregs_ymode; > + else > + info->offset = block_number * nregs_ymode; > info->nregs = nregs_ymode; > } Why was the test on the mode size dropped?
Eric Botcazou <ebotcazou@adacore.com> writes: >> This isn't intended to change the behaviour, just rewrite the >> existing logic in a different (and hopefully clearer) way. > > Yes, I agree that it's an improvement. A few remarks below. > >> diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c >> index ca6cced..7c0acf5 100644 >> --- a/gcc/rtlanal.c >> +++ b/gcc/rtlanal.c >> @@ -3601,31 +3601,28 @@ subreg_get_info (unsigned int xregno, machine_mode >> xmode, unsigned int offset, machine_mode ymode, >> struct subreg_info *info) >> { >> - int nregs_xmode, nregs_ymode; >> - int mode_multiple, nregs_multiple; >> - int offset_adj, y_offset, y_offset_adj; >> - int regsize_xmode, regsize_ymode; >> - bool rknown; >> + unsigned int nregs_xmode, nregs_ymode; >> >> gcc_assert (xregno < FIRST_PSEUDO_REGISTER); >> >> - rknown = false; >> + unsigned int xsize = GET_MODE_SIZE (xmode); >> + unsigned int ysize = GET_MODE_SIZE (ymode); >> + bool rknown = false; >> >> /* If there are holes in a non-scalar mode in registers, we expect >> - that it is made up of its units concatenated together. */ >> + that it is made up of its units concatenated together. Each scalar >> + unit occupies at least one register. */ > > Why using "scalar unit" while the first sentence uses "unit"? Are they > different units? No, the same. I should probably have updated both. I added "scalar" to make it clear that we were talking about units in the GET_MODE_NUNITS sense rather than the UNITS_PER_WORD sense (i.e. number of scalar values rather than "unit" as an abstraction of "byte"). > What's the status of the new sentence? Assertion? > Expectation? Assertion. It was a necessary but not sufficient condition to satisfy the pre-patch: gcc_assert (hard_regno_nregs[xregno][xmode] == (hard_regno_nregs[xregno][xmode_unit] * GET_MODE_NUNITS (xmode))); > I'd also make the first sentence grammatical. Well, I think it's probably grammatical, but how about: If the register representation of a non-scalar mode has holes in it, we expect the scalar units to be concatenated together, with the holes distributed evenly among the scalar units. Each scalar unit must occupy at least one register. >> @@ -3651,18 +3646,17 @@ subreg_get_info (unsigned int xregno, machine_mode >> xmode, nregs_ymode = hard_regno_nregs[xregno][ymode]; >> >> /* Paradoxical subregs are otherwise valid. */ >> - if (!rknown >> - && offset == 0 >> - && GET_MODE_PRECISION (ymode) > GET_MODE_PRECISION (xmode)) >> + if (!rknown && offset == 0 && ysize > xsize) >> { >> info->representable_p = true; >> /* If this is a big endian paradoxical subreg, which uses more >> actual hard registers than the original register, we must >> return a negative offset so that we find the proper highpart >> of the register. */ >> - if (GET_MODE_SIZE (ymode) > UNITS_PER_WORD >> - ? REG_WORDS_BIG_ENDIAN : BYTES_BIG_ENDIAN) >> - info->offset = nregs_xmode - nregs_ymode; >> + if (REG_WORDS_BIG_ENDIAN != WORDS_BIG_ENDIAN && ysize > >> UNITS_PER_WORD + ? REG_WORDS_BIG_ENDIAN >> + : byte_lowpart_offset (ymode, xmode) != 0) >> + info->offset = (int) nregs_xmode - (int) nregs_ymode; >> else >> info->offset = 0; >> info->nregs = nregs_ymode; > > This part is not OK, it turns straightforward code into convoluted code. This actually was one of the more important changes :-) In combination with later patches, the idea is to move away from UNITS_PER_WORD tests when endianness is regular (all big or all little) and only do them when the distinction between bytes and words makes a real difference. The specific motivating examples were SVE predicate registers, which occupy VL*2 bytes for some runtime VL. They are smaller than a word when VL<4, word-sized when VL==4, and bigger than a word when VL>4. We therefore can't calculate: GET_MODE_SIZE (ymode) > UNITS_PER_WORD at compile time. This is one of the patches that avoids forcing the issue unless the answer really matters. >> @@ -3717,7 +3703,7 @@ subreg_get_info (unsigned int xregno, machine_mode >> xmode, info->representable_p = true; >> info->nregs = nregs_ymode; >> info->offset = offset / regsize_ymode; >> - gcc_assert (info->offset + info->nregs <= nregs_xmode); >> + gcc_assert (info->offset + nregs_ymode <= nregs_xmode); >> return; >> } >> } > > This part is not OK, the assertion is intended to be on INFO. This was a cheap way of avoiding having to add a cast. I'll add the cast instead. >> @@ -3736,47 +3722,39 @@ subreg_get_info (unsigned int xregno, machine_mode >> xmode, } >> } >> >> - /* This should always pass, otherwise we don't know how to verify >> - the constraint. These conditions may be relaxed but >> - subreg_regno_offset would need to be redesigned. */ >> - gcc_assert ((GET_MODE_SIZE (xmode) % GET_MODE_SIZE (ymode)) == 0); >> + /* Set NUM_BLOCKS to the number of independently-representable YMODE >> + values there are in (reg:XMODE XREGNO). We can view the register >> + as consisting of this number of independent "blocks", where each >> + block occupies NREGS_YMODE registers and contains exactly one >> + representable YMODE value. */ >> gcc_assert ((nregs_xmode % nregs_ymode) == 0); >> + unsigned int num_blocks = nregs_xmode / nregs_ymode; > > I find the "exactly" slightly confusing here, because it can make you think > that it contains the number of bytes of a YMODE value; a possible better > wording would be "a single" instead of "exactly one". OK. >> - if (WORDS_BIG_ENDIAN != REG_WORDS_BIG_ENDIAN >> - && GET_MODE_SIZE (xmode) > UNITS_PER_WORD) >> - { >> - HOST_WIDE_INT xsize = GET_MODE_SIZE (xmode); >> - HOST_WIDE_INT ysize = GET_MODE_SIZE (ymode); >> - HOST_WIDE_INT off_low = offset & (ysize - 1); >> - HOST_WIDE_INT off_high = offset & ~(ysize - 1); >> - offset = (xsize - ysize - off_high) | off_low; >> - } >> - /* The XMODE value can be seen as a vector of NREGS_XMODE >> - values. The subreg must represent a lowpart of given field. >> - Compute what field it is. */ >> - offset_adj = offset; >> - offset_adj -= subreg_lowpart_offset (ymode, >> - mode_for_size (GET_MODE_BITSIZE > (xmode) >> - / nregs_xmode, >> - MODE_INT, 0)); >> - >> - /* Size of ymode must not be greater than the size of xmode. */ >> - mode_multiple = GET_MODE_SIZE (xmode) / GET_MODE_SIZE (ymode); >> - gcc_assert (mode_multiple != 0); >> - >> - y_offset = offset / GET_MODE_SIZE (ymode); >> - y_offset_adj = offset_adj / GET_MODE_SIZE (ymode); >> - nregs_multiple = nregs_xmode / nregs_ymode; >> - >> - gcc_assert ((offset_adj % GET_MODE_SIZE (ymode)) == 0); >> - gcc_assert ((mode_multiple % nregs_multiple) == 0); >> + /* Calculate the number of bytes in each block. This must always >> + be exact, otherwise we don't know how to verify the constraint. >> + These conditions may be relaxed but subreg_regno_offset would >> + need to be redesigned. */ >> + gcc_assert ((xsize % num_blocks) == 0); >> + unsigned int bytes_per_block = xsize / num_blocks; >> + >> + /* Get the number of the first block that contains the subreg and the >> byte + offset of the subreg from the start of that block. */ >> + unsigned int block_number = offset / bytes_per_block; >> + unsigned int subblock_offset = offset % bytes_per_block; >> >> if (!rknown) >> { >> - info->representable_p = (!(y_offset_adj % (mode_multiple / >> nregs_multiple))); + /* Only the lowpart of each block is >> representable. */ >> + info->representable_p >> + = (subblock_offset >> + == subreg_size_lowpart_offset (ysize, bytes_per_block)); >> rknown = true; >> } >> - info->offset = (y_offset / (mode_multiple / nregs_multiple)) * >> nregs_ymode; + > > This part is OK. > >> + if (WORDS_BIG_ENDIAN != REG_WORDS_BIG_ENDIAN) >> + info->offset = (num_blocks - block_number - 1) * nregs_ymode; >> + else >> + info->offset = block_number * nregs_ymode; >> info->nregs = nregs_ymode; >> } > > Why was the test on the mode size dropped? For WORDS_BIG_ENDIAN != REG_WORDS_BIG_ENDIAN targets? In practice the old code didn't handle the case in which a single word spans more than one register: if xmode was bigger than a word, ymode was smaller than a word, and the number of registers in a ymode was smaller than the number of registers in a word, we would need to take "normal" endianness into account to resolve the subword register offset while using REG_WORDS_BIG_ENDIAN for the word component. Instead the old code reversed the endianness relative the size of ymode, regardless of whether ymode was bigger than a word or smaller than a word. In other words, the assumption seems to have been that REG_WORDS_BIG_ENDIAN is effectively "endianness across multiple registers" and there is no need to subdivide register offsets into words and subwords. In practice that was OK, since AFAICT no target with WORDS_BIG_ENDIAN != REG_WORDS_BIG_ENDIAN had subword-sized registers. This in turn means that "block endianness" is always word endianness for these targets. But I suppose we should have an assert, even though it was already an implicit assumption... :-) Thanks, Richard
> Well, I think it's probably grammatical, but how about: > > If the register representation of a non-scalar mode has holes in it, > we expect the scalar units to be concatenated together, with the holes > distributed evenly among the scalar units. Each scalar unit must occupy > at least one register. Fine with me, thanks. > This actually was one of the more important changes :-) In combination > with later patches, the idea is to move away from UNITS_PER_WORD tests > when endianness is regular (all big or all little) and only do them > when the distinction between bytes and words makes a real difference. > > The specific motivating examples were SVE predicate registers, which > occupy VL*2 bytes for some runtime VL. They are smaller than a word > when VL<4, word-sized when VL==4, and bigger than a word when VL>4. > We therefore can't calculate: > > GET_MODE_SIZE (ymode) > UNITS_PER_WORD > > at compile time. This is one of the patches that avoids forcing the > issue unless the answer really matters. So you plan to modify it again to remove the ysize > UNITS_PER_WORD test? Or you don't really care about the REG_WORDS_BIG_ENDIAN != WORDS_BIG_ENDIAN case? I understand that this formulation is intended to hide various combinations of WORDS_BIG_ENDIAN and BYTES_BIG_ENDIAN, but it's a net loss for most targets where the old code boils down to an unconditional assignment to info->offset. Would it be doable to use the same handling as in subreg_size_lowpart_offset? > For WORDS_BIG_ENDIAN != REG_WORDS_BIG_ENDIAN targets? In practice > the old code didn't handle the case in which a single word spans more > than one register: if xmode was bigger than a word, ymode was smaller > than a word, and the number of registers in a ymode was smaller than > the number of registers in a word, we would need to take "normal" > endianness into account to resolve the subword register offset while > using REG_WORDS_BIG_ENDIAN for the word component. Instead the old > code reversed the endianness relative the size of ymode, regardless of > whether ymode was bigger than a word or smaller than a word. In other > words, the assumption seems to have been that REG_WORDS_BIG_ENDIAN is > effectively "endianness across multiple registers" and there is no need > to subdivide register offsets into words and subwords. > > In practice that was OK, since AFAICT no target with WORDS_BIG_ENDIAN != > REG_WORDS_BIG_ENDIAN had subword-sized registers. This in turn means > that "block endianness" is always word endianness for these targets. Since you tested on c6x-elf, that's OK, but I think that a comment before the if (WORDS_BIG_ENDIAN != REG_WORDS_BIG_ENDIAN) test would be in order, stating the implicit assumption made at this point.
Joseph Myers <joseph@codesourcery.com> writes: > On Tue, 15 Nov 2016, Richard Sandiford wrote: >> Richard Sandiford <richard.sandiford@arm.com> writes: >> > This isn't intended to change the behaviour, just rewrite the >> > existing logic in a different (and hopefully clearer) way. >> > The new form -- particularly the part based on the "block" >> > concept -- is easier to convert to polynomial sizes. >> > >> > Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? >> >> Sorry, I should have said: this was also tested by compiling the >> testsuite before and after the change at -O2 -ftree-vectorize on: >> >> aarch64-linux-gnueabi alpha-linux-gnu arc-elf arm-linux-gnueabi >> arm-linux-gnueabihf avr-elf bfin-elf c6x-elf cr16-elf cris-elf >> epiphany-elf fr30-elf frv-linux-gnu ft32-elf h8300-elf >> hppa64-hp-hpux11.23 ia64-linux-gnu i686-pc-linux-gnu >> i686-apple-darwin iq2000-elf lm32-elf m32c-elf m32r-elf >> m68k-linux-gnu mcore-elf microblaze-elf mips-linux-gnu >> mipsisa64-linux-gnu mmix mn10300-elf moxie-rtems msp430-elf >> nds32le-elf nios2-linux-gnu nvptx-none pdp11 powerpc-linux-gnu >> powerpc-eabispe powerpc64-linux-gnu powerpc-ibm-aix7.0 rl78-elf > > e500 double (both DFmode and TFmode) was the case that motivated the > original creation of subreg_get_info. I think powerpc-linux-gnuspe > --enable-e500-double would be a good case for verifying the patch doesn't > change generated code. (Though when I tried building it from mainline > sources last week I got an ICE in LRA building __multc3, so testing it > might be problematic at present.) Thanks for the pointer. I've now tried comparing the testsuite output on that combination and there were no differences. 31 tests triggered an internal compiler error but >17800 compiled successfully, so hopefully the test was at least somewhat meaningful. (For the record: I got a build failure due to addsi3 FAILing, but I hacked around that by converting it to a force_reg/DONE.) Thanks, Richard
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c index ca6cced..7c0acf5 100644 --- a/gcc/rtlanal.c +++ b/gcc/rtlanal.c @@ -3601,31 +3601,28 @@ subreg_get_info (unsigned int xregno, machine_mode xmode, unsigned int offset, machine_mode ymode, struct subreg_info *info) { - int nregs_xmode, nregs_ymode; - int mode_multiple, nregs_multiple; - int offset_adj, y_offset, y_offset_adj; - int regsize_xmode, regsize_ymode; - bool rknown; + unsigned int nregs_xmode, nregs_ymode; gcc_assert (xregno < FIRST_PSEUDO_REGISTER); - rknown = false; + unsigned int xsize = GET_MODE_SIZE (xmode); + unsigned int ysize = GET_MODE_SIZE (ymode); + bool rknown = false; /* If there are holes in a non-scalar mode in registers, we expect - that it is made up of its units concatenated together. */ + that it is made up of its units concatenated together. Each scalar + unit occupies at least one register. */ if (HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode)) { - machine_mode xmode_unit; - nregs_xmode = HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode); - xmode_unit = GET_MODE_INNER (xmode); + unsigned int nunits = GET_MODE_NUNITS (xmode); + machine_mode xmode_unit = GET_MODE_INNER (xmode); gcc_assert (HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode_unit)); gcc_assert (nregs_xmode - == (GET_MODE_NUNITS (xmode) + == (nunits * HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode_unit))); gcc_assert (hard_regno_nregs[xregno][xmode] - == (hard_regno_nregs[xregno][xmode_unit] - * GET_MODE_NUNITS (xmode))); + == hard_regno_nregs[xregno][xmode_unit] * nunits); /* You can only ask for a SUBREG of a value with holes in the middle if you don't cross the holes. (Such a SUBREG should be done by @@ -3635,11 +3632,9 @@ subreg_get_info (unsigned int xregno, machine_mode xmode, 3 for each part, but in memory it's two 128-bit parts. Padding is assumed to be at the end (not necessarily the 'high part') of each unit. */ - if ((offset / GET_MODE_SIZE (xmode_unit) + 1 - < GET_MODE_NUNITS (xmode)) + if ((offset / GET_MODE_SIZE (xmode_unit) + 1 < nunits) && (offset / GET_MODE_SIZE (xmode_unit) - != ((offset + GET_MODE_SIZE (ymode) - 1) - / GET_MODE_SIZE (xmode_unit)))) + != ((offset + ysize - 1) / GET_MODE_SIZE (xmode_unit)))) { info->representable_p = false; rknown = true; @@ -3651,18 +3646,17 @@ subreg_get_info (unsigned int xregno, machine_mode xmode, nregs_ymode = hard_regno_nregs[xregno][ymode]; /* Paradoxical subregs are otherwise valid. */ - if (!rknown - && offset == 0 - && GET_MODE_PRECISION (ymode) > GET_MODE_PRECISION (xmode)) + if (!rknown && offset == 0 && ysize > xsize) { info->representable_p = true; /* If this is a big endian paradoxical subreg, which uses more actual hard registers than the original register, we must return a negative offset so that we find the proper highpart of the register. */ - if (GET_MODE_SIZE (ymode) > UNITS_PER_WORD - ? REG_WORDS_BIG_ENDIAN : BYTES_BIG_ENDIAN) - info->offset = nregs_xmode - nregs_ymode; + if (REG_WORDS_BIG_ENDIAN != WORDS_BIG_ENDIAN && ysize > UNITS_PER_WORD + ? REG_WORDS_BIG_ENDIAN + : byte_lowpart_offset (ymode, xmode) != 0) + info->offset = (int) nregs_xmode - (int) nregs_ymode; else info->offset = 0; info->nregs = nregs_ymode; @@ -3673,31 +3667,23 @@ subreg_get_info (unsigned int xregno, machine_mode xmode, modes, we cannot generally form this subreg. */ if (!HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode) && !HARD_REGNO_NREGS_HAS_PADDING (xregno, ymode) - && (GET_MODE_SIZE (xmode) % nregs_xmode) == 0 - && (GET_MODE_SIZE (ymode) % nregs_ymode) == 0) + && (xsize % nregs_xmode) == 0 + && (ysize % nregs_ymode) == 0) { - regsize_xmode = GET_MODE_SIZE (xmode) / nregs_xmode; - regsize_ymode = GET_MODE_SIZE (ymode) / nregs_ymode; - if (!rknown && regsize_xmode > regsize_ymode && nregs_ymode > 1) - { - info->representable_p = false; - info->nregs - = (GET_MODE_SIZE (ymode) + regsize_xmode - 1) / regsize_xmode; - info->offset = offset / regsize_xmode; - return; - } - if (!rknown && regsize_ymode > regsize_xmode && nregs_xmode > 1) + int regsize_xmode = xsize / nregs_xmode; + int regsize_ymode = ysize / nregs_ymode; + if (!rknown + && ((nregs_ymode > 1 && regsize_xmode > regsize_ymode) + || (nregs_xmode > 1 && regsize_ymode > regsize_xmode))) { info->representable_p = false; - info->nregs - = (GET_MODE_SIZE (ymode) + regsize_xmode - 1) / regsize_xmode; + info->nregs = CEIL (ysize, regsize_xmode); info->offset = offset / regsize_xmode; return; } /* It's not valid to extract a subreg of mode YMODE at OFFSET that would go outside of XMODE. */ - if (!rknown - && GET_MODE_SIZE (ymode) + offset > GET_MODE_SIZE (xmode)) + if (!rknown && ysize + offset > xsize) { info->representable_p = false; info->nregs = nregs_ymode; @@ -3717,7 +3703,7 @@ subreg_get_info (unsigned int xregno, machine_mode xmode, info->representable_p = true; info->nregs = nregs_ymode; info->offset = offset / regsize_ymode; - gcc_assert (info->offset + info->nregs <= nregs_xmode); + gcc_assert (info->offset + nregs_ymode <= nregs_xmode); return; } } @@ -3736,47 +3722,39 @@ subreg_get_info (unsigned int xregno, machine_mode xmode, } } - /* This should always pass, otherwise we don't know how to verify - the constraint. These conditions may be relaxed but - subreg_regno_offset would need to be redesigned. */ - gcc_assert ((GET_MODE_SIZE (xmode) % GET_MODE_SIZE (ymode)) == 0); + /* Set NUM_BLOCKS to the number of independently-representable YMODE + values there are in (reg:XMODE XREGNO). We can view the register + as consisting of this number of independent "blocks", where each + block occupies NREGS_YMODE registers and contains exactly one + representable YMODE value. */ gcc_assert ((nregs_xmode % nregs_ymode) == 0); + unsigned int num_blocks = nregs_xmode / nregs_ymode; - if (WORDS_BIG_ENDIAN != REG_WORDS_BIG_ENDIAN - && GET_MODE_SIZE (xmode) > UNITS_PER_WORD) - { - HOST_WIDE_INT xsize = GET_MODE_SIZE (xmode); - HOST_WIDE_INT ysize = GET_MODE_SIZE (ymode); - HOST_WIDE_INT off_low = offset & (ysize - 1); - HOST_WIDE_INT off_high = offset & ~(ysize - 1); - offset = (xsize - ysize - off_high) | off_low; - } - /* The XMODE value can be seen as a vector of NREGS_XMODE - values. The subreg must represent a lowpart of given field. - Compute what field it is. */ - offset_adj = offset; - offset_adj -= subreg_lowpart_offset (ymode, - mode_for_size (GET_MODE_BITSIZE (xmode) - / nregs_xmode, - MODE_INT, 0)); - - /* Size of ymode must not be greater than the size of xmode. */ - mode_multiple = GET_MODE_SIZE (xmode) / GET_MODE_SIZE (ymode); - gcc_assert (mode_multiple != 0); - - y_offset = offset / GET_MODE_SIZE (ymode); - y_offset_adj = offset_adj / GET_MODE_SIZE (ymode); - nregs_multiple = nregs_xmode / nregs_ymode; - - gcc_assert ((offset_adj % GET_MODE_SIZE (ymode)) == 0); - gcc_assert ((mode_multiple % nregs_multiple) == 0); + /* Calculate the number of bytes in each block. This must always + be exact, otherwise we don't know how to verify the constraint. + These conditions may be relaxed but subreg_regno_offset would + need to be redesigned. */ + gcc_assert ((xsize % num_blocks) == 0); + unsigned int bytes_per_block = xsize / num_blocks; + + /* Get the number of the first block that contains the subreg and the byte + offset of the subreg from the start of that block. */ + unsigned int block_number = offset / bytes_per_block; + unsigned int subblock_offset = offset % bytes_per_block; if (!rknown) { - info->representable_p = (!(y_offset_adj % (mode_multiple / nregs_multiple))); + /* Only the lowpart of each block is representable. */ + info->representable_p + = (subblock_offset + == subreg_size_lowpart_offset (ysize, bytes_per_block)); rknown = true; } - info->offset = (y_offset / (mode_multiple / nregs_multiple)) * nregs_ymode; + + if (WORDS_BIG_ENDIAN != REG_WORDS_BIG_ENDIAN) + info->offset = (num_blocks - block_number - 1) * nregs_ymode; + else + info->offset = block_number * nregs_ymode; info->nregs = nregs_ymode; }