Message ID | D0B07D0D.4108%alan.hayward@arm.com |
---|---|
State | New |
Headers | show |
Alan Hayward <alan.hayward@arm.com> writes: > [Cleaning this thread up to submit patch again, with better explanation] > > This patch causes subreg_get_info() to exit early in the simple cases > where we are extracting a whole register from a multi register. > > In aarch64 for Big Endian we were producing a subreg of a OImode (256bits) > from a CImode (384bits) This would hit the following assert in > subreg_get_info: > > gcc_assert ((GET_MODE_SIZE (xmode) % GET_MODE_SIZE (ymode)) == 0); > > > This is a rule we should be able to relax a little - if the subreg we want > fits into a whole register then this is a valid result and can be easily > detected earlier in the function. > > This has the bonus that we should be slightly reducing the execution time > for more common cases, for example a subreg of 64bits from 256bits. > > This patch is required for the second part of the patch, which is aarch64 > specific, and fixes up aarch64 Big Endian movoi/ci/xi. This second part > has already been approved. > > This patch will apply cleanly by itself and no regressions were seen when > testing aarch64 and x86_64 on make check. FWIW I agree this is the right approach, although I can't approve it. The assert above is guarding code that deals with a very general case, including some unusual combinations, so I don't think it would be a good idea to try to remove it entirely. E.g. Tejas hit the same assert because we were trying to create subregs of EImode SIMD registers on AArch64. EImode is 24 bytes, so it's one-*and-a-half* SIMD registers. Taking subregs of something like that is very dangerous and I think we want the assert to continue to trigger there. This patch deals with a much simpler and more obvious case. Thanks, Richard
> FWIW I agree this is the right approach, although I can't approve it. > The assert above is guarding code that deals with a very general case, > including some unusual combinations, so I don't think it would be a > good idea to try to remove it entirely. Yes, but the patch is a bit of kludge since it short-circuits the meat of the function: /* This should always pass, otherwise we don't know how to verify the constraint. These conditions may be relaxed but subreg_regno_offset would need to be redesigned. */ gcc_assert ((GET_MODE_SIZE (xmode) % GET_MODE_SIZE (ymode)) == 0); gcc_assert ((nregs_xmode % nregs_ymode) == 0); So what would it take to do things properly here, i.e. relax the conditions and adjust downstream?
Eric Botcazou <ebotcazou@adacore.com> writes: >> FWIW I agree this is the right approach, although I can't approve it. >> The assert above is guarding code that deals with a very general case, >> including some unusual combinations, so I don't think it would be a >> good idea to try to remove it entirely. > > Yes, but the patch is a bit of kludge since it short-circuits the meat of the > function: > > /* This should always pass, otherwise we don't know how to verify > the constraint. These conditions may be relaxed but > subreg_regno_offset would need to be redesigned. */ > gcc_assert ((GET_MODE_SIZE (xmode) % GET_MODE_SIZE (ymode)) == 0); > gcc_assert ((nregs_xmode % nregs_ymode) == 0); > > So what would it take to do things properly here, i.e. relax the conditions > and adjust downstream? What do you think we should relax it to though? Obviously there's a balance here between relaxing things enough and not relaxing them too far (so that the EImode AArch64 thing I mentioned is still a noisy failure, for example). ISTM the patch deals with the only significant case that is obviously safe for modes that are not a power of 2 in size. If you're saying that the condition itself is OK, but that the code should be further down in the function, then I don't think that would gain much. We already have early-outs for the simple cases, such as: /* Paradoxical subregs are otherwise valid. */ if (!rknown && offset == 0 && GET_MODE_PRECISION (ymode) > GET_MODE_PRECISION (xmode)) { info->representable_p = true; /* If this is a big endian paradoxical subreg, which uses more actual hard registers than the original register, we must return a negative offset so that we find the proper highpart of the register. */ if (GET_MODE_SIZE (ymode) > UNITS_PER_WORD ? REG_WORDS_BIG_ENDIAN : BYTES_BIG_ENDIAN) info->offset = nregs_xmode - nregs_ymode; else info->offset = 0; info->nregs = nregs_ymode; return; } [...] /* Lowpart subregs are otherwise valid. */ if (!rknown && offset == subreg_lowpart_offset (ymode, xmode)) { info->representable_p = true; rknown = true; if (offset == 0 || nregs_xmode == nregs_ymode) { info->offset = 0; info->nregs = nregs_ymode; return; } } which also come before the assert. Thanks, Richard
> What do you think we should relax it to though? Obviously there's a balance > here between relaxing things enough and not relaxing them too far (so that > the EImode AArch64 thing I mentioned is still a noisy failure, for > example). ISTM the patch deals with the only significant case that is > obviously safe for modes that are not a power of 2 in size. Apparently the change wants to accept general subregs with not only modes whose sizes are multiple of each other but also whose sizes are multiple of a common large enough value. That clearly goes against: /* This should always pass, otherwise we don't know how to verify the constraint. These conditions may be relaxed but subreg_regno_offset would need to be redesigned. */ gcc_assert ((GET_MODE_SIZE (xmode) % GET_MODE_SIZE (ymode)) == 0); gcc_assert ((nregs_xmode % nregs_ymode) == 0); so I think that we should formulate the new requirement and implement it in the main part of the function, instead of adding it as a kludge. > If you're saying that the condition itself is OK, but that the code should > be further down in the function, then I don't think that would gain much. > We already have early-outs for the simple cases, such as: Right, but they are more of special cases and this one is not.
Eric Botcazou <ebotcazou@adacore.com> writes: >> What do you think we should relax it to though? Obviously there's a balance >> here between relaxing things enough and not relaxing them too far (so that >> the EImode AArch64 thing I mentioned is still a noisy failure, for >> example). ISTM the patch deals with the only significant case that is >> obviously safe for modes that are not a power of 2 in size. > > Apparently the change wants to accept general subregs with not only modes > whose sizes are multiple of each other but also whose sizes are multiple of a > common large enough value. That clearly goes against: > > /* This should always pass, otherwise we don't know how to verify > the constraint. These conditions may be relaxed but > subreg_regno_offset would need to be redesigned. */ > gcc_assert ((GET_MODE_SIZE (xmode) % GET_MODE_SIZE (ymode)) == 0); > gcc_assert ((nregs_xmode % nregs_ymode) == 0); > > so I think that we should formulate the new requirement and implement it in > the main part of the function, instead of adding it as a kludge. Please be more specific though. If you don't think the patch is correct, what do you think the requirement should be and how should it be integrated into the existing checks? E.g. the assert is there because the main calculation is based on: /* Size of ymode must not be greater than the size of xmode. */ mode_multiple = GET_MODE_SIZE (xmode) / GET_MODE_SIZE (ymode); gcc_assert (mode_multiple != 0); which clearly isn't a useful value if the division isn't exact. Do you mean that, since mode_multiple isn't correct for the DI-of-a-CI case, we should reformulate the end of the function to avoid using mode_multiple at all? Thanks, Richard
> Please be more specific though. If you don't think the patch is correct, > what do you think the requirement should be and how should it be integrated > into the existing checks? Good question, but I have asked it first. :-) So what are the new subregs that we want to accept here? Can someone write this down explicitly, I think that we cannot go ahead without that. > E.g. the assert is there because the main calculation is based on: > > /* Size of ymode must not be greater than the size of xmode. */ > mode_multiple = GET_MODE_SIZE (xmode) / GET_MODE_SIZE (ymode); > gcc_assert (mode_multiple != 0); > > which clearly isn't a useful value if the division isn't exact. > Do you mean that, since mode_multiple isn't correct for the > DI-of-a-CI case, we should reformulate the end of the function > to avoid using mode_multiple at all? Yes.
On 12/12/14 04:07, Alan Hayward wrote: > [Cleaning this thread up to submit patch again, with better explanation] > > This patch causes subreg_get_info() to exit early in the simple cases > where we are extracting a whole register from a multi register. > > In aarch64 for Big Endian we were producing a subreg of a OImode (256bits) > from a CImode (384bits) This would hit the following assert in > subreg_get_info: > > gcc_assert ((GET_MODE_SIZE (xmode) % GET_MODE_SIZE (ymode)) == 0); > > > This is a rule we should be able to relax a little - if the subreg we want > fits into a whole register then this is a valid result and can be easily > detected earlier in the function. > > This has the bonus that we should be slightly reducing the execution time > for more common cases, for example a subreg of 64bits from 256bits. > > This patch is required for the second part of the patch, which is aarch64 > specific, and fixes up aarch64 Big Endian movoi/ci/xi. This second part > has already been approved. > > This patch will apply cleanly by itself and no regressions were seen when > testing aarch64 and x86_64 on make check. > > Cheers, > Alan > > > Changelog: > > 2014-11-14 Alan Hayward <alan.hayward@arm.com> > > * rtlanal.c > (subreg_get_info): Exit early for simple and common cases ChangeLog nit. Should be: * rtlanal.c (subreg_get_info): Exit early for simple and common cases. With ChangeLog fixed, this patch is OK for the trunk. If you have a testcase, please submit it as well. Thanks for your patience, Jeff
Sorry for the slow response. Jeff has approved the patch in the meantime, but I didn't want to go ahead and apply it while there was still disagreement... Eric Botcazou <ebotcazou@adacore.com> writes: >> Please be more specific though. If you don't think the patch is correct, >> what do you think the requirement should be and how should it be integrated >> into the existing checks? > > Good question, but I have asked it first. :-) > > So what are the new subregs that we want to accept here? Can someone write > this down explicitly, I think that we cannot go ahead without that. The idea is that if: (1) we have a non-paradoxical subreg; (2) both (reg:ymode xregno) and (reg:xmode xregno) occupy full hard registers (no padding or unused upper bits); (3) (reg:ymode xregno) and (reg:xmode xregno) store the same number of bytes (X) in each constituent hard register; (4) the offset is a multiple of X, i.e. the data we're accessing is aligned to a register boundary; and (5) endianness is regular (no differences between words and bytes, or between registers and memory) then the register offset is always the byte offset divided by X. >> E.g. the assert is there because the main calculation is based on: >> >> /* Size of ymode must not be greater than the size of xmode. */ >> mode_multiple = GET_MODE_SIZE (xmode) / GET_MODE_SIZE (ymode); >> gcc_assert (mode_multiple != 0); >> >> which clearly isn't a useful value if the division isn't exact. >> Do you mean that, since mode_multiple isn't correct for the >> DI-of-a-CI case, we should reformulate the end of the function >> to avoid using mode_multiple at all? > > Yes. It's not really obvious to me how to do that though. Maybe I just don't understand the cases that the existing code is trying to handle well enough. Thanks, Richard
> Sorry for the slow response. Jeff has approved the patch in the > meantime, but I didn't want to go ahead and apply it while there > was still disagreement... I still think that it isn't appropriate to short-circuit the main computation as the patch does, but I don't want to block it after Jeff's approval. > (1) we have a non-paradoxical subreg; > (2) both (reg:ymode xregno) and (reg:xmode xregno) occupy full > hard registers (no padding or unused upper bits); > (3) (reg:ymode xregno) and (reg:xmode xregno) store the same number > of bytes (X) in each constituent hard register; > (4) the offset is a multiple of X, i.e. the data we're accessing > is aligned to a register boundary; and > (5) endianness is regular (no differences between words and bytes, > or between registers and memory) OK, that's a nice translation of the new code. :-) It seems to me that the patch wants to extend the support of generic subregs to modes whose sizes are not multiple of each other, which is a requirement of the existing code, but does that in a very specific case for the sake of the ARM port without saying where all the above restrictions come from.
On 01/10/15 06:05, Richard Sandiford wrote: > Sorry for the slow response. Jeff has approved the patch in the > meantime, but I didn't want to go ahead and apply it while there > was still disagreement... Thanks. I didn't realize there was a disagreement when I approved. Let's continue to hash this out a bit in the hopes that we can all get to a place where we're comfortable with the final change, whatever it happens to me. jeff
On 01/13/15 11:55, Eric Botcazou wrote: > >> (1) we have a non-paradoxical subreg; >> (2) both (reg:ymode xregno) and (reg:xmode xregno) occupy full >> hard registers (no padding or unused upper bits); >> (3) (reg:ymode xregno) and (reg:xmode xregno) store the same number >> of bytes (X) in each constituent hard register; >> (4) the offset is a multiple of X, i.e. the data we're accessing >> is aligned to a register boundary; and >> (5) endianness is regular (no differences between words and bytes, >> or between registers and memory) > > OK, that's a nice translation of the new code. :-) > > It seems to me that the patch wants to extend the support of generic subregs > to modes whose sizes are not multiple of each other, which is a requirement of > the existing code, but does that in a very specific case for the sake of the > ARM port without saying where all the above restrictions come from. Basically we're lifting the restriction that the the sizes are multiples of each other. The requirements above are the set where we know it will work. They are target independent, but happen to match what the ARM needs. The certainly do short circuit the meat of the function, that's the whole point, there's this set of conditions under which we know this will work and when they hold, we bypass. Now one could argue that instead of bypassing we should put the code to handle this situation further down. I'd be leery of doing that just from a complexity standpoint. But one could also argue that short circuiting like the patch does adds complexity as well and may be a bit kludgy. Maybe the way forward here is for someone to try and integrate this support in the main part of the code and see how it looks. Then we can pick one. The downside is since this probably isn't a regression that work would need to happen quickly to make it into gcc-5. Which leads to another option, get the release managers to sign off on the kludge after gcc-5 branches and only install the kludge on the gcc-5 branch and insisting the other solution go in for gcc-6 and beyond. Not sure if they'd do that, but it's a discussion that could happen. jeff
On 14 January 2015 at 07:35, Jeff Law <law@redhat.com> wrote: > On 01/13/15 11:55, Eric Botcazou wrote: >> >> >>> (1) we have a non-paradoxical subreg; >>> (2) both (reg:ymode xregno) and (reg:xmode xregno) occupy full >>> hard registers (no padding or unused upper bits); >>> (3) (reg:ymode xregno) and (reg:xmode xregno) store the same number >>> of bytes (X) in each constituent hard register; >>> (4) the offset is a multiple of X, i.e. the data we're accessing >>> is aligned to a register boundary; and >>> (5) endianness is regular (no differences between words and bytes, >>> or between registers and memory) >> >> >> OK, that's a nice translation of the new code. :-) >> >> It seems to me that the patch wants to extend the support of generic >> subregs >> to modes whose sizes are not multiple of each other, which is a >> requirement of >> the existing code, but does that in a very specific case for the sake of >> the >> ARM port without saying where all the above restrictions come from. > > Basically we're lifting the restriction that the the sizes are multiples of > each other. The requirements above are the set where we know it will work. > They are target independent, but happen to match what the ARM needs. > > The certainly do short circuit the meat of the function, that's the whole > point, there's this set of conditions under which we know this will work and > when they hold, we bypass. > > Now one could argue that instead of bypassing we should put the code to > handle this situation further down. I'd be leery of doing that just from a > complexity standpoint. But one could also argue that short circuiting like > the patch does adds complexity as well and may be a bit kludgy. > > Maybe the way forward here is for someone to try and integrate this support > in the main part of the code and see how it looks. Then we can pick one. > > The downside is since this probably isn't a regression that work would need > to happen quickly to make it into gcc-5. > > Which leads to another option, get the release managers to sign off on the > kludge after gcc-5 branches and only install the kludge on the gcc-5 branch > and insisting the other solution go in for gcc-6 and beyond. Not sure if > they'd do that, but it's a discussion that could happen. This issue is currently gating a number of patches that get big endian working on aarch64 (all of which are on the list), it would be good if we could get this addressed in some form in gcc-5 rather than put out a second release of gcc with borked BE aarch64 support. Cheers /Marcus
Seems like the thread might have died down, so just wanted to ping it. As Marcus says, this is holding up other patches so it'd be good to get something in soon. Would it be OK to commit the original patch or should we wait? Marcus Shawcroft <marcus.shawcroft@gmail.com> writes: > On 14 January 2015 at 07:35, Jeff Law <law@redhat.com> wrote: >> On 01/13/15 11:55, Eric Botcazou wrote: >>> >>> >>>> (1) we have a non-paradoxical subreg; >>>> (2) both (reg:ymode xregno) and (reg:xmode xregno) occupy full >>>> hard registers (no padding or unused upper bits); >>>> (3) (reg:ymode xregno) and (reg:xmode xregno) store the same number >>>> of bytes (X) in each constituent hard register; >>>> (4) the offset is a multiple of X, i.e. the data we're accessing >>>> is aligned to a register boundary; and >>>> (5) endianness is regular (no differences between words and bytes, >>>> or between registers and memory) >>> >>> >>> OK, that's a nice translation of the new code. :-) >>> >>> It seems to me that the patch wants to extend the support of generic >>> subregs >>> to modes whose sizes are not multiple of each other, which is a >>> requirement of >>> the existing code, but does that in a very specific case for the sake of >>> the >>> ARM port without saying where all the above restrictions come from. >> >> Basically we're lifting the restriction that the the sizes are multiples of >> each other. The requirements above are the set where we know it will work. >> They are target independent, but happen to match what the ARM needs. >> >> The certainly do short circuit the meat of the function, that's the whole >> point, there's this set of conditions under which we know this will work and >> when they hold, we bypass. >> >> Now one could argue that instead of bypassing we should put the code to >> handle this situation further down. I'd be leery of doing that just from a >> complexity standpoint. But one could also argue that short circuiting like >> the patch does adds complexity as well and may be a bit kludgy. Yeah, I'm worried about the complexity too. We allow subregs that have padding and subregs where the number of bytes in the mode doesn't divide equally between the number of registers. We also have subregs where a DImode value in R can take a different number of registers from a DFmode value in R, despite the two modes having the same number of bits. I've no idea how we'd generalise the code so that those cases and the new one just fall out as particular inputs to an overarching equation. Or how we make sure that the equation doesn't give nonsense results for cases that would be better off triggering an abort (e.g. DFmode subregs of CImode when DImode and DFmode occupy different numbers of registers). I don't think we want to allow subregs in all cases where there is padding. We hit a similar case with 8-byte subregs of 24-byte values stored in 16-byte registers (DImode, EImode and TImode respectively). That doesn't do what we want because all three DImode pieces of the EImode aren't independently addressable, so the abort actually helped. TBH I find even the current code too hard to understand. I can plug specific inputs in and follow what happens, but I don't have a feel for why that's the right way of handling all possible inputs. In some ways I think we've made life hard for ourselves by trying to implement all these rules in a target-independent way. Subregs on memory are easy (even though we should generally be avoiding them :-): they start SUBREG_BYTE bytes into the MEM and occupy the number of bytes in the outer mode. At least AFAIK, we never have situation where an N-bit float can occupy a different number of memory bytes from an N-bit integer. But treating REGs as an image of memory (which I think is effectively what we're doing) has caused problems. As well as being complicated, doing things this way is pretty restrictive. One of the main uses of CANNOT_CHANGE_MODE_CLASS seems to be to work around cases where the generic rules get it wrong. Sometimes it seems like it would be better to let the target define which subregs it can form and on which registers. It would be less complicated, more general, nd (in cases where it allows C_C_M_C to be removed) hopefully more optimal. >> Maybe the way forward here is for someone to try and integrate this support >> in the main part of the code and see how it looks. Then we can pick one. I still have a mental block on how to do that :-) >> The downside is since this probably isn't a regression that work would need >> to happen quickly to make it into gcc-5. >> >> Which leads to another option, get the release managers to sign off on the >> kludge after gcc-5 branches and only install the kludge on the gcc-5 branch >> and insisting the other solution go in for gcc-6 and beyond. Not sure if >> they'd do that, but it's a discussion that could happen. > > This issue is currently gating a number of patches that get big endian > working on aarch64 (all of which are on the list), it would be good if > we could get this addressed in some form in gcc-5 rather than put out > a second release of gcc with borked BE aarch64 support. > > Cheers > /Marcus Thanks, Richard
> Seems like the thread might have died down, so just wanted to ping it. > As Marcus says, this is holding up other patches so it'd be good to get > something in soon. Would it be OK to commit the original patch or should > we wait? Yes, go ahead, but add a FIXME or ??? comment.
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c index c9bf69c..a3f7b78 100644 --- a/gcc/rtlanal.c +++ b/gcc/rtlanal.c @@ -3561,6 +3561,19 @@ subreg_get_info (unsigned int xregno, machine_mode xmode, info->offset = offset / regsize_xmode; return; } + /* Quick exit for the simple and common case of extracting whole + subregisters from a multiregister value. */ + if (!rknown + && WORDS_BIG_ENDIAN == REG_WORDS_BIG_ENDIAN + && regsize_xmode == regsize_ymode + && (offset % regsize_ymode) == 0) + { + info->representable_p = true; + info->nregs = nregs_ymode; + info->offset = offset / regsize_ymode; + gcc_assert (info->offset + info->nregs <= nregs_xmode); + return; + } } /* Lowpart subregs are otherwise valid. */