[RTL] Eliminate redundant vec_select moves.

Message ID CAMe9rOr2QC+G_fBOZ7FwAhXC+u1ZyseSSKn196b7LbSjCjk5cg@mail.gmail.com
State New

Commit Message

H.J. Lu Dec. 11, 2013, 1:10 p.m. UTC
On Wed, Dec 11, 2013 at 1:13 AM, Richard Sandiford
<rdsandiford@googlemail.com> wrote:
> Richard Henderson <rth@redhat.com> writes:
>> On 12/10/2013 10:44 AM, Richard Sandiford wrote:
>>> Sorry, I don't understand.  I never said it was invalid.  I said
>>> (subreg:SF (reg:V4SF X) 1) was invalid if (reg:V4SF X) represents
>>> a single register.  On a little-endian target, the offset cannot be
>>> anything other than 0 in that case.
>>>
>>> So the CANNOT_CHANGE_MODE_CLASS code above seems to be checking for
>>> something that is always invalid, regardless of the target.  That kind
>>> of situation should be rejected by target-independent code instead.
>>
>> But, we want to disable the subreg before we know whether or not (reg:V4SF X)
>> will be allocated to a single hard register.  That is something that we can't
>> know in target-independent code before register allocation.
>
> I was thinking that if we've got a class, we've also got things like
> CLASS_MAX_NREGS.  Maybe that doesn't cope with padding properly though.
> But even in the padding cases an offset-based check in C_C_M_C could
> be derived from other information.
>
> subreg_get_info handles padding with:
>
>       nregs_xmode = HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode);
>       if (GET_MODE_INNER (xmode) == VOIDmode)
>         xmode_unit = xmode;
>       else
>         xmode_unit = GET_MODE_INNER (xmode);
>       gcc_assert (HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode_unit));
>       gcc_assert (nregs_xmode
>                   == (GET_MODE_NUNITS (xmode)
>                       * HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode_unit)));
>       gcc_assert (hard_regno_nregs[xregno][xmode]
>                   == (hard_regno_nregs[xregno][xmode_unit]
>                       * GET_MODE_NUNITS (xmode)));
>
>       /* You can only ask for a SUBREG of a value with holes in the middle
>          if you don't cross the holes.  (Such a SUBREG should be done by
>          picking a different register class, or doing it in memory if
>          necessary.)  An example of a value with holes is XCmode on 32-bit
>          x86 with -m128bit-long-double; it's represented in 6 32-bit registers,
>          3 for each part, but in memory it's two 128-bit parts.
>          Padding is assumed to be at the end (not necessarily the 'high part')
>          of each unit.  */
>       if ((offset / GET_MODE_SIZE (xmode_unit) + 1
>            < GET_MODE_NUNITS (xmode))
>           && (offset / GET_MODE_SIZE (xmode_unit)
>               != ((offset + GET_MODE_SIZE (ymode) - 1)
>                   / GET_MODE_SIZE (xmode_unit))))
>         {
>           info->representable_p = false;
>           rknown = true;
>         }
>
> and I wouldn't really want to force targets to individually reproduce
> that kind of logic at the class level.  If the worst comes to the worst
> we could cache the difficult cases.
>

In my case, the x86 CANNOT_CHANGE_MODE_CLASS only needs
to know whether the subreg byte is zero.  It doesn't care about mode
padding.  You are concerned that the information passed to
CANNOT_CHANGE_MODE_CLASS is too expensive for a target
to process.  That isn't the case for x86.  Am I correct that the mode can't
change if the subreg byte is non-zero?  A target can just check
subreg byte != 0, like my patch does.

Here is a patch to add SUBREG_BYTE to CANNOT_CHANGE_MODE_CLASS.
Tested on Linux/x86-64.  Does it look OK?

Thanks.

Comments

Richard Sandiford Dec. 11, 2013, 3:49 p.m. UTC | #1
"H.J. Lu" <hjl.tools@gmail.com> writes:
> In my case, the x86 CANNOT_CHANGE_MODE_CLASS only needs
> to know whether the subreg byte is zero.  It doesn't care about mode
> padding.  You are concerned that the information passed to
> CANNOT_CHANGE_MODE_CLASS is too expensive for a target
> to process.  That isn't the case for x86.

No, I'm concerned that by going this route, we're forcing every target
(or at least every target with wider-than-word registers, which is most
of the common ones) to implement the same target-independent restriction.
This is not an x86-specific issue.

Thanks,
Richard

H.J. Lu Dec. 11, 2013, 4:09 p.m. UTC | #2
On Wed, Dec 11, 2013 at 7:49 AM, Richard Sandiford
<rdsandiford@googlemail.com> wrote:
> "H.J. Lu" <hjl.tools@gmail.com> writes:
>> In my case, the x86 CANNOT_CHANGE_MODE_CLASS only needs
>> to know whether the subreg byte is zero.  It doesn't care about mode
>> padding.  You are concerned that the information passed to
>> CANNOT_CHANGE_MODE_CLASS is too expensive for a target
>> to process.  That isn't the case for x86.
>
> No, I'm concerned that by going this route, we're forcing every target
> (or at least every target with wider-than-word registers, which is most
> of the common ones) to implement the same target-independent restriction.
> This is not an x86-specific issue.
>

So you prefer a generic solution which makes
CANNOT_CHANGE_MODE_CLASS return true
for vector mode subreg if subreg byte != 0. Is this
correct?

Thanks.

Tejas Belagod Dec. 11, 2013, 4:26 p.m. UTC | #3
H.J. Lu wrote:
> On Wed, Dec 11, 2013 at 7:49 AM, Richard Sandiford
> <rdsandiford@googlemail.com> wrote:
>> No, I'm concerned that by going this route, we're forcing every target
>> (or at least every target with wider-than-word registers, which is most
>> of the common ones) to implement the same target-independent restriction.
>> This is not an x86-specific issue.
>>
> 
> So you prefer a generic solution which makes
> CANNOT_CHANGE_MODE_CLASS return true
> for vector mode subreg if subreg byte != 0. Is this
> correct?

Do you mean a generic solution for C_C_M_C to return true for non-zero 
byte_offset vector subregs in the context of x86?

I want to clarify because, in the context of 32-bit ARM little-endian, a non-zero
byte-offset vector subreg can still be a valid full hardreg, e.g. for

    (subreg:DI (reg:V4SF) 8)

C_C_M_C can return 'false' as this can be resolved to a full D-reg.

Thanks,
Tejas.

H.J. Lu Dec. 11, 2013, 4:34 p.m. UTC | #4
On Wed, Dec 11, 2013 at 8:26 AM, Tejas Belagod <tbelagod@arm.com> wrote:
> H.J. Lu wrote:
>>
>> On Wed, Dec 11, 2013 at 7:49 AM, Richard Sandiford
>> <rdsandiford@googlemail.com> wrote:
>>>
>>> No, I'm concerned that by going this route, we're forcing every target
>>> (or at least every target with wider-than-word registers, which is most
>>> of the common ones) to implement the same target-independent restriction.
>>> This is not an x86-specific issue.
>>>
>>
>> So you prefer a generic solution which makes
>> CANNOT_CHANGE_MODE_CLASS return true
>> for vector mode subreg if subreg byte != 0. Is this
>> correct?
>
>
> Do you mean a generic solution for C_C_M_C to return true for non-zero
> byte_offset vector subregs in the context of x86?
>
> I want to clarify because in the context of 32-bit ARM little-endian, a
> non-zero byte-offset vector subreg is still a valid full hardreg. eg. for
>
>    (subreg:DI (reg:V4SF) 8)
>
> C_C_M_C can return 'false' as this can be resolved to a full D-reg.
>

Does that mean subreg byte interpretation is endian-dependent?
Both little endian

(subreg:DI (reg:V4SF) 0)

and big endian

(subreg:DI (reg:V4SF) MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT)

refer to the same lower 64 bits of reg:V4SF.  Is this correct?

Tejas Belagod Dec. 11, 2013, 4:45 p.m. UTC | #5
H.J. Lu wrote:
> On Wed, Dec 11, 2013 at 8:26 AM, Tejas Belagod <tbelagod@arm.com> wrote:
>> H.J. Lu wrote:
>>> On Wed, Dec 11, 2013 at 7:49 AM, Richard Sandiford
>>> <rdsandiford@googlemail.com> wrote:
>>>> No, I'm concerned that by going this route, we're forcing every target
>>>> (or at least every target with wider-than-word registers, which is most
>>>> of the common ones) to implement the same target-independent restriction.
>>>> This is not an x86-specific issue.
>>>>
>>> So you prefer a generic solution which makes
>>> CANNOT_CHANGE_MODE_CLASS return true
>>> for vector mode subreg if subreg byte != 0. Is this
>>> correct?
>>
>> Do you mean a generic solution for C_C_M_C to return true for non-zero
>> byte_offset vector subregs in the context of x86?
>>
>> I want to clarify because in the context of 32-bit ARM little-endian, a
>> non-zero byte-offset vector subreg is still a valid full hardreg. eg. for
>>
>>    (subreg:DI (reg:V4SF) 8)
>>
>> C_C_M_C can return 'false' as this can be resolved to a full D-reg.
>>
> 
> Does that mean subreg byte interpretation is endian-dependent?
> Both little endian
> 
> (subreg:DI (reg:V4SF) 0)
> 
> and big endian
> 
> (subreg:DI (reg:V4SF) MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT)
> 
> refer to the same lower 64 bits of reg:V4SF.  Is this correct?
> 

If my understanding of endianness representation in RTL registers is correct, yes.

I said little-endian because C_C_M_C is currently gated on TARGET_BIG_END in
arm.h.

Thanks,
Tejas.

H.J. Lu Dec. 14, 2013, 4:32 p.m. UTC | #6
On Wed, Dec 11, 2013 at 7:49 AM, Richard Sandiford
<rdsandiford@googlemail.com> wrote:
> "H.J. Lu" <hjl.tools@gmail.com> writes:
>> In my case, the x86 CANNOT_CHANGE_MODE_CLASS only needs
>> to know whether the subreg byte is zero.  It doesn't care about mode
>> padding.  You are concerned that the information passed to
>> CANNOT_CHANGE_MODE_CLASS is too expensive for a target
>> to process.  That isn't the case for x86.
>
> No, I'm concerned that by going this route, we're forcing every target
> (or at least every target with wider-than-word registers, which is most
> of the common ones) to implement the same target-independent restriction.
> This is not an x86-specific issue.
>

It may not be x86 specific. However, the decision is made
based on enum reg_class:

/* Return true if the registers in CLASS cannot represent the change from
   modes FROM at offset SUBREG_BYTE to TO.  */

bool
ix86_cannot_change_mode_class (enum machine_mode from,
                               unsigned int subreg_byte,
                               enum machine_mode to,
                               enum reg_class regclass)
{
  if (from == to)
    return false;

  /* x87 registers can't do subreg at all, as all values are reformatted
     to extended precision.  */
  if (MAYBE_FLOAT_CLASS_P (regclass))
    return true;

  if (MAYBE_SSE_CLASS_P (regclass) || MAYBE_MMX_CLASS_P (regclass))
    {
      /* Vector registers do not support QI or HImode loads.  If we don't
         disallow a change to these modes, reload will assume it's ok to
         drop the subreg from (subreg:SI (reg:HI 100) 0).  This affects
         the vec_dupv4hi pattern.  */
      if (GET_MODE_SIZE (from) < 4)
        return true;

      /* Vector registers do not support subreg with nonzero offsets, which
         are otherwise valid for integer registers.  */
      if (subreg_byte != 0 && GET_MODE_SIZE (to) < GET_MODE_SIZE (from))
        return true;
    }

  return false;
}

We check subreg_byte only for SSE or MMX register classes.
We could add a target-independent hook or add subreg_byte to
CANNOT_CHANGE_MODE_CLASS like my patch does.

Patch

diff --git a/gcc/combine.c b/gcc/combine.c
index dea6c28..8e3b962 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -5084,6 +5084,7 @@  subst (rtx x, rtx from, rtx to, int in_dest, int in_cond, int unique_copy)
 		      && REGNO (to) < FIRST_PSEUDO_REGISTER
 		      && REG_CANNOT_CHANGE_MODE_P (REGNO (to),
 						   GET_MODE (to),
+						   SUBREG_BYTE (x),
 						   GET_MODE (x)))
 		    return gen_rtx_CLOBBER (VOIDmode, const0_rtx);
 #endif
@@ -6450,6 +6451,7 @@  simplify_set (rtx x)
       && ! (REG_P (dest) && REGNO (dest) < FIRST_PSEUDO_REGISTER
 	    && REG_CANNOT_CHANGE_MODE_P (REGNO (dest),
 					 GET_MODE (SUBREG_REG (src)),
+					 SUBREG_BYTE (src),
 					 GET_MODE (src)))
 #endif
       && (REG_P (dest)
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index cead022..7eac69a 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -820,9 +820,9 @@  do {									     \
 
 /*  VFP registers may only be accessed in the mode they
    were set.  */
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)	\
-  (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)		\
-   ? reg_classes_intersect_p (FP_REGS, (CLASS))		\
+#define CANNOT_CHANGE_MODE_CLASS(FROM, SUBREG_BYTE, TO, CLASS)	\
+  (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)			\
+   ? reg_classes_intersect_p (FP_REGS, (CLASS))			\
    : 0)
 
 
diff --git a/gcc/config/alpha/alpha.h b/gcc/config/alpha/alpha.h
index 2e7c078..a183a44 100644
--- a/gcc/config/alpha/alpha.h
+++ b/gcc/config/alpha/alpha.h
@@ -541,7 +541,7 @@  enum reg_class {
 
 /* Return the class of registers that cannot change mode from FROM to TO.  */
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)		\
+#define CANNOT_CHANGE_MODE_CLASS(FROM, SUBREG_BYTE, TO, CLASS)	\
   (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)			\
    ? reg_classes_intersect_p (FLOAT_REGS, CLASS) : 0)
 
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 8b8b80e..f761a3b 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1247,10 +1247,10 @@  enum reg_class
    In big-endian mode, modes greater than word size (i.e. DFmode) are stored in
    VFP registers in little-endian order.  We can't describe that accurately to
    GCC, so avoid taking subregs of such values.  */
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)	\
-  (TARGET_VFP && TARGET_BIG_END				\
-   && (GET_MODE_SIZE (FROM) > UNITS_PER_WORD		\
-       || GET_MODE_SIZE (TO) > UNITS_PER_WORD)		\
+#define CANNOT_CHANGE_MODE_CLASS(FROM, SUBREG_BYTE, TO, CLASS)	\
+  (TARGET_VFP && TARGET_BIG_END					\
+   && (GET_MODE_SIZE (FROM) > UNITS_PER_WORD			\
+       || GET_MODE_SIZE (TO) > UNITS_PER_WORD)			\
    && reg_classes_intersect_p (VFP_REGS, (CLASS)))
 
 /* The class value for index registers, and the one for base regs.  */
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 73feef2..0cbb9ae 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -167,7 +167,9 @@  extern bool ix86_modes_tieable_p (enum machine_mode, enum machine_mode);
 extern bool ix86_secondary_memory_needed (enum reg_class, enum reg_class,
 					  enum machine_mode, int);
 extern bool ix86_cannot_change_mode_class (enum machine_mode,
-					   enum machine_mode, enum reg_class);
+					   unsigned int,
+					   enum machine_mode,
+					   enum reg_class);
 
 extern int ix86_mode_needed (int, rtx);
 extern int ix86_mode_after (int, int, rtx);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index cdd63e5..68628ab 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -35003,10 +35003,12 @@  ix86_class_max_nregs (reg_class_t rclass, enum machine_mode mode)
 }
 
 /* Return true if the registers in CLASS cannot represent the change from
-   modes FROM to TO.  */
+   modes FROM at offset SUBREG_BYTE to TO.  */
 
 bool
-ix86_cannot_change_mode_class (enum machine_mode from, enum machine_mode to,
+ix86_cannot_change_mode_class (enum machine_mode from,
+			       unsigned int subreg_byte,
+			       enum machine_mode to,
 			       enum reg_class regclass)
 {
   if (from == to)
@@ -35027,10 +35029,8 @@  ix86_cannot_change_mode_class (enum machine_mode from, enum machine_mode to,
 	return true;
 
       /* Vector registers do not support subreg with nonzero offsets, which
-	 are otherwise valid for integer registers.  Since we can't see
-	 whether we have a nonzero offset from here, prohibit all
-         nonparadoxical subregs changing size.  */
-      if (GET_MODE_SIZE (to) < GET_MODE_SIZE (from))
+	 are otherwise valid for integer registers.  */
+      if (subreg_byte != 0 && GET_MODE_SIZE (to) < GET_MODE_SIZE (from))
 	return true;
     }
 
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 7efd1e0..d43dcbd 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1522,10 +1522,11 @@  enum reg_class
    ? mode_for_size (32, GET_MODE_CLASS (MODE), 0)		\
    : MODE)
 
-/* Return a class of registers that cannot change FROM mode to TO mode.  */
+/* Return a class of registers that cannot change FROM mode to TO mode
+   with SUBREG_BYTE.  */
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
-  ix86_cannot_change_mode_class (FROM, TO, CLASS)
+#define CANNOT_CHANGE_MODE_CLASS(FROM, SUBREG_BYTE, TO, CLASS) \
+  ix86_cannot_change_mode_class (FROM, SUBREG_BYTE, TO, CLASS)
 
 /* Stack layout; function entry, exit and calling.  */
 
diff --git a/gcc/config/ia64/ia64.h b/gcc/config/ia64/ia64.h
index ae9027c..d3aca62 100644
--- a/gcc/config/ia64/ia64.h
+++ b/gcc/config/ia64/ia64.h
@@ -856,7 +856,7 @@  enum reg_class
    In FP regs, we can't change FP values to integer values and vice versa,
    but we can change e.g. DImode to SImode, and V2SFmode into DImode.  */
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) 		\
+#define CANNOT_CHANGE_MODE_CLASS(FROM, SUBREG_BYTE, TO, CLASS) 	\
   (reg_classes_intersect_p (CLASS, BR_REGS)			\
    ? (FROM) != (TO)						\
    : (SCALAR_FLOAT_MODE_P (FROM) != SCALAR_FLOAT_MODE_P (TO)	\
diff --git a/gcc/config/m32c/m32c.h b/gcc/config/m32c/m32c.h
index 3ceb093..497a743 100644
--- a/gcc/config/m32c/m32c.h
+++ b/gcc/config/m32c/m32c.h
@@ -415,7 +415,7 @@  enum reg_class
 
 #define TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P hook_bool_mode_true
 
-#define CANNOT_CHANGE_MODE_CLASS(F,T,C) m32c_cannot_change_mode_class(F,T,C)
+#define CANNOT_CHANGE_MODE_CLASS(F,O,T,C) m32c_cannot_change_mode_class(F,T,C)
 
 /* STACK AND CALLING */
 
diff --git a/gcc/config/mep/mep.h b/gcc/config/mep/mep.h
index 023d73c..01bd3cd 100644
--- a/gcc/config/mep/mep.h
+++ b/gcc/config/mep/mep.h
@@ -321,7 +321,7 @@  extern char mep_leaf_registers[];
 
 #define MODES_TIEABLE_P(MODE1, MODE2) 1
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, SUBREG_BYTE, TO, CLASS) \
   mep_cannot_change_mode_class (FROM, TO, CLASS)
 
 enum reg_class
diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index 021419c..ec5e2af 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -2104,7 +2104,7 @@  enum reg_class
 
 #define CLASS_MAX_NREGS(CLASS, MODE) mips_class_max_nregs (CLASS, MODE)
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, SUBREG_BYTE, TO, CLASS) \
   mips_cannot_change_mode_class (FROM, TO, CLASS)
 
 /* Stack layout; function entry, exit and calling.  */
diff --git a/gcc/config/msp430/msp430.h b/gcc/config/msp430/msp430.h
index 953c638..c4cb0fd 100644
--- a/gcc/config/msp430/msp430.h
+++ b/gcc/config/msp430/msp430.h
@@ -394,11 +394,11 @@  typedef struct
   ((TARGET_LARGE && ((NREGS) <= 2)) ? PSImode : choose_hard_reg_mode ((REGNO), (NREGS), false))
 
 /* Also stop GCC from thinking that it can eliminate (SUBREG:PSI (SI)).  */
-#define CANNOT_CHANGE_MODE_CLASS(FROM,TO,CLASS) \
-  (   ((TO) == PSImode && (FROM) == SImode)	\
-   || ((TO) == SImode  && (FROM) == PSImode)    \
-   || ((TO) == DImode  && (FROM) == PSImode)    \
-   || ((TO) == PSImode && (FROM) == DImode)     \
+#define CANNOT_CHANGE_MODE_CLASS(FROM,SUBREG_BYTE,TO,CLASS) \
+  (   ((TO) == PSImode && (FROM) == SImode)		    \
+   || ((TO) == SImode  && (FROM) == PSImode)		    \
+   || ((TO) == DImode  && (FROM) == PSImode)		    \
+   || ((TO) == PSImode && (FROM) == DImode)		    \
       )
 
 #define ACCUMULATE_OUTGOING_ARGS 1
diff --git a/gcc/config/pa/pa32-regs.h b/gcc/config/pa/pa32-regs.h
index 098e9ba..83681aa 100644
--- a/gcc/config/pa/pa32-regs.h
+++ b/gcc/config/pa/pa32-regs.h
@@ -296,7 +296,7 @@  enum reg_class { NO_REGS, R1_REGS, GENERAL_REGS, FPUPPER_REGS, FP_REGS,
 
 /* Defines invalid mode changes.  */
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, SUBREG_BYTE, TO, CLASS) \
   pa_cannot_change_mode_class (FROM, TO, CLASS)
 
 /* Return the class number of the smallest class containing
diff --git a/gcc/config/pa/pa64-regs.h b/gcc/config/pa/pa64-regs.h
index 002520a..583ffa3 100644
--- a/gcc/config/pa/pa64-regs.h
+++ b/gcc/config/pa/pa64-regs.h
@@ -232,7 +232,7 @@  enum reg_class { NO_REGS, R1_REGS, GENERAL_REGS, FPUPPER_REGS, FP_REGS,
 
 /* Defines invalid mode changes.  */
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, SUBREG_BYTE, TO, CLASS) \
   pa_cannot_change_mode_class (FROM, TO, CLASS)
 
 /* Return the class number of the smallest class containing
diff --git a/gcc/config/pdp11/pdp11.h b/gcc/config/pdp11/pdp11.h
index d4bc19a..33d0f9f 100644
--- a/gcc/config/pdp11/pdp11.h
+++ b/gcc/config/pdp11/pdp11.h
@@ -282,7 +282,7 @@  enum reg_class { NO_REGS, MUL_REGS, GENERAL_REGS, LOAD_FPU_REGS, NO_LOAD_FPU_REG
   1									\
 )
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, SUBREG_BYTE, TO, CLASS) \
   pdp11_cannot_change_mode_class (FROM, TO, CLASS)
 
 /* Stack layout; function entry, exit and calling.  */
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index eb59235..b88209a 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -1505,7 +1505,7 @@  extern enum reg_class rs6000_constraints[RS6000_CONSTRAINT_MAX];
 
 /* Return nonzero if for CLASS a mode change from FROM to TO is invalid.  */
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)			\
+#define CANNOT_CHANGE_MODE_CLASS(FROM, SUBREG_BYTE, TO, CLASS)		\
   rs6000_cannot_change_mode_class_ptr (FROM, TO, CLASS)
 
 /* Stack layout; function entry, exit and calling.  */
diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h
index bca18fe..a947836 100644
--- a/gcc/config/s390/s390.h
+++ b/gcc/config/s390/s390.h
@@ -419,7 +419,7 @@  enum processor_flags
    cannot use SUBREGs to switch between modes in FP registers.
    Likewise for access registers, since they have only half the
    word size on 64-bit.  */
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)		        \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, SUBREG_BYTE, TO, CLASS)	        \
   (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)			        \
    ? ((reg_classes_intersect_p (FP_REGS, CLASS)				\
        && (GET_MODE_SIZE (FROM) < 8 || GET_MODE_SIZE (TO) < 8))		\
diff --git a/gcc/config/score/score.h b/gcc/config/score/score.h
index ca73401..d5ca021 100644
--- a/gcc/config/score/score.h
+++ b/gcc/config/score/score.h
@@ -414,8 +414,8 @@  enum reg_class
 #define SECONDARY_OUTPUT_RELOAD_CLASS(CLASS, MODE, X) \
   score_secondary_reload_class (CLASS, MODE, X)
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)    \
-  (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)        \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, SUBREG_BYTE, TO, CLASS) \
+  (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)		       \
    ? reg_classes_intersect_p (HI_REG, (CLASS)) : 0)
 
 
diff --git a/gcc/config/sh/sh.h b/gcc/config/sh/sh.h
index 9f07012..1a4c9e8 100644
--- a/gcc/config/sh/sh.h
+++ b/gcc/config/sh/sh.h
@@ -1149,7 +1149,7 @@  extern enum reg_class regno_reg_class[FIRST_PSEUDO_REGISTER];
    operand of a SUBREG that changes the mode of the object illegally.
    ??? We need to renumber the internal numbers for the frnn registers
    when in little endian in order to allow mode size changes.  */
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, SUBREG_BYTE, TO, CLASS) \
   sh_cannot_change_mode_class (FROM, TO, CLASS)
 
 /* Stack layout; function entry, exit and calling.  */
diff --git a/gcc/config/sparc/sparc.h b/gcc/config/sparc/sparc.h
index 7533e88..e3d9db8 100644
--- a/gcc/config/sparc/sparc.h
+++ b/gcc/config/sparc/sparc.h
@@ -912,7 +912,7 @@  extern enum reg_class sparc_regno_reg_class[FIRST_PSEUDO_REGISTER];
    Likewise for SFmode, since word-mode paradoxical subregs are
    problematic on big-endian architectures.  */
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)		\
+#define CANNOT_CHANGE_MODE_CLASS(FROM, SUBREG_BYTE, TO, CLASS)	\
   (TARGET_ARCH64						\
    && GET_MODE_SIZE (FROM) == 4					\
    && GET_MODE_SIZE (TO) != 4					\
diff --git a/gcc/config/spu/spu.h b/gcc/config/spu/spu.h
index 64a2ba0..d0be0e3 100644
--- a/gcc/config/spu/spu.h
+++ b/gcc/config/spu/spu.h
@@ -226,7 +226,7 @@  enum reg_class {
 
 /* GCC assumes that modes are in the lowpart of a register, which is
    only true for SPU. */
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, SUBREG_BYTE, TO, CLASS) \
         ((GET_MODE_SIZE (FROM) > 4 || GET_MODE_SIZE (TO) > 4) \
 	 && (GET_MODE_SIZE (FROM) < 16 || GET_MODE_SIZE (TO) < 16) \
 	 && GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO))
diff --git a/gcc/doc/rtl.texi b/gcc/doc/rtl.texi
index 84c0444..7bc37a8 100644
--- a/gcc/doc/rtl.texi
+++ b/gcc/doc/rtl.texi
@@ -1968,11 +1968,12 @@  value @samp{(reg:HI 4)}.
 @cindex @code{CANNOT_CHANGE_MODE_CLASS} and subreg semantics
 The rules above apply to both pseudo @var{reg}s and hard @var{reg}s.
 If the semantics are not correct for particular combinations of
-@var{m1}, @var{m2} and hard @var{reg}, the target-specific code
-must ensure that those combinations are never used.  For example:
+@var{m1}, @var{subreg_byte}, @var{m2} and hard @var{reg}, the
+target-specific code must ensure that those combinations are never used.
+For example:
 
 @smallexample
-CANNOT_CHANGE_MODE_CLASS (@var{m2}, @var{m1}, @var{class})
+CANNOT_CHANGE_MODE_CLASS (@var{m2}, @var{subreg_byte}, @var{m1}, @var{class})
 @end smallexample
 
 must be true for every class @var{class} that includes @var{reg}.
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index c4ecd99..f49aefb 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -2885,9 +2885,11 @@  This macro helps control the handling of multiple-word values
 in the reload pass.
 @end defmac
 
-@defmac CANNOT_CHANGE_MODE_CLASS (@var{from}, @var{to}, @var{class})
+@defmac CANNOT_CHANGE_MODE_CLASS (@var{from}, @var{subreg_byte}, @var{to}, @var{class})
 If defined, a C expression that returns nonzero for a @var{class} for which
-a change from mode @var{from} to mode @var{to} is invalid.
+a change from mode @var{from} at the @code{subreg} offset @var{subreg_byte}
+to mode @var{to} is invalid.  If the @code{subreg} offset is unknown, the
+size of the largest mode on the target should be used.
 
 For the example, loading 32-bit integer or floating-point objects into
 floating-point registers on the Alpha extends them to 64 bits.
@@ -2897,7 +2899,7 @@  register.  Therefore, @file{alpha.h} defines @code{CANNOT_CHANGE_MODE_CLASS}
 as below:
 
 @smallexample
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, SUBREG_BYTE, TO, CLASS) \
   (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO) \
    ? reg_classes_intersect_p (FLOAT_REGS, (CLASS)) : 0)
 @end smallexample
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 7e459eb..ca7f374 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -2526,9 +2526,11 @@  This macro helps control the handling of multiple-word values
 in the reload pass.
 @end defmac
 
-@defmac CANNOT_CHANGE_MODE_CLASS (@var{from}, @var{to}, @var{class})
+@defmac CANNOT_CHANGE_MODE_CLASS (@var{from}, @var{subreg_byte}, @var{to}, @var{class})
 If defined, a C expression that returns nonzero for a @var{class} for which
-a change from mode @var{from} to mode @var{to} is invalid.
+a change from mode @var{from} at the @code{subreg} offset @var{subreg_byte}
+to mode @var{to} is invalid.  If the @code{subreg} offset is unknown, the
+size of the largest mode on the target should be used.
 
 For the example, loading 32-bit integer or floating-point objects into
 floating-point registers on the Alpha extends them to 64 bits.
@@ -2538,7 +2540,7 @@  register.  Therefore, @file{alpha.h} defines @code{CANNOT_CHANGE_MODE_CLASS}
 as below:
 
 @smallexample
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, SUBREG_BYTE, TO, CLASS) \
   (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO) \
    ? reg_classes_intersect_p (FLOAT_REGS, (CLASS)) : 0)
 @end smallexample
diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index d7fa3a5..b8e3dfd 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -748,7 +748,7 @@  validate_subreg (enum machine_mode omode, enum machine_mode imode,
       if ((COMPLEX_MODE_P (imode) || VECTOR_MODE_P (imode))
 	  && GET_MODE_INNER (imode) == omode)
 	;
-      else if (REG_CANNOT_CHANGE_MODE_P (regno, imode, omode))
+      else if (REG_CANNOT_CHANGE_MODE_P (regno, imode, offset, omode))
 	return false;
 #endif
 
diff --git a/gcc/hard-reg-set.h b/gcc/hard-reg-set.h
index ad987f9..11a4b3e 100644
--- a/gcc/hard-reg-set.h
+++ b/gcc/hard-reg-set.h
@@ -716,9 +716,9 @@  extern struct target_hard_regs *this_target_hard_regs;
 
 extern const char * reg_class_names[];
 
-/* Given a hard REGN a FROM mode and a TO mode, return nonzero if
+/* Given a hard REGN a FROM mode at SUBREG_BYTE and a TO mode, return nonzero if
    REGN cannot change modes between the specified modes.  */
-#define REG_CANNOT_CHANGE_MODE_P(REGN, FROM, TO)                          \
-         CANNOT_CHANGE_MODE_CLASS (FROM, TO, REGNO_REG_CLASS (REGN))
+#define REG_CANNOT_CHANGE_MODE_P(REGN, FROM, SUBREG_BYTE, TO) \
+  CANNOT_CHANGE_MODE_CLASS (FROM, SUBREG_BYTE, TO, REGNO_REG_CLASS (REGN))
 
 #endif /* ! GCC_HARD_REG_SET_H */
diff --git a/gcc/postreload.c b/gcc/postreload.c
index 37bd9ff..8fb2f20 100644
--- a/gcc/postreload.c
+++ b/gcc/postreload.c
@@ -349,6 +349,8 @@  reload_cse_simplify_set (rtx set, rtx insn)
 	      && extend_op != UNKNOWN
 #ifdef CANNOT_CHANGE_MODE_CLASS
 	      && !CANNOT_CHANGE_MODE_CLASS (GET_MODE (SET_DEST (set)),
+					    (GET_CODE (SET_DEST (set)) == SUBREG
+					     ? SUBREG_BYTE (SET_DEST (set)) : 0),
 					    word_mode,
 					    REGNO_REG_CLASS (REGNO (SET_DEST (set))))
 #endif
@@ -459,6 +461,8 @@  reload_cse_simplify_operands (rtx insn, rtx testreg)
 	     it cannot have been used in word_mode.  */
 	  else if (REG_P (SET_DEST (set))
 		   && CANNOT_CHANGE_MODE_CLASS (GET_MODE (SET_DEST (set)),
+						(GET_CODE (SET_DEST (set)) == SUBREG
+						 ? SUBREG_BYTE (SET_DEST (set)) : 0),
 						word_mode,
 						REGNO_REG_CLASS (REGNO (SET_DEST (set)))))
 	    ; /* Continue ordinary processing.  */
diff --git a/gcc/recog.c b/gcc/recog.c
index dbd9a8a..e30d81c 100644
--- a/gcc/recog.c
+++ b/gcc/recog.c
@@ -1069,7 +1069,8 @@  register_operand (rtx op, enum machine_mode mode)
 #ifdef CANNOT_CHANGE_MODE_CLASS
       if (REG_P (sub)
 	  && REGNO (sub) < FIRST_PSEUDO_REGISTER
-	  && REG_CANNOT_CHANGE_MODE_P (REGNO (sub), GET_MODE (sub), mode)
+	  && REG_CANNOT_CHANGE_MODE_P (REGNO (sub), GET_MODE (sub),
+				       SUBREG_BYTE (op), mode)
 	  && GET_MODE_CLASS (GET_MODE (sub)) != MODE_COMPLEX_INT
 	  && GET_MODE_CLASS (GET_MODE (sub)) != MODE_COMPLEX_FLOAT
 	  /* LRA can generate some invalid SUBREGS just for matched
diff --git a/gcc/regcprop.c b/gcc/regcprop.c
index 3c9ef3d..2be5774 100644
--- a/gcc/regcprop.c
+++ b/gcc/regcprop.c
@@ -389,7 +389,9 @@  mode_change_ok (enum machine_mode orig_mode, enum machine_mode new_mode,
     return false;
 
 #ifdef CANNOT_CHANGE_MODE_CLASS
-  return !REG_CANNOT_CHANGE_MODE_P (regno, orig_mode, new_mode);
+  return !REG_CANNOT_CHANGE_MODE_P (regno, orig_mode,
+				    (MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT),
+				    new_mode);
 #endif
 
   return true;
diff --git a/gcc/reginfo.c b/gcc/reginfo.c
index 46288eb..6a150a4 100644
--- a/gcc/reginfo.c
+++ b/gcc/reginfo.c
@@ -1222,6 +1222,7 @@  record_subregs_of_mode (rtx subreg, bitmap subregs_of_mode)
 	if (!bitmap_bit_p (invalid_mode_changes,
 			   regno * N_REG_CLASSES + rclass)
 	    && CANNOT_CHANGE_MODE_CLASS (PSEUDO_REGNO_MODE (regno),
+					 (MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT),
 					 mode, (enum reg_class) rclass))
 	  bitmap_set_bit (invalid_mode_changes,
 			  regno * N_REG_CLASSES + rclass);
diff --git a/gcc/reload.c b/gcc/reload.c
index 96619f6..487d4d4 100644
--- a/gcc/reload.c
+++ b/gcc/reload.c
@@ -1064,7 +1064,8 @@  push_reload (rtx in, rtx out, rtx *inloc, rtx *outloc,
   if (in != 0 && GET_CODE (in) == SUBREG
       && (subreg_lowpart_p (in) || strict_low)
 #ifdef CANNOT_CHANGE_MODE_CLASS
-      && !CANNOT_CHANGE_MODE_CLASS (GET_MODE (SUBREG_REG (in)), inmode, rclass)
+      && !CANNOT_CHANGE_MODE_CLASS (GET_MODE (SUBREG_REG (in)),
+				    SUBREG_BYTE (in), inmode, rclass)
 #endif
       && contains_reg_of_mode[(int) rclass][(int) GET_MODE (SUBREG_REG (in))]
       && (CONSTANT_P (SUBREG_REG (in))
@@ -1113,7 +1114,8 @@  push_reload (rtx in, rtx out, rtx *inloc, rtx *outloc,
 	  || (REG_P (SUBREG_REG (in))
 	      && REGNO (SUBREG_REG (in)) < FIRST_PSEUDO_REGISTER
 	      && REG_CANNOT_CHANGE_MODE_P
-	      (REGNO (SUBREG_REG (in)), GET_MODE (SUBREG_REG (in)), inmode))
+	      (REGNO (SUBREG_REG (in)), GET_MODE (SUBREG_REG (in)),
+	       SUBREG_BYTE (in), inmode))
 #endif
 	  ))
     {
@@ -1174,7 +1176,8 @@  push_reload (rtx in, rtx out, rtx *inloc, rtx *outloc,
   if (out != 0 && GET_CODE (out) == SUBREG
       && (subreg_lowpart_p (out) || strict_low)
 #ifdef CANNOT_CHANGE_MODE_CLASS
-      && !CANNOT_CHANGE_MODE_CLASS (GET_MODE (SUBREG_REG (out)), outmode, rclass)
+      && !CANNOT_CHANGE_MODE_CLASS (GET_MODE (SUBREG_REG (out)),
+				    SUBREG_BYTE (out), outmode, rclass)
 #endif
       && contains_reg_of_mode[(int) rclass][(int) GET_MODE (SUBREG_REG (out))]
       && (CONSTANT_P (SUBREG_REG (out))
@@ -1209,6 +1212,7 @@  push_reload (rtx in, rtx out, rtx *inloc, rtx *outloc,
 	      && REGNO (SUBREG_REG (out)) < FIRST_PSEUDO_REGISTER
 	      && REG_CANNOT_CHANGE_MODE_P (REGNO (SUBREG_REG (out)),
 					   GET_MODE (SUBREG_REG (out)),
+					   SUBREG_BYTE (out),
 					   outmode))
 #endif
 	  ))
diff --git a/gcc/reload1.c b/gcc/reload1.c
index 47439ce..10d5a4e 100644
--- a/gcc/reload1.c
+++ b/gcc/reload1.c
@@ -6609,7 +6609,7 @@  choose_reload_regs (struct insn_chain *chain)
 		     mode MODE.  */
 		  && !REG_CANNOT_CHANGE_MODE_P (REGNO (reg_last_reload_reg[regno]),
 						GET_MODE (reg_last_reload_reg[regno]),
-						mode)
+						byte, mode)
 #endif
 		  )
 		{
@@ -8080,8 +8080,12 @@  inherit_piecemeal_p (int dest ATTRIBUTE_UNUSED,
 		     enum machine_mode mode ATTRIBUTE_UNUSED)
 {
 #ifdef CANNOT_CHANGE_MODE_CLASS
-  return (!REG_CANNOT_CHANGE_MODE_P (dest, mode, reg_raw_mode[dest])
-	  && !REG_CANNOT_CHANGE_MODE_P (src, mode, reg_raw_mode[src]));
+  return (!REG_CANNOT_CHANGE_MODE_P (dest, mode,
+				     (MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT),
+				     reg_raw_mode[dest])
+	  && !REG_CANNOT_CHANGE_MODE_P (src, mode,
+					(MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT),
+					reg_raw_mode[src]));
 #else
   return true;
 #endif
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 38f9e36..9687110 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -3533,7 +3533,7 @@  simplify_subreg_regno (unsigned int xregno, enum machine_mode xmode,
   /* Give the backend a chance to disallow the mode change.  */
   if (GET_MODE_CLASS (xmode) != MODE_COMPLEX_INT
       && GET_MODE_CLASS (xmode) != MODE_COMPLEX_FLOAT
-      && REG_CANNOT_CHANGE_MODE_P (xregno, xmode, ymode)
+      && REG_CANNOT_CHANGE_MODE_P (xregno, xmode, offset, ymode)
       /* We can use mode change in LRA for some transformations.  */
       && ! lra_in_progress)
     return -1;