diff mbox

[RFC] LRA subreg handling

Message ID B5E67142681B53468FAF6B7C31356562441183BD@hhmail02.hh.imgtec.org
State New
Headers show

Commit Message

Robert Suchanek Jan. 14, 2015, 7:20 p.m. UTC
Hi Vladimir,

An issue has been identified with LRA when running CPU2006 h264ref benchmark.

I'll try to describe what the issue is and a fix applied as it is very
difficult to reproduce it and it is next to impossible to create a narrowed
testcase on top of the source code restrictions.

The concerned LRA code in lra-constraints.c is the following:

if (GET_CODE (*loc) == SUBREG)
  {
    reg = SUBREG_REG (*loc);
    byte = SUBREG_BYTE (*loc);
    if (REG_P (reg)
        /* Strict_low_part requires reload the register not
           the sub-register.  */
        && (curr_static_id->operand[i].strict_low
            || (GET_MODE_SIZE (mode)
                <= GET_MODE_SIZE (GET_MODE (reg))
                && (hard_regno
                    = get_try_hard_regno (REGNO (reg))) >= 0
                && (simplify_subreg_regno
                    (hard_regno,
                     GET_MODE (reg), byte, mode) < 0)
                && (goal_alt[i] == NO_REGS
                    || (simplify_subreg_regno
                        (ira_class_hard_regs[goal_alt[i]][0],
                         GET_MODE (reg), byte, mode) >= 0))))
      {
        loc = &SUBREG_REG (*loc);
        mode = GET_MODE (*loc);
      }
  }

The above works just fine when we deal with strict_low_part or a subreg
smaller than a word.

However, multi-word operations that were emitted as a sequence of operations
on word sized parts of the DImode register appears to expose a problem with
LRA e.g. '(set (subreg: SI (reg: DI)) ...)'.
LRA does not realize that it actually uses the other halve of the DI-mode
register leading to a situation where it modifies one halve of the result and
spills the whole register with the other halve undefined.

In the dump I can see the following:

      Creating newreg=1552 from oldreg=521, assigning class GR_REGS to r1552
 1487: r1552:DI#4=r1404:SI+r1509:SI
      REG_DEAD r1509:SI
      REG_DEAD r1404:SI
    Inserting insn reload after:
 1735: r521:DI=r1552:DI

There is nothing in the dump that sets r1552:DI#0 nor a reload is inserted
to load the value before modifying it but it is spilled.

As it is a multi-word register, the split pass emits an additional instruction
to load the whole 64-bit value but since one halve was modified, only
register $20 appears in the live-in set. In contrast to $20, $21 is being used
but not added to the live-in set.

...
;; live  in      4 [$4] 6 [$6] 7 [$7] 10 [$10] 11 [$11] 12 [$12] 13 [$13]
[$14] 15 [$15] 16 [$16] 17 [$17] 20 [$20] 22 [$22] 23 [$23] 24 [$24] 25 [$25]
29 [$sp] 30 [$fp] 31 [$31] 52 [$f20] 79 [$fakec]
...
(insn 1788 1077 1789 80 (set (reg:SI 20 $20 [orig:521 distortion ] [521])
    (mem/c:SI (plus:SI (reg/f:SI 29 $sp)
              (const_int 40 [0x28])) [16 %sfp+40 S4 A64])) rdopt.c:257 288
{*movsi_internal}
     (nil))
(insn 1789 1788 1743 80 (set (reg:SI 21 $21 [ distortion+4 ])
    (mem/c:SI (plus:SI (reg/f:SI 29 $sp)
              (const_int 44 [0x2c])) [16 %sfp+44 S4 A32])) rdopt.c:257 288
{*movsi_internal}
     (nil))
...

The potential fix for this is to promote the type of a subreg OP_OUT to OP_INOUT
to treat the pseudo register (r1552 in this case) as input and LRA will be forced
to insert a reload before modifying its contents. 

Handling of strict_low_part case is fine as the operand is described in the MD pattern
as IN_OUT through modifiers.

With the above change in place, we get a reload before assignment:

      Creating newreg=1552 from oldreg=521, assigning class GR_REGS to r1552
 1487: r1552:DI#4=r1404:SI+r1509:SI
      REG_DEAD r1509:SI
      REG_DEAD r1404:SI
    Inserting insn reload before:
 1735: r1552:DI=r521:DI
    Inserting insn reload after:
 1736: r521:DI=r1552:DI

and the benchmark happily passes the runtime check.

The question is whether changing the type to OP_INOUT is the correct and
valid fix?

Regards,
Robert

2015-01-14  Robert Suchanek  <robert.suchanek@imgtec.com>

gcc/
        * lra-constraints.c (curr_insn_transform): Change the type of a reload
        pseudo to OP_INOUT.
---
 gcc/lra-constraints.c |    1 +
 1 file changed, 1 insertion(+)

--
1.7.9.5

Comments

Jeff Law Jan. 15, 2015, 5 a.m. UTC | #1
On 01/14/15 12:20, Robert Suchanek wrote:
> Hi Vladimir,
>
> An issue has been identified with LRA when running CPU2006 h264ref benchmark.
>
> I'll try to describe what the issue is and a fix applied as it is very
> difficult to reproduce it and it is next to impossible to create a narrowed
> testcase on top of the source code restrictions.
>
> The concerned LRA code in lra-constraints.c is the following:
>
> if (GET_CODE (*loc) == SUBREG)
>    {
>      reg = SUBREG_REG (*loc);
>      byte = SUBREG_BYTE (*loc);
>      if (REG_P (reg)
>          /* Strict_low_part requires reload the register not
>             the sub-register.  */
>          && (curr_static_id->operand[i].strict_low
>              || (GET_MODE_SIZE (mode)
>                  <= GET_MODE_SIZE (GET_MODE (reg))
>                  && (hard_regno
>                      = get_try_hard_regno (REGNO (reg))) >= 0
>                  && (simplify_subreg_regno
>                      (hard_regno,
>                       GET_MODE (reg), byte, mode) < 0)
>                  && (goal_alt[i] == NO_REGS
>                      || (simplify_subreg_regno
>                          (ira_class_hard_regs[goal_alt[i]][0],
>                           GET_MODE (reg), byte, mode) >= 0))))
>        {
>          loc = &SUBREG_REG (*loc);
>          mode = GET_MODE (*loc);
>        }
>    }
>
> The above works just fine when we deal with strict_low_part or a subreg
> smaller than a word.
>
> However, multi-word operations that were emitted as a sequence of operations
> on word sized parts of the DImode register appears to expose a problem with
> LRA e.g. '(set (subreg: SI (reg: DI)) ...)'.
> LRA does not realize that it actually uses the other halve of the DI-mode
> register leading to a situation where it modifies one halve of the result and
> spills the whole register with the other halve undefined.
>
> In the dump I can see the following:
>
>        Creating newreg=1552 from oldreg=521, assigning class GR_REGS to r1552
>   1487: r1552:DI#4=r1404:SI+r1509:SI
>        REG_DEAD r1509:SI
>        REG_DEAD r1404:SI
>      Inserting insn reload after:
>   1735: r521:DI=r1552:DI
>
> There is nothing in the dump that sets r1552:DI#0 nor a reload is inserted
> to load the value before modifying it but it is spilled.
>
> As it is a multi-word register, the split pass emits an additional instruction
> to load the whole 64-bit value but since one halve was modified, only
> register $20 appears in the live-in set. In contrast to $20, $21 is being used
> but not added to the live-in set.
>
> ...
> ;; live  in      4 [$4] 6 [$6] 7 [$7] 10 [$10] 11 [$11] 12 [$12] 13 [$13]
> [$14] 15 [$15] 16 [$16] 17 [$17] 20 [$20] 22 [$22] 23 [$23] 24 [$24] 25 [$25]
> 29 [$sp] 30 [$fp] 31 [$31] 52 [$f20] 79 [$fakec]
> ...
> (insn 1788 1077 1789 80 (set (reg:SI 20 $20 [orig:521 distortion ] [521])
>      (mem/c:SI (plus:SI (reg/f:SI 29 $sp)
>                (const_int 40 [0x28])) [16 %sfp+40 S4 A64])) rdopt.c:257 288
> {*movsi_internal}
>       (nil))
> (insn 1789 1788 1743 80 (set (reg:SI 21 $21 [ distortion+4 ])
>      (mem/c:SI (plus:SI (reg/f:SI 29 $sp)
>                (const_int 44 [0x2c])) [16 %sfp+44 S4 A32])) rdopt.c:257 288
> {*movsi_internal}
>       (nil))
> ...
>
> The potential fix for this is to promote the type of a subreg OP_OUT to OP_INOUT
> to treat the pseudo register (r1552 in this case) as input and LRA will be forced
> to insert a reload before modifying its contents.
>
> Handling of strict_low_part case is fine as the operand is described in the MD pattern
> as IN_OUT through modifiers.
>
> With the above change in place, we get a reload before assignment:
>
>        Creating newreg=1552 from oldreg=521, assigning class GR_REGS to r1552
>   1487: r1552:DI#4=r1404:SI+r1509:SI
>        REG_DEAD r1509:SI
>        REG_DEAD r1404:SI
>      Inserting insn reload before:
>   1735: r1552:DI=r521:DI
>      Inserting insn reload after:
>   1736: r521:DI=r1552:DI
>
> and the benchmark happily passes the runtime check.
>
> The question is whether changing the type to OP_INOUT is the correct and
> valid fix?
>
> Regards,
> Robert
>
> 2015-01-14  Robert Suchanek  <robert.suchanek@imgtec.com>
>
> gcc/
>          * lra-constraints.c (curr_insn_transform): Change the type of a reload
>          pseudo to OP_INOUT.
Robert, can you look at reload.c::reload_inner_reg_of_subreg and verify 
that the comment just before its return statement is effectively the 
situation you're in.

There are certainly cases where a SUBREG needs to be treated as an 
in-out operand.  We walked through them eons ago when we were poking at 
SSA for RTL.  But the details have long since faded from memory.

jeff
Robert Suchanek Jan. 15, 2015, 10:13 a.m. UTC | #2
> Robert, can you look at reload.c::reload_inner_reg_of_subreg and verify
> that the comment just before its return statement is effectively the
> situation you're in.
> 
> There are certainly cases where a SUBREG needs to be treated as an
> in-out operand.  We walked through them eons ago when we were poking at
> SSA for RTL.  But the details have long since faded from memory.

The comment pretty much applies to my situation.  The only difference I can
see is that reload would have had hard registers at this point.  In the testcase,
LRA does not have hard registers assigned to the concerned pseudo(s), thus, 
it can't rely on the information in hard_regno_nregs to check if the number of
registers in INNER is different to the number of words in INNER.

Robert
Vladimir Makarov Jan. 15, 2015, 4:13 p.m. UTC | #3
On 14/01/15 02:20 PM, Robert Suchanek wrote:
> Hi Vladimir,
>
> An issue has been identified with LRA when running CPU2006 h264ref benchmark.
>
> I'll try to describe what the issue is and a fix applied as it is very
> difficult to reproduce it and it is next to impossible to create a narrowed
> testcase on top of the source code restrictions.
>
> The concerned LRA code in lra-constraints.c is the following:
>
> if (GET_CODE (*loc) == SUBREG)
>   {
>     reg = SUBREG_REG (*loc);
>     byte = SUBREG_BYTE (*loc);
>     if (REG_P (reg)
>         /* Strict_low_part requires reload the register not
>            the sub-register.  */
>         && (curr_static_id->operand[i].strict_low
>             || (GET_MODE_SIZE (mode)
>                 <= GET_MODE_SIZE (GET_MODE (reg))
>                 && (hard_regno
>                     = get_try_hard_regno (REGNO (reg))) >= 0
>                 && (simplify_subreg_regno
>                     (hard_regno,
>                      GET_MODE (reg), byte, mode) < 0)
>                 && (goal_alt[i] == NO_REGS
>                     || (simplify_subreg_regno
>                         (ira_class_hard_regs[goal_alt[i]][0],
>                          GET_MODE (reg), byte, mode) >= 0))))
>       {
>         loc = &SUBREG_REG (*loc);
>         mode = GET_MODE (*loc);
>       }
>   }
>
> The above works just fine when we deal with strict_low_part or a subreg
> smaller than a word.
>
> However, multi-word operations that were emitted as a sequence of operations
> on word sized parts of the DImode register appears to expose a problem with
> LRA e.g. '(set (subreg: SI (reg: DI)) ...)'.
> LRA does not realize that it actually uses the other halve of the DI-mode
> register leading to a situation where it modifies one halve of the result and
> spills the whole register with the other halve undefined.
>
> In the dump I can see the following:
>
>       Creating newreg=1552 from oldreg=521, assigning class GR_REGS to r1552
>  1487: r1552:DI#4=r1404:SI+r1509:SI
>       REG_DEAD r1509:SI
>       REG_DEAD r1404:SI
>     Inserting insn reload after:
>  1735: r521:DI=r1552:DI
>
> There is nothing in the dump that sets r1552:DI#0 nor a reload is inserted
> to load the value before modifying it but it is spilled.
>
> As it is a multi-word register, the split pass emits an additional instruction
> to load the whole 64-bit value but since one halve was modified, only
> register $20 appears in the live-in set. In contrast to $20, $21 is being used
> but not added to the live-in set.
>
> ...
> ;; live  in      4 [$4] 6 [$6] 7 [$7] 10 [$10] 11 [$11] 12 [$12] 13 [$13]
> [$14] 15 [$15] 16 [$16] 17 [$17] 20 [$20] 22 [$22] 23 [$23] 24 [$24] 25 [$25]
> 29 [$sp] 30 [$fp] 31 [$31] 52 [$f20] 79 [$fakec]
> ...
> (insn 1788 1077 1789 80 (set (reg:SI 20 $20 [orig:521 distortion ] [521])
>     (mem/c:SI (plus:SI (reg/f:SI 29 $sp)
>               (const_int 40 [0x28])) [16 %sfp+40 S4 A64])) rdopt.c:257 288
> {*movsi_internal}
>      (nil))
> (insn 1789 1788 1743 80 (set (reg:SI 21 $21 [ distortion+4 ])
>     (mem/c:SI (plus:SI (reg/f:SI 29 $sp)
>               (const_int 44 [0x2c])) [16 %sfp+44 S4 A32])) rdopt.c:257 288
> {*movsi_internal}
>      (nil))
> ...
>
> The potential fix for this is to promote the type of a subreg OP_OUT to OP_INOUT
> to treat the pseudo register (r1552 in this case) as input and LRA will be forced
> to insert a reload before modifying its contents. 
>
> Handling of strict_low_part case is fine as the operand is described in the MD pattern
> as IN_OUT through modifiers.
>
> With the above change in place, we get a reload before assignment:
>
>       Creating newreg=1552 from oldreg=521, assigning class GR_REGS to r1552
>  1487: r1552:DI#4=r1404:SI+r1509:SI
>       REG_DEAD r1509:SI
>       REG_DEAD r1404:SI
>     Inserting insn reload before:
>  1735: r1552:DI=r521:DI
>     Inserting insn reload after:
>  1736: r521:DI=r1552:DI
>
> and the benchmark happily passes the runtime check.
>
> The question is whether changing the type to OP_INOUT is the correct and
> valid fix?
>
> Regards,
> Robert
>
> 2015-01-14  Robert Suchanek  <robert.suchanek@imgtec.com>
>
> gcc/
>         * lra-constraints.c (curr_insn_transform): Change the type of a reload
>         pseudo to OP_INOUT.
> ---
>  gcc/lra-constraints.c |    1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
> index ec28b7f..018968b 100644
> --- a/gcc/lra-constraints.c
> +++ b/gcc/lra-constraints.c
> @@ -3798,6 +3798,7 @@ curr_insn_transform (void)
>                                   (ira_class_hard_regs[goal_alt[i]][0],
>                                    GET_MODE (reg), byte, mode) >= 0)))))
>                 {
> +                 type = OP_INOUT;
>                   loc = &SUBREG_REG (*loc);
>                   mode = GET_MODE (*loc);
>                 }
>
Robert, thanks for working on the issue.  Your approach is in the right
direction.  But I believe the change should be

if (type == OP_OUT)
  type = OP_INOUT;

Otherwise, I will generate still a right code but an additional insn
which will not go away later.

With this change the patch is ok.  Please, check x86/x86-64 target
bootstrap and GCC testsuite before submitting the patch.  It seems there
will be no problem with the patch on other targets but it is better to
check to be sure.
Jeff Law Jan. 15, 2015, 5:23 p.m. UTC | #4
On 01/15/15 03:13, Robert Suchanek wrote:
>> Robert, can you look at reload.c::reload_inner_reg_of_subreg and verify
>> that the comment just before its return statement is effectively the
>> situation you're in.
>>
>> There are certainly cases where a SUBREG needs to be treated as an
>> in-out operand.  We walked through them eons ago when we were poking at
>> SSA for RTL.  But the details have long since faded from memory.
>
> The comment pretty much applies to my situation.  The only difference I can
> see is that reload would have had hard registers at this point.  In the testcase,
> LRA does not have hard registers assigned to the concerned pseudo(s), thus,
> it can't rely on the information in hard_regno_nregs to check if the number of
> registers in INNER is different to the number of words in INNER.
The differences (hard vs pseudo regs) are primarily an implementation 
detail.  I was really looking to see if there was existing code which 
would turn an output reload into an in-out reload for these subregs.

The in-out nature of certain subregs is something I've personally 
stumbled over in various contexts (for example, this also came up during 
RTL-SSA investigations years ago).

Jeff
Robert Suchanek Jan. 16, 2015, 11:16 a.m. UTC | #5
> The differences (hard vs pseudo regs) are primarily an implementation
> detail.  I was really looking to see if there was existing code which
> would turn an output reload into an in-out reload for these subregs.
> 
> The in-out nature of certain subregs is something I've personally
> stumbled over in various contexts (for example, this also came up during
> RTL-SSA investigations years ago).
> 
> Jeff

Although I haven't seen an output reload changing to in-out reload in 
the testcase after analysing the reload code I'm inclined to say the change
of the out to in-out reload can happen for these subregs in push_reload().

Robert
Matthew Fortune Jan. 16, 2015, 12:33 p.m. UTC | #6
Jeff Law <law@redhat.com> writes:
> On 01/15/15 03:13, Robert Suchanek wrote:
> >> Robert, can you look at reload.c::reload_inner_reg_of_subreg and
> >> verify that the comment just before its return statement is
> >> effectively the situation you're in.
> >>
> >> There are certainly cases where a SUBREG needs to be treated as an
> >> in-out operand.  We walked through them eons ago when we were poking
> >> at SSA for RTL.  But the details have long since faded from memory.
> >
> > The comment pretty much applies to my situation.  The only difference
> > I can see is that reload would have had hard registers at this point.
> > In the testcase, LRA does not have hard registers assigned to the
> > concerned pseudo(s), thus, it can't rely on the information in
> > hard_regno_nregs to check if the number of registers in INNER is
> different to the number of words in INNER.
> The differences (hard vs pseudo regs) are primarily an implementation
> detail.  I was really looking to see if there was existing code which
> would turn an output reload into an in-out reload for these subregs.
> 
> The in-out nature of certain subregs is something I've personally
> stumbled over in various contexts (for example, this also came up during
> RTL-SSA investigations years ago).

Committed as r219730 on Robert's behalf.

Thanks,
Matthew
Richard Sandiford Jan. 24, 2015, 10:53 a.m. UTC | #7
Jeff Law <law@redhat.com> writes:
> On 01/15/15 03:13, Robert Suchanek wrote:
>>> Robert, can you look at reload.c::reload_inner_reg_of_subreg and verify
>>> that the comment just before its return statement is effectively the
>>> situation you're in.
>>>
>>> There are certainly cases where a SUBREG needs to be treated as an
>>> in-out operand.  We walked through them eons ago when we were poking at
>>> SSA for RTL.  But the details have long since faded from memory.
>>
>> The comment pretty much applies to my situation.  The only difference I can
>> see is that reload would have had hard registers at this point.  In
>> the testcase,
>> LRA does not have hard registers assigned to the concerned pseudo(s), thus,
>> it can't rely on the information in hard_regno_nregs to check if the number of
>> registers in INNER is different to the number of words in INNER.

But the code you quote is:

if (GET_CODE (*loc) == SUBREG)
  {
    reg = SUBREG_REG (*loc);
    byte = SUBREG_BYTE (*loc);
    if (REG_P (reg)
        /* Strict_low_part requires reload the register not
           the sub-register.  */
        && (curr_static_id->operand[i].strict_low
            || (GET_MODE_SIZE (mode)
                <= GET_MODE_SIZE (GET_MODE (reg))
                && (hard_regno
                    = get_try_hard_regno (REGNO (reg))) >= 0
                && (simplify_subreg_regno
                    (hard_regno,
                     GET_MODE (reg), byte, mode) < 0)
                && (goal_alt[i] == NO_REGS
                    || (simplify_subreg_regno
                        (ira_class_hard_regs[goal_alt[i]][0],
                         GET_MODE (reg), byte, mode) >= 0))))

Here we do have a hard register, but it isn't valid to form the
subreg on that hard register.  Reload had to cope with that case too.

Since the subreg on the original hard register is invalid, we can't
use it to decide whether the intention was to write to only a part
of the inner register.  But I don't think we need to use the hard
register here.  The original register was a psuedo and I'm pretty
sure...

> The differences (hard vs pseudo regs) are primarily an implementation 
> detail.  I was really looking to see if there was existing code which 
> would turn an output reload into an in-out reload for these subregs.
>
> The in-out nature of certain subregs is something I've personally 
> stumbled over in various contexts (for example, this also came up during 
> RTL-SSA investigations years ago).

...the rule for pseudos is that words of a multiword pseudo can be
accessed independently but subword pieces of an individual word can't.
This obviously isn't ideal if a mode is intended for wider-than-word
registers, since not all words will be independently addressable when
allocated to those registers.  The code above is partly dealing with
the fallout from that.  It's also why we have strict_lowpart for
cases like al on i386.

So IMO the patch is too broad.  I think it should only use INOUT reloads
for !strict_low if the inner mode is wider than a word and the outer mode
is strictly narrower than the inner mode.  That's on top of Vlad's
comment about only converting OP_OUTs, of course.

Thanks,
Richard
Matthew Fortune Jan. 25, 2015, 12:34 p.m. UTC | #8
Richard Sandiford <rdsandiford@googlemail.com> writes:
> Jeff Law <law@redhat.com> writes:
> > On 01/15/15 03:13, Robert Suchanek wrote:
> >>> Robert, can you look at reload.c::reload_inner_reg_of_subreg and
> >>> verify that the comment just before its return statement is
> >>> effectively the situation you're in.
> >>>
> >>> There are certainly cases where a SUBREG needs to be treated as an
> >>> in-out operand.  We walked through them eons ago when we were poking
> >>> at SSA for RTL.  But the details have long since faded from memory.
> >>
> >> The comment pretty much applies to my situation.  The only difference
> >> I can see is that reload would have had hard registers at this point.
> >> In the testcase, LRA does not have hard registers assigned to the
> >> concerned pseudo(s), thus, it can't rely on the information in
> >> hard_regno_nregs to check if the number of registers in INNER is
> >> different to the number of words in INNER.
> 
> But the code you quote is:
> 
> if (GET_CODE (*loc) == SUBREG)
>   {
>     reg = SUBREG_REG (*loc);
>     byte = SUBREG_BYTE (*loc);
>     if (REG_P (reg)
>         /* Strict_low_part requires reload the register not
>            the sub-register.  */
>         && (curr_static_id->operand[i].strict_low
>             || (GET_MODE_SIZE (mode)
>                 <= GET_MODE_SIZE (GET_MODE (reg))
>                 && (hard_regno
>                     = get_try_hard_regno (REGNO (reg))) >= 0
>                 && (simplify_subreg_regno
>                     (hard_regno,
>                      GET_MODE (reg), byte, mode) < 0)
>                 && (goal_alt[i] == NO_REGS
>                     || (simplify_subreg_regno
>                         (ira_class_hard_regs[goal_alt[i]][0],
>                          GET_MODE (reg), byte, mode) >= 0))))
> 
> Here we do have a hard register, but it isn't valid to form the subreg
> on that hard register.  Reload had to cope with that case too.
> 
> Since the subreg on the original hard register is invalid, we can't use
> it to decide whether the intention was to write to only a part of the
> inner register.  But I don't think we need to use the hard register
> here.  The original register was a psuedo and I'm pretty sure...
> 
> > The differences (hard vs pseudo regs) are primarily an implementation
> > detail.  I was really looking to see if there was existing code which
> > would turn an output reload into an in-out reload for these subregs.
> >
> > The in-out nature of certain subregs is something I've personally
> > stumbled over in various contexts (for example, this also came up
> > during RTL-SSA investigations years ago).
> 
> ...the rule for pseudos is that words of a multiword pseudo can be
> accessed independently but subword pieces of an individual word can't.
> This obviously isn't ideal if a mode is intended for wider-than-word
> registers, since not all words will be independently addressable when
> allocated to those registers.  The code above is partly dealing with the
> fallout from that.  It's also why we have strict_lowpart for cases like
> al on i386.
> 
> So IMO the patch is too broad.  I think it should only use INOUT reloads
> for !strict_low if the inner mode is wider than a word and the outer
> mode is strictly narrower than the inner mode.  That's on top of Vlad's
> comment about only converting OP_OUTs, of course.

This sounds correct/sensible. The change as committed will be turning
some OP_OUT reloads into OP_INOUT unnecessarily. Checking for !strict_low
is however probably redundant as strict_low must be OP_INOUT and never
OP_OUT but we could mask backend bugs by converting strict_low subregs.

I think this should be resolved for GCC 5 if others agree it is an issue.

Matthew
Robert Suchanek Jan. 26, 2015, 1:19 p.m. UTC | #9
> > Here we do have a hard register, but it isn't valid to form the subreg
> > on that hard register.  Reload had to cope with that case too.
> >
> > Since the subreg on the original hard register is invalid, we can't use
> > it to decide whether the intention was to write to only a part of the
> > inner register.  But I don't think we need to use the hard register
> > here.  The original register was a psuedo and I'm pretty sure...

Indeed, it is a hard register (64 IIRC) since get_try_hard_regno () must
return >= 0. 
 
> > > The differences (hard vs pseudo regs) are primarily an implementation
> > > detail.  I was really looking to see if there was existing code which
> > > would turn an output reload into an in-out reload for these subregs.
> > >
> > > The in-out nature of certain subregs is something I've personally
> > > stumbled over in various contexts (for example, this also came up
> > > during RTL-SSA investigations years ago).
> >
> > ...the rule for pseudos is that words of a multiword pseudo can be
> > accessed independently but subword pieces of an individual word can't.
> > This obviously isn't ideal if a mode is intended for wider-than-word
> > registers, since not all words will be independently addressable when
> > allocated to those registers.  The code above is partly dealing with the
> > fallout from that.  It's also why we have strict_lowpart for cases like
> > al on i386.
> >
> > So IMO the patch is too broad.  I think it should only use INOUT reloads
> > for !strict_low if the inner mode is wider than a word and the outer
> > mode is strictly narrower than the inner mode.  That's on top of Vlad's
> > comment about only converting OP_OUTs, of course.
> 
> This sounds correct/sensible. The change as committed will be turning
> some OP_OUT reloads into OP_INOUT unnecessarily. Checking for !strict_low
> is however probably redundant as strict_low must be OP_INOUT and never
> OP_OUT but we could mask backend bugs by converting strict_low subregs.
> 
> I think this should be resolved for GCC 5 if others agree it is an issue.
> 
> Matthew

This makes sense since we've got the inner wider than a word and I agree
it'd be better to resolve this for GCC 5.

Robert
diff mbox

Patch

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index ec28b7f..018968b 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -3798,6 +3798,7 @@  curr_insn_transform (void)
                                  (ira_class_hard_regs[goal_alt[i]][0],
                                   GET_MODE (reg), byte, mode) >= 0)))))
                {
+                 type = OP_INOUT;
                  loc = &SUBREG_REG (*loc);
                  mode = GET_MODE (*loc);
                }