diff mbox

[avr] Fix PR 65657 - read from __memx address space tramples arguments to function call

Message ID 20150416064347.GA31491@atmel.com
State New
Headers show

Commit Message

Senthil Kumar Selvaraj April 16, 2015, 6:43 a.m. UTC
This patch fixes PR 65657.

When cfgexpand.c expands a function call, it first figures out the
registers in which the arguments will go, followed by expansion of the
arguments themselves (right to left). It later emits mov insns to set 
the precomputed registers with the expanded RTXes.

If one of the arguments is a __memx char pointer dereference, the mov
eventually expands to gen_xload<mode>_A (for certain cases), which 
clobbers R22, R21 and Z.  This causes problems if one of those 
clobbered registers was precomputed to hold another argument.

In general, call expansion does not appear to take clobbers into account - 
when it checks for argument overlap, the RTX (args[i].value) is only a MEM 
in QImode for the memx deref - the clobber shows up when it eventually 
calls emit_move_insn, at which point, it is too late.

This does not happen for a int pointer dereference - turns out that 
precompute_register_parameters does a copy_to_mode_reg if the
cost of args[i].value is more than COSTS_N_INSNS(1) i.e., it creates a
pseudo and later assigns the pseudo to the arg register. This is done
before any moves to arg registers is done, so other arguments are not
overwritten.

Doing the same thing - providing a better cost estimate for a MEM rtx in
the non-default address space, makes this problem go away, and that is
what this patch does. Regression testing does not show any new failures.

The fix does seem tangential to the core problem - not sure if there is
a better way to fix this?

If not, could someone commit this please? I don't have commit access.

Regards
Senthil

gcc/ChangeLog

2015-04-16  Senthil Kumar Selvaraj  <senthil_kumar.selvaraj@atmel.com>

	* config/avr/avr.c (avr_rtx_costs_1): Increase cost for
	MEM rtx in non-default address space.

gcc/testsuite/ChangeLog

2015-04-16  Senthil Kumar Selvaraj  <senthil_kumar.selvaraj@atmel.com>

	* gcc.target/avr/pr65657.c: New.

Comments

Georg-Johann Lay April 16, 2015, 9:02 a.m. UTC | #1
Am 04/16/2015 um 08:43 AM schrieb Senthil Kumar Selvaraj:
> This patch fixes PR 65657.

The following artifact appears to be PR63633.

> When cfgexpand.c expands a function call, it first figures out the
> registers in which the arguments will go, followed by expansion of the
> arguments themselves (right to left). It later emits mov insns to set
> the precomputed registers with the expanded RTXes.
>
> If one of the arguments is a __memx char pointer dereference, the mov
> eventually expands to gen_xload<mode>_A (for certain cases), which
> clobbers R22, R21 and Z.  This causes problems if one of those
> clobbered registers was precomputed to hold another argument.
>
> In general, call expansion does not appear to take clobbers into account -
> when it checks for argument overlap, the RTX (args[i].value) is only a MEM
> in QImode for the memx deref - the clobber shows up when it eventually
> calls emit_move_insn, at which point, it is too late.
>
> This does not happen for a int pointer dereference - turns out that
> precompute_register_parameters does a copy_to_mode_reg if the
> cost of args[i].value is more than COSTS_N_INSNS(1) i.e., it creates a
> pseudo and later assigns the pseudo to the arg register. This is done
> before any moves to arg registers is done, so other arguments are not
> overwritten.
>
> Doing the same thing - providing a better cost estimate for a MEM rtx in
> the non-default address space, makes this problem go away, and that is
> what this patch does. Regression testing does not show any new failures.
>
> The fix does seem tangential to the core problem - not sure if there is
> a better way to fix this?
>
> If not, could someone commit this please? I don't have commit access.
>
> Regards
> Senthil
>
> gcc/ChangeLog
>
> 2015-04-16  Senthil Kumar Selvaraj  <senthil_kumar.selvaraj@atmel.com>
>
> 	* config/avr/avr.c (avr_rtx_costs_1): Increase cost for
> 	MEM rtx in non-default address space.
>
> gcc/testsuite/ChangeLog
>
> 2015-04-16  Senthil Kumar Selvaraj  <senthil_kumar.selvaraj@atmel.com>
>
> 	* gcc.target/avr/pr65657.c: New.
>
> diff --git gcc/config/avr/avr.c gcc/config/avr/avr.c
> index 68d5ddc..c9b8678 100644
> --- gcc/config/avr/avr.c
> +++ gcc/config/avr/avr.c
> @@ -9959,7 +9959,11 @@ avr_rtx_costs_1 (rtx x, int codearg, int outer_code ATTRIBUTE_UNUSED,
>         return true;
>
>       case MEM:
> -      *total = COSTS_N_INSNS (GET_MODE_SIZE (mode));
> +      /* MEM rtx with non-default address space is more
> +         expensive. Not expressing that results in reg
> +         clobber during expand (PR 65657). */
> +      *total = COSTS_N_INSNS (GET_MODE_SIZE (mode)
> +                  + (MEM_ADDR_SPACE(x) == ADDR_SPACE_RAM ? 0 : 5));
>         return true;

IMO costs should not be used to fix wrong code or ICEs.  Presumably the fix is 
similar to the one used by PR63633.

Johann

>       case NEG:
> diff --git gcc/testsuite/gcc.target/avr/pr65657.c gcc/testsuite/gcc.target/avr/pr65657.c
> new file mode 100644
> index 0000000..d908fe3
> --- /dev/null
> +++ gcc/testsuite/gcc.target/avr/pr65657.c
> @@ -0,0 +1,39 @@
> +/* { dg-do run } */
> +/* { dg-options "-Os" } */
> +
> +/* When cfgexpand expands the call to foo, it
> +   assigns registers R22:R23 to 0xABCD and R24
> +   to *x. Because x is a __memx address space
> +   pointer, the dereference results in an RTL
> +   insn that clobbers R22 among other
> +   registers (xload<mode>_A).
> +
> +   Call expansion does not take this into account
> +   and ends up clobbering R22 set previously to
> +   hold part of the second argument.
> +*/
> +
> +#include <stdlib.h>
> +
> +void __attribute__((noinline))
> +    __attribute__((noclone))
> +foo (char c, volatile unsigned int d)
> +{
> +    if (d != 0xABCD)
> +      abort();
> +    if (c != 'A')
> +        abort();
> +}
> +
> +void __attribute__((noinline))
> +    __attribute__((noclone))
> +readx(const char __memx *x)
> +{
> +    foo (*x, 0xABCD);
> +}
> +
> +const char __memx arr[] = { 'A' };
> +int main()
> +{
> +   readx (arr);
> +}
>
Senthil Kumar Selvaraj April 16, 2015, 9:28 a.m. UTC | #2
On Thu, Apr 16, 2015 at 11:02:05AM +0200, Georg-Johann Lay wrote:
> Am 04/16/2015 um 08:43 AM schrieb Senthil Kumar Selvaraj:
> >This patch fixes PR 65657.
> 
> The following artifact appears to be PR63633.
> 

I did see that one - unfortunately, that fix won't help here. IIUC, you
check if input/output operand hard regs are in the clobber list, 
and if yes, you generate pseudos to save and restore clobbered hard
regs.

In this case, the reg is actually clobbered by a different insn (one
that loads the next argument to the function). So unless I blindly generate pseudos for 
all regs in the clobber list, the clobbering will still happen.

FWIW, I did try saving/restoring all clobbered regs, and it did fix the 
problem - just that it appeared like a (worse) hack to me. Aren't we
manually replicating what RA/reload should be doing?

What do you think?

> >When cfgexpand.c expands a function call, it first figures out the
> >registers in which the arguments will go, followed by expansion of the
> >arguments themselves (right to left). It later emits mov insns to set
> >the precomputed registers with the expanded RTXes.
> >
> >If one of the arguments is a __memx char pointer dereference, the mov
> >eventually expands to gen_xload<mode>_A (for certain cases), which
> >clobbers R22, R21 and Z.  This causes problems if one of those
> >clobbered registers was precomputed to hold another argument.
> >
> >In general, call expansion does not appear to take clobbers into account -
> >when it checks for argument overlap, the RTX (args[i].value) is only a MEM
> >in QImode for the memx deref - the clobber shows up when it eventually
> >calls emit_move_insn, at which point, it is too late.
> >
> >This does not happen for a int pointer dereference - turns out that
> >precompute_register_parameters does a copy_to_mode_reg if the
> >cost of args[i].value is more than COSTS_N_INSNS(1) i.e., it creates a
> >pseudo and later assigns the pseudo to the arg register. This is done
> >before any moves to arg registers is done, so other arguments are not
> >overwritten.
> >
> >Doing the same thing - providing a better cost estimate for a MEM rtx in
> >the non-default address space, makes this problem go away, and that is
> >what this patch does. Regression testing does not show any new failures.
> >
> >The fix does seem tangential to the core problem - not sure if there is
> >a better way to fix this?
> >
> >If not, could someone commit this please? I don't have commit access.
> >
> >Regards
> >Senthil
> >
> >gcc/ChangeLog
> >
> >2015-04-16  Senthil Kumar Selvaraj  <senthil_kumar.selvaraj@atmel.com>
> >
> >	* config/avr/avr.c (avr_rtx_costs_1): Increase cost for
> >	MEM rtx in non-default address space.
> >
> >gcc/testsuite/ChangeLog
> >
> >2015-04-16  Senthil Kumar Selvaraj  <senthil_kumar.selvaraj@atmel.com>
> >
> >	* gcc.target/avr/pr65657.c: New.
> >
> >diff --git gcc/config/avr/avr.c gcc/config/avr/avr.c
> >index 68d5ddc..c9b8678 100644
> >--- gcc/config/avr/avr.c
> >+++ gcc/config/avr/avr.c
> >@@ -9959,7 +9959,11 @@ avr_rtx_costs_1 (rtx x, int codearg, int outer_code ATTRIBUTE_UNUSED,
> >        return true;
> >
> >      case MEM:
> >-      *total = COSTS_N_INSNS (GET_MODE_SIZE (mode));
> >+      /* MEM rtx with non-default address space is more
> >+         expensive. Not expressing that results in reg
> >+         clobber during expand (PR 65657). */
> >+      *total = COSTS_N_INSNS (GET_MODE_SIZE (mode)
> >+                  + (MEM_ADDR_SPACE(x) == ADDR_SPACE_RAM ? 0 : 5));
> >        return true;
> 
> IMO costs should not be used to fix wrong code or ICEs.  Presumably the fix
> is similar to the one used by PR63633.

Yes, fixing it by adjusting costs is fragile, but it appeared cleaner to me.

Regards
Senthil

> 
> Johann
> 
> >      case NEG:
> >diff --git gcc/testsuite/gcc.target/avr/pr65657.c gcc/testsuite/gcc.target/avr/pr65657.c
> >new file mode 100644
> >index 0000000..d908fe3
> >--- /dev/null
> >+++ gcc/testsuite/gcc.target/avr/pr65657.c
> >@@ -0,0 +1,39 @@
> >+/* { dg-do run } */
> >+/* { dg-options "-Os" } */
> >+
> >+/* When cfgexpand expands the call to foo, it
> >+   assigns registers R22:R23 to 0xABCD and R24
> >+   to *x. Because x is a __memx address space
> >+   pointer, the dereference results in an RTL
> >+   insn that clobbers R22 among other
> >+   registers (xload<mode>_A).
> >+
> >+   Call expansion does not take this into account
> >+   and ends up clobbering R22 set previously to
> >+   hold part of the second argument.
> >+*/
> >+
> >+#include <stdlib.h>
> >+
> >+void __attribute__((noinline))
> >+    __attribute__((noclone))
> >+foo (char c, volatile unsigned int d)
> >+{
> >+    if (d != 0xABCD)
> >+      abort();
> >+    if (c != 'A')
> >+        abort();
> >+}
> >+
> >+void __attribute__((noinline))
> >+    __attribute__((noclone))
> >+readx(const char __memx *x)
> >+{
> >+    foo (*x, 0xABCD);
> >+}
> >+
> >+const char __memx arr[] = { 'A' };
> >+int main()
> >+{
> >+   readx (arr);
> >+}
> >
>
Georg-Johann Lay April 16, 2015, 4:47 p.m. UTC | #3
Am 04/16/2015 um 11:28 AM schrieb Senthil Kumar Selvaraj:
> On Thu, Apr 16, 2015 at 11:02:05AM +0200, Georg-Johann Lay wrote:
>> Am 04/16/2015 um 08:43 AM schrieb Senthil Kumar Selvaraj:
>>> This patch fixes PR 65657.
>>
>> The following artifact appears to be PR63633.
>>
>
> I did see that one - unfortunately, that fix won't help here. IIUC, you
> check if input/output operand hard regs are in the clobber list,
> and if yes, you generate pseudos to save and restore clobbered hard
> regs.
>
> In this case, the reg is actually clobbered by a different insn (one

Arrgh, yes...

> that loads the next argument to the function). So unless I blindly generate pseudos for
> all regs in the clobber list, the clobbering will still happen.
>
> FWIW, I did try saving/restoring all clobbered regs, and it did fix the
> problem - just that it appeared like a (worse) hack to me. Aren't we
> manually replicating what RA/reload should be doing?

As it appears, we'll have to do it by hand.  The attaches patch is just a 
sketch that indicates how the problem could be approached.  Notice the new 
assertions in the split expanders; they will throw ICE until the fix is 
actually installed.

The critical insn are generated in movMM expander and are changed to have no 
clobbers (SCRATCHes actually).  An a later pass, when appropriate life info can 
be made available, run yet another avr pass that

1a) Save-restore needed hard regs around the insn.

2a) Kick out hard regs overlapping the clobbers, e.g. in set_src, into new 
pseudos.  Maybe that could happen due to some hard regs progagation, or we can 
use a new predicate similar combine_pseudo_register_operand.

3) Replace scratch -> hard regs for all scratch_operands.

2b) Restore respective hard regs from their pseudos.

1b) Restore respective hard regs from their pseudos.


And maybe we can get better code by allocating things like address register by 
hand and get better code then.

When I implemented some of the libgcc insns I tried to express the operand by 
means of constraints, e.h. for (reg:HI 22) and let register allocator do the job.

The resulting code was functional but *horrific*.

The register allocator is not yet ready to generate efficient code in such 
demanding situations...

>
> What do you think?
>

IMO sooner or later we'll need such an infrastructure; maybe also for non-mov 
insn that are implemented by transparent libcalls like divmod, mul, etc.

>>> When cfgexpand.c expands a function call, it first figures out the
>>> registers in which the arguments will go, followed by expansion of the
>>> arguments themselves (right to left). It later emits mov insns to set
>>> the precomputed registers with the expanded RTXes.
>>>
>>> If one of the arguments is a __memx char pointer dereference, the mov
>>> eventually expands to gen_xload<mode>_A (for certain cases), which
>>> clobbers R22, R21 and Z.  This causes problems if one of those
>>> clobbered registers was precomputed to hold another argument.
>>>
>>> In general, call expansion does not appear to take clobbers into account -

We had been warned that using hard regs is evil...  But without that technique 
the code quality would decrease way too much.

>>> when it checks for argument overlap, the RTX (args[i].value) is only a MEM
>>> in QImode for the memx deref - the clobber shows up when it eventually
>>> calls emit_move_insn, at which point, it is too late.

Such situations could only be handled by a target hook which allowed to expand 
specific trees by hand...  Such a hook could cater for insn that must use hard 
registers.

>>> This does not happen for a int pointer dereference - turns out that
>>> precompute_register_parameters does a copy_to_mode_reg if the
>>> cost of args[i].value is more than COSTS_N_INSNS(1) i.e., it creates a
>>> pseudo and later assigns the pseudo to the arg register. This is done
>>> before any moves to arg registers is done, so other arguments are not
>>> overwritten.
>>>
>>> Doing the same thing - providing a better cost estimate for a MEM rtx in
>>> the non-default address space, makes this problem go away, and that is
>>> what this patch does. Regression testing does not show any new failures.

Can you tell something about overall code quality? If it is not significantly 
worse then I'd propose to apply your rtx-costs solution soon.  The full fix 
will take more time to work it out.


Johann
Denis Chertykov April 16, 2015, 4:54 p.m. UTC | #4
2015-04-16 19:47 GMT+03:00 Georg-Johann Lay <avr@gjlay.de>:
>
> Am 04/16/2015 um 11:28 AM schrieb Senthil Kumar Selvaraj:
>>
>> On Thu, Apr 16, 2015 at 11:02:05AM +0200, Georg-Johann Lay wrote:
>>>
>>> Am 04/16/2015 um 08:43 AM schrieb Senthil Kumar Selvaraj:
>>>>
>>>> This patch fixes PR 65657.
>>>
>>>
>>> The following artifact appears to be PR63633.
>>>
>>
>> I did see that one - unfortunately, that fix won't help here. IIUC, you
>> check if input/output operand hard regs are in the clobber list,
>> and if yes, you generate pseudos to save and restore clobbered hard
>> regs.
>>
>> In this case, the reg is actually clobbered by a different insn (one
>
>
> Arrgh, yes...
>
>> that loads the next argument to the function). So unless I blindly generate pseudos for
>> all regs in the clobber list, the clobbering will still happen.
>>
>> FWIW, I did try saving/restoring all clobbered regs, and it did fix the
>> problem - just that it appeared like a (worse) hack to me. Aren't we
>> manually replicating what RA/reload should be doing?
>
>
> As it appears, we'll have to do it by hand.  The attaches patch is just a sketch that indicates how the problem could be approached.  Notice the new assertions in the split expanders; they will throw ICE until the fix is actually installed.
>
> The critical insn are generated in movMM expander and are changed to have no clobbers (SCRATCHes actually).  An a later pass, when appropriate life info can be made available, run yet another avr pass that
>
> 1a) Save-restore needed hard regs around the insn.
>
> 2a) Kick out hard regs overlapping the clobbers, e.g. in set_src, into new pseudos.  Maybe that could happen due to some hard regs progagation, or we can use a new predicate similar combine_pseudo_register_operand.
>
> 3) Replace scratch -> hard regs for all scratch_operands.
>
> 2b) Restore respective hard regs from their pseudos.
>
> 1b) Restore respective hard regs from their pseudos.
>
>
> And maybe we can get better code by allocating things like address register by hand and get better code then.
>
> When I implemented some of the libgcc insns I tried to express the operand by means of constraints, e.h. for (reg:HI 22) and let register allocator do the job.
>
> The resulting code was functional but *horrific*.
>
> The register allocator is not yet ready to generate efficient code in such demanding situations...
>
>>
>> What do you think?
>>
>
> IMO sooner or later we'll need such an infrastructure; maybe also for non-mov insn that are implemented by transparent libcalls like divmod, mul, etc.
>
>>>> When cfgexpand.c expands a function call, it first figures out the
>>>> registers in which the arguments will go, followed by expansion of the
>>>> arguments themselves (right to left). It later emits mov insns to set
>>>> the precomputed registers with the expanded RTXes.
>>>>
>>>> If one of the arguments is a __memx char pointer dereference, the mov
>>>> eventually expands to gen_xload<mode>_A (for certain cases), which
>>>> clobbers R22, R21 and Z.  This causes problems if one of those
>>>> clobbered registers was precomputed to hold another argument.
>>>>
>>>> In general, call expansion does not appear to take clobbers into account -
>
>
> We had been warned that using hard regs is evil...  But without that technique the code quality would decrease way too much.
>
>>>> when it checks for argument overlap, the RTX (args[i].value) is only a MEM
>>>> in QImode for the memx deref - the clobber shows up when it eventually
>>>> calls emit_move_insn, at which point, it is too late.
>
>
> Such situations could only be handled by a target hook which allowed to expand specific trees by hand...  Such a hook could cater for insn that must use hard registers.
>
>>>> This does not happen for a int pointer dereference - turns out that
>>>> precompute_register_parameters does a copy_to_mode_reg if the
>>>> cost of args[i].value is more than COSTS_N_INSNS(1) i.e., it creates a
>>>> pseudo and later assigns the pseudo to the arg register. This is done
>>>> before any moves to arg registers is done, so other arguments are not
>>>> overwritten.
>>>>
>>>> Doing the same thing - providing a better cost estimate for a MEM rtx in
>>>> the non-default address space, makes this problem go away, and that is
>>>> what this patch does. Regression testing does not show any new failures.
>
>
> Can you tell something about overall code quality? If it is not significantly worse then I'd propose to apply your rtx-costs solution soon.  The full fix will take more time to work it out.

I'm agree with Georg.
Senthil Kumar Selvaraj April 17, 2015, 7:46 a.m. UTC | #5
On Thu, Apr 16, 2015 at 06:47:26PM +0200, Georg-Johann Lay wrote:
> Am 04/16/2015 um 11:28 AM schrieb Senthil Kumar Selvaraj:
> >On Thu, Apr 16, 2015 at 11:02:05AM +0200, Georg-Johann Lay wrote:
> >>Am 04/16/2015 um 08:43 AM schrieb Senthil Kumar Selvaraj:
> >>>This patch fixes PR 65657.
> >>
> >>The following artifact appears to be PR63633.
> >>
> >
> >I did see that one - unfortunately, that fix won't help here. IIUC, you
> >check if input/output operand hard regs are in the clobber list,
> >and if yes, you generate pseudos to save and restore clobbered hard
> >regs.
> >
> >In this case, the reg is actually clobbered by a different insn (one
> 
> Arrgh, yes...
> 
> >that loads the next argument to the function). So unless I blindly generate pseudos for
> >all regs in the clobber list, the clobbering will still happen.
> >
> >FWIW, I did try saving/restoring all clobbered regs, and it did fix the
> >problem - just that it appeared like a (worse) hack to me. Aren't we
> >manually replicating what RA/reload should be doing?
> 
> As it appears, we'll have to do it by hand.  The attaches patch is just a
> sketch that indicates how the problem could be approached.  Notice the new
> assertions in the split expanders; they will throw ICE until the fix is
> actually installed.
> 
> The critical insn are generated in movMM expander and are changed to have no
> clobbers (SCRATCHes actually).  An a later pass, when appropriate life info
> can be made available, run yet another avr pass that
> 
> 1a) Save-restore needed hard regs around the insn.
> 
> 2a) Kick out hard regs overlapping the clobbers, e.g. in set_src, into new
> pseudos.  Maybe that could happen due to some hard regs progagation, or we
> can use a new predicate similar combine_pseudo_register_operand.
> 
> 3) Replace scratch -> hard regs for all scratch_operands.
> 
> 2b) Restore respective hard regs from their pseudos.
> 
> 1b) Restore respective hard regs from their pseudos.
> 
> 
> And maybe we can get better code by allocating things like address register
> by hand and get better code then.
> 
> When I implemented some of the libgcc insns I tried to express the operand
> by means of constraints, e.h. for (reg:HI 22) and let register allocator do
> the job.
> 
> The resulting code was functional but *horrific*.
> 
> The register allocator is not yet ready to generate efficient code in such
> demanding situations...
> 
> >
> >What do you think?
> >
> 
> IMO sooner or later we'll need such an infrastructure; maybe also for
> non-mov insn that are implemented by transparent libcalls like divmod, mul,
> etc.

I'm curious how other embedded targets handle this though - arm, for
example? Surely they should have some libcalls/builtins which need 
specific hard regs? I should check that out.

> 
> >>>When cfgexpand.c expands a function call, it first figures out the
> >>>registers in which the arguments will go, followed by expansion of the
> >>>arguments themselves (right to left). It later emits mov insns to set
> >>>the precomputed registers with the expanded RTXes.
> >>>
> >>>If one of the arguments is a __memx char pointer dereference, the mov
> >>>eventually expands to gen_xload<mode>_A (for certain cases), which
> >>>clobbers R22, R21 and Z.  This causes problems if one of those
> >>>clobbered registers was precomputed to hold another argument.
> >>>
> >>>In general, call expansion does not appear to take clobbers into account -
> 
> We had been warned that using hard regs is evil...  But without that
> technique the code quality would decrease way too much.

Oh ok - do you know some place where this is documented or was
discussed?
> 
> >>>when it checks for argument overlap, the RTX (args[i].value) is only a MEM
> >>>in QImode for the memx deref - the clobber shows up when it eventually
> >>>calls emit_move_insn, at which point, it is too late.
> 
> Such situations could only be handled by a target hook which allowed to
> expand specific trees by hand...  Such a hook could cater for insn that must
> use hard registers.

Yes, I guess so :(
> 
> >>>This does not happen for a int pointer dereference - turns out that
> >>>precompute_register_parameters does a copy_to_mode_reg if the
> >>>cost of args[i].value is more than COSTS_N_INSNS(1) i.e., it creates a
> >>>pseudo and later assigns the pseudo to the arg register. This is done
> >>>before any moves to arg registers is done, so other arguments are not
> >>>overwritten.
> >>>
> >>>Doing the same thing - providing a better cost estimate for a MEM rtx in
> >>>the non-default address space, makes this problem go away, and that is
> >>>what this patch does. Regression testing does not show any new failures.
> 
> Can you tell something about overall code quality? If it is not
> significantly worse then I'd propose to apply your rtx-costs solution soon.
> The full fix will take more time to work it out.
> 

For this piece of code - 

void foo ( char c, unsigned int d);
void readx (const char __memx *x)
{
    foo (*x, 0xABCD);
}

the buggy code (for readx) looks like this

	mov r18,r22
	mov r25,r23
	ldi r22,lo8(-51)
	ldi r23,lo8(-85)
	mov r30,r18
	mov r31,r25
	mov r21,r24
	call __xload_1
	mov r24,r22
	jmp foo

With the patch, the code looks like this

	movw r30,r22
	mov r21,r24
	call __xload_1
	mov r24,r22
	ldi r22,lo8(-51)
	ldi r23,lo8(-85)
	jmp foo

The code turns out to be better in this case, as the loads to r22,r23
get done later.

The dump for cfgexpand is where the change originates - the memx dereference is 
saved in a pseudo before the moves to argument registers start.

Before:

(insn 2 4 3 2 (set (reg/v/f:PSI 43 [ x ])
        (reg:PSI 22 r22 [ x ])) foo.c:4 -1
     (nil))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 7 2 (set (reg:HI 22 r22)
        (const_int -21555 [0xffffffffffffabcd])) foo.c:5 -1
     (nil))
(insn 7 6 8 2 (parallel [
            (set (reg:QI 24 r24)
                (mem:QI (reg/v/f:PSI 43 [ x ]) [0 *x_2(D)+0 S1 A8 AS7]))
            (clobber (reg:QI 22 r22))
            (clobber (reg:QI 21 r21))
            (clobber (reg:HI 30 r30))
        ]) foo.c:5 -1
     (nil))
(call_insn/j 8 7 9 2 (parallel [
            (call (mem:HI (symbol_ref:HI ("foo") [flags 0x41]  <function_decl 0x7f7129589900 foo>) [0 foo S2 A8])
                (const_int 0 [0]))
            (use (const_int 1 [0x1]))
        ]) foo.c:5 -1
     (nil)
    (expr_list:REG_EQUAL (use (reg:QI 24 r24))
        (expr_list:REG_NONNEG (use (reg:HI 22 r22))
            (nil))))
(barrier 9 8 0)


After:

(insn 2 4 3 2 (set (reg/v/f:PSI 43 [ x ])
        (reg:PSI 22 r22 [ x ])) foo.c:4 -1
     (nil))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 7 2 (parallel [
            (set (reg:QI 44)
                (mem:QI (reg/v/f:PSI 43 [ x ]) [0 *x_2(D)+0 S1 A8 AS7]))
            (clobber (reg:QI 22 r22))
            (clobber (reg:QI 21 r21))
            (clobber (reg:HI 30 r30))
        ]) foo.c:5 -1
     (nil))
(insn 7 6 8 2 (set (reg:HI 22 r22)
        (const_int -21555 [0xffffffffffffabcd])) foo.c:5 -1
     (nil))
(insn 8 7 9 2 (set (reg:QI 24 r24)
        (reg:QI 44)) foo.c:5 -1
     (nil))
(call_insn/j 9 8 10 2 (parallel [
            (call (mem:HI (symbol_ref:HI ("foo") [flags 0x41]  <function_decl 0x7fa986a0f900 foo>) [0 foo S2 A8])
                (const_int 0 [0]))
            (use (const_int 1 [0x1]))
        ]) foo.c:5 -1
     (nil)
    (expr_list:REG_EQUAL (use (reg:QI 24 r24))
        (expr_list:REG_NONNEG (use (reg:HI 22 r22))
            (nil))))
(barrier 10 9 0)

Without the patch, and with the arguments
reversed, like this

void foo (  unsigned int d, char c);
void readx (const char __memx *x)
{
    foo (0xABCD, *x);
}

the compiler elides the dereference itself - it just emits

	ldi r24,lo8(-51)
	ldi r25,lo8(-85)
	jmp foo

Scary :). That's because expand has

(insn 6 3 7 2 (parallel [
            (set (reg:QI 22 r22)
                (mem:QI (reg/v/f:PSI 43 [ x ]) [0 *x_2(D)+0 S1 A8 AS7]))
            (clobber (reg:QI 22 r22))
            (clobber (reg:QI 21 r21))
            (clobber (reg:HI 30 r30))
        ]) foo.c:5 -1
     (nil))

but this, I guess, will not happen if the fix for PR63633 was applied.

With my cost patch, I get

	movw r30,r22
	mov r21,r24
	call __xload_1
	ldi r24,lo8(-51)
	ldi r25,lo8(-85)
	jmp foo

For the testcase posted originally for the bug,
<snip>

extern void writeFlashByte(uint8_t byte, uint16_t handle);

void writeBuf(const uint8_t __memx *from, const uint8_t __memx *to)
{
    uint16_t handle = ((uint16_t)(((__uint24)to) & 0xFFFFUL));
    writeFlashByte(*from++, handle++);
}

</snip>

Before:

	mov r27,r22
	mov r26,r23
	mov r21,r24
	mov r24,r20
	movw r22,r18
	mov r30,r27
	mov r31,r26
	call __xload_1
	mov r24,r22
	jmp writeFlashByte

After:

	push r16
	push r17
	movw r16,r18
	mov r18,r20
	movw r30,r22
	mov r21,r24
	call __xload_1
	mov r24,r22
	movw r22,r16
	pop r17
	pop r16
	jmp writeFlashByte

Looks like worse code, but reload had to choose call saved regs (r16 and r17) for zero_extendpsisi's 
output operand, as we don't allow REG_X (or greater) for SI mode values
in avr_hard_regno_mode_ok.

Looks reasonable to me - what do you guys say?

Regards
Senthil
> 
> Johann
>
Denis Chertykov April 17, 2015, 11:33 a.m. UTC | #6
2015-04-17 10:46 GMT+03:00 Senthil Kumar Selvaraj
<senthil_kumar.selvaraj@atmel.com>:
>
> On Thu, Apr 16, 2015 at 06:47:26PM +0200, Georg-Johann Lay wrote:
> > Am 04/16/2015 um 11:28 AM schrieb Senthil Kumar Selvaraj:
> > >On Thu, Apr 16, 2015 at 11:02:05AM +0200, Georg-Johann Lay wrote:
> > >>Am 04/16/2015 um 08:43 AM schrieb Senthil Kumar Selvaraj:
> > >>>This patch fixes PR 65657.
> > >>
> > >>The following artifact appears to be PR63633.
> > >>
> > >
> > >I did see that one - unfortunately, that fix won't help here. IIUC, you
> > >check if input/output operand hard regs are in the clobber list,
> > >and if yes, you generate pseudos to save and restore clobbered hard
> > >regs.
> > >
> > >In this case, the reg is actually clobbered by a different insn (one
> >
> > Arrgh, yes...
> >
> > >that loads the next argument to the function). So unless I blindly generate pseudos for
> > >all regs in the clobber list, the clobbering will still happen.
> > >
> > >FWIW, I did try saving/restoring all clobbered regs, and it did fix the
> > >problem - just that it appeared like a (worse) hack to me. Aren't we
> > >manually replicating what RA/reload should be doing?
> >
> > As it appears, we'll have to do it by hand.  The attaches patch is just a
> > sketch that indicates how the problem could be approached.  Notice the new
> > assertions in the split expanders; they will throw ICE until the fix is
> > actually installed.
> >
> > The critical insn are generated in movMM expander and are changed to have no
> > clobbers (SCRATCHes actually).  An a later pass, when appropriate life info
> > can be made available, run yet another avr pass that
> >
> > 1a) Save-restore needed hard regs around the insn.
> >
> > 2a) Kick out hard regs overlapping the clobbers, e.g. in set_src, into new
> > pseudos.  Maybe that could happen due to some hard regs progagation, or we
> > can use a new predicate similar combine_pseudo_register_operand.
> >
> > 3) Replace scratch -> hard regs for all scratch_operands.
> >
> > 2b) Restore respective hard regs from their pseudos.
> >
> > 1b) Restore respective hard regs from their pseudos.
> >
> >
> > And maybe we can get better code by allocating things like address register
> > by hand and get better code then.
> >
> > When I implemented some of the libgcc insns I tried to express the operand
> > by means of constraints, e.h. for (reg:HI 22) and let register allocator do
> > the job.
> >
> > The resulting code was functional but *horrific*.
> >
> > The register allocator is not yet ready to generate efficient code in such
> > demanding situations...
> >
> > >
> > >What do you think?
> > >
> >
> > IMO sooner or later we'll need such an infrastructure; maybe also for
> > non-mov insn that are implemented by transparent libcalls like divmod, mul,
> > etc.
>
> I'm curious how other embedded targets handle this though - arm, for
> example? Surely they should have some libcalls/builtins which need
> specific hard regs? I should check that out.
>
> >
> > >>>When cfgexpand.c expands a function call, it first figures out the
> > >>>registers in which the arguments will go, followed by expansion of the
> > >>>arguments themselves (right to left). It later emits mov insns to set
> > >>>the precomputed registers with the expanded RTXes.
> > >>>
> > >>>If one of the arguments is a __memx char pointer dereference, the mov
> > >>>eventually expands to gen_xload<mode>_A (for certain cases), which
> > >>>clobbers R22, R21 and Z.  This causes problems if one of those
> > >>>clobbered registers was precomputed to hold another argument.
> > >>>
> > >>>In general, call expansion does not appear to take clobbers into account -
> >
> > We had been warned that using hard regs is evil...  But without that
> > technique the code quality would decrease way too much.
>
> Oh ok - do you know some place where this is documented or was
> discussed?

https://gcc.gnu.org/ml/gcc/2014-10/msg00207.html
Senthil Kumar Selvaraj April 23, 2015, 7:04 a.m. UTC | #7
On Fri, Apr 17, 2015 at 01:16:58PM +0530, Senthil Kumar Selvaraj wrote:
> On Thu, Apr 16, 2015 at 06:47:26PM +0200, Georg-Johann Lay wrote:
> > Am 04/16/2015 um 11:28 AM schrieb Senthil Kumar Selvaraj:
> > >On Thu, Apr 16, 2015 at 11:02:05AM +0200, Georg-Johann Lay wrote:
> > >>Am 04/16/2015 um 08:43 AM schrieb Senthil Kumar Selvaraj:
> > >>>This patch fixes PR 65657.
> > >>
> > >>The following artifact appears to be PR63633.
> > >>
> > >
> > >I did see that one - unfortunately, that fix won't help here. IIUC, you
> > >check if input/output operand hard regs are in the clobber list,
> > >and if yes, you generate pseudos to save and restore clobbered hard
> > >regs.
> > >
> > >In this case, the reg is actually clobbered by a different insn (one
> > 
> > Arrgh, yes...
> > 
> > >that loads the next argument to the function). So unless I blindly generate pseudos for
> > >all regs in the clobber list, the clobbering will still happen.
> > >
> > >FWIW, I did try saving/restoring all clobbered regs, and it did fix the
> > >problem - just that it appeared like a (worse) hack to me. Aren't we
> > >manually replicating what RA/reload should be doing?
> > 
> > As it appears, we'll have to do it by hand.  The attaches patch is just a
> > sketch that indicates how the problem could be approached.  Notice the new
> > assertions in the split expanders; they will throw ICE until the fix is
> > actually installed.
> > 
> > The critical insn are generated in movMM expander and are changed to have no
> > clobbers (SCRATCHes actually).  An a later pass, when appropriate life info
> > can be made available, run yet another avr pass that
> > 
> > 1a) Save-restore needed hard regs around the insn.
> > 
> > 2a) Kick out hard regs overlapping the clobbers, e.g. in set_src, into new
> > pseudos.  Maybe that could happen due to some hard regs progagation, or we
> > can use a new predicate similar combine_pseudo_register_operand.
> > 
> > 3) Replace scratch -> hard regs for all scratch_operands.
> > 
> > 2b) Restore respective hard regs from their pseudos.
> > 
> > 1b) Restore respective hard regs from their pseudos.
> > 
> > 
> > And maybe we can get better code by allocating things like address register
> > by hand and get better code then.
> > 
> > When I implemented some of the libgcc insns I tried to express the operand
> > by means of constraints, e.h. for (reg:HI 22) and let register allocator do
> > the job.
> > 
> > The resulting code was functional but *horrific*.
> > 
> > The register allocator is not yet ready to generate efficient code in such
> > demanding situations...
> > 
> > >
> > >What do you think?
> > >
> > 
> > IMO sooner or later we'll need such an infrastructure; maybe also for
> > non-mov insn that are implemented by transparent libcalls like divmod, mul,
> > etc.
> 
> I'm curious how other embedded targets handle this though - arm, for
> example? Surely they should have some libcalls/builtins which need 
> specific hard regs? I should check that out.
> 
> > 
> > >>>When cfgexpand.c expands a function call, it first figures out the
> > >>>registers in which the arguments will go, followed by expansion of the
> > >>>arguments themselves (right to left). It later emits mov insns to set
> > >>>the precomputed registers with the expanded RTXes.
> > >>>
> > >>>If one of the arguments is a __memx char pointer dereference, the mov
> > >>>eventually expands to gen_xload<mode>_A (for certain cases), which
> > >>>clobbers R22, R21 and Z.  This causes problems if one of those
> > >>>clobbered registers was precomputed to hold another argument.
> > >>>
> > >>>In general, call expansion does not appear to take clobbers into account -
> > 
> > We had been warned that using hard regs is evil...  But without that
> > technique the code quality would decrease way too much.
> 
> Oh ok - do you know some place where this is documented or was
> discussed?
> > 
> > >>>when it checks for argument overlap, the RTX (args[i].value) is only a MEM
> > >>>in QImode for the memx deref - the clobber shows up when it eventually
> > >>>calls emit_move_insn, at which point, it is too late.
> > 
> > Such situations could only be handled by a target hook which allowed to
> > expand specific trees by hand...  Such a hook could cater for insn that must
> > use hard registers.
> 
> Yes, I guess so :(
> > 
> > >>>This does not happen for a int pointer dereference - turns out that
> > >>>precompute_register_parameters does a copy_to_mode_reg if the
> > >>>cost of args[i].value is more than COSTS_N_INSNS(1) i.e., it creates a
> > >>>pseudo and later assigns the pseudo to the arg register. This is done
> > >>>before any moves to arg registers is done, so other arguments are not
> > >>>overwritten.
> > >>>
> > >>>Doing the same thing - providing a better cost estimate for a MEM rtx in
> > >>>the non-default address space, makes this problem go away, and that is
> > >>>what this patch does. Regression testing does not show any new failures.
> > 
> > Can you tell something about overall code quality? If it is not
> > significantly worse then I'd propose to apply your rtx-costs solution soon.
> > The full fix will take more time to work it out.
> > 
> 
> For this piece of code - 
> 
> void foo ( char c, unsigned int d);
> void readx (const char __memx *x)
> {
>     foo (*x, 0xABCD);
> }
> 
> the buggy code (for readx) looks like this
> 
> 	mov r18,r22
> 	mov r25,r23
> 	ldi r22,lo8(-51)
> 	ldi r23,lo8(-85)
> 	mov r30,r18
> 	mov r31,r25
> 	mov r21,r24
> 	call __xload_1
> 	mov r24,r22
> 	jmp foo
> 
> With the patch, the code looks like this
> 
> 	movw r30,r22
> 	mov r21,r24
> 	call __xload_1
> 	mov r24,r22
> 	ldi r22,lo8(-51)
> 	ldi r23,lo8(-85)
> 	jmp foo
> 
> The code turns out to be better in this case, as the loads to r22,r23
> get done later.
> 
> The dump for cfgexpand is where the change originates - the memx dereference is 
> saved in a pseudo before the moves to argument registers start.
> 
> Before:
> 
> (insn 2 4 3 2 (set (reg/v/f:PSI 43 [ x ])
>         (reg:PSI 22 r22 [ x ])) foo.c:4 -1
>      (nil))
> (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
> (insn 6 3 7 2 (set (reg:HI 22 r22)
>         (const_int -21555 [0xffffffffffffabcd])) foo.c:5 -1
>      (nil))
> (insn 7 6 8 2 (parallel [
>             (set (reg:QI 24 r24)
>                 (mem:QI (reg/v/f:PSI 43 [ x ]) [0 *x_2(D)+0 S1 A8 AS7]))
>             (clobber (reg:QI 22 r22))
>             (clobber (reg:QI 21 r21))
>             (clobber (reg:HI 30 r30))
>         ]) foo.c:5 -1
>      (nil))
> (call_insn/j 8 7 9 2 (parallel [
>             (call (mem:HI (symbol_ref:HI ("foo") [flags 0x41]  <function_decl 0x7f7129589900 foo>) [0 foo S2 A8])
>                 (const_int 0 [0]))
>             (use (const_int 1 [0x1]))
>         ]) foo.c:5 -1
>      (nil)
>     (expr_list:REG_EQUAL (use (reg:QI 24 r24))
>         (expr_list:REG_NONNEG (use (reg:HI 22 r22))
>             (nil))))
> (barrier 9 8 0)
> 
> 
> After:
> 
> (insn 2 4 3 2 (set (reg/v/f:PSI 43 [ x ])
>         (reg:PSI 22 r22 [ x ])) foo.c:4 -1
>      (nil))
> (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
> (insn 6 3 7 2 (parallel [
>             (set (reg:QI 44)
>                 (mem:QI (reg/v/f:PSI 43 [ x ]) [0 *x_2(D)+0 S1 A8 AS7]))
>             (clobber (reg:QI 22 r22))
>             (clobber (reg:QI 21 r21))
>             (clobber (reg:HI 30 r30))
>         ]) foo.c:5 -1
>      (nil))
> (insn 7 6 8 2 (set (reg:HI 22 r22)
>         (const_int -21555 [0xffffffffffffabcd])) foo.c:5 -1
>      (nil))
> (insn 8 7 9 2 (set (reg:QI 24 r24)
>         (reg:QI 44)) foo.c:5 -1
>      (nil))
> (call_insn/j 9 8 10 2 (parallel [
>             (call (mem:HI (symbol_ref:HI ("foo") [flags 0x41]  <function_decl 0x7fa986a0f900 foo>) [0 foo S2 A8])
>                 (const_int 0 [0]))
>             (use (const_int 1 [0x1]))
>         ]) foo.c:5 -1
>      (nil)
>     (expr_list:REG_EQUAL (use (reg:QI 24 r24))
>         (expr_list:REG_NONNEG (use (reg:HI 22 r22))
>             (nil))))
> (barrier 10 9 0)
> 
> Without the patch, and with the arguments
> reversed, like this
> 
> void foo (  unsigned int d, char c);
> void readx (const char __memx *x)
> {
>     foo (0xABCD, *x);
> }
> 
> the compiler elides the dereference itself - it just emits
> 
> 	ldi r24,lo8(-51)
> 	ldi r25,lo8(-85)
> 	jmp foo
> 
> Scary :). That's because expand has
> 
> (insn 6 3 7 2 (parallel [
>             (set (reg:QI 22 r22)
>                 (mem:QI (reg/v/f:PSI 43 [ x ]) [0 *x_2(D)+0 S1 A8 AS7]))
>             (clobber (reg:QI 22 r22))
>             (clobber (reg:QI 21 r21))
>             (clobber (reg:HI 30 r30))
>         ]) foo.c:5 -1
>      (nil))
> 
> but this, I guess, will not happen if the fix for PR63633 was applied.
> 
> With my cost patch, I get
> 
> 	movw r30,r22
> 	mov r21,r24
> 	call __xload_1
> 	ldi r24,lo8(-51)
> 	ldi r25,lo8(-85)
> 	jmp foo
> 
> For the testcase posted originally for the bug,
> <snip>
> 
> extern void writeFlashByte(uint8_t byte, uint16_t handle);
> 
> void writeBuf(const uint8_t __memx *from, const uint8_t __memx *to)
> {
>     uint16_t handle = ((uint16_t)(((__uint24)to) & 0xFFFFUL));
>     writeFlashByte(*from++, handle++);
> }
> 
> </snip>
> 
> Before:
> 
> 	mov r27,r22
> 	mov r26,r23
> 	mov r21,r24
> 	mov r24,r20
> 	movw r22,r18
> 	mov r30,r27
> 	mov r31,r26
> 	call __xload_1
> 	mov r24,r22
> 	jmp writeFlashByte
> 
> After:
> 
> 	push r16
> 	push r17
> 	movw r16,r18
> 	mov r18,r20
> 	movw r30,r22
> 	mov r21,r24
> 	call __xload_1
> 	mov r24,r22
> 	movw r22,r16
> 	pop r17
> 	pop r16
> 	jmp writeFlashByte
> 
> Looks like worse code, but reload had to choose call saved regs (r16 and r17) for zero_extendpsisi's 
> output operand, as we don't allow REG_X (or greater) for SI mode values
> in avr_hard_regno_mode_ok.
> 
> Looks reasonable to me - what do you guys say?

I realize Johann is working on eliminating hard regs usage in favor of register
classes/constraints instead, but I think this patch that sets the correct 
cost for MEM rtx in non-default address space does nothing "wrong", even
if it is only a tangential fix to the problem.

What do you guys say?

> 
> Regards
> Senthil
> > 
> > Johann
> >
diff mbox

Patch

diff --git gcc/config/avr/avr.c gcc/config/avr/avr.c
index 68d5ddc..c9b8678 100644
--- gcc/config/avr/avr.c
+++ gcc/config/avr/avr.c
@@ -9959,7 +9959,11 @@  avr_rtx_costs_1 (rtx x, int codearg, int outer_code ATTRIBUTE_UNUSED,
       return true;
 
     case MEM:
-      *total = COSTS_N_INSNS (GET_MODE_SIZE (mode));
+      /* MEM rtx with non-default address space is more
+         expensive. Not expressing that results in reg
+         clobber during expand (PR 65657). */
+      *total = COSTS_N_INSNS (GET_MODE_SIZE (mode)
+                  + (MEM_ADDR_SPACE(x) == ADDR_SPACE_RAM ? 0 : 5));
       return true;
 
     case NEG:
diff --git gcc/testsuite/gcc.target/avr/pr65657.c gcc/testsuite/gcc.target/avr/pr65657.c
new file mode 100644
index 0000000..d908fe3
--- /dev/null
+++ gcc/testsuite/gcc.target/avr/pr65657.c
@@ -0,0 +1,39 @@ 
+/* { dg-do run } */
+/* { dg-options "-Os" } */
+
+/* When cfgexpand expands the call to foo, it
+   assigns registers R22:R23 to 0xABCD and R24
+   to *x. Because x is a __memx address space
+   pointer, the dereference results in an RTL
+   insn that clobbers R22 among other 
+   registers (xload<mode>_A).
+
+   Call expansion does not take this into account
+   and ends up clobbering R22 set previously to 
+   hold part of the second argument.
+*/
+
+#include <stdlib.h>
+
+void __attribute__((noinline))
+    __attribute__((noclone))
+foo (char c, volatile unsigned int d)
+{
+    if (d != 0xABCD)
+      abort();
+    if (c != 'A')
+        abort();
+}
+
+void __attribute__((noinline))
+    __attribute__((noclone))
+readx(const char __memx *x)
+{
+    foo (*x, 0xABCD);
+}
+
+const char __memx arr[] = { 'A' };
+int main()
+{
+   readx (arr); 
+}