
[AArch64] Accept more addressing modes for PRFM

Message ID 5894BA14.4090004@foss.arm.com
State New

Commit Message

Kyrill Tkachov Feb. 3, 2017, 5:12 p.m. UTC
Hi all,

While evaluating Maxim's SW prefetch patches [1] I noticed that the aarch64 prefetch pattern is
overly restrictive in its address operand. It only accepts simple register addressing modes.
In fact, the PRFM instruction accepts almost all modes that a normal 64-bit LDR supports.
The restriction in the pattern leads to explicit address-calculation code being emitted, which we could avoid.

This patch relaxes the restrictions on the prefetch define_insn. It creates a predicate and constraint that
allow the full set of addressing modes that PRFM accepts. Thus for the testcase in the patch (adapted from one of the existing
__builtin_prefetch tests in the testsuite) we can generate:
prfm    PLDL1STRM, [x1, 8]

instead of the current
prfm    PLDL1STRM, [x1]
with an explicit increment of x1 by 8 in a separate instruction.
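
For reference, this is the kernel of the new test (taken verbatim from the testcase added in the patch below); the "+ 2" on an int * is an 8-byte offset, which previously needed a separate address-calculation instruction and now folds into the PRFM addressing mode:

    #define ARRSIZE 65
    int *bad_addr[ARRSIZE];

    void
    prefetch_for_read (void)
    {
      int i;
      for (i = 0; i < ARRSIZE; i++)
        __builtin_prefetch (bad_addr[i] + 2, 0, 0);
    }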

I've removed the %a output modifier from the output template and wrapped the address operand in a DImode MEM before
passing it down to aarch64_print_operand.

This is because operand 0 is an address operand rather than a memory operand and thus doesn't have a mode associated
with it.  When processing the 'a' output modifier the code in final.c will call TARGET_PRINT_OPERAND_ADDRESS with a VOIDmode
argument.  This will ICE on aarch64 because we need a mode for the memory in order for aarch64_classify_address to work
correctly.  Rather than overriding the VOIDmode in aarch64_print_operand_address I decided to instead create the DImode
MEM in the "prefetch" output template and treat it as a normal 64-bit memory address, which at the point of assembly output
is what it is anyway.
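
Concretely, the only output-time change is the wrap itself; this is the fragment from the aarch64.md hunk in the patch, reproduced here for context:

    /* PRFM accepts the same addresses as a 64-bit LDR so wrap
       the address into a DImode MEM so that aarch64_print_operand knows
       how to print it.  */
    operands[0] = gen_rtx_MEM (DImode, operands[0]);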

With this patch I see a reduction in instruction count in the SPEC2006 benchmarks when SW prefetching is enabled on top
of Maxim's patchset, because fewer address-calculation instructions are emitted thanks to the more expressive
addressing modes. It also fixes a performance regression that I observed in 410.bwaves from Maxim's patches on Cortex-A72.
I'll be running a full set of benchmarks to evaluate this further, but I think this is the right thing to do.

Bootstrapped and tested on aarch64-none-linux-gnu.

Maxim, do you want to try this on top of your patches on your hardware to see if it helps with the regressions you mentioned?

Thanks,
Kyrill


[1] https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02284.html

2017-02-03  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * config/aarch64/aarch64.md (prefetch): Adjust predicate and
     constraint on operand 0 to allow more general addressing modes.
     Adjust output template.
     * config/aarch64/aarch64.c (aarch64_address_valid_for_prefetch_p):
     New function.
     * config/aarch64/aarch64-protos.h
     (aarch64_address_valid_for_prefetch_p): Declare prototype.
     * config/aarch64/constraints.md (Dp): New address constraint.
     * config/aarch64/predicates.md (aarch64_prefetch_operand): New
     predicate.

2017-02-03  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gcc.target/aarch64/prfm_imm_offset_1.c: New test.

Comments

Maxim Kuvyrkov Feb. 4, 2017, 8:19 a.m. UTC | #1
> On Feb 3, 2017, at 8:12 PM, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote:
> 
> Hi all,
> 
> While evaluating Maxim's SW prefetch patches [1] I noticed that the aarch64 prefetch pattern is
> overly restrictive in its address operand. It only accepts simple register addressing modes.
> In fact, the PRFM instruction accepts almost all modes that a normal 64-bit LDR supports.
> The restriction in the pattern leads to explicit address calculation code to be emitted which we could avoid.

Thanks for this fix, I'll test it on my hardware.

I've reviewed your patch and it looks OK to me.

> 
> This patch relaxes the restrictions on the prefetch define_insn. It creates a predicate and constraint that
> allow the full addressing modes that PRFM allows. Thus for the testcase in the patch (adapted from one of the existing
> __builtin_prefetch tests in the testsuite) we can generate a:
> prfm    PLDL1STRM, [x1, 8]
> 
> instead of the current
> prfm    PLDL1STRM, [x1]
> with an explicit increment of x1 by 8 in a separate instruction.
> 
> I've removed the %a output modifier in the output template and wrapped the address operand into a DImode MEM before
> passing it down to aarch64_print_operand.
> 
> This is because operand 0 is an address operand rather than a memory operand and thus doesn't have a mode associated
> with it.  When processing the 'a' output modifier the code in final.c will call TARGET_PRINT_OPERAND_ADDRESS with a VOIDmode
> argument.  This will ICE on aarch64 because we need a mode for the memory in order for aarch64_classify_address to work
> correctly.  Rather than overriding the VOIDmode in aarch64_print_operand_address I decided to instead create the DImode
> MEM in the "prefetch" output template and treat it as a normal 64-bit memory address, which at the point of assembly output
> is what it is anyway.

I agree that it is cleaner to convert the prefetch operand to a DImode MEM just before printing it out to assembly.  There is little to be gained from relaxing the asserts in aarch64_print_operand_address.

> 
> With this patch I see a reduction in instruction count in the SPEC2006 benchmarks when SW prefetching is enabled on top
> of Maxim's patchset because fewer address calculation instructions are emitted due to the use of the more expressive
> addressing modes. It also fixes a performance regression that I observed in 410.bwaves from Maxim's patches on Cortex-A72.
> I'll be running a full set of benchmarks to evaluate this further, but I think this is the right thing to do.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> 
> Maxim, do you want to try this on top of your patches on your hardware to see if it helps with the regressions you mentioned?

Sure.

--
Maxim Kuvyrkov
www.linaro.org
Richard Earnshaw (lists) Feb. 15, 2017, 3 p.m. UTC | #2
On 03/02/17 17:12, Kyrill Tkachov wrote:
> Hi all,
> 
> While evaluating Maxim's SW prefetch patches [1] I noticed that the
> aarch64 prefetch pattern is
> overly restrictive in its address operand. It only accepts simple
> register addressing modes.
> In fact, the PRFM instruction accepts almost all modes that a normal
> 64-bit LDR supports.
> The restriction in the pattern leads to explicit address calculation
> code to be emitted which we could avoid.
> 
> This patch relaxes the restrictions on the prefetch define_insn. It
> creates a predicate and constraint that
> allow the full addressing modes that PRFM allows. Thus for the testcase
> in the patch (adapted from one of the existing
> __builtin_prefetch tests in the testsuite) we can generate a:
> prfm    PLDL1STRM, [x1, 8]
> 
> instead of the current
> prfm    PLDL1STRM, [x1]
> with an explicit increment of x1 by 8 in a separate instruction.
> 
> I've removed the %a output modifier in the output template and wrapped
> the address operand into a DImode MEM before
> passing it down to aarch64_print_operand.
> 
> This is because operand 0 is an address operand rather than a memory
> operand and thus doesn't have a mode associated
> with it.  When processing the 'a' output modifier the code in final.c
> will call TARGET_PRINT_OPERAND_ADDRESS with a VOIDmode
> argument.  This will ICE on aarch64 because we need a mode for the
> memory in order for aarch64_classify_address to work
> correctly.  Rather than overriding the VOIDmode in
> aarch64_print_operand_address I decided to instead create the DImode
> MEM in the "prefetch" output template and treat it as a normal 64-bit
> memory address, which at the point of assembly output
> is what it is anyway.
> 
> With this patch I see a reduction in instruction count in the SPEC2006
> benchmarks when SW prefetching is enabled on top
> of Maxim's patchset because fewer address calculation instructions are
> emitted due to the use of the more expressive
> addressing modes. It also fixes a performance regression that I observed
> in 410.bwaves from Maxim's patches on Cortex-A72.
> I'll be running a full set of benchmarks to evaluate this further, but I
> think this is the right thing to do.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> 
> Maxim, do you want to try this on top of your patches on your hardware
> to see if it helps with the regressions you mentioned?
> 
> Thanks,
> Kyrill
> 
> 
> [1] https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02284.html
> 
> 2016-02-03  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 
>     * config/aarch64/aarch64.md (prefetch); Adjust predicate and
>     constraint on operand 0 to allow more general addressing modes.
>     Adjust output template.
>     * config/aarch64/aarch64.c (aarch64_address_valid_for_prefetch_p):
>     New function.
>     * config/aarch64/aarch64-protos.h
>     (aarch64_address_valid_for_prefetch_p): Declare prototype.
>     * config/aarch64/constraints.md (Dp): New address constraint.
>     * config/aarch64/predicates.md (aarch64_prefetch_operand): New
>     predicate.
> 
> 2016-02-03  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 
>     * gcc.target/aarch64/prfm_imm_offset_1.c: New test.
> 
> aarch64-prfm-imm.patch
> 

Hmm, I'm not sure about this.  rtl.texi says that a prefetch code
contains an address, not a MEM.  So it's theoretically possible for
generic code to want to look inside the first operand and find an
address directly.  This change would break that assumption.
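
As a rough illustration of the worry (my sketch, not code from the patch or from GCC itself): target-independent code following rtl.texi could reasonably do something like the following and would then find a MEM where it expects a bare address.  note_prefetched_address is a made-up consumer.

    if (GET_CODE (x) == PREFETCH)
      {
        /* rtl.texi documents operand 0 of PREFETCH as an address, not a MEM.  */
        rtx addr = XEXP (x, 0);
        note_prefetched_address (addr);  /* hypothetical consumer */
      }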

R.

> 
> commit a324e2f2ea243fe42f23a026ecbe1435876e2c8b
> Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
> Date:   Thu Feb 2 14:46:11 2017 +0000
> 
>     [AArch64] Accept more addressing modes for PRFM
> 
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index babc327..61706de 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -300,6 +300,7 @@ extern struct tune_params aarch64_tune_params;
>  
>  HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
>  int aarch64_get_condition_code (rtx);
> +bool aarch64_address_valid_for_prefetch_p (rtx, bool);
>  bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
>  unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT val_in);
>  unsigned HOST_WIDE_INT aarch64_and_split_imm2 (HOST_WIDE_INT val_in);
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index acc093a..c05eff3 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -4549,6 +4549,24 @@ aarch64_classify_address (struct aarch64_address_info *info,
>      }
>  }
>  
> +/* Return true if the address X is valid for a PRFM instruction.
> +   STRICT_P is true if we should do strict checking with
> +   aarch64_classify_address.  */
> +
> +bool
> +aarch64_address_valid_for_prefetch_p (rtx x, bool strict_p)
> +{
> +  struct aarch64_address_info addr;
> +
> +  /* PRFM accepts the same addresses as DImode...  */
> +  bool res = aarch64_classify_address (&addr, x, DImode, MEM, strict_p);
> +  if (!res)
> +    return false;
> +
> +  /* ... except writeback forms.  */
> +  return addr.type != ADDRESS_REG_WB;
> +}
> +
>  bool
>  aarch64_symbolic_address_p (rtx x)
>  {
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index b9e8edf..c6201a5 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -518,27 +518,31 @@ (define_insn "nop"
>  )
>  
>  (define_insn "prefetch"
> -  [(prefetch (match_operand:DI 0 "register_operand" "r")
> +  [(prefetch (match_operand:DI 0 "aarch64_prefetch_operand" "Dp")
>              (match_operand:QI 1 "const_int_operand" "")
>              (match_operand:QI 2 "const_int_operand" ""))]
>    ""
>    {
> -    const char * pftype[2][4] = 
> +    const char * pftype[2][4] =
>      {
> -      {"prfm\\tPLDL1STRM, %a0",
> -       "prfm\\tPLDL3KEEP, %a0",
> -       "prfm\\tPLDL2KEEP, %a0",
> -       "prfm\\tPLDL1KEEP, %a0"},
> -      {"prfm\\tPSTL1STRM, %a0",
> -       "prfm\\tPSTL3KEEP, %a0",
> -       "prfm\\tPSTL2KEEP, %a0",
> -       "prfm\\tPSTL1KEEP, %a0"},
> +      {"prfm\\tPLDL1STRM, %0",
> +       "prfm\\tPLDL3KEEP, %0",
> +       "prfm\\tPLDL2KEEP, %0",
> +       "prfm\\tPLDL1KEEP, %0"},
> +      {"prfm\\tPSTL1STRM, %0",
> +       "prfm\\tPSTL3KEEP, %0",
> +       "prfm\\tPSTL2KEEP, %0",
> +       "prfm\\tPSTL1KEEP, %0"},
>      };
>  
>      int locality = INTVAL (operands[2]);
>  
>      gcc_assert (IN_RANGE (locality, 0, 3));
>  
> +    /* PRFM accepts the same addresses as a 64-bit LDR so wrap
> +       the address into a DImode MEM so that aarch64_print_operand knows
> +       how to print it.  */
> +    operands[0] = gen_rtx_MEM (DImode, operands[0]);
>      return pftype[INTVAL(operands[1])][locality];
>    }
>    [(set_attr "type" "load1")]
> diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
> index 5a252c0..b829337 100644
> --- a/gcc/config/aarch64/constraints.md
> +++ b/gcc/config/aarch64/constraints.md
> @@ -214,3 +214,8 @@ (define_constraint "Dd"
>   A constraint that matches an immediate operand valid for AdvSIMD scalar."
>   (and (match_code "const_int")
>        (match_test "aarch64_simd_imm_scalar_p (op, GET_MODE (op))")))
> +
> +(define_address_constraint "Dp"
> +  "@internal
> + An address valid for a prefetch instruction."
> + (match_test "aarch64_address_valid_for_prefetch_p (op, true)"))
> diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
> index e83d45b..8e3ea9b 100644
> --- a/gcc/config/aarch64/predicates.md
> +++ b/gcc/config/aarch64/predicates.md
> @@ -165,6 +165,9 @@ (define_predicate "aarch64_mem_pair_operand"
>         (match_test "aarch64_legitimate_address_p (mode, XEXP (op, 0), PARALLEL,
>  					       0)")))
>  
> +(define_predicate "aarch64_prefetch_operand"
> +  (match_test "aarch64_address_valid_for_prefetch_p (op, false)"))
> +
>  (define_predicate "aarch64_valid_symref"
>    (match_code "const, symbol_ref, label_ref")
>  {
> diff --git a/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c b/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c
> new file mode 100644
> index 0000000..26ab913
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +/* Check that we can generate the immediate-offset addressing
> +   mode for PRFM.  */
> +
> +#define ARRSIZE 65
> +int *bad_addr[ARRSIZE];
> +
> +void
> +prefetch_for_read (void)
> +{
> +  int i;
> +  for (i = 0; i < ARRSIZE; i++)
> +    __builtin_prefetch (bad_addr[i] + 2, 0, 0);
> +}
> +
> +/* { dg-final { scan-assembler-times "prfm.*\\\[x\[0-9\]+, 8\\\]" 1 } } */
>
Kyrill Tkachov Feb. 15, 2017, 3:03 p.m. UTC | #3
Hi Richard,

On 15/02/17 15:00, Richard Earnshaw (lists) wrote:
> On 03/02/17 17:12, Kyrill Tkachov wrote:
>> Hi all,
>>
>> While evaluating Maxim's SW prefetch patches [1] I noticed that the
>> aarch64 prefetch pattern is
>> overly restrictive in its address operand. It only accepts simple
>> register addressing modes.
>> In fact, the PRFM instruction accepts almost all modes that a normal
>> 64-bit LDR supports.
>> The restriction in the pattern leads to explicit address calculation
>> code to be emitted which we could avoid.
>>
>> This patch relaxes the restrictions on the prefetch define_insn. It
>> creates a predicate and constraint that
>> allow the full addressing modes that PRFM allows. Thus for the testcase
>> in the patch (adapted from one of the existing
>> __builtin_prefetch tests in the testsuite) we can generate a:
>> prfm    PLDL1STRM, [x1, 8]
>>
>> instead of the current
>> prfm    PLDL1STRM, [x1]
>> with an explicit increment of x1 by 8 in a separate instruction.
>>
>> I've removed the %a output modifier in the output template and wrapped
>> the address operand into a DImode MEM before
>> passing it down to aarch64_print_operand.
>>
>> This is because operand 0 is an address operand rather than a memory
>> operand and thus doesn't have a mode associated
>> with it.  When processing the 'a' output modifier the code in final.c
>> will call TARGET_PRINT_OPERAND_ADDRESS with a VOIDmode
>> argument.  This will ICE on aarch64 because we need a mode for the
>> memory in order for aarch64_classify_address to work
>> correctly.  Rather than overriding the VOIDmode in
>> aarch64_print_operand_address I decided to instead create the DImode
>> MEM in the "prefetch" output template and treat it as a normal 64-bit
>> memory address, which at the point of assembly output
>> is what it is anyway.
>>
>> With this patch I see a reduction in instruction count in the SPEC2006
>> benchmarks when SW prefetching is enabled on top
>> of Maxim's patchset because fewer address calculation instructions are
>> emitted due to the use of the more expressive
>> addressing modes. It also fixes a performance regression that I observed
>> in 410.bwaves from Maxim's patches on Cortex-A72.
>> I'll be running a full set of benchmarks to evaluate this further, but I
>> think this is the right thing to do.
>>
>> Bootstrapped and tested on aarch64-none-linux-gnu.
>>
>> Maxim, do you want to try this on top of your patches on your hardware
>> to see if it helps with the regressions you mentioned?
>>
>> Thanks,
>> Kyrill
>>
>>
>> [1] https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02284.html
>>
>> 2016-02-03  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>      * config/aarch64/aarch64.md (prefetch); Adjust predicate and
>>      constraint on operand 0 to allow more general addressing modes.
>>      Adjust output template.
>>      * config/aarch64/aarch64.c (aarch64_address_valid_for_prefetch_p):
>>      New function.
>>      * config/aarch64/aarch64-protos.h
>>      (aarch64_address_valid_for_prefetch_p): Declare prototype.
>>      * config/aarch64/constraints.md (Dp): New address constraint.
>>      * config/aarch64/predicates.md (aarch64_prefetch_operand): New
>>      predicate.
>>
>> 2016-02-03  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>      * gcc.target/aarch64/prfm_imm_offset_1.c: New test.
>>
>> aarch64-prfm-imm.patch
>>
> Hmm, I'm not sure about this.  rtl.texi says that a prefetch code
> contains an address, not a MEM.  So it's theoretically possible for
> generic code to want to look inside the first operand and find an
> address directly.  This change would break that assumption.

With this change the prefetch operand is still an address, not a MEM, during all the
optimisation passes.
It is wrapped in a MEM only when the assembly string is finally printed out
during 'final'.

Kyrill

> R.
>
>> commit a324e2f2ea243fe42f23a026ecbe1435876e2c8b
>> Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>> Date:   Thu Feb 2 14:46:11 2017 +0000
>>
>>      [AArch64] Accept more addressing modes for PRFM
>>
>> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
>> index babc327..61706de 100644
>> --- a/gcc/config/aarch64/aarch64-protos.h
>> +++ b/gcc/config/aarch64/aarch64-protos.h
>> @@ -300,6 +300,7 @@ extern struct tune_params aarch64_tune_params;
>>   
>>   HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
>>   int aarch64_get_condition_code (rtx);
>> +bool aarch64_address_valid_for_prefetch_p (rtx, bool);
>>   bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
>>   unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT val_in);
>>   unsigned HOST_WIDE_INT aarch64_and_split_imm2 (HOST_WIDE_INT val_in);
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index acc093a..c05eff3 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -4549,6 +4549,24 @@ aarch64_classify_address (struct aarch64_address_info *info,
>>       }
>>   }
>>   
>> +/* Return true if the address X is valid for a PRFM instruction.
>> +   STRICT_P is true if we should do strict checking with
>> +   aarch64_classify_address.  */
>> +
>> +bool
>> +aarch64_address_valid_for_prefetch_p (rtx x, bool strict_p)
>> +{
>> +  struct aarch64_address_info addr;
>> +
>> +  /* PRFM accepts the same addresses as DImode...  */
>> +  bool res = aarch64_classify_address (&addr, x, DImode, MEM, strict_p);
>> +  if (!res)
>> +    return false;
>> +
>> +  /* ... except writeback forms.  */
>> +  return addr.type != ADDRESS_REG_WB;
>> +}
>> +
>>   bool
>>   aarch64_symbolic_address_p (rtx x)
>>   {
>> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
>> index b9e8edf..c6201a5 100644
>> --- a/gcc/config/aarch64/aarch64.md
>> +++ b/gcc/config/aarch64/aarch64.md
>> @@ -518,27 +518,31 @@ (define_insn "nop"
>>   )
>>   
>>   (define_insn "prefetch"
>> -  [(prefetch (match_operand:DI 0 "register_operand" "r")
>> +  [(prefetch (match_operand:DI 0 "aarch64_prefetch_operand" "Dp")
>>               (match_operand:QI 1 "const_int_operand" "")
>>               (match_operand:QI 2 "const_int_operand" ""))]
>>     ""
>>     {
>> -    const char * pftype[2][4] =
>> +    const char * pftype[2][4] =
>>       {
>> -      {"prfm\\tPLDL1STRM, %a0",
>> -       "prfm\\tPLDL3KEEP, %a0",
>> -       "prfm\\tPLDL2KEEP, %a0",
>> -       "prfm\\tPLDL1KEEP, %a0"},
>> -      {"prfm\\tPSTL1STRM, %a0",
>> -       "prfm\\tPSTL3KEEP, %a0",
>> -       "prfm\\tPSTL2KEEP, %a0",
>> -       "prfm\\tPSTL1KEEP, %a0"},
>> +      {"prfm\\tPLDL1STRM, %0",
>> +       "prfm\\tPLDL3KEEP, %0",
>> +       "prfm\\tPLDL2KEEP, %0",
>> +       "prfm\\tPLDL1KEEP, %0"},
>> +      {"prfm\\tPSTL1STRM, %0",
>> +       "prfm\\tPSTL3KEEP, %0",
>> +       "prfm\\tPSTL2KEEP, %0",
>> +       "prfm\\tPSTL1KEEP, %0"},
>>       };
>>   
>>       int locality = INTVAL (operands[2]);
>>   
>>       gcc_assert (IN_RANGE (locality, 0, 3));
>>   
>> +    /* PRFM accepts the same addresses as a 64-bit LDR so wrap
>> +       the address into a DImode MEM so that aarch64_print_operand knows
>> +       how to print it.  */
>> +    operands[0] = gen_rtx_MEM (DImode, operands[0]);
>>       return pftype[INTVAL(operands[1])][locality];
>>     }
>>     [(set_attr "type" "load1")]
>> diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
>> index 5a252c0..b829337 100644
>> --- a/gcc/config/aarch64/constraints.md
>> +++ b/gcc/config/aarch64/constraints.md
>> @@ -214,3 +214,8 @@ (define_constraint "Dd"
>>    A constraint that matches an immediate operand valid for AdvSIMD scalar."
>>    (and (match_code "const_int")
>>         (match_test "aarch64_simd_imm_scalar_p (op, GET_MODE (op))")))
>> +
>> +(define_address_constraint "Dp"
>> +  "@internal
>> + An address valid for a prefetch instruction."
>> + (match_test "aarch64_address_valid_for_prefetch_p (op, true)"))
>> diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
>> index e83d45b..8e3ea9b 100644
>> --- a/gcc/config/aarch64/predicates.md
>> +++ b/gcc/config/aarch64/predicates.md
>> @@ -165,6 +165,9 @@ (define_predicate "aarch64_mem_pair_operand"
>>          (match_test "aarch64_legitimate_address_p (mode, XEXP (op, 0), PARALLEL,
>>   					       0)")))
>>   
>> +(define_predicate "aarch64_prefetch_operand"
>> +  (match_test "aarch64_address_valid_for_prefetch_p (op, false)"))
>> +
>>   (define_predicate "aarch64_valid_symref"
>>     (match_code "const, symbol_ref, label_ref")
>>   {
>> diff --git a/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c b/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c
>> new file mode 100644
>> index 0000000..26ab913
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c
>> @@ -0,0 +1,18 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +
>> +/* Check that we can generate the immediate-offset addressing
>> +   mode for PRFM.  */
>> +
>> +#define ARRSIZE 65
>> +int *bad_addr[ARRSIZE];
>> +
>> +void
>> +prefetch_for_read (void)
>> +{
>> +  int i;
>> +  for (i = 0; i < ARRSIZE; i++)
>> +    __builtin_prefetch (bad_addr[i] + 2, 0, 0);
>> +}
>> +
>> +/* { dg-final { scan-assembler-times "prfm.*\\\[x\[0-9\]+, 8\\\]" 1 } } */
>>
Richard Earnshaw (lists) Feb. 15, 2017, 3:30 p.m. UTC | #4
On 15/02/17 15:03, Kyrill Tkachov wrote:
> Hi Richard,
> 
> On 15/02/17 15:00, Richard Earnshaw (lists) wrote:
>> On 03/02/17 17:12, Kyrill Tkachov wrote:
>>> Hi all,
>>>
>>> While evaluating Maxim's SW prefetch patches [1] I noticed that the
>>> aarch64 prefetch pattern is
>>> overly restrictive in its address operand. It only accepts simple
>>> register addressing modes.
>>> In fact, the PRFM instruction accepts almost all modes that a normal
>>> 64-bit LDR supports.
>>> The restriction in the pattern leads to explicit address calculation
>>> code to be emitted which we could avoid.
>>>
>>> This patch relaxes the restrictions on the prefetch define_insn. It
>>> creates a predicate and constraint that
>>> allow the full addressing modes that PRFM allows. Thus for the testcase
>>> in the patch (adapted from one of the existing
>>> __builtin_prefetch tests in the testsuite) we can generate a:
>>> prfm    PLDL1STRM, [x1, 8]
>>>
>>> instead of the current
>>> prfm    PLDL1STRM, [x1]
>>> with an explicit increment of x1 by 8 in a separate instruction.
>>>
>>> I've removed the %a output modifier in the output template and wrapped
>>> the address operand into a DImode MEM before
>>> passing it down to aarch64_print_operand.
>>>
>>> This is because operand 0 is an address operand rather than a memory
>>> operand and thus doesn't have a mode associated
>>> with it.  When processing the 'a' output modifier the code in final.c
>>> will call TARGET_PRINT_OPERAND_ADDRESS with a VOIDmode
>>> argument.  This will ICE on aarch64 because we need a mode for the
>>> memory in order for aarch64_classify_address to work
>>> correctly.  Rather than overriding the VOIDmode in
>>> aarch64_print_operand_address I decided to instead create the DImode
>>> MEM in the "prefetch" output template and treat it as a normal 64-bit
>>> memory address, which at the point of assembly output
>>> is what it is anyway.
>>>
>>> With this patch I see a reduction in instruction count in the SPEC2006
>>> benchmarks when SW prefetching is enabled on top
>>> of Maxim's patchset because fewer address calculation instructions are
>>> emitted due to the use of the more expressive
>>> addressing modes. It also fixes a performance regression that I observed
>>> in 410.bwaves from Maxim's patches on Cortex-A72.
>>> I'll be running a full set of benchmarks to evaluate this further, but I
>>> think this is the right thing to do.
>>>
>>> Bootstrapped and tested on aarch64-none-linux-gnu.
>>>
>>> Maxim, do you want to try this on top of your patches on your hardware
>>> to see if it helps with the regressions you mentioned?
>>>
>>> Thanks,
>>> Kyrill
>>>
>>>
>>> [1] https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02284.html
>>>
>>> 2016-02-03  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>
>>>      * config/aarch64/aarch64.md (prefetch); Adjust predicate and
>>>      constraint on operand 0 to allow more general addressing modes.
>>>      Adjust output template.
>>>      * config/aarch64/aarch64.c (aarch64_address_valid_for_prefetch_p):
>>>      New function.
>>>      * config/aarch64/aarch64-protos.h
>>>      (aarch64_address_valid_for_prefetch_p): Declare prototype.
>>>      * config/aarch64/constraints.md (Dp): New address constraint.
>>>      * config/aarch64/predicates.md (aarch64_prefetch_operand): New
>>>      predicate.
>>>
>>> 2016-02-03  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>
>>>      * gcc.target/aarch64/prfm_imm_offset_1.c: New test.
>>>
>>> aarch64-prfm-imm.patch
>>>
>> Hmm, I'm not sure about this.  rtl.texi says that a prefetch code
>> contains an address, not a MEM.  So it's theoretically possible for
>> generic code to want to look inside the first operand and find an
>> address directly.  This change would break that assumption.
> 
> With this change the prefetch operand is still an address, not a MEM
> during all the
> optimisation passes.
> It's wrapped in a MEM only during the ultimate printing of the assembly
> string
> during 'final'.
> 

Ah!  I'd missed that.

This is OK for stage1.

R.

> Kyrill
> 
>> R.
>>
>>> commit a324e2f2ea243fe42f23a026ecbe1435876e2c8b
>>> Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>>> Date:   Thu Feb 2 14:46:11 2017 +0000
>>>
>>>      [AArch64] Accept more addressing modes for PRFM
>>>
>>> diff --git a/gcc/config/aarch64/aarch64-protos.h
>>> b/gcc/config/aarch64/aarch64-protos.h
>>> index babc327..61706de 100644
>>> --- a/gcc/config/aarch64/aarch64-protos.h
>>> +++ b/gcc/config/aarch64/aarch64-protos.h
>>> @@ -300,6 +300,7 @@ extern struct tune_params aarch64_tune_params;
>>>     HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned,
>>> unsigned);
>>>   int aarch64_get_condition_code (rtx);
>>> +bool aarch64_address_valid_for_prefetch_p (rtx, bool);
>>>   bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
>>>   unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT val_in);
>>>   unsigned HOST_WIDE_INT aarch64_and_split_imm2 (HOST_WIDE_INT val_in);
>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>> index acc093a..c05eff3 100644
>>> --- a/gcc/config/aarch64/aarch64.c
>>> +++ b/gcc/config/aarch64/aarch64.c
>>> @@ -4549,6 +4549,24 @@ aarch64_classify_address (struct
>>> aarch64_address_info *info,
>>>       }
>>>   }
>>>   +/* Return true if the address X is valid for a PRFM instruction.
>>> +   STRICT_P is true if we should do strict checking with
>>> +   aarch64_classify_address.  */
>>> +
>>> +bool
>>> +aarch64_address_valid_for_prefetch_p (rtx x, bool strict_p)
>>> +{
>>> +  struct aarch64_address_info addr;
>>> +
>>> +  /* PRFM accepts the same addresses as DImode...  */
>>> +  bool res = aarch64_classify_address (&addr, x, DImode, MEM,
>>> strict_p);
>>> +  if (!res)
>>> +    return false;
>>> +
>>> +  /* ... except writeback forms.  */
>>> +  return addr.type != ADDRESS_REG_WB;
>>> +}
>>> +
>>>   bool
>>>   aarch64_symbolic_address_p (rtx x)
>>>   {
>>> diff --git a/gcc/config/aarch64/aarch64.md
>>> b/gcc/config/aarch64/aarch64.md
>>> index b9e8edf..c6201a5 100644
>>> --- a/gcc/config/aarch64/aarch64.md
>>> +++ b/gcc/config/aarch64/aarch64.md
>>> @@ -518,27 +518,31 @@ (define_insn "nop"
>>>   )
>>>     (define_insn "prefetch"
>>> -  [(prefetch (match_operand:DI 0 "register_operand" "r")
>>> +  [(prefetch (match_operand:DI 0 "aarch64_prefetch_operand" "Dp")
>>>               (match_operand:QI 1 "const_int_operand" "")
>>>               (match_operand:QI 2 "const_int_operand" ""))]
>>>     ""
>>>     {
>>> -    const char * pftype[2][4] =
>>> +    const char * pftype[2][4] =
>>>       {
>>> -      {"prfm\\tPLDL1STRM, %a0",
>>> -       "prfm\\tPLDL3KEEP, %a0",
>>> -       "prfm\\tPLDL2KEEP, %a0",
>>> -       "prfm\\tPLDL1KEEP, %a0"},
>>> -      {"prfm\\tPSTL1STRM, %a0",
>>> -       "prfm\\tPSTL3KEEP, %a0",
>>> -       "prfm\\tPSTL2KEEP, %a0",
>>> -       "prfm\\tPSTL1KEEP, %a0"},
>>> +      {"prfm\\tPLDL1STRM, %0",
>>> +       "prfm\\tPLDL3KEEP, %0",
>>> +       "prfm\\tPLDL2KEEP, %0",
>>> +       "prfm\\tPLDL1KEEP, %0"},
>>> +      {"prfm\\tPSTL1STRM, %0",
>>> +       "prfm\\tPSTL3KEEP, %0",
>>> +       "prfm\\tPSTL2KEEP, %0",
>>> +       "prfm\\tPSTL1KEEP, %0"},
>>>       };
>>>         int locality = INTVAL (operands[2]);
>>>         gcc_assert (IN_RANGE (locality, 0, 3));
>>>   +    /* PRFM accepts the same addresses as a 64-bit LDR so wrap
>>> +       the address into a DImode MEM so that aarch64_print_operand
>>> knows
>>> +       how to print it.  */
>>> +    operands[0] = gen_rtx_MEM (DImode, operands[0]);
>>>       return pftype[INTVAL(operands[1])][locality];
>>>     }
>>>     [(set_attr "type" "load1")]
>>> diff --git a/gcc/config/aarch64/constraints.md
>>> b/gcc/config/aarch64/constraints.md
>>> index 5a252c0..b829337 100644
>>> --- a/gcc/config/aarch64/constraints.md
>>> +++ b/gcc/config/aarch64/constraints.md
>>> @@ -214,3 +214,8 @@ (define_constraint "Dd"
>>>    A constraint that matches an immediate operand valid for AdvSIMD
>>> scalar."
>>>    (and (match_code "const_int")
>>>         (match_test "aarch64_simd_imm_scalar_p (op, GET_MODE (op))")))
>>> +
>>> +(define_address_constraint "Dp"
>>> +  "@internal
>>> + An address valid for a prefetch instruction."
>>> + (match_test "aarch64_address_valid_for_prefetch_p (op, true)"))
>>> diff --git a/gcc/config/aarch64/predicates.md
>>> b/gcc/config/aarch64/predicates.md
>>> index e83d45b..8e3ea9b 100644
>>> --- a/gcc/config/aarch64/predicates.md
>>> +++ b/gcc/config/aarch64/predicates.md
>>> @@ -165,6 +165,9 @@ (define_predicate "aarch64_mem_pair_operand"
>>>          (match_test "aarch64_legitimate_address_p (mode, XEXP (op,
>>> 0), PARALLEL,
>>>                              0)")))
>>>   +(define_predicate "aarch64_prefetch_operand"
>>> +  (match_test "aarch64_address_valid_for_prefetch_p (op, false)"))
>>> +
>>>   (define_predicate "aarch64_valid_symref"
>>>     (match_code "const, symbol_ref, label_ref")
>>>   {
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c
>>> b/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c
>>> new file mode 100644
>>> index 0000000..26ab913
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c
>>> @@ -0,0 +1,18 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2" } */
>>> +
>>> +/* Check that we can generate the immediate-offset addressing
>>> +   mode for PRFM.  */
>>> +
>>> +#define ARRSIZE 65
>>> +int *bad_addr[ARRSIZE];
>>> +
>>> +void
>>> +prefetch_for_read (void)
>>> +{
>>> +  int i;
>>> +  for (i = 0; i < ARRSIZE; i++)
>>> +    __builtin_prefetch (bad_addr[i] + 2, 0, 0);
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler-times "prfm.*\\\[x\[0-9\]+, 8\\\]" 1
>>> } } */
>>>
>
Kyrill Tkachov May 4, 2017, 4:15 p.m. UTC | #5
On 15/02/17 15:30, Richard Earnshaw (lists) wrote:
> On 15/02/17 15:03, Kyrill Tkachov wrote:
>> Hi Richard,
>>
>> On 15/02/17 15:00, Richard Earnshaw (lists) wrote:
>>> On 03/02/17 17:12, Kyrill Tkachov wrote:
>>>> Hi all,
>>>>
>>>> While evaluating Maxim's SW prefetch patches [1] I noticed that the
>>>> aarch64 prefetch pattern is
>>>> overly restrictive in its address operand. It only accepts simple
>>>> register addressing modes.
>>>> In fact, the PRFM instruction accepts almost all modes that a normal
>>>> 64-bit LDR supports.
>>>> The restriction in the pattern leads to explicit address calculation
>>>> code to be emitted which we could avoid.
>>>>
>>>> This patch relaxes the restrictions on the prefetch define_insn. It
>>>> creates a predicate and constraint that
>>>> allow the full addressing modes that PRFM allows. Thus for the testcase
>>>> in the patch (adapted from one of the existing
>>>> __builtin_prefetch tests in the testsuite) we can generate a:
>>>> prfm    PLDL1STRM, [x1, 8]
>>>>
>>>> instead of the current
>>>> prfm    PLDL1STRM, [x1]
>>>> with an explicit increment of x1 by 8 in a separate instruction.
>>>>
>>>> I've removed the %a output modifier in the output template and wrapped
>>>> the address operand into a DImode MEM before
>>>> passing it down to aarch64_print_operand.
>>>>
>>>> This is because operand 0 is an address operand rather than a memory
>>>> operand and thus doesn't have a mode associated
>>>> with it.  When processing the 'a' output modifier the code in final.c
>>>> will call TARGET_PRINT_OPERAND_ADDRESS with a VOIDmode
>>>> argument.  This will ICE on aarch64 because we need a mode for the
>>>> memory in order for aarch64_classify_address to work
>>>> correctly.  Rather than overriding the VOIDmode in
>>>> aarch64_print_operand_address I decided to instead create the DImode
>>>> MEM in the "prefetch" output template and treat it as a normal 64-bit
>>>> memory address, which at the point of assembly output
>>>> is what it is anyway.
>>>>
>>>> With this patch I see a reduction in instruction count in the SPEC2006
>>>> benchmarks when SW prefetching is enabled on top
>>>> of Maxim's patchset because fewer address calculation instructions are
>>>> emitted due to the use of the more expressive
>>>> addressing modes. It also fixes a performance regression that I observed
>>>> in 410.bwaves from Maxim's patches on Cortex-A72.
>>>> I'll be running a full set of benchmarks to evaluate this further, but I
>>>> think this is the right thing to do.
>>>>
>>>> Bootstrapped and tested on aarch64-none-linux-gnu.
>>>>
>>>> Maxim, do you want to try this on top of your patches on your hardware
>>>> to see if it helps with the regressions you mentioned?
>>>>
>>>> Thanks,
>>>> Kyrill
>>>>
>>>>
>>>> [1] https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02284.html
>>>>
>>>> 2016-02-03  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>
>>>>       * config/aarch64/aarch64.md (prefetch); Adjust predicate and
>>>>       constraint on operand 0 to allow more general addressing modes.
>>>>       Adjust output template.
>>>>       * config/aarch64/aarch64.c (aarch64_address_valid_for_prefetch_p):
>>>>       New function.
>>>>       * config/aarch64/aarch64-protos.h
>>>>       (aarch64_address_valid_for_prefetch_p): Declare prototype.
>>>>       * config/aarch64/constraints.md (Dp): New address constraint.
>>>>       * config/aarch64/predicates.md (aarch64_prefetch_operand): New
>>>>       predicate.
>>>>
>>>> 2016-02-03  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>
>>>>       * gcc.target/aarch64/prfm_imm_offset_1.c: New test.
>>>>
>>>> aarch64-prfm-imm.patch
>>>>
>>> Hmm, I'm not sure about this.  rtl.texi says that a prefetch code
>>> contains an address, not a MEM.  So it's theoretically possible for
>>> generic code to want to look inside the first operand and find an
>>> address directly.  This change would break that assumption.
>> With this change the prefetch operand is still an address, not a MEM
>> during all the
>> optimisation passes.
>> It's wrapped in a MEM only during the ultimate printing of the assembly
>> string
>> during 'final'.
>>
> Ah!  I'd missed that.
>
> This is OK for stage1.

I've bootstrapped and tested the patch against current trunk and committed
it as r247603.

Thanks,
Kyrill

> R.
>
>> Kyrill
>>
>>> R.
>>>
>>>> commit a324e2f2ea243fe42f23a026ecbe1435876e2c8b
>>>> Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>>>> Date:   Thu Feb 2 14:46:11 2017 +0000
>>>>
>>>>       [AArch64] Accept more addressing modes for PRFM
>>>>
>>>> diff --git a/gcc/config/aarch64/aarch64-protos.h
>>>> b/gcc/config/aarch64/aarch64-protos.h
>>>> index babc327..61706de 100644
>>>> --- a/gcc/config/aarch64/aarch64-protos.h
>>>> +++ b/gcc/config/aarch64/aarch64-protos.h
>>>> @@ -300,6 +300,7 @@ extern struct tune_params aarch64_tune_params;
>>>>      HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned,
>>>> unsigned);
>>>>    int aarch64_get_condition_code (rtx);
>>>> +bool aarch64_address_valid_for_prefetch_p (rtx, bool);
>>>>    bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
>>>>    unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT val_in);
>>>>    unsigned HOST_WIDE_INT aarch64_and_split_imm2 (HOST_WIDE_INT val_in);
>>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>>> index acc093a..c05eff3 100644
>>>> --- a/gcc/config/aarch64/aarch64.c
>>>> +++ b/gcc/config/aarch64/aarch64.c
>>>> @@ -4549,6 +4549,24 @@ aarch64_classify_address (struct
>>>> aarch64_address_info *info,
>>>>        }
>>>>    }
>>>>    +/* Return true if the address X is valid for a PRFM instruction.
>>>> +   STRICT_P is true if we should do strict checking with
>>>> +   aarch64_classify_address.  */
>>>> +
>>>> +bool
>>>> +aarch64_address_valid_for_prefetch_p (rtx x, bool strict_p)
>>>> +{
>>>> +  struct aarch64_address_info addr;
>>>> +
>>>> +  /* PRFM accepts the same addresses as DImode...  */
>>>> +  bool res = aarch64_classify_address (&addr, x, DImode, MEM,
>>>> strict_p);
>>>> +  if (!res)
>>>> +    return false;
>>>> +
>>>> +  /* ... except writeback forms.  */
>>>> +  return addr.type != ADDRESS_REG_WB;
>>>> +}
>>>> +
>>>>    bool
>>>>    aarch64_symbolic_address_p (rtx x)
>>>>    {
>>>> diff --git a/gcc/config/aarch64/aarch64.md
>>>> b/gcc/config/aarch64/aarch64.md
>>>> index b9e8edf..c6201a5 100644
>>>> --- a/gcc/config/aarch64/aarch64.md
>>>> +++ b/gcc/config/aarch64/aarch64.md
>>>> @@ -518,27 +518,31 @@ (define_insn "nop"
>>>>    )
>>>>      (define_insn "prefetch"
>>>> -  [(prefetch (match_operand:DI 0 "register_operand" "r")
>>>> +  [(prefetch (match_operand:DI 0 "aarch64_prefetch_operand" "Dp")
>>>>                (match_operand:QI 1 "const_int_operand" "")
>>>>                (match_operand:QI 2 "const_int_operand" ""))]
>>>>      ""
>>>>      {
>>>> -    const char * pftype[2][4] =
>>>> +    const char * pftype[2][4] =
>>>>        {
>>>> -      {"prfm\\tPLDL1STRM, %a0",
>>>> -       "prfm\\tPLDL3KEEP, %a0",
>>>> -       "prfm\\tPLDL2KEEP, %a0",
>>>> -       "prfm\\tPLDL1KEEP, %a0"},
>>>> -      {"prfm\\tPSTL1STRM, %a0",
>>>> -       "prfm\\tPSTL3KEEP, %a0",
>>>> -       "prfm\\tPSTL2KEEP, %a0",
>>>> -       "prfm\\tPSTL1KEEP, %a0"},
>>>> +      {"prfm\\tPLDL1STRM, %0",
>>>> +       "prfm\\tPLDL3KEEP, %0",
>>>> +       "prfm\\tPLDL2KEEP, %0",
>>>> +       "prfm\\tPLDL1KEEP, %0"},
>>>> +      {"prfm\\tPSTL1STRM, %0",
>>>> +       "prfm\\tPSTL3KEEP, %0",
>>>> +       "prfm\\tPSTL2KEEP, %0",
>>>> +       "prfm\\tPSTL1KEEP, %0"},
>>>>        };
>>>>          int locality = INTVAL (operands[2]);
>>>>          gcc_assert (IN_RANGE (locality, 0, 3));
>>>>    +    /* PRFM accepts the same addresses as a 64-bit LDR so wrap
>>>> +       the address into a DImode MEM so that aarch64_print_operand
>>>> knows
>>>> +       how to print it.  */
>>>> +    operands[0] = gen_rtx_MEM (DImode, operands[0]);
>>>>        return pftype[INTVAL(operands[1])][locality];
>>>>      }
>>>>      [(set_attr "type" "load1")]
>>>> diff --git a/gcc/config/aarch64/constraints.md
>>>> b/gcc/config/aarch64/constraints.md
>>>> index 5a252c0..b829337 100644
>>>> --- a/gcc/config/aarch64/constraints.md
>>>> +++ b/gcc/config/aarch64/constraints.md
>>>> @@ -214,3 +214,8 @@ (define_constraint "Dd"
>>>>     A constraint that matches an immediate operand valid for AdvSIMD
>>>> scalar."
>>>>     (and (match_code "const_int")
>>>>          (match_test "aarch64_simd_imm_scalar_p (op, GET_MODE (op))")))
>>>> +
>>>> +(define_address_constraint "Dp"
>>>> +  "@internal
>>>> + An address valid for a prefetch instruction."
>>>> + (match_test "aarch64_address_valid_for_prefetch_p (op, true)"))
>>>> diff --git a/gcc/config/aarch64/predicates.md
>>>> b/gcc/config/aarch64/predicates.md
>>>> index e83d45b..8e3ea9b 100644
>>>> --- a/gcc/config/aarch64/predicates.md
>>>> +++ b/gcc/config/aarch64/predicates.md
>>>> @@ -165,6 +165,9 @@ (define_predicate "aarch64_mem_pair_operand"
>>>>           (match_test "aarch64_legitimate_address_p (mode, XEXP (op,
>>>> 0), PARALLEL,
>>>>                               0)")))
>>>>    +(define_predicate "aarch64_prefetch_operand"
>>>> +  (match_test "aarch64_address_valid_for_prefetch_p (op, false)"))
>>>> +
>>>>    (define_predicate "aarch64_valid_symref"
>>>>      (match_code "const, symbol_ref, label_ref")
>>>>    {
>>>> diff --git a/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c
>>>> b/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c
>>>> new file mode 100644
>>>> index 0000000..26ab913
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c
>>>> @@ -0,0 +1,18 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2" } */
>>>> +
>>>> +/* Check that we can generate the immediate-offset addressing
>>>> +   mode for PRFM.  */
>>>> +
>>>> +#define ARRSIZE 65
>>>> +int *bad_addr[ARRSIZE];
>>>> +
>>>> +void
>>>> +prefetch_for_read (void)
>>>> +{
>>>> +  int i;
>>>> +  for (i = 0; i < ARRSIZE; i++)
>>>> +    __builtin_prefetch (bad_addr[i] + 2, 0, 0);
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times "prfm.*\\\[x\[0-9\]+, 8\\\]" 1
>>>> } } */
>>>>

Patch

commit a324e2f2ea243fe42f23a026ecbe1435876e2c8b
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Thu Feb 2 14:46:11 2017 +0000

    [AArch64] Accept more addressing modes for PRFM

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index babc327..61706de 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -300,6 +300,7 @@  extern struct tune_params aarch64_tune_params;
 
 HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
 int aarch64_get_condition_code (rtx);
+bool aarch64_address_valid_for_prefetch_p (rtx, bool);
 bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
 unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT val_in);
 unsigned HOST_WIDE_INT aarch64_and_split_imm2 (HOST_WIDE_INT val_in);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index acc093a..c05eff3 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4549,6 +4549,24 @@  aarch64_classify_address (struct aarch64_address_info *info,
     }
 }
 
+/* Return true if the address X is valid for a PRFM instruction.
+   STRICT_P is true if we should do strict checking with
+   aarch64_classify_address.  */
+
+bool
+aarch64_address_valid_for_prefetch_p (rtx x, bool strict_p)
+{
+  struct aarch64_address_info addr;
+
+  /* PRFM accepts the same addresses as DImode...  */
+  bool res = aarch64_classify_address (&addr, x, DImode, MEM, strict_p);
+  if (!res)
+    return false;
+
+  /* ... except writeback forms.  */
+  return addr.type != ADDRESS_REG_WB;
+}
+
 bool
 aarch64_symbolic_address_p (rtx x)
 {
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index b9e8edf..c6201a5 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -518,27 +518,31 @@  (define_insn "nop"
 )
 
 (define_insn "prefetch"
-  [(prefetch (match_operand:DI 0 "register_operand" "r")
+  [(prefetch (match_operand:DI 0 "aarch64_prefetch_operand" "Dp")
             (match_operand:QI 1 "const_int_operand" "")
             (match_operand:QI 2 "const_int_operand" ""))]
   ""
   {
-    const char * pftype[2][4] = 
+    const char * pftype[2][4] =
     {
-      {"prfm\\tPLDL1STRM, %a0",
-       "prfm\\tPLDL3KEEP, %a0",
-       "prfm\\tPLDL2KEEP, %a0",
-       "prfm\\tPLDL1KEEP, %a0"},
-      {"prfm\\tPSTL1STRM, %a0",
-       "prfm\\tPSTL3KEEP, %a0",
-       "prfm\\tPSTL2KEEP, %a0",
-       "prfm\\tPSTL1KEEP, %a0"},
+      {"prfm\\tPLDL1STRM, %0",
+       "prfm\\tPLDL3KEEP, %0",
+       "prfm\\tPLDL2KEEP, %0",
+       "prfm\\tPLDL1KEEP, %0"},
+      {"prfm\\tPSTL1STRM, %0",
+       "prfm\\tPSTL3KEEP, %0",
+       "prfm\\tPSTL2KEEP, %0",
+       "prfm\\tPSTL1KEEP, %0"},
     };
 
     int locality = INTVAL (operands[2]);
 
     gcc_assert (IN_RANGE (locality, 0, 3));
 
+    /* PRFM accepts the same addresses as a 64-bit LDR so wrap
+       the address into a DImode MEM so that aarch64_print_operand knows
+       how to print it.  */
+    operands[0] = gen_rtx_MEM (DImode, operands[0]);
     return pftype[INTVAL(operands[1])][locality];
   }
   [(set_attr "type" "load1")]
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index 5a252c0..b829337 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -214,3 +214,8 @@  (define_constraint "Dd"
  A constraint that matches an immediate operand valid for AdvSIMD scalar."
  (and (match_code "const_int")
       (match_test "aarch64_simd_imm_scalar_p (op, GET_MODE (op))")))
+
+(define_address_constraint "Dp"
+  "@internal
+ An address valid for a prefetch instruction."
+ (match_test "aarch64_address_valid_for_prefetch_p (op, true)"))
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index e83d45b..8e3ea9b 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -165,6 +165,9 @@  (define_predicate "aarch64_mem_pair_operand"
        (match_test "aarch64_legitimate_address_p (mode, XEXP (op, 0), PARALLEL,
 					       0)")))
 
+(define_predicate "aarch64_prefetch_operand"
+  (match_test "aarch64_address_valid_for_prefetch_p (op, false)"))
+
 (define_predicate "aarch64_valid_symref"
   (match_code "const, symbol_ref, label_ref")
 {
diff --git a/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c b/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c
new file mode 100644
index 0000000..26ab913
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c
@@ -0,0 +1,18 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* Check that we can generate the immediate-offset addressing
+   mode for PRFM.  */
+
+#define ARRSIZE 65
+int *bad_addr[ARRSIZE];
+
+void
+prefetch_for_read (void)
+{
+  int i;
+  for (i = 0; i < ARRSIZE; i++)
+    __builtin_prefetch (bad_addr[i] + 2, 0, 0);
+}
+
+/* { dg-final { scan-assembler-times "prfm.*\\\[x\[0-9\]+, 8\\\]" 1 } } */