diff mbox

[ARM] Handle REG addressing mode in output_move_neon explicitly

Message ID 002f01cf6357$854c7a00$8fe56e00$@arm.com
State New
Headers show

Commit Message

Bin Cheng April 29, 2014, 3:02 a.m. UTC
Hi,
Function output_move_neon now generates vld1.64 for memory ref like "dx <-
[r1:SI]", this is bogus because it requires at least 64-bit alignment for
32-bit aligned memory ref.  It works now because GCC doesn't generate such
insns in the first place, but things are going to change if memset/memcpy
calls are inlined by using neon instructions.

This patch fixes the issue by generating ldr for such instructions.

Bootstrapped on cortex-a15 with neon.
Is it OK?

Thanks,
bin


2014-04-29  Bin Cheng  <bin.cheng@arm.com>

	* config/arm/arm.c (output_move_neon): Handle REG explicitly.

Comments

Richard Earnshaw May 1, 2014, 2:03 p.m. UTC | #1
On 29/04/14 04:02, bin.cheng wrote:
> Hi,
> Function output_move_neon now generates vld1.64 for memory ref like "dx <-
> [r1:SI]", this is bogus because it requires at least 64-bit alignment for
> 32-bit aligned memory ref.  It works now because GCC doesn't generate such
> insns in the first place, but things are going to change if memset/memcpy
> calls are inlined by using neon instructions.
> 

V[LD/ST]1.64 only need to be 64-bit aligned if strict alignment is
enabled.  We normally assume that not to be the case.  The exception to
this is when an explicit alignment check is used in the address
expression (the :64 suffix), which causes the address to be checked for
strict alignment at all times.

Do you have a testcase?

R.

> This patch fixes the issue by generating ldr for such instructions.
> 
> Bootstrapped on cortex-a15 with neon.
> Is it OK?
> 
> Thanks,
> bin
> 
> 
> 2014-04-29  Bin Cheng  <bin.cheng@arm.com>
> 
> 	* config/arm/arm.c (output_move_neon): Handle REG explicitly.
> 
> 
> output_move_neon-20140429.txt
> 
> 
> Index: gcc/config/arm/arm.c
> ===================================================================
> --- gcc/config/arm/arm.c	(revision 209852)
> +++ gcc/config/arm/arm.c	(working copy)
> @@ -18427,6 +18453,20 @@ output_move_neon (rtx *operands)
>        /* FIXME: Not currently enabled in neon_vector_mem_operand.  */
>        gcc_unreachable ();
>  
> +    case REG:
> +      /* We have to use vldm / vstm for too-large modes.  */
> +      if (nregs > 1)
> +	{
> +	  if (nregs > 4)
> +	    templ = "v%smia%%?\t%%m0, %%h1";
> +	  else
> +	    templ = "v%s1.64\t%%h1, %%A0";
> +
> +	  ops[0] = mem;
> +	  ops[1] = reg;
> +	  break;
> +	}
> +      /* Fall through.  */
>      case LABEL_REF:
>      case PLUS:
>        {
> @@ -18460,14 +18500,7 @@ output_move_neon (rtx *operands)
>        }
>  
>      default:
> -      /* We have to use vldm / vstm for too-large modes.  */
> -      if (nregs > 4)
> -	templ = "v%smia%%?\t%%m0, %%h1";
> -      else
> -	templ = "v%s1.64\t%%h1, %%A0";
> -
> -      ops[0] = mem;
> -      ops[1] = reg;
> +      gcc_unreachable ();
>      }
>  
>    sprintf (buff, templ, load ? "ld" : "st");
>
Bin Cheng May 5, 2014, 7:21 a.m. UTC | #2
> -----Original Message-----
> From: Richard Earnshaw
> Sent: Thursday, May 01, 2014 10:03 PM
> To: Bin Cheng
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH ARM]Handle REG addressing mode in
> output_move_neon explicitly
> 
> On 29/04/14 04:02, bin.cheng wrote:
> > Hi,
> > Function output_move_neon now generates vld1.64 for memory ref like
> > "dx <- [r1:SI]", this is bogus because it requires at least 64-bit
> > alignment for 32-bit aligned memory ref.  It works now because GCC
> > doesn't generate such insns in the first place, but things are going
> > to change if memset/memcpy calls are inlined by using neon instructions.
> >
> 
> V[LD/ST]1.64 only need to be 64-bit aligned if strict alignment is
enabled.  We
> normally assume that not to be the case.  The exception to this is when an
theoretically, this doesn't make the problem go away, right?

> explicit alignment check is used in the address expression (the :64
suffix),
> which causes the address to be checked for strict alignment at all times.
> 
> Do you have a testcase?
I can't provide a test case without the memset inlining patch.

Thanks,
bin
Ramana Radhakrishnan July 2, 2014, 6:48 a.m. UTC | #3
On Mon, May 5, 2014 at 8:21 AM, bin.cheng <bin.cheng@arm.com> wrote:
>
>
>> -----Original Message-----
>> From: Richard Earnshaw
>> Sent: Thursday, May 01, 2014 10:03 PM
>> To: Bin Cheng
>> Cc: gcc-patches@gcc.gnu.org
>> Subject: Re: [PATCH ARM]Handle REG addressing mode in
>> output_move_neon explicitly
>>
>> On 29/04/14 04:02, bin.cheng wrote:
>> > Hi,
>> > Function output_move_neon now generates vld1.64 for memory ref like
>> > "dx <- [r1:SI]", this is bogus because it requires at least 64-bit
>> > alignment for 32-bit aligned memory ref.  It works now because GCC
>> > doesn't generate such insns in the first place, but things are going
>> > to change if memset/memcpy calls are inlined by using neon instructions.
>> >
>>
>> V[LD/ST]1.64 only need to be 64-bit aligned if strict alignment is
> enabled.  We
>> normally assume that not to be the case.  The exception to this is when an
> theoretically, this doesn't make the problem go away, right?
>
>> explicit alignment check is used in the address expression (the :64
> suffix),
>> which causes the address to be checked for strict alignment at all times.
>>
>> Do you have a testcase?
> I can't provide a test case without the memset inlining patch.
>
Are the tests in the memset inlining patch now sufficient to expose
the problem or do we need another test ?

Ramana
> Thanks,
> bin
>
>
>
>
Bin.Cheng July 2, 2014, 12:46 p.m. UTC | #4
On Wed, Jul 2, 2014 at 7:48 AM, Ramana Radhakrishnan
<ramana.gcc@googlemail.com> wrote:
> On Mon, May 5, 2014 at 8:21 AM, bin.cheng <bin.cheng@arm.com> wrote:
>>
>>
>>> -----Original Message-----
>>> From: Richard Earnshaw
>>> Sent: Thursday, May 01, 2014 10:03 PM
>>> To: Bin Cheng
>>> Cc: gcc-patches@gcc.gnu.org
>>> Subject: Re: [PATCH ARM]Handle REG addressing mode in
>>> output_move_neon explicitly
>>>
>>> On 29/04/14 04:02, bin.cheng wrote:
>>> > Hi,
>>> > Function output_move_neon now generates vld1.64 for memory ref like
>>> > "dx <- [r1:SI]", this is bogus because it requires at least 64-bit
>>> > alignment for 32-bit aligned memory ref.  It works now because GCC
>>> > doesn't generate such insns in the first place, but things are going
>>> > to change if memset/memcpy calls are inlined by using neon instructions.
>>> >
>>>
>>> V[LD/ST]1.64 only need to be 64-bit aligned if strict alignment is
>> enabled.  We
>>> normally assume that not to be the case.  The exception to this is when an
>> theoretically, this doesn't make the problem go away, right?
>>
>>> explicit alignment check is used in the address expression (the :64
>> suffix),
>>> which causes the address to be checked for strict alignment at all times.
>>>
>>> Do you have a testcase?
>> I can't provide a test case without the memset inlining patch.
>>
> Are the tests in the memset inlining patch now sufficient to expose
> the problem or do we need another test ?

Yes, it's covered by the 4th/7th test cases in memset inlining patch.

Thanks,
bin
diff mbox

Patch

Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 209852)
+++ gcc/config/arm/arm.c	(working copy)
@@ -18427,6 +18453,20 @@  output_move_neon (rtx *operands)
       /* FIXME: Not currently enabled in neon_vector_mem_operand.  */
       gcc_unreachable ();
 
+    case REG:
+      /* We have to use vldm / vstm for too-large modes.  */
+      if (nregs > 1)
+	{
+	  if (nregs > 4)
+	    templ = "v%smia%%?\t%%m0, %%h1";
+	  else
+	    templ = "v%s1.64\t%%h1, %%A0";
+
+	  ops[0] = mem;
+	  ops[1] = reg;
+	  break;
+	}
+      /* Fall through.  */
     case LABEL_REF:
     case PLUS:
       {
@@ -18460,14 +18500,7 @@  output_move_neon (rtx *operands)
       }
 
     default:
-      /* We have to use vldm / vstm for too-large modes.  */
-      if (nregs > 4)
-	templ = "v%smia%%?\t%%m0, %%h1";
-      else
-	templ = "v%s1.64\t%%h1, %%A0";
-
-      ops[0] = mem;
-      ops[1] = reg;
+      gcc_unreachable ();
     }
 
   sprintf (buff, templ, load ? "ld" : "st");