Patchwork [ARM] one_cmpldi2 in NEON

login
register
mail settings
Submitter Andrew Stubbs
Date Dec. 6, 2011, 5:59 p.m.
Message ID <4EDE5810.1070204@codesourcery.com>
Download mbox | patch
Permalink /patch/129772/
State New
Headers show

Comments

Andrew Stubbs - Dec. 6, 2011, 5:59 p.m.
This patch adds a one's complement pattern for doing DImode 'not' in 
NEON registers.

There are already patterns for doing one's complement of vectors, and 
even though it boils down to the same instruction, the DImode case was 
missing.

The patch needs to be a little more complicated than using a mode 
iterator that includes DI because it needs to coexist with the non-neon 
one_cmpldi2 (renamed by this patch to "one_cmpldi2_core").

OK for when stage 1 opens again?

Andrew
Julian Brown - Dec. 6, 2011, 6:43 p.m.
On Tue, 06 Dec 2011 17:59:44 +0000
Andrew Stubbs <ams@codesourcery.com> wrote:

> This patch adds a one's complement pattern for doing DImode 'not' in 
> NEON registers.
> 
> There are already patterns for doing one's complement of vectors, and 
> even though it boils down to the same instruction, the DImode case
> was missing.
> 
> The patch needs to be a little more complicated than using a mode 
> iterator that includes DI because it needs to coexist with the
> non-neon one_cmpldi2 (renamed by this patch to "one_cmpldi2_core").
> 
> OK for when stage 1 opens again?
> 
> Andrew

+(define_insn "*one_cmpldi2_neon"
+  [(set (match_operand:DI 0 "s_register_operand"	 "=w,?&r,?&r,?w")
+	(not:DI (match_operand:DI 1 "s_register_operand" " w,  0,  r, w")))]
+  "TARGET_NEON"
+  "@
+  vmvn\t%P0, %P1
+  #
+  #
+  vmvn\t%P0, %P1"
+  [(set_attr "neon_type" "neon_int_1,*,*,neon_int_1")
+   (set_attr "length" "*,8,8,*")
+   (set_attr "arch" "nota8,*,*,onlya8")]
+)

Don't you need to specify an element type on those instructions? (".s64"
maybe)?

Julian
Stubbs, Andrew - Dec. 6, 2011, 8:28 p.m.
On Tue 06 Dec 2011 18:43:05 GMT, Julian Brown wrote:
> On Tue, 06 Dec 2011 17:59:44 +0000
> Andrew Stubbs<ams@codesourcery.com>  wrote:
>
>> This patch adds a one's complement pattern for doing DImode 'not' in
>> NEON registers.
>>
>> There are already patterns for doing one's complement of vectors, and
>> even though it boils down to the same instruction, the DImode case
>> was missing.
>>
>> The patch needs to be a little more complicated than using a mode
>> iterator that includes DI because it needs to coexist with the
>> non-neon one_cmpldi2 (renamed by this patch to "one_cmpldi2_core").
>>
>> OK for when stage 1 opens again?
>>
>> Andrew
>
> +(define_insn "*one_cmpldi2_neon"
> +  [(set (match_operand:DI 0 "s_register_operand"	 "=w,?&r,?&r,?w")
> +	(not:DI (match_operand:DI 1 "s_register_operand" " w,  0,  r, w")))]
> +  "TARGET_NEON"
> +  "@
> +  vmvn\t%P0, %P1
> +  #
> +  #
> +  vmvn\t%P0, %P1"
> +  [(set_attr "neon_type" "neon_int_1,*,*,neon_int_1")
> +   (set_attr "length" "*,8,8,*")
> +   (set_attr "arch" "nota8,*,*,onlya8")]
> +)
>
> Don't you need to specify an element type on those instructions? (".s64"
> maybe)?

No; I believe you can, but it's ignored, given that it makes zero 
difference. Same for vand, vorr, veor etc. In any case this was copied 
from the existing "one_cmpl<mode>2" pattern, and gas accepts it.

Andrew
Richard Henderson - Dec. 6, 2011, 9:05 p.m.
On 12/06/2011 09:59 AM, Andrew Stubbs wrote:
> +(define_insn "*one_cmpldi2_neon"
> +  [(set (match_operand:DI 0 "s_register_operand"	 "=w,?&r,?&r,?w")
> +	(not:DI (match_operand:DI 1 "s_register_operand" " w,  0,  r, w")))]

alternative 0 == alternative 3?


r~
Stubbs, Andrew - Dec. 6, 2011, 9:42 p.m.
On Tue 06 Dec 2011 21:05:30 GMT, Richard Henderson wrote:
> On 12/06/2011 09:59 AM, Andrew Stubbs wrote:
>> +(define_insn "*one_cmpldi2_neon"
>> +  [(set (match_operand:DI 0 "s_register_operand"	 "=w,?&r,?&r,?w")
>> +	(not:DI (match_operand:DI 1 "s_register_operand" " w,  0,  r, w")))]
>
> alternative 0 == alternative 3?

Yes and no. This is an idiom used in several places in neon.md. They 
are the same, but only one or other is enabled at the same time (see 
the "arch" attribute) so while both 'w' and 'r' options are always 
available, the order of preference is different.

If you have a better way of achieving this goal, then I'm eager to hear 
it.

FYI, the reason is that Cortex-A8 supports NEON, but the cost of 
transferring values between the register files is prohibitively high so 
64-bit NEON operations are discouraged unless the values are already in 
NEON, or core register pressure makes it more worthwhile.

Andrew
Richard Henderson - Dec. 6, 2011, 11:07 p.m.
On 12/06/2011 01:42 PM, Andrew Stubbs wrote:
> On Tue 06 Dec 2011 21:05:30 GMT, Richard Henderson wrote:
>> On 12/06/2011 09:59 AM, Andrew Stubbs wrote:
>>> +(define_insn "*one_cmpldi2_neon"
>>> +  [(set (match_operand:DI 0 "s_register_operand"     "=w,?&r,?&r,?w")
>>> +    (not:DI (match_operand:DI 1 "s_register_operand" " w,  0,  r, w")))]
>>
>> alternative 0 == alternative 3?
> 
> Yes and no. This is an idiom used in several places in neon.md. They
> are the same, but only one or other is enabled at the same time (see
> the "arch" attribute) so while both 'w' and 'r' options are always
> available, the order of preference is different.

Except that without *, I don't think this is what is actually achieved.
Perhaps I'm mistaken about how register preferencing is handled in the
current state of the register allocator...


r~
Stubbs, Andrew - Dec. 7, 2011, 9:15 a.m.
On 06/12/11 23:07, Richard Henderson wrote:
> On 12/06/2011 01:42 PM, Andrew Stubbs wrote:
>> On Tue 06 Dec 2011 21:05:30 GMT, Richard Henderson wrote:
>>> On 12/06/2011 09:59 AM, Andrew Stubbs wrote:
>>>> +(define_insn "*one_cmpldi2_neon"
>>>> +  [(set (match_operand:DI 0 "s_register_operand"     "=w,?&r,?&r,?w")
>>>> +    (not:DI (match_operand:DI 1 "s_register_operand" " w,  0,  r, w")))]
>>>
>>> alternative 0 == alternative 3?
>>
>> Yes and no. This is an idiom used in several places in neon.md. They
>> are the same, but only one or other is enabled at the same time (see
>> the "arch" attribute) so while both 'w' and 'r' options are always
>> available, the order of preference is different.
>
> Except that without *, I don't think this is what is actually achieved.
> Perhaps I'm mistaken about how register preferencing is handled in the
> current state of the register allocator...

Well .... I can demonstrate that it does work in a simple testcase, and 
it has a visible affect in benchmark figures.

Of course, that doesn't mean it doing it the right way. As far as I 
understand it the current implementation relies on the left-to-right 
priority of the constraints.

On A9, and other A-series parts, we want it to prefer 'w'-constraint 
registers, but still use 'r'-constraint registers if those are more 
convenient.

On A8, we want it to prefer 'r' registers, all else being equal, but 
still have the freedom to use 'w' registers if the values are already there.

 From reading the internals manual, it's not clear to me what the '*' 
constraint modifier really means, or how it would work in this case? 
Could you enlighten me?

Andrew
Richard Earnshaw - Dec. 7, 2011, 4:25 p.m.
On 06/12/11 17:59, Andrew Stubbs wrote:
> This patch adds a one's complement pattern for doing DImode 'not' in 
> NEON registers.
> 
> There are already patterns for doing one's complement of vectors, and 
> even though it boils down to the same instruction, the DImode case was 
> missing.
> 
> The patch needs to be a little more complicated than using a mode 
> iterator that includes DI because it needs to coexist with the non-neon 
> one_cmpldi2 (renamed by this patch to "one_cmpldi2_core").
> 
> OK for when stage 1 opens again?
> 
> Andrew
> 
> 
> neon-not.patch
> 
> 
> 2011-12-06  Andrew Stubbs  <ams@codesourcery.com>
> 
> 	gcc/
> 	* config/arm/arm.md (one_cmpldi2): Rename to ...
> 	(one_cmpldi2_core): ... this, and modify it to prevent it being
> 	used for NEON.
> 	(one_cmpldi2): New define_expand.
> 	* config/arm/neon.md (one_cmpldi2_neon): New define_insn.


> +(define_insn_and_split "*one_cmpldi2_core"
> +  [(set (match_operand:DI 0 "arm_general_register_operand" "=&r,&r")
> +	(not:DI (match_operand:DI 1 "arm_general_register_operand" "0,r")))]

Thinking about it, for an operation with one input and one output, there's no need for the
earlyclobber marker when the input is tied to the output (there's no other operand that can be
clobbered).

Otherwise OK.

R.

Patch

2011-12-06  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* config/arm/arm.md (one_cmpldi2): Rename to ...
	(one_cmpldi2_core): ... this, and modify it to prevent it being
	used for NEON.
	(one_cmpldi2): New define_expand.
	* config/arm/neon.md (one_cmpldi2_neon): New define_insn.

--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4199,10 +4199,16 @@ 
   "TARGET_32BIT && TARGET_HARD_FLOAT && (TARGET_FPA || TARGET_VFP_DOUBLE)"
   "")
 
-(define_insn_and_split "one_cmpldi2"
-  [(set (match_operand:DI 0 "s_register_operand" "=&r,&r")
-	(not:DI (match_operand:DI 1 "s_register_operand" "0,r")))]
+(define_expand "one_cmpldi2"
+  [(set (match_operand:DI 0 "s_register_operand" "")
+	(not:DI (match_operand:DI 1 "s_register_operand" "")))]
   "TARGET_32BIT"
+  "")
+
+(define_insn_and_split "*one_cmpldi2_core"
+  [(set (match_operand:DI 0 "arm_general_register_operand" "=&r,&r")
+	(not:DI (match_operand:DI 1 "arm_general_register_operand" "0,r")))]
+  "TARGET_32BIT && !TARGET_NEON"
   "#"
   "TARGET_32BIT && reload_completed"
   [(set (match_dup 0) (not:SI (match_dup 1)))
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -896,6 +896,20 @@ 
   [(set_attr "neon_type" "neon_int_1")]
 )
 
+(define_insn "*one_cmpldi2_neon"
+  [(set (match_operand:DI 0 "s_register_operand"	 "=w,?&r,?&r,?w")
+	(not:DI (match_operand:DI 1 "s_register_operand" " w,  0,  r, w")))]
+  "TARGET_NEON"
+  "@
+  vmvn\t%P0, %P1
+  #
+  #
+  vmvn\t%P0, %P1"
+  [(set_attr "neon_type" "neon_int_1,*,*,neon_int_1")
+   (set_attr "length" "*,8,8,*")
+   (set_attr "arch" "nota8,*,*,onlya8")]
+)
+
 (define_insn "abs<mode>2"
   [(set (match_operand:VDQW 0 "s_register_operand" "=w")
 	(abs:VDQW (match_operand:VDQW 1 "s_register_operand" "w")))]