diff mbox

Fix MMX/SSE/AVX* shifts by non-immediate scalar (PR target/80286)

Message ID 20170406084007.GB17461@tucnak
State New
Headers show

Commit Message

Jakub Jelinek April 6, 2017, 8:40 a.m. UTC
On Thu, Apr 06, 2017 at 09:33:58AM +0200, Uros Bizjak wrote:
> Newly introduced alternatives (x/x) and (v/v) are valid also for
> 32-bit targets, so we have to adjust insn constraint of
> *vec_extractv4si_0_zext and enable alternatives accordingly. After the

That is true.  But if we provide just the x/x and v/v alternatives in
*vec_extractv4si_0_zext, then it will be forced to always do the zero
extraction on the SSE registers in 32-bit mode.  Is that what we want?

As for the define_insn_and_split that would transform sign extensions
used solely by the vector shifts by scalar shift count, did you mean
something like following (for every shift pattern)?


The problem with that is that apparently our infrastructure doesn't support
define_subst for define_insn_and_split (and define_split), so either we'd
need to have separate define_insn_and_split for masked and for non-masked,
or we'd need to extend the define_subst infrastructure for
define_insn_and_split somehow.  Looking say at
(define_subst "mask"
  [(set (match_operand:SUBST_V 0)
        (match_operand:SUBST_V 1))]
  "TARGET_AVX512F"
  [(set (match_dup 0)
        (vec_merge:SUBST_V
          (match_dup 1)
          (match_operand:SUBST_V 2 "vector_move_operand" "0C")
          (match_operand:<avx512fmaskmode> 3 "register_operand" "Yk")))])
that is a transformation we want to do on the define_insn part of
define_insn_and_split, but not exactly what we want to do on the split
part of the insn - there we want literaly match_dup 0, match_dup 1,
and instead of the 2 other match_operand match_dup 2 and match_dup 3.

	Jakub

Comments

Uros Bizjak April 6, 2017, 8:47 a.m. UTC | #1
On Thu, Apr 6, 2017 at 10:40 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, Apr 06, 2017 at 09:33:58AM +0200, Uros Bizjak wrote:
>> Newly introduced alternatives (x/x) and (v/v) are valid also for
>> 32-bit targets, so we have to adjust insn constraint of
>> *vec_extractv4si_0_zext and enable alternatives accordingly. After the
>
> That is true.  But if we provide just the x/x and v/v alternatives in
> *vec_extractv4si_0_zext, then it will be forced to always do the zero
> extraction on the SSE registers in 32-bit mode.  Is that what we want?

Yes, for SSE4 targets. We are sure that we have SSE source register
here, and there is no direct zero-extension to a general reg in
32-bit case.

> As for the define_insn_and_split that would transform sign extensions
> used solely by the vector shifts by scalar shift count, did you mean
> something like following (for every shift pattern)?
>
> --- sse.md.jj1  2017-04-04 19:51:01.000000000 +0200
> +++ sse.md      2017-04-06 10:26:26.877545109 +0200
> @@ -10696,6 +10696,22 @@
>     (set_attr "prefix" "orig,vex")
>     (set_attr "mode" "<sseinsnmode>")])
>
> +(define_insn_and_split "*<shift_insn><mode>3<mask_name>_1"
> +  [(set (match_operand:VI2_AVX2_AVX512BW 0 "register_operand")
> +       (any_lshift:VI2_AVX2_AVX512BW
> +         (match_operand:VI2_AVX2_AVX512BW 1 "register_operand")
> +         (sign_extend:DI (match_operand:SI 2 "nonmemory_operand"))))]
> +  "TARGET_SSE2 && <mask_mode512bit_condition> && <mask_avx512bw_condition>
> +   && can_create_pseudo_p ()"
> +  "#"
> +  "&& 1"
> +  [(set (match_dup 3) (zero_extend:DI (match_dup 2)))
> +   (set (match_dup 0) (any_lshift:VI2_AVX2_AVX512BW
> +                       (match_dup 1) (match_dup 3)))]
> +{
> +  operands[3] = gen_reg_rtx (DImode);
> +})
>
Yes, something like this. You ca use any_extend instead of
sign_extend, so the pattern will also remove possible zero_extend of
count operand.

>  (define_insn "<shift_insn><mode>3<mask_name>"
>    [(set (match_operand:VI48_AVX2 0 "register_operand" "=x,x,v")
>         (any_lshift:VI48_AVX2
>
> The problem with that is that apparently our infrastructure doesn't support
> define_subst for define_insn_and_split (and define_split), so either we'd
> need to have separate define_insn_and_split for masked and for non-masked,
> or we'd need to extend the define_subst infrastructure for
> define_insn_and_split somehow.  Looking say at
> (define_subst "mask"
>   [(set (match_operand:SUBST_V 0)
>         (match_operand:SUBST_V 1))]
>   "TARGET_AVX512F"
>   [(set (match_dup 0)
>         (vec_merge:SUBST_V
>           (match_dup 1)
>           (match_operand:SUBST_V 2 "vector_move_operand" "0C")
>           (match_operand:<avx512fmaskmode> 3 "register_operand" "Yk")))])
> that is a transformation we want to do on the define_insn part of
> define_insn_and_split, but not exactly what we want to do on the split
> part of the insn - there we want literaly match_dup 0, match_dup 1,
> and instead of the 2 other match_operand match_dup 2 and match_dup 3.

Hm, I'm not that versed in define_subst, but that looks quite a
drawback of define_subst to me.

Uros.
Jakub Jelinek April 6, 2017, 8:48 a.m. UTC | #2
On Thu, Apr 06, 2017 at 10:40:07AM +0200, Jakub Jelinek wrote:
> On Thu, Apr 06, 2017 at 09:33:58AM +0200, Uros Bizjak wrote:
> > Newly introduced alternatives (x/x) and (v/v) are valid also for
> > 32-bit targets, so we have to adjust insn constraint of
> > *vec_extractv4si_0_zext and enable alternatives accordingly. After the
> 
> That is true.  But if we provide just the x/x and v/v alternatives in
> *vec_extractv4si_0_zext, then it will be forced to always do the zero
> extraction on the SSE registers in 32-bit mode.  Is that what we want?

Also, I think we can do the zero extension even without SSE4.1,
if we have a spare SSE register (or before reload), we can use
pxor into that scratch reg and punpck* it, if we don't, we can
construct a V4SI constaint in memory with { -1, 0, 0, 0 } or so
and and with that.

	Jakub
Jakub Jelinek April 6, 2017, 9:55 a.m. UTC | #3
On Thu, Apr 06, 2017 at 10:47:03AM +0200, Uros Bizjak wrote:
> > +(define_insn_and_split "*<shift_insn><mode>3<mask_name>_1"
> > +  [(set (match_operand:VI2_AVX2_AVX512BW 0 "register_operand")
> > +       (any_lshift:VI2_AVX2_AVX512BW
> > +         (match_operand:VI2_AVX2_AVX512BW 1 "register_operand")
> > +         (sign_extend:DI (match_operand:SI 2 "nonmemory_operand"))))]
> > +  "TARGET_SSE2 && <mask_mode512bit_condition> && <mask_avx512bw_condition>
> > +   && can_create_pseudo_p ()"
> > +  "#"
> > +  "&& 1"
> > +  [(set (match_dup 3) (zero_extend:DI (match_dup 2)))
> > +   (set (match_dup 0) (any_lshift:VI2_AVX2_AVX512BW
> > +                       (match_dup 1) (match_dup 3)))]
> > +{
> > +  operands[3] = gen_reg_rtx (DImode);
> > +})
> >
> Yes, something like this. You ca use any_extend instead of
> sign_extend, so the pattern will also remove possible zero_extend of
> count operand.

The pattern splits it immediately (during split1) into a zext + shift,
so unless we let the pattern survive in this form (but then we need
constraints and it is unclear which ones) after reload, I don't see
advantage in matching it for zext, it is split exactly to what there
used to be before.

> >           (match_operand:<avx512fmaskmode> 3 "register_operand" "Yk")))])
> > that is a transformation we want to do on the define_insn part of
> > define_insn_and_split, but not exactly what we want to do on the split
> > part of the insn - there we want literaly match_dup 0, match_dup 1,
> > and instead of the 2 other match_operand match_dup 2 and match_dup 3.
> 
> Hm, I'm not that versed in define_subst, but that looks quite a
> drawback of define_subst to me.

Perhaps, but we'd need to define what it means to subst a
define_insn_and_split.

	Jakub
diff mbox

Patch

--- sse.md.jj1	2017-04-04 19:51:01.000000000 +0200
+++ sse.md	2017-04-06 10:26:26.877545109 +0200
@@ -10696,6 +10696,22 @@ 
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "<sseinsnmode>")])
 
+(define_insn_and_split "*<shift_insn><mode>3<mask_name>_1"
+  [(set (match_operand:VI2_AVX2_AVX512BW 0 "register_operand")
+	(any_lshift:VI2_AVX2_AVX512BW
+	  (match_operand:VI2_AVX2_AVX512BW 1 "register_operand")
+	  (sign_extend:DI (match_operand:SI 2 "nonmemory_operand"))))]
+  "TARGET_SSE2 && <mask_mode512bit_condition> && <mask_avx512bw_condition>
+   && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 3) (zero_extend:DI (match_dup 2)))
+   (set (match_dup 0) (any_lshift:VI2_AVX2_AVX512BW
+			(match_dup 1) (match_dup 3)))]
+{
+  operands[3] = gen_reg_rtx (DImode);
+})
+
 (define_insn "<shift_insn><mode>3<mask_name>"
   [(set (match_operand:VI48_AVX2 0 "register_operand" "=x,x,v")
 	(any_lshift:VI48_AVX2