Fix MMX/SSE/AVX* shifts by non-immediate scalar (PR target/80286)

Message ID 20170403203437.GF17461@tucnak
State New

Commit Message

Jakub Jelinek April 3, 2017, 8:34 p.m. UTC
Hi!

This patch deals just with the correctness of vector shifts by a scalar
non-immediate count.  The manuals say the shift count is bits [0:63] of
the corresponding source operand (XMM reg or memory in some cases),
and if the count is bigger than the number of bits in the vector element
minus 1, it is treated as a shift count equal to the number of bits.
We are modelling it as an SImode shift count though, so the upper 32 bits
may be random in some cases, which causes wrong code.
Fixed by using DImode, which matches what the insns do.
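
To see the wrong-code potential concretely, here is a hedged sketch
added for illustration (SSE2 intrinsics; not part of the patch): the
same low 32 bits of the count give different results once there is
garbage in bits 32..63, because the insn consumes bits [0:63]:

#include <emmintrin.h>
#include <stdio.h>

int
main (void)
{
  __m128i v = _mm_set1_epi32 (0x12345678);
  /* Count 3 in the low 64 bits of an XMM register, upper bits zero.  */
  __m128i ok = _mm_set_epi64x (0, 3);
  /* Same low 32 bits, but garbage in bits 32..63: the hardware sees
     0x100000003 > 31, so every element is shifted out to zero.  */
  __m128i bad = _mm_set_epi64x (0, (1LL << 32) | 3);
  printf ("%08x %08x\n",	/* prints 02468acf 00000000 */
	  (unsigned) _mm_cvtsi128_si32 (_mm_srl_epi32 (v, ok)),
	  (unsigned) _mm_cvtsi128_si32 (_mm_srl_epi32 (v, bad)));
  return 0;
}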

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Any thoughts on what to do to generate reasonable code when the shift count
comes from memory (e.g. as an int variable) or is in the low bits of some
XMM register?
First of all, perhaps we could have some combiner (or peephole) pattern that would
transform a sign-extend from e.g. SI to DI on the shift count into a zero-extend
if there are no other uses of the extension result - if the shift count is
negative in SImode (or even QImode), then it is already a large number and the
upper 32 bits or more don't really change anything about that.
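
Why that transformation is safe - a hedged C model, added for
illustration: with a saturating count, sign- and zero-extension of the
low 32 bits are interchangeable, since a negative SImode count exceeds
any element width under either extension:

#include <stdint.h>
#include <assert.h>

/* Models the saturating hardware count for 32-bit elements: any
   count > 31 shifts everything out.  */
static uint32_t
shl_sat (uint32_t x, uint64_t cnt)
{
  return cnt > 31 ? 0 : x << cnt;
}

int
main (void)
{
  for (int32_t c = -64; c < 64; c++)
    {
      uint64_t sext = (uint64_t) (int64_t) c;	/* movslq */
      uint64_t zext = (uint32_t) c;		/* movl */
      assert (shl_sat (0xdeadbeefU, sext) == shl_sat (0xdeadbeefU, zext));
    }
  return 0;
}
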
Then perhaps we could emit pmovzxdq for SSE4.1+ instead of going through
GPRs and back, or for SSE2 a pxor on a scratch reg and punpck* to get it
zero extended.  Not sure if we want to add a =v / vm alternative to
zero_extendsidi2*; it already has some x alternatives, but with ?s that
prevent the RA from using them.  So, thoughts on that?
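
For reference, the two in-XMM zero-extension strategies mentioned above,
sketched with intrinsics (a hedged addition; the patch itself works at
the RTL level):

#include <smmintrin.h>	/* SSE4.1 intrinsics; also provides the SSE2 ones */

/* SSE4.1: pmovzxdq zero-extends the low dwords to qwords.  */
__m128i
zext_low_dword_sse41 (__m128i x)
{
  return _mm_cvtepu32_epi64 (x);
}

/* SSE2: interleave with a zeroed scratch reg (pxor + punpckldq).  */
__m128i
zext_low_dword_sse2 (__m128i x)
{
  return _mm_unpacklo_epi32 (x, _mm_setzero_si128 ());
}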

2017-04-03  Jakub Jelinek  <jakub@redhat.com>

	PR target/80286
	* config/i386/i386.c (ix86_expand_args_builtin): If op has scalar
	int mode, convert_modes it to mode as unsigned, otherwise use
	lowpart_subreg to mode rather than SImode.
	* config/i386/sse.md (<mask_codefor>ashr<mode>3<mask_name>,
	ashr<mode>3, ashr<mode>3<mask_name>, <shift_insn><mode>3<mask_name>):
	Use DImode instead of SImode for the shift count operand.
	* config/i386/mmx.md (mmx_ashr<mode>3, mmx_<shift_insn><mode>3):
	Likewise.
testsuite/
	* gcc.target/i386/avx-pr80286.c: New test.
	* gcc.dg/pr80286.c: New test.


	Jakub

Comments

Uros Bizjak April 4, 2017, 6:39 a.m. UTC | #1
On Mon, Apr 3, 2017 at 10:34 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> Hi!
>
> This patch deals just with the correctness of vector shifts by a scalar
> non-immediate count.  The manuals say the shift count is bits [0:63] of
> the corresponding source operand (XMM reg or memory in some cases),
> and if the count is bigger than the number of bits in the vector element
> minus 1, it is treated as a shift count equal to the number of bits.
> We are modelling it as an SImode shift count though, so the upper 32 bits
> may be random in some cases, which causes wrong code.
> Fixed by using DImode, which matches what the insns do.

IIRC, SImode was chosen to simplify GPR->XMM register moves on 32-bit
targets. It does look like this was the wrong choice from the correctness
point of view.

> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> Any thoughts on what to do to generate reasonable code when the shift count
> comes from memory (e.g. as an int variable) or is in the low bits of some
> XMM register?

The problem with an int variable from memory is that the shifts access the
full 128 bits of their count operand, so this is effectively a no-go. If
there is a 128-bit count value in memory, we can maybe define a shift
pattern with:

(subreg:DI (match_operand:V2DI 2 "general_operand" "xmN,vmN"))

?

> First of all, perhaps we could have some combiner (or peephole) pattern that would
> transform a sign-extend from e.g. SI to DI on the shift count into a zero-extend
> if there are no other uses of the extension result - if the shift count is
> negative in SImode (or even QImode), then it is already a large number and the
> upper 32 bits or more don't really change anything about that.

We can introduce shift patterns with embedded extensions, and split
them into zext + shift. These new patterns can be easily macroized with
the any_extend code iterator and the SWI124 mode iterator, so we avoid
pattern explosion.

> Then perhaps we could emit pmovzxdq for SSE4.1+ instead of going through
> GPRs and back, or for SSE2 a pxor on a scratch reg and punpck* to get it
> zero extended.  Not sure if we want to add a =v / vm alternative to
> zero_extendsidi2*; it already has some x alternatives, but with ?s that
> prevent the RA from using them.  So, thoughts on that?

The ? is there to discourage the RA from allocating an xmm reg (all these
alternatives have * on the xmm reg), in effect instructing the RA to prefer
GPRs. If the value is already in an xmm reg, then I expect the ? alternative
will be used. So, yes, a v/v alternative as you proposed would be a good
addition to the zero_extendsidi alternatives. Please note though that
pmovzxdq operates on a vector value, so memory operands should be
avoided.

>
> 2017-04-03  Jakub Jelinek  <jakub@redhat.com>
>
>         PR target/80286
>         * config/i386/i386.c (ix86_expand_args_builtin): If op has scalar
>         int mode, convert_modes it to mode as unsigned, otherwise use
>         lowpart_subreg to mode rather than SImode.
>         * config/i386/sse.md (<mask_codefor>ashr<mode>3<mask_name>,
>         ashr<mode>3, ashr<mode>3<mask_name>, <shift_insn><mode>3<mask_name>):
>         Use DImode instead of SImode for the shift count operand.
>         * config/i386/mmx.md (mmx_ashr<mode>3, mmx_<shift_insn><mode>3):
>         Likewise.
> testsuite/
>         * gcc.target/i386/avx-pr80286.c: New test.
>         * gcc.dg/pr80286.c: New test.

OK for trunk and backports.

Thanks,
Uros.

Jakub Jelinek April 4, 2017, noon UTC | #2
On Tue, Apr 04, 2017 at 08:39:59AM +0200, Uros Bizjak wrote:
> > Any thoughts on what to do to generate reasonable code when the shift count
> > comes from memory (e.g. as an int variable) or is in the low bits of some
> > XMM register?
> 
> The problem with an int variable from memory is that the shifts access the
> full 128 bits of their count operand, so this is effectively a no-go. If
> there is a 128-bit count value in memory, we can maybe define a shift
> pattern with:
> 
> (subreg:DI (match_operand:V2DI 2 "general_operand" "xmN,vmN"))
> 
> ?

Well, if the original memory is, say, an int, then we can't just read it as
V2DI or V4SI.
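
A hedged intrinsics sketch of the point, added for illustration: a full
128-bit load from the int's address would read 12 bytes past the object,
so the count has to be widened explicitly, e.g. via movd:

#include <emmintrin.h>

__m128i
shift_by_int_count (__m128i x, const int *cnt)
{
  /* Wrong: _mm_loadu_si128 ((const __m128i *) cnt) would read 16 bytes,
     although the object is only 4 bytes long.  */
  /* Safe: movd loads the 32-bit count zero-extended into an XMM reg.  */
  return _mm_srl_epi32 (x, _mm_cvtsi32_si128 (*cnt));
}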

> > First of all, perhaps we could have some combiner (or peephole) pattern that would
> > transform a sign-extend from e.g. SI to DI on the shift count into a zero-extend
> > if there are no other uses of the extension result - if the shift count is
> > negative in SImode (or even QImode), then it is already a large number and the
> > upper 32 bits or more don't really change anything about that.
> 
> We can introduce shift patterns with embedded extensions, and split
> them into zext + shift. These new patterns can be easily macroized with
> the any_extend code iterator and the SWI124 mode iterator, so we avoid
> pattern explosion.

I assume we split those before reload, because we want to give reload a
chance to do the zero extension on GPRs if that is more beneficial, and it
might choose to store the value into memory and load it into XMM from
memory, which is hard to do after reload.

> > Then perhaps we could emit pmovzxdq for SSE4.1+ instead of going through
> > GPRs and back, or for SSE2 a pxor on a scratch reg and punpck* to get it
> > zero extended.  Not sure if we want to add a =v / vm alternative to
> > zero_extendsidi2*; it already has some x alternatives, but with ?s that
> > prevent the RA from using them.  So, thoughts on that?
> 
> The ? is there to discourage the RA from allocating an xmm reg (all these
> alternatives have * on the xmm reg), in effect instructing the RA to prefer
> GPRs. If the value is already in an xmm reg, then I expect the ? alternative
> will be used. So, yes, a v/v alternative as you proposed would be a good
> addition to the zero_extendsidi alternatives. Please note though that
> pmovzxdq operates on a vector value, so memory operands should be
> avoided.

With a ? in front of it, or without?  I admit I've only tried this so far:
@@ -4049,24 +4049,29 @@ (define_expand "extendsidi2"
 })

 (define_insn "*extendsidi2_rex64"
-  [(set (match_operand:DI 0 "register_operand" "=*a,r")
-       (sign_extend:DI (match_operand:SI 1 "nonimmediate_operand" "*0,rm")))]
+  [(set (match_operand:DI 0 "register_operand" "=*a,r,v")
+       (sign_extend:DI (match_operand:SI 1 "nonimmediate_operand" "*0,rm,vm")))]
   "TARGET_64BIT"
   "@
    {cltq|cdqe}
-   movs{lq|x}\t{%1, %0|%0, %1}"
-  [(set_attr "type" "imovx")
-   (set_attr "mode" "DI")
-   (set_attr "prefix_0f" "0")
-   (set_attr "modrm" "0,1")])
+   movs{lq|x}\t{%1, %0|%0, %1}
+   %vpmovsxdq\t{%1, %0|%0, %1}"
+  [(set_attr "isa" "*,*,sse4")
+   (set_attr "type" "imovx,imovx,ssemov")
+   (set_attr "mode" "DI,DI,TI")
+   (set_attr "prefix_0f" "0,0,*")
+   (set_attr "prefix_extra" "*,*,1")
+   (set_attr "prefix" "orig,orig,maybe_evex")
+   (set_attr "modrm" "0,1,*")])


and with the ? in front of the v it for some reason didn't trigger.
I'll try zero_extendsidi2 now and see how it works.

> OK for trunk and backports.

Committed to trunk so far; backports in a week or so, when I backport
dozens of other patches together with it.

	Jakub
Uros Bizjak April 4, 2017, 12:33 p.m. UTC | #3
On Tue, Apr 4, 2017 at 2:00 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Apr 04, 2017 at 08:39:59AM +0200, Uros Bizjak wrote:
>> > Any thoughts on what to do to generate reasonable code when the shift count
>> > comes from memory (e.g. as an int variable) or is in the low bits of some
>> > XMM register?
>>
>> The problem with an int variable from memory is that the shifts access the
>> full 128 bits of their count operand, so this is effectively a no-go. If
>> there is a 128-bit count value in memory, we can maybe define a shift
>> pattern with:
>>
>> (subreg:DI (match_operand:V2DI 2 "general_operand" "xmN,vmN"))
>>
>> ?
>
> Well, if the original memory is, say, an int, then we can't just read it as
> V2DI or V4SI.

Of course. The above was for the case when we *want* to load from
memory. The insn loads the full 128-bit value.

>> > First of all, perhaps we could have some combiner (or peephole) pattern that would
>> > transform a sign-extend from e.g. SI to DI on the shift count into a zero-extend
>> > if there are no other uses of the extension result - if the shift count is
>> > negative in SImode (or even QImode), then it is already a large number and the
>> > upper 32 bits or more don't really change anything about that.
>>
>> We can introduce shift patterns with embedded extensions, and split
>> them into zext + shift. These new patterns can be easily macroized with
>> the any_extend code iterator and the SWI124 mode iterator, so we avoid
>> pattern explosion.
>
> I assume we split those before reload, because we want to give reload a
> chance to do the zero extension on GPRs if that is more beneficial, and it
> might choose to store the value into memory and load it into XMM from
> memory, which is hard to do after reload.

Yes, split before reload, and hope that the alternatives' decorations play
well with the RA.

>> > Then perhaps we could emit pmovzxdq for SSE4.1+ instead of going through
>> > GPRs and back, or for SSE2 a pxor on a scratch reg and punpck* to get it
>> > zero extended.  Not sure if we want to add a =v / vm alternative to
>> > zero_extendsidi2*; it already has some x alternatives, but with ?s that
>> > prevent the RA from using them.  So, thoughts on that?
>>
>> The ? is there to discourage the RA from allocating an xmm reg (all these
>> alternatives have * on the xmm reg), in effect instructing the RA to prefer
>> GPRs. If the value is already in an xmm reg, then I expect the ? alternative
>> will be used. So, yes, a v/v alternative as you proposed would be a good
>> addition to the zero_extendsidi alternatives. Please note though that
>> pmovzxdq operates on a vector value, so memory operands should be
>> avoided.
>
> With a ? in front of it, or without?  I admit I've only tried this so far:

I'd leave ?* in this case. In my experience, the RA allocates an alternative
with ?* only when really needed.

> @@ -4049,24 +4049,29 @@ (define_expand "extendsidi2"
>  })
>
>  (define_insn "*extendsidi2_rex64"
> -  [(set (match_operand:DI 0 "register_operand" "=*a,r")
> -       (sign_extend:DI (match_operand:SI 1 "nonimmediate_operand" "*0,rm")))]
> +  [(set (match_operand:DI 0 "register_operand" "=*a,r,v")
> +       (sign_extend:DI (match_operand:SI 1 "nonimmediate_operand" "*0,rm,vm")))]
>    "TARGET_64BIT"
>    "@
>     {cltq|cdqe}
> -   movs{lq|x}\t{%1, %0|%0, %1}"
> -  [(set_attr "type" "imovx")
> -   (set_attr "mode" "DI")
> -   (set_attr "prefix_0f" "0")
> -   (set_attr "modrm" "0,1")])
> +   movs{lq|x}\t{%1, %0|%0, %1}
> +   %vpmovsxdq\t{%1, %0|%0, %1}"
> +  [(set_attr "isa" "*,*,sse4")
> +   (set_attr "type" "imovx,imovx,ssemov")
> +   (set_attr "mode" "DI,DI,TI")
> +   (set_attr "prefix_0f" "0,0,*")
> +   (set_attr "prefix_extra" "*,*,1")
> +   (set_attr "prefix" "orig,orig,maybe_evex")
> +   (set_attr "modrm" "0,1,*")])
>
>
> and with the ? in front of the v it for some reason didn't trigger.
> I'll try zero_extendsidi2 now and see how it works.
>
>> OK for trunk and backports.
>
> Committed to trunk so far; backports in a week or so, when I backport
> dozens of other patches together with it.
>
>         Jakub

Uros.

Patch

--- gcc/config/i386/i386.c.jj	2017-04-03 10:40:22.000000000 +0200
+++ gcc/config/i386/i386.c	2017-04-03 18:31:39.482367634 +0200
@@ -35582,10 +35582,17 @@  ix86_expand_args_builtin (const struct b
 	{
 	  /* SIMD shift insns take either an 8-bit immediate or
 	     register as count.  But builtin functions take int as
-	     count.  If count doesn't match, we put it in register.  */
+	     count.  If count doesn't match, we put it in register.
+	     The instructions are using 64-bit count, if op is just
+	     32-bit, zero-extend it, as negative shift counts
+	     are undefined behavior and zero-extension is more
+	     efficient.  */
 	  if (!match)
 	    {
-	      op = lowpart_subreg (SImode, op, GET_MODE (op));
+	      if (SCALAR_INT_MODE_P (GET_MODE (op)))
+		op = convert_modes (mode, GET_MODE (op), op, 1);
+	      else
+		op = lowpart_subreg (mode, op, GET_MODE (op));
 	      if (!insn_p->operand[i + 1].predicate (op, mode))
 		op = copy_to_reg (op);
 	    }
--- gcc/config/i386/sse.md.jj	2017-04-03 13:43:50.179572564 +0200
+++ gcc/config/i386/sse.md	2017-04-03 18:01:19.713852914 +0200
@@ -10620,7 +10620,7 @@  (define_insn "<mask_codefor>ashr<mode>3<
   [(set (match_operand:VI24_AVX512BW_1 0 "register_operand" "=v,v")
 	(ashiftrt:VI24_AVX512BW_1
 	  (match_operand:VI24_AVX512BW_1 1 "nonimmediate_operand" "v,vm")
-	  (match_operand:SI 2 "nonmemory_operand" "v,N")))]
+	  (match_operand:DI 2 "nonmemory_operand" "v,N")))]
   "TARGET_AVX512VL"
   "vpsra<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sseishft")
@@ -10634,7 +10634,7 @@  (define_insn "ashr<mode>3"
   [(set (match_operand:VI24_AVX2 0 "register_operand" "=x,x")
 	(ashiftrt:VI24_AVX2
 	  (match_operand:VI24_AVX2 1 "register_operand" "0,x")
-	  (match_operand:SI 2 "nonmemory_operand" "xN,xN")))]
+	  (match_operand:DI 2 "nonmemory_operand" "xN,xN")))]
   "TARGET_SSE2"
   "@
    psra<ssemodesuffix>\t{%2, %0|%0, %2}
@@ -10667,7 +10667,7 @@  (define_insn "ashr<mode>3<mask_name>"
   [(set (match_operand:VI248_AVX512BW_AVX512VL 0 "register_operand" "=v,v")
 	(ashiftrt:VI248_AVX512BW_AVX512VL
 	  (match_operand:VI248_AVX512BW_AVX512VL 1 "nonimmediate_operand" "v,vm")
-	  (match_operand:SI 2 "nonmemory_operand" "v,N")))]
+	  (match_operand:DI 2 "nonmemory_operand" "v,N")))]
   "TARGET_AVX512F"
   "vpsra<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sseishft")
@@ -10681,7 +10681,7 @@  (define_insn "<shift_insn><mode>3<mask_n
   [(set (match_operand:VI2_AVX2_AVX512BW 0 "register_operand" "=x,v")
 	(any_lshift:VI2_AVX2_AVX512BW
 	  (match_operand:VI2_AVX2_AVX512BW 1 "register_operand" "0,v")
-	  (match_operand:SI 2 "nonmemory_operand" "xN,vN")))]
+	  (match_operand:DI 2 "nonmemory_operand" "xN,vN")))]
   "TARGET_SSE2 && <mask_mode512bit_condition> && <mask_avx512bw_condition>"
   "@
    p<vshift><ssemodesuffix>\t{%2, %0|%0, %2}
@@ -10700,7 +10700,7 @@  (define_insn "<shift_insn><mode>3<mask_n
   [(set (match_operand:VI48_AVX2 0 "register_operand" "=x,x,v")
 	(any_lshift:VI48_AVX2
 	  (match_operand:VI48_AVX2 1 "register_operand" "0,x,v")
-	  (match_operand:SI 2 "nonmemory_operand" "xN,xN,vN")))]
+	  (match_operand:DI 2 "nonmemory_operand" "xN,xN,vN")))]
   "TARGET_SSE2 && <mask_mode512bit_condition>"
   "@
    p<vshift><ssemodesuffix>\t{%2, %0|%0, %2}
@@ -10720,7 +10720,7 @@  (define_insn "<shift_insn><mode>3<mask_n
   [(set (match_operand:VI48_512 0 "register_operand" "=v,v")
 	(any_lshift:VI48_512
 	  (match_operand:VI48_512 1 "nonimmediate_operand" "v,m")
-	  (match_operand:SI 2 "nonmemory_operand" "vN,N")))]
+	  (match_operand:DI 2 "nonmemory_operand" "vN,N")))]
   "TARGET_AVX512F && <mask_mode512bit_condition>"
   "vp<vshift><ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "isa" "avx512f")
--- gcc/config/i386/mmx.md.jj	2017-04-03 13:43:50.119573339 +0200
+++ gcc/config/i386/mmx.md	2017-04-03 18:01:19.708852979 +0200
@@ -930,7 +930,7 @@  (define_insn "mmx_ashr<mode>3"
   [(set (match_operand:MMXMODE24 0 "register_operand" "=y")
         (ashiftrt:MMXMODE24
 	  (match_operand:MMXMODE24 1 "register_operand" "0")
-	  (match_operand:SI 2 "nonmemory_operand" "yN")))]
+	  (match_operand:DI 2 "nonmemory_operand" "yN")))]
   "TARGET_MMX"
   "psra<mmxvecsize>\t{%2, %0|%0, %2}"
   [(set_attr "type" "mmxshft")
@@ -944,7 +944,7 @@  (define_insn "mmx_<shift_insn><mode>3"
   [(set (match_operand:MMXMODE248 0 "register_operand" "=y")
         (any_lshift:MMXMODE248
 	  (match_operand:MMXMODE248 1 "register_operand" "0")
-	  (match_operand:SI 2 "nonmemory_operand" "yN")))]
+	  (match_operand:DI 2 "nonmemory_operand" "yN")))]
   "TARGET_MMX"
   "p<vshift><mmxvecsize>\t{%2, %0|%0, %2}"
   [(set_attr "type" "mmxshft")
--- gcc/testsuite/gcc.target/i386/avx-pr80286.c.jj	2017-04-03 18:44:07.552698281 +0200
+++ gcc/testsuite/gcc.target/i386/avx-pr80286.c	2017-04-03 18:43:51.000000000 +0200
@@ -0,0 +1,26 @@ 
+/* PR target/80286 */
+/* { dg-do run { target avx } } */
+/* { dg-options "-O2 -mavx" } */
+
+#include "avx-check.h"
+#include <immintrin.h>
+
+__m256i m;
+
+__attribute__((noinline, noclone)) __m128i
+foo (__m128i x)
+{
+  int s = _mm_cvtsi128_si32 (_mm256_castsi256_si128 (m));
+  return _mm_srli_epi16 (x, s);
+}
+
+static void
+avx_test (void)
+{
+  __m128i a = (__m128i) (__v8hi) { 1 << 7, 2 << 8, 3 << 9, 4 << 10, 5 << 11, 6 << 12, 7 << 13, 8 << 12 };
+  m = (__m256i) (__v8si) { 7, 8, 9, 10, 11, 12, 13, 14 };
+  __m128i c = foo (a);
+  __m128i b = (__m128i) (__v8hi) { 1, 2 << 1, 3 << 2, 4 << 3, 5 << 4, 6 << 5, 7 << 6, 8 << 5 };
+  if (__builtin_memcmp (&c, &b, sizeof (__m128i)))
+    __builtin_abort ();
+}
--- gcc/testsuite/gcc.dg/pr80286.c.jj	2017-04-03 18:45:27.574663948 +0200
+++ gcc/testsuite/gcc.dg/pr80286.c	2017-04-03 18:45:18.386782707 +0200
@@ -0,0 +1,23 @@ 
+/* PR target/80286 */
+/* { dg-do run } */
+/* { dg-options "-O2 -Wno-psabi" } */
+
+typedef int V __attribute__((vector_size (4 * sizeof (int))));
+
+__attribute__((noinline, noclone)) V
+foo (V x, V y)
+{
+  return x << y[0];
+}
+
+int
+main ()
+{
+  V x = { 1, 2, 3, 4 };
+  V y = { 5, 6, 7, 8 };
+  V z = foo (x, y);
+  V e = { 1 << 5, 2 << 5, 3 << 5, 4 << 5 };
+  if (__builtin_memcmp (&z, &e, sizeof (V)))
+    __builtin_abort ();
+  return 0;
+}