diff mbox series

Fix expand_mult_const (PR middle-end/87138)

Message ID 20180830071911.GI2218@tucnak
State New
Headers show
Series Fix expand_mult_const (PR middle-end/87138) | expand

Commit Message

Jakub Jelinek Aug. 30, 2018, 7:19 a.m. UTC
Hi!

The following testcase is miscompiled, because expand_mult_const adds an
incorrect REG_EQUAL note to a temporary pseudo.  In the testcase, we
do an unsigned __int128 multiplication of a variable by
0x7fffffffffffffff, which is determined to be best performed as
shift left by 63 (multiplication by 0x8000000000000000U) followed by
subtraction, i.e. (x << 63) - x.  The val_so_far is tracked in an UHWI and
is necessarily always non-negative even because the caller ensures that,
if (is_neg && mode_bitsize > HOST_BITS_PER_WIDE_INT) then it performs
expand_mult_const on the negation and negates afterwards the result.
The problem is that it calls gen_int_mode, where the argument is signed hwi
(well, these days poly_int64).  That is fine for DImode multiplication, we
don't really care about bits above the mode, but for TImode multiplication
it is significant.  Without this patch we emit in that case:
     (expr_list:REG_EQUAL (mult:TI (reg/v:TI 85 [ x ])
            (const_int -9223372036854775808 [0x8000000000000000]))
but that is for TImode actually multiplication by
0xffffffffffffffff8000000000000000
rather than
0x00000000000000008000000000000000, so we need to emit
	    (const_wide_int 0x08000000000000000)
instead.  Bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk and after a while for 8.3?

2018-08-30  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/87138
	* expmed.c (expand_mult_const): If val_so_far has MSB set, use
	immed_wide_int_const instead of gen_int_mode.  Formatting fixes.

	* gcc.target/i386/avx512bw-pr87138.c: New test.


	Jakub

Comments

Richard Biener Aug. 30, 2018, 7:31 a.m. UTC | #1
On Thu, 30 Aug 2018, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase is miscompiled, because expand_mult_const adds an
> incorrect REG_EQUAL note to a temporary pseudo.  In the testcase, we
> do an unsigned __int128 multiplication of a variable by
> 0x7fffffffffffffff, which is determined to be best performed as
> shift left by 63 (multiplication by 0x8000000000000000U) followed by
> subtraction, i.e. (x << 63) - x.  The val_so_far is tracked in an UHWI and
> is necessarily always non-negative even because the caller ensures that,
> if (is_neg && mode_bitsize > HOST_BITS_PER_WIDE_INT) then it performs
> expand_mult_const on the negation and negates afterwards the result.
> The problem is that it calls gen_int_mode, where the argument is signed hwi
> (well, these days poly_int64).  That is fine for DImode multiplication, we
> don't really care about bits above the mode, but for TImode multiplication
> it is significant.  Without this patch we emit in that case:
>      (expr_list:REG_EQUAL (mult:TI (reg/v:TI 85 [ x ])
>             (const_int -9223372036854775808 [0x8000000000000000]))
> but that is for TImode actually multiplication by
> 0xffffffffffffffff8000000000000000
> rather than
> 0x00000000000000008000000000000000, so we need to emit
> 	    (const_wide_int 0x08000000000000000)
> instead.  Bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk and after a while for 8.3?

Ugh.  I wonder if we should add a

rtx gen_int_mode (poly_uint64 c, machine_mode mode)

and assert that the topmost bit is not set if GET_MODE_PRECISION (mode) > 64?

But I guess passing unsigned HOST_WIDE_INT will make this ambiguous.
So maybe a unsigned HOST_WIDE_INT overload instead.

Just to catch the possibly very many places where things can go wrong...

I also wonder why callers have to decide whether to use a CONST_INT,
CONST_WIDE_INT or CONST_DOUBLE and why we cannot have a single
interface here, eventually taking a sign in addition to the mode.

Code looks a bit ugly below, let's see if Richard can come up with
sth nicer.  Conceptually it is OK, but I wonder why not simply
unconditionally use a CONST_WIDE_INT/CONST_DOUBLE since
the value-range of unsigned HOST_WIDE_INT cannot be expressed
in a CONST_INT (unless the mode restricts the value-range appropriately).

Richard.

> 2018-08-30  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR middle-end/87138
> 	* expmed.c (expand_mult_const): If val_so_far has MSB set, use
> 	immed_wide_int_const instead of gen_int_mode.  Formatting fixes.
> 
> 	* gcc.target/i386/avx512bw-pr87138.c: New test.
> 
> --- gcc/expmed.c.jj	2018-08-29 14:20:53.427109187 +0200
> +++ gcc/expmed.c	2018-08-29 17:22:52.416860714 +0200
> @@ -3347,19 +3347,27 @@ expand_mult_const (machine_mode mode, rt
>  	  /* Write a REG_EQUAL note on the last insn so that we can cse
>  	     multiplication sequences.  Note that if ACCUM is a SUBREG,
>  	     we've set the inner register and must properly indicate that.  */
> -          tem = op0, nmode = mode;
> -          accum_inner = accum;
> -          if (GET_CODE (accum) == SUBREG)
> +	  tem = op0, nmode = mode;
> +	  accum_inner = accum;
> +	  if (GET_CODE (accum) == SUBREG)
>  	    {
>  	      accum_inner = SUBREG_REG (accum);
>  	      nmode = GET_MODE (accum_inner);
>  	      tem = gen_lowpart (nmode, op0);
>  	    }
>  
> -          insn = get_last_insn ();
> -          set_dst_reg_note (insn, REG_EQUAL,
> -			    gen_rtx_MULT (nmode, tem,
> -					  gen_int_mode (val_so_far, nmode)),
> +	  insn = get_last_insn ();
> +	  rtx c;
> +	  if (val_so_far > (unsigned HOST_WIDE_INT) HOST_WIDE_INT_MAX)
> +	    {
> +	      wide_int wval_so_far
> +		= wi::uhwi (val_so_far,
> +			    GET_MODE_PRECISION (as_a <scalar_mode> (nmode)));
> +	      c = immed_wide_int_const (wval_so_far, nmode);
> +	    }
> +	  else
> +	    c = gen_int_mode (val_so_far, nmode);
> +	  set_dst_reg_note (insn, REG_EQUAL, gen_rtx_MULT (nmode, tem, c),
>  			    accum_inner);
>  	}
>      }
> --- gcc/testsuite/gcc.target/i386/avx512bw-pr87138.c.jj	2018-08-29 17:55:08.550154082 +0200
> +++ gcc/testsuite/gcc.target/i386/avx512bw-pr87138.c	2018-08-29 17:56:11.112131910 +0200
> @@ -0,0 +1,29 @@
> +/* PR middle-end/87138 */
> +/* { dg-do run { target int128 } } */
> +/* { dg-options "-O -fno-tree-fre -mavx512bw -mtune=k8" } */
> +/* { dg-require-effective-target avx512bw } */
> +
> +#include "avx512bw-check.h"
> +
> +typedef int U __attribute__ ((vector_size (64)));
> +typedef __int128 V __attribute__ ((vector_size (64)));
> +V g, i;
> +
> +static inline void
> +foo (unsigned h, V j, U k, V n)
> +{
> +  k /= h;
> +  __builtin_memmove (&h, &n, 1);
> +  n[j[1]] *= 0x7FFFFFFFFFFFFFFF;
> +  j[k[5]] = 0;
> +  g = n;
> +  i = h + j + n;
> +}
> +
> +void
> +avx512bw_test ()
> +{
> +  foo (~0, (V) { }, (U) { 5 }, (V) { 3 });
> +  if (g[0] != (__int128) 3 * 0x7FFFFFFFFFFFFFFF)
> +    abort ();
> +}
> 
> 	Jakub
> 
>
Jakub Jelinek Aug. 30, 2018, 7:54 a.m. UTC | #2
On Thu, Aug 30, 2018 at 09:31:01AM +0200, Richard Biener wrote:
> > The following testcase is miscompiled, because expand_mult_const adds an
> > incorrect REG_EQUAL note to a temporary pseudo.  In the testcase, we
> > do an unsigned __int128 multiplication of a variable by
> > 0x7fffffffffffffff, which is determined to be best performed as
> > shift left by 63 (multiplication by 0x8000000000000000U) followed by
> > subtraction, i.e. (x << 63) - x.  The val_so_far is tracked in an UHWI and
> > is necessarily always non-negative even because the caller ensures that,
> > if (is_neg && mode_bitsize > HOST_BITS_PER_WIDE_INT) then it performs
> > expand_mult_const on the negation and negates afterwards the result.
> > The problem is that it calls gen_int_mode, where the argument is signed hwi
> > (well, these days poly_int64).  That is fine for DImode multiplication, we
> > don't really care about bits above the mode, but for TImode multiplication
> > it is significant.  Without this patch we emit in that case:
> >      (expr_list:REG_EQUAL (mult:TI (reg/v:TI 85 [ x ])
> >             (const_int -9223372036854775808 [0x8000000000000000]))
> > but that is for TImode actually multiplication by
> > 0xffffffffffffffff8000000000000000
> > rather than
> > 0x00000000000000008000000000000000, so we need to emit
> > 	    (const_wide_int 0x08000000000000000)
> > instead.  Bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> > trunk and after a while for 8.3?
> 
> Ugh.  I wonder if we should add a
> 
> rtx gen_int_mode (poly_uint64 c, machine_mode mode)
> 
> and assert that the topmost bit is not set if GET_MODE_PRECISION (mode) > 64?
> 
> But I guess passing unsigned HOST_WIDE_INT will make this ambiguous.
> So maybe a unsigned HOST_WIDE_INT overload instead.

I can try that, but I think there might be false positives too,
I think we use uhwi in many spots for whatever reason (e.g. defined overflow
behavior) and still want to treat it as shwi in the end.  Plus testsuite
coverage for TImode arithmetics is limited.

> Just to catch the possibly very many places where things can go wrong...
> 
> I also wonder why callers have to decide whether to use a CONST_INT,
> CONST_WIDE_INT or CONST_DOUBLE and why we cannot have a single
> interface here, eventually taking a sign in addition to the mode.

We do have it, it is the immed_wide_int_const.  The then block can be used
unconditionally, the reason I've made that conditional was to optimize for
the 99.9% common case, where we'd construct a wide_int, call
immed_wide_int_const which would do some check and call gen_int_mode again.

If you think for maintainance it is better to just do that unconditionally,
I can certainly retest.

> > +	  if (val_so_far > (unsigned HOST_WIDE_INT) HOST_WIDE_INT_MAX)
> > +	    {
> > +	      wide_int wval_so_far
> > +		= wi::uhwi (val_so_far,
> > +			    GET_MODE_PRECISION (as_a <scalar_mode> (nmode)));
> > +	      c = immed_wide_int_const (wval_so_far, nmode);
> > +	    }
> > +	  else
> > +	    c = gen_int_mode (val_so_far, nmode);

	Jakub
Richard Biener Aug. 30, 2018, 7:57 a.m. UTC | #3
On Thu, 30 Aug 2018, Jakub Jelinek wrote:

> On Thu, Aug 30, 2018 at 09:31:01AM +0200, Richard Biener wrote:
> > > The following testcase is miscompiled, because expand_mult_const adds an
> > > incorrect REG_EQUAL note to a temporary pseudo.  In the testcase, we
> > > do an unsigned __int128 multiplication of a variable by
> > > 0x7fffffffffffffff, which is determined to be best performed as
> > > shift left by 63 (multiplication by 0x8000000000000000U) followed by
> > > subtraction, i.e. (x << 63) - x.  The val_so_far is tracked in an UHWI and
> > > is necessarily always non-negative even because the caller ensures that,
> > > if (is_neg && mode_bitsize > HOST_BITS_PER_WIDE_INT) then it performs
> > > expand_mult_const on the negation and negates afterwards the result.
> > > The problem is that it calls gen_int_mode, where the argument is signed hwi
> > > (well, these days poly_int64).  That is fine for DImode multiplication, we
> > > don't really care about bits above the mode, but for TImode multiplication
> > > it is significant.  Without this patch we emit in that case:
> > >      (expr_list:REG_EQUAL (mult:TI (reg/v:TI 85 [ x ])
> > >             (const_int -9223372036854775808 [0x8000000000000000]))
> > > but that is for TImode actually multiplication by
> > > 0xffffffffffffffff8000000000000000
> > > rather than
> > > 0x00000000000000008000000000000000, so we need to emit
> > > 	    (const_wide_int 0x08000000000000000)
> > > instead.  Bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> > > trunk and after a while for 8.3?
> > 
> > Ugh.  I wonder if we should add a
> > 
> > rtx gen_int_mode (poly_uint64 c, machine_mode mode)
> > 
> > and assert that the topmost bit is not set if GET_MODE_PRECISION (mode) > 64?
> > 
> > But I guess passing unsigned HOST_WIDE_INT will make this ambiguous.
> > So maybe a unsigned HOST_WIDE_INT overload instead.
> 
> I can try that, but I think there might be false positives too,
> I think we use uhwi in many spots for whatever reason (e.g. defined overflow
> behavior) and still want to treat it as shwi in the end.  Plus testsuite
> coverage for TImode arithmetics is limited.
> 
> > Just to catch the possibly very many places where things can go wrong...
> > 
> > I also wonder why callers have to decide whether to use a CONST_INT,
> > CONST_WIDE_INT or CONST_DOUBLE and why we cannot have a single
> > interface here, eventually taking a sign in addition to the mode.
> 
> We do have it, it is the immed_wide_int_const.  The then block can be used
> unconditionally, the reason I've made that conditional was to optimize for
> the 99.9% common case, where we'd construct a wide_int, call
> immed_wide_int_const which would do some check and call gen_int_mode again.
> 
> If you think for maintainance it is better to just do that unconditionally,
> I can certainly retest.

Yes, I think so.  It also makes it explicit wheter we want an unsigned
or signed value.

Richard.
Jakub Jelinek Aug. 30, 2018, 9:58 a.m. UTC | #4
On Thu, Aug 30, 2018 at 09:57:48AM +0200, Richard Biener wrote:
> > > Ugh.  I wonder if we should add a
> > > 
> > > rtx gen_int_mode (poly_uint64 c, machine_mode mode)
> > > 
> > > and assert that the topmost bit is not set if GET_MODE_PRECISION (mode) > 64?
> > > 
> > > But I guess passing unsigned HOST_WIDE_INT will make this ambiguous.
> > > So maybe a unsigned HOST_WIDE_INT overload instead.
> > 
> > I can try that, but I think there might be false positives too,

Neither works, if there is
rtx gen_int_mode (poly_int64 c, machine_mode mode);
rtx gen_int_mode (poly_uint64 c, machine_mode mode);
it is indeed ambiguous, if it is:
rtx gen_int_mode (poly_int64 c, machine_mode mode);
rtx gen_int_mode (unsigned HOST_WIDE_INT c, machine_mode mode);
then the latter is called when the argument is signed HOST_WIDE_INT and
the former is called when the argument is poly_uint64.

> > If you think for maintainance it is better to just do that unconditionally,
> > I can certainly retest.
> 
> Yes, I think so.  It also makes it explicit wheter we want an unsigned
> or signed value.

So like this if it passes bootstrap/regtest?

I've so far verified it does the right thing in this
unsigned __int128 x; ... x * 0x7fffffffffffffff;
case (use (const_wide_int 0x08000000000000000)), and with
unsigned __int128 x; ... x * 0x7fffffff;
case (use (const_int 0x80000000)), and with
unsigned long x; ... x * 0x7fffffffffffffff;
case (use (const_int 0x8000000000000000)).

2018-08-30  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/87138
	* expmed.c (expand_mult_const): Use immed_wide_int_const instead of
	gen_int_mode.  Formatting fixes.

	* gcc.target/i386/avx512bw-pr87138.c: New test.

--- gcc/expmed.c.jj	2018-08-29 23:36:11.854187906 +0200
+++ gcc/expmed.c	2018-08-30 11:38:01.151850590 +0200
@@ -3347,19 +3347,21 @@ expand_mult_const (machine_mode mode, rt
 	  /* Write a REG_EQUAL note on the last insn so that we can cse
 	     multiplication sequences.  Note that if ACCUM is a SUBREG,
 	     we've set the inner register and must properly indicate that.  */
-          tem = op0, nmode = mode;
-          accum_inner = accum;
-          if (GET_CODE (accum) == SUBREG)
+	  tem = op0, nmode = mode;
+	  accum_inner = accum;
+	  if (GET_CODE (accum) == SUBREG)
 	    {
 	      accum_inner = SUBREG_REG (accum);
 	      nmode = GET_MODE (accum_inner);
 	      tem = gen_lowpart (nmode, op0);
 	    }
 
-          insn = get_last_insn ();
-          set_dst_reg_note (insn, REG_EQUAL,
-			    gen_rtx_MULT (nmode, tem,
-					  gen_int_mode (val_so_far, nmode)),
+	  insn = get_last_insn ();
+	  wide_int wval_so_far
+	    = wi::uhwi (val_so_far,
+			GET_MODE_PRECISION (as_a <scalar_mode> (nmode)));
+	  rtx c = immed_wide_int_const (wval_so_far, nmode);
+	  set_dst_reg_note (insn, REG_EQUAL, gen_rtx_MULT (nmode, tem, c),
 			    accum_inner);
 	}
     }
--- gcc/testsuite/gcc.target/i386/avx512bw-pr87138.c.jj	2018-08-30 11:37:23.833464733 +0200
+++ gcc/testsuite/gcc.target/i386/avx512bw-pr87138.c	2018-08-30 11:37:23.833464733 +0200
@@ -0,0 +1,29 @@
+/* PR middle-end/87138 */
+/* { dg-do run { target int128 } } */
+/* { dg-options "-O -fno-tree-fre -mavx512bw -mtune=k8" } */
+/* { dg-require-effective-target avx512bw } */
+
+#include "avx512bw-check.h"
+
+typedef int U __attribute__ ((vector_size (64)));
+typedef __int128 V __attribute__ ((vector_size (64)));
+V g, i;
+
+static inline void
+foo (unsigned h, V j, U k, V n)
+{
+  k /= h;
+  __builtin_memmove (&h, &n, 1);
+  n[j[1]] *= 0x7FFFFFFFFFFFFFFF;
+  j[k[5]] = 0;
+  g = n;
+  i = h + j + n;
+}
+
+void
+avx512bw_test ()
+{
+  foo (~0, (V) { }, (U) { 5 }, (V) { 3 });
+  if (g[0] != (__int128) 3 * 0x7FFFFFFFFFFFFFFF)
+    abort ();
+}


	Jakub
Richard Biener Aug. 30, 2018, 10:19 a.m. UTC | #5
On Thu, 30 Aug 2018, Jakub Jelinek wrote:

> On Thu, Aug 30, 2018 at 09:57:48AM +0200, Richard Biener wrote:
> > > > Ugh.  I wonder if we should add a
> > > > 
> > > > rtx gen_int_mode (poly_uint64 c, machine_mode mode)
> > > > 
> > > > and assert that the topmost bit is not set if GET_MODE_PRECISION (mode) > 64?
> > > > 
> > > > But I guess passing unsigned HOST_WIDE_INT will make this ambiguous.
> > > > So maybe a unsigned HOST_WIDE_INT overload instead.
> > > 
> > > I can try that, but I think there might be false positives too,
> 
> Neither works, if there is
> rtx gen_int_mode (poly_int64 c, machine_mode mode);
> rtx gen_int_mode (poly_uint64 c, machine_mode mode);
> it is indeed ambiguous, if it is:
> rtx gen_int_mode (poly_int64 c, machine_mode mode);
> rtx gen_int_mode (unsigned HOST_WIDE_INT c, machine_mode mode);
> then the latter is called when the argument is signed HOST_WIDE_INT and
> the former is called when the argument is poly_uint64.
> 
> > > If you think for maintainance it is better to just do that unconditionally,
> > > I can certainly retest.
> > 
> > Yes, I think so.  It also makes it explicit wheter we want an unsigned
> > or signed value.
> 
> So like this if it passes bootstrap/regtest?

Yes.

Thanks,
Richard.

> I've so far verified it does the right thing in this
> unsigned __int128 x; ... x * 0x7fffffffffffffff;
> case (use (const_wide_int 0x08000000000000000)), and with
> unsigned __int128 x; ... x * 0x7fffffff;
> case (use (const_int 0x80000000)), and with
> unsigned long x; ... x * 0x7fffffffffffffff;
> case (use (const_int 0x8000000000000000)).
> 
> 2018-08-30  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR middle-end/87138
> 	* expmed.c (expand_mult_const): Use immed_wide_int_const instead of
> 	gen_int_mode.  Formatting fixes.
> 
> 	* gcc.target/i386/avx512bw-pr87138.c: New test.
> 
> --- gcc/expmed.c.jj	2018-08-29 23:36:11.854187906 +0200
> +++ gcc/expmed.c	2018-08-30 11:38:01.151850590 +0200
> @@ -3347,19 +3347,21 @@ expand_mult_const (machine_mode mode, rt
>  	  /* Write a REG_EQUAL note on the last insn so that we can cse
>  	     multiplication sequences.  Note that if ACCUM is a SUBREG,
>  	     we've set the inner register and must properly indicate that.  */
> -          tem = op0, nmode = mode;
> -          accum_inner = accum;
> -          if (GET_CODE (accum) == SUBREG)
> +	  tem = op0, nmode = mode;
> +	  accum_inner = accum;
> +	  if (GET_CODE (accum) == SUBREG)
>  	    {
>  	      accum_inner = SUBREG_REG (accum);
>  	      nmode = GET_MODE (accum_inner);
>  	      tem = gen_lowpart (nmode, op0);
>  	    }
>  
> -          insn = get_last_insn ();
> -          set_dst_reg_note (insn, REG_EQUAL,
> -			    gen_rtx_MULT (nmode, tem,
> -					  gen_int_mode (val_so_far, nmode)),
> +	  insn = get_last_insn ();
> +	  wide_int wval_so_far
> +	    = wi::uhwi (val_so_far,
> +			GET_MODE_PRECISION (as_a <scalar_mode> (nmode)));
> +	  rtx c = immed_wide_int_const (wval_so_far, nmode);
> +	  set_dst_reg_note (insn, REG_EQUAL, gen_rtx_MULT (nmode, tem, c),
>  			    accum_inner);
>  	}
>      }
> --- gcc/testsuite/gcc.target/i386/avx512bw-pr87138.c.jj	2018-08-30 11:37:23.833464733 +0200
> +++ gcc/testsuite/gcc.target/i386/avx512bw-pr87138.c	2018-08-30 11:37:23.833464733 +0200
> @@ -0,0 +1,29 @@
> +/* PR middle-end/87138 */
> +/* { dg-do run { target int128 } } */
> +/* { dg-options "-O -fno-tree-fre -mavx512bw -mtune=k8" } */
> +/* { dg-require-effective-target avx512bw } */
> +
> +#include "avx512bw-check.h"
> +
> +typedef int U __attribute__ ((vector_size (64)));
> +typedef __int128 V __attribute__ ((vector_size (64)));
> +V g, i;
> +
> +static inline void
> +foo (unsigned h, V j, U k, V n)
> +{
> +  k /= h;
> +  __builtin_memmove (&h, &n, 1);
> +  n[j[1]] *= 0x7FFFFFFFFFFFFFFF;
> +  j[k[5]] = 0;
> +  g = n;
> +  i = h + j + n;
> +}
> +
> +void
> +avx512bw_test ()
> +{
> +  foo (~0, (V) { }, (U) { 5 }, (V) { 3 });
> +  if (g[0] != (__int128) 3 * 0x7FFFFFFFFFFFFFFF)
> +    abort ();
> +}
> 
> 
> 	Jakub
> 
>
diff mbox series

Patch

--- gcc/expmed.c.jj	2018-08-29 14:20:53.427109187 +0200
+++ gcc/expmed.c	2018-08-29 17:22:52.416860714 +0200
@@ -3347,19 +3347,27 @@  expand_mult_const (machine_mode mode, rt
 	  /* Write a REG_EQUAL note on the last insn so that we can cse
 	     multiplication sequences.  Note that if ACCUM is a SUBREG,
 	     we've set the inner register and must properly indicate that.  */
-          tem = op0, nmode = mode;
-          accum_inner = accum;
-          if (GET_CODE (accum) == SUBREG)
+	  tem = op0, nmode = mode;
+	  accum_inner = accum;
+	  if (GET_CODE (accum) == SUBREG)
 	    {
 	      accum_inner = SUBREG_REG (accum);
 	      nmode = GET_MODE (accum_inner);
 	      tem = gen_lowpart (nmode, op0);
 	    }
 
-          insn = get_last_insn ();
-          set_dst_reg_note (insn, REG_EQUAL,
-			    gen_rtx_MULT (nmode, tem,
-					  gen_int_mode (val_so_far, nmode)),
+	  insn = get_last_insn ();
+	  rtx c;
+	  if (val_so_far > (unsigned HOST_WIDE_INT) HOST_WIDE_INT_MAX)
+	    {
+	      wide_int wval_so_far
+		= wi::uhwi (val_so_far,
+			    GET_MODE_PRECISION (as_a <scalar_mode> (nmode)));
+	      c = immed_wide_int_const (wval_so_far, nmode);
+	    }
+	  else
+	    c = gen_int_mode (val_so_far, nmode);
+	  set_dst_reg_note (insn, REG_EQUAL, gen_rtx_MULT (nmode, tem, c),
 			    accum_inner);
 	}
     }
--- gcc/testsuite/gcc.target/i386/avx512bw-pr87138.c.jj	2018-08-29 17:55:08.550154082 +0200
+++ gcc/testsuite/gcc.target/i386/avx512bw-pr87138.c	2018-08-29 17:56:11.112131910 +0200
@@ -0,0 +1,29 @@ 
+/* PR middle-end/87138 */
+/* { dg-do run { target int128 } } */
+/* { dg-options "-O -fno-tree-fre -mavx512bw -mtune=k8" } */
+/* { dg-require-effective-target avx512bw } */
+
+#include "avx512bw-check.h"
+
+typedef int U __attribute__ ((vector_size (64)));
+typedef __int128 V __attribute__ ((vector_size (64)));
+V g, i;
+
+static inline void
+foo (unsigned h, V j, U k, V n)
+{
+  k /= h;
+  __builtin_memmove (&h, &n, 1);
+  n[j[1]] *= 0x7FFFFFFFFFFFFFFF;
+  j[k[5]] = 0;
+  g = n;
+  i = h + j + n;
+}
+
+void
+avx512bw_test ()
+{
+  foo (~0, (V) { }, (U) { 5 }, (V) { 3 });
+  if (g[0] != (__int128) 3 * 0x7FFFFFFFFFFFFFFF)
+    abort ();
+}