Fix ICE when generating a vector shift by scalar

Message ID 1441052882.4779.3.camel@oc8801110288.ibm.com
State New

Commit Message

Bill Schmidt Aug. 31, 2015, 8:28 p.m. UTC
Hi,

The following simple test fails when attempting to convert a vector
shift-by-scalar into a vector shift-by-vector.

  typedef unsigned char v16ui __attribute__((vector_size(16)));

  v16ui vslb(v16ui v, unsigned char i)
  {
    return v << i;
  }

When this code is gimplified, the shift amount gets expanded to an
unsigned int:

  vslb (v16ui v, unsigned char i)
  {
    v16ui D.2300;
    unsigned int D.2301;

    D.2301 = (unsigned int) i;
    D.2300 = v << D.2301;
    return D.2300;
  }

In expand_binop, the shift-by-scalar is converted into a shift-by-vector
using expand_vector_broadcast, which produces the following rtx to be
used to initialize a V16QI vector:

(parallel:V16QI [
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
    ])

The back end eventually chokes trying to generate a copy of the SImode
expression into a QImode memory slot.

This patch fixes this problem by ensuring that the shift amount is
truncated to the inner mode of the vector when necessary.  I've added a
test case verifying correct PowerPC code generation in this case.
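As a rough illustration of the intended semantics (not the GCC code itself), the sketch below models a V16QI value as 16 byte lanes in plain C. The names v16qi_model, broadcast_truncated, shift_by_vector and shift_by_scalar are hypothetical and exist only for this example. It shows the widened shift amount being narrowed back to the lane width before it is splatted, which is the step the patch adds.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 16

/* Hypothetical stand-in for a V16QI value: 16 one-byte lanes.  */
typedef struct { uint8_t lane[LANES]; } v16qi_model;

/* Narrow the widened shift amount to the lane width, then copy it
   into every lane.  The cast models a TRUNCATE from SImode to QImode.  */
static v16qi_model broadcast_truncated(unsigned int wide_amount)
{
    v16qi_model out;
    uint8_t narrow = (uint8_t) wide_amount;
    for (int k = 0; k < LANES; k++)
        out.lane[k] = narrow;
    return out;
}

/* Shift each lane of V left by the matching lane of AMOUNTS.
   This model only accepts shift counts below the lane width.  */
static v16qi_model shift_by_vector(v16qi_model v, v16qi_model amounts)
{
    v16qi_model out;
    for (int k = 0; k < LANES; k++) {
        assert(amounts.lane[k] < 8);
        out.lane[k] = (uint8_t) (v.lane[k] << amounts.lane[k]);
    }
    return out;
}

/* Shift-by-scalar expressed as shift-by-vector.  */
static v16qi_model shift_by_scalar(v16qi_model v, unsigned char i)
{
    unsigned int widened = (unsigned int) i;  /* mirrors the gimplified form */
    return shift_by_vector(v, broadcast_truncated(widened));
}
```

Because the shift amount starts life as an unsigned char, narrowing it back to 8 bits loses no information in this test case.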

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions.  Is this ok for trunk?

Thanks,
Bill


[gcc]

2015-08-31  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* optabs.c (expand_binop): Don't create a broadcast vector with a
	source element wider than the inner mode.

[gcc/testsuite]

2015-08-31  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* gcc.target/powerpc/vec-shift.c: New test.

Comments

Richard Biener Sept. 1, 2015, 9:01 a.m. UTC | #1
On Mon, Aug 31, 2015 at 10:28 PM, Bill Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
> Index: gcc/optabs.c
> ===================================================================
> --- gcc/optabs.c        (revision 227353)
> +++ gcc/optabs.c        (working copy)
> @@ -1608,6 +1608,13 @@ expand_binop (machine_mode mode, optab binoptab, r
>
>        if (otheroptab && optab_handler (otheroptab, mode) != CODE_FOR_nothing)
>         {
> +         /* The scalar may have been extended to be too wide.  Truncate
> +            it back to the proper size to fit in the broadcast vector.  */
> +         machine_mode inner_mode = GET_MODE_INNER (mode);
> +         if (GET_MODE_BITSIZE (inner_mode)
> +             < GET_MODE_BITSIZE (GET_MODE (op1)))

Does that work for modeless constants?  Btw, what do other targets do
here?  Do they also choke or do they cope with the wide operand?

> +           op1 = simplify_gen_unary (TRUNCATE, inner_mode, op1,
> +                                     GET_MODE (op1));
>           rtx vop1 = expand_vector_broadcast (mode, op1);
>           if (vop1)
>             {
Bill Schmidt Sept. 1, 2015, 3:53 p.m. UTC | #2
On Tue, 2015-09-01 at 11:01 +0200, Richard Biener wrote:
> On Mon, Aug 31, 2015 at 10:28 PM, Bill Schmidt
> <wschmidt@linux.vnet.ibm.com> wrote:
> > Index: gcc/optabs.c
> > ===================================================================
> > --- gcc/optabs.c        (revision 227353)
> > +++ gcc/optabs.c        (working copy)
> > @@ -1608,6 +1608,13 @@ expand_binop (machine_mode mode, optab binoptab, r
> >
> >        if (otheroptab && optab_handler (otheroptab, mode) != CODE_FOR_nothing)
> >         {
> > +         /* The scalar may have been extended to be too wide.  Truncate
> > +            it back to the proper size to fit in the broadcast vector.  */
> > +         machine_mode inner_mode = GET_MODE_INNER (mode);
> > +         if (GET_MODE_BITSIZE (inner_mode)
> > +             < GET_MODE_BITSIZE (GET_MODE (op1)))
> 
> Does that work for modeless constants?  Btw, what do other targets do
> here?  Do they
> also choke or do they cope with the wide operand?

Good question.  This works by serendipity more than by design.  Because
a constant has a mode of VOIDmode, its bitsize is 0 and the TRUNCATE
won't be generated.  It would be better for me to put in an explicit
check for CONST_INT rather than relying on this, though.  I'll fix that.
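To make the point about VOIDmode concrete, here is a small stand-alone model of the guard.  It is only a sketch: model_mode, model_bitsize and the two needs_truncate_* helpers are hypothetical names, not GCC APIs, and the explicit variant is just one way the promised CONST_INT check might look.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical miniature of GCC's machine modes; only sizes matter.  */
enum model_mode { MODEL_VOID, MODEL_QI, MODEL_SI, MODEL_DI };

static unsigned model_bitsize(enum model_mode m)
{
    switch (m) {
    case MODEL_QI: return 8;
    case MODEL_SI: return 32;
    case MODEL_DI: return 64;
    default:       return 0;  /* VOIDmode has bitsize 0 */
    }
}

/* The guard as posted: a VOIDmode operand never looks "too wide"
   because its bitsize is 0.  */
static bool needs_truncate_implicit(enum model_mode inner,
                                    enum model_mode op)
{
    return model_bitsize(inner) < model_bitsize(op);
}

/* A variant that rejects integer constants explicitly instead of
   relying on the bitsize of VOIDmode.  */
static bool needs_truncate_explicit(enum model_mode inner,
                                    enum model_mode op,
                                    bool op_is_const_int)
{
    /* In this model, as in RTL, an integer constant carries no mode.  */
    assert(!op_is_const_int || op == MODEL_VOID);
    if (op_is_const_int)
        return false;
    return model_bitsize(inner) < model_bitsize(op);
}
```

Both forms agree on the cases discussed here; the explicit one simply states the intent rather than depending on a bitsize of 0.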

I am not sure what other targets do here; I can check.  However, do you
think that's relevant?  I'm concerned that

(parallel:V16QI [
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
    ])

is a nonsensical expression and shouldn't be produced by common code, in
my view.  It seems best to make this explicitly correct.  Please let me
know if that's off-base.

Thanks,
Bill

> 
> > +           op1 = simplify_gen_unary (TRUNCATE, inner_mode, op1,
> > +                                     GET_MODE (op1));
> >           rtx vop1 = expand_vector_broadcast (mode, op1);
> >           if (vop1)
> >             {
Richard Biener Sept. 2, 2015, 12:44 p.m. UTC | #3
On Tue, Sep 1, 2015 at 5:53 PM, Bill Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
> On Tue, 2015-09-01 at 11:01 +0200, Richard Biener wrote:
>> On Mon, Aug 31, 2015 at 10:28 PM, Bill Schmidt
>> <wschmidt@linux.vnet.ibm.com> wrote:
>> > Index: gcc/optabs.c
>> > ===================================================================
>> > --- gcc/optabs.c        (revision 227353)
>> > +++ gcc/optabs.c        (working copy)
>> > @@ -1608,6 +1608,13 @@ expand_binop (machine_mode mode, optab binoptab, r
>> >
>> >        if (otheroptab && optab_handler (otheroptab, mode) != CODE_FOR_nothing)
>> >         {
>> > +         /* The scalar may have been extended to be too wide.  Truncate
>> > +            it back to the proper size to fit in the broadcast vector.  */
>> > +         machine_mode inner_mode = GET_MODE_INNER (mode);
>> > +         if (GET_MODE_BITSIZE (inner_mode)
>> > +             < GET_MODE_BITSIZE (GET_MODE (op1)))
>>
>> Does that work for modeless constants?  Btw, what do other targets do
>> here?  Do they
>> also choke or do they cope with the wide operand?
>
> Good question.  This works by serendipity more than by design.  Because
> a constant has a mode of VOIDmode, its bitsize is 0 and the TRUNCATE
> won't be generated.  It would be better for me to put in an explicit
> check for CONST_INT rather than relying on this, though.  I'll fix that.
>
> I am not sure what other targets do here; I can check.  However, do you
> think that's relevant?  I'm concerned that
>
> (parallel:V16QI [
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>     ])
>
> is a nonsensical expression and shouldn't be produced by common code, in
> my view.  It seems best to make this explicitly correct.  Please let me
> know if that's off-base.

No, the above indeed looks fishy, though other backends' vec_init_optab
might just handle it fine.

OTOH if a conversion is required it would be nice to CSE it, thus
force the result to a register (not sure if the targets handle invalid
RTL sharing in vec_init_optab).

Patch

Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	(revision 227353)
+++ gcc/optabs.c	(working copy)
@@ -1608,6 +1608,13 @@  expand_binop (machine_mode mode, optab binoptab, r
 
       if (otheroptab && optab_handler (otheroptab, mode) != CODE_FOR_nothing)
 	{
+	  /* The scalar may have been extended to be too wide.  Truncate
+	     it back to the proper size to fit in the broadcast vector.  */
+	  machine_mode inner_mode = GET_MODE_INNER (mode);
+	  if (GET_MODE_BITSIZE (inner_mode)
+	      < GET_MODE_BITSIZE (GET_MODE (op1)))
+	    op1 = simplify_gen_unary (TRUNCATE, inner_mode, op1,
+				      GET_MODE (op1));
 	  rtx vop1 = expand_vector_broadcast (mode, op1);
 	  if (vop1)
 	    {
Index: gcc/testsuite/gcc.target/powerpc/vec-shift.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-shift.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-shift.c	(working copy)
@@ -0,0 +1,20 @@ 
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
+/* { dg-options "-mcpu=power7 -O2" } */
+
+/* This used to ICE.  During gimplification, "i" is widened to an unsigned
+   int.  We used to fail at expand time as we tried to cram an SImode item
+   into a QImode memory slot.  This has been fixed to properly truncate the
+   shift amount when splatting it into a vector.  */
+
+typedef unsigned char v16ui __attribute__((vector_size(16)));
+
+v16ui vslb(v16ui v, unsigned char i)
+{
+	return v << i;
+}
+
+/* { dg-final { scan-assembler "vspltb" } } */
+/* { dg-final { scan-assembler "vslb" } } */