diff mbox series

Fix AVX512VL gather ICEs (PR target/88513, PR target/88514)

Message ID 20181217223741.GF23305@tucnak
State New
Headers show
Series Fix AVX512VL gather ICEs (PR target/88513, PR target/88514) | expand

Commit Message

Jakub Jelinek Dec. 17, 2018, 10:37 p.m. UTC
Hi!

Some of the following testcases ICE, because I was assuming that
VEC_UNPACK_{LO,HI}_EXPR and VEC_PACK_TRUNC_EXPR just work on the
VECTOR_BOOLEAN_TYPE_P mask types that AVX512* has (with scalar modes),
but they really only work if the wider mode is different from the narrower
one, so e.g. one can extract the lo or hi half of a nunits 16 VECTOR_BOOLEAN_TYPE_P
type with VEC_UNPACK_*_EXPR to have a HImode -> QImode expander, or
combine two nunits 8 halves into one 16 nunits one, i.e. QImode + QImode ->
HImode with VEC_PACK_TRUNC_EXPR, but for bitmasks with fewer bits it is
ambigious - either we need to extract lo/hi half of nunits 8
VECTOR_BOOLEAN_TYPE_P or lo/hi half of nunits 4 VECTOR_BOOLEAN_TYPE_P, both
would be QImode -> QImode operation and from the mode one can't figure out
which one is which.  For VEC_PACK_TRUNC_EXPR it is even more complicated,
because we use the operand mode as the name in the optab, so
vec_pack_trunc_qi is already used the the 8 + 8 -> 16 nunits one which gives
a HImode result and there is nothing left for the 4 + 4 -> 8 and 2 + 2 -> 4.

When not assuming it works, e.g. if I just used
supportable_widening_operation in the gather and scatter INTEGRAL_TYPE_P
masktype handling code, it would lead just to missed optimizations, because
the vectorizer just punted in those cases (note, it doesn't affect
-mprefer-vector-width=512 case, because even for DF/DImode we use 8 bits
already, but the cases when AVX512VL is used with
-mprefer-vector-width={128,256}.

The following patch introduces 3 new optabs which are like
vec_pack_trunc_optab resp. vec_unpacks_{lo,hi}_optab, except that their
expanders take another argument - CONST_INT representing the number of units
in the wider of the two bitmask (VECTOR_BOOLEAN_TYPE_P) types and is meant
to be used for the cases where both modes have different
TYPE_VECTOR_SUBPARTS, but the same TYPE_MODE.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-12-17  Jakub Jelinek  <jakub@redhat.com>

	PR target/88513
	PR target/88514
	* optabs.def (vec_pack_sbool_trunc_optab, vec_unpacks_sbool_hi_optab,
	vec_unpacks_sbool_lo_optab): New optabs.
	* optabs.c (expand_widen_pattern_expr): Use vec_unpacks_sbool_*_optab
	and pass additional argument if both input and target have the same
	scalar mode of VECTOR_BOOLEAN_TYPE_P vectors.
	* expr.c (expand_expr_real_2) <case VEC_PACK_TRUNC_EXPR>: Handle
	VECTOR_BOOLEAN_TYPE_P pack where result has the same scalar mode
	as the operands using vec_pack_sbool_trunc_optab.
	* tree-vect-stmts.c (supportable_widening_operation): Use
	vec_unpacks_sbool_{lo,hi}_optab for VECTOR_BOOLEAN_TYPE_P conversions
	where both wider_vectype and vectype have the same scalar mode.
	(supportable_narrowing_operation): Similarly use
	vec_pack_sbool_trunc_optab if narrow_vectype and vectype have the same
	scalar mode.
	* config/i386/i386.c (ix86_get_builtin)
	<case IX86_BUILTIN_GATHER3ALTDIV8SF>: Check for non-VECTOR_MODE_P
	rather than VOIDmode.
	* config/i386/sse.md (vec_pack_trunc_qi, vec_pack_trunc_<mode>):
	Remove useless ()s around "register_operand", formatting fixes.
	(vec_pack_sbool_trunc_qi, vec_unpacks_sbool_lo_qi,
	vec_unpacks_sbool_hi_qi): New expanders.
	* doc/md.texi (vec_pack_sbool_trunc_M, vec_unpacks_sbool_hi_M,
	vec_unpacks_sbool_lo_M): Document.

	* gcc.target/i386/avx512f-pr88513-1.c: New test.
	* gcc.target/i386/avx512f-pr88513-2.c: New test.
	* gcc.target/i386/avx512vl-pr88464-1.c: New test.
	* gcc.target/i386/avx512vl-pr88464-2.c: New test.
	* gcc.target/i386/avx512vl-pr88464-3.c: New test.
	* gcc.target/i386/avx512vl-pr88464-4.c: New test.
	* gcc.target/i386/avx512vl-pr88513-1.c: New test.
	* gcc.target/i386/avx512vl-pr88513-2.c: New test.
	* gcc.target/i386/avx512vl-pr88513-3.c: New test.
	* gcc.target/i386/avx512vl-pr88513-4.c: New test.
	* gcc.target/i386/avx512vl-pr88514-1.c: New test.
	* gcc.target/i386/avx512vl-pr88514-2.c: New test.
	* gcc.target/i386/avx512vl-pr88514-3.c: New test.


	Jakub

Comments

Uros Bizjak Dec. 18, 2018, 7:25 a.m. UTC | #1
On Mon, Dec 17, 2018 at 11:37 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> Hi!
>
> Some of the following testcases ICE, because I was assuming that
> VEC_UNPACK_{LO,HI}_EXPR and VEC_PACK_TRUNC_EXPR just work on the
> VECTOR_BOOLEAN_TYPE_P mask types that AVX512* has (with scalar modes),
> but they really only work if the wider mode is different from the narrower
> one, so e.g. one can extract the lo or hi half of a nunits 16 VECTOR_BOOLEAN_TYPE_P
> type with VEC_UNPACK_*_EXPR to have a HImode -> QImode expander, or
> combine two nunits 8 halves into one 16 nunits one, i.e. QImode + QImode ->
> HImode with VEC_PACK_TRUNC_EXPR, but for bitmasks with fewer bits it is
> ambigious - either we need to extract lo/hi half of nunits 8
> VECTOR_BOOLEAN_TYPE_P or lo/hi half of nunits 4 VECTOR_BOOLEAN_TYPE_P, both
> would be QImode -> QImode operation and from the mode one can't figure out
> which one is which.  For VEC_PACK_TRUNC_EXPR it is even more complicated,
> because we use the operand mode as the name in the optab, so
> vec_pack_trunc_qi is already used the the 8 + 8 -> 16 nunits one which gives
> a HImode result and there is nothing left for the 4 + 4 -> 8 and 2 + 2 -> 4.
>
> When not assuming it works, e.g. if I just used
> supportable_widening_operation in the gather and scatter INTEGRAL_TYPE_P
> masktype handling code, it would lead just to missed optimizations, because
> the vectorizer just punted in those cases (note, it doesn't affect
> -mprefer-vector-width=512 case, because even for DF/DImode we use 8 bits
> already, but the cases when AVX512VL is used with
> -mprefer-vector-width={128,256}.
>
> The following patch introduces 3 new optabs which are like
> vec_pack_trunc_optab resp. vec_unpacks_{lo,hi}_optab, except that their
> expanders take another argument - CONST_INT representing the number of units
> in the wider of the two bitmask (VECTOR_BOOLEAN_TYPE_P) types and is meant
> to be used for the cases where both modes have different
> TYPE_VECTOR_SUBPARTS, but the same TYPE_MODE.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2018-12-17  Jakub Jelinek  <jakub@redhat.com>
>
>         PR target/88513
>         PR target/88514
>         * optabs.def (vec_pack_sbool_trunc_optab, vec_unpacks_sbool_hi_optab,
>         vec_unpacks_sbool_lo_optab): New optabs.
>         * optabs.c (expand_widen_pattern_expr): Use vec_unpacks_sbool_*_optab
>         and pass additional argument if both input and target have the same
>         scalar mode of VECTOR_BOOLEAN_TYPE_P vectors.
>         * expr.c (expand_expr_real_2) <case VEC_PACK_TRUNC_EXPR>: Handle
>         VECTOR_BOOLEAN_TYPE_P pack where result has the same scalar mode
>         as the operands using vec_pack_sbool_trunc_optab.
>         * tree-vect-stmts.c (supportable_widening_operation): Use
>         vec_unpacks_sbool_{lo,hi}_optab for VECTOR_BOOLEAN_TYPE_P conversions
>         where both wider_vectype and vectype have the same scalar mode.
>         (supportable_narrowing_operation): Similarly use
>         vec_pack_sbool_trunc_optab if narrow_vectype and vectype have the same
>         scalar mode.
>         * config/i386/i386.c (ix86_get_builtin)
>         <case IX86_BUILTIN_GATHER3ALTDIV8SF>: Check for non-VECTOR_MODE_P
>         rather than VOIDmode.

This entry doesn't match the change, you are checking for
VECTOR_MODE_P. On a related note, should similar
IX86_BUILTIN_GATHER3ALTDIV16{SF,SI} builtins be changed in the same
way?

>         * config/i386/sse.md (vec_pack_trunc_qi, vec_pack_trunc_<mode>):
>         Remove useless ()s around "register_operand", formatting fixes.
>         (vec_pack_sbool_trunc_qi, vec_unpacks_sbool_lo_qi,
>         vec_unpacks_sbool_hi_qi): New expanders.
>         * doc/md.texi (vec_pack_sbool_trunc_M, vec_unpacks_sbool_hi_M,
>         vec_unpacks_sbool_lo_M): Document.
>
>         * gcc.target/i386/avx512f-pr88513-1.c: New test.
>         * gcc.target/i386/avx512f-pr88513-2.c: New test.
>         * gcc.target/i386/avx512vl-pr88464-1.c: New test.
>         * gcc.target/i386/avx512vl-pr88464-2.c: New test.
>         * gcc.target/i386/avx512vl-pr88464-3.c: New test.
>         * gcc.target/i386/avx512vl-pr88464-4.c: New test.
>         * gcc.target/i386/avx512vl-pr88513-1.c: New test.
>         * gcc.target/i386/avx512vl-pr88513-2.c: New test.
>         * gcc.target/i386/avx512vl-pr88513-3.c: New test.
>         * gcc.target/i386/avx512vl-pr88513-4.c: New test.
>         * gcc.target/i386/avx512vl-pr88514-1.c: New test.
>         * gcc.target/i386/avx512vl-pr88514-2.c: New test.
>         * gcc.target/i386/avx512vl-pr88514-3.c: New test.

LGTM for the x86 part.

Thanks,
Uros.

> --- gcc/optabs.def.jj   2018-12-14 20:35:37.883126125 +0100
> +++ gcc/optabs.def      2018-12-17 15:18:27.817103784 +0100
> @@ -335,6 +335,7 @@ OPTAB_D (vec_pack_sfix_trunc_optab, "vec
>  OPTAB_D (vec_pack_ssat_optab, "vec_pack_ssat_$a")
>  OPTAB_D (vec_pack_trunc_optab, "vec_pack_trunc_$a")
>  OPTAB_D (vec_pack_ufix_trunc_optab, "vec_pack_ufix_trunc_$a")
> +OPTAB_D (vec_pack_sbool_trunc_optab, "vec_pack_sbool_trunc_$a")
>  OPTAB_D (vec_pack_usat_optab, "vec_pack_usat_$a")
>  OPTAB_D (vec_packs_float_optab, "vec_packs_float_$a")
>  OPTAB_D (vec_packu_float_optab, "vec_packu_float_$a")
> @@ -350,6 +351,8 @@ OPTAB_D (vec_unpacks_float_hi_optab, "ve
>  OPTAB_D (vec_unpacks_float_lo_optab, "vec_unpacks_float_lo_$a")
>  OPTAB_D (vec_unpacks_hi_optab, "vec_unpacks_hi_$a")
>  OPTAB_D (vec_unpacks_lo_optab, "vec_unpacks_lo_$a")
> +OPTAB_D (vec_unpacks_sbool_hi_optab, "vec_unpacks_sbool_hi_$a")
> +OPTAB_D (vec_unpacks_sbool_lo_optab, "vec_unpacks_sbool_lo_$a")
>  OPTAB_D (vec_unpacku_float_hi_optab, "vec_unpacku_float_hi_$a")
>  OPTAB_D (vec_unpacku_float_lo_optab, "vec_unpacku_float_lo_$a")
>  OPTAB_D (vec_unpacku_hi_optab, "vec_unpacku_hi_$a")
> --- gcc/optabs.c.jj     2018-12-01 00:25:09.334989051 +0100
> +++ gcc/optabs.c        2018-12-17 16:19:10.331102731 +0100
> @@ -256,6 +256,7 @@ expand_widen_pattern_expr (sepops ops, r
>    enum insn_code icode;
>    int nops = TREE_CODE_LENGTH (ops->code);
>    int op;
> +  bool sbool = false;
>
>    oprnd0 = ops->op0;
>    tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
> @@ -265,6 +266,22 @@ expand_widen_pattern_expr (sepops ops, r
>         for these ops.  */
>      widen_pattern_optab
>        = optab_for_tree_code (ops->code, ops->type, optab_default);
> +  else if ((ops->code == VEC_UNPACK_HI_EXPR
> +           || ops->code == VEC_UNPACK_LO_EXPR)
> +          && VECTOR_BOOLEAN_TYPE_P (ops->type)
> +          && VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (oprnd0))
> +          && TYPE_MODE (ops->type) == TYPE_MODE (TREE_TYPE (oprnd0))
> +          && SCALAR_INT_MODE_P (TYPE_MODE (ops->type)))
> +    {
> +      /* For VEC_UNPACK_{LO,HI}_EXPR if the mode of op0 and result is
> +        the same scalar mode for VECTOR_BOOLEAN_TYPE_P vectors, use
> +        vec_unpacks_sbool_{lo,hi}_optab, so that we can pass in
> +        the pattern number of elements in the wider vector.  */
> +      widen_pattern_optab
> +       = (ops->code == VEC_UNPACK_HI_EXPR
> +          ? vec_unpacks_sbool_hi_optab : vec_unpacks_sbool_lo_optab);
> +      sbool = true;
> +    }
>    else
>      widen_pattern_optab
>        = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
> @@ -282,6 +299,12 @@ expand_widen_pattern_expr (sepops ops, r
>        oprnd1 = ops->op1;
>        tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
>      }
> +  else if (sbool)
> +    {
> +      nops = 2;
> +      op1 = GEN_INT (TYPE_VECTOR_SUBPARTS (TREE_TYPE (oprnd0)).to_constant ());
> +      tmode1 = tmode0;
> +    }
>
>    /* The last operand is of a wider mode than the rest of the operands.  */
>    if (nops == 2)
> --- gcc/expr.c.jj       2018-12-14 20:37:02.833751212 +0100
> +++ gcc/expr.c  2018-12-17 18:33:12.855336376 +0100
> @@ -9493,12 +9493,34 @@ expand_expr_real_2 (sepops ops, rtx targ
>        gcc_assert (target);
>        return target;
>
> -    case VEC_PACK_TRUNC_EXPR:
>      case VEC_PACK_SAT_EXPR:
>      case VEC_PACK_FIX_TRUNC_EXPR:
>        mode = TYPE_MODE (TREE_TYPE (treeop0));
>        goto binop;
>
> +    case VEC_PACK_TRUNC_EXPR:
> +      if (VECTOR_BOOLEAN_TYPE_P (type)
> +         && VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (treeop0))
> +         && mode == TYPE_MODE (TREE_TYPE (treeop0))
> +         && SCALAR_INT_MODE_P (mode))
> +       {
> +         struct expand_operand eops[4];
> +         machine_mode imode = TYPE_MODE (TREE_TYPE (treeop0));
> +         expand_operands (treeop0, treeop1,
> +                          subtarget, &op0, &op1, EXPAND_NORMAL);
> +         this_optab = vec_pack_sbool_trunc_optab;
> +         enum insn_code icode = optab_handler (this_optab, imode);
> +         create_output_operand (&eops[0], target, mode);
> +         create_convert_operand_from (&eops[1], op0, imode, false);
> +         create_convert_operand_from (&eops[2], op1, imode, false);
> +         temp = GEN_INT (TYPE_VECTOR_SUBPARTS (type).to_constant ());
> +         create_input_operand (&eops[3], temp, imode);
> +         expand_insn (icode, 4, eops);
> +         return eops[0].value;
> +       }
> +      mode = TYPE_MODE (TREE_TYPE (treeop0));
> +      goto binop;
> +
>      case VEC_PACK_FLOAT_EXPR:
>        mode = TYPE_MODE (TREE_TYPE (treeop0));
>        expand_operands (treeop0, treeop1,
> --- gcc/tree-vect-stmts.c.jj    2018-12-15 12:00:23.570570010 +0100
> +++ gcc/tree-vect-stmts.c       2018-12-17 18:23:57.899374515 +0100
> @@ -10313,6 +10313,17 @@ supportable_widening_operation (enum tre
>        optab1 = optab_for_tree_code (c1, vectype_out, optab_default);
>        optab2 = optab_for_tree_code (c2, vectype_out, optab_default);
>      }
> +  else if (CONVERT_EXPR_CODE_P (code)
> +          && VECTOR_BOOLEAN_TYPE_P (wide_vectype)
> +          && VECTOR_BOOLEAN_TYPE_P (vectype)
> +          && TYPE_MODE (wide_vectype) == TYPE_MODE (vectype)
> +          && SCALAR_INT_MODE_P (TYPE_MODE (vectype)))
> +    {
> +      /* If the input and result modes are the same, a different optab
> +        is needed where we pass in the number of units in vectype.  */
> +      optab1 = vec_unpacks_sbool_lo_optab;
> +      optab2 = vec_unpacks_sbool_hi_optab;
> +    }
>    else
>      {
>        optab1 = optab_for_tree_code (c1, vectype, optab_default);
> @@ -10332,12 +10343,16 @@ supportable_widening_operation (enum tre
>
>    if (insn_data[icode1].operand[0].mode == TYPE_MODE (wide_vectype)
>        && insn_data[icode2].operand[0].mode == TYPE_MODE (wide_vectype))
> +    {
> +      if (!VECTOR_BOOLEAN_TYPE_P (vectype))
> +       return true;
>        /* For scalar masks we may have different boolean
>          vector types having the same QImode.  Thus we
>          add additional check for elements number.  */
> -    return (!VECTOR_BOOLEAN_TYPE_P (vectype)
> -           || known_eq (TYPE_VECTOR_SUBPARTS (vectype),
> -                        TYPE_VECTOR_SUBPARTS (wide_vectype) * 2));
> +      if (known_eq (TYPE_VECTOR_SUBPARTS (vectype),
> +                   TYPE_VECTOR_SUBPARTS (wide_vectype) * 2))
> +       return true;
> +    }
>
>    /* Check if it's a multi-step conversion that can be done using intermediate
>       types.  */
> @@ -10367,8 +10382,21 @@ supportable_widening_operation (enum tre
>           = lang_hooks.types.type_for_mode (intermediate_mode,
>                                             TYPE_UNSIGNED (prev_type));
>
> -      optab3 = optab_for_tree_code (c1, intermediate_type, optab_default);
> -      optab4 = optab_for_tree_code (c2, intermediate_type, optab_default);
> +      if (VECTOR_BOOLEAN_TYPE_P (intermediate_type)
> +         && VECTOR_BOOLEAN_TYPE_P (prev_type)
> +         && intermediate_mode == prev_mode
> +         && SCALAR_INT_MODE_P (prev_mode))
> +       {
> +         /* If the input and result modes are the same, a different optab
> +            is needed where we pass in the number of units in vectype.  */
> +         optab3 = vec_unpacks_sbool_lo_optab;
> +         optab4 = vec_unpacks_sbool_hi_optab;
> +       }
> +      else
> +       {
> +         optab3 = optab_for_tree_code (c1, intermediate_type, optab_default);
> +         optab4 = optab_for_tree_code (c2, intermediate_type, optab_default);
> +       }
>
>        if (!optab3 || !optab4
>            || (icode1 = optab_handler (optab1, prev_mode)) == CODE_FOR_nothing
> @@ -10386,9 +10414,13 @@ supportable_widening_operation (enum tre
>
>        if (insn_data[icode1].operand[0].mode == TYPE_MODE (wide_vectype)
>           && insn_data[icode2].operand[0].mode == TYPE_MODE (wide_vectype))
> -       return (!VECTOR_BOOLEAN_TYPE_P (vectype)
> -               || known_eq (TYPE_VECTOR_SUBPARTS (intermediate_type),
> -                            TYPE_VECTOR_SUBPARTS (wide_vectype) * 2));
> +       {
> +         if (!VECTOR_BOOLEAN_TYPE_P (vectype))
> +           return true;
> +         if (known_eq (TYPE_VECTOR_SUBPARTS (intermediate_type),
> +                       TYPE_VECTOR_SUBPARTS (wide_vectype) * 2))
> +           return true;
> +       }
>
>        prev_type = intermediate_type;
>        prev_mode = intermediate_mode;
> @@ -10441,26 +10473,30 @@ supportable_narrowing_operation (enum tr
>      {
>      CASE_CONVERT:
>        c1 = VEC_PACK_TRUNC_EXPR;
> +      if (VECTOR_BOOLEAN_TYPE_P (narrow_vectype)
> +         && VECTOR_BOOLEAN_TYPE_P (vectype)
> +         && TYPE_MODE (narrow_vectype) == TYPE_MODE (vectype)
> +         && SCALAR_INT_MODE_P (TYPE_MODE (vectype)))
> +       optab1 = vec_pack_sbool_trunc_optab;
> +      else
> +       optab1 = optab_for_tree_code (c1, vectype, optab_default);
>        break;
>
>      case FIX_TRUNC_EXPR:
>        c1 = VEC_PACK_FIX_TRUNC_EXPR;
> +      /* The signedness is determined from output operand.  */
> +      optab1 = optab_for_tree_code (c1, vectype_out, optab_default);
>        break;
>
>      case FLOAT_EXPR:
>        c1 = VEC_PACK_FLOAT_EXPR;
> +      optab1 = optab_for_tree_code (c1, vectype, optab_default);
>        break;
>
>      default:
>        gcc_unreachable ();
>      }
>
> -  if (code == FIX_TRUNC_EXPR)
> -    /* The signedness is determined from output operand.  */
> -    optab1 = optab_for_tree_code (c1, vectype_out, optab_default);
> -  else
> -    optab1 = optab_for_tree_code (c1, vectype, optab_default);
> -
>    if (!optab1)
>      return false;
>
> @@ -10471,12 +10507,16 @@ supportable_narrowing_operation (enum tr
>    *code1 = c1;
>
>    if (insn_data[icode1].operand[0].mode == TYPE_MODE (narrow_vectype))
> -    /* For scalar masks we may have different boolean
> -       vector types having the same QImode.  Thus we
> -       add additional check for elements number.  */
> -    return (!VECTOR_BOOLEAN_TYPE_P (vectype)
> -           || known_eq (TYPE_VECTOR_SUBPARTS (vectype) * 2,
> -                        TYPE_VECTOR_SUBPARTS (narrow_vectype)));
> +    {
> +      if (!VECTOR_BOOLEAN_TYPE_P (vectype))
> +       return true;
> +      /* For scalar masks we may have different boolean
> +        vector types having the same QImode.  Thus we
> +        add additional check for elements number.  */
> +      if (known_eq (TYPE_VECTOR_SUBPARTS (vectype) * 2,
> +                   TYPE_VECTOR_SUBPARTS (narrow_vectype)))
> +       return true;
> +    }
>
>    if (code == FLOAT_EXPR)
>      return false;
> @@ -10528,9 +10568,15 @@ supportable_narrowing_operation (enum tr
>        else
>         intermediate_type
>           = lang_hooks.types.type_for_mode (intermediate_mode, uns);
> -      interm_optab
> -       = optab_for_tree_code (VEC_PACK_TRUNC_EXPR, intermediate_type,
> -                              optab_default);
> +      if (VECTOR_BOOLEAN_TYPE_P (intermediate_type)
> +         && VECTOR_BOOLEAN_TYPE_P (prev_type)
> +         && intermediate_mode == prev_mode
> +         && SCALAR_INT_MODE_P (prev_mode))
> +       interm_optab = vec_pack_sbool_trunc_optab;
> +      else
> +       interm_optab
> +         = optab_for_tree_code (VEC_PACK_TRUNC_EXPR, intermediate_type,
> +                                optab_default);
>        if (!interm_optab
>           || ((icode1 = optab_handler (optab1, prev_mode)) == CODE_FOR_nothing)
>           || insn_data[icode1].operand[0].mode != intermediate_mode
> @@ -10542,9 +10588,13 @@ supportable_narrowing_operation (enum tr
>        (*multi_step_cvt)++;
>
>        if (insn_data[icode1].operand[0].mode == TYPE_MODE (narrow_vectype))
> -       return (!VECTOR_BOOLEAN_TYPE_P (vectype)
> -               || known_eq (TYPE_VECTOR_SUBPARTS (intermediate_type) * 2,
> -                            TYPE_VECTOR_SUBPARTS (narrow_vectype)));
> +       {
> +         if (!VECTOR_BOOLEAN_TYPE_P (vectype))
> +           return true;
> +         if (known_eq (TYPE_VECTOR_SUBPARTS (intermediate_type) * 2,
> +                       TYPE_VECTOR_SUBPARTS (narrow_vectype)))
> +           return true;
> +       }
>
>        prev_mode = intermediate_mode;
>        prev_type = intermediate_type;
> --- gcc/config/i386/i386.c.jj   2018-12-15 12:00:23.579569863 +0100
> +++ gcc/config/i386/i386.c      2018-12-17 12:55:11.964845121 +0100
> @@ -37622,7 +37622,7 @@ rdseed_step:
>             op0 = copy_to_mode_reg (GET_MODE (op0), op0);
>           emit_insn (gen (half, op0));
>           op0 = half;
> -         if (GET_MODE (op3) != VOIDmode)
> +         if (VECTOR_MODE_P (GET_MODE (op3)))
>             {
>               half = gen_reg_rtx (mode0);
>               if (!nonimmediate_operand (op3, GET_MODE (op3)))
> --- gcc/config/i386/sse.md.jj   2018-12-15 00:20:37.936738279 +0100
> +++ gcc/config/i386/sse.md      2018-12-17 18:13:48.109327499 +0100
> @@ -12435,22 +12435,59 @@ (define_expand "vec_pack_trunc_<mode>"
>  })
>
>  (define_expand "vec_pack_trunc_qi"
> -  [(set (match_operand:HI 0 ("register_operand"))
> -        (ior:HI (ashift:HI (zero_extend:HI (match_operand:QI 2 ("register_operand")))
> +  [(set (match_operand:HI 0 "register_operand")
> +       (ior:HI (ashift:HI (zero_extend:HI (match_operand:QI 2 "register_operand"))
>                             (const_int 8))
> -                (zero_extend:HI (match_operand:QI 1 ("register_operand")))))]
> +               (zero_extend:HI (match_operand:QI 1 "register_operand"))))]
>    "TARGET_AVX512F")
>
>  (define_expand "vec_pack_trunc_<mode>"
> -  [(set (match_operand:<DOUBLEMASKMODE> 0 ("register_operand"))
> -        (ior:<DOUBLEMASKMODE> (ashift:<DOUBLEMASKMODE> (zero_extend:<DOUBLEMASKMODE> (match_operand:SWI24 2 ("register_operand")))
> -                           (match_dup 3))
> -                (zero_extend:<DOUBLEMASKMODE> (match_operand:SWI24 1 ("register_operand")))))]
> +  [(set (match_operand:<DOUBLEMASKMODE> 0 "register_operand")
> +       (ior:<DOUBLEMASKMODE>
> +         (ashift:<DOUBLEMASKMODE>
> +           (zero_extend:<DOUBLEMASKMODE>
> +             (match_operand:SWI24 2 "register_operand"))
> +           (match_dup 3))
> +         (zero_extend:<DOUBLEMASKMODE>
> +           (match_operand:SWI24 1 "register_operand"))))]
>    "TARGET_AVX512BW"
>  {
>    operands[3] = GEN_INT (GET_MODE_BITSIZE (<MODE>mode));
>  })
>
> +(define_expand "vec_pack_sbool_trunc_qi"
> +  [(match_operand:QI 0 "register_operand")
> +   (match_operand:QI 1 "register_operand")
> +   (match_operand:QI 2 "register_operand")
> +   (match_operand:QI 3 "const_int_operand")]
> +  "TARGET_AVX512F"
> +{
> +  HOST_WIDE_INT nunits = INTVAL (operands[3]);
> +  rtx mask, tem1, tem2;
> +  if (nunits != 8 && nunits != 4)
> +    FAIL;
> +  mask = gen_reg_rtx (QImode);
> +  emit_move_insn (mask, GEN_INT ((1 << (nunits / 2)) - 1));
> +  tem1 = gen_reg_rtx (QImode);
> +  emit_insn (gen_kandqi (tem1, operands[1], mask));
> +  if (TARGET_AVX512DQ)
> +    {
> +      tem2 = gen_reg_rtx (QImode);
> +      emit_insn (gen_kashiftqi (tem2, operands[2],
> +                               GEN_INT (nunits / 2)));
> +    }
> +  else
> +    {
> +      tem2 = gen_reg_rtx (HImode);
> +      emit_insn (gen_kashifthi (tem2, lowpart_subreg (HImode, operands[2],
> +                                                     QImode),
> +                               GEN_INT (nunits / 2)));
> +      tem2 = lowpart_subreg (QImode, tem2, HImode);
> +    }
> +  emit_insn (gen_kiorqi (operands[0], tem1, tem2));
> +  DONE;
> +})
> +
>  (define_insn "<sse2_avx2>_packsswb<mask_name>"
>    [(set (match_operand:VI1_AVX512 0 "register_operand" "=x,x,v")
>         (vec_concat:VI1_AVX512
> @@ -14603,6 +14640,18 @@ (define_expand "vec_unpacku_lo_<mode>"
>    "TARGET_SSE2"
>    "ix86_expand_sse_unpack (operands[0], operands[1], true, false); DONE;")
>
> +(define_expand "vec_unpacks_sbool_lo_qi"
> +  [(match_operand:QI 0 "register_operand")
> +   (match_operand:QI 1 "register_operand")
> +   (match_operand:QI 2 "const_int_operand")]
> +  "TARGET_AVX512F"
> +{
> +  if (INTVAL (operands[2]) != 8 && INTVAL (operands[2]) != 4)
> +    FAIL;
> +  emit_move_insn (operands[0], operands[1]);
> +  DONE;
> +})
> +
>  (define_expand "vec_unpacks_lo_hi"
>    [(set (subreg:HI (match_operand:QI 0 "register_operand") 0)
>          (match_operand:HI 1 "register_operand"))]
> @@ -14624,6 +14673,29 @@ (define_expand "vec_unpacku_hi_<mode>"
>    "TARGET_SSE2"
>    "ix86_expand_sse_unpack (operands[0], operands[1], true, true); DONE;")
>
> +(define_expand "vec_unpacks_sbool_hi_qi"
> +  [(match_operand:QI 0 "register_operand")
> +   (match_operand:QI 1 "register_operand")
> +   (match_operand:QI 2 "const_int_operand")]
> +  "TARGET_AVX512F"
> +{
> +  HOST_WIDE_INT nunits = INTVAL (operands[2]);
> +  if (nunits != 8 && nunits != 4)
> +    FAIL;
> +  if (TARGET_AVX512DQ)
> +    emit_insn (gen_klshiftrtqi (operands[0], operands[1],
> +                               GEN_INT (nunits / 2)));
> +  else
> +    {
> +      rtx tem = gen_reg_rtx (HImode);
> +      emit_insn (gen_klshiftrthi (tem, lowpart_subreg (HImode, operands[1],
> +                                                      QImode),
> +                                 GEN_INT (nunits / 2)));
> +      emit_move_insn (operands[0], lowpart_subreg (QImode, tem, HImode));
> +    }
> +  DONE;
> +})
> +
>  (define_expand "vec_unpacks_hi_hi"
>    [(parallel
>       [(set (subreg:HI (match_operand:QI 0 "register_operand") 0)
> --- gcc/doc/md.texi.jj  2018-11-27 16:37:37.000472599 +0100
> +++ gcc/doc/md.texi     2018-12-17 18:47:51.115042917 +0100
> @@ -5428,6 +5428,16 @@ are vectors of the same mode having N in
>  of size S@.  Operand 0 is the resulting vector in which 2*N elements of
>  size N/2 are concatenated after narrowing them down using truncation.
>
> +@cindex @code{vec_pack_sbool_trunc_@var{m}} instruction pattern
> +@item @samp{vec_pack_sbool_trunc_@var{m}}
> +Narrow and merge the elements of two vectors.  Operands 1 and 2 are vectors
> +of the same type having N boolean elements.  Operand 0 is the resulting
> +vector in which 2*N elements are concatenated.  The last operand (operand 3)
> +is the number of elements in the output vector 2*N as a @code{CONST_INT}.
> +This instruction pattern is used when all the vector input and output
> +operands have the same scalar mode @var{m} and thus using
> +@code{vec_pack_trunc_@var{m}} would be ambiguous.
> +
>  @cindex @code{vec_pack_ssat_@var{m}} instruction pattern
>  @cindex @code{vec_pack_usat_@var{m}} instruction pattern
>  @item @samp{vec_pack_ssat_@var{m}}, @samp{vec_pack_usat_@var{m}}
> @@ -5470,6 +5480,17 @@ integral elements.  The input vector (op
>  Widen (promote) the high/low elements of the vector using zero extension and
>  place the resulting N/2 values of size 2*S in the output vector (operand 0).
>
> +@cindex @code{vec_unpacks_sbool_hi_@var{m}} instruction pattern
> +@cindex @code{vec_unpacks_sbool_lo_@var{m}} instruction pattern
> +@item @samp{vec_unpacks_sbool_hi_@var{m}}, @samp{vec_unpacks_sbool_lo_@var{m}}
> +Extract the high/low part of a vector of boolean elements that have scalar
> +mode @var{m}.  The input vector (operand 1) has N elements, the output
> +vector (operand 0) has N/2 elements.  The last operand (operand 2) is the
> +number of elements of the input vector N as a @code{CONST_INT}.  These
> +patterns are used if both the input and output vectors have the same scalar
> +mode @var{m} and thus using @code{vec_unpacks_hi_@var{m}} or
> +@code{vec_unpacks_lo_@var{m}} would be ambiguous.
> +
>  @cindex @code{vec_unpacks_float_hi_@var{m}} instruction pattern
>  @cindex @code{vec_unpacks_float_lo_@var{m}} instruction pattern
>  @cindex @code{vec_unpacku_float_hi_@var{m}} instruction pattern
> --- gcc/testsuite/gcc.target/i386/avx512f-pr88513-1.c.jj        2018-12-17 13:20:51.486755223 +0100
> +++ gcc/testsuite/gcc.target/i386/avx512f-pr88513-1.c   2018-12-17 13:21:14.441379518 +0100
> @@ -0,0 +1,16 @@
> +/* PR target/88513 */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fopenmp-simd -mavx512f -mtune=intel -mprefer-vector-width=512 -fno-vect-cost-model" } */
> +/* { dg-require-effective-target avx512f } */
> +
> +#define CHECK_H "avx512f-check.h"
> +
> +#include "../../gcc.dg/vect/pr59591-1.c"
> +
> +#include CHECK_H
> +
> +static void
> +test_512 (void)
> +{
> +  bar ();
> +}
> --- gcc/testsuite/gcc.target/i386/avx512f-pr88513-2.c.jj        2018-12-17 13:21:25.028206248 +0100
> +++ gcc/testsuite/gcc.target/i386/avx512f-pr88513-2.c   2018-12-17 13:21:32.636081728 +0100
> @@ -0,0 +1,16 @@
> +/* PR target/88513 */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fopenmp-simd -mavx512f -mtune=intel -mprefer-vector-width=512 -fno-vect-cost-model" } */
> +/* { dg-require-effective-target avx512f } */
> +
> +#define CHECK_H "avx512f-check.h"
> +
> +#include "../../gcc.dg/vect/pr59591-2.c"
> +
> +#include CHECK_H
> +
> +static void
> +test_512 (void)
> +{
> +  bar ();
> +}
> --- gcc/testsuite/gcc.target/i386/avx512vl-pr88464-1.c.jj       2018-12-17 18:50:06.197844506 +0100
> +++ gcc/testsuite/gcc.target/i386/avx512vl-pr88464-1.c  2018-12-17 18:50:56.496025932 +0100
> @@ -0,0 +1,7 @@
> +/* PR tree-optimization/88464 */
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mavx512vl -mprefer-vector-width=256 -mtune=skylake-avx512 -fdump-tree-vect-details" } */
> +/* { dg-final { scan-tree-dump-times "loop vectorized using 32 byte vectors" 4 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 4 "vect" } } */
> +
> +#include "avx512f-pr88464-1.c"
> --- gcc/testsuite/gcc.target/i386/avx512vl-pr88464-2.c.jj       2018-12-17 18:52:08.277857736 +0100
> +++ gcc/testsuite/gcc.target/i386/avx512vl-pr88464-2.c  2018-12-17 18:55:09.416909825 +0100
> @@ -0,0 +1,20 @@
> +/* PR tree-optimization/88464 */
> +/* { dg-do run { target { avx512vl } } } */
> +/* { dg-options "-O3 -mavx512vl -mprefer-vector-width=256 -mtune=skylake-avx512" } */
> +
> +#define AVX512VL
> +#define AVX512F_LEN 512
> +#define AVX512F_LEN_HALF 256
> +
> +#include "avx512f-pr88464-2.c"
> +
> +static void
> +test_256 (void)
> +{
> +  avx512f_test ();
> +}
> +
> +static void
> +test_128 (void)
> +{
> +}
> --- gcc/testsuite/gcc.target/i386/avx512vl-pr88464-3.c.jj       2018-12-17 19:04:42.820578091 +0100
> +++ gcc/testsuite/gcc.target/i386/avx512vl-pr88464-3.c  2018-12-17 19:04:53.262408158 +0100
> @@ -0,0 +1,7 @@
> +/* PR tree-optimization/88464 */
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mavx512vl -mprefer-vector-width=128 -mtune=skylake-avx512 -fdump-tree-vect-details" } */
> +/* { dg-final { scan-tree-dump-times "loop vectorized using 16 byte vectors" 4 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 4 "vect" } } */
> +
> +#include "avx512f-pr88464-1.c"
> --- gcc/testsuite/gcc.target/i386/avx512vl-pr88464-4.c.jj       2018-12-17 19:05:05.488209195 +0100
> +++ gcc/testsuite/gcc.target/i386/avx512vl-pr88464-4.c  2018-12-17 19:05:13.809073780 +0100
> @@ -0,0 +1,20 @@
> +/* PR tree-optimization/88464 */
> +/* { dg-do run { target { avx512vl } } } */
> +/* { dg-options "-O3 -mavx512vl -mprefer-vector-width=128 -mtune=skylake-avx512" } */
> +
> +#define AVX512VL
> +#define AVX512F_LEN 512
> +#define AVX512F_LEN_HALF 256
> +
> +#include "avx512f-pr88464-2.c"
> +
> +static void
> +test_256 (void)
> +{
> +  avx512f_test ();
> +}
> +
> +static void
> +test_128 (void)
> +{
> +}
> --- gcc/testsuite/gcc.target/i386/avx512vl-pr88513-1.c.jj       2018-12-17 13:09:21.035010881 +0100
> +++ gcc/testsuite/gcc.target/i386/avx512vl-pr88513-1.c  2018-12-17 13:12:02.407380317 +0100
> @@ -0,0 +1,24 @@
> +/* PR target/88513 */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fopenmp-simd -mavx512vl -mtune=intel -mprefer-vector-width=128 -fno-vect-cost-model" } */
> +/* { dg-require-effective-target avx512vl } */
> +
> +#define AVX512VL
> +#define AVX512F_LEN 512
> +#define AVX512F_LEN_HALF 256
> +#define CHECK_H "avx512f-check.h"
> +
> +#include "../../gcc.dg/vect/pr59591-1.c"
> +
> +#include CHECK_H
> +
> +static void
> +test_256 (void)
> +{
> +  bar ();
> +}
> +
> +static void
> +test_128 (void)
> +{
> +}
> --- gcc/testsuite/gcc.target/i386/avx512vl-pr88513-2.c.jj       2018-12-17 13:09:24.051961700 +0100
> +++ gcc/testsuite/gcc.target/i386/avx512vl-pr88513-2.c  2018-12-17 13:12:25.042011342 +0100
> @@ -0,0 +1,24 @@
> +/* PR target/88513 */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fopenmp-simd -mavx512vl -mtune=intel -mprefer-vector-width=128 -fno-vect-cost-model" } */
> +/* { dg-require-effective-target avx512vl } */
> +
> +#define AVX512VL
> +#define AVX512F_LEN 512
> +#define AVX512F_LEN_HALF 256
> +#define CHECK_H "avx512f-check.h"
> +
> +#include "../../gcc.dg/vect/pr59591-2.c"
> +
> +#include CHECK_H
> +
> +static void
> +test_256 (void)
> +{
> +  bar ();
> +}
> +
> +static void
> +test_128 (void)
> +{
> +}
> --- gcc/testsuite/gcc.target/i386/avx512vl-pr88513-3.c.jj       2018-12-17 13:19:26.298144342 +0100
> +++ gcc/testsuite/gcc.target/i386/avx512vl-pr88513-3.c  2018-12-17 13:20:05.584503930 +0100
> @@ -0,0 +1,24 @@
> +/* PR target/88513 */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fopenmp-simd -mavx512vl -mtune=intel -mprefer-vector-width=256 -fno-vect-cost-model" } */
> +/* { dg-require-effective-target avx512vl } */
> +
> +#define AVX512VL
> +#define AVX512F_LEN 512
> +#define AVX512F_LEN_HALF 256
> +#define CHECK_H "avx512f-check.h"
> +
> +#include "../../gcc.dg/vect/pr59591-1.c"
> +
> +#include CHECK_H
> +
> +static void
> +test_256 (void)
> +{
> +  bar ();
> +}
> +
> +static void
> +test_128 (void)
> +{
> +}
> --- gcc/testsuite/gcc.target/i386/avx512vl-pr88513-4.c.jj       2018-12-17 13:20:12.998383071 +0100
> +++ gcc/testsuite/gcc.target/i386/avx512vl-pr88513-4.c  2018-12-17 13:20:21.088251199 +0100
> @@ -0,0 +1,24 @@
> +/* PR target/88513 */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fopenmp-simd -mavx512vl -mtune=intel -mprefer-vector-width=256 -fno-vect-cost-model" } */
> +/* { dg-require-effective-target avx512vl } */
> +
> +#define AVX512VL
> +#define AVX512F_LEN 512
> +#define AVX512F_LEN_HALF 256
> +#define CHECK_H "avx512f-check.h"
> +
> +#include "../../gcc.dg/vect/pr59591-2.c"
> +
> +#include CHECK_H
> +
> +static void
> +test_256 (void)
> +{
> +  bar ();
> +}
> +
> +static void
> +test_128 (void)
> +{
> +}
> --- gcc/testsuite/gcc.target/i386/avx512vl-pr88514-1.c.jj       2018-12-17 13:03:23.934831994 +0100
> +++ gcc/testsuite/gcc.target/i386/avx512vl-pr88514-1.c  2018-12-17 14:25:21.407699798 +0100
> @@ -0,0 +1,5 @@
> +/* PR target/88514 */
> +/* { dg-do assemble { target avx512vl } } */
> +/* { dg-options "-Ofast -mavx512vl -mtune=intel -mprefer-vector-width=128" } */
> +
> +#include "avx512vl-pr79299-1.c"
> --- gcc/testsuite/gcc.target/i386/avx512vl-pr88514-2.c.jj       2018-12-17 13:03:34.080666665 +0100
> +++ gcc/testsuite/gcc.target/i386/avx512vl-pr88514-2.c  2018-12-17 14:25:32.814514127 +0100
> @@ -0,0 +1,5 @@
> +/* PR target/88514 */
> +/* { dg-do assemble { target avx512vl } } */
> +/* { dg-options "-Ofast -mavx512vl -mtune=intel -mprefer-vector-width=256" } */
> +
> +#include "avx512vl-pr79299-1.c"
> --- gcc/testsuite/gcc.target/i386/avx512vl-pr88514-3.c.jj       2018-12-17 13:03:50.291402408 +0100
> +++ gcc/testsuite/gcc.target/i386/avx512vl-pr88514-3.c  2018-12-17 14:25:48.827253484 +0100
> @@ -0,0 +1,5 @@
> +/* PR target/88514 */
> +/* { dg-do assemble { target avx512vl } } */
> +/* { dg-options "-Ofast -mavx512vl -mtune=intel -mprefer-vector-width=512" } */
> +
> +#include "avx512vl-pr79299-1.c"
>
>         Jakub
Jakub Jelinek Dec. 18, 2018, 8:01 a.m. UTC | #2
On Tue, Dec 18, 2018 at 08:25:37AM +0100, Uros Bizjak wrote:
> >         <case IX86_BUILTIN_GATHER3ALTDIV8SF>: Check for non-VECTOR_MODE_P
> >         rather than VOIDmode.
> 
> This entry doesn't match the change, you are checking for
> VECTOR_MODE_P. On a related note, should similar

Ok, I'll write: Check for VECTOR_MODE_P rather than non-VOIDmode.

> IX86_BUILTIN_GATHER3ALTDIV16{SF,SI} builtins be changed in the same
> way?

No, because IX86_BUILTIN_GATHER3ALTDIV16S{F,I} has always the HImode
for the mask argument (sometimes CONST_INT with VOIDmode) and wants QImode.

The reason we need the change in this patch is that we have the same case
handling:
        case IX86_BUILTIN_GATHERALTDIV8SF:
        case IX86_BUILTIN_GATHERALTDIV8SI:
which have V8S[IF]mode for the mask argument and we want to do what is
inside of the if where the patch changes the guard, and
        case IX86_BUILTIN_GATHER3ALTDIV8SF:
        case IX86_BUILTIN_GATHER3ALTDIV8SI:
which have QImode mask argument (sometimes CONST_INT with VOIDmode) and
wants QImode (just cares about the low 4 bits of it rather than all 8 bits
it was passed in).  We could use a kand, but the instruction ignores the
argument and we'd need to load the mask immediate 15 into a register, copy
it into a mask register and then kand.

	Jakub
Richard Biener Dec. 18, 2018, 11:12 a.m. UTC | #3
On Mon, 17 Dec 2018, Jakub Jelinek wrote:

> Hi!
> 
> Some of the following testcases ICE, because I was assuming that
> VEC_UNPACK_{LO,HI}_EXPR and VEC_PACK_TRUNC_EXPR just work on the
> VECTOR_BOOLEAN_TYPE_P mask types that AVX512* has (with scalar modes),
> but they really only work if the wider mode is different from the narrower
> one, so e.g. one can extract the lo or hi half of a nunits 16 VECTOR_BOOLEAN_TYPE_P
> type with VEC_UNPACK_*_EXPR to have a HImode -> QImode expander, or
> combine two nunits 8 halves into one 16 nunits one, i.e. QImode + QImode ->
> HImode with VEC_PACK_TRUNC_EXPR, but for bitmasks with fewer bits it is
> ambigious - either we need to extract lo/hi half of nunits 8
> VECTOR_BOOLEAN_TYPE_P or lo/hi half of nunits 4 VECTOR_BOOLEAN_TYPE_P, both
> would be QImode -> QImode operation and from the mode one can't figure out
> which one is which.  For VEC_PACK_TRUNC_EXPR it is even more complicated,
> because we use the operand mode as the name in the optab, so
> vec_pack_trunc_qi is already used the the 8 + 8 -> 16 nunits one which gives
> a HImode result and there is nothing left for the 4 + 4 -> 8 and 2 + 2 -> 4.
> 
> When not assuming it works, e.g. if I just used
> supportable_widening_operation in the gather and scatter INTEGRAL_TYPE_P
> masktype handling code, it would lead just to missed optimizations, because
> the vectorizer just punted in those cases (note, it doesn't affect
> -mprefer-vector-width=512 case, because even for DF/DImode we use 8 bits
> already, but the cases when AVX512VL is used with
> -mprefer-vector-width={128,256}.
> 
> The following patch introduces 3 new optabs which are like
> vec_pack_trunc_optab resp. vec_unpacks_{lo,hi}_optab, except that their
> expanders take another argument - CONST_INT representing the number of units
> in the wider of the two bitmask (VECTOR_BOOLEAN_TYPE_P) types and is meant
> to be used for the cases where both modes have different
> TYPE_VECTOR_SUBPARTS, but the same TYPE_MODE.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2018-12-17  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR target/88513
> 	PR target/88514
> 	* optabs.def (vec_pack_sbool_trunc_optab, vec_unpacks_sbool_hi_optab,
> 	vec_unpacks_sbool_lo_optab): New optabs.
> 	* optabs.c (expand_widen_pattern_expr): Use vec_unpacks_sbool_*_optab
> 	and pass additional argument if both input and target have the same
> 	scalar mode of VECTOR_BOOLEAN_TYPE_P vectors.
> 	* expr.c (expand_expr_real_2) <case VEC_PACK_TRUNC_EXPR>: Handle
> 	VECTOR_BOOLEAN_TYPE_P pack where result has the same scalar mode
> 	as the operands using vec_pack_sbool_trunc_optab.
> 	* tree-vect-stmts.c (supportable_widening_operation): Use
> 	vec_unpacks_sbool_{lo,hi}_optab for VECTOR_BOOLEAN_TYPE_P conversions
> 	where both wider_vectype and vectype have the same scalar mode.
> 	(supportable_narrowing_operation): Similarly use
> 	vec_pack_sbool_trunc_optab if narrow_vectype and vectype have the same
> 	scalar mode.
> 	* config/i386/i386.c (ix86_get_builtin)
> 	<case IX86_BUILTIN_GATHER3ALTDIV8SF>: Check for non-VECTOR_MODE_P
> 	rather than VOIDmode.
> 	* config/i386/sse.md (vec_pack_trunc_qi, vec_pack_trunc_<mode>):
> 	Remove useless ()s around "register_operand", formatting fixes.
> 	(vec_pack_sbool_trunc_qi, vec_unpacks_sbool_lo_qi,
> 	vec_unpacks_sbool_hi_qi): New expanders.
> 	* doc/md.texi (vec_pack_sbool_trunc_M, vec_unpacks_sbool_hi_M,
> 	vec_unpacks_sbool_lo_M): Document.
> 
> 	* gcc.target/i386/avx512f-pr88513-1.c: New test.
> 	* gcc.target/i386/avx512f-pr88513-2.c: New test.
> 	* gcc.target/i386/avx512vl-pr88464-1.c: New test.
> 	* gcc.target/i386/avx512vl-pr88464-2.c: New test.
> 	* gcc.target/i386/avx512vl-pr88464-3.c: New test.
> 	* gcc.target/i386/avx512vl-pr88464-4.c: New test.
> 	* gcc.target/i386/avx512vl-pr88513-1.c: New test.
> 	* gcc.target/i386/avx512vl-pr88513-2.c: New test.
> 	* gcc.target/i386/avx512vl-pr88513-3.c: New test.
> 	* gcc.target/i386/avx512vl-pr88513-4.c: New test.
> 	* gcc.target/i386/avx512vl-pr88514-1.c: New test.
> 	* gcc.target/i386/avx512vl-pr88514-2.c: New test.
> 	* gcc.target/i386/avx512vl-pr88514-3.c: New test.
> 
> --- gcc/optabs.def.jj	2018-12-14 20:35:37.883126125 +0100
> +++ gcc/optabs.def	2018-12-17 15:18:27.817103784 +0100
> @@ -335,6 +335,7 @@ OPTAB_D (vec_pack_sfix_trunc_optab, "vec
>  OPTAB_D (vec_pack_ssat_optab, "vec_pack_ssat_$a")
>  OPTAB_D (vec_pack_trunc_optab, "vec_pack_trunc_$a")
>  OPTAB_D (vec_pack_ufix_trunc_optab, "vec_pack_ufix_trunc_$a")
> +OPTAB_D (vec_pack_sbool_trunc_optab, "vec_pack_sbool_trunc_$a")
>  OPTAB_D (vec_pack_usat_optab, "vec_pack_usat_$a")
>  OPTAB_D (vec_packs_float_optab, "vec_packs_float_$a")
>  OPTAB_D (vec_packu_float_optab, "vec_packu_float_$a")
> @@ -350,6 +351,8 @@ OPTAB_D (vec_unpacks_float_hi_optab, "ve
>  OPTAB_D (vec_unpacks_float_lo_optab, "vec_unpacks_float_lo_$a")
>  OPTAB_D (vec_unpacks_hi_optab, "vec_unpacks_hi_$a")
>  OPTAB_D (vec_unpacks_lo_optab, "vec_unpacks_lo_$a")
> +OPTAB_D (vec_unpacks_sbool_hi_optab, "vec_unpacks_sbool_hi_$a")
> +OPTAB_D (vec_unpacks_sbool_lo_optab, "vec_unpacks_sbool_lo_$a")
>  OPTAB_D (vec_unpacku_float_hi_optab, "vec_unpacku_float_hi_$a")
>  OPTAB_D (vec_unpacku_float_lo_optab, "vec_unpacku_float_lo_$a")
>  OPTAB_D (vec_unpacku_hi_optab, "vec_unpacku_hi_$a")
> --- gcc/optabs.c.jj	2018-12-01 00:25:09.334989051 +0100
> +++ gcc/optabs.c	2018-12-17 16:19:10.331102731 +0100
> @@ -256,6 +256,7 @@ expand_widen_pattern_expr (sepops ops, r
>    enum insn_code icode;
>    int nops = TREE_CODE_LENGTH (ops->code);
>    int op;
> +  bool sbool = false;
>  
>    oprnd0 = ops->op0;
>    tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
> @@ -265,6 +266,22 @@ expand_widen_pattern_expr (sepops ops, r
>         for these ops.  */
>      widen_pattern_optab
>        = optab_for_tree_code (ops->code, ops->type, optab_default);
> +  else if ((ops->code == VEC_UNPACK_HI_EXPR
> +	    || ops->code == VEC_UNPACK_LO_EXPR)
> +	   && VECTOR_BOOLEAN_TYPE_P (ops->type)
> +	   && VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (oprnd0))
> +	   && TYPE_MODE (ops->type) == TYPE_MODE (TREE_TYPE (oprnd0))
> +	   && SCALAR_INT_MODE_P (TYPE_MODE (ops->type)))
> +    {
> +      /* For VEC_UNPACK_{LO,HI}_EXPR if the mode of op0 and result is
> +	 the same scalar mode for VECTOR_BOOLEAN_TYPE_P vectors, use
> +	 vec_unpacks_sbool_{lo,hi}_optab, so that we can pass in
> +	 the pattern number of elements in the wider vector.  */
> +      widen_pattern_optab
> +	= (ops->code == VEC_UNPACK_HI_EXPR
> +	   ? vec_unpacks_sbool_hi_optab : vec_unpacks_sbool_lo_optab);
> +      sbool = true;
> +    }
>    else
>      widen_pattern_optab
>        = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
> @@ -282,6 +299,12 @@ expand_widen_pattern_expr (sepops ops, r
>        oprnd1 = ops->op1;
>        tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
>      }
> +  else if (sbool)
> +    {
> +      nops = 2;
> +      op1 = GEN_INT (TYPE_VECTOR_SUBPARTS (TREE_TYPE (oprnd0)).to_constant ());
> +      tmode1 = tmode0;
> +    }
>  
>    /* The last operand is of a wider mode than the rest of the operands.  */
>    if (nops == 2)
> --- gcc/expr.c.jj	2018-12-14 20:37:02.833751212 +0100
> +++ gcc/expr.c	2018-12-17 18:33:12.855336376 +0100
> @@ -9493,12 +9493,34 @@ expand_expr_real_2 (sepops ops, rtx targ
>        gcc_assert (target);
>        return target;
>  
> -    case VEC_PACK_TRUNC_EXPR:
>      case VEC_PACK_SAT_EXPR:
>      case VEC_PACK_FIX_TRUNC_EXPR:
>        mode = TYPE_MODE (TREE_TYPE (treeop0));
>        goto binop;
>  
> +    case VEC_PACK_TRUNC_EXPR:
> +      if (VECTOR_BOOLEAN_TYPE_P (type)
> +	  && VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (treeop0))
> +	  && mode == TYPE_MODE (TREE_TYPE (treeop0))
> +	  && SCALAR_INT_MODE_P (mode))
> +	{
> +	  struct expand_operand eops[4];
> +	  machine_mode imode = TYPE_MODE (TREE_TYPE (treeop0));
> +	  expand_operands (treeop0, treeop1,
> +			   subtarget, &op0, &op1, EXPAND_NORMAL);
> +	  this_optab = vec_pack_sbool_trunc_optab;
> +	  enum insn_code icode = optab_handler (this_optab, imode);
> +	  create_output_operand (&eops[0], target, mode);
> +	  create_convert_operand_from (&eops[1], op0, imode, false);
> +	  create_convert_operand_from (&eops[2], op1, imode, false);
> +	  temp = GEN_INT (TYPE_VECTOR_SUBPARTS (type).to_constant ());
> +	  create_input_operand (&eops[3], temp, imode);
> +	  expand_insn (icode, 4, eops);
> +	  return eops[0].value;
> +	}
> +      mode = TYPE_MODE (TREE_TYPE (treeop0));
> +      goto binop;
> +
>      case VEC_PACK_FLOAT_EXPR:
>        mode = TYPE_MODE (TREE_TYPE (treeop0));
>        expand_operands (treeop0, treeop1,
> --- gcc/tree-vect-stmts.c.jj	2018-12-15 12:00:23.570570010 +0100
> +++ gcc/tree-vect-stmts.c	2018-12-17 18:23:57.899374515 +0100
> @@ -10313,6 +10313,17 @@ supportable_widening_operation (enum tre
>        optab1 = optab_for_tree_code (c1, vectype_out, optab_default);
>        optab2 = optab_for_tree_code (c2, vectype_out, optab_default);
>      }
> +  else if (CONVERT_EXPR_CODE_P (code)
> +	   && VECTOR_BOOLEAN_TYPE_P (wide_vectype)
> +	   && VECTOR_BOOLEAN_TYPE_P (vectype)
> +	   && TYPE_MODE (wide_vectype) == TYPE_MODE (vectype)
> +	   && SCALAR_INT_MODE_P (TYPE_MODE (vectype)))
> +    {
> +      /* If the input and result modes are the same, a different optab
> +	 is needed where we pass in the number of units in vectype.  */
> +      optab1 = vec_unpacks_sbool_lo_optab;
> +      optab2 = vec_unpacks_sbool_hi_optab;
> +    }
>    else
>      {
>        optab1 = optab_for_tree_code (c1, vectype, optab_default);
> @@ -10332,12 +10343,16 @@ supportable_widening_operation (enum tre
>  
>    if (insn_data[icode1].operand[0].mode == TYPE_MODE (wide_vectype)
>        && insn_data[icode2].operand[0].mode == TYPE_MODE (wide_vectype))
> +    {
> +      if (!VECTOR_BOOLEAN_TYPE_P (vectype))
> +	return true;
>        /* For scalar masks we may have different boolean
>  	 vector types having the same QImode.  Thus we
>  	 add additional check for elements number.  */
> -    return (!VECTOR_BOOLEAN_TYPE_P (vectype)
> -	    || known_eq (TYPE_VECTOR_SUBPARTS (vectype),
> -			 TYPE_VECTOR_SUBPARTS (wide_vectype) * 2));
> +      if (known_eq (TYPE_VECTOR_SUBPARTS (vectype),
> +		    TYPE_VECTOR_SUBPARTS (wide_vectype) * 2))
> +	return true;
> +    }
>  
>    /* Check if it's a multi-step conversion that can be done using intermediate
>       types.  */
> @@ -10367,8 +10382,21 @@ supportable_widening_operation (enum tre
>  	  = lang_hooks.types.type_for_mode (intermediate_mode,
>  					    TYPE_UNSIGNED (prev_type));
>  
> -      optab3 = optab_for_tree_code (c1, intermediate_type, optab_default);
> -      optab4 = optab_for_tree_code (c2, intermediate_type, optab_default);
> +      if (VECTOR_BOOLEAN_TYPE_P (intermediate_type)
> +	  && VECTOR_BOOLEAN_TYPE_P (prev_type)
> +	  && intermediate_mode == prev_mode
> +	  && SCALAR_INT_MODE_P (prev_mode))
> +	{
> +	  /* If the input and result modes are the same, a different optab
> +	     is needed where we pass in the number of units in vectype.  */
> +	  optab3 = vec_unpacks_sbool_lo_optab;
> +	  optab4 = vec_unpacks_sbool_hi_optab;
> +	}
> +      else
> +	{
> +	  optab3 = optab_for_tree_code (c1, intermediate_type, optab_default);
> +	  optab4 = optab_for_tree_code (c2, intermediate_type, optab_default);
> +	}
>  
>        if (!optab3 || !optab4
>            || (icode1 = optab_handler (optab1, prev_mode)) == CODE_FOR_nothing
> @@ -10386,9 +10414,13 @@ supportable_widening_operation (enum tre
>  
>        if (insn_data[icode1].operand[0].mode == TYPE_MODE (wide_vectype)
>  	  && insn_data[icode2].operand[0].mode == TYPE_MODE (wide_vectype))
> -	return (!VECTOR_BOOLEAN_TYPE_P (vectype)
> -		|| known_eq (TYPE_VECTOR_SUBPARTS (intermediate_type),
> -			     TYPE_VECTOR_SUBPARTS (wide_vectype) * 2));
> +	{
> +	  if (!VECTOR_BOOLEAN_TYPE_P (vectype))
> +	    return true;
> +	  if (known_eq (TYPE_VECTOR_SUBPARTS (intermediate_type),
> +			TYPE_VECTOR_SUBPARTS (wide_vectype) * 2))
> +	    return true;
> +	}
>  
>        prev_type = intermediate_type;
>        prev_mode = intermediate_mode;
> @@ -10441,26 +10473,30 @@ supportable_narrowing_operation (enum tr
>      {
>      CASE_CONVERT:
>        c1 = VEC_PACK_TRUNC_EXPR;
> +      if (VECTOR_BOOLEAN_TYPE_P (narrow_vectype)
> +	  && VECTOR_BOOLEAN_TYPE_P (vectype)
> +	  && TYPE_MODE (narrow_vectype) == TYPE_MODE (vectype)
> +	  && SCALAR_INT_MODE_P (TYPE_MODE (vectype)))
> +	optab1 = vec_pack_sbool_trunc_optab;
> +      else
> +	optab1 = optab_for_tree_code (c1, vectype, optab_default);
>        break;
>  
>      case FIX_TRUNC_EXPR:
>        c1 = VEC_PACK_FIX_TRUNC_EXPR;
> +      /* The signedness is determined from output operand.  */
> +      optab1 = optab_for_tree_code (c1, vectype_out, optab_default);
>        break;
>  
>      case FLOAT_EXPR:
>        c1 = VEC_PACK_FLOAT_EXPR;
> +      optab1 = optab_for_tree_code (c1, vectype, optab_default);
>        break;
>  
>      default:
>        gcc_unreachable ();
>      }
>  
> -  if (code == FIX_TRUNC_EXPR)
> -    /* The signedness is determined from output operand.  */
> -    optab1 = optab_for_tree_code (c1, vectype_out, optab_default);
> -  else
> -    optab1 = optab_for_tree_code (c1, vectype, optab_default);
> -
>    if (!optab1)
>      return false;
>  
> @@ -10471,12 +10507,16 @@ supportable_narrowing_operation (enum tr
>    *code1 = c1;
>  
>    if (insn_data[icode1].operand[0].mode == TYPE_MODE (narrow_vectype))
> -    /* For scalar masks we may have different boolean
> -       vector types having the same QImode.  Thus we
> -       add additional check for elements number.  */
> -    return (!VECTOR_BOOLEAN_TYPE_P (vectype)
> -	    || known_eq (TYPE_VECTOR_SUBPARTS (vectype) * 2,
> -			 TYPE_VECTOR_SUBPARTS (narrow_vectype)));
> +    {
> +      if (!VECTOR_BOOLEAN_TYPE_P (vectype))
> +	return true;
> +      /* For scalar masks we may have different boolean
> +	 vector types having the same QImode.  Thus we
> +	 add additional check for elements number.  */
> +      if (known_eq (TYPE_VECTOR_SUBPARTS (vectype) * 2,
> +		    TYPE_VECTOR_SUBPARTS (narrow_vectype)))
> +	return true;
> +    }
>  
>    if (code == FLOAT_EXPR)
>      return false;
> @@ -10528,9 +10568,15 @@ supportable_narrowing_operation (enum tr
>        else
>  	intermediate_type
>  	  = lang_hooks.types.type_for_mode (intermediate_mode, uns);
> -      interm_optab
> -	= optab_for_tree_code (VEC_PACK_TRUNC_EXPR, intermediate_type,
> -			       optab_default);
> +      if (VECTOR_BOOLEAN_TYPE_P (intermediate_type)
> +	  && VECTOR_BOOLEAN_TYPE_P (prev_type)
> +	  && intermediate_mode == prev_mode
> +	  && SCALAR_INT_MODE_P (prev_mode))
> +	interm_optab = vec_pack_sbool_trunc_optab;
> +      else
> +	interm_optab
> +	  = optab_for_tree_code (VEC_PACK_TRUNC_EXPR, intermediate_type,
> +				 optab_default);
>        if (!interm_optab
>  	  || ((icode1 = optab_handler (optab1, prev_mode)) == CODE_FOR_nothing)
>  	  || insn_data[icode1].operand[0].mode != intermediate_mode
> @@ -10542,9 +10588,13 @@ supportable_narrowing_operation (enum tr
>        (*multi_step_cvt)++;
>  
>        if (insn_data[icode1].operand[0].mode == TYPE_MODE (narrow_vectype))
> -	return (!VECTOR_BOOLEAN_TYPE_P (vectype)
> -		|| known_eq (TYPE_VECTOR_SUBPARTS (intermediate_type) * 2,
> -			     TYPE_VECTOR_SUBPARTS (narrow_vectype)));
> +	{
> +	  if (!VECTOR_BOOLEAN_TYPE_P (vectype))
> +	    return true;
> +	  if (known_eq (TYPE_VECTOR_SUBPARTS (intermediate_type) * 2,
> +			TYPE_VECTOR_SUBPARTS (narrow_vectype)))
> +	    return true;
> +	}
>  
>        prev_mode = intermediate_mode;
>        prev_type = intermediate_type;
> --- gcc/config/i386/i386.c.jj	2018-12-15 12:00:23.579569863 +0100
> +++ gcc/config/i386/i386.c	2018-12-17 12:55:11.964845121 +0100
> @@ -37622,7 +37622,7 @@ rdseed_step:
>  	    op0 = copy_to_mode_reg (GET_MODE (op0), op0);
>  	  emit_insn (gen (half, op0));
>  	  op0 = half;
> -	  if (GET_MODE (op3) != VOIDmode)
> +	  if (VECTOR_MODE_P (GET_MODE (op3)))
>  	    {
>  	      half = gen_reg_rtx (mode0);
>  	      if (!nonimmediate_operand (op3, GET_MODE (op3)))
> --- gcc/config/i386/sse.md.jj	2018-12-15 00:20:37.936738279 +0100
> +++ gcc/config/i386/sse.md	2018-12-17 18:13:48.109327499 +0100
> @@ -12435,22 +12435,59 @@ (define_expand "vec_pack_trunc_<mode>"
>  })
>  
>  (define_expand "vec_pack_trunc_qi"
> -  [(set (match_operand:HI 0 ("register_operand"))
> -        (ior:HI (ashift:HI (zero_extend:HI (match_operand:QI 2 ("register_operand")))
> +  [(set (match_operand:HI 0 "register_operand")
> +	(ior:HI (ashift:HI (zero_extend:HI (match_operand:QI 2 "register_operand"))
>                             (const_int 8))
> -                (zero_extend:HI (match_operand:QI 1 ("register_operand")))))]
> +		(zero_extend:HI (match_operand:QI 1 "register_operand"))))]
>    "TARGET_AVX512F")
>  
>  (define_expand "vec_pack_trunc_<mode>"
> -  [(set (match_operand:<DOUBLEMASKMODE> 0 ("register_operand"))
> -        (ior:<DOUBLEMASKMODE> (ashift:<DOUBLEMASKMODE> (zero_extend:<DOUBLEMASKMODE> (match_operand:SWI24 2 ("register_operand")))
> -                           (match_dup 3))
> -                (zero_extend:<DOUBLEMASKMODE> (match_operand:SWI24 1 ("register_operand")))))]
> +  [(set (match_operand:<DOUBLEMASKMODE> 0 "register_operand")
> +	(ior:<DOUBLEMASKMODE>
> +	  (ashift:<DOUBLEMASKMODE>
> +	    (zero_extend:<DOUBLEMASKMODE>
> +	      (match_operand:SWI24 2 "register_operand"))
> +	    (match_dup 3))
> +	  (zero_extend:<DOUBLEMASKMODE>
> +	    (match_operand:SWI24 1 "register_operand"))))]
>    "TARGET_AVX512BW"
>  {
>    operands[3] = GEN_INT (GET_MODE_BITSIZE (<MODE>mode));
>  })
>  
> +(define_expand "vec_pack_sbool_trunc_qi"
> +  [(match_operand:QI 0 "register_operand")
> +   (match_operand:QI 1 "register_operand")
> +   (match_operand:QI 2 "register_operand")
> +   (match_operand:QI 3 "const_int_operand")]
> +  "TARGET_AVX512F"
> +{
> +  HOST_WIDE_INT nunits = INTVAL (operands[3]);
> +  rtx mask, tem1, tem2;
> +  if (nunits != 8 && nunits != 4)
> +    FAIL;
> +  mask = gen_reg_rtx (QImode);
> +  emit_move_insn (mask, GEN_INT ((1 << (nunits / 2)) - 1));
> +  tem1 = gen_reg_rtx (QImode);
> +  emit_insn (gen_kandqi (tem1, operands[1], mask));
> +  if (TARGET_AVX512DQ)
> +    {
> +      tem2 = gen_reg_rtx (QImode);
> +      emit_insn (gen_kashiftqi (tem2, operands[2],
> +				GEN_INT (nunits / 2)));
> +    }
> +  else
> +    {
> +      tem2 = gen_reg_rtx (HImode);
> +      emit_insn (gen_kashifthi (tem2, lowpart_subreg (HImode, operands[2],
> +						      QImode),
> +				GEN_INT (nunits / 2)));
> +      tem2 = lowpart_subreg (QImode, tem2, HImode);
> +    }
> +  emit_insn (gen_kiorqi (operands[0], tem1, tem2));
> +  DONE;
> +})
> +
>  (define_insn "<sse2_avx2>_packsswb<mask_name>"
>    [(set (match_operand:VI1_AVX512 0 "register_operand" "=x,x,v")
>  	(vec_concat:VI1_AVX512
> @@ -14603,6 +14640,18 @@ (define_expand "vec_unpacku_lo_<mode>"
>    "TARGET_SSE2"
>    "ix86_expand_sse_unpack (operands[0], operands[1], true, false); DONE;")
>  
> +(define_expand "vec_unpacks_sbool_lo_qi"
> +  [(match_operand:QI 0 "register_operand")
> +   (match_operand:QI 1 "register_operand")
> +   (match_operand:QI 2 "const_int_operand")]
> +  "TARGET_AVX512F"
> +{
> +  if (INTVAL (operands[2]) != 8 && INTVAL (operands[2]) != 4)
> +    FAIL;
> +  emit_move_insn (operands[0], operands[1]);
> +  DONE;
> +})
> +
>  (define_expand "vec_unpacks_lo_hi"
>    [(set (subreg:HI (match_operand:QI 0 "register_operand") 0)
>          (match_operand:HI 1 "register_operand"))]
> @@ -14624,6 +14673,29 @@ (define_expand "vec_unpacku_hi_<mode>"
>    "TARGET_SSE2"
>    "ix86_expand_sse_unpack (operands[0], operands[1], true, true); DONE;")
>  
> +(define_expand "vec_unpacks_sbool_hi_qi"
> +  [(match_operand:QI 0 "register_operand")
> +   (match_operand:QI 1 "register_operand")
> +   (match_operand:QI 2 "const_int_operand")]
> +  "TARGET_AVX512F"
> +{
> +  HOST_WIDE_INT nunits = INTVAL (operands[2]);
> +  if (nunits != 8 && nunits != 4)
> +    FAIL;
> +  if (TARGET_AVX512DQ)
> +    emit_insn (gen_klshiftrtqi (operands[0], operands[1],
> +				GEN_INT (nunits / 2)));
> +  else
> +    {
> +      rtx tem = gen_reg_rtx (HImode);
> +      emit_insn (gen_klshiftrthi (tem, lowpart_subreg (HImode, operands[1],
> +						       QImode),
> +				  GEN_INT (nunits / 2)));
> +      emit_move_insn (operands[0], lowpart_subreg (QImode, tem, HImode));
> +    }
> +  DONE;
> +})
> +
>  (define_expand "vec_unpacks_hi_hi"
>    [(parallel
>       [(set (subreg:HI (match_operand:QI 0 "register_operand") 0)
> --- gcc/doc/md.texi.jj	2018-11-27 16:37:37.000472599 +0100
> +++ gcc/doc/md.texi	2018-12-17 18:47:51.115042917 +0100
> @@ -5428,6 +5428,16 @@ are vectors of the same mode having N in
>  of size S@.  Operand 0 is the resulting vector in which 2*N elements of
>  size N/2 are concatenated after narrowing them down using truncation.
>  
> +@cindex @code{vec_pack_sbool_trunc_@var{m}} instruction pattern
> +@item @samp{vec_pack_sbool_trunc_@var{m}}
> +Narrow and merge the elements of two vectors.  Operands 1 and 2 are vectors
> +of the same type having N boolean elements.  Operand 0 is the resulting
> +vector in which 2*N elements are concatenated.  The last operand (operand 3)
> +is the number of elements in the output vector 2*N as a @code{CONST_INT}.
> +This instruction pattern is used when all the vector input and output
> +operands have the same scalar mode @var{m} and thus using
> +@code{vec_pack_trunc_@var{m}} would be ambiguous.
> +
>  @cindex @code{vec_pack_ssat_@var{m}} instruction pattern
>  @cindex @code{vec_pack_usat_@var{m}} instruction pattern
>  @item @samp{vec_pack_ssat_@var{m}}, @samp{vec_pack_usat_@var{m}}
> @@ -5470,6 +5480,17 @@ integral elements.  The input vector (op
>  Widen (promote) the high/low elements of the vector using zero extension and
>  place the resulting N/2 values of size 2*S in the output vector (operand 0).
>  
> +@cindex @code{vec_unpacks_sbool_hi_@var{m}} instruction pattern
> +@cindex @code{vec_unpacks_sbool_lo_@var{m}} instruction pattern
> +@item @samp{vec_unpacks_sbool_hi_@var{m}}, @samp{vec_unpacks_sbool_lo_@var{m}}
> +Extract the high/low part of a vector of boolean elements that have scalar
> +mode @var{m}.  The input vector (operand 1) has N elements, the output
> +vector (operand 0) has N/2 elements.  The last operand (operand 2) is the
> +number of elements of the input vector N as a @code{CONST_INT}.  These
> +patterns are used if both the input and output vectors have the same scalar
> +mode @var{m} and thus using @code{vec_unpacks_hi_@var{m}} or
> +@code{vec_unpacks_lo_@var{m}} would be ambiguous.
> +
>  @cindex @code{vec_unpacks_float_hi_@var{m}} instruction pattern
>  @cindex @code{vec_unpacks_float_lo_@var{m}} instruction pattern
>  @cindex @code{vec_unpacku_float_hi_@var{m}} instruction pattern
> --- gcc/testsuite/gcc.target/i386/avx512f-pr88513-1.c.jj	2018-12-17 13:20:51.486755223 +0100
> +++ gcc/testsuite/gcc.target/i386/avx512f-pr88513-1.c	2018-12-17 13:21:14.441379518 +0100
> @@ -0,0 +1,16 @@
> +/* PR target/88513 */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fopenmp-simd -mavx512f -mtune=intel -mprefer-vector-width=512 -fno-vect-cost-model" } */
> +/* { dg-require-effective-target avx512f } */
> +
> +#define CHECK_H "avx512f-check.h"
> +
> +#include "../../gcc.dg/vect/pr59591-1.c"
> +
> +#include CHECK_H
> +
> +static void
> +test_512 (void)
> +{
> +  bar ();
> +}
> --- gcc/testsuite/gcc.target/i386/avx512f-pr88513-2.c.jj	2018-12-17 13:21:25.028206248 +0100
> +++ gcc/testsuite/gcc.target/i386/avx512f-pr88513-2.c	2018-12-17 13:21:32.636081728 +0100
> @@ -0,0 +1,16 @@
> +/* PR target/88513 */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fopenmp-simd -mavx512f -mtune=intel -mprefer-vector-width=512 -fno-vect-cost-model" } */
> +/* { dg-require-effective-target avx512f } */
> +
> +#define CHECK_H "avx512f-check.h"
> +
> +#include "../../gcc.dg/vect/pr59591-2.c"
> +
> +#include CHECK_H
> +
> +static void
> +test_512 (void)
> +{
> +  bar ();
> +}
> --- gcc/testsuite/gcc.target/i386/avx512vl-pr88464-1.c.jj	2018-12-17 18:50:06.197844506 +0100
> +++ gcc/testsuite/gcc.target/i386/avx512vl-pr88464-1.c	2018-12-17 18:50:56.496025932 +0100
> @@ -0,0 +1,7 @@
> +/* PR tree-optimization/88464 */
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mavx512vl -mprefer-vector-width=256 -mtune=skylake-avx512 -fdump-tree-vect-details" } */
> +/* { dg-final { scan-tree-dump-times "loop vectorized using 32 byte vectors" 4 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 4 "vect" } } */
> +
> +#include "avx512f-pr88464-1.c"
> --- gcc/testsuite/gcc.target/i386/avx512vl-pr88464-2.c.jj	2018-12-17 18:52:08.277857736 +0100
> +++ gcc/testsuite/gcc.target/i386/avx512vl-pr88464-2.c	2018-12-17 18:55:09.416909825 +0100
> @@ -0,0 +1,20 @@
> +/* PR tree-optimization/88464 */
> +/* { dg-do run { target { avx512vl } } } */
> +/* { dg-options "-O3 -mavx512vl -mprefer-vector-width=256 -mtune=skylake-avx512" } */
> +
> +#define AVX512VL
> +#define AVX512F_LEN 512
> +#define AVX512F_LEN_HALF 256
> +
> +#include "avx512f-pr88464-2.c"
> +
> +static void
> +test_256 (void)
> +{
> +  avx512f_test ();
> +}
> +
> +static void
> +test_128 (void)
> +{
> +}
> --- gcc/testsuite/gcc.target/i386/avx512vl-pr88464-3.c.jj	2018-12-17 19:04:42.820578091 +0100
> +++ gcc/testsuite/gcc.target/i386/avx512vl-pr88464-3.c	2018-12-17 19:04:53.262408158 +0100
> @@ -0,0 +1,7 @@
> +/* PR tree-optimization/88464 */
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mavx512vl -mprefer-vector-width=128 -mtune=skylake-avx512 -fdump-tree-vect-details" } */
> +/* { dg-final { scan-tree-dump-times "loop vectorized using 16 byte vectors" 4 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 4 "vect" } } */
> +
> +#include "avx512f-pr88464-1.c"
> --- gcc/testsuite/gcc.target/i386/avx512vl-pr88464-4.c.jj	2018-12-17 19:05:05.488209195 +0100
> +++ gcc/testsuite/gcc.target/i386/avx512vl-pr88464-4.c	2018-12-17 19:05:13.809073780 +0100
> @@ -0,0 +1,20 @@
> +/* PR tree-optimization/88464 */
> +/* { dg-do run { target { avx512vl } } } */
> +/* { dg-options "-O3 -mavx512vl -mprefer-vector-width=128 -mtune=skylake-avx512" } */
> +
> +#define AVX512VL
> +#define AVX512F_LEN 512
> +#define AVX512F_LEN_HALF 256
> +
> +#include "avx512f-pr88464-2.c"
> +
> +static void
> +test_256 (void)
> +{
> +  avx512f_test ();
> +}
> +
> +static void
> +test_128 (void)
> +{
> +}
> --- gcc/testsuite/gcc.target/i386/avx512vl-pr88513-1.c.jj	2018-12-17 13:09:21.035010881 +0100
> +++ gcc/testsuite/gcc.target/i386/avx512vl-pr88513-1.c	2018-12-17 13:12:02.407380317 +0100
> @@ -0,0 +1,24 @@
> +/* PR target/88513 */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fopenmp-simd -mavx512vl -mtune=intel -mprefer-vector-width=128 -fno-vect-cost-model" } */
> +/* { dg-require-effective-target avx512vl } */
> +
> +#define AVX512VL
> +#define AVX512F_LEN 512
> +#define AVX512F_LEN_HALF 256
> +#define CHECK_H "avx512f-check.h"
> +
> +#include "../../gcc.dg/vect/pr59591-1.c"
> +
> +#include CHECK_H
> +
> +static void
> +test_256 (void)
> +{
> +  bar ();
> +}
> +
> +static void
> +test_128 (void)
> +{
> +}
> --- gcc/testsuite/gcc.target/i386/avx512vl-pr88513-2.c.jj	2018-12-17 13:09:24.051961700 +0100
> +++ gcc/testsuite/gcc.target/i386/avx512vl-pr88513-2.c	2018-12-17 13:12:25.042011342 +0100
> @@ -0,0 +1,24 @@
> +/* PR target/88513 */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fopenmp-simd -mavx512vl -mtune=intel -mprefer-vector-width=128 -fno-vect-cost-model" } */
> +/* { dg-require-effective-target avx512vl } */
> +
> +#define AVX512VL
> +#define AVX512F_LEN 512
> +#define AVX512F_LEN_HALF 256
> +#define CHECK_H "avx512f-check.h"
> +
> +#include "../../gcc.dg/vect/pr59591-2.c"
> +
> +#include CHECK_H
> +
> +static void
> +test_256 (void)
> +{
> +  bar ();
> +}
> +
> +static void
> +test_128 (void)
> +{
> +}
> --- gcc/testsuite/gcc.target/i386/avx512vl-pr88513-3.c.jj	2018-12-17 13:19:26.298144342 +0100
> +++ gcc/testsuite/gcc.target/i386/avx512vl-pr88513-3.c	2018-12-17 13:20:05.584503930 +0100
> @@ -0,0 +1,24 @@
> +/* PR target/88513 */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fopenmp-simd -mavx512vl -mtune=intel -mprefer-vector-width=256 -fno-vect-cost-model" } */
> +/* { dg-require-effective-target avx512vl } */
> +
> +#define AVX512VL
> +#define AVX512F_LEN 512
> +#define AVX512F_LEN_HALF 256
> +#define CHECK_H "avx512f-check.h"
> +
> +#include "../../gcc.dg/vect/pr59591-1.c"
> +
> +#include CHECK_H
> +
> +static void
> +test_256 (void)
> +{
> +  bar ();
> +}
> +
> +static void
> +test_128 (void)
> +{
> +}
> --- gcc/testsuite/gcc.target/i386/avx512vl-pr88513-4.c.jj	2018-12-17 13:20:12.998383071 +0100
> +++ gcc/testsuite/gcc.target/i386/avx512vl-pr88513-4.c	2018-12-17 13:20:21.088251199 +0100
> @@ -0,0 +1,24 @@
> +/* PR target/88513 */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fopenmp-simd -mavx512vl -mtune=intel -mprefer-vector-width=256 -fno-vect-cost-model" } */
> +/* { dg-require-effective-target avx512vl } */
> +
> +#define AVX512VL
> +#define AVX512F_LEN 512
> +#define AVX512F_LEN_HALF 256
> +#define CHECK_H "avx512f-check.h"
> +
> +#include "../../gcc.dg/vect/pr59591-2.c"
> +
> +#include CHECK_H
> +
> +static void
> +test_256 (void)
> +{
> +  bar ();
> +}
> +
> +static void
> +test_128 (void)
> +{
> +}
> --- gcc/testsuite/gcc.target/i386/avx512vl-pr88514-1.c.jj	2018-12-17 13:03:23.934831994 +0100
> +++ gcc/testsuite/gcc.target/i386/avx512vl-pr88514-1.c	2018-12-17 14:25:21.407699798 +0100
> @@ -0,0 +1,5 @@
> +/* PR target/88514 */
> +/* { dg-do assemble { target avx512vl } } */
> +/* { dg-options "-Ofast -mavx512vl -mtune=intel -mprefer-vector-width=128" } */
> +
> +#include "avx512vl-pr79299-1.c"
> --- gcc/testsuite/gcc.target/i386/avx512vl-pr88514-2.c.jj	2018-12-17 13:03:34.080666665 +0100
> +++ gcc/testsuite/gcc.target/i386/avx512vl-pr88514-2.c	2018-12-17 14:25:32.814514127 +0100
> @@ -0,0 +1,5 @@
> +/* PR target/88514 */
> +/* { dg-do assemble { target avx512vl } } */
> +/* { dg-options "-Ofast -mavx512vl -mtune=intel -mprefer-vector-width=256" } */
> +
> +#include "avx512vl-pr79299-1.c"
> --- gcc/testsuite/gcc.target/i386/avx512vl-pr88514-3.c.jj	2018-12-17 13:03:50.291402408 +0100
> +++ gcc/testsuite/gcc.target/i386/avx512vl-pr88514-3.c	2018-12-17 14:25:48.827253484 +0100
> @@ -0,0 +1,5 @@
> +/* PR target/88514 */
> +/* { dg-do assemble { target avx512vl } } */
> +/* { dg-options "-Ofast -mavx512vl -mtune=intel -mprefer-vector-width=512" } */
> +
> +#include "avx512vl-pr79299-1.c"
> 
> 	Jakub
> 
>
diff mbox series

Patch

--- gcc/optabs.def.jj	2018-12-14 20:35:37.883126125 +0100
+++ gcc/optabs.def	2018-12-17 15:18:27.817103784 +0100
@@ -335,6 +335,7 @@  OPTAB_D (vec_pack_sfix_trunc_optab, "vec
 OPTAB_D (vec_pack_ssat_optab, "vec_pack_ssat_$a")
 OPTAB_D (vec_pack_trunc_optab, "vec_pack_trunc_$a")
 OPTAB_D (vec_pack_ufix_trunc_optab, "vec_pack_ufix_trunc_$a")
+OPTAB_D (vec_pack_sbool_trunc_optab, "vec_pack_sbool_trunc_$a")
 OPTAB_D (vec_pack_usat_optab, "vec_pack_usat_$a")
 OPTAB_D (vec_packs_float_optab, "vec_packs_float_$a")
 OPTAB_D (vec_packu_float_optab, "vec_packu_float_$a")
@@ -350,6 +351,8 @@  OPTAB_D (vec_unpacks_float_hi_optab, "ve
 OPTAB_D (vec_unpacks_float_lo_optab, "vec_unpacks_float_lo_$a")
 OPTAB_D (vec_unpacks_hi_optab, "vec_unpacks_hi_$a")
 OPTAB_D (vec_unpacks_lo_optab, "vec_unpacks_lo_$a")
+OPTAB_D (vec_unpacks_sbool_hi_optab, "vec_unpacks_sbool_hi_$a")
+OPTAB_D (vec_unpacks_sbool_lo_optab, "vec_unpacks_sbool_lo_$a")
 OPTAB_D (vec_unpacku_float_hi_optab, "vec_unpacku_float_hi_$a")
 OPTAB_D (vec_unpacku_float_lo_optab, "vec_unpacku_float_lo_$a")
 OPTAB_D (vec_unpacku_hi_optab, "vec_unpacku_hi_$a")
--- gcc/optabs.c.jj	2018-12-01 00:25:09.334989051 +0100
+++ gcc/optabs.c	2018-12-17 16:19:10.331102731 +0100
@@ -256,6 +256,7 @@  expand_widen_pattern_expr (sepops ops, r
   enum insn_code icode;
   int nops = TREE_CODE_LENGTH (ops->code);
   int op;
+  bool sbool = false;
 
   oprnd0 = ops->op0;
   tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
@@ -265,6 +266,22 @@  expand_widen_pattern_expr (sepops ops, r
        for these ops.  */
     widen_pattern_optab
       = optab_for_tree_code (ops->code, ops->type, optab_default);
+  else if ((ops->code == VEC_UNPACK_HI_EXPR
+	    || ops->code == VEC_UNPACK_LO_EXPR)
+	   && VECTOR_BOOLEAN_TYPE_P (ops->type)
+	   && VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (oprnd0))
+	   && TYPE_MODE (ops->type) == TYPE_MODE (TREE_TYPE (oprnd0))
+	   && SCALAR_INT_MODE_P (TYPE_MODE (ops->type)))
+    {
+      /* For VEC_UNPACK_{LO,HI}_EXPR if the mode of op0 and result is
+	 the same scalar mode for VECTOR_BOOLEAN_TYPE_P vectors, use
+	 vec_unpacks_sbool_{lo,hi}_optab, so that we can pass in
+	 the pattern number of elements in the wider vector.  */
+      widen_pattern_optab
+	= (ops->code == VEC_UNPACK_HI_EXPR
+	   ? vec_unpacks_sbool_hi_optab : vec_unpacks_sbool_lo_optab);
+      sbool = true;
+    }
   else
     widen_pattern_optab
       = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
@@ -282,6 +299,12 @@  expand_widen_pattern_expr (sepops ops, r
       oprnd1 = ops->op1;
       tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
     }
+  else if (sbool)
+    {
+      nops = 2;
+      op1 = GEN_INT (TYPE_VECTOR_SUBPARTS (TREE_TYPE (oprnd0)).to_constant ());
+      tmode1 = tmode0;
+    }
 
   /* The last operand is of a wider mode than the rest of the operands.  */
   if (nops == 2)
--- gcc/expr.c.jj	2018-12-14 20:37:02.833751212 +0100
+++ gcc/expr.c	2018-12-17 18:33:12.855336376 +0100
@@ -9493,12 +9493,34 @@  expand_expr_real_2 (sepops ops, rtx targ
       gcc_assert (target);
       return target;
 
-    case VEC_PACK_TRUNC_EXPR:
     case VEC_PACK_SAT_EXPR:
     case VEC_PACK_FIX_TRUNC_EXPR:
       mode = TYPE_MODE (TREE_TYPE (treeop0));
       goto binop;
 
+    case VEC_PACK_TRUNC_EXPR:
+      if (VECTOR_BOOLEAN_TYPE_P (type)
+	  && VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (treeop0))
+	  && mode == TYPE_MODE (TREE_TYPE (treeop0))
+	  && SCALAR_INT_MODE_P (mode))
+	{
+	  struct expand_operand eops[4];
+	  machine_mode imode = TYPE_MODE (TREE_TYPE (treeop0));
+	  expand_operands (treeop0, treeop1,
+			   subtarget, &op0, &op1, EXPAND_NORMAL);
+	  this_optab = vec_pack_sbool_trunc_optab;
+	  enum insn_code icode = optab_handler (this_optab, imode);
+	  create_output_operand (&eops[0], target, mode);
+	  create_convert_operand_from (&eops[1], op0, imode, false);
+	  create_convert_operand_from (&eops[2], op1, imode, false);
+	  temp = GEN_INT (TYPE_VECTOR_SUBPARTS (type).to_constant ());
+	  create_input_operand (&eops[3], temp, imode);
+	  expand_insn (icode, 4, eops);
+	  return eops[0].value;
+	}
+      mode = TYPE_MODE (TREE_TYPE (treeop0));
+      goto binop;
+
     case VEC_PACK_FLOAT_EXPR:
       mode = TYPE_MODE (TREE_TYPE (treeop0));
       expand_operands (treeop0, treeop1,
--- gcc/tree-vect-stmts.c.jj	2018-12-15 12:00:23.570570010 +0100
+++ gcc/tree-vect-stmts.c	2018-12-17 18:23:57.899374515 +0100
@@ -10313,6 +10313,17 @@  supportable_widening_operation (enum tre
       optab1 = optab_for_tree_code (c1, vectype_out, optab_default);
       optab2 = optab_for_tree_code (c2, vectype_out, optab_default);
     }
+  else if (CONVERT_EXPR_CODE_P (code)
+	   && VECTOR_BOOLEAN_TYPE_P (wide_vectype)
+	   && VECTOR_BOOLEAN_TYPE_P (vectype)
+	   && TYPE_MODE (wide_vectype) == TYPE_MODE (vectype)
+	   && SCALAR_INT_MODE_P (TYPE_MODE (vectype)))
+    {
+      /* If the input and result modes are the same, a different optab
+	 is needed where we pass in the number of units in vectype.  */
+      optab1 = vec_unpacks_sbool_lo_optab;
+      optab2 = vec_unpacks_sbool_hi_optab;
+    }
   else
     {
       optab1 = optab_for_tree_code (c1, vectype, optab_default);
@@ -10332,12 +10343,16 @@  supportable_widening_operation (enum tre
 
   if (insn_data[icode1].operand[0].mode == TYPE_MODE (wide_vectype)
       && insn_data[icode2].operand[0].mode == TYPE_MODE (wide_vectype))
+    {
+      if (!VECTOR_BOOLEAN_TYPE_P (vectype))
+	return true;
       /* For scalar masks we may have different boolean
 	 vector types having the same QImode.  Thus we
 	 add additional check for elements number.  */
-    return (!VECTOR_BOOLEAN_TYPE_P (vectype)
-	    || known_eq (TYPE_VECTOR_SUBPARTS (vectype),
-			 TYPE_VECTOR_SUBPARTS (wide_vectype) * 2));
+      if (known_eq (TYPE_VECTOR_SUBPARTS (vectype),
+		    TYPE_VECTOR_SUBPARTS (wide_vectype) * 2))
+	return true;
+    }
 
   /* Check if it's a multi-step conversion that can be done using intermediate
      types.  */
@@ -10367,8 +10382,21 @@  supportable_widening_operation (enum tre
 	  = lang_hooks.types.type_for_mode (intermediate_mode,
 					    TYPE_UNSIGNED (prev_type));
 
-      optab3 = optab_for_tree_code (c1, intermediate_type, optab_default);
-      optab4 = optab_for_tree_code (c2, intermediate_type, optab_default);
+      if (VECTOR_BOOLEAN_TYPE_P (intermediate_type)
+	  && VECTOR_BOOLEAN_TYPE_P (prev_type)
+	  && intermediate_mode == prev_mode
+	  && SCALAR_INT_MODE_P (prev_mode))
+	{
+	  /* If the input and result modes are the same, a different optab
+	     is needed where we pass in the number of units in vectype.  */
+	  optab3 = vec_unpacks_sbool_lo_optab;
+	  optab4 = vec_unpacks_sbool_hi_optab;
+	}
+      else
+	{
+	  optab3 = optab_for_tree_code (c1, intermediate_type, optab_default);
+	  optab4 = optab_for_tree_code (c2, intermediate_type, optab_default);
+	}
 
       if (!optab3 || !optab4
           || (icode1 = optab_handler (optab1, prev_mode)) == CODE_FOR_nothing
@@ -10386,9 +10414,13 @@  supportable_widening_operation (enum tre
 
       if (insn_data[icode1].operand[0].mode == TYPE_MODE (wide_vectype)
 	  && insn_data[icode2].operand[0].mode == TYPE_MODE (wide_vectype))
-	return (!VECTOR_BOOLEAN_TYPE_P (vectype)
-		|| known_eq (TYPE_VECTOR_SUBPARTS (intermediate_type),
-			     TYPE_VECTOR_SUBPARTS (wide_vectype) * 2));
+	{
+	  if (!VECTOR_BOOLEAN_TYPE_P (vectype))
+	    return true;
+	  if (known_eq (TYPE_VECTOR_SUBPARTS (intermediate_type),
+			TYPE_VECTOR_SUBPARTS (wide_vectype) * 2))
+	    return true;
+	}
 
       prev_type = intermediate_type;
       prev_mode = intermediate_mode;
@@ -10441,26 +10473,30 @@  supportable_narrowing_operation (enum tr
     {
     CASE_CONVERT:
       c1 = VEC_PACK_TRUNC_EXPR;
+      if (VECTOR_BOOLEAN_TYPE_P (narrow_vectype)
+	  && VECTOR_BOOLEAN_TYPE_P (vectype)
+	  && TYPE_MODE (narrow_vectype) == TYPE_MODE (vectype)
+	  && SCALAR_INT_MODE_P (TYPE_MODE (vectype)))
+	optab1 = vec_pack_sbool_trunc_optab;
+      else
+	optab1 = optab_for_tree_code (c1, vectype, optab_default);
       break;
 
     case FIX_TRUNC_EXPR:
       c1 = VEC_PACK_FIX_TRUNC_EXPR;
+      /* The signedness is determined from output operand.  */
+      optab1 = optab_for_tree_code (c1, vectype_out, optab_default);
       break;
 
     case FLOAT_EXPR:
       c1 = VEC_PACK_FLOAT_EXPR;
+      optab1 = optab_for_tree_code (c1, vectype, optab_default);
       break;
 
     default:
       gcc_unreachable ();
     }
 
-  if (code == FIX_TRUNC_EXPR)
-    /* The signedness is determined from output operand.  */
-    optab1 = optab_for_tree_code (c1, vectype_out, optab_default);
-  else
-    optab1 = optab_for_tree_code (c1, vectype, optab_default);
-
   if (!optab1)
     return false;
 
@@ -10471,12 +10507,16 @@  supportable_narrowing_operation (enum tr
   *code1 = c1;
 
   if (insn_data[icode1].operand[0].mode == TYPE_MODE (narrow_vectype))
-    /* For scalar masks we may have different boolean
-       vector types having the same QImode.  Thus we
-       add additional check for elements number.  */
-    return (!VECTOR_BOOLEAN_TYPE_P (vectype)
-	    || known_eq (TYPE_VECTOR_SUBPARTS (vectype) * 2,
-			 TYPE_VECTOR_SUBPARTS (narrow_vectype)));
+    {
+      if (!VECTOR_BOOLEAN_TYPE_P (vectype))
+	return true;
+      /* For scalar masks we may have different boolean
+	 vector types having the same QImode.  Thus we
+	 add additional check for elements number.  */
+      if (known_eq (TYPE_VECTOR_SUBPARTS (vectype) * 2,
+		    TYPE_VECTOR_SUBPARTS (narrow_vectype)))
+	return true;
+    }
 
   if (code == FLOAT_EXPR)
     return false;
@@ -10528,9 +10568,15 @@  supportable_narrowing_operation (enum tr
       else
 	intermediate_type
 	  = lang_hooks.types.type_for_mode (intermediate_mode, uns);
-      interm_optab
-	= optab_for_tree_code (VEC_PACK_TRUNC_EXPR, intermediate_type,
-			       optab_default);
+      if (VECTOR_BOOLEAN_TYPE_P (intermediate_type)
+	  && VECTOR_BOOLEAN_TYPE_P (prev_type)
+	  && intermediate_mode == prev_mode
+	  && SCALAR_INT_MODE_P (prev_mode))
+	interm_optab = vec_pack_sbool_trunc_optab;
+      else
+	interm_optab
+	  = optab_for_tree_code (VEC_PACK_TRUNC_EXPR, intermediate_type,
+				 optab_default);
       if (!interm_optab
 	  || ((icode1 = optab_handler (optab1, prev_mode)) == CODE_FOR_nothing)
 	  || insn_data[icode1].operand[0].mode != intermediate_mode
@@ -10542,9 +10588,13 @@  supportable_narrowing_operation (enum tr
       (*multi_step_cvt)++;
 
       if (insn_data[icode1].operand[0].mode == TYPE_MODE (narrow_vectype))
-	return (!VECTOR_BOOLEAN_TYPE_P (vectype)
-		|| known_eq (TYPE_VECTOR_SUBPARTS (intermediate_type) * 2,
-			     TYPE_VECTOR_SUBPARTS (narrow_vectype)));
+	{
+	  if (!VECTOR_BOOLEAN_TYPE_P (vectype))
+	    return true;
+	  if (known_eq (TYPE_VECTOR_SUBPARTS (intermediate_type) * 2,
+			TYPE_VECTOR_SUBPARTS (narrow_vectype)))
+	    return true;
+	}
 
       prev_mode = intermediate_mode;
       prev_type = intermediate_type;
--- gcc/config/i386/i386.c.jj	2018-12-15 12:00:23.579569863 +0100
+++ gcc/config/i386/i386.c	2018-12-17 12:55:11.964845121 +0100
@@ -37622,7 +37622,7 @@  rdseed_step:
 	    op0 = copy_to_mode_reg (GET_MODE (op0), op0);
 	  emit_insn (gen (half, op0));
 	  op0 = half;
-	  if (GET_MODE (op3) != VOIDmode)
+	  if (VECTOR_MODE_P (GET_MODE (op3)))
 	    {
 	      half = gen_reg_rtx (mode0);
 	      if (!nonimmediate_operand (op3, GET_MODE (op3)))
--- gcc/config/i386/sse.md.jj	2018-12-15 00:20:37.936738279 +0100
+++ gcc/config/i386/sse.md	2018-12-17 18:13:48.109327499 +0100
@@ -12435,22 +12435,59 @@  (define_expand "vec_pack_trunc_<mode>"
 })
 
 (define_expand "vec_pack_trunc_qi"
-  [(set (match_operand:HI 0 ("register_operand"))
-        (ior:HI (ashift:HI (zero_extend:HI (match_operand:QI 2 ("register_operand")))
+  [(set (match_operand:HI 0 "register_operand")
+	(ior:HI (ashift:HI (zero_extend:HI (match_operand:QI 2 "register_operand"))
                            (const_int 8))
-                (zero_extend:HI (match_operand:QI 1 ("register_operand")))))]
+		(zero_extend:HI (match_operand:QI 1 "register_operand"))))]
   "TARGET_AVX512F")
 
 (define_expand "vec_pack_trunc_<mode>"
-  [(set (match_operand:<DOUBLEMASKMODE> 0 ("register_operand"))
-        (ior:<DOUBLEMASKMODE> (ashift:<DOUBLEMASKMODE> (zero_extend:<DOUBLEMASKMODE> (match_operand:SWI24 2 ("register_operand")))
-                           (match_dup 3))
-                (zero_extend:<DOUBLEMASKMODE> (match_operand:SWI24 1 ("register_operand")))))]
+  [(set (match_operand:<DOUBLEMASKMODE> 0 "register_operand")
+	(ior:<DOUBLEMASKMODE>
+	  (ashift:<DOUBLEMASKMODE>
+	    (zero_extend:<DOUBLEMASKMODE>
+	      (match_operand:SWI24 2 "register_operand"))
+	    (match_dup 3))
+	  (zero_extend:<DOUBLEMASKMODE>
+	    (match_operand:SWI24 1 "register_operand"))))]
   "TARGET_AVX512BW"
 {
   operands[3] = GEN_INT (GET_MODE_BITSIZE (<MODE>mode));
 })
 
+(define_expand "vec_pack_sbool_trunc_qi"
+  [(match_operand:QI 0 "register_operand")
+   (match_operand:QI 1 "register_operand")
+   (match_operand:QI 2 "register_operand")
+   (match_operand:QI 3 "const_int_operand")]
+  "TARGET_AVX512F"
+{
+  HOST_WIDE_INT nunits = INTVAL (operands[3]);
+  rtx mask, tem1, tem2;
+  if (nunits != 8 && nunits != 4)
+    FAIL;
+  mask = gen_reg_rtx (QImode);
+  emit_move_insn (mask, GEN_INT ((1 << (nunits / 2)) - 1));
+  tem1 = gen_reg_rtx (QImode);
+  emit_insn (gen_kandqi (tem1, operands[1], mask));
+  if (TARGET_AVX512DQ)
+    {
+      tem2 = gen_reg_rtx (QImode);
+      emit_insn (gen_kashiftqi (tem2, operands[2],
+				GEN_INT (nunits / 2)));
+    }
+  else
+    {
+      tem2 = gen_reg_rtx (HImode);
+      emit_insn (gen_kashifthi (tem2, lowpart_subreg (HImode, operands[2],
+						      QImode),
+				GEN_INT (nunits / 2)));
+      tem2 = lowpart_subreg (QImode, tem2, HImode);
+    }
+  emit_insn (gen_kiorqi (operands[0], tem1, tem2));
+  DONE;
+})
+
 (define_insn "<sse2_avx2>_packsswb<mask_name>"
   [(set (match_operand:VI1_AVX512 0 "register_operand" "=x,x,v")
 	(vec_concat:VI1_AVX512
@@ -14603,6 +14640,18 @@  (define_expand "vec_unpacku_lo_<mode>"
   "TARGET_SSE2"
   "ix86_expand_sse_unpack (operands[0], operands[1], true, false); DONE;")
 
+(define_expand "vec_unpacks_sbool_lo_qi"
+  [(match_operand:QI 0 "register_operand")
+   (match_operand:QI 1 "register_operand")
+   (match_operand:QI 2 "const_int_operand")]
+  "TARGET_AVX512F"
+{
+  if (INTVAL (operands[2]) != 8 && INTVAL (operands[2]) != 4)
+    FAIL;
+  emit_move_insn (operands[0], operands[1]);
+  DONE;
+})
+
 (define_expand "vec_unpacks_lo_hi"
   [(set (subreg:HI (match_operand:QI 0 "register_operand") 0)
         (match_operand:HI 1 "register_operand"))]
@@ -14624,6 +14673,29 @@  (define_expand "vec_unpacku_hi_<mode>"
   "TARGET_SSE2"
   "ix86_expand_sse_unpack (operands[0], operands[1], true, true); DONE;")
 
+(define_expand "vec_unpacks_sbool_hi_qi"
+  [(match_operand:QI 0 "register_operand")
+   (match_operand:QI 1 "register_operand")
+   (match_operand:QI 2 "const_int_operand")]
+  "TARGET_AVX512F"
+{
+  HOST_WIDE_INT nunits = INTVAL (operands[2]);
+  if (nunits != 8 && nunits != 4)
+    FAIL;
+  if (TARGET_AVX512DQ)
+    emit_insn (gen_klshiftrtqi (operands[0], operands[1],
+				GEN_INT (nunits / 2)));
+  else
+    {
+      rtx tem = gen_reg_rtx (HImode);
+      emit_insn (gen_klshiftrthi (tem, lowpart_subreg (HImode, operands[1],
+						       QImode),
+				  GEN_INT (nunits / 2)));
+      emit_move_insn (operands[0], lowpart_subreg (QImode, tem, HImode));
+    }
+  DONE;
+})
+
 (define_expand "vec_unpacks_hi_hi"
   [(parallel
      [(set (subreg:HI (match_operand:QI 0 "register_operand") 0)
--- gcc/doc/md.texi.jj	2018-11-27 16:37:37.000472599 +0100
+++ gcc/doc/md.texi	2018-12-17 18:47:51.115042917 +0100
@@ -5428,6 +5428,16 @@  are vectors of the same mode having N in
 of size S@.  Operand 0 is the resulting vector in which 2*N elements of
 size N/2 are concatenated after narrowing them down using truncation.
 
+@cindex @code{vec_pack_sbool_trunc_@var{m}} instruction pattern
+@item @samp{vec_pack_sbool_trunc_@var{m}}
+Narrow and merge the elements of two vectors.  Operands 1 and 2 are vectors
+of the same type having N boolean elements.  Operand 0 is the resulting
+vector in which 2*N elements are concatenated.  The last operand (operand 3)
+is the number of elements in the output vector 2*N as a @code{CONST_INT}.
+This instruction pattern is used when all the vector input and output
+operands have the same scalar mode @var{m} and thus using
+@code{vec_pack_trunc_@var{m}} would be ambiguous.
+
 @cindex @code{vec_pack_ssat_@var{m}} instruction pattern
 @cindex @code{vec_pack_usat_@var{m}} instruction pattern
 @item @samp{vec_pack_ssat_@var{m}}, @samp{vec_pack_usat_@var{m}}
@@ -5470,6 +5480,17 @@  integral elements.  The input vector (op
 Widen (promote) the high/low elements of the vector using zero extension and
 place the resulting N/2 values of size 2*S in the output vector (operand 0).
 
+@cindex @code{vec_unpacks_sbool_hi_@var{m}} instruction pattern
+@cindex @code{vec_unpacks_sbool_lo_@var{m}} instruction pattern
+@item @samp{vec_unpacks_sbool_hi_@var{m}}, @samp{vec_unpacks_sbool_lo_@var{m}}
+Extract the high/low part of a vector of boolean elements that have scalar
+mode @var{m}.  The input vector (operand 1) has N elements, the output
+vector (operand 0) has N/2 elements.  The last operand (operand 2) is the
+number of elements of the input vector N as a @code{CONST_INT}.  These
+patterns are used if both the input and output vectors have the same scalar
+mode @var{m} and thus using @code{vec_unpacks_hi_@var{m}} or
+@code{vec_unpacks_lo_@var{m}} would be ambiguous.
+
 @cindex @code{vec_unpacks_float_hi_@var{m}} instruction pattern
 @cindex @code{vec_unpacks_float_lo_@var{m}} instruction pattern
 @cindex @code{vec_unpacku_float_hi_@var{m}} instruction pattern
--- gcc/testsuite/gcc.target/i386/avx512f-pr88513-1.c.jj	2018-12-17 13:20:51.486755223 +0100
+++ gcc/testsuite/gcc.target/i386/avx512f-pr88513-1.c	2018-12-17 13:21:14.441379518 +0100
@@ -0,0 +1,16 @@ 
+/* PR target/88513 */
+/* { dg-do run } */
+/* { dg-options "-O2 -fopenmp-simd -mavx512f -mtune=intel -mprefer-vector-width=512 -fno-vect-cost-model" } */
+/* { dg-require-effective-target avx512f } */
+
+#define CHECK_H "avx512f-check.h"
+
+#include "../../gcc.dg/vect/pr59591-1.c"
+
+#include CHECK_H
+
+static void
+test_512 (void)
+{
+  bar ();
+}
--- gcc/testsuite/gcc.target/i386/avx512f-pr88513-2.c.jj	2018-12-17 13:21:25.028206248 +0100
+++ gcc/testsuite/gcc.target/i386/avx512f-pr88513-2.c	2018-12-17 13:21:32.636081728 +0100
@@ -0,0 +1,16 @@ 
+/* PR target/88513 */
+/* { dg-do run } */
+/* { dg-options "-O2 -fopenmp-simd -mavx512f -mtune=intel -mprefer-vector-width=512 -fno-vect-cost-model" } */
+/* { dg-require-effective-target avx512f } */
+
+#define CHECK_H "avx512f-check.h"
+
+#include "../../gcc.dg/vect/pr59591-2.c"
+
+#include CHECK_H
+
+static void
+test_512 (void)
+{
+  bar ();
+}
--- gcc/testsuite/gcc.target/i386/avx512vl-pr88464-1.c.jj	2018-12-17 18:50:06.197844506 +0100
+++ gcc/testsuite/gcc.target/i386/avx512vl-pr88464-1.c	2018-12-17 18:50:56.496025932 +0100
@@ -0,0 +1,7 @@ 
+/* PR tree-optimization/88464 */
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512vl -mprefer-vector-width=256 -mtune=skylake-avx512 -fdump-tree-vect-details" } */
+/* { dg-final { scan-tree-dump-times "loop vectorized using 32 byte vectors" 4 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 4 "vect" } } */
+
+#include "avx512f-pr88464-1.c"
--- gcc/testsuite/gcc.target/i386/avx512vl-pr88464-2.c.jj	2018-12-17 18:52:08.277857736 +0100
+++ gcc/testsuite/gcc.target/i386/avx512vl-pr88464-2.c	2018-12-17 18:55:09.416909825 +0100
@@ -0,0 +1,20 @@ 
+/* PR tree-optimization/88464 */
+/* { dg-do run { target { avx512vl } } } */
+/* { dg-options "-O3 -mavx512vl -mprefer-vector-width=256 -mtune=skylake-avx512" } */
+
+#define AVX512VL
+#define AVX512F_LEN 512
+#define AVX512F_LEN_HALF 256
+
+#include "avx512f-pr88464-2.c"
+
+static void
+test_256 (void)
+{
+  avx512f_test ();
+}
+
+static void
+test_128 (void)
+{
+}
--- gcc/testsuite/gcc.target/i386/avx512vl-pr88464-3.c.jj	2018-12-17 19:04:42.820578091 +0100
+++ gcc/testsuite/gcc.target/i386/avx512vl-pr88464-3.c	2018-12-17 19:04:53.262408158 +0100
@@ -0,0 +1,7 @@ 
+/* PR tree-optimization/88464 */
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512vl -mprefer-vector-width=128 -mtune=skylake-avx512 -fdump-tree-vect-details" } */
+/* { dg-final { scan-tree-dump-times "loop vectorized using 16 byte vectors" 4 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 4 "vect" } } */
+
+#include "avx512f-pr88464-1.c"
--- gcc/testsuite/gcc.target/i386/avx512vl-pr88464-4.c.jj	2018-12-17 19:05:05.488209195 +0100
+++ gcc/testsuite/gcc.target/i386/avx512vl-pr88464-4.c	2018-12-17 19:05:13.809073780 +0100
@@ -0,0 +1,20 @@ 
+/* PR tree-optimization/88464 */
+/* { dg-do run { target { avx512vl } } } */
+/* { dg-options "-O3 -mavx512vl -mprefer-vector-width=128 -mtune=skylake-avx512" } */
+
+#define AVX512VL
+#define AVX512F_LEN 512
+#define AVX512F_LEN_HALF 256
+
+#include "avx512f-pr88464-2.c"
+
+static void
+test_256 (void)
+{
+  avx512f_test ();
+}
+
+static void
+test_128 (void)
+{
+}
--- gcc/testsuite/gcc.target/i386/avx512vl-pr88513-1.c.jj	2018-12-17 13:09:21.035010881 +0100
+++ gcc/testsuite/gcc.target/i386/avx512vl-pr88513-1.c	2018-12-17 13:12:02.407380317 +0100
@@ -0,0 +1,24 @@ 
+/* PR target/88513 */
+/* { dg-do run } */
+/* { dg-options "-O2 -fopenmp-simd -mavx512vl -mtune=intel -mprefer-vector-width=128 -fno-vect-cost-model" } */
+/* { dg-require-effective-target avx512vl } */
+
+#define AVX512VL
+#define AVX512F_LEN 512
+#define AVX512F_LEN_HALF 256
+#define CHECK_H "avx512f-check.h"
+
+#include "../../gcc.dg/vect/pr59591-1.c"
+
+#include CHECK_H
+
+static void
+test_256 (void)
+{
+  bar ();
+}
+
+static void
+test_128 (void)
+{
+}
--- gcc/testsuite/gcc.target/i386/avx512vl-pr88513-2.c.jj	2018-12-17 13:09:24.051961700 +0100
+++ gcc/testsuite/gcc.target/i386/avx512vl-pr88513-2.c	2018-12-17 13:12:25.042011342 +0100
@@ -0,0 +1,24 @@ 
+/* PR target/88513 */
+/* { dg-do run } */
+/* { dg-options "-O2 -fopenmp-simd -mavx512vl -mtune=intel -mprefer-vector-width=128 -fno-vect-cost-model" } */
+/* { dg-require-effective-target avx512vl } */
+
+#define AVX512VL
+#define AVX512F_LEN 512
+#define AVX512F_LEN_HALF 256
+#define CHECK_H "avx512f-check.h"
+
+#include "../../gcc.dg/vect/pr59591-2.c"
+
+#include CHECK_H
+
+static void
+test_256 (void)
+{
+  bar ();
+}
+
+static void
+test_128 (void)
+{
+}
--- gcc/testsuite/gcc.target/i386/avx512vl-pr88513-3.c.jj	2018-12-17 13:19:26.298144342 +0100
+++ gcc/testsuite/gcc.target/i386/avx512vl-pr88513-3.c	2018-12-17 13:20:05.584503930 +0100
@@ -0,0 +1,24 @@ 
+/* PR target/88513 */
+/* { dg-do run } */
+/* { dg-options "-O2 -fopenmp-simd -mavx512vl -mtune=intel -mprefer-vector-width=256 -fno-vect-cost-model" } */
+/* { dg-require-effective-target avx512vl } */
+
+#define AVX512VL
+#define AVX512F_LEN 512
+#define AVX512F_LEN_HALF 256
+#define CHECK_H "avx512f-check.h"
+
+#include "../../gcc.dg/vect/pr59591-1.c"
+
+#include CHECK_H
+
+static void
+test_256 (void)
+{
+  bar ();
+}
+
+static void
+test_128 (void)
+{
+}
--- gcc/testsuite/gcc.target/i386/avx512vl-pr88513-4.c.jj	2018-12-17 13:20:12.998383071 +0100
+++ gcc/testsuite/gcc.target/i386/avx512vl-pr88513-4.c	2018-12-17 13:20:21.088251199 +0100
@@ -0,0 +1,24 @@ 
+/* PR target/88513 */
+/* { dg-do run } */
+/* { dg-options "-O2 -fopenmp-simd -mavx512vl -mtune=intel -mprefer-vector-width=256 -fno-vect-cost-model" } */
+/* { dg-require-effective-target avx512vl } */
+
+#define AVX512VL
+#define AVX512F_LEN 512
+#define AVX512F_LEN_HALF 256
+#define CHECK_H "avx512f-check.h"
+
+#include "../../gcc.dg/vect/pr59591-2.c"
+
+#include CHECK_H
+
+static void
+test_256 (void)
+{
+  bar ();
+}
+
+static void
+test_128 (void)
+{
+}
--- gcc/testsuite/gcc.target/i386/avx512vl-pr88514-1.c.jj	2018-12-17 13:03:23.934831994 +0100
+++ gcc/testsuite/gcc.target/i386/avx512vl-pr88514-1.c	2018-12-17 14:25:21.407699798 +0100
@@ -0,0 +1,5 @@ 
+/* PR target/88514 */
+/* { dg-do assemble { target avx512vl } } */
+/* { dg-options "-Ofast -mavx512vl -mtune=intel -mprefer-vector-width=128" } */
+
+#include "avx512vl-pr79299-1.c"
--- gcc/testsuite/gcc.target/i386/avx512vl-pr88514-2.c.jj	2018-12-17 13:03:34.080666665 +0100
+++ gcc/testsuite/gcc.target/i386/avx512vl-pr88514-2.c	2018-12-17 14:25:32.814514127 +0100
@@ -0,0 +1,5 @@ 
+/* PR target/88514 */
+/* { dg-do assemble { target avx512vl } } */
+/* { dg-options "-Ofast -mavx512vl -mtune=intel -mprefer-vector-width=256" } */
+
+#include "avx512vl-pr79299-1.c"
--- gcc/testsuite/gcc.target/i386/avx512vl-pr88514-3.c.jj	2018-12-17 13:03:50.291402408 +0100
+++ gcc/testsuite/gcc.target/i386/avx512vl-pr88514-3.c	2018-12-17 14:25:48.827253484 +0100
@@ -0,0 +1,5 @@ 
+/* PR target/88514 */
+/* { dg-do assemble { target avx512vl } } */
+/* { dg-options "-Ofast -mavx512vl -mtune=intel -mprefer-vector-width=512" } */
+
+#include "avx512vl-pr79299-1.c"