diff mbox series

[x86,SSE] Improve handling of ternlog instructions in i386/sse.md (v2)

Message ID 001801daa896$39a7a5e0$acf6f1a0$@nextmovesoftware.com
State New
Headers show
Series [x86,SSE] Improve handling of ternlog instructions in i386/sse.md (v2) | expand

Commit Message

Roger Sayle May 17, 2024, 8:10 p.m. UTC
Hi Hongtao,
Many thanks for the review, bug fixes and suggestions for improvements.
This revised version of the patch, implements all of your corrections.  In theory
the "ternlog idx" should guarantee that some operands are non-null, but I agree
that it's better defensive programming to check invariants not easily proved.
Instead of calling ix86_expand_vector_move, I use ix86_broadcast_from_constant
to achieve the same effect of using a broadcast when possible, but has the benefit
of still using a memory operand (instead of a vector load) when broadcasting isn't
possible.  There are other places that could benefit from the same trick, but I can
address these in a follow-up patch (it may even be preferrable to keep these as
CONST_VECTOR during early RTL passes and lower to broadcast or constant pool
using splitters).

This revised patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2024-05-17  Roger Sayle  <roger@nextmovesoftware.com>
            Hongtao Liu  <hongtao.liu@intel.com>

gcc/ChangeLog
        PR target/115021
        * config/i386/i386-expand.cc (ix86_expand_args_builtin): Call
        fixup_modeless_constant before testing predicates.  Only call
        copy_to_mode_reg on memory operands (after the first one).
        (ix86_gen_bcst_mem): Helper function to convert a CONST_VECTOR
        into a VEC_DUPLICATE if possible.
        (ix86_ternlog_idx):  Convert an RTX expression into a ternlog
        index between 0 and 255, recording the operands in ARGS, if
        possible or return -1 if this is not possible/valid.
        (ix86_ternlog_leaf_p): Helper function to identify "leaves"
        of a ternlog expression, e.g. REG_P, MEM_P, CONST_VECTOR, etc.
        (ix86_ternlog_operand_p): Test whether a expression is suitable
        for and prefered as an UNSPEC_TERNLOG.
        (ix86_expand_ternlog_binop): Helper function to construct the
        binary operation corresponding to a sufficiently simple ternlog.
        (ix86_expand_ternlog_andnot): Helper function to construct a
        ANDN operation corresponding to a sufficiently simple ternlog.
        (ix86_expand_ternlog): Expand a 3-operand ternary logic
        expression, constructing either an UNSPEC_TERNLOG or simpler
        rtx expression.  Called from builtin expanders and pre-reload
        splitters.
        * config/i386/i386-protos.h (ix86_ternlog_idx): Prototype here.
        (ix86_ternlog_operand_p): Likewise.
        (ix86_expand_ternlog): Likewise.
        * config/i386/predicates.md (ternlog_operand): New predicate
        that calls xi86_ternlog_operand_p.
        * config/i386/sse.md (<avx512>_vpternlog<mode>_0): New
        define_insn_and_split that recognizes a SET_SRC of ternlog_operand
        and expands it via ix86_expand_ternlog pre-reload.
        (<avx512>_vternlog<mode>_mask): Convert from define_insn to
        define_expand.  Use ix86_expand_ternlog if the mask operand is
        ~0 (or 255 or -1).
        (*<avx512>_vternlog<mode>_mask): define_insn renamed from above.

gcc/testsuite/ChangeLog
        * gcc.target/i386/avx512f-andn-di-zmm-2.c: Update test case.
        * gcc.target/i386/avx512f-andn-si-zmm-2.c: Likewise.
        * gcc.target/i386/avx512f-orn-si-zmm-1.c: Likewise.
        * gcc.target/i386/avx512f-orn-si-zmm-2.c: Likewise.
        * gcc.target/i386/avx512f-vpternlogd-1.c: Likewise.
        * gcc.target/i386/avx512f-vpternlogq-1.c: Likewise.
        * gcc.target/i386/avx512vl-vpternlogd-1.c: Likewise.
        * gcc.target/i386/avx512vl-vpternlogq-1.c: Likewise.
        * gcc.target/i386/pr100711-3.c: Likewise.
        * gcc.target/i386/pr100711-4.c: Likewise.
        * gcc.target/i386/pr100711-5.c: Likewise.


Thanks again,
Roger
--

> From: Hongtao Liu <crazylht@gmail.com>
> Sent: 14 May 2024 09:46
> On Mon, May 13, 2024 at 5:57 AM Roger Sayle <roger@nextmovesoftware.com>
> wrote:
> >
> > This patch improves the way that the x86 backend recognizes and
> > expands AVX512's bitwise ternary logic (vpternlog) instructions.
> I like the patch.
> 
> 1 file changed, 25 insertions(+), 1 deletion(-) gcc/config/i386/i386-expand.cc | 26
> +++++++++++++++++++++++++-
> 
> modified   gcc/config/i386/i386-expand.cc
> @@ -25601,6 +25601,7 @@ ix86_gen_bcst_mem (machine_mode mode, rtx x)
> int  ix86_ternlog_idx (rtx op, rtx *args)  {
> +  /* Nice dynamic programming:)  */
>    int idx0, idx1;
> 
>    if (!op)
> @@ -25651,6 +25652,7 @@ ix86_ternlog_idx (rtx op, rtx *args)
>     return 0xaa;
>   }
>        /* Maximum of one volatile memory reference per expression.  */
> +      /* According to comments, it should be && ?  */
>        if (side_effects_p (op) || side_effects_p (args[2]))
>   return -1;
>        if (rtx_equal_p (op, args[2]))
> @@ -25666,6 +25668,8 @@ ix86_ternlog_idx (rtx op, rtx *args)
> 
>      case SUBREG:
>        if (!VECTOR_MODE_P (GET_MODE (SUBREG_REG (op)))
> +   /* It could be TI/OI/XImode since it's just bit operations,
> +      So no need for VECTOR_MODE_P?  */
>     || GET_MODE_SIZE (GET_MODE (SUBREG_REG (op)))
>        != GET_MODE_SIZE (GET_MODE (op)))
>   return -1;
> @@ -25701,7 +25705,7 @@ ix86_ternlog_idx (rtx op, rtx *args)
>      case UNSPEC:
>        if (XINT (op, 1) != UNSPEC_VTERNLOG
>     || XVECLEN (op, 0) != 4
> -   || CONST_INT_P (XVECEXP (op, 0, 3)))
> +   || !CONST_INT_P (XVECEXP (op, 0, 3)))
>   return -1;
> 
>        /* TODO: Handle permuted operands.  */ @@ -25778,10 +25782,13 @@
> ix86_ternlog_operand_p (rtx op)
>        /* Prefer pxor.  */
>        if (ix86_ternlog_leaf_p (XEXP (op, 0), mode)
>     && (ix86_ternlog_leaf_p (op1, mode)
> +       /* Add some comments, it's because we already have
> <mask_codefor>one_cmpl<mode>2<mask_name>.  */
>         || vector_all_ones_operand (op1, mode)))
>   return false;
>        break;
> 
> +      /* Wouldn't pternlog match (SUBREG: (REG))???,and it should
> also be excluded.
> +        Similar for SUBREG: (AND/IOR/XOR)?   */
>      default:
>        break;
>      }
> @@ -25865,25 +25872,35 @@ ix86_expand_ternlog (machine_mode mode, rtx
> op0, rtx op1, rtx op2, int idx,
> 
>      case 0x0a: /* ~a&c */
>        if ((!op1 || !side_effects_p (op1))
> +   /* shouldn't op1 always be register_operand with no side effects
> when it exists?
> +      <avx512>_vternlog<mode>_mask only supports register_operand for op1.
> +      ix86_ternlog_idx only assigns REG to args[1].
> +      Ditto for op0, also we should add op2 && register_operand (op2, mode)
> +      to avoid segment fault?   */
>     && register_operand (op0, mode)
>     && register_operand (op2, mode))
>   return ix86_expand_ternlog_andnot (mode, op0, op1, target);
> +      /* op2 instead of op1??? */
>        break;
> 
>      case 0x0c: /* ~a&b */
>        if ((!op2 || !side_effects_p (op2))
>     && register_operand (op0, mode)
>     && register_operand (op1, mode))
> + /* If op0 and op1 exist, they must be register_operand? So just op0
> && op1?  */
>   return ix86_expand_ternlog_andnot (mode, op0, op1, target);
>        break;
> 
>      case 0x0f:  /* ~a */
>        if ((!op1 || !side_effects_p (op1))
> +   /* No need for !side_effects for op1?  */
> +   /* Ditto.  */
>     && (!op2 || !side_effects_p (op2)))
>   {
>     if (GET_MODE (op0) != mode)
>       op0 = gen_lowpart (mode, op0);
>     if (!TARGET_64BIT && !register_operand (op0, mode))
> +     /* It must be register_operand for op0 when it exists, no? */
>       op0 = force_reg (mode, op0);
>     emit_move_insn (target, gen_rtx_XOR (mode, op0, CONSTM1_RTX (mode)));
>     return target;
> @@ -25894,6 +25911,7 @@ ix86_expand_ternlog (machine_mode mode, rtx
> op0, rtx op1, rtx op2, int idx,
>        if ((!op0 || !side_effects_p (op0))
>     && register_operand (op1, mode)
>     && register_operand (op2, mode))
> + /* op1 && op2 && register_operand (op2, mode)??  */
>   return ix86_expand_ternlog_andnot (mode, op1, op2, target);
>        break;
> 
> @@ -25901,12 +25919,14 @@ ix86_expand_ternlog (machine_mode mode, rtx
> op0, rtx op1, rtx op2, int idx,
>        if ((!op2 || !side_effects_p (op2))
>     && register_operand (op0, mode)
>     && register_operand (op1, mode))
> + /* op0 && op1? */
>   return ix86_expand_ternlog_andnot (mode, op1, op0, target);
>        break;
> 
>      case 0x33:  /* ~b */
>        if ((!op0 || !side_effects_p (op0))
>     && (!op2 || !side_effects_p (op2)))
> + /* op1 && (!op2 || !side_effects_p (op2)) ?  */
>   {
>     if (GET_MODE (op1) != mode)
>       op1 = gen_lowpart (mode, op1);
> @@ -26051,6 +26071,10 @@ ix86_expand_ternlog (machine_mode mode, rtx
> op0, rtx op1, rtx op2, int idx,
>        tmp2 = ix86_gen_bcst_mem (mode, op2);
>        if (!tmp2)
>   tmp2 = validize_mem (force_const_mem (mode, op2));
> +      /* Can we use ix86_expand_vector_move here, it will try move
> integer to gpr,
> + and broadcast gpr to the vector register.
> + It should be faster than a constant pool, and PR115021 should be
> + solved by another way instead of this walkaround.  */
>      }
>    else
>      tmp2 = op2;
> 
> 
> 
> 
> --
> BR,
> Hongtao

Comments

Alexander Monakov May 20, 2024, 9:46 p.m. UTC | #1
Hello!

I looked at ternlog a bit last year, so I'd like to offer some drive-by
comments. If you want to tackle them in a follow-up patch, or leave for
someone else to handle, please let me know.

On Fri, 17 May 2024, Roger Sayle wrote:

> This revised patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?

Just to make sure: no new tests for the new tricks?

> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> +/* Determine the ternlog immediate index that implements 3-operand
> +   ternary logic expression OP.  This uses and modifies the 3 element
> +   array ARGS to record and check the leaves, either 3 REGs, or 2 REGs
> +   and MEM.  Returns an index between 0 and 255 for a valid ternlog,
> +   or -1 if the expression isn't suitable.  */
> +
> +int
> +ix86_ternlog_idx (rtx op, rtx *args)
> +{
> +  int idx0, idx1;
> +
> +  if (!op)
> +    return -1;
> +
> +  switch (GET_CODE (op))
> +    {
> +    case REG:
> +      if (!args[0])
> +	{
> +	  args[0] = op;
> +	  return 0xf0;

From readability perspective, I wonder if it's nicer to have something like

enum {
  TERNLOG_A = 0xf0,
  TERNLOG_B = 0xcc,
  TERNLOG_C = 0xaa
}

and then use them to build the immediates.

> +	}
> +      if (REGNO (op) == REGNO (args[0]))
> +	return 0xf0;
> +      if (!args[1])
> +	{
> +	  args[1] = op;
> +	  return 0xcc;
> +	}
[snip]
> +
> +/* Return TRUE if OP (in mode MODE) is the leaf of a ternary logic
> +   expression, such as a register or a memory reference.  */
> + 
> +bool
> +ix86_ternlog_leaf_p (rtx op, machine_mode mode)
> +{
> +  /* We can't use memory_operand here, as it may return a different
> +     value before and after reload (for volatile MEMs) which creates
> +     problems splitting instructions.  */
> +  return register_operand (op, mode)
> +	 || MEM_P (op)
> +	 || GET_CODE (op) == CONST_VECTOR
> +	 || bcst_mem_operand (op, mode);

Did your editor automatically indent this correctly for you? I think
usually such expressions have outer parenthesis.

> +}
[snip]
> +/* Expand a 3-operand ternary logic expression.  Return TARGET. */
> +rtx
> +ix86_expand_ternlog (machine_mode mode, rtx op0, rtx op1, rtx op2, int idx,
> +		     rtx target)
> +{
> +  rtx tmp0, tmp1, tmp2;
> +
> +  if (!target)
> +    target = gen_reg_rtx (mode);
> +
> +  /* Canonicalize ternlog index for degenerate (duplicated) operands.  */

But this only canonicalizes the case of triplicated operands, and does nothing
if two operands are duplicates of each other, and the third is distinct.
Handling that would complicate the already large patch a lot though.

> +  if (rtx_equal_p (op0, op1) && rtx_equal_p (op0, op2))
> +    switch (idx & 0x81)
> +      {
> +      case 0x00:
> +	idx = 0x00;
> +	break;
> +      case 0x01:
> +	idx = 0x0f;
> +	break;
> +      case 0x80:
> +	idx = 0xf0;
> +	break;
> +      case 0x81:
> +	idx = 0xff;
> +	break;
> +      }
> +
> +  switch (idx & 0xff)
> +    {
> +    case 0x00:
> +      if ((!op0 || !side_effects_p (op0))
> +          && (!op1 || !side_effects_p (op1))
> +          && (!op2 || !side_effects_p (op2)))
> +        {
> +	  emit_move_insn (target, CONST0_RTX (mode));
> +	  return target;
> +	}
> +      break;
> +
> +    case 0x0a: /* ~a&c */

With the enum idea above, this could be 'case ~TERNLOG_A & TERNLOG_C', etc.

Alexander
Hongtao Liu May 27, 2024, 6:39 a.m. UTC | #2
On Tue, May 21, 2024 at 5:46 AM Alexander Monakov <amonakov@ispras.ru> wrote:
>
>
> Hello!
>
> I looked at ternlog a bit last year, so I'd like to offer some drive-by
> comments. If you want to tackle them in a follow-up patch, or leave for
> someone else to handle, please let me know.
>
> On Fri, 17 May 2024, Roger Sayle wrote:
>
> > This revised patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
>
> Just to make sure: no new tests for the new tricks?
>
> > --- a/gcc/config/i386/i386-expand.cc
> > +++ b/gcc/config/i386/i386-expand.cc
> > +/* Determine the ternlog immediate index that implements 3-operand
> > +   ternary logic expression OP.  This uses and modifies the 3 element
> > +   array ARGS to record and check the leaves, either 3 REGs, or 2 REGs
> > +   and MEM.  Returns an index between 0 and 255 for a valid ternlog,
> > +   or -1 if the expression isn't suitable.  */
> > +
> > +int
> > +ix86_ternlog_idx (rtx op, rtx *args)
> > +{
> > +  int idx0, idx1;
> > +
> > +  if (!op)
> > +    return -1;
> > +
> > +  switch (GET_CODE (op))
> > +    {
> > +    case REG:
> > +      if (!args[0])
> > +     {
> > +       args[0] = op;
> > +       return 0xf0;
>
> From readability perspective, I wonder if it's nicer to have something like
>
> enum {
>   TERNLOG_A = 0xf0,
>   TERNLOG_B = 0xcc,
>   TERNLOG_C = 0xaa
> }
>
> and then use them to build the immediates.
>
> > +     }
> > +      if (REGNO (op) == REGNO (args[0]))
> > +     return 0xf0;
> > +      if (!args[1])
> > +     {
> > +       args[1] = op;
> > +       return 0xcc;
> > +     }
> [snip]
> > +
> > +/* Return TRUE if OP (in mode MODE) is the leaf of a ternary logic
> > +   expression, such as a register or a memory reference.  */
> > +
> > +bool
> > +ix86_ternlog_leaf_p (rtx op, machine_mode mode)
> > +{
> > +  /* We can't use memory_operand here, as it may return a different
> > +     value before and after reload (for volatile MEMs) which creates
> > +     problems splitting instructions.  */
> > +  return register_operand (op, mode)
> > +      || MEM_P (op)
> > +      || GET_CODE (op) == CONST_VECTOR
> > +      || bcst_mem_operand (op, mode);
>
> Did your editor automatically indent this correctly for you? I think
> usually such expressions have outer parenthesis.
>
> > +}
> [snip]
> > +/* Expand a 3-operand ternary logic expression.  Return TARGET. */
> > +rtx
> > +ix86_expand_ternlog (machine_mode mode, rtx op0, rtx op1, rtx op2, int idx,
> > +                  rtx target)
> > +{
> > +  rtx tmp0, tmp1, tmp2;
> > +
> > +  if (!target)
> > +    target = gen_reg_rtx (mode);
> > +
> > +  /* Canonicalize ternlog index for degenerate (duplicated) operands.  */
>
> But this only canonicalizes the case of triplicated operands, and does nothing
> if two operands are duplicates of each other, and the third is distinct.
> Handling that would complicate the already large patch a lot though.
I think it's handled below,
  tmp0 = register_operand (op0, mode) ? op0 : force_reg (mode, op0);
  if (GET_MODE (tmp0) != mode)
    tmp0 = gen_lowpart (mode, tmp0);

  if (!op1 || rtx_equal_p (op0, op1))  ---- here
    tmp1 = copy_rtx (tmp0);
  else if (!register_operand (op1, mode))
    tmp1 = force_reg (mode, op1);
  else
    tmp1 = op1;
  if (GET_MODE (tmp1) != mode)
    tmp1 = gen_lowpart (mode, tmp1);

  if (!op2 || rtx_equal_p (op0, op2)) ---------- and here.
    tmp2 = copy_rtx (tmp0);
  else if (rtx_equal_p (op1, op2))
    tmp2 = copy_rtx (tmp1);
  else if (GET_CODE (op2) == CONST_VECTOR)
    {
      if (GET_MODE (op2) != mode)
op2 = gen_lowpart (mode, op2);
      tmp2 = ix86_gen_bcst_mem (mode, op2);
      if (!tmp2)
{
  tmp2 = validize_mem (force
>
> > +  if (rtx_equal_p (op0, op1) && rtx_equal_p (op0, op2))
> > +    switch (idx & 0x81)
> > +      {
> > +      case 0x00:
> > +     idx = 0x00;
> > +     break;
> > +      case 0x01:
> > +     idx = 0x0f;
> > +     break;
> > +      case 0x80:
> > +     idx = 0xf0;
> > +     break;
> > +      case 0x81:
> > +     idx = 0xff;
> > +     break;
> > +      }
> > +
> > +  switch (idx & 0xff)
> > +    {
> > +    case 0x00:
> > +      if ((!op0 || !side_effects_p (op0))
> > +          && (!op1 || !side_effects_p (op1))
> > +          && (!op2 || !side_effects_p (op2)))
> > +        {
> > +       emit_move_insn (target, CONST0_RTX (mode));
> > +       return target;
> > +     }
> > +      break;
> > +
> > +    case 0x0a: /* ~a&c */
>
> With the enum idea above, this could be 'case ~TERNLOG_A & TERNLOG_C', etc.
>
> Alexander
Hongtao Liu May 27, 2024, 6:48 a.m. UTC | #3
On Sat, May 18, 2024 at 4:10 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> Hi Hongtao,
> Many thanks for the review, bug fixes and suggestions for improvements.
> This revised version of the patch, implements all of your corrections.  In theory
> the "ternlog idx" should guarantee that some operands are non-null, but I agree
> that it's better defensive programming to check invariants not easily proved.
> Instead of calling ix86_expand_vector_move, I use ix86_broadcast_from_constant
> to achieve the same effect of using a broadcast when possible, but has the benefit
> of still using a memory operand (instead of a vector load) when broadcasting isn't
> possible.  There are other places that could benefit from the same trick, but I can
> address these in a follow-up patch (it may even be preferrable to keep these as
> CONST_VECTOR during early RTL passes and lower to broadcast or constant pool
> using splitters).
>
> This revised patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
1 file changed, 41 insertions(+)
gcc/config/i386/i386-expand.cc | 41 +++++++++++++++++++++++++++++++++++++++++

modified   gcc/config/i386/i386-expand.cc
@@ -25579,14 +25579,22 @@ ix86_gen_bcst_mem (machine_mode mode, rtx x)
       && !CONST_DOUBLE_P (cst)
       && !CONST_FIXED_P (cst))
     return NULL_RTX;
+  /* I think VALID_BCST_MODE_P should be sufficient to
+     make sure cst is CONST_INT or CONST_DOUBLE.  */

   int n_elts = GET_MODE_NUNITS (mode);
   if (CONST_VECTOR_NUNITS (x) != n_elts)
     return NULL_RTX;
+  /* Do we need this? I saw from caller side there's already
+       if (GET_MODE (op2) != mode)
+ op2 = gen_lowpart (mode, op2);
+ tmp2 = ix86_gen_bcst_mem (mode, op2);  */
+

   for (int i = 1; i < n_elts; i++)
     if (!rtx_equal_p (cst, CONST_VECTOR_ELT (x, i)))
       return NULL_RTX;
+  /* CONST_VECTOR_DUPLICATE_P (op)? */

   rtx mem = force_const_mem (GET_MODE_INNER (mode), cst);
   return gen_rtx_VEC_DUPLICATE (mode, validize_mem (mem));
@@ -25709,6 +25717,21 @@ ix86_ternlog_idx (rtx op, rtx *args)
    || ix86_ternlog_idx (XVECEXP (op, 0, 2), args) != 0xaa)
  return -1;
       return INTVAL (XVECEXP (op, 0, 3));
+      /* I think we can add some testcase for this.
+ .i.e
+ #include <immintrin.h>
+
+ __m256i
+ foo (__m256i a, __m256i b, __m256i c)
+ {
+ return (a & _mm256_ternarylogic_epi64 (a, b, c, 0xe4));
+ }
+
+ __m256i
+ foo1 (__m256i a, __m256i b, __m256i c)
+ {
+ return (b & _mm256_ternarylogic_epi64 (a, b, c, 0xe4));
+ }  */

     default:
       return -1;
@@ -25778,6 +25801,8 @@ ix86_ternlog_operand_p (rtx op)
       if (ix86_ternlog_leaf_p (XEXP (op, 0), mode)
    && (ix86_ternlog_leaf_p (op1, mode)
        || vector_all_ones_operand (op1, mode)))
+ /* There's CONST_VECTOR check in x86_ternlog_leaf_p,
+    so vector_all_ones_operand is not needed.  */
  return false;
       break;

@@ -25862,6 +25887,10 @@ ix86_expand_ternlog (machine_mode mode, rtx
op0, rtx op1, rtx op2, int idx,
       if ((!op0 || !side_effects_p (op0))
           && (!op1 || !side_effects_p (op1))
           && (!op2 || !side_effects_p (op2)))
+ /* I think only op2 needs to check side_effects_p, op0
+    and op1 must be register operand when it exists, no need for
side_effects_p?
+    Similar for all below side_effects_p (op0/op1)
+    the check is redundant.  */
         {
    emit_move_insn (target, CONST0_RTX (mode));
    return target;
@@ -25872,6 +25901,9 @@ ix86_expand_ternlog (machine_mode mode, rtx
op0, rtx op1, rtx op2, int idx,
       if ((!op1 || !side_effects_p (op1))
    && op0 && register_operand (op0, mode)
    && op2 && register_operand (op2, mode))
+ /* op0/op1 must be register_operand when it exists,
+    so register_operand (op0/op1, mode) is not needed.
+    similar for all below register_operand (op0/op1, mode).  */
  return ix86_expand_ternlog_andnot (mode, op0, op2, target);
       break;

@@ -25879,6 +25911,7 @@ ix86_expand_ternlog (machine_mode mode, rtx
op0, rtx op1, rtx op2, int idx,
       if ((!op2 || !side_effects_p (op2))
    && op0 && register_operand (op0, mode)
    && op1 && register_operand (op1, mode))
+ /* op0 && op1? */
  return ix86_expand_ternlog_andnot (mode, op0, op1, target);
       break;

@@ -25948,6 +25981,7 @@ ix86_expand_ternlog (machine_mode mode, rtx
op0, rtx op1, rtx op2, int idx,
       if ((!op0 || !side_effects_p (op0))
    && (!op1 || !side_effects_p (op1))
           && op2)
+ /* if (op2).  */
  {
    if (GET_MODE (op2) != mode)
      op2 = gen_lowpart (mode, op2);
@@ -25961,18 +25995,21 @@ ix86_expand_ternlog (machine_mode mode, rtx
op0, rtx op1, rtx op2, int idx,
     case 0x5a:  /* a^c */
       if (op0 && op2
           && (!op1 || !side_effects_p (op1)))
+ /* if (op0 && op2).  */
  return ix86_expand_ternlog_binop (XOR, mode, op0, op2, target);
       break;

     case 0x66:  /* b^c */
       if ((!op0 || !side_effects_p (op0))
           && op1 && op2)
+ /* if (op1 && op2).  */
  return ix86_expand_ternlog_binop (XOR, mode, op1, op2, target);
       break;

     case 0x88:  /* b&c */
       if ((!op0 || !side_effects_p (op0))
           && op1 && op2)
+ /* if (op1 && op2).  */
  return ix86_expand_ternlog_binop (AND, mode, op1, op2, target);
       break;

@@ -26054,6 +26091,9 @@ ix86_expand_ternlog (machine_mode mode, rtx
op0, rtx op1, rtx op2, int idx,
     }

   tmp0 = register_operand (op0, mode) ? op0 : force_reg (mode, op0);
+  /* Do you observe there're cases of op0 not register_operand?.
+     if it's from <avx512>_vternlog<mode>_mask, it must be register_operand.
+     if it's from ix86_ternlog_idx, it must REG_P.  */
   if (GET_MODE (tmp0) != mode)
     tmp0 = gen_lowpart (mode, tmp0);

@@ -26061,6 +26101,7 @@ ix86_expand_ternlog (machine_mode mode, rtx
op0, rtx op1, rtx op2, int idx,
     tmp1 = copy_rtx (tmp0);
   else if (!register_operand (op1, mode))
     tmp1 = force_reg (mode, op1);
+  /* Ditto.  */
   else
     tmp1 = op1;
   if (GET_MODE (tmp1) != mode)
Hongtao Liu May 27, 2024, 8:58 a.m. UTC | #4
On Mon, May 27, 2024 at 2:48 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Sat, May 18, 2024 at 4:10 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
> >
> >
> > Hi Hongtao,
> > Many thanks for the review, bug fixes and suggestions for improvements.
> > This revised version of the patch, implements all of your corrections.  In theory
> > the "ternlog idx" should guarantee that some operands are non-null, but I agree
> > that it's better defensive programming to check invariants not easily proved.
> > Instead of calling ix86_expand_vector_move, I use ix86_broadcast_from_constant
> > to achieve the same effect of using a broadcast when possible, but has the benefit
> > of still using a memory operand (instead of a vector load) when broadcasting isn't
> > possible.  There are other places that could benefit from the same trick, but I can
> > address these in a follow-up patch (it may even be preferrable to keep these as
> > CONST_VECTOR during early RTL passes and lower to broadcast or constant pool
> > using splitters).
> >
> > This revised patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> 1 file changed, 41 insertions(+)
> gcc/config/i386/i386-expand.cc | 41 +++++++++++++++++++++++++++++++++++++++++
>
> modified   gcc/config/i386/i386-expand.cc
> @@ -25579,14 +25579,22 @@ ix86_gen_bcst_mem (machine_mode mode, rtx x)
>        && !CONST_DOUBLE_P (cst)
>        && !CONST_FIXED_P (cst))
>      return NULL_RTX;
> +  /* I think VALID_BCST_MODE_P should be sufficient to
> +     make sure cst is CONST_INT or CONST_DOUBLE.  */
>
>    int n_elts = GET_MODE_NUNITS (mode);
>    if (CONST_VECTOR_NUNITS (x) != n_elts)
>      return NULL_RTX;
> +  /* Do we need this? I saw from caller side there's already
> +       if (GET_MODE (op2) != mode)
> + op2 = gen_lowpart (mode, op2);
> + tmp2 = ix86_gen_bcst_mem (mode, op2);  */
> +
>
>    for (int i = 1; i < n_elts; i++)
>      if (!rtx_equal_p (cst, CONST_VECTOR_ELT (x, i)))
>        return NULL_RTX;
> +  /* CONST_VECTOR_DUPLICATE_P (op)? */
>
>    rtx mem = force_const_mem (GET_MODE_INNER (mode), cst);
>    return gen_rtx_VEC_DUPLICATE (mode, validize_mem (mem));
> @@ -25709,6 +25717,21 @@ ix86_ternlog_idx (rtx op, rtx *args)
>     || ix86_ternlog_idx (XVECEXP (op, 0, 2), args) != 0xaa)
>   return -1;
>        return INTVAL (XVECEXP (op, 0, 3));
> +      /* I think we can add some testcase for this.
> + .i.e
> + #include <immintrin.h>
> +
> + __m256i
> + foo (__m256i a, __m256i b, __m256i c)
> + {
> + return (a & _mm256_ternarylogic_epi64 (a, b, c, 0xe4));
> + }
> +
> + __m256i
> + foo1 (__m256i a, __m256i b, __m256i c)
> + {
> + return (b & _mm256_ternarylogic_epi64 (a, b, c, 0xe4));
> + }  */
>
>      default:
>        return -1;
> @@ -25778,6 +25801,8 @@ ix86_ternlog_operand_p (rtx op)
>        if (ix86_ternlog_leaf_p (XEXP (op, 0), mode)
>     && (ix86_ternlog_leaf_p (op1, mode)
>         || vector_all_ones_operand (op1, mode)))
> + /* There's CONST_VECTOR check in x86_ternlog_leaf_p,
> +    so vector_all_ones_operand is not needed.  */
>   return false;
>        break;
>
> @@ -25862,6 +25887,10 @@ ix86_expand_ternlog (machine_mode mode, rtx
> op0, rtx op1, rtx op2, int idx,
>        if ((!op0 || !side_effects_p (op0))
>            && (!op1 || !side_effects_p (op1))
>            && (!op2 || !side_effects_p (op2)))
> + /* I think only op2 needs to check side_effects_p, op0
> +    and op1 must be register operand when it exists, no need for
> side_effects_p?
> +    Similar for all below side_effects_p (op0/op1)
> +    the check is redundant.  */
>          {
>     emit_move_insn (target, CONST0_RTX (mode));
>     return target;
> @@ -25872,6 +25901,9 @@ ix86_expand_ternlog (machine_mode mode, rtx
> op0, rtx op1, rtx op2, int idx,
>        if ((!op1 || !side_effects_p (op1))
>     && op0 && register_operand (op0, mode)
>     && op2 && register_operand (op2, mode))
> + /* op0/op1 must be register_operand when it exists,
> +    so register_operand (op0/op1, mode) is not needed.
> +    similar for all below register_operand (op0/op1, mode).  */
>   return ix86_expand_ternlog_andnot (mode, op0, op2, target);
>        break;
>
> @@ -25879,6 +25911,7 @@ ix86_expand_ternlog (machine_mode mode, rtx
> op0, rtx op1, rtx op2, int idx,
>        if ((!op2 || !side_effects_p (op2))
>     && op0 && register_operand (op0, mode)
>     && op1 && register_operand (op1, mode))
> + /* op0 && op1? */
>   return ix86_expand_ternlog_andnot (mode, op0, op1, target);
>        break;
>
> @@ -25948,6 +25981,7 @@ ix86_expand_ternlog (machine_mode mode, rtx
> op0, rtx op1, rtx op2, int idx,
>        if ((!op0 || !side_effects_p (op0))
>     && (!op1 || !side_effects_p (op1))
>            && op2)
> + /* if (op2).  */
>   {
>     if (GET_MODE (op2) != mode)
>       op2 = gen_lowpart (mode, op2);
> @@ -25961,18 +25995,21 @@ ix86_expand_ternlog (machine_mode mode, rtx
> op0, rtx op1, rtx op2, int idx,
>      case 0x5a:  /* a^c */
>        if (op0 && op2
>            && (!op1 || !side_effects_p (op1)))
> + /* if (op0 && op2).  */
>   return ix86_expand_ternlog_binop (XOR, mode, op0, op2, target);
>        break;
>
>      case 0x66:  /* b^c */
>        if ((!op0 || !side_effects_p (op0))
>            && op1 && op2)
> + /* if (op1 && op2).  */
>   return ix86_expand_ternlog_binop (XOR, mode, op1, op2, target);
>        break;
>
>      case 0x88:  /* b&c */
>        if ((!op0 || !side_effects_p (op0))
>            && op1 && op2)
> + /* if (op1 && op2).  */
>   return ix86_expand_ternlog_binop (AND, mode, op1, op2, target);
>        break;
>
> @@ -26054,6 +26091,9 @@ ix86_expand_ternlog (machine_mode mode, rtx
> op0, rtx op1, rtx op2, int idx,
>      }
>
>    tmp0 = register_operand (op0, mode) ? op0 : force_reg (mode, op0);
> +  /* Do you observe there're cases of op0 not register_operand?.
> +     if it's from <avx512>_vternlog<mode>_mask, it must be register_operand.
> +     if it's from ix86_ternlog_idx, it must REG_P.  */
>    if (GET_MODE (tmp0) != mode)
>      tmp0 = gen_lowpart (mode, tmp0);
>
> @@ -26061,6 +26101,7 @@ ix86_expand_ternlog (machine_mode mode, rtx
> op0, rtx op1, rtx op2, int idx,
>      tmp1 = copy_rtx (tmp0);
>    else if (!register_operand (op1, mode))
>      tmp1 = force_reg (mode, op1);
> +  /* Ditto.  */
>    else
>      tmp1 = op1;
>    if (GET_MODE (tmp1) != mode)
>
>
>
>
> --
> BR,
> Hongtao

Got ICE for below testcase

#include <immintrin.h>
__m256i
foo2 (__m256i** a, __m256i b)
{
  return ~(**a);
}

with -march=x86-64-v4 -O2

 (insn 17 7 13 2 (set (reg:V4DI 103 [ _5 ])
        (xor:V4DI (mem:V4DI (mem/f:DI (reg:DI 105) [1 *a_4(D)+0 S8
A64]) [0 *_1+0 S32 A256])
            (const_vector:V4DI [
                    (const_int -1 [0xffffffffffffffff]) repeated x4
                ]))) "test.c":7:10 -1
     (expr_list:REG_DEAD (reg:DI 105)
        (nil)))
during RTL pass: ira

I think we need to check memory_operand in ix86_ternlog_idx

    case MEM:
      if (MEM_P (op)
  && MEM_VOLATILE_P (op)
  && !volatile_ok)
return -1;
      /* FALLTHRU */
diff mbox series

Patch

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index a613291..5f0b725 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -11707,6 +11707,8 @@  ix86_expand_args_builtin (const struct builtin_description *d,
       tree arg = CALL_EXPR_ARG (exp, i);
       rtx op = expand_normal (arg);
       machine_mode mode = insn_p->operand[i + 1].mode;
+      /* Need to fixup modeless constant before testing predicate.  */
+      op = fixup_modeless_constant (op, mode);
       bool match = insn_p->operand[i + 1].predicate (op, mode);
 
       if (second_arg_count && i == 1)
@@ -11873,13 +11875,15 @@  ix86_expand_args_builtin (const struct builtin_description *d,
 	  /* If we aren't optimizing, only allow one memory operand to
 	     be generated.  */
 	  if (memory_operand (op, mode))
-	    num_memory++;
-
-	  op = fixup_modeless_constant (op, mode);
+	    {
+	      num_memory++;
+	      if (!optimize && num_memory > 1)
+		op = copy_to_mode_reg (mode, op);
+	    }
 
 	  if (GET_MODE (op) == mode || GET_MODE (op) == VOIDmode)
 	    {
-	      if (optimize || !match || num_memory > 1)
+	      if (!match)
 		op = copy_to_mode_reg (mode, op);
 	    }
 	  else
@@ -25480,4 +25484,548 @@  ix86_expand_fast_convert_bf_to_sf (rtx val)
   return ret;
 }
 
+/* Attempt to convert a CONST_VECTOR into a bcst_mem_operand.
+   Returns NULL_RTX if X is cannot be expressed as a suitable
+   VEC_DUPLICATE in mode MODE.  */
+
+static rtx
+ix86_gen_bcst_mem (machine_mode mode, rtx x)
+{
+  if (!TARGET_AVX512F
+      || GET_CODE (x) != CONST_VECTOR
+      || (!TARGET_AVX512VL
+	  && (GET_MODE_SIZE (mode) != 64 || !TARGET_EVEX512))
+      || !VALID_BCST_MODE_P (GET_MODE_INNER (mode))
+	 /* Disallow HFmode broadcast.  */
+      || GET_MODE_SIZE (GET_MODE_INNER (mode)) < 4)
+    return NULL_RTX;
+
+  rtx cst = CONST_VECTOR_ELT (x, 0);
+  if (!CONST_SCALAR_INT_P (cst)
+      && !CONST_DOUBLE_P (cst)
+      && !CONST_FIXED_P (cst))
+    return NULL_RTX;
+  
+  int n_elts = GET_MODE_NUNITS (mode);
+  if (CONST_VECTOR_NUNITS (x) != n_elts)
+    return NULL_RTX;
+
+  for (int i = 1; i < n_elts; i++)
+    if (!rtx_equal_p (cst, CONST_VECTOR_ELT (x, i)))
+      return NULL_RTX;
+
+  rtx mem = force_const_mem (GET_MODE_INNER (mode), cst);
+  return gen_rtx_VEC_DUPLICATE (mode, validize_mem (mem));
+}
+
+/* Determine the ternlog immediate index that implements 3-operand
+   ternary logic expression OP.  This uses and modifies the 3 element
+   array ARGS to record and check the leaves, either 3 REGs, or 2 REGs
+   and MEM.  Returns an index between 0 and 255 for a valid ternlog,
+   or -1 if the expression isn't suitable.  */
+
+int
+ix86_ternlog_idx (rtx op, rtx *args)
+{
+  int idx0, idx1;
+
+  if (!op)
+    return -1;
+
+  switch (GET_CODE (op))
+    {
+    case REG:
+      if (!args[0])
+	{
+	  args[0] = op;
+	  return 0xf0;
+	}
+      if (REGNO (op) == REGNO (args[0]))
+	return 0xf0;
+      if (!args[1])
+	{
+	  args[1] = op;
+	  return 0xcc;
+	}
+      if (REGNO (op) == REGNO (args[1]))
+	return 0xcc;
+      if (!args[2])
+	{
+	  args[2] = op;
+	  return 0xaa;
+	}
+      if (REG_P (args[2]) && REGNO (op) == REGNO (args[2]))
+	return 0xaa;
+      return -1;
+
+    case VEC_DUPLICATE:
+      if (!bcst_mem_operand (op, GET_MODE (op)))
+	return -1;
+      /* FALLTHRU */
+
+    case MEM:
+      if (MEM_P (op)
+	  && MEM_VOLATILE_P (op)
+	  && !volatile_ok)
+	return -1;
+      /* FALLTHRU */
+
+    case CONST_VECTOR:
+      if (!args[2])
+	{
+	  args[2] = op;
+	  return 0xaa;
+	}
+      /* Maximum of one volatile memory reference per expression.  */
+      if (side_effects_p (op) && side_effects_p (args[2]))
+	return -1;
+      if (rtx_equal_p (op, args[2]))
+	return 0xaa;
+      /* Check if one CONST_VECTOR is the ones-complement of the other.  */
+      if (GET_CODE (op) == CONST_VECTOR
+	  && GET_CODE (args[2]) == CONST_VECTOR
+	  && rtx_equal_p (simplify_const_unary_operation (NOT, GET_MODE (op),
+							  op, GET_MODE (op)),
+			  args[2]))
+	return 0x55;
+      return -1;
+
+    case SUBREG:
+      if (GET_MODE_SIZE (GET_MODE (SUBREG_REG (op)))
+	  != GET_MODE_SIZE (GET_MODE (op)))
+	return -1;
+      return ix86_ternlog_idx (SUBREG_REG (op), args);
+
+    case NOT:
+      idx0 = ix86_ternlog_idx (XEXP (op, 0), args);
+      return (idx0 >= 0) ? idx0 ^ 0xff : -1;
+
+    case AND:
+      idx0 = ix86_ternlog_idx (XEXP (op, 0), args);
+      if (idx0 < 0)
+	return -1;
+      idx1 = ix86_ternlog_idx (XEXP (op, 1), args);
+      return (idx1 >= 0) ? idx0 & idx1 : -1;
+
+    case IOR:
+      idx0 = ix86_ternlog_idx (XEXP (op, 0), args);
+      if (idx0 < 0)
+	return -1;
+      idx1 = ix86_ternlog_idx (XEXP (op, 1), args);
+      return (idx1 >= 0) ? idx0 | idx1 : -1;
+
+    case XOR:
+      idx0 = ix86_ternlog_idx (XEXP (op, 0), args);
+      if (idx0 < 0)
+	return -1;
+      if (vector_all_ones_operand (XEXP (op, 1), GET_MODE (op)))
+	return idx0 ^ 0xff;
+      idx1 = ix86_ternlog_idx (XEXP (op, 1), args);
+      return (idx1 >= 0) ? idx0 ^ idx1 : -1;
+
+    case UNSPEC:
+      if (XINT (op, 1) != UNSPEC_VTERNLOG
+	  || XVECLEN (op, 0) != 4
+	  || !CONST_INT_P (XVECEXP (op, 0, 3)))
+	return -1;
+
+      /* TODO: Handle permuted operands.  */
+      if (ix86_ternlog_idx (XVECEXP (op, 0, 0), args) != 0xf0
+	  || ix86_ternlog_idx (XVECEXP (op, 0, 1), args) != 0xcc
+	  || ix86_ternlog_idx (XVECEXP (op, 0, 2), args) != 0xaa)
+	return -1;
+      return INTVAL (XVECEXP (op, 0, 3));
+
+    default:
+      return -1;
+    }
+}
+
+/* Return TRUE if OP (in mode MODE) is the leaf of a ternary logic
+   expression, such as a register or a memory reference.  */
+ 
+bool
+ix86_ternlog_leaf_p (rtx op, machine_mode mode)
+{
+  /* We can't use memory_operand here, as it may return a different
+     value before and after reload (for volatile MEMs) which creates
+     problems splitting instructions.  */
+  return register_operand (op, mode)
+	 || MEM_P (op)
+	 || GET_CODE (op) == CONST_VECTOR
+	 || bcst_mem_operand (op, mode);
+}
+
+/* Test whether OP is a 3-operand ternary logic expression suitable
+   for use in a ternlog instruction.  */
+
+bool
+ix86_ternlog_operand_p (rtx op)
+{
+  rtx op0, op1;
+  rtx args[3];
+
+  args[0] = NULL_RTX;
+  args[1] = NULL_RTX;
+  args[2] = NULL_RTX;
+  int idx = ix86_ternlog_idx (op, args);
+  if (idx < 0)
+    return false;
+
+  /* Don't match simple (binary or unary) expressions.  */
+  machine_mode mode = GET_MODE (op);
+  switch (GET_CODE (op))
+    {
+    case AND:
+      op0 = XEXP (op, 0);
+      op1 = XEXP (op, 1);
+
+      /* Prefer pand.  */
+      if (ix86_ternlog_leaf_p (op0, mode)
+	  && ix86_ternlog_leaf_p (op1, mode))
+	return false;
+      /* Prefer pandn.  */
+      if (GET_CODE (op0) == NOT
+	  && register_operand (XEXP (op0, 0), mode)
+	  && ix86_ternlog_leaf_p (op1, mode))
+	return false;
+      break;
+
+    case IOR:
+      /* Prefer por.  */
+      if (ix86_ternlog_leaf_p (XEXP (op, 0), mode)
+	  && ix86_ternlog_leaf_p (XEXP (op, 1), mode))
+	return false;
+      break;
+
+    case XOR:
+      op1 = XEXP (op, 1);
+      /* Prefer pxor, or one_cmpl<vmode>2.  */
+      if (ix86_ternlog_leaf_p (XEXP (op, 0), mode)
+	  && (ix86_ternlog_leaf_p (op1, mode)
+	      || vector_all_ones_operand (op1, mode)))
+	return false;
+      break;
+
+    default:
+      break;
+    }
+  return true;
+}
+
+/* Helper function for ix86_expand_ternlog.  */
+static rtx
+ix86_expand_ternlog_binop (enum rtx_code code, machine_mode mode,
+			   rtx op0, rtx op1, rtx target)
+{
+  if (GET_MODE (op0) != mode)
+    op0 = gen_lowpart (mode, op0);
+  if (GET_MODE (op1) != mode)
+    op1 = gen_lowpart (mode, op1);
+
+  if (GET_CODE (op0) == CONST_VECTOR)
+    op0 = validize_mem (force_const_mem (mode, op0));
+  if (GET_CODE (op1) == CONST_VECTOR)
+    op1 = validize_mem (force_const_mem (mode, op1));
+
+  if (memory_operand (op0, mode))
+    {
+      if (memory_operand (op1, mode))
+	op0 = force_reg (mode, op0);
+      else
+	std::swap (op0, op1);
+    }
+  rtx ops[3] = { target, op0, op1 };
+  ix86_expand_vector_logical_operator (code, mode, ops);
+  return target;
+}
+
+
+/* Helper function for ix86_expand_ternlog.  */
+static rtx
+ix86_expand_ternlog_andnot (machine_mode mode, rtx op0, rtx op1, rtx target)
+{
+  if (GET_MODE (op0) != mode)
+    op0 = gen_lowpart (mode, op0);
+  op0 = gen_rtx_NOT (mode, op0);
+  if (GET_MODE (op1) != mode)
+    op1 = gen_lowpart (mode, op1);
+  emit_move_insn (target, gen_rtx_AND (mode, op0, op1));
+  return target;
+}
+
+/* Expand a 3-operand ternary logic expression.  Return TARGET. */
+rtx
+ix86_expand_ternlog (machine_mode mode, rtx op0, rtx op1, rtx op2, int idx,
+		     rtx target)
+{
+  rtx tmp0, tmp1, tmp2;
+
+  if (!target)
+    target = gen_reg_rtx (mode);
+
+  /* Canonicalize ternlog index for degenerate (duplicated) operands.  */
+  if (rtx_equal_p (op0, op1) && rtx_equal_p (op0, op2))
+    switch (idx & 0x81)
+      {
+      case 0x00:
+	idx = 0x00;
+	break;
+      case 0x01:
+	idx = 0x0f;
+	break;
+      case 0x80:
+	idx = 0xf0;
+	break;
+      case 0x81:
+	idx = 0xff;
+	break;
+      }
+
+  switch (idx & 0xff)
+    {
+    case 0x00:
+      if ((!op0 || !side_effects_p (op0))
+          && (!op1 || !side_effects_p (op1))
+          && (!op2 || !side_effects_p (op2)))
+        {
+	  emit_move_insn (target, CONST0_RTX (mode));
+	  return target;
+	}
+      break;
+
+    case 0x0a: /* ~a&c */
+      if ((!op1 || !side_effects_p (op1))
+	  && op0 && register_operand (op0, mode)
+	  && op2 && register_operand (op2, mode))
+	return ix86_expand_ternlog_andnot (mode, op0, op2, target);
+      break;
+
+    case 0x0c: /* ~a&b */
+      if ((!op2 || !side_effects_p (op2))
+	  && op0 && register_operand (op0, mode)
+	  && op1 && register_operand (op1, mode))
+	return ix86_expand_ternlog_andnot (mode, op0, op1, target);
+      break;
+
+    case 0x0f:  /* ~a */
+      if ((!op1 || !side_effects_p (op1))
+	  && (!op2 || !side_effects_p (op2))
+          && op0)
+	{
+	  if (GET_MODE (op0) != mode)
+	    op0 = gen_lowpart (mode, op0);
+	  if (!TARGET_64BIT && !register_operand (op0, mode))
+	    op0 = force_reg (mode, op0);
+	  emit_move_insn (target, gen_rtx_XOR (mode, op0, CONSTM1_RTX (mode)));
+	  return target;
+	}
+      break;
+  
+    case 0x22: /* ~b&c */
+      if ((!op0 || !side_effects_p (op0))
+	  && op1 && register_operand (op1, mode)
+	  && op2 && register_operand (op2, mode))
+	return ix86_expand_ternlog_andnot (mode, op1, op2, target);
+      break;
+
+    case 0x30: /* ~b&a */
+      if ((!op2 || !side_effects_p (op2))
+	  && op0 && register_operand (op0, mode)
+	  && op1 && register_operand (op1, mode))
+	return ix86_expand_ternlog_andnot (mode, op1, op0, target);
+      break;
+
+    case 0x33:  /* ~b */
+      if ((!op0 || !side_effects_p (op0))
+	  && (!op2 || !side_effects_p (op2))
+          && op1)
+	{
+	  if (GET_MODE (op1) != mode)
+	    op1 = gen_lowpart (mode, op1);
+	  if (!TARGET_64BIT && !register_operand (op1, mode))
+	    op1 = force_reg (mode, op1);
+	  emit_move_insn (target, gen_rtx_XOR (mode, op1, CONSTM1_RTX (mode)));
+	  return target;
+	}
+      break;
+
+    case 0x3c:  /* a^b */
+      if (op0 && op1
+          && (!op2 || !side_effects_p (op2)))
+	return ix86_expand_ternlog_binop (XOR, mode, op0, op1, target);
+      break;
+
+    case 0x44: /* ~c&b */
+      if ((!op0 || !side_effects_p (op0))
+	  && op1 && register_operand (op1, mode)
+	  && op2 && register_operand (op2, mode))
+	return ix86_expand_ternlog_andnot (mode, op2, op1, target);
+      break;
+
+    case 0x50: /* ~c&a */
+      if ((!op1 || !side_effects_p (op1))
+	  && op0 && register_operand (op0, mode)
+	  && op2 && register_operand (op2, mode))
+	return ix86_expand_ternlog_andnot (mode, op2, op0, target);
+      break;
+
+    case 0x55:  /* ~c */
+      if ((!op0 || !side_effects_p (op0))
+	  && (!op1 || !side_effects_p (op1))
+          && op2)
+	{
+	  if (GET_MODE (op2) != mode)
+	    op2 = gen_lowpart (mode, op2);
+	  if (!TARGET_64BIT && !register_operand (op2, mode))
+	    op2 = force_reg (mode, op2);
+	  emit_move_insn (target, gen_rtx_XOR (mode, op2, CONSTM1_RTX (mode)));
+	  return target;
+	}
+      break;
+  
+    case 0x5a:  /* a^c */
+      if (op0 && op2
+          && (!op1 || !side_effects_p (op1)))
+	return ix86_expand_ternlog_binop (XOR, mode, op0, op2, target);
+      break;
+
+    case 0x66:  /* b^c */
+      if ((!op0 || !side_effects_p (op0))
+          && op1 && op2)
+	return ix86_expand_ternlog_binop (XOR, mode, op1, op2, target);
+      break;
+
+    case 0x88:  /* b&c */
+      if ((!op0 || !side_effects_p (op0))
+          && op1 && op2)
+	return ix86_expand_ternlog_binop (AND, mode, op1, op2, target);
+      break;
+
+    case 0xa0:  /* a&c */
+      if ((!op1 || !side_effects_p (op1))
+          && op0 && op2)
+	return ix86_expand_ternlog_binop (AND, mode, op0, op2, target);
+      break;
+
+    case 0xaa:  /* c */
+      if ((!op0 || !side_effects_p (op0))
+	  && (!op1 || !side_effects_p (op1))
+          && op2)
+	{
+	  if (GET_MODE (op2) != mode)
+	    op2 = gen_lowpart (mode, op2);
+	  emit_move_insn (target, op2);
+	  return target;
+	}
+      break;
+
+    case 0xc0:  /* a&b */
+      if (op0 && op1
+          && (!op2 || !side_effects_p (op2)))
+	return ix86_expand_ternlog_binop (AND, mode, op0, op1, target);
+      break;
+
+    case 0xcc:  /* b */
+      if ((!op0 || !side_effects_p (op0))
+          && op1
+	  && (!op2 || !side_effects_p (op2)))
+	{
+	  if (GET_MODE (op1) != mode)
+	    op1 = gen_lowpart (mode, op1);
+	  emit_move_insn (target, op1);
+	  return target;
+	}
+      break;
+
+    case 0xee:  /* b|c */
+      if ((!op0 || !side_effects_p (op0))
+          && op1 && op2)
+	return ix86_expand_ternlog_binop (IOR, mode, op1, op2, target);
+      break;
+
+    case 0xf0:  /* a */
+      if (op0
+          && (!op1 || !side_effects_p (op1))
+	  && (!op2 || !side_effects_p (op2)))
+	{
+	  if (GET_MODE (op0) != mode)
+	    op0 = gen_lowpart (mode, op0);
+	  emit_move_insn (target, op0);
+	  return target;
+	}
+      break;
+
+    case 0xfa:  /* a|c */
+      if (op0 && op2
+          && (!op1 || !side_effects_p (op1)))
+	return ix86_expand_ternlog_binop (IOR, mode, op0, op2, target);
+      break;
+
+    case 0xfc:  /* a|b */
+      if (op0 && op1
+          && (!op2 || !side_effects_p (op2)))
+	return ix86_expand_ternlog_binop (IOR, mode, op0, op1, target);
+      break;
+
+    case 0xff:
+      if ((!op0 || !side_effects_p (op0))
+          && (!op1 || !side_effects_p (op1))
+          && (!op2 || !side_effects_p (op2)))
+        {
+          emit_move_insn (target, CONSTM1_RTX (mode));
+	  return target;
+	}
+      break;
+    }
+
+  tmp0 = register_operand (op0, mode) ? op0 : force_reg (mode, op0);
+  if (GET_MODE (tmp0) != mode)
+    tmp0 = gen_lowpart (mode, tmp0);
+
+  if (!op1 || rtx_equal_p (op0, op1))
+    tmp1 = copy_rtx (tmp0);
+  else if (!register_operand (op1, mode))
+    tmp1 = force_reg (mode, op1);
+  else
+    tmp1 = op1;
+  if (GET_MODE (tmp1) != mode)
+    tmp1 = gen_lowpart (mode, tmp1);
+
+  if (!op2 || rtx_equal_p (op0, op2))
+    tmp2 = copy_rtx (tmp0);
+  else if (rtx_equal_p (op1, op2))
+    tmp2 = copy_rtx (tmp1);
+  else if (GET_CODE (op2) == CONST_VECTOR)
+    {
+      if (GET_MODE (op2) != mode)
+	op2 = gen_lowpart (mode, op2);
+      tmp2 = ix86_gen_bcst_mem (mode, op2);
+      if (!tmp2)
+	{
+	  tmp2 = validize_mem (force_const_mem (mode, op2));
+	  rtx bcast = ix86_broadcast_from_constant (mode, tmp2);
+	  if (bcast)
+	    {
+	      rtx reg2 = gen_reg_rtx (mode);
+	      bool ok = ix86_expand_vector_init_duplicate (false, mode,
+							   reg2, bcast);
+	      if (ok)
+		tmp2 = reg2;
+	    }
+	}
+    }
+  else
+    tmp2 = op2;
+  if (GET_MODE (tmp2) != mode)
+    tmp2 = gen_lowpart (mode, tmp2);
+  /* Some memory_operands are not vector_memory_operands.  */
+  if (!bcst_vector_operand (tmp2, mode))
+    tmp2 = force_reg (mode, tmp2);
+
+  rtvec vec = gen_rtvec (4, tmp0, tmp1, tmp2, GEN_INT (idx));
+  emit_move_insn (target, gen_rtx_UNSPEC (mode, vec, UNSPEC_VTERNLOG));
+  return target;
+}
+
 #include "gt-i386-expand.h"
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 46214a6..9a3e183 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -245,6 +245,11 @@  extern rtx ix86_expand_fast_convert_bf_to_sf (rtx);
 extern rtx ix86_memtag_untagged_pointer (rtx, rtx);
 extern bool ix86_memtag_can_tag_addresses (void);
 
+extern int ix86_ternlog_idx (rtx op, rtx *args);
+extern bool ix86_ternlog_operand_p (rtx op);
+extern rtx ix86_expand_ternlog (machine_mode mode, rtx op0, rtx op1, rtx op2,
+				int idx, rtx target);
+
 #ifdef TREE_CODE
 extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree, int);
 #endif	/* TREE_CODE  */
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 2a97776..7afe310 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -1098,6 +1098,11 @@ 
        (and (match_code "not")
 	    (match_test "nonimmediate_operand (XEXP (op, 0), mode)"))))
 
+;; True for expressions valid for 3-operand ternlog instructions.
+(define_predicate "ternlog_operand"
+  (and (match_code "not,and,ior,xor")
+       (match_test "ix86_ternlog_operand_p (op)")))
+
 ;; True if OP is acceptable as operand of DImode shift expander.
 (define_predicate "shiftdi_operand"
   (if_then_else (match_test "TARGET_64BIT")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 1bf5072..3148651 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -12940,6 +12940,26 @@ 
 ;;
 ;; and so on.
 
+(define_insn_and_split "*<avx512>_vpternlog<mode>_0"
+  [(set (match_operand:V 0 "register_operand")
+	(match_operand:V 1 "ternlog_operand"))]
+  "(<MODE_SIZE> == 64 || TARGET_AVX512VL
+    || (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256))
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  rtx args[3];
+  args[0] = NULL_RTX;
+  args[1] = NULL_RTX;
+  args[2] = NULL_RTX;
+  int idx = ix86_ternlog_idx (operands[1], args);
+  ix86_expand_ternlog (<MODE>mode, args[0], args[1], args[2], idx,
+		       operands[0]);
+  DONE;
+})
+
 (define_code_iterator any_logic1 [and ior xor])
 (define_code_iterator any_logic2 [and ior xor])
 (define_code_attr logic_op [(and "&") (ior "|") (xor "^")])
@@ -13160,7 +13180,33 @@ 
 })
 
 
-(define_insn "<avx512>_vternlog<mode>_mask"
+(define_expand "<avx512>_vternlog<mode>_mask"
+  [(set (match_operand:VI48_AVX512VL 0 "register_operand")
+	(vec_merge:VI48_AVX512VL
+	  (unspec:VI48_AVX512VL
+	    [(match_operand:VI48_AVX512VL 1 "register_operand")
+	     (match_operand:VI48_AVX512VL 2 "register_operand")
+	     (match_operand:VI48_AVX512VL 3 "bcst_vector_operand")
+	     (match_operand:SI 4 "const_0_to_255_operand")]
+	    UNSPEC_VTERNLOG)
+	  (match_dup 1)
+	  (match_operand:<avx512fmaskmode> 5 "general_operand")))]
+  "TARGET_AVX512F"
+{
+  unsigned HOST_WIDE_INT mode_mask = GET_MODE_MASK (<avx512fmaskmode>mode);
+  if (CONST_INT_P (operands[5])
+      && (UINTVAL (operands[5]) & mode_mask) == mode_mask)
+    {
+      ix86_expand_ternlog (<MODE>mode, operands[1], operands[2],
+			   operands[3], INTVAL (operands[4]),
+			   operands[0]);
+      DONE;
+    }
+  if (!register_operand (operands[5], <avx512fmaskmode>mode))
+    operands[5] = force_reg (<avx512fmaskmode>mode, operands[5]);
+})
+
+(define_insn "*<avx512>_vternlog<mode>_mask"
   [(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v")
 	(vec_merge:VI48_AVX512VL
 	  (unspec:VI48_AVX512VL
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-andn-di-zmm-2.c b/gcc/testsuite/gcc.target/i386/avx512f-andn-di-zmm-2.c
index 4ebb30f..24f3d6c 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-andn-di-zmm-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-andn-di-zmm-2.c
@@ -1,6 +1,6 @@ 
 /* { dg-do compile } */
 /* { dg-options "-mavx512f -mno-avx512vl -mprefer-vector-width=512 -O2" } */
-/* { dg-final { scan-assembler-times "vpternlogq\[ \\t\]+\\\$0x44, \\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
+/* { dg-final { scan-assembler-times "vpternlogq\[ \\t\]+\\\$80, \\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
 /* { dg-final { scan-assembler-not "vpbroadcast" } } */
 
 #define type __m512i
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-andn-si-zmm-2.c b/gcc/testsuite/gcc.target/i386/avx512f-andn-si-zmm-2.c
index 86e7ebe..1f5e72d 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-andn-si-zmm-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-andn-si-zmm-2.c
@@ -1,6 +1,6 @@ 
 /* { dg-do compile } */
 /* { dg-options "-mavx512f -O2" } */
-/* { dg-final { scan-assembler-times "vpternlogd\[ \\t\]+\\\$0x44, \\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
+/* { dg-final { scan-assembler-times "vpternlogd\[ \\t\]+\\\$80, \\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
 /* { dg-final { scan-assembler-not "vpbroadcast" } } */
 
 #define type __m512i
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-orn-si-zmm-1.c b/gcc/testsuite/gcc.target/i386/avx512f-orn-si-zmm-1.c
index 7d02f03..d21f48f 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-orn-si-zmm-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-orn-si-zmm-1.c
@@ -1,6 +1,6 @@ 
 /* { dg-do compile } */
 /* { dg-options "-mavx512f -mno-avx512vl -mprefer-vector-width=512 -O2" } */
-/* { dg-final { scan-assembler-times "vpternlogd\[ \\t\]+\\\$0xdd, \\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
+/* { dg-final { scan-assembler-times "vpternlogd\[ \\t\]+\\\$245, \\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
 /* { dg-final { scan-assembler-not "vpbroadcast" } } */
 
 #define type __m512i
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-orn-si-zmm-2.c b/gcc/testsuite/gcc.target/i386/avx512f-orn-si-zmm-2.c
index c793083..5359200 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-orn-si-zmm-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-orn-si-zmm-2.c
@@ -1,6 +1,6 @@ 
 /* { dg-do compile } */
 /* { dg-options "-mavx512f -mno-avx512vl -mprefer-vector-width=512 -O2" } */
-/* { dg-final { scan-assembler-times "vpternlogd\[ \\t\]+\\\$0xbb, \\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
+/* { dg-final { scan-assembler-times "vpternlogd\[ \\t\]+\\\$175, \\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
 /* { dg-final { scan-assembler-not "vpbroadcast" } } */
 
 #define type __m512i
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vpternlogd-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vpternlogd-1.c
index a88153a..b098487 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-vpternlogd-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vpternlogd-1.c
@@ -1,6 +1,5 @@ 
 /* { dg-do compile } */
 /* { dg-options "-mavx512f -O2" } */
-/* { dg-final { scan-assembler-times "vpternlogd\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
 /* { dg-final { scan-assembler-times "vpternlogd\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "vpternlogd\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
 
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vpternlogq-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vpternlogq-1.c
index ef30246..8e5d22f 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-vpternlogq-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vpternlogq-1.c
@@ -1,6 +1,5 @@ 
 /* { dg-do compile } */
 /* { dg-options "-mavx512f -O2" } */
-/* { dg-final { scan-assembler-times "vpternlogq\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
 /* { dg-final { scan-assembler-times "vpternlogq\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "vpternlogq\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
 
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vpternlogd-1.c b/gcc/testsuite/gcc.target/i386/avx512vl-vpternlogd-1.c
index 045a266..dd53563 100644
--- a/gcc/testsuite/gcc.target/i386/avx512vl-vpternlogd-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-vpternlogd-1.c
@@ -1,7 +1,5 @@ 
 /* { dg-do compile } */
 /* { dg-options "-mavx512vl -O2" } */
-/* { dg-final { scan-assembler-times "vpternlogd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
-/* { dg-final { scan-assembler-times "vpternlogd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
 /* { dg-final { scan-assembler-times "vpternlogd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "vpternlogd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "vpternlogd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vpternlogq-1.c b/gcc/testsuite/gcc.target/i386/avx512vl-vpternlogq-1.c
index 3a6707c..31fec3e 100644
--- a/gcc/testsuite/gcc.target/i386/avx512vl-vpternlogq-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-vpternlogq-1.c
@@ -1,7 +1,5 @@ 
 /* { dg-do compile } */
 /* { dg-options "-mavx512vl -O2" } */
-/* { dg-final { scan-assembler-times "vpternlogq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
-/* { dg-final { scan-assembler-times "vpternlogq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
 /* { dg-final { scan-assembler-times "vpternlogq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "vpternlogq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "vpternlogq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr100711-3.c b/gcc/testsuite/gcc.target/i386/pr100711-3.c
index 98cc1c3..ea60190 100644
--- a/gcc/testsuite/gcc.target/i386/pr100711-3.c
+++ b/gcc/testsuite/gcc.target/i386/pr100711-3.c
@@ -39,4 +39,4 @@  v8di foo_v8di (long long a, v8di b)
 
 /* { dg-final { scan-assembler-times "vpandn" 4 { target { ! ia32 } } } } */
 /* { dg-final { scan-assembler-times "vpandn" 2 { target { ia32 } } } } */
-/* { dg-final { scan-assembler-times "vpternlog\[dq\]\[ \\t\]+\\\$0x44" 2 { target { ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpternlog\[dq\]\[ \\t\]+\\\$80" 2 { target { ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr100711-4.c b/gcc/testsuite/gcc.target/i386/pr100711-4.c
index 3ca524f..a33f0a1 100644
--- a/gcc/testsuite/gcc.target/i386/pr100711-4.c
+++ b/gcc/testsuite/gcc.target/i386/pr100711-4.c
@@ -37,6 +37,6 @@  v8di foo_v8di (long long a, v8di b)
     return (__extension__ (v8di) {~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a}) | b;
 }
 
-/* { dg-final { scan-assembler-times "vpternlog\[dq\]\[ \\t\]+\\\$0xbb" 4 { target { ! ia32 } } } } */
-/* { dg-final { scan-assembler-times "vpternlog\[dq\]\[ \\t\]+\\\$0xbb" 2 { target { ia32 } } } } */
-/* { dg-final { scan-assembler-times "vpternlog\[dq\]\[ \\t\]+\\\$0xdd" 2 { target { ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpternlog\[dq\]\[ \\t\]+\\\$207" 4 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpternlog\[dq\]\[ \\t\]+\\\$207" 2 { target { ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpternlog\[dq\]\[ \\t\]+\\\$245" 2 { target { ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr100711-5.c b/gcc/testsuite/gcc.target/i386/pr100711-5.c
index 161fbfc..99cafc1 100644
--- a/gcc/testsuite/gcc.target/i386/pr100711-5.c
+++ b/gcc/testsuite/gcc.target/i386/pr100711-5.c
@@ -37,4 +37,4 @@  v8di foo_v8di (long long a, v8di b)
     return (__extension__ (v8di) {~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a}) ^ b;
 }
 
-/* { dg-final { scan-assembler-times "vpternlog\[dq\]\[ \\t\]+\\\$0x99" 4 } } */
+/* { dg-final { scan-assembler-times "vpternlog\[dq\]\[ \\t\]+\\\$1\[69\]5" 4 } } */