
i386: Optimize away shift count masking of shifts/rotates some more [PR105778]

Message ID YphuFT+7216wgG2m@tucnak
State New
Series i386: Optimize away shift count masking of shifts/rotates some more [PR105778]

Commit Message

Jakub Jelinek June 2, 2022, 8 a.m. UTC
Hi!

As the following testcase shows, our x86 backend support for optimizing
out useless masking of shift/rotate counts when using instructions
that naturally apply the modulo to the count themselves is insufficient.
The *_mask define_insn_and_split patterns use
(subreg:QI (and:SI (match_operand:SI) (match_operand "const_int_operand")))
for the masking, but that can catch only the case where the masking
is done in SImode, so typically in SImode in the source.
We then have another set of patterns, *_mask_1, which use
(and:QI (match_operand:QI) (match_operand "const_int_operand"))
If the masking is done in DImode or in theory in HImode, we don't match
it.
The following patch does 4 different things to improve this:
1) drops the mode from AND and MATCH_OPERAND inside of the subreg:QI
   and replaces that by checking that the register shift count has
   SWI48 mode - I think doing it this way is cheaper than adding
   another mode iterator to patterns which use already another mode
   iterator and sometimes a code iterator as well
2) the doubleword shift patterns were only handling the case where
   the shift count is masked with a constant that has the most significant
   bit clear, i.e. where we know the shift count is less than half the
   number of bits in double-word.  If the mask is equal to half the
   number of bits in double-word minus 1, the masking was optimized
   away, otherwise the AND was kept.
   But if the most significant bit isn't clear, we use a word-sized shift
   and the SHRD instruction, where the former performs the modulo itself
   and the latter performs it modulo 64 / 32 depending on what mode the
   CPU is in (so 64 for a 128-bit doubleword and 32 for a 64-bit
   doubleword).  So we can also optimize away the masking when the mask
   has all the relevant bits set; masking with the most significant bit
   set will remain for the cmove test.
3) as requested, this patch adds a bunch of force_reg calls before
   gen_lowpart
4) 1-3 above unfortunately regressed
   +FAIL: gcc.target/i386/bt-mask-2.c scan-assembler-not and[lq][ \\t]
   +FAIL: gcc.target/i386/pr57819.c scan-assembler-not and[lq][ \\t]
   where during combine we match the new pattern we didn't match
   before and in the end don't match the pattern the tests were looking
   for.  These 2 tests are fixed by the addition of the
   *jcc_bt<mode>_mask_1 pattern and a small tweak to the target
   rtx_costs, because even with the pattern around we'd refuse to match
   it, as it appeared to have a higher instruction cost.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-06-02  Jakub Jelinek  <jakub@redhat.com>

	PR target/105778
	* config/i386/i386.md (*ashl<dwi>3_doubleword_mask): Remove :SI
	from AND and its operands and just verify operands[2] has HImode,
	SImode or for TARGET_64BIT DImode.  Allow operands[3] to be a mask
	with all low 6 (64-bit) or 5 (32-bit) bits set and in that case
	just throw away the masking.  Use force_reg before calling
	gen_lowpart.
	(*ashl<dwi>3_doubleword_mask_1): Allow operands[3] to be a mask
	with all low 6 (64-bit) or 5 (32-bit) bits set and in that case
	just throw away the masking.
	(*ashl<mode>3_doubleword): Rename to ...
	(ashl<mode>3_doubleword): ... this.
	(*ashl<mode>3_mask): Remove :SI from AND and its operands and just
	verify operands[2] has HImode, SImode or for TARGET_64BIT DImode.
	Use force_reg before calling gen_lowpart.
	(*<insn><mode>3_mask): Likewise.
	(*<insn><dwi>3_doubleword_mask): Likewise.  Allow operands[3] to be
	a mask with all low 6 (64-bit) or 5 (32-bit) bits set and in that
	case just throw away the masking.  Use force_reg before calling
	gen_lowpart.
	(*<insn><dwi>3_doubleword_mask_1): Allow operands[3] to be a mask
	with all low 6 (64-bit) or 5 (32-bit) bits set and in that case just
	throw away the masking.
	(*<insn><mode>3_doubleword): Rename to ...
	(<insn><mode>3_doubleword): ... this.
	(*<insn><mode>3_mask): Remove :SI from AND and its operands and just
	verify operands[2] has HImode, SImode or for TARGET_64BIT DImode.
	Use force_reg before calling gen_lowpart.
	(splitter after it): Remove :SI from AND and its operands and just
	verify operands[2] has HImode, SImode or for TARGET_64BIT DImode.
	(*<btsc><mode>_mask, *btr<mode>_mask): Remove :SI from AND and its
	operands and just verify operands[1] has HImode, SImode or for
	TARGET_64BIT DImode.  Use force_reg before calling gen_lowpart.
	(*jcc_bt<mode>_mask_1): New define_insn_and_split pattern.
	* config/i386/i386.cc (ix86_rtx_costs): For ZERO_EXTRACT with
	ZERO_EXTEND QI->SI in last operand ignore the cost of the ZERO_EXTEND.

	* gcc.target/i386/pr105778.c: New test.


	Jakub

Comments

Uros Bizjak June 2, 2022, 8:34 a.m. UTC | #1
On Thu, Jun 2, 2022 at 10:00 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> Hi!
>
> As the following testcase shows, our x86 backend support for optimizing
> out useless masking of shift/rotate counts when using instructions
> that naturally apply the modulo to the count themselves is insufficient.
> The *_mask define_insn_and_split patterns use
> (subreg:QI (and:SI (match_operand:SI) (match_operand "const_int_operand")))
> for the masking, but that can catch only the case where the masking
> is done in SImode, so typically in SImode in the source.
> We then have another set of patterns, *_mask_1, which use
> (and:QI (match_operand:QI) (match_operand "const_int_operand"))
> If the masking is done in DImode or in theory in HImode, we don't match
> it.
> The following patch does 4 different things to improve this:
> 1) drops the mode from AND and MATCH_OPERAND inside of the subreg:QI
>    and replaces that by checking that the register shift count has
>    SWI48 mode - I think doing it this way is cheaper than adding
>    another mode iterator to patterns which use already another mode
>    iterator and sometimes a code iterator as well
> 2) the doubleword shift patterns were only handling the case where
>    the shift count is masked with a constant that has the most significant
>    bit clear, i.e. where we know the shift count is less than half the
>    number of bits in double-word.  If the mask is equal to half the
>    number of bits in double-word minus 1, the masking was optimized
>    away, otherwise the AND was kept.
>    But if the most significant bit isn't clear, we use a word-sized shift
>    and the SHRD instruction, where the former performs the modulo itself
>    and the latter performs it modulo 64 / 32 depending on what mode the
>    CPU is in (so 64 for a 128-bit doubleword and 32 for a 64-bit
>    doubleword).  So we can also optimize away the masking when the mask
>    has all the relevant bits set; masking with the most significant bit
>    set will remain for the cmove test.
> 3) as requested, this patch adds a bunch of force_reg calls before
>    gen_lowpart
> 4) 1-3 above unfortunately regressed
>    +FAIL: gcc.target/i386/bt-mask-2.c scan-assembler-not and[lq][ \\t]
>    +FAIL: gcc.target/i386/pr57819.c scan-assembler-not and[lq][ \\t]
>    where during combine we match the new pattern we didn't match
>    before and in the end don't match the pattern the tests were looking
>    for.  These 2 tests are fixed by the addition of the
>    *jcc_bt<mode>_mask_1 pattern and a small tweak to the target
>    rtx_costs, because even with the pattern around we'd refuse to match
>    it, as it appeared to have a higher instruction cost.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2022-06-02  Jakub Jelinek  <jakub@redhat.com>
>
>         PR target/105778
>         * config/i386/i386.md (*ashl<dwi>3_doubleword_mask): Remove :SI
>         from AND and its operands and just verify operands[2] has HImode,
>         SImode or for TARGET_64BIT DImode.  Allow operands[3] to be a mask
>         with all low 6 (64-bit) or 5 (32-bit) bits set and in that case
>         just throw away the masking.  Use force_reg before calling
>         gen_lowpart.
>         (*ashl<dwi>3_doubleword_mask_1): Allow operands[3] to be a mask
>         with all low 6 (64-bit) or 5 (32-bit) bits set and in that case
>         just throw away the masking.
>         (*ashl<mode>3_doubleword): Rename to ...
>         (ashl<mode>3_doubleword): ... this.
>         (*ashl<mode>3_mask): Remove :SI from AND and its operands and just
>         verify operands[2] has HImode, SImode or for TARGET_64BIT DImode.
>         Use force_reg before calling gen_lowpart.
>         (*<insn><mode>3_mask): Likewise.
>         (*<insn><dwi>3_doubleword_mask): Likewise.  Allow operands[3] to be
>         a mask with all low 6 (64-bit) or 5 (32-bit) bits set and in that
>         case just throw away the masking.  Use force_reg before calling
>         gen_lowpart.
>         (*<insn><dwi>3_doubleword_mask_1): Allow operands[3] to be a mask
>         with all low 6 (64-bit) or 5 (32-bit) bits set and in that case just
>         throw away the masking.
>         (*<insn><mode>3_doubleword): Rename to ...
>         (<insn><mode>3_doubleword): ... this.
>         (*<insn><mode>3_mask): Remove :SI from AND and its operands and just
>         verify operands[2] has HImode, SImode or for TARGET_64BIT DImode.
>         Use force_reg before calling gen_lowpart.
>         (splitter after it): Remove :SI from AND and its operands and just
>         verify operands[2] has HImode, SImode or for TARGET_64BIT DImode.
>         (*<btsc><mode>_mask, *btr<mode>_mask): Remove :SI from AND and its
>         operands and just verify operands[1] has HImode, SImode or for
>         TARGET_64BIT DImode.  Use force_reg before calling gen_lowpart.
>         (*jcc_bt<mode>_mask_1): New define_insn_and_split pattern.
>         * config/i386/i386.cc (ix86_rtx_costs): For ZERO_EXTRACT with
>         ZERO_EXTEND QI->SI in last operand ignore the cost of the ZERO_EXTEND.
>
>         * gcc.target/i386/pr105778.c: New test.

OK.

Thanks,
Uros.

>
> --- gcc/config/i386/i386.md.jj  2022-05-31 11:33:51.457251607 +0200
> +++ gcc/config/i386/i386.md     2022-06-01 11:59:27.388631872 +0200
> @@ -11890,11 +11890,16 @@ (define_insn_and_split "*ashl<dwi>3_doub
>         (ashift:<DWI>
>           (match_operand:<DWI> 1 "register_operand")
>           (subreg:QI
> -           (and:SI
> -             (match_operand:SI 2 "register_operand" "c")
> -             (match_operand:SI 3 "const_int_operand")) 0)))
> -   (clobber (reg:CC FLAGS_REG))]
> -  "(INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) == 0
> +           (and
> +             (match_operand 2 "register_operand" "c")
> +             (match_operand 3 "const_int_operand")) 0)))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "((INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) == 0
> +    || ((INTVAL (operands[3]) & (2 * <MODE_SIZE> * BITS_PER_UNIT - 1))
> +        == (2 * <MODE_SIZE> * BITS_PER_UNIT - 1)))
> +   && GET_MODE_CLASS (GET_MODE (operands[2])) == MODE_INT
> +   && IN_RANGE (GET_MODE_SIZE (GET_MODE (operands[2])), 2,
> +               4 << (TARGET_64BIT ? 1 : 0))
>     && ix86_pre_reload_split ()"
>    "#"
>    "&& 1"
> @@ -11912,6 +11917,15 @@ (define_insn_and_split "*ashl<dwi>3_doub
>            (ashift:DWIH (match_dup 5) (match_dup 2)))
>        (clobber (reg:CC FLAGS_REG))])]
>  {
> +  if ((INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) != 0)
> +    {
> +      operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
> +      operands[2] = gen_lowpart (QImode, operands[2]);
> +      emit_insn (gen_ashl<dwi>3_doubleword (operands[0], operands[1],
> +                                           operands[2]));
> +      DONE;
> +    }
> +
>    split_double_mode (<DWI>mode, &operands[0], 2, &operands[4], &operands[6]);
>
>    operands[8] = GEN_INT (<MODE_SIZE> * BITS_PER_UNIT - 1);
> @@ -11925,6 +11939,7 @@ (define_insn_and_split "*ashl<dwi>3_doub
>        operands[2] = tem;
>      }
>
> +  operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
>    operands[2] = gen_lowpart (QImode, operands[2]);
>
>    if (!rtx_equal_p (operands[6], operands[7]))
> @@ -11939,7 +11954,9 @@ (define_insn_and_split "*ashl<dwi>3_doub
>             (match_operand:QI 2 "register_operand" "c")
>             (match_operand:QI 3 "const_int_operand"))))
>     (clobber (reg:CC FLAGS_REG))]
> -  "(INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) == 0
> +  "((INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) == 0
> +    || ((INTVAL (operands[3]) & (2 * <MODE_SIZE> * BITS_PER_UNIT - 1))
> +        == (2 * <MODE_SIZE> * BITS_PER_UNIT - 1)))
>     && ix86_pre_reload_split ()"
>    "#"
>    "&& 1"
> @@ -11957,6 +11974,13 @@ (define_insn_and_split "*ashl<dwi>3_doub
>            (ashift:DWIH (match_dup 5) (match_dup 2)))
>        (clobber (reg:CC FLAGS_REG))])]
>  {
> +  if ((INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) != 0)
> +    {
> +      emit_insn (gen_ashl<dwi>3_doubleword (operands[0], operands[1],
> +                                           operands[2]));
> +      DONE;
> +    }
> +
>    split_double_mode (<DWI>mode, &operands[0], 2, &operands[4], &operands[6]);
>
>    operands[8] = GEN_INT (<MODE_SIZE> * BITS_PER_UNIT - 1);
> @@ -11974,7 +11998,7 @@ (define_insn_and_split "*ashl<dwi>3_doub
>      emit_move_insn (operands[6], operands[7]);
>  })
>
> -(define_insn "*ashl<mode>3_doubleword"
> +(define_insn "ashl<mode>3_doubleword"
>    [(set (match_operand:DWI 0 "register_operand" "=&r")
>         (ashift:DWI (match_operand:DWI 1 "reg_or_pm1_operand" "0n")
>                     (match_operand:QI 2 "nonmemory_operand" "<S>c")))
> @@ -12186,13 +12210,16 @@ (define_insn_and_split "*ashl<mode>3_mas
>         (ashift:SWI48
>           (match_operand:SWI48 1 "nonimmediate_operand")
>           (subreg:QI
> -           (and:SI
> -             (match_operand:SI 2 "register_operand" "c,r")
> -             (match_operand:SI 3 "const_int_operand")) 0)))
> +           (and
> +             (match_operand 2 "register_operand" "c,r")
> +             (match_operand 3 "const_int_operand")) 0)))
>     (clobber (reg:CC FLAGS_REG))]
>    "ix86_binary_operator_ok (ASHIFT, <MODE>mode, operands)
>     && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1))
>        == GET_MODE_BITSIZE (<MODE>mode)-1
> +   && GET_MODE_CLASS (GET_MODE (operands[2])) == MODE_INT
> +   && IN_RANGE (GET_MODE_SIZE (GET_MODE (operands[2])), 2,
> +               4 << (TARGET_64BIT ? 1 : 0))
>     && ix86_pre_reload_split ()"
>    "#"
>    "&& 1"
> @@ -12201,7 +12228,10 @@ (define_insn_and_split "*ashl<mode>3_mas
>            (ashift:SWI48 (match_dup 1)
>                          (match_dup 2)))
>        (clobber (reg:CC FLAGS_REG))])]
> -  "operands[2] = gen_lowpart (QImode, operands[2]);"
> +{
> +  operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
> +  operands[2] = gen_lowpart (QImode, operands[2]);
> +}
>    [(set_attr "isa" "*,bmi2")])
>
>  (define_insn_and_split "*ashl<mode>3_mask_1"
> @@ -12774,13 +12804,16 @@ (define_insn_and_split "*<insn><mode>3_m
>         (any_shiftrt:SWI48
>           (match_operand:SWI48 1 "nonimmediate_operand")
>           (subreg:QI
> -           (and:SI
> -             (match_operand:SI 2 "register_operand" "c,r")
> -             (match_operand:SI 3 "const_int_operand")) 0)))
> +           (and
> +             (match_operand 2 "register_operand" "c,r")
> +             (match_operand 3 "const_int_operand")) 0)))
>     (clobber (reg:CC FLAGS_REG))]
>    "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)
>     && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1))
>        == GET_MODE_BITSIZE (<MODE>mode)-1
> +   && GET_MODE_CLASS (GET_MODE (operands[2])) == MODE_INT
> +   && IN_RANGE (GET_MODE_SIZE (GET_MODE (operands[2])), 2,
> +               4 << (TARGET_64BIT ? 1 : 0))
>     && ix86_pre_reload_split ()"
>    "#"
>    "&& 1"
> @@ -12789,7 +12822,10 @@ (define_insn_and_split "*<insn><mode>3_m
>            (any_shiftrt:SWI48 (match_dup 1)
>                               (match_dup 2)))
>        (clobber (reg:CC FLAGS_REG))])]
> -  "operands[2] = gen_lowpart (QImode, operands[2]);"
> +{
> +  operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
> +  operands[2] = gen_lowpart (QImode, operands[2]);
> +}
>    [(set_attr "isa" "*,bmi2")])
>
>  (define_insn_and_split "*<insn><mode>3_mask_1"
> @@ -12819,11 +12855,16 @@ (define_insn_and_split "*<insn><dwi>3_do
>         (any_shiftrt:<DWI>
>           (match_operand:<DWI> 1 "register_operand")
>           (subreg:QI
> -           (and:SI
> -             (match_operand:SI 2 "register_operand" "c")
> -             (match_operand:SI 3 "const_int_operand")) 0)))
> -   (clobber (reg:CC FLAGS_REG))]
> -  "(INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) == 0
> +           (and
> +             (match_operand 2 "register_operand" "c")
> +             (match_operand 3 "const_int_operand")) 0)))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "((INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) == 0
> +    || ((INTVAL (operands[3]) & (2 * <MODE_SIZE> * BITS_PER_UNIT - 1))
> +        == (2 * <MODE_SIZE> * BITS_PER_UNIT - 1)))
> +   && GET_MODE_CLASS (GET_MODE (operands[2])) == MODE_INT
> +   && IN_RANGE (GET_MODE_SIZE (GET_MODE (operands[2])), 2,
> +               4 << (TARGET_64BIT ? 1 : 0))
>     && ix86_pre_reload_split ()"
>    "#"
>    "&& 1"
> @@ -12841,6 +12882,15 @@ (define_insn_and_split "*<insn><dwi>3_do
>            (any_shiftrt:DWIH (match_dup 7) (match_dup 2)))
>        (clobber (reg:CC FLAGS_REG))])]
>  {
> +  if ((INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) != 0)
> +    {
> +      operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
> +      operands[2] = gen_lowpart (QImode, operands[2]);
> +      emit_insn (gen_<insn><dwi>3_doubleword (operands[0], operands[1],
> +                                             operands[2]));
> +      DONE;
> +    }
> +
>    split_double_mode (<DWI>mode, &operands[0], 2, &operands[4], &operands[6]);
>
>    operands[8] = GEN_INT (<MODE_SIZE> * BITS_PER_UNIT - 1);
> @@ -12854,6 +12904,7 @@ (define_insn_and_split "*<insn><dwi>3_do
>        operands[2] = tem;
>      }
>
> +  operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
>    operands[2] = gen_lowpart (QImode, operands[2]);
>
>    if (!rtx_equal_p (operands[4], operands[5]))
> @@ -12868,7 +12919,9 @@ (define_insn_and_split "*<insn><dwi>3_do
>             (match_operand:QI 2 "register_operand" "c")
>             (match_operand:QI 3 "const_int_operand"))))
>     (clobber (reg:CC FLAGS_REG))]
> -  "(INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) == 0
> +  "((INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) == 0
> +    || ((INTVAL (operands[3]) & (2 * <MODE_SIZE> * BITS_PER_UNIT - 1))
> +        == (2 * <MODE_SIZE> * BITS_PER_UNIT - 1)))
>     && ix86_pre_reload_split ()"
>    "#"
>    "&& 1"
> @@ -12886,6 +12939,13 @@ (define_insn_and_split "*<insn><dwi>3_do
>            (any_shiftrt:DWIH (match_dup 7) (match_dup 2)))
>        (clobber (reg:CC FLAGS_REG))])]
>  {
> +  if ((INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) != 0)
> +    {
> +      emit_insn (gen_<insn><dwi>3_doubleword (operands[0], operands[1],
> +                                             operands[2]));
> +      DONE;
> +    }
> +
>    split_double_mode (<DWI>mode, &operands[0], 2, &operands[4], &operands[6]);
>
>    operands[8] = GEN_INT (<MODE_SIZE> * BITS_PER_UNIT - 1);
> @@ -12903,7 +12963,7 @@ (define_insn_and_split "*<insn><dwi>3_do
>      emit_move_insn (operands[4], operands[5]);
>  })
>
> -(define_insn_and_split "*<insn><mode>3_doubleword"
> +(define_insn_and_split "<insn><mode>3_doubleword"
>    [(set (match_operand:DWI 0 "register_operand" "=&r")
>         (any_shiftrt:DWI (match_operand:DWI 1 "register_operand" "0")
>                          (match_operand:QI 2 "nonmemory_operand" "<S>c")))
> @@ -13586,13 +13646,16 @@ (define_insn_and_split "*<insn><mode>3_m
>         (any_rotate:SWI
>           (match_operand:SWI 1 "nonimmediate_operand")
>           (subreg:QI
> -           (and:SI
> -             (match_operand:SI 2 "register_operand" "c")
> -             (match_operand:SI 3 "const_int_operand")) 0)))
> +           (and
> +             (match_operand 2 "register_operand" "c")
> +             (match_operand 3 "const_int_operand")) 0)))
>     (clobber (reg:CC FLAGS_REG))]
>    "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)
>     && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1))
>        == GET_MODE_BITSIZE (<MODE>mode)-1
> +   && GET_MODE_CLASS (GET_MODE (operands[2])) == MODE_INT
> +   && IN_RANGE (GET_MODE_SIZE (GET_MODE (operands[2])), 2,
> +               4 << (TARGET_64BIT ? 1 : 0))
>     && ix86_pre_reload_split ()"
>    "#"
>    "&& 1"
> @@ -13601,18 +13664,24 @@ (define_insn_and_split "*<insn><mode>3_m
>            (any_rotate:SWI (match_dup 1)
>                            (match_dup 2)))
>        (clobber (reg:CC FLAGS_REG))])]
> -  "operands[2] = gen_lowpart (QImode, operands[2]);")
> +{
> +  operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
> +  operands[2] = gen_lowpart (QImode, operands[2]);
> +})
>
>  (define_split
>    [(set (match_operand:SWI 0 "register_operand")
>         (any_rotate:SWI
>           (match_operand:SWI 1 "const_int_operand")
>           (subreg:QI
> -           (and:SI
> -             (match_operand:SI 2 "register_operand")
> -             (match_operand:SI 3 "const_int_operand")) 0)))]
> +           (and
> +             (match_operand 2 "register_operand")
> +             (match_operand 3 "const_int_operand")) 0)))]
>   "(INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode) - 1))
> -   == GET_MODE_BITSIZE (<MODE>mode) - 1"
> +   == GET_MODE_BITSIZE (<MODE>mode) - 1
> +  && GET_MODE_CLASS (GET_MODE (operands[2])) == MODE_INT
> +  && IN_RANGE (GET_MODE_SIZE (GET_MODE (operands[2])), 2,
> +              4 << (TARGET_64BIT ? 1 : 0))"
>   [(set (match_dup 4) (match_dup 1))
>    (set (match_dup 0)
>         (any_rotate:SWI (match_dup 4)
> @@ -13976,14 +14045,17 @@ (define_insn_and_split "*<btsc><mode>_ma
>           (ashift:SWI48
>             (const_int 1)
>             (subreg:QI
> -             (and:SI
> -               (match_operand:SI 1 "register_operand")
> -               (match_operand:SI 2 "const_int_operand")) 0))
> +             (and
> +               (match_operand 1 "register_operand")
> +               (match_operand 2 "const_int_operand")) 0))
>           (match_operand:SWI48 3 "register_operand")))
>     (clobber (reg:CC FLAGS_REG))]
>    "TARGET_USE_BT
>     && (INTVAL (operands[2]) & (GET_MODE_BITSIZE (<MODE>mode)-1))
>        == GET_MODE_BITSIZE (<MODE>mode)-1
> +   && GET_MODE_CLASS (GET_MODE (operands[1])) == MODE_INT
> +   && IN_RANGE (GET_MODE_SIZE (GET_MODE (operands[1])), 2,
> +               4 << (TARGET_64BIT ? 1 : 0))
>     && ix86_pre_reload_split ()"
>    "#"
>    "&& 1"
> @@ -13994,7 +14066,10 @@ (define_insn_and_split "*<btsc><mode>_ma
>                            (match_dup 1))
>              (match_dup 3)))
>        (clobber (reg:CC FLAGS_REG))])]
> -  "operands[1] = gen_lowpart (QImode, operands[1]);")
> +{
> +  operands[1] = force_reg (GET_MODE (operands[1]), operands[1]);
> +  operands[1] = gen_lowpart (QImode, operands[1]);
> +})
>
>  (define_insn_and_split "*<btsc><mode>_mask_1"
>    [(set (match_operand:SWI48 0 "register_operand")
> @@ -14041,14 +14116,17 @@ (define_insn_and_split "*btr<mode>_mask"
>           (rotate:SWI48
>             (const_int -2)
>             (subreg:QI
> -             (and:SI
> -               (match_operand:SI 1 "register_operand")
> -               (match_operand:SI 2 "const_int_operand")) 0))
> +             (and
> +               (match_operand 1 "register_operand")
> +               (match_operand 2 "const_int_operand")) 0))
>           (match_operand:SWI48 3 "register_operand")))
>     (clobber (reg:CC FLAGS_REG))]
>    "TARGET_USE_BT
>     && (INTVAL (operands[2]) & (GET_MODE_BITSIZE (<MODE>mode)-1))
>        == GET_MODE_BITSIZE (<MODE>mode)-1
> +   && GET_MODE_CLASS (GET_MODE (operands[1])) == MODE_INT
> +   && IN_RANGE (GET_MODE_SIZE (GET_MODE (operands[1])), 2,
> +               4 << (TARGET_64BIT ? 1 : 0))
>     && ix86_pre_reload_split ()"
>    "#"
>    "&& 1"
> @@ -14059,7 +14137,10 @@ (define_insn_and_split "*btr<mode>_mask"
>                            (match_dup 1))
>              (match_dup 3)))
>        (clobber (reg:CC FLAGS_REG))])]
> -  "operands[1] = gen_lowpart (QImode, operands[1]);")
> +{
> +  operands[1] = force_reg (GET_MODE (operands[1]), operands[1]);
> +  operands[1] = gen_lowpart (QImode, operands[1]);
> +})
>
>  (define_insn_and_split "*btr<mode>_mask_1"
>    [(set (match_operand:SWI48 0 "register_operand")
> @@ -14409,6 +14490,47 @@ (define_insn_and_split "*jcc_bt<mode>_ma
>    operands[0] = shallow_copy_rtx (operands[0]);
>    PUT_CODE (operands[0], reverse_condition (GET_CODE (operands[0])));
>  })
> +
> +(define_insn_and_split "*jcc_bt<mode>_mask_1"
> +  [(set (pc)
> +       (if_then_else (match_operator 0 "bt_comparison_operator"
> +                       [(zero_extract:SWI48
> +                          (match_operand:SWI48 1 "register_operand")
> +                          (const_int 1)
> +                          (zero_extend:SI
> +                            (subreg:QI
> +                              (and
> +                                (match_operand 2 "register_operand")
> +                                (match_operand 3 "const_int_operand")) 0)))])
> +                     (label_ref (match_operand 4))
> +                     (pc)))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "(TARGET_USE_BT || optimize_function_for_size_p (cfun))
> +   && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1))
> +      == GET_MODE_BITSIZE (<MODE>mode)-1
> +   && GET_MODE_CLASS (GET_MODE (operands[2])) == MODE_INT
> +   && IN_RANGE (GET_MODE_SIZE (GET_MODE (operands[2])), 2,
> +               4 << (TARGET_64BIT ? 1 : 0))
> +   && ix86_pre_reload_split ()"
> +  "#"
> +  "&& 1"
> +  [(set (reg:CCC FLAGS_REG)
> +       (compare:CCC
> +         (zero_extract:SWI48
> +           (match_dup 1)
> +           (const_int 1)
> +           (match_dup 2))
> +         (const_int 0)))
> +   (set (pc)
> +       (if_then_else (match_op_dup 0 [(reg:CCC FLAGS_REG) (const_int 0)])
> +                     (label_ref (match_dup 4))
> +                     (pc)))]
> +{
> +  operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
> +  operands[2] = gen_lowpart (SImode, operands[2]);
> +  operands[0] = shallow_copy_rtx (operands[0]);
> +  PUT_CODE (operands[0], reverse_condition (GET_CODE (operands[0])));
> +})
>
>  ;; Help combine recognize bt followed by cmov
>  (define_split
> --- gcc/config/i386/i386.cc.jj  2022-05-31 11:33:51.452251660 +0200
> +++ gcc/config/i386/i386.cc     2022-06-01 12:40:06.189186012 +0200
> @@ -20995,6 +20995,20 @@ ix86_rtx_costs (rtx x, machine_mode mode
>          *total += 1;
>        return false;
>
> +    case ZERO_EXTRACT:
> +      if (XEXP (x, 1) == const1_rtx
> +         && GET_CODE (XEXP (x, 2)) == ZERO_EXTEND
> +         && GET_MODE (XEXP (x, 2)) == SImode
> +         && GET_MODE (XEXP (XEXP (x, 2), 0)) == QImode)
> +       {
> +         /* Ignore cost of zero extension and masking of last argument.  */
> +         *total += rtx_cost (XEXP (x, 0), mode, code, 0, speed);
> +         *total += rtx_cost (XEXP (x, 1), mode, code, 1, speed);
> +         *total += rtx_cost (XEXP (XEXP (x, 2), 0), mode, code, 2, speed);
> +         return true;
> +       }
> +      return false;
> +
>      default:
>        return false;
>      }
> --- gcc/testsuite/gcc.target/i386/pr105778.c.jj 2022-05-31 13:59:12.470814609 +0200
> +++ gcc/testsuite/gcc.target/i386/pr105778.c    2022-05-31 13:58:50.624044700 +0200
> @@ -0,0 +1,45 @@
> +/* PR target/105778 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler-not "\tand\[^\n\r]*\(31\|63\|127\|255\)" } } */
> +
> +unsigned int f1 (unsigned int x, unsigned long y) { y &= 31; return x << y; }
> +unsigned int f2 (unsigned int x, unsigned long y) { return x << (y & 31); }
> +unsigned int f3 (unsigned int x, unsigned long y) { y &= 31; return x >> y; }
> +unsigned int f4 (unsigned int x, unsigned long y) { return x >> (y & 31); }
> +int f5 (int x, unsigned long y) { y &= 31; return x >> y; }
> +int f6 (int x, unsigned long y) { return x >> (y & 31); }
> +unsigned long long f7 (unsigned long long x, unsigned long y) { y &= 63; return x << y; }
> +unsigned long long f8 (unsigned long long x, unsigned long y) { return x << (y & 63); }
> +unsigned long long f9 (unsigned long long x, unsigned long y) { y &= 63; return x >> y; }
> +unsigned long long f10 (unsigned long long x, unsigned long y) { return x >> (y & 63); }
> +long long f11 (long long x, unsigned long y) { y &= 63; return x >> y; }
> +long long f12 (long long x, unsigned long y) { return x >> (y & 63); }
> +#ifdef __SIZEOF_INT128__
> +unsigned __int128 f13 (unsigned __int128 x, unsigned long y) { y &= 127; return x << y; }
> +unsigned __int128 f14 (unsigned __int128 x, unsigned long y) { return x << (y & 127); }
> +unsigned __int128 f15 (unsigned __int128 x, unsigned long y) { y &= 127; return x >> y; }
> +unsigned __int128 f16 (unsigned __int128 x, unsigned long y) { return x >> (y & 127); }
> +__int128 f17 (__int128 x, unsigned long y) { y &= 127; return x >> y; }
> +__int128 f18 (__int128 x, unsigned long y) { return x >> (y & 127); }
> +#endif
> +unsigned int f19 (unsigned int x, unsigned long y) { y &= 63; return x << y; }
> +unsigned int f20 (unsigned int x, unsigned long y) { return x << (y & 63); }
> +unsigned int f21 (unsigned int x, unsigned long y) { y &= 63; return x >> y; }
> +unsigned int f22 (unsigned int x, unsigned long y) { return x >> (y & 63); }
> +int f23 (int x, unsigned long y) { y &= 63; return x >> y; }
> +int f24 (int x, unsigned long y) { return x >> (y & 63); }
> +unsigned long long f25 (unsigned long long x, unsigned long y) { y &= 127; return x << y; }
> +unsigned long long f26 (unsigned long long x, unsigned long y) { return x << (y & 127); }
> +unsigned long long f27 (unsigned long long x, unsigned long y) { y &= 127; return x >> y; }
> +unsigned long long f28 (unsigned long long x, unsigned long y) { return x >> (y & 127); }
> +long long f29 (long long x, unsigned long y) { y &= 127; return x >> y; }
> +long long f30 (long long x, unsigned long y) { return x >> (y & 127); }
> +#ifdef __SIZEOF_INT128__
> +unsigned __int128 f31 (unsigned __int128 x, unsigned long y) { y &= 255; return x << y; }
> +unsigned __int128 f32 (unsigned __int128 x, unsigned long y) { return x << (y & 255); }
> +unsigned __int128 f33 (unsigned __int128 x, unsigned long y) { y &= 255; return x >> y; }
> +unsigned __int128 f34 (unsigned __int128 x, unsigned long y) { return x >> (y & 255); }
> +__int128 f35 (__int128 x, unsigned long y) { y &= 255; return x >> y; }
> +__int128 f36 (__int128 x, unsigned long y) { return x >> (y & 255); }
> +#endif
>
>         Jakub
>

Patch

--- gcc/config/i386/i386.md.jj	2022-05-31 11:33:51.457251607 +0200
+++ gcc/config/i386/i386.md	2022-06-01 11:59:27.388631872 +0200
@@ -11890,11 +11890,16 @@  (define_insn_and_split "*ashl<dwi>3_doub
 	(ashift:<DWI>
 	  (match_operand:<DWI> 1 "register_operand")
 	  (subreg:QI
-	    (and:SI
-	      (match_operand:SI 2 "register_operand" "c")
-	      (match_operand:SI 3 "const_int_operand")) 0)))
-   (clobber (reg:CC FLAGS_REG))]
-  "(INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) == 0
+	    (and
+	      (match_operand 2 "register_operand" "c")
+	      (match_operand 3 "const_int_operand")) 0)))
+   (clobber (reg:CC FLAGS_REG))]
+  "((INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) == 0
+    || ((INTVAL (operands[3]) & (2 * <MODE_SIZE> * BITS_PER_UNIT - 1))
+	 == (2 * <MODE_SIZE> * BITS_PER_UNIT - 1)))
+   && GET_MODE_CLASS (GET_MODE (operands[2])) == MODE_INT
+   && IN_RANGE (GET_MODE_SIZE (GET_MODE (operands[2])), 2,
+		4 << (TARGET_64BIT ? 1 : 0))
    && ix86_pre_reload_split ()"
   "#"
   "&& 1"
@@ -11912,6 +11917,15 @@  (define_insn_and_split "*ashl<dwi>3_doub
 	   (ashift:DWIH (match_dup 5) (match_dup 2)))
       (clobber (reg:CC FLAGS_REG))])]
 {
+  if ((INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) != 0)
+    {
+      operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
+      operands[2] = gen_lowpart (QImode, operands[2]);
+      emit_insn (gen_ashl<dwi>3_doubleword (operands[0], operands[1],
+					    operands[2]));
+      DONE;
+    }
+
   split_double_mode (<DWI>mode, &operands[0], 2, &operands[4], &operands[6]);
 
   operands[8] = GEN_INT (<MODE_SIZE> * BITS_PER_UNIT - 1);
@@ -11925,6 +11939,7 @@  (define_insn_and_split "*ashl<dwi>3_doub
       operands[2] = tem;
     }
 
+  operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
   operands[2] = gen_lowpart (QImode, operands[2]);
 
   if (!rtx_equal_p (operands[6], operands[7]))
@@ -11939,7 +11954,9 @@  (define_insn_and_split "*ashl<dwi>3_doub
 	    (match_operand:QI 2 "register_operand" "c")
 	    (match_operand:QI 3 "const_int_operand"))))
    (clobber (reg:CC FLAGS_REG))]
-  "(INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) == 0
+  "((INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) == 0
+    || ((INTVAL (operands[3]) & (2 * <MODE_SIZE> * BITS_PER_UNIT - 1))
+	 == (2 * <MODE_SIZE> * BITS_PER_UNIT - 1)))
    && ix86_pre_reload_split ()"
   "#"
   "&& 1"
@@ -11957,6 +11974,13 @@  (define_insn_and_split "*ashl<dwi>3_doub
 	   (ashift:DWIH (match_dup 5) (match_dup 2)))
       (clobber (reg:CC FLAGS_REG))])]
 {
+  if ((INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) != 0)
+    {
+      emit_insn (gen_ashl<dwi>3_doubleword (operands[0], operands[1],
+					    operands[2]));
+      DONE;
+    }
+
   split_double_mode (<DWI>mode, &operands[0], 2, &operands[4], &operands[6]);
 
   operands[8] = GEN_INT (<MODE_SIZE> * BITS_PER_UNIT - 1);
@@ -11974,7 +11998,7 @@  (define_insn_and_split "*ashl<dwi>3_doub
     emit_move_insn (operands[6], operands[7]);
 })
 
-(define_insn "*ashl<mode>3_doubleword"
+(define_insn "ashl<mode>3_doubleword"
   [(set (match_operand:DWI 0 "register_operand" "=&r")
 	(ashift:DWI (match_operand:DWI 1 "reg_or_pm1_operand" "0n")
 		    (match_operand:QI 2 "nonmemory_operand" "<S>c")))
@@ -12186,13 +12210,16 @@  (define_insn_and_split "*ashl<mode>3_mas
 	(ashift:SWI48
 	  (match_operand:SWI48 1 "nonimmediate_operand")
 	  (subreg:QI
-	    (and:SI
-	      (match_operand:SI 2 "register_operand" "c,r")
-	      (match_operand:SI 3 "const_int_operand")) 0)))
+	    (and
+	      (match_operand 2 "register_operand" "c,r")
+	      (match_operand 3 "const_int_operand")) 0)))
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (ASHIFT, <MODE>mode, operands)
    && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1))
       == GET_MODE_BITSIZE (<MODE>mode)-1
+   && GET_MODE_CLASS (GET_MODE (operands[2])) == MODE_INT
+   && IN_RANGE (GET_MODE_SIZE (GET_MODE (operands[2])), 2,
+		4 << (TARGET_64BIT ? 1 : 0))
    && ix86_pre_reload_split ()"
   "#"
   "&& 1"
@@ -12201,7 +12228,10 @@  (define_insn_and_split "*ashl<mode>3_mas
 	   (ashift:SWI48 (match_dup 1)
 			 (match_dup 2)))
       (clobber (reg:CC FLAGS_REG))])]
-  "operands[2] = gen_lowpart (QImode, operands[2]);"
+{
+  operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
+  operands[2] = gen_lowpart (QImode, operands[2]);
+}
   [(set_attr "isa" "*,bmi2")])
 
 (define_insn_and_split "*ashl<mode>3_mask_1"
@@ -12774,13 +12804,16 @@  (define_insn_and_split "*<insn><mode>3_m
 	(any_shiftrt:SWI48
 	  (match_operand:SWI48 1 "nonimmediate_operand")
 	  (subreg:QI
-	    (and:SI
-	      (match_operand:SI 2 "register_operand" "c,r")
-	      (match_operand:SI 3 "const_int_operand")) 0)))
+	    (and
+	      (match_operand 2 "register_operand" "c,r")
+	      (match_operand 3 "const_int_operand")) 0)))
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)
    && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1))
       == GET_MODE_BITSIZE (<MODE>mode)-1
+   && GET_MODE_CLASS (GET_MODE (operands[2])) == MODE_INT
+   && IN_RANGE (GET_MODE_SIZE (GET_MODE (operands[2])), 2,
+		4 << (TARGET_64BIT ? 1 : 0))
    && ix86_pre_reload_split ()"
   "#"
   "&& 1"
@@ -12789,7 +12822,10 @@  (define_insn_and_split "*<insn><mode>3_m
 	   (any_shiftrt:SWI48 (match_dup 1)
 			      (match_dup 2)))
       (clobber (reg:CC FLAGS_REG))])]
-  "operands[2] = gen_lowpart (QImode, operands[2]);"
+{
+  operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
+  operands[2] = gen_lowpart (QImode, operands[2]);
+}
   [(set_attr "isa" "*,bmi2")])
 
 (define_insn_and_split "*<insn><mode>3_mask_1"
@@ -12819,11 +12855,16 @@  (define_insn_and_split "*<insn><dwi>3_do
 	(any_shiftrt:<DWI>
 	  (match_operand:<DWI> 1 "register_operand")
 	  (subreg:QI
-	    (and:SI
-	      (match_operand:SI 2 "register_operand" "c")
-	      (match_operand:SI 3 "const_int_operand")) 0)))
-   (clobber (reg:CC FLAGS_REG))]
-  "(INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) == 0
+	    (and
+	      (match_operand 2 "register_operand" "c")
+	      (match_operand 3 "const_int_operand")) 0)))
+   (clobber (reg:CC FLAGS_REG))]
+  "((INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) == 0
+    || ((INTVAL (operands[3]) & (2 * <MODE_SIZE> * BITS_PER_UNIT - 1))
+	 == (2 * <MODE_SIZE> * BITS_PER_UNIT - 1)))
+   && GET_MODE_CLASS (GET_MODE (operands[2])) == MODE_INT
+   && IN_RANGE (GET_MODE_SIZE (GET_MODE (operands[2])), 2,
+		4 << (TARGET_64BIT ? 1 : 0))
    && ix86_pre_reload_split ()"
   "#"
   "&& 1"
@@ -12841,6 +12882,15 @@  (define_insn_and_split "*<insn><dwi>3_do
 	   (any_shiftrt:DWIH (match_dup 7) (match_dup 2)))
       (clobber (reg:CC FLAGS_REG))])]
 {
+  if ((INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) != 0)
+    {
+      operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
+      operands[2] = gen_lowpart (QImode, operands[2]);
+      emit_insn (gen_<insn><dwi>3_doubleword (operands[0], operands[1],
+					      operands[2]));
+      DONE;
+    }
+
   split_double_mode (<DWI>mode, &operands[0], 2, &operands[4], &operands[6]);
 
   operands[8] = GEN_INT (<MODE_SIZE> * BITS_PER_UNIT - 1);
@@ -12854,6 +12904,7 @@  (define_insn_and_split "*<insn><dwi>3_do
       operands[2] = tem;
     }
 
+  operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
   operands[2] = gen_lowpart (QImode, operands[2]);
 
   if (!rtx_equal_p (operands[4], operands[5]))
@@ -12868,7 +12919,9 @@  (define_insn_and_split "*<insn><dwi>3_do
 	    (match_operand:QI 2 "register_operand" "c")
 	    (match_operand:QI 3 "const_int_operand"))))
    (clobber (reg:CC FLAGS_REG))]
-  "(INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) == 0
+  "((INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) == 0
+    || ((INTVAL (operands[3]) & (2 * <MODE_SIZE> * BITS_PER_UNIT - 1))
+	 == (2 * <MODE_SIZE> * BITS_PER_UNIT - 1)))
    && ix86_pre_reload_split ()"
   "#"
   "&& 1"
@@ -12886,6 +12939,13 @@  (define_insn_and_split "*<insn><dwi>3_do
 	   (any_shiftrt:DWIH (match_dup 7) (match_dup 2)))
       (clobber (reg:CC FLAGS_REG))])]
 {
+  if ((INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) != 0)
+    {
+      emit_insn (gen_<insn><dwi>3_doubleword (operands[0], operands[1],
+					      operands[2]));
+      DONE;
+    }
+
   split_double_mode (<DWI>mode, &operands[0], 2, &operands[4], &operands[6]);
 
   operands[8] = GEN_INT (<MODE_SIZE> * BITS_PER_UNIT - 1);
@@ -12903,7 +12963,7 @@  (define_insn_and_split "*<insn><dwi>3_do
     emit_move_insn (operands[4], operands[5]);
 })
 
-(define_insn_and_split "*<insn><mode>3_doubleword"
+(define_insn_and_split "<insn><mode>3_doubleword"
   [(set (match_operand:DWI 0 "register_operand" "=&r")
 	(any_shiftrt:DWI (match_operand:DWI 1 "register_operand" "0")
 			 (match_operand:QI 2 "nonmemory_operand" "<S>c")))
@@ -13586,13 +13646,16 @@  (define_insn_and_split "*<insn><mode>3_m
 	(any_rotate:SWI
 	  (match_operand:SWI 1 "nonimmediate_operand")
 	  (subreg:QI
-	    (and:SI
-	      (match_operand:SI 2 "register_operand" "c")
-	      (match_operand:SI 3 "const_int_operand")) 0)))
+	    (and
+	      (match_operand 2 "register_operand" "c")
+	      (match_operand 3 "const_int_operand")) 0)))
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)
    && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1))
       == GET_MODE_BITSIZE (<MODE>mode)-1
+   && GET_MODE_CLASS (GET_MODE (operands[2])) == MODE_INT
+   && IN_RANGE (GET_MODE_SIZE (GET_MODE (operands[2])), 2,
+		4 << (TARGET_64BIT ? 1 : 0))
    && ix86_pre_reload_split ()"
   "#"
   "&& 1"
@@ -13601,18 +13664,24 @@  (define_insn_and_split "*<insn><mode>3_m
 	   (any_rotate:SWI (match_dup 1)
 			   (match_dup 2)))
       (clobber (reg:CC FLAGS_REG))])]
-  "operands[2] = gen_lowpart (QImode, operands[2]);")
+{
+  operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
+  operands[2] = gen_lowpart (QImode, operands[2]);
+})
 
 (define_split
   [(set (match_operand:SWI 0 "register_operand")
 	(any_rotate:SWI
 	  (match_operand:SWI 1 "const_int_operand")
 	  (subreg:QI
-	    (and:SI
-	      (match_operand:SI 2 "register_operand")
-	      (match_operand:SI 3 "const_int_operand")) 0)))]
+	    (and
+	      (match_operand 2 "register_operand")
+	      (match_operand 3 "const_int_operand")) 0)))]
  "(INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode) - 1))
-   == GET_MODE_BITSIZE (<MODE>mode) - 1"
+   == GET_MODE_BITSIZE (<MODE>mode) - 1
+  && GET_MODE_CLASS (GET_MODE (operands[2])) == MODE_INT
+  && IN_RANGE (GET_MODE_SIZE (GET_MODE (operands[2])), 2,
+	       4 << (TARGET_64BIT ? 1 : 0))"
  [(set (match_dup 4) (match_dup 1))
   (set (match_dup 0)
        (any_rotate:SWI (match_dup 4)
@@ -13976,14 +14045,17 @@  (define_insn_and_split "*<btsc><mode>_ma
 	  (ashift:SWI48
 	    (const_int 1)
 	    (subreg:QI
-	      (and:SI
-		(match_operand:SI 1 "register_operand")
-		(match_operand:SI 2 "const_int_operand")) 0))
+	      (and
+		(match_operand 1 "register_operand")
+		(match_operand 2 "const_int_operand")) 0))
 	  (match_operand:SWI48 3 "register_operand")))
    (clobber (reg:CC FLAGS_REG))]
   "TARGET_USE_BT
    && (INTVAL (operands[2]) & (GET_MODE_BITSIZE (<MODE>mode)-1))
       == GET_MODE_BITSIZE (<MODE>mode)-1
+   && GET_MODE_CLASS (GET_MODE (operands[1])) == MODE_INT
+   && IN_RANGE (GET_MODE_SIZE (GET_MODE (operands[1])), 2,
+		4 << (TARGET_64BIT ? 1 : 0))
    && ix86_pre_reload_split ()"
   "#"
   "&& 1"
@@ -13994,7 +14066,10 @@  (define_insn_and_split "*<btsc><mode>_ma
 			   (match_dup 1))
 	     (match_dup 3)))
       (clobber (reg:CC FLAGS_REG))])]
-  "operands[1] = gen_lowpart (QImode, operands[1]);")
+{
+  operands[1] = force_reg (GET_MODE (operands[1]), operands[1]);
+  operands[1] = gen_lowpart (QImode, operands[1]);
+})
 
 (define_insn_and_split "*<btsc><mode>_mask_1"
   [(set (match_operand:SWI48 0 "register_operand")
@@ -14041,14 +14116,17 @@  (define_insn_and_split "*btr<mode>_mask"
 	  (rotate:SWI48
 	    (const_int -2)
 	    (subreg:QI
-	      (and:SI
-		(match_operand:SI 1 "register_operand")
-		(match_operand:SI 2 "const_int_operand")) 0))
+	      (and
+		(match_operand 1 "register_operand")
+		(match_operand 2 "const_int_operand")) 0))
 	  (match_operand:SWI48 3 "register_operand")))
    (clobber (reg:CC FLAGS_REG))]
   "TARGET_USE_BT
    && (INTVAL (operands[2]) & (GET_MODE_BITSIZE (<MODE>mode)-1))
       == GET_MODE_BITSIZE (<MODE>mode)-1
+   && GET_MODE_CLASS (GET_MODE (operands[1])) == MODE_INT
+   && IN_RANGE (GET_MODE_SIZE (GET_MODE (operands[1])), 2,
+		4 << (TARGET_64BIT ? 1 : 0))
    && ix86_pre_reload_split ()"
   "#"
   "&& 1"
@@ -14059,7 +14137,10 @@  (define_insn_and_split "*btr<mode>_mask"
 			   (match_dup 1))
 	     (match_dup 3)))
       (clobber (reg:CC FLAGS_REG))])]
-  "operands[1] = gen_lowpart (QImode, operands[1]);")
+{
+  operands[1] = force_reg (GET_MODE (operands[1]), operands[1]);
+  operands[1] = gen_lowpart (QImode, operands[1]);
+})
 
 (define_insn_and_split "*btr<mode>_mask_1"
   [(set (match_operand:SWI48 0 "register_operand")
@@ -14409,6 +14490,47 @@  (define_insn_and_split "*jcc_bt<mode>_ma
   operands[0] = shallow_copy_rtx (operands[0]);
   PUT_CODE (operands[0], reverse_condition (GET_CODE (operands[0])));
 })
+
+(define_insn_and_split "*jcc_bt<mode>_mask_1"
+  [(set (pc)
+	(if_then_else (match_operator 0 "bt_comparison_operator"
+			[(zero_extract:SWI48
+			   (match_operand:SWI48 1 "register_operand")
+			   (const_int 1)
+			   (zero_extend:SI
+			     (subreg:QI
+			       (and
+				 (match_operand 2 "register_operand")
+				 (match_operand 3 "const_int_operand")) 0)))])
+		      (label_ref (match_operand 4))
+		      (pc)))
+   (clobber (reg:CC FLAGS_REG))]
+  "(TARGET_USE_BT || optimize_function_for_size_p (cfun))
+   && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1))
+      == GET_MODE_BITSIZE (<MODE>mode)-1
+   && GET_MODE_CLASS (GET_MODE (operands[2])) == MODE_INT
+   && IN_RANGE (GET_MODE_SIZE (GET_MODE (operands[2])), 2,
+		4 << (TARGET_64BIT ? 1 : 0))
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (reg:CCC FLAGS_REG)
+	(compare:CCC
+	  (zero_extract:SWI48
+	    (match_dup 1)
+	    (const_int 1)
+	    (match_dup 2))
+	  (const_int 0)))
+   (set (pc)
+	(if_then_else (match_op_dup 0 [(reg:CCC FLAGS_REG) (const_int 0)])
+		      (label_ref (match_dup 4))
+		      (pc)))]
+{
+  operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
+  operands[2] = gen_lowpart (SImode, operands[2]);
+  operands[0] = shallow_copy_rtx (operands[0]);
+  PUT_CODE (operands[0], reverse_condition (GET_CODE (operands[0])));
+})
 
 ;; Help combine recognize bt followed by cmov
 (define_split
--- gcc/config/i386/i386.cc.jj	2022-05-31 11:33:51.452251660 +0200
+++ gcc/config/i386/i386.cc	2022-06-01 12:40:06.189186012 +0200
@@ -20995,6 +20995,20 @@  ix86_rtx_costs (rtx x, machine_mode mode
         *total += 1;
       return false;
 
+    case ZERO_EXTRACT:
+      if (XEXP (x, 1) == const1_rtx
+	  && GET_CODE (XEXP (x, 2)) == ZERO_EXTEND
+	  && GET_MODE (XEXP (x, 2)) == SImode
+	  && GET_MODE (XEXP (XEXP (x, 2), 0)) == QImode)
+	{
+	  /* Ignore cost of zero extension and masking of last argument.  */
+	  *total += rtx_cost (XEXP (x, 0), mode, code, 0, speed);
+	  *total += rtx_cost (XEXP (x, 1), mode, code, 1, speed);
+	  *total += rtx_cost (XEXP (XEXP (x, 2), 0), mode, code, 2, speed);
+	  return true;
+	}
+      return false;
+
     default:
       return false;
     }
--- gcc/testsuite/gcc.target/i386/pr105778.c.jj	2022-05-31 13:59:12.470814609 +0200
+++ gcc/testsuite/gcc.target/i386/pr105778.c	2022-05-31 13:58:50.624044700 +0200
@@ -0,0 +1,45 @@ 
+/* PR target/105778 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not "\tand\[^\n\r]*\(31\|63\|127\|255\)" } } */
+
+unsigned int f1 (unsigned int x, unsigned long y) { y &= 31; return x << y; }
+unsigned int f2 (unsigned int x, unsigned long y) { return x << (y & 31); }
+unsigned int f3 (unsigned int x, unsigned long y) { y &= 31; return x >> y; }
+unsigned int f4 (unsigned int x, unsigned long y) { return x >> (y & 31); }
+int f5 (int x, unsigned long y) { y &= 31; return x >> y; }
+int f6 (int x, unsigned long y) { return x >> (y & 31); }
+unsigned long long f7 (unsigned long long x, unsigned long y) { y &= 63; return x << y; }
+unsigned long long f8 (unsigned long long x, unsigned long y) { return x << (y & 63); }
+unsigned long long f9 (unsigned long long x, unsigned long y) { y &= 63; return x >> y; }
+unsigned long long f10 (unsigned long long x, unsigned long y) { return x >> (y & 63); }
+long long f11 (long long x, unsigned long y) { y &= 63; return x >> y; }
+long long f12 (long long x, unsigned long y) { return x >> (y & 63); }
+#ifdef __SIZEOF_INT128__
+unsigned __int128 f13 (unsigned __int128 x, unsigned long y) { y &= 127; return x << y; }
+unsigned __int128 f14 (unsigned __int128 x, unsigned long y) { return x << (y & 127); }
+unsigned __int128 f15 (unsigned __int128 x, unsigned long y) { y &= 127; return x >> y; }
+unsigned __int128 f16 (unsigned __int128 x, unsigned long y) { return x >> (y & 127); }
+__int128 f17 (__int128 x, unsigned long y) { y &= 127; return x >> y; }
+__int128 f18 (__int128 x, unsigned long y) { return x >> (y & 127); }
+#endif
+unsigned int f19 (unsigned int x, unsigned long y) { y &= 63; return x << y; }
+unsigned int f20 (unsigned int x, unsigned long y) { return x << (y & 63); }
+unsigned int f21 (unsigned int x, unsigned long y) { y &= 63; return x >> y; }
+unsigned int f22 (unsigned int x, unsigned long y) { return x >> (y & 63); }
+int f23 (int x, unsigned long y) { y &= 63; return x >> y; }
+int f24 (int x, unsigned long y) { return x >> (y & 63); }
+unsigned long long f25 (unsigned long long x, unsigned long y) { y &= 127; return x << y; }
+unsigned long long f26 (unsigned long long x, unsigned long y) { return x << (y & 127); }
+unsigned long long f27 (unsigned long long x, unsigned long y) { y &= 127; return x >> y; }
+unsigned long long f28 (unsigned long long x, unsigned long y) { return x >> (y & 127); }
+long long f29 (long long x, unsigned long y) { y &= 127; return x >> y; }
+long long f30 (long long x, unsigned long y) { return x >> (y & 127); }
+#ifdef __SIZEOF_INT128__
+unsigned __int128 f31 (unsigned __int128 x, unsigned long y) { y &= 255; return x << y; }
+unsigned __int128 f32 (unsigned __int128 x, unsigned long y) { return x << (y & 255); }
+unsigned __int128 f33 (unsigned __int128 x, unsigned long y) { y &= 255; return x >> y; }
+unsigned __int128 f34 (unsigned __int128 x, unsigned long y) { return x >> (y & 255); }
+__int128 f35 (__int128 x, unsigned long y) { y &= 255; return x >> y; }
+__int128 f36 (__int128 x, unsigned long y) { return x >> (y & 255); }
+#endif