
assorted improvements for fold_truth_andor_1

Message ID or1riqasdy.fsf@livre.home
State New
Headers show
Series assorted improvements for fold_truth_andor_1

Commit Message

Alexandre Oliva Sept. 25, 2020, 1:38 p.m. UTC
This patch introduces various improvements to the logic that merges
field compares.

Before the patch, we could merge:

  (a.x1 EQNE b.x1)  ANDOR  (a.y1 EQNE b.y1)

into something like:

  (((type *)&a)[Na] & MASK) EQNE (((type *)&b)[Nb] & MASK)

if both of A's fields live within the same alignment boundaries, and
so do B's, at the same relative positions.  Constants may be used
instead of the object B.
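
For illustration (the struct layout and names below are hypothetical,
not taken from the patch), a typical candidate looks like:

  struct S { unsigned char x1; unsigned char y1; unsigned int z; };

  int same (struct S *a, struct S *b)
  {
    /* x1 and y1 fall within the same 32-bit aligned word of each
       object, so the two compares can become a single masked compare
       of that word, along the lines of the transformation above.  */
    return a->x1 == b->x1 && a->y1 == b->y1;
  }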

The initial goal of this patch was to enable such combinations even
when a field crosses alignment boundaries, e.g. in packed types.  We
can't generally access such fields with a single memory access, so
when we come across such a compare, we attempt to combine each of the
accesses separately.
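
For instance (a sketch in the spirit of the new field-merge tests;
the exact layout is only illustrative):

  struct P {
    unsigned char p;
    unsigned int a;   /* Bytes 1..4: straddles a 32-bit boundary.  */
  } __attribute__ ((packed, aligned (4)));

  int eq (struct P *x, struct P *y)
  {
    /* Field a can't be loaded with one aligned access, so the merged
       compare is attempted in two parts rather than given up on.  */
    return x->p == y->p && x->a == y->a;
  }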

Some merging opportunities were missed because of right-shifts,
compares expressed as e.g. ((a.x1 ^ b.x1) & MASK) EQNE 0, and
narrowing conversions, especially after earlier merges.  This patch
introduces handlers for several cases involving these.
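
As a purely illustrative example of where such forms show up, consider
a chain of narrow-field compares:

  struct S { unsigned char a, b, c, d; unsigned int q; };

  int eq (struct S *x, struct S *y)
  {
    /* Once the first two compares are merged, the combined test may
       be expressed as ((xw ^ yw) & MASK) EQNE 0, possibly with right
       shifts or narrowing conversions of the containing word; the new
       handlers let such a form be merged again with the remaining
       compares.  */
    return x->a == y->a && x->b == y->b && x->c == y->c && x->d == y->d;
  }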

Other merging opportunities were missed because of association.  The
existing logic would only merge a pair of consecutive compares,
e.g. B with C in (A ANDOR B) ANDOR C; it wouldn't even try
e.g. C and D in (A ANDOR (B ANDOR C)) ANDOR D.  I've generalized the
handling: we first try the rightmost compare in the left-hand operand
against the leftmost compare in the right-hand operand, and then go
on to merging compares pairwise, one from each operand, even if they
are not consecutive, taking care to avoid merging operations with
intervening side effects, including volatile accesses.
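
A sketch of a chain this now handles (hypothetical fields; the middle
test only has to be free of side effects):

  struct S { unsigned short x, y; unsigned int q; };

  int f (struct S *a, struct S *b, int z)
  {
    /* The x and y compares are not adjacent in the chain, but both
       fields live in the same 32-bit aligned word of each object, so
       they may still be merged, provided the intervening test has no
       side effects or volatile accesses.  */
    return a->x == b->x && z >= 0 && a->y == b->y;
  }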

When it is the second compare of a non-consecutive pair that first
accesses a word, we may merge the first compare with the part of the
second compare that refers to the same word, keeping the compare of
the remaining bits at the spot where the second compare used to be.

The handling of compares with non-constant fields has been
generalized somewhat, and now copes with non-adjacent fields.  When a
field of one object crosses an alignment boundary but the
corresponding field of the other object doesn't, we issue the same
load in both compares; gimple optimizers will later turn them into a
single load, so we need not resort to SAVE_EXPRs at this point.
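
A sketch of that situation (hypothetical layout, in the spirit of the
new field-merge tests):

  struct A { unsigned char p; unsigned short a; }
    __attribute__ ((packed, aligned (4)));

  struct B { unsigned short pad; unsigned char p; unsigned short a; }
    __attribute__ ((packed, aligned (4)));

  int eq (struct A *pa, struct B *pb)
  {
    /* pb->a straddles a 32-bit boundary, so its compare is split in
       two, while pa's fields fit in a single aligned word; that
       word's load appears in both resulting compares and is expected
       to be unified later by the gimple optimizers.  */
    return pa->p == pb->p && pa->a == pb->a;
  }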

The logic for issuing split loads and compares, and ordering them, is
now shared between all cases of compares with constants and with
another object.

The -Wno-error for toplev.o on rs6000 is because of toplev.c's:

  if ((flag_sanitize & SANITIZE_ADDRESS)
      && !FRAME_GROWS_DOWNWARD)

and rs6000.h's:

#define FRAME_GROWS_DOWNWARD (flag_stack_protect != 0			\
			      || (flag_sanitize & SANITIZE_ADDRESS) != 0)

Once the macro is expanded, the condition tests
(flag_sanitize & SANITIZE_ADDRESS) both for being nonzero and for
being zero, so it can never hold.  These mutually exclusive conditions
are now noticed and reported by fold-const.c's:

	      warning (0,
		       "%<and%> of mutually exclusive equal-tests"
		       " is always 0");

This patch enables over 12k compare-merging opportunities that we used
to miss in a GCC bootstrap.

Regstrapped on x86_64-linux-gnu and ppc64-linux-gnu.  Ok to install?


for  gcc/ChangeLog

	* fold-const.c (prepare_xor): New.
	(decode_field_reference): Handle xor, shift, and narrowing
	conversions.
	(all_ones_mask_p): Remove.
	(compute_split_boundary_from_align): New.
	(build_split_load, reuse_split_load): New.
	(fold_truth_andor_1): Add recursion to combine pairs of
	non-neighboring compares.  Handle xor compared with zero.
	Handle fields straddling across alignment boundaries.
	Generalize handling of non-constant rhs.
	(fold_truth_andor): Leave sub-expression handling to the
	recursion above.
	* config/rs6000/t-rs6000 (toplev.o-warn): Disable errors.

for  gcc/testsuite/ChangeLog

	* gcc.dg/field-merge-1.c: New.
	* gcc.dg/field-merge-2.c: New.
	* gcc.dg/field-merge-3.c: New.
	* gcc.dg/field-merge-4.c: New.
	* gcc.dg/field-merge-5.c: New.
---
 gcc/config/rs6000/t-rs6000           |    4 
 gcc/fold-const.c                     |  818 ++++++++++++++++++++++++++++------
 gcc/testsuite/gcc.dg/field-merge-1.c |   64 +++
 gcc/testsuite/gcc.dg/field-merge-2.c |   31 +
 gcc/testsuite/gcc.dg/field-merge-3.c |   36 +
 gcc/testsuite/gcc.dg/field-merge-4.c |   40 ++
 gcc/testsuite/gcc.dg/field-merge-5.c |   40 ++
 7 files changed, 882 insertions(+), 151 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/field-merge-1.c
 create mode 100644 gcc/testsuite/gcc.dg/field-merge-2.c
 create mode 100644 gcc/testsuite/gcc.dg/field-merge-3.c
 create mode 100644 gcc/testsuite/gcc.dg/field-merge-4.c
 create mode 100644 gcc/testsuite/gcc.dg/field-merge-5.c

Comments

Richard Biener Sept. 28, 2020, 8:22 a.m. UTC | #1
On Fri, Sep 25, 2020 at 3:39 PM Alexandre Oliva <oliva@adacore.com> wrote:
>
> [...]
>
> Regstrapped on x86_64-linux-gnu and ppc64-linux-gnu.  Ok to install?

Sorry for throwing a wrench in here, but doing this kind of transform
during GENERIC folding is considered a bad thing.

There are better places during GIMPLE opts to replicate (and enhance)
what fold_truth_andor does, namely ifcombine and reassoc, to mention
the two passes that do similar transforms (they are by no means exact
matches, otherwise fold_truth_andor would be gone already).

Richard.

>
> [...]
>
> diff --git a/gcc/config/rs6000/t-rs6000 b/gcc/config/rs6000/t-rs6000
> index 1ddb572..516486d 100644
> --- a/gcc/config/rs6000/t-rs6000
> +++ b/gcc/config/rs6000/t-rs6000
> @@ -52,6 +52,10 @@ $(srcdir)/config/rs6000/rs6000-tables.opt: $(srcdir)/config/rs6000/genopt.sh \
>         $(SHELL) $(srcdir)/config/rs6000/genopt.sh $(srcdir)/config/rs6000 > \
>                 $(srcdir)/config/rs6000/rs6000-tables.opt
>
> +# FRAME_GROWS_DOWNWARD tests flag_sanitize in a way that rules out a
> +# test in toplev.c.
> +toplev.o-warn = -Wno-error
> +
>  # The rs6000 backend doesn't cause warnings in these files.
>  insn-conditions.o-warn =
>
> diff --git a/gcc/fold-const.c b/gcc/fold-const.c
> index 0cc80ad..bfbf88c 100644
> --- a/gcc/fold-const.c
> +++ b/gcc/fold-const.c
> @@ -4603,6 +4603,41 @@ optimize_bit_field_compare (location_t loc, enum tree_code code,
>    return lhs;
>  }
>
> +/* If *R_ARG is a constant zero, and L_ARG is a possibly masked
> +   BIT_XOR_EXPR, return 1 and set *r_arg to l_arg.
> +   Otherwise, return 0.
> +
> +   The returned value should be passed to decode_field_reference for it
> +   to handle l_arg, and then doubled for r_arg.  */
> +static int
> +prepare_xor (tree l_arg, tree *r_arg)
> +{
> +  int ret = 0;
> +
> +  if (!integer_zerop (*r_arg))
> +    return ret;
> +
> +  tree exp = l_arg;
> +  STRIP_NOPS (exp);
> +
> +  if (TREE_CODE (exp) == BIT_AND_EXPR)
> +    {
> +      tree and_mask = TREE_OPERAND (exp, 1);
> +      exp = TREE_OPERAND (exp, 0);
> +      STRIP_NOPS (exp); STRIP_NOPS (and_mask);
> +      if (TREE_CODE (and_mask) != INTEGER_CST)
> +       return ret;
> +    }
> +
> +  if (TREE_CODE (exp) == BIT_XOR_EXPR)
> +    {
> +      *r_arg = l_arg;
> +      return 1;
> +    }
> +
> +  return ret;
> +}
> +
>  /* Subroutine for fold_truth_andor_1: decode a field reference.
>
>     If EXP is a comparison reference, we return the innermost reference.
> @@ -4625,6 +4660,10 @@ optimize_bit_field_compare (location_t loc, enum tree_code code,
>
>     *PAND_MASK is set to the mask found in a BIT_AND_EXPR, if any.
>
> +   XOR_WHICH is 1 or 2 if EXP was found to be a (possibly masked)
> +   BIT_XOR_EXPR compared with zero.  We're to take the first or second
> +   operand thereof if so.  It should be zero otherwise.
> +
>     Return 0 if this is not a component reference or is one that we can't
>     do anything with.  */
>
> @@ -4632,7 +4671,7 @@ static tree
>  decode_field_reference (location_t loc, tree *exp_, HOST_WIDE_INT *pbitsize,
>                         HOST_WIDE_INT *pbitpos, machine_mode *pmode,
>                         int *punsignedp, int *preversep, int *pvolatilep,
> -                       tree *pmask, tree *pand_mask)
> +                       tree *pmask, tree *pand_mask, int xor_which)
>  {
>    tree exp = *exp_;
>    tree outer_type = 0;
> @@ -4640,6 +4679,7 @@ decode_field_reference (location_t loc, tree *exp_, HOST_WIDE_INT *pbitsize,
>    tree mask, inner, offset;
>    tree unsigned_type;
>    unsigned int precision;
> +  HOST_WIDE_INT shiftrt = 0;
>
>    /* All the optimizations using this function assume integer fields.
>       There are problems with FP fields since the type_for_size call
> @@ -4664,13 +4704,55 @@ decode_field_reference (location_t loc, tree *exp_, HOST_WIDE_INT *pbitsize,
>         return NULL_TREE;
>      }
>
> +  if (xor_which)
> +    {
> +      gcc_checking_assert (TREE_CODE (exp) == BIT_XOR_EXPR);
> +      exp = TREE_OPERAND (exp, xor_which - 1);
> +      STRIP_NOPS (exp);
> +    }
> +
> +  if (CONVERT_EXPR_P (exp)
> +      || TREE_CODE (exp) == NON_LVALUE_EXPR)
> +    {
> +      if (!outer_type)
> +       outer_type = TREE_TYPE (exp);
> +      exp = TREE_OPERAND (exp, 0);
> +      STRIP_NOPS (exp);
> +    }
> +
> +  if (TREE_CODE (exp) == RSHIFT_EXPR
> +      && TREE_CODE (TREE_OPERAND (exp, 1)) == INTEGER_CST
> +      && tree_fits_shwi_p (TREE_OPERAND (exp, 1)))
> +    {
> +      tree shift = TREE_OPERAND (exp, 1);
> +      STRIP_NOPS (shift);
> +      shiftrt = tree_to_shwi (shift);
> +      if (shiftrt > 0)
> +       {
> +         exp = TREE_OPERAND (exp, 0);
> +         STRIP_NOPS (exp);
> +       }
> +      else
> +       shiftrt = 0;
> +    }
> +
> +  if (CONVERT_EXPR_P (exp)
> +      || TREE_CODE (exp) == NON_LVALUE_EXPR)
> +    {
> +      if (!outer_type)
> +       outer_type = TREE_TYPE (exp);
> +      exp = TREE_OPERAND (exp, 0);
> +      STRIP_NOPS (exp);
> +    }
> +
>    poly_int64 poly_bitsize, poly_bitpos;
>    inner = get_inner_reference (exp, &poly_bitsize, &poly_bitpos, &offset,
>                                pmode, punsignedp, preversep, pvolatilep);
> +
>    if ((inner == exp && and_mask == 0)
>        || !poly_bitsize.is_constant (pbitsize)
>        || !poly_bitpos.is_constant (pbitpos)
> -      || *pbitsize < 0
> +      || *pbitsize <= shiftrt
>        || offset != 0
>        || TREE_CODE (inner) == PLACEHOLDER_EXPR
>        /* Reject out-of-bound accesses (PR79731).  */
> @@ -4679,6 +4761,21 @@ decode_field_reference (location_t loc, tree *exp_, HOST_WIDE_INT *pbitsize,
>                                *pbitpos + *pbitsize) < 0))
>      return NULL_TREE;
>
> +  if (shiftrt)
> +    {
> +      if (!*preversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
> +       *pbitpos += shiftrt;
> +      *pbitsize -= shiftrt;
> +    }
> +
> +  if (outer_type && *pbitsize > TYPE_PRECISION (outer_type))
> +    {
> +      HOST_WIDE_INT excess = *pbitsize - TYPE_PRECISION (outer_type);
> +      if (*preversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
> +       *pbitpos += excess;
> +      *pbitsize -= excess;
> +    }
> +
>    unsigned_type = lang_hooks.types.type_for_size (*pbitsize, 1);
>    if (unsigned_type == NULL_TREE)
>      return NULL_TREE;
> @@ -4709,27 +4806,6 @@ decode_field_reference (location_t loc, tree *exp_, HOST_WIDE_INT *pbitsize,
>    return inner;
>  }
>
> -/* Return nonzero if MASK represents a mask of SIZE ones in the low-order
> -   bit positions and MASK is SIGNED.  */
> -
> -static bool
> -all_ones_mask_p (const_tree mask, unsigned int size)
> -{
> -  tree type = TREE_TYPE (mask);
> -  unsigned int precision = TYPE_PRECISION (type);
> -
> -  /* If this function returns true when the type of the mask is
> -     UNSIGNED, then there will be errors.  In particular see
> -     gcc.c-torture/execute/990326-1.c.  There does not appear to be
> -     any documentation paper trail as to why this is so.  But the pre
> -     wide-int worked with that restriction and it has been preserved
> -     here.  */
> -  if (size > precision || TYPE_SIGN (type) == UNSIGNED)
> -    return false;
> -
> -  return wi::mask (size, false, precision) == wi::to_wide (mask);
> -}
> -
>  /* Subroutine for fold: determine if VAL is the INTEGER_CONST that
>     represents the sign bit of EXP's type.  If EXP represents a sign
>     or zero extension, also test VAL against the unextended type.
> @@ -6120,6 +6196,139 @@ merge_truthop_with_opposite_arm (location_t loc, tree op, tree cmpop,
>    return NULL_TREE;
>  }
>
> +/* Return the one bitpos within bit extents L or R that is at an
> +   ALIGN-bit alignment boundary, or -1 if there is more than one such
> +   boundary, if there isn't any, or if there is any such boundary
> +   between the extents.  L and R are given by bitpos and bitsize.  If
> +   it doesn't return -1, there are two consecutive ALIGN-bit words
> +   that contain both extents, and at least one of the extents
> +   straddles across the returned alignment boundary.  */
> +static inline HOST_WIDE_INT
> +compute_split_boundary_from_align (HOST_WIDE_INT align,
> +                                  HOST_WIDE_INT l_bitpos,
> +                                  HOST_WIDE_INT l_bitsize,
> +                                  HOST_WIDE_INT r_bitpos,
> +                                  HOST_WIDE_INT r_bitsize)
> +{
> +  HOST_WIDE_INT amask = ~(align - 1);
> +
> +  HOST_WIDE_INT first_bit = MIN (l_bitpos, r_bitpos);
> +  HOST_WIDE_INT end_bit = MAX (l_bitpos + l_bitsize, r_bitpos + r_bitsize);
> +
> +  HOST_WIDE_INT boundary = (end_bit - 1) & amask;
> +
> +  /* Make sure we're crossing no more than one alignment boundary.
> +
> +     ??? We don't have logic to recombine loads of two adjacent
> +     fields that each crosses a different alignment boundary, so
> +     as to load the middle word only once, if other words can't be
> +     otherwise recombined.  */
> +  if (boundary - first_bit > align)
> +    return -1;
> +
> +  HOST_WIDE_INT l_start_word = l_bitpos & amask;
> +  HOST_WIDE_INT l_end_word = (l_bitpos + l_bitsize - 1) & amask;
> +
> +  HOST_WIDE_INT r_start_word = r_bitpos & amask;
> +  HOST_WIDE_INT r_end_word = (r_bitpos + r_bitsize - 1) & amask;
> +
> +  /* If neither field straddles across an alignment boundary, it's no
> +     use to even try to merge them.  */
> +  if (l_start_word == l_end_word && r_start_word == r_end_word)
> +    return -1;
> +
> +  return boundary;
> +}
> +
> +/* Initialize ln_arg[0] and ln_arg[1] to a pair of newly-created (at
> +   LOC) loads from INNER (from ORIG_INNER), of modes MODE and MODE2,
> +   respectively, starting at BIT_POS, using reversed endianness if
> +   REVERSEP.  Also initialize BITPOS (the starting position of each
> +   part into INNER), BITSIZ (the bit count starting at BITPOS),
> +   TOSHIFT[1] (the amount by which the part and its mask are to be
> +   shifted right to bring its least-significant bit to bit zero) and
> +   SHIFTED (the amount by which the part, by separate loading, has
> +   already been shifted right, but that the mask needs shifting to
> +   match).  */
> +static inline void
> +build_split_load (tree /* out */ ln_arg[2],
> +                 HOST_WIDE_INT /* out */ bitpos[2],
> +                 HOST_WIDE_INT /* out */ bitsiz[2],
> +                 HOST_WIDE_INT /* in[0] out[0..1] */ toshift[2],
> +                 HOST_WIDE_INT /* out */ shifted[2],
> +                 location_t loc, tree inner, tree orig_inner,
> +                 scalar_int_mode mode, scalar_int_mode mode2,
> +                 HOST_WIDE_INT bit_pos, bool reversep)
> +{
> +  bitsiz[0] = GET_MODE_BITSIZE (mode);
> +  bitsiz[1] = GET_MODE_BITSIZE (mode2);
> +
> +  for (int i = 0; i < 2; i++)
> +    {
> +      tree type = lang_hooks.types.type_for_size (bitsiz[i], 1);
> +      bitpos[i] = bit_pos;
> +      ln_arg[i] = make_bit_field_ref (loc, inner, orig_inner,
> +                                     type, bitsiz[i],
> +                                     bit_pos, 1, reversep);
> +      bit_pos += bitsiz[i];
> +    }
> +
> +  toshift[1] = toshift[0];
> +  if (reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
> +    {
> +      shifted[0] = bitsiz[1];
> +      shifted[1] = 0;
> +      toshift[0] = 0;
> +    }
> +  else
> +    {
> +      shifted[1] = bitsiz[0];
> +      shifted[0] = 0;
> +      toshift[1] = 0;
> +    }
> +}
> +
> +/* Make arrangements to split at bit BOUNDARY a single loaded word
> +   (with REVERSEP bit order) LN_ARG[0], to be shifted right by
> +   TOSHIFT[0] to bring the field of interest to the least-significant
> +   bit.  The expectation is that the same loaded word will be
> +   propagated from part 0 to part 1, with just different shifting and
> +   masking to extract both parts.  MASK is not expected to do more
> +   than masking out the bits that belong to the other part.  See
> +   build_split_load for more information on the other fields.  */
> +static inline void
> +reuse_split_load (tree /* in[0] out[1] */ ln_arg[2],
> +                 HOST_WIDE_INT /* in[0] out[1] */ bitpos[2],
> +                 HOST_WIDE_INT /* in[0] out[1] */ bitsiz[2],
> +                 HOST_WIDE_INT /* in[0] out[0..1] */ toshift[2],
> +                 HOST_WIDE_INT /* out */ shifted[2],
> +                 tree /* out */ mask[2],
> +                 HOST_WIDE_INT boundary, bool reversep)
> +{
> +  ln_arg[1] = ln_arg[0];
> +  bitpos[1] = bitpos[0];
> +  bitsiz[1] = bitsiz[0];
> +  shifted[1] = shifted[0] = 0;
> +
> +  tree basemask = build_int_cst_type (TREE_TYPE (ln_arg[0]), -1);
> +
> +  if (reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
> +    {
> +      toshift[1] = toshift[0];
> +      toshift[0] = bitpos[0] + bitsiz[0] - boundary;
> +      mask[0] = const_binop (LSHIFT_EXPR, basemask,
> +                            bitsize_int (toshift[0]));
> +      mask[1] = const_binop (BIT_XOR_EXPR, basemask, mask[0]);
> +    }
> +  else
> +    {
> +      toshift[1] = boundary - bitpos[1];
> +      mask[1] = const_binop (LSHIFT_EXPR, basemask,
> +                            bitsize_int (toshift[1]));
> +      mask[0] = const_binop (BIT_XOR_EXPR, basemask, mask[1]);
> +    }
> +}
> +
>  /* Find ways of folding logical expressions of LHS and RHS:
>     Try to merge two comparisons to the same innermost item.
>     Look for range tests like "ch >= '0' && ch <= '9'".
> @@ -6142,11 +6351,19 @@ merge_truthop_with_opposite_arm (location_t loc, tree op, tree cmpop,
>     TRUTH_TYPE is the type of the logical operand and LHS and RHS are its
>     two operands.
>
> +   SEPARATEP should be NULL if LHS and RHS are adjacent in
> +   CODE-chained compares, and a non-NULL pointer to NULL_TREE
> +   otherwise.  If the "words" accessed by RHS are already accessed by
> +   LHS, this won't matter, but if RHS accesses "words" that LHS
> +   doesn't, then *SEPARATEP will be set to the compares that should
> +   take RHS's place.  By "words" we mean contiguous bits that do not
> +   cross a TYPE_ALIGN boundary of the accessed object's type.
> +
>     We return the simplified tree or 0 if no optimization is possible.  */
>
>  static tree
>  fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type,
> -                   tree lhs, tree rhs)
> +                   tree lhs, tree rhs, tree *separatep)
>  {
>    /* If this is the "or" of two comparisons, we can do something if
>       the comparisons are NE_EXPR.  If this is the "and", we can do something
> @@ -6157,6 +6374,7 @@ fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type,
>       convert EQ_EXPR to NE_EXPR so we need not reject the "wrong"
>       comparison for one-bit fields.  */
>
> +  enum tree_code orig_code = code;
>    enum tree_code wanted_code;
>    enum tree_code lcode, rcode;
>    tree ll_arg, lr_arg, rl_arg, rr_arg;
> @@ -6168,13 +6386,16 @@ fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type,
>    int ll_unsignedp, lr_unsignedp, rl_unsignedp, rr_unsignedp;
>    int ll_reversep, lr_reversep, rl_reversep, rr_reversep;
>    machine_mode ll_mode, lr_mode, rl_mode, rr_mode;
> -  scalar_int_mode lnmode, rnmode;
> +  scalar_int_mode lnmode, lnmode2, rnmode;
>    tree ll_mask, lr_mask, rl_mask, rr_mask;
>    tree ll_and_mask, lr_and_mask, rl_and_mask, rr_and_mask;
>    tree l_const, r_const;
>    tree lntype, rntype, result;
>    HOST_WIDE_INT first_bit, end_bit;
>    int volatilep;
> +  bool l_split_load;
> +
> +  gcc_checking_assert (!separatep || !*separatep);
>
>    /* Start by getting the comparison codes.  Fail if anything is volatile.
>       If one operand is a BIT_AND_EXPR with the constant one, treat it as if
> @@ -6202,7 +6423,116 @@ fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type,
>
>    if (TREE_CODE_CLASS (lcode) != tcc_comparison
>        || TREE_CODE_CLASS (rcode) != tcc_comparison)
> -    return 0;
> +    {
> +      tree separate = NULL;
> +
> +      /* Check for the possibility of merging component references.
> +        If any of our operands is another similar operation, recurse
> +        to try to merge individual operands, but avoiding double
> +        recursion: recurse to each leaf of LHS, and from there to
> +        each leaf of RHS, but don't bother recursing into LHS if RHS
> +        is neither a comparison nor a compound expr, nor into RHS if
> +        the LHS leaf isn't a comparison.  In case of no successful
> +        merging, recursion depth is limited to the sum of the depths
> +        of LHS and RHS, and the non-recursing code below will run no
> +        more times than the product of the leaf counts of LHS and
> +        RHS.  If there is a successful merge, we (recursively)
> +        further attempt to fold the result, so recursion depth and
> +        merge attempts are harder to compute.  */
> +      if (TREE_CODE (lhs) == code && TREE_TYPE (lhs) == truth_type
> +         && (TREE_CODE_CLASS (rcode) == tcc_comparison
> +             || (TREE_CODE (rhs) == code && TREE_TYPE (rhs) == truth_type)))
> +       {
> +         if ((result = fold_truth_andor_1 (loc, code, truth_type,
> +                                           TREE_OPERAND (lhs, 1), rhs,
> +                                           separatep
> +                                           ? separatep
> +                                           : NULL)) != 0)
> +           {
> +             /* We have combined the latter part of LHS with RHS.  If
> +                they were separate, the recursion already placed any
> +                remains of RHS in *SEPARATEP, otherwise they are all
> +                in RESULT, so we just have to prepend to result the
> +                former part of LHS.  */
> +             result = fold_build2_loc (loc, code, truth_type,
> +                                       TREE_OPERAND (lhs, 0), result);
> +             return result;
> +           }
> +         if ((result = fold_truth_andor_1 (loc, code, truth_type,
> +                                           TREE_OPERAND (lhs, 0), rhs,
> +                                           separatep
> +                                           ? separatep
> +                                           : &separate)) != 0)
> +           {
> +             /* We have combined the former part of LHS with RHS.  If
> +                they were separate, the recursive call will have
> +                placed remnants of RHS in *SEPARATEP.  If they
> +                weren't, they will be in SEPARATE instead.  Append
> +                the latter part of LHS to the result, and then any
> +                remnants of RHS that we haven't passed on to the
> +                caller.  */
> +             result = fold_build2_loc (loc, code, truth_type,
> +                                       result, TREE_OPERAND (lhs, 1));
> +             if (separate)
> +               result = fold_build2_loc (loc, code, truth_type,
> +                                         result, separate);
> +             return result;
> +           }
> +       }
> +      else if (TREE_CODE_CLASS (lcode) == tcc_comparison
> +              && TREE_CODE (rhs) == code && TREE_TYPE (rhs) == truth_type)
> +       {
> +         if ((result = fold_truth_andor_1 (loc, code, truth_type,
> +                                           lhs, TREE_OPERAND (rhs, 0),
> +                                           separatep
> +                                           ? &separate
> +                                           : NULL)) != 0)
> +           {
> +             /* We have combined LHS with the former part of RHS.  If
> +                they were separate, have any remnants of RHS placed
> +                in separate, so that we can combine them with the
> +                latter part of RHS, and then send them back for the
> +                caller to handle.  If they were adjacent, we can just
> +                append the latter part of RHS to the RESULT.  */
> +             if (!separate)
> +               separate = TREE_OPERAND (rhs, 1);
> +             else
> +               separate = fold_build2_loc (loc, code, truth_type,
> +                                           separate, TREE_OPERAND (rhs, 1));
> +             if (separatep)
> +               *separatep = separate;
> +             else
> +               result = fold_build2_loc (loc, code, truth_type,
> +                                         result, separate);
> +             return result;
> +           }
> +         if ((result = fold_truth_andor_1 (loc, code, truth_type,
> +                                           lhs, TREE_OPERAND (rhs, 1),
> +                                           &separate)) != 0)
> +           {
> +             /* We have combined LHS with the latter part of RHS.
> +                They're definitely not adjacent, so we get the
> +                remains of RHS in SEPARATE, and then prepend the
> +                former part of RHS to it.  If LHS and RHS were
> +                already separate to begin with, we leave the remnants
> +                of RHS for the caller to deal with, otherwise we
> +                append them to the RESULT.  */
> +             if (!separate)
> +               separate = TREE_OPERAND (rhs, 0);
> +             else
> +               separate = fold_build2_loc (loc, code, truth_type,
> +                                           TREE_OPERAND (rhs, 0), separate);
> +             if (separatep)
> +               *separatep = separate;
> +             else
> +               result = fold_build2_loc (loc, code, truth_type,
> +                                         result, separate);
> +             return result;
> +           }
> +       }
> +
> +      return 0;
> +    }
>
>    ll_arg = TREE_OPERAND (lhs, 0);
>    lr_arg = TREE_OPERAND (lhs, 1);
> @@ -6278,22 +6608,24 @@ fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type,
>
>    ll_reversep = lr_reversep = rl_reversep = rr_reversep = 0;
>    volatilep = 0;
> +  int l_xor = prepare_xor (ll_arg, &lr_arg);
>    ll_inner = decode_field_reference (loc, &ll_arg,
>                                      &ll_bitsize, &ll_bitpos, &ll_mode,
>                                      &ll_unsignedp, &ll_reversep, &volatilep,
> -                                    &ll_mask, &ll_and_mask);
> +                                    &ll_mask, &ll_and_mask, l_xor);
>    lr_inner = decode_field_reference (loc, &lr_arg,
>                                      &lr_bitsize, &lr_bitpos, &lr_mode,
>                                      &lr_unsignedp, &lr_reversep, &volatilep,
> -                                    &lr_mask, &lr_and_mask);
> +                                    &lr_mask, &lr_and_mask, 2 * l_xor);
> +  int r_xor = prepare_xor (rl_arg, &rr_arg);
>    rl_inner = decode_field_reference (loc, &rl_arg,
>                                      &rl_bitsize, &rl_bitpos, &rl_mode,
>                                      &rl_unsignedp, &rl_reversep, &volatilep,
> -                                    &rl_mask, &rl_and_mask);
> +                                    &rl_mask, &rl_and_mask, r_xor);
>    rr_inner = decode_field_reference (loc, &rr_arg,
>                                      &rr_bitsize, &rr_bitpos, &rr_mode,
>                                      &rr_unsignedp, &rr_reversep, &volatilep,
> -                                    &rr_mask, &rr_and_mask);
> +                                    &rr_mask, &rr_and_mask, 2 * r_xor);
>
>    /* It must be true that the inner operation on the lhs of each
>       comparison must be the same if we are to be able to do anything.
> @@ -6349,6 +6681,72 @@ fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type,
>         return 0;
>      }
>
> +  /* This will be bumped to 2 if any of the field pairs crosses an
> +     alignment boundary, so the merged compare has to be done in two
> +     parts.  */
> +  int parts = 1;
> +  /* Set to true if the second combined compare should come first,
> +     e.g., because the second original compare accesses a word that
> +     the first one doesn't, and the combined compares access those in
> +     cmp[0].  */
> +  bool first1 = false;
> +  /* Set to true if the first original compare is not the one being
> +     split.  */
> +  bool maybe_separate = false;
> +
> +  /* The following 2-dimensional arrays use the first index to
> +     identify left(0)- vs right(1)-hand compare operands, and the
> +     second one to identify merged compare parts.  */
> +  /* The memory loads or constants to be compared.  */
> +  tree ld_arg[2][2];
> +  /* The first bit of the corresponding inner object that the
> +     corresponding LD_ARG covers.  */
> +  HOST_WIDE_INT bitpos[2][2];
> +  /* The bit count starting at BITPOS that the corresponding LD_ARG
> +     covers.  */
> +  HOST_WIDE_INT bitsiz[2][2];
> +  /* The number of bits by which LD_ARG has already been shifted
> +     right, WRT mask.  */
> +  HOST_WIDE_INT shifted[2][2];
> +  /* The number of bits by which both LD_ARG and MASK need shifting to
> +     bring its least-significant bit to bit zero.  */
> +  HOST_WIDE_INT toshift[2][2];
> +  /* An additional mask to be applied to LD_ARG, to remove any bits
> +     that may have been loaded for use in another compare, but that
> +     don't belong in the corresponding compare.  */
> +  tree xmask[2][2] = {};
> +
> +  /* The combined compare or compares.  */
> +  tree cmp[2];
> +
> +  /* Consider we're comparing two non-contiguous fields of packed
> +     structs, both aligned at 32-bit boundaries:
> +
> +     ll_arg: an 8-bit field at offset 0
> +     lr_arg: a 16-bit field at offset 2
> +
> +     rl_arg: an 8-bit field at offset 1
> +     rr_arg: a 16-bit field at offset 3
> +
> +     We'll have r_split_load, because rr_arg straddles across an
> +     alignment boundary.
> +
> +     We'll want to have:
> +
> +     bitpos  = { {  0,  0 }, {  0, 32 } }
> +     bitsiz  = { { 32, 32 }, { 32,  8 } }
> +
> +     And, for little-endian:
> +
> +     shifted = { {  0,  0 }, {  0, 32 } }
> +     toshift = { {  0, 24 }, {  0,  0 } }
> +
> +     Or, for big-endian:
> +
> +     shifted = { {  0,  0 }, {  8,  0 } }
> +     toshift = { {  8,  0 }, {  0,  0 } }
> +  */
> +
>    /* See if we can find a mode that contains both fields being compared on
>       the left.  If we can't, fail.  Otherwise, update all constants and masks
>       to be relative to a field of that size.  */
> @@ -6357,11 +6755,41 @@ fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type,
>    if (!get_best_mode (end_bit - first_bit, first_bit, 0, 0,
>                       TYPE_ALIGN (TREE_TYPE (ll_inner)), BITS_PER_WORD,
>                       volatilep, &lnmode))
> -    return 0;
> +    {
> +      /* Consider the possibility of recombining loads if any of the
> +        fields straddles across an alignment boundary, so that either
> +        part can be loaded along with the other field.  */
> +      HOST_WIDE_INT align = TYPE_ALIGN (TREE_TYPE (ll_inner));
> +      HOST_WIDE_INT boundary = compute_split_boundary_from_align
> +       (align, ll_bitpos, ll_bitsize, rl_bitpos, rl_bitsize);
> +
> +      if (boundary < 0
> +         || !get_best_mode (boundary - first_bit, first_bit, 0, 0,
> +                            align, BITS_PER_WORD, volatilep, &lnmode)
> +         || !get_best_mode (end_bit - boundary, boundary, 0, 0,
> +                            align, BITS_PER_WORD, volatilep, &lnmode2))
> +       return 0;
> +
> +      l_split_load = true;
> +      parts = 2;
> +      if (ll_bitpos >= boundary)
> +       maybe_separate = first1 = true;
> +      else if (ll_bitpos + ll_bitsize <= boundary)
> +       maybe_separate = true;
> +    }
> +  else
> +    l_split_load = false;
>
>    lnbitsize = GET_MODE_BITSIZE (lnmode);
>    lnbitpos = first_bit & ~ (lnbitsize - 1);
> +  if (l_split_load)
> +    lnbitsize += GET_MODE_BITSIZE (lnmode2);
>    lntype = lang_hooks.types.type_for_size (lnbitsize, 1);
> +  if (!lntype)
> +    {
> +      gcc_checking_assert (l_split_load);
> +      lntype = build_nonstandard_integer_type (lnbitsize, 1);
> +    }
>    xll_bitpos = ll_bitpos - lnbitpos, xrl_bitpos = rl_bitpos - lnbitpos;
>
>    if (ll_reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
> @@ -6414,19 +6842,58 @@ fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type,
>           || ll_reversep != lr_reversep
>           /* Make sure the two fields on the right
>              correspond to the left without being swapped.  */
> -         || ll_bitpos - rl_bitpos != lr_bitpos - rr_bitpos)
> +         || ll_bitpos - rl_bitpos != lr_bitpos - rr_bitpos
> +         || lnbitpos < 0)
>         return 0;
>
> +      bool r_split_load;
> +      scalar_int_mode rnmode2;
> +
>        first_bit = MIN (lr_bitpos, rr_bitpos);
>        end_bit = MAX (lr_bitpos + lr_bitsize, rr_bitpos + rr_bitsize);
>        if (!get_best_mode (end_bit - first_bit, first_bit, 0, 0,
>                           TYPE_ALIGN (TREE_TYPE (lr_inner)), BITS_PER_WORD,
>                           volatilep, &rnmode))
> -       return 0;
> +       {
> +         /* Consider the possibility of recombining loads if any of the
> +            fields straddles across an alignment boundary, so that either
> +            part can be loaded along with the other field.  */
> +         HOST_WIDE_INT align = TYPE_ALIGN (TREE_TYPE (lr_inner));
> +         HOST_WIDE_INT boundary = compute_split_boundary_from_align
> +           (align, lr_bitpos, lr_bitsize, rr_bitpos, rr_bitsize);
> +
> +         if (boundary < 0
> +             /* If we're to split both, make sure the split point is
> +                the same.  */
> +             || (l_split_load
> +                 && (boundary - lr_bitpos
> +                     != (lnbitpos + GET_MODE_BITSIZE (lnmode)) - ll_bitpos))
> +             || !get_best_mode (boundary - first_bit, first_bit, 0, 0,
> +                                align, BITS_PER_WORD, volatilep, &rnmode)
> +             || !get_best_mode (end_bit - boundary, boundary, 0, 0,
> +                                align, BITS_PER_WORD, volatilep, &rnmode2))
> +           return 0;
> +
> +         r_split_load = true;
> +         parts = 2;
> +         if (lr_bitpos >= boundary)
> +           maybe_separate = first1 = true;
> +         else if (lr_bitpos + lr_bitsize <= boundary)
> +           maybe_separate = true;
> +       }
> +      else
> +       r_split_load = false;
>
>        rnbitsize = GET_MODE_BITSIZE (rnmode);
>        rnbitpos = first_bit & ~ (rnbitsize - 1);
> +      if (r_split_load)
> +       rnbitsize += GET_MODE_BITSIZE (rnmode2);
>        rntype = lang_hooks.types.type_for_size (rnbitsize, 1);
> +      if (!rntype)
> +       {
> +         gcc_checking_assert (r_split_load);
> +         rntype = build_nonstandard_integer_type (rnbitsize, 1);
> +       }
>        xlr_bitpos = lr_bitpos - rnbitpos, xrr_bitpos = rr_bitpos - rnbitpos;
>
>        if (lr_reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
> @@ -6442,133 +6909,190 @@ fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type,
>                                                             rntype, rr_mask),
>                              size_int (xrr_bitpos));
>
> -      /* Make a mask that corresponds to both fields being compared.
> -        Do this for both items being compared.  If the operands are the
> -        same size and the bits being compared are in the same position
> -        then we can do this by masking both and comparing the masked
> -        results.  */
> -      ll_mask = const_binop (BIT_IOR_EXPR, ll_mask, rl_mask);
>        lr_mask = const_binop (BIT_IOR_EXPR, lr_mask, rr_mask);
> -      if (lnbitsize == rnbitsize
> -         && xll_bitpos == xlr_bitpos
> -         && lnbitpos >= 0
> -         && rnbitpos >= 0)
> -       {
> -         lhs = make_bit_field_ref (loc, ll_inner, ll_arg,
> -                                   lntype, lnbitsize, lnbitpos,
> -                                   ll_unsignedp || rl_unsignedp, ll_reversep);
> -         if (! all_ones_mask_p (ll_mask, lnbitsize))
> -           lhs = build2 (BIT_AND_EXPR, lntype, lhs, ll_mask);
> -
> -         rhs = make_bit_field_ref (loc, lr_inner, lr_arg,
> -                                   rntype, rnbitsize, rnbitpos,
> -                                   lr_unsignedp || rr_unsignedp, lr_reversep);
> -         if (! all_ones_mask_p (lr_mask, rnbitsize))
> -           rhs = build2 (BIT_AND_EXPR, rntype, rhs, lr_mask);
> -
> -         return build2_loc (loc, wanted_code, truth_type, lhs, rhs);
> -       }
> -
> -      /* There is still another way we can do something:  If both pairs of
> -        fields being compared are adjacent, we may be able to make a wider
> -        field containing them both.
> -
> -        Note that we still must mask the lhs/rhs expressions.  Furthermore,
> -        the mask must be shifted to account for the shift done by
> -        make_bit_field_ref.  */
> -      if (((ll_bitsize + ll_bitpos == rl_bitpos
> -           && lr_bitsize + lr_bitpos == rr_bitpos)
> -          || (ll_bitpos == rl_bitpos + rl_bitsize
> -              && lr_bitpos == rr_bitpos + rr_bitsize))
> -         && ll_bitpos >= 0
> -         && rl_bitpos >= 0
> -         && lr_bitpos >= 0
> -         && rr_bitpos >= 0)
> -       {
> -         tree type;
> -
> -         lhs = make_bit_field_ref (loc, ll_inner, ll_arg, lntype,
> -                                   ll_bitsize + rl_bitsize,
> -                                   MIN (ll_bitpos, rl_bitpos),
> -                                   ll_unsignedp, ll_reversep);
> -         rhs = make_bit_field_ref (loc, lr_inner, lr_arg, rntype,
> -                                   lr_bitsize + rr_bitsize,
> -                                   MIN (lr_bitpos, rr_bitpos),
> -                                   lr_unsignedp, lr_reversep);
> -
> -         ll_mask = const_binop (RSHIFT_EXPR, ll_mask,
> -                                size_int (MIN (xll_bitpos, xrl_bitpos)));
> -         lr_mask = const_binop (RSHIFT_EXPR, lr_mask,
> -                                size_int (MIN (xlr_bitpos, xrr_bitpos)));
> -
> -         /* Convert to the smaller type before masking out unwanted bits.  */
> -         type = lntype;
> -         if (lntype != rntype)
> +
> +      toshift[1][0] = MIN (xlr_bitpos, xrr_bitpos);
> +      shifted[1][0] = 0;
> +
> +      if (!r_split_load)
> +       {
> +         bitpos[1][0] = rnbitpos;
> +         bitsiz[1][0] = rnbitsize;
> +         ld_arg[1][0] = make_bit_field_ref (loc, lr_inner, lr_arg,
> +                                            rntype, rnbitsize, rnbitpos,
> +                                            lr_unsignedp || rr_unsignedp,
> +                                            lr_reversep);
> +       }
> +
> +      if (parts > 1)
> +       {
> +         if (r_split_load)
> +           build_split_load (ld_arg[1], bitpos[1], bitsiz[1], toshift[1],
> +                             shifted[1], loc, lr_inner, lr_arg,
> +                             rnmode, rnmode2, rnbitpos, lr_reversep);
> +         else
> +           reuse_split_load (ld_arg[1], bitpos[1], bitsiz[1], toshift[1],
> +                             shifted[1], xmask[1],
> +                             lnbitpos + GET_MODE_BITSIZE (lnmode)
> +                             - ll_bitpos + lr_bitpos, lr_reversep);
> +       }
> +    }
> +  else
> +    {
> +      /* Handle the case of comparisons with constants.  If there is
> +        something in common between the masks, those bits of the
> +        constants must be the same.  If not, the condition is always
> +        false.  Test for this to avoid generating incorrect code
> +        below.  */
> +      result = const_binop (BIT_AND_EXPR, ll_mask, rl_mask);
> +      if (! integer_zerop (result)
> +         && simple_cst_equal (const_binop (BIT_AND_EXPR,
> +                                           result, l_const),
> +                              const_binop (BIT_AND_EXPR,
> +                                           result, r_const)) != 1)
> +       {
> +         if (wanted_code == NE_EXPR)
>             {
> -             if (lnbitsize > rnbitsize)
> -               {
> -                 lhs = fold_convert_loc (loc, rntype, lhs);
> -                 ll_mask = fold_convert_loc (loc, rntype, ll_mask);
> -                 type = rntype;
> -               }
> -             else if (lnbitsize < rnbitsize)
> -               {
> -                 rhs = fold_convert_loc (loc, lntype, rhs);
> -                 lr_mask = fold_convert_loc (loc, lntype, lr_mask);
> -                 type = lntype;
> -               }
> +             warning (0,
> +                      "%<or%> of unmatched not-equal tests"
> +                      " is always 1");
> +             return constant_boolean_node (true, truth_type);
>             }
> +         else
> +           {
> +             warning (0,
> +                      "%<and%> of mutually exclusive equal-tests"
> +                      " is always 0");
> +             return constant_boolean_node (false, truth_type);
> +           }
> +       }
>
> -         if (! all_ones_mask_p (ll_mask, ll_bitsize + rl_bitsize))
> -           lhs = build2 (BIT_AND_EXPR, type, lhs, ll_mask);
> +      if (lnbitpos < 0)
> +       return 0;
>
> -         if (! all_ones_mask_p (lr_mask, lr_bitsize + rr_bitsize))
> -           rhs = build2 (BIT_AND_EXPR, type, rhs, lr_mask);
> +      /* The constants are combined so as to line up with the loaded
> +        field, so use the same parameters.  */
> +      ld_arg[1][0] = const_binop (BIT_IOR_EXPR, l_const, r_const);
> +      toshift[1][0] = MIN (xll_bitpos, xrl_bitpos);
> +      shifted[1][0] = 0;
> +      bitpos[1][0] = lnbitpos;
> +      bitsiz[1][0] = lnbitsize;
>
> -         return build2_loc (loc, wanted_code, truth_type, lhs, rhs);
> -       }
> +      if (parts > 1)
> +       reuse_split_load (ld_arg[1], bitpos[1], bitsiz[1], toshift[1],
> +                         shifted[1], xmask[1],
> +                         lnbitpos + GET_MODE_BITSIZE (lnmode),
> +                         lr_reversep);
>
> -      return 0;
> +      lr_mask = build_int_cst_type (TREE_TYPE (ld_arg[1][0]), -1);
> +
> +      /* If the compiler thinks this is used uninitialized below, it's
> +        because it can't realize that parts can only be 2 when
> +        comparing with constants if l_split_load is also true.  This
> +        just silences the warning.  */
> +      rnbitpos = 0;
>      }
>
> -  /* Handle the case of comparisons with constants.  If there is something in
> -     common between the masks, those bits of the constants must be the same.
> -     If not, the condition is always false.  Test for this to avoid generating
> -     incorrect code below.  */
> -  result = const_binop (BIT_AND_EXPR, ll_mask, rl_mask);
> -  if (! integer_zerop (result)
> -      && simple_cst_equal (const_binop (BIT_AND_EXPR, result, l_const),
> -                          const_binop (BIT_AND_EXPR, result, r_const)) != 1)
> +  ll_mask = const_binop (BIT_IOR_EXPR, ll_mask, rl_mask);
> +  toshift[0][0] = MIN (xll_bitpos, xrl_bitpos);
> +  shifted[0][0] = 0;
> +
> +  if (!l_split_load)
>      {
> -      if (wanted_code == NE_EXPR)
> +      bitpos[0][0] = lnbitpos;
> +      bitsiz[0][0] = lnbitsize;
> +      ld_arg[0][0] = make_bit_field_ref (loc, ll_inner, ll_arg,
> +                                        lntype, lnbitsize, lnbitpos,
> +                                        ll_unsignedp || rl_unsignedp,
> +                                        ll_reversep);
> +    }
> +
> +  if (parts > 1)
> +    {
> +      if (l_split_load)
> +       build_split_load (ld_arg[0], bitpos[0], bitsiz[0], toshift[0],
> +                         shifted[0], loc, ll_inner, ll_arg,
> +                         lnmode, lnmode2, lnbitpos, ll_reversep);
> +      else
> +       reuse_split_load (ld_arg[0], bitpos[0], bitsiz[0], toshift[0],
> +                         shifted[0], xmask[0],
> +                         rnbitpos + GET_MODE_BITSIZE (rnmode)
> +                         - lr_bitpos + ll_bitpos, ll_reversep);
> +    }
> +
> +  for (int i = 0; i < parts; i++)
> +    {
> +      tree op[2] = { ld_arg[0][i], ld_arg[1][i] };
> +      tree mask[2] = { ll_mask, lr_mask };
> +
> +      for (int j = 0; j < 2; j++)
>         {
> -         warning (0, "%<or%> of unmatched not-equal tests is always 1");
> -         return constant_boolean_node (true, truth_type);
> +         /* Mask out the bits belonging to the other part.  */
> +         if (xmask[j][i])
> +           mask[j] = const_binop (BIT_AND_EXPR, mask[j], xmask[j][i]);
> +
> +         if (shifted[j][i])
> +           {
> +             tree shiftsz = bitsize_int (shifted[j][i]);
> +             mask[j] = const_binop (RSHIFT_EXPR, mask[j], shiftsz);
> +           }
> +         mask[j] = fold_convert_loc (loc, TREE_TYPE (op[j]), mask[j]);
>         }
> -      else
> +
> +      HOST_WIDE_INT shift = (toshift[0][i] - toshift[1][i]);
> +
> +      if (shift)
>         {
> -         warning (0, "%<and%> of mutually exclusive equal-tests is always 0");
> -         return constant_boolean_node (false, truth_type);
> +         int j;
> +         if (shift > 0)
> +           j = 0;
> +         else
> +           {
> +             j = 1;
> +             shift = -shift;
> +           }
> +
> +         tree shiftsz = bitsize_int (shift);
> +         op[j] = fold_build2_loc (loc, RSHIFT_EXPR, TREE_TYPE (op[j]),
> +                                  op[j], shiftsz);
> +         mask[j] = const_binop (RSHIFT_EXPR, mask[j], shiftsz);
>         }
> -    }
>
> -  if (lnbitpos < 0)
> -    return 0;
> +      /* Convert to the smaller type before masking out unwanted
> +        bits.  */
> +      tree type = TREE_TYPE (op[0]);
> +      if (type != TREE_TYPE (op[1]))
> +       {
> +         int j = (TYPE_PRECISION (type)
> +                  < TYPE_PRECISION (TREE_TYPE (op[1])));
> +         if (!j)
> +           type = TREE_TYPE (op[1]);
> +         op[j] = fold_convert_loc (loc, type, op[j]);
> +         mask[j] = fold_convert_loc (loc, type, mask[j]);
> +       }
>
> -  /* Construct the expression we will return.  First get the component
> -     reference we will make.  Unless the mask is all ones the width of
> -     that field, perform the mask operation.  Then compare with the
> -     merged constant.  */
> -  result = make_bit_field_ref (loc, ll_inner, ll_arg,
> -                              lntype, lnbitsize, lnbitpos,
> -                              ll_unsignedp || rl_unsignedp, ll_reversep);
> +      for (int j = 0; j < 2; j++)
> +       if (! integer_all_onesp (mask[j]))
> +         op[j] = build2_loc (loc, BIT_AND_EXPR, type,
> +                             op[j], mask[j]);
>
> -  ll_mask = const_binop (BIT_IOR_EXPR, ll_mask, rl_mask);
> -  if (! all_ones_mask_p (ll_mask, lnbitsize))
> -    result = build2_loc (loc, BIT_AND_EXPR, lntype, result, ll_mask);
> +      cmp[i] = build2_loc (loc, wanted_code, truth_type, op[0], op[1]);
> +    }
> +
> +  if (first1)
> +    std::swap (cmp[0], cmp[1]);
> +
> +  if (parts == 1)
> +    result = cmp[0];
> +  else if (!separatep || !maybe_separate)
> +    result = build2_loc (loc, orig_code, truth_type, cmp[0], cmp[1]);
> +  else
> +    {
> +      result = cmp[0];
> +      *separatep = cmp[1];
> +    }
>
> -  return build2_loc (loc, wanted_code, truth_type, result,
> -                    const_binop (BIT_IOR_EXPR, l_const, r_const));
> +  return result;
>  }
>
>  /* T is an integer expression that is being multiplied, divided, or taken a
> @@ -9166,15 +9690,7 @@ fold_truth_andor (location_t loc, enum tree_code code, tree type,
>         return fold_build2_loc (loc, code, type, arg0, tem);
>      }
>
> -  /* Check for the possibility of merging component references.  If our
> -     lhs is another similar operation, try to merge its rhs with our
> -     rhs.  Then try to merge our lhs and rhs.  */
> -  if (TREE_CODE (arg0) == code
> -      && (tem = fold_truth_andor_1 (loc, code, type,
> -                                   TREE_OPERAND (arg0, 1), arg1)) != 0)
> -    return fold_build2_loc (loc, code, type, TREE_OPERAND (arg0, 0), tem);
> -
> -  if ((tem = fold_truth_andor_1 (loc, code, type, arg0, arg1)) != 0)
> +  if ((tem = fold_truth_andor_1 (loc, code, type, arg0, arg1, NULL)) != 0)
>      return tem;
>
>    bool logical_op_non_short_circuit = LOGICAL_OP_NON_SHORT_CIRCUIT;
> diff --git a/gcc/testsuite/gcc.dg/field-merge-1.c b/gcc/testsuite/gcc.dg/field-merge-1.c
> new file mode 100644
> index 00000000..1818e104
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/field-merge-1.c
> @@ -0,0 +1,64 @@
> +/* { dg-do run } */
> +/* { dg-options "-O -save-temps -fdump-tree-optimized" } */
> +
> +/* Check that field loads compared with constants are merged, even if
> +   tested out of order, and when fields straddle across alignment
> +   boundaries.  */
> +
> +struct TL {
> +  unsigned char p;
> +  unsigned int a;
> +  unsigned char q;
> +  unsigned int b;
> +  unsigned char r;
> +  unsigned int c;
> +  unsigned char s;
> +} __attribute__ ((packed, aligned (4), scalar_storage_order ("little-endian")));
> +
> +struct TB {
> +  unsigned char p;
> +  unsigned int a;
> +  unsigned char q;
> +  unsigned int b;
> +  unsigned char r;
> +  unsigned int c;
> +  unsigned char s;
> +} __attribute__ ((packed, aligned (4), scalar_storage_order ("big-endian")));
> +
> +#define vc 0xaa
> +#define vi 0x12345678
> +
> +struct TL vL = { vc, vi, vc, vi, vc, vi, vc };
> +struct TB vB = { vc, vi, vc, vi, vc, vi, vc };
> +
> +void f (void) {
> +  /* Which words of    | vL | vB | */
> +  /* are accessed by   |0123|0123| */
> +  if (0 /* the tests?  |    |    | */
> +      || vL.p != vc /* |*   |    | */
> +      || vB.p != vc /* |    |*   | */
> +      || vL.s != vc /* |   *|    | */
> +      || vB.q != vc /* |    | *  | */
> +      || vL.a != vi /* |^*  |    | */
> +      || vB.b != vi /* |    | ^* | */
> +      || vL.c != vi /* |  *^|    | */
> +      || vB.c != vi /* |    |  ^*| */
> +      || vL.b != vi /* | ^^ |    | */
> +      || vL.q != vc /* | ^  |    | */
> +      || vB.a != vi /* |    |^^  | */
> +      || vB.r != vc /* |    |  ^ | */
> +      || vB.s != vc /* |    |   ^| */
> +      || vL.r != vc /* |  ^ |    | */
> +      )
> +    __builtin_abort ();
> +}
> +
> +int main () {
> +  f ();
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 8 "optimized" } } */
> +/* { dg-final { scan-assembler-not "cmpb" { target { i*86-*-* || x86_64-*-* } } } } */
> +/* { dg-final { scan-assembler-times "cmpl" 8 { target { i*86-*-* || x86_64-*-* } } } } */
> +/* { dg-final { scan-assembler-times "cmpw" 8 { target { powerpc*-*-* || rs6000-*-* } } } } */
> diff --git a/gcc/testsuite/gcc.dg/field-merge-2.c b/gcc/testsuite/gcc.dg/field-merge-2.c
> new file mode 100644
> index 00000000..80c573b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/field-merge-2.c
> @@ -0,0 +1,31 @@
> +/* { dg-do run } */
> +/* { dg-options "-O" } */
> +
> +struct TL {
> +  unsigned short a;
> +  unsigned short b;
> +} __attribute__ ((packed, aligned (8)));
> +
> +struct TB {
> +  unsigned char p;
> +  unsigned short a;
> +  unsigned short b;
> +} __attribute__ ((packed, aligned (8)));
> +
> +#define vc 0xaa
> +
> +struct TL vL = { vc, vc };
> +struct TB vB = { vc, vc, vc };
> +
> +void f (void) {
> +  if (0
> +      || vL.b != vB.b
> +      || vL.a != vB.a
> +      )
> +    __builtin_abort ();
> +}
> +
> +int main () {
> +  f ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/field-merge-3.c b/gcc/testsuite/gcc.dg/field-merge-3.c
> new file mode 100644
> index 00000000..8fdbb9a3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/field-merge-3.c
> @@ -0,0 +1,36 @@
> +/* { dg-do run } */
> +/* { dg-options "-O" } */
> +
> +const int BIG_ENDIAN_P = (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__);
> +
> +struct T1 {
> +  unsigned char p[2];
> +  unsigned short a;
> +  unsigned int z;
> +} __attribute__((__aligned__(8)));
> +
> +struct T2 {
> +  unsigned short p;
> +  unsigned short a;
> +  unsigned int z;
> +} __attribute__((__aligned__(8)));
> +
> +#define vc 0xaa
> +#define vi 0x12345678
> +
> +struct T1 v1 = { { vc + !BIG_ENDIAN_P, vc + BIG_ENDIAN_P }, vc, vi };
> +struct T2 v2 = { (vc << 8) | (vc - 1), vc, vi };
> +
> +void f (void) {
> +  if (0
> +      || v1.p[!BIG_ENDIAN_P] != v2.p >> 8
> +      || v1.a != v2.a
> +      || (v1.z ^ v2.z) & 0xff00ff00 != 0
> +      || (v1.z ^ v2.z) & 0x00ff00ff != 0)
> +    __builtin_abort ();
> +}
> +
> +int main () {
> +  f ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/field-merge-4.c b/gcc/testsuite/gcc.dg/field-merge-4.c
> new file mode 100644
> index 00000000..c629069
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/field-merge-4.c
> @@ -0,0 +1,40 @@
> +/* { dg-do run } */
> +/* { dg-options "-O" } */
> +
> +struct T1 {
> +  unsigned int zn;
> +  unsigned char p;
> +  unsigned char qn;
> +  unsigned short a;
> +  unsigned int z;
> +} __attribute__((__packed__, __aligned__(4)));
> +
> +struct T2 {
> +  unsigned int zn;
> +  unsigned char rn;
> +  unsigned char p;
> +  unsigned char qn;
> +  unsigned short a;
> +  unsigned int z;
> +} __attribute__((__packed__, __aligned__(4)));
> +
> +#define vc 0xaa
> +#define vs 0xccdd
> +#define vi 0x12345678
> +
> +struct T1 v1 = { -1, vc, 1, vs, vi };
> +struct T2 v2 = { -1, 0, vc, 1, vs, vi };
> +
> +void f (void) {
> +  if (0
> +      || v1.p != v2.p
> +      || v1.a != v2.a
> +      || v1.z != v2.z
> +      )
> +    __builtin_abort ();
> +}
> +
> +int main () {
> +  f ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/field-merge-5.c b/gcc/testsuite/gcc.dg/field-merge-5.c
> new file mode 100644
> index 00000000..1580b14bc
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/field-merge-5.c
> @@ -0,0 +1,40 @@
> +/* { dg-do run } */
> +/* { dg-options "-O" } */
> +
> +struct T1 {
> +  unsigned int zn;
> +  unsigned char p;
> +  unsigned char qn;
> +  unsigned short a;
> +  unsigned int z;
> +} __attribute__((__packed__, __aligned__(8)));
> +
> +struct T2 {
> +  unsigned int zn;
> +  unsigned char rn;
> +  unsigned char p;
> +  unsigned char qn;
> +  unsigned short a;
> +  unsigned int z;
> +} __attribute__((__packed__, __aligned__(8)));
> +
> +#define vc 0xaa
> +#define vs 0xccdd
> +#define vi 0x12345678
> +
> +struct T1 v1 = { -1, vc, 1, vs, vi };
> +struct T2 v2 = { -1, 0, vc, 1, vs, vi };
> +
> +void f (void) {
> +  if (0
> +      || v1.p != v2.p
> +      || v1.a != v2.a
> +      || v1.z != v2.z
> +      )
> +    __builtin_abort ();
> +}
> +
> +int main () {
> +  f ();
> +  return 0;
> +}
>
>
> --
> Alexandre Oliva, happy hacker
> https://FSFLA.org/blogs/lxo/
> Free Software Activist
> GNU Toolchain Engineer
Alexandre Oliva Sept. 29, 2020, 7:23 a.m. UTC | #2
On Sep 28, 2020, Richard Biener <richard.guenther@gmail.com> wrote:

> On Fri, Sep 25, 2020 at 3:39 PM Alexandre Oliva <oliva@adacore.com> wrote:

>> This patch introduces various improvements to the logic that merges
>> field compares.

> Sorry for throwing a wrench in here but doing this kind of transform
> during GENERIC folding is considered a bad thing.

Ugh.  Even tree-ifcombine itself relies on tree folding logic to perform
this and various similar sorts of optimizations.

Is the issue just that we don't want to perform this kind of
transformation while still in GENERIC, or that the logic must not even
be there?  I ask because it wouldn't be too hard to disable unwanted
cases of folding while in GENERIC, and to extend it to follow SSA defs
so that ifcombine could perform those optimizations again at an
acceptable spot.

Please help me understand what it is that makes this change, as it is,
undesirable, so that I can figure out how to do it in an acceptable way,
and justify the additional time and effort it will take.


I *think* ifcombine could even be extended so as to reuse the
separate-test logic I put in, by looking for non-immediate dominating
outer conditions for the inner condition.  A further modified version of
fold_truth_andor_1 could then be used to combine the separate tests.


As for reassoc...  It doesn't seem a good fit at all for reassociating
short-circuiting boolean operations, which normally get translated as
multiple basic blocks.
Richard Biener Sept. 29, 2020, 8:37 a.m. UTC | #3
On Tue, Sep 29, 2020 at 9:23 AM Alexandre Oliva <oliva@adacore.com> wrote:
>
> On Sep 28, 2020, Richard Biener <richard.guenther@gmail.com> wrote:
>
> > On Fri, Sep 25, 2020 at 3:39 PM Alexandre Oliva <oliva@adacore.com> wrote:
>
> >> This patch introduces various improvements to the logic that merges
> >> field compares.
>
> > Sorry for throwing a wrench in here but doing this kind of transform
> > during GENERIC folding is considered a bad thing.
>
> Ugh.  Even tree-ifcombine itself relies on tree folding logic to perform
> this and various similar sorts of optimizations.

ifcombine should stop using fold*, yeah - there was a start last year
making maybe_fold_and/or_comparisons rely on match.pd patterns instead,
but it didn't go all the way to disabling the dispatch to fold.  I also
think it will not end up using the simplifications involving loads.

> Is the issue just that we don't want to perform this kind of
> transformation while still in GENERIC, or that the logic must not even
> be there?  I ask because it wouldn't be too hard to disable unwanted
> cases of folding while in GENERIC, and extending it to follow SSA defs
> so that ifcombine would those optimizations back at an acceptable spot.

One of the issues with these "combine two memory ops using $fancy
for branches" transforms is that they are quite premature, making
optimizations like SRA, value-range propagation, jump threading, etc.
very much harder.  Not to mention diagnostics that rely on the IL
being close to the AST (well, I regularly argue those are broken
expectations, so ;))

> Please help me understand what it is that makes this change as it is
> undesirable, so that I can figure out how to do it in an acceptable way,
> and justify the additional time and effort it will take.

The desire is to move the existing and related transforms
(optimize_bit_field_compare) to GIMPLE.  Making them
more powerful makes that task more difficult.

Specifically, your patch seems to introduce splitting of loads
at alignment boundaries, which is even more premature, since some
targets would likely prefer a single unaligned load, and we try not
to deviate too much, depending on target capabilities, in the initial
phase of compilation.  It also seems to be something not suited for
GENERIC, which is, in a way, our high-level AST.

> I *think* ifcombine could even be extended so as to reuse the
> separate-test logic I put in, by looking for non-immediate dominating
> outer conditions for the inner condition.  A further modified version of
> fold_truth_andor_1 could then be used to combine the separate tests.

I think the structure of ifcombine doesn't exactly match what
fold_truth_andor does but since ifcombine first matches
the if "structure" a common worker could be indeed factored out
(but as said above, I see little value in retaining fold_truth_andor
on GENERIC).

>
> As for reassoc...  It doesn't seem a good fit at all for reassociating
> short-circuiting boolean operations, that normally get translated as
> multiple basic blocks.

Yeah - it did accumulate quite a few range-test optimizations, but
I think it doesn't cover multiple BBs.

Richard.

> --
> Alexandre Oliva, happy hacker
> https://FSFLA.org/blogs/lxo/
> Free Software Activist
> GNU Toolchain Engineer
Alexandre Oliva Sept. 29, 2020, 1:07 p.m. UTC | #4
On Sep 29, 2020, Richard Biener <richard.guenther@gmail.com> wrote:

> On Tue, Sep 29, 2020 at 9:23 AM Alexandre Oliva <oliva@adacore.com> wrote:

>> On Sep 28, 2020, Richard Biener <richard.guenther@gmail.com> wrote:

> ifcombine should stop using fold*, yeah

Wow, that's quite a lot of work for no expected improvement in codegen.
I don't expect to be able to justify such an undertaking :-(

> I also think it will not end up using the simplifications using loads.

Yeah, ifcombine's bb_no_side_effects_p gives up on any gimple_vuse in
the inner block.  That won't do when the whole point is to merge loads
from memory.

That seems excessive.  Since we rule out any memory-changing side
effects, I suppose we could get away with checking for volatile operands
there.  Then, adding just a little SSA_DEF chasing, I believe I could
bring all of the fold_truth_andor_1 logic I've worked on into ifcombine
without much difficulty, and then we could do away with at least that
part of fold_truth_andor.
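
Concretely, the per-stmt check I have in mind is roughly the following
(an untested sketch, with a made-up name, just reusing the existing
gimple predicates):

  /* Rough sketch: keep rejecting statements that write memory or have
     volatile operands, but let plain (non-volatile) loads through,
     since those are exactly what we want to be able to merge.  */
  static bool
  bb_stmt_ok_for_merging_p (gimple *stmt)
  {
    if (gimple_has_side_effects (stmt)
	|| gimple_has_volatile_ops (stmt)
	|| gimple_vdef (stmt))
      return false;
    /* A bare gimple_vuse (a load) is acceptable here.  */
    return true;
  }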


> Specifically your patch seems to introduce splitting of loads
> at alignment boundaries

... when there's another compare involving a load from either side of
the crossed alignment boundary.  Even on arches that can do unaligned
loads, the result is no worse, and if there are multiple fields crossing
consecutive alignment boundaries, the codegen and performance difference
can be pretty significant.
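
To make the scenario concrete, consider a simplified variant of the
layouts in the new field-merge tests (hypothetical, just for this
discussion):

  /* With 4-byte alignment, a occupies word 0, b straddles words 0 and
     1, and c lives in word 1, so comparing b takes either an unaligned
     load or two aligned loads whose halves can then be merged with the
     compares of a and c, respectively.  */
  struct S {
    unsigned char a;  /* byte 0, in word 0 */
    unsigned int b;   /* bytes 1-4, straddles the boundary */
    unsigned char c;  /* byte 5, in word 1 */
  } __attribute__ ((packed, aligned (4)));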



>> I *think* ifcombine could even be extended so as to reuse the
>> separate-test logic I put in, by looking for non-immediate dominating
>> outer conditions for the inner condition.  A further modified version of
>> fold_truth_andor_1 could then be used to combine the separate tests.

> I think the structure of ifcombine doesn't exactly match what
> fold_truth_andor does

How so?  AFAICT ifcombine_ifandif deals exactly with the (gimplified
version of the) structure I described in the patch that started the
thread:

  (a.x1 EQNE b.x1)  ANDOR  (a.y1 EQNE b.y1)
Alexandre Oliva Sept. 29, 2020, 4:07 p.m. UTC | #5
On Sep 29, 2020, Alexandre Oliva <oliva@adacore.com> wrote:

> Yeah, ifcombine's bb_no_side_effects_p gives up on any gimple_vuse in
> the inner block.  that won't do when the whole point is to merge loads
> from memory.

> That seems excessive.  Since we rule out any memory-changing side
> effects, I suppose we could get away with checking for volatile operands
> there.  Then, adding just a little SSA_DEF chasing, I believe I could
> bring all of the fold_truth_andor_1 logic I've worked on into ifcombine
> without much difficulty, and then we could do away with at least that
> part of fold_truth_andor.

Confirmed, a very ugly prototype seems to work!
Richard Biener Sept. 30, 2020, 6:25 a.m. UTC | #6
On Tue, Sep 29, 2020 at 3:07 PM Alexandre Oliva <oliva@adacore.com> wrote:
>
> On Sep 29, 2020, Richard Biener <richard.guenther@gmail.com> wrote:
>
> > On Tue, Sep 29, 2020 at 9:23 AM Alexandre Oliva <oliva@adacore.com> wrote:
>
> >> On Sep 28, 2020, Richard Biener <richard.guenther@gmail.com> wrote:
>
> > ifcombine should stop using fold*, yeah
>
> Wow, that's quite a lot of work for no expected improvement in codegen.
> I don't expect to be able to justify such an undertaking :-(
>
> > I also think it will not end up using the simplifications using loads.
>
> Yeah, ifcombine's bb_no_side_effects_p gives up on any gimple_vuse in
> the inner block.  that won't do when the whole point is to merge loads
> from memory.
>
> That seems excessive.  Since we rule out any memory-changing side
> effects, I suppose we could get away with checking for volatile operands
> there.  Then, adding just a little SSA_DEF chasing, I believe I could
> bring all of the fold_truth_andor_1 logic I've worked on into ifcombine
> without much difficulty, and then we could do away with at least that
> part of fold_truth_andor.

The current restrictions were for sure there to make my life easier at
the start, when implementing the pass ;)  Note that you have to watch out
for short-circuited stmts that may trap or invoke undefined behavior
at runtime.

> > Specifically your patch seems to introduce splitting of loads
> > at alignment boundaries
>
> ... when there's another compare involving a load from either side of
> the crossed alignment boundary.  Even on arches that can do unaligned
> loads, the result is no worse, and if there are multiple fields crossing
> consecutive alignment boundaries, the codegen and performance difference
> can be pretty significant.

Ah, OK - I didn't look that closely.

> >> I *think* ifcombine could even be extended so as to reuse the
> >> separate-test logic I put in, by looking for non-immediate dominating
> >> outer conditions for the inner condition.  A further modified version of
> >> fold_truth_andor_1 could then be used to combine the separate tests.
>
> > I think the structure of ifcombine doesn't exactly match what
> > fold_truth_andor does
>
> How so?  AFAICT ifcombine_ifandif deals exactly with the (gimplified
> version of the) structure I described in the patch that started the
> thread:
>
>   (a.x1 EQNE b.x1)  ANDOR  (a.y1 EQNE b.y1)

Indeed.

>
> --
> Alexandre Oliva, happy hacker
> https://FSFLA.org/blogs/lxo/
> Free Software Activist
> GNU Toolchain Engineer
Alexandre Oliva Oct. 2, 2020, 8:43 a.m. UTC | #7
Here's what I got so far, passing regstrap except for field-merge-1.c,
which relies on combining non-adjacent compares, something I haven't
implemented yet.

I had to retain some parts of fold_truth_andor_1 to avoid regressions in
gcc.c-torture/execute/ieee/compare-fp-3.c and gcc.dg/pr81228.c.
gcc.target/i386/pr37248-2.c and gcc.target/i386/pr37248-3.c required
turning sign tests into masks.
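
By turning sign tests into masks I mean the usual equivalence, shown
here as a standalone sketch assuming a 32-bit int (the function is
hypothetical, only for illustration):

  /* x < 0 holds exactly when the sign bit is set, so the sign test can
     be rewritten as a maskable not-equal compare and merged with the
     other field compares.  */
  int
  sign_test_as_mask (int x)
  {
    return (x & 0x80000000u) != 0;  /* same truth value as x < 0 */
  }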

I moved the field-compare-merging bits of fold_truth_andor_1, and
auxiliary functions for it, into fold_truth_andor_maybe_separate, though
the ability to deal with non-adjacent compares is no longer used, at
least for the time being.

I put fold_truth_andor_maybe_separate in gimple-fold.c, and called it in
maybe_fold_and_comparisons, but...  the fact that it may return a tree
that isn't a gimple condexpr, e.g. when there is masking or shifting,
and that requires re-gimplification, suggests moving it to
tree-ssa-ifcombine.c where this can be dealt with, possibly even for
non-adjacent compares.  Right now, there is code in ifcombine to
re-gimplify results from the regular folder, which I've extended, for
the time being, to maybe_fold_and_comparisons results as well.

I've reworked decode_field_reference to seek and "substitute" SSA DEFs
in a way that avoids building new trees, but the test for
substitutability feels incomplete: there's nothing to ensure that vuses
aren't brought from much-earlier blocks, and that there couldn't
possibly be an intervening store.  I suspect I will have to give these
pieces a little more info for it to be able to tell whether the memory
accesses, if moved, would still get the same value.  Is there anything
easily reusable for this sort of test?
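
The kind of hazard I'm worried about, in a hypothetical reduced form
(names made up):

  struct S { unsigned char a, b; } s;

  int
  f (void)
  {
    int t = s.b;  /* earlier load */
    s.b = 42;     /* intervening store */
    if (s.a == 1)
      if (t == 2) /* must keep testing the value loaded before the store */
	return 1;
    return 0;
  }

If we chase t back to the load of s.b and then issue a merged, wider
load at the combined condition, we'd read the post-store value instead.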

As for fields crossing alignment boundaries, the two-tests condition
currently returned by fold_truth_andor_maybe_separate, which ends up
gimplified into a new block, causes the chain of ifcombine optimizations
to stop, something I have yet to investigate.  My plan is to rearrange
ifcombine_ifandif to call fold_truth_andor_maybe_separate directly, and
to handle such resplit conditions by inserting one in each of the
preexisting blocks, to simplify the gimplification and in preparation
for combining non-adjacent compares, if we wish to do that.  Do we?  I
was convinced that it was a safe optimization, because of the absence of
intervening side effects and the undefinedness of intervening trapping
memory accesses, but combining later tests that access a word with
earlier tests that access the same word may cause intervening trapping
accesses to be skipped.  Is this a possibility we should enable or
disable on e.g. a per-language basis, avoid altogether, or ...  any
other suggestions?
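
A hypothetical reduced example of the trapping-access concern (names
made up):

  struct S { unsigned char a, b, c, d; } v;

  int
  g (int *p)
  {
    if (v.a == 1 && *p == 2 && v.c == 3)
      return 1;
    return 0;
  }

Here a and c share an aligned word, so merging their compares evaluates
the c part before *p; when the merged test fails because of c, the
possibly-trapping dereference of p is skipped, whereas the original code
would have performed it whenever v.a == 1.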

Richi, is this sort of what you had in mind, or were you thinking of
something for match.pd or so?  Is match.pd even able to perform
cross-block multi-condition matches?

Any further advice as to my plans above?

Thanks,


diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 0cc80ad..47b5419 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -125,7 +125,6 @@ static tree range_predecessor (tree);
 static tree range_successor (tree);
 static tree fold_range_test (location_t, enum tree_code, tree, tree, tree);
 static tree fold_cond_expr_with_comparison (location_t, tree, tree, tree, tree);
-static tree unextend (tree, int, int, tree);
 static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *);
 static tree extract_muldiv_1 (tree, tree, enum tree_code, tree, bool *);
 static tree fold_binary_op_with_conditional_arg (location_t,
@@ -4353,7 +4352,7 @@ invert_truthvalue_loc (location_t loc, tree arg)
    is the original memory reference used to preserve the alias set of
    the access.  */
 
-static tree
+tree
 make_bit_field_ref (location_t loc, tree inner, tree orig_inner, tree type,
 		    HOST_WIDE_INT bitsize, poly_int64 bitpos,
 		    int unsignedp, int reversep)
@@ -4603,132 +4602,6 @@ optimize_bit_field_compare (location_t loc, enum tree_code code,
   return lhs;
 }
 
-/* Subroutine for fold_truth_andor_1: decode a field reference.
-
-   If EXP is a comparison reference, we return the innermost reference.
-
-   *PBITSIZE is set to the number of bits in the reference, *PBITPOS is
-   set to the starting bit number.
-
-   If the innermost field can be completely contained in a mode-sized
-   unit, *PMODE is set to that mode.  Otherwise, it is set to VOIDmode.
-
-   *PVOLATILEP is set to 1 if the any expression encountered is volatile;
-   otherwise it is not changed.
-
-   *PUNSIGNEDP is set to the signedness of the field.
-
-   *PREVERSEP is set to the storage order of the field.
-
-   *PMASK is set to the mask used.  This is either contained in a
-   BIT_AND_EXPR or derived from the width of the field.
-
-   *PAND_MASK is set to the mask found in a BIT_AND_EXPR, if any.
-
-   Return 0 if this is not a component reference or is one that we can't
-   do anything with.  */
-
-static tree
-decode_field_reference (location_t loc, tree *exp_, HOST_WIDE_INT *pbitsize,
-			HOST_WIDE_INT *pbitpos, machine_mode *pmode,
-			int *punsignedp, int *preversep, int *pvolatilep,
-			tree *pmask, tree *pand_mask)
-{
-  tree exp = *exp_;
-  tree outer_type = 0;
-  tree and_mask = 0;
-  tree mask, inner, offset;
-  tree unsigned_type;
-  unsigned int precision;
-
-  /* All the optimizations using this function assume integer fields.
-     There are problems with FP fields since the type_for_size call
-     below can fail for, e.g., XFmode.  */
-  if (! INTEGRAL_TYPE_P (TREE_TYPE (exp)))
-    return NULL_TREE;
-
-  /* We are interested in the bare arrangement of bits, so strip everything
-     that doesn't affect the machine mode.  However, record the type of the
-     outermost expression if it may matter below.  */
-  if (CONVERT_EXPR_P (exp)
-      || TREE_CODE (exp) == NON_LVALUE_EXPR)
-    outer_type = TREE_TYPE (exp);
-  STRIP_NOPS (exp);
-
-  if (TREE_CODE (exp) == BIT_AND_EXPR)
-    {
-      and_mask = TREE_OPERAND (exp, 1);
-      exp = TREE_OPERAND (exp, 0);
-      STRIP_NOPS (exp); STRIP_NOPS (and_mask);
-      if (TREE_CODE (and_mask) != INTEGER_CST)
-	return NULL_TREE;
-    }
-
-  poly_int64 poly_bitsize, poly_bitpos;
-  inner = get_inner_reference (exp, &poly_bitsize, &poly_bitpos, &offset,
-			       pmode, punsignedp, preversep, pvolatilep);
-  if ((inner == exp && and_mask == 0)
-      || !poly_bitsize.is_constant (pbitsize)
-      || !poly_bitpos.is_constant (pbitpos)
-      || *pbitsize < 0
-      || offset != 0
-      || TREE_CODE (inner) == PLACEHOLDER_EXPR
-      /* Reject out-of-bound accesses (PR79731).  */
-      || (! AGGREGATE_TYPE_P (TREE_TYPE (inner))
-	  && compare_tree_int (TYPE_SIZE (TREE_TYPE (inner)),
-			       *pbitpos + *pbitsize) < 0))
-    return NULL_TREE;
-
-  unsigned_type = lang_hooks.types.type_for_size (*pbitsize, 1);
-  if (unsigned_type == NULL_TREE)
-    return NULL_TREE;
-
-  *exp_ = exp;
-
-  /* If the number of bits in the reference is the same as the bitsize of
-     the outer type, then the outer type gives the signedness. Otherwise
-     (in case of a small bitfield) the signedness is unchanged.  */
-  if (outer_type && *pbitsize == TYPE_PRECISION (outer_type))
-    *punsignedp = TYPE_UNSIGNED (outer_type);
-
-  /* Compute the mask to access the bitfield.  */
-  precision = TYPE_PRECISION (unsigned_type);
-
-  mask = build_int_cst_type (unsigned_type, -1);
-
-  mask = const_binop (LSHIFT_EXPR, mask, size_int (precision - *pbitsize));
-  mask = const_binop (RSHIFT_EXPR, mask, size_int (precision - *pbitsize));
-
-  /* Merge it with the mask we found in the BIT_AND_EXPR, if any.  */
-  if (and_mask != 0)
-    mask = fold_build2_loc (loc, BIT_AND_EXPR, unsigned_type,
-			fold_convert_loc (loc, unsigned_type, and_mask), mask);
-
-  *pmask = mask;
-  *pand_mask = and_mask;
-  return inner;
-}
-
-/* Return nonzero if MASK represents a mask of SIZE ones in the low-order
-   bit positions and MASK is SIGNED.  */
-
-static bool
-all_ones_mask_p (const_tree mask, unsigned int size)
-{
-  tree type = TREE_TYPE (mask);
-  unsigned int precision = TYPE_PRECISION (type);
-
-  /* If this function returns true when the type of the mask is
-     UNSIGNED, then there will be errors.  In particular see
-     gcc.c-torture/execute/990326-1.c.  There does not appear to be
-     any documentation paper trail as to why this is so.  But the pre
-     wide-int worked with that restriction and it has been preserved
-     here.  */
-  if (size > precision || TYPE_SIGN (type) == UNSIGNED)
-    return false;
-
-  return wi::mask (size, false, precision) == wi::to_wide (mask);
-}
 
 /* Subroutine for fold: determine if VAL is the INTEGER_CONST that
    represents the sign bit of EXP's type.  If EXP represents a sign
@@ -6008,48 +5881,6 @@ fold_range_test (location_t loc, enum tree_code code, tree type,
   return 0;
 }
 
-/* Subroutine for fold_truth_andor_1: C is an INTEGER_CST interpreted as a P
-   bit value.  Arrange things so the extra bits will be set to zero if and
-   only if C is signed-extended to its full width.  If MASK is nonzero,
-   it is an INTEGER_CST that should be AND'ed with the extra bits.  */
-
-static tree
-unextend (tree c, int p, int unsignedp, tree mask)
-{
-  tree type = TREE_TYPE (c);
-  int modesize = GET_MODE_BITSIZE (SCALAR_INT_TYPE_MODE (type));
-  tree temp;
-
-  if (p == modesize || unsignedp)
-    return c;
-
-  /* We work by getting just the sign bit into the low-order bit, then
-     into the high-order bit, then sign-extend.  We then XOR that value
-     with C.  */
-  temp = build_int_cst (TREE_TYPE (c),
-			wi::extract_uhwi (wi::to_wide (c), p - 1, 1));
-
-  /* We must use a signed type in order to get an arithmetic right shift.
-     However, we must also avoid introducing accidental overflows, so that
-     a subsequent call to integer_zerop will work.  Hence we must
-     do the type conversion here.  At this point, the constant is either
-     zero or one, and the conversion to a signed type can never overflow.
-     We could get an overflow if this conversion is done anywhere else.  */
-  if (TYPE_UNSIGNED (type))
-    temp = fold_convert (signed_type_for (type), temp);
-
-  temp = const_binop (LSHIFT_EXPR, temp, size_int (modesize - 1));
-  temp = const_binop (RSHIFT_EXPR, temp, size_int (modesize - p - 1));
-  if (mask != 0)
-    temp = const_binop (BIT_AND_EXPR, temp,
-			fold_convert (TREE_TYPE (c), mask));
-  /* If necessary, convert the type back to match the type of C.  */
-  if (TYPE_UNSIGNED (type))
-    temp = fold_convert (type, temp);
-
-  return fold_convert (type, const_binop (BIT_XOR_EXPR, c, temp));
-}
-
 /* For an expression that has the form
      (A && B) || ~B
    or
@@ -6119,20 +5950,13 @@ merge_truthop_with_opposite_arm (location_t loc, tree op, tree cmpop,
 			    lhs, rhs);
   return NULL_TREE;
 }
-
+
 /* Find ways of folding logical expressions of LHS and RHS:
    Try to merge two comparisons to the same innermost item.
    Look for range tests like "ch >= '0' && ch <= '9'".
    Look for combinations of simple terms on machines with expensive branches
    and evaluate the RHS unconditionally.
 
-   For example, if we have p->a == 2 && p->b == 4 and we can make an
-   object large enough to span both A and B, we can do this with a comparison
-   against the object ANDed with the a mask.
-
-   If we have p->a == q->a && p->b == q->b, we may be able to use bit masking
-   operations to do this with one comparison.
-
    We check for both normal comparisons and the BIT_AND_EXPRs made this by
    function and the one above.
 
@@ -6157,24 +5981,9 @@ fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type,
      convert EQ_EXPR to NE_EXPR so we need not reject the "wrong"
      comparison for one-bit fields.  */
 
-  enum tree_code wanted_code;
   enum tree_code lcode, rcode;
   tree ll_arg, lr_arg, rl_arg, rr_arg;
-  tree ll_inner, lr_inner, rl_inner, rr_inner;
-  HOST_WIDE_INT ll_bitsize, ll_bitpos, lr_bitsize, lr_bitpos;
-  HOST_WIDE_INT rl_bitsize, rl_bitpos, rr_bitsize, rr_bitpos;
-  HOST_WIDE_INT xll_bitpos, xlr_bitpos, xrl_bitpos, xrr_bitpos;
-  HOST_WIDE_INT lnbitsize, lnbitpos, rnbitsize, rnbitpos;
-  int ll_unsignedp, lr_unsignedp, rl_unsignedp, rr_unsignedp;
-  int ll_reversep, lr_reversep, rl_reversep, rr_reversep;
-  machine_mode ll_mode, lr_mode, rl_mode, rr_mode;
-  scalar_int_mode lnmode, rnmode;
-  tree ll_mask, lr_mask, rl_mask, rr_mask;
-  tree ll_and_mask, lr_and_mask, rl_and_mask, rr_and_mask;
-  tree l_const, r_const;
-  tree lntype, rntype, result;
-  HOST_WIDE_INT first_bit, end_bit;
-  int volatilep;
+  tree result;
 
   /* Start by getting the comparison codes.  Fail if anything is volatile.
      If one operand is a BIT_AND_EXPR with the constant one, treat it as if
@@ -6269,306 +6078,7 @@ fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type,
 			   build_int_cst (TREE_TYPE (ll_arg), 0));
     }
 
-  /* See if the comparisons can be merged.  Then get all the parameters for
-     each side.  */
-
-  if ((lcode != EQ_EXPR && lcode != NE_EXPR)
-      || (rcode != EQ_EXPR && rcode != NE_EXPR))
-    return 0;
-
-  ll_reversep = lr_reversep = rl_reversep = rr_reversep = 0;
-  volatilep = 0;
-  ll_inner = decode_field_reference (loc, &ll_arg,
-				     &ll_bitsize, &ll_bitpos, &ll_mode,
-				     &ll_unsignedp, &ll_reversep, &volatilep,
-				     &ll_mask, &ll_and_mask);
-  lr_inner = decode_field_reference (loc, &lr_arg,
-				     &lr_bitsize, &lr_bitpos, &lr_mode,
-				     &lr_unsignedp, &lr_reversep, &volatilep,
-				     &lr_mask, &lr_and_mask);
-  rl_inner = decode_field_reference (loc, &rl_arg,
-				     &rl_bitsize, &rl_bitpos, &rl_mode,
-				     &rl_unsignedp, &rl_reversep, &volatilep,
-				     &rl_mask, &rl_and_mask);
-  rr_inner = decode_field_reference (loc, &rr_arg,
-				     &rr_bitsize, &rr_bitpos, &rr_mode,
-				     &rr_unsignedp, &rr_reversep, &volatilep,
-				     &rr_mask, &rr_and_mask);
-
-  /* It must be true that the inner operation on the lhs of each
-     comparison must be the same if we are to be able to do anything.
-     Then see if we have constants.  If not, the same must be true for
-     the rhs's.  */
-  if (volatilep
-      || ll_reversep != rl_reversep
-      || ll_inner == 0 || rl_inner == 0
-      || ! operand_equal_p (ll_inner, rl_inner, 0))
-    return 0;
-
-  if (TREE_CODE (lr_arg) == INTEGER_CST
-      && TREE_CODE (rr_arg) == INTEGER_CST)
-    {
-      l_const = lr_arg, r_const = rr_arg;
-      lr_reversep = ll_reversep;
-    }
-  else if (lr_reversep != rr_reversep
-	   || lr_inner == 0 || rr_inner == 0
-	   || ! operand_equal_p (lr_inner, rr_inner, 0))
-    return 0;
-  else
-    l_const = r_const = 0;
-
-  /* If either comparison code is not correct for our logical operation,
-     fail.  However, we can convert a one-bit comparison against zero into
-     the opposite comparison against that bit being set in the field.  */
-
-  wanted_code = (code == TRUTH_AND_EXPR ? EQ_EXPR : NE_EXPR);
-  if (lcode != wanted_code)
-    {
-      if (l_const && integer_zerop (l_const) && integer_pow2p (ll_mask))
-	{
-	  /* Make the left operand unsigned, since we are only interested
-	     in the value of one bit.  Otherwise we are doing the wrong
-	     thing below.  */
-	  ll_unsignedp = 1;
-	  l_const = ll_mask;
-	}
-      else
-	return 0;
-    }
-
-  /* This is analogous to the code for l_const above.  */
-  if (rcode != wanted_code)
-    {
-      if (r_const && integer_zerop (r_const) && integer_pow2p (rl_mask))
-	{
-	  rl_unsignedp = 1;
-	  r_const = rl_mask;
-	}
-      else
-	return 0;
-    }
-
-  /* See if we can find a mode that contains both fields being compared on
-     the left.  If we can't, fail.  Otherwise, update all constants and masks
-     to be relative to a field of that size.  */
-  first_bit = MIN (ll_bitpos, rl_bitpos);
-  end_bit = MAX (ll_bitpos + ll_bitsize, rl_bitpos + rl_bitsize);
-  if (!get_best_mode (end_bit - first_bit, first_bit, 0, 0,
-		      TYPE_ALIGN (TREE_TYPE (ll_inner)), BITS_PER_WORD,
-		      volatilep, &lnmode))
-    return 0;
-
-  lnbitsize = GET_MODE_BITSIZE (lnmode);
-  lnbitpos = first_bit & ~ (lnbitsize - 1);
-  lntype = lang_hooks.types.type_for_size (lnbitsize, 1);
-  xll_bitpos = ll_bitpos - lnbitpos, xrl_bitpos = rl_bitpos - lnbitpos;
-
-  if (ll_reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
-    {
-      xll_bitpos = lnbitsize - xll_bitpos - ll_bitsize;
-      xrl_bitpos = lnbitsize - xrl_bitpos - rl_bitsize;
-    }
-
-  ll_mask = const_binop (LSHIFT_EXPR, fold_convert_loc (loc, lntype, ll_mask),
-			 size_int (xll_bitpos));
-  rl_mask = const_binop (LSHIFT_EXPR, fold_convert_loc (loc, lntype, rl_mask),
-			 size_int (xrl_bitpos));
-
-  if (l_const)
-    {
-      l_const = fold_convert_loc (loc, lntype, l_const);
-      l_const = unextend (l_const, ll_bitsize, ll_unsignedp, ll_and_mask);
-      l_const = const_binop (LSHIFT_EXPR, l_const, size_int (xll_bitpos));
-      if (! integer_zerop (const_binop (BIT_AND_EXPR, l_const,
-					fold_build1_loc (loc, BIT_NOT_EXPR,
-						     lntype, ll_mask))))
-	{
-	  warning (0, "comparison is always %d", wanted_code == NE_EXPR);
-
-	  return constant_boolean_node (wanted_code == NE_EXPR, truth_type);
-	}
-    }
-  if (r_const)
-    {
-      r_const = fold_convert_loc (loc, lntype, r_const);
-      r_const = unextend (r_const, rl_bitsize, rl_unsignedp, rl_and_mask);
-      r_const = const_binop (LSHIFT_EXPR, r_const, size_int (xrl_bitpos));
-      if (! integer_zerop (const_binop (BIT_AND_EXPR, r_const,
-					fold_build1_loc (loc, BIT_NOT_EXPR,
-						     lntype, rl_mask))))
-	{
-	  warning (0, "comparison is always %d", wanted_code == NE_EXPR);
-
-	  return constant_boolean_node (wanted_code == NE_EXPR, truth_type);
-	}
-    }
-
-  /* If the right sides are not constant, do the same for it.  Also,
-     disallow this optimization if a size, signedness or storage order
-     mismatch occurs between the left and right sides.  */
-  if (l_const == 0)
-    {
-      if (ll_bitsize != lr_bitsize || rl_bitsize != rr_bitsize
-	  || ll_unsignedp != lr_unsignedp || rl_unsignedp != rr_unsignedp
-	  || ll_reversep != lr_reversep
-	  /* Make sure the two fields on the right
-	     correspond to the left without being swapped.  */
-	  || ll_bitpos - rl_bitpos != lr_bitpos - rr_bitpos)
-	return 0;
-
-      first_bit = MIN (lr_bitpos, rr_bitpos);
-      end_bit = MAX (lr_bitpos + lr_bitsize, rr_bitpos + rr_bitsize);
-      if (!get_best_mode (end_bit - first_bit, first_bit, 0, 0,
-			  TYPE_ALIGN (TREE_TYPE (lr_inner)), BITS_PER_WORD,
-			  volatilep, &rnmode))
-	return 0;
-
-      rnbitsize = GET_MODE_BITSIZE (rnmode);
-      rnbitpos = first_bit & ~ (rnbitsize - 1);
-      rntype = lang_hooks.types.type_for_size (rnbitsize, 1);
-      xlr_bitpos = lr_bitpos - rnbitpos, xrr_bitpos = rr_bitpos - rnbitpos;
-
-      if (lr_reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
-	{
-	  xlr_bitpos = rnbitsize - xlr_bitpos - lr_bitsize;
-	  xrr_bitpos = rnbitsize - xrr_bitpos - rr_bitsize;
-	}
-
-      lr_mask = const_binop (LSHIFT_EXPR, fold_convert_loc (loc,
-							    rntype, lr_mask),
-			     size_int (xlr_bitpos));
-      rr_mask = const_binop (LSHIFT_EXPR, fold_convert_loc (loc,
-							    rntype, rr_mask),
-			     size_int (xrr_bitpos));
-
-      /* Make a mask that corresponds to both fields being compared.
-	 Do this for both items being compared.  If the operands are the
-	 same size and the bits being compared are in the same position
-	 then we can do this by masking both and comparing the masked
-	 results.  */
-      ll_mask = const_binop (BIT_IOR_EXPR, ll_mask, rl_mask);
-      lr_mask = const_binop (BIT_IOR_EXPR, lr_mask, rr_mask);
-      if (lnbitsize == rnbitsize
-	  && xll_bitpos == xlr_bitpos
-	  && lnbitpos >= 0
-	  && rnbitpos >= 0)
-	{
-	  lhs = make_bit_field_ref (loc, ll_inner, ll_arg,
-				    lntype, lnbitsize, lnbitpos,
-				    ll_unsignedp || rl_unsignedp, ll_reversep);
-	  if (! all_ones_mask_p (ll_mask, lnbitsize))
-	    lhs = build2 (BIT_AND_EXPR, lntype, lhs, ll_mask);
-
-	  rhs = make_bit_field_ref (loc, lr_inner, lr_arg,
-				    rntype, rnbitsize, rnbitpos,
-				    lr_unsignedp || rr_unsignedp, lr_reversep);
-	  if (! all_ones_mask_p (lr_mask, rnbitsize))
-	    rhs = build2 (BIT_AND_EXPR, rntype, rhs, lr_mask);
-
-	  return build2_loc (loc, wanted_code, truth_type, lhs, rhs);
-	}
-
-      /* There is still another way we can do something:  If both pairs of
-	 fields being compared are adjacent, we may be able to make a wider
-	 field containing them both.
-
-	 Note that we still must mask the lhs/rhs expressions.  Furthermore,
-	 the mask must be shifted to account for the shift done by
-	 make_bit_field_ref.  */
-      if (((ll_bitsize + ll_bitpos == rl_bitpos
-	    && lr_bitsize + lr_bitpos == rr_bitpos)
-	   || (ll_bitpos == rl_bitpos + rl_bitsize
-	       && lr_bitpos == rr_bitpos + rr_bitsize))
-	  && ll_bitpos >= 0
-	  && rl_bitpos >= 0
-	  && lr_bitpos >= 0
-	  && rr_bitpos >= 0)
-	{
-	  tree type;
-
-	  lhs = make_bit_field_ref (loc, ll_inner, ll_arg, lntype,
-				    ll_bitsize + rl_bitsize,
-				    MIN (ll_bitpos, rl_bitpos),
-				    ll_unsignedp, ll_reversep);
-	  rhs = make_bit_field_ref (loc, lr_inner, lr_arg, rntype,
-				    lr_bitsize + rr_bitsize,
-				    MIN (lr_bitpos, rr_bitpos),
-				    lr_unsignedp, lr_reversep);
-
-	  ll_mask = const_binop (RSHIFT_EXPR, ll_mask,
-				 size_int (MIN (xll_bitpos, xrl_bitpos)));
-	  lr_mask = const_binop (RSHIFT_EXPR, lr_mask,
-				 size_int (MIN (xlr_bitpos, xrr_bitpos)));
-
-	  /* Convert to the smaller type before masking out unwanted bits.  */
-	  type = lntype;
-	  if (lntype != rntype)
-	    {
-	      if (lnbitsize > rnbitsize)
-		{
-		  lhs = fold_convert_loc (loc, rntype, lhs);
-		  ll_mask = fold_convert_loc (loc, rntype, ll_mask);
-		  type = rntype;
-		}
-	      else if (lnbitsize < rnbitsize)
-		{
-		  rhs = fold_convert_loc (loc, lntype, rhs);
-		  lr_mask = fold_convert_loc (loc, lntype, lr_mask);
-		  type = lntype;
-		}
-	    }
-
-	  if (! all_ones_mask_p (ll_mask, ll_bitsize + rl_bitsize))
-	    lhs = build2 (BIT_AND_EXPR, type, lhs, ll_mask);
-
-	  if (! all_ones_mask_p (lr_mask, lr_bitsize + rr_bitsize))
-	    rhs = build2 (BIT_AND_EXPR, type, rhs, lr_mask);
-
-	  return build2_loc (loc, wanted_code, truth_type, lhs, rhs);
-	}
-
-      return 0;
-    }
-
-  /* Handle the case of comparisons with constants.  If there is something in
-     common between the masks, those bits of the constants must be the same.
-     If not, the condition is always false.  Test for this to avoid generating
-     incorrect code below.  */
-  result = const_binop (BIT_AND_EXPR, ll_mask, rl_mask);
-  if (! integer_zerop (result)
-      && simple_cst_equal (const_binop (BIT_AND_EXPR, result, l_const),
-			   const_binop (BIT_AND_EXPR, result, r_const)) != 1)
-    {
-      if (wanted_code == NE_EXPR)
-	{
-	  warning (0, "%<or%> of unmatched not-equal tests is always 1");
-	  return constant_boolean_node (true, truth_type);
-	}
-      else
-	{
-	  warning (0, "%<and%> of mutually exclusive equal-tests is always 0");
-	  return constant_boolean_node (false, truth_type);
-	}
-    }
-
-  if (lnbitpos < 0)
-    return 0;
-
-  /* Construct the expression we will return.  First get the component
-     reference we will make.  Unless the mask is all ones the width of
-     that field, perform the mask operation.  Then compare with the
-     merged constant.  */
-  result = make_bit_field_ref (loc, ll_inner, ll_arg,
-			       lntype, lnbitsize, lnbitpos,
-			       ll_unsignedp || rl_unsignedp, ll_reversep);
-
-  ll_mask = const_binop (BIT_IOR_EXPR, ll_mask, rl_mask);
-  if (! all_ones_mask_p (ll_mask, lnbitsize))
-    result = build2_loc (loc, BIT_AND_EXPR, lntype, result, ll_mask);
-
-  return build2_loc (loc, wanted_code, truth_type, result,
-		     const_binop (BIT_IOR_EXPR, l_const, r_const));
+  return 0;
 }
 
 /* T is an integer expression that is being multiplied, divided, or taken a
diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index c3fa4cb..c3657c0 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -67,6 +67,12 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-strlen.h"
 #include "varasm.h"
 
+/* ??? Move this to some header, it's defined in fold-const.c.  */
+extern tree
+make_bit_field_ref (location_t loc, tree inner, tree orig_inner, tree type,
+		    HOST_WIDE_INT bitsize, poly_int64 bitpos,
+		    int unsignedp, int reversep);
+
 enum strlen_range_kind {
   /* Compute the exact constant string length.  */
   SRK_STRLEN,
@@ -5964,7 +5970,1139 @@ maybe_fold_comparisons_from_match_pd (tree type, enum tree_code code,
 
   return NULL_TREE;
 }
+
+/* Return TRUE if expression STMT is suitable for replacement.  ???
+   Same as ssa_is_replaceable_p, except that we don't insist it has a
+   single use.  */
+
+bool
+ssa_is_substitutable_p (gimple *stmt)
+{
+#if 0
+  use_operand_p use_p;
+#endif
+  tree def;
+#if 0
+  gimple *use_stmt;
+#endif
+
+  /* Only consider modify stmts.  */
+  if (!is_gimple_assign (stmt))
+    return false;
+
+  /* If the statement may throw an exception, it cannot be replaced.  */
+  if (stmt_could_throw_p (cfun, stmt))
+    return false;
+
+  /* Punt if there is more than 1 def.  */
+  def = SINGLE_SSA_TREE_OPERAND (stmt, SSA_OP_DEF);
+  if (!def)
+    return false;
+
+#if 0
+  /* Only consider definitions which have a single use.  */
+  if (!single_imm_use (def, &use_p, &use_stmt))
+    return false;
+
+  /* Used in this block, but at the TOP of the block, not the end.  */
+  if (gimple_code (use_stmt) == GIMPLE_PHI)
+    return false;
+#endif
+
+  /* There must be no VDEFs.  */
+  if (gimple_vdef (stmt))
+    return false;
+
+  /* Float expressions must go through memory if float-store is on.  */
+  if (flag_float_store
+      && FLOAT_TYPE_P (gimple_expr_type (stmt)))
+    return false;
+
+  /* An assignment with a register variable on the RHS is not
+     replaceable.  */
+  if (gimple_assign_rhs_code (stmt) == VAR_DECL
+      && DECL_HARD_REGISTER (gimple_assign_rhs1 (stmt)))
+    return false;
+
+  /* No function calls can be replaced.  */
+  if (is_gimple_call (stmt))
+    return false;
+
+  /* Leave any stmt with volatile operands alone as well.  */
+  if (gimple_has_volatile_ops (stmt))
+    return false;
+
+  return true;
+}
+
+/* Follow substitutable SSA DEFs for *NAME, including type casts,
+   adjusting *NAME to the single rhs or the type cast operand along
+   the way.  Return the target type of the earliest type cast
+   found.  */
+
+static tree
+is_cast_p (tree *name)
+{
+  tree type = 0;
+
+  while (TREE_CODE (*name) == SSA_NAME
+	 && !SSA_NAME_IS_DEFAULT_DEF (*name))
+    {
+      gimple *def = SSA_NAME_DEF_STMT (*name);
+
+      if (!ssa_is_substitutable_p (def))
+	break;
+
+      if (gimple_num_ops (def) != 2)
+	break;
+
+      if (get_gimple_rhs_class (gimple_expr_code (def))
+	  == GIMPLE_SINGLE_RHS)
+	{
+	  *name = gimple_assign_rhs1 (def);
+	  continue;
+	}
+
+      if (!gimple_assign_cast_p (def))
+	break;
+
+      if (!type)
+	type = TREE_TYPE (*name);
+      *name = gimple_assign_rhs1 (def);
+    }
+
+  return type;
+}
+
+/* Follow substitutable SSA DEFs for *NAME.  If we find it is assigned
+   a CODE binary expr, return the second operand, and set *NAME to the
+   first operand.  */
+
+static tree
+is_binop_p (enum tree_code code, tree *name)
+{
+  while (TREE_CODE (*name) == SSA_NAME
+	 && !SSA_NAME_IS_DEFAULT_DEF (*name))
+    {
+      gimple *def = SSA_NAME_DEF_STMT (*name);
+
+      if (!ssa_is_substitutable_p (def))
+	return 0;
+
+      switch (gimple_num_ops (def))
+	{
+	default:
+	  return 0;
+
+	case 2:
+	  if (get_gimple_rhs_class (gimple_expr_code (def))
+	      == GIMPLE_SINGLE_RHS)
+	    {
+	      *name = gimple_assign_rhs1 (def);
+	      continue;
+	    }
+	  return 0;
+
+	case 3:
+	  ;
+	}
+
+      if (gimple_assign_rhs_code (def) != code)
+	return 0;
+
+      *name = gimple_assign_rhs1 (def);
+      return gimple_assign_rhs2 (def);
+    }
+
+  return 0;
+}
+
+/* If *R_ARG is a constant zero, and L_ARG is a possibly masked
+   BIT_XOR_EXPR, return 1 and set *r_arg to l_arg.
+   Otherwise, return 0.
+
+   The returned value should be passed to decode_field_reference for it
+   to handle l_arg, and then doubled for r_arg.  */
+
+static int
+prepare_xor (tree l_arg, tree *r_arg)
+{
+  int ret = 0;
+
+  if (!integer_zerop (*r_arg))
+    return ret;
+
+  tree exp = l_arg;
+
+  if (tree and_mask = is_binop_p (BIT_AND_EXPR, &exp))
+    {
+      if (TREE_CODE (and_mask) != INTEGER_CST)
+	return ret;
+    }
+
+  if (is_binop_p (BIT_XOR_EXPR, &exp))
+    {
+      *r_arg = l_arg;
+      return 1;
+    }
+
+  return ret;
+}
+
+/* Subroutine for fold_truth_andor_1: decode a field reference.
+
+   If EXP is a comparison reference, we return the innermost reference.
+
+   *PBITSIZE is set to the number of bits in the reference, *PBITPOS is
+   set to the starting bit number.
+
+   If the innermost field can be completely contained in a mode-sized
+   unit, *PMODE is set to that mode.  Otherwise, it is set to VOIDmode.
+
> +   *PVOLATILEP is set to 1 if any expression encountered is volatile;
+   otherwise it is not changed.
+
+   *PUNSIGNEDP is set to the signedness of the field.
+
+   *PREVERSEP is set to the storage order of the field.
+
+   *PMASK is set to the mask used.  This is either contained in a
+   BIT_AND_EXPR or derived from the width of the field.
+
+   *PAND_MASK is set to the mask found in a BIT_AND_EXPR, if any.
+
+   XOR_WHICH is 1 or 2 if EXP was found to be a (possibly masked)
+   BIT_XOR_EXPR compared with zero.  We're to take the first or second
+   operand thereof if so.  It should be zero otherwise.
+
+   Return 0 if this is not a component reference or is one that we can't
+   do anything with.  */
+
+static tree
+decode_field_reference (location_t loc, tree *exp_, HOST_WIDE_INT *pbitsize,
+			HOST_WIDE_INT *pbitpos, machine_mode *pmode,
+			int *punsignedp, int *preversep, int *pvolatilep,
+			tree *pmask, tree *pand_mask, int xor_which)
+{
+  tree exp = *exp_;
+  tree outer_type = 0;
+  tree and_mask = 0;
+  tree mask, inner, offset;
+  tree unsigned_type;
+  unsigned int precision;
+  HOST_WIDE_INT shiftrt = 0;
+
+  /* All the optimizations using this function assume integer fields.
+     There are problems with FP fields since the type_for_size call
+     below can fail for, e.g., XFmode.  */
+  if (! INTEGRAL_TYPE_P (TREE_TYPE (exp)))
+    return 0;
+
+  /* We are interested in the bare arrangement of bits, so strip everything
+     that doesn't affect the machine mode.  However, record the type of the
+     outermost expression if it may matter below.  */
+  outer_type = is_cast_p (&exp);
+
+  if ((and_mask = is_binop_p (BIT_AND_EXPR, &exp)))
+    {
+      if (TREE_CODE (and_mask) != INTEGER_CST)
+	return 0;
+    }
+
+  if (xor_which)
+    {
+      tree op2 = is_binop_p (BIT_XOR_EXPR, &exp);
+      gcc_checking_assert (op2);
+      if (xor_which > 1)
+	exp = op2;
+    }
+
+  if (tree t = is_cast_p (&exp))
+    if (!outer_type)
+      outer_type = t;
+
+  if (tree shift = is_binop_p (RSHIFT_EXPR, &exp))
+    {
+      if (TREE_CODE (shift) != INTEGER_CST || !tree_fits_shwi_p (shift))
+	return 0;
+      shiftrt = tree_to_shwi (shift);
+      if (shiftrt <= 0)
+	return 0;
+    }
+
+  if (tree t = is_cast_p (&exp))
+    if (!outer_type)
+      outer_type = t;
+
+  poly_int64 poly_bitsize, poly_bitpos;
+  inner = get_inner_reference (exp, &poly_bitsize, &poly_bitpos, &offset,
+			       pmode, punsignedp, preversep, pvolatilep);
+
+  if ((inner == exp && and_mask == 0)
+      || !poly_bitsize.is_constant (pbitsize)
+      || !poly_bitpos.is_constant (pbitpos)
+      || *pbitsize <= shiftrt
+      || offset != 0
+      || TREE_CODE (inner) == PLACEHOLDER_EXPR
+      /* Reject out-of-bound accesses (PR79731).  */
+      || (! AGGREGATE_TYPE_P (TREE_TYPE (inner))
+	  && compare_tree_int (TYPE_SIZE (TREE_TYPE (inner)),
+			       *pbitpos + *pbitsize) < 0))
+    return NULL_TREE;
+
+  if (shiftrt)
+    {
+      if (!*preversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
+	*pbitpos += shiftrt;
+      *pbitsize -= shiftrt;
+    }
+
+  if (outer_type && *pbitsize > TYPE_PRECISION (outer_type))
+    {
+      HOST_WIDE_INT excess = *pbitsize - TYPE_PRECISION (outer_type);
+      if (*preversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
+	*pbitpos += excess;
+      *pbitsize -= excess;
+    }
+
+  unsigned_type = lang_hooks.types.type_for_size (*pbitsize, 1);
+  if (unsigned_type == NULL_TREE)
+    return NULL_TREE;
+
+  *exp_ = exp;
+
+  /* If the number of bits in the reference is the same as the bitsize of
+     the outer type, then the outer type gives the signedness. Otherwise
+     (in case of a small bitfield) the signedness is unchanged.  */
+  if (outer_type && *pbitsize == TYPE_PRECISION (outer_type))
+    *punsignedp = TYPE_UNSIGNED (outer_type);
+
+  /* Compute the mask to access the bitfield.  */
+  precision = TYPE_PRECISION (unsigned_type);
+
+  mask = build_int_cst_type (unsigned_type, -1);
+
+  mask = int_const_binop (LSHIFT_EXPR, mask, size_int (precision - *pbitsize));
+  mask = int_const_binop (RSHIFT_EXPR, mask, size_int (precision - *pbitsize));
+
+  /* Merge it with the mask we found in the BIT_AND_EXPR, if any.  */
+  if (and_mask != 0)
+    mask = fold_build2_loc (loc, BIT_AND_EXPR, unsigned_type,
+			fold_convert_loc (loc, unsigned_type, and_mask), mask);
+
+  *pmask = mask;
+  *pand_mask = and_mask;
+  return inner;
+}
+
+/* Subroutine for fold_truth_andor_1: C is an INTEGER_CST interpreted as a P
+   bit value.  Arrange things so the extra bits will be set to zero if and
+   only if C is signed-extended to its full width.  If MASK is nonzero,
+   it is an INTEGER_CST that should be AND'ed with the extra bits.  */
+
+static tree
+unextend (tree c, int p, int unsignedp, tree mask)
+{
+  tree type = TREE_TYPE (c);
+  int modesize = GET_MODE_BITSIZE (SCALAR_INT_TYPE_MODE (type));
+  tree temp;
+
+  if (p == modesize || unsignedp)
+    return c;
+
+  /* We work by getting just the sign bit into the low-order bit, then
+     into the high-order bit, then sign-extend.  We then XOR that value
+     with C.  */
+  temp = build_int_cst (TREE_TYPE (c),
+			wi::extract_uhwi (wi::to_wide (c), p - 1, 1));
+
+  /* We must use a signed type in order to get an arithmetic right shift.
+     However, we must also avoid introducing accidental overflows, so that
+     a subsequent call to integer_zerop will work.  Hence we must
+     do the type conversion here.  At this point, the constant is either
+     zero or one, and the conversion to a signed type can never overflow.
+     We could get an overflow if this conversion is done anywhere else.  */
+  if (TYPE_UNSIGNED (type))
+    temp = fold_convert (signed_type_for (type), temp);
+
+  temp = int_const_binop (LSHIFT_EXPR, temp, size_int (modesize - 1));
+  temp = int_const_binop (RSHIFT_EXPR, temp, size_int (modesize - p - 1));
+  if (mask != 0)
+    temp = int_const_binop (BIT_AND_EXPR, temp,
+			    fold_convert (TREE_TYPE (c), mask));
+  /* If necessary, convert the type back to match the type of C.  */
+  if (TYPE_UNSIGNED (type))
+    temp = fold_convert (type, temp);
+
+  return fold_convert (type, int_const_binop (BIT_XOR_EXPR, c, temp));
+}
+
+/* Return the one bitpos within bit extents L or R that is at an
+   ALIGN-bit alignment boundary, or -1 if there is more than one such
+   boundary, if there isn't any, or if there is any such boundary
+   between the extents.  L and R are given by bitpos and bitsize.  If
+   it doesn't return -1, there are two consecutive ALIGN-bit words
+   that contain both extents, and at least one of the extents
+   straddles across the returned alignment boundary.  */
+
+static inline HOST_WIDE_INT
+compute_split_boundary_from_align (HOST_WIDE_INT align,
+				   HOST_WIDE_INT l_bitpos,
+				   HOST_WIDE_INT l_bitsize,
+				   HOST_WIDE_INT r_bitpos,
+				   HOST_WIDE_INT r_bitsize)
+{
+  HOST_WIDE_INT amask = ~(align - 1);
+
+  HOST_WIDE_INT first_bit = MIN (l_bitpos, r_bitpos);
+  HOST_WIDE_INT end_bit = MAX (l_bitpos + l_bitsize, r_bitpos + r_bitsize);
+
+  HOST_WIDE_INT boundary = (end_bit - 1) & amask;
+
+  /* Make sure we're crossing no more than one alignment boundary.
+
+     ??? We don't have logic to recombine loads of two adjacent
+     fields that each crosses a different alignment boundary, so
+     as to load the middle word only once, if other words can't be
+     otherwise recombined.  */
+  if (boundary - first_bit > align)
+    return -1;
+
+  HOST_WIDE_INT l_start_word = l_bitpos & amask;
+  HOST_WIDE_INT l_end_word = (l_bitpos + l_bitsize - 1) & amask;
+
+  HOST_WIDE_INT r_start_word = r_bitpos & amask;
+  HOST_WIDE_INT r_end_word = (r_bitpos + r_bitsize - 1) & amask;
+
+  /* If neither field straddles across an alignment boundary, it's no
+     use to even try to merge them.  */
+  if (l_start_word == l_end_word && r_start_word == r_end_word)
+    return -1;
+
+  return boundary;
+}
+
+/* Initialize ln_arg[0] and ln_arg[1] to a pair of newly-created (at
+   LOC) loads from INNER (from ORIG_INNER), of modes MODE and MODE2,
+   respectively, starting at BIT_POS, using reversed endianness if
+   REVERSEP.  Also initialize BITPOS (the starting position of each
+   part into INNER), BITSIZ (the bit count starting at BITPOS),
+   TOSHIFT[1] (the amount by which the part and its mask are to be
+   shifted right to bring its least-significant bit to bit zero) and
+   SHIFTED (the amount by which the part, by separate loading, has
+   already been shifted right, but that the mask needs shifting to
+   match).  */
+
+static inline void
+build_split_load (tree /* out */ ln_arg[2],
+		  HOST_WIDE_INT /* out */ bitpos[2],
+		  HOST_WIDE_INT /* out */ bitsiz[2],
+		  HOST_WIDE_INT /* in[0] out[0..1] */ toshift[2],
+		  HOST_WIDE_INT /* out */ shifted[2],
+		  location_t loc, tree inner, tree orig_inner,
+		  scalar_int_mode mode, scalar_int_mode mode2,
+		  HOST_WIDE_INT bit_pos, bool reversep)
+{
+  bitsiz[0] = GET_MODE_BITSIZE (mode);
+  bitsiz[1] = GET_MODE_BITSIZE (mode2);
+
+  for (int i = 0; i < 2; i++)
+    {
+      tree type = lang_hooks.types.type_for_size (bitsiz[i], 1);
+      bitpos[i] = bit_pos;
+      ln_arg[i] = make_bit_field_ref (loc, inner, orig_inner,
+				      type, bitsiz[i],
+				      bit_pos, 1, reversep);
+      bit_pos += bitsiz[i];
+    }
+
+  toshift[1] = toshift[0];
+  if (reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
+    {
+      shifted[0] = bitsiz[1];
+      shifted[1] = 0;
+      toshift[0] = 0;
+    }
+  else
+    {
+      shifted[1] = bitsiz[0];
+      shifted[0] = 0;
+      toshift[1] = 0;
+    }
+}
+
+/* Make arrangements to split at bit BOUNDARY a single loaded word
+   (with REVERSEP bit order) LN_ARG[0], to be shifted right by
+   TOSHIFT[0] to bring the field of interest to the least-significant
+   bit.  The expectation is that the same loaded word will be
+   propagated from part 0 to part 1, with just different shifting and
+   masking to extract both parts.  MASK is not expected to do more
+   than masking out the bits that belong to the other part.  See
+   build_split_load for more information on the other fields.  */
+
+static inline void
+reuse_split_load (tree /* in[0] out[1] */ ln_arg[2],
+		  HOST_WIDE_INT /* in[0] out[1] */ bitpos[2],
+		  HOST_WIDE_INT /* in[0] out[1] */ bitsiz[2],
+		  HOST_WIDE_INT /* in[0] out[0..1] */ toshift[2],
+		  HOST_WIDE_INT /* out */ shifted[2],
+		  tree /* out */ mask[2],
+		  HOST_WIDE_INT boundary, bool reversep)
+{
+  ln_arg[1] = ln_arg[0];
+  bitpos[1] = bitpos[0];
+  bitsiz[1] = bitsiz[0];
+  shifted[1] = shifted[0] = 0;
+
+  tree basemask = build_int_cst_type (TREE_TYPE (ln_arg[0]), -1);
+
+  if (reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
+    {
+      toshift[1] = toshift[0];
+      toshift[0] = bitpos[0] + bitsiz[0] - boundary;
+      mask[0] = int_const_binop (LSHIFT_EXPR, basemask,
+				 bitsize_int (toshift[0]));
+      mask[1] = int_const_binop (BIT_XOR_EXPR, basemask, mask[0]);
+    }
+  else
+    {
+      toshift[1] = boundary - bitpos[1];
+      mask[1] = int_const_binop (LSHIFT_EXPR, basemask,
+				 bitsize_int (toshift[1]));
+      mask[0] = int_const_binop (BIT_XOR_EXPR, basemask, mask[1]);
+    }
+}
+
+/* Find ways of folding logical expressions of LHS and RHS:
+   Try to merge two comparisons to the same innermost item.
+   Look for range tests like "ch >= '0' && ch <= '9'".
+   Look for combinations of simple terms on machines with expensive branches
+   and evaluate the RHS unconditionally.
+
+   For example, if we have p->a == 2 && p->b == 4 and we can make an
+   object large enough to span both A and B, we can do this with a comparison
+   against the object ANDed with the a mask.
+
+   If we have p->a == q->a && p->b == q->b, we may be able to use bit masking
+   operations to do this with one comparison.
+
+   We check for both normal comparisons and the BIT_AND_EXPRs made this by
+   function and the one above.
+
+   CODE is the logical operation being done.  It can be TRUTH_ANDIF_EXPR,
+   TRUTH_AND_EXPR, TRUTH_ORIF_EXPR, or TRUTH_OR_EXPR.
+
+   TRUTH_TYPE is the type of the logical operand and LHS and RHS are its
+   two operands.
+
+   SEPARATEP should be NULL if LHS and RHS are adjacent in
+   CODE-chained compares, and a non-NULL pointer to NULL_TREE
+   otherwise.  If the "words" accessed by RHS are already accessed by
+   LHS, this won't matter, but if RHS accesses "words" that LHS
+   doesn't, then *SEPARATEP will be set to the compares that should
+   take RHS's place.  By "words" we mean contiguous bits that do not
> +   cross a TYPE_ALIGN boundary of the accessed object's type.
+
+   We return the simplified tree or 0 if no optimization is possible.  */
+
+tree
+fold_truth_andor_maybe_separate (location_t loc,
+				 enum tree_code code, tree truth_type,
+				 enum tree_code lcode, tree ll_arg, tree lr_arg,
+				 enum tree_code rcode, tree rl_arg, tree rr_arg,
+				 tree *separatep)
+{
+  /* If this is the "or" of two comparisons, we can do something if
+     the comparisons are NE_EXPR.  If this is the "and", we can do something
+     if the comparisons are EQ_EXPR.  I.e.,
+	(a->b == 2 && a->c == 4) can become (a->new == NEW).
+
+     WANTED_CODE is this operation code.  For single bit fields, we can
+     convert EQ_EXPR to NE_EXPR so we need not reject the "wrong"
+     comparison for one-bit fields.  */
+
+  enum tree_code orig_code = code;
+  enum tree_code wanted_code;
+  tree ll_inner, lr_inner, rl_inner, rr_inner;
+  HOST_WIDE_INT ll_bitsize, ll_bitpos, lr_bitsize, lr_bitpos;
+  HOST_WIDE_INT rl_bitsize, rl_bitpos, rr_bitsize, rr_bitpos;
+  HOST_WIDE_INT xll_bitpos, xlr_bitpos, xrl_bitpos, xrr_bitpos;
+  HOST_WIDE_INT lnbitsize, lnbitpos, rnbitsize, rnbitpos;
+  int ll_unsignedp, lr_unsignedp, rl_unsignedp, rr_unsignedp;
+  int ll_reversep, lr_reversep, rl_reversep, rr_reversep;
+  machine_mode ll_mode, lr_mode, rl_mode, rr_mode;
+  scalar_int_mode lnmode, lnmode2, rnmode;
+  tree ll_mask, lr_mask, rl_mask, rr_mask;
+  tree ll_and_mask, lr_and_mask, rl_and_mask, rr_and_mask;
+  tree l_const, r_const;
+  tree lntype, rntype, result;
+  HOST_WIDE_INT first_bit, end_bit;
+  int volatilep;
+  bool l_split_load;
+
+  gcc_checking_assert (!separatep || !*separatep);
+
+  /* Start by getting the comparison codes.  Fail if anything is volatile.
+     If one operand is a BIT_AND_EXPR with the constant one, treat it as if
+     it were surrounded with a NE_EXPR.  */
+
+  if (TREE_SIDE_EFFECTS (ll_arg) || TREE_SIDE_EFFECTS (lr_arg)
+      || TREE_SIDE_EFFECTS (rl_arg) || TREE_SIDE_EFFECTS (rr_arg))
+    return 0;
+
+  if (TREE_CODE_CLASS (lcode) != tcc_comparison
+      || TREE_CODE_CLASS (rcode) != tcc_comparison)
+    return 0;
+
+  code = ((code == TRUTH_AND_EXPR || code == TRUTH_ANDIF_EXPR)
+	  ? TRUTH_AND_EXPR : TRUTH_OR_EXPR);
+
+  bool lsignbit = false, rsignbit = false;
+  if ((lcode == LT_EXPR || lcode == GE_EXPR)
+      && integer_zerop (lr_arg)
+      && INTEGRAL_TYPE_P (TREE_TYPE (ll_arg))
+      && !TYPE_UNSIGNED (TREE_TYPE (ll_arg)))
+    {
+      lsignbit = true;
+      lcode = (lcode == LT_EXPR ? NE_EXPR : EQ_EXPR);
+    }
+  if ((rcode == LT_EXPR || rcode == GE_EXPR)
+      && integer_zerop (rr_arg)
+      && INTEGRAL_TYPE_P (TREE_TYPE (rl_arg))
+      && !TYPE_UNSIGNED (TREE_TYPE (rl_arg)))
+    {
+      rsignbit = true;
+      rcode = (rcode == LT_EXPR ? NE_EXPR : EQ_EXPR);
+    }
+
+  /* See if the comparisons can be merged.  Then get all the parameters for
+     each side.  */
+
+  if ((lcode != EQ_EXPR && lcode != NE_EXPR)
+      || (rcode != EQ_EXPR && rcode != NE_EXPR))
+    return 0;
+
+  ll_reversep = lr_reversep = rl_reversep = rr_reversep = 0;
+  volatilep = 0;
+  int l_xor = prepare_xor (ll_arg, &lr_arg);
+  ll_inner = decode_field_reference (loc, &ll_arg,
+				     &ll_bitsize, &ll_bitpos, &ll_mode,
+				     &ll_unsignedp, &ll_reversep, &volatilep,
+				     &ll_mask, &ll_and_mask, l_xor);
+  lr_inner = decode_field_reference (loc, &lr_arg,
+				     &lr_bitsize, &lr_bitpos, &lr_mode,
+				     &lr_unsignedp, &lr_reversep, &volatilep,
+				     &lr_mask, &lr_and_mask, 2 * l_xor);
+  int r_xor = prepare_xor (rl_arg, &rr_arg);
+  rl_inner = decode_field_reference (loc, &rl_arg,
+				     &rl_bitsize, &rl_bitpos, &rl_mode,
+				     &rl_unsignedp, &rl_reversep, &volatilep,
+				     &rl_mask, &rl_and_mask, r_xor);
+  rr_inner = decode_field_reference (loc, &rr_arg,
+				     &rr_bitsize, &rr_bitpos, &rr_mode,
+				     &rr_unsignedp, &rr_reversep, &volatilep,
+				     &rr_mask, &rr_and_mask, 2 * r_xor);
+
+  /* The inner operation on the lhs of each comparison must be the
+     same if we are to be able to do anything.
+     Then see if we have constants.  If not, the same must be true for
+     the rhs's.  */
+  if (volatilep
+      || ll_reversep != rl_reversep
+      || ll_inner == 0 || rl_inner == 0
+      || ! operand_equal_p (ll_inner, rl_inner, 0))
+    return 0;
+
+  if (TREE_CODE (lr_arg) == INTEGER_CST
+      && TREE_CODE (rr_arg) == INTEGER_CST)
+    {
+      l_const = lr_arg, r_const = rr_arg;
+      lr_reversep = ll_reversep;
+    }
+  else if (lr_reversep != rr_reversep
+	   || lr_inner == 0 || rr_inner == 0
+	   || ! operand_equal_p (lr_inner, rr_inner, 0))
+    return 0;
+  else
+    l_const = r_const = 0;
+
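+  /* If the left-hand compare was recognized above as a sign-bit test,
+     narrow its masks down to the field's sign bit, the only bit the
+     converted EQ/NE against zero should examine.  */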
+  if (lsignbit)
+    {
+      tree mask = build_int_cst_type (TREE_TYPE (ll_arg), -1);
+      tree sign = int_const_binop (LSHIFT_EXPR, mask,
+				   bitsize_int (ll_bitsize - 1));
+      if (!ll_mask)
+	ll_mask = sign;
+      else
+	ll_mask = int_const_binop (BIT_AND_EXPR, ll_mask, sign);
+      if (!ll_and_mask)
+	ll_and_mask = sign;
+      else
+	ll_and_mask = int_const_binop (BIT_AND_EXPR, ll_and_mask, sign);
+    }
+
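+  /* Likewise for a sign-bit test on the right-hand compare.  */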
+  if (rsignbit)
+    {
+      tree mask = build_int_cst_type (TREE_TYPE (rl_arg), -1);
+      tree sign = int_const_binop (LSHIFT_EXPR, mask,
+				   bitsize_int (rl_bitsize - 1));
+      if (!rl_mask)
+	rl_mask = sign;
+      else
+	rl_mask = int_const_binop (BIT_AND_EXPR, rl_mask, sign);
+      if (!rl_and_mask)
+	rl_and_mask = sign;
+      else
+	rl_and_mask = int_const_binop (BIT_AND_EXPR, rl_and_mask, sign);
+    }
+
+  /* If either comparison code is not correct for our logical operation,
+     fail.  However, we can convert a one-bit comparison against zero into
+     the opposite comparison against that bit being set in the field.  */
+
+  wanted_code = (code == TRUTH_AND_EXPR ? EQ_EXPR : NE_EXPR);
+  if (lcode != wanted_code)
+    {
+      if (l_const && integer_zerop (l_const) && integer_pow2p (ll_mask))
+	{
+	  /* Make the left operand unsigned, since we are only interested
+	     in the value of one bit.  Otherwise we are doing the wrong
+	     thing below.  */
+	  ll_unsignedp = 1;
+	  l_const = ll_mask;
+	}
+      else
+	return 0;
+    }
+
+  /* This is analogous to the code for l_const above.  */
+  if (rcode != wanted_code)
+    {
+      if (r_const && integer_zerop (r_const) && integer_pow2p (rl_mask))
+	{
+	  rl_unsignedp = 1;
+	  r_const = rl_mask;
+	}
+      else
+	return 0;
+    }
+
+  /* This will be bumped to 2 if any of the field pairs crosses an
+     alignment boundary, so the merged compare has to be done in two
+     parts.  */
+  int parts = 1;
+  /* Set to true if the second combined compare should come first,
+     e.g., because the second original compare accesses a word that
+     the first one doesn't, and the combined compares access it in
+     cmp[0].  */
+  bool first1 = false;
+  /* Set to true if the first original compare is not the one being
+     split.  */
+  bool maybe_separate = false;
+
+  /* The following 2-dimensional arrays use the first index to
+     identify left(0)- vs right(1)-hand compare operands, and the
+     second one to identify merged compare parts.  */
+  /* The memory loads or constants to be compared.  */
+  tree ld_arg[2][2];
+  /* The first bit of the corresponding inner object that the
+     corresponding LD_ARG covers.  */
+  HOST_WIDE_INT bitpos[2][2];
+  /* The bit count starting at BITPOS that the corresponding LD_ARG
+     covers.  */
+  HOST_WIDE_INT bitsiz[2][2];
+  /* The number of bits by which LD_ARG has already been shifted
+     right, WRT mask.  */
+  HOST_WIDE_INT shifted[2][2];
+  /* The number of bits by which both LD_ARG and MASK need shifting to
+     bring its least-significant bit to bit zero.  */
+  HOST_WIDE_INT toshift[2][2];
+  /* An additional mask to be applied to LD_ARG, to remove any bits
+     that may have been loaded for use in another compare, but that
+     don't belong in the corresponding compare.  */
+  tree xmask[2][2] = {};
+
+  /* The combined compare or compares.  */
+  tree cmp[2];
+
+  /* Consider we're comparing two non-contiguous fields of packed
+     structs, both aligned at 32-bit boundaries:
+
+     ll_arg: an 8-bit field at offset 0
+     lr_arg: a 16-bit field at offset 2
+
+     rl_arg: an 8-bit field at offset 1
+     rr_arg: a 16-bit field at offset 3
+
+     We'll have r_split_load, because rr_arg straddles across an
+     alignment boundary.
+
+     We'll want to have:
+
+     bitpos  = { {  0,  0 }, {  0, 32 } }
+     bitsiz  = { { 32, 32 }, { 32,  8 } }
+
+     And, for little-endian:
+
+     shifted = { {  0,  0 }, {  0, 32 } }
+     toshift = { {  0, 24 }, {  0,  0 } }
+
+     Or, for big-endian:
+
+     shifted = { {  0,  0 }, {  8,  0 } }
+     toshift = { {  8,  0 }, {  0,  0 } }
+  */
+
+  /* See if we can find a mode that contains both fields being compared on
+     the left.  If we can't, fail.  Otherwise, update all constants and masks
+     to be relative to a field of that size.  */
+  first_bit = MIN (ll_bitpos, rl_bitpos);
+  end_bit = MAX (ll_bitpos + ll_bitsize, rl_bitpos + rl_bitsize);
+  if (!get_best_mode (end_bit - first_bit, first_bit, 0, 0,
+		      TYPE_ALIGN (TREE_TYPE (ll_inner)), BITS_PER_WORD,
+		      volatilep, &lnmode))
+    {
+      /* Consider the possibility of recombining loads if any of the
+	 fields straddles across an alignment boundary, so that either
+	 part can be loaded along with the other field.  */
+      HOST_WIDE_INT align = TYPE_ALIGN (TREE_TYPE (ll_inner));
+      HOST_WIDE_INT boundary = compute_split_boundary_from_align
+	(align, ll_bitpos, ll_bitsize, rl_bitpos, rl_bitsize);
+
+      if (boundary < 0
+	  || !get_best_mode (boundary - first_bit, first_bit, 0, 0,
+			     align, BITS_PER_WORD, volatilep, &lnmode)
+	  || !get_best_mode (end_bit - boundary, boundary, 0, 0,
+			     align, BITS_PER_WORD, volatilep, &lnmode2))
+	return 0;
+
+      l_split_load = true;
+      parts = 2;
+      if (ll_bitpos >= boundary)
+	maybe_separate = first1 = true;
+      else if (ll_bitpos + ll_bitsize <= boundary)
+	maybe_separate = true;
+    }
+  else
+    l_split_load = false;
+
+  lnbitsize = GET_MODE_BITSIZE (lnmode);
+  lnbitpos = first_bit & ~ (lnbitsize - 1);
+  if (l_split_load)
+    lnbitsize += GET_MODE_BITSIZE (lnmode2);
+  lntype = lang_hooks.types.type_for_size (lnbitsize, 1);
+  if (!lntype)
+    {
+      gcc_checking_assert (l_split_load);
+      lntype = build_nonstandard_integer_type (lnbitsize, 1);
+    }
+  xll_bitpos = ll_bitpos - lnbitpos, xrl_bitpos = rl_bitpos - lnbitpos;
+
+  if (ll_reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
+    {
+      xll_bitpos = lnbitsize - xll_bitpos - ll_bitsize;
+      xrl_bitpos = lnbitsize - xrl_bitpos - rl_bitsize;
+    }
+
+  ll_mask = int_const_binop (LSHIFT_EXPR,
+			     fold_convert_loc (loc, lntype, ll_mask),
+			     size_int (xll_bitpos));
+  rl_mask = int_const_binop (LSHIFT_EXPR,
+			     fold_convert_loc (loc, lntype, rl_mask),
+			     size_int (xrl_bitpos));
+
+  if (l_const)
+    {
+      l_const = fold_convert_loc (loc, lntype, l_const);
+      l_const = unextend (l_const, ll_bitsize, ll_unsignedp, ll_and_mask);
+      l_const = int_const_binop (LSHIFT_EXPR, l_const, size_int (xll_bitpos));
+      if (! integer_zerop (int_const_binop (BIT_AND_EXPR, l_const,
+					    fold_build1_loc (loc, BIT_NOT_EXPR,
+							     lntype, ll_mask))))
+	{
+	  warning (0, "comparison is always %d", wanted_code == NE_EXPR);
+
+	  return constant_boolean_node (wanted_code == NE_EXPR, truth_type);
+	}
+    }
+  if (r_const)
+    {
+      r_const = fold_convert_loc (loc, lntype, r_const);
+      r_const = unextend (r_const, rl_bitsize, rl_unsignedp, rl_and_mask);
+      r_const = int_const_binop (LSHIFT_EXPR, r_const, size_int (xrl_bitpos));
+      if (! integer_zerop (int_const_binop (BIT_AND_EXPR, r_const,
+					    fold_build1_loc (loc, BIT_NOT_EXPR,
+							     lntype, rl_mask))))
+	{
+	  warning (0, "comparison is always %d", wanted_code == NE_EXPR);
+
+	  return constant_boolean_node (wanted_code == NE_EXPR, truth_type);
+	}
+    }
+
+  /* If the right sides are not constant, do the same for them.  Also,
+     disallow this optimization if a size, signedness or storage order
+     mismatch occurs between the left and right sides.  */
+  if (l_const == 0)
+    {
+      if (ll_bitsize != lr_bitsize || rl_bitsize != rr_bitsize
+	  || ll_unsignedp != lr_unsignedp || rl_unsignedp != rr_unsignedp
+	  || ll_reversep != lr_reversep
+	  /* Make sure the two fields on the right
+	     correspond to the left without being swapped.  */
+	  || ll_bitpos - rl_bitpos != lr_bitpos - rr_bitpos
+	  || lnbitpos < 0)
+	return 0;
+
+      bool r_split_load;
+      scalar_int_mode rnmode2;
+
+      first_bit = MIN (lr_bitpos, rr_bitpos);
+      end_bit = MAX (lr_bitpos + lr_bitsize, rr_bitpos + rr_bitsize);
+      if (!get_best_mode (end_bit - first_bit, first_bit, 0, 0,
+			  TYPE_ALIGN (TREE_TYPE (lr_inner)), BITS_PER_WORD,
+			  volatilep, &rnmode))
+	{
+	  /* Consider the possibility of recombining loads if any of the
+	     fields straddles across an alignment boundary, so that either
+	     part can be loaded along with the other field.  */
+	  HOST_WIDE_INT align = TYPE_ALIGN (TREE_TYPE (lr_inner));
+	  HOST_WIDE_INT boundary = compute_split_boundary_from_align
+	    (align, lr_bitpos, lr_bitsize, rr_bitpos, rr_bitsize);
+
+	  if (boundary < 0
+	      /* If we're to split both, make sure the split point is
+		 the same.  */
+	      || (l_split_load
+		  && (boundary - lr_bitpos
+		      != (lnbitpos + GET_MODE_BITSIZE (lnmode)) - ll_bitpos))
+	      || !get_best_mode (boundary - first_bit, first_bit, 0, 0,
+				 align, BITS_PER_WORD, volatilep, &rnmode)
+	      || !get_best_mode (end_bit - boundary, boundary, 0, 0,
+				 align, BITS_PER_WORD, volatilep, &rnmode2))
+	    return 0;
+
+	  r_split_load = true;
+	  parts = 2;
+	  if (lr_bitpos >= boundary)
+	    maybe_separate = first1 = true;
+	  else if (lr_bitpos + lr_bitsize <= boundary)
+	    maybe_separate = true;
+	}
+      else
+	r_split_load = false;
+
+      rnbitsize = GET_MODE_BITSIZE (rnmode);
+      rnbitpos = first_bit & ~ (rnbitsize - 1);
+      if (r_split_load)
+	rnbitsize += GET_MODE_BITSIZE (rnmode2);
+      rntype = lang_hooks.types.type_for_size (rnbitsize, 1);
+      if (!rntype)
+	{
+	  gcc_checking_assert (r_split_load);
+	  rntype = build_nonstandard_integer_type (rnbitsize, 1);
+	}
+      xlr_bitpos = lr_bitpos - rnbitpos, xrr_bitpos = rr_bitpos - rnbitpos;
+
+      if (lr_reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
+	{
+	  xlr_bitpos = rnbitsize - xlr_bitpos - lr_bitsize;
+	  xrr_bitpos = rnbitsize - xrr_bitpos - rr_bitsize;
+	}
+
+      lr_mask = int_const_binop (LSHIFT_EXPR,
+				 fold_convert_loc (loc, rntype, lr_mask),
+				 size_int (xlr_bitpos));
+      rr_mask = int_const_binop (LSHIFT_EXPR,
+				 fold_convert_loc (loc, rntype, rr_mask),
+				 size_int (xrr_bitpos));
+
+      lr_mask = int_const_binop (BIT_IOR_EXPR, lr_mask, rr_mask);
+
+      toshift[1][0] = MIN (xlr_bitpos, xrr_bitpos);
+      shifted[1][0] = 0;
+
+      if (!r_split_load)
+	{
+	  bitpos[1][0] = rnbitpos;
+	  bitsiz[1][0] = rnbitsize;
+	  ld_arg[1][0] = make_bit_field_ref (loc, lr_inner, lr_arg,
+					     rntype, rnbitsize, rnbitpos,
+					     lr_unsignedp || rr_unsignedp,
+					     lr_reversep);
+	}
+
+      if (parts > 1)
+	{
+	  if (r_split_load)
+	    build_split_load (ld_arg[1], bitpos[1], bitsiz[1], toshift[1],
+			      shifted[1], loc, lr_inner, lr_arg,
+			      rnmode, rnmode2, rnbitpos, lr_reversep);
+	  else
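+	    /* The right-hand operand fits in a single load; reuse it for
+	       both parts, with the split point translated from the
+	       left-hand object's bit positions into the right-hand one's.  */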
+	    reuse_split_load (ld_arg[1], bitpos[1], bitsiz[1], toshift[1],
+			      shifted[1], xmask[1],
+			      lnbitpos + GET_MODE_BITSIZE (lnmode)
+			      - ll_bitpos + lr_bitpos, lr_reversep);
+	}
+    }
+  else
+    {
+      /* Handle the case of comparisons with constants.  If there is
+	 something in common between the masks, those bits of the
+	 constants must be the same.  If not, the condition is always
+	 false.  Test for this to avoid generating incorrect code
+	 below.  */
+      result = int_const_binop (BIT_AND_EXPR, ll_mask, rl_mask);
+      if (! integer_zerop (result)
+	  && simple_cst_equal (int_const_binop (BIT_AND_EXPR,
+						result, l_const),
+			       int_const_binop (BIT_AND_EXPR,
+						result, r_const)) != 1)
+	{
+	  if (wanted_code == NE_EXPR)
+	    {
+	      warning (0,
+		       "%<or%> of unmatched not-equal tests"
+		       " is always 1");
+	      return constant_boolean_node (true, truth_type);
+	    }
+	  else
+	    {
+	      warning (0,
+		       "%<and%> of mutually exclusive equal-tests"
+		       " is always 0");
+	      return constant_boolean_node (false, truth_type);
+	    }
+	}
+
+      if (lnbitpos < 0)
+	return 0;
+
+      /* The constants are combined so as to line up with the loaded
+	 field, so use the same parameters.  */
+      ld_arg[1][0] = int_const_binop (BIT_IOR_EXPR, l_const, r_const);
+      toshift[1][0] = MIN (xll_bitpos, xrl_bitpos);
+      shifted[1][0] = 0;
+      bitpos[1][0] = lnbitpos;
+      bitsiz[1][0] = lnbitsize;
+
+      if (parts > 1)
+	reuse_split_load (ld_arg[1], bitpos[1], bitsiz[1], toshift[1],
+			  shifted[1], xmask[1],
+			  lnbitpos + GET_MODE_BITSIZE (lnmode),
+			  lr_reversep);
+
+      lr_mask = build_int_cst_type (TREE_TYPE (ld_arg[1][0]), -1);
+
+      /* If the compiler thinks this is used uninitialized below, it's
+	 because it can't realize that parts can only be 2 when
+	 comparing with constants if l_split_load is also true.  This
+	 just silences the warning.  */
+      rnbitpos = 0;
+    }
+
+  ll_mask = int_const_binop (BIT_IOR_EXPR, ll_mask, rl_mask);
+  toshift[0][0] = MIN (xll_bitpos, xrl_bitpos);
+  shifted[0][0] = 0;
+
+  if (!l_split_load)
+    {
+      bitpos[0][0] = lnbitpos;
+      bitsiz[0][0] = lnbitsize;
+      ld_arg[0][0] = make_bit_field_ref (loc, ll_inner, ll_arg,
+					 lntype, lnbitsize, lnbitpos,
+					 ll_unsignedp || rl_unsignedp,
+					 ll_reversep);
+    }
 
+  if (parts > 1)
+    {
+      if (l_split_load)
+	build_split_load (ld_arg[0], bitpos[0], bitsiz[0], toshift[0],
+			  shifted[0], loc, ll_inner, ll_arg,
+			  lnmode, lnmode2, lnbitpos, ll_reversep);
+      else
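+	/* The left-hand operand fits in a single load; reuse it for both
+	   parts, with the right-hand split point translated into the
+	   left-hand object's bit positions.  */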
+	reuse_split_load (ld_arg[0], bitpos[0], bitsiz[0], toshift[0],
+			  shifted[0], xmask[0],
+			  rnbitpos + GET_MODE_BITSIZE (rnmode)
+			  - lr_bitpos + ll_bitpos, ll_reversep);
+    }
+
+  for (int i = 0; i < parts; i++)
+    {
+      tree op[2] = { ld_arg[0][i], ld_arg[1][i] };
+      tree mask[2] = { ll_mask, lr_mask };
+
+      for (int j = 0; j < 2; j++)
+	{
+	  /* Mask out the bits belonging to the other part.  */
+	  if (xmask[j][i])
+	    mask[j] = int_const_binop (BIT_AND_EXPR, mask[j], xmask[j][i]);
+
+	  if (shifted[j][i])
+	    {
+	      tree shiftsz = bitsize_int (shifted[j][i]);
+	      mask[j] = int_const_binop (RSHIFT_EXPR, mask[j], shiftsz);
+	    }
+	  mask[j] = fold_convert_loc (loc, TREE_TYPE (op[j]), mask[j]);
+	}
+
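+      /* Line up the bits being compared: right-shift the operand (and
+	 its mask) that still needs the larger shift, by the difference
+	 between the two shift counts.  */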
+      HOST_WIDE_INT shift = (toshift[0][i] - toshift[1][i]);
+
+      if (shift)
+	{
+	  int j;
+	  if (shift > 0)
+	    j = 0;
+	  else
+	    {
+	      j = 1;
+	      shift = -shift;
+	    }
+
+	  tree shiftsz = bitsize_int (shift);
+	  op[j] = fold_build2_loc (loc, RSHIFT_EXPR, TREE_TYPE (op[j]),
+				   op[j], shiftsz);
+	  mask[j] = int_const_binop (RSHIFT_EXPR, mask[j], shiftsz);
+	}
+
+      /* Convert to the smaller type before masking out unwanted
+	 bits.  */
+      tree type = TREE_TYPE (op[0]);
+      if (type != TREE_TYPE (op[1]))
+	{
+	  int j = (TYPE_PRECISION (type)
+		   < TYPE_PRECISION (TREE_TYPE (op[1])));
+	  if (!j)
+	    type = TREE_TYPE (op[1]);
+	  op[j] = fold_convert_loc (loc, type, op[j]);
+	  mask[j] = fold_convert_loc (loc, type, mask[j]);
+	}
+
+      for (int j = 0; j < 2; j++)
+	if (! integer_all_onesp (mask[j]))
+	  op[j] = build2_loc (loc, BIT_AND_EXPR, type,
+			      op[j], mask[j]);
+
+      cmp[i] = build2_loc (loc, wanted_code, truth_type, op[0], op[1]);
+    }
+
+  if (first1)
+    std::swap (cmp[0], cmp[1]);
+
+  if (parts == 1)
+    result = cmp[0];
+  else if (!separatep || !maybe_separate)
+    result = build2_loc (loc, orig_code, truth_type, cmp[0], cmp[1]);
+  else
+    {
+      result = cmp[0];
+      *separatep = cmp[1];
+    }
+
+  return result;
+}
+
 /* Try to simplify the AND of two comparisons, specified by
    (OP1A CODE1 OP1B) and (OP2B CODE2 OP2B), respectively.
    If this can be simplified to a single expression (without requiring
@@ -5988,6 +7126,13 @@ maybe_fold_and_comparisons (tree type,
 						     op2b))
     return t;
 
+  if (tree t = fold_truth_andor_maybe_separate (UNKNOWN_LOCATION,
+						TRUTH_ANDIF_EXPR, type,
+						code2, op2a, op2b,
+						code1, op1a, op1b,
+						NULL))
+    return t;
+
   return NULL_TREE;
 }
 
diff --git a/gcc/testsuite/gcc.dg/field-merge-1.c b/gcc/testsuite/gcc.dg/field-merge-1.c
new file mode 100644
index 00000000..1818e104
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/field-merge-1.c
@@ -0,0 +1,64 @@
+/* { dg-do run } */
+/* { dg-options "-O -save-temps -fdump-tree-optimized" } */
+
+/* Check that field loads compared with constants are merged, even if
+   tested out of order, and when fields straddle across alignment
+   boundaries.  */
+
+struct TL {
+  unsigned char p;
+  unsigned int a;
+  unsigned char q;
+  unsigned int b;
+  unsigned char r;
+  unsigned int c;
+  unsigned char s;
+} __attribute__ ((packed, aligned (4), scalar_storage_order ("little-endian")));
+
+struct TB {
+  unsigned char p;
+  unsigned int a;
+  unsigned char q;
+  unsigned int b;
+  unsigned char r;
+  unsigned int c;
+  unsigned char s;
+} __attribute__ ((packed, aligned (4), scalar_storage_order ("big-endian")));
+
+#define vc 0xaa
+#define vi 0x12345678
+
+struct TL vL = { vc, vi, vc, vi, vc, vi, vc };
+struct TB vB = { vc, vi, vc, vi, vc, vi, vc };
+
+void f (void) {
+  /* Which words of    | vL | vB | */
+  /* are accessed by   |0123|0123| */
+  if (0 /* the tests?  |    |    | */
+      || vL.p != vc /* |*   |    | */
+      || vB.p != vc /* |    |*   | */
+      || vL.s != vc /* |   *|    | */
+      || vB.q != vc /* |    | *  | */
+      || vL.a != vi /* |^*  |    | */
+      || vB.b != vi /* |    | ^* | */
+      || vL.c != vi /* |  *^|    | */
+      || vB.c != vi /* |    |  ^*| */
+      || vL.b != vi /* | ^^ |    | */
+      || vL.q != vc /* | ^  |    | */
+      || vB.a != vi /* |    |^^  | */
+      || vB.r != vc /* |    |  ^ | */
+      || vB.s != vc /* |    |   ^| */
+      || vL.r != vc /* |  ^ |    | */
+      )
+    __builtin_abort ();
+}
+
+int main () {
+  f ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 8 "optimized" } } */
+/* { dg-final { scan-assembler-not "cmpb" { target { i*86-*-* || x86_64-*-* } } } } */
+/* { dg-final { scan-assembler-times "cmpl" 8 { target { i*86-*-* || x86_64-*-* } } } } */
+/* { dg-final { scan-assembler-times "cmpw" 8 { target { powerpc*-*-* || rs6000-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/field-merge-2.c b/gcc/testsuite/gcc.dg/field-merge-2.c
new file mode 100644
index 00000000..80c573b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/field-merge-2.c
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-options "-O" } */
+
+struct TL {
+  unsigned short a;
+  unsigned short b;
+} __attribute__ ((packed, aligned (8)));
+
+struct TB {
+  unsigned char p;
+  unsigned short a;
+  unsigned short b;
+} __attribute__ ((packed, aligned (8)));
+
+#define vc 0xaa
+
+struct TL vL = { vc, vc };
+struct TB vB = { vc, vc, vc };
+
+void f (void) {
+  if (0
+      || vL.b != vB.b
+      || vL.a != vB.a
+      )
+    __builtin_abort ();
+}
+
+int main () {
+  f ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/field-merge-3.c b/gcc/testsuite/gcc.dg/field-merge-3.c
new file mode 100644
index 00000000..f26e8a9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/field-merge-3.c
@@ -0,0 +1,36 @@
+/* { dg-do run } */
+/* { dg-options "-O" } */
+
+const int BIG_ENDIAN_P = (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__);
+
+struct T1 {
+  unsigned char p[2];
+  unsigned short a;
+  unsigned int z;
+} __attribute__((__aligned__(8)));
+
+struct T2 {
+  unsigned short p;
+  unsigned short a;
+  unsigned int z;
+} __attribute__((__aligned__(8)));
+
+#define vc 0xaa
+#define vi 0x12345678
+
+struct T1 v1 = { { vc + !BIG_ENDIAN_P, vc + BIG_ENDIAN_P }, vc, vi };
+struct T2 v2 = { (vc << 8) | (vc - 1), vc, vi };
+
+void f (void) {
+  if (0
+      || v1.p[!BIG_ENDIAN_P] != v2.p >> 8
+      || v1.a != v2.a
+      || ((v1.z ^ v2.z) & 0xff00ff00) != 0
+      || ((v1.z ^ v2.z) & 0x00ff00ff) != 0)
+    __builtin_abort ();
+}
+
+int main () {
+  f ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/field-merge-4.c b/gcc/testsuite/gcc.dg/field-merge-4.c
new file mode 100644
index 00000000..c629069
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/field-merge-4.c
@@ -0,0 +1,40 @@
+/* { dg-do run } */
+/* { dg-options "-O" } */
+
+struct T1 {
+  unsigned int zn;
+  unsigned char p;
+  unsigned char qn;
+  unsigned short a;
+  unsigned int z;
+} __attribute__((__packed__, __aligned__(4)));
+
+struct T2 {
+  unsigned int zn;
+  unsigned char rn;
+  unsigned char p;
+  unsigned char qn;
+  unsigned short a;
+  unsigned int z;
+} __attribute__((__packed__, __aligned__(4)));
+
+#define vc 0xaa
+#define vs 0xccdd
+#define vi 0x12345678
+
+struct T1 v1 = { -1, vc, 1, vs, vi };
+struct T2 v2 = { -1, 0, vc, 1, vs, vi };
+
+void f (void) {
+  if (0
+      || v1.p != v2.p
+      || v1.a != v2.a
+      || v1.z != v2.z
+      )
+    __builtin_abort ();
+}
+
+int main () {
+  f ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/field-merge-5.c b/gcc/testsuite/gcc.dg/field-merge-5.c
new file mode 100644
index 00000000..1580b14bc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/field-merge-5.c
@@ -0,0 +1,40 @@
+/* { dg-do run } */
+/* { dg-options "-O" } */
+
+struct T1 {
+  unsigned int zn;
+  unsigned char p;
+  unsigned char qn;
+  unsigned short a;
+  unsigned int z;
+} __attribute__((__packed__, __aligned__(8)));
+
+struct T2 {
+  unsigned int zn;
+  unsigned char rn;
+  unsigned char p;
+  unsigned char qn;
+  unsigned short a;
+  unsigned int z;
+} __attribute__((__packed__, __aligned__(8)));
+
+#define vc 0xaa
+#define vs 0xccdd
+#define vi 0x12345678
+
+struct T1 v1 = { -1, vc, 1, vs, vi };
+struct T2 v2 = { -1, 0, vc, 1, vs, vi };
+
+void f (void) {
+  if (0
+      || v1.p != v2.p
+      || v1.a != v2.a
+      || v1.z != v2.z
+      )
+    __builtin_abort ();
+}
+
+int main () {
+  f ();
+  return 0;
+}
diff --git a/gcc/tree-ssa-ifcombine.c b/gcc/tree-ssa-ifcombine.c
index 21a70f4..6234756 100644
--- a/gcc/tree-ssa-ifcombine.c
+++ b/gcc/tree-ssa-ifcombine.c
@@ -40,7 +40,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimplify-me.h"
 #include "tree-cfg.h"
 #include "tree-ssa.h"
-
+ 
 #ifndef LOGICAL_OP_NON_SHORT_CIRCUIT
 #define LOGICAL_OP_NON_SHORT_CIRCUIT \
   (BRANCH_COST (optimize_function_for_speed_p (cfun), \
@@ -126,7 +126,7 @@ bb_no_side_effects_p (basic_block bb)
       if (gimple_has_side_effects (stmt)
 	  || gimple_uses_undefined_value_p (stmt)
 	  || gimple_could_trap_p (stmt)
-	  || gimple_vuse (stmt)
+	  || gimple_has_volatile_ops (stmt)
 	  /* const calls don't match any of the above, yet they could
 	     still have some side-effects - they could contain
 	     gimple_could_trap_p statements, like floating point
@@ -561,8 +561,8 @@ ifcombine_ifandif (basic_block inner_cond_bb, bool inner_inv,
 					    gimple_cond_lhs (outer_cond),
 					    gimple_cond_rhs (outer_cond))))
 	{
+	  {
 	  tree t1, t2;
-	  gimple_stmt_iterator gsi;
 	  bool logical_op_non_short_circuit = LOGICAL_OP_NON_SHORT_CIRCUIT;
 	  if (param_logical_op_non_short_circuit != -1)
 	    logical_op_non_short_circuit
@@ -584,6 +584,8 @@ ifcombine_ifandif (basic_block inner_cond_bb, bool inner_inv,
 				gimple_cond_rhs (outer_cond));
 	  t = fold_build2_loc (gimple_location (inner_cond), 
 			       TRUTH_AND_EXPR, boolean_type_node, t1, t2);
+	  }
+	gimplify_after_fold:
 	  if (result_inv)
 	    {
 	      t = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (t), t);
@@ -593,6 +595,9 @@ ifcombine_ifandif (basic_block inner_cond_bb, bool inner_inv,
 	  t = force_gimple_operand_gsi_1 (&gsi, t, is_gimple_condexpr, NULL, true,
 					  GSI_SAME_STMT);
         }
+      /* ??? Fold should avoid this.  */
+      else if (!is_gimple_condexpr (t))
+	goto gimplify_after_fold;
       if (result_inv)
 	t = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (t), t);
       t = canonicalize_cond_expr_cond (t);
Richard Biener Oct. 6, 2020, 8:33 a.m. UTC | #8
On Fri, Oct 2, 2020 at 10:43 AM Alexandre Oliva <oliva@adacore.com> wrote:
>
> Here's what I got so far, passing regstrap except for field-merge-1.c,
> that relies on combining non-adjacent compares, which I haven't
> implemented yet.

Thanks for trying.

> I had to retain some parts of fold_truth_andor_1 to avoid regressions in
> gcc.c-torture/execute/ieee/compare-fp-3.c and gcc.dg/pr81228.c.
> gcc.target/i386/pr37248-2.c and gcc.target/i386/pr37248-3.c required
> turning sign tests into masks.
>
> I moved the field-compare-merging bits of fold_truth_andor_1, and
> auxiliary functions for it, into fold_truth_andor_maybe_separate, though
> the ability to deal with non-adjacent compares is no longer used, at
> least for the time being.
>
> I put fold_truth_andor_maybe_separate in gimple-fold.c, and called it in
> maybe_fold_and_comparisons, but...  the fact that it may return a tree
> that isn't a gimple condexpr, e.g. when there is masking or shifting,
> and that requires re-gimplification, suggests moving it to
> tree-ssa-ifcombine.c where this can be dealt with, possibly even for
> non-adjacent compares.  Right now, there is code in ifcombine to
> re-gimplify results from the regular folder, which I've extended, for
> the time being, to maybe_fold_and_comparisons results as well.

A bit of a band-aid, but let it be this way for now.

> I've reworked decode_field_reference to seek and "substitute" SSA DEFs
> in a way that avoids building new trees, but the test for
> substitutability feels incomplete: there's nothing to ensure that vuses
> aren't brought from much-earlier blocks, and that there couldn't
> possibly be an intervening store.  I suspect I will have to give these
> pieces a little more info for it to be able to tell whether the memory
> accesses, if moved, would still get the same value.  Is there anything
> easily reusable for this sort of test?

Well, the easiest way (and matching the GENERIC folding) is to
make sure the VUSEs are the same.  I didn't quite get the "substitute"
code though.
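
Something like the following (untested, with load1/load2 standing for
the defining statements of the two loads being combined) would make
that requirement explicit:

      /* Only combine the two accesses if they see the same memory
	 state, i.e. no store can intervene between them.  */
      if (gimple_vuse (load1) != gimple_vuse (load2))
	return NULL_TREE;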

>
> As for fields crossing alignment boundaries, the two-tests condition
> currently returned by fold_truth_andor_maybe_separate, that ends up
> gimplified into a new block, causes the chain of ifcombine optimizations
> to stop, something that I'm yet to investigate.  My plan is to rearrange
> ifcombine_ifandif to call fold_truth_andor_maybe_separate directly, and
> to handle such resplit conditions by inserting one in each of the
> preexisting blocks, to simplify the gimplification and in preparation
> for combining non-adjacent compares, if we wish to do that.  Do we?

I guess it's simply because we iterate over a fixed set of BBs?  Or of course
if you re-shape conditions rather than just "merging" aka eliding one you
might create new opportunities.

>  I
> was convinced that it was a safe optimization, because of the absence of
> intervening side effects and the undefinedness of intervening trapping
> memory accesses, but combining later tests that access a word with
> earlier tests that access the same word may cause intervening trapping
> accesses to be skipped.  Is this a possibility we should enable or
> disable on e.g. a per-language basis, avoid altogether, or ...  any
> other suggestions?

ISTR ifcombine checks whether there are side-effects (such as trapping)
that might be skipped.  And indeed we shouldn't do such thing though
eliding a trap is OK-ish, introducing one is obviously not ;)  We do,
after all, fold 0/x to 0 (just not literal 0/0).

>
> Richi, is this sort of what you had in mind, or were you thinking of
> something for match.pd or so?  Is match.pd even able to perform
> cross-block multi-condition matches?

So what I see in your patch is matching the condition itself - this
part can be done using match.pd (match ...) patterns in a much
nicer way, see for example the use of gimple_ctz_table_index
in tree-ssa-forwprop.c and the corresponding

/* Match count trailing zeroes for simplify_count_trailing_zeroes in fwprop.
   The canonical form is array[((x & -x) * C) >> SHIFT] where C is a magic
   constant which when multiplied by a power of 2 contains a unique value
   in the top 5 or 6 bits.  This is then indexed into a table which maps it
   to the number of trailing zeroes.  */
(match (ctz_table_index @1 @2 @3)
  (rshift (mult (bit_and:c (negate @1) @1) INTEGER_CST@2) INTEGER_CST@3))

match.pd itself doesn't match "control flow", that is you'd have to simplify
  if (c == 1) if (c != 5) { ... }
as (c == 1) & (c != 5) which match.pd might turn into 0.

> Any further advice as to my plans above?

I do think we eventually want to remove all GENERIC time handling of _load_
simplifications.  Note with match.pd you cannot easily handle loads, but
eventually there's bits that do not involve loads that can be implemented
as match.pd patterns.

Otherwise thanks for working on this.

Richard.

> Thanks,
>
>
> diff --git a/gcc/fold-const.c b/gcc/fold-const.c
> index 0cc80ad..47b5419 100644
> --- a/gcc/fold-const.c
> +++ b/gcc/fold-const.c
> @@ -125,7 +125,6 @@ static tree range_predecessor (tree);
>  static tree range_successor (tree);
>  static tree fold_range_test (location_t, enum tree_code, tree, tree, tree);
>  static tree fold_cond_expr_with_comparison (location_t, tree, tree, tree, tree);
> -static tree unextend (tree, int, int, tree);
>  static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *);
>  static tree extract_muldiv_1 (tree, tree, enum tree_code, tree, bool *);
>  static tree fold_binary_op_with_conditional_arg (location_t,
> @@ -4353,7 +4352,7 @@ invert_truthvalue_loc (location_t loc, tree arg)
>     is the original memory reference used to preserve the alias set of
>     the access.  */
>
> -static tree
> +tree
>  make_bit_field_ref (location_t loc, tree inner, tree orig_inner, tree type,
>                     HOST_WIDE_INT bitsize, poly_int64 bitpos,
>                     int unsignedp, int reversep)
> @@ -4603,132 +4602,6 @@ optimize_bit_field_compare (location_t loc, enum tree_code code,
>    return lhs;
>  }
>
> -/* Subroutine for fold_truth_andor_1: decode a field reference.
> -
> -   If EXP is a comparison reference, we return the innermost reference.
> -
> -   *PBITSIZE is set to the number of bits in the reference, *PBITPOS is
> -   set to the starting bit number.
> -
> -   If the innermost field can be completely contained in a mode-sized
> -   unit, *PMODE is set to that mode.  Otherwise, it is set to VOIDmode.
> -
> -   *PVOLATILEP is set to 1 if the any expression encountered is volatile;
> -   otherwise it is not changed.
> -
> -   *PUNSIGNEDP is set to the signedness of the field.
> -
> -   *PREVERSEP is set to the storage order of the field.
> -
> -   *PMASK is set to the mask used.  This is either contained in a
> -   BIT_AND_EXPR or derived from the width of the field.
> -
> -   *PAND_MASK is set to the mask found in a BIT_AND_EXPR, if any.
> -
> -   Return 0 if this is not a component reference or is one that we can't
> -   do anything with.  */
> -
> -static tree
> -decode_field_reference (location_t loc, tree *exp_, HOST_WIDE_INT *pbitsize,
> -                       HOST_WIDE_INT *pbitpos, machine_mode *pmode,
> -                       int *punsignedp, int *preversep, int *pvolatilep,
> -                       tree *pmask, tree *pand_mask)
> -{
> -  tree exp = *exp_;
> -  tree outer_type = 0;
> -  tree and_mask = 0;
> -  tree mask, inner, offset;
> -  tree unsigned_type;
> -  unsigned int precision;
> -
> -  /* All the optimizations using this function assume integer fields.
> -     There are problems with FP fields since the type_for_size call
> -     below can fail for, e.g., XFmode.  */
> -  if (! INTEGRAL_TYPE_P (TREE_TYPE (exp)))
> -    return NULL_TREE;
> -
> -  /* We are interested in the bare arrangement of bits, so strip everything
> -     that doesn't affect the machine mode.  However, record the type of the
> -     outermost expression if it may matter below.  */
> -  if (CONVERT_EXPR_P (exp)
> -      || TREE_CODE (exp) == NON_LVALUE_EXPR)
> -    outer_type = TREE_TYPE (exp);
> -  STRIP_NOPS (exp);
> -
> -  if (TREE_CODE (exp) == BIT_AND_EXPR)
> -    {
> -      and_mask = TREE_OPERAND (exp, 1);
> -      exp = TREE_OPERAND (exp, 0);
> -      STRIP_NOPS (exp); STRIP_NOPS (and_mask);
> -      if (TREE_CODE (and_mask) != INTEGER_CST)
> -       return NULL_TREE;
> -    }
> -
> -  poly_int64 poly_bitsize, poly_bitpos;
> -  inner = get_inner_reference (exp, &poly_bitsize, &poly_bitpos, &offset,
> -                              pmode, punsignedp, preversep, pvolatilep);
> -  if ((inner == exp && and_mask == 0)
> -      || !poly_bitsize.is_constant (pbitsize)
> -      || !poly_bitpos.is_constant (pbitpos)
> -      || *pbitsize < 0
> -      || offset != 0
> -      || TREE_CODE (inner) == PLACEHOLDER_EXPR
> -      /* Reject out-of-bound accesses (PR79731).  */
> -      || (! AGGREGATE_TYPE_P (TREE_TYPE (inner))
> -         && compare_tree_int (TYPE_SIZE (TREE_TYPE (inner)),
> -                              *pbitpos + *pbitsize) < 0))
> -    return NULL_TREE;
> -
> -  unsigned_type = lang_hooks.types.type_for_size (*pbitsize, 1);
> -  if (unsigned_type == NULL_TREE)
> -    return NULL_TREE;
> -
> -  *exp_ = exp;
> -
> -  /* If the number of bits in the reference is the same as the bitsize of
> -     the outer type, then the outer type gives the signedness. Otherwise
> -     (in case of a small bitfield) the signedness is unchanged.  */
> -  if (outer_type && *pbitsize == TYPE_PRECISION (outer_type))
> -    *punsignedp = TYPE_UNSIGNED (outer_type);
> -
> -  /* Compute the mask to access the bitfield.  */
> -  precision = TYPE_PRECISION (unsigned_type);
> -
> -  mask = build_int_cst_type (unsigned_type, -1);
> -
> -  mask = const_binop (LSHIFT_EXPR, mask, size_int (precision - *pbitsize));
> -  mask = const_binop (RSHIFT_EXPR, mask, size_int (precision - *pbitsize));
> -
> -  /* Merge it with the mask we found in the BIT_AND_EXPR, if any.  */
> -  if (and_mask != 0)
> -    mask = fold_build2_loc (loc, BIT_AND_EXPR, unsigned_type,
> -                       fold_convert_loc (loc, unsigned_type, and_mask), mask);
> -
> -  *pmask = mask;
> -  *pand_mask = and_mask;
> -  return inner;
> -}
> -
> -/* Return nonzero if MASK represents a mask of SIZE ones in the low-order
> -   bit positions and MASK is SIGNED.  */
> -
> -static bool
> -all_ones_mask_p (const_tree mask, unsigned int size)
> -{
> -  tree type = TREE_TYPE (mask);
> -  unsigned int precision = TYPE_PRECISION (type);
> -
> -  /* If this function returns true when the type of the mask is
> -     UNSIGNED, then there will be errors.  In particular see
> -     gcc.c-torture/execute/990326-1.c.  There does not appear to be
> -     any documentation paper trail as to why this is so.  But the pre
> -     wide-int worked with that restriction and it has been preserved
> -     here.  */
> -  if (size > precision || TYPE_SIGN (type) == UNSIGNED)
> -    return false;
> -
> -  return wi::mask (size, false, precision) == wi::to_wide (mask);
> -}
>
>  /* Subroutine for fold: determine if VAL is the INTEGER_CONST that
>     represents the sign bit of EXP's type.  If EXP represents a sign
> @@ -6008,48 +5881,6 @@ fold_range_test (location_t loc, enum tree_code code, tree type,
>    return 0;
>  }
>
> -/* Subroutine for fold_truth_andor_1: C is an INTEGER_CST interpreted as a P
> -   bit value.  Arrange things so the extra bits will be set to zero if and
> -   only if C is signed-extended to its full width.  If MASK is nonzero,
> -   it is an INTEGER_CST that should be AND'ed with the extra bits.  */
> -
> -static tree
> -unextend (tree c, int p, int unsignedp, tree mask)
> -{
> -  tree type = TREE_TYPE (c);
> -  int modesize = GET_MODE_BITSIZE (SCALAR_INT_TYPE_MODE (type));
> -  tree temp;
> -
> -  if (p == modesize || unsignedp)
> -    return c;
> -
> -  /* We work by getting just the sign bit into the low-order bit, then
> -     into the high-order bit, then sign-extend.  We then XOR that value
> -     with C.  */
> -  temp = build_int_cst (TREE_TYPE (c),
> -                       wi::extract_uhwi (wi::to_wide (c), p - 1, 1));
> -
> -  /* We must use a signed type in order to get an arithmetic right shift.
> -     However, we must also avoid introducing accidental overflows, so that
> -     a subsequent call to integer_zerop will work.  Hence we must
> -     do the type conversion here.  At this point, the constant is either
> -     zero or one, and the conversion to a signed type can never overflow.
> -     We could get an overflow if this conversion is done anywhere else.  */
> -  if (TYPE_UNSIGNED (type))
> -    temp = fold_convert (signed_type_for (type), temp);
> -
> -  temp = const_binop (LSHIFT_EXPR, temp, size_int (modesize - 1));
> -  temp = const_binop (RSHIFT_EXPR, temp, size_int (modesize - p - 1));
> -  if (mask != 0)
> -    temp = const_binop (BIT_AND_EXPR, temp,
> -                       fold_convert (TREE_TYPE (c), mask));
> -  /* If necessary, convert the type back to match the type of C.  */
> -  if (TYPE_UNSIGNED (type))
> -    temp = fold_convert (type, temp);
> -
> -  return fold_convert (type, const_binop (BIT_XOR_EXPR, c, temp));
> -}
> -
>  /* For an expression that has the form
>       (A && B) || ~B
>     or
> @@ -6119,20 +5950,13 @@ merge_truthop_with_opposite_arm (location_t loc, tree op, tree cmpop,
>                             lhs, rhs);
>    return NULL_TREE;
>  }
> -
> +
>  /* Find ways of folding logical expressions of LHS and RHS:
>     Try to merge two comparisons to the same innermost item.
>     Look for range tests like "ch >= '0' && ch <= '9'".
>     Look for combinations of simple terms on machines with expensive branches
>     and evaluate the RHS unconditionally.
>
> -   For example, if we have p->a == 2 && p->b == 4 and we can make an
> -   object large enough to span both A and B, we can do this with a comparison
> -   against the object ANDed with the a mask.
> -
> -   If we have p->a == q->a && p->b == q->b, we may be able to use bit masking
> -   operations to do this with one comparison.
> -
>     We check for both normal comparisons and the BIT_AND_EXPRs made this by
>     function and the one above.
>
> @@ -6157,24 +5981,9 @@ fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type,
>       convert EQ_EXPR to NE_EXPR so we need not reject the "wrong"
>       comparison for one-bit fields.  */
>
> -  enum tree_code wanted_code;
>    enum tree_code lcode, rcode;
>    tree ll_arg, lr_arg, rl_arg, rr_arg;
> -  tree ll_inner, lr_inner, rl_inner, rr_inner;
> -  HOST_WIDE_INT ll_bitsize, ll_bitpos, lr_bitsize, lr_bitpos;
> -  HOST_WIDE_INT rl_bitsize, rl_bitpos, rr_bitsize, rr_bitpos;
> -  HOST_WIDE_INT xll_bitpos, xlr_bitpos, xrl_bitpos, xrr_bitpos;
> -  HOST_WIDE_INT lnbitsize, lnbitpos, rnbitsize, rnbitpos;
> -  int ll_unsignedp, lr_unsignedp, rl_unsignedp, rr_unsignedp;
> -  int ll_reversep, lr_reversep, rl_reversep, rr_reversep;
> -  machine_mode ll_mode, lr_mode, rl_mode, rr_mode;
> -  scalar_int_mode lnmode, rnmode;
> -  tree ll_mask, lr_mask, rl_mask, rr_mask;
> -  tree ll_and_mask, lr_and_mask, rl_and_mask, rr_and_mask;
> -  tree l_const, r_const;
> -  tree lntype, rntype, result;
> -  HOST_WIDE_INT first_bit, end_bit;
> -  int volatilep;
> +  tree result;
>
>    /* Start by getting the comparison codes.  Fail if anything is volatile.
>       If one operand is a BIT_AND_EXPR with the constant one, treat it as if
> @@ -6269,306 +6078,7 @@ fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type,
>                            build_int_cst (TREE_TYPE (ll_arg), 0));
>      }
>
> -  /* See if the comparisons can be merged.  Then get all the parameters for
> -     each side.  */
> -
> -  if ((lcode != EQ_EXPR && lcode != NE_EXPR)
> -      || (rcode != EQ_EXPR && rcode != NE_EXPR))
> -    return 0;
> -
> -  ll_reversep = lr_reversep = rl_reversep = rr_reversep = 0;
> -  volatilep = 0;
> -  ll_inner = decode_field_reference (loc, &ll_arg,
> -                                    &ll_bitsize, &ll_bitpos, &ll_mode,
> -                                    &ll_unsignedp, &ll_reversep, &volatilep,
> -                                    &ll_mask, &ll_and_mask);
> -  lr_inner = decode_field_reference (loc, &lr_arg,
> -                                    &lr_bitsize, &lr_bitpos, &lr_mode,
> -                                    &lr_unsignedp, &lr_reversep, &volatilep,
> -                                    &lr_mask, &lr_and_mask);
> -  rl_inner = decode_field_reference (loc, &rl_arg,
> -                                    &rl_bitsize, &rl_bitpos, &rl_mode,
> -                                    &rl_unsignedp, &rl_reversep, &volatilep,
> -                                    &rl_mask, &rl_and_mask);
> -  rr_inner = decode_field_reference (loc, &rr_arg,
> -                                    &rr_bitsize, &rr_bitpos, &rr_mode,
> -                                    &rr_unsignedp, &rr_reversep, &volatilep,
> -                                    &rr_mask, &rr_and_mask);
> -
> -  /* It must be true that the inner operation on the lhs of each
> -     comparison must be the same if we are to be able to do anything.
> -     Then see if we have constants.  If not, the same must be true for
> -     the rhs's.  */
> -  if (volatilep
> -      || ll_reversep != rl_reversep
> -      || ll_inner == 0 || rl_inner == 0
> -      || ! operand_equal_p (ll_inner, rl_inner, 0))
> -    return 0;
> -
> -  if (TREE_CODE (lr_arg) == INTEGER_CST
> -      && TREE_CODE (rr_arg) == INTEGER_CST)
> -    {
> -      l_const = lr_arg, r_const = rr_arg;
> -      lr_reversep = ll_reversep;
> -    }
> -  else if (lr_reversep != rr_reversep
> -          || lr_inner == 0 || rr_inner == 0
> -          || ! operand_equal_p (lr_inner, rr_inner, 0))
> -    return 0;
> -  else
> -    l_const = r_const = 0;
> -
> -  /* If either comparison code is not correct for our logical operation,
> -     fail.  However, we can convert a one-bit comparison against zero into
> -     the opposite comparison against that bit being set in the field.  */
> -
> -  wanted_code = (code == TRUTH_AND_EXPR ? EQ_EXPR : NE_EXPR);
> -  if (lcode != wanted_code)
> -    {
> -      if (l_const && integer_zerop (l_const) && integer_pow2p (ll_mask))
> -       {
> -         /* Make the left operand unsigned, since we are only interested
> -            in the value of one bit.  Otherwise we are doing the wrong
> -            thing below.  */
> -         ll_unsignedp = 1;
> -         l_const = ll_mask;
> -       }
> -      else
> -       return 0;
> -    }
> -
> -  /* This is analogous to the code for l_const above.  */
> -  if (rcode != wanted_code)
> -    {
> -      if (r_const && integer_zerop (r_const) && integer_pow2p (rl_mask))
> -       {
> -         rl_unsignedp = 1;
> -         r_const = rl_mask;
> -       }
> -      else
> -       return 0;
> -    }
> -
> -  /* See if we can find a mode that contains both fields being compared on
> -     the left.  If we can't, fail.  Otherwise, update all constants and masks
> -     to be relative to a field of that size.  */
> -  first_bit = MIN (ll_bitpos, rl_bitpos);
> -  end_bit = MAX (ll_bitpos + ll_bitsize, rl_bitpos + rl_bitsize);
> -  if (!get_best_mode (end_bit - first_bit, first_bit, 0, 0,
> -                     TYPE_ALIGN (TREE_TYPE (ll_inner)), BITS_PER_WORD,
> -                     volatilep, &lnmode))
> -    return 0;
> -
> -  lnbitsize = GET_MODE_BITSIZE (lnmode);
> -  lnbitpos = first_bit & ~ (lnbitsize - 1);
> -  lntype = lang_hooks.types.type_for_size (lnbitsize, 1);
> -  xll_bitpos = ll_bitpos - lnbitpos, xrl_bitpos = rl_bitpos - lnbitpos;
> -
> -  if (ll_reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
> -    {
> -      xll_bitpos = lnbitsize - xll_bitpos - ll_bitsize;
> -      xrl_bitpos = lnbitsize - xrl_bitpos - rl_bitsize;
> -    }
> -
> -  ll_mask = const_binop (LSHIFT_EXPR, fold_convert_loc (loc, lntype, ll_mask),
> -                        size_int (xll_bitpos));
> -  rl_mask = const_binop (LSHIFT_EXPR, fold_convert_loc (loc, lntype, rl_mask),
> -                        size_int (xrl_bitpos));
> -
> -  if (l_const)
> -    {
> -      l_const = fold_convert_loc (loc, lntype, l_const);
> -      l_const = unextend (l_const, ll_bitsize, ll_unsignedp, ll_and_mask);
> -      l_const = const_binop (LSHIFT_EXPR, l_const, size_int (xll_bitpos));
> -      if (! integer_zerop (const_binop (BIT_AND_EXPR, l_const,
> -                                       fold_build1_loc (loc, BIT_NOT_EXPR,
> -                                                    lntype, ll_mask))))
> -       {
> -         warning (0, "comparison is always %d", wanted_code == NE_EXPR);
> -
> -         return constant_boolean_node (wanted_code == NE_EXPR, truth_type);
> -       }
> -    }
> -  if (r_const)
> -    {
> -      r_const = fold_convert_loc (loc, lntype, r_const);
> -      r_const = unextend (r_const, rl_bitsize, rl_unsignedp, rl_and_mask);
> -      r_const = const_binop (LSHIFT_EXPR, r_const, size_int (xrl_bitpos));
> -      if (! integer_zerop (const_binop (BIT_AND_EXPR, r_const,
> -                                       fold_build1_loc (loc, BIT_NOT_EXPR,
> -                                                    lntype, rl_mask))))
> -       {
> -         warning (0, "comparison is always %d", wanted_code == NE_EXPR);
> -
> -         return constant_boolean_node (wanted_code == NE_EXPR, truth_type);
> -       }
> -    }
> -
> -  /* If the right sides are not constant, do the same for it.  Also,
> -     disallow this optimization if a size, signedness or storage order
> -     mismatch occurs between the left and right sides.  */
> -  if (l_const == 0)
> -    {
> -      if (ll_bitsize != lr_bitsize || rl_bitsize != rr_bitsize
> -         || ll_unsignedp != lr_unsignedp || rl_unsignedp != rr_unsignedp
> -         || ll_reversep != lr_reversep
> -         /* Make sure the two fields on the right
> -            correspond to the left without being swapped.  */
> -         || ll_bitpos - rl_bitpos != lr_bitpos - rr_bitpos)
> -       return 0;
> -
> -      first_bit = MIN (lr_bitpos, rr_bitpos);
> -      end_bit = MAX (lr_bitpos + lr_bitsize, rr_bitpos + rr_bitsize);
> -      if (!get_best_mode (end_bit - first_bit, first_bit, 0, 0,
> -                         TYPE_ALIGN (TREE_TYPE (lr_inner)), BITS_PER_WORD,
> -                         volatilep, &rnmode))
> -       return 0;
> -
> -      rnbitsize = GET_MODE_BITSIZE (rnmode);
> -      rnbitpos = first_bit & ~ (rnbitsize - 1);
> -      rntype = lang_hooks.types.type_for_size (rnbitsize, 1);
> -      xlr_bitpos = lr_bitpos - rnbitpos, xrr_bitpos = rr_bitpos - rnbitpos;
> -
> -      if (lr_reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
> -       {
> -         xlr_bitpos = rnbitsize - xlr_bitpos - lr_bitsize;
> -         xrr_bitpos = rnbitsize - xrr_bitpos - rr_bitsize;
> -       }
> -
> -      lr_mask = const_binop (LSHIFT_EXPR, fold_convert_loc (loc,
> -                                                           rntype, lr_mask),
> -                            size_int (xlr_bitpos));
> -      rr_mask = const_binop (LSHIFT_EXPR, fold_convert_loc (loc,
> -                                                           rntype, rr_mask),
> -                            size_int (xrr_bitpos));
> -
> -      /* Make a mask that corresponds to both fields being compared.
> -        Do this for both items being compared.  If the operands are the
> -        same size and the bits being compared are in the same position
> -        then we can do this by masking both and comparing the masked
> -        results.  */
> -      ll_mask = const_binop (BIT_IOR_EXPR, ll_mask, rl_mask);
> -      lr_mask = const_binop (BIT_IOR_EXPR, lr_mask, rr_mask);
> -      if (lnbitsize == rnbitsize
> -         && xll_bitpos == xlr_bitpos
> -         && lnbitpos >= 0
> -         && rnbitpos >= 0)
> -       {
> -         lhs = make_bit_field_ref (loc, ll_inner, ll_arg,
> -                                   lntype, lnbitsize, lnbitpos,
> -                                   ll_unsignedp || rl_unsignedp, ll_reversep);
> -         if (! all_ones_mask_p (ll_mask, lnbitsize))
> -           lhs = build2 (BIT_AND_EXPR, lntype, lhs, ll_mask);
> -
> -         rhs = make_bit_field_ref (loc, lr_inner, lr_arg,
> -                                   rntype, rnbitsize, rnbitpos,
> -                                   lr_unsignedp || rr_unsignedp, lr_reversep);
> -         if (! all_ones_mask_p (lr_mask, rnbitsize))
> -           rhs = build2 (BIT_AND_EXPR, rntype, rhs, lr_mask);
> -
> -         return build2_loc (loc, wanted_code, truth_type, lhs, rhs);
> -       }
> -
> -      /* There is still another way we can do something:  If both pairs of
> -        fields being compared are adjacent, we may be able to make a wider
> -        field containing them both.
> -
> -        Note that we still must mask the lhs/rhs expressions.  Furthermore,
> -        the mask must be shifted to account for the shift done by
> -        make_bit_field_ref.  */
> -      if (((ll_bitsize + ll_bitpos == rl_bitpos
> -           && lr_bitsize + lr_bitpos == rr_bitpos)
> -          || (ll_bitpos == rl_bitpos + rl_bitsize
> -              && lr_bitpos == rr_bitpos + rr_bitsize))
> -         && ll_bitpos >= 0
> -         && rl_bitpos >= 0
> -         && lr_bitpos >= 0
> -         && rr_bitpos >= 0)
> -       {
> -         tree type;
> -
> -         lhs = make_bit_field_ref (loc, ll_inner, ll_arg, lntype,
> -                                   ll_bitsize + rl_bitsize,
> -                                   MIN (ll_bitpos, rl_bitpos),
> -                                   ll_unsignedp, ll_reversep);
> -         rhs = make_bit_field_ref (loc, lr_inner, lr_arg, rntype,
> -                                   lr_bitsize + rr_bitsize,
> -                                   MIN (lr_bitpos, rr_bitpos),
> -                                   lr_unsignedp, lr_reversep);
> -
> -         ll_mask = const_binop (RSHIFT_EXPR, ll_mask,
> -                                size_int (MIN (xll_bitpos, xrl_bitpos)));
> -         lr_mask = const_binop (RSHIFT_EXPR, lr_mask,
> -                                size_int (MIN (xlr_bitpos, xrr_bitpos)));
> -
> -         /* Convert to the smaller type before masking out unwanted bits.  */
> -         type = lntype;
> -         if (lntype != rntype)
> -           {
> -             if (lnbitsize > rnbitsize)
> -               {
> -                 lhs = fold_convert_loc (loc, rntype, lhs);
> -                 ll_mask = fold_convert_loc (loc, rntype, ll_mask);
> -                 type = rntype;
> -               }
> -             else if (lnbitsize < rnbitsize)
> -               {
> -                 rhs = fold_convert_loc (loc, lntype, rhs);
> -                 lr_mask = fold_convert_loc (loc, lntype, lr_mask);
> -                 type = lntype;
> -               }
> -           }
> -
> -         if (! all_ones_mask_p (ll_mask, ll_bitsize + rl_bitsize))
> -           lhs = build2 (BIT_AND_EXPR, type, lhs, ll_mask);
> -
> -         if (! all_ones_mask_p (lr_mask, lr_bitsize + rr_bitsize))
> -           rhs = build2 (BIT_AND_EXPR, type, rhs, lr_mask);
> -
> -         return build2_loc (loc, wanted_code, truth_type, lhs, rhs);
> -       }
> -
> -      return 0;
> -    }
> -
> -  /* Handle the case of comparisons with constants.  If there is something in
> -     common between the masks, those bits of the constants must be the same.
> -     If not, the condition is always false.  Test for this to avoid generating
> -     incorrect code below.  */
> -  result = const_binop (BIT_AND_EXPR, ll_mask, rl_mask);
> -  if (! integer_zerop (result)
> -      && simple_cst_equal (const_binop (BIT_AND_EXPR, result, l_const),
> -                          const_binop (BIT_AND_EXPR, result, r_const)) != 1)
> -    {
> -      if (wanted_code == NE_EXPR)
> -       {
> -         warning (0, "%<or%> of unmatched not-equal tests is always 1");
> -         return constant_boolean_node (true, truth_type);
> -       }
> -      else
> -       {
> -         warning (0, "%<and%> of mutually exclusive equal-tests is always 0");
> -         return constant_boolean_node (false, truth_type);
> -       }
> -    }
> -
> -  if (lnbitpos < 0)
> -    return 0;
> -
> -  /* Construct the expression we will return.  First get the component
> -     reference we will make.  Unless the mask is all ones the width of
> -     that field, perform the mask operation.  Then compare with the
> -     merged constant.  */
> -  result = make_bit_field_ref (loc, ll_inner, ll_arg,
> -                              lntype, lnbitsize, lnbitpos,
> -                              ll_unsignedp || rl_unsignedp, ll_reversep);
> -
> -  ll_mask = const_binop (BIT_IOR_EXPR, ll_mask, rl_mask);
> -  if (! all_ones_mask_p (ll_mask, lnbitsize))
> -    result = build2_loc (loc, BIT_AND_EXPR, lntype, result, ll_mask);
> -
> -  return build2_loc (loc, wanted_code, truth_type, result,
> -                    const_binop (BIT_IOR_EXPR, l_const, r_const));
> +  return 0;
>  }
>
>  /* T is an integer expression that is being multiplied, divided, or taken a
> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
> index c3fa4cb..c3657c0 100644
> --- a/gcc/gimple-fold.c
> +++ b/gcc/gimple-fold.c
> @@ -67,6 +67,12 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-ssa-strlen.h"
>  #include "varasm.h"
>
> +/* ??? Move this to some header, it's defined in fold-const.c.  */
> +extern tree
> +make_bit_field_ref (location_t loc, tree inner, tree orig_inner, tree type,
> +                   HOST_WIDE_INT bitsize, poly_int64 bitpos,
> +                   int unsignedp, int reversep);
> +
>  enum strlen_range_kind {
>    /* Compute the exact constant string length.  */
>    SRK_STRLEN,
> @@ -5964,7 +5970,1139 @@ maybe_fold_comparisons_from_match_pd (tree type, enum tree_code code,
>
>    return NULL_TREE;
>  }
> +
> +/* Return TRUE if expression STMT is suitable for replacement.  ???
> +   Same as ssa_is_replaceable_p, except that we don't insist it has a
> +   single use.  */
> +
> +bool
> +ssa_is_substitutable_p (gimple *stmt)
> +{
> +#if 0
> +  use_operand_p use_p;
> +#endif
> +  tree def;
> +#if 0
> +  gimple *use_stmt;
> +#endif
> +
> +  /* Only consider modify stmts.  */
> +  if (!is_gimple_assign (stmt))
> +    return false;
> +
> +  /* If the statement may throw an exception, it cannot be replaced.  */
> +  if (stmt_could_throw_p (cfun, stmt))
> +    return false;
> +
> +  /* Punt if there is more than 1 def.  */
> +  def = SINGLE_SSA_TREE_OPERAND (stmt, SSA_OP_DEF);
> +  if (!def)
> +    return false;
> +
> +#if 0
> +  /* Only consider definitions which have a single use.  */
> +  if (!single_imm_use (def, &use_p, &use_stmt))
> +    return false;
> +
> +  /* Used in this block, but at the TOP of the block, not the end.  */
> +  if (gimple_code (use_stmt) == GIMPLE_PHI)
> +    return false;
> +#endif
> +
> +  /* There must be no VDEFs.  */
> +  if (gimple_vdef (stmt))
> +    return false;
> +
> +  /* Float expressions must go through memory if float-store is on.  */
> +  if (flag_float_store
> +      && FLOAT_TYPE_P (gimple_expr_type (stmt)))
> +    return false;
> +
> +  /* An assignment with a register variable on the RHS is not
> +     replaceable.  */
> +  if (gimple_assign_rhs_code (stmt) == VAR_DECL
> +      && DECL_HARD_REGISTER (gimple_assign_rhs1 (stmt)))
> +    return false;
> +
> +  /* No function calls can be replaced.  */
> +  if (is_gimple_call (stmt))
> +    return false;
> +
> +  /* Leave any stmt with volatile operands alone as well.  */
> +  if (gimple_has_volatile_ops (stmt))
> +    return false;
> +
> +  return true;
> +}
> +
> +/* Follow substitutable SSA DEFs for *NAME, including type casts,
> +   adjusting *NAME to the single rhs or the type cast operand along
> +   the way.  Return the target type of the earliest type cast
> +   found.  */
> +
> +static tree
> +is_cast_p (tree *name)
> +{
> +  tree type = 0;
> +
> +  while (TREE_CODE (*name) == SSA_NAME
> +        && !SSA_NAME_IS_DEFAULT_DEF (*name))
> +    {
> +      gimple *def = SSA_NAME_DEF_STMT (*name);
> +
> +      if (!ssa_is_substitutable_p (def))
> +       break;
> +
> +      if (gimple_num_ops (def) != 2)
> +       break;
> +
> +      if (get_gimple_rhs_class (gimple_expr_code (def))
> +         == GIMPLE_SINGLE_RHS)
> +       {
> +         *name = gimple_assign_rhs1 (def);
> +         continue;
> +       }
> +
> +      if (!gimple_assign_cast_p (def))
> +       break;
> +
> +      if (!type)
> +       type = TREE_TYPE (*name);
> +      *name = gimple_assign_rhs1 (def);
> +    }
> +
> +  return type;
> +}
> +
> +/* Follow substitutable SSA DEFs for *NAME.  If we find it is assigned
> +   a CODE binary expr, return the second operand, and set *NAME to the
> +   first operand.  */
> +
> +static tree
> +is_binop_p (enum tree_code code, tree *name)
> +{
> +  while (TREE_CODE (*name) == SSA_NAME
> +        && !SSA_NAME_IS_DEFAULT_DEF (*name))
> +    {
> +      gimple *def = SSA_NAME_DEF_STMT (*name);
> +
> +      if (!ssa_is_substitutable_p (def))
> +       return 0;
> +
> +      switch (gimple_num_ops (def))
> +       {
> +       default:
> +         return 0;
> +
> +       case 2:
> +         if (get_gimple_rhs_class (gimple_expr_code (def))
> +             == GIMPLE_SINGLE_RHS)
> +           {
> +             *name = gimple_assign_rhs1 (def);
> +             continue;
> +           }
> +         return 0;
> +
> +       case 3:
> +         ;
> +       }
> +
> +      if (gimple_assign_rhs_code (def) != code)
> +       return 0;
> +
> +      *name = gimple_assign_rhs1 (def);
> +      return gimple_assign_rhs2 (def);
> +    }
> +
> +  return 0;
> +}
> +
> +/* If *R_ARG is a constant zero, and L_ARG is a possibly masked
> +   BIT_XOR_EXPR, return 1 and set *r_arg to l_arg.
> +   Otherwise, return 0.
> +
> +   The returned value should be passed to decode_field_reference for it
> +   to handle l_arg, and then doubled for r_arg.  */
> +
> +static int
> +prepare_xor (tree l_arg, tree *r_arg)
> +{
> +  int ret = 0;
> +
> +  if (!integer_zerop (*r_arg))
> +    return ret;
> +
> +  tree exp = l_arg;
> +
> +  if (tree and_mask = is_binop_p (BIT_AND_EXPR, &exp))
> +    {
> +      if (TREE_CODE (and_mask) != INTEGER_CST)
> +       return ret;
> +    }
> +
> +  if (is_binop_p (BIT_XOR_EXPR, &exp))
> +    {
> +      *r_arg = l_arg;
> +      return 1;
> +    }
> +
> +  return ret;
> +}
> +
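
For illustration, this is the kind of source-level test the new prepare_xor
and the shift/narrowing handling in decode_field_reference are meant to
recognize (a minimal sketch; the struct, field and mask values are made up):

  struct S { unsigned int w; };

  int
  f (struct S *a, struct S *b)
  {
    /* Masked XOR compared against zero: prepare_xor replaces the zero
       on the right with a second copy of the lhs, and the decoder then
       takes the first XOR operand for one side of the compare and the
       second operand for the other.  */
    return ((a->w ^ b->w) & 0x0000ffff) == 0
           /* Right shift followed by a narrowing conversion: both are
              now stripped and folded into the decoded bit position and
              width.  */
           && (unsigned char) (a->w >> 24) == (unsigned char) (b->w >> 24);
  }

Both tests look at parts of the same word, so once decoded they become
candidates for merging into a single masked 32-bit compare.
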
> +/* Subroutine for fold_truth_andor_1: decode a field reference.
> +
> +   If EXP is a comparison reference, we return the innermost reference.
> +
> +   *PBITSIZE is set to the number of bits in the reference, *PBITPOS is
> +   set to the starting bit number.
> +
> +   If the innermost field can be completely contained in a mode-sized
> +   unit, *PMODE is set to that mode.  Otherwise, it is set to VOIDmode.
> +
> +   *PVOLATILEP is set to 1 if any expression encountered is volatile;
> +   otherwise it is not changed.
> +
> +   *PUNSIGNEDP is set to the signedness of the field.
> +
> +   *PREVERSEP is set to the storage order of the field.
> +
> +   *PMASK is set to the mask used.  This is either contained in a
> +   BIT_AND_EXPR or derived from the width of the field.
> +
> +   *PAND_MASK is set to the mask found in a BIT_AND_EXPR, if any.
> +
> +   XOR_WHICH is 1 or 2 if EXP was found to be a (possibly masked)
> +   BIT_XOR_EXPR compared with zero.  We're to take the first or second
> +   operand thereof if so.  It should be zero otherwise.
> +
> +   Return 0 if this is not a component reference or is one that we can't
> +   do anything with.  */
> +
> +static tree
> +decode_field_reference (location_t loc, tree *exp_, HOST_WIDE_INT *pbitsize,
> +                       HOST_WIDE_INT *pbitpos, machine_mode *pmode,
> +                       int *punsignedp, int *preversep, int *pvolatilep,
> +                       tree *pmask, tree *pand_mask, int xor_which)
> +{
> +  tree exp = *exp_;
> +  tree outer_type = 0;
> +  tree and_mask = 0;
> +  tree mask, inner, offset;
> +  tree unsigned_type;
> +  unsigned int precision;
> +  HOST_WIDE_INT shiftrt = 0;
> +
> +  /* All the optimizations using this function assume integer fields.
> +     There are problems with FP fields since the type_for_size call
> +     below can fail for, e.g., XFmode.  */
> +  if (! INTEGRAL_TYPE_P (TREE_TYPE (exp)))
> +    return 0;
> +
> +  /* We are interested in the bare arrangement of bits, so strip everything
> +     that doesn't affect the machine mode.  However, record the type of the
> +     outermost expression if it may matter below.  */
> +  outer_type = is_cast_p (&exp);
> +
> +  if ((and_mask = is_binop_p (BIT_AND_EXPR, &exp)))
> +    {
> +      if (TREE_CODE (and_mask) != INTEGER_CST)
> +       return 0;
> +    }
> +
> +  if (xor_which)
> +    {
> +      tree op2 = is_binop_p (BIT_XOR_EXPR, &exp);
> +      gcc_checking_assert (op2);
> +      if (xor_which > 1)
> +       exp = op2;
> +    }
> +
> +  if (tree t = is_cast_p (&exp))
> +    if (!outer_type)
> +      outer_type = t;
> +
> +  if (tree shift = is_binop_p (RSHIFT_EXPR, &exp))
> +    {
> +      if (TREE_CODE (shift) != INTEGER_CST || !tree_fits_shwi_p (shift))
> +       return 0;
> +      shiftrt = tree_to_shwi (shift);
> +      if (shiftrt <= 0)
> +       return 0;
> +    }
> +
> +  if (tree t = is_cast_p (&exp))
> +    if (!outer_type)
> +      outer_type = t;
> +
> +  poly_int64 poly_bitsize, poly_bitpos;
> +  inner = get_inner_reference (exp, &poly_bitsize, &poly_bitpos, &offset,
> +                              pmode, punsignedp, preversep, pvolatilep);
> +
> +  if ((inner == exp && and_mask == 0)
> +      || !poly_bitsize.is_constant (pbitsize)
> +      || !poly_bitpos.is_constant (pbitpos)
> +      || *pbitsize <= shiftrt
> +      || offset != 0
> +      || TREE_CODE (inner) == PLACEHOLDER_EXPR
> +      /* Reject out-of-bound accesses (PR79731).  */
> +      || (! AGGREGATE_TYPE_P (TREE_TYPE (inner))
> +         && compare_tree_int (TYPE_SIZE (TREE_TYPE (inner)),
> +                              *pbitpos + *pbitsize) < 0))
> +    return NULL_TREE;
> +
> +  if (shiftrt)
> +    {
> +      if (!*preversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
> +       *pbitpos += shiftrt;
> +      *pbitsize -= shiftrt;
> +    }
> +
> +  if (outer_type && *pbitsize > TYPE_PRECISION (outer_type))
> +    {
> +      HOST_WIDE_INT excess = *pbitsize - TYPE_PRECISION (outer_type);
> +      if (*preversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
> +       *pbitpos += excess;
> +      *pbitsize -= excess;
> +    }
> +
> +  unsigned_type = lang_hooks.types.type_for_size (*pbitsize, 1);
> +  if (unsigned_type == NULL_TREE)
> +    return NULL_TREE;
> +
> +  *exp_ = exp;
> +
> +  /* If the number of bits in the reference is the same as the bitsize of
> +     the outer type, then the outer type gives the signedness. Otherwise
> +     (in case of a small bitfield) the signedness is unchanged.  */
> +  if (outer_type && *pbitsize == TYPE_PRECISION (outer_type))
> +    *punsignedp = TYPE_UNSIGNED (outer_type);
> +
> +  /* Compute the mask to access the bitfield.  */
> +  precision = TYPE_PRECISION (unsigned_type);
> +
> +  mask = build_int_cst_type (unsigned_type, -1);
> +
> +  mask = int_const_binop (LSHIFT_EXPR, mask, size_int (precision - *pbitsize));
> +  mask = int_const_binop (RSHIFT_EXPR, mask, size_int (precision - *pbitsize));
> +
> +  /* Merge it with the mask we found in the BIT_AND_EXPR, if any.  */
> +  if (and_mask != 0)
> +    mask = fold_build2_loc (loc, BIT_AND_EXPR, unsigned_type,
> +                       fold_convert_loc (loc, unsigned_type, and_mask), mask);
> +
> +  *pmask = mask;
> +  *pand_mask = and_mask;
> +  return inner;
> +}
> +
> +/* Subroutine for fold_truth_andor_1: C is an INTEGER_CST interpreted as a P
> +   bit value.  Arrange things so the extra bits will be set to zero if and
> +   only if C is signed-extended to its full width.  If MASK is nonzero,
> +   it is an INTEGER_CST that should be AND'ed with the extra bits.  */
> +
> +static tree
> +unextend (tree c, int p, int unsignedp, tree mask)
> +{
> +  tree type = TREE_TYPE (c);
> +  int modesize = GET_MODE_BITSIZE (SCALAR_INT_TYPE_MODE (type));
> +  tree temp;
> +
> +  if (p == modesize || unsignedp)
> +    return c;
> +
> +  /* We work by getting just the sign bit into the low-order bit, then
> +     into the high-order bit, then sign-extend.  We then XOR that value
> +     with C.  */
> +  temp = build_int_cst (TREE_TYPE (c),
> +                       wi::extract_uhwi (wi::to_wide (c), p - 1, 1));
> +
> +  /* We must use a signed type in order to get an arithmetic right shift.
> +     However, we must also avoid introducing accidental overflows, so that
> +     a subsequent call to integer_zerop will work.  Hence we must
> +     do the type conversion here.  At this point, the constant is either
> +     zero or one, and the conversion to a signed type can never overflow.
> +     We could get an overflow if this conversion is done anywhere else.  */
> +  if (TYPE_UNSIGNED (type))
> +    temp = fold_convert (signed_type_for (type), temp);
> +
> +  temp = int_const_binop (LSHIFT_EXPR, temp, size_int (modesize - 1));
> +  temp = int_const_binop (RSHIFT_EXPR, temp, size_int (modesize - p - 1));
> +  if (mask != 0)
> +    temp = int_const_binop (BIT_AND_EXPR, temp,
> +                           fold_convert (TREE_TYPE (c), mask));
> +  /* If necessary, convert the type back to match the type of C.  */
> +  if (TYPE_UNSIGNED (type))
> +    temp = fold_convert (type, temp);
> +
> +  return fold_convert (type, int_const_binop (BIT_XOR_EXPR, c, temp));
> +}
> +
> +/* Return the one bitpos within bit extents L or R that is at an
> +   ALIGN-bit alignment boundary, or -1 if there is more than one such
> +   boundary, if there isn't any, or if there is any such boundary
> +   between the extents.  L and R are given by bitpos and bitsize.  If
> +   it doesn't return -1, there are two consecutive ALIGN-bit words
> +   that contain both extents, and at least one of the extents
> +   straddles across the returned alignment boundary.  */
> +
> +static inline HOST_WIDE_INT
> +compute_split_boundary_from_align (HOST_WIDE_INT align,
> +                                  HOST_WIDE_INT l_bitpos,
> +                                  HOST_WIDE_INT l_bitsize,
> +                                  HOST_WIDE_INT r_bitpos,
> +                                  HOST_WIDE_INT r_bitsize)
> +{
> +  HOST_WIDE_INT amask = ~(align - 1);
> +
> +  HOST_WIDE_INT first_bit = MIN (l_bitpos, r_bitpos);
> +  HOST_WIDE_INT end_bit = MAX (l_bitpos + l_bitsize, r_bitpos + r_bitsize);
> +
> +  HOST_WIDE_INT boundary = (end_bit - 1) & amask;
> +
> +  /* Make sure we're crossing no more than one alignment boundary.
> +
> +     ??? We don't have logic to recombine loads of two adjacent
> +     fields that each crosses a different alignment boundary, so
> +     as to load the middle word only once, if other words can't be
> +     otherwise recombined.  */
> +  if (boundary - first_bit > align)
> +    return -1;
> +
> +  HOST_WIDE_INT l_start_word = l_bitpos & amask;
> +  HOST_WIDE_INT l_end_word = (l_bitpos + l_bitsize - 1) & amask;
> +
> +  HOST_WIDE_INT r_start_word = r_bitpos & amask;
> +  HOST_WIDE_INT r_end_word = (r_bitpos + r_bitsize - 1) & amask;
> +
> +  /* If neither field straddles across an alignment boundary, it's no
> +     use to even try to merge them.  */
> +  if (l_start_word == l_end_word && r_start_word == r_end_word)
> +    return -1;
> +
> +  return boundary;
> +}
> +
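
Worked through on made-up numbers: with an ALIGN of 32, an extent of 16 bits
at bitpos 24 and another of 8 bits at bitpos 40 give

  first_bit = MIN (24, 40)          = 24
  end_bit   = MAX (24 + 16, 40 + 8) = 48
  boundary  = (48 - 1) & ~(32 - 1)  = 32

The first extent spans the words at 0 and at 32 while the second fits in the
word at 32, so 32 is returned and the first extent is the one to be split.
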
> +/* Initialize ln_arg[0] and ln_arg[1] to a pair of newly-created (at
> +   LOC) loads from INNER (from ORIG_INNER), of modes MODE and MODE2,
> +   respectively, starting at BIT_POS, using reversed endianness if
> +   REVERSEP.  Also initialize BITPOS (the starting position of each
> +   part into INNER), BITSIZ (the bit count starting at BITPOS),
> +   TOSHIFT[1] (the amount by which the part and its mask are to be
> +   shifted right to bring its least-significant bit to bit zero) and
> +   SHIFTED (the amount by which the part, by separate loading, has
> +   already been shifted right, but that the mask needs shifting to
> +   match).  */
> +
> +static inline void
> +build_split_load (tree /* out */ ln_arg[2],
> +                 HOST_WIDE_INT /* out */ bitpos[2],
> +                 HOST_WIDE_INT /* out */ bitsiz[2],
> +                 HOST_WIDE_INT /* in[0] out[0..1] */ toshift[2],
> +                 HOST_WIDE_INT /* out */ shifted[2],
> +                 location_t loc, tree inner, tree orig_inner,
> +                 scalar_int_mode mode, scalar_int_mode mode2,
> +                 HOST_WIDE_INT bit_pos, bool reversep)
> +{
> +  bitsiz[0] = GET_MODE_BITSIZE (mode);
> +  bitsiz[1] = GET_MODE_BITSIZE (mode2);
> +
> +  for (int i = 0; i < 2; i++)
> +    {
> +      tree type = lang_hooks.types.type_for_size (bitsiz[i], 1);
> +      bitpos[i] = bit_pos;
> +      ln_arg[i] = make_bit_field_ref (loc, inner, orig_inner,
> +                                     type, bitsiz[i],
> +                                     bit_pos, 1, reversep);
> +      bit_pos += bitsiz[i];
> +    }
> +
> +  toshift[1] = toshift[0];
> +  if (reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
> +    {
> +      shifted[0] = bitsiz[1];
> +      shifted[1] = 0;
> +      toshift[0] = 0;
> +    }
> +  else
> +    {
> +      shifted[1] = bitsiz[0];
> +      shifted[0] = 0;
> +      toshift[1] = 0;
> +    }
> +}
> +
> +/* Make arrangements to split at bit BOUNDARY a single loaded word
> +   (with REVERSEP bit order) LN_ARG[0], to be shifted right by
> +   TOSHIFT[0] to bring the field of interest to the least-significant
> +   bit.  The expectation is that the same loaded word will be
> +   propagated from part 0 to part 1, with just different shifting and
> +   masking to extract both parts.  MASK is not expected to do more
> +   than masking out the bits that belong to the other part.  See
> +   build_split_load for more information on the other fields.  */
> +
> +static inline void
> +reuse_split_load (tree /* in[0] out[1] */ ln_arg[2],
> +                 HOST_WIDE_INT /* in[0] out[1] */ bitpos[2],
> +                 HOST_WIDE_INT /* in[0] out[1] */ bitsiz[2],
> +                 HOST_WIDE_INT /* in[0] out[0..1] */ toshift[2],
> +                 HOST_WIDE_INT /* out */ shifted[2],
> +                 tree /* out */ mask[2],
> +                 HOST_WIDE_INT boundary, bool reversep)
> +{
> +  ln_arg[1] = ln_arg[0];
> +  bitpos[1] = bitpos[0];
> +  bitsiz[1] = bitsiz[0];
> +  shifted[1] = shifted[0] = 0;
> +
> +  tree basemask = build_int_cst_type (TREE_TYPE (ln_arg[0]), -1);
> +
> +  if (reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
> +    {
> +      toshift[1] = toshift[0];
> +      toshift[0] = bitpos[0] + bitsiz[0] - boundary;
> +      mask[0] = int_const_binop (LSHIFT_EXPR, basemask,
> +                                bitsize_int (toshift[0]));
> +      mask[1] = int_const_binop (BIT_XOR_EXPR, basemask, mask[0]);
> +    }
> +  else
> +    {
> +      toshift[1] = boundary - bitpos[1];
> +      mask[1] = int_const_binop (LSHIFT_EXPR, basemask,
> +                                bitsize_int (toshift[1]));
> +      mask[0] = int_const_binop (BIT_XOR_EXPR, basemask, mask[1]);
> +    }
> +}
> +
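
To make the split/reuse distinction concrete, here is a made-up fragment in
which the pair of fields on one side straddles a 32-bit boundary while the
corresponding pair on the other side fits in a single word; if the straddling
side cannot be accessed in one aligned word, it is loaded in two parts by
build_split_load, and the other side then reuses its single load for both
parts via reuse_split_load, with masks selecting each half:

  struct A { unsigned char p; unsigned short x; unsigned short y; }
    __attribute__ ((packed, aligned (4)));
  struct B { unsigned short x; unsigned short y; }
    __attribute__ ((packed, aligned (4)));

  int
  eq (struct A *a, struct B *b)
  {
    /* In A, x occupies bits 8..23 and y bits 24..39, so the pair
       crosses the boundary at bit 32; in B the pair fits in bits
       0..31.  */
    return a->x == b->x && a->y == b->y;
  }

Whether the split actually happens depends on the modes get_best_mode picks
for the target; the bookkeeping above is what lines the two sides up.
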
> +/* Find ways of folding logical expressions of LHS and RHS:
> +   Try to merge two comparisons to the same innermost item.
> +   Look for range tests like "ch >= '0' && ch <= '9'".
> +   Look for combinations of simple terms on machines with expensive branches
> +   and evaluate the RHS unconditionally.
> +
> +   For example, if we have p->a == 2 && p->b == 4 and we can make an
> +   object large enough to span both A and B, we can do this with a comparison
> +   against the object ANDed with the a mask.
> +
> +   If we have p->a == q->a && p->b == q->b, we may be able to use bit masking
> +   operations to do this with one comparison.
> +
> +   We check for both normal comparisons and the BIT_AND_EXPRs made by this
> +   function and the one above.
> +
> +   CODE is the logical operation being done.  It can be TRUTH_ANDIF_EXPR,
> +   TRUTH_AND_EXPR, TRUTH_ORIF_EXPR, or TRUTH_OR_EXPR.
> +
> +   TRUTH_TYPE is the type of the logical operand and LHS and RHS are its
> +   two operands.
> +
> +   SEPARATEP should be NULL if LHS and RHS are adjacent in
> +   CODE-chained compares, and a non-NULL pointer to NULL_TREE
> +   otherwise.  If the "words" accessed by RHS are already accessed by
> +   LHS, this won't matter, but if RHS accesses "words" that LHS
> +   doesn't, then *SEPARATEP will be set to the compares that should
> +   take RHS's place.  By "words" we mean contiguous bits that do not
> +   cross a TYPE_ALIGN boundary of the accessed object's type.
> +
> +   We return the simplified tree or 0 if no optimization is possible.  */
> +
> +tree
> +fold_truth_andor_maybe_separate (location_t loc,
> +                                enum tree_code code, tree truth_type,
> +                                enum tree_code lcode, tree ll_arg, tree lr_arg,
> +                                enum tree_code rcode, tree rl_arg, tree rr_arg,
> +                                tree *separatep)
> +{
> +  /* If this is the "or" of two comparisons, we can do something if
> +     the comparisons are NE_EXPR.  If this is the "and", we can do something
> +     if the comparisons are EQ_EXPR.  I.e.,
> +       (a->b == 2 && a->c == 4) can become (a->new == NEW).
> +
> +     WANTED_CODE is this operation code.  For single bit fields, we can
> +     convert EQ_EXPR to NE_EXPR so we need not reject the "wrong"
> +     comparison for one-bit fields.  */
> +
> +  enum tree_code orig_code = code;
> +  enum tree_code wanted_code;
> +  tree ll_inner, lr_inner, rl_inner, rr_inner;
> +  HOST_WIDE_INT ll_bitsize, ll_bitpos, lr_bitsize, lr_bitpos;
> +  HOST_WIDE_INT rl_bitsize, rl_bitpos, rr_bitsize, rr_bitpos;
> +  HOST_WIDE_INT xll_bitpos, xlr_bitpos, xrl_bitpos, xrr_bitpos;
> +  HOST_WIDE_INT lnbitsize, lnbitpos, rnbitsize, rnbitpos;
> +  int ll_unsignedp, lr_unsignedp, rl_unsignedp, rr_unsignedp;
> +  int ll_reversep, lr_reversep, rl_reversep, rr_reversep;
> +  machine_mode ll_mode, lr_mode, rl_mode, rr_mode;
> +  scalar_int_mode lnmode, lnmode2, rnmode;
> +  tree ll_mask, lr_mask, rl_mask, rr_mask;
> +  tree ll_and_mask, lr_and_mask, rl_and_mask, rr_and_mask;
> +  tree l_const, r_const;
> +  tree lntype, rntype, result;
> +  HOST_WIDE_INT first_bit, end_bit;
> +  int volatilep;
> +  bool l_split_load;
> +
> +  gcc_checking_assert (!separatep || !*separatep);
> +
> +  /* Start by getting the comparison codes.  Fail if anything is volatile.
> +     If one operand is a BIT_AND_EXPR with the constant one, treat it as if
> +     it were surrounded with a NE_EXPR.  */
> +
> +  if (TREE_SIDE_EFFECTS (ll_arg) || TREE_SIDE_EFFECTS (lr_arg)
> +      || TREE_SIDE_EFFECTS (rl_arg) || TREE_SIDE_EFFECTS (rr_arg))
> +    return 0;
> +
> +  if (TREE_CODE_CLASS (lcode) != tcc_comparison
> +      || TREE_CODE_CLASS (rcode) != tcc_comparison)
> +    return 0;
> +
> +  code = ((code == TRUTH_AND_EXPR || code == TRUTH_ANDIF_EXPR)
> +         ? TRUTH_AND_EXPR : TRUTH_OR_EXPR);
> +
> +  bool lsignbit = false, rsignbit = false;
> +  if ((lcode == LT_EXPR || lcode == GE_EXPR)
> +      && integer_zerop (lr_arg)
> +      && INTEGRAL_TYPE_P (TREE_TYPE (ll_arg))
> +      && !TYPE_UNSIGNED (TREE_TYPE (ll_arg)))
> +    {
> +      lsignbit = true;
> +      lcode = (lcode == LT_EXPR ? NE_EXPR : EQ_EXPR);
> +    }
> +  if ((rcode == LT_EXPR || rcode == GE_EXPR)
> +      && integer_zerop (rr_arg)
> +      && INTEGRAL_TYPE_P (TREE_TYPE (rl_arg))
> +      && !TYPE_UNSIGNED (TREE_TYPE (rl_arg)))
> +    {
> +      rsignbit = true;
> +      rcode = (rcode == LT_EXPR ? NE_EXPR : EQ_EXPR);
> +    }
> +
> +  /* See if the comparisons can be merged.  Then get all the parameters for
> +     each side.  */
> +
> +  if ((lcode != EQ_EXPR && lcode != NE_EXPR)
> +      || (rcode != EQ_EXPR && rcode != NE_EXPR))
> +    return 0;
> +
> +  ll_reversep = lr_reversep = rl_reversep = rr_reversep = 0;
> +  volatilep = 0;
> +  int l_xor = prepare_xor (ll_arg, &lr_arg);
> +  ll_inner = decode_field_reference (loc, &ll_arg,
> +                                    &ll_bitsize, &ll_bitpos, &ll_mode,
> +                                    &ll_unsignedp, &ll_reversep, &volatilep,
> +                                    &ll_mask, &ll_and_mask, l_xor);
> +  lr_inner = decode_field_reference (loc, &lr_arg,
> +                                    &lr_bitsize, &lr_bitpos, &lr_mode,
> +                                    &lr_unsignedp, &lr_reversep, &volatilep,
> +                                    &lr_mask, &lr_and_mask, 2 * l_xor);
> +  int r_xor = prepare_xor (rl_arg, &rr_arg);
> +  rl_inner = decode_field_reference (loc, &rl_arg,
> +                                    &rl_bitsize, &rl_bitpos, &rl_mode,
> +                                    &rl_unsignedp, &rl_reversep, &volatilep,
> +                                    &rl_mask, &rl_and_mask, r_xor);
> +  rr_inner = decode_field_reference (loc, &rr_arg,
> +                                    &rr_bitsize, &rr_bitpos, &rr_mode,
> +                                    &rr_unsignedp, &rr_reversep, &volatilep,
> +                                    &rr_mask, &rr_and_mask, 2 * r_xor);
> +
> +  /* It must be true that the inner operation on the lhs of each
> +     comparison must be the same if we are to be able to do anything.
> +     Then see if we have constants.  If not, the same must be true for
> +     the rhs's.  */
> +  if (volatilep
> +      || ll_reversep != rl_reversep
> +      || ll_inner == 0 || rl_inner == 0
> +      || ! operand_equal_p (ll_inner, rl_inner, 0))
> +    return 0;
> +
> +  if (TREE_CODE (lr_arg) == INTEGER_CST
> +      && TREE_CODE (rr_arg) == INTEGER_CST)
> +    {
> +      l_const = lr_arg, r_const = rr_arg;
> +      lr_reversep = ll_reversep;
> +    }
> +  else if (lr_reversep != rr_reversep
> +          || lr_inner == 0 || rr_inner == 0
> +          || ! operand_equal_p (lr_inner, rr_inner, 0))
> +    return 0;
> +  else
> +    l_const = r_const = 0;
> +
> +  if (lsignbit)
> +    {
> +      tree mask = build_int_cst_type (TREE_TYPE (ll_arg), -1);
> +      tree sign = int_const_binop (LSHIFT_EXPR, mask,
> +                                  bitsize_int (ll_bitsize - 1));
> +      if (!ll_mask)
> +       ll_mask = sign;
> +      else
> +       ll_mask = int_const_binop (BIT_AND_EXPR, ll_mask, sign);
> +      if (!ll_and_mask)
> +       ll_and_mask = sign;
> +      else
> +       ll_and_mask = int_const_binop (BIT_AND_EXPR, ll_and_mask, sign);
> +    }
> +
> +  if (rsignbit)
> +    {
> +      tree mask = build_int_cst_type (TREE_TYPE (rl_arg), -1);
> +      tree sign = int_const_binop (LSHIFT_EXPR, mask,
> +                                  bitsize_int (rl_bitsize - 1));
> +      if (!rl_mask)
> +       rl_mask = sign;
> +      else
> +       rl_mask = int_const_binop (BIT_AND_EXPR, rl_mask, sign);
> +      if (!rl_and_mask)
> +       rl_and_mask = sign;
> +      else
> +       rl_and_mask = int_const_binop (BIT_AND_EXPR, rl_and_mask, sign);
> +    }
> +
> +  /* If either comparison code is not correct for our logical operation,
> +     fail.  However, we can convert a one-bit comparison against zero into
> +     the opposite comparison against that bit being set in the field.  */
> +
> +  wanted_code = (code == TRUTH_AND_EXPR ? EQ_EXPR : NE_EXPR);
> +  if (lcode != wanted_code)
> +    {
> +      if (l_const && integer_zerop (l_const) && integer_pow2p (ll_mask))
> +       {
> +         /* Make the left operand unsigned, since we are only interested
> +            in the value of one bit.  Otherwise we are doing the wrong
> +            thing below.  */
> +         ll_unsignedp = 1;
> +         l_const = ll_mask;
> +       }
> +      else
> +       return 0;
> +    }
> +
> +  /* This is analogous to the code for l_const above.  */
> +  if (rcode != wanted_code)
> +    {
> +      if (r_const && integer_zerop (r_const) && integer_pow2p (rl_mask))
> +       {
> +         rl_unsignedp = 1;
> +         r_const = rl_mask;
> +       }
> +      else
> +       return 0;
> +    }
> +
> +  /* This will be bumped to 2 if any of the field pairs crosses an
> +     alignment boundary, so the merged compare has to be done in two
> +     parts.  */
> +  int parts = 1;
> +  /* Set to true if the second combined compare should come first,
> +     e.g., because the second original compare accesses a word that
> +     the first one doesn't, and the combined compares access those in
> +     cmp[0].  */
> +  bool first1 = false;
> +  /* Set to true if the first original compare is not the one being
> +     split.  */
> +  bool maybe_separate = false;
> +
> +  /* The following 2-dimensional arrays use the first index to
> +     identify left(0)- vs right(1)-hand compare operands, and the
> +     second one to identify merged compare parts.  */
> +  /* The memory loads or constants to be compared.  */
> +  tree ld_arg[2][2];
> +  /* The first bit of the corresponding inner object that the
> +     corresponding LD_ARG covers.  */
> +  HOST_WIDE_INT bitpos[2][2];
> +  /* The bit count starting at BITPOS that the corresponding LD_ARG
> +     covers.  */
> +  HOST_WIDE_INT bitsiz[2][2];
> +  /* The number of bits by which LD_ARG has already been shifted
> +     right, WRT mask.  */
> +  HOST_WIDE_INT shifted[2][2];
> +  /* The number of bits by which both LD_ARG and MASK need shifting to
> +     bring its least-significant bit to bit zero.  */
> +  HOST_WIDE_INT toshift[2][2];
> +  /* An additional mask to be applied to LD_ARG, to remove any bits
> +     that may have been loaded for use in another compare, but that
> +     don't belong in the corresponding compare.  */
> +  tree xmask[2][2] = {};
> +
> +  /* The combined compare or compares.  */
> +  tree cmp[2];
> +
> +  /* Suppose we're comparing two non-contiguous fields of packed
> +     structs, both aligned at 32-bit boundaries:
> +
> +     ll_arg: an 8-bit field at offset 0
> +     lr_arg: a 16-bit field at offset 2
> +
> +     rl_arg: an 8-bit field at offset 1
> +     rr_arg: a 16-bit field at offset 3
> +
> +     We'll have r_split_load, because rr_arg straddles across an
> +     alignment boundary.
> +
> +     We'll want to have:
> +
> +     bitpos  = { {  0,  0 }, {  0, 32 } }
> +     bitsiz  = { { 32, 32 }, { 32,  8 } }
> +
> +     And, for little-endian:
> +
> +     shifted = { {  0,  0 }, {  0, 32 } }
> +     toshift = { {  0, 24 }, {  0,  0 } }
> +
> +     Or, for big-endian:
> +
> +     shifted = { {  0,  0 }, {  8,  0 } }
> +     toshift = { {  8,  0 }, {  0,  0 } }
> +  */
> +
> +  /* See if we can find a mode that contains both fields being compared on
> +     the left.  If we can't, fail.  Otherwise, update all constants and masks
> +     to be relative to a field of that size.  */
> +  first_bit = MIN (ll_bitpos, rl_bitpos);
> +  end_bit = MAX (ll_bitpos + ll_bitsize, rl_bitpos + rl_bitsize);
> +  if (!get_best_mode (end_bit - first_bit, first_bit, 0, 0,
> +                     TYPE_ALIGN (TREE_TYPE (ll_inner)), BITS_PER_WORD,
> +                     volatilep, &lnmode))
> +    {
> +      /* Consider the possibility of recombining loads if any of the
> +        fields straddles across an alignment boundary, so that either
> +        part can be loaded along with the other field.  */
> +      HOST_WIDE_INT align = TYPE_ALIGN (TREE_TYPE (ll_inner));
> +      HOST_WIDE_INT boundary = compute_split_boundary_from_align
> +       (align, ll_bitpos, ll_bitsize, rl_bitpos, rl_bitsize);
> +
> +      if (boundary < 0
> +         || !get_best_mode (boundary - first_bit, first_bit, 0, 0,
> +                            align, BITS_PER_WORD, volatilep, &lnmode)
> +         || !get_best_mode (end_bit - boundary, boundary, 0, 0,
> +                            align, BITS_PER_WORD, volatilep, &lnmode2))
> +       return 0;
> +
> +      l_split_load = true;
> +      parts = 2;
> +      if (ll_bitpos >= boundary)
> +       maybe_separate = first1 = true;
> +      else if (ll_bitpos + ll_bitsize <= boundary)
> +       maybe_separate = true;
> +    }
> +  else
> +    l_split_load = false;
> +
> +  lnbitsize = GET_MODE_BITSIZE (lnmode);
> +  lnbitpos = first_bit & ~ (lnbitsize - 1);
> +  if (l_split_load)
> +    lnbitsize += GET_MODE_BITSIZE (lnmode2);
> +  lntype = lang_hooks.types.type_for_size (lnbitsize, 1);
> +  if (!lntype)
> +    {
> +      gcc_checking_assert (l_split_load);
> +      lntype = build_nonstandard_integer_type (lnbitsize, 1);
> +    }
> +  xll_bitpos = ll_bitpos - lnbitpos, xrl_bitpos = rl_bitpos - lnbitpos;
> +
> +  if (ll_reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
> +    {
> +      xll_bitpos = lnbitsize - xll_bitpos - ll_bitsize;
> +      xrl_bitpos = lnbitsize - xrl_bitpos - rl_bitsize;
> +    }
> +
> +  ll_mask = int_const_binop (LSHIFT_EXPR,
> +                            fold_convert_loc (loc, lntype, ll_mask),
> +                            size_int (xll_bitpos));
> +  rl_mask = int_const_binop (LSHIFT_EXPR,
> +                            fold_convert_loc (loc, lntype, rl_mask),
> +                            size_int (xrl_bitpos));
> +
> +  if (l_const)
> +    {
> +      l_const = fold_convert_loc (loc, lntype, l_const);
> +      l_const = unextend (l_const, ll_bitsize, ll_unsignedp, ll_and_mask);
> +      l_const = int_const_binop (LSHIFT_EXPR, l_const, size_int (xll_bitpos));
> +      if (! integer_zerop (int_const_binop (BIT_AND_EXPR, l_const,
> +                                           fold_build1_loc (loc, BIT_NOT_EXPR,
> +                                                            lntype, ll_mask))))
> +       {
> +         warning (0, "comparison is always %d", wanted_code == NE_EXPR);
> +
> +         return constant_boolean_node (wanted_code == NE_EXPR, truth_type);
> +       }
> +    }
> +  if (r_const)
> +    {
> +      r_const = fold_convert_loc (loc, lntype, r_const);
> +      r_const = unextend (r_const, rl_bitsize, rl_unsignedp, rl_and_mask);
> +      r_const = int_const_binop (LSHIFT_EXPR, r_const, size_int (xrl_bitpos));
> +      if (! integer_zerop (int_const_binop (BIT_AND_EXPR, r_const,
> +                                           fold_build1_loc (loc, BIT_NOT_EXPR,
> +                                                            lntype, rl_mask))))
> +       {
> +         warning (0, "comparison is always %d", wanted_code == NE_EXPR);
> +
> +         return constant_boolean_node (wanted_code == NE_EXPR, truth_type);
> +       }
> +    }
> +
> +  /* If the right sides are not constant, do the same for it.  Also,
> +     disallow this optimization if a size, signedness or storage order
> +     mismatch occurs between the left and right sides.  */
> +  if (l_const == 0)
> +    {
> +      if (ll_bitsize != lr_bitsize || rl_bitsize != rr_bitsize
> +         || ll_unsignedp != lr_unsignedp || rl_unsignedp != rr_unsignedp
> +         || ll_reversep != lr_reversep
> +         /* Make sure the two fields on the right
> +            correspond to the left without being swapped.  */
> +         || ll_bitpos - rl_bitpos != lr_bitpos - rr_bitpos
> +         || lnbitpos < 0)
> +       return 0;
> +
> +      bool r_split_load;
> +      scalar_int_mode rnmode2;
> +
> +      first_bit = MIN (lr_bitpos, rr_bitpos);
> +      end_bit = MAX (lr_bitpos + lr_bitsize, rr_bitpos + rr_bitsize);
> +      if (!get_best_mode (end_bit - first_bit, first_bit, 0, 0,
> +                         TYPE_ALIGN (TREE_TYPE (lr_inner)), BITS_PER_WORD,
> +                         volatilep, &rnmode))
> +       {
> +         /* Consider the possibility of recombining loads if any of the
> +            fields straddles across an alignment boundary, so that either
> +            part can be loaded along with the other field.  */
> +         HOST_WIDE_INT align = TYPE_ALIGN (TREE_TYPE (lr_inner));
> +         HOST_WIDE_INT boundary = compute_split_boundary_from_align
> +           (align, lr_bitpos, lr_bitsize, rr_bitpos, rr_bitsize);
> +
> +         if (boundary < 0
> +             /* If we're to split both, make sure the split point is
> +                the same.  */
> +             || (l_split_load
> +                 && (boundary - lr_bitpos
> +                     != (lnbitpos + GET_MODE_BITSIZE (lnmode)) - ll_bitpos))
> +             || !get_best_mode (boundary - first_bit, first_bit, 0, 0,
> +                                align, BITS_PER_WORD, volatilep, &rnmode)
> +             || !get_best_mode (end_bit - boundary, boundary, 0, 0,
> +                                align, BITS_PER_WORD, volatilep, &rnmode2))
> +           return 0;
> +
> +         r_split_load = true;
> +         parts = 2;
> +         if (lr_bitpos >= boundary)
> +           maybe_separate = first1 = true;
> +         else if (lr_bitpos + lr_bitsize <= boundary)
> +           maybe_separate = true;
> +       }
> +      else
> +       r_split_load = false;
> +
> +      rnbitsize = GET_MODE_BITSIZE (rnmode);
> +      rnbitpos = first_bit & ~ (rnbitsize - 1);
> +      if (r_split_load)
> +       rnbitsize += GET_MODE_BITSIZE (rnmode2);
> +      rntype = lang_hooks.types.type_for_size (rnbitsize, 1);
> +      if (!rntype)
> +       {
> +         gcc_checking_assert (r_split_load);
> +         rntype = build_nonstandard_integer_type (rnbitsize, 1);
> +       }
> +      xlr_bitpos = lr_bitpos - rnbitpos, xrr_bitpos = rr_bitpos - rnbitpos;
> +
> +      if (lr_reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
> +       {
> +         xlr_bitpos = rnbitsize - xlr_bitpos - lr_bitsize;
> +         xrr_bitpos = rnbitsize - xrr_bitpos - rr_bitsize;
> +       }
> +
> +      lr_mask = int_const_binop (LSHIFT_EXPR,
> +                                fold_convert_loc (loc, rntype, lr_mask),
> +                                size_int (xlr_bitpos));
> +      rr_mask = int_const_binop (LSHIFT_EXPR,
> +                                fold_convert_loc (loc, rntype, rr_mask),
> +                                size_int (xrr_bitpos));
> +
> +      lr_mask = int_const_binop (BIT_IOR_EXPR, lr_mask, rr_mask);
> +
> +      toshift[1][0] = MIN (xlr_bitpos, xrr_bitpos);
> +      shifted[1][0] = 0;
> +
> +      if (!r_split_load)
> +       {
> +         bitpos[1][0] = rnbitpos;
> +         bitsiz[1][0] = rnbitsize;
> +         ld_arg[1][0] = make_bit_field_ref (loc, lr_inner, lr_arg,
> +                                            rntype, rnbitsize, rnbitpos,
> +                                            lr_unsignedp || rr_unsignedp,
> +                                            lr_reversep);
> +       }
> +
> +      if (parts > 1)
> +       {
> +         if (r_split_load)
> +           build_split_load (ld_arg[1], bitpos[1], bitsiz[1], toshift[1],
> +                             shifted[1], loc, lr_inner, lr_arg,
> +                             rnmode, rnmode2, rnbitpos, lr_reversep);
> +         else
> +           reuse_split_load (ld_arg[1], bitpos[1], bitsiz[1], toshift[1],
> +                             shifted[1], xmask[1],
> +                             lnbitpos + GET_MODE_BITSIZE (lnmode)
> +                             - ll_bitpos + lr_bitpos, lr_reversep);
> +       }
> +    }
> +  else
> +    {
> +      /* Handle the case of comparisons with constants.  If there is
> +        something in common between the masks, those bits of the
> +        constants must be the same.  If not, the condition is always
> +        false.  Test for this to avoid generating incorrect code
> +        below.  */
> +      result = int_const_binop (BIT_AND_EXPR, ll_mask, rl_mask);
> +      if (! integer_zerop (result)
> +         && simple_cst_equal (int_const_binop (BIT_AND_EXPR,
> +                                               result, l_const),
> +                              int_const_binop (BIT_AND_EXPR,
> +                                               result, r_const)) != 1)
> +       {
> +         if (wanted_code == NE_EXPR)
> +           {
> +             warning (0,
> +                      "%<or%> of unmatched not-equal tests"
> +                      " is always 1");
> +             return constant_boolean_node (true, truth_type);
> +           }
> +         else
> +           {
> +             warning (0,
> +                      "%<and%> of mutually exclusive equal-tests"
> +                      " is always 0");
> +             return constant_boolean_node (false, truth_type);
> +           }
> +       }
> +
> +      if (lnbitpos < 0)
> +       return 0;
> +
> +      /* The constants are combined so as to line up with the loaded
> +        field, so use the same parameters.  */
> +      ld_arg[1][0] = int_const_binop (BIT_IOR_EXPR, l_const, r_const);
> +      toshift[1][0] = MIN (xll_bitpos, xrl_bitpos);
> +      shifted[1][0] = 0;
> +      bitpos[1][0] = lnbitpos;
> +      bitsiz[1][0] = lnbitsize;
> +
> +      if (parts > 1)
> +       reuse_split_load (ld_arg[1], bitpos[1], bitsiz[1], toshift[1],
> +                         shifted[1], xmask[1],
> +                         lnbitpos + GET_MODE_BITSIZE (lnmode),
> +                         lr_reversep);
> +
> +      lr_mask = build_int_cst_type (TREE_TYPE (ld_arg[1][0]), -1);
> +
> +      /* If the compiler thinks this is used uninitialized below, it's
> +        because it can't realize that parts can only be 2 when
> +        comparing with constants if l_split_load is also true.  This
> +        just silences the warning.  */
> +      rnbitpos = 0;
> +    }
> +
> +  ll_mask = int_const_binop (BIT_IOR_EXPR, ll_mask, rl_mask);
> +  toshift[0][0] = MIN (xll_bitpos, xrl_bitpos);
> +  shifted[0][0] = 0;
> +
> +  if (!l_split_load)
> +    {
> +      bitpos[0][0] = lnbitpos;
> +      bitsiz[0][0] = lnbitsize;
> +      ld_arg[0][0] = make_bit_field_ref (loc, ll_inner, ll_arg,
> +                                        lntype, lnbitsize, lnbitpos,
> +                                        ll_unsignedp || rl_unsignedp,
> +                                        ll_reversep);
> +    }
>
> +  if (parts > 1)
> +    {
> +      if (l_split_load)
> +       build_split_load (ld_arg[0], bitpos[0], bitsiz[0], toshift[0],
> +                         shifted[0], loc, ll_inner, ll_arg,
> +                         lnmode, lnmode2, lnbitpos, ll_reversep);
> +      else
> +       reuse_split_load (ld_arg[0], bitpos[0], bitsiz[0], toshift[0],
> +                         shifted[0], xmask[0],
> +                         rnbitpos + GET_MODE_BITSIZE (rnmode)
> +                         - lr_bitpos + ll_bitpos, ll_reversep);
> +    }
> +
> +  for (int i = 0; i < parts; i++)
> +    {
> +      tree op[2] = { ld_arg[0][i], ld_arg[1][i] };
> +      tree mask[2] = { ll_mask, lr_mask };
> +
> +      for (int j = 0; j < 2; j++)
> +       {
> +         /* Mask out the bits belonging to the other part.  */
> +         if (xmask[j][i])
> +           mask[j] = int_const_binop (BIT_AND_EXPR, mask[j], xmask[j][i]);
> +
> +         if (shifted[j][i])
> +           {
> +             tree shiftsz = bitsize_int (shifted[j][i]);
> +             mask[j] = int_const_binop (RSHIFT_EXPR, mask[j], shiftsz);
> +           }
> +         mask[j] = fold_convert_loc (loc, TREE_TYPE (op[j]), mask[j]);
> +       }
> +
> +      HOST_WIDE_INT shift = (toshift[0][i] - toshift[1][i]);
> +
> +      if (shift)
> +       {
> +         int j;
> +         if (shift > 0)
> +           j = 0;
> +         else
> +           {
> +             j = 1;
> +             shift = -shift;
> +           }
> +
> +         tree shiftsz = bitsize_int (shift);
> +         op[j] = fold_build2_loc (loc, RSHIFT_EXPR, TREE_TYPE (op[j]),
> +                                  op[j], shiftsz);
> +         mask[j] = int_const_binop (RSHIFT_EXPR, mask[j], shiftsz);
> +       }
> +
> +      /* Convert to the smaller type before masking out unwanted
> +        bits.  */
> +      tree type = TREE_TYPE (op[0]);
> +      if (type != TREE_TYPE (op[1]))
> +       {
> +         int j = (TYPE_PRECISION (type)
> +                  < TYPE_PRECISION (TREE_TYPE (op[1])));
> +         if (!j)
> +           type = TREE_TYPE (op[1]);
> +         op[j] = fold_convert_loc (loc, type, op[j]);
> +         mask[j] = fold_convert_loc (loc, type, mask[j]);
> +       }
> +
> +      for (int j = 0; j < 2; j++)
> +       if (! integer_all_onesp (mask[j]))
> +         op[j] = build2_loc (loc, BIT_AND_EXPR, type,
> +                             op[j], mask[j]);
> +
> +      cmp[i] = build2_loc (loc, wanted_code, truth_type, op[0], op[1]);
> +    }
> +
> +  if (first1)
> +    std::swap (cmp[0], cmp[1]);
> +
> +  if (parts == 1)
> +    result = cmp[0];
> +  else if (!separatep || !maybe_separate)
> +    result = build2_loc (loc, orig_code, truth_type, cmp[0], cmp[1]);
> +  else
> +    {
> +      result = cmp[0];
> +      *separatep = cmp[1];
> +    }
> +
> +  return result;
> +}
> +
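
A sketch of the non-adjacent merging this entry point supports, together with
the recursion added to fold_truth_andor_1 (made-up struct; whether the merge
fires depends on the target's modes and alignment):

  struct S { unsigned char a; unsigned char pad; unsigned short c;
             unsigned int b; };

  int
  f (struct S *p, struct S *q)
  {
    /* a and c share the first 32-bit word of S, while b lives in the
       second.  The compares of a and c are not adjacent in the chain,
       but with no side effects in between they can now be considered
       for merging into one masked word compare, leaving the compare
       of b in place.  */
    return p->a == q->a && p->b == q->b && p->c == q->c;
  }

If the merged RHS also touches a word that the LHS does not, the compare of
those leftover bits is handed back through *SEPARATEP so the caller can keep
it at the RHS's original position in the chain.
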
>  /* Try to simplify the AND of two comparisons, specified by
>     (OP1A CODE1 OP1B) and (OP2B CODE2 OP2B), respectively.
>     If this can be simplified to a single expression (without requiring
> @@ -5988,6 +7126,13 @@ maybe_fold_and_comparisons (tree type,
>                                                      op2b))
>      return t;
>
> +  if (tree t = fold_truth_andor_maybe_separate (UNKNOWN_LOCATION,
> +                                               TRUTH_ANDIF_EXPR, type,
> +                                               code2, op2a, op2b,
> +                                               code1, op1a, op1b,
> +                                               NULL))
> +    return t;
> +
>    return NULL_TREE;
>  }
>
> diff --git a/gcc/testsuite/gcc.dg/field-merge-1.c b/gcc/testsuite/gcc.dg/field-merge-1.c
> new file mode 100644
> index 00000000..1818e104
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/field-merge-1.c
> @@ -0,0 +1,64 @@
> +/* { dg-do run } */
> +/* { dg-options "-O -save-temps -fdump-tree-optimized" } */
> +
> +/* Check that field loads compared with constants are merged, even if
> +   tested out of order, and when fields straddle across alignment
> +   boundaries.  */
> +
> +struct TL {
> +  unsigned char p;
> +  unsigned int a;
> +  unsigned char q;
> +  unsigned int b;
> +  unsigned char r;
> +  unsigned int c;
> +  unsigned char s;
> +} __attribute__ ((packed, aligned (4), scalar_storage_order ("little-endian")));
> +
> +struct TB {
> +  unsigned char p;
> +  unsigned int a;
> +  unsigned char q;
> +  unsigned int b;
> +  unsigned char r;
> +  unsigned int c;
> +  unsigned char s;
> +} __attribute__ ((packed, aligned (4), scalar_storage_order ("big-endian")));
> +
> +#define vc 0xaa
> +#define vi 0x12345678
> +
> +struct TL vL = { vc, vi, vc, vi, vc, vi, vc };
> +struct TB vB = { vc, vi, vc, vi, vc, vi, vc };
> +
> +void f (void) {
> +  /* Which words of    | vL | vB | */
> +  /* are accessed by   |0123|0123| */
> +  if (0 /* the tests?  |    |    | */
> +      || vL.p != vc /* |*   |    | */
> +      || vB.p != vc /* |    |*   | */
> +      || vL.s != vc /* |   *|    | */
> +      || vB.q != vc /* |    | *  | */
> +      || vL.a != vi /* |^*  |    | */
> +      || vB.b != vi /* |    | ^* | */
> +      || vL.c != vi /* |  *^|    | */
> +      || vB.c != vi /* |    |  ^*| */
> +      || vL.b != vi /* | ^^ |    | */
> +      || vL.q != vc /* | ^  |    | */
> +      || vB.a != vi /* |    |^^  | */
> +      || vB.r != vc /* |    |  ^ | */
> +      || vB.s != vc /* |    |   ^| */
> +      || vL.r != vc /* |  ^ |    | */
> +      )
> +    __builtin_abort ();
> +}
> +
> +int main () {
> +  f ();
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 8 "optimized" } } */
> +/* { dg-final { scan-assembler-not "cmpb" { target { i*86-*-* || x86_64-*-* } } } } */
> +/* { dg-final { scan-assembler-times "cmpl" 8 { target { i*86-*-* || x86_64-*-* } } } } */
> +/* { dg-final { scan-assembler-times "cmpw" 8 { target { powerpc*-*-* || rs6000-*-* } } } } */
> diff --git a/gcc/testsuite/gcc.dg/field-merge-2.c b/gcc/testsuite/gcc.dg/field-merge-2.c
> new file mode 100644
> index 00000000..80c573b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/field-merge-2.c
> @@ -0,0 +1,31 @@
> +/* { dg-do run } */
> +/* { dg-options "-O" } */
> +
> +struct TL {
> +  unsigned short a;
> +  unsigned short b;
> +} __attribute__ ((packed, aligned (8)));
> +
> +struct TB {
> +  unsigned char p;
> +  unsigned short a;
> +  unsigned short b;
> +} __attribute__ ((packed, aligned (8)));
> +
> +#define vc 0xaa
> +
> +struct TL vL = { vc, vc };
> +struct TB vB = { vc, vc, vc };
> +
> +void f (void) {
> +  if (0
> +      || vL.b != vB.b
> +      || vL.a != vB.a
> +      )
> +    __builtin_abort ();
> +}
> +
> +int main () {
> +  f ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/field-merge-3.c b/gcc/testsuite/gcc.dg/field-merge-3.c
> new file mode 100644
> index 00000000..f26e8a9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/field-merge-3.c
> @@ -0,0 +1,36 @@
> +/* { dg-do run } */
> +/* { dg-options "-O" } */
> +
> +const int BIG_ENDIAN_P = (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__);
> +
> +struct T1 {
> +  unsigned char p[2];
> +  unsigned short a;
> +  unsigned int z;
> +} __attribute__((__aligned__(8)));
> +
> +struct T2 {
> +  unsigned short p;
> +  unsigned short a;
> +  unsigned int z;
> +} __attribute__((__aligned__(8)));
> +
> +#define vc 0xaa
> +#define vi 0x12345678
> +
> +struct T1 v1 = { { vc + !BIG_ENDIAN_P, vc + BIG_ENDIAN_P }, vc, vi };
> +struct T2 v2 = { (vc << 8) | (vc - 1), vc, vi };
> +
> +void f (void) {
> +  if (0
> +      || v1.p[!BIG_ENDIAN_P] != v2.p >> 8
> +      || v1.a != v2.a
> +      || ((v1.z ^ v2.z) & 0xff00ff00) != 0
> +      || ((v1.z ^ v2.z) & 0x00ff00ff) != 0)
> +    __builtin_abort ();
> +}
> +
> +int main () {
> +  f ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/field-merge-4.c b/gcc/testsuite/gcc.dg/field-merge-4.c
> new file mode 100644
> index 00000000..c629069
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/field-merge-4.c
> @@ -0,0 +1,40 @@
> +/* { dg-do run } */
> +/* { dg-options "-O" } */
> +
> +struct T1 {
> +  unsigned int zn;
> +  unsigned char p;
> +  unsigned char qn;
> +  unsigned short a;
> +  unsigned int z;
> +} __attribute__((__packed__, __aligned__(4)));
> +
> +struct T2 {
> +  unsigned int zn;
> +  unsigned char rn;
> +  unsigned char p;
> +  unsigned char qn;
> +  unsigned short a;
> +  unsigned int z;
> +} __attribute__((__packed__, __aligned__(4)));
> +
> +#define vc 0xaa
> +#define vs 0xccdd
> +#define vi 0x12345678
> +
> +struct T1 v1 = { -1, vc, 1, vs, vi };
> +struct T2 v2 = { -1, 0, vc, 1, vs, vi };
> +
> +void f (void) {
> +  if (0
> +      || v1.p != v2.p
> +      || v1.a != v2.a
> +      || v1.z != v2.z
> +      )
> +    __builtin_abort ();
> +}
> +
> +int main () {
> +  f ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/field-merge-5.c b/gcc/testsuite/gcc.dg/field-merge-5.c
> new file mode 100644
> index 00000000..1580b14bc
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/field-merge-5.c
> @@ -0,0 +1,40 @@
> +/* { dg-do run } */
> +/* { dg-options "-O" } */
> +
> +struct T1 {
> +  unsigned int zn;
> +  unsigned char p;
> +  unsigned char qn;
> +  unsigned short a;
> +  unsigned int z;
> +} __attribute__((__packed__, __aligned__(8)));
> +
> +struct T2 {
> +  unsigned int zn;
> +  unsigned char rn;
> +  unsigned char p;
> +  unsigned char qn;
> +  unsigned short a;
> +  unsigned int z;
> +} __attribute__((__packed__, __aligned__(8)));
> +
> +#define vc 0xaa
> +#define vs 0xccdd
> +#define vi 0x12345678
> +
> +struct T1 v1 = { -1, vc, 1, vs, vi };
> +struct T2 v2 = { -1, 0, vc, 1, vs, vi };
> +
> +void f (void) {
> +  if (0
> +      || v1.p != v2.p
> +      || v1.a != v2.a
> +      || v1.z != v2.z
> +      )
> +    __builtin_abort ();
> +}
> +
> +int main () {
> +  f ();
> +  return 0;
> +}
> diff --git a/gcc/tree-ssa-ifcombine.c b/gcc/tree-ssa-ifcombine.c
> index 21a70f4..6234756 100644
> --- a/gcc/tree-ssa-ifcombine.c
> +++ b/gcc/tree-ssa-ifcombine.c
> @@ -40,7 +40,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "gimplify-me.h"
>  #include "tree-cfg.h"
>  #include "tree-ssa.h"
> -
> +
>  #ifndef LOGICAL_OP_NON_SHORT_CIRCUIT
>  #define LOGICAL_OP_NON_SHORT_CIRCUIT \
>    (BRANCH_COST (optimize_function_for_speed_p (cfun), \
> @@ -126,7 +126,7 @@ bb_no_side_effects_p (basic_block bb)
>        if (gimple_has_side_effects (stmt)
>           || gimple_uses_undefined_value_p (stmt)
>           || gimple_could_trap_p (stmt)
> -         || gimple_vuse (stmt)
> +         || gimple_has_volatile_ops (stmt)
>           /* const calls don't match any of the above, yet they could
>              still have some side-effects - they could contain
>              gimple_could_trap_p statements, like floating point
> @@ -561,8 +561,8 @@ ifcombine_ifandif (basic_block inner_cond_bb, bool inner_inv,
>                                             gimple_cond_lhs (outer_cond),
>                                             gimple_cond_rhs (outer_cond))))
>         {
> +         {
>           tree t1, t2;
> -         gimple_stmt_iterator gsi;
>           bool logical_op_non_short_circuit = LOGICAL_OP_NON_SHORT_CIRCUIT;
>           if (param_logical_op_non_short_circuit != -1)
>             logical_op_non_short_circuit
> @@ -584,6 +584,8 @@ ifcombine_ifandif (basic_block inner_cond_bb, bool inner_inv,
>                                 gimple_cond_rhs (outer_cond));
>           t = fold_build2_loc (gimple_location (inner_cond),
>                                TRUTH_AND_EXPR, boolean_type_node, t1, t2);
> +         }
> +       gimplify_after_fold:
>           if (result_inv)
>             {
>               t = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (t), t);
> @@ -593,6 +595,9 @@ ifcombine_ifandif (basic_block inner_cond_bb, bool inner_inv,
>           t = force_gimple_operand_gsi_1 (&gsi, t, is_gimple_condexpr, NULL, true,
>                                           GSI_SAME_STMT);
>          }
> +      /* ??? Fold should avoid this.  */
> +      else if (!is_gimple_condexpr (t))
> +       goto gimplify_after_fold;
>        if (result_inv)
>         t = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (t), t);
>        t = canonicalize_cond_expr_cond (t);
>
>
> --
> Alexandre Oliva, happy hacker
> https://FSFLA.org/blogs/lxo/
> Free Software Activist
> GNU Toolchain Engineer

Patch

diff --git a/gcc/config/rs6000/t-rs6000 b/gcc/config/rs6000/t-rs6000
index 1ddb572..516486d 100644
--- a/gcc/config/rs6000/t-rs6000
+++ b/gcc/config/rs6000/t-rs6000
@@ -52,6 +52,10 @@  $(srcdir)/config/rs6000/rs6000-tables.opt: $(srcdir)/config/rs6000/genopt.sh \
 	$(SHELL) $(srcdir)/config/rs6000/genopt.sh $(srcdir)/config/rs6000 > \
 		$(srcdir)/config/rs6000/rs6000-tables.opt
 
+# FRAME_GROWS_DOWNWARD tests flag_sanitize in a way that rules out a
+# test in toplev.c.
+toplev.o-warn = -Wno-error
+
 # The rs6000 backend doesn't cause warnings in these files.
 insn-conditions.o-warn =
 
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 0cc80ad..bfbf88c 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -4603,6 +4603,41 @@  optimize_bit_field_compare (location_t loc, enum tree_code code,
   return lhs;
 }
 
+/* If *R_ARG is a constant zero, and L_ARG is a possibly masked
+   BIT_XOR_EXPR, return 1 and set *r_arg to l_arg.
+   Otherwise, return 0.
+
+   The returned value should be passed to decode_field_reference for it
+   to handle l_arg, and then doubled for r_arg.  */
+static int
+prepare_xor (tree l_arg, tree *r_arg)
+{
+  int ret = 0;
+
+  if (!integer_zerop (*r_arg))
+    return ret;
+
+  tree exp = l_arg;
+  STRIP_NOPS (exp);
+
+  if (TREE_CODE (exp) == BIT_AND_EXPR)
+    {
+      tree and_mask = TREE_OPERAND (exp, 1);
+      exp = TREE_OPERAND (exp, 0);
+      STRIP_NOPS (exp); STRIP_NOPS (and_mask);
+      if (TREE_CODE (and_mask) != INTEGER_CST)
+	return ret;
+    }
+
+  if (TREE_CODE (exp) == BIT_XOR_EXPR)
+    {
+      *r_arg = l_arg;
+      return 1;
+    }
+
+  return ret;
+}
+
 /* Subroutine for fold_truth_andor_1: decode a field reference.
 
    If EXP is a comparison reference, we return the innermost reference.
@@ -4625,6 +4660,10 @@  optimize_bit_field_compare (location_t loc, enum tree_code code,
 
    *PAND_MASK is set to the mask found in a BIT_AND_EXPR, if any.
 
+   XOR_WHICH is 1 or 2 if EXP was found to be a (possibly masked)
+   BIT_XOR_EXPR compared with zero.  We're to take the first or second
+   operand thereof if so.  It should be zero otherwise.
+
    Return 0 if this is not a component reference or is one that we can't
    do anything with.  */
 
@@ -4632,7 +4671,7 @@  static tree
 decode_field_reference (location_t loc, tree *exp_, HOST_WIDE_INT *pbitsize,
 			HOST_WIDE_INT *pbitpos, machine_mode *pmode,
 			int *punsignedp, int *preversep, int *pvolatilep,
-			tree *pmask, tree *pand_mask)
+			tree *pmask, tree *pand_mask, int xor_which)
 {
   tree exp = *exp_;
   tree outer_type = 0;
@@ -4640,6 +4679,7 @@  decode_field_reference (location_t loc, tree *exp_, HOST_WIDE_INT *pbitsize,
   tree mask, inner, offset;
   tree unsigned_type;
   unsigned int precision;
+  HOST_WIDE_INT shiftrt = 0;
 
   /* All the optimizations using this function assume integer fields.
      There are problems with FP fields since the type_for_size call
@@ -4664,13 +4704,55 @@  decode_field_reference (location_t loc, tree *exp_, HOST_WIDE_INT *pbitsize,
 	return NULL_TREE;
     }
 
+  if (xor_which)
+    {
+      gcc_checking_assert (TREE_CODE (exp) == BIT_XOR_EXPR);
+      exp = TREE_OPERAND (exp, xor_which - 1);
+      STRIP_NOPS (exp);
+    }
+
+  if (CONVERT_EXPR_P (exp)
+      || TREE_CODE (exp) == NON_LVALUE_EXPR)
+    {
+      if (!outer_type)
+	outer_type = TREE_TYPE (exp);
+      exp = TREE_OPERAND (exp, 0);
+      STRIP_NOPS (exp);
+    }
+
+  if (TREE_CODE (exp) == RSHIFT_EXPR
+      && TREE_CODE (TREE_OPERAND (exp, 1)) == INTEGER_CST
+      && tree_fits_shwi_p (TREE_OPERAND (exp, 1)))
+    {
+      tree shift = TREE_OPERAND (exp, 1);
+      STRIP_NOPS (shift);
+      shiftrt = tree_to_shwi (shift);
+      if (shiftrt > 0)
+	{
+	  exp = TREE_OPERAND (exp, 0);
+	  STRIP_NOPS (exp);
+	}
+      else
+	shiftrt = 0;
+    }
+
+  if (CONVERT_EXPR_P (exp)
+      || TREE_CODE (exp) == NON_LVALUE_EXPR)
+    {
+      if (!outer_type)
+	outer_type = TREE_TYPE (exp);
+      exp = TREE_OPERAND (exp, 0);
+      STRIP_NOPS (exp);
+    }
+
   poly_int64 poly_bitsize, poly_bitpos;
   inner = get_inner_reference (exp, &poly_bitsize, &poly_bitpos, &offset,
 			       pmode, punsignedp, preversep, pvolatilep);
+
   if ((inner == exp && and_mask == 0)
       || !poly_bitsize.is_constant (pbitsize)
       || !poly_bitpos.is_constant (pbitpos)
-      || *pbitsize < 0
+      || *pbitsize <= shiftrt
       || offset != 0
       || TREE_CODE (inner) == PLACEHOLDER_EXPR
       /* Reject out-of-bound accesses (PR79731).  */
@@ -4679,6 +4761,21 @@  decode_field_reference (location_t loc, tree *exp_, HOST_WIDE_INT *pbitsize,
 			       *pbitpos + *pbitsize) < 0))
     return NULL_TREE;
 
+  if (shiftrt)
+    {
+      if (!*preversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
+	*pbitpos += shiftrt;
+      *pbitsize -= shiftrt;
+    }
+
+  if (outer_type && *pbitsize > TYPE_PRECISION (outer_type))
+    {
+      HOST_WIDE_INT excess = *pbitsize - TYPE_PRECISION (outer_type);
+      if (*preversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
+	*pbitpos += excess;
+      *pbitsize -= excess;
+    }
+
   unsigned_type = lang_hooks.types.type_for_size (*pbitsize, 1);
   if (unsigned_type == NULL_TREE)
     return NULL_TREE;
@@ -4709,27 +4806,6 @@  decode_field_reference (location_t loc, tree *exp_, HOST_WIDE_INT *pbitsize,
   return inner;
 }
 
-/* Return nonzero if MASK represents a mask of SIZE ones in the low-order
-   bit positions and MASK is SIGNED.  */
-
-static bool
-all_ones_mask_p (const_tree mask, unsigned int size)
-{
-  tree type = TREE_TYPE (mask);
-  unsigned int precision = TYPE_PRECISION (type);
-
-  /* If this function returns true when the type of the mask is
-     UNSIGNED, then there will be errors.  In particular see
-     gcc.c-torture/execute/990326-1.c.  There does not appear to be
-     any documentation paper trail as to why this is so.  But the pre
-     wide-int worked with that restriction and it has been preserved
-     here.  */
-  if (size > precision || TYPE_SIGN (type) == UNSIGNED)
-    return false;
-
-  return wi::mask (size, false, precision) == wi::to_wide (mask);
-}
-
 /* Subroutine for fold: determine if VAL is the INTEGER_CONST that
    represents the sign bit of EXP's type.  If EXP represents a sign
    or zero extension, also test VAL against the unextended type.
@@ -6120,6 +6196,139 @@  merge_truthop_with_opposite_arm (location_t loc, tree op, tree cmpop,
   return NULL_TREE;
 }
 
+/* Return the single bit position within the bit extents L or R that
+   lies on an ALIGN-bit alignment boundary, or -1 if there is no such
+   boundary, if there is more than one, or if the only such boundary
+   falls between the extents rather than within one of them.  L and R
+   are given by bitpos and bitsize.  If it doesn't return -1, there
+   are two consecutive ALIGN-bit words that contain both extents, and
+   at least one of the extents straddles across the returned alignment
+   boundary.  */
+static inline HOST_WIDE_INT
+compute_split_boundary_from_align (HOST_WIDE_INT align,
+				   HOST_WIDE_INT l_bitpos,
+				   HOST_WIDE_INT l_bitsize,
+				   HOST_WIDE_INT r_bitpos,
+				   HOST_WIDE_INT r_bitsize)
+{
+  HOST_WIDE_INT amask = ~(align - 1);
+
+  HOST_WIDE_INT first_bit = MIN (l_bitpos, r_bitpos);
+  HOST_WIDE_INT end_bit = MAX (l_bitpos + l_bitsize, r_bitpos + r_bitsize);
+
+  HOST_WIDE_INT boundary = (end_bit - 1) & amask;
+
+  /* Make sure we're crossing no more than one alignment boundary.
+
+     ??? We don't have logic to recombine loads of two adjacent
+     fields that each crosses a different alignment boundary, so
+     as to load the middle word only once, if other words can't be
+     otherwise recombined.  */
+  if (boundary - first_bit > align)
+    return -1;
+
+  HOST_WIDE_INT l_start_word = l_bitpos & amask;
+  HOST_WIDE_INT l_end_word = (l_bitpos + l_bitsize - 1) & amask;
+
+  HOST_WIDE_INT r_start_word = r_bitpos & amask;
+  HOST_WIDE_INT r_end_word = (r_bitpos + r_bitsize - 1) & amask;
+
+  /* If neither field straddles across an alignment boundary, it's no
+     use to even try to merge them.  */
+  if (l_start_word == l_end_word && r_start_word == r_end_word)
+    return -1;
+
+  return boundary;
+}
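
(A worked example of the boundary computation, with hypothetical
numbers and ALIGN == 32.)

  /* L: bitpos 24, bitsize 16; R: bitpos 8, bitsize 8.
     first_bit = 8, end_bit = 40, boundary = (40 - 1) & ~31 = 32.
     boundary - first_bit = 24 <= 32, and L straddles bit 32 while R
     fits entirely in the first word, so 32 is returned and the caller
     splits the merged access at that bit.  */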
+
+/* Initialize ln_arg[0] and ln_arg[1] to a pair of newly-created (at
+   LOC) loads from INNER (from ORIG_INNER), of modes MODE and MODE2,
+   respectively, starting at BIT_POS, using reversed endianness if
+   REVERSEP.  Also initialize BITPOS (the starting position of each
+   part within INNER), BITSIZ (the bit count starting at BITPOS),
+   TOSHIFT[1] (the amount by which the part and its mask are to be
+   shifted right to bring its least-significant bit to bit zero) and
+   SHIFTED (the amount by which the part, by separate loading, has
+   already been shifted right, but that the mask needs shifting to
+   match).  */
+static inline void
+build_split_load (tree /* out */ ln_arg[2],
+		  HOST_WIDE_INT /* out */ bitpos[2],
+		  HOST_WIDE_INT /* out */ bitsiz[2],
+		  HOST_WIDE_INT /* in[0] out[0..1] */ toshift[2],
+		  HOST_WIDE_INT /* out */ shifted[2],
+		  location_t loc, tree inner, tree orig_inner,
+		  scalar_int_mode mode, scalar_int_mode mode2,
+		  HOST_WIDE_INT bit_pos, bool reversep)
+{
+  bitsiz[0] = GET_MODE_BITSIZE (mode);
+  bitsiz[1] = GET_MODE_BITSIZE (mode2);
+
+  for (int i = 0; i < 2; i++)
+    {
+      tree type = lang_hooks.types.type_for_size (bitsiz[i], 1);
+      bitpos[i] = bit_pos;
+      ln_arg[i] = make_bit_field_ref (loc, inner, orig_inner,
+				      type, bitsiz[i],
+				      bit_pos, 1, reversep);
+      bit_pos += bitsiz[i];
+    }
+
+  toshift[1] = toshift[0];
+  if (reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
+    {
+      shifted[0] = bitsiz[1];
+      shifted[1] = 0;
+      toshift[0] = 0;
+    }
+  else
+    {
+      shifted[1] = bitsiz[0];
+      shifted[0] = 0;
+      toshift[1] = 0;
+    }
+}
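
(A hypothetical example of the values build_split_load produces:
MODE == SImode, MODE2 == QImode, BIT_POS == 0, little-endian target,
REVERSEP false, TOSHIFT[0] left as set by the caller.)

  /* bitsiz = { 32, 8 }, bitpos = { 0, 32 };
     ln_arg[0] is a 32-bit load at bit 0, ln_arg[1] an 8-bit load at
     bit 32; shifted = { 0, 32 } and toshift[1] = 0, so the mask for
     part 1 is later shifted right by 32 to line up with the narrow
     load, matching the r-side values in the worked example inside
     fold_truth_andor_1 below.  */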
+
+/* Make arrangements to split at bit BOUNDARY a single loaded word
+   (with REVERSEP bit order) LN_ARG[0], to be shifted right by
+   TOSHIFT[0] to bring the field of interest to the least-significant
+   bit.  The expectation is that the same loaded word will be
+   propagated from part 0 to part 1, with just different shifting and
+   masking to extract both parts.  MASK is not expected to do more
+   than masking out the bits that belong to the other part.  See
+   build_split_load for more information on the other fields.  */
+static inline void
+reuse_split_load (tree /* in[0] out[1] */ ln_arg[2],
+		  HOST_WIDE_INT /* in[0] out[1] */ bitpos[2],
+		  HOST_WIDE_INT /* in[0] out[1] */ bitsiz[2],
+		  HOST_WIDE_INT /* in[0] out[0..1] */ toshift[2],
+		  HOST_WIDE_INT /* out */ shifted[2],
+		  tree /* out */ mask[2],
+		  HOST_WIDE_INT boundary, bool reversep)
+{
+  ln_arg[1] = ln_arg[0];
+  bitpos[1] = bitpos[0];
+  bitsiz[1] = bitsiz[0];
+  shifted[1] = shifted[0] = 0;
+
+  tree basemask = build_int_cst_type (TREE_TYPE (ln_arg[0]), -1);
+
+  if (reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
+    {
+      toshift[1] = toshift[0];
+      toshift[0] = bitpos[0] + bitsiz[0] - boundary;
+      mask[0] = const_binop (LSHIFT_EXPR, basemask,
+			     bitsize_int (toshift[0]));
+      mask[1] = const_binop (BIT_XOR_EXPR, basemask, mask[0]);
+    }
+  else
+    {
+      toshift[1] = boundary - bitpos[1];
+      mask[1] = const_binop (LSHIFT_EXPR, basemask,
+			     bitsize_int (toshift[1]));
+      mask[0] = const_binop (BIT_XOR_EXPR, basemask, mask[1]);
+    }
+}
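
(Another hypothetical example, this time for reuse_split_load: a
single 32-bit little-endian load, LN_ARG[0] at bitpos 0 with
TOSHIFT[0] == 0, split at BOUNDARY == 24.)

  /* Both parts reuse the same load: bitpos = { 0, 0 },
     bitsiz = { 32, 32 }, shifted = { 0, 0 }, toshift = { 0, 24 };
     mask[1] = 0xffffffff << 24 = 0xff000000 keeps the bits of part 1,
     mask[0] = ~mask[1]         = 0x00ffffff keeps the bits of part 0.  */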
+
 /* Find ways of folding logical expressions of LHS and RHS:
    Try to merge two comparisons to the same innermost item.
    Look for range tests like "ch >= '0' && ch <= '9'".
@@ -6142,11 +6351,19 @@  merge_truthop_with_opposite_arm (location_t loc, tree op, tree cmpop,
    TRUTH_TYPE is the type of the logical operand and LHS and RHS are its
    two operands.
 
+   SEPARATEP should be NULL if LHS and RHS are adjacent in
+   CODE-chained compares, and a non-NULL pointer to NULL_TREE
+   otherwise.  If the "words" accessed by RHS are already accessed by
+   LHS, this won't matter, but if RHS accesses "words" that LHS
+   doesn't, then *SEPARATEP will be set to the compares that should
+   take RHS's place.  By "words" we mean contiguous bits that do not
+   cross a TYPE_ALIGN boundary of the accessed object's type.
+
    We return the simplified tree or 0 if no optimization is possible.  */
 
 static tree
 fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type,
-		    tree lhs, tree rhs)
+		    tree lhs, tree rhs, tree *separatep)
 {
   /* If this is the "or" of two comparisons, we can do something if
      the comparisons are NE_EXPR.  If this is the "and", we can do something
@@ -6157,6 +6374,7 @@  fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type,
      convert EQ_EXPR to NE_EXPR so we need not reject the "wrong"
      comparison for one-bit fields.  */
 
+  enum tree_code orig_code = code;
   enum tree_code wanted_code;
   enum tree_code lcode, rcode;
   tree ll_arg, lr_arg, rl_arg, rr_arg;
@@ -6168,13 +6386,16 @@  fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type,
   int ll_unsignedp, lr_unsignedp, rl_unsignedp, rr_unsignedp;
   int ll_reversep, lr_reversep, rl_reversep, rr_reversep;
   machine_mode ll_mode, lr_mode, rl_mode, rr_mode;
-  scalar_int_mode lnmode, rnmode;
+  scalar_int_mode lnmode, lnmode2, rnmode;
   tree ll_mask, lr_mask, rl_mask, rr_mask;
   tree ll_and_mask, lr_and_mask, rl_and_mask, rr_and_mask;
   tree l_const, r_const;
   tree lntype, rntype, result;
   HOST_WIDE_INT first_bit, end_bit;
   int volatilep;
+  bool l_split_load;
+
+  gcc_checking_assert (!separatep || !*separatep);
 
   /* Start by getting the comparison codes.  Fail if anything is volatile.
      If one operand is a BIT_AND_EXPR with the constant one, treat it as if
@@ -6202,7 +6423,116 @@  fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type,
 
   if (TREE_CODE_CLASS (lcode) != tcc_comparison
       || TREE_CODE_CLASS (rcode) != tcc_comparison)
-    return 0;
+    {
+      tree separate = NULL;
+
+      /* Check for the possibility of merging component references.
+	 If any of our operands is another similar operation, recurse
+	 to try to merge individual operands, but avoiding double
+	 recursion: recurse to each leaf of LHS, and from there to
+	 each leaf of RHS, but don't bother recursing into LHS if RHS
+	 is neither a comparison nor a compound expr, nor into RHS if
+	 the LHS leaf isn't a comparison.  In case of no successful
+	 merging, recursion depth is limited to the sum of the depths
+	 of LHS and RHS, and the non-recursing code below will run no
+	 more times than the product of the leaf counts of LHS and
+	 RHS.  If there is a successful merge, we (recursively)
+	 further attempt to fold the result, so recursion depth and
+	 merge attempts are harder to compute.  */
+      if (TREE_CODE (lhs) == code && TREE_TYPE (lhs) == truth_type
+	  && (TREE_CODE_CLASS (rcode) == tcc_comparison
+	      || (TREE_CODE (rhs) == code && TREE_TYPE (rhs) == truth_type)))
+	{
+	  if ((result = fold_truth_andor_1 (loc, code, truth_type,
+					    TREE_OPERAND (lhs, 1), rhs,
+					    separatep
+					    ? separatep
+					    : NULL)) != 0)
+	    {
+	      /* We have combined the latter part of LHS with RHS.  If
+		 they were separate, the recursion already placed any
+		 remains of RHS in *SEPARATEP, otherwise they are all
+		 in RESULT, so we just have to prepend to result the
+		 former part of LHS.  */
+	      result = fold_build2_loc (loc, code, truth_type,
+					TREE_OPERAND (lhs, 0), result);
+	      return result;
+	    }
+	  if ((result = fold_truth_andor_1 (loc, code, truth_type,
+					    TREE_OPERAND (lhs, 0), rhs,
+					    separatep
+					    ? separatep
+					    : &separate)) != 0)
+	    {
+	      /* We have combined the former part of LHS with RHS.  If
+		 they were separate, the recursive call will have
+		 placed remnants of RHS in *SEPARATEP.  If they
+		 weren't, they will be in SEPARATE instead.  Append
+		 the latter part of LHS to the result, and then any
+		 remnants of RHS that we haven't passed on to the
+		 caller.  */
+	      result = fold_build2_loc (loc, code, truth_type,
+					result, TREE_OPERAND (lhs, 1));
+	      if (separate)
+		result = fold_build2_loc (loc, code, truth_type,
+					  result, separate);
+	      return result;
+	    }
+	}
+      else if (TREE_CODE_CLASS (lcode) == tcc_comparison
+	       && TREE_CODE (rhs) == code && TREE_TYPE (rhs) == truth_type)
+	{
+	  if ((result = fold_truth_andor_1 (loc, code, truth_type,
+					    lhs, TREE_OPERAND (rhs, 0),
+					    separatep
+					    ? &separate
+					    : NULL)) != 0)
+	    {
+	      /* We have combined LHS with the former part of RHS.  If
+		 they were separate, have any remnants of RHS placed
+		 in separate, so that we can combine them with the
+		 latter part of RHS, and then send them back for the
+		 caller to handle.  If they were adjacent, we can just
+		 append the latter part of RHS to the RESULT.  */
+	      if (!separate)
+		separate = TREE_OPERAND (rhs, 1);
+	      else
+		separate = fold_build2_loc (loc, code, truth_type,
+					    separate, TREE_OPERAND (rhs, 1));
+	      if (separatep)
+		*separatep = separate;
+	      else
+		result = fold_build2_loc (loc, code, truth_type,
+					  result, separate);
+	      return result;
+	    }
+	  if ((result = fold_truth_andor_1 (loc, code, truth_type,
+					    lhs, TREE_OPERAND (rhs, 1),
+					    &separate)) != 0)
+	    {
+	      /* We have combined LHS with the latter part of RHS.
+		 They're definitely not adjacent, so we get the
+		 remains of RHS in SEPARATE, and then prepend the
+		 former part of RHS to it.  If LHS and RHS were
+		 already separate to begin with, we leave the remnants
+		 of RHS for the caller to deal with, otherwise we
+		 append them to the RESULT.  */
+	      if (!separate)
+		separate = TREE_OPERAND (rhs, 0);
+	      else
+		separate = fold_build2_loc (loc, code, truth_type,
+					    TREE_OPERAND (rhs, 0), separate);
+	      if (separatep)
+		*separatep = separate;
+	      else
+		result = fold_build2_loc (loc, code, truth_type,
+					  result, separate);
+	      return result;
+	    }
+	}
+
+      return 0;
+    }
 
   ll_arg = TREE_OPERAND (lhs, 0);
   lr_arg = TREE_OPERAND (lhs, 1);
@@ -6278,22 +6608,24 @@  fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type,
 
   ll_reversep = lr_reversep = rl_reversep = rr_reversep = 0;
   volatilep = 0;
+  int l_xor = prepare_xor (ll_arg, &lr_arg);
   ll_inner = decode_field_reference (loc, &ll_arg,
 				     &ll_bitsize, &ll_bitpos, &ll_mode,
 				     &ll_unsignedp, &ll_reversep, &volatilep,
-				     &ll_mask, &ll_and_mask);
+				     &ll_mask, &ll_and_mask, l_xor);
   lr_inner = decode_field_reference (loc, &lr_arg,
 				     &lr_bitsize, &lr_bitpos, &lr_mode,
 				     &lr_unsignedp, &lr_reversep, &volatilep,
-				     &lr_mask, &lr_and_mask);
+				     &lr_mask, &lr_and_mask, 2 * l_xor);
+  int r_xor = prepare_xor (rl_arg, &rr_arg);
   rl_inner = decode_field_reference (loc, &rl_arg,
 				     &rl_bitsize, &rl_bitpos, &rl_mode,
 				     &rl_unsignedp, &rl_reversep, &volatilep,
-				     &rl_mask, &rl_and_mask);
+				     &rl_mask, &rl_and_mask, r_xor);
   rr_inner = decode_field_reference (loc, &rr_arg,
 				     &rr_bitsize, &rr_bitpos, &rr_mode,
 				     &rr_unsignedp, &rr_reversep, &volatilep,
-				     &rr_mask, &rr_and_mask);
+				     &rr_mask, &rr_and_mask, 2 * r_xor);
 
   /* It must be true that the inner operation on the lhs of each
      comparison must be the same if we are to be able to do anything.
@@ -6349,6 +6681,72 @@  fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type,
 	return 0;
     }
 
+  /* This will be bumped to 2 if any of the field pairs crosses an
+     alignment boundary, so the merged compare has to be done in two
+     parts.  */
+  int parts = 1;
+  /* Set to true if the second combined compare should come first,
+     e.g., because the second original compare accesses a word that
+     the first one doesn't, and the combined compares access those in
+     cmp[0].  */
+  bool first1 = false;
+  /* Set to true if the first original compare is not the one being
+     split.  */
+  bool maybe_separate = false;
+
+  /* The following 2-dimensional arrays use the first index to
+     identify left(0)- vs right(1)-hand compare operands, and the
+     second one to identify merged compare parts.  */
+  /* The memory loads or constants to be compared.  */
+  tree ld_arg[2][2];
+  /* The first bit of the corresponding inner object that the
+     corresponding LD_ARG covers.  */
+  HOST_WIDE_INT bitpos[2][2];
+  /* The bit count starting at BITPOS that the corresponding LD_ARG
+     covers.  */
+  HOST_WIDE_INT bitsiz[2][2];
+  /* The number of bits by which LD_ARG has already been shifted
+     right, and by which MASK still needs to be shifted to match.  */
+  HOST_WIDE_INT shifted[2][2];
+  /* The number of bits by which both LD_ARG and MASK need shifting to
+     bring the field's least-significant bit to bit zero.  */
+  HOST_WIDE_INT toshift[2][2];
+  /* An additional mask to be applied to LD_ARG, to remove any bits
+     that may have been loaded for use in another compare, but that
+     don't belong in the corresponding compare.  */
+  tree xmask[2][2] = {};
+
+  /* The combined compare or compares.  */
+  tree cmp[2];
+
+  /* Consider we're comparing two non-contiguous fields of packed
+     structs, both aligned at 32-bit boundaries:
+
+     ll_arg: an 8-bit field at offset 0
+     lr_arg: a 16-bit field at offset 2
+
+     rl_arg: an 8-bit field at offset 1
+     rr_arg: a 16-bit field at offset 3
+
+     We'll have r_split_load, because rr_arg straddles across an
+     alignment boundary.
+
+     We'll want to have:
+
+     bitpos  = { {  0,  0 }, {  0, 32 } }
+     bitsiz  = { { 32, 32 }, { 32,  8 } }
+
+     And, for little-endian:
+
+     shifted = { {  0,  0 }, {  0, 32 } }
+     toshift = { {  0, 24 }, {  0,  0 } }
+
+     Or, for big-endian:
+
+     shifted = { {  0,  0 }, {  8,  0 } }
+     toshift = { {  8,  0 }, {  0,  0 } }
+  */
+
   /* See if we can find a mode that contains both fields being compared on
      the left.  If we can't, fail.  Otherwise, update all constants and masks
      to be relative to a field of that size.  */
@@ -6357,11 +6755,41 @@  fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type,
   if (!get_best_mode (end_bit - first_bit, first_bit, 0, 0,
 		      TYPE_ALIGN (TREE_TYPE (ll_inner)), BITS_PER_WORD,
 		      volatilep, &lnmode))
-    return 0;
+    {
+      /* Consider the possibility of recombining loads if any of the
+	 fields straddles across an alignment boundary, so that either
+	 part can be loaded along with the other field.  */
+      HOST_WIDE_INT align = TYPE_ALIGN (TREE_TYPE (ll_inner));
+      HOST_WIDE_INT boundary = compute_split_boundary_from_align
+	(align, ll_bitpos, ll_bitsize, rl_bitpos, rl_bitsize);
+
+      if (boundary < 0
+	  || !get_best_mode (boundary - first_bit, first_bit, 0, 0,
+			     align, BITS_PER_WORD, volatilep, &lnmode)
+	  || !get_best_mode (end_bit - boundary, boundary, 0, 0,
+			     align, BITS_PER_WORD, volatilep, &lnmode2))
+	return 0;
+
+      l_split_load = true;
+      parts = 2;
+      if (ll_bitpos >= boundary)
+	maybe_separate = first1 = true;
+      else if (ll_bitpos + ll_bitsize <= boundary)
+	maybe_separate = true;
+    }
+  else
+    l_split_load = false;
 
   lnbitsize = GET_MODE_BITSIZE (lnmode);
   lnbitpos = first_bit & ~ (lnbitsize - 1);
+  if (l_split_load)
+    lnbitsize += GET_MODE_BITSIZE (lnmode2);
   lntype = lang_hooks.types.type_for_size (lnbitsize, 1);
+  if (!lntype)
+    {
+      gcc_checking_assert (l_split_load);
+      lntype = build_nonstandard_integer_type (lnbitsize, 1);
+    }
   xll_bitpos = ll_bitpos - lnbitpos, xrl_bitpos = rl_bitpos - lnbitpos;
 
   if (ll_reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
@@ -6414,19 +6842,58 @@  fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type,
 	  || ll_reversep != lr_reversep
 	  /* Make sure the two fields on the right
 	     correspond to the left without being swapped.  */
-	  || ll_bitpos - rl_bitpos != lr_bitpos - rr_bitpos)
+	  || ll_bitpos - rl_bitpos != lr_bitpos - rr_bitpos
+	  || lnbitpos < 0)
 	return 0;
 
+      bool r_split_load;
+      scalar_int_mode rnmode2;
+
       first_bit = MIN (lr_bitpos, rr_bitpos);
       end_bit = MAX (lr_bitpos + lr_bitsize, rr_bitpos + rr_bitsize);
       if (!get_best_mode (end_bit - first_bit, first_bit, 0, 0,
 			  TYPE_ALIGN (TREE_TYPE (lr_inner)), BITS_PER_WORD,
 			  volatilep, &rnmode))
-	return 0;
+	{
+	  /* Consider the possibility of recombining loads if any of the
+	     fields straddles across an alignment boundary, so that either
+	     part can be loaded along with the other field.  */
+	  HOST_WIDE_INT align = TYPE_ALIGN (TREE_TYPE (lr_inner));
+	  HOST_WIDE_INT boundary = compute_split_boundary_from_align
+	    (align, lr_bitpos, lr_bitsize, rr_bitpos, rr_bitsize);
+
+	  if (boundary < 0
+	      /* If we're to split both, make sure the split point is
+		 the same.  */
+	      || (l_split_load
+		  && (boundary - lr_bitpos
+		      != (lnbitpos + GET_MODE_BITSIZE (lnmode)) - ll_bitpos))
+	      || !get_best_mode (boundary - first_bit, first_bit, 0, 0,
+				 align, BITS_PER_WORD, volatilep, &rnmode)
+	      || !get_best_mode (end_bit - boundary, boundary, 0, 0,
+				 align, BITS_PER_WORD, volatilep, &rnmode2))
+	    return 0;
+
+	  r_split_load = true;
+	  parts = 2;
+	  if (lr_bitpos >= boundary)
+	    maybe_separate = first1 = true;
+	  else if (lr_bitpos + lr_bitsize <= boundary)
+	    maybe_separate = true;
+	}
+      else
+	r_split_load = false;
 
       rnbitsize = GET_MODE_BITSIZE (rnmode);
       rnbitpos = first_bit & ~ (rnbitsize - 1);
+      if (r_split_load)
+	rnbitsize += GET_MODE_BITSIZE (rnmode2);
       rntype = lang_hooks.types.type_for_size (rnbitsize, 1);
+      if (!rntype)
+	{
+	  gcc_checking_assert (r_split_load);
+	  rntype = build_nonstandard_integer_type (rnbitsize, 1);
+	}
       xlr_bitpos = lr_bitpos - rnbitpos, xrr_bitpos = rr_bitpos - rnbitpos;
 
       if (lr_reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
@@ -6442,133 +6909,190 @@  fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type,
 							    rntype, rr_mask),
 			     size_int (xrr_bitpos));
 
-      /* Make a mask that corresponds to both fields being compared.
-	 Do this for both items being compared.  If the operands are the
-	 same size and the bits being compared are in the same position
-	 then we can do this by masking both and comparing the masked
-	 results.  */
-      ll_mask = const_binop (BIT_IOR_EXPR, ll_mask, rl_mask);
       lr_mask = const_binop (BIT_IOR_EXPR, lr_mask, rr_mask);
-      if (lnbitsize == rnbitsize
-	  && xll_bitpos == xlr_bitpos
-	  && lnbitpos >= 0
-	  && rnbitpos >= 0)
-	{
-	  lhs = make_bit_field_ref (loc, ll_inner, ll_arg,
-				    lntype, lnbitsize, lnbitpos,
-				    ll_unsignedp || rl_unsignedp, ll_reversep);
-	  if (! all_ones_mask_p (ll_mask, lnbitsize))
-	    lhs = build2 (BIT_AND_EXPR, lntype, lhs, ll_mask);
-
-	  rhs = make_bit_field_ref (loc, lr_inner, lr_arg,
-				    rntype, rnbitsize, rnbitpos,
-				    lr_unsignedp || rr_unsignedp, lr_reversep);
-	  if (! all_ones_mask_p (lr_mask, rnbitsize))
-	    rhs = build2 (BIT_AND_EXPR, rntype, rhs, lr_mask);
-
-	  return build2_loc (loc, wanted_code, truth_type, lhs, rhs);
-	}
-
-      /* There is still another way we can do something:  If both pairs of
-	 fields being compared are adjacent, we may be able to make a wider
-	 field containing them both.
-
-	 Note that we still must mask the lhs/rhs expressions.  Furthermore,
-	 the mask must be shifted to account for the shift done by
-	 make_bit_field_ref.  */
-      if (((ll_bitsize + ll_bitpos == rl_bitpos
-	    && lr_bitsize + lr_bitpos == rr_bitpos)
-	   || (ll_bitpos == rl_bitpos + rl_bitsize
-	       && lr_bitpos == rr_bitpos + rr_bitsize))
-	  && ll_bitpos >= 0
-	  && rl_bitpos >= 0
-	  && lr_bitpos >= 0
-	  && rr_bitpos >= 0)
-	{
-	  tree type;
-
-	  lhs = make_bit_field_ref (loc, ll_inner, ll_arg, lntype,
-				    ll_bitsize + rl_bitsize,
-				    MIN (ll_bitpos, rl_bitpos),
-				    ll_unsignedp, ll_reversep);
-	  rhs = make_bit_field_ref (loc, lr_inner, lr_arg, rntype,
-				    lr_bitsize + rr_bitsize,
-				    MIN (lr_bitpos, rr_bitpos),
-				    lr_unsignedp, lr_reversep);
-
-	  ll_mask = const_binop (RSHIFT_EXPR, ll_mask,
-				 size_int (MIN (xll_bitpos, xrl_bitpos)));
-	  lr_mask = const_binop (RSHIFT_EXPR, lr_mask,
-				 size_int (MIN (xlr_bitpos, xrr_bitpos)));
-
-	  /* Convert to the smaller type before masking out unwanted bits.  */
-	  type = lntype;
-	  if (lntype != rntype)
+
+      toshift[1][0] = MIN (xlr_bitpos, xrr_bitpos);
+      shifted[1][0] = 0;
+
+      if (!r_split_load)
+	{
+	  bitpos[1][0] = rnbitpos;
+	  bitsiz[1][0] = rnbitsize;
+	  ld_arg[1][0] = make_bit_field_ref (loc, lr_inner, lr_arg,
+					     rntype, rnbitsize, rnbitpos,
+					     lr_unsignedp || rr_unsignedp,
+					     lr_reversep);
+	}
+
+      if (parts > 1)
+	{
+	  if (r_split_load)
+	    build_split_load (ld_arg[1], bitpos[1], bitsiz[1], toshift[1],
+			      shifted[1], loc, lr_inner, lr_arg,
+			      rnmode, rnmode2, rnbitpos, lr_reversep);
+	  else
+	    reuse_split_load (ld_arg[1], bitpos[1], bitsiz[1], toshift[1],
+			      shifted[1], xmask[1],
+			      lnbitpos + GET_MODE_BITSIZE (lnmode)
+			      - ll_bitpos + lr_bitpos, lr_reversep);
+	}
+    }
+  else
+    {
+      /* Handle the case of comparisons with constants.  If there is
+	 something in common between the masks, those bits of the
+	 constants must be the same.  If not, the condition is always
+	 false.  Test for this to avoid generating incorrect code
+	 below.  */
+      result = const_binop (BIT_AND_EXPR, ll_mask, rl_mask);
+      if (! integer_zerop (result)
+	  && simple_cst_equal (const_binop (BIT_AND_EXPR,
+					    result, l_const),
+			       const_binop (BIT_AND_EXPR,
+					    result, r_const)) != 1)
+	{
+	  if (wanted_code == NE_EXPR)
 	    {
-	      if (lnbitsize > rnbitsize)
-		{
-		  lhs = fold_convert_loc (loc, rntype, lhs);
-		  ll_mask = fold_convert_loc (loc, rntype, ll_mask);
-		  type = rntype;
-		}
-	      else if (lnbitsize < rnbitsize)
-		{
-		  rhs = fold_convert_loc (loc, lntype, rhs);
-		  lr_mask = fold_convert_loc (loc, lntype, lr_mask);
-		  type = lntype;
-		}
+	      warning (0,
+		       "%<or%> of unmatched not-equal tests"
+		       " is always 1");
+	      return constant_boolean_node (true, truth_type);
 	    }
+	  else
+	    {
+	      warning (0,
+		       "%<and%> of mutually exclusive equal-tests"
+		       " is always 0");
+	      return constant_boolean_node (false, truth_type);
+	    }
+	}
 
-	  if (! all_ones_mask_p (ll_mask, ll_bitsize + rl_bitsize))
-	    lhs = build2 (BIT_AND_EXPR, type, lhs, ll_mask);
+      if (lnbitpos < 0)
+	return 0;
 
-	  if (! all_ones_mask_p (lr_mask, lr_bitsize + rr_bitsize))
-	    rhs = build2 (BIT_AND_EXPR, type, rhs, lr_mask);
+      /* The constants are combined so as to line up with the loaded
+	 field, so use the same parameters.  */
+      ld_arg[1][0] = const_binop (BIT_IOR_EXPR, l_const, r_const);
+      toshift[1][0] = MIN (xll_bitpos, xrl_bitpos);
+      shifted[1][0] = 0;
+      bitpos[1][0] = lnbitpos;
+      bitsiz[1][0] = lnbitsize;
 
-	  return build2_loc (loc, wanted_code, truth_type, lhs, rhs);
-	}
+      if (parts > 1)
+	reuse_split_load (ld_arg[1], bitpos[1], bitsiz[1], toshift[1],
+			  shifted[1], xmask[1],
+			  lnbitpos + GET_MODE_BITSIZE (lnmode),
+			  lr_reversep);
 
-      return 0;
+      lr_mask = build_int_cst_type (TREE_TYPE (ld_arg[1][0]), -1);
+
+      /* If the compiler thinks this is used uninitialized below, it's
+	 because it can't realize that parts can only be 2 when
+	 comparing with constants if l_split_load is also true.  This
+	 just silences the warning.  */
+      rnbitpos = 0;
     }
 
-  /* Handle the case of comparisons with constants.  If there is something in
-     common between the masks, those bits of the constants must be the same.
-     If not, the condition is always false.  Test for this to avoid generating
-     incorrect code below.  */
-  result = const_binop (BIT_AND_EXPR, ll_mask, rl_mask);
-  if (! integer_zerop (result)
-      && simple_cst_equal (const_binop (BIT_AND_EXPR, result, l_const),
-			   const_binop (BIT_AND_EXPR, result, r_const)) != 1)
+  ll_mask = const_binop (BIT_IOR_EXPR, ll_mask, rl_mask);
+  toshift[0][0] = MIN (xll_bitpos, xrl_bitpos);
+  shifted[0][0] = 0;
+
+  if (!l_split_load)
     {
-      if (wanted_code == NE_EXPR)
+      bitpos[0][0] = lnbitpos;
+      bitsiz[0][0] = lnbitsize;
+      ld_arg[0][0] = make_bit_field_ref (loc, ll_inner, ll_arg,
+					 lntype, lnbitsize, lnbitpos,
+					 ll_unsignedp || rl_unsignedp,
+					 ll_reversep);
+    }
+
+  if (parts > 1)
+    {
+      if (l_split_load)
+	build_split_load (ld_arg[0], bitpos[0], bitsiz[0], toshift[0],
+			  shifted[0], loc, ll_inner, ll_arg,
+			  lnmode, lnmode2, lnbitpos, ll_reversep);
+      else
+	reuse_split_load (ld_arg[0], bitpos[0], bitsiz[0], toshift[0],
+			  shifted[0], xmask[0],
+			  rnbitpos + GET_MODE_BITSIZE (rnmode)
+			  - lr_bitpos + ll_bitpos, ll_reversep);
+    }
+
+  for (int i = 0; i < parts; i++)
+    {
+      tree op[2] = { ld_arg[0][i], ld_arg[1][i] };
+      tree mask[2] = { ll_mask, lr_mask };
+
+      for (int j = 0; j < 2; j++)
 	{
-	  warning (0, "%<or%> of unmatched not-equal tests is always 1");
-	  return constant_boolean_node (true, truth_type);
+	  /* Mask out the bits belonging to the other part.  */
+	  if (xmask[j][i])
+	    mask[j] = const_binop (BIT_AND_EXPR, mask[j], xmask[j][i]);
+
+	  if (shifted[j][i])
+	    {
+	      tree shiftsz = bitsize_int (shifted[j][i]);
+	      mask[j] = const_binop (RSHIFT_EXPR, mask[j], shiftsz);
+	    }
+	  mask[j] = fold_convert_loc (loc, TREE_TYPE (op[j]), mask[j]);
 	}
-      else
+
+      HOST_WIDE_INT shift = (toshift[0][i] - toshift[1][i]);
+
+      if (shift)
 	{
-	  warning (0, "%<and%> of mutually exclusive equal-tests is always 0");
-	  return constant_boolean_node (false, truth_type);
+	  int j;
+	  if (shift > 0)
+	    j = 0;
+	  else
+	    {
+	      j = 1;
+	      shift = -shift;
+	    }
+
+	  tree shiftsz = bitsize_int (shift);
+	  op[j] = fold_build2_loc (loc, RSHIFT_EXPR, TREE_TYPE (op[j]),
+				   op[j], shiftsz);
+	  mask[j] = const_binop (RSHIFT_EXPR, mask[j], shiftsz);
 	}
-    }
 
-  if (lnbitpos < 0)
-    return 0;
+      /* Convert to the smaller type before masking out unwanted
+	 bits.  */
+      tree type = TREE_TYPE (op[0]);
+      if (type != TREE_TYPE (op[1]))
+	{
+	  int j = (TYPE_PRECISION (type)
+		   < TYPE_PRECISION (TREE_TYPE (op[1])));
+	  if (!j)
+	    type = TREE_TYPE (op[1]);
+	  op[j] = fold_convert_loc (loc, type, op[j]);
+	  mask[j] = fold_convert_loc (loc, type, mask[j]);
+	}
 
-  /* Construct the expression we will return.  First get the component
-     reference we will make.  Unless the mask is all ones the width of
-     that field, perform the mask operation.  Then compare with the
-     merged constant.  */
-  result = make_bit_field_ref (loc, ll_inner, ll_arg,
-			       lntype, lnbitsize, lnbitpos,
-			       ll_unsignedp || rl_unsignedp, ll_reversep);
+      for (int j = 0; j < 2; j++)
+	if (! integer_all_onesp (mask[j]))
+	  op[j] = build2_loc (loc, BIT_AND_EXPR, type,
+			      op[j], mask[j]);
 
-  ll_mask = const_binop (BIT_IOR_EXPR, ll_mask, rl_mask);
-  if (! all_ones_mask_p (ll_mask, lnbitsize))
-    result = build2_loc (loc, BIT_AND_EXPR, lntype, result, ll_mask);
+      cmp[i] = build2_loc (loc, wanted_code, truth_type, op[0], op[1]);
+    }
+
+  if (first1)
+    std::swap (cmp[0], cmp[1]);
+
+  if (parts == 1)
+    result = cmp[0];
+  else if (!separatep || !maybe_separate)
+    result = build2_loc (loc, orig_code, truth_type, cmp[0], cmp[1]);
+  else
+    {
+      result = cmp[0];
+      *separatep = cmp[1];
+    }
 
-  return build2_loc (loc, wanted_code, truth_type, result,
-		     const_binop (BIT_IOR_EXPR, l_const, r_const));
+  return result;
 }
 
 /* T is an integer expression that is being multiplied, divided, or taken a
@@ -9166,15 +9690,7 @@  fold_truth_andor (location_t loc, enum tree_code code, tree type,
 	return fold_build2_loc (loc, code, type, arg0, tem);
     }
 
-  /* Check for the possibility of merging component references.  If our
-     lhs is another similar operation, try to merge its rhs with our
-     rhs.  Then try to merge our lhs and rhs.  */
-  if (TREE_CODE (arg0) == code
-      && (tem = fold_truth_andor_1 (loc, code, type,
-				    TREE_OPERAND (arg0, 1), arg1)) != 0)
-    return fold_build2_loc (loc, code, type, TREE_OPERAND (arg0, 0), tem);
-
-  if ((tem = fold_truth_andor_1 (loc, code, type, arg0, arg1)) != 0)
+  if ((tem = fold_truth_andor_1 (loc, code, type, arg0, arg1, NULL)) != 0)
     return tem;
 
   bool logical_op_non_short_circuit = LOGICAL_OP_NON_SHORT_CIRCUIT;
diff --git a/gcc/testsuite/gcc.dg/field-merge-1.c b/gcc/testsuite/gcc.dg/field-merge-1.c
new file mode 100644
index 00000000..1818e104
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/field-merge-1.c
@@ -0,0 +1,64 @@ 
+/* { dg-do run } */
+/* { dg-options "-O -save-temps -fdump-tree-optimized" } */
+
+/* Check that field loads compared with constants are merged, even if
+   tested out of order, and when fields straddle across alignment
+   boundaries.  */
+
+struct TL {
+  unsigned char p;
+  unsigned int a;
+  unsigned char q;
+  unsigned int b;
+  unsigned char r;
+  unsigned int c;
+  unsigned char s;
+} __attribute__ ((packed, aligned (4), scalar_storage_order ("little-endian")));
+
+struct TB {
+  unsigned char p;
+  unsigned int a;
+  unsigned char q;
+  unsigned int b;
+  unsigned char r;
+  unsigned int c;
+  unsigned char s;
+} __attribute__ ((packed, aligned (4), scalar_storage_order ("big-endian")));
+
+#define vc 0xaa
+#define vi 0x12345678
+
+struct TL vL = { vc, vi, vc, vi, vc, vi, vc };
+struct TB vB = { vc, vi, vc, vi, vc, vi, vc };
+
+void f (void) {
+  /* Which words of    | vL | vB | */
+  /* are accessed by   |0123|0123| */
+  if (0 /* the tests?  |    |    | */
+      || vL.p != vc /* |*   |    | */
+      || vB.p != vc /* |    |*   | */
+      || vL.s != vc /* |   *|    | */
+      || vB.q != vc /* |    | *  | */
+      || vL.a != vi /* |^*  |    | */
+      || vB.b != vi /* |    | ^* | */
+      || vL.c != vi /* |  *^|    | */
+      || vB.c != vi /* |    |  ^*| */
+      || vL.b != vi /* | ^^ |    | */
+      || vL.q != vc /* | ^  |    | */
+      || vB.a != vi /* |    |^^  | */
+      || vB.r != vc /* |    |  ^ | */
+      || vB.s != vc /* |    |   ^| */
+      || vL.r != vc /* |  ^ |    | */
+      )
+    __builtin_abort ();
+}
+
+int main () {
+  f ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 8 "optimized" } } */
+/* { dg-final { scan-assembler-not "cmpb" { target { i*86-*-* || x86_64-*-* } } } } */
+/* { dg-final { scan-assembler-times "cmpl" 8 { target { i*86-*-* || x86_64-*-* } } } } */
+/* { dg-final { scan-assembler-times "cmpw" 8 { target { powerpc*-*-* || rs6000-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/field-merge-2.c b/gcc/testsuite/gcc.dg/field-merge-2.c
new file mode 100644
index 00000000..80c573b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/field-merge-2.c
@@ -0,0 +1,31 @@ 
+/* { dg-do run } */
+/* { dg-options "-O" } */
+
+struct TL {
+  unsigned short a;
+  unsigned short b;
+} __attribute__ ((packed, aligned (8)));
+
+struct TB {
+  unsigned char p;
+  unsigned short a;
+  unsigned short b;
+} __attribute__ ((packed, aligned (8)));
+
+#define vc 0xaa
+
+struct TL vL = { vc, vc };
+struct TB vB = { vc, vc, vc };
+
+void f (void) {
+  if (0
+      || vL.b != vB.b
+      || vL.a != vB.a
+      )
+    __builtin_abort ();
+}
+
+int main () {
+  f ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/field-merge-3.c b/gcc/testsuite/gcc.dg/field-merge-3.c
new file mode 100644
index 00000000..8fdbb9a3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/field-merge-3.c
@@ -0,0 +1,36 @@ 
+/* { dg-do run } */
+/* { dg-options "-O" } */
+
+const int BIG_ENDIAN_P = (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__);
+
+struct T1 {
+  unsigned char p[2];
+  unsigned short a;
+  unsigned int z;
+} __attribute__((__aligned__(8)));
+
+struct T2 {
+  unsigned short p;
+  unsigned short a;
+  unsigned int z;
+} __attribute__((__aligned__(8)));
+
+#define vc 0xaa
+#define vi 0x12345678
+
+struct T1 v1 = { { vc + !BIG_ENDIAN_P, vc + BIG_ENDIAN_P }, vc, vi };
+struct T2 v2 = { (vc << 8) | (vc - 1), vc, vi };
+
+void f (void) {
+  if (0
+      || v1.p[!BIG_ENDIAN_P] != v2.p >> 8
+      || v1.a != v2.a
+      || ((v1.z ^ v2.z) & 0xff00ff00) != 0
+      || ((v1.z ^ v2.z) & 0x00ff00ff) != 0)
+    __builtin_abort ();
+}
+
+int main () {
+  f ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/field-merge-4.c b/gcc/testsuite/gcc.dg/field-merge-4.c
new file mode 100644
index 00000000..c629069
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/field-merge-4.c
@@ -0,0 +1,40 @@ 
+/* { dg-do run } */
+/* { dg-options "-O" } */
+
+struct T1 {
+  unsigned int zn;
+  unsigned char p;
+  unsigned char qn;
+  unsigned short a;
+  unsigned int z;
+} __attribute__((__packed__, __aligned__(4)));
+
+struct T2 {
+  unsigned int zn;
+  unsigned char rn;
+  unsigned char p;
+  unsigned char qn;
+  unsigned short a;
+  unsigned int z;
+} __attribute__((__packed__, __aligned__(4)));
+
+#define vc 0xaa
+#define vs 0xccdd
+#define vi 0x12345678
+
+struct T1 v1 = { -1, vc, 1, vs, vi };
+struct T2 v2 = { -1, 0, vc, 1, vs, vi };
+
+void f (void) {
+  if (0
+      || v1.p != v2.p
+      || v1.a != v2.a
+      || v1.z != v2.z
+      )
+    __builtin_abort ();
+}
+
+int main () {
+  f ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/field-merge-5.c b/gcc/testsuite/gcc.dg/field-merge-5.c
new file mode 100644
index 00000000..1580b14bc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/field-merge-5.c
@@ -0,0 +1,40 @@ 
+/* { dg-do run } */
+/* { dg-options "-O" } */
+
+struct T1 {
+  unsigned int zn;
+  unsigned char p;
+  unsigned char qn;
+  unsigned short a;
+  unsigned int z;
+} __attribute__((__packed__, __aligned__(8)));
+
+struct T2 {
+  unsigned int zn;
+  unsigned char rn;
+  unsigned char p;
+  unsigned char qn;
+  unsigned short a;
+  unsigned int z;
+} __attribute__((__packed__, __aligned__(8)));
+
+#define vc 0xaa
+#define vs 0xccdd
+#define vi 0x12345678
+
+struct T1 v1 = { -1, vc, 1, vs, vi };
+struct T2 v2 = { -1, 0, vc, 1, vs, vi };
+
+void f (void) {
+  if (0
+      || v1.p != v2.p
+      || v1.a != v2.a
+      || v1.z != v2.z
+      )
+    __builtin_abort ();
+}
+
+int main () {
+  f ();
+  return 0;
+}