[4/6] scalar-storage-order merge: bulk

Message ID: 10321904.dUOa0O0I86@polaris
State: New

Commit Message

Eric Botcazou June 16, 2015, 9:22 a.m. UTC
This is the bulk of the implementation.

	* calls.c (store_unaligned_arguments_into_pseudos): Adjust calls to
	extract_bit_field and store_bit_field.
	(initialize_argument_information): Adjust call to store_expr.
	(load_register_parameters): Adjust call to extract_bit_field.
	* expmed.c (check_reverse_storage_order_support): New function.
	(check_reverse_float_storage_order_support): Likewise.
	(flip_storage_order): Likewise.
	(store_bit_field_1): Add REVERSE parameter.  Flip the storage order
	of the value if it is true.  Pass REVERSE to recursive call after
	adjusting the target offset.
	Do not use extraction or movstrict instruction if REVERSE is true.
	Pass REVERSE to store_fixed_bit_field.
	(store_bit_field): Add REVERSE parameter and pass it to the above.
	(store_fixed_bit_field): Add REVERSE parameter and pass it to
	store_split_bit_field and store_fixed_bit_field_1.
	(store_fixed_bit_field_1): Add REVERSE parameter.  Flip the storage
	order of the value if it is true and adjust the target offset.
	(store_split_bit_field): Add REVERSE parameter and pass it to
	store_fixed_bit_field.  Adjust the target offset if it is true.
	(extract_bit_field_1): Add REVERSE parameter.  Flip the storage order
	of the value if it is true.  Pass REVERSE to recursive call after
	adjusting the target offset.
	Do not use extraction or subreg instruction if REVERSE is true.
	Pass REVERSE to extract_fixed_bit_field.
	(extract_bit_field): Add REVERSE parameter and pass it to the above.
	(extract_fixed_bit_field): Add REVERSE parameter and pass it to
	extract_split_bit_field and extract_fixed_bit_field_1.
	(extract_fixed_bit_field_1): Add REVERSE parameter.  Flip the storage
	order of the value if it is true and adjust the target offset.
	(extract_split_bit_field): Add REVERSE parameter and pass it to
	extract_fixed_bit_field.  Adjust the target offset if it is true.
	* expmed.h (flip_storage_order): Declare.
	(store_bit_field): Adjust prototype.
	(extract_bit_field): Likewise.
	* expr.c (emit_group_load_1): Adjust calls to extract_bit_field.
	(emit_group_store): Adjust call to store_bit_field.
	(copy_blkmode_from_reg): Likewise.
	(copy_blkmode_to_reg): Likewise.
	(write_complex_part): Likewise.
	(read_complex_part): Likewise.
	(optimize_bitfield_assignment_op): Add REVERSE parameter.  Assert
	that it isn't true if the target is a register.
	<PLUS_EXPR>: If it is, do not optimize unless bitsize is equal to 1,
	and flip the storage order of the mask.
	<BIT_IOR_EXPR>: Use str_mode/str_bitsize instead of more convoluted
	expressions and flip the storage order of the mask.
	(get_bit_range): Adjust call to get_inner_reference.
	(expand_assignment): Adjust calls to get_inner_reference, store_expr,
	optimize_bitfield_assignment_op and store_field.  Handle MEM_EXPRs
	with reverse storage order.
	(store_expr_with_bounds): Add REVERSE parameter and pass it to recursive
	calls and call to store_bit_field.  Force the value into a register if
	it is true and then flip the storage order of the value.
	(store_expr): Add REVERSE parameter and pass it to above.
	(store_constructor_field): Add REVERSE parameter and pass it to
	recursive calls and call to store_field.
	(store_constructor): Add REVERSE parameter and pass it to calls to
	store_constructor_field and store_expr.  Set it to true for an
	aggregate type with TYPE_REVERSE_STORAGE_ORDER.
	(store_field): Add REVERSE parameter and pass it to recursive calls
	and calls to store_expr and store_bit_field.  Temporarily flip the
	storage order of the value with record type and integral mode and
	adjust the shift if it is true.
	(get_inner_reference): Add PREVERSEP parameter and set it to true
	upon encountering a reference with reverse storage order.
	(expand_expr_addr_expr_1): Adjust call to get_inner_reference.
	(expand_constructor): Adjust call to store_constructor.
	(expand_expr_real_2) <CASE_CONVERT>: Pass TYPE_REVERSE_STORAGE_ORDER
	of the union type to store_expr in the MEM case and assert that it
	isn't set in the REG case.  Adjust call to store_field.
	(expand_expr_real_1) <MEM_REF>: Handle reverse storage order.
	<normal_inner_ref>: Add REVERSEP variable and adjust calls to
	get_inner_reference and extract_bit_field. Temporarily flip the
	storage order of the value with record type and integral mode and
	adjust the shift if it is true.  Flip the storage order of the value
	at the end if it is true.
	<VIEW_CONVERT_EXPR>: Add REVERSEP variable and adjust call to
	get_inner_reference.  Do not fetch an inner reference if it is true.
	* expr.h (store_expr_with_bounds): Adjust prototype.
	(store_expr): Likewise.
	* fold-const.c (make_bit_field_ref): Add REVERSEP parameter and set
	REF_REVERSE_STORAGE_ORDER on the reference according to it.
	(optimize_bit_field_compare): Deal with reverse storage order.  Adjust
	calls to get_inner_reference and make_bit_field_ref.
	(decode_field_reference): Add PREVERSEP parameter and adjust call to
	get_inner_reference.
	(fold_truth_andor_1): Deal with reverse storage order.  Adjust calls to
	decode_field_reference and make_bit_field_ref.
	(fold_unary_loc) <CASE_CONVERT>: Adjust call to get_inner_reference.
	<VIEW_CONVERT_EXPR>: Propagate the REF_REVERSE_STORAGE_ORDER flag.
	(fold_comparison): Adjust call to get_inner_reference.
	(split_address_to_core_and_offset): Adjust call to get_inner_reference.
	* gimple-expr.c (useless_type_conversion_p): Return false for array
	types with different TYPE_REVERSE_STORAGE_ORDER flag.
	* gimplify.c (gimplify_expr) <MEM_REF>: Propagate the
	REF_REVERSE_STORAGE_ORDER flag.
	* output.h (assemble_real): Adjust prototype.
	* print-tree.c (print_node): Convey TYPE_REVERSE_STORAGE_ORDER.
	* tree-core.h (TYPE_REVERSE_STORAGE_ORDER): Document.
	(TYPE_SATURATING): Adjust.
	(REF_REVERSE_STORAGE_ORDER): Document.
	* tree-dfa.c (get_ref_base_and_extent): Add PREVERSE parameter and set
	it to true upon encountering a reference with reverse storage order.
	* tree-dfa.h (get_ref_base_and_extent): Adjust prototype.
	* tree-inline.c (remap_gimple_op_r): Propagate the
	REF_REVERSE_STORAGE_ORDER flag.
	(copy_tree_body_r): Likewise.
	* tree-outof-ssa.c (insert_value_copy_on_edge): Adjust call to
	store_expr.
	* tree.c (stabilize_reference) <BIT_FIELD_REF>: Propagate the
	REF_REVERSE_STORAGE_ORDER flag.
	(verify_type_variant): Deal with TYPE_REVERSE_STORAGE_ORDER.
	(gimple_canonical_types_compatible_p): Likewise.
	* tree.h (TYPE_REVERSE_STORAGE_ORDER): New flag.
	(TYPE_SATURATING): Adjust.
	(REF_REVERSE_STORAGE_ORDER): New flag.
	(storage_order_barrier_p): New inline predicate.
	(get_inner_reference): Adjust prototype.
	* varasm.c (assemble_real): Add REVERSE parameter.  Flip the storage
	order of the value if REVERSE is true.
	(compare_constant) <CONSTRUCTOR>: Compare TYPE_REVERSE_STORAGE_ORDER.
	(assemble_constant_contents): Adjust call to output_constant.
	(output_constant_pool_2): Adjust call to assemble_real.
	(initializer_constant_valid_p_1) <CONSTRUCTOR>: Deal with
	TYPE_REVERSE_STORAGE_ORDER.
	(output_constant): Add REVERSE parameter.
	<INTEGER_TYPE>: Flip the storage order of the value if REVERSE is true.
	<REAL_TYPE>: Adjust call to assemble_real.
	<COMPLEX_TYPE>: Pass it to recursive calls.
	<ARRAY_TYPE>: Likewise.  Adjust call to output_constructor.
	<RECORD_TYPE>: Likewise.  Adjust call to output_constructor.
	(struct oc_local_state): Add REVERSE field.
	(output_constructor_array_range): Adjust calls to output_constant.
	(output_constructor_regular_field): Likewise.
	(output_constructor_bitfield): Adjust call to output_constructor.  Flip
	the storage order of the value if REVERSE is true.
	(output_constructor): Add REVERSE parameter.  Set it to true for an
	aggregate type with TYPE_REVERSE_STORAGE_ORDER.  Adjust call to
	output_constructor_bitfield.
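
For context, here is a minimal user-level sketch of the feature this
machinery implements.  The scalar_storage_order type attribute itself is
added by the front-end patches elsewhere in this series, so the spelling
below is illustrative rather than part of this patch:

  /* All scalar fields are stored big-endian, whatever the target.  */
  struct __attribute__((scalar_storage_order ("big-endian"))) be_header
  {
    unsigned int magic;
    unsigned short version;
  };

  unsigned int
  read_magic (const struct be_header *h)
  {
    /* On a little-endian target, this load goes through the new REVERSE
       paths: the bytes are fetched and then put back into target order.  */
    return h->magic;
  }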

 calls.c          |   10 -
 expmed.c         |  231 +++++++++++++++++++++++++++++++---------
 expmed.h         |    8 +
 expr.c           |  314 +++++++++++++++++++++++++++++++++--------------------
 expr.h           |    4 
 fold-const.c     |  132 +++++++++++++----------
 gimple-expr.c    |    8 -
 gimplify.c       |    2 
 output.h         |    2 
 print-tree.c     |    7 +
 tree-core.h      |    8 +
 tree-dfa.c       |   30 ++++-
 tree-dfa.h       |    2 
 tree-inline.c    |    2 
 tree-outof-ssa.c |    2 
 tree.c           |   10 +
 tree.h           |   52 ++++++++-
 varasm.c         |  102 +++++++++++------
 18 files changed, 650 insertions(+), 276 deletions(-)
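
The central new primitive above is flip_storage_order (expmed.c), which
reverses the storage order of a value so that everything else can keep
operating on values in target order.  As a conceptual model only (the real
implementation works on rtx values and defers to
check_reverse_storage_order_support and
check_reverse_float_storage_order_support for multiword and floating-point
modes), flipping the storage order of a 32-bit scalar is an involutive
byte swap:

  #include <stdint.h>

  /* Conceptual model only: toggling the storage order of a 32-bit scalar
     amounts to reversing its bytes.  Flipping twice restores the original
     value, which is what lets the REVERSE paths flip on load and store
     while register values stay in target order.  */
  static inline uint32_t
  flip_storage_order_32 (uint32_t x)
  {
    return ((x & 0x000000ffU) << 24)
	   | ((x & 0x0000ff00U) << 8)
	   | ((x & 0x00ff0000U) >> 8)
	   | ((x & 0xff000000U) >> 24);
  }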

Patch

Index: gcc/tree.c
===================================================================
--- gcc/tree.c	(.../trunk)	(revision 224461)
+++ gcc/tree.c	(.../branches/scalar-storage-order)	(revision 224467)
@@ -4140,6 +4140,7 @@  stabilize_reference (tree ref)
       result = build_nt (BIT_FIELD_REF,
 			 stabilize_reference (TREE_OPERAND (ref, 0)),
 			 TREE_OPERAND (ref, 1), TREE_OPERAND (ref, 2));
+      REF_REVERSE_STORAGE_ORDER (result) = REF_REVERSE_STORAGE_ORDER (ref);
       break;
 
     case ARRAY_REF:
@@ -12681,7 +12682,10 @@  verify_type_variant (const_tree t, tree
   verify_variant_match (TYPE_PACKED);
   if (TREE_CODE (t) == REFERENCE_TYPE)
     verify_variant_match (TYPE_REF_IS_RVALUE);
-  verify_variant_match (TYPE_SATURATING);
+  if (AGGREGATE_TYPE_P (t))
+    verify_variant_match (TYPE_REVERSE_STORAGE_ORDER);
+  else
+    verify_variant_match (TYPE_SATURATING);
   /* FIXME: This check trigger during libstdc++ build.  */
   if (RECORD_OR_UNION_TYPE_P (t) && COMPLETE_TYPE_P (t) && 0)
     verify_variant_match (TYPE_FINAL_P);
@@ -12981,6 +12985,7 @@  gimple_canonical_types_compatible_p (con
       if (!gimple_canonical_types_compatible_p (TREE_TYPE (t1), TREE_TYPE (t2),
 						trust_type_canonical)
 	  || TYPE_STRING_FLAG (t1) != TYPE_STRING_FLAG (t2)
+	  || TYPE_REVERSE_STORAGE_ORDER (t1) != TYPE_REVERSE_STORAGE_ORDER (t2)
 	  || TYPE_NONALIASED_COMPONENT (t1) != TYPE_NONALIASED_COMPONENT (t2))
 	return false;
       else
@@ -13054,6 +13059,9 @@  gimple_canonical_types_compatible_p (con
       {
 	tree f1, f2;
 
+	if (TYPE_REVERSE_STORAGE_ORDER (t1) != TYPE_REVERSE_STORAGE_ORDER (t2))
+	  return false;
+
 	/* For aggregate types, all the fields must be the same.  */
 	for (f1 = TYPE_FIELDS (t1), f2 = TYPE_FIELDS (t2);
 	     f1 || f2;
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(.../trunk)	(revision 224461)
+++ gcc/tree.h	(.../branches/scalar-storage-order)	(revision 224467)
@@ -896,8 +896,29 @@  extern void omp_clause_range_check_faile
 #define IDENTIFIER_TRANSPARENT_ALIAS(NODE) \
   (IDENTIFIER_NODE_CHECK (NODE)->base.deprecated_flag)
 
-/* In fixed-point types, means a saturating type.  */
-#define TYPE_SATURATING(NODE) (TYPE_CHECK (NODE)->base.u.bits.saturating_flag)
+/* In an aggregate type, indicates that the scalar fields of the type are
+   stored in reverse order from the target order.  This effectively
+   toggles BYTES_BIG_ENDIAN and WORDS_BIG_ENDIAN within the type.  */
+#define TYPE_REVERSE_STORAGE_ORDER(NODE) \
+  (TREE_CHECK4 (NODE, RECORD_TYPE, UNION_TYPE, QUAL_UNION_TYPE, ARRAY_TYPE)->base.u.bits.saturating_flag)
+
+/* In a non-aggregate type, indicates a saturating type.  */
+#define TYPE_SATURATING(NODE) \
+  (TREE_NOT_CHECK4 (NODE, RECORD_TYPE, UNION_TYPE, QUAL_UNION_TYPE, ARRAY_TYPE)->base.u.bits.saturating_flag)
+
+/* In a BIT_FIELD_REF and MEM_REF, indicates that the reference is to a group
+   of bits stored in reverse order from the target order.  This effectively
+   toggles both BYTES_BIG_ENDIAN and WORDS_BIG_ENDIAN for the reference.
+
+   The overall strategy is to preserve the invariant that every scalar in
+   memory is associated with a single storage order, i.e. all accesses to
+   this scalar are done with the same storage order.  This invariant makes
+   it possible to factor out the storage order in most transformations, as
+   only the address and/or the value (in target order) matter for them.
+   But, of course, the storage order must be preserved when the accesses
+   themselves are rewritten or transformed.  */
+#define REF_REVERSE_STORAGE_ORDER(NODE) \
+  (TREE_CHECK2 (NODE, BIT_FIELD_REF, MEM_REF)->base.u.bits.saturating_flag)
 
 /* These flags are available for each language front end to use internally.  */
 #define TREE_LANG_FLAG_0(NODE) \
@@ -4285,6 +4306,31 @@  handled_component_p (const_tree t)
     }
 }
 
+/* Return true if REF is a storage order barrier, i.e. a VIEW_CONVERT_EXPR
+   that can modify the storage order of objects.  Note that, even if the
+   TYPE_REVERSE_STORAGE_ORDER flag is set on both the inner type and the
+   outer type, a VIEW_CONVERT_EXPR can modify the storage order because
+   it can change the partition of the aggregate object into scalars.  */
+
+static inline bool
+storage_order_barrier_p (const_tree ref)
+{
+  if (TREE_CODE (ref) != VIEW_CONVERT_EXPR)
+    return false;
+
+  if (AGGREGATE_TYPE_P (TREE_TYPE (ref))
+      && TYPE_REVERSE_STORAGE_ORDER (TREE_TYPE (ref)))
+    return true;
+
+  tree op = TREE_OPERAND (ref, 0);
+
+  if (AGGREGATE_TYPE_P (TREE_TYPE (op))
+      && TYPE_REVERSE_STORAGE_ORDER (TREE_TYPE (op)))
+    return true;
+
+  return false;
+}
+
 /* Given a DECL or TYPE, return the scope in which it was declared, or
    NUL_TREE if there is no containing scope.  */
 
@@ -5104,7 +5150,7 @@  extern bool complete_ctor_at_level_p (co
    the access position and size.  */
 extern tree get_inner_reference (tree, HOST_WIDE_INT *, HOST_WIDE_INT *,
 				 tree *, machine_mode *, int *, int *,
-				 bool);
+				 int *, bool);
 
 extern tree build_personality_function (const char *);
 
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	(.../trunk)	(revision 224461)
+++ gcc/fold-const.c	(.../branches/scalar-storage-order)	(revision 224467)
@@ -126,12 +126,12 @@  static int twoval_comparison_p (tree, tr
 static tree eval_subst (location_t, tree, tree, tree, tree, tree);
 static tree distribute_bit_expr (location_t, enum tree_code, tree, tree, tree);
 static tree make_bit_field_ref (location_t, tree, tree,
-				HOST_WIDE_INT, HOST_WIDE_INT, int);
+				HOST_WIDE_INT, HOST_WIDE_INT, int, int);
 static tree optimize_bit_field_compare (location_t, enum tree_code,
 					tree, tree, tree);
 static tree decode_field_reference (location_t, tree, HOST_WIDE_INT *,
 				    HOST_WIDE_INT *,
-				    machine_mode *, int *, int *,
+				    machine_mode *, int *, int *, int *,
 				    tree *, tree *);
 static int simple_operand_p (const_tree);
 static bool simple_operand_p_2 (tree);
@@ -3658,15 +3658,17 @@  distribute_real_division (location_t loc
 }
 
 /* Return a BIT_FIELD_REF of type TYPE to refer to BITSIZE bits of INNER
-   starting at BITPOS.  The field is unsigned if UNSIGNEDP is nonzero.  */
+   starting at BITPOS.  The field is unsigned if UNSIGNEDP is nonzero
+   and uses reverse storage order if REVERSEP is nonzero.  */
 
 static tree
 make_bit_field_ref (location_t loc, tree inner, tree type,
-		    HOST_WIDE_INT bitsize, HOST_WIDE_INT bitpos, int unsignedp)
+		    HOST_WIDE_INT bitsize, HOST_WIDE_INT bitpos,
+		    int unsignedp, int reversep)
 {
   tree result, bftype;
 
-  if (bitpos == 0)
+  if (bitpos == 0 && !reversep)
     {
       tree size = TYPE_SIZE (TREE_TYPE (inner));
       if ((INTEGRAL_TYPE_P (TREE_TYPE (inner))
@@ -3683,6 +3685,7 @@  make_bit_field_ref (location_t loc, tree
 
   result = build3_loc (loc, BIT_FIELD_REF, bftype, inner,
 		       size_int (bitsize), bitsize_int (bitpos));
+  REF_REVERSE_STORAGE_ORDER (result) = reversep;
 
   if (bftype != type)
     result = fold_convert_loc (loc, type, result);
@@ -3720,6 +3723,7 @@  optimize_bit_field_compare (location_t l
   int const_p = TREE_CODE (rhs) == INTEGER_CST;
   machine_mode lmode, rmode, nmode;
   int lunsignedp, runsignedp;
+  int lreversep, rreversep;
   int lvolatilep = 0, rvolatilep = 0;
   tree linner, rinner = NULL_TREE;
   tree mask;
@@ -3731,20 +3735,23 @@  optimize_bit_field_compare (location_t l
      do anything if the inner expression is a PLACEHOLDER_EXPR since we
      then will no longer be able to replace it.  */
   linner = get_inner_reference (lhs, &lbitsize, &lbitpos, &offset, &lmode,
-				&lunsignedp, &lvolatilep, false);
+				&lunsignedp, &lreversep, &lvolatilep, false);
   if (linner == lhs || lbitsize == GET_MODE_BITSIZE (lmode) || lbitsize < 0
       || offset != 0 || TREE_CODE (linner) == PLACEHOLDER_EXPR || lvolatilep)
     return 0;
 
- if (!const_p)
+  if (const_p)
+    rreversep = lreversep;
+  else
    {
      /* If this is not a constant, we can only do something if bit positions,
-	sizes, and signedness are the same.  */
-     rinner = get_inner_reference (rhs, &rbitsize, &rbitpos, &offset, &rmode,
-				   &runsignedp, &rvolatilep, false);
+	sizes, signedness and storage order are the same.  */
+     rinner
+       = get_inner_reference (rhs, &rbitsize, &rbitpos, &offset, &rmode,
+			      &runsignedp, &rreversep, &rvolatilep, false);
 
      if (rinner == rhs || lbitpos != rbitpos || lbitsize != rbitsize
-	 || lunsignedp != runsignedp || offset != 0
+	 || lunsignedp != runsignedp || lreversep != rreversep || offset != 0
 	 || TREE_CODE (rinner) == PLACEHOLDER_EXPR || rvolatilep)
        return 0;
    }
@@ -3772,7 +3779,7 @@  optimize_bit_field_compare (location_t l
   if (nbitsize == lbitsize)
     return 0;
 
-  if (BYTES_BIG_ENDIAN)
+  if (lreversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
     lbitpos = nbitsize - lbitsize - lbitpos;
 
   /* Make the mask to be used against the extracted field.  */
@@ -3789,17 +3796,17 @@  optimize_bit_field_compare (location_t l
 				     make_bit_field_ref (loc, linner,
 							 unsigned_type,
 							 nbitsize, nbitpos,
-							 1),
+							 1, lreversep),
 				     mask),
 			fold_build2_loc (loc, BIT_AND_EXPR, unsigned_type,
 				     make_bit_field_ref (loc, rinner,
 							 unsigned_type,
 							 nbitsize, nbitpos,
-							 1),
+							 1, rreversep),
 				     mask));
 
-  /* Otherwise, we are handling the constant case. See if the constant is too
-     big for the field.  Warn and return a tree of for 0 (false) if so.  We do
+  /* Otherwise, we are handling the constant case.  See if the constant is too
+     big for the field.  Warn and return a tree for 0 (false) if so.  We do
      this not only for its own sake, but to avoid having to test for this
      error case below.  If we didn't, we might generate wrong code.
 
@@ -3837,7 +3844,8 @@  optimize_bit_field_compare (location_t l
   /* Make a new bitfield reference, shift the constant over the
      appropriate number of bits and mask it with the computed mask
      (in case this was a signed field).  If we changed it, make a new one.  */
-  lhs = make_bit_field_ref (loc, linner, unsigned_type, nbitsize, nbitpos, 1);
+  lhs = make_bit_field_ref (loc, linner, unsigned_type, nbitsize, nbitpos, 1,
+			    lreversep);
 
   rhs = const_binop (BIT_AND_EXPR,
 		     const_binop (LSHIFT_EXPR,
@@ -3865,6 +3873,8 @@  optimize_bit_field_compare (location_t l
 
    *PUNSIGNEDP is set to the signedness of the field.
 
+   *PREVERSEP is set to the storage order of the field.
+
    *PMASK is set to the mask used.  This is either contained in a
    BIT_AND_EXPR or derived from the width of the field.
 
@@ -3876,7 +3886,7 @@  optimize_bit_field_compare (location_t l
 static tree
 decode_field_reference (location_t loc, tree exp, HOST_WIDE_INT *pbitsize,
 			HOST_WIDE_INT *pbitpos, machine_mode *pmode,
-			int *punsignedp, int *pvolatilep,
+			int *punsignedp, int *preversep, int *pvolatilep,
 			tree *pmask, tree *pand_mask)
 {
   tree outer_type = 0;
@@ -3909,7 +3919,7 @@  decode_field_reference (location_t loc,
     }
 
   inner = get_inner_reference (exp, pbitsize, pbitpos, &offset, pmode,
-			       punsignedp, pvolatilep, false);
+			       punsignedp, preversep, pvolatilep, false);
   if ((inner == exp && and_mask == 0)
       || *pbitsize < 0 || offset != 0
       || TREE_CODE (inner) == PLACEHOLDER_EXPR)
@@ -5415,6 +5425,7 @@  fold_truth_andor_1 (location_t loc, enum
   HOST_WIDE_INT xll_bitpos, xlr_bitpos, xrl_bitpos, xrr_bitpos;
   HOST_WIDE_INT lnbitsize, lnbitpos, rnbitsize, rnbitpos;
   int ll_unsignedp, lr_unsignedp, rl_unsignedp, rr_unsignedp;
+  int ll_reversep, lr_reversep, rl_reversep, rr_reversep;
   machine_mode ll_mode, lr_mode, rl_mode, rr_mode;
   machine_mode lnmode, rnmode;
   tree ll_mask, lr_mask, rl_mask, rr_mask;
@@ -5527,33 +5538,39 @@  fold_truth_andor_1 (location_t loc, enum
   volatilep = 0;
   ll_inner = decode_field_reference (loc, ll_arg,
 				     &ll_bitsize, &ll_bitpos, &ll_mode,
-				     &ll_unsignedp, &volatilep, &ll_mask,
-				     &ll_and_mask);
+				     &ll_unsignedp, &ll_reversep, &volatilep,
+				     &ll_mask, &ll_and_mask);
   lr_inner = decode_field_reference (loc, lr_arg,
 				     &lr_bitsize, &lr_bitpos, &lr_mode,
-				     &lr_unsignedp, &volatilep, &lr_mask,
-				     &lr_and_mask);
+				     &lr_unsignedp, &lr_reversep, &volatilep,
+				     &lr_mask, &lr_and_mask);
   rl_inner = decode_field_reference (loc, rl_arg,
 				     &rl_bitsize, &rl_bitpos, &rl_mode,
-				     &rl_unsignedp, &volatilep, &rl_mask,
-				     &rl_and_mask);
+				     &rl_unsignedp, &rl_reversep, &volatilep,
+				     &rl_mask, &rl_and_mask);
   rr_inner = decode_field_reference (loc, rr_arg,
 				     &rr_bitsize, &rr_bitpos, &rr_mode,
-				     &rr_unsignedp, &volatilep, &rr_mask,
-				     &rr_and_mask);
+				     &rr_unsignedp, &rr_reversep, &volatilep,
+				     &rr_mask, &rr_and_mask);
 
   /* It must be true that the inner operation on the lhs of each
      comparison must be the same if we are to be able to do anything.
      Then see if we have constants.  If not, the same must be true for
      the rhs's.  */
-  if (volatilep || ll_inner == 0 || rl_inner == 0
+  if (volatilep
+      || ll_reversep != rl_reversep
+      || ll_inner == 0 || rl_inner == 0
       || ! operand_equal_p (ll_inner, rl_inner, 0))
     return 0;
 
   if (TREE_CODE (lr_arg) == INTEGER_CST
       && TREE_CODE (rr_arg) == INTEGER_CST)
-    l_const = lr_arg, r_const = rr_arg;
-  else if (lr_inner == 0 || rr_inner == 0
+    {
+      l_const = lr_arg, r_const = rr_arg;
+      lr_reversep = ll_reversep;
+    }
+  else if (lr_reversep != rr_reversep
+	   || lr_inner == 0 || rr_inner == 0
 	   || ! operand_equal_p (lr_inner, rr_inner, 0))
     return 0;
   else
@@ -5606,7 +5623,7 @@  fold_truth_andor_1 (location_t loc, enum
   lntype = lang_hooks.types.type_for_size (lnbitsize, 1);
   xll_bitpos = ll_bitpos - lnbitpos, xrl_bitpos = rl_bitpos - lnbitpos;
 
-  if (BYTES_BIG_ENDIAN)
+  if (ll_reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
     {
       xll_bitpos = lnbitsize - xll_bitpos - ll_bitsize;
       xrl_bitpos = lnbitsize - xrl_bitpos - rl_bitsize;
@@ -5671,7 +5688,7 @@  fold_truth_andor_1 (location_t loc, enum
       rntype = lang_hooks.types.type_for_size (rnbitsize, 1);
       xlr_bitpos = lr_bitpos - rnbitpos, xrr_bitpos = rr_bitpos - rnbitpos;
 
-      if (BYTES_BIG_ENDIAN)
+      if (lr_reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
 	{
 	  xlr_bitpos = rnbitsize - xlr_bitpos - lr_bitsize;
 	  xrr_bitpos = rnbitsize - xrr_bitpos - rr_bitsize;
@@ -5694,12 +5711,12 @@  fold_truth_andor_1 (location_t loc, enum
       if (lnbitsize == rnbitsize && xll_bitpos == xlr_bitpos)
 	{
 	  lhs = make_bit_field_ref (loc, ll_inner, lntype, lnbitsize, lnbitpos,
-				    ll_unsignedp || rl_unsignedp);
+				    ll_unsignedp || rl_unsignedp, ll_reversep);
 	  if (! all_ones_mask_p (ll_mask, lnbitsize))
 	    lhs = build2 (BIT_AND_EXPR, lntype, lhs, ll_mask);
 
 	  rhs = make_bit_field_ref (loc, lr_inner, rntype, rnbitsize, rnbitpos,
-				    lr_unsignedp || rr_unsignedp);
+				    lr_unsignedp || rr_unsignedp, lr_reversep);
 	  if (! all_ones_mask_p (lr_mask, rnbitsize))
 	    rhs = build2 (BIT_AND_EXPR, rntype, rhs, lr_mask);
 
@@ -5722,10 +5739,12 @@  fold_truth_andor_1 (location_t loc, enum
 
 	  lhs = make_bit_field_ref (loc, ll_inner, lntype,
 				    ll_bitsize + rl_bitsize,
-				    MIN (ll_bitpos, rl_bitpos), ll_unsignedp);
+				    MIN (ll_bitpos, rl_bitpos),
+				    ll_unsignedp, ll_reversep);
 	  rhs = make_bit_field_ref (loc, lr_inner, rntype,
 				    lr_bitsize + rr_bitsize,
-				    MIN (lr_bitpos, rr_bitpos), lr_unsignedp);
+				    MIN (lr_bitpos, rr_bitpos),
+				    lr_unsignedp, lr_reversep);
 
 	  ll_mask = const_binop (RSHIFT_EXPR, ll_mask,
 				 size_int (MIN (xll_bitpos, xrl_bitpos)));
@@ -5788,7 +5807,7 @@  fold_truth_andor_1 (location_t loc, enum
      that field, perform the mask operation.  Then compare with the
      merged constant.  */
   result = make_bit_field_ref (loc, ll_inner, lntype, lnbitsize, lnbitpos,
-			       ll_unsignedp || rl_unsignedp);
+			       ll_unsignedp || rl_unsignedp, ll_reversep);
 
   ll_mask = const_binop (BIT_IOR_EXPR, ll_mask, rl_mask);
   if (! all_ones_mask_p (ll_mask, lnbitsize))
@@ -7968,10 +7987,11 @@  fold_unary_loc (location_t loc, enum tre
 	  HOST_WIDE_INT bitsize, bitpos;
 	  tree offset;
 	  machine_mode mode;
-	  int unsignedp, volatilep;
-          tree base = TREE_OPERAND (op0, 0);
-	  base = get_inner_reference (base, &bitsize, &bitpos, &offset,
-				      &mode, &unsignedp, &volatilep, false);
+	  int unsignedp, reversep, volatilep;
+	  tree base
+	    = get_inner_reference (TREE_OPERAND (op0, 0), &bitsize, &bitpos,
+				   &offset, &mode, &unsignedp, &reversep,
+				   &volatilep, false);
 	  /* If the reference was to a (constant) zero offset, we can use
 	     the address of the base if it has the same base type
 	     as the result type and the pointer type is unqualified.  */
@@ -8112,8 +8132,12 @@  fold_unary_loc (location_t loc, enum tre
 
     case VIEW_CONVERT_EXPR:
       if (TREE_CODE (op0) == MEM_REF)
-	return fold_build2_loc (loc, MEM_REF, type,
-				TREE_OPERAND (op0, 0), TREE_OPERAND (op0, 1));
+        {
+	  tem = fold_build2_loc (loc, MEM_REF, type,
+				 TREE_OPERAND (op0, 0), TREE_OPERAND (op0, 1));
+	  REF_REVERSE_STORAGE_ORDER (tem) = REF_REVERSE_STORAGE_ORDER (op0);
+	  return tem;
+	}
 
       return NULL_TREE;
 
@@ -8845,7 +8869,7 @@  fold_comparison (location_t loc, enum tr
       tree base0, base1, offset0 = NULL_TREE, offset1 = NULL_TREE;
       HOST_WIDE_INT bitsize, bitpos0 = 0, bitpos1 = 0;
       machine_mode mode;
-      int volatilep, unsignedp;
+      int volatilep, reversep, unsignedp;
       bool indirect_base0 = false, indirect_base1 = false;
 
       /* Get base and offset for the access.  Strip ADDR_EXPR for
@@ -8855,9 +8879,10 @@  fold_comparison (location_t loc, enum tr
       base0 = arg0;
       if (TREE_CODE (arg0) == ADDR_EXPR)
 	{
-	  base0 = get_inner_reference (TREE_OPERAND (arg0, 0),
-				       &bitsize, &bitpos0, &offset0, &mode,
-				       &unsignedp, &volatilep, false);
+	  base0
+	    = get_inner_reference (TREE_OPERAND (arg0, 0),
+				   &bitsize, &bitpos0, &offset0, &mode,
+				   &unsignedp, &reversep, &volatilep, false);
 	  if (TREE_CODE (base0) == INDIRECT_REF)
 	    base0 = TREE_OPERAND (base0, 0);
 	  else
@@ -8889,9 +8914,10 @@  fold_comparison (location_t loc, enum tr
       base1 = arg1;
       if (TREE_CODE (arg1) == ADDR_EXPR)
 	{
-	  base1 = get_inner_reference (TREE_OPERAND (arg1, 0),
-				       &bitsize, &bitpos1, &offset1, &mode,
-				       &unsignedp, &volatilep, false);
+	  base1
+	    = get_inner_reference (TREE_OPERAND (arg1, 0),
+				   &bitsize, &bitpos1, &offset1, &mode,
+				   &unsignedp, &reversep, &volatilep, false);
 	  if (TREE_CODE (base1) == INDIRECT_REF)
 	    base1 = TREE_OPERAND (base1, 0);
 	  else
@@ -16061,15 +16087,15 @@  split_address_to_core_and_offset (tree e
 {
   tree core;
   machine_mode mode;
-  int unsignedp, volatilep;
+  int unsignedp, reversep, volatilep;
   HOST_WIDE_INT bitsize;
   location_t loc = EXPR_LOCATION (exp);
 
   if (TREE_CODE (exp) == ADDR_EXPR)
     {
       core = get_inner_reference (TREE_OPERAND (exp, 0), &bitsize, pbitpos,
-				  poffset, &mode, &unsignedp, &volatilep,
-				  false);
+				  poffset, &mode, &unsignedp, &reversep,
+				  &volatilep, false);
       core = build_fold_addr_expr_loc (loc, core);
     }
   else
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(.../trunk)	(revision 224461)
+++ gcc/expr.c	(.../branches/scalar-storage-order)	(revision 224467)
@@ -147,11 +147,11 @@  static rtx_insn *compress_float_constant
 static rtx get_subtarget (rtx);
 static void store_constructor_field (rtx, unsigned HOST_WIDE_INT,
 				     HOST_WIDE_INT, machine_mode,
-				     tree, int, alias_set_type);
-static void store_constructor (tree, rtx, int, HOST_WIDE_INT);
+				     tree, int, alias_set_type, bool);
+static void store_constructor (tree, rtx, int, HOST_WIDE_INT, bool);
 static rtx store_field (rtx, HOST_WIDE_INT, HOST_WIDE_INT,
 			unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
-			machine_mode, tree, alias_set_type, bool);
+			machine_mode, tree, alias_set_type, bool, bool);
 
 static unsigned HOST_WIDE_INT highest_pow2_factor_for_target (const_tree, const_tree);
 
@@ -1710,7 +1710,7 @@  emit_group_load_1 (rtx *tmps, rtx dst, r
 		  && (!REG_P (tmps[i]) || GET_MODE (tmps[i]) != mode))
 		tmps[i] = extract_bit_field (tmps[i], bytelen * BITS_PER_UNIT,
 					     (bytepos % slen0) * BITS_PER_UNIT,
-					     1, NULL_RTX, mode, mode);
+					     1, NULL_RTX, mode, mode, false);
 	    }
 	  else
 	    {
@@ -1720,7 +1720,7 @@  emit_group_load_1 (rtx *tmps, rtx dst, r
 	      mem = assign_stack_temp (GET_MODE (src), slen);
 	      emit_move_insn (mem, src);
 	      tmps[i] = extract_bit_field (mem, bytelen * BITS_PER_UNIT,
-					   0, 1, NULL_RTX, mode, mode);
+					   0, 1, NULL_RTX, mode, mode, false);
 	    }
 	}
       /* FIXME: A SIMD parallel will eventually lead to a subreg of a
@@ -1763,7 +1763,7 @@  emit_group_load_1 (rtx *tmps, rtx dst, r
       else
 	tmps[i] = extract_bit_field (src, bytelen * BITS_PER_UNIT,
 				     bytepos * BITS_PER_UNIT, 1, NULL_RTX,
-				     mode, mode);
+				     mode, mode, false);
 
       if (shift)
 	tmps[i] = expand_shift (LSHIFT_EXPR, mode, tmps[i],
@@ -2071,7 +2071,7 @@  emit_group_store (rtx orig_dst, rtx src,
 	  store_bit_field (dest,
 			   adj_bytelen * BITS_PER_UNIT, bytepos * BITS_PER_UNIT,
 			   bytepos * BITS_PER_UNIT, ssize * BITS_PER_UNIT - 1,
-			   VOIDmode, tmps[i]);
+			   VOIDmode, tmps[i], false);
 	}
 
       /* Optimize the access just a bit.  */
@@ -2084,7 +2084,7 @@  emit_group_store (rtx orig_dst, rtx src,
 
       else
 	store_bit_field (dest, bytelen * BITS_PER_UNIT, bytepos * BITS_PER_UNIT,
-			 0, 0, mode, tmps[i]);
+			 0, 0, mode, tmps[i], false);
     }
 
   /* Copy from the pseudo into the (probable) hard reg.  */
@@ -2214,7 +2214,9 @@  copy_blkmode_from_reg (rtx target, rtx s
       store_bit_field (dst, bitsize, bitpos % BITS_PER_WORD, 0, 0, copy_mode,
 		       extract_bit_field (src, bitsize,
 					  xbitpos % BITS_PER_WORD, 1,
-					  NULL_RTX, copy_mode, copy_mode));
+					  NULL_RTX, copy_mode, copy_mode,
+					  false),
+		       false);
     }
 }
 
@@ -2291,7 +2293,9 @@  copy_blkmode_to_reg (machine_mode mode,
 		       0, 0, word_mode,
 		       extract_bit_field (src_word, bitsize,
 					  bitpos % BITS_PER_WORD, 1,
-					  NULL_RTX, word_mode, word_mode));
+					  NULL_RTX, word_mode, word_mode,
+					  false),
+		       false);
     }
 
   if (mode == BLKmode)
@@ -3036,7 +3040,8 @@  write_complex_part (rtx cplx, rtx val, b
 	gcc_assert (MEM_P (cplx) && ibitsize < BITS_PER_WORD);
     }
 
-  store_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0, 0, 0, imode, val);
+  store_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0, 0, 0, imode, val,
+		   false);
 }
 
 /* Extract one of the components of the complex value CPLX.  Extract the
@@ -3099,7 +3104,7 @@  read_complex_part (rtx cplx, bool imag_p
     }
 
   return extract_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0,
-			    true, NULL_RTX, imode, imode);
+			    true, NULL_RTX, imode, imode, false);
 }
 
 /* A subroutine of emit_move_insn_1.  Yet another lowpart generator.
@@ -4498,7 +4503,7 @@  optimize_bitfield_assignment_op (unsigne
 				 unsigned HOST_WIDE_INT bitregion_start,
 				 unsigned HOST_WIDE_INT bitregion_end,
 				 machine_mode mode1, rtx str_rtx,
-				 tree to, tree src)
+				 tree to, tree src, bool reverse)
 {
   machine_mode str_mode = GET_MODE (str_rtx);
   unsigned int str_bitsize = GET_MODE_BITSIZE (str_mode);
@@ -4571,6 +4576,8 @@  optimize_bitfield_assignment_op (unsigne
     }
   else if (!REG_P (str_rtx) && GET_CODE (str_rtx) != SUBREG)
     return false;
+  else
+    gcc_assert (!reverse);
 
   /* If the bit field covers the whole REG/MEM, store_field
      will likely generate better code.  */
@@ -4594,7 +4601,7 @@  optimize_bitfield_assignment_op (unsigne
 	 We might win by one instruction for the other bitfields
 	 too if insv/extv instructions aren't used, so that
 	 can be added later.  */
-      if (bitpos + bitsize != str_bitsize
+      if ((reverse || bitpos + bitsize != str_bitsize)
 	  && (bitsize != 1 || TREE_CODE (op1) != INTEGER_CST))
 	break;
 
@@ -4612,13 +4619,17 @@  optimize_bitfield_assignment_op (unsigne
 	  set_mem_expr (str_rtx, 0);
 	}
 
-      binop = code == PLUS_EXPR ? add_optab : sub_optab;
-      if (bitsize == 1 && bitpos + bitsize != str_bitsize)
+      if (bitsize == 1 && (reverse || bitpos + bitsize != str_bitsize))
 	{
 	  value = expand_and (str_mode, value, const1_rtx, NULL);
 	  binop = xor_optab;
 	}
+      else
+	binop = code == PLUS_EXPR ? add_optab : sub_optab;
+
       value = expand_shift (LSHIFT_EXPR, str_mode, value, bitpos, NULL_RTX, 1);
+      if (reverse)
+	value = flip_storage_order (str_mode, value);
       result = expand_binop (str_mode, binop, str_rtx,
 			     value, str_rtx, 1, OPTAB_WIDEN);
       if (result != str_rtx)
@@ -4651,6 +4662,8 @@  optimize_bitfield_assignment_op (unsigne
 	  value = expand_and (str_mode, value, mask, NULL_RTX);
 	}
       value = expand_shift (LSHIFT_EXPR, str_mode, value, bitpos, NULL_RTX, 1);
+      if (reverse)
+	value = flip_storage_order (str_mode, value);
       result = expand_binop (str_mode, binop, str_rtx,
 			     value, str_rtx, 1, OPTAB_WIDEN);
       if (result != str_rtx)
@@ -4705,10 +4718,10 @@  get_bit_range (unsigned HOST_WIDE_INT *b
       machine_mode rmode;
       HOST_WIDE_INT rbitsize, rbitpos;
       tree roffset;
-      int unsignedp;
-      int volatilep = 0;
+      int unsignedp, reversep, volatilep = 0;
       get_inner_reference (TREE_OPERAND (exp, 0), &rbitsize, &rbitpos,
-			   &roffset, &rmode, &unsignedp, &volatilep, false);
+			   &roffset, &rmode, &unsignedp, &reversep,
+			   &volatilep, false);
       if ((rbitpos % BITS_PER_UNIT) != 0)
 	{
 	  *bitstart = *bitend = 0;
@@ -4824,6 +4837,8 @@  expand_assignment (tree to, tree from, b
       reg = expand_expr (from, NULL_RTX, VOIDmode, EXPAND_NORMAL);
       reg = force_not_mem (reg);
       mem = expand_expr (to, NULL_RTX, VOIDmode, EXPAND_WRITE);
+      if (TREE_CODE (to) == MEM_REF && REF_REVERSE_STORAGE_ORDER (to))
+	reg = flip_storage_order (mode, reg);
 
       if (icode != CODE_FOR_nothing)
 	{
@@ -4836,7 +4851,8 @@  expand_assignment (tree to, tree from, b
 	  expand_insn (icode, 2, ops);
 	}
       else
-	store_bit_field (mem, GET_MODE_BITSIZE (mode), 0, 0, 0, mode, reg);
+	store_bit_field (mem, GET_MODE_BITSIZE (mode), 0, 0, 0, mode, reg,
+			 false);
       return;
     }
 
@@ -4847,7 +4863,8 @@  expand_assignment (tree to, tree from, b
      problem.  Same for (partially) storing into a non-memory object.  */
   if (handled_component_p (to)
       || (TREE_CODE (to) == MEM_REF
-	  && mem_ref_refers_to_non_mem_p (to))
+	  && (REF_REVERSE_STORAGE_ORDER (to)
+	      || mem_ref_refers_to_non_mem_p (to)))
       || TREE_CODE (TREE_TYPE (to)) == ARRAY_TYPE)
     {
       machine_mode mode1;
@@ -4855,13 +4872,12 @@  expand_assignment (tree to, tree from, b
       unsigned HOST_WIDE_INT bitregion_start = 0;
       unsigned HOST_WIDE_INT bitregion_end = 0;
       tree offset;
-      int unsignedp;
-      int volatilep = 0;
+      int unsignedp, reversep, volatilep = 0;
       tree tem;
 
       push_temp_slots ();
       tem = get_inner_reference (to, &bitsize, &bitpos, &offset, &mode1,
-				 &unsignedp, &volatilep, true);
+				 &unsignedp, &reversep, &volatilep, true);
 
       /* Make sure bitpos is not negative, it can wreak havoc later.  */
       if (bitpos < 0)
@@ -4980,22 +4996,22 @@  expand_assignment (tree to, tree from, b
 	  if (COMPLEX_MODE_P (TYPE_MODE (TREE_TYPE (from)))
 	      && bitpos == 0
 	      && bitsize == mode_bitsize)
-	    result = store_expr (from, to_rtx, false, nontemporal);
+	    result = store_expr (from, to_rtx, false, nontemporal, reversep);
 	  else if (bitsize == mode_bitsize / 2
 		   && (bitpos == 0 || bitpos == mode_bitsize / 2))
 	    result = store_expr (from, XEXP (to_rtx, bitpos != 0), false,
-				 nontemporal);
+				 nontemporal, reversep);
 	  else if (bitpos + bitsize <= mode_bitsize / 2)
 	    result = store_field (XEXP (to_rtx, 0), bitsize, bitpos,
 				  bitregion_start, bitregion_end,
-				  mode1, from,
-				  get_alias_set (to), nontemporal);
+				  mode1, from, get_alias_set (to),
+				  nontemporal, reversep);
 	  else if (bitpos >= mode_bitsize / 2)
 	    result = store_field (XEXP (to_rtx, 1), bitsize,
 				  bitpos - mode_bitsize / 2,
 				  bitregion_start, bitregion_end,
-				  mode1, from,
-				  get_alias_set (to), nontemporal);
+				  mode1, from, get_alias_set (to),
+				  nontemporal, reversep);
 	  else if (bitpos == 0 && bitsize == mode_bitsize)
 	    {
 	      rtx from_rtx;
@@ -5015,8 +5031,8 @@  expand_assignment (tree to, tree from, b
 	      write_complex_part (temp, XEXP (to_rtx, 1), true);
 	      result = store_field (temp, bitsize, bitpos,
 				    bitregion_start, bitregion_end,
-				    mode1, from,
-				    get_alias_set (to), nontemporal);
+				    mode1, from, get_alias_set (to),
+				    nontemporal, reversep);
 	      emit_move_insn (XEXP (to_rtx, 0), read_complex_part (temp, false));
 	      emit_move_insn (XEXP (to_rtx, 1), read_complex_part (temp, true));
 	    }
@@ -5035,14 +5051,14 @@  expand_assignment (tree to, tree from, b
 
 	  if (optimize_bitfield_assignment_op (bitsize, bitpos,
 					       bitregion_start, bitregion_end,
-					       mode1,
-					       to_rtx, to, from))
+					       mode1, to_rtx, to, from,
+					       reversep))
 	    result = NULL;
 	  else
 	    result = store_field (to_rtx, bitsize, bitpos,
 				  bitregion_start, bitregion_end,
-				  mode1, from,
-				  get_alias_set (to), nontemporal);
+				  mode1, from, get_alias_set (to),
+				  nontemporal, reversep);
 	}
 
       if (result)
@@ -5196,7 +5212,7 @@  expand_assignment (tree to, tree from, b
   /* Compute FROM and store the value in the rtx we got.  */
 
   push_temp_slots ();
-  result = store_expr_with_bounds (from, to_rtx, 0, nontemporal, to);
+  result = store_expr_with_bounds (from, to_rtx, 0, nontemporal, false, to);
   preserve_temp_slots (result);
   pop_temp_slots ();
   return;
@@ -5235,12 +5251,14 @@  emit_storent_insn (rtx to, rtx from)
 
    If NONTEMPORAL is true, try using a nontemporal store instruction.
 
+   If REVERSE is true, the store is to be done in reverse order.
+
    If BTARGET is not NULL then computed bounds of EXP are
    associated with BTARGET.  */
 
 rtx
 store_expr_with_bounds (tree exp, rtx target, int call_param_p,
-			bool nontemporal, tree btarget)
+			bool nontemporal, bool reverse, tree btarget)
 {
   rtx temp;
   rtx alt_rtl = NULL_RTX;
@@ -5262,7 +5280,8 @@  store_expr_with_bounds (tree exp, rtx ta
       expand_expr (TREE_OPERAND (exp, 0), const0_rtx, VOIDmode,
 		   call_param_p ? EXPAND_STACK_PARM : EXPAND_NORMAL);
       return store_expr_with_bounds (TREE_OPERAND (exp, 1), target,
-				     call_param_p, nontemporal, btarget);
+				     call_param_p, nontemporal, reverse,
+				     btarget);
     }
   else if (TREE_CODE (exp) == COND_EXPR && GET_MODE (target) == BLKmode)
     {
@@ -5277,12 +5296,12 @@  store_expr_with_bounds (tree exp, rtx ta
       NO_DEFER_POP;
       jumpifnot (TREE_OPERAND (exp, 0), lab1, -1);
       store_expr_with_bounds (TREE_OPERAND (exp, 1), target, call_param_p,
-			      nontemporal, btarget);
+			      nontemporal, reverse, btarget);
       emit_jump_insn (gen_jump (lab2));
       emit_barrier ();
       emit_label (lab1);
       store_expr_with_bounds (TREE_OPERAND (exp, 2), target, call_param_p,
-			      nontemporal, btarget);
+			      nontemporal, reverse, btarget);
       emit_label (lab2);
       OK_DEFER_POP;
 
@@ -5421,9 +5440,9 @@  store_expr_with_bounds (tree exp, rtx ta
       rtx tmp_target;
 
   normal_expr:
-      /* If we want to use a nontemporal store, force the value to
-	 register first.  */
-      tmp_target = nontemporal ? NULL_RTX : target;
+      /* If we want to use a nontemporal or a reverse order store, force the
+	 value into a register first.  */
+      tmp_target = nontemporal || reverse ? NULL_RTX : target;
       temp = expand_expr_real (exp, tmp_target, GET_MODE (target),
 			       (call_param_p
 				? EXPAND_STACK_PARM : EXPAND_NORMAL),
@@ -5498,7 +5517,7 @@  store_expr_with_bounds (tree exp, rtx ta
 	      else
 		store_bit_field (target,
 				 INTVAL (expr_size (exp)) * BITS_PER_UNIT,
-				 0, 0, 0, GET_MODE (temp), temp);
+				 0, 0, 0, GET_MODE (temp), temp, reverse);
 	    }
 	  else
 	    convert_move (target, temp, TYPE_UNSIGNED (TREE_TYPE (exp)));
@@ -5597,6 +5616,8 @@  store_expr_with_bounds (tree exp, rtx ta
 	;
       else
 	{
+	  if (reverse)
+	    temp = flip_storage_order (GET_MODE (target), temp);
 	  temp = force_operand (temp, target);
 	  if (temp != target)
 	    emit_move_insn (target, temp);
@@ -5608,9 +5629,11 @@  store_expr_with_bounds (tree exp, rtx ta
 
 /* Same as store_expr_with_bounds but ignoring bounds of EXP.  */
 rtx
-store_expr (tree exp, rtx target, int call_param_p, bool nontemporal)
+store_expr (tree exp, rtx target, int call_param_p, bool nontemporal,
+	    bool reverse)
 {
-  return store_expr_with_bounds (exp, target, call_param_p, nontemporal, NULL);
+  return store_expr_with_bounds (exp, target, call_param_p, nontemporal,
+				 reverse, NULL);
 }
 
 /* Return true if field F of structure TYPE is a flexible array.  */
@@ -5936,6 +5959,7 @@  all_zeros_p (const_tree exp)
    TARGET, BITSIZE, BITPOS, MODE, EXP are as for store_field.
    CLEARED is as for store_constructor.
    ALIAS_SET is the alias set to use for any stores.
+   If REVERSE is true, the store is to be done in reverse order.
 
    This provides a recursive shortcut back to store_constructor when it isn't
    necessary to go through store_field.  This is so that we can pass through
@@ -5945,7 +5969,8 @@  all_zeros_p (const_tree exp)
 static void
 store_constructor_field (rtx target, unsigned HOST_WIDE_INT bitsize,
 			 HOST_WIDE_INT bitpos, machine_mode mode,
-			 tree exp, int cleared, alias_set_type alias_set)
+			 tree exp, int cleared,
+			 alias_set_type alias_set, bool reverse)
 {
   if (TREE_CODE (exp) == CONSTRUCTOR
       /* We can only call store_constructor recursively if the size and
@@ -5974,10 +5999,12 @@  store_constructor_field (rtx target, uns
 	  set_mem_alias_set (target, alias_set);
 	}
 
-      store_constructor (exp, target, cleared, bitsize / BITS_PER_UNIT);
+      store_constructor (exp, target, cleared, bitsize / BITS_PER_UNIT,
+			 reverse);
     }
   else
-    store_field (target, bitsize, bitpos, 0, 0, mode, exp, alias_set, false);
+    store_field (target, bitsize, bitpos, 0, 0, mode, exp, alias_set, false,
+		 reverse);
 }
 
 
@@ -6003,10 +6030,12 @@  fields_length (const_tree type)
    CLEARED is true if TARGET is known to have been zero'd.
    SIZE is the number of bytes of TARGET we are allowed to modify: this
    may not be the same as the size of EXP if we are assigning to a field
-   which has been packed to exclude padding bits.  */
+   which has been packed to exclude padding bits.
+   If REVERSE is true, the store is to be done in reverse order.  */
 
 static void
-store_constructor (tree exp, rtx target, int cleared, HOST_WIDE_INT size)
+store_constructor (tree exp, rtx target, int cleared, HOST_WIDE_INT size,
+		   bool reverse)
 {
   tree type = TREE_TYPE (exp);
 #ifdef WORD_REGISTER_OPERATIONS
@@ -6022,6 +6051,9 @@  store_constructor (tree exp, rtx target,
 	unsigned HOST_WIDE_INT idx;
 	tree field, value;
 
+	/* The storage order is specified for every aggregate type.  */
+	reverse = TYPE_REVERSE_STORAGE_ORDER (type);
+
 	/* If size is zero or the target is already cleared, do nothing.  */
 	if (size == 0 || cleared)
 	  cleared = 1;
@@ -6166,7 +6198,8 @@  store_constructor (tree exp, rtx target,
 
 	    store_constructor_field (to_rtx, bitsize, bitpos, mode,
 				     value, cleared,
-				     get_alias_set (TREE_TYPE (field)));
+				     get_alias_set (TREE_TYPE (field)),
+				     reverse);
 	  }
 	break;
       }
@@ -6181,6 +6214,9 @@  store_constructor (tree exp, rtx target,
 	HOST_WIDE_INT minelt = 0;
 	HOST_WIDE_INT maxelt = 0;
 
+	/* The storage order is specified for every aggregate type.  */
+	reverse = TYPE_REVERSE_STORAGE_ORDER (type);
+
 	domain = TYPE_DOMAIN (type);
 	const_bounds_p = (TYPE_MIN_VALUE (domain)
 			  && TYPE_MAX_VALUE (domain)
@@ -6321,7 +6357,7 @@  store_constructor (tree exp, rtx target,
 
 			store_constructor_field
 			  (target, bitsize, bitpos, mode, value, cleared,
-			   get_alias_set (elttype));
+			   get_alias_set (elttype), reverse);
 		      }
 		  }
 		else
@@ -6336,7 +6372,7 @@  store_constructor (tree exp, rtx target,
 					VAR_DECL, NULL_TREE, domain);
 		    index_r = gen_reg_rtx (promote_decl_mode (index, NULL));
 		    SET_DECL_RTL (index, index_r);
-		    store_expr (lo_index, index_r, 0, false);
+		    store_expr (lo_index, index_r, 0, false, reverse);
 
 		    /* Build the head of the loop.  */
 		    do_pending_stack_adjust ();
@@ -6361,9 +6397,9 @@  store_constructor (tree exp, rtx target,
 		    xtarget = adjust_address (xtarget, mode, 0);
 		    if (TREE_CODE (value) == CONSTRUCTOR)
 		      store_constructor (value, xtarget, cleared,
-					 bitsize / BITS_PER_UNIT);
+					 bitsize / BITS_PER_UNIT, reverse);
 		    else
-		      store_expr (value, xtarget, 0, false);
+		      store_expr (value, xtarget, 0, false, reverse);
 
 		    /* Generate a conditional jump to exit the loop.  */
 		    exit_cond = build2 (LT_EXPR, integer_type_node,
@@ -6406,7 +6442,7 @@  store_constructor (tree exp, rtx target,
 					  expand_normal (position),
 					  highest_pow2_factor (position));
 		xtarget = adjust_address (xtarget, mode, 0);
-		store_expr (value, xtarget, 0, false);
+		store_expr (value, xtarget, 0, false, reverse);
 	      }
 	    else
 	      {
@@ -6424,7 +6460,8 @@  store_constructor (tree exp, rtx target,
 		    MEM_KEEP_ALIAS_SET_P (target) = 1;
 		  }
 		store_constructor_field (target, bitsize, bitpos, mode, value,
-					 cleared, get_alias_set (elttype));
+					 cleared, get_alias_set (elttype),
+					 reverse);
 	      }
 	  }
 	break;
@@ -6557,7 +6594,7 @@  store_constructor (tree exp, rtx target,
 		  : eltmode;
 		bitpos = eltpos * elt_size;
 		store_constructor_field (target, bitsize, bitpos, value_mode,
-					 value, cleared, alias);
+					 value, cleared, alias, reverse);
 	      }
 	  }
 
@@ -6590,14 +6627,16 @@  store_constructor (tree exp, rtx target,
    (in general) be different from that for TARGET, since TARGET is a
    reference to the containing structure.
 
-   If NONTEMPORAL is true, try generating a nontemporal store.  */
+   If NONTEMPORAL is true, try generating a nontemporal store.
+
+   If REVERSE is true, the store is to be done in reverse order.  */
 
 static rtx
 store_field (rtx target, HOST_WIDE_INT bitsize, HOST_WIDE_INT bitpos,
 	     unsigned HOST_WIDE_INT bitregion_start,
 	     unsigned HOST_WIDE_INT bitregion_end,
 	     machine_mode mode, tree exp,
-	     alias_set_type alias_set, bool nontemporal)
+	     alias_set_type alias_set, bool nontemporal,  bool reverse)
 {
   if (TREE_CODE (exp) == ERROR_MARK)
     return const0_rtx;
@@ -6612,7 +6651,7 @@  store_field (rtx target, HOST_WIDE_INT b
       /* We're storing into a struct containing a single __complex.  */
 
       gcc_assert (!bitpos);
-      return store_expr (exp, target, 0, nontemporal);
+      return store_expr (exp, target, 0, nontemporal, reverse);
     }
 
   /* If the structure is in a register or if the component
@@ -6674,16 +6713,27 @@  store_field (rtx target, HOST_WIDE_INT b
 
       temp = expand_normal (exp);
 
-      /* If BITSIZE is narrower than the size of the type of EXP
-	 we will be narrowing TEMP.  Normally, what's wanted are the
-	 low-order bits.  However, if EXP's type is a record and this is
-	 big-endian machine, we want the upper BITSIZE bits.  */
-      if (BYTES_BIG_ENDIAN && GET_MODE_CLASS (GET_MODE (temp)) == MODE_INT
-	  && bitsize < (HOST_WIDE_INT) GET_MODE_BITSIZE (GET_MODE (temp))
-	  && TREE_CODE (TREE_TYPE (exp)) == RECORD_TYPE)
-	temp = expand_shift (RSHIFT_EXPR, GET_MODE (temp), temp,
-			     GET_MODE_BITSIZE (GET_MODE (temp)) - bitsize,
-			     NULL_RTX, 1);
+      /* If the value has a record type and an integral mode then, if BITSIZE
+	 is narrower than this mode and this is a big-endian machine, we must
+	 first put the value into the low-order bits.  Moreover, the field may
+	 be not aligned on a byte boundary; in this case, if it has reverse
+	 storage order, it needs to be accessed as a scalar field with reverse
+	 storage order and we must first put the value into target order.  */
+      if (TREE_CODE (TREE_TYPE (exp)) == RECORD_TYPE
+	  && GET_MODE_CLASS (GET_MODE (temp)) == MODE_INT)
+	{
+	  HOST_WIDE_INT size = GET_MODE_BITSIZE (GET_MODE (temp));
+
+	  reverse = TYPE_REVERSE_STORAGE_ORDER (TREE_TYPE (exp));
+
+	  if (reverse)
+	    temp = flip_storage_order (GET_MODE (temp), temp);
+
+	  if (bitsize < size
+	      && reverse ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
+	    temp = expand_shift (RSHIFT_EXPR, GET_MODE (temp), temp,
+				 size - bitsize, NULL_RTX, 1);
+	}
 
       /* Unless MODE is VOIDmode or BLKmode, convert TEMP to MODE.  */
       if (mode != VOIDmode && mode != BLKmode
@@ -6743,7 +6793,7 @@  store_field (rtx target, HOST_WIDE_INT b
 	      temp_target = gen_reg_rtx (mode);
 	      temp_target
 	        = extract_bit_field (temp, size * BITS_PER_UNIT, 0, 1,
-				     temp_target, mode, mode);
+				     temp_target, mode, mode, false);
 	      temp = temp_target;
 	    }
 	}
@@ -6751,7 +6801,7 @@  store_field (rtx target, HOST_WIDE_INT b
       /* Store the value in the bitfield.  */
       store_bit_field (target, bitsize, bitpos,
 		       bitregion_start, bitregion_end,
-		       mode, temp);
+		       mode, temp, reverse);
 
       return const0_rtx;
     }
@@ -6766,7 +6816,7 @@  store_field (rtx target, HOST_WIDE_INT b
       if (!MEM_KEEP_ALIAS_SET_P (to_rtx) && MEM_ALIAS_SET (to_rtx) != 0)
 	set_mem_alias_set (to_rtx, alias_set);
 
-      return store_expr (exp, to_rtx, 0, nontemporal);
+      return store_expr (exp, to_rtx, 0, nontemporal, reverse);
     }
 }
 
@@ -6775,7 +6825,8 @@  store_field (rtx target, HOST_WIDE_INT b
    codes and find the ultimate containing object, which we return.
 
    We set *PBITSIZE to the size in bits that we want, *PBITPOS to the
-   bit position, and *PUNSIGNEDP to the signedness of the field.
+   bit position, *PUNSIGNEDP to the signedness and *PREVERSEP to the
+   storage order of the field.
    If the position of the field is variable, we store a tree
    giving the variable offset (in units) in *POFFSET.
    This offset is in addition to the bit position.
@@ -6809,7 +6860,7 @@  tree
 get_inner_reference (tree exp, HOST_WIDE_INT *pbitsize,
 		     HOST_WIDE_INT *pbitpos, tree *poffset,
 		     machine_mode *pmode, int *punsignedp,
-		     int *pvolatilep, bool keep_aligning)
+		     int *preversep, int *pvolatilep, bool keep_aligning)
 {
   tree size_tree = 0;
   machine_mode mode = VOIDmode;
@@ -6817,8 +6868,8 @@  get_inner_reference (tree exp, HOST_WIDE
   tree offset = size_zero_node;
   offset_int bit_offset = 0;
 
-  /* First get the mode, signedness, and size.  We do this from just the
-     outermost expression.  */
+  /* First get the mode, signedness, storage order and size.  We do this from
+     just the outermost expression.  */
   *pbitsize = -1;
   if (TREE_CODE (exp) == COMPONENT_REF)
     {
@@ -6838,12 +6889,18 @@  get_inner_reference (tree exp, HOST_WIDE
 	blkmode_bitfield = true;
 
       *punsignedp = DECL_UNSIGNED (field);
+      /* ??? Fortran can take COMPONENT_REF of a void type.  */
+      *preversep
+        = !VOID_TYPE_P (TREE_TYPE (TREE_OPERAND (exp, 0)))
+	  && TYPE_REVERSE_STORAGE_ORDER (TREE_TYPE (TREE_OPERAND (exp, 0)))
+	  && !AGGREGATE_TYPE_P (TREE_TYPE (exp));
     }
   else if (TREE_CODE (exp) == BIT_FIELD_REF)
     {
       size_tree = TREE_OPERAND (exp, 1);
       *punsignedp = (! INTEGRAL_TYPE_P (TREE_TYPE (exp))
 		     || TYPE_UNSIGNED (TREE_TYPE (exp)));
+      *preversep = REF_REVERSE_STORAGE_ORDER (exp);
 
       /* For vector types, with the correct size of access, use the mode of
 	 inner type.  */
@@ -6856,6 +6913,12 @@  get_inner_reference (tree exp, HOST_WIDE
     {
       mode = TYPE_MODE (TREE_TYPE (exp));
       *punsignedp = TYPE_UNSIGNED (TREE_TYPE (exp));
+      *preversep
+	= ((TREE_CODE (exp) == ARRAY_REF
+	    && TYPE_REVERSE_STORAGE_ORDER (TREE_TYPE (TREE_OPERAND (exp, 0))))
+	   || (TREE_CODE (exp) == MEM_REF
+	       && REF_REVERSE_STORAGE_ORDER (exp)))
+	  && !AGGREGATE_TYPE_P (TREE_TYPE (exp));
 
       if (mode == BLKmode)
 	size_tree = TYPE_SIZE (TREE_TYPE (exp));
@@ -7547,7 +7610,7 @@  expand_expr_addr_expr_1 (tree exp, rtx t
   rtx result, subtarget;
   tree inner, offset;
   HOST_WIDE_INT bitsize, bitpos;
-  int volatilep, unsignedp;
+  int unsignedp, reversep, volatilep = 0;
   machine_mode mode1;
 
   /* If we are taking the address of a constant and are at the top level,
@@ -7662,8 +7725,8 @@  expand_expr_addr_expr_1 (tree exp, rtx t
 	 handle "aligning nodes" here: we can just bypass them because
 	 they won't change the final object whose address will be returned
 	 (they actually exist only for that purpose).  */
-      inner = get_inner_reference (exp, &bitsize, &bitpos, &offset,
-				   &mode1, &unsignedp, &volatilep, false);
+      inner = get_inner_reference (exp, &bitsize, &bitpos, &offset, &mode1,
+				   &unsignedp, &reversep, &volatilep, false);
       break;
     }
 
@@ -7847,7 +7910,7 @@  expand_constructor (tree exp, rtx target
       target = assign_temp (type, TREE_ADDRESSABLE (exp), 1);
     }
 
-  store_constructor (exp, target, 0, int_expr_size (exp));
+  store_constructor (exp, target, 0, int_expr_size (exp), false);
   return target;
 }
 
@@ -8120,11 +8183,12 @@  expand_expr_real_2 (sepops ops, rtx targ
 	    store_expr (treeop0,
 			adjust_address (target, TYPE_MODE (valtype), 0),
 			modifier == EXPAND_STACK_PARM,
-			false);
+			false, TYPE_REVERSE_STORAGE_ORDER (type));
 
 	  else
 	    {
-	      gcc_assert (REG_P (target));
+	      gcc_assert (REG_P (target)
+			  && !TYPE_REVERSE_STORAGE_ORDER (type));
 
 	      /* Store this field into a union of the proper type.  */
 	      store_field (target,
@@ -8132,7 +8196,8 @@  expand_expr_real_2 (sepops ops, rtx targ
 						    (treeop0))
 				 * BITS_PER_UNIT),
 				(HOST_WIDE_INT) GET_MODE_BITSIZE (mode)),
-			   0, 0, 0, TYPE_MODE (valtype), treeop0, 0, false);
+			   0, 0, 0, TYPE_MODE (valtype), treeop0, 0,
+			   false, false);
 	    }
 
 	  /* Return the entire union.  */
@@ -9064,7 +9129,7 @@  expand_expr_real_2 (sepops ops, rtx targ
         int index = BYTES_BIG_ENDIAN ? GET_MODE_NUNITS (vec_mode) - 1 : 0;
         int bitsize = GET_MODE_BITSIZE (GET_MODE_INNER (vec_mode));
         temp = extract_bit_field (temp, bitsize, bitsize * index, unsignedp,
-				  target, mode, mode);
+				  target, mode, mode, false);
         gcc_assert (temp);
         return temp;
       }
@@ -9220,14 +9285,14 @@  expand_expr_real_2 (sepops ops, rtx targ
 	jumpifnot (treeop0, lab0, -1);
 	store_expr (treeop1, temp,
 		    modifier == EXPAND_STACK_PARM,
-		    false);
+		    false, false);
 
 	emit_jump_insn (gen_jump (lab1));
 	emit_barrier ();
 	emit_label (lab0);
 	store_expr (treeop2, temp,
 		    modifier == EXPAND_STACK_PARM,
-		    false);
+		    false, false);
 
 	emit_label (lab1);
 	OK_DEFER_POP;
@@ -9767,6 +9832,7 @@  expand_expr_real_1 (tree exp, rtx target
 
     case MEM_REF:
       {
+	const bool reverse = REF_REVERSE_STORAGE_ORDER (exp);
 	addr_space_t as
 	  = TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (TREE_OPERAND (exp, 0))));
 	machine_mode address_mode;
@@ -9781,6 +9847,7 @@  expand_expr_real_1 (tree exp, rtx target
 	    HOST_WIDE_INT offset = mem_ref_offset (exp).to_short_addr ();
 	    base = TREE_OPERAND (base, 0);
 	    if (offset == 0
+	        && !reverse
 		&& tree_fits_uhwi_p (TYPE_SIZE (type))
 		&& (GET_MODE_BITSIZE (DECL_MODE (base))
 		    == tree_to_uhwi (TYPE_SIZE (type))))
@@ -9790,13 +9857,14 @@  expand_expr_real_1 (tree exp, rtx target
 	      {
 		temp = assign_stack_temp (DECL_MODE (base),
 					  GET_MODE_SIZE (DECL_MODE (base)));
-		store_expr (base, temp, 0, false);
+		store_expr (base, temp, 0, false, false);
 		temp = adjust_address (temp, BLKmode, offset);
 		set_mem_size (temp, int_size_in_bytes (type));
 		return temp;
 	      }
 	    exp = build3 (BIT_FIELD_REF, type, base, TYPE_SIZE (type),
 			  bitsize_int (offset * BITS_PER_UNIT));
+	    REF_REVERSE_STORAGE_ORDER (exp) = reverse;
 	    return expand_expr (exp, target, tmode, modifier);
 	  }
 	address_mode = targetm.addr_space.address_mode (as);
@@ -9846,8 +9914,10 @@  expand_expr_real_1 (tree exp, rtx target
 					0, TYPE_UNSIGNED (TREE_TYPE (exp)),
 					(modifier == EXPAND_STACK_PARM
 					 ? NULL_RTX : target),
-					mode, mode);
+					mode, mode, false);
 	  }
+	if (reverse && modifier != EXPAND_WRITE)
+	  temp = flip_storage_order (mode, temp);
 	return temp;
       }
 
@@ -10053,9 +10123,10 @@  expand_expr_real_1 (tree exp, rtx target
 	machine_mode mode1, mode2;
 	HOST_WIDE_INT bitsize, bitpos;
 	tree offset;
-	int volatilep = 0, must_force_mem;
-	tree tem = get_inner_reference (exp, &bitsize, &bitpos, &offset,
-					&mode1, &unsignedp, &volatilep, true);
+	int reversep, volatilep = 0, must_force_mem;
+	tree tem
+	  = get_inner_reference (exp, &bitsize, &bitpos, &offset, &mode1,
+				 &unsignedp, &reversep, &volatilep, true);
 	rtx orig_op0, memloc;
 	bool clear_mem_expr = false;
 
@@ -10296,20 +10367,38 @@  expand_expr_real_1 (tree exp, rtx target
 	    if (MEM_P (op0) && REG_P (XEXP (op0, 0)))
 	      mark_reg_pointer (XEXP (op0, 0), MEM_ALIGN (op0));
 
+	    /* If the result has a record type and the extraction is done in
+	       an integral mode, then the field may not be aligned on a byte
+	       boundary; in this case, if it has reverse storage order, it
+	       needs to be extracted as a scalar field with reverse storage
+	       order and put back into memory order afterwards.  */
+	    if (TREE_CODE (type) == RECORD_TYPE
+		&& GET_MODE_CLASS (ext_mode) == MODE_INT)
+	      reversep = TYPE_REVERSE_STORAGE_ORDER (type);
+
 	    op0 = extract_bit_field (op0, bitsize, bitpos, unsignedp,
 				     (modifier == EXPAND_STACK_PARM
 				      ? NULL_RTX : target),
-				     ext_mode, ext_mode);
+				     ext_mode, ext_mode, reversep);
+
+	    /* If the result has a record type and the mode of OP0 is an
+	       integral mode then, if BITSIZE is narrower than this mode
+	       and this is a big-endian machine, we must put the field
+	       into the high-order bits.  And we must also put it back
+	       into memory order if it has been previously reversed.  */
+	    if (TREE_CODE (type) == RECORD_TYPE
+		&& GET_MODE_CLASS (GET_MODE (op0)) == MODE_INT)
+	      {
+		HOST_WIDE_INT size = GET_MODE_BITSIZE (GET_MODE (op0));
 
-	    /* If the result is a record type and BITSIZE is narrower than
-	       the mode of OP0, an integral mode, and this is a big endian
-	       machine, we must put the field into the high-order bits.  */
-	    if (TREE_CODE (type) == RECORD_TYPE && BYTES_BIG_ENDIAN
-		&& GET_MODE_CLASS (GET_MODE (op0)) == MODE_INT
-		&& bitsize < (HOST_WIDE_INT) GET_MODE_BITSIZE (GET_MODE (op0)))
-	      op0 = expand_shift (LSHIFT_EXPR, GET_MODE (op0), op0,
-				  GET_MODE_BITSIZE (GET_MODE (op0))
-				  - bitsize, op0, 1);
+		if (bitsize < size
+		    && reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
+		  op0 = expand_shift (LSHIFT_EXPR, GET_MODE (op0), op0,
+				      size - bitsize, op0, 1);
+
+		if (reversep)
+		  op0 = flip_storage_order (GET_MODE (op0), op0);
+	      }
 
 	    /* If the result type is BLKmode, store the data into a temporary
 	       of the appropriate type, but with the mode corresponding to the
@@ -10355,6 +10444,10 @@  expand_expr_real_1 (tree exp, rtx target
 	  set_mem_expr (op0, NULL_TREE);
 
 	MEM_VOLATILE_P (op0) |= volatilep;
+
+        if (reversep)
+	  op0 = flip_storage_order (mode1, op0);
+
 	if (mode == mode1 || mode1 == BLKmode || mode1 == tmode
 	    || modifier == EXPAND_CONST_ADDRESS
 	    || modifier == EXPAND_INITIALIZER)
@@ -10418,17 +10511,16 @@  expand_expr_real_1 (tree exp, rtx target
 	machine_mode mode1;
 	HOST_WIDE_INT bitsize, bitpos;
 	tree offset;
-	int unsignedp;
-	int volatilep = 0;
+	int unsignedp, reversep, volatilep = 0;
 	tree tem
-	  = get_inner_reference (treeop0, &bitsize, &bitpos,
-				 &offset, &mode1, &unsignedp, &volatilep,
-				 true);
+	  = get_inner_reference (treeop0, &bitsize, &bitpos, &offset, &mode1,
+				 &unsignedp, &reversep, &volatilep, true);
 	rtx orig_op0;
 
 	/* ??? We should work harder and deal with non-zero offsets.  */
 	if (!offset
 	    && (bitpos % BITS_PER_UNIT) == 0
+	    && !reversep
 	    && bitsize >= 0
 	    && compare_tree_int (TYPE_SIZE (type), bitsize) == 0)
 	  {
@@ -10502,7 +10594,7 @@  expand_expr_real_1 (tree exp, rtx target
       else if (reduce_bit_field)
 	return extract_bit_field (op0, TYPE_PRECISION (type), 0,
 				  TYPE_UNSIGNED (type), NULL_RTX,
-				  mode, mode);
+				  mode, mode, false);
       /* As a last resort, spill op0 to memory, and reload it in a
 	 different mode.  */
       else if (!MEM_P (op0))
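
For readers following the expander changes above, a source-level example may
help.  This is only a sketch: it assumes the scalar_storage_order type
attribute added by the front-end patches elsewhere in this series.

  /* A scalar field of a reversed record: expanding the load below goes
     through expand_expr_real_1, which extracts the field and then calls
     flip_storage_order on the result (illustrative only).  */
  struct __attribute__ ((scalar_storage_order ("big-endian"))) S
  {
    int x;
  };

  int
  get_x (struct S *p)
  {
    return p->x;  /* Byte-swapped load on a little-endian target.  */
  }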
Index: gcc/expr.h
===================================================================
--- gcc/expr.h	(.../trunk)	(revision 224461)
+++ gcc/expr.h	(.../branches/scalar-storage-order)	(revision 224467)
@@ -229,8 +229,8 @@  extern void expand_assignment (tree, tre
    and storing the value into TARGET.
    If SUGGEST_REG is nonzero, copy the value through a register
    and return that register, if that is possible.  */
-extern rtx store_expr_with_bounds (tree, rtx, int, bool, tree);
-extern rtx store_expr (tree, rtx, int, bool);
+extern rtx store_expr_with_bounds (tree, rtx, int, bool, bool, tree);
+extern rtx store_expr (tree, rtx, int, bool, bool);
 
 /* Given an rtx that may include add and multiply operations,
    generate them as insns and return a pseudo-reg containing the value.
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c	(.../trunk)	(revision 224461)
+++ gcc/gimplify.c	(.../branches/scalar-storage-order)	(revision 224467)
@@ -8211,6 +8211,8 @@  gimplify_expr (tree *expr_p, gimple_seq
 			     TREE_OPERAND (*expr_p, 1));
 	  if (tmp)
 	    {
+	      REF_REVERSE_STORAGE_ORDER (tmp)
+	        = REF_REVERSE_STORAGE_ORDER (*expr_p);
 	      *expr_p = tmp;
 	      recalculate_side_effects (*expr_p);
 	      ret = GS_OK;
Index: gcc/calls.c
===================================================================
--- gcc/calls.c	(.../trunk)	(revision 224461)
+++ gcc/calls.c	(.../branches/scalar-storage-order)	(revision 224467)
@@ -1093,7 +1093,7 @@  store_unaligned_arguments_into_pseudos (
 
 	    args[i].aligned_regs[j] = reg;
 	    word = extract_bit_field (word, bitsize, 0, 1, NULL_RTX,
-				      word_mode, word_mode);
+				      word_mode, word_mode, false);
 
 	    /* There is no need to restrict this code to loading items
 	       in TYPE_ALIGN sized hunks.  The bitfield instructions can
@@ -1110,7 +1110,7 @@  store_unaligned_arguments_into_pseudos (
 
 	    bytes -= bitsize / BITS_PER_UNIT;
 	    store_bit_field (reg, bitsize, endian_correction, 0, 0,
-			     word_mode, word);
+			     word_mode, word, false);
 	  }
       }
 }
@@ -1391,7 +1391,7 @@  initialize_argument_information (int num
 	      else
 		copy = assign_temp (type, 1, 0);
 
-	      store_expr (args[i].tree_value, copy, 0, false);
+	      store_expr (args[i].tree_value, copy, 0, false, false);
 
 	      /* Just change the const function to pure and then let
 		 the next test clear the pure based on
@@ -2112,8 +2112,8 @@  load_register_parameters (struct arg_dat
 		  rtx dest = gen_rtx_REG (word_mode, REGNO (reg) + nregs - 1);
 		  unsigned int bitoff = (nregs - 1) * BITS_PER_WORD;
 		  unsigned int bitsize = size * BITS_PER_UNIT - bitoff;
-		  rtx x = extract_bit_field (mem, bitsize, bitoff, 1,
-					     dest, word_mode, word_mode);
+		  rtx x = extract_bit_field (mem, bitsize, bitoff, 1, dest,
+					     word_mode, word_mode, false);
 		  if (BYTES_BIG_ENDIAN)
 		    x = expand_shift (LSHIFT_EXPR, word_mode, x,
 				      BITS_PER_WORD - bitsize, dest, 1);
Index: gcc/expmed.c
===================================================================
--- gcc/expmed.c	(.../trunk)	(revision 224461)
+++ gcc/expmed.c	(.../branches/scalar-storage-order)	(revision 224467)
@@ -62,24 +62,24 @@  static void store_fixed_bit_field (rtx,
 				   unsigned HOST_WIDE_INT,
 				   unsigned HOST_WIDE_INT,
 				   unsigned HOST_WIDE_INT,
-				   rtx);
+				   rtx, bool);
 static void store_fixed_bit_field_1 (rtx, unsigned HOST_WIDE_INT,
 				     unsigned HOST_WIDE_INT,
-				     rtx);
+				     rtx, bool);
 static void store_split_bit_field (rtx, unsigned HOST_WIDE_INT,
 				   unsigned HOST_WIDE_INT,
 				   unsigned HOST_WIDE_INT,
 				   unsigned HOST_WIDE_INT,
-				   rtx);
+				   rtx, bool);
 static rtx extract_fixed_bit_field (machine_mode, rtx,
 				    unsigned HOST_WIDE_INT,
-				    unsigned HOST_WIDE_INT, rtx, int);
+				    unsigned HOST_WIDE_INT, rtx, int, bool);
 static rtx extract_fixed_bit_field_1 (machine_mode, rtx,
 				      unsigned HOST_WIDE_INT,
-				      unsigned HOST_WIDE_INT, rtx, int);
+				      unsigned HOST_WIDE_INT, rtx, int, bool);
 static rtx lshift_value (machine_mode, unsigned HOST_WIDE_INT, int);
 static rtx extract_split_bit_field (rtx, unsigned HOST_WIDE_INT,
-				    unsigned HOST_WIDE_INT, int);
+				    unsigned HOST_WIDE_INT, int, bool);
 static void do_cmp_and_jump (rtx, rtx, enum rtx_code, machine_mode, rtx_code_label *);
 static rtx expand_smod_pow2 (machine_mode, rtx, HOST_WIDE_INT);
 static rtx expand_sdiv_pow2 (machine_mode, rtx, HOST_WIDE_INT);
@@ -333,6 +333,79 @@  negate_rtx (machine_mode mode, rtx x)
   return result;
 }
 
+/* Whether reverse storage order is supported on the target.  */
+static int reverse_storage_order_supported = -1;
+
+/* Check whether reverse storage order is supported on the target.  */
+
+static void
+check_reverse_storage_order_support (void)
+{
+  if (BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN)
+    {
+      reverse_storage_order_supported = 0;
+      sorry ("reverse scalar storage order");
+    }
+  else
+    reverse_storage_order_supported = 1;
+}
+
+/* Whether reverse FP storage order is supported on the target.  */
+static int reverse_float_storage_order_supported = -1;
+
+/* Check whether reverse FP storage order is supported on the target.  */
+
+static void
+check_reverse_float_storage_order_support (void)
+{
+  if (FLOAT_WORDS_BIG_ENDIAN != WORDS_BIG_ENDIAN)
+    {
+      reverse_float_storage_order_supported = 0;
+      sorry ("reverse floating-point scalar storage order");
+    }
+  else
+    reverse_float_storage_order_supported = 1;
+}
+
+/* Return an rtx representing value of X with reverse storage order.
+   MODE is the intended mode of the result,
+   useful if X is a CONST_INT.  */
+
+rtx
+flip_storage_order (enum machine_mode mode, rtx x)
+{
+  enum machine_mode int_mode;
+  rtx result;
+
+  if (mode == QImode)
+    return x;
+
+  if (__builtin_expect (reverse_storage_order_supported < 0, 0))
+    check_reverse_storage_order_support ();
+
+  if (SCALAR_INT_MODE_P (mode))
+    int_mode = mode;
+  else
+    {
+      if (FLOAT_MODE_P (mode)
+	  && __builtin_expect (reverse_float_storage_order_supported < 0, 0))
+	check_reverse_float_storage_order_support ();
+
+      int_mode = int_mode_for_mode (mode);
+      gcc_assert (int_mode != BLKmode);
+      x = gen_lowpart (int_mode, x);
+    }
+
+  result = simplify_unary_operation (BSWAP, int_mode, x, int_mode);
+  if (result == 0)
+    result = expand_unop (int_mode, bswap_optab, x, NULL_RTX, 1);
+
+  if (int_mode != mode)
+    result = gen_lowpart (mode, result);
+
+  return result;
+}
+
 /* Adjust bitfield memory MEM so that it points to the first unit of mode
    MODE that contains a bitfield of size BITSIZE at bit position BITNUM.
    If MODE is BLKmode, return a reference to every byte in the bitfield.
@@ -636,7 +709,7 @@  store_bit_field_1 (rtx str_rtx, unsigned
 		   unsigned HOST_WIDE_INT bitregion_start,
 		   unsigned HOST_WIDE_INT bitregion_end,
 		   machine_mode fieldmode,
-		   rtx value, bool fallback_p)
+		   rtx value, bool reverse, bool fallback_p)
 {
   rtx op0 = str_rtx;
   rtx orig_value;
@@ -714,6 +787,8 @@  store_bit_field_1 (rtx str_rtx, unsigned
 	  sub = simplify_gen_subreg (GET_MODE (op0), value, fieldmode, 0);
 	  if (sub)
 	    {
+	      if (reverse)
+		sub = flip_storage_order (GET_MODE (op0), sub);
 	      emit_move_insn (op0, sub);
 	      return true;
 	    }
@@ -724,6 +799,8 @@  store_bit_field_1 (rtx str_rtx, unsigned
 				     bitnum / BITS_PER_UNIT);
 	  if (sub)
 	    {
+	      if (reverse)
+		value = flip_storage_order (fieldmode, value);
 	      emit_move_insn (sub, value);
 	      return true;
 	    }
@@ -736,6 +813,8 @@  store_bit_field_1 (rtx str_rtx, unsigned
   if (simple_mem_bitfield_p (op0, bitsize, bitnum, fieldmode))
     {
       op0 = adjust_bitfield_address (op0, fieldmode, bitnum / BITS_PER_UNIT);
+      if (reverse)
+	value = flip_storage_order (fieldmode, value);
       emit_move_insn (op0, value);
       return true;
     }
@@ -762,6 +841,7 @@  store_bit_field_1 (rtx str_rtx, unsigned
      can be done with a movstrict instruction.  */
 
   if (!MEM_P (op0)
+      && !reverse
       && lowpart_bit_field_p (bitnum, bitsize, GET_MODE (op0))
       && bitsize == GET_MODE_BITSIZE (fieldmode)
       && optab_handler (movstrict_optab, fieldmode) != CODE_FOR_nothing)
@@ -805,7 +885,7 @@  store_bit_field_1 (rtx str_rtx, unsigned
 	 be less than full.
 	 However, only do that if the value is not BLKmode.  */
 
-      unsigned int backwards = WORDS_BIG_ENDIAN && fieldmode != BLKmode;
+      const bool backwards = WORDS_BIG_ENDIAN && fieldmode != BLKmode;
       unsigned int nwords = (bitsize + (BITS_PER_WORD - 1)) / BITS_PER_WORD;
       unsigned int i;
       rtx_insn *last;
@@ -828,7 +908,7 @@  store_bit_field_1 (rtx str_rtx, unsigned
 				  ? GET_MODE_SIZE (fieldmode) / UNITS_PER_WORD
 				  - i - 1
 				  : i);
-	  unsigned int bit_offset = (backwards
+	  unsigned int bit_offset = (backwards ^ reverse
 				     ? MAX ((int) bitsize - ((int) i + 1)
 					    * BITS_PER_WORD,
 					    0)
@@ -852,7 +932,7 @@  store_bit_field_1 (rtx str_rtx, unsigned
 				  bitnum + bit_offset,
 				  bitregion_start, bitregion_end,
 				  word_mode,
-				  value_word, fallback_p))
+				  value_word, reverse, fallback_p))
 	    {
 	      delete_insns_since (last);
 	      return false;
@@ -888,7 +968,7 @@  store_bit_field_1 (rtx str_rtx, unsigned
 	    return false;
 
 	  store_split_bit_field (op0, bitsize, bitnum, bitregion_start,
-				 bitregion_end, value);
+				 bitregion_end, value, reverse);
 	  return true;
 	}
     }
@@ -899,6 +979,7 @@  store_bit_field_1 (rtx str_rtx, unsigned
 
   extraction_insn insv;
   if (!MEM_P (op0)
+      && !reverse
       && get_best_reg_extraction_insn (&insv, EP_insv,
 				       GET_MODE_BITSIZE (GET_MODE (op0)),
 				       fieldmode)
@@ -907,7 +988,7 @@  store_bit_field_1 (rtx str_rtx, unsigned
 
   /* If OP0 is a memory, try copying it to a register and seeing if a
      cheap register alternative is available.  */
-  if (MEM_P (op0))
+  if (MEM_P (op0) && !reverse)
     {
       if (get_best_mem_extraction_insn (&insv, EP_insv, bitsize, bitnum,
 					fieldmode)
@@ -927,7 +1008,7 @@  store_bit_field_1 (rtx str_rtx, unsigned
 	  rtx tempreg = copy_to_reg (xop0);
 	  if (store_bit_field_1 (tempreg, bitsize, bitpos,
 				 bitregion_start, bitregion_end,
-				 fieldmode, orig_value, false))
+				 fieldmode, orig_value, reverse, false))
 	    {
 	      emit_move_insn (xop0, tempreg);
 	      return true;
@@ -940,7 +1021,7 @@  store_bit_field_1 (rtx str_rtx, unsigned
     return false;
 
   store_fixed_bit_field (op0, bitsize, bitnum, bitregion_start,
-			 bitregion_end, value);
+			 bitregion_end, value, reverse);
   return true;
 }
 
@@ -953,7 +1034,9 @@  store_bit_field_1 (rtx str_rtx, unsigned
    These two fields are 0, if the C++ memory model does not apply,
    or we are not interested in keeping track of bitfield regions.
 
-   FIELDMODE is the machine-mode of the FIELD_DECL node for this field.  */
+   FIELDMODE is the machine-mode of the FIELD_DECL node for this field.
+
+   If REVERSE is true, the store is to be done in reverse order.  */
 
 void
 store_bit_field (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
@@ -961,7 +1044,7 @@  store_bit_field (rtx str_rtx, unsigned H
 		 unsigned HOST_WIDE_INT bitregion_start,
 		 unsigned HOST_WIDE_INT bitregion_end,
 		 machine_mode fieldmode,
-		 rtx value)
+		 rtx value, bool reverse)
 {
   /* Handle -fstrict-volatile-bitfields in the cases where it applies.  */
   if (strict_volatile_bitfield_p (str_rtx, bitsize, bitnum, fieldmode,
@@ -975,6 +1058,8 @@  store_bit_field (rtx str_rtx, unsigned H
 	{
 	  str_rtx = adjust_bitfield_address (str_rtx, fieldmode,
 					     bitnum / BITS_PER_UNIT);
+	  if (reverse)
+	    value = flip_storage_order (fieldmode, value);
 	  gcc_assert (bitnum % BITS_PER_UNIT == 0);
 	  emit_move_insn (str_rtx, value);
 	}
@@ -987,7 +1072,7 @@  store_bit_field (rtx str_rtx, unsigned H
 	  gcc_assert (bitnum + bitsize <= GET_MODE_BITSIZE (fieldmode));
 	  temp = copy_to_reg (str_rtx);
 	  if (!store_bit_field_1 (temp, bitsize, bitnum, 0, 0,
-				  fieldmode, value, true))
+				  fieldmode, value, reverse, true))
 	    gcc_unreachable ();
 
 	  emit_move_insn (str_rtx, temp);
@@ -1020,19 +1105,21 @@  store_bit_field (rtx str_rtx, unsigned H
 
   if (!store_bit_field_1 (str_rtx, bitsize, bitnum,
 			  bitregion_start, bitregion_end,
-			  fieldmode, value, true))
+			  fieldmode, value, reverse, true))
     gcc_unreachable ();
 }
 
 /* Use shifts and boolean operations to store VALUE into a bit field of
-   width BITSIZE in OP0, starting at bit BITNUM.  */
+   width BITSIZE in OP0, starting at bit BITNUM.
+
+   If REVERSE is true, the store is to be done in reverse order.  */
 
 static void
 store_fixed_bit_field (rtx op0, unsigned HOST_WIDE_INT bitsize,
 		       unsigned HOST_WIDE_INT bitnum,
 		       unsigned HOST_WIDE_INT bitregion_start,
 		       unsigned HOST_WIDE_INT bitregion_end,
-		       rtx value)
+		       rtx value, bool reverse)
 {
   /* There is a case not handled here:
      a structure with a known alignment of just a halfword
@@ -1055,14 +1142,14 @@  store_fixed_bit_field (rtx op0, unsigned
 	  /* The only way this should occur is if the field spans word
 	     boundaries.  */
 	  store_split_bit_field (op0, bitsize, bitnum, bitregion_start,
-				 bitregion_end, value);
+				 bitregion_end, value, reverse);
 	  return;
 	}
 
       op0 = narrow_bit_field_mem (op0, mode, bitsize, bitnum, &bitnum);
     }
 
-  store_fixed_bit_field_1 (op0, bitsize, bitnum, value);
+  store_fixed_bit_field_1 (op0, bitsize, bitnum, value, reverse);
 }
 
 /* Helper function for store_fixed_bit_field, stores
@@ -1071,7 +1158,7 @@  store_fixed_bit_field (rtx op0, unsigned
 static void
 store_fixed_bit_field_1 (rtx op0, unsigned HOST_WIDE_INT bitsize,
 			 unsigned HOST_WIDE_INT bitnum,
-			 rtx value)
+			 rtx value, bool reverse)
 {
   machine_mode mode;
   rtx temp;
@@ -1084,7 +1171,7 @@  store_fixed_bit_field_1 (rtx op0, unsign
   /* Note that bitsize + bitnum can be greater than GET_MODE_BITSIZE (mode)
      for invalid input, such as f5 from gcc.dg/pr48335-2.c.  */
 
-  if (BYTES_BIG_ENDIAN)
+  if (reverse ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
     /* BITNUM is the distance between our msb
        and that of the containing datum.
        Convert it to the distance from the lsb.  */
@@ -1130,6 +1217,9 @@  store_fixed_bit_field_1 (rtx op0, unsign
 			      bitnum, NULL_RTX, 1);
     }
 
+  if (reverse)
+    value = flip_storage_order (mode, value);
+
   /* Now clear the chosen bits in OP0,
      except that if VALUE is -1 we need not bother.  */
   /* We keep the intermediates in registers to allow CSE to combine
@@ -1139,8 +1229,10 @@  store_fixed_bit_field_1 (rtx op0, unsign
 
   if (! all_one)
     {
-      temp = expand_binop (mode, and_optab, temp,
-			   mask_rtx (mode, bitnum, bitsize, 1),
+      rtx mask = mask_rtx (mode, bitnum, bitsize, 1);
+      if (reverse)
+	mask = flip_storage_order (mode, mask);
+      temp = expand_binop (mode, and_optab, temp, mask,
 			   NULL_RTX, 1, OPTAB_LIB_WIDEN);
       temp = force_reg (mode, temp);
     }
@@ -1168,6 +1260,8 @@  store_fixed_bit_field_1 (rtx op0, unsign
    (within the word).
    VALUE is the value to store.
 
+   If REVERSE is true, the store is to be done in reverse order.
+
    This does not yet handle fields wider than BITS_PER_WORD.  */
 
 static void
@@ -1175,10 +1269,9 @@  store_split_bit_field (rtx op0, unsigned
 		       unsigned HOST_WIDE_INT bitpos,
 		       unsigned HOST_WIDE_INT bitregion_start,
 		       unsigned HOST_WIDE_INT bitregion_end,
-		       rtx value)
+		       rtx value, bool reverse)
 {
-  unsigned int unit;
-  unsigned int bitsdone = 0;
+  unsigned int unit, total_bits, bitsdone = 0;
 
   /* Make sure UNIT isn't larger than BITS_PER_WORD, we can only handle that
      much at a time.  */
@@ -1209,12 +1302,14 @@  store_split_bit_field (rtx op0, unsigned
 					       : word_mode, value));
     }
 
+  total_bits = GET_MODE_BITSIZE (GET_MODE (value));
+
   while (bitsdone < bitsize)
     {
       unsigned HOST_WIDE_INT thissize;
-      rtx part, word;
       unsigned HOST_WIDE_INT thispos;
       unsigned HOST_WIDE_INT offset;
+      rtx part, word;
 
       offset = (bitpos + bitsdone) / unit;
       thispos = (bitpos + bitsdone) % unit;
@@ -1239,13 +1334,18 @@  store_split_bit_field (rtx op0, unsigned
       thissize = MIN (bitsize - bitsdone, BITS_PER_WORD);
       thissize = MIN (thissize, unit - thispos);
 
-      if (BYTES_BIG_ENDIAN)
+      if (reverse ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
 	{
 	  /* Fetch successively less significant portions.  */
 	  if (CONST_INT_P (value))
 	    part = GEN_INT (((unsigned HOST_WIDE_INT) (INTVAL (value))
 			     >> (bitsize - bitsdone - thissize))
 			    & (((HOST_WIDE_INT) 1 << thissize) - 1));
+          /* Likewise, but the source is little-endian.  */
+          else if (reverse)
+	    part = extract_fixed_bit_field (word_mode, value, thissize,
+					    bitsize - bitsdone - thissize,
+					    NULL_RTX, 1, false);
 	  else
 	    {
 	      int total_bits = GET_MODE_BITSIZE (GET_MODE (value));
@@ -1254,7 +1354,7 @@  store_split_bit_field (rtx op0, unsigned
 		 endianness compensation) to fetch the piece we want.  */
 	      part = extract_fixed_bit_field (word_mode, value, thissize,
 					      total_bits - bitsize + bitsdone,
-					      NULL_RTX, 1);
+					      NULL_RTX, 1, false);
 	    }
 	}
       else
@@ -1264,9 +1364,14 @@  store_split_bit_field (rtx op0, unsigned
 	    part = GEN_INT (((unsigned HOST_WIDE_INT) (INTVAL (value))
 			     >> bitsdone)
 			    & (((HOST_WIDE_INT) 1 << thissize) - 1));
+	  /* Likewise, but the source is big-endian.  */
+          else if (reverse)
+	    part = extract_fixed_bit_field (word_mode, value, thissize,
+					    total_bits - bitsdone - thissize,
+					    NULL_RTX, 1, false);
 	  else
 	    part = extract_fixed_bit_field (word_mode, value, thissize,
-					    bitsdone, NULL_RTX, 1);
+					    bitsdone, NULL_RTX, 1, false);
 	}
 
       /* If OP0 is a register, then handle OFFSET here.
@@ -1304,7 +1409,8 @@  store_split_bit_field (rtx op0, unsigned
 	 it is just an out-of-bounds access.  Ignore it.  */
       if (word != const0_rtx)
 	store_fixed_bit_field (word, thissize, offset * unit + thispos,
-			       bitregion_start, bitregion_end, part);
+			       bitregion_start, bitregion_end, part,
+			       reverse);
       bitsdone += thissize;
     }
 }
@@ -1429,7 +1535,7 @@  static rtx
 extract_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
 		     unsigned HOST_WIDE_INT bitnum, int unsignedp, rtx target,
 		     machine_mode mode, machine_mode tmode,
-		     bool fallback_p)
+		     bool reverse, bool fallback_p)
 {
   rtx op0 = str_rtx;
   machine_mode int_mode;
@@ -1455,6 +1561,8 @@  extract_bit_field_1 (rtx str_rtx, unsign
       && bitnum == 0
       && bitsize == GET_MODE_BITSIZE (GET_MODE (op0)))
     {
+      if (reverse)
+	op0 = flip_storage_order (mode, op0);
       /* We're trying to extract a full register from itself.  */
       return op0;
     }
@@ -1571,6 +1679,7 @@  extract_bit_field_1 (rtx str_rtx, unsign
      as the least significant bit of the value is the least significant
      bit of either OP0 or a word of OP0.  */
   if (!MEM_P (op0)
+      && !reverse
       && lowpart_bit_field_p (bitnum, bitsize, GET_MODE (op0))
       && bitsize == GET_MODE_BITSIZE (mode1)
       && TRULY_NOOP_TRUNCATION_MODES_P (mode1, GET_MODE (op0)))
@@ -1586,6 +1695,8 @@  extract_bit_field_1 (rtx str_rtx, unsign
   if (simple_mem_bitfield_p (op0, bitsize, bitnum, mode1))
     {
       op0 = adjust_bitfield_address (op0, mode1, bitnum / BITS_PER_UNIT);
+      if (reverse)
+	op0 = flip_storage_order (mode1, op0);
       return convert_extracted_bit_field (op0, mode, tmode, unsignedp);
     }
 
@@ -1598,7 +1709,7 @@  extract_bit_field_1 (rtx str_rtx, unsign
 	 This is because the most significant word is the one which may
 	 be less than full.  */
 
-      unsigned int backwards = WORDS_BIG_ENDIAN;
+      const bool backwards = WORDS_BIG_ENDIAN;
       unsigned int nwords = (bitsize + (BITS_PER_WORD - 1)) / BITS_PER_WORD;
       unsigned int i;
       rtx_insn *last;
@@ -1625,7 +1736,7 @@  extract_bit_field_1 (rtx str_rtx, unsign
 	       ? GET_MODE_SIZE (GET_MODE (target)) / UNITS_PER_WORD - i - 1
 	       : i);
 	  /* Offset from start of field in OP0.  */
-	  unsigned int bit_offset = (backwards
+	  unsigned int bit_offset = (backwards ^ reverse
 				     ? MAX ((int) bitsize - ((int) i + 1)
 					    * BITS_PER_WORD,
 					    0)
@@ -1635,7 +1746,7 @@  extract_bit_field_1 (rtx str_rtx, unsign
 	    = extract_bit_field_1 (op0, MIN (BITS_PER_WORD,
 					     bitsize - i * BITS_PER_WORD),
 				   bitnum + bit_offset, 1, target_part,
-				   mode, word_mode, fallback_p);
+				   mode, word_mode, reverse, fallback_p);
 
 	  gcc_assert (target_part);
 	  if (!result_part)
@@ -1685,7 +1796,8 @@  extract_bit_field_1 (rtx str_rtx, unsign
 	{
 	  if (!fallback_p)
 	    return NULL_RTX;
-	  target = extract_split_bit_field (op0, bitsize, bitnum, unsignedp);
+	  target = extract_split_bit_field (op0, bitsize, bitnum, unsignedp,
+					    reverse);
 	  return convert_extracted_bit_field (target, mode, tmode, unsignedp);
 	}
     }
@@ -1695,6 +1807,7 @@  extract_bit_field_1 (rtx str_rtx, unsign
   enum extraction_pattern pattern = unsignedp ? EP_extzv : EP_extv;
   extraction_insn extv;
   if (!MEM_P (op0)
+      && !reverse
       /* ??? We could limit the structure size to the part of OP0 that
 	 contains the field, with appropriate checks for endianness
 	 and TRULY_NOOP_TRUNCATION.  */
@@ -1711,7 +1824,7 @@  extract_bit_field_1 (rtx str_rtx, unsign
 
   /* If OP0 is a memory, try copying it to a register and seeing if a
      cheap register alternative is available.  */
-  if (MEM_P (op0))
+  if (MEM_P (op0) && !reverse)
     {
       if (get_best_mem_extraction_insn (&extv, pattern, bitsize, bitnum,
 					tmode))
@@ -1736,7 +1849,7 @@  extract_bit_field_1 (rtx str_rtx, unsign
 	  xop0 = copy_to_reg (xop0);
 	  rtx result = extract_bit_field_1 (xop0, bitsize, bitpos,
 					    unsignedp, target,
-					    mode, tmode, false);
+					    mode, tmode, reverse, false);
 	  if (result)
 	    return result;
 	  delete_insns_since (last);
@@ -1755,7 +1868,7 @@  extract_bit_field_1 (rtx str_rtx, unsign
   gcc_assert (int_mode != BLKmode);
 
   target = extract_fixed_bit_field (int_mode, op0, bitsize, bitnum,
-				    target, unsignedp);
+				    target, unsignedp, reverse);
   return convert_extracted_bit_field (target, mode, tmode, unsignedp);
 }
 
@@ -1770,6 +1883,8 @@  extract_bit_field_1 (rtx str_rtx, unsign
    TMODE is the mode the caller would like the value to have;
    but the value may be returned with type MODE instead.
 
+   If REVERSE is true, the extraction is to be done in reverse order.
+
    If a TARGET is specified and we can store in it at no extra cost,
    we do so, and return TARGET.
    Otherwise, we return a REG of mode TMODE or MODE, with TMODE preferred
@@ -1778,7 +1893,7 @@  extract_bit_field_1 (rtx str_rtx, unsign
 rtx
 extract_bit_field (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
 		   unsigned HOST_WIDE_INT bitnum, int unsignedp, rtx target,
-		   machine_mode mode, machine_mode tmode)
+		   machine_mode mode, machine_mode tmode, bool reverse)
 {
   machine_mode mode1;
 
@@ -1800,6 +1915,8 @@  extract_bit_field (rtx str_rtx, unsigned
 	{
 	  rtx result = adjust_bitfield_address (str_rtx, mode1,
 						bitnum / BITS_PER_UNIT);
+	  if (reverse)
+	    result = flip_storage_order (mode1, result);
 	  gcc_assert (bitnum % BITS_PER_UNIT == 0);
 	  return convert_extracted_bit_field (result, mode, tmode, unsignedp);
 	}
@@ -1811,13 +1928,15 @@  extract_bit_field (rtx str_rtx, unsigned
     }
 
   return extract_bit_field_1 (str_rtx, bitsize, bitnum, unsignedp,
-			      target, mode, tmode, true);
+			      target, mode, tmode, reverse, true);
 }
 
 /* Use shifts and boolean operations to extract a field of BITSIZE bits
    from bit BITNUM of OP0.
 
    UNSIGNEDP is nonzero for an unsigned bit field (don't sign-extend value).
+   If REVERSE is true, the extraction is to be done in reverse order.
+
    If TARGET is nonzero, attempts to store the value there
    and return TARGET, but this is not guaranteed.
    If TARGET is not used, create a pseudo-reg of mode TMODE for the value.  */
@@ -1826,7 +1945,7 @@  static rtx
 extract_fixed_bit_field (machine_mode tmode, rtx op0,
 			 unsigned HOST_WIDE_INT bitsize,
 			 unsigned HOST_WIDE_INT bitnum, rtx target,
-			 int unsignedp)
+			 int unsignedp, bool reverse)
 {
   if (MEM_P (op0))
     {
@@ -1837,13 +1956,14 @@  extract_fixed_bit_field (machine_mode tm
       if (mode == VOIDmode)
 	/* The only way this should occur is if the field spans word
 	   boundaries.  */
-	return extract_split_bit_field (op0, bitsize, bitnum, unsignedp);
+	return extract_split_bit_field (op0, bitsize, bitnum, unsignedp,
+					reverse);
 
       op0 = narrow_bit_field_mem (op0, mode, bitsize, bitnum, &bitnum);
     }
 
   return extract_fixed_bit_field_1 (tmode, op0, bitsize, bitnum,
-				    target, unsignedp);
+				    target, unsignedp, reverse);
 }
 
 /* Helper function for extract_fixed_bit_field, extracts
@@ -1853,7 +1973,7 @@  static rtx
 extract_fixed_bit_field_1 (machine_mode tmode, rtx op0,
 			   unsigned HOST_WIDE_INT bitsize,
 			   unsigned HOST_WIDE_INT bitnum, rtx target,
-			   int unsignedp)
+			   int unsignedp, bool reverse)
 {
   machine_mode mode = GET_MODE (op0);
   gcc_assert (SCALAR_INT_MODE_P (mode));
@@ -1862,13 +1982,15 @@  extract_fixed_bit_field_1 (machine_mode
      for invalid input, such as extract equivalent of f5 from
      gcc.dg/pr48335-2.c.  */
 
-  if (BYTES_BIG_ENDIAN)
+  if (reverse ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
     /* BITNUM is the distance between our msb and that of OP0.
        Convert it to the distance from the lsb.  */
     bitnum = GET_MODE_BITSIZE (mode) - bitsize - bitnum;
 
   /* Now BITNUM is always the distance between the field's lsb and that of OP0.
      We have reduced the big-endian case to the little-endian case.  */
+  if (reverse)
+    op0 = flip_storage_order (mode, op0);
 
   if (unsignedp)
     {
@@ -1940,11 +2062,14 @@  lshift_value (machine_mode mode, unsigne
 
    OP0 is the REG, SUBREG or MEM rtx for the first of the two words.
    BITSIZE is the field width; BITPOS, position of its first bit, in the word.
-   UNSIGNEDP is 1 if should zero-extend the contents; else sign-extend.  */
+   UNSIGNEDP is 1 if should zero-extend the contents; else sign-extend.
+
+   If REVERSE is true, the extraction is to be done in reverse order.  */
 
 static rtx
 extract_split_bit_field (rtx op0, unsigned HOST_WIDE_INT bitsize,
-			 unsigned HOST_WIDE_INT bitpos, int unsignedp)
+			 unsigned HOST_WIDE_INT bitpos, int unsignedp,
+			 bool reverse)
 {
   unsigned int unit;
   unsigned int bitsdone = 0;
@@ -1999,11 +2124,11 @@  extract_split_bit_field (rtx op0, unsign
 	 whose meaning is determined by BYTES_PER_UNIT.
 	 OFFSET is in UNITs, and UNIT is in bits.  */
       part = extract_fixed_bit_field (word_mode, word, thissize,
-				      offset * unit + thispos, 0, 1);
+				      offset * unit + thispos, 0, 1, reverse);
       bitsdone += thissize;
 
       /* Shift this part into place for the result.  */
-      if (BYTES_BIG_ENDIAN)
+      if (reverse ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
 	{
 	  if (bitsize != bitsdone)
 	    part = expand_shift (LSHIFT_EXPR, word_mode, part,
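
For what it's worth, the net effect of flip_storage_order on a scalar is just
a byte swap of the mode-sized value; the rest of the new expmed.c code reduces
the reversed case to the opposite byte order via the recurring
"reverse ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN" test.  A minimal host-side
model, assuming a 32-bit scalar:

  #include <stdint.h>

  /* Host-side model of flip_storage_order for a 32-bit mode: the
     rtx-level routine simplifies or expands a BSWAP, which computes
     the same permutation as this.  */
  static uint32_t
  flip32 (uint32_t x)
  {
    return __builtin_bswap32 (x);
  }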
Index: gcc/expmed.h
===================================================================
--- gcc/expmed.h	(.../trunk)	(revision 224461)
+++ gcc/expmed.h	(.../branches/scalar-storage-order)	(revision 224467)
@@ -676,6 +676,10 @@  extern rtx emit_cstore (rtx target, enum
    May emit insns.  */
 extern rtx negate_rtx (machine_mode, rtx);
 
+/* Arguments MODE, RTX: return an rtx for the flipping of that value.
+   May emit insns.  */
+extern rtx flip_storage_order (enum machine_mode, rtx);
+
 /* Expand a logical AND operation.  */
 extern rtx expand_and (machine_mode, rtx, rtx, rtx);
 
@@ -707,10 +711,10 @@  extern void store_bit_field (rtx, unsign
 			     unsigned HOST_WIDE_INT,
 			     unsigned HOST_WIDE_INT,
 			     unsigned HOST_WIDE_INT,
-			     machine_mode, rtx);
+			     machine_mode, rtx, bool);
 extern rtx extract_bit_field (rtx, unsigned HOST_WIDE_INT,
 			      unsigned HOST_WIDE_INT, int, rtx,
-			      machine_mode, machine_mode);
+			      machine_mode, machine_mode, bool);
 extern rtx extract_low_bits (machine_mode, machine_mode, rtx);
 extern rtx expand_mult (machine_mode, rtx, rtx, rtx, int);
 extern rtx expand_mult_highpart_adjust (machine_mode, rtx, rtx, rtx, rtx, int);
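
A hedged sketch of a typical adjusted call site: the new trailing bool simply
carries the storage order determined for the reference (the identifiers below
are illustrative, not taken from a specific hunk).

  rtx tmp = extract_bit_field (op0, bitsize, bitnum, unsignedp,
			       NULL_RTX, word_mode, word_mode, reversep);
  store_bit_field (target, bitsize, bitnum, 0, 0, word_mode, tmp, reversep);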
Index: gcc/tree-dfa.c
===================================================================
--- gcc/tree-dfa.c	(.../trunk)	(revision 224461)
+++ gcc/tree-dfa.c	(.../branches/scalar-storage-order)	(revision 224467)
@@ -398,12 +398,14 @@  get_or_create_ssa_default_def (struct fu
    base variable.  The access range is delimited by bit positions *POFFSET and
    *POFFSET + *PMAX_SIZE.  The access size is *PSIZE bits.  If either
    *PSIZE or *PMAX_SIZE is -1, they could not be determined.  If *PSIZE
-   and *PMAX_SIZE are equal, the access is non-variable.  */
+   and *PMAX_SIZE are equal, the access is non-variable.  If *PREVERSE is
+   true, the storage order of the reference is reversed.  */
 
 tree
 get_ref_base_and_extent (tree exp, HOST_WIDE_INT *poffset,
 			 HOST_WIDE_INT *psize,
-			 HOST_WIDE_INT *pmax_size)
+			 HOST_WIDE_INT *pmax_size,
+			 bool *preverse)
 {
   offset_int bitsize = -1;
   offset_int maxsize;
@@ -411,11 +413,20 @@  get_ref_base_and_extent (tree exp, HOST_
   offset_int bit_offset = 0;
   bool seen_variable_array_ref = false;
 
-  /* First get the final access size from just the outermost expression.  */
+  /* First get the final access size and the storage order from just the
+     outermost expression.  */
   if (TREE_CODE (exp) == COMPONENT_REF)
-    size_tree = DECL_SIZE (TREE_OPERAND (exp, 1));
+    {
+      size_tree = DECL_SIZE (TREE_OPERAND (exp, 1));
+      *preverse
+	= TYPE_REVERSE_STORAGE_ORDER (TREE_TYPE (TREE_OPERAND (exp, 0)))
+	  && !AGGREGATE_TYPE_P (TREE_TYPE (exp));
+    }
   else if (TREE_CODE (exp) == BIT_FIELD_REF)
-    size_tree = TREE_OPERAND (exp, 1);
+    {
+      size_tree = TREE_OPERAND (exp, 1);
+      *preverse = REF_REVERSE_STORAGE_ORDER (exp);
+    }
   else if (!VOID_TYPE_P (TREE_TYPE (exp)))
     {
       machine_mode mode = TYPE_MODE (TREE_TYPE (exp));
@@ -423,7 +434,16 @@  get_ref_base_and_extent (tree exp, HOST_
 	size_tree = TYPE_SIZE (TREE_TYPE (exp));
       else
 	bitsize = int (GET_MODE_PRECISION (mode));
+      *preverse
+	= ((TREE_CODE (exp) == ARRAY_REF
+	    && TYPE_REVERSE_STORAGE_ORDER (TREE_TYPE (TREE_OPERAND (exp, 0))))
+	   || (TREE_CODE (exp) == MEM_REF
+	       && REF_REVERSE_STORAGE_ORDER (exp)))
+	  && !AGGREGATE_TYPE_P (TREE_TYPE (exp));
     }
+  else
+    *preverse = false;
+
   if (size_tree != NULL_TREE
       && TREE_CODE (size_tree) == INTEGER_CST)
     bitsize = wi::to_offset (size_tree);
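
To illustrate the *PREVERSE computation (again assuming the attribute from the
front-end patches): the predicate is true exactly when the container is
reversed and the accessed value is scalar, since aggregates always keep memory
order.

  struct __attribute__ ((scalar_storage_order ("big-endian"))) R
  {
    int i;
    int a[2];
  };

  /* get_ref_base_and_extent on r.i    -> *preverse == true
     get_ref_base_and_extent on r.a    -> *preverse == false (aggregate)
     get_ref_base_and_extent on r.a[0] -> *preverse should likewise be
       true, assuming the array type inherits the flag from the record.  */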
Index: gcc/tree-dfa.h
===================================================================
--- gcc/tree-dfa.h	(.../trunk)	(revision 224461)
+++ gcc/tree-dfa.h	(.../branches/scalar-storage-order)	(revision 224467)
@@ -30,7 +30,7 @@  extern tree ssa_default_def (struct func
 extern void set_ssa_default_def (struct function *, tree, tree);
 extern tree get_or_create_ssa_default_def (struct function *, tree);
 extern tree get_ref_base_and_extent (tree, HOST_WIDE_INT *,
-				     HOST_WIDE_INT *, HOST_WIDE_INT *);
+				     HOST_WIDE_INT *, HOST_WIDE_INT *, bool *);
 extern tree get_addr_base_and_unit_offset_1 (tree, HOST_WIDE_INT *,
 					     tree (*) (tree));
 extern tree get_addr_base_and_unit_offset (tree, HOST_WIDE_INT *);
Index: gcc/print-tree.c
===================================================================
--- gcc/print-tree.c	(.../trunk)	(revision 224461)
+++ gcc/print-tree.c	(.../branches/scalar-storage-order)	(revision 224467)
@@ -576,6 +576,13 @@  print_node (FILE *file, const char *pref
       if (TYPE_NEEDS_CONSTRUCTING (node))
 	fputs (" needs-constructing", file);
 
+      if ((code == RECORD_TYPE
+	   || code == UNION_TYPE
+	   || code == QUAL_UNION_TYPE
+	   || code == ARRAY_TYPE)
+	  && TYPE_REVERSE_STORAGE_ORDER (node))
+	fputs (" reverse-storage-order", file);
+
       /* The transparent-union flag is used for different things in
 	 different nodes.  */
       if ((code == UNION_TYPE || code == RECORD_TYPE)
Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c	(.../trunk)	(revision 224461)
+++ gcc/varasm.c	(.../branches/scalar-storage-order)	(revision 224467)
@@ -125,7 +125,7 @@  static int compare_constant (const tree,
 static void output_constant_def_contents (rtx);
 static void output_addressed_constants (tree);
 static unsigned HOST_WIDE_INT output_constant (tree, unsigned HOST_WIDE_INT,
-					       unsigned int);
+					       unsigned int, bool);
 static void globalize_decl (tree);
 static bool decl_readonly_section_1 (enum section_category);
 #ifdef BSS_SECTION_ASM_OP
@@ -2076,7 +2076,8 @@  assemble_variable_contents (tree decl, c
 	/* Output the actual data.  */
 	output_constant (DECL_INITIAL (decl),
 			 tree_to_uhwi (DECL_SIZE_UNIT (decl)),
-			 get_variable_align (decl));
+			 get_variable_align (decl),
+			 false);
       else
 	/* Leave space for it.  */
 	assemble_zeros (tree_to_uhwi (DECL_SIZE_UNIT (decl)));
@@ -2802,11 +2803,12 @@  assemble_integer (rtx x, unsigned int si
 }
 
 void
-assemble_real (REAL_VALUE_TYPE d, machine_mode mode, unsigned int align)
+assemble_real (REAL_VALUE_TYPE d, machine_mode mode, unsigned int align,
+	       bool reverse)
 {
   long data[4] = {0, 0, 0, 0};
-  int i;
   int bitsize, nelts, nunits, units_per;
+  rtx elt;
 
   /* This is hairy.  We have a quantity of known size.  real_to_target
      will put it into an array of *host* longs, 32 bits per element
@@ -2828,15 +2830,23 @@  assemble_real (REAL_VALUE_TYPE d, machin
   real_to_target (data, &d, mode);
 
   /* Put out the first word with the specified alignment.  */
-  assemble_integer (GEN_INT (data[0]), MIN (nunits, units_per), align, 1);
+  if (reverse)
+    elt = flip_storage_order (SImode, GEN_INT (data[nelts - 1]));
+  else
+    elt = GEN_INT (data[0]);
+  assemble_integer (elt, MIN (nunits, units_per), align, 1);
   nunits -= units_per;
 
   /* Subsequent words need only 32-bit alignment.  */
   align = min_align (align, 32);
 
-  for (i = 1; i < nelts; i++)
+  for (int i = 1; i < nelts; i++)
     {
-      assemble_integer (GEN_INT (data[i]), MIN (nunits, units_per), align, 1);
+      if (reverse)
+	elt = flip_storage_order (SImode, GEN_INT (data[nelts - 1 - i]));
+      else
+	elt = GEN_INT (data[i]);
+      assemble_integer (elt, MIN (nunits, units_per), align, 1);
       nunits -= units_per;
     }
 }
@@ -3138,10 +3148,12 @@  compare_constant (const tree t1, const t
 	if (typecode == ARRAY_TYPE)
 	  {
 	    HOST_WIDE_INT size_1 = int_size_in_bytes (TREE_TYPE (t1));
-	    /* For arrays, check that the sizes all match.  */
+	    /* For arrays, check that mode, size and storage order match.  */
 	    if (TYPE_MODE (TREE_TYPE (t1)) != TYPE_MODE (TREE_TYPE (t2))
 		|| size_1 == -1
-		|| size_1 != int_size_in_bytes (TREE_TYPE (t2)))
+		|| size_1 != int_size_in_bytes (TREE_TYPE (t2))
+		|| TYPE_REVERSE_STORAGE_ORDER (TREE_TYPE (t1))
+		   != TYPE_REVERSE_STORAGE_ORDER (TREE_TYPE (t2)))
 	      return 0;
 	  }
 	else
@@ -3427,7 +3439,7 @@  assemble_constant_contents (tree exp, co
   targetm.asm_out.declare_constant_name (asm_out_file, label, exp, size);
 
   /* Output the value of EXP.  */
-  output_constant (exp, size, align);
+  output_constant (exp, size, align, false);
 
   targetm.asm_out.decl_end ();
 }
@@ -3864,7 +3876,7 @@  output_constant_pool_2 (machine_mode mod
 
 	gcc_assert (CONST_DOUBLE_AS_FLOAT_P (x));
 	REAL_VALUE_FROM_CONST_DOUBLE (r, x);
-	assemble_real (r, mode, align);
+	assemble_real (r, mode, align, false);
 	break;
       }
 
@@ -4364,7 +4376,11 @@  initializer_constant_valid_p_1 (tree val
 	      tree reloc;
 	      reloc = initializer_constant_valid_p_1 (elt, TREE_TYPE (elt),
 						      NULL);
-	      if (!reloc)
+	      if (!reloc
+		  /* An absolute value is required with reverse SSO.  */
+		  || (reloc != null_pointer_node
+		      && TYPE_REVERSE_STORAGE_ORDER (TREE_TYPE (value))
+		      && !AGGREGATE_TYPE_P (TREE_TYPE (elt))))
 		{
 		  if (cache)
 		    {
@@ -4656,8 +4672,8 @@  typedef struct {
 } oc_outer_state;
 
 static unsigned HOST_WIDE_INT
-  output_constructor (tree, unsigned HOST_WIDE_INT, unsigned int,
-		      oc_outer_state *);
+output_constructor (tree, unsigned HOST_WIDE_INT, unsigned int, bool,
+		    oc_outer_state *);
 
 /* Output assembler code for constant EXP, with no label.
    This includes the pseudo-op such as ".int" or ".byte", and a newline.
@@ -4679,13 +4695,18 @@  static unsigned HOST_WIDE_INT
    for a structure constructor that wants to produce more than SIZE bytes.
    But such constructors will never be generated for any possible input.
 
-   ALIGN is the alignment of the data in bits.  */
+   ALIGN is the alignment of the data in bits.
+
+   If REVERSE is true, EXP is interpreted in reverse storage order with
+   respect to the target order.  */
 
 static unsigned HOST_WIDE_INT
-output_constant (tree exp, unsigned HOST_WIDE_INT size, unsigned int align)
+output_constant (tree exp, unsigned HOST_WIDE_INT size, unsigned int align,
+		 bool reverse)
 {
   enum tree_code code;
   unsigned HOST_WIDE_INT thissize;
+  rtx cst;
 
   if (size == 0 || flag_syntax_only)
     return size;
@@ -4780,9 +4801,10 @@  output_constant (tree exp, unsigned HOST
     case FIXED_POINT_TYPE:
     case POINTER_BOUNDS_TYPE:
     case NULLPTR_TYPE:
-      if (! assemble_integer (expand_expr (exp, NULL_RTX, VOIDmode,
-					   EXPAND_INITIALIZER),
-			      MIN (size, thissize), align, 0))
+      cst = expand_expr (exp, NULL_RTX, VOIDmode, EXPAND_INITIALIZER);
+      if (reverse)
+	cst = flip_storage_order (TYPE_MODE (TREE_TYPE (exp)), cst);
+      if (!assemble_integer (cst, MIN (size, thissize), align, 0))
 	error ("initializer for integer/fixed-point value is too complicated");
       break;
 
@@ -4790,13 +4812,17 @@  output_constant (tree exp, unsigned HOST
       if (TREE_CODE (exp) != REAL_CST)
 	error ("initializer for floating value is not a floating constant");
       else
-	assemble_real (TREE_REAL_CST (exp), TYPE_MODE (TREE_TYPE (exp)), align);
+	{
+	  assemble_real (TREE_REAL_CST (exp), TYPE_MODE (TREE_TYPE (exp)),
+			 align, reverse);
+	}
       break;
 
     case COMPLEX_TYPE:
-      output_constant (TREE_REALPART (exp), thissize / 2, align);
+      output_constant (TREE_REALPART (exp), thissize / 2, align, reverse);
       output_constant (TREE_IMAGPART (exp), thissize / 2,
-		       min_align (align, BITS_PER_UNIT * (thissize / 2)));
+		       min_align (align, BITS_PER_UNIT * (thissize / 2)),
+		       reverse);
       break;
 
     case ARRAY_TYPE:
@@ -4804,7 +4830,7 @@  output_constant (tree exp, unsigned HOST
       switch (TREE_CODE (exp))
 	{
 	case CONSTRUCTOR:
-	  return output_constructor (exp, size, align, NULL);
+	  return output_constructor (exp, size, align, reverse, NULL);
 	case STRING_CST:
 	  thissize
 	    = MIN ((unsigned HOST_WIDE_INT)TREE_STRING_LENGTH (exp), size);
@@ -4815,11 +4841,13 @@  output_constant (tree exp, unsigned HOST
 	    machine_mode inner = TYPE_MODE (TREE_TYPE (TREE_TYPE (exp)));
 	    unsigned int nalign = MIN (align, GET_MODE_ALIGNMENT (inner));
 	    int elt_size = GET_MODE_SIZE (inner);
-	    output_constant (VECTOR_CST_ELT (exp, 0), elt_size, align);
+	    output_constant (VECTOR_CST_ELT (exp, 0), elt_size, align,
+			     reverse);
 	    thissize = elt_size;
 	    for (unsigned int i = 1; i < VECTOR_CST_NELTS (exp); i++)
 	      {
-		output_constant (VECTOR_CST_ELT (exp, i), elt_size, nalign);
+		output_constant (VECTOR_CST_ELT (exp, i), elt_size, nalign,
+				 reverse);
 		thissize += elt_size;
 	      }
 	    break;
@@ -4832,7 +4860,7 @@  output_constant (tree exp, unsigned HOST
     case RECORD_TYPE:
     case UNION_TYPE:
       gcc_assert (TREE_CODE (exp) == CONSTRUCTOR);
-      return output_constructor (exp, size, align, NULL);
+      return output_constructor (exp, size, align, reverse, NULL);
 
     case ERROR_MARK:
       return 0;
@@ -4846,7 +4874,6 @@  output_constant (tree exp, unsigned HOST
 
   return size;
 }
-
 
 /* Subroutine of output_constructor, used for computing the size of
    arrays of unspecified length.  VAL must be a CONSTRUCTOR of an array
@@ -4908,6 +4935,7 @@  typedef struct {
   int last_relative_index;    /* Implicit or explicit index of the last
 				 array element output within a bitfield.  */
   bool byte_buffer_in_use;    /* Whether BYTE is in use.  */
+  bool reverse;               /* Whether reverse storage order is in use.  */
 
   /* Current element.  */
   tree field;      /* Current field decl in a record.  */
@@ -4940,7 +4968,8 @@  output_constructor_array_range (oc_local
       if (local->val == NULL_TREE)
 	assemble_zeros (fieldsize);
       else
-	fieldsize = output_constant (local->val, fieldsize, align2);
+	fieldsize
+	  = output_constant (local->val, fieldsize, align2, local->reverse);
 
       /* Count its size.  */
       local->total_bytes += fieldsize;
@@ -5026,7 +5055,8 @@  output_constructor_regular_field (oc_loc
   if (local->val == NULL_TREE)
     assemble_zeros (fieldsize);
   else
-    fieldsize = output_constant (local->val, fieldsize, align2);
+    fieldsize
+      = output_constant (local->val, fieldsize, align2, local->reverse);
 
   /* Count its size.  */
   local->total_bytes += fieldsize;
@@ -5124,7 +5154,7 @@  output_constructor_bitfield (oc_local_st
       temp_state.bit_offset = next_offset % BITS_PER_UNIT;
       temp_state.byte = local->byte;
       local->total_bytes
-	  += output_constructor (local->val, 0, 0, &temp_state);
+	+= output_constructor (local->val, 0, 0, local->reverse, &temp_state);
       local->byte = temp_state.byte;
       return;
     }
@@ -5150,7 +5180,7 @@  output_constructor_bitfield (oc_local_st
 
       /* Number of bits we can process at once (all part of the same byte).  */
       this_time = MIN (end_offset - next_offset, BITS_PER_UNIT - next_bit);
-      if (BYTES_BIG_ENDIAN)
+      if (local->reverse ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
 	{
 	  /* On big-endian machine, take the most significant bits (of the
 	     bits that are significant) first and put them into bytes from
@@ -5214,12 +5244,11 @@  output_constructor_bitfield (oc_local_st
    caller output state of relevance in recursive invocations.  */
 
 static unsigned HOST_WIDE_INT
-output_constructor (tree exp, unsigned HOST_WIDE_INT size,
-		    unsigned int align, oc_outer_state *outer)
+output_constructor (tree exp, unsigned HOST_WIDE_INT size, unsigned int align,
+		    bool reverse, oc_outer_state *outer)
 {
   unsigned HOST_WIDE_INT cnt;
   constructor_elt *ce;
-
   oc_local_state local;
 
   /* Setup our local state to communicate with helpers.  */
@@ -5236,6 +5265,11 @@  output_constructor (tree exp, unsigned H
   local.byte_buffer_in_use = outer != NULL;
   local.byte = outer ? outer->byte : 0;
   local.last_relative_index = -1;
+  /* The storage order is specified for every aggregate type.  */
+  if (AGGREGATE_TYPE_P (local.type))
+    local.reverse = TYPE_REVERSE_STORAGE_ORDER (local.type);
+  else
+    local.reverse = reverse;
 
   gcc_assert (HOST_BITS_PER_WIDE_INT >= BITS_PER_UNIT);
 
Index: gcc/tree-inline.c
===================================================================
--- gcc/tree-inline.c	(.../trunk)	(revision 224461)
+++ gcc/tree-inline.c	(.../branches/scalar-storage-order)	(revision 224467)
@@ -984,6 +984,7 @@  remap_gimple_op_r (tree *tp, int *walk_s
 	      && (!is_parm (TREE_OPERAND (old, 0))
 		  || (!id->transform_parameter && is_parm (ptr))))
 	    TREE_THIS_NOTRAP (*tp) = 1;
+	  REF_REVERSE_STORAGE_ORDER (*tp) = REF_REVERSE_STORAGE_ORDER (old);
 	  *walk_subtrees = 0;
 	  return NULL;
 	}
@@ -1241,6 +1242,7 @@  copy_tree_body_r (tree *tp, int *walk_su
 	      && (!is_parm (TREE_OPERAND (old, 0))
 		  || (!id->transform_parameter && is_parm (ptr))))
 	    TREE_THIS_NOTRAP (*tp) = 1;
+	  REF_REVERSE_STORAGE_ORDER (*tp) = REF_REVERSE_STORAGE_ORDER (old);
 	  *walk_subtrees = 0;
 	  return NULL;
 	}
Index: gcc/output.h
===================================================================
--- gcc/output.h	(.../trunk)	(revision 224461)
+++ gcc/output.h	(.../branches/scalar-storage-order)	(revision 224467)
@@ -280,7 +280,7 @@  extern section *get_named_text_section (
 
 #ifdef REAL_VALUE_TYPE_SIZE
 /* Assemble the floating-point constant D into an object of size MODE.  */
-extern void assemble_real (REAL_VALUE_TYPE, machine_mode, unsigned);
+extern void assemble_real (REAL_VALUE_TYPE, machine_mode, unsigned, bool);
 #endif
 
 /* Write the address of the entity given by SYMBOL to SEC.  */
Index: gcc/tree-outof-ssa.c
===================================================================
--- gcc/tree-outof-ssa.c	(.../trunk)	(revision 224461)
+++ gcc/tree-outof-ssa.c	(.../branches/scalar-storage-order)	(revision 224467)
@@ -332,7 +332,7 @@  insert_value_copy_on_edge (edge e, int d
   else if (src_mode == BLKmode)
     {
       x = dest_rtx;
-      store_expr (src, x, 0, false);
+      store_expr (src, x, 0, false, false);
     }
   else
     x = expand_expr (src, dest_rtx, dest_mode, EXPAND_NORMAL);
Index: gcc/gimple-expr.c
===================================================================
--- gcc/gimple-expr.c	(.../trunk)	(revision 224461)
+++ gcc/gimple-expr.c	(.../branches/scalar-storage-order)	(revision 224467)
@@ -168,14 +168,16 @@  useless_type_conversion_p (tree outer_ty
   else if (TREE_CODE (inner_type) == ARRAY_TYPE
 	   && TREE_CODE (outer_type) == ARRAY_TYPE)
     {
-      /* Preserve string attributes.  */
+      /* Preserve various attributes.  */
+      if (TYPE_REVERSE_STORAGE_ORDER (inner_type)
+	  != TYPE_REVERSE_STORAGE_ORDER (outer_type))
+	return false;
       if (TYPE_STRING_FLAG (inner_type) != TYPE_STRING_FLAG (outer_type))
 	return false;
 
       /* Conversions from array types with unknown extent to
 	 array types with known extent are not useless.  */
-      if (!TYPE_DOMAIN (inner_type)
-	  && TYPE_DOMAIN (outer_type))
+      if (!TYPE_DOMAIN (inner_type) && TYPE_DOMAIN (outer_type))
 	return false;
 
       /* Nor are conversions from array types with non-constant size to
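
The new array check in useless_type_conversion_p matters because two array
types can agree in mode and size yet differ in element storage order; treating
a conversion between them as useless would let the optimizers mix the two byte
orders.  Hypothetical types for illustration (the attribute is only accepted
on the enclosing aggregate in this series):

  struct __attribute__ ((scalar_storage_order ("big-endian"))) BE
  {
    int a[4];	/* Elements stored big-endian.  */
  };

  struct LE
  {
    int a[4];	/* Elements stored in target order.  */
  };

  /* The two int[4] types have the same size and mode but must not be
     considered interchangeable.  */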
Index: gcc/tree-core.h
===================================================================
--- gcc/tree-core.h	(.../trunk)	(revision 224461)
+++ gcc/tree-core.h	(.../branches/scalar-storage-order)	(revision 224467)
@@ -1119,8 +1119,14 @@  struct GTY(()) tree_base {
 
    saturating_flag:
 
+       TYPE_REVERSE_STORAGE_ORDER in
+           RECORD_TYPE, UNION_TYPE, QUAL_UNION_TYPE, ARRAY_TYPE
+
        TYPE_SATURATING in
-           all types
+           other types
+
+       REF_REVERSE_STORAGE_ORDER in
+           BIT_FIELD_REF, MEM_REF
 
        VAR_DECL_IS_VIRTUAL_OPERAND in
 	   VAR_DECL
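
For readers cross-referencing tree.h, which is touched by another patch in
this series and not shown here, the accessor presumably reuses the
saturating_flag bit along these lines (the exact form in the tree.h patch may
differ):

  #define TYPE_REVERSE_STORAGE_ORDER(NODE) \
    (TREE_CHECK4 (NODE, RECORD_TYPE, UNION_TYPE, QUAL_UNION_TYPE, \
		  ARRAY_TYPE)->base.u.bits.saturating_flag)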