[4/6] scalar-storage-order merge: bulk

Message ID 1938106.13zyn2TWAm@polaris

Commit Message

Eric Botcazou Oct. 6, 2015, 11:04 a.m. UTC
This is the bulk of the implementation.

	* calls.c (store_unaligned_arguments_into_pseudos): Adjust calls to
	extract_bit_field and store_bit_field.
	(initialize_argument_information): Adjust call to store_expr.
	(load_register_parameters): Adjust call to extract_bit_field.
	* expmed.c (check_reverse_storage_order_support): New function.
	(check_reverse_float_storage_order_support): Likewise.
	(flip_storage_order): Likewise.
	(store_bit_field_1): Add REVERSE parameter.  Flip the storage order
	of the value if it is true.  Pass REVERSE to recursive call after
	adjusting the target offset.
	Do not use extraction or movstrict instruction if REVERSE is true.
	Pass REVERSE to store_fixed_bit_field.
	(store_bit_field): Add REVERSE parameter and pass it to the above.
	(store_fixed_bit_field): Add REVERSE parameter and pass it to
	store_split_bit_field and store_fixed_bit_field_1.
	(store_fixed_bit_field_1): Add REVERSE parameter.  Flip the storage
	order of the value if it is true and adjust the target offset.
	(store_split_bit_field): Add REVERSE parameter and pass it to
	store_fixed_bit_field.  Adjust the target offset if it is true.
	(extract_bit_field_1): Add REVERSE parameter.  Flip the storage order
	of the value if it is true.  Pass REVERSE to recursive call after
	adjusting the target offset.
	Do not use extraction or subreg instruction if REVERSE is true.
	Pass REVERSE to extract_fixed_bit_field.
	(extract_bit_field): Add REVERSE parameter and pass it to the above.
	(extract_fixed_bit_field): Add REVERSE parameter and pass it to
	extract_split_bit_field and extract_fixed_bit_field_1.
	(extract_fixed_bit_field_1): Add REVERSE parameter.  Flip the storage
	order of the value if it is true and adjust the target offset.
	(extract_split_bit_field): Add REVERSE parameter and pass it to
	extract_fixed_bit_field.  Adjust the target offset if it is true.
	* expmed.h (flip_storage_order): Declare.
	(store_bit_field): Adjust prototype.
	(extract_bit_field): Likewise.
	* expr.c (emit_group_load_1): Adjust calls to extract_bit_field.
	(emit_group_store): Adjust call to store_bit_field.
	(copy_blkmode_from_reg): Likewise.
	(copy_blkmode_to_reg): Likewise.
	(write_complex_part): Likewise.
	(read_complex_part): Likewise.
	(optimize_bitfield_assignment_op): Add REVERSE parameter.  Assert
	that it isn't true if the target is a register.
	<PLUS_EXPR>: If it is, do not optimize unless bitsize is equal to 1,
	and flip the storage order of the value.
	<BIT_IOR_EXPR>: Flip the storage order of the value.
	(get_bit_range): Adjust call to get_inner_reference.
	(expand_assignment): Adjust calls to get_inner_reference, store_expr,
	optimize_bitfield_assignment_op and store_field.  Handle MEM_EXPRs
	with reverse storage order.
	(store_expr_with_bounds): Add REVERSE parameter and pass it to
	recursive calls and call to store_bit_field.  Force the value into a
	register if it is true and then flip the storage order of the value.
	(store_expr): Add REVERSE parameter and pass it to above.
	(categorize_ctor_elements_1): Adjust call to
	initializer_constant_valid_p.
	(store_constructor_field): Add REVERSE parameter and pass it to
	recursive calls and call to store_field.
	(store_constructor): Add REVERSE parameter and pass it to calls to
	store_constructor_field and store_expr.  Set it to true for an
	aggregate type with TYPE_REVERSE_STORAGE_ORDER.
	(store_field): Add REVERSE parameter and pass it to recursive calls
	and calls to store_expr and store_bit_field.  Temporarily flip the
	storage order of the value with record type and integral mode and
	adjust the shift if it is true.
	(get_inner_reference): Add PREVERSEP parameter and set it to true
	upon encountering a reference with reverse storage order.
	(expand_expr_addr_expr_1): Adjust call to get_inner_reference.
	(expand_constructor): Adjust call to store_constructor.
	(expand_expr_real_2) <CASE_CONVERT>: Pass TYPE_REVERSE_STORAGE_ORDER
	of the union type to store_expr in the MEM case and assert that it
	isn't set in the REG case.  Adjust call to store_field.
	(expand_expr_real_1) <MEM_REF>: Handle reverse storage order.
	<normal_inner_ref>: Add REVERSEP variable and adjust calls to
	get_inner_reference and extract_bit_field.  Temporarily flip the
	storage order of the value with record type and integral mode and
	adjust the shift if it is true.  Flip the storage order of the value
	at the end if it is true.
	<VIEW_CONVERT_EXPR>: Add REVERSEP variable and adjust call to
	get_inner_reference.  Do not fetch an inner reference if it is true.
	* expr.h (store_expr_with_bounds): Adjust prototype.
	(store_expr): Likewise.
	* fold-const.c (make_bit_field_ref): Add REVERSEP parameter and set
	REF_REVERSE_STORAGE_ORDER on the reference according to it.
	(optimize_bit_field_compare): Deal with reverse storage order.
	Adjust calls to get_inner_reference and make_bit_field_ref.
	(decode_field_reference): Add PREVERSEP parameter and adjust call to
	get_inner_reference.
	(fold_truth_andor_1): Deal with reverse storage order.  Adjust calls
	to decode_field_reference and make_bit_field_ref.
	(fold_unary_loc) <CASE_CONVERT>: Adjust call to get_inner_reference.
	<VIEW_CONVERT_EXPR>: Propagate the REF_REVERSE_STORAGE_ORDER flag.
	(fold_comparison): Adjust call to get_inner_reference.
	(split_address_to_core_and_offset): Adjust call to
	get_inner_reference.
	* gimple-expr.c (useless_type_conversion_p): Return false for array
	types with different TYPE_REVERSE_STORAGE_ORDER flag.
	* gimplify.c (gimplify_expr) <MEM_REF>: Propagate the
	REF_REVERSE_STORAGE_ORDER flag.
	* lto-streamer-out.c (hash_tree): Deal with
	TYPE_REVERSE_STORAGE_ORDER.
	* output.h (assemble_real): Adjust prototype.
	* print-tree.c (print_node): Convey TYPE_REVERSE_STORAGE_ORDER.
	* stor-layout.c (finish_record_layout): Propagate the
	TYPE_REVERSE_STORAGE_ORDER flag to the variants.
	* tree-core.h (TYPE_REVERSE_STORAGE_ORDER): Document.
	(TYPE_SATURATING): Adjust.
	(REF_REVERSE_STORAGE_ORDER): Document.
	* tree-dfa.c (get_ref_base_and_extent): Add PREVERSE parameter and
	set it to true upon encountering a reference with reverse storage
	order.
	* tree-dfa.h (get_ref_base_and_extent): Adjust prototype.
	* tree-inline.c (remap_gimple_op_r): Propagate the
	REF_REVERSE_STORAGE_ORDER flag.
	(copy_tree_body_r): Likewise.
	* tree-outof-ssa.c (insert_value_copy_on_edge): Adjust call to
	store_expr.
	* tree-streamer-in.c (unpack_ts_base_value_fields): Deal with
	TYPE_REVERSE_STORAGE_ORDER and REF_REVERSE_STORAGE_ORDER.
	* tree-streamer-out.c (pack_ts_base_value_fields): Likewise.
	* tree.c (stabilize_reference) <BIT_FIELD_REF>: Propagate the
	REF_REVERSE_STORAGE_ORDER flag.
	(verify_type_variant): Deal with TYPE_REVERSE_STORAGE_ORDER.
	(gimple_canonical_types_compatible_p): Likewise.
	* tree.h (TYPE_REVERSE_STORAGE_ORDER): New flag.
	(TYPE_SATURATING): Adjust.
	(REF_REVERSE_STORAGE_ORDER): New flag.
	(reverse_storage_order_for_component_p): New inline predicate.
	(storage_order_barrier_p): Likewise.
	(get_inner_reference): Adjust prototype.
	* varasm.c (assemble_real): Add REVERSE parameter.  Flip the storage
	order of the value if REVERSE is true.
	(compare_constant) <CONSTRUCTOR>: Compare TYPE_REVERSE_STORAGE_ORDER.
	(assemble_constant_contents): Adjust call to output_constant.
	(output_constant_pool_2): Adjust call to assemble_real.
	(initializer_constant_valid_p_1) <CONSTRUCTOR>: Deal with
	TYPE_REVERSE_STORAGE_ORDER.
	(output_constant): Add REVERSE parameter.
	<INTEGER_TYPE>: Flip the storage order of the value if REVERSE is
	true.
	<REAL_TYPE>: Adjust call to assemble_real.
	<COMPLEX_TYPE>: Pass it to recursive calls.
	<ARRAY_TYPE>: Likewise.  Adjust call to output_constructor.
	<RECORD_TYPE>: Likewise.  Adjust call to output_constructor.
	(struct oc_local_state): Add REVERSE field.
	(output_constructor_array_range): Adjust calls to output_constant.
	(output_constructor_regular_field): Likewise.
	(output_constructor_bitfield): Adjust call to output_constructor.
	Flip the storage order of the value if REVERSE is true.
	(output_constructor): Add REVERSE parameter.  Set it to true for an
	aggregate type with TYPE_REVERSE_STORAGE_ORDER.  Adjust call to
	output_constructor_bitfield.
lto/
	* lto.c (compare_tree_sccs_1): Deal with TYPE_REVERSE_STORAGE_ORDER.

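For context, the user-visible feature that all of the above implements is
the scalar_storage_order type attribute (and the companion -fsso-struct
option discussed in the review below).  A minimal sketch, assuming a
little-endian target; the struct tag and field names are illustrative only:

  /* Scalar fields of this struct are stored big-endian regardless of the
     target.  Reads and writes still yield the logical value; the byte
     reversal is confined to the memory accesses themselves, which is what
     the REVERSE parameters threaded through store_bit_field and
     extract_bit_field implement.  */
  struct __attribute__ ((scalar_storage_order ("big-endian"))) be_header
  {
    unsigned int magic;      /* bytes in memory: MSB first */
    unsigned short version;  /* likewise */
  };
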
 calls.c             |   10 -
 expmed.c            |  262 +++++++++++++++++++++++++++++++++---------
 expmed.h            |    8 -
 expr.c              |  324 +++++++++++++++++++++++++++++++-------------------
 expr.h              |    5 
 fold-const.c        |  132 ++++++++++++---------
 gimple-expr.c       |    8 -
 gimplify.c          |    2 
 lto-streamer-out.c  |    3 
 lto/lto.c           |    5 
 output.h            |    2 
 print-tree.c        |    7 +
 stor-layout.c       |   11 +
 tree-core.h         |    8 +
 tree-dfa.c          |   11 +
 tree-dfa.h          |    2 
 tree-inline.c       |    2 
 tree-outof-ssa.c    |    2 
 tree-streamer-in.c  |    7 -
 tree-streamer-out.c |    7 -
 tree.c              |   10 +
 tree.h              |   85 +++++++++++++
 varasm.c            |  117 +++++++++++++-----
 varasm.h            |    2 
 24 files changed, 742 insertions(+), 290 deletions(-)

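The central new primitive is flip_storage_order, which reverses the storage
order of a value within its mode.  A hedged C sketch of the intended
semantics for a 32-bit integral mode (the helper name here is made up; for
multi-word modes the constituent words are reversed as well):

  #include <stdint.h>

  /* A reverse-order store of X amounts to a native store of the flipped
     value, and a reverse-order load to a native load followed by a flip.  */
  static inline uint32_t
  flip_storage_order_si (uint32_t x)
  {
    return __builtin_bswap32 (x);  /* byte swap == storage order flip */
  }
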
Comments

Jeff Law Oct. 13, 2015, 4:07 p.m. UTC | #1
On 10/06/2015 05:04 AM, Eric Botcazou wrote:
> This is the bulk of the implementation.
>
> [ChangeLog snipped]
I must admit, I'm surprised at how many places we compare types in
ever-so-slightly different ways.  This kind of patch makes that obvious.
Not asking you to fix that, just a minor rant.

I suspect there are many comments that we should update as a result of
this work.  Essentially there are many throughout GCC of the form "On a
big-endian target" and the like.  With these changes endianness is more a
property of the data -- the vast majority of the time the data's order
matches the target's, but with this patch it can vary.

I just happened to spot one in varasm.c::output_constructor_bitfield, as
the comment starts shortly after some code this patch changes.  No doubt
there are others.  I'm torn on whether to ask you to scan and update
them.  It's a lot of work and I'm not terribly sure how valuable it'll
generally be.

Thanks for not trying too hard to optimize this stuff; it makes the
expmed.c patches (for example) a lot easier to work through :-)

I didn't even try to verify you've got all the paths covered.  I mostly
tried to make sure the patch didn't break existing code.  I'm going to
assume that the patch essentially works and that, if there are paths
missing where reversal is needed, you'll take care of them as they're
exposed.

I'm a bit dismayed at the number of changes to fold-const.c, but 
presumably there's no good way around them.  I probably would have 
looked for a way to punt earlier (that may not be trivial though). 
Given you've done the work, no need to undo it now.

Throughout the code, in cases where you've just added an argument, I've 
assumed you've passed it correctly in the callers -- I largely glossed 
over those with that assumption in mind.

I think this patch is fine.


Jeff
Eric Botcazou Oct. 20, 2015, 4:33 p.m. UTC | #2
> I suspect there are many comments that we should update as a result of
> this work.  Essentially there's many throughout GCC of the form "On a
> big-endian target" and the like.  With these changes it's more a
> property of the data -- the vast majority of the time the data's
> property is the same as the target, but with this patch it can vary.
> 
> I just happened to spot one in varasm.c::output_constructor_bitfield, as
> the comment started shortly after some code this patch changes.  No
> doubt there's others.  I'm torn whether or not to ask you to scan and
> update them.  It's a lot of work and I'm not terribly sure how valuable
> it'll generally be.

I have adjusted the output_constructor_bitfield case to "For big-endian data".
The cases in expmed.c have naked "big-endian" or "big-endian case" so they
are OK.  I have changed two sibling cases in expr.c but left the others
unchanged since they are about calling conventions.  I think that's pretty
much it for the files that are touched by the patch.

> Thanks for not trying too hard to optimize this stuff, it makes the
> expmed.c patches (for example) a lot easier to work through :-)
> 
> I didn't even try to verify you've got all the paths covered.  I mostly
> tried to make sure the patch didn't break existing code.  I'm going to
> assume that the patch essentially works and that if there's paths
> missing where reversal is needed that you'll take care of them as
> they're exposed.

Yes, the main issue in expr.c & expmed.c is getting all the paths covered.
I have run the compile and execute C torture testsuites with -fsso-struct set
to big-endian on x86-64, so I'm relatively confident, but that's not a proof.
And one of the design constraints of the implementation was to change nothing
in the default endianness support, so I'm more confident about the absence of
breakage (IIRC we didn't get a single PR for a breakage with the AdaCore
compilers, while we did get a few for cases where the data was not reversed).
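
As an illustration of what such an execute test can check (a sketch in the
spirit of those runs, not one of the actual torture tests): with
-fsso-struct=big-endian on a little-endian target like x86-64, the in-memory
bytes of a scalar field are reversed while the value read back is unchanged.

  #include <assert.h>
  #include <string.h>

  struct rev { unsigned int x; };  /* reversed by -fsso-struct=big-endian */

  int
  main (void)
  {
    struct rev r = { 0x12345678u };
    unsigned char b[sizeof (r)];
    memcpy (b, &r, sizeof (r));             /* copy the whole struct */
    assert (r.x == 0x12345678u);            /* the value is unaffected */
    assert (b[0] == 0x12 && b[3] == 0x78);  /* big-endian bytes in memory */
    return 0;
  }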

> I'm a bit dismayed at the number of changes to fold-const.c, but
> presumably there's no good way around them.  I probably would have
> looked for a way to punt earlier (that may not be trivial though).
> Given you've done the work, no need to undo it now.

Yes, that's the bitfield optimization stuff, so it depends on the endianness.
Initially the changes were even more pervasive because I tried to optimize
e.g. equality comparisons of two bitfields with reverse endianness in there.
But this broke the "all accesses to a scalar in memory are done with the same
storage order" invariant and thus caused annoying issues later on.
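
For illustration, the sort of code fold_truth_andor_1 now has to be careful
with (the types are hypothetical): the two bitfield comparisons may still be
merged into wider ones, but each merged BIT_FIELD_REF must carry the storage
order of its own object -- which is what the new *_reversep variables and
the REVERSEP argument of make_bit_field_ref track.

  struct __attribute__ ((scalar_storage_order ("big-endian"))) s_be
  { unsigned int a : 8, b : 8; };

  struct s_native
  { unsigned int a : 8, b : 8; };

  int
  same (struct s_be *p, struct s_native *q)
  {
    /* Merging into one 16-bit comparison per object is only correct if the
       reference to *p keeps REF_REVERSE_STORAGE_ORDER set and its bit
       positions are computed with the reversed endianness.  */
    return p->a == q->a && p->b == q->b;
  }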

> I think this patch is fine.

OK, thanks for the thorough review.  All the parts have been approved, modulo
the C++ FE part.  As I already explained, this part is very likely incomplete
(based on the changes that were made in the Ada FE proper) and only makes sure
that C code compiled by the C++ compiler works as intended, but nothing more.
So I propose to install only the generic, C and Ada changes for now and to
explicitly disable the feature for C++ until a more complete implementation of
the C++ FE part is written.  I'm OK to work on it, but I'll probably need help.

Patch

Index: tree.c
===================================================================
--- tree.c	(.../trunk/gcc)	(revision 228112)
+++ tree.c	(.../branches/scalar-storage-order/gcc)	(revision 228133)
@@ -4166,6 +4166,7 @@  stabilize_reference (tree ref)
       result = build_nt (BIT_FIELD_REF,
 			 stabilize_reference (TREE_OPERAND (ref, 0)),
 			 TREE_OPERAND (ref, 1), TREE_OPERAND (ref, 2));
+      REF_REVERSE_STORAGE_ORDER (result) = REF_REVERSE_STORAGE_ORDER (ref);
       break;
 
     case ARRAY_REF:
@@ -12814,7 +12815,10 @@  verify_type_variant (const_tree t, tree
   verify_variant_match (TYPE_PACKED);
   if (TREE_CODE (t) == REFERENCE_TYPE)
     verify_variant_match (TYPE_REF_IS_RVALUE);
-  verify_variant_match (TYPE_SATURATING);
+  if (AGGREGATE_TYPE_P (t))
+    verify_variant_match (TYPE_REVERSE_STORAGE_ORDER);
+  else
+    verify_variant_match (TYPE_SATURATING);
   /* FIXME: This check trigger during libstdc++ build.  */
   if (RECORD_OR_UNION_TYPE_P (t) && COMPLETE_TYPE_P (t) && 0)
     verify_variant_match (TYPE_FINAL_P);
@@ -13114,6 +13118,7 @@  gimple_canonical_types_compatible_p (con
       if (!gimple_canonical_types_compatible_p (TREE_TYPE (t1), TREE_TYPE (t2),
 						trust_type_canonical)
 	  || TYPE_STRING_FLAG (t1) != TYPE_STRING_FLAG (t2)
+	  || TYPE_REVERSE_STORAGE_ORDER (t1) != TYPE_REVERSE_STORAGE_ORDER (t2)
 	  || TYPE_NONALIASED_COMPONENT (t1) != TYPE_NONALIASED_COMPONENT (t2))
 	return false;
       else
@@ -13187,6 +13192,9 @@  gimple_canonical_types_compatible_p (con
       {
 	tree f1, f2;
 
+	if (TYPE_REVERSE_STORAGE_ORDER (t1) != TYPE_REVERSE_STORAGE_ORDER (t2))
+	  return false;
+
 	/* For aggregate types, all the fields must be the same.  */
 	for (f1 = TYPE_FIELDS (t1), f2 = TYPE_FIELDS (t2);
 	     f1 || f2;
Index: tree.h
===================================================================
--- tree.h	(.../trunk/gcc)	(revision 228112)
+++ tree.h	(.../branches/scalar-storage-order/gcc)	(revision 228133)
@@ -896,8 +896,29 @@  extern void omp_clause_range_check_faile
 #define IDENTIFIER_TRANSPARENT_ALIAS(NODE) \
   (IDENTIFIER_NODE_CHECK (NODE)->base.deprecated_flag)
 
-/* In fixed-point types, means a saturating type.  */
-#define TYPE_SATURATING(NODE) (TYPE_CHECK (NODE)->base.u.bits.saturating_flag)
+/* In an aggregate type, indicates that the scalar fields of the type are
+   stored in reverse order from the target order.  This effectively
+   toggles BYTES_BIG_ENDIAN and WORDS_BIG_ENDIAN within the type.  */
+#define TYPE_REVERSE_STORAGE_ORDER(NODE) \
+  (TREE_CHECK4 (NODE, RECORD_TYPE, UNION_TYPE, QUAL_UNION_TYPE, ARRAY_TYPE)->base.u.bits.saturating_flag)
+
+/* In a non-aggregate type, indicates a saturating type.  */
+#define TYPE_SATURATING(NODE) \
+  (TREE_NOT_CHECK4 (NODE, RECORD_TYPE, UNION_TYPE, QUAL_UNION_TYPE, ARRAY_TYPE)->base.u.bits.saturating_flag)
+
+/* In a BIT_FIELD_REF and MEM_REF, indicates that the reference is to a group
+   of bits stored in reverse order from the target order.  This effectively
+   toggles both BYTES_BIG_ENDIAN and WORDS_BIG_ENDIAN for the reference.
+
+   The overall strategy is to preserve the invariant that every scalar in
+   memory is associated with a single storage order, i.e. all accesses to
+   this scalar are done with the same storage order.  This invariant makes
+   it possible to factor out the storage order in most transformations, as
+   only the address and/or the value (in target order) matter for them.
+   But, of course, the storage order must be preserved when the accesses
+   themselves are rewritten or transformed.  */
+#define REF_REVERSE_STORAGE_ORDER(NODE) \
+  (TREE_CHECK2 (NODE, BIT_FIELD_REF, MEM_REF)->base.u.bits.saturating_flag)
 
 /* These flags are available for each language front end to use internally.  */
 #define TREE_LANG_FLAG_0(NODE) \
@@ -4288,6 +4309,64 @@  handled_component_p (const_tree t)
     }
 }
 
+/* Return true if T is a component with reverse storage order.  */
+
+static inline bool
+reverse_storage_order_for_component_p (tree t)
+{
+  /* The storage order only applies to scalar components.  */
+  if (AGGREGATE_TYPE_P (TREE_TYPE (t)) || VECTOR_TYPE_P (TREE_TYPE (t)))
+    return false;
+
+  if (TREE_CODE (t) == REALPART_EXPR || TREE_CODE (t) == IMAGPART_EXPR)
+    t = TREE_OPERAND (t, 0);
+
+  switch (TREE_CODE (t))
+    {
+    case ARRAY_REF:
+    case COMPONENT_REF:
+      /* ??? Fortran can take COMPONENT_REF of a void type.  */
+      return !VOID_TYPE_P (TREE_TYPE (TREE_OPERAND (t, 0)))
+	     && TYPE_REVERSE_STORAGE_ORDER (TREE_TYPE (TREE_OPERAND (t, 0)));
+
+    case BIT_FIELD_REF:
+    case MEM_REF:
+      return REF_REVERSE_STORAGE_ORDER (t);
+
+    case ARRAY_RANGE_REF:
+    case VIEW_CONVERT_EXPR:
+    default:
+      return false;
+    }
+
+  gcc_unreachable ();
+}
+
+/* Return true if T is a storage order barrier, i.e. a VIEW_CONVERT_EXPR
+   that can modify the storage order of objects.  Note that, even if the
+   TYPE_REVERSE_STORAGE_ORDER flag is set on both the inner type and the
+   outer type, a VIEW_CONVERT_EXPR can modify the storage order because
+   it can change the partition of the aggregate object into scalars.  */
+
+static inline bool
+storage_order_barrier_p (const_tree t)
+{
+  if (TREE_CODE (t) != VIEW_CONVERT_EXPR)
+    return false;
+
+  if (AGGREGATE_TYPE_P (TREE_TYPE (t))
+      && TYPE_REVERSE_STORAGE_ORDER (TREE_TYPE (t)))
+    return true;
+
+  tree op = TREE_OPERAND (t, 0);
+
+  if (AGGREGATE_TYPE_P (TREE_TYPE (op))
+      && TYPE_REVERSE_STORAGE_ORDER (TREE_TYPE (op)))
+    return true;
+
+  return false;
+}
+
 /* Given a DECL or TYPE, return the scope in which it was declared, or
    NULL_TREE if there is no containing scope.  */
 
@@ -5088,7 +5167,7 @@  extern bool complete_ctor_at_level_p (co
    the access position and size.  */
 extern tree get_inner_reference (tree, HOST_WIDE_INT *, HOST_WIDE_INT *,
 				 tree *, machine_mode *, int *, int *,
-				 bool);
+				 int *, bool);
 
 extern tree build_personality_function (const char *);
 
Index: fold-const.c
===================================================================
--- fold-const.c	(.../trunk/gcc)	(revision 228112)
+++ fold-const.c	(.../branches/scalar-storage-order/gcc)	(revision 228133)
@@ -119,12 +119,12 @@  static int operand_equal_for_comparison_
 static int twoval_comparison_p (tree, tree *, tree *, int *);
 static tree eval_subst (location_t, tree, tree, tree, tree, tree);
 static tree make_bit_field_ref (location_t, tree, tree,
-				HOST_WIDE_INT, HOST_WIDE_INT, int);
+				HOST_WIDE_INT, HOST_WIDE_INT, int, int);
 static tree optimize_bit_field_compare (location_t, enum tree_code,
 					tree, tree, tree);
 static tree decode_field_reference (location_t, tree, HOST_WIDE_INT *,
 				    HOST_WIDE_INT *,
-				    machine_mode *, int *, int *,
+				    machine_mode *, int *, int *, int *,
 				    tree *, tree *);
 static int simple_operand_p (const_tree);
 static bool simple_operand_p_2 (tree);
@@ -3614,15 +3614,17 @@  distribute_real_division (location_t loc
 }
 
 /* Return a BIT_FIELD_REF of type TYPE to refer to BITSIZE bits of INNER
-   starting at BITPOS.  The field is unsigned if UNSIGNEDP is nonzero.  */
+   starting at BITPOS.  The field is unsigned if UNSIGNEDP is nonzero
+   and uses reverse storage order if REVERSEP is nonzero.  */
 
 static tree
 make_bit_field_ref (location_t loc, tree inner, tree type,
-		    HOST_WIDE_INT bitsize, HOST_WIDE_INT bitpos, int unsignedp)
+		    HOST_WIDE_INT bitsize, HOST_WIDE_INT bitpos,
+		    int unsignedp, int reversep)
 {
   tree result, bftype;
 
-  if (bitpos == 0)
+  if (bitpos == 0 && !reversep)
     {
       tree size = TYPE_SIZE (TREE_TYPE (inner));
       if ((INTEGRAL_TYPE_P (TREE_TYPE (inner))
@@ -3639,6 +3641,7 @@  make_bit_field_ref (location_t loc, tree
 
   result = build3_loc (loc, BIT_FIELD_REF, bftype, inner,
 		       size_int (bitsize), bitsize_int (bitpos));
+  REF_REVERSE_STORAGE_ORDER (result) = reversep;
 
   if (bftype != type)
     result = fold_convert_loc (loc, type, result);
@@ -3676,6 +3679,7 @@  optimize_bit_field_compare (location_t l
   int const_p = TREE_CODE (rhs) == INTEGER_CST;
   machine_mode lmode, rmode, nmode;
   int lunsignedp, runsignedp;
+  int lreversep, rreversep;
   int lvolatilep = 0, rvolatilep = 0;
   tree linner, rinner = NULL_TREE;
   tree mask;
@@ -3687,20 +3691,23 @@  optimize_bit_field_compare (location_t l
      do anything if the inner expression is a PLACEHOLDER_EXPR since we
      then will no longer be able to replace it.  */
   linner = get_inner_reference (lhs, &lbitsize, &lbitpos, &offset, &lmode,
-				&lunsignedp, &lvolatilep, false);
+				&lunsignedp, &lreversep, &lvolatilep, false);
   if (linner == lhs || lbitsize == GET_MODE_BITSIZE (lmode) || lbitsize < 0
       || offset != 0 || TREE_CODE (linner) == PLACEHOLDER_EXPR || lvolatilep)
     return 0;
 
- if (!const_p)
+  if (const_p)
+    rreversep = lreversep;
+  else
    {
      /* If this is not a constant, we can only do something if bit positions,
-	sizes, and signedness are the same.  */
-     rinner = get_inner_reference (rhs, &rbitsize, &rbitpos, &offset, &rmode,
-				   &runsignedp, &rvolatilep, false);
+	sizes, signedness and storage order are the same.  */
+     rinner
+       = get_inner_reference (rhs, &rbitsize, &rbitpos, &offset, &rmode,
+			      &runsignedp, &rreversep, &rvolatilep, false);
 
      if (rinner == rhs || lbitpos != rbitpos || lbitsize != rbitsize
-	 || lunsignedp != runsignedp || offset != 0
+	 || lunsignedp != runsignedp || lreversep != rreversep || offset != 0
 	 || TREE_CODE (rinner) == PLACEHOLDER_EXPR || rvolatilep)
        return 0;
    }
@@ -3728,7 +3735,7 @@  optimize_bit_field_compare (location_t l
   if (nbitsize == lbitsize)
     return 0;
 
-  if (BYTES_BIG_ENDIAN)
+  if (lreversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
     lbitpos = nbitsize - lbitsize - lbitpos;
 
   /* Make the mask to be used against the extracted field.  */
@@ -3745,17 +3752,17 @@  optimize_bit_field_compare (location_t l
 				     make_bit_field_ref (loc, linner,
 							 unsigned_type,
 							 nbitsize, nbitpos,
-							 1),
+							 1, lreversep),
 				     mask),
 			fold_build2_loc (loc, BIT_AND_EXPR, unsigned_type,
 				     make_bit_field_ref (loc, rinner,
 							 unsigned_type,
 							 nbitsize, nbitpos,
-							 1),
+							 1, rreversep),
 				     mask));
 
-  /* Otherwise, we are handling the constant case. See if the constant is too
-     big for the field.  Warn and return a tree of for 0 (false) if so.  We do
+  /* Otherwise, we are handling the constant case.  See if the constant is too
+     big for the field.  Warn and return a tree for 0 (false) if so.  We do
      this not only for its own sake, but to avoid having to test for this
      error case below.  If we didn't, we might generate wrong code.
 
@@ -3793,7 +3800,8 @@  optimize_bit_field_compare (location_t l
   /* Make a new bitfield reference, shift the constant over the
      appropriate number of bits and mask it with the computed mask
      (in case this was a signed field).  If we changed it, make a new one.  */
-  lhs = make_bit_field_ref (loc, linner, unsigned_type, nbitsize, nbitpos, 1);
+  lhs = make_bit_field_ref (loc, linner, unsigned_type, nbitsize, nbitpos, 1,
+			    lreversep);
 
   rhs = const_binop (BIT_AND_EXPR,
 		     const_binop (LSHIFT_EXPR,
@@ -3821,6 +3829,8 @@  optimize_bit_field_compare (location_t l
 
    *PUNSIGNEDP is set to the signedness of the field.
 
+   *PREVERSEP is set to the storage order of the field.
+
    *PMASK is set to the mask used.  This is either contained in a
    BIT_AND_EXPR or derived from the width of the field.
 
@@ -3832,7 +3842,7 @@  optimize_bit_field_compare (location_t l
 static tree
 decode_field_reference (location_t loc, tree exp, HOST_WIDE_INT *pbitsize,
 			HOST_WIDE_INT *pbitpos, machine_mode *pmode,
-			int *punsignedp, int *pvolatilep,
+			int *punsignedp, int *preversep, int *pvolatilep,
 			tree *pmask, tree *pand_mask)
 {
   tree outer_type = 0;
@@ -3865,7 +3875,7 @@  decode_field_reference (location_t loc,
     }
 
   inner = get_inner_reference (exp, pbitsize, pbitpos, &offset, pmode,
-			       punsignedp, pvolatilep, false);
+			       punsignedp, preversep, pvolatilep, false);
   if ((inner == exp && and_mask == 0)
       || *pbitsize < 0 || offset != 0
       || TREE_CODE (inner) == PLACEHOLDER_EXPR)
@@ -5369,6 +5379,7 @@  fold_truth_andor_1 (location_t loc, enum
   HOST_WIDE_INT xll_bitpos, xlr_bitpos, xrl_bitpos, xrr_bitpos;
   HOST_WIDE_INT lnbitsize, lnbitpos, rnbitsize, rnbitpos;
   int ll_unsignedp, lr_unsignedp, rl_unsignedp, rr_unsignedp;
+  int ll_reversep, lr_reversep, rl_reversep, rr_reversep;
   machine_mode ll_mode, lr_mode, rl_mode, rr_mode;
   machine_mode lnmode, rnmode;
   tree ll_mask, lr_mask, rl_mask, rr_mask;
@@ -5481,33 +5492,39 @@  fold_truth_andor_1 (location_t loc, enum
   volatilep = 0;
   ll_inner = decode_field_reference (loc, ll_arg,
 				     &ll_bitsize, &ll_bitpos, &ll_mode,
-				     &ll_unsignedp, &volatilep, &ll_mask,
-				     &ll_and_mask);
+				     &ll_unsignedp, &ll_reversep, &volatilep,
+				     &ll_mask, &ll_and_mask);
   lr_inner = decode_field_reference (loc, lr_arg,
 				     &lr_bitsize, &lr_bitpos, &lr_mode,
-				     &lr_unsignedp, &volatilep, &lr_mask,
-				     &lr_and_mask);
+				     &lr_unsignedp, &lr_reversep, &volatilep,
+				     &lr_mask, &lr_and_mask);
   rl_inner = decode_field_reference (loc, rl_arg,
 				     &rl_bitsize, &rl_bitpos, &rl_mode,
-				     &rl_unsignedp, &volatilep, &rl_mask,
-				     &rl_and_mask);
+				     &rl_unsignedp, &rl_reversep, &volatilep,
+				     &rl_mask, &rl_and_mask);
   rr_inner = decode_field_reference (loc, rr_arg,
 				     &rr_bitsize, &rr_bitpos, &rr_mode,
-				     &rr_unsignedp, &volatilep, &rr_mask,
-				     &rr_and_mask);
+				     &rr_unsignedp, &rr_reversep, &volatilep,
+				     &rr_mask, &rr_and_mask);
 
   /* It must be true that the inner operation on the lhs of each
      comparison must be the same if we are to be able to do anything.
      Then see if we have constants.  If not, the same must be true for
      the rhs's.  */
-  if (volatilep || ll_inner == 0 || rl_inner == 0
+  if (volatilep
+      || ll_reversep != rl_reversep
+      || ll_inner == 0 || rl_inner == 0
       || ! operand_equal_p (ll_inner, rl_inner, 0))
     return 0;
 
   if (TREE_CODE (lr_arg) == INTEGER_CST
       && TREE_CODE (rr_arg) == INTEGER_CST)
-    l_const = lr_arg, r_const = rr_arg;
-  else if (lr_inner == 0 || rr_inner == 0
+    {
+      l_const = lr_arg, r_const = rr_arg;
+      lr_reversep = ll_reversep;
+    }
+  else if (lr_reversep != rr_reversep
+	   || lr_inner == 0 || rr_inner == 0
 	   || ! operand_equal_p (lr_inner, rr_inner, 0))
     return 0;
   else
@@ -5560,7 +5577,7 @@  fold_truth_andor_1 (location_t loc, enum
   lntype = lang_hooks.types.type_for_size (lnbitsize, 1);
   xll_bitpos = ll_bitpos - lnbitpos, xrl_bitpos = rl_bitpos - lnbitpos;
 
-  if (BYTES_BIG_ENDIAN)
+  if (ll_reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
     {
       xll_bitpos = lnbitsize - xll_bitpos - ll_bitsize;
       xrl_bitpos = lnbitsize - xrl_bitpos - rl_bitsize;
@@ -5625,7 +5642,7 @@  fold_truth_andor_1 (location_t loc, enum
       rntype = lang_hooks.types.type_for_size (rnbitsize, 1);
       xlr_bitpos = lr_bitpos - rnbitpos, xrr_bitpos = rr_bitpos - rnbitpos;
 
-      if (BYTES_BIG_ENDIAN)
+      if (lr_reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
 	{
 	  xlr_bitpos = rnbitsize - xlr_bitpos - lr_bitsize;
 	  xrr_bitpos = rnbitsize - xrr_bitpos - rr_bitsize;
@@ -5648,12 +5665,12 @@  fold_truth_andor_1 (location_t loc, enum
       if (lnbitsize == rnbitsize && xll_bitpos == xlr_bitpos)
 	{
 	  lhs = make_bit_field_ref (loc, ll_inner, lntype, lnbitsize, lnbitpos,
-				    ll_unsignedp || rl_unsignedp);
+				    ll_unsignedp || rl_unsignedp, ll_reversep);
 	  if (! all_ones_mask_p (ll_mask, lnbitsize))
 	    lhs = build2 (BIT_AND_EXPR, lntype, lhs, ll_mask);
 
 	  rhs = make_bit_field_ref (loc, lr_inner, rntype, rnbitsize, rnbitpos,
-				    lr_unsignedp || rr_unsignedp);
+				    lr_unsignedp || rr_unsignedp, lr_reversep);
 	  if (! all_ones_mask_p (lr_mask, rnbitsize))
 	    rhs = build2 (BIT_AND_EXPR, rntype, rhs, lr_mask);
 
@@ -5676,10 +5693,12 @@  fold_truth_andor_1 (location_t loc, enum
 
 	  lhs = make_bit_field_ref (loc, ll_inner, lntype,
 				    ll_bitsize + rl_bitsize,
-				    MIN (ll_bitpos, rl_bitpos), ll_unsignedp);
+				    MIN (ll_bitpos, rl_bitpos),
+				    ll_unsignedp, ll_reversep);
 	  rhs = make_bit_field_ref (loc, lr_inner, rntype,
 				    lr_bitsize + rr_bitsize,
-				    MIN (lr_bitpos, rr_bitpos), lr_unsignedp);
+				    MIN (lr_bitpos, rr_bitpos),
+				    lr_unsignedp, lr_reversep);
 
 	  ll_mask = const_binop (RSHIFT_EXPR, ll_mask,
 				 size_int (MIN (xll_bitpos, xrl_bitpos)));
@@ -5742,7 +5761,7 @@  fold_truth_andor_1 (location_t loc, enum
      that field, perform the mask operation.  Then compare with the
      merged constant.  */
   result = make_bit_field_ref (loc, ll_inner, lntype, lnbitsize, lnbitpos,
-			       ll_unsignedp || rl_unsignedp);
+			       ll_unsignedp || rl_unsignedp, ll_reversep);
 
   ll_mask = const_binop (BIT_IOR_EXPR, ll_mask, rl_mask);
   if (! all_ones_mask_p (ll_mask, lnbitsize))
@@ -7597,10 +7616,11 @@  fold_unary_loc (location_t loc, enum tre
 	  HOST_WIDE_INT bitsize, bitpos;
 	  tree offset;
 	  machine_mode mode;
-	  int unsignedp, volatilep;
-          tree base = TREE_OPERAND (op0, 0);
-	  base = get_inner_reference (base, &bitsize, &bitpos, &offset,
-				      &mode, &unsignedp, &volatilep, false);
+	  int unsignedp, reversep, volatilep;
+	  tree base
+	    = get_inner_reference (TREE_OPERAND (op0, 0), &bitsize, &bitpos,
+				   &offset, &mode, &unsignedp, &reversep,
+				   &volatilep, false);
 	  /* If the reference was to a (constant) zero offset, we can use
 	     the address of the base if it has the same base type
 	     as the result type and the pointer type is unqualified.  */
@@ -7739,8 +7759,12 @@  fold_unary_loc (location_t loc, enum tre
 
     case VIEW_CONVERT_EXPR:
       if (TREE_CODE (op0) == MEM_REF)
-	return fold_build2_loc (loc, MEM_REF, type,
-				TREE_OPERAND (op0, 0), TREE_OPERAND (op0, 1));
+        {
+	  tem = fold_build2_loc (loc, MEM_REF, type,
+				 TREE_OPERAND (op0, 0), TREE_OPERAND (op0, 1));
+	  REF_REVERSE_STORAGE_ORDER (tem) = REF_REVERSE_STORAGE_ORDER (op0);
+	  return tem;
+	}
 
       return NULL_TREE;
 
@@ -8309,7 +8333,7 @@  fold_comparison (location_t loc, enum tr
       tree base0, base1, offset0 = NULL_TREE, offset1 = NULL_TREE;
       HOST_WIDE_INT bitsize, bitpos0 = 0, bitpos1 = 0;
       machine_mode mode;
-      int volatilep, unsignedp;
+      int volatilep, reversep, unsignedp;
       bool indirect_base0 = false, indirect_base1 = false;
 
       /* Get base and offset for the access.  Strip ADDR_EXPR for
@@ -8319,9 +8343,10 @@  fold_comparison (location_t loc, enum tr
       base0 = arg0;
       if (TREE_CODE (arg0) == ADDR_EXPR)
 	{
-	  base0 = get_inner_reference (TREE_OPERAND (arg0, 0),
-				       &bitsize, &bitpos0, &offset0, &mode,
-				       &unsignedp, &volatilep, false);
+	  base0
+	    = get_inner_reference (TREE_OPERAND (arg0, 0),
+				   &bitsize, &bitpos0, &offset0, &mode,
+				   &unsignedp, &reversep, &volatilep, false);
 	  if (TREE_CODE (base0) == INDIRECT_REF)
 	    base0 = TREE_OPERAND (base0, 0);
 	  else
@@ -8353,9 +8378,10 @@  fold_comparison (location_t loc, enum tr
       base1 = arg1;
       if (TREE_CODE (arg1) == ADDR_EXPR)
 	{
-	  base1 = get_inner_reference (TREE_OPERAND (arg1, 0),
-				       &bitsize, &bitpos1, &offset1, &mode,
-				       &unsignedp, &volatilep, false);
+	  base1
+	    = get_inner_reference (TREE_OPERAND (arg1, 0),
+				   &bitsize, &bitpos1, &offset1, &mode,
+				   &unsignedp, &reversep, &volatilep, false);
 	  if (TREE_CODE (base1) == INDIRECT_REF)
 	    base1 = TREE_OPERAND (base1, 0);
 	  else
@@ -14213,15 +14239,15 @@  split_address_to_core_and_offset (tree e
 {
   tree core;
   machine_mode mode;
-  int unsignedp, volatilep;
+  int unsignedp, reversep, volatilep;
   HOST_WIDE_INT bitsize;
   location_t loc = EXPR_LOCATION (exp);
 
   if (TREE_CODE (exp) == ADDR_EXPR)
     {
       core = get_inner_reference (TREE_OPERAND (exp, 0), &bitsize, pbitpos,
-				  poffset, &mode, &unsignedp, &volatilep,
-				  false);
+				  poffset, &mode, &unsignedp, &reversep,
+				  &volatilep, false);
       core = build_fold_addr_expr_loc (loc, core);
     }
   else
Index: expr.c
===================================================================
--- expr.c	(.../trunk/gcc)	(revision 228112)
+++ expr.c	(.../branches/scalar-storage-order/gcc)	(revision 228133)
@@ -134,11 +134,11 @@  static rtx_insn *compress_float_constant
 static rtx get_subtarget (rtx);
 static void store_constructor_field (rtx, unsigned HOST_WIDE_INT,
 				     HOST_WIDE_INT, machine_mode,
-				     tree, int, alias_set_type);
-static void store_constructor (tree, rtx, int, HOST_WIDE_INT);
+				     tree, int, alias_set_type, bool);
+static void store_constructor (tree, rtx, int, HOST_WIDE_INT, bool);
 static rtx store_field (rtx, HOST_WIDE_INT, HOST_WIDE_INT,
 			unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
-			machine_mode, tree, alias_set_type, bool);
+			machine_mode, tree, alias_set_type, bool, bool);
 
 static unsigned HOST_WIDE_INT highest_pow2_factor_for_target (const_tree, const_tree);
 
@@ -1691,7 +1691,7 @@  emit_group_load_1 (rtx *tmps, rtx dst, r
 		  && (!REG_P (tmps[i]) || GET_MODE (tmps[i]) != mode))
 		tmps[i] = extract_bit_field (tmps[i], bytelen * BITS_PER_UNIT,
 					     (bytepos % slen0) * BITS_PER_UNIT,
-					     1, NULL_RTX, mode, mode);
+					     1, NULL_RTX, mode, mode, false);
 	    }
 	  else
 	    {
@@ -1701,7 +1701,7 @@  emit_group_load_1 (rtx *tmps, rtx dst, r
 	      mem = assign_stack_temp (GET_MODE (src), slen);
 	      emit_move_insn (mem, src);
 	      tmps[i] = extract_bit_field (mem, bytelen * BITS_PER_UNIT,
-					   0, 1, NULL_RTX, mode, mode);
+					   0, 1, NULL_RTX, mode, mode, false);
 	    }
 	}
       /* FIXME: A SIMD parallel will eventually lead to a subreg of a
@@ -1744,7 +1744,7 @@  emit_group_load_1 (rtx *tmps, rtx dst, r
       else
 	tmps[i] = extract_bit_field (src, bytelen * BITS_PER_UNIT,
 				     bytepos * BITS_PER_UNIT, 1, NULL_RTX,
-				     mode, mode);
+				     mode, mode, false);
 
       if (shift)
 	tmps[i] = expand_shift (LSHIFT_EXPR, mode, tmps[i],
@@ -2052,7 +2052,7 @@  emit_group_store (rtx orig_dst, rtx src,
 	  store_bit_field (dest,
 			   adj_bytelen * BITS_PER_UNIT, bytepos * BITS_PER_UNIT,
 			   bytepos * BITS_PER_UNIT, ssize * BITS_PER_UNIT - 1,
-			   VOIDmode, tmps[i]);
+			   VOIDmode, tmps[i], false);
 	}
 
       /* Optimize the access just a bit.  */
@@ -2065,7 +2065,7 @@  emit_group_store (rtx orig_dst, rtx src,
 
       else
 	store_bit_field (dest, bytelen * BITS_PER_UNIT, bytepos * BITS_PER_UNIT,
-			 0, 0, mode, tmps[i]);
+			 0, 0, mode, tmps[i], false);
     }
 
   /* Copy from the pseudo into the (probable) hard reg.  */
@@ -2195,7 +2195,9 @@  copy_blkmode_from_reg (rtx target, rtx s
       store_bit_field (dst, bitsize, bitpos % BITS_PER_WORD, 0, 0, copy_mode,
 		       extract_bit_field (src, bitsize,
 					  xbitpos % BITS_PER_WORD, 1,
-					  NULL_RTX, copy_mode, copy_mode));
+					  NULL_RTX, copy_mode, copy_mode,
+					  false),
+		       false);
     }
 }
 
@@ -2272,7 +2274,9 @@  copy_blkmode_to_reg (machine_mode mode,
 		       0, 0, word_mode,
 		       extract_bit_field (src_word, bitsize,
 					  bitpos % BITS_PER_WORD, 1,
-					  NULL_RTX, word_mode, word_mode));
+					  NULL_RTX, word_mode, word_mode,
+					  false),
+		       false);
     }
 
   if (mode == BLKmode)
@@ -3017,7 +3021,8 @@  write_complex_part (rtx cplx, rtx val, b
 	gcc_assert (MEM_P (cplx) && ibitsize < BITS_PER_WORD);
     }
 
-  store_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0, 0, 0, imode, val);
+  store_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0, 0, 0, imode, val,
+		   false);
 }
 
 /* Extract one of the components of the complex value CPLX.  Extract the
@@ -3080,7 +3085,7 @@  read_complex_part (rtx cplx, bool imag_p
     }
 
   return extract_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0,
-			    true, NULL_RTX, imode, imode);
+			    true, NULL_RTX, imode, imode, false);
 }
 
 /* A subroutine of emit_move_insn_1.  Yet another lowpart generator.
@@ -4470,7 +4475,7 @@  optimize_bitfield_assignment_op (unsigne
 				 unsigned HOST_WIDE_INT bitregion_start,
 				 unsigned HOST_WIDE_INT bitregion_end,
 				 machine_mode mode1, rtx str_rtx,
-				 tree to, tree src)
+				 tree to, tree src, bool reverse)
 {
   machine_mode str_mode = GET_MODE (str_rtx);
   unsigned int str_bitsize = GET_MODE_BITSIZE (str_mode);
@@ -4543,6 +4548,8 @@  optimize_bitfield_assignment_op (unsigne
     }
   else if (!REG_P (str_rtx) && GET_CODE (str_rtx) != SUBREG)
     return false;
+  else
+    gcc_assert (!reverse);
 
   /* If the bit field covers the whole REG/MEM, store_field
      will likely generate better code.  */
@@ -4553,7 +4560,7 @@  optimize_bitfield_assignment_op (unsigne
   if (bitpos + bitsize > str_bitsize)
     return false;
 
-  if (BYTES_BIG_ENDIAN)
+  if (reverse ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
     bitpos = str_bitsize - bitpos - bitsize;
 
   switch (code)
@@ -4566,7 +4573,7 @@  optimize_bitfield_assignment_op (unsigne
 	 We might win by one instruction for the other bitfields
 	 too if insv/extv instructions aren't used, so that
 	 can be added later.  */
-      if (bitpos + bitsize != str_bitsize
+      if ((reverse || bitpos + bitsize != str_bitsize)
 	  && (bitsize != 1 || TREE_CODE (op1) != INTEGER_CST))
 	break;
 
@@ -4584,13 +4591,17 @@  optimize_bitfield_assignment_op (unsigne
 	  set_mem_expr (str_rtx, 0);
 	}
 
-      binop = code == PLUS_EXPR ? add_optab : sub_optab;
-      if (bitsize == 1 && bitpos + bitsize != str_bitsize)
+      if (bitsize == 1 && (reverse || bitpos + bitsize != str_bitsize))
 	{
 	  value = expand_and (str_mode, value, const1_rtx, NULL);
 	  binop = xor_optab;
 	}
+      else
+	binop = code == PLUS_EXPR ? add_optab : sub_optab;
+
       value = expand_shift (LSHIFT_EXPR, str_mode, value, bitpos, NULL_RTX, 1);
+      if (reverse)
+	value = flip_storage_order (str_mode, value);
       result = expand_binop (str_mode, binop, str_rtx,
 			     value, str_rtx, 1, OPTAB_WIDEN);
       if (result != str_rtx)
@@ -4623,6 +4634,8 @@  optimize_bitfield_assignment_op (unsigne
 	  value = expand_and (str_mode, value, mask, NULL_RTX);
 	}
       value = expand_shift (LSHIFT_EXPR, str_mode, value, bitpos, NULL_RTX, 1);
+      if (reverse)
+	value = flip_storage_order (str_mode, value);
       result = expand_binop (str_mode, binop, str_rtx,
 			     value, str_rtx, 1, OPTAB_WIDEN);
       if (result != str_rtx)
@@ -4677,10 +4690,10 @@  get_bit_range (unsigned HOST_WIDE_INT *b
       machine_mode rmode;
       HOST_WIDE_INT rbitsize, rbitpos;
       tree roffset;
-      int unsignedp;
-      int volatilep = 0;
+      int unsignedp, reversep, volatilep = 0;
       get_inner_reference (TREE_OPERAND (exp, 0), &rbitsize, &rbitpos,
-			   &roffset, &rmode, &unsignedp, &volatilep, false);
+			   &roffset, &rmode, &unsignedp, &reversep,
+			   &volatilep, false);
       if ((rbitpos % BITS_PER_UNIT) != 0)
 	{
 	  *bitstart = *bitend = 0;
@@ -4796,6 +4809,8 @@  expand_assignment (tree to, tree from, b
       reg = expand_expr (from, NULL_RTX, VOIDmode, EXPAND_NORMAL);
       reg = force_not_mem (reg);
       mem = expand_expr (to, NULL_RTX, VOIDmode, EXPAND_WRITE);
+      if (TREE_CODE (to) == MEM_REF && REF_REVERSE_STORAGE_ORDER (to))
+	reg = flip_storage_order (mode, reg);
 
       if (icode != CODE_FOR_nothing)
 	{
@@ -4808,7 +4823,8 @@  expand_assignment (tree to, tree from, b
 	  expand_insn (icode, 2, ops);
 	}
       else
-	store_bit_field (mem, GET_MODE_BITSIZE (mode), 0, 0, 0, mode, reg);
+	store_bit_field (mem, GET_MODE_BITSIZE (mode), 0, 0, 0, mode, reg,
+			 false);
       return;
     }
 
@@ -4819,7 +4835,8 @@  expand_assignment (tree to, tree from, b
      problem.  Same for (partially) storing into a non-memory object.  */
   if (handled_component_p (to)
       || (TREE_CODE (to) == MEM_REF
-	  && mem_ref_refers_to_non_mem_p (to))
+	  && (REF_REVERSE_STORAGE_ORDER (to)
+	      || mem_ref_refers_to_non_mem_p (to)))
       || TREE_CODE (TREE_TYPE (to)) == ARRAY_TYPE)
     {
       machine_mode mode1;
@@ -4827,13 +4844,12 @@  expand_assignment (tree to, tree from, b
       unsigned HOST_WIDE_INT bitregion_start = 0;
       unsigned HOST_WIDE_INT bitregion_end = 0;
       tree offset;
-      int unsignedp;
-      int volatilep = 0;
+      int unsignedp, reversep, volatilep = 0;
       tree tem;
 
       push_temp_slots ();
       tem = get_inner_reference (to, &bitsize, &bitpos, &offset, &mode1,
-				 &unsignedp, &volatilep, true);
+				 &unsignedp, &reversep, &volatilep, true);
 
       /* Make sure bitpos is not negative, it can wreak havoc later.  */
       if (bitpos < 0)
@@ -4952,22 +4968,22 @@  expand_assignment (tree to, tree from, b
 	  if (COMPLEX_MODE_P (TYPE_MODE (TREE_TYPE (from)))
 	      && bitpos == 0
 	      && bitsize == mode_bitsize)
-	    result = store_expr (from, to_rtx, false, nontemporal);
+	    result = store_expr (from, to_rtx, false, nontemporal, reversep);
 	  else if (bitsize == mode_bitsize / 2
 		   && (bitpos == 0 || bitpos == mode_bitsize / 2))
 	    result = store_expr (from, XEXP (to_rtx, bitpos != 0), false,
-				 nontemporal);
+				 nontemporal, reversep);
 	  else if (bitpos + bitsize <= mode_bitsize / 2)
 	    result = store_field (XEXP (to_rtx, 0), bitsize, bitpos,
 				  bitregion_start, bitregion_end,
-				  mode1, from,
-				  get_alias_set (to), nontemporal);
+				  mode1, from, get_alias_set (to),
+				  nontemporal, reversep);
 	  else if (bitpos >= mode_bitsize / 2)
 	    result = store_field (XEXP (to_rtx, 1), bitsize,
 				  bitpos - mode_bitsize / 2,
 				  bitregion_start, bitregion_end,
-				  mode1, from,
-				  get_alias_set (to), nontemporal);
+				  mode1, from, get_alias_set (to),
+				  nontemporal, reversep);
 	  else if (bitpos == 0 && bitsize == mode_bitsize)
 	    {
 	      rtx from_rtx;
@@ -4987,8 +5003,8 @@  expand_assignment (tree to, tree from, b
 	      write_complex_part (temp, XEXP (to_rtx, 1), true);
 	      result = store_field (temp, bitsize, bitpos,
 				    bitregion_start, bitregion_end,
-				    mode1, from,
-				    get_alias_set (to), nontemporal);
+				    mode1, from, get_alias_set (to),
+				    nontemporal, reversep);
 	      emit_move_insn (XEXP (to_rtx, 0), read_complex_part (temp, false));
 	      emit_move_insn (XEXP (to_rtx, 1), read_complex_part (temp, true));
 	    }
@@ -5007,14 +5023,14 @@  expand_assignment (tree to, tree from, b
 
 	  if (optimize_bitfield_assignment_op (bitsize, bitpos,
 					       bitregion_start, bitregion_end,
-					       mode1,
-					       to_rtx, to, from))
+					       mode1, to_rtx, to, from,
+					       reversep))
 	    result = NULL;
 	  else
 	    result = store_field (to_rtx, bitsize, bitpos,
 				  bitregion_start, bitregion_end,
-				  mode1, from,
-				  get_alias_set (to), nontemporal);
+				  mode1, from, get_alias_set (to),
+				  nontemporal, reversep);
 	}
 
       if (result)
@@ -5168,7 +5184,7 @@  expand_assignment (tree to, tree from, b
   /* Compute FROM and store the value in the rtx we got.  */
 
   push_temp_slots ();
-  result = store_expr_with_bounds (from, to_rtx, 0, nontemporal, to);
+  result = store_expr_with_bounds (from, to_rtx, 0, nontemporal, false, to);
   preserve_temp_slots (result);
   pop_temp_slots ();
   return;
@@ -5207,12 +5223,14 @@  emit_storent_insn (rtx to, rtx from)
 
    If NONTEMPORAL is true, try using a nontemporal store instruction.
 
+   If REVERSE is true, the store is to be done in reverse order.
+
    If BTARGET is not NULL then computed bounds of EXP are
    associated with BTARGET.  */
 
 rtx
 store_expr_with_bounds (tree exp, rtx target, int call_param_p,
-			bool nontemporal, tree btarget)
+			bool nontemporal, bool reverse, tree btarget)
 {
   rtx temp;
   rtx alt_rtl = NULL_RTX;
@@ -5234,7 +5252,8 @@  store_expr_with_bounds (tree exp, rtx ta
       expand_expr (TREE_OPERAND (exp, 0), const0_rtx, VOIDmode,
 		   call_param_p ? EXPAND_STACK_PARM : EXPAND_NORMAL);
       return store_expr_with_bounds (TREE_OPERAND (exp, 1), target,
-				     call_param_p, nontemporal, btarget);
+				     call_param_p, nontemporal, reverse,
+				     btarget);
     }
   else if (TREE_CODE (exp) == COND_EXPR && GET_MODE (target) == BLKmode)
     {
@@ -5249,12 +5268,12 @@  store_expr_with_bounds (tree exp, rtx ta
       NO_DEFER_POP;
       jumpifnot (TREE_OPERAND (exp, 0), lab1, -1);
       store_expr_with_bounds (TREE_OPERAND (exp, 1), target, call_param_p,
-			      nontemporal, btarget);
+			      nontemporal, reverse, btarget);
       emit_jump_insn (targetm.gen_jump (lab2));
       emit_barrier ();
       emit_label (lab1);
       store_expr_with_bounds (TREE_OPERAND (exp, 2), target, call_param_p,
-			      nontemporal, btarget);
+			      nontemporal, reverse, btarget);
       emit_label (lab2);
       OK_DEFER_POP;
 
@@ -5393,9 +5412,9 @@  store_expr_with_bounds (tree exp, rtx ta
       rtx tmp_target;
 
   normal_expr:
-      /* If we want to use a nontemporal store, force the value to
-	 register first.  */
-      tmp_target = nontemporal ? NULL_RTX : target;
+      /* If we want to use a nontemporal or a reverse order store, force the
+	 value into a register first.  */
+      tmp_target = nontemporal || reverse ? NULL_RTX : target;
       temp = expand_expr_real (exp, tmp_target, GET_MODE (target),
 			       (call_param_p
 				? EXPAND_STACK_PARM : EXPAND_NORMAL),
@@ -5470,7 +5489,7 @@  store_expr_with_bounds (tree exp, rtx ta
 	      else
 		store_bit_field (target,
 				 INTVAL (expr_size (exp)) * BITS_PER_UNIT,
-				 0, 0, 0, GET_MODE (temp), temp);
+				 0, 0, 0, GET_MODE (temp), temp, reverse);
 	    }
 	  else
 	    convert_move (target, temp, TYPE_UNSIGNED (TREE_TYPE (exp)));
@@ -5569,6 +5588,8 @@  store_expr_with_bounds (tree exp, rtx ta
 	;
       else
 	{
+	  if (reverse)
+	    temp = flip_storage_order (GET_MODE (target), temp);
 	  temp = force_operand (temp, target);
 	  if (temp != target)
 	    emit_move_insn (target, temp);
@@ -5580,9 +5601,11 @@  store_expr_with_bounds (tree exp, rtx ta
 
 /* Same as store_expr_with_bounds but ignoring bounds of EXP.  */
 rtx
-store_expr (tree exp, rtx target, int call_param_p, bool nontemporal)
+store_expr (tree exp, rtx target, int call_param_p, bool nontemporal,
+	    bool reverse)
 {
-  return store_expr_with_bounds (exp, target, call_param_p, nontemporal, NULL);
+  return store_expr_with_bounds (exp, target, call_param_p, nontemporal,
+				 reverse, NULL);
 }
 
 /* Return true if field F of structure TYPE is a flexible array.  */
@@ -5802,8 +5825,12 @@  categorize_ctor_elements_1 (const_tree c
 	    init_elts += mult * tc;
 
 	    if (const_from_elts_p && const_p)
-	      const_p = initializer_constant_valid_p (value, elt_type)
-			!= NULL_TREE;
+	      const_p
+		= initializer_constant_valid_p (value,
+						elt_type,
+						TYPE_REVERSE_STORAGE_ORDER
+						(TREE_TYPE (ctor)))
+		  != NULL_TREE;
 	  }
 	  break;
 	}
@@ -5908,6 +5935,7 @@  all_zeros_p (const_tree exp)
    TARGET, BITSIZE, BITPOS, MODE, EXP are as for store_field.
    CLEARED is as for store_constructor.
    ALIAS_SET is the alias set to use for any stores.
+   If REVERSE is true, the store is to be done in reverse order.
 
    This provides a recursive shortcut back to store_constructor when it isn't
    necessary to go through store_field.  This is so that we can pass through
@@ -5917,7 +5945,8 @@  all_zeros_p (const_tree exp)
 static void
 store_constructor_field (rtx target, unsigned HOST_WIDE_INT bitsize,
 			 HOST_WIDE_INT bitpos, machine_mode mode,
-			 tree exp, int cleared, alias_set_type alias_set)
+			 tree exp, int cleared,
+			 alias_set_type alias_set, bool reverse)
 {
   if (TREE_CODE (exp) == CONSTRUCTOR
       /* We can only call store_constructor recursively if the size and
@@ -5946,10 +5975,12 @@  store_constructor_field (rtx target, uns
 	  set_mem_alias_set (target, alias_set);
 	}
 
-      store_constructor (exp, target, cleared, bitsize / BITS_PER_UNIT);
+      store_constructor (exp, target, cleared, bitsize / BITS_PER_UNIT,
+			 reverse);
     }
   else
-    store_field (target, bitsize, bitpos, 0, 0, mode, exp, alias_set, false);
+    store_field (target, bitsize, bitpos, 0, 0, mode, exp, alias_set, false,
+		 reverse);
 }
 
 
@@ -5975,10 +6006,12 @@  fields_length (const_tree type)
    CLEARED is true if TARGET is known to have been zero'd.
    SIZE is the number of bytes of TARGET we are allowed to modify: this
    may not be the same as the size of EXP if we are assigning to a field
-   which has been packed to exclude padding bits.  */
+   which has been packed to exclude padding bits.
+   If REVERSE is true, the store is to be done in reverse order.  */
 
 static void
-store_constructor (tree exp, rtx target, int cleared, HOST_WIDE_INT size)
+store_constructor (tree exp, rtx target, int cleared, HOST_WIDE_INT size,
+		   bool reverse)
 {
   tree type = TREE_TYPE (exp);
   HOST_WIDE_INT exp_size = int_size_in_bytes (type);
@@ -5992,6 +6025,9 @@  store_constructor (tree exp, rtx target,
 	unsigned HOST_WIDE_INT idx;
 	tree field, value;
 
+	/* The storage order is specified for every aggregate type.  */
+	reverse = TYPE_REVERSE_STORAGE_ORDER (type);
+
 	/* If size is zero or the target is already cleared, do nothing.  */
 	if (size == 0 || cleared)
 	  cleared = 1;
@@ -6135,7 +6171,8 @@  store_constructor (tree exp, rtx target,
 
 	    store_constructor_field (to_rtx, bitsize, bitpos, mode,
 				     value, cleared,
-				     get_alias_set (TREE_TYPE (field)));
+				     get_alias_set (TREE_TYPE (field)),
+				     reverse);
 	  }
 	break;
       }
@@ -6150,6 +6187,9 @@  store_constructor (tree exp, rtx target,
 	HOST_WIDE_INT minelt = 0;
 	HOST_WIDE_INT maxelt = 0;
 
+	/* The storage order is specified for every aggregate type.  */
+	reverse = TYPE_REVERSE_STORAGE_ORDER (type);
+
 	domain = TYPE_DOMAIN (type);
 	const_bounds_p = (TYPE_MIN_VALUE (domain)
 			  && TYPE_MAX_VALUE (domain)
@@ -6290,7 +6330,7 @@  store_constructor (tree exp, rtx target,
 
 			store_constructor_field
 			  (target, bitsize, bitpos, mode, value, cleared,
-			   get_alias_set (elttype));
+			   get_alias_set (elttype), reverse);
 		      }
 		  }
 		else
@@ -6305,7 +6345,7 @@  store_constructor (tree exp, rtx target,
 					VAR_DECL, NULL_TREE, domain);
 		    index_r = gen_reg_rtx (promote_decl_mode (index, NULL));
 		    SET_DECL_RTL (index, index_r);
-		    store_expr (lo_index, index_r, 0, false);
+		    store_expr (lo_index, index_r, 0, false, reverse);
 
 		    /* Build the head of the loop.  */
 		    do_pending_stack_adjust ();
@@ -6330,9 +6370,9 @@  store_constructor (tree exp, rtx target,
 		    xtarget = adjust_address (xtarget, mode, 0);
 		    if (TREE_CODE (value) == CONSTRUCTOR)
 		      store_constructor (value, xtarget, cleared,
-					 bitsize / BITS_PER_UNIT);
+					 bitsize / BITS_PER_UNIT, reverse);
 		    else
-		      store_expr (value, xtarget, 0, false);
+		      store_expr (value, xtarget, 0, false, reverse);
 
 		    /* Generate a conditional jump to exit the loop.  */
 		    exit_cond = build2 (LT_EXPR, integer_type_node,
@@ -6375,7 +6415,7 @@  store_constructor (tree exp, rtx target,
 					  expand_normal (position),
 					  highest_pow2_factor (position));
 		xtarget = adjust_address (xtarget, mode, 0);
-		store_expr (value, xtarget, 0, false);
+		store_expr (value, xtarget, 0, false, reverse);
 	      }
 	    else
 	      {
@@ -6393,7 +6433,8 @@  store_constructor (tree exp, rtx target,
 		    MEM_KEEP_ALIAS_SET_P (target) = 1;
 		  }
 		store_constructor_field (target, bitsize, bitpos, mode, value,
-					 cleared, get_alias_set (elttype));
+					 cleared, get_alias_set (elttype),
+					 reverse);
 	      }
 	  }
 	break;
@@ -6526,7 +6567,7 @@  store_constructor (tree exp, rtx target,
 		  : eltmode;
 		bitpos = eltpos * elt_size;
 		store_constructor_field (target, bitsize, bitpos, value_mode,
-					 value, cleared, alias);
+					 value, cleared, alias, reverse);
 	      }
 	  }
 
@@ -6559,14 +6600,16 @@  store_constructor (tree exp, rtx target,
    (in general) be different from that for TARGET, since TARGET is a
    reference to the containing structure.
 
-   If NONTEMPORAL is true, try generating a nontemporal store.  */
+   If NONTEMPORAL is true, try generating a nontemporal store.
+
+   If REVERSE is true, the store is to be done in reverse order.  */
 
 static rtx
 store_field (rtx target, HOST_WIDE_INT bitsize, HOST_WIDE_INT bitpos,
 	     unsigned HOST_WIDE_INT bitregion_start,
 	     unsigned HOST_WIDE_INT bitregion_end,
 	     machine_mode mode, tree exp,
-	     alias_set_type alias_set, bool nontemporal)
+	     alias_set_type alias_set, bool nontemporal, bool reverse)
 {
   if (TREE_CODE (exp) == ERROR_MARK)
     return const0_rtx;
@@ -6581,7 +6624,7 @@  store_field (rtx target, HOST_WIDE_INT b
       /* We're storing into a struct containing a single __complex.  */
 
       gcc_assert (!bitpos);
-      return store_expr (exp, target, 0, nontemporal);
+      return store_expr (exp, target, 0, nontemporal, reverse);
     }
 
   /* If the structure is in a register or if the component
@@ -6643,16 +6686,27 @@  store_field (rtx target, HOST_WIDE_INT b
 
       temp = expand_normal (exp);
 
-      /* If BITSIZE is narrower than the size of the type of EXP
-	 we will be narrowing TEMP.  Normally, what's wanted are the
-	 low-order bits.  However, if EXP's type is a record and this is
-	 big-endian machine, we want the upper BITSIZE bits.  */
-      if (BYTES_BIG_ENDIAN && GET_MODE_CLASS (GET_MODE (temp)) == MODE_INT
-	  && bitsize < (HOST_WIDE_INT) GET_MODE_BITSIZE (GET_MODE (temp))
-	  && TREE_CODE (TREE_TYPE (exp)) == RECORD_TYPE)
-	temp = expand_shift (RSHIFT_EXPR, GET_MODE (temp), temp,
-			     GET_MODE_BITSIZE (GET_MODE (temp)) - bitsize,
-			     NULL_RTX, 1);
+      /* If the value has a record type and an integral mode then, if BITSIZE
+	 is narrower than this mode and this is a big-endian machine, we must
+	 first put the value into the low-order bits.  Moreover, the field may
+	 be not aligned on a byte boundary; in this case, if it has reverse
+	 storage order, it needs to be accessed as a scalar field with reverse
+	 storage order and we must first put the value into target order.  */
+      if (TREE_CODE (TREE_TYPE (exp)) == RECORD_TYPE
+	  && GET_MODE_CLASS (GET_MODE (temp)) == MODE_INT)
+	{
+	  HOST_WIDE_INT size = GET_MODE_BITSIZE (GET_MODE (temp));
+
+	  reverse = TYPE_REVERSE_STORAGE_ORDER (TREE_TYPE (exp));
+
+	  if (reverse)
+	    temp = flip_storage_order (GET_MODE (temp), temp);
+
+	  if (bitsize < size
+	      && (reverse ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN))
+	    temp = expand_shift (RSHIFT_EXPR, GET_MODE (temp), temp,
+				 size - bitsize, NULL_RTX, 1);
+	}
 
       /* Unless MODE is VOIDmode or BLKmode, convert TEMP to MODE.  */
       if (mode != VOIDmode && mode != BLKmode
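
Note that the guard only fires when the effective byte order of the data is
big (BYTES_BIG_ENDIAN xor REVERSE).  A standalone sketch of the narrowing
rule, not GCC internals, with hypothetical variable names:

#include <stdint.h>
#include <stdio.h>

int
main (void)
{
  uint32_t temp = 0xAABBCCDDu;
  int size = 32, bitsize = 24;
  int big_endian_data = 1;	/* BYTES_BIG_ENDIAN xor reverse */

  /* Only BITSIZE of the SIZE bits are wanted; for big-endian data they
     are the high-order ones, so shift them down first.  */
  if (bitsize < size && big_endian_data)
    temp >>= size - bitsize;

  printf ("%#x\n", (unsigned) temp);	/* prints 0xaabbcc */
  return 0;
}
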
@@ -6712,7 +6766,7 @@  store_field (rtx target, HOST_WIDE_INT b
 	      temp_target = gen_reg_rtx (mode);
 	      temp_target
 	        = extract_bit_field (temp, size * BITS_PER_UNIT, 0, 1,
-				     temp_target, mode, mode);
+				     temp_target, mode, mode, false);
 	      temp = temp_target;
 	    }
 	}
@@ -6720,7 +6774,7 @@  store_field (rtx target, HOST_WIDE_INT b
       /* Store the value in the bitfield.  */
       store_bit_field (target, bitsize, bitpos,
 		       bitregion_start, bitregion_end,
-		       mode, temp);
+		       mode, temp, reverse);
 
       return const0_rtx;
     }
@@ -6735,7 +6789,7 @@  store_field (rtx target, HOST_WIDE_INT b
       if (!MEM_KEEP_ALIAS_SET_P (to_rtx) && MEM_ALIAS_SET (to_rtx) != 0)
 	set_mem_alias_set (to_rtx, alias_set);
 
-      return store_expr (exp, to_rtx, 0, nontemporal);
+      return store_expr (exp, to_rtx, 0, nontemporal, reverse);
     }
 }
 
@@ -6744,7 +6798,8 @@  store_field (rtx target, HOST_WIDE_INT b
    codes and find the ultimate containing object, which we return.
 
    We set *PBITSIZE to the size in bits that we want, *PBITPOS to the
-   bit position, and *PUNSIGNEDP to the signedness of the field.
+   bit position, *PUNSIGNEDP to the signedness and *PREVERSEP to the
+   storage order of the field.
    If the position of the field is variable, we store a tree
    giving the variable offset (in units) in *POFFSET.
    This offset is in addition to the bit position.
@@ -6778,7 +6833,7 @@  tree
 get_inner_reference (tree exp, HOST_WIDE_INT *pbitsize,
 		     HOST_WIDE_INT *pbitpos, tree *poffset,
 		     machine_mode *pmode, int *punsignedp,
-		     int *pvolatilep, bool keep_aligning)
+		     int *preversep, int *pvolatilep, bool keep_aligning)
 {
   tree size_tree = 0;
   machine_mode mode = VOIDmode;
@@ -6786,8 +6841,8 @@  get_inner_reference (tree exp, HOST_WIDE
   tree offset = size_zero_node;
   offset_int bit_offset = 0;
 
-  /* First get the mode, signedness, and size.  We do this from just the
-     outermost expression.  */
+  /* First get the mode, signedness, storage order and size.  We do this from
+     just the outermost expression.  */
   *pbitsize = -1;
   if (TREE_CODE (exp) == COMPONENT_REF)
     {
@@ -6840,6 +6895,8 @@  get_inner_reference (tree exp, HOST_WIDE
 	*pbitsize = tree_to_uhwi (size_tree);
     }
 
+  *preversep = reverse_storage_order_for_component_p (exp);
+
   /* Compute cumulative bit-offset for nested component-refs and array-refs,
      and find the ultimate containing object.  */
   while (1)
@@ -7516,7 +7573,7 @@  expand_expr_addr_expr_1 (tree exp, rtx t
   rtx result, subtarget;
   tree inner, offset;
   HOST_WIDE_INT bitsize, bitpos;
-  int volatilep, unsignedp;
+  int unsignedp, reversep, volatilep = 0;
   machine_mode mode1;
 
   /* If we are taking the address of a constant and are at the top level,
@@ -7623,8 +7680,8 @@  expand_expr_addr_expr_1 (tree exp, rtx t
 	 handle "aligning nodes" here: we can just bypass them because
 	 they won't change the final object whose address will be returned
 	 (they actually exist only for that purpose).  */
-      inner = get_inner_reference (exp, &bitsize, &bitpos, &offset,
-				   &mode1, &unsignedp, &volatilep, false);
+      inner = get_inner_reference (exp, &bitsize, &bitpos, &offset, &mode1,
+				   &unsignedp, &reversep, &volatilep, false);
       break;
     }
 
@@ -7808,7 +7865,7 @@  expand_constructor (tree exp, rtx target
       target = assign_temp (type, TREE_ADDRESSABLE (exp), 1);
     }
 
-  store_constructor (exp, target, 0, int_expr_size (exp));
+  store_constructor (exp, target, 0, int_expr_size (exp), false);
   return target;
 }
 
@@ -8081,11 +8138,12 @@  expand_expr_real_2 (sepops ops, rtx targ
 	    store_expr (treeop0,
 			adjust_address (target, TYPE_MODE (valtype), 0),
 			modifier == EXPAND_STACK_PARM,
-			false);
+			false, TYPE_REVERSE_STORAGE_ORDER (type));
 
 	  else
 	    {
-	      gcc_assert (REG_P (target));
+	      gcc_assert (REG_P (target)
+			  && !TYPE_REVERSE_STORAGE_ORDER (type));
 
 	      /* Store this field into a union of the proper type.  */
 	      store_field (target,
@@ -8093,7 +8151,8 @@  expand_expr_real_2 (sepops ops, rtx targ
 						    (treeop0))
 				 * BITS_PER_UNIT),
 				(HOST_WIDE_INT) GET_MODE_BITSIZE (mode)),
-			   0, 0, 0, TYPE_MODE (valtype), treeop0, 0, false);
+			   0, 0, 0, TYPE_MODE (valtype), treeop0, 0,
+			   false, false);
 	    }
 
 	  /* Return the entire union.  */
@@ -9122,7 +9181,7 @@  expand_expr_real_2 (sepops ops, rtx targ
         int index = BYTES_BIG_ENDIAN ? GET_MODE_NUNITS (vec_mode) - 1 : 0;
 	int bitsize = GET_MODE_UNIT_BITSIZE (vec_mode);
         temp = extract_bit_field (temp, bitsize, bitsize * index, unsignedp,
-				  target, mode, mode);
+				  target, mode, mode, false);
         gcc_assert (temp);
         return temp;
       }
@@ -9278,14 +9337,14 @@  expand_expr_real_2 (sepops ops, rtx targ
 	jumpifnot (treeop0, lab0, -1);
 	store_expr (treeop1, temp,
 		    modifier == EXPAND_STACK_PARM,
-		    false);
+		    false, false);
 
 	emit_jump_insn (targetm.gen_jump (lab1));
 	emit_barrier ();
 	emit_label (lab0);
 	store_expr (treeop2, temp,
 		    modifier == EXPAND_STACK_PARM,
-		    false);
+		    false, false);
 
 	emit_label (lab1);
 	OK_DEFER_POP;
@@ -9838,6 +9897,7 @@  expand_expr_real_1 (tree exp, rtx target
 
     case MEM_REF:
       {
+	const bool reverse = REF_REVERSE_STORAGE_ORDER (exp);
 	addr_space_t as
 	  = TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (TREE_OPERAND (exp, 0))));
 	machine_mode address_mode;
@@ -9852,6 +9912,7 @@  expand_expr_real_1 (tree exp, rtx target
 	    HOST_WIDE_INT offset = mem_ref_offset (exp).to_short_addr ();
 	    base = TREE_OPERAND (base, 0);
 	    if (offset == 0
+	        && !reverse
 		&& tree_fits_uhwi_p (TYPE_SIZE (type))
 		&& (GET_MODE_BITSIZE (DECL_MODE (base))
 		    == tree_to_uhwi (TYPE_SIZE (type))))
@@ -9861,13 +9922,14 @@  expand_expr_real_1 (tree exp, rtx target
 	      {
 		temp = assign_stack_temp (DECL_MODE (base),
 					  GET_MODE_SIZE (DECL_MODE (base)));
-		store_expr (base, temp, 0, false);
+		store_expr (base, temp, 0, false, false);
 		temp = adjust_address (temp, BLKmode, offset);
 		set_mem_size (temp, int_size_in_bytes (type));
 		return temp;
 	      }
 	    exp = build3 (BIT_FIELD_REF, type, base, TYPE_SIZE (type),
 			  bitsize_int (offset * BITS_PER_UNIT));
+	    REF_REVERSE_STORAGE_ORDER (exp) = reverse;
 	    return expand_expr (exp, target, tmode, modifier);
 	  }
 	address_mode = targetm.addr_space.address_mode (as);
@@ -9917,8 +9979,12 @@  expand_expr_real_1 (tree exp, rtx target
 					0, TYPE_UNSIGNED (TREE_TYPE (exp)),
 					(modifier == EXPAND_STACK_PARM
 					 ? NULL_RTX : target),
-					mode, mode);
+					mode, mode, false);
 	  }
+	if (reverse
+	    && modifier != EXPAND_MEMORY
+	    && modifier != EXPAND_WRITE)
+	  temp = flip_storage_order (mode, temp);
 	return temp;
       }
 
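
The net effect for a byte-aligned scalar is thus load first, then swap,
unless the value is only being written back.  A standalone sketch for a
little-endian host; the helper name is made up:

#include <stdint.h>
#include <string.h>
#include <stdio.h>

/* Reading a 32-bit scalar kept in big-endian storage order.  */
static uint32_t
load_be32 (const unsigned char *p)
{
  uint32_t raw;
  memcpy (&raw, p, sizeof raw);	    /* the plain load */
  return __builtin_bswap32 (raw);   /* the flip_storage_order step */
}

int
main (void)
{
  const unsigned char buf[4] = { 0x12, 0x34, 0x56, 0x78 };
  printf ("%#x\n", (unsigned) load_be32 (buf));	/* 0x12345678 */
  return 0;
}
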
@@ -10124,9 +10190,10 @@  expand_expr_real_1 (tree exp, rtx target
 	machine_mode mode1, mode2;
 	HOST_WIDE_INT bitsize, bitpos;
 	tree offset;
-	int volatilep = 0, must_force_mem;
-	tree tem = get_inner_reference (exp, &bitsize, &bitpos, &offset,
-					&mode1, &unsignedp, &volatilep, true);
+	int reversep, volatilep = 0, must_force_mem;
+	tree tem
+	  = get_inner_reference (exp, &bitsize, &bitpos, &offset, &mode1,
+				 &unsignedp, &reversep, &volatilep, true);
 	rtx orig_op0, memloc;
 	bool clear_mem_expr = false;
 
@@ -10181,7 +10248,11 @@  expand_expr_real_1 (tree exp, rtx target
 	  {
 	    if (bitpos == 0
 		&& bitsize == GET_MODE_BITSIZE (GET_MODE (op0)))
-	      return op0;
+	      {
+		if (reversep)
+		  op0 = flip_storage_order (GET_MODE (op0), op0);
+		return op0;
+	      }
 	    if (bitpos == 0
 		&& bitsize == GET_MODE_BITSIZE (GET_MODE (XEXP (op0, 0)))
 		&& bitsize)
@@ -10367,20 +10438,38 @@  expand_expr_real_1 (tree exp, rtx target
 	    if (MEM_P (op0) && REG_P (XEXP (op0, 0)))
 	      mark_reg_pointer (XEXP (op0, 0), MEM_ALIGN (op0));
 
+	    /* If the result has a record type and the extraction is done in
+	       an integral mode, then the field may not be aligned on a byte
+	       boundary; in this case, if it has reverse storage order, it
+	       needs to be extracted as a scalar field with reverse storage
+	       order and put back into memory order afterwards.  */
+	    if (TREE_CODE (type) == RECORD_TYPE
+		&& GET_MODE_CLASS (ext_mode) == MODE_INT)
+	      reversep = TYPE_REVERSE_STORAGE_ORDER (type);
+
 	    op0 = extract_bit_field (op0, bitsize, bitpos, unsignedp,
 				     (modifier == EXPAND_STACK_PARM
 				      ? NULL_RTX : target),
-				     ext_mode, ext_mode);
+				     ext_mode, ext_mode, reversep);
+
+	    /* If the result has a record type and the mode of OP0 is an
+	       integral mode then, if BITSIZE is narrower than this mode
+	       and this is a big-endian machine, we must put the field
+	       into the high-order bits.  And we must also put it back
+	       into memory order if it has been previously reversed.  */
+	    if (TREE_CODE (type) == RECORD_TYPE
+		&& GET_MODE_CLASS (GET_MODE (op0)) == MODE_INT)
+	      {
+		HOST_WIDE_INT size = GET_MODE_BITSIZE (GET_MODE (op0));
 
-	    /* If the result is a record type and BITSIZE is narrower than
-	       the mode of OP0, an integral mode, and this is a big endian
-	       machine, we must put the field into the high-order bits.  */
-	    if (TREE_CODE (type) == RECORD_TYPE && BYTES_BIG_ENDIAN
-		&& GET_MODE_CLASS (GET_MODE (op0)) == MODE_INT
-		&& bitsize < (HOST_WIDE_INT) GET_MODE_BITSIZE (GET_MODE (op0)))
-	      op0 = expand_shift (LSHIFT_EXPR, GET_MODE (op0), op0,
-				  GET_MODE_BITSIZE (GET_MODE (op0))
-				  - bitsize, op0, 1);
+		if (bitsize < size
+		    && (reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN))
+		  op0 = expand_shift (LSHIFT_EXPR, GET_MODE (op0), op0,
+				      size - bitsize, op0, 1);
+
+		if (reversep)
+		  op0 = flip_storage_order (GET_MODE (op0), op0);
+	      }
 
 	    /* If the result type is BLKmode, store the data into a temporary
 	       of the appropriate type, but with the mode corresponding to the
@@ -10426,6 +10515,12 @@  expand_expr_real_1 (tree exp, rtx target
 	  set_mem_expr (op0, NULL_TREE);
 
 	MEM_VOLATILE_P (op0) |= volatilep;
+
+	if (reversep
+	    && modifier != EXPAND_MEMORY
+	    && modifier != EXPAND_WRITE)
+	  op0 = flip_storage_order (mode1, op0);
+
 	if (mode == mode1 || mode1 == BLKmode || mode1 == tmode
 	    || modifier == EXPAND_CONST_ADDRESS
 	    || modifier == EXPAND_INITIALIZER)
@@ -10489,17 +10584,16 @@  expand_expr_real_1 (tree exp, rtx target
 	machine_mode mode1;
 	HOST_WIDE_INT bitsize, bitpos;
 	tree offset;
-	int unsignedp;
-	int volatilep = 0;
+	int unsignedp, reversep, volatilep = 0;
 	tree tem
-	  = get_inner_reference (treeop0, &bitsize, &bitpos,
-				 &offset, &mode1, &unsignedp, &volatilep,
-				 true);
+	  = get_inner_reference (treeop0, &bitsize, &bitpos, &offset, &mode1,
+				 &unsignedp, &reversep, &volatilep, true);
 	rtx orig_op0;
 
 	/* ??? We should work harder and deal with non-zero offsets.  */
 	if (!offset
 	    && (bitpos % BITS_PER_UNIT) == 0
+	    && !reversep
 	    && bitsize >= 0
 	    && compare_tree_int (TYPE_SIZE (type), bitsize) == 0)
 	  {
@@ -10573,7 +10667,7 @@  expand_expr_real_1 (tree exp, rtx target
       else if (reduce_bit_field)
 	return extract_bit_field (op0, TYPE_PRECISION (type), 0,
 				  TYPE_UNSIGNED (type), NULL_RTX,
-				  mode, mode);
+				  mode, mode, false);
       /* As a last resort, spill op0 to memory, and reload it in a
 	 different mode.  */
       else if (!MEM_P (op0))
Index: expr.h
===================================================================
--- expr.h	(.../trunk/gcc)	(revision 228112)
+++ expr.h	(.../branches/scalar-storage-order/gcc)	(revision 228133)
@@ -229,8 +229,8 @@  extern void expand_assignment (tree, tre
    and storing the value into TARGET.
    If SUGGEST_REG is nonzero, copy the value through a register
    and return that register, if that is possible.  */
-extern rtx store_expr_with_bounds (tree, rtx, int, bool, tree);
-extern rtx store_expr (tree, rtx, int, bool);
+extern rtx store_expr_with_bounds (tree, rtx, int, bool, bool, tree);
+extern rtx store_expr (tree, rtx, int, bool, bool);
 
 /* Given an rtx that may include add and multiply operations,
    generate them as insns and return a pseudo-reg containing the value.
Index: stor-layout.c
===================================================================
--- stor-layout.c	(.../trunk/gcc)	(revision 228112)
+++ stor-layout.c	(.../branches/scalar-storage-order/gcc)	(revision 228133)
@@ -2046,11 +2046,16 @@  finish_record_layout (record_layout_info
   /* Compute bitfield representatives.  */
   finish_bitfield_layout (rli->t);
 
-  /* Propagate TYPE_PACKED to variants.  With C++ templates,
-     handle_packed_attribute is too early to do this.  */
+  /* Propagate TYPE_PACKED and TYPE_REVERSE_STORAGE_ORDER to variants.
+     With C++ templates, it is too early to do this when the attribute
+     is being parsed.  */
   for (variant = TYPE_NEXT_VARIANT (rli->t); variant;
        variant = TYPE_NEXT_VARIANT (variant))
-    TYPE_PACKED (variant) = TYPE_PACKED (rli->t);
+    {
+      TYPE_PACKED (variant) = TYPE_PACKED (rli->t);
+      TYPE_REVERSE_STORAGE_ORDER (variant)
+	= TYPE_REVERSE_STORAGE_ORDER (rli->t);
+    }
 
   /* Lay out any static members.  This is done now because their type
      may use the record's type.  */
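
For context, this propagation matters because the user-level attribute added
by this series (the one that sets TYPE_REVERSE_STORAGE_ORDER) must behave
identically on all variants of a type.  A sketch of the feature; it only
compiles with the series applied:

#include <stdio.h>

/* Every scalar member is stored in big-endian order, regardless of the
   target's endianness.  */
struct __attribute__((scalar_storage_order ("big-endian"))) packet
{
  unsigned int magic;
};

int
main (void)
{
  struct packet p = { 0x12345678u };
  unsigned char *b = (unsigned char *) &p;
  /* On any host, the bytes are 12 34 56 78.  */
  printf ("%02x %02x %02x %02x\n", b[0], b[1], b[2], b[3]);
  return 0;
}
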
Index: gimplify.c
===================================================================
--- gimplify.c	(.../trunk/gcc)	(revision 228112)
+++ gimplify.c	(.../branches/scalar-storage-order/gcc)	(revision 228133)
@@ -8256,6 +8256,8 @@  gimplify_expr (tree *expr_p, gimple_seq
 			     TREE_OPERAND (*expr_p, 1));
 	  if (tmp)
 	    {
+	      REF_REVERSE_STORAGE_ORDER (tmp)
+	        = REF_REVERSE_STORAGE_ORDER (*expr_p);
 	      *expr_p = tmp;
 	      recalculate_side_effects (*expr_p);
 	      ret = GS_OK;
Index: calls.c
===================================================================
--- calls.c	(.../trunk/gcc)	(revision 228112)
+++ calls.c	(.../branches/scalar-storage-order/gcc)	(revision 228133)
@@ -1086,7 +1086,7 @@  store_unaligned_arguments_into_pseudos (
 
 	    args[i].aligned_regs[j] = reg;
 	    word = extract_bit_field (word, bitsize, 0, 1, NULL_RTX,
-				      word_mode, word_mode);
+				      word_mode, word_mode, false);
 
 	    /* There is no need to restrict this code to loading items
 	       in TYPE_ALIGN sized hunks.  The bitfield instructions can
@@ -1103,7 +1103,7 @@  store_unaligned_arguments_into_pseudos (
 
 	    bytes -= bitsize / BITS_PER_UNIT;
 	    store_bit_field (reg, bitsize, endian_correction, 0, 0,
-			     word_mode, word);
+			     word_mode, word, false);
 	  }
       }
 }
@@ -1384,7 +1384,7 @@  initialize_argument_information (int num
 	      else
 		copy = assign_temp (type, 1, 0);
 
-	      store_expr (args[i].tree_value, copy, 0, false);
+	      store_expr (args[i].tree_value, copy, 0, false, false);
 
 	      /* Just change the const function to pure and then let
 		 the next test clear the pure based on
@@ -2105,8 +2105,8 @@  load_register_parameters (struct arg_dat
 		  rtx dest = gen_rtx_REG (word_mode, REGNO (reg) + nregs - 1);
 		  unsigned int bitoff = (nregs - 1) * BITS_PER_WORD;
 		  unsigned int bitsize = size * BITS_PER_UNIT - bitoff;
-		  rtx x = extract_bit_field (mem, bitsize, bitoff, 1,
-					     dest, word_mode, word_mode);
+		  rtx x = extract_bit_field (mem, bitsize, bitoff, 1, dest,
+					     word_mode, word_mode, false);
 		  if (BYTES_BIG_ENDIAN)
 		    x = expand_shift (LSHIFT_EXPR, word_mode, x,
 				      BITS_PER_WORD - bitsize, dest, 1);
Index: expmed.c
===================================================================
--- expmed.c	(.../trunk/gcc)	(revision 228112)
+++ expmed.c	(.../branches/scalar-storage-order/gcc)	(revision 228133)
@@ -57,24 +57,24 @@  static void store_fixed_bit_field (rtx,
 				   unsigned HOST_WIDE_INT,
 				   unsigned HOST_WIDE_INT,
 				   unsigned HOST_WIDE_INT,
-				   rtx);
+				   rtx, bool);
 static void store_fixed_bit_field_1 (rtx, unsigned HOST_WIDE_INT,
 				     unsigned HOST_WIDE_INT,
-				     rtx);
+				     rtx, bool);
 static void store_split_bit_field (rtx, unsigned HOST_WIDE_INT,
 				   unsigned HOST_WIDE_INT,
 				   unsigned HOST_WIDE_INT,
 				   unsigned HOST_WIDE_INT,
-				   rtx);
+				   rtx, bool);
 static rtx extract_fixed_bit_field (machine_mode, rtx,
 				    unsigned HOST_WIDE_INT,
-				    unsigned HOST_WIDE_INT, rtx, int);
+				    unsigned HOST_WIDE_INT, rtx, int, bool);
 static rtx extract_fixed_bit_field_1 (machine_mode, rtx,
 				      unsigned HOST_WIDE_INT,
-				      unsigned HOST_WIDE_INT, rtx, int);
+				      unsigned HOST_WIDE_INT, rtx, int, bool);
 static rtx lshift_value (machine_mode, unsigned HOST_WIDE_INT, int);
 static rtx extract_split_bit_field (rtx, unsigned HOST_WIDE_INT,
-				    unsigned HOST_WIDE_INT, int);
+				    unsigned HOST_WIDE_INT, int, bool);
 static void do_cmp_and_jump (rtx, rtx, enum rtx_code, machine_mode, rtx_code_label *);
 static rtx expand_smod_pow2 (machine_mode, rtx, HOST_WIDE_INT);
 static rtx expand_sdiv_pow2 (machine_mode, rtx, HOST_WIDE_INT);
@@ -332,6 +332,94 @@  negate_rtx (machine_mode mode, rtx x)
   return result;
 }
 
+/* Whether reverse storage order is supported on the target.  */
+static int reverse_storage_order_supported = -1;
+
+/* Check whether reverse storage order is supported on the target.  */
+
+static void
+check_reverse_storage_order_support (void)
+{
+  if (BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN)
+    {
+      reverse_storage_order_supported = 0;
+      sorry ("reverse scalar storage order");
+    }
+  else
+    reverse_storage_order_supported = 1;
+}
+
+/* Whether reverse FP storage order is supported on the target.  */
+static int reverse_float_storage_order_supported = -1;
+
+/* Check whether reverse FP storage order is supported on the target.  */
+
+static void
+check_reverse_float_storage_order_support (void)
+{
+  if (FLOAT_WORDS_BIG_ENDIAN != WORDS_BIG_ENDIAN)
+    {
+      reverse_float_storage_order_supported = 0;
+      sorry ("reverse floating-point scalar storage order");
+    }
+  else
+    reverse_float_storage_order_supported = 1;
+}
+
+/* Return an rtx representing value of X with reverse storage order.
+   MODE is the intended mode of the result,
+   useful if X is a CONST_INT.  */
+
+rtx
+flip_storage_order (machine_mode mode, rtx x)
+{
+  machine_mode int_mode;
+  rtx result;
+
+  if (mode == QImode)
+    return x;
+
+  if (COMPLEX_MODE_P (mode))
+    {
+      rtx real = read_complex_part (x, false);
+      rtx imag = read_complex_part (x, true);
+
+      real = flip_storage_order (GET_MODE_INNER (mode), real);
+      imag = flip_storage_order (GET_MODE_INNER (mode), imag);
+
+      return gen_rtx_CONCAT (mode, real, imag);
+    }
+
+  if (__builtin_expect (reverse_storage_order_supported < 0, 0))
+    check_reverse_storage_order_support ();
+
+  if (SCALAR_INT_MODE_P (mode))
+    int_mode = mode;
+  else
+    {
+      if (FLOAT_MODE_P (mode)
+	  && __builtin_expect (reverse_float_storage_order_supported < 0, 0))
+	check_reverse_float_storage_order_support ();
+
+      int_mode = mode_for_size (GET_MODE_PRECISION (mode), MODE_INT, 0);
+      if (int_mode == BLKmode)
+	{
+	  sorry ("reverse storage order for %smode", GET_MODE_NAME (mode));
+	  return x;
+	}
+      x = gen_lowpart (int_mode, x);
+    }
+
+  result = simplify_unary_operation (BSWAP, int_mode, x, int_mode);
+  if (result == 0)
+    result = expand_unop (int_mode, bswap_optab, x, NULL_RTX, 1);
+
+  if (int_mode != mode)
+    result = gen_lowpart (mode, result);
+
+  return result;
+}
+
 /* Adjust bitfield memory MEM so that it points to the first unit of mode
    MODE that contains a bitfield of size BITSIZE at bit position BITNUM.
    If MODE is BLKmode, return a reference to every byte in the bitfield.
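
As a sanity check on the semantics: for a scalar, the reversal implemented
above is a plain byte swap, done in an equivalent integer mode for FP values.
A standalone sketch of the 32-bit case, not GCC internals, with a made-up
helper name:

#include <stdint.h>
#include <stdio.h>

/* The BSWAP operation that flip_storage_order expands to via
   simplify_unary_operation/expand_unop.  */
static uint32_t
flip32 (uint32_t x)
{
  return ((x & 0x000000ffu) << 24) | ((x & 0x0000ff00u) << 8)
	 | ((x & 0x00ff0000u) >> 8) | ((x & 0xff000000u) >> 24);
}

int
main (void)
{
  printf ("%#x\n", (unsigned) flip32 (0x11223344u));	/* 0x44332211 */
  return 0;
}
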
@@ -635,7 +723,7 @@  store_bit_field_1 (rtx str_rtx, unsigned
 		   unsigned HOST_WIDE_INT bitregion_start,
 		   unsigned HOST_WIDE_INT bitregion_end,
 		   machine_mode fieldmode,
-		   rtx value, bool fallback_p)
+		   rtx value, bool reverse, bool fallback_p)
 {
   rtx op0 = str_rtx;
   rtx orig_value;
@@ -713,6 +801,8 @@  store_bit_field_1 (rtx str_rtx, unsigned
 	  sub = simplify_gen_subreg (GET_MODE (op0), value, fieldmode, 0);
 	  if (sub)
 	    {
+	      if (reverse)
+		sub = flip_storage_order (GET_MODE (op0), sub);
 	      emit_move_insn (op0, sub);
 	      return true;
 	    }
@@ -723,6 +813,8 @@  store_bit_field_1 (rtx str_rtx, unsigned
 				     bitnum / BITS_PER_UNIT);
 	  if (sub)
 	    {
+	      if (reverse)
+		value = flip_storage_order (fieldmode, value);
 	      emit_move_insn (sub, value);
 	      return true;
 	    }
@@ -735,6 +827,8 @@  store_bit_field_1 (rtx str_rtx, unsigned
   if (simple_mem_bitfield_p (op0, bitsize, bitnum, fieldmode))
     {
       op0 = adjust_bitfield_address (op0, fieldmode, bitnum / BITS_PER_UNIT);
+      if (reverse)
+	value = flip_storage_order (fieldmode, value);
       emit_move_insn (op0, value);
       return true;
     }
@@ -761,6 +855,7 @@  store_bit_field_1 (rtx str_rtx, unsigned
      can be done with a movstrict instruction.  */
 
   if (!MEM_P (op0)
+      && !reverse
       && lowpart_bit_field_p (bitnum, bitsize, GET_MODE (op0))
       && bitsize == GET_MODE_BITSIZE (fieldmode)
       && optab_handler (movstrict_optab, fieldmode) != CODE_FOR_nothing)
@@ -804,7 +899,7 @@  store_bit_field_1 (rtx str_rtx, unsigned
 	 be less than full.
 	 However, only do that if the value is not BLKmode.  */
 
-      unsigned int backwards = WORDS_BIG_ENDIAN && fieldmode != BLKmode;
+      const bool backwards = WORDS_BIG_ENDIAN && fieldmode != BLKmode;
       unsigned int nwords = (bitsize + (BITS_PER_WORD - 1)) / BITS_PER_WORD;
       unsigned int i;
       rtx_insn *last;
@@ -827,7 +922,7 @@  store_bit_field_1 (rtx str_rtx, unsigned
 				  ? GET_MODE_SIZE (fieldmode) / UNITS_PER_WORD
 				  - i - 1
 				  : i);
-	  unsigned int bit_offset = (backwards
+	  unsigned int bit_offset = (backwards ^ reverse
 				     ? MAX ((int) bitsize - ((int) i + 1)
 					    * BITS_PER_WORD,
 					    0)
@@ -851,7 +946,7 @@  store_bit_field_1 (rtx str_rtx, unsigned
 				  bitnum + bit_offset,
 				  bitregion_start, bitregion_end,
 				  word_mode,
-				  value_word, fallback_p))
+				  value_word, reverse, fallback_p))
 	    {
 	      delete_insns_since (last);
 	      return false;
@@ -887,7 +982,7 @@  store_bit_field_1 (rtx str_rtx, unsigned
 	    return false;
 
 	  store_split_bit_field (op0, bitsize, bitnum, bitregion_start,
-				 bitregion_end, value);
+				 bitregion_end, value, reverse);
 	  return true;
 	}
     }
@@ -898,6 +993,7 @@  store_bit_field_1 (rtx str_rtx, unsigned
 
   extraction_insn insv;
   if (!MEM_P (op0)
+      && !reverse
       && get_best_reg_extraction_insn (&insv, EP_insv,
 				       GET_MODE_BITSIZE (GET_MODE (op0)),
 				       fieldmode)
@@ -906,7 +1002,7 @@  store_bit_field_1 (rtx str_rtx, unsigned
 
   /* If OP0 is a memory, try copying it to a register and seeing if a
      cheap register alternative is available.  */
-  if (MEM_P (op0))
+  if (MEM_P (op0) && !reverse)
     {
       if (get_best_mem_extraction_insn (&insv, EP_insv, bitsize, bitnum,
 					fieldmode)
@@ -926,7 +1022,7 @@  store_bit_field_1 (rtx str_rtx, unsigned
 	  rtx tempreg = copy_to_reg (xop0);
 	  if (store_bit_field_1 (tempreg, bitsize, bitpos,
 				 bitregion_start, bitregion_end,
-				 fieldmode, orig_value, false))
+				 fieldmode, orig_value, reverse, false))
 	    {
 	      emit_move_insn (xop0, tempreg);
 	      return true;
@@ -939,7 +1035,7 @@  store_bit_field_1 (rtx str_rtx, unsigned
     return false;
 
   store_fixed_bit_field (op0, bitsize, bitnum, bitregion_start,
-			 bitregion_end, value);
+			 bitregion_end, value, reverse);
   return true;
 }
 
@@ -952,7 +1048,9 @@  store_bit_field_1 (rtx str_rtx, unsigned
    These two fields are 0, if the C++ memory model does not apply,
    or we are not interested in keeping track of bitfield regions.
 
-   FIELDMODE is the machine-mode of the FIELD_DECL node for this field.  */
+   FIELDMODE is the machine-mode of the FIELD_DECL node for this field.
+
+   If REVERSE is true, the store is to be done in reverse order.  */
 
 void
 store_bit_field (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
@@ -960,7 +1058,7 @@  store_bit_field (rtx str_rtx, unsigned H
 		 unsigned HOST_WIDE_INT bitregion_start,
 		 unsigned HOST_WIDE_INT bitregion_end,
 		 machine_mode fieldmode,
-		 rtx value)
+		 rtx value, bool reverse)
 {
   /* Handle -fstrict-volatile-bitfields in the cases where it applies.  */
   if (strict_volatile_bitfield_p (str_rtx, bitsize, bitnum, fieldmode,
@@ -974,6 +1072,8 @@  store_bit_field (rtx str_rtx, unsigned H
 	{
 	  str_rtx = adjust_bitfield_address (str_rtx, fieldmode,
 					     bitnum / BITS_PER_UNIT);
+	  if (reverse)
+	    value = flip_storage_order (fieldmode, value);
 	  gcc_assert (bitnum % BITS_PER_UNIT == 0);
 	  emit_move_insn (str_rtx, value);
 	}
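
This is the store-side mirror of the load sketch under the expr.c MEM_REF
hunk: swap first, then do the plain move.  A standalone sketch, again
assuming a little-endian host and a made-up helper name:

#include <stdint.h>
#include <string.h>

static void
store_be32 (unsigned char *p, uint32_t v)
{
  v = __builtin_bswap32 (v);	/* the flip_storage_order step */
  memcpy (p, &v, sizeof v);	/* the plain emit_move_insn */
}

int
main (void)
{
  unsigned char buf[4];
  store_be32 (buf, 0x12345678u);
  return buf[0] == 0x12 && buf[3] == 0x78 ? 0 : 1;
}
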
@@ -986,7 +1086,7 @@  store_bit_field (rtx str_rtx, unsigned H
 	  gcc_assert (bitnum + bitsize <= GET_MODE_BITSIZE (fieldmode));
 	  temp = copy_to_reg (str_rtx);
 	  if (!store_bit_field_1 (temp, bitsize, bitnum, 0, 0,
-				  fieldmode, value, true))
+				  fieldmode, value, reverse, true))
 	    gcc_unreachable ();
 
 	  emit_move_insn (str_rtx, temp);
@@ -1019,19 +1119,21 @@  store_bit_field (rtx str_rtx, unsigned H
 
   if (!store_bit_field_1 (str_rtx, bitsize, bitnum,
 			  bitregion_start, bitregion_end,
-			  fieldmode, value, true))
+			  fieldmode, value, reverse, true))
     gcc_unreachable ();
 }
 
 /* Use shifts and boolean operations to store VALUE into a bit field of
-   width BITSIZE in OP0, starting at bit BITNUM.  */
+   width BITSIZE in OP0, starting at bit BITNUM.
+
+   If REVERSE is true, the store is to be done in reverse order.  */
 
 static void
 store_fixed_bit_field (rtx op0, unsigned HOST_WIDE_INT bitsize,
 		       unsigned HOST_WIDE_INT bitnum,
 		       unsigned HOST_WIDE_INT bitregion_start,
 		       unsigned HOST_WIDE_INT bitregion_end,
-		       rtx value)
+		       rtx value, bool reverse)
 {
   /* There is a case not handled here:
      a structure with a known alignment of just a halfword
@@ -1054,14 +1156,14 @@  store_fixed_bit_field (rtx op0, unsigned
 	  /* The only way this should occur is if the field spans word
 	     boundaries.  */
 	  store_split_bit_field (op0, bitsize, bitnum, bitregion_start,
-				 bitregion_end, value);
+				 bitregion_end, value, reverse);
 	  return;
 	}
 
       op0 = narrow_bit_field_mem (op0, mode, bitsize, bitnum, &bitnum);
     }
 
-  store_fixed_bit_field_1 (op0, bitsize, bitnum, value);
+  store_fixed_bit_field_1 (op0, bitsize, bitnum, value, reverse);
 }
 
 /* Helper function for store_fixed_bit_field, stores
@@ -1070,7 +1172,7 @@  store_fixed_bit_field (rtx op0, unsigned
 static void
 store_fixed_bit_field_1 (rtx op0, unsigned HOST_WIDE_INT bitsize,
 			 unsigned HOST_WIDE_INT bitnum,
-			 rtx value)
+			 rtx value, bool reverse)
 {
   machine_mode mode;
   rtx temp;
@@ -1083,7 +1185,7 @@  store_fixed_bit_field_1 (rtx op0, unsign
   /* Note that bitsize + bitnum can be greater than GET_MODE_BITSIZE (mode)
      for invalid input, such as f5 from gcc.dg/pr48335-2.c.  */
 
-  if (BYTES_BIG_ENDIAN)
+  if (reverse ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
     /* BITNUM is the distance between our msb
        and that of the containing datum.
        Convert it to the distance from the lsb.  */
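
The `reverse ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN` test used here and
throughout the series is simply the exclusive or of the two flags, which is
also how the word offsets compute it (`backwards ^ reverse`).  A standalone
sketch, names hypothetical:

#include <assert.h>
#include <stdbool.h>

static bool
effective_big_endian (bool big_endian, bool reverse)
{
  return reverse ? !big_endian : big_endian;
}

int
main (void)
{
  for (int be = 0; be < 2; be++)
    for (int rev = 0; rev < 2; rev++)
      assert (effective_big_endian (be, rev) == (bool) (be ^ rev));
  return 0;
}
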
@@ -1129,6 +1231,9 @@  store_fixed_bit_field_1 (rtx op0, unsign
 			      bitnum, NULL_RTX, 1);
     }
 
+  if (reverse)
+    value = flip_storage_order (mode, value);
+
   /* Now clear the chosen bits in OP0,
      except that if VALUE is -1 we need not bother.  */
   /* We keep the intermediates in registers to allow CSE to combine
@@ -1138,8 +1243,10 @@  store_fixed_bit_field_1 (rtx op0, unsign
 
   if (! all_one)
     {
-      temp = expand_binop (mode, and_optab, temp,
-			   mask_rtx (mode, bitnum, bitsize, 1),
+      rtx mask = mask_rtx (mode, bitnum, bitsize, 1);
+      if (reverse)
+	mask = flip_storage_order (mode, mask);
+      temp = expand_binop (mode, and_optab, temp, mask,
 			   NULL_RTX, 1, OPTAB_LIB_WIDEN);
       temp = force_reg (mode, temp);
     }
@@ -1167,6 +1274,8 @@  store_fixed_bit_field_1 (rtx op0, unsign
    (within the word).
    VALUE is the value to store.
 
+   If REVERSE is true, the store is to be done in reverse order.
+
    This does not yet handle fields wider than BITS_PER_WORD.  */
 
 static void
@@ -1174,10 +1283,9 @@  store_split_bit_field (rtx op0, unsigned
 		       unsigned HOST_WIDE_INT bitpos,
 		       unsigned HOST_WIDE_INT bitregion_start,
 		       unsigned HOST_WIDE_INT bitregion_end,
-		       rtx value)
+		       rtx value, bool reverse)
 {
-  unsigned int unit;
-  unsigned int bitsdone = 0;
+  unsigned int unit, total_bits, bitsdone = 0;
 
   /* Make sure UNIT isn't larger than BITS_PER_WORD, we can only handle that
      much at a time.  */
@@ -1208,12 +1316,14 @@  store_split_bit_field (rtx op0, unsigned
 					       : word_mode, value));
     }
 
+  total_bits = GET_MODE_BITSIZE (GET_MODE (value));
+
   while (bitsdone < bitsize)
     {
       unsigned HOST_WIDE_INT thissize;
-      rtx part, word;
       unsigned HOST_WIDE_INT thispos;
       unsigned HOST_WIDE_INT offset;
+      rtx part, word;
 
       offset = (bitpos + bitsdone) / unit;
       thispos = (bitpos + bitsdone) % unit;
@@ -1238,13 +1348,18 @@  store_split_bit_field (rtx op0, unsigned
       thissize = MIN (bitsize - bitsdone, BITS_PER_WORD);
       thissize = MIN (thissize, unit - thispos);
 
-      if (BYTES_BIG_ENDIAN)
+      if (reverse ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
 	{
 	  /* Fetch successively less significant portions.  */
 	  if (CONST_INT_P (value))
 	    part = GEN_INT (((unsigned HOST_WIDE_INT) (INTVAL (value))
 			     >> (bitsize - bitsdone - thissize))
 			    & (((HOST_WIDE_INT) 1 << thissize) - 1));
+	  /* Likewise, but the source is little-endian.  */
+          else if (reverse)
+	    part = extract_fixed_bit_field (word_mode, value, thissize,
+					    bitsize - bitsdone - thissize,
+					    NULL_RTX, 1, false);
 	  else
 	    {
 	      int total_bits = GET_MODE_BITSIZE (GET_MODE (value));
@@ -1253,7 +1368,7 @@  store_split_bit_field (rtx op0, unsigned
 		 endianness compensation) to fetch the piece we want.  */
 	      part = extract_fixed_bit_field (word_mode, value, thissize,
 					      total_bits - bitsize + bitsdone,
-					      NULL_RTX, 1);
+					      NULL_RTX, 1, false);
 	    }
 	}
       else
@@ -1263,9 +1378,14 @@  store_split_bit_field (rtx op0, unsigned
 	    part = GEN_INT (((unsigned HOST_WIDE_INT) (INTVAL (value))
 			     >> bitsdone)
 			    & (((HOST_WIDE_INT) 1 << thissize) - 1));
+	  /* Likewise, but the source is big-endian.  */
+	  else if (reverse)
+	    part = extract_fixed_bit_field (word_mode, value, thissize,
+					    total_bits - bitsdone - thissize,
+					    NULL_RTX, 1, false);
 	  else
 	    part = extract_fixed_bit_field (word_mode, value, thissize,
-					    bitsdone, NULL_RTX, 1);
+					    bitsdone, NULL_RTX, 1, false);
 	}
 
       /* If OP0 is a register, then handle OFFSET here.
@@ -1303,7 +1423,8 @@  store_split_bit_field (rtx op0, unsigned
 	 it is just an out-of-bounds access.  Ignore it.  */
       if (word != const0_rtx)
 	store_fixed_bit_field (word, thissize, offset * unit + thispos,
-			       bitregion_start, bitregion_end, part);
+			       bitregion_start, bitregion_end, part,
+			       reverse);
       bitsdone += thissize;
     }
 }
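
When the field spans a unit boundary and REVERSE is set, the pieces are
fetched starting from the most significant end of VALUE, exactly as on a
native big-endian target.  A standalone sketch of that loop with 8-bit units
and made-up values:

#include <stdint.h>
#include <stdio.h>

int
main (void)
{
  uint32_t value = 0xABCDEFu;	/* 24 significant bits */
  unsigned bitsize = 24, bitsdone = 0;

  while (bitsdone < bitsize)
    {
      unsigned thissize = 8;	/* one unit per iteration */
      /* Reverse (or big-endian) case: successively less significant
	 portions, i.e. the high-order piece first.  */
      uint32_t part = (value >> (bitsize - bitsdone - thissize)) & 0xffu;
      printf ("%#x\n", (unsigned) part);	/* 0xab, 0xcd, 0xef */
      bitsdone += thissize;
    }
  return 0;
}
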
@@ -1428,7 +1549,7 @@  static rtx
 extract_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
 		     unsigned HOST_WIDE_INT bitnum, int unsignedp, rtx target,
 		     machine_mode mode, machine_mode tmode,
-		     bool fallback_p)
+		     bool reverse, bool fallback_p)
 {
   rtx op0 = str_rtx;
   machine_mode int_mode;
@@ -1454,6 +1575,8 @@  extract_bit_field_1 (rtx str_rtx, unsign
       && bitnum == 0
       && bitsize == GET_MODE_BITSIZE (GET_MODE (op0)))
     {
+      if (reverse)
+	op0 = flip_storage_order (mode, op0);
       /* We're trying to extract a full register from itself.  */
       return op0;
     }
@@ -1570,6 +1693,7 @@  extract_bit_field_1 (rtx str_rtx, unsign
      as the least significant bit of the value is the least significant
      bit of either OP0 or a word of OP0.  */
   if (!MEM_P (op0)
+      && !reverse
       && lowpart_bit_field_p (bitnum, bitsize, GET_MODE (op0))
       && bitsize == GET_MODE_BITSIZE (mode1)
       && TRULY_NOOP_TRUNCATION_MODES_P (mode1, GET_MODE (op0)))
@@ -1585,6 +1709,8 @@  extract_bit_field_1 (rtx str_rtx, unsign
   if (simple_mem_bitfield_p (op0, bitsize, bitnum, mode1))
     {
       op0 = adjust_bitfield_address (op0, mode1, bitnum / BITS_PER_UNIT);
+      if (reverse)
+	op0 = flip_storage_order (mode1, op0);
       return convert_extracted_bit_field (op0, mode, tmode, unsignedp);
     }
 
@@ -1597,7 +1723,7 @@  extract_bit_field_1 (rtx str_rtx, unsign
 	 This is because the most significant word is the one which may
 	 be less than full.  */
 
-      unsigned int backwards = WORDS_BIG_ENDIAN;
+      const bool backwards = WORDS_BIG_ENDIAN;
       unsigned int nwords = (bitsize + (BITS_PER_WORD - 1)) / BITS_PER_WORD;
       unsigned int i;
       rtx_insn *last;
@@ -1624,7 +1750,7 @@  extract_bit_field_1 (rtx str_rtx, unsign
 	       ? GET_MODE_SIZE (GET_MODE (target)) / UNITS_PER_WORD - i - 1
 	       : i);
 	  /* Offset from start of field in OP0.  */
-	  unsigned int bit_offset = (backwards
+	  unsigned int bit_offset = (backwards ^ reverse
 				     ? MAX ((int) bitsize - ((int) i + 1)
 					    * BITS_PER_WORD,
 					    0)
@@ -1634,7 +1760,7 @@  extract_bit_field_1 (rtx str_rtx, unsign
 	    = extract_bit_field_1 (op0, MIN (BITS_PER_WORD,
 					     bitsize - i * BITS_PER_WORD),
 				   bitnum + bit_offset, 1, target_part,
-				   mode, word_mode, fallback_p);
+				   mode, word_mode, reverse, fallback_p);
 
 	  gcc_assert (target_part);
 	  if (!result_part)
@@ -1684,7 +1810,8 @@  extract_bit_field_1 (rtx str_rtx, unsign
 	{
 	  if (!fallback_p)
 	    return NULL_RTX;
-	  target = extract_split_bit_field (op0, bitsize, bitnum, unsignedp);
+	  target = extract_split_bit_field (op0, bitsize, bitnum, unsignedp,
+					    reverse);
 	  return convert_extracted_bit_field (target, mode, tmode, unsignedp);
 	}
     }
@@ -1694,6 +1821,7 @@  extract_bit_field_1 (rtx str_rtx, unsign
   enum extraction_pattern pattern = unsignedp ? EP_extzv : EP_extv;
   extraction_insn extv;
   if (!MEM_P (op0)
+      && !reverse
       /* ??? We could limit the structure size to the part of OP0 that
 	 contains the field, with appropriate checks for endianness
 	 and TRULY_NOOP_TRUNCATION.  */
@@ -1710,7 +1838,7 @@  extract_bit_field_1 (rtx str_rtx, unsign
 
   /* If OP0 is a memory, try copying it to a register and seeing if a
      cheap register alternative is available.  */
-  if (MEM_P (op0))
+  if (MEM_P (op0) && !reverse)
     {
       if (get_best_mem_extraction_insn (&extv, pattern, bitsize, bitnum,
 					tmode))
@@ -1735,7 +1863,7 @@  extract_bit_field_1 (rtx str_rtx, unsign
 	  xop0 = copy_to_reg (xop0);
 	  rtx result = extract_bit_field_1 (xop0, bitsize, bitpos,
 					    unsignedp, target,
-					    mode, tmode, false);
+					    mode, tmode, reverse, false);
 	  if (result)
 	    return result;
 	  delete_insns_since (last);
@@ -1753,9 +1881,21 @@  extract_bit_field_1 (rtx str_rtx, unsign
   /* Should probably push op0 out to memory and then do a load.  */
   gcc_assert (int_mode != BLKmode);
 
-  target = extract_fixed_bit_field (int_mode, op0, bitsize, bitnum,
-				    target, unsignedp);
-  return convert_extracted_bit_field (target, mode, tmode, unsignedp);
+  target = extract_fixed_bit_field (int_mode, op0, bitsize, bitnum, target,
+				    unsignedp, reverse);
+
+  /* Complex values must be reversed piecewise, so we need to undo the global
+     reversal, convert to the complex mode and reverse again.  */
+  if (reverse && COMPLEX_MODE_P (tmode))
+    {
+      target = flip_storage_order (int_mode, target);
+      target = convert_extracted_bit_field (target, mode, tmode, unsignedp);
+      target = flip_storage_order (tmode, target);
+    }
+  else
+    target = convert_extracted_bit_field (target, mode, tmode, unsignedp);
+
+  return target;
 }
 
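The piecewise point above is the subtle one: a reverse-order complex is two
independently swapped parts, not one wide byte swap, hence the
flip/convert/flip sequence.  A standalone sketch with a struct standing in
for a complex mode; names hypothetical:

#include <stdint.h>
#include <stdio.h>

struct cplx { uint32_t re, im; };

static struct cplx
flip_parts (struct cplx c)
{
  c.re = __builtin_bswap32 (c.re);
  c.im = __builtin_bswap32 (c.im);
  return c;
}

int
main (void)
{
  struct cplx c = flip_parts ((struct cplx) { 0x11223344u, 0x55667788u });
  /* Per-part: 0x44332211 0x88776655; a single 64-bit bswap would also
     have exchanged the two parts.  */
  printf ("%#x %#x\n", (unsigned) c.re, (unsigned) c.im);
  return 0;
}
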
 /* Generate code to extract a byte-field from STR_RTX
@@ -1769,6 +1909,8 @@  extract_bit_field_1 (rtx str_rtx, unsign
    TMODE is the mode the caller would like the value to have;
    but the value may be returned with type MODE instead.
 
+   If REVERSE is true, the extraction is to be done in reverse order.
+
    If a TARGET is specified and we can store in it at no extra cost,
    we do so, and return TARGET.
    Otherwise, we return a REG of mode TMODE or MODE, with TMODE preferred
@@ -1777,7 +1919,7 @@  extract_bit_field_1 (rtx str_rtx, unsign
 rtx
 extract_bit_field (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
 		   unsigned HOST_WIDE_INT bitnum, int unsignedp, rtx target,
-		   machine_mode mode, machine_mode tmode)
+		   machine_mode mode, machine_mode tmode, bool reverse)
 {
   machine_mode mode1;
 
@@ -1799,6 +1941,8 @@  extract_bit_field (rtx str_rtx, unsigned
 	{
 	  rtx result = adjust_bitfield_address (str_rtx, mode1,
 						bitnum / BITS_PER_UNIT);
+	  if (reverse)
+	    result = flip_storage_order (mode1, result);
 	  gcc_assert (bitnum % BITS_PER_UNIT == 0);
 	  return convert_extracted_bit_field (result, mode, tmode, unsignedp);
 	}
@@ -1810,13 +1954,15 @@  extract_bit_field (rtx str_rtx, unsigned
     }
 
   return extract_bit_field_1 (str_rtx, bitsize, bitnum, unsignedp,
-			      target, mode, tmode, true);
+			      target, mode, tmode, reverse, true);
 }
 
 /* Use shifts and boolean operations to extract a field of BITSIZE bits
    from bit BITNUM of OP0.
 
    UNSIGNEDP is nonzero for an unsigned bit field (don't sign-extend value).
+   If REVERSE is true, the extraction is to be done in reverse order.
+
    If TARGET is nonzero, attempts to store the value there
    and return TARGET, but this is not guaranteed.
    If TARGET is not used, create a pseudo-reg of mode TMODE for the value.  */
@@ -1825,7 +1971,7 @@  static rtx
 extract_fixed_bit_field (machine_mode tmode, rtx op0,
 			 unsigned HOST_WIDE_INT bitsize,
 			 unsigned HOST_WIDE_INT bitnum, rtx target,
-			 int unsignedp)
+			 int unsignedp, bool reverse)
 {
   if (MEM_P (op0))
     {
@@ -1836,13 +1982,14 @@  extract_fixed_bit_field (machine_mode tm
       if (mode == VOIDmode)
 	/* The only way this should occur is if the field spans word
 	   boundaries.  */
-	return extract_split_bit_field (op0, bitsize, bitnum, unsignedp);
+	return extract_split_bit_field (op0, bitsize, bitnum, unsignedp,
+					reverse);
 
       op0 = narrow_bit_field_mem (op0, mode, bitsize, bitnum, &bitnum);
     }
 
   return extract_fixed_bit_field_1 (tmode, op0, bitsize, bitnum,
-				    target, unsignedp);
+				    target, unsignedp, reverse);
 }
 
 /* Helper function for extract_fixed_bit_field, extracts
@@ -1852,7 +1999,7 @@  static rtx
 extract_fixed_bit_field_1 (machine_mode tmode, rtx op0,
 			   unsigned HOST_WIDE_INT bitsize,
 			   unsigned HOST_WIDE_INT bitnum, rtx target,
-			   int unsignedp)
+			   int unsignedp, bool reverse)
 {
   machine_mode mode = GET_MODE (op0);
   gcc_assert (SCALAR_INT_MODE_P (mode));
@@ -1861,13 +2008,15 @@  extract_fixed_bit_field_1 (machine_mode
      for invalid input, such as extract equivalent of f5 from
      gcc.dg/pr48335-2.c.  */
 
-  if (BYTES_BIG_ENDIAN)
+  if (reverse ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
     /* BITNUM is the distance between our msb and that of OP0.
        Convert it to the distance from the lsb.  */
     bitnum = GET_MODE_BITSIZE (mode) - bitsize - bitnum;
 
   /* Now BITNUM is always the distance between the field's lsb and that of OP0.
      We have reduced the big-endian case to the little-endian case.  */
+  if (reverse)
+    op0 = flip_storage_order (mode, op0);
 
   if (unsignedp)
     {
@@ -1939,11 +2088,14 @@  lshift_value (machine_mode mode, unsigne
 
    OP0 is the REG, SUBREG or MEM rtx for the first of the two words.
    BITSIZE is the field width; BITPOS, position of its first bit, in the word.
-   UNSIGNEDP is 1 if should zero-extend the contents; else sign-extend.  */
+   UNSIGNEDP is 1 if should zero-extend the contents; else sign-extend.
+
+   If REVERSE is true, the extraction is to be done in reverse order.  */
 
 static rtx
 extract_split_bit_field (rtx op0, unsigned HOST_WIDE_INT bitsize,
-			 unsigned HOST_WIDE_INT bitpos, int unsignedp)
+			 unsigned HOST_WIDE_INT bitpos, int unsignedp,
+			 bool reverse)
 {
   unsigned int unit;
   unsigned int bitsdone = 0;
@@ -1998,11 +2150,11 @@  extract_split_bit_field (rtx op0, unsign
 	 whose meaning is determined by BYTES_PER_UNIT.
 	 OFFSET is in UNITs, and UNIT is in bits.  */
       part = extract_fixed_bit_field (word_mode, word, thissize,
-				      offset * unit + thispos, 0, 1);
+				      offset * unit + thispos, 0, 1, reverse);
       bitsdone += thissize;
 
       /* Shift this part into place for the result.  */
-      if (BYTES_BIG_ENDIAN)
+      if (reverse ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
 	{
 	  if (bitsize != bitsdone)
 	    part = expand_shift (LSHIFT_EXPR, word_mode, part,
Index: expmed.h
===================================================================
--- expmed.h	(.../trunk/gcc)	(revision 228112)
+++ expmed.h	(.../branches/scalar-storage-order/gcc)	(revision 228133)
@@ -676,6 +676,10 @@  extern rtx emit_cstore (rtx target, enum
    May emit insns.  */
 extern rtx negate_rtx (machine_mode, rtx);
 
+/* Arguments MODE, RTX: return an rtx for the flipping of that value.
+   May emit insns.  */
+extern rtx flip_storage_order (machine_mode, rtx);
+
 /* Expand a logical AND operation.  */
 extern rtx expand_and (machine_mode, rtx, rtx, rtx);
 
@@ -707,10 +711,10 @@  extern void store_bit_field (rtx, unsign
 			     unsigned HOST_WIDE_INT,
 			     unsigned HOST_WIDE_INT,
 			     unsigned HOST_WIDE_INT,
-			     machine_mode, rtx);
+			     machine_mode, rtx, bool);
 extern rtx extract_bit_field (rtx, unsigned HOST_WIDE_INT,
 			      unsigned HOST_WIDE_INT, int, rtx,
-			      machine_mode, machine_mode);
+			      machine_mode, machine_mode, bool);
 extern rtx extract_low_bits (machine_mode, machine_mode, rtx);
 extern rtx expand_mult (machine_mode, rtx, rtx, rtx, int);
 extern rtx expand_mult_highpart_adjust (machine_mode, rtx, rtx, rtx, rtx, int);
Index: tree-dfa.c
===================================================================
--- tree-dfa.c	(.../trunk/gcc)	(revision 228112)
+++ tree-dfa.c	(.../branches/scalar-storage-order/gcc)	(revision 228133)
@@ -383,12 +383,14 @@  get_or_create_ssa_default_def (struct fu
    base variable.  The access range is delimited by bit positions *POFFSET and
    *POFFSET + *PMAX_SIZE.  The access size is *PSIZE bits.  If either
    *PSIZE or *PMAX_SIZE is -1, they could not be determined.  If *PSIZE
-   and *PMAX_SIZE are equal, the access is non-variable.  */
+   and *PMAX_SIZE are equal, the access is non-variable.  If *PREVERSE is
+   true, the storage order of the reference is reversed.  */
 
 tree
 get_ref_base_and_extent (tree exp, HOST_WIDE_INT *poffset,
 			 HOST_WIDE_INT *psize,
-			 HOST_WIDE_INT *pmax_size)
+			 HOST_WIDE_INT *pmax_size,
+			 bool *preverse)
 {
   offset_int bitsize = -1;
   offset_int maxsize;
@@ -396,7 +398,8 @@  get_ref_base_and_extent (tree exp, HOST_
   offset_int bit_offset = 0;
   bool seen_variable_array_ref = false;
 
-  /* First get the final access size from just the outermost expression.  */
+  /* First get the final access size and the storage order from just the
+     outermost expression.  */
   if (TREE_CODE (exp) == COMPONENT_REF)
     size_tree = DECL_SIZE (TREE_OPERAND (exp, 1));
   else if (TREE_CODE (exp) == BIT_FIELD_REF)
@@ -413,6 +416,8 @@  get_ref_base_and_extent (tree exp, HOST_
       && TREE_CODE (size_tree) == INTEGER_CST)
     bitsize = wi::to_offset (size_tree);
 
+  *preverse = reverse_storage_order_for_component_p (exp);
+
   /* Initially, maxsize is the same as the accessed element size.
      In the following it will only grow (or become -1).  */
   maxsize = bitsize;
Index: tree-dfa.h
===================================================================
--- tree-dfa.h	(.../trunk/gcc)	(revision 228112)
+++ tree-dfa.h	(.../branches/scalar-storage-order/gcc)	(revision 228133)
@@ -30,7 +30,7 @@  extern tree ssa_default_def (struct func
 extern void set_ssa_default_def (struct function *, tree, tree);
 extern tree get_or_create_ssa_default_def (struct function *, tree);
 extern tree get_ref_base_and_extent (tree, HOST_WIDE_INT *,
-				     HOST_WIDE_INT *, HOST_WIDE_INT *);
+				     HOST_WIDE_INT *, HOST_WIDE_INT *, bool *);
 extern tree get_addr_base_and_unit_offset_1 (tree, HOST_WIDE_INT *,
 					     tree (*) (tree));
 extern tree get_addr_base_and_unit_offset (tree, HOST_WIDE_INT *);
Index: lto/lto.c
===================================================================
--- lto/lto.c	(.../trunk/gcc)	(revision 228112)
+++ lto/lto.c	(.../branches/scalar-storage-order/gcc)	(revision 228133)
@@ -1019,7 +1019,10 @@  compare_tree_sccs_1 (tree t1, tree t2, t
   compare_values (TREE_DEPRECATED);
   if (TYPE_P (t1))
     {
-      compare_values (TYPE_SATURATING);
+      if (AGGREGATE_TYPE_P (t1))
+	compare_values (TYPE_REVERSE_STORAGE_ORDER);
+      else
+	compare_values (TYPE_SATURATING);
       compare_values (TYPE_ADDR_SPACE);
     }
   else if (code == SSA_NAME)
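
TYPE_SATURATING only applies to scalar (fixed-point) types, so its flag bit
can be reused for TYPE_REVERSE_STORAGE_ORDER on aggregates, both here and in
the streamer below.  A standalone sketch of that discrimination, with
hypothetical names:

#include <assert.h>
#include <stdbool.h>

/* One shared flag bit, interpreted according to the type class.  */
struct type_bits { bool aggregate_p; unsigned shared_flag : 1; };

static bool
reverse_storage_order_p (const struct type_bits *t)
{
  return t->aggregate_p && t->shared_flag;
}

static bool
saturating_p (const struct type_bits *t)
{
  return !t->aggregate_p && t->shared_flag;
}

int
main (void)
{
  struct type_bits rec = { true, 1 }, fix = { false, 1 };
  assert (reverse_storage_order_p (&rec) && !saturating_p (&rec));
  assert (saturating_p (&fix) && !reverse_storage_order_p (&fix));
  return 0;
}
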
Index: tree-streamer-out.c
===================================================================
--- tree-streamer-out.c	(.../trunk/gcc)	(revision 228112)
+++ tree-streamer-out.c	(.../branches/scalar-storage-order/gcc)	(revision 228133)
@@ -114,9 +114,14 @@  pack_ts_base_value_fields (struct bitpac
   bp_pack_value (bp, TREE_DEPRECATED (expr), 1);
   if (TYPE_P (expr))
     {
-      bp_pack_value (bp, TYPE_SATURATING (expr), 1);
+      if (AGGREGATE_TYPE_P (expr))
+	bp_pack_value (bp, TYPE_REVERSE_STORAGE_ORDER (expr), 1);
+      else
+	bp_pack_value (bp, TYPE_SATURATING (expr), 1);
       bp_pack_value (bp, TYPE_ADDR_SPACE (expr), 8);
     }
+  else if (TREE_CODE (expr) == BIT_FIELD_REF || TREE_CODE (expr) == MEM_REF)
+    bp_pack_value (bp, REF_REVERSE_STORAGE_ORDER (expr), 1);
   else if (TREE_CODE (expr) == SSA_NAME)
     {
       bp_pack_value (bp, SSA_NAME_IS_DEFAULT_DEF (expr), 1);
Index: print-tree.c
===================================================================
--- print-tree.c	(.../trunk/gcc)	(revision 228112)
+++ print-tree.c	(.../branches/scalar-storage-order/gcc)	(revision 228133)
@@ -570,6 +570,13 @@  print_node (FILE *file, const char *pref
       if (TYPE_NEEDS_CONSTRUCTING (node))
 	fputs (" needs-constructing", file);
 
+      if ((code == RECORD_TYPE
+	   || code == UNION_TYPE
+	   || code == QUAL_UNION_TYPE
+	   || code == ARRAY_TYPE)
+	  && TYPE_REVERSE_STORAGE_ORDER (node))
+	fputs (" reverse-storage-order", file);
+
       /* The transparent-union flag is used for different things in
 	 different nodes.  */
       if ((code == UNION_TYPE || code == RECORD_TYPE)
Index: varasm.c
===================================================================
--- varasm.c	(.../trunk/gcc)	(revision 228112)
+++ varasm.c	(.../branches/scalar-storage-order/gcc)	(revision 228133)
@@ -115,7 +115,7 @@  static int compare_constant (const tree,
 static void output_constant_def_contents (rtx);
 static void output_addressed_constants (tree);
 static unsigned HOST_WIDE_INT output_constant (tree, unsigned HOST_WIDE_INT,
-					       unsigned int);
+					       unsigned int, bool);
 static void globalize_decl (tree);
 static bool decl_readonly_section_1 (enum section_category);
 #ifdef BSS_SECTION_ASM_OP
@@ -2062,7 +2062,8 @@  assemble_variable_contents (tree decl, c
 	/* Output the actual data.  */
 	output_constant (DECL_INITIAL (decl),
 			 tree_to_uhwi (DECL_SIZE_UNIT (decl)),
-			 get_variable_align (decl));
+			 get_variable_align (decl),
+			 false);
       else
 	/* Leave space for it.  */
 	assemble_zeros (tree_to_uhwi (DECL_SIZE_UNIT (decl)));
@@ -2788,11 +2789,12 @@  assemble_integer (rtx x, unsigned int si
 }
 
 void
-assemble_real (REAL_VALUE_TYPE d, machine_mode mode, unsigned int align)
+assemble_real (REAL_VALUE_TYPE d, machine_mode mode, unsigned int align,
+	       bool reverse)
 {
   long data[4] = {0, 0, 0, 0};
-  int i;
   int bitsize, nelts, nunits, units_per;
+  rtx elt;
 
   /* This is hairy.  We have a quantity of known size.  real_to_target
      will put it into an array of *host* longs, 32 bits per element
@@ -2814,15 +2816,24 @@  assemble_real (REAL_VALUE_TYPE d, machin
   real_to_target (data, &d, mode);
 
   /* Put out the first word with the specified alignment.  */
-  assemble_integer (GEN_INT (data[0]), MIN (nunits, units_per), align, 1);
+  if (reverse)
+    elt = flip_storage_order (SImode, gen_int_mode (data[nelts - 1], SImode));
+  else
+    elt = GEN_INT (data[0]);
+  assemble_integer (elt, MIN (nunits, units_per), align, 1);
   nunits -= units_per;
 
   /* Subsequent words need only 32-bit alignment.  */
   align = min_align (align, 32);
 
-  for (i = 1; i < nelts; i++)
+  for (int i = 1; i < nelts; i++)
     {
-      assemble_integer (GEN_INT (data[i]), MIN (nunits, units_per), align, 1);
+      if (reverse)
+	elt = flip_storage_order (SImode,
+				  gen_int_mode (data[nelts - 1 - i], SImode));
+      else
+	elt = GEN_INT (data[i]);
+      assemble_integer (elt, MIN (nunits, units_per), align, 1);
       nunits -= units_per;
     }
 }
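
The reversal in assemble_real is two-level: the 32-bit words in DATA are
emitted last-to-first, and flip_storage_order byte-swaps each word, so the
byte image of the constant is reversed as a whole.  A standalone host-side
illustration of the same scheme (not GCC code; assumes a little-endian
host with 64-bit double and the GCC built-in __builtin_bswap32):

  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  int
  main (void)
  {
    double d = 1.5;
    uint32_t w[2];

    memcpy (w, &d, sizeof d);  /* split into host-order 32-bit words */

    /* Emit the words last-to-first, byte-swapping each one, as
       assemble_real does via flip_storage_order when REVERSE is set.  */
    for (int i = 1; i >= 0; i--)
      printf ("%08x\n", __builtin_bswap32 (w[i]));

    return 0;
  }
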
@@ -3124,10 +3135,12 @@  compare_constant (const tree t1, const t
 	if (typecode == ARRAY_TYPE)
 	  {
 	    HOST_WIDE_INT size_1 = int_size_in_bytes (TREE_TYPE (t1));
-	    /* For arrays, check that the sizes all match.  */
+	    /* For arrays, check that mode, size and storage order match.  */
 	    if (TYPE_MODE (TREE_TYPE (t1)) != TYPE_MODE (TREE_TYPE (t2))
 		|| size_1 == -1
-		|| size_1 != int_size_in_bytes (TREE_TYPE (t2)))
+		|| size_1 != int_size_in_bytes (TREE_TYPE (t2))
+		|| TYPE_REVERSE_STORAGE_ORDER (TREE_TYPE (t1))
+		   != TYPE_REVERSE_STORAGE_ORDER (TREE_TYPE (t2)))
 	      return 0;
 	  }
 	else
@@ -3411,7 +3424,7 @@  assemble_constant_contents (tree exp, co
   targetm.asm_out.declare_constant_name (asm_out_file, label, exp, size);
 
   /* Output the value of EXP.  */
-  output_constant (exp, size, align);
+  output_constant (exp, size, align, false);
 
   targetm.asm_out.decl_end ();
 }
@@ -3845,7 +3858,7 @@  output_constant_pool_2 (machine_mode mod
 
 	gcc_assert (CONST_DOUBLE_AS_FLOAT_P (x));
 	REAL_VALUE_FROM_CONST_DOUBLE (r, x);
-	assemble_real (r, mode, align);
+	assemble_real (r, mode, align, false);
 	break;
       }
 
@@ -4345,7 +4358,11 @@  initializer_constant_valid_p_1 (tree val
 	      tree reloc;
 	      reloc = initializer_constant_valid_p_1 (elt, TREE_TYPE (elt),
 						      NULL);
-	      if (!reloc)
+	      if (!reloc
+		  /* An absolute value is required with reverse SSO.  */
+		  || (reloc != null_pointer_node
+		      && TYPE_REVERSE_STORAGE_ORDER (TREE_TYPE (value))
+		      && !AGGREGATE_TYPE_P (TREE_TYPE (elt))))
 		{
 		  if (cache)
 		    {
@@ -4585,9 +4602,19 @@  initializer_constant_valid_p_1 (tree val
    therefore, we do not need to check for such things as
    arithmetic-combinations of integers.  */
 tree
-initializer_constant_valid_p (tree value, tree endtype)
+initializer_constant_valid_p (tree value, tree endtype, bool reverse)
 {
-  return initializer_constant_valid_p_1 (value, endtype, NULL);
+  tree reloc = initializer_constant_valid_p_1 (value, endtype, NULL);
+
+  /* An absolute value is required with reverse storage order.  */
+  if (reloc
+      && reloc != null_pointer_node
+      && reverse
+      && !AGGREGATE_TYPE_P (endtype)
+      && !VECTOR_TYPE_P (endtype))
+    reloc = NULL_TREE;
+
+  return reloc;
 }
 
 /* Return true if VALUE is a valid constant-valued expression
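
The reason an absolute value is required: a relocatable expression is only
resolved at link time, after the compiler has already emitted (and
possibly byte-swapped) the bytes, so there is no point at which the
address could be flipped.  A hedged source-level example of an initializer
this check now rejects (assuming the scalar_storage_order attribute
introduced by this series, on a little-endian target):

  extern int x;

  struct __attribute__ ((scalar_storage_order ("big-endian"))) s
  {
    int *p;
  };

  /* &x needs a relocation, so it cannot be stored byte-swapped;
     initializer_constant_valid_p now returns NULL_TREE for it.  */
  static const struct s v = { &x };
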
@@ -4637,8 +4664,8 @@  struct oc_outer_state {
 };
 
 static unsigned HOST_WIDE_INT
-  output_constructor (tree, unsigned HOST_WIDE_INT, unsigned int,
-		      oc_outer_state *);
+output_constructor (tree, unsigned HOST_WIDE_INT, unsigned int, bool,
+		    oc_outer_state *);
 
 /* Output assembler code for constant EXP, with no label.
    This includes the pseudo-op such as ".int" or ".byte", and a newline.
@@ -4660,13 +4687,18 @@  static unsigned HOST_WIDE_INT
    for a structure constructor that wants to produce more than SIZE bytes.
    But such constructors will never be generated for any possible input.
 
-   ALIGN is the alignment of the data in bits.  */
+   ALIGN is the alignment of the data in bits.
+
+   If REVERSE is true, EXP is interpreted in reverse storage order with
+   respect to the target order.  */
 
 static unsigned HOST_WIDE_INT
-output_constant (tree exp, unsigned HOST_WIDE_INT size, unsigned int align)
+output_constant (tree exp, unsigned HOST_WIDE_INT size, unsigned int align,
+		 bool reverse)
 {
   enum tree_code code;
   unsigned HOST_WIDE_INT thissize;
+  rtx cst;
 
   if (size == 0 || flag_syntax_only)
     return size;
@@ -4761,9 +4793,10 @@  output_constant (tree exp, unsigned HOST
     case FIXED_POINT_TYPE:
     case POINTER_BOUNDS_TYPE:
     case NULLPTR_TYPE:
-      if (! assemble_integer (expand_expr (exp, NULL_RTX, VOIDmode,
-					   EXPAND_INITIALIZER),
-			      MIN (size, thissize), align, 0))
+      cst = expand_expr (exp, NULL_RTX, VOIDmode, EXPAND_INITIALIZER);
+      if (reverse)
+	cst = flip_storage_order (TYPE_MODE (TREE_TYPE (exp)), cst);
+      if (!assemble_integer (cst, MIN (size, thissize), align, 0))
 	error ("initializer for integer/fixed-point value is too complicated");
       break;
 
@@ -4771,13 +4804,17 @@  output_constant (tree exp, unsigned HOST
       if (TREE_CODE (exp) != REAL_CST)
 	error ("initializer for floating value is not a floating constant");
       else
-	assemble_real (TREE_REAL_CST (exp), TYPE_MODE (TREE_TYPE (exp)), align);
+	{
+	  assemble_real (TREE_REAL_CST (exp), TYPE_MODE (TREE_TYPE (exp)),
+			 align, reverse);
+	}
       break;
 
     case COMPLEX_TYPE:
-      output_constant (TREE_REALPART (exp), thissize / 2, align);
+      output_constant (TREE_REALPART (exp), thissize / 2, align, reverse);
       output_constant (TREE_IMAGPART (exp), thissize / 2,
-		       min_align (align, BITS_PER_UNIT * (thissize / 2)));
+		       min_align (align, BITS_PER_UNIT * (thissize / 2)),
+		       reverse);
       break;
 
     case ARRAY_TYPE:
@@ -4785,7 +4822,7 @@  output_constant (tree exp, unsigned HOST
       switch (TREE_CODE (exp))
 	{
 	case CONSTRUCTOR:
-	  return output_constructor (exp, size, align, NULL);
+	  return output_constructor (exp, size, align, reverse, NULL);
 	case STRING_CST:
 	  thissize
 	    = MIN ((unsigned HOST_WIDE_INT)TREE_STRING_LENGTH (exp), size);
@@ -4796,11 +4833,13 @@  output_constant (tree exp, unsigned HOST
 	    machine_mode inner = TYPE_MODE (TREE_TYPE (TREE_TYPE (exp)));
 	    unsigned int nalign = MIN (align, GET_MODE_ALIGNMENT (inner));
 	    int elt_size = GET_MODE_SIZE (inner);
-	    output_constant (VECTOR_CST_ELT (exp, 0), elt_size, align);
+	    output_constant (VECTOR_CST_ELT (exp, 0), elt_size, align,
+			     reverse);
 	    thissize = elt_size;
 	    for (unsigned int i = 1; i < VECTOR_CST_NELTS (exp); i++)
 	      {
-		output_constant (VECTOR_CST_ELT (exp, i), elt_size, nalign);
+		output_constant (VECTOR_CST_ELT (exp, i), elt_size, nalign,
+				 reverse);
 		thissize += elt_size;
 	      }
 	    break;
@@ -4813,7 +4852,7 @@  output_constant (tree exp, unsigned HOST
     case RECORD_TYPE:
     case UNION_TYPE:
       gcc_assert (TREE_CODE (exp) == CONSTRUCTOR);
-      return output_constructor (exp, size, align, NULL);
+      return output_constructor (exp, size, align, reverse, NULL);
 
     case ERROR_MARK:
       return 0;
@@ -4827,7 +4866,6 @@  output_constant (tree exp, unsigned HOST
 
   return size;
 }
-
 
 /* Subroutine of output_constructor, used for computing the size of
    arrays of unspecified length.  VAL must be a CONSTRUCTOR of an array
@@ -4889,6 +4927,7 @@  struct oc_local_state {
   int last_relative_index;    /* Implicit or explicit index of the last
 				 array element output within a bitfield.  */
   bool byte_buffer_in_use;    /* Whether BYTE is in use.  */
+  bool reverse;               /* Whether reverse storage order is in use.  */
 
   /* Current element.  */
   tree field;      /* Current field decl in a record.  */
@@ -4921,7 +4960,8 @@  output_constructor_array_range (oc_local
       if (local->val == NULL_TREE)
 	assemble_zeros (fieldsize);
       else
-	fieldsize = output_constant (local->val, fieldsize, align2);
+	fieldsize
+	  = output_constant (local->val, fieldsize, align2, local->reverse);
 
       /* Count its size.  */
       local->total_bytes += fieldsize;
@@ -5007,7 +5047,8 @@  output_constructor_regular_field (oc_loc
   if (local->val == NULL_TREE)
     assemble_zeros (fieldsize);
   else
-    fieldsize = output_constant (local->val, fieldsize, align2);
+    fieldsize
+      = output_constant (local->val, fieldsize, align2, local->reverse);
 
   /* Count its size.  */
   local->total_bytes += fieldsize;
@@ -5105,7 +5146,7 @@  output_constructor_bitfield (oc_local_st
       temp_state.bit_offset = next_offset % BITS_PER_UNIT;
       temp_state.byte = local->byte;
       local->total_bytes
-	  += output_constructor (local->val, 0, 0, &temp_state);
+	+= output_constructor (local->val, 0, 0, local->reverse, &temp_state);
       local->byte = temp_state.byte;
       return;
     }
@@ -5131,7 +5172,7 @@  output_constructor_bitfield (oc_local_st
 
       /* Number of bits we can process at once (all part of the same byte).  */
       this_time = MIN (end_offset - next_offset, BITS_PER_UNIT - next_bit);
-      if (BYTES_BIG_ENDIAN)
+      if (local->reverse ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
 	{
 	  /* On big-endian machine, take the most significant bits (of the
 	     bits that are significant) first and put them into bytes from
@@ -5195,12 +5236,11 @@  output_constructor_bitfield (oc_local_st
    caller output state of relevance in recursive invocations.  */
 
 static unsigned HOST_WIDE_INT
-output_constructor (tree exp, unsigned HOST_WIDE_INT size,
-		    unsigned int align, oc_outer_state *outer)
+output_constructor (tree exp, unsigned HOST_WIDE_INT size, unsigned int align,
+		    bool reverse, oc_outer_state *outer)
 {
   unsigned HOST_WIDE_INT cnt;
   constructor_elt *ce;
-
   oc_local_state local;
 
   /* Setup our local state to communicate with helpers.  */
@@ -5217,6 +5257,11 @@  output_constructor (tree exp, unsigned H
   local.byte_buffer_in_use = outer != NULL;
   local.byte = outer ? outer->byte : 0;
   local.last_relative_index = -1;
+  /* The storage order is specified for every aggregate type.  */
+  if (AGGREGATE_TYPE_P (local.type))
+    local.reverse = TYPE_REVERSE_STORAGE_ORDER (local.type);
+  else
+    local.reverse = reverse;
 
   gcc_assert (HOST_BITS_PER_WIDE_INT >= BITS_PER_UNIT);
 
Index: varasm.h
===================================================================
--- varasm.h	(.../trunk/gcc)	(revision 228112)
+++ varasm.h	(.../branches/scalar-storage-order/gcc)	(revision 228133)
@@ -60,7 +60,7 @@  extern void assemble_alias (tree, tree);
    We assume that VALUE has been folded as much as possible;
    therefore, we do not need to check for such things as
    arithmetic-combinations of integers.  */
-extern tree initializer_constant_valid_p (tree, tree);
+extern tree initializer_constant_valid_p (tree, tree, bool = false);
 
 /* Return true if VALUE is a valid constant-valued expression
    for use in initializing a static bit-field; one that can be
Index: tree-inline.c
===================================================================
--- tree-inline.c	(.../trunk/gcc)	(revision 228112)
+++ tree-inline.c	(.../branches/scalar-storage-order/gcc)	(revision 228133)
@@ -967,6 +967,7 @@  remap_gimple_op_r (tree *tp, int *walk_s
 	      && (!is_parm (TREE_OPERAND (old, 0))
 		  || (!id->transform_parameter && is_parm (ptr))))
 	    TREE_THIS_NOTRAP (*tp) = 1;
+	  REF_REVERSE_STORAGE_ORDER (*tp) = REF_REVERSE_STORAGE_ORDER (old);
 	  *walk_subtrees = 0;
 	  return NULL;
 	}
@@ -1224,6 +1225,7 @@  copy_tree_body_r (tree *tp, int *walk_su
 	      && (!is_parm (TREE_OPERAND (old, 0))
 		  || (!id->transform_parameter && is_parm (ptr))))
 	    TREE_THIS_NOTRAP (*tp) = 1;
+	  REF_REVERSE_STORAGE_ORDER (*tp) = REF_REVERSE_STORAGE_ORDER (old);
 	  *walk_subtrees = 0;
 	  return NULL;
 	}
Index: tree-streamer-in.c
===================================================================
--- tree-streamer-in.c	(.../trunk/gcc)	(revision 228112)
+++ tree-streamer-in.c	(.../branches/scalar-storage-order/gcc)	(revision 228133)
@@ -142,9 +142,14 @@  unpack_ts_base_value_fields (struct bitp
   TREE_DEPRECATED (expr) = (unsigned) bp_unpack_value (bp, 1);
   if (TYPE_P (expr))
     {
-      TYPE_SATURATING (expr) = (unsigned) bp_unpack_value (bp, 1);
+      if (AGGREGATE_TYPE_P (expr))
+	TYPE_REVERSE_STORAGE_ORDER (expr) = (unsigned) bp_unpack_value (bp, 1);
+      else
+	TYPE_SATURATING (expr) = (unsigned) bp_unpack_value (bp, 1);
       TYPE_ADDR_SPACE (expr) = (unsigned) bp_unpack_value (bp, 8);
     }
+  else if (TREE_CODE (expr) == BIT_FIELD_REF || TREE_CODE (expr) == MEM_REF)
+    REF_REVERSE_STORAGE_ORDER (expr) = (unsigned) bp_unpack_value (bp, 1);
   else if (TREE_CODE (expr) == SSA_NAME)
     {
       SSA_NAME_IS_DEFAULT_DEF (expr) = (unsigned) bp_unpack_value (bp, 1);
Index: output.h
===================================================================
--- output.h	(.../trunk/gcc)	(revision 228112)
+++ output.h	(.../branches/scalar-storage-order/gcc)	(revision 228133)
@@ -280,7 +280,7 @@  extern section *get_named_text_section (
 
 #ifdef REAL_VALUE_TYPE_SIZE
 /* Assemble the floating-point constant D into an object of size MODE.  */
-extern void assemble_real (REAL_VALUE_TYPE, machine_mode, unsigned);
+extern void assemble_real (REAL_VALUE_TYPE, machine_mode, unsigned, bool);
 #endif
 
 /* Write the address of the entity given by SYMBOL to SEC.  */
Index: tree-outof-ssa.c
===================================================================
--- tree-outof-ssa.c	(.../trunk/gcc)	(revision 228112)
+++ tree-outof-ssa.c	(.../branches/scalar-storage-order/gcc)	(revision 228133)
@@ -315,7 +315,7 @@  insert_value_copy_on_edge (edge e, int d
   else if (src_mode == BLKmode)
     {
       x = dest_rtx;
-      store_expr (src, x, 0, false);
+      store_expr (src, x, 0, false, false);
     }
   else
     x = expand_expr (src, dest_rtx, dest_mode, EXPAND_NORMAL);
Index: gimple-expr.c
===================================================================
--- gimple-expr.c	(.../trunk/gcc)	(revision 228112)
+++ gimple-expr.c	(.../branches/scalar-storage-order/gcc)	(revision 228133)
@@ -159,14 +159,16 @@  useless_type_conversion_p (tree outer_ty
   else if (TREE_CODE (inner_type) == ARRAY_TYPE
 	   && TREE_CODE (outer_type) == ARRAY_TYPE)
     {
-      /* Preserve string attributes.  */
+      /* Preserve various attributes.  */
+      if (TYPE_REVERSE_STORAGE_ORDER (inner_type)
+	  != TYPE_REVERSE_STORAGE_ORDER (outer_type))
+	return false;
       if (TYPE_STRING_FLAG (inner_type) != TYPE_STRING_FLAG (outer_type))
 	return false;
 
       /* Conversions from array types with unknown extent to
 	 array types with known extent are not useless.  */
-      if (!TYPE_DOMAIN (inner_type)
-	  && TYPE_DOMAIN (outer_type))
+      if (!TYPE_DOMAIN (inner_type) && TYPE_DOMAIN (outer_type))
 	return false;
 
       /* Nor are conversions from array types with non-constant size to
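
In the source the attribute only attaches to record and union types, but
the ARRAY_TYPE of a field inside such a struct carries
TYPE_REVERSE_STORAGE_ORDER as well (hence ARRAY_TYPE in the tree-core.h
list below), so two array types can differ in storage order alone; the new
check keeps conversions between them from being treated as useless.
Hedged illustration:

  /* The int[4] inside BE has reverse storage order on a little-endian
     target; a conversion to or from a plain int[4] changes the meaning
     of every element and must be preserved.  */
  struct __attribute__ ((scalar_storage_order ("big-endian"))) BE
  {
    int a[4];
  };
  struct LE
  {
    int a[4];
  };
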
Index: tree-core.h
===================================================================
--- tree-core.h	(.../trunk/gcc)	(revision 228112)
+++ tree-core.h	(.../branches/scalar-storage-order/gcc)	(revision 228133)
@@ -1110,8 +1110,14 @@  struct GTY(()) tree_base {
 
    saturating_flag:
 
+       TYPE_REVERSE_STORAGE_ORDER in
+           RECORD_TYPE, UNION_TYPE, QUAL_UNION_TYPE, ARRAY_TYPE
+
        TYPE_SATURATING in
-           all types
+           other types
+
+       REF_REVERSE_STORAGE_ORDER in
+           BIT_FIELD_REF, MEM_REF
 
        VAR_DECL_IS_VIRTUAL_OPERAND in
 	   VAR_DECL
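
The two flags can share a bit because they never apply to the same node:
TYPE_SATURATING is only meaningful for non-aggregate (e.g. fixed-point)
types.  The accessors defined elsewhere in this series discriminate on the
tree code, along these lines (sketch, not necessarily the exact tree.h
text):

  #define TYPE_REVERSE_STORAGE_ORDER(NODE) \
    (TREE_CHECK4 (NODE, RECORD_TYPE, UNION_TYPE, QUAL_UNION_TYPE, \
		  ARRAY_TYPE)->base.u.bits.saturating_flag)
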
Index: lto-streamer-out.c
===================================================================
--- lto-streamer-out.c	(.../trunk/gcc)	(revision 228112)
+++ lto-streamer-out.c	(.../branches/scalar-storage-order/gcc)	(revision 228133)
@@ -977,7 +977,8 @@  hash_tree (struct streamer_tree_cache_d
     hstate.add_flag (TREE_PRIVATE (t));
   if (TYPE_P (t))
     {
-      hstate.add_flag (TYPE_SATURATING (t));
+      hstate.add_flag (AGGREGATE_TYPE_P (t)
+		       ? TYPE_REVERSE_STORAGE_ORDER (t) : TYPE_SATURATING (t));
       hstate.add_flag (TYPE_ADDR_SPACE (t));
     }
   else if (code == SSA_NAME)