Patchwork [ARM] Misaligned access support for ARM Neon

Submitter Julian Brown
Date Oct. 4, 2010, 2:59 p.m.
Message ID <20101004155936.43295304@rex.config>
Permalink /patch/66695/
State New

Comments

Julian Brown - Oct. 4, 2010, 2:59 p.m.
On Wed, 22 Sep 2010 23:54:38 +0100
Richard Earnshaw <Richard.Earnshaw@buzzard.freeserve.co.uk> wrote:

> On 03/08/10 17:32, Julian Brown wrote:
> > On Mon, 7 Jun 2010 20:08:48 +0100
> > Julian Brown <julian@codesourcery.com> wrote:
> > 
> >>>>> This is a new version of the patch, which adds movmisalign
> >>>>> patterns for little-endian NEON, and uses a new (since the last
> >>>>> version of the patch was posted) target hook
> >>>>> (TARGET_SUPPORT_VECTOR_MISALIGNMENT) to describe the alignments
> >>>>> supported by NEON.
> > 
> > The previously-posted version of this patch no longer works on
> > current mainline, so here's a new version which does.
> > 
> > Backing up to the start of the problem, since it's been a while --
> > this patch adds several things to NEON support in the ARM backend:
> > 
> > 1. Implementations of the movmisalign pattern for loading and
> > storing vectors which are not naturally aligned.
> > 
> > 2. Constraint/operand printing tweaks to disallow pre-decrement for 
> > addresses used by the above, and allow printing of alignment
> > specifiers for same.
> > 
> > 3. Implementations of TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT
> > and TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE, to tell the
> > middle-end which alignments are supported for vector loads/stores
> > by the hardware.
> > 
> > 4. Testsuite tweaks to specify that certain tests only require
> > vectors to be aligned to the natural alignment of their elements,
> > but not necessarily less than that. Also tweaks to force some tests
> > to use the -mvectorize-with-neon-quad option.
> > 
> >> [...] there's still an assumption that elements from increasing
> >> memory locations go in increasing lane numbers (which is only true
> >> in little-endian mode for NEON at present), but I don't think this
> >> patch makes things any worse. Fixing big-endian mode is another
> >> problem for another day :-).
> > 
> > This still holds, but as previously discussed, probably should not
> > be a sticking point for getting this patch applied.
> > 
> > There remains a small amount of noise in testsuite results with this
> > patch, i.e.:
> > 
> > PASS -> FAIL:
> > mthumb-march_armv7-a-mfpu_neon-mfloat-abi_softfp/gcc.sum:gcc.dg/vect/vect-72.c
> > scan-tree-dump-times vect "Alignment of access forced using peeling " 0
> > 
> > This fails because a loop containing both an unaligned load and an
> > unaligned store is peeled, making the load aligned. It seems to
> > be a valid thing to do, so I'm not sure why it's a failure.
> > 
> > New FAIL:
> > mthumb-march_armv7-a-mfpu_neon-mfloat-abi_softfp/g++.sum:g++.dg/vect/pr36648.cc
> > scan-tree-dump-times vect "vectorized 1 loops" 1
> > New FAIL:
> > mthumb-march_armv7-a-mfpu_neon-mfloat-abi_softfp/g++.sum:g++.dg/vect/pr36648.cc
> > scan-tree-dump-times vect "vectorizing stmts using SLP" 1
> > 
> > These were analysed in:
> > 
> >   http://gcc.gnu.org/ml/gcc-patches/2010-05/msg01351.html
> > 
> > New FAIL:
> > mthumb-march_armv7-a-mfpu_neon-mfloat-abi_softfp/gcc.sum:gcc.dg/vect/vect-outer-4c.c
> > scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
> > 
> > and this in:
> > 
> >   http://gcc.gnu.org/ml/gcc-patches/2010-05/msg01328.html
> > 
> > Also several tests transition from XPASS to PASS.
> > 
> > Tested with cross to ARM Linux (-mthumb -march=armv7-a -mfpu=neon
> > -mfloat-abi=softfp), gcc/g++/libstdc++. OK to apply?
> > 
> > ChangeLog
> > 
> >     gcc/
> >     * expr.c (expand_assignment): Add assertion to prevent emitting
> >     null rtx for movmisalign pattern.
> >     (expand_expr_real_1): Likewise.
> >     * config/arm/arm.c (arm_builtin_support_vector_misalignment): New.
> >     (TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT): New. Use above.
> >     (arm_vector_alignment_reachable): New.
> >     (TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE): New. Use above.
> >     (neon_vector_mem_operand): Disallow PRE_DEC for misaligned loads.
> >     (arm_print_operand): Include alignment qualifier in %A.
> >     * config/arm/neon.md (UNSPEC_MISALIGNED_ACCESS): New constant.
> >     (movmisalign<mode>): New expander.
> >     (movmisalign<mode>_neon_store, movmisalign<mode>_neon_load): New
> >     insn patterns.
> > 
> >     gcc/testsuite/
> >     * gcc.dg/vect/vect-42.c: Use vect_element_align instead of
> >     vect_hw_misalign.
> >     * gcc.dg/vect/vect-60.c: Likewise.
> >     * gcc.dg/vect/vect-56.c: Likewise.
> >     * gcc.dg/vect/vect-93.c: Likewise.
> >     * gcc.dg/vect/no-scevccp-outer-8.c: Likewise.
> >     * gcc.dg/vect/vect-95.c: Likewise.
> >     * gcc.dg/vect/vect-96.c: Likewise.
> >     * gcc.dg/vect/vect-outer-5.c: Use quad-word vectors when available.
> >     * gcc.dg/vect/slp-25.c: Likewise.
> >     * gcc.dg/vect/slp-3.c: Likewise.
> >     * gcc.dg/vect/vect-multitypes-1.c: Likewise.
> >     * gcc.dg/vect/no-vfa-pr29145.c: Likewise.
> >     * gcc.dg/vect/vect-multitypes-4.c: Likewise. Use vect_element_align.
> >     * gcc.dg/vect/vect-109.c: Likewise.
> >     * gcc.dg/vect/vect-peel-1.c: Likewise.
> >     * gcc.dg/vect/vect-peel-2.c: Likewise.
> >     * lib/target-supports.exp
> >     (check_effective_target_arm_vect_no_misalign): New.
> >     (check_effective_target_vect_no_align): Use above.
> >     (check_effective_target_vect_element_align): New.
> >     (add_options_for_quad_vectors): New.
> 
> 
> I've spent a long time pondering this patch and I'm still not entirely
> happy that forcing the vectorizer to pretend these operations are
> unaligned is the correct way to specify this, but I must admit that I
> can't see a reasonable alternative at the moment that isn't
> significantly less pleasant in some other respect.  So this is OK
> apart from:
> 
> +	if (align_bits != 0)
> +	  asm_fprintf (stream, ", :%d", align_bits);
> 
> The comma is incorrect in the alignment syntax.  The correct form is
> [Rn:align].  That is, the ':' is a direct replacement for '@' in the
> strict UAL form.

Fixed.
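
For reference, the corrected operand form can be modelled outside the
compiler. The following is only a sketch, not the code in the patch: the
names neon_align_bits and format_neon_addr are hypothetical, the register is
printed as a plain "rN" rather than through asm_fprintf's %r, and the
thresholds simply mirror the arm_print_operand hunk in this patch (they are
not a statement of what every vld1/vst1 form accepts).

```c
#include <stdio.h>

/* Choose the alignment qualifier, in bits, for a NEON element
   load/store address.  ALIGN_BYTES is the known alignment of the
   access and MODESIZE the size of the vector mode, both in bytes
   (ALIGN_BYTES is assumed to be at least 1).  Returns 0 when no
   qualifier should be printed.  */
static unsigned
neon_align_bits (unsigned align_bytes, unsigned modesize)
{
  if (modesize == 16 && (align_bytes % 32) == 0)
    return 256;
  else if ((modesize == 8 || modesize == 16) && (align_bytes % 16) == 0)
    return 128;
  else if ((align_bytes % 8) == 0)
    return 64;
  return 0;
}

/* Write "[rN]" or "[rN:align]" into BUF.  Note there is no comma:
   the ':' takes the place of '@' in the alignment syntax.  */
static int
format_neon_addr (char *buf, size_t len, unsigned regno,
                  unsigned align_bytes, unsigned modesize)
{
  unsigned bits = neon_align_bits (align_bytes, modesize);

  if (bits != 0)
    return snprintf (buf, len, "[r%u:%u]", regno, bits);
  return snprintf (buf, len, "[r%u]", regno);
}
```

With these definitions, a 16-byte vector at r0 known to be 16-byte aligned
formats as "[r0:128]", and an 8-byte vector at r3 known only to be 4-byte
aligned formats as "[r3]".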

Here's the version I'm about to commit, re-tested lightly. It only
differs in trivial ways from the previously-posted version, in order to
apply to current mainline.

Thanks,

Julian

ChangeLog

    gcc/
    * expr.c (expand_assignment): Add assertion to prevent emitting
    null rtx for movmisalign pattern.
    (expand_expr_real_1): Likewise.
    * config/arm/arm.c (arm_builtin_support_vector_misalignment): New.
    (TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT): New. Use above.
    (arm_vector_alignment_reachable): New.
    (TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE): New. Use above.
    (neon_vector_mem_operand): Disallow PRE_DEC for misaligned loads.
    (arm_print_operand): Include alignment qualifier in %A.
    * config/arm/neon.md (UNSPEC_MISALIGNED_ACCESS): New constant.
    (movmisalign<mode>): New expander.
    (movmisalign<mode>_neon_store, movmisalign<mode>_neon_load): New
    insn patterns.

    gcc/testsuite/
    * gcc.dg/vect/vect-42.c: Use vect_element_align instead of
    vect_hw_misalign.
    * gcc.dg/vect/vect-60.c: Likewise.
    * gcc.dg/vect/vect-56.c: Likewise.
    * gcc.dg/vect/vect-93.c: Likewise.
    * gcc.dg/vect/no-scevccp-outer-8.c: Likewise.
    * gcc.dg/vect/vect-95.c: Likewise.
    * gcc.dg/vect/vect-96.c: Likewise.
    * gcc.dg/vect/vect-outer-5.c: Use quad-word vectors when available.
    * gcc.dg/vect/slp-25.c: Likewise.
    * gcc.dg/vect/slp-3.c: Likewise.
    * gcc.dg/vect/vect-multitypes-1.c: Likewise.
    * gcc.dg/vect/no-vfa-pr29145.c: Likewise.
    * gcc.dg/vect/vect-multitypes-4.c: Likewise. Use vect_element_align.
    * gcc.dg/vect/vect-109.c: Likewise.
    * gcc.dg/vect/vect-peel-1.c: Likewise.
    * gcc.dg/vect/vect-peel-2.c: Likewise.
    * lib/target-supports.exp
    (check_effective_target_arm_vect_no_misalign): New.
    (check_effective_target_vect_no_align): Use above.
    (check_effective_target_vect_element_align): New.
    (add_options_for_quad_vectors): New.
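
As a reading aid for the new misalignment hook, here is a stand-alone model
of the decision it makes for little-endian NEON. This is a sketch under
stated assumptions, not the hook itself: neon_misalignment_ok is a
hypothetical name, and the ALIGN argument stands in for the value the patch
reads from TYPE_ALIGN_UNIT (type).

```c
#include <stdbool.h>

/* Model of the little-endian NEON branch of
   arm_builtin_support_vector_misalignment.  ALIGN is the alignment in
   bytes that the patch obtains from TYPE_ALIGN_UNIT (type);
   MISALIGNMENT is in bytes, or -1 when it is not known at compile
   time.  */
static bool
neon_misalignment_ok (int align, int misalignment, bool is_packed)
{
  /* Packed accesses are only accepted when byte alignment suffices.  */
  if (is_packed)
    return align == 1;

  /* Unknown misalignment is fine for non-packed accesses.  */
  if (misalignment == -1)
    return true;

  /* Otherwise the misalignment must be a multiple of ALIGN.  */
  return (misalignment % align) == 0;
}
```

For example, with a 4-byte alignment an unknown misalignment or a
misalignment of 8 is accepted, a misalignment of 2 is rejected, and any
packed access is rejected unless the alignment is 1.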

Ramana Radhakrishnan - Oct. 7, 2010, 1:06 p.m.
This caused PR45932.  Can you please have a look?

Ramana

Patch

Index: gcc/testsuite/gcc.dg/vect/vect-42.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-42.c	(revision 164939)
+++ gcc/testsuite/gcc.dg/vect/vect-42.c	(working copy)
@@ -64,8 +64,8 @@  int main (void)
 
 /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
 /* { dg-final { scan-tree-dump-times "Alignment of access forced using versioning" 3 "vect" { target vect_no_align } } } */
-/* { dg-final { scan-tree-dump-times "Alignment of access forced using versioning" 1 "vect" { target { { ! vector_alignment_reachable } && { ! vect_hw_misalign } } } } } */
-/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 4 "vect" { xfail { vect_no_align || { { !  vector_alignment_reachable } || vect_hw_misalign  } } } } }  */
-/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 3 "vect" { target vect_hw_misalign } } } */
-/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" { xfail { vect_no_align || { { ! vector_alignment_reachable } || vect_hw_misalign } } } } } */
+/* { dg-final { scan-tree-dump-times "Alignment of access forced using versioning" 1 "vect" { target { { ! vector_alignment_reachable } && { ! vect_element_align } } } } } */
+/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 4 "vect" { xfail { vect_no_align || { { !  vector_alignment_reachable } || vect_element_align  } } } } }  */
+/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 3 "vect" { target vect_element_align } } } */
+/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" { xfail { vect_no_align || { { ! vector_alignment_reachable } || vect_element_align } } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-outer-5.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-outer-5.c	(revision 164939)
+++ gcc/testsuite/gcc.dg/vect/vect-outer-5.c	(working copy)
@@ -1,4 +1,5 @@ 
 /* { dg-require-effective-target vect_float } */
+/* { dg-add-options quad_vectors } */
 
 #include <stdio.h>
 #include <stdarg.h>
Index: gcc/testsuite/gcc.dg/vect/vect-60.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-60.c	(revision 164939)
+++ gcc/testsuite/gcc.dg/vect/vect-60.c	(working copy)
@@ -69,8 +69,8 @@  int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail vect_no_align } } } */
-/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 2 "vect" { xfail { vect_no_align || vect_hw_misalign } } } } */
-/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 1 "vect" { target { vect_hw_misalign } } } } */
-/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 0 "vect" { xfail { vect_hw_misalign } } } } */
-/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" { target { vect_hw_misalign } } } } */
+/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 2 "vect" { xfail { vect_no_align || vect_element_align } } } } */
+/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 1 "vect" { target { vect_element_align } } } } */
+/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 0 "vect" { xfail { vect_element_align } } } } */
+/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" { target { vect_element_align } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-109.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-109.c	(revision 164939)
+++ gcc/testsuite/gcc.dg/vect/vect-109.c	(working copy)
@@ -1,4 +1,5 @@ 
 /* { dg-require-effective-target vect_int } */
+/* { dg-add-options quad_vectors } */
 
 #include <stdarg.h>
 #include "tree-vect.h"
@@ -72,8 +73,8 @@  int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target vect_hw_misalign } } } */
-/* { dg-final { scan-tree-dump-times "not vectorized: unsupported unaligned store" 2 "vect" { xfail vect_hw_misalign } } } */
-/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 3 "vect" { target vect_hw_misalign } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target vect_element_align } } } */
+/* { dg-final { scan-tree-dump-times "not vectorized: unsupported unaligned store" 2 "vect" { xfail vect_element_align } } } */
+/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 3 "vect" { target vect_element_align } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
 
Index: gcc/testsuite/gcc.dg/vect/vect-peel-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-peel-1.c	(revision 164939)
+++ gcc/testsuite/gcc.dg/vect/vect-peel-1.c	(working copy)
@@ -1,4 +1,5 @@ 
 /* { dg-require-effective-target vect_int } */
+/* { dg-add-options quad_vectors } */
 
 #include <stdarg.h>
 #include "tree-vect.h"
@@ -45,7 +46,7 @@  int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { xfail  vect_no_align } } } */
-/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 1 "vect" { target vect_hw_misalign  } } } */
-/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 2 "vect" { xfail { vect_no_align || vect_hw_misalign } } } } */
+/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 1 "vect" { target vect_element_align  } } } */
+/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 2 "vect" { xfail { vect_no_align || vect_element_align } } } } */
 /* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" { xfail  vect_no_align } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-peel-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-peel-2.c	(revision 164939)
+++ gcc/testsuite/gcc.dg/vect/vect-peel-2.c	(working copy)
@@ -1,4 +1,5 @@ 
 /* { dg-require-effective-target vect_int } */
+/* { dg-add-options quad_vectors } */
 
 #include <stdarg.h>
 #include "tree-vect.h"
@@ -46,7 +47,7 @@  int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { xfail  vect_no_align } } } */
-/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 1 "vect" { target vect_hw_misalign  } } } */
-/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 2 "vect" { xfail { vect_no_align || vect_hw_misalign } } } } */
-/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" { target vect_hw_misalign } } } */
+/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 1 "vect" { target vect_element_align  } } } */
+/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 2 "vect" { xfail { vect_no_align || vect_element_align } } } } */
+/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" { target vect_element_align } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-56.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-56.c	(revision 164939)
+++ gcc/testsuite/gcc.dg/vect/vect-56.c	(working copy)
@@ -68,8 +68,8 @@  int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail vect_no_align } } } */
-/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 2 "vect" { xfail { vect_no_align || vect_hw_misalign } } } } */
-/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 1 "vect" { target { vect_hw_misalign } } } } */
-/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 0 "vect" { xfail { vect_hw_misalign } } } } */
-/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" { target { vect_hw_misalign } } } } */
+/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 2 "vect" { xfail { vect_no_align || vect_element_align } } } } */
+/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 1 "vect" { target { vect_element_align } } } } */
+/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 0 "vect" { xfail { vect_element_align } } } } */
+/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" { target { vect_element_align } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/slp-25.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-25.c	(revision 164939)
+++ gcc/testsuite/gcc.dg/vect/slp-25.c	(working copy)
@@ -1,4 +1,5 @@ 
 /* { dg-require-effective-target vect_int } */
+/* { dg-add-options quad_vectors } */
 
 #include <stdarg.h>
 #include "tree-vect.h"
Index: gcc/testsuite/gcc.dg/vect/vect-93.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-93.c	(revision 164939)
+++ gcc/testsuite/gcc.dg/vect/vect-93.c	(working copy)
@@ -72,7 +72,7 @@  int main (void)
 /* main && main1 together: */
 /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 2 "vect" { target powerpc*-*-* i?86-*-* x86_64-*-* } } } */
 /* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 2 "vect" { target { vect_no_align && {! vector_alignment_reachable} } } } } */
-/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 3 "vect" { xfail { { vect_no_align } || { { ! vector_alignment_reachable} || vect_hw_misalign } } } } } */
+/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 3 "vect" { xfail { { vect_no_align } || { { ! vector_alignment_reachable} || vect_element_align } } } } } */
 
 /* in main1: */
 /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target !powerpc*-*-* !i?86-*-* !x86_64-*-* } } } */
Index: gcc/testsuite/gcc.dg/vect/no-scevccp-outer-8.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/no-scevccp-outer-8.c	(revision 164939)
+++ gcc/testsuite/gcc.dg/vect/no-scevccp-outer-8.c	(working copy)
@@ -46,5 +46,5 @@  int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail { ! { vect_hw_misalign } } } } } */
+/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail { ! { vect_element_align } } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-95.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-95.c	(revision 164939)
+++ gcc/testsuite/gcc.dg/vect/vect-95.c	(working copy)
@@ -56,14 +56,14 @@  int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
-/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 0 "vect" { xfail {vect_hw_misalign} } } } */
+/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 0 "vect" { xfail {vect_element_align} } } } */
 
 /* For targets that support unaligned loads we version for the two unaligned 
    stores and generate misaligned accesses for the loads. For targets that 
    don't support unaligned loads we version for all four accesses.  */
 
-/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 2 "vect" { xfail { vect_no_align || vect_hw_misalign} } } }  */
-/* { dg-final { scan-tree-dump-times "Alignment of access forced using versioning" 2 "vect" { xfail { vect_no_align || vect_hw_misalign } } } } */
+/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 2 "vect" { xfail { vect_no_align || vect_element_align} } } }  */
+/* { dg-final { scan-tree-dump-times "Alignment of access forced using versioning" 2 "vect" { xfail { vect_no_align || vect_element_align } } } } */
 /*  { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 0 "vect" { target vect_no_align } } } */
 /*  { dg-final { scan-tree-dump-times "Alignment of access forced using versioning" 4 "vect" { target vect_no_align } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-96.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-96.c	(revision 164939)
+++ gcc/testsuite/gcc.dg/vect/vect-96.c	(working copy)
@@ -44,6 +44,6 @@  int main (void)
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
 /* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 1 "vect" { target { {! vect_no_align} && vector_alignment_reachable } } } } */
-/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" { xfail { { vect_no_align } || { { ! vector_alignment_reachable} || vect_hw_misalign } } } } } */
-/* { dg-final { scan-tree-dump-times "Alignment of access forced using versioning." 1 "vect" { target { vect_no_align || { {! vector_alignment_reachable} && {! vect_hw_misalign} } } } } } */
+/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" { xfail { { vect_no_align } || { { ! vector_alignment_reachable} || vect_element_align } } } } } */
+/* { dg-final { scan-tree-dump-times "Alignment of access forced using versioning." 1 "vect" { target { vect_no_align || { {! vector_alignment_reachable} && {! vect_element_align} } } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-multitypes-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-multitypes-1.c	(revision 164939)
+++ gcc/testsuite/gcc.dg/vect/vect-multitypes-1.c	(working copy)
@@ -1,4 +1,5 @@ 
 /* { dg-require-effective-target vect_int } */
+/* { dg-add-options quad_vectors } */
 
 #include <stdarg.h>
 #include "tree-vect.h"
Index: gcc/testsuite/gcc.dg/vect/slp-3.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-3.c	(revision 164939)
+++ gcc/testsuite/gcc.dg/vect/slp-3.c	(working copy)
@@ -1,4 +1,5 @@ 
 /* { dg-require-effective-target vect_int } */
+/* { dg-add-options quad_vectors } */
 
 #include <stdarg.h>
 #include <stdio.h>
Index: gcc/testsuite/gcc.dg/vect/no-vfa-pr29145.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/no-vfa-pr29145.c	(revision 164939)
+++ gcc/testsuite/gcc.dg/vect/no-vfa-pr29145.c	(working copy)
@@ -1,4 +1,5 @@ 
 /* { dg-require-effective-target vect_int } */
+/* { dg-add-options quad_vectors } */
 
 #include <stdarg.h>
 #include "tree-vect.h"
Index: gcc/testsuite/gcc.dg/vect/vect-multitypes-4.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-multitypes-4.c	(revision 164939)
+++ gcc/testsuite/gcc.dg/vect/vect-multitypes-4.c	(working copy)
@@ -1,4 +1,5 @@ 
 /* { dg-require-effective-target vect_int } */
+/* { dg-add-options quad_vectors } */
 
 #include <stdarg.h>
 #include "tree-vect.h"
@@ -92,9 +93,9 @@  int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { xfail { vect_no_align } } } } */
-/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 0 "vect" { target { vect_hw_misalign}  } } } */
-/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 2 "vect" { xfail { vect_no_align || vect_hw_misalign } } } } */
-/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 8 "vect" { xfail { vect_no_align || vect_hw_misalign } } } } */
-/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 4 "vect" { target { vect_hw_misalign  } } } } */
+/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 0 "vect" { target { vect_element_align}  } } } */
+/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 2 "vect" { xfail { vect_no_align || vect_element_align } } } } */
+/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 8 "vect" { xfail { vect_no_align || vect_element_align } } } } */
+/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 4 "vect" { target { vect_element_align  } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
 
Index: gcc/testsuite/lib/target-supports.exp
===================================================================
--- gcc/testsuite/lib/target-supports.exp	(revision 164939)
+++ gcc/testsuite/lib/target-supports.exp	(working copy)
@@ -1813,6 +1813,18 @@  proc check_effective_target_arm32 { } {
     }]
 }
 
+# Return 1 if this is an ARM target that only supports aligned vector accesses
+proc check_effective_target_arm_vect_no_misalign { } {
+    return [check_no_compiler_messages arm_vect_no_misalign assembly {
+	#if !defined(__arm__) \
+	    || (defined(__ARMEL__) \
+	        && (!defined(__thumb__) || defined(__thumb2__)))
+	#error FOO
+	#endif
+    }]
+}
+
+
 # Return 1 if this is an ARM target supporting -mfpu=vfp
 # -mfloat-abi=softfp.  Some multilibs may be incompatible with these
 # options.
@@ -2776,7 +2788,7 @@  proc check_effective_target_vect_no_alig
 	if { [istarget mipsisa64*-*-*]
 	     || [istarget sparc*-*-*]
 	     || [istarget ia64-*-*]
-	     || [check_effective_target_arm32]
+	     || [check_effective_target_arm_vect_no_misalign]
 	     || ([istarget mips*-*-*]
 		 && [check_effective_target_mips_loongson]) } {
 	    set et_vect_no_align_saved 1
@@ -2913,6 +2925,25 @@  proc check_effective_target_vector_align
     return $et_vector_alignment_reachable_for_64bit_saved
 }
 
+# Return 1 if the target only requires element alignment for vector accesses
+
+proc check_effective_target_vect_element_align { } {
+    global et_vect_element_align
+
+    if [info exists et_vect_element_align] {
+	verbose "check_effective_target_vect_element_align: using cached result" 2
+    } else {
+	set et_vect_element_align 0
+	if { [istarget arm*-*-*]
+	     || [check_effective_target_vect_hw_misalign] } {
+	   set et_vect_element_align 1
+	}
+    }
+
+    verbose "check_effective_target_vect_element_align: returning $et_vect_element_align" 2
+    return $et_vect_element_align
+}
+
 # Return 1 if the target supports vector conditional operations, 0 otherwise.
 
 proc check_effective_target_vect_condition { } {
@@ -3480,6 +3511,16 @@  proc add_options_for_bind_pic_locally { 
     return $flags
 }
 
+# Add to FLAGS the flags needed to enable 128-bit vectors.
+
+proc add_options_for_quad_vectors { flags } {
+    if [is-effective-target arm_neon_ok] {
+	return "$flags -mvectorize-with-neon-quad"
+    }
+
+    return $flags
+}
+
 # Return 1 if the target provides a full C99 runtime.
 
 proc check_effective_target_c99_runtime { } {
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 164939)
+++ gcc/expr.c	(working copy)
@@ -4223,6 +4223,9 @@  expand_assignment (tree to, tree from, b
 	reg = copy_to_mode_reg (op_mode1, reg);
 
       insn = GEN_FCN (icode) (mem, reg);
+      /* The movmisalign<mode> pattern cannot fail, else the assignment would
+         silently be omitted.  */
+      gcc_assert (insn != NULL_RTX);
       emit_insn (insn);
       return;
     }
@@ -8674,6 +8677,7 @@  expand_expr_real_1 (tree exp, rtx target
 
 	    /* Nor can the insn generator.  */
 	    insn = GEN_FCN (icode) (reg, temp);
+	    gcc_assert (insn != NULL_RTX);
 	    emit_insn (insn);
 
 	    return reg;
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 164939)
+++ gcc/config/arm/arm.c	(working copy)
@@ -242,6 +242,11 @@  static bool cortex_a9_sched_adjust_cost 
 static bool xscale_sched_adjust_cost (rtx, rtx, rtx, int *);
 static unsigned int arm_units_per_simd_word (enum machine_mode);
 static bool arm_class_likely_spilled_p (reg_class_t);
+static bool arm_vector_alignment_reachable (const_tree type, bool is_packed);
+static bool arm_builtin_support_vector_misalignment (enum machine_mode mode,
+						     const_tree type,
+						     int misalignment,
+						     bool is_packed);
 
 
 /* Table of machine attributes.  */
@@ -557,6 +562,14 @@  static const struct attribute_spec arm_a
 #undef TARGET_CLASS_LIKELY_SPILLED_P
 #define TARGET_CLASS_LIKELY_SPILLED_P arm_class_likely_spilled_p
 
+#undef TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE
+#define TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE \
+  arm_vector_alignment_reachable
+
+#undef TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT
+#define TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT \
+  arm_builtin_support_vector_misalignment
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 /* Obstack for minipool constant handling.  */
@@ -8834,7 +8847,8 @@  neon_vector_mem_operand (rtx op, int typ
     return arm_address_register_rtx_p (ind, 0);
 
   /* Allow post-increment with Neon registers.  */
-  if (type != 1 && (GET_CODE (ind) == POST_INC || GET_CODE (ind) == PRE_DEC))
+  if ((type != 1 && GET_CODE (ind) == POST_INC)
+      || (type == 0 && GET_CODE (ind) == PRE_DEC))
     return arm_address_register_rtx_p (XEXP (ind, 0), 0);
 
   /* FIXME: vld1 allows register post-modify.  */
@@ -16317,6 +16331,8 @@  arm_print_operand (FILE *stream, rtx x, 
       {
 	rtx addr;
 	bool postinc = FALSE;
+	unsigned align, modesize, align_bits;
+
 	gcc_assert (GET_CODE (x) == MEM);
 	addr = XEXP (x, 0);
 	if (GET_CODE (addr) == POST_INC)
@@ -16324,7 +16340,29 @@  arm_print_operand (FILE *stream, rtx x, 
 	    postinc = 1;
 	    addr = XEXP (addr, 0);
 	  }
-	asm_fprintf (stream, "[%r]", REGNO (addr));
+	asm_fprintf (stream, "[%r", REGNO (addr));
+
+	/* We know the alignment of this access, so we can emit a hint in the
+	   instruction (for some alignments) as an aid to the memory subsystem
+	   of the target.  */
+	align = MEM_ALIGN (x) >> 3;
+	modesize = GET_MODE_SIZE (GET_MODE (x));
+	
+	/* Only certain alignment specifiers are supported by the hardware.  */
+	if (modesize == 16 && (align % 32) == 0)
+	  align_bits = 256;
+	else if ((modesize == 8 || modesize == 16) && (align % 16) == 0)
+	  align_bits = 128;
+	else if ((align % 8) == 0)
+	  align_bits = 64;
+	else
+	  align_bits = 0;
+	
+	if (align_bits != 0)
+	  asm_fprintf (stream, ":%d", align_bits);
+
+	asm_fprintf (stream, "]");
+
 	if (postinc)
 	  fputs("!", stream);
       }
@@ -23145,4 +23183,43 @@  arm_expand_sync (enum machine_mode mode,
     }
 }
 
+static bool
+arm_vector_alignment_reachable (const_tree type, bool is_packed)
+{
+  /* Vectors which aren't in packed structures will not be less aligned than
+     the natural alignment of their element type, so this is safe.  */
+  if (TARGET_NEON && !BYTES_BIG_ENDIAN)
+    return !is_packed;
+
+  return default_builtin_vector_alignment_reachable (type, is_packed);
+}
+
+static bool
+arm_builtin_support_vector_misalignment (enum machine_mode mode,
+					 const_tree type, int misalignment,
+					 bool is_packed)
+{
+  if (TARGET_NEON && !BYTES_BIG_ENDIAN)
+    {
+      HOST_WIDE_INT align = TYPE_ALIGN_UNIT (type);
+
+      if (is_packed)
+        return align == 1;
+
+      /* If the misalignment is unknown, we should be able to handle the access
+	 so long as it is not to a member of a packed data structure.  */
+      if (misalignment == -1)
+        return true;
+
+      /* Return true if the misalignment is a multiple of the natural alignment
+         of the vector's element type.  This is probably always going to be
+	 true in practice, since we've already established that this isn't a
+	 packed access.  */
+      return ((misalignment % align) == 0);
+    }
+  
+  return default_builtin_support_vector_misalignment (mode, type, misalignment,
+						      is_packed);
+}
+
 #include "gt-arm.h"
Index: gcc/config/arm/neon.md
===================================================================
--- gcc/config/arm/neon.md	(revision 164939)
+++ gcc/config/arm/neon.md	(working copy)
@@ -141,6 +141,7 @@ 
    (UNSPEC_VUZP2		202)
    (UNSPEC_VZIP1		203)
    (UNSPEC_VZIP2		204)
+   (UNSPEC_MISALIGNED_ACCESS	205)
    (UNSPEC_VCLE			206)
    (UNSPEC_VCLT			207)])
 
@@ -369,6 +370,52 @@ 
   neon_disambiguate_copy (operands, dest, src, 4);
 })
 
+(define_expand "movmisalign<mode>"
+  [(set (match_operand:VDQX 0 "nonimmediate_operand"	      "")
+	(unspec:VDQX [(match_operand:VDQX 1 "general_operand" "")]
+		     UNSPEC_MISALIGNED_ACCESS))]
+  "TARGET_NEON && !BYTES_BIG_ENDIAN"
+{
+  /* This pattern is not permitted to fail during expansion: if both arguments
+     are non-registers (e.g. memory := constant, which can be created by the
+     auto-vectorizer), force operand 1 into a register.  */
+  if (!s_register_operand (operands[0], <MODE>mode)
+      && !s_register_operand (operands[1], <MODE>mode))
+    operands[1] = force_reg (<MODE>mode, operands[1]);
+})
+
+(define_insn "*movmisalign<mode>_neon_store"
+  [(set (match_operand:VDX 0 "memory_operand"		       "=Um")
+	(unspec:VDX [(match_operand:VDX 1 "s_register_operand" " w")]
+		    UNSPEC_MISALIGNED_ACCESS))]
+  "TARGET_NEON && !BYTES_BIG_ENDIAN"
+  "vst1.<V_sz_elem>\t{%P1}, %A0"
+  [(set_attr "neon_type" "neon_vst1_1_2_regs_vst2_2_regs")])
+
+(define_insn "*movmisalign<mode>_neon_load"
+  [(set (match_operand:VDX 0 "s_register_operand"	   "=w")
+	(unspec:VDX [(match_operand:VDX 1 "memory_operand" " Um")]
+		    UNSPEC_MISALIGNED_ACCESS))]
+  "TARGET_NEON && !BYTES_BIG_ENDIAN"
+  "vld1.<V_sz_elem>\t{%P0}, %A1"
+  [(set_attr "neon_type" "neon_vld1_1_2_regs")])
+
+(define_insn "*movmisalign<mode>_neon_store"
+  [(set (match_operand:VQX 0 "memory_operand"		       "=Um")
+	(unspec:VQX [(match_operand:VQX 1 "s_register_operand" " w")]
+		    UNSPEC_MISALIGNED_ACCESS))]
+  "TARGET_NEON && !BYTES_BIG_ENDIAN"
+  "vst1.<V_sz_elem>\t{%q1}, %A0"
+  [(set_attr "neon_type" "neon_vst1_1_2_regs_vst2_2_regs")])
+
+(define_insn "*movmisalign<mode>_neon_load"
+  [(set (match_operand:VQX 0 "s_register_operand"	   "=w")
+	(unspec:VQX [(match_operand:VQX 1 "memory_operand" " Um")]
+		    UNSPEC_MISALIGNED_ACCESS))]
+  "TARGET_NEON && !BYTES_BIG_ENDIAN"
+  "vld1.<V_sz_elem>\t{%q0}, %A1"
+  [(set_attr "neon_type" "neon_vld1_1_2_regs")])
+
 (define_insn "vec_set<mode>_internal"
   [(set (match_operand:VD 0 "s_register_operand" "=w")
         (vec_merge:VD