
[2/2] aarch64: Provide FPR alternatives for some bit insertions [PR109632]

Message ID mpt7cszpc4d.fsf@arm.com
State New
Series [1/2] md: Allow <FOO> to refer to the value of int iterator FOO

Commit Message

Richard Sandiford May 23, 2023, 10:37 a.m. UTC
At -O2, and so with SLP vectorisation enabled:

    struct complx_t { float re, im; };
    complx_t add(complx_t a, complx_t b) {
      return {a.re + b.re, a.im + b.im};
    }

generates:

        fmov    w3, s1
        fmov    x0, d0
        fmov    x1, d2
        fmov    w2, s3
        bfi     x0, x3, 32, 32
        fmov    d31, x0
        bfi     x1, x2, 32, 32
        fmov    d30, x1
        fadd    v31.2s, v31.2s, v30.2s
        fmov    x1, d31
        lsr     x0, x1, 32
        fmov    s1, w0
        lsr     w0, w1, 0
        fmov    s0, w0
        ret

This is because complx_t is passed and returned in FPRs, but GCC gives
it DImode.  We therefore “need” to assemble a DImode pseudo from the
two individual floats, bitcast it to a vector, do the arithmetic,
bitcast it back to a DImode pseudo, then extract the individual floats.
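The round trip described above can be sketched in plain C. This is a hypothetical model (the helper name is invented for illustration); each memcpy stands in for one of the bitcasts that the fmov/bfi/lsr sequence performs:

```c
#include <stdint.h>
#include <string.h>

typedef struct { float re, im; } complx_t;

/* Hypothetical C model of the generated code's round trip: pack the
   two floats into one 64-bit value, reinterpret it as a float pair,
   add lanewise, then unpack.  Each memcpy models one bitcast.  */
static complx_t
add_via_di (complx_t a, complx_t b)
{
  uint64_t da, db, dr;
  memcpy (&da, &a, 8);		/* assemble a DImode value */
  memcpy (&db, &b, 8);
  float va[2], vb[2];
  memcpy (va, &da, 8);		/* bitcast to a 2-float vector */
  memcpy (vb, &db, 8);
  va[0] += vb[0];		/* the fadd v31.2s step */
  va[1] += vb[1];
  memcpy (&dr, va, 8);		/* bitcast back to DImode */
  complx_t r;
  memcpy (&r, &dr, 8);		/* extract the individual floats */
  return r;
}
```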

There are many problems here.  The most basic is that we shouldn't
use SLP for such a trivial example.  But SLP should in principle be
beneficial for more complicated examples, so preventing SLP for the
example above just changes the reproducer needed.  A more fundamental
problem is that it doesn't make sense to use single DImode pseudos in a
testcase like this.  I have a WIP patch to allow re and im to be stored
in individual SFmode pseudos instead, but it's quite an invasive change
and might end up going nowhere.

A simpler problem to tackle is that we allow DImode pseudos to be stored
in FPRs, but we don't provide any patterns for inserting values into
them, even though INS makes that easy for element-like insertions.
This patch adds some patterns for that.
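For reference, the (zero_extract ...) operation that these patterns implement can be modelled in C as follows. The helper is hypothetical; the real patterns emit a single BFI or INS rather than this mask-and-merge sequence:

```c
#include <stdint.h>

/* C model of (set (zero_extract:DI dst width pos) src): replace the
   WIDTH bits of DST starting at bit POS with the low WIDTH bits of
   SRC.  BFI does this between GPRs; INS does it between FPR lanes
   when POS is a multiple of WIDTH.  */
static uint64_t
insert_bits (uint64_t dst, uint64_t src, unsigned pos, unsigned width)
{
  uint64_t field = (width < 64 ? (1ULL << width) - 1 : ~0ULL);
  return (dst & ~(field << pos)) | ((src & field) << pos);
}
```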

Doing that showed that aarch64_modes_tieable_p was too strict:
it didn't allow SFmode and DImode values to be tied, even though
both of them occupy a single GPR and FPR, and even though we allow
both classes to change between the modes.
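The tie amounts to a pure bit reinterpretation, which a hypothetical C model (invented helper names) makes concrete: an SFmode-to-DImode subreg and its reverse are just bit copies, with no register move needed when both modes live in the same GPR or FPR:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of an SFmode->DImode subreg: the float's 32 bits
   become the low half of the 64-bit value (on a little-endian host),
   and the reverse subreg recovers them unchanged.  */
static uint64_t
sf_to_di (float f)
{
  uint64_t d = 0;
  memcpy (&d, &f, sizeof f);
  return d;
}

static float
di_to_sf (uint64_t d)
{
  float f;
  memcpy (&f, &d, sizeof f);
  return f;
}
```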

The *aarch64_bfidi<ALLX:mode>_subreg_<SUBDI_BITS> pattern is
especially ugly, but it's not clear what target-independent
code ought to simplify it to, if it were going to simplify it.

We should probably do the same thing for extractions, but that's left
as future work.

After the patch we generate:

        ins     v0.s[1], v1.s[0]
        ins     v2.s[1], v3.s[0]
        fadd    v0.2s, v0.2s, v2.2s
        fmov    x0, d0
        ushr    d1, d0, 32
        lsr     w0, w0, 0
        fmov    s0, w0
        ret

which seems like a step in the right direction.

All in all, there's nothing elegant about this patch.  It just
seems like the least worst option.

Tested on aarch64-linux-gnu and aarch64_be-elf (including ILP32).
Pushed to trunk.

Richard


gcc/
	PR target/109632
	* config/aarch64/aarch64.cc (aarch64_modes_tieable_p): Allow
	subregs between any scalars that are 64 bits or smaller.
	* config/aarch64/iterators.md (SUBDI_BITS): New int iterator.
	(bits_etype): New int attribute.
	* config/aarch64/aarch64.md (*insv_reg<mode>_<SUBDI_BITS>)
	(*aarch64_bfi<GPI:mode><ALLX:mode>_<SUBDI_BITS>): New patterns.
	(*aarch64_bfidi<ALLX:mode>_subreg_<SUBDI_BITS>): Likewise.

gcc/testsuite/
	* gcc.target/aarch64/ins_bitfield_1.c: New test.
	* gcc.target/aarch64/ins_bitfield_2.c: Likewise.
	* gcc.target/aarch64/ins_bitfield_3.c: Likewise.
	* gcc.target/aarch64/ins_bitfield_4.c: Likewise.
	* gcc.target/aarch64/ins_bitfield_5.c: Likewise.
	* gcc.target/aarch64/ins_bitfield_6.c: Likewise.
---
 gcc/config/aarch64/aarch64.cc                 |  12 ++
 gcc/config/aarch64/aarch64.md                 |  62 +++++++
 gcc/config/aarch64/iterators.md               |   4 +
 .../gcc.target/aarch64/ins_bitfield_1.c       | 142 ++++++++++++++++
 .../gcc.target/aarch64/ins_bitfield_2.c       | 142 ++++++++++++++++
 .../gcc.target/aarch64/ins_bitfield_3.c       | 156 ++++++++++++++++++
 .../gcc.target/aarch64/ins_bitfield_4.c       | 156 ++++++++++++++++++
 .../gcc.target/aarch64/ins_bitfield_5.c       | 139 ++++++++++++++++
 .../gcc.target/aarch64/ins_bitfield_6.c       | 139 ++++++++++++++++
 9 files changed, 952 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ins_bitfield_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ins_bitfield_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ins_bitfield_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ins_bitfield_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ins_bitfield_5.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ins_bitfield_6.c

Comments

Richard Biener May 23, 2023, 10:51 a.m. UTC | #1
On Tue, May 23, 2023 at 12:38 PM Richard Sandiford via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> At -O2, and so with SLP vectorisation enabled:
>
>     struct complx_t { float re, im; };
>     complx_t add(complx_t a, complx_t b) {
>       return {a.re + b.re, a.im + b.im};
>     }
>
> generates:
>
>         fmov    w3, s1
>         fmov    x0, d0
>         fmov    x1, d2
>         fmov    w2, s3
>         bfi     x0, x3, 32, 32
>         fmov    d31, x0
>         bfi     x1, x2, 32, 32
>         fmov    d30, x1
>         fadd    v31.2s, v31.2s, v30.2s
>         fmov    x1, d31
>         lsr     x0, x1, 32
>         fmov    s1, w0
>         lsr     w0, w1, 0
>         fmov    s0, w0
>         ret
>
> This is because complx_t is passed and returned in FPRs, but GCC gives
> it DImode.

Isn't that the choice of the target?  Of course "FPRs" might mean a
single FPR here and arguably DFmode would be similarly bad?

That said, to the ppc folks who also recently tried to change how
argument passing materializes I suggested to piggy-back on a
SRA style analysis (you could probably simply build the access
tree for all function parameters using its infrastructure) to drive
RTL expansion heuristics (it's heuristics after all...) what exact
(set of) pseudo / stack slot we want to form from the actual
argument hard registers.

>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index d6fc94015fa..146c2ad4988 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -24827,6 +24827,18 @@ aarch64_modes_tieable_p (machine_mode mode1, machine_mode mode2)
>    if (GET_MODE_CLASS (mode1) == GET_MODE_CLASS (mode2))
>      return true;
>
> +  /* Allow changes between scalar modes if both modes fit within 64 bits.
> +     This is because:
> +
> +     - We allow all such modes for both FPRs and GPRs.
> +     - They occupy a single register for both FPRs and GPRs.
> +     - We can reinterpret one mode as another in both types of register.  */
> +  if (is_a<scalar_mode> (mode1)
> +      && is_a<scalar_mode> (mode2)
> +      && known_le (GET_MODE_SIZE (mode1), 8)
> +      && known_le (GET_MODE_SIZE (mode2), 8))
> +    return true;
> +
>    /* We specifically want to allow elements of "structure" modes to
>       be tieable to the structure.  This more general condition allows
>       other rarer situations too.  The reason we don't extend this to
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 286f044cb8b..8b8951d7b14 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -5862,6 +5862,25 @@ (define_expand "insv<mode>"
>    operands[3] = force_reg (<MODE>mode, value);
>  })
>
> +(define_insn "*insv_reg<mode>_<SUBDI_BITS>"
> +  [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r,w,?w")
> +                         (const_int SUBDI_BITS)
> +                         (match_operand 1 "const_int_operand"))
> +       (match_operand:GPI 2 "register_operand" "r,w,r"))]
> +  "multiple_p (UINTVAL (operands[1]), <SUBDI_BITS>)
> +   && UINTVAL (operands[1]) + <SUBDI_BITS> <= <GPI:sizen>"
> +  {
> +    if (which_alternative == 0)
> +      return "bfi\t%<w>0, %<w>2, %1, <SUBDI_BITS>";
> +
> +    operands[1] = gen_int_mode (UINTVAL (operands[1]) / <SUBDI_BITS>, SImode);
> +    if (which_alternative == 1)
> +      return "ins\t%0.<bits_etype>[%1], %2.<bits_etype>[0]";
> +    return "ins\t%0.<bits_etype>[%1], %w2";
> +  }
> +  [(set_attr "type" "bfm,neon_ins_q,neon_ins_q")]
> +)
> +
>  (define_insn "*insv_reg<mode>"
>    [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r")
>                           (match_operand 1 "const_int_operand" "n")
> @@ -5874,6 +5893,27 @@ (define_insn "*insv_reg<mode>"
>    [(set_attr "type" "bfm")]
>  )
>
> +(define_insn_and_split "*aarch64_bfi<GPI:mode><ALLX:mode>_<SUBDI_BITS>"
> +  [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r,w,?w")
> +                         (const_int SUBDI_BITS)
> +                         (match_operand 1 "const_int_operand"))
> +       (zero_extend:GPI (match_operand:ALLX 2  "register_operand" "r,w,r")))]
> +  "<SUBDI_BITS> <= <ALLX:sizen>
> +   && multiple_p (UINTVAL (operands[1]), <SUBDI_BITS>)
> +   && UINTVAL (operands[1]) + <SUBDI_BITS> <= <GPI:sizen>"
> +  "#"
> +  "&& 1"
> +  [(set (zero_extract:GPI (match_dup 0)
> +                         (const_int SUBDI_BITS)
> +                         (match_dup 1))
> +       (match_dup 2))]
> +  {
> +    operands[2] = lowpart_subreg (<GPI:MODE>mode, operands[2],
> +                                 <ALLX:MODE>mode);
> +  }
> +  [(set_attr "type" "bfm,neon_ins_q,neon_ins_q")]
> +)
> +
>  (define_insn "*aarch64_bfi<GPI:mode><ALLX:mode>4"
>    [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r")
>                           (match_operand 1 "const_int_operand" "n")
> @@ -5884,6 +5924,28 @@ (define_insn "*aarch64_bfi<GPI:mode><ALLX:mode>4"
>    [(set_attr "type" "bfm")]
>  )
>
> +(define_insn_and_split "*aarch64_bfidi<ALLX:mode>_subreg_<SUBDI_BITS>"
> +  [(set (zero_extract:DI (match_operand:DI 0 "register_operand" "+r,w,?w")
> +                        (const_int SUBDI_BITS)
> +                        (match_operand 1 "const_int_operand"))
> +       (match_operator:DI 2 "subreg_lowpart_operator"
> +         [(zero_extend:SI
> +            (match_operand:ALLX 3  "register_operand" "r,w,r"))]))]
> +  "<SUBDI_BITS> <= <ALLX:sizen>
> +   && multiple_p (UINTVAL (operands[1]), <SUBDI_BITS>)
> +   && UINTVAL (operands[1]) + <SUBDI_BITS> <= 64"
> +  "#"
> +  "&& 1"
> +  [(set (zero_extract:DI (match_dup 0)
> +                        (const_int SUBDI_BITS)
> +                        (match_dup 1))
> +       (match_dup 2))]
> +  {
> +    operands[2] = lowpart_subreg (DImode, operands[3], <ALLX:MODE>mode);
> +  }
> +  [(set_attr "type" "bfm,neon_ins_q,neon_ins_q")]
> +)
> +
>  ;;  Match a bfi instruction where the shift of OP3 means that we are
>  ;;  actually copying the least significant bits of OP3 into OP0 by way
>  ;;  of the AND masks and the IOR instruction.  A similar instruction
> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> index 4f1fd648e7f..8aabdb7c023 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -3174,6 +3174,8 @@ (define_int_attr atomic_ldoptab
>   [(UNSPECV_ATOMIC_LDOP_OR "ior") (UNSPECV_ATOMIC_LDOP_BIC "bic")
>    (UNSPECV_ATOMIC_LDOP_XOR "xor") (UNSPECV_ATOMIC_LDOP_PLUS "add")])
>
> +(define_int_iterator SUBDI_BITS [8 16 32])
> +
>  ;; -------------------------------------------------------------------
>  ;; Int Iterators Attributes.
>  ;; -------------------------------------------------------------------
> @@ -4004,3 +4006,5 @@ (define_int_attr fpscr_name
>     (UNSPECV_SET_FPSR "fpsr")
>     (UNSPECV_GET_FPCR "fpcr")
>     (UNSPECV_SET_FPCR "fpcr")])
> +
> +(define_int_attr bits_etype [(8 "b") (16 "h") (32 "s")])
> diff --git a/gcc/testsuite/gcc.target/aarch64/ins_bitfield_1.c b/gcc/testsuite/gcc.target/aarch64/ins_bitfield_1.c
> new file mode 100644
> index 00000000000..592e98b9470
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/ins_bitfield_1.c
> @@ -0,0 +1,142 @@
> +/* { dg-do assemble } */
> +/* { dg-options "-O2 -mlittle-endian --save-temps" } */
> +/* { dg-final { check-function-bodies "**" "" "" } } */
> +
> +typedef unsigned char v16qi __attribute__((vector_size(16)));
> +typedef unsigned short v8hi __attribute__((vector_size(16)));
> +typedef unsigned int v4si __attribute__((vector_size(16)));
> +
> +struct di_qi_1 { unsigned char c[4]; unsigned int x; };
> +struct di_qi_2 { unsigned int x; unsigned char c[4]; };
> +
> +struct di_hi_1 { unsigned short s[2]; unsigned int x; };
> +struct di_hi_2 { unsigned int x; unsigned short s[2]; };
> +
> +struct di_si { unsigned int i[2]; };
> +
> +struct si_qi_1 { unsigned char c[2]; unsigned short x; };
> +struct si_qi_2 { unsigned short x; unsigned char c[2]; };
> +
> +struct si_hi { unsigned short s[2]; };
> +
> +#define TEST(NAME, STYPE, VTYPE, LHS, RHS)     \
> +  void                                         \
> +  NAME (VTYPE x)                               \
> +  {                                            \
> +    register struct STYPE y asm ("v1");                \
> +    asm volatile ("" : "=w" (y));              \
> +    LHS = RHS;                                 \
> +    asm volatile ("" :: "w" (y));              \
> +  }
> +
> +/*
> +** f_di_qi_0:
> +**     ins     v1\.b\[0\], v0\.b\[0\]
> +**     ret
> +*/
> +TEST (f_di_qi_0, di_qi_1, v16qi, y.c[0], x[0])
> +
> +/*
> +** f_di_qi_1:
> +**     ins     v1\.b\[3\], v0\.b\[0\]
> +**     ret
> +*/
> +TEST (f_di_qi_1, di_qi_1, v16qi, y.c[3], x[0])
> +
> +/*
> +** f_di_qi_2:
> +**     ins     v1\.b\[4\], v0\.b\[0\]
> +**     ret
> +*/
> +TEST (f_di_qi_2, di_qi_2, v16qi, y.c[0], x[0])
> +
> +/*
> +** f_di_qi_3:
> +**     ins     v1\.b\[7\], v0\.b\[0\]
> +**     ret
> +*/
> +TEST (f_di_qi_3, di_qi_2, v16qi, y.c[3], x[0])
> +
> +/*
> +** f_di_hi_0:
> +**     ins     v1\.h\[0\], v0\.h\[0\]
> +**     ret
> +*/
> +TEST (f_di_hi_0, di_hi_1, v8hi, y.s[0], x[0])
> +
> +/*
> +** f_di_hi_1:
> +**     ins     v1\.h\[1\], v0\.h\[0\]
> +**     ret
> +*/
> +TEST (f_di_hi_1, di_hi_1, v8hi, y.s[1], x[0])
> +
> +/*
> +** f_di_hi_2:
> +**     ins     v1\.h\[2\], v0\.h\[0\]
> +**     ret
> +*/
> +TEST (f_di_hi_2, di_hi_2, v8hi, y.s[0], x[0])
> +
> +/*
> +** f_di_hi_3:
> +**     ins     v1\.h\[3\], v0\.h\[0\]
> +**     ret
> +*/
> +TEST (f_di_hi_3, di_hi_2, v8hi, y.s[1], x[0])
> +
> +/*
> +** f_di_si_0:
> +**     ins     v1\.s\[0\], v0\.s\[0\]
> +**     ret
> +*/
> +TEST (f_di_si_0, di_si, v4si, y.i[0], x[0])
> +
> +/*
> +** f_di_si_1:
> +**     ins     v1\.s\[1\], v0\.s\[0\]
> +**     ret
> +*/
> +TEST (f_di_si_1, di_si, v4si, y.i[1], x[0])
> +
> +/*
> +** f_si_qi_0:
> +**     ins     v1\.b\[0\], v0\.b\[0\]
> +**     ret
> +*/
> +TEST (f_si_qi_0, si_qi_1, v16qi, y.c[0], x[0])
> +
> +/*
> +** f_si_qi_1:
> +**     ins     v1\.b\[1\], v0\.b\[0\]
> +**     ret
> +*/
> +TEST (f_si_qi_1, si_qi_1, v16qi, y.c[1], x[0])
> +
> +/*
> +** f_si_qi_2:
> +**     ins     v1\.b\[2\], v0\.b\[0\]
> +**     ret
> +*/
> +TEST (f_si_qi_2, si_qi_2, v16qi, y.c[0], x[0])
> +
> +/*
> +** f_si_qi_3:
> +**     ins     v1\.b\[3\], v0\.b\[0\]
> +**     ret
> +*/
> +TEST (f_si_qi_3, si_qi_2, v16qi, y.c[1], x[0])
> +
> +/*
> +** f_si_hi_0:
> +**     ins     v1\.h\[0\], v0\.h\[0\]
> +**     ret
> +*/
> +TEST (f_si_hi_0, si_hi, v8hi, y.s[0], x[0])
> +
> +/*
> +** f_si_hi_1:
> +**     ins     v1\.h\[1\], v0\.h\[0\]
> +**     ret
> +*/
> +TEST (f_si_hi_1, si_hi, v8hi, y.s[1], x[0])
> diff --git a/gcc/testsuite/gcc.target/aarch64/ins_bitfield_2.c b/gcc/testsuite/gcc.target/aarch64/ins_bitfield_2.c
> new file mode 100644
> index 00000000000..152418889fa
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/ins_bitfield_2.c
> @@ -0,0 +1,142 @@
> +/* { dg-do assemble } */
> +/* { dg-options "-O2 -mbig-endian --save-temps" } */
> +/* { dg-final { check-function-bodies "**" "" "" } } */
> +
> +typedef unsigned char v16qi __attribute__((vector_size(16)));
> +typedef unsigned short v8hi __attribute__((vector_size(16)));
> +typedef unsigned int v4si __attribute__((vector_size(16)));
> +
> +struct di_qi_1 { unsigned char c[4]; unsigned int x; };
> +struct di_qi_2 { unsigned int x; unsigned char c[4]; };
> +
> +struct di_hi_1 { unsigned short s[2]; unsigned int x; };
> +struct di_hi_2 { unsigned int x; unsigned short s[2]; };
> +
> +struct di_si { unsigned int i[2]; };
> +
> +struct si_qi_1 { unsigned char c[2]; unsigned short x; };
> +struct si_qi_2 { unsigned short x; unsigned char c[2]; };
> +
> +struct si_hi { unsigned short s[2]; };
> +
> +#define TEST(NAME, STYPE, VTYPE, LHS, RHS)     \
> +  void                                         \
> +  NAME (VTYPE x)                               \
> +  {                                            \
> +    register struct STYPE y asm ("v1");                \
> +    asm volatile ("" : "=w" (y));              \
> +    LHS = RHS;                                 \
> +    asm volatile ("" :: "w" (y));              \
> +  }
> +
> +/*
> +** f_di_qi_0:
> +**     ins     v1\.b\[7\], v0\.b\[0\]
> +**     ret
> +*/
> +TEST (f_di_qi_0, di_qi_1, v16qi, y.c[0], x[15])
> +
> +/*
> +** f_di_qi_1:
> +**     ins     v1\.b\[4\], v0\.b\[0\]
> +**     ret
> +*/
> +TEST (f_di_qi_1, di_qi_1, v16qi, y.c[3], x[15])
> +
> +/*
> +** f_di_qi_2:
> +**     ins     v1\.b\[3\], v0\.b\[0\]
> +**     ret
> +*/
> +TEST (f_di_qi_2, di_qi_2, v16qi, y.c[0], x[15])
> +
> +/*
> +** f_di_qi_3:
> +**     ins     v1\.b\[0\], v0\.b\[0\]
> +**     ret
> +*/
> +TEST (f_di_qi_3, di_qi_2, v16qi, y.c[3], x[15])
> +
> +/*
> +** f_di_hi_0:
> +**     ins     v1\.h\[3\], v0\.h\[0\]
> +**     ret
> +*/
> +TEST (f_di_hi_0, di_hi_1, v8hi, y.s[0], x[7])
> +
> +/*
> +** f_di_hi_1:
> +**     ins     v1\.h\[2\], v0\.h\[0\]
> +**     ret
> +*/
> +TEST (f_di_hi_1, di_hi_1, v8hi, y.s[1], x[7])
> +
> +/*
> +** f_di_hi_2:
> +**     ins     v1\.h\[1\], v0\.h\[0\]
> +**     ret
> +*/
> +TEST (f_di_hi_2, di_hi_2, v8hi, y.s[0], x[7])
> +
> +/*
> +** f_di_hi_3:
> +**     ins     v1\.h\[0\], v0\.h\[0\]
> +**     ret
> +*/
> +TEST (f_di_hi_3, di_hi_2, v8hi, y.s[1], x[7])
> +
> +/*
> +** f_di_si_0:
> +**     ins     v1\.s\[1\], v0\.s\[0\]
> +**     ret
> +*/
> +TEST (f_di_si_0, di_si, v4si, y.i[0], x[3])
> +
> +/*
> +** f_di_si_1:
> +**     ins     v1\.s\[0\], v0\.s\[0\]
> +**     ret
> +*/
> +TEST (f_di_si_1, di_si, v4si, y.i[1], x[3])
> +
> +/*
> +** f_si_qi_0:
> +**     ins     v1\.b\[3\], v0\.b\[0\]
> +**     ret
> +*/
> +TEST (f_si_qi_0, si_qi_1, v16qi, y.c[0], x[15])
> +
> +/*
> +** f_si_qi_1:
> +**     ins     v1\.b\[2\], v0\.b\[0\]
> +**     ret
> +*/
> +TEST (f_si_qi_1, si_qi_1, v16qi, y.c[1], x[15])
> +
> +/*
> +** f_si_qi_2:
> +**     ins     v1\.b\[1\], v0\.b\[0\]
> +**     ret
> +*/
> +TEST (f_si_qi_2, si_qi_2, v16qi, y.c[0], x[15])
> +
> +/*
> +** f_si_qi_3:
> +**     ins     v1\.b\[0\], v0\.b\[0\]
> +**     ret
> +*/
> +TEST (f_si_qi_3, si_qi_2, v16qi, y.c[1], x[15])
> +
> +/*
> +** f_si_hi_0:
> +**     ins     v1\.h\[1\], v0\.h\[0\]
> +**     ret
> +*/
> +TEST (f_si_hi_0, si_hi, v8hi, y.s[0], x[7])
> +
> +/*
> +** f_si_hi_1:
> +**     ins     v1\.h\[0\], v0\.h\[0\]
> +**     ret
> +*/
> +TEST (f_si_hi_1, si_hi, v8hi, y.s[1], x[7])
> diff --git a/gcc/testsuite/gcc.target/aarch64/ins_bitfield_3.c b/gcc/testsuite/gcc.target/aarch64/ins_bitfield_3.c
> new file mode 100644
> index 00000000000..0ef95a97996
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/ins_bitfield_3.c
> @@ -0,0 +1,156 @@
> +/* { dg-do assemble } */
> +/* { dg-options "-O2 -mlittle-endian --save-temps" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
> +
> +struct di_qi_1 { unsigned char c[4]; unsigned int x; };
> +struct di_qi_2 { unsigned int x; unsigned char c[4]; };
> +
> +struct di_hi_1 { unsigned short s[2]; unsigned int x; };
> +struct di_hi_2 { unsigned int x; unsigned short s[2]; };
> +
> +struct di_si { unsigned int i[2]; };
> +
> +struct si_qi_1 { unsigned char c[2]; unsigned short x; };
> +struct si_qi_2 { unsigned short x; unsigned char c[2]; };
> +
> +struct si_hi { unsigned short s[2]; };
> +
> +#define TEST(NAME, STYPE, ETYPE, LHS)          \
> +  void                                         \
> +  NAME (volatile ETYPE *ptr)                   \
> +  {                                            \
> +    register struct STYPE y asm ("v1");                \
> +    asm volatile ("" : "=w" (y));              \
> +    ETYPE x = *ptr;                            \
> +    __UINT64_TYPE__ value = (ETYPE) x;         \
> +    LHS = value;                               \
> +    asm volatile ("" :: "w" (y));              \
> +  }
> +
> +/*
> +** f_di_qi_0:
> +**     ldr     b([0-9]+), \[x0\]
> +**     ins     v1\.b\[0\], v\1\.b\[0\]
> +**     ret
> +*/
> +TEST (f_di_qi_0, di_qi_1, unsigned char, y.c[0])
> +
> +/*
> +** f_di_qi_1:
> +**     ldr     b([0-9]+), \[x0\]
> +**     ins     v1\.b\[3\], v\1\.b\[0\]
> +**     ret
> +*/
> +TEST (f_di_qi_1, di_qi_1, unsigned char, y.c[3])
> +
> +/*
> +** f_di_qi_2:
> +**     ldr     b([0-9]+), \[x0\]
> +**     ins     v1\.b\[4\], v\1\.b\[0\]
> +**     ret
> +*/
> +TEST (f_di_qi_2, di_qi_2, unsigned char, y.c[0])
> +
> +/*
> +** f_di_qi_3:
> +**     ldr     b([0-9]+), \[x0\]
> +**     ins     v1\.b\[7\], v\1\.b\[0\]
> +**     ret
> +*/
> +TEST (f_di_qi_3, di_qi_2, unsigned char, y.c[3])
> +
> +/*
> +** f_di_hi_0:
> +**     ldr     h([0-9]+), \[x0\]
> +**     ins     v1\.h\[0\], v\1\.h\[0\]
> +**     ret
> +*/
> +TEST (f_di_hi_0, di_hi_1, unsigned short, y.s[0])
> +
> +/*
> +** f_di_hi_1:
> +**     ldr     h([0-9]+), \[x0\]
> +**     ins     v1\.h\[1\], v\1\.h\[0\]
> +**     ret
> +*/
> +TEST (f_di_hi_1, di_hi_1, unsigned short, y.s[1])
> +
> +/*
> +** f_di_hi_2:
> +**     ldr     h([0-9]+), \[x0\]
> +**     ins     v1\.h\[2\], v\1\.h\[0\]
> +**     ret
> +*/
> +TEST (f_di_hi_2, di_hi_2, unsigned short, y.s[0])
> +
> +/*
> +** f_di_hi_3:
> +**     ldr     h([0-9]+), \[x0\]
> +**     ins     v1\.h\[3\], v\1\.h\[0\]
> +**     ret
> +*/
> +TEST (f_di_hi_3, di_hi_2, unsigned short, y.s[1])
> +
> +/*
> +** f_di_si_0:
> +**     ldr     s([0-9]+), \[x0\]
> +**     ins     v1\.s\[0\], v\1\.s\[0\]
> +**     ret
> +*/
> +TEST (f_di_si_0, di_si, unsigned int, y.i[0])
> +
> +/*
> +** f_di_si_1:
> +**     ldr     s([0-9]+), \[x0\]
> +**     ins     v1\.s\[1\], v\1\.s\[0\]
> +**     ret
> +*/
> +TEST (f_di_si_1, di_si, unsigned int, y.i[1])
> +
> +/*
> +** f_si_qi_0:
> +**     ldr     b([0-9]+), \[x0\]
> +**     ins     v1\.b\[0\], v\1\.b\[0\]
> +**     ret
> +*/
> +TEST (f_si_qi_0, si_qi_1, unsigned char, y.c[0])
> +
> +/*
> +** f_si_qi_1:
> +**     ldr     b([0-9]+), \[x0\]
> +**     ins     v1\.b\[1\], v\1\.b\[0\]
> +**     ret
> +*/
> +TEST (f_si_qi_1, si_qi_1, unsigned char, y.c[1])
> +
> +/*
> +** f_si_qi_2:
> +**     ldr     b([0-9]+), \[x0\]
> +**     ins     v1\.b\[2\], v\1\.b\[0\]
> +**     ret
> +*/
> +TEST (f_si_qi_2, si_qi_2, unsigned char, y.c[0])
> +
> +/*
> +** f_si_qi_3:
> +**     ldr     b([0-9]+), \[x0\]
> +**     ins     v1\.b\[3\], v\1\.b\[0\]
> +**     ret
> +*/
> +TEST (f_si_qi_3, si_qi_2, unsigned char, y.c[1])
> +
> +/*
> +** f_si_hi_0:
> +**     ldr     h([0-9]+), \[x0\]
> +**     ins     v1\.h\[0\], v\1\.h\[0\]
> +**     ret
> +*/
> +TEST (f_si_hi_0, si_hi, unsigned short, y.s[0])
> +
> +/*
> +** f_si_hi_1:
> +**     ldr     h([0-9]+), \[x0\]
> +**     ins     v1\.h\[1\], v\1\.h\[0\]
> +**     ret
> +*/
> +TEST (f_si_hi_1, si_hi, unsigned short, y.s[1])
> diff --git a/gcc/testsuite/gcc.target/aarch64/ins_bitfield_4.c b/gcc/testsuite/gcc.target/aarch64/ins_bitfield_4.c
> new file mode 100644
> index 00000000000..98e25c86959
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/ins_bitfield_4.c
> @@ -0,0 +1,156 @@
> +/* { dg-do assemble } */
> +/* { dg-options "-O2 -mbig-endian --save-temps" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
> +
> +struct di_qi_1 { unsigned char c[4]; unsigned int x; };
> +struct di_qi_2 { unsigned int x; unsigned char c[4]; };
> +
> +struct di_hi_1 { unsigned short s[2]; unsigned int x; };
> +struct di_hi_2 { unsigned int x; unsigned short s[2]; };
> +
> +struct di_si { unsigned int i[2]; };
> +
> +struct si_qi_1 { unsigned char c[2]; unsigned short x; };
> +struct si_qi_2 { unsigned short x; unsigned char c[2]; };
> +
> +struct si_hi { unsigned short s[2]; };
> +
> +#define TEST(NAME, STYPE, ETYPE, LHS)          \
> +  void                                         \
> +  NAME (volatile ETYPE *ptr)                   \
> +  {                                            \
> +    register struct STYPE y asm ("v1");                \
> +    asm volatile ("" : "=w" (y));              \
> +    ETYPE x = *ptr;                            \
> +    __UINT64_TYPE__ value = (ETYPE) x;         \
> +    LHS = value;                               \
> +    asm volatile ("" :: "w" (y));              \
> +  }
> +
> +/*
> +** f_di_qi_0:
> +**     ldr     b([0-9]+), \[x0\]
> +**     ins     v1\.b\[7\], v\1\.b\[0\]
> +**     ret
> +*/
> +TEST (f_di_qi_0, di_qi_1, unsigned char, y.c[0])
> +
> +/*
> +** f_di_qi_1:
> +**     ldr     b([0-9]+), \[x0\]
> +**     ins     v1\.b\[4\], v\1\.b\[0\]
> +**     ret
> +*/
> +TEST (f_di_qi_1, di_qi_1, unsigned char, y.c[3])
> +
> +/*
> +** f_di_qi_2:
> +**     ldr     b([0-9]+), \[x0\]
> +**     ins     v1\.b\[3\], v\1\.b\[0\]
> +**     ret
> +*/
> +TEST (f_di_qi_2, di_qi_2, unsigned char, y.c[0])
> +
> +/*
> +** f_di_qi_3:
> +**     ldr     b([0-9]+), \[x0\]
> +**     ins     v1\.b\[0\], v\1\.b\[0\]
> +**     ret
> +*/
> +TEST (f_di_qi_3, di_qi_2, unsigned char, y.c[3])
> +
> +/*
> +** f_di_hi_0:
> +**     ldr     h([0-9]+), \[x0\]
> +**     ins     v1\.h\[3\], v\1\.h\[0\]
> +**     ret
> +*/
> +TEST (f_di_hi_0, di_hi_1, unsigned short, y.s[0])
> +
> +/*
> +** f_di_hi_1:
> +**     ldr     h([0-9]+), \[x0\]
> +**     ins     v1\.h\[2\], v\1\.h\[0\]
> +**     ret
> +*/
> +TEST (f_di_hi_1, di_hi_1, unsigned short, y.s[1])
> +
> +/*
> +** f_di_hi_2:
> +**     ldr     h([0-9]+), \[x0\]
> +**     ins     v1\.h\[1\], v\1\.h\[0\]
Richard Sandiford May 23, 2023, 11:02 a.m. UTC | #2
Richard Biener <richard.guenther@gmail.com> writes:
> On Tue, May 23, 2023 at 12:38 PM Richard Sandiford via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> At -O2, and so with SLP vectorisation enabled:
>>
>>     struct complx_t { float re, im; };
>>     complx_t add(complx_t a, complx_t b) {
>>       return {a.re + b.re, a.im + b.im};
>>     }
>>
>> generates:
>>
>>         fmov    w3, s1
>>         fmov    x0, d0
>>         fmov    x1, d2
>>         fmov    w2, s3
>>         bfi     x0, x3, 32, 32
>>         fmov    d31, x0
>>         bfi     x1, x2, 32, 32
>>         fmov    d30, x1
>>         fadd    v31.2s, v31.2s, v30.2s
>>         fmov    x1, d31
>>         lsr     x0, x1, 32
>>         fmov    s1, w0
>>         lsr     w0, w1, 0
>>         fmov    s0, w0
>>         ret
>>
>> This is because complx_t is passed and returned in FPRs, but GCC gives
>> it DImode.
>
> Isn't that the choice of the target?  Of course "FPRs" might mean a
> single FPR here and arguably DFmode would be similarly bad?

Yeah, the problem is really the "single register" aspect, rather than
the exact choice of mode.  We're happy to store DImode values in FPRs
if it makes sense (and we will do for this example, after the patch).

V2SFmode or DFmode would be just as bad, like you say.

> That said, to the ppc folks who also recently tried to change how
> argument passing materializes, I suggested piggy-backing on an
> SRA-style analysis (you could probably simply build the access
> tree for all function parameters using its infrastructure) to drive
> the RTL expansion heuristics (it's heuristics after all...), i.e. to
> decide exactly which (set of) pseudos / stack slots we want to form
> from the actual argument hard registers.

My long-term plan is to allow a DECL_RTL to be a PARALLEL of pseudos,
just like the DECL_INCOMING_RTL of a PARM_DECL can be a PARALLEL of
hard registers.  This also makes it possible to store things that are
currently BLKmode (but still passed and returned in registers).
E.g. it means that a 12-byte structure can be stored in registers
rather than being forced to the stack frame.
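
For concreteness, a minimal sketch of the kind of 12-byte structure meant
here (the names "vec3"/"add3" are illustrative, not from the patch): under
AAPCS64 it is a homogeneous floating-point aggregate, so it is passed and
returned in s0-s2, yet it currently gets BLKmode and so is forced into the
stack frame internally:

```c
/* Hypothetical example: a 12-byte structure that AAPCS64 treats as a
   homogeneous floating-point aggregate (three single-precision members),
   so it is passed in s0-s2 and returned in s0-s2, even though GCC
   currently gives it BLKmode.  */
struct vec3 { float x, y, z; };

struct vec3
add3 (struct vec3 a, struct vec3 b)
{
  /* Elementwise add; today each member round-trips through the
     stack frame rather than staying in FPRs.  */
  return (struct vec3) { a.x + b.x, a.y + b.y, a.z + b.z };
}
```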

The idea (at least at first) is to handle only those cases that
make sense from an ABI point of view.  We'd still be relying on
SRA to split up operations on individual fields.

I have a WIP patch that gives some promising improvements,
but it needs more time.

Thanks,
Richard

Patch

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index d6fc94015fa..146c2ad4988 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -24827,6 +24827,18 @@  aarch64_modes_tieable_p (machine_mode mode1, machine_mode mode2)
   if (GET_MODE_CLASS (mode1) == GET_MODE_CLASS (mode2))
     return true;
 
+  /* Allow changes between scalar modes if both modes fit within 64 bits.
+     This is because:
+
+     - We allow all such modes for both FPRs and GPRs.
+     - They occupy a single register for both FPRs and GPRs.
+     - We can reinterpret one mode as another in both types of register.  */
+  if (is_a<scalar_mode> (mode1)
+      && is_a<scalar_mode> (mode2)
+      && known_le (GET_MODE_SIZE (mode1), 8)
+      && known_le (GET_MODE_SIZE (mode2), 8))
+    return true;
+
   /* We specifically want to allow elements of "structure" modes to
      be tieable to the structure.  This more general condition allows
      other rarer situations too.  The reason we don't extend this to
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 286f044cb8b..8b8951d7b14 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5862,6 +5862,25 @@  (define_expand "insv<mode>"
   operands[3] = force_reg (<MODE>mode, value);
 })
 
+(define_insn "*insv_reg<mode>_<SUBDI_BITS>"
+  [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r,w,?w")
+			  (const_int SUBDI_BITS)
+			  (match_operand 1 "const_int_operand"))
+	(match_operand:GPI 2 "register_operand" "r,w,r"))]
+  "multiple_p (UINTVAL (operands[1]), <SUBDI_BITS>)
+   && UINTVAL (operands[1]) + <SUBDI_BITS> <= <GPI:sizen>"
+  {
+    if (which_alternative == 0)
+      return "bfi\t%<w>0, %<w>2, %1, <SUBDI_BITS>";
+
+    operands[1] = gen_int_mode (UINTVAL (operands[1]) / <SUBDI_BITS>, SImode);
+    if (which_alternative == 1)
+      return "ins\t%0.<bits_etype>[%1], %2.<bits_etype>[0]";
+    return "ins\t%0.<bits_etype>[%1], %w2";
+  }
+  [(set_attr "type" "bfm,neon_ins_q,neon_ins_q")]
+)
+
 (define_insn "*insv_reg<mode>"
   [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r")
 			  (match_operand 1 "const_int_operand" "n")
@@ -5874,6 +5893,27 @@  (define_insn "*insv_reg<mode>"
   [(set_attr "type" "bfm")]
 )
 
+(define_insn_and_split "*aarch64_bfi<GPI:mode><ALLX:mode>_<SUBDI_BITS>"
+  [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r,w,?w")
+			  (const_int SUBDI_BITS)
+			  (match_operand 1 "const_int_operand"))
+	(zero_extend:GPI (match_operand:ALLX 2  "register_operand" "r,w,r")))]
+  "<SUBDI_BITS> <= <ALLX:sizen>
+   && multiple_p (UINTVAL (operands[1]), <SUBDI_BITS>)
+   && UINTVAL (operands[1]) + <SUBDI_BITS> <= <GPI:sizen>"
+  "#"
+  "&& 1"
+  [(set (zero_extract:GPI (match_dup 0)
+			  (const_int SUBDI_BITS)
+			  (match_dup 1))
+	(match_dup 2))]
+  {
+    operands[2] = lowpart_subreg (<GPI:MODE>mode, operands[2],
+				  <ALLX:MODE>mode);
+  }
+  [(set_attr "type" "bfm,neon_ins_q,neon_ins_q")]
+)
+
 (define_insn "*aarch64_bfi<GPI:mode><ALLX:mode>4"
   [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r")
 			  (match_operand 1 "const_int_operand" "n")
@@ -5884,6 +5924,28 @@  (define_insn "*aarch64_bfi<GPI:mode><ALLX:mode>4"
   [(set_attr "type" "bfm")]
 )
 
+(define_insn_and_split "*aarch64_bfidi<ALLX:mode>_subreg_<SUBDI_BITS>"
+  [(set (zero_extract:DI (match_operand:DI 0 "register_operand" "+r,w,?w")
+			 (const_int SUBDI_BITS)
+			 (match_operand 1 "const_int_operand"))
+	(match_operator:DI 2 "subreg_lowpart_operator"
+	  [(zero_extend:SI
+	     (match_operand:ALLX 3  "register_operand" "r,w,r"))]))]
+  "<SUBDI_BITS> <= <ALLX:sizen>
+   && multiple_p (UINTVAL (operands[1]), <SUBDI_BITS>)
+   && UINTVAL (operands[1]) + <SUBDI_BITS> <= 64"
+  "#"
+  "&& 1"
+  [(set (zero_extract:DI (match_dup 0)
+			 (const_int SUBDI_BITS)
+			 (match_dup 1))
+	(match_dup 2))]
+  {
+    operands[2] = lowpart_subreg (DImode, operands[3], <ALLX:MODE>mode);
+  }
+  [(set_attr "type" "bfm,neon_ins_q,neon_ins_q")]
+)
+
 ;;  Match a bfi instruction where the shift of OP3 means that we are
 ;;  actually copying the least significant bits of OP3 into OP0 by way
 ;;  of the AND masks and the IOR instruction.  A similar instruction
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 4f1fd648e7f..8aabdb7c023 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -3174,6 +3174,8 @@  (define_int_attr atomic_ldoptab
  [(UNSPECV_ATOMIC_LDOP_OR "ior") (UNSPECV_ATOMIC_LDOP_BIC "bic")
   (UNSPECV_ATOMIC_LDOP_XOR "xor") (UNSPECV_ATOMIC_LDOP_PLUS "add")])
 
+(define_int_iterator SUBDI_BITS [8 16 32])
+
 ;; -------------------------------------------------------------------
 ;; Int Iterators Attributes.
 ;; -------------------------------------------------------------------
@@ -4004,3 +4006,5 @@  (define_int_attr fpscr_name
    (UNSPECV_SET_FPSR "fpsr")
    (UNSPECV_GET_FPCR "fpcr")
    (UNSPECV_SET_FPCR "fpcr")])
+
+(define_int_attr bits_etype [(8 "b") (16 "h") (32 "s")])
diff --git a/gcc/testsuite/gcc.target/aarch64/ins_bitfield_1.c b/gcc/testsuite/gcc.target/aarch64/ins_bitfield_1.c
new file mode 100644
index 00000000000..592e98b9470
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ins_bitfield_1.c
@@ -0,0 +1,142 @@ 
+/* { dg-do assemble } */
+/* { dg-options "-O2 -mlittle-endian --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+typedef unsigned char v16qi __attribute__((vector_size(16)));
+typedef unsigned short v8hi __attribute__((vector_size(16)));
+typedef unsigned int v4si __attribute__((vector_size(16)));
+
+struct di_qi_1 { unsigned char c[4]; unsigned int x; };
+struct di_qi_2 { unsigned int x; unsigned char c[4]; };
+
+struct di_hi_1 { unsigned short s[2]; unsigned int x; };
+struct di_hi_2 { unsigned int x; unsigned short s[2]; };
+
+struct di_si { unsigned int i[2]; };
+
+struct si_qi_1 { unsigned char c[2]; unsigned short x; };
+struct si_qi_2 { unsigned short x; unsigned char c[2]; };
+
+struct si_hi { unsigned short s[2]; };
+
+#define TEST(NAME, STYPE, VTYPE, LHS, RHS)	\
+  void						\
+  NAME (VTYPE x)				\
+  {						\
+    register struct STYPE y asm ("v1");		\
+    asm volatile ("" : "=w" (y));		\
+    LHS = RHS;					\
+    asm volatile ("" :: "w" (y));		\
+  }
+
+/*
+** f_di_qi_0:
+**	ins	v1\.b\[0\], v0\.b\[0\]
+**	ret
+*/
+TEST (f_di_qi_0, di_qi_1, v16qi, y.c[0], x[0])
+
+/*
+** f_di_qi_1:
+**	ins	v1\.b\[3\], v0\.b\[0\]
+**	ret
+*/
+TEST (f_di_qi_1, di_qi_1, v16qi, y.c[3], x[0])
+
+/*
+** f_di_qi_2:
+**	ins	v1\.b\[4\], v0\.b\[0\]
+**	ret
+*/
+TEST (f_di_qi_2, di_qi_2, v16qi, y.c[0], x[0])
+
+/*
+** f_di_qi_3:
+**	ins	v1\.b\[7\], v0\.b\[0\]
+**	ret
+*/
+TEST (f_di_qi_3, di_qi_2, v16qi, y.c[3], x[0])
+
+/*
+** f_di_hi_0:
+**	ins	v1\.h\[0\], v0\.h\[0\]
+**	ret
+*/
+TEST (f_di_hi_0, di_hi_1, v8hi, y.s[0], x[0])
+
+/*
+** f_di_hi_1:
+**	ins	v1\.h\[1\], v0\.h\[0\]
+**	ret
+*/
+TEST (f_di_hi_1, di_hi_1, v8hi, y.s[1], x[0])
+
+/*
+** f_di_hi_2:
+**	ins	v1\.h\[2\], v0\.h\[0\]
+**	ret
+*/
+TEST (f_di_hi_2, di_hi_2, v8hi, y.s[0], x[0])
+
+/*
+** f_di_hi_3:
+**	ins	v1\.h\[3\], v0\.h\[0\]
+**	ret
+*/
+TEST (f_di_hi_3, di_hi_2, v8hi, y.s[1], x[0])
+
+/*
+** f_di_si_0:
+**	ins	v1\.s\[0\], v0\.s\[0\]
+**	ret
+*/
+TEST (f_di_si_0, di_si, v4si, y.i[0], x[0])
+
+/*
+** f_di_si_1:
+**	ins	v1\.s\[1\], v0\.s\[0\]
+**	ret
+*/
+TEST (f_di_si_1, di_si, v4si, y.i[1], x[0])
+
+/*
+** f_si_qi_0:
+**	ins	v1\.b\[0\], v0\.b\[0\]
+**	ret
+*/
+TEST (f_si_qi_0, si_qi_1, v16qi, y.c[0], x[0])
+
+/*
+** f_si_qi_1:
+**	ins	v1\.b\[1\], v0\.b\[0\]
+**	ret
+*/
+TEST (f_si_qi_1, si_qi_1, v16qi, y.c[1], x[0])
+
+/*
+** f_si_qi_2:
+**	ins	v1\.b\[2\], v0\.b\[0\]
+**	ret
+*/
+TEST (f_si_qi_2, si_qi_2, v16qi, y.c[0], x[0])
+
+/*
+** f_si_qi_3:
+**	ins	v1\.b\[3\], v0\.b\[0\]
+**	ret
+*/
+TEST (f_si_qi_3, si_qi_2, v16qi, y.c[1], x[0])
+
+/*
+** f_si_hi_0:
+**	ins	v1\.h\[0\], v0\.h\[0\]
+**	ret
+*/
+TEST (f_si_hi_0, si_hi, v8hi, y.s[0], x[0])
+
+/*
+** f_si_hi_1:
+**	ins	v1\.h\[1\], v0\.h\[0\]
+**	ret
+*/
+TEST (f_si_hi_1, si_hi, v8hi, y.s[1], x[0])
diff --git a/gcc/testsuite/gcc.target/aarch64/ins_bitfield_2.c b/gcc/testsuite/gcc.target/aarch64/ins_bitfield_2.c
new file mode 100644
index 00000000000..152418889fa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ins_bitfield_2.c
@@ -0,0 +1,142 @@ 
+/* { dg-do assemble } */
+/* { dg-options "-O2 -mbig-endian --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+typedef unsigned char v16qi __attribute__((vector_size(16)));
+typedef unsigned short v8hi __attribute__((vector_size(16)));
+typedef unsigned int v4si __attribute__((vector_size(16)));
+
+struct di_qi_1 { unsigned char c[4]; unsigned int x; };
+struct di_qi_2 { unsigned int x; unsigned char c[4]; };
+
+struct di_hi_1 { unsigned short s[2]; unsigned int x; };
+struct di_hi_2 { unsigned int x; unsigned short s[2]; };
+
+struct di_si { unsigned int i[2]; };
+
+struct si_qi_1 { unsigned char c[2]; unsigned short x; };
+struct si_qi_2 { unsigned short x; unsigned char c[2]; };
+
+struct si_hi { unsigned short s[2]; };
+
+#define TEST(NAME, STYPE, VTYPE, LHS, RHS)	\
+  void						\
+  NAME (VTYPE x)				\
+  {						\
+    register struct STYPE y asm ("v1");		\
+    asm volatile ("" : "=w" (y));		\
+    LHS = RHS;					\
+    asm volatile ("" :: "w" (y));		\
+  }
+
+/*
+** f_di_qi_0:
+**	ins	v1\.b\[7\], v0\.b\[0\]
+**	ret
+*/
+TEST (f_di_qi_0, di_qi_1, v16qi, y.c[0], x[15])
+
+/*
+** f_di_qi_1:
+**	ins	v1\.b\[4\], v0\.b\[0\]
+**	ret
+*/
+TEST (f_di_qi_1, di_qi_1, v16qi, y.c[3], x[15])
+
+/*
+** f_di_qi_2:
+**	ins	v1\.b\[3\], v0\.b\[0\]
+**	ret
+*/
+TEST (f_di_qi_2, di_qi_2, v16qi, y.c[0], x[15])
+
+/*
+** f_di_qi_3:
+**	ins	v1\.b\[0\], v0\.b\[0\]
+**	ret
+*/
+TEST (f_di_qi_3, di_qi_2, v16qi, y.c[3], x[15])
+
+/*
+** f_di_hi_0:
+**	ins	v1\.h\[3\], v0\.h\[0\]
+**	ret
+*/
+TEST (f_di_hi_0, di_hi_1, v8hi, y.s[0], x[7])
+
+/*
+** f_di_hi_1:
+**	ins	v1\.h\[2\], v0\.h\[0\]
+**	ret
+*/
+TEST (f_di_hi_1, di_hi_1, v8hi, y.s[1], x[7])
+
+/*
+** f_di_hi_2:
+**	ins	v1\.h\[1\], v0\.h\[0\]
+**	ret
+*/
+TEST (f_di_hi_2, di_hi_2, v8hi, y.s[0], x[7])
+
+/*
+** f_di_hi_3:
+**	ins	v1\.h\[0\], v0\.h\[0\]
+**	ret
+*/
+TEST (f_di_hi_3, di_hi_2, v8hi, y.s[1], x[7])
+
+/*
+** f_di_si_0:
+**	ins	v1\.s\[1\], v0\.s\[0\]
+**	ret
+*/
+TEST (f_di_si_0, di_si, v4si, y.i[0], x[3])
+
+/*
+** f_di_si_1:
+**	ins	v1\.s\[0\], v0\.s\[0\]
+**	ret
+*/
+TEST (f_di_si_1, di_si, v4si, y.i[1], x[3])
+
+/*
+** f_si_qi_0:
+**	ins	v1\.b\[3\], v0\.b\[0\]
+**	ret
+*/
+TEST (f_si_qi_0, si_qi_1, v16qi, y.c[0], x[15])
+
+/*
+** f_si_qi_1:
+**	ins	v1\.b\[2\], v0\.b\[0\]
+**	ret
+*/
+TEST (f_si_qi_1, si_qi_1, v16qi, y.c[1], x[15])
+
+/*
+** f_si_qi_2:
+**	ins	v1\.b\[1\], v0\.b\[0\]
+**	ret
+*/
+TEST (f_si_qi_2, si_qi_2, v16qi, y.c[0], x[15])
+
+/*
+** f_si_qi_3:
+**	ins	v1\.b\[0\], v0\.b\[0\]
+**	ret
+*/
+TEST (f_si_qi_3, si_qi_2, v16qi, y.c[1], x[15])
+
+/*
+** f_si_hi_0:
+**	ins	v1\.h\[1\], v0\.h\[0\]
+**	ret
+*/
+TEST (f_si_hi_0, si_hi, v8hi, y.s[0], x[7])
+
+/*
+** f_si_hi_1:
+**	ins	v1\.h\[0\], v0\.h\[0\]
+**	ret
+*/
+TEST (f_si_hi_1, si_hi, v8hi, y.s[1], x[7])
diff --git a/gcc/testsuite/gcc.target/aarch64/ins_bitfield_3.c b/gcc/testsuite/gcc.target/aarch64/ins_bitfield_3.c
new file mode 100644
index 00000000000..0ef95a97996
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ins_bitfield_3.c
@@ -0,0 +1,156 @@ 
+/* { dg-do assemble } */
+/* { dg-options "-O2 -mlittle-endian --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+struct di_qi_1 { unsigned char c[4]; unsigned int x; };
+struct di_qi_2 { unsigned int x; unsigned char c[4]; };
+
+struct di_hi_1 { unsigned short s[2]; unsigned int x; };
+struct di_hi_2 { unsigned int x; unsigned short s[2]; };
+
+struct di_si { unsigned int i[2]; };
+
+struct si_qi_1 { unsigned char c[2]; unsigned short x; };
+struct si_qi_2 { unsigned short x; unsigned char c[2]; };
+
+struct si_hi { unsigned short s[2]; };
+
+#define TEST(NAME, STYPE, ETYPE, LHS)		\
+  void						\
+  NAME (volatile ETYPE *ptr)			\
+  {						\
+    register struct STYPE y asm ("v1");		\
+    asm volatile ("" : "=w" (y));		\
+    ETYPE x = *ptr;				\
+    __UINT64_TYPE__ value = (ETYPE) x;		\
+    LHS = value;				\
+    asm volatile ("" :: "w" (y));		\
+  }
+
+/*
+** f_di_qi_0:
+**	ldr	b([0-9]+), \[x0\]
+**	ins	v1\.b\[0\], v\1\.b\[0\]
+**	ret
+*/
+TEST (f_di_qi_0, di_qi_1, unsigned char, y.c[0])
+
+/*
+** f_di_qi_1:
+**	ldr	b([0-9]+), \[x0\]
+**	ins	v1\.b\[3\], v\1\.b\[0\]
+**	ret
+*/
+TEST (f_di_qi_1, di_qi_1, unsigned char, y.c[3])
+
+/*
+** f_di_qi_2:
+**	ldr	b([0-9]+), \[x0\]
+**	ins	v1\.b\[4\], v\1\.b\[0\]
+**	ret
+*/
+TEST (f_di_qi_2, di_qi_2, unsigned char, y.c[0])
+
+/*
+** f_di_qi_3:
+**	ldr	b([0-9]+), \[x0\]
+**	ins	v1\.b\[7\], v\1\.b\[0\]
+**	ret
+*/
+TEST (f_di_qi_3, di_qi_2, unsigned char, y.c[3])
+
+/*
+** f_di_hi_0:
+**	ldr	h([0-9]+), \[x0\]
+**	ins	v1\.h\[0\], v\1\.h\[0\]
+**	ret
+*/
+TEST (f_di_hi_0, di_hi_1, unsigned short, y.s[0])
+
+/*
+** f_di_hi_1:
+**	ldr	h([0-9]+), \[x0\]
+**	ins	v1\.h\[1\], v\1\.h\[0\]
+**	ret
+*/
+TEST (f_di_hi_1, di_hi_1, unsigned short, y.s[1])
+
+/*
+** f_di_hi_2:
+**	ldr	h([0-9]+), \[x0\]
+**	ins	v1\.h\[2\], v\1\.h\[0\]
+**	ret
+*/
+TEST (f_di_hi_2, di_hi_2, unsigned short, y.s[0])
+
+/*
+** f_di_hi_3:
+**	ldr	h([0-9]+), \[x0\]
+**	ins	v1\.h\[3\], v\1\.h\[0\]
+**	ret
+*/
+TEST (f_di_hi_3, di_hi_2, unsigned short, y.s[1])
+
+/*
+** f_di_si_0:
+**	ldr	s([0-9]+), \[x0\]
+**	ins	v1\.s\[0\], v\1\.s\[0\]
+**	ret
+*/
+TEST (f_di_si_0, di_si, unsigned int, y.i[0])
+
+/*
+** f_di_si_1:
+**	ldr	s([0-9]+), \[x0\]
+**	ins	v1\.s\[1\], v\1\.s\[0\]
+**	ret
+*/
+TEST (f_di_si_1, di_si, unsigned int, y.i[1])
+
+/*
+** f_si_qi_0:
+**	ldr	b([0-9]+), \[x0\]
+**	ins	v1\.b\[0\], v\1\.b\[0\]
+**	ret
+*/
+TEST (f_si_qi_0, si_qi_1, unsigned char, y.c[0])
+
+/*
+** f_si_qi_1:
+**	ldr	b([0-9]+), \[x0\]
+**	ins	v1\.b\[1\], v\1\.b\[0\]
+**	ret
+*/
+TEST (f_si_qi_1, si_qi_1, unsigned char, y.c[1])
+
+/*
+** f_si_qi_2:
+**	ldr	b([0-9]+), \[x0\]
+**	ins	v1\.b\[2\], v\1\.b\[0\]
+**	ret
+*/
+TEST (f_si_qi_2, si_qi_2, unsigned char, y.c[0])
+
+/*
+** f_si_qi_3:
+**	ldr	b([0-9]+), \[x0\]
+**	ins	v1\.b\[3\], v\1\.b\[0\]
+**	ret
+*/
+TEST (f_si_qi_3, si_qi_2, unsigned char, y.c[1])
+
+/*
+** f_si_hi_0:
+**	ldr	h([0-9]+), \[x0\]
+**	ins	v1\.h\[0\], v\1\.h\[0\]
+**	ret
+*/
+TEST (f_si_hi_0, si_hi, unsigned short, y.s[0])
+
+/*
+** f_si_hi_1:
+**	ldr	h([0-9]+), \[x0\]
+**	ins	v1\.h\[1\], v\1\.h\[0\]
+**	ret
+*/
+TEST (f_si_hi_1, si_hi, unsigned short, y.s[1])
diff --git a/gcc/testsuite/gcc.target/aarch64/ins_bitfield_4.c b/gcc/testsuite/gcc.target/aarch64/ins_bitfield_4.c
new file mode 100644
index 00000000000..98e25c86959
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ins_bitfield_4.c
@@ -0,0 +1,156 @@ 
+/* { dg-do assemble } */
+/* { dg-options "-O2 -mbig-endian --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+struct di_qi_1 { unsigned char c[4]; unsigned int x; };
+struct di_qi_2 { unsigned int x; unsigned char c[4]; };
+
+struct di_hi_1 { unsigned short s[2]; unsigned int x; };
+struct di_hi_2 { unsigned int x; unsigned short s[2]; };
+
+struct di_si { unsigned int i[2]; };
+
+struct si_qi_1 { unsigned char c[2]; unsigned short x; };
+struct si_qi_2 { unsigned short x; unsigned char c[2]; };
+
+struct si_hi { unsigned short s[2]; };
+
+#define TEST(NAME, STYPE, ETYPE, LHS)		\
+  void						\
+  NAME (volatile ETYPE *ptr)			\
+  {						\
+    register struct STYPE y asm ("v1");		\
+    asm volatile ("" : "=w" (y));		\
+    ETYPE x = *ptr;				\
+    __UINT64_TYPE__ value = (ETYPE) x;		\
+    LHS = value;				\
+    asm volatile ("" :: "w" (y));		\
+  }
+
+/*
+** f_di_qi_0:
+**	ldr	b([0-9]+), \[x0\]
+**	ins	v1\.b\[7\], v\1\.b\[0\]
+**	ret
+*/
+TEST (f_di_qi_0, di_qi_1, unsigned char, y.c[0])
+
+/*
+** f_di_qi_1:
+**	ldr	b([0-9]+), \[x0\]
+**	ins	v1\.b\[4\], v\1\.b\[0\]
+**	ret
+*/
+TEST (f_di_qi_1, di_qi_1, unsigned char, y.c[3])
+
+/*
+** f_di_qi_2:
+**	ldr	b([0-9]+), \[x0\]
+**	ins	v1\.b\[3\], v\1\.b\[0\]
+**	ret
+*/
+TEST (f_di_qi_2, di_qi_2, unsigned char, y.c[0])
+
+/*
+** f_di_qi_3:
+**	ldr	b([0-9]+), \[x0\]
+**	ins	v1\.b\[0\], v\1\.b\[0\]
+**	ret
+*/
+TEST (f_di_qi_3, di_qi_2, unsigned char, y.c[3])
+
+/*
+** f_di_hi_0:
+**	ldr	h([0-9]+), \[x0\]
+**	ins	v1\.h\[3\], v\1\.h\[0\]
+**	ret
+*/
+TEST (f_di_hi_0, di_hi_1, unsigned short, y.s[0])
+
+/*
+** f_di_hi_1:
+**	ldr	h([0-9]+), \[x0\]
+**	ins	v1\.h\[2\], v\1\.h\[0\]
+**	ret
+*/
+TEST (f_di_hi_1, di_hi_1, unsigned short, y.s[1])
+
+/*
+** f_di_hi_2:
+**	ldr	h([0-9]+), \[x0\]
+**	ins	v1\.h\[1\], v\1\.h\[0\]
+**	ret
+*/
+TEST (f_di_hi_2, di_hi_2, unsigned short, y.s[0])
+
+/*
+** f_di_hi_3:
+**	ldr	h([0-9]+), \[x0\]
+**	ins	v1\.h\[0\], v\1\.h\[0\]
+**	ret
+*/
+TEST (f_di_hi_3, di_hi_2, unsigned short, y.s[1])
+
+/*
+** f_di_si_0:
+**	ldr	s([0-9]+), \[x0\]
+**	ins	v1\.s\[1\], v\1\.s\[0\]
+**	ret
+*/
+TEST (f_di_si_0, di_si, unsigned int, y.i[0])
+
+/*
+** f_di_si_1:
+**	ldr	s([0-9]+), \[x0\]
+**	ins	v1\.s\[0\], v\1\.s\[0\]
+**	ret
+*/
+TEST (f_di_si_1, di_si, unsigned int, y.i[1])
+
+/*
+** f_si_qi_0:
+**	ldr	b([0-9]+), \[x0\]
+**	ins	v1\.b\[3\], v\1\.b\[0\]
+**	ret
+*/
+TEST (f_si_qi_0, si_qi_1, unsigned char, y.c[0])
+
+/*
+** f_si_qi_1:
+**	ldr	b([0-9]+), \[x0\]
+**	ins	v1\.b\[2\], v\1\.b\[0\]
+**	ret
+*/
+TEST (f_si_qi_1, si_qi_1, unsigned char, y.c[1])
+
+/*
+** f_si_qi_2:
+**	ldr	b([0-9]+), \[x0\]
+**	ins	v1\.b\[1\], v\1\.b\[0\]
+**	ret
+*/
+TEST (f_si_qi_2, si_qi_2, unsigned char, y.c[0])
+
+/*
+** f_si_qi_3:
+**	ldr	b([0-9]+), \[x0\]
+**	ins	v1\.b\[0\], v\1\.b\[0\]
+**	ret
+*/
+TEST (f_si_qi_3, si_qi_2, unsigned char, y.c[1])
+
+/*
+** f_si_hi_0:
+**	ldr	h([0-9]+), \[x0\]
+**	ins	v1\.h\[1\], v\1\.h\[0\]
+**	ret
+*/
+TEST (f_si_hi_0, si_hi, unsigned short, y.s[0])
+
+/*
+** f_si_hi_1:
+**	ldr	h([0-9]+), \[x0\]
+**	ins	v1\.h\[0\], v\1\.h\[0\]
+**	ret
+*/
+TEST (f_si_hi_1, si_hi, unsigned short, y.s[1])
diff --git a/gcc/testsuite/gcc.target/aarch64/ins_bitfield_5.c b/gcc/testsuite/gcc.target/aarch64/ins_bitfield_5.c
new file mode 100644
index 00000000000..6debf5419cd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ins_bitfield_5.c
@@ -0,0 +1,139 @@ 
+/* { dg-do assemble } */
+/* { dg-options "-O2 -mlittle-endian --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+struct di_qi_1 { unsigned char c[4]; unsigned int x; };
+struct di_qi_2 { unsigned int x; unsigned char c[4]; };
+
+struct di_hi_1 { unsigned short s[2]; unsigned int x; };
+struct di_hi_2 { unsigned int x; unsigned short s[2]; };
+
+struct di_si { unsigned int i[2]; };
+
+struct si_qi_1 { unsigned char c[2]; unsigned short x; };
+struct si_qi_2 { unsigned short x; unsigned char c[2]; };
+
+struct si_hi { unsigned short s[2]; };
+
+#define TEST(NAME, STYPE, ETYPE, LHS)		\
+  void						\
+  NAME (void)					\
+  {						\
+    register struct STYPE y asm ("v1");		\
+    register ETYPE x asm ("x0");		\
+    asm volatile ("" : "=w" (y), "=r" (x));	\
+    LHS = x;					\
+    asm volatile ("" :: "w" (y));		\
+  }
+
+/*
+** f_di_qi_0:
+**	ins	v1\.b\[0\], w0
+**	ret
+*/
+TEST (f_di_qi_0, di_qi_1, unsigned char, y.c[0])
+
+/*
+** f_di_qi_1:
+**	ins	v1\.b\[3\], w0
+**	ret
+*/
+TEST (f_di_qi_1, di_qi_1, unsigned char, y.c[3])
+
+/*
+** f_di_qi_2:
+**	ins	v1\.b\[4\], w0
+**	ret
+*/
+TEST (f_di_qi_2, di_qi_2, unsigned char, y.c[0])
+
+/*
+** f_di_qi_3:
+**	ins	v1\.b\[7\], w0
+**	ret
+*/
+TEST (f_di_qi_3, di_qi_2, unsigned char, y.c[3])
+
+/*
+** f_di_hi_0:
+**	ins	v1\.h\[0\], w0
+**	ret
+*/
+TEST (f_di_hi_0, di_hi_1, unsigned short, y.s[0])
+
+/*
+** f_di_hi_1:
+**	ins	v1\.h\[1\], w0
+**	ret
+*/
+TEST (f_di_hi_1, di_hi_1, unsigned short, y.s[1])
+
+/*
+** f_di_hi_2:
+**	ins	v1\.h\[2\], w0
+**	ret
+*/
+TEST (f_di_hi_2, di_hi_2, unsigned short, y.s[0])
+
+/*
+** f_di_hi_3:
+**	ins	v1\.h\[3\], w0
+**	ret
+*/
+TEST (f_di_hi_3, di_hi_2, unsigned short, y.s[1])
+
+/*
+** f_di_si_0:
+**	ins	v1\.s\[0\], w0
+**	ret
+*/
+TEST (f_di_si_0, di_si, unsigned int, y.i[0])
+
+/*
+** f_di_si_1:
+**	ins	v1\.s\[1\], w0
+**	ret
+*/
+TEST (f_di_si_1, di_si, unsigned int, y.i[1])
+
+/*
+** f_si_qi_0:
+**	ins	v1\.b\[0\], w0
+**	ret
+*/
+TEST (f_si_qi_0, si_qi_1, unsigned char, y.c[0])
+
+/*
+** f_si_qi_1:
+**	ins	v1\.b\[1\], w0
+**	ret
+*/
+TEST (f_si_qi_1, si_qi_1, unsigned char, y.c[1])
+
+/*
+** f_si_qi_2:
+**	ins	v1\.b\[2\], w0
+**	ret
+*/
+TEST (f_si_qi_2, si_qi_2, unsigned char, y.c[0])
+
+/*
+** f_si_qi_3:
+**	ins	v1\.b\[3\], w0
+**	ret
+*/
+TEST (f_si_qi_3, si_qi_2, unsigned char, y.c[1])
+
+/*
+** f_si_hi_0:
+**	ins	v1\.h\[0\], w0
+**	ret
+*/
+TEST (f_si_hi_0, si_hi, unsigned short, y.s[0])
+
+/*
+** f_si_hi_1:
+**	ins	v1\.h\[1\], w0
+**	ret
+*/
+TEST (f_si_hi_1, si_hi, unsigned short, y.s[1])
diff --git a/gcc/testsuite/gcc.target/aarch64/ins_bitfield_6.c b/gcc/testsuite/gcc.target/aarch64/ins_bitfield_6.c
new file mode 100644
index 00000000000..cb8af6b0623
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ins_bitfield_6.c
@@ -0,0 +1,139 @@ 
+/* { dg-do assemble } */
+/* { dg-options "-O2 -mbig-endian --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+struct di_qi_1 { unsigned char c[4]; unsigned int x; };
+struct di_qi_2 { unsigned int x; unsigned char c[4]; };
+
+struct di_hi_1 { unsigned short s[2]; unsigned int x; };
+struct di_hi_2 { unsigned int x; unsigned short s[2]; };
+
+struct di_si { unsigned int i[2]; };
+
+struct si_qi_1 { unsigned char c[2]; unsigned short x; };
+struct si_qi_2 { unsigned short x; unsigned char c[2]; };
+
+struct si_hi { unsigned short s[2]; };
+
+#define TEST(NAME, STYPE, ETYPE, LHS)		\
+  void						\
+  NAME (void)					\
+  {						\
+    register struct STYPE y asm ("v1");		\
+    register ETYPE x asm ("x0");		\
+    asm volatile ("" : "=w" (y), "=r" (x));	\
+    LHS = x;					\
+    asm volatile ("" :: "w" (y));		\
+  }
+
+/*
+** f_di_qi_0:
+**	ins	v1\.b\[7\], w0
+**	ret
+*/
+TEST (f_di_qi_0, di_qi_1, unsigned char, y.c[0])
+
+/*
+** f_di_qi_1:
+**	ins	v1\.b\[4\], w0
+**	ret
+*/
+TEST (f_di_qi_1, di_qi_1, unsigned char, y.c[3])
+
+/*
+** f_di_qi_2:
+**	ins	v1\.b\[3\], w0
+**	ret
+*/
+TEST (f_di_qi_2, di_qi_2, unsigned char, y.c[0])
+
+/*
+** f_di_qi_3:
+**	ins	v1\.b\[0\], w0
+**	ret
+*/
+TEST (f_di_qi_3, di_qi_2, unsigned char, y.c[3])
+
+/*
+** f_di_hi_0:
+**	ins	v1\.h\[3\], w0
+**	ret
+*/
+TEST (f_di_hi_0, di_hi_1, unsigned short, y.s[0])
+
+/*
+** f_di_hi_1:
+**	ins	v1\.h\[2\], w0
+**	ret
+*/
+TEST (f_di_hi_1, di_hi_1, unsigned short, y.s[1])
+
+/*
+** f_di_hi_2:
+**	ins	v1\.h\[1\], w0
+**	ret
+*/
+TEST (f_di_hi_2, di_hi_2, unsigned short, y.s[0])
+
+/*
+** f_di_hi_3:
+**	ins	v1\.h\[0\], w0
+**	ret
+*/
+TEST (f_di_hi_3, di_hi_2, unsigned short, y.s[1])
+
+/*
+** f_di_si_0:
+**	ins	v1\.s\[1\], w0
+**	ret
+*/
+TEST (f_di_si_0, di_si, unsigned int, y.i[0])
+
+/*
+** f_di_si_1:
+**	ins	v1\.s\[0\], w0
+**	ret
+*/
+TEST (f_di_si_1, di_si, unsigned int, y.i[1])
+
+/*
+** f_si_qi_0:
+**	ins	v1\.b\[3\], w0
+**	ret
+*/
+TEST (f_si_qi_0, si_qi_1, unsigned char, y.c[0])
+
+/*
+** f_si_qi_1:
+**	ins	v1\.b\[2\], w0
+**	ret
+*/
+TEST (f_si_qi_1, si_qi_1, unsigned char, y.c[1])
+
+/*
+** f_si_qi_2:
+**	ins	v1\.b\[1\], w0
+**	ret
+*/
+TEST (f_si_qi_2, si_qi_2, unsigned char, y.c[0])
+
+/*
+** f_si_qi_3:
+**	ins	v1\.b\[0\], w0
+**	ret
+*/
+TEST (f_si_qi_3, si_qi_2, unsigned char, y.c[1])
+
+/*
+** f_si_hi_0:
+**	ins	v1\.h\[1\], w0
+**	ret
+*/
+TEST (f_si_hi_0, si_hi, unsigned short, y.s[0])
+
+/*
+** f_si_hi_1:
+**	ins	v1\.h\[0\], w0
+**	ret
+*/
+TEST (f_si_hi_1, si_hi, unsigned short, y.s[1])